• Questions
• # Top 'N' Analysis

Thanks for the question, arunima.

Asked: February 22, 2002 - 11:59 am UTC

Last updated: February 25, 2012 - 11:56 pm UTC

Version: 8.1.7

Viewed 10K+ times! This question is

Tom,

Hope you had a good vacation. Can you please help me understand what does it mean to Perform Top 'N' Analysis. I see this topic under Creating Views in the sql cerfication exam.

Thanks a lot.

## and Tom said...

it in general refers to getting the top-n rows from a result set.

Eg: find the top 3 employees ranked by salary:

select *
from ( select ename from emp order by sal )
where rownum <= 3;

thats a top-n type query.

## Rating

(35 ratings)

Is this answer out of date? If it is, please let us know via a Comment

### Top 'N' query

arunima, February 22, 2002 - 6:38 pm UTC

Thanks Tom.

Shane Hu, March 20, 2002 - 1:50 pm UTC

If the info for the employee with highest salary is needed and the table is huge, with millions rows, will it be expensive to do the top-1 analysis? Is there an alternative? Will it be useful to create an index on the "order by" column? Thanks,

March 20, 2002 - 2:56 pm UTC

Yes, if there were an index on SAL (and I just noticed the query should be:

select *
from ( select ename from emp order by sal DESC )
where rownum <= 3

of course) and SAL was non-nullable (or a predicate that precluded NULL sal's was added), we would read the index "backwards" to find the top 3 and stop. The optimizer is "smart" about this.

### a followup question

Shane Hu, March 28, 2002 - 2:57 pm UTC

How can I retrive data for those whose sal is between the 4th highest and the 6th highest? I notice that the PL/SQL manual states that the only meaningful
uses of ROWNUM in a WHERE clause are
... WHERE ROWNUM < constant;
... WHERE ROWNUM <= constant;
Thus, I cannot do where rownum between 4 and 6.

My true intention is to skip top N records and retrive data from the middle. It would be more efficient by not retriving the top N records. Do you have the trick to do the skip?

Thanks,

Shane

### chen sun HK

chen sun, July 16, 2002 - 9:44 pm UTC

Hi Tom,
How can do the following requirements in the oracle discover?
Dimension1- by princial, channel, agent account
Dimension2- for current month, YTD, current month last year, YTD last year
Dimension3- Top 50 +ve incurred movement,Top -ve incurred movement, others
Dimension4- new claim, old claim, all
and every dimension should display the following data :
1. gross reserve b/f
2. gross paid
3. net paid
4. gross incurred movement
5. net incurred movement

It is difficult for me to display the Top 50 +ve incurred movement and Top 50 -ve movement when I choose diff dimensions.

Thanks...

### thanks Tom

hazim, May 22, 2003 - 12:38 am UTC

```Hi Tom
I have tow PL/SQL blocks, as I thought they must be the same but the returned results are different:

----------Block 1----
declare
sort_type integer :=1;
mi_id integer :=0;
n number := 100;
m number := 70;
k number := 30;
begin
execute immediate 'alter session enable parallel dml';
execute immediate 'alter table temp nologging';
insert /*append*/ into temp
where src_as = 0
order by sum_p desc
) b
where rownum<=n
)
order by sum_p desc
) a
where rownum <=n*m
);
end;
/
-----------------------------------------------------------------------
------Block 2-----------------------
declare
sort_type integer :=1;
mi_id integer :=0;
n number := 100;
m number := 70;
k number := 30;
begin
execute immediate 'alter session enable parallel dml';
execute immediate 'alter table top_out_mi modify partition top_out_mi_p nologging';
insert /*append*/ into top_out_mi
sum(dpkts+doctets) sum_po from all_ip
where src_as = 0
order by sum_p desc
) b
where rownum<=n )
order by sum_p desc
) a
where rownum <=n*m
);
commit;
end;
/
--------------------------------------------------------------------------
The structure and content of two blocks are the same, only different in inserted columns. My desired result is that :
- I want to find couples (srcaddr,dstaddr) that rank top-n*m in sum_p and srcaddr must be in the set of srcaddr(s) that rank top-n in sum_p ( as sub-query named b)
and my question are :
1. are above blocks correct to get my target?Please guide me if there're better ways to do that.
2. as I mentioned at the top, two blocks insert data to temp and top_out_mi, as I thought the (srcaddr,dstaddr) in two table must the same but following query return contrary thing :
SQL> select count(*) from top_out_mi;

COUNT(*)
----------
7000

SQL> select count(*) from temp;

COUNT(*)
----------
7000

COUNT(*)
----------
253 (as I thought this must be zero).
Thanks very much
Kind Regards.
```

### Prune data not in top-n

JHT, July 28, 2004 - 12:15 pm UTC

Thanks again Tom for your great solutions to all type of issues raised on this website.

I have a table with history data of, say customer's orders. We want to keep only the oldest 100 rows of the latest orders (by order date) for each customer (100K customers).

For other history tables, I have just used range partitioning on date fields and it has worked great in keeping a rolling history of data.

I was wondering what was the most efficient way to delete data NOT in the top-N rows for a particular customer. I would be using this delete method to periodically prune old data. If possible, I would like to use partitioning for data pruning, but I can't think of a way to make it work here.

July 28, 2004 - 1:30 pm UTC

i can only see partitioning if the process that adds new rows -- updates the existing rows. as you added a new row, it would have to "age" the old data - in order to slide the oldest row into the other partition over time.

otherwise

delete from t
where rowid in ( select rowid
from (select rowid, row_number() over (partition by customer order by date ) rn from t )
where rn > 100 );

### complexity for top N ...

Giovanni Azua, November 12, 2004 - 6:38 am UTC

Hello Thomas,

Because of some really performance critical
application I would like to know exactly how
to ensure to get top N without having to order
the whole subquery set but instead get the N
top maximums which would be linear in complexity:

N * (log N') where N' is constant (our top N)
hence is N*C where C is a constant.

N * (log N) what I suppose is Oracle doing when
it is asked to order the complete subquery e.g.

select *
from (select theval from mytable order by theval)
where rownum <= 5

This one is costing N*log(N) and then picks up
the top 5 ones.

Best Regards,
Giovanni

November 12, 2004 - 7:16 am UTC

unless you are able to use an index to access the data -- you'll have to get it all.

there is a top-n optimization (demoed in effective Oracle by design if you are interested) such that when you ask for the top 10, we don't sort the entire result set -- rather we start running the query and as the rows flow out of the query, we are just keeping the top ten, so we don't "sort" really.

### RE: complexity for top N ...

Giovanni Azua, December 07, 2004 - 9:26 am UTC

Hello Thomas,

What if the value is not indexed i.e. computed dynamically
as in my case (no way around changing this). I was trying to figure out a SQL statement generated dynamically (see bellow) that had many subqueries "each time selecting the max excluding the previous max" (as many as M of "top M"), something like:

SELECT score(x) as "top M"
, "top M - 1"
, "top M - 2"
...
, "top 1"
FROM ( SELECT max(x) as "top M -1"
, "top M - 2"
...
, "top 1"
FROM (...)
WHERE x not in ("top M - 2", ..., "top 1")
)
WHERE x not in
("top M - 1", "top M - 2", ..., "top 1")

Theoretically the complexity is M*N plus some overhead
still C*N would go much faster than N log N, dont know really how Oracle would implement this, whether the subquery layering overhead outbalance avoiding the sort.

But unfortunately this idea above is not feasible because I do not want only the x (an score) but also those other column values related to it, which means I would only have the choice of using Analytic functions e.g.

first_value(<colum>) as many times as columns I have and the
very bad news having to re-order for each layer. I guess
there is no optimal Oracle SQL solution to this problem, right?

btw specifially on which chapter of "Effective Oracle by Design" do you make reference to a solution to this problem?

Best Regards,
Giovanni

December 07, 2004 - 10:31 am UTC

sorry, did not follow this at all. I don't see what the index/not index status of a column would have to do with anything (except runtime performance)?

### To Giovanni

Menon, December 07, 2004 - 10:49 am UTC

"btw specifially on which chapter of "Effective Oracle by Design" do you make
reference to a solution to this problem?"

I believe the top-n optimization Tom refers to
is on page 502 of effective oracle by design.

### looks like C*N being the actual complexity :-)

A reader, December 07, 2004 - 12:16 pm UTC

```Hello Menon,

Thank you for the hint and I appologize for my lazzyness
looking for the chapter.

is already taking care of the complexity. I did not know
before the description "SORT ORDER BY STOPKEY" which seems to be that oracle is not sorting the whole set just the N top, is that correct?

I verified it by executing my query and getting the explain, exactly the good result as recommended by Thomas, see bellow, it catches the bad guy in around 400 milliseconds permanently, not bad ...

SQL> select /*+ NOPARALLEL(ft1) */ to_char(ft1.entity_id) as id
2  , to_char(round(ft1.score)) as score
3  , ft1.bestmatch
4  , ft1.name_last    as lastname
5  , ft1.name_first   as firstname
6  , ft1.category
7  , ft1.entry_type   as entrytype
8  , to_char(ft1.country_id) as country_id
9  , ft1.ctry_name_english as country
10  , ft1.list_abbreviations as source
12  from (SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
13         t2.entity_id
14       , t2.real_name_first as name_first
15       , t2.real_name_last  as name_last
16       , t2.bestmatch
17       , t2.enty_category   as category
18        , t2.enty_country_id as country_id
19        , t2.enty_entry_type as entry_type
20          , t2.ctry_name_english
22          , t2.list_abbreviations
23       , round(( greatest(score_1*decode(occurs_last_flag_1, 1, 10.0, 0)
24  /greatest(1, CASE WHEN c_name_last_wc >= 2 THEN c_name_last_wc ELSE 2 - c_name_first_wc END , c_name_last_wc)
25  , score_1*decode(occurs_first_flag_1, 1, 4.0, 0)
26  /greatest(1, CASE WHEN c_name_full_wc > 2 THEN c_name_first_wc ELSE 2 - c_name_last_wc END, c_name_first_wc))  +  greatest(score_2*decode(occurs_last_flag_2, 1, 10.0, 0)
27  /greatest(1, CASE WHEN c_name_last_wc >= 2 THEN c_name_last_wc ELSE 2 - c_name_first_wc END , c_name_last_wc)
28  , score_2*decode(occurs_first_flag_2, 1, 4.0, 0)
29  /greatest(1, CASE WHEN c_name_full_wc > 2 THEN c_name_first_wc ELSE 2 - c_name_last_wc END, c_name_first_wc)) )/GREATEST(DECODE(c_name_last, NULL, 0, 10.0) + DECODE(c_name_first, NULL, 0, 4.0), 1)) as score
30   FROM (
31               SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
32                      name_id  , max(decode( token_id, 1, score           ,  0)) score_1
33   , max(decode( token_id, 1, occurs_last_flag,  0)) occurs_last_flag_1
34   , max(decode( token_id, 1, decode(occurs_last_flag, 1, 0, occurs_first_flag), 0)) occurs_first_flag_1
35   , max(decode( token_id, 2, score           ,  0)) score_2
36   , max(decode( token_id, 2, occurs_last_flag,  0)) occurs_last_flag_2
37   , max(decode( token_id, 2, decode(occurs_last_flag, 2, 0, occurs_first_flag), 0)) occurs_first_flag_2
38            FROM (
39                  SELECT term_id
40                       , max(score)    as score
41                       , max(token_id) as token_id
42                  FROM ( select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
43          term_id
44        , GREATEST(LEAST(score(1)/((CASE WHEN 155524/22 <= 215 THEN 30.0 + 0.5 ELSE 30.0*(1-0.2184*LOG(10, (155524/22)/215.19)) + 0.5 END)*(1+log(10, 155524/22)))*100, 98)
45        , CASE WHEN TERM='OSAMA' THEN 100 ELSE 0 END) as score
46        , 1 as token_id
47   from T_NLM_TERM
48   where contains(term, 'fuzzy({OSAMA},20,20,weight)', 1) > 0
49   union all
50   select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
51          term_id
52        , GREATEST(LEAST(score(2)/((CASE WHEN 155524/20 <= 215 THEN 30.0 + 0.5 ELSE 30.0*(1-0.2184*LOG(10, (155524/20)/215.19)) + 0.5 END)*(1+log(10, 155524/20)))*100, 98)
53        , CASE WHEN TERM='BENLADEN' THEN 100 ELSE 0 END) as score
54        , 2 as token_id
55   from T_NLM_TERM
56   where contains(term, 'fuzzy({BENLADEN},20,20,weight)', 2) > 0
57  ) t1
58                  GROUP BY term_id
59                ) t1
60              , T_NLM_SEARCHTERM_REL t2
61           WHERE t1.term_id=t2.term_id
62           GROUP BY name_id
63        ) t1
64      , (
65           SELECT /*+ NOPARALLEL(t2) */ t2.*
66           FROM   T_NLM_SEARCH t2
67        ) t2
68  WHERE t1.name_id=t2.name_id
69   and rownum <= 5
70  ORDER BY 11 desc, 4 asc ) ft1
71  /

Elapsed: 00:00:01.04

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=21 Card=5 Bytes=4580)
1    0   VIEW (Cost=21 Card=5 Bytes=4580)
2    1     SORT (ORDER BY) (Cost=21 Card=5 Bytes=915)
3    2       COUNT (STOPKEY)
4    3         NESTED LOOPS (Cost=19 Card=9 Bytes=1647)
5    4           VIEW (Cost=10 Card=9 Bytes=747)
6    5             SORT (GROUP BY) (Cost=10 Card=9 Bytes=585)
7    6               NESTED LOOPS (Cost=8 Card=9 Bytes=585)
8    7                 VIEW (Cost=7 Card=2 Bytes=58)
9    8                   SORT (GROUP BY) (Cost=7 Card=2 Bytes=58)
10    9                     VIEW (Cost=5 Card=2 Bytes=58)
11   10                       UNION-ALL
12   11                         TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_TERM' (Cost=2 Card=1 Bytes=21)
13   12                           DOMAIN INDEX OF 'IDXC_KDM_TERM' (Cost=0 Card=1)
14   11                         TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_TERM' (Cost=2 Card=1 Bytes=21)
15   14                           DOMAIN INDEX OF 'IDXC_KDM_TERM' (Cost=0 Card=1)
16    7                 INDEX (RANGE SCAN) OF 'T_NLM_SEARCHTERM_REL_PK' (UNIQUE) (Cost=2 Card=5 Bytes=180)
17    4           TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_SEARCH' (Cost=1 Card=1 Bytes=100)
18   17             INDEX (UNIQUE SCAN) OF 'PK_NLM_SEARCH' (UNIQUE)

Statistics
----------------------------------------------------------
583  recursive calls
0  db block gets
1118  consistent gets
0  redo size
852  bytes sent via SQL*Net to client
1850  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
12  sorts (memory)
0  sorts (disk)
5  rows processed
SQL> spool off

Best Regards,
Giovanni
```

### sorry put the rownum <= 5 in the wrong place ... :-(

A reader, December 07, 2004 - 12:25 pm UTC

```SQL> select /*+ NOPARALLEL(ft1) */ to_char(ft1.entity_id) as id
2  , to_char(round(ft1.score)) as score
3  , ft1.bestmatch
4  , ft1.name_last    as lastname
5  , ft1.name_first   as firstname
6  , ft1.category
7  , ft1.entry_type   as entrytype
8  , to_char(ft1.country_id) as country_id
9  , ft1.ctry_name_english as country
10  , ft1.list_abbreviations as source
12  from (SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
13         t2.entity_id
14       , t2.real_name_first as name_first
15       , t2.real_name_last  as name_last
16       , t2.bestmatch
17       , t2.enty_category   as category
18        , t2.enty_country_id as country_id
19        , t2.enty_entry_type as entry_type
20          , t2.ctry_name_english
22          , t2.list_abbreviations
23       , round(( greatest(score_1*decode(occurs_last_flag_1, 1, 10.0, 0)
24  /greatest(1, CASE WHEN c_name_last_wc >= 2 THEN c_name_last_wc ELSE 2 - c_name_first_wc END , c_name_last_wc)
25  , score_1*decode(occurs_first_flag_1, 1, 4.0, 0)
26  /greatest(1, CASE WHEN c_name_full_wc > 2 THEN c_name_first_wc ELSE 2 - c_name_last_wc END, c_name_first_wc))  +  greatest(score_2*decode(occurs_last_flag_2, 1, 10.0, 0)
27  /greatest(1, CASE WHEN c_name_last_wc >= 2 THEN c_name_last_wc ELSE 2 - c_name_first_wc END , c_name_last_wc)
28  , score_2*decode(occurs_first_flag_2, 1, 4.0, 0)
29  /greatest(1, CASE WHEN c_name_full_wc > 2 THEN c_name_first_wc ELSE 2 - c_name_last_wc END, c_name_first_wc)) )/GREATEST(DECODE(c_name_last, NULL, 0, 10.0) + DECODE(c_name_first, NULL, 0, 4.0), 1)) as score
30   FROM (
31               SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
32                      name_id  , max(decode( token_id, 1, score           ,  0)) score_1
33   , max(decode( token_id, 1, occurs_last_flag,  0)) occurs_last_flag_1
34   , max(decode( token_id, 1, decode(occurs_last_flag, 1, 0, occurs_first_flag), 0)) occurs_first_flag_1
35   , max(decode( token_id, 2, score           ,  0)) score_2
36   , max(decode( token_id, 2, occurs_last_flag,  0)) occurs_last_flag_2
37   , max(decode( token_id, 2, decode(occurs_last_flag, 2, 0, occurs_first_flag), 0)) occurs_first_flag_2
38            FROM (
39                  SELECT term_id
40                       , max(score)    as score
41                       , max(token_id) as token_id
42                  FROM ( select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
43          term_id
44        , GREATEST(LEAST(score(1)/((CASE WHEN 155524/22 <= 215 THEN 30.0 + 0.5 ELSE 30.0*(1-0.2184*LOG(10, (155524/22)/215.19)) + 0.5 END)*(1+log(10, 155524/22)))*100, 98)
45        , CASE WHEN TERM='OSAMA' THEN 100 ELSE 0 END) as score
46        , 1 as token_id
47   from T_NLM_TERM
48   where contains(term, 'fuzzy({OSAMA},20,20,weight)', 1) > 0
49   union all
50   select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
51          term_id
52        , GREATEST(LEAST(score(2)/((CASE WHEN 155524/20 <= 215 THEN 30.0 + 0.5 ELSE 30.0*(1-0.2184*LOG(10, (155524/20)/215.19)) + 0.5 END)*(1+log(10, 155524/20)))*100, 98)
53        , CASE WHEN TERM='BENLADEN' THEN 100 ELSE 0 END) as score
54        , 2 as token_id
55   from T_NLM_TERM
56   where contains(term, 'fuzzy({BENLADEN},20,20,weight)', 2) > 0
57  ) t1
58                  GROUP BY term_id
59                ) t1
60              , T_NLM_SEARCHTERM_REL t2
61           WHERE t1.term_id=t2.term_id
62           GROUP BY name_id
63        ) t1
64      , (
65           SELECT /*+ NOPARALLEL(t2) */ t2.*
66           FROM   T_NLM_SEARCH t2
67        ) t2
68  WHERE t1.name_id=t2.name_id
69  ORDER BY 11 desc, 4 asc ) ft1
70  where rownum <= 5
71  /

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=HINT: FIRST_ROWS (Cost=21 Card=5
Bytes=4580)

1    0   COUNT (STOPKEY)
2    1     VIEW (Cost=21 Card=9 Bytes=8244)
3    2       SORT (ORDER BY STOPKEY) (Cost=21 Card=9 Bytes=1647)
4    3         NESTED LOOPS (Cost=19 Card=9 Bytes=1647)
5    4           VIEW (Cost=10 Card=9 Bytes=747)
6    5             SORT (GROUP BY) (Cost=10 Card=9 Bytes=585)
7    6               NESTED LOOPS (Cost=8 Card=9 Bytes=585)
8    7                 VIEW (Cost=7 Card=2 Bytes=58)
9    8                   SORT (GROUP BY) (Cost=7 Card=2 Bytes=58)
10    9                     VIEW (Cost=5 Card=2 Bytes=58)
11   10                       UNION-ALL
12   11                         TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_TERM' (Cost=2 Card=1 Bytes=21)
13   12                           DOMAIN INDEX OF 'IDXC_KDM_TERM' (Cost=0 Card=1)
14   11                         TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_TERM' (Cost=2 Card=1 Bytes=21)
15   14                           DOMAIN INDEX OF 'IDXC_KDM_TERM' (Cost=0 Card=1)
16    7                 INDEX (RANGE SCAN) OF 'T_NLM_SEARCHTERM_REL_PK' (UNIQUE) (Cost=2 Card=5 Bytes=180)
17    4           TABLE ACCESS (BY INDEX ROWID) OF 'T_NLM_SEARCH' (Cost=1 Card=1 Bytes=100)
18   17             INDEX (UNIQUE SCAN) OF 'PK_NLM_SEARCH' (UNIQUE)

Statistics
----------------------------------------------------------
531  recursive calls
0  db block gets
1606  consistent gets
0  redo size
899  bytes sent via SQL*Net to client
1851  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
12  sorts (memory)
0  sorts (disk)
5  rows processed
SQL> spool off

There I have it, the "SORT (ORDER BY STOPKEY)".

Best Regards,
Giovanni
```

### OK

James, February 23, 2005 - 12:20 pm UTC

Hi Tom,
I need a query which returns top two salary getting employees
for each deptno.I mean the output to be in the format of

deptno ename
------ ------
10 KING
10 ...
20 ...
20 ..
30 ...
30 ...

I use Oracle 8.0.3 and I am not able to use analytic functions.
Could you please provide the query for this???

February 24, 2005 - 4:50 am UTC

scott@ORA806> select deptno, empno, sal
2 from emp a
3 where 2 >= (select count(distinct sal)
4 from emp b
5 where b.deptno = a.deptno
6 and b.sal >= a.sal)
7 order by deptno, sal desc
8 /

DEPTNO EMPNO SAL
---------- ---------- ----------
10 7839 5000
10 7782 2450
20 7788 3000
20 7902 3000
20 7566 2975
30 7698 2850
30 7499 1600

7 rows selected.

beware of your question however...... I return the set of people making the top two salaries in each deptno.... I feel that is the most appropriate result set.

What if 3 people in deptno=20 made 3,000 -- what would the answer be then?

### Thanks much

A stressed student, March 29, 2005 - 11:33 am UTC

Thank you, I was looking for this information without even knowing how it was called to "Select first N rows in a table" :) And this is the first place where i found it.

### Top 3

Priya, April 14, 2008 - 5:13 pm UTC

Hi Tom,

I am trying to find out the top 3 deals based on the sqft.

Eg : For one voucher there could be multiple rows because of the data - say 10 rows

so I have used distinct in my query so it returns just one row. Till now ok.

When I try to use
`row_number()  over (partition by f.CB_L08DESCR  order by nvl(b.building_size,0) desc) rn,`

I get 10 rows of data with rn 1,2,3,4 and so on..whereas I expect only one row with rn =1 since I have used distinct

Why is that? Would this method of finding top N not work with distinct?

Thanks,
Priya.
April 16, 2008 - 2:10 pm UTC

umm, distinct sort of happens at the end, you know, after all functions have run (would not make sense to distinct and then compute stuff - think about it)

you give entirely insufficient data to help you though - I have no clue what "sqft" is. with F is, what B is, what anything is.

### Explain Query Execution

Baiju_P_M, July 04, 2008 - 1:38 am UTC

Sir,

Can you please explain step by step process of execution of the query.

select last_name,salary from employees e where (select count(*) from employees where e.salary< salary) < 6

I am not able to make out the query is being executed.

July 07, 2008 - 11:01 am UTC

this query:

```select e.last_name, e.salary
from employees e
where (select count(*)
from employees e2
where e.salary< e2.salary) < 6```

is performed sort of like this:

```for x in ( select last_name, salary from employees )
loop
select count(*) into TMP from employees where X.SALARY < employees.salary;
if ( tmp < 6 )
then
output record;
end if;
end;
```

in english:

for each employee record in employees

count the number of employee records that make MORE than the current record

if that number is less then six, output that record.

### Explain Query Execution

Baiju_P_M, July 17, 2008 - 3:52 am UTC

Sir,

Thanks a lot.

Jagan, June 03, 2009 - 3:38 pm UTC

when we get top rows and join with another table why does the query performance chnage significantly?

the following takes less than a second

(select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-365 and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < 10)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.02 0.02 0 0 0 0
Fetch 2 0.00 0.00 0 44 0 9
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.02 0.02 0 44 0 9

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 215 (PHOENIX4)

Rows Row Source Operation
------- ---------------------------------------------------
9 COUNT STOPKEY (cr=44 pr=0 pw=0 time=377 us)
9 VIEW (cr=44 pr=0 pw=0 time=365 us)
9 FILTER (cr=44 pr=0 pw=0 time=355 us)
9 TABLE ACCESS BY INDEX ROWID TRT_WEEKLY_PRICING (cr=44 pr=0 pw=0 time=330 us)
9 INDEX FULL SCAN DESCENDING IDX_TWP_PRICE_START (cr=6 pr=0 pw=0 time=1571 us)(object id 174219)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: FIRST_ROWS
9 COUNT (STOPKEY)
9 VIEW
9 FILTER
9 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'TRT_WEEKLY_PRICING' (TABLE)
9 INDEX MODE: ANALYZED (FULL SCAN DESCENDING) OF
'IDX_TWP_PRICE_START' (INDEX)

when i make this query an inline view and join with another smaller table this takes nearly a minute.

select * from
(select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-:"SYS_B_0" and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < 10) twp,
trt_weather tw
where twp.trt_destination_airport_code = tw.trt_airport_code

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.02 0.02 0 0 0 0
Fetch 2 28.32 58.13 248 116361 5 108
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 28.35 58.16 248 116361 5 108

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 215 (PHOENIX4)

Rows Row Source Operation
------- ---------------------------------------------------
108 HASH JOIN (cr=116361 pr=248 pw=20349 time=58130046 us)
9 VIEW (cr=116279 pr=248 pw=20349 time=58126831 us)
9 COUNT STOPKEY (cr=116279 pr=248 pw=20349 time=58126807 us)
9 VIEW (cr=116279 pr=248 pw=20349 time=58126779 us)
9 FILTER (cr=116279 pr=248 pw=20349 time=58126741 us)
9 SORT ORDER BY (cr=116279 pr=248 pw=20349 time=58126697 us)
481539 TABLE ACCESS FULL TRT_WEEKLY_PRICING (cr=116279 pr=0 pw=0 time=21917693 us)
1944 TABLE ACCESS FULL TRT_WEATHER (cr=82 pr=0 pw=0 time=2259 us)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: FIRST_ROWS
108 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'TRT_WEATHER'
(TABLE)
9 NESTED LOOPS
9 VIEW
9 COUNT (STOPKEY)
9 VIEW
9 FILTER
481539 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'TRT_WEEKLY_PRICING' (TABLE)
1944 INDEX MODE: ANALYZED (FULL SCAN DESCENDING) OF
'IDX_TWP_PRICE_START' (INDEX)
0 INDEX MODE: ANALYZED (RANGE SCAN) OF 'PK_TRT_WEATHER'
(INDEX (UNIQUE))

June 04, 2009 - 2:35 pm UTC

I do not trust your displayed information, you are using cursor sharing = force or similar apparently, but the query above doesn't properly reflect that.

you've edited something, the query would look like this:

select * from (select * from ( select * from trt_weekly_pricing twp1 where
twp1.weekly_start_date between sysdate-:"SYS_B_0" and sysdate order by
average_price desc,weekly_start_date desc ) where rownum < :"SYS_B_1") twp,
trt_weather tw where twp.trt_destination_airport_code = tw.trt_airport_code

So, I want (as usual) A FULL TEST CASE - starting with create table and indexes and inserts (if not inserts, then appropriate dbms_stats.set_table/index_stats) to understand your schema and reproduce with

Jagan, June 07, 2009 - 2:03 am UTC

Here are the scripts.

CREATE TABLE TRT_WEEKLY_PRICING
(
TRT_ORIGIN_AIRPORT_CODE CHAR(3 BYTE) NOT NULL,
TRT_DESTINATION_AIRPORT_CODE CHAR(3 BYTE) NOT NULL,
WEEKLY_START_DATE DATE NOT NULL,
PGOOD_CODE CHAR(1 BYTE) NOT NULL,
AVERAGE_PRICE NUMBER(10,2),
CURRENCY_CODE CHAR(3 BYTE),
DATA_SOURCE_CODE CHAR(1 BYTE) NOT NULL,
CREATE_DATE DATE DEFAULT SYSDATE NOT NULL,
UPDATE_DATE DATE DEFAULT SYSDATE NOT NULL,
CREATE_NAME VARCHAR2(30 BYTE) DEFAULT user NOT NULL,
UPDATE_NAME VARCHAR2(30 BYTE) DEFAULT user NOT NULL,
YOY_CHANGE NUMBER(10,2),
AVG_YEARLY_PRICE NUMBER(10,2)
);

CREATE UNIQUE INDEX PK_TRT_WEEKLY_PRICING ON TRT_WEEKLY_PRICING
(TRT_ORIGIN_AIRPORT_CODE, TRT_DESTINATION_AIRPORT_CODE, WEEKLY_START_DATE, PGOOD_CODE, DATA_SOURCE_CODE);

CREATE INDEX IDX_TWP_START_DATE ON TRT_WEEKLY_PRICING
(WEEKLY_START_DATE);

CREATE INDEX IDX_AVERAGE_PRICE ON TRT_WEEKLY_PRICING
(AVERAGE_PRICE);

CREATE INDEX IDX_TWP_PRICE_START ON TRT_WEEKLY_PRICING
(AVERAGE_PRICE DESC, WEEKLY_START_DATE DESC);

CREATE TABLE TRT_WEATHER
(
TRT_AIRPORT_CODE CHAR(3 BYTE) NOT NULL,
MONTH NUMBER(2),
MAX_AVG_TEMPERATURE_IN_F NUMBER(4,1),
MIN_AVG_TEMPERATURE_IN_F NUMBER(4,1),
AVG_PRECIPITATION_IN_INCH NUMBER(6,2),
CREATE_DATE DATE DEFAULT SYSDATE NOT NULL,
UPDATE_DATE DATE DEFAULT SYSDATE NOT NULL,
CREATE_NAME VARCHAR2(30 BYTE) DEFAULT user NOT NULL,
UPDATE_NAME VARCHAR2(30 BYTE) DEFAULT user NOT NULL
);

CREATE UNIQUE INDEX PK_TRT_WEATHER ON TRT_WEATHER
(TRT_AIRPORT_CODE, MONTH);

select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-365 and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < 10

select * from
(select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-365 and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < 10) twp,
trt_weather tw
where twp.trt_destination_airport_code = tw.trt_airport_code
AND TO_CHAR(twp.weekly_start_date,'MM') = tw.month(+)

exec dbms_stats.set_table_stats(ownname => 'PHOENIX4', tabname => 'TRT_WEEKLY_PRICING', numrows => 3054968, numblks => 113343, avgrlen => 77);

exec DBMS_STATS.SET_INDEX_STATS ( ownname =>'PHOENIX4', indname => 'PK_TRT_WEEKLY_PRICING', numrows => 3054968, numlblks => 84064);

exec DBMS_STATS.SET_INDEX_STATS ( ownname =>'PHOENIX4', indname => 'IDX_TWP_START_DATE', numrows => 3088367, numlblks => 35911);

exec DBMS_STATS.SET_INDEX_STATS ( ownname =>'PHOENIX4', indname => 'IDX_TWP_PRICE_START', numrows => 3054968, numlblks => 48466);

exec DBMS_STATS.SET_INDEX_STATS ( ownname =>'PHOENIX4', indname => 'IDX_AVERAGE_PRICE', numrows => 3037216, numlblks => 27590);

exec dbms_stats.set_table_stats(ownname => 'PHOENIX4', tabname => 'TRT_WEATHER', numrows => 1944, numblks => 60, avgrlen => 46);

exec DBMS_STATS.SET_INDEX_STATS ( ownname =>'PHOENIX4', indname => 'PK_TRT_WEATHER', numrows => 1944, numlblks => 21);

select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-:"SYS_B_0" and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < :"SYS_B_1"

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.01 0.01 0 0 0 0
Fetch 2 0.00 0.00 0 16 0 9
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.02 0 16 0 9

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 215 (PHOENIX4)

Rows Row Source Operation
------- ---------------------------------------------------
9 COUNT STOPKEY (cr=16 pr=0 pw=0 time=459 us)
9 VIEW (cr=16 pr=0 pw=0 time=444 us)
9 FILTER (cr=16 pr=0 pw=0 time=428 us)
9 TABLE ACCESS BY INDEX ROWID TRT_WEEKLY_PRICING (cr=16 pr=0 pw=0 time=399 us)
9 INDEX FULL SCAN IDX_TWP_PRICE_START (cr=7 pr=0 pw=0 time=1829 us)(object id 174468)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: FIRST_ROWS
9 COUNT (STOPKEY)
9 VIEW
9 FILTER
9 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'TRT_WEEKLY_PRICING' (TABLE)
9 INDEX MODE: ANALYZED (FULL SCAN) OF
'IDX_TWP_PRICE_START' (INDEX)

********************************************************************************

select * from
(select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-:"SYS_B_0" and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < :"SYS_B_1") twp,
trt_weather tw
where twp.trt_destination_airport_code = tw.trt_airport_code

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.03 0.02 0 0 0 0
Fetch 9 34.24 89.55 114928 115008 7 108
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 34.27 89.58 114928 115008 7 108

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 215 (PHOENIX4)

Rows Row Source Operation
------- ---------------------------------------------------
108 HASH JOIN (cr=115008 pr=114928 pw=20495 time=89544472 us)
9 VIEW (cr=114919 pr=114928 pw=20495 time=89541921 us)
9 COUNT STOPKEY (cr=114919 pr=114928 pw=20495 time=89541881 us)
9 VIEW (cr=114919 pr=114928 pw=20495 time=89541847 us)
9 FILTER (cr=114919 pr=114928 pw=20495 time=89541827 us)
9 SORT ORDER BY (cr=114919 pr=114928 pw=20495 time=89541768 us)
481539 TABLE ACCESS FULL TRT_WEEKLY_PRICING (cr=114919 pr=114568 pw=0 time=49466969 us)
1944 TABLE ACCESS FULL TRT_WEATHER (cr=89 pr=0 pw=0 time=2280 us)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: FIRST_ROWS
108 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'TRT_WEATHER'
(TABLE)
9 NESTED LOOPS
9 VIEW
9 COUNT (STOPKEY)
9 VIEW
9 FILTER
481539 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF
'TRT_WEEKLY_PRICING' (TABLE)
1944 INDEX MODE: ANALYZED (FULL SCAN) OF
'IDX_TWP_PRICE_START' (INDEX)
0 INDEX MODE: ANALYZED (RANGE SCAN) OF 'PK_TRT_WEATHER'
(INDEX (UNIQUE))

********************************************************************************

June 08, 2009 - 1:29 pm UTC

it'll go back to the plan - if you look at the estimated row counts:

```ops\$tkyte%ORA10GR2> select *
2    from ( select *
3             from trt_weekly_pricing twp1
4            where twp1.weekly_start_date between sysdate-365 and sysdate
5            order by  average_price desc,weekly_start_date desc
6          )
7   where rownum < 10
8  /

no rows selected

ops\$tkyte%ORA10GR2> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID  6mbfgm9jaf4gy, child number 0
-------------------------------------
select *   from ( select *            from trt_weekly_pricing twp1
twp1.weekly_start_date between sysdate-:"SYS_B_0" and sysdate
average_price desc,weekly_start_date desc         )  where rownum < :"SY

Plan hash value: 685608170

------------------------------------------------------------------------
| Id  | Operation                      | Name                | Rows  | B
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |       |
|*  1 |  COUNT STOPKEY                 |                     |       |
|   2 |   VIEW                         |                     |    10 |
|*  3 |    FILTER                      |                     |       |
|   4 |     TABLE ACCESS BY INDEX ROWID| TRT_WEEKLY_PRICING  |  7637 |
|*  5 |      INDEX FULL SCAN           | IDX_TWP_PRICE_START |     1 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<:SYS_B_1)
3 - filter(SYSDATE@!-:SYS_B_0<=SYSDATE@!)
5 - access("TWP1"."SYS_NC00015\$">=SYS_OP_DESCEND(SYSDATE@!) AND
"TWP1"."SYS_NC00015\$"<=SYS_OP_DESCEND(SYSDATE@!-:SYS_B_0))
filter((SYS_OP_UNDESCEND("TWP1"."SYS_NC00015\$")<=SYSDATE@! AND
SYS_OP_UNDESCEND("TWP1"."SYS_NC00015\$")>=SYSDATE@!-:SYS_B_

29 rows selected.
```

in that case, it is guessing that ON AVERAGE it will have to find 7,637 rows in the table - filter them to find 9 that match the filter.

In YOUR SPECIFIC CASE - it just happens that the first nine it finds (sorted by average_price DESC) match that filter - in other words - all of your "expensive" stuff (stuff with the highest prices) happened just recently.

Think about what would happen if your most expensive stuff happened five years ago - how many records would we filter through for that to find the first nine? A lot.

so, when you gave it the other table to work with - it said "ah hah, i can do this better IN GENERAL"

since your highest priced data is also the most current - you can try this, but bear in mind, if your highest priced data starts becoming the oldest data - this will be a bad idea.

```ops\$tkyte%ORA10GR2> with twp
2  as
3  (
4  select /*+ MATERIALIZE */ *
5    from ( select *
6             from trt_weekly_pricing twp1
7            where twp1.weekly_start_date between sysdate-365 and sysdate
8            order by  average_price desc,weekly_start_date desc
9          )
10   where rownum < 10
11  )
12  select *
13    from twp, trt_weather tw
14  where twp.trt_destination_airport_code = tw.trt_airport_code
15  /

no rows selected

ops\$tkyte%ORA10GR2> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  9xr6tffkrncvk, child number 0
-------------------------------------
with twp as ( select /*+ MATERIALIZE */ *   from ( select *            from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-365 and sysdate           order by  average_pric
desc,weekly_start_date desc         )  where rownum < 10 ) select *   from twp, trt_weather tw where
twp.trt_destination_airport_code = tw.trt_airport_code

Plan hash value: 4006670865

----------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                       | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                            |       |       |    75 (100)|
|   1 |  TEMP TABLE TRANSFORMATION       |                            |       |       |            |
|   2 |   LOAD AS SELECT                 |                            |       |       |            |
|*  3 |    COUNT STOPKEY                 |                            |       |       |            |
|   4 |     VIEW                         |                            |    10 |  1210 |    66   (0)|
|*  5 |      FILTER                      |                            |       |       |            |
|   6 |       TABLE ACCESS BY INDEX ROWID| TRT_WEEKLY_PRICING         |  7637 |   574K|    66   (0)|
|*  7 |        INDEX FULL SCAN           | IDX_TWP_PRICE_START        |     1 |       |    65   (0)|
|*  8 |   HASH JOIN                      |                            | 17496 |  2853K|     9  (12)|
|   9 |    VIEW                          |                            |     9 |  1089 |     2   (0)|
|  10 |     TABLE ACCESS FULL            | SYS_TEMP_0FD9D6603_50C3F23 |     9 |  1089 |     2   (0)|
|  11 |    TABLE ACCESS FULL             | TRT_WEATHER                |  1944 | 89424 |     6   (0)|
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter(ROWNUM<10)
5 - filter(SYSDATE@!-365<=SYSDATE@!)
7 - access("TWP1"."SYS_NC00015\$">=SYS_OP_DESCEND(SYSDATE@!) AND
"TWP1"."SYS_NC00015\$"<=SYS_OP_DESCEND(SYSDATE@!-365))
filter((SYS_OP_UNDESCEND("TWP1"."SYS_NC00015\$")<=SYSDATE@! AND
SYS_OP_UNDESCEND("TWP1"."SYS_NC00015\$")>=SYSDATE@!-365))
8 - access("TWP"."TRT_DESTINATION_AIRPORT_CODE"="TW"."TRT_AIRPORT_CODE")

37 rows selected.
```

Jagan, June 07, 2009 - 2:12 am UTC

in the previous information i mentioned the sqls i am trying. but one of the queries had additional and clause "AND TO_CHAR(twp.weekly_start_date,'MM') = tw.month(+) " which is not required.
the query just should be
select * from
(select * from (
select * from trt_weekly_pricing twp1
where twp1.weekly_start_date between sysdate-365 and sysdate
order by average_price desc,weekly_start_date desc
)
where rownum < 10) twp,
trt_weather tw
where twp.trt_destination_airport_code = tw.trt_airport_code

Thanks
- Jagan

### Why Materialize hint?

Narendra, June 09, 2009 - 4:09 am UTC

Hello Tom,

You responded with following alternative.

since your highest priced data is also the most current - you can try this, but bear in mind, if your highest priced data starts becoming the oldest data - this will be a bad idea.

ops\$tkyte%ORA10GR2> with twp
2 as
3 (
4 select /*+ MATERIALIZE */ *
5 from ( select *
6 from trt_weekly_pricing twp1
7 where twp1.weekly_start_date between sysdate-365 and sysdate
8 order by average_price desc,weekly_start_date desc
9 )
10 where rownum < 10
11 )
12 select *
13 from twp, trt_weather tw
14 where twp.trt_destination_airport_code = tw.trt_airport_code
15 /

Now, I am a bit confused as to why is it needed to explicitly specify the MATERIALIZE hint in above query? Since the query has ROWNUM (in WHERE clause), should that not force CBO to materialize the results?
June 09, 2009 - 10:26 am UTC

the rownum didn't force the query to materialize it when it was an inline view either - so why would it be otherwise for the with subquery?

### That is my question

Narendra, June 10, 2009 - 4:58 am UTC

Hello Tom,

That is exactly my question. Why did the CBO did not choose to materialize the query in either of the cases (i.e. when used as an inline view or as a WITH subquery)?
I remember reading about the ROWNUM trick to force CBO to materialize the query results in your 'Effective Oracle By Design' book. Is that behaviour changed now?

A reader, June 10, 2009 - 1:48 pm UTC

Thanks Tom!
as always it is a useful solution, which I can implement.
on the otherhand I have the same question as the previous reviewer on why the rownum in the query did not cuase the result set to materialized without a hint.
is this a change from 10g onwards ?
Thanks
- Jagan
June 10, 2009 - 4:27 pm UTC

it didn't need to in that case - it doesn't always have to. it never 'had' to.

### multi column order by

Kiran, June 25, 2009 - 3:10 am UTC

Hi Tom,

I've just started my career in Oracle programming.
Apart from on the job, I've been learning a lot by going through your excellent solutions.
I help on order by result set which I was not able to solve, though I believe the inline query (is one option) can be used to achieve. I need the first column in asc and second column in desc.

create table test_order (a number, b number);

insert into test_order values(1,4);
insert into test_order values(5,3);
insert into test_order values(3,5);
insert into test_order values(4,1);
insert into test_order values(2,2);

Data is in :
1 4
5 3
3 5
4 1
2 2

Output should be in :
1 5
2 4
3 3
4 2
5 1

Expecting couple of possible ways/methods. Many Thanks in advance for your help.
June 26, 2009 - 10:38 am UTC

wow, this sort of flies in the face of everything relational.

These are not two columns - not if you sort them independently of each other like that. This is very strange stuff

can we do it? Sure.
does it make sense? I cannot imagine the real world use case of this

but here you go, don't expect it to be fast for big tables

```ops\$tkyte%ORA10GR2> select x.a, y.b
2    from (select a, row_number() over (order by a asc ) rn1 from test_order) X,
3         (select b, row_number() over (order by b desc) rn2 from test_order) Y
4   where x.rn1 = y.rn2
5   order by x.rn1
6  /

A          B
---------- ----------
1          5
2          4
3          3
4          2
5          1

```

### To "Kiran from India"

Narendra, June 26, 2009 - 9:08 am UTC

Kiran,
Here is one way but I believe more elegant approach exists
```with rank_test_order as
(select a,
b,
rank() over (order by a) ra,
rank() over (order by b desc) rb
)
select rt1.a, rt2.b
from rank_test_order rt1, rank_test_order rt2
where rt1.ra = rt2.rb
```

June 26, 2009 - 11:09 am UTC

you cannot use rank - rank allows duplicates (you'd have cartesian product issues) and is not gap free.

you have to use row_number (not dense_rank either - it allows duplicates), as it is unique and gap free.

### This is for my mini Project

Divyesh S Pai, November 14, 2009 - 8:07 am UTC

I am Using Oracle 8i.
There two tables- applicant and marks.
the structure of table are
applicant table include some of these...
name null? type
---------- -------- -----------
form_no Not Null varchar(5)
name varchar(25)
course varchar(10)
belong varchar(10)

the marks table include some of these...
name null? type
---------- -------- -----------
form_no varchar(5)
indx number

The form_no in marks table is foreign key refered to applicant(form_no)

the questions are,
1. select the name from applicant & indx from marks also with Sl.no in order, of top 15 candidates according to indx of marks table.

2. Excluding the above 15 candidates, select the name from applicant & indx from marks also with Sl.no in order, of top 5 candidates where applicant.belong='X' according to indx of marks table.

This is an important Query for me....
November 15, 2009 - 3:20 pm UTC

homework? I didn't do it when I was in school, why now?

besides, no creates, no inserts, no look - why should I do all of the work??

### for my project

Divyesh S Pai, November 17, 2009 - 4:15 am UTC

I'm doing Bachelor of Computer Application degree course.
I tried very much to find the query.
even our teachers dont det the qquery.
thats why i consulted you...
There are only a limited days to submit my record.
please make and send me the query of the question i've sent to you two days before...

November 23, 2009 - 1:51 pm UTC

this is exactly and precisely why I will *not* answer this question.

It is homework, it is a test for you, something you need to work out, a skill you need to have.

I use similar questions in some interviews,

I would expect the candidate to answer them - no "phone a friend".

### For my project

Divyesh S Pai, November 17, 2009 - 4:22 am UTC

but the rownum doesn't come in the order

select rownum as Sl_no,a.name,m.indx
from applicant a,marks m
where a.form_no=m.form_no
and rownum<=5
order by m.indx desc,rownum

Farhan Ali, July 13, 2011 - 5:02 am UTC

Hi Tom,

I want to retrieve top 12 different passwords from user profile log for a specific user ID.

We have following fields in log

User ID, Passwd, , Branch, User Name, Op Date(This is activity date in log)

This required to implement our password i-e user current password should not be the one from his last 12 used. So we can retrieve his password logs from the USERS_LOG table but how can we identify the distinct top 12.

July 13, 2011 - 8:16 pm UTC

why is this considered a hard query? It is pretty trivial

first, since they cannot reuse a password from the last 12 - you know that the last 12 are already unique (by definition)

so, just

select * from (select passwd from t where user_id = ? order by op_date DESC) where rownum <= 12;

done.

A reader, July 16, 2011 - 7:08 am UTC

Dear Tom,

using several examples of the following two queries

select job, max(sal)
FROM EMP
GROUP BY JOB
order by job;

and

SELECT *
FROM (SELECT JOB, sal, ROW_NUMBER() OVER (PARTITION BY JOB ORDER BY SAL DESC) rn
FROM EMP)
WHERE rn = 1
order by job;

I come to the conclusion that the query

select a,b, max(c)
from table_t
group by a, b
order by a;

is equivalent to

SELECT *
FROM (SELECT a, b, ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY c DESC) rn
FROM table_t)
WHERE rn = 1
order by a;

I am correct?

Thanks very much
July 18, 2011 - 10:25 am UTC

you'd want a NULLS LAST on that order by.

```ops\$tkyte%ORA11GR2> create table t ( a int, b int, c int );

Table created.

ops\$tkyte%ORA11GR2> insert into t values ( 1, 1, 1 );

1 row created.

ops\$tkyte%ORA11GR2> insert into t values ( 1, 1, null );

1 row created.

ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2> select a, b, max(c) from t group by a,b order by a,b;

A          B     MAX(C)
---------- ---------- ----------
1          1          1

ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2> select * from (
2  SELECT a, b, c, ROW_NUMBER() OVER (PARTITION BY a, b ORDER  BY c DESC) rn
3         FROM t
4  ) where rn = 1
5  order by a
6  /

A          B          C         RN
---------- ---------- ---------- ----------
1          1                     1

```

You'd also want to order by a,b on the second query as well. You also need to select C

```  2  SELECT a, b, c, ROW_NUMBER() OVER (PARTITION BY a, b ORDER  BY c DESC NULLS LAST) rn
3         FROM t
4  ) where rn = 1
5  order by a,b
6  /

A          B          C         RN
---------- ---------- ---------- ----------
1          1          1          1

```

then, the 2nd query would be a syntactically different but semantically equivalent query.

### What is wrong?

Luca Ciano, August 22, 2011 - 9:21 am UTC

Hi Tom,

I have a simple table named logger, with structure:

msgid NUMBER primary key,
datetime TIMESTAMP,
runid NUMBER,
body CLOB

one index on datetime and one index on runid. There's about 27,000,000 of rows.

If I try to retrive the first record by msgid:
SELECT * FROM (SELECT * FROM logger ORDER BY msgid) WHERE rownum < 2;

the query takes < 1 sec.

If I try to retrive the first record by datetime (the same with runid):

SELECT * FROM (SELECT * FROM logger ORDER BY datetime) WHERE rownum < 2;

the query takes a lot of minutes.

My questions are: what is wrong? there's something that I can ask to DB administrators in order to avoid this misbehaviour?

Thank you,

Luca Ciano
August 30, 2011 - 2:19 pm UTC

A reader, August 31, 2011 - 3:09 am UTC

```PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4166973464

------------------------------------------------------------------------------------------
| Id  | Operation               | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |        |     1 |  2045 |       |   842K  (1)| 06:13:34 |
|*  1 |  COUNT STOPKEY          |        |       |       |       |            |          |
|   2 |   VIEW                  |        |    27M|    51G|       |   842K  (1)| 06:13:34 |
|*  3 |    SORT ORDER BY STOPKEY|        |    27M|  2835M|  3278M|   842K  (1)| 06:13:34 |
|   4 |     TABLE ACCESS FULL   | LOGGER |    27M|  2835M|       |   241K  (1)| 01:47:06 |
------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)
3 - filter(ROWNUM<2)

17 rows selected.

```

August 31, 2011 - 1:48 pm UTC

I believe that MSGID is NOT NULL and DATETIME is NULLABLE.

I believe further that the indexes are single column indexes - only msgid in one and only datetime in the other.

I know for a fact that entirely null keys are not placed in b*tree indexes - therefore an index on DATETIME cannot be used - since if the table had 27,000,000 rows where datetime was null in all occurences (it could happen), the index would not be able to retrieve them - it would get the *wrong* answer.

Here is the full example demonstrating this and then offers three alternative solutions for the problem:

```ops\$tkyte%ORA11GR2> create table t ( datetime date, msgid number not null, data char(2000) );

Table created.

ops\$tkyte%ORA11GR2> create index t_datetime_idx on t(datetime);

Index created.

ops\$tkyte%ORA11GR2> create index t_msgid_idx on t(msgid);

Index created.

ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T', numrows=> 27000000, numblks => 2700000 );

PL/SQL procedure successfully completed.

ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2> set autotrace traceonly explain
ops\$tkyte%ORA11GR2> select * from (select * from t order by msgid) where rownum < 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1845708854

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |  2024 |     0   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |             |       |       |            |          |
|   2 |   VIEW                        |             |     1 |  2024 |     0   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T           |    27M|  2574M|     0   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | T_MSGID_IDX |     1 |       |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)

<b>Msgid naturally uses the index - because it is NOT NULL</b>

ops\$tkyte%ORA11GR2> select * from (select * from t order by datetime) where rownum < 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |  2024 |       |  1345K  (1)| 04:29:06 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      |    27M|    50G|       |  1345K  (1)| 04:29:06 |
|*  3 |    SORT ORDER BY STOPKEY|      |    27M|  2574M|  2929M|  1345K  (1)| 04:29:06 |
|   4 |     TABLE ACCESS FULL   | T    |    27M|  2574M|       |   732K  (1)| 02:26:28 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)
3 - filter(ROWNUM<2)

<b>datetime *cannot* use the index - because the NULL records would never be available - and the wrong answer could result</b>

<b>here is solution number 1 - if you do not care about the null records, tell us that:</b>

ops\$tkyte%ORA11GR2> select * from (select * from t WHERE DATETIME IS NOT NULL order by datetime) where rownum < 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3418418384

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |  2024 |     0   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                |       |       |            |          |
|   2 |   VIEW                        |                |     1 |  2024 |     0   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T              |     1 |   100 |     0   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN           | T_DATETIME_IDX |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)
4 - filter("DATETIME" IS NOT NULL)

<b>if you do care about the NULL records - add at least one not-null attribute to the index - the constant expression "0" would suffice - but it can be any not null attribute that exists in the table if you want:</b>

ops\$tkyte%ORA11GR2> drop index t_datetime_idx;

Index dropped.

ops\$tkyte%ORA11GR2> create index t_datetime_idx on t(datetime,0);

Index created.

ops\$tkyte%ORA11GR2> select * from (select * from t order by datetime) where rownum < 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3418418384

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |  2024 |     0   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                |       |       |            |          |
|   2 |   VIEW                        |                |     1 |  2024 |     0   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T              |    27M|  2574M|     0   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | T_DATETIME_IDX |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)

<b>or, if datetime is really NOT NULL - tell us that:</b>

ops\$tkyte%ORA11GR2> drop index t_datetime_idx;

Index dropped.

ops\$tkyte%ORA11GR2> create index t_datetime_idx on t(datetime);

Index created.

ops\$tkyte%ORA11GR2> alter table t modify datetime NOT NULL;

Table altered.

ops\$tkyte%ORA11GR2> select * from (select * from t order by datetime) where rownum < 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3418418384

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |  2024 |     0   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |                |       |       |            |          |
|   2 |   VIEW                        |                |     1 |  2024 |     0   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T              |    27M|  2574M|     0   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | T_DATETIME_IDX |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)

ops\$tkyte%ORA11GR2> set autotrace off
```

A reader, August 31, 2011 - 4:26 am UTC

The previous plan was for the query by datetime, this is for the query by msgid:

```PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4223408808

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |  2045 |     4   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |           |       |       |            |          |
|   2 |   VIEW                        |           |     1 |  2045 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| LOGGER    |    27M|  2835M|     4   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | LOGGER_PK |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(ROWNUM<2)

16 rows selected.

```

### Great solution

Luca Ciano, September 01, 2011 - 7:02 am UTC

Thak you very much Tom, the point was the presence of NULL values in the columns and your solutions work great.

Luca Ciano

### Top-N with Aggregation and Multi-Value Columns

A reader, February 24, 2012 - 7:56 pm UTC

CREATE TABLE Users (
user_id INTEGER NOT NULL,
first_name VARCHAR2(20) NOT NULL,
last_name VARCHAR2(20) NOT NULL
);

CREATE TABLE Severity (
severity_id INTEGER NOT NULL,
severity VARCHAR2(10)
);

CREATE TABLE service_tickets (
ticket_id INTEGER NOT NULL,
severity_id INTEGER NOT NULL,
user_id INTEGER NOT NULL
);

INSERT INTO Severity VALUES (1, 'High');
INSERT INTO Severity VALUES (2, 'Medium');
INSERT INTO Severity VALUES (3, 'Low');

INSERT INTO Users VALUES (1, 'John', 'Doe');
INSERT INTO Users VALUES (2, 'Mary', 'Tyler');

INSERT INTO service_tickets VALUES (1, 1, 1);
INSERT INTO service_tickets VALUES (2, 2, 1);
INSERT INTO service_tickets VALUES (3, 1, 2);
INSERT INTO service_tickets VALUES (4, 1, 2);
INSERT INTO service_tickets VALUES (5, 3, 2);

SELECT u.first_name, u.last_name, ticket_id, severity
FROM Users u, Severity s, service_tickets t
WHERE t.user_id = u.user_id AND
t.severity_id = s.severity_id
ORDER BY 1, 2, 3;

FIRST_NAME LAST_NAME TICKET_ID SEVERITY
-------------------- -------------------- ---------- ----------
John Doe 1 High
John Doe 2 Medium
Mary Tyler 3 High
Mary Tyler 4 High
Mary Tyler 5 Low

SELECT *
FROM (
SELECT u.first_name, u.last_name, COUNT(*) number_of_tickets
FROM Users u, service_tickets t
WHERE t.user_id = u.user_id
GROUP BY u.first_name, u.last_name
ORDER BY number_of_tickets DESC, u.first_name, u.last_name
)
WHERE rownum <= 10;

FIRST_NAME LAST_NAME NUMBER_OF_TICKETS
-------------------- -------------------- -----------------
Mary Tyler 3
John Doe 2

How do I write a SQL to produce the following result? It's an aggregation SQL since it sums up the number of tickets for each user. However, severity cannot be aggregated. How do I write the top-N SQL so that it will perform well when there are million of users and tickets?

FIRST_NAME LAST_NAME NUMBER_OF_TICKETS Severity
-------------------- -------------------- ----------------- ---------------------
Mary Tyler 3 High, Low
John Doe 2 High, Medium

February 25, 2012 - 11:56 pm UTC

to get the top ten ticket holders - is ambiguous.

what if 1,000 people have the "top number of tickets".

do you want

a) a random selection of ten people having the top number of tickets?
b) the set of people having the top ten number of tickets (which could be a lot of people - we'd find the counts of the top ten ticket counts and get everyone that has that)

You'll have to realize that this is a full scan question - so if there are millions - make sure you have the IO capabilities to read millions of records fast enough - and enough sort space to deal with any potential temp space requests.

The second form will likely be much more performant - but would require knowledge of the possible severity levels (you could query the severity table to get thelist of severities and dynamically construct that query if you think the levels will change over time)

```ops\$tkyte%ORA11GR2> select first_name, last_name, sum(cnt) cnt2, listagg( severity, ',' ) within group (order by severity) sevs
2    from (
3  SELECT u.user_id, u.first_name, u.last_name, count(ticket_id) cnt, severity
4  FROM   Users u, Severity s, service_tickets t
5  WHERE  t.user_id = u.user_id AND
6         t.severity_id = s.severity_id
7  group  by u.user_id, first_name, last_name, severity
8         )
9  group by user_id, first_name, last_name
10  order by cnt2 DESC, first_name, last_name
11  /

FIRST_NAME           LAST_NAME                  CNT2 SEVS
-------------------- -------------------- ---------- --------------------
Mary                 Tyler                         3 High,Low
John                 Doe                           2 High,Medium

ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2>
ops\$tkyte%ORA11GR2> SELECT u.user_id, u.first_name, u.last_name,
2         count(ticket_id) cnt,
3         max( decode( severity, 'High', 'High, ' ) ) ||
4         max( decode( severity, 'Medium', 'Medium, ' ) ) ||
5         max( decode( severity, 'Low', 'Low' ) ) sevs
6  FROM   Users u, Severity s, service_tickets t
7  WHERE  t.user_id = u.user_id AND
8         t.severity_id = s.severity_id
9  group  by u.user_id, first_name, last_name
10  order by cnt DESC, u.first_name, u.last_name
11  /

USER_ID FIRST_NAME           LAST_NAME                   CNT SEVS
---------- -------------------- -------------------- ---------- --------------------
2 Mary                 Tyler                         3 High, Low
1 John                 Doe                           2 High, Medium,

ops\$tkyte%ORA11GR2>
```

### Top N analysis

simon, September 23, 2012 - 1:14 pm UTC