Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Janardhan.

Asked: August 04, 2000 - 3:22 am UTC

Last updated: September 08, 2009 - 6:14 am UTC

Version: 8.1.5

Viewed 1000+ times

You Asked

Hi,

Here We are executing a query which returns top 2 salaries
dept wise from emp.

Query:select sal,deptno
from emp a
where 2>(select count(sal)
from emp
where deptno=a.deptno
and sal>a.sal)
order by deptno,sal;

But if there are more then 12,00,000 records in EMP table
it is taking lot of time to retrieve results..more then 10 mins.
suggest me solution.

select count(*) from emp is taking 2 mins to give result.

Thanks in advance..

Regards,

Janardhan.


and Tom said...

In Oracle8i, release 8.1 and up, you can use order by in a subquery. Therefore:

select sal, deptno
from ( select sal, deptno
from emp
order by SAL desc )
where rownum <= 2
order by deptno, sal
/

will work...

Now, that query is DIFFERENT from the one you have, here is an exmaple showing the different results you can expect and an alternate query that is equivalent to yours:

ops$tkyte@8i> create table emp ( sal int, deptno int );
Table created.

ops$tkyte@8i> insert into emp values ( 100, 10 );
ops$tkyte@8i> insert into emp values ( 100, 10 );
ops$tkyte@8i> insert into emp values ( 100, 10 );
ops$tkyte@8i> insert into emp values ( 200, 10 );
ops$tkyte@8i> insert into emp values ( 200, 10 );
ops$tkyte@8i> insert into emp values ( 200, 10 );
ops$tkyte@8i> insert into emp values ( 300, 10 );
1 row created.


ops$tkyte@8i> select sal,deptno
2 from emp a
3 where 2>(select count(sal)
4 from emp
5 where deptno=a.deptno
6 and sal>a.sal)
7 order by deptno,sal
8 /

SAL DEPTNO
---------- ----------
200 10
200 10
200 10
300 10

ops$tkyte@8i> select sal, deptno
2 from ( select sal, deptno
3 from emp
4 order by SAL desc )
5 where rownum <= 2
6 order by deptno, sal
7 /

SAL DEPTNO
---------- ----------
200 10
300 10

ops$tkyte@8i> select sal, deptno
2 from emp
3 where sal >= any ( select sal
4 from (select sal
5 from emp
6 order by sal desc)
7 where rownum <= 2 )
8 order by deptno, sal
9 /

SAL DEPTNO
---------- ----------
200 10
200 10
200 10
300 10


Rating

  (24 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, April 09, 2002 - 10:44 am UTC

what is the significance of any in the above query, what will happen if we remove it.

Tom Kyte
April 09, 2002 - 11:15 am UTC

If you remove it, the query fails in most cases:

ops$tkyte@ORA817DEV.US.ORACLE.COM> select sal, deptno
  2    from emp
  3   where sal >=        ( select sal
  4                        from (select sal
  5                                from emp
  6                               order by sal desc)
  7                       where rownum <= 2 )
  8   order by deptno, sal
  9  /
 where sal >=     ( select sal
                    *
ERROR at line 3:
ORA-01427: single-row subquery returns more than one ro


the any keyword does exactly what it sounds like it does -- 

where sal >= ANY ( subquery ) 

says when SAL is greater than or equal to ANY of the values in the subquery.

My subquery generated a set, I wanted all salaries greater then or equal to any of the values in that set. 

Performance - Top N Queries

Rich Sekunda, April 09, 2002 - 1:06 pm UTC

There always seems to be several approaches to writing an SQL statement but I've become a big fan of analytic funtions. The following works well too.

select *
FROM (select annual_sal_amt,
deptid,
dense_rank()
over (partition by deptid
order by annual_sal_amt desc) dr
from offer
where annual_sal_amt > 0 )
where dr <= 2
order by deptid,
annual_sal_amt desc

Tom Kyte
April 09, 2002 - 1:41 pm UTC

absolutely (this question was from 815 -- predated analytic functions, new with 816 and up)

why did you choose ANY instead of in

A reader, April 12, 2002 - 3:44 pm UTC

Look at the following...


1 select sal, deptno
2 from emp
3 where sal in ( select sal
4 from (select sal
5 from emp
6 order by sal desc)
7 where rownum <= 2 )
8* order by deptno, sal


SAL DEPTNO
--------- ---------
5000 10
3000 20
3000 20
Wrote file afiedt.buf

1 select sal, deptno
2 from emp
3 where sal >= any ( select sal
4 from (select sal
5 from emp
6 order by sal desc)
7 where rownum <= 2 )
8* order by deptno, sal


SAL DEPTNO
--------- ---------
5000 10
3000 20
3000 20

The result is the same when you used ANY or when I used IN .

Why did you choose ANY over IN.

Whose performance is better.


Tom Kyte
April 12, 2002 - 3:59 pm UTC

benchmark it to see which is faster.


I suppose at the moment I wrote it, >= ANY fit in with my thinking better then in did.

I generate the set of sals in descending order (should have added NULLS LAST to that in retrospect)...

I pick off the first two...

Now I want anyone that makes a salary >= any of those salaries....


I wrote the sql as I thought the problem out -- the sql is a literal translation of thw way I thought about getting the answer.


Besides, how many times do you see someone use ANY or SOME?



Interesting

A reader, April 12, 2002 - 4:03 pm UTC

as you have said

'Besides, how many times do you see someone use ANY or SOME? '

can you show when and how to use SOME, and

Why do you think developers never use ANY or SOME( these things are never even asked in any interviews)


Tom Kyte
April 12, 2002 - 8:40 pm UTC

They are pretty much the same as "in" or "=" in most general cases.

Can be useful for this -- find me all of the items such that the price is greater then ANY/SOME (they are synonymous) of the prices of the items in department x

select *
from items
where price > ANY ( select price from items where department = 'X' )

it probably more intuitive than:

select *
from items
where price >= ( select min(price) from items where department = 'X' )

The fact is, you can pretty much do all of the ANY/SOME processing with IN or = (combined with min/max). Hence it is not used very often.

along the same lines -- how many people know about

o having
o union all (vs union and what the difference is)
o minus
o intersect

and so on (analytic functions, NULLS first, NULLS last, update a join, and so on and so on and so on)...

I ask this question in interviews. You have a table:

create table t ( ....., month number, ..... );

Month is always a number between 1 and 12.

I ask three questions:

1) how many rows are in the table

2) how many rows BY MONTH are in the table (i want to know how many rows for month one, month two and so on)

3) what MONTH has the most rows (and for a special bonus, tell me why this question is ambigous)


You would be surprised how many people struggle with #1 -- let alone #2. I've had people YELL at me that #3 cannot possibly be done in a single query, no doubt about it. They get really dejected when I put up at least three different answers before asking if they would like more. I don't expect people to get #3 -- it's a tricky one, mostly want to see how they handle the pressure. It's #2 and #1 that gets me -- most people who rated themselves 8 or 9's (i have them rate their sql ability on a scale of 1..10 -- higher is better) *cannot do it*. I have them re-rate themselves after the three questions. It's interesting.

(my favorite answer to question #3 involves using a HAVING clause with ">= ALL" in it -- not any, not some, but ALL...)



IS, April 12, 2002 - 5:10 pm UTC

I think the question was whah are the top 2 salaries for each department. I think you answered what are the top two salaries in all the departments. Within the realms of SQL I am not sure if there is a better way than being used. If there is I would love to know.

Tom Kyte
April 12, 2002 - 8:44 pm UTC

Look at that -- no one pointed that out yet. You are correct, i read the question wrong. The answer to that is:


select *
from ( select sal, deptno, row_number() over ( partition by deptno
order by sal desc NULLS LAST ) rn
from emp
)
where rn <= 2;

thanks!

A reader, September 10, 2003 - 1:13 pm UTC

Version 8.1.7.4

Tom,

I have a table like this

T1
==
c1 int
c2 varchar2(10)
c3 varchar2(10)

this table has 100,000 records.

Now,user can enter any number from 1 to 6.
lets say the user enters 6.What i want to do is divide
ceil(100000/6)
that returns 16667.

That is like 6 sets of 16667 round about ( last set might have less records).

I want to write an update statement like this

for the 1st set of 16667, c1 should be 1
for the 2nd set of 1667, c1 should be 2
for the 3rd set of 1667, c1 should be 3
.....
..
for the 6th set... c1 should be 6

can this be done in sql without writing plsql.

Thanks.




Tom Kyte
September 10, 2003 - 8:10 pm UTC

ntile does that.



1* select ename, ntile(6) over (order by ename) from emp
scott@ORA920LAP> /

ENAME NTILE(6)OVER(ORDERBYENAME)
---------- --------------------------
ADAMS 1
ALLEN 1
BLAKE 1
CLARK 2
FORD 2
JAMES 2
JONES 3
KING 3
MARTIN 4
MILLER 4
SCOTT 5
SMITH 5
TURNER 6
WARD 6

14 rows selected.



You are the best..

A reader, September 10, 2003 - 11:03 pm UTC

Thanks Tom!!

Regarding your 3 interview questions

Asim, September 11, 2003 - 6:18 am UTC

Hi,

Regarding your 3 interview questions earlier in this
discussion.

Why people find the Q.2 difficult? isnt it a simple
GROUP BY query, and GROUP BY is very commonly used unlike
ANY or SOME. Isnt this simple query is the answer.

SELECT MONTH, COUNT(*) FROM T GROUP BY MONTH


And about question 1, i am really surpriced how can
one tell that how many rows are there in the table unless
he executes SELECT COUNT(*) FROM T. and even this query
doesnt have any uncommonly used clause.

Do you mean to say that even
these simple queries are uncommon
amongs the people.

OR

am I misunderstanding the whole thing??????



Tom Kyte
September 11, 2003 - 8:44 am UTC

yes, it is that simple.

yes, many people cannot do it. (really, trust me, starting asking it in interviews and see what happens yourself)

you are not misunderstanding at all.

it is as simple as it sounds, yet apparently "really hard"

A reader, September 11, 2003 - 10:23 am UTC

Tom,

I tried updating using ntile with a 15 records table t1,but all records of c1 were updated to 1.

SQL>update t1 a set c1 = ( select ntile(6) over (order by c1 ) from t1 where t1.rowid=a.rowid )

15 rows updated.

SQL>select c1 from t1;

c1
--
1
1
1
1
1
..
..
15 rows selected.

 

Tom Kyte
September 11, 2003 - 6:09 pm UTC

well, ntile is being evaluated how there?


for a single row at a time.  not for the entire set.

you didn't ask "how do i update a result set like this".  ntile works nicely on retrieval -- but not for update.


ops$tkyte@ORA920> create table emp as select ename, 0 xxx from scott.emp;
 
Table created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> update emp
  2    set xxx = trunc( rownum / (select count(*)/6 from emp ) - 0.0001 )+1
  3  /
 
14 rows updated.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from emp;
 
ENAME             XXX
---------- ----------
SMITH               1
ALLEN               1
WARD                2
JONES               2
MARTIN              3
BLAKE               3
CLARK               3
SCOTT               4
KING                4
TURNER              5
ADAMS               5
JAMES               6
FORD                6
MILLER              6
 
14 rows selected.
 
would be one approach -- but best to just retrieve this value using ntile 

I would like your opinion about this query please

A reader, September 11, 2003 - 4:43 pm UTC

SELECT
employees.USID,
review_status,
jobs.ID,
jobs.STATUS,
JOB_JAVA_PKG.GET_CHEMIST(jobs.ID) ASSIGNED_TO,
jobs.TEST ,
JOB_JAVA_PKG.get_review_status(jobs.ID) REVIEW_STATUS,
jobs.CREATED_BY ,
jobs.REASON_FOR_CHANGE,
DECODE(STATUS,'CANCELLED',jobs.MODIFIED_BY,NULL) MODIFIED_BY
FROM JOBS, job_chemists, employees, enotebook_reviews
where jobs.id = job_fk_id
and job_chemists.RECORD_STATUS='CURRENT'
and enotebook_reviews.RECORD_STATUS='CURRENT'
and job_chemists.employee_fk_id = employee_id
and enotebook_reviews.RECORD_STATUS= jobs.record_status


Tom Kyte
September 11, 2003 - 7:49 pm UTC

i don't like the indentation, i code like this:

select c1,
c2,
c3
from t1, t2, t3
where x = y
and z = a
/
^
^
|
i like the nice "alley"

other then that, looks dandy...

as long as it answers your questions? couple of cartesian products in there which is suspicious but legitimate.

oh wait, i see "java" in there -- hmmm, probably don't need that, just use SQL.

remove nulls from tile?

Rob, February 09, 2004 - 4:55 pm UTC

Is there a way to ntile without the nulls or place nulls in tile 0?

IE
select cust_no,
sum(case when to_char(ord_date,'YYYY')='1999' then net else 0 end) sales_1999,
sum(case when to_char(ord_date,'YYYY')='2000' then net else 0 end) sales_2000,
ntile(10) over(order by sum(case when to_char(ord_date,'YYYY')='1999' then net else 0 end) desc) decile_1999,
ntile(10) over(order by sum(case when to_char(ord_date,'YYYY')='2000' then net else 0 end) desc) decile_2000
from sales_data
group by cust_no

the problem is that the decile_2000 deciles all customers, we would like to exclude customers with no purchases from the decile. I tried caseing the net>0 but then I'm told I need to group by net (ora-00979) and thats not correct.


Tom Kyte
February 09, 2004 - 8:29 pm UTC

Pretty sure this does it -- note, used an inline view to make "more readable" -- you can undo that but I would keep my use of trunc instead of to_char, consumes less cpu on big result sets.


ops$tkyte@ORA9IR2> create table t ( cust_no number, ord_date date, net number );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t
  2  select trunc(rownum/2),
  3         add_months( to_date('01-jun-1999'), mod(rownum,2)*12 ),
  4             user_id+1
  5    from all_users
  6  /
 
48 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select cust_no,
  2         sales_1999,
  3         decode(sales_1999,null,to_number(null),
  4             ntile(10) over (partition by decode(sales_1999,null,null,'x')
  5                             order by sales_1999)) decile_1999,
  6         sales_2000,
  7         decode(sales_2000,null,to_number(null),
  8             ntile(10) over (partition by decode(sales_2000,null,null,'x')
  9                             order by sales_2000)) decile_2000
 10    from (
 11  select cust_no,
 12         sum(decode(trunc(ord_date,'y'),to_date('01-jan-1999'),net)) sales_1999,
 13         sum(decode(trunc(ord_date,'y'),to_date('01-jan-2000'),net)) sales_2000
 14    from t
 15   group by cust_no
 16         )
 17   order by sales_1999 NULLS LAST
 18  /

   CUST_NO SALES_1999 DECILE_1999 SALES_2000 DECILE_2000
---------- ---------- ----------- ---------- -----------
         1          6           1         12           1
         2         20           1         22           1
         3         31           1         32           2
         4         33           2         34           2
         5         36           2         37           2
         6         40           2         41           3
         8         43           3         47           3
         7         45           3         46           3
         9         49           3         50           4
        10         51           4         52           4
        11         54           4         55           4
        12         56           4         57           5
        13         58           5         59           5
        14         60           5         61           6
        16         65           6         82           8
        18         71           6         72           7
        19         73           7         74           7
        20         75           7         83           8
        21         84           8         85           9
        22         86           8         87           9
        23         88           9        100          10
        24         93           9
        15         96          10         64           6
        17        102          10        103          10
         0                                 1           1
 
25 rows selected.
ops$tkyte@ORA9IR2> select cust_no,
  2         sales_1999,
  3         decode(sales_1999,null,to_number(null),
  4             ntile(10) over (partition by decode(sales_1999,null,null,'x')
  5                             order by sales_1999)) decile_1999,
  6         sales_2000,
  7         decode(sales_2000,null,to_number(null),
  8             ntile(10) over (partition by decode(sales_2000,null,null,'x')
  9                             order by sales_2000)) decile_2000
 10    from (
 11  select cust_no,
 12         sum(decode(trunc(ord_date,'y'),to_date('01-jan-1999'),net)) sales_1999,
 13         sum(decode(trunc(ord_date,'y'),to_date('01-jan-2000'),net)) sales_2000
 14    from t
 15   group by cust_no
 16         )
 17   order by sales_2000 NULLS LAST
 18  /
 
   CUST_NO SALES_1999 DECILE_1999 SALES_2000 DECILE_2000
---------- ---------- ----------- ---------- -----------
         0                                 1           1
         1          6           1         12           1
         2         20           1         22           1
         3         31           1         32           2
         4         33           2         34           2
         5         36           2         37           2
         6         40           2         41           3
         7         45           3         46           3
         8         43           3         47           3
         9         49           3         50           4
        10         51           4         52           4
        11         54           4         55           4
        12         56           4         57           5
        13         58           5         59           5
        14         60           5         61           6
        15         96          10         64           6
        18         71           6         72           7
        19         73           7         74           7
        16         65           6         82           8
        20         75           7         83           8
        21         84           8         85           9
        22         86           8         87           9
        23         88           9        100          10
        17        102          10        103          10
        24         93           9
 
25 rows selected.

 

Perfect

Rob, February 10, 2004 - 10:30 am UTC

Thats exactly what I was looking for. I'm surpised that a trunc and to_date comparison is faster than a to_char comparison, but I'll take your word on it.

Just for refernece here's our decile, sales-data cube.

select
cust_no,
store_count,
sales_ltd,
min_sale,
max_sale,
first_purchase,
last_purchase,
sales_1999,
sales_2000,
sales_2001,
sales_2002,
sales_2003,
sales_2004,
decode(sales_1999,null,to_number(null),
ntile(10) over (partition by decode(sales_1999,null,null,'x')
order by sales_1999 desc nulls last)) decile_1999,
decode(sales_2000,null,to_number(null),
ntile(10) over (partition by decode(sales_2000,null,null,'x')
order by sales_2000 desc nulls last)) decile_2000,
decode(sales_2001,null,to_number(null),
ntile(10) over (partition by decode(sales_2001,null,null,'x')
order by sales_2001 desc nulls last)) decile_2001,
decode(sales_2002,null,to_number(null),
ntile(10) over (partition by decode(sales_2002,null,null,'x')
order by sales_2002 desc nulls last)) decile_2002,
decode(sales_2003,null,to_number(null),
ntile(10) over (partition by decode(sales_2003,null,null,'x')
order by sales_2003 desc nulls last)) decile_2003,
decode(decode(nvl(sales_2000,0)+nvl(sales_2001,0),0,null,nvl(sales_2000,0)+nvl(sales_2001,0)),null,to_number(null),
ntile(10) over (partition by decode(decode(nvl(sales_2000,0)+nvl(sales_2001,0),0,null,nvl(sales_2000,0)+nvl(sales_2001,0) ),null,null,'x')
order by decode(nvl(sales_2000,0)+nvl(sales_2001,0),0,null,nvl(sales_2000,0)+nvl(sales_2001,0) ) desc nulls last)) decile_00_01,
decode(decode(nvl(sales_2002,0)+nvl(sales_2003,0),0,null,nvl(sales_2002,0)+nvl(sales_2003,0)),null,to_number(null),
ntile(10) over (partition by decode(decode(nvl(sales_2002,0)+nvl(sales_2002,0),0,null,nvl(sales_2002,0)+nvl(sales_2003,0)),null,null,'x')
order by decode(nvl(sales_2002,0)+nvl(sales_2003,0),0,null,nvl(sales_2002,0)+nvl(sales_2003,0)) desc nulls last)) decile_02_03
from (select cust_no,
count(distinct br_no) store_count,
sum(net) sales_ltd,
min(net) min_sale,
max(net) max_sale,
min(ord_date) first_purchase,
max(ord_date) last_purchase,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-1999'),net)) sales_1999,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-2000'),net)) sales_2000,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-2001'),net)) sales_2001,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-2002'),net)) sales_2002,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-2003'),net)) sales_2003,
sum(decode(trunc(ord_date,'y'),to_date('01-jan-2004'),net)) sales_2004
from
sales_data
group by cust_no )

Tom Kyte
February 10, 2004 - 3:31 pm UTC

to_char = lots of NLS stuff, conversion from 7 byte binary to string.
trunc = set some bytes to "0"


ntile on value, not row?

Christo Kutrovsky, February 17, 2004 - 8:17 pm UTC

Hello Tom,

I would like to divide a number of rows into N buckets, but not equal number of rows, i would like a equal number based on summing the "value" column.

Example with 4 buckets:

Name Val bucket
------ ----- ------
Ele1 100 1 (1 -> 100)
Ele2 20 2 (2 -> 20)
Ele3 30 3 (3 -> 30)
Ele4 30 4 (4 -> 30)
Ele5 30 2 (2 -> 50)
Ele6 80 3 (3 -> 110)
Ele7 15 4 (4 -> 45)
Ele8 50 2 (2 -> 100)
Ele9 70 4 (4 -> 115)
Ele10 20 1 (1 -> 120)
...

You get the idea. The idea is at the end to have a equal "sum" of value for each bucket, or at least as close as possible.

Is it possible to do that with a single query ?


Tom Kyte
February 17, 2004 - 8:49 pm UTC

absolutely, it is the technique I use to "bin" extents to divide a table up. ntile doesn't work but a "group" column based on running_total/(total/#bins) will.


see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10498431232211 <code>


Now, you seem to have some "algorithm" going there as the buckets are 1,2,3,4,2,3,4,2,4,1 -- and so on. Mine algorithm will just take them as they come -- getting "as close as it can" to the "perfect value", but it won't "optimize" it in any fashion.

Follow up

Christo Kutrovsky, February 17, 2004 - 10:30 pm UTC

Interesting approach,

However there are a few problems. For example, if the size of a single item is over the "optimal" size per bucket, then a bucket is missing.

The algoritm above is not accurate as I typed the values manualy.

The algorithm should be:
"put the next value into the bucket with the smallest sum" and values are ordered by value, descending.

That way, when one (or more) item is bigger then the optimal value, it will simply be alone in a bucket, and all the remaining items will be equally distributed among the rest of the buckets.

I did try the ordering with your approach, and here's the result:

select grp, sof, opt_chunk, name, sum(size_mb)
from (
select
trunc( (sum(bytes) over (order by bytes desc)-0.01) /
(sum(bytes) over ()/5) ) grp,
round(sum(bytes) over (order by bytes desc)/1024/1024) as sof,
round(sum(bytes) over ()/5/1024/1024) as opt_chunk,
segment_name name, round(bytes/1024/1024) as size_mb
from user_segments u
where segment_type = 'TABLE' and bytes > 1000000000
) t
group by grouping sets ((sof, opt_chunk, name,grp),grp)

Here's my result:

GRP SOF OPT_CHUNK NAME SUM(SIZE_MB)
------- ------- --------------- ------- -----------------
1 15489 8634 T1 15489
1 15489
3 29114 8634 T2 13625
3 31162 8634 T3 2048
3 32698 8634 T4 1536
3 34120 8634 T5 1422
3 18631
4 35530 8634 T6 1410
4 36814 8634 T7 1284
4 37966 8634 T8 1152
4 39060 8634 T9 1094
4 40110 8634 T10 1050
4 41130 8634 T11 1020
4 42150 8634 T12 1020
4 43170 8634 T13 1020
4 9050

P.S.
How do I preserve formatting in follow ups? I tried pressing the Help in top right corner, no info.


Tom Kyte
February 18, 2004 - 7:44 am UTC

to preserve formatting, just use "regular characters and spaces" (eg: no tabs)


the binning problem is an optimization problem really -- it is a procedural algorithm.

VKOUL, February 20, 2004 - 10:53 am UTC

Excerpt :
"Tom,

I tried updating using ntile with a 15 records table t1,but all records of c1 
were updated to 1.

SQL>update t1 a set c1 = ( select ntile(6) over (order by c1 ) from t1 where 
t1.rowid=a.rowid )

15 rows updated.

SQL>select c1 from t1;

c1
--
1
1
1
1
1
..
..
15 rows selected."

I have tried to do it in following way, but the query takes long time of the order of 15 to 20 minutes on 24K records.

SQL> drop table abc;

Table dropped.

SQL> 
SQL> ed
Wrote file afiedt.buf

  1  CREATE TABLE abc (object_name, bucket)
  2  AS
  3  SELECT object_name, ntile(2000) OVER (ORDER BY object_name)
  4* FROM   all_objects
SQL> 
SQL> /

Table created.

SQL> select count(*) from abc;
Press Enter ...

  COUNT(*)
----------
     24578

SQL> alter table abc add new_bucket number;

Table altered.

SQL> desc abc
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OBJECT_NAME                               NOT NULL VARCHAR2(30)
 BUCKET                                             NUMBER
 NEW_BUCKET                                         NUMBER

SQL> 
SQL> select count(object_name), count(bucket), count(new_bucket)
  2  from   abc;
Press Enter ...

COUNT(OBJECT_NAME) COUNT(BUCKET) COUNT(NEW_BUCKET)
------------------ ------------- -----------------
             24578         24578                 0

SQL> 
SQL> UPDATE abc t1
  2  SET    new_bucket = (SELECT t2.new_ntile
  3                       FROM  (SELECT rowid row_id, ntile(2000) OVER (ORDER BY object_name) new_ntile
  4                              FROM   abc
  5                             ) t2
  6                       WHERE  t1.rowid = t2.row_id
  7                      )
  8  /

24578 rows updated. (*** Very Slow ***)

SQL> commit;

Commit complete.

SQL> 
SQL> 
SQL> select count(*) from abc where bucket <> new_bucket;
Press Enter ...

  COUNT(*)
----------
         0

SQL> select count(object_name), count(bucket), count(new_bucket)
  2  from   abc;
Press Enter ...

COUNT(OBJECT_NAME) COUNT(BUCKET) COUNT(NEW_BUCKET)
------------------ ------------- -----------------
             24578         24578             24578

SQL> select count(*) from abc;
Press Enter ...

  COUNT(*)
----------
     24578

SQL> 
SQL> alter table abc add count_bucket number;

Table altered.

SQL> ed
Wrote file afiedt.buf

  1  update abc
  2* set    count_bucket = trunc( rownum / (select count(*)/2000 from abc ) - 0.0001 ) + 1
SQL> 
SQL> 
SQL> /

24578 rows updated.

SQL> commit;

Commit complete.

SQL> select count(object_name), count(bucket), count(new_bucket), count(count_bucket)
  2  from   abc;
Press Enter ...

COUNT(OBJECT_NAME) COUNT(BUCKET) COUNT(NEW_BUCKET) COUNT(COUNT_BUCKET)
------------------ ------------- ----------------- -------------------
             24578         24578             24578               24578

SQL> 
SQL> select count(*)
  2  from   abc
  3  where  bucket <> count_bucket;
Press Enter ...

  COUNT(*)
----------
     24264

SQL> 

Why it is so slow ???
 

Tom Kyte
February 20, 2004 - 1:04 pm UTC

it is most likely executing the correlated subquery once per row.  so, it is building the NTILE result 24,578 times -- scanning it to find the right rowid and updating a single row at a time.


In 9i, we can do this via:

ops$tkyte@ORA920PC> merge into abc t1
  2  using ( select rowid rid, ntile(2000) over (order by object_name) new_ntile
  3            from abc ) t2
  4  on (t1.rowid = t2.rowid)
  5  when matched then update set new_bucket = new_ntile
  6  when not matched then insert (object_name) values(null);
 
30218 rows merged.
 
Elapsed: 00:00:03.10
ops$tkyte@ORA920PC> select * from abc where bucket <> new_bucket;
 
no rows selected
 

Ntile

VKOUL, February 20, 2004 - 2:54 pm UTC

Thanks, a Lot.

Very helpful

Mystifying

Raj, March 16, 2004 - 6:08 am UTC

Dear Sir,
The query
"sql> select sal,deptno
from emp a
where 2>(select count(sal)
from emp
where deptno=a.deptno
and sal>a.sal)
order by deptno,sal;"
is not obvious to me.Could you please demystify it?
How does it procede?
Bye!




Tom Kyte
March 16, 2004 - 8:07 am UTC

think of it like this:


select sal, deptno
from emp
where some_function(deptno,sal) < 2
order by deptno, sal


that would make sense right? if you had a function some_function that when given a particular deptno, sal would return the count of records in that deptno that had a sal greater than the input.


That correlated subquery:

(select count(sal)
from emp
where deptno=a.deptno
and sal>a.sal)

is that function. It is as if that query is run for each row.

your 3rd question

Mathan, March 16, 2004 - 10:44 am UTC

Tom,

will this sql answer ur 3rd interview question?

select * from
(select month, count(1) from tab group by month order by 2 desc)
where rownum =1

pls provide ur answer also (using a having & all)

advance thanks

Tom Kyte
March 16, 2004 - 1:34 pm UTC

Nope. that gets the wrong answer

ask yourself "what happens when the table has two rows - one with 1 and one with 2"

the problem is the answer is a SET, not a single ROW.

Your 3rd question

Nic, March 16, 2004 - 5:42 pm UTC

Is this the answer to your 3rd question?

select month, count(*)
from t
group by month
having count(*) >=
(select max(cnt) from
(select month, count(*) cnt
from t
group by month))

It seems that there may be a better solution than this. It queried the table twice doing the same group by.

Tom Kyte
March 16, 2004 - 6:23 pm UTC

that is one way

select month
from t
group by month
having count(*) >= ALL( select count(*) from t group by month)

is another, there are many many variations on this. analytics open a whole new realm of solutions.



ORA-01427: single-row subquery returns more than one row

Serge, April 10, 2007 - 11:41 am UTC

Hi Tom,
I didn't want to open a new thread specific to ORA-01427 as I found few spots in existing threads where you have an approach on this.
However, when it comes to IN clause I can't find anything to address something like:

select distinct financial_product
from DDS_FINANCIAL_PROD
where source_system = 'SAVINGS'
and FINANCIAL_PROD_SEC_CLASS_CD = 'UNSECURED'
and SRC_SYS_PROD_CD in case when (select count(*)
from dds_financial_account
where source_system = 'SAVINGS'
and valid_date < '28-FEB-2007') > 0
then (select PRODUCT
from dds_financial_account
where source_system = 'SAVINGS')
else 'UNKNOWN'
end;

Basically the ideea is to check if field SRC_SYS_PROD_CD can be mapped to one or more PRODUCTs in the other table. This other table (dds_financial_account) might be populated with what I need or might not. So I need that 'CASE WHEN ... ELSE' clause.
The problem is when is populated with more than one PRODUCT, the query returns ORA-01427.
I don't see a way to reengineer this query in another way, yet I don't want to use procedural code if possible.
Thank you.

Tom Kyte
April 10, 2007 - 3:53 pm UTC

you don't use case for something like that...

ops$tkyte%ORA10GR2> create table t1 ( financial_product int, source_system varchar2(10), FINANCIAL_PROD_SEC_CLASS_CD varchar2(10), SRC_SYS_PROD_CD varchar2(10) );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( valid_date date, source_system varchar2(10), product varchar2(10) );

Table created.

ops$tkyte%ORA10GR2> insert into t1 values ( 100, 'SAVINGS', 'UNSECURED', 'UNKNOWN' );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select distinct financial_product
  2    from t1
  3   where source_system = 'SAVINGS'
  4     and FINANCIAL_PROD_SEC_CLASS_CD = 'UNSECURED'
  5     and SRC_SYS_PROD_CD in (select product from t2 where source_system = 'SAVINGS'
  6                             union all
  7                             select 'UNKNOWN'
  8                               from dual
  9                              where not exists ( select null
 10                                                   from t2
 11                                                  where source_system = 'SAVINGS'
 12                                                    and valid_date < '28-FEB-2007')
 13                                                    )
 14  /

FINANCIAL_PRODUCT
-----------------
              100



Regarding interview questions..

Aravind, May 16, 2007 - 1:29 am UTC

Hi Tom,

I didn't get why the 3rd question is ambiguos..

#1. select count(*) from month
#2. select month,count(*) from month group by month
#3. select month from month having count(month)=(select max(count(*)) from month group by month) group by month
Tom Kyte
May 16, 2007 - 10:25 am UTC

does #3 return A MONTH?

a SINGLE month?

no - it doesn't it returns something between 0 and 12 records - a set, it may well be that the most frequently occurring month is all of them.

Regarding the interview questions

Aravind, May 23, 2007 - 8:04 am UTC

Hi Tom,

I didn't exactly get what you mean. Can you please elaborate?

SQL> select month,count(*) from month group by month;

MONTH COUNT(*)
---------- ----------
1 3
2 6
3 3
4 2
5 9
6 4
7 2
8 1
9 1
10 3
11 2

MONTH COUNT(*)
---------- ----------
12 4

12 rows selected.

SQL> select month from month having count(month)=(select max(count(*)) from month group by month) group by month ;

MONTH
----------
5


SQL> select month,count(*) from month group by month;

MONTH COUNT(*)
---------- ----------
1 3
2 9
3 3
4 2
5 9
6 4
7 2
8 1
9 1
10 3
11 2

MONTH COUNT(*)
---------- ----------
12 4

12 rows selected.


SQL> select month from month having count(month)=(select max(count(*)) from month group by month) group by month;

MONTH
----------
2
5




Regarding the interview Questions

Aravind, May 23, 2007 - 8:07 am UTC

Hi Tom, I didnt get exactly what you meant. Can you please elaborate?

SQL> select month,count(*) from month group by month;

MONTH COUNT(*)
---------- ----------
1 3
2 6
3 3
4 2
5 9
6 4
7 2
8 1
9 1
10 3
11 2

MONTH COUNT(*)
---------- ----------
12 4

12 rows selected.

SQL> select month from month having count(month)=(select max(count(*)) from month group by month) group by month ;

MONTH
----------
5


SQL> select month,count(*) from month group by month;

MONTH COUNT(*)
---------- ----------
1 3
2 9
3 3
4 2
5 9
6 4
7 2
8 1
9 1
10 3
11 2

MONTH COUNT(*)
---------- ----------
12 4

12 rows selected.


SQL> select month from month having count(month)=(select max(count(*)) from month group by month) group by month;

MONTH
----------
2
5

Tom Kyte
May 23, 2007 - 8:52 am UTC

the question i ask is "what is the most frequently occurring month"

the answer must be

"there might not be a SINGLE most frequently occurring month, you must change the question, the question needs to be 'what is the SET of most frequently occurring months' - you must be ready and prepared to have this question result in MORE than one row in the result set, you seem to be assuming a single row"

It is in the same genre as

"show me the person that makes the most money"
"the sales rep with the most sales"

and so on - they are all SETS, not singletons.

2nd example shows multiple rows in return set

Ric, September 08, 2009 - 5:22 am UTC

Tom,

Your blog post about interview questions brought me here. In Aravind's second example, he shows his query WILL deal with/return multiple rows in the return set:

SQL> select month from month having count(month)=(select max(count(*)) from month group by month) group by month;

  MONTH
----------
      2
      5 

Or are you looking for the person just to simply be aware that the question is deliberately ambiguous and the answer is less important??

Tom Kyte
September 08, 2009 - 6:14 am UTC

... Or are you looking for the person just to simply be aware that the question is
deliberately ambiguous ....

I asked two questions for that third part.

I asked for the most frequently occurring month and I asked to be told why that question was poorly worded.


there isn't "A SINGLE" most frequently occurring month in general, there are potentially many (or even all) months