Skip to Main Content

Breadcrumb

Dev Live Dev Intro

We are celebrating Developers at AskTOM. We welcome Developers of all levels of experience to join us at our FREE Developer Live events coming in August and September. Just click on the left to register today! If you are brand new to Database Technology, then we also have got you covered. Just click on the right for your comprehensive FREE training program to kick start your Oracle Database Development journey!

Question and Answer

Tom Kyte

Thanks for the question, wallacel.

Asked: October 24, 2000 - 11:41 am UTC

Answered by: Tom Kyte - Last updated: July 18, 2018 - 4:46 am UTC

Category: Database - Version: 8.1.6

Viewed 100K+ times! This question is

You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa.

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table -- typically.


As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:


for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop

It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).


So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
( select y from T2 )

is "huge" and takes a long time. But the table T1 is relatively small and executing ( select null from t2 where y = x.x ) is very very fast (nice index on t2(y)). Then the exists will be faster as the time to full scan T1 and do the index probe into T2 could be less then the time to simply full scan T2 to build the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more appropriate.


If both the subquery and the outer table are huge -- either might work as well as the other -- depends on the indexes and other factors.

and you rated our response

  (183 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Thanks for the question regarding "IN & EXISTS", version 8.1.6

March 31, 2001 - 7:29 pm UTC

Reviewer: Seetharaman Srinivasan from Chennai , TamilNadu, India

Excellent

Thanks for the question regarding "IN & EXISTS", version 8.1.6

April 11, 2001 - 3:00 pm UTC

Reviewer: Mani Srinivasan from CA USA

This is good stuff....

May 18, 2001 - 2:13 am UTC

Reviewer: Helena Markova from Bratislava, Slovakia


Could not Call Report from Form (Menu)

May 28, 2001 - 1:45 am UTC

Reviewer: Mohd. Manik Miah from Dhaka, Bangladesh


Dear Sir

I learned oracle a few months ago.
But I could not call report from Form (Menu).
Would you please help me in this regard?

Thank & best regards
Manik Miah
manikmiah@yahoo.com

Instead of T, why cant you use EMP AND DEPT TABLES

September 28, 2001 - 4:31 pm UTC

Reviewer: A reader

Tom

Can you illustrate your answer by using the emp dept table.
and by
1.increasing and decreasing the number of rows in both the tables.
2.by dropping and adding indexes on both the tables

That way your explanation will be vindicated.

Can you do that please.

Tom Kyte

Followup  

September 29, 2001 - 2:43 pm UTC

Well -- I'm not going to use EMP and DEPT as I would have to generate tons of EMP and DEPT data to illustrate my point (feel free to do that if you want to ;)

I'll use BIG and SMALL to make the point. I ran:

rem create table big as select * from all_objects;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem create index big_idx on big(object_id);
rem
rem
rem create table small as select * from all_objects where rownum < 100;
rem create index small_idx on small(object_id);
rem
rem analyze table big compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
rem analyze table small compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /

so, small has 99 rows, big has 133,000+

select count(subobject_name)
from big
where object_id in ( select object_id from small )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.02 0 993 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.03 0 993 0 1

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
792 MERGE JOIN
100 SORT (JOIN)
100 VIEW OF 'VW_NSO_1'
99 SORT (UNIQUE)
792 INDEX GOAL: ANALYZED (FULL SCAN) OF 'SMALL_IDX'
(NON-UNIQUE)
891 SORT (JOIN)
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'BIG'


versus:

select count(subobject_name)
from big
where exists ( select null from small where small.object_id = big.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 4.12 4.12 0 135356 15 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 4.12 4.12 0 135356 15 1

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
792 FILTER
135297 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'BIG'
133504 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
(NON-UNIQUE)

That shows if the outer query is "big" and the inner query is "small", in is generally more efficient then NOT EXISTS.

Now:

select count(subobject_name)
from small
where object_id in ( select object_id from big )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.51 0.82 50 298 22 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5 0.52 0.83 50 298 22 1



Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
99 MERGE JOIN
16913 SORT (JOIN)
16912 VIEW OF 'VW_NSO_1'
16912 SORT (UNIQUE)
135296 INDEX GOAL: ANALYZED (FAST FULL SCAN) OF 'BIG_IDX'
(NON-UNIQUE)
99 SORT (JOIN)
99 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'SMALL'


versus:
select count(subobject_name)
from small
where exists ( select null from big where small.object_id = big.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.01 0 204 12 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.01 0 204 12 1

EGATE)
99 FILTER
100 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'SMALL'
99 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

shows that is the outer query is "small" and the inner query is "big" -- a WHERE EXISTS can be quite efficient.




What difference does it make

September 30, 2001 - 12:12 am UTC

Reviewer: A reader

when you remove those indexes on the small and big tables.



Tom Kyte

Followup  

September 30, 2001 - 8:57 am UTC

Run it and see. All of the necessary code is there -- it is very easy to do these sorts of tests.

The outcome is easy to hypothesize however.

The "select * from big where object_id in ( select object_id from small )" will sort BIG once and SMALL once and join them (sort merge join) in all likelyhood.

The "select * from big where exists ( select null from small where small.object_id = big.object_id )" will tend to FULL SCAN big once and FOR EVERY ROW in big -- FULL SCAN small.

(after running it -- i verified it and the "rule of thumb" holds true. BIG outer query and SMALL inner query = IN. SMALL outer query and BIG inner query = WHERE EXISTS. Remember -- thats is a RULE OF THUMB and rules of thumb always have infinitely many exceptions to the rule.


Tom, this is very clever and not fair

September 30, 2001 - 9:33 am UTC

Reviewer: TJ from USA

You have explained the whole of the in and exists concepts so well( no one has ever done in like this, you must be one happy person, you believe in sharing)

and now you leave us with an open ended answer..

"Remember -- thats is a RULE OF THUMB and rules of thumb always have infinitely many exceptions to the rule."

Kindly discuss what can be some of those possible exceptions, tell us what factors can cause those exceptions.




Tom Kyte

Followup  

September 30, 2001 - 10:02 am UTC

Well, there are infinitely many -- IO can affect the outcome of this. The goal of the query can effect this (eg: the WHERE EXISTS will find the first row faster in general then the IN will -- the IN will get the LAST row (all rows) faster then the where exists). If your goal is the FIRST row -- exists might totally blow away IN. If you are a batch process (and hence getting to the LAST row is vital) ....

and so on. The point is: be aware of their differences, try them both when tuning, understand conceptually what they do and you'll be able to use them to maximum effect.



does this explanation hold good even for NOT IN and NOT EXISTS

September 30, 2001 - 7:18 pm UTC

Reviewer: Chenna


Tom Kyte

Followup  

October 01, 2001 - 6:35 am UTC

No, partially because NOT IN and NOT EXISTS are not substitutes for eachother! They mean semantically different things and can/will return different answers. See
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:442029737684 <code>

for example

Summary

October 01, 2001 - 10:38 am UTC

Reviewer: Chenna

This is discussion of the top most class --

O.k. let me try to sum up what you are saying --

in and exists can be substitutes for each other, and we have to use one in the place of other depending upon the rowns returned by the outer and inner query,

where as

'not in' and 'not exists' are not substitutes for each other , and they are not related in any way. There use depends on the specific need.

o.k. to sum it up finally , can you give us examples of when the use of 'not in' is appropriate and when 'not exists' is appropriate.

Tom Kyte

Followup  

October 01, 2001 - 3:05 pm UTC

you got the first part right (in/exists)

the second part is too strong. not in and not exists are not perfect substitutes. NOT IN is different then NOT exists but NOT EXISTS and NOT IN are the same when the subquery you use in the NOT IN does not contain NULLS.

Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details on that). With NULLS -- a not in can be very in-efficient and many people substitute a not exists for it (not realzing the ANSWER just changed!!!)



anti join

October 01, 2001 - 3:32 pm UTC

Reviewer: A reader

(and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details on that)

Please give the link to the concerned page in the manual, there is no search feature in the document to search for 'anti join'

Tom Kyte

Followup  

October 01, 2001 - 8:30 pm UTC

</code> http://docs.oracle.com/docs/cd/A81042_01/DOC/server.816/a76992/optimops.htm#36687 <code>


the mystery with non in and not exists

October 01, 2001 - 9:07 pm UTC

Reviewer: A reader

"Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details on that). With NULLS -- a not in can be very in-efficient and many people substitute a not exists for it (not realzing the ANSWER just changed!!!) "

1.Not in and not exists return the same performance result when there arer no nulls.
2.Not in is especially very efficient when used with cbo and no nulls.
3.And when there are nulls what is preferred is not exists, and not not in.

I think Iam right in what I understood above.

Now

1.Why is not in in efficient when there are nulls. Can you demonstrate this.
2.And how is not exists better over not in when there are nulls.
3.What do you mean by saying
"and many people substitute a not exists for it (not realzing the ANSWER just changed!!!) "
do you mean not exists should be substituted for not in when there are nulls ???????



Tom Kyte

Followup  

October 02, 2001 - 7:46 pm UTC

1a) not in can be faster then not exists (or slower).  The are different, do different things.  A not exists is alot like an EXISTS.  A not in when the subquery cannot contain NULLS can be alot like an IN (processed as an ANTI-JOIN instead of a JOIN).  So, just like IN is sometimes better then EXISTS and vice versa, NOT IT is sometimes (and sometimes not) better then NOT EXISTS

2a) thats when it works best, yes

3a) typically...

1b) Ok, time for the demo of NOT IN and NOT EXISTS (i guess I knew it was coming eventually!)

Here we go:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table big as select * from all_objects;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table small as select * from all_objects where rownum < 10;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table big modify object_id null;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id null;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = null where rownum = 1;

1 row updated.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index big_idx on big(object_id);
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index small_idx on small(object_id);
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.


So, there is the big setup -- we have our tables.  SMALL will be used in the subquery.  As it stands now -- lets see what happens with NOT IN and NOT EXISTS:


ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
  2   where NOT EXISTS (select null
  3                       from small
  4                      where small.object_id = big.object_id)
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)


Statistics
----------------------------------------------------------
        305  recursive calls
         15  db block gets
      18278  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed

So, for each row in BIG it did an INDEX PROBE into SMALL to see if that row existed or not.  So far, so good...  18k consistent reads (high) but not as high as:


ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
  2   where object_id NOT IN ( select object_id
  3                              from small )
  4  /
no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     TABLE ACCESS (FULL) OF 'SMALL' (Cost=1 Card=1 Bytes=3)




Statistics
----------------------------------------------------------
          0  recursive calls
     202743  db block gets
      84706  consistent gets
          0  physical reads
          0  redo size
        862  bytes sent via SQL*Net to client
        319  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed

Wow -- what happened here?  Well, for each row in BIG -- we FULL SCANNED small -- yes, for each row, we full scanned small -- we full scanned small some 16,000 times (imagine if small weren't so small).  Also take care to note that this query processed 0 rows - no data found.  The prior one got over 16k -- that is the effect of a NULL in the subquery of a NOT IN.

So, lets get rid of that nasty (we don't have to get rid of it, we could use a predicate to preclude it as well and we'll do that on BIG as a demonstration)


ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = -1 where object_id is null;
1 row updated.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id NOT NULL;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big
  2   where NOT EXISTS (select null
  3                       from small
  4                      where small.object_id = big.object_id)
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)




Statistics
----------------------------------------------------------
         71  recursive calls
         15  db block gets
      18270  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed

So, that is no different... but:


ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
  2   where object_id IS NOT NULL
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
      18243  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed

Hey -- thats pretty good already. Its just like the NOT EXISTS.  But wait -- there is more.  Lets start anti-joining...


ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
  2  /
Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
   1    0   MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
   2    1     SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
   3    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   4    1     SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
   5    4       VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   6    5         INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 Bytes=36)




Statistics
----------------------------------------------------------
          0  recursive calls
         19  db block gets
        237  consistent gets
        236  physical reads
          0  redo size
    1863304  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          4  sorts (memory)
          1  sorts (disk)
      16886  rows processed


Now that is much much less work -- from 18k to 237 consistent gets -- not bad.  The hash anti join is similar:

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = HASH
  2  /

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b2
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=145 Card=16885 Bytes=1857350)
   1    0   HASH JOIN (ANTI) (Cost=145 Card=16885 Bytes=1857350)
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   3    1     VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   4    3       INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 Bytes=36)




Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
        237  consistent gets
         31  physical reads
          0  redo size
    2350779  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
      16886  rows processed


2b) see 1b)

3b) see 1b) where the NOT IN returns 0 rows, but NOT EXISTS returns 16k -- they are NOT the same when NULLs are involved.
 

ANTI JOIN and 'ANY', 'SOME','ALL'

October 02, 2001 - 4:27 pm UTC

Reviewer: Chenna

Optimizing Anti-Joins and Semi-Joins
An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:

SELECT * FROM emp
WHERE deptno NOT IN
(SELECT deptno FROM dept
WHERE loc = 'HEADQUARTERS');


The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the optimizer should use.

A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:

SELECT * FROM dept
WHERE EXISTS
(SELECT * FROM emp
WHERE dept.ename = emp.ename
AND emp.bonus > 5000);


In this query, only one row needs to be returned from dept even though many rows in emp might match the subquery. If there is no index on the bonus column in emp, then a semi-join can be used to improve query performance.

The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm the optimizer should use.



SELECT * FROM dept
WHERE EXISTS
(SELECT * FROM emp
WHERE dept.ename = emp.ename
AND emp.bonus > 5000);
What difference does it make when there is an index on the bonus column.

A semi join is already being executed in the above query, right ?

How will the use of MERGE_SJ or HASH_SJ hint in the EXISTS subquery improve the performance.

what do these hints tell the optimizer.


And , finally, can you give examples of using 'ANY' ,'ALL' AND 'SOME' clauses, as to where and when to use these clauses. We have never used these clauses.


Tom Kyte

Followup  

October 02, 2001 - 7:53 pm UTC

ANY/SOME are alot like an in.

where x in ( select y from t )

is the same as:

where x = ANY(select y from t)

SOME is an aliase for ANY, ANY is the same as SOME.

any/some can be useful with inequalities:

scott@ORA817DEV.US.ORACLE.COM> select ename, sal, comm from emp where sal < some( select comm from emp );

ENAME SAL COMM
---------- ---------- ----------
A 800
WARD 1250 500
MARTIN 1250 1400
ADAMS 1100
JAMES 950
MILLER 1300

6 rows selected.

that finds emps such that there exists SOME commision greater then their salary....

ALL is a cool one. I use it to find the most frequently ocurring thing, for example:

scott@ORA817DEV.US.ORACLE.COM> select to_char(hiredate,'mm')
2 from emp
3 group by to_char(hiredate,'mm')
4 having count(*) >= ALL ( select count(*) from emp group by to_char(hiredate,'mm') );

TO
--
12

that shows me the month with the MOST hiredates.... it says give me the month whose count is greater than or equal to ALL other counts by month...



Tom, will the result be the same if ..

October 03, 2001 - 2:18 am UTC

Reviewer: Chenna

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
2 /
Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
2 where object_id is not null
3 and object_id NOT IN ( select object_id
4 from small )
5 /

16886 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
1 0 MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
2 1 SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
3 2 TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
4 1 SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
5 4 VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
6 5 INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9
Bytes=36)




Statistics
----------------------------------------------------------
0 recursive calls
19 db block gets
237 consistent gets
236 physical reads
0 redo size
1863304 bytes sent via SQL*Net to client
125305 bytes received via SQL*Net from client
1127 SQL*Net roundtrips to/from client
4 sorts (memory)
1 sorts (disk)
16886 rows processed


1.Tom, will the result be the same if there is no index on the object_id of the small table.


2.You are using CBO , all the way, what will be the result of all these tests for 'in and exists' and 'not in and not exists' if we are using RBO.

3.After we analyze the tables, how should we switch to RBO if we want to.

4.Will the push_sq (push sub query ) come into picture in any way inthe above examples. Actually when is this hint relevant.

5.out of context: Can you give us the link to the oracle error messages manual.

6.Suppose if the query's where clause has 5 sub queries, for the third sub query in the where clause , what will be the outer query, I mean how will it resolve.

For the fifth sub query in the where clause, I guess the whole of the restof the query will be the outer query ?

Tom Kyte

Followup  

October 03, 2001 - 8:02 am UTC

1) yes more or less -- the ANSWER will be the same (16,886 rows processed). The plan will be slightly different -- a FULL SCAN of small instead of an index full scan. (you have ALL of the code needed to replicate my tests -- try it out!)

2) I don't recommend the RBO with 8i -- too many features not available. But -- hey -- you have the code, give it a whirl, try it out (but make sure you have lots of time -- the RBO is not as smart as the CBO when it comes to this -- reread this entire article, I stress the need for the CBO to make this efficient!)

3) use an alter session set optimizer_goal or /*+ RULE */ hint in the query.

4) push_subq affects when subqueries are evaluated (last by default, earlier if you ask for it). So yes, it could affect the plan here.

5) all documentation is available from </code> http://otn.oracle.com/, <code>the Oracle error messages manual varies from release to release. Suggest you goto otn, follow the documentation link and goto the version you are interested in.

6) this does not make syntactic sense to me.




Train us to use parallel query

October 03, 2001 - 6:37 pm UTC

Reviewer: A reader

From the oracle magazine article


What Can Execute in Parallel?
By using parallel options, data-sensitive SQL state-
ments, database recovery, and data loads can be exe-
cuted by multiple processes simultaneously. With the
release of Oracle 8i, all the following operations can
be executed in parallel:

Table scan
PL/SQL functions called from SQL
Sort merge join
"Not in"
Nested loop join
Select distinct
Hash join
Aggregation
Group by
Cube
Union and union all
Create table as select
Order by
Rebuild index partition
Rollup
Move partition
Create index
Update
Rebuild index
Insert . . . select
Split partition
Enable constraint (the table
Delete scan is parallelized)
Star transformation


If the want to execute the above not in sub queries in parallel, actually when should I decide to do it in paralle.

Will it suffice if I give the following.

Select * from big b1
2 where object_id is not null
3 and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/ object_id from small a1)

usually I see that it is the Sort merge joins which take the most amount of time in a query, how can I parallelize a sort merge join.

Thank You





Tom Kyte

Followup  

October 04, 2001 - 6:16 am UTC

If you have to parallelize the subquery -- you are in a heap of trouble. that means it is really big and since you would be full scanning IT once per row in big -- its going to be slow. Look at the other options below in order to AVOID having to full scan it once per row.

My advice here -- try it yourself. Take your ideas, set up the test case, test it out -- see what works, what doesn't. Its the best way to really learn and understand how this works.

I will do it only my own but -- --

October 04, 2001 - 11:45 am UTC

Reviewer: Chenna

Select * from big b1
2 where object_id is not null
3 and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/ object_id from
small a1)


O.k. , let me try parallel, for myself, and report the conclusions to you for discussion.

But before that , you need to weed out these small doubts for me ..

1.select/*+ PARALLEL(a1,8)*/ object_id from
small a1 -- before using the parallel hint, do I need to change any init.ora parameters,

2.I understand that 8 refers to the number of processors, how can I know the number of processors available for my server.

3.Where you use parallel query, how does my query get divided between multiple processors. Where are the results from each divided query kept , and then combined , before they are displayed to the user.

4.Give us a brief insignt into the internals of how parallel query works.

And by the way, I cant see the asktom or any other images on this page. Some thing you might want to know about.


Tom Kyte

Followup  

October 04, 2001 - 8:15 pm UTC

1) probably not, do a show parameter parallel and check out the settings - make sure parallel max servers isn't 0

2) you own the machine don't you? show parameter cpu_count will tell you as well

3) read
</code> http://docs.oracle.com/docs/cd/A81042_01/DOC/server.816/a76965/c22paral.htm#365 <code>

4) see #3


as for the images, must have been a temporary "issue" - they seem alright now.

October 04, 2001 - 10:53 pm UTC

Reviewer: Chenna

SQL> connect system/manager
Connected.
SQL> show parameter cpu_count

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
cpu_count                            integer 1
SQL>  show parameter parallel

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 5
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 

The cpu count shows only one , but look at this 
parallel_max_servers                 integer 5.

1.how can the parallel_max_servers be 5 when the cpu count is 1.
2.Does this mean that I can give a parallel hint to use 5 parallel servers.

3.When the cpu count is 1 I expect the parallel_max_servers 
to be 1. 

Perplexing? Need to be clarified.

 

Tom Kyte

Followup  

October 05, 2001 - 11:49 am UTC

did you read the documentation i pointed you to?

you can do parallel query on a single cpu box. You can do a parallel 100 query on a 20 cpu box. You need not have parity between processes and cpu's (if you did, unix would be a single user OS).

You can have parallel_max_servers at whatever you want, regardless of the number of cpus on the system. Since parallel queries typically involve tons of IO, and IO is something you wait for -- there is plenty of opportunity for using parallel query on a single CPU system (when one process is waiting for IO, blocked, the other process can be munching through some data it read).

Read the docs, get a good understanding of how it really works -- then it won't be confusing.

sub query

October 06, 2001 - 5:34 pm UTC

Reviewer: Jay

Tom

Lets assume that the where clause has 10 conditions, and each condition gets its value from a sub query. IN what order do these sub queries get resolved.

select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)
and c.l=(select l from some xtable where something)

Now when resolving and c.l=(select l from some xtable where something)

will select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)

be considered as outer query, or only
select a from b,c
were b.x=c.x

be considered as outer query




Tom Kyte

Followup  

October 07, 2001 - 4:02 pm UTC

when looking at the IN, the outer query is the result set that will be joined to the IN result.

if b.t = (select t ... ) is processed FIRST (before c.l = ( select l...)), the that would be the "outer query" with regards to c.1 = ( select l...).

However -- you cannot count on the order of operation of predicates!!! we can and will change the order at will.

Good topic

May 17, 2002 - 12:14 pm UTC

Reviewer: A reader

Tom and team, Many thanks. Love your site.

I'm running Oracle 8.1.7.3
I need help with a "NOT IN" query.
Here is what the tables look like:

Table 1:

CREATE TABLE SKU
(ITEM_ID NUMBER NOT NULL,
SKU_CODE VARCHAR2(10) NOT NULL,
SKU_DESC VARCHAR2(255) NOT NULL
,
CONSTRAINT PK_SKU
PRIMARY KEY (ITEM_ID, SKU_CODE)
USING INDEX)
/

CREATE TABLE SKU_SOLD_OUT
(ITEM_ID NUMBER NOT NULL,
SKU_CODE VARCHAR2(10) NOT NULL,
SOLD_OUT_DATE DATE NOT NULL,
DELETED CHAR(1) NOT NULL,
SOLD_OUT_TEXT VARCHAR2(1000) NOT NULL
,
CONSTRAINT PK_SKU_SOLD_OUT
PRIMARY KEY (ITEM_ID, SKU_CODE, SOLD_OUT_DATE)
USING INDEX)
/

With item_id, sku_code combinations having zero to many rows in sku_sold_out
but only one will be active (deleted = 'N') at a time.

I need to get a list of items that are not completely sold out. i.e.
If an Item_id has four sku_codes and three are currently sold out, return the item_id.
If all four sku_codes are sold out, do not return the item_id.

Here is the query I am useing now.

select distinct item_id
from sku
where item_id||'-'||sku_code not in (
select item_id||'-'||sku_code
from sku_sold_out
where deleted = 'N'
and sold_out_date <= sysdate)

sku has about 10,000 rows
sku sold out has about 120,000 rows.

Is there a better way of doing this? The schema is set and is production so I cant
make any database changes other than views, procs, ect.

Thanks for the help.
Thanks

Tom Kyte

Followup  

May 17, 2002 - 1:12 pm UTC

Well, other ways would be


select distinct item_id
from sku
where (item_id,sku_code) not in ( select item_id, sku_code
from sku_sold_out
where deleted = 'N'
and sold_out_date <= sysdate)
/

select distinct sku.item_id
from sku, sku_sold_out
where sku.item_id = sku_sold_out.item_id (+)
and sku.sku_code = sku_sold_out.sku_code (+)
and nvl(sku_sold_out.deleted,'Y') = 'Y'
and nvl(sku_sold_out.sold_out_date,sysdate) <= sysdate
/




Index result of Subquery and FILTER it

May 18, 2002 - 12:21 am UTC

Reviewer: Jaromir D.B. Nemec from Vienna, Austria

Hi Tom,

in the original explanation you told us:

The subquery is evaluated, distinct'ed, >>>indexed<<<< (or hashed or sorted) and then ....

1) Can you explain how can be the result of a subquery indexed.
temporary index is created or something like that? What you see in explain plan in that case? - never heard of this feature.

2) Some of the explain plans you demonstrated consist of:

FILTER
row set 1
row set 2

I've learned that FILTER is a kind of "filter" to suppress rows that doesn't match some condition. This is evidently NOT (in) the case.



Can you explain (or link) what is the functionality of this two-row-ser-filter?
My experience is that this kind of FILTER operate similar to nested loops and is not performable in parallel - is it right?

Thanks


Jaromir

www.db-nemec.com



Tom Kyte

Followup  

May 18, 2002 - 10:07 am UTC

1) it is stored sorted or hashed.  The structure itself is an implicit "indexing" of sorts.  Lets say you were doing a sort merge join -- you would see two sorts (out to temp) and then a merge.  A hash join -- one of the sets gets hashed.

for #2, I assume you mean something like:

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
  2   where object_id IS NOT NULL
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)



The FILTER in this case is sort of like a nested loops join yes.  The table access full is done -- and the "is not null" part is evaluated.  Then each row must be filtered for the NOT IN and that is done via the index.  

This could be done in parallel -- meaning the full scan can be parallelized.

 

Useful info about IN & EXISTS

July 23, 2002 - 3:06 pm UTC

Reviewer: Vish Inamdar from USA

Yes, this article gives me clear idea how IN & EXISTS are different from each other and where to apply them.

Good and simple example made it more clear.

I will be asking more question in future.

Vish


Exists, IN and consistent gets

August 22, 2002 - 8:04 am UTC

Reviewer: Vikas Sharma from Delhi India

Hi tom,

Can i directly relate the consistent gets of plan with the time taken by the query executed by using IN and another same query executed by using exists. Actually I have query which I execute in two different ways eg.

Query with exists :
select person_id from persons where exists
(select con_person_id from contacts where con_person_id = person_id ) and contact_details = 'Y'

93 rows selected ..
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=951 Card=1 Bytes=9)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'PERSONS' (Cost=951 Card=1 Bytes=
9)

3 1 INDEX (UNIQUE SCAN) OF 'CON_PK' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
8 db block gets
6365 consistent gets
0 physical reads
0 redo size
2967 bytes sent via SQL*Net to client
1091 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
93 rows processed


Query with IN :
select person_id from persons where person_id in (select con_person_id from contacts ) and contact_details = 'Y';

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=187 Card=1 Bytes=17)
1 0 NESTED LOOPS (Cost=187 Card=1 Bytes=17)
2 1 INDEX (FULL SCAN) OF 'CON_PK' (UNIQUE) (Cost=1 Card=93 B
ytes=744)

3 1 TABLE ACCESS (BY INDEX ROWID) OF 'PERSONS' (Cost=2 Card=
1 Bytes=9)

4 3 INDEX (UNIQUE SCAN) OF 'PER_PK' (UNIQUE) (Cost=1 Card=
1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
543 consistent gets
0 physical reads
0 redo size
2967 bytes sent via SQL*Net to client
1091 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
93 rows processed

The persons table has 31000+ records and the contact table has 120+ record in this case as you proved the IN is better I am too getting the same results as you can see the IN is fast.

I would like to know that is it always true for two same queries (executed in the different way ) if the consistent gets for a query are more then that query is not optimized and will always result a poor show, we should always go for the query which results less consistent get.

I read in you book also the consistent gets depend upon the order of the data of the index columns. It will be more when the data for the index column is disordered. Does that will always effect the time taken by the query.


Tom Kyte

Followup  

August 22, 2002 - 9:14 am UTC

Less consistent gets is a good goal. In generate, the less consistent gets

o the faster it'll be
o the more SCALABLE it'll be (as you add users, you can add MORE users with less consistent gets)

Always is a very very strong term. Almost Always i would say.


As for the last one -- that is due to a large index range scan that'll be getting LOTS of data from the table -- not really applicable in this example. Does it affect the time taken by the query -- probably, the more consistent gets, the more latching (thats a lock, a serialization thing), the longer it takes to get the data.

August 22, 2002 - 10:05 am UTC

Reviewer: Ashok, Bombay ,India

your style of explaining things is just amazing..!!you must be one helluva teacher !!!

Thanks a Lot

August 22, 2002 - 12:58 pm UTC

Reviewer: Vikas Sharma from Delhi India

Hi Tom,

Thanks lot

The way you explain the things it is great..

Regards,

Vikas Sharma

Very helpful

August 22, 2002 - 8:45 pm UTC

Reviewer: Bijay from Fremont, CA

Tom
I wonder how you can find so much time in replying to all questions. Its amazing a simple question can create such a big chain of Q & A.
You have been a great source of knowledge for all of us. Thank you for being you.
Bj

why is my query plan using NL? where as your use SORT-MERGE

September 05, 2002 - 3:03 pm UTC

Reviewer: A reader

Tom,

I am trying to understand exists and not exists

 analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /

Table analyzed.

SQL> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /

Table analyzed.



SQL> truncate table plan_table;

Table truncated.

SQL> explain plan for
  2  select count(subobject_name)
  from big
 where object_id in ( select object_id from small )
  3    4    5  /

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     1 |   19 |    598 |       |       |
|  SORT AGGREGATE           |          |     1 |   19 |        |       |       |
|   NESTED LOOPS            |          |   396 |    7K|    598 |       |       |
|    VIEW                   |VW_NSO_1  |    99 |    1K|      4 |       |       |
|     SORT UNIQUE           |          |    99 |  396 |      4 |       |       |
|      INDEX FULL SCAN      |SMALL_IDX |    99 |  396 |      1 |       |       |
|    TABLE ACCESS BY INDEX R|BIG       |   113K|  665K|      6 |       |       |
|     INDEX RANGE SCAN      |BIG_IDX   |   113K|      |      2 |       |       |
--------------------------------------------------------------------------------

11 rows selected.


Thans and Regards,                           

Tom Kyte

Followup  

September 05, 2002 - 10:27 pm UTC

things like sort_area_size, db_block_buffers, hash area size, database version, etc will all affect the plans.

Update Process Options

September 09, 2002 - 10:32 am UTC

Reviewer: SK from NJ

Hi Tom,

I have to update my production table daily with new/update data. My update process is like this.

Table big_table (col1 integer, col2 integer, col3 integer)
Table Small_table (c1 integer, c2 integer, c3 integer)

Small table is a external Table. and Big Table as index on Col1.

My update process is like this:

Delete from big_table where col1 in (select c1 from small_table);
insert into big_table select c1,c2,c3 from small_table.

My Big table has 32 million rows and small table has 400K rows. This process is taking 30 minutes to do the above two processes. The requirement is to reduce this time. How can we do it. I believe using Exists will do both full table scans so that will not be efficient. Where as using in may use indexes (but I doubt that it is using indexes).

1. Should I go for updating using procedure where I delete records one by one and I will be sure that processes will use indexes. But that is not my choice and it will defy your first rule of using single statement instead of using procedure.

2. In one such update my big_table has more than 500 million records. and small Table has 7 million records. I have 24GB of temp tablespace. My delete statement is failing due to unable to allocate extents on temp tablespace. I am surprised on this. Is Temp tablespace of 24GB too large? In the above delete statement will it use only col1 and c1 for sort/merge. How do I estimate the temp tablespace size for this transaction. I am running Oracle 9iR2 RAC.

Your response will be highly appreciated.

Thanks

Tom Kyte

Followup  

September 09, 2002 - 10:41 am UTC

why delete and insert.

Just MERGE, see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1615330671789 <code>



Follow up

September 09, 2002 - 11:41 am UTC

Reviewer: SK from NJ

Hi Tom,

I am using merge for some tables where I have one to one relationship between the two table. But for many big tables I have multiple records for same key (tables with non unique key). So will merge will take care of following situation where for big table has 5 records for col1. and small table has 3 records for the same key. Will merge replace 5 rows in big table with 3 rows from small table for the same key (means deleteion of two records from big table).

I will certainly prefer merge if this is possible.

Regards

Tom Kyte

Followup  

September 09, 2002 - 12:30 pm UTC

Then that is not a merge.

A merge would leave the same number of rows.

You'll need to do the delete+insert. You'll want to use tkprof to see what it is doing now and see if it can be speeded up. Deleting 400k rows is going to take a while if you have indexes on the table.

Oracle Optimizer and In & Exists

September 11, 2002 - 12:55 am UTC

Reviewer: Subhrajyoti Paul from Mumbai, India

Tom,
From the discussion above, I get the impression that In and exists clause can be used interchangibly, in case there is a subquery inside the IN clause. But, one would be better than the other depending on the amount data retrieved in the inner and the outer query.
If they can be used interchangibly, then it sounds logical that the optimizer can substitute an in clause with an exists and vice versa, given it is provided with proper statistics... Does this happen in practice... Or am I making some mistakes in my understanding of the things...

One other question,

"select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then
joined to the original table -- typically. "

Here you mentioned that it is distinct'ed, which means it is sorted, now once sorted, will oracle favor the use of merge sort join, because the sort operation is already complete...or may it choose to Indexing or hash join. How much importance is given to the fact that one step (sort, in this case) can be eliminated from the join operation, or is this relevant at all...




Tom Kyte

Followup  

September 11, 2002 - 7:44 am UTC

And in 8i, ... 9i, .... 10i, and on up -- that is true. You'll see interesting query rewrites with both (given that you use the cost based optimizer which I can assure you you will be in 10i as the RBO is gone in that release)


distinct <> sort. SORT is SORT, distinct is distinct. DISTINCT does not imply or infer or mean SORTed. Never has, never will.

But how otherwise can u perform a distinct if u do not sort...

September 11, 2002 - 8:14 am UTC

Reviewer: Subhrajyoti Paul from Mumbai, India

Tom,
I am relieved that the optimizer does the substitution...And ofcourse, I use cost based optimizer...

But the distinct thing.. If distincted data does not mean sorted, how otherwise can u have distinct data if u did not sort it... Can you give me an example...

Subhro

Tom Kyte

Followup  

September 11, 2002 - 8:39 am UTC

Lets say you have a partitioned table.

You used hash partitioning on the ID column.


We know that across partitions -- ID is "unique" (eg: if ID=5 is in partition 1, ID=5 will NOT be in partitions 2, 3, 4, .... and so on).

We use parallel query.

Each partitioned is distincted and then glued together. ID=5 from partition 1 could come after ID=1000 from partition 2 if the PQ slave processing partition 2 was "faster" then the one doing partition 1.


For example. There are other cases as well. distinct, group by, etc -- do not imply "sorted"

Thanks a lot... I got the point...

September 11, 2002 - 8:46 am UTC

Reviewer: Subhrajyoti Paul from Mumbai, India

Tom,
Most of what u said in the last followup is a bouncer to me(in cricket, a bouncer is a ball that goes above the batsman's head ;-) )...

But, I got the point that distinct does not mean sorted...

I just could not think of a case myself ..so the example was needed...

Thanks a lot... everytime i think i got u...u prove me wrong... cool....

not in & not exists

November 05, 2002 - 5:31 am UTC

Reviewer: Brigitte from Germany

Hallo Tom,

Some times ago I changed a query to "not exists", because "not in" takes some hours.
Now I tried your interesting followup from October 01, 2001 with the test database, but alter session
set always_anti_join = HASH/MERGE hasn't an effect. Why?
notesid is not null in all tables, it is the primary key in testidxdokument,testidxvorgang and testidxmappe

Sql> set autotrace trace
Sql> set timing on
Sql> r
1 select distinct mandant,notesid from testleserfeldgruppe l
2 where not exists
3 (select 'x' from testidxdokument
4 where l.notesid = notesid)
5 and not exists
6 (select 'x' from testidxvorgang
7 where l.notesid = notesid)
8 and not exists
9 (select 'x' from testidxmappe
10 where l.notesid = notesid)
11* order by mandant, notesid

no rows selected

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
1 0 SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
2 1 SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
5 3 INDEX (UNIQUE SCAN) OF 'UK_TESTIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=18)
6 3 INDEX (UNIQUE SCAN) OF 'UK_TESTIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=18)
7 3 INDEX (UNIQUE SCAN) OF 'UK_TESTIDXMAPPE' (UNIQUE) (Cost=1 Card=1 Bytes=18)

Statistics
----------------------------------------------------------
0 recursive calls
4 db block gets
1507 consistent gets
0 physical reads
0 redo size
160 bytes sent via SQL*Net to client
251 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
0 rows processed

Sql> r
1 select distinct mandant,notesid from testleserfeldgruppe where notesid not in
2 (select notesid from testidxdokument)
3 and notesid not in
4 (select notesid from testidxvorgang)
5 and notesid not in
6 (select notesid from testidxmappe)
7* order by mandant, notesid

no rows selected

Elapsed: 00:00:02.44

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
1 0 SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
2 1 SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3Card=1 Bytes=35)
5 3 TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
6 3 TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
7 3 TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
0 recursive calls
3304 db block gets
32597 consistent gets
0 physical reads
0 redo size
167 bytes sent via SQL*Net to client
249 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Sql> alter session set always_anti_join = MERGE;

Session altered.

Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
2 (select notesid from testidxdokument)
3 and notesid not in
4 (select notesid from testidxvorgang)
5 and notesid not in
6 (select notesid from testidxmappe)
7 order by mandant, notesid;

no rows selected

Elapsed: 00:00:01.23

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
1 0 SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
2 1 SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
5 3 TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
6 3 TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
7 3 TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
0 recursive calls
3304 db block gets
32597 consistent gets
0 physical reads
0 redo size
167 bytes sent via SQL*Net to client
249 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Sql> alter session set always_anti_join = HASH;

Session altered.

Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
2 (select notesid from testidxdokument)
3 and notesid not in
4 (select notesid from testidxvorgang)
5 and notesid not in
6 (select notesid from testidxmappe)
7 order by mandant, notesid;

no rows selected

Elapsed: 00:00:01.13

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
1 0 SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
2 1 SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
5 3 TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
6 3 TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
7 3 TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
0 recursive calls
3304 db block gets
32597 consistent gets
0 physical reads
0 redo size
167 bytes sent via SQL*Net to client
249 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Regards


Tom Kyte

Followup  

November 05, 2002 - 9:44 am UTC

are those statistics close to right? 9 rows and such -- on such a trivial amount of data -- all bets are off

either get real stats on there or load the tables with real data.

not in or not exists

November 06, 2002 - 10:06 am UTC

Reviewer: Brigitte from Germany

Hallo Tom,
here are more data.
1. Why has the alter session no effect. What's wrong? Some parameters?
2. what means: "elapsed 02:133:8032.30", I know hours:minutes:seconds
Does it mean 2 hours + 13 minutes +...
or 2 hours + 2 hours + 13 minutes +...

06.11.02 07:53 Anzahl der Tabellenzeilen Seite: 1
dms

DMSIDXDOKUMENT : 46220
DMSIDXMAPPE : 13
DMSIDXVORGANG : 464
DMSLESERFELDGRUPPE : 137369

Na so was> r
1 select distinct mandant,notesid from dmsleserfeldgruppe l
2 where not exists
3 (select 'x' from dmsidxdokument
4 where l.notesid = notesid)
5 and not exists
6 (select 'x' from dmsidxvorgang
7 where l.notesid = notesid)
8 and not exists
9 (select 'x' from dmsidxmappe
10 where l.notesid = notesid)
11* order by mandant, notesid

no rows selected

Elapsed: 00:00:19.18

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
1 0 SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
2 1 SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
5 3 INDEX (UNIQUE SCAN) OF 'UK_DMSIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=32)
6 3 INDEX (UNIQUE SCAN) OF 'UK_DMSIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=31)
7 3 INDEX (UNIQUE SCAN) OF 'UK_DMSIDXMAPPE' (UNIQUE)

Statistics
----------------------------------------------------------
245 recursive calls
7 db block gets
149823 consistent gets
1439 physical reads
0 redo size
161 bytes sent via SQL*Net to client
246 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
18 sorts (memory)
0 sorts (disk)
0 rows processed

Na so was> r
1 select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
2 (select notesid from dmsidxdokument)
3 and notesid not in
4 (select notesid from dmsidxvorgang)
5 and notesid not in
6 (select notesid from dmsidxmappe)
7* order by mandant, notesid

no rows selected

Elapsed: 02:133:8032.30

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
1 0 SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
2 1 SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
5 3 TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
6 3 TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
7 3 TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
0 recursive calls
464062 db block gets
101189223 consistent gets
12940155 physical reads
0 redo size
167 bytes sent via SQL*Net to client
244 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Na so was> alter session set always_anti_join = MERGE;

Session altered.

Na so was> r
1 select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
2 (select notesid from dmsidxdokument)
3 and notesid not in
4 (select notesid from dmsidxvorgang)
5 and notesid not in
6 (select notesid from dmsidxmappe)
7* order by mandant, notesid

no rows selected

Elapsed: 02:133:8036.06

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
1 0 SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
2 1 SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
5 3 TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
6 3 TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
7 3 TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
0 recursive calls
464112 db block gets
101219920 consistent gets
13238978 physical reads
0 redo size
167 bytes sent via SQL*Net to client
244 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Na so was> alter session set always_anti_join = HASH;

Session altered.

Na so was> select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
2 (select notesid from dmsidxdokument)
3 and notesid not in
4 (select notesid from dmsidxvorgang)
5 and notesid not in
6 (select notesid from dmsidxmappe)
7 order by mandant, notesid;

no rows selected

Elapsed: 02:135:8144.32

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
1 0 SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
2 1 SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
3 2 FILTER
4 3 TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
5 3 TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
6 3 TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
7 3 TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
0 recursive calls
464327 db block gets
101357656 consistent gets
14016238 physical reads
0 redo size
167 bytes sent via SQL*Net to client
244 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
0 rows processed

Regards

not in & not exists

November 08, 2002 - 3:20 am UTC

Reviewer: Brigitte from Germany

Hallo Tom,
I hope the data which I send on November 06 are enough.
I think a difference from 19 minutes to some hours is significant.
I'm very interested in your reply.

Regards

Tom Kyte

Followup  

November 08, 2002 - 7:46 am UTC

Brigitte -- for something so big, you'll need to do it as a question - this is for followups, quick questions. This is an entire question in a question.

Is the logic backwards for NOT IN and NOT EXISTS

November 15, 2002 - 4:27 pm UTC

Reviewer: Doug from CT, USA

Tom - this may be an oversimplification - however if I was to distill whether IN or EXISTS was better I would say this after reading your reply. An IN will generally full table scan the subquery, and an EXISTS will generally full table scan the outer query. So, if my outer loop is small and my inner code is large then an exists might be appropriate but if my outer loop is large and my inner code is small (the match is quick).. then an IN might be appropriate. Is that correct? If so.. is the situation completely reversed if using NOT in and NOT exists? I would guess is you were using not in.. you would have to full table scan the OUTER query rather than sorting,hashing, the inner one. You did cover that NOT IN will return zero rows if it is NOT IN (null).. but I wasn't sure if the psuedo-code for processing is backwards from the original examples if one is using NOT IN and NOT EXISTS.

Tom Kyte

Followup  

November 15, 2002 - 8:21 pm UTC

I encapsulate it with this:

...
So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
( select y from T2 )

is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is very very fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less then the time to simply full scan T2 to build
the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more
appropriate.


If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.

.....


full scans -- no, they enter not into the picture at all IN GENERAL. If the subquery can use indexes, it'll use an index with IN or EXIST.

It is if the subquery is LARGE and TAKES A LONG TIME TO BUILD.... (regardless of full scan)


Not in and not exists is even harder to characterize (pull my AOTC slides from the home page for an interesting example -- they will be there for a while around November of 2002 -- if you are reading this later then that, they might not be there anymore)

How to make NOT EXISTS handle comparison of NULLs in same way as MINUS

March 13, 2003 - 10:30 am UTC

Reviewer: Jim Dickson (DBA/developer) from London, UK

As ever, learn a lot from this site.

But one thing not addressed is how to make an antijoin deal with NULLs. The set operators (UNION, MINUS) do this automatically.

But can it be done efficiently with NOT EXISTS or NOT IN.

e.g.
I need to compare 2 tables such that two rows are duplicates if corresponding columns are either equal, or both contain NULL.

I cannot use MINUS since that columns being compared are a subset of the row, but i need to return the whole row.

My understanding is that I cannot use NOT IN because it does not handle "null comparison".

NOT EXISTS works but seems to use NESTED LOOPS => v slow
My understanding is that HASH antijoin will not work because HASH join only works for equality tests (which fail for NULLs).

I can post example code if this helps.


Tom Kyte

Followup  

March 14, 2003 - 5:22 pm UTC

both not exists and ESPECIALLLY not in do not do nulls like that.

not in and not exists don't even treat nulls the same in fact. do this:

select * from emp where empno not in ( select mgr from emp );

and compare that to the "equivalent" not exists:

select * from emp where not exists ( select null from emp e2 where e2.mgr = emp.empno );

you'll get a different answer.

You would have to tediously compare column by column by column

where (x = y or (x is null and y is null))

otherwise.

How to make NOT EXISTS handle comparison of NULLs in same way as MINUS

March 17, 2003 - 12:08 pm UTC

Reviewer: Jim Dickson from London, UK

I absolutely agree but the question more about access paths than semantics.

I believe i understand the semantics surrounding nulls - which is why MINUS/INTERSECT are such great operators.

I am prepared to write "tedious" column comparison.

Can I get Oracle to perform comparisons via a single-pass full table scan access method (hash, sort-merge).

I have only seen Cartesian product/Nested Loops approach which obviously does not scale with number of rows.

Is there any chance of Oracle implementing ANSI set operator extension "CORRESPONDING BY {column-list}" ?



Tom Kyte

Followup  

March 17, 2003 - 1:04 pm UTC

curious -- why cannot you simply use MERGE/INTERSECT?

How to make NOT EXISTS handle comparison of NULLs in same way as MINUS

March 18, 2003 - 4:47 am UTC

Reviewer: Jim Dickson from London, UK

Because I am not merging into target table directly.

The sequence of processing is

1/ Read an external table
2/ Identify differences based on a subset of columns
3/ Write out differences to OS for further processing (fuzzy logic on Name & Address data)
4/ Read 2nd external table containing processed data and then "merge" into target table.

Where I have a primary key, I can use MINUS for step 2.
But where there is no PK, then I am trying to get NOT EXISTS to scale.

Hope that makes sense.

BTW Is MERGE statement ANSI?

Tom Kyte

Followup  

March 18, 2003 - 7:47 am UTC

well, not following you by what you mean by "getting not exists to scale" -- but give me an EXAMPLE of the syntax you would like to use and describe what would be returned and I'll see if there isn't a way to code that...


Don't know if merge is "ansi". Thing is NIST stop testing for ansi compliance years ago, no one does it, the standard is so huge now that full compliance is pretty much "not going to happen" and database indenpendence, well, I don't even go there -- the only results I see from that over the last 15 years has been applications that perform marginally well on a single database and blow chunks on the rest.

How to make NOT EXISTS handle comparison of NULLs in same way as

March 19, 2003 - 4:01 am UTC

Reviewer: Jim Dickson from London, UK

Code posted below.

The NOT EXISTS does not include the SECONDARY column.

Ran with 100 rows, 1000 rows and 2389 rows.
On my machine

MINUS 100 rows 00.44 secs, 8 db block gets
1000 rows 00.27 secs, 7 db block gets
2389 rows 00.51 secs, 4 db block gets

NOT EXISTS 100 rows 00.43 secs, 208 db block gets
1000 rows 02.90 secs, 2007 db block gets
2389 rows 11.75 secs, 4780 db block gets

I need this to run with 10m rows.

spool not_exists.2389.out

set timing on
set echo on

drop table t1;
drop table t2;

create table t1 as select * from all_objects where rownum <= 2500;
create table t2 as select * from all_objects where rownum <= 2500;

analyze table t1 compute statistics;
analyze table t2 compute statistics;

--desc t1
--desc t2

insert into t1 values ('EXTRA ROW', null, null, null, null, null, null, null, null, null, null, null, null);

select count(*) from t1;
select count(*) from t2;

set autotrace on

select count(*) from
(
select * from t1
minus
select * from t2
);

select count(*) from t1 where NOT EXISTS
( select null from t2 where
( t1.OWNER = t2.OWNER or ( t1.OWNER is null and t2.OWNER is null ) )
and
( t1.OBJECT_NAME = t2.OBJECT_NAME or ( t1.OBJECT_NAME is null and t2.OBJECT_NAME is null ) )
and
( t1.SUBOBJECT_NAME = t2.SUBOBJECT_NAME or ( t1.SUBOBJECT_NAME is null and t2.SUBOBJECT_NAME is null ) )
and
( t1.OBJECT_ID = t2.OBJECT_ID or ( t1.OBJECT_ID is null and t2.OBJECT_ID is null ) )
and
( t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID or ( t1.DATA_OBJECT_ID is null and t2.DATA_OBJECT_ID is null ) )
and
( t1.OBJECT_TYPE = t2.OBJECT_TYPE or ( t1.OBJECT_TYPE is null and t2.OBJECT_TYPE is null ) )
and
( t1.CREATED = t2.CREATED or ( t1.CREATED is null and t2.CREATED is null ) )
and
( t1.LAST_DDL_TIME = t2.LAST_DDL_TIME or ( t1.LAST_DDL_TIME is null and t2.LAST_DDL_TIME is null ) )
and
( t1.TIMESTAMP = t2.TIMESTAMP or ( t1.TIMESTAMP is null and t2.TIMESTAMP is null ) )
and
( t1.STATUS = t2.STATUS or ( t1.STATUS is null and t2.STATUS is null ) )
and
( t1.TEMPORARY = t2.TEMPORARY or ( t1.TEMPORARY is null and t2.TEMPORARY is null ) )
and
( t1.GENERATED = t2.GENERATED or ( t1.GENERATED is null and t2.GENERATED is null ) )
)
;

spool off
exit

As ever, thanks.

Think the whole Oracle community is disappointed that the new book will be delayed.

Tom Kyte

Followup  

March 20, 2003 - 1:01 pm UTC

what I meant by:

but
give me an EXAMPLE of the syntax you would like to use and describe what would
be returned and I'll see if there isn't a way to code that...

was -- give me a scenario where you want to use not exists instead of minus (eg: in the above, i certainly would NOT use not exists) and I'll see what I can do with that.

That is -- give me a real case to work with whereby minus cannot be used.

March 21, 2003 - 1:39 pm UTC

Reviewer: green

Hi Tom,
Can you please explain a litte bit 'NOT IN' here? Where can it be used properly?

Thanks!

Tom Kyte

Followup  

March 21, 2003 - 2:21 pm UTC

I don't understand?

not in can be used anywhere you need "not in" -- where "record" not in "set".


How to use MINUS on subset of columns

March 27, 2003 - 4:51 am UTC

Reviewer: Jim Dickson from London, UK

I use MINUS extensively but in certain situations I believe it is unsuitable.

e.g.

When trying to find the difference between 2 tables, where at least 1 of tables lack a primary key,
using a subset of columns for comparison and some of those columns can be null.

The real life example is comparing a new list of suppressions against an existing table of suppressions.
(Suppressions used in marketing to avoid target customers who do not wish to receive offers).
The ddl is pretty much address1-10, suppression_flags1-10.
The comparison is done on address lines only but return set must be all columns.

Hope that makes sense.

The ANSI syntax i would love to use is

select * from suppressions_new
except corresponding by (address1, address2, ...., address10)
select * from suppressions_old

Suppressions_new will have no primary key.

How can i achieve this in Oracle using MINUS instead of EXCEPT.

If it cannot be done using MINUS, then how to achieve reasonable performance using NOT EXISTS.

Tom Kyte

Followup  

March 27, 2003 - 8:34 am UTC

Ok, I have a big_table (4million rows) and a sample of that (400k rows). Using NVL() on columns that "might be nullable" and supplying an unreasonable value (eg: chr(0) for strings, 99999999999999999999999999 for numbers, to_date(01010001,'ddmmyyyyy') for dates) you can anti join with NOT IN like this:

select *
from big_table
where (object_id, nvl(subobject_name,chr(0)) )
NOT IN (select object_id, nvl(subobject_name,chr(0))
from ten_percent
where nvl(subobject_name,chr(0)) IS NOT NULL )
and nvl(subobject_name,chr(0)) is not null

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 114.65 248.86 118052 60214 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 114.65 248.86 118052 60214 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60

Rows Row Source Operation
------- ---------------------------------------------------
0 HASH JOIN ANTI (cr=60214 r=118052 w=58940 time=77317035 us)
3817191 TABLE ACCESS FULL BIG_TABLE (cr=54700 r=54649 w=0 time=14975098 us)
381274 TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1360329 us)


Now, that "scales" just as well as:

select *
from big_table
where (object_id, object_name )
NOT IN (select object_id, object_name
from ten_percent )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 114.19 224.33 117115 60213 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 114.19 224.33 117115 60213 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60

Rows Row Source Operation
------- ---------------------------------------------------
0 HASH JOIN ANTI (cr=60213 r=117115 w=57995 time=224338032 us)
3817191 TABLE ACCESS FULL BIG_TABLE (cr=54699 r=54657 w=0 time=16561038 us)
381274 TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1308385 us)


without the NVL's.


Use of "nvl(subobject_name,chr(0)) IS NOT NULL "

March 27, 2003 - 1:27 pm UTC

Reviewer: Logan Palanisamy from Santa Clara, CA USA

Tom,

In your example,

select *
from big_table
where (object_id, nvl(subobject_name,chr(0)) )
NOT IN (select object_id, nvl(subobject_name,chr(0))
from ten_percent
where nvl(subobject_name,chr(0)) IS NOT NULL )
and nvl(subobject_name,chr(0)) is not null


1. What does chr(0) translate to? When I run "select chr(0) from dual", it returns me an invisble character. My db character set is WE8ISO8859P15

2. What does "nvl(subobject_name,chr(0)) IS NOT NULL" do? Doesn't it evaluate to TRUE always, since chr(0) is a not null value?

3. Also, the number of rows returned as shown in TKPROF output is zero. How come? Shouldn't return about 3.6 millon rows?

4. I am familiar with TKPROF. But what does the name really stand for? Is it an acronym for something?

Grateful as ever.

Tom Kyte

Followup  

March 27, 2003 - 2:14 pm UTC

1) exactly -- it is just "\0" in C, a null string. It is an improbable character, something subobject_name cannot be...

2) exactly -- I'm telling the optimizer "this query returns only data that is NOT NULL". That permits the use of the anti-join which is what our goal was here.

3) no, there were no rows "not in" the other set.

4) tool kit profiler is the rumorous name I've heard it refered to as.

LIKE.... OR LIKE.... OR

May 12, 2003 - 11:41 am UTC

Reviewer: Robert from PA

Tom,
Is there a better way (with a sub-query) to do
LIKE .... OR LIKE ... OR LIKE ... ?
e.g.:
select * from user_objects where
OBJECT_TYPE LIKE 'PROC%' OR OBJECT_TYPE LIKE 'FUNC%' OR OBJECT_TYPE LIKE 'VIEW%" ?

Thanks


Tom Kyte

Followup  

May 12, 2003 - 1:13 pm UTC

that would be the most "optimal" way normally -- it could permit a normal index on object_type to be used.

You could shorten it to

where substr( object_type, 1, 4 ) in ( 'PROC', 'FUNC', 'VIEW' )

in your case -- and if you had a function based index on substr(object_type,1,4) it could use an index.

EXISTS and FTS

May 14, 2003 - 6:57 pm UTC

Reviewer: Nitin from Atlanta

I tried the following.
SQL> create table outer (a number);

Table created.

SQL> create index outer_ind on outer (a);

Index created.

SQL> create table inner (b number);

Table created.

SQL> create index inner_ind on inner (b);

Index created.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER'
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE)


SQL> analyze table outer compute statistics;

Table analyzed.

SQL> analyze table inner compute statistics;

Table analyzed.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=13)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=1 Card=1 Bytes=13)
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)

SQL> insert into outer select OBJECT_ID from user_objects;

40899 rows created.

SQL> commit;

Commit complete.

SQL> analyze table outer compute statistics;

Table analyzed.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2045 Bytes=81
          80)

   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=4 Card=2045 Bytes=8
          180)

   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)


Should it not perform an index scan on the OUTER table?

Regards

Nitin 

Tom Kyte

Followup  

May 14, 2003 - 11:46 pm UTC

why would it?

every row must be inspected.
the rows are very narrow
the table is SMALLER then the index (the index has 2 columns -- the column and the rowid, the table is just a column)

use "real" examples.


ops$tkyte@ORA920LAP> create table t1 as select * from all_objects;

Table created.

ops$tkyte@ORA920LAP> create index t1_idx on t1(object_id);

Index created.

ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA920LAP> create table t2 as select * from all_objects where 1=0;

Table created.

ops$tkyte@ORA920LAP> create index t2_idx on t2(object_id);

Index created.

ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> set autotrace traceonly explain
ops$tkyte@ORA920LAP> select * from t1
  2  where exists ( select null from t2 where t2.object_id = t1.object_id );

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=168 Card=82 Bytes=8938)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
   2    1     NESTED LOOPS (Cost=168 Card=82 Bytes=8938)
   3    2       SORT (UNIQUE)
   4    3         TABLE ACCESS (FULL) OF 'T2' (Cost=2 Card=82 Bytes=1066)
   5    2       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=1)



ops$tkyte@ORA920LAP> set autotrace on
 

Still FTS

May 15, 2003 - 12:17 pm UTC

Reviewer: Nitin

I replicated all the steps performed by you. And I am getting the following execution plan.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=55 Card=3515 Bytes=2
70655)

1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'T1' (Cost=55 Card=3515 Bytes=270
655)

3 1 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card
=1 Bytes=13)


The db is 8.1.7.4. Is is possible that there is some init.ora parameter which is dictating the above FTS execution plan?

Thanks

Tom Kyte

Followup  

May 15, 2003 - 5:57 pm UTC

that is fundementally the same plan I had -- so?

it needs to full scan, mine full scanned.

tell me -- what would the "straw man plan" you think would be better be exactly?

Still FTS

May 15, 2003 - 7:12 pm UTC

Reviewer: Nitin

The difference is that the optimizer is choosing to perform FTS on T1 (which is huge) and use index scan on T2 which is very small.

Regards



Tom Kyte

Followup  

May 15, 2003 - 7:34 pm UTC

ops$tkyte@ORA817DEV> set autotrace traceonly explain
ops$tkyte@ORA817DEV> select * from t1, ( select distinct object_id from t2 ) t2
  2  where t1.object_id = t2.object_id;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=109)
   1    0   NESTED LOOPS (Cost=5 Card=1 Bytes=109)
   2    1     VIEW (Cost=3 Card=1 Bytes=13)
   3    2       SORT (UNIQUE) (Cost=3 Card=1 Bytes=13)
   4    3         INDEX (FULL SCAN) OF 'T2_IDX' (NON-UNIQUE)
   5    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=26793 Bytes=2572128)
   6    5       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=26793)



in 8i seems to be the way to go -- things change, the optimizer gets smarter 

What about with updates?

May 21, 2003 - 2:16 am UTC

Reviewer: Brett from Australia

Tom,

How does IN with a subquery affect performance of updates - is the subquery executed once for each row (which could be really inefficient), or once for the query. eg:

update table1
set column_1 = 'Y'
where row_id in (select row_id
from table1 t1,
table2 t2
where t1.f_key = t2.row_id
and t2.class_cd = 'Address'
and t2.created > sysdate - 30);

Is this subquery executed only once for the entire update? Or would it be better to use the subquery to insert the data into a temporary table with a unique constraint/primary key and then use an updatable view based on table1 and the temporary table?

Tom Kyte

Followup  

May 21, 2003 - 8:36 am UTC

the subquery, since it is not a correlated subquery, would tend to be done "once"



Now, if t2.row_id is UNIQUE, and there is a unique/primary key on it I would code:

ops$tkyte@ORA920> update ( select column_1
  2             from t1, t2
  3                    where t1.f_key = t2.row_id
  4                      and t2.class_cd = 'Address'
  5                          and t2.created > sysdate-30 )
  6    set column_1 = 'Y'
  7  /


 

Are these same?

May 22, 2003 - 12:48 pm UTC

Reviewer: Nitin

Are the following SQL's similar? Will they retrieve same set of data under all circumstances?

1.

select
APRCN_ID_BI, APPRAISAL_DT, RANK_ID
from APPRECIATN_TBL WHERE
(APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID) in ( SELECT
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')


2.

select
APRCN_ID_BI, APPRAISAL_DT, RANK_ID
from APPRECIATN_TBL WHERE
EXISTS ( SELECT
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')


Regards


Tom Kyte

Followup  

May 23, 2003 - 8:25 am UTC

they are the same, but the first is less efficient by far. a correlated "IN", wouldn't do that.

query

May 22, 2003 - 7:09 pm UTC

Reviewer: mo

Tom:

Do you know why the first query does not produce any results while the other does:

select request_id from shipm_request where request_id not in (Select distinct request_id from shipment);

select request_id from shipm_request a where request_id not exists (select request_id from shipment b where a.request_id = b.request_id);

Table shipm_request(request_id) has one to many relationship with table shipment(shipment_id).
Thanks

Tom Kyte

Followup  

May 23, 2003 - 8:39 am UTC

lookup upwards for this text:

No, partially because NOT IN and NOT EXISTS are not substitutes for eachother!

and read that. it is all about NULL

Proof IN / EXISTS

August 29, 2003 - 1:13 pm UTC

Reviewer: Marcio from br

Tom, I did test to prove to my colleague the theory of IN/EXISTS exposed here (He just ignore it and ask to developers to use IN always -- to improve performance).
But when I executed -- to me they are same!
take a look

ops$t_mp00@MRP9I1> drop table small;

Table dropped.

ops$t_mp00@MRP9I1> drop table big;

Table dropped.

ops$t_mp00@MRP9I1> ed teste_in_exist

ops$t_mp00@MRP9I1> create table big as select * from all_objects;

Table created.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

31317 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

62634 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

125268 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> create index big_idx on big(object_id);

Index created.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> create table small as select * from all_objects where rownum < 100;

Table created.

ops$t_mp00@MRP9I1> create index small_idx on small(object_id);

Index created.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> analyze table big compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /

Table analyzed.

ops$t_mp00@MRP9I1> analyze table small compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /

Table analyzed.

ops$t_mp00@MRP9I1> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production

c:\migracao\loads>plus /

SQL*Plus: Release 9.2.0.2.0 - Production on Fri Aug 29 14:07:08 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production

ops$t_mp00@MRP9I1> @teste_in_exist
ops$t_mp00@MRP9I1> rem create table big as select * from all_objects;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem create index big_idx on big(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem create table small as select * from all_objects where rownum < 100;
ops$t_mp00@MRP9I1> rem create index small_idx on small(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem analyze table big compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1> rem analyze table small compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = true;

Session altered.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2 from big
3 where object_id in ( select object_id from small )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2 from big
3 where exists ( select null from small where small.object_id = big.object_id )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2 from small
3 where object_id in ( select object_id from big )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2 from small
3 where exists ( select null from big where small.object_id = big.object_id )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = false;

Session altered.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> set echo off
ops$t_mp00@MRP9I1> host tkprof e:\oracle\admin\mrp9i1\udump\mrp9i1_ora_1876.trc sys=n output=tkprof.txt explain=/

TKPROF: Release 9.2.0.2.1 - Production on Fri Aug 29 14:07:57 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

ops$t_mp00@MRP9I1>

tkprof.txt
==========


select count(subobject_name)
from big
where object_id in ( select object_id from small )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 7.11 7.45 0 3424 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 7.13 7.47 0 3424 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=3424 r=0 w=0 time=7457761 us)
792 NESTED LOOPS SEMI (cr=3424 r=0 w=0 time=7454167 us)
250536 TABLE ACCESS FULL BIG (cr=3422 r=0 w=0 time=1579621 us)
792 INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2667651 us)(object id 40890)


Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
792 NESTED LOOPS (SEMI)
250536 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'BIG'
792 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
(NON-UNIQUE)

********************************************************************************

select count(subobject_name)
from big
where exists ( select null from small where small.object_id = big.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 6.99 6.99 0 3421 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 7.00 7.00 0 3421 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=3421 r=0 w=0 time=6995659 us)
792 NESTED LOOPS SEMI (cr=3421 r=0 w=0 time=6992099 us)
250536 TABLE ACCESS FULL BIG (cr=3419 r=0 w=0 time=1473069 us)
792 INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2557279 us)(object id 40890)


Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
792 NESTED LOOPS (SEMI)
250536 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'BIG'
792 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
(NON-UNIQUE)

********************************************************************************

select count(subobject_name)
from small
where object_id in ( select object_id from big )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.00 0 105 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.02 0.01 0 105 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=105 r=0 w=0 time=7280 us)
99 NESTED LOOPS SEMI (cr=105 r=0 w=0 time=6659 us)
99 TABLE ACCESS FULL SMALL (cr=4 r=0 w=0 time=886 us)
99 INDEX RANGE SCAN BIG_IDX (cr=101 r=0 w=0 time=3305 us)(object id 40888)


Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
99 NESTED LOOPS (SEMI)
99 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'SMALL'
99 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)


********************************************************************************

select count(subobject_name)
from small
where exists ( select null from big where small.object_id = big.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.00 0 105 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.01 0 105 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=105 r=0 w=0 time=7255 us)
99 NESTED LOOPS SEMI (cr=105 r=0 w=0 time=6643 us)
99 TABLE ACCESS FULL SMALL (cr=4 r=0 w=0 time=863 us)
99 INDEX RANGE SCAN BIG_IDX (cr=101 r=0 w=0 time=3243 us)(object id 40888)


Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
99 NESTED LOOPS (SEMI)
99 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'SMALL'
99 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

To me they are same!
Any thought?


Tom Kyte

Followup  

August 29, 2003 - 1:17 pm UTC

yes, the cbo is much much smarter then the rbo ever was, i should have caveated that. using the rbo -- very different. using the cbo -- all fast, you need not be as concerned about small things like this as much

Query Result

September 03, 2003 - 11:10 pm UTC

Reviewer: AVS from India

Will the two queries metioned below fetch me the same result ?


Query : 1 SELECT /*+ index(ord_l2) */
MAX(ord.data_date) data_date,
us.person_id,
MAX(us.level_1) ouc,
MAX(ord.taker_user_id) css_user_id,
MAX(ord.week_number) week_no,
SUM(DECODE(ord.create_type,'M',
DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
ord_l2.quantity)) total_quantity,
SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
mv1.measure measure,
us.mgr_ouc
FROM poc1 ord,
fcih us,
polc1 ord_l2,
fmv mv1
WHERE ord.taker_user_id = us.css_user_id
AND ord.create_type IN ('P', 'G', 'M')
AND ord.data_date = report_date
AND ord.sub_zone LIKE report_sub_zone
AND ord.order_status != 'OL'
AND ord_l2.input_cont_type IN ('O', 'R', 'S1')
AND ord_l2.order_number = ord.order_number
AND ord_l2.data_date = ord.data_date
AND ord_l2.sub_zone = ord.sub_zone
AND ord_l2.product_id = mv1.column_value
AND ord_l2.action_id NOT IN ('C', 'R', 'S')
AND ord_l2.status != 'X'
AND mv1.source_system = 'FCH'
AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
AND mv1.column_name = 'PRODUCT_ID'
AND EXISTS (SELECT 'x'
FROM polc1 ord_l,
fmv mv2
WHERE ord_l.input_cont_type IN ('O', 'R', 'S1')
AND ord_l.order_number = ord.order_number
AND ord_l.data_date = ord.data_date
AND ord_l.sub_zone = ord.sub_zone
AND ord_l.product_id = mv2.column_value
AND ord_l.action_id NOT IN ('C', 'R', 'S')
AND ord_l.status != 'X'
AND mv2.source_system = 'FCH'
AND (mv2.measure ='OSM00100' OR mv2.measure LIKE ('OSM0009%'))
AND mv2.column_name = 'PRODUCT_ID'
AND EXISTS (SELECT 'x'
FROM fmv mv3
WHERE ord_l.sales_channel = mv3.column_value
AND mv3.source_system = 'FCH'
AND mv3.measure = mv2.measure
AND mv3.column_name = 'SALES_CHANNEL'))
GROUP BY person_id, measure, mgr_ouc;



Query 2 : SELECT /*+ index(ord_l2) */
MAX(ord.data_date) data_date,
us.person_id,
MAX(us.level_1) ouc,
MAX(ord.taker_user_id) css_user_id,
MAX(ord.week_number) week_no,
SUM(DECODE(ord.create_type,'M',
DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
ord_l2.quantity)) total_quantity,
SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
mv1.measure measure,
us.mgr_ouc
FROM poc1 ord,
fcih us,
polc1 ord_l2,
fmv mv1
WHERE ord.taker_user_id = us.css_user_id
AND ord.create_type IN ('P', 'G', 'M')
AND ord.data_date = report_date
AND ord.sub_zone LIKE report_sub_zone
AND ord.order_status != 'OL'
AND ord_l2.input_cont_type IN ('O', 'R', 'S1')
AND ord_l2.order_number = ord.order_number
AND ord_l2.data_date = ord.data_date
AND ord_l2.sub_zone = ord.sub_zone
AND ord_l2.product_id = mv1.column_value
AND ord_l2.action_id NOT IN ('C', 'R', 'S')
AND ord_l2.status != 'X'
AND mv1.source_system = 'FCH'
AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
AND mv1.column_name = 'PRODUCT_ID'
AND EXISTS (SELECT 'x'
FROM fmv mv3
WHERE ord_l.sales_channel = mv3.column_value
AND mv3.source_system = 'FCH'
AND mv3.measure = mv1.measure
AND mv3.column_name = 'SALES_CHANNEL')
GROUP BY person_id, measure, mgr_ouc;



Tom Kyte

Followup  

September 04, 2003 - 8:52 am UTC

i don't know, hows about you tell us what is different between them.

they are pretty big - point out the salient changes and why you made them.

Query Tuning (Rewriting).

September 05, 2003 - 9:22 am UTC

Reviewer: AVS from India

The Query 1 looks like

SELECT XXXXXX FROM A,B,C,D /* PART 1 of QUERY 1 */
WHERE CONDITION1
AND CONDITION2
AND CONDITION3
AND CONDITION4
AND CONDITION5
AND CONDITION6
AND CONDITION7
AND CONDITION8
AND CONDITION9
AND CONDITION10
AND CONDITION11
AND CONDITION12
AND CONDITION13
AND CONDITION14
AND CONDITION15
AND EXISTS
(SELECT x FROM C,D /* PART 2 of QUERY 1 */
WHERE CONDITION1
AND CONDITION2
AND CONDITION3
AND CONDITION4
AND CONDITION5
AND CONDITION6
AND CONDITION7
AND CONDITION8
AND CONDITION9
AND EXIST
(SELECT 'x' FROM C,D /* PART 3 of QUERY 1 */
WHERE CONDITION1
AND CONDITION2
AND CONDITION3
AND CONDITION4 )

I found that the PART 2's (FIRST EXISTS) WHERE conditions is repeatation of PART1's condtions. Hence I want to eliminate it, which I've done in QUERY 2.

The PART1 of Query1 without EXISTS fetch me reults in 5 minutes but with SINGLE EXISTS (QUERY 2) it runs for 4 hours .

So I wrote another yet query by creating a new table, NEWTAB


/* CREATE TABLE NEWTAB AS
SELECT * FROM fch
WHERE SOURCE_SYSTEM = 'FCH'
and COLUMN_NAME = 'SALES_COLUMN' */

Query 3 :

SELECT /*+ index(ord_l2) */
MAX(ord.data_date) data_date,
us.person_id,
MAX(us.level_1) ouc,
MAX(ord.taker_user_id) css_user_id,
MAX(ord.week_number) week_no,
SUM(DECODE(ord.create_type,'M',
DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
ord_l2.quantity)) total_quantity,
SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
mv1.measure measure,
us.mgr_ouc
FROM PSTN_ORDER_C_1@qdwp ord,
fch_user_id_hier@qdwp us,
PSTN_ORDER_LINE_C_1@qdwp ord_l2,
fch_measure_values@qdwp mv1, NEWTAB mv3
WHERE ord.taker_user_id = us.css_user_id
AND ord.create_type IN ('P', 'G', 'M')
AND ord.data_date = report_date
AND ord.sub_zone LIKE report_sub_zone
AND ord.order_status != 'OL'
AND ord_l2.input_cont_type IN ('O', 'R', 'S1')
AND ord_l2.order_number = ord.order_number
AND ord_l2.data_date = ord.data_date
AND ord_l2.sub_zone = ord.sub_zone
AND ord_l2.product_id = mv1.column_value
AND ord_l2.action_id NOT IN ('C', 'R', 'S')
AND ord_l2.status != 'X'
AND mv1.source_system = 'FCH'
AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
AND mv1.column_name = 'PRODUCT_ID'
AND ord_l2.sales_channel = mv3.column_value
AND mv1.measure = mv3.measure
GROUP BY person_id, mv1.measure, mgr_ouc;


This query (QUERY 3) also takes 4 hours.

Hope you are now clear, what I want to achieve !
Please Help.

Tom Kyte

Followup  

September 05, 2003 - 4:17 pm UTC

what do you want to achieve?

help with what?

if this is "please tune my query" -- not here, this is for followups on the original question.

Query Rewriting

September 08, 2003 - 8:22 am UTC

Reviewer: AVS from India

Yes you are right I was trying to tune the query, for which I required your help. Since it consisted of removing of EXIST clause from one of the query I thought its continuation of the original subject(IN & EXISTS). If it was annoying I'm sorry.


Query Execution takes forever using IN operator

November 07, 2003 - 2:52 pm UTC

Reviewer: TS from USA

Tom,

I 've a query like this:-

select id from table1
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', '5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', '60973', '34987', '37586', '58985', '34874', '49161', '11489');

the name column which has an intermedia index is being used during query execution but the index on ID is not being used.
The query takes a long time to execute. I tried using
RBO, but still was slow. Eventually I tried deleting the
statistics using "analyze table1 delete statistics" and the
query was very fast. So,do I need to delete the statistics
to make it run faster(my only concern is this query might run faster after stats delete but some other query hitting the same table might become slower),or is there a way to rewrite the query to make it run faster?

Thanks for your help.
TS

Tom Kyte

Followup  

November 07, 2003 - 5:24 pm UTC

well, a query with contains is going to use CBO no matter what (so, you never used RBO, cannot be done!)

Hows about you show us

a) tkprof AND autotrace traceonly explain of the query with stats
b) tkprof AND autorace traceonly explain of the query without stats


Followup: Query Execution takes forever using IN operator

November 12, 2003 - 11:25 am UTC

Reviewer: TS from USA

Tom,

Here 's the tkprof output running the query,

Without Stats:-
==============
select id from table1
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099',
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700',
'60973', '34987', '37586', '58985', '34874', '49161', '11489');

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
ytes=88)

2 1 DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)

from tkprof output:-
===================
EXPLAIN PLAN SET STATEMENT_ID='PLUS1583908' FOR select id from table1
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099',
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700',
'60973', '34987', '37586', '58985', '34874', '49161', '11489');

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 92 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 92 0 0

With Stats:-
==========
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
ytes=88)

2 1 DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)




Statistics
----------------------------------------------------------
1313 recursive calls
1 db block gets
1715 consistent gets
205 physical reads
9568 redo size
6729 bytes sent via SQL*Net to client
4967 bytes received via SQL*Net from client
67 SQL*Net roundtrips to/from client
16 sorts (memory)
0 sorts (disk)
0 rows processed

tkprof output:-
=============
select id from table1
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099',
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700',
'60973', '34987', '37586', '58985', '34874', '49161', '11489');


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 33 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 33 0 0


Thanks,
TS

Tom Kyte

Followup  

November 12, 2003 - 4:34 pm UTC

i want the RAW tkprof with timings -- no explain plans (they can lie). show me the real tkprof, making sure of course that timed statistics was ON

Thanks for the insight

December 30, 2003 - 4:20 pm UTC

Reviewer: dharma from CA USA

HI Tom,

I have a table like this(A fictitious though)
company_id deptno empno status
1 10 1 A
1 10 2 A
1 10 3 P
1 20 4 A
1 20 5 A
1 30 6 A
1 30 7 P
2 10 8 A
2 10 9 A
2 20 10 P
2 30 11 A

Primary Key on company_id,deptno,empno

To get the depts that dont have employees with status=P.

Is there anyother way other than using not exists to bring the required rows

SELECT DISTINCT deptno FROM COMP_EMP ce
WHERE NOT EXISTS (SELECT 1 FROM COMP_EMP ces
WHERE ce.company_id=ces.company_id
AND ce.deptno=ces.deptno
AND ces.status='P')
since it does a sub-query on the table comp_emp and comp_emp has a million rows, will "not exists" withstand the query.

BTW
Database is Oracle9i Enterprise Edition Release 9.2.0.4.0

Tom Kyte

Followup  

December 30, 2003 - 4:57 pm UTC

select deptno, count( decode(status,'P',1) )
from comp_emp
group by deptno
having count( decode( status,'P',1 ) ) = 0;

will do it as well. there are dozens of ways....


select deptno from comp_emp
minus
select deptno from comp_emp where status = 'P';

is another....

ORA-01415 too many distinct aggregate functions..

January 02, 2004 - 6:23 am UTC

Reviewer: Naveen from India

Hi Tom,

I have a select statement which has a IN clause like below:

Select <column list>
from <tablename>
where column1 in(1,2,3,..)

When i am testing it by giving more and more values in the IN clause it gave the above error. Why is there a limit for values that can be set in a IN clause. Is the above approach recommended. Is there a way to overcome this. Kindly throw some light on the issue.

Thank you
Nav.

Tom Kyte

Followup  

January 02, 2004 - 9:39 am UTC

there is a documented limited on the number of OR's and inlist items.

use a subquery (you probably aren't using binds either are you!).

create global temporary table t ( x int ) on commit delete rows;

and then INSERT the values into t and query simply


select * from table where c in ( select * from t );



Thanks Tom

January 02, 2004 - 11:50 am UTC

Reviewer: Naveen from India

Hi Tom,

Yes you are correct. I am not using bind variables either. I am checking the limit of IN clause as i have to decide which approach i need to take. I'll keep this in mind and thanks again for immediate reply.

Thank you
Nav.

Tom Kyte

Followup  

January 02, 2004 - 1:46 pm UTC

not using bind variables will be the largest mistake of your entire career! rethink that.

interesting ?

February 03, 2004 - 6:39 am UTC

Reviewer: Fenng from CNOUG China

SQL> desc a
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X                                                  NUMBER(38)
 Y                                                  VARCHAR2(1)

SQL> desc b
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X1                                                 NUMBER(38)
 Y1                                                 VARCHAR2(1)

SQL> select x from a
  2   where x not in (select x from b);

no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'A'
   3    1     FILTER
   4    3       TABLE ACCESS (FULL) OF 'B'





SQL> 

It's OK.But "x" is NOT table B's field.Would you like explain it for us ? Thanks a lot ! 

Tom Kyte

Followup  

February 03, 2004 - 7:57 am UTC

it is expected, done by design, part of the standard.

it is simply called a correlated subquery.


consider:

select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno )
                                                       ^^^^^^^^^^^

you expect that to work right?  well:

SQL> select x from a
  2   where x not in (select x from b);

<b>is not any different than:</b>

SQL> select x from a
  2   where x not in (select <b>A.X</b> from b); 

Thank you!

February 03, 2004 - 9:06 pm UTC

Reviewer: Fenng

Thank you very much~!

How to optimize this query?

April 14, 2004 - 9:32 am UTC

Reviewer: Tony from India, Chennai

Tom,
I want the rows from table cust_trans that are not in both cust_mast and customer_master@db_link1 tables. The query is

SELECT a.cust_id, a.cust_name, a.trans_date
FROM cust_trans SUBPARTITION(sp001) a
WHERE( a.cust_id
NOT IN(SELECT b.cust_id
FROM cust_mast SUBPARTITION(sp001) b )
AND a.cust_id
NOT IN(SELECT c.cust_id
FROM customer_master@db_link1 c )
);

The table customer_master is accessed through db link and it has millions of records. Other tables also have millions of records. The subpartiton name is dynamically passed to the stored procedure. Thus I'm going to use dynamic cursor or Execute immediate. Is there any way to optimize this query?


Tom Kyte

Followup  

April 14, 2004 - 10:51 am UTC

got a query plan for us to look at? (set autotrace traceonly explain would be best)

exists is NOT the same as join right?

April 28, 2004 - 4:55 am UTC

Reviewer: A reader

Hi

I always thought that an EXISTS can be converted to a JOIN until I find this query

select count(*)
FROM OM_WELCOME_PACK t
WHERE EXISTS (SELECT 1
FROM OM_EMF e, LISTA_ORDENES l
WHERE t.emf_ext_id=e.external_id
AND e.order_id=l.order_id
AND l.order_type_id=3)

COUNT(*)
----------
99

If I convert this to a join

select count(*)
FROM OM_WELCOME_PACK t, OM_EMF e, LISTA_ORDENES l
WHERE t.emf_ext_id=e.external_id
AND e.order_id=l.order_id

COUNT(*)
----------
163

I get duplicate rows, is this correct behaviour...?

Jr. Oracle Developer

May 24, 2004 - 5:29 pm UTC

Reviewer: Dave from California, USA

Hello Tom,

I need to combine three select statement in script1/select1.
Two columns (PCS, Speciality) of script1 should be base on the condition define in script 2 & script 3.
Columns alias are "PCS" and "Specialty"
PCS should be base on script two and Speciality should be base on script 3.
Please have a look.
Script 1: (Main Script)

select a.ref_patient_fname|| ' , ' || a.ref_patient_lname "Patient Name",
to_char( a.ref_patient_dob, 'mm/dd/yy' ) DOB,
a.ref_patient_id1 "OVMC ID",
d.site_name || ' , '|| b. clact_desc "PCS",
d.site_name || ' , '|| b. clact_desc "Specialty",
to_char(c.appt_date , 'HH:MI AM' ) Appointments
from dhs_referrals a,
dhs_clinical_activities b,
dhs_appointments c,
dhs_sites d
where a.ref_clact_id_requested = b.clact_id and
a.ref_clact_id_requesting = b.clact_id and
c.appt_ref_id = a.ref_id and
d.site_id = b.clact_site_id

Script 2:

select c.site_name || ' , '|| b. clact_desc "PCS" , a.ref_clact_id_requesting
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requesting = b.clact_id

Script 3
select c.site_name || ' , '|| b. clact_desc "Speciality" , a.ref_clact_id_requested
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requested = b.clact_id

Any help will be appreciated.

Dave


Tom Kyte

Followup  

May 24, 2004 - 7:12 pm UTC

don't understand the question. I see three queries but I've not any idea what you are trying to do with them really?

Jr. Oracle Developer

May 24, 2004 - 7:38 pm UTC

Reviewer: Dave from CA, USA

Sorry for inconvinince.

Actually, On my script1.
I have two columns "PCS" and "Speciality".
I need to call values for "PCS" and "Speciality" based on script 2 & script 3 respectively.

e.g:

select c.site_name || ' , '|| b. clact_desc "PCS"
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requesting = b.clact_id

will give me the values for "PCS" in script1.

and

select c.site_name || ' , '|| b. clact_desc "Speciality"
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requested = b.clact_id

will give me values for "Speciality" in Script1.

Please note: I used "PCS" and "Speciality" as a column aliases in all scripts.

cheers,
Dave



Tom Kyte

Followup  

May 24, 2004 - 11:44 pm UTC

and what does:

I need to call values for "PCS" and "Speciality" based

mean exactly? I mean you have two spool files maybe, one full of column "X" and one full of column "Y". Now what? what is the logic to use that?

Jr. Oracle Developer

May 25, 2004 - 12:44 pm UTC

Reviewer: Dave from CA, USA

No I do not have any spool file etc.
Here what I wanted to do.


e.g:


I have main script which is consists of three columns.

select a.DOB,
b.clact_desc||' , '||c.site_name "PCS",
b.clact_desc||' , '||c.site_name "Speciality"
from dhs_referrals a, dhs_clinical b, dhs_site c
where .....

In dhs_referrals table I also have two more columns called clact_requested_id & clact_requesting_id.

Here is the logic I wanted to put for column 2 & 3 respectively.

where b.clact_id = a.clact_requesting_id
gives me the value for PCS. (Column 2 in the main script above)

or

where b.clact_id = a.clact_requested_id gives me the value for Speacility. (Column 3 in the main script above)

I need your help to add above two logic in the main select script.
















Tom Kyte

Followup  

May 25, 2004 - 12:58 pm UTC

sorry -- i don't think we are going to get anywhere here. Your terminology is just not anything I'm understanding. I do not see the tie in between columns in a select and separate predicates.

Join or sub-query

August 20, 2004 - 5:01 am UTC

Reviewer: atul from India

Hi,

What is fast Join or subquery??
Under what conditions??

Thanks,
Atul

Tom Kyte

Followup  

August 20, 2004 - 11:14 am UTC

er? does not compute.

How to update a row if the row exist if not insert?

September 23, 2004 - 10:09 pm UTC

Reviewer: Vinod from Banglore, India

Tom - I want to update a row if the row exist if not insert that row. Is there a simple way to do this? I was doing if exist then update if not insert but I would like to try it in one single DML if possible.

Vinod

Tom Kyte

Followup  

September 24, 2004 - 10:35 am UTC

in theory you could use merge

merge into t
into ( select :x x, :y y, :z z from dual ) d
on ( t.x = d.x )
when matched then update set ...
when not matched then insert....

but

update t set y = :y, z = :z where x = :x;
if (sql%rowcount = 0)
then
insert into t (...) values (....);
end if;

it about the same in this case.


What about "=" and "LIKE" comparison operators

October 11, 2004 - 3:15 pm UTC

Reviewer: denni50 from na

Hi Tom....Welcome back!....(we MISSED you!)

I know the discussion here involves NOT,EXISTS,NOT IN and
NOT EXISTS operators....however I recently stumbled upon a
situation where using the "=" comparison caused the
system to come to a grinding halt. When I changed the
operator to LIKE 'A%' instead of = 'A' the system performed
normally.

We recently converted test system to CBO and encountered
this situation for the first time when testing one of
our third party software modules where users are allowed
to insert additional sql statements. With RBO users would
enter: where col_name='A' and everything would work fine.

This was discovered when investigating why this function performed normally during the first phase, then came to a grinding slowdown during the second phase and saw where some users entered col_name LIKE 'A%' and others had entered col_name ='A'(this is when the system would virtually stop). When I changed all the ='A' to LIKE 'A%'everything ran smoothly.

Have you ever encountered performance issues between = and
LIKE?

glad you made it "home" safe and sound.






Tom Kyte

Followup  

October 11, 2004 - 4:56 pm UTC

how do you gather statistics -- exact command.

when you do an autotrace traceonly explain on the two -- what do you see as differences? is the CARD=xxxx part of the query even close to reality.

OEM SQL ANALYZE

October 11, 2004 - 5:18 pm UTC

Reviewer: denni50 from na

I ran a sql analyze through OEM on both statements
the cost was around 10450-14512 for both sql statements.
Can't post results due to proprietary constraints.

There's alot of nested loops and bitmap conversions
from rowids with the largest table(approx 8+ million
records) doing full table scans.



Tom Kyte

Followup  

October 11, 2004 - 7:47 pm UTC

cost is not meaningful.

the CARD= parts -- are they even *close* to reality.

given you stats method of collection (no histograms -- just one bucket with method_opt at its default) the optimizer is using "distributions and best guesses".

My guess is, 'A' has lots and lots of values.

where column = 'A' is guessing "too low" (no histograms)

where column like 'A%' is guessing "higher, closer to reality"




Gather statistics

October 11, 2004 - 5:21 pm UTC

Reviewer: denni50 from na

I use the dbms_stats package:
gather_schema_stats cascade>=true



thanks Tom

October 12, 2004 - 9:43 am UTC

Reviewer: denni50 from na

I understand what you're saying about the histograms.
While you were away I experimented and ran statistics(dbms_stats) on the schema and used the method_opt SIZE AUTO on Indexed Columns parameter to gather histogram stats. Upon completion everything I tried to run, simple sql statements and queries took forever to run. I would log into the software application and everything seemed to just hang. I deleted all the histograms, re-ran table and index stats only and performance went back to normal processing
except for this latest incident with the = and LIKE operators.

I'm still trying to learn and understand how histograms
affect and work within CBO. The other frustrating part
is that I'm dealing with code that is not mine so tuning
it is out of the question. All I can do is run my own
analyze and trace reports...send it to the vendor and
HOPE someone figures out the problem.

I am going to re-run the trace reports through sqlplus
instead of OEM since the OEM does not show the card=
(at least I didn't see it unless it is represented under
a different header).

I know "cost" is not what is important. Cost is just an
arbitrary number assigned to a particular query at a
particular time in a given environment that the
optimizer chooses as the lowest cost.

I've learned quite a bit in 2+ years but still have a
heck of alot more to learn and understand as far as CBO
is concerned.








Explain Plan

October 15, 2004 - 3:54 pm UTC

Reviewer: denni50 from na

Tom

attached is the explain plan for the scenario I've been describing in the earlier threads here about the '=' and
'LIKE' operators.

(had to modify the names of tables,columns and indexes
since this is not my code.)


Table1 has approx 3.2 million records
Table2 is a temporary table(just for this function)
Table3 has 8+ million records.

Just by looking at the explain plan can you tell what
is going on. This is a new procedure that's been added
to a newer version of software that is causing the functionality to take 3 times longer to complete.

thanks


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=15825 Card=468 Bytes
=30888)

1 0 NESTED LOOPS (Cost=15825 Card=468 Bytes=30888)
2 1 HASH JOIN (Cost=15824 Card=468 Bytes=28080)
3 2 NESTED LOOPS (Cost=2714 Card=15279 Bytes=320859)
4 3 SORT (UNIQUE)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=
2649 Card=15279 Bytes=229185)

6 5 INDEX (RANGE SCAN) OF 'TABLE1_COL2_INDEX'
(NON-UNIQUE) (Cost=439 Card=241176)

7 3 INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQUE)
8 2 VIEW (Cost=13084 Card=11085 Bytes=432315)
9 8 FILTER
10 9 SORT (GROUP BY) (Cost=13084 Card=11085 Bytes=31038
0)

11 10 NESTED LOOPS (Cost=10452 Card=230371 Bytes=64503
88)

12 11 TABLE ACCESS (FULL) OF 'TABLE3' (Cost=10450 C
ard=230371 Bytes=5068162)

13 11 INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQ
UE)

14 1 INDEX (UNIQUE SCAN) OF 'PK_TABLE1_ID' (UNIQUE)













Crazy behaviour!!

December 09, 2004 - 7:25 am UTC

Reviewer: Ganesh from Bangalore, India

create table emp (empno number(3), dno number(3))
/
create table dept (dno number(3))
/
select * from emp
where empno in
(select empno from dept)
/

I'm wondering how the above query works when there is no empno field in dept table?

Tom Kyte

Followup  

December 09, 2004 - 2:03 pm UTC

that is not crazy, it is expected and called a correlated subquery.

select *
from emp
where emp.empno in ( select emp.empno
from dept );

makes more sense with the correlation names there.....


just like:


select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno );

the columns of the outer query are available in the inner queries.

why is it so?

December 10, 2004 - 1:46 am UTC

Reviewer: Ganesh from Bangalore, India

Thanks Tom. I guessed that it would be correlated subquery.

By why does oracle apply its intelligence, when I dont mention explicitly emp.empno?



Tom Kyte

Followup  

December 10, 2004 - 10:53 am UTC

confusion -- it did apply "its intelligence"

your query was syntactically correct
your query was valid SQL
your query must be run as is

It did exactly what is it documented, expected and supposed to do. It is not a "bad query", it is not a "mistake". the only mistake here was that you didn't mean to run THAT query.

This is like asking us to raise an error because you query "where 1=0" or something.

delete performance issue using 'IN'

December 13, 2004 - 12:16 pm UTC

Reviewer: A reader

Hi Tom,

I have the following SQL:

DELETE
FROM nitsmedia
WHERE nitstory_srl not in
(
SELECT srl
FROM nitstory
WHERE lower(category) = 'stjohns'
AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
AND last_modified_time <= to_date('MAY-01-2003' || '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
AND story_origin = 'Regional'
);

It takes forever!
Number of rows for inner query: 10 rows
Number of rows for outer query: 15289

However, the following are fast:

(1) The inner query:
SELECT srl
FROM nitstory
WHERE lower(category) = 'stjohns'
AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
AND last_modified_time <= to_date('MAY-01-2003' || '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
AND story_origin = 'Regional';

SRL
----------
70640
70670
70671
70673
70674
70681
70682
70688
70689
70692

10 rows selected.


(2)
DELETE FROM nitsmedia WHERE nitstory_srl IN (70640,70670,70671,70673,70674,70681,70682,70688,70689,70692)

(3) DELETE FROM nitsmedia;

I remember this rule:
If the result of the subquery is small -- then IN is typicaly more appropriate. However, this is not working this time!

Can you shed any light on why it is like this and what I should do?

Here are the info:
Oracle 9.2 on HP-UX
All tabled have been analyzed

Thanks,
- Arash






Tom Kyte

Followup  

December 13, 2004 - 2:00 pm UTC

are you using the CBO
and is sr1 defined as NOT NULL


(and forget that rule of thumb, not really valid in CBO days... this dates way back to the time of the RBO... thinks change.)

December 13, 2004 - 2:13 pm UTC

Reviewer: A reader

Yes Tom. I use the default which is 'choose' as optimizer mode. Srl is not null and there is foreign key between srl and nitstory_srl

and here is the explain plan:
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
          =1223040)

   1    0   DELETE OF 'NITSMEDIA'
   2    1     FILTER
   3    2       NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
          )

   4    3         TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
          89 Bytes=397514)

   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
          2 Card=1 Bytes=54)

   6    5           BITMAP CONVERSION (TO ROWIDS)
   7    6             BITMAP AND
   8    7               BITMAP CONVERSION (FROM ROWIDS)
   9    8                 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
  10    7               BITMAP CONVERSION (FROM ROWIDS)
  11   10                 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
          GIN' (NON-UNIQUE)



SQL> desc nitsmedia
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 NITSTORY_SRL                              NOT NULL NUMBER(12)
 MEDIAFILE_NME                                      VARCHAR2(100)
 MEDIA_TYP                                 NOT NULL VARCHAR2(20)
 THUMBNAIL                                          VARCHAR2(100)
 CAPTION                                            VARCHAR2(1000)
 POSITION                                           NUMBER
 LAYOUT                                             VARCHAR2(10)

SQL> desc nitstory
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 STORYFILE_NME                             NOT NULL VARCHAR2(30)
 AUTHOR                                             VARCHAR2(42)
 STORY_ORIGIN                                       VARCHAR2(10)
 CATEGORY                                           VARCHAR2(30)
 HEADLINE                                           VARCHAR2(200)
 SUBHEADLINE                                        VARCHAR2(200)
 KEYWORDS                                           VARCHAR2(200)
 HISTORY                                            VARCHAR2(200)
 DATELINE                                           VARCHAR2(100)
 HIGHLIGHT                                          VARCHAR2(1200)
 CREATION_TIME                                      VARCHAR2(30)
 LAST_MODIFIED_TIME                                 VARCHAR2(30)
 TIMEZONE                                           VARCHAR2(25)
 FOOTNOTES                                          VARCHAR2(1000)
 BODY                                               CLOB


I am not sure why there lots of "BITMAP CONVERSION" in plan?

Thank you Tom,
- Arash'


 

Tom Kyte

Followup  

December 13, 2004 - 2:49 pm UTC

can we see a tkprof of the "really slow one"

December 14, 2004 - 11:23 am UTC

Reviewer: A reader

Tom,

Here is the tkprof for the query. I canceled the connection in the middle as I told you it takes forever. I also noticed that, it locks both tables 'nitsmedia' and 'nitstory'.

Thanks again for your help and outstanding website.
- Arash

DELETE
FROM nitsmedia
WHERE nitstory_srl not in
(
SELECT srl
FROM nitstory
WHERE lower(category) = 'stjohns'
AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_mo
dified_time
AND last_modified_time <= to_date('MAY-01-2003' || '23:59:59', 'MM-DD-YYYY H
H24:MI:SS')
AND story_origin = 'Regional'
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.11 0.11 0 323 0 0
Execute 1 51.59 52.22 0 2336957 2122 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 51.70 52.34 0 2337280 2122 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 101 (NI_DDL)

Rows Execution Plan
------- ---------------------------------------------------
0 DELETE STATEMENT GOAL: CHOOSE
0 DELETE OF 'NITSMEDIA'
0 FILTER
0 TABLE ACCESS (FULL) OF 'NITSMEDIA'
0 TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY'
0 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORIGIN'
(NON-UNIQUE)




********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.11 0.11 0 323 0 0
Execute 1 51.59 52.22 0 2336957 2122 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 51.70 52.34 0 2337280 2122 0

Misses in library cache during parse: 1


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 29 0.06 0.04 0 147 0 0
Execute 76 0.02 0.01 0 0 0 0
Fetch 287 0.00 0.01 0 323 0 223
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 392 0.08 0.07 0 470 0 223

Misses in library cache during parse: 20

1 user SQL statements in session.
29 internal SQL statements in session.
30 SQL statements in session.
1 statement EXPLAINed in this session.
********************************************************************************
Trace file: /db/9.2/dbo01/log/udump/dbo01_ora_11063.trc
Trace file compatibility: 9.00.01
Sort options: default

1 session in tracefile.
1 user SQL statements in trace file.
29 internal SQL statements in trace file.
30 SQL statements in trace file.
21 unique SQL statements in trace file.
1 SQL statements EXPLAINed using schema:
SYSTEM.prof$plan_table
Default table was used.
Table was created.
Table was dropped.
604 lines in trace file.


Tom Kyte

Followup  

December 15, 2004 - 12:28 pm UTC

sorry -- but a query won't lock any tables.


but something is fishy here.


if you run this script:

drop table emp;
create table emp as select * from scott.emp;
alter session set sql_trace=true;
select count(*) from emp;


then tkprof will say:



select count(*)
from
emp

....

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
14 TABLE ACCESS (FULL) OF 'EMP'




but if you run this one:

drop table emp;
create table emp as select * from scott.emp;
exec dbms_stats.gather_table_stats( user, 'EMP' );
alter session set sql_trace=true;
select count(*) from emp;

tkprof says

select count(*)
from
emp

....
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
14 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'EMP'



But before you showed us:

Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
=1223040)

1 0 DELETE OF 'NITSMEDIA'
2 1 FILTER
3 2 NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
)

4 3 TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
89 Bytes=397514)

5 3 TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
2 Card=1 Bytes=54)

6 5 BITMAP CONVERSION (TO ROWIDS)
7 6 BITMAP AND
8 7 BITMAP CONVERSION (FROM ROWIDS)
9 8 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
10 7 BITMAP CONVERSION (FROM ROWIDS)
11 10 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
GIN' (NON-UNIQUE)


which shows the use of the CBO (the cost= card= is there ONLY when the CBO is used)

so, I'm thinking you have a big mixture of ANALYZED and UNANALYZED tables here the and the optimizer doesn't stand a ghost's of a chance of getting the right answer.

EG: table you are deleting from is analyzed....

These tables you are querying are not (and must be -- if you want NOT IN to run "in your lifetime"






December 15, 2004 - 2:09 pm UTC

Reviewer: A reader

Tom,

All Tables are analyzed, however the problem still there.

ANALYZE TABLE NITSMEDIA COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;

ANALYZE TABLE NITSTORY COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;

I noticed something that may help.

I created two new tables like:
create table NITSTORY1 as select * from NITSTORY;
create table NITSMEDIA as select * from NITMEDIA;

and then ran the query with new table names:
DELETE
FROM nitsmedia1
WHERE nitstory_srl not in
(
SELECT srl
FROM nitstory1
WHERE lower(category) = 'stjohns'
AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
AND last_modified_time <= to_date('MAY-01-2003' || '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
AND story_origin = 'Regional'
)

and it was very FAST. The only difference that I saw between the old tables and new ones was a foreign key between the old tables that new tables didn't have that. So I dropped the foreign key from old tables to make it in sync with the new tables. But still the delete is very slow on old tables.

Please let me know your idea.

Thanks,
- Arash


Tom Kyte

Followup  

December 15, 2004 - 6:14 pm UTC

don't use analyze.

use dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );

so, show us the tkprofs between the two. (i've a feeling that not null column isn't really not null -- eg: maybe it says not null but someone put it on there with "novalidate" or something)

December 16, 2004 - 9:55 am UTC

Reviewer: A reader

Hi Tom,

can you tell me why, when I recreate tables (create table NITSTORY1 as select * from NITSTORY....), it works well?

I also test combination of new tables and old tables in the delete statement, and noticed , the problem is the inner table "NITSTORY' which makes the query slow. And when I replace this table with the new table 'NITSTORY1' , it works fine.

Here are the tables:

create table NITSTORY (
SRL NUMBER(12) not null,
STORYFILE_NME VARCHAR2(30) not null,
AUTHOR VARCHAR2(42),
STORY_ORIGIN VARCHAR2(10),
CATEGORY VARCHAR2(30),
HEADLINE VARCHAR2(200),
SUBHEADLINE VARCHAR2(200),
KEYWORDS VARCHAR2(200),
HISTORY VARCHAR2(200),
DATELINE VARCHAR2(100),
HIGHLIGHT VARCHAR2(1200),
CREATION_TIME VARCHAR2(30),
LAST_MODIFIED_TIME VARCHAR2(30),
TIMEZONE VARCHAR2(25),
FOOTNOTES VARCHAR2(1000),
BODY CLOB,
constraint PK_NITSTORY primary key (SRL)
);

create table NITSMEDIA (
SRL NUMBER(12) not null,
NITSTORY_SRL NUMBER(12) not null,
MEDIAFILE_NME VARCHAR2(100),
MEDIA_TYP VARCHAR2(20) not null,
THUMBNAIL VARCHAR2(100),
CAPTION VARCHAR2(1000),
POSITION NUMBER,
LAYOUT VARCHAR2(10),
constraint PK_NITSMEDIA primary key (SRL)
);

alter table NITSMEDIA
add constraint FK1_NITSTORY_NITSMEDIA foreign key (NITSTORY_SRL)
references NITSTORY (SRL);


As always, thank you for your time,
- Arash


Tom Kyte

Followup  

December 16, 2004 - 10:05 am UTC

not without the tkprofs I asked for, no.

December 16, 2004 - 11:19 am UTC

Reviewer: A reader

Tom.

I used:
dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );

and it fixed the problem. Why is that? Can you elaborate it a little more?

I found that Analyze method from your site. But it seems it doesn’t work well for 9.2!

Thank you again,
- Arash


Tom Kyte

Followup  

December 16, 2004 - 11:46 am UTC

you gathered table and index stats but no column stats -- that gathers table, column and index stats.

analyze table t compute statistics;

would do roughly the same thing, but dbms_stats is more proper.

IN or EXISTS - back on topic !

December 16, 2004 - 3:23 pm UTC

Reviewer: DaPi from Geneva - CH

Hi Tom,

Many thanks for the site. I much appreciate your scientific/analytic approach.

This thread keeps popping to the top even after 4 years. All you say is correct, but there is a problem: I believe the question is not the right one to ask.

Real-life:
I have something to code. I do so to the best of my ability, keeping in mind clarity & maintainability. I test to make sure I have the right result and, if the performance is appropriate, I give thanks and go on to the next task. If it's slow (for the purpose), I quickly try some tricks I have up my sleeve and, if still slow, begin an analysis of what is going "wrong". I can't afford the time to analyse - not ANALYZE :) - every query.

I think the right question to ask is:
Q: I can code my query with IN or with EXISTS; which should I try first?

In 5 years of Rdb and 5 years of Oracle, I developed the habit of coding EXISTS. In those 5 Oracle years I have recoded EXISTS as IN only twice (I'm pretty sure of this because it's been so rare). A colleague with the IN habit, would give me a slow query to tune every week or two; I'd just recode the IN as EXISTS.

OK, I have no statistics, but in my real-life experience the answer is:
A: Try EXISTS first. You may have to recode, but not very often.

Waiting to be blasted - DaPi


Tom Kyte

Followup  

December 16, 2004 - 3:51 pm UTC

A: use the cbo and it'll try out both.


No blasting, I use exists so infrequently and in almost all of the time myself.

IN & EXISTS - CBO tries both?

December 16, 2004 - 4:04 pm UTC

Reviewer: DaPi from Geneva - CH

You said: "A: use the cbo and it'll try out both."

I've been running CBO (7.3.4 then 8.0.6 then 8.1.7) with up-to-date stats all those 5 years - it must miss a trick from time to time, because recoding HAS made a difference in some cases.

I'd agree that in many cases there is NO performance difference between IN and EXISTS (when the CBO gets it right?). But if there is going to be a difference, I'd put my money on EXISTS.

Thanks - DaPi

Tom Kyte

Followup  

December 16, 2004 - 5:10 pm UTC

select *
from big_table
where exists ( select null from small_table where small_table.x = big_table.x)

vs

select * from big_table
where x in ( select x from small_table )


give me the IN any time (in rbo days)

IN vs EXISTS

December 17, 2004 - 3:17 am UTC

Reviewer: DaPi from Geneva - CH

OK - I'll buy that (confirmed by experiment under CBO).

I guess it's that my real-life experience tends not to deal with small tables . . . . and with big tables it hurts even more when you guess wrong!

Tom Kyte

Followup  

December 17, 2004 - 7:58 am UTC

doesn't even have to be small tables, big tables that return smallish results


select * from big_table
where x in ( select x from really_huge_table where <some predicate>)

if some predicate returned just 100 rows from really huge table, IN would have been preferred :)

December 26, 2004 - 2:21 am UTC

Reviewer: Logan

for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop

-----------------------------------------------------------

The explanation you gave for IN & EXISTS being better and in which situation is convincing, but the example loop you gave...

1.Well when you use exists the SQL always returns only one row, is that true i.e. when you used exists to find whether a conditions is true or false. For examples


select decode( count(*), 1, 'y', 'n')
from dual where exists ( select null from t1 , t2 where
where t1.x = t2.x)

I guess you cannot do such a sql using IN?

2. My question is doesnt EXISTS as the name of the clause itself suggests checks for the existense of a condition. Lets say if 10 rows meet that condition, does the exists clause read all the 10 rows, or it returns the result after meeting the condition for the first time.

Tom Kyte

Followup  

December 26, 2004 - 12:40 pm UTC

I was giving psuedo code here to explain the concept.


exists and in can both short circut since the CBO will turn in into exists and exists into in as needed these days. so, it is safe to say that either can stop after the first one -- or get them all and semijoin.

December 26, 2004 - 9:37 pm UTC

Reviewer: Logan

--------------
it is safe to say that either can stop
after the first one -- or get them all and semijoin
-----------

1.Tom , can you please illustrate the same. Can you give an example where an EXISTS stops after the first row, and where it gets all the rows.

2.What do you mean by a semi join?

Tom Kyte

Followup  

December 27, 2004 - 9:39 am UTC

it is that:

select * from t1 where EXISTS ( select NULL from t2 where t2.id = t1.id )

can be written as:

select * from t1 where id in ( select id from t2 );


and the optimizer is free to (re)write it either way.


consider:

ops$tkyte@ORA9IR2> create table t1( id int, y char(2000));
Table created.
 
ops$tkyte@ORA9IR2> create table t2( id int );
Table created.
 
ops$tkyte@ORA9IR2> delete from plan_table;
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
--------------------
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |       |
|*  1 |  FILTER              |             |       |       |       |
|   2 |   TABLE ACCESS FULL  | T1          |       |       |       |
|*  3 |   TABLE ACCESS FULL  | T2          |       |       |       |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter( EXISTS (SELECT 0 FROM "T2" "T2" WHERE "T2"."ID"=:B1))
   3 - filter("T2"."ID"=:B1)
 
Note: rule based optimization
 
17 rows selected.

<b>that shows Oracle doing the "where exists" as a filter -- for every row in T1 it'll run that filter (which results in a full scan of T2 -- but stops after it hits the first "t2.id = :b1" -- so it'll not totally full scan the table each time PERHAPS, just scans till it gets a hit)

But, lets use the more intelligent CBO and tell it about our tables (tell it "t1" is medium size, "t2" is big)</b>
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 5000, numblks=>500);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 50000, numblks=>5000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
 
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
 
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  5000 |   551K|   854 |
|*  1 |  HASH JOIN SEMI      |             |  5000 |   551K|   854 |
|   2 |   TABLE ACCESS FULL  | T1          |  5000 |   488K|    78 |
|   3 |   TABLE ACCESS FULL  | T2          | 50000 |   634K|   760 |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("T2"."ID"="T1"."ID")
 
Note: cpu costing is off
 
16 rows selected.
 
ops$tkyte@ORA9IR2>

<b>Now, it is doing a single full scan of T1 and of T2 (instead of a full scan of T2 for EVERY ROW IN T1) followed by that "semi join" thingy (which I wrote about in Effective Oracle by Design in detail)</b>


The semi join is a join that joins T1 with T2 but only ONCE (if id=55 is in T2 5,000 times -- we don't join the row in T1 with id=55 to each on -- only to the first).  A semi join is a "where exists" style join, just join to the first occurrence

But - we built the entire set.  We read all of T2, got all of the rows and are joining (semi joining) to them
 

how to use index in exists statement?

March 25, 2005 - 12:53 am UTC

Reviewer: A reader

Tom,
I have the delete on the large table pa_budget_lines.
I want to use the index on resource_assignment_id on the table pa_budget_lines. It seams impossible?
thanks a lot!


DELETE FROM pa_budget_lines bl
where EXISTS (SELECT 'X'
FROM pa_resource_assignments ra,
pa_budget_versions bv
WHERE bv.current_flag <> 'Y'
AND bv.budget_status_code = 'B'
AND bv.baselined_date IS NOT NULL
AND bl.resource_assignment_id = ra.resource_assignment_id
AND bv.budget_version_id = ra.budget_version_id)
/

Tom Kyte

Followup  

March 25, 2005 - 7:48 am UTC

are you using the cbo.

March 25, 2005 - 9:02 pm UTC

Reviewer: A reader

"
are you using the cbo.
"

Yes, SIr. it the cost based opt on 8174.

Tom Kyte

Followup  

March 26, 2005 - 9:08 am UTC

then lets see the autotrace traceonly explain output. perhaps the index would not be useful.

March 26, 2005 - 11:17 am UTC

Reviewer: A reader

the autotrace as the follow.
there 2 millions rows in pa_budget_lines , 164k in pa_resource_assignments and 13k in pa_budget_versions .

APPS@S0P2DBT3> DELETE FROM pa_budget_lines bl
2 where EXISTS (SELECT 'X'
3 FROM pa_resource_assignments ra,
4 pa_budget_versions bv
5 WHERE bv.current_flag <> 'Y'
6 AND bv.budget_status_code = 'B'
7 AND bv.baselined_date IS NOT NULL
8 AND bl.resource_assignment_id = ra.resource_assignment_id
9 AND bv.budget_version_id = ra.budget_version_id)
10 /

432 rows deleted.


Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=11837 Card=102454 Bytes=2868712)

1 0 DELETE OF 'PA_BUDGET_LINES'
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'PA_BUDGET_LINES' (Cost=11837 Card=102454 Bytes=2868712)

4 2 NESTED LOOPS (Cost=4 Card=1 Bytes=24)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'PA_RESOURCE_ASSIGNMENTS' (Cost=3 Card=1 Bytes=10)

6 5 INDEX (UNIQUE SCAN) OF 'PA_RESOURCE_ASSIGNMENTS_U1' (UNIQUE) (Cost=2 Card=2)

7 4 TABLE ACCESS (BY INDEX ROWID) OF 'PA_BUDGET_VERSIONS' (Cost=1 Card=1085 Bytes=15190)

8 7 INDEX (UNIQUE SCAN) OF 'PA_BUDGET_VERSIONS_U1' (UNIQUE)





Statistics
----------------------------------------------------------
1329 recursive calls
8377 db block gets
1314941 consistent gets
84134 physical reads
586996 redo size
515 bytes sent via SQL*Net to client
939 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
432 rows processed

APPS@S0P2DBT3>
APPS@S0P2DBT3> rollback
2 /

Rollback complete.


Tom Kyte

Followup  

March 26, 2005 - 12:30 pm UTC

just explain this query (you don't need run it, just explain plan it)

ops$tkyte@ORA9IR2> DELETE FROM t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   t2 ra,
  5               t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
 
 

March 26, 2005 - 1:01 pm UTC

Reviewer: A reader

apps@s0p2dbt3> explain plan for
2 DELETE FROM pa_budget_lines bl
3 where resource_assignment_id in
4 (SELECT ra.resource_assignment_id
5 FROM pa_resource_assignments ra,
6 pa_budget_versions bv
7 WHERE bv.current_flag <> 'Y'
8 AND bv.budget_status_code = 'B'
9 AND bv.baselined_date IS NOT NULL
10 AND bv.budget_version_id = ra.budget_version_id)
11 /

Explained.

apps@s0p2dbt3>
apps@s0p2dbt3> @plan

QUERY_PLAN
------------------------------------------------------------
DELETE PA_BUDGET_LINES
HASH JOIN
VIEW VW_NSO_1
SORT UNIQUE
HASH JOIN
TABLE ACCESS FULL PA_BUDGET_VERSIONS
TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
TABLE ACCESS FULL PA_BUDGET_LINES


Tom Kyte

Followup  

March 26, 2005 - 1:10 pm UTC

are you sure the indices are in place? what happens with the following scripts?

(safe if you do not have t1, t2, t3)


drop table t1;
drop table t2;
drop table t3;

create table t1 ( resource_assignment_id int, data char(80) );
create table t2 ( resource_assignment_id int, budget_version_id int, data char(80) );
create table t3 ( current_flag varchar2(1), budget_status_code varchar2(10),
baselined_date date, budget_version_id int, data char(80) );

exec dbms_stats.set_table_stats( user, 'T1', numrows=> 2000000, numblks => 12000 );
exec dbms_stats.set_table_stats( user, 'T2', numrows=> 164000, numblks => 1000 );
exec dbms_stats.set_table_stats( user, 'T3', numrows=> 13000, numblks => 500 );

create index t1_idx on t1(resource_assignment_id);
create UNIQUE index t2_idx on t2(resource_assignment_id);
create UNIQUE index t3_idx on t3(budget_version_id);

exec dbms_stats.set_index_stats( user, 'T1_IDX', numrows=> 2000000, numlblks => 1200 );
exec dbms_stats.set_index_stats( user, 'T2_IDX', numrows=> 164000, numlblks => 100 );
exec dbms_stats.set_index_stats( user, 'T3_IDX', numrows=> 13000, numlblks => 50 );

set autotrace on explain
DELETE FROM t1 bl
where EXISTS (SELECT 'X'
FROM t2 ra,
t3 bv
WHERE bv.current_flag <> 'Y'
AND bv.budget_status_code = 'B'
AND bv.baselined_date IS NOT NULL
AND bl.resource_assignment_id = ra.resource_assignment_id
AND bv.budget_version_id = ra.budget_version_id)
/
DELETE FROM t1 bl
where resource_assignment_id in
(SELECT ra.resource_assignment_id
FROM t2 ra,
t3 bv
WHERE bv.current_flag <> 'Y'
AND bv.budget_status_code = 'B'
AND bv.baselined_date IS NOT NULL
AND bv.budget_version_id = ra.budget_version_id)
/
set autotrace off


March 26, 2005 - 3:14 pm UTC

Reviewer: A reader

apps@s0p2dbt3> create table pa.t2 as select * from pa.pa_resource_assignments;

Table created.

apps@s0p2dbt3>  create table pa.t3 as select * from pa.pa_budget_versions ;

Table created.

apps@s0p2dbt3> create table pa.t1 as select * from pa.pa_budget_lines;

Table created.

apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T1', numrows=> 2000000, numblks => 1
2000 );

PL/SQL procedure successfully completed.
apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T3', numrows=> 13000, numblks => 500
 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3>   exec dbms_stats.set_table_stats( 'PA', 'T2', numrows=> 164000, numblks => 
1000 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3>create index pa.t1_idx on pa.t1(resource_assignment_id) tablespace pax;

Index created.

SQL> create UNIQUE index pa.t2_idx on pa.t2(resource_assignment_id) tablespace pax;
create UNIQUE index pa.t3_idx on pa.t3(budget_version_id) tablespace pax;
                                                    

Index created.

SQL> 
Index created.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T1_IDX', numrows=> 2000000, numlblks
 => 1200 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T2_IDX', numrows=> 164000, numlblks 
=> 100 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T3_IDX', numrows=> 13000, numlblks =
> 50 );

PL/SQL procedure successfully completed.


apps@s0p2dbt3> set autotrace on explain
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where EXISTS (SELECT  'X'
  3        FROM   pa.t2 ra,
  4               pa.t3 bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bl.resource_assignment_id = ra.resource_assignment_id
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /

432 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1201 Card=100000 Byt
          es=1300000)

   1    0   DELETE OF 'T1'
   2    1     INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201 Ca
          rd=100000 Bytes=1300000)

   3    2       NESTED LOOPS (Cost=3 Card=1 Bytes=67)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=2 Card=1
           Bytes=26)

   5    4           INDEX (UNIQUE SCAN) OF 'T2_IDX' (UNIQUE) (Cost=1 C
          ard=1640)

   6    3         TABLE ACCESS (BY INDEX ROWID) OF 'T3' (Cost=1 Card=1
           Bytes=41)

   7    6           INDEX (UNIQUE SCAN) OF 'T3_IDX' (UNIQUE)



apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /

Rollback complete.

apps@s0p2dbt3> 
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   pa.t2 ra,
  5               pa.t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /

432 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1474 Card=32800000 B
          ytes=852800000)

   1    0   DELETE OF 'T1'
   2    1     MERGE JOIN (Cost=1474 Card=32800000 Bytes=852800000)
   3    2       INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201
          Card=2000000 Bytes=26000000)

   4    2       SORT (JOIN) (Cost=273 Card=1640 Bytes=21320)
   5    4         VIEW OF 'VW_NSO_1' (Cost=266 Card=1640 Bytes=21320)
   6    5           SORT (UNIQUE) (Cost=266 Card=1640 Bytes=109880)
   7    6             HASH JOIN (Cost=246 Card=1640 Bytes=109880)
   8    7               TABLE ACCESS (FULL) OF 'T3' (Cost=78 Card=1 By
          tes=41)

   9    7               TABLE ACCESS (FULL) OF 'T2' (Cost=153 Card=164
          000 Bytes=4264000)




apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /

Rollback complete. 

Tom Kyte

Followup  

March 26, 2005 - 4:30 pm UTC

looks like it could be a limitation of the very old software. I need to go way back to 8i to reproduce this behavior.

suggest you might temporarily try two stepping this, eg:

tkyte@ORA8IW> create global temporary table gtt as
2 (SELECT ra.resource_assignment_id
3 FROM t2 ra
4 WHERE 1=0)
5 /
Table created.

tkyte@ORA8IW> exec dbms_stats.set_table_stats( user, 'GTT', numrows => 500 );
PL/SQL procedure successfully completed.

the above two things are done ONCE, then:


tkyte@ORA8IW> insert into gtt
2 (SELECT ra.resource_assignment_id
3 FROM t2 ra,
4 t3 bv
5 WHERE bv.current_flag <> 'Y'
6 AND bv.budget_status_code = 'B'
7 AND bv.baselined_date IS NOT NULL
8 AND bv.budget_version_id = ra.budget_version_id);

tkyte@ORA8IW> delete from t1 where resource_assignment_id in ( select * from gtt );

Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE (Cost=519 Card=10000000 Bytes=260000000)
1 0 DELETE OF 'T1'
2 1 NESTED LOOPS (Cost=519 Card=10000000 Bytes=260000000)
3 2 VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500)
4 3 SORT (UNIQUE) (Cost=19 Card=500 Bytes=6500)
5 4 TABLE ACCESS (FULL) OF 'GTT' (Cost=16 Card=500 Bytes=6500)
6 2 INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=2000000 Bytes=26000000)



tkyte@ORA8IW> set autotrace off

Purge huge table

April 02, 2005 - 7:31 am UTC

Reviewer: A reader

Tom,
"
looks like it could be a limitation of the very old software. I need to go way
back to 8i to reproduce this behavior.

suggest you might temporarily try two stepping this, eg:
"

I should give you the what I am trying to accomplish here.
We want to purge a table which has 70 million rows about 10 GB. on critira to partition the table. the goal is to remove 60 million rows from it based the critiras.
After reading your thread:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6407993912330 <code>

It is no pratical to delete the table by the conventional method.
So i am working on based on
create tmp table with nologging;
insert table with append, and parallel;
drop original table;
rename tmp to original;
recreate indexes and grants;
done.

the insert 60 millions took 20 hours.
I would like your comment on the temp table, if the temp suitable to the massive inserts. what are needed to be change on the table:


CREATE TABLE PURGE_TMP_S
(
RESOURCE_ASSIGNMENT_ID NUMBER(15) NOT NULL,
START_DATE DATE NOT NULL,
LAST_UPDATE_DATE DATE NOT NULL,
LAST_UPDATED_BY NUMBER NOT NULL,
CREATION_DATE DATE NOT NULL,
CREATED_BY NUMBER NOT NULL,
LAST_UPDATE_LOGIN NUMBER NOT NULL,
END_DATE DATE NOT NULL,
PERIOD_NAME VARCHAR2(30),
QUANTITY NUMBER,
RAW_COST NUMBER,
BURDENED_COST NUMBER,
REVENUE NUMBER,
CHANGE_REASON_CODE VARCHAR2(30),
DESCRIPTION VARCHAR2(255),
ATTRIBUTE_CATEGORY VARCHAR2(30),
ATTRIBUTE1 VARCHAR2(150),
ATTRIBUTE2 VARCHAR2(150),
ATTRIBUTE3 VARCHAR2(150),
ATTRIBUTE4 VARCHAR2(150),
ATTRIBUTE5 VARCHAR2(150),
ATTRIBUTE6 VARCHAR2(150),
ATTRIBUTE14 VARCHAR2(150),
ATTRIBUTE15 VARCHAR2(150),
RAW_COST_SOURCE VARCHAR2(2),
BURDENED_COST_SOURCE VARCHAR2(2),
QUANTITY_SOURCE VARCHAR2(2),
REVENUE_SOURCE VARCHAR2(2),
PM_PRODUCT_CODE VARCHAR2(30),
PM_BUDGET_LINE_REFERENCE VARCHAR2(25),
COST_REJECTION_CODE VARCHAR2(30),
REVENUE_REJECTION_CODE VARCHAR2(30),
BURDEN_REJECTION_CODE VARCHAR2(30),
OTHER_REJECTION_CODE VARCHAR2(30),
CODE_COMBINATION_ID NUMBER(15),
CCID_GEN_STATUS_CODE VARCHAR2(1),
CCID_GEN_REJ_MESSAGE VARCHAR2(2000),
REQUEST_ID NUMBER
)
TABLESPACE PAD
PCTUSED 40
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 40K
NEXT 40K
MINEXTENTS 1
MAXEXTENTS 10000000
PCTINCREASE 0
FREELISTS 1
FREELIST GROUPS 1
BUFFER_POOL DEFAULT
)
NOLOGGING
NOCACHE
NOPARALLEL;

Thanks a lot Tom.


Tom Kyte

Followup  

April 02, 2005 - 9:24 am UTC

the insert of 60million records took 20hours?

what kind of hardware/disk are we talking here. That is abnormally slow.

did you do a subset of it with trace to see what is taking so long?

April 02, 2005 - 7:33 am UTC

Reviewer: A reader

"We want to purge a table which has 70 million rows about 10 GB. on critira to
partition the table. "

i meant there is no critira for implement partitions on the table.
sorry for the typo.

April 02, 2005 - 11:31 am UTC

Reviewer: A reader

"
what kind of hardware/disk are we talking here. That is abnormally slow.

did you do a subset of it with trace to see what is taking so long?
"
it is Sun solaris 480 with 4 CPU. disk is raid 5. and there are some other applications running on the machine also.

I had it with the parallel 2 and did not set the sqltrace on it.

here is from statspack:
Buffer Gets Executions Gets per Exec % Total Hash Value
--------------- ------------ -------------- ------- ------------
29,014,953 1 29,014,953.0 25120040
insert
/*+ append */ into pa.pa_budget_lines_tmp_s pbls select
* from pa.pa_budget_lines pbl where pbl.resource_assignment_id i
n (SELECT ra.resource_assignment_id FROM pa.pa_resourc
e_assignments ra, pa.pa_budget_versions bv
WHERE bv.current_flag <> 'Y' AND bv.budget_sta



Tom Kyte

Followup  

April 02, 2005 - 11:54 am UTC

that is Oracle applications, do you have support working with you if you are planning on doing stuff to those tables with SQL?

is is the QUERY that takes 20 hours or the insert?

If you explain plan the insert and the select, do they use the same plan.

raid 5 = not fast stuff for direct path writing. but not 20 hours slow.

April 02, 2005 - 12:09 pm UTC

Reviewer: A reader

"
that is Oracle applications, do you have support working with you if you are planning on doing stuff to those tables with SQL?
"
Yes, it's oracle apps.
We want to present oracle support what we can do for the purge of PA tables, since the support has not been able to provide the method. we test the purge on our Test environment.


"
is is the QUERY that takes 20 hours or the insert?
"

Yes.

"
If you explain plan the insert and the select, do they use the same plan.

"
apps@s0p2dbt1> explain plan for
2 insert /*+ append parallel(pbls,4)*/
3 into pa.pa_budget_lines_tmp_s pbls
4 select /*+ parallel(pbl, 4) */
5 * from pa.pa_budget_lines pbl
6 where pbl.resource_assignment_id in
7 (SELECT ra.resource_assignment_id
8 FROM pa.pa_resource_assignments ra,
9 pa.pa_budget_versions bv
10 WHERE bv.current_flag <> 'Y'
11 AND bv.budget_status_code = 'B'
12 AND bv.baselined_date IS NOT NULL
13 AND bv.budget_version_id = ra.budget_version_id)
14 /

Explained.

apps@s0p2dbt1> @plan

Sat Apr 02

QUERY_PLAN
------------------------------------------------------------
LOAD AS SELECT
NESTED LOOPS
VIEW VW_NSO_1
SORT UNIQUE
HASH JOIN
TABLE ACCESS FULL PA_BUDGET_VERSIONS
TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
INDEX RANGE SCAN PA_BUDGET_LINES_N2

9 rows selected.


10 rows deleted.

apps@s0p2dbt1> explain plan for
2 select /*+ parallel(pbl, 4) */
3 * from pa.pa_budget_lines pbl
4 where pbl.resource_assignment_id in
5 (SELECT ra.resource_assignment_id
6 FROM pa.pa_resource_assignments ra,
7 pa.pa_budget_versions bv
8 WHERE bv.current_flag <> 'Y'
9 AND bv.budget_status_code = 'B'
10 AND bv.baselined_date IS NOT NULL
11 AND bv.budget_version_id = ra.budget_version_id)
12 /

Explained.

apps@s0p2dbt1> @plan

Sat Apr 02


QUERY_PLAN
------------------------------------------------------------
NESTED LOOPS
VIEW VW_NSO_1
SORT UNIQUE
HASH JOIN
TABLE ACCESS FULL PA_BUDGET_VERSIONS
TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
INDEX RANGE SCAN PA_BUDGET_LINES_N2

8 rows selected.


9 rows deleted.

apps@s0p2dbt1>


Tom Kyte

Followup  

April 02, 2005 - 1:13 pm UTC

"
is is the QUERY that takes 20 hours or the insert?
"
Yes.


Hmmm

Q: Is it raining or sunny
A: Yes

Must be a sunshower. So, still don't know which it is.

use autotrace traceonly explain, you'll get the cost/card values which are vital.

April 02, 2005 - 1:34 pm UTC

Reviewer: A reader

Q: Is it raining or sunny
A: Yes

sorry i did not read your question carefully.
Yes, I mean 20 hrs for inserts. the query run 30 minutes.

I found the parallelism does not work here:

insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

And The parallelism does not work for

insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select 
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

The parallelism does not work for 
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

The parallelism only work for:
select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl

init.ora 

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 25

Sat Apr 02                                                                                                                 page    2
 

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 

it's 8174 db.
 

Tom Kyte

Followup  

April 02, 2005 - 1:36 pm UTC

so, given that it would not normally take 19.5 hours to direct path write that data.... I'd suggest sql tracing on a sizable subset of the data and see where the bottleneck is. Just do it serially, easier to trace and understand.

April 02, 2005 - 2:24 pm UTC

Reviewer: A reader

Tom,
i will trace it. before doing that i would like the parallelism does not for the inserts and the query as i refered. it is only working for

select /*+ parallel(t,4) */ * from pa.pa_budget_lines t;



Tom Kyte

Followup  

April 02, 2005 - 2:28 pm UTC

not everything is parallelizable. and given the plans -- hash join used in a NL, I would guess that parallelizing the access to the outer table (which is now being picked up by index) wouldn't make sense.

maybe extent management ...

April 02, 2005 - 3:07 pm UTC

Reviewer: Alberto Dell'Era from Milan, Italy

from the review "Purge huge table" by "A Reader":

CREATE TABLE PURGE_TMP_S ...
STORAGE (
INITIAL 40K
NEXT 40K
...
)

if the tablespace is dictionary managed, parallel or not, most of the time will be spent changing the data dictionary and logging the changes ;)

Especially since the rows appear to be quite long, say 1k, so 1 extent for every 40 rows, 60,000,000 rows -> 1,500,000 extents looks a tad too much.


Tom Kyte

Followup  

April 02, 2005 - 3:09 pm UTC

laughing out loud, didn't even see that ;)

yes, that would be considered a tad small.

April 02, 2005 - 7:53 pm UTC

Reviewer: A reader

from the review "Purge huge table" by "A Reader":

CREATE TABLE PURGE_TMP_S ...
STORAGE (
INITIAL 40K
NEXT 40K
...
)

"
if the tablespace is dictionary managed, parallel or not, most of the time will
be spent changing the data dictionary and logging the changes ;)

Especially since the rows appear to be quite long, say 1k, so 1 extent for every
40 rows, 60,000,000 rows -> 1,500,000 extents looks a tad too much.
"

Forgot to metioned that. It is local managed tablespace in Oracle Applications 11.5.7.
and the temp table was created as

create table temptable as select * from pa.table;

The PA table is the orginal apps table.

Tom, that was the reason i asked your comment on the temp table.

to "A Reader"

April 03, 2005 - 11:29 am UTC

Reviewer: Alberto Dell'Era from Milan, Italy

>and the temp table was created as
>create table temptable as select * from pa.table;

(i guess you meant "create table temptable as select * from pa.table WHERE 1=0", right ?)

By "temptable", you mean pa.pa_budget_lines_tmp_s, correct ?

Btw, what's the output of

select TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent, next_extent from dba_tablespaces where TABLESPACE_NAME= (select tablespace_name from dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s'));

select tablespace_name, initial_extent, next_extent, pct_increase from dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s');

And i would be curious to know what is the CREATE PURGE_TMP_S statement you gave above, if you created the table as a create-table-as-select ...

April 03, 2005 - 7:22 pm UTC

Reviewer: A reader

"
(i guess you meant "create table temptable as select * from pa.table WHERE 1=0",
right ?)
"
that's correct.
create table pa.pa_budget_lines_tmp_s as select * from pa.pa_budget_lines where 1=2
/

the temp table was created in the PA data tablespace PAD.

we are working on Apps 11i. we do not want to the original table definitions/storge, because the apps may become no supportable by Oracle.

APPS@S0P2DBT1> select TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent,
2 next_extent from dba_tablespaces where TABLESPACE_NAME= (select tablespace_name
3 from dba_tables where owner = 'PA' and table_name = upper
4 ('pa_budget_lines_tmp_s'));

TABLESPACE_NAME
------------------------------------------------------------------------------------------
EXTENT_MANAGEMENT ALLOCATION_TYPE INITIAL_EXTENT NEXT_EXTENT
------------------------------ --------------------------- -------------- -----------
PAD
LOCAL USER 40960 40960


APPS@S0P2DBT1> select tablespace_name, initial_extent, next_extent, pct_increase from
2 dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s');

TABLESPACE_NAME
------------------------------------------------------------------------------------------
INITIAL_EXTENT NEXT_EXTENT PCT_INCREASE
-------------- ----------- ------------
PAD
40960 40960 0


APPS@S0P2DBT1>
APPS@S0P2DBT1>

Tom Kyte

Followup  

April 03, 2005 - 7:24 pm UTC

40k is a tad small for a really big table, don't you think?

I'll really have to recommend you work with support on this task. They can advice you if the extent sizes are relevant. You need to work within their advice for apps.

April 03, 2005 - 7:23 pm UTC

Reviewer: A reader

"we do not want to the original table ..."

should be "we do not want to alter the original table ..."
sorry for the typo.

IN and Inner Join

April 04, 2005 - 2:48 pm UTC

Reviewer: Praveen from Bangalore

Hi Tom,

I did the following experiment:

Oracle9iR2

CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);

CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);

ANALYZE TABLE ALL_OBJECTS1 COMPUTE STATISTICS;

ANALYZE TABLE ALL_OBJECTS2 COMPUTE STATISTICS;

SELECT COUNT(*) FROM ALL_OBJECTS;

COUNT(*)
--------
23533


Query 1
-------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
HASH JOIN
SORT UNIQUE
INDEX FAST FULL SCAN ORCL.IDX_ALLOBJ2_OBJID
TABLE ACCESS FULL ORCL.ALL_OBJECTS1

Query2
------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
HASH JOIN
SORT UNIQUE
INDEX FAST FULL SCAN ORCL.IDX_ALLOBJ2_OBJID
TABLE ACCESS FULL ORCL.ALL_OBJECTS1

Query3
------
SELECT * FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = a2.object_id;

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
HASH JOIN
TABLE ACCESS FULL ORCL.ALL_OBJECTS1
TABLE ACCESS FULL ORCL.ALL_OBJECTS2


Que 1: The explain plan output of first two queries contradicts with what you have answered
to the question of this thread. ie: IN and EXISTS are processed very, very
differently. But the plan here shows same exact access paths and costs and performance.
What went wrong?

Que 2: Ref: Query3. Why did a direct inner join between the two tables gave and entirely
different plan and poor performance than IN (or EXISTS). You said that IN is
infact a join done by first table on a 'distincted' and 'hashed', inner query.
Query 3 is not even using the indexes available!

Que 3: How should I choose between inner join and IN?

Thanks
Praveen


Tom Kyte

Followup  

April 04, 2005 - 5:25 pm UTC

good point

things change, good thing I did a test case above that you can test with to observe that and point it out to the world (I'm not being "snarky", I'm really overjoyed at the timing actually)....


Use the RBO and see what you see. way back when I wrote this, that was the "more popular" of the two perhaps

today in 2005, what I said years ago using the RBO does not apply to the CBO. the cbo is smart enough to recognize these two things are effectively the same.

test recoded

April 04, 2005 - 6:17 pm UTC

Reviewer: Alberto Dell'Era from Milan, Italy

But query3 isn't semantically equivalent to query1/2... you should put a unique constraint on ALL_OBJECTS2_uq (object_id) to make them "the same", and recode the statement as below (or not?)

It goes even better anyway - the sort can be avoided alltogether (and it's confirmed that query1 and query2 plans are the same) - i have coded query2 the way i suppose it should be (query1 text is equal to query2 in the review):

CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);

CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);

exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS1', cascade=>true);
exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS2', cascade=>true);


Query 1:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Query2:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE OBJECT_ID in (SELECT OBJECT_ID FROM ALL_OBJECTS2);

alter table ALL_OBJECTS2 add constraint ALL_OBJECTS2_uq unique (object_id);

Query3:
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2
WHERE a1.object_id = a2.object_id;

======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE EXISTS (SELECT NULL
FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID)

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 94 |
|* 1 | HASH JOIN SEMI | | 35281 | 3376K| 94 |
| 2 | TABLE ACCESS FULL | ALL_OBJECTS1 | 35281 | 3204K| 50 |
| 3 | INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID | 35282 | 172K| 9 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off
======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE OBJECT_ID in (SELECT OBJECT_ID
FROM ALL_OBJECTS2)


----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 94 |
|* 1 | HASH JOIN SEMI | | 35281 | 3376K| 94 |
| 2 | TABLE ACCESS FULL | ALL_OBJECTS1 | 35281 | 3204K| 50 |
| 3 | INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID | 35282 | 172K| 9 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off

======================================================================
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = a2.object_id

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 50 |
| 1 | NESTED LOOPS | | 35281 | 3376K| 50 |
| 2 | TABLE ACCESS FULL | ALL_OBJECTS1 | 35281 | 3204K| 50 |
|* 3 | INDEX RANGE SCAN | IDX_ALLOBJ2_OBJID | 1 | 5 | |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off

Note: plans fetched from v$sql_plan, 9.2.0.6.

In and Out Parameters and IN/Exists operator

April 05, 2005 - 3:13 am UTC

Reviewer: Keshav Madhusoodan from India

Excellent stuff, I got here.



Inner Join and IN

April 05, 2005 - 9:47 am UTC

Reviewer: Praveen from Bangalore

Well Tom, by that you mean, with newer versions of CBO playing around, there won't by any difference at all between IN and EXISTS - whatever be the nature of data sources are - means whether an index exists or not, unique or not, relative sizes of each tables etc, did you? That's what my execution plan shows me - with a small difference. With unique key defined, 'NESTED LOOP SEMI' - for EXISTS, 'NESTED LOOP' - for IN; But otherwise the costs are same.

I had one more doubt that whether IN is always going to perform better that than a direct inner join? Testing I've done is telling this. Or is there any situation where inner join is better than IN?

Thanks, Alberto, for pointing out something important (and also correcting the error,...:). Unfortunately, the execution plan I get here, after putting an unique constraint on object_id, is very much different from the one that you have given. 'NESTED LOOP SEMI' - for EXISTS, 'NESTED LOOP' - for IN and surprisingly, HASH JOIN - for inner join! Well, I gathered statistics using ANALYZE.


Thanks
Praveen

Tom Kyte

Followup  

April 05, 2005 - 12:10 pm UTC

in and exists are equivalent, so the advice of the past (with rbo) does not apply.


use either or.

Question regarding "IN & EXISTS"

May 16, 2005 - 4:38 pm UTC

Reviewer: Leo James from Glen Burnie, MD, USA

Very well explained TOM.

in, inner join and unnesting

August 10, 2005 - 9:56 pm UTC

Reviewer: James from Australia

Dear Tom,

You said (right at the start of this thread) concerning the unnesting/query rewriting of in subqueries:

>is typically processed as:
>
>select *
> from t1, ( select distinct y from t2 ) t2
> where t1.x = t2.y;
>
>The subquery is evaluated, distinct'ed, indexed (or >hashed or sorted) and then
>joined to the original table -- typically.

I'm worried about the typically :-)

I have 3 queries that return the same result:

Query 1 - in query
select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)

Query 2 - inner join
select /*+ INNER */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325

Query 3 - "Typical" unnested in subquery
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) t2
where t1.dealer_Id=t2.dealer_id

The dealer_regions table has had stats created and enough buckets specified for the CBO to choose between two indexes.

The problem is that the 1st query - the in subquery - is not being executed as well as it should, the wrong index is being selected. What is even more curious is that the other 2 queries perform as expected - even q3 which is the "typical" unnested version of the 1st query.

I could just use an index hint but I would rather the CBO got it right as the selectivity of "region_id" varies (and sometimes the index I specified in the hint wont be the best one.)

Any advice to "encourage" the CBO to see things my way?

BTW I am using 9.1.5 so that maybe where my problem lies.

--trace output--

select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)

call count cpu elapsed disk query current rows
total 4 0.58 0.61 55 94638 0 1

Rows Row Source Operation
1 SORT AGGREGATE (cr=94620 r=55 w=0 time=595018 us)
39 NESTED LOOPS SEMI (cr=94620 r=55 w=0 time=594979 us)
45815 NESTED LOOPS (cr=48803 r=55 w=0 time=349295 us)
48690 INDEX FAST FULL SCAN CSVMV_DEID_IND (cr=111 r=55 w=0 time=65768 us)(object id 270087)
45815 INDEX UNIQUE SCAN CSDM_PK (cr=48692 r=0 w=0 time=199251 us)(object id 270093)
39 INDEX UNIQUE SCAN DERE_PK (cr=45817 r=0 w=0 time=167174 us)(object id 281197)

Rows Execution Plan
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
39 NESTED LOOPS (SEMI)
45815 NESTED LOOPS
48690 INDEX GOAL: ANALYZED (FAST FULL SCAN) OF
'CSVMV_DEID_IND' (NON-UNIQUE)
45815 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
39 INDEX (UNIQUE SCAN) OF 'DERE_PK' (UNIQUE)
********************************************************************************
select /*+ INNERJOIN */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325

call count cpu elapsed disk query current rows
total 4 0.00 0.00 3 56 0 1

Rows Row Source Operation
1 SORT AGGREGATE (cr=54 r=3 w=0 time=965 us)
39 NESTED LOOPS (cr=54 r=3 w=0 time=941 us)
39 NESTED LOOPS (cr=13 r=3 w=0 time=690 us)
5 TABLE ACCESS BY INDEX ROWID DEALER_REGIONS (cr=6 r=3 w=0 time=514 us)
5 INDEX RANGE SCAN DERE_REID_IND (cr=2 r=1 w=0 time=229 us)(object id 281196)
39 INDEX RANGE SCAN CSVMV_DEID_IND (cr=7 r=0 w=0 time=143 us)(object id 270087)
39 INDEX UNIQUE SCAN CSDM_PK (cr=41 r=0 w=0 time=180 us)(object id 270093)

Rows Execution Plan
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
39 NESTED LOOPS
39 NESTED LOOPS
5 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF
'DEALER_REGIONS'
5 INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
39 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND'
(NON-UNIQUE)
39 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
********************************************************************************
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) t2
where t1.dealer_Id=t2.dealer_id

call count cpu elapsed disk query current rows
total 4 0.00 0.00 0 56 0 1

Rows Row Source Operation
1 SORT AGGREGATE (cr=54 r=0 w=0 time=596 us)
39 NESTED LOOPS (cr=54 r=0 w=0 time=577 us)
39 NESTED LOOPS (cr=13 r=0 w=0 time=337 us)
5 VIEW (cr=6 r=0 w=0 time=196 us)
5 SORT UNIQUE (cr=6 r=0 w=0 time=189 us)
5 TABLE ACCESS BY INDEX ROWID OBJ#(281059) (cr=6 r=0 w=0 time=99 us)
5 INDEX RANGE SCAN OBJ#(281196) (cr=2 r=0 w=0 time=48 us)(object id 281196)
39 INDEX RANGE SCAN OBJ#(270087) (cr=7 r=0 w=0 time=111 us)(object id 270087)
39 INDEX UNIQUE SCAN OBJ#(270093) (cr=41 r=0 w=0 time=161 us)(object id 270093)

Rows Execution Plan
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
39 NESTED LOOPS
39 NESTED LOOPS
5 VIEW
5 SORT (UNIQUE)
5 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF
'DEALER_REGIONS'
5 INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
39 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND'
(NON-UNIQUE)
39 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)













Tom Kyte

Followup  

August 11, 2005 - 9:29 am UTC

do you really do this without bind variables? in the application?

what is 9.1.5

A soloution to my last review

August 11, 2005 - 1:21 am UTC

Reviewer: James from Australia

Sorry but I have found a work around.

The 1st query can be written as:

select
count(*) from carsales_view
where dealer_id in (
select dealer_id from dealer_regions where region_id=325 and rownum>0)
/

Which achieves the same performance of a inner join for selective and non-selective values of region_id (although with a different plan):

region_id=325 - low selectivity:

56 consistant gets compared with 94,000 (qry1)
This compares to the manually unested query (q3) of 56 consistant gets.

region_id=81 - high selectivity:

15,572 consistant gets compared to 94,000 (using qry1)
This compares to the manually unested query (q3) of 15572 consistant gets.

So I have managed to get the CBO to execute the query based on the selectivity of the region_id field in the dealer_regions table.

I beleive this is probably a bug in the 9.1.5 CBO?

region_id=325 - high selectivity with rownum>0

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=22)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=19 Card=241 Bytes=5302)
3 2 NESTED LOOPS (Cost=19 Card=241 Bytes=4338)
4 3 VIEW OF 'VW_NSO_1' (Cost=2 Card=5 Bytes=65)
5 4 SORT (UNIQUE)
6 5 COUNT
7 6 FILTER
8 7 TABLE ACCESS (BY INDEX ROWID) OF 'DEALER_REG
IONS' (Cost=2 Card=5 Bytes=40)

9 8 INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON
-UNIQUE) (Cost=1 Card=5)

10 3 INDEX (RANGE SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQUE)
(Cost=1 Card=48 Bytes=240)

11 2 INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)




Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
56 consistent gets
0 physical reads
0 redo size
214 bytes sent via SQL*Net to client
275 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

region_id=81 - high selectivity with rownum>0

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=22)
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (Cost=22 Card=33150 Bytes=729300)
3 2 HASH JOIN (SEMI) (Cost=22 Card=33150 Bytes=596700)
4 3 INDEX (FAST FULL SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQ
UE) (Cost=12 Card=48691 Bytes=243455)

5 3 VIEW OF 'VW_NSO_1' (Cost=3 Card=689 Bytes=8957)
6 5 COUNT
7 6 FILTER
8 7 TABLE ACCESS (FULL) OF 'DEALER_REGIONS' (Cost=
3 Card=689 Bytes=5512)

9 2 INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)




Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
15572 consistent gets
0 physical reads
0 redo size
216 bytes sent via SQL*Net to client
275 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed







reference for last answer

August 11, 2005 - 1:24 am UTC

Reviewer: James from Australia

IN, EXISTS and NULL

November 21, 2005 - 4:09 am UTC

Reviewer: Sanjaya Balasuriya from Colombo, Sri Lanka

Hi Tom,
How IN and EXISTS are effected by NULL values ?

Is it possible that IN and EXISTS return different sets of data depending on NULL values ?

Tom Kyte

Followup  

November 21, 2005 - 8:41 am UTC

in and exists are comparable in the presence of NULLS

not in and not exists behave differently.

Oracle10g Doc has a different opinion.

December 20, 2005 - 5:02 am UTC

Reviewer: Praveen from Dubai

Hi Tom,

Oracle10g documentation says IN and EXISTS executes and performs differently depending on the situation.

"In general, if the selective predicate is in the subquery, then use IN. If the
selective predicate is in the parent query, then use EXISTS."

</code> http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10752/sql_1016.htm#28145 <code>

This is contradicting with your opinion that-

"today in 2005, what I said years ago using the RBO does not apply to the CBO.
the cbo is smart enough to recognize these two things are effectively the same." above.

Please explain.

Regards


Tom Kyte

Followup  

December 20, 2005 - 10:15 am UTC

ops$tkyte@ORA10GR2> create table employees as select * from hr.employees where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2> create table orders as select * from oe.orders where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into employees
  2  select object_id,
  3         substr(object_name,1,20),
  4         substr(object_name,1,25),
  5             substr(object_type,1,25),
  6             substr(last_ddl_time,1,20),
  7         created,
  8         substr(object_name,1,10),
  9         object_id,
 10         0.10,
 11         object_id+1,
 12         mod(rownum,1000)
 13    from all_objects
 14   where rownum <= 27000;
 
27000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into orders
  2  select rownum,
  3         created,
  4         substr(object_type,1,8),
  5         mod(rownum,150),
  6         1,
  7         object_id,
  8         object_id,
  9         1
 10    from all_objects
 11   where rownum <= 10000;
 
10000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> alter table employees add constraint emp_pk primary key(employee_id);
 
Table altered.
 
ops$tkyte@ORA10GR2> create index orders_customer_id on orders(customer_id);
 
Index created.
 
ops$tkyte@ORA10GR2> create index employees_department_id on employees(department_id);
 
Index created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'EMPLOYEES', method_opt=>'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'ORDERS', method_opt=>'for all indexed columns' );
 
PL/SQL procedure successfully completed.


ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> SELECT /* EXISTS example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3    FROM employees e
  4   WHERE EXISTS (SELECT 1 FROM orders o                  /* Note 1 */
  5                    WHERE e.employee_id = o.sales_rep_id   /* Note 2 */
  6                      AND o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> SELECT /* IN example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3      FROM employees e
  4     WHERE e.employee_id IN (SELECT o.sales_rep_id         /* Note 4 */
  5                               FROM orders o
  6                              WHERE o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace off



They give insufficient details in their example to reproduce.  when you tested this out - what did you yourself see? 

what's the maximum number of values in the IN expression

January 03, 2006 - 8:38 pm UTC

Reviewer: Jianhui from CA

Hi Tom,

IN (value1, value2, ... valueN)

What's the maximum limit of N here allowed by Oracle? Do you have URL to the document for this limit?

Thanks



Tom Kyte

Followup  

January 03, 2006 - 8:46 pm UTC

</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14200/expressions014.htm#i1033664 <code>



Question about not exists

January 19, 2006 - 8:27 am UTC

Reviewer: Oleg

Hi,Tom,
I have this query:
SELECT b.branch_no, b.salesman_no,
ROUND
( SUM (CASE
WHEN b.returning_no !=2
THEN order_qty
END)
/ 22
) avg_rtrn_orders_qty,
ROUND ( SUM (CASE
WHEN b.bsnss_no != 6
THEN order_qty
END)
/ 22
) avg_no_gas_orders_qty,
ROUND ( SUM (CASE
WHEN b.bsnss_no = 6
THEN order_qty
END)
/ 22
) avg_gas_orders_qty
FROM km_orders a, km_customers b, km_nwpr_editions c,km_closed_customers d
WHERE a.cstmr_no = b.cstmr_no
and b.cstmr_no=d.cstmr_no(+)
and c.nwpr_date BETWEEN d.date_from(+) AND d.date_to(+)
AND b.bsnsslike_sw = 'ë'
AND a.prod_no = c.prod_no
AND c.nwpr_date BETWEEN a.date_from AND a.date_to
AND a.prod_no = 1
AND b.branch_no != 9
AND c.nwpr_date BETWEEN TO_DATE ('01/11/2005', 'DD/MM/YYYY')
AND TO_DATE ('30/11/2005', 'DD/MM/YYYY')
AND a.day_in_week = c.nwpr_day
AND NOT EXISTS (
SELECT 1
FROM km_closed_customers
WHERE cstmr_no = a.cstmr_no
AND c.nwpr_date BETWEEN date_from AND date_to)
GROUP BY b.branch_no, b.salesman_no
I see that without
" AND NOT EXISTS (
SELECT 1
FROM km_closed_customers
WHERE cstmr_no = a.cstmr_no
AND c.nwpr_date BETWEEN date_from AND date_to)"
the query is running faster,but i need to check this table,
what can i do instead this not exists statement?
Thank you,
Oleg,


best practice?

February 17, 2006 - 1:14 pm UTC

Reviewer: A reader

Tom,
is it a good practice to have 2 exists in the form?

SELECT *
FROM TABLE a
WHERE NOT EXISTS (SELECT 1
FROM TABLE B i
WHERE i.CLASS = a.CLASS_NAME
AND NVL(i.CLASS_NAME,0) = NVL(a.CLASS_DEPT,0)
)
AND NOT EXISTS (SELECT 1
FROM work_orders w
WHERE w.class_id = a.s_name
AND NVL(w.CLASS_ID,0) = nvl(a.CLASS_DEPT,0)
)

Tom Kyte

Followup  

February 17, 2006 - 4:59 pm UTC

it seems to be less a matter of "good practice" and more a matter of getting the right answer though isn't it?

It is a bad practice to do that nvl'ing bit in general. sure you really need to do that.

can you clarify with example please

February 22, 2006 - 2:00 pm UTC

Reviewer: A reader


Tom Kyte

Followup  

February 22, 2006 - 2:13 pm UTC

an example of what exactly

of how can I avoid

February 22, 2006 - 3:06 pm UTC

Reviewer: A reader

Tom,
How can I avoid "the nvl'ing bit" in this case.

Tom Kyte

Followup  

February 22, 2006 - 3:21 pm UTC

fixing the model? apparently you didn't mean to use nulls? Not if you basically join on them like that.

Else, we'd have to function base index them in this case.

Are you sure nulls are an issue here. Do you really need to join by them or is that "just in case"



THANKS!!!!!

February 22, 2006 - 4:03 pm UTC

Reviewer: A reader


updating using EXISTS

March 09, 2006 - 3:23 pm UTC

Reviewer: anirudh from usa

hi tom

in our project we often have a situaton where some of the fields in a table (master_table in the example below) needs to be updated with data in another table (UPDATES_FEED in the example below)

MASTER_TABLE
key_fld
fld_1
fld_2
fld_3
fld_4
fld_5
fld_6

UPDATES_FEED
key_fld
fld_1
fld_2

i typically use SQL such as :

update master_table a
set (fld_1,fld_2) = (select fld_1,fld_2
from updates_feed b
where a.key_fld = b.key_fld)
where exists (select 1
from updates_feed c
where a.key_fld = c.key_fld)

I have to add the WHERE condition otherwise if no match is found in updates_feed table it would update the fields to NULL)

even though it works - it looks some what less than elegant - and has performance issues as well - as i think the lookup on the updates_feed is happening two times

Is there a better way of achieving the result ?





Tom Kyte

Followup  

March 09, 2006 - 4:21 pm UTC

update a join


update (select m.fld_1, m.fld_2, u.fld_1 n1, u.fld_2 n2
from master_table m, updates_feed u
where m.key_fld = u.key_fld )
set fld_1 = n1, fld_2 = n2;



wonderful

March 10, 2006 - 4:54 am UTC

Reviewer: A reader

Morning Tom,

i have always wondered the best way of doing that kind of update. in your example is it therefore possible to update either or indeed both of the tables used in the select statement?

Regards
Dave

Tom Kyte

Followup  

March 10, 2006 - 12:17 pm UTC

only one, you are updating a table from another table

Strange behaviour

March 30, 2006 - 6:39 am UTC

Reviewer: Abhijit Mallick from India

Hi Tom,
   Today I have found a strange behaviuor of SQL. The problem is as follows:
  
Please see the details: 

SQL> create table tab2
  2  (col21 number);

Table created.

SQL> insert into tab2 select rownum from user_objects;

2429 rows created.

SQL> create table tab1
  2  (col1 varchar2(1000),
  3   col2 number);

Table created.

SQL> insert into tab1 select table_name,rownum from user_tables;

293 rows created.

SQL> commit;

Commit complete.

SQL> ed
Wrote file afiedt.buf

  1  select count(*) from tab1
  2  where col1 is not null
  3* and col2 not in (select col2 from tab2)

SQL> /

  COUNT(*)
----------
         0

---LOOK AT THE INNER QUERY 
THE TABLE TAB2 DOES NOT HAVE THE COL2 STILL THE QUERY EXECUTES WITHOUT ANY ERROR. SEE THE BELOW PART , IT GIVES THE ERROR.

SQL> select col2 from tab2;
select col2 from tab2
       *
ERROR at line 1:
ORA-00904: "COL2": invalid identifier


SQL> 


Can you please discuss the behaviour of the above sub-query.

Thanks
--Abhijit Mallick
Bangalore
 

Tom Kyte

Followup  

March 31, 2006 - 11:22 am UTC

it is called a correlated subquery.

select count(*)
from tab1
where col1 is not null
and col2 not in (select TAB1.COL2 from tab2)


that is the same query as the one you executed.

Just like you could:

and not exists ( select * from tab2 where tab2.col21 = tab1.col2 )

would work - as would
and not exists ( select * from tab2 where col21 = col2 )

we would just figure out using standard scoping rules whence each reference column was from.


IN & EXISTS

April 19, 2006 - 8:09 am UTC

Reviewer: jash from India

How to tune the below query to get result very quickly

select count(*) from polineitemtab;

COUNT(*)
----------
1581181

select count(*) from requisitiontab;

COUNT(*)
----------
284956

select rootid from NEW_TEMP_polineitemtab
where poli_requisition not in
(select rootid from NEW_TEMP_requisitiontab)

Tom Kyte

Followup  

April 19, 2006 - 9:08 am UTC

make sure you are using the CBO
make sure rootid is NOT NULL in NEW_TEMP_requisitiontab OR add "where rootid is not null" to the subquery.

or

select /*+ ALL_ROWS */ a.rootid
from NEW_TEMP_polineitemtab a, NEW_TEMP_requisitiontab b
where a.rootid = b.rootid(+)
and b.rootid is null;

(a do it yourself hash anti-join)

IN & EXISTS

April 19, 2006 - 9:33 am UTC

Reviewer: jash from India

What is b.rootid(+) significance in this query

Tom Kyte

Followup  

April 19, 2006 - 9:37 am UTC

outer join - you wanted to find rows in A, not in B - so we outer join A to B and only keep those where there was no match (b.rootid is null)

IN & EXISTS

April 19, 2006 - 9:44 am UTC

Reviewer: jash from India

There is no null values in b.rootid(column) NEW_TEMP_requisitiontab(table) what is the reason using is null

Tom Kyte

Followup  

April 19, 2006 - 10:25 am UTC

when you outer join T1 to T2, SQL will "make up" a NULL row in T2 to join to when there isn't a row to be joined to.

So, the following two queries are semantically equivalent (assuming t2.x is NOT NULL)

ops$tkyte@ORA9IR2> create table t1 ( x int );
Table created.
 
ops$tkyte@ORA9IR2> create table t2 ( x int NOT NULL );
Table created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 2 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> select * from t1 where x not in (select x from t2);
 
         X
----------
         2
 
ops$tkyte@ORA9IR2> select t1.*
  2    from t1, t2
  3   where t1.x = t2.x(+)
  4     and t2.x is null;
 
         X
----------
         2
 

EXISTS vs "where rownum <=1"

May 04, 2006 - 7:03 pm UTC

Reviewer: K. Cruz from WashingtonDC

Thanks Tom for your valuable support.

I'm reviewing potentially bad queries and ran across a query that looks for the existence of a value in a table. This query looked suspicious to me and wanted to get your feedback.

SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
FROM NF_FILES NF,
NF_RPC_LOCATION NRL
WHERE NF.SUFFIX = :B3
AND NF.OFFICE_ID = NRL.OFFICE_ID
AND NF.SECTION_CODE = NRL.SECTION_CODE
AND NF.RESP_CODE = NRL.RESP_CODE
AND NF.OFFICE_ID = :B2
AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
AND ROWNUM <= 1

After rewriting the query so that it uses an EXIST clause, the explain plan for both seemed to look very similar.

17:48:53 nfuser@NFDEV> l
1 SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
2 FROM NF_FILES NF,
3 NF_RPC_LOCATION NRL
4 WHERE NF.SUFFIX = :B3
5 AND NF.OFFICE_ID = NRL.OFFICE_ID
6 AND NF.SECTION_CODE = NRL.SECTION_CODE
7 AND NF.RESP_CODE = NRL.RESP_CODE
8 AND NF.OFFICE_ID = :B2
9 AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
10* AND ROWNUM <= 1
17:48:54 nfuser@NFDEV> /

D
-
N

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=43)
1 0 SORT (AGGREGATE)
2 1 COUNT (STOPKEY)
3 2 NESTED LOOPS (Cost=7 Card=1 Bytes=43)
4 3 PARTITION RANGE (SINGLE)
5 4 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES' (Cost=5 Card=1 Bytes=31)
6 5 INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
7 3 INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I'(NON-UNIQUE) (Cost=2 Card=1 Bytes=12)


17:50:15 nfuser@NFDEV> l
1 SELECT 'Y'
2 FROM DUAL
3 WHERE EXISTS ( SELECT 1
4 FROM NF_FILES NF,
5 NF_RPC_LOCATION NRL
6 WHERE NF.SUFFIX = :B3
7 AND NF.OFFICE_ID = NRL.OFFICE_ID
8 AND NF.SECTION_CODE = NRL.SECTION_CODE
9 AND NF.RESP_CODE = NRL.RESP_CODE
10 AND NF.OFFICE_ID = :B2
11* AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY') )
17:50:16 nfuser@NFDEV> /

no rows selected

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=17 Card=8168)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'DUAL' (Cost=17 Card=8168)
3 1 NESTED LOOPS (Cost=7 Card=1 Bytes=43)
4 3 PARTITION RANGE (SINGLE)
5 4 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES'(Cost=5 Card=1 Bytes=31)
6 5 INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
7 3 INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I' (NON-UNIQUE) (Cost=2 Card=1 Bytes=12)

NF_FILES is a 300 million row table so based on your previous advice, my outer query was the DUAL table and my inner query was the large NF_FILES table. I'm not able to determine which is the most efficient and the original query has been flagged and resource intensive. Could you provide some feedback on which approach makes more sense?

Thank you!!



Tom Kyte

Followup  

May 05, 2006 - 1:35 am UTC

the original doesn't have to read dual, yours does the work of the original PLUS the reading of dual.

hopefully the index nf_offid_p1_idx has suffix and last_transaction_date in it as well (concatenated index)


WHERE NF.SUFFIX = :B3
7 AND NF.OFFICE_ID = NRL.OFFICE_ID
8 AND NF.SECTION_CODE = NRL.SECTION_CODE
9 AND NF.RESP_CODE = NRL.RESP_CODE
10 AND NF.OFFICE_ID = :B2
11* AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')

I'd want an index on (office_id,suffix,last_transaction_date) probably.

May 10, 2006 - 9:38 am UTC

Reviewer: Reader

I have a table called person. I need to update the last name and birth date of person_no 2 with last name and birthdate of person_no 1. "id" is the primary key of the person table.

Can you please let me know how to do this update?

create sequence id_seq start with 1;

create table person
(id number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
,CONSTRAINT pk_id PRIMARY KEY(id) )

insert into person
values
(id_seq.nextval,1,'ABC','AAA','19991120');

insert into person
values
(id_seq.nextval,2,'DEF','','00000000');

insert into person
values
(id_seq.nextval,1,'GHI','KLM','19921021');

insert into person
values
(id_seq.nextval,2,'NPQ','','00000000');

insert into person
values
(id_seq.nextval,1,'RST','UVW','19721215');

insert into person
values
(id_seq.nextval,2,'XYZ','','00000000');

insert into person
values
(id_Seq.nextval,1,'TTT','GGG','19750324');

select * from person;

ID PERSON_NO FIRST_NM LAST_NM BIRTH_DT
--- -------- -------- ------- --------
1 1 ABC AAA 19991120
2 2 DEF 00000000
3 1 GHI KLM 19921021
4 2 NPQ 00000000
5 1 RST UVW 19721215
6 2 XYZ 00000000
7 1 TTT GGG 19750324


Thanks

Tom Kyte

Followup  

May 10, 2006 - 10:16 am UTC

you seem to have person_no = 1 and 2 in there more than once.

It is not at all clear how you would pick which person_no=1 to use to update all of the person_no = 2 records.

May 10, 2006 - 11:42 am UTC

Reviewer: Reader

Tom,
I am sorry for not being clear on the above question. Please find below the details of the tables.

I need to update the last name and birth date of
person_no 2 with last name and birthdate of person_no 1.
of the person table.

Can you please let me know how to do this update?



create sequence id_pseq start with 200;


create table person
(id_person number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
)

insert into person
values
(id_pseq.nextval,1,'ABC','AAA','19991120');

insert into person
values
(id_pseq.nextval,2,'DEF','','00000000');

insert into person
values
(id_pseq.nextval,1,'GHI','KLM','19921021');

insert into person
values
(id_pseq.nextval,2,'NPQ','','00000000');

insert into person
values
(id_pseq.nextval,1,'RST','UVW','19721215');

insert into person
values
(id_pseq.nextval,2,'XYZ','','00000000');

insert into person
values
(id_pseq.nextval,1,'TTT','GGG','19750324');


create sequence id_pbseq start with 100;

create table person_i_bnft
(id_ibnft number primary key
,id_person number references person(id_person)
,person_no number
,id_bnft number references person_bnft(id_bnft)
,age number
);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,200,1,3000,20);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,201,2,3000,25);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,202,1,3001,30);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,203,2,3001,35);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,204,1,3002,40);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,205,2,3002,45);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,206,1,3003,50);

create sequence id_bseq start with 3000;

create table person_bnft
(id_bnft number primary key
,amt varchar2(6)
);

insert into person_bnft
values
(id_bseq.NEXTVAL,'$1000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$2000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$3000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$4000');


Data from tables:
PErson table:
ID_PERSON PERSON_NO FIRST_NM LAST_NM BIRTH_DT
200 1 ABC AAA 19991120
201 2 DEF 00000000
202 1 GHI KLM 19921021
203 2 NPQ 00000000
204 1 RST UVW 19721215
205 2 XYZ 00000000
206 1 TTT GGG 19750324


PErson_i_bnft
ID_IBNFT ID_PERSON PERSON_NO ID_BNFT AGE
100 200 1 3000 20
101 201 2 3000 25
102 202 1 3001 30
103 203 2 3001 35
104 204 1 3002 40
105 205 2 3002 45
106 206 1 3003 50


Person_bnft
ID_BNFT AMT
3000 $1000
3001 $2000
3002 $3000
3003 $4000

Thanks

Tom Kyte

Followup  

May 11, 2006 - 7:34 am UTC

no, this is still as clear as a puddle of mud.

sigh, do you see that you have "FOUR PERSON_NO=1" records. You have "THREE PERSON_NO=2" records. You have only said:

...
I need to update the last name and birth date of
person_no 2 with last name and birthdate of person_no 1.
of the person table.
.......

No one can answer your question - there is no answer to your question. Your question doesn't make sense yet. Which of the four should be applied to the three, and why.


May 11, 2006 - 8:04 am UTC

Reviewer: Reader

ID_PERSON PERSON_NO FIRST_NM LAST_NM BIRTH_DT
200 1 ABC AAA 19991120
201 2 DEF 00000000
202 1 GHI KLM 19921021
203 2 NPQ 00000000
204 1 RST UVW 19721215
205 2 XYZ 00000000
206 1 TTT GGG 19750324


I need to update (last_nm,birth_dt) of ID_PERSON=201(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=200, if the birth_dt of 201 is '00000000' and last_nm of 201 is null

I need to update (last_nm,birth_dt) of ID_PERSON=203(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=202, if the birth_dt of 202 is '00000000' and last_nm of 202 is null

I need to update (last_nm,birth_dt) of ID_PERSON=205(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=204, if the birth_dt of 204 is '00000000' and last_nm of 204 is null.

Thanks

Tom Kyte

Followup  

May 11, 2006 - 7:32 pm UTC

and how pray tell did you "know that"

logic - tell use the LOGIC here.

why does 200 go with 201, are they always sequential, is 200 a fixed constant or will that change.

Yes, it is now "semi-obvious" what you seem to be saying, but I want you to say it, out loud here.

What if the row with id_person = 202 is missing for example. CONSIDER ALL OF THE CASES, be detailed, be explicit, state requirements - go 100% of the way.

May 11, 2006 - 11:48 am UTC

Reviewer: Reader

I am trying to do the following update, but I get the error
"ORA-01779: cannot modify a column which maps to a non key-preserved table "

Can you please let me know what is wrong here?


update
(
select t1.fm1,t1.lm1,t2.fm2,t2.lm2
from
(select p11.ID_PERSON id1
,p11.FIRST_NM fm1
,p11.LAST_NM lm1
,p11.birth_dt dob1
,p11.PERSON_NO per_no1
,p13.ID_BNFT T1_ID_BNFT
from person p11
,person_i_bnft p12
,person_bnft p13
where p11.ID_PERSON = p12.ID_PERSON
and p12.ID_BNFT = p13.ID_BNFT
and p11.PERSON_NO = 1
and p11.BIRTH_DT <> '00000000'
) T1,
(select p21.ID_PERSON id2
,p21.FIRST_NM fm2
,p21.LAST_NM lm2
,p21.birth_dt dob2
,p21.PERSON_NO per_no2
,p23.ID_BNFT T2_ID_BNFT
from person p21
,person_i_bnft p22
,person_bnft p23
where p21.ID_PERSON = p22.ID_PERSON
and p22.ID_BNFT = p23.ID_BNFT
and p21.PERSON_NO = 2
and p21.BIRTH_DT = '00000000'
) T2
where T1_ID_BNFT = T2_ID_BNFT)
set lm2=lm1
,dob2=dob1;

Thanks


Tom Kyte

Followup  

May 11, 2006 - 7:55 pm UTC

did you search for "key preserved"

I only ask because those words have not yet appeared here at all and they seem like a logical thing to search for?



are these two queries the same...?

June 02, 2006 - 5:08 am UTC

Reviewer: A reader

Hi

I have a query which uses EXISTS I modified it to a join (reduced 50% logical I/O) however for the test data we have it seems correct but I am not 100% sure, may you have a look and see if they are the same?

SELECT *
FROM BCOMMON c
WHERE c.type = 'B'
AND c.i_period = 1
AND EXISTS (
SELECT 1
FROM BCOMMON c2
WHERE c2.type = c.type
AND c2.id_bonif_periodica = c.id_bonif
AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601')


SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
AND to_char(c2.F_IMPUTACION, 'YYYYMMDD')='20060601'

Tom Kyte

Followup  

June 02, 2006 - 10:51 am UTC

No.

you don't have a predicate on the second query to make sure that c2.type is 'B'
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot just join (it'll increase the number of returned records)

and please, never do this:

AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601'

do this:

and f_imputacion = to_date( '20060601', 'yyyymmdd' )


or, if the date column has a time component:

and f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and f_imputacion < to_date( '20060601', 'yyyymmdd' )+1



thanks

June 02, 2006 - 6:29 pm UTC

Reviewer: A reader

Hi Tom

Thanks for your valuable advices.

join returns more records

June 05, 2006 - 3:44 am UTC

Reviewer: A reader

Hi Tom

I added a predicate to make sure that c2.type is 'B' so the query becomes

SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c2.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
and c2.f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and c2.f_imputacion < to_date( '20060601', 'yyyymmdd' )+1

Since c2.ID_BONIF_PERIODICA is not unique I see more rows (with EXISTS only a rown is returned, with join two)

I guess join doesnt work in this case?

Sorry for my stupidness :-(


Tom Kyte

Followup  

June 05, 2006 - 7:49 am UTC

but that is just what I said above:

...
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot
just join (it'll increase the number of returned records)
.......



you can use IN, you can use EXISTS, you can join to an inline view that distincts - but if that column is not unique for the given attributes - you cannot "just join" really - it'll increase the number of records returned

Merge exists and not exists into one SQL statement

October 10, 2006 - 11:20 am UTC

Reviewer: Kandy Train from NY

Hi,

Hope this is an appropriate question for thread.
Here is an example,

drop table picking_tab;
drop table locations_tab;
create table picking_tab (pick_ticket_no number, location_id varchar2(10));
create table locations_tab (location_id varchar2(10), warehouse varchar2(10));

insert into picking_tab values(1001, 'Bos');
insert into picking_tab values(1002, 'Bos');
insert into picking_tab values(1002, 'Chi');
insert into picking_tab values(1003, 'Chi');

insert into locations_tab values('Bos', 'NorthEast');
insert into locations_tab values('Chi', 'MidWest');

commit;

What I would like to have is just one SQL statement so that I can pass one literal "NorthEast" or "MidWest" to a view with that SQL statement.

When I pass NorthEast, I want NorthEast specific Pick Tickets and any other pick ticket that picks from more locations, but at least one of them must be from NorthEast.

The results should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
1002 Bos NorthEast
1002 Chi MidWest
1003 Chi MidWest

When I pass MidWest, I only need MidWest Pick Tickets that are not shared with any other region.

The results should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
1003 Chi MidWest

The example may look silly, as I have to come up with something simple for you to help me.

I appreciate your help as always.

Tom Kyte

Followup  

October 10, 2006 - 7:57 pm UTC

why are there not two northeast records in the first result?

and you know, you do not "pass" things to a view, you apply predicates to a view, what do you mean by "pass"???

Merge exists and not exists into one SQL statement

October 10, 2006 - 11:36 am UTC

Reviewer: Kandy Train from NY

In the above posting, I didn't mention what I have tried so far. I am putting it here in case you wanted to know it.

--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
1003 Chi MidWest

To get the above results, I used

select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and not exists (select lt.warehouse
from picking_tab pt, locations_tab lt
where pt.pick_ticket_no = p.pick_ticket_no
and pt.location_id = lt.location_id
and lt.warehouse = 'NorthEast');

--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
1002 Bos NorthEast
1002 Chi MidWest
1003 Chi MidWest

To get the above results, I used
select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse
from picking_tab pt, locations_tab lt
where pt.pick_ticket_no = p.pick_ticket_no
and pt.location_id = lt.location_id
and lt.warehouse = 'MidWest');

I want to have just one SQL where I could pass either "MidWest" or "NorthEast".

Thanks,




Tom Kyte

Followup  

October 10, 2006 - 8:04 pm UTC

see above, why just one northeast, doesn't make sense.

Merge exists and not exists into one SQL statement

October 10, 2006 - 8:19 pm UTC

Reviewer: Kandy train from NY

Hi Tom,

You are right and my first result (in first posting) should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
1001 Bos NorthEast
1002 Bos NorthEast
1002 Chi MidWest

And this is the SQL statement I have used to get this results.

select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse
from picking_tab pt, locations_tab lt
where pt.pick_ticket_no = p.pick_ticket_no
and pt.location_id = lt.location_id
and lt.warehouse = 'NorthEast');

As you can see, It is the same SQL statement, and the only difference is in EXISTS and NOT EXISTS clauses.

I have a ref cursor that has a return type. I tried to use native dynamic SQL by creating two views with these two sql statements and did hit a error, which is a limitation in 9i.

So, now I have to create a one view that can produce result sets in one SQL statement.

May be I am making it harder for you to understand.

If that's the case, I will post whole procedure togather with the ref cursor tomorrow.

Thanks,



Tom Kyte

Followup  

October 10, 2006 - 8:31 pm UTC

ok, why just one midwest then, why not both.

Merge exists and not exists into one SQL statement

October 11, 2006 - 6:34 am UTC

Reviewer: Kandy Train from NY

Pick_TIcket_Id 1002 has two records (Picking from two locations).

When the SQL is run from MidWest, They should only get the Pick Tickets that are purely from MidWest. If a Pick Ticket is filled from NorthEast and MidWest, it is not handled from MidWest, but from NorthEast.

When the SQL is run from NorthEast, They should get all the
Pick tickets that at least have a record in pickcing_tab for the same pick_ticket_no.

Does that make sense??

Thanks,

Tom Kyte

Followup  

October 11, 2006 - 8:15 am UTC

nope, sorry it does not, this is sounding alot like "procedural logic" to me here, the rules seem arbitrary.

Merge exists and not exists into one SQL statement

October 11, 2006 - 11:02 am UTC

Reviewer: Kandy Train from NY

Thanks and I will use Procedural code to solve this issue.

Thanks again and appreciate your help

Performance Issue when using Exists...

November 02, 2006 - 12:11 pm UTC

Reviewer: VLS from Bombay, India

Hi Tom,

In recent Oracle Magazine, in one of the example, you mentioned that exists will perform better in cases where the columns of only one table is used in SELECT statement. For example,

select x.a, x.b. x.c
from x, y
where x.a = y.a
and x.b = :b1

can be coded as

select x.a, x.b. x.c
from x
where x.b = :b1
and exists
(select 'x' from y
where y.a = x.a)

But, in my cases, I have observed that the query performance degrades dratiscally when I change these type of queries to make use of EXISTS. For example,

SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM (select customer_id, custcode, birthdate, billcycle, csclimit,
prgcode, rownum srno from customer_all
where prgcode in
(select prgcode from hss_cust_segment where description='POST')) ca,
contract_all co,
curr_co_status ch,
ccontact_all cc,
directory_number dn,
pricegroup_all prg,
mputmtab tm,
title t,
storage_medium sm
WHERE ca.customer_id = co.customer_id
AND cc.customer_id = ca.customer_id
AND cc.ccbill = 'X'
AND cc.CCTITLE = t.TTL_ID(+)
AND co.co_id = ch.co_id
AND ch.ch_status IN ('a', 's')
AND ca.prgcode = prg.prgcode
AND co.tmcode = tm.tmcode
AND tm.vscode = 1
and exists
(select 'x' from contr_services_cap cs
where cs.co_id = co.co_id
AND cs.sncode = 1
AND cs.dn_id = dn.dn_id)
and exists
(select 'x' from contr_devices cd
where cd.co_id = co.co_id
AND cd.cd_deactiv_date IS NULL
AND cd.port_id = sm.sm_id);

Above query did not fetch result in 6 hours whereas below query executed in 4 hours.

SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM (select customer_id, custcode, birthdate, billcycle, csclimit,
prgcode, rownum srno from customer_all
where prgcode in
(select prgcode from hss_cust_segment where description='POST')) ca,
contract_all co,
curr_co_status ch,
ccontact_all cc,
directory_number dn,
pricegroup_all prg,
mputmtab tm,
title t,
storage_medium sm,
contr_devices cd,
contr_services_cap cs
WHERE ca.customer_id = co.customer_id
AND cc.customer_id = ca.customer_id
AND cc.ccbill = 'X'
AND cc.CCTITLE = t.TTL_ID(+)
AND co.co_id = ch.co_id
AND ch.ch_status IN ('a', 's')
and cs.co_id = co.co_id
AND cs.sncode = 1
AND cs.seqno = (SELECT MAX (seqno)
FROM contr_services_cap x
WHERE cs.co_id = x.co_id AND x.sncode = 1)
AND ca.prgcode = prg.prgcode
AND co.tmcode = tm.tmcode
AND tm.vscode = 1
AND cs.dn_id = dn.dn_id
and cd.co_id = co.co_id
AND cd.cd_deactiv_date IS NULL
AND cd.port_id = sm.sm_id;


Similarly, there were other incidence when exists has caused performance issues. Also, the tables used inside the exists condition are big and have proper indexes even though these fail.

What could be the problem ? Please help...

Regards
VLS

Tom Kyte

Followup  

November 02, 2006 - 12:17 pm UTC

I did not say "would work better"

I said "semantically and syntactically it would be MORE MEANINGFUL"

are you using the cbo.

Query Performance with Exists

November 02, 2006 - 10:14 pm UTC

Reviewer: VLS from Bombay, India

Yes Tom, I am using CBO. The inner query also uses a proper indexes.

Regards
VLS

Tom Kyte

Followup  

November 03, 2006 - 9:11 am UTC

those two queries do not seem to be even remotely the same here?

how can you even compare them? they do not return the same results.

Difference between the 2 Queries

November 03, 2006 - 9:41 am UTC

Reviewer: VLS from Bombay, India

Hi Tom,

Would appreciate if you can let me know the difference between the two queries. Second query is the original query and first one is modified by me. In the Original query, contr_services_cap is joined back to contr_services_cap to get max(seqno) so that only one co_id is returned. I tried to avoid this aggregation by using EXISTS and thought that aggregation should resource intensive.

Regards
VLS

Tom Kyte

Followup  

November 03, 2006 - 10:52 am UTC

that is precisely the bit that seems strange - how can they be equivalent?

one says "keep only the row with the max seqno"

one says "keep any row that has a mate over here"

first query - no predicate on seqno
second query - predicate on seqno

don't get how they are the same.

In response to my previous post

November 03, 2006 - 12:33 pm UTC

Reviewer: VLS from Bombay, India

Hi Tom,

"one says "keep only the row with the max seqno"" v/s one says "keep any row that has a mate over here". How does it matter ? For example,

Table A

Column x Column y
------------------------
100 128
100 290
100 300
100 180

200 768
200 190

300 650

select x from a o
where y = (select max(y) from a i where i.x=o.x.x)

will be

100
200
300
No matter 300 has only one value but this one value will be a max for this value of x. So, anyway each of x value will have a max. I used this logic and hence instead of taking max used EXISTS.

Now, for the predicates on SEQNO, this predicate is only used to get a single row for each co_id. Again using max. This was not required when I used EXISTS.

Regards
VLS


Tom Kyte

Followup  

November 03, 2006 - 2:33 pm UTC

you cannot compare the two queries you have.

where exists (something)

where something = (select max(somethingelse)


are not comparable. Show me the WHERE exists that is comparable to you:

select x from a o
where y = (select max(y) from a i where i.x=o.x.x)

please!!

I don't have (nor do I want) your schema
I don't know your relationships
I did not decompose your query

I do however have a strong feeling "they just ain't the same"

exits perform better?

January 18, 2007 - 9:10 am UTC

Reviewer: A reader

Hi Tom
I had a query like this : which was doing over 2250 LIO/execute -

select DISTINCT
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type, 'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION', 'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL, 'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod BO128_Accounts_Mod, ao_demomod BO129_Demographic_Mod
where BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID AND
( ( ( ( ( ( ( ( ( BO128_Accounts_Mod.MakerID = 1 ) OR (
BO128_Accounts_Mod.ownedUserID = 400013 ) ) ) OR (
BO128_Accounts_Mod.AssignedTo = 400013 ) ) ) AND ( ( ( ( (
BO128_Accounts_Mod.RecordStatus = 'F' ) OR (
BO128_Accounts_Mod.RecordStatus = 'F' ) ) ) OR (
BO128_Accounts_Mod.RecordStatus = 'F' ) ) ) ) ) AND (rownum <= 201) ) )
order by BO128_Accounts_Mod.Cust_Last_Name Asc

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 15 0.10 0.08 0 443 0 201
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 17 0.11 0.09 0 443 0 201

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38 (CRMUSER)

Rows Row Source Operation
------- ---------------------------------------------------
201 SORT UNIQUE (cr=2252 pr=0 pw=0 time=951615 us)
201 COUNT STOPKEY (cr=443 pr=0 pw=0 time=4815 us)
201 NESTED LOOPS (cr=443 pr=0 pw=0 time=4812 us)
206 TABLE ACCESS FULL AO_ACCTMOD (cr=235 pr=0 pw=0 time=3567 us)
201 INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=208 pr=0 pw=0 time=1943 us)(object id 18490)

when I changed it to like below (used exists instead of distinct) then it is doing alot less LIO's and peforms better too (in terms of response time) .

select
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type, 'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION', 'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL, 'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod BO128_Accounts_Mod
where exists ( select null from
ao_demomod BO129_Demographic_Mod
where
BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID AND
( ( BO128_Accounts_Mod.MakerID = 1 ) OR ( BO128_Accounts_Mod.ownedUserID = 400020 ) OR ( BO128_Accounts_Mod.AssignedTo = 400020 ) ) AND ( BO128_Accounts_Mod.RecordStatus = 'F' OR BO128_Accounts_Mod.RecordStatus = 'F' OR BO128_Accounts_Mod.RecordStatus = 'F' ) )
AND rownum <= 201
order by BO128_Accounts_Mod.Cust_Last_Name Asc

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 15 0.07 0.17 0 735 0 201
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 17 0.07 0.17 0 735 0 201

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 38 (CRMUSER)

Rows Row Source Operation
------- ---------------------------------------------------
201 COUNT STOPKEY (cr=735 pr=0 pw=0 time=5822 us)
201 FILTER (cr=735 pr=0 pw=0 time=5618 us)
206 TABLE ACCESS BY INDEX ROWID AO_ACCTMOD (cr=323 pr=0 pw=0 time=2705 us)
206 INDEX FULL SCAN IX_AO1_CUST_LAST_NAME (cr=17 pr=0 pw=0 time=225 us)(object id 18495)
201 FILTER (cr=412 pr=0 pw=0 time=2461 us)
201 INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=412 pr=0 pw=0 time=2071 us)(object id 18490)

my question - what is the reason - how distinct peforms poor than exists..

Ajeet

change this query

March 26, 2007 - 7:47 am UTC

Reviewer: A reader

select PGM_VEH_CONFIG.CNTRY_CD
, PGM_VEH_CONFIG.LANG_CD
, PGM_VEH_CONFIG.BUSNS_FCN_USE_CD
, PGM_VEH_CONFIG.MODL_YR_NBR
, PGM_VEH_CONFIG.PVC_ID
, RTRIM(PGM_VEH_CONFIG.PVC_DESC)
, PGM_VEH_CONFIG.PVC_MSRP
, TO_CHAR(PGM_VEH_CONFIG.PVC_EFCTV_DT,'yyyy-mm-dd')
, TO_CHAR(PGM_VEH_CONFIG.PVC_TERMNT_DT,'yyyy-mm-dd')
, PGM_VEH_CONFIG.SELLNG_SRC_CD
, PGM_VEH_CONFIG.NAMPLT_CD
, PGM_VEH_CONFIG.MDSNG_MODL_DESGTR
, PGM_VEH_CONFIG.USG_SEQ_NBR
, PGM_VEH_CONFIG.MKTG_BODY_STYLE_CD
, PGM_VEH_CONFIG.SUB_BRAND_CD
, PGM_VEH_CONFIG.PACKG_OPTN_CD
, PGM_VEH_CONFIG.SEGMENT_CD
, PGM_VEH_CONFIG.COMPRS_MODL_CD
, PGM_VEH_CONFIG.CONFIG_MODL_CD
, PGM_VEH_CONFIG.OPTN_USG_STMT
from gmceycea.PGM_VEH_CONFIG
, gmceycea.PRICE_RANGE
where PGM_VEH_CONFIG.BUSNS_FCN_USE_CD ='038'
and PGM_VEH_CONFIG.CNTRY_CD = PRICE_RANGE.CNTRY_CD
and PGM_VEH_CONFIG.LANG_CD = PRICE_RANGE.LANG_CD
and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD = PRICE_RANGE.BUSNS_FCN_USE_CD
and PGM_VEH_CONFIG.PVC_MSRP between PRICE_RANGE.START_PRICE_RANGE and PRICE_RANGE.END_PRICE_RANGE
AND EXISTS (SELECT * from gmceycea.PVC_COLR
where pvc_colr.cntry_cd = pgm_veh_config.cntry_cd
and pvc_colr.lang_cd = pgm_veh_config.lang_cd
and pvc_colr.busns_fcn_use_cd = pgm_veh_config.busns_fcn_use_cd
and pvc_colr.modl_yr_nbr = pgm_veh_config.modl_yr_nbr
and pvc_colr.pvc_id = pgm_veh_config.pvc_id
and pvc_colr.busns_fcn_use_cd = '038'
and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038');

if PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038'
column having the value 038 then EXISTS part should work otherwise exists part should not work.
kindly rephrase it.








Tom Kyte

Followup  

March 26, 2007 - 11:06 am UTC

huh?

Question

March 27, 2007 - 9:15 am UTC

Reviewer: Reader

Hello,
How do we make Oracle to use index if the query has NOT IN?

SELECT *
FROM ORDERS
WHERE STATUS NOT IN ('NEW','OLD')

I created a Function Based Index.

CREATE INDEX IDX_FBI ON ORDERS (DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'

SELECT *
FROM ORDERS
WHERE DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'

It does a Full table scan but when I do a COUNT(*) it does a Fast Full Index scan. How to use index while using NOT IN?

Thanks
Tom Kyte

Followup  

March 27, 2007 - 10:16 am UTC

no really, what index did you create - given the one you demonstrate does not work.

and what version are you using.


ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table t;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select decode( mod(rownum,3),0,'OLD',1,'NEW'  ) ostatus, a.*
ops$tkyte%ORA10GR2>   from all_objects a;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> drop index t_idx;

Index dropped.

ops$tkyte%ORA10GR2> create index t_idx on t( decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) );

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where  decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) = 'YES';

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 48000 |    44   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   500 | 48000 |    44   (0
|*  2 |   INDEX RANGE SCAN          | T_IDX |   200 |       |    35   (0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(DECODE("OSTATUS",'NEW',NULL,'OLD',NULL,'YES')='YES')

ops$tkyte%ORA10GR2> set autotrace off


March 28, 2007 - 10:57 am UTC

Reviewer: A reader

Tom,

I have a problem with a select I am not able to built

consider that

select * from t
where decode (:x, null,'999', t.a) in (select t2.a from t2 where t2.a = :x)

This is not working when :x is null

When :x is null I will normally have this
select * from t
where '999'in (select t2.a from t2 where t2.a = '')

What I would like to have in this case is a sort of
select * from t
where '999' in ('999')

Could you please help me

Thanks a lot for your help

Tom Kyte

Followup  

March 28, 2007 - 12:14 pm UTC

... This is not working when :x is null ...

sure it is, it works perfectly. It just probably doesn't do what you want - that is something else altogether.

select *
  from t
 where :x is null or a in (select t2.a from t2 where t2.a = :x)


seems to say exactly what you meant.

NOT IN too slow

May 04, 2007 - 2:31 am UTC

Reviewer: Sagar from Mumbai India

Hi Tom,
I have one small NOT IN query on 2 tables which when run on 9i database takes more than 2 hours, I had to kill the session without ouput.But IN works in 2 seconds!.
I copied same tables and created same indexes in 10gR2 , now the query takes around 5-6 minutes in 10gR2.
For Details see below.I am interested in knowing what causes consistent gets to be so large. There is no one connected to the user other than me.


SQL> select count(1) from camps_upload ;

  COUNT(1)
----------
     43656

Elapsed: 00:00:02.65

Execution Plan
----------------------------------------------------------
Plan hash value: 889239046

---------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |   184   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |              |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_UPLOAD | 43656 |   184   (2)| 00:00:03 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        805  consistent gets
        781  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(1) from camps_master_upload ;

  COUNT(1)
----------
     38266

Elapsed: 00:00:07.71

Execution Plan
----------------------------------------------------------
Plan hash value: 1748044587

----------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |   182   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |                     |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD | 38266 |   182   (2)| 00:00:03 |
----------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        338  recursive calls
          0  db block gets
        852  consistent gets
        792  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select bytes/(1024*1024) usage from user_segments where segment_name  IN ('CAMPS_UPLOAD','CAMPS
_MASTER_UPLOAD') ;

     USAGE
----------
         7
         7

Elapsed: 00:00:04.75

Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view


SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
       2017  recursive calls
          0  db block gets
       4111  consistent gets
        309  physical reads
          0  redo size
        444  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         56  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL> select count(1) from camps_upload where 
  2  (folio_no,comp_code) in ( select folio_no,comp_code from camps_master_upload)  ;

  COUNT(1)
----------
     38121

Elapsed: 00:00:02.50

Execution Plan
----------------------------------------------------------
Plan hash value: 123087718

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| IND_CMP_UPD      | 43656 |   554K|    34   (3)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        551  recursive calls
          0  db block gets
        380  consistent gets
        273  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload where
  2* (folio_no,comp_code) not in ( select folio_no,comp_code from camps_master_upload)
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:05:57.61

Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
         20  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 


Even though index is not used why consistent gets are so abnormally high?.


Tom Kyte

Followup  

May 04, 2007 - 1:00 pm UTC

the camps_master_upload table was full scanned 43,656 times.

NOT IN too slow

May 05, 2007 - 1:22 am UTC

Reviewer: Sagar from Mumbai , India

Thanks Tom,
I have used "NOT EXIST" instead of "NOT IN" and the query executes within 3-4 seconds.Both tables do not have NULLs in the columns used in the query.


SQL> ED
Wrote file afiedt.buf

  1  select  COUNT(1) FROM camps_upload a
  2   where   not exists   ( select 1  from camps_master_upload b where a.folio_no =b.folio_no
  3* and a.comp_code = b.comp_code  )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:00:03.46

Execution Plan
----------------------------------------------------------
Plan hash value: 1970687259

--------------------------------------------------------------------------------
-----------

| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)|
 Time     |

--------------------------------------------------------------------------------
-----------

|   0 | SELECT STATEMENT       |                  |     1 |    26 |   220   (5)|
 00:00:03 |

|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |
          |

|*  2 |   HASH JOIN RIGHT ANTI |                  | 43655 |  1108K|   220   (5)|
 00:00:03 |

|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)|
 00:00:01 |

|   4 |    TABLE ACCESS FULL   | CAMPS_UPLOAD     | 43656 |   554K|   188   (4)|
 00:00:03 |

--------------------------------------------------------------------------------
-----------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."FOLIO_NO"="B"."FOLIO_NO" AND "A"."COMP_CODE"="B"."COMP_CODE")


Statistics
----------------------------------------------------------
         17  recursive calls
          0  db block gets
        939  consistent gets
        239  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



Can you please explain how "NOT IN" works as "IN" executes within 3-4 seconds even if we use no indexes.This will be helpful for all.


Thanks in advance.






Wrote file afiedt.buf

  1   select  COUNT(1) FROM camps_upload
  2    where (folio_no,comp_code)   in
  3*   ( select folio_no,comp_code from camps_master_upload)
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:12.18

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

--------------------------------------------------------------------------------
-------------

| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU
)| Time     |

--------------------------------------------------------------------------------
-------------

|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4
)| 00:00:05 |

|   1 |  SORT AGGREGATE       |                     |     1 |    26 |
 |          |

|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4
)| 00:00:05 |

|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3
)| 00:00:03 |

|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4
)| 00:00:03 |

--------------------------------------------------------------------------------
-------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        398  recursive calls
          0  db block gets
       1662  consistent gets
        559  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
          1  rows processed


Tom Kyte

Followup  

May 08, 2007 - 10:04 am UTC

well, I don't know about you - but I see a "not exists" and an "in"

that is

I see entirely different queries here that do entirely different things and return entirely different answers.

NOT IN too slow

May 09, 2007 - 1:40 am UTC

Reviewer: Sagar from Mumbai India.

Thanks for the reply Tom,
Actually my question was continuation of the previously asked quetion just above.
Anyway here is the problem I am facing.I have a simple "IN" query which executes in less than 5 seconds.If I put "NOT IN" in place of "IN" the query takes more than 5 minutes.Both the tables have around 40,000 records.You had replied that one of the table is full scanned 40,000 times in case of "NOT IN".Then why it is not full scanned it case of "IN"?.

Thanks in advance.


SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:04.28

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        654  recursive calls
          0  db block gets
       1704  consistent gets
       1596  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) not in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:05:28.67

Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
       1591  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

   

Tom Kyte

Followup  

May 11, 2007 - 9:28 am UTC

because in and not in are ENTIRELY DIFFERENT

I don't know why you would even consider comparing them.


NOT IN too slow

May 12, 2007 - 4:10 am UTC

Reviewer: Sagar from Mumbai , India

Thanks for the reply Tom,
Yes IN and NOT IN are different, but from point of view of algorithm for finding elements from the set A also in the set B (the IN process ) and finding elements in set A that are not in set B ( the NOT IN process) they are very much similar.

Tom Kyte

Followup  

May 14, 2007 - 12:55 pm UTC

huh, you are kidding right?

they are very different.

think about this - you are asked to find "where x = 5", that is very very very different from "where x <> 5"

think about what you just said. Other than the fact both are written in SQL, there is nothing similar about them at all.

NOT IN too slow

May 16, 2007 - 5:16 am UTC

Reviewer: Sagar from Mumbai , India

Thanks for the reply Tom,
I am still not convinced.

Consider this manual process.

There is person having 2 boxes containing some objects.

Box1 has 2 objects.
Box2 has 3 objects.

1. He is told to find out objects from box1 that are also in box2.

Suppose he takes 1 minute to finish.This is the IN process.

2. Now we ask him to find objects from box1 that are not there in box2. This is NOT IN process.

For this he takes 100 minutes!.
(the eg. I showed you for IN executes within 4 seconds while for NOT IN with same data it takes 400 seconds)

In this case can we believe him when he takes 100 minutes for this ?.

IN and NOT IN have similar algorithm only the difference is that search is carried out till last element in the case of NOT IN but I don't think this can justify the delay.






Tom Kyte

Followup  

May 16, 2007 - 10:31 am UTC

do you understand why

where x = 5

where x <> 5

would pretty much be processed very differently???

eg: where x = 5, obviously, a candidate for an index - find 5, get data

where x <> 5, obviously - NOT a candidate for an index, indexes are using to "find" stuff, not "not find" stuff.


in and not in are like NIGHT and DAY. eg: DIFFERENT.

NOT IN and filter

May 16, 2007 - 7:11 pm UTC

Reviewer: Jonathan Lewis from NY NY

Sagar, in one of your posts you say: "Both tables do not have NULLs in the columns used in the query." However, the predicate section of the execution plan shows that you have not declared these columns to be NOT NULL:

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


If you do not declare the columns as NOT NULL, the optimizer has to assume that they may contain nulls, hence the use of the lnnvl() function calls that disable the use of indexes, and leave Oracle doing a tablescan of the camps_master_upload table for (potientially) every row in the camps_upload table.

Declare the columns not null, or add '{column} is not null' predicates for all for column referenced, and you would probably get the same anti-join as you did with the 'not exists' version of the query. For related comments see: http://jonathanlewis.wordpress.com/2007/02/25/not-in/






NOT IN too slow

May 17, 2007 - 6:54 am UTC

Reviewer: Sagar from Mumbai , India.

Thanks Jonathan!!!
Jonathan you are absolutely right!!!.Thank you very much.
I just added not null constraint to the referenced columns and its DONE ;).See the output below.Its good example for oracle fans.

Also thanks Tom for taking his very precious time for this query.


SQL> alter table camps_master_upload modify (folio_no not null , comp_code not null ) ; 

Table altered.

SQL> alter table camps_upload modify (folio_no not null , comp_code not null ) ; 

Table altered.

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code)  in ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:00.54

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code) not  in ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:00:00.16

Execution Plan
----------------------------------------------------------
Plan hash value: 1887649868

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|                     | 43655 |  1108K|   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


NOT IN too slow...

May 17, 2007 - 8:14 am UTC

Reviewer: Sagar from Mumbai , India.

Note that there are no indexes on both the tables.




Tom Kyte

Followup  

May 17, 2007 - 11:25 am UTC

all I've been trying to get you to realize is:

not in and in are as fundamentally different as night and day are.

not in, especially with regards to nullality, is hugely different.

Thanks Tom

May 17, 2007 - 12:06 pm UTC

Reviewer: Sagar from Mumbai , India.

Thanks once again Tom,
Yes I got it now. Each day I learn something new here.Keep up the great work.I am a Tom Kyte fan ;).


I need this

October 23, 2007 - 7:36 am UTC

Reviewer: rebeccah tukisi from south africa,midrand

good afternoon Tom

please help me to register on your site
so that i can email question too, i kind of new to the site and the oracle world

do you also help with the discoverer queries/reports

from rebeccah
Tom Kyte

Followup  

October 23, 2007 - 11:14 am UTC

nobody registers here.

when i have time, I permit new questions to be asked - it is first come, first serve and catch as catch can.

March 10, 2008 - 7:30 am UTC

Reviewer: Sachin from India

Hi Tom ,

Look at below sql query ,
insert into rp_alert_cdr
select
/*+ INDEX(c, CDR_IDX2)*/
al_number_seq.currval,
c.*
from
cdr c
where
c.prof_num = lv_high_usage_data(i).plg_prof_num and
c.insert_dt >= lv_start_time and
c.insert_dt <= lv_end_time and
c.prof_num is not null and
c.toll_value > 0 and
c.b_intl_ind = 2 and
c.b_off_hook_ind = 1 and
c.orig_type <> 1 and
upper(c.DIALLED_DIGITS) Not Like '101%' and
c.term_type <> 0 and
nvl(c.cic,999999) not in ('*V','*U'); -- added new clause



My process is ,just inserting records from CDR table to RP_ALERT_CDR only those record which condition is qulifiy the where clause. this was running as per need,, now i have added new condition.

"and nvl(c.cic,999999) not in ('*V','*U')" -- added new clause

cdr table has CIC column varchar2(5) datatype. it can contain null record also .
in current time , we have only *V,*U and null record in CIC column, after adding condition with out using NVL
it gives 0 record
and when we use query with nvl function it give desire
result.







Tom Kyte

Followup  

March 10, 2008 - 11:40 am UTC

"and  nvl(c.cic,999999)  not in ('*V','*U')"  -- added new clause

cdr table has CIC column varchar2(5) datatype.



one utterly and completely wonders why you would nvl to a number then doesn't one? If you have a varchar2(5) - what could be the possible logic behind nvl'ing to 999,999???

but anyway - you have this column cic

it has in it:

a) *V
b) *U
c) a null value


Now, you want to keep all records where cic is not in *V, *U

Well, null is neither NOT IN, nor IN ('*v','*u', NULL is "not known"

so, where null not in (a,b) - is NOT TRUE (nor is it false)
where NOT(null not in (a,b)) - is NOT TRUE (nor is it false)

they are unknown.


so, currently, seems what you really meant to say was:

where cic is null


plain and simple

March 12, 2008 - 1:30 am UTC

Reviewer: sachin from india

Thanks Tom ,

i was missing this.

are these 2 SQLs are same

March 14, 2008 - 6:39 am UTC

Reviewer: Reene

Hi Tom

I have a SQL developed by some developer.

select ooh.order_number,count(ool.line_id)
from ont.oe_order_headers_all ooh,
ont.oe_order_lines_all ool,
wsh.wsh_delivery_Details wdd
where
exists
( select */ 'X'
from ont.oe_order_headers_all oh,
ont.oe_order_lines_all ol,
wsh.wsh_delivery_Details wdd,
ont.OE_SYSTEM_PARAMETERS_ALL osp
where oh.header_id=ol.header_id and
ol.line_id=wdd.source_line_id (+) and
( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
oh.org_id=3171 and
oh.order_source_id=1203 and
nvl(wdd.released_status,'x') <> 'C' and
nvl(wdd.released_status,'x') <> 'D' and
ol.shippable_flag='Y' and
ooh.order_number = oh.order_number
) and
ooh.order_source_id=1203 and
ooh.header_id=ool.header_id and
nvl(wdd.released_status,'x') <> 'D' and
ool.line_id=wdd.source_line_id (+) and
ool.shippable_flag='Y'
group by ooh.order_number
/

I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can be re-wrtitten as :

select oh.order_number,count(ol.line_id)
from ont.oe_order_headers_all oh,
ont.oe_order_lines_all ol,
wsh.wsh_delivery_Details wdd,
ont.OE_SYSTEM_PARAMETERS_ALL osp
where
oh.order_source_id=1203 and
oh.header_id=ol.header_id and
oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
oh.org_id=3171 and
nvl(wdd.released_status,'x') <> 'D' and
nvl(wdd.released_status,'x') <> 'C' and
ol.line_id=wdd.source_line_id (+) and
ol.shippable_flag='Y'
group by oh.order_number
/

is it correct .

Thanks
Tom Kyte

Followup  

March 15, 2008 - 9:37 am UTC

run it - and observe the two totally different answers you will likely get.

no, definitely not the same.

typo in above post

March 14, 2008 - 6:41 am UTC

Reviewer: Reene

select ooh.order_number,count(ool.line_id)
from ont.oe_order_headers_all ooh,
ont.oe_order_lines_all ool,
wsh.wsh_delivery_Details wdd
where
exists
( select 'X'
from ont.oe_order_headers_all oh,
ont.oe_order_lines_all ol,
wsh.wsh_delivery_Details wdd,
ont.OE_SYSTEM_PARAMETERS_ALL osp
where oh.header_id=ol.header_id and
ol.line_id=wdd.source_line_id (+) and
( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
oh.org_id=3171 and
oh.order_source_id=1203 and
nvl(wdd.released_status,'x') <> 'C' and
nvl(wdd.released_status,'x') <> 'D' and
ol.shippable_flag='Y' and
ooh.order_number = oh.order_number
) and
ooh.order_source_id=1203 and
ooh.header_id=ool.header_id and
nvl(wdd.released_status,'x') <> 'D' and
ool.line_id=wdd.source_line_id (+) and
ool.shippable_flag='Y'
group by ooh.order_number
/

I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can
be re-wrtitten as :

select oh.order_number,count(ol.line_id)
from ont.oe_order_headers_all oh,
ont.oe_order_lines_all ol,
wsh.wsh_delivery_Details wdd,
ont.OE_SYSTEM_PARAMETERS_ALL osp
where
oh.order_source_id=1203 and
oh.header_id=ol.header_id and
oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
oh.org_id=3171 and
nvl(wdd.released_status,'x') <> 'D' and
nvl(wdd.released_status,'x') <> 'C' and
ol.line_id=wdd.source_line_id (+) and
ol.shippable_flag='Y'
group by oh.order_number
/

is it correct .

Thanks


correct

April 07, 2008 - 4:05 am UTC

Reviewer: Reene

Thanks Tom,

even though the number of records fethced were same,but the data was different. that is order_numbers were same but the counts were different.thanks for poiniting it out.
understanding them a bit better now.

thanks

July 23, 2008 - 12:37 pm UTC

Reviewer: A reader from NYC, USA

Tom,
After reading this thread, i am running the IN & EXIST query for emp and dept table. Both the query shows the same execution plan. But it should be different. For IN query,optimizer should use index scan on emp table. For exists query, optimizer should use FTS on emp table. Please correct me if i am wrong..


SQL> select count(*) from emp;

  COUNT(*)
----------
    458752
 

SQL> select count(*) from dept;

  COUNT(*)
----------
         2
 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'EMP',ESTIMAT
E_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'DEPT',ESTIMA
TE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> 

SQL> select count(ename) from emp where deptno in(select deptno from dept);

COUNT(ENAME)
------------
         100


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)

   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SELECT COUNT(ENAME) from emp where exists (select null  from dept where dept.deptno =
 emp.deptno);

COUNT(ENAME)
------------
         100


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (SEMI) (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)

   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Query Tuning

November 10, 2008 - 3:15 am UTC

Reviewer: Thakur Manoj from India

Dear Sir

This query is taking 3 seconds to execute it

select distinct
atcf.base_entity_id
, be.entity_name_full As Cutomer_Name
, mbf.bank_facility_name AS Account_No
, atcf.proposed_limit AS Amount_Sanction
, atcf.os_balance AS Amount_Outstand
, rfdt.filter_date
, bd.mst_borrower_state_id
from aud_trn_c3_fac atcf
, rvw_cas_getchkfacdt rfdt
, base_entity be
, rvw_cas_stmt_borr_dtls bd
, mst_bank_facility mbf
Where atcf.trans_datetime = rfdt.facility_dt and
atcf.base_entity_id = rfdt.base_entity_id and
atcf.base_entity_id = be.id and
atcf.id = rfdt.trn_c3_fac_id and
rfdt.base_entity_id = bd.base_entity_id and
rfdt.sub_process_id = bd.sub_process_id and
rfdt.filter_date = bd.filter_date and
rfdt.base_entity_id = be.id and
atcf.mst_bank_facility_id = mbf.id and
mbf.mst_facilitytype_id = 1 and
(atcf.base_entity_id, atcf.id) not in
(
select distinct tss.base_entity_id,
tss.trn_prop_facility_id
from
rvw_cas_stmt_sanc_sec_dtls tss)


Please check the explain Plan

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=292 Card=1 Bytes=153
)

1 0 SORT (UNIQUE) (Cost=228 Card=1 Bytes=153)
2 1 FILTER
3 2 MERGE JOIN (Cost=226 Card=1 Bytes=153)
4 3 SORT (JOIN) (Cost=64 Card=1 Bytes=105)
5 4 MERGE JOIN (Cost=63 Card=1 Bytes=105)
6 5 SORT (JOIN) (Cost=14 Card=48 Bytes=3696)
7 6 HASH JOIN (Cost=12 Card=48 Bytes=3696)
8 7 HASH JOIN (Cost=7 Card=48 Bytes=2352)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'MST_BANK
_FACILITY' (Cost=2 Card=29 Bytes=754)

10 9 INDEX (RANGE SCAN) OF 'IDX_MST_BANK_FACI
LITY' (NON-UNIQUE) (Cost=1 Card=29)

11 8 TABLE ACCESS (FULL) OF 'AUD_TRN_C3_FAC' (C
ost=4 Card=48 Bytes=1104)

12 7 TABLE ACCESS (FULL) OF 'BASE_ENTITY' (Cost=4
Card=1567 Bytes=43876)

13 5 SORT (JOIN) (Cost=49 Card=1 Bytes=28)
14 13 VIEW OF 'RVW_CAS_GETCHKFACDT' (Cost=48 Card=1
Bytes=28)

15 14 SORT (GROUP BY) (Cost=48 Card=1 Bytes=63)
16 15 MERGE JOIN (Cost=46 Card=1 Bytes=63)
17 16 SORT (JOIN) (Cost=10 Card=1 Bytes=26)
18 17 HASH JOIN (Cost=8 Card=1 Bytes=26)
19 18 TABLE ACCESS (FULL) OF 'TRN_C3_FAC'
(Cost=3 Card=328 Bytes=2952)

20 18 TABLE ACCESS (FULL) OF 'AUD_TRN_C3_F
AC' (Cost=4 Card=951 Bytes=16167)

21 16 FILTER
22 21 SORT (JOIN)
23 22 VIEW (Cost=44 Card=97 Bytes=3589)
24 23 UNION-ALL
25 24 SORT (UNIQUE) (Cost=30 Card=89 B
ytes=2492)

26 25 FILTER
27 26 TABLE ACCESS (FULL) OF 'AUD_
TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

28 27 SORT (AGGREGATE)
29 28 TABLE ACCESS (FULL) OF '
MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

30 26 TABLE ACCESS (FULL) OF 'MSTW
F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

31 24 SORT (UNIQUE) (Cost=6 Card=8 Byt
es=176)

32 31 FILTER
33 32 TABLE ACCESS (FULL) OF 'AUD_
TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

34 33 SORT (AGGREGATE)
35 34 TABLE ACCESS (FULL) OF '
MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

36 32 TABLE ACCESS (FULL) OF 'MSTW
F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

37 3 SORT (JOIN) (Cost=162 Card=3 Bytes=144)
38 37 VIEW OF 'RVW_CAS_STMT_BORR_DTLS' (Cost=162 Card=3
Bytes=144)

39 38 UNION-ALL
40 39 SORT (UNIQUE) (Cost=56 Card=1 Bytes=51)
41 40 MERGE JOIN (Cost=54 Card=1 Bytes=51)
42 41 SORT (JOIN) (Cost=49 Card=1 Bytes=24)
43 42 VIEW (Cost=47 Card=1 Bytes=24)
44 43 SORT (GROUP BY) (Cost=47 Card=1 Bytes=
45)

45 44 MERGE JOIN (Cost=44 Card=1 Bytes=45)
46 45 SORT (JOIN) (Cost=8 Card=4 Bytes=7
6)

47 46 HASH JOIN (Cost=6 Card=4 Bytes=7
6)

48 47 TABLE ACCESS (FULL) OF 'TRN_C3
_CORPORATE' (Cost=2 Card=110 Bytes=660)

49 47 TABLE ACCESS (FULL) OF 'AUD_TR
N_C3_CORPORATE' (Cost=3 Card=358 Bytes=4654)

50 45 FILTER
51 50 SORT (JOIN)
52 51 VIEW (Cost=44 Card=97 Bytes=25
22)

53 52 UNION-ALL
54 53 SORT (UNIQUE) (Cost=30 Car
d=89 Bytes=2492)

55 54 FILTER
56 55 TABLE ACCESS (FULL) OF
'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

57 56 SORT (AGGREGATE)
58 57 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

59 55 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

60 53 SORT (UNIQUE) (Cost=6 Card
=8 Bytes=176)

61 60 FILTER
62 61 TABLE ACCESS (FULL) OF
'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

63 62 SORT (AGGREGATE)
64 63 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

65 61 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

66 41 SORT (JOIN) (Cost=6 Card=358 Bytes=9666)
67 66 TABLE ACCESS (FULL) OF 'AUD_TRN_C3_CORPO
RATE' (Cost=3 Card=358 Bytes=9666)

68 39 SORT (UNIQUE) (Cost=53 Card=1 Bytes=51)
69 68 MERGE JOIN (Cost=51 Card=1 Bytes=51)
70 69 SORT (JOIN) (Cost=48 Card=3 Bytes=72)
71 70 VIEW (Cost=46 Card=3 Bytes=72)
72 71 SORT (GROUP BY) (Cost=46 Card=3 Bytes=
135)

73 72 MERGE JOIN (Cost=43 Card=49 Bytes=22
05)

74 73 SORT (JOIN) (Cost=7 Card=30 Bytes=
570)

75 74 HASH JOIN (Cost=5 Card=30 Bytes=
570)

76 75 TABLE ACCESS (FULL) OF 'TRN_C3
_BANK' (Cost=2 Card=4 Bytes=24)

77 75 TABLE ACCESS (FULL) OF 'AUD_TR
N_C3_BANK' (Cost=2 Card=68 Bytes=884)

78 73 FILTER
79 78 SORT (JOIN)
80 79 VIEW (Cost=44 Card=97 Bytes=25
22)

81 80 UNION-ALL
82 81 SORT (UNIQUE) (Cost=30 Car
d=89 Bytes=2492)

83 82 FILTER
84 83 TABLE ACCESS (FULL) OF
'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

85 84 SORT (AGGREGATE)
86 85 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

87 83 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

88 81 SORT (UNIQUE) (Cost=6 Card
=8 Bytes=176)

89 88 FILTER
90 89 TABLE ACCESS (FULL) OF
'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

91 90 SORT (AGGREGATE)
92 91 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

93 89 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

94 69 SORT (JOIN) (Cost=4 Card=68 Bytes=1836)
95 94 TABLE ACCESS (FULL) OF 'AUD_TRN_C3_BANK'
(Cost=2 Card=68 Bytes=1836)

96 39 SORT (UNIQUE) (Cost=53 Card=1 Bytes=45)
97 96 MERGE JOIN (Cost=51 Card=1 Bytes=45)
98 97 SORT (JOIN) (Cost=48 Card=3 Bytes=72)
99 98 VIEW (Cost=46 Card=3 Bytes=72)
100 99 SORT (GROUP BY) (Cost=46 Card=3 Bytes=
135)

101 100 MERGE JOIN (Cost=43 Card=29 Bytes=13
05)

102 101 SORT (JOIN) (Cost=7 Card=18 Bytes=
342)

103 102 HASH JOIN (Cost=5 Card=18 Bytes=
342)

104 103 TABLE ACCESS (FULL) OF 'TRN_C3
_SOVEREIGN' (Cost=2 Card=3 Bytes=18)

105 103 TABLE ACCESS (FULL) OF 'AUD_TR
N_C3_SOVEREIGN' (Cost=2 Card=53 Bytes=689)

106 101 FILTER
107 106 SORT (JOIN)
108 107 VIEW (Cost=44 Card=97 Bytes=25
22)

109 108 UNION-ALL
110 109 SORT (UNIQUE) (Cost=30 Car
d=89 Bytes=2492)

111 110 FILTER
112 111 TABLE ACCESS (FULL) OF
'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

113 112 SORT (AGGREGATE)
114 113 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

115 111 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

116 109 SORT (UNIQUE) (Cost=6 Card
=8 Bytes=176)

117 116 FILTER
118 117 TABLE ACCESS (FULL) OF
'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

119 118 SORT (AGGREGATE)
120 119 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

121 117 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

122 97 SORT (JOIN) (Cost=4 Card=53 Bytes=1113)
123 122 TABLE ACCESS (FULL) OF 'AUD_TRN_C3_SOVER
EIGN' (Cost=2 Card=53 Bytes=1113)

124 2 MERGE JOIN (Cost=64 Card=1 Bytes=69)
125 124 SORT (JOIN) (Cost=60 Card=1 Bytes=50)
126 125 VIEW (Cost=59 Card=1 Bytes=50)
127 126 SORT (GROUP BY) (Cost=59 Card=1 Bytes=47)
128 127 MERGE JOIN (Cost=56 Card=1 Bytes=47)
129 128 SORT (JOIN) (Cost=7 Card=1 Bytes=28)
130 129 HASH JOIN (Cost=5 Card=1 Bytes=28)
131 130 TABLE ACCESS (FULL) OF 'TRN_SANC_SECURIT
Y' (Cost=2 Card=270 Bytes=2430)

132 130 TABLE ACCESS (FULL) OF 'AUD_TRN_SANC_SEC
URITY' (Cost=2 Card=428 Bytes=8132)

133 128 FILTER
134 133 SORT (JOIN)
135 134 VIEW OF 'RVW_CAS_GETCHKFACDT' (Cost=48 C
ard=1 Bytes=19)

136 135 SORT (GROUP BY) (Cost=48 Card=1 Bytes=
63)

137 136 MERGE JOIN (Cost=46 Card=1 Bytes=63)
138 137 SORT (JOIN) (Cost=10 Card=1 Bytes=
26)

139 138 HASH JOIN (Cost=8 Card=1 Bytes=2
6)

140 139 TABLE ACCESS (FULL) OF 'TRN_C3
_FAC' (Cost=3 Card=328 Bytes=2952)

141 139 TABLE ACCESS (FULL) OF 'AUD_TR
N_C3_FAC' (Cost=4 Card=951 Bytes=16167)

142 137 FILTER
143 142 SORT (JOIN)
144 143 VIEW (Cost=44 Card=97 Bytes=35
89)

145 144 UNION-ALL
146 145 SORT (UNIQUE) (Cost=30 Car
d=89 Bytes=2492)

147 146 FILTER
148 147 TABLE ACCESS (FULL) OF
'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

149 148 SORT (AGGREGATE)
150 149 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

151 147 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

152 145 SORT (UNIQUE) (Cost=6 Card
=8 Bytes=176)

153 152 FILTER
154 153 TABLE ACCESS (FULL) OF
'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

155 154 SORT (AGGREGATE)
156 155 TABLE ACCESS (FULL
) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

157 153 TABLE ACCESS (FULL) OF
'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

158 124 SORT (JOIN) (Cost=4 Card=1 Bytes=19)
159 158 TABLE ACCESS (FULL) OF 'AUD_TRN_SANC_SECURITY' (Co
st=2 Card=1 Bytes=19)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
35867 consistent gets
22459 physical reads
0 redo size
3309 bytes sent via SQL*Net to client
543 bytes received via SQL*Net from client
6 SQL*Net roundtrips to/from client
964 sorts (memory)
0 sorts (disk)
68 rows processed
Please help to tune the below query?

Regards

IN&EXISTS

May 03, 2010 - 1:47 am UTC

Reviewer: Jyothsna from Andhra Pradesh,INDIA

In Rooms table ive 6 rows (2,4,8,16,32,64 )
if i fire
select roomsize from rooms
where 30>roomsize

Then o/p is 4 rows (2,4,8,16)
But the following gives
select roomsize from rooms
where exists (select 1 from rooms
where 30 > roomsize)
It gives all the 6 rows
Why it happens???

Tom Kyte

Followup  

May 06, 2010 - 11:37 am UTC

see the other place you asked this identical question, and only ask in one place from now on please

working of exists

July 08, 2010 - 12:36 pm UTC

Reviewer: Mukund Nampally from Austin,TX

HI Jyothsna,
Well thats the basics!
Exists is a boolean operator in SQL
The query you've written is correlated.The subquery runs for every record in the parent query.So for every parent record your subquery checks to see is there ATLEAST one record where size > 30.Hence exists returns TRUE and this happens to all the records.

Tom Kyte

Followup  

July 08, 2010 - 1:20 pm UTC

it might run for every record, or it very well could be turned into a join, it depends.

ooh! the above is not correlated

July 08, 2010 - 1:05 pm UTC

Reviewer: Mukund Nampally from Austin,TX

I'm sorry the above is not correlated.
you will get the desired results by using alias for parent query and change the sub query to reference the record in the parent query
so this is your algorithm:

for record in tablep
do
for record in tablec
do
check if 30 > tablep.roomsize(outer for loop)
if TRUE break;
done
ouput record from tablep
done
if you refer to tablec instead of tablep you'll retrieve all the records.Hope that helps


The question I would to ask Tom is:
what is the difference in explain plan if just selecting constant from a huge table instead of certain columns in that Table?
The optimizer should be intelligent enough to just output the constant value instead of scanning the table

Tom Kyte

Followup  

July 08, 2010 - 1:21 pm UTC

it would have to either

a) scan the table
b) scan an index on a column that is defined as NOT NULL on that table


we need to know how many times to output the constant after all.

clear understanding of in/exists -not in /not exists

July 27, 2010 - 8:19 am UTC

Reviewer: A reader

Hi Tom,

The below are few questions which most of the times confuses but wanted to have crystal clear understanding of all together
could you please explain them in detail ?


1.Could you please explain the implementations of the execution plans of 'IN' and 'EXISTS' in 9i,10g,11g ,
What are the situations in general to use each one properly ? Can I use them interchnagebly *ALWAYS* (no impact to the output)
2.Could you please explain the implementations of the execution plans of 'NOT IN' and 'NOT EXISTS' in 9i,10g,11g ,
What are the situations in general to use each one properly ?Can I use them interchangeably *ALWAYS* (no impact to the output)

3.I have come across a situation where I have to use either 'JOIN' or 'IN' for two table A and B -
Size of A >> size of B - which is better to use when
a) I want to select only columns of A
b) I want to select only columns of B
c)I want to select columns from both table A and B .

4.which do you think more appropriate to use JOINS (or) Operators like IN/EXITS/NOT IN /NOT EXITS
for the same output ?


Thanks a million for your time
Tom Kyte

Followup  

July 27, 2010 - 12:48 pm UTC

1) they are pretty much interchangeable.

Never say never, never say always, I always say.

The optimizer is a piece of software written by (imperfect) human beings. In most all cases they (in/exists) will be interchangeable - however, there will always be exceptions. No, there are no generalizations you can make about the exceptions - they wouldn't be exceptional if you could :) You might even refer to the exceptions as "product issues" (which they would be). You'll find as many exception on one side of the coin as the other (in/exists being sides of a coin). Meaning - don't pick one over the other as law. Use the one that SEMANTICALLY means the most and is the most convenient to code or read.


Not in and Not exists are different.

where outer.x not in (select y from t)

is NOT the same as

where not exists (select null from t where t.y = outer.x)

UNLESS the expression "y" is not null. That said:

where outer.x not in (select y from t where y is not null)

is the same as
where not exists (select null from t where t.y = outer.x)


In 99.99999999999999999999% of the cases - if Y is nullable, where outer.x not in (select y from t) is a *bug* - a mistake made by the coder (who does not realize that if Y is null once in T, then ZERO ROWS will be satisfied by that predicate).


So, if you always select NOT NULLABLE expressions - the two are interchangable as above.

If the expression is nullable, then you have to use the one that semantically answers the question you are asking!!


3) no you didn't.

You either need to join (because you need data from the joined table) or you didn't.

That is where the question begins and ends - you either

a) need to join
b) didn't need to join.


you should be able to answer the question yourself from those comments...

4) http://www.oracle.com/technology/oramag/oracle/06-may/o36asktom.html

sums up my feelings on all of this.

senario

August 23, 2010 - 5:45 am UTC

Reviewer: radha from India

data like 1123425
result i want like 12345 but where 1 is second priority one
Tom Kyte

Followup  

August 26, 2010 - 10:00 am UTC

that makes absolutely no sense to me.


provide a tiny bit of logic here, pretend you are explaining this to your mom, use small words and a large level of detail

September 21, 2010 - 6:48 am UTC

Reviewer: A Reader

Tom,
Thanks for your time.

I have the following query.


SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
SQL> set timing on;
SQL> SELECT *
  t1  
    WHERE
        not exists
          (
              select 'x' from t2   where t2.col2 = t1.col3
           );

557087 rows selected.

Elapsed: 00:01:58.29

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=74180 Card=542898 Bytes=148211154)

   1    0   MERGE JOIN (ANTI) (Cost=74180 Card=542898 Bytes=148211154)
   2    1     SORT (JOIN) (Cost=72417 Card=897917 Bytes=239743839)
   3    2       TABLE ACCESS (FULL) OF 't1' (Cost=1995 Card=897917 Bytes=239743839)
   4    1     SORT (UNIQUE) (Cost=1763 Card=355021 Bytes=2130126)
   5    4       TABLE ACCESS (FULL) OF 't2' (Cost=601 Card=355021 Bytes=2130126)


Statistics
----------------------------------------------------------
          0  recursive calls
        358  db block gets
      42933  consistent gets
      92415  physical reads
          0  redo size
   67671164  bytes sent via SQL*Net to client
     409184  bytes received via SQL*Net from client
      37141  SQL*Net roundtrips to/from client
          0  sorts (memory)
          2  sorts (disk)
     557087  rows processed

SQL> Set autotrace off;
SQL>


Optimiser is not choosing HASH JOIN ANTI.

I tried hinting but it follows the same plan.

we are at 
a) 9.2.0.8
b) All stats are up to date
c) count(*) of t1 & t2 matches almost the cardinality
d) HASH parameters
NAME                         VALUE
hash_join_enabled            TRUE
hash_area_size               1048576
e) Optimizer parameter 
alter session set always_anti_join = HASH
is depricated in 9i so doesnt want to use undocumented parameter.
 



Kindly suggest what would be the reason/s Optimiser not choosing the HASH ANTI JOIN. 
 
   

Tom Kyte

Followup  

September 21, 2010 - 4:04 pm UTC

it chose a merge join ANTI instead...

are you using pga_aggregate_target?
any non-typical features like shared server?
what is the sort area size, given that the hash area size is set so small...

aa

September 23, 2010 - 4:42 am UTC

Reviewer: cc

bb

HASH ANTI Join

September 23, 2010 - 9:14 am UTC

Reviewer: A Reader

Tom,

Thanks a lot for your time. 
...contd above question
My fault I was giving HASH_AJ hint in the main query rather then inner query

After giving appropriate hint

I got the following trace


SQL> set timing on;
SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
 
SQL> SELECT *
    FROM   
   t1 dt
    WHERE
        not exists
        (  select /*+HASH_AJ*/ 'x' from t2 ma where t2.col2 = t1.col3
        ); 

560969 rows selected.

Elapsed: 00:03:04.14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=132646 Card=542898 Bytes=148211154)

   1    0   HASH JOIN (ANTI) (Cost=132646 Card=542898 Bytes=148211154)
   2    1     TABLE ACCESS (FULL) OF 't1' (Cost=1995 Card=897917 Bytes=239743839)

   3    1     TABLE ACCESS (FULL) OF 't2' (Cost=601    Card=355021 Bytes=2130126)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      43315  consistent gets
      68807  physical reads
          0  redo size
   70888229  bytes sent via SQL*Net to client
     412023  bytes received via SQL*Net from client
      37399  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     560969  rows processed




Above Plan is costlier then SORT MERGE ANTI posted earlier.
( time taken also more)


Other details

which you asked


SQL> show  parameter pga_aggregate_target;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 209715200
SQL> select distinct server from v$session;

SERVER
---------
DEDICATED

SQL> show parameter sort_area;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sort_area_retained_size              integer     1048576
sort_area_size                       integer     524288

SQL> show parameter workarea_size_policy;
NAME                                 TYPE  VALUE
-----------  --------------------------------     ----------
 
workarea_size_policy                 string           MANUAL
SQL>


========================================================================================================
I tested the same in our test env ( test env was refreshed recenlty from live - using exp / IMP )
and found that Oracle chosen the best plan ( ANTI HASH JOIN) with lesser cost ( for  tables t1, t2 which are alomost same as of live , 
except the data distribution etc :) )

here its is :



SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
 

SQL> SELECT *
      FROM
           t1  
    WHERE   
        not exists    
 ( 
              select 'x' from t2 ma where  t2.col2 = t1.col3
           );

548755 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=6595 Card=537594      Bytes=145150380)

   1    0   HASH JOIN (ANTI) (Cost=6595 Card=537594 Bytes=145150380)
   2    1     TABLE ACCESS (FULL) OF 't1' (Cost=2140 Card=888661 Bytes=234606504)

   3    1     TABLE ACCESS (FULL) OF 't2' (Cost=632     Card=351069 Bytes=2106414)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      45433  consistent gets
      69231  physical reads
          0  redo size
   70024207  bytes sent via SQL*Net to client
     403068  bytes received via SQL*Net from client
      36585  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     548755  rows processed








Other details.

SQL> show  parameter pga_aggregate_target;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 209715200

SQL> select distinct server from v$session;

SERVER
---------
DEDICATED

SQL> show parameter sort_area;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sort_area_retained_size              integer     1048576
sort_area_size                       integer     524288

SQL> show parameter workarea_size_policy;
NAME                                 TYPE  VALUE
-----------  --------------------------------     ----------
 
workarea_size_policy                 string   AUTO
SQL>



Could you please
a) comment of above diffrences in live? (HASH_AJ is costlier then SM_AJ )
b) In Test it is doing good ( because of use of PGA .. workwareasize policy?)
c) Any other way to write the main query (  i used NOT IN ( .... where col is NOT NULL) but cost is in Millions)

regards


Tom Kyte

Followup  

September 27, 2010 - 9:28 am UTC

... Above Plan is costlier then SORT MERGE ANTI posted earlier.
( time taken also more ...




and that means... it was right without the hint.



why are you using manual memory management? On one system you are using manual, on the other auto - be consistent. that is why (a) happened - your optimizer settings are very different. You have constrained to a very small size the amount of memory that can be used with the manual system.


as for (c), forget the cost, what is the plan we see.


contd.. HASH ANTI Join....

September 24, 2010 - 5:35 am UTC

Reviewer: A Reader

Tom,

further question..

d)  Can I set 
SQL> alter session set workarea_size_policy=automatic;
for my session in prod DB ( where we have it is set to manual) and run the SQL to see if it is chosing the hash join ( without hint) and doing good at lesser cost?
(sorry to ask you this  as this could be DIY ,  but just thinking if it would be safe for a single session to set this parameter to automatic in  live system as my session may cross the pga_aggregate_target ( a soft boundary)and machine may starve of memory?)

   

Tom Kyte

Followup  

September 27, 2010 - 10:17 am UTC

d) yes.

In and Exists - Oracle 10g - 10.2.0.1

November 02, 2010 - 5:59 am UTC

Reviewer: Hari from India

Hi Tom,

I was actually experimenting the usage of IN and EXISTS in Oracle 10g version. But I could not get the same result as you got in 8i. I did the following:

create table smalltable (data1 number(3), data2 varchar2(10), data3 varchar2(10), data4 varchar2(10));

create or replace procedure smalltableproc is
counter number(3);
begin
 counter := 1;
 while (counter <= 100)
 loop
 insert into smalltable values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
 counter := counter + 1;
 end loop;
 commit;
end;

SQL> exec smalltableproc

PL/SQL procedure successfully completed.

SQL> create index smalltabledata1idx on smalltable(data1);

SQL> create table largetable (data5 number(6), data6 varchar2(10), data7 varchar2(10), data8 varchar2(10));

create or replace procedure largetableproc is
counter number(6);
begin
 counter := 1;
 while (counter <= 100000)
 loop
  insert into largetable values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
  counter := counter + 1;
 end loop;
 commit;
end;

SQL> exec largetableproc

PL/SQL procedure successfully completed.

SQL> create index largetabledata5idx on largetable(data5);

SQL> exec DBMS_STATS.gather_table_stats('INXQA','SMALLTABLE');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_table_stats('INXQA','LARGETABLE');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('INXQA','SMALLTABLEDATA1IDX');

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.gather_index_stats('INXQA','LARGETABLEDATA5IDX');

PL/SQL procedure successfully completed.

Now, when I issued the select statements, I do not find any difference:

SQL> select * from largetable where data5 in (select data1 from smalltable);

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2333463779
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LARGETABLE         |     1 |    28 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |                    |   100 |   300 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | SMALLTABLEDATA1IDX |   100 |   300 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | LARGETABLEDATA5IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DATA5"="DATA1")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        124  consistent gets
          0  physical reads
          0  redo size
       2965  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL> select * from largetable l where exists (select null from smalltable s where data1=l.data5);

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2333463779
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LARGETABLE         |     1 |    28 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |                    |   100 |   300 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | SMALLTABLEDATA1IDX |   100 |   300 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | LARGETABLEDATA5IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DATA1"="L"."DATA5")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        124  consistent gets
          0  physical reads
          0  redo size
       2965  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

When I reveresed the usage of tables, with Small in Outer and Large in Inner, I get the same results for both IN and EXISTS

Does this mean that the usage on IN and EXISTS will be more or less the same in Oracle 10g?

Please Explain

Thanks

Hari

Tom Kyte

Followup  

November 02, 2010 - 6:34 am UTC

things change in 10 years.

In fact, if you ctl-f for:

IN and Inner Join April 4, 2005

on this page, you'll see we had this exact discussion a long time ago ;)

November 04, 2010 - 1:17 am UTC

Reviewer: Kulkarni from India

Tom,
I did the same way as you explained above but did not get different execution plans or costs. I got the same results irrespective of the query that was run.


Table created.

SQL>  insert into big /*+ append */ select * from big;

13124 rows created.

SQL> /

26248 rows created.

SQL> /

52496 rows created.

SQL> /

104992 rows created.

SQL> /

209984 rows created.

SQL> commit;

Commit complete.

SQL> create table small as select * from all_objects where rownum<1000;

Table created.

SQL> desc big
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 OBJECT_NAME                               NOT NULL VARCHAR2(30)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                   NOT NULL DATE
 LAST_DDL_TIME                             NOT NULL DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                 NOT NULL NUMBER
 EDITION_NAME                                       VARCHAR2(30)

SQL> create index big_idx on big(SUBOBJECT_NAME);

Index created.

SQL> create index small_idx on small(SUBOBJECT_NAME);

Index created.


SQL> analyze  table big compute statistics for table for all indexed columns for all indexes;

Table analyzed.

SQL> analyze table small compute statistics for table for all indexed columns for all indexes;

Table analyzed.

SQL> set autotrace on exp stat
SQL> select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
         24  recursive calls
          0  db block gets
       5506  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> l
  1* select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small)
SQL> select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small where small.object_id=big.object_id);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5504  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(SUBOBJECT_NAME) from big where exists ( select null from small where small.object_id=big.object_id);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5504  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(SUBOBJECT_NAME) from small where exists ( select null from big where small.object_id=big.object_id);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 107836466

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |    27 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE     |       |     1 |    27 |            |          |
|*  2 |   HASH JOIN SEMI    |       |   999 | 26973 |  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL| SMALL |   999 | 13986 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| BIG   |   419K|  5331K|  1500   (1)| 00:00:19 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>  select count(SUBOBJECT_NAME) from small where object_id in(select object_id from big);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 107836466

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |    27 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE     |       |     1 |    27 |            |          |
|*  2 |   HASH JOIN SEMI    |       |   999 | 26973 |  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL| SMALL |   999 | 13986 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| BIG   |   419K|  5331K|  1500   (1)| 00:00:19 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

I am using 11.2 

Tom Kyte

Followup  

November 04, 2010 - 3:46 am UTC

please read the review immediately prior to your - same exact thing. again.

EXISTS and ROWNUM = 1

June 19, 2012 - 8:29 am UTC

Reviewer: A reader

When we have EXISTS check, is it better to put ROWNUM = 1?

I am listing two queries below with their corresponding Auto Trace info. Only difference between these queries is the ROWNUM = 1 with the EXISTS check for the second query.
The first query takes 25 secs and the second query takes 5 secs
SQL> SET TIMING ON
SQL> SELECT s.supplier, s.sup_name
  2  FROM sups s
  3  WHERE EXISTS(SELECT 'Y'
  4               FROM aes_dlp_iscl adi
  5               WHERE adi.supplier = s.supplier);

7576 rows selected.

Elapsed: 00:00:25.81

Execution Plan
----------------------------------------------------------                      
Plan hash value: 532059534                                                      
                                                                                
--------------------------------------------------------------------------------
-----------------------                                                         
                                                                                
| Id  | Operation           | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                         
                                                                                
-------------------------------------------------------------------------------------------------------                                                         
                                                                                
|   0 | SELECT STATEMENT    |                 |  5947 |   191K|  1327  (47)| 00:00:19 |       |       |                                                         
                                                                                
|*  1 |  HASH JOIN SEMI     |                 |  5947 |   191K|  1327  (47)| 00:00:19 |       |       |                                                         
                                                                                
|   2 |   TABLE ACCESS FULL | SUPS            | 25120 |   662K|   112   (1)| 00:00:02 |       |       |                                                         
                                                                                
|   3 |   PARTITION HASH ALL|                 |    65M|   375M|   608   (3)| 00:00:09 |     1 |    16 |                                                         
                                                                                
|   4 |    INDEX FULL SCAN  | AES_DLP_ISCL_I1 |    65M|   375M|   608   (3)| 00:00:09 |     1 |    16 |                                                         
                                                                                
-------------------------------------------------------------------------------------------------------                                                         
                                                                                
                                                                                
Predicate Information (identified by operation id):                             
---------------------------------------------------                             
                                                                                
   1 - access("ADI"."SUPPLIER"="S"."SUPPLIER")                                  


Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
      60274  consistent gets                                                    
      42419  physical reads                                                     
          0  redo size                                                          
     235240  bytes sent via SQL*Net to client                                   
       3787  bytes received via SQL*Net from client                             
        507  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
       7576  rows processed                                                     

SQL> 
SQL> SELECT s.supplier, s.sup_name
  2  FROM sups s
  3  WHERE EXISTS(SELECT 'Y'
  4               FROM aes_dlp_iscl adi
  5               WHERE adi.supplier = s.supplier
  6                AND  ROWNUM = 1);

7576 rows selected.

Elapsed: 00:00:04.71

Execution Plan
----------------------------------------------------------                      
Plan hash value: 1883786237                                                     
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                        
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
|   0 | SELECT STATEMENT     |                 |     1 |    27 | 15874   (1)| 00:03:43 |       |       |                                                        
                                                                                
|*  1 |  FILTER              |                 |       |       |            |          |       |       |                                                        
                                                                                
|   2 |   TABLE ACCESS FULL  | SUPS            | 25120 |   662K|   112   (1)| 00:00:02 |       |       |                                                        
                                                                                
|*  3 |   COUNT STOPKEY      |                 |       |       |            |          |       |       |                                                        
                                                                                
|   4 |    PARTITION HASH ALL|                 |     2 |    12 |     1   (0)| 00:00:01 |     1 |    16 |                                                        
                                                                                
|*  5 |     INDEX RANGE SCAN | AES_DLP_ISCL_I1 |     2 |    12 |     1   (0)| 00:00:01 |     1 |    16 |                                                        
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
                                                                                
Predicate Information (identified by operation id):                             
---------------------------------------------------                             
                                                                                
   1 - filter( EXISTS (SELECT 0 FROM "ETL"."AES_DLP_ISCL" "ADI" WHERE ROWNUM=1 AND                                                                                                                                             
              "ADI"."SUPPLIER"=:B1))                                            
   3 - filter(ROWNUM=1)                                                         
   5 - access("ADI"."SUPPLIER"=:B1)                                             


Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
     979545  consistent gets                                                    
          1  physical reads                                                     
          0  redo size                                                          
     235240  bytes sent via SQL*Net to client                                   
       3787  bytes received via SQL*Net from client                             
        507  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
       7576  rows processed                                                     

SQL> SPOOL OFF

Tom Kyte

Followup  

June 19, 2012 - 8:46 am UTC

one did physical IO - PIO, the other did not.

does that one that did PIO always do PIO
does that one that did not do PIO always not do PIO


get a tkprof to compare, autotrace is insufficient, we'd need to see where the IO was taking place.

My suspicion is the first query did all of the physical IO that the second would have had to of performed had it been run first.

Cache

June 19, 2012 - 9:06 am UTC

Reviewer: Tony Fernandez from Atlanta, GA

Will alter system clear buffer_cache; help in this case to compare apples to apples?
Tom Kyte

Followup  

June 19, 2012 - 9:38 am UTC

Nope, not unless you are using an unbuffered file system (like ASM or RAW disks for "forcedirectio" mount like options).

Otherwise, the file system is typically buffered by the OS and your second phyiscal IO is not really a physical IO but a read from the secondary SGA - the file system cache (search this site for "secondary sga" to read more about that)

Also, what are the odds your buffer cache will be entirely empty? Zero except right after startup, so flushing the buffer cache is as fake as having the second query run second.



EXISTS and ROWNUM = 1

June 19, 2012 - 9:16 am UTC

Reviewer: A reader

It doesn't matter if I change the order execution of the query, the consistent gets and physical reads remain the same. I ran both the queries multiple times with the same result. Let me see if I get the DBA to give me a TKPROF for the queries

EXISTS and ROWNUM = 1

June 19, 2012 - 10:46 am UTC

Reviewer: A reader

Please find the TKPROF results below

With ROWNUM =1

TKPROF: Release 10.2.0.3.0 - Production on Tue Jun 19 11:33:52 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1179858.trc
Sort options: default

********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing 
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM aes_dlp_iscl adi
             WHERE adi.supplier = s.supplier
              AND  ROWNUM = :"SYS_B_1")

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      507      3.46       5.20          0     979545          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509      3.46       5.20          0     979545          0        7576

Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  

Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  FILTER  (cr=979545 pr=0 pw=0 time=5434066 us)
  24965   TABLE ACCESS FULL SUPS (cr=798 pr=0 pw=0 time=25702 us cost=112 size=678240 card=25120)
   7576   COUNT STOPKEY (cr=978747 pr=0 pw=0 time=5072884 us)
   7576    PARTITION HASH ALL PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4987595 us cost=1 size=12 card=2)
   7576     INDEX RANGE SCAN AES_DLP_ISCL_I1 PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4546245 us cost=1 size=12 card=2)(object id 2467472)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     507        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message from client                   507        8.01          8.27
********************************************************************************

SELECT RTRIM(c.value, :"SYS_B_0")||:"SYS_B_1"||d.instance_name||:"SYS_B_2"||LTRIM(TO_CHAR(a.spid))||:"SYS_B_3" trace_file
FROM v$process a, v$session b, v$parameter c, v$instance d
WHERE a.addr = b.paddr
 AND  b.audsid = SYS_CONTEXT(:"SYS_B_4", :"SYS_B_5")
 AND  c.name = :"SYS_B_6"

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0          0          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0          0          0           1

Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  MERGE JOIN  (cr=0 pr=0 pw=0 time=47707 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=41801 us)
      1    MERGE JOIN  (cr=0 pr=0 pw=0 time=41765 us)
     40     SORT JOIN (cr=0 pr=0 pw=0 time=29797 us)
     44      MERGE JOIN  (cr=0 pr=0 pw=0 time=29654 us)
     44       SORT JOIN (cr=0 pr=0 pw=0 time=28656 us)
     44        NESTED LOOPS  (cr=0 pr=0 pw=0 time=24611 us)
      1         MERGE JOIN  (cr=0 pr=0 pw=0 time=28086 us)
   2655          SORT JOIN (cr=0 pr=0 pw=0 time=11500 us)
   2655           NESTED LOOPS  (cr=0 pr=0 pw=0 time=7948 us)
      1            NESTED LOOPS  (cr=0 pr=0 pw=0 time=111 us)
      1             NESTED LOOPS  (cr=0 pr=0 pw=0 time=78 us)
      1              FIXED TABLE FULL X$QUIESCE (cr=0 pr=0 pw=0 time=41 us)
      1              FIXED TABLE FULL X$KVIT (cr=0 pr=0 pw=0 time=29 us)
      1             FIXED TABLE FULL X$KSUXSINST (cr=0 pr=0 pw=0 time=26 us)
   2655            FIXED TABLE FULL X$KSPPCV (cr=0 pr=0 pw=0 time=6714 us)
      1          FILTER  (cr=0 pr=0 pw=0 time=15303 us)
      1           SORT JOIN (cr=0 pr=0 pw=0 time=11728 us)
      1            FIXED TABLE FULL X$KSPPI (cr=0 pr=0 pw=0 time=8150 us)
     44         FIXED TABLE FULL X$KSLWT (cr=0 pr=0 pw=0 time=110 us)
     44       SORT JOIN (cr=0 pr=0 pw=0 time=950 us)
   1142        FIXED TABLE FULL X$KSLED (cr=0 pr=0 pw=0 time=897 us)
      1     SORT JOIN (cr=0 pr=0 pw=0 time=11978 us)
      1      FIXED TABLE FULL X$KSUSE (cr=0 pr=0 pw=0 time=11910 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=5906 us)
     44    FIXED TABLE FULL X$KSUPR (cr=0 pr=0 pw=0 time=99 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     2       17.22         17.23



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch      509      3.50       5.25          0     979545          0        7577
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      513      3.50       5.25          0     979545          0        7577

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     510        0.00          0.00
  SQL*Net message from client                   510       17.22         40.00
  Disk file operations I/O                        1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0

Misses in library cache during parse: 0

    2  user  SQL statements in session.
    0  internal SQL statements in session.
    2  SQL statements in session.
********************************************************************************
Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1179858.trc
Trace file compatibility: 10.01.00
Sort options: default

       1  session in tracefile.
       2  user  SQL statements in trace file.
       0  internal SQL statements in trace file.
       2  SQL statements in trace file.
       2  unique SQL statements in trace file.
    1673  lines in trace file.
 9217858  elapsed seconds in trace file.


With No Rownum
SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM anp_dlp_iscl adi
             WHERE adi.supplier = s.supplier)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.00          0          0          0           0
Fetch      507     58.84      60.16      42419      60274          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509     58.85      60.17      42419      60274          0        7576

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: CHOOSE
Parsing user id: 488  

Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  HASH JOIN SEMI (cr=60274 pr=42419 pw=0 time=68810647 us cost=1327 size=196251 card=5947)
  24965   TABLE ACCESS FULL SUPS (cr=293 pr=0 pw=0 time=12459 us cost=112 size=678240 card=25120)
66175360   PARTITION HASH ALL PARTITION: 1 16 (cr=59981 pr=42419 pw=0 time=80186367 us cost=608 size=394093758 card=65682293)
66175360    INDEX FULL SCAN ANP_DLP_ISCL_I1 PARTITION: 1 16 (cr=59981 pr=42419 pw=0 time=41272380 us cost=608 size=394093758 card=65682293)(object id 2467472)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     507        0.00          0.00
  Disk file operations I/O                       14        0.00          0.00
  SQL*Net message from client                   507        1.08          1.35
  db file sequential read                     42419        0.01          0.53
********************************************************************************

SELECT RTRIM(c.value, :"SYS_B_0")||:"SYS_B_1"||d.instance_name||:"SYS_B_2"||LTRIM(TO_CHAR(a.spid))||:"SYS_B_3" trace_file
FROM v$process a, v$session b, v$parameter c, v$instance d
WHERE a.addr = b.paddr
 AND  b.audsid = SYS_CONTEXT(:"SYS_B_4", :"SYS_B_5")
 AND  c.name = :"SYS_B_6"

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0          0          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0          0          0           1

Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  MERGE JOIN  (cr=0 pr=0 pw=0 time=47972 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=42076 us)
      1    MERGE JOIN  (cr=0 pr=0 pw=0 time=42037 us)
     35     SORT JOIN (cr=0 pr=0 pw=0 time=29404 us)
     43      MERGE JOIN  (cr=0 pr=0 pw=0 time=29284 us)
     43       SORT JOIN (cr=0 pr=0 pw=0 time=28286 us)
     43        NESTED LOOPS  (cr=0 pr=0 pw=0 time=24186 us)
      1         MERGE JOIN  (cr=0 pr=0 pw=0 time=27569 us)
   2655          SORT JOIN (cr=0 pr=0 pw=0 time=11089 us)
   2655           NESTED LOOPS  (cr=0 pr=0 pw=0 time=7941 us)
      1            NESTED LOOPS  (cr=0 pr=0 pw=0 time=96 us)
      1             NESTED LOOPS  (cr=0 pr=0 pw=0 time=66 us)
      1              FIXED TABLE FULL X$QUIESCE (cr=0 pr=0 pw=0 time=33 us)
      1              FIXED TABLE FULL X$KVIT (cr=0 pr=0 pw=0 time=27 us)
      1             FIXED TABLE FULL X$KSUXSINST (cr=0 pr=0 pw=0 time=24 us)
   2655            FIXED TABLE FULL X$KSPPCV (cr=0 pr=0 pw=0 time=6335 us)
      1          FILTER  (cr=0 pr=0 pw=0 time=15256 us)
      1           SORT JOIN (cr=0 pr=0 pw=0 time=11699 us)
      1            FIXED TABLE FULL X$KSPPI (cr=0 pr=0 pw=0 time=8177 us)
     43         FIXED TABLE FULL X$KSLWT (cr=0 pr=0 pw=0 time=104 us)
     43       SORT JOIN (cr=0 pr=0 pw=0 time=920 us)
   1142        FIXED TABLE FULL X$KSLED (cr=0 pr=0 pw=0 time=891 us)
      1     SORT JOIN (cr=0 pr=0 pw=0 time=12663 us)
      1      FIXED TABLE FULL X$KSUSE (cr=0 pr=0 pw=0 time=12575 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=5896 us)
     43    FIXED TABLE FULL X$KSUPR (cr=0 pr=0 pw=0 time=138 us)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3       12.50         12.51
  SQL*Net break/reset to client                   1        0.00          0.00



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.01       0.00          0          0          0           0
Fetch      509     58.88      60.21      42419      60274          0        7577
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      513     58.89      60.22      42419      60274          0        7577

Misses in library cache during parse: 1
Misses in library cache during execute: 1

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     511        0.00          0.00
  SQL*Net message from client                   511       12.50         22.26
  Disk file operations I/O                       14        0.00          0.00
  db file sequential read                     42419        0.01          0.53
  SQL*Net break/reset to client                   1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0

Misses in library cache during parse: 0

    2  user  SQL statements in session.
    0  internal SQL statements in session.
    2  SQL statements in session.
********************************************************************************
Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1220680.trc
Trace file compatibility: 10.01.00
Sort options: default

       1  session in tracefile.
       2  user  SQL statements in trace file.
       0  internal SQL statements in trace file.
       2  SQL statements in trace file.
       2  unique SQL statements in trace file.
   44205  lines in trace file.
 9220797  elapsed seconds in trace file.


Tom Kyte

Followup  

June 19, 2012 - 12:41 pm UTC

cursor sharing = force or similar :( :( :( ugh. I always shudder when I see that.



what kind of hardware do you have? It seems to be really "not fast" cpu wise at hashing. I would have thought the hash semi join should have been better - but we spent a ton of cpu time hashing.


is this index that is being used hash partitioned into 16 partitions (by something other than supplier id?)

count stopkey

June 19, 2012 - 11:37 am UTC

Reviewer: Keith from Austin, TX

I noticed that the plan with "AND ROWNUM = 1" did a COUNT STOPKEY while the other plan omitted this step. So it seems to me that COUNT STOPKEY should usually (always?) be implied by WHERE EXISTS. What are your thought/insights Tom?

Tom Kyte

Followup  

June 19, 2012 - 12:44 pm UTC

rownum is count stopkey, you won't see it without a rownum predicate ever.

EXISTS and ROWNUM = 1

June 19, 2012 - 1:18 pm UTC

Reviewer: A reader

How did you figure out the cursor_sharing was similar? What part in TKPROF tells you that info?

The index definition is as follows
CREATE INDEX AES_DLP_ISCL_I1 ON AES_DLP_ISCL
(SUPPLIER, ORIGIN_COUNTRY_ID, LOC, LOC_TYPE)
NOLOGGING
LOCAL (  
  PARTITION ADI_PKX_P01
    TABLESPACE ADIX_P01,  
  PARTITION ADI_PKX_P02
    TABLESPACE ADIX_P02,  
  PARTITION ADI_PKX_P03
    TABLESPACE ADIX_P03,  
  PARTITION ADI_PKX_P04
    TABLESPACE ADIX_P04,  
  PARTITION ADI_PKX_P05
    TABLESPACE ADIX_P05,  
  PARTITION ADI_PKX_P06
    TABLESPACE ADIX_P06,  
  PARTITION ADI_PKX_P07
    TABLESPACE ADIX_P07,  
  PARTITION ADI_PKX_P08
    TABLESPACE ADIX_P08,  
  PARTITION ADI_PKX_P09
    TABLESPACE ADIX_P09,  
  PARTITION ADI_PKX_P10
    TABLESPACE ADIX_P10,  
  PARTITION ADI_PKX_P11
    TABLESPACE ADIX_P11,  
  PARTITION ADI_PKX_P12
    TABLESPACE ADIX_P12,  
  PARTITION ADI_PKX_P13
    TABLESPACE ADIX_P13,  
  PARTITION ADI_PKX_P14
    TABLESPACE ADIX_P14,  
  PARTITION ADI_PKX_P15
    TABLESPACE ADIX_P15,  
  PARTITION ADI_PKX_P16
    TABLESPACE ADIX_P16
)
NOPARALLEL
COMPRESS 3;


I am not that sure about hardware. Let me try and find out
Tom Kyte

Followup  

June 19, 2012 - 3:01 pm UTC

SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM anp_dlp_iscl adi
             WHERE adi.supplier = s.supplier)


:"SYS_B_0"


that is the first bind name we use to replace literals with, then :"SYS_B_1" and so on. It is a signature.

I guessed it right based on the LIOs - I was sure that was a 16 partition index with a height of about 2, sometimes 3, depending on the partition.

SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM aes_dlp_iscl adi
             WHERE adi.supplier = s.supplier
              AND  ROWNUM = :"SYS_B_1")

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      507      3.46       5.20          0     979545          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509      3.46       5.20          0     979545          0        7576

Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  

Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  FILTER  (cr=979545 pr=0 pw=0 time=5434066 us)
  24965   TABLE ACCESS FULL SUPS (cr=798 pr=0 pw=0 time=25702 us cost=112 size=678240 card=25120)
   7576   COUNT STOPKEY (cr=978747 pr=0 pw=0 time=5072884 us)
   7576    PARTITION HASH ALL PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4987595 us cost=1 size=12 
card=2)
   7576     INDEX RANGE SCAN AES_DLP_ISCL_I1 PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4546245 us 
cost=1 size=12 card=2)(object id 2467472)



978,747 IO's on the index, but only 24,965 seeks on the index - so about 39 IO's per index range scan.

Since it was partitioned and hash partitioned at that, I knew it would be in 1, 2, 4, 8, 16, 32, 64, ... partitions (it better be!). 64 was out of the question. If you divide 16 into 39 you get about 2/3 IO's per index partition.


So, does that index need to be partitioned, could that index by partitioned by something else? does it have to be local, you'll have to always scan 16 partitions unless you use the partition key in the where clause too.

EXISTS and ROWNUM = 1

June 19, 2012 - 3:28 pm UTC

Reviewer: A reader

Thanks for the explanation on the TKPROF.

But sorry, I did not understand why ROWNUM = 1 was faster? Should we use ROWNUM = 1 in the exists?
Tom Kyte

Followup  

June 19, 2012 - 5:59 pm UTC

In general - no, you shouldn't.

It is a complex plan - you have 16 partitions to index range scan, 25k times. Or we can full scan and hash join (big bulky operations, good)

For some reason, the hashing took an inordinate amount of time on your computer.

nice explanation

January 25, 2013 - 4:08 am UTC

Reviewer: mailos from Bratislava, Slovakia


May 03, 2013 - 8:10 am UTC

Reviewer: Soco from Software Park, Dalian, China

Hi Tom,

Thanks for all your sharing on this topic and I have learnt a lot from it. I also run few tests on my own database and I found my results are different from yours. My guess is it may cause by the different versions of oracle database, I saw this post was almost 13 years old and lots of things might change, such as CBO, I use 11g right now by the way. Here are my process, and hope you can help with me on that.

jerry@DBJERRY> create table big as select * from all_objects;

Table created.

jerry@DBJERRY> insert into big select * from big;

55625 rows created.

jerry@DBJERRY> commit;

Commit complete.

jerry@DBJERRY> insert into big select * from big;

111250 rows created.

jerry@DBJERRY> commit;

Commit complete.

jerry@DBJERRY> insert into big select * from big;

222500 rows created.

jerry@DBJERRY> commit;

Commit complete.

jerry@DBJERRY> insert into big select * from big;

445000 rows created.

jerry@DBJERRY> commit;

Commit complete.

jerry@DBJERRY> create table small as select * from all_objects where rownum<100;

Table created.

jerry@DBJERRY> create index big_idx on big(object_id);

Index created.

jerry@DBJERRY> create index small_idx on small(object_id);

Index created.

jerry@DBJERRY> exec dbms_stats.gather_table_stats(user,'big',cascade=>true);

PL/SQL procedure successfully completed.

jerry@DBJERRY> exec dbms_stats.gather_table_stats(user,'small',cascade=>true);

PL/SQL procedure successfully completed.

jerry@DBJERRY> alter system flush buffer_cache;

System altered.

jerry@DBJERRY> alter system flush shared_pool;

System altered.

jerry@DBJERRY> set autotrace on



till now my setup is done and follow up with the example:

***** outer query is big and inner query is small*****

jerry@DBJERRY> select count(subobject_name) from big where object_id in(select object_id from small);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 3874879108

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    26 |   854   (1)| 00:00:11 |
|   1 |  SORT AGGREGATE               |           |     1 |    26 |            |          |
|   2 |   NESTED LOOPS                |           |       |       |            |          |
|   3 |    NESTED LOOPS               |           |  1573 | 40898 |   854   (1)| 00:00:11 |
|   4 |     SORT UNIQUE               |           |    99 |   396 |     1   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN          | SMALL_IDX |    99 |   396 |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | BIG_IDX   |    16 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID| BIG       |    16 |   352 |    18   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
        508  recursive calls    --reduce to 0 at the second run
          0  db block gets
       1691  consistent gets    --reduce to 1634 at the second run
         61  physical reads    --reduce to 0 at the second run
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          9  sorts (memory)    --reduce to 1 at the second run

jerry@DBJERRY> select count(subobject_name) from big where exists(select null from small where small.object_id = big.object_id);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 3874879108

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    26 |   854   (1)| 00:00:11 |
|   1 |  SORT AGGREGATE               |           |     1 |    26 |            |          |
|   2 |   NESTED LOOPS                |           |       |       |            |          |
|   3 |    NESTED LOOPS               |           |  1573 | 40898 |   854   (1)| 00:00:11 |
|   4 |     SORT UNIQUE               |           |    99 |   396 |     1   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN          | SMALL_IDX |    99 |   396 |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | BIG_IDX   |    16 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID| BIG       |    16 |   352 |    18   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls   --reduce to 0 at the second run
          0  db block gets
       1634  consistent gets
          0  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

****** outer query is small and inner query is big*****

jerry@DBJERRY> select count(subobject_name) from small where object_id in(select object_id from big);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 4075727039

-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     1 |    26 |   201   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE     |         |     1 |    26 |            |          |
|   2 |   NESTED LOOPS SEMI |         |    99 |  2574 |   201   (0)| 00:00:03 |
|   3 |    TABLE ACCESS FULL| SMALL   |    99 |  2079 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | BIG_IDX |   890K|  4345K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
         24  recursive calls   --reduce to 0 at the second run
          0  db block gets
         53  consistent gets   --reduce to 50 at the second run
          3  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

jerry@DBJERRY> select count(subobject_name) from small where exists(select null from big where small.object_id = big.object_id);

COUNT(SUBOBJECT_NAME)
---------------------
                    0


Execution Plan
----------------------------------------------------------
Plan hash value: 4075727039

-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     1 |    26 |   201   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE     |         |     1 |    26 |            |          |
|   2 |   NESTED LOOPS SEMI |         |    99 |  2574 |   201   (0)| 00:00:03 |
|   3 |    TABLE ACCESS FULL| SMALL   |    99 |  2079 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | BIG_IDX |   890K|  4345K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")


Statistics
----------------------------------------------------------
          1  recursive calls   --reduce to 0 at the second run
          0  db block gets
         50  consistent gets
          0  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

the results are they are almost the same

Thanks for your time and hope you have a nice day!

Tom Kyte

Followup  

May 06, 2013 - 7:03 pm UTC

you are correct that things change and we actually discussed this in the body of this very very very long posting (13 years of posting ;) )

it is the difference between the CBO and RBO.

here is a more formal write up of it:

http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

January 17, 2016 - 4:13 pm UTC

Reviewer: A reader

Neat explanation

Is the above explanations still true for Oracle 12C database?

July 17, 2018 - 3:13 pm UTC

Reviewer: Manisundaram R from India

In some websites regarding IN vs EXISTS, it is pointed out that after oracle 9i, the database is intelligent enough to choose which would perform better. Is it really true?

February 13, 2019 - 11:39 am UTC

Reviewer: A reader