Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Dan.

Asked: January 29, 2003 - 10:00 am UTC

Last updated: June 10, 2009 - 11:45 am UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Tom,

How bad will be the performance of query if NOT IN is used.Is there any other way of writing a query not using NOT IN.

Thanks,
Dan

and Tom said...

Well, it depends.

If you are one of those people "afraid" of the big bad CBO, it could be very bad. If you are not -- you will see the best of all possible performance using it. Take this script:


create table t1 as select * from all_objects where rownum <= 10000;
create table t2 as select * from all_objects where rownum <= 9950;
create index t2_idx on t2(object_id);

alter session set sql_trace=true;
select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 );

select count(*)
from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id )
/
select count(*)
from t1, t2 rbo
where t1.object_id = rbo.object_id(+)
and rbo.object_id IS NULL
/

alter session set sql_trace=false;
analyze table t1 compute statistics
for table
for all indexes
for all indexed columns;
analyze table t2 compute statistics
for table
for all indexes
for all indexed columns;
alter session set sql_trace=true;
select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 );

select count(*)
from t1 cbo
where NOT EXISTS (select null from t2 where t2.object_id = cbo.object_id )
/

select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL
/

now, lets look at the NOT IN performance of RBO vs CBO:

select count(*) from t1 rbo where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 17.55 18.69 134 735899 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 17.56 18.69 134 735899 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=735899 r=134 w=0 time=18696501 us)
50 FILTER (cr=735899 r=134 w=0 time=18696326 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=134 w=0 time=69425 us)
9950 TABLE ACCESS FULL T2 (cr=735761 r=0 w=0 time=18393649 us)
********************************************************************************
select count(*) from t1 cbo where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.13 0.14 0 161 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.13 0.15 0 161 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=161 r=0 w=0 time=146502 us)
50 HASH JOIN ANTI (cr=161 r=0 w=0 time=146422 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=16401 us)
9950 INDEX FULL SCAN T2_IDX (cr=23 r=0 w=0 time=16219 us)(object id 34305)
********************************************************************************

Well, that is startling -- isn't it? The RBO full scanned the T2 table 100,000 times! The CBO, will, once. It was slightly faster... Looking at other ways you MIGHT be able to write a NOT IN:

select count(*) from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.22 0.21 22 20138 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.22 0.21 22 20138 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=20138 r=22 w=0 time=217043 us)
50 FILTER (cr=20138 r=22 w=0 time=216954 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=24927 us)
9950 INDEX RANGE SCAN T2_IDX (cr=20000 r=22 w=0 time=109100 us)(object id 34305)
********************************************************************************
select count(*) from t1 cbo
where NOT EXISTS (select null from t2 where t2.object_id = cbo.object_id )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.19 0.19 0 20138 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.19 0.22 0 20138 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=20138 r=0 w=0 time=192747 us)
50 FILTER (cr=20138 r=0 w=0 time=192643 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=22106 us)
9950 INDEX RANGE SCAN T2_IDX (cr=20000 r=0 w=0 time=94977 us)(object id 34305)


that is a tie -- but NOTE that this is NOT the same as "not in". To see why figure out why this two queries return different answers:

scott@ORA920> select count(*) from emp where empno not in ( select mgr from emp );

COUNT(*)
----------
0

scott@ORA920> select count(*) from emp
2 where not exists ( select null from emp e2 where e2.mgr = emp.empno );

COUNT(*)
----------
8

and think about NULLS..


And a third way:



select count(*)
from t1, t2 rbo
where t1.object_id = rbo.object_id(+)
and rbo.object_id IS NULL

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.20 0.23 0 10161 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.20 0.23 0 10161 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=10161 r=0 w=0 time=231584 us)
50 FILTER (cr=10161 r=0 w=0 time=231495 us)
10000 NESTED LOOPS OUTER (cr=10161 r=0 w=0 time=216922 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=23132 us)
9950 INDEX RANGE SCAN T2_IDX (cr=10023 r=0 w=0 time=122286 us)(object id 34305)
********************************************************************************
select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.16 0.16 0 161 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.16 0.17 0 161 0 1

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=161 r=0 w=0 time=168176 us)
50 FILTER (cr=161 r=0 w=0 time=168094 us)
10000 HASH JOIN OUTER (cr=161 r=0 w=0 time=154291 us)
10000 TABLE ACCESS FULL OBJ#(34303) (cr=138 r=0 w=0 time=15475 us)
9950 INDEX FULL SCAN OBJ#(34305) (cr=23 r=0 w=0 time=16363 us)(object id 34305)


here again, the CBO rules -- but note that this is really just emulating the first NOT IN and is not nearly as clear to the reader what it is doing....




So, not in is not something to be afraid of....

Rating

  (31 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Not clear...

Kashif, January 29, 2003 - 2:24 pm UTC

Hi Tom,

I'm not sure how you concluded that t2 will be full scanned 100,000 times. Do you mean 10,000? Also, how are you able to tell looking at the tkprof that t2 was full-scanned more than once? I would have thought that both t1 and t2 would be full table scanned once, and then their results compared. Is that not true? If there are any questions on your forum you might have previously answered explaining this or similar questions, please provide me with the link to that question. Thanks in advance.

Kashif

Tom Kyte
January 29, 2003 - 3:35 pm UTC

yes, 10,000 not 100,000 sorry about the extra zero.

the query plan tells me so...

the consistent gets backs it up -- the ONLY way they could be that high is to full scan that table over and over.

That was my point -- the RBO is very rudimentary here. It full scans T1 and applied a FILTER on T2 by full scanning T2 to see if the value was "not in t2"....




Mike, January 30, 2003 - 12:17 am UTC

We have seen HASH join very expensive on hugh tables (>1,000,000 rows) as comparison to the NL

Tom Kyte
January 30, 2003 - 8:50 am UTC

ever the contraryian, suggest you read:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894 <code>

there you will see the evil NL join literally killing your performance (but, giving you a very good cache hit ratio if you like that sort of stuff)

and tell you what -- take this example, extrapolate the data upto 1+ million rows and give it a whirl....

Different conclusion...

Kashif, October 06, 2003 - 12:59 pm UTC

Hi Tom,

I happened to revisit this question and tried out the test you had presented in your answer. I wanted to ascertain whether the drop in consistent gets when using the CBO (in your second query) was as a result of the reduced number of full scans of table t2. It turns out, however, that the drop in consistent gets was as a result of the index that was added; table t2 was scanned the same of number of times in both the RBO and CBO environments, per my testing. Here are my results:

/* My script: I did not include the exists and outer join
alternatives to NOT IN */

create table t1 as select * from all_objects where rownum <= 10000;
create table t2 as select * from all_objects where rownum <= 9950;
--create index t2_idx on t2(object_id); --Excluded the index creation for now...

alter session set sql_trace=true;
select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 );

alter session set sql_trace=false;
analyze table t1 compute statistics
for table
for all indexes
for all indexed columns;
analyze table t2 compute statistics
for table
for all indexes
for all indexed columns;
alter session set sql_trace=true;
select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 );
create index t2_idx on t2(object_id); --Added the index over here.
analyze table t2 compute statistics
for table
for all indexes
for all indexed columns;
select count(*)
from t1 cbo2
where object_id not in ( select object_id from t2 );

Results from tkprof:

Without the index, RBO:

select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 1007332 1492552 212229 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 1007332 1492552 212229 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 TABLE ACCESS FULL T2

Without the index, CBO:

select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 1492552 212229 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 1492552 212229 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 TABLE ACCESS FULL T2

The consistent gets for the query in RBO and CBO modes is the same without the index, which tells me that the difference in your results was a result of the added index, and NOT because of the number of times t2 is full-scanned is reduced. I also threw the indexed query in at the end (I added the index and then reanalyzed table t2, and ran the query):

select count(*)
from t1 cbo2
where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 283 20294 33 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 283 20294 33 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 INDEX RANGE SCAN (object id 298547)

The consistent gets do go down, but the t2 query still gets executed 10,000 times, as proven by this test:

kkashif@DEV> exec dbms_application_info.set_client_info( 0 )

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.40
kkashif@DEV> create or replace function client_info return varchar2
2 as
3 begin
4 dbms_application_info.set_client_info( userenv('client_info')+1 );
5 -- return userenv('client_info');
6 return ('A');
7 end;
8 /

Function created.

Elapsed: 00:00:00.41
kkashif@DEV> ed
Wrote file afiedt.buf

1 select count(*)
2 from t1 cbo3
3* where object_id not in ( select object_id from t2 where client_info = 'A')
kkashif@DEV> /

COUNT(*)
----------
50

Elapsed: 00:00:02.33
kkashif@DEV> select userenv ('client_info') from dual
2 /

USERENV('CLIENT_INFO')
----------------------------------------------------------------
10000

Elapsed: 00:00:00.50
kkashif@DEV>

So the results indicate that the NOT IN clause does in fact execute the same number of times using the RBO and CBO. Am I missing something? Thanks.

Kashif

Tom Kyte
October 06, 2003 - 1:53 pm UTC

that is NOT a cbo plan you have there -- no cost, no card!

the cbo plan has a hash anti join:


Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=161 r=0 w=0 time=146502 us)
50 HASH JOIN ANTI (cr=161 r=0 w=0 time=146422 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=16401 us)
9950 INDEX FULL SCAN T2_IDX (cr=23 r=0 w=0 time=16219 us)(object id 34305)


that scans 1 time! you are looking at the wrong plan.



Interesting...

Kashif, October 06, 2003 - 3:08 pm UTC

Hi Tom,

The same query in the same exact environment with the same statistics etc. produces the following plan and statistics, when executed after turning autotrace on instead of using sql_trace:

kkashif@DEV> set autotrace traceonly
kkashif@DEV> select count(*)
2 from t1 cbo5
3 where object_id not in ( select object_id from t2 )
4 /

Elapsed: 00:00:00.21

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=500 Bytes=2500)

4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card=1 Bytes=5)





Statistics
----------------------------------------------------------
0 recursive calls
33 db block gets
20294 consistent gets
283 physical reads
0 redo size
367 bytes sent via SQL*Net to client
430 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed

kkashif@DEV>

As you can see, the query shows the cost, cards etc. in the plan. So I guess now I have different questions:

Why would there be a difference in the plans shown by autotrace and sql_trace? Is there a setting in sql_trace that will cause the cost, cards etc. to show up that I have not set? Thanks.

Kashif

Tom Kyte
October 06, 2003 - 3:45 pm UTC

are you actually observing two different plans?

you ran this with sql_trace and get a hash anti join and then run with autotrace and get a different plan?


I cannot reproduce that behaviour -- but -- is that what you are seeing?

Kashif, October 06, 2003 - 4:47 pm UTC

Hi Tom,

Actually, I get the same plan in both the autotrace and sql_trace traces, but the autotrace produces a plan with the card and cost information, whereas the sql_trace produces the same plan without the cost and card information:

In SQL_TRACE:

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 INDEX RANGE SCAN (object id 298547)

In Autotrace:

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=500 Bytes=2500)

4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card=1
Bytes=5)

However, I checked the setting on the always_anti_join parameter, and it was set as NESTED_LOOPS. So I changed it to HASH and executed the query again. This time, the plan is similar to your plan (i.e. with the hash anti join), but still no cost/card info in the trace file!

select count(*)
from t1 cbo5
where object_id not in ( select object_id from t2)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 275 341 39 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 275 341 39 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 HASH JOIN ANTI
10000 TABLE ACCESS FULL T1
9950 VIEW VW_NSO_1
9950 INDEX FAST FULL SCAN (object id 298547)

The autotrace plan though, again, shows the cost and card info:

kkashif@DEV> select count(*)
2 from t1 cbo5
3 where object_id not in ( select object_id from t2 )
4 /

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=114 Card=1 Bytes=18)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=114 Card=50 Bytes=900.00000000000
1)

3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=10000 Bytes=
50000)

4 2 VIEW OF 'VW_NSO_1' (Cost=8 Card=9950 Bytes=129350)
5 4 INDEX (FAST FULL SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cos
t=8 Card=9950 Bytes=49750)

Strange... If this is becoming too long, I'll wait for when you are accepting questions and post it as a new question. Thanks!

Kashif

Tom Kyte
October 06, 2003 - 6:14 pm UTC

thats accurate -- my "wrong" above. the STAT records in the trace file won't have the cost/card -- sorry about that.

only if you use explain=u/p will you get the cost/card in the trace.

In anycase, you discovered "the problem"

Got it.

Kashif, October 07, 2003 - 9:37 am UTC

Hi Tom,

Yep, I now understand why the query was being executed multiple times even in the CBO environment (always_anti_join was set to NESTED_LOOPS). Though I wonder why Oracle set the default value for the always_anti_join parameter to NESTED_LOOPS. Curious to know if you have any explanation/insight. Thanks again for your insight.

Kashif

Tom Kyte
October 07, 2003 - 9:44 am UTC

in current releases of the software, this is a hidden parameter and has a new value "choose" meaning it can do what it wants.

in older releases, it defaulted to the way it was first implemented for backwards compatibility.

anti join

A reader, March 31, 2004 - 3:03 am UTC

Hi

I have this query to compare table structure differences

select table_name, column_name, data_type, data_length, data_precision
from user_tab_columns a
where column_name not in (select column_name
from user_tab_columns@oe_his b
where a.table_name = b.table_name)
and table_name in (select nombre_tabla
from his_lista_tablas@oe_his)

this shows me the tables in my local database which has more columns than the old tables using db link.

My question is I would like to show columns from user_tab_columns@VAN in my query result as well. Since this is an anti join how can I accomplish this?

Tom Kyte
March 31, 2004 - 8:38 am UTC

in 9i, you could use a FULL OUTER join.

but, you don't have any versions :( so I cannot tell you what to do....

hi my version is 8.1.7.4

A reader, March 31, 2004 - 9:34 am UTC

hi my version is 8.1.7.4

so we dont have full outer join :-(

Tom Kyte
March 31, 2004 - 9:47 am UTC

you would have to simiulate then


select ...
from t1, t2
where t1.key = t2.key(+)
UNION ALL
select ....
from t1, t2
where t1.key(+) = t2.key
AND t1.key is NULL
/


that is all the syntatic sugar of a "full outer join" does under the covers...

top half gets everything in T1 and if there is something in T2, gets it. bottom half gets everything in t2 that isn't in T1 since we already got that.

hmm but I am trying to get this output

A reader, April 01, 2004 - 3:15 am UTC

Right now I have this output

TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH
----------------- ------------- ---------- -----------
OM_CUSTOMER_INST COMMENTS VARCHAR2 2000
OM_CUSTOMER_INST COMMENTS VARCHAR2 4000

I would like to have the following output which I dont think can be done using full outer join :-?

TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH
----------------- ------------- ---------- ----------- ----------------- ------------- ---------- -----------
OM_CUSTOMER_INST COMMENTS VARCHAR2 2000 OM_CUSTOMER_INST COMMENTS VARCHAR2 4000


Tom Kyte
April 01, 2004 - 10:32 am UTC

a join is what is needed exactly to get that second output.


ops$tkyte@ORA9IR2> create table t1( tname varchar2(5), cname varchar2(5), dtype varchar(10), dlen number );
 
Table created.
 
ops$tkyte@ORA9IR2> create table t2( tname varchar2(5), cname varchar2(5), dtype varchar(10), dlen number );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t1 values ( 'T1', 'C1', 'VARCHAR2', 2000 );
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 'T1', 'C2', 'NUMBER', 38 );
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values ( 'T1', 'C1', 'VARCHAR2', 4000 );
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values ( 'T1', 'C3', 'DATE', null );
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select *
  2    from t1 full outer join t2 on ( t1.tname = t2.tname and t1.cname = t2.cname )
  3  /
 
TNAME CNAME DTYPE            DLEN TNAME CNAME DTYPE            DLEN
----- ----- ---------- ---------- ----- ----- ---------- ----------
T1    C1    VARCHAR2         2000 T1    C1    VARCHAR2         4000
T1    C2    NUMBER             38
                                  T1    C3    DATE
 
ops$tkyte@ORA9IR2>
 

anti join

Venkat, April 19, 2004 - 12:47 am UTC

Tom,

When we tried the example for anit join the way you have shown it, the "consistent gets" has reduced significantly in anti join with cbo whereas the response time has gone up a bit when compared to anti join rbo. Can you please expalin this phenomena?
===========================================================
SQL> create table x as select * from dba_objects where rownum<7001;

Table created.

Elapsed: 00:00:00.47
SQL> create table y as select * from dba_objects where rownum<6951;

Table created.

Elapsed: 00:00:00.47
SQL> alter table y modify(object_id not null);

Table altered.

Elapsed: 00:00:00.18
SQL> create index y_idx on y(object_id);

Index created.

Elapsed: 00:00:00.99
SQL> alter session set optimizer_mode=RULE;

Session altered.

Elapsed: 00:00:00.00
SQL> set autotrace traceonly
SQL> set timing on
SQL> select count(*) from x where object_id not in(select object_id from y);

Elapsed: 00:00:14.97

Execution Plan
----------------------------------------------------------                      
   0      SELECT STATEMENT Optimizer=RULE                                       
   1    0   SORT (AGGREGATE)                                                    
   2    1     FILTER                                                            
   3    2       TABLE ACCESS (FULL) OF 'X'                                      
   4    2       TABLE ACCESS (FULL) OF 'Y'                                      




Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
     312696  consistent gets                                                    
         86  physical reads                                                     
          0  redo size                                                          
        491  bytes sent via SQL*Net to client                                   
        655  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     

SQL> alter session set optimizer_mode=CHOOSE;

Session altered.

Elapsed: 00:00:00.00
SQL> analyze table x compute statistics for table for all indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:00.27
SQL> analyze table y compute statistics for table for all indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:00.81

SQL> select count(*) from x where object_id not in(select object_id from y);

Elapsed: 00:00:15.41

Execution Plan
----------------------------------------------------------                      
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=31 Card=1 Bytes=13)           
   1    0   SORT (AGGREGATE)                                                    
   2    1     FILTER                                                            
   3    2       TABLE ACCESS (FULL) OF 'X' (Cost=15 Card=350 Bytes=455          
          0)                                                                    
                                                                                
   4    2       INDEX (FULL SCAN) OF 'Y_IDX' (NON-UNIQUE) (Cost=16 Car          
          d=348 Bytes=1044)                                                     
                                                                                




Statistics
----------------------------------------------------------                      
         61  recursive calls                                                    
          0  db block gets                                                      
      61740  consistent gets                                                    
          0  physical reads                                                     
          0  redo size                                                          
        491  bytes sent via SQL*Net to client                                   
        655  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     
=======================================================

Thanks,
Venkat 

Tom Kyte
April 19, 2004 - 6:39 am UTC

do not measure things with 'set timing on' and consider things a second or two apart "different".

too many things in between you and the timing.

use TKPROF.

set timing on is good to measure things that take minutes (if you use set timing and the difference between approach A and B is say 1.5 minutes, that is meaningful. If the difference is 1.5 seconds -- probably not).




NOT EXISTS

Jennifer Chen, May 08, 2004 - 3:57 pm UTC

Hi Tom,

I need to retrieve data in table alias_snapshot, but not in table iii_snapshot, which means both sid and fbi do not match:

SELECT a.SID chr_sid, a.fbi chr_fbi, a.nam chr_nam,
a.iii_status chr_iii_status, a.sex chr_sex,
a.dob chr_dob, a.rac chr_rac, a.iffs chr_iffs
FROM alias.alias_snapshot a
WHERE NOT EXISTS (
SELECT 1
FROM alias.iii_snapshot iii
WHERE a.SID = iii.SID OR a.fbi = iii.fbi)

Table alias_snapshot contains 1.5 million rows and table iii_snapshot contains roughly 1.2 million rows. This query takes forever. Do you have a better way to retrieve this dataset?

Thanks in advance for your help


Tom Kyte
May 10, 2004 - 7:20 am UTC

use the CBO and use NOT IN


select ...
from t1
where ( columns ) not in ( select columns from t2 )


brutally efficient and fast.

clarification about anti joins

amit, January 20, 2005 - 10:06 am UTC

 1  explain plan for
  2  UPDATE /*+ dynamic_sampling(C 10) */ yloci.oci_section C
  3      set         sect_syllabus_url = null,
  4          sect_syllabus_last_dated = sysdate
  5      where sect_syllabus_url is not null
  6        and sect_term = :b1
  7        and (sect_term,crse_subject_id,crse_course_number,nvl(C.sect_syllabus_url,'X')) not in
  8*          (select /*+ HASH_AJ */ sect_term,crse_subject_id,crse_course_number,nvl(S.crse_syllabus_url,'X')  from yloci.oci_syllabus_load S)
  9  /

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------
| Id  | Operation            |  Name              | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | UPDATE STATEMENT     |                    |    39 |  1482 |   104K|
|   1 |  UPDATE              | OCI_SECTION        |       |       |       |
|*  2 |   FILTER             |                    |       |       |       |
|*  3 |    TABLE ACCESS FULL | OCI_SECTION        |    39 |  1482 |    30 |
|*  4 |    TABLE ACCESS FULL | OCI_SYLLABUS_LOAD  |     9 |   549 |  2671 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ HASH_AJ */ 0 FROM
              "YLOCI"."OCI_SYLLABUS_LOAD" "S" WHERE LNNVL("S"."SECT_TERM"<>:B1) AND
              LNNVL("S"."CRSE_SUBJECT_ID"<>:B2) AND LNNVL("S"."CRSE_COURSE_NUMBER"<>:B3)
              AND LNNVL(NVL(:B4,'X')<>NVL("S"."CRSE_SYLLABUS_URL",'X'))))
   3 - filter("SYS_ALIAS_4"."SECT_SYLLABUS_URL" IS NOT NULL AND
              "SYS_ALIAS_4"."SECT_TERM"=:Z)
   4 - filter(LNNVL("S"."SECT_TERM"<>:B1) AND
              LNNVL("S"."CRSE_SUBJECT_ID"<>:B2) AND LNNVL("S"."CRSE_COURSE_NUMBER"<>:B3)
              AND LNNVL(NVL(:B4,'X')<>NVL("S"."CRSE_SYLLABUS_URL",'X')))

Note: cpu costing is off

If I remove the nvl(..) column from the not it condition

  1  explain plan for
  2  UPDATE /*+ dynamic_sampling(C 10) */ yloci.oci_section C
  3      set         sect_syllabus_url = null,
  4          sect_syllabus_last_dated = sysdate
  5      where sect_syllabus_url is not null
  6        and sect_term = :b1
  7        and (sect_term,crse_subject_id,crse_course_number) not in
  8*          (select /*+ HASH_AJ */ sect_term,crse_subject_id,crse_course_number  from yloci.oci_syllabus_load S)
  9  /

Explained.

SQL> @?/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
| Id  | Operation              |  Name                 | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT       |                       |   150 |  8250 |    67 |
|   1 |  UPDATE                | OCI_SECTION           |       |       |       |
|*  2 |   HASH JOIN ANTI       |                       |   150 |  8250 |    67 |
|*  3 |    TABLE ACCESS FULL   | OCI_SECTION           |   770 | 29260 |    30 |
|   4 |    INDEX FAST FULL SCAN| X1_OCI_SYLLABUS_LOAD  |  1470K|    23M|     4 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C"."SECT_TERM"="S"."SECT_TERM" AND
              "C"."CRSE_SUBJECT_ID"="S"."CRSE_SUBJECT_ID" AND
              "C"."CRSE_COURSE_NUMBER"="S"."CRSE_COURSE_NUMBER")
   3 - filter("C"."SECT_SYLLABUS_URL" IS NOT NULL AND "C"."SECT_TERM"=:Z)

Note: cpu costing is off


I thought the nvl thing makes the column not null Hence cbo can still use the ANTI join even with the nvl column.

Where am I going wrong ? 

Tom Kyte
January 20, 2005 - 10:48 am UTC

(select /*+ HASH_AJ */
sect_term,crse_subject_id,crse_course_number from yloci.oci_syllabus_load S)

can that return null values -- you would add

where sect_term is not null and crse_subject_id is not ........

nvl is just a function. a function that could return NULL.

nvl is just a function...

padders, January 20, 2005 - 11:21 am UTC

...or is it?

The 10g optimizer appears to appreciate that NVL with non-null literal equates to NOT NULL.

Personal Oracle Database 10g Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> SET AUTOTRACE ON EXPLAIN;
SQL> SELECT empno
  2  FROM   emp
  3  WHERE  ename NOT IN (
  4         SELECT ename
  5         FROM   emp);

no rows selected

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=24 Card=13 Bytes=104)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=112)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=1 Bytes=5)

SQL> SELECT empno
  2  FROM   emp
  3  WHERE  NVL (ename, 'X') NOT IN (
  4         SELECT NVL (ename, 'X')
  5         FROM   emp);

no rows selected

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=13)
   1    0   HASH JOIN (ANTI) (Cost=7 Card=1 Bytes=13)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=112)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=70)

SQL>  

Tom Kyte
January 20, 2005 - 7:13 pm UTC

but don't count on it. can you think of a case whereby the above would not hold true? :)



more clarification

amit poddar, January 20, 2005 - 1:53 pm UTC

sect_term,crse_subject_id,crse_course_number,nvl(S.crse_syllabus_url,'X')

sect_term,crse_subject_id,crse_course_number
these three columns are defined as not null in the table.
The only column nullable is crse_syllabus_url which I thought I can make non nullable by using nvl.

So are you saying in this case anti join cannot be used since nvl is not sufficeint to make this column not null.


Tom Kyte
January 20, 2005 - 7:33 pm UTC

actually, it looks like it should be able to work:

scott@ORA10G> set autotrace on
scott@ORA10G> update(
2 select /*+ ALL_ROWS */ *
3 FROM emp
4 WHERE (NVL (ename, 'X'), empno ) NOT IN (
5 SELECT NVL (ename, 'X'), empno
6 FROM emp))
7 set empno = 55;

0 rows updated.


Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=HINT: ALL_ROWS (Cost=7 Card=1 Bytes=20)
1 0 UPDATE OF 'EMP'
2 1 HASH JOIN (ANTI) (Cost=7 Card=1 Bytes=20)
3 2 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=140)
4 2 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=140)



could be that since, well -- a hint is a hint...... it is doing the right thing.


do you have a fully reproducible testcase -- the entire thing, we can comment more fully with that. something we can run to get the same result.

Elaborate?

A reader, January 20, 2005 - 9:20 pm UTC

Could you elaborate your comment a couple posts above:
"but don't count on it. can you think of a case whereby the above would not hold true? :)"

When would NVL( SCHEMA.TABLE.COLUMN, NON-NULL LITERAL ) not be treated as a NOT NULL? (way to many negative things in that sentence)


Tom Kyte
January 20, 2005 - 11:11 pm UTC

I'm asking you to guess :) think about it -- what is a feature of the database since 8i that I'm not fond of that might change all of this.....

example as you said.

amit poddar, January 20, 2005 - 10:37 pm UTC

CREATE TABLE OCI_SECTION
( SECT_TERM VARCHAR2(6) NOT NULL,
CRSE_SUBJECT_ID VARCHAR2(4) NOT NULL,
CRSE_COURSE_NUMBER VARCHAR2(5) NOT NULL,
SECT_SYLLABUS_URL VARCHAR2(100),
SECT_SYLLABUS_LAST_DATED DATE NOT NULL
);

insert into oci_section values ('200501','CHEM','114','htt://classes.yale.edu/syllabus/chem114b/
',to_date('01/08/2005 18:06:48','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','HSHM','217','htt://classes.yale.edu/syllabus/hshm217b/
',to_date('01/08/2005 21:07:04','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','CPSC','478','htt://classes.yale.edu/syllabus/cpsc478b/
',to_date('01/08/2005 21:07:04','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','SPAN','224','htt://classes.yale.edu/syllabus/span224-1
b/',to_date('01/08/2005 23:07:49','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','PSYC','165','htt://classes.yale.edu/syllabus/psyc165b/
',to_date('01/13/2005 14:10:13','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','AFST','603','htt://classes.yale.edu/syllabus/afst603b/
',to_date('01/08/2005 18:06:48','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','235',null,to_date('01/07/2005 15:11:31','mm/dd/
yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','265',null,to_date('01/07/2005 15:11:31','mm/dd/
yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','270','htt://classes.yale.edu/syllabus/jdst270b/
',to_date('01/08/2005 09:12:43','mm/dd/yyyy hh24:mi:ss'));

begin
dbms_stats.gather_table_stats(OWNNAME=>'OBJ_MIG', TABNAME=>'OCI_SECTION');
end;
/

CREATE TABLE OCI_SYLLABUS_LOAD
( SECT_TERM VARCHAR2(6) NOT NULL ,
CRSE_SUBJECT_ID VARCHAR2(4) NOT NULL ,
CRSE_COURSE_NUMBER VARCHAR2(5) NOT NULL ,
CRSE_SECTION VARCHAR2(3),
CRSE_SYLLABUS_URL VARCHAR2(100),
LAST_UPDATED DATE
);

insert into oci_syllabus_load values('200403','WGSS','371','01','htt://classes.yale.edu/syllabus
/wgss371a/',null);
insert into oci_syllabus_load values('200403','WGSS','446','01','htt://classes.yale.edu/syllabus
/wgss446a/',null);
insert into oci_syllabus_load values('200403','WGSS','746','01','htt://classes.yale.edu/syllabus
/wgss746a/',null);
insert into oci_syllabus_load values('200403','WGST','745','01','htt://classes.yale.edu/syllabus
/wgst745a/',null);
insert into oci_syllabus_load values('200403','SPAN','938','01','htt://classes.yale.edu/syllabus
/span938a/',null);
insert into oci_syllabus_load values('200403','STAT','100','01','htt://classes.yale.edu/syllabus
/stat100a/',null);
insert into oci_syllabus_load values('200403','STAT','101','01','htt://classes.yale.edu/syllabus
/stat101a/',null);
insert into oci_syllabus_load values('200403','STAT','102','01','htt://classes.yale.edu/syllabus
/stat102a/',null);
insert into oci_syllabus_load values('200403','STAT','103','01','htt://classes.yale.edu/syllabus
/stat103a/',null);

begin
dbms_stats.gather_table_stats(OWNNAME=>'OBJ_MIG', TABNAME=>'OCI_SYLLABUS_LOAD');
end;
/

explain plan for
UPDATE oci_section C
set sect_syllabus_url = null,
sect_syllabus_last_dated = sysdate
where sect_syllabus_url is not null
and sect_term = :b1
and (sect_term,
crse_subject_id,
crse_course_number,
nvl(C.sect_syllabus_url,'X')
) not in
(select sect_term,
crse_subject_id,
crse_course_number,
nvl(S.crse_syllabus_url,'X')
from oci_syllabus_load S);

select plan_table_output from table(dbms_xplan.display());



Tom Kyte
January 20, 2005 - 11:51 pm UTC

it 10g, it does anti join, in 9i -- nvl is just a function.

Unless

padders, January 21, 2005 - 6:34 am UTC

Unless (as Tom hints at above) you happen to be using this feature...

Personal Oracle Database 10g Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> ALTER SESSION SET cursor_sharing = FORCE;

Session altered.

SQL> SELECT empno 
  2  FROM   emp 
  3  WHERE  NVL (ename, 'X') NOT IN ( 
  4         SELECT NVL (ename, 'X') 
  5         FROM   emp);

no rows selected

SQL> SELECT plan_table_output
  2  FROM   TABLE (DBMS_XPLAN.DISPLAY_CURSOR ((
  3         SELECT sql_id
  4         FROM   v$sql
  5         WHERE  sql_text LIKE LOWER ('%EMPNO%EMP%ENAME%'))));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  f6547mcu7944m, child number 0
-------------------------------------
SELECT empno FROM   emp WHERE  NVL (ename, :"SYS_B_0") NOT IN (
SELECT NVL (ename, :"SYS_B_1")        FROM   emp)

Plan hash value: 84080821

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    24 (100)|          |
--------------------------------------------------------------------------------
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   112 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     1 |     5 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( IS NULL)
   3 - filter(LNNVL(NVL(:B1,:SYS_B_0)<>NVL("ENAME",:SYS_B_1)))


22 rows selected.

SQL>  

Tom Kyte
January 21, 2005 - 8:28 am UTC

precisely :)

I'm not sure that's the same

A reader, January 21, 2005 - 7:18 am UTC

I thought the NVL( SOMETHING, LITERAL ) was in reference to what the optimizer saw? With cursor_sharing = force, the optimizer will never see literals.

Regardless, is there ever a time NVL( SOMETHING, LITERAL ) would not be considered NOT NULL if that's what the optimizer saw in the query?

Tom Kyte
January 21, 2005 - 8:39 am UTC

in 9i, apparently so -- it is not considered "not null". you would outer join instead, keeping rows where the outer joined to table is NULL -- mimicking the anti-join.

always_anti_join parameter

Tom, January 21, 2005 - 7:29 am UTC

You mentioned in older releases always_anti_join=NESTED_LOOPS by default. Is there any way in 8.1.7 to "remove" this setting, so that the optimizer will choose HASH or NESTED_LOOPS depending on which it thinks is better?

If not, how do we decide which value is best for this setting in an OLTP environment? The hash anti join is excellent for large result sets but my feeling [non-scientific currently] is that nested_loops will be better for first rows type optimization.

Suggestions.

Tom Kyte
January 21, 2005 - 8:43 am UTC

suggestion: leave the setting as it is.


it is one of three values in 8i, only.


using outer join for mimiking anti join

amit poddar, January 21, 2005 - 1:02 pm UTC

Hi,

" you would outer join instead, keeping rows where the outer joined to table is NULL -- mimicking the anti-join. "

How would the above apply to

UPDATE oci_section C
set sect_syllabus_url = null,
sect_syllabus_last_dated = sysdate
where sect_syllabus_url is not null
and sect_term = :b1
and (sect_term,
crse_subject_id,
crse_course_number,
nvl(C.sect_syllabus_url,'X')
) not in
(select sect_term,
crse_subject_id,
crse_course_number,
nvl(S.crse_syllabus_url,'X')
from oci_syllabus_load S);

unless you are updating the join.
But to update the join

we would need a unique index on
oci_syllabus_load
(sect_term,crse_subject_id,crse_course_number,nvl(crse_syllabus_url,'X'))

correct ?



Tom Kyte
January 21, 2005 - 6:48 pm UTC

actually, it would not work -- because of the lack of a unique constraint or primary key on "S"


so, you would be using not exists here most likely for the update.

using outer join for mimiking anti join

amit poddar, January 21, 2005 - 1:27 pm UTC

"/mike_specific/ora_admin/generic/scripts/afiedt.buf" 14 lines, 537 characters 

  1  UPDATE (select c.sect_syllabus_url,
  2                 c.sect_syllabus_last_dated,
  3                 c.sect_term
  4           from oci_section c,
  5                oci_syllabus_load s
  6           where c.sect_term(+) = s.sect_term and
  7                 c.crse_subject_id(+) = s.crse_subject_id and
  8                 c.crse_course_number(+) = s.crse_course_number and
  9                 nvl(c.sect_syllabus_url,'X') (+) = nvl(s.crse_syllabus_url,'X') and
 10                 c.sect_term is null
 11         )
 12  set sect_syllabus_url=null,
 13*     sect_syllabus_last_dated=sysdate
SQL> /
               nvl(c.sect_syllabus_url,'X') (+) = nvl(s.crse_syllabus_url,'X') and
                                              *
ERROR at line 9:
ORA-00936: missing expression


SQL> 

It seems I can't outer join on a nvl
So how would I write this update join using the outer join method you mentioned.
 

Tom Kyte
January 21, 2005 - 7:37 pm UTC

the (+) would go with the column -- inside of the function, but you'll get "not a key preserved table" so it won't actually work with the update.

NOT IN is brutally slow. HELP!!!!

A reader, June 06, 2007 - 3:05 pm UTC

Hi Tom,
Could you please help me? For some reason NOT IN is brutally slow for me. Can you please suggest what could be wrong? This is **very** crucial for us since we use NOT IN all over the place. We are using oracle version 9.2.0.8.

The below select stmt did not return even after 45 minutes!

Dev_id is indexed in request table and is the PK in device table. The stats are collected.

select dev_id from request where dev_id not in (select dev_id from device);


set autotrace traceonly explain

select dev_id  from request where dev_id not in (select dev_id from device);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6709 Card=1 Bytes=6)
   1    0   SORT (AGGREGATE)
   2    1     FILTER
   3    2       TABLE ACCESS (FULL) OF 'REQUEST' (Cost=2517 Card=38939
           Bytes=233634)

   4    2       INDEX (FULL SCAN) OF 'PK_DEVICE' (UNIQUE) (Cost=4192 C
          ard=79074 Bytes=474444)


set autotrace off
set timin on

SQL> select table_name,num_rows,last_analyzed from user_tables where table_name in ('REQUEST','DEVICE')

TABLE_NAME                       NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
DEVICE                            1581482 25-MAR-07
REQUEST                            778788 24-MAR-07

SQL> select count(*) from device;

  COUNT(*)
----------
   1624924

Elapsed: 00:00:01.99

SQL> select count(*) from request;

  COUNT(*)
----------
    806131

Elapsed: 00:00:01.46

SQL> select index_name from user_ind_columns where table_name='REQUEST' and column_name='DEV_ID';

INDEX_NAME
-------------
DUP_REQUEST_2


Thanks so much in advance!
Tom Kyte
June 06, 2007 - 9:29 pm UTC

no creates
no look

i want evidence that device in the subquery is NOT NULL

Found a solution - sort of

A reader, June 06, 2007 - 6:26 pm UTC

Hi Tom,
I was able to at least make the query return by adding a NOT NULL clause. This time it used anti-join?? rather than FTS.

select dev_id from request where dev_id is not null and dev_id not in (select dev_id from device)


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2429 Card=1 Bytes=12
          )

   1    0   NESTED LOOPS (ANTI) (Cost=2429 Card=1 Bytes=12)
   2    1     INDEX (FAST FULL SCAN) OF 'DUP_REQUEST_2' (NON-UNIQUE) (
          Cost=178 Card=778788 Bytes=4672728)

   3    1     INDEX (UNIQUE SCAN) OF 'PK_DEVICE' (UNIQUE) (Cost=1 Card
          =1581482 Bytes=9488892)

no rows selected

Elapsed: 00:00:15.11


So I have 2 questions for you:
1) why did I have to add the NOT NULL clause? Is this something I can instruct Oracle to handle automatically?

2) 15 secs is still too long. How can I further reduce this?

Thanks again.

A reader, June 07, 2007 - 5:00 am UTC

Tom,

Replying to the original post you had said there is a difference in NOT IN and NOT EXISTS with the following example:


that is a tie -- but NOTE that this is NOT the same as "not in". To see why figure out
why this two queries return different answers:

scott@ORA920> select count(*) from emp where empno not in ( select mgr from emp );

COUNT(*)
----------
0

scott@ORA920> select count(*) from emp
2 where not exists ( select null from emp e2 where e2.mgr = emp.empno );

COUNT(*)
----------
8

and think about NULLS..


Now,

I assumed it was because of the null value in the MGR column:

select count(*) from emp
where empno not in (select nvl(mgr,-01) from emp)
/

8 rows selected


Now,

I assumed Oracle would change the NOT IN to something like `NOT this OR NOT this¿. Tried with a simplistic example

1 select *
2 from dual
3* where 1 <> 2 or 1 <> null
z001698@DEVELOP > /

D
-
X

But, it doesn¿t work like the above query:


1 select *
2 from dual
3* where 1 not in (2,null)
z001698@DEVELOP > /

no rows selected


The question is

1) The NOT IN does not work like NOT EXISTS because the NULL value is indeterminate?
2) The NOT EXISTS work because its not allowed to do NULL = something even NULL=NULL is the row being not selected?
3) Why does the IN or NOT IN work like (`This OR This¿)


Thanks

Ravi

Tom Kyte
June 07, 2007 - 2:46 pm UTC

1) correct
2) correct
3) because ANSI said it should. It is in the definition of the language itself.

A reader, June 08, 2007 - 5:45 am UTC

x in ('A', null) --> (x = 'A') OR (x = null)
x not in ('A', null) --> (x <> 'A') AND (x <> null)


logic rules are

NOT (a OR b) = (NOT A) AND (NOT B)
                       ^^^


So it will be AND instead of OR ...

A reader, September 30, 2007 - 11:17 pm UTC

Hi Tom,

I need to optimize a SQL query which does an anti-join on one table.

select A from table_t
where A not in
(select distinct A from table_t
where B in ('ABC','DEF') and C ='LS')
I read about using NOT EXISTS in place of NOT IN but i can't seem to implement it into my query as the examples shown uses anti-join on 2 tables.

Is there a way to optimize an anti-join on one table? Will appreciate it if you can show me a link or so.

Thanks a lot!

Tom Kyte
October 03, 2007 - 2:29 pm UTC

not in and not exists are not equivalent in general - but when they are - then the optimizer knows both are possible and costs them both out.

eg: if you can replace the above with not exists, the optimizer probably already has.


Now, the distinct - not necessary (and the optimizer knows that and has discarded it for you)

If A is NULLABLE in table_t - bear in mind that not in works differently than not exists:

Table created.

ops$tkyte%ORA10GR2> insert into t1 values ( 1, 'xyz', 'xx' );

1 row created.

ops$tkyte%ORA10GR2> insert into t1 values ( 2, 'xyz', 'xx' );

1 row created.

ops$tkyte%ORA10GR2> insert into t1 values ( 2, 'ABC', 'LS' );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
  2    from t1
  3   where a not in (select a from t1 where b in ('ABC','DEF') and c = 'LS' );

         A
----------
         1

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
  2    from t1 x
  3   where NOT EXISTS (select NULL from t1 y where y.a = x.a and y.b in ('ABC','DEF') and y.c = 'LS' );

         A
----------
         1

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values ( null, 'ABC', 'LS' );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
  2    from t1
  3   where a not in (select a from t1 where b in ('ABC','DEF') and c = 'LS' );

no rows selected

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
  2    from t1 x
  3   where NOT EXISTS (select NULL from t1 y where y.a = x.a and y.b in ('ABC','DEF') and y.c = 'LS' );

         A
----------

         1



You might mean to add "and a is not null" to the subquery there so that

a) we can use an anti join, if A is nullable
b) you do not get the empty set as a result if A does become NULL for some value, which is probably not what you want.

Why not hash -aj used?

jian huang zheng, October 11, 2008 - 9:05 am UTC

Hello Tom:
SQL> show parameter hash

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
hash_area_size                       integer                1048576
hash_join_enabled                    boolean                TRUE
SQL> show parameter workare
NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
workarea_size_policy                 string                 AUTO

SQL> show parameter pga

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ---------------------------
pga_aggregate_target                 big integer            104857600
SQL> select *  from v$version;

BANNER
-----------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

I have two tables:
1. CREATE TABLE "MEDISTORE"."IMAGE_FULL_DIR"
  (    "IMAGE_NAME" VARCHAR2(25),
       "IMAGE_FULL_DIR" VARCHAR2(255),
        CONSTRAINT "IMAGE_NT" CHECK ( image_full_dir is not null) ENABLE
  ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING
 STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
 TABLESPACE "RENJI"

2.CREATE TABLE "MEDISTORE"."IMAGE_UID_IMG"
 (    "IMAGE_NAME" VARCHAR2(30),
       CONSTRAINT "IMAGE_N" CHECK ( image_name is not null) ENABLE
 ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "RENJI"

3.SQL> select count(*) from image_full_dir;

  COUNT(*)
----------
   7287429
4.SQL> select count(*) from image_uid_img;

  COUNT(*)
----------
  20023885
5.SQL> exec print_table('select *  from user_ind_columns where table_name=''IMAGE_FULL_DIR''');
INDEX_NAME                    : IMAGE_UQ
TABLE_NAME                    : IMAGE_FULL_DIR
COLUMN_NAME                   : IMAGE_NAME
COLUMN_POSITION               : 1
COLUMN_LENGTH                 : 25
CHAR_LENGTH                   : 25
DESCEND                       : ASC

6SQL> EXEC dbms_stats.gather_table_stats(user,'IMAGE_FULL_DIR',cascade=>true);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_table_stats(user,'IMAGE_UID_IMG',cascade=>true);
PL/SQL procedure successfully completed
SQL> exec print_table('select *  from user_tables where table_name=''IMAGE_UID_IMG''');
TABLE_NAME                    : IMAGE_UID_IMG
TABLESPACE_NAME               : RENJI
CLUSTER_NAME                  :
IOT_NAME                      :
PCT_FREE                      : 10
PCT_USED                      : 40
INI_TRANS                     : 1
MAX_TRANS                     : 255
INITIAL_EXTENT                : 65536
NEXT_EXTENT                   :
MIN_EXTENTS                   : 1
MAX_EXTENTS                   : 2147483645
PCT_INCREASE                  :
FREELISTS                     : 1
FREELIST_GROUPS               : 1
LOGGING                       : NO
BACKED_UP                     : N
NUM_ROWS                      : 20023885
BLOCKS                        : 48321
EMPTY_BLOCKS                  : 0
AVG_SPACE                     : 0
CHAIN_CNT                     : 0
AVG_ROW_LEN                   : 12
AVG_SPACE_FREELIST_BLOCKS     : 0
NUM_FREELIST_BLOCKS           : 0
DEGREE                        :          1
INSTANCES                     :          1
CACHE                         :     N
TABLE_LOCK                    : ENABLED
SAMPLE_SIZE                   : 20023885
LAST_ANALYZED                 : 2008-10-11 20:33:19
PARTITIONED                   : NO
IOT_TYPE                      :
TEMPORARY                     : N
SECONDARY                     : N
NESTED                        : NO
BUFFER_POOL                   : DEFAULT
ROW_MOVEMENT                  : DISABLED
GLOBAL_STATS                  : YES
USER_STATS                    : NO
DURATION                      :
SKIP_CORRUPT                  : DISABLED
MONITORING                    : NO
CLUSTER_OWNER                 :
DEPENDENCIES                  : DISABLED
-----------------

PL/SQL procedure successfully completed.

SQL>
SQL> exec print_table('select *  from user_tables where table_name=''IMAGE_FULL_DIR''');
TABLE_NAME                    : IMAGE_FULL_DIR
TABLESPACE_NAME               : RENJI
CLUSTER_NAME                  :
IOT_NAME                      :
PCT_FREE                      : 10
PCT_USED                      : 40
INI_TRANS                     : 1
MAX_TRANS                     : 255
INITIAL_EXTENT                : 65536
NEXT_EXTENT                   :
MIN_EXTENTS                   : 1
MAX_EXTENTS                   : 2147483645
PCT_INCREASE                  :
FREELISTS                     : 1
FREELIST_GROUPS               : 1
LOGGING                       : NO
BACKED_UP                     : N
NUM_ROWS                      : 7287429
BLOCKS                        : 77993
EMPTY_BLOCKS                  : 0
AVG_SPACE                     : 0
CHAIN_CNT                     : 0
AVG_ROW_LEN                   : 72
AVG_SPACE_FREELIST_BLOCKS     : 0
NUM_FREELIST_BLOCKS           : 0
DEGREE                        :          1
INSTANCES                     :          1
CACHE                         :     N
TABLE_LOCK                    : ENABLED
SAMPLE_SIZE                   : 7287429
LAST_ANALYZED                 : 2008-10-11 20:31:09
PARTITIONED                   : NO
IOT_TYPE                      :
TEMPORARY                     : N
SECONDARY                     : N
NESTED                        : NO
BUFFER_POOL                   : DEFAULT
ROW_MOVEMENT                  : DISABLED
GLOBAL_STATS                  : YES
USER_STATS                    : NO
DURATION                      :
SKIP_CORRUPT                  : DISABLED
MONITORING                    : NO
CLUSTER_OWNER                 :
DEPENDENCIES                  : DISABLED
-----------------

PL/SQL procedure successfully completed.

Now I run this query :

select image_full_dir from image_full_dir where image_name not in ( select image_name from image_uid_img ); 
it takes forever to run,v$session_longops says 12millions seconds to finish. explain plan says:
----------------------------------------------------------------------------

------------------------------------------------------------------------
| Id  | Operation            |  Name           | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |   364K|    25M|  1693M|
|*  1 |  FILTER              |                 |       |       |       |
|   2 |   TABLE ACCESS FULL  | IMAGE_FULL_DIR  |   364K|    25M|  7503 |
|*  3 |   TABLE ACCESS FULL  | IMAGE_UID_IMG   |  1002K|    14M|  4649 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "IMAGE_UID_IM
              G" "IMAGE_UID_IMG" WHERE LNNVL("IMAGE_UID_IMG"."IMAGE_N
              AME"<>:B1)))
   3 - filter(LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B1))

Note: cpu costing is off

19 rows selected.

From the v$session_longops , it is in accordance with the actual running.

But if I rewrote sql like:
select image_full_dir from image_full_dir,image_uid_img where image_full_dir.image_name=image_uid_img.image_name(+) and image_uid_img.image_name is null;
it just takes minutes to finish and give back the results, from explain plan i see hash outerjoin is involved:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------

------------------------------------------------------------------------
| Id  | Operation            |  Name           | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |    20M|  1663M| 55577 |
|*  1 |  FILTER              |                 |       |       |       |
|*  2 |   HASH JOIN OUTER    |                 |       |       |       |
|   3 |    TABLE ACCESS FULL | IMAGE_FULL_DIR  |  7287K|   500M|  7503 |
|   4 |    TABLE ACCESS FULL | IMAGE_UID_IMG   |    20M|   286M|  4649 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("IMAGE_UID_IMG"."IMAGE_NAME" IS NULL)
   2 - access("IMAGE_FULL_DIR"."IMAGE_NAME"="IMAGE_UID_IMG"."IMAGE_NA
              ME"(+))

Note: cpu costing is off

the image_name in this two tables are all unique.

I would expect the first query using hash-aj,but why oracle choose to full scan table? And if you rewrite the sql , why oracle choose the different plan?

It puzzles me so much, and I most am appreciated if you can provide some clue about them.





Tom Kyte
October 13, 2008 - 2:50 am UTC

CREATE TABLE "MEDISTORE"."IMAGE_UID_IMG"
( "IMAGE_NAME" VARCHAR2(30),
CONSTRAINT "IMAGE_N" CHECK ( image_name is not null)

why didn't you just use NOT NULL?

we don't know it is not null for real - so we are doing processing as if it were NULLABLE.

still the same plan,after change the not null?

jian huang zheng, October 13, 2008 - 11:41 am UTC

Hello Tom
    Thanks for your advice. but after I do this:
SQL> select *  from user_constraints where table_name='IMAGE_UID_IMG';

OWNER                                                        CONSTRAINT_NAME                                      CO TABLE_NAME                                                   SEARCH_CONDITION
------------------------------------------------------------ ------------------------------------------------------------ -- ------------------------------------------------------------ -------------------------------------
MEDISTORE                                                    IMAGE_N                                              C  IMAGE_UID_IMG                                                 image_name is not null
SQL> alter table image_uid_img drop constraints image_n;

Table altered.
SQL> alter table image_uid_img modify  image_name not null;

Table altered.

SQL> explain plan for select image_full_dir from image_full_dir where image_name not in ( select image_name from image_

Explained.

SQL> select *  from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------
| Id  | Operation            |  Name           | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |   364K|    25M|  1693M|
|*  1 |  FILTER              |                 |       |       |       |
|   2 |   TABLE ACCESS FULL  | IMAGE_FULL_DIR  |   364K|    25M|  7503 |
|*  3 |   TABLE ACCESS FULL  | IMAGE_UID_IMG   |  1001K|    11M|  4649 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "IMAGE_UID_IMG" "IMAG
              E_UID_IMG" WHERE LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B
              1)))
   3 - filter(LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B1))

Note: cpu costing is off

19 rows selected.

SQL>

Still the same plan, anything that I miss? 

Thanks~!

Tom Kyte
October 14, 2008 - 5:00 pm UTC

full examples from start to finish always please, sort of like this:

ops$tkyte%ORA11GR1> create table t1 ( x int, y int, z int );

Table created.

ops$tkyte%ORA11GR1> exec dbms_stats.set_table_stats( user, 'T1', numrows=>1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> create table t2 ( a int NOT NULL, b int, c int );

Table created.

ops$tkyte%ORA11GR1> exec dbms_stats.set_table_stats( user, 'T2', numrows=>1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> @at
ops$tkyte%ORA11GR1> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA11GR1> set autotrace traceonly explain
ops$tkyte%ORA11GR1> select * from t1 where x not in ( select a from t2 );

Execution Plan
----------------------------------------------------------
Plan hash value: 3756544281

------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes |TempSpc| Cost (
------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     1 |    52 |       | 57905
|*  1 |  HASH JOIN RIGHT ANTI SNA|      |     1 |    52 |    23M| 57905
|   2 |   TABLE ACCESS FULL      | T2   |  1000K|    12M|       | 27142
|   3 |   TABLE ACCESS FULL      | T1   |  1000K|    37M|       | 27145
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("X"="A")

ops$tkyte%ORA11GR1> set autotrace off

I cant simulate your results , very different to understand?

jian huang zheng, October 15, 2008 - 11:41 am UTC

Hello Tom
   Thanks for your answer. I just copy and paste your example as :
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

SQL> create table t1 ( x int, y int, z int );

Table created.
SQL> exec dbms_stats.set_table_stats(user,'T1',numrows=>1000000,numblks=>100000);

PL/SQL procedure successfully completed.
SQL> create table t2 ( a int NOT NULL, b int, c int );

Table created.
SQL>
SQL> exec dbms_stats.set_table_stats(user,'T2',numrows=>1000000,numblks=>100000);

PL/SQL procedure successfully completed.
SQL> explain plan for select * from t1 where x not in ( select a from t2 );

Explained.
SQL>
SQL> select *  from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             | 50000 |  1904K|   480M|
|*  1 |  FILTER              |             |       |       |       |
|   2 |   TABLE ACCESS FULL  | T1          | 50000 |  1904K|  9619 |
|*  3 |   TABLE ACCESS FULL  | T2          | 50000 |   634K|  9619 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "T2" "T2" WHERE L
              NNVL("T2"."A"<>:B1)))
   3 - filter(LNNVL("T2"."A"<>:B1))

Note: cpu costing is off

18 rows selected.

by the way,I dont know if those are needed:

SQL> Show parameter hash

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- -----------
hash_area_size                       integer                1048576
hash_join_enabled                    boolean                TRUE
SQL> show parameter opti

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ----------
filesystemio_options                 string
object_cache_optimal_size            integer                102400
optimizer_dynamic_sampling           integer                1
optimizer_features_enable            string                 9.2.0
optimizer_index_caching              integer                0
optimizer_index_cost_adj             integer                100
optimizer_max_permutations           integer                2000
optimizer_mode                       string                 ALL_ROWS
SQL> show parameter pga
NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ----------
pga_aggregate_target                 big integer            314572800
SQL> show parameter work
NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ----------
workarea_size_policy                 string                 AUTO
Thanks,it seems to me that hash-aj is some sort of disabled?  Could you please shed some lihgt on it? 

Tom Kyte
October 15, 2008 - 5:48 pm UTC

yes, looks like 9i did not have that feature in the optimizer at that point in time.

OK,THANKS

A reader, October 16, 2008 - 7:45 am UTC

Thanks,I got it. Much appreciated your time..

A reader, June 09, 2009 - 6:16 pm UTC

Hey Tom,

I recently came across something strange while toying with rewriting a query with a "NOT IN".

Basically I have around 1.5 M transactions, each of which has exactly one row in TRANS_HEADER. Each transactions consists of 0..N rows in TRANS_ROW. There's a total of around 20 M transaction rows. I wanted to find out all the transactions with no transaction rows associated with them.

I went ahead and wrote the simple query to return all such transactions but noticed the plan wasn't exactly something I'd like. So I reformulated my search as "return all transactions which are not in the group of transactions that DO have rows". To my surprise the second query actually runs in less than a minute (a reasonable time) and returns 40 rows (no reason to think that wouldn't be "correct").

I don't quite understand why the plans are so vastly different and why the more complex query is faster. Are my two queries not logically equivalent after all?

EXPLAIN PLAN FOR
SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
SELECT TRANS_ID FROM TRANS_ROW);

Explained.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 610510780
--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                 |  1616K|  9469K|       |   171G  (2)|999:59:59 |
|   1 |  HASH UNIQUE           |                 |  1616K|  9469K|    37M|   171G  (2)|999:59:59 |
|*  2 |   FILTER               |                 |       |       |       |            |          |
|   3 |    INDEX FAST FULL SCAN| TRANS_HEADER_PK |  1635K|  9584K|       |   673   (2)| 00:00:09 |
|*  4 |    TABLE ACCESS FULL   | TRANS_ROW       |    41 |   246 |       |   105K  (2)| 00:21:03 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TRANS_ROW" "TRANS_ROW"
              WHERE LNNVL("TRANS_ID"<>:B1)))
   4 - filter(LNNVL("TRANS_ID"<>:B1))
18 rows selected.


EXPLAIN PLAN FOR
SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
SELECT ID FROM TRANS_HEADER WHERE ID IN (
SELECT TRANS_ID FROM TRANS_ROW));

Explained.

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1670912697
---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |   197K|  3657K|       | 27790   (2)| 00:05:34 |
|*  1 |  HASH JOIN ANTI                           |   197K|  3657K|    28M| 27790   (2)| 00:05:34 |
|   2 |   INDEX FAST FULL SCAN  | TRANS_HEADER_PK |  1635K|  9584K|       |   671   (2)| 00:00:09 |
|   3 |   VIEW                  | VW_NSO_1        |    20M|   259M|       |   740  (11)| 00:00:09 |
|   4 |    NESTED LOOPS         |                 |    20M|   239M|       |   740  (11)| 00:00:09 |
|   5 |     INDEX FAST FULL SCAN| TRANS_HEADER_PK |  1635K|  9584K|       |   671   (2)| 00:00:09 |
|*  6 |     INDEX RANGE SCAN    | TRANSROW_TRANS_INDX |    13 |    78 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ID"="$nso_col_1")
   6 - access("ID"="TRANS_ID")
19 rows selected.

Tom Kyte
June 10, 2009 - 11:45 am UTC

first, proof that they are not semantically equivalent - I can get your two queries to return different results. I have to assume that TRANS_ROW allows for TRANS_ID to be nullable:

ops$tkyte%ORA10GR2> create table trans_header( id number primary key, data char(80) );

Table created.

ops$tkyte%ORA10GR2> create table trans_row( trans_id number  );

Table created.

ops$tkyte%ORA10GR2> insert into trans_header values ( 1, 'x' );

1 row created.

ops$tkyte%ORA10GR2> insert into trans_header values ( 2, 'y' );

1 row created.

ops$tkyte%ORA10GR2> insert into trans_row values ( null );

1 row created.

ops$tkyte%ORA10GR2> insert into trans_row values ( 2 );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
  2  SELECT TRANS_ID FROM TRANS_ROW);

no rows selected

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
  2  SELECT ID FROM TRANS_HEADER WHERE ID IN (
  3  SELECT TRANS_ID FROM TRANS_ROW));

        ID
----------
         1



so, they are not the same - and the reason that the first query performs "slowly" for you is the nullable quality of the trans_id column, consider:

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TRANS_HEADER', numrows => 1635000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TRANS_ROW', numrows => 41 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set serveroutput off
ops$tkyte%ORA10GR2> delete from plan_table;

4 rows deleted.

ops$tkyte%ORA10GR2> explain plan for
  2  SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN ( SELECT TRANS_ID FROM TRANS_ROW);

Explained.

ops$tkyte%ORA10GR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 457573287

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |  1634K|    20M|  2982K  (1)| 06:06:01 |
|*  1 |  FILTER               |              |       |       |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_C0038082 |  1635K|    20M|     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL   | TRANS_ROW    |     2 |    26 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TRANS_ROW" "TRANS_ROW" WHERE
              LNNVL("TRANS_ID"<>:B1)))
   3 - filter(LNNVL("TRANS_ID"<>:B1))

17 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> delete from plan_table;

4 rows deleted.

ops$tkyte%ORA10GR2> explain plan for
  2  SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN ( SELECT TRANS_ID FROM TRANS_ROW where trans_id is not null);

Explained.

ops$tkyte%ORA10GR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3077635579

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |  1515K|    37M|    26  (58)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI |              |  1515K|    37M|    26  (58)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL   | TRANS_ROW    |     2 |    26 |     8   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| SYS_C0038082 |  1635K|    20M|     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"="TRANS_ID")
   2 - filter("TRANS_ID" IS NOT NULL)

16 rows selected.



add the "is not null" to the subquery and we can use the hash anti join (fast) rather than a filter - the nullness changes the meaning of NOT IN


where x not in (any set that contains NULL) <<<== never true, never false, rather it is UNKNOWN

where x not in (any set of NOT NULL values) <<<=== always true OR false assuming X itself is NOT NULL


You allowed for the anti join to be used since ID is probably NOT NULL in your trans_header table so using it as the target of a NOT IN allowed it to be 'safe' to anti join with.