Not clear...
Kashif, January 29, 2003 - 2:24 pm UTC
Hi Tom,
I'm not sure how you concluded that t2 will be full scanned 100,000 times. Do you mean 10,000? Also, how are you able to tell looking at the tkprof that t2 was full-scanned more than once? I would have thought that both t1 and t2 would be full table scanned once, and then their results compared. Is that not true? If there are any questions on your forum you might have previously answered explaining this or similar questions, please provide me with the link to that question. Thanks in advance.
Kashif
January 29, 2003 - 3:35 pm UTC
yes, 10,000 not 100,000 sorry about the extra zero.
the query plan tells me so...
the consistent gets backs it up -- the ONLY way they could be that high is to full scan that table over and over.
That was my point -- the RBO is very rudimentary here. It full scans T1 and applied a FILTER on T2 by full scanning T2 to see if the value was "not in t2"....
Mike, January 30, 2003 - 12:17 am UTC
We have seen HASH join very expensive on hugh tables (>1,000,000 rows) as comparison to the NL
January 30, 2003 - 8:50 am UTC
ever the contraryian, suggest you read:
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894 <code>
there you will see the evil NL join literally killing your performance (but, giving you a very good cache hit ratio if you like that sort of stuff)
and tell you what -- take this example, extrapolate the data upto 1+ million rows and give it a whirl....
Different conclusion...
Kashif, October 06, 2003 - 12:59 pm UTC
Hi Tom,
I happened to revisit this question and tried out the test you had presented in your answer. I wanted to ascertain whether the drop in consistent gets when using the CBO (in your second query) was as a result of the reduced number of full scans of table t2. It turns out, however, that the drop in consistent gets was as a result of the index that was added; table t2 was scanned the same of number of times in both the RBO and CBO environments, per my testing. Here are my results:
/* My script: I did not include the exists and outer join
alternatives to NOT IN */
create table t1 as select * from all_objects where rownum <= 10000;
create table t2 as select * from all_objects where rownum <= 9950;
--create index t2_idx on t2(object_id); --Excluded the index creation for now...
alter session set sql_trace=true;
select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 );
alter session set sql_trace=false;
analyze table t1 compute statistics
for table
for all indexes
for all indexed columns;
analyze table t2 compute statistics
for table
for all indexes
for all indexed columns;
alter session set sql_trace=true;
select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 );
create index t2_idx on t2(object_id); --Added the index over here.
analyze table t2 compute statistics
for table
for all indexes
for all indexed columns;
select count(*)
from t1 cbo2
where object_id not in ( select object_id from t2 );
Results from tkprof:
Without the index, RBO:
select count(*)
from t1 rbo
where object_id not in ( select object_id from t2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 1007332 1492552 212229 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 1007332 1492552 212229 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 TABLE ACCESS FULL T2
Without the index, CBO:
select count(*)
from t1 cbo
where object_id not in ( select object_id from t2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 1492552 212229 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 1492552 212229 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 TABLE ACCESS FULL T2
The consistent gets for the query in RBO and CBO modes is the same without the index, which tells me that the difference in your results was a result of the added index, and NOT because of the number of times t2 is full-scanned is reduced. I also threw the indexed query in at the end (I added the index and then reanalyzed table t2, and ran the query):
select count(*)
from t1 cbo2
where object_id not in ( select object_id from t2 )
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 283 20294 33 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 283 20294 33 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 INDEX RANGE SCAN (object id 298547)
The consistent gets do go down, but the t2 query still gets executed 10,000 times, as proven by this test:
kkashif@DEV> exec dbms_application_info.set_client_info( 0 )
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.40
kkashif@DEV> create or replace function client_info return varchar2
2 as
3 begin
4 dbms_application_info.set_client_info( userenv('client_info')+1 );
5 -- return userenv('client_info');
6 return ('A');
7 end;
8 /
Function created.
Elapsed: 00:00:00.41
kkashif@DEV> ed
Wrote file afiedt.buf
1 select count(*)
2 from t1 cbo3
3* where object_id not in ( select object_id from t2 where client_info = 'A')
kkashif@DEV> /
COUNT(*)
----------
50
Elapsed: 00:00:02.33
kkashif@DEV> select userenv ('client_info') from dual
2 /
USERENV('CLIENT_INFO')
----------------------------------------------------------------
10000
Elapsed: 00:00:00.50
kkashif@DEV>
So the results indicate that the NOT IN clause does in fact execute the same number of times using the RBO and CBO. Am I missing something? Thanks.
Kashif
October 06, 2003 - 1:53 pm UTC
that is NOT a cbo plan you have there -- no cost, no card!
the cbo plan has a hash anti join:
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=161 r=0 w=0 time=146502 us)
50 HASH JOIN ANTI (cr=161 r=0 w=0 time=146422 us)
10000 TABLE ACCESS FULL T1 (cr=138 r=0 w=0 time=16401 us)
9950 INDEX FULL SCAN T2_IDX (cr=23 r=0 w=0 time=16219 us)(object id 34305)
that scans 1 time! you are looking at the wrong plan.
Interesting...
Kashif, October 06, 2003 - 3:08 pm UTC
Hi Tom,
The same query in the same exact environment with the same statistics etc. produces the following plan and statistics, when executed after turning autotrace on instead of using sql_trace:
kkashif@DEV> set autotrace traceonly
kkashif@DEV> select count(*)
2 from t1 cbo5
3 where object_id not in ( select object_id from t2 )
4 /
Elapsed: 00:00:00.21
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=500 Bytes=2500)
4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card=1 Bytes=5)
Statistics
----------------------------------------------------------
0 recursive calls
33 db block gets
20294 consistent gets
283 physical reads
0 redo size
367 bytes sent via SQL*Net to client
430 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
1 rows processed
kkashif@DEV>
As you can see, the query shows the cost, cards etc. in the plan. So I guess now I have different questions:
Why would there be a difference in the plans shown by autotrace and sql_trace? Is there a setting in sql_trace that will cause the cost, cards etc. to show up that I have not set? Thanks.
Kashif
October 06, 2003 - 3:45 pm UTC
are you actually observing two different plans?
you ran this with sql_trace and get a hash anti join and then run with autotrace and get a different plan?
I cannot reproduce that behaviour -- but -- is that what you are seeing?
Kashif, October 06, 2003 - 4:47 pm UTC
Hi Tom,
Actually, I get the same plan in both the autotrace and sql_trace traces, but the autotrace produces a plan with the card and cost information, whereas the sql_trace produces the same plan without the cost and card information:
In SQL_TRACE:
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 FILTER
10001 TABLE ACCESS FULL T1
10000 INDEX RANGE SCAN (object id 298547)
In Autotrace:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=1 Bytes=5)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=500 Bytes=2500)
4 2 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card=1
Bytes=5)
However, I checked the setting on the always_anti_join parameter, and it was set as NESTED_LOOPS. So I changed it to HASH and executed the query again. This time, the plan is similar to your plan (i.e. with the hash anti join), but still no cost/card info in the trace file!
select count(*)
from t1 cbo5
where object_id not in ( select object_id from t2)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 275 341 39 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 275 341 39 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 1885
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
50 HASH JOIN ANTI
10000 TABLE ACCESS FULL T1
9950 VIEW VW_NSO_1
9950 INDEX FAST FULL SCAN (object id 298547)
The autotrace plan though, again, shows the cost and card info:
kkashif@DEV> select count(*)
2 from t1 cbo5
3 where object_id not in ( select object_id from t2 )
4 /
Elapsed: 00:00:00.31
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=114 Card=1 Bytes=18)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (ANTI) (Cost=114 Card=50 Bytes=900.00000000000
1)
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=44 Card=10000 Bytes=
50000)
4 2 VIEW OF 'VW_NSO_1' (Cost=8 Card=9950 Bytes=129350)
5 4 INDEX (FAST FULL SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cos
t=8 Card=9950 Bytes=49750)
Strange... If this is becoming too long, I'll wait for when you are accepting questions and post it as a new question. Thanks!
Kashif
October 06, 2003 - 6:14 pm UTC
thats accurate -- my "wrong" above. the STAT records in the trace file won't have the cost/card -- sorry about that.
only if you use explain=u/p will you get the cost/card in the trace.
In anycase, you discovered "the problem"
Got it.
Kashif, October 07, 2003 - 9:37 am UTC
Hi Tom,
Yep, I now understand why the query was being executed multiple times even in the CBO environment (always_anti_join was set to NESTED_LOOPS). Though I wonder why Oracle set the default value for the always_anti_join parameter to NESTED_LOOPS. Curious to know if you have any explanation/insight. Thanks again for your insight.
Kashif
October 07, 2003 - 9:44 am UTC
in current releases of the software, this is a hidden parameter and has a new value "choose" meaning it can do what it wants.
in older releases, it defaulted to the way it was first implemented for backwards compatibility.
anti join
A reader, March 31, 2004 - 3:03 am UTC
Hi
I have this query to compare table structure differences
select table_name, column_name, data_type, data_length, data_precision
from user_tab_columns a
where column_name not in (select column_name
from user_tab_columns@oe_his b
where a.table_name = b.table_name)
and table_name in (select nombre_tabla
from his_lista_tablas@oe_his)
this shows me the tables in my local database which has more columns than the old tables using db link.
My question is I would like to show columns from user_tab_columns@VAN in my query result as well. Since this is an anti join how can I accomplish this?
March 31, 2004 - 8:38 am UTC
in 9i, you could use a FULL OUTER join.
but, you don't have any versions :( so I cannot tell you what to do....
hi my version is 8.1.7.4
A reader, March 31, 2004 - 9:34 am UTC
hi my version is 8.1.7.4
so we dont have full outer join :-(
March 31, 2004 - 9:47 am UTC
you would have to simiulate then
select ...
from t1, t2
where t1.key = t2.key(+)
UNION ALL
select ....
from t1, t2
where t1.key(+) = t2.key
AND t1.key is NULL
/
that is all the syntatic sugar of a "full outer join" does under the covers...
top half gets everything in T1 and if there is something in T2, gets it. bottom half gets everything in t2 that isn't in T1 since we already got that.
hmm but I am trying to get this output
A reader, April 01, 2004 - 3:15 am UTC
Right now I have this output
TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH
----------------- ------------- ---------- -----------
OM_CUSTOMER_INST COMMENTS VARCHAR2 2000
OM_CUSTOMER_INST COMMENTS VARCHAR2 4000
I would like to have the following output which I dont think can be done using full outer join :-?
TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH TABLE_NAME COLUMN_NAME DATA_TYPE DATA_LENGTH
----------------- ------------- ---------- ----------- ----------------- ------------- ---------- -----------
OM_CUSTOMER_INST COMMENTS VARCHAR2 2000 OM_CUSTOMER_INST COMMENTS VARCHAR2 4000
April 01, 2004 - 10:32 am UTC
a join is what is needed exactly to get that second output.
ops$tkyte@ORA9IR2> create table t1( tname varchar2(5), cname varchar2(5), dtype varchar(10), dlen number );
Table created.
ops$tkyte@ORA9IR2> create table t2( tname varchar2(5), cname varchar2(5), dtype varchar(10), dlen number );
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t1 values ( 'T1', 'C1', 'VARCHAR2', 2000 );
1 row created.
ops$tkyte@ORA9IR2> insert into t1 values ( 'T1', 'C2', 'NUMBER', 38 );
1 row created.
ops$tkyte@ORA9IR2> insert into t2 values ( 'T1', 'C1', 'VARCHAR2', 4000 );
1 row created.
ops$tkyte@ORA9IR2> insert into t2 values ( 'T1', 'C3', 'DATE', null );
1 row created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select *
2 from t1 full outer join t2 on ( t1.tname = t2.tname and t1.cname = t2.cname )
3 /
TNAME CNAME DTYPE DLEN TNAME CNAME DTYPE DLEN
----- ----- ---------- ---------- ----- ----- ---------- ----------
T1 C1 VARCHAR2 2000 T1 C1 VARCHAR2 4000
T1 C2 NUMBER 38
T1 C3 DATE
ops$tkyte@ORA9IR2>
anti join
Venkat, April 19, 2004 - 12:47 am UTC
Tom,
When we tried the example for anit join the way you have shown it, the "consistent gets" has reduced significantly in anti join with cbo whereas the response time has gone up a bit when compared to anti join rbo. Can you please expalin this phenomena?
===========================================================
SQL> create table x as select * from dba_objects where rownum<7001;
Table created.
Elapsed: 00:00:00.47
SQL> create table y as select * from dba_objects where rownum<6951;
Table created.
Elapsed: 00:00:00.47
SQL> alter table y modify(object_id not null);
Table altered.
Elapsed: 00:00:00.18
SQL> create index y_idx on y(object_id);
Index created.
Elapsed: 00:00:00.99
SQL> alter session set optimizer_mode=RULE;
Session altered.
Elapsed: 00:00:00.00
SQL> set autotrace traceonly
SQL> set timing on
SQL> select count(*) from x where object_id not in(select object_id from y);
Elapsed: 00:00:14.97
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=RULE
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'X'
4 2 TABLE ACCESS (FULL) OF 'Y'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
312696 consistent gets
86 physical reads
0 redo size
491 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> alter session set optimizer_mode=CHOOSE;
Session altered.
Elapsed: 00:00:00.00
SQL> analyze table x compute statistics for table for all indexes for all indexed columns;
Table analyzed.
Elapsed: 00:00:00.27
SQL> analyze table y compute statistics for table for all indexes for all indexed columns;
Table analyzed.
Elapsed: 00:00:00.81
SQL> select count(*) from x where object_id not in(select object_id from y);
Elapsed: 00:00:15.41
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=31 Card=1 Bytes=13)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'X' (Cost=15 Card=350 Bytes=455
0)
4 2 INDEX (FULL SCAN) OF 'Y_IDX' (NON-UNIQUE) (Cost=16 Car
d=348 Bytes=1044)
Statistics
----------------------------------------------------------
61 recursive calls
0 db block gets
61740 consistent gets
0 physical reads
0 redo size
491 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
=======================================================
Thanks,
Venkat
April 19, 2004 - 6:39 am UTC
do not measure things with 'set timing on' and consider things a second or two apart "different".
too many things in between you and the timing.
use TKPROF.
set timing on is good to measure things that take minutes (if you use set timing and the difference between approach A and B is say 1.5 minutes, that is meaningful. If the difference is 1.5 seconds -- probably not).
NOT EXISTS
Jennifer Chen, May 08, 2004 - 3:57 pm UTC
Hi Tom,
I need to retrieve data in table alias_snapshot, but not in table iii_snapshot, which means both sid and fbi do not match:
SELECT a.SID chr_sid, a.fbi chr_fbi, a.nam chr_nam,
a.iii_status chr_iii_status, a.sex chr_sex,
a.dob chr_dob, a.rac chr_rac, a.iffs chr_iffs
FROM alias.alias_snapshot a
WHERE NOT EXISTS (
SELECT 1
FROM alias.iii_snapshot iii
WHERE a.SID = iii.SID OR a.fbi = iii.fbi)
Table alias_snapshot contains 1.5 million rows and table iii_snapshot contains roughly 1.2 million rows. This query takes forever. Do you have a better way to retrieve this dataset?
Thanks in advance for your help
May 10, 2004 - 7:20 am UTC
use the CBO and use NOT IN
select ...
from t1
where ( columns ) not in ( select columns from t2 )
brutally efficient and fast.
clarification about anti joins
amit, January 20, 2005 - 10:06 am UTC
1 explain plan for
2 UPDATE /*+ dynamic_sampling(C 10) */ yloci.oci_section C
3 set sect_syllabus_url = null,
4 sect_syllabus_last_dated = sysdate
5 where sect_syllabus_url is not null
6 and sect_term = :b1
7 and (sect_term,crse_subject_id,crse_course_number,nvl(C.sect_syllabus_url,'X')) not in
8* (select /*+ HASH_AJ */ sect_term,crse_subject_id,crse_course_number,nvl(S.crse_syllabus_url,'X') from yloci.oci_syllabus_load S)
9 /
Explained.
SQL> @?/rdbms/admin/utlxpls.sql
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 39 | 1482 | 104K|
| 1 | UPDATE | OCI_SECTION | | | |
|* 2 | FILTER | | | | |
|* 3 | TABLE ACCESS FULL | OCI_SECTION | 39 | 1482 | 30 |
|* 4 | TABLE ACCESS FULL | OCI_SYLLABUS_LOAD | 9 | 549 | 2671 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( NOT EXISTS (SELECT /*+ HASH_AJ */ 0 FROM
"YLOCI"."OCI_SYLLABUS_LOAD" "S" WHERE LNNVL("S"."SECT_TERM"<>:B1) AND
LNNVL("S"."CRSE_SUBJECT_ID"<>:B2) AND LNNVL("S"."CRSE_COURSE_NUMBER"<>:B3)
AND LNNVL(NVL(:B4,'X')<>NVL("S"."CRSE_SYLLABUS_URL",'X'))))
3 - filter("SYS_ALIAS_4"."SECT_SYLLABUS_URL" IS NOT NULL AND
"SYS_ALIAS_4"."SECT_TERM"=:Z)
4 - filter(LNNVL("S"."SECT_TERM"<>:B1) AND
LNNVL("S"."CRSE_SUBJECT_ID"<>:B2) AND LNNVL("S"."CRSE_COURSE_NUMBER"<>:B3)
AND LNNVL(NVL(:B4,'X')<>NVL("S"."CRSE_SYLLABUS_URL",'X')))
Note: cpu costing is off
If I remove the nvl(..) column from the not it condition
1 explain plan for
2 UPDATE /*+ dynamic_sampling(C 10) */ yloci.oci_section C
3 set sect_syllabus_url = null,
4 sect_syllabus_last_dated = sysdate
5 where sect_syllabus_url is not null
6 and sect_term = :b1
7 and (sect_term,crse_subject_id,crse_course_number) not in
8* (select /*+ HASH_AJ */ sect_term,crse_subject_id,crse_course_number from yloci.oci_syllabus_load S)
9 /
Explained.
SQL> @?/rdbms/admin/utlxpls.sql
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 150 | 8250 | 67 |
| 1 | UPDATE | OCI_SECTION | | | |
|* 2 | HASH JOIN ANTI | | 150 | 8250 | 67 |
|* 3 | TABLE ACCESS FULL | OCI_SECTION | 770 | 29260 | 30 |
| 4 | INDEX FAST FULL SCAN| X1_OCI_SYLLABUS_LOAD | 1470K| 23M| 4 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("C"."SECT_TERM"="S"."SECT_TERM" AND
"C"."CRSE_SUBJECT_ID"="S"."CRSE_SUBJECT_ID" AND
"C"."CRSE_COURSE_NUMBER"="S"."CRSE_COURSE_NUMBER")
3 - filter("C"."SECT_SYLLABUS_URL" IS NOT NULL AND "C"."SECT_TERM"=:Z)
Note: cpu costing is off
I thought the nvl thing makes the column not null Hence cbo can still use the ANTI join even with the nvl column.
Where am I going wrong ?
January 20, 2005 - 10:48 am UTC
(select /*+ HASH_AJ */
sect_term,crse_subject_id,crse_course_number from yloci.oci_syllabus_load S)
can that return null values -- you would add
where sect_term is not null and crse_subject_id is not ........
nvl is just a function. a function that could return NULL.
nvl is just a function...
padders, January 20, 2005 - 11:21 am UTC
...or is it?
The 10g optimizer appears to appreciate that NVL with non-null literal equates to NOT NULL.
Personal Oracle Database 10g Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> SET AUTOTRACE ON EXPLAIN;
SQL> SELECT empno
2 FROM emp
3 WHERE ename NOT IN (
4 SELECT ename
5 FROM emp);
no rows selected
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=24 Card=13 Bytes=104)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=112)
3 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=1 Bytes=5)
SQL> SELECT empno
2 FROM emp
3 WHERE NVL (ename, 'X') NOT IN (
4 SELECT NVL (ename, 'X')
5 FROM emp);
no rows selected
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7 Card=1 Bytes=13)
1 0 HASH JOIN (ANTI) (Cost=7 Card=1 Bytes=13)
2 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=112)
3 1 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=70)
SQL>
January 20, 2005 - 7:13 pm UTC
but don't count on it. can you think of a case whereby the above would not hold true? :)
more clarification
amit poddar, January 20, 2005 - 1:53 pm UTC
sect_term,crse_subject_id,crse_course_number,nvl(S.crse_syllabus_url,'X')
sect_term,crse_subject_id,crse_course_number
these three columns are defined as not null in the table.
The only column nullable is crse_syllabus_url which I thought I can make non nullable by using nvl.
So are you saying in this case anti join cannot be used since nvl is not sufficeint to make this column not null.
January 20, 2005 - 7:33 pm UTC
actually, it looks like it should be able to work:
scott@ORA10G> set autotrace on
scott@ORA10G> update(
2 select /*+ ALL_ROWS */ *
3 FROM emp
4 WHERE (NVL (ename, 'X'), empno ) NOT IN (
5 SELECT NVL (ename, 'X'), empno
6 FROM emp))
7 set empno = 55;
0 rows updated.
Execution Plan
----------------------------------------------------------
0 UPDATE STATEMENT Optimizer=HINT: ALL_ROWS (Cost=7 Card=1 Bytes=20)
1 0 UPDATE OF 'EMP'
2 1 HASH JOIN (ANTI) (Cost=7 Card=1 Bytes=20)
3 2 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=140)
4 2 TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=3 Card=14 Bytes=140)
could be that since, well -- a hint is a hint...... it is doing the right thing.
do you have a fully reproducible testcase -- the entire thing, we can comment more fully with that. something we can run to get the same result.
Elaborate?
A reader, January 20, 2005 - 9:20 pm UTC
Could you elaborate your comment a couple posts above:
"but don't count on it. can you think of a case whereby the above would not hold true? :)"
When would NVL( SCHEMA.TABLE.COLUMN, NON-NULL LITERAL ) not be treated as a NOT NULL? (way to many negative things in that sentence)
January 20, 2005 - 11:11 pm UTC
I'm asking you to guess :) think about it -- what is a feature of the database since 8i that I'm not fond of that might change all of this.....
example as you said.
amit poddar, January 20, 2005 - 10:37 pm UTC
CREATE TABLE OCI_SECTION
( SECT_TERM VARCHAR2(6) NOT NULL,
CRSE_SUBJECT_ID VARCHAR2(4) NOT NULL,
CRSE_COURSE_NUMBER VARCHAR2(5) NOT NULL,
SECT_SYLLABUS_URL VARCHAR2(100),
SECT_SYLLABUS_LAST_DATED DATE NOT NULL
);
insert into oci_section values ('200501','CHEM','114','htt://classes.yale.edu/syllabus/chem114b/
',to_date('01/08/2005 18:06:48','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','HSHM','217','htt://classes.yale.edu/syllabus/hshm217b/
',to_date('01/08/2005 21:07:04','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','CPSC','478','htt://classes.yale.edu/syllabus/cpsc478b/
',to_date('01/08/2005 21:07:04','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','SPAN','224','htt://classes.yale.edu/syllabus/span224-1
b/',to_date('01/08/2005 23:07:49','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','PSYC','165','htt://classes.yale.edu/syllabus/psyc165b/
',to_date('01/13/2005 14:10:13','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','AFST','603','htt://classes.yale.edu/syllabus/afst603b/
',to_date('01/08/2005 18:06:48','mm/dd/yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','235',null,to_date('01/07/2005 15:11:31','mm/dd/
yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','265',null,to_date('01/07/2005 15:11:31','mm/dd/
yyyy hh24:mi:ss'));
insert into oci_section values ('200501','JDST','270','htt://classes.yale.edu/syllabus/jdst270b/
',to_date('01/08/2005 09:12:43','mm/dd/yyyy hh24:mi:ss'));
begin
dbms_stats.gather_table_stats(OWNNAME=>'OBJ_MIG', TABNAME=>'OCI_SECTION');
end;
/
CREATE TABLE OCI_SYLLABUS_LOAD
( SECT_TERM VARCHAR2(6) NOT NULL ,
CRSE_SUBJECT_ID VARCHAR2(4) NOT NULL ,
CRSE_COURSE_NUMBER VARCHAR2(5) NOT NULL ,
CRSE_SECTION VARCHAR2(3),
CRSE_SYLLABUS_URL VARCHAR2(100),
LAST_UPDATED DATE
);
insert into oci_syllabus_load values('200403','WGSS','371','01','htt://classes.yale.edu/syllabus
/wgss371a/',null);
insert into oci_syllabus_load values('200403','WGSS','446','01','htt://classes.yale.edu/syllabus
/wgss446a/',null);
insert into oci_syllabus_load values('200403','WGSS','746','01','htt://classes.yale.edu/syllabus
/wgss746a/',null);
insert into oci_syllabus_load values('200403','WGST','745','01','htt://classes.yale.edu/syllabus
/wgst745a/',null);
insert into oci_syllabus_load values('200403','SPAN','938','01','htt://classes.yale.edu/syllabus
/span938a/',null);
insert into oci_syllabus_load values('200403','STAT','100','01','htt://classes.yale.edu/syllabus
/stat100a/',null);
insert into oci_syllabus_load values('200403','STAT','101','01','htt://classes.yale.edu/syllabus
/stat101a/',null);
insert into oci_syllabus_load values('200403','STAT','102','01','htt://classes.yale.edu/syllabus
/stat102a/',null);
insert into oci_syllabus_load values('200403','STAT','103','01','htt://classes.yale.edu/syllabus
/stat103a/',null);
begin
dbms_stats.gather_table_stats(OWNNAME=>'OBJ_MIG', TABNAME=>'OCI_SYLLABUS_LOAD');
end;
/
explain plan for
UPDATE oci_section C
set sect_syllabus_url = null,
sect_syllabus_last_dated = sysdate
where sect_syllabus_url is not null
and sect_term = :b1
and (sect_term,
crse_subject_id,
crse_course_number,
nvl(C.sect_syllabus_url,'X')
) not in
(select sect_term,
crse_subject_id,
crse_course_number,
nvl(S.crse_syllabus_url,'X')
from oci_syllabus_load S);
select plan_table_output from table(dbms_xplan.display());
January 20, 2005 - 11:51 pm UTC
it 10g, it does anti join, in 9i -- nvl is just a function.
Unless
padders, January 21, 2005 - 6:34 am UTC
Unless (as Tom hints at above) you happen to be using this feature...
Personal Oracle Database 10g Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> ALTER SESSION SET cursor_sharing = FORCE;
Session altered.
SQL> SELECT empno
2 FROM emp
3 WHERE NVL (ename, 'X') NOT IN (
4 SELECT NVL (ename, 'X')
5 FROM emp);
no rows selected
SQL> SELECT plan_table_output
2 FROM TABLE (DBMS_XPLAN.DISPLAY_CURSOR ((
3 SELECT sql_id
4 FROM v$sql
5 WHERE sql_text LIKE LOWER ('%EMPNO%EMP%ENAME%'))));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID f6547mcu7944m, child number 0
-------------------------------------
SELECT empno FROM emp WHERE NVL (ename, :"SYS_B_0") NOT IN (
SELECT NVL (ename, :"SYS_B_1") FROM emp)
Plan hash value: 84080821
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 24 (100)| |
--------------------------------------------------------------------------------
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| EMP | 14 | 112 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| EMP | 1 | 5 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NULL)
3 - filter(LNNVL(NVL(:B1,:SYS_B_0)<>NVL("ENAME",:SYS_B_1)))
22 rows selected.
SQL>
January 21, 2005 - 8:28 am UTC
precisely :)
I'm not sure that's the same
A reader, January 21, 2005 - 7:18 am UTC
I thought the NVL( SOMETHING, LITERAL ) was in reference to what the optimizer saw? With cursor_sharing = force, the optimizer will never see literals.
Regardless, is there ever a time NVL( SOMETHING, LITERAL ) would not be considered NOT NULL if that's what the optimizer saw in the query?
January 21, 2005 - 8:39 am UTC
in 9i, apparently so -- it is not considered "not null". you would outer join instead, keeping rows where the outer joined to table is NULL -- mimicking the anti-join.
always_anti_join parameter
Tom, January 21, 2005 - 7:29 am UTC
You mentioned in older releases always_anti_join=NESTED_LOOPS by default. Is there any way in 8.1.7 to "remove" this setting, so that the optimizer will choose HASH or NESTED_LOOPS depending on which it thinks is better?
If not, how do we decide which value is best for this setting in an OLTP environment? The hash anti join is excellent for large result sets but my feeling [non-scientific currently] is that nested_loops will be better for first rows type optimization.
Suggestions.
January 21, 2005 - 8:43 am UTC
suggestion: leave the setting as it is.
it is one of three values in 8i, only.
using outer join for mimiking anti join
amit poddar, January 21, 2005 - 1:02 pm UTC
Hi,
" you would outer join instead, keeping rows where the outer joined to table is NULL -- mimicking the anti-join. "
How would the above apply to
UPDATE oci_section C
set sect_syllabus_url = null,
sect_syllabus_last_dated = sysdate
where sect_syllabus_url is not null
and sect_term = :b1
and (sect_term,
crse_subject_id,
crse_course_number,
nvl(C.sect_syllabus_url,'X')
) not in
(select sect_term,
crse_subject_id,
crse_course_number,
nvl(S.crse_syllabus_url,'X')
from oci_syllabus_load S);
unless you are updating the join.
But to update the join
we would need a unique index on
oci_syllabus_load
(sect_term,crse_subject_id,crse_course_number,nvl(crse_syllabus_url,'X'))
correct ?
January 21, 2005 - 6:48 pm UTC
actually, it would not work -- because of the lack of a unique constraint or primary key on "S"
so, you would be using not exists here most likely for the update.
using outer join for mimiking anti join
amit poddar, January 21, 2005 - 1:27 pm UTC
"/mike_specific/ora_admin/generic/scripts/afiedt.buf" 14 lines, 537 characters
1 UPDATE (select c.sect_syllabus_url,
2 c.sect_syllabus_last_dated,
3 c.sect_term
4 from oci_section c,
5 oci_syllabus_load s
6 where c.sect_term(+) = s.sect_term and
7 c.crse_subject_id(+) = s.crse_subject_id and
8 c.crse_course_number(+) = s.crse_course_number and
9 nvl(c.sect_syllabus_url,'X') (+) = nvl(s.crse_syllabus_url,'X') and
10 c.sect_term is null
11 )
12 set sect_syllabus_url=null,
13* sect_syllabus_last_dated=sysdate
SQL> /
nvl(c.sect_syllabus_url,'X') (+) = nvl(s.crse_syllabus_url,'X') and
*
ERROR at line 9:
ORA-00936: missing expression
SQL>
It seems I can't outer join on a nvl
So how would I write this update join using the outer join method you mentioned.
January 21, 2005 - 7:37 pm UTC
the (+) would go with the column -- inside of the function, but you'll get "not a key preserved table" so it won't actually work with the update.
NOT IN is brutally slow. HELP!!!!
A reader, June 06, 2007 - 3:05 pm UTC
Hi Tom,
Could you please help me? For some reason NOT IN is brutally slow for me. Can you please suggest what could be wrong? This is **very** crucial for us since we use NOT IN all over the place. We are using oracle version 9.2.0.8.
The below select stmt did not return even after 45 minutes!
Dev_id is indexed in request table and is the PK in device table. The stats are collected.
select dev_id from request where dev_id not in (select dev_id from device);
set autotrace traceonly explain
select dev_id from request where dev_id not in (select dev_id from device);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6709 Card=1 Bytes=6)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'REQUEST' (Cost=2517 Card=38939
Bytes=233634)
4 2 INDEX (FULL SCAN) OF 'PK_DEVICE' (UNIQUE) (Cost=4192 C
ard=79074 Bytes=474444)
set autotrace off
set timin on
SQL> select table_name,num_rows,last_analyzed from user_tables where table_name in ('REQUEST','DEVICE')
TABLE_NAME NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
DEVICE 1581482 25-MAR-07
REQUEST 778788 24-MAR-07
SQL> select count(*) from device;
COUNT(*)
----------
1624924
Elapsed: 00:00:01.99
SQL> select count(*) from request;
COUNT(*)
----------
806131
Elapsed: 00:00:01.46
SQL> select index_name from user_ind_columns where table_name='REQUEST' and column_name='DEV_ID';
INDEX_NAME
-------------
DUP_REQUEST_2
Thanks so much in advance!
June 06, 2007 - 9:29 pm UTC
no creates
no look
i want evidence that device in the subquery is NOT NULL
Found a solution - sort of
A reader, June 06, 2007 - 6:26 pm UTC
Hi Tom,
I was able to at least make the query return by adding a NOT NULL clause. This time it used anti-join?? rather than FTS.
select dev_id from request where
dev_id is not null and dev_id not in (select dev_id from device)
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2429 Card=1 Bytes=12
)
1 0 NESTED LOOPS (ANTI) (Cost=2429 Card=1 Bytes=12)
2 1 INDEX (FAST FULL SCAN) OF 'DUP_REQUEST_2' (NON-UNIQUE) (
Cost=178 Card=778788 Bytes=4672728)
3 1 INDEX (UNIQUE SCAN) OF 'PK_DEVICE' (UNIQUE) (Cost=1 Card
=1581482 Bytes=9488892)
no rows selected
Elapsed: 00:00:15.11
So I have 2 questions for you:
1) why did I have to add the NOT NULL clause? Is this something I can instruct Oracle to handle automatically?
2) 15 secs is still too long. How can I further reduce this?
Thanks again.
A reader, June 07, 2007 - 5:00 am UTC
Tom,
Replying to the original post you had said there is a difference in NOT IN and NOT EXISTS with the following example:
that is a tie -- but NOTE that this is NOT the same as "not in". To see why figure out
why this two queries return different answers:
scott@ORA920> select count(*) from emp where empno not in ( select mgr from emp );
COUNT(*)
----------
0
scott@ORA920> select count(*) from emp
2 where not exists ( select null from emp e2 where e2.mgr = emp.empno );
COUNT(*)
----------
8
and think about NULLS..
Now,
I assumed it was because of the null value in the MGR column:
select count(*) from emp
where empno not in (select nvl(mgr,-01) from emp)
/
8 rows selected
Now,
I assumed Oracle would change the NOT IN to something like `NOT this OR NOT this¿. Tried with a simplistic example
1 select *
2 from dual
3* where 1 <> 2 or 1 <> null
z001698@DEVELOP > /
D
-
X
But, it doesn¿t work like the above query:
1 select *
2 from dual
3* where 1 not in (2,null)
z001698@DEVELOP > /
no rows selected
The question is
1) The NOT IN does not work like NOT EXISTS because the NULL value is indeterminate?
2) The NOT EXISTS work because its not allowed to do NULL = something even NULL=NULL is the row being not selected?
3) Why does the IN or NOT IN work like (`This OR This¿)
Thanks
Ravi
June 07, 2007 - 2:46 pm UTC
1) correct
2) correct
3) because ANSI said it should. It is in the definition of the language itself.
A reader, June 08, 2007 - 5:45 am UTC
x in ('A', null) --> (x = 'A') OR (x = null)
x not in ('A', null) --> (x <> 'A') AND (x <> null)
logic rules are
NOT (a OR b) = (NOT A) AND (NOT B)
^^^
So it will be AND instead of OR ...
A reader, September 30, 2007 - 11:17 pm UTC
Hi Tom,
I need to optimize a SQL query which does an anti-join on one table.
select A from table_t
where A not in
(select distinct A from table_t
where B in ('ABC','DEF') and C ='LS')
I read about using NOT EXISTS in place of NOT IN but i can't seem to implement it into my query as the examples shown uses anti-join on 2 tables.
Is there a way to optimize an anti-join on one table? Will appreciate it if you can show me a link or so.
Thanks a lot!
October 03, 2007 - 2:29 pm UTC
not in and not exists are not equivalent in general - but when they are - then the optimizer knows both are possible and costs them both out.
eg: if you can replace the above with not exists, the optimizer probably already has.
Now, the distinct - not necessary (and the optimizer knows that and has discarded it for you)
If A is NULLABLE in table_t - bear in mind that not in works differently than not exists:
Table created.
ops$tkyte%ORA10GR2> insert into t1 values ( 1, 'xyz', 'xx' );
1 row created.
ops$tkyte%ORA10GR2> insert into t1 values ( 2, 'xyz', 'xx' );
1 row created.
ops$tkyte%ORA10GR2> insert into t1 values ( 2, 'ABC', 'LS' );
1 row created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
2 from t1
3 where a not in (select a from t1 where b in ('ABC','DEF') and c = 'LS' );
A
----------
1
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
2 from t1 x
3 where NOT EXISTS (select NULL from t1 y where y.a = x.a and y.b in ('ABC','DEF') and y.c = 'LS' );
A
----------
1
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values ( null, 'ABC', 'LS' );
1 row created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
2 from t1
3 where a not in (select a from t1 where b in ('ABC','DEF') and c = 'LS' );
no rows selected
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select a
2 from t1 x
3 where NOT EXISTS (select NULL from t1 y where y.a = x.a and y.b in ('ABC','DEF') and y.c = 'LS' );
A
----------
1
You might mean to add "and a is not null" to the subquery there so that
a) we can use an anti join, if A is nullable
b) you do not get the empty set as a result if A does become NULL for some value, which is probably not what you want.
Why not hash -aj used?
jian huang zheng, October 11, 2008 - 9:05 am UTC
Hello Tom:
SQL> show parameter hash
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
hash_area_size integer 1048576
hash_join_enabled boolean TRUE
SQL> show parameter workare
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
workarea_size_policy string AUTO
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ---------------------- ---------------------------
pga_aggregate_target big integer 104857600
SQL> select * from v$version;
BANNER
-----------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
I have two tables:
1. CREATE TABLE "MEDISTORE"."IMAGE_FULL_DIR"
( "IMAGE_NAME" VARCHAR2(25),
"IMAGE_FULL_DIR" VARCHAR2(255),
CONSTRAINT "IMAGE_NT" CHECK ( image_full_dir is not null) ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "RENJI"
2.CREATE TABLE "MEDISTORE"."IMAGE_UID_IMG"
( "IMAGE_NAME" VARCHAR2(30),
CONSTRAINT "IMAGE_N" CHECK ( image_name is not null) ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "RENJI"
3.SQL> select count(*) from image_full_dir;
COUNT(*)
----------
7287429
4.SQL> select count(*) from image_uid_img;
COUNT(*)
----------
20023885
5.SQL> exec print_table('select * from user_ind_columns where table_name=''IMAGE_FULL_DIR''');
INDEX_NAME : IMAGE_UQ
TABLE_NAME : IMAGE_FULL_DIR
COLUMN_NAME : IMAGE_NAME
COLUMN_POSITION : 1
COLUMN_LENGTH : 25
CHAR_LENGTH : 25
DESCEND : ASC
6SQL> EXEC dbms_stats.gather_table_stats(user,'IMAGE_FULL_DIR',cascade=>true);
PL/SQL procedure successfully completed.
SQL> EXEC dbms_stats.gather_table_stats(user,'IMAGE_UID_IMG',cascade=>true);
PL/SQL procedure successfully completed
SQL> exec print_table('select * from user_tables where table_name=''IMAGE_UID_IMG''');
TABLE_NAME : IMAGE_UID_IMG
TABLESPACE_NAME : RENJI
CLUSTER_NAME :
IOT_NAME :
PCT_FREE : 10
PCT_USED : 40
INI_TRANS : 1
MAX_TRANS : 255
INITIAL_EXTENT : 65536
NEXT_EXTENT :
MIN_EXTENTS : 1
MAX_EXTENTS : 2147483645
PCT_INCREASE :
FREELISTS : 1
FREELIST_GROUPS : 1
LOGGING : NO
BACKED_UP : N
NUM_ROWS : 20023885
BLOCKS : 48321
EMPTY_BLOCKS : 0
AVG_SPACE : 0
CHAIN_CNT : 0
AVG_ROW_LEN : 12
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 20023885
LAST_ANALYZED : 2008-10-11 20:33:19
PARTITIONED : NO
IOT_TYPE :
TEMPORARY : N
SECONDARY : N
NESTED : NO
BUFFER_POOL : DEFAULT
ROW_MOVEMENT : DISABLED
GLOBAL_STATS : YES
USER_STATS : NO
DURATION :
SKIP_CORRUPT : DISABLED
MONITORING : NO
CLUSTER_OWNER :
DEPENDENCIES : DISABLED
-----------------
PL/SQL procedure successfully completed.
SQL>
SQL> exec print_table('select * from user_tables where table_name=''IMAGE_FULL_DIR''');
TABLE_NAME : IMAGE_FULL_DIR
TABLESPACE_NAME : RENJI
CLUSTER_NAME :
IOT_NAME :
PCT_FREE : 10
PCT_USED : 40
INI_TRANS : 1
MAX_TRANS : 255
INITIAL_EXTENT : 65536
NEXT_EXTENT :
MIN_EXTENTS : 1
MAX_EXTENTS : 2147483645
PCT_INCREASE :
FREELISTS : 1
FREELIST_GROUPS : 1
LOGGING : NO
BACKED_UP : N
NUM_ROWS : 7287429
BLOCKS : 77993
EMPTY_BLOCKS : 0
AVG_SPACE : 0
CHAIN_CNT : 0
AVG_ROW_LEN : 72
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 7287429
LAST_ANALYZED : 2008-10-11 20:31:09
PARTITIONED : NO
IOT_TYPE :
TEMPORARY : N
SECONDARY : N
NESTED : NO
BUFFER_POOL : DEFAULT
ROW_MOVEMENT : DISABLED
GLOBAL_STATS : YES
USER_STATS : NO
DURATION :
SKIP_CORRUPT : DISABLED
MONITORING : NO
CLUSTER_OWNER :
DEPENDENCIES : DISABLED
-----------------
PL/SQL procedure successfully completed.
Now I run this query :
select image_full_dir from image_full_dir where image_name not in ( select image_name from image_uid_img );
it takes forever to run,v$session_longops says 12millions seconds to finish. explain plan says:
----------------------------------------------------------------------------
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 364K| 25M| 1693M|
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL | IMAGE_FULL_DIR | 364K| 25M| 7503 |
|* 3 | TABLE ACCESS FULL | IMAGE_UID_IMG | 1002K| 14M| 4649 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "IMAGE_UID_IM
G" "IMAGE_UID_IMG" WHERE LNNVL("IMAGE_UID_IMG"."IMAGE_N
AME"<>:B1)))
3 - filter(LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B1))
Note: cpu costing is off
19 rows selected.
From the v$session_longops , it is in accordance with the actual running.
But if I rewrote sql like:
select image_full_dir from image_full_dir,image_uid_img where image_full_dir.image_name=image_uid_img.image_name(+) and image_uid_img.image_name is null;
it just takes minutes to finish and give back the results, from explain plan i see hash outerjoin is involved:
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20M| 1663M| 55577 |
|* 1 | FILTER | | | | |
|* 2 | HASH JOIN OUTER | | | | |
| 3 | TABLE ACCESS FULL | IMAGE_FULL_DIR | 7287K| 500M| 7503 |
| 4 | TABLE ACCESS FULL | IMAGE_UID_IMG | 20M| 286M| 4649 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("IMAGE_UID_IMG"."IMAGE_NAME" IS NULL)
2 - access("IMAGE_FULL_DIR"."IMAGE_NAME"="IMAGE_UID_IMG"."IMAGE_NA
ME"(+))
Note: cpu costing is off
the image_name in this two tables are all unique.
I would expect the first query using hash-aj,but why oracle choose to full scan table? And if you rewrite the sql , why oracle choose the different plan?
It puzzles me so much, and I most am appreciated if you can provide some clue about them.
October 13, 2008 - 2:50 am UTC
CREATE TABLE "MEDISTORE"."IMAGE_UID_IMG"
( "IMAGE_NAME" VARCHAR2(30),
CONSTRAINT "IMAGE_N" CHECK ( image_name is not null)
why didn't you just use NOT NULL?
we don't know it is not null for real - so we are doing processing as if it were NULLABLE.
still the same plan,after change the not null?
jian huang zheng, October 13, 2008 - 11:41 am UTC
Hello Tom
Thanks for your advice. but after I do this:
SQL> select * from user_constraints where table_name='IMAGE_UID_IMG';
OWNER CONSTRAINT_NAME CO TABLE_NAME SEARCH_CONDITION
------------------------------------------------------------ ------------------------------------------------------------ -- ------------------------------------------------------------ -------------------------------------
MEDISTORE IMAGE_N C IMAGE_UID_IMG image_name is not null
SQL> alter table image_uid_img drop constraints image_n;
Table altered.
SQL> alter table image_uid_img modify image_name not null;
Table altered.
SQL> explain plan for select image_full_dir from image_full_dir where image_name not in ( select image_name from image_
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 364K| 25M| 1693M|
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL | IMAGE_FULL_DIR | 364K| 25M| 7503 |
|* 3 | TABLE ACCESS FULL | IMAGE_UID_IMG | 1001K| 11M| 4649 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "IMAGE_UID_IMG" "IMAG
E_UID_IMG" WHERE LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B
1)))
3 - filter(LNNVL("IMAGE_UID_IMG"."IMAGE_NAME"<>:B1))
Note: cpu costing is off
19 rows selected.
SQL>
Still the same plan, anything that I miss?
Thanks~!
October 14, 2008 - 5:00 pm UTC
full examples from start to finish always please, sort of like this:
ops$tkyte%ORA11GR1> create table t1 ( x int, y int, z int );
Table created.
ops$tkyte%ORA11GR1> exec dbms_stats.set_table_stats( user, 'T1', numrows=>1000000, numblks => 100000 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR1> create table t2 ( a int NOT NULL, b int, c int );
Table created.
ops$tkyte%ORA11GR1> exec dbms_stats.set_table_stats( user, 'T2', numrows=>1000000, numblks => 100000 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> @at
ops$tkyte%ORA11GR1> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA11GR1> set autotrace traceonly explain
ops$tkyte%ORA11GR1> select * from t1 where x not in ( select a from t2 );
Execution Plan
----------------------------------------------------------
Plan hash value: 3756544281
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 52 | | 57905
|* 1 | HASH JOIN RIGHT ANTI SNA| | 1 | 52 | 23M| 57905
| 2 | TABLE ACCESS FULL | T2 | 1000K| 12M| | 27142
| 3 | TABLE ACCESS FULL | T1 | 1000K| 37M| | 27145
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("X"="A")
ops$tkyte%ORA11GR1> set autotrace off
I cant simulate your results , very different to understand?
jian huang zheng, October 15, 2008 - 11:41 am UTC
Hello Tom
Thanks for your answer. I just copy and paste your example as :
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> create table t1 ( x int, y int, z int );
Table created.
SQL> exec dbms_stats.set_table_stats(user,'T1',numrows=>1000000,numblks=>100000);
PL/SQL procedure successfully completed.
SQL> create table t2 ( a int NOT NULL, b int, c int );
Table created.
SQL>
SQL> exec dbms_stats.set_table_stats(user,'T2',numrows=>1000000,numblks=>100000);
PL/SQL procedure successfully completed.
SQL> explain plan for select * from t1 where x not in ( select a from t2 );
Explained.
SQL>
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50000 | 1904K| 480M|
|* 1 | FILTER | | | | |
| 2 | TABLE ACCESS FULL | T1 | 50000 | 1904K| 9619 |
|* 3 | TABLE ACCESS FULL | T2 | 50000 | 634K| 9619 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "T2" "T2" WHERE L
NNVL("T2"."A"<>:B1)))
3 - filter(LNNVL("T2"."A"<>:B1))
Note: cpu costing is off
18 rows selected.
by the way,I dont know if those are needed:
SQL> Show parameter hash
NAME TYPE VALUE
------------------------------------ ---------------------- -----------
hash_area_size integer 1048576
hash_join_enabled boolean TRUE
SQL> show parameter opti
NAME TYPE VALUE
------------------------------------ ---------------------- ----------
filesystemio_options string
object_cache_optimal_size integer 102400
optimizer_dynamic_sampling integer 1
optimizer_features_enable string 9.2.0
optimizer_index_caching integer 0
optimizer_index_cost_adj integer 100
optimizer_max_permutations integer 2000
optimizer_mode string ALL_ROWS
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ---------------------- ----------
pga_aggregate_target big integer 314572800
SQL> show parameter work
NAME TYPE VALUE
------------------------------------ ---------------------- ----------
workarea_size_policy string AUTO
Thanks,it seems to me that hash-aj is some sort of disabled? Could you please shed some lihgt on it?
October 15, 2008 - 5:48 pm UTC
yes, looks like 9i did not have that feature in the optimizer at that point in time.
OK,THANKS
A reader, October 16, 2008 - 7:45 am UTC
Thanks,I got it. Much appreciated your time..
A reader, June 09, 2009 - 6:16 pm UTC
Hey Tom,
I recently came across something strange while toying with rewriting a query with a "NOT IN".
Basically I have around 1.5 M transactions, each of which has exactly one row in TRANS_HEADER. Each transactions consists of 0..N rows in TRANS_ROW. There's a total of around 20 M transaction rows. I wanted to find out all the transactions with no transaction rows associated with them.
I went ahead and wrote the simple query to return all such transactions but noticed the plan wasn't exactly something I'd like. So I reformulated my search as "return all transactions which are not in the group of transactions that DO have rows". To my surprise the second query actually runs in less than a minute (a reasonable time) and returns 40 rows (no reason to think that wouldn't be "correct").
I don't quite understand why the plans are so vastly different and why the more complex query is faster. Are my two queries not logically equivalent after all?
EXPLAIN PLAN FOR
SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
SELECT TRANS_ID FROM TRANS_ROW);
Explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 610510780
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1616K| 9469K| | 171G (2)|999:59:59 |
| 1 | HASH UNIQUE | | 1616K| 9469K| 37M| 171G (2)|999:59:59 |
|* 2 | FILTER | | | | | | |
| 3 | INDEX FAST FULL SCAN| TRANS_HEADER_PK | 1635K| 9584K| | 673 (2)| 00:00:09 |
|* 4 | TABLE ACCESS FULL | TRANS_ROW | 41 | 246 | | 105K (2)| 00:21:03 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TRANS_ROW" "TRANS_ROW"
WHERE LNNVL("TRANS_ID"<>:B1)))
4 - filter(LNNVL("TRANS_ID"<>:B1))
18 rows selected.
EXPLAIN PLAN FOR
SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
SELECT ID FROM TRANS_HEADER WHERE ID IN (
SELECT TRANS_ID FROM TRANS_ROW));
Explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1670912697
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | 197K| 3657K| | 27790 (2)| 00:05:34 |
|* 1 | HASH JOIN ANTI | 197K| 3657K| 28M| 27790 (2)| 00:05:34 |
| 2 | INDEX FAST FULL SCAN | TRANS_HEADER_PK | 1635K| 9584K| | 671 (2)| 00:00:09 |
| 3 | VIEW | VW_NSO_1 | 20M| 259M| | 740 (11)| 00:00:09 |
| 4 | NESTED LOOPS | | 20M| 239M| | 740 (11)| 00:00:09 |
| 5 | INDEX FAST FULL SCAN| TRANS_HEADER_PK | 1635K| 9584K| | 671 (2)| 00:00:09 |
|* 6 | INDEX RANGE SCAN | TRANSROW_TRANS_INDX | 13 | 78 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"="$nso_col_1")
6 - access("ID"="TRANS_ID")
19 rows selected.
June 10, 2009 - 11:45 am UTC
first, proof that they are not semantically equivalent - I can get your two queries to return different results. I have to assume that TRANS_ROW allows for TRANS_ID to be nullable:
ops$tkyte%ORA10GR2> create table trans_header( id number primary key, data char(80) );
Table created.
ops$tkyte%ORA10GR2> create table trans_row( trans_id number );
Table created.
ops$tkyte%ORA10GR2> insert into trans_header values ( 1, 'x' );
1 row created.
ops$tkyte%ORA10GR2> insert into trans_header values ( 2, 'y' );
1 row created.
ops$tkyte%ORA10GR2> insert into trans_row values ( null );
1 row created.
ops$tkyte%ORA10GR2> insert into trans_row values ( 2 );
1 row created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
2 SELECT TRANS_ID FROM TRANS_ROW);
no rows selected
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN (
2 SELECT ID FROM TRANS_HEADER WHERE ID IN (
3 SELECT TRANS_ID FROM TRANS_ROW));
ID
----------
1
so, they are not the same - and the reason that the first query performs "slowly" for you is the nullable quality of the trans_id column, consider:
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TRANS_HEADER', numrows => 1635000 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'TRANS_ROW', numrows => 41 );
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set serveroutput off
ops$tkyte%ORA10GR2> delete from plan_table;
4 rows deleted.
ops$tkyte%ORA10GR2> explain plan for
2 SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN ( SELECT TRANS_ID FROM TRANS_ROW);
Explained.
ops$tkyte%ORA10GR2> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 457573287
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1634K| 20M| 2982K (1)| 06:06:01 |
|* 1 | FILTER | | | | | |
| 2 | INDEX FAST FULL SCAN| SYS_C0038082 | 1635K| 20M| 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | TRANS_ROW | 2 | 26 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TRANS_ROW" "TRANS_ROW" WHERE
LNNVL("TRANS_ID"<>:B1)))
3 - filter(LNNVL("TRANS_ID"<>:B1))
17 rows selected.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> delete from plan_table;
4 rows deleted.
ops$tkyte%ORA10GR2> explain plan for
2 SELECT DISTINCT ID FROM TRANS_HEADER WHERE ID NOT IN ( SELECT TRANS_ID FROM TRANS_ROW where trans_id is not null);
Explained.
ops$tkyte%ORA10GR2> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3077635579
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1515K| 37M| 26 (58)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI | | 1515K| 37M| 26 (58)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | TRANS_ROW | 2 | 26 | 8 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| SYS_C0038082 | 1635K| 20M| 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"="TRANS_ID")
2 - filter("TRANS_ID" IS NOT NULL)
16 rows selected.
add the "is not null" to the subquery and we can use the hash anti join (fast) rather than a filter - the nullness changes the meaning of NOT IN
where x not in (any set that contains NULL) <<<== never true, never false, rather it is UNKNOWN
where x not in (any set of NOT NULL values) <<<=== always true OR false assuming X itself is NOT NULL
You allowed for the anti join to be used since ID is probably NOT NULL in your trans_header table so using it as the target of a NOT IN allowed it to be 'safe' to anti join with.