Thanks for the question regarding "IN & EXISTS", version 8.1.6
Seetharaman Srinivasan, March     31, 2001 - 7:29 pm UTC
 
 
Excellent 
 
 
Thanks for the question regarding "IN & EXISTS", version 8.1.6 
Mani Srinivasan, April     11, 2001 - 3:00 pm UTC
 
 
This is good stuff.... 
 
 
Helena Markova, May       18, 2001 - 2:13 am UTC
 
 
  
 
Could not Call Report from Form (Menu)
Mohd. Manik Miah, May       28, 2001 - 1:45 am UTC
 
 
Dear Sir
I learned oracle a few months ago. 
But I could not call report from Form (Menu). 
Would you please help me in this regard?
Thank & best regards
Manik Miah 
manikmiah@yahoo.com 
 
 
Instead of T, why cant you use EMP AND DEPT TABLES
A reader, September 28, 2001 - 4:31 pm UTC
 
 
Tom
Can you illustrate your answer by using the emp dept table.
and by 
1.increasing and decreasing the number of rows in both the tables.
2.by dropping and adding indexes on both the tables
That way your explanation will be vindicated.
Can you do that please. 
 
September 29, 2001 - 2:43 pm UTC 
 
 
Well -- I'm not going to use EMP and DEPT as I would  have to generate tons of EMP and DEPT data to illustrate my point (feel free to do that if you want to ;)
I'll use BIG and SMALL to make the point. I ran:
rem create table big as select * from all_objects;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem create index big_idx on big(object_id);
rem
rem
rem create table small as select * from all_objects where rownum < 100;
rem create index small_idx on small(object_id);
rem
rem analyze table big compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
rem analyze table small compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
so, small has 99 rows, big has 133,000+
select count(subobject_name)
  from big
 where object_id in ( select object_id from small )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.02          0        993          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0        993          0           1
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    MERGE JOIN
    100     SORT (JOIN)
    100      VIEW OF 'VW_NSO_1'
     99       SORT (UNIQUE)
    792        INDEX   GOAL: ANALYZED (FULL SCAN) OF 'SMALL_IDX'
                   (NON-UNIQUE)
    891     SORT (JOIN)
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
versus:
select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      4.12       4.12          0     135356         15           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      4.12       4.12          0     135356         15           1
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    FILTER
 135297     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
 133504     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
                (NON-UNIQUE)
That shows if the outer query is "big" and the inner query is "small", in is generally more efficient then NOT EXISTS.
Now:
select count(subobject_name)
  from small
 where object_id in ( select object_id from big )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.51       0.82         50        298         22           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.52       0.83         50        298         22           1
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    MERGE JOIN
  16913     SORT (JOIN)
  16912      VIEW OF 'VW_NSO_1'
  16912       SORT (UNIQUE)
 135296        INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 'BIG_IDX'
                   (NON-UNIQUE)
     99     SORT (JOIN)
     99      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
versus:
select count(subobject_name)
  from small
 where exists ( select null from big where small.object_id = big.object_id )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.01          0        204         12           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        204         12           1
EGATE)
     99    FILTER
    100     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)
shows that is the outer query is "small" and the inner query is "big" -- a WHERE EXISTS can be quite efficient.
 
 
 
 
What difference does it make
A reader, September 30, 2001 - 12:12 am UTC
 
 
when you remove those indexes on the small and big tables.
 
 
September 30, 2001 - 8:57 am UTC 
 
 
Run it and see.  All of the necessary code is there -- it is very easy to do these sorts of tests.
The outcome is easy to hypothesize however.
The "select * from big where object_id in ( select object_id from small )" will sort BIG once and SMALL once and join them (sort merge join) in all likelyhood.
The "select * from big where exists ( select null from small where small.object_id = big.object_id )" will tend to FULL SCAN big once and FOR EVERY ROW in big -- FULL SCAN small.
(after running it -- i verified it and the "rule of thumb" holds true.  BIG outer query and SMALL inner query = IN.  SMALL outer query and BIG inner query = WHERE EXISTS.  Remember -- thats is a RULE OF THUMB and rules of thumb always have infinitely many exceptions to the rule.
 
 
 
 
Tom, this is very clever and not fair
TJ, September 30, 2001 - 9:33 am UTC
 
 
You have explained the whole of the in and exists concepts so well( no one has ever done in like this, you must be one happy person, you believe in sharing)
and now you leave us with an open ended answer..
"Remember -- thats is a RULE OF THUMB and rules of thumb always have infinitely many exceptions to the rule."
Kindly discuss what can be some of those possible exceptions, tell us what factors can cause  those exceptions.
 
 
September 30, 2001 - 10:02 am UTC 
 
 
Well, there are infinitely many -- IO can affect the outcome of this.  The goal of the query can effect this (eg: the WHERE EXISTS will find the first row faster in general then the IN will -- the IN will get the LAST row (all rows) faster then the where exists).  If your goal is the FIRST row -- exists might totally blow away IN.  If you are a batch process (and hence getting to the LAST row is vital) .... 
and so on.  The point is:  be aware of their differences, try them both when tuning, understand conceptually what they do and you'll be able to use them to maximum effect.
 
 
 
 
does this explanation hold good even for NOT IN  and NOT EXISTS
Chenna, September 30, 2001 - 7:18 pm UTC
 
 
  
 
Summary
Chenna, October   01, 2001 - 10:38 am UTC
 
 
This is discussion of the top most class --
O.k. let me try to sum up what you are saying --
in and exists can be substitutes for each other, and we have to use one in the place of other depending upon the rowns returned by the outer and inner query,
where as
'not in' and 'not exists' are not substitutes for each other , and they are not related in any way. There use depends on the specific need.
o.k. to sum it up finally , can you give us examples of when the use of 'not in' is appropriate and when 'not exists' is appropriate. 
 
October   01, 2001 - 3:05 pm UTC 
 
 
you got the first part right (in/exists)
the second part is too strong.  not in and not exists are not perfect substitutes.  NOT IN is different then NOT exists but NOT EXISTS and NOT IN are the same when the subquery you use in the NOT IN does not contain NULLS.
Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details  on that).  With NULLS -- a not in can be very in-efficient and many people substitute a not exists for it (not realzing the ANSWER just changed!!!)
 
 
 
 
anti join
A reader, October   01, 2001 - 3:32 pm UTC
 
 
(and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details  on that)
Please give the link to the concerned page in the manual, there is no search feature in the document to search for 'anti join' 
 
October   01, 2001 - 8:30 pm UTC 
 
 
 
 
the mystery with non in and not exists
A reader, October   01, 2001 - 9:07 pm UTC
 
 
"Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for details  on that).  With NULLS -- a not in can be very in-efficient and many people substitute a not exists for it (not realzing the ANSWER just changed!!!) "
1.Not in and not exists return the same performance result when there arer no nulls.
2.Not in is especially very efficient when used with cbo and no nulls.
3.And when there are nulls what is preferred is not exists, and not not in.
I think Iam right in what I understood above.
Now
1.Why is not in in efficient when there are nulls. Can you demonstrate this.
2.And how is not exists better over not in when there are nulls.
3.What do you mean by saying
"and many people substitute a not exists for it (not realzing the ANSWER just changed!!!) "
do you mean not exists should be substituted for  not in when there are nulls ???????
 
 
October   02, 2001 - 7:46 pm UTC 
 
 
1a) not in can be faster then not exists (or slower).  The are different, do different things.  A not exists is alot like an EXISTS.  A not in when the subquery cannot contain NULLS can be alot like an IN (processed as an ANTI-JOIN instead of a JOIN).  So, just like IN is sometimes better then EXISTS and vice versa, NOT IT is sometimes (and sometimes not) better then NOT EXISTS
2a) thats when it works best, yes
3a) typically...
1b) Ok, time for the demo of NOT IN and NOT EXISTS (i guess I knew it was coming eventually!)
Here we go:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table big as select * from all_objects;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table small as select * from all_objects where rownum < 10;
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table big modify object_id null;
Table altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id null;
Table altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = null where rownum = 1;
1 row updated.
ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index big_idx on big(object_id);
Index created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index small_idx on small(object_id);
Index created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.
So, there is the big setup -- we have our tables.  SMALL will be used in the subquery.  As it stands now -- lets see what happens with NOT IN and NOT EXISTS:
ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
  2   where NOT EXISTS (select null
  3                       from small
  4                      where small.object_id = big.object_id)
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)
Statistics
----------------------------------------------------------
        305  recursive calls
         15  db block gets
      18278  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed
So, for each row in BIG it did an INDEX PROBE into SMALL to see if that row existed or not.  So far, so good...  18k consistent reads (high) but not as high as:
ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
  2   where object_id NOT IN ( select object_id
  3                              from small )
  4  /
no rows selected
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     TABLE ACCESS (FULL) OF 'SMALL' (Cost=1 Card=1 Bytes=3)
Statistics
----------------------------------------------------------
          0  recursive calls
     202743  db block gets
      84706  consistent gets
          0  physical reads
          0  redo size
        862  bytes sent via SQL*Net to client
        319  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed
Wow -- what happened here?  Well, for each row in BIG -- we FULL SCANNED small -- yes, for each row, we full scanned small -- we full scanned small some 16,000 times (imagine if small weren't so small).  Also take care to note that this query processed 0 rows - no data found.  The prior one got over 16k -- that is the effect of a NULL in the subquery of a NOT IN.
So, lets get rid of that nasty (we don't have to get rid of it, we could use a predicate to preclude it as well and we'll do that on BIG as a demonstration)
ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = -1 where object_id is null;
1 row updated.
ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id NOT NULL;
Table altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big
  2   where NOT EXISTS (select null
  3                       from small
  4                      where small.object_id = big.object_id)
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)
Statistics
----------------------------------------------------------
         71  recursive calls
         15  db block gets
      18270  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed
So, that is no different... but:
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
  2   where object_id IS NOT NULL
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)
Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
      18243  consistent gets
          0  physical reads
          0  redo size
    1961046  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      16886  rows processed
Hey -- thats pretty good already. Its just like the NOT EXISTS.  But wait -- there is more.  Lets start anti-joining...
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
  2  /
Session altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
   1    0   MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
   2    1     SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
   3    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   4    1     SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
   5    4       VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   6    5         INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 Bytes=36)
Statistics
----------------------------------------------------------
          0  recursive calls
         19  db block gets
        237  consistent gets
        236  physical reads
          0  redo size
    1863304  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          4  sorts (memory)
          1  sorts (disk)
      16886  rows processed
Now that is much much less work -- from 18k to 237 consistent gets -- not bad.  The hash anti join is similar:
ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = HASH
  2  /
Session altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b2
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=145 Card=16885 Bytes=1857350)
   1    0   HASH JOIN (ANTI) (Cost=145 Card=16885 Bytes=1857350)
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   3    1     VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   4    3       INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 Bytes=36)
Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
        237  consistent gets
         31  physical reads
          0  redo size
    2350779  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
      16886  rows processed
2b) see 1b)
3b) see 1b) where the NOT IN returns 0 rows, but NOT EXISTS returns 16k -- they are NOT the same when NULLs are involved.
 
 
 
 
 
ANTI JOIN and 'ANY', 'SOME','ALL'
Chenna, October   02, 2001 - 4:27 pm UTC
 
 
Optimizing Anti-Joins and Semi-Joins 
An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments: 
SELECT * FROM emp 
  WHERE deptno NOT IN 
    (SELECT deptno FROM dept 
      WHERE loc = 'HEADQUARTERS'); 
The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the optimizer should use. 
A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example: 
SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 
In this query, only one row needs to be returned from dept even though many rows in emp might match the subquery. If there is no index on the bonus column in emp, then a semi-join can be used to improve query performance. 
The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm the optimizer should use. 
SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 
What difference does it make when there is an index on the bonus column.
A semi join is already being executed in the above query, right ?
How will the use of MERGE_SJ or HASH_SJ hint in the EXISTS subquery improve the performance.
what do these hints tell the optimizer.
And , finally, can you give examples of using 'ANY' ,'ALL' AND 'SOME' clauses, as to where and when to use these clauses. We have never used these clauses.
 
 
October   02, 2001 - 7:53 pm UTC 
 
 
ANY/SOME are alot like an in.
where x in ( select y from t )
is the same as:
where x = ANY(select y from t)
SOME is an aliase for ANY, ANY is the same as SOME. 
any/some can be useful with inequalities:
scott@ORA817DEV.US.ORACLE.COM> select ename, sal, comm from emp where sal < some( select comm from emp );
ENAME             SAL       COMM
---------- ---------- ----------
A                 800
WARD             1250        500
MARTIN           1250       1400
ADAMS            1100
JAMES             950
MILLER           1300
6 rows selected.
that finds emps such that there exists SOME commision greater then their salary....
ALL is a cool one.  I use it to find the most frequently ocurring thing, for example:
scott@ORA817DEV.US.ORACLE.COM> select to_char(hiredate,'mm')
  2  from emp 
  3  group by to_char(hiredate,'mm')
  4  having count(*) >= ALL ( select count(*) from emp group by to_char(hiredate,'mm') );
TO
--
12
that shows me the month with the MOST hiredates....  it says give me the month whose count is greater than or equal to ALL other counts by month...
 
 
 
 
Tom, will the result be the same  if ..
Chenna, October   03, 2001 - 2:18 am UTC
 
 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
  2  /
Session altered.
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
   1    0   MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
   2    1     SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
   3    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   4    1     SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
   5    4       VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   6    5         INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 
Bytes=36)
Statistics
----------------------------------------------------------
          0  recursive calls
         19  db block gets
        237  consistent gets
        236  physical reads
          0  redo size
    1863304  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          4  sorts (memory)
          1  sorts (disk)
      16886  rows processed
1.Tom, will the result be the same  if there is no index on the object_id of the small table.
2.You are using CBO , all the way, what will be the result of all these tests for 'in and exists' and 'not in and not exists' if we are using RBO.
3.After we analyze the tables, how should we switch to RBO if we want to.
4.Will the push_sq (push sub query ) come into picture in any way inthe above examples. Actually when is this hint relevant.
5.out of context: Can you give us the link to the oracle error messages manual.
6.Suppose if the query's where clause has 5 sub queries, for the third sub query in the where clause , what will be the outer query, I mean how will it resolve.
For the fifth sub query in the where clause, I guess the whole of the restof the query will be the outer query ? 
 
October   03, 2001 - 8:02 am UTC 
 
 
1) yes more or less -- the ANSWER will be the same (16,886 rows processed).  The plan will be slightly different -- a FULL SCAN of small instead of an index full scan.  (you have ALL of the code needed to replicate my tests -- try it out!)
2) I don't recommend the RBO with 8i -- too many features not available.  But -- hey -- you have the code, give it a whirl, try it out (but make sure you have lots of time -- the RBO is not as smart as the CBO when it comes to this -- reread this entire article, I stress the need for the CBO to make this efficient!)
3) use an alter session set optimizer_goal or /*+ RULE */ hint in the query.
4) push_subq affects when subqueries are evaluated (last by default, earlier if you ask for it).  So yes, it could affect the plan here.
5) all documentation is available from </code>  
http://otn.oracle.com/,  <code>the Oracle error messages manual varies from release to release.  Suggest you goto otn, follow the documentation link and goto the version you are interested in.
6) this does not make syntactic sense to me.  
 
 
 
Train us to use parallel query
A reader, October   03, 2001 - 6:37 pm UTC
 
 
From the oracle magazine article
What Can Execute in Parallel? 
By using parallel options, data-sensitive SQL state-
ments, database recovery, and data loads can be exe-
cuted by multiple processes simultaneously. With the
release of Oracle 8i, all the following operations can
be executed in parallel: 
Table scan 
PL/SQL functions called from SQL 
Sort merge join 
"Not in"
Nested loop join 
Select distinct
Hash join 
Aggregation
Group by 
Cube
Union and union all 
Create table as select
Order by 
Rebuild index partition
Rollup 
Move partition
Create index 
Update 
Rebuild index 
Insert . . . select 
Split partition 
Enable constraint (the table
Delete scan is parallelized)
Star transformation 
 
If the want to execute the above not in sub queries in parallel, actually when should I decide to do it in paralle.
Will it suffice if I give the following.
Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/    object_id  from small a1)
usually I see that it is the  Sort merge joins which take the most amount of time in a query, how can I parallelize a sort merge join. 
Thank You 
   
 
October   04, 2001 - 6:16 am UTC 
 
 
If you have to parallelize the subquery -- you are in a heap of trouble.  that means it is really big and since you would be full scanning IT once per row in big -- its going to be slow.  Look at the other options below in order to AVOID having to full scan it once per row.
My advice here -- try it yourself.  Take your ideas, set up the test case, test it out -- see what works, what doesn't.  Its the best way to really learn and understand how this works. 
 
 
 
I will do it only my own but --  --
Chenna, October   04, 2001 - 11:45 am UTC
 
 
Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/    object_id  from 
small a1)
O.k. , let me try parallel, for myself, and report the conclusions to you for discussion.
But before that , you need to weed out these small doubts for me ..
1.select/*+ PARALLEL(a1,8)*/    object_id  from 
small a1  -- before using the parallel hint, do I need to change any init.ora parameters, 
2.I understand that 8 refers to the number of processors, how can I know the number of processors available for my server.
3.Where you use parallel query, how does my query get divided between multiple processors. Where are the results from each divided query kept , and then combined , before they are displayed to the user.
4.Give us a brief insignt into the internals of how parallel query works.
And by the way, I cant see the asktom or any other images  on this page. Some thing you might want to know about.
 
 
October   04, 2001 - 8:15 pm UTC 
 
 
1) probably not, do a show parameter parallel and check out the settings - make sure parallel max servers isn't 0
2) you own the machine don't you?  show parameter cpu_count will tell you as well
3) read
</code>  
http://docs.oracle.com/docs/cd/A81042_01/DOC/server.816/a76965/c22paral.htm#365   <code>
4) see #3
as for the images, must have been a temporary "issue" - they seem alright now.  
 
 
 
Chenna, October   04, 2001 - 10:53 pm UTC
 
 
SQL> connect system/manager
Connected.
SQL> show parameter cpu_count
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
cpu_count                            integer 1
SQL>  show parameter parallel
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 5
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 
The cpu count shows only one , but look at this 
parallel_max_servers                 integer 5.
1.how can the parallel_max_servers be 5 when the cpu count is 1.
2.Does this mean that I can give a parallel hint to use 5 parallel servers.
3.When the cpu count is 1 I expect the parallel_max_servers 
to be 1. 
Perplexing? Need to be clarified.
 
 
 
October   05, 2001 - 11:49 am UTC 
 
 
did you read the documentation i pointed you to?
you can do parallel query on a single cpu box.  You can do a parallel 100 query on a 20 cpu box.  You need not have parity between processes and cpu's (if you did, unix would be a single user OS).
You can have parallel_max_servers at whatever you want, regardless of the number of cpus on the system.  Since parallel queries typically involve tons of IO, and IO is something you wait for -- there is plenty of opportunity for using parallel query on a single CPU system (when one process is waiting for IO, blocked, the other process can be munching through some data it read).
Read the docs, get a good understanding of how it really works -- then it won't be confusing. 
 
 
 
sub query 
Jay, October   06, 2001 - 5:34 pm UTC
 
 
Tom
Lets assume that the where clause has 10 conditions, and each condition gets its value from a sub query. IN what order do these sub queries get resolved.
select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)
and c.l=(select l from some xtable where something)
Now when resolving and c.l=(select l from some xtable where something)
will select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)
be considered as outer query, or only
select a from b,c
were b.x=c.x
be considered as outer query
 
 
October   07, 2001 - 4:02 pm UTC 
 
 
when looking at the IN, the outer query is the result set that will be joined to the IN result.  
if b.t = (select t ... ) is processed FIRST (before c.l = ( select l...)), the that would be the "outer query" with regards to c.1 = ( select l...).  
However -- you cannot count on the order of operation of predicates!!! we can and will change the order at will. 
 
 
 
Good topic
A reader, May       17, 2002 - 12:14 pm UTC
 
 
Tom and team,  Many thanks.  Love your site.  
I'm running Oracle 8.1.7.3
I need help with a "NOT IN" query.  
Here is what the tables look like:
Table 1:  
CREATE TABLE SKU
    (ITEM_ID                        NUMBER NOT NULL,
    SKU_CODE                       VARCHAR2(10) NOT NULL,
    SKU_DESC                       VARCHAR2(255) NOT NULL
  ,
  CONSTRAINT PK_SKU
  PRIMARY KEY (ITEM_ID, SKU_CODE)
  USING INDEX)
/
CREATE TABLE SKU_SOLD_OUT
    (ITEM_ID                        NUMBER NOT NULL,
    SKU_CODE                       VARCHAR2(10) NOT NULL,
    SOLD_OUT_DATE                  DATE NOT NULL,
    DELETED                        CHAR(1) NOT NULL,
    SOLD_OUT_TEXT                  VARCHAR2(1000) NOT NULL
  ,
  CONSTRAINT PK_SKU_SOLD_OUT
  PRIMARY KEY (ITEM_ID, SKU_CODE, SOLD_OUT_DATE)
  USING INDEX)
/
With item_id, sku_code combinations having zero to many rows in sku_sold_out 
but only one will be active (deleted = 'N') at a time.
I need to get a list of items that are not completely sold out.  i.e.
If an Item_id has four sku_codes and three are currently sold out, return the item_id.
If all four sku_codes are sold out, do not return the item_id.
Here is the query I am useing now.
select distinct item_id 
from sku
where item_id||'-'||sku_code not in (
    select item_id||'-'||sku_code 
    from sku_sold_out
    where deleted = 'N'
     and  sold_out_date <= sysdate)
sku has about 10,000 rows
sku sold out has about 120,000 rows.
Is there a better way of doing this?  The schema is set and is production so I cant 
make any database changes other than views, procs, ect.
Thanks for the help.  
Thanks 
 
May       17, 2002 - 1:12 pm UTC 
 
 
Well, other ways would be
select distinct item_id
  from sku
  where (item_id,sku_code) not in ( select item_id, sku_code
                                      from sku_sold_out
                                     where deleted = 'N'
                                       and  sold_out_date <= sysdate)
/
select distinct sku.item_id
  from sku, sku_sold_out
 where sku.item_id = sku_sold_out.item_id (+)
   and sku.sku_code = sku_sold_out.sku_code (+)
   and nvl(sku_sold_out.deleted,'Y') = 'Y'
   and nvl(sku_sold_out.sold_out_date,sysdate) <= sysdate
/
 
 
 
 
Index  result of Subquery and FILTER it
Jaromir D.B. Nemec, May       18, 2002 - 12:21 am UTC
 
 
Hi Tom,
in the original explanation you told us:
The subquery is evaluated, distinct'ed, >>>indexed<<<< (or hashed or sorted) and then ....
1) Can you explain how can be the result of a subquery indexed.  
temporary index is created or something like that? What you see in explain plan in that case? - never heard of this feature.
2) Some of the explain plans you demonstrated consist of:
FILTER
    row set 1
    row set 2
I've learned that FILTER is a kind of "filter" to suppress rows that doesn't match some condition. This is  evidently NOT (in) the case.
Can you explain (or link) what is the functionality of this two-row-ser-filter? 
My experience is that this kind of FILTER operate similar to  nested loops and is not performable in parallel - is it right?
Thanks
Jaromir
www.db-nemec.com
 
 
May       18, 2002 - 10:07 am UTC 
 
 
1) it is stored sorted or hashed.  The structure itself is an implicit "indexing" of sorts.  Lets say you were doing a sort merge join -- you would see two sorts (out to temp) and then a merge.  A hash join -- one of the sets gets hashed.
for #2, I assume you mean something like:
ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
  2   where object_id IS NOT NULL
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /
16886 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)
The FILTER in this case is sort of like a nested loops join yes.  The table access full is done -- and the "is not null" part is evaluated.  Then each row must be filtered for the NOT IN and that is done via the index.  
This could be done in parallel -- meaning the full scan can be parallelized.
 
 
 
 
 
Useful info about IN & EXISTS
Vish Inamdar, July      23, 2002 - 3:06 pm UTC
 
 
Yes, this article gives me clear idea how IN & EXISTS are different from each other and where to apply them.
Good and simple example made it more clear.
I will be asking more question in future.
Vish
 
 
 
Exists, IN and consistent gets
Vikas Sharma, August    22, 2002 - 8:04 am UTC
 
 
Hi tom,
Can i directly relate the consistent gets of plan with the time taken by the query executed by using IN  and another same query executed by using exists. Actually I have query which I execute in two different ways eg.
Query with exists :
select person_id from persons where exists
 (select con_person_id  from contacts where con_person_id = person_id ) and contact_details = 'Y'  
93 rows selected ..
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=951 Card=1 Bytes=9)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'PERSONS' (Cost=951 Card=1 Bytes=
          9)
   3    1     INDEX (UNIQUE SCAN) OF 'CON_PK' (UNIQUE)
Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
       6365  consistent gets
          0  physical reads
          0  redo size
       2967  bytes sent via SQL*Net to client
       1091  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
93    rows processed
Query with IN :
select person_id from persons where person_id in (select con_person_id  from contacts ) and contact_details = 'Y';
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=187 Card=1 Bytes=17)
   1    0   NESTED LOOPS (Cost=187 Card=1 Bytes=17)
   2    1     INDEX (FULL SCAN) OF 'CON_PK' (UNIQUE) (Cost=1 Card=93 B
          ytes=744)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'PERSONS' (Cost=2 Card=
          1 Bytes=9)
   4    3       INDEX (UNIQUE SCAN) OF 'PER_PK' (UNIQUE) (Cost=1 Card=
          1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        543  consistent gets
          0  physical reads
          0  redo size
       2967  bytes sent via SQL*Net to client
       1091  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
93    rows processed
The persons table has 31000+ records and the contact table has 120+ record in this case as you proved the IN is better I  am too getting the same results as you can see the IN is fast. 
I would like to know that is it always true for two same queries (executed in the different way ) if the  consistent gets for a query are more then that query is not optimized and will always  result a poor show, we should always go for the query which results less consistent get. 
I read in you book also the consistent gets depend upon the order of the data of the index columns. It will be more when the data for the index column is disordered. Does that will always effect the time taken by the query.  
 
 
August    22, 2002 - 9:14 am UTC 
 
 
Less consistent gets is a good goal.  In generate, the less consistent gets
o the faster it'll be
o the more SCALABLE it'll be (as you add users, you can add MORE users with less consistent gets)
Always is a very very strong term.  Almost Always i would say.
As for the last one -- that is due to a large index range scan that'll be getting LOTS of data from the table -- not really applicable in this example.  Does it affect the time taken by the query -- probably, the more consistent gets, the more latching (thats a lock, a serialization thing), the longer it takes to get the data. 
 
 
 
Ashok, Bombay ,India, August    22, 2002 - 10:05 am UTC
 
 
your style of explaining things is just amazing..!!you must be one helluva teacher !!! 
 
 
Thanks a Lot
Vikas Sharma, August    22, 2002 - 12:58 pm UTC
 
 
Hi Tom,
 
Thanks lot 
The way you explain the things it is great..
Regards,
Vikas Sharma 
 
 
Very helpful
Bijay, August    22, 2002 - 8:45 pm UTC
 
 
Tom
I wonder how you can find so much time in replying to all questions. Its amazing a simple question can create such a big chain of Q & A.
You have been a great source of knowledge for all of us. Thank you for being you.
Bj 
 
 
why is my query plan using NL? where as your use SORT-MERGE
A reader, September 05, 2002 - 3:03 pm UTC
 
 
Tom,
I am trying to understand exists and not exists
 analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /
Table analyzed.
SQL> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /
Table analyzed.
SQL> truncate table plan_table;
Table truncated.
SQL> explain plan for
  2  select count(subobject_name)
  from big
 where object_id in ( select object_id from small )
  3    4    5  /
Explained.
SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     1 |   19 |    598 |       |       |
|  SORT AGGREGATE           |          |     1 |   19 |        |       |       |
|   NESTED LOOPS            |          |   396 |    7K|    598 |       |       |
|    VIEW                   |VW_NSO_1  |    99 |    1K|      4 |       |       |
|     SORT UNIQUE           |          |    99 |  396 |      4 |       |       |
|      INDEX FULL SCAN      |SMALL_IDX |    99 |  396 |      1 |       |       |
|    TABLE ACCESS BY INDEX R|BIG       |   113K|  665K|      6 |       |       |
|     INDEX RANGE SCAN      |BIG_IDX   |   113K|      |      2 |       |       |
--------------------------------------------------------------------------------
11 rows selected.
Thans and Regards,                           
 
 
September 05, 2002 - 10:27 pm UTC 
 
 
things like sort_area_size, db_block_buffers, hash area size, database version, etc will all affect the plans. 
 
 
 
Update Process Options
SK, September 09, 2002 - 10:32 am UTC
 
 
Hi Tom,
I have to update my production table daily with new/update data. My update process is like this.
Table big_table (col1 integer, col2 integer, col3 integer)
Table Small_table (c1 integer, c2 integer, c3 integer)
Small table is a external Table. and Big Table as index on Col1.
My update process is like this:
Delete from big_table where col1 in (select c1 from small_table);
insert into big_table select c1,c2,c3 from small_table.
My Big table has 32 million rows and small table has 400K rows. This process is taking 30 minutes to do the above two processes. The requirement is to reduce this time. How can we do it. I believe using Exists will do both full table scans so that will not be efficient. Where as using in may use indexes (but I doubt that it is using indexes).
1. Should I go for updating using procedure where I delete records one by one and I will be sure that processes will use indexes. But that is not my choice and it will defy your first rule of using single statement instead of using procedure.
2. In one such update my big_table has more than 500 million records. and small Table has 7 million records. I have 24GB of temp tablespace. My delete statement is failing due to unable to allocate extents on temp tablespace. I am surprised on this. Is Temp tablespace of 24GB too large? In the above delete statement will it use only col1 and c1 for sort/merge. How do I estimate the temp tablespace size for this transaction. I am running Oracle 9iR2 RAC.
Your response will be highly appreciated.
Thanks 
 
September 09, 2002 - 10:41 am UTC 
 
 
 
 
Follow up
SK, September 09, 2002 - 11:41 am UTC
 
 
Hi Tom,
I am using merge for some tables where I have one to one relationship between the two table. But for many big tables I have multiple records for same key (tables with non unique key). So will merge will take care of following situation where for big table has 5 records for col1. and small table has 3 records for the same key. Will merge replace 5 rows in big table with 3 rows from small table for the same key (means deleteion of two records from big table).
I will certainly prefer merge if this is possible.
Regards 
 
September 09, 2002 - 12:30 pm UTC 
 
 
Then that is not a merge. 
A merge would leave the same number of rows.
You'll need to do the delete+insert.  You'll want to use tkprof to see what it is doing now and see if it can be speeded up.  Deleting 400k rows is going to take a while if you have indexes on the table. 
 
 
 
Oracle Optimizer and In & Exists
Subhrajyoti Paul, September 11, 2002 - 12:55 am UTC
 
 
Tom,
   From the discussion above, I get the impression that In and exists clause can be used interchangibly, in case there is a subquery inside the IN clause. But, one would be better than the other depending on the amount data retrieved in the inner and the outer query.
   If they can be used interchangibly, then it sounds logical that the optimizer can substitute an in clause with an exists and vice versa, given it is provided with proper statistics... Does this happen in practice... Or am I making some mistakes in my understanding of the things...
   One other question, 
"select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;
The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then 
joined to the original table -- typically. "
Here you mentioned that it is distinct'ed, which means it is sorted, now once sorted, will oracle favor the use of merge sort join, because the sort operation is already complete...or may it choose to Indexing or hash join. How much importance is given to the fact that one step (sort, in this case) can be eliminated from the join operation, or is this relevant at all...
    
 
September 11, 2002 - 7:44 am UTC 
 
 
And in 8i, ... 9i, .... 10i, and on up -- that is true.  You'll see interesting query rewrites with both (given that you use the cost based optimizer which I can assure you you will be in 10i as the RBO is gone in that release)
distinct <> sort.  SORT is SORT, distinct is distinct.  DISTINCT does not imply or infer or mean SORTed.  Never has, never will. 
 
 
 
But how otherwise can u perform a distinct if u do not sort...
Subhrajyoti Paul, September 11, 2002 - 8:14 am UTC
 
 
Tom,
   I am relieved that the optimizer does the substitution...And ofcourse, I use cost based optimizer...
   But the distinct thing.. If distincted data does not mean sorted, how otherwise can u have distinct data if u did not sort it... Can you give me an example...
Subhro 
 
September 11, 2002 - 8:39 am UTC 
 
 
Lets say you have a partitioned table.
You used hash partitioning on the ID column.
We know that across partitions -- ID is "unique" (eg: if ID=5 is in partition 1, ID=5 will NOT be in partitions 2, 3, 4, ....  and so on).
We use parallel query.
Each partitioned is distincted and then glued together.  ID=5 from partition 1 could come after ID=1000 from partition 2 if the PQ slave processing partition 2 was "faster" then the one doing partition 1.
For example.  There are other cases as well.  distinct, group by, etc -- do not imply "sorted" 
 
 
 
Thanks a lot... I got the point...
Subhrajyoti Paul, September 11, 2002 - 8:46 am UTC
 
 
Tom,
   Most of what u said in the last followup is a bouncer to me(in cricket, a bouncer is a ball that goes above the batsman's head ;-) )...
   But, I got the point that distinct does not mean sorted...
   I just could not think of a case myself ..so the example was needed...
Thanks a lot... everytime i think i got u...u prove me wrong... cool.... 
 
 
not in & not exists
Brigitte, November  05, 2002 - 5:31 am UTC
 
 
Hallo Tom,
Some times ago I changed a query to "not exists", because "not in" takes some hours. 
Now I tried your interesting followup from October 01, 2001  with the test database, but alter session 
set always_anti_join = HASH/MERGE hasn't an effect. Why?
notesid is not null in all tables, it is the primary key in testidxdokument,testidxvorgang and testidxmappe
Sql> set autotrace trace
Sql> set timing on
Sql> r
  1  select distinct mandant,notesid from testleserfeldgruppe l
  2  where not exists
  3  (select 'x' from testidxdokument
  4  where l.notesid = notesid)
  5  and not exists
  6  (select 'x' from testidxvorgang
  7  where l.notesid = notesid)
  8  and not exists
  9  (select 'x' from testidxmappe
 10  where l.notesid = notesid)
 11* order by mandant, notesid
no rows selected
Elapsed: 00:00:01.02
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
   5    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=18)
   6    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=18)
   7    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXMAPPE' (UNIQUE) (Cost=1 Card=1 Bytes=18)
Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
       1507  consistent gets
          0  physical reads
          0  redo size
        160  bytes sent via SQL*Net to client
        251  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed
Sql> r
  1  select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7* order by mandant, notesid
no rows selected
Elapsed: 00:00:02.44
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3Card=1 Bytes=35)
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)
Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Sql> alter session set always_anti_join = MERGE;
Session altered.
Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7  order by mandant, notesid;
no rows selected
Elapsed: 00:00:01.23
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)
Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Sql> alter session set always_anti_join = HASH;
Session altered.
Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7  order by mandant, notesid;
no rows selected
Elapsed: 00:00:01.13
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)           
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)
Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Regards
 
 
November  05, 2002 - 9:44 am UTC 
 
 
are those statistics close to right?  9 rows and such -- on such a trivial amount of data -- all bets are off
either get real stats on there or load the tables with real data. 
 
 
 
not in  or not exists
Brigitte, November  06, 2002 - 10:06 am UTC
 
 
Hallo Tom, 
here are more data.
1. Why has the alter session no effect. What's wrong? Some parameters?
2. what means: "elapsed 02:133:8032.30", I know hours:minutes:seconds
   Does it mean 2 hours + 13 minutes +...
             or 2 hours + 2 hours + 13 minutes +...
06.11.02 07:53    Anzahl der Tabellenzeilen       Seite:   1
                            dms
DMSIDXDOKUMENT               : 46220
DMSIDXMAPPE                  : 13
DMSIDXVORGANG                : 464
DMSLESERFELDGRUPPE           : 137369
Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe l
  2  where not exists
  3  (select 'x' from dmsidxdokument
  4  where l.notesid = notesid)
  5  and not exists
  6  (select 'x' from dmsidxvorgang
  7  where l.notesid = notesid)
  8  and not exists
  9  (select 'x' from dmsidxmappe
 10  where l.notesid = notesid)
 11* order by mandant, notesid
no rows selected
Elapsed: 00:00:19.18
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=32)
   6    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=31)
   7    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXMAPPE' (UNIQUE)
Statistics
----------------------------------------------------------
        245  recursive calls
          7  db block gets
     149823  consistent gets
       1439  physical reads
          0  redo size
        161  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
         18  sorts (memory)
          0  sorts (disk)
          0  rows processed
Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7* order by mandant, notesid
no rows selected
Elapsed: 02:133:8032.30
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)
Statistics
----------------------------------------------------------
          0  recursive calls
     464062  db block gets
  101189223  consistent gets
   12940155  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Na so was> alter session set always_anti_join = MERGE;
Session altered.
Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7* order by mandant, notesid
no rows selected
Elapsed: 02:133:8036.06
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)
Statistics
----------------------------------------------------------
          0  recursive calls
     464112  db block gets
  101219920  consistent gets
   13238978  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Na so was> alter session set always_anti_join = HASH;
Session altered.
Na so was> select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7  order by mandant, notesid;
no rows selected
Elapsed: 02:135:8144.32
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)
Statistics
----------------------------------------------------------
          0  recursive calls
     464327  db block gets
  101357656  consistent gets
   14016238  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed
Regards 
 
 
not in & not exists
Brigitte, November  08, 2002 - 3:20 am UTC
 
 
Hallo Tom,
I hope the data which I send on November 06 are enough.
I think a difference from 19 minutes to some hours is significant.
I'm very interested in your reply.
Regards 
 
November  08, 2002 - 7:46 am UTC 
 
 
Brigitte -- for something so big, you'll need to do it as a question - this is for followups, quick questions.  This is an entire question in a question. 
 
 
 
Is the logic backwards for NOT IN and NOT EXISTS
Doug, November  15, 2002 - 4:27 pm UTC
 
 
Tom - this may be an oversimplification - however if I was to distill whether IN or EXISTS was better I would say this after reading your reply.  An IN will generally full table scan the subquery, and an EXISTS will generally full table scan the outer query.  So, if my outer loop is small and my inner code is large then an exists might be appropriate but if my outer loop is large and my inner code is small (the match is quick).. then an IN might be appropriate.  Is that correct?  If so.. is the situation completely reversed if using NOT in and NOT exists?  I would guess is you were using not in.. you would have to full table scan the OUTER query rather than sorting,hashing, the inner one.  You did cover that NOT IN will return zero rows if it is NOT IN (null).. but I wasn't sure if the psuedo-code for processing is backwards from the original examples if one is using NOT IN and NOT EXISTS.   
 
November  15, 2002 - 8:21 pm UTC 
 
 
I encapsulate it with this:
...
So, when is where exists appropriate and in appropriate?
Lets say the result of the subquery
    ( select y from T2 )
is "huge" and takes a long time.  But the table T1 is relatively small and 
executing ( select null from t2 where y = x.x ) is very very fast (nice index on 
t2(y)).  Then the exists will be faster as the time to full scan T1 and do the 
index probe into T2 could be less then the time to simply full scan T2 to build 
the subquery we need to distinct on.
Lets say the result of the subquery is small -- then IN is typicaly more 
appropriate.
If both the subquery and the outer table are huge -- either might work as well 
as the other -- depends on the indexes and other factors. 
.....
full scans -- no, they enter not into the picture at all IN GENERAL.  If the subquery can use indexes, it'll use an index with IN or EXIST.
It is if the subquery is LARGE and TAKES A LONG TIME TO BUILD.... (regardless of full scan)
Not in and not exists is even harder to characterize (pull my AOTC slides from the home page for an interesting example -- they will be there for a while around November of 2002 -- if you are reading this later then that, they might not be there anymore) 
 
 
 
How to make NOT EXISTS handle comparison of NULLs in same way as MINUS
Jim Dickson (DBA/developer), March     13, 2003 - 10:30 am UTC
 
 
As ever, learn a lot from this site.
But one thing not addressed is how to make an antijoin deal with NULLs. The set operators (UNION, MINUS) do this automatically.
But can it be done efficiently with NOT EXISTS or NOT IN.
e.g.
I need to compare 2 tables such that two rows are duplicates if corresponding columns are either equal, or both contain NULL. 
I cannot use MINUS since that columns being compared are a subset of the row, but i need to return the whole row. 
My understanding is that I cannot use NOT IN because it does not handle "null comparison". 
NOT EXISTS works but seems to use NESTED LOOPS => v slow 
My understanding is that HASH antijoin will not work because HASH join only works for equality tests (which fail for NULLs). 
I can post example code if this helps.
 
 
March     14, 2003 - 5:22 pm UTC 
 
 
both not exists and ESPECIALLLY not in do not do nulls like that.
not in and not exists don't even treat nulls the same in fact.  do this:
select * from emp where empno not in ( select mgr from emp );
and compare that to the "equivalent" not exists:
select * from emp where not exists ( select null from emp e2 where e2.mgr = emp.empno );
you'll get a different answer.
You would have to tediously compare column by column by column
where (x = y or (x is null and y is null))
otherwise. 
 
 
 
How to make NOT EXISTS handle comparison of NULLs in same way as MINUS
Jim Dickson, March     17, 2003 - 12:08 pm UTC
 
 
I absolutely agree but the question more about access paths than semantics.
I believe i understand the semantics surrounding nulls - which is why MINUS/INTERSECT are such great operators.
I am prepared to write "tedious" column comparison.
Can I get Oracle to perform comparisons via a single-pass full table scan access method (hash, sort-merge).
I have only seen Cartesian product/Nested Loops approach which obviously does not scale with number of rows.
Is there any chance of Oracle implementing ANSI set operator extension "CORRESPONDING BY {column-list}" ?
 
 
March     17, 2003 - 1:04 pm UTC 
 
 
curious -- why cannot you simply use MERGE/INTERSECT? 
 
 
 
How to make NOT EXISTS handle comparison of NULLs in same way as MINUS  
Jim Dickson, March     18, 2003 - 4:47 am UTC
 
 
Because I am not merging into target table directly.
The sequence of processing is
1/    Read an external table
2/    Identify differences based on a subset of columns
3/    Write out differences to OS for further processing  (fuzzy logic on Name & Address data)
4/    Read 2nd external table containing processed data and then "merge" into target table.
Where I have a primary key, I can use MINUS for step 2.
But where there is no PK, then I am trying to get NOT EXISTS to scale.
Hope that makes sense.
BTW Is MERGE statement ANSI? 
 
March     18, 2003 - 7:47 am UTC 
 
 
well, not following you by what you mean by "getting not exists to scale" -- but give me an EXAMPLE of the syntax you would like to use and describe what would be returned and I'll see if there isn't a way to code that...
Don't know if merge is "ansi".  Thing is NIST stop testing for ansi compliance years ago, no one does it, the standard is so huge now that full compliance is pretty much "not going to happen" and database indenpendence, well, I don't even go there -- the only results I see from that over the last 15 years has been applications that perform marginally well on a single database and blow chunks on the rest. 
 
 
 
How to make NOT EXISTS handle comparison of NULLs in same way as 
Jim Dickson, March     19, 2003 - 4:01 am UTC
 
 
Code posted below.
The NOT EXISTS does not include the SECONDARY column.
Ran with 100 rows, 1000 rows and 2389 rows.
On my machine
MINUS        100 rows  00.44 secs,    8  db block gets
            1000 rows  00.27 secs,    7  db block gets
            2389 rows  00.51 secs,    4  db block gets
NOT EXISTS   100 rows  00.43 secs,  208  db block gets
            1000 rows  02.90 secs, 2007  db block gets
            2389 rows  11.75 secs, 4780  db block gets
I need this to run with 10m rows.
spool not_exists.2389.out
set timing on
set echo on
drop   table t1;
drop   table t2;
create table t1 as select * from all_objects where rownum <= 2500;
create table t2 as select * from all_objects where rownum <= 2500;
analyze table t1 compute statistics;
analyze table t2 compute statistics;
--desc t1
--desc t2
insert into t1 values ('EXTRA ROW', null, null, null, null, null, null, null, null, null, null, null, null);
select count(*) from t1;
select count(*) from t2;
 
set autotrace on
select count(*) from
( 
  select * from t1
  minus
  select * from t2
);
select count(*) from t1 where NOT EXISTS 
( select null from t2 where 
  ( t1.OWNER = t2.OWNER or ( t1.OWNER is null and t2.OWNER is null ) )
  and
  ( t1.OBJECT_NAME = t2.OBJECT_NAME or ( t1.OBJECT_NAME is null and t2.OBJECT_NAME is null ) )
  and
  ( t1.SUBOBJECT_NAME = t2.SUBOBJECT_NAME or ( t1.SUBOBJECT_NAME is null and t2.SUBOBJECT_NAME is null ) )
  and
  ( t1.OBJECT_ID = t2.OBJECT_ID or ( t1.OBJECT_ID is null and t2.OBJECT_ID is null ) )
  and
  ( t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID or ( t1.DATA_OBJECT_ID is null and t2.DATA_OBJECT_ID is null ) )
  and
  ( t1.OBJECT_TYPE = t2.OBJECT_TYPE or ( t1.OBJECT_TYPE is null and t2.OBJECT_TYPE is null ) )
  and
  ( t1.CREATED = t2.CREATED or ( t1.CREATED is null and t2.CREATED is null ) )
  and
  ( t1.LAST_DDL_TIME = t2.LAST_DDL_TIME or ( t1.LAST_DDL_TIME is null and t2.LAST_DDL_TIME is null ) )
  and
  ( t1.TIMESTAMP = t2.TIMESTAMP or ( t1.TIMESTAMP is null and t2.TIMESTAMP is null ) )
  and
  ( t1.STATUS = t2.STATUS or ( t1.STATUS is null and t2.STATUS is null ) )
  and
  ( t1.TEMPORARY = t2.TEMPORARY or ( t1.TEMPORARY is null and t2.TEMPORARY is null ) )
  and
  ( t1.GENERATED = t2.GENERATED or ( t1.GENERATED is null and t2.GENERATED is null ) )
)
;
spool off
exit
As ever, thanks.
Think the whole Oracle community is disappointed that the new book will be delayed. 
 
March     20, 2003 - 1:01 pm UTC 
 
 
what I meant by:
 but 
give me an EXAMPLE of the syntax you would like to use and describe what would 
be returned and I'll see if there isn't a way to code that...
was -- give me a scenario where you want to use not exists instead of minus (eg: in the above, i certainly would NOT use not exists) and I'll see what I can do with that. 
That is -- give me a real case to work with whereby minus cannot be used. 
 
 
 
green, March     21, 2003 - 1:39 pm UTC
 
 
Hi Tom,
Can you please explain a litte bit 'NOT IN' here? Where  can  it be used properly?
Thanks! 
 
March     21, 2003 - 2:21 pm UTC 
 
 
I don't understand?  
not in can be used anywhere you need "not in" -- where "record" not in "set".  
 
 
 
 
How to use MINUS on subset of columns
Jim Dickson, March     27, 2003 - 4:51 am UTC
 
 
I use MINUS extensively but in certain situations I believe it is unsuitable.
e.g.
When trying to find the difference between 2 tables, where at least 1 of tables lack a primary key,
using a subset of columns for comparison and some of those columns can be null.
The real life example is comparing a new list of suppressions against an existing table of suppressions.
(Suppressions used in marketing to avoid target customers who do not wish to receive offers).
The ddl is pretty much address1-10, suppression_flags1-10.
The comparison is done on address lines only but return set must be all columns.
Hope that makes sense.
The ANSI syntax i would love to use is
select * from suppressions_new
except corresponding by (address1, address2, ...., address10)
select * from suppressions_old
Suppressions_new will have no primary key.
How can i achieve this in Oracle using MINUS instead of EXCEPT.
If it cannot be done using MINUS, then how to achieve reasonable performance using NOT EXISTS. 
 
March     27, 2003 - 8:34 am UTC 
 
 
Ok, I have a big_table (4million rows) and a sample of that (400k rows).  Using NVL() on columns that "might be nullable" and supplying an unreasonable value (eg: chr(0) for strings, 99999999999999999999999999 for numbers, to_date(01010001,'ddmmyyyyy') for dates) you can anti join with NOT IN like this:
select *
  from big_table
 where (object_id, nvl(subobject_name,chr(0)) )
   NOT IN (select object_id, nvl(subobject_name,chr(0))
             from ten_percent
            where nvl(subobject_name,chr(0)) IS NOT NULL )
   and nvl(subobject_name,chr(0)) is not null
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    114.65     248.86     118052      60214          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    114.65     248.86     118052      60214          0           0
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  HASH JOIN ANTI (cr=60214 r=118052 w=58940 time=77317035 us)
3817191   TABLE ACCESS FULL BIG_TABLE (cr=54700 r=54649 w=0 time=14975098 us)
 381274   TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1360329 us)
Now, that "scales" just as well as:
select *
  from big_table
 where (object_id, object_name )
   NOT IN (select object_id, object_name
             from ten_percent )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    114.19     224.33     117115      60213          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    114.19     224.33     117115      60213          0           0
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  HASH JOIN ANTI (cr=60213 r=117115 w=57995 time=224338032 us)
3817191   TABLE ACCESS FULL BIG_TABLE (cr=54699 r=54657 w=0 time=16561038 us)
 381274   TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1308385 us)
without the NVL's.
 
 
 
 
Use of  "nvl(subobject_name,chr(0)) IS NOT NULL "
Logan Palanisamy, March     27, 2003 - 1:27 pm UTC
 
 
Tom, 
In your example,
select *
  from big_table
 where (object_id, nvl(subobject_name,chr(0)) )
   NOT IN (select object_id, nvl(subobject_name,chr(0))
             from ten_percent
            where nvl(subobject_name,chr(0)) IS NOT NULL )
   and nvl(subobject_name,chr(0)) is not null
1. What does chr(0) translate to? When I run "select chr(0) from dual", it returns me an invisble character. My db character set is WE8ISO8859P15
2. What does "nvl(subobject_name,chr(0)) IS NOT NULL" do? Doesn't it evaluate to TRUE always, since chr(0) is a not null value? 
3. Also, the number of rows returned as shown in TKPROF output is zero. How come? Shouldn't return about 3.6 millon rows?
4. I am familiar with TKPROF. But what does the name really stand for? Is it an acronym for something?
Grateful as ever. 
 
March     27, 2003 - 2:14 pm UTC 
 
 
1) exactly -- it is just "\0" in C, a null string.  It is an improbable character, something subobject_name cannot be...
2) exactly -- I'm telling the optimizer "this query returns only data that is NOT NULL".  That permits the use of the anti-join which is what our goal was here.
3) no, there were no rows "not in" the other set.
4) tool kit profiler is the rumorous name I've heard it refered to as. 
 
 
 
LIKE.... OR LIKE.... OR
Robert, May       12, 2003 - 11:41 am UTC
 
 
Tom, 
Is there a better way (with a sub-query) to do
LIKE .... OR LIKE ... OR LIKE ... ?
e.g.:
select * from user_objects where 
OBJECT_TYPE LIKE 'PROC%' OR OBJECT_TYPE  LIKE 'FUNC%' OR OBJECT_TYPE  LIKE 'VIEW%" ?
Thanks
 
 
May       12, 2003 - 1:13 pm UTC 
 
 
that would be the most "optimal" way normally -- it could permit a normal index on object_type to be used.
You could shorten it to 
where substr( object_type, 1, 4 ) in ( 'PROC', 'FUNC', 'VIEW' ) 
in your case -- and if you had a function based index on substr(object_type,1,4) it could use an index. 
 
 
 
EXISTS and FTS
Nitin, May       14, 2003 - 6:57 pm UTC
 
 
I tried the following.
SQL> create table outer (a number);
Table created.
SQL> create index outer_ind on outer (a);
Index created.
SQL> create table inner (b number);
Table created.
SQL> create index inner_ind on inner (b);
Index created.
SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER'
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE)
SQL> analyze table outer compute statistics;
Table analyzed.
SQL> analyze table inner compute statistics;
Table analyzed.
SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=13)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=1 Card=1 Bytes=13)
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)
SQL> insert into outer select OBJECT_ID from user_objects;
40899 rows created.
SQL> commit;
Commit complete.
SQL> analyze table outer compute statistics;
Table analyzed.
SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2045 Bytes=81
          80)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=4 Card=2045 Bytes=8
          180)
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)
Should it not perform an index scan on the OUTER table?
Regards
Nitin 
 
 
May       14, 2003 - 11:46 pm UTC 
 
 
why would it?
every row must be inspected.
the rows are very narrow
the table is SMALLER then the index (the index has 2 columns -- the column and the rowid, the table is just a column)
use "real" examples.
ops$tkyte@ORA920LAP> create table t1 as select * from all_objects;
Table created.
ops$tkyte@ORA920LAP> create index t1_idx on t1(object_id);
Index created.
ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.
ops$tkyte@ORA920LAP> create table t2 as select * from all_objects where 1=0;
Table created.
ops$tkyte@ORA920LAP> create index t2_idx on t2(object_id);
Index created.
ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.
ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> set autotrace traceonly explain
ops$tkyte@ORA920LAP> select * from t1
  2  where exists ( select null from t2 where t2.object_id = t1.object_id );
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=168 Card=82 Bytes=8938)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
   2    1     NESTED LOOPS (Cost=168 Card=82 Bytes=8938)
   3    2       SORT (UNIQUE)
   4    3         TABLE ACCESS (FULL) OF 'T2' (Cost=2 Card=82 Bytes=1066)
   5    2       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=1)
ops$tkyte@ORA920LAP> set autotrace on
 
 
 
 
 
Still FTS
Nitin, May       15, 2003 - 12:17 pm UTC
 
 
I replicated all the steps performed by you. And I am getting the following execution plan.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=55 Card=3515 Bytes=2
          70655)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=55 Card=3515 Bytes=270
          655)
   3    1     INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card
          =1 Bytes=13)
The db is 8.1.7.4. Is is possible that there is some init.ora parameter which is dictating the above FTS execution plan?
Thanks 
 
May       15, 2003 - 5:57 pm UTC 
 
 
that is fundementally the same plan I had -- so?
it needs to full scan, mine full scanned.  
tell me -- what would the "straw man plan" you think would be better be exactly? 
 
 
 
Still FTS
Nitin, May       15, 2003 - 7:12 pm UTC
 
 
The difference is that the optimizer is choosing to perform FTS on T1 (which is huge) and use index scan on T2 which is very small.
Regards
 
 
May       15, 2003 - 7:34 pm UTC 
 
 
ops$tkyte@ORA817DEV> set autotrace traceonly explain
ops$tkyte@ORA817DEV> select * from t1, ( select distinct object_id from t2 ) t2
  2  where t1.object_id = t2.object_id;
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=109)
   1    0   NESTED LOOPS (Cost=5 Card=1 Bytes=109)
   2    1     VIEW (Cost=3 Card=1 Bytes=13)
   3    2       SORT (UNIQUE) (Cost=3 Card=1 Bytes=13)
   4    3         INDEX (FULL SCAN) OF 'T2_IDX' (NON-UNIQUE)
   5    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=26793 Bytes=2572128)
   6    5       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=26793)
in 8i seems to be the way to go -- things change, the optimizer gets smarter 
 
 
 
 
What about with updates?
Brett, May       21, 2003 - 2:16 am UTC
 
 
Tom,
How does IN with a subquery affect performance of updates - is the subquery executed once for each row (which could be really inefficient), or once for the query. eg:
update table1
   set column_1 = 'Y'
 where row_id in (select row_id
                    from table1 t1,
                         table2 t2
                   where t1.f_key = t2.row_id
                     and t2.class_cd = 'Address'
                     and t2.created > sysdate - 30);
Is this subquery executed only once for the entire update? Or would it be better to use the subquery to insert the data into a temporary table with a unique constraint/primary key and then use an updatable view based on table1 and the temporary table? 
 
May       21, 2003 - 8:36 am UTC 
 
 
the subquery, since it is not a correlated subquery, would tend to be done "once"
Now, if t2.row_id is UNIQUE, and there is a unique/primary key on it I would code:
ops$tkyte@ORA920> update ( select column_1
  2             from t1, t2
  3                    where t1.f_key = t2.row_id
  4                      and t2.class_cd = 'Address'
  5                          and t2.created > sysdate-30 )
  6    set column_1 = 'Y'
  7  /
 
 
 
 
 
Are these same?
Nitin, May       22, 2003 - 12:48 pm UTC
 
 
Are the following SQL's similar? Will they retrieve same set of data under all circumstances?
1.
select 
APRCN_ID_BI, APPRAISAL_DT, RANK_ID 
from APPRECIATN_TBL WHERE
 (APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID)   in  ( SELECT 
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
  APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
  363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
  TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')
2.
select 
APRCN_ID_BI, APPRAISAL_DT, RANK_ID 
from APPRECIATN_TBL WHERE
EXISTS  ( SELECT 
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
  APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
  363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
  TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')
Regards
 
 
May       23, 2003 - 8:25 am UTC 
 
 
they are the same, but the first is less efficient by far.  a correlated "IN", wouldn't do that. 
 
 
 
query
mo, May       22, 2003 - 7:09 pm UTC
 
 
Tom:
Do you know why the first query does not produce any results while the other does:
select request_id from shipm_request where request_id not in (Select distinct request_id from shipment);
select request_id from shipm_request a where request_id not exists (select request_id from shipment b where a.request_id = b.request_id);
Table shipm_request(request_id) has one to many relationship with table shipment(shipment_id).
Thanks 
 
May       23, 2003 - 8:39 am UTC 
 
 
lookup upwards for this text:
No, partially because NOT IN and NOT EXISTS are not substitutes for eachother! 
and read that.  it is all about NULL 
 
 
 
Proof IN / EXISTS
Marcio, August    29, 2003 - 1:13 pm UTC
 
 
Tom, I did test to prove to my colleague the theory of IN/EXISTS exposed here (He just ignore it and ask to developers to use IN always -- to improve performance). 
But when I executed -- to me they are same!
take a look 
ops$t_mp00@MRP9I1> drop table small;
Table dropped.
ops$t_mp00@MRP9I1> drop table big;
Table dropped.
ops$t_mp00@MRP9I1> ed teste_in_exist
ops$t_mp00@MRP9I1> create table big as select * from all_objects;
Table created.
ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;
31317 rows created.
ops$t_mp00@MRP9I1> commit;
Commit complete.
ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;
62634 rows created.
ops$t_mp00@MRP9I1> commit;
Commit complete.
ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;
125268 rows created.
ops$t_mp00@MRP9I1> commit;
Commit complete.
ops$t_mp00@MRP9I1> create index big_idx on big(object_id);
Index created.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> create table small as select * from all_objects where rownum < 100;
Table created.
ops$t_mp00@MRP9I1> create index small_idx on small(object_id);
Index created.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /
Table analyzed.
ops$t_mp00@MRP9I1> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /
Table analyzed.
ops$t_mp00@MRP9I1> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
c:\migracao\loads>plus /
SQL*Plus: Release 9.2.0.2.0 - Production on Fri Aug 29 14:07:08 2003
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production
ops$t_mp00@MRP9I1> @teste_in_exist
ops$t_mp00@MRP9I1> rem create table big as select * from all_objects;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem create index big_idx on big(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem create table small as select * from all_objects where rownum < 100;
ops$t_mp00@MRP9I1> rem create index small_idx on small(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem analyze table big compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1> rem analyze table small compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = true;
Session altered.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
  2    from big
  3   where object_id in ( select object_id from small )
  4  /
COUNT(SUBOBJECT_NAME)
---------------------
                    0
1 row selected.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
  2    from big
  3   where exists ( select null from small where small.object_id = big.object_id )
  4  /
COUNT(SUBOBJECT_NAME)
---------------------
                    0
1 row selected.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
  2    from small
  3   where object_id in ( select object_id from big )
  4  /
COUNT(SUBOBJECT_NAME)
---------------------
                    0
1 row selected.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
  2    from small
  3   where exists ( select null from big where small.object_id = big.object_id )
  4  /
COUNT(SUBOBJECT_NAME)
---------------------
                    0
1 row selected.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = false;
Session altered.
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> set echo off
ops$t_mp00@MRP9I1> host tkprof e:\oracle\admin\mrp9i1\udump\mrp9i1_ora_1876.trc sys=n output=tkprof.txt explain=/
TKPROF: Release 9.2.0.2.1 - Production on Fri Aug 29 14:07:57 2003
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
ops$t_mp00@MRP9I1>
tkprof.txt
==========
select count(subobject_name)
  from big
 where object_id in ( select object_id from small )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      7.11       7.45          0       3424          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      7.13       7.47          0       3424          0           1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102  (OPS$T_MP00)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=3424 r=0 w=0 time=7457761 us)
    792   NESTED LOOPS SEMI (cr=3424 r=0 w=0 time=7454167 us)
 250536    TABLE ACCESS FULL BIG (cr=3422 r=0 w=0 time=1579621 us)
    792    INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2667651 us)(object id 40890)
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    NESTED LOOPS (SEMI)
 250536     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
    792     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX' 
                (NON-UNIQUE)
********************************************************************************
select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      6.99       6.99          0       3421          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      7.00       7.00          0       3421          0           1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102  (OPS$T_MP00)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=3421 r=0 w=0 time=6995659 us)
    792   NESTED LOOPS SEMI (cr=3421 r=0 w=0 time=6992099 us)
 250536    TABLE ACCESS FULL BIG (cr=3419 r=0 w=0 time=1473069 us)
    792    INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2557279 us)(object id 40890)
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    NESTED LOOPS (SEMI)
 250536     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
    792     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX' 
                (NON-UNIQUE)
********************************************************************************
select count(subobject_name)
  from small
 where object_id in ( select object_id from big )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.00          0        105          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.01          0        105          0           1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102  (OPS$T_MP00)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=105 r=0 w=0 time=7280 us)
     99   NESTED LOOPS SEMI (cr=105 r=0 w=0 time=6659 us)
     99    TABLE ACCESS FULL SMALL (cr=4 r=0 w=0 time=886 us)
     99    INDEX RANGE SCAN BIG_IDX (cr=101 r=0 w=0 time=3305 us)(object id 40888)
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    NESTED LOOPS (SEMI)
     99     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)
********************************************************************************
select count(subobject_name)
  from small
 where exists ( select null from big where small.object_id = big.object_id )
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.00          0        105          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        105          0           1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102  (OPS$T_MP00)
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=105 r=0 w=0 time=7255 us)
     99   NESTED LOOPS SEMI (cr=105 r=0 w=0 time=6643 us)
     99    TABLE ACCESS FULL SMALL (cr=4 r=0 w=0 time=863 us)
     99    INDEX RANGE SCAN BIG_IDX (cr=101 r=0 w=0 time=3243 us)(object id 40888)
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    NESTED LOOPS (SEMI)
     99     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)
To me they are same! 
Any thought?
 
 
August    29, 2003 - 1:17 pm UTC 
 
 
yes, the cbo is much much smarter then the rbo ever was, i should have caveated that.  using the rbo -- very different.  using the cbo -- all fast, you need not be as concerned about small things like this as much 
 
 
 
Query Result
AVS, September 03, 2003 - 11:10 pm UTC
 
 
Will the two queries metioned below fetch me the same result ?
Query : 1     SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM poc1  ord,
           fcih  us,
           polc1  ord_l2,
           fmv  mv1
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND EXISTS (SELECT 'x'
                     FROM polc1  ord_l,
                          fmv  mv2
                    WHERE ord_l.input_cont_type     IN  ('O', 'R', 'S1')
                AND ord_l.order_number = ord.order_number
                      AND ord_l.data_date    = ord.data_date
                      AND ord_l.sub_zone     = ord.sub_zone
                      AND ord_l.product_id = mv2.column_value
                      AND ord_l.action_id NOT IN ('C', 'R', 'S')
                      AND ord_l.status != 'X'
                      AND mv2.source_system = 'FCH'
                      AND (mv2.measure ='OSM00100' OR mv2.measure LIKE ('OSM0009%'))
                      AND mv2.column_name = 'PRODUCT_ID'
                      AND EXISTS (SELECT 'x'
                                    FROM fmv  mv3
                                   WHERE ord_l.sales_channel = mv3.column_value
                                     AND mv3.source_system = 'FCH'
                                     AND mv3.measure = mv2.measure
                                     AND mv3.column_name = 'SALES_CHANNEL'))
  GROUP BY person_id, measure, mgr_ouc;
Query 2 :    SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM poc1  ord,
           fcih  us,
           polc1  ord_l2,
           fmv  mv1
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND EXISTS (SELECT 'x'
                   FROM fmv  mv3
                   WHERE ord_l.sales_channel = mv3.column_value
                   AND mv3.source_system = 'FCH'
                   AND mv3.measure = mv1.measure
                   AND mv3.column_name = 'SALES_CHANNEL')
  GROUP BY person_id, measure, mgr_ouc;
 
 
September 04, 2003 - 8:52 am UTC 
 
 
i don't know, hows about you tell us what is different between them.  
they are pretty big - point out the salient changes and why you made them. 
 
 
 
Query Tuning (Rewriting).
AVS, September 05, 2003 - 9:22 am UTC
 
 
The Query 1 looks like 
SELECT XXXXXX FROM A,B,C,D       /* PART 1 of QUERY 1 */ 
WHERE CONDITION1
AND   CONDITION2
AND   CONDITION3
AND   CONDITION4
AND   CONDITION5
AND   CONDITION6
AND   CONDITION7
AND   CONDITION8
AND   CONDITION9
AND   CONDITION10
AND   CONDITION11
AND   CONDITION12
AND   CONDITION13
AND   CONDITION14
AND   CONDITION15
AND   EXISTS 
       (SELECT x FROM C,D        /* PART 2 of QUERY 1 */ 
        WHERE CONDITION1
        AND   CONDITION2
        AND   CONDITION3
        AND   CONDITION4
        AND   CONDITION5
        AND   CONDITION6
        AND   CONDITION7
        AND   CONDITION8
        AND   CONDITION9
        AND EXIST 
           (SELECT 'x' FROM C,D /* PART 3 of QUERY 1 */ 
            WHERE CONDITION1
            AND   CONDITION2
            AND   CONDITION3
            AND   CONDITION4 )
I found that the PART 2's (FIRST EXISTS) WHERE conditions is repeatation of PART1's condtions. Hence I want to eliminate it, which I've done in QUERY 2.
The PART1 of Query1 without EXISTS fetch me reults in 5 minutes but with SINGLE EXISTS (QUERY 2) it runs for 4 hours . 
So I wrote another yet query by creating a new table, NEWTAB
/*   CREATE TABLE NEWTAB AS 
SELECT * FROM fch
WHERE SOURCE_SYSTEM = 'FCH' 
and   COLUMN_NAME = 'SALES_COLUMN'  */
Query 3 :
    SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM PSTN_ORDER_C_1@qdwp ord,
           fch_user_id_hier@qdwp us,
           PSTN_ORDER_LINE_C_1@qdwp ord_l2,
           fch_measure_values@qdwp mv1, NEWTAB mv3
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND ord_l2.sales_channel = mv3.column_value
       AND mv1.measure = mv3.measure 
  GROUP BY person_id, mv1.measure, mgr_ouc;
This query (QUERY 3) also takes 4 hours. 
Hope you are now clear, what I want to achieve !
Please Help. 
 
September 05, 2003 - 4:17 pm UTC 
 
 
what do you want to achieve?  
help with what?  
if this is "please tune my query" -- not here, this is for followups on the original question. 
 
 
 
Query Rewriting
AVS, September 08, 2003 - 8:22 am UTC
 
 
Yes you are right I was trying to tune the query, for which I required your help. Since it consisted of removing of EXIST clause from one of the query I thought its continuation of the original subject(IN & EXISTS). If it was annoying I'm sorry.
 
 
 
Query Execution takes forever using IN operator
TS, November  07, 2003 - 2:52 pm UTC
 
 
Tom,
I 've a query like this:-
select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', '5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', '60973', '34987', '37586', '58985', '34874', '49161', '11489');
the name column which has an intermedia index  is being used during query execution but the index on ID is not being used.
The query takes a long time to execute. I tried using
RBO, but still was slow. Eventually I tried deleting the 
statistics using "analyze table1 delete statistics" and the
query was very fast. So,do I need to delete the statistics
to make it run faster(my only concern is this query might run faster after stats delete but some other query hitting the same table might become slower),or is there a way to rewrite the query to make it run faster?
Thanks for your help.
TS 
 
November  07, 2003 - 5:24 pm UTC 
 
 
well, a query with contains is going to use CBO no matter what (so, you never used RBO, cannot be done!)
Hows about you show us
a) tkprof AND autotrace traceonly explain of the query with stats
b) tkprof AND autorace traceonly explain of the query without stats
 
 
 
 
Followup: Query Execution takes forever using IN operator 
TS, November  12, 2003 - 11:25 am UTC
 
 
Tom,
Here 's the tkprof output running the query,
Without Stats:-
==============
select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
      ytes=88)
   2    1     DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)
from tkprof output:-
===================
EXPLAIN PLAN SET STATEMENT_ID='PLUS1583908' FOR select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0         92          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         92          0           0
With Stats:-
==========
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
      ytes=88)
   2    1     DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)
Statistics
----------------------------------------------------------
       1313  recursive calls
      1  db block gets
       1715  consistent gets
    205  physical reads
       9568  redo size
       6729  bytes sent via SQL*Net to client
       4967  bytes received via SQL*Net from client
     67  SQL*Net roundtrips to/from client
     16  sorts (memory)
      0  sorts (disk)
      0  rows processed
tkprof output:-
=============
select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0         33          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0         33          0           0
Thanks,
TS 
 
November  12, 2003 - 4:34 pm UTC 
 
 
i want the RAW tkprof with timings -- no explain plans (they can lie).  show me the real tkprof, making sure of course that timed statistics was ON 
 
 
 
Thanks for the insight
dharma, December  30, 2003 - 4:20 pm UTC
 
 
HI Tom,
I have a table like this(A fictitious though)
company_id    deptno    empno    status
1        10    1    A
1        10    2    A
1        10    3    P
1        20    4    A
1        20    5    A
1        30    6    A
1        30    7    P
2        10    8    A
2        10    9    A
2        20    10    P
2        30    11    A
Primary Key on company_id,deptno,empno
To get the depts that dont have employees with status=P.
Is there anyother way other than using not exists to bring the required rows
SELECT DISTINCT deptno FROM COMP_EMP ce
 WHERE NOT EXISTS (SELECT 1 FROM COMP_EMP ces
                    WHERE ce.company_id=ces.company_id
                      AND ce.deptno=ces.deptno
                      AND ces.status='P')
since it does a sub-query on the table comp_emp and comp_emp has a million rows, will "not exists" withstand the query.
BTW
Database is Oracle9i Enterprise Edition Release 9.2.0.4.0 
 
December  30, 2003 - 4:57 pm UTC 
 
 
select deptno, count( decode(status,'P',1) )
  from comp_emp
 group by deptno
having count( decode( status,'P',1 ) ) = 0;
will do it as well.  there are dozens of ways....
select deptno from comp_emp
minus
select deptno from comp_emp where status = 'P';
is another.... 
 
 
 
ORA-01415 too many distinct aggregate functions..
Naveen, January   02, 2004 - 6:23 am UTC
 
 
Hi Tom,
I have a select statement which has a IN clause like below:
Select <column list>
from <tablename>
where column1 in(1,2,3,..)
When i am testing it by giving more and more values in the IN clause it gave the above error. Why is there a limit for values that can be set in a IN clause. Is the above approach recommended. Is there a way to overcome this. Kindly throw some light on the issue.
Thank you
Nav. 
 
January   02, 2004 - 9:39 am UTC 
 
 
there is a documented limited on the number of OR's and inlist items.
use a subquery (you probably aren't using binds either are you!).
create global temporary table t ( x int ) on commit delete rows;
and then INSERT the values into t and query simply
select * from table where c in ( select * from t );
 
 
 
 
Thanks Tom
Naveen, January   02, 2004 - 11:50 am UTC
 
 
Hi Tom,
Yes you are correct. I am not using bind variables either. I am checking the limit of IN clause as i have to decide which approach i need to take. I'll keep this in mind and thanks again for immediate reply.
Thank you
Nav.  
 
January   02, 2004 - 1:46 pm UTC 
 
 
not using bind variables will be the largest mistake of your entire career!  rethink that. 
 
 
 
interesting ?
Fenng, February  03, 2004 - 6:39 am UTC
 
 
SQL> desc a
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X                                                  NUMBER(38)
 Y                                                  VARCHAR2(1)
SQL> desc b
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X1                                                 NUMBER(38)
 Y1                                                 VARCHAR2(1)
SQL> select x from a
  2   where x not in (select x from b);
no rows selected
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'A'
   3    1     FILTER
   4    3       TABLE ACCESS (FULL) OF 'B'
SQL> 
It's OK.But "x" is NOT table B's field.Would you like explain it for us ? Thanks a lot ! 
 
 
February  03, 2004 - 7:57 am UTC 
 
 
it is expected, done by design, part of the standard.
it is simply called a correlated subquery.
consider:
select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno )
                                                       ^^^^^^^^^^^
you expect that to work right?  well:
SQL> select x from a
  2   where x not in (select x from b);
<b>is not any different than:</b>
SQL> select x from a
  2   where x not in (select <b>A.X</b> from b); 
 
 
 
 
Thank you!
Fenng, February  03, 2004 - 9:06 pm UTC
 
 
Thank you very much~! 
 
 
How to optimize this query?
Tony, April     14, 2004 - 9:32 am UTC
 
 
Tom,
 I want the rows from table cust_trans that are not in both cust_mast and customer_master@db_link1 tables. The query is
    
SELECT a.cust_id, a.cust_name, a.trans_date 
FROM cust_trans  SUBPARTITION(sp001) a
WHERE( a.cust_id 
             NOT IN(SELECT b.cust_id 
                    FROM cust_mast SUBPARTITION(sp001) b )
        AND a.cust_id 
             NOT IN(SELECT c.cust_id 
                    FROM customer_master@db_link1 c )
      );
The table customer_master is accessed through db link and it has millions of records. Other tables also have millions of records. The subpartiton name is dynamically passed to the stored procedure. Thus I'm going to use dynamic cursor or Execute immediate. Is there any way to optimize this query?
 
 
April     14, 2004 - 10:51 am UTC 
 
 
got a query plan for us to look at? (set autotrace traceonly explain would be best) 
 
 
 
exists is NOT the same as join right?
A reader, April     28, 2004 - 4:55 am UTC
 
 
Hi
I always thought that an EXISTS can be converted to a JOIN until I find this query
select count(*)
FROM OM_WELCOME_PACK t
WHERE EXISTS (SELECT 1
                FROM OM_EMF e, LISTA_ORDENES l
               WHERE t.emf_ext_id=e.external_id
                 AND e.order_id=l.order_id
                 AND l.order_type_id=3)
  COUNT(*)
----------
        99
If I convert this to a join
select count(*)
FROM OM_WELCOME_PACK t, OM_EMF e, LISTA_ORDENES l
WHERE t.emf_ext_id=e.external_id
AND e.order_id=l.order_id
  COUNT(*)
----------
       163
I get duplicate rows, is this correct behaviour...? 
 
 
Jr. Oracle Developer
Dave, May       24, 2004 - 5:29 pm UTC
 
 
Hello Tom, 
I need to combine three select statement in script1/select1. 
Two columns (PCS, Speciality) of script1 should be base on the condition define in script 2 & script 3. 
Columns alias are "PCS" and "Specialty" 
PCS should be base on script two and Speciality should be base on script 3. 
Please have a look. 
Script 1: (Main Script) 
select a.ref_patient_fname|| ' , ' || a.ref_patient_lname "Patient Name", 
to_char( a.ref_patient_dob, 'mm/dd/yy' ) DOB, 
a.ref_patient_id1 "OVMC ID", 
d.site_name || ' , '|| b. clact_desc "PCS", 
d.site_name || ' , '|| b. clact_desc "Specialty", 
to_char(c.appt_date , 'HH:MI AM' ) Appointments 
from dhs_referrals a, 
dhs_clinical_activities b, 
dhs_appointments c, 
dhs_sites d 
where a.ref_clact_id_requested = b.clact_id and 
a.ref_clact_id_requesting = b.clact_id and 
c.appt_ref_id = a.ref_id and 
d.site_id = b.clact_site_id 
Script 2: 
select c.site_name || ' , '|| b. clact_desc "PCS" , a.ref_clact_id_requesting 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a 
where a.ref_clact_id_requesting = b.clact_id 
Script 3 
select c.site_name || ' , '|| b. clact_desc "Speciality" , a.ref_clact_id_requested 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a 
where a.ref_clact_id_requested = b.clact_id 
Any help will be appreciated. 
Dave
 
 
May       24, 2004 - 7:12 pm UTC 
 
 
don't understand the question.  I see three queries but I've not any idea what you are trying to do with them really? 
 
 
 
Jr. Oracle Developer
Dave, May       24, 2004 - 7:38 pm UTC
 
 
Sorry for inconvinince.
Actually, On my script1.
I have two columns "PCS" and "Speciality".
I need to call values for "PCS" and "Speciality" based on script 2 & script 3 respectively.
e.g:
select  c.site_name || ' , '|| b. clact_desc "PCS" 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requesting = b.clact_id 
will give me the values for "PCS" in script1.
and
select  c.site_name || ' , '|| b. clact_desc "Speciality"  
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requested = b.clact_id
will give me values for "Speciality" in Script1.
Please note: I used "PCS" and "Speciality" as a column aliases in all scripts. 
cheers,
Dave
 
 
May       24, 2004 - 11:44 pm UTC 
 
 
and what does:
I need to call values for "PCS" and "Speciality" based 
mean exactly?  I mean you have two spool files maybe, one full of column "X" and one full of column "Y".  Now what?  what is the logic to use that? 
 
 
 
Jr. Oracle Developer
Dave, May       25, 2004 - 12:44 pm UTC
 
 
No I do not have any spool file etc.
Here what I wanted to do.
e.g:
I have main script which is consists of three columns.
select a.DOB, 
b.clact_desc||' , '||c.site_name "PCS", 
b.clact_desc||' , '||c.site_name "Speciality"
from dhs_referrals a, dhs_clinical b, dhs_site c
where .....
In dhs_referrals table I also have two more columns called clact_requested_id & clact_requesting_id.
Here is the logic I wanted to put for column 2 & 3 respectively.
where b.clact_id = a.clact_requesting_id 
gives me the value for PCS. (Column 2 in the main script above)
or
where b.clact_id = a.clact_requested_id gives me the value for Speacility. (Column 3 in the main script above)
I need your help to add above two logic in the main select script.
 
 
May       25, 2004 - 12:58 pm UTC 
 
 
sorry -- i don't think we are going to get anywhere here.  Your terminology is just not anything I'm understanding.  I do not see the tie in between columns in a select and separate predicates.    
 
 
 
Join or sub-query
atul, August    20, 2004 - 5:01 am UTC
 
 
Hi,
What is fast Join or subquery??
Under what conditions??
Thanks,
Atul 
 
August    20, 2004 - 11:14 am UTC 
 
 
er?  does not compute. 
 
 
 
How to update a row if the row exist if not insert?
Vinod, September 23, 2004 - 10:09 pm UTC
 
 
Tom - I want to update a row if the row exist if not insert that row. Is there a simple way to do this? I was doing if exist then update if not insert but I would like to try it in one single DML if possible.
Vinod 
 
September 24, 2004 - 10:35 am UTC 
 
 
in theory you could use merge
merge into t
into ( select :x x, :y y, :z z from dual ) d
on ( t.x = d.x )
when matched then update set ...
when not matched then insert....
but 
update t set y = :y, z = :z where x = :x;
if (sql%rowcount = 0)
then
   insert into t (...) values (....);
end if;
it about the same in this case.
 
 
 
 
What about "=" and "LIKE" comparison operators
denni50, October   11, 2004 - 3:15 pm UTC
 
 
Hi Tom....Welcome back!....(we MISSED you!)
I know the discussion here involves NOT,EXISTS,NOT IN and
NOT EXISTS operators....however I recently stumbled upon a
situation where using the "=" comparison caused the 
system to come to a grinding halt. When I changed the
operator to LIKE 'A%' instead of = 'A' the system performed
normally. 
We recently converted test system to CBO and encountered
this situation for the first time when testing one of
our third party software modules where users are allowed
to insert additional sql statements. With RBO users would
enter: where col_name='A' and everything would work fine. 
This was discovered when investigating why this function performed normally during the first phase, then came to a grinding slowdown during the second phase and saw where some users entered col_name LIKE 'A%' and others had entered col_name ='A'(this is when the system would virtually stop). When I changed all the ='A' to LIKE 'A%'everything ran smoothly.
Have you ever encountered performance issues between = and
LIKE?
glad you made it "home" safe and sound.
 
 
 
October   11, 2004 - 4:56 pm UTC 
 
 
how do you gather statistics -- exact command.
when you do an autotrace traceonly explain on the two -- what do you see as differences?  is the CARD=xxxx part of the query even close to reality. 
 
 
 
OEM SQL ANALYZE
denni50, October   11, 2004 - 5:18 pm UTC
 
 
I ran a sql analyze through OEM on both statements
the cost was around 10450-14512 for both sql statements.
Can't post results due to proprietary constraints.
There's alot of nested loops and bitmap conversions
from rowids with the largest table(approx 8+ million
records) doing full table scans.
 
 
October   11, 2004 - 7:47 pm UTC 
 
 
cost is not meaningful.
the CARD= parts -- are they even *close* to reality.
given you stats method of collection (no histograms -- just one bucket with method_opt at its default) the optimizer is using "distributions and best guesses".
My guess is, 'A' has lots and lots of values.
where column = 'A' is guessing "too low" (no histograms)
where column like 'A%' is guessing "higher, closer to reality"
 
 
 
 
Gather statistics
denni50, October   11, 2004 - 5:21 pm UTC
 
 
I use the dbms_stats package:
gather_schema_stats cascade>=true
 
 
 
thanks Tom
denni50, October   12, 2004 - 9:43 am UTC
 
 
I understand what you're saying about the histograms.
While you were away I experimented and ran statistics(dbms_stats) on the schema and used the method_opt SIZE AUTO on Indexed Columns parameter to gather histogram stats. Upon completion everything I tried to run, simple sql statements and queries took forever to run. I would log into the software application and everything seemed to just hang. I deleted all the histograms, re-ran table and index stats only and performance went back to normal processing
except for this latest incident with the = and LIKE operators.
I'm still trying to learn and understand how histograms
affect and work within CBO. The other frustrating part
is that I'm dealing with code that is not mine so tuning
it is out of the question. All I can do is run my own
analyze and trace reports...send it to the vendor and
HOPE someone figures out the problem. 
I am going to re-run the trace reports through sqlplus
instead of OEM since the OEM does not show the card=
(at least I didn't see it unless it is represented under
 a different header).
I know "cost" is not what is important. Cost is just an
arbitrary number assigned to a particular query at a
particular time in a given environment that the
optimizer chooses as the lowest cost.
I've learned quite a bit in 2+ years but still have a
heck of alot more to learn and understand as far as CBO
is concerned.
 
 
 
Explain Plan
denni50, October   15, 2004 - 3:54 pm UTC
 
 
Tom
attached is the explain plan for the scenario I've been describing in the earlier threads here about the '=' and
'LIKE' operators.
(had to modify the names of tables,columns and indexes
 since this is not my code.)
Table1 has approx 3.2 million records
Table2 is a temporary table(just for this function) 
Table3 has 8+ million records.
Just by looking at the explain plan can you tell what
is going on. This is a new procedure that's been added
to a newer version of software that is causing the functionality to take 3 times longer to complete.
thanks
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=15825 Card=468 Bytes
          =30888)
   1    0   NESTED LOOPS (Cost=15825 Card=468 Bytes=30888)
   2    1     HASH JOIN (Cost=15824 Card=468 Bytes=28080)
   3    2       NESTED LOOPS (Cost=2714 Card=15279 Bytes=320859)
   4    3         SORT (UNIQUE)
   5    4           TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=
          2649 Card=15279 Bytes=229185)
   6    5             INDEX (RANGE SCAN) OF 'TABLE1_COL2_INDEX'
           (NON-UNIQUE) (Cost=439 Card=241176)
   7    3         INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQUE)
   8    2       VIEW (Cost=13084 Card=11085 Bytes=432315)
   9    8         FILTER
  10    9           SORT (GROUP BY) (Cost=13084 Card=11085 Bytes=31038
          0)
  11   10             NESTED LOOPS (Cost=10452 Card=230371 Bytes=64503
          88)
  12   11               TABLE ACCESS (FULL) OF 'TABLE3' (Cost=10450 C
          ard=230371 Bytes=5068162)
  13   11               INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQ
          UE)
  14    1     INDEX (UNIQUE SCAN) OF 'PK_TABLE1_ID' (UNIQUE)
 
 
 
Crazy behaviour!!
Ganesh, December  09, 2004 - 7:25 am UTC
 
 
create table emp (empno number(3), dno number(3))
/
create table dept (dno number(3))
/
select * from emp
where empno in
(select empno from dept)
/
I'm wondering how the above query works when there is no empno field in dept table?   
 
December  09, 2004 - 2:03 pm UTC 
 
 
that is not crazy, it is expected and called a correlated subquery.
select * 
  from emp 
 where emp.empno in ( select emp.empno
                        from dept );
makes more sense with the correlation names there.....
just like:
select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno );
the columns of the outer query are available in the inner queries. 
 
 
 
why is it so?
Ganesh, December  10, 2004 - 1:46 am UTC
 
 
Thanks Tom. I guessed that it would be correlated subquery.
By why does oracle apply its intelligence, when I dont mention explicitly emp.empno?
 
 
December  10, 2004 - 10:53 am UTC 
 
 
confusion -- it did apply "its intelligence"
your query was syntactically correct
your query was valid SQL
your query must be run as is
It did exactly what is it documented, expected and supposed to do.  It is not a "bad query", it is not a "mistake".  the only mistake here was that you didn't mean to run THAT query.
This is like asking us to raise an error because you query "where 1=0" or something.    
 
 
 
delete performance issue using 'IN'
A reader, December  13, 2004 - 12:16 pm UTC
 
 
Hi Tom,
I have the following SQL:
DELETE 
FROM nitsmedia 
WHERE nitstory_srl not in
(
 SELECT srl 
 FROM nitstory 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional'
);
It takes forever! 
Number of rows for inner query: 10 rows
Number of rows for outer query: 15289
 
However, the following are fast:
(1) The inner query:
SELECT srl 
 FROM nitstory 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional';
       SRL
----------
     70640
     70670
     70671
     70673
     70674
     70681
     70682
     70688
     70689
     70692
10 rows selected.
(2)
DELETE FROM nitsmedia WHERE nitstory_srl IN (70640,70670,70671,70673,70674,70681,70682,70688,70689,70692)
(3) DELETE FROM nitsmedia;
I remember this rule: 
If the result of the subquery is small -- then IN is typicaly more appropriate. However, this is not working this time!
Can you shed any light on why it is like this and what I should do?
Here are the info:
Oracle 9.2 on HP-UX
All tabled have been analyzed
Thanks,
- Arash
 
 
December  13, 2004 - 2:00 pm UTC 
 
 
are you using the CBO 
and is sr1 defined as NOT NULL
(and forget that rule of thumb, not really valid in CBO days... this dates way back to the time of the RBO... thinks change.) 
 
 
 
A reader, December  13, 2004 - 2:13 pm UTC
 
 
Yes Tom. I use the default which is 'choose' as optimizer mode. Srl is not null and there is foreign key between srl and nitstory_srl
and here is the explain plan:
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
          =1223040)
   1    0   DELETE OF 'NITSMEDIA'
   2    1     FILTER
   3    2       NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
          )
   4    3         TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
          89 Bytes=397514)
   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
          2 Card=1 Bytes=54)
   6    5           BITMAP CONVERSION (TO ROWIDS)
   7    6             BITMAP AND
   8    7               BITMAP CONVERSION (FROM ROWIDS)
   9    8                 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
  10    7               BITMAP CONVERSION (FROM ROWIDS)
  11   10                 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
          GIN' (NON-UNIQUE)
SQL> desc nitsmedia
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 NITSTORY_SRL                              NOT NULL NUMBER(12)
 MEDIAFILE_NME                                      VARCHAR2(100)
 MEDIA_TYP                                 NOT NULL VARCHAR2(20)
 THUMBNAIL                                          VARCHAR2(100)
 CAPTION                                            VARCHAR2(1000)
 POSITION                                           NUMBER
 LAYOUT                                             VARCHAR2(10)
SQL> desc nitstory
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 STORYFILE_NME                             NOT NULL VARCHAR2(30)
 AUTHOR                                             VARCHAR2(42)
 STORY_ORIGIN                                       VARCHAR2(10)
 CATEGORY                                           VARCHAR2(30)
 HEADLINE                                           VARCHAR2(200)
 SUBHEADLINE                                        VARCHAR2(200)
 KEYWORDS                                           VARCHAR2(200)
 HISTORY                                            VARCHAR2(200)
 DATELINE                                           VARCHAR2(100)
 HIGHLIGHT                                          VARCHAR2(1200)
 CREATION_TIME                                      VARCHAR2(30)
 LAST_MODIFIED_TIME                                 VARCHAR2(30)
 TIMEZONE                                           VARCHAR2(25)
 FOOTNOTES                                          VARCHAR2(1000)
 BODY                                               CLOB
I am not sure why there lots of "BITMAP CONVERSION" in plan?
Thank you Tom,
- Arash'
 
 
 
December  13, 2004 - 2:49 pm UTC 
 
 
can we see a tkprof of the "really slow one" 
 
 
 
A reader, December  14, 2004 - 11:23 am UTC
 
 
Tom,
Here is the tkprof for the query. I canceled the connection in the middle as I told you it takes forever. I also noticed that, it locks both tables 'nitsmedia' and 'nitstory'.
Thanks again for your help and outstanding website.
- Arash
DELETE
FROM nitsmedia
WHERE nitstory_srl not in
(
 SELECT srl
 FROM nitstory
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_mo
dified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY H
H24:MI:SS')
   AND story_origin = 'Regional'
)
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.11       0.11          0        323          0           0
Execute      1     51.59      52.22          0    2336957       2122           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     51.70      52.34          0    2337280       2122           0
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 101  (NI_DDL)
Rows     Execution Plan
-------  ---------------------------------------------------
      0  DELETE STATEMENT   GOAL: CHOOSE
      0   DELETE OF 'NITSMEDIA'
      0    FILTER
      0     TABLE ACCESS (FULL) OF 'NITSMEDIA'
      0     TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY'
      0      INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORIGIN'
                 (NON-UNIQUE)
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.11       0.11          0        323          0           0
Execute      1     51.59      52.22          0    2336957       2122           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     51.70      52.34          0    2337280       2122           0
Misses in library cache during parse: 1
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       29      0.06       0.04          0        147          0           0
Execute     76      0.02       0.01          0          0          0           0
Fetch      287      0.00       0.01          0        323          0         223
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      392      0.08       0.07          0        470          0         223
Misses in library cache during parse: 20
    1  user  SQL statements in session.
   29  internal SQL statements in session.
   30  SQL statements in session.
    1  statement EXPLAINed in this session.
********************************************************************************
Trace file: /db/9.2/dbo01/log/udump/dbo01_ora_11063.trc
Trace file compatibility: 9.00.01
Sort options: default
       1  session in tracefile.
       1  user  SQL statements in trace file.
      29  internal SQL statements in trace file.
      30  SQL statements in trace file.
      21  unique SQL statements in trace file.
       1  SQL statements EXPLAINed using schema:
           SYSTEM.prof$plan_table
             Default table was used.
             Table was created.
             Table was dropped.
     604  lines in trace file.
 
 
December  15, 2004 - 12:28 pm UTC 
 
 
sorry -- but a query won't lock any tables.
but something is fishy here.
if you run this script:
drop table emp;
create table emp as select * from scott.emp;
alter session set sql_trace=true;
select count(*) from emp;
then tkprof will say:
select count(*)
from
 emp
                                                                                                        
                                                                                                        ....
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     14    TABLE ACCESS (FULL) OF 'EMP'
but if you run this one:
drop table emp;
create table emp as select * from scott.emp;
exec dbms_stats.gather_table_stats( user, 'EMP' );
alter session set sql_trace=true;
select count(*) from emp;
tkprof says
select count(*)
from
 emp
                                                                                                        
....
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     14    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'EMP'
But before you showed us:
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
          =1223040)
   1    0   DELETE OF 'NITSMEDIA'
   2    1     FILTER
   3    2       NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
          )
   4    3         TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
          89 Bytes=397514)
   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
          2 Card=1 Bytes=54)
   6    5           BITMAP CONVERSION (TO ROWIDS)
   7    6             BITMAP AND
   8    7               BITMAP CONVERSION (FROM ROWIDS)
   9    8                 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
  10    7               BITMAP CONVERSION (FROM ROWIDS)
  11   10                 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
          GIN' (NON-UNIQUE)
which shows the use of the CBO (the cost= card= is there ONLY when the CBO is used)
so, I'm thinking you have a big mixture of ANALYZED and UNANALYZED tables here the and the optimizer doesn't stand a ghost's of a chance of getting the right answer.
EG: table you are deleting from is analyzed....
These tables you are querying are not (and must be -- if you want NOT IN to run "in your lifetime"
 
 
 
 
A reader, December  15, 2004 - 2:09 pm UTC
 
 
Tom,
All Tables are analyzed, however the problem still there.
ANALYZE TABLE NITSMEDIA COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;
ANALYZE TABLE NITSTORY COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;
I noticed something that may help.
I created two new tables like:
create table NITSTORY1 as select * from NITSTORY;
create table NITSMEDIA as select * from NITMEDIA;
and then ran the query with new table names:
DELETE 
FROM nitsmedia1 
WHERE nitstory_srl not in
(
 SELECT srl 
 FROM nitstory1 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional'
)
and it was very FAST. The only difference that I saw between the old tables and new ones was a foreign key between the old tables that new tables didn't have that. So I dropped the foreign key from old tables to make it in sync with the new tables. But still the delete is very slow on old tables.
Please let me know your idea.
Thanks,
- Arash
 
 
December  15, 2004 - 6:14 pm UTC 
 
 
don't use analyze.
use dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );
so, show us the tkprofs between the two.  (i've a feeling that not null column isn't really not null -- eg: maybe it says not null but someone put it on there with "novalidate" or something) 
 
 
 
A reader, December  16, 2004 - 9:55 am UTC
 
 
Hi Tom,
can you tell me why, when I recreate tables (create table NITSTORY1 as select * from NITSTORY....), it works well?
I also  test combination of new tables and old tables in the delete statement, and noticed , the problem is the inner table "NITSTORY' which makes the query slow. And when I replace this table with the new table 'NITSTORY1' , it works fine.
Here are the tables:
create table NITSTORY  (
   SRL                  NUMBER(12)                      not null,
   STORYFILE_NME        VARCHAR2(30)                    not null,
   AUTHOR               VARCHAR2(42),
   STORY_ORIGIN         VARCHAR2(10),
   CATEGORY             VARCHAR2(30),
   HEADLINE             VARCHAR2(200),
   SUBHEADLINE          VARCHAR2(200),
   KEYWORDS             VARCHAR2(200),
   HISTORY              VARCHAR2(200),
   DATELINE             VARCHAR2(100),
   HIGHLIGHT            VARCHAR2(1200),
   CREATION_TIME        VARCHAR2(30),
   LAST_MODIFIED_TIME   VARCHAR2(30),
   TIMEZONE             VARCHAR2(25),
   FOOTNOTES            VARCHAR2(1000),
   BODY                 CLOB,
   constraint PK_NITSTORY primary key (SRL)
);
create table NITSMEDIA  (
   SRL                  NUMBER(12)                      not null,
   NITSTORY_SRL         NUMBER(12)                      not null,
   MEDIAFILE_NME        VARCHAR2(100),
   MEDIA_TYP            VARCHAR2(20)                    not null,
   THUMBNAIL            VARCHAR2(100),
   CAPTION              VARCHAR2(1000),
   POSITION             NUMBER,
   LAYOUT               VARCHAR2(10),
   constraint PK_NITSMEDIA primary key (SRL)
);
alter table NITSMEDIA
   add constraint FK1_NITSTORY_NITSMEDIA foreign key (NITSTORY_SRL)
      references NITSTORY (SRL);
As always, thank you for your time,
- Arash
 
 
December  16, 2004 - 10:05 am UTC 
 
 
not without the tkprofs I asked for, no. 
 
 
 
A reader, December  16, 2004 - 11:19 am UTC
 
 
Tom. 
I used: 
dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );
and it fixed the problem. Why is that? Can you elaborate it a little more? 
I found that Analyze method from your site. But it seems it doesnÂ’t work well for 9.2!
Thank you again,
- Arash
 
 
December  16, 2004 - 11:46 am UTC 
 
 
you gathered table and index stats but no column stats -- that gathers table, column and index stats.
analyze table t compute statistics;
would do roughly the same thing, but dbms_stats is more proper. 
 
 
 
IN or EXISTS - back on topic !
DaPi, December  16, 2004 - 3:23 pm UTC
 
 
Hi Tom, 
Many thanks for the site.  I much appreciate your scientific/analytic approach.
This thread keeps popping to the top even after 4 years.  All you say is correct, but there is a problem:  I believe the question is not the right one to ask.
Real-life:  
I have something to code.  I do so to the best of my ability, keeping in mind clarity & maintainability.  I test to make sure I have the right result and, if the performance is appropriate, I give thanks and go on to the next task.  If it's slow (for the purpose), I quickly try some tricks I have up my sleeve and, if still slow, begin an analysis of what is going "wrong".  I can't afford the time to analyse - not ANALYZE :) - every query.
I think the right question to ask is:
Q:  I can code my query with IN or with EXISTS; which should I try first?
In 5 years of Rdb and 5 years of Oracle, I developed the habit of coding EXISTS.  In those 5 Oracle years I have recoded EXISTS as IN only twice (I'm pretty sure of this because it's been so rare).  A colleague with the IN habit, would give me a slow query to tune every week or two; I'd just recode the IN as EXISTS.
OK, I have no statistics, but in my real-life experience the answer is:  
A:  Try EXISTS first.  You may have to recode, but not very often.
Waiting to be blasted - DaPi
 
 
December  16, 2004 - 3:51 pm UTC 
 
 
A: use the cbo and it'll try out both.
No blasting, I use exists so infrequently and in almost all of the time myself. 
 
 
 
IN & EXISTS - CBO tries both?
DaPi, December  16, 2004 - 4:04 pm UTC
 
 
You said: "A: use the cbo and it'll try out both."
I've been running CBO (7.3.4 then 8.0.6 then 8.1.7) with up-to-date stats all those 5 years - it must miss a trick from time to time, because recoding HAS made a difference in some cases.
I'd agree that in many cases there is NO performance difference between IN and EXISTS (when the CBO gets it right?).  But if there is going to be a difference, I'd put my money on EXISTS.
Thanks - DaPi 
 
December  16, 2004 - 5:10 pm UTC 
 
 
select * 
  from big_table
 where exists ( select null from small_table where small_table.x = big_table.x)
vs
select * from big_table
where x in ( select x from small_table )
give me the IN any time (in rbo days) 
 
 
 
IN vs EXISTS
DaPi, December  17, 2004 - 3:17 am UTC
 
 
OK - I'll buy that (confirmed by experiment under CBO).
I guess it's that my real-life experience tends not to deal with small tables . . . . and with big tables it hurts even more when you guess wrong! 
 
December  17, 2004 - 7:58 am UTC 
 
 
doesn't even have to be small tables, big tables that return smallish results
select * from big_table
where x in ( select x from really_huge_table where <some predicate>)
if some predicate returned just 100 rows from really huge table, IN would have been preferred :) 
 
 
 
Logan, December  26, 2004 - 2:21 am UTC
 
 
   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop
-----------------------------------------------------------
The explanation you gave for IN & EXISTS being better and in which situation is convincing, but the example loop you gave...
1.Well when you use exists the SQL always returns only one row, is that true i.e. when you used exists to find whether a conditions is true or false. For examples 
select decode( count(*), 1, 'y', 'n')
from dual where exists ( select null from t1 , t2 where 
                         where t1.x = t2.x)
I guess you cannot do such a sql using IN?
2. My question is doesnt EXISTS as the name of the clause itself suggests checks for the existense of a condition. Lets say if 10 rows meet that condition, does the exists clause read all the 10 rows, or it returns the result after meeting the condition for the first time. 
 
December  26, 2004 - 12:40 pm UTC 
 
 
I was giving psuedo code here to explain the concept.  
exists and in can both short circut since the CBO will turn in into exists and exists into in as needed these days.  so, it is safe to say that either can stop after the first one -- or get them all and semijoin. 
 
 
 
Logan, December  26, 2004 - 9:37 pm UTC
 
 
--------------
it is safe to say that either can stop 
after the first one -- or get them all and semijoin
-----------
1.Tom , can you please illustrate the same. Can you give an example where an EXISTS stops after the first row, and where it gets all the rows.
2.What do you mean by a semi join? 
 
December  27, 2004 - 9:39 am UTC 
 
 
it is that:
select * from t1 where EXISTS ( select NULL from t2 where t2.id = t1.id )
can be written as:
select * from t1 where id in ( select id from t2 );
and the optimizer is free to (re)write it either way.
consider:
ops$tkyte@ORA9IR2> create table t1( id int, y char(2000));
Table created.
 
ops$tkyte@ORA9IR2> create table t2( id int );
Table created.
 
ops$tkyte@ORA9IR2> delete from plan_table;
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
--------------------
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |       |
|*  1 |  FILTER              |             |       |       |       |
|   2 |   TABLE ACCESS FULL  | T1          |       |       |       |
|*  3 |   TABLE ACCESS FULL  | T2          |       |       |       |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter( EXISTS (SELECT 0 FROM "T2" "T2" WHERE "T2"."ID"=:B1))
   3 - filter("T2"."ID"=:B1)
 
Note: rule based optimization
 
17 rows selected.
<b>that shows Oracle doing the "where exists" as a filter -- for every row in T1 it'll run that filter (which results in a full scan of T2 -- but stops after it hits the first "t2.id = :b1" -- so it'll not totally full scan the table each time PERHAPS, just scans till it gets a hit)
But, lets use the more intelligent CBO and tell it about our tables (tell it "t1" is medium size, "t2" is big)</b>
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 5000, numblks=>500);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 50000, numblks=>5000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
 
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
 
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  5000 |   551K|   854 |
|*  1 |  HASH JOIN SEMI      |             |  5000 |   551K|   854 |
|   2 |   TABLE ACCESS FULL  | T1          |  5000 |   488K|    78 |
|   3 |   TABLE ACCESS FULL  | T2          | 50000 |   634K|   760 |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("T2"."ID"="T1"."ID")
 
Note: cpu costing is off
 
16 rows selected.
 
ops$tkyte@ORA9IR2>
<b>Now, it is doing a single full scan of T1 and of T2 (instead of a full scan of T2 for EVERY ROW IN T1) followed by that "semi join" thingy (which I wrote about in Effective Oracle by Design in detail)</b>
The semi join is a join that joins T1 with T2 but only ONCE (if id=55 is in T2 5,000 times -- we don't join the row in T1 with id=55 to each on -- only to the first).  A semi join is a "where exists" style join, just join to the first occurrence
But - we built the entire set.  We read all of T2, got all of the rows and are joining (semi joining) to them
 
 
 
 
 
how to use index in exists statement?
A reader, March     25, 2005 - 12:53 am UTC
 
 
Tom,
I have the delete on the large table pa_budget_lines.
I want to use the index on resource_assignment_id on the table pa_budget_lines. It seams impossible?
thanks a lot!
DELETE FROM pa_budget_lines bl
where EXISTS (SELECT  'X'
      FROM   pa_resource_assignments ra,
             pa_budget_versions bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bl.resource_assignment_id = ra.resource_assignment_id
          AND  bv.budget_version_id = ra.budget_version_id)
/ 
 
March     25, 2005 - 7:48 am UTC 
 
 
are you using the cbo. 
 
 
 
A reader, March     25, 2005 - 9:02 pm UTC
 
 
"
are you using the cbo. 
"
Yes, SIr. it the cost based opt on 8174. 
 
March     26, 2005 - 9:08 am UTC 
 
 
then lets see the autotrace traceonly explain output.  perhaps the index would not be useful. 
 
 
 
A reader, March     26, 2005 - 11:17 am UTC
 
 
the autotrace as the follow.
there 2 millions rows in pa_budget_lines , 164k in pa_resource_assignments and 13k in pa_budget_versions .
APPS@S0P2DBT3> DELETE FROM pa_budget_lines bl
  2  where EXISTS (SELECT  'X'
  3        FROM   pa_resource_assignments ra,
  4            pa_budget_versions bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bl.resource_assignment_id = ra.resource_assignment_id
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
432 rows deleted.
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=11837 Card=102454 Bytes=2868712)
   1    0   DELETE OF 'PA_BUDGET_LINES'
   2    1     FILTER
   3    2       TABLE ACCESS (FULL) OF 'PA_BUDGET_LINES' (Cost=11837 Card=102454 Bytes=2868712)
   4    2       NESTED LOOPS (Cost=4 Card=1 Bytes=24)
   5    4         TABLE ACCESS (BY INDEX ROWID) OF 'PA_RESOURCE_ASSIGNMENTS' (Cost=3 Card=1 Bytes=10)
   6    5           INDEX (UNIQUE SCAN) OF 'PA_RESOURCE_ASSIGNMENTS_U1' (UNIQUE) (Cost=2 Card=2)
   7    4         TABLE ACCESS (BY INDEX ROWID) OF 'PA_BUDGET_VERSIONS' (Cost=1 Card=1085 Bytes=15190)
   8    7           INDEX (UNIQUE SCAN) OF 'PA_BUDGET_VERSIONS_U1' (UNIQUE)
Statistics
----------------------------------------------------------
       1329  recursive calls
       8377  db block gets
    1314941  consistent gets
      84134  physical reads
     586996  redo size
        515  bytes sent via SQL*Net to client
        939  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        432  rows processed
APPS@S0P2DBT3> 
APPS@S0P2DBT3> rollback
  2  /
Rollback complete.
 
 
March     26, 2005 - 12:30 pm UTC 
 
 
just explain this query (you don't need run it, just explain plan it)
ops$tkyte@ORA9IR2> DELETE FROM t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   t2 ra,
  5               t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
 
 
 
 
 
 
A reader, March     26, 2005 - 1:01 pm UTC
 
 
apps@s0p2dbt3> explain plan for 
  2  DELETE FROM pa_budget_lines bl
  3    where resource_assignment_id in
  4    (SELECT ra.resource_assignment_id
  5          FROM   pa_resource_assignments ra,
  6                 pa_budget_versions bv
  7            WHERE  bv.current_flag <> 'Y'
  8              AND  bv.budget_status_code = 'B'
  9              AND  bv.baselined_date IS NOT NULL
 10              AND  bv.budget_version_id = ra.budget_version_id)
 11  /
Explained.
apps@s0p2dbt3> 
apps@s0p2dbt3> @plan
QUERY_PLAN
------------------------------------------------------------
  DELETE  PA_BUDGET_LINES
    HASH JOIN
      VIEW  VW_NSO_1
        SORT UNIQUE
          HASH JOIN
            TABLE ACCESS FULL PA_BUDGET_VERSIONS
            TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
      TABLE ACCESS FULL PA_BUDGET_LINES
 
 
March     26, 2005 - 1:10 pm UTC 
 
 
are you sure the indices are in place?  what happens with the following scripts?
(safe if you do not have t1, t2, t3)
drop table t1;
drop table t2;
drop table t3;
                                                                                                                         
create table t1 ( resource_assignment_id int, data char(80) );
create table t2 ( resource_assignment_id int, budget_version_id int, data char(80) );
create table t3 ( current_flag varchar2(1), budget_status_code varchar2(10),
                  baselined_date date, budget_version_id int, data char(80) );
                                                                                                                         
exec dbms_stats.set_table_stats( user, 'T1', numrows=> 2000000, numblks => 12000 );
exec dbms_stats.set_table_stats( user, 'T2', numrows=> 164000, numblks => 1000 );
exec dbms_stats.set_table_stats( user, 'T3', numrows=> 13000, numblks => 500 );
                                                                                                                         
create index t1_idx on t1(resource_assignment_id);
create UNIQUE index t2_idx on t2(resource_assignment_id);
create UNIQUE index t3_idx on t3(budget_version_id);
                                                                                                                         
exec dbms_stats.set_index_stats( user, 'T1_IDX', numrows=> 2000000, numlblks => 1200 );
exec dbms_stats.set_index_stats( user, 'T2_IDX', numrows=> 164000, numlblks => 100 );
exec dbms_stats.set_index_stats( user, 'T3_IDX', numrows=> 13000, numlblks => 50 );
                                                                                                                         
set autotrace on explain
DELETE FROM t1 bl
where EXISTS (SELECT  'X'
      FROM   t2 ra,
             t3 bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bl.resource_assignment_id = ra.resource_assignment_id
          AND  bv.budget_version_id = ra.budget_version_id)
/
DELETE FROM t1 bl
where resource_assignment_id in
(SELECT ra.resource_assignment_id
      FROM   t2 ra,
             t3 bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bv.budget_version_id = ra.budget_version_id)
/
set autotrace off
 
 
 
 
A reader, March     26, 2005 - 3:14 pm UTC
 
 
apps@s0p2dbt3> create table pa.t2 as select * from pa.pa_resource_assignments;
Table created.
apps@s0p2dbt3>  create table pa.t3 as select * from pa.pa_budget_versions ;
Table created.
apps@s0p2dbt3> create table pa.t1 as select * from pa.pa_budget_lines;
Table created.
apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T1', numrows=> 2000000, numblks => 1
2000 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T3', numrows=> 13000, numblks => 500
 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3>   exec dbms_stats.set_table_stats( 'PA', 'T2', numrows=> 164000, numblks => 
1000 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3>create index pa.t1_idx on pa.t1(resource_assignment_id) tablespace pax;
Index created.
SQL> create UNIQUE index pa.t2_idx on pa.t2(resource_assignment_id) tablespace pax;
create UNIQUE index pa.t3_idx on pa.t3(budget_version_id) tablespace pax;
                                                    
Index created.
SQL> 
Index created.
apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T1_IDX', numrows=> 2000000, numlblks
 => 1200 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T2_IDX', numrows=> 164000, numlblks 
=> 100 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T3_IDX', numrows=> 13000, numlblks =
> 50 );
PL/SQL procedure successfully completed.
apps@s0p2dbt3> set autotrace on explain
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where EXISTS (SELECT  'X'
  3        FROM   pa.t2 ra,
  4               pa.t3 bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bl.resource_assignment_id = ra.resource_assignment_id
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
432 rows deleted.
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1201 Card=100000 Byt
          es=1300000)
   1    0   DELETE OF 'T1'
   2    1     INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201 Ca
          rd=100000 Bytes=1300000)
   3    2       NESTED LOOPS (Cost=3 Card=1 Bytes=67)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=2 Card=1
           Bytes=26)
   5    4           INDEX (UNIQUE SCAN) OF 'T2_IDX' (UNIQUE) (Cost=1 C
          ard=1640)
   6    3         TABLE ACCESS (BY INDEX ROWID) OF 'T3' (Cost=1 Card=1
           Bytes=41)
   7    6           INDEX (UNIQUE SCAN) OF 'T3_IDX' (UNIQUE)
apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /
Rollback complete.
apps@s0p2dbt3> 
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   pa.t2 ra,
  5               pa.t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
432 rows deleted.
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1474 Card=32800000 B
          ytes=852800000)
   1    0   DELETE OF 'T1'
   2    1     MERGE JOIN (Cost=1474 Card=32800000 Bytes=852800000)
   3    2       INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201
          Card=2000000 Bytes=26000000)
   4    2       SORT (JOIN) (Cost=273 Card=1640 Bytes=21320)
   5    4         VIEW OF 'VW_NSO_1' (Cost=266 Card=1640 Bytes=21320)
   6    5           SORT (UNIQUE) (Cost=266 Card=1640 Bytes=109880)
   7    6             HASH JOIN (Cost=246 Card=1640 Bytes=109880)
   8    7               TABLE ACCESS (FULL) OF 'T3' (Cost=78 Card=1 By
          tes=41)
   9    7               TABLE ACCESS (FULL) OF 'T2' (Cost=153 Card=164
          000 Bytes=4264000)
apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /
Rollback complete. 
 
 
March     26, 2005 - 4:30 pm UTC 
 
 
looks like it could be a limitation of the very old software.  I need to go way back to 8i to reproduce this behavior.
suggest you might temporarily try two stepping this, eg:
tkyte@ORA8IW> create global temporary table gtt as
  2  (SELECT ra.resource_assignment_id
  3        FROM   t2 ra
  4          WHERE  1=0)
  5  /
Table created.
tkyte@ORA8IW> exec dbms_stats.set_table_stats( user, 'GTT', numrows => 500 );
PL/SQL procedure successfully completed.
the above two things are done ONCE, then:
tkyte@ORA8IW> insert into gtt
  2  (SELECT ra.resource_assignment_id
  3        FROM   t2 ra,
  4               t3 bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bv.budget_version_id = ra.budget_version_id);
tkyte@ORA8IW> delete from t1 where resource_assignment_id in ( select * from gtt );
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=519 Card=10000000 Bytes=260000000)
   1    0   DELETE OF 'T1'
   2    1     NESTED LOOPS (Cost=519 Card=10000000 Bytes=260000000)
   3    2       VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500)
   4    3         SORT (UNIQUE) (Cost=19 Card=500 Bytes=6500)
   5    4           TABLE ACCESS (FULL) OF 'GTT' (Cost=16 Card=500 Bytes=6500)
   6    2       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=2000000 Bytes=26000000)
tkyte@ORA8IW> set autotrace off 
 
 
 
Purge huge table
A reader, April     02, 2005 - 7:31 am UTC
 
 
Tom,
"
looks like it could be a limitation of the very old software.  I need to go way 
back to 8i to reproduce this behavior.
suggest you might temporarily try two stepping this, eg:
"
I should give you the what I am trying to accomplish here.
We want to purge a table which has 70 million rows about 10 GB. on critira to partition the table. the goal is to remove 60 million rows from it based the critiras. 
After reading your thread:
</code>  
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6407993912330  <code>
It is no pratical to delete the table by the conventional method.
So i am working on based on
create tmp table with nologging;
insert table with append, and parallel;
drop original table;
rename tmp to original;
recreate indexes and grants;
done.
the insert 60 millions took 20 hours. 
I would like your comment on the temp table, if the temp suitable to the massive inserts. what are needed to be change on the table:
CREATE TABLE PURGE_TMP_S
(
  RESOURCE_ASSIGNMENT_ID    NUMBER(15)          NOT NULL,
  START_DATE                DATE                NOT NULL,
  LAST_UPDATE_DATE          DATE                NOT NULL,
  LAST_UPDATED_BY           NUMBER              NOT NULL,
  CREATION_DATE             DATE                NOT NULL,
  CREATED_BY                NUMBER              NOT NULL,
  LAST_UPDATE_LOGIN         NUMBER              NOT NULL,
  END_DATE                  DATE                NOT NULL,
  PERIOD_NAME               VARCHAR2(30),
  QUANTITY                  NUMBER,
  RAW_COST                  NUMBER,
  BURDENED_COST             NUMBER,
  REVENUE                   NUMBER,
  CHANGE_REASON_CODE        VARCHAR2(30),
  DESCRIPTION               VARCHAR2(255),
  ATTRIBUTE_CATEGORY        VARCHAR2(30),
  ATTRIBUTE1                VARCHAR2(150),
  ATTRIBUTE2                VARCHAR2(150),
  ATTRIBUTE3                VARCHAR2(150),
  ATTRIBUTE4                VARCHAR2(150),
  ATTRIBUTE5                VARCHAR2(150),
  ATTRIBUTE6                VARCHAR2(150),
    ATTRIBUTE14               VARCHAR2(150),
  ATTRIBUTE15               VARCHAR2(150),
  RAW_COST_SOURCE           VARCHAR2(2),
  BURDENED_COST_SOURCE      VARCHAR2(2),
  QUANTITY_SOURCE           VARCHAR2(2),
  REVENUE_SOURCE            VARCHAR2(2),
  PM_PRODUCT_CODE           VARCHAR2(30),
  PM_BUDGET_LINE_REFERENCE  VARCHAR2(25),
  COST_REJECTION_CODE       VARCHAR2(30),
  REVENUE_REJECTION_CODE    VARCHAR2(30),
  BURDEN_REJECTION_CODE     VARCHAR2(30),
  OTHER_REJECTION_CODE      VARCHAR2(30),
  CODE_COMBINATION_ID       NUMBER(15),
  CCID_GEN_STATUS_CODE      VARCHAR2(1),
  CCID_GEN_REJ_MESSAGE      VARCHAR2(2000),
  REQUEST_ID                NUMBER
)
TABLESPACE PAD
PCTUSED    40
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            INITIAL          40K
            NEXT             40K
            MINEXTENTS       1
            MAXEXTENTS       10000000
            PCTINCREASE      0
            FREELISTS        1
            FREELIST GROUPS  1
            BUFFER_POOL      DEFAULT
           )
NOLOGGING 
NOCACHE
NOPARALLEL;
Thanks a lot Tom.  
 
April     02, 2005 - 9:24 am UTC 
 
 
the insert of 60million records took 20hours?
what kind of hardware/disk are we talking here.  That is abnormally slow.
did you do a subset of it with trace to see what is taking so long? 
 
 
 
A reader, April     02, 2005 - 7:33 am UTC
 
 
"We want to purge a table which has 70 million rows about 10 GB. on critira to 
partition the table. "
i meant there is no critira for implement partitions on the table.
sorry for the typo. 
 
 
A reader, April     02, 2005 - 11:31 am UTC
 
 
"
what kind of hardware/disk are we talking here.  That is abnormally slow.
did you do a subset of it with trace to see what is taking so long?
" 
it is Sun solaris 480 with 4 CPU. disk is raid 5. and there are some other applications running on the machine also.
I had it with the parallel 2 and did not set the sqltrace on it.
here is from statspack:
  Buffer Gets    Executions  Gets per Exec  % Total  Hash Value
--------------- ------------ -------------- ------- ------------
     29,014,953            1   29,014,953.0      25120040                
insert
/*+ append */ into pa.pa_budget_lines_tmp_s pbls  select
* from pa.pa_budget_lines pbl where pbl.resource_assignment_id i
n (SELECT ra.resource_assignment_id         FROM   pa.pa_resourc
e_assignments ra,                pa.pa_budget_versions bv
    WHERE  bv.current_flag <> 'Y'             AND  bv.budget_sta
 
 
April     02, 2005 - 11:54 am UTC 
 
 
that is Oracle applications, do you have support working with you if you are planning on doing stuff to those tables with SQL?
is is the QUERY that takes 20 hours or the insert?
If you explain plan the insert and the select, do they use the same plan.
raid 5 = not fast stuff for direct path writing.  but not 20 hours slow. 
 
 
 
A reader, April     02, 2005 - 12:09 pm UTC
 
 
"
that is Oracle applications, do you have support working with you if you are planning on doing stuff to those tables with SQL?
"
Yes, it's oracle apps.
We want to present oracle support what we can do for the purge of PA tables, since the support has not been able to provide the method. we test the purge on our Test environment.
"
is is the QUERY that takes 20 hours or the insert?
"
Yes.
"
If you explain plan the insert and the select, do they use the same plan.
"
apps@s0p2dbt1> explain plan for 
  2  insert /*+ append parallel(pbls,4)*/ 
  3  into pa.pa_budget_lines_tmp_s pbls
  4   select /*+ parallel(pbl, 4) */
  5  * from pa.pa_budget_lines pbl
  6  where pbl.resource_assignment_id in 
  7  (SELECT ra.resource_assignment_id
  8          FROM   pa.pa_resource_assignments ra,
  9                 pa.pa_budget_versions bv
 10            WHERE  bv.current_flag <> 'Y'
 11              AND  bv.budget_status_code = 'B'
 12              AND  bv.baselined_date IS NOT NULL
 13              AND  bv.budget_version_id = ra.budget_version_id)
 14  /
Explained.
apps@s0p2dbt1> @plan
Sat Apr 02                                                                      
                 
QUERY_PLAN
------------------------------------------------------------
  LOAD AS SELECT
    NESTED LOOPS
      VIEW  VW_NSO_1
        SORT UNIQUE
          HASH JOIN
            TABLE ACCESS FULL PA_BUDGET_VERSIONS
            TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
      TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
        INDEX RANGE SCAN PA_BUDGET_LINES_N2
9 rows selected.
10 rows deleted.
apps@s0p2dbt1> explain plan for 
  2   select /*+ parallel(pbl, 4) */
  3  * from pa.pa_budget_lines pbl
  4  where pbl.resource_assignment_id in 
  5  (SELECT ra.resource_assignment_id
  6          FROM   pa.pa_resource_assignments ra,
  7                 pa.pa_budget_versions bv
  8            WHERE  bv.current_flag <> 'Y'
  9              AND  bv.budget_status_code = 'B'
 10              AND  bv.baselined_date IS NOT NULL
 11              AND  bv.budget_version_id = ra.budget_version_id)
 12  /
Explained.
apps@s0p2dbt1> @plan
Sat Apr 02                                                                      
                                                           
QUERY_PLAN
------------------------------------------------------------
  NESTED LOOPS
    VIEW  VW_NSO_1
      SORT UNIQUE
        HASH JOIN
          TABLE ACCESS FULL PA_BUDGET_VERSIONS
          TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
    TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
      INDEX RANGE SCAN PA_BUDGET_LINES_N2
8 rows selected.
9 rows deleted.
apps@s0p2dbt1> 
 
 
April     02, 2005 - 1:13 pm UTC 
 
 
"
is is the QUERY that takes 20 hours or the insert?
"
Yes.
Hmmm
Q: Is it raining or sunny
A: Yes
Must be a sunshower.  So, still don't know which it is.
use autotrace traceonly explain, you'll get the cost/card values which are vital. 
 
 
 
A reader, April     02, 2005 - 1:34 pm UTC
 
 
Q: Is it raining or sunny
A: Yes
sorry i did not read your question carefully.
Yes, I mean 20 hrs for inserts. the query run 30 minutes.
I found the parallelism does not work here:
insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/
And The parallelism does not work for
insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select 
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/
The parallelism does not work for 
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/
The parallelism only work for:
select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
init.ora 
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 25
Sat Apr 02                                                                                                                 page    2
 
NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 
it's 8174 db.
 
 
 
April     02, 2005 - 1:36 pm UTC 
 
 
so, given that it would not normally take 19.5 hours to direct path write that data.... I'd suggest sql tracing on a sizable subset of the data and see where the bottleneck is.  Just do it serially, easier to trace and understand. 
 
 
 
A reader, April     02, 2005 - 2:24 pm UTC
 
 
Tom, 
i will trace it. before doing that i would like the parallelism does not for the inserts and the query as i refered. it is only working for
select /*+ parallel(t,4) */ * from pa.pa_budget_lines t;
 
 
April     02, 2005 - 2:28 pm UTC 
 
 
not everything is parallelizable.  and given the plans -- hash join used in a NL, I would guess that parallelizing the access to the outer table (which is now being picked up by index) wouldn't make sense. 
 
 
 
maybe extent management ...
Alberto Dell'Era, April     02, 2005 - 3:07 pm UTC
 
 
from the review "Purge huge table" by "A Reader":
CREATE TABLE PURGE_TMP_S ...
STORAGE    (
            INITIAL          40K
            NEXT             40K
            ...
           )
if the tablespace is dictionary managed, parallel or not, most of the time will be spent changing the data dictionary and logging the changes ;)
Especially since the rows appear to be quite long, say 1k, so 1 extent for every 40 rows, 60,000,000 rows -> 1,500,000 extents looks a tad too much.
 
 
April     02, 2005 - 3:09 pm UTC 
 
 
laughing out loud, didn't even see that ;)
yes, that would be considered a tad small. 
 
 
 
A reader, April     02, 2005 - 7:53 pm UTC
 
 
from the review "Purge huge table" by "A Reader":
CREATE TABLE PURGE_TMP_S ...
STORAGE    (
            INITIAL          40K
            NEXT             40K
            ...
           )
"
if the tablespace is dictionary managed, parallel or not, most of the time will 
be spent changing the data dictionary and logging the changes ;)
Especially since the rows appear to be quite long, say 1k, so 1 extent for every 
40 rows, 60,000,000 rows -> 1,500,000 extents looks a tad too much.
"
Forgot to metioned that. It is local managed tablespace in Oracle Applications 11.5.7.
and the temp table was created as
create table temptable as select * from pa.table;
The PA table is the orginal apps table. 
Tom, that was the reason i asked your comment on the temp table. 
 
 
to "A Reader"
Alberto Dell'Era, April     03, 2005 - 11:29 am UTC
 
 
>and the temp table was created as
>create table temptable as select * from pa.table;
(i guess you meant "create table temptable as select * from pa.table WHERE 1=0", right ?) 
By "temptable", you mean pa.pa_budget_lines_tmp_s, correct ? 
Btw, what's the output of
select  TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent, next_extent from dba_tablespaces where TABLESPACE_NAME= (select tablespace_name from dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s'));
select tablespace_name, initial_extent, next_extent, pct_increase from dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s');
And i would be curious to know what is the CREATE PURGE_TMP_S statement you gave above, if you created the table as a create-table-as-select ...  
 
 
A reader, April     03, 2005 - 7:22 pm UTC
 
 
"
(i guess you meant "create table temptable as select * from pa.table WHERE 1=0", 
right ?) 
"
that's correct. 
create table pa.pa_budget_lines_tmp_s as select * from pa.pa_budget_lines where 1=2
/
the temp table was created in the PA data tablespace PAD.
we are working on Apps 11i. we do not want to the original table definitions/storge, because the apps may become no supportable by Oracle.
APPS@S0P2DBT1> select  TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent, 
  2  next_extent from dba_tablespaces where TABLESPACE_NAME= (select tablespace_name 
  3  from dba_tables where owner = 'PA' and table_name = upper 
  4  ('pa_budget_lines_tmp_s'));
TABLESPACE_NAME
------------------------------------------------------------------------------------------
EXTENT_MANAGEMENT              ALLOCATION_TYPE             INITIAL_EXTENT NEXT_EXTENT
------------------------------ --------------------------- -------------- -----------
PAD
LOCAL                          USER                                 40960       40960
APPS@S0P2DBT1> select tablespace_name, initial_extent, next_extent, pct_increase from 
  2  dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s');
TABLESPACE_NAME
------------------------------------------------------------------------------------------
INITIAL_EXTENT NEXT_EXTENT PCT_INCREASE
-------------- ----------- ------------
PAD
         40960       40960            0
APPS@S0P2DBT1> 
APPS@S0P2DBT1>  
 
April     03, 2005 - 7:24 pm UTC 
 
 
40k is a tad small for a really big table, don't you think?
I'll really have to recommend you work with support on this task.  They can advice you if the extent sizes are relevant.  You need to work within their advice for apps. 
 
 
 
A reader, April     03, 2005 - 7:23 pm UTC
 
 
"we do not want to the original table ..."
should be "we do not want to alter the original table ..."
sorry for the typo. 
 
 
IN and Inner Join
Praveen, April     04, 2005 - 2:48 pm UTC
 
 
Hi Tom,
I did the following experiment:
Oracle9iR2
CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;
CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;
CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);
CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);
ANALYZE TABLE ALL_OBJECTS1 COMPUTE STATISTICS;
ANALYZE TABLE ALL_OBJECTS2 COMPUTE STATISTICS;
SELECT COUNT(*) FROM ALL_OBJECTS;
COUNT(*)
--------
23533
Query 1
-------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);
Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    SORT UNIQUE                                                         
      INDEX FAST FULL SCAN    ORCL.IDX_ALLOBJ2_OBJID    
    TABLE ACCESS FULL        ORCL.ALL_OBJECTS1
    
Query2
------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);
Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    SORT UNIQUE                                                         
      INDEX FAST FULL SCAN    ORCL.IDX_ALLOBJ2_OBJID    
    TABLE ACCESS FULL        ORCL.ALL_OBJECTS1
Query3
------
SELECT * FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = a2.object_id;
Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    TABLE ACCESS FULL    ORCL.ALL_OBJECTS1
    TABLE ACCESS FULL    ORCL.ALL_OBJECTS2
Que 1: The explain plan output of first two queries contradicts with what you have answered 
    to the question of this thread. ie: IN and EXISTS are processed very, very 
    differently. But the plan here shows same exact access paths and costs and performance.
    What went wrong?
    
Que 2:     Ref: Query3. Why did a direct inner join between the two tables gave and entirely 
    different plan and poor performance than IN (or EXISTS). You said that IN is 
    infact a join done by first table on a 'distincted' and 'hashed', inner query.
    Query 3 is not even using the indexes available!
    
Que 3:  How should I choose between inner join and IN?
Thanks
Praveen
         
 
April     04, 2005 - 5:25 pm UTC 
 
 
good point
things change, good thing I did a test case above that you can test with to observe that and point it out to the world (I'm not being "snarky", I'm really overjoyed at the timing actually)....
Use the RBO and see what you see.  way back when I wrote this, that was the "more popular" of the two perhaps
today in 2005, what I said years ago using the RBO does not apply to the CBO.  the cbo is smart enough to recognize these two things are effectively the same. 
 
 
 
test recoded
Alberto Dell'Era, April     04, 2005 - 6:17 pm UTC
 
 
But query3 isn't semantically equivalent to query1/2... you should put a unique constraint on ALL_OBJECTS2_uq (object_id) to make them "the same", and recode the statement as below (or not?)
It goes even better anyway - the sort can be avoided alltogether (and it's confirmed that query1 and query2 plans are the same) - i have coded query2 the way i suppose it should be (query1 text is equal to query2 in the review):
CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;
CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;
CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);
CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);
exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS1', cascade=>true);
exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS2', cascade=>true);
Query 1:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);
Query2:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE OBJECT_ID in (SELECT OBJECT_ID  FROM ALL_OBJECTS2);
alter table ALL_OBJECTS2 add constraint ALL_OBJECTS2_uq unique (object_id);
Query3:
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 
WHERE a1.object_id = a2.object_id;
======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE EXISTS (SELECT NULL 
FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID)
----------------------------------------------------------------------------
| Id  | Operation             |  Name              | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |       |       |    94 |
|*  1 |  HASH JOIN SEMI       |                    | 35281 |  3376K|    94 |
|   2 |   TABLE ACCESS FULL   | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|   3 |   INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID  | 35282 |   172K|     9 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OBJECT_ID"="OBJECT_ID")
Note: cpu costing is off
======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE OBJECT_ID in (SELECT OBJECT_ID  
FROM ALL_OBJECTS2)
----------------------------------------------------------------------------
| Id  | Operation             |  Name              | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |       |       |    94 |
|*  1 |  HASH JOIN SEMI       |                    | 35281 |  3376K|    94 |
|   2 |   TABLE ACCESS FULL   | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|   3 |   INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID  | 35282 |   172K|     9 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("OBJECT_ID"="OBJECT_ID")
Note: cpu costing is off
======================================================================
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = a2.object_id
---------------------------------------------------------------------------
| Id  | Operation            |  Name              | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |       |       |    50 |
|   1 |  NESTED LOOPS        |                    | 35281 |  3376K|    50 |
|   2 |   TABLE ACCESS FULL  | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|*  3 |   INDEX RANGE SCAN   | IDX_ALLOBJ2_OBJID  |     1 |     5 |       |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("OBJECT_ID"="OBJECT_ID")
Note: cpu costing is off
Note: plans fetched from v$sql_plan, 9.2.0.6. 
 
 
In and Out Parameters and IN/Exists operator
Keshav Madhusoodan, April     05, 2005 - 3:13 am UTC
 
 
Excellent stuff, I got here.
 
 
 
Inner Join and IN
Praveen, April     05, 2005 - 9:47 am UTC
 
 
Well Tom, by that you mean, with newer versions of CBO playing around, there won't by any difference at all between IN and EXISTS - whatever be the nature of data sources are - means whether an index exists or not, unique or not, relative sizes of each tables etc, did you? That's what my execution plan shows me - with a small difference. With unique key defined, 'NESTED LOOP SEMI' - for EXISTS, 'NESTED LOOP' - for IN; But otherwise the costs are same.
I had one more doubt that whether IN is always going to perform better that than a direct inner join? Testing I've done is telling this. Or is there any  situation where inner join is better than IN? 
Thanks, Alberto, for pointing out something important (and also correcting the error,...:). Unfortunately, the execution plan I get here, after putting an unique constraint on object_id, is very much different from the one that you have given. 'NESTED LOOP SEMI' - for EXISTS, 'NESTED LOOP' - for IN and surprisingly, HASH JOIN - for inner join! Well, I gathered statistics using ANALYZE.
Thanks
Praveen 
 
April     05, 2005 - 12:10 pm UTC 
 
 
in and exists are equivalent, so the advice of the past (with rbo) does not apply.  
use either or. 
 
 
 
Question regarding "IN & EXISTS"
Leo James, May       16, 2005 - 4:38 pm UTC
 
 
Very well explained TOM. 
 
 
in, inner join and unnesting
James, August    10, 2005 - 9:56 pm UTC
 
 
Dear Tom,
You said (right at the start of this thread) concerning the unnesting/query rewriting of in subqueries:
>is typically processed as:
>
>select * 
>  from t1, ( select distinct y from t2 ) t2
> where t1.x = t2.y;
>
>The subquery is evaluated, distinct'ed, indexed (or >hashed or sorted) and then 
>joined to the original table -- typically.
I'm worried about the typically :-)
I have 3 queries that return the same result:
Query 1 - in query
select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)
Query 2 - inner join
select /*+ INNER */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325
Query 3 - "Typical" unnested in subquery
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) t2
where t1.dealer_Id=t2.dealer_id
The dealer_regions table has had stats created and enough buckets specified for the CBO to choose between two indexes.
The problem is that the 1st query - the in subquery - is not being executed as well as it should, the wrong index is being selected. What is even more curious is that the other 2 queries perform as expected - even q3 which is the "typical" unnested version of the 1st query.
I could just use an index hint but I would rather the CBO got it right as the selectivity of "region_id" varies (and sometimes the index I specified in the hint wont be the best one.)
Any advice to "encourage" the CBO to see things my way?
BTW I am using 9.1.5 so that maybe where my problem lies.
--trace output--
select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)
call     count       cpu    elapsed       disk      query    current        rows
total        4      0.58       0.61         55      94638          0           1
Rows     Row Source Operation
      1  SORT AGGREGATE (cr=94620 r=55 w=0 time=595018 us)
     39   NESTED LOOPS SEMI (cr=94620 r=55 w=0 time=594979 us)
  45815    NESTED LOOPS  (cr=48803 r=55 w=0 time=349295 us)
  48690     INDEX FAST FULL SCAN CSVMV_DEID_IND (cr=111 r=55 w=0 time=65768 us)(object id 270087)
  45815     INDEX UNIQUE SCAN CSDM_PK (cr=48692 r=0 w=0 time=199251 us)(object id 270093)
     39    INDEX UNIQUE SCAN DERE_PK (cr=45817 r=0 w=0 time=167174 us)(object id 281197)
Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS (SEMI)
  45815     NESTED LOOPS
  48690      INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 
                 'CSVMV_DEID_IND' (NON-UNIQUE)
  45815      INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
     39     INDEX (UNIQUE SCAN) OF 'DERE_PK' (UNIQUE)
********************************************************************************
select /*+ INNERJOIN */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325
call     count       cpu    elapsed       disk      query    current        rows
total        4      0.00       0.00          3         56          0           1
Rows     Row Source Operation
      1  SORT AGGREGATE (cr=54 r=3 w=0 time=965 us)
     39   NESTED LOOPS  (cr=54 r=3 w=0 time=941 us)
     39    NESTED LOOPS  (cr=13 r=3 w=0 time=690 us)
      5     TABLE ACCESS BY INDEX ROWID DEALER_REGIONS (cr=6 r=3 w=0 time=514 us)
      5      INDEX RANGE SCAN DERE_REID_IND (cr=2 r=1 w=0 time=229 us)(object id 281196)
     39     INDEX RANGE SCAN CSVMV_DEID_IND (cr=7 r=0 w=0 time=143 us)(object id 270087)
     39    INDEX UNIQUE SCAN CSDM_PK (cr=41 r=0 w=0 time=180 us)(object id 270093)
Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS
     39     NESTED LOOPS
      5      TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 
                 'DEALER_REGIONS'
      5       INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
     39      INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND' 
                 (NON-UNIQUE)
     39     INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
********************************************************************************
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) t2
where t1.dealer_Id=t2.dealer_id
call     count       cpu    elapsed       disk      query    current        rows
total        4      0.00       0.00          0         56          0           1
Rows     Row Source Operation
      1  SORT AGGREGATE (cr=54 r=0 w=0 time=596 us)
     39   NESTED LOOPS  (cr=54 r=0 w=0 time=577 us)
     39    NESTED LOOPS  (cr=13 r=0 w=0 time=337 us)
      5     VIEW  (cr=6 r=0 w=0 time=196 us)
      5      SORT UNIQUE (cr=6 r=0 w=0 time=189 us)
      5       TABLE ACCESS BY INDEX ROWID OBJ#(281059) (cr=6 r=0 w=0 time=99 us)
      5        INDEX RANGE SCAN OBJ#(281196) (cr=2 r=0 w=0 time=48 us)(object id 281196)
     39     INDEX RANGE SCAN OBJ#(270087) (cr=7 r=0 w=0 time=111 us)(object id 270087)
     39    INDEX UNIQUE SCAN OBJ#(270093) (cr=41 r=0 w=0 time=161 us)(object id 270093)
Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS
     39     NESTED LOOPS
      5      VIEW
      5       SORT (UNIQUE)
      5        TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 
                   'DEALER_REGIONS'
      5         INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
     39      INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND' 
                 (NON-UNIQUE)
     39     INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
  
 
 
August    11, 2005 - 9:29 am UTC 
 
 
do you really do this without bind variables?  in the application?
what is 9.1.5 
 
 
 
A soloution to my last  review
James, August    11, 2005 - 1:21 am UTC
 
 
Sorry but I have found a work around.
The 1st query can be written as:
select 
count(*) from carsales_view
where dealer_id in (
select dealer_id from dealer_regions where region_id=325 and rownum>0)
/
Which achieves the same performance of a inner join for selective and non-selective values of region_id (although with a different plan):
region_id=325 - low selectivity:
56 consistant gets compared with 94,000 (qry1)
This compares to the manually unested query (q3) of 56 consistant gets.
region_id=81 - high selectivity:
15,572 consistant gets compared to 94,000 (using qry1)
This compares to the manually unested query (q3) of 15572 consistant gets.
So I have managed to get the CBO to execute the query based on the selectivity of the region_id field in the dealer_regions table.
I beleive this is probably a bug in the 9.1.5 CBO?
region_id=325 - high selectivity with rownum>0
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=22)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=19 Card=241 Bytes=5302)
   3    2       NESTED LOOPS (Cost=19 Card=241 Bytes=4338)
   4    3         VIEW OF 'VW_NSO_1' (Cost=2 Card=5 Bytes=65)
   5    4           SORT (UNIQUE)
   6    5             COUNT
   7    6               FILTER
   8    7                 TABLE ACCESS (BY INDEX ROWID) OF 'DEALER_REG
          IONS' (Cost=2 Card=5 Bytes=40)
   9    8                   INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON
          -UNIQUE) (Cost=1 Card=5)
  10    3         INDEX (RANGE SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQUE)
          (Cost=1 Card=48 Bytes=240)
  11    2       INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         56  consistent gets
          0  physical reads
          0  redo size
        214  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
region_id=81 - high selectivity with rownum>0
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=22)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=22 Card=33150 Bytes=729300)
   3    2       HASH JOIN (SEMI) (Cost=22 Card=33150 Bytes=596700)
   4    3         INDEX (FAST FULL SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQ
          UE) (Cost=12 Card=48691 Bytes=243455)
   5    3         VIEW OF 'VW_NSO_1' (Cost=3 Card=689 Bytes=8957)
   6    5           COUNT
   7    6             FILTER
   8    7               TABLE ACCESS (FULL) OF 'DEALER_REGIONS' (Cost=
          3 Card=689 Bytes=5512)
   9    2       INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
      15572  consistent gets
          0  physical reads
          0  redo size
        216  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
 
reference for last answer 
James, August    11, 2005 - 1:24 am UTC
 
 
 
IN, EXISTS and NULL
Sanjaya Balasuriya, November  21, 2005 - 4:09 am UTC
 
 
Hi Tom, 
How IN and EXISTS are effected by NULL values ?
Is it possible that IN and EXISTS return different sets of data depending on NULL values ? 
 
November  21, 2005 - 8:41 am UTC 
 
 
in and exists are comparable in the presence of NULLS
not in and not exists behave differently. 
 
 
 
Oracle10g Doc has a different opinion.
Praveen, December  20, 2005 - 5:02 am UTC
 
 
Hi Tom,
Oracle10g documentation says IN and EXISTS executes and performs differently depending on the situation.
"In general, if the selective predicate is in the subquery, then use IN. If the 
selective predicate is in the parent query, then use EXISTS."
</code>  
http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10752/sql_1016.htm#28145  <code>
This is contradicting with your opinion that-
"today in 2005, what I said years ago using the RBO does not apply to the CBO.  
the cbo is smart enough to recognize these two things are effectively the same." above.
Please explain.
Regards  
 
December  20, 2005 - 10:15 am UTC 
 
 
ops$tkyte@ORA10GR2> create table employees as select * from hr.employees where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2> create table orders as select * from oe.orders where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into employees
  2  select object_id,
  3         substr(object_name,1,20),
  4         substr(object_name,1,25),
  5             substr(object_type,1,25),
  6             substr(last_ddl_time,1,20),
  7         created,
  8         substr(object_name,1,10),
  9         object_id,
 10         0.10,
 11         object_id+1,
 12         mod(rownum,1000)
 13    from all_objects
 14   where rownum <= 27000;
 
27000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into orders
  2  select rownum,
  3         created,
  4         substr(object_type,1,8),
  5         mod(rownum,150),
  6         1,
  7         object_id,
  8         object_id,
  9         1
 10    from all_objects
 11   where rownum <= 10000;
 
10000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> alter table employees add constraint emp_pk primary key(employee_id);
 
Table altered.
 
ops$tkyte@ORA10GR2> create index orders_customer_id on orders(customer_id);
 
Index created.
 
ops$tkyte@ORA10GR2> create index employees_department_id on employees(department_id);
 
Index created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'EMPLOYEES', method_opt=>'for all indexed columns' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'ORDERS', method_opt=>'for all indexed columns' );
 
PL/SQL procedure successfully completed.
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> SELECT /* EXISTS example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3    FROM employees e
  4   WHERE EXISTS (SELECT 1 FROM orders o                  /* Note 1 */
  5                    WHERE e.employee_id = o.sales_rep_id   /* Note 2 */
  6                      AND o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> SELECT /* IN example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3      FROM employees e
  4     WHERE e.employee_id IN (SELECT o.sales_rep_id         /* Note 4 */
  5                               FROM orders o
  6                              WHERE o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace off
They give insufficient details in their example to reproduce.  when you tested this out - what did you yourself see? 
 
 
 
 
what's the maximum number of values in the IN expression
Jianhui, January   03, 2006 - 8:38 pm UTC
 
 
Hi Tom,
IN (value1, value2, ... valueN)
What's the maximum limit of N here allowed by Oracle? Do you have URL to the document for this limit?
Thanks
 
 
January   03, 2006 - 8:46 pm UTC 
 
 
 
 
Question about  not exists 
Oleg, January   19, 2006 - 8:27 am UTC
 
 
Hi,Tom,
I have this query:
SELECT   b.branch_no, b.salesman_no,
         ROUND
             (  SUM (CASE
                        WHEN b.returning_no !=2
                           THEN order_qty
                     END)
              / 22
             ) avg_rtrn_orders_qty,
         ROUND (  SUM (CASE
                          WHEN b.bsnss_no != 6
                             THEN order_qty
                       END)
                / 22
               ) avg_no_gas_orders_qty,
         ROUND (  SUM (CASE
                          WHEN b.bsnss_no = 6
                             THEN order_qty
                       END)
                / 22
               ) avg_gas_orders_qty
    FROM km_orders a, km_customers b, km_nwpr_editions c,km_closed_customers d
   WHERE a.cstmr_no = b.cstmr_no
     and b.cstmr_no=d.cstmr_no(+)
     and c.nwpr_date BETWEEN d.date_from(+) AND d.date_to(+)
     AND b.bsnsslike_sw = 'ë'
     AND a.prod_no = c.prod_no
     AND c.nwpr_date BETWEEN a.date_from AND a.date_to
     AND a.prod_no = 1
     AND b.branch_no != 9
     AND c.nwpr_date BETWEEN TO_DATE ('01/11/2005', 'DD/MM/YYYY')
                         AND TO_DATE ('30/11/2005', 'DD/MM/YYYY')
     AND a.day_in_week = c.nwpr_day
     AND NOT EXISTS (
             SELECT 1
               FROM km_closed_customers
              WHERE cstmr_no = a.cstmr_no
                AND c.nwpr_date BETWEEN date_from AND date_to)
GROUP BY b.branch_no, b.salesman_no
I see that without 
" AND NOT EXISTS (
             SELECT 1
               FROM km_closed_customers
              WHERE cstmr_no = a.cstmr_no
                AND c.nwpr_date BETWEEN date_from AND date_to)"
the query is running faster,but i need to check this table,
what can i do instead this not exists statement?
Thank you,
Oleg,
 
 
 
best practice?
A reader, February  17, 2006 - 1:14 pm UTC
 
 
Tom,
is it a good practice to have 2 exists in the form?
   SELECT *
                    FROM TABLE a
                   WHERE NOT EXISTS (SELECT 1
                                       FROM TABLE B i
                                      WHERE i.CLASS = a.CLASS_NAME
                                        AND NVL(i.CLASS_NAME,0) = NVL(a.CLASS_DEPT,0)
                                    )
                     AND NOT EXISTS (SELECT 1
                                       FROM work_orders w
                                      WHERE w.class_id     = a.s_name
                                        AND NVL(w.CLASS_ID,0) = nvl(a.CLASS_DEPT,0)
                                     ) 
 
February  17, 2006 - 4:59 pm UTC 
 
 
it seems to be less a matter of "good practice" and more a matter of getting the right answer though isn't it?
It is a bad practice to do that nvl'ing bit in general.  sure you really need to do that. 
 
 
 
can you clarify with example please
A reader, February  22, 2006 - 2:00 pm UTC
 
 
 
February  22, 2006 - 2:13 pm UTC 
 
 
an example of what exactly 
 
 
 
of how can I avoid 
A reader, February  22, 2006 - 3:06 pm UTC
 
 
Tom,
How can I avoid "the nvl'ing bit" in this case. 
 
February  22, 2006 - 3:21 pm UTC 
 
 
fixing the model?  apparently you didn't mean to use nulls?  Not if you basically join on them like that.
Else, we'd have to function base index them in this case.  
Are you sure nulls are an issue here.  Do you really need to join by them or is that "just in case"
 
 
 
 
THANKS!!!!!
A reader, February  22, 2006 - 4:03 pm UTC
 
 
 
 
updating using EXISTS
anirudh, March     09, 2006 - 3:23 pm UTC
 
 
hi tom
in our project we often have a situaton where some of the fields in a table (master_table in the example below) needs to be updated with data in another table (UPDATES_FEED in the example below)
MASTER_TABLE
key_fld
fld_1
fld_2
fld_3
fld_4
fld_5
fld_6
UPDATES_FEED
key_fld
fld_1
fld_2
i typically use SQL such as :
update master_table a
set (fld_1,fld_2) = (select fld_1,fld_2
             from updates_feed b    
             where a.key_fld = b.key_fld)
where exists (select 1  
          from updates_feed c
          where a.key_fld = c.key_fld)
I have to add the WHERE condition otherwise if no match is found in updates_feed table it would update the fields to NULL)
even though it works - it looks some what less than elegant - and has performance issues as well - as i think the lookup on the updates_feed is happening two times
Is there a better way of achieving the result ?
 
 
March     09, 2006 - 4:21 pm UTC 
 
 
update a join
update (select m.fld_1, m.fld_2, u.fld_1 n1, u.fld_2 n2
          from master_table m, updates_feed u
         where m.key_fld = u.key_fld )
 set fld_1 = n1, fld_2 = n2;
 
 
 
 
wonderful
A reader, March     10, 2006 - 4:54 am UTC
 
 
Morning Tom,
i have always wondered the best way of doing that kind of update. in your example is it therefore possible to update either or indeed both of the tables used in the select statement?
Regards 
Dave 
 
March     10, 2006 - 12:17 pm UTC 
 
 
only one, you are updating a table from another table 
 
 
 
Strange behaviour
Abhijit Mallick, March     30, 2006 - 6:39 am UTC
 
 
Hi Tom,
   Today I have found a strange behaviuor of SQL. The problem is as follows:
  
Please see the details: 
SQL> create table tab2
  2  (col21 number);
Table created.
SQL> insert into tab2 select rownum from user_objects;
2429 rows created.
SQL> create table tab1
  2  (col1 varchar2(1000),
  3   col2 number);
Table created.
SQL> insert into tab1 select table_name,rownum from user_tables;
293 rows created.
SQL> commit;
Commit complete.
SQL> ed
Wrote file afiedt.buf
  1  select count(*) from tab1
  2  where col1 is not null
  3* and col2 not in (select col2 from tab2)
SQL> /
  COUNT(*)
----------
         0
---LOOK AT THE INNER QUERY 
THE TABLE TAB2 DOES NOT HAVE THE COL2 STILL THE QUERY EXECUTES WITHOUT ANY ERROR. SEE THE BELOW PART , IT GIVES THE ERROR.
SQL> select col2 from tab2;
select col2 from tab2
       *
ERROR at line 1:
ORA-00904: "COL2": invalid identifier
SQL> 
Can you please discuss the behaviour of the above sub-query.
Thanks
--Abhijit Mallick
Bangalore
 
 
 
March     31, 2006 - 11:22 am UTC 
 
 
it is called a correlated subquery.
select count(*) 
  from tab1
 where col1 is not null
   and col2 not in (select TAB1.COL2 from tab2)
that is the same query as the one you executed.  
Just like you could:
  and not exists ( select * from tab2 where tab2.col21 = tab1.col2 )
would work - as would
  and not exists ( select * from tab2 where col21 = col2 )
we would just figure out using standard scoping rules whence each reference column was from.
 
 
 
 
IN & EXISTS
jash, April     19, 2006 - 8:09 am UTC
 
 
How to tune the below query to get result very quickly
select count(*) from polineitemtab;
  COUNT(*)
----------
   1581181
select count(*) from requisitiontab;
  COUNT(*)
----------
    284956
select rootid from NEW_TEMP_polineitemtab
where poli_requisition not in 
                   (select rootid from NEW_TEMP_requisitiontab) 
 
April     19, 2006 - 9:08 am UTC 
 
 
make sure you are using the CBO
make sure rootid is NOT NULL in NEW_TEMP_requisitiontab OR add "where rootid is not null" to the subquery.
or
select /*+ ALL_ROWS */ a.rootid
  from NEW_TEMP_polineitemtab a, NEW_TEMP_requisitiontab b
 where a.rootid = b.rootid(+)
   and b.rootid is null;
(a do it yourself hash anti-join) 
 
 
 
IN & EXISTS
jash, April     19, 2006 - 9:33 am UTC
 
 
What is b.rootid(+) significance in this query 
 
April     19, 2006 - 9:37 am UTC 
 
 
outer join - you wanted to find rows in A, not in B - so we outer join A to B and only keep those where there was no match (b.rootid is null) 
 
 
 
IN & EXISTS
jash, April     19, 2006 - 9:44 am UTC
 
 
There is no null values in b.rootid(column) NEW_TEMP_requisitiontab(table) what is the reason using is null  
 
April     19, 2006 - 10:25 am UTC 
 
 
when you outer join T1 to T2, SQL will "make up" a NULL row in T2 to join to when there isn't a row to be joined to.
So, the following two queries are semantically equivalent (assuming t2.x is NOT NULL)
ops$tkyte@ORA9IR2> create table t1 ( x int );
Table created.
 
ops$tkyte@ORA9IR2> create table t2 ( x int NOT NULL );
Table created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 2 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> select * from t1 where x not in (select x from t2);
 
         X
----------
         2
 
ops$tkyte@ORA9IR2> select t1.*
  2    from t1, t2
  3   where t1.x = t2.x(+)
  4     and t2.x is null;
 
         X
----------
         2
 
 
 
 
 
EXISTS vs "where rownum <=1"
K. Cruz, May       04, 2006 - 7:03 pm UTC
 
 
Thanks Tom for your valuable support. 
I'm reviewing potentially bad queries and ran across a query that looks for the existence of a value in a table. This query looked suspicious to me and wanted to get your feedback. 
SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
  FROM NF_FILES NF,
       NF_RPC_LOCATION NRL
 WHERE NF.SUFFIX = :B3
   AND NF.OFFICE_ID = NRL.OFFICE_ID
   AND NF.SECTION_CODE = NRL.SECTION_CODE
   AND NF.RESP_CODE = NRL.RESP_CODE
   AND NF.OFFICE_ID = :B2
   AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
   AND ROWNUM <= 1
After rewriting the query so that it uses an EXIST clause, the explain plan for both seemed to look very similar. 
17:48:53 nfuser@NFDEV> l
  1  SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
  2    FROM NF_FILES NF,
  3         NF_RPC_LOCATION NRL
  4   WHERE NF.SUFFIX = :B3
  5     AND NF.OFFICE_ID = NRL.OFFICE_ID
  6     AND NF.SECTION_CODE = NRL.SECTION_CODE
  7     AND NF.RESP_CODE = NRL.RESP_CODE
  8     AND NF.OFFICE_ID = :B2
  9     AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
 10*    AND ROWNUM <= 1
17:48:54 nfuser@NFDEV> /
D
-
N
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=43)
   1    0   SORT (AGGREGATE)
   2    1     COUNT (STOPKEY)
   3    2       NESTED LOOPS (Cost=7 Card=1 Bytes=43)
   4    3         PARTITION RANGE (SINGLE)
   5    4           TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES' (Cost=5 Card=1 Bytes=31)
   6    5             INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
   7    3         INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I'(NON-UNIQUE) (Cost=2 Card=1 Bytes=12)
17:50:15 nfuser@NFDEV> l
  1    SELECT 'Y'
  2      FROM DUAL
  3     WHERE EXISTS ( SELECT 1
  4                    FROM NF_FILES NF,
  5                         NF_RPC_LOCATION NRL
  6                   WHERE NF.SUFFIX = :B3
  7                     AND NF.OFFICE_ID = NRL.OFFICE_ID
  8                     AND NF.SECTION_CODE = NRL.SECTION_CODE
  9                     AND NF.RESP_CODE = NRL.RESP_CODE
 10                     AND NF.OFFICE_ID = :B2
 11*                    AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY') )
17:50:16 nfuser@NFDEV> /
no rows selected
Elapsed: 00:00:00.06
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=17 Card=8168)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'DUAL' (Cost=17 Card=8168)
   3    1     NESTED LOOPS (Cost=7 Card=1 Bytes=43)
   4    3       PARTITION RANGE (SINGLE)
   5    4         TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES'(Cost=5 Card=1 Bytes=31)
   6    5           INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
   7    3       INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I' (NON-UNIQUE) (Cost=2 Card=1 Bytes=12)
NF_FILES is a 300 million row table so based on your previous advice, my outer query was the DUAL table and my inner query was the large NF_FILES table. I'm not able to determine which is the most efficient and the original query has been flagged and resource intensive. Could you provide some feedback on which approach makes more sense? 
Thank you!!
 
 
May       05, 2006 - 1:35 am UTC 
 
 
the original doesn't have to read dual, yours does the work of the original PLUS the reading of dual.
hopefully the index nf_offid_p1_idx has suffix and last_transaction_date in it as well (concatenated index)
 WHERE NF.SUFFIX = :B3
  7                     AND NF.OFFICE_ID = NRL.OFFICE_ID
  8                     AND NF.SECTION_CODE = NRL.SECTION_CODE
  9                     AND NF.RESP_CODE = NRL.RESP_CODE
 10                     AND NF.OFFICE_ID = :B2
 11*                    AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
I'd want an index on (office_id,suffix,last_transaction_date) probably. 
 
 
 
Reader, May       10, 2006 - 9:38 am UTC
 
 
I have a table called person. I need to update the last name and birth date of person_no 2 with last name and birthdate of person_no 1. "id" is the primary key of the person table. 
Can you please let me know how to do this update? 
create sequence id_seq start with 1;
create table person
(id number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
,CONSTRAINT pk_id PRIMARY KEY(id) )
insert into person
values
(id_seq.nextval,1,'ABC','AAA','19991120');
insert into person
values
(id_seq.nextval,2,'DEF','','00000000');
insert into person
values
(id_seq.nextval,1,'GHI','KLM','19921021');
insert into person
values
(id_seq.nextval,2,'NPQ','','00000000');
insert into person
values
(id_seq.nextval,1,'RST','UVW','19721215');
insert into person
values
(id_seq.nextval,2,'XYZ','','00000000');
insert into person
values
(id_Seq.nextval,1,'TTT','GGG','19750324');
select * from person;
 ID  PERSON_NO FIRST_NM  LAST_NM   BIRTH_DT   
---  --------  --------  -------   --------   
  1    1       ABC       AAA       19991120   
  2    2       DEF                 00000000   
  3    1       GHI       KLM       19921021   
  4    2       NPQ                 00000000   
  5    1       RST       UVW       19721215   
  6    2       XYZ                 00000000   
  7    1       TTT       GGG       19750324   
Thanks 
 
May       10, 2006 - 10:16 am UTC 
 
 
you seem to have person_no = 1 and 2 in there more than once.
It is not at all clear how you would pick which person_no=1 to use to update all of the person_no = 2 records. 
 
 
 
Reader, May       10, 2006 - 11:42 am UTC
 
 
Tom,
I am sorry for not being clear on the above question. Please find below the details of the tables. 
I need to update the last name and birth date of 
person_no 2 with last name and birthdate of person_no 1. 
of the person table. 
Can you please let me know how to do this update? 
create sequence id_pseq start with 200;
create table person
(id_person number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
)
insert into person
values
(id_pseq.nextval,1,'ABC','AAA','19991120');
insert into person
values
(id_pseq.nextval,2,'DEF','','00000000');
insert into person
values
(id_pseq.nextval,1,'GHI','KLM','19921021');
insert into person
values
(id_pseq.nextval,2,'NPQ','','00000000');
insert into person
values
(id_pseq.nextval,1,'RST','UVW','19721215');
insert into person
values
(id_pseq.nextval,2,'XYZ','','00000000');
insert into person
values
(id_pseq.nextval,1,'TTT','GGG','19750324');
create sequence id_pbseq start with 100;
create table person_i_bnft
(id_ibnft number primary key
,id_person number references person(id_person)
,person_no number
,id_bnft number references person_bnft(id_bnft)
,age number 
);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,200,1,3000,20);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,201,2,3000,25);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,202,1,3001,30);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,203,2,3001,35);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,204,1,3002,40);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,205,2,3002,45);
insert into person_i_bnft
values
(id_pbseq.NEXTVAL,206,1,3003,50);
create sequence id_bseq start with 3000;
create table person_bnft
(id_bnft number primary key
,amt varchar2(6)
);
insert into person_bnft
values
(id_bseq.NEXTVAL,'$1000');
insert into person_bnft
values
(id_bseq.NEXTVAL,'$2000');
insert into person_bnft
values
(id_bseq.NEXTVAL,'$3000');
insert into person_bnft
values
(id_bseq.NEXTVAL,'$4000');
Data from tables:
PErson table:
ID_PERSON PERSON_NO    FIRST_NM LAST_NM BIRTH_DT    
200         1    ABC     AAA     19991120    
201         2    DEF          00000000    
202         1    GHI     KLM     19921021    
203         2    NPQ          00000000    
204         1    RST     UVW     19721215    
205         2    XYZ          00000000    
206         1    TTT     GGG     19750324    
PErson_i_bnft
ID_IBNFT ID_PERSON   PERSON_NO ID_BNFT    AGE
100      200        1    3000    20 
101      201        2    3000    25 
102      202        1    3001    30 
103      203        2    3001    35 
104      204        1    3002    40 
105      205        2    3002    45 
106      206        1    3003    50 
Person_bnft
ID_BNFT    AMT
3000    $1000
3001    $2000
3002    $3000
3003    $4000
Thanks 
 
May       11, 2006 - 7:34 am UTC 
 
 
no, this is still as clear as a puddle of mud.
sigh, do you see that you have "FOUR PERSON_NO=1" records.  You have "THREE PERSON_NO=2" records.  You have only said:
...
I need to update the last name and birth date of 
person_no 2 with last name and birthdate of person_no 1. 
of the person table. 
.......
No one can answer your question - there is no answer to your question.  Your question doesn't make sense yet.  Which of the four should be applied to the three, and why.
 
 
 
 
Reader, May       11, 2006 - 8:04 am UTC
 
 
ID_PERSON PERSON_NO    FIRST_NM LAST_NM BIRTH_DT    
200         1        ABC     AAA     19991120    
201         2        DEF              00000000    
202         1        GHI     KLM     19921021    
203         2        NPQ              00000000    
204         1        RST     UVW     19721215    
205         2        XYZ              00000000    
206         1        TTT     GGG     19750324  
I need to update (last_nm,birth_dt) of ID_PERSON=201(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=200, if the birth_dt of 201 is '00000000' and last_nm of 201 is null
I need to update (last_nm,birth_dt) of ID_PERSON=203(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=202, if the birth_dt of 202 is '00000000' and last_nm of 202 is null
I need to update (last_nm,birth_dt) of ID_PERSON=205(person_no=2) with the value of (last_nm,birth_dt) in ID_PERSON=204, if the birth_dt of 204 is '00000000' and last_nm of 204 is null.
Thanks 
 
May       11, 2006 - 7:32 pm UTC 
 
 
and how pray tell did you "know that"
logic - tell use the LOGIC here.
why does 200 go with 201, are they always sequential, is 200 a fixed constant or will that change.
Yes, it is now "semi-obvious" what you seem to be saying, but I want you to say it, out loud here.
What if the row with id_person = 202 is missing for example.  CONSIDER ALL OF THE CASES, be detailed, be explicit, state requirements - go 100% of the way. 
 
 
 
Reader, May       11, 2006 - 11:48 am UTC
 
 
I am trying to do the following update, but I get the error
"ORA-01779: cannot modify a column which maps to a non key-preserved table "
Can you please let me know what is wrong here?
update 
(
  select t1.fm1,t1.lm1,t2.fm2,t2.lm2  
  from 
       (select p11.ID_PERSON id1
              ,p11.FIRST_NM fm1
          ,p11.LAST_NM lm1
              ,p11.birth_dt dob1
          ,p11.PERSON_NO per_no1
          ,p13.ID_BNFT T1_ID_BNFT
        from person p11
            ,person_i_bnft p12
        ,person_bnft p13
        where p11.ID_PERSON = p12.ID_PERSON
          and p12.ID_BNFT = p13.ID_BNFT
          and p11.PERSON_NO = 1
          and p11.BIRTH_DT <> '00000000'
      ) T1,
      (select p21.ID_PERSON id2
             ,p21.FIRST_NM fm2
         ,p21.LAST_NM lm2
             ,p21.birth_dt dob2
         ,p21.PERSON_NO per_no2
        ,p23.ID_BNFT T2_ID_BNFT
        from person p21
            ,person_i_bnft p22
        ,person_bnft p23
        where p21.ID_PERSON = p22.ID_PERSON
          and p22.ID_BNFT = p23.ID_BNFT
          and p21.PERSON_NO = 2
          and p21.BIRTH_DT = '00000000'
        ) T2
   where T1_ID_BNFT = T2_ID_BNFT)
set lm2=lm1
   ,dob2=dob1;
Thanks
 
 
May       11, 2006 - 7:55 pm UTC 
 
 
did you search for "key preserved"
I only ask because those words have not yet appeared here at all and they seem like a logical thing to search for?
 
 
 
 
are these two queries the same...?
A reader, June      02, 2006 - 5:08 am UTC
 
 
Hi
I have a query which uses EXISTS I modified it to a join (reduced 50% logical I/O) however for the test data we have it seems correct but I am not 100% sure, may you have a look and see if they are the same?
SELECT *
  FROM BCOMMON c
 WHERE c.type = 'B'
   AND c.i_period = 1
   AND EXISTS (
          SELECT 1
            FROM BCOMMON c2
           WHERE c2.type = c.type
             AND c2.id_bonif_periodica = c.id_bonif
             AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601')
SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
AND to_char(c2.F_IMPUTACION, 'YYYYMMDD')='20060601' 
 
June      02, 2006 - 10:51 am UTC 
 
 
No.
you don't have a predicate on the second query to make sure that c2.type is 'B'
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot just join (it'll increase the number of returned records)
and please, never do this:
AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601'
do this:
and f_imputacion = to_date( '20060601', 'yyyymmdd' )
or, if the date column has a time component:
and f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and f_imputacion < to_date( '20060601', 'yyyymmdd' )+1
 
 
 
 
thanks
A reader, June      02, 2006 - 6:29 pm UTC
 
 
Hi Tom
Thanks for your valuable advices. 
 
 
join returns more records
A reader, June      05, 2006 - 3:44 am UTC
 
 
Hi Tom
I added a predicate to make sure that c2.type is 'B' so the query becomes
SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c2.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
and c2.f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and c2.f_imputacion < to_date( '20060601', 'yyyymmdd' )+1
Since c2.ID_BONIF_PERIODICA is not unique I see more rows (with EXISTS only a rown is returned, with join two)
I guess join doesnt work in this case?
Sorry for my stupidness :-(
 
 
June      05, 2006 - 7:49 am UTC 
 
 
but that is just what I said above:
...
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot 
just join (it'll increase the number of returned records)
.......
you can use IN, you can use EXISTS, you can join to an inline view that distincts - but if that column is not unique for the given attributes - you cannot "just join" really - it'll increase the number of records returned 
 
 
 
Merge exists and not exists into one SQL statement
Kandy Train, October   10, 2006 - 11:20 am UTC
 
 
Hi,
Hope this is an appropriate question for thread.
Here is an example,
drop table picking_tab;
drop table locations_tab;
create table picking_tab (pick_ticket_no number, location_id varchar2(10));
create table locations_tab (location_id varchar2(10), warehouse varchar2(10));
insert into picking_tab values(1001, 'Bos');
insert into picking_tab values(1002, 'Bos');
insert into picking_tab values(1002, 'Chi');
insert into picking_tab values(1003, 'Chi');
insert into locations_tab values('Bos', 'NorthEast');
insert into locations_tab values('Chi', 'MidWest');
commit;
What I would like to have is just one SQL statement so that I can pass one literal "NorthEast" or "MidWest" to a view with that SQL statement.
When I pass NorthEast, I want NorthEast specific Pick Tickets and any other pick ticket that picks from more locations, but at least one of them must be from NorthEast.
The results should look like,
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1002 Bos         NorthEast
          1002 Chi         MidWest
          1003 Chi         MidWest
When I pass MidWest, I only need MidWest Pick Tickets that are not shared with any other region.
The results should look like,
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1003 Chi         MidWest
The example may look silly, as I have to come up with something simple for you to help me.
I appreciate your help as always. 
 
October   10, 2006 - 7:57 pm UTC 
 
 
why are there not two northeast records in the first result?
and you know, you do not "pass" things to a view, you apply predicates to a view, what do you mean by "pass"??? 
 
 
 
Merge exists and not exists into one SQL statement 
Kandy Train, October   10, 2006 - 11:36 am UTC
 
 
In the above posting, I didn't mention what I have tried so far. I am putting it here in case you wanted to know it.
--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1003 Chi         MidWest
To get the above results, I used
select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and not exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'NorthEast');
--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1002 Bos         NorthEast
          1002 Chi         MidWest
          1003 Chi         MidWest
To get the above results, I used
select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'MidWest');
I want to have just one SQL where I could pass either "MidWest" or "NorthEast".
Thanks,
 
 
October   10, 2006 - 8:04 pm UTC 
 
 
see above, why just one northeast, doesn't make sense. 
 
 
 
Merge exists and not exists into one SQL statement
Kandy train, October   10, 2006 - 8:19 pm UTC
 
 
Hi Tom,
You are right and my first result (in first posting) should look like,
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1001 Bos         NorthEast
          1002 Bos         NorthEast
          1002 Chi         MidWest
And this is the SQL statement I have used to get this results.
select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'NorthEast');
As you can see, It is the same SQL statement, and the only difference is in EXISTS and NOT EXISTS clauses.
I have a ref cursor that has a return type. I tried to use native dynamic SQL by creating two views with these two sql statements and did hit a error, which is a limitation in 9i.
So, now I have to create a one view that can produce result sets in one SQL statement.
May be I am making it harder for you to understand.
If that's the case, I will post whole procedure togather with the ref cursor tomorrow.
Thanks,
 
 
October   10, 2006 - 8:31 pm UTC 
 
 
ok, why just one midwest then, why not both. 
 
 
 
Merge exists and not exists into one SQL statement  
Kandy Train, October   11, 2006 - 6:34 am UTC
 
 
Pick_TIcket_Id 1002 has two records (Picking from two locations).
When the SQL is run from MidWest, They should only get the Pick Tickets that are purely from MidWest. If a Pick Ticket is filled from NorthEast and MidWest, it is not handled from MidWest, but from NorthEast.
When the SQL is run from NorthEast, They should get all the
Pick tickets that at least have a record in pickcing_tab for the same pick_ticket_no.
Does that make sense??
Thanks, 
 
October   11, 2006 - 8:15 am UTC 
 
 
nope, sorry it does not, this is sounding alot like "procedural logic" to me here, the rules seem arbitrary. 
 
 
 
Merge exists and not exists into one SQL statement   
Kandy Train, October   11, 2006 - 11:02 am UTC
 
 
Thanks and I will use Procedural code to solve this issue.
Thanks again and appreciate your help 
 
 
Performance Issue when using Exists...
VLS, November  02, 2006 - 12:11 pm UTC
 
 
Hi Tom,
In recent Oracle Magazine, in one of the example, you mentioned that exists will perform better in cases where the columns of only one table is used in SELECT statement. For example, 
select x.a, x.b. x.c 
from x, y
where x.a = y.a
and   x.b = :b1
can be coded as
select x.a, x.b. x.c 
from x
where x.b = :b1
and  exists
(select 'x' from y 
where y.a = x.a)
But, in my cases, I have observed that the query performance degrades dratiscally when I change these type of queries to make use of EXISTS. For example, 
SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM       (select customer_id, custcode, birthdate, billcycle, csclimit,
    prgcode, rownum srno from customer_all
    where prgcode in 
    (select prgcode from hss_cust_segment where description='POST')) ca,
    contract_all co,
    curr_co_status ch,
    ccontact_all cc,
    directory_number dn,
    pricegroup_all prg,
    mputmtab tm,
    title t,
    storage_medium sm
WHERE     ca.customer_id = co.customer_id
AND     cc.customer_id = ca.customer_id
AND     cc.ccbill = 'X'
AND     cc.CCTITLE = t.TTL_ID(+)
AND     co.co_id = ch.co_id
AND     ch.ch_status IN ('a', 's')
AND     ca.prgcode = prg.prgcode
AND     co.tmcode = tm.tmcode
AND     tm.vscode = 1
and     exists
(select 'x' from contr_services_cap cs
where     cs.co_id = co.co_id
AND     cs.sncode = 1
AND     cs.dn_id  = dn.dn_id)
and    exists
(select 'x' from contr_devices cd
where    cd.co_id = co.co_id
AND     cd.cd_deactiv_date IS NULL
AND     cd.port_id = sm.sm_id);
Above query did not fetch result in 6 hours whereas below query executed in 4 hours.
SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM       (select customer_id, custcode, birthdate, billcycle, csclimit,
    prgcode, rownum srno from customer_all
    where prgcode in 
    (select prgcode from hss_cust_segment where description='POST')) ca,
    contract_all co,
    curr_co_status ch,
    ccontact_all cc,
    directory_number dn,
    pricegroup_all prg,
    mputmtab tm,
    title t,
    storage_medium sm,
    contr_devices cd,
    contr_services_cap cs
WHERE     ca.customer_id = co.customer_id
AND     cc.customer_id = ca.customer_id
AND     cc.ccbill = 'X'
AND     cc.CCTITLE = t.TTL_ID(+)
AND     co.co_id = ch.co_id
AND     ch.ch_status IN ('a', 's')
and    cs.co_id = co.co_id
AND     cs.sncode = 1
AND     cs.seqno = (SELECT MAX (seqno)
    FROM contr_services_cap x
    WHERE cs.co_id = x.co_id AND x.sncode = 1)
AND     ca.prgcode = prg.prgcode
AND     co.tmcode = tm.tmcode
AND     tm.vscode = 1
AND     cs.dn_id  = dn.dn_id
and    cd.co_id = co.co_id
AND     cd.cd_deactiv_date IS NULL
AND     cd.port_id = sm.sm_id;
Similarly, there were other incidence when exists has caused performance issues. Also, the tables used inside the exists condition are big and have proper indexes even though these fail.
What could be the problem ? Please help...
Regards
VLS 
 
November  02, 2006 - 12:17 pm UTC 
 
 
I did not say "would work better"
I said "semantically and syntactically it would be MORE MEANINGFUL"
are you using the cbo. 
 
 
 
Query Performance with Exists
VLS, November  02, 2006 - 10:14 pm UTC
 
 
Yes Tom, I am using CBO. The inner query also uses a proper indexes. 
Regards
VLS 
 
November  03, 2006 - 9:11 am UTC 
 
 
those two queries do not seem to be even remotely the same here?
how can you even compare them?  they do not return the same results. 
 
 
 
Difference between the 2 Queries
VLS, November  03, 2006 - 9:41 am UTC
 
 
Hi Tom,
Would appreciate if you can let me know the difference between the two queries. Second query is the original query and first one is modified by me. In the Original query, contr_services_cap is joined back to contr_services_cap to get max(seqno) so that only one co_id is returned. I tried to avoid this aggregation by using EXISTS and thought that aggregation should resource intensive.
Regards
VLS 
 
November  03, 2006 - 10:52 am UTC 
 
 
that is precisely the bit that seems strange - how can they be equivalent?
one says "keep only the row with the max seqno"
one says "keep any row that has a mate over here"
first query - no predicate on seqno
second query - predicate on seqno
don't get how they are the same. 
 
 
 
In response to my previous post
VLS, November  03, 2006 - 12:33 pm UTC
 
 
Hi Tom,
"one says "keep only the row with the max seqno"" v/s one says "keep any row that has a mate over here". How does it matter ? For example,
Table A
Column x    Column y
------------------------
100        128
100        290
100        300
100        180
200        768
200        190
300        650
select x from a o
where y = (select max(y) from a i where i.x=o.x.x)
will be 
100
200
300 
No matter 300 has only one value but this one value will be a max for this value of x. So, anyway each of x value will have a max. I used this logic and hence instead of taking max used EXISTS. 
Now, for the predicates on SEQNO, this predicate is only used to get a single row for each co_id. Again using max. This was not required when I used EXISTS.
Regards
VLS
 
 
November  03, 2006 - 2:33 pm UTC 
 
 
you cannot compare the two queries you have.
where exists (something)
where something = (select max(somethingelse) 
are not comparable.  Show me the WHERE exists that is comparable to you:
select x from a o
where y = (select max(y) from a i where i.x=o.x.x)
 
please!!
I don't have (nor do I want) your schema
I don't know your relationships
I did not decompose your query 
I do however have a strong feeling "they just ain't the same" 
 
 
 
exits perform better?
A reader, January   18, 2007 - 9:10 am UTC
 
 
Hi Tom
I had a query like this : which was doing over 2250 LIO/execute -
select   DISTINCT
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type,  'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION',  'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL,  'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod  BO128_Accounts_Mod, ao_demomod  BO129_Demographic_Mod
where  BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID  AND
(  (  (  (  (  (  (  (  ( BO128_Accounts_Mod.MakerID = 1  )   OR  (
BO128_Accounts_Mod.ownedUserID =   400013  )  )  )   OR  (
BO128_Accounts_Mod.AssignedTo =   400013  )  )  )   AND  (  (  (  (  (
BO128_Accounts_Mod.RecordStatus =  'F'  )   OR  (
BO128_Accounts_Mod.RecordStatus =  'F'  )  )  )   OR  (
BO128_Accounts_Mod.RecordStatus =  'F'  )  )  )  )  )  AND (rownum <=  201)  )  )
order by BO128_Accounts_Mod.Cust_Last_Name Asc
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       15      0.10       0.08          0        443          0         201
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       17      0.11       0.09          0        443          0         201
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38  (CRMUSER)
Rows     Row Source Operation
-------  ---------------------------------------------------
    201  SORT UNIQUE (cr=2252 pr=0 pw=0 time=951615 us)
    201   COUNT STOPKEY (cr=443 pr=0 pw=0 time=4815 us)
    201    NESTED LOOPS  (cr=443 pr=0 pw=0 time=4812 us)
    206     TABLE ACCESS FULL AO_ACCTMOD (cr=235 pr=0 pw=0 time=3567 us)
    201     INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=208 pr=0 pw=0 time=1943 us)(object id 18490)
when I changed it to like below (used exists instead of distinct) then it is doing alot less LIO's and peforms better too (in terms of response time) .
select
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type,  'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION',  'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL,  'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod  BO128_Accounts_Mod
where exists  ( select null from
       ao_demomod  BO129_Demographic_Mod
              where
              BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID  AND
(  ( BO128_Accounts_Mod.MakerID = 1  )   OR  ( BO128_Accounts_Mod.ownedUserID =   400020  )     OR  ( BO128_Accounts_Mod.AssignedTo =   400020  )  )     AND  ( BO128_Accounts_Mod.RecordStatus =  'F'     OR  BO128_Accounts_Mod.RecordStatus =  'F'     OR   BO128_Accounts_Mod.RecordStatus =  'F'  )   )
 AND rownum <=  201
order by BO128_Accounts_Mod.Cust_Last_Name Asc
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       15      0.07       0.17          0        735          0         201
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       17      0.07       0.17          0        735          0         201
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 38  (CRMUSER)
Rows     Row Source Operation
-------  ---------------------------------------------------
    201  COUNT STOPKEY (cr=735 pr=0 pw=0 time=5822 us)
    201   FILTER  (cr=735 pr=0 pw=0 time=5618 us)
    206    TABLE ACCESS BY INDEX ROWID AO_ACCTMOD (cr=323 pr=0 pw=0 time=2705 us)
    206     INDEX FULL SCAN IX_AO1_CUST_LAST_NAME (cr=17 pr=0 pw=0 time=225 us)(object id 18495)
    201    FILTER  (cr=412 pr=0 pw=0 time=2461 us)
    201     INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=412 pr=0 pw=0 time=2071 us)(object id 18490)
my question - what is the reason - how distinct peforms poor than exists..
Ajeet 
 
change this query
A reader, March     26, 2007 - 7:47 am UTC
 
 
select PGM_VEH_CONFIG.CNTRY_CD
       , PGM_VEH_CONFIG.LANG_CD
       , PGM_VEH_CONFIG.BUSNS_FCN_USE_CD 
       , PGM_VEH_CONFIG.MODL_YR_NBR
       , PGM_VEH_CONFIG.PVC_ID
       , RTRIM(PGM_VEH_CONFIG.PVC_DESC)
       , PGM_VEH_CONFIG.PVC_MSRP
       , TO_CHAR(PGM_VEH_CONFIG.PVC_EFCTV_DT,'yyyy-mm-dd')
       , TO_CHAR(PGM_VEH_CONFIG.PVC_TERMNT_DT,'yyyy-mm-dd') 
       , PGM_VEH_CONFIG.SELLNG_SRC_CD
       , PGM_VEH_CONFIG.NAMPLT_CD
       , PGM_VEH_CONFIG.MDSNG_MODL_DESGTR
       , PGM_VEH_CONFIG.USG_SEQ_NBR
       , PGM_VEH_CONFIG.MKTG_BODY_STYLE_CD
       , PGM_VEH_CONFIG.SUB_BRAND_CD
       , PGM_VEH_CONFIG.PACKG_OPTN_CD
       , PGM_VEH_CONFIG.SEGMENT_CD
       , PGM_VEH_CONFIG.COMPRS_MODL_CD
       , PGM_VEH_CONFIG.CONFIG_MODL_CD
       , PGM_VEH_CONFIG.OPTN_USG_STMT
    from   gmceycea.PGM_VEH_CONFIG
       , gmceycea.PRICE_RANGE
    where PGM_VEH_CONFIG.BUSNS_FCN_USE_CD     ='038'
      and PGM_VEH_CONFIG.CNTRY_CD             = PRICE_RANGE.CNTRY_CD
      and PGM_VEH_CONFIG.LANG_CD              = PRICE_RANGE.LANG_CD
      and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD     = PRICE_RANGE.BUSNS_FCN_USE_CD
      and PGM_VEH_CONFIG.PVC_MSRP between PRICE_RANGE.START_PRICE_RANGE and PRICE_RANGE.END_PRICE_RANGE
AND   EXISTS (SELECT * from gmceycea.PVC_COLR
          where pvc_colr.cntry_cd           = pgm_veh_config.cntry_cd
          and pvc_colr.lang_cd              = pgm_veh_config.lang_cd
          and pvc_colr.busns_fcn_use_cd     = pgm_veh_config.busns_fcn_use_cd
          and pvc_colr.modl_yr_nbr          = pgm_veh_config.modl_yr_nbr
          and pvc_colr.pvc_id               = pgm_veh_config.pvc_id
          and pvc_colr.busns_fcn_use_cd     = '038'
          and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038');
if PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038'
column having the value 038 then EXISTS part should work otherwise exists part should not work.
kindly rephrase it.
 
March     26, 2007 - 11:06 am UTC 
 
 
huh? 
 
 
Question
Reader, March     27, 2007 - 9:15 am UTC
 
 
Hello,
How do we make Oracle to use index if the query has NOT IN?
SELECT * 
FROM ORDERS 
WHERE STATUS NOT IN ('NEW','OLD')
I created a Function Based Index.
CREATE INDEX IDX_FBI ON ORDERS (DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'
SELECT * 
FROM ORDERS 
WHERE DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'
It does a Full table scan but when I do a COUNT(*) it does a Fast Full Index scan. How to use index while using NOT IN?
Thanks 
March     27, 2007 - 10:16 am UTC 
 
 
no really, what index did you create - given the one you demonstrate does not work.
and what version are you using.
ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table t;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select decode( mod(rownum,3),0,'OLD',1,'NEW'  ) ostatus, a.*
ops$tkyte%ORA10GR2>   from all_objects a;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> drop index t_idx;
Index dropped.
ops$tkyte%ORA10GR2> create index t_idx on t( decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) );
Index created.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where  decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) = 'YES';
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 48000 |    44   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   500 | 48000 |    44   (0
|*  2 |   INDEX RANGE SCAN          | T_IDX |   200 |       |    35   (0
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(DECODE("OSTATUS",'NEW',NULL,'OLD',NULL,'YES')='YES')
ops$tkyte%ORA10GR2> set autotrace off 
 
 
 
A reader, March     28, 2007 - 10:57 am UTC
 
 
Tom,
I have a problem with a select I am not able to built 
consider that
select * from t
where decode (:x, null,'999', t.a)  in (select t2.a from t2 where t2.a = :x)
This is not working when :x is null
When :x is null I will normally have this
select * from t
where '999'in (select t2.a from t2 where t2.a = '')
What I would like to have in this case is a sort of
select * from t
where '999' in ('999')
Could you please help me
Thanks a lot for your help
 
March     28, 2007 - 12:14 pm UTC 
 
 
... This is not working when :x is null  ...
sure it is, it works perfectly.  It just probably doesn't do what you want - that is something else altogether.
select *
  from t
 where :x is null or a in (select t2.a from t2 where t2.a = :x)
seems to say exactly what you meant. 
 
 
 
NOT  IN too slow 
Sagar, May       04, 2007 - 2:31 am UTC
 
 
Hi Tom,
      I have one small NOT IN query on 2 tables which when run on 9i database takes more than 2 hours, I had to kill the session without ouput.But IN works in 2 seconds!.
     I copied same tables and created same indexes in 10gR2 , now the query takes around 5-6 minutes in 10gR2.
     For Details see below.I am interested in knowing what causes  consistent gets to be so large. There is no one connected to the user other than me.
SQL> select count(1) from camps_upload ;
  COUNT(1)
----------
     43656
Elapsed: 00:00:02.65
Execution Plan
----------------------------------------------------------
Plan hash value: 889239046
---------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |   184   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |              |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_UPLOAD | 43656 |   184   (2)| 00:00:03 |
---------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        805  consistent gets
        781  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> select count(1) from camps_master_upload ;
  COUNT(1)
----------
     38266
Elapsed: 00:00:07.71
Execution Plan
----------------------------------------------------------
Plan hash value: 1748044587
----------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |   182   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |                     |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD | 38266 |   182   (2)| 00:00:03 |
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
        338  recursive calls
          0  db block gets
        852  consistent gets
        792  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> select bytes/(1024*1024) usage from user_segments where segment_name  IN ('CAMPS_UPLOAD','CAMPS
_MASTER_UPLOAD') ;
     USAGE
----------
         7
         7
Elapsed: 00:00:04.75
Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view
SP2-0612: Error generating AUTOTRACE EXPLAIN report
Statistics
----------------------------------------------------------
       2017  recursive calls
          0  db block gets
       4111  consistent gets
        309  physical reads
          0  redo size
        444  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         56  sorts (memory)
          0  sorts (disk)
          2  rows processed
SQL> select count(1) from camps_upload where 
  2  (folio_no,comp_code) in ( select folio_no,comp_code from camps_master_upload)  ;
  COUNT(1)
----------
     38121
Elapsed: 00:00:02.50
Execution Plan
----------------------------------------------------------
Plan hash value: 123087718
-------------------------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| IND_CMP_UPD      | 43656 |   554K|    34   (3)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")
Statistics
----------------------------------------------------------
        551  recursive calls
          0  db block gets
        380  consistent gets
        273  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> ed
Wrote file afiedt.buf
  1  select count(1) from camps_upload where
  2* (folio_no,comp_code) not in ( select folio_no,comp_code from camps_master_upload)
SQL> /
  COUNT(1)
----------
      5535
Elapsed: 00:05:57.61
Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056
-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
         20  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> 
Even though index is not used why consistent gets are so abnormally high?.      
 
May       04, 2007 - 1:00 pm UTC 
 
 
the camps_master_upload table was full scanned 43,656 times. 
 
 
NOT IN too slow 
Sagar, May       05, 2007 - 1:22 am UTC
 
 
Thanks Tom,
            I have used "NOT EXIST" instead of "NOT IN" and the query executes within 3-4 seconds.Both tables do not have NULLs in the columns used in the query.
SQL> ED
Wrote file afiedt.buf
  1  select  COUNT(1) FROM camps_upload a
  2   where   not exists   ( select 1  from camps_master_upload b where a.folio_no =b.folio_no
  3* and a.comp_code = b.comp_code  )
SQL> /
  COUNT(1)
----------
      5535
Elapsed: 00:00:03.46
Execution Plan
----------------------------------------------------------
Plan hash value: 1970687259
--------------------------------------------------------------------------------
-----------
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)|
 Time     |
--------------------------------------------------------------------------------
-----------
|   0 | SELECT STATEMENT       |                  |     1 |    26 |   220   (5)|
 00:00:03 |
|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |
          |
|*  2 |   HASH JOIN RIGHT ANTI |                  | 43655 |  1108K|   220   (5)|
 00:00:03 |
|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)|
 00:00:01 |
|   4 |    TABLE ACCESS FULL   | CAMPS_UPLOAD     | 43656 |   554K|   188   (4)|
 00:00:03 |
--------------------------------------------------------------------------------
-----------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A"."FOLIO_NO"="B"."FOLIO_NO" AND "A"."COMP_CODE"="B"."COMP_CODE")
Statistics
----------------------------------------------------------
         17  recursive calls
          0  db block gets
        939  consistent gets
        239  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
            Can you please explain  how "NOT IN" works as "IN" executes within 3-4 seconds even if we use no indexes.This will be helpful for all. 
Thanks in advance.
Wrote file afiedt.buf
  1   select  COUNT(1) FROM camps_upload
  2    where (folio_no,comp_code)   in
  3*   ( select folio_no,comp_code from camps_master_upload)
SQL> /
  COUNT(1)
----------
     38121
Elapsed: 00:00:12.18
Execution Plan
----------------------------------------------------------
Plan hash value: 474859051
--------------------------------------------------------------------------------
-------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU
)| Time     |
--------------------------------------------------------------------------------
-------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4
)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |
 |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4
)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3
)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4
)| 00:00:03 |
--------------------------------------------------------------------------------
-------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")
Statistics
----------------------------------------------------------
        398  recursive calls
          0  db block gets
       1662  consistent gets
        559  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
May       08, 2007 - 10:04 am UTC 
 
 
well, I don't know about you - but I see a "not exists" and an "in"
that is
I see entirely different queries here that do entirely different things and return entirely different answers. 
 
 
NOT IN too slow
Sagar, May       09, 2007 - 1:40 am UTC
 
 
Thanks for the reply Tom,
      Actually my question was continuation of the previously asked quetion just above.
      Anyway here is the problem I am facing.I have a simple "IN" query which executes in less than 5 seconds.If I put "NOT IN" in place of "IN" the query takes more than 5 minutes.Both the tables have around 40,000 records.You had replied that one of the table is full scanned 40,000 times in case of "NOT IN".Then why it is not full scanned it case of "IN"?.
Thanks in advance.
SQL> ed
Wrote file afiedt.buf
  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /
  COUNT(1)
----------
     38121
Elapsed: 00:00:04.28
Execution Plan
----------------------------------------------------------
Plan hash value: 474859051
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")
Statistics
----------------------------------------------------------
        654  recursive calls
          0  db block gets
       1704  consistent gets
       1596  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> ed
Wrote file afiedt.buf
  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) not in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /
  COUNT(1)
----------
      5535
Elapsed: 00:05:28.67
Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056
-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
       1591  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
   
       
 
May       11, 2007 - 9:28 am UTC 
 
 
because in and not in are ENTIRELY DIFFERENT
I don't know why you would even consider comparing them.
 
 
 
NOT IN too slow
Sagar, May       12, 2007 - 4:10 am UTC
 
 
Thanks for the reply Tom,
        Yes IN and NOT IN are different, but from point of view of algorithm for  finding elements from the set A also in the set B (the IN process ) and  finding elements in set A that are not in set B ( the NOT IN process) they are very much similar.
         
May       14, 2007 - 12:55 pm UTC 
 
 
huh, you are kidding right? 
they are very different.
think about this - you are asked to find "where x = 5", that is very very very different from "where x <> 5"
think about what you just said.  Other than the fact both are written in SQL, there is nothing similar about them at all. 
 
 
NOT IN too slow
Sagar, May       16, 2007 - 5:16 am UTC
 
 
Thanks for the reply Tom, 
               I am still not convinced.
Consider this manual process. 
There is person having  2 boxes containing some objects.
Box1 has 2 objects.
Box2 has 3 objects.
1. He is told to find out objects  from box1 that are also in box2.
Suppose he takes 1 minute to finish.This is the IN process.
2. Now we ask him to find objects from box1 that are not there in box2. This is NOT IN process.
For this he takes 100 minutes!.
(the eg. I showed you for IN executes within 4 seconds while for  NOT IN with same data it takes 400 seconds)
In this case can we believe him when he takes 100 minutes for this ?. 
IN and NOT IN have similar algorithm only the difference is that search is carried out till last element in the case of NOT IN but I don't think this can justify the delay.
 
May       16, 2007 - 10:31 am UTC 
 
 
do you understand why
where x = 5
where x <> 5 
would pretty much be processed very differently???
eg: where x = 5, obviously, a candidate for an index - find 5, get data
where x <> 5, obviously - NOT a candidate for an index, indexes are using to "find" stuff, not "not find" stuff.
in and not in are like NIGHT and DAY.  eg: DIFFERENT. 
 
 
NOT IN and filter
Jonathan Lewis, May       16, 2007 - 7:11 pm UTC
 
 
Sagar, in one of your posts you say: 
"Both tables do not have NULLs in the columns used in the query." However, the predicate section of the execution plan shows that you have not declared these columns to be NOT NULL:
   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))If you do not declare the columns as NOT NULL, the optimizer has to assume that they may contain nulls, hence the use of the 
lnnvl() function calls that disable the use of indexes, and leave Oracle doing a tablescan of the 
camps_master_upload table for (potientially) every row in the 
camps_upload table.
Declare the columns not null, or add '{column} is not null' predicates for all for column referenced, and you would probably get the same anti-join as you did with the 'not exists' version of the query. For related comments see:  
http://jonathanlewis.wordpress.com/2007/02/25/not-in/   
 
 
NOT IN too slow 
Sagar, May       17, 2007 - 6:54 am UTC
 
 
Thanks Jonathan!!!
Jonathan you are absolutely right!!!.Thank you very much.
I just added not null constraint to the referenced columns and its DONE ;).See the output below.Its good example for oracle fans.
Also thanks Tom for taking his very precious time for this query.
SQL> alter table camps_master_upload modify (folio_no not null , comp_code not null ) ; 
Table altered.
SQL> alter table camps_upload modify (folio_no not null , comp_code not null ) ; 
Table altered.
SQL> ed
Wrote file afiedt.buf
  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code)  in ( select folio_no,comp_code from camps_master_upload )
SQL> /
  COUNT(1)
----------
     38121
Elapsed: 00:00:00.54
Execution Plan
----------------------------------------------------------
Plan hash value: 474859051
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> ed
Wrote file afiedt.buf
  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code) not  in ( select folio_no,comp_code from camps_master_upload )
SQL> /
  COUNT(1)
----------
      5535
Elapsed: 00:00:00.16
Execution Plan
----------------------------------------------------------
Plan hash value: 1887649868
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|                     | 43655 |  1108K|   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
 
NOT IN too slow...
Sagar, May       17, 2007 - 8:14 am UTC
 
 
Note that there are no indexes on both the tables.
 
May       17, 2007 - 11:25 am UTC 
 
 
all I've been trying to get you to realize is:
not in and in are as fundamentally different as night and day are.
not in, especially with regards to nullality, is hugely different. 
 
 
Thanks Tom
Sagar, May       17, 2007 - 12:06 pm UTC
 
 
Thanks once again Tom,
               Yes I got it now. Each day I learn something new here.Keep up the great work.I am a Tom Kyte fan ;).
 
 
I need this
rebeccah tukisi, October   23, 2007 - 7:36 am UTC
 
 
good afternoon Tom
please help me to register on your site 
so that i can email question too, i kind of new to the site and the oracle world
do you also help with the discoverer queries/reports
from rebeccah 
October   23, 2007 - 11:14 am UTC 
 
 
nobody registers here.
when i have time, I permit new questions to be asked - it is first come, first serve and catch as catch can. 
 
 
Sachin, March     10, 2008 - 7:30 am UTC
 
 
Hi Tom ,
Look at below  sql query , 
insert into rp_alert_cdr
          select
          /*+ INDEX(c, CDR_IDX2)*/   
          al_number_seq.currval,
  c.*
         from
  cdr c
  where
 c.prof_num = lv_high_usage_data(i).plg_prof_num and
 c.insert_dt >= lv_start_time and
 c.insert_dt <= lv_end_time and
 c.prof_num is not null and
 c.toll_value > 0 and
 c.b_intl_ind = 2 and
 c.b_off_hook_ind = 1 and
 c.orig_type <> 1 and
 upper(c.DIALLED_DIGITS) Not Like '101%'  and 
 c.term_type <> 0 and 
nvl(c.cic,999999) not in ('*V','*U');  -- added new clause
My process is ,just inserting records from CDR table to RP_ALERT_CDR only those record which condition is qulifiy the where clause. this was running as per need,, now i have added new condition.
"and  nvl(c.cic,999999)  not in ('*V','*U')"  -- added new clause
cdr table has CIC column varchar2(5) datatype. it can contain null record also . 
in current time , we have only *V,*U and null record in CIC column, after adding condition with out using NVL
it gives 0 record
and when we use query with nvl function it give desire 
result.
  
March     10, 2008 - 11:40 am UTC 
 
 
"and  nvl(c.cic,999999)  not in ('*V','*U')"  -- added new clause
cdr table has CIC column varchar2(5) datatype.one utterly and completely wonders why you would nvl to a number then doesn't one?  If you have a varchar2(5) - what could be the possible logic behind nvl'ing to 999,999???
but anyway - you have this column cic
it has in it:
a) *V
b) *U
c) a null value
Now, you want to keep all records where cic is not in *V, *U
Well, null is neither NOT IN, nor IN ('*v','*u', NULL is "not known"
so, where null not in (a,b) - is NOT TRUE (nor is it false)
where NOT(null not in (a,b)) - is NOT TRUE (nor is it false)
they are unknown.
so, currently, seems what you really meant to say was:
where cic is null
plain and simple 
 
 
 
sachin, March     12, 2008 - 1:30 am UTC
 
 
Thanks Tom ,
i was missing this.  
 
are these 2 SQLs are same
Reene, March     14, 2008 - 6:39 am UTC
 
 
Hi Tom
I have a SQL developed by some developer.
select  ooh.order_number,count(ool.line_id)
from       ont.oe_order_headers_all ooh,
           ont.oe_order_lines_all ool,
           wsh.wsh_delivery_Details wdd
           where
          exists
          ( select  */  'X'
          from    ont.oe_order_headers_all oh,
                  ont.oe_order_lines_all ol,
                  wsh.wsh_delivery_Details wdd,
                  ont.OE_SYSTEM_PARAMETERS_ALL osp
                  where oh.header_id=ol.header_id and
                   ol.line_id=wdd.source_line_id (+) and
                  ( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
                   oh.org_id=3171 and
                   oh.order_source_id=1203 and
                   nvl(wdd.released_status,'x') <> 'C' and
                   nvl(wdd.released_status,'x') <> 'D' and
                   ol.shippable_flag='Y' and
                   ooh.order_number = oh.order_number
         ) and
          ooh.order_source_id=1203  and
          ooh.header_id=ool.header_id and
          nvl(wdd.released_status,'x') <> 'D' and
          ool.line_id=wdd.source_line_id (+) and
          ool.shippable_flag='Y'
group by ooh.order_number
/
I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can be re-wrtitten as :
select     oh.order_number,count(ol.line_id)
from       ont.oe_order_headers_all oh,
           ont.oe_order_lines_all ol,
           wsh.wsh_delivery_Details wdd,
           ont.OE_SYSTEM_PARAMETERS_ALL osp
where
          oh.order_source_id=1203  and
          oh.header_id=ol.header_id and
          oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
          oh.org_id=3171 and
          nvl(wdd.released_status,'x') <> 'D' and
          nvl(wdd.released_status,'x') <> 'C' and
          ol.line_id=wdd.source_line_id (+) and
          ol.shippable_flag='Y'
group by oh.order_number
/
is it correct .
Thanks 
March     15, 2008 - 9:37 am UTC 
 
 
run it - and observe the two totally different answers you will likely get.
no, definitely not the same.   
 
 
typo in above post
Reene, March     14, 2008 - 6:41 am UTC
 
 
select  ooh.order_number,count(ool.line_id)
from       ont.oe_order_headers_all ooh,
           ont.oe_order_lines_all ool,
           wsh.wsh_delivery_Details wdd
           where
          exists
          ( select   'X'
          from    ont.oe_order_headers_all oh,
                  ont.oe_order_lines_all ol,
                  wsh.wsh_delivery_Details wdd,
                  ont.OE_SYSTEM_PARAMETERS_ALL osp
                  where oh.header_id=ol.header_id and
                   ol.line_id=wdd.source_line_id (+) and
                  ( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
                   oh.org_id=3171 and
                   oh.order_source_id=1203 and
                   nvl(wdd.released_status,'x') <> 'C' and
                   nvl(wdd.released_status,'x') <> 'D' and
                   ol.shippable_flag='Y' and
                   ooh.order_number = oh.order_number
         ) and
          ooh.order_source_id=1203  and
          ooh.header_id=ool.header_id and
          nvl(wdd.released_status,'x') <> 'D' and
          ool.line_id=wdd.source_line_id (+) and
          ool.shippable_flag='Y'
group by ooh.order_number
/
I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can 
be re-wrtitten as :
select     oh.order_number,count(ol.line_id)
from       ont.oe_order_headers_all oh,
           ont.oe_order_lines_all ol,
           wsh.wsh_delivery_Details wdd,
           ont.OE_SYSTEM_PARAMETERS_ALL osp
where
          oh.order_source_id=1203  and
          oh.header_id=ol.header_id and
          oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
          oh.org_id=3171 and
          nvl(wdd.released_status,'x') <> 'D' and
          nvl(wdd.released_status,'x') <> 'C' and
          ol.line_id=wdd.source_line_id (+) and
          ol.shippable_flag='Y'
group by oh.order_number
/
is it correct .
Thanks
 
 
correct
Reene, April     07, 2008 - 4:05 am UTC
 
 
Thanks Tom,
even though the number of records fethced were same,but the data was different. that is order_numbers were same but the counts were different.thanks for poiniting it out.
understanding them a bit better now.
thanks 
 
A reader, July      23, 2008 - 12:37 pm UTC
 
 
Tom, 
After reading this thread, i am running the IN & EXIST query for emp and dept table. Both the query shows the same execution plan. But it should be different. For IN query,optimizer should use index scan on emp table. For exists query, optimizer should use FTS on emp table. Please correct me if i am wrong..
SQL> select count(*) from emp;
  COUNT(*)
----------
    458752
 
SQL> select count(*) from dept;
  COUNT(*)
----------
         2
 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'EMP',ESTIMAT
E_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);
PL/SQL procedure successfully completed.
SQL> 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'DEPT',ESTIMA
TE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);
PL/SQL procedure successfully completed.
SQL> 
SQL> select count(ename) from emp where deptno in(select deptno from dept);
COUNT(ENAME)
------------
         100
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)
   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> SELECT COUNT(ENAME) from emp where exists (select null  from dept where dept.deptno =
 emp.deptno);
COUNT(ENAME)
------------
         100
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (SEMI) (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)
   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL>  
 
 
Query Tuning
Thakur Manoj, November  10, 2008 - 3:15 am UTC
 
 
Dear Sir
This query is taking 3 seconds to execute it
select  distinct
           atcf.base_entity_id
         , be.entity_name_full As Cutomer_Name
         , mbf.bank_facility_name AS Account_No
         , atcf.proposed_limit AS Amount_Sanction
         , atcf.os_balance AS Amount_Outstand
         , rfdt.filter_date
         , bd.mst_borrower_state_id
    from     aud_trn_c3_fac atcf
           , rvw_cas_getchkfacdt rfdt
           , base_entity be
           , rvw_cas_stmt_borr_dtls bd
           , mst_bank_facility mbf
Where     atcf.trans_datetime = rfdt.facility_dt and
          atcf.base_entity_id = rfdt.base_entity_id and
          atcf.base_entity_id = be.id and
          atcf.id = rfdt.trn_c3_fac_id and
          rfdt.base_entity_id = bd.base_entity_id and
          rfdt.sub_process_id = bd.sub_process_id and
          rfdt.filter_date = bd.filter_date and
          rfdt.base_entity_id = be.id and
          atcf.mst_bank_facility_id = mbf.id and
          mbf.mst_facilitytype_id = 1 and
          (atcf.base_entity_id, atcf.id) not in
          (
            select distinct tss.base_entity_id, 
            tss.trn_prop_facility_id 
            from 
            rvw_cas_stmt_sanc_sec_dtls tss)
            
Please check the explain Plan
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=292 Card=1 Bytes=153
          )
   1    0   SORT (UNIQUE) (Cost=228 Card=1 Bytes=153)
   2    1     FILTER
   3    2       MERGE JOIN (Cost=226 Card=1 Bytes=153)
   4    3         SORT (JOIN) (Cost=64 Card=1 Bytes=105)
   5    4           MERGE JOIN (Cost=63 Card=1 Bytes=105)
   6    5             SORT (JOIN) (Cost=14 Card=48 Bytes=3696)
   7    6               HASH JOIN (Cost=12 Card=48 Bytes=3696)
   8    7                 HASH JOIN (Cost=7 Card=48 Bytes=2352)
   9    8                   TABLE ACCESS (BY INDEX ROWID) OF 'MST_BANK
          _FACILITY' (Cost=2 Card=29 Bytes=754)
  10    9                     INDEX (RANGE SCAN) OF 'IDX_MST_BANK_FACI
          LITY' (NON-UNIQUE) (Cost=1 Card=29)
  11    8                   TABLE ACCESS (FULL) OF 'AUD_TRN_C3_FAC' (C
          ost=4 Card=48 Bytes=1104)
  12    7                 TABLE ACCESS (FULL) OF 'BASE_ENTITY' (Cost=4
           Card=1567 Bytes=43876)
  13    5             SORT (JOIN) (Cost=49 Card=1 Bytes=28)
  14   13               VIEW OF 'RVW_CAS_GETCHKFACDT' (Cost=48 Card=1
          Bytes=28)
  15   14                 SORT (GROUP BY) (Cost=48 Card=1 Bytes=63)
  16   15                   MERGE JOIN (Cost=46 Card=1 Bytes=63)
  17   16                     SORT (JOIN) (Cost=10 Card=1 Bytes=26)
  18   17                       HASH JOIN (Cost=8 Card=1 Bytes=26)
  19   18                         TABLE ACCESS (FULL) OF 'TRN_C3_FAC'
          (Cost=3 Card=328 Bytes=2952)
  20   18                         TABLE ACCESS (FULL) OF 'AUD_TRN_C3_F
          AC' (Cost=4 Card=951 Bytes=16167)
  21   16                     FILTER
  22   21                       SORT (JOIN)
  23   22                         VIEW (Cost=44 Card=97 Bytes=3589)
  24   23                           UNION-ALL
  25   24                             SORT (UNIQUE) (Cost=30 Card=89 B
          ytes=2492)
  26   25                               FILTER
  27   26                                 TABLE ACCESS (FULL) OF 'AUD_
          TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)
  28   27                                   SORT (AGGREGATE)
  29   28                                     TABLE ACCESS (FULL) OF '
          MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  30   26                                 TABLE ACCESS (FULL) OF 'MSTW
          F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  31   24                             SORT (UNIQUE) (Cost=6 Card=8 Byt
          es=176)
  32   31                               FILTER
  33   32                                 TABLE ACCESS (FULL) OF 'AUD_
          TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)
  34   33                                   SORT (AGGREGATE)
  35   34                                     TABLE ACCESS (FULL) OF '
          MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  36   32                                 TABLE ACCESS (FULL) OF 'MSTW
          F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  37    3         SORT (JOIN) (Cost=162 Card=3 Bytes=144)
  38   37           VIEW OF 'RVW_CAS_STMT_BORR_DTLS' (Cost=162 Card=3
          Bytes=144)
  39   38             UNION-ALL
  40   39               SORT (UNIQUE) (Cost=56 Card=1 Bytes=51)
  41   40                 MERGE JOIN (Cost=54 Card=1 Bytes=51)
  42   41                   SORT (JOIN) (Cost=49 Card=1 Bytes=24)
  43   42                     VIEW (Cost=47 Card=1 Bytes=24)
  44   43                       SORT (GROUP BY) (Cost=47 Card=1 Bytes=
          45)
  45   44                         MERGE JOIN (Cost=44 Card=1 Bytes=45)
  46   45                           SORT (JOIN) (Cost=8 Card=4 Bytes=7
          6)
  47   46                             HASH JOIN (Cost=6 Card=4 Bytes=7
          6)
  48   47                               TABLE ACCESS (FULL) OF 'TRN_C3
          _CORPORATE' (Cost=2 Card=110 Bytes=660)
  49   47                               TABLE ACCESS (FULL) OF 'AUD_TR
          N_C3_CORPORATE' (Cost=3 Card=358 Bytes=4654)
  50   45                           FILTER
  51   50                             SORT (JOIN)
  52   51                               VIEW (Cost=44 Card=97 Bytes=25
          22)
  53   52                                 UNION-ALL
  54   53                                   SORT (UNIQUE) (Cost=30 Car
          d=89 Bytes=2492)
  55   54                                     FILTER
  56   55                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)
  57   56                                         SORT (AGGREGATE)
  58   57                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  59   55                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  60   53                                   SORT (UNIQUE) (Cost=6 Card
          =8 Bytes=176)
  61   60                                     FILTER
  62   61                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)
  63   62                                         SORT (AGGREGATE)
  64   63                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  65   61                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  66   41                   SORT (JOIN) (Cost=6 Card=358 Bytes=9666)
  67   66                     TABLE ACCESS (FULL) OF 'AUD_TRN_C3_CORPO
          RATE' (Cost=3 Card=358 Bytes=9666)
  68   39               SORT (UNIQUE) (Cost=53 Card=1 Bytes=51)
  69   68                 MERGE JOIN (Cost=51 Card=1 Bytes=51)
  70   69                   SORT (JOIN) (Cost=48 Card=3 Bytes=72)
  71   70                     VIEW (Cost=46 Card=3 Bytes=72)
  72   71                       SORT (GROUP BY) (Cost=46 Card=3 Bytes=
          135)
  73   72                         MERGE JOIN (Cost=43 Card=49 Bytes=22
          05)
  74   73                           SORT (JOIN) (Cost=7 Card=30 Bytes=
          570)
  75   74                             HASH JOIN (Cost=5 Card=30 Bytes=
          570)
  76   75                               TABLE ACCESS (FULL) OF 'TRN_C3
          _BANK' (Cost=2 Card=4 Bytes=24)
  77   75                               TABLE ACCESS (FULL) OF 'AUD_TR
          N_C3_BANK' (Cost=2 Card=68 Bytes=884)
  78   73                           FILTER
  79   78                             SORT (JOIN)
  80   79                               VIEW (Cost=44 Card=97 Bytes=25
          22)
  81   80                                 UNION-ALL
  82   81                                   SORT (UNIQUE) (Cost=30 Car
          d=89 Bytes=2492)
  83   82                                     FILTER
  84   83                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)
  85   84                                         SORT (AGGREGATE)
  86   85                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  87   83                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  88   81                                   SORT (UNIQUE) (Cost=6 Card
          =8 Bytes=176)
  89   88                                     FILTER
  90   89                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)
  91   90                                         SORT (AGGREGATE)
  92   91                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
  93   89                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
  94   69                   SORT (JOIN) (Cost=4 Card=68 Bytes=1836)
  95   94                     TABLE ACCESS (FULL) OF 'AUD_TRN_C3_BANK'
           (Cost=2 Card=68 Bytes=1836)
  96   39               SORT (UNIQUE) (Cost=53 Card=1 Bytes=45)
  97   96                 MERGE JOIN (Cost=51 Card=1 Bytes=45)
  98   97                   SORT (JOIN) (Cost=48 Card=3 Bytes=72)
  99   98                     VIEW (Cost=46 Card=3 Bytes=72)
 100   99                       SORT (GROUP BY) (Cost=46 Card=3 Bytes=
          135)
 101  100                         MERGE JOIN (Cost=43 Card=29 Bytes=13
          05)
 102  101                           SORT (JOIN) (Cost=7 Card=18 Bytes=
          342)
 103  102                             HASH JOIN (Cost=5 Card=18 Bytes=
          342)
 104  103                               TABLE ACCESS (FULL) OF 'TRN_C3
          _SOVEREIGN' (Cost=2 Card=3 Bytes=18)
 105  103                               TABLE ACCESS (FULL) OF 'AUD_TR
          N_C3_SOVEREIGN' (Cost=2 Card=53 Bytes=689)
 106  101                           FILTER
 107  106                             SORT (JOIN)
 108  107                               VIEW (Cost=44 Card=97 Bytes=25
          22)
 109  108                                 UNION-ALL
 110  109                                   SORT (UNIQUE) (Cost=30 Car
          d=89 Bytes=2492)
 111  110                                     FILTER
 112  111                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)
 113  112                                         SORT (AGGREGATE)
 114  113                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
 115  111                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
 116  109                                   SORT (UNIQUE) (Cost=6 Card
          =8 Bytes=176)
 117  116                                     FILTER
 118  117                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)
 119  118                                         SORT (AGGREGATE)
 120  119                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
 121  117                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
 122   97                   SORT (JOIN) (Cost=4 Card=53 Bytes=1113)
 123  122                     TABLE ACCESS (FULL) OF 'AUD_TRN_C3_SOVER
          EIGN' (Cost=2 Card=53 Bytes=1113)
 124    2       MERGE JOIN (Cost=64 Card=1 Bytes=69)
 125  124         SORT (JOIN) (Cost=60 Card=1 Bytes=50)
 126  125           VIEW (Cost=59 Card=1 Bytes=50)
 127  126             SORT (GROUP BY) (Cost=59 Card=1 Bytes=47)
 128  127               MERGE JOIN (Cost=56 Card=1 Bytes=47)
 129  128                 SORT (JOIN) (Cost=7 Card=1 Bytes=28)
 130  129                   HASH JOIN (Cost=5 Card=1 Bytes=28)
 131  130                     TABLE ACCESS (FULL) OF 'TRN_SANC_SECURIT
          Y' (Cost=2 Card=270 Bytes=2430)
 132  130                     TABLE ACCESS (FULL) OF 'AUD_TRN_SANC_SEC
          URITY' (Cost=2 Card=428 Bytes=8132)
 133  128                 FILTER
 134  133                   SORT (JOIN)
 135  134                     VIEW OF 'RVW_CAS_GETCHKFACDT' (Cost=48 C
          ard=1 Bytes=19)
 136  135                       SORT (GROUP BY) (Cost=48 Card=1 Bytes=
          63)
 137  136                         MERGE JOIN (Cost=46 Card=1 Bytes=63)
 138  137                           SORT (JOIN) (Cost=10 Card=1 Bytes=
          26)
 139  138                             HASH JOIN (Cost=8 Card=1 Bytes=2
          6)
 140  139                               TABLE ACCESS (FULL) OF 'TRN_C3
          _FAC' (Cost=3 Card=328 Bytes=2952)
 141  139                               TABLE ACCESS (FULL) OF 'AUD_TR
          N_C3_FAC' (Cost=4 Card=951 Bytes=16167)
 142  137                           FILTER
 143  142                             SORT (JOIN)
 144  143                               VIEW (Cost=44 Card=97 Bytes=35
          89)
 145  144                                 UNION-ALL
 146  145                                   SORT (UNIQUE) (Cost=30 Car
          d=89 Bytes=2492)
 147  146                                     FILTER
 148  147                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)
 149  148                                         SORT (AGGREGATE)
 150  149                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
 151  147                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
 152  145                                   SORT (UNIQUE) (Cost=6 Card
          =8 Bytes=176)
 153  152                                     FILTER
 154  153                                       TABLE ACCESS (FULL) OF
           'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)
 155  154                                         SORT (AGGREGATE)
 156  155                                           TABLE ACCESS (FULL
          ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)
 157  153                                       TABLE ACCESS (FULL) OF
           'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)
 158  124         SORT (JOIN) (Cost=4 Card=1 Bytes=19)
 159  158           TABLE ACCESS (FULL) OF 'AUD_TRN_SANC_SECURITY' (Co
          st=2 Card=1 Bytes=19)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      35867  consistent gets
      22459  physical reads
          0  redo size
       3309  bytes sent via SQL*Net to client
        543  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
        964  sorts (memory)
          0  sorts (disk)
         68  rows processed
Please help to tune the below query?
Regards 
 
IN&EXISTS
Jyothsna, May       03, 2010 - 1:47 am UTC
 
 
In Rooms table ive 6 rows (2,4,8,16,32,64 )
if i fire 
select roomsize from rooms
where 30>roomsize  
Then o/p is 4 rows (2,4,8,16)
But the following gives
select roomsize  from rooms 
where exists  (select 1 from rooms
              where 30 > roomsize)It gives all the 6 rows 
Why it happens???
 
May       06, 2010 - 11:37 am UTC 
 
 
see the other place you asked this identical question, and only ask in one place from now on please 
 
 
working of exists
Mukund Nampally, July      08, 2010 - 12:36 pm UTC
 
 
HI Jyothsna,
Well thats the basics!
Exists is a boolean operator in SQL
The query you've written is correlated.The subquery runs for every record in the parent query.So for every parent record your subquery checks to see is there ATLEAST one record where size > 30.Hence exists returns TRUE and this happens to all the records.
 
July      08, 2010 - 1:20 pm UTC 
 
 
it might run for every record, or it very well could be turned into a join, it depends. 
 
 
ooh! the above is not correlated
Mukund Nampally, July      08, 2010 - 1:05 pm UTC
 
 
I'm sorry the above is not correlated.
you will get the desired results by using alias for parent query and change the sub query to reference the record in the parent query
so this is your algorithm:
for record in tablep
 do  
   for record in tablec
     do
       check if 30 > tablep.roomsize(outer for loop)  
         if TRUE break;
     done 
   ouput record from tablep
 done
if you refer to tablec instead of tablep you'll retrieve all the records.Hope that helps
The question I would to ask Tom is:
what is the difference in explain plan if just selecting constant from a huge table instead of certain columns in that Table?
The optimizer should be intelligent enough to just output the constant value instead of scanning the table
 
July      08, 2010 - 1:21 pm UTC 
 
 
it would have to either
a) scan the table
b) scan an index on a column that is defined as NOT NULL on that table
we need to know how many times to output the constant after all. 
 
 
clear understanding of in/exists -not in /not exists
A reader, July      27, 2010 - 8:19 am UTC
 
 
Hi Tom,
The below are few questions which most of the times confuses but wanted to have crystal clear understanding of all together 
could you please explain them in detail ?
1.Could you please explain the implementations of the execution plans of 'IN' and 'EXISTS' in 9i,10g,11g ,
  What are the situations in general to use each one properly ? Can I use them interchnagebly *ALWAYS* (no impact to the output)
2.Could you please explain the implementations of the execution plans of 'NOT IN' and 'NOT EXISTS' in 9i,10g,11g ,
  What are the situations in general to use each one properly ?Can I use them interchangeably *ALWAYS* (no impact to the output)
3.I have come across a situation where I have to use either 'JOIN' or 'IN' for two table A and B -
    Size of A >> size of B - which is better to use when
   a) I want to select only columns of A 
   b) I want to select only columns of B
   c)I want to select columns from both table A and B .
4.which do you think more appropriate to use JOINS (or) Operators like IN/EXITS/NOT IN /NOT EXITS 
 for the same output ?
  
Thanks a million for your time  
July      27, 2010 - 12:48 pm UTC 
 
 
1) they are pretty much interchangeable.  
Never say never, never say always, I always say.  
The optimizer is a piece of software written by (imperfect) human beings.  In most all cases they (in/exists) will be interchangeable - however, there will always be exceptions.  No, there are no generalizations you can make about the exceptions - they wouldn't be exceptional if you could :)  You might even refer to the exceptions as "product issues" (which they would be).  You'll find as many exception on one side of the coin as the other (in/exists being sides of a coin).  Meaning - don't pick one over the other as law.  Use the one that SEMANTICALLY means the most and is the most convenient to code or read.  
Not in and Not exists are different.  
where outer.x not in (select y from t)
is NOT the same as
where not exists (select null from t where t.y = outer.x)
UNLESS the expression "y" is not null.  That said:
where outer.x not in (select y from t where y is not null)
is the same as
where not exists (select null from t where t.y = outer.x)
In 99.99999999999999999999% of the cases - if Y is nullable, where outer.x not in (select y from t) is a *bug* - a mistake made by the coder (who does not realize that if Y is null once in T, then ZERO ROWS will be satisfied by that predicate).
So, if you always select NOT NULLABLE expressions - the two are interchangable as above.
If the expression is nullable, then you have to use the one that semantically answers the question you are asking!!
3) no you didn't.
You either need to join (because you need data from the joined table) or you didn't.  
That is where the question begins and ends - you either 
a) need to join
b) didn't need to join.
you should be able to answer the question yourself from those comments...
4)  
https://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html sums up my feelings on all of this. 
 
 
 
senario
radha, August    23, 2010 - 5:45 am UTC
 
 
data like 1123425
result i want like 12345 but where 1 is second priority one 
August    26, 2010 - 10:00 am UTC 
 
 
that makes absolutely no sense to me.
provide a tiny bit of logic here, pretend you are explaining this to your mom, use small words and a large level of detail 
 
 
A Reader, September 21, 2010 - 6:48 am UTC
 
 
Tom,
Thanks for your time.
I have the following query.
SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
SQL> set timing on;
SQL> SELECT *
  t1  
    WHERE
        not exists
          (
              select 'x' from t2   where t2.col2 = t1.col3
           );
557087 rows selected.
Elapsed: 00:01:58.29
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=74180 Card=542898 Bytes=148211154)
   1    0   MERGE JOIN (ANTI) (Cost=74180 Card=542898 Bytes=148211154)
   2    1     SORT (JOIN) (Cost=72417 Card=897917 Bytes=239743839)
   3    2       TABLE ACCESS (FULL) OF 't1' (Cost=1995 Card=897917 Bytes=239743839)
   4    1     SORT (UNIQUE) (Cost=1763 Card=355021 Bytes=2130126)
   5    4       TABLE ACCESS (FULL) OF 't2' (Cost=601 Card=355021 Bytes=2130126)
Statistics
----------------------------------------------------------
          0  recursive calls
        358  db block gets
      42933  consistent gets
      92415  physical reads
          0  redo size
   67671164  bytes sent via SQL*Net to client
     409184  bytes received via SQL*Net from client
      37141  SQL*Net roundtrips to/from client
          0  sorts (memory)
          2  sorts (disk)
     557087  rows processed
SQL> Set autotrace off;
SQL>
Optimiser is not choosing HASH JOIN ANTI.
I tried hinting but it follows the same plan.
we are at 
a) 9.2.0.8
b) All stats are up to date
c) count(*) of t1 & t2 matches almost the cardinality
d) HASH parameters
NAME                         VALUE
hash_join_enabled            TRUE
hash_area_size               1048576
e) Optimizer parameter 
alter session set always_anti_join = HASH
is depricated in 9i so doesnt want to use undocumented parameter.
 
Kindly suggest what would be the reason/s Optimiser not choosing the HASH ANTI JOIN. 
 
    
 
September 21, 2010 - 4:04 pm UTC 
 
 
it chose a merge join ANTI instead...
are you using pga_aggregate_target?
any non-typical features like shared server?
what is the sort area size, given that the hash area size is set so small...
 
 
 
aa
cc, September 23, 2010 - 4:42 am UTC
 
 
bb 
 
HASH ANTI Join
A Reader, September 23, 2010 - 9:14 am UTC
 
 
Tom,
Thanks a lot for your time. 
...contd above question
My fault I was giving HASH_AJ hint in the main query rather then inner query
After giving appropriate hint
I got the following trace
SQL> set timing on;
SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
 
SQL> SELECT *
    FROM   
   t1 dt
    WHERE
        not exists
        (  select /*+HASH_AJ*/ 'x' from t2 ma where t2.col2 = t1.col3
        ); 
560969 rows selected.
Elapsed: 00:03:04.14
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=132646 Card=542898 Bytes=148211154)
   1    0   HASH JOIN (ANTI) (Cost=132646 Card=542898 Bytes=148211154)
   2    1     TABLE ACCESS (FULL) OF 't1' (Cost=1995 Card=897917 Bytes=239743839)
   3    1     TABLE ACCESS (FULL) OF 't2' (Cost=601    Card=355021 Bytes=2130126)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      43315  consistent gets
      68807  physical reads
          0  redo size
   70888229  bytes sent via SQL*Net to client
     412023  bytes received via SQL*Net from client
      37399  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     560969  rows processed
Above Plan is costlier then SORT MERGE ANTI posted earlier.
( time taken also more)
Other details
which you asked
SQL> show  parameter pga_aggregate_target;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 209715200
SQL> select distinct server from v$session;
SERVER
---------
DEDICATED
SQL> show parameter sort_area;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sort_area_retained_size              integer     1048576
sort_area_size                       integer     524288
SQL> show parameter workarea_size_policy;
NAME                                 TYPE  VALUE
-----------  --------------------------------     ----------
 
workarea_size_policy                 string           MANUAL
SQL>
========================================================================================================
I tested the same in our test env ( test env was refreshed recenlty from live - using exp / IMP )
and found that Oracle chosen the best plan ( ANTI HASH JOIN) with lesser cost ( for  tables t1, t2 which are alomost same as of live , 
except the data distribution etc :) )
here its is :
SQL> set pagesize 90;
SQL> set linesize 123;
SQL> set autotrace traceonly;
 
SQL> SELECT *
      FROM
           t1  
    WHERE   
        not exists    
 ( 
              select 'x' from t2 ma where  t2.col2 = t1.col3
           );
548755 rows selected.
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=6595 Card=537594      Bytes=145150380)
   1    0   HASH JOIN (ANTI) (Cost=6595 Card=537594 Bytes=145150380)
   2    1     TABLE ACCESS (FULL) OF 't1' (Cost=2140 Card=888661 Bytes=234606504)
   3    1     TABLE ACCESS (FULL) OF 't2' (Cost=632     Card=351069 Bytes=2106414)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      45433  consistent gets
      69231  physical reads
          0  redo size
   70024207  bytes sent via SQL*Net to client
     403068  bytes received via SQL*Net from client
      36585  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     548755  rows processed
Other details.
SQL> show  parameter pga_aggregate_target;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target                 big integer 209715200
SQL> select distinct server from v$session;
SERVER
---------
DEDICATED
SQL> show parameter sort_area;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sort_area_retained_size              integer     1048576
sort_area_size                       integer     524288
SQL> show parameter workarea_size_policy;
NAME                                 TYPE  VALUE
-----------  --------------------------------     ----------
 
workarea_size_policy                 string   AUTO
SQL>
Could you please
a) comment of above diffrences in live? (HASH_AJ is costlier then SM_AJ )
b) In Test it is doing good ( because of use of PGA .. workwareasize policy?)
c) Any other way to write the main query (  i used NOT IN ( .... where col is NOT NULL) but cost is in Millions)
regards
 
 
September 27, 2010 - 9:28 am UTC 
 
 
... Above Plan is costlier then SORT MERGE ANTI posted earlier.
( time taken also more ...
and that means... it was right without the hint.
why are you using manual memory management?  On one system you are using manual, on the other auto - be consistent.  that is why (a) happened - your optimizer settings are very different.  You have constrained to a very small size the amount of memory that can be used with the manual system.
as for (c), forget the cost, what is the plan we see.
 
 
 
 
contd.. HASH ANTI Join....
A Reader, September 24, 2010 - 5:35 am UTC
 
 
Tom,
further question..
d)  Can I set 
SQL> alter session set workarea_size_policy=automatic;
for my session in prod DB ( where we have it is set to manual) and run the SQL to see if it is chosing the hash join ( without hint) and doing good at lesser cost?
(sorry to ask you this  as this could be DIY ,  but just thinking if it would be safe for a single session to set this parameter to automatic in  live system as my session may cross the pga_aggregate_target ( a soft boundary)and machine may starve of memory?)
   
 
 
September 27, 2010 - 10:17 am UTC 
 
 
d) yes. 
 
 
In and Exists - Oracle 10g - 10.2.0.1
Hari, November  02, 2010 - 5:59 am UTC
 
 
Hi Tom,
I was actually experimenting the usage of IN and EXISTS in Oracle 10g version. But I could not get the same result as you got in 8i. I did the following:
create table smalltable (data1 number(3), data2 varchar2(10), data3 varchar2(10), data4 varchar2(10));
create or replace procedure smalltableproc is
counter number(3);
begin
 counter := 1;
 while (counter <= 100)
 loop
 insert into smalltable values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
 counter := counter + 1;
 end loop;
 commit;
end;
SQL> exec smalltableproc
PL/SQL procedure successfully completed.
SQL> create index smalltabledata1idx on smalltable(data1);
SQL> create table largetable (data5 number(6), data6 varchar2(10), data7 varchar2(10), data8 varchar2(10));
create or replace procedure largetableproc is
counter number(6);
begin
 counter := 1;
 while (counter <= 100000)
 loop
  insert into largetable values (counter,'aa' || counter,'bb' || counter, 'cc' || counter);
  counter := counter + 1;
 end loop;
 commit;
end;
SQL> exec largetableproc
PL/SQL procedure successfully completed.
SQL> create index largetabledata5idx on largetable(data5);
SQL> exec DBMS_STATS.gather_table_stats('INXQA','SMALLTABLE');
PL/SQL procedure successfully completed.
SQL> exec DBMS_STATS.gather_table_stats('INXQA','LARGETABLE');
PL/SQL procedure successfully completed.
SQL> exec DBMS_STATS.gather_index_stats('INXQA','SMALLTABLEDATA1IDX');
PL/SQL procedure successfully completed.
SQL> exec DBMS_STATS.gather_index_stats('INXQA','LARGETABLEDATA5IDX');
PL/SQL procedure successfully completed.
Now, when I issued the select statements, I do not find any difference:
SQL> select * from largetable where data5 in (select data1 from smalltable);
100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2333463779
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LARGETABLE         |     1 |    28 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |                    |   100 |   300 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | SMALLTABLEDATA1IDX |   100 |   300 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | LARGETABLEDATA5IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("DATA5"="DATA1")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        124  consistent gets
          0  physical reads
          0  redo size
       2965  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed
SQL> select * from largetable l where exists (select null from smalltable s where data1=l.data5);
100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2333463779
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| LARGETABLE         |     1 |    28 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |                    |   100 |  3100 |   102   (1)| 00:00:02 |
|   3 |    SORT UNIQUE              |                    |   100 |   300 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN         | SMALLTABLEDATA1IDX |   100 |   300 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN         | LARGETABLEDATA5IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("DATA1"="L"."DATA5")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        124  consistent gets
          0  physical reads
          0  redo size
       2965  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed
When I reveresed the usage of tables, with Small in Outer and Large in Inner, I get the same results for both IN and EXISTS
Does this mean that the usage on IN and EXISTS will be more or less the same in Oracle 10g?
Please Explain
Thanks
Hari 
 
November  02, 2010 - 6:34 am UTC 
 
 
things change in 10 years.
In fact, if you ctl-f for:
IN and Inner Join  April 4, 2005
on this page, you'll see we had this exact discussion a long time ago ;) 
 
 
Kulkarni, November  04, 2010 - 1:17 am UTC
 
 
Tom,
I did the same way as you explained above but did not get different execution plans or costs. I got the same results irrespective of the query that was run.
Table created.
SQL>  insert into big /*+ append */ select * from big;
13124 rows created.
SQL> /
26248 rows created.
SQL> /
52496 rows created.
SQL> /
104992 rows created.
SQL> /
209984 rows created.
SQL> commit;
Commit complete.
SQL> create table small as select * from all_objects where rownum<1000;
Table created.
SQL> desc big
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(30)
 OBJECT_NAME                               NOT NULL VARCHAR2(30)
 SUBOBJECT_NAME                                     VARCHAR2(30)
 OBJECT_ID                                 NOT NULL NUMBER
 DATA_OBJECT_ID                                     NUMBER
 OBJECT_TYPE                                        VARCHAR2(19)
 CREATED                                   NOT NULL DATE
 LAST_DDL_TIME                             NOT NULL DATE
 TIMESTAMP                                          VARCHAR2(19)
 STATUS                                             VARCHAR2(7)
 TEMPORARY                                          VARCHAR2(1)
 GENERATED                                          VARCHAR2(1)
 SECONDARY                                          VARCHAR2(1)
 NAMESPACE                                 NOT NULL NUMBER
 EDITION_NAME                                       VARCHAR2(30)
SQL> create index big_idx on big(SUBOBJECT_NAME);
Index created.
SQL> create index small_idx on small(SUBOBJECT_NAME);
Index created.
SQL> analyze  table big compute statistics for table for all indexed columns for all indexes;
Table analyzed.
SQL> analyze table small compute statistics for table for all indexed columns for all indexes;
Table analyzed.
SQL> set autotrace on exp stat
SQL> select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
         24  recursive calls
          0  db block gets
       5506  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> l
  1* select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small)
SQL> select count(SUBOBJECT_NAME) from big where object_id in(select object_id from small where small.object_id=big.object_id);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5504  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> select count(SUBOBJECT_NAME) from big where exists ( select null from small where small.object_id=big.object_id);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 2513579575
-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    28 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE       |       |     1 |    28 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |   419K|    11M|  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL  | SMALL |   999 | 12987 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | BIG   |   419K|  6151K|  1500   (1)| 00:00:19 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5504  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL> select count(SUBOBJECT_NAME) from small where exists ( select null from big where small.object_id=big.object_id);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 107836466
-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |    27 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE     |       |     1 |    27 |            |          |
|*  2 |   HASH JOIN SEMI    |       |   999 | 26973 |  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL| SMALL |   999 | 13986 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| BIG   |   419K|  5331K|  1500   (1)| 00:00:19 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
SQL>  select count(SUBOBJECT_NAME) from small where object_id in(select object_id from big);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 107836466
-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |     1 |    27 |  1510   (2)| 00:00:19 |
|   1 |  SORT AGGREGATE     |       |     1 |    27 |            |          |
|*  2 |   HASH JOIN SEMI    |       |   999 | 26973 |  1510   (2)| 00:00:19 |
|   3 |    TABLE ACCESS FULL| SMALL |   999 | 13986 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| BIG   |   419K|  5331K|  1500   (1)| 00:00:19 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
I am using 11.2 
 
 
November  04, 2010 - 3:46 am UTC 
 
 
please read the review immediately prior to your - same exact thing.  again. 
 
 
EXISTS and  ROWNUM = 1
A reader, June      19, 2012 - 8:29 am UTC
 
 
When we have EXISTS check, is it better to put ROWNUM = 1? 
I am listing two queries below with their corresponding Auto Trace info. 
Only difference between these queries is the ROWNUM = 1 with the EXISTS check for the second query.
The first query takes 25 secs and the second query takes 5 secs
SQL> SET TIMING ON
SQL> SELECT s.supplier, s.sup_name
  2  FROM sups s
  3  WHERE EXISTS(SELECT 'Y'
  4               FROM aes_dlp_iscl adi
  5               WHERE adi.supplier = s.supplier);
7576 rows selected.
Elapsed: 00:00:25.81
Execution Plan
----------------------------------------------------------                      
Plan hash value: 532059534                                                      
                                                                                
--------------------------------------------------------------------------------
-----------------------                                                         
                                                                                
| Id  | Operation           | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                         
                                                                                
-------------------------------------------------------------------------------------------------------                                                         
                                                                                
|   0 | SELECT STATEMENT    |                 |  5947 |   191K|  1327  (47)| 00:00:19 |       |       |                                                         
                                                                                
|*  1 |  HASH JOIN SEMI     |                 |  5947 |   191K|  1327  (47)| 00:00:19 |       |       |                                                         
                                                                                
|   2 |   TABLE ACCESS FULL | SUPS            | 25120 |   662K|   112   (1)| 00:00:02 |       |       |                                                         
                                                                                
|   3 |   PARTITION HASH ALL|                 |    65M|   375M|   608   (3)| 00:00:09 |     1 |    16 |                                                         
                                                                                
|   4 |    INDEX FULL SCAN  | AES_DLP_ISCL_I1 |    65M|   375M|   608   (3)| 00:00:09 |     1 |    16 |                                                         
                                                                                
-------------------------------------------------------------------------------------------------------                                                         
                                                                                
                                                                                
Predicate Information (identified by operation id):                             
---------------------------------------------------                             
                                                                                
   1 - access("ADI"."SUPPLIER"="S"."SUPPLIER")                                  
Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
      60274  consistent gets                                                    
      42419  physical reads                                                     
          0  redo size                                                          
     235240  bytes sent via SQL*Net to client                                   
       3787  bytes received via SQL*Net from client                             
        507  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
       7576  rows processed                                                     
SQL> 
SQL> SELECT s.supplier, s.sup_name
  2  FROM sups s
  3  WHERE EXISTS(SELECT 'Y'
  4               FROM aes_dlp_iscl adi
  5               WHERE adi.supplier = s.supplier
  6                AND  ROWNUM = 1);
7576 rows selected.
Elapsed: 00:00:04.71
Execution Plan
----------------------------------------------------------                      
Plan hash value: 1883786237                                                     
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                        
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
|   0 | SELECT STATEMENT     |                 |     1 |    27 | 15874   (1)| 00:03:43 |       |       |                                                        
                                                                                
|*  1 |  FILTER              |                 |       |       |            |          |       |       |                                                        
                                                                                
|   2 |   TABLE ACCESS FULL  | SUPS            | 25120 |   662K|   112   (1)| 00:00:02 |       |       |                                                        
                                                                                
|*  3 |   COUNT STOPKEY      |                 |       |       |            |          |       |       |                                                        
                                                                                
|   4 |    PARTITION HASH ALL|                 |     2 |    12 |     1   (0)| 00:00:01 |     1 |    16 |                                                        
                                                                                
|*  5 |     INDEX RANGE SCAN | AES_DLP_ISCL_I1 |     2 |    12 |     1   (0)| 00:00:01 |     1 |    16 |                                                        
                                                                                
--------------------------------------------------------------------------------------------------------                                                        
                                                                                
                                                                                
Predicate Information (identified by operation id):                             
---------------------------------------------------                             
                                                                                
   1 - filter( EXISTS (SELECT 0 FROM "ETL"."AES_DLP_ISCL" "ADI" WHERE ROWNUM=1 AND                                                                                                                                             
              "ADI"."SUPPLIER"=:B1))                                            
   3 - filter(ROWNUM=1)                                                         
   5 - access("ADI"."SUPPLIER"=:B1)                                             
Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
     979545  consistent gets                                                    
          1  physical reads                                                     
          0  redo size                                                          
     235240  bytes sent via SQL*Net to client                                   
       3787  bytes received via SQL*Net from client                             
        507  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
       7576  rows processed                                                     
SQL> SPOOL OFF
 
 
June      19, 2012 - 8:46 am UTC 
 
 
one did physical IO - PIO, the other did not.
does that one that did PIO always do PIO
does that one that did not do PIO always not do PIO
get a tkprof to compare, autotrace is insufficient, we'd need to see where the IO was taking place.
My suspicion is the first query did all of the physical IO that the second would have had to of performed had it been run first. 
 
 
Cache
Tony Fernandez, June      19, 2012 - 9:06 am UTC
 
 
Will alter system clear buffer_cache; help in this case to compare apples to apples? 
June      19, 2012 - 9:38 am UTC 
 
 
Nope, not unless you are using an unbuffered file system (like ASM or RAW disks for "forcedirectio" mount like options).
Otherwise, the file system is typically buffered by the OS and your second phyiscal IO is not really a physical IO but a read from the secondary SGA - the file system cache (search this site for "secondary sga" to read more about that)
Also, what are the odds your buffer cache will be entirely empty?  Zero except right after startup, so flushing the buffer cache is as fake as having the second query run second.
 
 
 
EXISTS and ROWNUM = 1 
A reader, June      19, 2012 - 9:16 am UTC
 
 
It doesn't matter if I change the order execution of the query, the consistent gets and physical reads remain the same. I ran both the queries multiple times with the same result. Let me see if I get the DBA to give me a TKPROF for the queries 
 
EXISTS and ROWNUM = 1
A reader, June      19, 2012 - 10:46 am UTC
 
 
Please find the TKPROF results below
With ROWNUM =1
TKPROF: Release 10.2.0.3.0 - Production on Tue Jun 19 11:33:52 2012
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1179858.trc
Sort options: default
********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing 
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************
SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM aes_dlp_iscl adi
             WHERE adi.supplier = s.supplier
              AND  ROWNUM = :"SYS_B_1")
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      507      3.46       5.20          0     979545          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509      3.46       5.20          0     979545          0        7576
Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  
Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  FILTER  (cr=979545 pr=0 pw=0 time=5434066 us)
  24965   TABLE ACCESS FULL SUPS (cr=798 pr=0 pw=0 time=25702 us cost=112 size=678240 card=25120)
   7576   COUNT STOPKEY (cr=978747 pr=0 pw=0 time=5072884 us)
   7576    PARTITION HASH ALL PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4987595 us cost=1 size=12 card=2)
   7576     INDEX RANGE SCAN AES_DLP_ISCL_I1 PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4546245 us cost=1 size=12 card=2)(object id 2467472)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     507        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  SQL*Net message from client                   507        8.01          8.27
********************************************************************************
SELECT RTRIM(c.value, :"SYS_B_0")||:"SYS_B_1"||d.instance_name||:"SYS_B_2"||LTRIM(TO_CHAR(a.spid))||:"SYS_B_3" trace_file
FROM v$process a, v$session b, v$parameter c, v$instance d
WHERE a.addr = b.paddr
 AND  b.audsid = SYS_CONTEXT(:"SYS_B_4", :"SYS_B_5")
 AND  c.name = :"SYS_B_6"
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0          0          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0          0          0           1
Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  MERGE JOIN  (cr=0 pr=0 pw=0 time=47707 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=41801 us)
      1    MERGE JOIN  (cr=0 pr=0 pw=0 time=41765 us)
     40     SORT JOIN (cr=0 pr=0 pw=0 time=29797 us)
     44      MERGE JOIN  (cr=0 pr=0 pw=0 time=29654 us)
     44       SORT JOIN (cr=0 pr=0 pw=0 time=28656 us)
     44        NESTED LOOPS  (cr=0 pr=0 pw=0 time=24611 us)
      1         MERGE JOIN  (cr=0 pr=0 pw=0 time=28086 us)
   2655          SORT JOIN (cr=0 pr=0 pw=0 time=11500 us)
   2655           NESTED LOOPS  (cr=0 pr=0 pw=0 time=7948 us)
      1            NESTED LOOPS  (cr=0 pr=0 pw=0 time=111 us)
      1             NESTED LOOPS  (cr=0 pr=0 pw=0 time=78 us)
      1              FIXED TABLE FULL X$QUIESCE (cr=0 pr=0 pw=0 time=41 us)
      1              FIXED TABLE FULL X$KVIT (cr=0 pr=0 pw=0 time=29 us)
      1             FIXED TABLE FULL X$KSUXSINST (cr=0 pr=0 pw=0 time=26 us)
   2655            FIXED TABLE FULL X$KSPPCV (cr=0 pr=0 pw=0 time=6714 us)
      1          FILTER  (cr=0 pr=0 pw=0 time=15303 us)
      1           SORT JOIN (cr=0 pr=0 pw=0 time=11728 us)
      1            FIXED TABLE FULL X$KSPPI (cr=0 pr=0 pw=0 time=8150 us)
     44         FIXED TABLE FULL X$KSLWT (cr=0 pr=0 pw=0 time=110 us)
     44       SORT JOIN (cr=0 pr=0 pw=0 time=950 us)
   1142        FIXED TABLE FULL X$KSLED (cr=0 pr=0 pw=0 time=897 us)
      1     SORT JOIN (cr=0 pr=0 pw=0 time=11978 us)
      1      FIXED TABLE FULL X$KSUSE (cr=0 pr=0 pw=0 time=11910 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=5906 us)
     44    FIXED TABLE FULL X$KSUPR (cr=0 pr=0 pw=0 time=99 us)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  SQL*Net message from client                     2       17.22         17.23
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch      509      3.50       5.25          0     979545          0        7577
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      513      3.50       5.25          0     979545          0        7577
Misses in library cache during parse: 0
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     510        0.00          0.00
  SQL*Net message from client                   510       17.22         40.00
  Disk file operations I/O                        1        0.00          0.00
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0
Misses in library cache during parse: 0
    2  user  SQL statements in session.
    0  internal SQL statements in session.
    2  SQL statements in session.
********************************************************************************
Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1179858.trc
Trace file compatibility: 10.01.00
Sort options: default
       1  session in tracefile.
       2  user  SQL statements in trace file.
       0  internal SQL statements in trace file.
       2  SQL statements in trace file.
       2  unique SQL statements in trace file.
    1673  lines in trace file.
 9217858  elapsed seconds in trace file.
With No RownumSELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM anp_dlp_iscl adi
             WHERE adi.supplier = s.supplier)
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.01       0.00          0          0          0           0
Fetch      507     58.84      60.16      42419      60274          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509     58.85      60.17      42419      60274          0        7576
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: CHOOSE
Parsing user id: 488  
Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  HASH JOIN SEMI (cr=60274 pr=42419 pw=0 time=68810647 us cost=1327 size=196251 card=5947)
  24965   TABLE ACCESS FULL SUPS (cr=293 pr=0 pw=0 time=12459 us cost=112 size=678240 card=25120)
66175360   PARTITION HASH ALL PARTITION: 1 16 (cr=59981 pr=42419 pw=0 time=80186367 us cost=608 size=394093758 card=65682293)
66175360    INDEX FULL SCAN ANP_DLP_ISCL_I1 PARTITION: 1 16 (cr=59981 pr=42419 pw=0 time=41272380 us cost=608 size=394093758 card=65682293)(object id 2467472)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     507        0.00          0.00
  Disk file operations I/O                       14        0.00          0.00
  SQL*Net message from client                   507        1.08          1.35
  db file sequential read                     42419        0.01          0.53
********************************************************************************
SELECT RTRIM(c.value, :"SYS_B_0")||:"SYS_B_1"||d.instance_name||:"SYS_B_2"||LTRIM(TO_CHAR(a.spid))||:"SYS_B_3" trace_file
FROM v$process a, v$session b, v$parameter c, v$instance d
WHERE a.addr = b.paddr
 AND  b.audsid = SYS_CONTEXT(:"SYS_B_4", :"SYS_B_5")
 AND  c.name = :"SYS_B_6"
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.04          0          0          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.04          0          0          0           1
Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  MERGE JOIN  (cr=0 pr=0 pw=0 time=47972 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=42076 us)
      1    MERGE JOIN  (cr=0 pr=0 pw=0 time=42037 us)
     35     SORT JOIN (cr=0 pr=0 pw=0 time=29404 us)
     43      MERGE JOIN  (cr=0 pr=0 pw=0 time=29284 us)
     43       SORT JOIN (cr=0 pr=0 pw=0 time=28286 us)
     43        NESTED LOOPS  (cr=0 pr=0 pw=0 time=24186 us)
      1         MERGE JOIN  (cr=0 pr=0 pw=0 time=27569 us)
   2655          SORT JOIN (cr=0 pr=0 pw=0 time=11089 us)
   2655           NESTED LOOPS  (cr=0 pr=0 pw=0 time=7941 us)
      1            NESTED LOOPS  (cr=0 pr=0 pw=0 time=96 us)
      1             NESTED LOOPS  (cr=0 pr=0 pw=0 time=66 us)
      1              FIXED TABLE FULL X$QUIESCE (cr=0 pr=0 pw=0 time=33 us)
      1              FIXED TABLE FULL X$KVIT (cr=0 pr=0 pw=0 time=27 us)
      1             FIXED TABLE FULL X$KSUXSINST (cr=0 pr=0 pw=0 time=24 us)
   2655            FIXED TABLE FULL X$KSPPCV (cr=0 pr=0 pw=0 time=6335 us)
      1          FILTER  (cr=0 pr=0 pw=0 time=15256 us)
      1           SORT JOIN (cr=0 pr=0 pw=0 time=11699 us)
      1            FIXED TABLE FULL X$KSPPI (cr=0 pr=0 pw=0 time=8177 us)
     43         FIXED TABLE FULL X$KSLWT (cr=0 pr=0 pw=0 time=104 us)
     43       SORT JOIN (cr=0 pr=0 pw=0 time=920 us)
   1142        FIXED TABLE FULL X$KSLED (cr=0 pr=0 pw=0 time=891 us)
      1     SORT JOIN (cr=0 pr=0 pw=0 time=12663 us)
      1      FIXED TABLE FULL X$KSUSE (cr=0 pr=0 pw=0 time=12575 us)
      1   SORT JOIN (cr=0 pr=0 pw=0 time=5896 us)
     43    FIXED TABLE FULL X$KSUPR (cr=0 pr=0 pw=0 time=138 us)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3       12.50         12.51
  SQL*Net break/reset to client                   1        0.00          0.00
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.01       0.00          0          0          0           0
Fetch      509     58.88      60.21      42419      60274          0        7577
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      513     58.89      60.22      42419      60274          0        7577
Misses in library cache during parse: 1
Misses in library cache during execute: 1
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     511        0.00          0.00
  SQL*Net message from client                   511       12.50         22.26
  Disk file operations I/O                       14        0.00          0.00
  db file sequential read                     42419        0.01          0.53
  SQL*Net break/reset to client                   1        0.00          0.00
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      0      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        0      0.00       0.00          0          0          0           0
Misses in library cache during parse: 0
    2  user  SQL statements in session.
    0  internal SQL statements in session.
    2  SQL statements in session.
********************************************************************************
Trace file: /in/RMD2/oracle/diag/rdbms/rmd2/RMD2/trace/RMD2_ora_1220680.trc
Trace file compatibility: 10.01.00
Sort options: default
       1  session in tracefile.
       2  user  SQL statements in trace file.
       0  internal SQL statements in trace file.
       2  SQL statements in trace file.
       2  unique SQL statements in trace file.
   44205  lines in trace file.
 9220797  elapsed seconds in trace file.
 
 
June      19, 2012 - 12:41 pm UTC 
 
 
cursor sharing = force or similar :( :( :(  ugh.  I always shudder when I see that.
what kind of hardware do you have?  It seems to be really "not fast" cpu wise at hashing.  I would have thought the hash semi join should have been better - but we spent a ton of cpu time hashing.
is this index that is being used hash partitioned into 16 partitions (by something other than supplier id?) 
 
 
count stopkey
Keith, June      19, 2012 - 11:37 am UTC
 
 
I noticed that the plan with "AND ROWNUM = 1" did a COUNT STOPKEY while the other plan omitted this step. So it seems to me that COUNT STOPKEY should usually (always?) be implied by WHERE EXISTS. What are your thought/insights Tom?
 
June      19, 2012 - 12:44 pm UTC 
 
 
rownum is count stopkey, you won't see it without a rownum predicate ever. 
 
 
EXISTS and ROWNUM = 1
A reader, June      19, 2012 - 1:18 pm UTC
 
 
How did you figure out the cursor_sharing was similar? What part in TKPROF tells you that info?
The index definition is as follows
CREATE INDEX AES_DLP_ISCL_I1 ON AES_DLP_ISCL
(SUPPLIER, ORIGIN_COUNTRY_ID, LOC, LOC_TYPE)
NOLOGGING
LOCAL (  
  PARTITION ADI_PKX_P01
    TABLESPACE ADIX_P01,  
  PARTITION ADI_PKX_P02
    TABLESPACE ADIX_P02,  
  PARTITION ADI_PKX_P03
    TABLESPACE ADIX_P03,  
  PARTITION ADI_PKX_P04
    TABLESPACE ADIX_P04,  
  PARTITION ADI_PKX_P05
    TABLESPACE ADIX_P05,  
  PARTITION ADI_PKX_P06
    TABLESPACE ADIX_P06,  
  PARTITION ADI_PKX_P07
    TABLESPACE ADIX_P07,  
  PARTITION ADI_PKX_P08
    TABLESPACE ADIX_P08,  
  PARTITION ADI_PKX_P09
    TABLESPACE ADIX_P09,  
  PARTITION ADI_PKX_P10
    TABLESPACE ADIX_P10,  
  PARTITION ADI_PKX_P11
    TABLESPACE ADIX_P11,  
  PARTITION ADI_PKX_P12
    TABLESPACE ADIX_P12,  
  PARTITION ADI_PKX_P13
    TABLESPACE ADIX_P13,  
  PARTITION ADI_PKX_P14
    TABLESPACE ADIX_P14,  
  PARTITION ADI_PKX_P15
    TABLESPACE ADIX_P15,  
  PARTITION ADI_PKX_P16
    TABLESPACE ADIX_P16
)
NOPARALLEL
COMPRESS 3;
I am not that sure about hardware. Let me try and find out 
 
June      19, 2012 - 3:01 pm UTC 
 
 
SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM anp_dlp_iscl adi
             WHERE adi.supplier = s.supplier)
:"SYS_B_0"
that is the first bind name we use to replace literals with, then :"SYS_B_1" and so on.  It is a signature.
I guessed it right based on the LIOs - I was sure that was a 16 partition index with a height of about 2, sometimes 3, depending on the partition.
SELECT s.supplier, s.sup_name
FROM sups s
WHERE EXISTS(SELECT :"SYS_B_0"
             FROM aes_dlp_iscl adi
             WHERE adi.supplier = s.supplier
              AND  ROWNUM = :"SYS_B_1")
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch      507      3.46       5.20          0     979545          0        7576
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      509      3.46       5.20          0     979545          0        7576
Misses in library cache during parse: 0
Optimizer mode: CHOOSE
Parsing user id: 488  
Rows     Row Source Operation
-------  ---------------------------------------------------
   7576  FILTER  (cr=979545 pr=0 pw=0 time=5434066 us)
  24965   TABLE ACCESS FULL SUPS (cr=798 pr=0 pw=0 time=25702 us cost=112 size=678240 card=25120)
   7576   COUNT STOPKEY (cr=978747 pr=0 pw=0 time=5072884 us)
   7576    PARTITION HASH ALL PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4987595 us cost=1 size=12 
card=2)
   7576     INDEX RANGE SCAN AES_DLP_ISCL_I1 PARTITION: 1 16 (cr=978747 pr=0 pw=0 time=4546245 us 
cost=1 size=12 card=2)(object id 2467472)978,747 IO's on the index, but only 24,965 seeks on the index - so about 39 IO's per index range scan.
Since it was partitioned and hash partitioned at that, I knew it would be in 1, 2, 4, 8, 16, 32, 64, ... partitions (it better be!).  64 was out of the question.  If you divide 16 into 39 you get about 2/3 IO's per index partition.
So, does that index need to be partitioned, could that index by partitioned by something else?  does it have to be local, you'll have to always scan 16 partitions unless you use the partition key in the where clause too. 
 
 
 
EXISTS and ROWNUM = 1
A reader, June      19, 2012 - 3:28 pm UTC
 
 
Thanks for the explanation on the TKPROF.
But sorry, I did not understand why ROWNUM = 1 was faster? Should we use ROWNUM = 1 in the exists? 
June      19, 2012 - 5:59 pm UTC 
 
 
In general - no, you shouldn't.
It is a complex plan - you have 16 partitions to index range scan, 25k times.  Or we can full scan and hash join (big bulky operations, good)
For some reason, the hashing took an inordinate amount of time on your computer. 
 
 
nice explanation
mailos, January   25, 2013 - 4:08 am UTC
 
 
 
 
Soco, May       03, 2013 - 8:10 am UTC
 
 
Hi Tom,
Thanks for all your sharing on this topic and I have learnt a lot from it. I also run few tests on my own database and I found my results are different from yours. My guess is it may cause by the different versions of oracle database, I saw this post was almost 13 years old and lots of things might change, such as CBO, I use 11g right now by the way. Here are my process, and hope you can help with me on that.
jerry@DBJERRY> create table big as select * from all_objects;
Table created.
jerry@DBJERRY> insert into big select * from big;
55625 rows created.
jerry@DBJERRY> commit;
Commit complete.
jerry@DBJERRY> insert into big select * from big;
111250 rows created.
jerry@DBJERRY> commit;
Commit complete.
jerry@DBJERRY> insert into big select * from big;
222500 rows created.
jerry@DBJERRY> commit;
Commit complete.
jerry@DBJERRY> insert into big select * from big;
445000 rows created.
jerry@DBJERRY> commit;
Commit complete.
jerry@DBJERRY> create table small as select * from all_objects where rownum<100;
Table created.
jerry@DBJERRY> create index big_idx on big(object_id);
Index created.
jerry@DBJERRY> create index small_idx on small(object_id);
Index created.
jerry@DBJERRY> exec dbms_stats.gather_table_stats(user,'big',cascade=>true);
PL/SQL procedure successfully completed.
jerry@DBJERRY> exec dbms_stats.gather_table_stats(user,'small',cascade=>true);
PL/SQL procedure successfully completed.
jerry@DBJERRY> alter system flush buffer_cache;
System altered.
jerry@DBJERRY> alter system flush shared_pool;
System altered.
jerry@DBJERRY> set autotrace on
till now my setup is done and follow up with the example:
***** outer query is big and inner query is small*****
jerry@DBJERRY> select count(subobject_name) from big where object_id in(select object_id from small);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 3874879108
-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    26 |   854   (1)| 00:00:11 |
|   1 |  SORT AGGREGATE               |           |     1 |    26 |            |          |
|   2 |   NESTED LOOPS                |           |       |       |            |          |
|   3 |    NESTED LOOPS               |           |  1573 | 40898 |   854   (1)| 00:00:11 |
|   4 |     SORT UNIQUE               |           |    99 |   396 |     1   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN          | SMALL_IDX |    99 |   396 |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | BIG_IDX   |    16 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID| BIG       |    16 |   352 |    18   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
        508  recursive calls    --reduce to 0 at the second run
          0  db block gets
       1691  consistent gets    --reduce to 1634 at the second run
         61  physical reads    --reduce to 0 at the second run
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          9  sorts (memory)    --reduce to 1 at the second run
jerry@DBJERRY> select count(subobject_name) from big where exists(select null from small where small.object_id = big.object_id);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 3874879108
-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    26 |   854   (1)| 00:00:11 |
|   1 |  SORT AGGREGATE               |           |     1 |    26 |            |          |
|   2 |   NESTED LOOPS                |           |       |       |            |          |
|   3 |    NESTED LOOPS               |           |  1573 | 40898 |   854   (1)| 00:00:11 |
|   4 |     SORT UNIQUE               |           |    99 |   396 |     1   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN          | SMALL_IDX |    99 |   396 |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | BIG_IDX   |    16 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID| BIG       |    16 |   352 |    18   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls   --reduce to 0 at the second run
          0  db block gets
       1634  consistent gets
          0  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
****** outer query is small and inner query is big*****
jerry@DBJERRY> select count(subobject_name) from small where object_id in(select object_id from big);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 4075727039
-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     1 |    26 |   201   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE     |         |     1 |    26 |            |          |
|   2 |   NESTED LOOPS SEMI |         |    99 |  2574 |   201   (0)| 00:00:03 |
|   3 |    TABLE ACCESS FULL| SMALL   |    99 |  2079 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | BIG_IDX |   890K|  4345K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("OBJECT_ID"="OBJECT_ID")
Statistics
----------------------------------------------------------
         24  recursive calls   --reduce to 0 at the second run
          0  db block gets
         53  consistent gets   --reduce to 50 at the second run
          3  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
jerry@DBJERRY> select count(subobject_name) from small where exists(select null from big where small.object_id = big.object_id);
COUNT(SUBOBJECT_NAME)
---------------------
                    0
Execution Plan
----------------------------------------------------------
Plan hash value: 4075727039
-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     1 |    26 |   201   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE     |         |     1 |    26 |            |          |
|   2 |   NESTED LOOPS SEMI |         |    99 |  2574 |   201   (0)| 00:00:03 |
|   3 |    TABLE ACCESS FULL| SMALL   |    99 |  2079 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN | BIG_IDX |   890K|  4345K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("SMALL"."OBJECT_ID"="BIG"."OBJECT_ID")
Statistics
----------------------------------------------------------
          1  recursive calls   --reduce to 0 at the second run
          0  db block gets
         50  consistent gets
          0  physical reads
          0  redo size
        434  bytes sent via SQL*Net to client
        415  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
the results are they are almost the same
Thanks for your time and hope you have a nice day!
 
 
 
A reader, January   17, 2016 - 4:13 pm UTC
 
 
Neat explanation 
 
Is the above explanations still true for Oracle 12C database?
Manisundaram R, July      17, 2018 - 3:13 pm UTC
 
 
In some websites regarding IN vs EXISTS, it is pointed out that after oracle 9i, the database is intelligent enough to choose which would perform better. Is it really true? 
July      18, 2018 - 4:46 am UTC 
 
 
 
 
A reader, February  13, 2019 - 11:39 am UTC
 
 
 
 
Semi join and anti joins
Asim, October   11, 2022 - 4:20 am UTC
 
 
Are semi joins and anti joins really joins? Or are they just Oracle specific terms? I Mean all the examples shown of above 2 type of joins are using subquery in WHERE clause, there is no joining in FROM clause.
Is there any ANSI standard syntax of semi and anti join?