Home>Question Details



wallacel -- Thanks for the question regarding "IN & EXISTS", version 8.1.6

Submitted on 24-Oct-2000 11:41 Central time zone
Last updated 15-Mar-2008 9:37

You Asked

Tom:

can you give me some example at which situation
IN is better than exist, and vice versa. 

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to 
the original table -- typically.


As opposed to 

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:


   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index 
on T1(x).


So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and executing ( 
select null from t2 where y = x.x ) is very very fast (nice index on t2(y)).  Then the 
exists will be faster as the time to full scan T1 and do the index probe into T2 could be 
less then the time to simply full scan T2 to build the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more appropriate.


If both the subquery and the outer table are huge -- either might work as well as the 
other -- depends on the indexes and other factors. 

Reviews    
5 stars Thanks for the question regarding "IN & EXISTS", version 8.1.6   March 31, 2001 - 7pm Central time zone
Reviewer: Seetharaman Srinivasan from Chennai , TamilNadu, India
Excellent 


4 stars Thanks for the question regarding "IN & EXISTS", version 8.1.6   April 11, 2001 - 3pm Central time zone
Reviewer: Mani Srinivasan from CA USA
This is good stuff.... 


5 stars   May 18, 2001 - 2am Central time zone
Reviewer: Helena Markova from Bratislava, Slovakia


5 stars Could not Call Report from Form (Menu)   May 28, 2001 - 1am Central time zone
Reviewer: Mohd. Manik Miah from Dhaka, Bangladesh
Dear Sir

I learned oracle a few months ago. 
But I could not call report from Form (Menu). 
Would you please help me in this regard?

Thank & best regards
Manik Miah 
manikmiah@yahoo.com 


5 stars Instead of T, why cant you use EMP AND DEPT TABLES   September 28, 2001 - 4pm Central time zone
Reviewer: A reader 
Tom

Can you illustrate your answer by using the emp dept table.
and by 
1.increasing and decreasing the number of rows in both the tables.
2.by dropping and adding indexes on both the tables

That way your explanation will be vindicated.

Can you do that please. 


Followup   September 29, 2001 - 2pm Central time zone:

Well -- I'm not going to use EMP and DEPT as I would  have to generate tons of EMP and DEPT data to 
illustrate my point (feel free to do that if you want to ;)

I'll use BIG and SMALL to make the point. I ran:

rem create table big as select * from all_objects;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem create index big_idx on big(object_id);
rem
rem
rem create table small as select * from all_objects where rownum < 100;
rem create index small_idx on small(object_id);
rem
rem analyze table big compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
rem analyze table small compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /

so, small has 99 rows, big has 133,000+

select count(subobject_name)
  from big
 where object_id in ( select object_id from small )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.02          0        993          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0        993          0           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    MERGE JOIN
    100     SORT (JOIN)
    100      VIEW OF 'VW_NSO_1'
     99       SORT (UNIQUE)
    792        INDEX   GOAL: ANALYZED (FULL SCAN) OF 'SMALL_IDX'
                   (NON-UNIQUE)
    891     SORT (JOIN)
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'


versus:

select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      4.12       4.12          0     135356         15           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      4.12       4.12          0     135356         15           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    FILTER
 135297     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
 133504     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
                (NON-UNIQUE)

That shows if the outer query is "big" and the inner query is "small", in is generally more 
efficient then NOT EXISTS.

Now:

select count(subobject_name)
  from small
 where object_id in ( select object_id from big )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.51       0.82         50        298         22           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.52       0.83         50        298         22           1



Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    MERGE JOIN
  16913     SORT (JOIN)
  16912      VIEW OF 'VW_NSO_1'
  16912       SORT (UNIQUE)
 135296        INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 'BIG_IDX'
                   (NON-UNIQUE)
     99     SORT (JOIN)
     99      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'


versus:
select count(subobject_name)
  from small
 where exists ( select null from big where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.01          0        204         12           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        204         12           1

EGATE)
     99    FILTER
    100     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

shows that is the outer query is "small" and the inner query is "big" -- a WHERE EXISTS can be 
quite efficient.


 

3 stars What difference does it make   September 30, 2001 - 12am Central time zone
Reviewer: A reader 
when you remove those indexes on the small and big tables.

 


Followup   September 30, 2001 - 8am Central time zone:

Run it and see.  All of the necessary code is there -- it is very easy to do these sorts of tests.

The outcome is easy to hypothesize however.

The "select * from big where object_id in ( select object_id from small )" will sort BIG once and 
SMALL once and join them (sort merge join) in all likelyhood.

The "select * from big where exists ( select null from small where small.object_id = big.object_id 
)" will tend to FULL SCAN big once and FOR EVERY ROW in big -- FULL SCAN small.

(after running it -- i verified it and the "rule of thumb" holds true.  BIG outer query and SMALL 
inner query = IN.  SMALL outer query and BIG inner query = WHERE EXISTS.  Remember -- thats is a 
RULE OF THUMB and rules of thumb always have infinitely many exceptions to the rule.
 

5 stars Tom, this is very clever and not fair   September 30, 2001 - 9am Central time zone
Reviewer: TJ from USA
You have explained the whole of the in and exists concepts so well( no one has ever done in like 
this, you must be one happy person, you believe in sharing)

and now you leave us with an open ended answer..

"Remember -- thats is a RULE OF THUMB and rules of thumb always have infinitely many exceptions to 
the rule."

Kindly discuss what can be some of those possible exceptions, tell us what factors can cause  those 
exceptions.


 


Followup   September 30, 2001 - 10am Central time zone:

Well, there are infinitely many -- IO can affect the outcome of this.  The goal of the query can 
effect this (eg: the WHERE EXISTS will find the first row faster in general then the IN will -- the 
IN will get the LAST row (all rows) faster then the where exists).  If your goal is the FIRST row 
-- exists might totally blow away IN.  If you are a batch process (and hence getting to the LAST 
row is vital) .... 

and so on.  The point is:  be aware of their differences, try them both when tuning, understand 
conceptually what they do and you'll be able to use them to maximum effect.

 

5 stars does this explanation hold good even for NOT IN and NOT EXISTS   September 30, 2001 - 7pm Central time zone
Reviewer: Chenna 


Followup   October 1, 2001 - 6am Central time zone:

No, partially because NOT IN and NOT EXISTS are not substitutes for eachother!  They mean 
semantically different things and can/will return different answers.  See
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:442029737684
for example 

5 stars Summary   October 1, 2001 - 10am Central time zone
Reviewer: Chenna 
This is discussion of the top most class --

O.k. let me try to sum up what you are saying --

in and exists can be substitutes for each other, and we have to use one in the place of other 
depending upon the rowns returned by the outer and inner query,

where as

'not in' and 'not exists' are not substitutes for each other , and they are not related in any way. 
There use depends on the specific need.

o.k. to sum it up finally , can you give us examples of when the use of 'not in' is appropriate and 
when 'not exists' is appropriate. 


Followup   October 1, 2001 - 3pm Central time zone:

you got the first part right (in/exists)

the second part is too strong.  not in and not exists are not perfect substitutes.  NOT IN is 
different then NOT exists but NOT EXISTS and NOT IN are the same when the subquery you use in the 
NOT IN does not contain NULLS.

Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE 
CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for 
details  on that).  With NULLS -- a not in can be very in-efficient and many people substitute a 
not exists for it (not realzing the ANSWER just changed!!!)

 

3 stars anti join   October 1, 2001 - 3pm Central time zone
Reviewer: A reader 
(and not in WITH THE CBO is pretty good -- using an "anti join" -- see the design/tuning for 
performance guide for details  on that)

Please give the link to the concerned page in the manual, there is no search feature in the 
document to search for 'anti join' 


5 stars the mystery with non in and not exists   October 1, 2001 - 9pm Central time zone
Reviewer: A reader 
"Both of not in and not exists can be very efficient when there are no nulls (and not in WITH THE 
CBO is pretty good -- using an "anti join" -- see the design/tuning for performance guide for 
details  on that).  With NULLS -- a not in can be very in-efficient and many people substitute a 
not exists for it (not realzing the ANSWER just changed!!!) "

1.Not in and not exists return the same performance result when there arer no nulls.
2.Not in is especially very efficient when used with cbo and no nulls.
3.And when there are nulls what is preferred is not exists, and not not in.

I think Iam right in what I understood above.

Now

1.Why is not in in efficient when there are nulls. Can you demonstrate this.
2.And how is not exists better over not in when there are nulls.
3.What do you mean by saying
"and many people substitute a not exists for it (not realzing the ANSWER just changed!!!) "
do you mean not exists should be substituted for  not in when there are nulls ???????

 


Followup   October 2, 2001 - 7pm Central time zone:

<code>1a) not in can be faster then not exists (or slower). The are different, do different things. A not exists is alot like an EXISTS. A not in when the subquery cannot contain NULLS can be alot like an IN (processed as an ANTI-JOIN instead of a JOIN). So, just like IN is sometimes better then EXISTS and vice versa, NOT IT is sometimes (and sometimes not) better then NOT EXISTS

2a) thats when it works best, yes

3a) typically...

1b) Ok, time for the demo of NOT IN and NOT EXISTS (i guess I knew it was coming eventually!)

Here we go:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table big as select * from all_objects;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table small as select * from all_objects where rownum < 10;
Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table big modify object_id null;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id null;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = null where rownum = 1;

1 row updated.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index big_idx on big(object_id);
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create unique index small_idx on small(object_id);
Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table big compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.


So, there is the big setup -- we have our tables. SMALL will be used in the subquery. As it stands now -- lets see what happens with NOT IN and NOT EXISTS:


ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
2  where NOT EXISTS (select null
3              from small
4              where small.object_id = big.object_id)
5 /

16886 rows selected.


Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
  1  0  FILTER
  2  1  TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
  3  1  INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)


Statistics
----------------------------------------------------------
    305 recursive calls
      15 db block gets
    18278 consistent gets
      0 physical reads
      0 redo size
  1961046 bytes sent via SQL*Net to client
  125305 bytes received via SQL*Net from client
    1127 SQL*Net roundtrips to/from client
      2 sorts (memory)
      0 sorts (disk)
    16886 rows processed

So, for each row in BIG it did an INDEX PROBE into SMALL to see if that row existed or not. So far, so good... 18k consistent reads (high) but not as high as:


ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from big
2  where object_id NOT IN ( select object_id
3                    from small )
4 /
no rows selected


Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
  1  0  FILTER
  2  1  TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
  3  1  TABLE ACCESS (FULL) OF 'SMALL' (Cost=1 Card=1 Bytes=3)




Statistics
----------------------------------------------------------
      0 recursive calls
  202743 db block gets
    84706 consistent gets
      0 physical reads
      0 redo size
    862 bytes sent via SQL*Net to client
    319 bytes received via SQL*Net from client
      1 SQL*Net roundtrips to/from client
      2 sorts (memory)
      0 sorts (disk)
      0 rows processed

Wow -- what happened here? Well, for each row in BIG -- we FULL SCANNED small -- yes, for each row, we full scanned small -- we full scanned small some 16,000 times (imagine if small weren't so small). Also take care to note that this query processed 0 rows - no data found. The prior one got over 16k -- that is the effect of a NULL in the subquery of a NOT IN.

So, lets get rid of that nasty (we don't have to get rid of it, we could use a predicate to preclude it as well and we'll do that on BIG as a demonstration)


ops$tkyte@ORA817DEV.US.ORACLE.COM> update small set object_id = -1 where object_id is null;
1 row updated.

ops$tkyte@ORA817DEV.US.ORACLE.COM> commit;
Commit complete.

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table small modify object_id NOT NULL;
Table altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table small compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big
2  where NOT EXISTS (select null
3              from small
4              where small.object_id = big.object_id)
5 /

16886 rows selected.


Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
  1  0  FILTER
  2  1  TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
  3  1  INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)




Statistics
----------------------------------------------------------
      71 recursive calls
      15 db block gets
    18270 consistent gets
      0 physical reads
      0 redo size
  1961046 bytes sent via SQL*Net to client
  125305 bytes received via SQL*Net from client
    1127 SQL*Net roundtrips to/from client
      2 sorts (memory)
      0 sorts (disk)
    16886 rows processed

So, that is no different... but:


ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
2  where object_id IS NOT NULL
3  and object_id NOT IN ( select object_id
4                    from small )
5 /

16886 rows selected.


Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
  1  0  FILTER
  2  1  TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
  3  1  INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)




Statistics
----------------------------------------------------------
      0 recursive calls
      15 db block gets
    18243 consistent gets
      0 physical reads
      0 redo size
  1961046 bytes sent via SQL*Net to client
  125305 bytes received via SQL*Net from client
    1127 SQL*Net roundtrips to/from client
      2 sorts (memory)
      0 sorts (disk)
    16886 rows processed

Hey -- thats pretty good already. Its just like the NOT EXISTS. But wait -- there is more. Lets start anti-joining...


ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
2 /
Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
2  where object_id is not null
3  and object_id NOT IN ( select object_id
4                    from small )
5 /

16886 rows selected.


Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
  1  0  MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
  2  1  SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
  3  2    TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
  4  1  SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
  5  4    VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
  6  5      INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 Bytes=36)




Statistics
----------------------------------------------------------
      0 recursive calls
      19 db block gets
    237 consistent gets
    236 physical reads
      0 redo size
  1863304 bytes sent via
5 stars ANTI JOIN and 'ANY', 'SOME','ALL'   October 2, 2001 - 4pm Central time zone
Reviewer: Chenna 
Optimizing Anti-Joins and Semi-Joins 
An anti-join returns rows from the left side of the predicate for which there is no corresponding 
row on the right side of the predicate. That is, it returns rows that fail to match (NOT IN) the 
subquery on the right side. For example, an anti-join can select a list of employees who are not in 
a particular set of departments: 

SELECT * FROM emp 
  WHERE deptno NOT IN 
    (SELECT deptno FROM dept 
      WHERE loc = 'HEADQUARTERS'); 


The optimizer uses a nested loops algorithm for NOT IN subqueries by default, unless the 
initialization parameter ALWAYS_ANTI_JOIN is set to MERGE or HASH and various required conditions 
are met that allow the transformation of the NOT IN subquery into a sort-merge or hash anti-join. 
You can place a MERGE_AJ or HASH_AJ hint in the NOT IN subquery to specify which algorithm the 
optimizer should use. 

A semi-join returns rows that match an EXISTS subquery, without duplicating rows from the left side 
of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For 
example: 

SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 


In this query, only one row needs to be returned from dept even though many rows in emp might match 
the subquery. If there is no index on the bonus column in emp, then a semi-join can be used to 
improve query performance. 

The optimizer uses a nested loops algorithm for EXISTS subqueries by default, unless the 
initialization parameter ALWAYS_SEMI_JOIN is set to MERGE or HASH and various required conditions 
are met. You can place a MERGE_SJ or HASH_SJ hint in the EXISTS subquery to specify which algorithm 
the optimizer should use. 



SELECT * FROM dept 
  WHERE EXISTS 
    (SELECT * FROM emp 
      WHERE dept.ename = emp.ename 
        AND emp.bonus > 5000); 
What difference does it make when there is an index on the bonus column.

A semi join is already being executed in the above query, right ?

How will the use of MERGE_SJ or HASH_SJ hint in the EXISTS subquery improve the performance.

what do these hints tell the optimizer.


And , finally, can you give examples of using 'ANY' ,'ALL' AND 'SOME' clauses, as to where and when 
to use these clauses. We have never used these clauses.
 


Followup   October 2, 2001 - 7pm Central time zone:

ANY/SOME are alot like an in.

where x in ( select y from t )

is the same as:

where x = ANY(select y from t)

SOME is an aliase for ANY, ANY is the same as SOME. 

any/some can be useful with inequalities:

scott@ORA817DEV.US.ORACLE.COM> select ename, sal, comm from emp where sal < some( select comm from 
emp );

ENAME             SAL       COMM
---------- ---------- ----------
A                 800
WARD             1250        500
MARTIN           1250       1400
ADAMS            1100
JAMES             950
MILLER           1300

6 rows selected.

that finds emps such that there exists SOME commision greater then their salary....

ALL is a cool one.  I use it to find the most frequently ocurring thing, for example:

scott@ORA817DEV.US.ORACLE.COM> select to_char(hiredate,'mm')
  2  from emp 
  3  group by to_char(hiredate,'mm')
  4  having count(*) >= ALL ( select count(*) from emp group by to_char(hiredate,'mm') );

TO
--
12

that shows me the month with the MOST hiredates....  it says give me the month whose count is 
greater than or equal to ALL other counts by month...

 

5 stars Tom, will the result be the same if ..   October 3, 2001 - 2am Central time zone
Reviewer: Chenna 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set always_anti_join = MERGE
  2  /
Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=269 Card=16885 Bytes=1857350)
   1    0   MERGE JOIN (ANTI) (Cost=269 Card=16885 Bytes=1857350)
   2    1     SORT (JOIN) (Cost=267 Card=16894 Bytes=1638718)
   3    2       TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=16894 Bytes=1638718)
   4    1     SORT (UNIQUE) (Cost=3 Card=9 Bytes=117)
   5    4       VIEW OF 'VW_NSO_1' (Cost=1 Card=9 Bytes=117)
   6    5         INDEX (FULL SCAN) OF 'SMALL_IDX' (UNIQUE) (Cost=1 Card=9 
Bytes=36)




Statistics
----------------------------------------------------------
          0  recursive calls
         19  db block gets
        237  consistent gets
        236  physical reads
          0  redo size
    1863304  bytes sent via SQL*Net to client
     125305  bytes received via SQL*Net from client
       1127  SQL*Net roundtrips to/from client
          4  sorts (memory)
          1  sorts (disk)
      16886  rows processed


1.Tom, will the result be the same  if there is no index on the object_id of the small table.


2.You are using CBO , all the way, what will be the result of all these tests for 'in and exists' 
and 'not in and not exists' if we are using RBO.

3.After we analyze the tables, how should we switch to RBO if we want to.

4.Will the push_sq (push sub query ) come into picture in any way inthe above examples. Actually 
when is this hint relevant.

5.out of context: Can you give us the link to the oracle error messages manual.

6.Suppose if the query's where clause has 5 sub queries, for the third sub query in the where 
clause , what will be the outer query, I mean how will it resolve.

For the fifth sub query in the where clause, I guess the whole of the restof the query will be the 
outer query ? 


Followup   October 3, 2001 - 8am Central time zone:

1) yes more or less -- the ANSWER will be the same (16,886 rows processed).  The plan will be 
slightly different -- a FULL SCAN of small instead of an index full scan.  (you have ALL of the 
code needed to replicate my tests -- try it out!)

2) I don't recommend the RBO with 8i -- too many features not available.  But -- hey -- you have 
the code, give it a whirl, try it out (but make sure you have lots of time -- the RBO is not as 
smart as the CBO when it comes to this -- reread this entire article, I stress the need for the CBO 
to make this efficient!)

3) use an alter session set optimizer_goal or /*+ RULE */ hint in the query.

4) push_subq affects when subqueries are evaluated (last by default, earlier if you ask for it).  
So yes, it could affect the plan here.

5) all documentation is available from 
http://otn.oracle.com/,
the Oracle error messages manual varies from release to release.  Suggest you goto otn, follow the 
documentation link and goto the version you are interested in.

6) this does not make syntactic sense to me.


 

5 stars Train us to use parallel query   October 3, 2001 - 6pm Central time zone
Reviewer: A reader 
From the oracle magazine article


What Can Execute in Parallel? 
By using parallel options, data-sensitive SQL state-
ments, database recovery, and data loads can be exe-
cuted by multiple processes simultaneously. With the
release of Oracle 8i, all the following operations can
be executed in parallel: 

Table scan 
PL/SQL functions called from SQL 
Sort merge join 
"Not in"
Nested loop join 
Select distinct
Hash join 
Aggregation
Group by 
Cube
Union and union all 
Create table as select
Order by 
Rebuild index partition
Rollup 
Move partition
Create index 
Update 
Rebuild index 
Insert . . . select 
Split partition 
Enable constraint (the table
Delete scan is parallelized)
Star transformation 
 

If the want to execute the above not in sub queries in parallel, actually when should I decide to 
do it in paralle.

Will it suffice if I give the following.

Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/    object_id  from small a1)

usually I see that it is the  Sort merge joins which take the most amount of time in a query, how 
can I parallelize a sort merge join. 

Thank You 



   


Followup   October 4, 2001 - 6am Central time zone:

If you have to parallelize the subquery -- you are in a heap of trouble.  that means it is really 
big and since you would be full scanning IT once per row in big -- its going to be slow.  Look at 
the other options below in order to AVOID having to full scan it once per row.

My advice here -- try it yourself.  Take your ideas, set up the test case, test it out -- see what 
works, what doesn't.  Its the best way to really learn and understand how this works. 

5 stars I will do it only my own but -- --   October 4, 2001 - 11am Central time zone
Reviewer: Chenna 
Select * from big b1
  2   where object_id is not null
  3     and object_id NOT IN ( select/*+ PARALLEL(a1,8)*/    object_id  from 
small a1)


O.k. , let me try parallel, for myself, and report the conclusions to you for discussion.

But before that , you need to weed out these small doubts for me ..

1.select/*+ PARALLEL(a1,8)*/    object_id  from 
small a1  -- before using the parallel hint, do I need to change any init.ora parameters, 

2.I understand that 8 refers to the number of processors, how can I know the number of processors 
available for my server.

3.Where you use parallel query, how does my query get divided between multiple processors. Where 
are the results from each divided query kept , and then combined , before they are displayed to the 
user.

4.Give us a brief insignt into the internals of how parallel query works.

And by the way, I cant see the asktom or any other images  on this page. Some thing you might want 
to know about.
 


Followup   October 4, 2001 - 8pm Central time zone:

1) probably not, do a show parameter parallel and check out the settings - make sure parallel max 
servers isn't 0

2) you own the machine don't you?  show parameter cpu_count will tell you as well

3) read
http://download-east.oracle.com/docs/cd/A81042_01/DOC/server.816/a76965/c22paral.htm#365
4) see #3


as for the images, must have been a temporary "issue" - they seem alright now. 

4 stars   October 4, 2001 - 10pm Central time zone
Reviewer: Chenna 
SQL> connect system/manager
Connected.
SQL> show parameter cpu_count

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
cpu_count                            integer 1
SQL>  show parameter parallel

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 5
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 

The cpu count shows only one , but look at this 
parallel_max_servers                 integer 5.

1.how can the parallel_max_servers be 5 when the cpu count is 1.
2.Does this mean that I can give a parallel hint to use 5 parallel servers.

3.When the cpu count is 1 I expect the parallel_max_servers 
to be 1. 

Perplexing? Need to be clarified.

 


Followup   October 5, 2001 - 11am Central time zone:

did you read the documentation i pointed you to?

you can do parallel query on a single cpu box.  You can do a parallel 100 query on a 20 cpu box.  
You need not have parity between processes and cpu's (if you did, unix would be a single user OS).

You can have parallel_max_servers at whatever you want, regardless of the number of cpus on the 
system.  Since parallel queries typically involve tons of IO, and IO is something you wait for -- 
there is plenty of opportunity for using parallel query on a single CPU system (when one process is 
waiting for IO, blocked, the other process can be munching through some data it read).

Read the docs, get a good understanding of how it really works -- then it won't be confusing. 

5 stars sub query   October 6, 2001 - 5pm Central time zone
Reviewer: Jay 
Tom

Lets assume that the where clause has 10 conditions, and each condition gets its value from a sub 
query. IN what order do these sub queries get resolved.

select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)
and c.l=(select l from some xtable where something)

Now when resolving and c.l=(select l from some xtable where something)

will select a from b,c
were b.x=c.x
and b.t=(select t from some ttablle where something)

be considered as outer query, or only
select a from b,c
were b.x=c.x

be considered as outer query


 


Followup   October 7, 2001 - 4pm Central time zone:

when looking at the IN, the outer query is the result set that will be joined to the IN result.  

if b.t = (select t ... ) is processed FIRST (before c.l = ( select l...)), the that would be the 
"outer query" with regards to c.1 = ( select l...).  

However -- you cannot count on the order of operation of predicates!!! we can and will change the 
order at will. 

5 stars Good topic   May 17, 2002 - 12pm Central time zone
Reviewer: A reader 
Tom and team,  Many thanks.  Love your site.  

I'm running Oracle 8.1.7.3
I need help with a "NOT IN" query.  
Here is what the tables look like:

Table 1:  

CREATE TABLE SKU
    (ITEM_ID                        NUMBER NOT NULL,
    SKU_CODE                       VARCHAR2(10) NOT NULL,
    SKU_DESC                       VARCHAR2(255) NOT NULL
  ,
  CONSTRAINT PK_SKU
  PRIMARY KEY (ITEM_ID, SKU_CODE)
  USING INDEX)
/

CREATE TABLE SKU_SOLD_OUT
    (ITEM_ID                        NUMBER NOT NULL,
    SKU_CODE                       VARCHAR2(10) NOT NULL,
    SOLD_OUT_DATE                  DATE NOT NULL,
    DELETED                        CHAR(1) NOT NULL,
    SOLD_OUT_TEXT                  VARCHAR2(1000) NOT NULL
  ,
  CONSTRAINT PK_SKU_SOLD_OUT
  PRIMARY KEY (ITEM_ID, SKU_CODE, SOLD_OUT_DATE)
  USING INDEX)
/

With item_id, sku_code combinations having zero to many rows in sku_sold_out 
but only one will be active (deleted = 'N') at a time.

I need to get a list of items that are not completely sold out.  i.e.
If an Item_id has four sku_codes and three are currently sold out, return the item_id.
If all four sku_codes are sold out, do not return the item_id.

Here is the query I am useing now.

select distinct item_id 
from sku
where item_id||'-'||sku_code not in (
    select item_id||'-'||sku_code 
    from sku_sold_out
    where deleted = 'N'
     and  sold_out_date <= sysdate)

sku has about 10,000 rows
sku sold out has about 120,000 rows.

Is there a better way of doing this?  The schema is set and is production so I cant 
make any database changes other than views, procs, ect.

Thanks for the help.  
Thanks 


Followup   May 17, 2002 - 1pm Central time zone:

Well, other ways would be


select distinct item_id
  from sku
  where (item_id,sku_code) not in ( select item_id, sku_code
                                      from sku_sold_out
                                     where deleted = 'N'
                                       and  sold_out_date <= sysdate)
/

select distinct sku.item_id
  from sku, sku_sold_out
 where sku.item_id = sku_sold_out.item_id (+)
   and sku.sku_code = sku_sold_out.sku_code (+)
   and nvl(sku_sold_out.deleted,'Y') = 'Y'
   and nvl(sku_sold_out.sold_out_date,sysdate) <= sysdate
/


 

4 stars Index result of Subquery and FILTER it   May 18, 2002 - 12am Central time zone
Reviewer: Jaromir D.B. Nemec from Vienna, Austria
Hi Tom,

in the original explanation you told us:

The subquery is evaluated, distinct'ed, >>>indexed<<<< (or hashed or sorted) and then ....

1) Can you explain how can be the result of a subquery indexed.  
temporary index is created or something like that? What you see in explain plan in that case? - 
never heard of this feature.

2) Some of the explain plans you demonstrated consist of:

FILTER
    row set 1
    row set 2

I've learned that FILTER is a kind of "filter" to suppress rows that doesn't match some condition. 
This is  evidently NOT (in) the case.



Can you explain (or link) what is the functionality of this two-row-ser-filter? 
My experience is that this kind of FILTER operate similar to  nested loops and is not performable 
in parallel - is it right?

Thanks


Jaromir

www.db-nemec.com

 


Followup   May 18, 2002 - 10am Central time zone:

1) it is stored sorted or hashed.  The structure itself is an implicit "indexing" of sorts.  Lets 
say you were doing a sort merge join -- you would see two sorts (out to temp) and then a merge.  A 
hash join -- one of the sets gets hashed.

for #2, I assume you mean something like:

ops$tkyte@ORA817DEV.US.ORACLE.COM> Select * from big b0
  2   where object_id IS NOT NULL
  3     and object_id NOT IN ( select object_id
  4                              from small )
  5  /

16886 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=845 Bytes=81965)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'BIG' (Cost=7 Card=845 Bytes=81965)
   3    1     INDEX (UNIQUE SCAN) OF 'SMALL_IDX' (UNIQUE)



The FILTER in this case is sort of like a nested loops join yes.  The table access full is done -- 
and the "is not null" part is evaluated.  Then each row must be filtered for the NOT IN and that is 
done via the index.  

This could be done in parallel -- meaning the full scan can be parallelized.

 

4 stars Useful info about IN & EXISTS   July 23, 2002 - 3pm Central time zone
Reviewer: Vish Inamdar from USA
Yes, this article gives me clear idea how IN & EXISTS are different from each other and where to 
apply them.

Good and simple example made it more clear.

I will be asking more question in future.

Vish
 


4 stars Exists, IN and consistent gets   August 22, 2002 - 8am Central time zone
Reviewer: Vikas Sharma from Delhi India
Hi tom,

Can i directly relate the consistent gets of plan with the time taken by the query executed by 
using IN  and another same query executed by using exists. Actually I have query which I execute in 
two different ways eg.

Query with exists :
select person_id from persons where exists
 (select con_person_id  from contacts where con_person_id = person_id ) and contact_details = 'Y'  

93 rows selected ..
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=951 Card=1 Bytes=9)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'PERSONS' (Cost=951 Card=1 Bytes=
          9)

   3    1     INDEX (UNIQUE SCAN) OF 'CON_PK' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
       6365  consistent gets
          0  physical reads
          0  redo size
       2967  bytes sent via SQL*Net to client
       1091  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
93    rows processed


Query with IN :
select person_id from persons where person_id in (select con_person_id  from contacts ) and 
contact_details = 'Y';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=187 Card=1 Bytes=17)
   1    0   NESTED LOOPS (Cost=187 Card=1 Bytes=17)
   2    1     INDEX (FULL SCAN) OF 'CON_PK' (UNIQUE) (Cost=1 Card=93 B
          ytes=744)

   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'PERSONS' (Cost=2 Card=
          1 Bytes=9)

   4    3       INDEX (UNIQUE SCAN) OF 'PER_PK' (UNIQUE) (Cost=1 Card=
          1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        543  consistent gets
          0  physical reads
          0  redo size
       2967  bytes sent via SQL*Net to client
       1091  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
93    rows processed

The persons table has 31000+ records and the contact table has 120+ record in this case as you 
proved the IN is better I  am too getting the same results as you can see the IN is fast. 

I would like to know that is it always true for two same queries (executed in the different way ) 
if the  consistent gets for a query are more then that query is not optimized and will always  
result a poor show, we should always go for the query which results less consistent get. 

I read in you book also the consistent gets depend upon the order of the data of the index columns. 
It will be more when the data for the index column is disordered. Does that will always effect the 
time taken by the query.  
 


Followup   August 22, 2002 - 9am Central time zone:

Less consistent gets is a good goal.  In generate, the less consistent gets

o the faster it'll be
o the more SCALABLE it'll be (as you add users, you can add MORE users with less consistent gets)

Always is a very very strong term.  Almost Always i would say.


As for the last one -- that is due to a large index range scan that'll be getting LOTS of data from 
the table -- not really applicable in this example.  Does it affect the time taken by the query -- 
probably, the more consistent gets, the more latching (thats a lock, a serialization thing), the 
longer it takes to get the data. 

5 stars   August 22, 2002 - 10am Central time zone
Reviewer: Ashok, Bombay ,India 
your style of explaining things is just amazing..!!you must be one helluva teacher !!! 


4 stars Thanks a Lot   August 22, 2002 - 12pm Central time zone
Reviewer: Vikas Sharma from Delhi India
Hi Tom,
 
Thanks lot 

The way you explain the things it is great..

Regards,

Vikas Sharma 


5 stars Very helpful   August 22, 2002 - 8pm Central time zone
Reviewer: Bijay from Fremont, CA
Tom
I wonder how you can find so much time in replying to all questions. Its amazing a simple question 
can create such a big chain of Q & A.
You have been a great source of knowledge for all of us. Thank you for being you.
Bj 


4 stars why is my query plan using NL? where as your use SORT-MERGE   September 5, 2002 - 3pm Central time zone
Reviewer: A reader 
Tom,

I am trying to understand exists and not exists

 analyze table big compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /

Table analyzed.

SQL> analyze table small compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns
  5  /

Table analyzed.



SQL> truncate table plan_table;

Table truncated.

SQL> explain plan for
  2  select count(subobject_name)
  from big
 where object_id in ( select object_id from small )
  3    4    5  /

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     1 |   19 |    598 |       |       |
|  SORT AGGREGATE           |          |     1 |   19 |        |       |       |
|   NESTED LOOPS            |          |   396 |    7K|    598 |       |       |
|    VIEW                   |VW_NSO_1  |    99 |    1K|      4 |       |       |
|     SORT UNIQUE           |          |    99 |  396 |      4 |       |       |
|      INDEX FULL SCAN      |SMALL_IDX |    99 |  396 |      1 |       |       |
|    TABLE ACCESS BY INDEX R|BIG       |   113K|  665K|      6 |       |       |
|     INDEX RANGE SCAN      |BIG_IDX   |   113K|      |      2 |       |       |
--------------------------------------------------------------------------------

11 rows selected.


Thans and Regards,                           


Followup   September 5, 2002 - 10pm Central time zone:

things like sort_area_size, db_block_buffers, hash area size, database version, etc will all affect 
the plans. 

5 stars Update Process Options   September 9, 2002 - 10am Central time zone
Reviewer: SK from NJ
Hi Tom,

I have to update my production table daily with new/update data. My update process is like this.

Table big_table (col1 integer, col2 integer, col3 integer)
Table Small_table (c1 integer, c2 integer, c3 integer)

Small table is a external Table. and Big Table as index on Col1.

My update process is like this:

Delete from big_table where col1 in (select c1 from small_table);
insert into big_table select c1,c2,c3 from small_table.

My Big table has 32 million rows and small table has 400K rows. This process is taking 30 minutes 
to do the above two processes. The requirement is to reduce this time. How can we do it. I believe 
using Exists will do both full table scans so that will not be efficient. Where as using in may use 
indexes (but I doubt that it is using indexes).

1. Should I go for updating using procedure where I delete records one by one and I will be sure 
that processes will use indexes. But that is not my choice and it will defy your first rule of 
using single statement instead of using procedure.

2. In one such update my big_table has more than 500 million records. and small Table has 7 million 
records. I have 24GB of temp tablespace. My delete statement is failing due to unable to allocate 
extents on temp tablespace. I am surprised on this. Is Temp tablespace of 24GB too large? In the 
above delete statement will it use only col1 and c1 for sort/merge. How do I estimate the temp 
tablespace size for this transaction. I am running Oracle 9iR2 RAC.

Your response will be highly appreciated.

Thanks 


Followup   September 9, 2002 - 10am Central time zone:

why delete and insert.

Just MERGE, see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1615330671789
 

3 stars Follow up   September 9, 2002 - 11am Central time zone
Reviewer: SK from NJ
Hi Tom,

I am using merge for some tables where I have one to one relationship between the two table. But 
for many big tables I have multiple records for same key (tables with non unique key). So will 
merge will take care of following situation where for big table has 5 records for col1. and small 
table has 3 records for the same key. Will merge replace 5 rows in big table with 3 rows from small 
table for the same key (means deleteion of two records from big table).

I will certainly prefer merge if this is possible.

Regards 


Followup   September 9, 2002 - 12pm Central time zone:

Then that is not a merge. 

A merge would leave the same number of rows.

You'll need to do the delete+insert.  You'll want to use tkprof to see what it is doing now and see 
if it can be speeded up.  Deleting 400k rows is going to take a while if you have indexes on the 
table. 

5 stars Oracle Optimizer and In & Exists   September 11, 2002 - 12am Central time zone
Reviewer: Subhrajyoti Paul from Mumbai, India
Tom,
   From the discussion above, I get the impression that In and exists clause can be used 
interchangibly, in case there is a subquery inside the IN clause. But, one would be better than the 
other depending on the amount data retrieved in the inner and the outer query.
   If they can be used interchangibly, then it sounds logical that the optimizer can substitute an 
in clause with an exists and vice versa, given it is provided with proper statistics... Does this 
happen in practice... Or am I making some mistakes in my understanding of the things...

   One other question, 

"select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then 
joined to the original table -- typically. "

Here you mentioned that it is distinct'ed, which means it is sorted, now once sorted, will oracle 
favor the use of merge sort join, because the sort operation is already complete...or may it choose 
to Indexing or hash join. How much importance is given to the fact that one step (sort, in this 
case) can be eliminated from the join operation, or is this relevant at all...


    


Followup   September 11, 2002 - 7am Central time zone:

And in 8i, ... 9i, .... 10i, and on up -- that is true.  You'll see interesting query rewrites with 
both (given that you use the cost based optimizer which I can assure you you will be in 10i as the 
RBO is gone in that release)


distinct <> sort.  SORT is SORT, distinct is distinct.  DISTINCT does not imply or infer or mean 
SORTed.  Never has, never will. 

5 stars But how otherwise can u perform a distinct if u do not sort...   September 11, 2002 - 8am Central time zone
Reviewer: Subhrajyoti Paul from Mumbai, India
Tom,
   I am relieved that the optimizer does the substitution...And ofcourse, I use cost based 
optimizer...

   But the distinct thing.. If distincted data does not mean sorted, how otherwise can u have 
distinct data if u did not sort it... Can you give me an example...

Subhro 


Followup   September 11, 2002 - 8am Central time zone:

Lets say you have a partitioned table.

You used hash partitioning on the ID column.


We know that across partitions -- ID is "unique" (eg: if ID=5 is in partition 1, ID=5 will NOT be 
in partitions 2, 3, 4, ....  and so on).

We use parallel query.

Each partitioned is distincted and then glued together.  ID=5 from partition 1 could come after 
ID=1000 from partition 2 if the PQ slave processing partition 2 was "faster" then the one doing 
partition 1.


For example.  There are other cases as well.  distinct, group by, etc -- do not imply "sorted" 

5 stars Thanks a lot... I got the point...   September 11, 2002 - 8am Central time zone
Reviewer: Subhrajyoti Paul from Mumbai, India
Tom,
   Most of what u said in the last followup is a bouncer to me(in cricket, a bouncer is a ball that 
goes above the batsman's head ;-) )...

   But, I got the point that distinct does not mean sorted...

   I just could not think of a case myself ..so the example was needed...

Thanks a lot... everytime i think i got u...u prove me wrong... cool.... 


4 stars not in & not exists   November 5, 2002 - 5am Central time zone
Reviewer: Brigitte from Germany
Hallo Tom,

Some times ago I changed a query to "not exists", because "not in" takes some hours. 
Now I tried your interesting followup from October 01, 2001  with the test database, but alter 
session 
set always_anti_join = HASH/MERGE hasn't an effect. Why?
notesid is not null in all tables, it is the primary key in testidxdokument,testidxvorgang and 
testidxmappe

Sql> set autotrace trace
Sql> set timing on
Sql> r
  1  select distinct mandant,notesid from testleserfeldgruppe l
  2  where not exists
  3  (select 'x' from testidxdokument
  4  where l.notesid = notesid)
  5  and not exists
  6  (select 'x' from testidxvorgang
  7  where l.notesid = notesid)
  8  and not exists
  9  (select 'x' from testidxmappe
 10  where l.notesid = notesid)
 11* order by mandant, notesid

no rows selected

Elapsed: 00:00:01.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
   5    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=18)
   6    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=18)
   7    3         INDEX (UNIQUE SCAN) OF 'UK_TESTIDXMAPPE' (UNIQUE) (Cost=1 Card=1 Bytes=18)

Statistics
----------------------------------------------------------
          0  recursive calls
          4  db block gets
       1507  consistent gets
          0  physical reads
          0  redo size
        160  bytes sent via SQL*Net to client
        251  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed

Sql> r
  1  select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7* order by mandant, notesid

no rows selected

Elapsed: 00:00:02.44

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3Card=1 Bytes=35)
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Sql> alter session set always_anti_join = MERGE;

Session altered.

Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7  order by mandant, notesid;

no rows selected

Elapsed: 00:00:01.23

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Sql> alter session set always_anti_join = HASH;

Session altered.

Sql> select distinct mandant,notesid from testleserfeldgruppe where notesid not in
  2  (select notesid from testidxdokument)
  3  and notesid not in
  4  (select notesid from testidxvorgang)
  5  and notesid not in
  6  (select notesid from testidxmappe)
  7  order by mandant, notesid;

no rows selected

Elapsed: 00:00:01.13

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=1 Bytes=35)
   1    0   SORT (ORDER BY) (Cost=6 Card=1 Bytes=35)
   2    1     SORT (UNIQUE) (Cost=5 Card=1 Bytes=35)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'TESTLESERFELDGRUPPE' (Cost=3 Card=1 Bytes=35)           
   5    3         TABLE ACCESS (FULL) OF 'TESTIDXDOKUMENT' (Cost=12 Card=303 Bytes=5454)
   6    3         TABLE ACCESS (FULL) OF 'TESTIDXVORGANG' (Cost=1 Card=9 Bytes=162)
   7    3         TABLE ACCESS (FULL) OF 'TESTIDXMAPPE' (Cost=1 Card=9 Bytes=162)

Statistics
----------------------------------------------------------
          0  recursive calls
       3304  db block gets
      32597  consistent gets
          0  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        249  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Regards
 


Followup   November 5, 2002 - 9am Central time zone:

are those statistics close to right?  9 rows and such -- on such a trivial amount of data -- all 
bets are off

either get real stats on there or load the tables with real data. 

1 stars not in or not exists   November 6, 2002 - 10am Central time zone
Reviewer: Brigitte from Germany
Hallo Tom, 
here are more data.
1. Why has the alter session no effect. What's wrong? Some parameters?
2. what means: "elapsed 02:133:8032.30", I know hours:minutes:seconds
   Does it mean 2 hours + 13 minutes +...
             or 2 hours + 2 hours + 13 minutes +...

06.11.02 07:53    Anzahl der Tabellenzeilen       Seite:   1
                            dms

DMSIDXDOKUMENT               : 46220
DMSIDXMAPPE                  : 13
DMSIDXVORGANG                : 464
DMSLESERFELDGRUPPE           : 137369

Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe l
  2  where not exists
  3  (select 'x' from dmsidxdokument
  4  where l.notesid = notesid)
  5  and not exists
  6  (select 'x' from dmsidxvorgang
  7  where l.notesid = notesid)
  8  and not exists
  9  (select 'x' from dmsidxmappe
 10  where l.notesid = notesid)
 11* order by mandant, notesid

no rows selected

Elapsed: 00:00:19.18

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXDOKUMENT' (UNIQUE) (Cost=1 Card=1 Bytes=32)
   6    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXVORGANG' (UNIQUE) (Cost=1 Card=1 Bytes=31)
   7    3         INDEX (UNIQUE SCAN) OF 'UK_DMSIDXMAPPE' (UNIQUE)

Statistics
----------------------------------------------------------
        245  recursive calls
          7  db block gets
     149823  consistent gets
       1439  physical reads
          0  redo size
        161  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
         18  sorts (memory)
          0  sorts (disk)
          0  rows processed

Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7* order by mandant, notesid

no rows selected

Elapsed: 02:133:8032.30

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
          0  recursive calls
     464062  db block gets
  101189223  consistent gets
   12940155  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Na so was> alter session set always_anti_join = MERGE;

Session altered.

Na so was> r
  1  select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7* order by mandant, notesid

no rows selected

Elapsed: 02:133:8036.06

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
          0  recursive calls
     464112  db block gets
  101219920  consistent gets
   13238978  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Na so was> alter session set always_anti_join = HASH;

Session altered.

Na so was> select distinct mandant,notesid from dmsleserfeldgruppe where notesid not in
  2  (select notesid from dmsidxdokument)
  3  and notesid not in
  4  (select notesid from dmsidxvorgang)
  5  and notesid not in
  6  (select notesid from dmsidxmappe)
  7  order by mandant, notesid;

no rows selected

Elapsed: 02:135:8144.32

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=71 Card=25 Bytes=950)
   1    0   SORT (ORDER BY) (Cost=71 Card=25 Bytes=950)
   2    1     SORT (UNIQUE) (Cost=68 Card=25 Bytes=950)
   3    2       FILTER
   4    3         TABLE ACCESS (FULL) OF 'DMSLESERFELDGRUPPE' (Cost=65 Card=25 Bytes=950)
   5    3         TABLE ACCESS (FULL) OF 'DMSIDXDOKUMENT' (Cost=118 Card=2101 Bytes=67232)
   6    3         TABLE ACCESS (FULL) OF 'DMSIDXVORGANG' (Cost=1 Card=21 Bytes=651)
   7    3         TABLE ACCESS (FULL) OF 'DMSIDXMAPPE' (Cost=1 Card=2 Bytes=50)

Statistics
----------------------------------------------------------
          0  recursive calls
     464327  db block gets
  101357656  consistent gets
   14016238  physical reads
          0  redo size
        167  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          0  rows processed

Regards 


1 stars not in & not exists   November 8, 2002 - 3am Central time zone
Reviewer: Brigitte from Germany
Hallo Tom,
I hope the data which I send on November 06 are enough.
I think a difference from 19 minutes to some hours is significant.
I'm very interested in your reply.

Regards 


Followup   November 8, 2002 - 7am Central time zone:

Brigitte -- for something so big, you'll need to do it as a question - this is for followups, quick 
questions.  This is an entire question in a question. 

5 stars Is the logic backwards for NOT IN and NOT EXISTS   November 15, 2002 - 4pm Central time zone
Reviewer: Doug from CT, USA
Tom - this may be an oversimplification - however if I was to distill whether IN or EXISTS was 
better I would say this after reading your reply.  An IN will generally full table scan the 
subquery, and an EXISTS will generally full table scan the outer query.  So, if my outer loop is 
small and my inner code is large then an exists might be appropriate but if my outer loop is large 
and my inner code is small (the match is quick).. then an IN might be appropriate.  Is that 
correct?  If so.. is the situation completely reversed if using NOT in and NOT exists?  I would 
guess is you were using not in.. you would have to full table scan the OUTER query rather than 
sorting,hashing, the inner one.  You did cover that NOT IN will return zero rows if it is NOT IN 
(null).. but I wasn't sure if the psuedo-code for processing is backwards from the original 
examples if one is using NOT IN and NOT EXISTS.   


Followup   November 15, 2002 - 8pm Central time zone:

I encapsulate it with this:

...
So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and 
executing ( select null from t2 where y = x.x ) is very very fast (nice index on 
t2(y)).  Then the exists will be faster as the time to full scan T1 and do the 
index probe into T2 could be less then the time to simply full scan T2 to build 
the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more 
appropriate.


If both the subquery and the outer table are huge -- either might work as well 
as the other -- depends on the indexes and other factors. 

.....


full scans -- no, they enter not into the picture at all IN GENERAL.  If the subquery can use 
indexes, it'll use an index with IN or EXIST.

It is if the subquery is LARGE and TAKES A LONG TIME TO BUILD.... (regardless of full scan)


Not in and not exists is even harder to characterize (pull my AOTC slides from the home page for an 
interesting example -- they will be there for a while around November of 2002 -- if you are reading 
this later then that, they might not be there anymore) 

3 stars How to make NOT EXISTS handle comparison of NULLs in same way as MINUS   March 13, 2003 - 10am Central time zone
Reviewer: Jim Dickson (DBA/developer) from London, UK
As ever, learn a lot from this site.

But one thing not addressed is how to make an antijoin deal with NULLs. The set operators (UNION, 
MINUS) do this automatically.

But can it be done efficiently with NOT EXISTS or NOT IN.

e.g.
I need to compare 2 tables such that two rows are duplicates if corresponding columns are either 
equal, or both contain NULL. 

I cannot use MINUS since that columns being compared are a subset of the row, but i need to return 
the whole row. 

My understanding is that I cannot use NOT IN because it does not handle "null comparison". 

NOT EXISTS works but seems to use NESTED LOOPS => v slow 
My understanding is that HASH antijoin will not work because HASH join only works for equality 
tests (which fail for NULLs). 

I can post example code if this helps.
 


Followup   March 14, 2003 - 5pm Central time zone:

both not exists and ESPECIALLLY not in do not do nulls like that.

not in and not exists don't even treat nulls the same in fact.  do this:

select * from emp where empno not in ( select mgr from emp );

and compare that to the "equivalent" not exists:

select * from emp where not exists ( select null from emp e2 where e2.mgr = emp.empno );

you'll get a different answer.

You would have to tediously compare column by column by column

where (x = y or (x is null and y is null))

otherwise. 

3 stars How to make NOT EXISTS handle comparison of NULLs in same way as MINUS   March 17, 2003 - 12pm Central time zone
Reviewer: Jim Dickson from London, UK
I absolutely agree but the question more about access paths than semantics.

I believe i understand the semantics surrounding nulls - which is why MINUS/INTERSECT are such 
great operators.

I am prepared to write "tedious" column comparison.

Can I get Oracle to perform comparisons via a single-pass full table scan access method (hash, 
sort-merge).

I have only seen Cartesian product/Nested Loops approach which obviously does not scale with number 
of rows.

Is there any chance of Oracle implementing ANSI set operator extension "CORRESPONDING BY 
{column-list}" ?

 


Followup   March 17, 2003 - 1pm Central time zone:

curious -- why cannot you simply use MERGE/INTERSECT? 

3 stars How to make NOT EXISTS handle comparison of NULLs in same way as MINUS   March 18, 2003 - 4am Central time zone
Reviewer: Jim Dickson from London, UK
Because I am not merging into target table directly.

The sequence of processing is

1/    Read an external table
2/    Identify differences based on a subset of columns
3/    Write out differences to OS for further processing  (fuzzy logic on Name & Address data)
4/    Read 2nd external table containing processed data and then "merge" into target table.

Where I have a primary key, I can use MINUS for step 2.
But where there is no PK, then I am trying to get NOT EXISTS to scale.

Hope that makes sense.

BTW Is MERGE statement ANSI? 


Followup   March 18, 2003 - 7am Central time zone:

well, not following you by what you mean by "getting not exists to scale" -- but give me an EXAMPLE 
of the syntax you would like to use and describe what would be returned and I'll see if there isn't 
a way to code that...


Don't know if merge is "ansi".  Thing is NIST stop testing for ansi compliance years ago, no one 
does it, the standard is so huge now that full compliance is pretty much "not going to happen" and 
database indenpendence, well, I don't even go there -- the only results I see from that over the 
last 15 years has been applications that perform marginally well on a single database and blow 
chunks on the rest. 

4 stars How to make NOT EXISTS handle comparison of NULLs in same way as   March 19, 2003 - 4am Central time zone
Reviewer: Jim Dickson from London, UK
Code posted below.

The NOT EXISTS does not include the SECONDARY column.

Ran with 100 rows, 1000 rows and 2389 rows.
On my machine

MINUS        100 rows  00.44 secs,    8  db block gets
            1000 rows  00.27 secs,    7  db block gets
            2389 rows  00.51 secs,    4  db block gets

NOT EXISTS   100 rows  00.43 secs,  208  db block gets
            1000 rows  02.90 secs, 2007  db block gets
            2389 rows  11.75 secs, 4780  db block gets

I need this to run with 10m rows.

spool not_exists.2389.out

set timing on
set echo on

drop   table t1;
drop   table t2;

create table t1 as select * from all_objects where rownum <= 2500;
create table t2 as select * from all_objects where rownum <= 2500;

analyze table t1 compute statistics;
analyze table t2 compute statistics;

--desc t1
--desc t2

insert into t1 values ('EXTRA ROW', null, null, null, null, null, null, null, null, null, null, 
null, null);

select count(*) from t1;
select count(*) from t2;
 
set autotrace on

select count(*) from
( 
  select * from t1
  minus
  select * from t2
);

select count(*) from t1 where NOT EXISTS 
( select null from t2 where 
  ( t1.OWNER = t2.OWNER or ( t1.OWNER is null and t2.OWNER is null ) )
  and
  ( t1.OBJECT_NAME = t2.OBJECT_NAME or ( t1.OBJECT_NAME is null and t2.OBJECT_NAME is null ) )
  and
  ( t1.SUBOBJECT_NAME = t2.SUBOBJECT_NAME or ( t1.SUBOBJECT_NAME is null and t2.SUBOBJECT_NAME is 
null ) )
  and
  ( t1.OBJECT_ID = t2.OBJECT_ID or ( t1.OBJECT_ID is null and t2.OBJECT_ID is null ) )
  and
  ( t1.DATA_OBJECT_ID = t2.DATA_OBJECT_ID or ( t1.DATA_OBJECT_ID is null and t2.DATA_OBJECT_ID is 
null ) )
  and
  ( t1.OBJECT_TYPE = t2.OBJECT_TYPE or ( t1.OBJECT_TYPE is null and t2.OBJECT_TYPE is null ) )
  and
  ( t1.CREATED = t2.CREATED or ( t1.CREATED is null and t2.CREATED is null ) )
  and
  ( t1.LAST_DDL_TIME = t2.LAST_DDL_TIME or ( t1.LAST_DDL_TIME is null and t2.LAST_DDL_TIME is null 
) )
  and
  ( t1.TIMESTAMP = t2.TIMESTAMP or ( t1.TIMESTAMP is null and t2.TIMESTAMP is null ) )
  and
  ( t1.STATUS = t2.STATUS or ( t1.STATUS is null and t2.STATUS is null ) )
  and
  ( t1.TEMPORARY = t2.TEMPORARY or ( t1.TEMPORARY is null and t2.TEMPORARY is null ) )
  and
  ( t1.GENERATED = t2.GENERATED or ( t1.GENERATED is null and t2.GENERATED is null ) )
)
;

spool off
exit

As ever, thanks.

Think the whole Oracle community is disappointed that the new book will be delayed. 


Followup   March 20, 2003 - 1pm Central time zone:

what I meant by:

 but 
give me an EXAMPLE of the syntax you would like to use and describe what would 
be returned and I'll see if there isn't a way to code that...

was -- give me a scenario where you want to use not exists instead of minus (eg: in the above, i 
certainly would NOT use not exists) and I'll see what I can do with that. 

That is -- give me a real case to work with whereby minus cannot be used. 

4 stars   March 21, 2003 - 1pm Central time zone
Reviewer: green 
Hi Tom,
Can you please explain a litte bit 'NOT IN' here? Where  can  it be used properly?

Thanks! 


Followup   March 21, 2003 - 2pm Central time zone:

I don't understand?  

not in can be used anywhere you need "not in" -- where "record" not in "set".  
 

2 stars How to use MINUS on subset of columns   March 27, 2003 - 4am Central time zone
Reviewer: Jim Dickson from London, UK
I use MINUS extensively but in certain situations I believe it is unsuitable.

e.g.

When trying to find the difference between 2 tables, where at least 1 of tables lack a primary key,
using a subset of columns for comparison and some of those columns can be null.

The real life example is comparing a new list of suppressions against an existing table of 
suppressions.
(Suppressions used in marketing to avoid target customers who do not wish to receive offers).
The ddl is pretty much address1-10, suppression_flags1-10.
The comparison is done on address lines only but return set must be all columns.

Hope that makes sense.

The ANSI syntax i would love to use is

select * from suppressions_new
except corresponding by (address1, address2, ...., address10)
select * from suppressions_old

Suppressions_new will have no primary key.

How can i achieve this in Oracle using MINUS instead of EXCEPT.

If it cannot be done using MINUS, then how to achieve reasonable performance using NOT EXISTS. 


Followup   March 27, 2003 - 8am Central time zone:

Ok, I have a big_table (4million rows) and a sample of that (400k rows).  Using NVL() on columns 
that "might be nullable" and supplying an unreasonable value (eg: chr(0) for strings, 
99999999999999999999999999 for numbers, to_date(01010001,'ddmmyyyyy') for dates) you can anti join 
with NOT IN like this:

select *
  from big_table
 where (object_id, nvl(subobject_name,chr(0)) )
   NOT IN (select object_id, nvl(subobject_name,chr(0))
             from ten_percent
            where nvl(subobject_name,chr(0)) IS NOT NULL )
   and nvl(subobject_name,chr(0)) is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    114.65     248.86     118052      60214          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    114.65     248.86     118052      60214          0           0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  HASH JOIN ANTI (cr=60214 r=118052 w=58940 time=77317035 us)
3817191   TABLE ACCESS FULL BIG_TABLE (cr=54700 r=54649 w=0 time=14975098 us)
 381274   TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1360329 us)


Now, that "scales" just as well as:

select *
  from big_table
 where (object_id, object_name )
   NOT IN (select object_id, object_name
             from ten_percent )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    114.19     224.33     117115      60213          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    114.19     224.33     117115      60213          0           0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 60

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  HASH JOIN ANTI (cr=60213 r=117115 w=57995 time=224338032 us)
3817191   TABLE ACCESS FULL BIG_TABLE (cr=54699 r=54657 w=0 time=16561038 us)
 381274   TABLE ACCESS FULL TEN_PERCENT (cr=5514 r=4463 w=0 time=1308385 us)


without the NVL's.
 

5 stars Use of "nvl(subobject_name,chr(0)) IS NOT NULL "   March 27, 2003 - 1pm Central time zone
Reviewer: Logan Palanisamy from Santa Clara, CA USA
Tom, 

In your example,

select *
  from big_table
 where (object_id, nvl(subobject_name,chr(0)) )
   NOT IN (select object_id, nvl(subobject_name,chr(0))
             from ten_percent
            where nvl(subobject_name,chr(0)) IS NOT NULL )
   and nvl(subobject_name,chr(0)) is not null


1. What does chr(0) translate to? When I run "select chr(0) from dual", it returns me an invisble 
character. My db character set is WE8ISO8859P15

2. What does "nvl(subobject_name,chr(0)) IS NOT NULL" do? Doesn't it evaluate to TRUE always, since 
chr(0) is a not null value? 

3. Also, the number of rows returned as shown in TKPROF output is zero. How come? Shouldn't return 
about 3.6 millon rows?

4. I am familiar with TKPROF. But what does the name really stand for? Is it an acronym for 
something?

Grateful as ever. 


Followup   March 27, 2003 - 2pm Central time zone:

1) exactly -- it is just "\0" in C, a null string.  It is an improbable character, something 
subobject_name cannot be...

2) exactly -- I'm telling the optimizer "this query returns only data that is NOT NULL".  That 
permits the use of the anti-join which is what our goal was here.

3) no, there were no rows "not in" the other set.

4) tool kit profiler is the rumorous name I've heard it refered to as. 

3 stars LIKE.... OR LIKE.... OR   May 12, 2003 - 11am Central time zone
Reviewer: Robert from PA
Tom, 
Is there a better way (with a sub-query) to do
LIKE .... OR LIKE ... OR LIKE ... ?
e.g.:
select * from user_objects where 
OBJECT_TYPE LIKE 'PROC%' OR OBJECT_TYPE  LIKE 'FUNC%' OR OBJECT_TYPE  LIKE 'VIEW%" ?

Thanks
 


Followup   May 12, 2003 - 1pm Central time zone:

that would be the most "optimal" way normally -- it could permit a normal index on object_type to 
be used.

You could shorten it to 

where substr( object_type, 1, 4 ) in ( 'PROC', 'FUNC', 'VIEW' ) 

in your case -- and if you had a function based index on substr(object_type,1,4) it could use an 
index. 

4 stars EXISTS and FTS   May 14, 2003 - 6pm Central time zone
Reviewer: Nitin from Atlanta
I tried the following.
SQL> create table outer (a number);

Table created.

SQL> create index outer_ind on outer (a);

Index created.

SQL> create table inner (b number);

Table created.

SQL> create index inner_ind on inner (b);

Index created.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER'
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE)


SQL> analyze table outer compute statistics;

Table analyzed.

SQL> analyze table inner compute statistics;

Table analyzed.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=13)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=1 Card=1 Bytes=13)
   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)

SQL> insert into outer select OBJECT_ID from user_objects;

40899 rows created.

SQL> commit;

Commit complete.

SQL> analyze table outer compute statistics;

Table analyzed.

SQL> select 'x' from outer where exists (select 'x' from inner where outer.a = inner.b);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=2045 Bytes=81
          80)

   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=4 Card=2045 Bytes=8
          180)

   3    1     INDEX (RANGE SCAN) OF 'INNER_IND' (NON-UNIQUE) (Cost=1 C
          ard=1 Bytes=13)


Should it not perform an index scan on the OUTER table?

Regards

Nitin 


Followup   May 14, 2003 - 11pm Central time zone:

why would it?

every row must be inspected.
the rows are very narrow
the table is SMALLER then the index (the index has 2 columns -- the column and the rowid, the table 
is just a column)

use "real" examples.


ops$tkyte@ORA920LAP> create table t1 as select * from all_objects;

Table created.

ops$tkyte@ORA920LAP> create index t1_idx on t1(object_id);

Index created.

ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA920LAP> create table t2 as select * from all_objects where 1=0;

Table created.

ops$tkyte@ORA920LAP> create index t2_idx on t2(object_id);

Index created.

ops$tkyte@ORA920LAP> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> set autotrace traceonly explain
ops$tkyte@ORA920LAP> select * from t1
  2  where exists ( select null from t2 where t2.object_id = t1.object_id );

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=168 Card=82 Bytes=8938)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
   2    1     NESTED LOOPS (Cost=168 Card=82 Bytes=8938)
   3    2       SORT (UNIQUE)
   4    3         TABLE ACCESS (FULL) OF 'T2' (Cost=2 Card=82 Bytes=1066)
   5    2       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=1)



ops$tkyte@ORA920LAP> set autotrace on
 

4 stars Still FTS   May 15, 2003 - 12pm Central time zone
Reviewer: Nitin 
I replicated all the steps performed by you. And I am getting the following execution plan.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=55 Card=3515 Bytes=2
          70655)

   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=55 Card=3515 Bytes=270
          655)

   3    1     INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE) (Cost=1 Card
          =1 Bytes=13)


The db is 8.1.7.4. Is is possible that there is some init.ora parameter which is dictating the 
above FTS execution plan?

Thanks 


Followup   May 15, 2003 - 5pm Central time zone:

that is fundementally the same plan I had -- so?

it needs to full scan, mine full scanned.  

tell me -- what would the "straw man plan" you think would be better be exactly? 

3 stars Still FTS   May 15, 2003 - 7pm Central time zone
Reviewer: Nitin 
The difference is that the optimizer is choosing to perform FTS on T1 (which is huge) and use index 
scan on T2 which is very small.

Regards

 


Followup   May 15, 2003 - 7pm Central time zone:

ops$tkyte@ORA817DEV> set autotrace traceonly explain
ops$tkyte@ORA817DEV> select * from t1, ( select distinct object_id from t2 ) t2
  2  where t1.object_id = t2.object_id;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=109)
   1    0   NESTED LOOPS (Cost=5 Card=1 Bytes=109)
   2    1     VIEW (Cost=3 Card=1 Bytes=13)
   3    2       SORT (UNIQUE) (Cost=3 Card=1 Bytes=13)
   4    3         INDEX (FULL SCAN) OF 'T2_IDX' (NON-UNIQUE)
   5    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=26793 Bytes=2572128)
   6    5       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=26793)



in 8i seems to be the way to go -- things change, the optimizer gets smarter 

4 stars What about with updates?   May 21, 2003 - 2am Central time zone
Reviewer: Brett from Australia
Tom,

How does IN with a subquery affect performance of updates - is the subquery executed once for each 
row (which could be really inefficient), or once for the query. eg:

update table1
   set column_1 = 'Y'
 where row_id in (select row_id
                    from table1 t1,
                         table2 t2
                   where t1.f_key = t2.row_id
                     and t2.class_cd = 'Address'
                     and t2.created > sysdate - 30);

Is this subquery executed only once for the entire update? Or would it be better to use the 
subquery to insert the data into a temporary table with a unique constraint/primary key and then 
use an updatable view based on table1 and the temporary table? 


Followup   May 21, 2003 - 8am Central time zone:

the subquery, since it is not a correlated subquery, would tend to be done "once"



Now, if t2.row_id is UNIQUE, and there is a unique/primary key on it I would code:

ops$tkyte@ORA920> update ( select column_1
  2             from t1, t2
  3                    where t1.f_key = t2.row_id
  4                      and t2.class_cd = 'Address'
  5                          and t2.created > sysdate-30 )
  6    set column_1 = 'Y'
  7  /


 

4 stars Are these same?   May 22, 2003 - 12pm Central time zone
Reviewer: Nitin 
Are the following SQL's similar? Will they retrieve same set of data under all circumstances?

1.

select 
APRCN_ID_BI, APPRAISAL_DT, RANK_ID 
from APPRECIATN_TBL WHERE
 (APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID)   in  ( SELECT 
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
  APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
  363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
  TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')


2.

select 
APRCN_ID_BI, APPRAISAL_DT, RANK_ID 
from APPRECIATN_TBL WHERE
EXISTS  ( SELECT 
APRCN_ID, APRCN_ID, ACTIVITY_ID, NOTES_ID
FROM APPRECIATN_TBL_TMP TMP WHERE
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.APRCN_ID = TMP.APRCN_ID AND
  APPRECIATN_TBL.ACTIVITY_ID = TMP.ACTIVITY_ID AND
  APPRECIATN_TBL.NOTES_ID = TMP.NOTES_ID AND TMP.PROCESS_INSTANCE =
  363480 AND TMP.APRCN_ID_BI = '52000' AND TMP.APPRAISAL_DT =
  TO_DATE('2003-01-19','YYYY-MM-DD') AND TMP.RANK_ID = '55053')


Regards
 


Followup   May 23, 2003 - 8am Central time zone:

they are the same, but the first is less efficient by far.  a correlated "IN", wouldn't do that. 

5 stars query   May 22, 2003 - 7pm Central time zone
Reviewer: mo 
Tom:

Do you know why the first query does not produce any results while the other does:

select request_id from shipm_request where request_id not in (Select distinct request_id from 
shipment);

select request_id from shipm_request a where request_id not exists (select request_id from shipment 
b where a.request_id = b.request_id);

Table shipm_request(request_id) has one to many relationship with table shipment(shipment_id).
Thanks 


Followup   May 23, 2003 - 8am Central time zone:

lookup upwards for this text:

No, partially because NOT IN and NOT EXISTS are not substitutes for eachother! 

and read that.  it is all about NULL 

2 stars Proof IN / EXISTS   August 29, 2003 - 1pm Central time zone
Reviewer: Marcio from br
<code>Tom, I did test to prove to my colleague the theory of IN/EXISTS exposed here (He just ignore it and ask to developers to use IN always -- to improve performance).
But when I executed -- to me they are same!
take a look

ops$t_mp00@MRP9I1> drop table small;

Table dropped.

ops$t_mp00@MRP9I1> drop table big;

Table dropped.

ops$t_mp00@MRP9I1> ed teste_in_exist

ops$t_mp00@MRP9I1> create table big as select * from all_objects;

Table created.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

31317 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

62634 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> insert /*+ append */ into big select * from big;

125268 rows created.

ops$t_mp00@MRP9I1> commit;

Commit complete.

ops$t_mp00@MRP9I1> create index big_idx on big(object_id);

Index created.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> create table small as select * from all_objects where rownum < 100;

Table created.

ops$t_mp00@MRP9I1> create index small_idx on small(object_id);

Index created.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> analyze table big compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /

Table analyzed.

ops$t_mp00@MRP9I1> analyze table small compute statistics
2 for table
3 for all indexes
4 for all indexed columns
5 /

Table analyzed.

ops$t_mp00@MRP9I1> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production

c:\migracao\loads>plus /

SQL*Plus: Release 9.2.0.2.0 - Production on Fri Aug 29 14:07:08 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.2.1 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.2.0 - Production

ops$t_mp00@MRP9I1> @teste_in_exist
ops$t_mp00@MRP9I1> rem create table big as select * from all_objects;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem insert /*+ append */ into big select * from big;
ops$t_mp00@MRP9I1> rem commit;
ops$t_mp00@MRP9I1> rem create index big_idx on big(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem create table small as select * from all_objects where rownum < 100;
ops$t_mp00@MRP9I1> rem create index small_idx on small(object_id);
ops$t_mp00@MRP9I1> rem
ops$t_mp00@MRP9I1> rem analyze table big compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1> rem analyze table small compute statistics
ops$t_mp00@MRP9I1> rem for table
ops$t_mp00@MRP9I1> rem for all indexes
ops$t_mp00@MRP9I1> rem for all indexed columns
ops$t_mp00@MRP9I1> rem /
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = true;

Session altered.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2  from big
3  where object_id in ( select object_id from small )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
            0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2  from big
3  where exists ( select null from small where small.object_id = big.object_id )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
            0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2  from small
3  where object_id in ( select object_id from big )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
            0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> select count(subobject_name)
2  from small
3  where exists ( select null from big where small.object_id = big.object_id )
4 /

COUNT(SUBOBJECT_NAME)
---------------------
            0

1 row selected.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> alter session set sql_trace = false;

Session altered.

ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1>
ops$t_mp00@MRP9I1> set echo off
ops$t_mp00@MRP9I1> host tkprof e:\oracle\admin\mrp9i1\udump\mrp9i1_ora_1876.trc sys=n output=tkprof.txt explain=/

TKPROF: Release 9.2.0.2.1 - Production on Fri Aug 29 14:07:57 2003

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

ops$t_mp00@MRP9I1>

tkprof.txt
==========


select count(subobject_name)
from big
where object_id in ( select object_id from small )

call  count    cpu  elapsed    disk    query  current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse    1    0.02    0.01      0      0      0      0
Execute    1    0.00    0.00      0      0      0      0
Fetch    2    7.11    7.45      0    3424      0      1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    4    7.13    7.47      0    3424      0      1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows  Row Source Operation
------- ---------------------------------------------------
    1 SORT AGGREGATE (cr=3424 r=0 w=0 time=7457761 us)
  792  NESTED LOOPS SEMI (cr=3424 r=0 w=0 time=7454167 us)
250536  TABLE ACCESS FULL BIG (cr=3422 r=0 w=0 time=1579621 us)
  792  INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2667651 us)(object id 40890)


Rows  Execution Plan
------- ---------------------------------------------------
    0 SELECT STATEMENT  GOAL: CHOOSE
    1  SORT (AGGREGATE)
  792  NESTED LOOPS (SEMI)
250536  TABLE ACCESS  GOAL: ANALYZED (FULL) OF 'BIG'
  792  INDEX  GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
          (NON-UNIQUE)

********************************************************************************

select count(subobject_name)
from big
where exists ( select null from small where small.object_id = big.object_id )

call  count    cpu  elapsed    disk    query  current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse    1    0.01    0.00      0      0      0      0
Execute    1    0.00    0.00      0      0      0      0
Fetch    2    6.99    6.99      0    3421      0      1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    4    7.00    7.00      0    3421      0      1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 102 (OPS$T_MP00)

Rows  Row Source Operation
------- ---------------------------------------------------
    1 SORT AGGREGATE (cr=3421 r=0 w=0 time=6995659 us)
  792  NESTED LOOPS SEMI (cr=3421 r=0 w=0 time=6992099 us)
250536  TABLE ACCESS FULL BIG (cr=3419 r=0 w=0 time=1473069 us)
  792  INDEX RANGE SCAN SMALL_IDX (cr=2 r=0 w=0 time=2557279 us)(object id 40890)


Rows  Execution Plan
------- ---------------------------------------------------
    0 SELECT STATEMENT  GOAL: CHOOSE
    1  SORT (AGGREGATE)
  792  NESTED LOOPS (SEMI)
250536  TABLE ACCESS  GOAL: ANALYZED (FULL) OF 'BIG'
  792  INDEX  GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
          (NON-UNIQUE)

********************************************************************************

select count(subobject_name)
from small
where object_id in ( select object_id from big )


Followup   August 29, 2003 - 1pm Central time zone:

yes, the cbo is much much smarter then the rbo ever was, i should have caveated that.  using the 
rbo -- very different.  using the cbo -- all fast, you need not be as concerned about small things 
like this as much 

5 stars Query Result   September 3, 2003 - 11pm Central time zone
Reviewer: AVS from India
Will the two queries metioned below fetch me the same result ?


Query : 1     SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM poc1  ord,
           fcih  us,
           polc1  ord_l2,
           fmv  mv1
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND EXISTS (SELECT 'x'
                     FROM polc1  ord_l,
                          fmv  mv2
                    WHERE ord_l.input_cont_type     IN  ('O', 'R', 'S1')
                AND ord_l.order_number = ord.order_number
                      AND ord_l.data_date    = ord.data_date
                      AND ord_l.sub_zone     = ord.sub_zone
                      AND ord_l.product_id = mv2.column_value
                      AND ord_l.action_id NOT IN ('C', 'R', 'S')
                      AND ord_l.status != 'X'
                      AND mv2.source_system = 'FCH'
                      AND (mv2.measure ='OSM00100' OR mv2.measure LIKE ('OSM0009%'))
                      AND mv2.column_name = 'PRODUCT_ID'
                      AND EXISTS (SELECT 'x'
                                    FROM fmv  mv3
                                   WHERE ord_l.sales_channel = mv3.column_value
                                     AND mv3.source_system = 'FCH'
                                     AND mv3.measure = mv2.measure
                                     AND mv3.column_name = 'SALES_CHANNEL'))
  GROUP BY person_id, measure, mgr_ouc;



Query 2 :    SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM poc1  ord,
           fcih  us,
           polc1  ord_l2,
           fmv  mv1
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND EXISTS (SELECT 'x'
                   FROM fmv  mv3
                   WHERE ord_l.sales_channel = mv3.column_value
                   AND mv3.source_system = 'FCH'
                   AND mv3.measure = mv1.measure
                   AND mv3.column_name = 'SALES_CHANNEL')
  GROUP BY person_id, measure, mgr_ouc;

 


Followup   September 4, 2003 - 8am Central time zone:

i don't know, hows about you tell us what is different between them.  

they are pretty big - point out the salient changes and why you made them. 

3 stars Query Tuning (Rewriting).   September 5, 2003 - 9am Central time zone
Reviewer: AVS from India
The Query 1 looks like 

SELECT XXXXXX FROM A,B,C,D       /* PART 1 of QUERY 1 */ 
WHERE CONDITION1
AND   CONDITION2
AND   CONDITION3
AND   CONDITION4
AND   CONDITION5
AND   CONDITION6
AND   CONDITION7
AND   CONDITION8
AND   CONDITION9
AND   CONDITION10
AND   CONDITION11
AND   CONDITION12
AND   CONDITION13
AND   CONDITION14
AND   CONDITION15
AND   EXISTS 
       (SELECT x FROM C,D        /* PART 2 of QUERY 1 */ 
        WHERE CONDITION1
        AND   CONDITION2
        AND   CONDITION3
        AND   CONDITION4
        AND   CONDITION5
        AND   CONDITION6
        AND   CONDITION7
        AND   CONDITION8
        AND   CONDITION9
        AND EXIST 
           (SELECT 'x' FROM C,D /* PART 3 of QUERY 1 */ 
            WHERE CONDITION1
            AND   CONDITION2
            AND   CONDITION3
            AND   CONDITION4 )

I found that the PART 2's (FIRST EXISTS) WHERE conditions is repeatation of PART1's condtions. 
Hence I want to eliminate it, which I've done in QUERY 2.

The PART1 of Query1 without EXISTS fetch me reults in 5 minutes but with SINGLE EXISTS (QUERY 2) it 
runs for 4 hours . 

So I wrote another yet query by creating a new table, NEWTAB


/*   CREATE TABLE NEWTAB AS 
SELECT * FROM fch
WHERE SOURCE_SYSTEM = 'FCH' 
and   COLUMN_NAME = 'SALES_COLUMN'  */

Query 3 :

    SELECT /*+ index(ord_l2) */
       MAX(ord.data_date) data_date,
           us.person_id,
           MAX(us.level_1) ouc,
           MAX(ord.taker_user_id) css_user_id,
           MAX(ord.week_number) week_no,
           SUM(DECODE(ord.create_type,'M',
            DECODE(mv1.measure,'OSM00091',0,'OSM00096',0,ord_l2.quantity),
            ord_l2.quantity)) total_quantity,
       SUM(ord_l2.SPECIAL_PAYMENT_CHARGE) * 100 Money_value,
           mv1.measure measure,
       us.mgr_ouc
      FROM PSTN_ORDER_C_1@qdwp ord,
           fch_user_id_hier@qdwp us,
           PSTN_ORDER_LINE_C_1@qdwp ord_l2,
           fch_measure_values@qdwp mv1, NEWTAB mv3
     WHERE ord.taker_user_id = us.css_user_id
       AND ord.create_type IN ('P', 'G', 'M')
       AND ord.data_date = report_date
       AND ord.sub_zone LIKE report_sub_zone
       AND ord.order_status         != 'OL'
       AND ord_l2.input_cont_type     IN  ('O', 'R', 'S1')
       AND ord_l2.order_number = ord.order_number
       AND ord_l2.data_date    = ord.data_date
       AND ord_l2.sub_zone     = ord.sub_zone
       AND ord_l2.product_id = mv1.column_value
       AND ord_l2.action_id NOT IN ('C', 'R', 'S')
       AND ord_l2.status != 'X'
       AND mv1.source_system = 'FCH'
       AND (mv1.measure = 'OSM00100' OR mv1.measure LIKE ('OSM0009%'))
       AND mv1.column_name = 'PRODUCT_ID'
       AND ord_l2.sales_channel = mv3.column_value
       AND mv1.measure = mv3.measure 
  GROUP BY person_id, mv1.measure, mgr_ouc;


This query (QUERY 3) also takes 4 hours. 

Hope you are now clear, what I want to achieve !
Please Help. 


Followup   September 5, 2003 - 4pm Central time zone:

what do you want to achieve?  

help with what?  

if this is "please tune my query" -- not here, this is for followups on the original question. 

1 stars Query Rewriting   September 8, 2003 - 8am Central time zone
Reviewer: AVS from India
Yes you are right I was trying to tune the query, for which I required your help. Since it 
consisted of removing of EXIST clause from one of the query I thought its continuation of the 
original subject(IN & EXISTS). If it was annoying I'm sorry.
 


4 stars Query Execution takes forever using IN operator   November 7, 2003 - 2pm Central time zone
Reviewer: TS from USA
Tom,

I 've a query like this:-

select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', '5089', '5061', '4753', 
'3646', '3357', '3047', '8059','4124', '17591','14700', '60973', '34987', '37586', '58985', 
'34874', '49161', '11489');

the name column which has an intermedia index  is being used during query execution but the index 
on ID is not being used.
The query takes a long time to execute. I tried using
RBO, but still was slow. Eventually I tried deleting the 
statistics using "analyze table1 delete statistics" and the
query was very fast. So,do I need to delete the statistics
to make it run faster(my only concern is this query might run faster after stats delete but some 
other query hitting the same table might become slower),or is there a way to rewrite the query to 
make it run faster?

Thanks for your help.
TS 


Followup   November 7, 2003 - 5pm Central time zone:

well, a query with contains is going to use CBO no matter what (so, you never used RBO, cannot be 
done!)

Hows about you show us

a) tkprof AND autotrace traceonly explain of the query with stats
b) tkprof AND autorace traceonly explain of the query without stats
 

4 stars Followup: Query Execution takes forever using IN operator   November 12, 2003 - 11am Central time zone
Reviewer: TS from USA
Tom,

Here 's the tkprof output running the query,

Without Stats:-
==============
select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
      ytes=88)

   2    1     DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)

from tkprof output:-
===================
EXPLAIN PLAN SET STATEMENT_ID='PLUS1583908' FOR select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0         92          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0         92          0           0

With Stats:-
==========
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=88)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=1 Card=1 B
      ytes=88)

   2    1     DOMAIN INDEX OF 'NAME_TXT_IND' (Cost=1 Card=1)




Statistics
----------------------------------------------------------
       1313  recursive calls
      1  db block gets
       1715  consistent gets
    205  physical reads
       9568  redo size
       6729  bytes sent via SQL*Net to client
       4967  bytes received via SQL*Net from client
     67  SQL*Net roundtrips to/from client
     16  sorts (memory)
      0  sorts (disk)
      0  rows processed

tkprof output:-
=============
select id from table1 
where ( contains( name, ' $?AB and $?CD and $?EF ' ) > 0)
and id in ('1234','4567','97343','0934',4576','6376''5017', '3930', '5099', 
'5089', '5061', '4753', '3646', '3357', '3047', '8059','4124', '17591','14700', 
'60973', '34987', '37586', '58985', '34874', '49161', '11489');


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0         33          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0         33          0           0


Thanks,
TS 


Followup   November 12, 2003 - 4pm Central time zone:

i want the RAW tkprof with timings -- no explain plans (they can lie).  show me the real tkprof, 
making sure of course that timed statistics was ON 

5 stars Thanks for the insight   December 30, 2003 - 4pm Central time zone
Reviewer: dharma from CA USA
HI Tom,

I have a table like this(A fictitious though)
company_id    deptno    empno    status
1        10    1    A
1        10    2    A
1        10    3    P
1        20    4    A
1        20    5    A
1        30    6    A
1        30    7    P
2        10    8    A
2        10    9    A
2        20    10    P
2        30    11    A

Primary Key on company_id,deptno,empno

To get the depts that dont have employees with status=P.

Is there anyother way other than using not exists to bring the required rows

SELECT DISTINCT deptno FROM COMP_EMP ce
 WHERE NOT EXISTS (SELECT 1 FROM COMP_EMP ces
                    WHERE ce.company_id=ces.company_id
                      AND ce.deptno=ces.deptno
                      AND ces.status='P')
since it does a sub-query on the table comp_emp and comp_emp has a million rows, will "not exists" 
withstand the query.

BTW
Database is Oracle9i Enterprise Edition Release 9.2.0.4.0 


Followup   December 30, 2003 - 4pm Central time zone:

select deptno, count( decode(status,'P',1) )
  from comp_emp
 group by deptno
having count( decode( status,'P',1 ) ) = 0;

will do it as well.  there are dozens of ways....


select deptno from comp_emp
minus
select deptno from comp_emp where status = 'P';

is another.... 

4 stars ORA-01415 too many distinct aggregate functions..   January 2, 2004 - 6am Central time zone
Reviewer: Naveen from India
Hi Tom,

I have a select statement which has a IN clause like below:

Select <column list>
from <tablename>
where column1 in(1,2,3,..)

When i am testing it by giving more and more values in the IN clause it gave the above error. Why 
is there a limit for values that can be set in a IN clause. Is the above approach recommended. Is 
there a way to overcome this. Kindly throw some light on the issue.

Thank you
Nav. 


Followup   January 2, 2004 - 9am Central time zone:

there is a documented limited on the number of OR's and inlist items.

use a subquery (you probably aren't using binds either are you!).

create global temporary table t ( x int ) on commit delete rows;

and then INSERT the values into t and query simply


select * from table where c in ( select * from t );

 

5 stars Thanks Tom   January 2, 2004 - 11am Central time zone
Reviewer: Naveen from India
Hi Tom,

Yes you are correct. I am not using bind variables either. I am checking the limit of IN clause as 
i have to decide which approach i need to take. I'll keep this in mind and thanks again for 
immediate reply.

Thank you
Nav.  


Followup   January 2, 2004 - 1pm Central time zone:

not using bind variables will be the largest mistake of your entire career!  rethink that. 

5 stars interesting ?   February 3, 2004 - 6am Central time zone
Reviewer: Fenng from CNOUG China
SQL> desc a
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X                                                  NUMBER(38)
 Y                                                  VARCHAR2(1)

SQL> desc b
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 X1                                                 NUMBER(38)
 Y1                                                 VARCHAR2(1)

SQL> select x from a
  2   where x not in (select x from b);

no rows selected


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'A'
   3    1     FILTER
   4    3       TABLE ACCESS (FULL) OF 'B'





SQL> 

It's OK.But "x" is NOT table B's field.Would you like explain it for us ? Thanks a lot ! 


Followup   February 3, 2004 - 7am Central time zone:

it is expected, done by design, part of the standard.

it is simply called a correlated subquery.


consider:

select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno )
                                                       ^^^^^^^^^^^

you expect that to work right?  well:

SQL> select x from a
  2   where x not in (select x from b);

is not any different than:

SQL> select x from a
  2   where x not in (select A.X from b); 

5 stars Thank you!   February 3, 2004 - 9pm Central time zone
Reviewer: Fenng 
Thank you very much~! 


5 stars How to optimize this query?   April 14, 2004 - 9am Central time zone
Reviewer: Tony from India, Chennai
Tom,
 I want the rows from table cust_trans that are not in both cust_mast and customer_master@db_link1 
tables. The query is
    
SELECT a.cust_id, a.cust_name, a.trans_date 
FROM cust_trans  SUBPARTITION(sp001) a
WHERE( a.cust_id 
             NOT IN(SELECT b.cust_id 
                    FROM cust_mast SUBPARTITION(sp001) b )
        AND a.cust_id 
             NOT IN(SELECT c.cust_id 
                    FROM customer_master@db_link1 c )
      );

The table customer_master is accessed through db link and it has millions of records. Other tables 
also have millions of records. The subpartiton name is dynamically passed to the stored procedure. 
Thus I'm going to use dynamic cursor or Execute immediate. Is there any way to optimize this query?
 


Followup   April 14, 2004 - 10am Central time zone:

got a query plan for us to look at? (set autotrace traceonly explain would be best) 

5 stars exists is NOT the same as join right?   April 28, 2004 - 4am Central time zone
Reviewer: A reader 
Hi

I always thought that an EXISTS can be converted to a JOIN until I find this query

select count(*)
FROM OM_WELCOME_PACK t
WHERE EXISTS (SELECT 1
                FROM OM_EMF e, LISTA_ORDENES l
               WHERE t.emf_ext_id=e.external_id
                 AND e.order_id=l.order_id
                 AND l.order_type_id=3)

  COUNT(*)
----------
        99

If I convert this to a join

select count(*)
FROM OM_WELCOME_PACK t, OM_EMF e, LISTA_ORDENES l
WHERE t.emf_ext_id=e.external_id
AND e.order_id=l.order_id

  COUNT(*)
----------
       163

I get duplicate rows, is this correct behaviour...? 


5 stars Jr. Oracle Developer   May 24, 2004 - 5pm Central time zone
Reviewer: Dave from California, USA
Hello Tom, 

I need to combine three select statement in script1/select1. 
Two columns (PCS, Speciality) of script1 should be base on the condition define in script 2 & 
script 3. 
Columns alias are "PCS" and "Specialty" 
PCS should be base on script two and Speciality should be base on script 3. 
Please have a look. 
Script 1: (Main Script) 

select a.ref_patient_fname|| ' , ' || a.ref_patient_lname "Patient Name", 
to_char( a.ref_patient_dob, 'mm/dd/yy' ) DOB, 
a.ref_patient_id1 "OVMC ID", 
d.site_name || ' , '|| b. clact_desc "PCS", 
d.site_name || ' , '|| b. clact_desc "Specialty", 
to_char(c.appt_date , 'HH:MI AM' ) Appointments 
from dhs_referrals a, 
dhs_clinical_activities b, 
dhs_appointments c, 
dhs_sites d 
where a.ref_clact_id_requested = b.clact_id and 
a.ref_clact_id_requesting = b.clact_id and 
c.appt_ref_id = a.ref_id and 
d.site_id = b.clact_site_id 

Script 2: 

select c.site_name || ' , '|| b. clact_desc "PCS" , a.ref_clact_id_requesting 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a 
where a.ref_clact_id_requesting = b.clact_id 

Script 3 
select c.site_name || ' , '|| b. clact_desc "Speciality" , a.ref_clact_id_requested 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a 
where a.ref_clact_id_requested = b.clact_id 

Any help will be appreciated. 

Dave
 


Followup   May 24, 2004 - 7pm Central time zone:

don't understand the question.  I see three queries but I've not any idea what you are trying to do 
with them really? 

5 stars Jr. Oracle Developer   May 24, 2004 - 7pm Central time zone
Reviewer: Dave from CA, USA
Sorry for inconvinince.

Actually, On my script1.
I have two columns "PCS" and "Speciality".
I need to call values for "PCS" and "Speciality" based on script 2 & script 3 respectively.

e.g:

select  c.site_name || ' , '|| b. clact_desc "PCS" 
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requesting = b.clact_id 

will give me the values for "PCS" in script1.

and

select  c.site_name || ' , '|| b. clact_desc "Speciality"  
from dhs_sites c, dhs_clinical_activities b, dhs_referrals a
where a.ref_clact_id_requested = b.clact_id

will give me values for "Speciality" in Script1.

Please note: I used "PCS" and "Speciality" as a column aliases in all scripts. 

cheers,
Dave

 


Followup   May 24, 2004 - 11pm Central time zone:

and what does:

I need to call values for "PCS" and "Speciality" based 

mean exactly?  I mean you have two spool files maybe, one full of column "X" and one full of column 
"Y".  Now what?  what is the logic to use that? 

5 stars Jr. Oracle Developer   May 25, 2004 - 12pm Central time zone
Reviewer: Dave from CA, USA
No I do not have any spool file etc.
Here what I wanted to do.


e.g:


I have main script which is consists of three columns.

select a.DOB, 
b.clact_desc||' , '||c.site_name "PCS", 
b.clact_desc||' , '||c.site_name "Speciality"
from dhs_referrals a, dhs_clinical b, dhs_site c
where .....

In dhs_referrals table I also have two more columns called clact_requested_id & 
clact_requesting_id.

Here is the logic I wanted to put for column 2 & 3 respectively.

where b.clact_id = a.clact_requesting_id 
gives me the value for PCS. (Column 2 in the main script above)

or

where b.clact_id = a.clact_requested_id gives me the value for Speacility. (Column 3 in the main 
script above)

I need your help to add above two logic in the main select script.














 


Followup   May 25, 2004 - 12pm Central time zone:

sorry -- i don't think we are going to get anywhere here.  Your terminology is just not anything 
I'm understanding.  I do not see the tie in between columns in a select and separate predicates.    

5 stars Join or sub-query   August 20, 2004 - 5am Central time zone
Reviewer: atul from India
Hi,

What is fast Join or subquery??
Under what conditions??

Thanks,
Atul 


Followup   August 20, 2004 - 11am Central time zone:

er?  does not compute. 

3 stars How to update a row if the row exist if not insert?   September 23, 2004 - 10pm Central time zone
Reviewer: Vinod from Banglore, India
Tom - I want to update a row if the row exist if not insert that row. Is there a simple way to do 
this? I was doing if exist then update if not insert but I would like to try it in one single DML 
if possible.

Vinod 


Followup   September 24, 2004 - 10am Central time zone:

in theory you could use merge

merge into t
into ( select :x x, :y y, :z z from dual ) d
on ( t.x = d.x )
when matched then update set ...
when not matched then insert....

but 

update t set y = :y, z = :z where x = :x;
if (sql%rowcount = 0)
then
   insert into t (...) values (....);
end if;

it about the same in this case.
 

3 stars What about "=" and "LIKE" comparison operators   October 11, 2004 - 3pm Central time zone
Reviewer: denni50 from na
Hi Tom....Welcome back!....(we MISSED you!)

I know the discussion here involves NOT,EXISTS,NOT IN and
NOT EXISTS operators....however I recently stumbled upon a
situation where using the "=" comparison caused the 
system to come to a grinding halt. When I changed the
operator to LIKE 'A%' instead of = 'A' the system performed
normally. 

We recently converted test system to CBO and encountered
this situation for the first time when testing one of
our third party software modules where users are allowed
to insert additional sql statements. With RBO users would
enter: where col_name='A' and everything would work fine. 

This was discovered when investigating why this function performed normally during the first phase, 
then came to a grinding slowdown during the second phase and saw where some users entered col_name 
LIKE 'A%' and others had entered col_name ='A'(this is when the system would virtually stop). When 
I changed all the ='A' to LIKE 'A%'everything ran smoothly.

Have you ever encountered performance issues between = and
LIKE?

glad you made it "home" safe and sound.



 
 


Followup   October 11, 2004 - 4pm Central time zone:

how do you gather statistics -- exact command.

when you do an autotrace traceonly explain on the two -- what do you see as differences?  is the 
CARD=xxxx part of the query even close to reality. 

3 stars OEM SQL ANALYZE   October 11, 2004 - 5pm Central time zone
Reviewer: denni50 from na
I ran a sql analyze through OEM on both statements
the cost was around 10450-14512 for both sql statements.
Can't post results due to proprietary constraints.

There's alot of nested loops and bitmap conversions
from rowids with the largest table(approx 8+ million
records) doing full table scans.

 


Followup   October 11, 2004 - 7pm Central time zone:

cost is not meaningful.

the CARD= parts -- are they even *close* to reality.

given you stats method of collection (no histograms -- just one bucket with method_opt at its 
default) the optimizer is using "distributions and best guesses".

My guess is, 'A' has lots and lots of values.

where column = 'A' is guessing "too low" (no histograms)

where column like 'A%' is guessing "higher, closer to reality"


 

3 stars Gather statistics   October 11, 2004 - 5pm Central time zone
Reviewer: denni50 from na
I use the dbms_stats package:
gather_schema_stats cascade>=true

 


3 stars thanks Tom   October 12, 2004 - 9am Central time zone
Reviewer: denni50 from na
I understand what you're saying about the histograms.
While you were away I experimented and ran statistics(dbms_stats) on the schema and used the 
method_opt SIZE AUTO on Indexed Columns parameter to gather histogram stats. Upon completion 
everything I tried to run, simple sql statements and queries took forever to run. I would log into 
the software application and everything seemed to just hang. I deleted all the histograms, re-ran 
table and index stats only and performance went back to normal processing
except for this latest incident with the = and LIKE operators.

I'm still trying to learn and understand how histograms
affect and work within CBO. The other frustrating part
is that I'm dealing with code that is not mine so tuning
it is out of the question. All I can do is run my own
analyze and trace reports...send it to the vendor and
HOPE someone figures out the problem. 

I am going to re-run the trace reports through sqlplus
instead of OEM since the OEM does not show the card=
(at least I didn't see it unless it is represented under
 a different header).

I know "cost" is not what is important. Cost is just an
arbitrary number assigned to a particular query at a
particular time in a given environment that the
optimizer chooses as the lowest cost.

I've learned quite a bit in 2+ years but still have a
heck of alot more to learn and understand as far as CBO
is concerned.






 


3 stars Explain Plan   October 15, 2004 - 3pm Central time zone
Reviewer: denni50 from na
Tom

attached is the explain plan for the scenario I've been describing in the earlier threads here 
about the '=' and
'LIKE' operators.

(had to modify the names of tables,columns and indexes
 since this is not my code.)


Table1 has approx 3.2 million records
Table2 is a temporary table(just for this function) 
Table3 has 8+ million records.

Just by looking at the explain plan can you tell what
is going on. This is a new procedure that's been added
to a newer version of software that is causing the functionality to take 3 times longer to 
complete.

thanks


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=15825 Card=468 Bytes
          =30888)

   1    0   NESTED LOOPS (Cost=15825 Card=468 Bytes=30888)
   2    1     HASH JOIN (Cost=15824 Card=468 Bytes=28080)
   3    2       NESTED LOOPS (Cost=2714 Card=15279 Bytes=320859)
   4    3         SORT (UNIQUE)
   5    4           TABLE ACCESS (BY INDEX ROWID) OF 'TABLE1' (Cost=
          2649 Card=15279 Bytes=229185)

   6    5             INDEX (RANGE SCAN) OF 'TABLE1_COL2_INDEX'
           (NON-UNIQUE) (Cost=439 Card=241176)

   7    3         INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQUE)
   8    2       VIEW (Cost=13084 Card=11085 Bytes=432315)
   9    8         FILTER
  10    9           SORT (GROUP BY) (Cost=13084 Card=11085 Bytes=31038
          0)

  11   10             NESTED LOOPS (Cost=10452 Card=230371 Bytes=64503
          88)

  12   11               TABLE ACCESS (FULL) OF 'TABLE3' (Cost=10450 C
          ard=230371 Bytes=5068162)

  13   11               INDEX (UNIQUE SCAN) OF 'TABLE2_ID_IDX' (UNIQ
          UE)

  14    1     INDEX (UNIQUE SCAN) OF 'PK_TABLE1_ID' (UNIQUE)











 


3 stars Crazy behaviour!!   December 9, 2004 - 7am Central time zone
Reviewer: Ganesh from Bangalore, India
create table emp (empno number(3), dno number(3))
/
create table dept (dno number(3))
/
select * from emp
where empno in
(select empno from dept)
/

I'm wondering how the above query works when there is no empno field in dept table?   


Followup   December 9, 2004 - 2pm Central time zone:

that is not crazy, it is expected and called a correlated subquery.

select * 
  from emp 
 where emp.empno in ( select emp.empno
                        from dept );

makes more sense with the correlation names there.....


just like:


select * from dept
where exists ( select null from emp where emp.deptno = dept.deptno );

the columns of the outer query are available in the inner queries. 

2 stars why is it so?   December 10, 2004 - 1am Central time zone
Reviewer: Ganesh from Bangalore, India
Thanks Tom. I guessed that it would be correlated subquery.

By why does oracle apply its intelligence, when I dont mention explicitly emp.empno?

 


Followup   December 10, 2004 - 10am Central time zone:

confusion -- it did apply "its intelligence"

your query was syntactically correct
your query was valid SQL
your query must be run as is

It did exactly what is it documented, expected and supposed to do.  It is not a "bad query", it is 
not a "mistake".  the only mistake here was that you didn't mean to run THAT query.

This is like asking us to raise an error because you query "where 1=0" or something.    

4 stars delete performance issue using 'IN'   December 13, 2004 - 12pm Central time zone
Reviewer: A reader 
Hi Tom,

I have the following SQL:

DELETE 
FROM nitsmedia 
WHERE nitstory_srl not in
(
 SELECT srl 
 FROM nitstory 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional'
);

It takes forever! 
Number of rows for inner query: 10 rows
Number of rows for outer query: 15289
 
However, the following are fast:

(1) The inner query:
SELECT srl 
 FROM nitstory 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional';

       SRL
----------
     70640
     70670
     70671
     70673
     70674
     70681
     70682
     70688
     70689
     70692

10 rows selected.


(2)
DELETE FROM nitsmedia WHERE nitstory_srl IN 
(70640,70670,70671,70673,70674,70681,70682,70688,70689,70692)

(3) DELETE FROM nitsmedia;

I remember this rule: 
If the result of the subquery is small -- then IN is typicaly more appropriate. However, this is 
not working this time!

Can you shed any light on why it is like this and what I should do?

Here are the info:
Oracle 9.2 on HP-UX
All tabled have been analyzed

Thanks,
- Arash




 


Followup   December 13, 2004 - 2pm Central time zone:

are you using the CBO 
and is sr1 defined as NOT NULL


(and forget that rule of thumb, not really valid in CBO days... this dates way back to the time of 
the RBO... thinks change.) 

4 stars   December 13, 2004 - 2pm Central time zone
Reviewer: A reader 
Yes Tom. I use the default which is 'choose' as optimizer mode. Srl is not null and there is 
foreign key between srl and nitstory_srl

and here is the explain plan:
Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
          =1223040)

   1    0   DELETE OF 'NITSMEDIA'
   2    1     FILTER
   3    2       NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
          )

   4    3         TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
          89 Bytes=397514)

   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
          2 Card=1 Bytes=54)

   6    5           BITMAP CONVERSION (TO ROWIDS)
   7    6             BITMAP AND
   8    7               BITMAP CONVERSION (FROM ROWIDS)
   9    8                 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
  10    7               BITMAP CONVERSION (FROM ROWIDS)
  11   10                 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
          GIN' (NON-UNIQUE)



SQL> desc nitsmedia
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 NITSTORY_SRL                              NOT NULL NUMBER(12)
 MEDIAFILE_NME                                      VARCHAR2(100)
 MEDIA_TYP                                 NOT NULL VARCHAR2(20)
 THUMBNAIL                                          VARCHAR2(100)
 CAPTION                                            VARCHAR2(1000)
 POSITION                                           NUMBER
 LAYOUT                                             VARCHAR2(10)

SQL> desc nitstory
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SRL                                       NOT NULL NUMBER(12)
 STORYFILE_NME                             NOT NULL VARCHAR2(30)
 AUTHOR                                             VARCHAR2(42)
 STORY_ORIGIN                                       VARCHAR2(10)
 CATEGORY                                           VARCHAR2(30)
 HEADLINE                                           VARCHAR2(200)
 SUBHEADLINE                                        VARCHAR2(200)
 KEYWORDS                                           VARCHAR2(200)
 HISTORY                                            VARCHAR2(200)
 DATELINE                                           VARCHAR2(100)
 HIGHLIGHT                                          VARCHAR2(1200)
 CREATION_TIME                                      VARCHAR2(30)
 LAST_MODIFIED_TIME                                 VARCHAR2(30)
 TIMEZONE                                           VARCHAR2(25)
 FOOTNOTES                                          VARCHAR2(1000)
 BODY                                               CLOB


I am not sure why there lots of "BITMAP CONVERSION" in plan?

Thank you Tom,
- Arash'


 


Followup   December 13, 2004 - 2pm Central time zone:

can we see a tkprof of the "really slow one" 

4 stars   December 14, 2004 - 11am Central time zone
Reviewer: A reader 
Tom,

Here is the tkprof for the query. I canceled the connection in the middle as I told you it takes 
forever. I also noticed that, it locks both tables 'nitsmedia' and 'nitstory'.

Thanks again for your help and outstanding website.
- Arash

DELETE
FROM nitsmedia
WHERE nitstory_srl not in
(
 SELECT srl
 FROM nitstory
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_mo
dified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY H
H24:MI:SS')
   AND story_origin = 'Regional'
)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.11       0.11          0        323          0           0
Execute      1     51.59      52.22          0    2336957       2122           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     51.70      52.34          0    2337280       2122           0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 101  (NI_DDL)

Rows     Execution Plan
-------  ---------------------------------------------------
      0  DELETE STATEMENT   GOAL: CHOOSE
      0   DELETE OF 'NITSMEDIA'
      0    FILTER
      0     TABLE ACCESS (FULL) OF 'NITSMEDIA'
      0     TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY'
      0      INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORIGIN'
                 (NON-UNIQUE)




********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.11       0.11          0        323          0           0
Execute      1     51.59      52.22          0    2336957       2122           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2     51.70      52.34          0    2337280       2122           0

Misses in library cache during parse: 1


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       29      0.06       0.04          0        147          0           0
Execute     76      0.02       0.01          0          0          0           0
Fetch      287      0.00       0.01          0        323          0         223
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      392      0.08       0.07          0        470          0         223

Misses in library cache during parse: 20

    1  user  SQL statements in session.
   29  internal SQL statements in session.
   30  SQL statements in session.
    1  statement EXPLAINed in this session.
********************************************************************************
Trace file: /db/9.2/dbo01/log/udump/dbo01_ora_11063.trc
Trace file compatibility: 9.00.01
Sort options: default

       1  session in tracefile.
       1  user  SQL statements in trace file.
      29  internal SQL statements in trace file.
      30  SQL statements in trace file.
      21  unique SQL statements in trace file.
       1  SQL statements EXPLAINed using schema:
           SYSTEM.prof$plan_table
             Default table was used.
             Table was created.
             Table was dropped.
     604  lines in trace file.
 


Followup   December 15, 2004 - 12pm Central time zone:

sorry -- but a query won't lock any tables.


but something is fishy here.


if you run this script:

drop table emp;
create table emp as select * from scott.emp;
alter session set sql_trace=true;
select count(*) from emp;


then tkprof will say:



select count(*)
from
 emp
                                                                                                    
    
                                                                                                    
    ....

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     14    TABLE ACCESS (FULL) OF 'EMP'




but if you run this one:

drop table emp;
create table emp as select * from scott.emp;
exec dbms_stats.gather_table_stats( user, 'EMP' );
alter session set sql_trace=true;
select count(*) from emp;

tkprof says

select count(*)
from
 emp
                                                                                                    
    
....
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     14    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'EMP'



But before you showed us:

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=232 Card=15288 Bytes
          =1223040)

   1    0   DELETE OF 'NITSMEDIA'
   2    1     FILTER
   3    2       NESTED LOOPS (ANTI) (Cost=232 Card=15288 Bytes=1223040
          )

   4    3         TABLE ACCESS (FULL) OF 'NITSMEDIA' (Cost=25 Card=152
          89 Bytes=397514)

   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'NITSTORY' (Cost=23
          2 Card=1 Bytes=54)

   6    5           BITMAP CONVERSION (TO ROWIDS)
   7    6             BITMAP AND
   8    7               BITMAP CONVERSION (FROM ROWIDS)
   9    8                 INDEX (RANGE SCAN) OF 'PK_NITSTORY' (UNIQUE)
  10    7               BITMAP CONVERSION (FROM ROWIDS)
  11   10                 INDEX (RANGE SCAN) OF 'I2_NITSTORY_STORY_ORI
          GIN' (NON-UNIQUE)


which shows the use of the CBO (the cost= card= is there ONLY when the CBO is used)

so, I'm thinking you have a big mixture of ANALYZED and UNANALYZED tables here the and the 
optimizer doesn't stand a ghost's of a chance of getting the right answer.

EG: table you are deleting from is analyzed....

These tables you are querying are not (and must be -- if you want NOT IN to run "in your 
lifetime"



 

4 stars   December 15, 2004 - 2pm Central time zone
Reviewer: A reader 
Tom,

All Tables are analyzed, however the problem still there.

ANALYZE TABLE NITSMEDIA COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;

ANALYZE TABLE NITSTORY COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES ;

I noticed something that may help.

I created two new tables like:
create table NITSTORY1 as select * from NITSTORY;
create table NITSMEDIA as select * from NITMEDIA;

and then ran the query with new table names:
DELETE 
FROM nitsmedia1 
WHERE nitstory_srl not in
(
 SELECT srl 
 FROM nitstory1 
 WHERE lower(category) = 'stjohns'
   AND to_date('MAY-01-2003' || ' 00:00:00', 'MM-DD-YYYY HH24:MI:SS') <= last_modified_time
   AND last_modified_time <= to_date('MAY-01-2003' ||  '23:59:59', 'MM-DD-YYYY HH24:MI:SS')
   AND story_origin = 'Regional'
)

and it was very FAST. The only difference that I saw between the old tables and new ones was a 
foreign key between the old tables that new tables didn't have that. So I dropped the foreign key 
from old tables to make it in sync with the new tables. But still the delete is very slow on old 
tables.

Please let me know your idea.

Thanks,
- Arash
 


Followup   December 15, 2004 - 6pm Central time zone:

don't use analyze.

use dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );

so, show us the tkprofs between the two.  (i've a feeling that not null column isn't really not 
null -- eg: maybe it says not null but someone put it on there with "novalidate" or something) 

4 stars   December 16, 2004 - 9am Central time zone
Reviewer: A reader 
Hi Tom,

can you tell me why, when I recreate tables (create table NITSTORY1 as select * from NITSTORY....), 
it works well?

I also  test combination of new tables and old tables in the delete statement, and noticed , the 
problem is the inner table "NITSTORY' which makes the query slow. And when I replace this table 
with the new table 'NITSTORY1' , it works fine.

Here are the tables:

create table NITSTORY  (
   SRL                  NUMBER(12)                      not null,
   STORYFILE_NME        VARCHAR2(30)                    not null,
   AUTHOR               VARCHAR2(42),
   STORY_ORIGIN         VARCHAR2(10),
   CATEGORY             VARCHAR2(30),
   HEADLINE             VARCHAR2(200),
   SUBHEADLINE          VARCHAR2(200),
   KEYWORDS             VARCHAR2(200),
   HISTORY              VARCHAR2(200),
   DATELINE             VARCHAR2(100),
   HIGHLIGHT            VARCHAR2(1200),
   CREATION_TIME        VARCHAR2(30),
   LAST_MODIFIED_TIME   VARCHAR2(30),
   TIMEZONE             VARCHAR2(25),
   FOOTNOTES            VARCHAR2(1000),
   BODY                 CLOB,
   constraint PK_NITSTORY primary key (SRL)
);

create table NITSMEDIA  (
   SRL                  NUMBER(12)                      not null,
   NITSTORY_SRL         NUMBER(12)                      not null,
   MEDIAFILE_NME        VARCHAR2(100),
   MEDIA_TYP            VARCHAR2(20)                    not null,
   THUMBNAIL            VARCHAR2(100),
   CAPTION              VARCHAR2(1000),
   POSITION             NUMBER,
   LAYOUT               VARCHAR2(10),
   constraint PK_NITSMEDIA primary key (SRL)
);

alter table NITSMEDIA
   add constraint FK1_NITSTORY_NITSMEDIA foreign key (NITSTORY_SRL)
      references NITSTORY (SRL);


As always, thank you for your time,
- Arash
 


Followup   December 16, 2004 - 10am Central time zone:

not without the tkprofs I asked for, no. 

4 stars   December 16, 2004 - 11am Central time zone
Reviewer: A reader 
Tom. 

I used: 
dbms_stats.gather_table_stats( user, 'table_name', cascade=>true );

and it fixed the problem. Why is that? Can you elaborate it a little more? 

I found that Analyze method from your site. But it seems it doesnÂ’t work well for 9.2!

Thank you again,
- Arash
 


Followup   December 16, 2004 - 11am Central time zone:

you gathered table and index stats but no column stats -- that gathers table, column and index 
stats.

analyze table t compute statistics;

would do roughly the same thing, but dbms_stats is more proper. 

3 stars IN or EXISTS - back on topic !   December 16, 2004 - 3pm Central time zone
Reviewer: DaPi from Geneva - CH
Hi Tom, 

Many thanks for the site.  I much appreciate your scientific/analytic approach.

This thread keeps popping to the top even after 4 years.  All you say is correct, but there is a 
problem:  I believe the question is not the right one to ask.

Real-life:  
I have something to code.  I do so to the best of my ability, keeping in mind clarity & 
maintainability.  I test to make sure I have the right result and, if the performance is 
appropriate, I give thanks and go on to the next task.  If it's slow (for the purpose), I quickly 
try some tricks I have up my sleeve and, if still slow, begin an analysis of what is going "wrong". 
 I can't afford the time to analyse - not ANALYZE :) - every query.

I think the right question to ask is:
Q:  I can code my query with IN or with EXISTS; which should I try first?

In 5 years of Rdb and 5 years of Oracle, I developed the habit of coding EXISTS.  In those 5 Oracle 
years I have recoded EXISTS as IN only twice (I'm pretty sure of this because it's been so rare).  
A colleague with the IN habit, would give me a slow query to tune every week or two; I'd just 
recode the IN as EXISTS.

OK, I have no statistics, but in my real-life experience the answer is:  
A:  Try EXISTS first.  You may have to recode, but not very often.

Waiting to be blasted - DaPi
 


Followup   December 16, 2004 - 3pm Central time zone:

A: use the cbo and it'll try out both.


No blasting, I use exists so infrequently and in almost all of the time myself. 

4 stars IN & EXISTS - CBO tries both?   December 16, 2004 - 4pm Central time zone
Reviewer: DaPi from Geneva - CH
You said: "A: use the cbo and it'll try out both."

I've been running CBO (7.3.4 then 8.0.6 then 8.1.7) with up-to-date stats all those 5 years - it 
must miss a trick from time to time, because recoding HAS made a difference in some cases.

I'd agree that in many cases there is NO performance difference between IN and EXISTS (when the CBO 
gets it right?).  But if there is going to be a difference, I'd put my money on EXISTS.

Thanks - DaPi 


Followup   December 16, 2004 - 5pm Central time zone:

select * 
  from big_table
 where exists ( select null from small_table where small_table.x = big_table.x)

vs

select * from big_table
where x in ( select x from small_table )


give me the IN any time (in rbo days) 

4 stars IN vs EXISTS   December 17, 2004 - 3am Central time zone
Reviewer: DaPi from Geneva - CH
OK - I'll buy that (confirmed by experiment under CBO).

I guess it's that my real-life experience tends not to deal with small tables . . . . and with big 
tables it hurts even more when you guess wrong! 


Followup   December 17, 2004 - 7am Central time zone:

doesn't even have to be small tables, big tables that return smallish results


select * from big_table
where x in ( select x from really_huge_table where <some predicate>)

if some predicate returned just 100 rows from really huge table, IN would have been preferred :) 

3 stars   December 26, 2004 - 2am Central time zone
Reviewer: Logan 
   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop

-----------------------------------------------------------

The explanation you gave for IN & EXISTS being better and in which situation is convincing, but the 
example loop you gave...

1.Well when you use exists the SQL always returns only one row, is that true i.e. when you used 
exists to find whether a conditions is true or false. For examples 


select decode( count(*), 1, 'y', 'n')
from dual where exists ( select null from t1 , t2 where 
                         where t1.x = t2.x)

I guess you cannot do such a sql using IN?

2. My question is doesnt EXISTS as the name of the clause itself suggests checks for the existense 
of a condition. Lets say if 10 rows meet that condition, does the exists clause read all the 10 
rows, or it returns the result after meeting the condition for the first time. 


Followup   December 26, 2004 - 12pm Central time zone:

I was giving psuedo code here to explain the concept.  


exists and in can both short circut since the CBO will turn in into exists and exists into in as 
needed these days.  so, it is safe to say that either can stop after the first one -- or get them 
all and semijoin. 

2 stars   December 26, 2004 - 9pm Central time zone
Reviewer: Logan 
--------------
it is safe to say that either can stop 
after the first one -- or get them all and semijoin
-----------

1.Tom , can you please illustrate the same. Can you give an example where an EXISTS stops after the 
first row, and where it gets all the rows.

2.What do you mean by a semi join? 


Followup   December 27, 2004 - 9am Central time zone:

it is that:

select * from t1 where EXISTS ( select NULL from t2 where t2.id = t1.id )

can be written as:

select * from t1 where id in ( select id from t2 );


and the optimizer is free to (re)write it either way.


consider:

ops$tkyte@ORA9IR2> create table t1( id int, y char(2000));
Table created.
 
ops$tkyte@ORA9IR2> create table t2( id int );
Table created.
 
ops$tkyte@ORA9IR2> delete from plan_table;
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
--------------------
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |       |
|*  1 |  FILTER              |             |       |       |       |
|   2 |   TABLE ACCESS FULL  | T1          |       |       |       |
|*  3 |   TABLE ACCESS FULL  | T2          |       |       |       |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter( EXISTS (SELECT 0 FROM "T2" "T2" WHERE "T2"."ID"=:B1))
   3 - filter("T2"."ID"=:B1)
 
Note: rule based optimization
 
17 rows selected.

that shows Oracle doing the "where exists" as a filter -- for every row in T1 it'll run that 
filter (which results in a full scan of T2 -- but stops after it hits the first "t2.id = :b1" -- so 
it'll not totally full scan the table each time PERHAPS, just scans till it gets a hit)

But, lets use the more intelligent CBO and tell it about our tables (tell it "t1" is medium size, 
"t2" is big)
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 5000, numblks=>500);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 50000, numblks=>5000);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA9IR2> explain plan for
  2  select * from t1 where exists ( select null from t2 where t2.id = t1.id );
 
Explained.
 
ops$tkyte@ORA9IR2> select * from table( dbms_xplan.display );
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------
 
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  5000 |   551K|   854 |
|*  1 |  HASH JOIN SEMI      |             |  5000 |   551K|   854 |
|   2 |   TABLE ACCESS FULL  | T1          |  5000 |   488K|    78 |
|   3 |   TABLE ACCESS FULL  | T2          | 50000 |   634K|   760 |
--------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("T2"."ID"="T1"."ID")
 
Note: cpu costing is off
 
16 rows selected.
 
ops$tkyte@ORA9IR2>

Now, it is doing a single full scan of T1 and of T2 (instead of a full scan of T2 for EVERY ROW 
IN T1) followed by that "semi join" thingy (which I wrote about in Effective Oracle by Design in 
detail)


The semi join is a join that joins T1 with T2 but only ONCE (if id=55 is in T2 5,000 times -- we 
don't join the row in T1 with id=55 to each on -- only to the first).  A semi join is a "where 
exists" style join, just join to the first occurrence

But - we built the entire set.  We read all of T2, got all of the rows and are joining (semi 
joining) to them
 

5 stars how to use index in exists statement?   March 25, 2005 - 12am Central time zone
Reviewer: A reader 
Tom,
I have the delete on the large table pa_budget_lines.
I want to use the index on resource_assignment_id on the table pa_budget_lines. It seams 
impossible?
thanks a lot!


DELETE FROM pa_budget_lines bl
where EXISTS (SELECT  'X'
      FROM   pa_resource_assignments ra,
             pa_budget_versions bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bl.resource_assignment_id = ra.resource_assignment_id
          AND  bv.budget_version_id = ra.budget_version_id)
/ 


Followup   March 25, 2005 - 7am Central time zone:

are you using the cbo. 

4 stars   March 25, 2005 - 9pm Central time zone
Reviewer: A reader 
"
are you using the cbo. 
"

Yes, SIr. it the cost based opt on 8174. 


Followup   March 26, 2005 - 9am Central time zone:

then lets see the autotrace traceonly explain output.  perhaps the index would not be useful. 

5 stars   March 26, 2005 - 11am Central time zone
Reviewer: A reader 
the autotrace as the follow.
there 2 millions rows in pa_budget_lines , 164k in pa_resource_assignments and 13k in 
pa_budget_versions .

APPS@S0P2DBT3> DELETE FROM pa_budget_lines bl
  2  where EXISTS (SELECT  'X'
  3        FROM   pa_resource_assignments ra,
  4            pa_budget_versions bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bl.resource_assignment_id = ra.resource_assignment_id
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /

432 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=11837 Card=102454 Bytes=2868712)

   1    0   DELETE OF 'PA_BUDGET_LINES'
   2    1     FILTER
   3    2       TABLE ACCESS (FULL) OF 'PA_BUDGET_LINES' (Cost=11837 Card=102454 Bytes=2868712)

   4    2       NESTED LOOPS (Cost=4 Card=1 Bytes=24)
   5    4         TABLE ACCESS (BY INDEX ROWID) OF 'PA_RESOURCE_ASSIGNMENTS' (Cost=3 Card=1 
Bytes=10)

   6    5           INDEX (UNIQUE SCAN) OF 'PA_RESOURCE_ASSIGNMENTS_U1' (UNIQUE) (Cost=2 Card=2)

   7    4         TABLE ACCESS (BY INDEX ROWID) OF 'PA_BUDGET_VERSIONS' (Cost=1 Card=1085 
Bytes=15190)

   8    7           INDEX (UNIQUE SCAN) OF 'PA_BUDGET_VERSIONS_U1' (UNIQUE)





Statistics
----------------------------------------------------------
       1329  recursive calls
       8377  db block gets
    1314941  consistent gets
      84134  physical reads
     586996  redo size
        515  bytes sent via SQL*Net to client
        939  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        432  rows processed

APPS@S0P2DBT3> 
APPS@S0P2DBT3> rollback
  2  /

Rollback complete.
 


Followup   March 26, 2005 - 12pm Central time zone:

just explain this query (you don't need run it, just explain plan it)

ops$tkyte@ORA9IR2> DELETE FROM t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   t2 ra,
  5               t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /
 
 

4 stars   March 26, 2005 - 1pm Central time zone
Reviewer: A reader 
apps@s0p2dbt3> explain plan for 
  2  DELETE FROM pa_budget_lines bl
  3    where resource_assignment_id in
  4    (SELECT ra.resource_assignment_id
  5          FROM   pa_resource_assignments ra,
  6                 pa_budget_versions bv
  7            WHERE  bv.current_flag <> 'Y'
  8              AND  bv.budget_status_code = 'B'
  9              AND  bv.baselined_date IS NOT NULL
 10              AND  bv.budget_version_id = ra.budget_version_id)
 11  /

Explained.

apps@s0p2dbt3> 
apps@s0p2dbt3> @plan

QUERY_PLAN
------------------------------------------------------------
  DELETE  PA_BUDGET_LINES
    HASH JOIN
      VIEW  VW_NSO_1
        SORT UNIQUE
          HASH JOIN
            TABLE ACCESS FULL PA_BUDGET_VERSIONS
            TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
      TABLE ACCESS FULL PA_BUDGET_LINES
 


Followup   March 26, 2005 - 1pm Central time zone:

are you sure the indices are in place?  what happens with the following scripts?

(safe if you do not have t1, t2, t3)


drop table t1;
drop table t2;
drop table t3;
                                                                                                    
                     
create table t1 ( resource_assignment_id int, data char(80) );
create table t2 ( resource_assignment_id int, budget_version_id int, data char(80) );
create table t3 ( current_flag varchar2(1), budget_status_code varchar2(10),
                  baselined_date date, budget_version_id int, data char(80) );
                                                                                                    
                     
exec dbms_stats.set_table_stats( user, 'T1', numrows=> 2000000, numblks => 12000 );
exec dbms_stats.set_table_stats( user, 'T2', numrows=> 164000, numblks => 1000 );
exec dbms_stats.set_table_stats( user, 'T3', numrows=> 13000, numblks => 500 );
                                                                                                    
                     
create index t1_idx on t1(resource_assignment_id);
create UNIQUE index t2_idx on t2(resource_assignment_id);
create UNIQUE index t3_idx on t3(budget_version_id);
                                                                                                    
                     
exec dbms_stats.set_index_stats( user, 'T1_IDX', numrows=> 2000000, numlblks => 1200 );
exec dbms_stats.set_index_stats( user, 'T2_IDX', numrows=> 164000, numlblks => 100 );
exec dbms_stats.set_index_stats( user, 'T3_IDX', numrows=> 13000, numlblks => 50 );
                                                                                                    
                     
set autotrace on explain
DELETE FROM t1 bl
where EXISTS (SELECT  'X'
      FROM   t2 ra,
             t3 bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bl.resource_assignment_id = ra.resource_assignment_id
          AND  bv.budget_version_id = ra.budget_version_id)
/
DELETE FROM t1 bl
where resource_assignment_id in
(SELECT ra.resource_assignment_id
      FROM   t2 ra,
             t3 bv
        WHERE  bv.current_flag <> 'Y'
          AND  bv.budget_status_code = 'B'
          AND  bv.baselined_date IS NOT NULL
          AND  bv.budget_version_id = ra.budget_version_id)
/
set autotrace off
 

4 stars   March 26, 2005 - 3pm Central time zone
Reviewer: A reader 
apps@s0p2dbt3> create table pa.t2 as select * from pa.pa_resource_assignments;

Table created.

apps@s0p2dbt3>  create table pa.t3 as select * from pa.pa_budget_versions ;

Table created.

apps@s0p2dbt3> create table pa.t1 as select * from pa.pa_budget_lines;

Table created.

apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T1', numrows=> 2000000, numblks => 1
2000 );

PL/SQL procedure successfully completed.
apps@s0p2dbt3> exec dbms_stats.set_table_stats( 'PA', 'T3', numrows=> 13000, numblks => 500
 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3>   exec dbms_stats.set_table_stats( 'PA', 'T2', numrows=> 164000, numblks => 
1000 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3>create index pa.t1_idx on pa.t1(resource_assignment_id) tablespace pax;

Index created.

SQL> create UNIQUE index pa.t2_idx on pa.t2(resource_assignment_id) tablespace pax;
create UNIQUE index pa.t3_idx on pa.t3(budget_version_id) tablespace pax;
                                                    

Index created.

SQL> 
Index created.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T1_IDX', numrows=> 2000000, numlblks
 => 1200 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T2_IDX', numrows=> 164000, numlblks 
=> 100 );

PL/SQL procedure successfully completed.

apps@s0p2dbt3> exec dbms_stats.set_index_stats( 'PA', 'T3_IDX', numrows=> 13000, numlblks =
> 50 );

PL/SQL procedure successfully completed.


apps@s0p2dbt3> set autotrace on explain
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where EXISTS (SELECT  'X'
  3        FROM   pa.t2 ra,
  4               pa.t3 bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bl.resource_assignment_id = ra.resource_assignment_id
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /

432 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1201 Card=100000 Byt
          es=1300000)

   1    0   DELETE OF 'T1'
   2    1     INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201 Ca
          rd=100000 Bytes=1300000)

   3    2       NESTED LOOPS (Cost=3 Card=1 Bytes=67)
   4    3         TABLE ACCESS (BY INDEX ROWID) OF 'T2' (Cost=2 Card=1
           Bytes=26)

   5    4           INDEX (UNIQUE SCAN) OF 'T2_IDX' (UNIQUE) (Cost=1 C
          ard=1640)

   6    3         TABLE ACCESS (BY INDEX ROWID) OF 'T3' (Cost=1 Card=1
           Bytes=41)

   7    6           INDEX (UNIQUE SCAN) OF 'T3_IDX' (UNIQUE)



apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /

Rollback complete.

apps@s0p2dbt3> 
apps@s0p2dbt3> DELETE FROM pa.t1 bl
  2  where resource_assignment_id in
  3  (SELECT ra.resource_assignment_id
  4        FROM   pa.t2 ra,
  5               pa.t3 bv
  6          WHERE  bv.current_flag <> 'Y'
  7            AND  bv.budget_status_code = 'B'
  8            AND  bv.baselined_date IS NOT NULL
  9            AND  bv.budget_version_id = ra.budget_version_id)
 10  /

432 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=1474 Card=32800000 B
          ytes=852800000)

   1    0   DELETE OF 'T1'
   2    1     MERGE JOIN (Cost=1474 Card=32800000 Bytes=852800000)
   3    2       INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1201
          Card=2000000 Bytes=26000000)

   4    2       SORT (JOIN) (Cost=273 Card=1640 Bytes=21320)
   5    4         VIEW OF 'VW_NSO_1' (Cost=266 Card=1640 Bytes=21320)
   6    5           SORT (UNIQUE) (Cost=266 Card=1640 Bytes=109880)
   7    6             HASH JOIN (Cost=246 Card=1640 Bytes=109880)
   8    7               TABLE ACCESS (FULL) OF 'T3' (Cost=78 Card=1 By
          tes=41)

   9    7               TABLE ACCESS (FULL) OF 'T2' (Cost=153 Card=164
          000 Bytes=4264000)




apps@s0p2dbt3> 
apps@s0p2dbt3> rollback
  2  /

Rollback complete. 


Followup   March 26, 2005 - 4pm Central time zone:

looks like it could be a limitation of the very old software.  I need to go way back to 8i to 
reproduce this behavior.

suggest you might temporarily try two stepping this, eg:

tkyte@ORA8IW> create global temporary table gtt as
  2  (SELECT ra.resource_assignment_id
  3        FROM   t2 ra
  4          WHERE  1=0)
  5  /
Table created.

tkyte@ORA8IW> exec dbms_stats.set_table_stats( user, 'GTT', numrows => 500 );
PL/SQL procedure successfully completed.

the above two things are done ONCE, then:


tkyte@ORA8IW> insert into gtt
  2  (SELECT ra.resource_assignment_id
  3        FROM   t2 ra,
  4               t3 bv
  5          WHERE  bv.current_flag <> 'Y'
  6            AND  bv.budget_status_code = 'B'
  7            AND  bv.baselined_date IS NOT NULL
  8            AND  bv.budget_version_id = ra.budget_version_id);

tkyte@ORA8IW> delete from t1 where resource_assignment_id in ( select * from gtt );

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=519 Card=10000000 Bytes=260000000)
   1    0   DELETE OF 'T1'
   2    1     NESTED LOOPS (Cost=519 Card=10000000 Bytes=260000000)
   3    2       VIEW OF 'VW_NSO_1' (Cost=19 Card=500 Bytes=6500)
   4    3         SORT (UNIQUE) (Cost=19 Card=500 Bytes=6500)
   5    4           TABLE ACCESS (FULL) OF 'GTT' (Cost=16 Card=500 Bytes=6500)
   6    2       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE) (Cost=1 Card=2000000 Bytes=26000000)



tkyte@ORA8IW> set autotrace off 

5 stars Purge huge table   April 2, 2005 - 7am Central time zone
Reviewer: A reader 
Tom,
"
looks like it could be a limitation of the very old software.  I need to go way 
back to 8i to reproduce this behavior.

suggest you might temporarily try two stepping this, eg:
"

I should give you the what I am trying to accomplish here.
We want to purge a table which has 70 million rows about 10 GB. on critira to partition the table. 
the goal is to remove 60 million rows from it based the critiras. 
After reading your thread:
http://asktom.oracle.com/pls/ask/f?p=4950:8:344944802785198156::NO::F4950_P8_DISPLAYID,F4950_P8_CRIT
ERIA:6407993912330
It is no pratical to delete the table by the conventional method.
So i am working on based on
create tmp table with nologging;
insert table with append, and parallel;
drop original table;
rename tmp to original;
recreate indexes and grants;
done.

the insert 60 millions took 20 hours. 
I would like your comment on the temp table, if the temp suitable to the massive inserts. what are 
needed to be change on the table:


CREATE TABLE PURGE_TMP_S
(
  RESOURCE_ASSIGNMENT_ID    NUMBER(15)          NOT NULL,
  START_DATE                DATE                NOT NULL,
  LAST_UPDATE_DATE          DATE                NOT NULL,
  LAST_UPDATED_BY           NUMBER              NOT NULL,
  CREATION_DATE             DATE                NOT NULL,
  CREATED_BY                NUMBER              NOT NULL,
  LAST_UPDATE_LOGIN         NUMBER              NOT NULL,
  END_DATE                  DATE                NOT NULL,
  PERIOD_NAME               VARCHAR2(30),
  QUANTITY                  NUMBER,
  RAW_COST                  NUMBER,
  BURDENED_COST             NUMBER,
  REVENUE                   NUMBER,
  CHANGE_REASON_CODE        VARCHAR2(30),
  DESCRIPTION               VARCHAR2(255),
  ATTRIBUTE_CATEGORY        VARCHAR2(30),
  ATTRIBUTE1                VARCHAR2(150),
  ATTRIBUTE2                VARCHAR2(150),
  ATTRIBUTE3                VARCHAR2(150),
  ATTRIBUTE4                VARCHAR2(150),
  ATTRIBUTE5                VARCHAR2(150),
  ATTRIBUTE6                VARCHAR2(150),
    ATTRIBUTE14               VARCHAR2(150),
  ATTRIBUTE15               VARCHAR2(150),
  RAW_COST_SOURCE           VARCHAR2(2),
  BURDENED_COST_SOURCE      VARCHAR2(2),
  QUANTITY_SOURCE           VARCHAR2(2),
  REVENUE_SOURCE            VARCHAR2(2),
  PM_PRODUCT_CODE           VARCHAR2(30),
  PM_BUDGET_LINE_REFERENCE  VARCHAR2(25),
  COST_REJECTION_CODE       VARCHAR2(30),
  REVENUE_REJECTION_CODE    VARCHAR2(30),
  BURDEN_REJECTION_CODE     VARCHAR2(30),
  OTHER_REJECTION_CODE      VARCHAR2(30),
  CODE_COMBINATION_ID       NUMBER(15),
  CCID_GEN_STATUS_CODE      VARCHAR2(1),
  CCID_GEN_REJ_MESSAGE      VARCHAR2(2000),
  REQUEST_ID                NUMBER
)
TABLESPACE PAD
PCTUSED    40
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            INITIAL          40K
            NEXT             40K
            MINEXTENTS       1
            MAXEXTENTS       10000000
            PCTINCREASE      0
            FREELISTS        1
            FREELIST GROUPS  1
            BUFFER_POOL      DEFAULT
           )
NOLOGGING 
NOCACHE
NOPARALLEL;

Thanks a lot Tom.
 


Followup   April 2, 2005 - 9am Central time zone:

the insert of 60million records took 20hours?

what kind of hardware/disk are we talking here.  That is abnormally slow.

did you do a subset of it with trace to see what is taking so long? 

4 stars   April 2, 2005 - 7am Central time zone
Reviewer: A reader 
"We want to purge a table which has 70 million rows about 10 GB. on critira to 
partition the table. "

i meant there is no critira for implement partitions on the table.
sorry for the typo. 


4 stars   April 2, 2005 - 11am Central time zone
Reviewer: A reader 
"
what kind of hardware/disk are we talking here.  That is abnormally slow.

did you do a subset of it with trace to see what is taking so long?
" 
it is Sun solaris 480 with 4 CPU. disk is raid 5. and there are some other applications running on 
the machine also.

I had it with the parallel 2 and did not set the sqltrace on it.

here is from statspack:
  Buffer Gets    Executions  Gets per Exec  % Total  Hash Value
--------------- ------------ -------------- ------- ------------
     29,014,953            1   29,014,953.0      25120040                
insert
/*+ append */ into pa.pa_budget_lines_tmp_s pbls  select
* from pa.pa_budget_lines pbl where pbl.resource_assignment_id i
n (SELECT ra.resource_assignment_id         FROM   pa.pa_resourc
e_assignments ra,                pa.pa_budget_versions bv
    WHERE  bv.current_flag <> 'Y'             AND  bv.budget_sta

 


Followup   April 2, 2005 - 11am Central time zone:

that is Oracle applications, do you have support working with you if you are planning on doing 
stuff to those tables with SQL?

is is the QUERY that takes 20 hours or the insert?

If you explain plan the insert and the select, do they use the same plan.

raid 5 = not fast stuff for direct path writing.  but not 20 hours slow. 

4 stars   April 2, 2005 - 12pm Central time zone
Reviewer: A reader 
"
that is Oracle applications, do you have support working with you if you are planning on doing 
stuff to those tables with SQL?
"
Yes, it's oracle apps.
We want to present oracle support what we can do for the purge of PA tables, since the support has 
not been able to provide the method. we test the purge on our Test environment.


"
is is the QUERY that takes 20 hours or the insert?
"

Yes.

"
If you explain plan the insert and the select, do they use the same plan.

"
apps@s0p2dbt1> explain plan for 
  2  insert /*+ append parallel(pbls,4)*/ 
  3  into pa.pa_budget_lines_tmp_s pbls
  4   select /*+ parallel(pbl, 4) */
  5  * from pa.pa_budget_lines pbl
  6  where pbl.resource_assignment_id in 
  7  (SELECT ra.resource_assignment_id
  8          FROM   pa.pa_resource_assignments ra,
  9                 pa.pa_budget_versions bv
 10            WHERE  bv.current_flag <> 'Y'
 11              AND  bv.budget_status_code = 'B'
 12              AND  bv.baselined_date IS NOT NULL
 13              AND  bv.budget_version_id = ra.budget_version_id)
 14  /

Explained.

apps@s0p2dbt1> @plan

Sat Apr 02                                                                      
                 
QUERY_PLAN
------------------------------------------------------------
  LOAD AS SELECT
    NESTED LOOPS
      VIEW  VW_NSO_1
        SORT UNIQUE
          HASH JOIN
            TABLE ACCESS FULL PA_BUDGET_VERSIONS
            TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
      TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
        INDEX RANGE SCAN PA_BUDGET_LINES_N2

9 rows selected.


10 rows deleted.

apps@s0p2dbt1> explain plan for 
  2   select /*+ parallel(pbl, 4) */
  3  * from pa.pa_budget_lines pbl
  4  where pbl.resource_assignment_id in 
  5  (SELECT ra.resource_assignment_id
  6          FROM   pa.pa_resource_assignments ra,
  7                 pa.pa_budget_versions bv
  8            WHERE  bv.current_flag <> 'Y'
  9              AND  bv.budget_status_code = 'B'
 10              AND  bv.baselined_date IS NOT NULL
 11              AND  bv.budget_version_id = ra.budget_version_id)
 12  /

Explained.

apps@s0p2dbt1> @plan

Sat Apr 02                                                                      
                                                           

QUERY_PLAN
------------------------------------------------------------
  NESTED LOOPS
    VIEW  VW_NSO_1
      SORT UNIQUE
        HASH JOIN
          TABLE ACCESS FULL PA_BUDGET_VERSIONS
          TABLE ACCESS FULL PA_RESOURCE_ASSIGNMENTS
    TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES
      INDEX RANGE SCAN PA_BUDGET_LINES_N2

8 rows selected.


9 rows deleted.

apps@s0p2dbt1> 
 


Followup   April 2, 2005 - 1pm Central time zone:

"
is is the QUERY that takes 20 hours or the insert?
"
Yes.


Hmmm

Q: Is it raining or sunny
A: Yes

Must be a sunshower.  So, still don't know which it is.

use autotrace traceonly explain, you'll get the cost/card values which are vital. 

4 stars   April 2, 2005 - 1pm Central time zone
Reviewer: A reader 
Q: Is it raining or sunny
A: Yes

sorry i did not read your question carefully.
Yes, I mean 20 hrs for inserts. the query run 30 minutes.

I found the parallelism does not work here:

insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

And The parallelism does not work for

insert /*+ append parallel(pbls,4)*/ 
into pa.pa_budget_lines_tmp_s pbls
 select 
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

The parallelism does not work for 
 select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl
where pbl.resource_assignment_id in 
(SELECT ra.resource_assignment_id
        FROM   pa.pa_resource_assignments ra,
               pa.pa_budget_versions bv
          WHERE  bv.current_flag <> 'Y'
            AND  bv.budget_status_code = 'B'
            AND  bv.baselined_date IS NOT NULL
            AND  bv.budget_version_id = ra.budget_version_id)
/

The parallelism only work for:
select /*+ parallel(pbl, 4) */
* from pa.pa_budget_lines pbl

init.ora 

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
fast_start_parallel_rollback         string  LOW
optimizer_percent_parallel           integer 0
parallel_adaptive_multi_user         boolean FALSE
parallel_automatic_tuning            boolean FALSE
parallel_broadcast_enabled           boolean FALSE
parallel_execution_message_size      integer 2148
parallel_instance_group              string
parallel_max_servers                 integer 25

Sat Apr 02                                                                                          
                       page    2
 

NAME                                 TYPE    VALUE
------------------------------------ ------- ------------------------------
parallel_min_percent                 integer 0
parallel_min_servers                 integer 0
parallel_server                      boolean FALSE
parallel_server_instances            integer 1
parallel_threads_per_cpu             integer 2
recovery_parallelism                 integer 0
SQL> 

it's 8174 db.
 


Followup   April 2, 2005 - 1pm Central time zone:

so, given that it would not normally take 19.5 hours to direct path write that data.... I'd suggest 
sql tracing on a sizable subset of the data and see where the bottleneck is.  Just do it serially, 
easier to trace and understand. 

4 stars   April 2, 2005 - 2pm Central time zone
Reviewer: A reader 
Tom, 
i will trace it. before doing that i would like the parallelism does not for the inserts and the 
query as i refered. it is only working for

select /*+ parallel(t,4) */ * from pa.pa_budget_lines t;

 


Followup   April 2, 2005 - 2pm Central time zone:

not everything is parallelizable.  and given the plans -- hash join used in a NL, I would guess 
that parallelizing the access to the outer table (which is now being picked up by index) wouldn't 
make sense. 

5 stars maybe extent management ...   April 2, 2005 - 3pm Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
from the review "Purge huge table" by "A Reader":

CREATE TABLE PURGE_TMP_S ...
STORAGE    (
            INITIAL          40K
            NEXT             40K
            ...
           )

if the tablespace is dictionary managed, parallel or not, most of the time will be spent changing 
the data dictionary and logging the changes ;)

Especially since the rows appear to be quite long, say 1k, so 1 extent for every 40 rows, 
60,000,000 rows -> 1,500,000 extents looks a tad too much.
 


Followup   April 2, 2005 - 3pm Central time zone:

laughing out loud, didn't even see that ;)

yes, that would be considered a tad small. 

4 stars   April 2, 2005 - 7pm Central time zone
Reviewer: A reader 
from the review "Purge huge table" by "A Reader":

CREATE TABLE PURGE_TMP_S ...
STORAGE    (
            INITIAL          40K
            NEXT             40K
            ...
           )

"
if the tablespace is dictionary managed, parallel or not, most of the time will 
be spent changing the data dictionary and logging the changes ;)

Especially since the rows appear to be quite long, say 1k, so 1 extent for every 
40 rows, 60,000,000 rows -> 1,500,000 extents looks a tad too much.
"

Forgot to metioned that. It is local managed tablespace in Oracle Applications 11.5.7.
and the temp table was created as

create table temptable as select * from pa.table;

The PA table is the orginal apps table. 

Tom, that was the reason i asked your comment on the temp table. 


5 stars to "A Reader"   April 3, 2005 - 11am Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
>and the temp table was created as
>create table temptable as select * from pa.table;

(i guess you meant "create table temptable as select * from pa.table WHERE 1=0", right ?) 

By "temptable", you mean pa.pa_budget_lines_tmp_s, correct ? 

Btw, what's the output of

select  TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent, next_extent from 
dba_tablespaces where TABLESPACE_NAME= (select tablespace_name from dba_tables where owner = 'PA' 
and table_name = upper ('pa_budget_lines_tmp_s'));

select tablespace_name, initial_extent, next_extent, pct_increase from dba_tables where owner = 
'PA' and table_name = upper ('pa_budget_lines_tmp_s');

And i would be curious to know what is the CREATE PURGE_TMP_S statement you gave above, if you 
created the table as a create-table-as-select ...  


2 stars   April 3, 2005 - 7pm Central time zone
Reviewer: A reader 
"
(i guess you meant "create table temptable as select * from pa.table WHERE 1=0", 
right ?) 
"
that's correct. 
create table pa.pa_budget_lines_tmp_s as select * from pa.pa_budget_lines where 1=2
/

the temp table was created in the PA data tablespace PAD.

we are working on Apps 11i. we do not want to the original table definitions/storge, because the 
apps may become no supportable by Oracle.

APPS@S0P2DBT1> select  TABLESPACE_NAME, EXTENT_MANAGEMENT, allocation_type, initial_extent, 
  2  next_extent from dba_tablespaces where TABLESPACE_NAME= (select tablespace_name 
  3  from dba_tables where owner = 'PA' and table_name = upper 
  4  ('pa_budget_lines_tmp_s'));

TABLESPACE_NAME
------------------------------------------------------------------------------------------
EXTENT_MANAGEMENT              ALLOCATION_TYPE             INITIAL_EXTENT NEXT_EXTENT
------------------------------ --------------------------- -------------- -----------
PAD
LOCAL                          USER                                 40960       40960


APPS@S0P2DBT1> select tablespace_name, initial_extent, next_extent, pct_increase from 
  2  dba_tables where owner = 'PA' and table_name = upper ('pa_budget_lines_tmp_s');

TABLESPACE_NAME
------------------------------------------------------------------------------------------
INITIAL_EXTENT NEXT_EXTENT PCT_INCREASE
-------------- ----------- ------------
PAD
         40960       40960            0


APPS@S0P2DBT1> 
APPS@S0P2DBT1>  


Followup   April 3, 2005 - 7pm Central time zone:

40k is a tad small for a really big table, don't you think?

I'll really have to recommend you work with support on this task.  They can advice you if the 
extent sizes are relevant.  You need to work within their advice for apps. 

2 stars   April 3, 2005 - 7pm Central time zone
Reviewer: A reader 
"we do not want to the original table ..."

should be "we do not want to alter the original table ..."
sorry for the typo. 


4 stars IN and Inner Join   April 4, 2005 - 2pm Central time zone
Reviewer: Praveen from Bangalore
Hi Tom,

I did the following experiment:

Oracle9iR2

CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);

CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);

ANALYZE TABLE ALL_OBJECTS1 COMPUTE STATISTICS;

ANALYZE TABLE ALL_OBJECTS2 COMPUTE STATISTICS;

SELECT COUNT(*) FROM ALL_OBJECTS;

COUNT(*)
--------
23533


Query 1
-------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    SORT UNIQUE                                                         
      INDEX FAST FULL SCAN    ORCL.IDX_ALLOBJ2_OBJID    
    TABLE ACCESS FULL        ORCL.ALL_OBJECTS1
    
Query2
------
SELECT * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    SORT UNIQUE                                                         
      INDEX FAST FULL SCAN    ORCL.IDX_ALLOBJ2_OBJID    
    TABLE ACCESS FULL        ORCL.ALL_OBJECTS1

Query3
------
SELECT * FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = a2.object_id;

Explain Plan Output
-------------------
SELECT STATEMENT Optimizer Mode=CHOOSE
  HASH JOIN
    TABLE ACCESS FULL    ORCL.ALL_OBJECTS1
    TABLE ACCESS FULL    ORCL.ALL_OBJECTS2


Que 1: The explain plan output of first two queries contradicts with what you have answered 
    to the question of this thread. ie: IN and EXISTS are processed very, very 
    differently. But the plan here shows same exact access paths and costs and performance.
    What went wrong?
    
Que 2:     Ref: Query3. Why did a direct inner join between the two tables gave and entirely 
    different plan and poor performance than IN (or EXISTS). You said that IN is 
    infact a join done by first table on a 'distincted' and 'hashed', inner query.
    Query 3 is not even using the indexes available!
    
Que 3:  How should I choose between inner join and IN?

Thanks
Praveen
         


Followup   April 4, 2005 - 5pm Central time zone:

good point

things change, good thing I did a test case above that you can test with to observe that and point 
it out to the world (I'm not being "snarky", I'm really overjoyed at the timing actually)....


Use the RBO and see what you see.  way back when I wrote this, that was the "more popular" of the 
two perhaps

today in 2005, what I said years ago using the RBO does not apply to the CBO.  the cbo is smart 
enough to recognize these two things are effectively the same. 

5 stars test recoded   April 4, 2005 - 6pm Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
But query3 isn't semantically equivalent to query1/2... you should put a unique constraint on 
ALL_OBJECTS2_uq (object_id) to make them "the same", and recode the statement as below (or not?)

It goes even better anyway - the sort can be avoided alltogether (and it's confirmed that query1 
and query2 plans are the same) - i have coded query2 the way i suppose it should be (query1 text is 
equal to query2 in the review):

CREATE TABLE ALL_OBJECTS1 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE TABLE ALL_OBJECTS2 AS SELECT * FROM sys.ALL_OBJECTS;

CREATE INDEX idx_allobj1_objid ON ALL_OBJECTS1(object_id);

CREATE INDEX idx_allobj2_objid ON ALL_OBJECTS2(object_id);

exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS1', cascade=>true);
exec dbms_stats.gather_table_stats (user, 'ALL_OBJECTS2', cascade=>true);


Query 1:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE EXISTS (SELECT NULL FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID);

Query2:
SELECT /* check */ * FROM ALL_OBJECTS1 a1
WHERE OBJECT_ID in (SELECT OBJECT_ID  FROM ALL_OBJECTS2);

alter table ALL_OBJECTS2 add constraint ALL_OBJECTS2_uq unique (object_id);

Query3:
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 
WHERE a1.object_id = a2.object_id;

======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE EXISTS (SELECT NULL 
FROM ALL_OBJECTS2 WHERE OBJECT_ID = a1.OBJECT_ID)

----------------------------------------------------------------------------
| Id  | Operation             |  Name              | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |       |       |    94 |
|*  1 |  HASH JOIN SEMI       |                    | 35281 |  3376K|    94 |
|   2 |   TABLE ACCESS FULL   | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|   3 |   INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID  | 35282 |   172K|     9 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off
======================================================================
SELECT /* check */ * FROM ALL_OBJECTS1 a1 WHERE OBJECT_ID in (SELECT OBJECT_ID  
FROM ALL_OBJECTS2)


----------------------------------------------------------------------------
| Id  | Operation             |  Name              | Rows  | Bytes | Cost  |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |       |       |    94 |
|*  1 |  HASH JOIN SEMI       |                    | 35281 |  3376K|    94 |
|   2 |   TABLE ACCESS FULL   | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|   3 |   INDEX FAST FULL SCAN| IDX_ALLOBJ2_OBJID  | 35282 |   172K|     9 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off

======================================================================
SELECT /* check unique */ a1.* FROM ALL_OBJECTS1 a1, ALL_OBJECTS2 a2 WHERE a1.object_id = 
a2.object_id

---------------------------------------------------------------------------
| Id  | Operation            |  Name              | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |       |       |    50 |
|   1 |  NESTED LOOPS        |                    | 35281 |  3376K|    50 |
|   2 |   TABLE ACCESS FULL  | ALL_OBJECTS1       | 35281 |  3204K|    50 |
|*  3 |   INDEX RANGE SCAN   | IDX_ALLOBJ2_OBJID  |     1 |     5 |       |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OBJECT_ID"="OBJECT_ID")

Note: cpu costing is off

Note: plans fetched from v$sql_plan, 9.2.0.6. 


5 stars In and Out Parameters and IN/Exists operator   April 5, 2005 - 3am Central time zone
Reviewer: Keshav Madhusoodan from India
Excellent stuff, I got here.

 


5 stars Inner Join and IN   April 5, 2005 - 9am Central time zone
Reviewer: Praveen from Bangalore
Well Tom, by that you mean, with newer versions of CBO playing around, there won't by any 
difference at all between IN and EXISTS - whatever be the nature of data sources are - means 
whether an index exists or not, unique or not, relative sizes of each tables etc, did you? That's 
what my execution plan shows me - with a small difference. With unique key defined, 'NESTED LOOP 
SEMI' - for EXISTS, 'NESTED LOOP' - for IN; But otherwise the costs are same.

I had one more doubt that whether IN is always going to perform better that than a direct inner 
join? Testing I've done is telling this. Or is there any  situation where inner join is better than 
IN? 

Thanks, Alberto, for pointing out something important (and also correcting the error,...:). 
Unfortunately, the execution plan I get here, after putting an unique constraint on object_id, is 
very much different from the one that you have given. 'NESTED LOOP SEMI' - for EXISTS, 'NESTED 
LOOP' - for IN and surprisingly, HASH JOIN - for inner join! Well, I gathered statistics using 
ANALYZE.


Thanks
Praveen 


Followup   April 5, 2005 - 12pm Central time zone:

in and exists are equivalent, so the advice of the past (with rbo) does not apply.  


use either or. 

5 stars Question regarding "IN & EXISTS"   May 16, 2005 - 4pm Central time zone
Reviewer: Leo James from Glen Burnie, MD, USA
Very well explained TOM. 


4 stars in, inner join and unnesting   August 10, 2005 - 9pm Central time zone
Reviewer: James from Australia
Dear Tom,

You said (right at the start of this thread) concerning the unnesting/query rewriting of in 
subqueries:

>is typically processed as:
>
>select * 
>  from t1, ( select distinct y from t2 ) t2
> where t1.x = t2.y;
>
>The subquery is evaluated, distinct'ed, indexed (or >hashed or sorted) and then 
>joined to the original table -- typically.

I'm worried about the typically :-)

I have 3 queries that return the same result:

Query 1 - in query
select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)

Query 2 - inner join
select /*+ INNER */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325

Query 3 - "Typical" unnested in subquery
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) 
t2
where t1.dealer_Id=t2.dealer_id

The dealer_regions table has had stats created and enough buckets specified for the CBO to choose 
between two indexes.

The problem is that the 1st query - the in subquery - is not being executed as well as it should, 
the wrong index is being selected. What is even more curious is that the other 2 queries perform as 
expected - even q3 which is the "typical" unnested version of the 1st query.

I could just use an index hint but I would rather the CBO got it right as the selectivity of 
"region_id" varies (and sometimes the index I specified in the hint wont be the best one.)

Any advice to "encourage" the CBO to see things my way?

BTW I am using 9.1.5 so that maybe where my problem lies.

--trace output--

select /*+ NORMAL */ count(*) from carsales_view
where carsales_view.dealer_id in (select dealer_id from dealer_regions where region_id=325)

call     count       cpu    elapsed       disk      query    current        rows
total        4      0.58       0.61         55      94638          0           1

Rows     Row Source Operation
      1  SORT AGGREGATE (cr=94620 r=55 w=0 time=595018 us)
     39   NESTED LOOPS SEMI (cr=94620 r=55 w=0 time=594979 us)
  45815    NESTED LOOPS  (cr=48803 r=55 w=0 time=349295 us)
  48690     INDEX FAST FULL SCAN CSVMV_DEID_IND (cr=111 r=55 w=0 time=65768 us)(object id 270087)
  45815     INDEX UNIQUE SCAN CSDM_PK (cr=48692 r=0 w=0 time=199251 us)(object id 270093)
     39    INDEX UNIQUE SCAN DERE_PK (cr=45817 r=0 w=0 time=167174 us)(object id 281197)

Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS (SEMI)
  45815     NESTED LOOPS
  48690      INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 
                 'CSVMV_DEID_IND' (NON-UNIQUE)
  45815      INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
     39     INDEX (UNIQUE SCAN) OF 'DERE_PK' (UNIQUE)
********************************************************************************
select /*+ INNERJOIN */ count(*) from carsales_view,dealer_regions
where carsales_view.dealer_Id=dealer_regions.dealer_id and region_id = 325

call     count       cpu    elapsed       disk      query    current        rows
total        4      0.00       0.00          3         56          0           1

Rows     Row Source Operation
      1  SORT AGGREGATE (cr=54 r=3 w=0 time=965 us)
     39   NESTED LOOPS  (cr=54 r=3 w=0 time=941 us)
     39    NESTED LOOPS  (cr=13 r=3 w=0 time=690 us)
      5     TABLE ACCESS BY INDEX ROWID DEALER_REGIONS (cr=6 r=3 w=0 time=514 us)
      5      INDEX RANGE SCAN DERE_REID_IND (cr=2 r=1 w=0 time=229 us)(object id 281196)
     39     INDEX RANGE SCAN CSVMV_DEID_IND (cr=7 r=0 w=0 time=143 us)(object id 270087)
     39    INDEX UNIQUE SCAN CSDM_PK (cr=41 r=0 w=0 time=180 us)(object id 270093)

Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS
     39     NESTED LOOPS
      5      TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 
                 'DEALER_REGIONS'
      5       INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
     39      INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND' 
                 (NON-UNIQUE)
     39     INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)
********************************************************************************
select /*+ UNNESTED */
count(*) from carsales_view t1,( select distinct dealer_id from dealer_regions where region_id=325) 
t2
where t1.dealer_Id=t2.dealer_id

call     count       cpu    elapsed       disk      query    current        rows
total        4      0.00       0.00          0         56          0           1

Rows     Row Source Operation
      1  SORT AGGREGATE (cr=54 r=0 w=0 time=596 us)
     39   NESTED LOOPS  (cr=54 r=0 w=0 time=577 us)
     39    NESTED LOOPS  (cr=13 r=0 w=0 time=337 us)
      5     VIEW  (cr=6 r=0 w=0 time=196 us)
      5      SORT UNIQUE (cr=6 r=0 w=0 time=189 us)
      5       TABLE ACCESS BY INDEX ROWID OBJ#(281059) (cr=6 r=0 w=0 time=99 us)
      5        INDEX RANGE SCAN OBJ#(281196) (cr=2 r=0 w=0 time=48 us)(object id 281196)
     39     INDEX RANGE SCAN OBJ#(270087) (cr=7 r=0 w=0 time=111 us)(object id 270087)
     39    INDEX UNIQUE SCAN OBJ#(270093) (cr=41 r=0 w=0 time=161 us)(object id 270093)

Rows     Execution Plan
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     39    NESTED LOOPS
     39     NESTED LOOPS
      5      VIEW
      5       SORT (UNIQUE)
      5        TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 
                   'DEALER_REGIONS'
      5         INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON-UNIQUE)
     39      INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'CSVMV_DEID_IND' 
                 (NON-UNIQUE)
     39     INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)





  





 


Followup   August 11, 2005 - 9am Central time zone:

do you really do this without bind variables?  in the application?

what is 9.1.5 

2 stars A soloution to my last review   August 11, 2005 - 1am Central time zone
Reviewer: James from Australia
Sorry but I have found a work around.

The 1st query can be written as:

select 
count(*) from carsales_view
where dealer_id in (
select dealer_id from dealer_regions where region_id=325 and rownum>0)
/

Which achieves the same performance of a inner join for selective and non-selective values of 
region_id (although with a different plan):

region_id=325 - low selectivity:

56 consistant gets compared with 94,000 (qry1)
This compares to the manually unested query (q3) of 56 consistant gets.

region_id=81 - high selectivity:

15,572 consistant gets compared to 94,000 (using qry1)
This compares to the manually unested query (q3) of 15572 consistant gets.

So I have managed to get the CBO to execute the query based on the selectivity of the region_id 
field in the dealer_regions table.

I beleive this is probably a bug in the 9.1.5 CBO?

region_id=325 - high selectivity with rownum>0

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=1 Bytes=22)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=19 Card=241 Bytes=5302)
   3    2       NESTED LOOPS (Cost=19 Card=241 Bytes=4338)
   4    3         VIEW OF 'VW_NSO_1' (Cost=2 Card=5 Bytes=65)
   5    4           SORT (UNIQUE)
   6    5             COUNT
   7    6               FILTER
   8    7                 TABLE ACCESS (BY INDEX ROWID) OF 'DEALER_REG
          IONS' (Cost=2 Card=5 Bytes=40)

   9    8                   INDEX (RANGE SCAN) OF 'DERE_REID_IND' (NON
          -UNIQUE) (Cost=1 Card=5)

  10    3         INDEX (RANGE SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQUE)
          (Cost=1 Card=48 Bytes=240)

  11    2       INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)




Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         56  consistent gets
          0  physical reads
          0  redo size
        214  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

region_id=81 - high selectivity with rownum>0

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=22)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=22 Card=33150 Bytes=729300)
   3    2       HASH JOIN (SEMI) (Cost=22 Card=33150 Bytes=596700)
   4    3         INDEX (FAST FULL SCAN) OF 'CSVMV_DEID_IND' (NON-UNIQ
          UE) (Cost=12 Card=48691 Bytes=243455)

   5    3         VIEW OF 'VW_NSO_1' (Cost=3 Card=689 Bytes=8957)
   6    5           COUNT
   7    6             FILTER
   8    7               TABLE ACCESS (FULL) OF 'DEALER_REGIONS' (Cost=
          3 Card=689 Bytes=5512)

   9    2       INDEX (UNIQUE SCAN) OF 'CSDM_PK' (UNIQUE)




Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
      15572  consistent gets
          0  physical reads
          0  redo size
        216  bytes sent via SQL*Net to client
        275  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed





 


5 stars reference for last answer   August 11, 2005 - 1am Central time zone
Reviewer: James from Australia
Found on google groups, keywords: oracle unnest subquery

http://groups.google.com/group/comp.databases.oracle.server/browse_frm/thread/939ef882371d3ce8/5066c
973efa68919?tvc=1&q=oracle+unnest+subquery#5066c973efa68919
regards 


5 stars IN, EXISTS and NULL   November 21, 2005 - 4am Central time zone
Reviewer: Sanjaya Balasuriya from Colombo, Sri Lanka
Hi Tom, 
How IN and EXISTS are effected by NULL values ?

Is it possible that IN and EXISTS return different sets of data depending on NULL values ? 


Followup   November 21, 2005 - 8am Central time zone:

in and exists are comparable in the presence of NULLS

not in and not exists behave differently. 

4 stars Oracle10g Doc has a different opinion.   December 20, 2005 - 5am Central time zone
Reviewer: Praveen from Dubai
Hi Tom,

Oracle10g documentation says IN and EXISTS executes and performs differently depending on the 
situation.

"In general, if the selective predicate is in the subquery, then use IN. If the 
selective predicate is in the parent query, then use EXISTS."

http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10752/sql_1016.htm#28145
This is contradicting with your opinion that-

"today in 2005, what I said years ago using the RBO does not apply to the CBO.  
the cbo is smart enough to recognize these two things are effectively the same." above.

Please explain.

Regards
 


Followup   December 20, 2005 - 10am Central time zone:

ops$tkyte@ORA10GR2> create table employees as select * from hr.employees where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2> create table orders as select * from oe.orders where 1=0;
 
Table created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into employees
  2  select object_id,
  3         substr(object_name,1,20),
  4         substr(object_name,1,25),
  5             substr(object_type,1,25),
  6             substr(last_ddl_time,1,20),
  7         created,
  8         substr(object_name,1,10),
  9         object_id,
 10         0.10,
 11         object_id+1,
 12         mod(rownum,1000)
 13    from all_objects
 14   where rownum <= 27000;
 
27000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into orders
  2  select rownum,
  3         created,
  4         substr(object_type,1,8),
  5         mod(rownum,150),
  6         1,
  7         object_id,
  8         object_id,
  9         1
 10    from all_objects
 11   where rownum <= 10000;
 
10000 rows created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> alter table employees add constraint emp_pk primary key(employee_id);
 
Table altered.
 
ops$tkyte@ORA10GR2> create index orders_customer_id on orders(customer_id);
 
Index created.
 
ops$tkyte@ORA10GR2> create index employees_department_id on employees(department_id);
 
Index created.
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'EMPLOYEES', method_opt=>'for all 
indexed columns' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'ORDERS', method_opt=>'for all 
indexed columns' );
 
PL/SQL procedure successfully completed.


ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> SELECT /* EXISTS example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3    FROM employees e
  4   WHERE EXISTS (SELECT 1 FROM orders o                  /* Note 1 */
  5                    WHERE e.employee_id = o.sales_rep_id   /* Note 2 */
  6                      AND o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> SELECT /* IN example */
  2           e.employee_id, e.first_name, e.last_name, e.salary
  3      FROM employees e
  4     WHERE e.employee_id IN (SELECT o.sales_rep_id         /* Note 4 */
  5                               FROM orders o
  6                              WHERE o.customer_id = 144);
 
Execution Plan
----------------------------------------------------------
Plan hash value: 551415261
 
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| 
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    67 |  4087 |    49   (3)| 
|   1 |  NESTED LOOPS                |           |    67 |  4087 |    49   (3)| 
|   2 |   SORT UNIQUE                |           |    67 |  1139 |    14   (0)| 
|*  3 |    TABLE ACCESS FULL         | ORDERS    |    67 |  1139 |    14   (0)| 
|   4 |   TABLE ACCESS BY INDEX ROWID| EMPLOYEES |     1 |    44 |     1   (0)| 
|*  5 |    INDEX UNIQUE SCAN         | EMP_PK    |     1 |       |     0   (0)| 
------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("O"."CUSTOMER_ID"=144)
   5 - access("E"."EMPLOYEE_ID"="O"."SALES_REP_ID")
 
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace off



They give insufficient details in their example to reproduce.  when you tested this out - what did 
you yourself see? 

5 stars what's the maximum number of values in the IN expression   January 3, 2006 - 8pm Central time zone
Reviewer: Jianhui from CA
Hi Tom,

IN (value1, value2, ... valueN)

What's the maximum limit of N here allowed by Oracle? Do you have URL to the document for this 
limit?

Thanks

 


5 stars Question about not exists   January 19, 2006 - 8am Central time zone
Reviewer: Oleg 
Hi,Tom,
I have this query:
SELECT   b.branch_no, b.salesman_no,
         ROUND
             (  SUM (CASE
                        WHEN b.returning_no !=2
                           THEN order_qty
                     END)
              / 22
             ) avg_rtrn_orders_qty,
         ROUND (  SUM (CASE
                          WHEN b.bsnss_no != 6
                             THEN order_qty
                       END)
                / 22
               ) avg_no_gas_orders_qty,
         ROUND (  SUM (CASE
                          WHEN b.bsnss_no = 6
                             THEN order_qty
                       END)
                / 22
               ) avg_gas_orders_qty
    FROM km_orders a, km_customers b, km_nwpr_editions c,km_closed_customers d
   WHERE a.cstmr_no = b.cstmr_no
     and b.cstmr_no=d.cstmr_no(+)
     and c.nwpr_date BETWEEN d.date_from(+) AND d.date_to(+)
     AND b.bsnsslike_sw = 'ë'
     AND a.prod_no = c.prod_no
     AND c.nwpr_date BETWEEN a.date_from AND a.date_to
     AND a.prod_no = 1
     AND b.branch_no != 9
     AND c.nwpr_date BETWEEN TO_DATE ('01/11/2005', 'DD/MM/YYYY')
                         AND TO_DATE ('30/11/2005', 'DD/MM/YYYY')
     AND a.day_in_week = c.nwpr_day
     AND NOT EXISTS (
             SELECT 1
               FROM km_closed_customers
              WHERE cstmr_no = a.cstmr_no
                AND c.nwpr_date BETWEEN date_from AND date_to)
GROUP BY b.branch_no, b.salesman_no
I see that without 
" AND NOT EXISTS (
             SELECT 1
               FROM km_closed_customers
              WHERE cstmr_no = a.cstmr_no
                AND c.nwpr_date BETWEEN date_from AND date_to)"
the query is running faster,but i need to check this table,
what can i do instead this not exists statement?
Thank you,
Oleg,
 


5 stars best practice?   February 17, 2006 - 1pm Central time zone
Reviewer: A reader 
Tom,
is it a good practice to have 2 exists in the form?

   SELECT *
                    FROM TABLE a
                   WHERE NOT EXISTS (SELECT 1
                                       FROM TABLE B i
                                      WHERE i.CLASS = a.CLASS_NAME
                                        AND NVL(i.CLASS_NAME,0) = NVL(a.CLASS_DEPT,0)
                                    )
                     AND NOT EXISTS (SELECT 1
                                       FROM work_orders w
                                      WHERE w.class_id     = a.s_name
                                        AND NVL(w.CLASS_ID,0) = nvl(a.CLASS_DEPT,0)
                                     ) 


Followup   February 17, 2006 - 4pm Central time zone:

it seems to be less a matter of "good practice" and more a matter of getting the right answer 
though isn't it?

It is a bad practice to do that nvl'ing bit in general.  sure you really need to do that. 

5 stars can you clarify with example please   February 22, 2006 - 2pm Central time zone
Reviewer: A reader 


Followup   February 22, 2006 - 2pm Central time zone:

an example of what exactly 

4 stars of how can I avoid   February 22, 2006 - 3pm Central time zone
Reviewer: A reader 
Tom,
How can I avoid "the nvl'ing bit" in this case. 


Followup   February 22, 2006 - 3pm Central time zone:

fixing the model?  apparently you didn't mean to use nulls?  Not if you basically join on them like 
that.

Else, we'd have to function base index them in this case.  

Are you sure nulls are an issue here.  Do you really need to join by them or is that "just in case"

 

5 stars THANKS!!!!!   February 22, 2006 - 4pm Central time zone
Reviewer: A reader 


5 stars updating using EXISTS   March 9, 2006 - 3pm Central time zone
Reviewer: anirudh from usa
hi tom

in our project we often have a situaton where some of the fields in a table (master_table in the 
example below) needs to be updated with data in another table (UPDATES_FEED in the example below)

MASTER_TABLE
key_fld
fld_1
fld_2
fld_3
fld_4
fld_5
fld_6

UPDATES_FEED
key_fld
fld_1
fld_2

i typically use SQL such as :

update master_table a
set (fld_1,fld_2) = (select fld_1,fld_2
             from updates_feed b    
             where a.key_fld = b.key_fld)
where exists (select 1  
          from updates_feed c
          where a.key_fld = c.key_fld)

I have to add the WHERE condition otherwise if no match is found in updates_feed table it would 
update the fields to NULL)

even though it works - it looks some what less than elegant - and has performance issues as well - 
as i think the lookup on the updates_feed is happening two times

Is there a better way of achieving the result ?



 


Followup   March 9, 2006 - 4pm Central time zone:

update a join


update (select m.fld_1, m.fld_2, u.fld_1 n1, u.fld_2 n2
          from master_table m, updates_feed u
         where m.key_fld = u.key_fld )
 set fld_1 = n1, fld_2 = n2;

 

5 stars wonderful   March 10, 2006 - 4am Central time zone
Reviewer: A reader 
Morning Tom,

i have always wondered the best way of doing that kind of update. in your example is it therefore 
possible to update either or indeed both of the tables used in the select statement?

Regards 
Dave 


Followup   March 10, 2006 - 12pm Central time zone:

only one, you are updating a table from another table 

4 stars Strange behaviour   March 30, 2006 - 6am Central time zone
Reviewer: Abhijit Mallick from India
Hi Tom,
   Today I have found a strange behaviuor of SQL. The problem is as follows:
  
Please see the details: 

SQL> create table tab2
  2  (col21 number);

Table created.

SQL> insert into tab2 select rownum from user_objects;

2429 rows created.

SQL> create table tab1
  2  (col1 varchar2(1000),
  3   col2 number);

Table created.

SQL> insert into tab1 select table_name,rownum from user_tables;

293 rows created.

SQL> commit;

Commit complete.

SQL> ed
Wrote file afiedt.buf

  1  select count(*) from tab1
  2  where col1 is not null
  3* and col2 not in (select col2 from tab2)

SQL> /

  COUNT(*)
----------
         0

---LOOK AT THE INNER QUERY 
THE TABLE TAB2 DOES NOT HAVE THE COL2 STILL THE QUERY EXECUTES WITHOUT ANY ERROR. SEE THE BELOW 
PART , IT GIVES THE ERROR.

SQL> select col2 from tab2;
select col2 from tab2
       *
ERROR at line 1:
ORA-00904: "COL2": invalid identifier


SQL> 


Can you please discuss the behaviour of the above sub-query.

Thanks
--Abhijit Mallick
Bangalore
 


Followup   March 31, 2006 - 11am Central time zone:

it is called a correlated subquery.

select count(*) 
  from tab1
 where col1 is not null
   and col2 not in (select TAB1.COL2 from tab2)


that is the same query as the one you executed.  

Just like you could:

  and not exists ( select * from tab2 where tab2.col21 = tab1.col2 )

would work - as would
  and not exists ( select * from tab2 where col21 = col2 )

we would just figure out using standard scoping rules whence each reference column was from.
 

5 stars IN & EXISTS   April 19, 2006 - 8am Central time zone
Reviewer: jash from India
How to tune the below query to get result very quickly

select count(*) from polineitemtab;

  COUNT(*)
----------
   1581181

select count(*) from requisitiontab;

  COUNT(*)
----------
    284956

select rootid from NEW_TEMP_polineitemtab
where poli_requisition not in 
                   (select rootid from NEW_TEMP_requisitiontab) 


Followup   April 19, 2006 - 9am Central time zone:

make sure you are using the CBO
make sure rootid is NOT NULL in NEW_TEMP_requisitiontab OR add "where rootid is not null" to the 
subquery.

or

select /*+ ALL_ROWS */ a.rootid
  from NEW_TEMP_polineitemtab a, NEW_TEMP_requisitiontab b
 where a.rootid = b.rootid(+)
   and b.rootid is null;

(a do it yourself hash anti-join) 

5 stars IN & EXISTS   April 19, 2006 - 9am Central time zone
Reviewer: jash from India
What is b.rootid(+) significance in this query 


Followup   April 19, 2006 - 9am Central time zone:

outer join - you wanted to find rows in A, not in B - so we outer join A to B and only keep those 
where there was no match (b.rootid is null) 

5 stars IN & EXISTS   April 19, 2006 - 9am Central time zone
Reviewer: jash from India
There is no null values in b.rootid(column) NEW_TEMP_requisitiontab(table) what is the reason using 
is null  


Followup   April 19, 2006 - 10am Central time zone:

when you outer join T1 to T2, SQL will "make up" a NULL row in T2 to join to when there isn't a row 
to be joined to.

So, the following two queries are semantically equivalent (assuming t2.x is NOT NULL)

ops$tkyte@ORA9IR2> create table t1 ( x int );
Table created.
 
ops$tkyte@ORA9IR2> create table t2 ( x int NOT NULL );
Table created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 2 );
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values ( 1 );
1 row created.
 
ops$tkyte@ORA9IR2> select * from t1 where x not in (select x from t2);
 
         X
----------
         2
 
ops$tkyte@ORA9IR2> select t1.*
  2    from t1, t2
  3   where t1.x = t2.x(+)
  4     and t2.x is null;
 
         X
----------
         2
 

5 stars EXISTS vs "where rownum <=1"   May 4, 2006 - 7pm Central time zone
Reviewer: K. Cruz from WashingtonDC
Thanks Tom for your valuable support. 

I'm reviewing potentially bad queries and ran across a query that looks for the existence of a 
value in a table. This query looked suspicious to me and wanted to get your feedback. 

SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
  FROM NF_FILES NF,
       NF_RPC_LOCATION NRL
 WHERE NF.SUFFIX = :B3
   AND NF.OFFICE_ID = NRL.OFFICE_ID
   AND NF.SECTION_CODE = NRL.SECTION_CODE
   AND NF.RESP_CODE = NRL.RESP_CODE
   AND NF.OFFICE_ID = :B2
   AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
   AND ROWNUM <= 1

After rewriting the query so that it uses an EXIST clause, the explain plan for both seemed to look 
very similar. 

17:48:53 nfuser@NFDEV> l
  1  SELECT DECODE(COUNT(NF.SUFFIX), 0, 'N', 'Y')
  2    FROM NF_FILES NF,
  3         NF_RPC_LOCATION NRL
  4   WHERE NF.SUFFIX = :B3
  5     AND NF.OFFICE_ID = NRL.OFFICE_ID
  6     AND NF.SECTION_CODE = NRL.SECTION_CODE
  7     AND NF.RESP_CODE = NRL.RESP_CODE
  8     AND NF.OFFICE_ID = :B2
  9     AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')
 10*    AND ROWNUM <= 1
17:48:54 nfuser@NFDEV> /

D
-
N

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=1 Bytes=43)
   1    0   SORT (AGGREGATE)
   2    1     COUNT (STOPKEY)
   3    2       NESTED LOOPS (Cost=7 Card=1 Bytes=43)
   4    3         PARTITION RANGE (SINGLE)
   5    4           TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES' (Cost=5 Card=1 Bytes=31)
   6    5             INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
   7    3         INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I'(NON-UNIQUE) (Cost=2 Card=1 
Bytes=12)


17:50:15 nfuser@NFDEV> l
  1    SELECT 'Y'
  2      FROM DUAL
  3     WHERE EXISTS ( SELECT 1
  4                    FROM NF_FILES NF,
  5                         NF_RPC_LOCATION NRL
  6                   WHERE NF.SUFFIX = :B3
  7                     AND NF.OFFICE_ID = NRL.OFFICE_ID
  8                     AND NF.SECTION_CODE = NRL.SECTION_CODE
  9                     AND NF.RESP_CODE = NRL.RESP_CODE
 10                     AND NF.OFFICE_ID = :B2
 11*                    AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY') )
17:50:16 nfuser@NFDEV> /

no rows selected

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=17 Card=8168)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'DUAL' (Cost=17 Card=8168)
   3    1     NESTED LOOPS (Cost=7 Card=1 Bytes=43)
   4    3       PARTITION RANGE (SINGLE)
   5    4         TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'NF_FILES'(Cost=5 Card=1 Bytes=31)
   6    5           INDEX (RANGE SCAN) OF 'NF_OFFID_P1_IDX' (NON-UNIQUE) (Cost=4 Card=1)
   7    3       INDEX (RANGE SCAN) OF 'RPC_LOCATION_RESP_CNTL_FK_I' (NON-UNIQUE) (Cost=2 Card=1 
Bytes=12)

NF_FILES is a 300 million row table so based on your previous advice, my outer query was the DUAL 
table and my inner query was the large NF_FILES table. I'm not able to determine which is the most 
efficient and the original query has been flagged and resource intensive. Could you provide some 
feedback on which approach makes more sense? 

Thank you!!

 


Followup   May 5, 2006 - 1am Central time zone:

the original doesn't have to read dual, yours does the work of the original PLUS the reading of 
dual.

hopefully the index nf_offid_p1_idx has suffix and last_transaction_date in it as well 
(concatenated index)


 WHERE NF.SUFFIX = :B3
  7                     AND NF.OFFICE_ID = NRL.OFFICE_ID
  8                     AND NF.SECTION_CODE = NRL.SECTION_CODE
  9                     AND NF.RESP_CODE = NRL.RESP_CODE
 10                     AND NF.OFFICE_ID = :B2
 11*                    AND NF.LAST_TRANSACTION_DATE >= to_date(:B1, 'MMDDYYYY')

I'd want an index on (office_id,suffix,last_transaction_date) probably. 

5 stars   May 10, 2006 - 9am Central time zone
Reviewer: Reader 
I have a table called person. I need to update the last name and birth date of person_no 2 with 
last name and birthdate of person_no 1. "id" is the primary key of the person table. 

Can you please let me know how to do this update? 

create sequence id_seq start with 1;

create table person
(id number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
,CONSTRAINT pk_id PRIMARY KEY(id) )

insert into person
values
(id_seq.nextval,1,'ABC','AAA','19991120');

insert into person
values
(id_seq.nextval,2,'DEF','','00000000');

insert into person
values
(id_seq.nextval,1,'GHI','KLM','19921021');

insert into person
values
(id_seq.nextval,2,'NPQ','','00000000');

insert into person
values
(id_seq.nextval,1,'RST','UVW','19721215');

insert into person
values
(id_seq.nextval,2,'XYZ','','00000000');

insert into person
values
(id_Seq.nextval,1,'TTT','GGG','19750324');

select * from person;

 ID  PERSON_NO FIRST_NM  LAST_NM   BIRTH_DT   
---  --------  --------  -------   --------   
  1    1       ABC       AAA       19991120   
  2    2       DEF                 00000000   
  3    1       GHI       KLM       19921021   
  4    2       NPQ                 00000000   
  5    1       RST       UVW       19721215   
  6    2       XYZ                 00000000   
  7    1       TTT       GGG       19750324   


Thanks 


Followup   May 10, 2006 - 10am Central time zone:

you seem to have person_no = 1 and 2 in there more than once.

It is not at all clear how you would pick which person_no=1 to use to update all of the person_no = 
2 records. 

5 stars   May 10, 2006 - 11am Central time zone
Reviewer: Reader 
Tom,
I am sorry for not being clear on the above question. Please find below the details of the tables. 

I need to update the last name and birth date of 
person_no 2 with last name and birthdate of person_no 1. 
of the person table. 

Can you please let me know how to do this update? 



create sequence id_pseq start with 200;


create table person
(id_person number primary key
, person_no number
,first_nm varchar2(30)
,last_nm varchar2(30)
,birth_dt varchar2(14)
)

insert into person
values
(id_pseq.nextval,1,'ABC','AAA','19991120');

insert into person
values
(id_pseq.nextval,2,'DEF','','00000000');

insert into person
values
(id_pseq.nextval,1,'GHI','KLM','19921021');

insert into person
values
(id_pseq.nextval,2,'NPQ','','00000000');

insert into person
values
(id_pseq.nextval,1,'RST','UVW','19721215');

insert into person
values
(id_pseq.nextval,2,'XYZ','','00000000');

insert into person
values
(id_pseq.nextval,1,'TTT','GGG','19750324');


create sequence id_pbseq start with 100;

create table person_i_bnft
(id_ibnft number primary key
,id_person number references person(id_person)
,person_no number
,id_bnft number references person_bnft(id_bnft)
,age number 
);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,200,1,3000,20);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,201,2,3000,25);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,202,1,3001,30);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,203,2,3001,35);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,204,1,3002,40);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,205,2,3002,45);

insert into person_i_bnft
values
(id_pbseq.NEXTVAL,206,1,3003,50);

create sequence id_bseq start with 3000;

create table person_bnft
(id_bnft number primary key
,amt varchar2(6)
);

insert into person_bnft
values
(id_bseq.NEXTVAL,'$1000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$2000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$3000');

insert into person_bnft
values
(id_bseq.NEXTVAL,'$4000');


Data from tables:
PErson table:
ID_PERSON PERSON_NO    FIRST_NM LAST_NM BIRTH_DT    
200         1    ABC     AAA     19991120    
201         2    DEF          00000000    
202         1    GHI     KLM     19921021    
203         2    NPQ          00000000    
204         1    RST     UVW     19721215    
205         2    XYZ          00000000    
206         1    TTT     GGG     19750324    


PErson_i_bnft
ID_IBNFT ID_PERSON   PERSON_NO ID_BNFT    AGE
100      200        1    3000    20 
101      201        2    3000    25 
102      202        1    3001    30 
103      203        2    3001    35 
104      204        1    3002    40 
105      205        2    3002    45 
106      206        1    3003    50 


Person_bnft
ID_BNFT    AMT
3000    $1000
3001    $2000
3002    $3000
3003    $4000

Thanks 


Followup   May 11, 2006 - 7am Central time zone:

no, this is still as clear as a puddle of mud.

sigh, do you see that you have "FOUR PERSON_NO=1" records.  You have "THREE PERSON_NO=2" records.  
You have only said:

...
I need to update the last name and birth date of 
person_no 2 with last name and birthdate of person_no 1. 
of the person table. 
.......

No one can answer your question - there is no answer to your question.  Your question doesn't make 
sense yet.  Which of the four should be applied to the three, and why.
 

4 stars   May 11, 2006 - 8am Central time zone
Reviewer: Reader 
ID_PERSON PERSON_NO    FIRST_NM LAST_NM BIRTH_DT    
200         1        ABC     AAA     19991120    
201         2        DEF              00000000    
202         1        GHI     KLM     19921021    
203         2        NPQ              00000000    
204         1        RST     UVW     19721215    
205         2        XYZ              00000000    
206         1        TTT     GGG     19750324  


I need to update (last_nm,birth_dt) of ID_PERSON=201(person_no=2) with the value of 
(last_nm,birth_dt) in ID_PERSON=200, if the birth_dt of 201 is '00000000' and last_nm of 201 is 
null

I need to update (last_nm,birth_dt) of ID_PERSON=203(person_no=2) with the value of 
(last_nm,birth_dt) in ID_PERSON=202, if the birth_dt of 202 is '00000000' and last_nm of 202 is 
null

I need to update (last_nm,birth_dt) of ID_PERSON=205(person_no=2) with the value of 
(last_nm,birth_dt) in ID_PERSON=204, if the birth_dt of 204 is '00000000' and last_nm of 204 is 
null.

Thanks 


Followup   May 11, 2006 - 7pm Central time zone:

and how pray tell did you "know that"

logic - tell use the LOGIC here.

why does 200 go with 201, are they always sequential, is 200 a fixed constant or will that change.

Yes, it is now "semi-obvious" what you seem to be saying, but I want you to say it, out loud here.

What if the row with id_person = 202 is missing for example.  CONSIDER ALL OF THE CASES, be 
detailed, be explicit, state requirements - go 100% of the way. 

4 stars   May 11, 2006 - 11am Central time zone
Reviewer: Reader 
I am trying to do the following update, but I get the error
"ORA-01779: cannot modify a column which maps to a non key-preserved table "

Can you please let me know what is wrong here?


update 
(
  select t1.fm1,t1.lm1,t2.fm2,t2.lm2  
  from 
       (select p11.ID_PERSON id1
              ,p11.FIRST_NM fm1
          ,p11.LAST_NM lm1
              ,p11.birth_dt dob1
          ,p11.PERSON_NO per_no1
          ,p13.ID_BNFT T1_ID_BNFT
        from person p11
            ,person_i_bnft p12
        ,person_bnft p13
        where p11.ID_PERSON = p12.ID_PERSON
          and p12.ID_BNFT = p13.ID_BNFT
          and p11.PERSON_NO = 1
          and p11.BIRTH_DT <> '00000000'
      ) T1,
      (select p21.ID_PERSON id2
             ,p21.FIRST_NM fm2
         ,p21.LAST_NM lm2
             ,p21.birth_dt dob2
         ,p21.PERSON_NO per_no2
        ,p23.ID_BNFT T2_ID_BNFT
        from person p21
            ,person_i_bnft p22
        ,person_bnft p23
        where p21.ID_PERSON = p22.ID_PERSON
          and p22.ID_BNFT = p23.ID_BNFT
          and p21.PERSON_NO = 2
          and p21.BIRTH_DT = '00000000'
        ) T2
   where T1_ID_BNFT = T2_ID_BNFT)
set lm2=lm1
   ,dob2=dob1;

Thanks
 


Followup   May 11, 2006 - 7pm Central time zone:

did you search for "key preserved"

I only ask because those words have not yet appeared here at all and they seem like a logical thing 
to search for?

 

5 stars are these two queries the same...?   June 2, 2006 - 5am Central time zone
Reviewer: A reader 
Hi

I have a query which uses EXISTS I modified it to a join (reduced 50% logical I/O) however for the 
test data we have it seems correct but I am not 100% sure, may you have a look and see if they are 
the same?

SELECT *
  FROM BCOMMON c
 WHERE c.type = 'B'
   AND c.i_period = 1
   AND EXISTS (
          SELECT 1
            FROM BCOMMON c2
           WHERE c2.type = c.type
             AND c2.id_bonif_periodica = c.id_bonif
             AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601')


SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
AND to_char(c2.F_IMPUTACION, 'YYYYMMDD')='20060601' 


Followup   June 2, 2006 - 10am Central time zone:

No.

you don't have a predicate on the second query to make sure that c2.type is 'B'
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot just join (it'll 
increase the number of returned records)

and please, never do this:

AND TO_CHAR (c2.f_imputacion, 'YYYYMMDD') = '20060601'

do this:

and f_imputacion = to_date( '20060601', 'yyyymmdd' )


or, if the date column has a time component:

and f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and f_imputacion < to_date( '20060601', 'yyyymmdd' )+1

 

5 stars thanks   June 2, 2006 - 6pm Central time zone
Reviewer: A reader 
Hi Tom

Thanks for your valuable advices. 


5 stars join returns more records   June 5, 2006 - 3am Central time zone
Reviewer: A reader 
Hi Tom

I added a predicate to make sure that c2.type is 'B' so the query becomes

SELECT c.*
FROM BCOMMON c, BCOMMON c2
WHERE c.type='B'
AND c2.type='B'
AND c.i_period=1
AND c2.ID_BONIF_PERIODICA=c.ID_BONIF
and c2.f_imputacion >= to_date( '20060601', 'yyyymmdd' )
and c2.f_imputacion < to_date( '20060601', 'yyyymmdd' )+1

Since c2.ID_BONIF_PERIODICA is not unique I see more rows (with EXISTS only a rown is returned, 
with join two)

I guess join doesnt work in this case?

Sorry for my stupidness :-(
 


Followup   June 5, 2006 - 7am Central time zone:

but that is just what I said above:

...
if id_bonif_periodica is not unique for a given type/f_imputacion, you cannot 
just join (it'll increase the number of returned records)
.......



you can use IN, you can use EXISTS, you can join to an inline view that distincts - but if that 
column is not unique for the given attributes - you cannot "just join" really - it'll increase the 
number of records returned 

5 stars Merge exists and not exists into one SQL statement   October 10, 2006 - 11am Central time zone
Reviewer: Kandy Train from NY
Hi,

Hope this is an appropriate question for thread.
Here is an example,

drop table picking_tab;
drop table locations_tab;
create table picking_tab (pick_ticket_no number, location_id varchar2(10));
create table locations_tab (location_id varchar2(10), warehouse varchar2(10));

insert into picking_tab values(1001, 'Bos');
insert into picking_tab values(1002, 'Bos');
insert into picking_tab values(1002, 'Chi');
insert into picking_tab values(1003, 'Chi');

insert into locations_tab values('Bos', 'NorthEast');
insert into locations_tab values('Chi', 'MidWest');

commit;

What I would like to have is just one SQL statement so that I can pass one literal "NorthEast" or 
"MidWest" to a view with that SQL statement.

When I pass NorthEast, I want NorthEast specific Pick Tickets and any other pick ticket that picks 
from more locations, but at least one of them must be from NorthEast.

The results should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1002 Bos         NorthEast
          1002 Chi         MidWest
          1003 Chi         MidWest

When I pass MidWest, I only need MidWest Pick Tickets that are not shared with any other region.

The results should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1003 Chi         MidWest

The example may look silly, as I have to come up with something simple for you to help me.

I appreciate your help as always. 


Followup   October 10, 2006 - 7pm Central time zone:

why are there not two northeast records in the first result?

and you know, you do not "pass" things to a view, you apply predicates to a view, what do you mean 
by "pass"??? 

5 stars Merge exists and not exists into one SQL statement   October 10, 2006 - 11am Central time zone
Reviewer: Kandy Train from NY
In the above posting, I didn't mention what I have tried so far. I am putting it here in case you 
wanted to know it.

--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1003 Chi         MidWest

To get the above results, I used

select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and not exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'NorthEast');

--Results
PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1002 Bos         NorthEast
          1002 Chi         MidWest
          1003 Chi         MidWest

To get the above results, I used
select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'MidWest');

I want to have just one SQL where I could pass either "MidWest" or "NorthEast".

Thanks,


 


Followup   October 10, 2006 - 8pm Central time zone:

see above, why just one northeast, doesn't make sense. 

5 stars Merge exists and not exists into one SQL statement   October 10, 2006 - 8pm Central time zone
Reviewer: Kandy train from NY
Hi Tom,

You are right and my first result (in first posting) should look like,

PICK_TICKET_NO LOCATION_ID WAREHOUSE
-------------- ----------- ----------
          1001 Bos         NorthEast
          1002 Bos         NorthEast
          1002 Chi         MidWest

And this is the SQL statement I have used to get this results.

select p.pick_ticket_no, p.location_id, l.warehouse
from picking_tab p, locations_tab l
where p.location_id = l.location_id
and exists (select lt.warehouse 
            from picking_tab pt, locations_tab lt
            where pt.pick_ticket_no = p.pick_ticket_no
            and pt.location_id = lt.location_id
            and lt.warehouse = 'NorthEast');

As you can see, It is the same SQL statement, and the only difference is in EXISTS and NOT EXISTS 
clauses.

I have a ref cursor that has a return type. I tried to use native dynamic SQL by creating two views 
with these two sql statements and did hit a error, which is a limitation in 9i.

So, now I have to create a one view that can produce result sets in one SQL statement.

May be I am making it harder for you to understand.

If that's the case, I will post whole procedure togather with the ref cursor tomorrow.

Thanks,

 


Followup   October 10, 2006 - 8pm Central time zone:

ok, why just one midwest then, why not both. 

5 stars Merge exists and not exists into one SQL statement   October 11, 2006 - 6am Central time zone
Reviewer: Kandy Train from NY
Pick_TIcket_Id 1002 has two records (Picking from two locations).

When the SQL is run from MidWest, They should only get the Pick Tickets that are purely from 
MidWest. If a Pick Ticket is filled from NorthEast and MidWest, it is not handled from MidWest, but 
from NorthEast.

When the SQL is run from NorthEast, They should get all the
Pick tickets that at least have a record in pickcing_tab for the same pick_ticket_no.

Does that make sense??

Thanks, 


Followup   October 11, 2006 - 8am Central time zone:

nope, sorry it does not, this is sounding alot like "procedural logic" to me here, the rules seem 
arbitrary. 

5 stars Merge exists and not exists into one SQL statement   October 11, 2006 - 11am Central time zone
Reviewer: Kandy Train from NY
Thanks and I will use Procedural code to solve this issue.

Thanks again and appreciate your help 


4 stars Performance Issue when using Exists...   November 2, 2006 - 12pm Central time zone
Reviewer: VLS from Bombay, India
Hi Tom,

In recent Oracle Magazine, in one of the example, you mentioned that exists will perform better in 
cases where the columns of only one table is used in SELECT statement. For example, 

select x.a, x.b. x.c 
from x, y
where x.a = y.a
and   x.b = :b1

can be coded as

select x.a, x.b. x.c 
from x
where x.b = :b1
and  exists
(select 'x' from y 
where y.a = x.a)

But, in my cases, I have observed that the query performance degrades dratiscally when I change 
these type of queries to make use of EXISTS. For example, 

SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM       (select customer_id, custcode, birthdate, billcycle, csclimit,
    prgcode, rownum srno from customer_all
    where prgcode in 
    (select prgcode from hss_cust_segment where description='POST')) ca,
    contract_all co,
    curr_co_status ch,
    ccontact_all cc,
    directory_number dn,
    pricegroup_all prg,
    mputmtab tm,
    title t,
    storage_medium sm
WHERE     ca.customer_id = co.customer_id
AND     cc.customer_id = ca.customer_id
AND     cc.ccbill = 'X'
AND     cc.CCTITLE = t.TTL_ID(+)
AND     co.co_id = ch.co_id
AND     ch.ch_status IN ('a', 's')
AND     ca.prgcode = prg.prgcode
AND     co.tmcode = tm.tmcode
AND     tm.vscode = 1
and     exists
(select 'x' from contr_services_cap cs
where     cs.co_id = co.co_id
AND     cs.sncode = 1
AND     cs.dn_id  = dn.dn_id)
and    exists
(select 'x' from contr_devices cd
where    cd.co_id = co.co_id
AND     cd.cd_deactiv_date IS NULL
AND     cd.port_id = sm.sm_id);

Above query did not fetch result in 6 hours whereas below query executed in 4 hours.

SELECT dn.dn_num msisdn, ca.customer_id, ca.custcode, co.co_id,
t.ttl_des title, cc.ccfname fname, cc.cclname lname, ca.birthdate,
cc.ccemail email, ca.billcycle, ca.csclimit csc_limit, ca.prgcode,
prg.prgname, co.co_activated, ch.ch_status, co.tmcode, tm.shdes shdes,
sm.SM_PUK , cc.CCZIP , cc.ccline3 , ccline4 , ccline5 , cccity
FROM       (select customer_id, custcode, birthdate, billcycle, csclimit,
    prgcode, rownum srno from customer_all
    where prgcode in 
    (select prgcode from hss_cust_segment where description='POST')) ca,
    contract_all co,
    curr_co_status ch,
    ccontact_all cc,
    directory_number dn,
    pricegroup_all prg,
    mputmtab tm,
    title t,
    storage_medium sm,
    contr_devices cd,
    contr_services_cap cs
WHERE     ca.customer_id = co.customer_id
AND     cc.customer_id = ca.customer_id
AND     cc.ccbill = 'X'
AND     cc.CCTITLE = t.TTL_ID(+)
AND     co.co_id = ch.co_id
AND     ch.ch_status IN ('a', 's')
and    cs.co_id = co.co_id
AND     cs.sncode = 1
AND     cs.seqno = (SELECT MAX (seqno)
    FROM contr_services_cap x
    WHERE cs.co_id = x.co_id AND x.sncode = 1)
AND     ca.prgcode = prg.prgcode
AND     co.tmcode = tm.tmcode
AND     tm.vscode = 1
AND     cs.dn_id  = dn.dn_id
and    cd.co_id = co.co_id
AND     cd.cd_deactiv_date IS NULL
AND     cd.port_id = sm.sm_id;


Similarly, there were other incidence when exists has caused performance issues. Also, the tables 
used inside the exists condition are big and have proper indexes even though these fail.

What could be the problem ? Please help...

Regards
VLS 


Followup   November 2, 2006 - 12pm Central time zone:

I did not say "would work better"

I said "semantically and syntactically it would be MORE MEANINGFUL"

are you using the cbo. 

4 stars Query Performance with Exists   November 2, 2006 - 10pm Central time zone
Reviewer: VLS from Bombay, India
Yes Tom, I am using CBO. The inner query also uses a proper indexes. 

Regards
VLS 


Followup   November 3, 2006 - 9am Central time zone:

those two queries do not seem to be even remotely the same here?

how can you even compare them?  they do not return the same results. 

4 stars Difference between the 2 Queries   November 3, 2006 - 9am Central time zone
Reviewer: VLS from Bombay, India
Hi Tom,

Would appreciate if you can let me know the difference between the two queries. Second query is the 
original query and first one is modified by me. In the Original query, contr_services_cap is joined 
back to contr_services_cap to get max(seqno) so that only one co_id is returned. I tried to avoid 
this aggregation by using EXISTS and thought that aggregation should resource intensive.

Regards
VLS 


Followup   November 3, 2006 - 10am Central time zone:

that is precisely the bit that seems strange - how can they be equivalent?

one says "keep only the row with the max seqno"

one says "keep any row that has a mate over here"

first query - no predicate on seqno
second query - predicate on seqno

don't get how they are the same. 

4 stars In response to my previous post   November 3, 2006 - 12pm Central time zone
Reviewer: VLS from Bombay, India
Hi Tom,

"one says "keep only the row with the max seqno"" v/s one says "keep any row that has a mate over 
here". How does it matter ? For example,

Table A

Column x    Column y
------------------------
100        128
100        290
100        300
100        180

200        768
200        190

300        650

select x from a o
where y = (select max(y) from a i where i.x=o.x.x)

will be 

100
200
300 
No matter 300 has only one value but this one value will be a max for this value of x. So, anyway 
each of x value will have a max. I used this logic and hence instead of taking max used EXISTS. 

Now, for the predicates on SEQNO, this predicate is only used to get a single row for each co_id. 
Again using max. This was not required when I used EXISTS.

Regards
VLS
 


Followup   November 3, 2006 - 2pm Central time zone:

you cannot compare the two queries you have.

where exists (something)

where something = (select max(somethingelse) 


are not comparable.  Show me the WHERE exists that is comparable to you:

select x from a o
where y = (select max(y) from a i where i.x=o.x.x)
 
please!!

I don't have (nor do I want) your schema
I don't know your relationships
I did not decompose your query 

I do however have a strong feeling "they just ain't the same" 

5 stars exits perform better?   January 18, 2007 - 9am Central time zone
Reviewer: A reader 
Hi Tom
I had a query like this : which was doing over 2250 LIO/execute -

select  DISTINCT
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type, 'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION', 'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL, 'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod BO128_Accounts_Mod, ao_demomod BO129_Demographic_Mod
where BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID AND
( ( ( ( ( ( ( ( ( BO128_Accounts_Mod.MakerID = 1 )  OR (
BO128_Accounts_Mod.ownedUserID =  400013 ) ) )  OR (
BO128_Accounts_Mod.AssignedTo =  400013 ) ) )  AND ( ( ( ( (
BO128_Accounts_Mod.RecordStatus = 'F' )  OR (
BO128_Accounts_Mod.RecordStatus = 'F' ) ) )  OR (
BO128_Accounts_Mod.RecordStatus = 'F' ) ) ) ) ) AND (rownum <= 201) ) )
order by BO128_Accounts_Mod.Cust_Last_Name Asc

call  count    cpu  elapsed    disk    query  current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse    1    0.01    0.01      0      0      0      0
Execute    1    0.00    0.00      0      0      0      0
Fetch    15    0.10    0.08      0    443      0      201
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    17    0.11    0.09      0    443      0      201

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38 (CRMUSER)

Rows  Row Source Operation
------- ---------------------------------------------------
  201 SORT UNIQUE (cr=2252 pr=0 pw=0 time=951615 us)
  201  COUNT STOPKEY (cr=443 pr=0 pw=0 time=4815 us)
  201  NESTED LOOPS (cr=443 pr=0 pw=0 time=4812 us)
  206  TABLE ACCESS FULL AO_ACCTMOD (cr=235 pr=0 pw=0 time=3567 us)
  201  INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=208 pr=0 pw=0 time=1943 us)(object id 18490)

when I changed it to like below (used exists instead of distinct) then it is doing alot less LIO's and peforms better too (in terms of response time) .

select
BO128_Accounts_Mod.Entity_cre_flag,
BO128_Accounts_Mod.MainTableID, BO128_Accounts_Mod.BOCreatedBy,
BO128_Accounts_Mod.Cust_Last_Name, BO128_Accounts_Mod.Cust_First_Name,
BO128_Accounts_Mod.OrgKey,
fn_CategoryLookup(BO128_Accounts_Mod.Cust_Type, 'ACCOUNT_TYPE','en_US'),
BO128_Accounts_Mod.Cust_Type,
BO128_Accounts_Mod.MakerID, BO128_Accounts_Mod.MakerID,
BO128_Accounts_Mod.ownedUserID, BO128_Accounts_Mod.AssignedTo,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.ownedUserID,
BO128_Accounts_Mod.AssignedTo, BO128_Accounts_Mod.AssignedToGroup,
fn_CategoryLookup(BO128_Accounts_Mod.StageName,'PROCESSSTEP_STAGE','en_US'),
BO128_Accounts_Mod.StageName,
fn_CategoryLookup(BO128_Accounts_Mod.RecordStatus,'RECORD_STATUS','en_US'),
BO128_Accounts_Mod.RecordStatus,
fn_CategoryLookup(BO128_Accounts_Mod.LastOperPerformed,'LAST_OPERATION', 'en_US'),
BO128_Accounts_Mod.LastOperPerformed,
BO128_Accounts_Mod.ProcessID, decode(BO128_Accounts_Mod.AccountID,NULL,NULL, 'srmBOObj:AccountModBO/AccountModBO.accountID:' || BO128_Accounts_Mod.AccountID),
BO128_Accounts_Mod.GroupID,
BO128_Accounts_Mod.createdUserID, BO128_Accounts_Mod.AccountID,
BO128_Accounts_Mod.AccessOwnerGroup, BO128_Accounts_Mod.LastEditedPage,
BO128_Accounts_Mod.Cust_DOB, BO128_Accounts_Mod.TFPartyFlag,
BO128_Accounts_Mod.Negated, BO128_Accounts_Mod.BlackListed,
BO128_Accounts_Mod.Suspended, BO128_Accounts_Mod.IsMCEdited,
BO128_Accounts_Mod.AssignedByUserID, BO128_Accounts_Mod.IsTampered,
BO128_Accounts_Mod.OwnerGroup
From
ao_acctmod BO128_Accounts_Mod
where exists ( select null from
      ao_demomod BO129_Demographic_Mod
        where
        BO128_Accounts_Mod.AccountID = BO129_Demographic_Mod.AccountID AND
( ( BO128_Accounts_Mod.MakerID = 1 )  OR ( BO128_Accounts_Mod.ownedUserID =  400020 )  OR ( BO128_Accounts_Mod.AssignedTo =  400020 ) )  AND ( BO128_Accounts_Mod.RecordStatus = 'F'  OR BO128_Accounts_Mod.RecordStatus = 'F'  OR  BO128_Accounts_Mod.RecordStatus = 'F' )  )
AND rownum <= 201
order by BO128_Accounts_Mod.Cust_Last_Name Asc

call  count    cpu  elapsed    disk    query  current    rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse    1    0.00    0.00      0      0      0      0
Execute    1    0.00    0.00      0      0      0      0
Fetch    15    0.07    0.17      0    735      0      201
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total    17    0.07    0.17      0    735      0      201

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 38 (CRMUSER)

Rows  Row Source Operation
------- ---------------------------------------------------
  201 COUNT STOPKEY (cr=735 pr=0 pw=0 time=5822 us)
  201  FILTER (cr=735 pr=0 pw=0 time=5618 us)
  206  TABLE ACCESS BY INDEX ROWID AO_ACCTMOD (cr=323 pr=0 pw=0 time=2705 us)
  206  INDEX FULL SCAN IX_AO1_CUST_LAST_NAME (cr=17 pr=0 pw=0 time=225 us)(object id 18495)
  201  FILTER (cr=412 pr=0 pw=0 time=2461 us)
  201  INDEX RANGE SCAN IX_AO_DEMO_ACCID (cr=412 pr=0 pw=0 time=2071 us)(object id 18490)

my question - what is the reason - how distinct peforms poor than exists..

Ajeet

4 stars change this query   March 26, 2007 - 7am Central time zone
Reviewer: A reader 
select PGM_VEH_CONFIG.CNTRY_CD
    , PGM_VEH_CONFIG.LANG_CD
    , PGM_VEH_CONFIG.BUSNS_FCN_USE_CD
    , PGM_VEH_CONFIG.MODL_YR_NBR
    , PGM_VEH_CONFIG.PVC_ID
    , RTRIM(PGM_VEH_CONFIG.PVC_DESC)
    , PGM_VEH_CONFIG.PVC_MSRP
    , TO_CHAR(PGM_VEH_CONFIG.PVC_EFCTV_DT,'yyyy-mm-dd')
    , TO_CHAR(PGM_VEH_CONFIG.PVC_TERMNT_DT,'yyyy-mm-dd')
    , PGM_VEH_CONFIG.SELLNG_SRC_CD
    , PGM_VEH_CONFIG.NAMPLT_CD
    , PGM_VEH_CONFIG.MDSNG_MODL_DESGTR
    , PGM_VEH_CONFIG.USG_SEQ_NBR
    , PGM_VEH_CONFIG.MKTG_BODY_STYLE_CD
    , PGM_VEH_CONFIG.SUB_BRAND_CD
    , PGM_VEH_CONFIG.PACKG_OPTN_CD
    , PGM_VEH_CONFIG.SEGMENT_CD
    , PGM_VEH_CONFIG.COMPRS_MODL_CD
    , PGM_VEH_CONFIG.CONFIG_MODL_CD
    , PGM_VEH_CONFIG.OPTN_USG_STMT
  from  gmceycea.PGM_VEH_CONFIG
    , gmceycea.PRICE_RANGE
  where PGM_VEH_CONFIG.BUSNS_FCN_USE_CD  ='038'
    and PGM_VEH_CONFIG.CNTRY_CD        = PRICE_RANGE.CNTRY_CD
    and PGM_VEH_CONFIG.LANG_CD        = PRICE_RANGE.LANG_CD
    and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD  = PRICE_RANGE.BUSNS_FCN_USE_CD
    and PGM_VEH_CONFIG.PVC_MSRP between PRICE_RANGE.START_PRICE_RANGE and PRICE_RANGE.END_PRICE_RANGE
AND  EXISTS (SELECT * from gmceycea.PVC_COLR
      where pvc_colr.cntry_cd      = pgm_veh_config.cntry_cd
      and pvc_colr.lang_cd        = pgm_veh_config.lang_cd
      and pvc_colr.busns_fcn_use_cd  = pgm_veh_config.busns_fcn_use_cd
      and pvc_colr.modl_yr_nbr      = pgm_veh_config.modl_yr_nbr
      and pvc_colr.pvc_id          = pgm_veh_config.pvc_id
      and pvc_colr.busns_fcn_use_cd  = '038'
      and PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038');

if PGM_VEH_CONFIG.BUSNS_FCN_USE_CD= '038'
column having the value 038 then EXISTS part should work otherwise exists part should not work.
kindly rephrase it.









Followup   March 26, 2007 - 11am Central time zone:

huh?
3 stars Question   March 27, 2007 - 9am Central time zone
Reviewer: Reader 
Hello,
How do we make Oracle to use index if the query has NOT IN?

SELECT *
FROM ORDERS
WHERE STATUS NOT IN ('NEW','OLD')

I created a Function Based Index.

CREATE INDEX IDX_FBI ON ORDERS (DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'

SELECT *
FROM ORDERS
WHERE DECODE(STATUS,'NEW','1','OLD','1',0)) = '0'

It does a Full table scan but when I do a COUNT(*) it does a Fast Full Index scan. How to use index while using NOT IN?

Thanks

Followup   March 27, 2007 - 10am Central time zone:

no really, what index did you create - given the one you demonstrate does not work.

and what version are you using.


ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table t;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t
ops$tkyte%ORA10GR2> as
ops$tkyte%ORA10GR2> select decode( mod(rownum,3),0,'OLD',1,'NEW'  ) ostatus, a.*
ops$tkyte%ORA10GR2>   from all_objects a;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2> drop index t_idx;

Index dropped.

ops$tkyte%ORA10GR2> create index t_idx on t( decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) );

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where  decode( ostatus, 'NEW', null, 'OLD', null, 'YES' ) = 
'YES';

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500 | 48000 |    44   (0
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |   500 | 48000 |    44   (0
|*  2 |   INDEX RANGE SCAN          | T_IDX |   200 |       |    35   (0
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(DECODE("OSTATUS",'NEW',NULL,'OLD',NULL,'YES')='YES')

ops$tkyte%ORA10GR2> set autotrace off


3 stars   March 28, 2007 - 10am Central time zone
Reviewer: A reader 
Tom,

I have a problem with a select I am not able to built

consider that

select * from t
where decode (:x, null,'999', t.a) in (select t2.a from t2 where t2.a = :x)

This is not working when :x is null

When :x is null I will normally have this
select * from t
where '999'in (select t2.a from t2 where t2.a = '')

What I would like to have in this case is a sort of
select * from t
where '999' in ('999')

Could you please help me

Thanks a lot for your help


Followup   March 28, 2007 - 12pm Central time zone:

... This is not working when :x is null ...

sure it is, it works perfectly. It just probably doesn't do what you want - that is something else altogether.

select *
  from t
 where :x is null or a in (select t2.a from t2 where t2.a = :x)


seems to say exactly what you meant.

4 stars NOT IN too slow   May 4, 2007 - 2am Central time zone
Reviewer: Sagar from Mumbai India
Hi Tom,
    I have one small NOT IN query on 2 tables which when run on 9i database takes more than 2 hours, I had to kill the session without ouput.But IN works in 2 seconds!.
  I copied same tables and created same indexes in 10gR2 , now the query takes around 5-6 minutes in 10gR2.
  For Details see below.I am interested in knowing what causes consistent gets to be so large. There is no one connected to the user other than me.


SQL> select count(1) from camps_upload ;

  COUNT(1)
----------
     43656

Elapsed: 00:00:02.65

Execution Plan
----------------------------------------------------------
Plan hash value: 889239046

---------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |   184   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |              |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_UPLOAD | 43656 |   184   (2)| 00:00:03 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        805  consistent gets
        781  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(1) from camps_master_upload ;

  COUNT(1)
----------
     38266

Elapsed: 00:00:07.71

Execution Plan
----------------------------------------------------------
Plan hash value: 1748044587

----------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |   182   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |                     |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD | 38266 |   182   (2)| 00:00:03 |
----------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        338  recursive calls
          0  db block gets
        852  consistent gets
        792  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select bytes/(1024*1024) usage from user_segments where segment_name  IN 
('CAMPS_UPLOAD','CAMPS
_MASTER_UPLOAD') ;

     USAGE
----------
         7
         7

Elapsed: 00:00:04.75

Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view


SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
       2017  recursive calls
          0  db block gets
       4111  consistent gets
        309  physical reads
          0  redo size
        444  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         56  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL> select count(1) from camps_upload where 
  2  (folio_no,comp_code) in ( select folio_no,comp_code from camps_master_upload)  ;

  COUNT(1)
----------
     38121

Elapsed: 00:00:02.50

Execution Plan
----------------------------------------------------------
Plan hash value: 123087718

-------------------------------------------------------------------------------------------
| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI |                  |     1 |    26 |    66   (7)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| IND_CMP_UPD      | 43656 |   554K|    34   (3)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        551  recursive calls
          0  db block gets
        380  consistent gets
        273  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload where
  2* (folio_no,comp_code) not in ( select folio_no,comp_code from camps_master_upload)
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:05:57.61

Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
         20  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 


Even though index is not used why consistent gets are so abnormally high?.

   

Followup   May 4, 2007 - 1pm Central time zone:

the camps_master_upload table was full scanned 43,656 times.
4 stars NOT IN too slow   May 5, 2007 - 1am Central time zone
Reviewer: Sagar from Mumbai , India
Thanks Tom,
        I have used "NOT EXIST" instead of "NOT IN" and the query executes within 3-4 seconds.Both tables do not have NULLs in the columns used in the query.


SQL> ED
Wrote file afiedt.buf

  1  select  COUNT(1) FROM camps_upload a
  2   where   not exists   ( select 1  from camps_master_upload b where a.folio_no =b.folio_no
  3* and a.comp_code = b.comp_code  )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:00:03.46

Execution Plan
----------------------------------------------------------
Plan hash value: 1970687259

--------------------------------------------------------------------------------
-----------

| Id  | Operation              | Name             | Rows  | Bytes | Cost (%CPU)|
 Time     |

--------------------------------------------------------------------------------
-----------

|   0 | SELECT STATEMENT       |                  |     1 |    26 |   220   (5)|
 00:00:03 |

|   1 |  SORT AGGREGATE        |                  |     1 |    26 |            |
          |

|*  2 |   HASH JOIN RIGHT ANTI |                  | 43655 |  1108K|   220   (5)|
 00:00:03 |

|   3 |    INDEX FAST FULL SCAN| IND_CMP_MAST_UPD | 38266 |   485K|    30   (4)|
 00:00:01 |

|   4 |    TABLE ACCESS FULL   | CAMPS_UPLOAD     | 43656 |   554K|   188   (4)|
 00:00:03 |

--------------------------------------------------------------------------------
-----------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."FOLIO_NO"="B"."FOLIO_NO" AND "A"."COMP_CODE"="B"."COMP_CODE")


Statistics
----------------------------------------------------------
         17  recursive calls
          0  db block gets
        939  consistent gets
        239  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


   
    Can you please explain how "NOT IN" works as "IN" executes within 3-4 seconds even if we use no indexes.This will be helpful for all.


Thanks in advance.






Wrote file afiedt.buf

  1   select  COUNT(1) FROM camps_upload
  2    where (folio_no,comp_code)   in
  3*   ( select folio_no,comp_code from camps_master_upload)
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:12.18

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

--------------------------------------------------------------------------------
-------------

| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU
)| Time     |

--------------------------------------------------------------------------------
-------------

|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4
)| 00:00:05 |

|   1 |  SORT AGGREGATE       |                     |     1 |    26 |
 |          |

|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4
)| 00:00:05 |

|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3
)| 00:00:03 |

|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4
)| 00:00:03 |

--------------------------------------------------------------------------------
-------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        398  recursive calls
          0  db block gets
       1662  consistent gets
        559  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
          1  rows processed



Followup   May 8, 2007 - 10am Central time zone:

well, I don't know about you - but I see a "not exists" and an "in"

that is

I see entirely different queries here that do entirely different things and return entirely different answers.
4 stars NOT IN too slow   May 9, 2007 - 1am Central time zone
Reviewer: Sagar from Mumbai India.
Thanks for the reply Tom,
    Actually my question was continuation of the previously asked quetion just above.
    Anyway here is the problem I am facing.I have a simple "IN" query which executes in less than 5 seconds.If I put "NOT IN" in place of "IN" the query takes more than 5 minutes.Both the tables have around 40,000 records.You had replied that one of the table is full scanned 40,000 times in case of "NOT IN".Then why it is not full scanned it case of "IN"?.

Thanks in advance.


SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:04.28

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
        654  recursive calls
          0  db block gets
       1704  consistent gets
       1596  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3  (folio_no,comp_code) not in
  4* ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:05:28.67

Execution Plan
----------------------------------------------------------
Plan hash value: 3220892056

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    13 |  7228K  (3)| 24:05:45 |
|   1 |  SORT AGGREGATE     |                     |     1 |    13 |            |          |
|*  2 |   FILTER            |                     |       |       |            |          |
|   3 |    TABLE ACCESS FULL| CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
|*  4 |    TABLE ACCESS FULL| CAMPS_MASTER_UPLOAD |     1 |    13 |   185   (3)| 00:00:03 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   19545709  consistent gets
       1591  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

   
   

Followup   May 11, 2007 - 9am Central time zone:

because in and not in are ENTIRELY DIFFERENT

I don't know why you would even consider comparing them.


3 stars NOT IN too slow   May 12, 2007 - 4am Central time zone
Reviewer: Sagar from Mumbai , India
Thanks for the reply Tom,
    Yes IN and NOT IN are different, but from point of view of algorithm for finding elements from the set A also in the set B (the IN process ) and finding elements in set A that are not in set B ( the NOT IN process) they are very much similar.
     

Followup   May 14, 2007 - 12pm Central time zone:

huh, you are kidding right?

they are very different.

think about this - you are asked to find "where x = 5", that is very very very different from "where x <> 5"

think about what you just said. Other than the fact both are written in SQL, there is nothing similar about them at all.
3 stars NOT IN too slow   May 16, 2007 - 5am Central time zone
Reviewer: Sagar from Mumbai , India
Thanks for the reply Tom,
          I am still not convinced.

Consider this manual process.

There is person having 2 boxes containing some objects.

Box1 has 2 objects.
Box2 has 3 objects.

1. He is told to find out objects from box1 that are also in box2.

Suppose he takes 1 minute to finish.This is the IN process.

2. Now we ask him to find objects from box1 that are not there in box2. This is NOT IN process.

For this he takes 100 minutes!.
(the eg. I showed you for IN executes within 4 seconds while for NOT IN with same data it takes 400 seconds)

In this case can we believe him when he takes 100 minutes for this ?.

IN and NOT IN have similar algorithm only the difference is that search is carried out till last element in the case of NOT IN but I don't think this can justify the delay.







Followup   May 16, 2007 - 10am Central time zone:

do you understand why

where x = 5

where x <> 5

would pretty much be processed very differently???

eg: where x = 5, obviously, a candidate for an index - find 5, get data

where x <> 5, obviously - NOT a candidate for an index, indexes are using to "find" stuff, not "not find" stuff.


in and not in are like NIGHT and DAY. eg: DIFFERENT.
3 stars NOT IN and filter   May 16, 2007 - 7pm Central time zone
Reviewer: Jonathan Lewis from NY NY
Sagar, in one of your posts you say: "Both tables do not have NULLs in the columns used in the query." However, the predicate section of the execution plan shows that you have not declared these columns to be NOT NULL:

   2 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "CAMPS_MASTER_UPLOAD"
              "CAMPS_MASTER_UPLOAD" WHERE LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2)))
   4 - filter(LNNVL("FOLIO_NO"<>:B1) AND LNNVL("COMP_CODE"<>:B2))


If you do not declare the columns as NOT NULL, the optimizer has to assume that they may contain nulls, hence the use of the lnnvl() function calls that disable the use of indexes, and leave Oracle doing a tablescan of the camps_master_upload table for (potientially) every row in the camps_upload table.

Declare the columns not null, or add '{column} is not null' predicates for all for column referenced, and you would probably get the same anti-join as you did with the 'not exists' version of the query. For related comments see:
http://jonathanlewis.wordpress.com/2007/02/25/not-in/








5 stars NOT IN too slow   May 17, 2007 - 6am Central time zone
Reviewer: Sagar from Mumbai , India.
Thanks Jonathan!!!
Jonathan you are absolutely right!!!.Thank you very much.
I just added not null constraint to the referenced columns and its DONE ;).See the output below.Its good example for oracle fans.

Also thanks Tom for taking his very precious time for this query.


SQL> alter table camps_master_upload modify (folio_no not null , comp_code not null ) ; 

Table altered.

SQL> alter table camps_upload modify (folio_no not null , comp_code not null ) ; 

Table altered.

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code)  in ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
     38121

Elapsed: 00:00:00.54

Execution Plan
----------------------------------------------------------
Plan hash value: 474859051

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|                     |     1 |    26 |   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        413  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> ed
Wrote file afiedt.buf

  1  select count(1) from camps_upload
  2  where
  3* (folio_no,comp_code) not  in ( select folio_no,comp_code from camps_master_upload )
SQL> /

  COUNT(1)
----------
      5535

Elapsed: 00:00:00.16

Execution Plan
----------------------------------------------------------
Plan hash value: 1887649868

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    26 |   375   (4)| 00:00:05 |
|   1 |  SORT AGGREGATE       |                     |     1 |    26 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|                     | 43655 |  1108K|   375   (4)| 00:00:05 |
|   3 |    TABLE ACCESS FULL  | CAMPS_MASTER_UPLOAD | 38266 |   485K|   185   (3)| 00:00:03 |
|   4 |    TABLE ACCESS FULL  | CAMPS_UPLOAD        | 43656 |   554K|   188   (4)| 00:00:03 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FOLIO_NO"="FOLIO_NO" AND "COMP_CODE"="COMP_CODE")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1601  consistent gets
          0  physical reads
          0  redo size
        412  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



5 stars NOT IN too slow...   May 17, 2007 - 8am Central time zone
Reviewer: Sagar from Mumbai , India.
Note that there are no indexes on both the tables.





Followup   May 17, 2007 - 11am Central time zone:

all I've been trying to get you to realize is:

not in and in are as fundamentally different as night and day are.

not in, especially with regards to nullality, is hugely different.
5 stars Thanks Tom   May 17, 2007 - 12pm Central time zone
Reviewer: Sagar from Mumbai , India.
Thanks once again Tom,
          Yes I got it now. Each day I learn something new here.Keep up the great work.I am a Tom Kyte fan ;).



3 stars I need this   October 23, 2007 - 7am Central time zone
Reviewer: rebeccah tukisi from south africa,midrand
good afternoon Tom

please help me to register on your site 
so that i can email question too, i kind of new to the site and the oracle world

do you also help with the discoverer queries/reports

from rebeccah


Followup   October 23, 2007 - 11am Central time zone:

nobody registers here.

when i have time, I permit new questions to be asked - it is first come, first serve and catch as catch can.
3 stars   March 10, 2008 - 7am Central time zone
Reviewer: Sachin from India
Hi Tom ,

Look at below  sql query , 
insert into rp_alert_cdr
          select
          /*+ INDEX(c, CDR_IDX2)*/   
          al_number_seq.currval,
     c.*
            from
        cdr c
     where
    c.prof_num = lv_high_usage_data(i).plg_prof_num and
    c.insert_dt >= lv_start_time and
    c.insert_dt <= lv_end_time and
    c.prof_num is not null and
    c.toll_value > 0 and
    c.b_intl_ind = 2 and
    c.b_off_hook_ind = 1 and
    c.orig_type <> 1 and
    upper(c.DIALLED_DIGITS) Not Like '101%'  and 
    c.term_type <> 0 and 
nvl(c.cic,999999) not in ('*V','*U');  -- added new clause



My process is ,just inserting records from CDR table to RP_ALERT_CDR only those record which 
condition is qulifiy the where clause. this was running as per need,, now i have added new 
condition.

"and  nvl(c.cic,999999)  not in ('*V','*U')"  -- added new clause

cdr table has CIC column varchar2(5) datatype. it can contain null record also . 
in current time , we have only *V,*U and null record in CIC column, after adding condition with out 
using NVL
it gives 0 record
and when we use query with nvl function it give desire 
result.






 


Followup   March 10, 2008 - 11am Central time zone:

"and  nvl(c.cic,999999)  not in ('*V','*U')"  -- added new clause

cdr table has CIC column varchar2(5) datatype.



one utterly and completely wonders why you would nvl to a number then doesn't one? If you have a varchar2(5) - what could be the possible logic behind nvl'ing to 999,999???

but anyway - you have this column cic

it has in it:

a) *V
b) *U
c) a null value


Now, you want to keep all records where cic is not in *V, *U

Well, null is neither NOT IN, nor IN ('*v','*u', NULL is "not known"

so, where null not in (a,b) - is NOT TRUE (nor is it false)
where NOT(null not in (a,b)) - is NOT TRUE (nor is it false)

they are unknown.


so, currently, seems what you really meant to say was:

where cic is null


plain and simple

4 stars   March 12, 2008 - 1am Central time zone
Reviewer: sachin from india
Thanks Tom ,

i was missing this. 


5 stars are these 2 SQLs are same   March 14, 2008 - 6am Central time zone
Reviewer: Reene 
Hi Tom

I have a SQL developed by some developer.

select  ooh.order_number,count(ool.line_id)
from       ont.oe_order_headers_all ooh,
           ont.oe_order_lines_all ool,
           wsh.wsh_delivery_Details wdd
           where
          exists
          ( select  */  'X'
          from    ont.oe_order_headers_all oh,
                  ont.oe_order_lines_all ol,
                  wsh.wsh_delivery_Details wdd,
                  ont.OE_SYSTEM_PARAMETERS_ALL osp
                  where oh.header_id=ol.header_id and
                   ol.line_id=wdd.source_line_id (+) and
                  ( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
                   oh.org_id=3171 and
                   oh.order_source_id=1203 and
                   nvl(wdd.released_status,'x') <> 'C' and
                   nvl(wdd.released_status,'x') <> 'D' and
                   ol.shippable_flag='Y' and
                   ooh.order_number = oh.order_number
         ) and
          ooh.order_source_id=1203  and
          ooh.header_id=ool.header_id and
          nvl(wdd.released_status,'x') <> 'D' and
          ool.line_id=wdd.source_line_id (+) and
          ool.shippable_flag='Y'
group by ooh.order_number
/

I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can 
be re-wrtitten as :

select     oh.order_number,count(ol.line_id)
from       ont.oe_order_headers_all oh,
           ont.oe_order_lines_all ol,
           wsh.wsh_delivery_Details wdd,
           ont.OE_SYSTEM_PARAMETERS_ALL osp
where
          oh.order_source_id=1203  and
          oh.header_id=ol.header_id and
          oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
          oh.org_id=3171 and
          nvl(wdd.released_status,'x') <> 'D' and
          nvl(wdd.released_status,'x') <> 'C' and
          ol.line_id=wdd.source_line_id (+) and
          ol.shippable_flag='Y'
group by oh.order_number
/

is it correct .

Thanks


Followup   March 15, 2008 - 9am Central time zone:

run it - and observe the two totally different answers you will likely get.

no, definitely not the same. 
5 stars typo in above post   March 14, 2008 - 6am Central time zone
Reviewer: Reene 
select  ooh.order_number,count(ool.line_id)
from       ont.oe_order_headers_all ooh,
           ont.oe_order_lines_all ool,
           wsh.wsh_delivery_Details wdd
           where
          exists
          ( select   'X'
          from    ont.oe_order_headers_all oh,
                  ont.oe_order_lines_all ol,
                  wsh.wsh_delivery_Details wdd,
                  ont.OE_SYSTEM_PARAMETERS_ALL osp
                  where oh.header_id=ol.header_id and
                   ol.line_id=wdd.source_line_id (+) and
                  ( oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
                   oh.org_id=3171 and
                   oh.order_source_id=1203 and
                   nvl(wdd.released_status,'x') <> 'C' and
                   nvl(wdd.released_status,'x') <> 'D' and
                   ol.shippable_flag='Y' and
                   ooh.order_number = oh.order_number
         ) and
          ooh.order_source_id=1203  and
          ooh.header_id=ool.header_id and
          nvl(wdd.released_status,'x') <> 'D' and
          ool.line_id=wdd.source_line_id (+) and
          ool.shippable_flag='Y'
group by ooh.order_number
/

I tried to simplify it -as its tkprof shows a poor execution plan - i think that the above SQL can 
be re-wrtitten as :

select     oh.order_number,count(ol.line_id)
from       ont.oe_order_headers_all oh,
           ont.oe_order_lines_all ol,
           wsh.wsh_delivery_Details wdd,
           ont.OE_SYSTEM_PARAMETERS_ALL osp
where
          oh.order_source_id=1203  and
          oh.header_id=ol.header_id and
          oh.ORDER_TYPE_ID = osp.ATTRIBUTE1 and osp.attribute1 is not null ) and
          oh.org_id=3171 and
          nvl(wdd.released_status,'x') <> 'D' and
          nvl(wdd.released_status,'x') <> 'C' and
          ol.line_id=wdd.source_line_id (+) and
          ol.shippable_flag='Y'
group by oh.order_number
/

is it correct .

Thanks



5 stars correct   April 7, 2008 - 4am Central time zone
Reviewer: Reene 
Thanks Tom,

even though the number of records fethced were same,but the data was different. that is 
order_numbers were same but the counts were different.thanks for poiniting it out.
understanding them a bit better now.

thanks


5 stars   July 23, 2008 - 12pm Central time zone
Reviewer: A reader from NYC, USA
Tom,
After reading this thread, i am running the IN & EXIST query for emp and dept table. Both the query shows the same execution plan. But it should be different. For IN query,optimizer should use index scan on emp table. For exists query, optimizer should use FTS on emp table. Please correct me if i am wrong..


SQL> select count(*) from emp;

  COUNT(*)
----------
    458752
 

SQL> select count(*) from dept;

  COUNT(*)
----------
         2
 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'EMP',ESTIMAT
E_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> 
SQL>  EXECUTE DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'SQLADMIN',TABNAME => 'DEPT',ESTIMA
TE_PERCENT => 10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> 

SQL> select count(ename) from emp where deptno in(select deptno from dept);

COUNT(ENAME)
------------
         100


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)

   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> SELECT COUNT(ENAME) from emp where exists (select null  from dept where dept.deptno =
 emp.deptno);

COUNT(ENAME)
------------
         100


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=335 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     NESTED LOOPS (SEMI) (Cost=335 Card=228355 Bytes=2968615)
   3    2       TABLE ACCESS (FULL) OF 'EMP' (Cost=335 Card=456710 Byt
          es=4567100)

   4    2       INDEX (UNIQUE SCAN) OF 'SYS_C0010206' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5485  consistent gets
          0  physical reads
          0  redo size
        307  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>


4 stars Query Tuning   November 10, 2008 - 3am Central time zone
Reviewer: Thakur Manoj from India
<code>Dear Sir

This query is taking 3 seconds to execute it

select distinct
      atcf.base_entity_id
      , be.entity_name_full As Cutomer_Name
      , mbf.bank_facility_name AS Account_No
      , atcf.proposed_limit AS Amount_Sanction
      , atcf.os_balance AS Amount_Outstand
      , rfdt.filter_date
      , bd.mst_borrower_state_id
  from  aud_trn_c3_fac atcf
      , rvw_cas_getchkfacdt rfdt
      , base_entity be
      , rvw_cas_stmt_borr_dtls bd
      , mst_bank_facility mbf
Where  atcf.trans_datetime = rfdt.facility_dt and
      atcf.base_entity_id = rfdt.base_entity_id and
      atcf.base_entity_id = be.id and
      atcf.id = rfdt.trn_c3_fac_id and
      rfdt.base_entity_id = bd.base_entity_id and
      rfdt.sub_process_id = bd.sub_process_id and
      rfdt.filter_date = bd.filter_date and
      rfdt.base_entity_id = be.id and
      atcf.mst_bank_facility_id = mbf.id and
      mbf.mst_facilitytype_id = 1 and
      (atcf.base_entity_id, atcf.id) not in
      (
        select distinct tss.base_entity_id,
        tss.trn_prop_facility_id
        from
        rvw_cas_stmt_sanc_sec_dtls tss)
       

Please check the explain Plan

Execution Plan
----------------------------------------------------------
  0    SELECT STATEMENT Optimizer=CHOOSE (Cost=292 Card=1 Bytes=153
      )

  1  0  SORT (UNIQUE) (Cost=228 Card=1 Bytes=153)
  2  1  FILTER
  3  2    MERGE JOIN (Cost=226 Card=1 Bytes=153)
  4  3      SORT (JOIN) (Cost=64 Card=1 Bytes=105)
  5  4      MERGE JOIN (Cost=63 Card=1 Bytes=105)
  6  5        SORT (JOIN) (Cost=14 Card=48 Bytes=3696)
  7  6          HASH JOIN (Cost=12 Card=48 Bytes=3696)
  8  7          HASH JOIN (Cost=7 Card=48 Bytes=2352)
  9  8            TABLE ACCESS (BY INDEX ROWID) OF 'MST_BANK
      _FACILITY' (Cost=2 Card=29 Bytes=754)

10  9              INDEX (RANGE SCAN) OF 'IDX_MST_BANK_FACI
      LITY' (NON-UNIQUE) (Cost=1 Card=29)

11  8            TABLE ACCESS (FULL) OF 'AUD_TRN_C3_FAC' (C
      ost=4 Card=48 Bytes=1104)

12  7          TABLE ACCESS (FULL) OF 'BASE_ENTITY' (Cost=4
      Card=1567 Bytes=43876)

13  5        SORT (JOIN) (Cost=49 Card=1 Bytes=28)
14  13          VIEW OF 'RVW_CAS_GETCHKFACDT' (Cost=48 Card=1
      Bytes=28)

15  14          SORT (GROUP BY) (Cost=48 Card=1 Bytes=63)
16  15            MERGE JOIN (Cost=46 Card=1 Bytes=63)
17  16              SORT (JOIN) (Cost=10 Card=1 Bytes=26)
18  17              HASH JOIN (Cost=8 Card=1 Bytes=26)
19  18                TABLE ACCESS (FULL) OF 'TRN_C3_FAC'
      (Cost=3 Card=328 Bytes=2952)

20  18                TABLE ACCESS (FULL) OF 'AUD_TRN_C3_F
      AC' (Cost=4 Card=951 Bytes=16167)

21  16              FILTER
22  21              SORT (JOIN)
23  22                VIEW (Cost=44 Card=97 Bytes=3589)
24  23                  UNION-ALL
25  24                  SORT (UNIQUE) (Cost=30 Card=89 B
      ytes=2492)

26  25                    FILTER
27  26                      TABLE ACCESS (FULL) OF 'AUD_
      TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

28  27                      SORT (AGGREGATE)
29  28                        TABLE ACCESS (FULL) OF '
      MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

30  26                      TABLE ACCESS (FULL) OF 'MSTW
      F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

31  24                  SORT (UNIQUE) (Cost=6 Card=8 Byt
      es=176)

32  31                    FILTER
33  32                      TABLE ACCESS (FULL) OF 'AUD_
      TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

34  33                      SORT (AGGREGATE)
35  34                        TABLE ACCESS (FULL) OF '
      MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

36  32                      TABLE ACCESS (FULL) OF 'MSTW
      F_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

37  3      SORT (JOIN) (Cost=162 Card=3 Bytes=144)
38  37      VIEW OF 'RVW_CAS_STMT_BORR_DTLS' (Cost=162 Card=3
      Bytes=144)

39  38        UNION-ALL
40  39          SORT (UNIQUE) (Cost=56 Card=1 Bytes=51)
41  40          MERGE JOIN (Cost=54 Card=1 Bytes=51)
42  41            SORT (JOIN) (Cost=49 Card=1 Bytes=24)
43  42              VIEW (Cost=47 Card=1 Bytes=24)
44  43              SORT (GROUP BY) (Cost=47 Card=1 Bytes=
      45)

45  44                MERGE JOIN (Cost=44 Card=1 Bytes=45)
46  45                  SORT (JOIN) (Cost=8 Card=4 Bytes=7
      6)

47  46                  HASH JOIN (Cost=6 Card=4 Bytes=7
      6)

48  47                    TABLE ACCESS (FULL) OF 'TRN_C3
      _CORPORATE' (Cost=2 Card=110 Bytes=660)

49  47                    TABLE ACCESS (FULL) OF 'AUD_TR
      N_C3_CORPORATE' (Cost=3 Card=358 Bytes=4654)

50  45                  FILTER
51  50                  SORT (JOIN)
52  51                    VIEW (Cost=44 Card=97 Bytes=25
      22)

53  52                      UNION-ALL
54  53                      SORT (UNIQUE) (Cost=30 Car
      d=89 Bytes=2492)

55  54                        FILTER
56  55                          TABLE ACCESS (FULL) OF
      'AUD_TRN_PROC_STATE' (Cost=28 Card=89 Bytes=2492)

57  56                          SORT (AGGREGATE)
58  57                            TABLE ACCESS (FULL
      ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

59  55                          TABLE ACCESS (FULL) OF
      'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

60  53                      SORT (UNIQUE) (Cost=6 Card
      =8 Bytes=176)

61  60                        FILTER
62  61                          TABLE ACCESS (FULL) OF
      'AUD_TRN_DATA_STATE' (Cost=4 Card=8 Bytes=176)

63  62                          SORT (AGGREGATE)
64  63                            TABLE ACCESS (FULL
      ) OF 'MSTWF_PROCESS_ITEM' (Cost=2 Card=1 Bytes=16)

65  61                          TABLE ACCESS (FULL) OF
      'MSTWF_MKR_CHKR_STATE' (Cost=2 Card=1 Bytes=18)

66  41            SORT (JOIN) (Cost=6 Card=358 Bytes=9666)
67  66              TABLE ACCESS (FULL) OF 'AUD_TRN_C3_CORPO
      RATE' (Cost=3 Card=358 Bytes=9666)

68  39          SORT (UNIQUE) (Cost=53 Card=1 Bytes=51)
69  68          MERGE JOIN (Cost=51 Card=1 Bytes=51)
70  69            SORT (JOIN) (Cost=48 Card=3 Bytes=72)
71  70              VIEW (Cost=46 Card=3 Bytes=72)
72  71              SORT (GROUP BY) (Cost=46 Card=3 Bytes=
      135)

73  72                MERGE JOIN (Cost=43 Card=49 Bytes=22
      05)

74  73                  SORT (JOIN) (Cost=7 Card=30 Bytes=
      570)

75  74                  HASH JOIN (Cost=5 Card=30 Bytes=
      570)

76  75                    TABLE ACCESS (FULL) OF 'TRN_C3
      _BANK' (Cost=2 Card=4 Bytes=24)

77  75                    TABLE ACCESS (FULL) OF 'AUD_TR
      N_C3_BANK' (Cost=2 Card=68 Bytes=884)

78  73                  FILTER
79  78                  SORT (JOIN)
80  79                    VIEW (Cost=44 Card=97 Bytes=25
      22)

81  80


Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement