Skip to Main Content
  • Questions
  • IN (vs) EXISTS and NOT IN (vs) NOT EXISTS

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Madhusudhana Rao.

Asked: April 02, 2001 - 1:43 am UTC

Last updated: October 16, 2017 - 12:27 pm UTC

Version: Oracle 7.3.4

Viewed 100K+ times! This question is

You Asked

Hi Tom,

Can you pls explain the diff between IN and EXISTS and NOT IN
and NOT EXISTS. Because I have read that EXISTS will work better than
IN and NOT EXISTS will work better than NOT IN (read this is Oracle
server tunning).


Regards,
Madhusudhana Rao.P



and Tom said...

see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:953229842074 <code>


It truly depends on the query and the data as to which is BEST.

Note that in general, NOT IN and NOT EXISTS are NOT the same!!!


SQL> select count(*) from emp where empno not in ( select mgr from emp );

COUNT(*)
----------
0

apparently there are NO rows such that an employee is not a mgr -- everyone is a mgr (or are they)


SQL> select count(*) from emp T1
2 where not exists ( select null from emp T2 where t2.mgr = t1.empno );

COUNT(*)
----------
9


Ahh, but now there are 9 people who are not managers. Beware the NULL value and NOT IN!! (also the reason why NOT IN is sometimes avoided).



NOT IN can be just as efficient as NOT EXISTS -- many orders of magnitude BETTER even -- if an "anti-join" can be used (if the subquery is known to not return nulls)



Rating

  (136 ratings)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Helena Markova, April 03, 2001 - 4:05 am UTC


phrase it

A reader, September 26, 2001 - 2:20 pm UTC

Kindly phrase the following query for us

select count(*) from emp where empno not in ( select mgr from emp );


Reader

A reader, September 30, 2002 - 10:01 pm UTC

Tom,

Instead of

SQL> select count(*) from emp T1
  2  where not exists ( select null from emp T2 where t2.mgr = t1.empno );

you could have used 

SQL>  select count(*) from emp T1
  2  where not exists ( select mgr  from emp T2 where t2.mgr = t1.empno );

Could you tell what circumstances do we use "select null"
instead of "select <value>". Are there any advantages
 

Tom Kyte
October 01, 2002 - 9:54 am UTC

why select mgr?

I find select null to be semantically more meaningful. You are NOT selecting anything really -- so say that.

Why problem if NULL is there

Tony, October 02, 2002 - 2:02 am UTC

Hi Tom,
Your answer is superb. Can you tell us why there is no record selected for NOT IN when there is NULL?



Tom Kyte
October 02, 2002 - 10:25 am UTC

Because NULL means -- gee, I don't know.  (litterally, null means Unknown)


So, the predicate

where x not in ( NULL )

evaluates to neither TRUE, nor FALSE


ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from dual where dummy not in ( NULL );

no rows selected

ops$tkyte@ORA817DEV.US.ORACLE.COM> select * from dual where NOT( dummy not in (NULL) );

no rows selected


(you would think one of the two queries would return a row -- but there is a third state for a boolean expression in sql -- "I don't know what the answer is") 

select rows using exists

Madhuri Bondre, March 31, 2003 - 6:44 am UTC

Your solution is very useful

but i have one query

i have two tables

first table with item_cd, sub_item_cd (both the columns are in primary key)

second table with sub_item_cd (no primary key)

i want to filter only those records from first table where all sub_item_cd exists in second table

ex.

rows of first table

item_cd sub_item_cd

1010 A
1010 B
1010 C
1011 A
1011 D
1012 A

rows of second table

sub_item_cd

A
A
A
B
B
C
B
C

i need output as

item_cd

1010
1012

how can I get it in one select ?

can exists will solve it?

thanks in advance


Tom Kyte
March 31, 2003 - 9:02 am UTC

ps$tkyte@ORA920> select t1.item_cd, t1.sub_item_cd, decode( t2.sub_item_cd, NULL, 1, 0 )
  2    from t1, (select distinct sub_item_cd from t2 ) t2
  3   where t1.sub_item_cd = t2.sub_item_cd(+)
  4  /

   ITEM_CD S DECODE(T2.SUB_ITEM_CD,NULL,1,0)
---------- - -------------------------------
      1010 a                               0
      1011 a                               0
      1012 a                               0
      1010 b                               0
      1010 c                               0
      1011 d                               1

6 rows selected.

ops$tkyte@ORA920> select t1.item_cd
  2    from t1, (select distinct sub_item_cd from t2 ) t2
  3   where t1.sub_item_cd = t2.sub_item_cd(+)
  4   group by t1.item_cd
  5  having max(decode( t2.sub_item_cd, NULL, 1, 0 )) = 0
  6  /

   ITEM_CD
----------
      1010
      1012

ops$tkyte@ORA920> 

Thanks

Madhuri bondre, April 01, 2003 - 12:31 am UTC

you are great !!!!

Mirjana, May 05, 2003 - 2:51 am UTC


NOT IN or NOT EXISTS or

Gautam, May 24, 2003 - 11:27 pm UTC

Hi Tom, Your points are very helping, however, in case of having 20 Mill rows in outer table and Mill in inner table none is working, both are too slow. Is there a way to avoid the subquery being executed for retrieval of every row from the outer query.

Tom Kyte
May 25, 2003 - 9:59 am UTC

give me a "for example" and define "too slow".

on my pc, to full scan a 1,000,000 row table (a copy of all_objects -- the WIDTH * HEIGHT of a table is relevant -- not just the height. I have 1,000,000 tables that consume lots less space then a 10,000 row one) takes about 7 seconds. a 20million -- 20 times as long.

As long as you are using the CBO, and the tables are analyzed, it will use a hash anti join and be very fast.

Gautam, May 27, 2003 - 4:54 pm UTC

Hi Tom, Thanks for your reply.

I have following observations :

Query:

SELECT ind_id, lst_ord_ss, dec_sc_is, lst_ord_is,
ss_group, dec_sc_ss, list_pri, cont_lvl,
site_upd_dt, source_cd, sd_custtyp, site_id,
ss_subgrp
FROM multi_mv
WHERE country_cd = :sys_b_00
AND dsf_index IN (:sys_b_01, :sys_b_02)
AND mail_cont IN (:sys_b_03, :sys_b_04)
AND mail_site = :sys_b_05
AND ss_group IN
(:sys_b_06, :sys_b_07, :sys_b_08, :sys_b_09, :sys_b_10)
AND NOT ind_id IN (SELECT DISTINCT ind_id
FROM ssw.promo
WHERE drop_cd = :sys_b_11)

1. The plan of above query. NL's

SELECT STATEMENT Cost = 193604
FILTER
TABLE ACCESS FULL MULTI_MV
TABLE ACCESS BY INDEX ROWID PROMO
INDEX RANGE SCAN PROM_DROP_CD


2. Adding "ORDER BY int_id"into above Query changes
the plan to seems like MERGE_AJ

SELECT STATEMENT Cost = 118609
TABLE ACCESS BY INDEX ROWID MULTI_MV
INDEX FULL SCAN MV_MULTI_IND_ID
TABLE ACCESS BY INDEX ROWID PROMO
INDEX RANGE SCAN PROM_DROP_CD

3. Changing "NOT IN" to "NOT EXISTS"

SELECT STATEMENT Cost = 198944
SORT ORDER BY
HASH JOIN ANTI
TABLE ACCESS FULL MULTI_MV
TABLE ACCESS FULL PROMO

Why is the CBO choosing Anti join only in case of NOT EXISTS ?

Internally the Anti Hash Join retrives the full row or only the join column ?

The approx time for the query to finish in first 2 cases is 5 days.

I would appreciate any comments on this ?.

Tom Kyte
May 27, 2003 - 5:51 pm UTC

obviously NO BIND variables on your system eh.

I'll guess (in light of no creates, no knowledge of indexes, no dbms_stats.set_table_stats to let me know how big/how many rows things are, etc etc etc)


Is ind_id in ssw.promo NULLABLE

and if so, what happens when you say:

AND NOT ind_id IN (SELECT DISTINCT ind_id
FROM ssw.promo
WHERE drop_cd = :sys_b_11
and IND_ID IS NOT NULL )

instead (you do know of course that NOT EXISTS and NOT IN are *not* interchangeable, they'll give different results!)



Not In Vs Not Exists !

ramki, May 27, 2003 - 7:41 pm UTC

Tom,

I have 2 tables. Both tables have around 8000 records.

Table 1:

ID
parentID
col1
col2
col3

Table1: Primary key: ID

Table2

Module
ID
processDate
ParentID
dataCol1
dataCol2
source_column
source_fg

Table2 Primary key: Module, Id, processDate


Table 1 is a superset of all Ids that Table 2 has.
ParentIds for Ids in Table1 and Table2 would be same for same Id.


Now, I need to pick those Ids from Table1 that are

- not in Table2
- have same ParentID from Table2.
- for records that have source_column in table2 is not null
- for records in Table2 that have source_fg as null




Table1:

ID ParentId Col1 col2 col3
101 111 10 20 30
102 111 20 40 60
103 112 10 10 10
104 113 10 20 30
105 113 30 40 50
106 114 40 40 50
107 113 50 50 50

Table2:

Module ID ParentId ProcessDate datacol1 datacol2 source_column source_fg
1 101 111 27-MAY-2003 1000 1000 2000
1 103 112 27-MAY-2003 2000 2000 2000 Y
1 104 113 27-MAY-2003 3000 3000 3000
1 106 114 27-MAY-2003 2300 2300 3500


Now, I need the out to be:

102
105
107


Select t1.Id
from Table1 t1, Table2 t2
where not Exists (select null from Table2 where table2.module=1 and
table2.id = t1.id and table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null


Actually, I need to insert these Ids in Table2 and replicate data from their parentIDs in Table2.
so, New records would be

Module ID ParentId ProcessDate datacol1 datacol2 source_column source_fg
1 102 111 27-MAY-2003 1000 1000 2000
1 105 113 27-MAY-2003 3000 3000 3000
1 107 113 27-MAY-2003 3000 3000 3000

I am trying to use, Insert into table2
select ......



Which one would be better: (using RBO)

Select t1.Id, t2.datacol1, t2.datacol2..........
from Table1 t1, Table2 t2
where not Exists (select null from Table2 where table2.module=1 and
table2.id = t1.id and table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null


or

Select t1.Id, t2.datacol1, t2.datacol2..........
from Table1 t1, Table2 t2
where t1.Id not in (select Id from Table2 where table2.module=1 and
table2.processDate = sysdate)
and t1.parentId = t2.parentId
and t2.Module = 1
and t2.processDate = sysdate
and t2.source_column is not null
and t2.source_fg is null


Please advice. Thanks





Tom Kyte
May 27, 2003 - 8:01 pm UTC

neither, rbo is the wrong technology.

In anycase, it seems you have the test data, benchmark it? it is what I would have to do in order to show you.

rbo would be wrong. rbo would use the least efficient plans for these types of queries. cbo would do it most efficiently.

Problem with my NOT EXISTS statement

Godwin, May 30, 2003 - 8:03 am UTC

Hi Tom,
Its difficult getting you meanwhile this question is similar what is being discussed and I will be grateful if you will mail the reply to godwinofoe@yahoo.com. Please don't ignore reading this.I need an urgent reply to be able to get this report to management.

The problem is in generating a report in Report6i from a University HR database Oracle7.3.2 Server.
The problem is like this:
In a University, staff who retired are taken on contract to work for some time. Also staff who resigned from the university and want to come back are re-engaged.
Now management requested for a report on all active staff (i.e to exclude retirees and include staff re-engaged).
When I wrote a query to produce such report I was not getting the correct result. The query failed to exclude the retired staff. The logic i was using is that the subquery should select all staff who retired and resigned from the university. At that instance of creating their service record, their start date is the same as their end date so the subquery would select by using the condition where the start date equals the end date. When such records of staff are found by the subquery,the outer query would compare its result with the inner query and then exclude any matching records found in the inner query result. I hope i made myself clear.Here is the detail of my query:

SELECT distinct id,name
,INITCAP(LOWER(dname))
,disc,descrip
,startdate,endate
from staff a
,dept b
,jobhist c
,dictator d
,postab e
WHERE a.id= c.stid
AND a.deptno= b.deptno
AND c.indic = d.code
AND c.jobtitle = e.postid
AND NOT EXISTS
(SELECT 1
FROM jobhist j
WHERE j.stid=a.id
AND j.startdate=(SELECT MAX(NVL k.endate,k.startdate))
FROM jobhist k,jobhist n
WHERE k.stid = a.id
AND k.stid=n.stid
AND c.indic IN('01','02','04')
)
)

ORDER BY name asc

This is the result got from the above query:
DeptFile# NameService Indicator PostStart Date End Date
Operations 10003 Mana NEW APPOINTMENT LECTURER 1-Dec-87 20-Jan-99
Operations 10003 Mana re-enganged SENIOR LECTURER 1-Jun-01
Research 10001 Ofoe NEW APPOINTMENT LECTURER 15-Aug-02 15-May-03
Research 10001 Ofoe PROMOTION SENIOR LECTURER 15-May-03
Sales 10002 John NEW APPOINTMENT Junior Admin Asst 14-Jun-75 15-May-88
Sales 10002 John PROMOTION LECTURER 16-May-90 11-Jun-00
Sales 10002 John PROMOTION SENIOR ADMIN 15-May-88 30-May-90
Sales 10002 John PROMOTION SENIOR LECTURER 13-Jun-00 30-Jul-02

The correct output should have excluded staff called John since he is retired.

Here are the records stored in the various tables:

Jobhist Table
File# Ser.Indicator Postcode StartDate End Date
10001 A1 5 15-Aug-02 15-May-03
10001 A2 6 15-May-03
10002 A3 7 14-Jun-75 15-May-88
10002 A3 6 15-May-88 30-May-90
10002 A1 6 16-May-90 11-Jun-00
10002 A2 6 13-Jun-00 30-Jul-02
10002 A2 1 30-May-03 30-May-03
10003 A1 5 1-Dec-87 20-Jan-99
10003 A2 4 1-Jul-00 1-Jul-00
10003 A2 3 1-Jun-01

Staff table:
File# Name Hiredate StaffType Deptno Birthdate
10001 Ofoe 12-Aug-02 3 20
10004 Kofi 1-Jun-75 3 10 15-Jun-40
10002 John 12-May-14 3 30 12-Jun-70
10003 Mana 12-Sep-87 2 40 17-Jul-56


Service Indicator Table(dictator):
Code Description
1 retired
2 contract
3 re-enganged
4 resigned
5 NEW APPOINTMENT
6 PROMOTION

Is there a mistake I made, is there a different way of achieving the result or is SQL incapable of achieving the result? I need your help please. Thanks for your help.


Tom Kyte
May 30, 2003 - 8:40 am UTC

well, first - i don't read every review, most but not all. Also, I read them as I have time -- not on any sort of set schedule

second, I'm not emailing people on individual followups. I'd spend the other half of my life doing that (leaving no half for me)

third, the example you put forth is really really really hard to follow. Your sample tables for example haven't many columns at all that match the columns in your query. I don't understand your datamodel (as you should). So, I'll make a simple guess


sounds like you want STAFF records such that their last (most recent) record in JOBHIST does not have an indic code in ( 1, 2, 4 )

so, I think we can ignore enddate all togther, just find the max startdate for each person. we can start with:


select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic )
from jobhist
group by file#


Now, that will give us a set of file# (which I assume is the key into STAFF) and their max startdate value ALONG WITH THE INDIC code for that max date.

We can then:

select file#
from (
select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic ) data
from jobhist
group by file#
)
where substr( data, 15 ) not in ( '01', '02', '04' )
/

That'll be the set of ACTIVE STAFF file#'s.

Now you can just

select *
from staff
where file# in ( <that query> );







You got what i mean but not getting the actual result I want

Godwin, June 02, 2003 - 8:59 am UTC

Hi Tom,
Thanks for the response.
With reference to your understanding "sounds like you want STAFF records such that their last (most recent) record in
JOBHIST does not have an indic code in ( 1, 2, 4 )"

What i mean is that staff who resign(4) or retired(1)
shouldn't have their names or previous service records in the JOBHIST table appeared in the report at all.
Take it as that the STAFF TABLE is the MASTER Table with the primary key fileNo and the JOBHIST table is the DETAIL TABLE with FileNo as the foreign Key.
Your solutions you gave produced the names and previous service records of staff excluding their most recent records that has indic code in (1,2,4).
We don't the records of those whose most recent records that has indic code in (1,2,4) appeared.
Can't there be away of achieving that?
Thanks for your help.


Tom Kyte
June 02, 2003 - 9:16 am UTC

you say

"we don't the records ... with 1,2,4 appeared"

If I read that to mean "why don't" -- then it is because I filtered them:

select file#
from (
select file#, max( to_char(startdate,'yyyymmddhh24miss') || indic ) data
from jobhist
group by file#
)
where substr( data, 15 ) not in ( '01', '02', '04' )
/


Look at the concept there. (you should be able to fix this, it is after all your set of tables???)

That should -- if you've explained things accurately -- return the set of primary keys from jobhist for joining to the other table. The set of records this returns is the MAX startdate record for everyone in jobhist (their LAST ENTRY) if and only if their indic code isn't in 1, 2, 4


Take the technique, verify the results -- apply the technique to your problem.

Since I
a) don't know your tables
b) don't know your keys
c) am having a hard time following your explaination

this is the best I can do.

It was wonderful

Godwin, June 02, 2003 - 10:53 am UTC

Hi Tom,
It is working perfectly.
Thanks for your patience although my questions were not explanatory enough,your guess were correct.I made some mistakes in implementing the solution.
Just as you said,I will now fine tune the query so i can get the records from more than one tables. I will do my best to get accurate result but please bear with me if i should get back to you for additional help.
God bless!!


avoid correlated subquery

umesh, August 10, 2003 - 8:55 am UTC


SELECT
count(*)
FROM
AER A,
AER_PRODUCT B ,
AER_PRODUCT_APPROVAL_NO P ,
AER_REACT E
WHERE
A.AER_DELETED = '0' AND
B.RECORD_DELETED != '1' AND
E.RECORD_DELETED != '1' AND
(B.NOT_RELEVANT IS NULL) AND
(E.NOT_RELEVANT IS NULL) AND
B.TRADE_NAME LIKE :1 AND
(
( B.PRODUCT_TYPE_MULTI = '0'
AND
(B.PRODUCT_TYPE = '1' OR B.PRODUCT_TYPE = '2' OR B.PRODUCT_TYPE = '4' )
)
OR
( B.PRODUCT_TYPE_MULTI = '1'
AND
( ( P.PRODUCT_TYPE = '1' OR P.PRODUCT_TYPE = '2' OR P.PRODUCT_TYPE = '4' )
AND P.COUNTRY_CODE = NVL(A.AER_COUNTRY, '078')
AND B.SEQ_PRODUCT = P.SEQ_PRODUCT AND B.AER_ID = P.AER_ID
)
OR
(NOT EXISTS
( SELECT Y.APPROVAL_NO FROM AER_PRODUCT_APPROVAL_NO Y
WHERE Y.COUNTRY_CODE = NVL(A.AER_COUNTRY, '078')
AND B.SEQ_PRODUCT = Y.SEQ_PRODUCT AND B.AER_ID = Y.AER_ID
)
AND
B.PRODUCT_TYPE != '3'
))
)
AND
(B.PRODUCT_FLAG = '1' OR B.PRODUCT_FLAG = '4') AND
B.AER_ID = A.AER_ID AND
E.AER_ID = A.AER_ID AND
E.EVENT_TYPE != '2'
ORDER BY A.AER_NO, A.COMPOSITE_VERSION_NO DESC , A.AER_ID, E.AUTO_RANK, B.AUTO_RANK



Tables P.KEY
AER AER_ID
AER_PRODUCT AER_ID,SEQ_PRODUCT
AER_APPROVAL_NO
AER_ID,SEQ_PRODUCT,SEQ_APPROVAL_NUMBER

i WANT TO ELIMINATE THE CORRELATED SUBQRY
PLEASE HELP

A reader, September 04, 2003 - 6:39 pm UTC

Tom,

Can you please take a look and advice on this query


select     map, 
    todmdunit, 
    todmdgroup, 
    todfuloc, 
    tomodel,         
        fromdmdunit, 
    fromdmdgroup, 
    fromdfuloc, 
    frommodel, 
    eff 
from    stsc.dfumap a 
where      scen = 0 
and        not exists 
    (    select * 
        from   stsc.dfu b                    
        where  scen = 0                    
        and    a.fromdmdunit = b.dmdunit                    
        and    a.fromdmdgroup = b.dmdgroup
                and    a.fromdfuloc = b.loc
                and    a.frommodel = b.model
    )
/

no rows selected

Elapsed: 00:00:30.29

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=59587 Bytes=3634807)
   1    0   FILTER
   2    1     INDEX (FAST FULL SCAN) OF 'DFUMAP1' (NON-UNIQUE) (Cost=2 Card=59587 Bytes=36
          34807)

   3    1     INDEX (UNIQUE SCAN) OF 'DFU1' (UNIQUE) (Cost=2 Card=1 Bytes=23)

Statistics
----------------------------------------------------------
          0  recursive calls
         50  db block gets
    3590173  consistent gets
          0  physical reads
          0  redo size
        686  bytes sent via SQL*Net to client
        319  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed



SQL> desc dfu
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- ----------------------------
 DMDUNIT                                                                 NOT NULL VARCHAR2(30)
 DMDGROUP                                                                NOT NULL VARCHAR2(10)
 LOC                                                                     NOT NULL VARCHAR2(10)
 HISTSTART                                                               NOT NULL DATE
 EFF                                                                     NOT NULL DATE
 DISC                                                                    NOT NULL DATE
 FCSTHOR                                                                 NOT NULL NUMBER(38)
 DMDCAL                                                                  NOT NULL VARCHAR2(18)
 DMDPOSTDATE                                                             NOT NULL DATE
 MODELDATE                                                               NOT NULL DATE
 STATMSE                                                                 NOT NULL FLOAT(126)
 MAXHIST                                                                 NOT NULL NUMBER(38)
 TOTFCSTLOCK                                                             NOT NULL NUMBER(38)
 LOCKDUR                                                                 NOT NULL NUMBER(38)
 REFITDATE                                                               NOT NULL DATE
 MASK                                                                    NOT NULL VARCHAR2(18)
 SCEN                                                                    NOT NULL NUMBER(38)
 SCENBITS                                                                NOT NULL NUMBER(38)
 MAPUSED                                                                 NOT NULL VARCHAR2(18)
 NETFCSTMSE                                                              NOT NULL FLOAT(126)
 NETFCSTMSESMCONST                                                       NOT NULL FLOAT(126)
 NETFCSTERROR                                                            NOT NULL FLOAT(126)
 NEGFCSTSW                                                               NOT NULL NUMBER(38)
 AUTOADJFACTOR                                                           NOT NULL FLOAT(126)
 MODEL                                                                   NOT NULL VARCHAR2(18)
 AUTOADJMIN                                                              NOT NULL FLOAT(126)
 AUTOADJMAX                                                              NOT NULL FLOAT(126)
 PUBLISHDATE                                                             NOT NULL DATE
 UDC_DFU_HIERARCHY                                                                VARCHAR2(50)
 UDC_MODELNUM                                                            NOT NULL NUMBER(38)
 UDC_ABCINDICATOR                                                                 VARCHAR2(50)
 UDC_DELETERECSW                                                         NOT NULL NUMBER(38)
 UDC_IMPORTSW                                                                     NUMBER(38)
 UDC_NEWAUTOGENSW                                                        NOT NULL NUMBER(38)


SQL>desc dfumap
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- ----------------------------
 MAP                                                                     NOT NULL VARCHAR2(18)
 TODMDUNIT                                                               NOT NULL VARCHAR2(30)
 TODMDGROUP                                                              NOT NULL VARCHAR2(10)
 TODFULOC                                                                NOT NULL VARCHAR2(10)
 FROMDMDUNIT                                                             NOT NULL VARCHAR2(30)
 FROMDMDGROUP                                                            NOT NULL VARCHAR2(10)
 FROMDFULOC                                                              NOT NULL VARCHAR2(10)
 EFF                                                                     NOT NULL DATE
 DISC                                                                    NOT NULL DATE
 FACTOR                                                                  NOT NULL FLOAT(126)
 LEVELNUM                                                                NOT NULL NUMBER(38)
 FCSTTYPE                                                                NOT NULL NUMBER(38)
 HISTTYPE                                                                NOT NULL NUMBER(38)
 SCEN                                                                    NOT NULL NUMBER(38)
 SCENBITS                                                                NOT NULL NUMBER(38)
 CONVFACTOR                                                              NOT NULL FLOAT(126)
 LOCKSTART                                                               NOT NULL DATE
 LOCKEND                                                                 NOT NULL DATE
 TOMODEL                                                                 NOT NULL VARCHAR2(18)
 FROMMODEL                                                               NOT NULL VARCHAR2(18)
 UDC_NEWAUTOGENSW                                                        NOT NULL NUMBER(38)

SQL>select count(*) from dfu;
 COUNT(*)
----------
    770946


SQL>select count(*) from dfumap;
  COUNT(*)
----------
   1191740

parameters for 
optimizer_index_caching              integer 80
optimizer_index_cost_adj             integer 25 

Tom Kyte
September 05, 2003 - 3:33 pm UTC

try using NOT IN instead of EXISTS


from stsc.dfumap a
where scen = 0
and (fromdmdunit, ... ) not in
( select b.dmdunit, b.dmdgroup, b.loc, b.model
from stsc.dfu b
where scen = 0
)

Sometimes not exists does not work!!!

Aldo bravo, October 06, 2003 - 4:31 pm UTC

Hi Tom,

See this example:

SQL> drop table t1; 
drop table t1
           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> 
SQL> create table t1 (f1 varchar(10), 
  2  f2 varchar(10), 
  3  f3 varchar(10)); 

Table created.

SQL> 
SQL> insert into t1 values ('1','5','3'); 

1 row created.

SQL> insert into t1 values ('4','4','6'); 

1 row created.

SQL> commit; 

Commit complete.

SQL> SELECT f1,f2,f3 
  2  FROM t1 a 
  3  WHERE a.f2 <= '5' 
  4  AND NOT EXISTS (SELECT '1' 
  5  FROM t1 b 
  6  WHERE b.f2 <= '5' 
  7  AND b.f2 > a.f2); 

F1         F2         F3
---------- ---------- ----------
1          5          3

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> SELECT f1,f2,f3 
  2  FROM t1 a 
  3  WHERE a.f2 <= '5' 
  4  AND NOT EXISTS (SELECT '1' 
  5  FROM t1 b 
  6  WHERE b.f2 <= '5' 
  7  AND b.f2 > a.f2); 

no rows selected

What can you say about it?
If I delete the stats it works:
SQL> exec dbms_stats.delete_table_stats(tabname=> 'T1',ownname=>'ESCMANAGER');

PL/SQL procedure successfully completed.

SQL> /

F1         F2         F3
---------- ---------- ----------
1          5          3

It is a 9.2.0.4 database on Windows 2003.

Thanks, 

Tom Kyte
October 06, 2003 - 6:22 pm UTC

first -- thank you thank you thank you for such a nice simple testcase. bravo

Second, I filed Bug No: 3178622 on this -- obviously, a problem.

implicit cast?

freek, October 07, 2003 - 8:04 am UTC

Tom,

Could the problem above be explained by an implicit cast?
Where the optimizer is after the analyze aware of the fact the the datatype of the column is varchar2 and does not convert the literal to a number?

freek@DEV01> select (case when to_char(x) > to_char(y) then 'X bigger' else 'Y bigger' end)
2 from ( select 2 x, 10 y
3 from dual
4 );

(CASEWHE
--------
X bigger

Elapsed: 00:00:00.00

if you modify the example to use different values, you get following result

freek@DEV01> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

Elapsed: 00:00:00.00
freek@DEV01> drop table t;

Table dropped.

Elapsed: 00:00:00.00
freek@DEV01> create table t1
2 ( f1 varchar(10),
3 f2 varchar(10),
4 f3 varchar(10)
5 );

Table created.

Elapsed: 00:00:00.00
freek@DEV01> insert into t1 values ('1','2','3');

1 row created.

Elapsed: 00:00:00.00
freek@DEV01> insert into t1 values ('4','10','6');

1 row created.

Elapsed: 00:00:00.00
freek@DEV01> commit;

Commit complete.

Elapsed: 00:00:00.00
freek@DEV01> select f1,f2,f3
2 from t1 a
3 where a.f2 <= '20'
4 and not exists ( select '1'
5 from t1 b
6 where b.f2 <= '20'
7 and b.f2 > a.f2
8 );

F1 F2 F3
---------- ---------- ----------
1 2 3

Elapsed: 00:00:00.00
freek@DEV01> analyze table t1 compute statistics;

Table analyzed.

Elapsed: 00:00:00.00
freek@DEV01> select f1,f2,f3
2 from t1 a
3 where a.f2 <= '20'
4 and not exists ( select '1'
5 from t1 b
6 where b.f2 <= '20'
7 and b.f2 > a.f2
8 );

F1 F2 F3
---------- ---------- ----------
1 2 3

Elapsed: 00:00:00.00

now if you change the query to use an to_number

freek@DEV01> select f1,f2,f3
2 from t1 a
3 where to_number(a.f2) <= 20
4 and not exists ( select 1
5 from t1 b
6 where to_number(b.f2) <= 20
7 and to_number(b.f2) > to_number(a.f2)
8 );

F1 F2 F3
---------- ---------- ----------
4 10 6



Tom Kyte
October 07, 2003 - 8:36 am UTC

nope, it is a bug - if the SAME query against the SAME data returns DIFFERENT answers based solely on the plan developed -- that is a bug.

RE: sometimes not exists does not work...

Mark A. Williams, October 07, 2003 - 5:35 pm UTC

Tom,

Is there some environmental/config setting that could make a difference?  I have 9.2.0.4.0 on a WinXP laptop and can not duplicate...  Is this behavior specific to Windows Server 2003?  I could not duplicate on 9.2.0.4.0 on Solaris, AIX, or Linux as well...  (I am going to install Windows Server 2003 and test on that tomorrow)


SQL> select * from v$version;

BANNER
----------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

5 rows selected.

SQL> create table t1
  2  (
  3    f1 varchar(10),
  4    f2 varchar(10),
  5    f3 varchar(10)
  6  );

Table created.

SQL> insert into t1 values ('1','5','3');

1 row created.

SQL> insert into t1 values ('4','4','6');

1 row created.

SQL> commit;

Commit complete.

SQL> SELECT   f1,
  2           f2,
  3           f3
  4  FROM     t1 a
  5  WHERE    a.f2 <= '5'
  6  AND NOT EXISTS (SELECT   '1'
  7                  FROM     t1 b
  8                  WHERE    b.f2 <= '5'
  9                  AND      b.f2 > a.f2);

F1         F2         F3
---------- ---------- ----------
1          5          3

1 row selected.

SQL> analyze table t1 compute statistics;

Table analyzed.

SQL> SELECT   f1,
  2           f2,
  3           f3
  4  FROM     t1 a
  5  WHERE    a.f2 <= '5'
  6  AND NOT EXISTS (SELECT   '1'
  7                  FROM     t1 b
  8                  WHERE    b.f2 <= '5'
  9                  AND      b.f2 > a.f2);

F1         F2         F3
---------- ---------- ----------
1          5          3

1 row selected.

SQL>

Thanks,

Mark 

Tom Kyte
October 07, 2003 - 6:18 pm UTC

absolutely -- any non-default optimizer related parameters will affect this. It's all about the plan that was generated and many things will affect that.

I can replicate on Windows 2000 as well

Martin, October 08, 2003 - 2:52 am UTC


Workaround

Aldo Bravo, October 08, 2003 - 4:16 pm UTC

If you use /*+ nounnest +/ hint or if you set the parameter _unnest_subquery=false in the spfile the problem is solved.

There are many bugs related to this issue (2739068-2700474-3085399-2829723) and in one of them I found the workaround.

Thanks!!

rewriting "not in"

A reader, February 19, 2004 - 9:17 am UTC

I have seen a "not in" query written as:

select a.key
from t1 a,
t2 b
where a.key = b.key(+) and
b.key is null
/

What do you think of this?


Tom Kyte
February 19, 2004 - 11:32 am UTC

it is the "poor mans hash anti-join", a good idea in the olden days of the RBO, not so good in the days of a smarter CBO.

start with:


ops$tkyte@ORA920PC> create table t1 as select * from all_objects where rownum <= 5000;
 
Table created.
 
ops$tkyte@ORA920PC> create table t2 as select * from all_objects where rownum <= 4950;
 
Table created.
 
ops$tkyte@ORA920PC> create index t2_idx on t2(object_id);
 
Index created.

select count(*)
  from t1 rbo
 where object_id not in ( select object_id from t2 )
                                                                                                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      8.63       8.53     129066     197295          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.64       8.53     129066     197295          0           1


select count(*)
  from t1 rbo
 where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id )
                                                                                                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.08       0.13         83      10075          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.09       0.13         83      10075          0           1


select count(*)
  from t1, t2 rbo
 where t1.object_id = rbo.object_id(+)
   and rbo.object_id IS NULL
                                                                                                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.08       0.11         72       5087          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.09       0.11         72       5087          0           1


<b>so, it looks "good" right?  better than the alternatives in RBO, yes.  but, analyze and</b>

select count(*)
  from t1 cbo
 where object_id not in ( select object_id from t2 )
                                                                                                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.04          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.06       0.08          0         91          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.06       0.12          0         91          0           1

<b>hmmm, rbo not so good -- "user written hash anti join under rbo" not so good...

the cbo does OK though with the user written one:</b>


select count(*)
  from t1, t2 cbo
 where t1.object_id = cbo.object_id(+)
   and cbo.object_id IS NULL
                                                                                                                  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.08       0.10          0         91          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.08       0.11          0         91          0           1



<b>but I find the not in to be infinitely easier to code and alot more "informational" (clear,easy to understand)</b>







 

Union in NOT EXISTS

Ishaque Hussain, February 19, 2004 - 10:46 pm UTC

Hi Tom, I don't have any statistics to provide you but could you look at query A and query B and advise which would be more efficient to write.

Query A)

select *
from contr_prod_price cpp,
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_ind cpli
where cpp.contr = cpli.contr
and cpp.contr_frmla = cpli.contr_frmla
and cpp.contr_lvl = cpli.contr_lvl
union
select 'x'
from contr_prod_prc_low_lvl_dir cpld
where cpp.contr = cpld.contr
and cpp.contr_frmla = cpld.contr_frmla
and cpp.contr_lvl = cpld.contr_lvl )

Query B)

select *
from contr_prod_price cpp,
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_ind cpli
where cpp.contr = cpli.contr
and cpp.contr_frmla = cpli.contr_frmla
and cpp.contr_lvl = cpli.contr_lvl )
union
select *
from contr_prod_price cpp
where cpp.contr in (
select contr
from contr
where co = 'E')
and cpp.prod_cd = 'PX'
and not exists
(select 'x'
from contr_prod_prc_low_lvl_dir cpld
where cpp.contr = cpld.contr
and cpp.contr_frmla = cpld.contr_frmla
and cpp.contr_lvl = cpld.contr_lvl )

Thanks

Tom Kyte
February 20, 2004 - 7:22 am UTC

well, given that they do not return the same results "in general" (the 2cnd will do a distinct on the result set)....

and that I might really consider a NOT IN....

I would probably go with the first, but I would benchmark them (since I had both)

INTERSECT , MINUS

Arangaperumal, April 28, 2004 - 7:57 am UTC

Hi Tom,

1.EXISTS plays same role as INTERSECT
2.NOT EXISTS plays same role as MINUS.
am i right?
So, the same rule can be applied?


NOT IN Query

Reader, May 25, 2004 - 1:19 am UTC

Hi Tom,
The following are 2 tables,(external_documents and external_document_links).They are linked by external_documents.NAME and external_document_links.EXTERNAL_DOCUMENT_NAME columns.


SQL> select * from external_documents --(Primary Key: NAME)
NAME                                    
----------------------
corp-bro2.jpg                
cvs_structure.doc            
corp-bro.jpg                 
Raid.pdf                     
AutomaticUndoInternals.pdf   

5 rows selected


-------------------------------------------------------------
SQL> select * from external_document_links; --(All the 4 columns are part of the primary key)

FORM       DOCUMENT_NAME  DOCUMENT_VERSION EXTERNAL_DOCUMENT_NAME
---------- ----------------------- -----------------------------
form_1     document_1          1         corp-bro2.jpg
form_1     document_1          1         Raid.pdf
form_2     document_2          1  AutomaticUndoInternals.pdf
form_2     document_1          1         Raid.pdf
PO         po_4000542_2_69     1         Raid.pdf
PO         po_4000542_2_69     1  AutomaticUndoInternals.pdf
form_3     document_1          1         Raid.pdf
form_1     document_2          1         corp-bro.jpg
PO         po_4000542_2_69     1         corp-bro.jpg
PO         po_4000542_2_69     1         corp-bro2.jpg
PO         po_4000542_2_69     1         cvs_structure.doc

11 rows selected.

-------------------------------------------------------------

The input to the query will be external_document_links.form,external_document_links.document_name and external_document_links.document_version. Based on this criteria I need to retrive documents linked to the above 3 columns with a flag as 'Y' and rest of the documents from external_documents table with the flag as 'N'

I have written a query which goes like this :-

SQL> select a.name NAME,'Y' PRESENT from external_documents a,external_document_links b 
where a.name=b.EXTERNAL_DOCUMENT_NAME AND b.form ='form_1' and b.DOCUMENT_VERSION =1 and b.DOCUMENT_NAME='document_1'  
union 
select name ,'N' from external_documents where name
not in (select a.name NAME from external_documents a,external_document_links b 
where a.name=b.EXTERNAL_DOCUMENT_NAME AND b.form ='form_1'and b.DOCUMENT_VERSION =1 and b.DOCUMENT_NAME='document_1')

So I get the right output as follows :- 
--------------------------------------------------------------------------------
NAME                           P
------------------------------ -
AutomaticUndoInternals.pdf     N
Raid.pdf                       Y
corp-bro.jpg                   N
corp-bro2.jpg                  Y
cvs_structure.doc              N

5 rows selected
 
----------------------------------------------------------------

 Can you please tell me if there is any other way in which this query can be re-written, i.e without using the 'NOT IN' clause.

 

Tom Kyte
May 25, 2004 - 7:12 am UTC

sigh, no table creates. no insert intos. no easy way to play with the data.


seems you want every row from external_documents, and if that document has a mate in external_document_links -- show Y else show N


select name, nvl( (select 'Y'
from external_document_links A
where a.name = b.external_document_name
and b.form = :bind1
and b.document_version = :bind2
and b.document_name = :bind3 ), 'N' ) present
from external_documents;

probably works, but I couldn't really run it or anything.

NOT IN Query

Reader, May 26, 2004 - 1:00 am UTC

Hi Tom
Thanks a TON, Sorry I did not provide the necesssary scripts for table and data creation.
You are the greatest.

NOT IN for REMOTE site

Erik, May 27, 2004 - 5:27 am UTC

Hello Tom,
I would like to ask you for one our bad sql which uses "not in" for remote site. For some sites we have slow network and the package takes about 10min. (it transfers about 100rows, runs once per hour). Do you have any suggestion for us how can we improve performance of the bad sql?
Thank you.
SET AUTOTRACE TRACEONLY
-- insert into monitor.promon@tesa
select
PROINUM1,
PROINUM2,
PROAPPL,
--lc_host,
PROSTAT,
upper(PROPROG),
PROMESS,
PRODATE,
upper(PROUSER),
PROTYPE
from
r4.promon
where
prodate > sysdate-1/24 and --ld_prodate and
prodate < sysdate and --ld_currentDate and
UPPER(proprog) not in (select UPPER(batch) from monitor.promon_badbatch@tesa);

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2069 Card=1 Bytes=100)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'PROMON' (Cost=2069 Card=1 Bytes=100)
3 1 REMOTE* (Cost=1 Card=1 Bytes=17) TESA.WORLD
3 SERIAL_FROM_REMOTE SELECT "BATCH" FROM "MONITOR"."PROMON_BADBATCH" "PROMON_BADBATCH"
WHERE LNNVL(UPPER(:1)<>UPPER("BATCH"))


Tom Kyte
May 27, 2004 - 9:27 am UTC

might try

from promon a, (select distinct upper(batch) batch
from monitor.promon_badbatch@tesa ) b
where upper(a.proprog) = b.batch(+)
and b.batch is null
and .....


(and consider removing the need to upper things like that all of the time by standardizing on upper or lower case for those fields....)

NOT IN for REMOTE site

Erik, May 31, 2004 - 12:40 pm UTC

Tom, thank you very much.
But, could you explain me please, why does my "NOT IN" is processed as correlated subquery? I thought that it should not be correlated.


Tom Kyte
May 31, 2004 - 2:11 pm UTC

why did you think that? it is all just syntax, the optimizer can and does do whatever it wants under the covers.

Intreasting

Riaz Shahid, June 01, 2004 - 3:10 am UTC

Thats a pretty intresting thread. Tom would you please through some light on query:

select count(*)
from t1, t2 cbo
where t1.object_id = cbo.object_id(+)
and cbo.object_id IS NULL

We are (outer) joing the two tables on the basis of object_id and on the other hand we are saying "cbo.object_id IS NULL".

I can't understand how it will be executed. Can you please tell us in psuedo code ???

Tom Kyte
June 01, 2004 - 8:34 am UTC

if you understand an outer join -- basically every row in T1 will be in the result set, regardless of whether there was a 'mate' in t2 to be joined to.

So, t1 will be retrieved from, and if there is a row(s) in t2 for that object_id -- the join will happen (and t2.object_id will be NOT NULL of course since we did the join).  Else, the column values for t2 will be set to NULL and the row from t2 will be output.  consider:

ops$tkyte@ORA9IR2> create table t1 ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2> create table t2 ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t1 values(1);
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t1 values(2);
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into t2 values(1);
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from t1,t2 where t1.x = t2.x(+);
 
         X          X
---------- ----------
         1          1
         2


<b>the outer join says "all rows in t1 will be output, no matter what".  So, all we need to do is keep the rows where t2.x (cbo.object_id) is NULL -- those are the rows in t1 such that there was no row in T2 to join to -- just like a not in</b>
 

Excellent Explanation

Riaz Shahid, June 01, 2004 - 9:37 am UTC

I got it.

Love the way you described it...


NOT IN for REMOTE site

Erik, June 01, 2004 - 11:20 am UTC

I thought that this should not be processed as correlated:
... UPPER(proprog) not in (select UPPER(batch) from monitor.promon_badbatch@tesa);
But the excecution plan showed (I think that :1 = promon.proprog):
SELECT "BATCH" FROM "MONITOR"."PROMON_BADBATCH" "PROMON_BADBATCH" WHERE LNNVL(UPPER(:1)<>UPPER("BATCH"))
I did little test:
create table t1 (id number); create table t2 (id number);
insert into t1 values (1); insert into t1 values (2);
insert into t2 values (2);
analyze table t1 compute statistics;
analyze table t2 compute statistics;
I)
select * from t1 where id not in (select id from t2);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'T1'
3 1 TABLE ACCESS (FULL) OF 'T2'
II)
select * from t1 where id not in (select id from t2@ee);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=6)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'T1' (Cost=3)
3 1 REMOTE* (Cost=3) EE.WORLD
3 SERIAL_FROM_REMOTE SELECT "ID" FROM "T2" "T2" WHERE LNNVL("ID"<>:1)

1) Is the first SQL processed same as the second - but only explain plan did not show it?
2) If not - why the 2nd is processed as correlated (I think that :1 = t1.id)?
Thank you.


Tom Kyte
June 01, 2004 - 2:59 pm UTC

no it is not. you are missing a NOT NULL on there (on t2.id) (would make it better).


different queries, different plans.

Into the Wayback Machine (wayback up the page)

Duke Ganote, September 04, 2004 - 12:02 pm UTC

Your two basic queries for comparison were:
SQL> select count(*) from emp where empno not in ( select mgr from emp );

        COUNT(*)
----------------
               0

SQL> select count(*) from emp T1
  2   where not exists ( select null from emp T2 where t2.mgr = t1.empno );

        COUNT(*)
----------------
               8

The first fails because of 3-value logic; it's easy to forget that NULL != NULL.  The query might as well be:

SQL> select count(*) from emp where empno not in ( select null from dual );

        COUNT(*)
----------------
               0

The first query could be rectified to account for 3-value logic; for example:

SQL> select count(*) from emp where to_char(empno)
  2  NOT IN ( select NVL(to_char(mgr),'null') from emp );

        COUNT(*)
----------------
               8

Is that correct?  And the reason anti-joins can be efficient if there are no NULLs is because a standard index doesn't include NULL values? 

Tom Kyte
September 04, 2004 - 1:04 pm UTC

it would be more correct to code:

not in ( select mgr from emp where mgr is not null )


converting a number to a string would not be a good thing and it is more "self explanatory" what is happening with the "is not null"


It is not a direct cause/effect. create index on emp(mgr,empno) would have an entry for each and every row since EMPNO is not null -- so it is not correct to say that indexes do not include null values -- starndard b*tree indexes do not include ENTIRELY null key entrys but in general they have nulls.

Into the Halfwayback Machine...

Duke Ganote, September 05, 2004 - 11:37 pm UTC

Looking at the "poor mans hash anti-join", there were 3 alternatives evaluated:

select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );
select count(*) from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id );
select count(*) from t1, t2 rbo
where t1.object_id = rbo.object_id(+) and rbo.object_id IS NULL;

Yet none seemed "set oriented". I tried another query that, to me, seems clearer:
select count(*) from t1 rbo,
( select object_id from t1 MINUS select object_id from t2 ) SET_OP
where SET_OP.object_id = RBO.object_id
/

Under RBO, it won with only 225 LIOs (compared with 197295, 10075, and 5087 LIOs):

select count(*) from t1 rbo,
( select object_id from t1 MINUS select object_id from t2 ) SET_OP
where SET_OP.object_id = RBO.object_id

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.06 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.04 0.04 0 225 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.04 0.10 0 225 0 1

However, under the CBO, it came in 3rd with 166 LIOs (more than 91 for 2 others).

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.03 0.03 0 166 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.03 0.03 0 166 0 1

I just find it interesting that logically-equivalent queries have such different implementations.


Tom Kyte
September 06, 2004 - 10:05 am UTC

select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );

isn't set oriented? actually -- they are all set oriented, no procedural code, no ordered array semantics (as you get with analytics)

you have "give me everything in this set that isn't in this other set", vs "get me everything in this set such that is doesn't exist in the other set", vs "outer join set A to set B and keep rows such that .... is true"

that last one you added really seems "cleaner" than

select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );

does?!?! really?




in & exists

A reader, September 06, 2004 - 11:41 am UTC

hi

in one of your first replies you mentioned this

+++++++++++++++++++++++++
So, the predicate
where x not in ( NULL )
evaluates to neither TRUE, nor FALSE

How about

where x in ( NULL )

that evaluates to neither true or false right?

So isnt IN and EXISTS different as well (like NOT IN and NOT EXISTS)?





Tom Kyte
September 06, 2004 - 11:59 am UTC

where in and where exists work the same.

where not in and where not exists work differently.


NULL does not affect the behaviour of "in ( set with nulls )" whereas NULL does affect the behaviour of "not in ( set with nulls )"



Clean is in the eyes of the beholder...

Duke Ganote, September 06, 2004 - 11:34 pm UTC

Truly all examples were non-procedural, but I think of set operations in terms of Venn diagrams and operations like MINUS, UNION, and INTERSECT. To my taste, the cleanest query for a simple count is:

select count(*) from
( select object_id from t1 MINUS select object_id from t2 ) RBO_SET

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.05 0.18 143 150 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.06 0.20 143 150 0 1

which has a scant 150 LIOs under the RBO, and an equal-to-the-best 91 LIOs under CBO:

select count(*) from
( select object_id from t1 MINUS select object_id from t2 ) CBO_SET

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.02 0.02 0 91 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.02 0.04 0 91 0 1

The other 3 queries could easily obtain information from T1 by replacing the "count(*)", so I augmented the simple MINUS with a join back to T1 in my previous query.

BTW, I appreciate your new "LINKS I LIKE" tab! Thank you.

Strange behaviour of IN with OR

Mike, November 10, 2004 - 8:48 am UTC

I have the most bizarre CBO'ness (for what of a better word!) that I can not fathom. Below are the tear down and set up needed.

select * from v$banner
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for HPUX: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production


--
-- Clear down
--
drop table HAS_PERMISSION_UNIT_TEST
/
drop table PLT_NODE_PERMISSIONS
/
drop table PLT_SECURITY_NODES
/
drop table PLT_USER_DELEGATES
/
drop table PLT_XDI_PERMISSIONS
/

CREATE TABLE PLT_SECURITY_NODES
( NODE_ID VARCHAR2(16) NOT NULL,
PARENT_NODE_ID VARCHAR2(16),
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/


CREATE INDEX PSN_PSN_FK_I ON PLT_SECURITY_NODES
(PARENT_NODE_ID)
/


CREATE UNIQUE INDEX PSN_PK ON PLT_SECURITY_NODES
(NODE_ID)
/


CREATE INDEX NODE_AND_PARENT_NODE ON PLT_SECURITY_NODES
(NODE_ID, PARENT_NODE_ID)
/


ALTER TABLE PLT_SECURITY_NODES ADD (
CONSTRAINT PSN_PK PRIMARY KEY (NODE_ID))
/


ALTER TABLE PLT_SECURITY_NODES ADD (
CONSTRAINT PSN_PSN_FK FOREIGN KEY (PARENT_NODE_ID)
REFERENCES PLT_SECURITY_NODES (NODE_ID))
/


------------------------------------------------------------------------------------------------


CREATE TABLE PLT_NODE_PERMISSIONS
( NODE_ID VARCHAR2(32) NOT NULL,
USER_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(32) NOT NULL,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/


CREATE UNIQUE INDEX PNP_PK ON PLT_NODE_PERMISSIONS
(NODE_ID, USER_ID, ROLE_ID)
/


CREATE INDEX NODE_AND_USER ON PLT_NODE_PERMISSIONS
(NODE_ID, USER_ID)
/


CREATE INDEX PNP_PSN_FK_I ON PLT_NODE_PERMISSIONS
(NODE_ID)
/


CREATE INDEX PNP_PPR_FK_I ON PLT_NODE_PERMISSIONS
(ROLE_ID, USER_ID)
/


ALTER TABLE PLT_NODE_PERMISSIONS ADD (
CONSTRAINT PNP_PK PRIMARY KEY (NODE_ID, USER_ID, ROLE_ID))
/


ALTER TABLE PLT_NODE_PERMISSIONS ADD (
CONSTRAINT PNP_PSN_FK FOREIGN KEY (NODE_ID)
REFERENCES PLT_SECURITY_NODES (NODE_ID))
/

----------------------------------------------------------------------------


CREATE TABLE PLT_USER_DELEGATES
( USER_ID VARCHAR2(32) NOT NULL,
DELEGATE_USER_ID VARCHAR2(32) NOT NULL,
EFFECTIVE_FROM DATE NOT NULL,
EFFECTIVE_TO DATE,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/


CREATE INDEX PUD_PP_FK_I ON PLT_USER_DELEGATES
(USER_ID)
/


CREATE INDEX PUD_PP_DELEGATE_FK_I ON PLT_USER_DELEGATES
(DELEGATE_USER_ID)
/


CREATE UNIQUE INDEX PUD_PK ON PLT_USER_DELEGATES
(USER_ID, DELEGATE_USER_ID, EFFECTIVE_FROM)
/


ALTER TABLE PLT_USER_DELEGATES ADD (
CONSTRAINT PUD_PK PRIMARY KEY (USER_ID, DELEGATE_USER_ID, EFFECTIVE_FROM))
/



---------------------------------------------------------------------------------------


CREATE TABLE PLT_XDI_PERMISSIONS
(
XDI VARCHAR2(255) NOT NULL,
USER_ID VARCHAR2(32) NOT NULL,
ROLE_ID VARCHAR2(16) NOT NULL,
IS_DELETED NUMBER NOT NULL,
IS_SYSTEM NUMBER NOT NULL,
LAST_MODIFIED_DATETIME DATE,
LAST_MODIFIED_USER_ID VARCHAR2(32)
)
/


CREATE INDEX PXP_PPR_FK_I ON PLT_XDI_PERMISSIONS
(ROLE_ID, USER_ID)
/


CREATE UNIQUE INDEX PXP_PK ON PLT_XDI_PERMISSIONS
(XDI, USER_ID, ROLE_ID)
/


ALTER TABLE PLT_XDI_PERMISSIONS ADD (
CONSTRAINT PXP_PK PRIMARY KEY (XDI, USER_ID, ROLE_ID))
/

begin
--
-- Set up 4 delegates for this user to have access via
--
for idx in 1..4
loop
insert into plt_user_delegates
( user_id, delegate_user_id, effective_from, effective_to, is_deleted, is_system, last_modified_datetime, last_modified_user_id)
values
('MJONES', 'MJONES_DELEGATE_'||idx, sysdate -100, sysdate + 100, 0, 0, sysdate, 'MJONES');
--
end loop;
--
-- Loop 500 times to create some realistic volume of data that this user has
--
for idx in 1..500
loop
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_TOP_LVL', null, 0, 1, sysdate, 'MJONES');
--
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_LEVEL_2', 'NODE_'||idx||'_TOP_LVL', 0, 1, sysdate, 'MJONES');
--
insert into plt_security_nodes
( node_id, parent_node_id, is_deleted, is_system, last_modified_datetime, last_modified_user_id )
values
('NODE_'||idx||'_LVL_3', 'NODE_'||idx||'_LEVEL_2', 0, 1, sysdate, 'MJONES');
--
-- Create permission into Node
--
insert into plt_node_permissions ( node_id, user_id, role_id, is_deleted, is_system ) values (
'NODE_'||idx||'_LVL_3', 'MJONES', 'UNIT_TEST', 0, 0);
end loop;
--
-- Now wedge up some XDI Permissions. Have a total of 100 XDI's
--
for idx in 1..100
loop
insert into plt_xdi_permissions
( xdi, user_id, role_id, is_deleted, is_system)
values
( idx||'_JONES_THE_XDI', 'MJONES', 'UNIT_TEST', 0, 0);
end loop;
end;
/
analyze table plt_node_permissions compute statistics for table for all indexes for all indexed columns
/
analyze table plt_security_nodes compute statistics for table for all indexes for all indexed columns
/
analyze table plt_user_delegates compute statistics for table for all indexes for all indexed columns
/
analyze table plt_xdi_permissions compute statistics for table for all indexes for all indexed columns
/
--
-- Now build up a table of 100,000 rows to base all this malarkey on... Set the access up to start with so that we have access too all 100,000 records in bunches of 200 (500 access records we have access to)
--
create table has_permission_unit_test as
select
to_char(rownum) primary_key,
owner, object_name, subobject_name, object_id, data_object_id, object_type, created, last_ddl_time, timestamp, status, temporary, generated, secondary,
'NODE_'||(mod(rownum,200)+1)||'_LVL_3' security_node_id,
--decode(mod(rownum,1000),0, mod(rownum,1000)||'JONES_THE_XDI',rownum||'_XDI_'||object_name) xdi,
rownum||'_XDI_'||object_name xdi,
0 is_deleted,
sysdate last_modified_datetime,
'BOB' last_modified_user_id
from
( select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
union all
select * from all_objects
)
where rownum <= 100000-- 100K
/
--
alter table has_permission_unit_test modify primary_key varchar2(32)
/
--
alter table has_permission_unit_test add primary key (primary_key)
/
--
analyze table has_permission_unit_test compute statistics for table for all indexes for all indexed columns
/
--
-- END OF SET UP
--


The query that behaves strangley is...

select count(*)
from has_permission_unit_test main
where main.primary_key <= :i_volume
and
(
main.security_node_id in
( select /*use_hash(sn) index_ffs( sn node_and_parent_node ) use_hash(np) index_ffs(np node_and_user)*/ np.node_id
from plt_security_nodes sn, plt_node_permissions np
where sn.node_id = np.node_id
and np.user_id in
( select user_id
from plt_user_delegates ud_first
where ud_first.delegate_user_id = 'MJONES'
and sysdate between ud_first.effective_from and ud_first.effective_to
and ud_first.is_deleted = 0
union all
select 'MJONES' user_id
from dual dual_first
)
connect by prior sn.parent_node_id = sn.node_id
)
-- COMMENT OUT FROM HERE
or
( main.xdi in
( select xi.xdi
from plt_xdi_permissions xi
where xi.user_id in
( select user_id
from plt_user_delegates ud_second
where ud_second.delegate_user_id = 'MJONES'
and sysdate between ud_second.effective_from and ud_second.effective_to
and ud_second.is_deleted = 0
union all
select 'MJONES' user_id
from dual dual_second
)
)
)
-- DOWN TO HERE
)


Running as is with :i_volume = 1000 takes around 4 seconds, and the matches are being done with the security nodes (I beleive), but if you comment out the bottom bit the plan radically changes to start FFS with a couple of indexes I can not see why the "or" changes this. Even if I change this to "or 1 in (select null from dual where 1=0) it still won't FFS. I tried hinting (the + is removed so it won't pick it up to no avail. Why has the performance so dropped off when I stick any OR in?

Thanks in advance, and hope this is enough info.

Mike.

Tom Kyte
November 10, 2004 - 12:05 pm UTC

sorry, this is just "not in scope" here and way too "big" for a review -- I have to answer these things relatively "quick" -- they are in a fifo queue, i either answer it or skip it. I don't go back.

this is a "question" for when I'm taking them.

Fair Enough!

A reader, November 10, 2004 - 12:59 pm UTC

I'll await a question time...

Reducing time

Yogesh, December 04, 2004 - 1:00 pm UTC

I've 2 tables cust(5G) and archive(1G), I want to create third table which will have cust - archive data so I used following qry

create table newcust
nologging
as
select * from cust where c1 not in
(select c1 from archive)
parallel(Degree 8)
/

Cust table have index on c1 column ... archive table do not have any indexes.

This query was running for 7 hours .. I killed this qry as it was taking almost all resources.

Can you please suggest some alternative to speed up this query?

Tom Kyte
December 04, 2004 - 1:46 pm UTC

make sure to use cbo, make sure c1 is NOT NULLable or you say "where c1 is not null" in the subquery.

not sure that parallel would be necessary - big juicy hash anti-join is what I'd be looking for.

Null

Yogesh, December 06, 2004 - 4:58 am UTC

Can you please explain part?

make sure c1 is null or you say "where c1 is not null" in
the subquery.

should it be

"make sure c1 is not null or you say "where c1 is not null" in the subquery."?

One more thing to add. C1 is not null in both the tables. In cust table it is PK and archive table is extract of cust.





Tom Kyte
December 06, 2004 - 11:43 am UTC

thanks, i corrected that -- it should be "make sure c1 is NOT NULL'able"

NULL

Yogesh, December 06, 2004 - 6:57 am UTC

I guess in the above answer you was relating NULL and NOT NULL to /*+ HASH_AJ */.

There are 2 options we can use

/*+ HASH_AJ */
/*+ MERGE_AJ */

which one we should use?


Tom Kyte
December 06, 2004 - 12:09 pm UTC

hash_aj probably. hash one table and then join to it -- versus sort both and merge.

REPORT 6i

Elena, March 18, 2005 - 3:59 am UTC

I live in Ukraine and have a question to you:
"Do you answer on questions about REPORT 6i?"
Or you can give us mail when I would ask my questions.
Please,answer me.
With best regards
Elena

Tom Kyte
March 18, 2005 - 7:04 am UTC

nope -- i've never used reports in my life.

otn.oracle.com -> discussion forums. there is a forum for reports.

HASH_AJ

Yogesh, April 04, 2005 - 6:58 am UTC

I'm using following query

CREATE TABLE ARCHIVE
AS SELECT * FROM SMALL WHERE CNO NOT IN
(SELECT /*+ HASH_AJ */ CNO FROM LARGE WHERE CNO IS NOT NULL)

Following is the query plan

CREATE TABLE STATEMENT Optimizer Mode=CHOOSE
LOAD AS SELECT
FILTER
TABLE ACCESS FULL SMALL 14 K 564 K 236
INDEX FULL SCAN PK_LARGE 867 K 21 M 48303

Size of small table is 20M, and large table is 3225M. Here I was expecting HASH ANTI JOIN.

But if I use following query ...

CREATE TABLE ARCHIVE
AS SELECT * FROM SMALL WHERE CNO IS NOT NULL AND CNO NOT IN
SELECT /*+ HASH_AJ */ CNO FROM LARGE)

It is using ANTI HASH JOIN.

My questions are,

1. Do we have to place NOT NULL in inner query or it can be a part of outer query?

2. Does table size matter?




Tom Kyte
April 04, 2005 - 7:43 am UTC

are your stats up to date, versions are always useful.

HASH_AJ

Yogesh, April 04, 2005 - 9:23 am UTC

DB version 8.0.4.
Stats for small table is up to date. Large table was analyzed 7 days back. Should I analyze it again ?

Tom Kyte
April 04, 2005 - 9:27 am UTC

do this,

select table_name, num_rows from user_tables where table_name in ( 'A, 'B' )

using your table names of course. Anywhere close to reality? post it.


svrmgrl> show parameter hash

as well


HASH_AJ

Yogesh, April 04, 2005 - 10:31 am UTC

TABLE_NAME : LARGE
TABLESPACE_NAME :
CLUSTER_NAME :
IOT_NAME :
PCT_FREE :
PCT_USED :
INI_TRANS :
MAX_TRANS :
INITIAL_EXTENT :
NEXT_EXTENT :
MIN_EXTENTS :
MAX_EXTENTS :
PCT_INCREASE :
FREELISTS :
FREELIST_GROUPS :
LOGGING :
BACKED_UP : N
NUM_ROWS : 17351339
BLOCKS : 399153
EMPTY_BLOCKS : 12287
AVG_SPACE : 1305
CHAIN_CNT : 0
AVG_ROW_LEN : 153
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 0
LAST_ANALYZED : 28-mar-2005 20:31:03
PARTITIONED : YES
IOT_TYPE :
TEMPORARY : N
NESTED : NO
BUFFER_POOL :

###############################################################

TABLE_NAME : SMALL
TABLESPACE_NAME : DB1
CLUSTER_NAME :
IOT_NAME :
PCT_FREE : 10
PCT_USED : 40
INI_TRANS : 1
MAX_TRANS : 255
INITIAL_EXTENT : 5242880
NEXT_EXTENT : 5242880
MIN_EXTENTS : 1
MAX_EXTENTS : 2147483645
PCT_INCREASE : 0
FREELISTS : 1
FREELIST_GROUPS : 1
LOGGING : YES
BACKED_UP : N
NUM_ROWS : 282211
BLOCKS : 2449
EMPTY_BLOCKS : 110
AVG_SPACE : 2792
CHAIN_CNT : 0
AVG_ROW_LEN : 41
AVG_SPACE_FREELIST_BLOCKS : 4293
NUM_FREELIST_BLOCKS : 9
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 0
LAST_ANALYZED : 03-apr-2005 18:24:31
PARTITIONED : NO
IOT_TYPE :
TEMPORARY : N
NESTED : NO
BUFFER_POOL : DEFAULT

SVRMGR> show parameter hash
NAME TYPE VALUE
----------------------------------- ------- ------------------------------
hash_area_size integer 0
hash_join_enabled boolean TRUE
hash_multiblock_io_count integer 1
sequence_cache_hash_buckets integer 10


Tom Kyte
April 04, 2005 - 12:02 pm UTC

just wanted num rows....

you said:

Size of small table is 20M, and large table is 3225M. Here I was expecting
HASH ANTI JOIN.


stats say 282,211 and 17,351,339


they are "off" a tad. something isn't adding up.



HASH_AJ

Yogesh, April 04, 2005 - 12:39 pm UTC

M -> MB. Sorry about that ...

Tom Kyte
April 04, 2005 - 1:44 pm UTC

so you didn't have row counts before, they were sizes?

what is your sort area size?


HASH_AJ

Yogesh, April 04, 2005 - 2:00 pm UTC

Yes, 20MB is the size of small table, and large table size is 3225MB.

sort_area_size = 10000000

Tom Kyte
April 04, 2005 - 5:19 pm UTC

guess you can do the "do it yourself" way if you believe it'll be faster:

select small.*
from large, small
where large.key(+) = small.key
and large.key is null;


HASH_AJ

Yogesh, April 05, 2005 - 5:15 am UTC

I was trying to understand why it is not able to use HASH_AJ hint in this specific case. Because it is working with other queries. If I'm using two tables small - 1GB+ and Large - 5GB+, it works fine .....

"poor mans hash anti-join" ? Good idea ..

Tom Kyte
April 05, 2005 - 7:32 am UTC

8.0.4 could have some definite impact on this. really old stuff.

HASH_AJ

Yogesh, April 05, 2005 - 8:43 am UTC

I Agree ..

Displaying the column of inner query into outer query

Baskar, May 31, 2005 - 11:26 am UTC

Hi Tom,

Please look into this query.

SELECT
statement_date,
revenue_centre,
bill_create_date,
jnl_end_date,
company_code,
revenue
FROM ods_merc_bill_rev stg
WHERE NOT EXISTS (SELECT 1
FROM tmp_rtmr_accounts tra
WHERE stg.gl_account_code||' 000000' = tra.account_code_n106)
OR NOT EXISTS (SELECT account_code_n106,product_code
FROM tmp_rtmr_accounts tra
WHERE stg.gl_account_code||' 000000' = tra.account_code_n106
and EXISTS (SELECT 1
FROM tmp_rtmr_products trp
WHERE
tra.product_code = trp.product_code ))


Now I want the product code which is used in inner most query to be displayed in main query..
How to bring this into main query. Actually It should
return only three rows. If I insert the product code into main query, It is replicated. How to do!..Please advice me..




Table structure for ods_merc_bill_rev:



PROCESS_KEY NUMBER (12)
COMPANY_CODE VARCHAR (6)
REVENUE_CENTRE VARCHAR (6)
STATEMENT_DATE VARCHAR (8)
BILL_CREATE_DATE VARCHAR (8)
JNL_END_DATE VARCHAR (8)
BILL_PERIOD VARCHAR (3)
GL_ACCOUNT_CODE VARCHAR (10)
REVENUE NUMBER (20,2)



Table structure for tmp_rtmr_accounts:



ACCOUNT_CODE_N106 VARCHAR (17)
PRODUCT_CODE VARCHAR (20)


Table structure for tmp_rtmr_products:

PRODUCT_CODE VARCHAR (20)



Please have a look at this and give me a solution as soon as possible..

Thanks in Advance!

Regards,
baskar





Tom Kyte
May 31, 2005 - 6:34 pm UTC

you'll be joining - but it doesn't seem to make sense to me since you say "or not exists ( product code )"

how can you show something that doesn't exist.


(i'm not here to write SQL for you, I'll be glad to answer questions -- but please don't think I'm going to write your SQL as soon as possible, that is just not reasonable)

hash_aj in 10G

Yogesh, August 05, 2005 - 11:53 am UTC

What is the replacement of hash_aj in 10g? or do we don't have to use hash_aj anymore? Sorry I don't have installed version of 10g. I read this in one document.

Tom Kyte
August 05, 2005 - 2:14 pm UTC

the "hint" seems to be not documented, but the anti join lives on

ops$tkyte@ORA10G> create table t1 ( x int not null, y int );
 
Table created.
 
ops$tkyte@ORA10G> create table t2 ( x int not null, y int );
 
Table created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'T1', numrows=>1000000, numblks=> 10000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'T2', numrows=>1000000, numblks=> 10000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> @plan "select * from t1 where x not in (select x from t2 )";
ops$tkyte@ORA10G> delete from plan_table;
 
4 rows deleted.
 
ops$tkyte@ORA10G> explain plan for &1;
old   1: explain plan for &1
new   1: explain plan for select * from t1 where x not in (select x from t2 )
 
Explained.
 
ops$tkyte@ORA10G> select * from table(dbms_xplan.display);
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------
Plan hash value: 629543484
 
--------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    39 |       |  8525   (2)|
|*  1 |  HASH JOIN RIGHT ANTI|      |     1 |    39 |    23M|  8525   (2)|
|   2 |   TABLE ACCESS FULL  | T2   |  1000K|    12M|       |  2742   (2)|
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    24M|       |  2742   (2)|
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("X"="X")
 
15 rows selected.
 
 

anti join

Yogesh, August 09, 2005 - 5:14 am UTC

so optimizer is smart enough to make this decision without specifying any hint?

Tom Kyte
August 09, 2005 - 10:02 am UTC

i did not use a hint

Strange outcome of NOT IN

SAF, August 11, 2005 - 4:08 am UTC

Hi Tom,

I'm using NOT IN to clear up a table and turning up with some strange results. Hope this is not too stupid... anyway here goes. 

The basic idea is to clean up an old large table, based on a smaller new one. Both tables have no null elements. 

--create a "small" table first. 
CREATE TABLE m1_msisdns_int
(msisdn         varchar2(18)    NOT NULL);

--it's filled with some data...

--create table of unmatched keys
CREATE TABLE unmatched_msisdns
(msisdn         varchar2(18)    NOT NULL);

--fill it up 
INSERT INTO unmatched_msisdns
(SELECT mi.msisdn FROM medapp.msisdn_imei mi WHERE mi.msisdn NOT IN
(SELECT msisdn FROM m1_msisdns_int));

OK, now records in unmatched_msisdns *should* be in msisdn_imei and *should not* be in m1_msisdns_int. But this is not the case. For example, picking some random records:

SQL> select * from unmatched_msisdns where rownum < 5;

MSISDN
------------------
6581000016
6581000017
6581000032
6581000033

SQL> select msisdn from medapp.msisdn_imei where msisdn = '6581000016';

MSISDN
------------------
6581000016

SQL> select msisdn from m1_msisdns_int where msisdn = '6581000016';

MSISDN
------------------
6581000016

Why is the record in the m1_msisdns_int table too? What am I doing wrong?

Thanks.  

Tom Kyte
August 11, 2005 - 9:55 am UTC

show us from start to finish -- eg: table unmatched starts EMPTY.

do you have a test case I can reproduce with?

Strange outcome of NOT IN: details

SAF, August 12, 2005 - 4:37 am UTC

Hi Tom,

Thanks for response.

Here are more details of the situation. I run this script first: 

========================
DROP TABLE m1_msisdns_ext;
DROP TABLE m1_msisdns_int;
DROP TABLE unmatched_msisdns;

CREATE OR REPLACE DIRECTORY data_dir AS '/home/oracle';

CREATE TABLE m1_msisdns_ext
(msisdn         varchar2(18))
ORGANIZATION EXTERNAL
(TYPE oracle_loader
        DEFAULT DIRECTORY data_dir
        ACCESS PARAMETERS
        (RECORDS DELIMITED BY NEWLINE
        fields terminated by ',')
        location ('msisdns_20050811.dat')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;

CREATE TABLE m1_msisdns_int
(msisdn         varchar2(18)    NOT NULL);

INSERT INTO m1_msisdns_int (SELECT msisdn FROM m1_msisdns_ext);

DROP TABLE m1_msisdns_ext;

commit;

CREATE TABLE unmatched_msisdns
(msisdn         varchar2(18)    NOT NULL);

INSERT INTO unmatched_msisdns (SELECT msisdn FROM medapp.msisdn_imei WHERE msisdn NOT IN (SELECT msisdn FROM m1_msisdns_int));

commit;

========================

Now,
1. The data file to create m1_msisdns_ext (msisdns_20050811.dat) is just a single column file with strings e.g.:
6596886272
6597600746
6593686340
6596875161
6594745290
6597902848
6598493940
6590933396
6597440556
6593672611

You can generate it yourself, or I can pass one to you.

2. The table medapp.msisdn_imei also contains a list of strings like m1_msisdns_int, but a much larger set. 

3. We're using Oracle 10g, on Lintel. All results shown here using SQLPlus.

Just to be sure, I do these checks:

SQL> select count(*) from medapp.msisdn_imei;

  COUNT(*)
----------
   1652979

SQL>  select count(*) from m1_msisdns_int;

  COUNT(*)
----------
   1204505

SQL> select count(*) from unmatched_msisdns;

  COUNT(*)
----------
    449069

SQL> select * from m1_msisdns_int where msisdn is null;

no rows selected

SQL> select * from medapp.msisdn_imei where  msisdn is null;

no rows selected

However, when I run the NOT IN based insertion, I still see some (some, not all) records in the unmatched_msisdns where some records belong to both tables (m1_msisdns_int and medapp.msisdn_imei). 

Finally, as an aside, when I select with quote a query does not work, but works without quotes!! Weird, isn't it??:

SQL> select * from unmatched_msisdns where msisdn = 6581000016;

MSISDN
------------------
6581000016

SQL> select * from unmatched_msisdns where msisdn = '6581000016';

no rows selected

Sorry for the long post, but hope it makes things clear. 

Thanks for your time and effort.  

Tom Kyte
August 12, 2005 - 8:45 am UTC

test case, something I can cut and paste and just @runit

Check for trailing spaces

Jono, August 12, 2005 - 9:12 am UTC

I would suggest checking if the unmatched_msisdns table contains any trailing spaces after the digits. These would be ignored when an implicit conversion to a number took place and will not be noticeable in the normal SQL*plus query output.

NOT IN: details

SAF, August 14, 2005 - 10:02 pm UTC

For the test case, I'd need to pass you the data file. How do I do that?

Jono:

Thanks for your input. However there are no trailing spaces in the string. Is there a way to turn this behaviour off?

Tom Kyte
August 15, 2005 - 7:00 am UTC

develop a method to generate, to reproduce the ISSUE.


However, I just noticed you said:

SQL> select * from unmatched_msisdns where msisdn = 6581000016;

MSISDN
------------------
6581000016

SQL> select * from unmatched_msisdns where msisdn = '6581000016';

no rows selected


I disagree with your analysis of the trailing whitespace.  When you compare the string to a number, there is an implicit to_number applied to the string:


ops$xp10gr1\tkyte@ORA10GR1> create table t ( x varchar2(18) );

Table created.

ops$xp10gr1\tkyte@ORA10GR1> @plan "select * from t where x = 1"

0 rows deleted.

old   1: explain plan for &1
new   1: explain plan for select * from t where x = 1

Explained.


PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
<b>
   1 - filter(TO_NUMBER("X")=1)
</b>
Note
-----
   - dynamic sampling used for this statement

17 rows selected.

<b>when you compare a string to a string, you do NOT have that happening:</b>

ops$xp10gr1\tkyte@ORA10GR1> @plan "select * from t where x = '1'"

2 rows deleted.

old   1: explain plan for &1
new   1: explain plan for select * from t where x = '1'

Explained.


PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    11 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    11 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
<b>
   1 - filter("X"='1')
</b>
Note
-----
   - dynamic sampling used for this statement

17 rows selected.



So, I believe you DO have trailing whitespace on some of those.  Look again.


select * from t 
where x like '% '
  or x like '%'||chr(9)
  or x like '%' || chr(10) 
  or x like '%' || chr(13)
/


for example. 

spaces

SAF, August 16, 2005 - 6:08 am UTC

The trailing spaces were the culprit indeed.

Thanks a lot.

A reader, August 24, 2005 - 3:54 pm UTC

Tom,
Sorry for dragging this thread back to the basics a bit, but going back to your first reply
select count(*) from emp where empno not in (Select nvl(mgr,0) from emp)
appears to bring the correct result.
What I am trying to understand is when the "NOT EXISTS" works surely it should also say 'Hey, I see a Null value and I don't know if it maybe or may not be so I can ignore'. I mean why is NOT IN alone returning count(*)=0, comparitively.

Thanks

Tom Kyte
August 25, 2005 - 3:25 am UTC

the where exists DOESN'T see the null. That is the point :)

we said "not exists ( select null from emp T2 where t2.mgr = t1.empno );"

and sure enough for that t1.empno, there were no rows.



Great Stuff

yogi, September 09, 2005 - 2:57 pm UTC

Thanks this was very useful .I used this to bring down the response time of a query from 40 minutes to some few seconds .Keep the good work going.


Not Exists

V, October 27, 2005 - 4:09 pm UTC

Tom I have the following which takes forever to run:
..
..
AND NOT EXISTS (SELECT null FROM tm tp WHERE tp.id = cd.id)
AND NOT EXISTS (SELECT null FROM t1 tp WHERE tp.id = cd.id)

can this be changed to:
AND NOT EXISTS (SELECT null FROM tm tp,t1 tp1 WHERE tp.id = cd.id OR tp1.id = cd.id )

Would this result in the same result?

Tom Kyte
October 28, 2005 - 1:42 am UTC

why do you believe the not exists is the problem? (you are using the CBO right?)



No, it cannot be changed to what you have, that query is not the same as the prior two.


Using the cbo, you might:

and cd.id not in ( select id from tm where id is not null
union all
select id from t1 where id is not null )




Is this way of existence check valid?

Ford Desperado, November 29, 2005 - 4:17 pm UTC

Tom,

I've come across an application where the client checks if there are any rows meeting some criteria. Here is how it is done:

select count(*) from some_table where (some complex criteria) and rownum=1
/* also in some places rownum<2 is used */

Is is a safe way for existence checks?

Isn't it better to check existence like this:

select case when exists(select 1 from some_table where (some complex criteria)) then 1 else 0 end a from dual

Thank you



Tom Kyte
November 30, 2005 - 11:07 am UTC

here is the best way to do this:


THIS SPACE INTENTIONALLY LEFT BLANK


I hate that check for existence, hardware vendors love them since it increases sales, but in software it is typically "not useful"


I take the approach of "rather than see if there is work to be done, I'll try to do work and at that point discover one of two things:

a) work is there to be done, lets do it
b) work is not there to be done, skip it


the rownum check works however.

great, thanks!

Ford Desperado, November 30, 2005 - 1:33 pm UTC

Thanks Tom

A reader, January 04, 2006 - 9:52 am UTC

Very helpful, thanks

How can I get a hash mechanism working for this query

Naresh, March 30, 2006 - 12:01 pm UTC

Hello Tom,

I have the query below - service_agreement_tas_d and cm1_agreement_param_tas_d have about 2 million rows each, csm_offer_Param has about 1000 rows. Tables are not analyzed as we are doing a data conversion where the tables do not have indexes. The join in the sub query is on key fields (meaning agreement_no, param_name, offer_instance_id define a unique row in cm1_agreement_param):

  1  select
  2              mod(agreement_no, 100) agreement_key,
  3              agreement_no, soc, 30000000 + rownum param_seq_no,
  4              sa.effective_date, sa.expiration_date, sa.source_agr_no, sa.soc_seq_no,
  5              param_instance_level, param_name, param_values,
  6              row_number() over
  7                     (partition by agreement_no, param_name order by soc_seq_no) rn
  8    from service_agreement_tas_d sa,
  9    (select distinct soc_cd, param_name, populate_level, param_values,
 10     mandatory_ind, param_instance_level from csm_offer_param ) op
 11    where sa.soc = op.soc_cd
 12      and not exists (select /*+ HASH_AJ */ NULL from cm1_agreement_param_tas_d p
 13                       where p.agreement_no = sa.agreement_no
 14                             and p.param_name = op.param_name
 15                             and ( (op.param_instance_level = 'O'
 16                                     and p.offer_instance_id = sa.soc_seq_no)
 17                                     or (op.param_instance_level = 'S'
 18                                             and p.offer_instance_id is NULL)
 19                                     )
 20*                      )
SQL> /
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=124560227 Card=19271
          2 Bytes=425700808)
   1    0   WINDOW (SORT) (Cost=68275 Card=192712 Bytes=425700808)
   2    1     COUNT
   3    2       FILTER
   4    3         HASH JOIN (Cost=9984 Card=192712 Bytes=425700808)
   5    4           VIEW (Cost=4340 Card=14703 Bytes=31508529)
   6    5             SORT (UNIQUE) (Cost=4340 Card=14703 Bytes=315967
          47)
   7    6               TABLE ACCESS (FULL) OF 'CSM_OFFER_PARAM' (Cost
          =6 Card=14703 Bytes=31596747)
   8    4           PARTITION LIST (ALL)
   9    8             TABLE ACCESS (FULL) OF 'SERVICE_AGREEMENT_TAS_D'
           (Cost=1156 Card=3854235 Bytes=254379510)
  10    3         PARTITION LIST (ALL)
  11   10           TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D'
           (Cost=646 Card=1 Bytes=155)


Is it possible for the last full table access in the sub-query to use a hash mechanism?

I tried using HASH_AJ in inner query

 and not exists (select /*+ HASH_AJ */ NULL from cm1_agreement_param_tas_d p
                          where p.agreement_no = sa.agreement_no

But that did not work.

Hash related settings:

  1* select name, value from v$parameter where name like '%hash%'
SQL> /

NAME                           VALUE
------------------------------ ------------------------------
hash_join_enabled              TRUE
hash_area_size                 50000000

Thanks,
Naresh 

Tom Kyte
March 31, 2006 - 11:28 am UTC

code it as an outer join instead of a subquery and add

and p.agreement_no is null

to the query.

HASH_AJ for not in

Naresh, March 31, 2006 - 12:56 am UTC

Hello Tom,

Version used is 9.2.0.6

I tried a variation for above query with a not in - still does not use hash_aj. I put nvl for each column in the inner select and also set always_anti_join=hash (I do not know if this parameter is relevant in the version we use - 9.2.0.6). What can be done to get a hash join working for this?

   1  select
  2              agreement_no, soc, 30000000 + rownum param_seq_no,
  3              sa.effective_date, sa.expiration_date, sa.soc_seq_no,
  4              param_instance_level, param_name, param_values,
  5              row_number() over
  6                     (partition by agreement_no, param_name order by soc_seq_no) rn
  7    from service_agreement_tas_d sa,
  8    (select distinct soc_cd, param_name, populate_level, param_values,
  9     mandatory_ind, param_instance_level from csm_offer_param ) op
 10    where sa.soc = op.soc_cd
 11      and (nvl(agreement_no,0), nvl(param_name,' '), nvl(soc_seq_no, -1)) not in
 12     (select /*+ HASH_AJ */ nvl(agreement_no,0), nvl(param_name,' '),
 13             nvl(offer_instance_id, -1)
 14*            from cm1_agreement_param_tas_d)
SQL> /

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=226118412 Card=19271
          2 Bytes=423195552)
   1    0   WINDOW (SORT) (Cost=67236 Card=192712 Bytes=423195552)
   2    1     COUNT
   3    2       FILTER
   4    3         HASH JOIN (Cost=9276 Card=192712 Bytes=423195552)
   5    4           VIEW (Cost=4340 Card=14703 Bytes=31508529)
   6    5             SORT (UNIQUE) (Cost=4340 Card=14703 Bytes=315967
          47)
   7    6               TABLE ACCESS (FULL) OF 'CSM_OFFER_PARAM' (Cost
          =6 Card=14703 Bytes=31596747)
   8    4           PARTITION LIST (ALL)
   9    8             TABLE ACCESS (FULL) OF 'SERVICE_AGREEMENT_TAS_D'
           (Cost=1156 Card=3854235 Bytes=204274455)
  10    3         PARTITION LIST (ALL)
  11   10           TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D'
           (Cost=1173 Card=489 Bytes=75795)


The Card=489 for 'CM1_AGREEMENT_PARAM_TAS_D is different from the one below:

SQL> select count(*) from CM1_AGREEMENT_PARAM_TAS_D;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=RULE (Cost=1173 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     PARTITION LIST (ALL)
   3    2       TABLE ACCESS (FULL) OF 'CM1_AGREEMENT_PARAM_TAS_D' (Co
          st=1173 Card=3910431)

And this is what the user_tables shows

SQL> SQL> select lasT_analyzed, num_rows, USER_STATS from user_tables where table_name in
  2  ('SERVICE_AGREEMENT_TAS_D', 'CM1_AGREEMENT_PARAM_TAS_D');

LAST_ANAL   NUM_ROWS USE
--------- ---------- ---
                     NO
                     NO


Thanks,
Naresh
 

Adding != NULL in sub-query did it

naresh, March 31, 2006 - 5:40 am UTC

Writing the sub-query thus, gave the desired result

(select /*+ HASH_AJ */ nvl(agreement_no,0), nvl(param_name,' '),
nvl(offer_instance_id, -1)
from cm1_agreement_param_tas_d
where agreement_no is not NULL and param_name is not null and nvl(offer_instance_id,-1) is not null)

Thanks,
Naresh.

Query Optimization related to EXIST

Soni, May 20, 2006 - 12:55 am UTC

Please look into the part of query below. I need to optimize the query. This looks very big and not optimized. Please advice.

AND EXISTS ((SELECT 1
FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus > 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus < 90
AND sr.supplysource <> 1)

)

Tom Kyte
May 20, 2006 - 4:52 pm UTC

how do you know it is not optimized?

I know it is not syntactically "meaningful"

ops$tkyte@ORA10GR2> select *
  2    from dual
  3   where 1=1
  4     and exists ( (select 1 from dual)
  5                  or exists (select 1 from dual)
  6                  or exists (select 1 from dual)
  7                )
  8  /
                or exists (select 1 from dual)
                *
ERROR at line 5:
ORA-00907: missing right parenthesis
 

Optimization

Soni, May 21, 2006 - 11:01 am UTC

Thanks TOM, I know it might be wrong. I am new to Oracle and have to get this done using EXIST. I understand it might be wrong. I am learning and working too. Please help me modify this. I was not getting a way to solve this query having OR conditions with EXIST. Please advice.

Tom Kyte
May 21, 2006 - 8:12 pm UTC

these two:

OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus > 90
AND sr.supplysource = 1)
OR EXISTS (SELECT 1
FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.order_staus < 90
AND sr.supplysource <> 1)

can obviously be rolled into one - but it would require an OR which wouldn't be much different from what you have if an index on qty,status,supplysource were being used.

basically "not sufficient data to really comment"

exists is not inheritly "bad"

Cursor Modification

Soni, May 22, 2006 - 1:18 am UTC

Hi Tom, I am here writting the whole code for the above query I have to make the changes in a Cursor.
The change description is :
In the c_recship cursor, add an edit to first of all check to see if the t.order_id, t.item and t.fromloc exist on the springs.si_jde_schedrcpts_work sr table, joining t.order_id to sr.order_id, t.item to sr.item and t.fromloc to sr.loc, where the sr. order_status < 90 and the sr.supplysource = 1. If found, create the recship. If it exists and the order_status > 90 or the t.order_id, t.item and t.fromloc does not exist at all as a supplysource = 1 row on the si_jde_schedrcpts_work table, then check the stsc.inventory table joining t.item to inventory.item, t.fromloc to inventory.loc and create recship if inventory.qty = > t.stqty.
Please suggest for the changes done by me.

CURSOR c_recship IS
SELECT t.item item, t.toloc dest, t.fromloc source, s.transmode
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') availtoshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needshipdate
, 4 type, 0 actionallowedsw, 1 firmsw , ROUND(t.stqty,2) qty, 0 orderid
, t.item primaryitem, 1 sys_info, t.order_id
FROM
(SELECT a.item, a.duedate
, b.fromloc, a.toloc, stqty, a.order_number order_id
FROM (SELECT DISTINCT item, TRUNC(duedate) duedate
, toloc, order_number
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'OT'
) a
, (SELECT item, TRUNC(duedate) duedate, fromloc
, rel_ot_ord_nbr, SUM(NVL(net_st_qty,0)) stqty
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'ST'
GROUP BY item, TRUNC(duedate), fromloc, rel_ot_ord_nbr
HAVING SUM(NVL(net_st_qty,0)) <> 0
) b
WHERE a.order_number = b.rel_ot_ord_nbr
AND a.item = b.item
) t, stsc.sourcing s
WHERE NOT EXISTS (SELECT 1 FROM stsc.loc
WHERE scen = 0 AND loc = t.toloc
AND NVL(TRIM(si_loc_type),' ') = 'DC'
)
--Change by Soni

AND EXISTS((SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id AND t.item = sr.item
AND t.fromloc = sr.fromloc AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS(SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND inventory.qty => t.stqty AND sr.order_staus > 90)
OR EXISTS(SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND inventory.qty => t.stqty AND sr.supplysource=1))
--Change by Soni till here
AND s.transmode=springs.si_transmode_lookup(t.fromloc

,t.toloc
,' ',t.item
)
AND s.item = t.item
AND s.source = t.fromloc
AND s.dest = t.toloc
AND s.scen = 0;

Tom Kyte
May 22, 2006 - 7:43 am UTC

suggest you ask "why the heck would I have to do this"

WHERE TRIM(UPPER(order_type)) = 'OT'


what would possibly cause you to have to use trim and upper. ugh.

Optimization

Soni, May 22, 2006 - 8:26 am UTC

The other part of query is old and I don't have to change it. I have got the change to edit only a part of it. Please help as this is too urgent and I am new to Oracle.

Thanks for your patient.

Tom Kyte
May 22, 2006 - 3:50 pm UTC

is it working, that is the question.

it'll do what you programmed it to - is it working for you.

Checking for optimization

Soni, May 23, 2006 - 3:05 am UTC

CURSOR c_recship IS
SELECT t.item item, t.toloc dest, t.fromloc source, s.transmode
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedarrivedate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') availtoshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') schedshipdate
, TO_CHAR(NVL(t.duedate,dt_ohpost),'YYMMDD') needshipdate
, 4 type, 0 actionallowedsw, 1 firmsw , ROUND(t.stqty,2) qty, 0 orderid , t.item primaryitem, 1 sys_info, t.order_id
FROM
(SELECT a.item, a.duedate
, b.fromloc, a.toloc, stqty, a.order_number order_id
FROM (SELECT DISTINCT item, TRUNC(duedate) duedate
, toloc, order_number
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'OT'
) a
, (SELECT item, TRUNC(duedate) duedate, fromloc
, rel_ot_ord_nbr, SUM(NVL(net_st_qty,0)) stqty
FROM springs.si_jde_intransit_work
WHERE TRIM(UPPER(order_type)) = 'ST'
GROUP BY item, TRUNC(duedate), fromloc, rel_ot_ord_nbr
HAVING SUM(NVL(net_st_qty,0)) <> 0
) b
WHERE a.order_number = b.rel_ot_ord_nbr
AND a.item = b.item
) t, stsc.sourcing s
WHERE NOT EXISTS (SELECT 1 FROM stsc.loc
WHERE scen = 0 AND loc = t.toloc
AND NVL(TRIM(si_loc_type),' ') = 'DC'
)

--Changes by Soni from here

AND (EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work
sr WHERE t.order_id = sr.t.order_id AND t.item = sr.item AND t.fromloc = sr.fromloc AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
)
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)
)
--Changes by Soni till here

AND s.transmode=springs.si_transmode_lookup(t.fromloc
,t.toloc ,' ',t.item
)
AND s.item = t.item AND s.source = t.fromloc
AND s.dest = t.toloc AND s.scen = 0;
**********
The changes done by me are between commnets. Please tell me if this correct. I have tested this. Its not giving any compilation error. Please check the part modified by me.
The description of the change is in the question asked before this. I want to put them not in three select statement but atleast in only two.
Thanks.


Tom Kyte
May 23, 2006 - 7:33 am UTC

looks ok to me if it is getting the *right answer*.

any thoughts on this query

A reader, June 15, 2006 - 11:03 am UTC

Tom,


Please keep in mind that main_report_tbl contains 4261053 records

select yard_area , area_name , chassis_lt_description , visit , gate_id , reference_id ,
ssl_user_code , container , lane_out_by , cont_ck , container_lht_description ,
gross_weight , seals , temp_mins , temp_maxs , location , loc_type , booking ,
trucker_description , driver , chassis , genset , hazardous , guard_out_date , voided_date ,
visit_voided_date , haz , vessel_voy , tir , ckr , sec , action , eir_id ,
foreign , sealink_code , lane_in_date , ssl_user_description , port_of_destination_desc ,
port_of_discharge_description , reefer , net_elapsed_time , net_time , rpm_time , clerical_time ,
clerical_services_time , remarks , damage_description , action ,
chassis , loc_type
from main_report_tbl
where guard_out_date is not null
and voided_date is null
and visit_voided_date is null
and substr ( action , 2 , 1 ) not in ( 'C' , 'G' )
and lane_in_date between to_date( '13-JUN-2005 00:00:00', 'dd-mon-yyyy hh24:mi:ss' )
and to_date( '13-JUN-2005 23:59:59', 'dd-mon-yyyy hh24:mi:ss' )

Tom Kyte
June 16, 2006 - 5:48 pm UTC

looks great!

as long as you want that data that is...

I mean, it doesn't get "much simplier than this"


cannot suggest anything unless you were to say about how many records you might generally anticipate, to see if an index makes sense or not.

Brad, July 09, 2006 - 10:33 pm UTC

Tom..
i have some trouble with a sql...
insert into table T
select /*+ parallel(x,4) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (2,3)
group by a
the above sql runs fine..in about 10 mins...

But i have a limitation that i have to write the above sql
like this
insert into table T
select /*+ parallel(x,4) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y)
group by a..

table Y has 2 rows (2 and 3)

Now this SQL takes 40 mins to complete...any ideas improve this performance...please let me know if i am being vague...Thanks

Tom Kyte
July 10, 2006 - 7:44 am UTC

is b in Y "NOT NULL" or is it "NULLABLE"

and I assume that statistics are "there" and accurate?

Brad, July 10, 2006 - 4:19 pm UTC

Both the tables are analyzed and the plans are as shown below ...table X has around 7.4 million rows and table y has 2 rows.
The column b in table y does not have null values but its nullable....

the output of the select statement will be arond 170,000 rows....


SQL1:
-----
The below sql takes 25 mins to comeback
select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y)
group by a..

Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 162987
SORT GROUP BY 73 K 44 M 16217
FILTER
TABLE ACCESS FULL ODS.X 369 K 224 M 10175 :Q1638707000 P->S QC (RANDOM)
TABLE ACCESS FULL ODS.Y 1 4 2

SQL2:
----
The below SQL takes 6 mins to comeback

select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (2,3)
group by a..



Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 96849
SORT GROUP BY 73 K 44 M 96849 :Q1638710001 P->S QC (RANDOM)
SORT GROUP BY 73 K 44 M 96849 :Q1638710000 P->P HASH
TABLE ACCESS FULL ODS.X 7 M 4G 10175 :Q1638710000 PCWP


Tom Kyte
July 11, 2006 - 7:34 pm UTC

add "and y is not null" to the subquery.



Brad, July 11, 2006 - 10:30 pm UTC

Tom..i tried what you said..still no change...

select /*+ parallel(x,2) */
a,
sum(c),
....
30 columns all sum()
from table X
where b not in (select b from table Y where b is not null)
group by a..



Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 73 K 162987
SORT GROUP BY 73 K 44 M 16217
FILTER
TABLE ACCESS FULL ODS.X 369 K 224 M 10175 :Q1640640000 P->S QC (RANDOM)
TABLE ACCESS FULL ODS.Y 1 4 2


Tom Kyte
July 12, 2006 - 3:34 pm UTC

make it so the plan fits on a line (so we humans can read it easily)

are the ESTIMATED CARDINALITIES correct

and how long does it take to just full scan this table on your instance? I mean, does it take 6 minutes to do that full scan?

not in vs exists

jas, July 11, 2006 - 11:16 pm UTC

I have a query like

select request_id from request
where request_id not in ( select ref from refer_tab)
and reuest_id not in ( select sed from sed_tab)

in whose explian plan ( select ref from refer_tab) query was not using index.

when i write it like .....
select request_id from request
where exists( select ref from refer_tab)
and reuest_id not in ( select sed from sed_tab)

it gives me the same output and also costs is some 10000 times low and query executes fast.

I can't understand the reason that exists and not in gives same result.




Tom Kyte
July 12, 2006 - 3:37 pm UTC

they don't in general give the same results (eg: they are NOT semantically equivalent)


ops$tkyte@ORA10GR2> create table t1 ( x int );

Table created.

ops$tkyte@ORA10GR2> create table t2 ( x int );

Table created.

ops$tkyte@ORA10GR2> create table t3 ( x int );

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into t1 values ( 1 );

1 row created.

ops$tkyte@ORA10GR2> insert into t2 values ( NULL );

1 row created.

ops$tkyte@ORA10GR2> insert into t3 values ( 3 );

1 row created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select *
  2    from t1
  3   where x not in (select x from t2)
  4     and x not in (select x from t3)
  5  /

no rows selected

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select *
  2    from t1
  3   where NOT EXISTS (select null from t2 where t2.x = t1.x)
  4     and x not in (select x from t3)
  5  /

         X
----------
         1


is ref in refer_tab NULLABLE.

if it is, add the predicate "where ref is not null" to the not in so that NOT IN and NOT EXISTS (I'm assuming your second example, which is totally "syntax free" was really a not exists and not exists...) all of a sudden become semantically "the same" 

CBO not choosing hash anti join

Morten, August 01, 2006 - 6:18 pm UTC

Having worked with Oracle since v.6 I have never really gotten comfortable with the CBO.

I find that often (always?) I would like my queries to behave predictably and consistently rather than potentially (rarely) faster.

Having said that, I realize that I am being backwards about this, and thus I have set out to explore your brilliant site in search of the one thing that will win me over.

This thread is at the core of my issue with the CBO: I have a query that is clearly well suited for a hash anti join, and it runs in about 2 mins using the "poor man's hash anti join".

Both tables involved are indexed and analyzed, so I set out to rewrite my query to use NOT IN, hoping that the CBO would do the rest for me. The result: Indexes used, decrease of perfomance by a factor 3. NOT EXISTS has the same problem. No NULL columns involved.

Normally I would just shake my head and use my hard-coded anti join, but I have a little bit of time on my hands to experiment, so ...

Could you possibly list a few reasons why the CBO would not pick the hash anti join for my query (looking for new records from mta_tab_2 to add to mta_tab_1)?

mta_tab_1 has 2819134 records (acct_id,start_dt unique)
mta_tab_2 has 1844852 records

SELECT x.acct_id
,x.start_dt
FROM mta_tab_2 x
,(SELECT a.acct_id
,a.start_dt
FROM (SELECT acct_id
,start_dt
,end_dt
FROM mta_tab_1
) a
,(SELECT acct_id
,MAX(start_dt) start_dt
FROM mta_tab_1
GROUP BY acct_id
) b
WHERE b.acct_id = a.acct_id
AND :dt BETWEEN a.start_dt AND a.end_dt
AND a.start_dt = b.start_dt
) y
WHERE :dt BETWEEN x.start_dt AND x.end_dt
AND y.start_dt IS NULL
AND y.acct_id(+) = x.acct_id;


Tom Kyte
August 02, 2006 - 10:56 am UTC

get used to the cbo - it is a fact, in 10g, it is just a plain and simple fact of life.

I always start with comparing the "guess" with "reality"

compare an autotrace traceonly explain
with
a tkprof

see where the optimizer gets estimated cardinalities wrong and ask yourself "is there a logical reason why they are wrong". From that we can usually apply corrective actions.

Looks like you really want to learn about analytics!!!!!

(SELECT a.acct_id
,a.start_dt
FROM (SELECT acct_id
,start_dt
,end_dt
FROM mta_tab_1
) a
,(SELECT acct_id
,MAX(start_dt) start_dt
FROM mta_tab_1
GROUP BY acct_id
) b
WHERE b.acct_id = a.acct_id
AND :dt BETWEEN a.start_dt AND a.end_dt
AND a.start_dt = b.start_dt
) y

could be written in a single pass as:

select acct_id, start_dt
from (
select acct_id, start_dt, end_dt,
row_number() over (partition by acct_id order by start_dt DESC) rn
from mta_tab_1
)
where rn = 1
and start_date <= :dt
and end_dt >= :dt

You just want to get the "max row by acct_id, where the max row is designated by start_dt"


And use that in a NOT IN

Seems this is the query:

select *
from mta_tab_2
where start_dt <= :dt
and end_dt >= :dt
and (acct_id) NOT IN
(
select acct_id
from (
select acct_id, start_dt, end_dt,
row_number() over (partition by acct_id order by start_dt DESC) rn
from mta_tab_1
)
where rn = 1
and start_date <= :dt
and end_dt >= :dt
and acct_id IS NOT NULL
)

more info

Morten, August 01, 2006 - 7:10 pm UTC

btw the query returns ~10K records

Wow - fast response

Morten, August 02, 2006 - 2:14 pm UTC

Thanks for the incredible response time!

I am onto the analytics. I tried the approach out with similar queries without any significant performance improvement, and in the end decided to modify the code as little as possible (I inherited the code).

So you are saying that by comparing the autotrace and the tkprof I should be able to pinpoint where the CBO goes wrong? (The tricky part is getting access to the trace file)

Tom Kyte
August 02, 2006 - 4:01 pm UTC

you compare the "row source operation" (the truth) with REAL output row counts
to the explain plan
which is the guess - to see if they are "way off"

How about finding all non-primary key indexes ?

A reader, August 25, 2006 - 12:02 pm UTC

SELECT uc.constraint_name,
uc.constraint_type,
uic.*
FROM user_indexes uix,
user_ind_columns uic,
(SELECT index_name,
uc.constraint_type,
uc.table_name,
uc.constraint_name
FROM user_constraints uc
WHERE uc.constraint_type = 'P') uc
WHERE uix.index_name = uic.index_name
AND uix.table_name = uic.table_name
AND uix.index_name = uc.index_name(+)
AND uc.constraint_type IS NULL
order by uic.column_position


any better idea ? can JDBC do it ?

note that the query must run from a user schema with only connect,resource prives.

Tom Kyte
August 27, 2006 - 8:46 pm UTC

select select index_name from user_indexes where index_name not in (select index_name from user_constraints where index_name is not null );


jdbc can run valid sql, so I would guess "yes" jdbc would be capable of running a query.

Scalability

yuvraj, September 05, 2006 - 3:16 am UTC

Hi Tom,

What about the scalability of each of these ? I have a situation where in the data in the inner query resultset will grow almost exponentially over time whereas the outer query data will almost remain constant (comparitively speaking). Currently, the plans do not look very different from each other (I am using NOT IN & NOT EXISTS).

Thanks a lot!

Tom Kyte
September 05, 2006 - 4:58 pm UTC

using the CBO, it'll change plans as the volumes of data change over time - unlike the RBO

Scalability

yuvraj, September 05, 2006 - 3:20 am UTC

I am using Oracle 10g Release 10.1.0.4.0.

scalability

A reader, September 06, 2006 - 4:32 am UTC

Tom,

Thank you for the quick response.

So, can we predict (or do we know) which one out of the two will scale better in the situation given? Or is there no way to say ? Given my situation where the plans look alike, which condition should I use keeping in mind the growth of volume of data?

Tom Kyte
September 06, 2006 - 7:51 am UTC

use the cbo and we don't really *care*, that is the point.

scalability-thanks

A reader, September 07, 2006 - 1:06 am UTC

Thank you Tom. I understand now..

OK

Kumar, February 02, 2007 - 1:50 am UTC

Hi Tom,
Any better way to rewrite the query given below?

SELECT ol.line_id
,ol.created_by
,ol.last_updated_by
,ad.document_id
,ad.pk1_value
,ad.seq_num
FROM oe_order_lines_all ol
,fnd_attached_documents ad
,(SELECT *
FROM so_lines_all
) sl
WHERE ad.entity_name = 'OE_ORDER_LINES'
AND ol.split_from_line_id = ad.pk1_value
AND sl.header_id = ol.Header_id
AND sl.line_id = ol.split_from_line_id
AND NOT EXISTS
( SELECT 1
FROM fnd_attached_documents ad2
WHERE ol.line_id = ad2.pk1_value
AND ad2.entity_name = 'OE_ORDER_LINES'

Please do reply.
Tom Kyte
February 02, 2007 - 10:45 am UTC

looks okey dokey to me. as long as it gets the right answer of course.

Does null always mean "UNKNOWN"?

Robert Simpson, February 07, 2007 - 12:36 pm UTC

A null value was added to a column, and suddenly no data was returned from queries with a NOT IN clause. We figured out why, but then figured that other queries using things like join conditions and IN to compare the same values were probably returning different results, but that ended up not being the case.

If NULL really means "gee, I don't know. (litterally, null means Unknown)" per October 02, 2002 above, and "US" matches NULL in a NOT IN clause, why don't those two values also match in a join condition, an IN clause, a NOT EXISTS or a MINUS?

It almost seems like NOT IN is very unique in its treatment of NULLs. What are we missing? Thanks.

Tom Kyte
February 07, 2007 - 6:51 pm UTC

because the sql standard says so - it is the way it was defined to work.

but it makes sense.

where x not in ( Q )

and Q contains nulls - is X not in there? We don't know, there is no way to compare X to nulls...

where x in ( Q )

well, regardless of what Q has as far as nulls go - if X=5 and Q has 5, X is in Q.

CBO: Result not consistent with Dukes..

Jens, February 08, 2007 - 4:18 am UTC

Back to the 3 easy examples:
select count(*) from t1 rbo
where object_id not in ( select object_id from t2 );

select count(*) from t1 rbo
where NOT EXISTS (select null from t2 where t2.object_id = rbo.object_id );

select count(*) from t1, t2 rbo
where t1.object_id = rbo.object_id(+) and rbo.object_id IS NULL;

On my system (10.2.0.1), all these run with the same plan, and thus the same amount of LIO when using the CBO:
---------------------------------------------------------------------------------
| Id  | Operation              | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |     1 |     8 |    25  (24)| 00:00:01 |
|   1 |  SORT AGGREGATE        |        |     1 |     8 |            |          |
|*  2 |   HASH JOIN ANTI       |        |    50 |   400 |    25  (24)| 00:00:01 |
|   3 |    TABLE ACCESS FULL   | T1     |  5000 | 20000 |    17  (12)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| T2_IDX |  4950 | 19800 |     5  (20)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"="OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         80  consistent gets
          0  physical reads
          0  redo size
        227  bytes sent via SQL*Net to client
        234  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


Then Duke Ganote claimed his approach using MINUS to be better, even using CBO. On my system its not:
------------------------------------------------------------------------------------
| Id  | Operation                 | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |        |     1 |    17 |    52  (35)| 00:00:01 |
|   1 |  SORT AGGREGATE           |        |     1 |    17 |            |          |
|*  2 |   HASH JOIN               |        |  5000 | 85000 |    52  (35)| 00:00:01 |
|   3 |    TABLE ACCESS FULL      | T1     |  5000 | 20000 |    17  (12)| 00:00:01 |
|   4 |    VIEW                   |        |  5000 | 65000 |    32  (41)| 00:00:01 |
|   5 |     MINUS                 |        |       |       |            |          |
|   6 |      SORT UNIQUE          |        |  5000 | 20000 |            |          |
|   7 |       TABLE ACCESS FULL   | T1     |  5000 | 20000 |    17  (12)| 00:00:01 |
|   8 |      SORT UNIQUE          |        |  4950 | 19800 |            |          |
|   9 |       INDEX FAST FULL SCAN| T2_IDX |  4950 | 19800 |     5  (20)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SET_OP"."OBJECT_ID"="RBO"."OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        144  consistent gets
          0  physical reads
          0  redo size
        226  bytes sent via SQL*Net to client
        234  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed


The execution plan is much more complex, T1 is FTS'ed twice and his query needs to do 2 sorts to get to the result.

I guess this could be for a number of reasons, like:
* DBMS version
* init parameters
* difference in the way statistics where gathered

Any comments on this?

Jens

Need Help only seeing a forest behind the trees..

Jan Solca, April 27, 2007 - 11:28 am UTC

Hi Tom

I tried to solve a simple problem with the IN /EXISTS functions but I was not successfull yet.

Let's assume 2 tables T1 and T2:

T1 has attributes workerUnit, valid_from, valid_to
T2 has attributes workerUnit, year (not date but string)
T1 identifies service providers for a certain period and T2 identifies if the service providers had a budget allocated for a specific year.

What I'm trying to do is to generate a view (parametrized or not) which would allow me to perform the following query "Give me the list of valid workerUnits for a given date and also if a budget was allocated to it for that given year.

Sketch for a result set input date 24.04.2007

workerUnit  |  valid_from  |   valid_to  | hasBudget 
  945          01.01.1988     31.12.9999        1
  888          03.04.2007     31.12.9999        1
  874          08.05.2007     31.03.2010        0 


where 0 and 1 in the 'hasBudget' column identify a workerUnit which got budget for the year. Let's exercise it for the first row for instance
T1 
workerUnit  |  valid_from  |  valid_to 
  945          01.01.1988     31.12.9999
  222          01.01.1972     24.03.2006
T2 
workerUnit  |    Year
  945            2004
  945            2005
  945            2007  
  

Thus if the input date was 22.03.2006(1) the record would have a 0 in the hasBudget. As another example this input date(1) would also retrieve the workerUnit 222 which has a valid_to of 24.03.2006 which is older than (1) but no budget has been allocated (record missing in T2) thus hasBudget 0
the record 222 looks like (still (1) has input date)
workerUnit  |  valid_from  |   valid_to  | hasBudget 
  ...
  222          01.01.1972     24.03.2006       0
  ... 

Thanks your help ..



Don't Use IN/Exists

RobH, May 16, 2007 - 2:26 pm UTC

Maybe try using a join with a case
Select workerUnit, valid_from , valid_to, 
  case when to_date('YYYY',year) between valid_from and valid_to then 1 else 0 end
from t1 outer join t2 on t1.workerUnit = t1.workerUnit

or....(since you have many t2 rows)

RobH, May 16, 2007 - 2:31 pm UTC

Select workerUnit, valid_from , valid_to,
max(case when to_date('YYYY',year) between valid_from and valid_to then 1 else 0 end)
from t1
left outer join t2 on t1.workerUnit = t1.workerUnit

Not Exists vs Not IN

A reader, January 09, 2008 - 10:14 am UTC

Tom,
I want to select all the rows from a table where one column values are not in another table.

CREATE TABLE T1
(
A1 INTEGER,
ANOTHER_COL NUMBER DEFAULT 1,
ANOTHER_COL1 INTEGER,
ANOTHER_COL2 NUMBER(5)
)
/



Insert into T1
(A1, ANOTHER_COL, ANOTHER_COL1, ANOTHER_COL2)
Values
(1, 1, 1, 1);
Insert into T1
(A1, ANOTHER_COL, ANOTHER_COL1, ANOTHER_COL2)
Values
(23, 2, 3, 34);
COMMIT;


create table t3 as
select * from t1;


insert into t1 values (12,3,4,12);

commit;


I want to select all the rows in t1 where another_col1 values are not in t3 table.


select * from t1 where another_col1 not in (select another_col1 from t3) ;


How do we frame the query using Not exists?

Also, If I insert another set of values in T3:

insert into t3 values (13,4,'',23);

Why can't I get any rows when I have some values in T3 with null values ?
I can understand comparing with nulls is a problem. But it should return all rows whose values are not null, avoiding that null row, shouldn't it?

It's always been a confusion for me..can you clarify?

Thanks,


Tom Kyte
January 09, 2008 - 10:33 am UTC

where column in (set that contains non-null and some nulls)

where, we can say that column is either in that set or not (we only need look at the non-nulls to see if that value is in that set)

where column NOT IN (same set)

well, the presence of nulls - as defined by the SQL language itself, a rule if you will - makes it impossible to see if that value is "not in" that set - we cannot tell if column not in (null) is true or false - it evaluates UNKNOWN.

so, you would in general where column not in (select * from (that set) where that_column is not null )


if that is what you want....


As for the not exists, that is pretty trivial - give that bit of work to yourself as an exercise....

not able to tune it...

Reene, April 04, 2008 - 9:30 am UTC

Hi Tom

I have this query,it retuns 12 rows,runs for 90 minutes.

trying to tune it , but could not,spent 5 days but to no avail.

Parsing user id: 65 (APPS)
********************************************************************************

SELECT /*+ ordered use_hash(mtp) push_subq */ ooha.org_id "OPERATING UNIT ID"
FROM apps.oe_transaction_types_all ott,
apps.oe_transaction_types_tl ottt_l,
apps.oe_order_lines_all oola,
apps.mtl_parameters mtp,
apps.oe_order_headers_all ooha,
apps.hz_cust_accounts hca
WHERE ooha.header_id = oola.header_id AND
ooha.org_id = oola.org_id AND
ooha.order_source_id = 1263 AND
ott.transaction_type_id = ottt_l.transaction_type_id AND
ott.org_id = ooha.org_id AND
oola.line_type_id = ottt_l.transaction_type_id AND
oola.line_type_id = ott.transaction_type_id AND
mtp.organization_id = oola.ship_from_org_id AND
hca.cust_account_id = ooha.sold_to_org_id AND
(ottt_l.NAME LIKE 'GPO%FE_CONSIGNMENT_LINE' OR
ottt_l.NAME LIKE 'GPO%FE_SHIPMENT_LINE') AND
oola.flow_status_code IN ('SHIPPED', 'CLOSED') AND
hca.attribute9 IS NOT NULL
and NOT EXISTS
( SELECT /*+ NO_UNNEST */ 1
FROM apps.mtl_material_transactions mmt ,
apps.mtl_transaction_lot_numbers mtnl ,
apps.mtl_parameters mp ,
apps.mtl_item_locations mil
WHERE mmt.inventory_item_id = oola.inventory_item_id AND
mmt.transaction_type_id IN (420, 40, 41, 42) AND
mmt.transaction_quantity > 0 AND
mtnl.lot_number = ooha.order_number AND
mtnl.transaction_id = mmt.transaction_id AND
mmt.transaction_date > (TRUNC(SYSDATE) -90) AND
mmt.organization_id = mp.organization_id AND
mp.attribute7 = 'PARTS' AND
mp.attribute11 ='SERVICE' AND
mil.inventory_location_id = hca.attribute9 AND
mil.organization_id = mp.organization_id
) AND
NVL(oola.actual_shipment_date, oola.schedule_ship_date) > SYSDATE - 90
/

its tkporf output is

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 690.28 4899.16 1524771 23494672 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 690.28 4899.16 1524771 23494672 0 12

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 65 (APPS)

Rows Row Source Operation
------- ---------------------------------------------------
12 NESTED LOOPS (cr=23494672 r=1524771 w=0 time=4640213970 us)
157696 NESTED LOOPS (cr=5730245 r=1402270 w=0 time=3699650845 us)
157697 HASH JOIN (cr=5257151 r=1379549 w=0 time=3576187058 us)
157697 TABLE ACCESS BY INDEX ROWID OE_ORDER_LINES_ALL (cr=5257149 r=1379549 w=0 time=3575115599 us)
2789657 NESTED LOOPS (cr=13232 r=12982 w=0 time=90323177 us)
76 HASH JOIN (cr=277 r=82 w=0 time=175347 us)
4597 TABLE ACCESS FULL OE_TRANSACTION_TYPES_ALL (cr=157 r=0 w=0 time=6683 us)
76 TABLE ACCESS FULL OE_TRANSACTION_TYPES_TL (cr=120 r=82 w=0 time=157808 us)
2789580 INDEX RANGE SCAN GEMS_ONT_ORDER_LINES_N99 (cr=12955 r=12900 w=0 time=86301122 us)(object id 1863725)
377 INDEX FULL SCAN MTL_PARAMETERS_U1 (cr=2 r=0 w=0 time=367 us)(object id 9847)
157696 TABLE ACCESS BY INDEX ROWID OE_ORDER_HEADERS_ALL (cr=473094 r=22721 w=0 time=123022055 us)
157697 INDEX UNIQUE SCAN OE_ORDER_HEADERS_U1 (cr=315397 r=1363 w=0 time=11565697 us)(object id 688729)
12 TABLE ACCESS BY INDEX ROWID HZ_CUST_ACCOUNTS (cr=17764427 r=122501 w=0 time=1114177945 us)
157696 INDEX UNIQUE SCAN HZ_CUST_ACCOUNTS_U1 (cr=315395 r=274 w=0 time=3770326 us)(object id 715003)
157621 TABLE ACCESS BY INDEX ROWID MTL_TRANSACTION_LOT_NUMBERS (cr=17291336 r=121205 w=0 time=1096619120 us)
6897874 NESTED LOOPS (cr=14562731 r=114663 w=0 time=1018837983 us)
3448931 NESTED LOOPS (cr=7488435 r=112057 w=0 time=936487677 us)
157633 NESTED LOOPS (cr=788165 r=581 w=0 time=10022817 us)
157633 INDEX RANGE SCAN MTL_ITEM_LOCATIONS_U1 (cr=472899 r=581 w=0 time=7984520 us)(object id 9761)
157633 TABLE ACCESS BY INDEX ROWID MTL_PARAMETERS (cr=315266 r=0 w=0 time=1667395 us)
157633 INDEX UNIQUE SCAN MTL_PARAMETERS_U1 (cr=157633 r=0 w=0 time=770942 us)(object id 9847)
3448931 TABLE ACCESS BY INDEX ROWID MTL_MATERIAL_TRANSACTIONS (cr=6700270 r=111476 w=0 time=924106849 us)
8273225 INDEX RANGE SCAN MTL_MATERIAL_TRANSACTIONS_N1 (cr=695175 r=32781 w=0 time=303092226 us)(object id 6205511)
3448931 INDEX RANGE SCAN MTL_TRANSACTION_LOT_NUMBERS_N1 (cr=7074296 r=2606 w=0 time=74416944 us)(object id 9976)

is there anything which can be done to improve its response time.
is push_subq is the correct choice here.

how does not exists work - please explain using the above query and the tkprof output..
what is wrong here.

thanks


An explanation I use for the IN x NOT IN

Marcus Rangel, May 19, 2008 - 8:54 pm UTC

Tom, there is an explanation I usually give for developers who ask me why "1 in (1,2,null)" returns TRUE but "3 not in (1,2,null)" returns FALSE. Please tell me what you think:

The IN operation is a series of *equality* comparisons that is ultimately resolved by an OR:

1 = 1 ? TRUE
1 = 2 ? FALSE
1 = NULL ? FALSE

TRUE OR FALSE OR FALSE = TRUE

The NOT IN operation is a series of *inequality* comparisons that is ultimately resolved by an AND:

3 != 1 ? TRUE
3 != 2 ? TRUE
3 != NULL ? FALSE

TRUE AND TRUE AND FALSE = FALSE


Tom Kyte
May 20, 2008 - 10:50 am UTC

with the minor exception that 3!=NULL is not false, it is not true, it is UNKNOWN

ops$tkyte%ORA10GR2> select * from dual where 3 <> NULL;

no rows selected

ops$tkyte%ORA10GR2> select * from dual where NOT(3 <> NULL);

no rows selected


how about in subquery contains null

jian huang zheng, October 02, 2008 - 10:32 am UTC

HI TOM
 Thanks for this wonderful thread. i read:
<Quote>
Because NULL means -- gee, I don't know.  (litterally, null means Unknown)


So, the predicate

where x not in ( NULL )

evaluates to neither TRUE, nor FALSE

</qoute>

but when using in ( subquery), if subquery contains null, it will select rows ,why is that?


SQL> desc cc
 Name

 A

SQL> desc dd
 Name

 A

SQL> insert into dd values(null);

1 row created.

SQL> select *  from cc where a in ( select *  from dd);

         A
----------
         3

1 row selected.

SQL>

Tom Kyte
October 02, 2008 - 1:12 pm UTC

because the rules say "where 5 in ( 5, NULL )" is true because 5 is in (5).

But "where 5 NOT IN (1, null)" is unknown - it is unknown if "5 is not in (null)"

ops$tkyte%ORA11GR1> select * from dual where 5 in (5,null);

D
-
X

ops$tkyte%ORA11GR1> select * from dual where 5 NOT in (1,null);

no rows selected




Could you help with this query, how to evalue it?

jian huang zheng, October 05, 2008 - 8:40 am UTC

Hello Tom

Thanks for clearing my doubts about the not in. I have following query , could you please help how conceptually evaluate this query?
  SQL> create table t1 ( x int);

Table created.
SQL> create table t2 ( x int);

Table created.
SQL> select *  from t1;

         X
----------
         1
         2


3 rows selected.
SQL> select *  from t2;

         X
----------
         1
         3


3 rows selected.

SQL> select *  from t1,t2 where exists ( select null from t2 where t1.x=t2.x);

         X          X
---------- ----------
         1          1
         1          3
         1

3 rows selected.

SQL> select *  from t1,t2 where exists ( select null from t1 where t1.x=t2.x);

         X          X
---------- ----------
         1          1
         2          1
                    1

3 rows selected.

I am not sure how those two query are executed, Most appreciated if you can give an explaination.

Thanks~


Tom Kyte
October 06, 2008 - 2:49 pm UTC

... select * from t1,t2 ...

join every row in t1 with every row in t2.

you have

1
2

and

1
3

that results in
t1.x,t2.x
1,1
1,3
2,1
2,3

then - apply the filter:

where exists ( select null from t2 where t1.x=t2.x);

to each row.


You can always envision:

select *
from <table list>
where <predicate>

as a cartesian join of all tables in the table list (every row in table t1 joined with every row in t2, joined with every row in t3 and so on - so if you had 10 rows in each table - you'd end up with 10x10x10 rows in the result set) AND THEN run the predicate, the filter against them.

very clear explaination, and one more question about 'in' execution

jian huang zheng, October 09, 2008 - 10:22 am UTC

Hi Tom

Thanks for your explaination about the sql execution. one more question i am curious to know is:
suppose i issue sql : select * from table1 where a in (select a from table2),suppose hash semi-join is used , does oracle have to finish the hash join for all rows before feeding back the resultset to the client or during the execution?
Or is there a universal rule for oracle to feed back the result set, feed back during the sql execution or after the execution?
Tom Kyte
October 09, 2008 - 11:21 am UTC

with a hash (semi or not) join, the plan is typically like this:

hash join
full scan t1
full scan t2


One of the tables will be full scanned and hashed (hopefully into memory, but not necessarily)

Once that step has taken place, we can start full scanning the other table and probing the hash table for the matches

and start returning them.


So, you will full scan and hash one table BEFORE you get the first row back.

So, does Oracle have to finish the hash join for all rows? - No

exists

A reader, October 09, 2008 - 10:58 pm UTC


Tuning

vikram, November 12, 2008 - 10:56 am UTC

Total Records deleteing 20,000(how to tune the query)

delete from H6054 where entryid IN

(Select C1 from T6054 Where C536870913=LoginName); ---- consumes 8 secs

delete from T6054 where C536870913=LoginName; ---- consumes 6 secs

--COMMIT;
Tom Kyte
November 13, 2008 - 4:12 pm UTC

sounds good to me - but then I have no information of any use, so I get to make it up...

there are lots of indexes, and the delete hits them all.
there is no index on entryid and you are removing 20,000 out of 2,000,000 records
there is no index on C536870913 and you are finding 1 record out of 100,000

things like that - I'll just assume they are true - so therefore, 8 and 6 seconds is really fast.


no tables
no indexing scheme
no plans
no tkprofs
in short - nothing to really look at.

IN & Exists

Jyothsna, May 03, 2010 - 1:44 am UTC

In Rooms table ive 6 rows (2,4,8,16,32,64 )
if i fire
select roomsize from rooms
where 30>roomsize
Then o/p is 4 rows (2,4,8,16)
But the following gives
select roomsize from rooms
where exists (select 1 from rooms
where 30 > roomsize)
It gives all the 6 rows
Why it happens???



Tom Kyte
May 06, 2010 - 11:35 am UTC

what the heck is a rooms table?


anyway, if there are any rows in rooms such that "30 > roomsize" then the subquery "(select 1 from rooms
where 30 > roomsize)" obviously returns a record - doesn't it (we started with the supposition that there are some rows in rooms such that 30>roomsize).

Now, if that row exists - then "select x from t where exists (that subquery we know returns AT LEAST one row)" - would return every row from T - since the subquery by our definition here returns at least one row.

explain plan for in and exists

Chirayu, June 30, 2010 - 10:47 pm UTC

Hi Tom,
Oracle Database Version 11G

We had a bug in our application where in the original query was like this..

select bank_branch_name, bank_name, bank_branch_id
from xx_bank_branches
WHERE BANK_BRANCH_ID IN
(SELECT BANK_BRANCH_ID
from xx_bank_accounts a
WHERE a.ACCOUNT_TYPE <> 'EXTERNAL' )
order by bank_branch_name, bank_name ;

I thought of re-writing the query as

select bank_branch_name, bank_name, bank_branch_id
from xx_bank_branches b
where EXISTS ( SELECT 'x' FROM
xx_BANK_ACCOUNTS A
WHERE A.ACCOUNT_TYPE <> 'EXTERNAL'
and a.bank_branch_id = b.bank_branch_id )
order by bank_branch_name, bank_name;

However when I compared the explain plan.. the plan generated by both the queries was exactly the same..

My question is did CBO treat both the queries as similar ?
Tom Kyte
July 06, 2010 - 2:13 pm UTC

answered this exact question elsewhere.

Query Tuning

Vicky, March 15, 2011 - 1:43 am UTC

/* Formatted on 3/15/2011 10:37:11 AM (QP5 v5.126.903.23003) */
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, --gcc.segment4,--RICO
SUBSTR(ap_main.description, 1, 40) journal_description
,(CASE WHEN gcc.segment3 IN ('1001', '1002') THEN SUBSTR(ap_main.doc_description, 1, 16) ELSE NULL END) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,--gl.gl_period_statuses gps ,
( SELECT ap.JE_HEADER_ID, ap.je_line_num, MAX(ap.description) description, MAX(ap.doc_description) doc_description
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, NULL doc_description
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.gl_je_headers gjh, apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND AEL.SOURCE_TABLE = 'AP_INVOICE_DISTRIBUTIONS'
AND AEL.SOURCE_ID = AID.INVOICE_DISTRIBUTION_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.gl_transfer_flag = 'Y'
AND aeh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
--select GJH.JE_HEADER_ID,gjl.je_line_num ,gjh.period_name ,aid.description
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, b.description doc_description
FROM apps.ap_ae_headers_all aeh
,apps.ap_ae_lines_all ael
,apps.gl_import_references imp
,apps.ap_invoice_payments_all pay
,apps.ap_invoices_all inv
,apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICE_PAYMENTS'
AND ael.source_id = pay.invoice_payment_id
AND pay.invoice_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND pay.invoice_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AP'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, aid.description, NULL doc_description
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoices_all inv, apps.gl_je_headers gjh, apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICES'
AND ael.source_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name, gjl.description, b.description doc_description
FROM apps.ap_ae_headers_all aeh
,apps.ap_ae_lines_all ael
,apps.gl_import_references imp
,apps.AP_CHECKS_ALL AC
,apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
,apps.ap_invoice_payments_all pay
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_CHECKS'
AND ael.source_id = AC.CHECK_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND ac.check_id = pay.check_id
AND pay.invoice_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AP'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM) ap
GROUP BY ap.JE_HEADER_ID, ap.je_line_num) ap_main
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjl.code_combination_id = gcc.code_combination_id
AND ap_main.je_header_id = gjh.je_header_id
AND ap_main.je_line_num = gjl.je_line_num
UNION ALL
--ar
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 ccount_code
, --gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description
,(CASE WHEN gcc.segment3 IN ('1001', '1002') THEN SUBSTR(ar_main.doc_description, 1, 16) ELSE NULL END) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,( SELECT ar.JE_HEADER_ID, ar.je_line_num, MAX(ar.description) doc_description
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, b.description
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,-- ar_cash_receipts_all acr,
apps.AR_CASH_RECEIPTS_ALL ACR
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE gjh.je_source = 'Receivables'
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
-- NTP 4-Mar-11
--AND SUBSTR (gir.reference_2, 1, INSTR (gir.reference_2, 'C') - 1) =
-- TO_CHAR (acr.CASH_RECEIPT_ID)
AND TO_NUMBER(SUBSTR(gir.reference_2, 1, INSTR(gir.reference_2, 'C') - 1)) = acr.CASH_RECEIPT_ID
AND ACR.CASH_RECEIPT_ID = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, b.description
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,apps.ar_receivable_applications_all ara
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID)
b
WHERE --gjh.je_source= 'Receivables'
gjh .je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
AND NVL(gir.reference_2, 0) = TO_CHAR(ara.applied_customer_trx_id)
AND ara.applied_customer_trx_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM) ar
GROUP BY ar.JE_HEADER_ID, ar.je_line_num) ar_main
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjl.code_combination_id = gcc.code_combination_id
AND ar_main.je_header_id = gjh.je_header_id
AND ar_main.je_line_num = gjl.je_line_num
UNION ALL
--gl
SELECT --gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, ---gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description
,DECODE(gcc.segment3, '1001', SUBSTR(b.description, 1, 16), '1002', SUBSTR(b.description, 1, 16), NULL) document_1
,NULL document_2, SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_code_combinations gcc
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS' AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.global_attribute1 = B.FLEX_VALUE
AND gjl.code_combination_id = gcc.code_combination_id
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION ALL
SELECT ---gjh.period_name,gjh.je_Source,gjh.Name,gjh.Description,---rico
gjh.je_header_id journal_ctrl_num, gjl.je_line_num sequence_id
,gcc.segment1 || gcc.segment10 || gcc.segment2 || gcc.segment3 || gcc.segment4 || gcc.segment5 account_code
, ---gcc.segment4,--RICO
SUBSTR(gjl.description, 1, 40) journal_description, NULL document_1, NULL document_2
,SUBSTR(gjl.attribute1, 1, 32) reference_code, NVL(gjl.accounted_dr, -gjl.accounted_cr) balance
,NVL(gjl.entered_dr, -gjl.entered_cr) nat_balance, SUBSTR(gjh.currency_code, 1, 8) nat_cur_code
,gjh.currency_conversion_rate rate, gjh.currency_conversion_type rate_type_home
,DECODE(gjh.status, 'P', 1, 'U', 0, -1) posted_flag, TO_CHAR(GJH.POSTED_DATE, 'yyyy-mm-dd hh24:mi:ss') date_posted
,TO_CHAR(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') date_transfer
FROM apps.gl_je_headers gjh, apps.gl_je_lines gjl, apps.gl_code_combinations gcc
-- gl.gl_period_statuses gps
WHERE gjh.set_of_books_id = 1001
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--and gjh.period_name =gps.period_name
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.code_combination_id = gcc.code_combination_id
AND(GJH.JE_HEADER_ID, GJL.JE_LINE_NUM) NOT IN
(SELECT A.JE_HEADER_ID, A.je_line_num
FROM (SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael
,apps.gl_import_references imp, apps.gl_je_headers gjh
,apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND AEL.SOURCE_TABLE = 'AP_INVOICE_DISTRIBUTIONS'
AND AEL.SOURCE_ID = AID.INVOICE_DISTRIBUTION_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoice_payments_all pay, apps.ap_invoices_all inv, apps.gl_je_headers gjh
,apps.gl_je_lines gjl, apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICE_PAYMENTS'
AND ael.source_id = pay.invoice_payment_id
AND pay.invoice_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.ap_invoices_all inv, apps.gl_je_headers gjh, apps.gl_je_lines gjl
,apps.Ap_Invoice_Distributions_All aid
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_INVOICES'
AND ael.source_id = inv.invoice_id
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
AND inv.invoice_id = aid.invoice_id
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.ap_ae_headers_all aeh, apps.ap_ae_lines_all ael, apps.gl_import_references imp
,apps.AP_CHECKS_ALL AC, apps.gl_je_headers gjh, apps.gl_je_lines gjl
WHERE aeh.ae_header_id = ael.ae_header_id
AND ael.source_table = 'AP_CHECKS'
AND ael.source_id = AC.CHECK_ID
AND ael.gl_sl_link_id = imp.gl_sl_link_id
AND imp.Gl_Sl_Link_Table = 'APECL'
AND gjh.je_header_id = imp.je_header_id
AND gjh.set_of_books_id = 1001
AND gjh.je_header_id = gjl.je_header_id
AND gjl.je_line_num = imp.je_line_num
AND aeh.set_of_books_id = 1001
AND aeh.gl_transfer_flag = 'Y'
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--ar
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,-- ar_cash_receipts_all acr,
apps.AR_CASH_RECEIPTS_ALL ACR
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_source = 'Receivables'
AND gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_header_id = gjh.je_header_id
AND gir.je_line_num = gjl.je_line_num
AND TO_NUMBER(SUBSTR(gir.reference_2, 1, INSTR(gir.reference_2, 'C') - 1)) = acr.CASH_RECEIPT_ID
AND ACR.CASH_RECEIPT_ID = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM apps.gl_je_headers gjh
,apps.gl_je_lines gjl
,apps.gl_import_references gir
,apps.ar_receivable_applications_all ara
,apps.JA_CN_CFS_ACTIVITIES_ALL JCCA
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE --gjh.je_source= 'Receivables'
gjh .je_header_id = gjl.je_header_id
AND gir.je_header_id = gjh.je_header_id
AND gjh.set_of_books_id = 1001
AND gir.je_line_num = gjl.je_line_num
AND NVL(gir.reference_2, 0) = TO_CHAR(ara.applied_customer_trx_id)
AND ara.applied_customer_trx_id = jcca.TRX_id(+)
AND jcca.SOURCE(+) = 'AR'
AND b.FLEX_VALUE(+) = JCCA.DETAILED_CFS_ITEM
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)
--gl
UNION
SELECT GJH.JE_HEADER_ID, gjl.je_line_num, gjh.period_name
FROM gl.gl_je_headers gjh
,gl.gl_je_lines gjl
,(SELECT fvv.FLEX_VALUE, fvv.description
FROM apps.FND_FLEX_VALUES_VL FVV, apps.FND_FLEX_VALUE_SETS FVS
WHERE FVS.FLEX_VALUE_SET_NAME = 'JA_CN_CASHFLOW_ITEMS'
AND FVV.FLEX_VALUE_SET_ID = FVS.FLEX_VALUE_SET_ID) b
WHERE gjh.je_header_id = gjl.je_header_id
AND gjh.set_of_books_id = 1001
AND gjl.global_attribute1 = B.FLEX_VALUE
AND(GJH.PERIOD_NAME = :cp_period_name OR GJH.PERIOD_NAME = :cp_prior_period_name)) A
WHERE (A.PERIOD_NAME = :cp_period_name OR A.PERIOD_NAME = :cp_prior_period_name));

Query tuning

Vicky, March 15, 2011 - 1:44 am UTC

hi Tom,

Could you please help me in tuning this query.

Thanks,
Vicky
Tom Kyte
March 15, 2011 - 8:23 am UTC

in a word - no

in a sentence - it is huge, it is against tables I don't know, containing data I don't have a clue about, answering a question I don't understand. The odds that it is 100% "correct" (that every outer join is really necessary, that the approach taken is sound, that is even returns the right answer) is near 0% in my experience - therefore reverse engineering it - which itself would take a really really long time - isn't fruitful because the query is probably a bad idea in the first place.

In order to "tune a query", one needs:

o intimate knowledge of the schema - all of the constraints, everything.
o intimate knowledge of the question that needs to be answered

What one typically doesn't want (surprisingly) is the

o query you are trying to tune


Not that I'm asking you for the schema, etc - this is not "ask tom to tune your query"

Tuning on Exist

Prince, March 18, 2011 - 12:44 am UTC

Hi Tom,
Please explain me what is wrong in performance in this query and how to tune this. Thanks in advance..


UPDATE customer_table aaa
SET (acc_name, acc_flg, act_date) =
(
SELECT
DISTINCT name,
'Y',
DECODE(to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'), NULL, aaa.status_date, to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'))
FROM
account acc
WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
)
WHERE EXISTS
(
SELECT
DISTINCT name,
'Y',
DECODE(to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'), NULL, aaa.status_date, to_char(REFERENCE_START_DT, 'DD/MM/YYYY HH24.MI.SS'))
FROM
account acc
WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
);
Tom Kyte
March 18, 2011 - 7:51 am UTC

If the optimizer is unable to merge the correlated subquery into the update - it might have to execute it at least once per row in the update.

I like MERGE for this

merge into customer_table aaa
using account acc
ON (aaa.customer_id = acc.customer_id 
    and aaa.success_flg='Y' 
    and aaa.error_flg='N')
when matched then update set acc_name = acc.acc_name, acc_flg = 'Y', act_date = nvl(acc.act_date,status_date);


Instead of decoding and converting to strings (I assume that status_date is a date? ) just nvl it.

If the distinct is necessary - there is something really wrong with your data model, in a big way.


There is no way the predicate:

WHERE
acc.acc_num = aaa.customer_id
AND aaa.success_flg = 'Y'
AND aaa.error_flg = 'N'
)

on aaa.success_flg and aaa.error_flg belonged in the subquery - that should have been set in the where clause of the update itself - where exists (...) AND success_flg = 'Y' and error_flg = 'N'.


A reader, March 21, 2011 - 12:48 am UTC

In this case the account acc is the remote table which is having more than 30millions of rec.
earlier I was using the hint driving_site(acc) for the select statement to improve the performance.

So Now how I will use this hint or is there any alternate to improve the performance ?
Will appreciate an earliest reply.

Thanks
Prince
Tom Kyte
March 21, 2011 - 9:44 am UTC

The driving site hint is probably your best bet.

Doubt in NOT IN

Sridhar, May 27, 2011 - 1:19 am UTC

Dear Tom,

We have been experimenting in the usage of IN and NOT IN and we came across a small scenario:

When we executed the following SQL:
select SYSDATE from dual where '2' not in ('1','3')
We got sysdate as an Output, but when we used the following SQL:
select SYSDATE from dual where '2' not in ('1','3', NULL)
0 Records was selected.
What could be the reason for this?
Tom Kyte
May 27, 2011 - 10:35 am UTC

it works that way by definition.

When you compare "NON-NULL-VALUE" to "NULL-VALUE" the answer is "I don't know", it is not true, it is not false.

Therefore, it is "unknown" whether '2' is not in ( NULL ).

With in - we can "know"

where '2' in ( '1', '2', null )

we Know 2 is in that set. But we don't know if '3' is NOT IN that set...

It is by definition the way it works. Part of SQL

Very Good

pradeep sharma, August 08, 2011 - 6:32 am UTC

Amazing Question & Answers

why EXISTS ?

A reader, October 24, 2011 - 11:04 am UTC

Hi Tom,
when 'IN' and 'EXISTS' are interchangeable and we can do everything with IN whatever we can do with EXITS –
Why was the operator EXISTS introduced in ORACLE ?
Are there any special cases where we must use EXISTS only and can’t use ‘IN’ ?

Many thanks

Tom Kyte
October 24, 2011 - 11:12 am UTC

Exists was introduced in SQL - the language specification. It is not an Oracle thing - it is a sql thing.


It is primarily syntactic convenience, semantic pleasantness. Sometimes "IN" is just "better sounding", sometimes exists is.


In and Exists can be used interchangeable.

Not In and Not Exists cannot - they are different (regarding treatment of nulls).


Just like you can say:

select * from t;
select ALL * from t;

interchangeably, or

select distinct * from t;
select unique * from t;

interchangeably...



In and exists both exist for the same reason "for" and "while" loops exist (even with goto which can be used instead of for or while as well)...


subquery

A reader, October 25, 2011 - 4:41 am UTC

Hi Tom,

1.Scalar subquery is a SPECIAL case of Single row return subquery which returns Single COLOUMN from single ROW.
We can use Scalar Subqueries wherever we can use single row subquery with single column and there are NO any differences between Scalar subquery and Single row retun subquery with single coloumn.
Could you please clarify if the above correct ?



2.I already asked the below question in new thread but did not get answer -SORRY for that.. Can you please confirm if this is correct that
a)when the subquery is non-correlated and
b) when subquery is having aggregate functions count(*) with group by and
c) when it returns multiple rows -

First we will execute the subquery once and STORE the results in PGA -then we will execute main query with this as input ??

ex: select t.x,t.y from T
where t.x > (select max(a) from s group by b)


Thanks Tom for your help all the way !!



Tom Kyte
October 25, 2011 - 7:10 am UTC

1) what is a single row subquery?

You can use scalar subqueries anywhere you can use an expression.


select * from t where x = 10;

select * from t where x = (select 10 from dual);


There, a scalar subquery was used to replace the expression "10".


Now, if you have:

select * from t where x in (select 10 from dual);

the select 10 from dual is not a scalar subquery in that context, it is just a "subquery".


2) No, that is not how it will happen in all cases. We can and will do view merging when we find it to be the appropriate approach.

Never assume some order of operation outside of expressions with ()'s. That is

(5+3)/2

has a definite order of operation - but

where x > (5+3)/2 and (y < 55 and z > 4)

You cannot assume that y&z are evaulated and then x is (or that x is, and then y and z, or etc etc etc).

All you know is (5+3)/2 - IF evaluated - will be done 'correctly' and everything else happens in some order.

Subqueries, inline views, etc etc etc - all can be moved around, rewritten and so on.



Your query in general does not work - select max(a) from s group by b would IN GENERAL return 0 to N records where N can be any number greater than zero. To see "t.x > (select max(a) from s group by b)" work - either

o s would have to be empty, that query returns 0 rows.
o b is unique in S, that query returns 1 rows.

If b has more than one value, that query returns more than 1 rows and will fail in this context. You need a scalar subquery here!


Now, if you wrote instead;

where t.x > ANY(select max(a) from s group by b)

or

where t.x > ALL(select max(a) from s group by b)

then your query would become a subquery and could work even if b was not unique in s.


A reader, October 25, 2011 - 9:30 am UTC

Hi Tom,
thanks for the reply , It was my misatke in the 2nd point - missing ANY..

ok, you said -
select * from t where x = (select 10 from dual);
There, a scalar subquery was used to replace the expression "10".

Now, if you have:
select * from t where x in (select 10 from dual);
the select 10 from dual is not a scalar subquery in that context, it is just a "subquery".


Why - is it because of the operator 'IN' ?

Tom Kyte
October 25, 2011 - 11:53 am UTC

yes, IN requires a set

= requires an expression.

a scalar subquery is an expression of sorts.


The algorithm by which EXIST, NOT EXISTS, In & NOT IN operates

NB, November 15, 2011 - 3:45 pm UTC

Can u explain the diff between the Output of these queries?? i.e. The algorithm by which EXIST, NOT EXISTS, In & NOT IN operates.


SELECT * FROM ALL_WKSCRATCHPAD_DB.TB1;??
SELECT * FROM ALL_WKSCRATCHPAD_DB.TB2;

Number Name
1 brad 30
2 tom 20
3 John 10


Number Name
1 brad 30
2 tom 20
3 anil 40


LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE NOT EXISTS (SELECT ''
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name = T2.name)
;

Ans:
Number Name
1 John 10

---------------------------
LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE T1.Name NOT IN (SELECT T2.name
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name = T2.name)
;

Number Name
1 John 10
----------------------------------

LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE EXISTS (SELECT ''
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name <> T2.name)
;
Number Name
1 brad 30
2 tom 20
3 John 10
-------------------------------


LOCKING ROW FOR ACCESS
SELECT *
FROM ALL_WKSCRATCHPAD_DB.TB1 T1
WHERE T1.Name IN (SELECT T2.name
FROM ALL_WKSCRATCHPAD_DB.TB2 T2
WHERE T1.Name <> T2.name)
;

Number Name


Tom Kyte
November 16, 2011 - 9:33 am UTC

IN = Exists, they the same.

NOT IN is a *little* different from NOT EXISTS, but usually the same.


select * from t1 where not exists (select null from t2 where t2.x = t1.x);

select * from t1 where t1.x NOT IN (select t2.x from t2);

IF t2.x is NOT NULL - those two queries are the same.

IF t2.x is NULLABLE and there is at least one value of t2.x that is null in T2 - then the second query returns ZERO rows - whereas the not exists might return some rows.

but if you have:

select * from t1 where t1.x NOT IN (select t2.x from t2 where t2.x is not null)

then they would be the same again.



A reader, November 16, 2011 - 1:30 pm UTC

<quote>

Your query in general does not work - select max(a) from s group by b would IN GENERAL return 0 to N records where N can be any number greater than zero. To see "t.x > (select max(a) from s group by b)" work - either

o s would have to be empty, that query returns 0 rows.
o b is unique in S, that query returns 1 rows.

<quote>

little correction....

for second point it should not be unique but all record should have same value for b in order to return single row

Tom Kyte
November 16, 2011 - 1:33 pm UTC

for second point it should not be unique but all record should have same value
for b in order to return single row



no, b must be unique in S.


If today - b has only one value - but tomorrow it might have more than one - it would not work.

Therefore, for this query to work, B must be unique in S.


A reader, November 16, 2011 - 2:02 pm UTC

"Therefore, for this query to work, B must be unique in S."

in that case "(select max(a) from s group by b)" will return multiple rows and give multiple rows return error does not execute at all...
Tom Kyte
November 17, 2011 - 6:43 pm UTC

ah, I see the point now, yes, B must have a single unique value.

not in vs not exists - primary key

Surya, May 01, 2012 - 6:09 pm UTC

Hi Tom,

First of all, thank you very much for spending your valuable time in answering all our questions.

I have a question regarding not in vs not exists.
db version: 11gr2

Suppose I have a query like this,
select * from tab1 t1 where
(col1,col2,col3) not in (select col1,col2,col3 from tab2 t2);

If (col1,col2,col3) in tab2 is a primary key and so doesn't return any NULL values, can I rewrite the above query as below?

select * from tab1 t1 where
not exists(select null from tab2 t2 where t2.col1=t1.col1 and t2.col2=t1.col2 and t2.col3=t1.col3);

Appreciate all your help. Thank you.
Tom Kyte
May 02, 2012 - 1:07 pm UTC

yes, they would be semantically equivalent as long as c1,c2,c3 are not null

Not Exists computationally expensive

Nagakiran, July 24, 2012 - 8:51 pm UTC

Hi Tom,

I have been trying to fetch rows from table A if it doesn't exist multiple times in table B..

Eg: Table A
ID Value
1 ABC
1 ABC
1 ABC
2 DEF
2 DEF

Table B
1 ABC
1 ABC
2 DEF

When I compare,these two tables,I need to show that there are two rows in the output as
as ID Value
1 ABC
2 DEF

I tried using the following query
select A.ID ,A.value,
from A
where not exists (select * from B where A.id= B.id and A.value= B.value).

However,I don't get the required output.

I tried using LEFT OUTER JOIN too,
SELECT A.ID, A.value LEFT OUTER JOIN B
ON A.ID = B.ID AND A.value = B.value

Even this doesn't work..

Kindly help me on this.





Tom Kyte
July 30, 2012 - 9:18 am UTC

no creates
no inserts
no look

not exists works strange

pfunk, August 10, 2012 - 2:14 am UTC

Hi Tom,
i have faced with the strange query execution behavior.
In brief, I has two tables.
First - rem3_credit with "LINK" column as primary key.
Second - credit_doc with "LINK" column as primary key and nullable column "LINK_UP" referenced on rem3_credit.link.
Data was loaded via SQL*Loader with setting DIRECT=Y, so reference constraint had not checked during load.
I know about special settings in SQL*Loader which force reference constraints been checked after load but interesting issue was found when I decide to find rows in rem3_credit that are not in credit_doc.
My query was:
SQL> select * from rem3_credit rc where not exists (select * from credit_doc cd where cd.link=rc.link_up);

no rows selected

But actually there was one!
SQL> select rc.link,rc.link_up fk_link, cd.link pk_link from rem3_credit rc, credit_doc cd where cd.link(+)=rc.link_up and cd.link is null;

      LINK    FK_LINK    PK_LINK
---------- ---------- ----------
      2202      24031

There is the same result if i`d query such way:
SQL> select link,link_up from rem3_credit rc where not exists (select * from credit_doc cd where cd.link=rc.link_up and cd.link is not null);

      LINK    LINK_UP
---------- ----------
      2202      24031

But why I need to write additional predicate if I know (and Oracle knows) that credit_doc.link is primary key and can`t be null?
Let`s take a look at execution plan:
SQL> select * from TABLE(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

SQL_ID  1cbvtut47qfv1, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ * from rem3_credit rc where not
exists (select * from credit_doc cd where cd.link=rc.link_up)

Plan hash value: 1161649700

-------------------------------------------------------------------------------------------
| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |      1 |        |      0 |00:00:00.01 |      15 |
|*  1 |  TABLE ACCESS FULL| REM3_CREDIT |      1 |      1 |      0 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RC"."LINK_UP" IS NULL)

Execution plan of "correct" query is:
SQL> select * from TABLE(dbms_xplan.display_cursor('','','ALLSTATS LAST'));

SQL_ID  av9w7ftq1bu06, child number 0
-------------------------------------
select /*+ GATHER_PLAN_STATISTICS */ rc.link,rc.link_up fk_link,
cd.link pk_link from rem3_credit rc, credit_doc cd where
cd.link(+)=rc.link_up and cd.link is null

Plan hash value: 2146757845

-----------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |               |      1 |        |      1 |00:00:00.01 |     452 |
|*  1 |  FILTER             |               |      1 |        |      1 |00:00:00.01 |     452 |

|   2 |   NESTED LOOPS OUTER|               |      1 |      8 |    797 |00:00:00.01 |     452 |
|   3 |    TABLE ACCESS FULL| REM3_CREDIT   |      1 |    798 |    797 |00:00:00.01 |      16 |
|*  4 |    INDEX UNIQUE SCAN| PK_CREDIT_DOC |    797 |  12208 |    796 |00:00:00.01 |     436 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CD"."LINK" IS NULL)
   4 - access("CD"."LINK"="RC"."LINK_UP")

I suppose there is some kind of query rewrite or transformation. Or simply a bug. By the way:
SQL> select * from v$version;

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

Can you give some clarification about what happening here? Thank you in advance.
Tom Kyte
August 17, 2012 - 1:39 pm UTC

please utilize support for something like this.

not exists works strange

pfunk, August 10, 2012 - 2:44 am UTC

Sorry, i`ve made a mistake in table description.
Correct description is:
First - rem3_credit with "LINK" column as primary key and nullable column "LINK_UP" referenced on credit_doc.link.
Second - credit_doc with "LINK" column as primary key.

Balaji, August 14, 2012 - 9:04 am UTC

When you use IN / NOT IN, the query retrieves the records from subquery and then checks the existance of the master record in the subquery. It will not consider NULL Records.

Incase of Exists/Not Exists, the where condition will return a boolean (true/false) and display the desired output. It is faster in performance than IN and NOT IN.

Another better method is using Left/Right outer join.
Tom Kyte
August 17, 2012 - 2:21 pm UTC

there is so much wrong with what you just said. so much. I don't know where to start.

did you know we can and will rewrite your IN to be Exists, your Exists to be In, your not in as not exists, your not exists as not in???

they are the same to us - we use them as we see fit.


http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html



When you use IN / NOT IN, the query retrieves the records from subquery and
then checks the existance of the master record in the subquery. It will not
consider NULL Records.


I don't know what you mean by "it will not consider NULL records" - but I have a feeling that whatever you would say it means is probably wrong. Not in and VERY null sensitive.
ops$tkyte%ORA11GR2> create table t1 ( x int );

Table created.

ops$tkyte%ORA11GR2> create table t2 ( x int );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t1 values(1);

1 row created.

ops$tkyte%ORA11GR2> insert into t1 values(2);

1 row created.

ops$tkyte%ORA11GR2> insert into t2 values(1);

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from t1 where x not in (select x from t2);

         X
----------
         2

ops$tkyte%ORA11GR2> insert into t2 values(null);

1 row created.

ops$tkyte%ORA11GR2> select * from t1 where x not in (select x from t2);

no rows selected

ops$tkyte%ORA11GR2> 






and you know what, ask yourself, if this were true:

It is faster in performance than
IN and NOT IN.


then why would anyone 'invent' IN and NOT IN????

Jess, August 21, 2012 - 7:57 am UTC

Hi Tom,

What about 'in' vs. '<>'?

I have a query that looks like this:

select txn.* from txn_tbl partition (20100101) txn, attribute_tbl attr
where txn.attrib_fk = attr.attrib_pk and attr.status <> 'A';

Select from big transaction table based on one attribute. The 'status' column in the latter has 3 values and a bitmap index on it. I need to return all transaction with attributes that have any of 2 values.

Run as it is (attr.status <> 'A'), I get the following plan:

--------------------------------------------------------------------------------------------------
Id |Operation |Name |Rows|Bytes|Cost(CPU|Time |Pstr|Pstp|IN-OUT
--------------------------------------------------------------------------------------------------
0|SELECT STATEMENT | | 18M|2912M|6168 (2)|00:01:52| | |
1| PX COORDINATOR | | | | | | | |
2| PX SEND QC (RANDOM) |:TQ10001 | 18M|2912M|6168 (2)|00:01:52| | | P->S
* 3| HASH JOIN | | 18M|2912M|6168 (2)|00:01:52| | | PCWP
4| BUFFER SORT | | | | | | | | PCWC
5| PX RECEIVE | |128 | 512 | 3(34)|00:00:01| | | PCWP
6| PX SEND BROADCAST |:TQ10000 |128 | 512 | 3(34)|00:00:01| | | S->P
7| VIEW |index$_join$_02|128 | 512 | 3(34)|00:00:01| | |
* 8| HASH JOIN | | | | | | | |
9| BITMAP CONV'N TO ROWIDS| |128 | 512 | 1 (0)|00:00:01| | |
*10| BITMAP INDEX FULL SCAN|ATTRIB_TBL_BMI1| | | | | | |
11| INDEX FAST FULL SCAN |ATTRIB_PK_IDX |128 | 512 | 1 (0)|00:00:01| | |
12| PX BLOCK ITERATOR | | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWC
13| TABLE ACCESS FULL |TXN_TBL | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWP
--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TXN"."ATTRIB_FK"="ATTR"."ATTRIB_PK")
8 - access(ROWID=ROWID)
10 - filter("ATTR"."STATUS"<>'A')


In one of the releases, the query was changed to the following "to make it easier to understand which values we're bringing back":

select txn.* from txn_tbl partition (20100101) txn, attribute_tbl attr
where txn.attrib_fk = attr.attrib_pk and attr.status in ('B','C');

The plan for this version is this:


--------------------------------------------------------------------------------------------------
Id |Operation |Name |Rows|Bytes|Cost(CPU|Time |Pstr|Pstp|INOUT
--------------------------------------------------------------------------------------------------
0|SELECT STATEMENT | | 18M|2912M|6169 (2)|00:01:52| | |
1| PX COORDINATOR | | | | | | | |
2| PX SEND QC (RANDOM) |:TQ10001 | 18M|2912M|6169 (2)|00:01:52| | | P->S
* 3| HASH JOIN | | 18M|2912M|6169 (2)|00:01:52| | | PCWP
4| BUFFER SORT | | | | | | | | PCWC
5| PX RECEIVE | |128 | 512 | 4(25)|00:00:01| | | PCWP
6| PX SEND BROADCAST |:TQ10000 |128 | 512 | 4(25)|00:00:01| | | S->P
* 7| VIEW |index$_join$_02|128 | 512 | 4(25)|00:00:01| | |
* 8| HASH JOIN | | | | | | | |
9| INLIST ITERATOR | | | | | | | |
10| BITMAP CONV'N TO ROWIDS| |128 | 512 | 2 (0)|00:00:01| | |
*11| BITMAP IDX SINGL VALUE|ATTRIB_TBL_BMI1| | | | | | |
12| INDEX FAST FULL SCAN |ATTRIB_PK_IDX |128 | 512 | 1 (0)|00:00:01| | |
13| PX BLOCK ITERATOR | | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWC
14| TABLE ACCESS FULL |TXN_TBL | 18M|2840M|6159 (1)|00:01:51|1317|1317| PCWP
--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TXN"."ATTRIB_FK"="ATTR"."ATTRIB_PK")
7 - filter("ATTR"."STATUS"='B' OR "ATTR"."STATUS"='C')
8 - access(ROWID=ROWID)
11 - access("ATTR"."STATUS"='B' OR "ATTR"."STATUS"='C')


With an additional access predicate and 'in list' operation, I would've thought that the second version would produce a different plan. But the plans appear virtually identical in terms of cost. I tried it with a larger number of values in the 'in' list against another table, but that didn't seem to affect the plan either. Does that mean that doing <> is no better than doing in ()? I would've thought that saying 'not A' would yield better performance than listing 'B, C, D, E' as possible 'in' values...

Thanks as always.
Tom Kyte
August 28, 2012 - 1:16 pm UTC

But the plans appear
virtually identical in terms of cost.


that should be expected, not unexpected!! The cost if retrieving the same amount of data should be....

the same.

I'm disappointed when it *doesn't* work out that way!!!



Does that mean that doing <> is no better than doing in ()?
I would've thought that saying 'not A' would yield better performance than
listing 'B, C, D, E' as possible 'in' values...


no, it doesn't "mean" that. It just means that to get 18,000,000 rows from a table - most of the work, the time, the energy is going to be spent doing IO.

Jess, August 29, 2012 - 2:41 pm UTC

Thanks Tom.

I suppose I was expecting the plans to be different (erroneously so) in a sense that one would be better than the other one, which was not the case.

So then with the io being spent either way to get the rows, does it make it a purely sematic debate as to whether <> or () should be used, or is there a meaningful difference between the two?

Thanks as always
Tom Kyte
August 29, 2012 - 2:53 pm UTC

In general, != will preclude an index range access path whereas "in (...)" would allow it.

in your case, no index range scanning was to be used so it didn't matter.


so, where x <> 42 would probably never use an index range scan to find the rows.

but where x in ( 1, 2, 3 ) would or could.

Jess, August 29, 2012 - 5:41 pm UTC

Thanks Tom.
That's a really neat piece of information. I didn't know this.

EXISTS with multiple OR vs UNION ALL

Prakash Rai, December 19, 2012 - 1:46 pm UTC

Hi Tom -
Someone in the thread asked to tune the query with multiple EXISTS clause with OR. I have the similar case with multiple EXISTS that is producing the wong result. I replaced subsequents EXISTS with UNION ALL and changed to one EXISTS clause and get the correct result.

Though I am not sure why that would make the difference. Do you see any reason?


--This clause misses some records

WHERE
EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource =1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
)
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)

VS

--this produces the expected result
WHERE
EXISTS ( SELECT 1 FROM springs.si_jde_schedrcpts_work sr
WHERE t.order_id = sr.t.order_id
AND t.item = sr.item
AND t.fromloc = sr.fromloc
AND sr.order_status < 90
AND sr.supplysource =1
UNION ALL
SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.order_staus > 90
UNION ALL
SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc AND
inventory.qty => t.stqty
AND sr.supplysource=1
)

Thanks an always!
Prakash
Tom Kyte
December 20, 2012 - 4:57 pm UTC

please contact support with this one...

(and use two subqueries, not three - just or the last condition on the last two)


If what you say is true, it looks like a bad plan (bug) is being generated for the first query - but I cannot confirm that as I don't have the schema or the entire query and cannot verify that.

EXISTS with multiple OR vs UNION ALL

Prakash Rai, December 20, 2012 - 9:47 am UTC

Follow up to my previous post;

Somewhere I read, use of multiple EXISTS with OR clause like below will not execute subsequent clause and that may vary if a column used in the predicate is indexed. e.g. SR.SUPPLYSOURCE. That's exactly the case I tested, but WHY? Why the standalone index makes the query behave differently?

I see INDEX or UNION ALL are fix or work around but curious what's the reason to Oracle behaves differently?

....
WHERE
EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource = 1 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource = 2 )
OR EXISTS( SELECT 1 FROM stsc.inventory inventory
WHERE t.item = inventory.item
AND t.fromloc = inventory.loc
AND inventory.qty => t.stqty
AND sr.supplysource =3 )

Thanks as always!
Prakash
Tom Kyte
December 20, 2012 - 5:23 pm UTC

please see above.

EXISTS with multiple OR vs UNION ALL

Prakash Rai, December 20, 2012 - 10:21 pm UTC

Tom - Thanks for your response. In real scenario I have 5 OR EXISTS and data breaks after 3rd clause. I had my peers and DBA looked into it and came to the similar conclusion that it is a bug. While we open SR with Oracle, I am applying workaround with UNION ALL, that preserves the performance and produce the correct result.

Thanks as always.
Prakash
Tom Kyte
January 02, 2013 - 8:19 am UTC

please only ever post the REAL QUERY.

You just wasted your time and my time by having me look at a fake example that does not represent what you really are faced with.

Now we have no idea if you are facing a bug or not - because we have never seen the actual query you are running.

The union all might not be giving you the right answer by the way. If more than one branch of the OR is satisfied - you are going to get the same row more than once with a union all.

Rahul, December 29, 2012 - 11:48 am UTC

Hi Tom ,

Your link provided for IN and exists is awesome . Though for NOT IN AND NOT EXISTS you gave us the example of the case which was mishandled by NOT IN .So if i ignore such cases then the working of NOT IN and NOT EXISTS should be similar to IN AND EXISTS. I mean , in case of NOT IN - The sub-query is transformed to a view with distinct values and a join . but an equi join wouldn't be needed here .So how does it work ?

Also for NOT EXISTS ,suppose the inner table is big enough and indexed on the column .Since we arent matching any values, i suppose the index wont be used. so how does NOT EXISTS work ?
Tom Kyte
January 04, 2013 - 11:23 am UTC

NOT IN and NOT EXISTS are not the same in the way that IN and EXISTS are.


they (not in/not exists) deal with NULLS very differently.




with not in/exists - the subuquery doesn't have to be distincted - we can do a thing called a semi-join - so we don't necessarily need to distinct them. We'd use an outer join and keep only the rows where the joined to table was "null" (not there at all).


and we can use an index easily for both not in and not exists if it makes sense.


ops$tkyte%ORA11GR2> create table t1 ( x int, y int, z int );

Table created.

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 100 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> create table t2 ( x int not null, y int, z int );

Table created.

ops$tkyte%ORA11GR2> create index t2_idx on t2(x);

Index created.

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace traceonly explain

ops$tkyte%ORA11GR2> select *
  2    from t1
  3   where NOT EXISTS (select null from t2 where t2.x = t1.x)
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 2403176152

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100 |  5200 |    29   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |        |   100 |  5200 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1     |   100 |  3900 |    29   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."X"="T1"."X")

ops$tkyte%ORA11GR2> set autotrace off

EXISTS with multiple OR vs UNION ALL

Prakahs Rai, January 02, 2013 - 12:59 pm UTC

Tom -

I apologize for not posting the actual query.

Here is the query that I replaced "OR" with "UNION ALL".
In the third EXIST clause, if I explicitly convert ics.cac_segment_type (which is number field) to TO_NUMBRE (as "TO_NUMBER(ics.cac_segment_type) = 1.3", the query fully executes the next evaluation "ics.cac_segment_type= 1.4" and returns the correct result, or else only partiatially evaluates the next statmenet and skips about 20% of "ics.cac_segment_type= 1.4" matchings. The use of TO_NUMBER() changed the plan to "INDEX STORAGE FAST FULL SCAN" from "INDEX RANGE SCAN". This solution was found as hit and trial that I don't see the logical reasoning and also impacted the performance unacceptably.

UNION ALL produces the correct result and there is no counter impact. Though I don't see the point as you said "The union all might not be giving you the right answer by the way....". Can you please elaborate why would top query returns multiple records if the EXIST satisfies with multiple clauses. EXIST will only validate EXIST = Yes or No. What am I missing?

Thanks,
Prakash
SELECT
...
FROM
table1 x
WHERE EXISTS
(SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.1
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists =1
AND ics.segment_value = x.segment1
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type =1.2
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment2
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.3
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment3
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.4
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment4
UNION ALL
SELECT 'X'
FROM local_gtt ics
WHERE ics.cac_segment_type = 1.5
AND ics.buyer_bus_org_fk = x.buyerorg
AND ics.ui_buyer_org_exists = 1
AND ics.segment_value = x.segment5
)

Difference in execution plan and execution time

Amit, February 01, 2013 - 3:34 pm UTC

Hi Tom,

I have following queries with execution plans. Both return exact same data. First query uses normal joins, it shows execution plan with low cost and executes faster. Second query uses EXISTS, it's cost is much higher and execution is 10 times slower compared to first query.
I am trying to understand reason for the difference between the execution plans of these 2 queries. Is there any way so that I can put HINT to second query so that its plan becomes better?

Thanks!

Table details -
Table_name Num_rows
--------------------------------
TBL_CALENDAR 4217
TBL_ACCOUNT 41355
TBL_DATA_SRC 12
TBL_POSN 377259

---------------------------------------------------------------------------------------------
First Query -

SELECT S.ROWID
FROM TBL_POSN S,
TBL_ACCOUNT A,
TBL_DATA_SRC D,
(
SELECT TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY,
MAX(CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6) THEN TBL_CALENDAR_DATE ELSE NULL END) LAST_WORKING_DAY,
MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY
FROM TBL_CALENDAR
GROUP BY TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY')
) LAST_DAY
WHERE 1=1
AND REPORT_DATE < '31-OCT-12'
AND REPORT_DCDE = 'I'
AND A.AC_SK = S.AC_SK
AND A.DATA_SRC_ID = D.DATA_SRC_ID
AND S.REPORT_DATE >= D.BACKLOAD_CUTOFF_DATE
and TO_CHAR(s.REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY
and (
a.DATA_SRC_ID in(1, 2)
OR (
a.DATA_SRC_ID in(3, 4, 5, 6, 10)
AND s.REPORT_DATE <> LAST_DAY.LAST_WORKING_DAY
)
or (
A.DATA_SRC_ID = 8
and s.REPORT_DATE <> LAST_DAY.LAST_CALENDAR_DAY
)
);

Plan hash value: 1112228868

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11926 | 815K| 529 (15)| | |
|* 1 | HASH JOIN | | 11926 | 815K| 529 (15)| | |
|* 2 | HASH JOIN | | 857 | 40279 | 519 (15)| | |
| 3 | NESTED LOOPS | | 8574 | 326K| 362 (19)| | |
|* 4 | TABLE ACCESS FULL | TBL_DATA_SRC | 1 | 12 | 2 (0)| | |
| 5 | PARTITION RANGE ITERATOR| | 14291 | 376K| 360 (19)| KEY | KEY |
|* 6 | INDEX FAST FULL SCAN | XU_TBL_POSN_PK | 14291 | 376K| 360 (19)| KEY | KEY |
| 7 | TABLE ACCESS FULL | TBL_ACCOUNT | 41355 | 323K| 154 (4)| | |
| 8 | VIEW | | 4200 | 96600 | 8 (25)| | |
| 9 | HASH GROUP BY | | 4200 | 33600 | 8 (25)| | |
| 10 | INDEX FULL SCAN | TBL_CALENDAR_PK| 4217 | 33736 | 6 (0)| | |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("LAST_DAY"."MMYYYY"=TO_CHAR(INTERNAL_FUNCTION("S"."REPORT_DATE"),'MMYYYY'))
filter("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2 OR ("A"."DATA_SRC_ID"=3 OR "A"."DATA_SRC_ID"=4 OR
"A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND
"S"."REPORT_DATE"<>"LAST_DAY"."LAST_WORKING_DAY" OR "A"."DATA_SRC_ID"=8 AND
"S"."REPORT_DATE"<>"LAST_DAY"."LAST_CALENDAR_DAY")
2 - access("A"."AC_SK"="S"."AC_SK" AND "A"."DATA_SRC_ID"="D"."DATA_SRC_ID")
4 - filter("D"."BACKLOAD_CUTOFF_DATE"<'31-OCT-12')
6 - filter("REPORT_DCDE"='I' AND "REPORT_DATE"<'31-OCT-12' AND
"S"."REPORT_DATE">="D"."BACKLOAD_CUTOFF_DATE")
------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------
Second query -

SELECT ROWID
FROM TBL_POSN s
WHERE 1=1
AND REPORT_DATE < '31-OCT-12'
AND REPORT_DCDE = 'I'
and EXISTS
(
SELECT 1
FROM TBL_ACCOUNT A,
TBL_DATA_SRC D,
(
SELECT TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY,
MAX(CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6) THEN TBL_CALENDAR_DATE ELSE NULL END) LAST_WORKING_DAY,
MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY
FROM TBL_CALENDAR
GROUP BY TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY')
) LAST_DAY
WHERE AC_SK = s.AC_SK
AND A.DATA_SRC_ID = D.DATA_SRC_ID
AND REPORT_DATE >= D.BACKLOAD_CUTOFF_DATE
and TO_CHAR(REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY
and (
a.DATA_SRC_ID in(1, 2)
OR (
A.DATA_SRC_ID IN(3, 4, 5, 6, 10)
AND REPORT_DATE <> LAST_DAY.LAST_WORKING_DAY
)
or (
A.DATA_SRC_ID = 8
and REPORT_DATE <> LAST_DAY.LAST_CALENDAR_DAY
)
)
);

Plan hash value: 723147282

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 27 | 456K (8)| | |
|* 1 | FILTER | | | | | | |
| 2 | PARTITION RANGE ITERATOR | | 285K| 7536K| 353 (17)| 1 | KEY |
|* 3 | INDEX FAST FULL SCAN | XU_TBL_POSN_PK | 285K| 7536K| 353 (17)| 1 | KEY |
| 4 | NESTED LOOPS | | 2 | 86 | 4 (0)| | |
| 5 | NESTED LOOPS | | 1 | 20 | 2 (0)| | |
| 6 | TABLE ACCESS BY INDEX ROWID| TBL_ACCOUNT | 1 | 8 | 1 (0)| | |
|* 7 | INDEX UNIQUE SCAN | PK_TBL_ACCOUNT | 1 | | 1 (0)| | |
|* 8 | TABLE ACCESS BY INDEX ROWID| TBL_DATA_SRC | 1 | 12 | 1 (0)| | |
|* 9 | INDEX UNIQUE SCAN | PK_TBL_DATA_SRC| 1 | | 1 (0)| | |
|* 10 | VIEW | | 2 | 46 | 2 (0)| | |
| 11 | SORT GROUP BY | | 42 | 336 | 8 (25)| | |
|* 12 | INDEX FULL SCAN | TBL_CALENDAR_PK| 42 | 336 | 7 (15)| | |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ */ 0 FROM (SELECT /*+ */
TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY') "MMYYYY",MAX(CASE
TO_NUMBER(TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'d')) WHEN 2 THEN "TBL_CALENDAR_DATE" WHEN 3 THEN
"TBL_CALENDAR_DATE" WHEN 4 THEN "TBL_CALENDAR_DATE" WHEN 5 THEN "TBL_CALENDAR_DATE" WHEN 6 THEN "TBL_CALENDAR_DATE"
ELSE NULL END ) "LAST_WORKING_DAY",MAX(LAST_DAY(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"))) "LAST_CALENDAR_DAY" FROM
"TBLDBO"."TBL_CALENDAR" "TBL_CALENDAR" WHERE TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')=TO_CHAR(:B1,'M
MYYYY') GROUP BY TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')) "LAST_DAY","TBLDBO"."TBL_DATA_SRC"
"D","TBLDBO"."TBL_ACCOUNT" "A" WHERE "AC_SK"=:B2 AND "A"."DATA_SRC_ID"="D"."DATA_SRC_ID" AND
"D"."BACKLOAD_CUTOFF_DATE"<=:B3 AND (("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2) OR ("A"."DATA_SRC_ID"=3 OR
"A"."DATA_SRC_ID"=4 OR "A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND
"LAST_DAY"."LAST_WORKING_DAY"<>:B4 OR "A"."DATA_SRC_ID"=8 AND "LAST_DAY"."LAST_CALENDAR_DAY"<>:B5)))
3 - filter("REPORT_DCDE"='I' AND "REPORT_DATE"<'31-OCT-12')
7 - access("AC_SK"=:B1)
8 - filter("D"."BACKLOAD_CUTOFF_DATE"<=:B1)
9 - access("A"."DATA_SRC_ID"="D"."DATA_SRC_ID")
10 - filter("A"."DATA_SRC_ID"=1 OR "A"."DATA_SRC_ID"=2 OR ("A"."DATA_SRC_ID"=3 OR "A"."DATA_SRC_ID"=4 OR
"A"."DATA_SRC_ID"=5 OR "A"."DATA_SRC_ID"=6 OR "A"."DATA_SRC_ID"=10) AND "LAST_DAY"."LAST_WORKING_DAY"<>:B1 OR
"A"."DATA_SRC_ID"=8 AND "LAST_DAY"."LAST_CALENDAR_DAY"<>:B2)
12 - filter(TO_CHAR(INTERNAL_FUNCTION("TBL_CALENDAR_DATE"),'MMYYYY')=TO_CHAR(:B1,'MMYYYY'))

Tom Kyte
February 04, 2013 - 9:17 am UTC

couple of comments on a few really *bad* practices:

a) TO_CHAR(TBL_CALENDAR_DATE,'MMYYYY') MMYYYY

use trunc( dt, 'mm' ). keep the date a date, trunc to the month level. don't conver to a string (hugely expensive). especially when you need it to compare to something later: TO_CHAR(REPORT_DATE, 'MMYYYY') = LAST_DAY.MMYYYY. that should be "trunc(report_date,'mm') = last_day.mmyyyy.

trunc simply resets bytes at the end of a 7 byte field. to_char has to run through probably hundreds of thousands of lines of code!


b) CASE WHEN TO_CHAR(TBL_CALENDAR_DATE, 'D') IN (2, 3, 4, 5, 6)

compare numbers to numbers, dates to dates, strings to string. here you are comparing a string to a number - meaning there is an implicit conversion in there - meaning this code is *bad* (implicit conversions are *evil* - avoid them)

b) TO_CHAR(TBL_CALENDAR_DATE, 'd' )

beware of date functions that are dependent on where you are located.
ops$tkyte%ORA11GR2> alter session set nls_territory = america;

Session altered.

ops$tkyte%ORA11GR2> select to_char(sysdate,'d'), trunc(sysdate)-trunc(sysdate,'iw')+1
  2  from dual;

T TRUNC(SYSDATE)-TRUNC(SYSDATE,'IW')+1
- ------------------------------------
2                                    1

ops$tkyte%ORA11GR2> alter session set nls_territory = germany;

Session altered.

ops$tkyte%ORA11GR2> select to_char(sysdate,'d'), trunc(sysdate)-trunc(sysdate,'iw')+1
  2  from dual;

T TRUNC(SYSDATE)-TRUNC(SYSDATE,'IW')+1
- ------------------------------------
1                                    1

ops$tkyte%ORA11GR2> 


your code will return different answers in different countries... you might consider using the IW approach - it is based on ISO standards (worldwide) instead of country by country settings.

c) MAX(LAST_DAY(TBL_CALENDAR_DATE)) LAST_CALENDAR_DAY

don't compute things you don't need. You only reference last_calendar_day in the outer query in the predicate and could easily have used last_day(mmyyyy)

d) AND REPORT_DATE < '31-OCT-12'

never compare dates to strings, strings to numbers, etc.

always compare dates to dates, strings to strings, numbers to numbers.

never rely on implicit conversions, consider:

ops$tkyte%ORA11GR2> create table t ( x date );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t values ( to_date( '01-jan-2012', 'dd-mon-yyyy' ) );

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from t where x < '31-oct-12';

no rows selected

ops$tkyte%ORA11GR2> alter session set nls_date_format = 'dd-mon-yyyy';

Session altered.

ops$tkyte%ORA11GR2> select * from t where x < '31-oct-12';

no rows selected

ops$tkyte%ORA11GR2> 


whoops, wrong answer all of a sudden.....

e) AND REPORT_DATE < '31-OCT-12'

you must have started programming in about 2001 or 2002 or later.... Anyone that programmed in the 1990's knows - it takes four characters for years. Always and forever.





I'm not a fan of hints at all - so no, I'm not going to give a magical hint - you might try using dynamic sampling set to level 3 or above - the exists query is getting what looks like an incorrect cardinality estimate. the first query gets 11-12,000 rows as a guess, the second gets 1 row as a guess. that is the root cause of the bad plan (incorrect cardinality estimates). I cannot tell really where it is incorrect (because you know your data, I do not - you know what indexes go with what tables - i do not, makes it really hard to read a plan...)

so, looking at these plans - where are the cardinality estimates off by a large amount. we'll start there - figure out how we might get the right cardinalities and then the right plan will just "fall out" from that.

as far as I know, these two queries are not equivalent. the join query could return many more rows than the exists. I don't know your constraints (primary keys, foreign keys, not null, etc) and hence cannot even tell you if these two queries are interchangeable!


Difference in execution plan and execution time

Manoj Kaparwan, February 02, 2013 - 5:33 pm UTC

Dear  tom
Here tried  the following test on Oracle 10.2.0.4
 

emp table - from scott.emp
big_table -- CTAS from all_objects

   
man@ora10g:rac1> select a.rowid
   from emp  a  where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
     and exists
      ( select null from big_table b , big_table c where b.object_id  = c.object_id );  2    3    4

ROWID
------------------
AAANjIAAEAACMNcAAK

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 4226424926

-----------------------------------------------------------------------------------------
| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                |  7692K|   146M|   223K  (3)| 00:44:45 |
|*  1 |  FILTER                |                |       |       |            |          |
|*  2 |   TABLE ACCESS FULL    | EMP            |  7692K|   146M|   223K  (3)| 00:44:45 |
|   3 |   NESTED LOOPS         |                |     1 |    10 |     4   (0)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| BIG_TBL_ID_IDX | 50503 |   246K|     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN    | BIG_TBL_ID_IDX |     1 |     5 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
              WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
   2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")

   
   
Then, I took tkprof
   
********************************************************************************

 select a.rowid
   from emp  a  where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
     and exists
      ( select null from big_table b , big_table c where b.object_id  = c.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.02          0         18          0           0
Fetch        2      0.00       0.00          0          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.03          0         22          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 61

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  FILTER  (cr=22 pr=0 pw=0 time=1125 us)
      1   TABLE ACCESS FULL EMP (cr=4 pr=0 pw=0 time=273 us)
      1   NESTED LOOPS  (cr=18 pr=0 pw=0 time=776 us)
      1    INDEX FAST FULL SCAN BIG_TBL_ID_IDX (cr=15 pr=0 pw=0 time=638 us)(object id 55243)
      1    INDEX RANGE SCAN BIG_TBL_ID_IDX (cr=3 pr=0 pw=0 time=92 us)(object id 55243)
********************************************************************************

So Oracle is doing right thing here - as usual.

as soon as the 1st row is returned from the NL.. job is done .. no longer required joining each row from big_table a, to big_table c

I can see estimation of row in explain plan is way off the reality... is this the reason we should not rely much on explain plan. so we would not go with the cost as shown in the case in above post - might be tkprof would reveal more?

Tom Kyte
February 04, 2013 - 9:40 am UTC

how did it come up with over 7,000 rows in EMP??????

ops$tkyte%ORA10GR2> create table emp as select * from scott.emp;

Table created.

ops$tkyte%ORA10GR2> create table big_table as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2> alter table big_table add constraint bt_pk primary key(object_id);

Table altered.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'EMP' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'BIG_TABLE' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace on explain
ops$tkyte%ORA10GR2> select a.rowid
  2     from emp  a  where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
  3          and exists
  4                        ( select null from big_table b , big_table c where b.object_id  =
  5                            c.object_id );

ROWID
------------------
AAARLfAAEAAASvUAAK


Execution Plan
----------------------------------------------------------
Plan hash value: 2260623314

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    20 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                |       |       |       |            |          |
|*  2 |   TABLE ACCESS FULL    | EMP   |     1 |    20 |     3   (0)| 00:00:01 |
|   3 |   NESTED LOOPS         |       |     2 |    20 |     2   (0)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| BT_PK | 48879 |   238K|     2   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN   | BT_PK |     1 |     5 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
              WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
   2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
   5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")

ops$tkyte%ORA10GR2> set autotrace off



post your step by steps to reproduce please.

..Difference in execution plan and execution time

Manoj Kaparwan, February 04, 2013 - 7:32 pm UTC

Dear Tom,

My bad !

emp table in MAN schema had stats way off.

man@ora10g:rac1> select owner, table_name, num_rows from dba_tables where table_name ='EMP';

OWNER TABLE_NAME NUM_ROWS
------------------------------ ------------------------------ ----------
MAN EMP 100000000


later I gather stats.


man@ora10g:rac1> exec dbms_stats.gather_table_stats(user, 'EMP', cascade=>true);

PL/SQL procedure successfully completed.

man@ora10g:rac1> select owner, table_name, num_rows, last_analyzed from dba_tables where table_name ='EMP';

OWNER TABLE_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ------------------------------ ---------- -------------------
MAN EMP 14 2013-02-05 12:25:31

man@ora10g:rac1> explain plan for
2 select a.rowid
from emp a where a.hiredate >to_date('1987-04-19','yyyy-mm-dd')
and exists
( select null from big_table b , big_table c where b.object_id = c.object_id ); 3 4 5

Explained.

man@ora10g:rac1> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4226424926

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 7 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL | EMP | 1 | 20 | 3 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 10 | 4 (0)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| BIG_TBL_ID_IDX | 50503 | 246K| 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | BIG_TBL_ID_IDX | 1 | 5 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter( EXISTS (SELECT /*+ */ 0 FROM "BIG_TABLE" "C","BIG_TABLE" "B"
WHERE "B"."OBJECT_ID"="C"."OBJECT_ID"))
2 - filter("A"."HIREDATE">TO_DATE(' 1987-04-19 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
5 - access("B"."OBJECT_ID"="C"."OBJECT_ID")

21 rows selected.



Not in and Not exists

ajit, February 20, 2013 - 6:54 pm UTC

Hi tom,

I read most of the discussions above.

I just want to confirm one thing.

If we are dealing with columns which do not have null values..can not in/not exists be interchanged in that case?

Thanks in advance
Big fan :)

Tom Kyte
February 25, 2013 - 10:09 am UTC

yes, if the column involved in the NOT IN select list (the projected column(s)) are defined as NOT NULL - then not in and not exists would be semantically equivalent.

why "not in" unable to compare null.

ranjan, August 29, 2013 - 8:38 pm UTC

Hi Tom,

Hope you are fine :)

#############
Create Table A As Select Level A From Dual Connect By Level<=5;

create table b as select level+2 a from dual connect by level<=5 union select null from dual;

Select * From A;
1
2
3
4
5

Select * From B;
3
4
5
6
7

Select * From A Where A not In (Select A From B);
no rows selected

select * from a where a in (select a from b);
3
4
5
##############
You told at the top:
--------
Beware the NULL value
and NOT IN!! (also the reason why NOT IN is sometimes avoided).
---------
That is true but why oracle doesn't handle as it handles in case of "in".
I mean to say how oracle skip the values "1" and "2" in case of "in" which is not correct (as we know null cant be compared with anything (null is unknown), how it calculates that "1" in "null" and "2" in "null" are false).

Many people say that, that is side effect of "not in" , but could you please say something :).


Tom Kyte
September 04, 2013 - 6:22 pm UTC

because we are beholden to the ANSI standard to tell us how to process the language and the standard says this is the way it is to be done.

it is in the definition of the language, a definition we subscribe to but are not in charge of.

hmm, thank you for the reply

Ranjan, September 08, 2013 - 10:10 pm UTC


Exact use of exists

Boob, June 17, 2014 - 2:43 pm UTC

Hi could you please sujjest me how exactly exit will be used.

NOT IN and NOT EXISTS

Niranjan, September 17, 2014 - 9:15 am UTC

Hi Tom,

You said, NOT IN and NOT EXISTS are not the same!!! logically I understand that but I did not quite get, as the result I get is the same (query below). Are you referring to saying (NOT THE SAME) that, EXISTS evaluates to true immediately on the first occurrence of the value during a scan on the table/index?
Used smaple HR schema.
select department_id from departments d where not exists (select null from employees e where e.department_id = d.department_id);
select department_id from departments d where department_id not in (select department_id from employees e where e.department_id = d.department_id);

getting null values from record

lavanya, July 27, 2016 - 11:46 am UTC

Suppose customer is my table in that I want to find all the c ustomers where we don't have their phone numbers .how can I get it?
Chris Saxon
July 27, 2016 - 1:23 pm UTC

select *
from customers
where phone_number is null

Not seeing difference in null handling between NOT IN and NOT EXISTS

Stacey Baker, March 27, 2017 - 10:27 pm UTC

Hi Tom,

I'm still struggling with the difference in null handling between NOT IN and NOT EXISTS. The example you gave that worked, only worked because the correlated subquery inner join eliminated the null in the table (emp 7839 has no manager):

select count(*) from emp t1 where not exists ( select null from emp t2 where mgr = t1.empno);

If I add "or m.mgr is null", I get 0 records, the same answer I get with NOT IN.

select count(*) from emp t1 where not exists ( select null from emp t2 where mgr = t1.empno or mgr is null);

Also, if I use NOT IN in your NOT EXISTS query, it returns the correct count of employees who aren't managers:

select count(*) from emp t1 where empno not in (select mgr from emp t2 where mgr = t1.empno);

How can I see the difference?
Connor McDonald
March 28, 2017 - 1:06 am UTC

You can generally transpose between the two if you *change the sql*, because it is basically about sets and set membership. Tom's point was that simply switching "not in" to "not exists" is not a guarantee it will be the same result.

Smrdíš

A reader, September 26, 2017 - 12:01 pm UTC


Not sure I understand...

Randy Strauss, October 13, 2017 - 8:14 pm UTC

Other answer said/implied that the set in:
 select count(*) from emp where empno not in ( select mgr from emp ); 


evaluated to ( NULL ) - why? It seems like it should evaluate to a set that includes managers and a null (does a set automatically discard redundant values?) . Is that correct? And are you saying this wouldn't work- that NULL would make the set poorly defined?

Would this work better/well?:
 select count(*) from emp where empno not in ( select distinct(mgr) from emp where mgr is not null); 



Connor McDonald
October 16, 2017 - 12:27 pm UTC

And are you saying this wouldn't work- that NULL would make the set poorly defined?

Effectively, yes.

Would this work?:
select count(*) from emp where empno not in ( select distinct(mgr) from emp where mgr is not null);


Yes.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.