Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question.

Asked: March 17, 2005 - 8:43 pm UTC

Last updated: February 02, 2007 - 10:12 am UTC

Version: 9.2.0.4

Viewed 1000+ times

You Asked

Tom,
I apologize for such a trivial question, however I was wondering the most efficient way to determine records that are in Table_A that aren't in Table_B. Assuming we are ONLY concerned about two specific columns in each table (ie. Col_3 & Col_4 in Table_A that aren't included in Col_1 & Col_2 in Table_B). Both tables have millions of records.

and Tom said...

using the CBO:


select *
from table_a
where (col1,col2) NOT IN (select col1, col2
from table_b
<where col1 is not null and col2 is not null>)

the <where is not null> would be useful only if col1 and col2 are NULLABLE in the table


if forced to use RBO (cbo should do a nice big juicy hash anti join)

select table_a.*
from table_a, table_b
where table_a.col1 = table_b.col1(+)
and table_a.col2 = table_b.col2(+)
and table_b.col1 is null
and table_b.col2 is null
/

simulates it.

Rating

  (13 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

NOT NULL

A reader, March 18, 2005 - 7:39 am UTC

The statement

"the <where is not null> would be useful only if col1 and col2 are NOT NULL in the table"

is this meant to read as "are NULL in the table" since NOT IN doesn't account for NULL's? If it's probable that col1 & col2 contain NULL's, we need to include the IS NOT NULL predicate?




Tom Kyte
March 18, 2005 - 9:55 am UTC

sorry -- would be useful ONLY IF col1 and col2 are NULLABLE (i said it exactly backwards -- I corrected the text above!!)

Why not use SET operation ?

Jeeves, March 18, 2005 - 7:53 am UTC

Tom,
You are a propenant of SET operation, I was wondering why MINUS was not used to solve the problem.
However, I shall run your proposed solution and MINUS operation and know for myself the performance implication.
Thanks.

Tom Kyte
March 18, 2005 - 9:57 am UTC

both of the above are SET operations.

MINUS changes the answer.


select * from t1
minus
select * from t2

is *not* the same as

select * from t1 where (col1,col2) not in (select col1, col2 from t2);


first -- there is a col3 (minus would make it impossible to just look at col1 and col2 for differences -- and retrieve col3)

second -- minus implies a DISTINCT.

another way

Jon, March 18, 2005 - 9:23 am UTC

Here is another way and actually, it is what I typically do.

select table_a.*
from table_a
where not exists (select null
from table_b
where table_a.col1 = table_b.col1
and table_a.col2 = table_b.col2)

I have always used the rule of thumb to not use "in" when the subselect has a large number of records. Is that a bad rule? My rule is old and probably concluded in the old days of RBO.

Tom Kyte
March 18, 2005 - 10:09 am UTC

bad rule from the days of the rbo.

not in is easier and will be rewritten by the cbo to not exists when appropriate.

definition needed ...

Gabe, March 18, 2005 - 11:27 am UTC

By the time NULLs appear into the picture the question is no longer trivial … at least not until equality between pairs is fully defined.

For instance, is (a,b) equal to (c,d) when all a, b, c, d are null? … or not? … or not defined?


Tom Kyte
March 18, 2005 - 11:54 am UTC

null is never equal to null, nor not equal.


so the answer to both is "unknown" of course :)

hope this is not too much about nothing ... :-)

Gabe, March 18, 2005 - 12:43 pm UTC

<quote>
null is never equal to null, nor not equal.
</quote>

I know ... I'm saying expanding that observation from singleton to pairs to triplets, etc. is not an absolute thing ... it is a case-by-case definition.

Based on your observation (a,null) compared to (b,null) will always be “unknown” …

flip@FLOP> set null null
flip@FLOP> col n format a4
flip@FLOP> select a.* from
2 (
3 select 1,null n from dual
4 union
5 select null,null n from dual
6 ) a
7 intersect
8 select b.* from
9 (
10 select 1,null n from dual
11 union
12 select null,null n from dual
13 ) b;

1 N
---------- ----
1 null
null null

In fact your NOT IN query doesn’t return the records from table_a which are not in table_b based on that (col1,col2) comparison … it returns that _plus_ records from table_a which cannot be compared with records from table_b … here I’m comparing one table with itself and I’m getting records back simply because they cannot be compared:

flip@FLOP> select * from table_b;

COL1 COL2
---------- ----------
1 null
null null

flip@FLOP> select *
2 from table_b
3 where (col1,col2) NOT IN (select col1, col2
4 from table_b
5 where col1 is not null
6 and col2 is not null
7 )
8 ;

COL1 COL2
---------- ----------
1 null
null null

I just don’t see it as clear-cut.



Tom Kyte
March 18, 2005 - 1:00 pm UTC

correct, nulls mess this up.

I made an assumption that col1, col2 in table b are not null

I usualy do this:

Sergey, March 18, 2005 - 5:15 pm UTC

select table_a.*
from table_a, table_b
where table_a.col1 = table_b.col1(+)
and table_a.col2 = table_b.col2(+)
and table_b.rowid IS NULL
/



Tom Kyte
March 18, 2005 - 8:11 pm UTC

that is the "do it yourself anti join with extra work" mentioned above.

NULL invasion

A reader, March 18, 2005 - 10:33 pm UTC

So, if ALL the columns in question are NULLABLE, the only sure way to ensure we receive non-null rows is to add an additional predicate to the parent query:

select col1, col2
from table_b
where (col1,col2) NOT IN (select col3, col4
from table_a
where col3 is not null
and col4 is not null
)
and col1 is not null
and col2 is not null
;

I assume this would be the most efficient way to obtain this result set?

Tom Kyte
March 19, 2005 - 6:54 am UTC

select * 
  from table_a 
 where (col1,col2) NOT IN (select col1, col2 
                             from table_b
                            <where col1 is not null and col2 is not null > )

is all you need. 

x,null
null,x
null,null

is "not" known to be in or not in some set.... so they will not be true for "NOT IN"

ops$tkyte@ORA9IR2> set null ***
ops$tkyte@ORA9IR2> select * from t1;
 
        C1         C2
---------- ----------
         1 ***
***                 1
***        ***
         1          1
         2          2
 
ops$tkyte@ORA9IR2> select * from t2;
 
        C1         C2
---------- ----------
         1          1
 
ops$tkyte@ORA9IR2> select * from t1 where (c1,c2) not in ( select * from t2 );
 
        C1         C2
---------- ----------
         2          2
 

I've got to be missing something???

A reader, March 19, 2005 - 1:19 pm UTC

SQL> set null ****
SQL> drop table table_b; 
drop table table_b 
          * 
ERROR at line 1: 
ORA-00942: table or view does not exist 


SQL> drop table table_a; 
drop table table_a 
          * 
ERROR at line 1: 
ORA-00942: table or view does not exist 


SQL> create table table_b (col1 number, col2 number); 

Table created. 

SQL> create table table_a (col3 number, col4 number); 

Table created. 

SQL> insert into table_b values (1, null); 

1 row created. 

SQL> insert into table_b values (null, null); 

1 row created. 

SQL> insert into table_b values (2, 2); 

1 row created. 

SQL> insert into table_a values (1, null); 

1 row created. 

SQL> insert into table_a values (2, 1); 

1 row created. 

SQL> insert into table_a values (null, null); 

1 row created. 

SQL> select * from table_b; 

     COL1       COL2 
---------- ---------- 
        1  **** 
****       **** 
        2          2 

SQL> select * from table_a; 

     COL3       COL4 
---------- ---------- 
        1  **** 
        2          1 
****       **** 

SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a); 

no rows selected 

SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a 
5                             where col3 is not null 
6                             and col4 is not null); 

     COL1       COL2 
---------- ---------- 
        1  **** 
        2          2 


SQL> c/and/or 
6*                            or col4 is not null) 
SQL> / 

     COL1       COL2 
---------- ---------- 
        2          2 
 

I get the same thing??

A reader, March 25, 2005 - 9:04 am UTC

I get the same results. Can you explain this?

Any feedback on this?

A reader, March 27, 2005 - 6:24 pm UTC

Is this a bug?

Tom Kyte
March 27, 2005 - 6:35 pm UTC

ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a);
 
no rows selected

<b>that is because with the (NULL NULL) entry in table_a, it is "not known" if any of col1 col2 are "not in"

Now, if we run the next subquery:</b>

ops$tkyte@ORA9IR2> select col3, col4
  2  from table_a
  3  where col3 is not null
  4  and col4 is not null;
 
      COL3       COL4
---------- ----------
         2          1
 
ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a
  5                           where col3 is not null
  6                           and col4 is not null);
 
      COL1       COL2
---------- ----------
         1 ****
         2          2

<b>
(1,null) is known to be NOT IN (2,1) since 1,<anything> cannot be in 2,<anything>

continuing on</b>

ops$tkyte@ORA9IR2> select col3, col4
  2  from table_a
  3  where col3 is not null
  4  or col4 is not null;
 
      COL3       COL4
---------- ----------
         1 ****
         2          1
 
ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a
  5                           where col3 is not null
  6                           or col4 is not null);
 
      COL1       COL2
---------- ----------
         2          2
 <b>
Now, 1,<anything> is neither "known" or "not known" to be in (1,NULL), hence 1,null disappears -- with the addition of 1,<null> to the subquery.  However, 2,<anything> is known to NOT be in 1,null (because of the first column) and 2,2 is known to be not in (2,1) so it appears</b>


I see nothing wrong here?  Which rows are causing the problem? 

Confused??

A reader, March 27, 2005 - 6:45 pm UTC

I would have expected the query

SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a); 

no rows selected 

to return:

     COL1       COL2 
---------- ---------- 
        2          2 

Instead of "no rows selected". 

Tom Kyte
March 27, 2005 - 6:49 pm UTC

but (2,<ANYTHING>)

is KNOWN to be "not in" (1,<ANYTHING>)



2 <> 1, that is KNOWN. it need not check the rest of the columns, the "vector" (2,*) is KNOWN to not be in (1,*)




Potential User Error To Watch For

Keith, April 21, 2005 - 11:07 am UTC

There is a slim but still possible chance that the sought after data does not exist in both tables. A few well placed queries to USER_TAB_COLS or clauses in your original query that accomplish the same or equivalent analysis could be helpful.

My question is, if both columns exist in both tables, is there a solid reason why you put yourself in a situation of duplicating data? Developers here who commit unneccessary data duplication get a head slap and a dose of public humiliation.

Of course, it's entirely possible that a (non-)unique pair in one table would have a similarly (non-)unique counter pair in another and the comparison would serve to limit results. Your data model may vary.

Nvl in null comparison

Pratap Singn, February 01, 2007 - 4:01 pm UTC

I normally prefer a null comparison by nvl instead of letting the default oracle behavior of unknown. This ensures all combination of (data,null) can be compared to (data,null) as well the distinctness of having (null, null) from one set will be compared to not exists of (null, null) from other set.
-PBS
Tom Kyte
February 02, 2007 - 10:12 am UTC

I normally prefer to code what needs to be coded at the point in time we code it.

Meaning - using NVL all of the time just doesn't make sense. The requirements of your application will dictate whether you need to compare nulls or not.