NOT NULL
A reader, March     18, 2005 - 7:39 am UTC
 
 
The statement 
"the <where is not null> would be useful only if col1 and col2 are NOT NULL in the table"
is this meant to read as "are NULL in the table" since NOT IN doesn't account for NULL's?  If it's probable that col1 & col2 contain NULL's, we need to include the IS NOT NULL predicate?
 
 
March     18, 2005 - 9:55 am UTC 
 
sorry -- would be useful ONLY IF col1 and col2 are NULLABLE (i said it exactly backwards -- I corrected the text above!!) 
 
 
 
Why not use SET operation ?
Jeeves, March     18, 2005 - 7:53 am UTC
 
 
Tom,
You are a propenant of SET operation, I was wondering why MINUS was not used to solve the problem.
However, I shall run your proposed solution and MINUS operation and know for myself the performance implication.
Thanks. 
 
March     18, 2005 - 9:57 am UTC 
 
both of the above are SET operations.
MINUS changes the answer.
select * from t1
minus
select * from t2
is *not* the same as
select * from t1 where (col1,col2) not in (select col1, col2 from t2);
first -- there is a col3 (minus would make it impossible to just look at col1 and col2 for differences -- and retrieve col3)
second -- minus implies a DISTINCT. 
 
 
 
another way
Jon, March     18, 2005 - 9:23 am UTC
 
 
Here is another way and actually, it is what I typically do.
select table_a.*
  from table_a 
 where not exists (select null
                     from table_b 
                    where table_a.col1 = table_b.col1
                      and table_a.col2 = table_b.col2)
I have always used the rule of thumb to not use "in" when the subselect has a large number of records.  Is that a bad rule?  My rule is old and probably concluded in the old days of RBO. 
 
March     18, 2005 - 10:09 am UTC 
 
bad rule from the days of the rbo.
not in is easier and will be rewritten by the cbo to not exists when appropriate. 
 
 
 
definition needed ...
Gabe, March     18, 2005 - 11:27 am UTC
 
 
By the time NULLs appear into the picture the question is no longer trivial 
 at least not until equality between pairs is fully defined.
For instance, is (a,b) equal to (c,d) when all a, b, c, d are null? 
 or not? 
 or not defined?
 
 
March     18, 2005 - 11:54 am UTC 
 
null is never equal to null, nor not equal.
so the answer to both is "unknown" of course :) 
 
 
 
hope this is not too much about nothing ... :-)
Gabe, March     18, 2005 - 12:43 pm UTC
 
 
<quote>
null is never equal to null, nor not equal.
</quote>
I know ... I'm saying expanding that observation from singleton to pairs to triplets, etc. is not an absolute thing ... it is a case-by-case definition.
Based on your observation (a,null) compared to (b,null) will always be unknown 
flip@FLOP> set null null
flip@FLOP> col n format a4
flip@FLOP> select a.* from
  2  (
  3  select    1,null n from dual
  4  union
  5  select null,null n from dual
  6  ) a
  7  intersect
  8  select b.* from
  9  (
 10  select    1,null n from dual
 11  union
 12  select null,null n from dual
 13  ) b;
         1 N
---------- ----
         1 null
null       null
In fact your NOT IN query doesnt return the records from table_a which are not in table_b based on that (col1,col2) comparison 
 it returns that _plus_ records from table_a which cannot be compared with records from table_b 
 here Im comparing one table with itself and Im getting records back simply because they cannot be compared:
flip@FLOP> select * from table_b;
      COL1       COL2
---------- ----------
         1 null
null       null
flip@FLOP> select *
  2  from   table_b
  3  where  (col1,col2) NOT IN (select col1, col2
  4                             from   table_b
  5                             where  col1 is not null
  6                             and    col2 is not null
  7                            )
  8  ;
      COL1       COL2
---------- ----------
         1 null
null       null
I just dont see it as clear-cut.
 
 
March     18, 2005 - 1:00 pm UTC 
 
correct, nulls mess this up. 
I made an assumption that col1, col2 in table b are not null 
 
 
 
I usualy do this:
Sergey, March     18, 2005 - 5:15 pm UTC
 
 
select table_a.*
  from table_a, table_b
 where table_a.col1 = table_b.col1(+)
   and table_a.col2 = table_b.col2(+)
   and table_b.rowid IS NULL
/
 
 
March     18, 2005 - 8:11 pm UTC 
 
that is the "do it yourself anti join with extra work" mentioned above. 
 
 
 
NULL invasion
A reader, March     18, 2005 - 10:33 pm UTC
 
 
So, if ALL the columns in question are NULLABLE, the only sure way to ensure we receive non-null rows is to add an additional predicate to the parent query:
select col1, col2
  from   table_b
  where  (col1,col2) NOT IN (select col3, col4
                             from   table_a
                             where  col3 is not null
                             and    col4 is not null
                            )
  and col1 is not null
  and col2 is not null
  ;
I assume this would be the most efficient way to obtain this result set? 
 
March     19, 2005 - 6:54 am UTC 
 
select * 
  from table_a 
 where (col1,col2) NOT IN (select col1, col2 
                             from table_b
                            <where col1 is not null and col2 is not null > )
is all you need. 
x,null
null,x
null,null
is "not" known to be in or not in some set.... so they will not be true for "NOT IN"
ops$tkyte@ORA9IR2> set null ***
ops$tkyte@ORA9IR2> select * from t1;
 
        C1         C2
---------- ----------
         1 ***
***                 1
***        ***
         1          1
         2          2
 
ops$tkyte@ORA9IR2> select * from t2;
 
        C1         C2
---------- ----------
         1          1
 
ops$tkyte@ORA9IR2> select * from t1 where (c1,c2) not in ( select * from t2 );
 
        C1         C2
---------- ----------
         2          2
 
 
 
 
 
I've got to be missing something???
A reader, March     19, 2005 - 1:19 pm UTC
 
 
SQL> set null ****
SQL> drop table table_b; 
drop table table_b 
          * 
ERROR at line 1: 
ORA-00942: table or view does not exist 
SQL> drop table table_a; 
drop table table_a 
          * 
ERROR at line 1: 
ORA-00942: table or view does not exist 
SQL> create table table_b (col1 number, col2 number); 
Table created. 
SQL> create table table_a (col3 number, col4 number); 
Table created. 
SQL> insert into table_b values (1, null); 
1 row created. 
SQL> insert into table_b values (null, null); 
1 row created. 
SQL> insert into table_b values (2, 2); 
1 row created. 
SQL> insert into table_a values (1, null); 
1 row created. 
SQL> insert into table_a values (2, 1); 
1 row created. 
SQL> insert into table_a values (null, null); 
1 row created. 
SQL> select * from table_b; 
     COL1       COL2 
---------- ---------- 
        1  **** 
****       **** 
        2          2 
SQL> select * from table_a; 
     COL3       COL4 
---------- ---------- 
        1  **** 
        2          1 
****       **** 
SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a); 
no rows selected 
SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a 
5                             where col3 is not null 
6                             and col4 is not null); 
     COL1       COL2 
---------- ---------- 
        1  **** 
        2          2 
SQL> c/and/or 
6*                            or col4 is not null) 
SQL> / 
     COL1       COL2 
---------- ---------- 
        2          2 
 
 
 
 
I get the same thing??
A reader, March     25, 2005 - 9:04 am UTC
 
 
I get the same results.  Can you explain this? 
 
 
Any feedback on this?
A reader, March     27, 2005 - 6:24 pm UTC
 
 
Is this a bug? 
 
March     27, 2005 - 6:35 pm UTC 
 
ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a);
 
no rows selected
<b>that is because with the (NULL NULL) entry in table_a, it is "not known" if any of col1 col2 are "not in"
Now, if we run the next subquery:</b>
ops$tkyte@ORA9IR2> select col3, col4
  2  from table_a
  3  where col3 is not null
  4  and col4 is not null;
 
      COL3       COL4
---------- ----------
         2          1
 
ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a
  5                           where col3 is not null
  6                           and col4 is not null);
 
      COL1       COL2
---------- ----------
         1 ****
         2          2
<b>
(1,null) is known to be NOT IN (2,1) since 1,<anything> cannot be in 2,<anything>
continuing on</b>
ops$tkyte@ORA9IR2> select col3, col4
  2  from table_a
  3  where col3 is not null
  4  or col4 is not null;
 
      COL3       COL4
---------- ----------
         1 ****
         2          1
 
ops$tkyte@ORA9IR2> select col1, col2
  2   from table_b
  3   where (col1, col2) not in (select col3, col4
  4                           from table_a
  5                           where col3 is not null
  6                           or col4 is not null);
 
      COL1       COL2
---------- ----------
         2          2
 <b>
Now, 1,<anything> is neither "known" or "not known" to be in (1,NULL), hence 1,null disappears -- with the addition of 1,<null> to the subquery.  However, 2,<anything> is known to NOT be in 1,null (because of the first column) and 2,2 is known to be not in (2,1) so it appears</b>
I see nothing wrong here?  Which rows are causing the problem? 
 
 
 
 
Confused??
A reader, March     27, 2005 - 6:45 pm UTC
 
 
I would have expected the query
SQL> select col1, col2 
2  from table_b 
3  where (col1, col2) not in (select col3, col4 
4                             from table_a); 
no rows selected 
to return:
     COL1       COL2 
---------- ---------- 
        2          2 
Instead of "no rows selected". 
 
 
March     27, 2005 - 6:49 pm UTC 
 
but (2,<ANYTHING>)
is KNOWN to be "not in" (1,<ANYTHING>)
2 <> 1, that is KNOWN.  it need not check the rest of the columns, the "vector" (2,*) is KNOWN to not be in (1,*)
 
 
 
 
Potential User Error To Watch For
Keith, April     21, 2005 - 11:07 am UTC
 
 
There is a slim but still possible chance that the sought after data does not exist in both tables. A few well placed queries to USER_TAB_COLS or clauses in your original query that accomplish the same or equivalent analysis could be helpful.
My question is, if both columns exist in both tables, is there a solid reason why you put yourself in a situation of duplicating data?  Developers here who commit unneccessary data duplication get a head slap and a dose of public humiliation.
Of course, it's entirely possible that a (non-)unique pair in one table would have a similarly (non-)unique counter pair in another and the comparison would serve to limit results.  Your data model may vary. 
 
 
Nvl in null comparison
Pratap Singn, February  01, 2007 - 4:01 pm UTC
 
 
I normally prefer a null comparison by nvl instead of letting the default oracle behavior of unknown. This ensures all combination of (data,null) can be compared to (data,null) as well the distinctness of having (null, null) from one set will be compared to not exists of (null, null) from other set.
-PBS 
February  02, 2007 - 10:12 am UTC 
 
I normally prefer to code what needs to be coded at the point in time we code it.
Meaning - using NVL all of the time just doesn't make sense.  The requirements of your application will dictate whether you need to compare nulls or not.