NOT NULL
A reader, March 18, 2005 - 7:39 am UTC
The statement
"the <where is not null> would be useful only if col1 and col2 are NOT NULL in the table"
is this meant to read as "are NULL in the table" since NOT IN doesn't account for NULL's? If it's probable that col1 & col2 contain NULL's, we need to include the IS NOT NULL predicate?
March 18, 2005 - 9:55 am UTC
sorry -- would be useful ONLY IF col1 and col2 are NULLABLE (i said it exactly backwards -- I corrected the text above!!)
Why not use SET operation ?
Jeeves, March 18, 2005 - 7:53 am UTC
Tom,
You are a propenant of SET operation, I was wondering why MINUS was not used to solve the problem.
However, I shall run your proposed solution and MINUS operation and know for myself the performance implication.
Thanks.
March 18, 2005 - 9:57 am UTC
both of the above are SET operations.
MINUS changes the answer.
select * from t1
minus
select * from t2
is *not* the same as
select * from t1 where (col1,col2) not in (select col1, col2 from t2);
first -- there is a col3 (minus would make it impossible to just look at col1 and col2 for differences -- and retrieve col3)
second -- minus implies a DISTINCT.
another way
Jon, March 18, 2005 - 9:23 am UTC
Here is another way and actually, it is what I typically do.
select table_a.*
from table_a
where not exists (select null
from table_b
where table_a.col1 = table_b.col1
and table_a.col2 = table_b.col2)
I have always used the rule of thumb to not use "in" when the subselect has a large number of records. Is that a bad rule? My rule is old and probably concluded in the old days of RBO.
March 18, 2005 - 10:09 am UTC
bad rule from the days of the rbo.
not in is easier and will be rewritten by the cbo to not exists when appropriate.
definition needed ...
Gabe, March 18, 2005 - 11:27 am UTC
By the time NULLs appear into the picture the question is no longer trivial
at least not until equality between pairs is fully defined.
For instance, is (a,b) equal to (c,d) when all a, b, c, d are null?
or not?
or not defined?
March 18, 2005 - 11:54 am UTC
null is never equal to null, nor not equal.
so the answer to both is "unknown" of course :)
hope this is not too much about nothing ... :-)
Gabe, March 18, 2005 - 12:43 pm UTC
<quote>
null is never equal to null, nor not equal.
</quote>
I know ... I'm saying expanding that observation from singleton to pairs to triplets, etc. is not an absolute thing ... it is a case-by-case definition.
Based on your observation (a,null) compared to (b,null) will always be unknown
flip@FLOP> set null null
flip@FLOP> col n format a4
flip@FLOP> select a.* from
2 (
3 select 1,null n from dual
4 union
5 select null,null n from dual
6 ) a
7 intersect
8 select b.* from
9 (
10 select 1,null n from dual
11 union
12 select null,null n from dual
13 ) b;
1 N
---------- ----
1 null
null null
In fact your NOT IN query doesnt return the records from table_a which are not in table_b based on that (col1,col2) comparison
it returns that _plus_ records from table_a which cannot be compared with records from table_b
here Im comparing one table with itself and Im getting records back simply because they cannot be compared:
flip@FLOP> select * from table_b;
COL1 COL2
---------- ----------
1 null
null null
flip@FLOP> select *
2 from table_b
3 where (col1,col2) NOT IN (select col1, col2
4 from table_b
5 where col1 is not null
6 and col2 is not null
7 )
8 ;
COL1 COL2
---------- ----------
1 null
null null
I just dont see it as clear-cut.
March 18, 2005 - 1:00 pm UTC
correct, nulls mess this up.
I made an assumption that col1, col2 in table b are not null
I usualy do this:
Sergey, March 18, 2005 - 5:15 pm UTC
select table_a.*
from table_a, table_b
where table_a.col1 = table_b.col1(+)
and table_a.col2 = table_b.col2(+)
and table_b.rowid IS NULL
/
March 18, 2005 - 8:11 pm UTC
that is the "do it yourself anti join with extra work" mentioned above.
NULL invasion
A reader, March 18, 2005 - 10:33 pm UTC
So, if ALL the columns in question are NULLABLE, the only sure way to ensure we receive non-null rows is to add an additional predicate to the parent query:
select col1, col2
from table_b
where (col1,col2) NOT IN (select col3, col4
from table_a
where col3 is not null
and col4 is not null
)
and col1 is not null
and col2 is not null
;
I assume this would be the most efficient way to obtain this result set?
March 19, 2005 - 6:54 am UTC
select *
from table_a
where (col1,col2) NOT IN (select col1, col2
from table_b
<where col1 is not null and col2 is not null > )
is all you need.
x,null
null,x
null,null
is "not" known to be in or not in some set.... so they will not be true for "NOT IN"
ops$tkyte@ORA9IR2> set null ***
ops$tkyte@ORA9IR2> select * from t1;
C1 C2
---------- ----------
1 ***
*** 1
*** ***
1 1
2 2
ops$tkyte@ORA9IR2> select * from t2;
C1 C2
---------- ----------
1 1
ops$tkyte@ORA9IR2> select * from t1 where (c1,c2) not in ( select * from t2 );
C1 C2
---------- ----------
2 2
I've got to be missing something???
A reader, March 19, 2005 - 1:19 pm UTC
SQL> set null ****
SQL> drop table table_b;
drop table table_b
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> drop table table_a;
drop table table_a
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> create table table_b (col1 number, col2 number);
Table created.
SQL> create table table_a (col3 number, col4 number);
Table created.
SQL> insert into table_b values (1, null);
1 row created.
SQL> insert into table_b values (null, null);
1 row created.
SQL> insert into table_b values (2, 2);
1 row created.
SQL> insert into table_a values (1, null);
1 row created.
SQL> insert into table_a values (2, 1);
1 row created.
SQL> insert into table_a values (null, null);
1 row created.
SQL> select * from table_b;
COL1 COL2
---------- ----------
1 ****
**** ****
2 2
SQL> select * from table_a;
COL3 COL4
---------- ----------
1 ****
2 1
**** ****
SQL> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a);
no rows selected
SQL> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a
5 where col3 is not null
6 and col4 is not null);
COL1 COL2
---------- ----------
1 ****
2 2
SQL> c/and/or
6* or col4 is not null)
SQL> /
COL1 COL2
---------- ----------
2 2
I get the same thing??
A reader, March 25, 2005 - 9:04 am UTC
I get the same results. Can you explain this?
Any feedback on this?
A reader, March 27, 2005 - 6:24 pm UTC
Is this a bug?
March 27, 2005 - 6:35 pm UTC
ops$tkyte@ORA9IR2> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a);
no rows selected
<b>that is because with the (NULL NULL) entry in table_a, it is "not known" if any of col1 col2 are "not in"
Now, if we run the next subquery:</b>
ops$tkyte@ORA9IR2> select col3, col4
2 from table_a
3 where col3 is not null
4 and col4 is not null;
COL3 COL4
---------- ----------
2 1
ops$tkyte@ORA9IR2> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a
5 where col3 is not null
6 and col4 is not null);
COL1 COL2
---------- ----------
1 ****
2 2
<b>
(1,null) is known to be NOT IN (2,1) since 1,<anything> cannot be in 2,<anything>
continuing on</b>
ops$tkyte@ORA9IR2> select col3, col4
2 from table_a
3 where col3 is not null
4 or col4 is not null;
COL3 COL4
---------- ----------
1 ****
2 1
ops$tkyte@ORA9IR2> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a
5 where col3 is not null
6 or col4 is not null);
COL1 COL2
---------- ----------
2 2
<b>
Now, 1,<anything> is neither "known" or "not known" to be in (1,NULL), hence 1,null disappears -- with the addition of 1,<null> to the subquery. However, 2,<anything> is known to NOT be in 1,null (because of the first column) and 2,2 is known to be not in (2,1) so it appears</b>
I see nothing wrong here? Which rows are causing the problem?
Confused??
A reader, March 27, 2005 - 6:45 pm UTC
I would have expected the query
SQL> select col1, col2
2 from table_b
3 where (col1, col2) not in (select col3, col4
4 from table_a);
no rows selected
to return:
COL1 COL2
---------- ----------
2 2
Instead of "no rows selected".
March 27, 2005 - 6:49 pm UTC
but (2,<ANYTHING>)
is KNOWN to be "not in" (1,<ANYTHING>)
2 <> 1, that is KNOWN. it need not check the rest of the columns, the "vector" (2,*) is KNOWN to not be in (1,*)
Potential User Error To Watch For
Keith, April 21, 2005 - 11:07 am UTC
There is a slim but still possible chance that the sought after data does not exist in both tables. A few well placed queries to USER_TAB_COLS or clauses in your original query that accomplish the same or equivalent analysis could be helpful.
My question is, if both columns exist in both tables, is there a solid reason why you put yourself in a situation of duplicating data? Developers here who commit unneccessary data duplication get a head slap and a dose of public humiliation.
Of course, it's entirely possible that a (non-)unique pair in one table would have a similarly (non-)unique counter pair in another and the comparison would serve to limit results. Your data model may vary.
Nvl in null comparison
Pratap Singn, February 01, 2007 - 4:01 pm UTC
I normally prefer a null comparison by nvl instead of letting the default oracle behavior of unknown. This ensures all combination of (data,null) can be compared to (data,null) as well the distinctness of having (null, null) from one set will be compared to not exists of (null, null) from other set.
-PBS
February 02, 2007 - 10:12 am UTC
I normally prefer to code what needs to be coded at the point in time we code it.
Meaning - using NVL all of the time just doesn't make sense. The requirements of your application will dictate whether you need to compare nulls or not.