don't just say "one time I replaced a not exists with outer join and it was better"
Duke Ganote, February 23, 2009 - 12:50 pm UTC
it depends
Sokrates, February 23, 2009 - 1:16 pm UTC
q<"
...
don't just say "one time I replaced a not exists with outer join and it was better" - because all I'll say is "one time I replaced an outer join with a not exists and it was better"
...
">
ok, it all depends, you are right.
I had an issue yesterday with a quite complex view and "not exists" (too complex to post here, nobody will be interested).
replacing "not exists" into an outer join - as I was used to - resulted in a completely other execution plan and brought down execution time from 12 minutes to 25 seconds.
Now I played around, trying to make a simple example - didn't succeed.
However, I saw (it seems to me that):
when query is a bit more complex than your example, execution plan will most likely change between "not exists" and "outer join", and sometimes the one seems to be superior, sometimes the other.
Learnt that I will take "not exists" into account more often, because it is absolutely more easy to read and to understand
February 23, 2009 - 5:46 pm UTC
... and sometimes the one
seems to be superior, sometimes the other.
...
and it will always come down to "one let us guess the estimated card= value better - get the right card= and we get the right plan, get the wrong one - and we don't"
So, that is one reason why we might play with dynamic sampling (level 3 - to fill in the blanks for guessing) or sql profiles before sitting down and rewriting SQL (or extended statistics, statistics on functions/virtual columns, statistics on column pairs and such available in 11g)
OBJECTION to "outer join trick just makes people scratch their heads"
Duke Ganote, February 24, 2009 - 10:48 am UTC
The outer join approach isn't a "trick" at all. I visualize a Venn diagram where:
A - ( A ^ B )
or:
"Set A minus the intersect of A and B"
http://en.wikipedia.org/wiki/Venn_Diagram#A_simple_example In SQL:
select emp.*
from scott.emp A LEFT OUTER JOIN
scott.dept B -- all of set A while identifying ( A ^ B )
ON ( A.deptno = B.deptno )
WHERE B.deptno is null -- MINUS ( A ^ B )
The "NOT EXISTS" approach is logically equivalent.
SELECT A.*
FROM scott.emp A -- all of set A
WHERE NOT EXISTS -- MINUS ( A ^ B )
(SELECT B.deptno
FROM scott.dept B
where B.deptno = A.deptno)
The "NOT IN" approach is simply "A - B".
SELECT A.*
FROM scott.emp A
WHERE A.deptno NOT IN
(SELECT B.deptno
FROM scott.dept B
where B.deptno IS NOT NULL)
February 24, 2009 - 5:05 pm UTC
I'd guess that 90% of 'sql coders' aren't sure at all what an outer join really is actually - many people stick them in "just in case".
select emp.*
from scott.emp A LEFT OUTER JOIN
scott.dept B -- all of set A while identifying ( A ^ B )
ON ( A.deptno = B.deptno )
WHERE B.deptno is null -- MINUS ( A ^ B )
that to me, reading it, says "join emp to dept"
this:
SELECT A.*
FROM scott.emp A -- all of set A
WHERE NOT EXISTS -- MINUS ( A ^ B )
(SELECT B.deptno
FROM scott.dept B
where B.deptno = A.deptno)
says "read emp and give me everything from emp such that there is no match in dept"
The not exists says exactly what you are trying say - what the question is, the outer join - it does not. My mind would not say "in order to get things in emp that are not in dept, I start by joining emp to dept" - it doesn't work that way.
beyond the 90%
Duke Ganote, February 24, 2009 - 5:49 pm UTC
It's "more obvious" to me, because the intersect ( A ^ B ) is an inner join:
select emp.*
from scott.emp A INNER JOIN
scott.dept B -- ( A ^ B )
ON ( A.deptno = B.deptno )
Getting all of A means using the left outer join, then "throwing away" the intersect.
But I've never denied that I was born with a high quota of quirks!
February 24, 2009 - 9:38 pm UTC
well, to me, to outer join (join A to B, even if there isn't a match in B) in order to get all of the A's that are not in B is... counter-intuitive :)
You say "potato", I saw "potahto"
Duke Ganote, February 25, 2009 - 9:00 am UTC
February 25, 2009 - 5:59 pm UTC
well, technically joins are not intersects.
The outer join approach isn't a "trick" at all. I visualize a Venn diagram where:
A - ( A ^ B )
or:
"Set A minus the intersect of A and B"
that would be
select * from a
minus
(select * from a
intersect
select * from b
)
but that results in a different answer from the join :)
while a join is a set operation - technically joins of any sort are not intersect or minus 'capable'.
:)
Chuck Jolley, February 25, 2009 - 1:36 pm UTC
How about writing the query with the not exists so it is easy to read and understand, then if the optimizer cant find the optimal plan for a particular query then give it a hint on the best plan by using, well, hints?
an example
Sokrates, February 26, 2009 - 4:01 am UTC
here is an example.
I find it interesting that optimizer uses different plans (no matter which optimizer_mode I use), why ?
autotrace shows that both plans are equally expensive.
(
real world example was similar, but EMP2 was not a table but a complex view using analytics, group by and joining to some lookup tables.
there, outer join happened to be much cheaper than not exists.
)
REM grant execute on dbms_random to scott;
connect scott
delete emp;
insert into emp(empno, ename, deptno, hiredate)
select
rownum,
nls_initcap(dbms_random.string('u', trunc(dbms_random.value * 6) + 5)),
1 + trunc(99 * dbms_random.value),
trunc(sysdate - 10000 * (dbms_random.value * dbms_random.value))
from dual
connect by level < 10000;
CREATE TABLE EMP2
( EMPNO NUMBER(4,0),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4,0),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2,0),
CONSTRAINT PK_EMP2 PRIMARY KEY (EMPNO)
USING INDEX ,
CONSTRAINT FK_DEPTNO2 FOREIGN KEY (DEPTNO)
REFERENCES DEPT (DEPTNO)
);
insert into emp2(empno, ename, deptno, hiredate)
select
rownum,
nls_initcap(dbms_random.string('u', trunc(dbms_random.value * 6) + 5)),
1 + trunc(99 * dbms_random.value * dbms_random.value * dbms_random.value),
trunc(sysdate - 10000 * (dbms_random.value * dbms_random.value))
from dual
connect by level < 10000;
commit;
exec dbms_stats.gather_table_stats(user, 'EMP', cascade=>true)
exec dbms_stats.gather_table_stats(user, 'EMP2', cascade=>true)
I want to get each emp out of EMP for which there is no emp2 in EMP2 in the same department as emp
which was hired before the first hired emp in this department.
set lines 200 autotr traceonly feedb on
select e.*
from scott.emp e
where not exists
(select null
from scott.emp2 e2
where e2.deptno = e.deptno
and e2.hiredate < e.hiredate
)
/
2048 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3714357713
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9560 | 532K| 10 (10)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 9560 | 532K| 10 (10)| 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP2 | 9999 | 107K| 7 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 105 | 2415 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E2"."DEPTNO"="E"."DEPTNO")
filter("E2"."HIREDATE"<"E"."HIREDATE")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
251 consistent gets
0 physical reads
0 redo size
70699 bytes sent via SQL*Net to client
1834 bytes received via SQL*Net from client
138 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2048 rows processed
select e.*
from scott.emp e, (select deptno, min(hiredate) minhdate from scott.emp2 group by deptno) e2
where e2.deptno(+)=e.deptno
and e2.minhdate(+)<e.hiredate
and e2.deptno is null
/
2048 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2263371308
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9999 | 566K| 11 (19)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | HASH JOIN RIGHT OUTER| | 9999 | 566K| 11 (19)| 00:00:01 |
| 3 | VIEW | | 87 | 1044 | 9 (23)| 00:00:01 |
| 4 | HASH GROUP BY | | 87 | 957 | 9 (23)| 00:00:01 |
| 5 | TABLE ACCESS FULL | EMP2 | 9999 | 107K| 7 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | EMP | 100 | 2300 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E2"."DEPTNO" IS NULL)
2 - access("E2"."DEPTNO"(+)="E"."DEPTNO")
filter("E2"."MINHDATE"(+)<"E"."HIREDATE")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
251 consistent gets
0 physical reads
0 redo size
70699 bytes sent via SQL*Net to client
1834 bytes received via SQL*Net from client
138 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2048 rows processed
March 03, 2009 - 7:26 am UTC
select e.*
from scott.emp e
where not exists
(select null
from scott.emp2 e2
where e2.deptno = e.deptno
and e2.hiredate < e.hiredate
)
/
select e.*
from scott.emp e, (select deptno, min(hiredate) minhdate from scott.emp2 group
by deptno) e2
where e2.deptno(+)=e.deptno
and e2.minhdate(+)<e.hiredate
and e2.deptno is null
/
do you see what you did?
You did a bit of manual optimization - something the optimizer didn't (yet) see.
You know that if
...
where not exists
(select null
from scott.emp2 e2
where e2.deptno = e.deptno
and e2.hiredate < e.hiredate
)
....
is true, then we could just generate a (probably MUCH SMALLER) set of deptno/hiredate values - using group by.
You did it in ONE but not the other.
Hence you did not test "not exists" vs "outer join" but rather "compare to entire set, versus "group by the set down to something very small and then compare"
In short, you did not test the premise you put forth in this case.
(and analytics would have been a better way to phrase this, you wanted the set of employees such that they had the minimum hiredate - not in and anti joins, neither of them "say" that to me)
Considerations for not exists
Ryan, February 26, 2009 - 5:21 pm UTC
I agree with using not exists.
Consider a case where you want to perform more logic than just "One not in the other".
Consider Item, Sale, and SaleDetail:
create table Item
(
ItemId number(6),
ItemDescription varchar(200),
UnitPrice number(18,6)
)
/
Create table sale
(
SaleId number(6),
TransactionDate date
)
/
create table SaleDetail
(
SaleId number(6),
ItemId number(6),
Quantity number(18)
)
insert into item
values (1,'Toyota Prius',18500.95)
/
insert into item
values (2,'Lock Washer', .004)
/
insert into sale
values (1000,'15-Jun-2008')
/
insert into sale
values (1001,'23-Aug-2008')
/
insert into sale
values (1002,'05-Jan-2009')
/
insert into sale
values (1003,'09-Jan-2009')
/
insert into sale
values (1004,'15-Feb-2009')
/
insert into SaleDetail
values (1000, 1, 1)
/
insert into SaleDetail
values (1001, 1, 2)
/
insert into SaleDetail
values (1002, 1, 2)
/
insert into SaleDetail
values (1003, 2, 1000)
/
insert into SaleDetail
values (1004, 2, 1900)
/
Now, I want to know which items from the item table have yet to be sold in 2009 in a quantity greater than 100 (what have i not yet sold in bulk this year?).
Two alternatives using outer join logic (both nasty to read and comprehend):
select i.*
from item i
left outer join saleDetail sd
ON i.itemId = sd.ItemId
AND sd.quantity > 100
Left outer join sale s
ON sd.saleId = s.saleid
and trunc(s.transactionDate,'YYYY') = to_date('01-JAN-2009','DD-MON-YYYY')
WHERE sd.itemId is null
or
select i.*
from item i
left outer join
(select sd_i.*
from saleDetail sd_i
JOIN sale s
on s.saleid = sd_i.saleid
where trunc(s.transactionDate,'YYYY') = to_date('01-JAN-2009','DD-MON-YYYY')) sd
ON i.itemId = sd.ItemId
AND sd.quantity > 100
WHERE sd.itemId is null
Now again, using the not exists (much easier to read and much easier to test what HAS been sold in bulk in 2009 by removing the inner to outer join on itemId):
select *
from item i
where not exists (select 'x'
from saleDetail sd
JOIN sale s
ON sd.saleId = s.saleId
where sd.itemId =i.itemId
and quantity > 100
and trunc(s.transactionDate,'YYYY') =
to_date('01-JAN-2009','DD-MON-YYYY')
)
Many developers would even try moving the where-like and conditions in the outer join syntax into the body of the where clause, thinking that it was just misplaced initially. Once it's in the where, you've created an inner join dataset and no inner join would have the "non-inner-join anti join" condition.
I'm pro anti-join
Duke Ganote, February 27, 2009 - 12:28 pm UTC
After standardizing the subquery in Ryan's example, I now realize a couple of reasons I prefer the LEFT JOIN version:
1) ANSI-standard join syntax
2) reinforcement of the idea that WHERE filters are applied
after the join.
WITH
saleSET AS
(SELECT sd.itemID
FROM saleDetail sd
JOIN sale s
ON sd.saleId = s.saleId
AND sd.quantity > 100
AND s.transactionDate BETWEEN TO_DATE('20090101','YYYYMMDD')
AND TO_DATE('20091231','YYYYMMDD')
)
SELECT count(*)
FROM item i
LEFT JOIN saleSET sd
ON ( sd.itemId =i.itemId )
WHERE sd.itemID IS NULL
In the NOT EXISTS version I have to go searching for the join condition in the WHERE clause and find the correlated reference.
I don't like commingled join and filter conditions, which are required for a correlated subquery.
WITH
saleSET AS
(SELECT sd.itemID
FROM saleDetail sd
JOIN sale s
ON sd.saleId = s.saleId
AND sd.quantity > 100
AND s.transactionDate BETWEEN TO_DATE('20090101','YYYYMMDD')
AND TO_DATE('20091231','YYYYMMDD')
)
SELECT count(*)
FROM item i
WHERE NOT EXISTS ( SELECT itemID
FROM saleSET sd
WHERE sd.itemId =i.itemId )
http://en.wikipedia.org/wiki/Relational_algebra#Antijoin
March 03, 2009 - 9:51 am UTC
... 2) reinforcement of the idea that WHERE filters are applied after the join. ...
that is only in concept. It doesn't have to be that way
"doesn't have to be that way..."
Duke Ganote, March 03, 2009 - 10:16 am UTC
March 03, 2009 - 5:01 pm UTC
yup, so - to say the where happens after the join - isn't really technically accurate, conceptually you can think of it that way.
I say this, because lots of people think:
select *
from (select * from generic_table where datatype = 'NUMBER' )
where to_number( value ) > 42;
would "force" the where datatype = 'NUMBER' to be evaluated first, so that the to_number(value) would be safe (generic model, store number, dates and strings in value, use datatype to figure out what it really is at runtime).
But, it doesn't.
Chuck, March 03, 2009 - 4:13 pm UTC
Possibly this demonstrates the stoneage dating of my database theory classes, but isn't one of the primary rules of SQL the proposition that there is no order of operation?
In other words, a SQL query is a single statement.
This implies that all access orders for a query must produce equivalent results.
Otherwise SQL would not be a logically consistent language.
March 03, 2009 - 9:21 pm UTC
SQL is a non-procedural language, yes.
It is like a complex definition of a program - that is then machine coded. Think of SQL like a specification - you are saying what needs be done, not how to do it so much.
Order of evaluation
Duke Ganote, March 04, 2009 - 1:15 am UTC
There must be an order of evaluation within an SQL statement. For example "Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause."
http://download-west.oracle.com/docs/html/A85397_01/function.htm#81409
March 04, 2009 - 1:05 pm UTC
Certain things sometimes impose ordering, but sql is non-procedural. We do view merging, query rewrites, predicate pushing - all kinds of things.
Don't think a parenthesis means "this will happen first"
Specifically, the example I gave:
select *
from (select * from generic_table where datatype = 'NUMBER' )
where to_number( value ) > 42;
you do not know if datatype='number' or to_number(value) will happen FIRST.
Specifically, the example YOU gave - you cannot say the where will happen after the join, it doesn't have to.
RE: "you cannot say the where will happen after the join"
Duke Ganote, March 04, 2009 - 5:06 pm UTC
You wrote: "the example YOU gave - you cannot say the where will happen after the join, it doesn't have to."
Sorry, I must not grasp which example you're referencing.
Surely you can't mean a LEFT JOIN example (like the one below) when you write that.
select DISTINCT e.deptno
From emp e left join dept d
ON ( e.deptno = d.deptno )
where d.deptno is null
/
no rows selected
It's a vastly different answer if the WHERE filter is applied first by making it an ON filter:
select DISTINCT e.deptno
From emp e left join dept d
ON ( e.deptno = d.deptno
AND d.deptno is null )
/
DEPTNO
------
10
20
30
March 04, 2009 - 8:34 pm UTC
they can happen simultaneously in a way, think about
select DISTINCT e.deptno
From emp e left join dept d
ON ( e.deptno = d.deptno )
where d.deptno is null or d.deptno = 10
/
could it apply a bit of the where clause first? Sure it could. (by your example, I meant - "it shows the outer join is done before the where).
Chuck, March 04, 2009 - 9:47 pm UTC
For once I think Tom read your question too quickly.
Your last two examples aren't logically the same query.
The point is substituting and outer join with an outer not null for a "not exists"
March 05, 2009 - 1:23 pm UTC
I'm not sure what you mean - whose question?
RE: "can happen simultaneously in a way"
Duke Ganote, March 05, 2009 - 2:10 pm UTC
I think you're making two observations:
1) There are useless outer joins that the optimizer *could* avoid, e.g.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:4963137609733 where you wrote:
"If you see a query like this:
from t1, t2
where t1.x = t2.x(+)
and t2.any_other_column <some condition>
"you know for a fact that outer join is a waste, a performance drain, something to be removed from the query."
I think this example actually supports my idea of using the ANSI OUTER JOIN approach, in part as a conceptual reinforcement that *logically* WHERE occurs after JOIN.
2) unexpected ORA-01722. The exception that "proves (tests) the rule".
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:45012348053 You note that the optimizer's chosen path between starting point "A" and requested endpoint "B" may go straight through a minefield... even though the "as written" query said "A -> C -> B" in order to avoid the minefield. (And that's where one could use ROWNUM or MATERIALIZE or the like to tell the optimizer "no *really*, A -> C -> B".)
But I think that issue of "suboptimal (fatal) optimization" could arise even with the NOT EXISTS form of the question, and is not peculiar to the OUTER JOIN form.
March 05, 2009 - 3:02 pm UTC
1) Oh, don't get me wrong, if I wanted to do an outer join, the ansi syntax does have definite semantic advantages (notice how I slipped in there "if i wanted to do a join :)")
2) agreed
It's not a join...
Duke Ganote, March 12, 2009 - 9:37 pm UTC
There's just no "anti join" syntax, say, like this!
SELECT count(*)
FROM item i
ANTI JOIN saleSET sd
ON ( sd.itemId =i.itemId )
;
March 13, 2009 - 7:25 am UTC
sure there is....
where not in
where not exists
:)
You don't want an "anti join" (not any more than you 'want' a sort merge or a nested loop or a hash join, you want a JOIN, the mechanics - that is up to the optimizer.
You don't want a join at all, you want a set of data such that it is NOT IN some other set... To me, using a join to perform that, is semantically "awkward"
but I do!
Duke Ganote, March 13, 2009 - 12:11 pm UTC
March 13, 2009 - 2:58 pm UTC
I do not believe "anti-join" was one of the original "we are joins". It is an optimizer technique, it ISN"T something we code. There is no syntax for an 'anti-join' really, there isn't a "anti-join" like the others (natural join, outer join, etc)
We code "give me this", the optimizer says "I'll do a nested loops join, I'll do a sort merge, I'll do a hash, I'll do an anti-join".
There isn't anything like an "anti-join" in the language. There is "give me everything in this SET minus that SET (not an anti-join, a minus)", "give me everything in set SET not in that other SET (not an anti-join, a not in - SOMETIMES equivalent to not exists)" and so on.
I will always code "give me everything in this NOT IN that" as a not in/not exists, to me - it just makes much more semantic sense than "let us outer join AND THEN throw out the rows that match", that is all...
In the fullness of....
Duke Ganote, March 13, 2009 - 10:23 pm UTC
There should be a full anti join!
SELECT count(*)
FROM item i
FULL ANTI JOIN saleSET sd
ON ( sd.itemId =i.itemId )
as the equivalent of...
SELECT count(*)
FROM item i
FULL OUTER JOIN saleSET sd
ON ( sd.itemId =i.itemId )
WHERE sd.itemID IS NULL
OR i.itemID IS NULL
:)
March 16, 2009 - 8:59 am UTC
there is!
select count(*) from item where not exists! :)
Anti-Join
Manuel Vidigal, March 31, 2009 - 1:26 pm UTC
Hi Tom,
I'm having a different explain plan in Dev and Prod (both are version 10.2.0) for this query:
SELECT *
FROM swiadm.arruamento_tmp t
WHERE NOT EXISTS(SELECT NULL
FROM swiadm.arruamento a
WHERE a.ID = t.ID);
Dev:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 120 | | 3492 (1)| 00:00:42 |
|* 1 | HASH JOIN RIGHT ANTI | | 1 | 120 | 7008K| 3492 (1)| 00:00:42 |
| 2 | INDEX FAST FULL SCAN| PK_ARRUAMENTO | 422K| 2060K| | 241 (2)| 00:00:03 |
| 3 | TABLE ACCESS FULL | ARRUAMENTO_TMP | 282K| 30M| | 1210 (2)| 00:00:15 |
------------------------------------------------------------------------------------------------
Prod:
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 282K| 21M| 543K (1)| 01:48:47 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| ARRUAMENTO_TMP | 282K| 21M| 1193 (2)| 00:00:15 |
|* 3 | INDEX UNIQUE SCAN| PK_ARRUAMENTO | 1 | 4 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Do you have any idea of why the optimizer in prod doesn't use the anti-join?
Thanks in advance.
April 01, 2009 - 7:31 am UTC
what say the statistics? Query them up for that arruamento table and compare them.
Also, what parameters are different?
select name, value from v$parameter where isdefault = 'FALSE';
compare them and post the differences.
RE: Anti-Join
Manuel Vidigal, April 01, 2009 - 10:59 am UTC
Hi Tom,
I'm not able to get the statistics in prod. It's an endless query.
Hi found out that if I use the UNNEST hint, I can force the optimizer to use the anti-join and the query ends in less than 1 minute:
SELECT *
FROM swiadm.arruamento_tmp t
WHERE NOT EXISTS(SELECT /*+ UNNEST */NULL
FROM swiadm.arruamento a
WHERE a.ID = t.ID);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3363 Card=1 Bytes=123)
1 0 HASH JOIN (RIGHT ANTI) (Cost=3363 Card=1 Bytes=123)
2 1 INDEX (FAST FULL SCAN) OF 'PK_ARRUAMENTO' (INDEX (UNIQUE))
(Cost=184 Card=306128 Bytes=1530640)
3 1 TABLE ACCESS (FULL) OF 'ARRUAMENTO_TMP' (TABLE) (Cost=11
91 Card=282144 Bytes=33292992)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6903 consistent gets
6179 physical reads
0 redo size
5062279 bytes sent via SQL*Net to client
7489 bytes received via SQL*Net from client
644 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
64300 rows processed
I also checked the Parameter and this was what I found:
Different ParametersNAME PROD VALUE DEV VALUE
filesystemio_options DIRECTIO none (is default)
java_max_sessionspace_size 120000000 128000000
java_pool_size 134217728 130023424
large_pool_size 134217728 96468992
open_cursors 1000 600
optimizer_secure_view_merging FALSE TRUE (is default)
pga_aggregate_target 838860800 512000000
processes 500 200
shared_pool_size 939524096 130023424
streams_pool_size 100663296 0 (is default)
_optimizer_cost_based_transformation OFF Doesn't exist in v$parameter
Does something pop up in your mind?
April 01, 2009 - 4:20 pm UTC
yes you can get the statistics - if you can access the table, you can get the statistics out of the dictionary??
yeah, that underscore parameter. bingo.
Manuel Vidigal, April 01, 2009 - 11:17 am UTC
Hi think I've found the awnser.
Using ALTER SESSION SET "_optimizer_cost_based_transformation" = ON ;
The explain plan started using the anti-join.
Do you know of any reason why they have this parameter set to off?
Thanks,
Manuel
April 01, 2009 - 4:26 pm UTC
you should really consult the person that set it on - to find out why they did so.
Thanks Tom
A reader, April 01, 2009 - 6:15 pm UTC
Thank you for your time.
Not Exists and a join
Maria Antony Samy Issac, August 01, 2011 - 8:21 am UTC
Hi Tom,
I got below given tables.
TABLE_A, having over 1 million records - got an index on the column TABLE_A.ID.
TABLE_B, having over 1 million records - got an index on the column TABLE_B.ID.
TABLE_C, having about 2000 records - got an index on the column TABLE_C.ID.
TABLE_B contains the IDs that are valid one. So, to validate the records of TABLE_A,
we use below given join.
SELECT A.ID
FROM TABLE_A A,
TABLE_B B
WHERE
AND A.ID = B.ID
TABLE_C contains the IDs that are not valid. So, the above query to validate the records of TABLE_A can be written as
SELECT A.ID
FROM TABLE_A A
WHERE NOT EXISTS ( SELECT 1 FROM TABLE_C C
WHERE A.ID = C.ID
)
I said to my friend that the second approach is better over the first. But he thinks the other way. Please suggest.
Note: The table TABLE_A and TABLE_B will keep growing as we get more customers. TABLE_C will also keep growing, but in the same ratio as it is now ( i.e about 2000 for every million records in TABLE_A)
August 01, 2011 - 12:03 pm UTC
depends on the plans.
I'd lean towards:
select * from table_a a where a.id not in (select c.id from table_c c where c.id is not null);
and I'd be looking for two full scans and a hash anti-join.
Not Exists
A reader, September 07, 2011 - 2:03 pm UTC
hi Tom,
I have a situation where I have to join multiple tables, say 6 tables which are around 1GB in size with 1 million of records.
Now I have to join three tables in normal way with their keys. With one table I have to check the sum of a column based on ID > 0 in that. And I have two big tables with which I have to check not exists.
In this example table_1 can be considered as driving table with col_1 being a primary key.
SELECT t1.col1, t2.col2, t3.col_2
from table_1 t1, table_2 t2, table_3 t3
where t1.col_1 = t2.col2
and t2.col_3=t3.col_1
and (select sum(abs(t4.col_2))from table_4 t4 where t4.col_1=t1.col_1) > 0
and not exists(select null from table_5 t5 where t5.col_1=t1.col_1 and t5.col_2 not in (100,200))
and not exists(select null from table_6 t6 where t6.col_1=t1.col_1);
Now my not NOT EXIST check takes a fare bit of time. May be because I also join with such a table where I do SUM operation.
Is there something which works faster than NOT EXISTS? How can I reframe the query to be a good faster one? Any use of analytics may help?
Apologies for not providing the script as these are huge tables.
thanks for your time.
September 08, 2011 - 5:04 pm UTC
I would not be using scalar subqueries at all.
with t4
as
(select sum(abs(t4.col_2)) sum_col_2, col_1
from table_4 t4
group by col_1 ),
t5
as
(select col_1
from table_5
where col_2 not in (100,200)
and col_1 is NOT NULL),
t6
as
(select col_1
from table_6
where col_1 IS NOT NULL)
SELECT t1.col1, t2.col2, t3.col_2
from table_1 t1, table_2 t2, table_3 t3, t4
where t1.col_1 = t2.col2
and t2.col_3=t3.col_1
and t4.col_1 = t1.col_1 and t4.sum_col_2 > 0
and t1.col_1 NOT IN (select * from t5)
and t1.col_1 NOT IN (select * from t6)
/
You'd be looking to join to t4 after aggregating it up, you'd be looking for ANTI-JOINS to t5 and t6 (if you don't see anti-join in the plan, use:
with t4
as
(select sum(abs(t4.col_2)) sum_col_2, col_1
from table_4 t4
group by col_1 ),
t5
as
(select /* add distinct if needed */ distint col_1
from table_5
where col_2 not in (100,200)
and col_1 is NOT NULL),
t6
as
(select /* add distinct if needed */ distinct col_1
from table_6
where col_1 IS NOT NULL)
SELECT t1.col1, t2.col2, t3.col_2
from table_1 t1, table_2 t2, table_3 t3, t4, t5, t6
where t1.col_1 = t2.col2
and t2.col_3=t3.col_1
and t4.col_1 = t1.col_1 and t4.sum_col_2 > 0
and t5.col_1(+) = t1.col_1 and t5.col_1 is null
and t6.col_1 = t1.col_1 and t6.col_1 is null
/
I hope I got the syntax right - and the semantics right - but without a TEST CASE it is really really hard to write sql you know....