Skip to Main Content
  • Questions
  • Performance of "not exists" versus outer-join

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Claus.

Asked: September 03, 2015 - 7:52 am UTC

Last updated: October 31, 2017 - 1:47 pm UTC

Version: 11

Viewed 10K+ times! This question is

You Asked

Having relied on AskTom for many years to solve Oracle problems, I hope you can help me with this:

I have an SQL statement which deletes records in a table A where no corresponding record exists in table B. Table B is on a remote DB accessed via database link. Current implementation is:

delete from a where not exists (select 'x' from b@DB_LINK b where b.id = A.id);

This works fine but runs for 5 mins.

I tested rewriting the statement using the outer-join method:

delete from A where id in (select a1.id from A a1, B where a1.id = B.id and B.id IS NULL);

This produces the same result in 300 ms.

Is is possible using hints to get the slow SQL to perform as good as the fast one?

Why can the optimizer not generate the same execution plan?

Thanks in advance
Claus Kjaergaard

and Chris said...

With no exeuction plans, it's hard to say exactly what's happening in your situation.

I've reproduced a similar effect however. Table A and B are a similar size. There's only a few rows in A that aren't in B:

create table a (id not null, junk not null) as 
  select rownum, 'filler' from dual
  connect by level <= 10000;
  
create table b (id not null, junk not null) as 
  select * from a where rownum <= 9100;
  
create index ai on a(id);
create index bi on b(id);


Let's see what the executions show us:

set autotrace trace exp

delete from a where not exists (select 'x' from b@DB_LINK b where b.id = A.id);

900 rows deleted.

Elapsed: 00:00:09.56

Execution Plan
----------------------------------------------------------
Plan hash value: 2162437761

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | DELETE STATEMENT       |      |   455 |  5915 |   293   (0)| 00:00:01 |
|   1 |  DELETE                | A    |       |       |            |          |
|*  2 |   FILTER               |      |       |       |            |          |
|   3 |    INDEX FAST FULL SCAN| AI   |  9100 |   115K|     9   (0)| 00:00:01 |
|   4 |    REMOTE              | B    |    20 |   260 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( NOT EXISTS (SELECT 0 FROM  "A1" WHERE "B"."ID"=:B1))

Remote SQL Information (identified by operation id):
----------------------------------------------------

   4 - SELECT "ID" FROM "B" "A1" WHERE "ID"=:1 (accessing
       'DB_LINK.ODEVANGELIST.ORACLECLOUD.INTERNAL' )

rollback;

delete from A where id in (select a1.id from A a1, B@db_link b where a1.id = B.id (+) and B.id IS NULL);

900 rows deleted.

Elapsed: 00:00:00.61

Execution Plan
----------------------------------------------------------
Plan hash value: 4249804651

--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT          |          |  9100 |   231K|    24   (0)| 00:00:01 |
|   1 |  DELETE                   | A        |       |       |            |          |
|*  2 |   HASH JOIN SEMI          |          |  9100 |   231K|    24   (0)| 00:00:01 |
|   3 |    INDEX FAST FULL SCAN   | AI       |  9100 |   115K|     9   (0)| 00:00:01 |
|   4 |    VIEW                   | VW_NSO_1 |  9100 |   115K|    15   (0)| 00:00:01 |
|*  5 |     FILTER                |          |       |       |            |          |
|*  6 |      HASH JOIN RIGHT OUTER|          |  9100 |   231K|    15   (0)| 00:00:01 |
|   7 |       REMOTE              | B        |  2042 | 26546 |     6   (0)| 00:00:01 |
|   8 |       INDEX FAST FULL SCAN| AI       |  9100 |   115K|     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"="ID")
   5 - filter("B"."ID" IS NULL)
   6 - access("A1"."ID"="B"."ID"(+))

Remote SQL Information (identified by operation id):
----------------------------------------------------

   7 - SELECT "ID" FROM "B" "B" (accessing 'DB_LINK.ODEVANGELIST.ORACLECLOUD.INTERNAL' )


rollback;


10 seconds for not exists verses less than 1 for outer join subquery.

Why is this?

The first version of the query does a full index scan of AI first, then for each row returned executes:

SELECT "ID" FROM "B" "A1" WHERE "ID"=:1


on the remote database (taken from the Remote SQL Information). There's 10,000 rows in A, so this is executed 10,000 times. This is where all the time in my first query goes.

The second version of this query runs the following on the remote database once:

SELECT "ID" FROM "B" "B"


It then full scans the index AI. This only returns a few rows. These are used to lookup the rows in A to delete.

Given that you have a fast version of your query, why do you want to use hints in the first version?

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Hint or query rewrite

Julian Veres, October 30, 2017 - 6:22 pm UTC

Hi there,
I'm facing the very same issue, a slow "NOT IN" with filter and full scans versus a roughly 1000x faster left outer with a hash join.
Semantically I'd like to see the NOT IN in the query, this is what I mean and it's easy to understand. Technically, the left outer join is much faster. This database I'm using is a SE.

I believe that's why the Claus is asking whether there is a hint or a parameter that allows this query to be re-written?
Thanks,
Julian
Chris Saxon
October 31, 2017 - 1:47 pm UTC

There isn't a hint to force anti-hash joins. But the optimizer can rewrite a not in to get the same plan as a outer join:

create table a (id not null, junk not null) as 
  select rownum, 'filler' from dual
  connect by level <= 10000;
  
create table b (id not null, junk not null) as 
  select * from a where rownum <= 9100;
  
create index ai on a(id);
create index bi on b(id);

set serveroutput off

select /*+ gather_plan_statistics */* from a 
where  a.id not in (select b.id from b);

select * from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                                   
SQL_ID  fmxrz93gm1ypz, child number 1                                                               
-------------------------------------                                                               
select /*+ gather_plan_statistics */* from a  where  a.id not in                                    
(select b.id from b)                                                                                
                                                                                                    
Plan hash value: 3901740010                                                                         
                                                                                                    
-------------------------------------------------------------------------------------------------   
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |   
-------------------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT      |      |      1 |        |    900 |00:00:00.05 |      66 |     19 |   
|*  1 |  HASH JOIN RIGHT ANTI |      |      1 |  10000 |    900 |00:00:00.05 |      66 |     19 |   
|   2 |   INDEX FAST FULL SCAN| BI   |      1 |   9100 |   9100 |00:00:00.01 |      24 |     19 |   
|   3 |   TABLE ACCESS FULL   | A    |      1 |  10000 |  10000 |00:00:00.02 |      42 |      0 |   
-------------------------------------------------------------------------------------------------   
                                                                                                    
Predicate Information (identified by operation id):                                                 
---------------------------------------------------                                                 
                                                                                                    
   1 - access("A"."ID"="B"."ID")   

select /*+ gather_plan_statistics */a.* 
from   a , b 
where  a.id = B.id (+) 
and    B.id IS NULL;

select * from table(dbms_xplan.display_cursor(null, null, 'IOSTATS LAST'));

PLAN_TABLE_OUTPUT                                                                          
SQL_ID  asb6jak5unbjs, child number 0                                                      
-------------------------------------                                                      
select /*+ gather_plan_statistics */a.*  from   a , b  where  a.id =                       
B.id (+)  and    B.id IS NULL                                                              
                                                                                           
Plan hash value: 3901740010                                                                
                                                                                           
----------------------------------------------------------------------------------------   
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |   
----------------------------------------------------------------------------------------   
|   0 | SELECT STATEMENT      |      |      1 |        |    900 |00:00:00.07 |      66 |   
|*  1 |  HASH JOIN RIGHT ANTI |      |      1 |  10000 |    900 |00:00:00.07 |      66 |   
|   2 |   INDEX FAST FULL SCAN| BI   |      1 |   9100 |   9100 |00:00:00.01 |      24 |   
|   3 |   TABLE ACCESS FULL   | A    |      1 |  10000 |  10000 |00:00:00.02 |      42 |   
----------------------------------------------------------------------------------------   
                                                                                           
Predicate Information (identified by operation id):                                        
---------------------------------------------------                                        
                                                                                           
   1 - access("A"."ID"="B"."ID")  


So:

What exactly are your queries?
Which version are you on?
What's the DDL for the tables and indexes?
What are the plans for the two queries?

You can read more about possible strategies for hinting these from Jonathan Lewis at:

https://jonathanlewis.wordpress.com/2015/04/13/not-exists/

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.