Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Elwyn.

Asked: November 07, 2018 - 10:41 am UTC

Last updated: November 07, 2018 - 4:50 pm UTC

Version: 11.2.0.4.0

Viewed 1000+ times

You Asked

I am trying to join (outer join) a number of IOT's all with the same key in Parallel Mode; and am expecting the optimizer to use a Merge Join in order to avoid sorts (as the documentation clearly says it should); but it does not. - I'm tracing the problem and found that the IOT's which have been created with default Degree of Parallelism, the IOT Indexes themselves are only Single, i.e. No Parallelism. - Using the NO_PARALLEL hint is interesting, but it still avoids Merge Joins. I get the same results with 12.2.0.1.0 - My question are:-

1.) How can I ensure that the IOT Primary Indexes are in Parallel Mode?
2.) Why does the CBO still avoid the Merge Join, when the sources are pre-sorted?

Thanks, Elwyn

CREATE TABLE TEST1 (
  MyKey INTEGER,
  MyData VARCHAR2(20),
  CONSTRAINT X1 PRIMARY KEY (MyKey) 
) ORGANIZATION INDEX PARALLEL NOLOGGING;
/
CREATE TABLE TEST2 (
  MyKey INTEGER,
  MyData VARCHAR2(20),
  CONSTRAINT X2 PRIMARY KEY (MyKey) 
) ORGANIZATION INDEX PARALLEL NOLOGGING;
/
select INDEX_NAME, DEGREE, INSTANCES from USER_INDEXES where INDEX_NAME like 'X%';
/
select /*+ NO_PARALLEL USE_MERGE(t1 t2) */ *
  from TEST1 t1, TEST2 t2
 where t1.MyKey=t2.MyKey(+);
/

and Chris said...

I feel like you're approaching this backwards. It's not the case that joining two IOTs guarantees a merge join. More that reading data from an index means the database doesn't have to sort.

As the docs say:

Because of sorts required by other operations, the optimizer finds it cheaper to use a sort merge.

If an index exists, then the database can avoid sorting the first data set. However, the database always sorts the second data set, regardless of indexes.


https://docs.oracle.com/en/database/oracle/oracle-database/18/tgsql/joins.html#GUID-3F935776-FE28-4350-9FA4-E6B47489156E

Given your example, the database chooses a nested loop because it has a lower cost:

CREATE TABLE TEST1 (
  MyKey INTEGER,
  MyData VARCHAR2(20),
  CONSTRAINT X1 PRIMARY KEY (MyKey) 
) ORGANIZATION INDEX PARALLEL NOLOGGING;

CREATE TABLE TEST2 (
  MyKey INTEGER,
  MyData VARCHAR2(20),
  CONSTRAINT X2 PRIMARY KEY (MyKey) 
) ORGANIZATION INDEX PARALLEL NOLOGGING;

set serveroutput off

select /*+ USE_MERGE(t1 t2) */ *
  from TEST1 t1, TEST2 t2
 where t1.MyKey=t2.MyKey(+);
 
select * 
from   table(dbms_xplan.display_cursor(null, null, 'BASIC +COST LAST'));

PLAN_TABLE_OUTPUT                                                    
EXPLAINED SQL STATEMENT:                                             
------------------------                                             
select /*+ USE_MERGE(t1 t2) */ *   from TEST1 t1, TEST2 t2  where    
t1.MyKey=t2.MyKey(+)                                                 
                                                                     
Plan hash value: 786864782                                           
                                                                     
----------------------------------------------------                 
| Id  | Operation              | Name | Cost (%CPU)|                 
----------------------------------------------------                 
|   0 | SELECT STATEMENT       |      |     5 (100)|                 
|   1 |  MERGE JOIN OUTER      |      |     5  (20)|                 
|   2 |   SORT JOIN            |      |     2   (0)|                 
|   3 |    INDEX FULL SCAN     | X1   |     2   (0)|                 
|   4 |   SORT JOIN            |      |     3  (34)|                 
|   5 |    INDEX FAST FULL SCAN| X2   |     2   (0)|                 
---------------------------------------------------- 

select /*+ NO_PARALLEL */ *
  from TEST1 t1, TEST2 t2
 where t1.MyKey=t2.MyKey(+);
 
select * 
from   table(dbms_xplan.display_cursor(null, null, 'BASIC +COST LAST'));

PLAN_TABLE_OUTPUT                                               
EXPLAINED SQL STATEMENT:                                        
------------------------                                        
select /*+ NO_PARALLEL */ *   from TEST1 t1, TEST2 t2  where    
t1.MyKey=t2.MyKey(+)                                            
                                                                
Plan hash value: 2370096979                                     
                                                                
---------------------------------------------------             
| Id  | Operation             | Name | Cost (%CPU)|             
---------------------------------------------------             
|   0 | SELECT STATEMENT      |      |     2 (100)|             
|   1 |  NESTED LOOPS OUTER   |      |     2   (0)|             
|   2 |   INDEX FAST FULL SCAN| X1   |     2   (0)|             
|   3 |   INDEX UNIQUE SCAN   | X2   |     0   (0)|             
--------------------------------------------------- 


So the real question is:

Why do you need a merge join? What's the problem with the nested loops/hash join you currently have? And how does a merge join "fix" this?

On the parallel point: if you want to ensure the query runs in parallel, add the parallel hint. If you've enabled Automatic Degree of Parallelism (Auto DOP) the database will determine whether or not to use parallel execution.

Read more about this at:

https://blogs.oracle.com/datawarehousing/what-is-auto-dop

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.