Tom,
I was reading about Join Factorization from Optimizer blog. And ended up with the below scenario where Join Factorization doesn't takes happen.
https://blogs.oracle.com/optimizer/entry/optimizer_transformations_join_factorization rajesh@ORA11GR2> create table t1(x, pad) as
2 select rownum, rownum
3 from dual
4 connect by level <= 10;
Table created.
rajesh@ORA11GR2> create table t2(y, pad) as
2 select rownum, rownum
3 from dual
4 connect by level <= 10;
Table created.
rajesh@ORA11GR2> create table t3(z, pad) as
2 select rownum, rownum
3 from dual
4 connect by level <= 10;
Table created.
rajesh@ORA11GR2> begin
2 dbms_stats.gather_table_stats(user,'T1');
3 dbms_stats.gather_table_stats(user,'T2');
4 dbms_stats.gather_table_stats(user,'T3');
5 end;
6 /
PL/SQL procedure successfully completed.
rajesh@ORA11GR2>
rajesh@ORA11GR2>
rajesh@ORA11GR2> set autotrace traceonly explain
rajesh@ORA11GR2>
rajesh@ORA11GR2> select * from t1 join t2 on (t1.x = t2.y)
2 union all
3 select * from t1 join t3 on (t1.x = t3.z);
Execution Plan
----------------------------------------------------------
Plan hash value: 235069230
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 240 | 13 (54)| 00:00:01 |
| 1 | UNION-ALL | | | | | |
|* 2 | HASH JOIN | | 10 | 120 | 7 (15)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T1 | 10 | 60 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 10 | 60 | 3 (0)| 00:00:01 |
|* 5 | HASH JOIN | | 10 | 120 | 7 (15)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T1 | 10 | 60 | 3 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL| T3 | 10 | 60 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."X"="T2"."Y")
5 - access("T1"."X"="T3"."Z")
rajesh@ORA11GR2>
Join factorization doesn't take place, Table 'T1' got scanned twice. I was expecting the above query to be transformed like this.
rajesh@ORA11GR2> select t1.x, v.pad
2 from t1, ( select y c1,pad from t2
3 union all
4 select z , pad from t3 ) v
5 where t1.x = v.c1 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 152944573
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 580 | 10 (10)| 00:00:01 |
|* 1 | HASH JOIN | | 20 | 580 | 10 (10)| 00:00:01 |
| 2 | TABLE ACCESS FULL | T1 | 10 | 30 | 3 (0)| 00:00:01 |
| 3 | VIEW | | 20 | 520 | 6 (0)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | TABLE ACCESS FULL| T2 | 10 | 60 | 3 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL| T3 | 10 | 60 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."X"="V"."C1")
rajesh@ORA11GR2>
working link:
https://blogs.oracle.com/optimizer/entry/optimizer_transformations_join_factorization just because an optimization technique *is possible* doesn't mean it is the optimal approach.
first of all, testing on a "toy" amount of data is useless (I've mentioned this to you before...). When the table has 10 rows - it really doesn't matter HOW we optimize it, does it...
secondly - there is no predicate on T1, nothing to be factored out.
thirdly, lets say that T2 and T3 (forget T1 for a minute, it could be large or small) are large. Would you really want to
a) scan T2 and T3
b) place the results into TEMP (remember, they are big)
c) then take that TEMP result and read it *again* (eg: full scan T2 and T3 in effect for a second time!!!)
d) and join that to T1....
In this case, join factorization wouldn't be more efficient - we'd end up hitting T2 and T3 from TEMP a second time (the equivalent of five full scans - T1 + 2*T2 + 2*T3) - instead of four - 2*T1+T2+T3.
The estimated cost of the factorized query has to be less then the estimated cost of the non-factorized one.