Thank you for your answer.
"Inverting the join order" might not be the right term, but I constructed a query that achieves the same but allows the optimizer to join at the best possible away. In my test case there are NOT NULL constraints on FOO.PARENT_TABLE.PK and FOO.CHILD_TABLE.FK that are ENABLED and VALIDATED. So, there can’t by any NULLs, so I should be able to just make a:
select count(*) from foo.child_table where fk not in (select pk from foo.parent_table);
But the plan is basically the same that I end up when removing the “ordered” hint from the recursive statement.
If I put some more data into the tables and run the recursive SELECT with and without the ordered hint, the optimizer chooses two different plans, and in the latter case it is much faster. Please observe the following revised test case:
drop table child_table;
drop table parent_table;
create table parent_table (pk not null, constraint parent_table_pk primary key (pk)) as select ROWNUM pk FROM DBA_OBJECTS WHERE ROWNUM <= 10000;
create table child_table (pk not null, fk not null, constraint child_table_pk primary key (pk)) as select ROWNUM pk, mod(rownum, 10000)+1 fk FROM DBA_OBJECTS, DBA_OBJECTS WHERE ROWNUM <= 10000000;
create index child_table_fk on child_table(fk);
select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, 'A', 'B', 'C' from "FOO"."CHILD_TABLE" A , "FOO"."PARENT_TABLE" B where( "A"."FK" is not null) and( "B"."PK" (+)= "A"."FK") and( "B"."PK" is null);
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL LAST'));
select /*+ all_rows dynamic_sampling(2) */ A.rowid, 'A', 'B', 'C' from "FOO"."CHILD_TABLE" A , "FOO"."PARENT_TABLE" B where( "A"."FK" is not null) and( "B"."PK" (+)= "A"."FK") and( "B"."PK" is null);
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL LAST'));
I would really like to understand the exact reason for the "ordered" hint. Because if it is truly needed in this case, then I am afraid that my alternate approach above will be flawed (manually checking the data and setting RELY).
I am aware of the possibility of setting the parameter "query_rewrite_integrity". However, when I read the documentation I found out that setting the parameter to “trusted” not only allows join elimination on foreign key constraints in RELY, but also:
In this mode, the optimizer also uses prebuilt materialized views or materialized views based on views, and it uses relationships that are not enforced as well as those that are enforced. It also trusts declared but not ENABLED VALIDATED primary or unique key constraints and data relationships specified using dimensions.Ref.:
https://docs.oracle.com/database/122/DWHSG/basic-query-rewrite-materialized-views.htm#DWHSG8496 Relationships that are declared but not enforced I would translate into constraints that disabled, and that might be too many "side effects" for us to accept.
I recognize – and agree with - your statement that it is a better option to get the developers to write perfect SQLs rather than relying on the optimizer doing join elimination, but we might not be there yet (… or never will be).
December 01, 2017 - 11:09 am UTC
When the optimizer chooses a nested loop for an outer join, it has to access the row-preserved table first:
However, in a nested loop outer join, the join condition determines the order of tables. The database uses the outer, row-preserved table to drive to the inner table. https://docs.oracle.com/database/122/TGSQL/joins.htm#GUID-1ACFF09D-C8E1-4272-97B9-900D2053B91E This also used to be the case for hash joins. From the 9i docs:
Like an outer join, the order of tables is not determined by the cost, but by the join condition. https://docs.oracle.com/cd/A91202_01/901_doc/server.901/a87503/optimops.htm#49916 But is no longer true:
The cost determines the order of tables. The outer table, including preserved rows, may be used to build the hash table, or it may be used to probe the hash table. https://docs.oracle.com/database/122/TGSQL/joins.htm#GUID-278E01B5-9498-40EC-B0BD-CC415C18E078 So, I suspect the ordered hint is a hang-over from when accessing child_table first was a strict requirement.
Running your test case on 12.2.0.1, I get the same plan whether or not the ordered hint is there:
/* using ordered hint */
PLAN_TABLE_OUTPUT
SQL_ID 2kqwtvutvyy12, child number 0
-------------------------------------
select /*+ all_rows ordered dynamic_sampling(2) */ a.rowid, 'A',
'B', 'C' from "CHILD_TABLE" a, "PARENT_TABLE" b
where ( "A"."FK" is not null ) and ( "B"."PK" (+) = "A"."FK" )
and ( "B"."PK" is null )
Plan hash value: 2416273426
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 22996 (100)| |
| 1 | MERGE JOIN ANTI | | 10M| 384M| | 22996 (8)| 00:00:01 |
| 2 | INDEX FULL SCAN | CHILD_TABLE_FK | 10M| 253M| | 22914 (8)| 00:00:01 |
|* 3 | SORT UNIQUE | | 10000 | 126K| 408K| 82 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL| PARENT_TABLE | 10000 | 126K| | 8 (13)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("B"."PK"="A"."FK")
filter("B"."PK"="A"."FK")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
29 rows selected.
/* no ordered hint */
PLAN_TABLE_OUTPUT
SQL_ID bcmz76whb5jjk, child number 0
-------------------------------------
select /*+ all_rows dynamic_sampling(2) */ a.rowid, 'A',
'B', 'C' from "CHILD_TABLE" a, "PARENT_TABLE" b where (
"A"."FK" is not null ) and ( "B"."PK" (+) = "A"."FK" )
and ( "B"."PK" is null )
Plan hash value: 2416273426
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 22996 (100)| |
| 1 | MERGE JOIN ANTI | | 10M| 384M| | 22996 (8)| 00:00:01 |
| 2 | INDEX FULL SCAN | CHILD_TABLE_FK | 10M| 253M| | 22914 (8)| 00:00:01 |
|* 3 | SORT UNIQUE | | 10000 | 126K| 408K| 82 (10)| 00:00:01 |
| 4 | TABLE ACCESS FULL| PARENT_TABLE | 10000 | 126K| | 8 (13)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("B"."PK"="A"."FK")
filter("B"."PK"="A"."FK")
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Yes, you need to take care with trusted query rewrite. If going down this route, I'd only enable it in sessions where you're sure it's safe. Not globally.