Hello!
We encountered strange (and logically inconsistent) CBO behavior. To provide some context: we are working with an EAV (entity-attribute-value) schema, all objects are stored in a single table, let's call it "my_objects", another table, "object_ancestry", stores all ancestor/descendant relations. Below is the script that creates all tables in question, populates them with some data, and builds indexes:
Environment
-----------
• Oracle 19.22.0.0 on AIX
• optimizer_mode = ALL_ROWS
• cursor_sharing = EXACT
• Full stats gathered on all objects (see script)
--- script start
declare
procedure drop_table(p_table_name in varchar2)
is
e_no_table exception;
pragma exception_init(e_no_table, -942);
begin
execute immediate 'drop table ' || p_table_name;
exception when e_no_table then null; -- ignore
end;
begin
drop_table('object_ancestry');
drop_table('my_objects');
end;
/
--- OEV (object-entity-value) schema:
create table my_objects(object_type_id number, object_id number, object_key varchar2(100));
create table object_ancestry(parent_type_id number, child_type_id number, parent_id number, child_id number);
--- adding some data
----- objects
insert /*+ append*/ into my_objects(object_type_id, object_id, object_key)
select 1 as object_type, level as object_id, 'location_' || level as object_key
from dual connect by level <= 1e+06;
commit;
insert /*+ append*/ into my_objects(object_type_id, object_id, object_key)
select 2 as object_type, 1e+06 + level as object_id, 'subregion_' || level as object_key
from dual connect by level <= 10000;
commit;
insert /*+ append*/ into my_objects(object_type_id, object_id, object_key)
select 3 as object_type, 2e+06 + level as object_id, 'region_' || level as object_key
from dual connect by level <= 100;
commit;
----- EOF objects
----- object ancestry
insert into object_ancestry(parent_type_id, child_type_id, parent_id, child_id)
select subregion.object_type_id as parent_type_id, location.object_type_id as child_type_id, subregion.object_id as parent_id, location.object_id as child_id
from my_objects location
join my_objects subregion on mod(location.object_id, 10000) = (subregion.object_id - 1e+06 - 1)
where location.object_type_id = 1
and subregion.object_type_id = 2
;
commit;
insert into object_ancestry(parent_type_id, child_type_id, parent_id, child_id)
select region.object_type_id as parent_type_id, subregion.object_type_id as child_type_id, region.object_id as parent_id, subregion.object_id as child_id
from my_objects subregion
join my_objects region on mod(subregion.object_id - 1e+06, 100) = (region.object_id - 2e+06 - 1)
where subregion.object_type_id = 2
and region.object_type_id = 3
;
commit;
insert into object_ancestry(parent_type_id, child_type_id, parent_id, child_id)
select 3 /*region*/ as parent_type_id, 1 /*location*/ as child_type_id, region_id as parent_id, location_id as child_id
from
(
select parent_id as subregion_id, child_id as location_id
from object_ancestry subregion_location
where parent_type_id = 2
and child_type_id = 1
) sl
join
(
select parent_id as region_id, child_id as subregion_id
from object_ancestry region_subregion
where parent_type_id = 3
and child_type_id = 2
) rs on sl.subregion_id = rs.subregion_id
;
commit;
create unique index idx01_object_ancestry on object_ancestry (parent_type_id, child_type_id, child_id, parent_id);
create unique index idx02_object_ancestry on object_ancestry (parent_type_id, child_type_id, parent_id, child_id);
create unique index pk_my_objects on my_objects(object_id);
create index idx01_my_objects on my_objects(object_type_id);
create unique index idx02_my_objects on my_objects(object_type_id, upper(object_key));
alter table my_objects add constraint pk_my_objects primary key(object_id) using index pk_my_objects;
exec dbms_stats.gather_table_stats(ownname=>user, tabname => 'MY_OBJECTS', estimate_percent => 100, cascade => TRUE);
exec dbms_stats.gather_table_stats(ownname=>user, tabname => 'OBJECT_ANCESTRY', estimate_percent => 100, cascade => TRUE);
--- script end
We need to extract all "location" objects that belong to specific "region" objects, where "region" objects are given by their object_key (some middleware generates all the queries, and we can't modify them),
Short Problem description:
-------
With the query below, the optimiser picks a different, higher-cost plan
as soon as the IN-list on REGION.OBJECT_KEY grows from 1 to 2 bind
values, and reports the LEADING(region) hint as 'not used'.
select /*+ leading(region)*/
location.object_id location_object_id,
region_location.parent_id region_object_id,
subregion_location.parent_id sub_region_object_id,
location.object_key location_key,
sub_region.object_key region_key,
region.object_key region_key
from my_objects location,
(
select region_location_int.child_id, region_location_int.parent_id
from object_ancestry region_location_int
where region_location_int.parent_type_id = :100
and region_location_int.child_type_id = :101
) region_location,
(
select subregion_location_int.child_id, subregion_location_int.parent_id
from object_ancestry subregion_location_int
where subregion_location_int.parent_type_id = :200
and subregion_location_int.child_type_id = :201
) subregion_location,
my_objects region,
my_objects sub_region
where location.object_type_id = :1
and subregion_location.child_id(+) = location.object_id
and sub_region.object_type_id(+) = :2
and sub_region.object_id(+) = subregion_location.parent_id
and region_location.child_id(+) = location.object_id
and region.object_type_id(+) = :3
and region.object_id(+) = region_location.parent_id
and (upper(region.object_key) in (upper(:4),upper(:5)))
;
For the above query, CBO refuses to use the hint, don't use index on my_objects(object_type_id, upper(object_key)), and generates the plan with the cost 5633 (for real application data, it is significantly higher)
If we remove one value from the list and modify the last predicate to make it look as follows:
and (upper(region.object_key) in (upper(:4)))
CBO honors the hint and switches to the plan that uses the index on my_objects(object_type_id, upper(object_key)) with the cost 267. There is an apparent logical inconsistency here, because in the worst case scenario in-list predicate with two values can be replaced by a union of two queries with one value in the list, and CBO knows that the cost of such plan is 267, therefore, in the worst case scenario it should be 267 * 2 = 534, which is 10 time lower than 5633 - the cost of the plan with two values in the list.
The things get even "curiouser" if we change the last predicate in the above query with the predicate on "sub_region":
and (upper(sub_region.object_key) in (upper(:4),upper(:5)))
In this case, CBO honors the hint, applies the index, and builds a plan with the cost of 181, even though we use bind variables in the query and from CBO perspective there must be absolutely no difference between the instance of "my_objects" table with alias "region" and another instance of "my_objects" table with alias "sub_region".
Can you please shed some light on what's going on here?
Thank you very much in advance!
This is likely a reflection of the inconsistency in the query itself - mixing inner and outer joins!
and region.object_type_id(+) = :3
and region.object_id(+) = region_location.parent_id
and (upper(region.object_key) in (upper(:4),upper(:5)))
This outer joins on object_type_id and object_id and inner joins on object_key. The inner join overrides the outer join.
Fix this so all the criteria are inner joins or outer joins as appropriate. Do this for region_location.child_id too as this in the join chain.
Doing this resolves the plan inconsistency. With everything outer joins and two variables in the list I get this plan on 19.26:
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 438K| 10573 (1)|
| 1 | HASH JOIN RIGHT OUTER | | 438K| 10573 (1)|
| 2 | INLIST ITERATOR | | | |
| 3 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 1 | 5 (0)|
| 4 | INDEX UNIQUE SCAN | IDX02_MY_OBJECTS | 1 | 4 (0)|
| 5 | HASH JOIN RIGHT OUTER | | 438K| 10566 (1)|
| 6 | TABLE ACCESS FULL | OBJECT_ANCESTRY | 502K| 1702 (2)|
| 7 | HASH JOIN RIGHT OUTER | | 384K| 6770 (1)|
| 8 | TABLE ACCESS FULL | MY_OBJECTS | 336K| 1129 (2)|
| 9 | HASH JOIN OUTER | | 384K| 4099 (2)|
| 10 | TABLE ACCESS FULL | MY_OBJECTS | 336K| 1129 (2)|
| 11 | TABLE ACCESS FULL | OBJECT_ANCESTRY | 502K| 1702 (2)|
-------------------------------------------------------------------------------
I also see that it's ignoring the leading hint in this plan. But it's accessing region first, so it's a non-issue.
With one bind variable, the overall cost is 2 lower and the database uses a unique scan instead of inlist as shown below. But the rest of the plan is the same.
| 0 | SELECT STATEMENT | | 438K| 10571 (1)|
| 1 | HASH JOIN RIGHT OUTER | | 438K| 10571 (1)|
| 2 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 1 | 3 (0)|
| 3 | INDEX UNIQUE SCAN | IDX02_MY_OBJECTS | 1 | 2 (0)|
Make everything inner joins and have two variables in the list and the plan is:
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29 | 181 (0)|
| 1 | NESTED LOOPS OUTER | | 29 | 181 (0)|
| 2 | NESTED LOOPS OUTER | | 29 | 123 (0)|
| 3 | NESTED LOOPS | | 25 | 73 (0)|
| 4 | NESTED LOOPS | | 33 | 7 (0)|
| 5 | INLIST ITERATOR | | | |
| 6 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 1 | 5 (0)|
| 7 | INDEX UNIQUE SCAN | IDX02_MY_OBJECTS | 1 | 4 (0)|
| 8 | INDEX RANGE SCAN | IDX02_OBJECT_ANCESTRY | 50 | 2 (0)|
| 9 | TABLE ACCESS BY INDEX ROWID | MY_OBJECTS | 1 | 2 (0)|
| 10 | INDEX UNIQUE SCAN | PK_MY_OBJECTS | 1 | 1 (0)|
| 11 | INDEX RANGE SCAN | IDX01_OBJECT_ANCESTRY | 1 | 2 (0)|
| 12 | TABLE ACCESS BY INDEX ROWID | MY_OBJECTS | 1 | 2 (0)|
| 13 | INDEX UNIQUE SCAN | PK_MY_OBJECTS | 1 | 1 (0)|
---------------------------------------------------------------------------------------
This is a slightly lower overall cost than the single variable plan (the rest of the plan is the same as before):
| 0 | SELECT STATEMENT | | 43 | 5246 | 267 (0)| 00:00:01 |
| 1 | NESTED LOOPS OUTER | | 43 | 5246 | 267 (0)| 00:00:01 |
| 2 | NESTED LOOPS OUTER | | 43 | 4214 | 181 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 38 | 3078 | 105 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 50 | 2850 | 5 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| MY_OBJECTS | 1 | 40 | 3 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | IDX02_MY_OBJECTS | 1 | | 2 (0)| 00:00:01 |
There may be a costing issue in the original query. But the first step is to make the join types consistent.
There is an apparent logical inconsistency here, because in the worst case scenario in-list predicate with two values can be replaced by a union of two queries with one value in the list, and CBO knows that the cost of such plan is 267, therefore, in the worst case scenario it should be 267 * 2 = 534, which is 10 time lower than 5633 - the cost of the plan with two values in the list.It's not that simple. Adding another variable to the list will change the cost of accessing that table. This then has a cascading effect through the rest of the plan which can lead to wildly different final costs.
In general, there's no guarantee that adding another variable to an IN list will double the size of the results and thus the cost.
Further complicating this example is the huge skew in the values for object_type_id. An
explain plan which doesn't peek at the bind values for this column can have vastly different row estimates and costs to the
execution plans which peek at each of the values for this column.
The things get even "curiouser" if we change the last predicate in the above query with the predicate on "sub_region":Again, this is mixing inner and outer joins.