Why does the Oracle SQL Optimizer include a filter predicate for a Check Constraint when generating an execution plan for a SELECT statement? If the constraint is valid (according to DBA_CONSTRAINTS), then the table rows are all compliant with the constraint, so why generate a corresponding filter predicate?
Here is an example:
CREATE TABLE TESTER
(
PARENTID NUMBER(19,0) NOT NULL,
DATAID NUMBER(19,0) NOT NULL,
SOMEVALUE NUMBER(19,0),
CONSTRAINT LOOP_CHECK CHECK (DATAID <> PARENTID) ENABLE
);
CREATE INDEX TESTER_DATAID ON TESTER (DATAID);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (0,100,1);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (0,200,2);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (100,1001,123);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (100,1002,456);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (100,1003,789);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (200,2001,234);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (200,2002,567);
INSERT INTO TESTER(PARENTID, DATAID, SOMEVALUE) VALUES (200,2003,890);
SELECT * FROM TESTER WHERE DATAID IN (1001, 2001);
If you check the execution plan for the final SELECT statement, you will see an Access Predicate, with an OR condition and the terms DATAID=1001 and DATAID=2001. This is expected.
You will also see a Filter Predicate resulting from the Check Constraint. It contains an OR condition with the terms PARENTID<>1001 and PARENTID<>2001.
1) Why is the filter predicate being generated? If the constraint is valid, it is known that no rows can fail the filter.
2) The filter is doubly redundant: since it represents the condition (PARENTID<>1001) OR (PARENTID<>2001), and the two constants differ, at least one term must hold, so it is logically always TRUE.
3) Is there any significant overhead in the query processing resulting from the filter predicate?
This example is trivial, but it is based on an actual case at one of my clients, where the queried table contains more than 100 million rows and there are system-generated queries against it whose "DATAID IN (...)" clause contains hundreds of values. The impact depends on how Oracle evaluates the resulting chain of (PARENTID <> m) OR (PARENTID <> n) OR ... inequalities. If it stops after the first condition that evaluates to TRUE, it never needs to check beyond the first two inequalities. If it evaluates all of the conditions regardless, it performs a potentially large number of unnecessary comparisons for every row returned by the associated access predicate.
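The short-circuit point above can be made concrete with a toy Python model (this is not Oracle's internal evaluation, just the logic of the generated OR-chain): because the IN-list constants are distinct, any PARENTID value must differ from at least one of the first two constants, so a short-circuiting evaluator never looks past the second term.

```python
def filter_with_short_circuit(parentid, in_list_values):
    """Evaluate (parentid <> v1) OR (parentid <> v2) OR ... with
    short-circuiting; return (passes, comparisons_made)."""
    comparisons = 0
    for v in in_list_values:
        comparisons += 1
        if parentid != v:        # first TRUE term decides the whole OR
            return True, comparisons
    return False, comparisons

values = list(range(1001, 1501))   # stands in for hundreds of IN-list values

# A row whose PARENTID differs from the first constant: one comparison.
print(filter_with_short_circuit(100, values))    # (True, 1)

# Even if PARENTID equals the first constant, the second (distinct)
# constant must differ, so at most two comparisons are ever needed.
print(filter_with_short_circuit(1001, values))   # (True, 2)
```

Whether Oracle's runtime actually short-circuits this way is exactly the open question in the scenario above.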
If you look at the "Final query after transformations" - which you can get from the lovely 10053 trace - you'll see something like:
select "TESTER"."PARENTID" "PARENTID",
"TESTER"."DATAID" "DATAID",
"TESTER"."SOMEVALUE" "SOMEVALUE"
from "CHRIS"."TESTER" "TESTER"
where ("TESTER"."DATAID" = 1001 or "TESTER"."DATAID" = 2001)
and (1001 <> "TESTER"."PARENTID" or 2001 <> "TESTER"."PARENTID")
The filters are in the transformed query, so they become part of the plan.
They come from the optimizer's transitive predicate generation step.
So why bother with this?
Well in some cases it enables you to get better plans.
Consider this:
create table t (
c1 int not null
check ( c1 > 0 ),
c2 int not null,
check ( c1 = c2 )
);
create index i1
on t ( c1 );
insert into t
with rws as (
select level x from dual
connect by level <= 10
)
select x, x from rws;
commit;
exec dbms_stats.gather_table_stats ( user, 't' ) ;
select * from t
where c2 = 0;
From the constraints, we know that:
- C1 = C2
- C2 > 0
=> the query returns no rows
Also, because the columns are equal, in theory the optimizer could use the index on C1 instead of full scanning the table, even though only C2 appears in the where clause.
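The derivation the optimizer makes can be sketched in a few lines of Python (a toy model of transitive predicate generation, not the optimizer's actual algorithm; the function name is illustrative):

```python
def derive_and_check(c2_value):
    """Model: constraint C1 = C2 plus query predicate C2 = v."""
    # Transitivity: C1 = C2 and C2 = v  =>  C1 = v
    derived_c1 = c2_value
    # The constraint C1 > 0 must hold for every stored row; if the
    # derived predicate contradicts it, the query is provably empty.
    satisfiable = derived_c1 > 0
    return derived_c1, satisfiable

print(derive_and_check(0))   # (0, False): contradiction, no rows possible
print(derive_and_check(5))   # (5, True): rows with C1 = 5 could exist
```

For the query `where c2 = 0`, the derived predicate C1 = 0 both enables the index on C1 and contradicts C1 > 0.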
So what happens?
The plan shows:
--------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |
|* 1 | FILTER | | 1 | | 0 |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 0 | 1 | 0 |
|* 3 | INDEX RANGE SCAN | I1 | 0 | 1 | 0 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
2 - filter(("C2"=0 AND "C2">0))
3 - access("C1"=0)
It has indeed used the index on c1. Even though we queried c2! :)
Even better, the optimizer has identified the query returns no rows. So it doesn't even bother to access the index.
You can see this from the NULL IS NOT NULL filter at step 1. And zero starts for operations 2 & 3.
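The FILTER step behaves like a guard that is evaluated before its child row sources ever run. A minimal sketch of that mechanism (the function names are mine, not Oracle's):

```python
def filter_step(guard, child):
    """Model a plan FILTER: run the child row source only if the
    guard condition can be true."""
    if not guard():
        return []          # zero starts for the child, as in the plan above
    return child()

def child_row_source():
    raise RuntimeError("child should never start")

# NULL IS NOT NULL is always false, so the index scan and table
# access beneath the filter are never started.
rows = filter_step(lambda: False, child_row_source)
print(rows)   # []
```

This is why operations 2 and 3 show zero starts: the always-false guard prunes them at run time without touching the index or the table.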
Of course, there will always be cases where this makes no difference or even ends up with a worse plan. Such is the consequence of a general case solution.
With respect to your scenario:
Simple inequalities are cheap to evaluate. But with so many of them, a noticeable overhead is possible. So really the task for you is:
TEST!
Try running the queries with and without the constraint enabled. See if this makes a measurable difference to performance.
If it does, raise an enhancement request showing the effect. Describe the business impact this has. We can then prioritise this against other enhancements.