Skip to Main Content
  • Questions
  • Check Constraints and Explain Plan Filter Predicates

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Patrick.

Asked: November 28, 2019 - 1:59 pm UTC

Answered by: Chris Saxon - Last updated: December 17, 2019 - 4:27 pm UTC

Category: Database Administration - Version: 12c

Viewed 100+ times

You Asked

Why does the Oracle SQL Optimizer include a filter predicate for a Check Constraint when generating an execution plan for a SELECT statement? If the constraint is valid (according to DBA_CONSTRAINTS), then the table rows are all compliant with the constraint, so why generate a corresponding filter predicate?

Here is an example:
CREATE TABLE TESTER 
(
   PARENTID NUMBER(19,0) NOT NULL, 
   DATAID NUMBER(19,0) NOT NULL,
   SOMEVALUE NUMBER (19,0)
   CONSTRAINT LOOP_CHECK CHECK (DATAID <> PARENTID) ENABLE
);

CREATE INDEX TESTER_DATAID ON TESTER (DATAID);

INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (0,100,1);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (0,200,2);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (100,1001,123);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (100,1002,456);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (100,1003,789);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (200,2001,234);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (200,2002,567);
INSERT INTO TESTER(PARENTID< DATAID< SOMEVALUE) VALUES (200,2003,890);

SELECT * FROM TESTER WHERE DATAID IN (1001, 2001);


If you check the execution plan for the final SELECT statement, you will see an Access Predicate, with an OR condition and the terms DATAID=1001 and DATAID=2001. This is expected.

You will also see a Filter Predicate resulting from the Check Constraint. It contains an OR condition, and the terms PARENTID<>1000 and PARENTID<>2001.

1) Why is the filter predicate being generated? If the check is consistent then it is known that no records will fail the filter.

2) The filter is doubly redundant. Given that it represents the condition (PARENTID<>1001) OR (PARENTID<>2001), it is logically always TRUE.

3) Is there any significant overhead in the query processing resulting from the filter predicate?

This example is trivial, but it is based on an actual case at one of my clients, where the table being queried contains > 100 million rows, and there are system-generated queries against the table where the "DATAID IN(...)" clause contains hundreds of values. It would depend on how Oracle evaluates the resulting (PARENTID <> m) OR (PARENTID <> n) OR.. inequalities. If it stops after the first condition that evaluates to TRUE, then it will never need to check beyond the first two inequalities. If it evaluates all of the conditions regardless, then it is performing a potentially large number of unnecessary check comparisons for every row returned by the associated access predicate.

and we said...

If you look at the "Final query after transformations" - which you can get from the lovely 10053 trace - you'll see something like:

select "TESTER"."PARENTID"    "PARENTID",
       "TESTER"."DATAID"      "DATAID",
       "TESTER"."SOMEVALUE"   "SOMEVALUE"
from   "CHRIS"."TESTER" "TESTER"
where  ("TESTER"."DATAID" = 1001 or "TESTER"."DATAID" = 2001)
and    (1001 <> "TESTER"."PARENTID" or 2001 <> "TESTER"."PARENTID")


The plan filters are in the query. So they become part of the plan.

These come from the transitive predicate generation steps.

So why bother with this?

Well in some cases it enables you to get better plans.

Consider this:

create table t (
  c1 int not null
    check ( c1 > 0 ),
  c2 int not null,
  check ( c1 = c2 )
);

create index i1 
  on t ( c1 );

insert into t 
  with rws as (
    select level x from dual
    connect by level <= 10
  )
    select x, x from rws;

commit;
  
exec dbms_stats.gather_table_stats ( user, 't' ) ;

select * from t
where  c2 = 0;


From the constraints, we know that:

- C1 = C2
- C2 > 0
=> the query returns no rows

Also, because the columns are equal, in theory the optimizer could use the index on C1 instead of full scanning the table. Even though only C2 appears in the where clause.

So what happens?

The plan shows:

--------------------------------------------------------------------------------    
| Id  | Operation                            | Name | Starts | E-Rows | A-Rows |    
--------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT                     |      |      1 |        |      0 |    
|*  1 |  FILTER                              |      |      1 |        |      0 |    
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      0 |      1 |      0 |    
|*  3 |    INDEX RANGE SCAN                  | I1   |      0 |      1 |      0 |    
--------------------------------------------------------------------------------    
                                                                                    
Predicate Information (identified by operation id):                                 
---------------------------------------------------                                 
                                                                                    
   1 - filter(NULL IS NOT NULL)                                                     
   2 - filter(("C2"=0 AND "C2">0))                                                  
   3 - access("C1"=0) 


It has indeed used the index on c1. Even though we queried c2! :)

Even better, the optimizer has identified the query returns no rows. So it doesn't even bother to access the index.

You can see this from the NULL IS NOT NULL filter at step 1. And zero starts for operations 2 & 3.

Of course, there will always be cases where this makes no difference or even ends up with a worse plan. Such is the consequence of a general case solution.

With respect to your scenario:

Simple inequalities are cheap to run. But as you have so many of them it's possible you have a noticeable overhead. So really the task for you is:

TEST!

Try running the queries with and without the constraint enabled. See if this makes a measurable difference to performance.

If it does, raise an enhancement request showing the effect. Describe the business impact this has. We can then prioritise this against other enhancements.

and you rated our response

  (1 rating)

Reviews

One thing still unclear

December 17, 2019 - 12:53 pm UTC

Reviewer: Patrick McManus

Hi Chris, and thanks for the response, it explains the situation clearly. Unfortunately, the database is provided by a third party vendor and I have no control over the presence of the check constraint in question.

There is still one thing that isn't clear to me. When Oracle is evaluating the filter predicate (PARENTID<>1001) OR (PARENTID<>2000) OR... (PARENTID<>nnnn), does it stop as soon as it reaches the first expression that evaluates to TRUE, or does it continue to evaluate every inequality in the list? I am assuming that it first parses the entire query to determine that there are only 'OR' conditions between the inequalities, and no others like 'AND' or 'NOT'.

If it stops at the first inequality that evaluates to TRUE, then it doesn't matter how many inequalities there are in the clause, it will never need to check beyond the first two, and there shouldn't be any performance impact to speak of.
Chris Saxon

Followup  

December 17, 2019 - 4:27 pm UTC

does it stop as soon as it reaches the first expression that evaluates to TRUE, or does it continue to evaluate every inequality in the list?

It certainly can use short-circuit evaluation.

You can verify with something like this:

create or replace function f ( p int ) 
  return int as
  retval int;
begin
  dbms_output.put_line ( 'Called: ' || p );
  return p + 1;
end f;
/

create table t (
  c1 int
);

insert into t ( c1 ) 
with rws as (
  select level x from dual
  connect by level <= 10
)
  select * from rws;
  
commit;

set serveroutput on
select c1
from   t
where  ( f ( c1 ) <> 2 or f ( c1 ) <> 3 );


The function adds one to the input. And displays "Called: X" where X is the current C1 value.

So the database only needs to evaluate the second expression (f <> 3) when C1 = 1. So you would expected "Called: 1" twice in the output. And every other value once.

Which is exactly what I see:

C1   
    1 
    2 
    3 
    4 
    5 
    6 
    7 
    8 
    9 
   10 


10 rows selected. 

Called: 1
Called: 1
Called: 2
Called: 3
Called: 4
Called: 5
Called: 6
Called: 7
Called: 8
Called: 9
Called: 10


So I'd only look into this further if the query is "too slow".

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.