The Tom Kyte Blog: Stuck in a rut...

Monday, January 30, 2006

Stuck in a rut...

Stuck in a rut. I’m on constraints this week. It all started with “something about nothing” – complete misunderstandings about NULL values, especially in indexes. Now I’m stuck on constraints. Just today I was giving a seminar session – and one of the topics was “Effective Schema” – all about constraints, using the right types, telling the CBO more information. This is an example I’ve used in that section.

I’ve been asked in the past “Why should I use a primary key? Why not just use a unique index?” Well, the answer is that you could, but doesn’t the fact you used a primary key say something over and above just using a unique index? In fact it does – it can say a lot. The same goes for the use of foreign keys, NOT NULL constraints and others – they not only protect the data, they add information about the data into the data dictionary. Using this additional information Oracle is able to perform a query rewrite more often, in many complex cases.

Consider the following small example. We will copy the EMP and DEPT tables from the SCOTT schema and create a materialized view that pre-joins the tables together for us.


ops$tkyte@ORA10GR2> create table emp
2  as
3  select * from scott.emp;
Table created.

ops$tkyte@ORA10GR2> create table dept
2  as
3  select * from scott.dept;
Table created.

ops$tkyte@ORA10GR2> create materialized view emp_dept
2  build immediate
3  refresh on demand
4  enable query rewrite
5  as
6  select dept.deptno, dept.dname, count (*)
7    from emp, dept
8   where emp.deptno = dept.deptno
9   group by dept.deptno, dept.dname
10  /
Materialized view created.

Now, we have withheld a lot of information from Oracle here. It does not understand the relationship between EMP and DEPT, does not know what columns are primary keys, and so on. Now, let’s run a query and see what happens:


ops$tkyte@ORA10GR2> set autotrace on
ops$tkyte@ORA10GR2> select count(*) from emp;

COUNT(*)
----------
     14

Execution Plan
-------------------------------------------...
Plan hash value: 2083865914

---------------------------------------------...
| Id  | Operation          | Name | Rows  | C...
---------------------------------------------...
|   0 | SELECT STATEMENT   |      |     1 |  ...
|   1 |  SORT AGGREGATE    |      |     1 |  ...
|   2 |   TABLE ACCESS FULL| EMP  |    14 |  ...
---------------------------------------------...

Now, you and I know that the count(*) could easily, and more efficiently (especially if the number of employees in each department was large and there were lots of departments), have been answered from the materialized view. There we have all of the information we need to get the count of employees. We know that because we know things about the data that we kept from Oracle:

DEPTNO is the primary key of DEPT. That means that each EMP record will join to at most one DEPT record.

DEPTNO in EMP is a foreign key to DEPTNO in DEPT. If the DEPTNO in EMP is not a null value, then it will be joined to a row in DEPT (we won’t lose any non-null EMP records during a join)

DEPTNO in EMP is NOT NULL – this coupled with the foreign key constraint tells us we won’t lose any EMP records.

These three facts imply that if we join EMP to DEPT – each EMP row will be observed in the result set AT LEAST once and AT MOST once. Since we never told Oracle these facts, it was not able to make use of the materialized view. So, let’s make Oracle aware of them:


ops$tkyte@ORA10GR2> alter table dept
2  add constraint
3  dept_pk
4  primary key(deptno);
Table altered.

ops$tkyte@ORA10GR2> alter table emp
2  add constraint
3  emp_fk_dept
4  foreign key(deptno)
5  references dept(deptno);
Table altered.

ops$tkyte@ORA10GR2> alter table emp
2  modify deptno not null;
Table altered.

ops$tkyte@ORA10GR2> set autotrace on
ops$tkyte@ORA10GR2> select count(*) from emp;

COUNT(*)
----------
     14


Execution Plan
---------------------------------------------------...
Plan hash value: 155013515

---------------------------------------------------...
| Id  | Operation                     | Name     | ...
---------------------------------------------------...
|   0 | SELECT STATEMENT              |          | ...
|   1 |  SORT AGGREGATE               |          | ...
|   2 |   MAT_VIEW REWRITE ACCESS FULL| EMP_DEPT | ...
---------------------------------------------------...

And now Oracle is able to rewrite the query using the materialized view. Anytime that you know Oracle could use a materialized view, but it is not doing so (and you have verified you can use materialized views in general) – take a closer look at the data and ask yourself “what piece of information have I withheld from Oracle?” Nine times out of ten, you'll find a missing piece of metadata that, when included, allows Oracle to rewrite the query.

So, what happens if this is a true data warehouse and there are tens of millions of records in the above tables? You don’t really want the additional effort of verifying a foreign key relationship – you already did that in your data scrubbing routine didn’t you? In which case, you can create a non-validated constraint – one that is used to inform the database about a relationship but that has not been validated by the database itself. Let’s look at the above example again but this time we’ll simulate a load of data into an existing data warehouse (our example above is our data warehouse). We’ll drop our constraints, load the data, refresh the materialized views and add our constraints back. We’ll start with dropping the constraints:


ops$tkyte@ORA10GR2> alter table emp drop constraint emp_fk_dept;
Table altered.

ops$tkyte@ORA10GR2> alter table dept drop constraint dept_pk;
Table altered.

ops$tkyte@ORA10GR2> alter table emp modify deptno null;
Table altered.

Now, in order to simulate our load, I will insert a single new row into EMP (not much of a load but enough to demonstrate with). Then, we will refresh our materialized view:


ops$tkyte@ORA10GR2> insert into emp (empno,deptno) values ( 1, 1 );
1 row created.

ops$tkyte@ORA10GR2> exec dbms_mview.refresh( 'EMP_DEPT' );
PL/SQL procedure successfully completed.

I purposely inserted a row into EMP that will violate the constraint we are about to put back on the EMP. This is to show you what can happen if you tell Oracle to trust your data and the data is invalid. It shows that Oracle will trust you but - Oracle will give you the wrong answer – though through no fault of its own. When you use these constraints on a large warehouse, you had better be very sure of the data. Now we tell Oracle about the relationships between EMP and DEPT:


ops$tkyte@ORA10GR2> alter table dept
2  add constraint dept_pk primary key(deptno)
3  RELY enable NOVALIDATE
4  /
Table altered.

ops$tkyte@ORA10GR2> alter table emp
2  add constraint emp_fk_dept
3  foreign key(deptno) references dept(deptno)
4  RELY enable NOVALIDATE
5  /
Table altered.

ops$tkyte@ORA10GR2> alter table emp modify deptno not null NOVALIDATE;
Table altered.

So here we have told Oracle that there is a foreign key from EMP to DEPT as before, but this time we have told Oracle to trust that if it joins EMP to DEPT by DEPTNO – every row in EMP will be retrieved at least once and at most once. Oracle will perform no validating checks. That is the purpose of the NOVALIDATE and RELY options. The NOVALIDATE bypasses the checking of existing data we loaded, RELY tells Oracle to “rely” on the integrity of the data. We are now ready to query:


ops$tkyte@ORA10GR2> alter session set query_rewrite_integrity=enforced;
Session altered.

ops$tkyte@ORA10GR2> select count(*) from emp;

COUNT(*)
----------
     15

Execution Plan
----------------------------------------------...
Plan hash value: 2083865914

----------------------------------------------...
| Id  | Operation          | Name | Rows  | Co...
----------------------------------------------...
|   0 | SELECT STATEMENT   |      |     1 |   ...
|   1 |  SORT AGGREGATE    |      |     1 |   ...
|   2 |   TABLE ACCESS FULL| EMP  |    15 |   ...
----------------------------------------------...

This time QUERY_REWRITE_INTEGRITY=ENFORCED is not good enough to cause a rewrite of our query as you can see. Oracle did not rewrite the query to use the materialized view because of this. We must go down a level in query integrity. We need Oracle to “trust” us:


ops$tkyte@ORA10GR2> alter session set query_rewrite_integrity=trusted;
Session altered.

ops$tkyte@ORA10GR2> select count(*) from emp;

COUNT(*)
----------
     14


Execution Plan
--------------------------------------------------...
Plan hash value: 155013515

------------------------------------------------------...
| Id  | Operation                     | Name     | Row...
------------------------------------------------------...
|   0 | SELECT STATEMENT              |          |    ...
|   1 |  SORT AGGREGATE               |          |    ...
|   2 |   MAT_VIEW REWRITE ACCESS FULL| EMP_DEPT |    ...
------------------------------------------------------...

So, here Oracle did in fact rewrite the query but as you can see – the side effect of that in this case is the wrong answer is returned. The reason we got the wrong answer is because the “fact” that each row in EMP would be preserved in a join to DEPT is not a fact given the data we loaded. When the materialized view refreshed, it did not get the newly added EMP row. The data we told Oracle to rely on was not reliable. This demonstration points out two things:

You can use materialized views in a large data warehouse very efficiently, without having to perform lots of extra, typically redundant, verifications of the data.

BUT, you had better be 100% sure that your data is scrubbed if you ask Oracle to rely on it.

22 Comments:

Howard J. Rogers said....: Nice demo. I intend to steal it shamelessly whenever I next get blank stares when I start mentioning the RELY flag!; Mon Jan 30, 06:41:00 PM EST
Vivek Gandhi said....: Hi Tom

I have been working on data warehousing projects where it is very common not to use constraints at all (except for NOT NULL at most). On the other hand I have also faced a lot of complaints on data integrity from the business.
I am of the opinion that data integrity is more important than a little more load processing time as that will give the business users more confidence in the data warehouse.
I wanted to know what is the general practice on contraints as far as data warehousing is concerned ?; Tue Jan 31, 09:16:00 AM EST
George said....: Hi, Tom,

Very nice tip. Is this (using MV) a 10g feature only? I first tried this in my 9ir2 database, it does not use the mv. I did get the same result on a 10gr2 database.

Thanks,
George; Tue Jan 31, 03:48:00 PM EST
Thomas Kyte said....: Is this (using MV) a 10g feature only?

Actually, most of this came from Expert One on One Oracle - 8.1.6 :)

You might need some dbms_stats.set_table_stats to fake it out and say "these tables are big" or make the tables big and use dbms_stats to gather stats on them.

(this works in 8i and above, given the cbo is used and it makes sense to do it)

I upgraded my example from 8.1.6 to 10gr2 :); Tue Jan 31, 03:52:00 PM EST
Anonymous said....: Hi Tom,
I was wandering if you could do a folow up on Vivek Gandhi comments.

I was also wondering what was the general practice on constraints in the data warehouse world.
Or if you have links that discuss this that would be great.; Mon Feb 06, 08:13:00 PM EST
Thomas Kyte said....: I was wandering if you could do a folow up on Vivek Gandhi comments.

Nothing really to follow up there - Vivek describes a common situation, one that is unfortunate. Since the optimizer actually uses constraints, since constraints actually improve the reliability of your data, since constraints are as important in a warehouse (if not even more so) as any other system - it is an unfortunate situation to be in.; Tue Feb 07, 12:10:00 AM EST
Grzegorz Guzdziol said....: Is there any consequence of having ,,rely enable validate'' primary key? I noticed that after migration from 9i to 10g I need to specify it when trying to create foreign key with ,,rely disable novalidate'' - otherwise it fails with ORA-25158 (but it worked on 9i).

I think that it shouldn't because since we have our constraint enabled and validated so it has to be reliable - but maybe I'm missing something.; Mon May 22, 09:01:00 AM EDT
Thomas Kyte said....: Is there any consequence of having ,,rely enable validate'' primary key?

you just need "rely". It won't change anything about the primary key really.; Mon May 22, 09:14:00 AM EDT
Grzegorz Guzdziol said....: So I can suppose Oracle will ignore rely/norely while executing query since constraint is enabled and validated? I just want to do no harm when turning my PK's into rely mode on 10g.; Mon May 22, 12:52:00 PM EDT
Thomas Kyte said....: It was already relying on them if they are enabled and validated.; Mon May 22, 12:59:00 PM EDT
Duke Ganote said....: Grzegorz Guzdziol said "our constraint enabled and validated so it has to be reliable - but maybe I'm missing something."

You replied Oracle "was already relying on them if they are enabled and validated."

I'm assuming then that the 10g ORA-25158 error is correcting a glitch in 9i, although subtle impact escapes me.

I found no explanation in metalink, just the comment that the primary key should be tagged as rely as well.

9r2> select * from v$version;

BANNER
-----------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

9r2> create table fact ( amt number, key1 number );

Table created.

9r2> create table dim1 ( key1 number, group1 number );

Table created.

9r2> alter table dim1 add constraint dim1_pk primary key ( key1 );

Table altered.

9r2> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'DIM_PK';

no rows selected

9r2> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'DIM1_PK';

CONSTRAINT_NAME STATUS VALIDATED RELY
------------------------------ -------- ------------- ----
DIM1_PK ENABLED VALIDATED

9r2> alter table fact add constraint fact_2_dim1_fk foreign key
2 (key1) references dim1(key1) rely;

Table altered.

9r2> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'FACT_2_DIM1_FK';

CONSTRAINT_NAME STATUS VALIDATED RELY
------------------------------ -------- ------------- ----
FACT_2_DIM1_FK ENABLED VALIDATED RELY

but in 10gR2:

10g> select * from v$version;

BANNER
-----------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

10g> create table fact ( amt number, key1 number );

Table created.

10g> create table dim1 ( key1 number, group1 number );

Table created.

10g> alter table dim1 add constraint dim1_pk primary key ( key1 );

Table altered.

10g> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'DIM1_PK';

CONSTRAINT_NAME STATUS VALIDATED RELY
------------------------------ -------- ------------- ----
DIM1_PK ENABLED VALIDATED

10g> alter table fact add constraint fact_2_dim1_fk foreign key
2 ( key1 ) references dim1(key1) rely;
alter table fact add constraint fact_2_dim1_fk foreign key
*
ERROR at line 1:
ORA-25158: Cannot specify RELY for foreign key if the associated primary key is NORELY

10g> alter table dim1 modify constraint dim1_pk rely;

Table altered.

10g> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'DIM1_PK';

CONSTRAINT_NAME STATUS VALIDATED RELY
------------------------------ -------- ------------- ----
DIM1_PK ENABLED VALIDATED RELY

10g> alter table fact add constraint fact_2_dim1_fk foreign key
2 ( key1 ) references dim1(key1) rely;

Table altered.

10g> select constraint_name, status, validated, rely from user_constraints
2 where constraint_name = 'FACT_2_DIM1_FK';

CONSTRAINT_NAME STATUS VALIDATED RELY
------------------------------ -------- ------------- ----
FACT_2_DIM1_FK ENABLED VALIDATED RELY; Tue Mar 27, 12:09:00 PM EDT
r-a-v-i said....: You are the best and is my virtual Oracle guru.

The solutions that you provide with great analysis are simply fabulous.

I really appreciate all efforts you are putting in for the Oracle community.

Thanks,
r-a-v-i; Mon Mar 03, 05:05:00 PM EST
Florian said....: Thank you Kyte, youre a beacon in the darkness.; Wed Aug 27, 09:27:00 AM EDT
Thomas said....: It seems that the FK optimization does only work with old style joins. When using ANSI JOIN syntax, the execution plan is different:

SQL> set autotrace traceonly explain
SQL> SELECT o.order_date
2 FROM orders o,
3 customers c
4 WHERE c.id = o.customer_id
5 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 700K| 8886K| 432 (13)| 00:00:07 |
| 1 | TABLE ACCESS FULL| ORDERS | 700K| 8886K| 432 (13)| 00:00:07 |
----------------------------------------------------------------------------

SQL> SELECT o.order_date
2 FROM orders o
3 JOIN customers c ON c.id = o.customer_id
4 ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2561859391

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 700K| 12M| 898 (58)| 00:00:14 |
| 1 | NESTED LOOPS | | 700K| 12M| 898 (58)| 00:00:14 |
| 2 | TABLE ACCESS FULL| ORDERS | 700K| 8886K| 432 (13)| 00:00:07 |
|* 3 | INDEX UNIQUE SCAN| PK_CUSTOMERS | 1 | 5 | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("O"."CUSTOMER_ID"="C"."ID")

SQL>; Tue Mar 17, 09:31:00 AM EDT
Thomas Kyte said....: @Thomas

10.2.0.4 and 11.1.0.6, here is my *full* example - you give no version, you give no example :(

ops$tkyte%ORA10GR2> create table customers ( id number primary key, data char(200) );

Table created.

ops$tkyte%ORA10GR2> create table orders ( order# number primary key, customer_id references customers, order_date date, data char(200) );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'CUSTOMERS', numrows => 1000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'ORDERS', numrows => 2000000, numblks => 200000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> SELECT o.order_date
2 FROM orders o,
3 customers c
4 WHERE c.id = o.customer_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 2148K| 61709 (1)| 00:04:54 |
|* 1 | TABLE ACCESS FULL| ORDERS | 100K| 2148K| 61709 (1)| 00:04:54 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("O"."CUSTOMER_ID" IS NOT NULL)

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT o.order_date
2 FROM orders o
3 JOIN customers c ON c.id = o.customer_id;

Execution Plan
----------------------------------------------------------
Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 2148K| 61709 (1)| 00:04:54 |
|* 1 | TABLE ACCESS FULL| ORDERS | 100K| 2148K| 61709 (1)| 00:04:54 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("O"."CUSTOMER_ID" IS NOT NULL)

ops$tkyte%ORA10GR2> set autotrace off; Tue Mar 17, 09:45:00 AM EDT
Thomas said....: Thanks for your answer

> you give no version

Sorry, I forgot to paste that from my SQL*Plus scren

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

running on Windows XP (developer box for testing)

> you give no example :(
Hmm I showed the SQL and the execution plan...

Orders has 700000 rows
Customers has 115000 rows

Both have been analyzed before doing the test (I did not use set_table_stats); Wed Mar 18, 11:02:00 AM EDT
Thomas Kyte said....: @Thomas

Hmm I showed the SQL and the execution plan...

A query is not a test case - I gave you a full complete one that you could run from start to finish - tell me, when you do - does it reproduce?

Your query is not useful to me, I'd need a soup to nuts example starting with table creates and so on in order to see what you see.

I took the full test case (mine) and reproduced in 10.1.0.1 (your findings, the fact that it did not do the table elimination)

So, it looks like it could have been a bug in that early release of 10gr2 - current patch sets are not affected.; Wed Mar 18, 02:50:00 PM EDT
Thomas said....: Yes, indeed seems to be a bug in 10.2.0.1.
I applied the 10.2.0.4 patch and your example works fine.

Thanks for your help.; Tue Mar 24, 04:35:00 AM EDT
yellowrose said....: Hi Tom,

Does this work only on 11g release? This and the recent article published on Oracle Magazine. (On Constraints, Metadata, and Truth)

Thanks,; Thu Apr 23, 01:19:00 AM EDT
Thomas Kyte said....: @yellowrose,

the above was run in 10gr2 - but it is true for 8i, 9i, 10g, 11g, ....

The one thing in the Oracle Magazine article that is brand new is the unnecessary table elimination, the rest is all old stuff; Thu Apr 23, 09:41:00 AM EDT
RAJESH said....: Tom:

Is this correct?

RELY ENABLE NOVALIDATE and ENABLE NOVALIDATE - Both enable contraint but dont validate existing data.

RELY ENABLE NOVALIDATE - Will peform query rewrite if query_rewrite_integrity=trusted.
ENABLE NOVALIDATE - Will **not** peform query rewrite if query_rewrite_integrity=trusted/enforced/stale_tolerated

RELY ENABLE NOVALIDATE - May yeild wrong results in answering query. So we have to be sure that data is 100% pure when defining RELY option; Sat Oct 15, 09:04:00 PM EDT
Thomas Kyte said....: @Rajesh
yes; Sat Oct 15, 10:22:00 PM EDT