Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Stephen .

Asked: May 02, 2000 - 1:21 pm UTC

Last updated: June 21, 2011 - 8:02 am UTC

Version:

Viewed 50K+ times! This question is

You Asked

Could you please explain the concept of 'driving table' and
When joining a number of tables of varying sizes in a 'Select'
statement which one is the driving table and does it matter where
it goes in the 'from 'statement in the Select clause
Also is there any preferred order with the statements that join
the tables in the 'where' clause



and Tom said...



The 'driving' table is the table we will join FROM -- that is JOIN TO other tables. For example, lets say you have the query:



select * from emp, dept where emp.deptno = dept.deptno;

In this case the driving table might be DEPT, we would fetch rows from DEPT in a full scan and then find the rows in EMP that match. DEPT is the driving table.

The choice of a driving table made using many factors. For example, in the above query if there was an index on DEPT(DEPTNO) but not EMP(DEPTNO), we would probably use EMP as the driving table -- we would fetch rows from EMP and then using the DEPT(DEPTNO) index -- find their matches in the DEPT table. Table sizes, cardinality of column values, and other things can affect the choice of a driving table (eg: HINTS).

When using the Rule Based Optimizer -- the placement of tables in the FROM clause is relevant. We process the from clause from the RIGHT to the LEFT -- we would tend to pick a driving table from the end of the FROM list. There is a hint in the Cost Based Optimizer to have this happen as well.

When using CBO -- the order of tables is not relevant (unless you hint it to be). We use the statistics and data dictionary to determine which table is best to be used as the driving table.



Rating

  (16 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What is a driving table

c barr, April 03, 2001 - 4:02 pm UTC

Good, helpful explanation - could say more.



Driving table

O. D, March 07, 2002 - 9:47 am UTC

Good morning:
Why there is no different in the order of two tables. It looks EMP is driving talbe in either order.

Thank you

select /*+ rule */ename, dname from scott.emp,
scott.dept where
scott.emp.deptno = scott.dept.deptno;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C0012143' (UNIQUE)
select /*+ rule */ename, dname from scott.dept,
scott.emp where
scott.emp.deptno = scott.dept.deptno;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C0012143' (UNIQUE)

Tom Kyte
March 07, 2002 - 10:40 am UTC

As i said:

The choice of a driving table made using many factors. For example, in the
above query if there was an index on DEPT(DEPTNO) but not EMP(DEPTNO),
we would
probably use EMP as the driving table -- we would fetch rows from EMP and then
using the DEPT(DEPTNO) index -- find their matches in the DEPT table. Table
sizes, cardinality of column values, and other things can affect the choice of a
driving table (eg: HINTS).



So, you have a primary key constraint on dept(depno) -> index. You do not have an index on emp(deptno). Hence, the RBO has a rule that says "indexes are good, use them" and it chooses the driving table based on the existence of an index.

About Driving Table

Sikandar Hayat, April 28, 2003 - 5:27 am UTC

I have created index on emp(deptno) and here is the result as you already mentioned,

SCOTT > select ename,dname
2 from emp,dept
3 where emp.deptno=dept.deptno;

14 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'DEPT'
4 2 INDEX (RANGE SCAN) OF 'TEMP_IDX' (NON-UNIQUE)

>> From the execution Plan if we want to check that which is driving table then it seems that the the table on the TOP will be driving table?

Needs your comments.

Tom Kyte
April 28, 2003 - 8:07 am UTC

comments about what?

I described what a driving table was above?

Here we will use DEPT as the driving table, the psuedo code for that query plan is

for x in ( select * from dept ) /* table access (full) dept -- driving table */
loop
do an index lookup to find matching rows in emp
for each row found in that index lookup
output data
end loop
end loop

Need some explanation

Vikas, May 12, 2003 - 4:32 am UTC

Hi Tom,

If I have a query which has the involvement of 5 tables and the stats for these tables have been recently updated using dbms_stats, would this query using /*+ RULE */ hint slow down.

What needs to be concluded, is that the optimizer when trying to come up with the optimal plan look in for the stats even when the coder has used the RULE hint with the query?

eg. Query A having tables t1,t2,t3,t4 and t4 without the RULE Hint.
Stats not analysed for all these tables
tkprof timings : x1 sec
&
eg. Query A having tables t1,t2,t3,t4 and t4 with the RULE Hint.
Stats analysed for all these tables
tkprof timings : x2 sec

Which one of them would be faster and why? My opinion is that x1 should be faster as the tables are not analysed and RULE Hint works well enough with the Syntax & the order of the tables instead of the statistics computed in the data dictionary.

Your comments are appreciated.

Regards,
Vikas

Tom Kyte
May 12, 2003 - 7:44 am UTC

In theory, the query without the hint using the CBO will go as fast, if not faster then the query with the hint.

the CBO has many more plans available to it. the CBO is more sophisticated.



driving table for rule based optimizer

Raju, January 18, 2006 - 3:24 am UTC

Oracle version : 9.0.1.4.0

Optimizer : RBO

in our project, though they are running on Oracle 9i they still use rule based optimizer. i am new to this project, so i have below questions about driving table in rule based optimizer.(CBO is so useful it decides everything). please answer :

1.how a driving table is determined for RBO

is it only based on the order of the tables(last in the from clause is the driving table)? or anything else is considered?

from the execution plan, how can we determine which table is being used as driving table by the Rule based optimizer

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'DEPT'
4 2 INDEX (RANGE SCAN) OF 'TEMP_IDX' (NON-UNIQUE)

in the above plan which is driving table, is it emp or dept?

2. how a where clause is executed by RBO? is it top to bottom or bottom to top. so, should we place static conditions first or join conditions first? for e.g.

which is better :

where emp.grade = 'g40'
and dept.dname = 'PRODUCTION'
and emp.deptno = dept.deptno

is good or

where
emp.deptno = dept.deptno
and dept.dname = 'PRODUCTION
and emp.grade = 'g40'

Tom Kyte
January 18, 2006 - 8:18 am UTC

there are rules documented for the rbo, they are in the performance guide, </code> http://docs.oracle.com/docs/cd/A87860_01/doc/server.817/a76992/optimops.htm#38864 <code>

if all else fails (no predicates of use to access a table), the RBO goes from right to left in the from clause, bottom up on the where/join conditions.


In your plan, looks like

for x in ( select * from DEPT )
loop
do index lookup on EMP
output joined rows
end loop

'dept' is driving the query.



A reader, January 19, 2006 - 11:51 am UTC

Hi Tom,

J. Lewis provides with the following example on his cost-based Oracle book 

begin
    dbms_stats.set_system_stats('MBRC',8);
    dbms_stats.set_system_stats('MREADTIM',20);
    dbms_stats.set_system_stats('SREADTIM',10);
    dbms_stats.set_system_stats('CPUSPEED',500);
end;
/

create table driver (
    id,
    xref,
    padding
)
nologging
as
select 
    rownum,
    rownum,
    rpad('x',20)
from
    all_objects
where 
    rownum <= 500
;

alter table driver add constraint d_pk
primary key(id)
;

create table target (
    id,
    small_vc,
    padding
)
as
select
    rownum,
    to_char(rownum),
    rpad('x',20)
from
    all_objects
where 
    rownum <= 3000
;

analyze tables with cascade ....

explain plan for
select 
    /*+ ordered use_nl(t) index(t) full(d) */
    d.id, t.small_vc
from
    driver    d,
    target    t
where
    t.id = d.xref
and    t.padding is not null
;


SQL> select operation, rpad(options,15), object_name, io_cost, cpu_Cost from plan_table;

OPERATION                      RPAD(OPTIONS,15 OBJECT_NAME                       IO_COST   CPU_COST
------------------------------ --------------- ------------------------------ ---------- ----------
SELECT STATEMENT                                                                     503    8995134
NESTED LOOPS                                                                         503    8995134
TABLE ACCESS                   FULL            DRIVER                                  3     114054
TABLE ACCESS                   BY INDEX ROWID  TARGET                                  1      17763
INDEX                          UNIQUE SCAN     T_PK                                           11822


Can you give me an idea as to why the IO cost of the index unique scan on t_pk is free ? This doesn't seem very reasonable at first sight.

Thanks ! 

excellent

Suresh Bansal, January 20, 2006 - 2:31 am UTC

We could get clear concepts as to how Oracle returns result set by joining multiple tables.

To a reader from Scotland

Jonathan Lewis, January 21, 2006 - 9:46 am UTC

Your example looks like chapter 11, prefetch_test_02.sql with &1 set to 500.

The I/O cost of the PK index access being free also looks unreasonable to me - but that's the way it seems to be (at least in 9.2.0.6)

There is a note in ch.4 to the effect that the cost of accessing a row by primary key is calculated in the 'normal fashion' but then reduced by one - that is part of the reason. But that would give an I/O cost of 1 for the access in this case.

However if we further assume that for nested loops the optimizer is coded to treat the root block of the target index as pinned for the duration (i.e. guaranteed to stay in memory) then this could justify a further reduction by one - giving the zero that we see.

Ideally, of course, if this is the intent of the code then it probably ought to try counting the root block exactly once, regardless of how many times the driving table accesses the second table - but if the basic code expects to do something like 'cardinality of table 1 * cost of accessing table 2', then simply ignoring the root block is the closest easy approximation.


Thanks to Mr. Lewis

A reader, January 24, 2006 - 12:20 pm UTC

Hi,

Thanks for the answer.

Your book as well as your articles are excellent. I highly recommend them.

Great! Always helpful

Faisal, July 02, 2008 - 4:54 am UTC

Hi Tom,

Thanks for this article. I was just looking for an answer to a similar question and found this page :)

This will settle my confusion.

Thanks again.

Driving table -

Jayadevan, May 26, 2010 - 12:19 am UTC

Hi Tom,
I think this is related to the driving table issue, not sure though. I have 3 tables - mt1, mt2 and mt3.Here is what happens - 
SQL> create table mt1( id integer);

Table created.

SQL> create table mt2( id integer);

Table created.

SQL> insert into mt1 select rownum from all_objects;

13891 rows created.

SQL> insert into mt2 select rownum from all_objects where rownum < 8;

7 rows created.

SQL> create table mt3 as select * from mt2;

Table created.

SQL>  select id  from mt2
  2   join
  3   mt1
  4   on
  5   mt2.id = mt1.id join mt3
  6  on  mt2.id=mt3.id;

        ID
----------
         1
         2
         3
         4
         5
         6
         7

7 rows selected.

SQL>  select id  from mt2
  2   join
  3   mt1
  4   on
  5   mt2.id = mt1.id ;
 select id  from mt2
        *
ERROR at line 1:
ORA-00918: column ambiguously defined

I am surprised that a 2 table query gives me an error and a 3 table query fetches data. This is a very simplified version of our actual query.

Tom Kyte
May 26, 2010 - 7:41 am UTC

you are right, both should have failed, the first one succeeding is an error:

ops$tkyte%ORA11GR2>  select id  from mt2
  2   join
  3   mt1
  4   on
  5   mt2.id = mt1.id join mt3
  6  on  mt2.id=mt3.id;
 select id  from mt2
        *
ERROR at line 1:
ORA-00918: column ambiguously defined


ops$tkyte%ORA11GR2>  select id  from mt2
  2   join
  3   mt1
  4   on
  5   mt2.id = mt1.id ;
 select id  from mt2
        *
ERROR at line 1:
ORA-00918: column ambiguously defined



it correctly fails in current releases.

Correction

Jayadevan, May 26, 2010 - 4:44 am UTC

Hi Tom,
In case you may wonder what is the 'driving table issue' I am referring to - let me clarify. It looks like Oracle is assuming I am trying to get the column from the driving table and hence it is not throwing an error when I use ANSI style syntax.
If I switch to
select id from mt2, mt1,mt3 where
mt2.id = mt1.id and mt2.id=mt3.id; , Oracle throws an error, as expected.

Driver table

Jayadevan, May 27, 2010 - 3:39 am UTC

Hi Tom,
Thanks a lot for your prompt response, as always. The information - that this has been fixed in current releases, is very useful. It looks like we have many such queries in our product. Now we know one action item before we upgrade the database - to fix those queries. By the way, we came across this issue when we ported our product to PostgreSQL - all these queries started breaking.
Thanks again.

Driving table in Query

Abhisek, August 05, 2010 - 12:31 pm UTC

Hi Tom,

Hope you are doing well. I have a question for the driving table concept for multiple tables.

SELECT t1.a, t2.b
FROM TABLE1 t1, TABLE2 t2, TABLE3 t3, TABLE4 t4
WHERE t1.ID = t2.ID
AND t3.ID = t2.ID
AND INSTR(t3.COL1,t4.COL1) > 0
AND t2.col2 = 1
AND t2.col3 in (1,2,3)

Now lets consider t2 can have multiple rows for t1.id, table t3 can have multiple entries for table t2. t4 is the smallest one containing some flags for table 3.

Please suggest what will be the driving table and what is the best way to join them (reordering of join conditions)

Any documents with example will be appreciated.
Tom Kyte
August 05, 2010 - 1:13 pm UTC

it would "depend"

you are joining t1 to t2 and t2 to t3 and then a non-equi join of t3 to t4.


is t2.col2,t2.col3 indexed (in a single index)? If so, and "AND t2.col2 = 1
AND t2.col3 in (1,2,3)" returns few rows - I might like to see T2 drive the query (find the rows in T2 - the FEW rows in T2 that satisfy that).

Then maybe pick up t1 (probably only one row in t1 for each t2 but you don't say, I'll assume - the optimizer would hopefully know because you told the optimizer about the schema relations via constraints right...), then get to t3 and do that nasty with t4 - or maybe just do t3 and t4 first to create a subresult that I join t1+t2 to.


It depends - it depends on the statistics, it depends on the estimated cardinalities, it depends on the indexing schema, it depends on the constraints in place - it totally depends.

Any of them could be the driving table - if t3 and t4 are really small - they could be done first and then use that to pick up t2 and then t1. The optimizer will try out the sensible join orders and based on estimated cardinalities - determine which one results in the fewest amount of IO's/CPU being used. That is the plan it would choose.

Driving table

Abhisek, June 21, 2011 - 7:40 am UTC

hi Tom,

I have seen many examples of driving table concept using two tables: emp and dept

What happens in case we have more than 2 tables. Lets say we have three tables and the join is as:

SELECT e.employee_id, j.job_title, e.salary, d.department_name
FROM employees e, jobs j, departments d
WHERE e.employee_id < 103
AND e.job_id = j.job_id
AND e.department_id = d.department_id;

1. What is considered as Driving table here and why? How does CBO decide the driving table?
2. What happens if we don't have the predicate e.employee_id < 103? Will the driving table be same?
3. What could be the easiet way to pick the driving table from explain plan if we have > 3 tables in join.

Thanks a lot.

Tom Kyte
June 21, 2011 - 8:02 am UTC

1) it depends - the optimizer will look at the joins and the predicates - try to decide which row source should go first in order to find the minimum set of rows to start with.


2) The lack of a predicate would change the estimated cardinalities, which would likely change the plan - I stress LIKELY, it might - it might not change the plan.

3) run an explain plan and see what the optimizer says :) It'll show you the tables and the order in which they would be accessed.

Thanks a lot for clarification

Abhisek, June 21, 2011 - 8:12 am UTC


More to Explore

Performance

Get all the information about database performance in the Database Performance guide.