What is a driving table
c barr, April 03, 2001 - 4:02 pm UTC
Good, helpful explanation - could say more.
Driving table
O. D, March 07, 2002 - 9:47 am UTC
Good morning:
Why there is no different in the order of two tables. It looks EMP is driving talbe in either order.
Thank you
select /*+ rule */ename, dname from scott.emp,
scott.dept where
scott.emp.deptno = scott.dept.deptno;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C0012143' (UNIQUE)
select /*+ rule */ename, dname from scott.dept,
scott.emp where
scott.emp.deptno = scott.dept.deptno;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 NESTED LOOPS
2 1 TABLE ACCESS (FULL) OF 'EMP'
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'DEPT'
4 3 INDEX (UNIQUE SCAN) OF 'SYS_C0012143' (UNIQUE)
March 07, 2002 - 10:40 am UTC
As i said:
The choice of a driving table made using many factors. For example, in the
above query if there was an index on DEPT(DEPTNO) but not EMP(DEPTNO), we would
probably use EMP as the driving table -- we would fetch rows from EMP and then
using the DEPT(DEPTNO) index -- find their matches in the DEPT table. Table
sizes, cardinality of column values, and other things can affect the choice of a
driving table (eg: HINTS).
So, you have a primary key constraint on dept(depno) -> index. You do not have an index on emp(deptno). Hence, the RBO has a rule that says "indexes are good, use them" and it chooses the driving table based on the existence of an index.
About Driving Table
Sikandar Hayat, April 28, 2003 - 5:27 am UTC
I have created index on emp(deptno) and here is the result as you already mentioned,
SCOTT > select ename,dname
2 from emp,dept
3 where emp.deptno=dept.deptno;
14 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'DEPT'
4 2 INDEX (RANGE SCAN) OF 'TEMP_IDX' (NON-UNIQUE)
>> From the execution Plan if we want to check that which is driving table then it seems that the the table on the TOP will be driving table?
Needs your comments.
April 28, 2003 - 8:07 am UTC
comments about what?
I described what a driving table was above?
Here we will use DEPT as the driving table, the psuedo code for that query plan is
for x in ( select * from dept ) /* table access (full) dept -- driving table */
loop
do an index lookup to find matching rows in emp
for each row found in that index lookup
output data
end loop
end loop
Need some explanation
Vikas, May 12, 2003 - 4:32 am UTC
Hi Tom,
If I have a query which has the involvement of 5 tables and the stats for these tables have been recently updated using dbms_stats, would this query using /*+ RULE */ hint slow down.
What needs to be concluded, is that the optimizer when trying to come up with the optimal plan look in for the stats even when the coder has used the RULE hint with the query?
eg. Query A having tables t1,t2,t3,t4 and t4 without the RULE Hint.
Stats not analysed for all these tables
tkprof timings : x1 sec
&
eg. Query A having tables t1,t2,t3,t4 and t4 with the RULE Hint.
Stats analysed for all these tables
tkprof timings : x2 sec
Which one of them would be faster and why? My opinion is that x1 should be faster as the tables are not analysed and RULE Hint works well enough with the Syntax & the order of the tables instead of the statistics computed in the data dictionary.
Your comments are appreciated.
Regards,
Vikas
May 12, 2003 - 7:44 am UTC
In theory, the query without the hint using the CBO will go as fast, if not faster then the query with the hint.
the CBO has many more plans available to it. the CBO is more sophisticated.
driving table for rule based optimizer
Raju, January 18, 2006 - 3:24 am UTC
Oracle version : 9.0.1.4.0
Optimizer : RBO
in our project, though they are running on Oracle 9i they still use rule based optimizer. i am new to this project, so i have below questions about driving table in rule based optimizer.(CBO is so useful it decides everything). please answer :
1.how a driving table is determined for RBO
is it only based on the order of the tables(last in the from clause is the driving table)? or anything else is considered?
from the execution plan, how can we determine which table is being used as driving table by the Rule based optimizer
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'EMP'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'DEPT'
4 2 INDEX (RANGE SCAN) OF 'TEMP_IDX' (NON-UNIQUE)
in the above plan which is driving table, is it emp or dept?
2. how a where clause is executed by RBO? is it top to bottom or bottom to top. so, should we place static conditions first or join conditions first? for e.g.
which is better :
where emp.grade = 'g40'
and dept.dname = 'PRODUCTION'
and emp.deptno = dept.deptno
is good or
where
emp.deptno = dept.deptno
and dept.dname = 'PRODUCTION
and emp.grade = 'g40'
January 18, 2006 - 8:18 am UTC
there are rules documented for the rbo, they are in the performance guide, </code>
http://docs.oracle.com/docs/cd/A87860_01/doc/server.817/a76992/optimops.htm#38864 <code>
if all else fails (no predicates of use to access a table), the RBO goes from right to left in the from clause, bottom up on the where/join conditions.
In your plan, looks like
for x in ( select * from DEPT )
loop
do index lookup on EMP
output joined rows
end loop
'dept' is driving the query.
A reader, January 19, 2006 - 11:51 am UTC
Hi Tom,
J. Lewis provides with the following example on his cost-based Oracle book
begin
dbms_stats.set_system_stats('MBRC',8);
dbms_stats.set_system_stats('MREADTIM',20);
dbms_stats.set_system_stats('SREADTIM',10);
dbms_stats.set_system_stats('CPUSPEED',500);
end;
/
create table driver (
id,
xref,
padding
)
nologging
as
select
rownum,
rownum,
rpad('x',20)
from
all_objects
where
rownum <= 500
;
alter table driver add constraint d_pk
primary key(id)
;
create table target (
id,
small_vc,
padding
)
as
select
rownum,
to_char(rownum),
rpad('x',20)
from
all_objects
where
rownum <= 3000
;
analyze tables with cascade ....
explain plan for
select
/*+ ordered use_nl(t) index(t) full(d) */
d.id, t.small_vc
from
driver d,
target t
where
t.id = d.xref
and t.padding is not null
;
SQL> select operation, rpad(options,15), object_name, io_cost, cpu_Cost from plan_table;
OPERATION RPAD(OPTIONS,15 OBJECT_NAME IO_COST CPU_COST
------------------------------ --------------- ------------------------------ ---------- ----------
SELECT STATEMENT 503 8995134
NESTED LOOPS 503 8995134
TABLE ACCESS FULL DRIVER 3 114054
TABLE ACCESS BY INDEX ROWID TARGET 1 17763
INDEX UNIQUE SCAN T_PK 11822
Can you give me an idea as to why the IO cost of the index unique scan on t_pk is free ? This doesn't seem very reasonable at first sight.
Thanks !
excellent
Suresh Bansal, January 20, 2006 - 2:31 am UTC
We could get clear concepts as to how Oracle returns result set by joining multiple tables.
To a reader from Scotland
Jonathan Lewis, January 21, 2006 - 9:46 am UTC
Your example looks like chapter 11, prefetch_test_02.sql with &1 set to 500.
The I/O cost of the PK index access being free also looks unreasonable to me - but that's the way it seems to be (at least in 9.2.0.6)
There is a note in ch.4 to the effect that the cost of accessing a row by primary key is calculated in the 'normal fashion' but then reduced by one - that is part of the reason. But that would give an I/O cost of 1 for the access in this case.
However if we further assume that for nested loops the optimizer is coded to treat the root block of the target index as pinned for the duration (i.e. guaranteed to stay in memory) then this could justify a further reduction by one - giving the zero that we see.
Ideally, of course, if this is the intent of the code then it probably ought to try counting the root block exactly once, regardless of how many times the driving table accesses the second table - but if the basic code expects to do something like 'cardinality of table 1 * cost of accessing table 2', then simply ignoring the root block is the closest easy approximation.
Thanks to Mr. Lewis
A reader, January 24, 2006 - 12:20 pm UTC
Hi,
Thanks for the answer.
Your book as well as your articles are excellent. I highly recommend them.
Great! Always helpful
Faisal, July 02, 2008 - 4:54 am UTC
Hi Tom,
Thanks for this article. I was just looking for an answer to a similar question and found this page :)
This will settle my confusion.
Thanks again.
Driving table -
Jayadevan, May 26, 2010 - 12:19 am UTC
Hi Tom,
I think this is related to the driving table issue, not sure though. I have 3 tables - mt1, mt2 and mt3.Here is what happens -
SQL> create table mt1( id integer);
Table created.
SQL> create table mt2( id integer);
Table created.
SQL> insert into mt1 select rownum from all_objects;
13891 rows created.
SQL> insert into mt2 select rownum from all_objects where rownum < 8;
7 rows created.
SQL> create table mt3 as select * from mt2;
Table created.
SQL> select id from mt2
2 join
3 mt1
4 on
5 mt2.id = mt1.id join mt3
6 on mt2.id=mt3.id;
ID
----------
1
2
3
4
5
6
7
7 rows selected.
SQL> select id from mt2
2 join
3 mt1
4 on
5 mt2.id = mt1.id ;
select id from mt2
*
ERROR at line 1:
ORA-00918: column ambiguously defined
I am surprised that a 2 table query gives me an error and a 3 table query fetches data. This is a very simplified version of our actual query.
May 26, 2010 - 7:41 am UTC
you are right, both should have failed, the first one succeeding is an error:
ops$tkyte%ORA11GR2> select id from mt2
2 join
3 mt1
4 on
5 mt2.id = mt1.id join mt3
6 on mt2.id=mt3.id;
select id from mt2
*
ERROR at line 1:
ORA-00918: column ambiguously defined
ops$tkyte%ORA11GR2> select id from mt2
2 join
3 mt1
4 on
5 mt2.id = mt1.id ;
select id from mt2
*
ERROR at line 1:
ORA-00918: column ambiguously defined
it correctly fails in current releases.
Correction
Jayadevan, May 26, 2010 - 4:44 am UTC
Hi Tom,
In case you may wonder what is the 'driving table issue' I am referring to - let me clarify. It looks like Oracle is assuming I am trying to get the column from the driving table and hence it is not throwing an error when I use ANSI style syntax.
If I switch to
select id from mt2, mt1,mt3 where
mt2.id = mt1.id and mt2.id=mt3.id; , Oracle throws an error, as expected.
Driver table
Jayadevan, May 27, 2010 - 3:39 am UTC
Hi Tom,
Thanks a lot for your prompt response, as always. The information - that this has been fixed in current releases, is very useful. It looks like we have many such queries in our product. Now we know one action item before we upgrade the database - to fix those queries. By the way, we came across this issue when we ported our product to PostgreSQL - all these queries started breaking.
Thanks again.
Driving table in Query
Abhisek, August 05, 2010 - 12:31 pm UTC
Hi Tom,
Hope you are doing well. I have a question for the driving table concept for multiple tables.
SELECT t1.a, t2.b
FROM TABLE1 t1, TABLE2 t2, TABLE3 t3, TABLE4 t4
WHERE t1.ID = t2.ID
AND t3.ID = t2.ID
AND INSTR(t3.COL1,t4.COL1) > 0
AND t2.col2 = 1
AND t2.col3 in (1,2,3)
Now lets consider t2 can have multiple rows for t1.id, table t3 can have multiple entries for table t2. t4 is the smallest one containing some flags for table 3.
Please suggest what will be the driving table and what is the best way to join them (reordering of join conditions)
Any documents with example will be appreciated.
August 05, 2010 - 1:13 pm UTC
it would "depend"
you are joining t1 to t2 and t2 to t3 and then a non-equi join of t3 to t4.
is t2.col2,t2.col3 indexed (in a single index)? If so, and "AND t2.col2 = 1
AND t2.col3 in (1,2,3)" returns few rows - I might like to see T2 drive the query (find the rows in T2 - the FEW rows in T2 that satisfy that).
Then maybe pick up t1 (probably only one row in t1 for each t2 but you don't say, I'll assume - the optimizer would hopefully know because you told the optimizer about the schema relations via constraints right...), then get to t3 and do that nasty with t4 - or maybe just do t3 and t4 first to create a subresult that I join t1+t2 to.
It depends - it depends on the statistics, it depends on the estimated cardinalities, it depends on the indexing schema, it depends on the constraints in place - it totally depends.
Any of them could be the driving table - if t3 and t4 are really small - they could be done first and then use that to pick up t2 and then t1. The optimizer will try out the sensible join orders and based on estimated cardinalities - determine which one results in the fewest amount of IO's/CPU being used. That is the plan it would choose.
Driving table
Abhisek, June 21, 2011 - 7:40 am UTC
hi Tom,
I have seen many examples of driving table concept using two tables: emp and dept
What happens in case we have more than 2 tables. Lets say we have three tables and the join is as:
SELECT e.employee_id, j.job_title, e.salary, d.department_name
FROM employees e, jobs j, departments d
WHERE e.employee_id < 103
AND e.job_id = j.job_id
AND e.department_id = d.department_id;
1. What is considered as Driving table here and why? How does CBO decide the driving table?
2. What happens if we don't have the predicate e.employee_id < 103? Will the driving table be same?
3. What could be the easiet way to pick the driving table from explain plan if we have > 3 tables in join.
Thanks a lot.
June 21, 2011 - 8:02 am UTC
1) it depends - the optimizer will look at the joins and the predicates - try to decide which row source should go first in order to find the minimum set of rows to start with.
2) The lack of a predicate would change the estimated cardinalities, which would likely change the plan - I stress LIKELY, it might - it might not change the plan.
3) run an explain plan and see what the optimizer says :) It'll show you the tables and the order in which they would be accessed.
Thanks a lot for clarification
Abhisek, June 21, 2011 - 8:12 am UTC