Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Alberto.

Asked: March 12, 2005 - 10:30 am UTC

Last updated: March 29, 2005 - 8:21 am UTC

Version: any

Viewed 1000+ times

You Asked

Hi Tom,

q in the subject.

I'm very very curious about what you enjoyed and brought back ...

and we said...

#1 -- watching Cary Millsap presenting after Mardi Gras night:

</code> http://www.oreillynet.com/pub/wlg/6633 <code>

(that is from his presentation the next morning -- NOT mardis gras night...)

the rest, not ranked -- just in order of the days...

o Tanel Poder talking about how to use named pipes, a little grep and tail to add "interactive viewing of wait events and other things as they happen in sqlplus"

o Mogens Nørgaard giving a rather rambling talk during mardi gras night, a story that had no beginning, and no apparent end -- but was a pretty fun journey (eg: this was not technical, just very funny)

o Steve Adams reaffirming that I got my chapter on hash joins in Effective Oracle by Design right (at least upto 9ir2 -- but I have to update it now for 10g :)

o Dominic Delmolino doing a talk on data skew -- which reaffirmed so clearly why hints are bad. The developers of a 3rd party application had hinted most sql so heavily -- to do things in the way they thought all customers would obviously do (we should drive from table a to table b to table c because this is how people will use our software). Unfortunately for Dominic -- they used it the way they wanted to -- and not so good performance (in part due to hints that made the optimizer use a plan that was not appropriate for the data they actually had!). This short 1/2 hour talk was perhaps my hilight of the show -- A classic example of why overhinting, especially in applications you are going to install more than once, is not really a good idea.



And meeting all of the people... That is really the best part. It is hard to just have 5 (I really liked Lex De Haans talk on "nothing about nulls", cary's talks are always excellent, I really liked an entire hour on instrumentation -- something I find very important.......)

Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

The top 5

Mark A. Williams, March 12, 2005 - 5:39 pm UTC

I thought you would have at least included a bit about your microphone :)

- Mark

Tom Kyte
March 13, 2005 - 9:35 am UTC

that would not have been a hilight!

Alberto Dell'Era, March 12, 2005 - 5:48 pm UTC

Great, i was just searching for the Authors' portraits to decorate my copy of "Optimizing Oracle Performance", so now i'm halfway the path ... ;)

Hash joins changes in 10g - interesting - any hint ?

Tom Kyte
March 13, 2005 - 9:36 am UTC

that hash outer joins apparently can switch driving tables now. It was always counter intuitive to me the way it was done -- now it apparently can be driven from either side (haven't tested myself as yet and maybe I heard it wrong, but something to investigate)

What about your own session

Doug, March 12, 2005 - 6:04 pm UTC

You did a day long event after the end of the conference, no? How did that go and do you plan on doing that sort of thing anywhere else?

Tom Kyte
March 13, 2005 - 9:38 am UTC

Well, it wasn't my most enjoyed session -- those most enjoyed sessions are to be had from the other side of the stage!

I do them from time to time at user group training days -- nothing scheduled right now though.

outer join/semi join

amit poddar, March 13, 2005 - 10:17 am UTC

Now in 10g oracle can switch the driving table for outer joins and semi joins based on the sizes of the row source.  The hash table and probe table was fixed in outer join and semi joins till 9i. Now in 10g the hash table and probe table can be switched

Consider the following  (example ran in 10g)

SQL> exec dbms_stats.set_table_stats(OWNNAME=>'AP349', TABNAME=>'EMP', NUMROWS=>100000);

PL/SQL procedure successfully completed.

SQL>  1  select  a.ename,b.dname
  2      from emp a, dept b
  3*     where a.deptno(+)=b.deptno
SQL> /

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=100000 Bytes=2200000)
   1    0   HASH JOIN (OUTER) (Cost=9 Card=100000 Bytes=2200000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=2 Card=4 Bytes=52)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=4 Card=100000 Bytes=900000)


  1  select  a.ename,b.dname
  2      from emp a, dept b
  3*     where a.deptno=b.deptno(+)
SQL> /

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=100000 Bytes=2200000)
   1    0   HASH JOIN (RIGHT OUTER) (Cost=9 Card=100000 Bytes=2200000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=2 Card=4 Bytes=52)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=4 Card=100000 Bytes=900000)

this is even true for a semi join. Oracle can use either of the table as the the hash table


SQL> select * from dept where deptno in (select deptno from emp);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=3 Bytes=69)
   1    0   HASH JOIN (SEMI) (Cost=9 Card=3 Bytes=69)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=2 Card=4 Bytes=80)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=4 Card=100000 Bytes=300000)


SQL>  select * from emp where deptno in (select deptno from dept);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=100000 Bytes=4000000)
   1    0   HASH JOIN (RIGHT SEMI) (Cost=9 Card=100000 Bytes=4000000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=2 Card=4 Bytes=12)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=4 Card=100000 Bytes=3700000)





 

Tom Kyte
March 13, 2005 - 10:51 am UTC

I think a better way to see this is to keep the query constant, but change the stats.  So, in 9i:

ops$tkyte@ORA9IR2> create table emp as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> create table dept as select * from scott.dept;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'EMP', numrows => 10000000, numblks => 100000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'DEPT', numrows => 4, numblks => 1 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select dept.dname, emp.ename
  2    from emp, dept
  3   where emp.deptno(+) = dept.deptno;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=15221 Card=10000000 Bytes=420000000)
   1    0   HASH JOIN (OUTER) (Cost=15221 Card=10000000 Bytes=420000000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=88)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=15179 Card=10000000 Bytes=200000000)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'DEPT', numrows => 10000000, numblks => 100000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.set_table_stats( user, 'EMP', numrows => 4, numblks => 1 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select dept.dname, emp.ename
  2    from emp, dept
  3   where emp.deptno(+) = dept.deptno;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=20347 Card=10000000 Bytes=420000000)
   1    0   HASH JOIN (OUTER) (Cost=20347 Card=10000000 Bytes=420000000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=15179 Card=10000000 Bytes=220000000)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=4 Bytes=80)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off


<b>The plan would always be that way -- in 10g however, either table can be used to drive:</b>


ops$tkyte@ORA10G> create table emp as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA10G> create table dept as select * from scott.dept;
 
Table created.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'EMP', numrows => 10000000, numblks => 100000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'DEPT', numrows => 4, numblks => 1 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select dept.dname, emp.ename
  2    from emp, dept
  3   where emp.deptno(+) = dept.deptno;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=22579 Card=10000000 Bytes=420000000)
   1    0   HASH JOIN (OUTER) (Cost=22579 Card=10000000 Bytes=420000000)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=2 Card=4 Bytes=88)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=22344 Card=10000000 Bytes=200000000)
 
 
 
ops$tkyte@ORA10G> set autotrace off
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'DEPT', numrows => 10000000, numblks => 100000 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> exec dbms_stats.set_table_stats( user, 'EMP', numrows => 4, numblks => 1 );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select dept.dname, emp.ename
  2    from emp, dept
  3   where emp.deptno(+) = dept.deptno;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=22579 Card=10000000 Bytes=420000000)
   1    0   HASH JOIN (RIGHT OUTER) (Cost=22579 Card=10000000 Bytes=420000000)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (TABLE) (Cost=2 Card=4 Bytes=80)
   3    1     TABLE ACCESS (FULL) OF 'DEPT' (TABLE) (Cost=22344 Card=10000000 Bytes=220000000)
 
 
 
ops$tkyte@ORA10G> set autotrace off


<b>so, it can flip flop the table it chooses to hash in 10g, whereas in 9i and before it would not</b>

 

Alberto Dell'Era, March 13, 2005 - 11:29 am UTC

Thanks Tom and Amit ...

So the smallest of the tables is chosen in 10g.

This is very intuitive since the "old" behaviour (described in "Effective Oracle by Design", page 470) required only an additional "bit" in the hash table to mark the matched rows, bit that the new "RIGHT OUTER" join doesn't need at all - but the difference in the memory requirements are negligible.

So the most important remaining performance inhibitor is the dimension of the hash table because the bigger the table, the more memory it takes (and perhaps - more visits to the temporary tablespace), and probably the more time it takes to probe (more hash collisions).

And - the more time it takes to send the first row back, because we must wait for the hash table to be completely built before sending the first row.

Exactly how Hash joins work.

Sai, March 29, 2005 - 4:07 am UTC

Tom,

You have explained hash join's well in your book, but I still don't get few thing's:

Let's say table T1 is 5G and table T2 is 10G in size, and the plan says it is using hash join for an equijoin sql statement.

1) T1 table in full is read into buffer cache and apply hash function. Apply hash on chunks of the table or one big hash?
2) T2 table in full is read into buffer cache, each row will be compared against T1 hash function.

Questions:
Q1) Does it sort T1 hash table, if not how does the probe from T2 goes to the right hash function value?
Q2) If T1 table was hashed in chunks, does the probe from T2 table start after the T1 hash table is completly built?

Thanks much for your time.


Tom Kyte
March 29, 2005 - 8:21 am UTC

1) <quote src=effective oracle by design>
The processing is a little different if the hash table being developed does not fit entirely in memory. In that case, Oracle will use TEMP space to hold portions (partitions) of the hash table and perhaps of the table that will be used to probe this hash table. When the hash area fills up and Oracle needs to use TEMP, it will find the largest partition within the hash table and write that out to TEMP. (Oracle actually uses a two-part hash table: there is a partition, or section, of the hash table a row belongs to and then a position in that partition.) Any new row retrieved that belongs to that on-disk partition will modify the on-disk image to contain it.
</quote>

q1) it hashed T1, not sorted. We hash a key in T1, we hash the same key in T2 to find it. (read about the two part key and the swapping that goes on -- it is all there)

q2) yes. t1 is hashed and then rows can start being returned as soon as we get the first t2 row.

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.