Difference between Sort Merge and Hash Joins
Question and Answer

Tom Kyte

Thanks for the question, Logan.

Asked: March 19, 2002 - 1:24 pm UTC

Last updated: June 01, 2012 - 7:06 am UTC

Version: 8.1.6

Viewed 10K+ times.

You Asked

Tom,

What is the difference between "Sort Merge" and "Hash" joins? Don't they both do one full scan each of the joining tables and then join them?

I know Sort Merge is used in the case of "ALL ROWS" and Nested Loops in the case of "FIRST ROWS" hints. How about Hash Join? When is it used?

Would really appreciate if you could explain it with a couple of examples.

Thanks in advance.

and Tom said...

Well, a sort merge of A and B is sort of like this:

read A and sort by join key to temp_a
read B and sort by join key to temp_b

read a record from temp_a
read a record from temp_b
while NOT eof on temp_a and temp_b
loop
if ( temp_a.key = temp_b.key ) then output joined record and advance both
elsif ( temp_a.key < temp_b.key ) then read a record from temp_a
else read a record from temp_b
end loop

(it's more complex than that; the above logic assumes the join key is unique -- we really need to join every match in temp_a to every match in temp_b, but you get the picture)
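The loop above can be sketched in executable form. This is a minimal Python illustration, not Oracle's implementation; the function name `sort_merge_join`, the tuple row format, and the `key` extractor are all invented for the example. It also handles the duplicate-key case the pseudocode glosses over, by joining every matching row on one side to every matching row on the other:

```python
def sort_merge_join(a, b, key):
    """Join two row lists on key(row): sort both, then merge in one pass."""
    a = sorted(a, key=key)          # "read A and sort by join key to temp_a"
    b = sorted(b, key=key)          # "read B and sort by join key to temp_b"
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        ka, kb = key(a[i]), key(b[j])
        if ka < kb:
            i += 1                  # advance the side with the smaller key
        elif ka > kb:
            j += 1
        else:
            # equal keys: find the full run of duplicates on each side...
            i2, j2 = i, j
            while i2 < len(a) and key(a[i2]) == ka:
                i2 += 1
            while j2 < len(b) and key(b[j2]) == kb:
                j2 += 1
            # ...and emit the cross product of the two runs
            for ra in a[i:i2]:
                for rb in b[j:j2]:
                    out.append((ra, rb))
            i, j = i2, j2
    return out
```

For instance, `sort_merge_join([(1, 'a'), (2, 'b')], [(2, 'x')], key=lambda r: r[0])` pairs the two rows that share key 2.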

The hash join is conceptually like:

create a hash table on one of A or B (say A) on the join key creating temp_a.

while NOT eof on B
read a record in b
hash the join key and look up into temp_a by that hash key for matching
records
output the matches
end loop

So, a hash join can sometimes be much more efficient (one hash, not two sorts).
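The same contrast in executable form, as a hedged Python sketch (the function name and row format are invented; Oracle's real hash table is a partitioned in-memory/on-disk structure): one pass to build a hash table on A, one pass over B to probe it.

```python
from collections import defaultdict

def hash_join(build_side, probe_side, key):
    """Build a hash table on one input, probe it with the other."""
    ht = defaultdict(list)
    for row in build_side:                    # "create a hash table on A"
        ht[key(row)].append(row)
    out = []
    for row in probe_side:                    # "while NOT eof on B"
        for match in ht.get(key(row), []):    # "look up into temp_a"
            out.append((match, row))          # "output the matches"
    return out
```

Duplicate keys fall out naturally here: the hash table holds a list of rows per key, so every probe row joins to every build row with the same key.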

In most cases, hash joins are used any time a sort merge might be used. If you don't see hash joins going on, perhaps you have hash_join_enabled turned off...





Comments

Re: Hash and Sort Merge joins

Logan Palanisamy, March 19, 2002 - 9:09 pm UTC

Very good.



are you saying...

Nag, March 19, 2002 - 10:05 pm UTC

That

1. Hash join is more beneficial in most cases than sort merge join? The way you have expressed your opinion, it appears that we should choose hash join over sort merge join.

2. When should we use a sort merge join and when a hash join? (i.e. when we are using the CBO, where we can force tables to do a particular kind of join)

3. Which versions introduced both of these joins?
Thanks



Tom Kyte
March 20, 2002 - 11:28 am UTC

1) you should choose nothing, the optimizer should be making this choice for you.

2) when the optimizer says so.

3) sort merge has been around forever. hash joins were 7.2 or 7.3...



What is the hash function used?

Logan Palanisamy, March 20, 2002 - 11:12 am UTC

What is the Hash function that is normally used?

Also, I would assume it is the bigger of the two tables involved in the join that is hashed. Am I correct?

Tom Kyte
March 20, 2002 - 2:42 pm UTC

It is an internal hash function, not documented.

It would seem more logical to me to hash the SMALLER table (less temp stuff to store, might fit into memory and all)....


From the Designing/tuning for performance guide:

Hash Join

Oracle can only perform a hash join for an equijoin. Hash join is not available with the RBO. You must enable hash join optimization, using the initialization parameter HASH_JOIN_ENABLED (which can be set with the ALTER SESSION statement) or
the USE_HASH hint.

To perform a hash join, Oracle performs the following steps:
1. Oracle performs a full table scan on each of the tables and splits each into as many partitions as possible based on the available memory.

2. Oracle builds a hash table from one of the partitions (if possible, Oracle selects a partition that fits into available memory). Oracle then uses the corresponding partition in the other table to probe the hash table. All partition pairs that do not fit into memory are placed onto disk.

3. For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.
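The partitioning idea in steps 1-3 (often called a grace hash join) can be sketched as follows; this is a hypothetical Python illustration, not Oracle's code. Hashing the join key to pick a partition guarantees matching keys always land in the same partition pair, so each pair can be joined independently (and, in the real thing, spilled to disk when it doesn't fit in memory):

```python
def partitioned_hash_join(a, b, key, n_partitions=8):
    """Split both inputs into partition pairs by join-key hash, then
    hash-join each pair, building on the smaller side of the pair."""
    parts_a = [[] for _ in range(n_partitions)]
    parts_b = [[] for _ in range(n_partitions)]
    for row in a:                              # step 1: partition both tables
        parts_a[hash(key(row)) % n_partitions].append(row)
    for row in b:
        parts_b[hash(key(row)) % n_partitions].append(row)
    out = []
    for pa, pb in zip(parts_a, parts_b):       # steps 2-3: join pair by pair
        swapped = len(pa) > len(pb)            # build on the smaller side
        build, probe = (pb, pa) if swapped else (pa, pb)
        ht = {}
        for row in build:
            ht.setdefault(key(row), []).append(row)
        for row in probe:
            for match in ht.get(key(row), []):
                # keep output order as (row_from_a, row_from_b)
                out.append((row, match) if swapped else (match, row))
    return out
```

Output order depends on which partition each key hashes to, which is why a real database makes no ordering promise for hash joins.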




Monitoring means

john, May 17, 2002 - 5:17 am UTC

Tom,

How can one monitor whether the hash table couldn't fit into hash_area_size?



Tom Kyte
May 17, 2002 - 10:11 am UTC

You can use the statistic "physical writes" to see this.  Consider:



ops$tkyte@ORA817DEV.US.ORACLE.COM> /*
DOC>drop table t;
DOC>
DOC>create table t as select * from all_objects;
DOC>*/

ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.name, b.value
  2  from v$statname a, v$mystat b
  3  where a.statistic# = b.statistic#
  4  and a.name = 'physical writes'
  5  /

NAME                                VALUE
------------------------------ ----------
physical writes                         0

ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set hash_area_size = 2048000;
Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM> select       * from t t1, t t2 where t1.object_id = t2.object_id;

22962 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=129 Card=22962 Bytes=4041312)
   1    0   HASH JOIN (Cost=129 Card=22962 Bytes=4041312)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)
   3    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)




Statistics
----------------------------------------------------------
          0  recursive calls
         24  db block gets
        828  consistent gets
        714  physical reads
          0  redo size
    4733537  bytes sent via SQL*Net to client
     170255  bytes received via SQL*Net from client
       1532  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22962  rows processed

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off
ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.name, b.value
  2  from v$statname a, v$mystat b
  3  where a.statistic# = b.statistic#
  4  and a.name = 'physical writes'
  5  /

NAME                                VALUE
------------------------------ ----------
physical writes                       714

<b>that shows it took 714 physical writes to do this query -- they were due to the 2m hash area size.  If we make that 20m instead:</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set hash_area_size = 20480000;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM> select       * from t t1, t t2 where t1.object_id = t2.object_id;

22962 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=43 Card=22962 Bytes=4041312)
   1    0   HASH JOIN (Cost=43 Card=22962 Bytes=4041312)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)
   3    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)




Statistics
----------------------------------------------------------
          0  recursive calls
         24  db block gets
       2149  consistent gets
          0  physical reads
          0  redo size
    4536381  bytes sent via SQL*Net to client
     170255  bytes received via SQL*Net from client
       1532  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22962  rows processed

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off
ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.name, b.value
  2  from v$statname a, v$mystat b
  3  where a.statistic# = b.statistic#
  4  and a.name = 'physical writes'
  5  /

NAME                                VALUE
------------------------------ ----------
physical writes                       714


<b>it didn't add any physical writes, no "swapping" if you will.</b>

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter session set hash_area_size = 2048000;

Session altered.

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace traceonly
ops$tkyte@ORA817DEV.US.ORACLE.COM> select       * from t t1, t t2 where t1.object_id = t2.object_id;

22962 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=129 Card=22962 Bytes=4041312)
   1    0   HASH JOIN (Cost=129 Card=22962 Bytes=4041312)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)
   3    1     TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=22962 Bytes=2020656)




Statistics
----------------------------------------------------------
          0  recursive calls
         24  db block gets
        828  consistent gets
        714  physical reads
          0  redo size
    4733537  bytes sent via SQL*Net to client
     170255  bytes received via SQL*Net from client
       1532  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      22962  rows processed

ops$tkyte@ORA817DEV.US.ORACLE.COM> set autotrace off
ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.name, b.value
  2  from v$statname a, v$mystat b
  3  where a.statistic# = b.statistic#
  4  and a.name = 'physical writes'
  5  /

NAME                                VALUE
------------------------------ ----------
physical writes                      1428

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
<b>taking it back down to 2m, we see 714 writes once again...</b>
 

Very Good

kiro, May 18, 2002 - 4:29 am UTC


monitoring in parallel & which table was chosen as hash ...

Nemec, August 07, 2002 - 6:59 pm UTC

Hi Tom,

a) Could the above monitoring (swapping of hash area and other resource consumption) also be done in the case of running the hash join in parallel?

b) Can I see in an explain plan which of the two tables was chosen to be hashed (is it the first table)?

c) In the "Oracle9i Database Performance Guide and Reference
Release 1 (9.0.1)"
is a sentence for OUTER hash join I don't understand:
The outer table (whose rows are being preserved) is used to build the hash table, and the inner table is used to probe the hash table.

I would choose the opposite way (all rows from the outer table must be processed, whether or not they match the inner table). Can you explain?

Thanks


Nemec


What if the query is more complicated?

Matt, January 22, 2003 - 5:32 pm UTC

What if the query is more complicated and required some sorts as well as HASH JOINS?

Is this just a matter of identifying what space was used in the TEMPORARY tablespace for the sorts (using v$sort_usage and v$sort_segment etc.) and then subtracting this from the physical reads as you have described?

Are there any other ways that physical reads can occur when carrying out a SELECT?

Is there a nice neat query for some general SELECT that might be able to display how the physical reads are distributed amongst the various operations carried out during the generation of the row sets?

Thanks and Regards



Tom Kyte
January 22, 2003 - 5:52 pm UTC

In 9iR2 (9.2.0.2) yes -- tkprof does that

big_table@ORA920> set autotrace traceonly
big_table@ORA920> alter session set sql_trace=true;

Session altered.

big_table@ORA920> select a.object_name, b.object_type from big_table a, big_table b where a.object_id = b.object_id
2 and a.object_name like '%Z%';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=13710 Card=91696 Bytes=3209360)
   1    0   MERGE JOIN (Cost=13710 Card=91696 Bytes=3209360)
   2    1     SORT (JOIN) (Cost=2552 Card=91693 Bytes=2017246)
   3    2       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=2136 Card=91693 Bytes=2017246)
   4    1     SORT (JOIN) (Cost=11159 Card=1833857 Bytes=23840141)
   5    4       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=2136 Card=1833857 Bytes=23840141)



select a.object_name, b.object_type from big_table a, big_table b where a.object_id = b.object_id
and a.object_name like '%Z%'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    21298     10.32      11.54      13526      35232          0      319452
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    21300     10.35      11.56      13526      35232          0      319452

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64

Rows     Row Source Operation
-------  ---------------------------------------------------
 319452  HASH JOIN (cr=35232 r=13526 w=0 time=10359215 us)
   9338   TABLE ACCESS FULL BIG_TABLE (cr=13021 r=13011 w=0 time=3035938 us)
1000000   TABLE ACCESS FULL BIG_TABLE (cr=22211 r=515 w=0 time=2070694 us)

cr=consistent reads
r =physical reads
w =writes


Any workaround for 9.0.1.4.0?

Matt, January 22, 2003 - 8:11 pm UTC

Looks good. Is there some way of doing something like this in 9.0.1.4.0?

I can see JOIN operations in v$sql_plan but have no idea how any related info might be pulled out of here.

In your execution plan I see a Bytes value:

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=13710 Card=91696 Bytes=3209360)
   1    0   MERGE JOIN (Cost=13710 Card=91696 Bytes=3209360)
   2    1     SORT (JOIN) (Cost=2552 Card=91693 Bytes=2017246)
   3    2       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=2136 Card=91693 Bytes=2017246)
   4    1     SORT (JOIN) (Cost=11159 Card=1833857 Bytes=23840141)
   5    4       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=2136 Card=1833857 Bytes=23840141)

What does this (Bytes Value) indicate?

Also in your HASH JOIN operation I see "time=10359215 us":

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 64

Rows     Row Source Operation
-------  ---------------------------------------------------
 319452  HASH JOIN (cr=35232 r=13526 w=0 time=10359215 us)
   9338   TABLE ACCESS FULL BIG_TABLE (cr=13021 r=13011 w=0 time=3035938 us)
1000000   TABLE ACCESS FULL BIG_TABLE (cr=22211 r=515 w=0 time=2070694 us)

cr=consistent reads
r =physical reads
w =writes

What does this ("us") represent?



Tom Kyte
January 23, 2003 - 7:46 am UTC

bytes = estimated number of bytes flowing out of that step of the execution plan. Consider (timestamp = varchar2(19), secondary = varchar2(1))

big_table@ORA920> select timestamp from big_table;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1253 Card=1001693 Bytes=19032167)
   1    0   TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1253 Card=1001693 Bytes=19032167)



big_table@ORA920> select secondary from big_table;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1253 Card=1001693 Bytes=1001693)
   1    0   TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=1253 Card=1001693 Bytes=1001693)



as for the tkprof thingy -- 9.2.0.2 -- new feature. Not even in 9.2.0.1.


us - microsecond - 1,000,000th of a second




What are Anti-Join and Semi-Joins?

Logan Palanisamy, March 17, 2003 - 5:30 pm UTC

Tom,

I understand equi-joins, outer joins (full, left and right), inner joins, natural joins, hash joins, sort-merge joins, etc.

What are anti-joins and semi-joins? They are referenced in 9iR2 (HASH_AJ, HASH_SJ, MERGE_AJ, MERGE_SJ), but I couldn't find an explanation of what they basically are. Can you give some examples?

Tom Kyte
March 17, 2003 - 7:06 pm UTC

This sounds interesting - an "ANTI JOIN". What could this be? An ANTI join is used to return rows from a table that are not present in some other row source. For example:

Select * from dept where deptno NOT IN ( select deptno from emp )

Might use an ANTI Join between DEPT and EMP and return only those rows in DEPT that didn't join to anything in EMP. Another way to write that query would be:

Select dept.*
From dept, emp
Where dept.deptno = emp.deptno(+)
And emp.rowid is null;

There we outer join DEPT to EMP - and only keep the rows where the EMP row did not exist. Another way you might find this query written could be:

Select *
From dept
Where NOT EXISTS
( select null from emp where emp.deptno = dept.deptno );

or

select *
from dept
where deptno NOT IN ( select deptno from emp );


An ANTI Join is generally the most efficient method of processing queries of this nature.
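A hash-based anti-join can be sketched in a few lines of Python (purely illustrative; the function name and row format are invented). Note one SQL subtlety the sketch ignores: NOT IN returns no rows at all if the subquery produces a NULL, which is one reason the NOT EXISTS form is often preferred:

```python
def hash_anti_join(outer, inner, key):
    """Keep only outer rows whose join key appears nowhere in inner."""
    inner_keys = {key(row) for row in inner}   # one pass to build the key set
    return [row for row in outer if key(row) not in inner_keys]
```

With DEPT rows keyed on deptno and EMP rows likewise, this returns exactly the departments with no employees, like the NOT IN / outer-join queries above.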


Semi-Joins are joins like:

select * from dept where deptno in ( select deptno from emp );

or

select * from dept where exists ( select null from emp where emp.deptno = dept.deptno )


we cannot turn either the IN or the EXISTS into a straight JOIN -- that would return too many rows from DEPT (DEPT would be multiplied by EMP). Hence, we do what is known as a semi join. We join DEPT to EMP but just stop after the first hit on EMP for any DEPT record. It is not a full join, just a partial or semi join.
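That "stop after the first hit" behavior is what a semi-join adds over a plain join. A minimal Python sketch (names invented for illustration): using a set of inner keys means each outer row is emitted at most once, however many inner rows match it.

```python
def hash_semi_join(outer, inner, key):
    """Keep each outer row at most once if any inner row shares its key."""
    inner_keys = {key(row) for row in inner}   # membership test = "first hit"
    return [row for row in outer if key(row) in inner_keys]
```

Contrast with a plain join: DEPT 10 joined to three EMP rows would appear three times; the semi-join returns it exactly once, matching the IN / EXISTS semantics.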

Optimizer choice – Hash vs. Nested loops.

Hirisave Pratap, May 08, 2003 - 12:31 pm UTC

In order to analyze a performance problem, I recreated the production database on another machine and used the same initialization parameters. When the query was run in the test bed, the optimizer chose a nested loops join, whereas the production database uses a hash join.

To be sure, it was verified that the number of records in the underlying tables, indexes, view definitions, etc. are identical in both databases. Also the name and value from v$parameter in both databases matched. The hash join is enabled in both database, SGAs are of same size, etc. (trust me :-)). The disk cluster sizes (RAID5) are different for the two databases.

What is the logic behind the choices made (hash join vs. nested loop) by the two optimizers?

#2. In analyzing further, it was observed that the select count(*) from one of the underlying tables resulted INDEX (FAST FULL SCAN) in one database and INDEX (RANGE SCAN) in the other database.

What could account for this difference?

Thanks in advance and appreciate your input!

Pratap


Tom Kyte
May 09, 2003 - 11:55 am UTC

search this site for 10053

run that sort of trace (you only need to explain plan the query after setting that event)

then, compare the trace files and see what is really different -- especially some init.ora parameters.

things like cluster factor, stale stats, different settings will do this. The software, given the same inputs, will come to the same conclusion. So, inputs must not be the same (or the software isn't the same)

Few questions regarding Tuning

Vikas, May 27, 2003 - 5:50 am UTC

Q.1 Under Scott schema, I executed a query :

Select * from emp,dept where emp.deptno = dept.deptno, in which the expected result is 14 rows. With such a small set of rows returned, it is preferable to have a NESTED LOOPS join access method instead of a HASH JOIN, which is preferred when the data set after the join is more than 10,000 rows (per the server tuning documentation).

Tables are analysed using DBMS_STATS.gather_schema_stats

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=19 Bytes=1083)
   1    0   HASH JOIN (Cost=5 Card=19 Bytes=1083)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=80)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=19 Bytes=703)

Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
12 consistent gets
0 physical reads
0 redo size
2149 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed

where as

Select empno,ename,mgr,sal,comm,emp.deptno from emp,dept where emp.deptno = dept.deptno; results into the explain plan

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=19 Bytes=494)
   1    0   NESTED LOOPS (Cost=2 Card=19 Bytes=494)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=19 Bytes=437)
   3    1     INDEX (UNIQUE SCAN) OF 'PK_DEPT' (UNIQUE)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
1291 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed

which is using NESTED LOOPS. It is also observed that as soon as a column from the second table is asked for retrieval, the HASH JOIN takes over from NESTED LOOPS.

Select empno,ename,mgr,sal,comm,emp.deptno,dname from emp,dept where emp.deptno = dept.deptno;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=19 Bytes=684)
   1    0   HASH JOIN (Cost=5 Card=19 Bytes=684)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=52)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=19 Bytes=437)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
8 consistent gets
0 physical reads
0 redo size
1482 bytes sent via SQL*Net to client
510 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
14 rows processed

Can you explain why it is not basing the choice on the number of rows retrieved by the join? Under what conditions does a HASH JOIN supersede a NESTED LOOPS join?

Q.2 From the Oracle documentation <Quote>

Cardinality represents the number of rows in a row set. Here, the row set can be a base table, a view, or the result of a join or GROUP BY operator. <Quote>

Should it not be the distinct values of the row set and not just the number of rows? Moreover, can you please explain the concepts of effective cardinality and join cardinality using the EMP and DEPT tables, with some WHERE condition so that selectivity comes into the picture.

Thanks in anticipation.
Vikas










Tom Kyte
May 27, 2003 - 7:52 am UTC

hash join supersedes nested loops whenever the CBO decides it should.  There is no cut and dried formula here (well, there is, but it is huge). There are dozens of variables that affect this (init.ora settings).

As soon as you went for a plan that had to hit all of the table blocks in both tables, nested loops becomes much less "appealing".  For example:

ops$tkyte@ORA920> create table emp as select * from scott.emp;

Table created.

ops$tkyte@ORA920> alter table emp add constraint emp_pk primary key(empno);

Table altered.

ops$tkyte@ORA920> create table dept as select * from scott.dept;

Table created.

ops$tkyte@ORA920> alter table dept add constraint dept_pk primary key(deptno);

Table altered.

ops$tkyte@ORA920> analyze table emp compute statistics;

Table analyzed.

ops$tkyte@ORA920> analyze table dept compute statistics;

Table analyzed.

ops$tkyte@ORA920>
ops$tkyte@ORA920> show parameter optimizer_index

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920>
ops$tkyte@ORA920> select * from emp, dept where emp.deptno = dept.deptno;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=14 Bytes=700)
   1    0   HASH JOIN (Cost=5 Card=14 Bytes=700)
   2    1     TABLE ACCESS (FULL) OF 'DEPT' (Cost=2 Card=4 Bytes=72)
   3    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=14 Bytes=448)



ops$tkyte@ORA920> alter session set optimizer_index_cost_adj = 10;

Session altered.

ops$tkyte@ORA920> alter session set optimizer_index_caching  = 90;

Session altered.

ops$tkyte@ORA920> select * from emp, dept where emp.deptno = dept.deptno;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=14 Bytes=700)
   1    0   NESTED LOOPS (Cost=4 Card=14 Bytes=700)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=2 Card=14 Bytes=448)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'DEPT' (Cost=2 Card=1 Bytes=18)
   4    3       INDEX (UNIQUE SCAN) OF 'DEPT_PK' (UNIQUE)



ops$tkyte@ORA920> set autotrace off


q2) no, it should not be the number of distinct values.  It is the number of rows -- period.

if I "select 1 from all_objects" should the cardinality of that be:

a) 1
b) 32,000 (on my system all_objects view has about 32k rows)


the only sensible answer is b) 

Cardinality

Vikas, May 28, 2003 - 6:12 am UTC

Hi Tom,

Agreed. If the row set as mentioned by you gives you 32K rows then the base cardinality for that row set has to be 32K & not the distinct values which is 1 in that case.

But if we have a column, say "sex", with values M or F stored, and the table has 32K rows, then what is the base cardinality of that column? I think it is 2 and not 32K, because this is the only way to identify the low cardinality columns for creating bitmap indexes.

Please do correct me if I am wrong somewhere. Also, please explain the meaning of:

Select emp.* from emp,dept where emp.deptno = dept.deptno;

1. What is the join cardinality of this query and how is it calculated? How does the optimizer know at compile time what the selectivity of the join is, given that the join cardinality is the

product of the two cardinalities * the selectivity of the join condition?

Thanks
Vikas
Please illustrate it with an example demonstrating two tables.



Tom Kyte
May 28, 2003 - 7:10 am UTC

no, the number of distinct values is 2.

the cardinality -- as defined by the definition of cardinality -- is 32k

search this site for 10053 to see some examples of how to get low level info such as you appear to want to see from the optimizer (however, I won't really help you parse a 10053 trace file too deeply; it is generally "more than we need to know" and very version specific -- mostly useful for development to fix an issue)

Nested Loop VS Hash Join

vinodhps, May 28, 2003 - 8:57 am UTC

Hi Tom ,
As you said, the CBO chooses the join depending on some of the parameters in the init.ora.

question

1) When I execute the query in the test database (9.0.2.0) it comes back very fast, within a minute, whereas in the production database (8.0.4.0) it takes more than 4 hrs. Why is that, and why are the plans totally different?

2) Some queries like this take so much time that I am not able to trace them; they take hours, so I abruptly terminate the process. Is there any way to trace (10046, 10053) these kinds of queries (e.g. something like explain plan set statement_id for the query)? If I terminate the process I get only empty trace files. Please suggest an easy method to trace these queries.

3) Regarding your answer on cardinality: sometimes it shows the total number of rows for the subset, but sometimes it seems to be a distinct value. In the plan below it shows Rows=3, whereas the query returns 1294 rows. What is that 3? Can you please explain, Tom?

Thanks in advance and your Response are really great.

oracle version 8.0.4

Query Plan
-------------------------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=4246 Rows=3 Bytes=496
SORT UNIQUE
UNION-ALL
FILTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
HASH JOIN
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX RANGE SCAN PRD_IDX [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_CONS_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CONS_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_IND_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_IND_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
INDEX RANGE SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]
FILTER
SORT GROUP BY
FILTER
HASH JOIN OUTER
FILTER
NESTED LOOPS OUTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS FULL T_ORD_SCHDL_DTLS [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX RANGE SCAN PRD_IDX [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CONS_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CONS_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_CRED_LMT_MST [ANALYZED]
INDEX RANGE SCAN CS_MKS_CONS_CRED_LMT_MST_PK [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
INDEX RANGE SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]
FILTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
HASH JOIN
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX RANGE SCAN PRD_IDX [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_CONS_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_EXPORT_DTLS [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_EXPORT_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_IND_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_IND_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
INDEX RANGE SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]

95 rows selected.




oracle version 9.0.2


oiisdba@OIIS8.WORLD> @exp

Query Plan
---------------------------------------------------------------------------------------------------
SELECT STATEMENT [CHOOSE] Cost=3905 Rows=3 Bytes=897
SORT UNIQUE
UNION-ALL
FILTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX RANGE SCAN PRD_IDX [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
VIEW
FILTER
NESTED LOOPS OUTER
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_CRED_LMT_MST [ANALYZED]
INDEX RANGE SCAN CS_MKS_CONS_CRED_LMT_MST_PK [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CONS_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
VIEW
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_IND_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_IND_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
INDEX FAST FULL SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]
FILTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS FULL T_ORD_SCHDL_DTLS [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_PRD_DTL_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
VIEW
FILTER
NESTED LOOPS OUTER
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_CRED_LMT_MST [ANALYZED]
INDEX RANGE SCAN CS_MKS_CONS_CRED_LMT_MST_PK [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CONS_MST_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
INDEX FAST FULL SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]
FILTER
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
NESTED LOOPS
TABLE ACCESS BY INDEX ROWID T_ORD_PRD_DTLS [ANALYZED]
INDEX RANGE SCAN PRD_IDX [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_ORDER_HDR [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_HDR_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
VIEW
FILTER
NESTED LOOPS OUTER
TABLE ACCESS FULL T_MKS_CONS_MST [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_MKS_CONS_CRED_LMT_MST [ANALYZED]
INDEX RANGE SCAN CS_MKS_CONS_CRED_LMT_MST_PK [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_ORD_EXPORT_DTLS [ANALYZED]
INDEX UNIQUE SCAN CS_ORD_EXPORT_PK [ANALYZED]
TABLE ACCESS BY INDEX ROWID T_PIS_CITY_MST [ANALYZED]
INDEX UNIQUE SCAN CS_CITY_MST_PK [ANALYZED]
VIEW
FILTER
SORT GROUP BY
HASH JOIN OUTER
VIEW
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL T_MKS_IND_MST [ANALYZED]
TABLE ACCESS FULL T_MKS_IND_CRED_LMT_MST [ANALYZED]
TABLE ACCESS FULL T_FAS_SALE_AWA [ANALYZED]
INDEX FAST FULL SCAN CS_MKS_STATE_HDAY_MST_PK [ANALYZED]

102 rows selected.


Tom Kyte
May 28, 2003 - 6:37 pm UTC

1) you confuse me

"test is 902"
"prod is 804"

that cannot be. that may be a test instance -- but for some other application entirely.


804 and 902, besides being 5 generations and as many years apart, will respond entirely differently.


I pray you don't think testing on 902 can be done for 804.

Hash join Vs Nested Loop

Vinodhps, May 28, 2003 - 10:50 am UTC

Hi Tom,
Sorry, I gave the wrong versions. The first plan is from 9.0.2 (test database) and the second one is from 8.0.4 (live database); I had them reversed. Sorry.

I would be happy to know the difference between the plans and why the test database is faster.

Thank you

Hi Tom

vinodhps, May 28, 2003 - 11:59 pm UTC

Hi Tom,
Don't worry; due to a server problem we have not upgraded yet, and we already got a server with 2 terabytes. At the end of June we will upgrade to 9.0.2.

Please, can you tell me what makes the query take 4 to 5 hours and what the exact problem in the plan is,

so that it will be helpful for me to learn about the plan.

Thanks,


Tom Kyte
May 29, 2003 - 8:08 am UTC

I cannot, insufficient data (and no, I don't want the extra data here -- it is not relevant to the original question that was asked)

Hash join two large tables

Tony, May 29, 2003 - 5:31 am UTC

Here is an excerpt of a statement from a dba when asked about poor performance on a large load script, which was causing a hash_join of two tables.

"This is because a small set of data being joined to a large set of data is good for Hash Join. but two large
data sets of a similar size is known to perform badly. A NL is better in that circumstance."

Is this true ??? If it is true why does our optimizer choose to hash join. We have a large hash area and have shown that the hash table has been stored in memory.



Tom Kyte
May 29, 2003 - 11:22 am UTC

that dba is smoking something. BULK OPERATIONS like a hash join become even more superior as the sizes go up.

Think about what a NL join involves:

for every row in big table 1 (read using good multi-block io)
loop
    for every matching entry in an index range scan
    loop
        do a single block io on big table 2
    end loop
end loop

that inner loop is executed for each row in big table 1. that is hugely expensive.
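To make the cost difference concrete, the two loop structures can be sketched in Python (purely an illustrative model of the algorithms, not Oracle internals; the function and column names here are invented for the example):

```python
# Illustrative sketch: compare the work done by a hash join versus a
# nested-loop join over the same two row sources.

def hash_join(rows_a, rows_b, key):
    # Build a hash table on one input once...
    ht = {}
    for row in rows_a:
        ht.setdefault(row[key], []).append(row)
    # ...then make a single pass over the other input, probing by key.
    out = []
    for row in rows_b:
        for match in ht.get(row[key], []):
            out.append({**match, **row})
    return out

def nested_loop_join(rows_a, rows_b, key):
    # For EVERY outer row, search the inner input for matches; this
    # repeated inner work is what becomes hugely expensive at scale.
    out = []
    for outer in rows_a:
        for inner in rows_b:
            if inner[key] == outer[key]:
                out.append({**outer, **inner})
    return out

a = [{"id": i, "a_val": i * 10} for i in range(1000)]
b = [{"id": i, "b_val": i * 100} for i in range(1000)]

# Both produce the same join result; the hash join touches each input
# once, while the nested loop does len(a) * len(b) comparisons.
assert sorted(r["id"] for r in hash_join(a, b, "id")) == \
       sorted(r["id"] for r in nested_loop_join(a, b, "id"))
```

The hash join reads each input exactly once; the nested loop's inner scan runs once per outer row, which is why it falls further behind as both inputs grow.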

Consider -- big_table and big_table2 are two 3.8 million row tables. Tell me, which query would you like to use? which query do you think scales "better"?


big_table@ORA920> set echo on
big_table@ORA920>
big_table@ORA920> set timing on
big_table@ORA920> set autotrace traceonly
big_table@ORA920> select *
2 from big_table2 t1, big_table t2
3 where t1.id = t2.id
4 /

3817191 rows selected.

Elapsed: 00:10:10.13

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=27627 Card=3817191 Bytes=778706964)
1 0 HASH JOIN (Cost=27627 Card=3817191 Bytes=778706964)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE2' (Cost=5303 Card=3817191 Bytes=389353482)
3 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=5261 Card=3817191 Bytes=389353482)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
109833 consistent gets
218828 physical reads
0 redo size
609541217 bytes sent via SQL*Net to client
2799768 bytes received via SQL*Net from client
254481 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3817191 rows processed

big_table@ORA920> select /*+ use_nl(t1,t2) index(t1 big_table2_pk) */ *
2 from big_table2 t1, big_table t2
3 where t1.id = t2.id
4 /

3817191 rows selected.

Elapsed: 00:32:05.67

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7639643 Card=3817191 Bytes=778706964)
1 0 NESTED LOOPS (Cost=7639643 Card=3817191 Bytes=778706964)
2 1 TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=5261 Card=3817191 Bytes=389353482)
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE2' (Cost=2 Card=1 Bytes=102)
4 3 INDEX (UNIQUE SCAN) OF 'BIG_TABLE2_PK' (UNIQUE) (Cost=1 Card=1)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12011606 consistent gets
3857264 physical reads
0 redo size
540186137 bytes sent via SQL*Net to client
2799768 bytes received via SQL*Net from client
254481 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3817191 rows processed

big_table@ORA920> set autotrace off

Choosing between Hash Join and Nested Loop or Merge Join

Vivek Sharma, May 29, 2003 - 4:23 pm UTC

Dear Tom,

Thanks for your support. This article was very helpful to understand the differences in all of these joins.

What I want to know is: when writing a query on some big tables, say 4 tables each containing around 150,000 (1.5 lakh) rows, and assuming the query will return around 80,000 to 90,000 rows, then going by this article, hash joins are better than nested loop or sort merge joins. Should we provide a hint to use a hash join, or will the optimizer choose the best for us?

Thanks in advance.
Regards
Vivek Sharma

Tom Kyte
May 29, 2003 - 6:52 pm UTC

NO HINTS, let the optimizer do its job...

Maybe next year, a hash join isn't right and something else (that doesn't even exist) is. Hints preclude you from getting it.

Hash vs Nested join

Vinodhps, May 30, 2003 - 4:22 am UTC

Hi... tom ,
thanks for your response..
1) Some queries take so much time that I am not able to trace them; they run for hours, so I abruptly terminate the process. Is there any way to trace (10046, 10053) these kinds of queries?
If I terminate the process I get only empty trace
files. Please suggest an easy method to trace these queries.
I could not find anything regarding this in your book.

2) Regarding your answer on cardinality showing the total number of rows for the subset: when I explain plan my query, it shows Card=3 whereas the query returns 1294 rows. What is that 3? Can you please explain the concept behind that, Mr. Tom?

Thanks

Tom Kyte
May 30, 2003 - 8:08 am UTC

1) don't know what to tell you. you kill them, they stop running, of course the trace is empty.

10053 doesn't require running the query, just explain plan it.


2) i cannot explain anything without seeing it (NO do not put it here). that is an estimated cardinality. It is based on statistics. It is a mathematical function. It is rarely 100% accurate (think about it, every set of inputs to that query would have to return exactly 3 rows -- that's not possible).

I would not be overly concerned with the difference between 3 and 1294 at the topmost level. There is some set of inputs (binds) to that query that would probably come much closer or the stats are slightly out of date or the data is skewed slightly and you don't have histograms...

do not believe everything CBO chooses

A reader, January 06, 2004 - 9:12 am UTC

Hi

Do not believe everything the CBO chooses; we have had very bad experiences in all versions from 8.0.6 to 9.2.0.4. It often chooses a merge join when it should use a hash join. One example: one of our batch jobs running on 9.2.0.4 used to take 8 hours; now it takes 45 minutes. All we did was force it to use a hash join instead of the merge join, which the CBO thought was best.

hash_area_size

A reader, February 20, 2004 - 4:46 am UTC

Hi Tom,
If I am using pga_aggregate_target in 9ir2, and execution plan showing hash joins, Do I need to tune hash_area_size?
Thanks

Tom Kyte
February 20, 2004 - 9:39 am UTC

well, it depends on the workarea_size_policy init.ora setting. when set to AUTO then No -- hash_area_size would only be used by a shared server connection in 9i when using AUTO workareas (and in 10g, even shared servers won't use it.)

if using manual, then yes, hash_area_size is the setting the server will use to hash.

Sort Join and Hash Join

Vivek Sharma, April 15, 2004 - 2:02 pm UTC

Dear Tom,

In your Book "Effective Oracle by Design" on page 356 you have demonstrated the execution plan of a query with different settings of Sort_area_size and hash_area_size. In the example, when the sort_area_size was 100MB and hash_area_size was 200MB, the optimizer chose hash join whereas when the sort_area_size was reduced to 64k and hash_area_size to 128k, the optimizer chose sort merge join. Why is it so ? In both the situation, the hash area size was twice the value of sort_area, then why the plans are different ?

Your explanation will help me understand this in much better way.

Thanks and Regards
Vivek

Tom Kyte
April 15, 2004 - 2:28 pm UTC

because the hash area size was deemed too small to be efficient anymore. sort merge became more efficient. they are different algorithms (i describe their implementations in that book as well)

given their algorithms, it is easier/more efficient to sort in a smaller space using lots of temp than it is to hash.


hash table selection in hash join

Sami, May 06, 2004 - 11:15 am UTC

Dear Tom,

The tkprof output shows like below for my SQL.

Tables C1 (242 rows) and P1 (7568 rows) are involved in a HASH join.

My question is: why is Oracle trying to keep the BIG table (P1, which has more rows than C1) in memory?

In other words, how does P1 become the hash table instead of C1 (the lookup table)?

Rows Row Source Operation
------- ---------------------------------------------------
2 COUNT STOPKEY
2 VIEW
2 SORT ORDER BY STOPKEY
2 NESTED LOOPS
7568 HASH JOIN
7567 TABLE ACCESS BY INDEX ROWID P1
7568 INDEX RANGE SCAN (object id 293257)
242 TABLE ACCESS FULL C1
2 TABLE ACCESS BY INDEX ROWID E1
13869 INDEX UNIQUE SCAN (object id 188447)

Thanks in advance

Tom Kyte
May 06, 2004 - 2:36 pm UTC

what does the *explain* plan look like.

(that and 7,500 is "small", but let's see the PLAN from autotrace)

autotrace output

Sami, May 06, 2004 - 3:13 pm UTC

Execution Plan
--------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=90 Card=4 Bytes=896)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=90 Card=4 Bytes=896)
3 2 SORT (ORDER BY STOPKEY) (Cost=90 Card=4 Bytes=640)
4 3 NESTED LOOPS (Cost=80 Card=4 Bytes=640)
5 4 HASH JOIN (Cost=72 Card=4 Bytes=432)
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'P1' (Cost=69 Card=46 Bytes=4324)
7 6 INDEX (RANGE SCAN) OF 'P1_IX05' (NON-UNIQUE) (Cost=2 Card=46)
8 5 TABLE ACCESS (FULL) OF 'C1' (Cost=2 Card=242 Bytes=3388)
9 4 TABLE ACCESS (BY INDEX ROWID) OF 'E1' (Cost=2 Card=353 Bytes=18356)
10 9 INDEX (UNIQUE SCAN) OF 'E1_PK' (UNIQUE) (Cost=1 Card=353)


Statistics
----------------------------------------------------------
197 recursive calls
34 db block gets
23771 consistent gets
1418 physical reads
4284 redo size
1102 bytes sent via SQL*Net to client
431 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed



Tom Kyte
May 06, 2004 - 8:32 pm UTC

2 COUNT STOPKEY
2 VIEW
2 SORT ORDER BY STOPKEY
2 NESTED LOOPS
7568 HASH JOIN
7567 TABLE ACCESS BY INDEX ROWID P1
7568 INDEX RANGE SCAN (object id 293257)
242 TABLE ACCESS FULL C1
2 TABLE ACCESS BY INDEX ROWID E1
13869 INDEX UNIQUE SCAN (object id 188447)


see the difference between "reality" and "the guess"

the guess said "46 rows", reality said "7,568 rows". But that is OK, because Oracle has the ability to flip-flop the hasher and the hashee if it discovers at run time that it made the wrong choice.

runtime plan

Sami, May 17, 2004 - 9:53 am UTC

Dear Tom,

You mean to say that Oracle may use a different plan than what we see in tkprof. So where can I see the actual "runtime" plan?

For the same query, explain plan (using autotrace) can show plan1 and tkprof can show plan2, BUT at runtime it may use plan3?

explain plan ==> tkprof ==> runtime
plan1 ==> plan2 ==> plan3



Tom Kyte
May 17, 2004 - 3:10 pm UTC

tkprof (without explain=u/p) has REALITY.

tkprof never lies.

autotrace can "lie", not really -- it has less/different information.

Hash Join

Ruchi, June 16, 2005 - 7:43 am UTC

Hi Tom,

I was optimizing a query which contains equijoins on 2-3 tables. The explain plan shows that hash join is used along with Full table scans for 2 tables. The tables have approx 23000 rows in them.

In my opinion, as it's a hash join, we cannot remove the full table scans because that is the basic thing the optimizer will do while applying the hash join.

But my colleague suggested that there has to be a way out.

Also, is it true that a query consisting of only equijoins on 2 tables will always do full table scans, as no limiting condition has been imposed, and hence use a hash join for quite big tables (what are the limits)?

Thanks, Tom, for replying to this. I have an urgent time-bound commitment for responding to the tuning request. Thanks a lot!

Rgds/Ruchi.

Tom Kyte
June 16, 2005 - 9:58 am UTC

please do this:


a) say out loud "indexes are not all goodness"
b) say out loud "full scans are not evil"
c) goto (a) until you believe it.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154 <code>

will help them believe it.


if you are joining three tables together, all of the rows, why would you even consider an index.

indexes are good to get "few of the rows" from a table, not all of them

How CBO decides that which of the two tables is the smaller table

Vikas Khanna, July 14, 2005 - 4:15 am UTC

Hi Tom,

Please let us know the definition of the smaller table.

For each pair of partitions (one from each table), Oracle uses the smaller one to build a hash table and the larger one to probe the hash table.

My thoughts:
--> Does the CBO check the statistics of the two tables (A and B) and do a computation like this,
i.e. LEAST((NVL(A.num_rows,0) * NVL(A.avg_row_len,0)), (NVL(B.num_rows,0) * NVL(B.avg_row_len,0))),
and choose the table depending on the result?

If the above formulation is correct, then the CBO can mistakenly choose the wrong table as the smaller table if the statistics haven't been calculated using dbms_stats OR the analyze command.

Or is it something beyond this?

Kindly explain your thoughts!

Regards
Vikas Khanna

Tom Kyte
July 14, 2005 - 10:32 am UTC

the cbo uses cardinality guesses to determine the smaller table.
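A minimal sketch of that idea, assuming the simplified rows-times-row-length estimate proposed in the question above; the function name and stats layout are illustrative only, not the CBO's actual cost model:

```python
# Illustrative sketch: pick the build side of a hash join from estimated
# sizes (num_rows * avg_row_len). The real CBO uses its full cardinality
# and cost machinery, not this formula.

def pick_build_side(stats_a, stats_b):
    size_a = (stats_a.get("num_rows") or 0) * (stats_a.get("avg_row_len") or 0)
    size_b = (stats_b.get("num_rows") or 0) * (stats_b.get("avg_row_len") or 0)
    return "A" if size_a <= size_b else "B"

# With reasonable statistics, the genuinely smaller table builds the
# hash table:
fresh = pick_build_side({"num_rows": 242, "avg_row_len": 14},
                        {"num_rows": 7568, "avg_row_len": 94})
assert fresh == "A"

# Missing statistics (treated as zero here) can flip the choice, which
# is why stale or absent stats lead to the "wrong" build side:
stale = pick_build_side({}, {"num_rows": 7568, "avg_row_len": 94})
assert stale == "A"
```

The point of the sketch: whatever formula is used, it operates on cardinality guesses, so bad statistics mean a bad build-side choice.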

Why sorts show in the stats for a hash join?

naresh, December 24, 2005 - 11:24 am UTC

Hi Tom,

I have a table with below structure (it contains a hierarchy):

create table rol_itm (father_id number(10), child_id number(10), type_id number(2))

Test Data was populated using (basically will populate 2 million rows for parent-child, and 500K grand children):

insert /*+ APPEND */ all
when mod(id2,1) = 0 then
        into rol_itm (father_id, child_id, type_id) values (id, id2, 0)
when mod(id3,4) = 0 then
        into rol_itm (father_id, child_id, type_id) values (id2, id3, 0)
 select 100000000 + rn id , 140000000 + rn id2, 170000000 + rn id3
 from (select level rn from dual connect by level <=2000000)
/

A Hash hint was used in the query below (query is to get every non-direct parent for a child - in this test case, it will get the 500K grand parents):

SQL> l
  1  select 1000000 + rownum, father_id, child_id, -depth, substr(path, 2, instr(path,'/',2,1)-2) subtree_root
  2     from(
  3      select /*+ USE_HASH (rol_itm) */ father_id, child_id, level depth, SYS_CONNECT_BY_PATH(father_id, '/') path
  4      from rol_itm
  5      connect by prior child_id = father_id)
  6*    where depth > 1
SQL> /

500000 rows selected.

Elapsed: 00:00:53.79

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=82 Bytes=167362)
   1    0   COUNT
   2    1     VIEW (Cost=2 Card=82 Bytes=167362)
   3    2       FILTER
   4    3         CONNECT BY (WITH FILTERING)
   5    4           NESTED LOOPS
   6    5             TABLE ACCESS (FULL) OF 'ROL_ITM' (Cost=2 Card=82)
   7    5             TABLE ACCESS (BY USER ROWID) OF 'ROL_ITM'
   8    4           HASH JOIN
   9    8             CONNECT BY PUMP
  10    8             TABLE ACCESS (FULL) OF 'ROL_ITM' (Cost=2 Card=82 Bytes=2132)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    2514685  consistent gets
          0  physical reads
          0  redo size
   23488645  bytes sent via SQL*Net to client
     367314  bytes received via SQL*Net from client
      33335  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
     500000  rows processed


Note: This is being used in a data migration, where we do not do the statistics for anything on the staging database, so I have to give the hint explicitly (else create an index to use NL, which runs slower).

The question is: since a hash join is being used, why are those sorts showing up in the stats?

Thanks,
Naresh 

Tom Kyte
December 24, 2005 - 12:34 pm UTC

you have connect by's in there.

Why does connect by do a sort?

Naresh, December 25, 2005 - 4:03 am UTC

Hello Tom,

Why does a connect by do a sort? My guess at how it would work:

Read rol_itm and build a hash table for it
Read rol_itm and for each row
Output the row
Probe(child_id, path) --> path being the sys_connect_by_path string

Where the function probe(child_id, path) would be

probe (id, path) {
    if the hash table has an entry with father_id = id
        append its child_id to path and output the row
        probe (child_id of the row found, path)
}

Above would not need a sort, right?

However, it can be totally different from above of course, or I missed something.

Thanks,
Naresh
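
The traversal proposed above can be written out in Python (purely a model of the proposal in this comment, not of how Oracle's CONNECT BY is actually implemented; all names here are invented for the sketch):

```python
# Sketch of the proposed sort-free CONNECT BY: build a hash table keyed
# on father_id, then recursively probe it to emit each child with its
# path and depth. This models the question's idea only.

def expand(rows):
    children = {}  # father_id -> list of child_ids
    for father, child in rows:
        children.setdefault(father, []).append(child)

    out = []
    def probe(node, path, depth):
        for child in children.get(node, []):
            out.append((child, path + [node], depth))
            probe(child, path + [node], depth + 1)

    # Start from roots: fathers that appear as nobody's child.
    all_children = {c for _, c in rows}
    for father in children:
        if father not in all_children:
            probe(father, [], 1)
    return out

rows = [(1, 10), (1, 11), (10, 100)]
result = expand(rows)
# Grandchild 100 appears at depth 2 with path [1, 10]:
assert (100, [1, 10], 2) in result
```

As written, this needs no sort; whether the database can use such a plan is a separate question, which is the point of the exchange above.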

Tom Kyte
December 25, 2005 - 8:53 am UTC

but it "doesn't"

do a simple connect by on a single table - emp.

Semijoin

Duke Ganote, December 29, 2005 - 10:51 am UTC

Here
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3109440977616#8661334014331
you wrote:
> Semi-Joins are joins like:
> 
> select * from dept where deptno in ( select deptno from emp );
> 
> or
> 
> select * from dept where exists 
> ( select null from emp where dept.deptno = dept.deptno )
> 
> we cannot turn either of the in or exists into a straight JOIN -- that would 
> return too many rows from DEPT (dept would be multiplied by EMP).  Hence, we do 
> what is known as a semi join.  We join DEPT to EMP but just stop after the first 
> hit on EMP for any DEPT record.  It is not a full join, just a partial or semi 
> join. 

"<b>cannot turn...into a straight JOIN</b>" depends on your definition; the following

select DISTINCT d.* 
  from dept d inner join emp e
    on e.deptno = d.deptno

yields equivalent results.  This other equivalent:

select d.* 
  from dept d inner join 
       ( select DISTINCT deptno
           from emp ) e
    on e.deptno = d.deptno

is of course from
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:953229842074 <code>
right?

Tom Kyte
December 29, 2005 - 1:05 pm UTC

but that isn't turning it into a straight join, and the DISTINCT can change the answer in general (you are distincting the DEPT table -- what if the selected attributes from DEPT are not in fact distinct? you potentially change the cardinality with distinct)

In general, select distinct d.* will NOT return equivalent results.


And the second one is the semi-join in effect.


I'll stick with saying "cannot turn.... into a straight JOIN". can you write joins that in some cases work? sure, but not in general.

Distinctly so...

Duke Ganote, December 30, 2005 - 8:17 am UTC

> if the selected attributes from DEPT
> are not in fact distinct

If, in other words, DEPTNO were not the unique key of DEPT? Then the results would likely be different for

select DISTINCT d.*
from dept d inner join emp e
on e.deptno = d.deptno

I had noted that the original
> select * from dept where deptno in ( select deptno from emp )
is equivalent to
select * from dept where deptno in ( select DISTINCT deptno from emp )
which is easily transformed into the join form you proposed.

Tom Kyte
December 30, 2005 - 9:55 am UTC

No, I meant "if the set of attributes you selected were not distinct"


select location
from dept
where deptno in ( ..... );


location does not have to be distinct


But what I have been saying is you cannot casually convert an IN into a JOIN - you have to add distincts and other operations - it cannot just be CONVERTED from an IN to a JOIN without using inline views, distinct and other things.
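A small Python illustration of the cardinality point (a model only; the dept/emp data here are invented): the semi-join keeps each DEPT row at most once, the plain join multiplies rows, and DISTINCT over a non-unique projection collapses rows the semi-join would keep.

```python
# Illustrative sketch: semi-join vs plain join vs join-plus-DISTINCT.

dept = [{"deptno": 10, "loc": "NY"},
        {"deptno": 20, "loc": "NY"},
        {"deptno": 30, "loc": "LA"}]
emp  = [{"deptno": 10}, {"deptno": 10}, {"deptno": 20}]

def semi_join(dept, emp):
    # "Stop after the first hit" semantics: membership test via a set.
    emp_deptnos = {e["deptno"] for e in emp}
    return [d for d in dept if d["deptno"] in emp_deptnos]

def plain_join(dept, emp):
    # Returns the DEPT row once PER matching EMP row.
    return [d for d in dept for e in emp if d["deptno"] == e["deptno"]]

assert len(semi_join(dept, emp)) == 2   # dept 10 and 20, once each
assert len(plain_join(dept, emp)) == 3  # dept 10 duplicated by its two emps

# Projecting a non-unique column shows why DISTINCT is not a safe fix:
semi_locs = [d["loc"] for d in semi_join(dept, emp)]
distinct_join_locs = sorted({d["loc"] for d in plain_join(dept, emp)})
assert semi_locs == ["NY", "NY"]      # the IN-subquery keeps both rows
assert distinct_join_locs == ["NY"]   # DISTINCT collapses them
```

This is exactly the "you potentially change the cardinality with distinct" objection: the rewrite only works when the selected columns happen to be unique.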




what is actual sort/hash memory and temp space size needed?

jianhui, March 03, 2006 - 11:54 am UTC

Hello Tom,
If two join tables have a small number of rows but a very large number of blocks under the HWM, I am wondering how much space would be needed for a sort merge join as well as a hash join. Does Oracle need to use a bigger space to do the sort even when the number of rows is small (extreme case: let's say only 1 row per table, but there are tons of blocks under the HWM)? It would be nice to demo the test scenario for both sort and hash joins in this sparse-table case. I would guess Oracle does not need that much sort space, but I have not yet verified it.
Thank you


Tom Kyte
March 03, 2006 - 2:05 pm UTC

only enough to hold the actual data - it would read a row, hash it, put it into the hash table. "empty blocks" - they don't count, they will be "read" but there is nothing there.

would it be possible to estimate size of temp space needed

Jianhui, March 03, 2006 - 10:24 pm UTC

Thank you, Tom. So given the number of rows of the two tables, the average row length, and the size of the sort area, would it be possible to estimate the temp space needed for a sort merge join?

For instance, select * from a, b where a.id=b.id
a has 10M rows average row length 1 KBytes
b has 10M rows average row length 1 KBytes,
sort_area_size is 100MB
Best Regards,

Tom Kyte
March 04, 2006 - 7:04 am UTC

database can "guess" for us, but it'll be off.

There is zero percent chance anyone reasonable would consider a sort merge join for that.  They would use a hash join 


ops$tkyte@ORA10GR2> create table t1 ( x int, y char(1000) );

Table created.

ops$tkyte@ORA10GR2> create table t2 ( a int, b char(1000) );

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.set_table_stats( user, 'T1', numrows=>10000000, numblks => 10000000/7, AVGRLEN => 1024 );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> exec dbms_stats.set_table_stats( user, 'T2', numrows=>10000000, numblks => 10000000/7, AVGRLEN => 1024 );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select /*+ use_merge(t1 t2) */ *
  2    from t1, t2
  3   where t1.x = t2.a;

Execution Plan
----------------------------------------------------------
Plan hash value: 412793182

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Tim
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10M|    18G|       |  6670K  (1)| 08:
|   1 |  MERGE JOIN         |      |    10M|    18G|       |  6670K  (1)| 08:
|   2 |   SORT JOIN         |      |    10M|  9679M|    21G|  3335K  (1)| 04:
|   3 |    TABLE ACCESS FULL| T1   |    10M|  9679M|       |   440K  (1)| 00:
|*  4 |   SORT JOIN         |      |    10M|  9679M|    21G|  3335K  (1)| 04:
|   5 |    TABLE ACCESS FULL| T2   |    10M|  9679M|       |   440K  (1)| 00:
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."X"="T2"."A")
       filter("T1"."X"="T2"."A")

ops$tkyte@ORA10GR2> set autotrace off
 

Outstanding, as usual.

Paulo, March 22, 2007 - 2:58 pm UTC

Tom,

I'm reading the DB Tuning guide, and have some doubts regarding hash joins (all kinds of joins, really). They may sound silly, so bear with me for a while here...

1) You said (on the very first reply to this question):
"hash the join key and look up into temp_a by that hash key for matching records"

How is this fast? Do we loop on temp_a? Is it implemented as an array? What if it doesn't fit into memory?

2) I think there's a small mistake in the sort-merge algorithm you presented... shouldn't the line that reads
"then output joined record" read "then output joined record and read a record from temp_a"? Temp_a is the "driving" (or is it "outer") table, right?

3) When doing a nested loop, do we physically read the inner table more than once? From disk or memory? Must both tables fit into memory?

So many questions, huh?

Thanks again!
Paulo, from Brazil

PS: I'm using Firefox and sometimes when I post a review, it gets posted three times. I'm not pressing the Submit button three times, believe me :)
Tom Kyte
March 22, 2007 - 3:12 pm UTC

1) do you have access to my book Effective Oracle by Design? I covered in there the mechanics of a hash join....

2) it is even more complex than that since A and B might have the same record over and over again - it was a concept thing here....

3) it is just like it looks.

loop over outer table
   find all matching records in other table, we might have to FULL scan the other
   table over and over, we might use an index.  you cannot say if you will 'read'
   the table more than once, just that you will issue the logical equivalent of
   "select * from otherTable where key = x.key" over and over....


Alberto Dell'Era, March 22, 2007 - 4:27 pm UTC

> How is this fast? Do we loop on temp_a? Is it implemented as an array? What if it doesn't fit into memory?

After you have mastered the fundamentals in Tom's "Effective Oracle by Design", if you're still curious, take a look at Jonathan Lewis' "Cost Based Oracle" - one of the last chapters contains the best illustration of the Hash Join mechanics I've ever read. It covers the "What if it doesn't fit into memory" especially well.

Hash Joins

Paulo, March 23, 2007 - 1:17 pm UTC

Tom, I'm seriously considering buying your book, but it's very expensive here in Brazil... In the meantime, could you please point me someplace where I can learn this hash join stuff? I'm really interested... I think it's a very important thing to know, and the Tuning guide doesn't say much...

Thanks


Hash Join - Oracle 9i

Tacho, November 27, 2007 - 8:16 pm UTC

Hi Tom,

I tried to reproduce the example given on May 29, 2003, but could not do it at all.
The following is what I did:

1. Create the table "big_table" with your SQL script big_table.sql.

2. Create the table "big_table2" as in step 1.

3. Tables big_table and big_table2 have 3817191 rows each.

4. The runstats tool is being used.

here is the test:

big_table@ORA9I> set timing on
big_table@ORA9I>  select *
  2  from big_table2 t1, big_table t2
  3  where t1.id = t2.id
  4  /

3817191 rows selected.

Elapsed: 00:06:20.00

Execution Plan
----------------------------------------------------------                      
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=480543 Card=3817191           
          Bytes=763438200)                                                      
                                                                                
   1    0   MERGE JOIN (Cost=480543 Card=3817191 Bytes=763438200)               
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE2' (Cost=1362          
          48 Card=3817191 Bytes=381719100)                                      
                                                                                
   3    2       INDEX (FULL SCAN) OF 'BIG_TABLE2_PK' (UNIQUE) (Cost=17          
          214 Card=3817191)                                                     
                                                                                
   4    1     SORT (JOIN) (Cost=344295 Card=3817191 Bytes=381719100)            
   5    4       TABLE ACCESS (FULL) OF 'BIG_TABLE' (Cost=18069 Card=38          
          17191 Bytes=381719100)                                                
                                                                                


Statistics
----------------------------------------------------------                      
       1231  recursive calls                                                    
          2  db block gets                                                      
     756140  consistent gets                                                    
     374002  physical reads                                                     
        780  redo size                                                          
  638141775  bytes sent via SQL*Net to client                                   
    2799772  bytes received via SQL*Net from client                             
     254481  SQL*Net roundtrips to/from client                                  
         20  sorts (memory)                                                     
          1  sorts (disk)                                                       
    3817191  rows processed                                                     

big_table@ORA9I> 
big_table@ORA9I> exec runStats_pkg.rs_middle;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
big_table@ORA9I> select /*+ use_nl(t1,t2) index(t1 big_table2_pk) */ *
  2  from big_table2 t1, big_table t2
  3  where t1.id = t2.id
  4  /

3817191 rows selected.

Elapsed: 00:06:31.04

Execution Plan
----------------------------------------------------------                      
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3970875 Card=3817191          
           Bytes=763438200)                                                     
                                                                                
   1    0   NESTED LOOPS (Cost=3970875 Card=3817191 Bytes=763438200)            
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE2' (Cost=1362          
          48 Card=3817191 Bytes=381719100)                                      
                                                                                
   3    2       INDEX (FULL SCAN) OF 'BIG_TABLE2_PK' (UNIQUE) (Cost=17          
          214 Card=3817191)                                                     
                                                                                
   4    1     TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=2 Car          
          d=1 Bytes=100)                                                        
                                                                                
   5    4       INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=1          
           Card=1)                                                              
                                                                                




Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
   12342894  consistent gets                                                    
     272358  physical reads                                                     
          0  redo size                                                          
  638141775  bytes sent via SQL*Net to client                                   
    2799772  bytes received via SQL*Net from client                             
     254481  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
    3817191  rows processed                                                     

big_table@ORA9I> exec runstats_pkg.rs_stop
Run1 ran in 79590 hsecs                                                         
Run2 ran in 131477 hsecs                                                        
run 1 ran in 60,54% of the time                                                 


Run1 latches total versus runs -- difference and pct                            
Run1        Run2        Diff       Pct                                          
5,173,832  30,699,763  25,525,931     16.85%                                    


What am I doing wrong here? What am I missing?

why in 

big_table@ORA9I>  select *
  2  from big_table2 t1, big_table t2
  3  where t1.id = t2.id
  4  /

am I getting a MERGE JOIN instead of a HASH JOIN, as your example did?

There is not a huge difference in elapsed time between the SQLs run above.
Maybe the important thing here is the reduction in latches, since, as you say, "locks are serialization devices, serialization devices inhibit scalability in a multi-user environment".

Can you please give some thoughts on this?

Thanks beforehand.

Regards,
Tacho

Tom Kyte
November 28, 2007 - 10:41 pm UTC

what NON-DEFAULT parameters do you have set.

and what are your pga memory settings.

Hash Joins - Oracle 9i

Tacho, November 29, 2007 - 11:34 am UTC

Hi Tom, 

big_table@ORA9I> select name,value from v$pgastat ;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
aggregate PGA target parameter                                     16777216
aggregate PGA auto target                                           8736768
global memory bound                                                  838656
total PGA inuse                                                     7067648
total PGA allocated                                                15816704
maximum PGA allocated                                              16735232
total freeable PGA memory                                                 0
PGA memory freed back to OS                                               0
total PGA used for auto workareas                                         0
maximum PGA used for auto workareas                                  652288
total PGA used for manual workareas                                       0
maximum PGA used for manual workareas                                     0
over allocation count                                                     0
bytes processed                                                  2199501824
extra bytes read/written                                            1298432
cache hit percentage                                                  99,94

16 rows selected.

big_table@ORA9I> sho parameter _area_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------
bitmap_merge_area_size               integer     1048576
create_bitmap_area_size              integer     8388608
hash_area_size                       integer     1048576
sort_area_size                       integer     524288
workarea_size_policy                 string      AUTO

big_table@ORA9I> select name, value from v$parameter where isDefault='FALSE';

NAME                           VALUE
------------------------------ ----------------------------------------
processes                      150
timed_statistics               TRUE
shared_pool_size               50331648
large_pool_size                8388608
java_pool_size                 33554432
control_files                  c:\oracle9i\oradata\ORA9I\CONTROL01.CTL,
                               c:\oracle9i\oradata\ORA9I\CONTROL02.CTL,
                               c:\oracle9i\oradata\ORA9I\CONTROL03.CTL

db_block_size                  4096
db_cache_size                  33554432
compatible                     9.2.0.0.0
db_file_multiblock_read_count  8
db_create_file_dest            C:\ORACLE9I\ORADATA\ORA9I
fast_start_mttr_target         300
undo_management                AUTO
undo_tablespace                UNDOTBS1
undo_retention                 900
remote_login_passwordfile      EXCLUSIVE
db_domain
instance_name                  ORA9I
dispatchers                    (PROTOCOL=TCP) (SERVICE=ORA9IXDB)
job_queue_processes            10
hash_join_enabled              FALSE
background_dump_dest           c:\oracle9i\admin\ORA9I\bdump
user_dump_dest                 c:\oracle9i\admin\ORA9I\udump
core_dump_dest                 c:\oracle9i\admin\ORA9I\cdump
sort_area_size                 524288
db_name                        ORA9I
open_cursors                   300
star_transformation_enabled    FALSE
query_rewrite_enabled          FALSE
pga_aggregate_target           16777216
aq_tm_processes                1

31 rows selected.

database was created using dbca.

Regards.

Tom Kyte
November 29, 2007 - 7:05 pm UTC

hash_join_enabled              FALSE


as support would say:

tar closed.


database was created by dbca
database was modified by human beings.

Hash Joins - 9i

Tacho, November 29, 2007 - 7:14 pm UTC


Thanks Tom, by changing the parameter hash_join_enabled to TRUE, the optimizer works as expected.

Thanks a lot for sharing your knowledge.

regards.

hasan, December 17, 2011 - 2:55 pm UTC

Hi ,
I have to insert call data records into a table (MAIN TABLE) and I have to filter out duplicate values. There is a second table containing previously read records (TAG TABLE). The TAG table contains 8 columns (the unique columns for a record).

The MAIN table contains 100 columns, including the 8 columns in the TAG table. The 8 unique columns are nullable. Both the MAIN and TAG tables are range partitioned on call date. Also, both are hash partitioned on the 8 columns.

Tag table contains 12 billion rows on 80 date partitions. Number of records to be inserted at a time vary between 5 million to 20 million.

Both tables have up to date statistics.

My query is basically like

insert /*+ append*/ into MAIN_TABLE
select /*+ hash_aj */ * from CDR_TABLE
where (col_1 ,col_2 , ... col_8) not in
(
select col_1 ,col_2 , ... col_8 from TAG_TABLE
)

How should I filter duplicates from going to main table ? This query takes more than an hour.

1-) Design of the tables is not fixed, so is this a good design?
2-) Should I loop on call date to find duplicates, or should I join the 5M to the 12B at once?
3-) I am trying to use a hash anti-join, but the explain plan uses a merge NA anti-join. I do not want to sort in the beginning, since one table is really big. How can I use a hash anti-join? (I set always_anti_join = hash)
4-) Anything you'd suggest for such a problem?

Thanks in advance...

Tom Kyte
December 18, 2011 - 5:18 am UTC

HASH_AJ is not a hint. It doesn't exist (as of 10g and above)



How should I filter duplicates from going to main table ?

it seems you already are? Your NOT IN is doing that already is it not?


1) why does tag_table exist? It seems that main_table already has this information and since those columns are unique - they'll have an index we can fast full scan already - tag_table seems redundant.

2) just join.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6749454952894#6760861174154


3) show the plan. It is quite possible that due to the fact that each of co1..col8 are nullable - you might not be able to do a hash anti-join. You can try a query such as:

ops$tkyte%ORA11GR2> create table t1 ( c1 int , c2 int , c3 int, unique(c1,c2,c3)  );

Table created.

ops$tkyte%ORA11GR2> create table t2 ( c1 int, c2 int, c3 int, c4 varchar(20) );

Table created.

ops$tkyte%ORA11GR2> create table t3 ( c1 int, c2 int, c3 int, c4 varchar(20) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 1000000000, numblks => 1000000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 20000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> explain plan for
  2  insert /*+ append */ into t3
  3  select /*+ use_hash(t1,t2) */ t2.*
  4    from t1, t2
  5   where t2.c1 = t1.c1(+)
  6     and t2.c2 = t1.c2(+)
  7     and t2.c3 = t1.c3(+)
  8     and t1.rowid is null;

Explained.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3201614621

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT        |              |    20M|  1945M|       |  3073K  (1)| 10:14:43 |
|   1 |  LOAD AS SELECT         | T3           |       |       |       |            |          |
|*  2 |   FILTER                |              |       |       |       |            |          |
|*  3 |    HASH JOIN OUTER      |              |    20M|  1945M|  1201M|  3073K  (1)| 10:14:43 |
|   4 |     TABLE ACCESS FULL   | T2           |    20M|   972M|       | 27290   (1)| 00:05:28 |
|   5 |     INDEX FAST FULL SCAN| SYS_C0026028 |  1000M|    47G|       |     9   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T1".ROWID IS NULL)
   3 - access("T2"."C1"="T1"."C1"(+) AND "T2"."C2"="T1"."C2"(+) AND
              "T2"."C3"="T1"."C3"(+))

19 rows selected.


to evaluate if the hash method is the way to go for you.

4) you might not have a problem, you might not have hardware (memory, IO bandwidth,cpu) to do this in less time - that is entirely possible.
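The outer-join-plus-"t1.rowid is null" query above is the classic rewrite of a NOT IN as an anti-join. Conceptually, a hash anti-join builds a hash table on one input's keys and keeps only the probe rows with no match. The following Python sketch (an illustration under that description, not Oracle's actual algorithm) also shows why the nullable columns matter: SQL's NOT IN uses three-valued logic, so a single NULL key in the subquery rejects every row, whereas the outer-join filter simply treats NULL as "no match".

```python
# Conceptual sketch of a hash anti-join: hash the build input's keys,
# then keep only the probe rows whose key is absent. This mirrors the
# outer-join-plus-"rowid is null" rewrite shown above.
def hash_anti_join(probe_rows, build_rows, key):
    built = {key(r) for r in build_rows}        # hash table on the join key
    return [r for r in probe_rows if key(r) not in built]

main = [(1, 'a'), (2, 'b'), (None, 'c')]        # rows to insert
tag = [(1,), (3,)]                              # previously seen keys
new_rows = hash_anti_join(main, tag, key=lambda r: r[0])
# (2, 'b') and (None, 'c') survive here; in SQL, a NOT IN would reject
# ALL rows if the subquery returned any NULL key, which is why nullable
# join columns can block the hash anti-join transformation.
```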

clarification about sort merge join

Biswaranjan, June 01, 2012 - 5:18 am UTC

Hi Tom,

Hope you are doing good. :)

Top of this page you have mentioned the internal structure
of sort merge join as
"Well, a sort merge of A and B is sort of like this:

read A and sort by join key to temp_a
read B and sort by join key to temp_b

read a record from temp_a
read a record from temp_b
while NOT eof on temp_a and temp_b
loop
if ( temp_a.key = temp_b.key ) then output joined record
elsif ( temp_a.key <= temp_b.key ) read a record from temp_a
elsif ( temp_a.key >= temp_b.key ) read a record from temp_b )
end loop
"


Is it like you missed one extra line with above if condition?

I mean it would be like below.

"read A and sort by join key to temp_a
read B and sort by join key to temp_b

read a record from temp_a
read a record from temp_b
while NOT eof on temp_a and temp_b
loop
if ( temp_a.key = temp_b.key ) then output joined record and
(read next record from temp_a ,read next record from temp_b) --this line
elsif ( temp_a.key <= temp_b.key ) read a record from temp_a
elsif ( temp_a.key >= temp_b.key ) read a record from temp_b )
end loop "

Can you please clarify .

Thanks as always,
Biswaranjan


Tom Kyte
June 01, 2012 - 7:06 am UTC

sure, there are some 'reads' missing there - but it is even more complex than what you have.

the record from temp_a might join to many more temp_b records. the record from temp_b might join to many more temp_a records - there might be more temp going on.


This is "in concept it is sort of like this". It is not a perfect representation by any means, the real stuff would be "large"
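The many-to-many matching Tom describes can be sketched as follows. This is a minimal Python illustration of a sort-merge join that handles duplicate join keys by collecting the full run of equal keys on each side and cross-joining the runs; it is a conceptual sketch only, not Oracle's actual implementation.

```python
# Minimal sort-merge join sketch: sort both inputs on the join key, then
# advance two cursors, joining every match in one run to every match in
# the other (the "more temp going on" case the answer refers to).
def sort_merge_join(a, b, key):
    temp_a = sorted(a, key=key)   # "read A and sort by join key to temp_a"
    temp_b = sorted(b, key=key)   # "read B and sort by join key to temp_b"
    out, i, j = [], 0, 0
    while i < len(temp_a) and j < len(temp_b):
        ka, kb = key(temp_a[i]), key(temp_b[j])
        if ka < kb:
            i += 1                # advance the side with the smaller key
        elif ka > kb:
            j += 1
        else:
            # collect the run of equal keys on each side ...
            i2 = i
            while i2 < len(temp_a) and key(temp_a[i2]) == ka:
                i2 += 1
            j2 = j
            while j2 < len(temp_b) and key(temp_b[j2]) == ka:
                j2 += 1
            # ... and join every match to every match
            for ra in temp_a[i:i2]:
                for rb in temp_b[j:j2]:
                    out.append((ra, rb))
            i, j = i2, j2
    return out

rows = sort_merge_join([(1, 'x'), (2, 'y'), (2, 'z')],
                       [(2, 'p'), (3, 'q')],
                       key=lambda r: r[0])
# both key-2 records of the first input join to the key-2 record of the second
```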