Question and Answer

Tom Kyte

Thanks for the question, Babloo.

Asked: June 06, 2006 - 6:18 pm UTC

Answered by: Tom Kyte - Last updated: September 16, 2020 - 1:05 pm UTC

Category: Database - Version: 9.1.2

Viewed 10K+ times

You Asked

Hi Tom,
I have your book and am still not able to understand the mechanism of a hash join. It says something similar to this (I do not have the book at work, so I cannot reproduce it exactly):

one table would be hashed into memory and then the other table would be used to probe for matching values

Can you explain it a bit more?
Also, what are the factors that determine whether nested loops or a hash join will be done? I had thought that nested loops would be used for two smaller, similar (volume-wise) sets of data, and a hash join would be used when one table is really big and the other is small. But apparently there are other factors too. What are they?
One of my coworkers once insisted that (in one case) nested loops were done because one of the tables was using an index, and when we put in a hint for a full table scan, it started doing hash joins.
What could be the reason, given that the volume of data did not change?

and we said...

Ok, you get to be my "guinea pig" :)

See
http://asktom.oracle.com/tkyte/hj/hj.html

It is a Flash answer to the question. Let me know what you think... It is about 13 minutes long.

Alternatively, if you want "offline viewing" of this, you can download a .exe file (about 1.8 MB) that lets you play this whenever you want:

http://asktom.oracle.com/tkyte/hj/hj.exe


set linesize 1000
drop table emp;
drop table dept;

create table emp
as
select *
from scott.emp;
create table dept
as
select *
from scott.dept;
alter table dept
add constraint dept_pk
primary key(deptno);
alter table emp
add constraint emp_pk
primary key(empno)
modify deptno not null
add constraint emp_fk
foreign key (deptno)
references dept(deptno);
begin
dbms_stats.gather_table_stats
( user, 'EMP' );
dbms_stats.gather_table_stats
( user, 'DEPT' );
end;
/
set autotrace traceonly explain
select *
from emp, dept
where emp.deptno = dept.deptno;
set autotrace off

delete from emp;
delete from dept;
alter table dept modify deptno number(5);
alter table emp modify deptno number(5);

insert into dept
(deptno,dname,loc)
select rownum, substr(object_name,1,14),
substr(object_type,1,13)
from all_objects
where rownum <= 500;

insert into emp
(empno,ename,job,mgr,hiredate,sal,comm,deptno)
select rownum,substr(object_name,1,10),
substr(object_type,1,9), rownum-1,
created,object_id/2,data_object_id/2,
mod(rownum,500)+1
from all_objects
where rownum < 10000;

begin
dbms_stats.gather_table_stats
( user, 'EMP', cascade=>true );
dbms_stats.gather_table_stats
( user, 'DEPT', cascade=>true );
end;
/
set autotrace traceonly explain
select *
from emp, dept
where emp.deptno = dept.deptno;
set autotrace off



(86 ratings)

Reviews

Which of the two tables will be hashed in memory?

June 07, 2006 - 11:00 am UTC

Reviewer: Hrishy from RI

Hi

Would it be safe to assume that Oracle will always hash the smaller of the two tables?

What about Sybase, DB2, Informix, Teradata, PostgreSQL, and MySQL - do they have the hash join concept?

regards
Hrishy

Tom Kyte

Followup  

June 07, 2006 - 3:06 pm UTC

The goal is to hash the smaller of the two result sets - yes, and in fact, Oracle can change its mind part way through if necessary (if it guesses wrong).



Excellent presentation

June 07, 2006 - 11:03 am UTC

Reviewer: Peter Schwenzer from Philadelphia

Tom;
The presentation on hash joins is excellent. I can't wait till you do more of them. I think it would be very helpful if you put a link to the presentation area on the home page. If possible, could you use a cookie or some other means to notify users that new presentations are available when they first enter the site?

Keep up the great work.

Tom Kyte

Followup  

June 07, 2006 - 3:07 pm UTC

I'll consider that (probably not the cookie, more of a sorted list)

Wow!!!!!!

June 07, 2006 - 11:14 am UTC

Reviewer: babloo from minneapolis

I do not know how to thank you. It is amazing.

Awesome

June 07, 2006 - 11:19 am UTC

Reviewer: Glenn Feiner from Princeton, NJ

AskTom on steroids!!!

Now, how about one on Nested Tables?

Excellent presentation

June 07, 2006 - 3:38 pm UTC

Reviewer: Harry from Somerset,NJ USA

This was a wonderful piece of presentation

one doubt still not clear

June 07, 2006 - 11:59 pm UTC

Reviewer: babloo from minneapolis

Hi Tom ,
That was fantastic. However, one thing is still not clear.
When I compared both options (nested loops and hash joins),
here is what I found:

the hash join had to put the dept table in memory using a hash function
next, both the hash join and the nested loops started full table scans of the emp table
and for each value of deptno
1. the nested loops method went to the index of the dept table, got the rowid for this deptno, and then searched the dept table by that rowid
2. the hash join method applied the hash function to this deptno, found the slot, and accessed the dept record from the hashed memory structure

What I am not clear about is: if there were millions of records, why would the hash join be fast?
Is it because the second step was retrieval of the dept record from memory, whereas for nested loops it was reading the dept table? But then the hash join method had to do one extra step of applying the hash function to the whole table row by row and copying it into memory.
Am I missing something?


Tom Kyte

Followup  

June 08, 2006 - 8:23 am UTC

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154

for an example.

Using nested loops - every time you go from EMP to DEPT, you would have to do an index range scan to find a block (3 or 4 LIOs - and LIOs are neither "free" nor really "cheap"; everyone assumes an LIO is "a RAM retrieval, super fast", but they are not).
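To put rough numbers on that, here is a back-of-envelope sketch comparing total LIOs for the two plans on the 10,000-row EMP / 500-row DEPT demo above. All the constants - rows per block, 3 LIOs per index probe - are illustrative assumptions, not measurements from the actual demo.

```python
# Back-of-envelope LIO model for the 10,000-row EMP / 500-row DEPT demo.
# Every constant here is an illustrative assumption, not a measured value.

EMP_ROWS = 10_000
EMP_BLOCKS = 100        # assume roughly 100 rows per block
DEPT_BLOCKS = 5
INDEX_PROBE_LIOS = 3    # root + branch + leaf, per the "3 or 4 LIOs" above
TABLE_ACCESS_LIOS = 1   # one more LIO to fetch the DEPT row by rowid

# Nested loops: full scan EMP, then probe DEPT's index once per EMP row.
nl_lios = EMP_BLOCKS + EMP_ROWS * (INDEX_PROBE_LIOS + TABLE_ACCESS_LIOS)

# Hash join: one full scan of each table; the probes hit the in-memory
# hash table and cost no LIOs at all.
hj_lios = DEPT_BLOCKS + EMP_BLOCKS

print(f"nested loops: ~{nl_lios:,} LIOs")   # ~40,100
print(f"hash join:    ~{hj_lios:,} LIOs")   # ~105
```

The per-probe index cost is multiplied by the driving row count, which is why it dominates as the driving table grows.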



June 08, 2006 - 11:27 am UTC

Reviewer: A reader

another nice presentation of how a hash join works
can be found on
http://tinyurl.com/kmpvs

Very nice

June 08, 2006 - 4:49 pm UTC

Reviewer: Kevin from St. Louis, MO


If it's interaction you're going for, perhaps you should have highlightable text on the same page as the demo with labeled chunks of code, and every now and then during the demo say: "If you're following along at home, pause the demo, then copy, paste, and run section (2) in your SQL*Plus window."

Otherwise, very cool idea, well worth the investment, IMO.

Can we take this just a bit farther...

June 08, 2006 - 5:29 pm UTC

Reviewer: Mark Brady from Baltimore, MD

If I Hash dept 40 and get 6 and when I use the emp table to probe I hash 40 and get 6, how is that faster, better and cheaper than putting 40 into 40 and not using the function at all?

Say I have 100 departments and my hash function returns 10 values. I have 10 buckets and 9 more entries linked to each of those buckets (or some such distribution). So I still have 100 entries; they just start to look like a 10x10 array instead of a 100x1 array?

Is the savings that, to find some entry in the 100x1 array, the worst case is searching 100 entries, while in the hash the worst case is searching 10 buckets and then 10 entries in that bucket - a total of 20? And that savings during the search outweighs the additional cost of the hash function? And the reason this is efficient for two 'large' tables is that the number of times I use the hash function is records_in_table_1 PLUS records_in_table_2, while the number of probes with nested loops is records_in_table_1 TIMES records_in_table_2? And since A+B approaches A*B as A or B approaches 1, the cost of the function is a large factor only as A+B approaches A*B?

Do I have this right?

Tom Kyte

Followup  

June 08, 2006 - 8:13 pm UTC

40 might have hashed into 40. It could have.

It was an example - consider joining ALL_OBJECTS to something. All of a sudden you have a key perhaps of "object_id" with lots of different values all over the place.

Maybe the low value is 5 and the high value is 100,325.
but the number of distinct values is 20,000.


Or, more generically, suppose the join key is a STRING, or a date. it can be anything.

So, we take the key, hash it into a finite (small) number of slots - providing us the ability to "go from key to values" very quickly. For any data.

Remember, we are normally talking thousands, hundreds of thousands, millions, billions of rows. 14 rows in emp and 4 in dept are good for powerpoint (they fit) but they are not "realistic"
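As a concrete (though toy) illustration of "hash the smaller input, then probe it with the larger one", here is a minimal in-memory hash join in Python. It is a sketch of the idea only - Oracle's real implementation partitions, can spill to temp, and uses its own internal hash function - and the sample rows are just a fragment of the familiar SCOTT schema.

```python
# A toy in-memory hash join - a sketch of the idea described above,
# not Oracle's actual implementation.
from collections import defaultdict

def hash_join(build_rows, build_key, probe_rows, probe_key):
    """Hash the smaller (build) input, then probe it with the larger one."""
    # Build phase: one pass over the smaller table. The hash function maps
    # any key type (number, string, date) to a bucket number.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[hash(row[build_key])].append(row)
    # Probe phase: one pass over the larger table. Each probe is a direct
    # in-memory lookup, not an index walk.
    for row in probe_rows:
        for match in buckets.get(hash(row[probe_key]), []):
            if match[build_key] == row[probe_key]:  # guard against collisions
                yield {**match, **row}

dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]
emp  = [{"empno": 7369, "ename": "SMITH", "deptno": 20},
        {"empno": 7499, "ename": "ALLEN", "deptno": 10},
        {"empno": 7566, "ename": "JONES", "deptno": 20}]

for joined in hash_join(dept, "deptno", emp, "deptno"):
    print(joined["ename"], joined["dname"])
# prints: SMITH RESEARCH / ALLEN ACCOUNTING / JONES RESEARCH
```

The point of the followup holds here too: the cost per probe is constant regardless of whether the key is an OBJECT_ID, a string, or a date, and regardless of how the key values are distributed between their low and high values.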

Properties of Hash Function

June 08, 2006 - 6:31 pm UTC

Reviewer: S from Canada

Source: http://tracer.lcc.uma.es/problems/avalanche/avalanche.html

" Hash functions need to meet a quite demanding, and sometimes contradictory, set of properties, which are summarized in the following:

1. The input can be of any length
2. The output has a fixed length
3. For any x, H(x) is easy to compute
4. H is one-way i.e. it is hard to invert, in the sense that for any y it is computationally infeasible to find an x such as y=H(x)
5. H is collision-free i.e. for any x it is computationally infeasible to find another x’ such as H(x)=H(x’)"

Obviously, property 4 is required if security is an issue.
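Properties 1 through 3 can be seen directly with any real hash function. The short Python sketch below uses SHA-256 purely for illustration - a join's internal hash function is far cheaper, and as noted, the one-way and collision-free properties matter for security, not for joins. The bucket count of 1024 is an arbitrary assumption.

```python
# Illustrating properties 1-3 of the quoted list: any input length,
# fixed output length, easy to compute. SHA-256 stands in here for
# "some hash function"; a join would use something much cheaper.
import hashlib

for key in [b"x", b"a much longer input string", b"19-MAR-2007"]:
    digest = hashlib.sha256(key).hexdigest()
    assert len(digest) == 64           # property 2: fixed-length output
    bucket = int(digest, 16) % 1024    # reduce to one of 1024 hash buckets
    print(f"{key!r:>30} -> bucket {bucket}")
```

Reducing the fixed-length digest modulo the bucket count is what turns "a hash function" into "a way to go from key to slot", which is all the join needs.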

Excellent presentation! Thanks!

June 14, 2006 - 9:38 am UTC

Reviewer: A reader


Presentation

July 02, 2006 - 3:56 pm UTC

Reviewer: Bob from London, UK

As we say in the UK - absolutely top demonstration. I now understand hash joins!!!

It would be great if we could have a topic on interpreting TKPROF!

Excellent Presentation...

July 12, 2006 - 2:05 pm UTC

Reviewer: A Reader

Would like to see all such presentations at one place.

Question on hash join.

August 04, 2006 - 12:18 pm UTC

Reviewer: Dihan Cheng from USA

Hello Tom, I read your 'Effective Oracle By Design' topic about Hash Outer Join on Page 469. Here is the quote:

"If I were to sit down and write the hash-join algorithm, I would have gone in the other direction. I would have hashed the table whose rows were not being preserved, and then read the table whose rows were being preserved, probing the hash table to see whether there was a row to join to. Oracle has chosen to do it the other way"

I think the best way would be for Oracle to give the user a hint (similar to that for nested loops) to let the user be involved in choosing which table is used as the outer one and which as the inner one.

For example, in the case of the example you give on Page 470.
In the real world, the EMP table normally has tens of thousands of records, while the DEPT table normally has at most hundreds of records. In this case, I'd prefer to hash the DEPT table (the table whose rows are being preserved) since it's smaller. But the disadvantage of this approach is that the order of the result set follows the EMP table. Thus, in other circumstances, if the user wants the result set in the order of DEPT, we would have to hash the EMP table instead.

Any advice is very appreciated!

warm regards,

Dihan

Tom Kyte

Followup  

August 04, 2006 - 12:23 pm UTC

Currently, in 10g - we can go "either way" and the optimizer will decide which way to go.

In 9i - there was only one way - so a hint would be bad, since you could have told it to do it a way that wasn't implemented... :)

I think the best way is fewer hints, not more.

The order by happens after the join regardless.

Question on hash join.

August 04, 2006 - 1:59 pm UTC

Reviewer: Dihan Cheng from USA

Most useful to me Tom.

Except for the following sentence I am still confused.

"The order by happens after the join regardless."

Could you please give me more illustration on your intention to tell me this?

warm regards,

Dihan

Tom Kyte

Followup  

August 04, 2006 - 2:43 pm UTC

A query that joined emp to dept with an order by on dept would:

a) hash join emp to dept
b) then sort

the "order" of rows on disk - not really relevant, not really controllable even.

Question about hash join.

August 04, 2006 - 3:06 pm UTC

Reviewer: Dihan Cheng from USA

I see - what you mean is that it's less important, or even uncontrollable, to get the order of the query (select * from DEPT) after the join.



warm regards,

Dihan

Hash join is chosen over nested loops for some reason

October 09, 2006 - 2:13 am UTC

Reviewer: Oraboy

Hi 
 I encountered this problem recently.
(Table names have been changed to hide customer info) 
  
 SQL>SELECT /*+ INDEX(A INDX_PRO_REQ) */
   2  A.TEST_PRO_ID, A.TEST_PRO_COMP_TIME
   3  FROM TEST_PRO A
   4  WHERE A.PRO_TYPE_ID = 10
   5  AND A.PRO_STATUS_ID != 4
   6  AND A.TEST_REQ_ID IN
   7    ( SELECT GRP_ID FROM TEST_GRMTI WHERE MBR_ID=3091
   8       UNION
   9      SELECT USR_ID FROM TEST_USERS WHERE USR_ID = 3091
  10       UNION
  11      SELECT MBR_ID FROM TEST_GRMTI WHERE GRP_ID IN (3090)
  12       UNION
  13      SELECT USR_ID FROM TEST_USERS WHERE USR_ID IN (3090)
  14      )
  15  /
  
 
 1025 rows selected.
 
 Elapsed: 00:00:03.34 -- 3 seconds
 
 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 1125416648
 
 ----------------------------------------------------------------------------------------------------
 | Id  | Operation                        | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
 ----------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                 |                           |  9429 |   349K| 13335   (1)| 00:02:41 |
 |*  1 |  TABLE ACCESS BY INDEX ROWID     | TEST_PRO                  |    33 |   825 |    47   (0)| 00:00:01 |
 |   2 |   NESTED LOOPS                   |                           |  9429 |   349K| 13335   (1)| 00:02:41 |<--
 |   3 |    VIEW                          | VW_NSO_1                  |   283 |  3679 |    17  (24)| 00:00:01 |
 |   4 |     SORT UNIQUE                  |                           |   283 |  2820 |    17  (65)| 00:00:01 |
 |   5 |      UNION-ALL                   |                           |       |       |            |          |
 |   6 |       TABLE ACCESS BY INDEX ROWID| TEST_GRMTI                |     4 |    40 |     6   (0)| 00:
 |*  7 |        INDEX RANGE SCAN          | GRMTI_MBR_ID_INDX         |     4 |       |     3   (0)| 00:00:01 |
 |*  8 |       INDEX UNIQUE SCAN          | TEST_USRS_PK1             |     1 |     5 |     2   (0)| 00:00:01 |
 |*  9 |       INDEX RANGE SCAN           | GRMTI_GRP_PK              |   277 |  2770 |     3   (0)| 00:00:01 |
 |* 10 |       INDEX UNIQUE SCAN          | TEST_USR_PK1              |     1 |     5 |     2   (0)| 00:00:01 |
 |* 11 |    INDEX RANGE SCAN              | INDX_PRO_REQ              |   131 |       |     0   (0)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 
 Predicate Information (identified by operation id):
 ---------------------------------------------------
 
    1 - filter("A"."PRO_TYPE_ID"=10 AND "A"."PRO_STATUS_ID"<>4)
    7 - access("MBR_ID"=3091)
    8 - access("USR_ID"=3091)
    9 - access("GRP_ID"=3090)
   10 - access("USR_ID"=3090)
   11 - access("A"."TEST_REQ_ID"="$nso_col_1")
 
 
 Statistics
 ----------------------------------------------------------
          17  recursive calls
           0  db block gets
        1138  consistent gets                             <-- 1.1K LIOs
           0  physical reads
           0  redo size
       18731  bytes sent via SQL*Net to client
         712  bytes received via SQL*Net from client
          70  SQL*Net roundtrips to/from client
           1  sorts (memory)
           0  sorts (disk)
       1025  rows processed
       
       
  Without the index:
  -----------------
  
  Elapsed: 00:00:08.51 <----
  
  Execution Plan
  ----------------------------------------------------------
  Plan hash value: 3363079960
  
   -----------------------------------------------------------------------------------------------------------
  | Id  | Operation                        | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
   -----------------------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT                 |                          |  9429 |   349K|  2703   (3)| 00:00:33 |
  |*  1 |  HASH JOIN                       |                          |  9429 |   349K|  2703   (3)| 00:00:33 |<--COST is low
  |   2 |    VIEW                          | VW_NSO_1                 |   283 |  3679 |    17  (24)| 00:00:01 |
  |   3 |     SORT UNIQUE                  |                          |   283 |  2820 |    17  (65)| 00:00:01 |
  |   4 |      UNION-ALL                   |                          |       |       |            |          |
  |   5 |       TABLE ACCESS BY INDEX ROWID| TEST_GRMTI               |     4 |    40 |     6   (0)| 00:
  |*  6 |        INDEX RANGE SCAN          | GRMTI_MBR_ID_INDX        |     4 |       |     3   (0)| 00:00:01 |
  |*  7 |       INDEX UNIQUE SCAN          | TEST_USRS_PK1            |     1 |     5 |     2   (0)| 00:00:01 |
  |*  8 |       INDEX RANGE SCAN           | GRMTI_GRP_PK             |   277 |  2770 |     3   (0)| 00:00:01 |
  |*  9 |       INDEX UNIQUE SCAN          | TEST_USR_PK1             |     1 |     5 |     2   (0)| 00:00:01 |
  |* 10 |   TABLE ACCESS FULL              | PROCESS                  |   195K|  4770K|  2687   (3)| 00:00:33 |
  -------------------------------------------------------------------------------------------------------------
  
  Predicate Information (identified by operation id):
  ---------------------------------------------------
  
     1 - access("A"."TEST_REQ_ID"="$nso_col_1")
     6 - access("MBR_ID"=3091)
     7 - access("USR_ID"=3091)
     8 - access("GRP_ID"=3090)
     9 - access("USR_ID"=3090)
    10 - filter("A"."TEST_PRO_TYPE_ID"=10 AND "A"."TEST_PRO_STATUS_ID"<>4)
  
  
  Statistics
  ----------------------------------------------------------
           15  recursive calls
            0  db block gets
        12055  consistent gets   <-- LIOS
        11138  physical reads
            0  redo size
        19292  bytes sent via SQL*Net to client
          712  bytes received via SQL*Net from client
           70  SQL*Net roundtrips to/from client
            1  sorts (memory)
            0  sorts (disk)
       1025  rows processed



Question: 
As you see above, if I run the SQL as is (CBO -> ALL_ROWS), it picks the hash join, which takes about 11 seconds. I am not worried about elapsed time, but I cannot understand why it would pick such a plan when nested loops (when hinted) does the same job with far fewer LIOs and an elapsed time of less than 4 seconds.

I cannot understand what might be confusing the optimizer into giving NL a much higher cost (13K) when in practice it looks like the cheaper operation.

version: 10.0.1.4

I have also tried
a) changing the PGA to manual management so that hash_area_size (132K) takes effect
b) bumping optimizer_index_caching up to 35


Index and table stats are up to date.

Any pointers would be helpful,
Thanks 

Tom Kyte

Followup  

October 09, 2006 - 8:27 am UTC

have you compared at all the

a) guess (explain plan/autotrace output)
b) reality (tkprof row source plan)

I would rather suspect "optimizer had to guess at cardinality, optimizer guessed wrong"

if you remove the index hint and replace with a level 3 dynamic sample:

/*+ dynamic_sampling(A 3) */

to help the optimizer out with its guess what then?

Very nice

October 24, 2006 - 4:16 am UTC

Reviewer: Bard from Norway

This was just what I was looking for! Just the right amount of detail and good examples!

/Bard

Please do a similar presentation on nest loops and merge joins

October 31, 2006 - 10:59 pm UTC

Reviewer: A reader

Please do a similar presentation on nest loops and merge joins

hash join and v$session_longops

November 08, 2006 - 10:01 am UTC

Reviewer: Kristian from Stockholm. Sweden

Thanks for a great explanation Tom.

The thing that confuses me a little now is the operations listed in v$session_longops during a hash join. For instance, I have a query that hash joins two larger tables (A and B) and then does a group by. During the execution I can see these different operations in v$session_longops (in order):

* table scan (of smaller A)
* table scan (of larger B)
* hash join
* sort output

What is really taking place during the 'hash join' operation? I thought that the hash join was performed during the full scan of the larger table (B).

Or is this a result of a hash table not fitting into memory?



Tom Kyte

Followup  

November 08, 2006 - 7:47 pm UTC

got plan?

Hash joins and temp space

November 19, 2006 - 9:40 pm UTC

Reviewer: scarletmanuka from Perth, WA Australia

Thanks for the info. Regarding the paging to temp issue, does the CBO look at the amount of data and estimate whether it will fit, or is there a database parameter affecting the ranking between nested loops and hash joins?

We have several fairly large tables (1-20 million records) which are all being joined together in a regular report. This report is now failing due to data growth (this system has been accumulating data for four months now; statistics are analysed regularly). Oracle is running out of temp space trying to hash join everything together, even though the tables are joined by the primary key indexes. I can get around this by adding selected index hints, but I wondered if there was a better way.

Tom Kyte

Followup  

November 20, 2006 - 2:52 am UTC

the CBO will use sort/hash area size with manual memory management or the pga aggregate target with automatic memory management in its computations.

and before you start using indexes here, you might want to peek at this:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894#6760861174154 <code>

Hash joins and temp space

November 20, 2006 - 11:07 pm UTC

Reviewer: scarletmanuka from Perth, WA Australia

Sorry, should have been clearer in my example. We do join all these large tables together, but only return between 5K and 100K rows out of the currently ~1 million in the main table. So the hash joins are spending a lot of time (and running out of temp space) going over rows we don't actually need. [The extract is to get the current week's transaction data, so the ratio is very roughly 1 to (number of weeks system has been in place).]

Timings for the reporting process that died went like this for the last week:
8 min, 8 min, 9 min, 14 min, failed, failed, failed
Adding selected index hints to the query has brought the process time back down to 9 min again (note that most of this is processing time; the query returned all results in just over 2 min).

Adding more useful indexes to the database would certainly help too, and would probably help the CBO avoid the hash joins that are killing it, but I have to go and nudge the DBA a bit more to get that done. (The link from the main table to the period table is indexed, but the field on the period table that we're using to restrict this query is not indexed!! So the CBO doesn't realise that we don't need all the data...)

So I'm not blaming the CBO, it's just that until the indexing is fixed I need to persuade it not to do the right thing. ;-)

Tom Kyte

Followup  

November 22, 2006 - 3:09 pm UTC

are the estimated cardinalities really correct then - compare the explain plan to a TKPROF (you compare)


Most brilliant!

November 26, 2006 - 4:09 pm UTC

Reviewer: Apurva from India

Tom,

You are a magician!

Btw, does 'TK' in TKPROF stand for Tom Kyte?

Regards,

Tom Kyte

Followup  

November 26, 2006 - 7:47 pm UTC

no, it does not :)

Hash Join Demo URL Broken

December 01, 2006 - 7:14 pm UTC

Reviewer: Chi H from California

Tom,
Your demo referenced in the tinyurl doesn't work right now. Can you put in a new URL?

Tom Kyte

Followup  

December 02, 2006 - 12:44 pm UTC

fixed, thanks

Very Useful

December 04, 2006 - 2:46 am UTC

Reviewer: Khushru F Doctor from Ahmedabad - India

These are really very good discussions that educate us way beyond any book or manual.

Tom is the Oracle users' man inside Oracle Corp., giving us all these wonderful pieces of information that we would never have been able to find out on our own.

Also, the TinyURL link still does not work!!



Tom Kyte

Followup  

December 04, 2006 - 7:21 am UTC

the tinyurl link will never work, I updated the links above to not be tinyurl links.

Great Presentation

December 05, 2006 - 2:02 pm UTC

Reviewer: Roland from Virginia

Awesome!!!

hash join vs nested loops

March 25, 2007 - 5:28 pm UTC

Reviewer: gabriel from romania

hello tom !
I'm writing here because I hope you can bring some light to a problem related to which factors the optimizer takes into account when it chooses "the best plan" - with hash or with loops. I read all the info presented about hash joins and I couldn't find an acceptable answer. So, here is the situation:

I have a table 'pricelists'
desc pricelist
 Name                 Null?     Type
 -------------------- --------- -------------
 ID                   NOT NULL  NUMBER(11)
 ITEMKEY                        VARCHAR2(20)
 PRICE                          FLOAT(39)
 CURRENCYCODE                   VARCHAR2(5)
 PRICELISTNUMBER                NUMBER(11)
 DATF                           DATE

with some indexes defined on it.

The same table (structure and indexes) is defined for two different users (schemas) on the same server (9.2.0.6). The only difference is in the number of rows: one has 802556 and the other 1930500. The select is like this:
SELECT  pl1.ID, pl1.itemkey, pl1.price, pl1.currencycode,
        pl1.pricelistnumber, pn.pricelistname, pl1.datf
   FROM pricelists pl1, pricelistnames pn
  WHERE pn.pricelistid = pl1.pricelistnumber
    AND pl1.itemkey = '15767'
    AND pl1.datf =
           (SELECT MAX (datf)
              FROM pricelists pl2
             WHERE pl2.pricelistnumber = pl1.pricelistnumber
               AND pl2.datf <= '19-MAR-2007'
               AND pl2.itemkey = '15767')
    AND pl1.datf <= '19-MAR-2007'
ORDER BY datf DESC;


Now on the bigger table the execution plan is :
--------------------------------------------------------------------------------------------
| Id  | Operation                       |  Name               | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     |     1 |   117 |   305   (2)|
|*  1 |  FILTER                         |                     |       |       |            |
|   2 |   SORT GROUP BY                 |                     |     1 |   117 |   305   (2)|
|*  3 |    HASH JOIN                    |                     |   164 | 19188 |   304   (2)|
|*  4 |     TABLE ACCESS BY INDEX ROWID | PRICELISTS          |   164 |  5576 |   150   (1)|
|*  5 |      INDEX RANGE SCAN           | PRICELISTS_ITEMKEY  |    18 |       |     4  (25)|
|*  6 |     HASH JOIN                   |                     |   164 | 13612 |   153   (2)|
|   7 |      TABLE ACCESS FULL          | PRICELISTNAMES      |     8 |    96 |     3  (34)|
|*  8 |      TABLE ACCESS BY INDEX ROWID| PRICELISTS          |   164 | 11644 |   150   (1)|
|*  9 |       INDEX RANGE SCAN          | PRICELISTS_ITEMKEY  |   370 |       |     4  (25)|
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PL1"."DATF"=MAX("PL2"."DATF"))
   3 - access("PL2"."PRICELISTNUMBER"="PL1"."PRICELISTNUMBER")
   4 - filter("PL2"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   5 - access("PL2"."ITEMKEY"='15767')
   6 - access("PN"."PRICELISTID"="PL1"."PRICELISTNUMBER")
   8 - filter("PL1"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   9 - access("PL1"."ITEMKEY"='15767')

26 rows selected.


and on the smaller table is :
-----------------------------------------------------------------------------------------
| Id  | Operation                        |  Name           | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                 |     1 |    60 |    50   (6)|
|*  1 |  FILTER                          |                 |       |       |            |
|   2 |   SORT GROUP BY                  |                 |     1 |    60 |    50   (6)|
|*  3 |    TABLE ACCESS BY INDEX ROWID   | PRICELISTS      |     1 |    17 |     7  (15)|
|   4 |     NESTED LOOPS                 |                 |     1 |    60 |    48   (3)|
|   5 |      NESTED LOOPS                |                 |     1 |    43 |    42   (3)|
|*  6 |       TABLE ACCESS BY INDEX ROWID| PRICELISTS      |     1 |    31 |    41   (3)|
|*  7 |        INDEX SKIP SCAN           | IDX_PRLS_DATA   |  1267 |       |    21   (5)|
|   8 |       TABLE ACCESS BY INDEX ROWID| PRICELISTNAMES  |     1 |    12 |     2  (50)|
|*  9 |        INDEX UNIQUE SCAN         | SYS_C00206416   |     1 |       |            |
|* 10 |      INDEX RANGE SCAN            | IDX_PRLS_DATA   |   211 |       |     3  (34)|
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PL1"."DATF"=MAX("PL2"."DATF"))
   3 - filter("PL2"."ITEMKEY"='15767')
   6 - filter("PL1"."ITEMKEY"='15767')
   7 - access("PL1"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
       filter("PL1"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   9 - access("PN"."PRICELISTID"="PL1"."PRICELISTNUMBER")
  10 - access("PL2"."PRICELISTNUMBER"="PL1"."PRICELISTNUMBER" AND
              "PL2"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
       filter("PL2"."DATF"<=TO_DATE('2007-03-19 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

29 rows selected.


Of course, the fast one is the one on the bigger table with the hash join (no matter what the optimizer said). What I don't understand is WHY? In both cases the tables are fully analyzed with dbms_stats.gather_table_stats. As I said, the only difference is in the number of rows, so I deleted the statistics for the smaller table and, surprise: the execution plan became the one with the hash join! So how is Oracle choosing hash join or nested loops based only on the number of rows - or better said, only on statistics (because all other parameters are the same)? The difference between the two plans is huge: 3 minutes!


Hash Join

November 12, 2007 - 8:08 am UTC

Reviewer: karthick pattabiraman from india

I have the following update statement:

16:34:28 [SYSADM@AKIVATST]> explain plan for
17:01:22   2  UPDATE (SELECT name,
17:01:30   3                 hx_new_value
17:01:30   4            FROM HX_TRANSACTION a,
17:01:30   5                 HX_STAGING b
17:01:30   6           WHERE b.hx_old_value = a.name
17:01:30   7             AND b.hx_field_name = 'NAME')
17:01:30   8     SET name = hx_new_value ;

Explained.

17:01:32 [SYSADM@AKIVATST]> set linesize 1000
17:01:41 [SYSADM@AKIVATST]> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Plan hash value: 1814070594

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |                      |  5007K|   205M|       | 37122   (2)| 00:07:26 |
|   1 |  UPDATE             | HX_TRANSACTION       |       |       |       |            |          |
|*  2 |   HASH JOIN         |                      |  5007K|   205M|    14M| 37122   (2)| 00:07:26 |
|*  3 |    TABLE ACCESS FULL| HX_STAGING           |   399K|     9M|       |   406   (5)| 00:00:05 |
|   4 |    TABLE ACCESS FULL| HX_TRANSACTION       |  5001K|    81M|       | 25925   (2)| 00:05:12 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"."HX_OLD_VALUE"="A"."NAME")
   3 - filter("B"."HX_FIELD_NAME"='NAME')


I can see that HX_STAGING is hashed in memory. Below I collected the times for different volumes of records in the staging table (the one that is hashed).


Sno  Transaction  Staging  Execution Time  Diff
     Count        Count    in Min          in Min
-----------------------------------------------------
1    5000000      100000    6.262833333     0
2    5000000      200000    6.373166667     0.110333333
3    5000000      300000    6.916666667     0.5435
4    5000000      400000   27.67783333     20.76116667

Notice that until the staging count reaches 300,000 the time is somewhat consistent. But when it is increased to 400,000 I see a big jump of 20.76 minutes. Is this because the staging table is unable to fit in memory? How can I overcome this problem? Do I have to increase the memory size? If so, how?

Tom Kyte

Followup  

November 16, 2007 - 1:01 pm UTC

probably - trace it and look at the tkprof. see if you see waits for IO on temp, that would be a good indication.

You don't "have" to do anything, you should work with your DBA as you are probably NOT the only thing on this machine in real life and hence consuming all memory for you would not be good - it needs to be done in coordination with the system as a whole.

The DBA or administrator of this system can advise you on how you might set your hash area size for your purposes.
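To make the build/probe mechanics being discussed concrete, here is a toy sketch of an in-memory hash join in Python. This is illustrative only, not Oracle's implementation; the table and column names are borrowed from the EMP/DEPT example at the top of this page.

```python
def hash_join(build_rows, probe_rows, key):
    """Toy in-memory hash join: hash the smaller (build) input,
    then scan the larger (probe) input once, doing O(1) lookups."""
    # Build phase: one full pass over the build input happens before
    # any result row can be produced.
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: stream the big input past the hash table.
    result = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            result.append({**match, **row})
    return result

dept = [{"deptno": 10, "dname": "ACCOUNTING"},
        {"deptno": 20, "dname": "RESEARCH"}]
emp = [{"empno": 7369, "deptno": 20},
       {"empno": 7782, "deptno": 10},
       {"empno": 7788, "deptno": 20}]
joined = hash_join(dept, emp, "deptno")  # build DEPT, probe EMP
```

The update above behaves the same way in spirit: HX_STAGING is the build input. Once the build input no longer fits in the hash work area, the database must start partitioning to temp and the cost jumps, which matches the step change observed between 300,000 and 400,000 staging rows.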

Follow up

November 13, 2007 - 2:42 am UTC

Reviewer: karthick pattabiraman from India

This is a follow-up to my previous posting.

After doing some research on hash joins I came up with this. Just thought of sharing it with everyone.

From the performance tuning guide

"Ideally, the size of a work area is big enough that it can accommodate the input data and auxiliary memory structures allocated by its associated SQL operator. This is known as the optimal size of a work area."

Following is my setting

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------------------------
pga_aggregate_target                 big integer 104857600
hash_area_size                       integer     131072
sort_area_size                       integer     65536
workarea_size_policy                 string      AUTO

Following are my questions.

1. When workarea_size_policy is AUTO, does the value in hash_area_size really matter?

2. When a value is set in hash_area_size and workarea_size_policy is AUTO, will Oracle take hash_area_size as the maximum size for the hash table, or will it determine the size itself based on pga_aggregate_target?

3. How can I determine whether the table or result set being hashed fits in the hash area? How do I find the size of the result set or the table to determine the optimal size of the work area?

Tom Kyte

Followup  

November 16, 2007 - 2:16 pm UTC

1) in 9ir2, when using shared server, AUTO is not used. In 10g, when set to auto, auto is used in dedicated and shared servers.

2) no, if you are using auto, hash and sort area sizes are not used, AUTO is used.

3) there is the PGA advisor, and there is SQL tracing.

I found the answer :o)

November 13, 2007 - 9:53 am UTC

Reviewer: karthick pattabiraman from India

Found all the answers by myself. Tom, correct me if I am wrong.

1. When workarea_size_policy is AUTO, does the value in hash_area_size really matter?

ANS: No. When workarea_size_policy is set to AUTO, none of the *_AREA_SIZE parameters will be considered.

2. When a value is set in hash_area_size and workarea_size_policy is AUTO, will Oracle take hash_area_size as the maximum size for the hash table, or will it determine the size itself based on pga_aggregate_target?

ANS: This is answered by the first question itself.


3. How can I determine whether the table or result set being hashed fits in the hash area? How do I find the size of the result set or the table to determine the optimal size of the work area?

ANS: For tuning your pga_aggregate_target value we can use the view V$PGA_TARGET_ADVICE. Got this beautiful query from the doc:

SELECT round(PGA_TARGET_FOR_ESTIMATE/1024/1024) target_mb,
       ESTD_PGA_CACHE_HIT_PERCENTAGE cache_hit_perc,
       ESTD_OVERALLOC_COUNT
  FROM V$PGA_TARGET_ADVICE;

This will help you determine whether you need to increase pga_aggregate_target. Anyway, after changing the size it needs to be tested. This worked for me :-)
Tom Kyte

Followup  

November 16, 2007 - 2:23 pm UTC

1) sort of, see above.


_hash_join_enabled Parameter

November 19, 2007 - 12:02 am UTC

Reviewer: Saurabh Khurana from USA

Hi Tom,
I am stuck with a puzzle and have spent a lot of time on it, but have no clue so far. I have an application background, and it is only this incident that led me to dig deep into Oracle.

Anyway, one of our customers reported huge performance problems, and after some investigation the DBAs I work with asked them to set the obsolete parameter _hash_join_enabled to false and one more undocumented parameter, _optimizer_cost_model, to 'io'. Apparently it solved their system-wide problem and the customer was happy, but I wasn't.

I was not satisfied with the fact that we disabled hash joins altogether. There has to be a reason why Oracle made this parameter obsolete; if it is obsolete, why use it? None of the DBAs had a satisfying answer. This led me to pursue an answer myself, and that's when I got your books and jumped into Oracle. I have been trying all sorts of experiments to find an answer. The last 3 weeks have been spent in search of enlightenment, but in vain :(. This forum is my last hope!

I have failed to understand the rationale behind what improved the performance. Why were hash joins detrimental to the performance? And if they were, why didn't the optimizer choose sort merge in the first place? Here are some facts.

1. Oracle version is 10.2.0.3.
2. The memory management is automatic and the aggregate PGA is set to 1.2G, which seems to be OK.
3. The target SGA is 3.5G.
4. When hash joins are disabled the optimizer does sort merges mostly.
5. With hash joins the LIOs are almost 20-30 times higher. The LIO for the queries is pretty high: 1M consistent gets with hash joins disabled, 20M otherwise.
6. The statistics are up to date.
7. The tables in question range from 1M to 20M rows.
8. The cost of the queries in question becomes 30% of the current cost when hash joins are enabled, but they are 5-50 times slower!!

Any help or any lead would really be appreciated.

Regards,
Saurabh

HASH JOIN RIGHT SEMI

November 21, 2007 - 5:07 am UTC

Reviewer: Karthick. from India

Normally in an explain plan I get HASH JOIN, but for one SQL statement I am getting HASH JOIN RIGHT SEMI. What does that mean? Is it anything different from HASH JOIN?

_hash_join_enabled Parameter obsolete

January 28, 2009 - 5:10 pm UTC

Reviewer: Peter from Slovakia

Hello,

I face similar issues as Saurabh Khurana.
I cannot set "_hash_join_enabled"=false for already running sessions, so I must kill them when non-optimal hash join queries occur.

I cannot figure out why the explain plan in 10.2.0.3 on big tables always prefers a hash join, resulting in XX more time than the executions in 9i.

I can easily use a hint with nested loops, or kill the session and issue "_hash_join_enabled"=false, which turns the queries into microseconds immediately.

However, I cannot figure out why 10.2.0.3 goes this dumb way with hash joins in the explain plan even when all kinds of statistics on the affected BIG tables are calculated (with histograms in any possible combination, without them, etc.) and the query asks for a few records where an index exists.

I do not put a specific query here; I have seen this behaviour on XX databases on many machines with many different SQLs (just one thing is common for that issue sending the optimizer the wrong way: "OR" is used in such queries).

Any hint?

Peter.


Tom Kyte

Followup  

January 30, 2009 - 12:24 pm UTC

give me an example. show us a tkprof (it should have the ROW SOURCE OPERATION section AND the explain plan section - both of them) that shows that when the hash join is chosen - it is XX more time and a query that was forced to use nested loops and an index was XX times faster.

And provide your non-default (ONLY THE NON-DEFAULT) init.ora settings; you can query v$parameter (column ISDEFAULT) to find them.

And provide whether you intended to get the first rows as fast as possible or whether you wanted to get all rows.

Hash join is most certainly "not a dummy approach" in any way, shape or form. My guess is one of:

a) you have some wickedly poor init.ora non default parameters
b) your statistics are way out of date
c) you are running an interactive query that you want to get the first rows as soon as possible - but you are using all_rows optimization


see:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6749454952894#6760861174154

for example. That query there - it would take 5-6 seconds to return the first record using the hash join, but 10 seconds to get all of them. Whereas the nested loops join would return the first record INSTANTLY but - as you can see - it would take slightly longer to get to the last one.
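The first-row versus last-row trade-off Tom describes can be seen in a toy sketch, with Python generators standing in for row sources (illustrative only, not Oracle's code): the nested-loops join yields its first row after touching a single outer row, while the hash join cannot emit anything until the entire build input has been consumed.

```python
def nested_loops_join(outer, index, key):
    """Streams: each outer row is probed against 'index' (a dict
    standing in for an index lookup) and matches are yielded at once."""
    for row in outer:
        for match in index.get(row[key], []):
            yield {**row, **match}

def hash_join(build, probe, key):
    """Blocks: the whole build input is consumed before row one."""
    table = {}
    for row in build:                      # build phase, no output yet
        table.setdefault(row[key], []).append(row)
    for row in probe:                      # output starts only here
        for match in table.get(row[key], []):
            yield {**row, **match}

dept = [{"deptno": d, "dname": f"D{d}"} for d in (10, 20, 30)]
emp = [{"empno": i, "deptno": 10} for i in range(5)]

build_src = iter(dept)
hj = hash_join(build_src, emp, "deptno")
next(hj)                          # first row forces the full build
assert list(build_src) == []      # build input is already fully consumed

outer_src = iter(emp)
nl = nested_loops_join(outer_src, {10: [dept[0]]}, "deptno")
next(nl)                          # first row after only ONE outer row
assert len(list(outer_src)) == 4  # four outer rows still unread
```

That is exactly the interactive versus batch distinction: for "first rows fast" the streaming plan wins; for "all rows fast" paying the build cost up front usually wins.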

February 03, 2009 - 12:03 pm UTC

Reviewer: Reader

Tom,
Are there any other presentations like the one below?

http://asktom.oracle.com/tkyte/hj/hj.html
Tom Kyte

Followup  

February 03, 2009 - 12:10 pm UTC

nope, I did that once so far and it took a lot of work...

need advices on HASH JOIN

February 08, 2009 - 7:56 am UTC

Reviewer: Mike from LA

Tom,
I have 10gR2. The SGA and PGA info:
sql> @showpga

NAME                                VALUE
------------------------------ --------------
session uga memory                    921,776
session uga memory max              1,518,856
session pga memory                  1,569,048
session pga memory max              1,896,728
                               --------------
sum                                 5,906,408

sql>
sql> @vsga


NAME                          VALUE
-------------------- --------------
Database Buffers      2,248,146,944
Fixed Size                2,242,736
Redo Buffers             10,813,440
Variable Size         2,033,764,176
                     --------------
sum                   4,294,967,296

The tables' info in the query:

  NUM_ROWS TABLE_NAME               LAST_ANALYZED
---------- ------------------------ -------------
     54470 PA_PROJECTS_ALL          08-FEB-09
   2104470 PA_TASKS                 08-FEB-09
   5420270 PA_RESOURCE_ASSIGNMENTS  08-FEB-09
    119610 PA_BUDGET_VERSIONS       08-FEB-09

The query shown below ran more than 2 hours to return 1,263,880 records.

I ran it as:
01:25:10 sql>> set autotrace trace
01:25:22 sql>> SELECT
01:25:32 2 'PRJ_'||UPPER(P.SEGMENT1),
01:25:32 3 'PRJ_'||UPPER(P.SEGMENT1)||'_TSK_'||UPPER(T.TASK_NUMBER),
01:25:32 4 UPPER('ACTIVITY '||P.SEGMENT1||', '||T.TASK_NUMBER||' - '||T.DESCRIPTION),
01:25:32 5 UPPER(P.SEGMENT1||', '||T.TASK_NUMBER||' - '||T.DESCRIPTION),
01:25:32 6 UPPER(P.SEGMENT1||', '||T.TASK_NUMBER)
01:25:32 7 FROM PA_PROJECTS_ALL P
01:25:32 8 , PA_TASKS T
01:25:32 9 , PA_RESOURCE_ASSIGNMENTS A
01:25:32 10 , PA_BUDGET_VERSIONS B
01:25:32 11 WHERE P.PROJECT_ID = T.PROJECT_ID
01:25:32 12 AND T.TASK_ID <> T.PARENT_TASK_ID
01:25:32 13 AND T.PARENT_TASK_ID IS NOT NULL
01:25:32 14 AND P.PROJECT_ID = B.PROJECT_ID
01:25:32 15 AND P.PROJECT_ID = A.PROJECT_ID
01:25:32 16 AND T.TASK_ID = A.TASK_ID
01:25:32 17 AND B.BUDGET_VERSION_ID = A.BUDGET_VERSION_ID
01:25:32 18 AND B.BUDGET_STATUS_CODE = 'B'
01:25:32 19 AND B.BUDGET_TYPE_CODE = 'Current'
01:25:32 20 AND B.CURRENT_FLAG = 'Y'
01:25:32 21 /

1263880 rows selected.


Execution Plan
----------------------------------------------------------

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 106 | 25304 |
| 1 | NESTED LOOPS | | 1 | 106 | 25304 |
| 2 | HASH JOIN | | 12 | 636 | 25280 |
| 3 | HASH JOIN | | 9968 | 350K| 3579 |
| 4 | TABLE ACCESS FULL | PA_BUDGET_VERSIONS | 9968 | 223K| 3109 |
| 5 | VIEW | index$_join$_001 | 54470 | 691K| 469 |
| 6 | HASH JOIN | | | | |
| 7 | INDEX FAST FULL SCAN | PA_PROJECTS_U1 | 54470 | 691K| 145 |
| 8 | INDEX FAST FULL SCAN | PA_PROJECTS_U2 | 54470 | 691K| 321 |
| 9 | INDEX FAST FULL SCAN | PA_RESOURCE_ASSIGNMENTS_U2 | 5420K| 87M| 21615 |
| 10 | TABLE ACCESS BY INDEX ROWID| PA_TASKS | 1 | 53 | 2 |
| 11 | INDEX UNIQUE SCAN | PA_TASKS_U1 | 1 | | 1 |
-------------------------------------------------------------------------------------------



Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4668610 consistent gets
460575 physical reads
10220 redo size
77725800 bytes sent via SQL*Net to client
884947 bytes received via SQL*Net from client
126389 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1263880 rows processed

04:02:44 sql>>

It ran for about 2.5 hrs.

Then I tried to force the hash join, since we have a huge SGA and PGA.
sql>> set time on
02:31:59 sql>> set autotrace trace
02:32:28 sql>>
02:32:28 sql>> SELECT /*+ use_hash(p t) */
02:32:41 2 'PRJ_'||UPPER(P.SEGMENT1),
02:32:41 3 'PRJ_'||UPPER(P.SEGMENT1)||'_TSK_'||UPPER(T.TASK_NUMBER),
02:32:41 4 UPPER('ACTIVITY '||P.SEGMENT1||', '||T.TASK_NUMBER||' - '||T.DESCRIPTION),
02:32:41 5 UPPER(P.SEGMENT1||', '||T.TASK_NUMBER||' - '||T.DESCRIPTION),
02:32:42 6 UPPER(P.SEGMENT1||', '||T.TASK_NUMBER)
02:32:42 7 FROM PA_PROJECTS_ALL P
02:32:42 8 , PA_TASKS T
02:32:42 9 , PA_RESOURCE_ASSIGNMENTS A
02:32:42 10 , PA_BUDGET_VERSIONS B
02:32:42 11 WHERE P.PROJECT_ID = T.PROJECT_ID
02:32:42 12 AND T.TASK_ID <> T.PARENT_TASK_ID
02:32:42 13 AND T.PARENT_TASK_ID IS NOT NULL
02:32:42 14 AND P.PROJECT_ID = B.PROJECT_ID
02:32:42 15 AND P.PROJECT_ID = A.PROJECT_ID
02:32:42 16 AND T.TASK_ID = A.TASK_ID
02:32:42 17 AND B.BUDGET_VERSION_ID = A.BUDGET_VERSION_ID
02:32:42 18 AND B.BUDGET_STATUS_CODE = 'B'
02:32:42 19 AND B.BUDGET_TYPE_CODE = 'Current'
02:32:42 20 AND B.CURRENT_FLAG = 'Y'
02:32:42 21 /

1263880 rows selected.


Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 106 | 42350 |
| 1 | HASH JOIN | | 1 | 106 | 42350 |
| 2 | HASH JOIN | | 8 | 424 | 25280 |
| 3 | HASH JOIN | | 9968 | 350K| 3579 |
| 4 | TABLE ACCESS FULL | PA_BUDGET_VERSIONS | 9968 | 223K| 3109 |
| 5 | VIEW | index$_join$_001 | 54470 | 691K| 469 |
| 6 | HASH JOIN | | | | |
| 7 | INDEX FAST FULL SCAN| PA_PROJECTS_U1 | 54470 | 691K| 145 |
| 8 | INDEX FAST FULL SCAN| PA_PROJECTS_U2 | 54470 | 691K| 321 |
| 9 | INDEX FAST FULL SCAN | PA_RESOURCE_ASSIGNMENTS_U2 | 5420K| 87M| 21615 |
| 10 | TABLE ACCESS FULL | PA_TASKS | 1837K| 92M| 17041 |
----------------------------------------------------------------------------------------



Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
535322 consistent gets
355917 physical reads
772 redo size
79117543 bytes sent via SQL*Net to client
884948 bytes received via SQL*Net from client
126389 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1263880 rows processed

04:48:07 sql>>

It still ran for 2 hrs.

Based on the info presented, I would like your advice on how to improve this.
TIA


Tom Kyte

Followup  

February 09, 2009 - 6:33 pm UTC

pretty much nothing above helps, but a single tkprof with a row source operation section and a WAIT EVENTS section would.

excellent

July 27, 2009 - 10:08 am UTC

Reviewer: jv from India

Dear Tom,
good day to you as always. I want to thank you once again for sharing your knowledge and expertise on Oracle DB with people around. Also, can you please point to any other presentations you have along these lines?

Thanks a ton for all your support to Oracle DB users.


Regards,
your fan.

hash join flash presentation

December 01, 2009 - 8:25 am UTC

Reviewer: George from belfast

Tom, the links to your fantastic hash join flash presentation no longer work.

Is there any plan to make these available again? I'd love to point some developers and DBAs at work towards them.

Thanks





Tom Kyte

Followup  

December 01, 2009 - 10:37 am UTC

I'll have to look into how I can do that. I don't have a file system anymore and couldn't figure out the syntax for flash when the 'file' was in a different location (in less than 60 seconds anyway...)

I'll see what I can do.

hash_join performance

February 04, 2010 - 4:49 pm UTC

Reviewer: A Reader

Hi Tom
Thanks for your time.
Oracle version - 9.2.0.8

SQL> set pagesize 123
SQL> set linesize 200
SQL> set timing on
SQL> set autotrace traceonly
SQL>  SELECT DISTINCT T1.objid FROM tpi T1
  2    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  3   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  4   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00','dd/mm/yyyy hh24:mi:ss') ) AND
  5   (  T3.id LIKE  'Fail - Fault Found' )
  6  /

no rows selected

Elapsed: 00:04:36.16

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=96373 Card=1780  Bytes=94340)

   1    0   SORT (UNIQUE) (Cost=96373 Card=1780 Bytes=94340)
   2    1     HASH JOIN (Cost=96356 Card=1780 Bytes=94340)
   3    2       TABLE ACCESS (FULL) OF 'tp' (Cost=3 Card=304 Bytes=2128)

   4    2       HASH JOIN (Cost=96352 Card=1784 Bytes=82064)
   5    4         TABLE ACCESS (FULL) OF 'tgi' (Cost=93053 Card=4191 Bytes=117348)

   6    4         TABLE ACCESS (FULL) OF 'tpi' (Cost=2573   Card=1669159 Bytes=30044862)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    1641882  consistent gets
    1641500  physical reads
       1304  redo size
        300  bytes sent via SQL*Net to client
        461  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>
SQL> !more trace.sql
alter session set timed_statistics=true;
alter session set events '10046 trace name context forever , level 12';
SQL>
SQL> @trace
Session altered.
Elapsed: 00:00:00.00
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL>  SELECT DISTINCT T1.objid FROM tpi T1
  2    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  3   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  4   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00','dd/mm/yyyy hh24:mi:ss') ) AND
  5   (  T3.id LIKE  'Fail - Fault Found' )
  6  /
no rows selected
Elapsed: 00:07:13.37

TKPROF-
********************************************************************************

 SELECT DISTINCT T1.objid FROM tpi T1
  INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
 INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
 WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00','dd/mm/yyyy hh24:mi:ss') ) AND
 (  T3.id LIKE  'Fail - Fault Found' )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    110.65     423.19    1641675    1642129          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    110.66     423.19    1641675    1642129          0           0

Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42  (USER)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT UNIQUE
      0   HASH JOIN
    925    TABLE ACCESS FULL tp
      0    HASH JOIN
      0     TABLE ACCESS FULL tgi
      0     TABLE ACCESS FULL tpi


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: FIRST_ROWS
      0   SORT (UNIQUE)
      0    HASH JOIN
    925     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'tp'
      0     HASH JOIN
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF
                 'tgi'
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'tpi'


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file sequential read                       202        0.02          0.54
  db file scattered read                      51359        0.70        301.97
  latch free                                   1094        0.10          3.34

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT UNIQUE
      0   HASH JOIN
    925    TABLE ACCESS FULL tp
      0    HASH JOIN
      0     TABLE ACCESS FULL tgi
      0     TABLE ACCESS FULL tpi

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: FIRST_ROWS
      0   SORT (UNIQUE)
      0    HASH JOIN
    925     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'tp'
      0     HASH JOIN
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF
                 'tgi'
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'tpi'

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       1        0.00          0.00
  db file sequential read                       202        0.02          0.54
  db file scattered read                      51359        0.70        301.97
  latch free                                   1094        0.10          3.34
  SQL*Net message from client                     1       48.54         48.54
  row cache lock                                  2        0.94          0.94
********************************************************************************
Trace file has many
.....
WAIT #1: nam='db file sequential read' ela= 38 p1=33 p2=21361 p3=1
.....

EXPLAIN PLAN-
SQL> set autotrace off
SQL> explain plan for
  2   SELECT DISTINCT T1.objid FROM tpi T1
  3    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  4   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  5   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00','dd/mm/yyyy hh24:mi:ss') ) AND
  6   (  T3.id LIKE  'Fail - Fault Found' )
  7  /
Explained.
Elapsed: 00:00:00.01
SQL> @xplan
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------
| Id  | Operation            |  Name             | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |  1780 | 94340 | 96373 |
|   1 |  SORT UNIQUE         |                   |  1780 | 94340 | 96373 |
|*  2 |   HASH JOIN          |                   |  1780 | 94340 | 96356 |
|*  3 |    TABLE ACCESS FULL | tp                |   304 |  2128 |     3 |
|*  4 |    HASH JOIN         |                   |  1784 | 82064 | 96352 |
|*  5 |     TABLE ACCESS FULL| tgi               |  4191 |   114K| 93053 |
|*  6 |     TABLE ACCESS FULL| tpi               |  1669K|    28M|  2573 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."PROC_INST2PROCESS"="T2"."OBJID")
   3 - filter("T2"."TYPE"=0)
   4 - access("T1"."OBJID"="T3"."GROUP2PROC_INST")
   5 - filter("T3"."ID" LIKE 'Fail - Fault Found' AND
              "T3"."GROUP2PROC_INST" IS NOT NULL)
   6 - filter("T1"."START_TIME">=TO_DATE(' 2009-02-04 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Note: cpu costing is off
25 rows selected.
Elapsed: 00:00:00.93
SQL>
 select count(*) from   tgi  "T3"
 43064766
 select count(*) from   tpi "T1"
 1784438
 select count(*) from   tp  "T2"
 879

Indexing scheme:
"T1"."PROC_INST2PROCESS" - Indexed
"T2"."OBJID" - Primary Key Index
"T1"."OBJID" - Primary Key Index
"T3"."GROUP2PROC_INST" - Indexed

Referenced col sizes:

 select column_name,avg_col_len  from user_tab_columns where table_name='TPI'
  and  column_name in ( upper('proc_inst2process'), upper('START_time'), upper('objid'))

 
COLUMN_NAME  AVG_COL_LEN
PROC_INST2PROCESS 4
OBJID   6
START_TIME  8

 select column_name,avg_col_len  from user_tab_columns where table_name='TP'
  and  column_name in ( upper('type'),  upper('objid'))
COLUMN_NAME AVG_COL_LEN
TYPE  3
OBJID  4

 select column_name,avg_col_len  from user_tab_columns where table_name='TGI'
  and  column_name in ( upper('group2proc_inst'),  upper('id'))
COLUMN_NAME AVG_COL_LEN
ID  26
GROUP2PROC_INST 2


SQL> show parameter hash_
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
hash_area_size                       integer     1048576
hash_join_enabled                    boolean     TRUE
SQL> show parameter work_area
SQL> show parameter work

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
workarea_size_policy                 string      manual
SQL>


Questions:
a) Is a hash_area_size of 1M not sufficient?
b) Why is the first hash join between tpi and tgi (I mean, table tp has fewer rows)?
c) How to fix the issue?
d) Table and index stats are 14 days old. The bigger 2 tables are subject to huge inserts; stats are gathered every 14 days
[exec dbms_stats.gather_schema_stats(ownname=>'OWNER', degree=>2, cascade=>true, options=>'gather');]
(... as the big table will always remain big and the small will always stay small...?)
   
Regards




Tom Kyte

Followup  

February 04, 2010 - 6:37 pm UTC

a) it looks OK, I don't see direct reads/writes to/from temp - do you? Mostly I see db file scattered reads (physical IO's done for the full scan) taking time - you are reading a lot of data from disk.

b) you have to look at the rows flowing out of a step, not into it, for this. Only 1,780 rows come out of that first join (as far as we know).

c) what issue is there? You are doing a lot of physical IO, unless you can make physical IO faster.....

are the estimated row counts and costs seeming reasonable to you? in the explains? are they close to what reality would be?
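For anyone sanity-checking those estimates by hand: the usual textbook starting point for equi-join cardinality (real optimizers refine it with filter selectivities, histograms, etc.) is |A| * |B| / max(NDV of the join column on each side). A small sketch with round, made-up numbers; it will not reproduce the plans' exact card values:

```python
def est_join_rows(rows_a: int, rows_b: int, ndv_a: int, ndv_b: int) -> int:
    """Textbook equi-join cardinality estimate:
    |A| * |B| / max(number of distinct join-key values on each side)."""
    return round(rows_a * rows_b / max(ndv_a, ndv_b))

# A large input does not imply a large join output: if the join column
# is nearly unique on the large side, each of the few rows on the small
# side matches roughly one row.
small_output = est_join_rows(1_700_000, 4_200, 1_700_000, 4_200)
```

Here `small_output` is only a few thousand rows, the same order of magnitude as the plans above, even though one input has millions of rows; that is why the optimizer treats the join's *output* as small.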


.. hash_join performance...contd

February 04, 2010 - 5:04 pm UTC

Reviewer: A Reader

Tom,
...continued question
e) It looks like table "TGI" is the "build table" (114K bytes) and table "TPI" is the "probe table" (28M bytes). Will there be a multi-pass because of the huge table "TPI", as we have just a 1M hash_area_size?

Regards

Tom Kyte

Followup  

February 04, 2010 - 6:39 pm UTC

... Will there be multi-pass because of huge table
...

no, because the estimated numbers of rows are small - the data after the scan is not that large.
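For context on what a "pass" is: when the build input does not fit in the work area, the classic fallback (which Oracle's one-pass hash join follows in spirit) is to hash-partition both inputs so that each build partition fits, then join the matching partition pairs independently. A toy sketch, with in-memory lists standing in for temp segments:

```python
def partitioned_hash_join(build, probe, key, build_rows_per_partition):
    """Toy one-pass hash join: split BOTH inputs by hash of the join key
    (a real database writes the partitions to temp), then do an ordinary
    in-memory hash join on each matching pair of partitions."""
    nparts = max(1, -(-len(build) // build_rows_per_partition))  # ceil
    build_parts = [[] for _ in range(nparts)]
    probe_parts = [[] for _ in range(nparts)]
    for row in build:
        build_parts[hash(row[key]) % nparts].append(row)
    for row in probe:
        probe_parts[hash(row[key]) % nparts].append(row)
    result = []
    for bp, pp in zip(build_parts, probe_parts):
        table = {}                      # this partition's hash table
        for row in bp:
            table.setdefault(row[key], []).append(row)
        for row in pp:
            result.extend({**m, **row} for m in table.get(row[key], []))
    return result

build = [{"k": i, "b": i * 10} for i in range(10)]
probe = [{"k": i % 10, "p": i} for i in range(30)]
rows = partitioned_hash_join(build, probe, "k", 3)   # 4 partitions
```

"Multipass" is what happens when even a single partition's build side exceeds memory and partition data must be re-read more than once; the point above is that the estimated build here is small enough that none of this is expected.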

.. hash_join performance...contd

February 06, 2010 - 5:14 pm UTC

Reviewer: A Reader

Tom,

...c) what issue is there? You are doing a lot of physical IO, unless you can make physical IO faster.

I found some low-hanging fruit here.

...
5 - filter("T3"."ID" LIKE 'Fail - Fault Found' AND
              "T3"."GROUP2PROC_INST" IS NOT NULL)
...

The "T3"."GROUP2PROC_INST" IS NOT NULL predicate is being filtered. We don't have any such direct predicate testing for NOT NULL, but the optimizer is doing the right thing here, as we have around 90% NULL values in "T3"."GROUP2PROC_INST".

The existing normal B*Tree index on "T3"."GROUP2PROC_INST" cannot help here, since entirely-NULL keys are not stored in a B*Tree index and NULLs are being checked.


Therefore,

Created a function-based index on tgi(GROUP2PROC_INST, 0)

and then 

SQL> delete from plan_table;

7 rows deleted.

SQL> commit;

Commit complete.

SQL> explain plan for
  2   SELECT DISTINCT T1.objid FROM tpi T1
  3    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  4   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  5   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00', 'dd/mm/yyyy hh24:mi:ss' ) ) AND
  6   (  T3.id LIKE  'Fail - Fault Found'  )
  7  /

Explained.

SQL> @xplan

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------
| Id  | Operation                      |  Name             | Rows  | Bytes | Cost  |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                   |   605 | 32065 |  1110 |
|   1 |  SORT UNIQUE                   |                   |   605 | 32065 |  1110 |
|*  2 |   HASH JOIN                    |                   |   605 | 32065 |  1103 |
|*  3 |    TABLE ACCESS FULL           | tp                |   277 |  1939 |     3 |
|*  4 |    HASH JOIN                   |                   |   606 | 27876 |  1099 |
|*  5 |     TABLE ACCESS BY INDEX ROWID| tgi               |  1682 | 47096 |    59 |
|*  6 |      INDEX FULL SCAN           | FB_TGI_G2PI_NDXA  |     1 |       |    26 |
|*  7 |     TABLE ACCESS FULL          | tpi               |   518K|  9116K|   941 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."PROC_INST2PROCESS"="T2"."OBJID")
   3 - filter("T2"."TYPE"=0)
   4 - access("T1"."OBJID"="T3"."GROUP2PROC_INST")
   5 - filter("T3"."ID" LIKE 'Fail - Fault Found')
   6 - filter("T3"."GROUP2PROC_INST" IS NOT NULL)
   7 - filter("T1"."START_TIME">=TO_DATE(' 2009-02-04 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

Note: cpu costing is off

26 rows selected.

SQL>

SQL> set autotrace traceonly
SQL>
SQL>  SELECT DISTINCT T1.objid FROM tpi T1
  2    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  3   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  4   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00', 'dd/mm/yyyy hh24:mi:ss' ) ) AND
  5   (  T3.id LIKE  'Fail - Fault Found'  )
  6  .
SQL> set timing on
SQL>  SELECT DISTINCT T1.objid FROM tpi T1
  2    INNER JOIN tp T2 ON T1.proc_inst2process = T2.objid
  3   INNER JOIN tgi T3 ON T1.objid = T3.group2proc_inst
  4   WHERE  (  T2.type =  0 ) AND (  T1.start_time >=  to_date('04/02/2009 00:00:00', 'dd/mm/yyyy hh24:mi:ss' ) ) AND
  5   (  T3.id LIKE  'Fail - Fault Found'  )
  6  /

no rows selected

Elapsed: 00:00:00.05

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1110 Card=605 Bytes=32065)

   1    0   SORT (UNIQUE) (Cost=1110 Card=605 Bytes=32065)
   2    1     HASH JOIN (Cost=1103 Card=605 Bytes=32065)
   3    2       TABLE ACCESS (FULL) OF 'tp' (Cost=3 Card=277 Bytes=1939)

   4    2       HASH JOIN (Cost=1099 Card=606 Bytes=27876)
   5    4         TABLE ACCESS (BY INDEX ROWID) OF 'tgi' (Cost=59 Card=1682 Bytes=47096)

   6    5           INDEX (FULL SCAN) OF 'FB_TGI_G2PI_NDXA' (NON-UNIQUE) (Cost=26 Card=1)

   7    4         TABLE ACCESS (FULL) OF 'tpi' (Cost=941 Card=518632 Bytes=9335376)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         34  consistent gets
         18  physical reads
          0  redo size
        300  bytes sent via SQL*Net to client
        460  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>


Kindly comment.
Regards


.. hash_join performance...contd

February 06, 2010 - 5:43 pm UTC

Reviewer: A Reader

Tom,

The explain plan and trace output above are from a test environment,
hence the differences in row counts, bytes, etc. compared to what was posted earlier.


Regards

URLs not working

February 11, 2010 - 4:43 pm UTC

Reviewer: Ravi from Bay Area, CA

Hello Tom,

The following URLs are not working. It would be really great if you could restore them.

http://asktom.oracle.com/tkyte/hj/hj.html
http://asktom.oracle.com/tkyte/hj/hj.exe

Thanks,
Ravi

Hash Join

February 14, 2010 - 5:18 pm UTC

Reviewer: Bob from UK

Hi Tom,

I was reading a document from Don Burleson about hash joins, in which he states the following:

"Unfortunately, the Oracle hash join is more memory intensive than a nested loop join. To be faster than a nested loop join, we must set the hash_area_size large enough to hold the entire hash table in memory (about 1.6 times the sum of the rows in the table)."

http://www.dba-oracle.com/tips_oracle_hash_joins.htm

My question: is it true that we should multiply the sum of the rows in the table by 1.6? If so, what's the reasoning behind that?

Also, any ideas what the following hints (quoted from the above URL) do?

NOREWRITE FULL USE_HASH ORDERED
Tom Kyte

Followup  

February 16, 2010 - 4:54 pm UTC

... My question: is it true that we should multiply the sum of the rows in the
table by 1.6? If so, what's the reasoning behind that?
...


I'd have to say "ask the author" - there should be some way to contact him on his site - right?


There are lots of questions that article should prompt.

When I give my seminar, I start with:

when reading information on the interweb thing - always - ALWAYS - look for these three things:

a) a date on the document, something written in 2010 is probably more relevant than something written in 2000 - unless you are using software written in 2000 in which case the converse is true. Therefore, you NEED some date context.

b) a version, what version is the author talking about. I give many examples of tuning advice that was correct 10 years ago but either doesn't make sense today (doesn't affect performance, just makes things harder to understand or code) or is counterproductive (is slower than the 'untuned' approach used to be). If you read a paper on 11g and you are using 9i - it might NOT apply to you - and vice versa

c) and most importantly a place to followup, to question, to review, to add your "yeah but", to add your "what about ifs". If you cannot do that, I say ignore the stuff - it might be correct, it might not be, it might have been pointed out to the author 1,000,000 times it is wrong - but they don't do anything. In short, if you cannot comment/review it - ignore it.



All of those hints are documented in the documentation (which is dated, versioned, and available to be commented on!)

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10592/sql_elements006.htm#sthref493

March 19, 2010 - 9:33 am UTC

Reviewer: Bipul from London, UK

Hi Tom,

Looks like the following URLs have been moved.

http://asktom.oracle.com/tkyte/hj/hj.html
http://asktom.oracle.com/tkyte/hj/hj.exe

Can you please let us know the new link.

thanks
-bipul

April 04, 2010 - 6:36 am UTC

Reviewer: A reader

hi Tom,

The link not work, please restore it.


Thanks.

April 12, 2010 - 5:02 am UTC

Reviewer: Nidhi from India

Hey Tom,

Can you please make those link available.
Tom Kyte

Followup  

April 13, 2010 - 9:13 am UTC

I don't have time to get the swf stuff working, I don't have a file system to host on. sorry.

Can't you delegate this task?

April 29, 2010 - 5:30 pm UTC

Reviewer: Dave

to some oracle underling? ;-\

hash joins

May 18, 2010 - 6:01 pm UTC

Reviewer: S from US

I have one query (same SQL ID) in a data warehouse system which has two different plan hash values. The older one has a better execution plan with hash joins, but the second one has started using some indexes.
1. Is this related to data distribution changes (i.e. histograms)?
2. What would be a way to force the query to use the old plan? We are on 10g.
Tom Kyte

Followup  

May 24, 2010 - 10:35 am UTC

1) maybe yes, maybe no. It depends - you don't show us the query, you don't tell us if anything happened between plan1 and plan2 being generated - say gathering statistics for example.

2) insufficient data, we first have to arrive at "why two plans". In a data warehouse - it is rather rare to repeatedly run the same data warehouse query over and over (data warehouse queries are typified by their "we don't run this more than once" quality normally). If you do run this query over and over - have you considered a materialized view so you can run it once and then just query the answer over and over?

Hash Join

June 06, 2010 - 6:34 am UTC

Reviewer: Satwinder from Hyderabad, India

http://asktom.oracle.com/tkyte/hj/hj.html
http://asktom.oracle.com/tkyte/hj/hj.exe

is not working. Looking at the thread, it does seem to be a wonderful presentation.

Can the presentation be posted at my email address please?

Thanks,
Satwinder

join order

July 08, 2010 - 2:27 am UTC

Reviewer: Ravi B from Bay Area,CA

Hi Tom,

For example if we have to join 2 big tables and two small tables in a query, how do we go about creating indexes for the joins?

Is it ok if we let the optimizer do hash joins on the big tables and fast full scans on the two small tables? Is this the only optimal way to execute this kind of query? In general, if multiple tables are involved in a query, how do we go about creating indexes?

Thanks!
Tom Kyte

Followup  

July 08, 2010 - 12:21 pm UTC

if you have to join two big tables and two small tables in a single query

and the question begins and ends there - so the query is:

select * from big_t1, big_t2, small_t1, small_t2
where <just join conditions>


you can index however you want because we won't use any - we'll full scan - hash join and get you the result set much faster than if we used any indexes.




fast full scan is an access path available only to indexes - it does not apply to tables.



join order

July 08, 2010 - 2:02 pm UTC

Reviewer: Ravi B from Bay Area,CA

I have indexes on small tables. So it is doing hash join plus fast full scan on these indexes. So, you do not recommend any indexes on the tables (small or big) as it would do a hash join anyways?

Thanks!
Tom Kyte

Followup  

July 08, 2010 - 2:19 pm UTC

short of using the indexes as smaller versions of the tables (if you have an index that contains ALL of the necessary columns - we can use the index instead of the table) - we'll not be using indexes for that type of query (not be using them in the 'normal' conventional sense - we won't index range scan them in a nested loop)


full scans + hash joins = right approach to get the entire result set.

Index scan during hash operation

July 27, 2010 - 9:57 pm UTC

Reviewer: Ravi B from Bay Area, CA

Hi Tom,

I have a query joining a big table to 4 small tables; it does hash joins + index fast full scans on the small tables. A count(*) on this SQL takes about 9 to 13 secs. Is there a way to optimize this SQL further?

Details:
--------

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
"CORE 10.2.0.4.0 Production"
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

Table counts:
-------------

select count(*) from exedata
count: 3821030

select count(*) from products
count: 413671

select count(*) from editions
count: 8989

select count(*) from versions
count: 222560


SELECT COUNT(*)
FROM exeData ex,
orgs o,
products p,
editions ed,
versions v
WHERE 1=1
AND ex."_rid" != -1
AND ex.orgs_rid = o."_rid"
AND ex.prod_rid = p."_rid"
AND ex.editions_rid = ed."_rid"
AND ex.versions_rid = v."_rid"
and ex."_enddate" IS NULL
and o."_enddate" IS NULL
and p."_enddate" IS NULL
and ed."_enddate" IS NULL
and v."_enddate" IS NULL

EXPLAIN PLAN
-----------


----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 74 | | 20110 (3)|
| 1 | SORT AGGREGATE | | 1 | 74 | | |
|* 2 | HASH JOIN | | 11M| 824M| | 20110 (3)|
|* 3 | INDEX RANGE SCAN | PRODUCTS_BDNA | 77603 | 985K| | 156 (1)|
|* 4 | HASH JOIN | | 7123K| 414M| | 19863 (2)|
|* 5 | INDEX RANGE SCAN | ORGS_BDNA | 9641 | 112K| | 20 (0)|
|* 6 | HASH JOIN | | 4414K| 206M| | 19788 (2)|
|* 7 | INDEX FAST FULL SCAN | EDITIONS_BDNA | 6611 | 59499 | | 7 (0)|
|* 8 | HASH JOIN | | 3933K| 150M| 2672K| 19746 (2)|
|* 9 | INDEX FAST FULL SCAN| VERSIONS_BDNA | 118K| 1273K| | 143 (3)|
|* 10 | TABLE ACCESS FULL | EXEDATA | 3091K| 85M| | 14274 (2)|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("EX"."PROD_RID"="P"."_rid")
3 - access("P"."_enddate" IS NULL)

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------

4 - access("EX"."ORGS_RID"="O"."_rid")
5 - access("O"."_enddate" IS NULL)
6 - access("EX"."EDITIONS_RID"="ED"."_rid")
7 - filter("ED"."_enddate" IS NULL)
8 - access("EX"."VERSIONS_RID"="V"."_rid")
9 - filter("V"."_enddate" IS NULL)
10 - filter("EX"."_enddate" IS NULL AND "EX"."_rid"<>(-1))

Note
-----
- 'PLAN_TABLE' is old version

Tom Kyte

Followup  

July 28, 2010 - 7:40 am UTC

tell me, do you expect the count to be small or large - are the row counts in the plan "nearly correct"

eg: after applying:

10 - filter("EX"."_enddate" IS NULL AND "EX"."_rid"<>(-1))


do you get about 3,091k rows from that table?

and so on.

July 28, 2010 - 12:40 pm UTC

Reviewer: Ravi B from Bay Area,CA

The query would return 3075708 rows after applying "EX"."_enddate" IS NULL AND "EX"."_rid"<>(-1)

Thanks,
Ravi
Tom Kyte

Followup  

July 29, 2010 - 7:48 am UTC

then the plan looks *perfect* to me.

the only thing might be, after looking at:

AND ex."_rid" != -1
AND ex.orgs_rid = o."_rid"
AND ex.prod_rid = p."_rid"
AND ex.editions_rid = ed."_rid"
AND ex.versions_rid = v."_rid"
and ex."_enddate" IS NULL
and o."_enddate" IS NULL
and p."_enddate" IS NULL
and ed."_enddate" IS NULL
and v."_enddate" IS NULL


would be an index on _enddate, versions_rid, editions_rid, prod_rid, orgs_rid, _rid - IF exedata is really "fat" and this skinny index could be used instead of the table.


if you trace with wait events enabled and tkprof that - do you see waits on reads/writes to/from temp? If so, maybe a larger pga could be beneficial.

July 28, 2010 - 1:17 pm UTC

Reviewer: Ravi B from Bay Area,CA

The counts are approximately correct.

I have analyzed the tables, but no change in execution plan or execution time.

I have created the small tables without any indexes. Now the plan shows HASH JOINS + full table scans on all the tables, and the execution time went up to >15 secs.
Tom Kyte

Followup  

July 29, 2010 - 7:51 am UTC

the index fast full scans were good - they were using the indexes as skinny versions of the table - NOT AS INDEXES. One of my suggestions above might have an impact.

But here would be the largest impact: the fastest way to do anything is.....


TO NOT DO IT IN THE FIRST PLACE.


I always look for count(*) queries and ask "why are you doing that, is it truly necessary"

And after I push and push and push - you know what the answer is? "No, I guess we do not really need to do that, it just made us feel good that is all"

So, look at this query long and hard, the best bit of tuning is probably:

/* 

*/



July 29, 2010 - 12:00 pm UTC

Reviewer: Ravi B from Bay Area,CA

Thanks Tom.

I know you hate this (me too); the count(*) is for displaying the total page count for a pagination query. To make matters worse, ALL the business logic is written in Java. NO code in the database - yes, absolutely NO procedures, packages etc. I see hard coded literals all over the place, no binds. ALL the queries are built on the fly. Even the table constraints are enforced in the Java code. When I reverse engineered it, I just got a bunch of entities with no relations at all :) The application treats Oracle as a box to dump and retrieve data, that is it! I am a new hire here and I don't have much choice. In my opinion we would have to rewrite the entire application, but nobody in my company would take that on. Everyone is just looking for a database setting fast = true :)

best regards,
Ravi
Tom Kyte

Followup  

July 29, 2010 - 1:07 pm UTC

funny, I was just updating this EXTREMELY RELEVANT Q&A

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1137577300346084930


they will never find that setting, however, I can find a setting for developed application code that always makes it go much much faster (ok, it is an OS command). Here is my secret:

$ cd /path/to/bad/code
$ rm -rf *

works every time :)

October 11, 2010 - 11:24 am UTC

Reviewer: A reader

Hi Tom,
I was reading Tanel's post below :
http://blog.tanelpoder.com/2010/10/06/a-the-most-fundamental-difference-between-hash-and-nested-loop-joins/

He said -
"When joining table A and B (A being driving table and B being the probed table), then hash joins can not perform index lookups into table B based on the values returned from table A !!!"



My point is - after the *hash table*(a type of data structure) is built - the hash algorithm *will not * touch the probed table again *physically* - so It is not *needed* for the optimizer to touch the second table again - all this is just by 'definition' of hash algorithm itself - so no need to prove it again or even we don't think in those lines that -accessing the probed table again by index or some other way .
Is my point wrong ?

Tom Kyte

Followup  

October 11, 2010 - 1:16 pm UTC

... My point is - ...

usually people say that after they've been asked "what is your point"... I'm not sure if you are a returning person referencing some context on this page or what.


... so no need to prove it again ..


Prove WHAT again?

October 11, 2010 - 1:30 pm UTC

Reviewer: A reader

Sorry Tom if I was not clear -


Here let me write like this .

"Index lookup can not be done on probed table as this is no more a physical table - it is a *hash table* , a data structure built in Our PGA for the rows in the probed table"

Is it correct ?




Tom Kyte

Followup  

October 11, 2010 - 2:16 pm UTC

The phraseology is a bit strange. It is not that "index lookup cannot be done" (because in theory - sure, it could; the index exists, presumably, after all - technically it could be done).

It is that the inefficient index lookup does not HAVE to be done, because we have a hash table of data either in memory or in memory and on disk (it can spill over to temp). Using that hash table we can avoid the inefficiencies of repeated index probes to find rows - we can in a single operation go to the row in the hash table rather than have to do multiple consistent gets on an index structure to find it.
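To make that cost difference concrete, here is a tiny Python sketch (purely illustrative, with made-up DEPT-style data - Oracle's internal structures are nothing like this): an index-style lookup walks a sorted structure with several comparisons per probe, while a hash table resolves each probe in a single step.

```python
import bisect

# Hypothetical build-side rows: (deptno, dname).
dept_rows = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]

# Index-style structure: sorted keys probed by binary search -
# several comparisons per lookup, like repeated index probes.
sorted_rows = sorted(dept_rows)
sorted_keys = [d for d, _ in sorted_rows]

def index_probe(deptno):
    pos = bisect.bisect_left(sorted_keys, deptno)
    if pos < len(sorted_keys) and sorted_keys[pos] == deptno:
        return sorted_rows[pos][1]
    return None

# Hash-style structure: one hash computation per probe, no traversal.
hash_table = dict(dept_rows)

probe_rows = [(7369, 20), (7499, 30), (7839, 10)]
via_index = [(empno, index_probe(deptno)) for empno, deptno in probe_rows]
via_hash = [(empno, hash_table.get(deptno)) for empno, deptno in probe_rows]
assert via_index == via_hash
```

Both probes return the same rows; only the per-probe cost differs, which is what makes the hash approach win when a large number of rows must be joined.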

October 11, 2010 - 2:31 pm UTC

Reviewer: A reader

My words might not have described my understanding correctly - my mistake. I meant to say
"Index look up can not be done on hash table"

Yes, we all know that index look up can be done on the table but hash join wont do it as we have the data in our hash table.
Tom Kyte

Followup  

October 11, 2010 - 2:34 pm UTC

I would rather you say:

"a rather inefficient index lookup does not have to be performed, since the data is nicely organized in a hash table"


Technically, there is no reason there couldn't be an in memory index for the in memory hash table - the point is - we hashed it to avoid any and all indexes due to their inherent inefficiencies for large amounts of data.

Broken links?

October 11, 2010 - 3:41 pm UTC

Reviewer: Gary

The latest links don't appear to work, I get:

Not Found

Sorry!The page requested was not found.

Feel free to scratch this comment. Thanks,

Superb

October 13, 2010 - 1:29 am UTC

Reviewer: Ashish from India

What a great explanation!
You made my day, we love you Tom.

hash joins

November 16, 2010 - 6:58 am UTC

Reviewer: Rakesk

Hi Tom,

The links below are not working. Please do the needful.

http://asktom.oracle.com/tkyte/hj/hj.html

http://asktom.oracle.com/tkyte/hj/hj.exe


where is the result set saved after matching against the hash table?

December 24, 2010 - 12:03 am UTC

Reviewer: tramplai from CN

Hi Tom,

I have a question:
1. In a dedicated server, where is the result set saved after matching against the hash table?
2. In a dedicated server, where is the hash table for a hash join created?



Tom Kyte

Followup  

December 24, 2010 - 12:16 pm UTC

1) we don't have to save the result anywhere - for a simple two table join - we'd read and hash one table into memory (or temp space on disk) and then start reading the second table and sending the result back to the client.

2) in the PGA (program global area) of the server process, spilling into TEMP if needed.

Swap_join_inputs

February 04, 2011 - 3:08 pm UTC

Reviewer: A reader

Tom:

I have the below hint defined in 10.2.0.3

/*+ leading(big_table,a,b,c,d,e) full(big_table) swap_join_inputs(small_table_a) swap_join_inputs(small_table_b) swap_join_inputs(small_table_c) swap_join_inputs(small_table_d) swap_join_inputs(small_table_e) use_hash(a,b,c,d,e) */

This worked as a right-deep join tree - as soon as the scan of BIG_TABLE was done, all the associated joins were done too. We were pretty much done within 10 min for a large result set.

We moved to 11g. The explain plan looks the same; however, now we see a left-deep join tree, and after the scan of big_table we now have one big additional join. Is something wrong with the hint?

Thanks,
Tom Kyte

Followup  

February 06, 2011 - 11:54 am UTC

remove all of the hints and just run the query (assuming you have good, valid, representative statistics). How does it go then?

UbQ (Understanding by Questioning)

June 29, 2011 - 6:49 pm UTC

Reviewer: Zvi from Tel Aviv, ISRAEL

Hi Tom,

Would you agree with the following:

The way Oracle implements "Hash Joins" is by reducing the size of a large table into a multidimensional array within memory, using an algorithm (hash) to initially build and then provide access into array entries.



Tom Kyte

Followup  

June 30, 2011 - 1:29 pm UTC

I would not agree with the word "reduce" since the resultant hash table is not compressed or anything.

I would not agree with multi-dimensional - as this is a simple one dimensional lookup table (hash table). Like an associative array of a single dimension.


We implement a hash join by taking the smaller of two data sources to be joined and creating a hash table (hopefully in memory) and then scanning the larger object and doing a hash lookup for every row from the larger source.
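That build/probe flow can be sketched in a few lines of Python (an illustrative toy with hypothetical DEPT/EMP-style data - not Oracle's actual implementation, which partitions the build input and can spill to TEMP): hash the smaller source once, then stream the larger source and emit matches as they are found.

```python
from collections import defaultdict

def hash_join(small_rows, large_rows, small_key, large_key):
    # Build phase: hash the smaller data source into memory
    # (Oracle would use the server process PGA, spilling to TEMP if needed).
    build = defaultdict(list)
    for row in small_rows:
        build[row[small_key]].append(row)
    # Probe phase: scan the larger source once, doing a single hash
    # lookup per row and streaming joined rows out as they are found.
    for row in large_rows:
        for match in build.get(row[large_key], []):
            yield {**match, **row}

# Hypothetical data, loosely modelled on DEPT and EMP.
dept = [{"deptno": 10, "dname": "ACCOUNTING"}, {"deptno": 20, "dname": "RESEARCH"}]
emp = [{"empno": 7369, "deptno": 20}, {"empno": 7499, "deptno": 30},
       {"empno": 7839, "deptno": 10}]

result = list(hash_join(dept, emp, "deptno", "deptno"))
```

Because rows are yielded as the probe proceeds, the join result itself never has to be materialized anywhere - matching the point above that results simply stream back to the client.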

URL not found

September 18, 2011 - 9:29 am UTC

Reviewer: Mark from USA

Hi Tom,
I'm trying to see your presentation and getting error page when on this url: http://asktom.oracle.com/tkyte/hj/hj.html

Powerpoint of your presentation

March 21, 2012 - 12:24 pm UTC

Reviewer: A reader

Can you please post the powerpoint slides of your presentation? Thank you.
Tom Kyte

Followup  

March 21, 2012 - 10:48 pm UTC

I only have the demo and narrative loaded up.

how to write a simple query which will go for hash join..

April 15, 2012 - 12:15 pm UTC

Reviewer: Karthikeyan.S from India

Hi,

I have a simple Question..

How to force a hash join:
1) Enough PGA/TEMP space
2) Large hash area size, and set the optimizer goal to ALL_ROWS
3) Put the smaller table next to the from clause, making sure Oracle picks the smaller table as the driving table.

but..

how do I write a simple query which will go for a hash join, assuming the init parameters are per standards?

Tom Kyte

Followup  

April 16, 2012 - 12:58 am UTC

3) is "not accurate", the order of tables in the from clause is not relevant to the CBO - it was to the RBO, but the RBO doesn't even know a hash join exists.


2) would not be accurate if 1) is true - if 1) is true (you are using pga_aggregate_target), hash_area_size is not used (except in 9i and before when using shared server - but not really relevant - that would be (1) is not true :) ) - I guess (1) and (2) are mutually exclusive.

to write a simple query that goes for a hash join -

Just code it.

by default the optimizer mode these days is all_rows - and as long as you are using choose (default in older releases for the cbo) or all_rows (current default) - it will hash join naturally, all by itself.

If you wanted to "force it", you could hint it, but I would strongly encourage you not to do that.

hash join -similar to followup above

May 03, 2012 - 7:07 am UTC

Reviewer: A reader

Per the documentation, the NL join method is recommended whenever the join is between a "large" table and a small table. However, the hash join method seems to be better if a lot of data is expected from the "large" table.
Our query without the hint was taking hours to complete, but when we added the use_hash hint it completed much faster.
Similar to the followup above, I was looking for your suggestion so the optimizer picks hash joins in these cases (our statistics are fresh).
Tom Kyte

Followup  

May 03, 2012 - 9:12 am UTC

Not true.

Nested loop joins are typically preferred when you have a small number of rows to be joined to a small number of rows.


Hash joins when you have a large number of rows to be joined to a large number of rows.


The sizes of the underlying tables - not relevant. The expected number of rows - very important.


If you had to hint with "use_hash" then I'm pretty sure the optimizer guessed the cardinality wrong and your approach to fix the query should have been to question why it was guessing wrong and how to get it the correct numbers perhaps by gathering more statistics (histograms for example), more current statistics (you might have stale stats), dynamic sampling, sql profiling, extended statistics and so on.

time elapsed for nested loop and hash join.

July 02, 2012 - 3:49 am UTC

Reviewer: Biswaranjan from India

Hi Tom,

Hope you are fine and doing good.

I ran your very first example for Babloo's question but had the doubt below.
####################################in 10.2.0.1

SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.00
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.37
SQL> select /*+ use_nl(dept emp) */ *
  2    from emp, dept
  3   where emp.deptno = dept.deptno;

9999 rows selected.

Elapsed: 00:00:00.23

Execution Plan
----------------------------------------------------------
Plan hash value: 3487251775

--------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |  9999 |   644K|    35  (43)| 00
|   1 |  NESTED LOOPS                |         |  9999 |   644K|    35  (43)| 00
|   2 |   TABLE ACCESS FULL          | EMP     |  9999 |   419K|    19   (6)| 00
|   3 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     1 |    23 |     1   (0)| 00
|*  4 |    INDEX UNIQUE SCAN         | DEPT_PK |     1 |       |     0   (0)| 00
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")


Statistics
----------------------------------------------------------
        761  recursive calls
          0  db block gets
      10249  consistent gets
         96  physical reads
          0  redo size
     596577  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
       9999  rows processed

SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.00
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL> ed
Wrote file afiedt.buf

  1  select *
  2    from emp, dept
  3*  where emp.deptno = dept.deptno
SQL> /

9999 rows selected.

Elapsed: 00:00:00.25

Execution Plan
----------------------------------------------------------
Plan hash value: 615168685

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  9999 |   644K|    22   (5)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  9999 |   644K|    22   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPT |   500 | 11500 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |  9999 |   419K|    19   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")


Statistics
----------------------------------------------------------
        761  recursive calls
          0  db block gets
        236  consistent gets
        101  physical reads
          0  redo size
     596577  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
       9999  rows processed

SQL> 

############################

above for nested loop execution plan
we got.
10249  consistent gets
96  physical reads

and for hash join execution plan 
we got 
236  consistent gets
101  physical reads

But my doubt is: why did it take nearly the same elapsed time, even though the hash join plan shows far fewer consistent gets?

I also ran with set autotrace off and found the elapsed time was about the same for both plans (14.75 seconds, with arraysize 5000 or arraysize 500).

Thanks as always,
Biswaranjan.

Tom Kyte

Followup  

July 02, 2012 - 8:16 am UTC

because you have a teeny tiny bit of data, at these sizes - any plan would work out OK.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6749454952894#6760861174154

:)

July 02, 2012 - 7:20 pm UTC

Reviewer: Biswaranjan

Thank you :) , I have read that link long back.
thanks again,
Biswaranjan

URLs not working

July 29, 2012 - 6:27 am UTC

Reviewer: Manu Batham from INDIA

Hi Tom,

Can you please post new URLs for these links?

http://asktom.oracle.com/tkyte/hj/hj.html

http://asktom.oracle.com/tkyte/hj/hj.exe

I saw many users asking about the same on tkyte.blogspot.com, but never saw your reply there.
(It would be very helpful if you could replace the old links with new ones.)

Thanks,
Manu
Tom Kyte

Followup  

July 30, 2012 - 11:52 am UTC

https://asktom.oracle.com/pls/apex/z?p_url=ASKTOM%2Edownload_file%3Fp_file%3D66776171704447018265&p_cat=hj.swf&p_company=822925097021874

has the swf file, you can download and play it in your browser.

September 19, 2012 - 11:50 am UTC

Reviewer: Milind from India

With the new link given, I got the flash file for hash joins. However, most of the links on the Files download page are not working - I cannot download any of your presentations. Can you please find out the problem and correct the links if they have changed?
Tom Kyte

Followup  

September 20, 2012 - 7:33 am UTC

I'll have someone look into that, I wasn't aware of the problem, sorry about that.


update:

it is fixed, again, sorry about that!!!

September 20, 2012 - 8:42 am UTC

Reviewer: Milind

Thanks a lot for quick fix of download links. Hoping to meet you in AIOUG at Bangalore in November 2012.

hj link is not available!

December 05, 2012 - 1:33 am UTC

Reviewer: alan tan

Hi,Tom
The link you gave

http://asktom.oracle.com/tkyte/hj/hj.html

is not available now.
Could you kindly fix it or tell me where to get it?

Regards
Alan

January 27, 2013 - 12:27 pm UTC

Reviewer: Abhimaniu from INDIA

Hi Tom

The 2 links in your first reply to this thread are no longer valid. I am talking about the 2 links below:
http://asktom.oracle.com/tkyte/hj/hj.html
http://asktom.oracle.com/tkyte/hj/hj.exe

Could these be uploaded again?

hash join

August 12, 2015 - 11:35 am UTC

Reviewer: ahmed from UK

Tom,
I have a query like the following
WITH cte1 AS (select col1, col2 FROM mtTable WHERE condition)
SELECT t1.col1, t1.col2 FROM cte1 t1, largeTable t2
WHERE t1.col1 = t2.keyAttr;

cte1 returns 0 rows.
Oracle is doing a hash join between cte1 and largeTable,
which means a FULL table scan on largeTable.
Can't Oracle detect that cte1 is empty and skip this join?
Even when I used the dynamic_sampling(t1 4) hint, it still does a HASH JOIN.


URL not working

December 13, 2017 - 7:58 am UTC

Reviewer: Naresh from Mountain View, CA USA

Connor McDonald

Followup  

December 14, 2017 - 8:28 am UTC

Go to Resources => Presentation Downloads and search for "hj"

not able to open hj html or exe file

January 13, 2020 - 7:34 am UTC

Reviewer: Pkjangra from India

could you please update the link for hj.html or hj.exe
I was really looking forward to the illustration and tried opening these files. But it says no such file.
I know the answer is really old, but I am interested in having a look at it if feasible.
Connor McDonald

Followup  

January 14, 2020 - 2:29 am UTC

Go to our downloads section under Resources, and enter "hj" as the criteria

on Hash Join Buffered

June 16, 2020 - 5:33 am UTC

Reviewer: Rajeshwaran, Jeyabal

Team:

What is a hash join buffered? How is it different from a hash join? What are the criteria that influence the optimizer to choose a hash join buffered over a plain hash join?
Connor McDonald

Followup  

June 17, 2020 - 4:13 am UTC

The best description I've seen on that topic is here

https://chinaraliyev.wordpress.com/2019/02/27/the-hash-join-buffered-operation/

on Hash Join Buffered

July 08, 2020 - 5:45 pm UTC

Reviewer: Rajeshwaran, Jeyabal

Team:

When the optimizer performs a "hash join buffered" operation, v$session_longops shows "time_remaining" as null while "elapsed_seconds" keeps increasing,
and the "message" column shows something like "HASH Join: 99695 out of 0 blocks done".

Once that operation is completed, "time_remaining" is set to zero, "elapsed_seconds" shows the time it took, and the message column
shows something like "HASH Join: 336423 out of 336423 blocks done".

So while that operation is in progress, how can we estimate how long it will take to complete and how many more blocks remain to be processed? Kindly advise.
Connor McDonald

Followup  

July 10, 2020 - 4:01 am UTC

That sounds like a potential bug.

I'd encourage you to log an SR on that.

on IN operator and HASH join.

September 15, 2020 - 5:15 pm UTC

Reviewer: Rajeshwaran, Jeyabal

Team:

Is it not possible to have an IN clause (with multiple values) and a HASH join together? Please help us understand why this is not possible.

demo@PDB1> create table t1 as select * from all_objects;

Table created.

demo@PDB1> create table t2 as select * from all_objects;

Table created.

demo@PDB1> set autotrace traceonly exp
demo@PDB1> select *
  2  from t1
  3  left outer join t2
  4  on t1.object_id = t2.object_id
  5  and t1.owner in ('SYS','SCOTT');

Execution Plan
----------------------------------------------------------
Plan hash value: 2654714712

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 | 73068 |    42M|    29M  (1)| 00:18:57 |
|   1 |  MERGE JOIN OUTER     |                 | 73068 |    42M|    29M  (1)| 00:18:57 |
|   2 |   TABLE ACCESS FULL   | T1              | 73068 |  9490K|   399   (1)| 00:00:01 |
|   3 |   BUFFER SORT         |                 |     1 |   481 |    29M  (1)| 00:18:57 |
|   4 |    VIEW               | VW_LAT_C83A7ED5 |     1 |   481 |   398   (1)| 00:00:01 |
|*  5 |     FILTER            |                 |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| T2              |     1 |   133 |   398   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("T1"."OWNER"='SCOTT' OR "T1"."OWNER"='SYS')
   6 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")

demo@PDB1> select /*+ use_hash(t1,t2) */ *
  2  from t1
  3  left outer join t2
  4  on t1.object_id = t2.object_id
  5  and t1.owner in ('SYS','SCOTT');

Execution Plan
----------------------------------------------------------
Plan hash value: 2654714712

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 | 73068 |    42M|    29M  (1)| 00:18:57 |
|   1 |  MERGE JOIN OUTER     |                 | 73068 |    42M|    29M  (1)| 00:18:57 |
|   2 |   TABLE ACCESS FULL   | T1              | 73068 |  9490K|   399   (1)| 00:00:01 |
|   3 |   BUFFER SORT         |                 |     1 |   481 |    29M  (1)| 00:18:57 |
|   4 |    VIEW               | VW_LAT_C83A7ED5 |     1 |   481 |   398   (1)| 00:00:01 |
|*  5 |     FILTER            |                 |       |       |            |          |
|*  6 |      TABLE ACCESS FULL| T2              |     1 |   133 |   398   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("T1"."OWNER"='SCOTT' OR "T1"."OWNER"='SYS')
   6 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")

demo@PDB1> select *
  2  from t1
  3  left outer join t2
  4  on t1.object_id = t2.object_id
  5  and t1.owner in ('SYS') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1823443478

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 73068 |    18M|       |  1802   (1)| 00:00:01 |
|*  1 |  HASH JOIN OUTER   |      | 73068 |    18M|    10M|  1802   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   | 73068 |  9490K|       |   399   (1)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   | 73069 |  9490K|       |   399   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID"(+) AND "T1"."OWNER"=CASE
              WHEN ("T2"."OBJECT_ID"(+) IS NOT NULL) THEN 'SYS' ELSE 'SYS' END )

demo@PDB1>

Connor McDonald

Followup  

September 16, 2020 - 1:47 am UTC


SQL>
SQL> create table t1 as select * from dba_objects;

Table created.

SQL> create table t2 as select * from dba_objects;

Table created.

SQL>
SQL> set feedback only
SQL> select *
  2  from  t1, t2
  3  where t1.object_id = t2.object_id
  4  and t1.owner in ('SYS','SYSTEM');

53013 rows selected.

SQL> set feedback on
SQL>
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  8g969t7t92cmf, child number 0
-------------------------------------
select * from  t1, t2 where t1.object_id = t2.object_id and t1.owner in
('SYS','SYSTEM')

Plan hash value: 1838229974

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   898 (100)|          |
|*  1 |  HASH JOIN         |      |  3868 |   997K|   898   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |  3868 |   498K|   448   (1)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   | 83173 |    10M|   450   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter(("T1"."OWNER"='SYS' OR "T1"."OWNER"='SYSTEM'))


22 rows selected.

SQL>
SQL> set feedback only
SQL> select *
  2  from  t1, t2
  3  where t1.object_id = t2.object_id(+)
  4  and t1.owner in ('SYS','SYSTEM');

53014 rows selected.

SQL> set feedback on
SQL>
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  2pcc78sk5hm0s, child number 0
-------------------------------------
select * from  t1, t2 where t1.object_id = t2.object_id(+) and t1.owner
in ('SYS','SYSTEM')

Plan hash value: 1823443478

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   898 (100)|          |
|*  1 |  HASH JOIN OUTER   |      |  3868 |   997K|   898   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T1   |  3868 |   498K|   448   (1)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   | 83173 |    10M|   450   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   2 - filter(("T1"."OWNER"='SYS' OR "T1"."OWNER"='SYSTEM'))


22 rows selected.

SQL>
SQL> set feedback only
SQL> select *
  2  from  t1 left outer join t2
  3  on t1.object_id = t2.object_id
  4  and t1.owner in ('SYS','SYSTEM');

83172 rows selected.

SQL> set feedback on
SQL>
SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  g161z958p1jtf, child number 0
-------------------------------------
select * from  t1 left outer join t2 on t1.object_id = t2.object_id and
t1.owner in ('SYS','SYSTEM')

Plan hash value: 1823443478

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |       |  2037 (100)|          |
|*  1 |  HASH JOIN OUTER   |      | 86995 |    21M|    11M|  2037   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   | 83172 |    10M|       |   450   (2)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   | 83173 |    10M|       |   450   (2)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
       filter(("T1"."OWNER"=CASE  WHEN ("T2"."OBJECT_ID" IS NOT NULL) THEN
              'SYS' ELSE 'SYS' END  OR "T1"."OWNER"=CASE  WHEN ("T2"."OBJECT_ID" IS NOT
              NULL) THEN 'SYSTEM' ELSE 'SYSTEM' END ))


24 rows selected.

SQL>


on IN operator and HASH join.

September 16, 2020 - 5:28 am UTC

Reviewer: Rajeshwaran, Jeyabal

The above demo was from 12.2, and we still see the same plan. What version did you run this on?

demo@PDB1> set feedback only serveroutput off
demo@PDB1> select /*+ gather_plan_statistics */ *
  2  from t1
  3  left outer join t2
  4  on t1.object_id = t2.object_id
  5  and t1.owner in ('SYS','SCOTT') ;

10000 rows selected.

demo@PDB1> set feedback on
demo@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last') );

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  cfv8ppgx4yukr, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from t1 left outer join t2 on
t1.object_id = t2.object_id and t1.owner in ('SYS','SCOTT')

Plan hash value: 2654714712

---------------------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |      1 |        |  10000 |00:00:08.57 |    1960K|
|   1 |  MERGE JOIN OUTER     |                 |      1 |  10000 |  10000 |00:00:08.57 |    1960K|
|   2 |   TABLE ACCESS FULL   | T1              |      1 |  10000 |  10000 |00:00:00.02 |     261 |
|   3 |   BUFFER SORT         |                 |  10000 |      1 |  10000 |00:00:08.48 |    1960K|
|   4 |    VIEW               | VW_LAT_C83A7ED5 |  10000 |      1 |  10000 |00:00:08.43 |    1960K|
|*  5 |     FILTER            |                 |  10000 |        |  10000 |00:00:08.42 |    1960K|
|*  6 |      TABLE ACCESS FULL| T2              |  10000 |      1 |  10000 |00:00:08.41 |    1960K|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(("T1"."OWNER"='SCOTT' OR "T1"."OWNER"='SYS'))
   6 - filter("T1"."OBJECT_ID"="T2"."OBJECT_ID")


25 rows selected.

demo@PDB1>

Chris Saxon

Followup  

September 16, 2020 - 10:35 am UTC

In 12.2 the optimizer transforms the query to a lateral join, e.g.:
select ...
from "CHRIS"."T1" "T1",
     lateral ((
       select ...
       from "CHRIS"."T2" "T2"
       where "T1"."OBJECT_ID" = "T2"."OBJECT_ID"
       and ("T1"."OWNER" = 'SCOTT' or "T1"."OWNER" = 'SYS')
     )) (+) "VW_LAT_C83A7ED5";


Which is why it's unable to use a hash join. In 19c it transforms this to something like:

select ...
from "CHRIS"."T1"    "T1",
     "CHRIS"."T2"    "T2"
where "T1"."OBJECT_ID" = "T2"."OBJECT_ID" (+)
and   ("T1"."OWNER" = case
        when ("T2"."OBJECT_ID" (+) is not null) then
          'SYS'
        else
          'SYS'
      end
   or "T1"."OWNER" = case
     when ("T2"."OBJECT_ID" (+) is not null) then
       'SCOTT'
     else
       'SCOTT'
   end
)


Which can hash join.

There are various ANSI SQL-related bugs; I'm not sure exactly which one causes this issue.

But there's an easier solution: move the filter from the join to the where clause:

select *
from t1
left outer join t2
on   t1.object_id = t2.object_id
where t1.owner in ('SYS','SCOTT');


Which transforms to the more expected where clause of:

from "CHRIS"."T1"    "T1",
     "CHRIS"."T2"    "T2"
where ("T1"."OWNER" = 'SCOTT'
       or "T1"."OWNER"         = 'SYS')
and "T1"."OBJECT_ID"    = "T2"."OBJECT_ID" (+)


Which again, can use a hash join.

on IN operator and HASH join - from 18c database.

September 16, 2020 - 6:40 am UTC

Reviewer: Rajeshwaran, jeyabal

From an 18c (XE) database: an IN list on a two-table join does work as a hash-based join.
However, when we have more than two tables and a mix of inner and outer joins, it does not go so well.
Please look at the final plan in this demo; the bulk of the logical I/Os come from steps 5, 6, 7 and 8.
What can we do to turn this MERGE JOIN into a hash-based join?

demo#20#@XEPDB1> create table t1 as select * from all_objects where owner in ('SYS','SYSTEM') and rownum <=1000;

Table created.

demo#20#@XEPDB1> create table t2 as select * from all_objects where owner in ('SYS','SYSTEM') and rownum <=1000;

Table created.

demo#20#@XEPDB1> set feedback only serveroutput off
demo#20#@XEPDB1> select /*+ gather_plan_statistics */ *
  2  from t1 left outer join t2 on
  3  t1.object_id = t2.object_id
  4  and t2.owner in ('SYS','SYSTEM');

1000 rows selected.

demo#20#@XEPDB1> set feedback on
demo#20#@XEPDB1>
demo#20#@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  98ykb6awx04yw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from t1 left outer join t2 on
t1.object_id = t2.object_id and t2.owner in ('SYS','SYSTEM')

Plan hash value: 1823443478

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   1000 |00:00:00.01 |      45 |     34 |       |       |          |
|*  1 |  HASH JOIN OUTER   |      |      1 |   1000 |   1000 |00:00:00.01 |      45 |     34 |   824K|   824K| 1328K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |00:00:00.01 |      19 |     17 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T2   |      1 |   1000 |   1000 |00:00:00.01 |      26 |     17 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   3 - filter(("T2"."OWNER"='SYS' OR "T2"."OWNER"='SYSTEM'))


22 rows selected.

demo#20#@XEPDB1>
demo#20#@XEPDB1> set feedback only serveroutput off
demo#20#@XEPDB1> select /*+ gather_plan_statistics */ *
  2  from t1 inner join t2
  3  on  t1.object_id = t2.object_id
  4  left outer join t1 t3
  5  on t1.object_id = t2.object_id
  6  and t2.owner in ('SYS','SYSTEM') ;

1000000 rows selected.

demo#20#@XEPDB1> set feedback on
demo#20#@XEPDB1>
demo#20#@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  ff1k3xz0ht0ja, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from t1 inner join t2 on
t1.object_id = t2.object_id left outer join t1 t3 on t1.object_id =
t2.object_id and t2.owner in ('SYS','SYSTEM')

Plan hash value: 206051411

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |      1 |        |   1000K|00:00:08.54 |   20021 |       |       |          |
|   1 |  MERGE JOIN OUTER     |                 |      1 |   1000K|   1000K|00:00:08.54 |   20021 |       |       |          |
|*  2 |   HASH JOIN           |                 |      1 |   1000 |   1000 |00:00:00.04 |    1021 |   824K|   824K| 1356K (0)|
|   3 |    TABLE ACCESS FULL  | T1              |      1 |   1000 |   1000 |00:00:00.01 |      19 |       |       |          |
|   4 |    TABLE ACCESS FULL  | T2              |      1 |   1000 |   1000 |00:00:00.02 |    1002 |       |       |          |
|   5 |   BUFFER SORT         |                 |   1000 |   1000 |   1000K|00:00:01.35 |   19000 |   196K|   196K|  174K (0)|
|   6 |    VIEW               | VW_LAT_024544ED |   1000 |   1000 |   1000K|00:00:00.77 |   19000 |       |       |          |
|*  7 |     FILTER            |                 |   1000 |        |   1000K|00:00:00.51 |   19000 |       |       |          |
|   8 |      TABLE ACCESS FULL| T1              |   1000 |   1000 |   1000K|00:00:00.33 |   19000 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   7 - filter((("T2"."OWNER"='SYS' OR "T2"."OWNER"='SYSTEM') AND "T1"."OBJECT_ID"="T2"."OBJECT_ID"))


28 rows selected.

demo#20#@XEPDB1>
demo#20#@XEPDB1>

Queries identical?

September 16, 2020 - 10:12 am UTC

Reviewer: A reader

SELECT *
 FROM t1,
      t2
WHERE t1.object_id = t2.object_id(+)
  AND t1.owner IN ('SYS',
                   'SYSTEM')


is

SELECT *
  FROM t1
  LEFT OUTER JOIN t2
    ON t1.object_id = t2.object_id
   WHERE t1.owner IN ('SYS',
                    'SYSTEM')


not

SELECT *
  FROM t1
  LEFT OUTER JOIN t2
    ON t1.object_id = t2.object_id
    ON t1.owner IN ('SYS',
                    'SYSTEM')


Chris Saxon

Followup  

September 16, 2020 - 10:36 am UTC

Indeed.

Correction: ON => AND

September 16, 2020 - 12:26 pm UTC

Reviewer: A reader

SELECT *
  FROM t1
  LEFT OUTER JOIN t2
    ON t1.object_id = t2.object_id
    AND t1.owner IN ('SYS',
                    'SYSTEM')

on IN operator and HASH join - from 18c database.

September 16, 2020 - 12:56 pm UTC

Reviewer: Rajeshwaran, jeyabal

All right, we have some SQL like this in our application database (running on 18c, 18.10),
and this merge join is killing us. What can be done here to turn this merge join into a hash join?

demo#189#@XEPDB1> set feedback only serveroutput off
demo#189#@XEPDB1> select /*+ gather_plan_statistics */ *
  2  from t1 inner join t2
  3  on  t1.object_id = t2.object_id
  4  left outer join t1 t3
  5  on t3.object_id = t2.object_id
  6  and t2.owner in ('SYS','SYSTEM') ;

1000 rows selected.

demo#189#@XEPDB1> set feedback on
demo#189#@XEPDB1>
demo#189#@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1pr6gq1zypnvf, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ * from t1 inner join t2 on
t1.object_id = t2.object_id left outer join t1 t3 on t3.object_id =
t2.object_id and t2.owner in ('SYS','SYSTEM')

Plan hash value: 206051411

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |      1 |        |   1000 |00:00:00.14 |   19105 |       |       |          |
|   1 |  MERGE JOIN OUTER     |                 |      1 |   1000 |   1000 |00:00:00.14 |   19105 |       |       |          |
|*  2 |   HASH JOIN           |                 |      1 |   1000 |   1000 |00:00:00.03 |     105 |   824K|   824K| 1373K (0)|
|   3 |    TABLE ACCESS FULL  | T1              |      1 |   1000 |   1000 |00:00:00.01 |      19 |       |       |          |
|   4 |    TABLE ACCESS FULL  | T2              |      1 |   1000 |   1000 |00:00:00.01 |      26 |       |       |          |
|   5 |   BUFFER SORT         |                 |   1000 |      1 |   1000 |00:00:00.13 |   19000 |  2048 |  2048 | 2048  (0)|
|   6 |    VIEW               | VW_LAT_024544ED |   1000 |      1 |   1000 |00:00:00.12 |   19000 |       |       |          |
|*  7 |     FILTER            |                 |   1000 |        |   1000 |00:00:00.12 |   19000 |       |       |          |
|*  8 |      TABLE ACCESS FULL| T1              |   1000 |      1 |   1000 |00:00:00.12 |   19000 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   7 - filter(("T2"."OWNER"='SYS' OR "T2"."OWNER"='SYSTEM'))
   8 - filter("T3"."OBJECT_ID"="T2"."OBJECT_ID")


29 rows selected.

demo#189#@XEPDB1>
demo#189#@XEPDB1>


Chris Saxon

Followup  

September 16, 2020 - 1:05 pm UTC

Like I said: put the filter in the WHERE clause, not the JOIN clause.

In the last line, change AND to WHERE.
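Applied to the query above, that rewrite reads as follows. One caveat: moving a filter on T2 from the outer join's ON clause into the WHERE clause can change the result set (T1/T2 rows failing the filter are now removed entirely rather than just losing their T3 match), so verify it matches your intended semantics:

```sql
-- Rewrite of the demo query: filter on t2.owner moved from the
-- outer-join condition to the WHERE clause, enabling a hash join.
select *
from   t1 inner join t2
on     t1.object_id = t2.object_id
left outer join t1 t3
on     t3.object_id = t2.object_id
where  t2.owner in ('SYS','SYSTEM');
```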
