Thanks for the question, Ivan.

Asked: July 19, 2000 - 5:40 am UTC

Last updated: May 31, 2007 - 2:28 pm UTC

Version: 7.3.4.4.0

Viewed 10K+ times! This question is

You Asked

Tom,

Currently I'm working on a datawarehouse project and there are some serious performance problems.
In the init.ora file the parameter HASH_JOIN_ENABLED is set to TRUE.
Altough I've created some additional indexes but they are almost never used ,except when I specify hints like FIRST_ROWS and of course RULE. In all other cases a FULL TABLE SCAN + HASH JOIN is performed in memory.

How does and when does the COST-BASED optimizer decides to use the HASHING instead of using the indexes (in combination with NESTED LOOPS). Is it good to initialize the parameter HASH_JOIN_ENABLED to TRUE or should we disable it and use hints in our query-statements ?

Thanks in advance,
Ivan Verpoort.

and Tom said...

I might consider doing:

o setting hash_join_enabled=false in the init.ora

o using alter session set hash_join_enabled=true; in sessions
that wanted to use hash joins

If i did not like the hash join effect and wanted to disable it.

Actually -- if I was building a data warehouse, i would really want to use Oracle8i for partitions, enhanced star queries, more parallel operations, summary tables, advanced analytic functions (rollup, cube operators, moving averages, cumulative frequencies and so on), function based indexes, among many dozens of other relevant advances.

Init.ora parameters such

hash_area_size integer 0
hash_multiblock_io_count integer 1

affect how frequently the hash join is choosen as well. The hash_area_size defaults to 2 x sort_area_size if set to 0. The larger a hash_area_size -- the more frequently hash joins will be seen as a better alternative. Same with the multiblock_io_count -- the higher that is the more appealing the hash join it. So, an alternative to disabling the hash join (it truely can be faster in many cases) is to tweak the above parameters to make it "not as" appealing. Both of these parameters can be set at the session level so you can explain a query -- adjust these parameters and then either flush the shared pool or slightly modify the query (change the case of something, add a space) and re-explain it to see the results.

Also, make sure you are using TRUE temporary tablespaces. The hash joins will make use of them -- if they are permanent or are temporary and haven't been used yet, you'll get hit hard on the temporary extent allocations (recursive sql). Consider this small example:

scott@ORA734.WORLD> alter tablespace temp permanent;
Tablespace altered.

scott@ORA734.WORLD> exec dbms_lock.sleep(5);
PL/SQL procedure successfully completed.

scott@ORA734.WORLD> alter tablespace temp temporary;
Tablespace altered.

that makes all of the extents allocated in temp "go away" so the next query that needs them will have to allocate them

scott@ORA734.WORLD> select 3, t1.*, t2.* from t1, t2 where t1.owner = t2.username;

1145 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=939 1 0 HASH JOIN (Cost=10 Card=939 Bytes=121131)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=82
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=1145

Statistics
----------------------------------------------------------
4015 recursive calls
538 db block gets
1394 consistent gets
342 physical reads
76282 redo size
142606 bytes sent via SQL*Net to client
1311 bytes received via SQL*Net from client
79 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1145 rows processed

scott@ORA734.WORLD> select 5, t1.*, t2.* from t1, t2 where t1.owner = t2.username;

1145 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=939
1 0 HASH JOIN (Cost=10 Card=939 Bytes=121131)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=82 Byt
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=1145 Byte

Statistics
----------------------------------------------------------
0 recursive calls
5 db block gets
358 consistent gets
342 physical reads
0 redo size
142606 bytes sent via SQL*Net to client
1311 bytes received via SQL*Net from client
79 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1145 rows processed

scott@ORA734.WORLD>

See the huge difference in recursive calls and redo generated? These recursive calls were all space management calls and have some overhead associated with them. If you use a true temporary tablespace -- you can avoid that once the temp tablespace "warms" up.

Rating

(25 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

hash joins issue

kit, July 01, 2003 - 8:10 am UTC

Hi Tom,
I'm still confused with hash joins and need some clarifications. please see below and then a scenario I'm trying to resolve

fIRST oracle definition of hash join
To perform a hash join, Oracle performs the following steps:
1. Oracle performs a full table scan on each of the tables and splits each into
as many partitions as possible based on the available memory.

2. Oracle builds a hash table from one of the partitions (if possible, Oracle
selects a partition that fits into available memory). Oracle then uses the
corresponding partition in the other table to probe the hash table. All
partition pairs that do not fit into memory are placed onto disk.

3. For each pair of partitions (one from each table), Oracle uses the smaller
one to build a hash table and the larger one to probe the hash table.

f 2 big tables joining on col a and B on both tables. how does it partition

If there are indexes on both tables then hash join via index or full table scan

see my scenario below

Table 1 driving table - 22 m rows of which 170,000 rows selected - not long
Table 2 17m rows Joined via unique index
Table 3 564 rows
Table 4 42516
Table 5 3104
Table 6 31 rows

end result rows selected 168k

users run 10 MV per day for an area, table 1 and 2 do not have an area and hence cannot use the area specified

1& 2 driven first to get just the current days trade and lookup other tables

Please let me know which one you think will be efficient

Index scan on table 1 hash full table 2
Index scan on table 1 hash index table 2 takes ages than hash full table
Index scan on table 1 NL using unique index table 2 -

Table 3 has the area which users need to run for but can only be done via tables 1 and 2.

Table 4, 5, 6 can be done via index joins or is it best via hash joins

July 01, 2003 - 8:57 am UTC

the efficient approach?

analyze the tables, gather good stats, let the database decide what is the best path.

there is insufficient data here to say anything else. (that and I'm not a sql compiler, thats the job of the software)

hash joins

kit, July 03, 2003 - 5:31 pm UTC

tom i'm still confuses about hash joins if you have2 very large tables, then is it better to scan one using an index and full table scan on the other using an hash join.

I can also index scan both tables and then also hash jojn but takes much longer

please explain issue about how it partiions the data

if join between the two tables are say seq_num then does it partition using that key.

i did analyse the tables, via dbms stats for all columns ,cascade

and seems to do range scan and a full on other big table

|I cant see how this would be better than using the unique columns from the rabge scan and looking up the other big table

I know NL brings best back the first row quicker opposed to the hash join. but i'm interetsed in whole thoughput but seems hash is perfoming worser than the NL

Just cannot explain it.
do you need explain plans to see more clearly

July 03, 2003 - 8:18 pm UTC

no, full scanning a table via an index is the worst thing possible, the bigger the table, the worse it is (unless you are an interactive application that shows the first 10 rows and waits for the end user to ask for more of course)

use hints, play with it, make the query execute the way you *believe* would be better with indexes and time it with tkprof. it is the only way to believe -- seeing it believing.

see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6749454952894 <code>
for an example.

HASH joins

kit, July 14, 2003 - 12:19 pm UTC

Thanks for that. still bit confused

Please see below more notes on the scenario with stats
as you see the stats below diff scenario produces diff results.

Based on the info do you think better to do Hash or NL between table 1 and 2

Table 1 driving table - 22 m rows of which 170,000 rows selected - not long

done via an index. this is joined to table of 17m rows but again the result set is
170,000 rows

this then joined to table of 17m rows result set 170 K

Table 2 17m rows Joined via unique index

not restricted just looking up additional deal data

This then joined to table of 564 rows but end result still 170k

Table 3 564 rows

still 170

Index range scan on table 1 and then NL via unique index on table 2

stats

index range scan table1 and NL via unique keys table 2

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 1.49 1.51 0 0 0 0
Execute 1 189.92 870.98 158877 1074196 93 168606
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 191.41 872.49 158877 1074196 93 168606

scan range tabl1 and hash full table 2
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.40 0.40 0 0 0 0
Execute 1 117.62 510.55 22842 52528 87 168606
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 118.02 510.95 22842 52528 87 168606

scan range tabl1 and hash full table 2- same as above but when tother people accessing
table 2. hits def increase not sure why

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.39 0.41 0 0 0 0
Execute 1 434.66 829.61 1129310 1943101 98 168012
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 435.05 830.02 1129310 1943101 98 168606

Same scna and hash join but sometimes get the following plan. please explain

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.03 0 0 0 0
Execute 1 72.99 348.28 4466 4467 37 166986
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 73.01 348.31 4466 4467 37 168606

also is is

analyze table invdatew compute statistics for
table for all indexes for all indexed columns;

same as dbms stats way

Whats the diff between all indexes and all ind columns

July 15, 2003 - 12:54 am UTC

i've no clue what I'm looking at here.

but there is only one tkprof that looks good to me and that is the last one which you say "sometime get the following plan" -- but I see no plans anywhere

Please -- when I'm taking new questions -- feel free to post one there....

I think

a) analyzing the tables
b) letting the optimizer do its job

is easier then trying to be an optimizer most of the time. let it do its job

Hash_Area_size estimate

A reader, July 15, 2003 - 1:57 am UTC

Hi Tom,
How to estimate, the hash_area_size required by a query?
Is there any formula to or is it achieved through benchmarking with different values?

July 15, 2003 - 9:31 am UTC

In Oracle9i, using pga_aggregate_target, this is not neccessary.

In 8i, you set it to the amount you want sessions to use. It is a function of the width of the hash table and the number of estimated/real rows in the hash table. There is no cut and dry formula I know of to estimate this and it would necessarily change from execution to execution of the same query with different inputs.

hash joins

Kit, July 15, 2003 - 8:20 am UTC

HI,

I only put the last one so you could explain why the same query produces diff stats.
I can see the last one is best but the same query, range table 1 and hash full2

ok
back to basics

170k out of 22M rows shows good use of index for table 1

when joining on table 2 (total of 19m rows) but does not restrict the 170 k then is it best to use the full hash or
the NL index lookup

The gather stats is related to this tuning problem so any hints greatly apreciated

We do let the optimiser choose but just wondering if there are other setting which is making it choose the
wrong plans

Hash joins

kit, July 17, 2003 - 4:51 am UTC

Tom could you please get a chance to look at my last query

July 17, 2003 - 10:30 am UTC

Please -- when I'm taking new questions -- feel free to post one there....

(as stated). this would be a "new question" really. insufficient data to comment on the above.

hash joins

kit, July 17, 2003 - 6:56 pm UTC

Tom,
I can find the moment whenever you are accepting questions

Also I think my questions is related to this topic as I'm inquiring about hash joins

pleaselet me know what more info you require

July 17, 2003 - 9:10 pm UTC

sorry.

confused

kit, July 21, 2003 - 9:26 am UTC

Tom,
do you mean you still do not understand the question or you are not able to do look at it now

hash or nested

dxl, July 21, 2004 - 9:28 am UTC

Tom

I have a query which is running in about 1 minute on one machine and about 6 hours on another although i've never seen it actually finish.

The explain plans are:

For the machine where it runs in about 1 minute:

14:19:07 MORT1@PMRP>explain plan for
14:19:07 2 select count(*) from (
14:19:07 3 SELECT ONS.*
14:19:08 4 FROM ONS_DATA_SET ONS,
14:19:08 5 NSTS_RESPONSE NSTS, gp_live_list GP
14:19:08 6 WHERE ONS.NHS_NUMBER = NSTS.NEW_NHS_NUMBER
14:19:08 7 AND NSTS.RETURNED_CURRENT_GP = GP.GP_CODE
14:19:08 8 AND NSTS.RETURNED_CURRENT_GP_PRAC = GP.GP_PRACTICE_CODE
14:19:08 9 );

Explained.

Elapsed: 00:00:00.00
14:19:08 MORT1@PMRP>select lpad(' ', 2*(level - 1)) ||
14:19:08 2 operation||' '||
14:19:08 3 options||' '||
14:19:08 4 object_name||' '||optimizer||
14:19:08 5 decode(id, 0, 'Cost = '||position)||
14:19:08 6 partition_start||' '||
14:19:08 7 partition_stop
14:19:08 8 "Query Plan"
14:19:08 9 FROM Plan_Table
14:19:08 10 START WITH id = 0
14:19:08 11 CONNECT BY PRIOR id = parent_id
14:19:08 12 /

Query Plan
-------------------------------------------------------------------------------
SELECT STATEMENT CHOOSECost = 2553
SORT AGGREGATE
HASH JOIN
HASH JOIN
INDEX FAST FULL SCAN IDX_ASCT_GP_LIVE_4 ANALYZED
INDEX FAST FULL SCAN IDX_NSTS_RES_NHS_NUM ANALYZED
INDEX FAST FULL SCAN IDX_ONS_NHS_NUMBER ANALYZED

7 rows selected.

and for the instance on which it takes 6 hours+ explain plan is:

14:20:30 PMRPTEST@DEV8>explain plan for
14:20:31 2 select count(*) from (
14:20:31 3 SELECT ONS.*
14:20:31 4 FROM ONS_DATA_SET ONS,
14:20:31 5 NSTS_RESPONSE NSTS, gp_live_list GP
14:20:31 6 WHERE ONS.NHS_NUMBER = NSTS.NEW_NHS_NUMBER
14:20:31 7 AND NSTS.RETURNED_CURRENT_GP = GP.GP_CODE
14:20:31 8 AND NSTS.RETURNED_CURRENT_GP_PRAC = GP.GP_PRACTICE_CODE
14:20:31 9 );

Explained.

Elapsed: 00:00:00.01
14:20:31 PMRPTEST@DEV8>select lpad(' ', 2*(level - 1)) ||
14:20:31 2 operation||' '||
14:20:31 3 options||' '||
14:20:31 4 object_name||' '||optimizer||
14:20:31 5 decode(id, 0, 'Cost = '||position)||
14:20:31 6 partition_start||' '||
14:20:31 7 partition_stop
14:20:31 8 "Query Plan"
14:20:31 9 FROM Plan_Table
14:20:31 10 START WITH id = 0
14:20:31 11 CONNECT BY PRIOR id = parent_id
14:20:31 12 /

Query Plan
-------------------------------------------------------------------------------
SELECT STATEMENT CHOOSECost = 9335
SORT AGGREGATE
NESTED LOOPS
HASH JOIN
INDEX FAST FULL SCAN IDX_ASCT_GP_LIVE_4 ANALYZED
INDEX FAST FULL SCAN IDX_NSTS_RES_NHS_NUM ANALYZED
INDEX FAST FULL SCAN IDX_ONS_NHSNUM

7 rows selected.

The difference here is that the first does a hash join to join the tables and the second uses a nested loop.
Could this be a reason why it is taking so long or do i need to look elsewhere?
Could it have anything to do with hash_area_size or some other init setting??

July 21, 2004 - 9:44 am UTC

use autotrace traceonly explain please. so we can see more information.

also, is the data in fact even close to being the same?
stats are up to date?
optimizer_* settings are the same?

update

dxl, July 22, 2004 - 5:46 am UTC

What i really want to know is how to best tune this query but i guess thats quite hard to see from where you are sat!

The query i posted before is actually part of a view. When i join this view to some other tables i get very poor performance. I am trying to understand also why on the 2 different boxes i am getting different explain plans when the data is the same, the optimizer settings are the same and the indexes are the same (and all the stats are up to date)

the query is :

set autotrace traceonly explain
SELECT count(*)
FROM (SELECT ONS.*, GP.*
FROM ONS_DATA_SET ONS,
NSTS_RESPONSE NSTS,
GP_LIVE_LIST GP
WHERE ONS.NHS_NUMBER = NSTS.NEW_NHS_NUMBER
AND NSTS.RETURNED_CURRENT_GP = GP.GP_CODE
AND NSTS.RETURNED_CURRENT_GP_PRAC = GP.GP_PRACTICE_CODE) lmd,
MORT_Batch_Records mbr,
STHA_PCT st
WHERE lmd.NHS_Number = mbr.NHS_No
AND lmd.PCT_Code = st.pct_code (+)
AND mbr.Batch_ID = 1
AND mbr.Stage_ID = 3
AND YEAR = greatest( trunc( NVL(Date_Of_Death, to_date( '01-apr-2002', 'dd-mon-yyyy' )), 'q' ) , to_date( '01-apr-2002', 'dd-mon-yyyy' ))
;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11914 Card=1 Bytes=5
3)

1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=11914 Card=106 Bytes=5618)
3 2 HASH JOIN (OUTER) (Cost=11702 Card=106 Bytes=3816)
4 3 VIEW (Cost=11700 Card=106 Bytes=3392)
5 4 HASH JOIN (Cost=11700 Card=106 Bytes=7208)
6 5 HASH JOIN (Cost=10987 Card=953 Bytes=46697)
7 6 TABLE ACCESS (FULL) OF 'ASCT_GP_LIST_STAGE_4'
(Cost=5663 Card=261541 Bytes=6538525)

8 6 INDEX (FAST FULL SCAN) OF 'IDX_NSTS_RES_NHS_NU
M' (NON-UNIQUE) (Cost=718 Card=1105800 Bytes=26539200)

9 5 INDEX (FAST FULL SCAN) OF 'IDX_ONS_NHS_NUMBER' (
NON-UNIQUE) (Cost=661 Card=1150760 Bytes=21864440)

10 3 INDEX (FULL SCAN) OF 'STHA_PCT_IDX_001' (NON-UNIQUE)
(Cost=1 Card=310 Bytes=1240)

11 2 INDEX (RANGE SCAN) OF 'MORT_BATCH_RECORDS_IDX_001' (NO
N-UNIQUE) (Cost=211 Card=21280 Bytes=361760)

and on the other machine i get:

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8760 Card=1 Bytes=53
)

1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS (OUTER) (Cost=8760 Card=1 Bytes=53)
3 2 HASH JOIN (Cost=8759 Card=1 Bytes=49)
4 3 VIEW (Cost=8607 Card=104 Bytes=3328)
5 4 HASH JOIN (Cost=8607 Card=104 Bytes=7072)
6 5 HASH JOIN (Cost=7562 Card=939 Bytes=46011)
7 6 TABLE ACCESS (FULL) OF 'ASCT_GP_LIST_STAGE_4'
(Cost=6288 Card=261541 Bytes=6538525)

8 6 INDEX (FAST FULL SCAN) OF 'IDX_NSTS_RES_NHS_NU
M' (NON-UNIQUE) (Cost=1119 Card=1105937 Bytes=26542488)

9 5 INDEX (FAST FULL SCAN) OF 'IDX_ONS_NHS_NUMBER' (
NON-UNIQUE) (Cost=1044 Card=1157554 Bytes=21993526)

10 3 INDEX (RANGE SCAN) OF 'MORT_BATCH_RECORDS_IDX_001' (
NON-UNIQUE) (Cost=151 Card=14291 Bytes=242947)

11 2 INDEX (RANGE SCAN) OF 'STHA_PCT_IDX_001' (NON-UNIQUE)
(Cost=1 Card=302 Bytes=1208)

i know it is hard without knowing the tables or data but does anything jump out by looking at these explain plans?

July 22, 2004 - 7:30 am UTC

something is different -- search this site for 10053 or if you have my book "Effective Oracle by Design" -- see chapter 6.

that trace will show you what is different (999 times out of 1000 the parameters that "are the same" are "actually quite different" and this report lists all parameters used).

Dave, July 22, 2004 - 8:44 am UTC

There seems to be a pretty clear pattern here that on the second box multiblock i/o -- TABLE ACCESS (FULL) and INDEX (FAST FULL SCAN) -- are being rated at a higher cost than on the first box. I know, don't compare costs, but to me it seems that the DB_FILE_MULTIBLOCK_READ_COUNT might be a candidate for a difference between the two systems (or maybe there's a significant difference in system stats).

Anyhoo, just a theory -- I'm sure that there's a fair few other factors that could affect this.

hash joins

kit, October 14, 2004 - 1:03 pm UTC

Tom, Good to have you back
Can you confirm if the below is corect

say 2 tables (a and b)

in a query Both tables have filters to limit the numbers of rows using index scan or other scan

Oracle identifies which table has the smaller resulting set and hashes the key used to join to the other table

Once Hashed, it goes though the other larger result set one by 1 and hashes the key columns and looks up in the smaller hashset and outputs the row

Thanks
Ketan

October 14, 2004 - 7:31 pm UTC

thats the concept (i have more details in effective oracle by design -- such as what happens with outer joins and such) -- but yes, that is effectively it.

datawarehouse

loney mohanty, November 14, 2004 - 6:41 am UTC

can u send me datawarehouse company faq

November 14, 2004 - 10:02 am UTC

not really sure what you mean by that.

Why is Hash Join more Efficient?

Reader, November 17, 2004 - 6:01 am UTC

Hi Tom,

Following is a query which was till recently used for 1 of our reports which took more than 4 hours to execute and a lot of IO's. The developer had done a mistake of doing a join twice in the same query(A.SHPSNR =P.PSNR ->line 44,P.PSNR = A.SHPSNR -> line 50)

1 SELECT
2 A.SHDATE,
3 A.SHSUBSCRIBERNR,
4 LTRIM(AD.ADDRFIRSTNAME || ' ' || AD.ADDRSURNAME) CUSTNAME,
5 LTRIM (REPLACE (TO_CHAR(AD.ADDRHOUSENRNUMERIC),'0',' ') ||' ' ||LTRIM(A
D.ADDRHOUSEALPHANR || ' ' || AD.ADDRSTREET ) ) CUSTADDRESS,
6 P.PSACCNR,
7 M.ACCACCOUNTTYPE,
8 M.ACCSTATUS,
9 M.LCONAME,
10 M.MSTUSERKEY,
11 M.MSUSERKEY ,
12 A.SHPRODUCTNR,
13 PR.DESCRIPTION PRODUCTDESC,
14 F.FTGROSSAMTVAT,
15 R.REASONDESCRIPTION,
16 ZONENAME ZONE_NAME,
17 STATENAME STATE_NAME,
18 CITYNAME CITY_NAME,
19 AREANAME AREA_NAME,
20 DISNAME DIS_NAME,
21 SUBDNAME SUBDIS_NAME ,
22 LCO.LCO_LCO,
23 U.USERFULLNAME,
24 LCO.LCO_ZONE,
25 LCO.LCO_STATE,
26 LCO.LCO_CITY,
27 LCO.LCO_AREA_JV,
28 LCO.LCO_DISTRIBUTOR,
29 LCO.LCO_SUBDISTRIBUTOR
30 FROM
31 ibsadminlocal.SUHISTOR A ,
32 ibsadminlocal.PRODSUBS P,
33 ibsadminlocal.PRODUCTS PR,
34 ibsadminlocal.CUSTMSLINK M ,
35 ibsadminlocal.FINTRNSU F,
36 ibsadminlocal.ADDRESS AD ,
37 ibsadminlocal.REASONS R ,
38 ibsadminlocal.USRNAMES U,
39 H_LCOHIERARCHY LCO
40 WHERE
41 A.SHEVENTNR=171
42 AND A.SHDATE>=731641
43 AND A.SHDATE <= 731671
44 AND A.SHPSNR =P.PSNR
45 AND A.SHUSERNR = U.USERNR
46 AND A.SHPRODUCTNR =PR.PRODUCTNR
47 AND PR.PRODUCTTYPE IN (3,4,5,6)
48 AND P.PSPRODUCTNR <>6
49 AND M.PSACCNR =P.PSACCNR
50 AND P.PSNR = A.SHPSNR
51 AND P.PSACCNR = F.FTACCNR AND
52 F.FTCREATEDATE <= 731671 AND
53 P.PSSUBSCRIBERNR= AD.ADDRCUSTNR AND
54 AD.ADDREVENTNR =100 AND
55 M.COLLBOYADDREVENTNR=100 AND
56 A.SHREASON=R.REASONNO AND
57* P.PSMARKETSEGMENT=LCO.LCO_LCO
58 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2024 Card=7 Bytes=2709)

1 0 NESTED LOOPS (Cost=2024 Card=7 Bytes=2709)
2 1 NESTED LOOPS (Cost=914 Card=1 Bytes=375)
3 2 NESTED LOOPS (Cost=913 Card=1 Bytes=227)
4 3 HASH JOIN (Cost=904 Card=3 Bytes=552)
5 4 NESTED LOOPS (Cost=902 Card=5 Bytes=840)
6 5 NESTED LOOPS (Cost=900 Card=1 Bytes=83)
7 6 HASH JOIN (Cost=899 Card=1 Bytes=58)
8 7 HASH JOIN (Cost=63 Card=2098 Bytes=83920)
9 8 TABLE ACCESS (FULL) OF 'USRNAMES' (Cost=1Card=188 Bytes=3760)

10 8 TABLE ACCESS (BY INDEX ROWID) OF 'SUHISTOR' (Cost=60 Card=2098 Bytes=41960)

11 10 INDEX (RANGE SCAN) OF 'KEY005SUHISTOR' (UNIQUE) (Cost=9 Card=2098)

12 7 TABLE ACCESS (FULL) OF 'PRODSUBS' (Cost=175 Card=113506 Bytes=2043108)

13 6 TABLE ACCESS (BY INDEX ROWID) OF 'REASONS' (Cost=1 Card=151 Bytes=3775)

14 13 INDEX (UNIQUE SCAN) OF 'KEY002REASONS' (UNIQUE)

15 5 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTMSLINK' (Cost=2 Card=469 Bytes=39865)

16 15 INDEX (RANGE SCAN) OF 'CUSTMLINK_PSACCNR' (NON-UNIQUE) (Cost=1 Card=469)

17 4 TABLE ACCESS (FULL) OF 'PRODUCTS' (Cost=1 Card=35 Bytes=560)

18 3 TABLE ACCESS (BY INDEX ROWID) OF 'ADDRESS' (Cost=3 Card=9892 Bytes=425356)

19 18 INDEX (RANGE SCAN) OF 'KEY002ADDRESS' (UNIQUE) (Cost=2 Card=9892)

20 2 TABLE ACCESS (BY INDEX ROWID) OF 'H_LCOHIERARCHY' (Cost=1 Card=3475 Bytes=514300)

21 20 INDEX (UNIQUE SCAN) OF 'KWY001LCOHIERARCHY' (UNIQUE)
22 1 TABLE ACCESS (FULL) OF 'FINTRNSU' (Cost=1110 Card=287606 Bytes=3451272)

The Statistics were up to date.
Now when I removed the join condition on line 44 the same query completed in under 2 mins with the following plan.
**************
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3441 Card=15332 Bytes=5933484)

1 0 HASH JOIN (Cost=3441 Card=15332 Bytes=5933484)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTMSLINK' (Cost=2 Card=469 Bytes=39865)

3 2 INDEX (RANGE SCAN) OF 'CUSTMLINK_COLLBOYADDREVENTNR' (NON-UNIQUE) (Cost=1 Card=469)

4 1 HASH JOIN (Cost=3350 Card=3269 Bytes=987238)
5 4 TABLE ACCESS (FULL) OF 'USRNAMES' (Cost=1 Card=188 Bytes=3760)

6 4 HASH JOIN (Cost=3335 Card=3269 Bytes=921858)
7 6 HASH JOIN (Cost=3082 Card=3269 Bytes=438046)
8 7 HASH JOIN (Cost=1212 Card=444 Bytes=54168)
9 8 HASH JOIN (Cost=950 Card=1360 Bytes=107440)
10 9 TABLE ACCESS (FULL) OF 'PRODUCTS' (Cost=1 Card=35 Bytes=560)

11 9 HASH JOIN (Cost=946 Card=2098 Bytes=132174)
12 11 HASH JOIN (Cost=63 Card=2098 Bytes=94410)
13 12 TABLE ACCESS (FULL) OF 'REASONS' (Cost=1 Card=151 Bytes=3775)

14 12 TABLE ACCESS (BY INDEX ROWID) OF 'SUHISTOR' (Cost=60 Card=2098 Bytes=41960)

15 14 INDEX (RANGE SCAN) OF 'KEY005SUHISTOR' (UNIQUE) (Cost=9 Card=2098)

16 11 TABLE ACCESS (FULL) OF 'PRODSUBS' (Cost=175 Card=113506 Bytes=2043108)

17 8 TABLE ACCESS (FULL) OF 'ADDRESS' (Cost=131 Card=9892 Bytes=425356)

18 7 TABLE ACCESS (FULL) OF 'FINTRNSU' (Cost=1110 Card=287606 Bytes=3451272)

19 6 TABLE ACCESS (FULL) OF 'H_LCOHIERARCHY' (Cost=14 Card=3475 Bytes=514300)

******************
Can you please explain why the Execution plan changed so drastically and what is the ill-effect of using the same join condition twice in the same Query?

Thanks

Disable Hash Join with PGA AGGREGATE TARGET

Job Mathews, May 01, 2006 - 11:07 pm UTC

How can I disable hash_joins when using PGA_AGGREGATE TARGET parameter?

I have seen very bad performance when execution plan goes for "Hash Joins". When I force to use Nested Loops, it becomes very fast. I am not sure the reason but as of now thinks best woould be to disable hash_joins.

May 02, 2006 - 3:46 am UTC

perhaps you really mean "how can I optimize for response time instead of the default total throughput"

I think that you want to get the first rows as fast as possible (interactive application) as opposed to the LAST row as fast as possible (batch). By default - the CBO is going to be using ALL ROWS (throughput) optimization - when you use CHOOSE or when you use 10g (set to ALL_ROWS)

I think you want to have your application issue an alter session to set first_rows_n

You DO NOT want to disable the brutally efficient hash join, you may well want to optimize for response time in the applications that make sense to have that be the mode they should run in.

Rob Mathews, May 02, 2006 - 10:38 pm UTC

Thanks Tom.

In my case query returns 10000 rows ( avg_row_len=80 bytes). It takes me 10 seconds to display all these rows. So in worst case I should get a difference of 10 seconds but in my case it is 1 hour vs 4 mintues ( with nested loop ). Problemis there for many queries and all are having hash joins as plan. I thought CBO is going for bad plan ( as stats are recent ). I am not sure how to solve this puzzle so I thought you are the right person.

Also can you please let me know the way out to disable hash_joins temporarily?

May 03, 2006 - 1:49 am UTC

show me, give us a tkprof of the query with hash joins and one with hints that make it use nested loops (I assume you have some way to get the nested loops going)

You can read about alter session - but I'd rather FIX the problem than lobotomize the server.

Hash_area_size over dblink

Smita Acharya, May 12, 2006 - 5:53 pm UTC

Hi Tom,

Thanks for all your great answers. The following is the query I am trying to run over a dblink.

SELECT *
FROM AMIOWN.SERVICE_X "A1"
WHERE "A1"."IMAGE_RECNBR"<>ALL (SELECT "A3"."IMAGE_RECNBR" FROM AMIOWN.ODS_CHANGE_LOG "A3"
WHERE "A3"."TABLE_NAME"='SERVICE_X' AND "A3"."YMDTRANS"=TO_CHAR(TRUNC(SYSDATE@!),'yyyymmdd'))
AND "A1"."SERV_NBR"<>' '
AND "A1"."SERV_NBR"<>ALL (SELECT "A2"."SERV_NBR" FROM AMIOWN.SERVICE_X "A2" WHERE "A2"."YMDTRANS">=20050101 GROUP BY "A2"."SERV_NBR" HAVING COUNT(*)>1)
AND "A1"."YMDTRANS">=20050101

The plan for the query is as follows:
Step # Step Name
11 SELECT STATEMENT
10 HASH JOIN [ANTI]
4 HASH JOIN [ANTI]
2 AMIOWN.SERVICE_X TABLE ACCESS [BY INDEX ROWID]
1 AMIOWN.SERVICE_X_YMD INDEX [RANGE SCAN]
3 AMIOWN.ODS_CHANGE_LOG_PK INDEX [RANGE SCAN]
9 . VIEW
8 FILTER
7 SORT [GROUP BY]
6 AMIOWN.SERVICE_X TABLE ACCESS [BY INDEX ROWID]
5 AMIOWN.SERVICE_X_YMD INDEX [RANGE SCAN]

DB version - 9.2.0.1
My current pga_aggregrate_target is 256MB . Most of the sessions seem to manage with that value except for some batch jobs like the above
that run during the night.
This is one of the queries which takes about 10 hours to run with the above pga_aggregate setting.
Is there a way I can make the above query run with
workarea_size_policy=MANUAL, hash_area_size=160MB ,sort_area_size=40MB during the night time , across a dblink.

I have a lot of other questions about the setting of the above parameters , but I guess it would be better to take it one at a time.

hash_join_enabled

Mark, May 24, 2007 - 7:31 am UTC

Hello Tom,
In your first reponse you have mentioned, it should be true temporary tablespace. The underlying datafile of the TEMP tablespace is NOT TEMPFILE but an datafile in my case. In due course we will be changing it to TEMPFILE.

Is this the reason why it's not doing an HASH JOIN but going for NESTED LOOP? If not what could be the reason?

I have analyzed both the tables.

Note: I had asked this same question sometime back and you replied saying EXPLAIN PLAN can lie sometimes. I generated tkprof and Im finding it's same as EXPLAIN PLAN.

select name, value from v$parameter where name like '%hash%'
hash_join_enabled TRUE
hash_area_size 120048000

Here is the explain plan for the query (without any hint):

SELECT a.actlinkId,actionIndex,a.fieldId,assignShort,assignLong,sampleServer,sampleSchema ,b.schemaId
FROM actlink_push a,actlink_mapping b
WHERE a.actlinkId=b.actlinkId
AND b.schemaId=203
ORDER BY 1 DESC,2 DESC

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=36 Card=182 Bytes=26
026)

1 0 SORT (ORDER BY) (Cost=36 Card=182 Bytes=26026)
2 1 NESTED LOOPS (Cost=30 Card=182 Bytes=26026)
3 2 TABLE ACCESS (FULL) OF 'ACTLINK_PUSH' (Cost=30 Card=20
566 Bytes=2796976)

4 2 INDEX (UNIQUE SCAN) OF 'ACTLINK_MAPPING_IND' (UNIQUE)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
21045 consistent gets
0 physical reads
0 redo size
6715 bytes sent via SQL*Net to client
626 bytes received via SQL*Net from client
52 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
50 rows processed

Here is the explain plan for the query (with FULL hint):

SELECT /*+ full(B) */ a.actlinkId,actionIndex,a.fieldId,assignShort,assignLong,sampleServer,sampleSchema ,b.schemaId
FROM actlink_push a,actlink_mapping b
WHERE a.actlinkId=b.actlinkId
AND b.schemaId=203
ORDER BY 1 DESC,2 DESC

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=44 Card=182 Bytes=26
026)

1 0 SORT (ORDER BY) (Cost=44 Card=182 Bytes=26026)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'ACTLINK_PUSH' (Cost=2
Card=11 Bytes=1496)

3 2 NESTED LOOPS (Cost=38 Card=182 Bytes=26026)
4 3 TABLE ACCESS (FULL) OF 'ACTLINK_MAPPING' (Cost=6 Car
d=16 Bytes=112)

5 3 INDEX (RANGE SCAN) OF 'ACTLINK_PUSH_IND' (NON-UNIQUE
) (Cost=1 Card=11)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
111 consistent gets
0 physical reads
0 redo size
7133 bytes sent via SQL*Net to client
626 bytes received via SQL*Net from client
52 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
50 rows processed

ACTLINK_PUSH table is indexed on ACTLINKID column
ACTLINK_MAPPING table is indexed on SCHEMAID and ACTLINKID columns

This is a repeat question so apologies for this.

Thanks

May 26, 2007 - 11:25 am UTC

the estimated rowcounts are pretty small.

got a tkprof with the row source operation to see if the actual row counts are close to the estimated?

TKPROF output

Mark, May 29, 2007 - 10:16 am UTC

Hello Tom,
Here is the tkprof output.

SELECT a.actlinkId,actionIndex,a.fieldId,assignShort,assignLong,sampleServer,sampleSchema ,b.schemaId
FROM actlink_push a,actlink_mapping b
WHERE a.actlinkId=b.actlinkId
AND b.schemaId=:"SYS_B_0"
ORDER BY :"SYS_B_1" DESC,:"SYS_B_2" DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       51      0.10       0.12          1      21042          0          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       53      0.10       0.12          1      21042          0          50

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 21  (ARADMIN)

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY 
     50   NESTED LOOPS  
  20566    TABLE ACCESS FULL ACTLINK_PUSH 
     50    INDEX UNIQUE SCAN ACTLINK_MAPPING_IND (object id 3817)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     50   NESTED LOOPS
     50    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'ACTLINK_PUSH'
  20566    INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'ACTLINK_MAPPING_IND' 
               (UNIQUE)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      52        0.00          0.00
  SQL*Net message from client                    52       83.02         85.05
  db file sequential read                         1        0.01          0.01

Thanks

May 30, 2007 - 11:00 am UTC

the dreaded cursor_sharing = force or similar. Gosh, do i hate that. You have a really bad bad bug in your developed applications that needs to be fixed - that is priority ONE!!!

did you notice explain plan is different from row source operation?
http://asktom.oracle.com/Misc/when-explanation-doesn-sound-quite.html

not going to like explain plans with cursor sharing.

now, give that schemaid is NOT A CONSTANT, it is saying "in general, given your data and the bind we saw the first time we executed this" the best plan is.....

is schemaid a skewed column in your system or rather evenly distributed?

Plan changed after Migration to Oracle 10g

Muhammad Riaz Shahid, May 30, 2007 - 11:54 am UTC

Dear Tom,
I don't know whether this question is relevant to current thread or not but i just wanted to get more info of what happened.

Recently, we migrated our database from 9i to 10g R2. After the migration, execution plan of following query has been changed:

 SELECT     
    jh.job_reference,
    jh.job_status,
    jh.bol_number,
    jh.part_load_master,
    jcv.doc_release_status,
    jcv.imp_custom_release_status,
    jcv.charge_release_status,
    jcv.equipment_release_status,
    jcv.release_status,
    decode(j.journey_type, 'T', j.point_from)  ppod,
    row_number() over(partition by jh.job_reference, journey_type order by seq_number desc) r_no,
    j.disch_voyage_reference imp_voy,
    j.point_from poo,
    j.point_to delivery_port,
    (SELECT dp_pfs9985.get_pol(jh.job_reference) FROM DUAL) pol,
    j.point_to fpod,
    jh.alternate_point_import,
    j.journey_Type
 FROM job_headers jh,
      job_crf_validation jcv,
      (SELECT job_reference, point_from,point_to, disch_voyage_reference, seq_number, journey_type
       FROM journey 
       WHERE journey_type IN ('T', 'C', 'D'))
 WHERE  jh.job_reference IN (select job_reference from gtt_pfs9985)
 AND    jh.job_reference = jcv.job_reference(+)
 AND    jh.job_reference = j.job_Reference(+)

10g Plan:

----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4202 Card=2 Bytes=260)

   1    0   FAST DUAL (Cost=2 Card=1)
   2    0   WINDOW (SORT) (Cost=4202 Card=2 Bytes=260)
   3    2     NESTED LOOPS (OUTER) (Cost=4201 Card=2 Bytes=260)
   4    3       NESTED LOOPS (OUTER) (Cost=4199 Card=1 Bytes=94)
   5    4         HASH JOIN (RIGHT SEMI) (Cost=4198 Card=1 Bytes=68)
   6    5           TABLE ACCESS (FULL) OF 'GTT_PFS9985' (TABLE (TEMP)
          ) (Cost=2 Card=1 Bytes=40)

   7    5           TABLE ACCESS (FULL) OF 'JOB_HEADERS' (TABLE) (Cost=4195 Card=36193 Bytes=1013404)
   8    4         TABLE ACCESS (BY INDEX ROWID) OF 'JOB_CRF_VALIDATION' (TABLE) (Cost=1 Card=1 Bytes=26)
   9    8           INDEX (UNIQUE SCAN) OF 'JOB_CRF_VALIDATION_PK' (INDEX (UNIQUE)) (Cost=0 Card=1)
  10    3       VIEW PUSHED PREDICATE (Cost=2 Card=2 Bytes=72)
  11   10         TABLE ACCESS (BY INDEX ROWID) OF 'JOURNEY' (TABLE) (Cost=3 Card=2 Bytes=82)
  12   11           INDEX (RANGE SCAN) OF 'JOURNEY_PK' (INDEX (UNIQUE)) (Cost=2 Card=2)

9i Plan

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=1 Bytes=110)
   1    0   TABLE ACCESS (FULL) OF 'DUAL' (Cost=17 Card=8168)
   2    0   WINDOW (SORT) (Cost=9 Card=1 Bytes=110)
   3    2     FILTER
   4    3       NESTED LOOPS (OUTER) (Cost=6 Card=1 Bytes=110)
   5    4         NESTED LOOPS (OUTER) (Cost=6 Card=1 Bytes=74)
   6    5           NESTED LOOPS (Cost=5 Card=1 Bytes=53)
   7    6             SORT (UNIQUE)
   8    7               TABLE ACCESS (FULL) OF 'GTT_PFS9985' (Cost=2 Card=1 Bytes=32)
   9    6             TABLE ACCESS (BY INDEX ROWID) OF 'JOB_HEADERS'
  10    9               INDEX (UNIQUE SCAN) OF 'JOB_HEADERS_PK' (UNIQUE)
  11    5           VIEW PUSHED PREDICATE OF 'JOB_CRF_VALIDATION' (Cost=1 Card=1 Bytes=21)

  12   11             FILTER
  13   12               TABLE ACCESS (BY INDEX ROWID) OF 'JOB_CRF_VALIDATION' (Cost=2 Card=1 Bytes=21)
  14   13                 INDEX (UNIQUE SCAN) OF 'JOB_CRF_VALIDATION_P
          K' (UNIQUE) (Cost=1 Card=1)
  15    4         VIEW PUSHED PREDICATE
  16   15           FILTER
  17   16             TABLE ACCESS (BY INDEX ROWID) OF 'JOURNEY' (Cost=3 Card=1 Bytes=36)
  18   17               INDEX (RANGE SCAN) OF 'JOURNEY_PK' (UNIQUE) (Cost=2 Card=2)

GTT_PFS9985 is a global temporary table and value of optimizer_dynamic_sampling is 2 in both databases. We managed to find solution by rewritting the query (and hence getting the same plan as that of 9i) but i was wondering why the optimzier got confused and did an FTS on table job_headers?

May 30, 2007 - 3:57 pm UTC

I doubt it was "confused", it was purposeful - 10g optimization is very different from 9i

when you upgrade over major releases - you expect (actually hope) plans change, they will - it is one of the reasons you upgrade.

Many of your plans changed for the better.
Many of them stayed the same - performance wise.
And you had one that was noticably "not as good"

The third category is unfortunate, but typically does happen. We find them in testing, correct the underlying cause (you worked around the issue by rewriting the query - fine, but we could have tried to find the underlying cause first and attempted a correction there)

Still one clarification

Muhammad Riaz Shahid, May 31, 2007 - 10:15 am UTC

Dear Tom,
Actually after migration to 10g, this has been happened to so many queries (change from NL to Hash Joins) and all such queries are having performance issues. So i just wanted to find the "actual" cause of this problem. Maybe we have set some parameter value really wrong; maybe anythign else. Can you please through some light on which parameters can (probaly) cause these kind of problems?

Riaz

May 31, 2007 - 10:37 am UTC

what parameters do you have set (hopefully the answer is "none" besides the sort of mandatory ones like control files and memory settings)

About HASH JOINS

Mark, May 31, 2007 - 10:28 am UTC

Hello Tom,
For my previous question regarding the HASH JOINS you replied 'did you notice explain plan is different from row source operation?'

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY 
     50   NESTED LOOPS  
  20566    TABLE ACCESS FULL ACTLINK_PUSH 
     50    INDEX UNIQUE SCAN ACTLINK_MAPPING_IND (object id 3817)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     50   NESTED LOOPS
     50    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'ACTLINK_PUSH'
  20566    INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'ACTLINK_MAPPING_IND' 
               (UNIQUE)

When you say the difference do you mean the sequence of execution? If not can you please tell me.

SchemaId is not evenly distributed.

SELECT schemaId, count(schemaid)
from actlink_mapping
group by schemaid
order by count(schemaid)

Count(schemaid) varies from 1 to 550 out of 2200 rows.

select count(distinct schemaid) from actlink_mapping

Returns 1363 rows

Thanks

May 31, 2007 - 10:41 am UTC

if there are 2,200 rows - how are you accessing 20,566 via the index.

i hate cursor sharing=force/similar. what is the reason you are using it.

HASH_JOIN

Karthik, May 31, 2007 - 12:23 pm UTC

ACTLINK_PUSH = 20567 rows
ACTLINK_MAPPING = 22307 (not 2200 rows ... typo error)

Beleive me the first thing I told the Senior DBA was increase session_cached_cursors from 0 to 100 (to start with) and change the value CUSROR_SHARING from FORCE to EXACT.

Thanks

May 31, 2007 - 2:28 pm UTC

so, what cardinality does

select * from actlink_mapping where schemaid = <various values>

return in a plan? what does the optimizer think will be returned here.

tell you what, start over, show us the tables, the indexing schemes (so we know what index names go with what columns), tell us what plan you EXPECT should be used, show us the stats from user_tables.

HASH_JOIN

Mark, June 01, 2007 - 8:40 am UTC

Hello Tom,

I suppose it should use HASH JOIN instead of NL. Using HASH JOIN the number of logical reads are very less but the query time is almost same. In that case should we worry too much about the logical reads.

SQL> desc ACTLINK_PUSH 
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------
 ACTLINKID                                 NOT NULL NUMBER(15)
 ACTIONINDEX                               NOT NULL NUMBER(15)
 FIELDID                                   NOT NULL NUMBER(15)
 ASSIGNSHORT                                        VARCHAR2(255)
 ASSIGNLONG                                         CLOB
 SAMPLESCHEMA                                       VARCHAR2(255)
 SAMPLESERVER                                       VARCHAR2(64)
 
 COLUMN NAME INDEX NAME
 -------------------------------
 ACTLINKID   ACTLINK_PUSH_IND

 SQL> desc actlink_mapping
  Name                                      Null?    Type
  ----------------------------------------- -------- ------------
  SCHEMAID                                  NOT NULL NUMBER(15)
  OBJINDEX                                  NOT NULL NUMBER(15)
 ACTLINKID                                 NOT NULL NUMBER(15)
 
  COLUMN NAME INDEX NAME   COLUMN_POSITION
 ---------------------------------------------------------------
 SCHEMAID   ACTLINK_MAPPING_IND   1
 ACTLINKID  ACTLINK_MAPPING_IND   2
 ACTLINKID   ACTLINK_MAPPING_IDX4   1
 
 ACTLINK_MAPPING_IND is unique index.
 
SQL> SELECT COUNT(*)
  2   FROM ACTLINK_PUSH 
  3   ;

  COUNT(*)
----------
     20567
     
     
SELECT DISTINCT ACTLINKID 
FROM ACTLINK_PUSH 

1850 rows selected.

SELECT ACTLINKID, COUNT(*)
FROM ACTLINK_PUSH 
GROUP BY ACTLINKID
ORDER BY COUNT(*) DESC

COUNT(*) is in the range of 1 to 210. It's unevenly distributed.

SQL> SELECT COUNT(*) FROM ACTLINK_MAPPING;

  COUNT(*)
----------
     22307
     
SELECT SCHEMAID, COUNT(*) 
FROM ACTLINK_MAPPING
GROUP BY SCHEMAID
ORDER BY COUNT(*) desc

COUNT(*) is in the range of 1 to 561. This is also unevenly distributed. But combination of SCHEMAID and ACTLINKID is unqiue
     


TABLE_NAME   NUM_ROWS   LAST_ANALYZED SAMPLE_SIZE
ACTLINK_MAPPING   22294    24/05/2007 10:11:18    22294
ACTLINK_PUSH   20566    24/05/2007 10:10:39    20566

********************************************************************************

Without any hint     

SELECT  a.actlinkId,actionIndex,a.fieldId,assignShort,assignLong,sampleServer,sampleSchema ,b.schemaId
FROM actlink_push a, actlink_mapping  b
WHERE b.actlinkId=a.actlinkId
AND b.schemaId=:"SYS_B_0"
ORDER BY :"SYS_B_1" DESC,:"SYS_B_2" DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch      256      0.21       0.21          0      42086          0         254
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      260      0.21       0.21          0      42086          0         254

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 21  (ARADMIN)

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY 
     50   NESTED LOOPS  
  20567    TABLE ACCESS FULL ACTLINK_PUSH 
     50    INDEX UNIQUE SCAN ACTLINK_MAPPING_IND (object id 3817)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     50   NESTED LOOPS
     50    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'ACTLINK_PUSH'
  20567    INDEX   GOAL: ANALYZED (UNIQUE SCAN) OF 'ACTLINK_MAPPING_IND' 
               (UNIQUE)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     258        0.00          0.00
  SQL*Net message from client                   258       26.41         88.82
  
  

********************************************************************************

With HASH JOIN hint for a different SCHEMAID 

********************************************************************************
     
     
SELECT /*+ USE_HASH(B) */ a.actlinkId,actionIndex,a.fieldId,assignShort,assignLong,sampleServer,sampleSchema ,b.schemaId
FROM actlink_push a,actlink_mapping b
WHERE a.actlinkId=b.actlinkId
AND b.schemaId=:"SYS_B_0"
ORDER BY :"SYS_B_1" DESC,:"SYS_B_2" DESC

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       51      0.03       0.02          0        476          0          50
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       53      0.03       0.02          0        476          0          50

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 21  (ARADMIN)

Rows     Row Source Operation
-------  ---------------------------------------------------
     50  SORT ORDER BY 
     50   HASH JOIN  
     35    INDEX RANGE SCAN ACTLINK_MAPPING_IND (object id 3817)
  20567    TABLE ACCESS FULL ACTLINK_PUSH 


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
     50   HASH JOIN
     50    INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'ACTLINK_MAPPING_IND' 
               (UNIQUE)
     35    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'ACTLINK_PUSH'


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      52        0.00          0.00
  SQL*Net message from client                    52     1161.73       1180.13
  
********************************************************************************

Thanks

Still awaiting ...

A reader, June 02, 2007 - 6:56 am UTC

Hello Tom,
I still waiting for your reply.

Thanks

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

hash joins issue

hash joins

HASH joins

Hash_Area_size estimate

hash joins

Hash joins

hash joins

confused

hash or nested

update

hash joins

datawarehouse

Why is Hash Join more Efficient?

Disable Hash Join with PGA AGGREGATE TARGET

Hash_area_size over dblink

hash_join_enabled

TKPROF output

Plan changed after Migration to Oracle 10g

Still one clarification

About HASH JOINS

HASH_JOIN

HASH_JOIN

Still awaiting ...

More to Explore

PL/SQL demos

PL/SQL docs

Hash joining versus index

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

hash joins issue

hash joins

HASH joins

Hash_Area_size estimate

hash joins

Hash joins

hash joins

confused

hash or nested

update

hash joins

datawarehouse

Why is Hash Join more Efficient?

Disable Hash Join with PGA AGGREGATE TARGET

Hash_area_size over dblink

hash_join_enabled

TKPROF output

Plan changed after Migration to Oracle 10g

Still one clarification

About HASH JOINS

HASH_JOIN

HASH_JOIN

Still awaiting ...

More to Explore

PL/SQL demos

PL/SQL docs