Skip to Main Content
  • Questions
  • Question on explain plan for hash join

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Gautham.

Asked: September 26, 2008 - 1:56 pm UTC

Last updated: January 28, 2009 - 8:43 am UTC

Version: 10.2.0.3

Viewed 10K+ times! This question is

You Asked

Lately I was working on a query which was fetching like 2million rows and was running for a few hours .I was able to tune this query to give out results in less than 15 mins but i still have questions regarding the explain plan of the old un-tuned query.I am not putting down my tuned SQL as this has got nothing to do with my question.
This was the query I was working on
SELECT ppl_cd, 'PPL-CD', NULL, ppl_org_cd, TO_NUMBER (NULL), ppl_fttl_cd,
ppl_fst_nm, ppl_mid_init, ppl_lst_nm, ppl_infrml_fst_nm, ppl_suffix_nm,
ppl_srch_nm, ppl_lang_cd, ppl_gender, NULL ext_job_role,
ppl_job_title_nm, 'Y', NVL (ppl_prim_cont_flg, 'Y'),
TO_CHAR (ppl_add_dt, 'YYYYMMDD') last_modified_date,
DECODE (ppl_active_flg, 'N', 'S', '') status_indicator,
(SELECT DISTINCT pcat_pcs_cd
FROM ppl_categories
WHERE pcat_ppl_cd = ppl_cd
AND pcat_pcs_cd IN (210, 211)) user_class_cd,
NULL
FROM people
/
I have the explain plan on TOAD and therefore I am typing it manually

SELECT STATEMENT
HASH UNIQUE
TABLE ACCESS BY INDEX ROWID TABLE PINNACLE.PPL_CATEGORIES
INDEX RANGE SCAN INDEX PINNACLE.PCAT_PPL_CD
TABLE ACCESS FULL TABLE PINNACLE.PEOPLE

My question is the 1st operation is
INDEX RANGE SCAN INDEX PINNACLE.PCAT_PPL_CD
There is no where condition on this in the query.All there I can see is a join(pcat_ppl_cd = ppl_cd).So where is it getting the values for doing the index range scan.To me , it looks like there should be a FTS on people table to drive this index scan but that is not shown in the above explain plan.
Can you please throw some light on this?

Regards
Gautham


and Tom said...

You have a query:


select <columns>,
<scalar subquery from other_table where column = TABLE.PPL_CD>
from table;


that is processed like:


for x in ( select * from table )
loop
    select ... into .... 
      from other_table
     where column = :BIND;

    output record
end loop


there is no join here. The HASH UNIQUE step is your DISTINCT clause.

You are just seeing the plan for your scalar subquery, consider:

ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> sELECT ppl_cd, 'PPL-CD', NULL, ppl_org_cd, TO_NUMBER (NULL), ppl_fttl_cd,
  2      ppl_fst_nm, ppl_mid_init, ppl_lst_nm, ppl_infrml_fst_nm, ppl_suffix_nm,
  3      ppl_srch_nm, ppl_lang_cd, ppl_gender, NULL ext_job_role,
  4      ppl_job_title_nm, 'Y', NVL (ppl_prim_cont_flg, 'Y'),
  5      TO_CHAR (ppl_add_dt, 'YYYYMMDD') last_modified_date,
  6      DECODE (ppl_active_flg, 'N', 'S', '') status_indicator,
  7      (SELECT DISTINCT pcat_pcs_cd
  8              FROM ppl_categories
  9              WHERE pcat_ppl_cd = ppl_cd
 10              AND pcat_pcs_cd IN (210, 211)) user_class_cd,
 11      NULL
 12  FROM people
 13  /

Execution Plan
----------------------------------------------------------
Plan hash value: 820210871

------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |  2000K|   190M|
|   1 |  HASH UNIQUE                 |                 |   200 |  5200 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| PPL_CATEGORIES  |   200 |  5200 |
|*  3 |    INDEX RANGE SCAN          | PCAT_PPL_CD_IDX |  8000 |       |
|   4 |  TABLE ACCESS FULL           | PEOPLE          |  2000K|   190M|
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PCAT_PCS_CD"=210 OR "PCAT_PCS_CD"=211)
   3 - access("PCAT_PPL_CD"=:B1)

ops$tkyte%ORA10GR2> variable ppl_cd number
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> SELECT DISTINCT pcat_pcs_cd
  2    FROM ppl_categories
  3   WHERE pcat_ppl_cd = :ppl_cd
  4     AND pcat_pcs_cd IN (210, 211);

Execution Plan
----------------------------------------------------------
Plan hash value: 2133281571

------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |   200 |  5200 |
|   1 |  HASH UNIQUE                 |                 |   200 |  5200 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| PPL_CATEGORIES  |   200 |  5200 |
|*  3 |    INDEX RANGE SCAN          | PCAT_PPL_CD_IDX |  8000 |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PCAT_PCS_CD"=210 OR "PCAT_PCS_CD"=211)
   3 - access("PCAT_PPL_CD"=TO_NUMBER(:PPL_CD))

ops$tkyte%ORA10GR2> set autotrace off


A scalar subquery would be entirely inappropriate here if ppl_cd is nearly unique (which is probably is, sounds like it could be a primary key even)

hopefully, your rewritten query looks like:

sELECT ppl_cd, 'PPL-CD', NULL, ppl_org_cd, TO_NUMBER (NULL), ppl_fttl_cd,
    ppl_fst_nm, ppl_mid_init, ppl_lst_nm, ppl_infrml_fst_nm, ppl_suffix_nm,
    ppl_srch_nm, ppl_lang_cd, ppl_gender, NULL ext_job_role,
    ppl_job_title_nm, 'Y', NVL (ppl_prim_cont_flg, 'Y'),
    TO_CHAR (ppl_add_dt, 'YYYYMMDD') last_modified_date,
    DECODE (ppl_active_flg, 'N', 'S', '') status_indicator,
    x.pcat_pcs_cd user_class_cd,
    NULL
FROM people,
    (SELECT distinct pcat_ppl_cd, pcat_pcs_cd
            FROM ppl_categories
            WHERE pcat_pcs_cd IN (210, 211)) x
 where x.pcat_ppl_cd = people.ppl_cd
/


and only has the DISTINCT if in fact distinct is necessary

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Gautham, September 29, 2008 - 9:15 pm UTC

Hi Tom,
Thanks for clarifying this so nicely.But since the outer table "table" drives the "other_table" from your pseudo code and algorithm, should not the expalin plan show FTS of the "table" above the index scan of "other_table".

Something like
Full Scan of Table
Table access of index rowid(other_table)
Index range scan of other_table
Why is it showing the table access full of PEOPLE table below the scan of people_categories as if people_categories is driving the scan of people table?

Is there anything wrong with the way I read the explain plan.Believe me I read your article on reading explain plans before posting this.
Tom Kyte
September 29, 2008 - 10:03 pm UTC

scalar subqueries are somewhat "exceptional", we are not joining to them in the traditional sense - they happen as part of the "select" from the other row source.

The plan is correct - for example:


ops$tkyte%ORA10GR2> create table t1 ( x int primary key, y int );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( x int, y int );

Table created.

ops$tkyte%ORA10GR2> create table t3 ( x int primary key, y int );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select /*+ first_rows */ t1.*, t2.*, (select count(y) from t3 where t3.x = t1.x) from t1, t2 where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3150900447

------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Co
------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     1 |    52 |
|   1 |  SORT AGGREGATE              |              |     1 |    26 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T3           |     1 |    26 |
|*  3 |    INDEX UNIQUE SCAN         | SYS_C0030852 |     1 |       |
|   4 |  NESTED LOOPS                |              |     1 |    52 |
|   5 |   TABLE ACCESS FULL          | T2           |     1 |    26 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1           |     1 |    26 |
|*  7 |    INDEX UNIQUE SCAN         | SYS_C0030851 |     1 |       |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T3"."X"=:B1)
   7 - access("T1"."X"="T2"."X")

Note
-----
   - dynamic sampling used for this statement

ops$tkyte%ORA10GR2> set autotrace off


it is just not a sophisticated enough display medium to show what is "really" going on. The select processing happened "last" in effect here.


Gautham C, September 30, 2008 - 12:20 pm UTC

Thanks Tom. That helps.
Appreciate the time you spent to explain this so effectively.

Very interesting

Chuck Jolley, September 30, 2008 - 3:15 pm UTC

By the way, in TOAD, if you right click in the explain plan window you get a menu that will allow you to change the way the plan is displayed. For some real fun you can even animate them. Some of the modes are pretty entertaining ;)
None of them are as nice, straight forward, and easy to read as the sqlplus standard display though.

Another option is to copy the plan to the clipboard.
You can then paste it into notepad or something to neaten it up, but at least you wouldn't have to type it all out...


Explain plan of scalar subqueries

Kim Berg Hansen, January 27, 2009 - 4:55 am UTC

Hi, Tom

Just a followup on the "special" explain plan for scalar subqueries.

What is the best way to work around, that the predicates for the scalar subquery seems to act as bind variables?

An example that took a colleague and myself some time to figure out:


SQL> delete plan_table
0 rows deleted.
SQL> explain plan for
select
sm.batchserienummer,
sm.varenummer,
sm.salgsdato,
sm.filialid,
(
   select max(formxl)keep (dense_rank last order by dataset, batchserienr$, bogfxrtdato,
   bogfxrttid, lxbenummer)
   from lagerpost lp
   where lp.dataset = sm.dataset
   and lp.batchserienr$ = sm.batchserienummer
) lagerfilial
from dd_stelmotornr sm
where sm.dataset = 'DAT'
and sm.batchserienummer = 'LNGTGBDL88C301289'
Explain complete.
SQL> select * from table(dbms_xplan.display)
Plan hash value: 1390947906                                                                                                       
                                                                                                                                  
-----------------------------------------------------------------------------------------------                                   
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |                                   
-----------------------------------------------------------------------------------------------                                   
|   0 | SELECT STATEMENT            |                 |     1 |    49 |     2   (0)| 00:00:01 |                                   
|   1 |  SORT AGGREGATE             |                 |     1 |    33 |            |          |                                   
|*  2 |   TABLE ACCESS FULL         | LAGERPOST       |  7095K|   223M|  1286K  (4)| 01:05:44 |                                   
|   3 |  TABLE ACCESS BY INDEX ROWID| DD_STELMOTORNR  |     1 |    49 |     2   (0)| 00:00:01 |                                   
|*  4 |   INDEX UNIQUE SCAN         | I_605SERIENRIDX |     1 |       |     1   (0)| 00:00:01 |                                   
-----------------------------------------------------------------------------------------------                                   
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   2 - filter("LP"."BATCHSERIENR$"=:B1 AND "LP"."DATASET"=:B2)                                                                    
   4 - access("SM"."DATASET"='DAT' AND "SM"."BATCHSERIENUMMER"='LNGTGBDL88C301289')                                               

17 rows selected.
SQL> delete plan_table
5 rows deleted.
SQL> explain plan for
select
sm.batchserienummer,
sm.varenummer,
sm.salgsdato,
sm.filialid,
(
   select max(formxl)keep (dense_rank last order by dataset, batchserienr$, bogfxrtdato,
   bogfxrttid, lxbenummer)
   from lagerpost lp
   where lp.dataset = 'DAT'
   and lp.batchserienr$ = 'LNGTGBDL88C301289'
) lagerfilial
from dd_stelmotornr sm
where sm.dataset = 'DAT'
and sm.batchserienummer = 'LNGTGBDL88C301289'
Explain complete.
SQL> select * from table(dbms_xplan.display)
Plan hash value: 1598958998                                                                                                       
                                                                                                                                  
---------------------------------------------------------------------------------------------------                               
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |                               
---------------------------------------------------------------------------------------------------                               
|   0 | SELECT STATEMENT             |                    |     1 |    49 |     2   (0)| 00:00:01 |                               
|   1 |  SORT AGGREGATE              |                    |     1 |    33 |            |          |                               
|   2 |   TABLE ACCESS BY INDEX ROWID| LAGERPOST          |     1 |    33 |     5   (0)| 00:00:01 |                               
|*  3 |    INDEX RANGE SCAN          | I_013BATCHSERIEIDX |     1 |       |     4   (0)| 00:00:01 |                               
|   4 |  TABLE ACCESS BY INDEX ROWID | DD_STELMOTORNR     |     1 |    49 |     2   (0)| 00:00:01 |                               
|*  5 |   INDEX UNIQUE SCAN          | I_605SERIENRIDX    |     1 |       |     1   (0)| 00:00:01 |                               
---------------------------------------------------------------------------------------------------                               
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   3 - access("LP"."DATASET"='DAT' AND "LP"."BATCHSERIENR$"='LNGTGBDL88C301289')                                                  
   5 - access("SM"."DATASET"='DAT' AND "SM"."BATCHSERIENUMMER"='LNGTGBDL88C301289')                                               

18 rows selected.

SQL> delete plan_table
4 rows deleted.
SQL> explain plan for
select
sm.batchserienummer,
sm.varenummer,
sm.salgsdato,
sm.filialid,
(
   select /*+ index(lp) */ max(formxl)keep (dense_rank last order by dataset, batchserienr$, bogfxrtdato,
   bogfxrttid, lxbenummer)
   from lagerpost lp
   where lp.dataset = sm.dataset
   and lp.batchserienr$ = sm.batchserienummer
) lagerfilial
from dd_stelmotornr sm
where sm.dataset = 'DAT'
and sm.batchserienummer = 'LNGTGBDL88C301289'
Explain complete.
SQL> select * from table(dbms_xplan.display)
Plan hash value: 1598958998                                                                                                       
                                                                                                                                  
---------------------------------------------------------------------------------------------------                               
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |                               
---------------------------------------------------------------------------------------------------                               
|   0 | SELECT STATEMENT             |                    |     1 |    49 |     2   (0)| 00:00:01 |                               
|   1 |  SORT AGGREGATE              |                    |     1 |    33 |            |          |                               
|   2 |   TABLE ACCESS BY INDEX ROWID| LAGERPOST          |  7095K|   223M|  2868K  (1)| 02:26:30 |                               
|*  3 |    INDEX RANGE SCAN          | I_013BATCHSERIEIDX |  7095K|       | 57149   (1)| 00:02:56 |                               
|   4 |  TABLE ACCESS BY INDEX ROWID | DD_STELMOTORNR     |     1 |    49 |     2   (0)| 00:00:01 |                               
|*  5 |   INDEX UNIQUE SCAN          | I_605SERIENRIDX    |     1 |       |     1   (0)| 00:00:01 |                               
---------------------------------------------------------------------------------------------------                               
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                               
                                                                                                                                  
   3 - access("LP"."DATASET"=:B1 AND "LP"."BATCHSERIENR$"=:B2)                                                                    
   5 - access("SM"."DATASET"='DAT' AND "SM"."BATCHSERIENUMMER"='LNGTGBDL88C301289')                                               

18 rows selected.



The "outer" select has a cardinality of 1 which is correct.
The table (lagerpost) in the scalar subquery has about 135 million rows but the cardinality of the table here should be between 2 and 10 rows. Problem is the index is extremely skewed - about 99.7% of the rows have the same value in column batchserienr$ (value asc(2) which means "empty string" for the ERP system.) There is a histogram on the column.

In the first example we see, that the scalar subquery is optimized with bind variables, which leads to full table scan of the 135 million rows. (Seems like the bind variables are "empty" so bind variable peeking does not help.)
In the second example I copy the constants into the scalar subquery - this of course gives me the desired plan.
In the third example I get the desired plan with an "index" hint - the cardinality is terribly wrong but when executed it does the right thing.

Is a hint the best way to work around this? (Even though most hints normally would be bad practice :-)

Thanks

Regards
Kim Berg Hansen

Tom Kyte
January 28, 2009 - 8:43 am UTC

select
sm.batchserienummer,
sm.varenummer,
sm.salgsdato,
sm.filialid,
(
select max(formxl)keep (dense_rank last order by dataset, batchserienr$,
bogfxrtdato,
bogfxrttid, lxbenummer)
from lagerpost lp
where lp.dataset = 'DAT'
and lp.batchserienr$ = :x ) lagerfilial
from dd_stelmotornr sm
where sm.dataset = 'DAT'
and sm.batchserienummer = :x


since you constrain "batchserienummer" in the outer query, I see no reason not use it in both places (if you are binding, bind in both)

PS. Forgot version info :-)

Kim Berg Hansen, January 27, 2009 - 5:10 am UTC

Sorry - forgot version info in the review above.
The database is version 10.2.0.3.0.

(Next month I'll get a chance to test it on an 11g base to see if the behaviour has changed :-)

Regards
Kim Berg Hansen