Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, karthick.

Asked: November 27, 2007 - 11:54 pm UTC

Last updated: December 16, 2011 - 6:05 am UTC

Version: 10G

Viewed 10K+ times! This question is

You Asked

explain plan for
update hx_p1 p
set object_id = (select new_object_id
from hx_s s
where s.table_name = 'P1'
and p.object_id = s.object_id)
where exists (select new_object_id
from hx_s s
where s.table_name = 'P1'
and p.object_id = s.object_id);

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1000 | 15000 | 11 (10)| 00:00:01 |
| 1 | UPDATE | HX_P1 | | | | |
|* 2 | HASH JOIN SEMI | | 1000 | 15000 | 11 (10)| 00:00:01 |
| 3 | TABLE ACCESS FULL | HX_P1 | 1000 | 6000 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | HX_S | 1000 | 9000 | 7 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| HX_S | 1 | 16 | 188 (0)| 00:00:03 |
|* 6 | INDEX RANGE SCAN | HX_S_IDX | 1000 | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

1. I just want to know why HX_S is going for a full scan while its been hashed. why it is not using
the index HX_S_IDX.

2. I know what is HASH join but what is that SEMI there.

3. Same query was just modified by one of our developer he added a rownum < 2 and the entire plan got changed. Why is that?

explain plan for
update hx_p1 p
set object_id = (select new_object_id
from hx_s s
where s.table_name = 'P1'
and p.object_id = s.object_id)
where exists (select new_object_id
from hx_s s
where s.table_name = 'P1'
and p.object_id = s.object_id
and rownum <2);

------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------| 0 | UPDATE STATEMENT | | 1 | 6 | 94158 (1)| 00:18:50 |
| 1 | UPDATE | HX_P1 | | | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | HX_P1 | 1000 | 6000 | 3 (0)| 00:00:01 |
|* 4 | COUNT STOPKEY | | | | | |
|* 5 | TABLE ACCESS BY INDEX ROWID| HX_S | 1 | 9 | 188 (0)| 00:00:03 |
|* 6 | INDEX RANGE SCAN | HX_S_IDX | 1000 | | 4 (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID | HX_S | 1 | 16 | 188 (0)| 00:00:03 |
|* 8 | INDEX RANGE SCAN | HX_S_IDX | 1000 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

4. Can you explain this plan for us. some of the terms i dont get like FILTER, COUNT STOPKEY.

and Tom said...

1) because indexes are really slow.

when you access much of the data.

indexes are good for getting "a row" or "a few rows".
indexes are not good for getting "most or many or all of the rows"

instead of:

update hx_p1 p
  set object_id = (select new_object_id
          from hx_s s
          where s.table_name = 'P1'
            and p.object_id = s.object_id)
where exists (select new_object_id
        from hx_s s
      where s.table_name = 'P1'
        and p.object_id = s.object_id);


you should:

update ( select new_object_id, object_id
           from hx_s, hx_p1
          where hx_s.table_name = 'P1'
            and hx_s.object_id = hx_p1.object_id )
  set object_id = new_object_id;



presuming object_id is UNIQUE in hx_s, or



2) semi joins are joins that stop when the first hit is encountered. They are used many times for "existence" checks.

select * from dept where deptno in (select deptno from emp);

that could use a semi join of DEPT to EMP - it would only join each DEPT record to at most ONE emp record - not all of the EMP records as that would create too many DEPT rows. So, it 'semi joins' them.

3) .... Same query was just modified by one of our developer he added a rownum < 2 and the entire plan got changed. Why is that? ....

that should be obvious? by putting rownum < 2 - it knows know "ONE ROW"

I would presume that in order to get ONE ROW, we might be able to consider access paths like an index that would not make sense FOR ALL OR MOST of the rows.

4) filter - a predicate, a where clause, a filter....

count stopkey - that is rownum, we know we are STOPPING after "count" rows have been processed - 1 in this case.

Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Column Ambiguously Defined

Mark Brady, November 28, 2007 - 2:54 pm UTC

update ( select new_object_id, object_id
           from hx_s, hx_p1
          where hx_s.table_name = 'P1'
            and hx_s.object_id = hx_p1.object_id )
  set object_id = new_object_id;


Is the only thing missing the "hx_s" prefixed to "object_id" in the SELECT clause and the SET clause or just one?
Tom Kyte
November 29, 2007 - 8:09 am UTC

should have qualified it in the select list, then would be unambigous - hx_s is not "visible" in the set close, it is out of scope by then

Will ROWNUM improve perfomance

karthick, November 29, 2007 - 3:05 am UTC


3) .... Same query was just modified by one of our developer he added a rownum < 2 and the entire plan got changed. Why is that? ....

that should be obvious? by putting rownum < 2 - it knows know "ONE ROW"

I would presume that in order to get ONE ROW, we might be able to consider access paths like an index that would not make sense FOR ALL OR MOST of the rows.


Does that means a rownum < 2 clause will improve perfomance as it just looks for one row and does an index range scan.

Tom Kyte
November 29, 2007 - 8:51 am UTC

no, it means that rownum < 2 totally changes the query, the rows affected, everything.

you cannot compare this.


that would be like saying "to make queries go fast, please add ' AND 1=0 ' to the where clause, they go much faster that way"

it would be true, the queries run faster
but they don't do anything useful.

correct me if iam wrong

karthick, November 29, 2007 - 9:09 am UTC

but, correct me if iam wrong. exists checks only for the first existance of the record and by putting rownum < 2 iam not changing the behaviour of the exists correct?? but i get a better perfomance as it takes the index and does a range scan
Tom Kyte
November 29, 2007 - 6:39 pm UTC

I did not read the original query when you asked about the rownum later in a followup, so I did not see where it was.

quantify please - using tkprof - what you mean by "better performance"

Doesn't EXISTS implies ONE ROW ?

Nico, November 29, 2007 - 11:43 am UTC

Hi Tom,

I can't imagine a case where adding "and rownum < 2" in an EXISTS correlated subquery would modify the query result.

I perfectly understand that it does change the plan, but I don't see how this would modify the number of rows updated.

I'm sure it's only because my following example is too trivial :
create table t1 as
select
   level id,
   '___' val
from dual connect by level <= 15;

create table t2 as
select
   3*level id,
   'BBB' val
from dual connect by level <= 5;

update t1
set val=(
   select t2.val from t2 where t2.id=t1.id
)
where exists (
   select 1 from t2 where t2.id=t1.id
);

select *
from t1;

  ID VAL
---- ---
   1 ___
   2 ___
   3 BBB
   4 ___
   5 ___
   6 BBB
   7 ___
   8 ___
   9 BBB
  10 ___
  11 ___
  12 BBB
  13 ___
  14 ___
  15 BBB

rollback;

update t1
set val=(
   select t2.val from t2 where t2.id=t1.id
)
where exists (
   select 1 from t2 where t2.id=t1.id
   and rownum < 2
);

select *
from t1;
  ID VAL
---- ---
   1 ___
   2 ___
   3 BBB
   4 ___
   5 ___
   6 BBB
   7 ___
   8 ___
   9 BBB
  10 ___
  11 ___
  12 BBB
  13 ___
  14 ___
  15 BBB

rollback;

Can you shed some of your light on that point ?
Tom Kyte
November 29, 2007 - 7:11 pm UTC

sorry, that was my mistake, i did not re-read the original post. I was responding to "where rownum < 2" in general, not in the specific case

same output from different queries

nameless, November 29, 2007 - 2:49 pm UTC

Nico:

I can't imagine a case where adding "and rownum < 2" in an EXISTS correlated subquery would modify the query result.

Right. Still, that would be a different query. And is not necesarily for the better.

Above karthick said: but i get a better perfomance as it takes the index and does a range scan

Well, not really, not always anyway ... if anything, the optimizer cannot do a hash semi join now.

Take this example: table t (id) with an index on id

select max(id) from t
select max(id) from t where rownum > 0

They produce the same results, they both might be doing a INDEX FULL SCAN to get the MAX ... but the first one can use the special optimization INDEX FULL SCAN (MIN/MAX) while the second cannot.

Same output, different queries, different set of options the optimizer can pursue ... different everything really.

rownum<2 is better = false

karthick, November 30, 2007 - 7:02 am UTC


I proved myself wrong :-)

drop table t
/

drop table s
/

create table t as select mod(level,1000000) t_val from dual connect by level <= 10000000
/

exec dbms_stats.gather_table_stats('sysadm','t')

create table s as select 'A' as s_fld, t_val as s_old_val, 1000000-t_val as s_new_val from (select distinct t_val from t)
/

create index s_idx on s(s_fld,s_old_val)
/

exec dbms_stats.gather_table_stats('sysadm','s')

update t
set t_val = (select s_new_val
               from s
              where s_fld = 'A'
                and s_old_val = t_val)
where exists(select null
               from s
              where s_fld = 'A'
                and s_old_val = t_val)
/

Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT GOAL: ALL_ROWS
0 UPDATE OF 'T'
0 HASH JOIN (RIGHT SEMI)
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'S' (TABLE)
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)
0 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'S' (TABLE)
0 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'S_IDX' (INDEX)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.06 0.14 0 0 0 0
Execute 1 405.31 694.99 13024 30016193 10223745 10000000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 405.37 695.14 13024 30016193 10223745 10000000


update t
set t_val = (select s_new_val
               from s
              where s_fld = 'A'
                and s_old_val = t_val)
where exists(select null
               from s
              where s_fld = 'A'
                and s_old_val = t_val
                and rownum < 2)
/               

Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT GOAL: ALL_ROWS
0 UPDATE OF 'T'
0 FILTER
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)
0 COUNT (STOPKEY)
0 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'S_IDX' (INDEX)
0 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'S' (TABLE)
0 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'S_IDX' (INDEX)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.06 0.14 0 0 0 0
Execute 1 912.43 1390.87 127033 100664115 10551252 10000000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 912.50 1391.02 127033 100664115 10551252 10000000

As you have already said Hash join gives better perfomance when doing update of large amount of data. And full scan is better than Index scan for such cases.

Thank you.
Tom Kyte
November 30, 2007 - 1:36 pm UTC

no

thank you!

I feel much better now :) one more person that is on the side of:

indexes are not all goodness
full scans are not all evil


Dropping Pennies

SeánMacGC, November 30, 2007 - 8:11 pm UTC

Amen! :o)

Scalar sub query

karthick, December 03, 2007 - 8:25 am UTC

update t
set t_val = <b>(select s_new_val
               from s
              where s_fld = 'A'
                and s_old_val = t_val)</b>
where exists(select null
               from s
              where s_fld = 'A'
                and s_old_val = t_val
                and rownum < 2)


why the scalar sub query is not hash joined and gose for a index range scan.
Tom Kyte
December 03, 2007 - 11:33 am UTC

well, first, there are no scalar subqueries there technically - that is a correlated subquery.

remove the rownum < 2 and it'll have a better opportunity - like I said "where rownum < 2" changes the semantic meaning of a query greatly.

karthick, December 04, 2007 - 1:44 am UTC


How about this

update t
set t_val = (select s_new_val
               from s
              where s_fld = 'A'
                and s_old_val = t_val)
where exists(select null
               from s
              where s_fld = 'A'
                and s_old_val = t_val)
/


Rows Execution Plan
------- ---------------------------------------------------
0 UPDATE STATEMENT GOAL: ALL_ROWS
0 UPDATE OF 'T'
0 HASH JOIN (RIGHT SEMI)
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'S' (TABLE)
0 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T' (TABLE)
0 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF 'S' (TABLE)
0 INDEX GOAL: ANALYZED (RANGE SCAN) OF 'S_IDX' (INDEX)

Same Issue, but what coudl eb the solution

Joe, December 14, 2011 - 5:54 pm UTC

Hi Tom,
In one of my Queries, I end up in same situation and the HASH JOIN SEMI, is consuming time. This update on a table with 44366 records is taking 26 minutes.
Here I am taking all records loaded in current run of a program ( using request_id) and then checkign if ther eis duplicates in that and updates a column in the table.

Why does this join take so much time to execute.



UPDATE TAB_A a
SET status = 3,
error_log =
error_log
|| '| There is more than one record with same item number and destination UOM in the file.',
last_updated_by = g_user_id,
last_update_date = SYSDATE
WHERE a.request_id = g_request_id
AND a.ITEM is not null
AND a.DESTINATION_UOM is not null
AND EXISTS (
SELECT 1
FROM TAB_B b
WHERE b.request_id = g_request_id
AND b.ITEM is not null
AND b.DESTINATION_UOM is not null
and b.ITEM = a.ITEM
and b.DESTINATION_UOM = a.DESTINATION_UOM
and b.rowid <> a.rowid
);

Expalin plan is
Plan
UPDATE STATEMENT ALL_ROWSCost: 7 Bytes: 143 Cardinality: 1
4 UPDATE TAB_A
3 HASH JOIN SEMI Cost: 7 Bytes: 143 Cardinality: 1
1 TABLE ACCESS FULL TABLE TAB_A Cost: 3 Bytes: 900 Cardinality: 9
2 TABLE ACCESS FULL TABLE TAB_B Cost: 3 Bytes: 387 Cardinality: 9


Tom Kyte
December 16, 2011 - 6:05 am UTC

why not

where (item,destination_uom) in (select item, destination_uom
from tab_b
where request_id = g_request_id
group by item,destination_uom
having count(*) > 1 );

otherwise, post a tkprof, I've a feeling the cardinality estimates are way off here. but I'd really try the IN first.