Slow processing on database using merge
Tom Kyte

Thanks for the question, Brett.

Asked: October 31, 2003 - 8:36 am UTC

Last updated: October 10, 2012 - 4:32 pm UTC

Version: 9.2.0.4

Viewed 10K+ times

You Asked

Tom,

After further investigation, I have found that the problem is with using the MERGE statement on tables with millions of rows. I have several tables that use the following logic:

PROCEDURE UPSERT
(P_ID IN SX_SHIP_METHOD_CARR_MAPS.ID%TYPE
,P_STATUS IN SX_SHIP_METHOD_CARR_MAPS.STATUS%TYPE
,P_SXSHPMTH_FK IN SX_SHIP_METHOD_CARR_MAPS.SXSHPMTH_FK%TYPE
,P_SXMULSIT_FK IN SX_SHIP_METHOD_CARR_MAPS.SXMULSIT_FK%TYPE
,P_SXUFS_FK IN SX_SHIP_METHOD_CARR_MAPS.SXUFS_FK%TYPE
,P_DEFAULT_FLAG IN SX_SHIP_METHOD_CARR_MAPS.DEFAULT_FLAG%TYPE
)
IS

V_DATA SX_SHIP_METHOD_CARR_MAPS_TT := SX_SHIP_METHOD_CARR_MAPS_TT();
V_STATUS VARCHAR2(2);
BEGIN
   select decode(to_char(abs(p_status)), '0', '0', '1')
     into v_status
     from dual;

   if g_sync_mode = 'S' then
      BEGIN
         insert into sx_ship_method_carr_maps
                ( id
                 ,status
                 ,sxshpmth_fk
                 ,sxmulsit_fk
                 ,sxufs_fk
                 ,default_flag
                )
         values ( p_id
                 ,v_status
                 ,p_sxshpmth_fk
                 ,p_sxmulsit_fk
                 ,p_sxufs_fk
                 ,p_default_flag
                );
      EXCEPTION
         WHEN DUP_VAL_ON_INDEX THEN
            update sx_ship_method_carr_maps
               set status       = v_status
                  ,sxshpmth_fk  = p_sxshpmth_fk
                  ,sxmulsit_fk  = p_sxmulsit_fk
                  ,sxufs_fk     = p_sxufs_fk
                  ,default_flag = p_default_flag
             where id = p_id;
      END;
   else
      v_data.extend;
      v_data(1) := sx_ship_method_carr_maps_rt( p_id
                                               ,v_status
                                               ,p_sxshpmth_fk
                                               ,p_sxmulsit_fk
                                               ,p_sxufs_fk
                                               ,p_default_flag
                                              );
      merge into sx_ship_method_carr_maps t1
      using (select * from table(cast(v_data as sx_ship_method_carr_maps_tt))) t2
      on (t1.id = t2.id)
      when matched then
         update set t1.status       = t2.status
                   ,t1.sxshpmth_fk  = t2.sxshpmth_fk
                   ,t1.sxmulsit_fk  = t2.sxmulsit_fk
                   ,t1.sxufs_fk     = t2.sxufs_fk
                   ,t1.default_flag = t2.default_flag
      when not matched then
         insert ( id
                 ,status
                 ,sxshpmth_fk
                 ,sxmulsit_fk
                 ,sxufs_fk
                 ,default_flag
                )
         values ( t2.id
                 ,t2.status
                 ,t2.sxshpmth_fk
                 ,t2.sxmulsit_fk
                 ,t2.sxufs_fk
                 ,t2.default_flag
                );
   end if;
END;
--;

END SX_SHIP_METHOD_CARR_MAPS_PKG;
/

When the mode is 'S', performance is not a problem because I truncate the table and load each record directly without trying to find it first. When the mode is 'M', I use the MERGE statement to update the row if it exists or insert it if it does not. The only index I have on the table is the primary key (id). On tables of more than 3 million rows, it appears to take a while to find the record (I verified this using SQL*Plus). However, if I use the insert/update logic for everything, I don't have the slowdown. Is there something specific to MERGE that would cause this?

Thanks,

Brett

Tom,

I am experiencing very slow execution times for inserts on my database using the MERGE command. Running statspack reveals this. However, if I change the MERGE command to use insert/update logic, performance improves and my buffer cache hit ratio goes from 0% to over 90% instantly. I am trying to figure out why this may be happening. My guess is that the database is on a RAID 5 array and there is a lot of read contention. I am running on a 2-processor system with 2 GB of RAM.

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------- ---------- -------- -------- --------- ------- ---------
PROD 4205553307 prod 1 9.2.0.4.0 NO ZEUS

Snap Id Snap Time Sessions Cur/Sess Comment
------- ------------------ -------- -------- ------------
Begin: 1 30-Oct-03 14:25.22 14 4.1
End: 2 30-Oct-03 14:35:44 16 5.7
Elapsed: 10.37 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 48M Std Block Size: 8K
Shared Pool Size: 68M Log Buffer: 512K

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
------------------------------ ----------- ----------- -----------
db file scattered read 104,821 454 78.61
CPU time 120 20.74
db file sequential read 631 3 .53
control file parallel write 202 0 .07
control file sequential read 186 0 .03

Any thoughts on what I can do to make this system more efficient?

Thanks,

Brett

and Tom said...

why are you doing this row by row by stinking slow row??????

why would you merge a row at a time?

merge is designed to merge two SETS, not single rows.

your row by row by slow slow slow row processing is the killer here. You have tons of procedural code that is just slowing you way way way down. You are dynamically allocating storage, casting, using object types -- all of which is wholly unnecessary.

The invoker of this routine shouldn't be calling it at all -- the invoker should just be running MERGE on sets, period.

Your insert/update routine is many, many times slower than it needs to be as well - forget about MERGE for a moment, you are using many orders of magnitude more resources than you ever need to!


To answer the very last question "what can you do to make this system more efficient" I have only one word for you:

SETS


it is all about sets. SQL is about sets; the best way to make SQL perform is to think SETS, not procedurally -- SETS. MERGE is all about SETS. MERGE is not about row by row (COBOL is, maybe)...


I found this so slow -- I could not even bear to run a 1,000,000 row test! Here is a run with a few thousand rows. We can make it faster -- and I'll iteratively show you how -- but really a single, solitary MERGE statement is all you want. I'll do 10,000 rows to show you really slow (your way) and just plain slow (a simple modification). I use a script "big_table.sql" from my new book Effective Oracle By Design to create my test table:


---------------- big_table.sql ----------------------------------

create table big_table
as
select rownum id, a.*
from all_objects a
where 1=0
/
alter table big_table nologging;

declare
    l_cnt  number;
    l_rows number := &1;
begin
    insert /*+ append */
    into big_table
    select rownum, a.*
      from all_objects a
     where rownum <= &1;

    l_cnt := sql%rowcount;

    commit;

    while (l_cnt < l_rows)
    loop
        insert /*+ APPEND */ into big_table
        select rownum+l_cnt,
               OWNER, OBJECT_NAME, SUBOBJECT_NAME,
               OBJECT_ID, DATA_OBJECT_ID,
               OBJECT_TYPE, CREATED, LAST_DDL_TIME,
               TIMESTAMP, STATUS, TEMPORARY,
               GENERATED, SECONDARY
          from big_table
         where rownum <= l_rows-l_cnt;
        l_cnt := l_cnt + sql%rowcount;
        commit;
    end loop;
end;
/

alter table big_table add constraint
big_table_pk primary key(id)
/

begin
    dbms_stats.gather_table_stats
    ( ownname    => user,
      tabname    => 'BIG_TABLE',
      method_opt => 'for all indexed columns',
      cascade    => TRUE );
end;
/
select count(*) from big_table;
------------------------------------------------------------------


I used that to create a 10,000 row table and then ran this script, "slow.sql":

---------------------- slow.sql ---------------------------------------

create or replace type myScalarType
as object
(ID NUMBER,
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(30),
OBJECT_TYPE VARCHAR2(18),
CREATED DATE,
LAST_DDL_TIME DATE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7)
)
/
create or replace type myArrayType
as table of myScalarType
/


drop table big_table;
@big_table &1
alter table big_table modify object_id null;
exec dbms_stats.set_table_stats( user, 'BIG_TABLE', numrows => 1000000, numblks => 15000 );

drop table updates;

create table updates
as
select id+(select trunc(count(*)/2) from big_table) id,
owner, object_name, object_type, created,
last_ddl_time, timestamp, status
from big_table
/
alter table updates add constraint updates_pk primary key(id);


create or replace
procedure do_single_row
( p_id in big_table.id%type,
p_owner in big_table.owner%type,
p_object_name in big_table.object_name%type,
p_object_type in big_table.object_type%type,
p_created in big_table.created%type,
p_last_ddl_time in big_table.last_ddl_time%type,
p_timestamp in big_table.timestamp%type,
p_status in big_table.status%type )
as
l_data myArrayType := myArrayType();
begin
l_data.extend;
l_data(1) :=
myScalarType( p_id, p_owner, p_object_name,
p_object_type, p_created,
p_last_ddl_time, p_timestamp,
p_status );

merge into big_table t1
using ( select * from table( cast(l_data as myArrayType) ) ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp,
status)
values ( t2.id, t2.owner, t2.object_name,
t2.object_type, t2.created,
t2.last_ddl_time, t2.timestamp, t2.status );
end;
/

create or replace procedure slow_way
as
begin
for x in ( select * from updates )
loop
do_single_row( x.id, x.owner, x.object_name,
x.object_type, x.created,
x.last_ddl_time, x.timestamp, x.status );
end loop;
end;
/

@trace
exec slow_way
---------------------------------------------------------------------
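(The @trace script itself is not reproduced here. A minimal sketch of such a script -- assuming it simply enables the usual 10046 trace event -- would be:

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

After the run, the trace file in user_dump_dest is formatted with TKPROF to produce the reports shown below.)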

I set the stats on big_table so as to get the plan for the merge that would be used with millions of rows in big_table (otherwise, the merge used a hash anti join on a small table, which would skew our findings for this scaled-down test). I wanted an index lookup, as that is what would really happen in real life. Given we are doing a row at a time, this is a valid use of setting stats to "get a plan" -- so we can extrapolate what would happen if we did this process 1,000,000 times instead of just the 10,000 I did here:


By design, the above script will update half of the rows in big_table and insert half as many new rows as already exist.

Now, TKPROF shows us from this run:


MERGE into big_table t1
using ( select * from table( cast(:b1 as myArrayType) ) ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp,
status)
values ( t2.id, t2.owner, t2.object_name,
t2.object_type, t2.created,
t2.last_ddl_time, t2.timestamp, t2.status )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10000 12.65 15.36 0 25600 23216 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10001 12.66 15.36 0 25600 23216 10000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 542 (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
  20000  MERGE (cr=25646 r=0 w=0 time=6067730 us)
  10000   VIEW (cr=25019 r=0 w=0 time=2636444 us)
  10000    NESTED LOOPS OUTER (cr=25019 r=0 w=0 time=2329853 us)
  10000     COLLECTION ITERATOR PICKLER FETCH (cr=0 r=0 w=0 time=291352 us)
   5000     TABLE ACCESS BY INDEX ROWID BIG_TABLE (cr=25019 r=0 w=0 time=1275712 us)
   5000      INDEX RANGE SCAN BIG_TABLE_PK (cr=20019 r=0 w=0 time=953202 us)(object id 56866)



Now, if we just extrapolate up to 1,000,000 rows:


ops$tkyte@ORA920> select 1000000/10000*12.66 from dual;

1000000/10000*12.66
-------------------
1266

1* select 1266/60 from dual
ops$tkyte@ORA920> /

1266/60
----------
21.1

it would take 21 CPU minutes to do this (trust me, the elapsed wall clock time would be even longer)!!! So, let's modify the routine a little bit. You are using object types, nested tables, CAST -- very CPU intensive -- and the optimizer is guessing that there will be 8k rows in that little subquery as well (no stats for that "table()"). Let's create a small 1 row table:




drop table iot;
create table iot ( x int primary key ) organization index;
insert into iot values ( 1 );
analyze table iot compute statistics;



and rewrite our do_single_row routine to be "better":

create or replace
procedure do_single_row_better
( p_id in big_table.id%type,
p_owner in big_table.owner%type,
p_object_name in big_table.object_name%type,
p_object_type in big_table.object_type%type,
p_created in big_table.created%type,
p_last_ddl_time in big_table.last_ddl_time%type,
p_timestamp in big_table.timestamp%type,
p_status in big_table.status%type )
as
begin
merge into big_table t1
using ( select p_id id,
p_owner owner,
p_object_name object_name,
p_object_type object_type,
p_created created,
p_last_ddl_time last_ddl_time,
p_timestamp timestamp,
p_status status
from iot
where x = 1 ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp, status)
values ( t2.id, t2.owner, t2.object_name, t2.object_type,
t2.created, t2.last_ddl_time,
t2.timestamp, t2.status );
end;
/


No temporary variables, less procedural code, and it uses only real (analyzed) tables. The TKPROF is now:

MERGE into big_table t1
using ( select :b8 id,
:b7 owner,
:b6 object_name,
:b5 object_type,
:b4 created,
:b3 last_ddl_time,
:b2 timestamp,
:b1 status
from iot
where x = 1 ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp, status)
values ( t2.id, t2.owner, t2.object_name, t2.object_type,
t2.created, t2.last_ddl_time,
t2.timestamp, t2.status )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10000 11.94 12.60 0 35587 23194 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 10001 11.95 12.61 0 35587 23194 10000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 542 (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
  20000  MERGE (cr=35627 r=0 w=0 time=3548112 us)
  10000   VIEW (cr=35009 r=0 w=0 time=1010034 us)
  10000    NESTED LOOPS OUTER (cr=35009 r=0 w=0 time=933976 us)
  10000     INDEX UNIQUE SCAN SYS_IOT_TOP_56881 (cr=10000 r=0 w=0 time=401922 us)(object id 56882)
   5000     TABLE ACCESS BY INDEX ROWID BIG_TABLE (cr=25009 r=0 w=0 time=353633 us)
   5000      INDEX UNIQUE SCAN BIG_TABLE_PK (cr=20009 r=0 w=0 time=150502 us)(object id 56878)


So, it is a little better, but not good enough. Let's do it "right": forget the code, let's just go for it with a single SQL statement:



merge into big_table t1
using ( select * from updates ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp, status)
values ( t2.id, t2.owner, t2.object_name, t2.object_type,
t2.created, t2.last_ddl_time,
t2.timestamp, t2.status )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 1.03 1.03 0 15783 8319 10000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 1.05 1.05 0 15783 8319 10000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 542

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE (cr=15832 r=0 w=0 time=1042526 us)
  10000   VIEW (cr=15141 r=0 w=0 time=474658 us)
  10000    NESTED LOOPS OUTER (cr=15141 r=0 w=0 time=440052 us)
  10000     TABLE ACCESS FULL OBJ#(56889) (cr=127 r=0 w=0 time=49104 us)
   5000     TABLE ACCESS BY INDEX ROWID OBJ#(56887) (cr=15014 r=0 w=0 time=276923 us)
   5000      INDEX RANGE SCAN OBJ#(56888) (cr=10014 r=0 w=0 time=166679 us)(object id 56888)


Now, that -- that is significant. What about 1,000,000 rows?

merge into big_table t1
using ( select * from updates ) t2
on ( t1.id = t2.id )
when matched then
update set t1.owner = t2.owner,
t1.object_name = t2.object_name,
t1.object_type = t2.object_type,
t1.created = t2.created,
t1.last_ddl_time = t2.last_ddl_time,
t1.timestamp = t2.timestamp,
t1.status = t2.status
when not matched then
insert ( id, owner, object_name, object_type,
created, last_ddl_time, timestamp, status)
values ( t2.id, t2.owner, t2.object_name, t2.object_type,
t2.created, t2.last_ddl_time,
t2.timestamp, t2.status )

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.31 0 0 0 0
Execute 1 67.60 277.51 14310 55205 2159380 1000000
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 67.61 277.82 14310 55205 2159380 1000000

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 542

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  MERGE (cr=55544 r=14310 w=0 time=277757501 us)
1000000   VIEW (cr=26882 r=14304 w=0 time=54631955 us)
1000000    HASH JOIN OUTER (cr=26882 r=14304 w=0 time=51744510 us)
1000000     TABLE ACCESS FULL OBJ#(56903) (cr=12561 r=0 w=0 time=1592967 us)
1000000     TABLE ACCESS FULL OBJ#(56901) (cr=14321 r=14304 w=0 time=31082443 us)


There we go -- a couple of minutes, not a couple of hours! About 1 CPU minute (a lot less than 21). Hugely different.


So, the best way to make this have "fast=true"? Use set-based operations -- a single MERGE. Lose the slow-by-slow (oops -- meant row-by-row) processing.
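Applied back to the original sx_ship_method_carr_maps procedure, that means getting the incoming rows into the database as a set (an array bind, an external table, or a staging table) and issuing one MERGE for all of them. A sketch only -- the staging table name sx_ship_method_carr_maps_stg is made up here to stand for wherever the incoming set lands:

merge into sx_ship_method_carr_maps t1
using ( select id, status, sxshpmth_fk, sxmulsit_fk, sxufs_fk, default_flag
          from sx_ship_method_carr_maps_stg ) t2
on ( t1.id = t2.id )
when matched then
    update set t1.status       = t2.status
              ,t1.sxshpmth_fk  = t2.sxshpmth_fk
              ,t1.sxmulsit_fk  = t2.sxmulsit_fk
              ,t1.sxufs_fk     = t2.sxufs_fk
              ,t1.default_flag = t2.default_flag
when not matched then
    insert ( id, status, sxshpmth_fk, sxmulsit_fk, sxufs_fk, default_flag )
    values ( t2.id, t2.status, t2.sxshpmth_fk, t2.sxmulsit_fk, t2.sxufs_fk, t2.default_flag );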



Comments

Excellent, truly detailed

A reader, November 02, 2003 - 1:36 pm UTC


Another MERGE Problem

Jasbir Kular, November 03, 2003 - 11:24 am UTC

The following PL/SQL block using a MERGE and a table function gives me an error in 9.2.0.3.0 but not in 9.2.0.1.0. The error is:

ORA-01008: not all variables bound
ORA-06512: at "BNDWQA.TEST_DATA_FUN", line 11
ORA-06512: at line 4

The equivalent PL/SQL block using an Insert/Update statement works fine in both Oracle versions.


DECLARE
P_ID NUMBER := 1;
BEGIN
MERGE
INTO TEST_DATA_2
USING
(SELECT ID, DATA
FROM TABLE(TEST_DATA_FUN(CURSOR(
SELECT * FROM TEST_DATA WHERE (ID = P_ID)
)))
) SEL
ON ( ID = SEL.ID )
WHEN MATCHED THEN
UPDATE
SET DATA = SEL.DATA
WHEN NOT MATCHED THEN
INSERT (ID, DATA)
VALUES (SEL.ID, SEL.DATA);
END;

Here are the test table and functions I used:

CREATE TABLE TEST_DATA
(ID NUMBER,
DATA VARCHAR2(30));

INSERT INTO TEST_DATA VALUES (1, 'TEST1');
INSERT INTO TEST_DATA VALUES (2, 'TEST2');
INSERT INTO TEST_DATA VALUES (3, 'TEST3');

CREATE TABLE TEST_DATA_2
(ID NUMBER,
DATA VARCHAR2(30));

CREATE TYPE TEST_DATA_TYPE AS OBJECT
(ID NUMBER,
DATA VARCHAR2(30));

CREATE TYPE TEST_DATA_TAB AS TABLE OF TEST_DATA_TYPE;


CREATE OR REPLACE FUNCTION TEST_DATA_FUN (P SYS_REFCURSOR)
RETURN TEST_DATA_TAB
PIPELINED
PARALLEL_ENABLE(PARTITION P BY ANY)
IS
L_ID NUMBER;
L_DATA VARCHAR2(30);
L_TEST_DATA_TYPE TEST_DATA_TYPE;
BEGIN
LOOP
FETCH P INTO L_ID, L_DATA;
EXIT WHEN P%NOTFOUND;
L_TEST_DATA_TYPE := TEST_DATA_TYPE(L_ID, L_DATA);
PIPE ROW (L_TEST_DATA_TYPE);
END LOOP;
RETURN;
END;

Thanks.



Tom Kyte
November 03, 2003 - 5:52 pm UTC

Confirmed. Did you file a TAR with support? You have the perfect reproducible test case for getting a bug opened.
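A possible workaround until the bug is fixed (a sketch only -- not verified against 9.2.0.3/9.2.0.4) is to materialize the pipelined function's output into a local collection first, so the MERGE no longer references the CURSOR expression directly:

DECLARE
  P_ID   NUMBER := 1;
  L_ROWS TEST_DATA_TAB;
BEGIN
  -- materialize the function output first
  SELECT TEST_DATA_TYPE(ID, DATA)
    BULK COLLECT INTO L_ROWS
    FROM TABLE(TEST_DATA_FUN(CURSOR(SELECT * FROM TEST_DATA WHERE ID = P_ID)));

  -- then merge from the collection, as in the earlier examples
  MERGE INTO TEST_DATA_2 T
  USING (SELECT * FROM TABLE(CAST(L_ROWS AS TEST_DATA_TAB))) SEL
  ON (T.ID = SEL.ID)
  WHEN MATCHED THEN
    UPDATE SET T.DATA = SEL.DATA
  WHEN NOT MATCHED THEN
    INSERT (ID, DATA) VALUES (SEL.ID, SEL.DATA);
END;
/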

Another MERGE Problem

Jasbir Kular, November 04, 2003 - 11:35 am UTC

Yes, I logged a TAR. I am trying to install Oracle 9.2.0.4.0 and run the test on that version.

Another MERGE Problem

Jasbir Kular, November 04, 2003 - 3:36 pm UTC

I tried my test on 9.2.0.4.0 and the bug still exists in that version.

Still a problem in 9.2.0.5

Peter Tran, March 26, 2004 - 9:28 pm UTC

Jasbir Kular,

It's still a problem in 9.2.0.5.

-Peter

Hi

A reader, April 02, 2004 - 1:42 pm UTC

I have a situation where a table (say FACT_BLEND) has 2 columns that make a record unique and 4 other columns that store the measure values. Currently there is no primary key defined on it. The fact_blend table has 1 billion records.
Two of the measure values are for the current month and the other two are for the previous month.
Say, for example, the current month is March and the previous month is Feb 2004.
I bring the April data into my fact table. The process that gets the April data reads from a very big table (1.7 billion records) joined with a couple of dimension tables, each averaging 4 million records. The chance of a new set of records is minimal (I would say 10%).
My job needs to update the existing keys with the April data in the measure columns (overwriting the Feb data) and insert the records that are new.

Now I have to make sure MERGE helps me here. Do I need a primary key on my fact_blend table to perform a MERGE?
What is a good way to achieve this fast?

Thanks a lot


Tom Kyte
April 02, 2004 - 3:03 pm UTC

You do not need a primary key for merge, and to merge such large sets with each other, indexes will be "not useful". What will be useful is a large hash area size (8i) or pga_aggregate_target (9i) - but even in 9i, I might set the workarea policy to manual and set a large hash area size -- larger than pga_aggregate_target might let me use otherwise.
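For example, the session settings being described would look something like this (the 200MB figure is only an illustrative value -- size it to the memory you actually have):

alter session set workarea_size_policy = manual;
alter session set hash_area_size = 209715200;  -- ~200MB for the hash join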

Force merge to use hash join rather than nested loops outer

Rob, July 23, 2004 - 10:45 am UTC

Tom:

1) I would like to experiment with getting the 9i merge clause to use a hash join rather than a nested loops operation when merging two large data sets. The only way I have been able to do this in my dev environment is to drop the indexes. Is there a hint that I could experiment with?

2) Does merge make use of PQ for the writes to the table that is being updated?

3) I know this is off topic, but can you tell me offhand whether dbms_stats is an autonomous transaction? That is, if I am inserting/updating a table in a proc, then call dbms_stats in the same proc, then continue to update/insert into the table still in the same proc w/o a commit, will the stats get rolled back if there is a failure/rollback after the call to dbms_stats?


Tom Kyte
July 23, 2004 - 3:51 pm UTC

1) It always works "naturally" for me.  Can you give a counter case?

ops$tkyte@ORA9IR2> create table t1 as select object_id, object_name from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2> create table t2 as select object_id, object_name from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t1_idx on t1(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> create index t2_idx on t2(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t1 compute statistics for table for all indexes;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> analyze table t2 compute statistics for table for all indexes;
 
Table analyzed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> merge into t1
  2  using t2
  3  on ( t1.object_id = t2.object_id )
  4  when matched then update set object_name = object_name
  5  when not matched then insert ( object_id, object_name ) values ( t2.object_id, t2.object_name );
 
30666 rows merged.
 
 
Execution Plan
----------------------------------------------------------
   0      MERGE STATEMENT Optimizer=CHOOSE (Cost=72 Card=30666 Bytes=1839960)
   1    0   MERGE OF 'T1'
   2    1     VIEW
   3    2       HASH JOIN (OUTER) (Cost=72 Card=30666 Bytes=1870626)
   4    3         TABLE ACCESS (FULL) OF 'T2' (Cost=16 Card=30666 Bytes=919980)
   5    3         TABLE ACCESS (FULL) OF 'T1' (Cost=16 Card=30665 Bytes=950615)
 
 


2) merge can be used with parallel dml, yes.

3) dbms_stats would need to be wrapped in an autonomous transaction yourself; otherwise, it commits.



ops$tkyte@ORA9IR2> create table t1 ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2> create table t2 ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t1 values ( 1 );
 
1 row created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> rollback;
 
Rollback complete.
 
ops$tkyte@ORA9IR2> select * from t1;
 
         X
----------
         1
 
ops$tkyte@ORA9IR2> truncate table t1;
 
Table truncated.
 
ops$tkyte@ORA9IR2> insert into t1 values ( 1 );
 
1 row created.
 
ops$tkyte@ORA9IR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          dbms_stats.gather_table_stats( user, 'T' );
  5          commit;
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> rollback;
 
Rollback complete.
 
ops$tkyte@ORA9IR2> select * from t1;
 
no rows selected
 

How do I parallize MERGE

Peter Tran, July 27, 2004 - 4:38 pm UTC

Hi Tom,

You wrote, "2) merge can be used with parallel dml, yes.".

1) How do I do this?
2) Would you recommend parallelizing merge in a high volume OLTP environment?

Thanks,
-Peter

Tom Kyte
July 27, 2004 - 7:03 pm UTC

1) with pdml - alter session to enable it -- see the Data Warehousing guide, it covers PDML

2) "in a high volume OLTP environment" -- merge is a "batch thing", so I'm confused. What is really going on? Need more info. In general, parallel is something to not even consider in OLTP, but what are you doing?
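For point 1, a sketch of what parallel DML for a merge looks like (reusing the big_table/updates example from above; the degree of 2 is just an example):

alter session enable parallel dml;

merge /*+ parallel(t1, 2) */ into big_table t1
using ( select * from updates ) t2
on ( t1.id = t2.id )
when matched then
    update set t1.owner = t2.owner,
               t1.object_name = t2.object_name,
               t1.object_type = t2.object_type,
               t1.created = t2.created,
               t1.last_ddl_time = t2.last_ddl_time,
               t1.timestamp = t2.timestamp,
               t1.status = t2.status
when not matched then
    insert ( id, owner, object_name, object_type,
             created, last_ddl_time, timestamp, status )
    values ( t2.id, t2.owner, t2.object_name, t2.object_type,
             t2.created, t2.last_ddl_time, t2.timestamp, t2.status );

commit;  -- a parallel DML transaction must be committed before the table can be queried again in this session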

Merge and batch

Peter Tran, July 27, 2004 - 7:29 pm UTC

Hi Tom,

"what is really going on -- need more info? in general parallel = something to not even consider in OLTP, but what are you doing"

Yes, we only use merge in our nightly batch processing. However, we may have 30 concurrent CORBA services doing data loading and invoking merge. This approach was taken over doing a bunch of single inserts (to guard against PK violations) followed by a batch insert. Since we already parallelize at the application level, I would think that enabling PDML would not be needed. Would you agree?

Thanks,
-Peter

Tom Kyte
July 27, 2004 - 7:50 pm UTC

Unless you have idle CPU, parallel query would be "not suggested".

If you have a 30 CPU machine, each session could do a parallel 2 (2 x CPU = a good place to start for degree of parallelism).

MERGE Performance

Narendra, May 19, 2006 - 5:39 am UTC

Hi Tom,

I have a MERGE statement that merges 30,000 records based on a SELECT. It could never complete, and I had to kill the session every time I tried running it. I tried the following steps:
1. Ran the SELECT part separately; it executed in a few seconds.
2. Loaded the result set of the SELECT into a table and used that table in the MERGE statement. It worked in about 1-1.5 minutes.

Where can I start looking for the problem?
The SELECT used for the MERGE includes the destination table (i.e. the table which is updated / inserted into).
I have not posted the actual SQL or table definitions since they are quite large. (I will do so if you don't mind.)

Would appreciate if you can give some direction.

Thanks

Merge and triggers,

A reader, October 10, 2012 - 12:01 pm UTC

Hello,

If we use a MERGE statement and operate in a bulk way, how would the row-level trigger behave? We need to merge a 170M row table (say tabA) with another 170M row table (say tabB), but the table that needs to be merged has a row-level trigger, and that trigger needs to stay ON due to a business requirement.

The table tabB is on a different database. Is it a good idea to make a copy in the database where tabA resides to avoid network traffic during the MERGE operation?

Thanks,

Tom Kyte
October 10, 2012 - 1:57 pm UTC

it would be fired for an update or an insert - depending on whether the merge updated or inserted the row.

be prepared for this to take days... 170,000,000 * 1ms ~ 48 hours. And that is assuming you have the world's best trigger programmers.

and that is just for the trigger itself.

this is a horrible idea. No way I would use merge, not at all.

I would use a CREATE TABLE AS SELECT - in parallel, nologging. Incorporate the trigger logic into the select statement that selects the data you want. Drop the old table.

DML on 170,000,000 rows isn't anything you want to even consider.
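A sketch of that approach (all of the names here are illustrative -- tabA is the existing table, tabB the incoming data, and colX stands in for whatever columns the merge and trigger logic would have produced):

create table taba_new
nologging
parallel 8
as
select /*+ parallel(a, 8) parallel(b, 8) */
       nvl(b.id, a.id)     id,
       nvl(b.colx, a.colx) colx   -- take the incoming value when present; fold any trigger-derived logic in here
  from taba a
  full outer join tabb b
    on ( b.id = a.id );

-- then validate, index, constrain, swap the table names, and drop the old table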

Merge and triggers,

A reader, October 10, 2012 - 2:13 pm UTC

Hello,

Thanks for your view on the proposed logic. The table that needs to be merged is part of Oracle Streams. Since we had some issues with Streams, the data is out of sync between the source and the target database.

The triggers are used for further downstream processing on the target database. If we create a new table and rename it (and later drop the old table), there wouldn't be a way for us to send the information to the downstream tables, which is what the trigger was supposed to do. One other challenge: how would Streams know if we rename the table back to the original name? Will it continue to replicate?

Getting a bit complex here.

Thanks,



Tom Kyte
October 10, 2012 - 4:32 pm UTC

and if you add Streams into a 170,000,000 row update - you'll be talking many, many days of processing.

You can do the logic the triggers were doing in bulk too - look at the logic and ask "if I had to do this in a single SQL statement, what would I do?"

I suggest

a) DDL for the merge - do not merge. You can sync the data in both systems using DDL and then start streaming again.

b) do whatever you need to emulate the action of the trigger firing - but don't use a trigger, it'll really take days (math - it is all about multiplying in this case)

