Question and Answer

Tom Kyte

Thanks for the question, prakash.

Asked: February 23, 2009 - 1:46 am UTC

Last updated: February 02, 2012 - 2:51 pm UTC

Version: 9.1.2

Viewed 10K+ times

You Asked

Hi Tom,

Just a thought,

I have two columns in a table (T), say columns A and B.

I want to update a value into whichever of these columns is not null.
Definitely, one of these columns is NULL and the other is NOT NULL.

So I have to write,

UPDATE T
SET A = 2
WHERE A IS NOT NULL
/
UPDATE T
SET B = 2
WHERE B IS NOT NULL
/

To write it as a single UPDATE, I have:

UPDATE T
SET A = decode(A,NULL,NULL,2),
B = decode(B,NULL,NULL,2)
/

Is there a way to write this as a single update without CASE or DECODE?

Just a thought: Oracle could be designed to handle NVL on the columns in an UPDATE's SET clause, like
UPDATE SET NVL(A,B) = 2 -- Just a thought

regards,
PK


and Tom said...

you need an LVALUE (lvalue being a computer science sort of term - a value that can be modified, assigned to). Since SQL has no concept of a pointer, a "function" will never be an lvalue, only an rvalue (something that can be assigned from, but not assigned to).

Your update would actually be:
update t 
set a = case when a is null then 2 else a end, 
b = case when b is null then 2 else b end
WHERE A is null OR B is null;





Comments

A reader, March 03, 2009 - 4:34 am UTC

Tom,

I think prakash stated

"Is there a way to write in single update with out case and decode".

Your solution definitely uses 'case'.

If I understand prakash's question correctly, he is looking for something like

update t
set    a = nvl(a, 2)
,      b = nvl(b, 2)
where  a is null
or     b is null      

Tom Kyte
March 03, 2009 - 4:39 pm UTC

actually, we both got it wrong.


His working query is:

UPDATE T
SET A = decode(A,NULL,NULL,2),
B = decode(B,NULL,NULL,2)
/


if A is NULL, let it be - else 2. We both did it backwards.

NVL cannot work here - NVL(A,2) returns 2 when A is NULL, which is the opposite of what is wanted.

prakash, March 04, 2009 - 1:18 am UTC

Tom,

Thanks for your inputs.

This works for me.

UPDATE T
SET A = NVL2(A,2,A),
B = NVL2(B,2,B)
/

regards,
Prakash
Tom Kyte
March 04, 2009 - 1:07 pm UTC

you might want to add

where a is not null or b is not null



but one wonders why you cannot use case (ansi) versus NVL2 (not)


it certainly would be a lot more readable.
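
Putting the two suggestions together, the single statement would look something like this (a sketch that preserves the semantics of the original two updates):

update t
   set a = case when a is not null then 2 else a end,
       b = case when b is not null then 2 else b end
 where a is not null or b is not null;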

prakash, March 04, 2009 - 2:13 pm UTC

Tom,

Thanks for your suggestion.

Even without "A IS NOT NULL OR B IS NOT NULL",
this NVL2 will work.

Of course, we can use CASE instead of NVL2.
I already used DECODE, and it is working fine.

Since you said NVL can't work for this, out of curiosity I thought about it and came up with NVL2.

One more thing: CASE has a more verbose syntax than DECODE and NVL2, so it seems better to use NVL2 (I hope that performance-wise there will not be much difference among CASE / DECODE / NVL2).

If CASE is more optimized than DECODE/NVL2, then I will change to CASE.


Regards,
Prakash
Tom Kyte
March 04, 2009 - 2:25 pm UTC

it will work, but it will update (and hence lock) every row in the table - it does not do what your original two updates did.

you should switch to case just because it makes the update infinitely more understandable. Take pity on those that will follow you in the future.

A reader, March 04, 2009 - 2:49 pm UTC

Tom,

Actually, in my real application scenario, I have a table with many columns. It has a primary key column, and two other columns (say A, B) are subject to this NOT NULL problem.

I will put the primary key column in the WHERE clause. But I don't know whether A or B is NULL or NOT NULL - definitely one of these columns will be NOT NULL and the other will be NULL.

If NULL, I will leave it as it is. If NOT NULL, I have to UPDATE that column's value with the COST.

Anyway, thanks. I especially have to thank you because I have learnt a lot, and am still learning and exploring Oracle, from you (One on One / Ask Tom, etc.).

Regards,
PRakash

Skip redundant update

Charlie MuJiang, March 04, 2009 - 7:33 pm UTC

Oracle 11.1.0.7 on Linux 64bit.

Scenario: our book sellers upload their book data into our database.
There are many columns in the book table, but the book sellers do not tell us which columns to update; they supply everything.

My question is: how can we skip the columns that have not changed? Then we could dramatically reduce the undo/redo.
I heard SQL*Server can ignore this kind of update "intelligently". :)


Here is an example test case:

-- drop table book;

create table book
(
book_id number,
author varchar2(100),
title varchar2(100),
price number(10,2),
qty number(5),
constraint book_pk primary key(book_id)
);


Here is heap table to simulate the source external table,

-- drop table book_ext;

create table book_ext
(
book_id number,
author varchar2(100),
title varchar2(100),
price number(10,2),
qty number(5)
);

insert into book(book_id, author, title, price, qty) values( 1, 'Bob', 'tube in winter', 10.52, 3);
insert into book(book_id, author, title, price, qty) values( 2, 'Joe', 'tube in summer', 10.52, 4);
insert into book(book_id, author, title, price, qty) values( 3, 'Jerry', 'tube in spring', 10.52, 5);

insert into book_ext(book_id, author, title, price, qty) values( 1, 'Bob', 'tube in winter', 21, 3);
insert into book_ext(book_id, author, title, price, qty) values( 2, 'Joe', 'tube in summer', 22, 4);
insert into book_ext(book_id, author, title, price, qty) values( 3, 'Jerry', 'tube in spring', 23, 5);

commit;


Here is merge SQL

MERGE INTO book d
using (select book_id, author, title, price, qty from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.author = s.author,
d.title = s.title,
d.price = s.price,
d.qty = s.qty
;

You can see only price changed, and the SQL I really want to run is:

MERGE INTO book d
using (select book_id, author, title, price, qty from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.price = s.price
;

Any suggestion will be appreciated. I hope I did not post in the wrong place and get deleted. ^_^

Thanks,
Charlie
Tom Kyte
March 04, 2009 - 9:02 pm UTC

... My question is how to skip the column that is not changed? then we can
dramatically reduce the undo/redo.
I heard SQL*Server can ignore this kind of update "intelligently". :)
....

a) if the column is indexed, we intelligently do not touch the index if the column was unchanged

b) no, you cannot skip a column row by row, you either 1) update it, or 2) do not update it - for every row in the update

c) define the sqlserver feature, no baiting, technically and accurately describe what it does please.
ops$tkyte%ORA10GR2> create table t
  2  as
  3  select rpad('*',40,'*') a, rpad('*',40,'*') b , rpad('*',40,'*') c
  4    from all_objects
  5  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @mystat "redo size"
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE
--------------------------------------------- ----------
redo size                                       50913500

ops$tkyte%ORA10GR2> update t set a = a, b = b, c = c;

49999 rows updated.

ops$tkyte%ORA10GR2> @mystat2
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE DIFF
--------------------------------------------- ---------- ------------------
redo size                                       79983900       29,070,400

ops$tkyte%ORA10GR2> select used_ublk from v$transaction;

 USED_UBLK
----------
      1771

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @mystat "redo size"
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE
--------------------------------------------- ----------
redo size                                       79984024

ops$tkyte%ORA10GR2> update t set a = a, b = b;

49999 rows updated.

ops$tkyte%ORA10GR2> @mystat2
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE DIFF
--------------------------------------------- ---------- ------------------
redo size                                      104823452       24,839,428

ops$tkyte%ORA10GR2> select used_ublk from v$transaction;

 USED_UBLK
----------
      1325

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @mystat "redo size"
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE
--------------------------------------------- ----------
redo size                                      104823620

ops$tkyte%ORA10GR2> update t set a = a;

49999 rows updated.

ops$tkyte%ORA10GR2> @mystat2
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE DIFF
--------------------------------------------- ---------- ------------------
redo size                                      123809580       18,985,960

ops$tkyte%ORA10GR2> select used_ublk from v$transaction;

 USED_UBLK
----------
       969

ops$tkyte%ORA10GR2> commit;

Commit complete.





results: undo is more, but not 2x, 3x more.
results: redo is more, but not 2x, 3x more.

there is some 'intelligence' there.


further:

ops$tkyte%ORA10GR2> create index t_idx on t(a,b,c);

Index created.

ops$tkyte%ORA10GR2> @mystat "redo size"
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE
--------------------------------------------- ----------
redo size                                      202805140

ops$tkyte%ORA10GR2> update t set a = a, b = b, c = c;

49999 rows updated.

ops$tkyte%ORA10GR2> @mystat2
ops$tkyte%ORA10GR2> set echo off

NAME                                               VALUE DIFF
--------------------------------------------- ---------- ------------------
redo size                                      227344072       24,538,932

ops$tkyte%ORA10GR2> select used_ublk from v$transaction;

 USED_UBLK
----------
      1317

ops$tkyte%ORA10GR2> commit;

Commit complete.


Charlie, please at least skip unchanged rows !

Sokrates, March 05, 2009 - 4:28 am UTC

Charlie, please don't
MERGE INTO book d
using (select book_id, author, title, price, qty from book_ext ) s
....


but

MERGE INTO book d
using (select book_id, author, title, price, qty from book_ext minus select book_id, author, title, price, qty from book) s
...

which skips all unchanged rows !
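
Spelled out against the book/book_ext example above, that would read something like:

MERGE INTO book d
using (select book_id, author, title, price, qty from book_ext
       minus
       select book_id, author, title, price, qty from book) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.author = s.author,
d.title = s.title,
d.price = s.price,
d.qty = s.qty
;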

Skip redundant update

Charlie MuJiang, March 05, 2009 - 1:45 pm UTC

Hi Tom,

1. Please remove the sentence about SQL Server. I know this site is serious and professional. I'll ask my SQL Server DBA friend to confirm later.
2. I feel lucky the previous comments did not get deleted :) - so it is related.

OK, let me give you a big picture.

*) Here is our real data distribution and statistics:
Average row length is about 1000 bytes. The synopsis columns (author/title/description/subject/key_words) use 90%+ of the space and are updated less often; the price/qty columns use less than 10% of the space and are frequently updated.

I used RunStats to do the same undo/redo benchmark before; in our case, redundant updates exhaust too many resources.

*) The update columns combination patterns arrive randomly,
e.g.
author + price, 20 rows
price + qty, 900 rows
title + ISBN, 50 rows

For example, 2 rows update price, 2 rows update author and qty:
insert into book(book_id, author, title, price, qty) values( 1, 'Bob', 'tube in winter', 10.52, 4);
insert into book(book_id, author, title, price, qty) values( 2, 'Joe', 'tube in summer', 10.52, 5);
insert into book(book_id, author, title, price, qty) values( 3, 'Jerry', 'tube in spring', 10.52, 6);
insert into book(book_id, author, title, price, qty) values( 4, 'David', 'tube in fall', 10.52, 7);
commit;

insert into book_ext(book_id, author, title, price, qty) values( 1, 'Bob', 'tube in winter', 55.11, 4);
insert into book_ext(book_id, author, title, price, qty) values( 2, 'Joe', 'tube in summer', 72.11, 5);

insert into book_ext(book_id, author, title, price, qty) values( 3, 'BBB', 'tube in spring', 10.52, 9);
insert into book_ext(book_id, author, title, price, qty) values( 4, 'DDD', 'tube in fall', 10.52, 9);
commit;

*) I want to group rows with the same update pattern together and do a bulk update, to reduce the PL/SQL -> SQL engine context switches.

*) To make it more complex, 80% of book sellers do not supply a book_id, so we have to match against existing rows to know whether it is a new book or an existing one. We call it product catalog rollup; Amazon calls it item aggregation.
Our current solution is to generate a hash_code based on these synopsis columns and do a 2-level lazy match: 1st level on hash_code, 2nd level on the synopsis columns.
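
For illustration only, a rough sketch of such a hash (not our exact production code) could be something like:

select ora_hash(author || '~' || title || '~' || subject) hash_code
  from book_ext;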

It is a fairly common real business scenario, found in many B2C/B2B eCommerce applications, including Amazon; they have the same issue.

I just want to try many approaches, to come up with a more efficient design (actually it's a rewrite, or refactoring).

Thanks,
Charlie
Tom Kyte
March 05, 2009 - 2:12 pm UTC

1) no, it is ok, it is just that we'd need information about "what that means" in order to address it.


... redundant
updates exhaust too much resources.
....

define that for us, what are you seeing - give numbers.

Skip redundant update

Charlie MuJiang, March 05, 2009 - 3:18 pm UTC

We do not index these synopsis columns. The synopsis data is sent to our book search engine to support front-end UI keyword search. There is only the one primary key index on the book table.
(Some B2C sites just do not update the database, and send the synopsis data to the search engine directly if the affected data in a row is less than 80%.)

STAT...redo size:
2,589,196 (update price and qty 2 columns)
23,906,048 (update all columns)

Here is the benchmark:

-- drop table book;

create table book
(
book_id number,
vendor_id number,
price number(10,2),
qty number(5),
hash_code varchar2(32 byte) not null,
author varchar2(750),
title varchar2(750),
vendor_short_isbn number(13,0),
book_condition varchar2(30 byte),
book_editor varchar2(255 byte),
subject varchar2(2000 byte),
publisher_name varchar2(750 byte),
description varchar2(4000 byte),
constraint book_pk primary key(book_id)
) nologging;

insert /*+ append */ into book
(book_id, vendor_id, price, qty, hash_code, author, title, vendor_short_isbn, book_condition, book_editor, subject, publisher_name, description)
select
rownum book_id,
mod(rownum,50) vendor_id,
10.55 price,
30 qty,
rpad('x',31,'x') hash_code,
rpad('x',100,'x') author,
rpad('x',100,'x') title,
999999999012 vendor_short_isbn,
rpad('x',20,'x') book_condition,
rpad('x',30,'x') book_editor,
rpad('x',30,'x') subject,
rpad('x',30,'x') publisher_name,
rpad('x',700,'x') description
from dual
connect by level <= 10000;

commit;

-- drop table book2;
create table book2 nologging
as
select * from book;

-- drop table book_ext;
create table book_ext nologging
as
select * from book;

set serveroutput on

begin
runStats_pkg.rs_start;

MERGE INTO book d
using (select book_id, price, qty from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.price = s.price,
d.qty = s.qty
;
commit;

runStats_pkg.rs_middle;

MERGE INTO book2 d
using (select * from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.vendor_id = s.vendor_id,
d.price = s.price,
d.qty = s.qty,
d.hash_code = s.hash_code,
d.author = s.author,
d.title = s.title,
d.vendor_short_isbn = s.vendor_short_isbn,
d.book_condition = s.book_condition ,
d.book_editor = s.book_editor ,
d.subject = s.subject,
d.publisher_name = s.publisher_name,
d.description = s.description
;
commit;

runStats_pkg.rs_stop(0);

end;
/

Run1 ran in 15 hsecs
Run2 ran in 82 hsecs
run 1 ran in 18.29% of the time

Name Run1 Run2 Diff
STAT...HSC Heap Segment Block Changes 10,005 10,006 1
STAT...consistent changes 15 19 4
STAT...CPU used by this session 15 23 8
STAT...redo buffer allocation retries 0 10 10
LATCH.Consistent RBA 4 17 13
LATCH.lgwr LWN SCN 4 17 13
LATCH.mostly latch-free SCN 4 17 13
LATCH.In memory undo latch 5 27 22
LATCH.redo writing 12 51 39
STAT...change write time 2 42 40
LATCH.enqueues 4 48 44
STAT...redo entries 10,006 10,050 44
LATCH.redo allocation 15 64 49
STAT...Elapsed Time 15 82 67
STAT...dirty buffers inspected 0 103 103
LATCH.undo global data 8 163 155
STAT...redo subscn max counts 40 357 317
STAT...redo ordering marks 39 357 318
STAT...db block changes 20,061 20,428 367
STAT...free buffer inspected 111 479 368
STAT...hot buffers moved to head of LRU 39 542 503
STAT...db block gets 10,097 10,866 769
STAT...session logical reads 10,851 11,668 817
STAT...IMU undo allocation size 72 1,080 1,008
STAT...IMU Redo allocation size 0 15,308 15,308
STAT...physical read bytes 32,768 0 -32,768
STAT...undo change vector size 928,332 11,571,404 10,643,072
STAT...redo size 2,589,196 23,906,048 21,316,852

Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
55,492 59,657 4,165 93.02%

PL/SQL procedure successfully completed.

Tom Kyte
March 05, 2009 - 4:27 pm UTC

23mb is tiny. Why are you worried about that???? It is nothing.

Skip redundant update

Charlie MuJiang, March 05, 2009 - 5:26 pm UTC

Do you mean 23MB of redo per 10,000-row update is tiny?

We update 30~50 million rows a day, about 70~120GB of redo - not much. Our Linux PC server is not powerful: an HP DL380, 2 CPUs, 8GB memory; we will upgrade it to 32GB soon.

Back to your track,

1. It is slow (elapsed time); we need to reduce the latency and improve the throughput.
2. It takes 700% more memory to do the in-memory hash join.
(For a big book seller, from time to time we need to join source data of more than 1 million rows: 12M * 100 = 1.2G of PGA, not big.)
3. Latches are OK, about the same, so scalability should be OK.

FYI, here is the memory used during execution.

-- Oracle 11.1.0.7
set serveroutput off
set linesize 120
set pagesize 0

MERGE /*+ gather_plan_statistics */ INTO book d
using (select book_id, price, qty from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.price = s.price,
d.qty = s.qty
;

SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'iostats memstats last'));

commit;

----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | | 1 |00:00:00.15 | 10832 | | | |
| 1 | MERGE | BOOK | 1 | | 1 |00:00:00.15 | 10832 | | | |
| 2 | VIEW | | 1 | | 10000 |00:00:00.01 | 752 | | | |
|* 3 | HASH JOIN | | 1 | 10000 | 10000 |00:00:00.01 | 752 | 865K| 865K| 1409K (0)|
| 4 | TABLE ACCESS FULL| BOOK_EXT | 1 | 10000 | 10000 |00:00:00.01 | 376 | | | |
| 5 | TABLE ACCESS FULL| BOOK | 1 | 10000 | 10000 |00:00:00.01 | 376 | | | |
----------------------------------------------------------------------------------------------------------------------

MERGE /*+ gather_plan_statistics */ INTO book2 d
using (select * from book_ext ) s
on (d.book_id = s.book_id)
WHEN MATCHED THEN
update set
d.vendor_id = s.vendor_id,
d.price = s.price,
d.qty = s.qty,
d.hash_code = s.hash_code,
d.author = s.author,
d.title = s.title,
d.vendor_short_isbn = s.vendor_short_isbn,
d.book_condition = s.book_condition ,
d.book_editor = s.book_editor ,
d.subject = s.subject,
d.publisher_name = s.publisher_name,
d.description = s.description
;

SELECT * FROM table(dbms_xplan.display_cursor(NULL,NULL, 'iostats memstats last'));

commit;

----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | | 1 |00:00:00.75 | 11580 | | | |
| 1 | MERGE | BOOK2 | 1 | | 1 |00:00:00.75 | 11580 | | | |
| 2 | VIEW | | 1 | | 10000 |00:00:00.02 | 752 | | | |
|* 3 | HASH JOIN | | 1 | 10000 | 10000 |00:00:00.02 | 752 | 11M| 2085K| 12M (0)|
| 4 | TABLE ACCESS FULL| BOOK_EXT | 1 | 10000 | 10000 |00:00:00.01 | 376 | | | |
| 5 | TABLE ACCESS FULL| BOOK2 | 1 | 10000 | 10000 |00:00:00.01 | 376 | | | |
----------------------------------------------------------------------------------------------------------------------


Do you think trying to skip the redundant update will not help much compared to the code complexity needed to handle it?

Tom Kyte
March 05, 2009 - 6:03 pm UTC

...
We update 30~50 millions rows a day, about 70~120GB redo, not much. our Linux
PC server is not powerful, HP DL380, 2CPU, 8GB memory, we will upgrade it to
32GB soon.
.....


A person can only comment on information provided. I asked "what are your numbers".

You gave me a for example with 23mb.

From that, you will get a comment "23mb, small"

You did not tell me what you were really experiencing.



... 2. it takes 700% more memory to do the in memory hash join. ...

well, sure it does - but I don't see how you can avoid that - even if you could selectively "not update column X on a row by row basis" - you would *STILL* need column X.

and if it puts a memory strain on you, we can page to disk.





Your choices are:

a) make multiple passes - update the columns most expected to change, and then update the other columns that are not expected to change.

ops$tkyte%ORA11GR1> create table t1 ( x int, freq_change int, infreq_change int );

Table created.

ops$tkyte%ORA11GR1> create table t2 ( x int, freq_change int, infreq_change int );

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into t1 select rownum, user_id, 42 from all_users;

49 rows created.

ops$tkyte%ORA11GR1> insert into t2 select rownum, case when mod(rownum,2) = 0 then user_id/2 else user_id end, case when mod(rownum,10) = 0 then 55 else 42 end from all_users;

49 rows created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> merge into t1
  2  using (select x, freq_change from t2 minus select x, freq_change from t1 ) t2
  3  on (t1.x = t2.x)
  4  when matched then update set t1.freq_change = t2.freq_change
  5  /

24 rows merged.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> merge into t1
  2  using (select x, infreq_change from t2 minus select x, infreq_change from t1 ) t2
  3  on (t1.x = t2.x)
  4  when matched then update set t1.infreq_change = t2.infreq_change
  5  /

4 rows merged.



b) do it "slow by slow", procedurally, using an update that is "perfect" for each update (hard on the shared pool, but in reality you'll probably only generate 10-20 unique update statements over time)

c) do it in bulk as you are
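
For what it's worth, a minimal, hypothetical sketch of option (b) against the earlier book/book_ext example - classify each incoming row by which columns actually changed, then run one of a small, fixed set of "perfect" updates (only the price/qty columns shown; DECODE is used because it treats two NULLs as equal):

begin
    for r in ( select s.book_id, s.price, s.qty,
                      decode(s.price, d.price, 0, 1) price_changed,
                      decode(s.qty,   d.qty,   0, 1) qty_changed
                 from book_ext s, book d
                where d.book_id = s.book_id )
    loop
        if r.price_changed = 1 and r.qty_changed = 0 then
            update book set price = r.price where book_id = r.book_id;
        elsif r.price_changed = 0 and r.qty_changed = 1 then
            update book set qty = r.qty where book_id = r.book_id;
        elsif r.price_changed = 1 and r.qty_changed = 1 then
            update book set price = r.price, qty = r.qty where book_id = r.book_id;
        end if;  -- rows with no change are skipped entirely
    end loop;
end;
/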




Question for you: is every row in the target table represented in this batch file? I mean, could we just replace it?

Skip redundant update

Charlie MuJiang, March 05, 2009 - 6:54 pm UTC

Thanks, Tom, for continuing to look into this question.

You are reading my mind and our new system design.

Here is my new design,

Our book table has 50 columns; the table is partitioned by Vendor_ID (book seller ID).
Besides the index on the primary key, we index the hash_code column to handle small chunks of source data. Both are local partitioned indexes.
hash_code behaves something like a Bloom filter: false negatives are not possible, false positives are possible.

We have 5 types of flat file to be uploaded into the inventory database; each vendor uploads a different file:

INSERT.
DELETE.
UPDATE (few, from the UI, through an async messaging system, dumped to an interface flat file, then read as an external table).
MERGE/Unknown (insert or update).
PURGE = the replacement: replace the whole inventory for the seller; it will take advantage of exchange partition here.

As for UPDATE and skipping unchanged columns, it is in fact the 2nd step of MERGE (not SQL MERGE).

1) Before generating staging data for the update, we need to match source rows and target rows to find out which column(s) need to be updated.
E.g. 10000 rows in a file: 8000 rows update column qty, 2000 rows update columns author and ISBN; all the rest of the column data is unchanged.

2) And then group rows with the same update pattern together to do a bulk update, reducing the PL/SQL -> SQL engine context switches.

It is like your suggestion a) plus b), but not slow by slow. (Now our whole IT and PM department knows "slow by slow". :D )
Yes, the popular update-column combination patterns should number fewer than 20, which also keeps the overhead on the library cache down.
For the less popular update-column combination patterns, we simply update all columns.

We will have a counting log table to help decide which patterns are popular, and at the same time we will store the UPDATE/MERGE SQL text in that table so we do not need to generate the dynamic SQL every time.

Anyway, this makes the code complex; I wonder if it is worth doing...

Tom Kyte
March 05, 2009 - 8:20 pm UTC

"I wonder if it is worth doing..."

if your number 1 goal is "reduce to a bare minimum the redo generated", then the answer is probably "yes"

Suggested approach based on Tom's table compare routine

Stew Ashton, March 06, 2009 - 10:03 am UTC

create table t_source(key1 constraint t_source_pk primary key, val1, val2, val3)
as select object_id, object_name, owner, status from all_objects where rownum <= 100
/
-- create table succeeded.

create table t_target as select * from t_source
/
-- create table succeeded.

delete from t_source where key1 = (select min(key1) from t_source)
/
-- 1 rows deleted

update t_source
set val1 = case when mod(rownum,2) = 1 then '1' else val1 end
,val2 = case when mod(floor((rownum)/2),2) = 1 then '2' else val2 end
,val3 = case when mod(floor((rownum)/4),2) = 1 then '3' else val3 end
where rownum <= 7
/
-- 7 rows updated

insert into t_source values((select max(key1)+1 from t_source), '11', '22', '33')
/
-- 1 rows inserted

create global temporary table t_temp as
select t_target.*, 0 old_cnt, 0 new_cnt, 0 mod_type, 0 column_update_flag from T_TARGET where 0=1
/
-- create global succeeded.

insert into t_temp
select a.*
, sum(new_cnt - old_cnt) over(partition by KEY1) mod_type
, case when new_cnt <> 1 or count(*) over(partition by KEY1) = 1 then 0 else (
case when lag(val1) over(partition by KEY1 order by new_cnt) = val1 then 0 else 1 end
+ case when lag(val2) over(partition by KEY1 order by new_cnt) = val2 then 0 else 2 end
+ case when lag(val3) over(partition by KEY1 order by new_cnt) = val3 then 0 else 4 end
) end column_update_flag
from (
select KEY1, VAL1, VAL2, VAL3, sum(old_cnt) old_cnt, sum(new_cnt) new_cnt from (
select o.*, 1 old_cnt, 0 new_cnt from T_TARGET o
union all
select n.*, 0 old_cnt, 1 new_cnt from T_SOURCE n
) group by KEY1, VAL1, VAL2, VAL3 having sum(old_cnt) <> sum(new_cnt)
) a
/
-- 16 rows inserted

-- >> for mod-type, -1 means delete, 1 means insert, 0 means update
-- >> column_update_flag indicates the appropriate update statement to use depending on column mix
-- >> unchanged records have been eliminated

select * from t_temp
/
-- KEY1 VAL1 VAL2 VAL3 OLD_CNT NEW_CNT MOD_TYPE COLUMN_UPDATE_FLAG
-- ----- -------- ------ ------- -------- -------- --------- -------------------
-- 2 C_OBJ# SYS VALID 1 0 -1 0
-- 3 I_OBJ# SYS VALID 1 0 0 0
-- 3 1 SYS 3 0 1 0 5
-- 6 C_TS# SYS VALID 1 0 0 0
-- 6 C_TS# SYS 3 0 1 0 4
-- 11 I_USER# SYS VALID 1 0 0 0
-- 11 1 2 VALID 0 1 0 3
-- 19 IND$ SYS VALID 1 0 0 0
-- 19 1 2 3 0 1 0 7
-- 28 CON$ SYS VALID 1 0 0 0
-- 28 1 SYS VALID 0 1 0 1
-- 46 I_COL2 SYS VALID 1 0 0 0
-- 46 I_COL2 2 VALID 0 1 0 2
-- 49 I_CON2 SYS VALID 1 0 0 0
-- 49 I_CON2 2 3 0 1 0 6
-- 102 11 22 33 0 1 1 0
--
-- 16 rows selected

-- >> inserts new records, deletes old records, updates records when all columns have changed

merge into t_target t
using t_temp s
on (s.key1 = t.key1)
when matched then update set val1 = s.val1, val2 = s.val2, val3 = s.val3
where mod_type = -1 or column_update_flag = 7
delete where mod_type = -1
when not matched then insert (KEY1, VAL1, VAL2, VAL3) values (s.KEY1, s.VAL1, s.VAL2, s.VAL3)
/
-- 3 rows merged

-- >> updates records with val1 changed

merge into t_target t
using (select key1, val1 from t_temp where column_update_flag = 1) s
on (s.key1 = t.key1)
when matched then update set val1 = s.val1
/
-- 1 rows merged

-- >> updates records with val1 and val2 changed (you can see where this is going)

merge into t_target t
using (select key1, val1, val2 from t_temp where column_update_flag = 3) s
on (s.key1 = t.key1)
when matched then update set val1 = s.val1, val2 = s.val2
/
-- 1 rows merged

To Stew Ashton suggested approach

Charlie MuJiang, March 06, 2009 - 3:58 pm UTC

Hi Stew Ashton,

Your approach is for replacement. I'd rather load the source data into an interim table and then exchange the partition.

If the target table has 100,000 rows, I only want to upload (MERGE, update/insert) part of the changed data, such as 3000 rows, with no deletes; average row length is 1000 bytes.

Can you do a benchmark for this case,
to prove that your approach is more efficient (uses fewer resources/latches) than simply updating/merging all columns?

All the interim processing is not inexpensive, not to mention we have 50+ columns in the book table to GROUP BY, such as:

==
insert into t_temp
select a.*
from (
...
union all
...
) group by KEY1, VAL1, VAL2, VAL3 having sum(old_cnt) <> sum(new_cnt)
) a
==

Thanks,
Charlie

Skip redundant update

Charlie MuJiang, March 06, 2009 - 4:10 pm UTC

Hi Tom,

My plan A is to MERGE/update all columns, and plan B, the backup plan, is to develop more complex PL/SQL code that updates while skipping the redundant columns.

I'll start building prototypes for both, and then balance performance against complexity (ease of maintenance) after experimenting with several representative production data sets.

Meanwhile, keep looking for other simple and efficient approaches.

Thanks,
Charlie

@Charlie re:suggested approach

Stew Ashton, March 07, 2009 - 2:24 am UTC


Charlie, you are better equipped to test alternatives than I, because you know the data and you have the data.

I am sure my "suggested approach" is *worse* than others in every respect except one:
it achieves your stated goal (update only changed columns) in a set-based manner.

Personally, when I want to reduce redo and undo I avoid updates to unchanged ROWS, just as Sokrates said above. I never worry about unchanged COLUMNS. I am glad you have chosen another Plan A.

I posted this approach because it was so easy to tweak Tom's table compare method in order to pinpoint changed columns along with changed rows. That bit might conceivably be of interest to someone.

Correction to column compare code

Stew Ashton, March 09, 2009 - 8:27 am UTC


There is a bug in my big insert statement about three screens up:
case when lag(val1) over(partition by KEY1 order by new_cnt) = val1 then 0 else 1 end
will evaluate to 1 when the old and new values are both NULL. One solution would be NVL:
case when nvl(lag(val1) over(partition by KEY1 order by new_cnt),chr(0))
= nvl(val1,chr(0)) then 0 else 1 end
Sorry about that :(
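
Another possibility (just a sketch, untested here) is DECODE, which treats two NULLs as equal:

decode(lag(val1) over(partition by KEY1 order by new_cnt), val1, 0, 1)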

Jess, February 02, 2012 - 11:50 am UTC

Hi Tom,

Regarding your comment above
"no, you cannot skip a column row by row, you either 1) update it, or 2) do not update it - for every row in the update"

Given that that's the case, what is your advice as an alternative for cases when you really want to? I'm looking at needing to update a table of about 2M rows. It's 50 columns wide and is indexed on a 1-column PK. The update will affect approx. half the table (so 1M rows) and will definitely update 3 columns of each of those rows with a new value. For those rows, it will also conditionally update 20 more columns, but only a tiny percentage of rows will get a new value. With a decode or case, it'll overwrite the existing value with the same value in the majority of cases, even though that's not technically necessary. This also needs to be done for 5 more tables of similar shape and size. Is there really no better way?



Tom Kyte
February 02, 2012 - 1:59 pm UTC

if I were to update 1,000,000 out of 2,000,000 rows - I would probably not update.

I would probably create table as select <the data I wanted>.

That would likely be much faster, it will skip undo, it can skip redo (if you do this non-logged).
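
A rough sketch of that approach, with made-up table and column names standing in for the real 50-column table:

create table t_new nologging
as
select pk,
       2 col1,                                                 -- the columns that definitely get the new value
       2 col2,
       2 col3,
       case when col4 is not null then 2 else col4 end col4,   -- a conditionally updated column
       col5, col6                                              -- remaining columns carried over unchanged
  from t;

-- then drop (or rename) the old table, rename t_new into place, re-create
-- indexes, constraints and grants, and gather statistics.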

Jess, February 02, 2012 - 2:36 pm UTC

Thanks Tom.
There was a bit of a push for update, so as not to have to deal with recreating constraints, etc., but I'll try to push us down the route of your recommendation.


Tom Kyte
February 02, 2012 - 2:51 pm UTC

is it something you can do with dbms_redefinition? It'll automate the copying of constraints and you can perform transformations during the redefinition...
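
For what it's worth, a minimal sketch of that route, using a hypothetical table T(id, a, b) and reusing the NVL2 idea from earlier (the real 50-column mapping would be longer, and the prerequisites should be checked first):

create table t_interim ( id number, a number, b number );

declare
    l_errors pls_integer;
begin
    dbms_redefinition.can_redef_table( user, 'T' );  -- raises an error if T cannot be redefined

    dbms_redefinition.start_redef_table(
        uname       => user,
        orig_table  => 'T',
        int_table   => 'T_INTERIM',
        col_mapping => 'id id, nvl2(a, 2, a) a, nvl2(b, 2, b) b' );

    dbms_redefinition.copy_table_dependents(
        uname       => user,
        orig_table  => 'T',
        int_table   => 'T_INTERIM',
        num_errors  => l_errors );

    dbms_redefinition.finish_redef_table( user, 'T', 'T_INTERIM' );
end;
/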

Jess, February 03, 2012 - 10:31 am UTC

I totally did not think about that... That's an excellent idea and is likely to smooth over the complaints about the overhead of doing create as select...
Thank you as always, you're a lifesaver!