How to update (replace values) in a table containing 50+ million records?


Question and Answer

Connor McDonald

Thanks for the question, Felix.

Asked: January 12, 2016 - 6:29 am UTC

Last updated: February 28, 2024 - 6:36 am UTC

Version: ORACLE 11g

Viewed 10K+ times

You Asked

Hi,
I have a table containing 50+ million records, and I am writing a procedure to replace the bad data with the correct values (about 1,500 records).
K_V is an array mapping each piece of bad data to its correct value, like
K_V('bad data1') := 'correct value1'

When I loop over K_V, I do
'update table set xx=replace(xx,bad data,correct value);'

This procedure ran the whole night but still could not finish.
How can I deal with this problem? It seems I cannot write the procedure that way.

Thanks.

and Chris said...

Are you updating 1,500 rows from 50 million, or do you have 1,500 entries in k_v?

In any case, I don't see a where clause on your update. So you're updating all 50 million rows once for each entry in k_v. No wonder it's taking a long time!
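
For reference, the kind of loop described in the question looks roughly like the sketch below (the table name is hypothetical). Each iteration is a full-table update, so the total work is 50+ million rows multiplied by the number of entries in k_v:

declare
  type tp is table of varchar2(100) index by varchar2(100);
  k_v  tp;
  bad  varchar2(100);
begin
  k_v('bad data1') := 'correct value1';   -- ...and the rest of the bad -> good mappings
  bad := k_v.first;
  while bad is not null loop
    -- no where clause: every one of the 50+ million rows is read and
    -- rewritten on each pass through the loop
    update big_table
    set    xx = replace(xx, bad, k_v(bad));
    bad := k_v.next(bad);
  end loop;
end;
/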

To overcome this, load the bad -> good data mapping into a temporary table. Then you can run the update once on only the rows that you need to change, e.g.:

update t
set    xx = (select new_val
             from   gtt
             where  old_val = xx)
where  exists (
  select null from gtt
  where  old_val = xx
);


This updates all the rows in t where there is a matching "bad value" in the temporary table. If you're updating just 1,500 of the 50 million rows and there's an index on XX, this should do the job well.
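
If no such index exists yet, it is straightforward to add one; the index name below is just an example:

-- hypothetical index name; lets the exists probe locate the ~1,500 bad rows
-- instead of scanning all 50 million
create index t_xx_ix on t (xx);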

To do this, you must first load the values from k_v into your temp table. Here's a complete example of this:

create table t as
  select cast(decode(mod(rownum, 3),
                     0, 'bad data1',
                     1, 'bad data2',
                     'good data') as varchar2(10)) xx
  from dual connect by level <= 100;

create global temporary table gtt (
  old_val varchar2(50),
  new_val varchar2(50)
) ;

select xx, count(*) from t
group  by xx;

XX           COUNT(*)
---------- ----------
good data          33
bad data2          34
bad data1          33

declare
  type tp is table of varchar2(10) index by varchar2(10);
  k_v tp;
 
  type gtt_tab is table of gtt%rowtype index by pls_integer;
  vals gtt_tab;
 
  key varchar2(10);

begin

  k_v('bad data1') := 'good data1';
  k_v('bad data2') := 'good data2';
 
  key := k_v.first;
 
  while key is not null loop
   
    -- writing to vals.count+1 creates a new element; vals.count then refers to
    -- that same new element, so old_val and new_val land in the same record
    vals(vals.count+1).old_val := key;
    vals(vals.count).new_val   := k_v(key);
  
    key := k_v.next(key);
  
  end loop;

  forall i in indices of vals
    insert into gtt
    values (vals(i).old_val, vals(i).new_val);

  update t
  set    xx = (select new_val
               from   gtt
               where  old_val = xx)
  where  exists (
    select null from gtt
    where  old_val = xx
  );

end;
/

select xx, count(*) from t
group  by xx;

XX           COUNT(*)
---------- ----------
good data          33
good data2         34
good data1         33


What if you're updating a large number of the rows in your table?

In this case it's likely to be quicker to do the "update" using create table as select:

create table t_temp as
  select coalesce(new_val, xx) xx
  from   t
  left join gtt
  on     xx = old_val;


Do this instead of the update in the code above. You'll need to switch over T and T_TEMP after this.
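
The switch-over itself is typically a drop followed by a rename, roughly as sketched below; remember to recreate any indexes, constraints and grants that existed on T and to gather fresh optimizer statistics:

drop table t;
alter table t_temp rename to t;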

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6407993912330


Comments

Thanks a lot

A reader, January 13, 2016 - 1:52 am UTC

K_V has only 300+ items; the bad records in the db number around 1,500+.
The bad data and correct values are only part of the content of 'XX' in the table.
I've got the point.
Thanks so much for answering my question. It is quite helpful for me.
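
(Since the bad values here are substrings of XX rather than exact matches, the temporary-table approach above can be adapted using replace and instr. The statement below is only a sketch: it assumes each row contains at most one bad value, and because it matches on substrings it cannot use a plain index on XX, so expect a full scan.)

update t
set    xx = (select replace(xx, old_val, new_val)
             from   gtt
             where  instr(xx, old_val) > 0)
where  exists (
  select null from gtt
  where  instr(xx, old_val) > 0
);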

Leandro, February 27, 2024 - 9:21 pm UTC

I'm a bit late to the party (almost a decade in fact), but I don't get why a temporary table is necessary for the example, given the OP performs a simple update on a lone table.

I mean, wouldn't this produce the same optimized result:

update table 
set xx=replace(xx,bad data,correct value)
where xx like '%bad data%';



The WHERE clause would filter the 1,500 records with bad data and only then perform the update, wouldn't it?
Connor McDonald
February 28, 2024 - 6:36 am UTC

Take another look at the question.

It looks like it is not a straight single (bad|good) pair but a set of 1,500 of them.
