Question and Answer

Tom Kyte

Thanks for the question, Pablo.

Asked: January 02, 2002 - 10:07 am UTC

Last updated: January 06, 2021 - 5:08 pm UTC

Version: 8.1.5

Viewed 100K+ times!

You Asked

Tom:

We have a 6 million row table that we need to clean. The process will delete 1.5 million rows.


My first approach was to create a stored procedure with these lines:

SET TRANSACTION USE ROLLBACK SEGMENT Rbig;
DELETE FROM CTDNOV WHERE CTDEVT IN (4,15); -- 1.5m rows
COMMIT;

Then I submitted a job to run the SP at night. The process took more than 4 hours to finish and generated
a huge amount of archive logs.



2nd approach (I didn't test it; I think this is slower than the 1st approach, since according to your book
one commit at the end is faster than lots of small n-row commits):



from Metalink

Create PL/SQL procedure:

CREATE OR REPLACE PROCEDURE delete_tab (tablename IN VARCHAR2,
                                        empno     IN NUMBER,
                                        nrows     IN NUMBER) IS
  sSQL1  VARCHAR2(2000);
  sSQL2  VARCHAR2(2000);
  nCount NUMBER := 0;
BEGIN
  -- note: "ROWNUM < nrows" deletes at most nrows-1 rows per pass
  sSQL1 := 'delete from ' || tablename ||
           ' where ROWNUM < ' || nrows || ' and empno=' || empno;
  sSQL2 := 'select count(ROWID) from ' || tablename ||
           ' where empno= ' || empno;

  LOOP
    EXECUTE IMMEDIATE sSQL1;
    EXECUTE IMMEDIATE sSQL2 INTO nCount;
    DBMS_OUTPUT.PUT_LINE('Existing records: ' || TO_CHAR(nCount));
    COMMIT;
    EXIT WHEN nCount = 0;
  END LOOP;
END delete_tab;
/

3. Execute the procedure created above:

SQL> execute delete_tab('big_emp',7369,5000)
Existing records: 60537
Existing records: 55538
Existing records: 50539
Existing records: 45540
Existing records: 40541
Existing records: 35542
Existing records: 30543
Existing records: 25544
Existing records: 20545
Existing records: 15546
Existing records: 10547
Existing records: 5548
Existing records: 549
Existing records: 0

PL/SQL procedure successfully completed.


3rd approach (it seems to be the best choice)

When a DELETE is issued, Oracle stores the whole deleted row in the rollback segments,
so you can undo the change later if you want to. So there is an image of the rows in rollback
which are currently not present in the table; hence the phrase 'before image'. Now all the rollback
blocks are written to the redo log files too. So you have the data blocks of the table (without
the deleted rows, of course) and the rollback blocks with the old image both producing redo,
which accounts for the additional archive logs. I can't comment on the ratio of deleted records to
redo produced, though.
If you do an insert, only the rowid is stored in the rollback segments, which generates less redo.
Depending on the number of rows in your tables, it may be better for you to insert the records you
don't want to delete into a temp table, truncate the main table and move the records back.
This would probably generate less redo than the delete, but that depends on the number of rows.
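A minimal sketch of that insert/truncate/move-back idea, using the CTDNOV/CTDEVT names from this question (the keeper table name is hypothetical; test on a copy first, and remember TRUNCATE is DDL and cannot be rolled back):

```sql
-- Keep the rows we are NOT deleting (hypothetical keeper table)
CREATE TABLE ctdnov_keep AS
  SELECT * FROM ctdnov WHERE ctdevt NOT IN (4, 15);

-- TRUNCATE resets the high-water mark and generates almost no redo
TRUNCATE TABLE ctdnov;

-- Direct-path load the keepers back
INSERT /*+ APPEND */ INTO ctdnov
  SELECT * FROM ctdnov_keep;
COMMIT;

DROP TABLE ctdnov_keep;
```

Any indexes on CTDNOV are still maintained during the insert, so this is not automatically cheaper than the delete.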



Below, I describe the table I want to clean:

from dba_segments

SEGMENT_NAME EXTENTS BYTES
---------------------------------------- ---------- ----------
CTDNOV 26 2276614144





from dba_extents

SEGMENT_NAME EXTENT_ID Kbytes
---------------------------------------- ---------- ----------
CTDNOV 0 520
1 520
2 800
3 1200
4 1800
5 2680
6 4000
7 6000
8 9000
9 13480
10 20200
11 30280
12 45440
13 68136
14 102240
15 153312
16 229968
17 20480
18 30720
19 46080
20 69120
21 103680
22 155520
23 233280
24 349920
25 524880




I'm interested in your opinion: what do you think is the best way to clean (delete millions of rows from) a big table? How do I deallocate extents from the table, and how do I generate the minimum amount of redo?


Thanks in advance, and sorry for my English.
Pablo R.



and we said...

Well, #2 is the slowest way and generates the most redo by far.

#1 is my preferred way unless you want to get more radical (see below).

The problem with #3 is that you are forgetting about indexes. You can do a direct path INSERT /*+ APPEND */ to move the data, but the indexes would still be maintained and they generate gobs of redo. So, in this case the insert might do as much work as the delete.


Other cases to consider:

case1:
create table new_table unrecoverable as select * from old_table where ....;
drop table old_table;
rename new_table to old_table;
create index old_table_idx1 on old_table(c1,c2) unrecoverable parallel 5;
.....

NO log on that; just move the data to a new table, drop the old/rename the new, and create the indexes as fast as possible without logging.

case2:
partition the data, do a parallel delete. Each partition will use its own rollback segment, each will run in parallel.


case3:
partition the data so that you can do a DROP partition instead of DELETE.
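Case 3 presumes the data is range partitioned along the purge key, so the purge becomes DDL. A sketch with hypothetical table, column and partition names:

```sql
-- Hypothetical layout: partition by event date so a purge is a DROP
CREATE TABLE ctdnov_p (
  ctdevt   NUMBER,
  ctd_date DATE
  -- ... other columns ...
)
PARTITION BY RANGE (ctd_date) (
  PARTITION p2001 VALUES LESS THAN (TO_DATE('01-01-2002','DD-MM-YYYY')),
  PARTITION p2002 VALUES LESS THAN (TO_DATE('01-01-2003','DD-MM-YYYY'))
);

-- Purging a year of data is now DDL, generating almost no redo:
ALTER TABLE ctdnov_p DROP PARTITION p2001;
```

This only works when the rows to remove line up with the partitioning key; the (4,15) event-type predicate in the question would need the table partitioned on CTDEVT instead.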



Comments

Deallocating tables

Pablo Rovedo, January 02, 2002 - 3:56 pm UTC

Tom: Regarding the 3 methods described above, #1 and #2 don't deallocate space (extents); however, the last approach allows redefining the table storage (resetting the HWM), so we recover tablespace space.

Tom Kyte
January 02, 2002 - 4:34 pm UTC

so? in general, if you delete a million records, you are going to put them back sometime, won't you? You only reclaim that space for a brief moment in time. You were talking like this is an ongoing requirement; if so, reclaiming the space is sort of pointless.

if you like #3, you should REALLY like my twist on it then:

create table new_table unrecoverable as select * from old_table where ....;
drop table old_table;
rename new_table to old_table;
create index old_table_idx1 on old_table(c1,c2) unrecoverable parallel 5;

it would be faster than any of the others, "reclaims" the maximum space, and generates the smallest amount of redo/undo, satisfying all of your requirements.

Deleting Rows

Anand Gopalan, January 03, 2002 - 12:08 am UTC

Tom ,

We faced a similar situation, where we had to delete millions of rows based on a certain date, i.e. delete all rows in the table where the date is less than a target date. Since ours is a continuously updated table we could not follow approach 1 (using the unrecoverable option), as we would lose rows inserted after creating the temp table from the existing one. So what we did is a PL/SQL script:

cursor:
select rowid from table

in a loop:

delete from table where rowid = rowid

and commit after a selected number of rows, say:

if count > 500 then commit;

This way we avoid rollback segment contention and at the same time we avoid frequent commits (instead of a commit for every row).

This solution really works fast for us.
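The loop described above looks roughly like this in PL/SQL (the table name, predicate and batch size are placeholders):

```sql
DECLARE
  n PLS_INTEGER := 0;
BEGIN
  FOR r IN (SELECT rowid rid FROM big_table WHERE dt < SYSDATE - 90)
  LOOP
    DELETE FROM big_table WHERE rowid = r.rid;
    n := n + 1;
    IF MOD(n, 500) = 0 THEN
      COMMIT;  -- commit every 500 rows
    END IF;
  END LOOP;
  COMMIT;
END;
/
```

Note that committing across an open cursor like this can raise ORA-01555 (snapshot too old) on the driving query, which is one reason a single delete is often preferable.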

Hybrid possibility

Dave, January 03, 2002 - 1:12 am UTC

As a variation on the partitioning and the CTAS options, in the past I have defined the table to have a single partition, used CTAS to create a table of the same structure with the data to be kept, and performed a partition exchange to swap the original and the new data sets. The advantage for me was that I could use the "excluding indexes" option, and perform index rebuilds instead of coding the index creation scripts. Less maintenance if indexes are added, dropped or modified.

Oh, and it also kept the old data set handy in case of a screwup.
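Dave's exchange trick, sketched with made-up names (the table is assumed to have been created with a single partition p_all, and the keep predicate is hypothetical):

```sql
-- Build the keeper set
CREATE TABLE keep_rows AS
  SELECT * FROM big_tab WHERE dt >= SYSDATE - 365;

-- Swap segments; EXCLUDING INDEXES leaves the table's indexes unusable
ALTER TABLE big_tab
  EXCHANGE PARTITION p_all WITH TABLE keep_rows
  EXCLUDING INDEXES;

-- Rebuild rather than re-create each index
ALTER INDEX big_tab_idx1 REBUILD NOLOGGING;

-- keep_rows now contains the old data set, in case of a screwup
```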

How to delete if the "redo" entries are not necessary

Bill, December 04, 2002 - 10:09 pm UTC

Hi Tom:

I have several tables (about 1 million rows each) from which I want to delete some records (20,000 to 50,000 rows) for sure. I want a quick DELETE and there is no requirement to generate rollback data because the data is definitely not needed. What do you think is the best way to delete them? I mean, I could use a regular delete statement, but I'm not sure if I could even TURN OFF the redo entries generated (before image)? Any thoughts. Thanks.

Tom Kyte
December 05, 2002 - 7:40 am UTC

ARG -- why do people always say "there is no requirement for redo/rollback -- the data is not needed".

What about the data you are NOT deleting, is that data needed???? If not, just truncate and be done with it. If it is needed -- then you do in fact need ROLLBACK and REDO. redo/Rollback isn't just to restore the data you deleted in this case, it is there to put the database table BACK the way it was so it is not some scrambled bunch of bits and bytes in the middle of a change that is totally unusable after a crash. That table is a data structure -- if you crash or fail in the middle of modifying it -- it is left in an unusable, dangerous state. That is what redo/rollback is all about -- protecting the data that is left behind, as well.

they are definitely NEEDED


So, now that that is out of the way -- that redo/rollback is DESIRABLE and unavoidable -- we'll look at why rollback isn't even the problem here: it is index maintenance.

Deleting 20-50k rows without indexes -- fast.

big_table@ORA920.US.ORACLE.COM> select count(*) from t;

COUNT(*)
----------
1876416

big_table@ORA920.US.ORACLE.COM> set timing on
big_table@ORA920.US.ORACLE.COM> delete from t where owner in ( 'SYSTEM', 'WKSYS' );

42560 rows deleted.

Elapsed: 00:00:13.82


so, no worries there. pretty fast. but...


big_table@ORA920.US.ORACLE.COM> create index t_idx2 on t(object_id);

Index created.

Elapsed: 00:00:26.92

big_table@ORA920.US.ORACLE.COM> create index t_idx1 on t(timestamp);

Index created.

Elapsed: 00:00:38.72

big_table@ORA920.US.ORACLE.COM> delete from t where owner in ( 'SYSTEM', 'WKSYS' );

42560 rows deleted.

Elapsed: 00:00:29.49


The run time almost tripled for the delete (still, pretty darn fast -- and this is on my desktop PC, a p4 with 512meg and a single ide disk -- perhaps this is as they say "much ado about nothing"). It is the index maintenance, not redo/rollback, that is the issue here.


You can use the approaches above; however, in your case, I would say this is a couple of seconds, maybe minutes, operation depending on the WIDTH of the data and the number of indexes you have to maintain. I would just do the delete.

How to delete if the "redo" entries are not necessary

Bill, December 04, 2002 - 10:29 pm UTC

Sorry, forgot to mention: some of the tables contain indexes.

Another approach

Praveen, March 11, 2003 - 3:53 am UTC

Tom,

Why do you think that the following approach is not the fastest one ?

Create a new table with only data that are NOT to be deleted and then drop the original table.

Thanks and regards

Praveen

Tom Kyte
March 11, 2003 - 7:58 am UTC

umm, where did you get the idea that my suggestion (made in the original answer) would not be fast?


...
Other cases to consider:

case1:
create table new_table unrecoverable as select * from old_table where ....;
drop table old_table;
rename new_table to old_table;
create index old_table_idx1 on old_table(c1,c2) unrecoverable parallel 5;
........


given 1, 2, and 3 to choose from -- I would use #1.

If not constrained to their 1,2,3 -- I would use one of my case 1, case 2, case 3.

partition-based parallel delete/update?

chuan sat, July 08, 2003 - 1:08 pm UTC

Tom
You are extremely good!

You mentioned somewhere above that if we delete data from multiple partitions, the deletions will happen in parallel per partition. Is this automatic, or do I have to specify the 'parallel' keyword somewhere in the delete statement? Does UPDATE also happen in parallel on a partitioned table?

Thanks so much


Tom Kyte
July 08, 2003 - 1:42 pm UTC

you have to be doing parallel dml (pdml)

see the server concepts guide -- they have a really good chapter on this topic.

parallel dml on non-partitioned big table

Ling, July 09, 2003 - 8:04 am UTC

I don't understand why Oracle doesn't provide parallel DML (update and delete) on a non-partitioned big table. Technically, it isn't so hard: you just simulate the way parallel query works, that is, you spawn some sessions, each scans a portion of the whole table and deletes/updates the rows that satisfy your condition. Actually, I wrote such a utility using shell and sqlplus and it works fine. Of course, it would be more decent to write it in Pro*C. The commit limitation on PDML on partitioned tables also applies.

Tom Kyte
July 09, 2003 - 11:13 am UTC

we do. see
</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96520/tuningpe.htm#64628 <code>

the table only needs to be partitioned if there is a bitmap index



technically it is very hard -- think about it for a while. consider

o integrity
o uniqueness

and the fact that you could easily -- quickly -- self deadlock yourself.



detailed and to the point

Bob Coonrad, July 28, 2004 - 8:25 am UTC

thanks for the insight. I am currently struggling with a way to implement a data purging scheme on a system that is running 24/7. Currently doing simple timed deletes filtered by date... this is causing occasional deadlocks. This article presents lots of options. Still not sure which way to go, but I'm now armed with the necessary info to make an informed decision.

A reader, August 19, 2004 - 11:25 am UTC


Unable to extend index while deleting records from a table.

Vivek Sharma, September 16, 2004 - 2:37 am UTC

Dear Tom,

I have a table partition with around 21 million records and, based on some condition, I am deleting 1 million records. While deleting, I am getting an ORA-01654 error.

delete /*+ parallel(a,8) from vivek.ft_cur_trans_header partition(SYS_P29510) a
where TH_CLIENT_ID = 'UNIQUE' or TH_CLIENT_ID = 'UCC';

delete /*+ parallel(a,8) from vivek.ft_cur_trans_header partition(SYS_P29510) a
*
ERROR at line 1:
ORA-01654: unable to extend index VIVEK.TH_CLIEND_ID_IDX by 6400 in
tablespace HEADER_IDX

Why is my delete causing an index to extend? What could be the reason?

Thanks and Regards
Vivek

Tom Kyte
September 16, 2004 - 8:05 am UTC

ops$tkyte@ORA9IR2> create table t
  2  as
  3  select * from all_objects order by mod(object_id,0);
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx on
  2  t(object_id,object_name,owner,object_type,last_ddl_time,created) tablespace
  3  small_extent pctfree 0;
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select blocks from user_segments where segment_name = 'T_IDX';
 
    BLOCKS
----------
       248
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter session enable parallel dml;
 
Session altered.
 
ops$tkyte@ORA9IR2> delete /*+ parallel(t,8) */ from t;
 
27890 rows deleted.
 
ops$tkyte@ORA9IR2> commit;
 
Commit complete.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select blocks from user_segments where segment_name = 'T_IDX';
 
    BLOCKS
----------
       274
 

<b>PDML does space differently.  If you were to do this "serially", it should not grow the index</b>

 

Not Clear

Vivek Sharma, September 17, 2004 - 3:12 am UTC

Dear Tom,

Your explanation above is not clear or convincing to me. If it does it the other way, then why? When I delete a record, it just has to delete the entry from the index block, so why is it extending the index?

I tried this, but the blocks in dba_segments remained the same.


SQL> select count(*) from vivek;

  COUNT(*)
----------
    145622

1 row selected.

SQL> insert into vivek select * from vivek;

145622 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from vivek;

  COUNT(*)
----------
    291244

1 row selected.

SQL> update vivek
  2  set object_id=rownum ;

291244 rows updated.

SQL> commit;

Commit complete.

SQL> create index vivek_idx on vivek(object_id);

Index created.

SQL> select blocks from user_segments where segment_name ='VIVEK_IDX';

    BLOCKS
----------
       768

1 row selected.

SQL> alter session enable parallel dml;

Session altered.

SQL> delete /*+ parallel(a,8) */ from vivek where mod(object_id,25)=0;

11649 rows deleted.

SQL> commit;

Commit complete.

SQL> select blocks from user_segments where segment_name ='VIVEK_IDX';

    BLOCKS
----------
       768

1 row selected.

SQL> alter session disable parallel dml;

Session altered.

SQL> delete from vivek where mod(object_id,26)=0;

10753 rows deleted.

SQL> commit;

Commit complete.

SQL> select blocks from user_segments where segment_name ='VIVEK_IDX';

    BLOCKS
----------
       768

1 row selected.

SQL> alter session enable parallel dml;

Session altered.

SQL> delete /*+ parallel(a,8) */ from vivek;

268842 rows deleted.

SQL> commit;

Commit complete.

SQL> select blocks from user_segments where segment_name ='VIVEK_IDX';

    BLOCKS
----------
       768

1 row selected.

It did not change. I forgot to mention yesterday that I was deleting from a HASH partitioned table. Could it be because of that?

Regards
Vivek 

Tom Kyte
September 17, 2004 - 9:11 am UTC

*parallel* does things *differently*

*parallel* does *space mgmt differently*


I have shown that deletes can and will increase the size of an index -- what more do you need? Indexes are complex data structures, and parallel does wacky stuff (it has to: many highly concurrent parallel processes hitting the same exact structures).


i set up teeny tiny extents.
i used pctfree 0
i proved that a delete can grow an index.

that is all you need to know here: "yes, a delete can and will cause an index to grow at times"



How to know much have been deleted.

Sean, November 14, 2004 - 9:15 pm UTC

Hi Tom,

Here is the info on the two tables:
audit_cdm: 20 million rows
audit_cdm_temp: 7 million rows

I want to delete audit_cdm records if they exist in audit_cdm_temp. All records in audit_cdm_temp exist in audit_cdm. Within these 7 million records to be deleted, there may be 100,000 duplicates in audit_cdm which I don't want to delete.

I have to delete them during the weekend, within 40 hours. I have set the undo retention time to 40 hours. The delete has been going on for about 30 hours. If I know the deletion will take too long, I will use a different approach, such as deleting 1 million or fewer rows per batch. By the way, I cannot use the "create table" method since the table is always in use.

Here is the query:

DELETE FROM audit_cdm a
WHERE EXISTS (SELECT null
FROM audit_cdm_temp c
WHERE a.loan_key = c.loan_key
AND nvl(a.row_type, 0) = nvl(c.row_type, 0)
AND a.update_time = c.update_time)
AND NOT EXISTS (SELECT null
FROM audit_cdm b
WHERE a.loan_key = b.loan_key
AND nvl(a.row_type, 0) = nvl(b.row_type, 0)
AND a.update_time = b.update_time
AND a.rowid <> b.rowid)

There are indexes on these three columns on both tables.

My question:
Is there any way to know how long the deletion will take or how much have been deleted?

Oracle 9204.

Thanks so much for your help.


Tom Kyte
November 15, 2004 - 6:29 am UTC

what's up with the nvl(a.row_type,0) = nvl(c.row_type,0)?

but your delete looks fishy -- it says delete from audit_cdm where the key is in audit_cdm_temp UNLESS there is a duplicate -- then keep ALL DUPS.

My approach would have been something like:

delete from audit_cdm
 where rowid in
 ( select rid
     from ( select a.rowid rid,
                   count(a.rowid) over
                     (partition by a.loan_key,
                                   nvl(a.row_type,0),
                                   a.update_time) cnt
              from audit_cdm a, audit_cdm_temp c
             where a.loan_key = c.loan_key
               and nvl(a.row_type, 0) = nvl(c.row_type, 0)
               and a.update_time = c.update_time
          )
    where cnt = 1
 )


that is, just join audit_cdm to audit_cdm_temp - and only keep rowids from audit_cdm where the count of rowids for that "key" is 1 (no dups)....


that nvl() however is something I'd look at and say "did I really mean to use NULL on row_type or was that a mistake we need to fix..."



How much deletion is done.

Sean, November 15, 2004 - 3:14 pm UTC

Hi Tom,

The combination of the three columns is supposed to be unique, but we never put a unique constraint on it. Some of the rows have a null value for row_type when they are first entered. We are in the process of cleaning up and redesigning this table.

Thanks so much for your query and it is running now.

I still would like to know how much has been deleted before it is done. Since I have set up a large undo retention time for this operation, I cannot let it go on for too long, otherwise my undo tablespace will be full.

Are there any dictionary views to monitor this deletion?

Thanks so much for your help.


Tom Kyte
November 15, 2004 - 9:08 pm UTC

v$session_longops might be useful, you can monitor the full scans that would be going on.
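A query along these lines can show scan progress (the column names are from the v$session_longops view; :sid and :serial identify the session you are watching):

```sql
SELECT opname, target, sofar, totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done,
       time_remaining
  FROM v$session_longops
 WHERE sid = :sid
   AND serial# = :serial
   AND totalwork > 0
   AND sofar <> totalwork;
```

Only operations that run long enough (roughly six seconds or more) are recorded in this view, and a delete driven by index range scans rather than full scans may never register at all.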

Very interesting

Helena Marková, November 16, 2004 - 5:29 am UTC


How to monitor the deletion.

Sean, November 16, 2004 - 3:31 pm UTC

Hi Tom,

My deletion has been running for about 24 hours, but it didn't even show up in the v$session_longops view when I used sid and serial# to query it.

Thanks so much for your help.


Tom Kyte
November 16, 2004 - 11:18 pm UTC

so, if you explain that query, what is the plan.

Possible new Oracle feature??

A reader, December 03, 2004 - 1:47 pm UTC

Tom,

Somewhere in this thread you answered a question with:

'ARG -- why do people always say "there is no requirement for redo/rollback -- the data is not needed". What about the data you are NOT deleting, is that data needed???? If not, just truncate and be done with it. If it is needed -- then you do in fact need ROLLBACK and REDO'

I think sometimes people say they don't need redo/rollback because the rows being deleted are not needed anymore. They don't care if the data is lost or corrupted or whatever, it is garbage and is not needed anymore. So the quicker they can delete the rows, the better.

Having said that, would it be possible for Oracle to add a feature somewhat similar to an autonomous transaction where you can temporarily "disable" redo/rollback *only* for the rows being deleted? Wouldn't it be faster to delete rows that way, assuming you could disable redo/rollback for the rows affected by the WHERE clause?

Tom Kyte
December 03, 2004 - 2:05 pm UTC

but the problem is that it is the INDEX that would be corrupt -- not the deleted "rows".

A table is a complex data STRUCTURE, not just a bunch of rows.
An index is a horribly complex data STRUCTURE.

A delete is an operation that makes changes to both structures -- so suppose the delete fails without any sort of undo available. Great -- the row is sort of gone (maybe), 2 out of 4 of the indexes don't point to it anymore, but 2 out of 4 still do (but it isn't there, or is it?)

You see, you have lots of things being modified all together. It is not as simple as removing a line of text in a file.

No, it would not be faster, it would be A MESS.

parallel delete/update,

sns, March 25, 2005 - 2:25 pm UTC

Tom,

I think you may have answered this question somewhere.

I need a clarification regarding delete statement running in parallel (say forcing by hint).

Cases:

1. Can we run delete /*+ parallel(t1,4) */ from t1 where a=100;
where t1 is a non-partitioned table?

2. Can we run delete /*+ parallel(t1,4) */ from t1 where a=100;
where t1 is a range partitioned table on column "a"?

3. Can we run delete /*+ parallel(t1,4) */ from t1 where a=100;
where t1 is a range partitioned table, partitioned on column "b"?

4. Can we run delete /*+ parallel(t1,4) */ from t1 where a=100 and b=100;
where t1 is a hash partitioned table on columns a and b?


Similar questions for the update statement too.

I need to know how parallelism works in all 4 cases.
Will there be any deadlock issues when running any of them in parallel?
In case #3, will the parallel processes work on multiple partitions at a time?
In general I need to know the pros and cons of delete/update statement running in parallel.

Is "alter session enable parallel dml" required for any of the above cases?

Thanks a lot,


Tom Kyte
March 25, 2005 - 6:56 pm UTC

1) version specific, but today -- yes.

2) same as #1

3) ditto

4) again.

they can all be parallelized, with or without physical partitioning (assuming you have somewhat current software). They will not deadlock; they each get different ranges to process.


see:

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96520/tuningpe.htm#64317 <code>

A follow up question,

sns, March 25, 2005 - 7:04 pm UTC

Thanks for the explanation. We have oracle version 9.2 and above. Parallel DML should not be a problem in this case.

I need some more clarification of you statement....
"They will not deadlock they each get different
ranges to process."

Say the table (t1) is partitioned on date column (each partition holds one week worth of data).

If I say delete /*+ parallel(t1,4) */ from
t1 where date between '01-jan-05' and '01-feb-05';
Oracle would scan 4 partitions; does it issue 4*4 parallel slaves to process the delete statement?

If my "where" condition is on a column other than the date column
and I specify parallel degree 4, how does Oracle distribute the data, assigning one range to each of the 4 parallel slaves?

Thanks,


Tom Kyte
March 26, 2005 - 9:07 am UTC

they would each (regardless of predicate) get one of N (n=degree of parallelism) non-overlapping ranges of data to process.

it'll break it up by how the data is laid out on disk, sort of like "process 1, you do all of the x = 5 rows in extent 1, process 2 you get extent 2 and so on"


I would be worried, however, if I truly needed a parallel delete on an ongoing basis; there are better physical designs you could consider that would permit you to simply roll partitions of data off instead of trying to make an inherently slow process faster.

Delete statement

shams, September 13, 2005 - 12:36 am UTC

I have a table t1:

ID  actno  type  seq  src
1   111    CHK   1    L
1   111    SAV   2    L

and a table t2:

ID  actno  type  seq  src
1   111    CHK   2    GL
1   111    SAV   3    L

I want to delete the records in t2 that have no sequence-matching record in t1 and whose src != 'GL'.
In the given example, the following record from t2 should be deleted

1   111    SAV   3    L

and not the first record.

Thanks,
Shams


Tom Kyte
September 13, 2005 - 11:49 am UTC

need more info -- am I to assume that we can join t1 to t2 by what?

id, actno, seq?

delete

shams, September 13, 2005 - 12:21 pm UTC

The joining criteria are:
t1.id = t2.id
and t1.acct_no = t2.acct_no
and t1.type = t2.type

The records in t2 are deleted only under the following criteria:
t1.src = 'L' and t2.src = 'L'
t1.src = 'GL' and t2.src = 'GL'

and the records in t2 are not touched if
t1.src = 'L' and t2.src = 'GL'

I tried
delete from t2
where seq not in
( select seq from t1
where t1.id = t2.id
and t1.acct_no = t2.acct_no
and t1.type = t2.type
)
did not get the desired results

Thanks in advance

Tom Kyte
September 13, 2005 - 1:09 pm UTC

I'm confused -- if you need to "join" to see if the src's are right -- but you cannot join (since the seq is missing)??????





delete

shams, September 13, 2005 - 1:27 pm UTC

I want to delete the missing sequences from t2 that are not in t1 and only if

The records in T2 are deleted only for the following criteria
t1.src = 'L' and t2.src = 'L'
t1.src = 'GL' and t2.src = 'GL'

and the records are not touched in t2 if
t1.src = 'L' and t2.src = 'GL'



delete from t2
where acct_no in (select acct_no from t1)
and seq not in
( select seq from t1
  where t1.id = t2.id
    and t1.acct_no = t2.acct_no
    and t1.type = t2.type
)

The above logic deletes the missing sequences, but fails the checking criteria mentioned above:

The records in T2 are deleted only for the following criteria
t1.src = 'L' and t2.src = 'L'
t1.src = 'GL' and t2.src = 'GL'

and the records are not touched in t2 if
t1.src = 'L' and t2.src = 'GL'


Please let me know what will be your approach to solve the problem
Thanks for the help

Tom Kyte
September 13, 2005 - 3:53 pm UTC

how can t1.src = 'L' and t2.src = 'L' if the T1 record does not exist??? That is my question, for that sequence, there is no 't1.src' record.

delete

shams, September 13, 2005 - 4:36 pm UTC

Let me explain with following ex:

I have a table t1 ( current day data)
ID actno type seq src
1 111 CHK 1 L
1 111 SAV 1 L
1 111 SAV 2 L
1 222 CHK 1 GL
1 222 SAV 1 GL


table t2(Previous days data)
ID actno type seq src
1 111 CHK 1 GL
1 111 CHK 2 GL
1 111 SAV 1 L
1 111 SAV 2 L
1 111 SAV 3 L
1 222 CHK 1 GL
1 222 SAV 1 GL
1 222 SAV 2 GL

( Note: No referential integrity on tables)


The table t1 gets updated every day by the interface, and table t2 needs to be synchronized with the sequences in t1.

Let's say table t1 has the current day's data and t2 has the previous day's data.

if t1.src = L and t2.src = GL, the extra sequences should not be deleted from t2;

if t1.src = GL and t2.src = L or GL, the extra sequences in t2 that are not in t1 should be deleted.

Only the following records (the 5th and the 8th) should be deleted from t2:

1 111 SAV 3 L
1 222 SAV 2 GL


Thanks in advance

Tom Kyte
September 13, 2005 - 4:41 pm UTC

is there an implied - but not enforced - data rule that says "for a given actno, type -- the SRC is constant, it will be the same for all rows with that actno, type"?

(has anyone thought about looking at this model if so??)

delete

shams, September 13, 2005 - 6:09 pm UTC

The same account can come from the SRC = 'L' or 'GL' interfaces,
and accounts are not specific to a SRC.

Regards

Tom Kyte
September 13, 2005 - 6:19 pm UTC

so, I am going back around to: if we cannot join t1 to t2 (you are missing sequences), how do we know what t2.src is?

RE -- delete

Hannu, September 13, 2005 - 6:42 pm UTC

How about trying ..
delete
from t2
where id = 1 -- select whatever id
and acct_no in ( select acct_no from t1 where t1.src
= 'L' )
and seq not in
( select seq from t1
where t1.id = t2.id
and t1.acct = t2.acct
and t1.type = t2.type
)
This should delete
1 111 SAV 3 L


and Similarly
delete
from t2
where id = 1 -- select whatever id
and acct_no in ( select acct_no from t1 where t1.src
= 'GL' )
and seq not in
( select seq from t1
where t1.id = t2.id
and t1.acct = t2.acct
and t1.type = t2.type
)
This should delete
1 222 SAV 2 GL


What are your thoughts Tom?




Tom Kyte
September 13, 2005 - 7:00 pm UTC

I'm still trying to figure out the question, too many ambiguities still.

Back up large number of tables

Praveen, October 04, 2005 - 12:42 am UTC

Hi Tom,

I am facing the following scenario.

There are a large number of tables, say about 100, each containing about half a million to 1 million records, which need to be backed up into similar tables (they are staging tables and no relations exist between them) before their contents are deleted and loaded with fresh data.

I adopted the following approach, rather than the straightforward "truncate the backup tables", "insert into backup tables select * from source tables" and "truncate the source tables":

1) dropped all existing backup tables.

2) renamed the source tables to the backup tables.

3) created the source tables using "create table source-table as select * from backup-table where 1=2".

In case of any error during any of the above transactions, I can always "recover" using a log table that traces the process.

Would you mind sharing your thoughts?

Thanks and regards
Praveen

Tom Kyte
October 04, 2005 - 1:31 pm UTC

I've used two schemas for this.

schemaA and schemaB

and a third schema schemaC that has synonyms that point to either A or B.


So, say you have all of your private synonyms in schemaC pointing to A right now. You want to load B, so you:

a) truncate all of B's tables
b) load them
c) verify them
d) create or replace synonyms in C to point at B

tomorrow, you do it to A

and so on.
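For concreteness, the flip-flop might be sketched like this (the schema, synonym, and table names are all illustrative):

```sql
-- Assumed setup: identical copies of each table exist in schemaA and
-- schemaB; applications query only through schemaC's private synonyms.

-- Today schemaC points at A, so reload B off to the side:
TRUNCATE TABLE schemaB.t1;

INSERT /*+ APPEND */ INTO schemaB.t1
SELECT * FROM staging_t1;   -- hypothetical load source

COMMIT;

-- ... verify the load (row counts, constraints, spot checks) ...

-- Repoint the application at B, one synonym at a time:
CREATE OR REPLACE SYNONYM schemaC.t1 FOR schemaB.t1;
```

Tomorrow the same steps run against schemaA, and the synonyms swing back.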




Brilliant method

Praveen, October 09, 2005 - 8:23 am UTC

Thanks Tom, that was an out-of-the-box thought!

Weird situation because of delete-insert and insert again

sns, October 10, 2005 - 11:50 am UTC

I am trying to do a delete-insert on a pretty big table, as per the business requirement.
I am now facing a weird situation and don't know the reason behind it.

The name of the table that I am doing delete-insert: SERVICE_TAG_HEADER. SERVICE_TAG_NUM 
is the primary key in the base table as well as in my temp tables.

10:27:49 SQL> select count(*) from service_tag_header;
  
    COUNT(*)
  ----------
    86065415
   
The name of the other table that has the source data for deleting and inserting the base table:
  SNS_SERVICE_TAG_NUM
10:47:30 SQL> select count(*) from sns_service_tag_num;

  COUNT(*)
----------
  13406603

Out of 13 million records, 4.5 million need to be deleted and re-inserted, and the remaining 8.7 million
are fresh records.

I deleted first those records that exist in the base table.  I did it in a
PL/SQL block, one record at a time (but issued a commit every 100 records).  This approach finished
in 30 hours.
I tried other approaches like update (select... from ... where....) set...
This update ran for more than a day and did nothing.  Later I killed it.

I have a temp table called SNS_ROWS_TO_BE_DELETED that has common records between my base table
and temp table (SNS_SERVICE_TAG_NUM).

select count(*) from SNS_ROWS_TO_BE_DELETED
  COUNT(*)
----------
   4541008
   
After my deletes and inserts are done on those 4.5 million records, I issued the SQL
10:48:14 SQL> select count(*) from sns_rows_to_be_deleted a, service_tag_header b
10:49:29   2  where  a.service_tag_num = b.service_tag_num;


  COUNT(*)
----------
   4541008
   
This ensured all the records are back in the base table.
   
Now, I tried to insert the remaining records (13 million - 4.5 million = 8.7 million approx)
using the following approach:

insert /*+ append */ into odsd_admin.service_tag_header
07:10:21   2  (service_tag_num,order_num_buid,order_num,company_num,location_num
07:10:21   3              ,customer_num_buid,customer_num,address_seq_num,address_type,ship_date
07:10:21   4              ,order_date,src_trans_timestamp,update_timestamp,ods_create_date
07:10:21   5              ,mch_type,service_hold,item_class,order_key_type,tie_num,item_sub_class
07:10:21   6              ,modify_date,modify_by,modify_pgm,ods_modify_date,create_by
07:10:21   7              ,create_pgm,create_date
07:10:21   8  )
07:10:21   9  select service_tag_num,order_num_buid,order_num,company_num,location_num
07:10:21  10              ,customer_num_buid,customer_num,address_seq_num,address_type,ship_date
07:10:21  11              ,order_date,src_trans_timestamp,update_timestamp,ods_create_date
07:10:21  12              ,mch_type,service_hold,item_class,order_key_type,tie_num,item_sub_class
07:10:21  13              ,modify_date,modify_by,modify_pgm,ods_modify_date,create_by
07:10:21  14              ,create_pgm,create_date
07:10:21  15  from
07:10:21  16           (
07:10:21  17           select
07:10:21  18               a.service_tag_num,a.order_num_buid,a.order_num,a.company_num,a.location_num
07:10:21  19              ,a.customer_num_buid,a.customer_num,a.address_seq_num,a.address_type,a.ship_date
07:10:21  20              ,a.order_date,a.src_trans_timestamp,a.update_timestamp,a.ods_create_date
07:10:21  21              ,a.mch_type,a.service_hold,a.item_class,a.order_key_type,a.tie_num,a.item_sub_class
07:10:21  22              ,a.modify_date,a.modify_by,a.modify_pgm,a.ods_modify_date,a.create_by
07:10:21  23              ,a.create_pgm,a.create_date,b.service_tag_num stn
07:10:21  24           from support.sns_service_tag_num a,odsd_admin.service_tag_header b
07:10:21  25           where a.service_tag_num = b.service_tag_num(+)
07:10:21  26          )
07:10:21  27   where stn is null
07:10:21  28    ;

This insert ran for the whole night and did nothing either.  However, its SID was not there
when I queried the v$session view.  I am not sure what happened to the insert statement.

Now I am facing the problem.  When I try to issue the same INSERT again, I get a unique
constraint error.

I issued the following SQL statement and I expect 8.7 million as output in case the above
insert failed.  If the insert succeeded then I should get 0 records.  However, I got...

10:34:17 SQL> select count(*) from 
  10:36:18   2  (select a.service_tag_num,b.service_tag_num stn 
  10:36:34   3  from sns_service_tag_num a, odsd_admin.service_tag_header b
  10:36:46   4  where a.service_tag_num = b.service_tag_num(+)
  10:36:58   5  )
  10:36:59   6  where stn is null;
  
    COUNT(*)
  ----------
    13406603  -- this count is really bugging me.  I was expecting 8.7 million records.

This is really bugging me.
Why does it say 13 million even though I verified that the 4.5 million
records exist in the base table?
Where are those 8.7 million records?
What's wrong here?

I also ran the following SQL statement to make sure my TEMP tables are OK:

10:43:08 SQL> select count(*) from sns_rows_to_be_deleted a, sns_service_tag_num b
10:43:24   2  where a.service_tag_num = b.service_tag_num;

  COUNT(*)
----------
   4541008  -- as expected.



The unique constraint error I am getting is probably because of the 4.5 million records
already in the table.



Thanks a lot,


   

Tom Kyte
October 10, 2005 - 12:12 pm UTC

sorry - insufficient data/logging going on here - I've no idea what really happened and we don't seem to have an audit trail (log files of your own making)

about the previous question,

sns, October 10, 2005 - 5:01 pm UTC

Ok, let me put it in a better way.

I have two SQLs below and I am wondering why the first SQL returns 0 records. It returns in a fraction of a second.

Basically both queries should return the same count.

In the first query "sss" is the alias for service_tag_num, which I select from "b".

select count(*) from
(
select
a.service_tag_num,a.order_num_buid,a.order_num,a.company_num,a.location_num
,a.customer_num_buid,a.customer_num,a.address_seq_num,a.address_type,a.ship_date
,a.order_date,a.src_trans_timestamp,a.update_timestamp,a.ods_create_date
,a.mch_type,a.service_hold,a.item_class,a.order_key_type,a.tie_num,a.item_sub_class
,a.modify_date,a.modify_by,a.modify_pgm,a.ods_modify_date,a.create_by
,a.create_pgm,a.create_date,b.service_tag_num sss
from sns_service_tag_num a,odsd_admin.service_tag_header b
where a.service_tag_num = b.service_tag_num
)
where sss is null
COUNT(*)
----------
0

Elapsed: 00:00:00.00

select count(*) from support.sns_service_tag_num a,service_tag_header b
where a.service_tag_num = b.service_tag_num;
COUNT(*)
----------
4541008

Elapsed: 00:09:53.32

Tom Kyte
October 10, 2005 - 7:34 pm UTC

not a chance --

the first one is:

select count(*)
from ( .... )
where SSS IS NULL;

well, SSS is b.service_tag_num, and b.service_tag_num is IMPOSSIBLE to be NULL since it is your join column there and this isn't an outer join.

Hence, that first query will always return "0".

The second query counts the number of rows joined - it will return 0, 1 or more.
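A tiny demonstration of the difference (table and column names are made up):

```sql
-- Inner join: b.x must equal a.x on every returned row, so the
-- IS NULL predicate can never be satisfied - count is always 0.
SELECT COUNT(*)
  FROM a, b
 WHERE a.x = b.x
   AND b.x IS NULL;

-- Outer join: rows of A with no match in B are preserved with a
-- NULL b.x, so this counts the "anti-join" rows instead.
SELECT COUNT(*)
  FROM a, b
 WHERE a.x = b.x(+)
   AND b.x IS NULL;
```

The first query in the review above was an anti-join in intent but an inner join in fact, hence the instant zero.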

Options for archiving old data

A reader, September 06, 2006 - 11:09 am UTC

Tom,

We are looking at options for archiving data older than 12 months on a transactional system. We would like to do this on a regular basis (once every month). The total # of tables with data that needs to be archived is around 400. The data in these tables grows at about 15-20 gig/month. The options we are considering are listed below. I would like your opinion on them and any other recommendations you may have.

1) Partition tables
2) Write SQL code to move data from transactional db to the archiving db
3) Combination of partitioning and SQL code
4) Transportable tablespace


Thank you


Tom Kyte
September 06, 2006 - 3:47 pm UTC

#1 comes first and foremost.
#2 is so far down the list that I would hope to not have to go there.

#4 would be used in conjunction with #1

ALTER TABLE MODIFY

Branka, September 12, 2006 - 3:20 pm UTC

I have table A that is 40GB size and 300000000 records.
Column A1 has to be modified from char(8) to char(17).

I created script:
CREATE TABLE A
( A1 CHAR(17 BYTE) NOT NULL ENABLE,
A2 VARCHAR2(20 BYTE) NOT NULL ENABLE,
A3 NUMBER NOT NULL ENABLE,
A4 VARCHAR2(1024 BYTE),
A5 NUMBER,
A6 VARCHAR2(12 BYTE),
A7 VARCHAR2(10 BYTE),
A8 VARCHAR2(30 BYTE) DEFAULT 'user' NOT NULL ENABLE,
A9 DATE DEFAULT sysdate NOT NULL ENABLE)
TABLESPACE DATA_160MB NOLOGGING;


INSERT /*+ APPEND */ INTO A (A1, A2, A3, A4, A5, A6, A7, A8, A9)
SELECT /*+PARALLEL(PAGE_PC)*/ A1, A2, A3, A4, A5, A6, A7, A8, A9
FROM PAGE_PC;
commit;

An Oracle support person decided that it is better to do an import into the table (Oracle 9i).

Which approach would you take?


Tom Kyte
September 12, 2006 - 5:42 pm UTC

I would approach this via:

alter table t modify column char(17);

it should be instantaneous.

why rebuild the entire table???

Modify column

Branka, September 12, 2006 - 5:57 pm UTC

I tried it on a table that is 100 times smaller, and it ran for 1 hour.
I tried it with a table that is 10 times smaller, and it didn't finish after 3 hours.


Tom Kyte
September 13, 2006 - 7:09 am UTC

oh, doh, you used the hateful char type - that'll explain it, it has to rewrite the entire table - migrating rows and all.

Yet another reason I'll add to the list of reasons NEVER EVER to consider even thinking about using char.

a varchar2 modification would take the same amount of time regardless of the amount of data - 1m would take the same amount of time as 1 tb to modify.

Modify column

branka, September 13, 2006 - 9:40 am UTC

Using char instead of varchar is a different story. But how would you migrate the rows?
Would you use my insert, or import (9i)?


Tom Kyte
September 13, 2006 - 2:48 pm UTC

create table as select if I was allowed to incur downtime.
dbms_redefinition otherwise

I would never use export/import - way risky. slow too.
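An online redefinition for this kind of datatype change might be sketched as follows (the table names and the handling of dependent objects are assumptions; check the DBMS_REDEFINITION documentation for your release):

```sql
-- Interim table T_INT pre-created with the desired shape: A1 CHAR(17).
-- The default column mapping copies columns by name, implicitly
-- converting the CHAR(8) values into CHAR(17).
begin
    dbms_redefinition.start_redef_table(
        uname      => user,
        orig_table => 'T',
        int_table  => 'T_INT' );

    -- (10g) indexes, triggers, constraints and grants can be cloned
    -- here with dbms_redefinition.copy_table_dependents

    dbms_redefinition.finish_redef_table(
        uname      => user,
        orig_table => 'T',
        int_table  => 'T_INT' );
end;
/
```

finish_redef_table swaps the two tables in the dictionary, so the application sees only a brief lock at the very end.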

modify column

Branka, September 13, 2006 - 3:00 pm UTC

Thank you.
I believe that you wouldn't insert the data chunk by chunk, committing after each chunk.
It is so hard to fight against that approach.


Modify column

Branka, September 13, 2006 - 3:31 pm UTC

In a create-table-as-select, how would you make column A1 char(17)?

Tom Kyte
September 14, 2006 - 8:52 am UTC

cast

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select cast( substr(username,1,17) as char(17)) a,
  4         cast( user_id as number(10,2) ) b,
  5         created c
  6    from all_users;

Table created.

ops$tkyte%ORA10GR2> desc t
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 A                                                 CHAR(17)
 B                                                 NUMBER(10,2)
 C                                        NOT NULL DATE

 

modify column

Branka, September 14, 2006 - 9:49 am UTC

Great. Thanks

OK

Raju, March 06, 2007 - 7:24 am UTC

Hi Tom,
I have explicit cursor processing like this.

for x in (select * from a) loop
delete t1 where t1c1 = x.c1 and t1c2 = x.c2 and ...
delete t2 where t2c1 = x.c1 and t2c2 = x.c2 and ...
delete t3 where t3c1 = x.c1 and t3c2 = x.c2 and ...
end loop;

Since I am doing row-level processing I am not
able to get the final count of rows deleted from the above 3 tables.
I tried declaring three numeric variables initialised to zero and incrementing
them by 1 after each delete.
But it shows the same value for all 3 tables.
Do you have any idea how to get that?
Please let me know.
Thanks for your time.
Tom Kyte
March 06, 2007 - 11:08 am UTC

I hate your code, can we fix it???


delete from t1 where (c1,c2,...) in (select c1, c2, .... from a);
rows_deleted := sql%rowcount;


done, no loop.


Else, adding three counters is pretty basic stuff??? but you don't show us what you did, so it is hard to say why it is wrong.
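For completeness, one way the counters could be kept correctly, if the loop really must stay, is to add SQL%ROWCOUNT after each statement rather than adding 1 (a sketch; the table and column names follow the question):

```sql
declare
    cnt1 number := 0;
    cnt2 number := 0;
    cnt3 number := 0;
begin
    for x in (select * from a) loop
        delete from t1 where t1c1 = x.c1 and t1c2 = x.c2;
        cnt1 := cnt1 + sql%rowcount;   -- rows deleted by THIS delete

        delete from t2 where t2c1 = x.c1 and t2c2 = x.c2;
        cnt2 := cnt2 + sql%rowcount;

        delete from t3 where t3c1 = x.c1 and t3c2 = x.c2;
        cnt3 := cnt3 + sql%rowcount;
    end loop;
    dbms_output.put_line( cnt1 || ' / ' || cnt2 || ' / ' || cnt3 );
end;
/
```

Sharing one counter across the three deletes, or incrementing by 1 when a delete can remove many (or zero) rows, would produce the identical-values symptom described above.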

OK

Rajesh, April 10, 2007 - 9:47 am UTC


Hi Tom,
We have a column "trans_date" of Oracle DATE type in a transaction table,

to which data is added each day as the month progresses.
If the transaction date moves to the next month, then some delete statements should
NOT happen.

if to_char(trans_date,'MMYYYY') <> next_month or last_day(current_month)
delete a;
delete b;
delete c;
end if;
Tom Kyte
April 10, 2007 - 11:22 am UTC

so, add a WHERE CLAUSE to the delete statement then.


you need no procedural code.
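That is, something along these lines (the month condition and column names are guesses at the intent):

```sql
-- Only rows whose trans_date falls in the current month are
-- eligible; rows that have rolled into the next month survive.
delete from a where trunc(trans_date,'MM') = trunc(sysdate,'MM');
delete from b where trunc(trans_date,'MM') = trunc(sysdate,'MM');
delete from c where trunc(trans_date,'MM') = trunc(sysdate,'MM');
```

TRUNC(date,'MM') compares whole months directly, avoiding the TO_CHAR string comparison in the pseudocode.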

Purging Data

DM, April 11, 2007 - 1:50 am UTC

Hi

We have to purge/archive data from 4 tables into some files. These 4 tables do not have any relationships defined, but they have dependencies based on their PKs.

Your suggestions on how we can do this are required.

Thanks in Advance
DM
Tom Kyte
April 11, 2007 - 11:13 am UTC

... These 4 tables do not have any relationships defined, but they have dependencies based on their PKs....


that does NOT compute at all.


I guess my only suggestion would be "write a program to accomplish your goals"

This is so vague as to be unanswerable.

Deleting many rows from several tables linked by foreign key constraints

Dmitry Kuznetsov, June 04, 2007 - 2:59 pm UTC

Dear Tom,

Thanks for all your answers - I really find many of them useful (for example, the sobering message that UNDO/redo are actually a necessary and complementary part of deleting data from a table).

Although the topic of the thread is "Deletion from a big table", let us extend it to deletion from several tables linked together by referential integrity constraints with the ON DELETE CASCADE clause. Our data are organized as a tree accommodated in several such tables; deletion of a single record from the top-most table propagates down to the dependent tables, and the whole data tree is deleted with a single DELETE, then committed. The problem comes when this has to be scaled up and the amount of rows to delete in the dependent tables becomes too large for one transaction (i.e. the estimated size of UNDO data is > 100 GB).

Is there a way to break such a multi-table DELETE into several transactions?
Tom Kyte
June 06, 2007 - 12:26 pm UTC

if you are doing something that requires 100gb of undo, one might start to think about other better approaches.

Why are you deleting such massive amounts of data? Why are you not using partitioning or some other data structure to facilitate this activity. Or why are you not "purging" or whatever it is you are doing more frequently (so they are smaller)

Deleting many rows from several tables linked by foreign key constraints

Dmitry Kuznetsov, June 07, 2007 - 11:08 am UTC

Tom,

Thanks for the answer!

To be clearer, I must describe why we use a tree-like structure. The data are biological datasets containing protein sequences and their annotations. Each instance of the tree-like structure I described is a version of a dataset; there are several datasets, and we want to keep a certain number of versions of each. When the time comes to delete an old version, it must be deleted as a unit, not gradually.
Of course, it can be marked as 'old' and then deleted chunk-by-chunk starting from the lower tables, then the orphaned rows deleted from the upper tables, and so on. But such an approach abolishes the essence of the referential integrity principle, in my opinion. In fact, we used such an approach while using MySQL as the DBMS, and one of the main motivations for us to migrate to Oracle was exactly the possibility of easily managing our datasets using ON DELETE CASCADE constraints.
I will look into partitioning (I assume you meant horizontal, i.e. row-based, table splitting) as you suggested.

Thanks a lot once more,
Best regards.
Dmitry.

Deleting from a 200+ million table that is continuously being inserted to

Sal, February 20, 2008 - 2:49 pm UTC

This is our audit trail table. 10.2 version.

It is a truly 24x7 operation, and any user activity (insert/update/delete of other tables in the database) is logged to this audit trail table.

There are some 20 million rows that are erroneous and they need to be deleted (using a criterion).

When I tried the delete in a test environment, I found that the front-end application slowed down considerably, causing timeouts. Other than asking for downtime, what else can I do while deleting so that this slowdown does not occur?

Thanks.
Tom Kyte
February 20, 2008 - 3:25 pm UTC

why not....

a) create a new empty one
b) flip flop the tables (rename current to old, new to current)
c) create table keep as select * from old where <condition to filter out bad>

you still have all of the old audit information, just elsewhere. If it is important to have a single logical table you can

a) create a new empty partitioned table - with the partition boundary set to right now (two partitions - one for anything less than now, one for anything greater than now)

b) flip flop
c) create table keep as .....
d) exchange keep with the empty partition.
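The first (single-table) variant might be sketched as (names are illustrative):

```sql
-- a) new empty table with the same shape as the audit trail
create table audit_trail_new as
select * from audit_trail where 1=0;

-- b) flip flop - needs a brief pause in inserts
rename audit_trail to audit_trail_old;
rename audit_trail_new to audit_trail;

-- c) rebuild the good rows at leisure, outside the live table
create table keep as
select * from audit_trail_old
 where 1=1 /* condition filtering out the 20m bad rows */;
```

The partitioned variant adds step d, an ALTER TABLE ... EXCHANGE PARTITION of KEEP into the empty lower partition, so the history reappears as part of one logical table.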


Deleting from a 200+ million table that is continuously being inserted to

Sal, February 20, 2008 - 6:21 pm UTC

Not trying to be paranoid, but what will happen to the inserts that occur while I am flip-flopping the table names?

thanks
Tom Kyte
February 20, 2008 - 8:43 pm UTC

be paranoid. it is ok. I usually am.

you'll need a moment of quiescence for that - the inserts will have to stop temporarily.

delete using import

Fahad, March 15, 2008 - 4:00 am UTC

I too have a very big table with millions of records. This table has many columns, foreign keys and indexes, and about 5 more tables are dependent on this one. I need to remove more than a million records. Is the create-new-table/rename method appropriate, or can I do the following:
1. Export the table using the 'QUERY=' clause with a condition to get the rows I need to keep.
2. Drop the table.
3. Import the table from the dump I got in step 1.
Are these steps enough?

Tom Kyte
March 15, 2008 - 9:52 am UTC

millions of records is not meaningful - size counts - millions of records is "small" these days.

what is "millions", how big is the table, why are you discounting just deleting the records you want to delete - that might be the most viable approach.

please remove export/import from your vocabulary when talking about performing a delete however, we will not even think about using them (already said that above)

delete

Fahad, March 17, 2008 - 5:12 am UTC

There are 7 million records in this table. We want to remove 6 million of them. This table has two foreign keys and many other constraints. It references two tables, and about 5 tables refer to it. It also has many indexes.
So, based on your advice, can I do the following steps:
1. Generate the DDL script of this table using TOAD. This will also give the create statements for the existing constraints and indexes.
2. Create a new table as select from the big table, with a condition.
3. Rename the old/new tables.
4. Execute the create constraint/index commands I got from step 1.
Will this be OK? Will data integrity be fine?
Please don't mind my lack of complete information. I will be obliged if you understand and answer my question.
Tom Kyte
March 24, 2008 - 8:45 am UTC

or, delete 6 million records

you should already have the DDL, if you have to go to TOAD to get your production schema's, you have some serious issues.


if you create constraints, and you completely recreate all constraints, and all constraints are enabled and validated - then you can be sure that data integrity, as far as the enforced validated constraints go - is there.

i have the similar requirement

Kris, May 05, 2008 - 3:06 pm UTC

I have a similar requirement...

My case is like this: I have a schema with around 40 tables. Most of them have more than 10,000,000 rows. There are three main tables and the rest cascade from them.

I have to delete data from the tables every day. The delete count is around 100,000 - 300,000 per day. Right now it takes more than 24 hrs for us to delete a single day's data.

Right now we are building a driver table and deleting from the DB using that. Do you suggest we delete all at once, or delete 10,000 recs at a time?

What do you think is the better option for deleting the data?
Tom Kyte
May 06, 2008 - 12:51 am UTC

if by building a driver table you mean "we are going to stop doing slow by slow and start doing set operations" - I say "excellent"

and just do it in a single sql statement if at all possible.

Thats Quick.. let me explain bit more

Kris, May 06, 2008 - 11:40 am UTC

Hi Tom,

thanks for the quick reply..

my algorithm is like this

i have a temp table with indexes on it.

1). insert into select from (criteria to fetch the keys of three tables) i.e. building driver table.

2). DBMS_STATS.gather_table_stats on temp table

3). delete data from non cascaded tables as DELETE TABLE WHERE EXISTS (SELECT key FROM temp_table WHERE table.key = temp.key) and COMMIT;

4). delete parent tables which cascade to all child tables with same DELETE as above. and COMMIT after each statement.

5). END.

What's your comment on this algorithm? Thanks for your time.

*Kris
Tom Kyte
May 07, 2008 - 12:54 am UTC

2) could be dbms_stats.set_table_stats - you just populated, you already know the data in there...

3) I'd prefer "in" at that point - especially nice if KEY were defined as unique. If you have on delete cascade, why bother doing this bottom up?

exchange partition for housekeeping

Harry Zhang, May 24, 2008 - 3:14 am UTC

Hi Tom,

I have learnt a lot here. Below is my test case. I have a table eda_log_item that accumulates a huge amount of data every day and needs housekeeping. My idea is to create a partitioned table eda_log_item_p and exchange its MAXVALUE partition with the non-partitioned table eda_log_item (step 6), then split the MAXVALUE partition, keeping it empty for the next exchange. Finally, drop partitions for the real housekeeping.

My question is: when I do the exchange-partition step and there are simultaneous inserts into my non-partitioned table, what will happen - will the inserts fail or succeed? Do you think this approach is OK for an OLTP environment?

Thanks always

-- step 1.create log table where data need housekeeping
CREATE TABLE eda_log_item
( DATE_TIME DATE not null,
LOG_DATA VARCHAR2(2000) not null,
LOG_ID NUMBER,
LOG_ITEM_ID NUMBER(10)
)
/
-- step 2.generate data
begin
for l in 1..1000 loop
insert into eda_log_item values (sysdate,'abc',l,l);
end loop;
end;
/
commit
/

-- step 3.create index on log table
create index EITEM_ELOG_FK_I on EDA_LOG_ITEM (LOG_ID)
/

-- step 4.create partition table for keep log data
CREATE TABLE eda_log_item_p
( DATE_TIME DATE not null,
LOG_DATA VARCHAR2(2000) not null,
LOG_ID NUMBER,
LOG_ITEM_ID NUMBER(10)
)
PARTITION BY RANGE (date_time)
(
PARTITION p1 VALUES LESS than (TO_DATE('30-12-2005','DD-MM-YYYY')) ,
PARTITION px VALUES LESS THAN ( MAXVALUE )
)
/

-- step 5.create local index on partition table
create index EITEM_p_ELOG_FK_I on EDA_LOG_ITEM_P (LOG_ID) local

-- step 6.exchange partition for eda_log_item with partition eda_log_item_p.px
alter table eda_log_item_p
exchange partition px
with table eda_log_item
INCLUDING INDEXES
WITHOUT VALIDATION
UPDATE GLOBAL INDEXES
/

-- step 7.get max value from last partition for split partition
select max(date_time) from eda_log_item_p partition (px);

-- step 8.split last partition in eda_log_item_p
alter table eda_log_item_p
split partition px
at ( TO_DATE('02-05-2008 10:23:53','DD-MM-YYYY hh24:mi:ss'))
into (partition p2, partition px)
update global indexes
/

-- step 9.drop partition p1 for housekeeping
alter table eda_log_item_p drop partition p1
/

-- end of test case --
Tom Kyte
May 24, 2008 - 7:01 pm UTC

the DDL (exchange) will require an exclusive lock on both - so, new transactions will block until the operation is done.

that said, it would be strange to exchange partition an actively inserted into table, very very strange indeed. I'd think a bit more about that....

exchange partition for housekeeping

Harry Zhang, June 06, 2008 - 8:16 am UTC

Hi Tom,

Thank you very much for the comments. I was thinking about this. Three benefits I can think about using this approach, want you to comment on.

To make it clear, these are the three tables I mentioned:
1. eda_log -- parent table
2. eda_log_item -- child table
3. eda_log_item_p -- partitioned table

1. Inserts into eda_log_item could be fast, without having to choose which partition to go to.

2. eda_log_item_p could have different indexes to speed up searches.

3. eda_log_item has a foreign key pointing to the eda_log table, so if I partition eda_log_item it needs a global index, which will slow down inserts and deletes. If instead I partition eda_log_item_p and exchange partitions with eda_log_item, then eda_log_item only uses a normal index, and inserts and cascaded deletes from the parent table could be faster.


Tom Kyte
June 09, 2008 - 9:54 am UTC

in order to exchange the way you describe, the table would have to have the index on it - you said "including indexes", you need an index on the TABLE and the PARTITION. You gain nothing.


Deleting a big Parent table

Maverick, July 21, 2008 - 11:30 am UTC

Tom, I have a requirement to delete a big [well.. kind of] table with 2 million rows. This is the parent table and has 10-12 child tables depending on the data it has in each row. They are all created with delete cascade, with all the indexes in place. When I tried to delete data from this table, it takes almost 12-13 hours for that kind of data, which is awful.
[I am guessing it is doing full table scan on child tables while deleting]

What is the best way to delete ..I don't need Rollback/redo [I know you already said ARG!!] but, my requirement is deleting entire table not just a subset of data. So, I don't care about indexes etc..

what is the best you could suggest?
thanks for any ideas..
Tom Kyte
July 22, 2008 - 10:52 am UTC

... [I am guessing it is doing full table scan on child tables while deleting] ...

not if you have indexed it...


... I don't need Rollback/redo ...

you might not, but the database sure does.


... This is the parent table and has 10-12 child tables depending on the data
it has in each row ...

this doesn't compute. The parent table either has 10 or 11 or 12 child tables - the number of child tables is "static" - so, let us say it has 12.


And you want to remove ALL of the rows from the parent, you wrote:

... my requirement is deleting entire table not just a subset of
data. ...

so, suggestion:

a) disable 12 foreign keys
b) issue 13 truncates - one each for the child, one for the parent.
c) enable 12 foreign keys


fast, little undo, little redo, enabling constraints will be fast (no data), indexes will be "reset", everything...
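Sketched out (the table and constraint names are invented; there would be 12 child tables in reality):

```sql
-- a) disable the foreign keys pointing at the parent
alter table child1 disable constraint child1_parent_fk;
alter table child2 disable constraint child2_parent_fk;
-- ... one per child table ...

-- b) truncate everything - fast, minimal undo/redo
truncate table child1;
truncate table child2;
truncate table parent;

-- c) re-enable; with all tables empty, validation is instantaneous
alter table child1 enable validate constraint child1_parent_fk;
alter table child2 enable validate constraint child2_parent_fk;
```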

Purging huge data from the list of table

sudheer_0886, September 10, 2008 - 6:47 am UTC

Hi Tom,
I have an issue: I need to delete a large number of tuples (2 million) from a table of 5 million, based on some criteria. The criteria are fetched from one base table; based on the fetched data, the tuples are deleted. Similarly, I have to delete from 30 similar tables which are linked by foreign and primary key constraints.

Like "Delete from employees where id IN (select id from empinfo where empinid=100)"

I tried some options:
- Writing all the queries in a file and running the script
- Using a cursor (creating an index table) and deleting the particular tuples from the table

Both give me a large response time.

The main concern here is time. I need to delete very quickly.
So I need an optimised and efficient solution for the issue.

Thanks In Anticipation.
Tom Kyte
September 11, 2008 - 10:55 am UTC

we should have thought of this need when we designed our system

because then we probably would have used partitioning to segregate the data so that when we needed to purge - we would just truncate or drop a partition here and there and be done in seconds.


to delete 2/5ths of the data, given indexes foreign keys, child tables that need the delete cascaded to (lots of them) you have to sort of expect "it'll take a long time"

but you don't give us much to work with here. you don't say if you have a maintenance window (downtime) to do this offline. If you do, probably - a bunch of "create table as select <data to KEEP>", index these new tables, put constraints on them, etc etc etc - and the drop the old and rename the new would be the "fastest". Can be done easily in parallel, nologging if you wanted as well.
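Given a window, the keep-and-swap could look like this for each table (the names and the keep condition are taken loosely from the example above):

```sql
-- build the 3/5ths of the data we are KEEPING, skipping redo
-- generation for the bulk of the work
create table employees_keep
nologging
as
select * from employees
 where id not in ( select id from empinfo where empinid = 100 );

-- recreate indexes/constraints/grants on employees_keep here ...

drop table employees;
rename employees_keep to employees;
```

(The NOT IN assumes empinfo.id is never NULL; and after any NOLOGGING operation, take a fresh backup.)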

A reader, November 18, 2008 - 11:27 pm UTC

We have 30 days worth of data in a table, which comes to 5 million rows. On the 31st day, we want to delete the 1st day's data (which will be about 150,000 rows), retaining 30 days of data, and so on. What will be the best way to delete this large number of rows from the big table?
Tom Kyte
November 21, 2008 - 5:41 pm UTC

it would be by NOT USING DELETE.

This screams out for partitioning, so you can rapidly and easily drop or truncate the old data.
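A sketch of what that could look like (names invented; on this era's releases the daily partitions would be pre-created by a scheduled job):

```sql
create table events
( event_date  date not null,
  payload     varchar2(100)
)
partition by range (event_date)
( partition d20081101 values less than (date '2008-11-02'),
  partition d20081102 values less than (date '2008-11-03')
  -- ... one partition per day, created ahead of time ...
);

-- on day 31, the "delete" of day 1 is a metadata operation:
alter table events drop partition d20081101;
```

Dropping a partition removes the ~150,000 rows in seconds, with none of the undo/redo a DELETE would generate.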

Partitioning - this is the classic case for it.

A reader, November 22, 2008 - 5:21 pm UTC

The application is being redesigned to use partitions; until then, we need to find a temporary solution to delete the data as I explained before. Do you have a temporary solution or method for the delete?
Tom Kyte
November 24, 2008 - 5:24 pm UTC

just delete it, it'll be inefficient but since no planning or design took place, there is not much you can do until it is designed to be efficient.


it'll be slow.
it'll generate gobs of undo and redo.
it'll be basically your only realistic choice.
it'll probably lead you to wanting to reorganize.

but wait to do that last bit until you have your partitioning scheme designed and reorganized ONE TIME into the new structure.



Truncating partitioned tables with referencing FK constraints

Graeme King, December 03, 2008 - 8:07 pm UTC

Hi Tom,

We have a partitioned table from which we want to truncate a partition (in a very similar manner to the person above who wants to delete the data from the partition that contains the 1st of the month's data on the 31st, and then reuse that partition the next day). However, we have a referencing foreign key constraint, and although the referencing table has no data in it that relates to the partition we want to truncate, we get errors saying:

ORA-02266: unique/primary keys in table referenced by enabled foreign keys

We have the option of disabling the foreign key prior to the truncate of the partition; however, that leaves the other partitions vulnerable to corruption while the truncate occurs, until we re-enable the constraint. And because the constraint is table-wide rather than partition-wide, it takes a long time to check, because we have so much data spread across many partitions. We could use NOVALIDATE, but then we don't know if any inconsistent data exists.

Is there any better way of doing this type of management operation?

BTW we are using 10gR2

Thanks

Graeme





Tom Kyte
December 09, 2008 - 11:38 am UTC

In 11g - we have a good way - when using reference partitioning - the database understands there is a 1:1 relationship between parent and child partitions:

ops$tkyte%ORA11GR1> CREATE TABLE p
  2  (
  3    x   int primary key,
  4    dt  date,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (x)
  8  (
  9    PARTITION part1 VALUES LESS THAN (100),
 10    PARTITION part2 VALUES LESS THAN (200)
 11  )
 12  /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> CREATE TABLE c
  2  (
  3    x   int not null,
  4    dt  date,
  5    y   varchar2(30),
  6    constraint c_fk_p foreign key(x) references p(x)
  7  )
  8  PARTITION BY reference(c_fk_p)
  9  /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into p select rownum, created, object_name from all_objects where rownum < 200;

199 rows created.

ops$tkyte%ORA11GR1> insert into c select rownum, created, object_name from all_objects where rownum < 200;

199 rows created.

ops$tkyte%ORA11GR1> commit;

Commit complete.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> alter table c truncate partition part1;

Table truncated.

ops$tkyte%ORA11GR1> alter table p truncate partition part1;

Table truncated.

ops$tkyte%ORA11GR1> alter table p truncate partition part2;
alter table p truncate partition part2
            *
ERROR at line 1:
ORA-02266: unique/primary keys in table referenced by enabled foreign keys



prior to that, an option would be:

a) checkpoint (makes the truncate go faster - truncate = checkpoint segment + truncate).

b) make the table read only somehow ( alter table t add constraint read_only_t check(1=0) enable novalidate; )

c) disable the fkey, truncate, enable without validation

d) drop the constraint added in b
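Strung together, that pre-11g sequence might look like the following sketch (the table and constraint names `parent`, `child`, `child_fk` and `read_only_child` are illustrative, not taken from the example above):

```sql
-- a) flush dirty buffers so the truncate has less checkpoint work to wait on
alter system checkpoint;

-- b) make the child table effectively read only while the fkey is down
alter table child add constraint read_only_child check (1=0) enable novalidate;

-- c) disable the foreign key, truncate, re-enable without re-validating
alter table child disable constraint child_fk;
alter table parent truncate partition part1;
alter table child enable novalidate constraint child_fk;

-- d) drop the read-only constraint added in b
alter table child drop constraint read_only_child;
```

The check(1=0) novalidate trick in b is what protects the other partitions: no new child rows can be inserted while the foreign key is disabled.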



Delete millions with minimum locking issue and the best response time

shervinsh, December 04, 2008 - 8:55 pm UTC

Thanks Tom for your insight.
If the table has high activity (so a table lock causes a problem) and we also want to delete millions of records from billions, what would be the best approach?
I read about
Create new table.
Copy from old to new.
Drop old
Rename new to old.

But this is not an online operation, and the chance of losing transactions is high.
Tom Kyte
December 09, 2008 - 11:57 am UTC

there would be no table lock.

why do you think there would be?

delete from t where <condition>

would lock only the deleted rows.

truncate partition table lock ....

Rajk, January 29, 2009 - 4:45 pm UTC

Tom,

I need to truncate data from older monthly partitions in a production system. Data gets continually loaded into this partitioned table (into newer partitions - current date).

Will an alter table truncate partition cause load errors into the table because of the lock it takes on the table?

Thanks,
RajK
Tom Kyte
January 29, 2010 - 2:31 pm UTC

depends on how they do the load....

if loaders use conventional path load - no contention unless they are loading into the partition you try to truncate.

if loaders use direct path load into the TABLE - then they will contend; the truncate by default won't work - it'll fail with ORA-00054

if loaders use direct path load into PARTITIONS - no contention unless they are loading the partition you try to truncate

consider:

ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('01-jan-2008','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('01-jan-2009','dd-mon-yyyy'))
 11  )
 12  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t select to_date('01-jun-2007'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> insert into t select to_date('01-jun-2008'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t select to_date('01-jun-2007'), rownum, rownum from all_users;

45 rows created.

<b>I have transaction but only affecting partition part1</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          execute immediate 'alter table t truncate partition part2';
  5  end;
  6  /

PL/SQL procedure successfully completed.

<b>I can, in another transaction, truncate part2</b>

ops$tkyte%ORA10GR2> select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2);

'PART   COUNT(*)
----- ----------
part1         90
part2          0

ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2> select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2);

'PART   COUNT(*)
----- ----------
part1         45
part2          0

<b>that this goes back to 45 just shows they were separate and distinct transactions, I could roll back that conventional path insert</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t select to_date('01-jun-2007'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> insert into t select to_date('01-jun-2008'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t PARTITION(part1) select to_date('01-jun-2007'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2);
select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2)
*
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

<b>that proves I did a direct path load, further I did it into a specific partition - part1 - I said that in the insert, that is important</b>

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          execute immediate 'alter table t truncate partition part2';
  5  end;
  6  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> rollback;

Rollback complete.
<b>
we can run both operations contention free, however:</b>


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert /*+ APPEND */ into t select to_date('01-jun-2007'), rownum, rownum from all_users;

45 rows created.

ops$tkyte%ORA10GR2> select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2);
select 'part1', count(*) from t partition(part1) union all select 'part2', count(*) from t partition(part2)
                              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          execute immediate 'alter table t truncate partition part2';
  5  end;
  6  /
declare
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified
ORA-06512: at line 4

<b>If I don't say what partition I'm direct pathing into, we will contend with each other.
</b>

hi

sr_orcl, March 25, 2009 - 8:57 am UTC

Hi,
This article is very useful. But how can I ask a new question?

Deleting parent child table

Anu, October 06, 2009 - 5:35 am UTC

Hi Tom,
Please clarify the below statement:
"If you have on delete cascade, why bother doing this bottom up?"

I read that deleting the records from the child table first and then from its parent table is the correct approach to follow when deleting records. If the child records are deleted first, how do we then find their parent records for deletion?

Regards,
Anu

Tom Kyte
October 08, 2009 - 7:03 am UTC

Anu

if you have on delete cascade set, then:


delete from p where .....;


is better than

delete from c where c_fk_to_p in (select pk from p where ... );
delete from p where ....;


delete cascade CASCADES the delete so you don't have to.

If you do not have on delete cascade, then you have to make sure the child records do not exist first if you want to succeed.




on delete cascade: just delete from parent;

no on delete cascade: you have to ensure no children exist before deleting from parent
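For reference, the cascade behavior is declared on the foreign key itself; a minimal sketch (table and column names hypothetical):

```sql
create table p ( x int primary key );

create table c
( x int,
  constraint c_fk_p foreign key (x) references p(x) on delete cascade
);

-- with the cascade in place, this single statement
-- also removes the matching rows in c
delete from p where x = 42;
```

Without the ON DELETE CASCADE clause, the same delete would raise ORA-02292 (child record found) until the child rows were removed first.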

implementation strategies to delete rows

Tony Fernandez, December 09, 2009 - 10:40 am UTC

Tom,

I have put together a list of possible alternatives for deleting thousands of rows from a table of tens of millions of rows.

Can you please advise on whether the list is correct, and add/remove/edit as appropriate?

Here is the list:

( Listed from slowest to fastest, method and comments in italics )

1. Delete rows from a bulk collect, issue multiple commits until deletion exhausted. Redo/undo logs are limited as opposed to #2

2. Delete all rows at once where < Criteria >. Then issue a single Commit. Needs extended Redo/undo log space.

3. Copy rows to preserve into a new table. Drop table. Rename new table into old name. Build Indexes. Only redo logs generated, needs half space for logs.

4. Copy rows same as #3. use unrecoverable clause to stop building undo/redo logs. Build Indexes. No redo/undo logs generated, log spaces unneeded.

5. Partition table to issue parallel deletes per partition. Redo/undo logs are limited, each partition runs in parallel and each create redo/undo logs, so now we have multiple redo/undo log pairs.

6. Partition table, drop old partition, create a new partition for new rows in every purge exercise. This is the most efficient approach; if implemented, any purge process will take only a few seconds.

Thanks for your attention,
Tom Kyte
December 10, 2009 - 2:26 pm UTC

hah,

#1 will need more total redo and undo than #2.


#2 will need more undo AT ANY POINT IN TIME, but #1 will generate more UNDO in total and more redo in total.


a thousand rows is teeny, tiny, trivial. Even if this was a couple thousand rows - that is teeny, tiny, trivial. as in "big deal"

#2 is the only option to seriously consider right here.


and get out of the mindset that doing things by piece will generate less redo/undo - it is wrong.



use #6 if you can isolate ALL of the rows you need to delete in a partition and if that partitioning scheme does not negatively impact performance of all other operations in your system.

Reader, December 15, 2009 - 11:20 am UTC

Tom,

pricing table which hold price data based on ids and dt_key
price_correction holds the correction that needs to be applied to pricing table.

Both tables are daily partition tables. They can have data from 1990. I am only concerned with data after 20090101.

When we get the price_correction, I have to delete whatever was present in pricing based on the id and dt_key and insert them to pricing.
I am using the procedure proc_price_correct to delete and insert. This runs fine if there are few records. If I have 200k records, for example, it takes a lot of time to complete. Can you advise what I should do to accomplish this task?

create table pricing
(dt_key number
,id number
,price number
)
partition by range (dt_key)
(partition pricing01 values less than (20090101)
,partition pricing20090101 values less than (20090102)
,partition pricing20090102 values less than (20090103)
,partition pricing20090103 values less than (20090104)
,partition pricing20090104 values less than (20090105)
,partition pricing20090105 values less than (20090106)
,partition pricing20090106 values less than (20090107)
);

insert into pricing values (20090101,1,3245);
insert into pricing values (20090101,2,2000);
insert into pricing values (20090101,3,4000);
insert into pricing values (20090101,4,5000);
insert into pricing values (20090101,5,600);
insert into pricing values (20090102,1,756);
insert into pricing values (20090104,2,890);
insert into pricing values (20090104,3,450);
insert into pricing values (20090106,4,700);
insert into pricing values (20090106,5,400);

create table price_correction
(load_dt_key number
,dt_key number
,id number
,price number)
partition by range (load_dt_key)
(partition price_correction01 values less than (20090101)
,partition price_correction20090101 values less than (20090102)
,partition price_correction20090102 values less than (20090103)
,partition price_correction20090103 values less than (20090104)
,partition price_correction20090104 values less than (20090105)
,partition price_correction20090105 values less than (20090106)
,partition price_correction20090106 values less than (20090107)
);

insert into price_correction values (20090104,20090101,1,3000);
insert into price_correction values (20090104,20090101,2,1000);
insert into price_correction values (20090104,20090101,3,4000);
insert into price_correction values (20090105,20090102,1,3000);
insert into price_correction values (20090105,20090104,2,8900);
insert into price_correction values (20090105,20090104,3,4500);

commit;

create or replace procedure proc_price_correct( p_load_dt_key in number)
is
l_min_dt_key number(8);
l_max_dt_key number(8);
begin
select min(dt_key), max(dt_key)
into l_min_dt_key,l_max_dt_key
from price_correction
where dt_key>=20090101;

delete from pricing p
where dt_key >= 20090101
and dt_key >= l_min_dt_key and dt_key <= l_max_dt_key
and id in (select id
from price_correction c
where c.dt_key = p.dt_key);

insert into pricing (dt_key,id,price)
select dt_key,id,price
from price_correction
where load_dt_key = p_load_dt_key
and dt_key >= 20090101;

commit;
end;
/

Tom Kyte
December 15, 2009 - 1:24 pm UTC


... it takes a lot of time to complete. ...

define "a lot of time" (sometimes, expectations are what need correcting, not code)

tell us how big this stuff is (each partition...)


this is confusing:

delete from pricing p
where dt_key >= 20090101
and dt_key >= l_min_dt_key and dt_key <= l_max_dt_key
and id in (select id
from price_correction c
where c.dt_key = p.dt_key);


Normally, you would not use a correlated subquery with IN, you would with EXISTS or NOT EXISTS - but not so much with IN. Explain your thinking on that one please?


shouldn't the logic just be a single merge?

update existing records to new price, insert any new records for that price?
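Using the poster's own tables, the single-merge idea might be sketched as follows (a sketch only, assuming (dt_key, id) identifies a price row and :p_load_dt_key is the procedure's parameter):

```sql
merge into pricing p
using ( select dt_key, id, price
          from price_correction
         where load_dt_key = :p_load_dt_key
           and dt_key >= 20090101 ) c
on ( p.dt_key = c.dt_key and p.id = c.id )
when matched then
  update set p.price = c.price
when not matched then
  insert (dt_key, id, price) values (c.dt_key, c.id, c.price);
```

One statement replaces the delete-then-insert pair, and corrected rows are updated in place rather than deleted and re-inserted.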

delete from bulk

Tony Fernandez, December 24, 2009 - 11:47 am UTC

Tom,

Questions here:

1) in the open cursor
loop
fetch .. limit 500
forall in ..
delete big_table where ..
*** commit *** (1)
endloop
*** commit *** (2)
in most code posted, I never see more than one commit after the loop, or none at all.

I thought both are necessary. Can you please comment?

2) Also, regardless of commits, I monitored how many rows are deleted so far. The count never goes up in multiples of LIMIT=500. I am not sure how to explain that - can you please help?

3) In big tables (about 30 million rows with 700k rows to delete), in spite of all redo/undo considerations, I tried creating a new table as in #4 of the list two messages above, then recreating all indexes, synonyms and grants. This route takes about 2 hrs and change, while bulk deletes average somewhat less. So it seems recreating is worse than bulk deletes - is this correct?

Merry Christmas 2009 !!!

Tom Kyte
December 31, 2009 - 4:38 pm UTC

1) my opinion is the code should simply be:


delete from big_table where .....;

period, no slow procedural code, no bugs introduced by developers, nothing like that at all.

Just one delete - and NO commit; only the client knows when a commit is appropriate.

So, my routine would be:

begin
delete from big_table where (whatever,columns) in (your cursor definition here);
end;


and that is all.


The commit (1) you have - that commits before the transaction is complete, what happens if the instance fails after the first commit and before the last? Your database is in an unknown state. In many/most cases - this is horrible. In *some* (a few) cases - this is acceptable (eg: this is a purge of old data). In the acceptable cases, I would say - use partitioning and don't delete, just truncate an old partition.

But if you commit(1), there is no need for commit(2) - think about it: what would commit(2) be committing, precisely, in this case? (answer - nothing). In fact, it could be that commit(1) is done once too many times (not too few, too many!). What happens if your cursor returns 500 rows and no more? Then you will be committing NOTHING the second time through.


2) I've no idea what you mean there, however, just because you fetch 500 rows does not mean in general you will DELETE 500 rows. Unless you are using a unique, primary key or rowid in your where clause, the delete could delete 0, 1 or 1,000,000 (or more!) rows for each fetched row. I would not expect 500 rows deleted, I would expect somewhere between 0 and infinity.

3) ... So it seems recreating is bad compared to bulk deletes, is this correct?...

sigh, it depends, it *all depends*.

Now, in your case, you are removing a small amount of data from a big amount of data. In this case, a single delete would outperform all of them probably. Procedural code would come in 2nd. A create/replace trick - probably third.

Now, if you start deleting a ton more data - that might all change, *it depends*

if it's a partition table

ssgoh, January 26, 2010 - 6:05 am UTC

For the same or a larger volume of data, if it's a partitioned table and it needs to stay online with active transactions going on, which will be the better solution?

1) delete from tablename partition (partition_name) where <condition>;
=> do this partition by partition, since successful deletion is more important than timing in our case.
=> Will the undo tablespace only need to store the before image of this partition, not the whole table?

or

2) create a temp table as select * from the partition where <condition of data we want to retain>, and do an exchange for each partition, then drop/truncate the temp table.
=> Will data consistency be a concern here, since transactions are very active at the same time?

Which is the better option?
Thanks!

Tom Kyte
January 29, 2010 - 2:34 pm UTC

For the same or a larger volume of data

same or larger than WHAT exactly????



why would you not partition by your where clause so you can just truncate or drop? That would be best. Purging via where clause is so - well, 1980.
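A sketch of what "partition by your where clause" means: if the purge criterion is a date, range partition on that date so that the purge becomes a dictionary operation rather than row-by-row work (names and dates here are illustrative):

```sql
create table t
( dt  date,
  x   int,
  y   varchar2(30)
)
partition by range (dt)
( partition p200901 values less than (to_date('01-feb-2009','dd-mon-yyyy')),
  partition p200902 values less than (to_date('01-mar-2009','dd-mon-yyyy'))
);

-- the entire "purge" of January 2009 is one fast DDL statement:
alter table t drop partition p200901;
```

No undo for the purged rows, no before images to store, and the space comes back immediately.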

deleting huge rows.. (a performance comparision)

A Reader, January 28, 2010 - 6:40 am UTC

Tom,
Thanks very much for your time and enlightening us.

We have :
Oracle 9.2.0.8
Hp Ux 11.23

Target- delete millions of rows( old data ) from a table T1

Since we can NOT afford an outage, CTAS and other options are ruled out.

The method being tried (in a test env first):

a) Mark the rows to be deleted, i.e. update one column made exclusively for archiving/deletion
b) delete the marked rows in batches
c) use online redefinition to reclaim the storage space and lower the HWM

Further details:

1) Table T1 has no clobs ( columns are varchar, number, date types)
2) T1 has "objid" column as primary key NUMBER datatype
3) "dev" column is type NUMBER
4) dev column is indexed normal B*Tree index.
5) Based on the business rule for "unused data" we set the "dev" column to 1. The default value of the "dev" column is NULL

We use the following function to set the "dev" column to 1

f1) ...FN_Mark_for_Del
UPDATE T1
SET dev = 1
WHERE start_time < (SYSDATE - no_days) and some_business_rule..........;

Then we use the following function to delete the rows.

f2) ...FN_del_T1
FUNCTION fn_delete_T1 (batch_size IN NUMBER, max_rows IN NUMBER)
RETURN BOOLEAN
AS
cnt NUMBER;
row_count NUMBER;
BEGIN
SAVEPOINT checkhere;

SELECT COUNT (objid)
INTO row_count
FROM T1
WHERE dev = 1;

cnt := 0;
DBMS_OUTPUT.put_line
( 'Total number of rows to be deleted from T1 ['
|| LEAST (row_count, max_rows)
|| '] in batches of ['
|| batch_size
|| '] rows'
);
DBMS_OUTPUT.put_line ( 'Starting deletion at ['
|| TO_CHAR (SYSDATE, 'dd-MM-yyyy hh24:mi:ss')
|| ']'
);

WHILE (cnt < row_count AND (cnt + batch_size <= max_rows))
LOOP
DELETE FROM T1
WHERE dev = 1 AND ROWNUM <= batch_size;

cnt := cnt + SQL%ROWCOUNT;
COMMIT;
DBMS_OUTPUT.put_line ( 'Deleted ['
|| cnt
|| '] rows from T1'
);
END LOOP;

DBMS_OUTPUT.put_line ( 'End of deletion at ['
|| TO_CHAR (SYSDATE, 'dd-MM-yyyy hh24:mi:ss')
|| ']'
);
DBMS_OUTPUT.put_line ('Deleted [' || cnt
|| '] rows from T1'
);
RETURN TRUE;
EXCEPTION
WHEN OTHERS
THEN
ROLLBACK TO checkhere;
DBMS_OUTPUT.put_line ( 'Error processing. Sql error code is['
|| SQLCODE
|| ']'
);
RETURN FALSE;
END fn_delete_T1;




We used f1 to mark the data for deletion :

Table Total Count Marked for Del
T1 6469204 6177615


Attempted to delete the rows in a controlled manner in following phases.
phase-1 : batch size 500, records deleted 1000000 --> Time taken 00:05:49.09 Undo Generated 1GB, Redo 1.5 GB
phase-2 : batch size 1000,records deleted 1000000 --> Time taken 00:07:23.8 Undo Generated 1GB, Redo 1.5 GB
phase-3 : batch size 250, records deleted 1000000 --> Time taken 00:29:59.9 Undo Generated 1GB, Redo 1.5 GB
phase-4 : batch size 500, records deleted 1000000 --> Time taken 00:22:23.31 Undo Generated 1GB, Redo 1.5 GB

The above was done just to benchmark the batch size and number of rows deleted, so that we can use those figures in live.

We can see a huge difference in time between phase-1 and phase-4 (same batch size, same number of rows deleted).

excerpts from the Statspack report (phase-1):

..
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 5,465,377.22 1,039,493.31
Logical reads: 66,886.79 12,721.60
Block changes: 44,316.35 8,428.80
Physical reads: 3,898.02 741.39
Physical writes: 1,498.68 285.04
User calls: 61.01 11.60
Parses: 31.51 5.99
Hard parses: 0.01 0.00
Sorts: 0.32 0.06
Logons: 0.00 0.00
Executes: 37.43 7.12
Transactions: 5.26
..
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 94.17 In-memory Sort %: 100.00
Library Hit %: 99.99 Soft Parse %: 99.98
Execute to Parse %: 15.82 Latch Hit %: 99.98
Parse CPU to Parse Elapsd %: 120.00 % Non-Parse CPU: 99.94

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 32.99 33.00
% SQL with executions>1: 91.76 91.72
% Memory for SQL w/exec>1: 84.31 84.28

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 205 39.03
log file parallel write 4,494 130 24.71
db file sequential read 1,511,549 124 23.65
global cache busy 60 45 8.62
write complete waits 47 7 1.29
-------------------------------------------------------------

...
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
25,865,837 1 25,865,837.0 99.7 196.90 340.57 2165703550
Module: SQL*Plus
DECLARE RetVal BOOLEAN; BATCH_SIZE NUMBER; MAX_ROWS NUMBER
; BEGIN BATCH_SIZE := NULL; MAX_ROWS := NULL; RetVal := USERNAME.PKG_ARCHIVE_DEL.fn_delete_T1 ( 500, 1000000 ); COMMIT;
END;

25,842,514 2,000 12,921.3 99.6 193.00 336.26 3890394561
Module: SQL*Plus
DELETE FROM T1 WHERE DEV = 1 AND ROWNUM <= :B1





excerpts from the Statspack report (phase-4):

...
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,438,322.45 992,682.66
Logical reads: 31,594.86 21,805.73
Block changes: 11,635.91 8,030.72
Physical reads: 16,059.68 11,083.86
Physical writes: 818.52 564.92
User calls: 61.15 42.21
Parses: 32.80 22.64
Hard parses: 0.00 0.00
Sorts: 0.37 0.25
Logons: 0.06 0.04
Executes: 34.67 23.93
Transactions: 1.45
..
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 49.17 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 100.00
Execute to Parse %: 5.41 Latch Hit %: 99.68
Parse CPU to Parse Elapsd %: 54.84 % Non-Parse CPU: 99.97

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 33.05 33.06
% SQL with executions>1: 91.44 91.42
% Memory for SQL w/exec>1: 84.07 84.05

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 1,271 66.76
db file sequential read 23,891,385 472 24.78
log file parallel write 4,272 121 6.37
latch free 5,188 14 .71
DFS lock handle 2,646 9 .46
-------------------------------------------------------------
..
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
46,644,584 1 46,644,584.0 99.2 1244.23 1311.65 733080457
Module: SQL*Plus
DECLARE RetVal BOOLEAN; BATCH_SIZE NUMBER; MAX_ROWS NUMBER
; BEGIN BATCH_SIZE := NULL; MAX_ROWS := NULL; RetVal := USERNAME.PKG_ARCHIVE_DEL.fn_delete_T1 (500, 1000000 ); COMMIT;
END;

46,621,261 2,000 23,310.6 99.2 1240.47 1307.92 3890394561
Module: SQL*Plus
DELETE FROM T1 WHERE DEV = 1 AND ROWNUM <= :B1



Questions

q1) what is phase-4 waiting for? [ it took 4x longer than phase-1 for the same number of rows, and gets per execution are almost double those of phase-1, with a poor buffer hit ratio.. I know you don't like RATIOs :) ]
q2) is "delayed block cleanout" playing some role here?
q3) do you find the functions (f1) and (f2) efficient? would you suggest something to improve the efficiency?
q4) the functions make no use of rowid or a cursor.. would using these help?
q5) would indexing the "dev" column with a bitmap index help? (In the production env this table T1 is subject to huge insert/update volume.)
q6) any other comment/point you wish to add?


Kind Regards


Tom Kyte
January 29, 2010 - 3:39 pm UTC

each 'phase' has to re-read the table from start to where it finishes, each one has to READ all of the data the prior one read (since there would not be an index on this data)


I'm not sure why you would not just:

delete from T1
WHERE start_time < (SYSDATE - no_days) and some_business_rule..........;

commit;

and be done with it, you won't lock anyone out (they better not be trying to lock rows you are deleting - if they are WHY THE HECK ARE YOU DELETING STUFF PEOPLE CARE ABOUT??? - hence, you would have no contention)

that way you do a SINGLE PASS..



I would suggest erasing all of the code you wasted time on and just do the delete. That is my suggestion. It'll be much much much faster, generate less undo and redo in total and be quite easy.



I'll never get why people do this, make the simple so very very complex.

..........deleting huge rows.. (a performance comparision)

A reader, January 29, 2010 - 4:37 pm UTC

Thanks a lot Tom,

I understand that a simple DML (delete) would be a single pass while the PL/SQL approach makes many.

Further,

a) Do we need to reorganise the table with every 'phase' of deletion? i.e. if we aim to delete 8 million rows in batches of 1/2 million, do we need to reorganise the table after every 1/2 million rows deleted?

b) You said the amount of redo/undo generated in a single pass would be less. How so, when the total number of rows deleted in multiple passes (using the code) and in a single pass (using delete from ...) would be the same?

Regards

Tom Kyte
February 01, 2010 - 9:19 am UTC

a) of course not. but again, 8million is pretty small, not sure I'd do 1/2 million at a time - since you now REALIZE YOU WILL HAVE TO START AT THE TOP OF THE TABLE EACH AND EVERY TIME..

Let's say that table is 16,000,000 rows, and it has an average row width of 200 bytes. That table is about 3.2 GB in size (block overhead, row overhead, whitespace included).

Now, if you just delete, you'll read 3.2gb of table data. Done, math is simple.

Now, assuming that every other row is to be deleted (a perfectly even distribution of the data), you'll read 1,000,000 rows the first time to delete 500,000 - or about 200 MB. The second batch will read 200+200. The third will read 200+200+200, and so on (you'll do this 16 times). You'll read.... drum roll please......

ops$tkyte%ORA11GR2> select 200 * rownum amt, sum(200*rownum) over (order by rownum) sum_amt
  2    from dual
  3  connect by level <= 16
  4  /

       AMT    SUM_AMT
---------- ----------
       200        200
       400        600
       600       1200
       800       2000
      1000       3000
      1200       4200
      1400       5600
      1600       7200
      1800       9000
      2000      11000
      2200      13200
      2400      15600
      2600      18200
      2800      21000
      3000      24000
      3200      27200

16 rows selected.


about 27-28gb of data. Hmmmmmmmmmm.


If you are going to do this bad approach - please use rowid ranges or something - search for do it yourself parallelism (diy parallelism) on this site to at least avoid that heinous overhead.
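As a rough illustration of the rowid-range idea (the full do-it-yourself-parallelism technique on this site builds the ranges from the data dictionary with DBMS_ROWID; this only shows the shape of one worker's work): each range is scanned exactly once, so no pass re-reads the blocks a prior pass already processed.

```sql
-- :min_rid / :max_rid would come from carving the segment's extents
-- into non-overlapping rowid ranges (search this site for
-- "do it yourself parallelism" for how they are generated)
delete from t1
 where rowid between :min_rid and :max_rid
   and dev = 1;
commit;
```

Run one such statement per range (serially or in parallel sessions) and the total read is one pass over the table, not the 27-28 GB computed above.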



b) everytime you start and stop operations - you add overhead, every commit - extra stuff is done, start a transaction - extra stuff is done. You want the most efficient way to do something - THINK BIG BIG BIG BATCH.

A reader, February 01, 2010 - 4:20 am UTC

Dears,

You cannot be more efficient than a straight delete:

DELETE FROM T1
WHERE dev = 1;
commit;

In your code you are committing inside the loop, which increases the possibility of an ORA-01555 snapshot too old error and also hurts the execution time.

As regards the bitmap index on dev, I think you will hit ORA-00060 deadlock detected while waiting for resource if you create this type of index, given that your table is subject to multiple concurrent DML operations

Regards

Mohamed Houri


deleting huge volume of data from partition table

ssgoh, February 09, 2010 - 4:31 am UTC

Sorry because my question wasn't clear previously.

"For the same or larger volume" actually means the whole table has around 600 million records, in 128 hash partitions. The table was designed a long time ago and has been storing data since then, and we can't change it now. So each partition has around 4.7 million records, and we are targeting around 300K records per partition to be deleted.

When we do the deletion by partition, with a where clause, will the undo tablespace need to store the before image of the targeted partition, or of the whole huge table?


Tom Kyte
February 15, 2010 - 2:08 pm UTC

... When we do the deletion by partition, with a where clause, will the undo tablespace need to store the before image of the targeted partition, or of the whole huge table? ...


undo stores the modified rows only. If you delete 300k rows, you will need undo to hold the before images of those 300k rows (a delete puts the before image into undo - the entire row image will be recorded there).

fastest ? constraints ?

A reader, July 19, 2011 - 10:44 am UTC

The best method to clean up hundreds of GBs of data is through CTAS (with a select to ensure only the data needed is retained), dropping the old table, and then renaming the new table to the original name. It also reclaims space. I don't want to lose constraints and indexes during this operation, so I would need to save the DDL of my existing table and run "alter table" later. Am I right?
Also, is the CTAS-rename method the best and fastest?
Tom Kyte
July 19, 2011 - 10:52 am UTC

a create table as select could be used, if you are willing to be offline during the purge. However, it is just about the hardest method going when compared to...

dropping a partition.


And it would be very very very slow when compared to....


dropping a partition


I would look to partitioning, not a do it yourself approach.


thanks

A reader, July 19, 2011 - 11:12 am UTC

Unfortunately, it is not partitioned. Probably we could gain by implementing partitioning to make such operations easier in the future. Please advise if this is fine for a non-partitioned table.
Tom Kyte
July 19, 2011 - 11:49 am UTC

it works, it is hard, it is error prone; it is probably better than delete if you are removing most of the data.


Follow up question

Rick, January 19, 2012 - 2:29 pm UTC

Tom,

The explanation of the approach for doing the deletes in batches using rownum makes sense. It will have to scan the table again, causing each iteration of the loop to take longer to run.

However, we've seen that if we kill the stored procedure after a while and then restart it, the iterations run faster than they did before we killed it. That seems to contradict the explanation, since rerunning the stored procedure still needs to read all of the data that has already been processed, just as before.

Or, could it potentially read in a batch faster because the order in which the data is accessed is not guaranteed, so by blind luck the process grabs the first batch quickly?
Tom Kyte
January 19, 2012 - 2:57 pm UTC

However, we've seen that if we kill the stored procedure after awhile, then
restart it, the iterations will run faster than they did before we killed it.
That seems to contradict what we've observed since rerunning the stored
procedure will still need to read in all of the data that's already been
processed as it did before.


give numbers or a test case to reproduce with.


I'd need an example to explain what might be happening.

I'd at least need a pretty comprehensive explanation of what the entire setup was, query plans and such.

Rick, January 20, 2012 - 9:51 am UTC

Ok, let me get some information together and see if I can quantify what they're seeing.

Deleting from a partitioned table based on a non-partition-key column

Aditi, April 25, 2012 - 3:00 am UTC

Hi Tom,

We have a scenario where we want to purge data based on accounting date from a huge partitioned table which is partitioned on application_id.

So in short we want something like:
delete from t
where t.application_id = 200 and t.accounting_date between :x1 and :x2

t has 600 million records and is partitioned on application_id; the amount of data to delete varies from 60k to 140 million rows depending on the accounting date range (quarter/half year/yearly)

We thought the best would be to create a bogus partition, say -999, update the records to be deleted to have this application_id, and then truncate the partition with the bogus application_id.

However, this means updating the application_id column (the partition key) in this table, which isn't possible without enabling row movement.

Do you think enabling row movement would be a good idea here?
Do you think its a good approach, do you have any better idea?
Also, correct me if I am wrong, but parallel dml and disabling of the indexes would prove beneficial in this case. Currently there are 7 local indexes on t.
Tom Kyte
April 25, 2012 - 9:50 am UTC

I would go the path of:


create table tmp as select * from t partition(the_right_one)
where application_id <> 200 or (accounting_date < :x1 or accounting_date > :x2);

/* i made the reasonable assumption that application_id/accounting_date are NOT NULL, if they are nullable, change the where clause to accommodate that */

index it like the real table

alter table t exchange partition to swap in the new one.



in all likelihood - that will be much faster/more efficient than a mass delete. Delete works great for a row; for a mass purge, delete consumes the most resources of any approach...

doing the update to mass move the data would be beyond a bad idea. An update that moves a record from partition to partition is a DELETE+INSERT!!!! you would be massively increasing the amount of work you would be doing.




Use DDL for this mass purge. You'll end up with a clean data structure - nicely packed indexes - and can do it using parallel operations that need not generate any redo or undo (redo generation is optional for you; if you do not generate redo, please do schedule a backup of the affected datafiles right after you are done)
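A rough sketch of that exchange-based purge; the partition name, the date literals standing in for :x1/:x2, and the index details are all assumptions for illustration:

```sql
-- Keep only the rows that should survive the purge (names/dates assumed).
create table tmp
  nologging parallel
as
select *
  from t partition (p_app_200)
 where accounting_date < DATE '2011-01-01'
    or accounting_date > DATE '2011-06-30';

-- Build the same local index structure on TMP as on the partition, then:
alter table t
  exchange partition p_app_200 with table tmp
  including indexes without validation;
```

The exchange itself is a data-dictionary operation, so the bulk of the work is the direct-path CTAS and the index builds.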

Deleting many rows from a big table

Dakshinamurty Y, May 04, 2012 - 1:08 am UTC

I read the entire discussion. It cleared most of my doubts. I still have few questions.

Our client is using an Oracle 11gR2 database. The client has a table with 700 million records, containing 11 years of data. That table is partitioned by month. There are 4 global indexes on the table. The client wants to archive and purge 6 years of data based on some business validation. These 6 years data will be read only data. 85-95% of the records in each of 72 partitions have to be deleted this way; each of these partitions holds between 1 million and 5 million rows.

We are planning to give an archiving and purging strategy in steps:
We take care of APPEND, UNRECOVERABLE/NOLOGGING, parallelism and backups wherever needed.
1. We identify the records that can be deleted, based on id and date (before which they can be deleted) per the business validations. We put these in a table id_date_t.
2. We create a list-partitioned table id_t from id_date_t, having the same 72 partitions as the original table t. We put the ids into each partition based on the dates.
3. We take the records in each of the 72 partitions of t that exist in id_t and move them to a different database/flat file.
4. We take the 5-15% of records in each of the 72 partitions of t that do not exist in id_t and put them in a separate partitioned table part_t.
5. The above 4 steps we can do online while transactions are going on. Now comes the actual deletion part. For that we are planning to drop those 72 partitions and move part_t into t. Then we have to validate the global indexes.

For Step 5, we are thinking that it requires a downtime. Client is having a P7 IBM machine with 64 CPU and 8 threads/CPU, 256 GB RAM.

My two questions are:
1. Is a 12 hr downtime sufficient?
2. What happens to the DR site during step 5? Will the main database site be affected performance-wise because of this? DR replication is via Data Guard.

Tom Kyte
May 06, 2012 - 2:28 pm UTC

you can do the partition operations while maintaining the global index - see the update indexes clause of the ALTER TABLE command

http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_3001.htm#SQLRF53502


it does not require downtime.




But I don't see how you can do steps 1-4 in an online fashion? what happens if someone updates a record?

Re: Deleting many rows from a big table

Dakshinamurty Y, May 07, 2012 - 12:20 am UTC

I said "These 6 years data will be read only data. "
So, there is no chance to update the data.

Ours is a banking scenario. Millions of end users will be doing transactions per day. The bank users/Ebanking end users will be referring to this big table.

I read the link you provided. It says "When you perform DDL on a table partition, if an index is defined on table, then Oracle Database invalidates the entire index, not just the partitions undergoing DDL. This clause lets you update the index partition you are changing during the DDL operation, eliminating the need to rebuild the index after the DDL."

From this I got to know that index update is much faster. However, I need the downtime, as the global index will be invalidated, and before it is revalidated users might fire queries on this table which would go for FTS; CPU consumption might increase drastically and might bring the system to a halt.

Going for the UPDATE INDEXES clause, I am thinking I can take a downtime of 4 hours maximum. What is your take on that?
Tom Kyte
May 07, 2012 - 5:26 am UTC

What you seem to have said was that 85% of the data would be purged, the rest would not. The 85% was 'read only' - you didn't say anything about the remaining live data.

You wrote:

The
client wants to archive and purge 6 years data based on some business validation. These 6 years
data will be read only data. 85-95% records from each partition have to be deleted by this way from
72 partitions.


so, 85-95% of the data will be purged - but what about the remaining 5-15% - is it read only? remember - we don't know or have your schema, we don't understand your data.

You better make sure the 72 partitions you are doing this to are in fact 'read only'.


From this I got to know that index update is much faster

No, actually it is much slower - however - it is ONLINE, you do not take an outage. That is the advantage. Who cares if it takes longer if the end users have continuous access. That is the point.


The global index WILL NOT GO INVALID - that is the entire point of the update indexes clause. The index is maintained during the partition operation.


If you use update indexes - you will have 0.0 hours of downtime.


and if you do it offline - without update indexes - no one would be able to give you an estimate of how long it would take - you haven't provided details that would allow people to do that.


700 million rows is a useless metric - we have no clue how big that is (other than it is larger than 700MB - but we don't know how much larger)

we don't know your IO capabilities

we don't know your machine capability (cpu, memory)

and even if we did - we'd have to benchmark. which is what you'll be doing in your test environment to get numbers.
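For reference, the online variant under discussion is a one-liner per partition; the table and partition names here are placeholders:

```sql
-- Drop an old monthly partition while keeping the global indexes VALID.
-- Slower than a plain drop, but requires no outage.
alter table big_table
  drop partition p_2005_01
  update global indexes;
```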

Deleting many rows from a big table

Dakshinamurty Y, May 07, 2012 - 9:01 am UTC

Sorry for not being clear and sorry for not understanding "ALTER TABLE .... UPDATE GLOBAL INDEXES" clause properly.

I exactly got the solution, what I was looking for: Using ".... UPDATE GLOBAL INDEXES" will keep the global indexes in "VALID state".

Now, I will do a POC on UPDATE GLOBAL INDEXES and come back to you with the results, for AskTom users' benefit.

partition at the very begin?

A reader, May 08, 2012 - 10:13 pm UTC

Hi tom,
Per your first reply, by cases 2 and 3 you mean the design should consider partitioning from the very beginning, right?
If the table is already in production without partitioning, partitioning it and then doing either a 'parallel delete' or a 'partition drop' will
not solve the issue, as 'partitioning the existing table might take some time', right?

case2:
partition the data, do a parallel delete. Each partition will use its own rollback
segment, each will run in parallel.
case3:
partition the data so that you can do a DROP partition instead of DELETE.
Tom Kyte
May 10, 2012 - 3:00 am UTC

'partitioning the existing table might take some time',
right?


a one time pain to fix a long term ongoing issue.


to get rid of 1.5 out of 6 million - and given the table is not currently partitioned - I would be looking to

a) use a create table as select to save the 4.5 million records (no delete)
b) have that create table as select implement partitioning
c) so I never have to have an outage to purge data again
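Steps (a)-(c) might look like this sketch, reusing the table and event codes from the original question; the partition layout is an illustrative assumption:

```sql
-- Keep the 4.5 million rows and introduce partitioning in one pass,
-- so future purges become partition truncates instead of deletes.
create table ctdnov_new
  nologging
  partition by list (ctdevt)
  ( partition p_purgeable values (4, 15),
    partition p_keep      values (default) )
  parallel
as
select * from ctdnov
 where ctdevt not in (4, 15);

-- re-create indexes, constraints and grants, then swap the names:
-- drop table ctdnov;
-- rename ctdnov_new to ctdnov;
```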

Mass DELETE still slow with partitioned tables

Brian, December 13, 2012 - 8:02 pm UTC

Hi Tom,

You are the best, all the time!

I followed your second recommendation of partitioning tables to facilitate mass deletions, as this choice provides some level of transparency to our applications. However my first attempt did not seem to give a result as impressive as I expected...

Here is my database:

SQL> select * from v$version where banner like 'Oracle%';

BANNER
--------------------------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

Here is my DDL for this table given by SQL Developer Data Modeler.

CREATE TABLE TRANSACTIONS 
    ( 
     TRANSACTION_ID NUMBER (18)  NOT NULL , 
     ITEM_ID NUMBER (4)  NOT NULL , 
     MERCHANT_ID VARCHAR2 (16 CHAR)  NOT NULL , 
     ORDER_ID NUMBER (18)  NOT NULL , 
     KIND VARCHAR2 (16 CHAR)  NOT NULL , 
     ACTION VARCHAR2 (12 CHAR)  NOT NULL , 
     REASON VARCHAR2 (16 CHAR)  NOT NULL , 
     COUNT NUMBER (18)  NOT NULL , 
     RATE NUMBER (38,18)  NOT NULL , 
     AMOUNT NUMBER (38,2)  NOT NULL , 
     NOTES VARCHAR2 (16 CHAR) , 
     TRADING_DAY CHAR (8 CHAR)  NOT NULL , 
     TRANSACTION_TIME TIMESTAMP  NOT NULL 
    ) 
    LOGGING 
    DISABLE ROW MOVEMENT 
    PARTITION BY LIST ( ACTION ) 
    ( 
    PARTITION HELD VALUES ( 'HOLD', 'LIFT' ) , 
    PARTITION PERM VALUES ( DEFAULT ) 
    ) 
;
ALTER TABLE TRANSACTIONS 
    ADD CONSTRAINT TRANSACTIONS_ACTION_CK 
    CHECK (ACTION IN ('DEPOSIT',
'WITHDRAW',
'HOLD',
'LIFT',
'SECURE',
'RELEASE',
'CREDIT',
'CHARGE'))
;
CREATE INDEX TRANSACTIONS_MER_IDX ON TRANSACTIONS 
    ( 
     MERCHANT_ID ASC 
    ) 
    LOGGING 
;
CREATE INDEX TRANSACTIONS_ACTION_IDX ON TRANSACTIONS 
    ( 
     ACTION ASC 
    ) 
    LOGGING 
;
CREATE INDEX TRANSACTIONS_KIND_IDX ON TRANSACTIONS 
    ( 
     KIND ASC 
    ) 
    LOGGING 
;
CREATE INDEX TRANSACTIONS_DAY_IDX ON TRANSACTIONS 
    ( 
     TRADING_DAY ASC 
    ) 
    LOGGING 
;
ALTER TABLE TRANSACTIONS 
    ADD CONSTRAINT TRANSACTIONS_PK PRIMARY KEY ( TRANSACTION_ID, ITEM_ID )
;
ALTER TABLE TRANSACTIONS 
    ADD CONSTRAINT TRANSACTIONS_MER_FK FOREIGN KEY 
    ( 
     MERCHANT_ID
    ) 
    REFERENCES MERCHANTS 
    ( 
     MERCHANT_ID
    ) 
    DEFERRABLE 
;

(The values in ACTION column never change so I felt it's pretty safe to disable row movement.)

When the table contained the following number of rows in the 2 partitions,

HELD 3052156 ('HOLD', 'LIFT')
PERM 2099596 (DEFAULT)

... the following SQL took a little over 20 minutes to get rid of the 3 million rows.

DELETE FROM TRANSACTIONS WHERE ACTION IN ('HOLD', 'LIFT');

Well, I have to say 20 minutes is still surprisingly long just to get rid of the HELD partition completely - I'd hoped Oracle would treat this delete the same way as truncating the partition and rebuilding the indexes...

Did I do something wrong here? Or is 20 minutes about the best I can get?

Thanks for your insights!

Tom Kyte
December 17, 2012 - 4:04 pm UTC

you would partition in order to NOT DELETE. You would partition so you can drop or truncate a partition.


partitioning will not speed up a delete (well, probably not, it could make a full scan faster - but the actual delete operation will not be fast)

you partition in order to avoid having to say DELETE in the first place.


truncate the partition that contains hold and lift.
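Since the HELD partition in the DDL above holds exactly the HOLD and LIFT rows, the purge could be a single statement; the indexes on this table are global (non-partitioned), so they can be maintained in the same step:

```sql
-- Empties the HELD partition without logging a delete per row.
alter table transactions
  truncate partition held
  update global indexes;
```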

Multi-fold improvement

Brian, December 19, 2012 - 8:49 pm UTC

Hi Tom,

Thanks for making this very clear.

I've found it helpful to split my table into more than 2 partitions - actually 8, one for each possible value of the ACTION column. As soon as I did that I believe I could remove the index on the ACTION column and make insertion a bit faster. Am I right?

Now I have to truncate 2 partitions instead of just one, and rebuild the indexes afterwards. Too bad I can't use the UPDATE INDEXES clause to take care of the index rebuilds - is it possible to use a single TRUNCATE PARTITION statement on 2 partitions?

ALTER TABLE TRANSACTIONS TRUNCATE PARTITION HOLD;
ALTER TABLE TRANSACTIONS TRUNCATE PARTITION LIFT;
ALTER INDEX TRANSACTIONS_MER_IDX REBUILD;
ALTER INDEX TRANSACTIONS_DAY_IDX REBUILD;
ALTER INDEX TRANSACTIONS_KIND_IDX REBUILD;

Now it takes a little over 5 minutes to get rid of the 3 million rows in these 2 partitions, a multi-fold improvement over the 20 minutes using DELETE.

Best,
Brian
Tom Kyte
December 20, 2012 - 5:10 pm UTC

You could probably remove the index - if the index was only on action.


you can truncate and maintain the global indexes if you like.


currently (11g and before) partition operations work on a single partition only.

index based delete

Ravi B, July 09, 2013 - 3:31 am UTC

Tom,

I have a table, say BIG_TABLE, which has an ID column in addition to several other columns. Each ID has about 12,100,188 rows.

1) is it efficient to have an index on ID column and say
delete from BIG_TABLE where ID = 123

If so, are we doing double the work deleting data as well as index?

2) if i do delete from BIG_TABLE where ID = 123 without an index it would do a full table scan.

What is the right way of doing this?
Tom Kyte
July 16, 2013 - 2:59 pm UTC

1) i seriously doubt it. You really want to do 12,100,188 single block reads??? I'd rather full scan (or hopefully the table is partitioned and you can scan just a single partition maybe?)

2) AWESOME, i love full scans, love em.



you really don't give enough details to say anything else.

full table scan delete

RaviB, July 18, 2013 - 6:16 am UTC

Yes full table scans are awesome.

But here is the issue. We have a product which works on both sqlserver and oracle. I am oracle guy just trying to learn sqlserver.

Whether it is deletes, selects or creating CSV files from the database, SQL Server wins hands down on performance. Our QA uses the same hardware for testing both systems. I am talking about a minutes-to-hours comparison. SQL Server handles millions of row deletes in a matter of 10 minutes, whereas on the same table, same number of rows, same hardware, Oracle takes more than an hour. I run the SQL from SQL Developer and the query returns in a few secs; the execution plan seems to be OK. For the delete, both SQL Server and Oracle do a full table scan. I have no answer for our QA people.
Tom Kyte
July 18, 2013 - 5:10 pm UTC

trace it, see what it is waiting on. perhaps your redo logs are too small and you are waiting on log switches.

we'd need to see what it is waiting on.


remember, when you run a query from sqldeveloper or whatever - it gets the first few records, shows them and STOPS. You haven't run the entire query.

full table scan delete

RaviB, July 18, 2013 - 6:23 am UTC

We have MEMORY_TARGET set to 12GB. I think it is more than enough for the HASH joins etc.
Tom Kyte
July 18, 2013 - 5:10 pm UTC

there shouldn't be any hash joins, it is a simple single table full table scan with a where clause.

RaviB, July 18, 2013 - 5:44 pm UTC

The HASH joins are for other SQL queries (not deletes) which run slowly on Oracle but quicker than with NESTED LOOPS in our situation. An INSERT INTO ... SELECT with a HASH join plan takes about 45 mins, compared to the SELECT with a NESTED LOOPS plan, which takes more than 3 hrs. But the same INSERT SQL runs in 15 minutes on SQL Server.

For the delete, yes, it is doing a full table scan but takes hours to finish. Which level of tracing do you recommend for this issue?

Thanks,
Ravi
Tom Kyte
July 18, 2013 - 5:56 pm UTC

then I would guess that you have a different set of configuration parameters for sqlserver.


use dbms_monitor.session_trace_enable( waits => true );


also, look at things from my perspective. we were talking about a single table full table scan for a delete. bringing up hash joins was just "not right", not relevant, a red herring from left field, confusing, etc. I cannot guess what you mean by things.


and deletes can certainly use hash joins, hash joins are not for things other than deletes. hash joins in a delete can be awesome too.
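A sketch of the tracing Tom suggests, wrapped around the slow statement; the trace-file lookup assumes 11g's V$DIAG_INFO view, and the table/predicate are placeholders:

```sql
-- Trace the current session, including wait events, around the slow DELETE.
exec dbms_monitor.session_trace_enable( waits => true, binds => false );

delete from big_table where id = 123;   -- the slow statement

exec dbms_monitor.session_trace_disable;

-- 11g and later: locate the trace file to run through tkprof
select value from v$diag_info where name = 'Default Trace File';
```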

Jess, September 05, 2013 - 7:44 pm UTC

Hi Tom,

Hoping you'll be able to help with fixing a performance problem in deleting rows from a table for some code that I've just inherited.

There is a non-partitioned table of unfulfilled orders holding about 15M wide rows. The data grows about 3M a year overall but fluctuates from a daily perspective (e.g., 50K can be added today, but then 35K removed tomorrow).

Other than the records being inserted into this table, the queries that run against it a lot and often are of the following format:

select /*+ parallel (ord,20) */ <whatever you need>
from outst_orders ord
where filled_status = 'N' and priority_customer = 'N'
order by customer_number desc, invoice_number desc, invoice_date desc, invoice_line desc;

This type of query is meant to return about 10% of the data in the table.

This is in an OLTP database, with the table indexed as follows:
- B*Tree IDX1 on (invoice_date, invoice_number, customer_number)
- B*Tree IDX2 on (invoice_line)

(I believe there used to be queries just for invoice_line, which is why it's in a separate index--it seems to make sense to combine the 2 indexes into 1 at this point, I'd think. Also, the column order in the index is ascending, which doesn't make sense given that the queries all want to be descending... It should matter, no?)

There used to be an index on filled_status, but we've removed it to induce an FTS every time as a temporary fix, as that was producing faster results at the time, since the table was growing uncontrollably because nothing was ever removed from it. Now that we're capping the growth by offloading processed records, does it make sense to stick with an FTS, or would an index now be an option?

Now with a change in the data model, we're removing the filled orders into another table.

The query to do so is called from the Java, so there are thousands of threads pinging the following query one at a time:

delete from outst_orders where
customer_number = <X> and invoice_number = <Y> and
invoice_date = <Z> and invoice_line = <Q>
and filled_status in ('F','O','C');

This produces the following plan:

-----------------------------------------------------------------------------------
|Id| Operation |Name |Rows|Bytes|Cost(%CPU)|Time |
-----------------------------------------------------------------------------------
| 0|DELETE STATEMENT | | 1 | 49 | 5 (0) |00:00:01|
| 1| DELETE |OUTST_ORDERS | | | | |
|*2| TABLE ACCESS BY INDEX ROWID|OUTST_ORDERS | 1 | 49 | 5 (0) |00:00:01|
|*3| INDEX RANGE SCAN |OUTST_ORDERS_IND1| 1 | | 4 (0) |00:00:01|
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INVOICE_LINE"=1234 AND ("FILLED_STATUS"='F' OR "FILLED_STATUS"='O'
       OR "FILLED_STATUS"='C'))
3 - access("INVOICE_DATE"=SYSDATE@!-20 AND "INVOICE_NUMBER"=98765
AND "CUSTOMER_NUMBER"=65646354)

Each delete takes about 30 seconds to complete. Given that each run is expected to delete about 30-50K records, that adds up to a long time.

What can we do to speed this up? At these volumes/percentages, index range scan seems like an ok approach, though I would think that moving the last column from the INX2 into IDX1 might help.

Any tips would be much appreciated.

Thank you.
Tom Kyte
September 09, 2013 - 11:25 am UTC

... does it make sense to stick to an FTS or would an index now be an
option? ...

to get 1.5 million rows - usually an index is a bad idea. think about what would happen if many of the table blocks had to be read into the cache.

each single block read is about 5ms. 1,000,000 times 5ms is 5,000 seconds - about 1.4 hours. And you've just clobbered your buffer cache.

likely better to multiblock read directly into PGA bypassing the inefficient buffer cache. probably.


... so there are thousands of threads
pinging the following query one at a time: ...

that is pretty dumb, do you have thousands of cores????



BULK OPERATIONS - one sql to move, one sql to delete - one thread (using parallel if sensible - but parallel SMALL, not thousands)

slow by slow code - stinks.


if you do something "fast" a lot - it takes a long long time; that is what you are seeing. java and row by row processing is the bane of my existence. I'm really quite sick of seeing it over and over and over and over and over and over.... every single time.


full scans

single sql statements from a single thread in bulk.


http://tinyurl.com/RWP-DW-PART2
http://tinyurl.com/RWP-DW-PART3
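In terms of the tables in this question, the "one SQL to move, one SQL to delete" pattern would look roughly like this; the history table name is an assumption:

```sql
-- Move the filled orders in bulk, then remove them in bulk.
insert /*+ append */ into filled_orders_hist
select * from outst_orders
 where filled_status in ('F','O','C');

delete from outst_orders
 where filled_status in ('F','O','C');

commit;
```

Two statements, one session, one commit - instead of thousands of threads issuing single-row deletes.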


Jess, September 09, 2013 - 5:28 pm UTC

Hi Tom,

Thank you for answering and for your suggestion. Just a quick follow-up if I could.

In this scenario ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2345591157689#7313144400346498164 above), the delete removes less than 1% of the data.

Changing it into a "delete where filled_status in ...", it will do an FTS, as there is no index on filled_status. It seems to make sense to add an index on this column (to pull 50K of 2KB rows) for the delete. Would you agree?

Also, for the main select query (with its 2 indexes), it transpires that we definitely don't want to use an index, so would it make sense to drop them, so we don't have to maintain them, and just do FTSs? It also follows from what you're saying that the count of rows we return is as important as the % of data they represent. As the table grows, the rows returned will be a smaller and smaller percentage, but the count of them will remain static. If so, then it sounds like there is no tipping point in the future where we'd want to use an index. Did I understand correctly?

If the number of records we select were to drop to, say, 30K (out of 20-odd Mil), and we wanted to bring the index back, I would think that we'd want all 4 columns in the same index, all descending. Do you agree?

(Btw, yes, doing the delete multi-threaded is not the best of ideas, but I'm looking at what was already implemented before my time. And thank you for the links - extremely informative.)

Thanks as always for your help.

Tom Kyte
September 10, 2013 - 10:11 am UTC

instead of tons of silly little java threads each taking 30 seconds to use an index to find tens of thousands of rows - why are you not looking at a single delete to delete everything in a single statement???

why turn a batch job into a high end oltp system? with concurrency waits and everything else?

why is this not a single delete

and why isn't the long term plan to get partitioning in place?


using an index by thousands of java threads is bad when you could just be doing a single bulk statement (maybe in parallel) using big bulk operations.



you cannot get blood from a stone, you have stone here. If you want to get something from it, you'll have to give serious consideration to *changing code* - that is what we do for a living - change code.

Jess, September 10, 2013 - 11:00 am UTC

Hi Tom,

I've changed the delete to a bulk operation off the back of your first reply (thank you!), so that's been done.

We are not partitioning the table because we'd still need to scan every partition for processing (for business reasons, all data in the table is looked at every time, hardly ever for a specific day).

My second question was about further optimisation after the delete has been changed to "delete where filled_status in ...". Because there is no index on filled_status, this bulk delete will do a full table scan. If, in the future, we plan to delete about 30-50K of 2KB rows per run from a 20 million row table, would it make sense to add an index on this column and do an indexed delete? Or would it make sense to leave it as an FTS? Or would it make sense to use the existing index(es) and delete from the table where <indexed columns> in (select <indexed columns> from the table where filled_status in (...))?

Also, it follows from what you're saying that the count of rows we return is as important as the % of data they represent. As the table grows, the rows returned will be a smaller and smaller percentage, but the count of them will remain static. If so, then it sounds like there is no tipping point in the future where we'd want to use an index. Did I understand correctly?

Thanks again for your help.
Tom Kyte
September 10, 2013 - 11:24 am UTC

... We are not partitioning the table because we'd still need to scan every
partition for processing (for business reasons, all data in the table is looked
at every time, hardly ever for a specific day). ...

then partition by the criteria your delete process is searching by??? You don't have to partition by a date.


If, in the future,
we plan to delete about 30-50K of 2Kb rows per run from a 20 Mil table, would
it make sense to add an index on this column and do delete indexed?


table has about 6.6 million blocks (assuming 8k block)
you might be hitting as many as 50,000 of them, plus all of the indexes.

it would probably make sense to use an index in that case

but I'd still rather be able to do DDL as all of the redo/undo is going to add up too - and the index maintenance. be nice if we had a two partition table and as the rows became candidates for deletion - we'd move them to the "delete me" partition (so we are incurring the delete+index maintenance continuously, all of the time, as it happens) and then just truncate the delete me stuff (including its local index)
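The two-partition "delete me" idea could be sketched like this (all names illustrative; enabling row movement is what lets an UPDATE of filled_status migrate a row between partitions):

```sql
create table outst_orders_p
  partition by list (filled_status)
  ( partition p_live      values ('N'),
    partition p_delete_me values ('F','O','C') )
  enable row movement
as
select * from outst_orders;

-- the nightly purge is then DDL, not DELETE:
alter table outst_orders_p
  truncate partition p_delete_me
  update global indexes;
```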


... If so, then it sounds like there is no
tipping point in the future where we'd want to use an index. ...

there is a tipping point but it is less about the number of rows than the number of blocks you'll hit. say you have a 100,000,000 row table. And you are going to delete 1% of it.

what if each block held 100 rows
and the rows to delete were uniformly distributed throughout the table

then - your delete would modify every single block in the table, best to full scan (best to use DDL probably in this case - direct path read, direct path write, no undo, no redo, parallel...)


on the other hand, if your 1,000,000 rows were located together, you'd be hitting 10,000 blocks, probably good for an index


think of the work that needs to be done to retrieve the rows via an index, it is not number of rows, it is "how many blocks will I process"

Jess, September 10, 2013 - 4:06 pm UTC

Thanks Tom, appreciate it.

The selection criteria for partitioning are extremely skewed. In the 20M-row table, every day about 30K rows go from 'unfilled' to 'filled', so they would be the ones moving partitions, getting inserted into history, and then deleted. I wouldn't have thought that a skew like that would be desirable. The rows become tagged for deletion (i.e., 'order filled') as part of the overnight process. All orders in this table get checked against another table; filled ones are marked for deletion, copied to a history table, and deleted. Would the DDL approach still be the desired one in this instance? (Not sure why a local index would be required at all here, but that's beside the point.)

The blocks are 8K in this system. The rows are approx. 2K, so 4 rows per block in this case. For deletes, the data is somewhat clustered. That is, the data trickles in every day, but unfilled orders get filled in chunks per company, so they should be in close proximity. Based on what you're saying, I'll give the index a go to see if it works out better (once the volumes reach this stable point).

Thank you for explaining indexing in terms of blocks--everyone always talks about percentages of data returned. This has been extremely helpful.


david huang, January 09, 2014 - 1:10 pm UTC

tom :

I have learned a lot from your ideas and methods.
I have one idea. If I want to delete 80% of the 3 million rows in a table, could I copy the remaining 20% of the rows into a new table, drop (or truncate) the original table, and rename the new table to the original name?

David
Tom Kyte
January 09, 2014 - 7:28 pm UTC

if possible, I would do something similar to

copy out the 20%.
drop the old table
rename the new to old
grant on it, index it, etc...

or since it is pretty small (just 600k rows), you could

copy out 20%
truncate
insert /*+ append */ back in the 20%
done
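As a sketch of that second route, with an assumed keep criterion:

```sql
-- 1) copy out the 20% to keep (the STATUS predicate is illustrative)
create table keep_rows as
  select * from t where status = 'KEEP';

-- 2) empty the table cheaply
truncate table t;

-- 3) direct-path the kept rows back in
insert /*+ append */ into t
  select * from keep_rows;
commit;

drop table keep_rows;
```

This keeps the original table's grants, indexes and constraints in place, at the cost of the indexes being maintained during the insert.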

A reader, January 09, 2014 - 8:02 pm UTC

Do you have this explained in your book effective design?
Tom Kyte
January 10, 2014 - 5:36 pm UTC

Not that I recall... Not everything can be in a single book (I tried with Expert One on One Oracle :) that was 1,365 pages long.... still didn't get *everything* in there..)

A reader, January 15, 2014 - 10:38 pm UTC

what database does oracle use? db2 or oracle?

krishna, March 09, 2014 - 4:07 pm UTC

Hi Tom, I recently attended an interview where they asked me how to delete millions of records from a table which is regularly getting updated. I told them all the options you recommend, i.e. using CTAS then dropping the old table and renaming the new to the old, rebuilding indexes, partitioning logic, NOLOGGING, etc. But they kept insisting that none of these methods would work when you have to delete the records in real time, and that these methods would take too much time. Please let me know what the correct approach should be.

Deleting million rows

Pradeep, August 06, 2014 - 1:09 pm UTC

We have to purge around 70% of the data from two big tables, TEST and TEST_EVENT, in our database.
We followed the approach below. After this activity we wanted to keep the old tables for some time so that in case of any issues we could refer to them.

We followed this approach in pre-production:

1. Created new tables (TEST_NEW and TEST_EVENT_NEW) via CTAS (PARALLEL, NOLOGGING) with the data we wanted to keep.
2. Created indexes (PARALLEL, NOLOGGING) on the new tables (INX_TEST_NEW etc.).
3. Renamed the original tables and indexes with the suffix BKP (renamed TEST to TEST_BKP and TEST_EVENT to TEST_EVENT_BKP).
4. Renamed the newly created tables and their indexes to the originals' names - renamed TEST_NEW to TEST, TEST_EVENT_NEW to TEST_EVENT, etc.
5. Provided all grants on the fresh tables TEST and TEST_EVENT.
6. At the end of the activity, we have the fresh tables TEST and TEST_EVENT with the data we wanted to keep, and TEST_BKP/TEST_EVENT_BKP with the old data to refer to for some time.

When the whole process was completed and tested, everything was fine, except that we can still see the new table and index names (TEST_NEW and TEST_EVENT_NEW) in AWR reports. This shouldn't be, as these have been renamed and don't exist anymore. Checking ALL_TABLES, ALL_INDEXES and ALL_OBJECTS, they are not there.

Really appreciate your help.

What hapens if new row is added after consistent read that satisfies the delete condition?

Ani, September 23, 2014 - 12:04 pm UTC

Suppose I have a yes/no column, with about a million rows holding the value "no". I want to delete all rows marked "no", so I issue a single DELETE statement.
First the consistent read happens: Oracle identifies all the rows with the "no" value. Then for each of those rows it does a current read and deletes the row. But in the meantime, an UPDATE statement changes an existing row from "yes" to "no" and commits. My assumption is that, since Oracle has finished the consistent-read phase and already selected the rows to delete, it will not pick up this new "no" row, even though it was committed before the DELETE statement finished. And it will not restart the DELETE in this case, because the rows it selected are themselves unchanged. So even after deleting all the "no" rows, I can still be left with a row marked "no". Is this scenario correct?
If yes, is there any way to take care of this without locking the whole table?

Thanks.
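One pragmatic workaround — my own sketch, not an answer from the thread — is to loop the DELETE until a pass affects zero rows, so any row committed as "no" during an earlier pass is caught by a later one (hypothetical table T with column FLAG):

```sql
-- Repeat the delete until a pass finds nothing left to remove.
BEGIN
  LOOP
    DELETE FROM t WHERE flag = 'no';
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```

This narrows the window considerably, but it still cannot guarantee that no "no" row exists at a given instant while other sessions keep committing updates; only serializing the writers (e.g. a table lock) gives that guarantee.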

"lose" rows with dbms_redefintion?

Michael, January 07, 2015 - 4:21 pm UTC

Hi Tom,

I have to delete about 90% of a 611-million-row table. I will CTAS the 10% to keep into a new table, drop the old table, and rename new to old afterwards.

I wonder if there would be any way to "lose" this 90% while doing an online table redefinition?

I'm on Oracle 11.2.0.4. With Oracle 12.1 it would be possible to put VPD policies on the old table to lose the rows. Is there any trick for 11.2?

Regards,
Michael

Dbms parallel execute

Walaa saafan, January 06, 2021 - 11:49 am UTC

Hi Tom,

I need to ask about using DBMS_PARALLEL_EXECUTE to delete rows from a table. Is it better than a plain DELETE statement in the case where I can't partition the table?
Chris Saxon
January 06, 2021 - 5:08 pm UTC

It's easier to try plain parallel DML:

alter session enable parallel dml;
delete /*+ parallel */...

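A fuller sketch of that approach (the table big_table and the predicate purge_flag are hypothetical; the degree of parallelism is just an example):

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Delete the unwanted rows with parallel slaves
DELETE /*+ PARALLEL(t 8) */ FROM big_table t
 WHERE purge_flag = 'Y';

COMMIT;  -- after parallel DML, the session must commit (or roll back)
         -- before it can query the modified table again
```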

If you're on 12.2 or higher I'd look into a filtered table move first though:

alter table to_delete_from
  move including rows
  where rows_to_keep = 'Y';
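As a hedged aside (my addition, not part of the answer above): in 12.2 the filtered move also accepts the ONLINE keyword, which keeps the table available for DML and maintains the indexes during the move; without ONLINE, the indexes are left unusable and must be rebuilt afterwards. Same placeholder table and predicate as above:

```sql
-- 12.2+ filtered table move; rows not matching the predicate are discarded
ALTER TABLE to_delete_from
  MOVE ONLINE
  INCLUDING ROWS WHERE rows_to_keep = 'Y';
```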


For other options see:

https://blogs.oracle.com/sql/how-to-delete-millions-of-rows-fast-with-sql