
Question and Answer

Tom Kyte

Thanks for the question, Bo.

Asked: August 16, 2002 - 8:42 am UTC

Last updated: May 22, 2013 - 9:00 pm UTC

Version: 8.1.6

Viewed 100K+ times!

You Asked

Hi, Tom.
Thank you for taking this question.
In a recent big-volume data processing job, I used a bulk delete. However, I found the speed very slow. As a comparison, I used a regular delete on a similar amount of data in the same table, and the speed was about the same.

For this particular case, we have around 300,000 IDs to be deleted.

The REGULAR DELETE looks like:
DELETE FROM t
WHERE id IN (select id from a)

The BULK DELETE is:
DECLARE
    TYPE num_nt   IS TABLE OF VARCHAR2(20);
    TYPE descr_nt IS TABLE OF VARCHAR2(20);

    nt_x_nums    num_nt   := num_nt();
    nt_a_descrs  descr_nt := descr_nt();

    v_cnt_sqlrow NUMBER := 0;
BEGIN
    SELECT id BULK COLLECT
      INTO nt_x_nums
      FROM a;

    v_cnt_sqlrow := SQL%ROWCOUNT;
    -- DBMS_OUTPUT.PUT_LINE(v_cnt_sqlrow);

    FORALL i IN nt_x_nums.FIRST..nt_x_nums.LAST
        DELETE FROM t
         WHERE id = nt_x_nums(i)
        RETURNING id BULK COLLECT INTO nt_a_descrs;
END;


Could you please provide a rule to determine when to use a bulk delete?

Thanks,

and Tom said...

You would always want to use the "regular" delete.

Less code, and it (generally) runs in less time.

the rule to decide when to use a bulk delete: "use it when you cannot do the delete in a single SQL statement".

Here, you probably have a table with some number of indexes, and the delete simply does a lot of work. Maintaining the index structures for 300,000 deletes can be "cumbersome". Additionally, lots of UNDO and REDO is generated.


If 300,000 was a large percentage of the table, it is sometimes better to:

create table temp nologging as select * from t where id not in ( select id from a );

(keep the rows you want)

index temp (unrecoverable, in parallel)
grant on temp (as you had for t)
drop table t;
rename temp to t;
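For illustration, those steps spelled out as SQL. This is only a sketch; the single index, the index name, and the grantee below are made-up examples, not part of the original answer:

create table temp nologging
as
select * from t where id not in ( select id from a );

-- rebuild whatever indexes T had, without logging and in parallel
create index temp_pk on temp(id) nologging parallel;

-- re-issue whatever grants existed on T, for example:
grant select, insert, update, delete on temp to some_user;

drop table t;
rename temp to t;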


Rating

  (64 ratings)


Comments

bulk delete several tables

A reader, November 14, 2002 - 11:17 am UTC

Hi

I have to delete from 6 tables where a column is compared to a column of a 15-million-row table.

I was wondering if I could do this in bulk deletes such as

declare
type x is table of number;
x_array x;
begin
select cod_id bulk collect
into x_array
from x_table;

....................................

Now I don't know how to proceed, because basically the delete is something like:

delete t1 where cod_id = (select cod_id from x_table);
delete t2 where cod_id = (select cod_id from x_table);
delete t3 where cod_id = (select cod_id from x_table);
delete t4 where cod_id = (select cod_id from x_table);
delete t5 where cod_id = (select cod_id from x_table);
delete t6 where cod_id = (select cod_id from x_table);

but obviously, to do this I have to query the 15-million-row table 6 times. I was wondering if I can query it once and then use that result for all 6 tables in PL/SQL?

thank you

Tom Kyte
November 14, 2002 - 7:34 pm UTC

delete t1 where cod_id in ( select cod_id from x_table );
...
delete t6 where cod_id in ( select cod_id from x_table );


Matters not if x_table has 15 BILLION rows -- it is the size of t1, ... t6 that matters. Consider:

big_table@ORA920.LOCALHOST> create table t ( x int );
big_table@ORA920.LOCALHOST> insert into t select rownum from all_users;
big_table@ORA920.LOCALHOST>
big_table@ORA920.LOCALHOST> analyze table t compute statistics;
big_table@ORA920.LOCALHOST>
big_table@ORA920.LOCALHOST> select count(*) from big_table;

COUNT(*)
----------
1000000
big_table@ORA920.LOCALHOST>
big_table@ORA920.LOCALHOST> set autotrace on
big_table@ORA920.LOCALHOST>
big_table@ORA920.LOCALHOST> delete from t where x in ( select id from big_table );

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=38 Card=36 Bytes=216)
   1    0   DELETE OF 'T'
   2    1     NESTED LOOPS (Cost=38 Card=36 Bytes=216)
   3    2       TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=36 Bytes=72)
   4    2       INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=1 Card=1 Bytes=4)




Statistics
----------------------------------------------------------
64 recursive calls
38 db block gets
92 consistent gets
13 physical reads
8516 redo size
788 bytes sent via SQL*Net to client
822 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
36 rows processed

big_table@ORA920.LOCALHOST>
big_table@ORA920.LOCALHOST> set autotrace off


that was really fast -- big_table is really big but it didn't really *matter*

t1 to t6 are even bigger :p

A reader, November 15, 2002 - 3:04 am UTC

Hi

Tables t1 to t6 have around 50 million rows each!

I don't understand why it depends on the size of t1 to t6. I think in the example you provided it's using an index scan because we are comparing values from t1's column to big_table's PK, no?
I thought sub-queries always have to be resolved first; that's why I thought it would query the whole table 6 times if I ran 6 deletes.

I have this PL/SQL anonymous block

DECLARE
TYPE X_EMPNO IS TABLE OF X.EMPNO%TYPE;
X_EMPNO_ARRAY X_EMPNO;
BEGIN
SELECT empno BULK COLLECT
INTO X_EMPNO_ARRAY
FROM emp;
FORALL i IN X_EMPNO_ARRAY.FIRST..X_EMPNO_ARRAY.LAST
DELETE FROM x
WHERE empno = X_EMPNO_ARRAY(i);
DELETE FROM y
WHERE empno = X_EMPNO_ARRAY(i); .... 6 deletes for 6 tables
END;
/

will these deletes work like

delete from x where empno in (select empno from emp)?

I queried v$sql and I see delete from x where empno = :B1 for this PL/SQL Block



Tom Kyte
November 15, 2002 - 7:12 pm UTC

No, don't do it that way -- just stop. (try my way ok)

If big_table has an index (pk or otherwise), it is the size of t1..t6 that matters.

DON'T TRY TO OUT GUESS the database here, it actually knows what to do.

I'd just love to see you array fetch 50 million records -- you wanna talk *dog slow*. That is exactly what you would be doing.

do what I told you to -- use explain plan, check it out.
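For example, a quick way to see the plan without actually running the delete (a sketch; on releases before 9iR2 you would query PLAN_TABLE directly instead of using DBMS_XPLAN):

explain plan for
delete t1 where cod_id in ( select cod_id from x_table );

select * from table( dbms_xplan.display );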

forgot to say

A reader, November 15, 2002 - 7:22 am UTC

I also forgot to say that BIG_TABLE is not indexed; it's a temporary table.

I thought that array fetch would be horrible too!

A reader, November 15, 2002 - 7:59 pm UTC

I thought that array fetching 50 million rows would be horrible too! That's why I had doubts about how to do it. The problem is RBS space: if I do a straight delete like this, I doubt very much that I have enough rollback. I have a 2GB RBS just for this kind of job, but I don't think even 2GB is enough. However, I left one delete running today after I left the office; we will see on Monday if it worked (assuming there was enough RBS space).

big_table has an index on cod_id; the delete I ran was

delete from t1 a
where exists (select null
from big_table b
where a.cod_id = b.cod_id)

the explain plan was FTS of t1 and index scan on big_table

I had a previous delete which looks like

delete from t1
where cod_id in (select cod_id from big_table)

the explain plan looked horrible: a sort-merge join between the 50-million-row and 15-million-row tables, so I opted for EXISTS. However, I am not quite sure my choice was correct.

My last question is about sub-queries. I always thought sub-queries are resolved first, but from your example it doesn't seem so... however, if there were a predicate in your sub-query, I guess it would be resolved first, right?

Tom Kyte
November 15, 2002 - 8:25 pm UTC

what does RBS have to do with 50 million rows in a table you read?? anyway..


it depends (as i keep saying) on the SIZE of t1. If t1 is large - sort merge = WAY TO GO. If t1 is small - exists is ok.


Subqueries are not resolved first, second, third or last. there are numerous ways things can go -- many many many ways.

delete failed

A reader, November 18, 2002 - 4:11 am UTC

Hi

I checked in the office today and the delete failed due to RBS space (as I expected!). It took 2.5 hours, doing 15 million physical reads, 70 million block gets, 45 million consistent gets and 65 million block changes.

I am using now

DECLARE
TYPE X_EMPNO IS TABLE OF X.EMPNO%TYPE;
X_EMPNO_ARRAY X_EMPNO;
L_ROWNUM NUMBER := 50000;
BEGIN
WHILE TRUE
LOOP
SELECT empno BULK COLLECT
INTO X_EMPNO_ARRAY
FROM emp
WHERE ROWNUM <= L_ROWNUM;
IF SQL%NOTFOUND
THEN
EXIT;
END IF;
SET TRANSACTION USE ROLLBACK SEGMENT RBIG;
FORALL i IN X_EMPNO_ARRAY.FIRST..X_EMPNO_ARRAY.LAST
DELETE FROM x
WHERE empno = X_EMPNO_ARRAY(i);
FORALL i IN X_EMPNO_ARRAY.FIRST..X_EMPNO_ARRAY.LAST
DELETE FROM y
WHERE empno = X_EMPNO_ARRAY(i);
COMMIT;
END LOOP;
END;
/

I will let you know if this goes better!


I would like to do a plain delete as well, but due to a disk space problem I cannot have more RBS...

Cheers

Tom Kyte
November 18, 2002 - 8:17 am UTC

that won't do anything except go into an INFINITE LOOP. Have fun with that.

Your question was "how to delete from t1..t6 given a really big X_TABLE". It was not "how to efficiently delete a gazillion rows". You never mentioned how big t1..t6 are, or how many rows you really expect to delete. Sounds like a lot (that would be why the rbs was used so heavily).

Suggest you read:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6407993912330 <code>

and change UPDATE to DELETE -- instead of deleting the records you don't want, SELECT OUT the records you do -- drop the old table -- rename new table.

No rollback needed at all for that.


ops forgot to say

A reader, November 18, 2002 - 4:13 am UTC

I forgot to say that you might think my PL/SQL is very wrong because of the ROWNUM I am using; I forgot to add a delete at the end, which is

DELETE FROM emp
WHERE empno = X_EMPNO_ARRAY(i);

Tom Kyte
November 18, 2002 - 8:20 am UTC

see above - you don't want to delete, I think you want to CREATE TABLE AS SELECT ROWS_TO_KEEP and drop.

It would be about <very big number here> times faster. You can create the tables with nologging and no rollback. You can recreate the indexes with parallel nologging. It'll be super fast as compared to deleting most of the rows from indexed tables.

How to Tune Slow Deleting?

A Reader, December 06, 2002 - 4:08 pm UTC

Hi Tom,
I have a table that has over 2m rows. When I deleted a row using:
delete from t where id=i;
(id is the PK column), the operation took 40 seconds to come back. Since I have a loop doing this to delete many, many records (maybe up to 1m), it will never come back. Because many tables (8) reference this table and this is a production database, I cannot truncate the table or create a temp table to do the deletion. What can I tune to make the deletion faster, or where is the possible problem?
Thank you for your help in advance.

Tom Kyte
December 07, 2002 - 9:20 am UTC

Make sure the CHILD TABLES have indexes on their foreign keys or else you are forcing a FULL SCAN on the child tables for each and every single row you delete. That is what it sounds like to me - you have unindexed foreign keys.

Here is a script that will help you detect unindexed foreign keys in a schema:

select decode( b.table_name, NULL, '****', 'ok' ) Status,
a.table_name, a.columns, b.columns
from
( select a.table_name, a.constraint_name,
max(decode(position, 1, cname,NULL)) ||
max(decode(position, 2,', '||cname,NULL)) ||
max(decode(position, 3,', '||cname,NULL)) ||
max(decode(position, 4,', '||cname,NULL)) ||
max(decode(position, 5,', '||cname,NULL)) ||
max(decode(position, 6,', '||cname,NULL)) ||
max(decode(position, 7,', '||cname,NULL)) ||
max(decode(position, 8,', '||cname,NULL)) ||
max(decode(position, 9,', '||cname,NULL)) ||
max(decode(position,10,', '||cname,NULL)) ||
max(decode(position,11,', '||cname,NULL)) ||
max(decode(position,12,', '||cname,NULL)) ||
max(decode(position,13,', '||cname,NULL)) ||
max(decode(position,14,', '||cname,NULL)) ||
max(decode(position,15,', '||cname,NULL)) ||
max(decode(position,16,', '||cname,NULL)) columns
from (select substr(column_name,1,30) cname,
substr(table_name,1,30) table_name,
substr(constraint_name,1,30) constraint_name,
position
from user_cons_columns) a,
user_constraints b
where a.constraint_name = b.constraint_name
and b.constraint_type = 'R'
group by a.table_name, a.constraint_name ) a,
( select table_name, index_name,
max(decode(position, 1, cname,NULL)) ||
max(decode(position, 2,', '||cname,NULL)) ||
max(decode(position, 3,', '||cname,NULL)) ||
max(decode(position, 4,', '||cname,NULL)) ||
max(decode(position, 5,', '||cname,NULL)) ||
max(decode(position, 6,', '||cname,NULL)) ||
max(decode(position, 7,', '||cname,NULL)) ||
max(decode(position, 8,', '||cname,NULL)) ||
max(decode(position, 9,', '||cname,NULL)) ||
max(decode(position,10,', '||cname,NULL)) ||
max(decode(position,11,', '||cname,NULL)) ||
max(decode(position,12,', '||cname,NULL)) ||
max(decode(position,13,', '||cname,NULL)) ||
max(decode(position,14,', '||cname,NULL)) ||
max(decode(position,15,', '||cname,NULL)) ||
max(decode(position,16,', '||cname,NULL)) columns
from ( select substr( column_name, 1, 30 ) cname,
substr( table_name, 1, 30 ) table_name,
substr( index_name, 1, 30 ) index_name,
column_position position
from user_ind_columns ) user_ind_columns
group by table_name, index_name ) b
where a.table_name = b.table_name (+)
and b.columns (+) like a.columns || '%'
/




Yes, that's where the problem is!

A Reader, December 18, 2002 - 4:41 pm UTC

Thank you!

cascade delete

A reader, October 28, 2003 - 12:57 pm UTC

hi tom
if i cascade delete a central parent table with
multiple child tables (of the order of 20) - the
total rows in all tables reach a million, the delete
is naturally slow. We can not afford to drop and
recreate the table. Is it a good idea to alter
table drop the constraints, drop indexes - delete the data and then add back the constraints. I am afraid that the
data that goes in would not get validated and don't
think it is a good idea. Your advice please? (the delete
can be issued by the user anytime in the UI.) It
does run in the background as a dbms_job - so that
is not the problem - it is the resource intensiveness
of that which is troubling us.

Thanx for an awesome site and two awesome books - have
both of them, read them more or less and love them!

Tom Kyte
October 28, 2003 - 2:23 pm UTC

are you saying "it will delete 1,000,000 rows", or that the sum of all of the rows is 1,000,000?


It will not "naturally" be slow.

it will be a direct function of:

o the number of child rows to cascade the delete to.
o the presence or lack thereof of indexes on the child tables foreign keys.

if you update the parent primary key OR you delete from the parent table -- indexing the child's fkeys is pretty much "a mandatory thing". Have you done that?
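For reference, indexing a child table's foreign key is just an ordinary index on the foreign key column(s); the table and column names below are made up for illustration:

create index emp_deptno_fk_idx on emp( deptno );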



thanx Tom!

A reader, October 28, 2003 - 4:37 pm UTC

Turns out the indexes were not there in the test (I am aware of the fk index thing and its impact on the child table).
A side question - does the lack of indexes also result in more redo? Well, on thinking more about it - with indexes, even more redo (for index maintenance) will be generated, right? One of the concerns (perhaps unnecessary) was that the redo generated was too high during cascade deletes.



Tom Kyte
October 28, 2003 - 9:08 pm UTC

more indexes = more redo = (in this case) better performance over all

given you were doing this without indexes -- the "concern" isn't relevant

thanx Tom!

A reader, October 28, 2003 - 10:03 pm UTC


Pls. help

A reader, March 15, 2004 - 1:10 pm UTC

Tom,

In the discussion above titled "How to Tune Slow Deleting", I am just wondering how you figured out that there is an unindexed foreign key.
Also, I remember studying somewhere that there is an issue with locking when there is no index on the foreign key. Can you please elaborate?

Thanks

Tom Kyte
March 15, 2004 - 2:41 pm UTC

the facts as given:


o table that has over 2m rows. (not so relevant really)
o I deleted a row using delete from t where id=i; The id is pk column.
(that is relevant, that should be instantaneous we would assume)
o This operation took 40 seconds to come back.
(that is relevant, that is the problem)
o Because many tables (8) reference this table
(ahh, that is the probable problem!)


I just guessed that if you were deleting a single parent row by a primary key value, and it was slow, and there were 8 child tables -- one or more (or all) of them might be getting full scanned to process the delete -- if there wasn't an index in place.

see
</code> http://asktom.oracle.com/~tkyte/unindex/index.html <code>


Help on locks

A reader, March 15, 2004 - 6:04 pm UTC

Tom,

Please clarify about locks :

Table level lock
----------------
DDLs always take a table lock and No DMLs can be performed

DML
---
DMLs can be made to take a table lock (using the LOCK TABLE statement, but that is not good practice). But Oracle's default locking mechanism is a row-level exclusive lock (for all rows affected by the DML) and a table-level share lock (i.e. no DDLs can be performed while a DML is taking place), but another transaction can place a share lock (possibly to update other rows in the table).

But the Oracle documentation you mentioned has the following lines and I quote here
<Quote>
Share locks allow reading only. Therefore, no INSERT, UPDATE, or DELETE statements could be issued on the child table until the transaction containing the UPDATE or DELETE was committed. Queries were allowed on the child table.
</Quote>

Please clarify the above statement. If my understanding is not correct, please explain. The topic of locks is very confusing.



Tom Kyte
March 15, 2004 - 6:31 pm UTC

snippets from my "Book Expert One on One Oracle"

<quote>
There are three types of DDL locks:

o Exclusive DDL locks - This will prevent other sessions from gaining a DDL lock or TM (DML) lock themselves. This means that a table against which DDL is performed may be queried, but cannot be modified during the operation.

o Share DDL locks - This protects the referenced objects from being changed by other sessions with DDL, but allows modifications to the data.

o Breakable parse locks - This allows objects such as a query plan cached in the shared pool to register its reliance on some object. If you perform DDL against that object, Oracle will review the list of objects that have registered their dependence, and invalidate them. Hence, these 'locks' are 'breakable'; they do not prevent the DDL from occurring.

</quote>

So, the statement "ddl always take a table lock and no dmls can be performed" is false.


The table level share lock is not a lock on the table really -- it is yet another queueing mechanism to make sure the table isn't "changed structurally". It is different than "lock table in share mode"

It took me about 30 pages to adequately describe locking in Oracle and impacts on your life (and I still left bits out). I won't be able to do it here, but if you have access to Expert one on one Oracle -- i think it is pretty descriptive.

Help

A reader, March 15, 2004 - 6:48 pm UTC

Tom,

Thanks much for your response. So is the statement given in the application developer's guide within <quote>-</quote>, i.e. "share table locks prevent DMLs from being performed on the table", in the above question correct or false? Please let me know.

Thanks

Tom Kyte
March 15, 2004 - 7:06 pm UTC

in the proper context, it is accurate.

lock table in share mode.

no other dmls will be done. that is a true table level share lock.


suggest you look here:
</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21cnsis.htm#2841

and in particular:

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96524/c21cnsis.htm#9975 <code>

(but go to the PDF; the table seems to be missing some headers for some reason -- all of the PDFs are on OTN as well)

Bulk delete vs dynamic "in" delete

Lee, March 23, 2004 - 1:30 pm UTC

Tom,
What would be faster, a FORALL ..... DELETE ....WHERE ROWID = T_TYPE();

OR

constructing a DELETE .....WHERE ROWID IN ();

Which delete will be faster? (not taking into account the time it takes to construct the dynamic "in" delete).

Thanks.

Tom Kyte
March 24, 2004 - 7:56 am UTC

unless the IN used binds, forall probably. (without binds, don't even consider doing it)


(but, did you BENCHMARK IT????????)




Bulk delete vs dynamic "in" delete

Lee, March 24, 2004 - 9:03 am UTC

Tom,
I'm not sure why you suggest using binds in this case.
The IN is going to execute only once; it will look something like this (I will build it dynamically, concatenating the values in a loop over a PL/SQL table):

DELETE FROM MY_TABLE
WHERE ROWID IN ('DSAFASDFAS', 'ADFSAFDSA', 'ADSFASDFSAD', 'ADFASDFDSA');

about 800-1000 rowids in the in.

Thanks.

Tom Kyte
March 24, 2004 - 9:40 am UTC

and no bind variables...........

everyone says "only once", but they never really mean it.


but if only once, the difference in timing is so "not relevant" as it would be measured in fractions of a second at best. Do whichever is "easiest" to do.

Bulk delete vs dynamic "in" delete

Lee., March 24, 2004 - 10:35 am UTC

Tom,
I have another one for you.

I have to update 600 records at a time based on rowid. I can use FORALL..UPDATE, or I can build an update statement dynamically that will look like this (I'll do it with only 3 rowids here):

UPDATE (SELECT DATE_1,
               ROWID R_ID
          FROM MY_TABLE
         WHERE ROWID IN ('ASDFASD', 'ADFADSF', 'ADFDSA'))
   SET DATE_1 = CASE WHEN R_ID = 'ASDFASD' THEN
                          DATE_1 + 1
                     WHEN R_ID = 'ADFADSF' THEN
                          DATE_1 - 1
                     WHEN R_ID = 'ADFDSA' THEN
                          DATE_1 + 100
                END;


This will be for around 600 values.

Thanks.



Tom Kyte
March 24, 2004 - 10:45 am UTC

this would definitely use forall and two arrays


forall i in 1..l_array.count
update t set date_1 = l_array(i) where rowid = l_rowids(i);


Like I said -- DO WHAT IS EASIEST. Here, filling an array procedurally looks to me to be "obviously much easier than anything else"

Bulk delete vs dynamic "in" delete

Lee, March 24, 2004 - 11:14 am UTC

Tom,

I'm with you on the "easy" thing; however, I just tried it and to my surprise it seems that the UPDATE (SELECT...) SET COL_1 = CASE WHEN.... works much faster than the FORALL... and I mean MUCH faster (1-5 seconds vs many minutes).

Since I was a little kid I was taught that FORALL would be faster in a case like this, and I'm surprised to find that it is not (it is definitely easier to write). Could it be because of the hardware configuration? The main "bad" thing on my box is that there is only 1 controller; could that be a factor?

Thanks.


Tom Kyte
March 24, 2004 - 11:31 am UTC

then you made "a mistake" and compared apples to toaster ovens

and in fact, if you had 600 rowids, you would have blown the 32k limit

Look:

ops$tkyte@ORA9IR2> create table t
  2  as
  3  select * from all_objects
  4  where rownum <= 600;
 
Table created.
 
Elapsed: 00:00:00.16
ops$tkyte@ORA9IR2> alter table t add Ive_been_updated char(1);
 
Table altered.
 
Elapsed: 00:00:00.09
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from t where ive_been_updated = 'Y';
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> declare
  2          type ridArray is table of rowid index by binary_integer;
  3          type numArray is table of number index by binary_integer;
  4          l_rids  ridArray;
  5          l_nums  numArray;
  6  begin
  7          select rowid BULK COLLECT into l_rids from t;
  8
  9          for i in 1 .. l_rids.count
 10          loop
 11                  l_nums(i) := 1 + -2 * mod(i,2);
 12          end loop;
 13
 14          forall i in 1 .. l_rids.count
 15                  update t set created = created + l_nums(i), ive_been_updated = 'Y' where rowid = l_rids(i);
 16  end;
 17  /
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2> select count(*) from t where ive_been_updated = 'Y';
 
  COUNT(*)
----------
       600
 
Elapsed: 00:00:00.00


and if this is a one time thing -- so what?  if it is NOT a one time thing, you must must must use binds.


 

Bulk delete vs dynamic "in" delete

Lee, March 24, 2004 - 11:45 am UTC

Tom,

I'm not sure why you think that I made a mistake.
I used 2 variables (v_sql varchar2(32500)) to construct the update query.

The end result was something like this:
EXECUTE IMMEDIATE 'UPDATE (SELECT ROWID R_ID, '||
' DATE_TERMINATION '||
' FROM F_PROVIDER PARTITION (F_PROVIDER_1) '||
' WHERE ROWID IN '||V_SQL_STM_2||') SET '||
'DATE_TERMINATION = CASE '||V_SQL_STM_3||' END';


One more thing: when I said "one time" I didn't mean 1 time forever, I meant 1 time per execution. I would want to use bind variables in this case; do I do it just like a regular bind variable?

EXECUTE IMMEDIATE 'UPDATE (SELECT ROWID R_ID, '||
' DATE_TERMINATION '||
' FROM F_PROVIDER PARTITION (F_PROVIDER_1) '||
' WHERE ROWID IN :V_BIND_1) SET '||
'DATE_TERMINATION = CASE :V_BINE_2 END' USING V_SQL_STM_2, V_SQL_STM_3;

Tom Kyte
March 24, 2004 - 1:19 pm UTC

show me the full example where by the forall took "minutes" for 600 rows and the "update case" took seconds.

That is what I'm saying you made a mistake. to delete 600 rows, via forall, would take about the same time as with "case" -- one or the other didn't do what you think it did.

That is what I mean.

Show us the ENTIRE test case - the whole thing.

and if this is "standard operating procedure", you MUST (as in HAVE TO) use forall with binds. No other way you even want to CONSIDER.

Bulk delete vs dynamic "in" delete

Lee, March 24, 2004 - 1:38 pm UTC

Ok, you asked for it.......go easy on me if you find bad mistakes. (the FORALL is commented close to the end of the code).



DECLARE
V_NUM NUMBER := 1;
V_COUNT NUMBER := 1;
V_C1_COUNT NUMBER := 2;
V_SQL_STM VARCHAR2(32500) := NULL;
V_SQL_STM_2 VARCHAR2(32500) := 'ZZTOP';
V_SQL_STM_3 VARCHAR2(32500) := NULL;
TYPE T_R_ID_TYPE IS TABLE OF ROWID INDEX BY PLS_INTEGER NOT NULL;
TYPE T_UPD_TERM_DATE_TYPE IS TABLE OF NUMBER INDEX BY PLS_INTEGER NOT NULL;
T_DEL_R_ID T_R_ID_TYPE;
T_UPD_R_ID T_R_ID_TYPE;
T_UPD_TERM_DATE T_UPD_TERM_DATE_TYPE;
-------------------------------------------
CURSOR C1 IS SELECT DISTINCT
CASE WHEN --DO NOT DELETE
DATE_EFFECTIVE < LEAD_DATE_EFFECTIVE AND
DATE_TERMINATION > LEAD_DATE_EFFECTIVE THEN
NULL
WHEN --DELETE
(DATE_EFFECTIVE > LEAD_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LEAD_DATE_TERMINATION) OR
(DATE_EFFECTIVE > LAG_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LAG_DATE_TERMINATION) THEN
R_ID
END DEL_R_ID,
CASE WHEN DATE_EFFECTIVE < LEAD_DATE_EFFECTIVE AND
DATE_TERMINATION > LEAD_DATE_EFFECTIVE THEN
R_ID
WHEN (DATE_EFFECTIVE > LEAD_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LEAD_DATE_TERMINATION) OR
(DATE_EFFECTIVE > LAG_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LAG_DATE_TERMINATION) THEN
NULL
END UPD_R_ID,
CASE WHEN DATE_EFFECTIVE < LEAD_DATE_EFFECTIVE AND
DATE_TERMINATION > LEAD_DATE_EFFECTIVE THEN
TO_NUMBER(TO_CHAR(TO_DATE(LEAD_DATE_EFFECTIVE, 'YYYYMMDD') - 1, 'YYYYMMDD'))
WHEN (DATE_EFFECTIVE > LEAD_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LEAD_DATE_TERMINATION) OR
(DATE_EFFECTIVE > LAG_DATE_EFFECTIVE AND
DATE_EFFECTIVE < LAG_DATE_TERMINATION) THEN
NULL
END UPD_TERM_DATE
FROM
(SELECT
CLIENT_PROVIDER_ID,
LEAD(CLIENT_PROVIDER_ID) OVER(ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE) LEAD_CLIENT_PROVIDER_ID,
LAG(CLIENT_PROVIDER_ID) OVER(ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE) LAG_CLIENT_PROVIDER_ID,
CLIENT_PROVIDER_AFF,
LEAD(CLIENT_PROVIDER_AFF) OVER(ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE) LEAD_CLIENT_PROVIDER_AFF,
LAG(CLIENT_PROVIDER_AFF) OVER(ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE) LAG_CLIENT_PROVIDER_AFF,
DATE_EFFECTIVE,
DATE_TERMINATION,
LEAD(DATE_EFFECTIVE) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC) LEAD_DATE_EFFECTIVE,
LEAD(DATE_TERMINATION) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC) LEAD_DATE_TERMINATION,
LAG(DATE_TERMINATION) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC) LAG_DATE_TERMINATION,
LAG(DATE_EFFECTIVE) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC) LAG_DATE_EFFECTIVE,
ROWID R_ID,
LEAD(ROWID) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC) LEAD_R_ID,
CREATE_TSP,
MIN(CREATE_TSP) OVER(PARTITION BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF) MIN_CREATE_TSP
FROM F_PROVIDER PARTITION (F_PROVIDER_1)
ORDER BY CLIENT_PROVIDER_ID, CLIENT_PROVIDER_AFF, DATE_EFFECTIVE ASC)
WHERE ((CLIENT_PROVIDER_ID = LEAD_CLIENT_PROVIDER_ID AND
CLIENT_PROVIDER_AFF = LEAD_CLIENT_PROVIDER_AFF) OR
(CLIENT_PROVIDER_ID = LAG_CLIENT_PROVIDER_ID AND
CLIENT_PROVIDER_AFF = LAG_CLIENT_PROVIDER_AFF)) AND
(((DATE_EFFECTIVE >= LEAD_DATE_EFFECTIVE AND
DATE_TERMINATION <= LEAD_DATE_TERMINATION) OR
(DATE_EFFECTIVE <= LEAD_DATE_EFFECTIVE AND
DATE_TERMINATION >= LEAD_DATE_TERMINATION)) OR
((DATE_EFFECTIVE >= LAG_DATE_EFFECTIVE AND
DATE_TERMINATION <= LAG_DATE_TERMINATION) OR
(DATE_EFFECTIVE <= LAG_DATE_EFFECTIVE AND
DATE_TERMINATION >= LAG_DATE_TERMINATION))) AND
CREATE_TSP = MIN_CREATE_TSP;
BEGIN
WHILE V_C1_COUNT > 1
LOOP
OPEN C1;
LOOP
FETCH C1 BULK COLLECT INTO T_DEL_R_ID, T_UPD_R_ID, T_UPD_TERM_DATE LIMIT 600;
V_C1_COUNT := C1%ROWCOUNT;
EXIT WHEN T_DEL_R_ID.COUNT < 2 AND T_UPD_R_ID.COUNT < 2;
FOR I IN 1 .. T_DEL_R_ID.COUNT
LOOP
IF T_DEL_R_ID(I) IS NULL THEN
T_DEL_R_ID.DELETE(I);
END IF;
END LOOP;
FOR I IN 1 .. T_UPD_R_ID.COUNT
LOOP
IF T_UPD_R_ID(I) IS NULL THEN
T_UPD_R_ID.DELETE(I);
T_UPD_TERM_DATE.DELETE(I);
END IF;
END LOOP;
DBMS_APPLICATION_INFO.SET_MODULE (TO_CHAR(V_COUNT||': '||T_DEL_R_ID.COUNT)||', '||TO_CHAR(T_UPD_R_ID.COUNT), TO_CHAR(SYSDATE, 'DD-MON-YY HH24:MI:SS'));
--------------------------
IF T_DEL_R_ID.COUNT > 0 THEN
FOR I IN T_DEL_R_ID.FIRST .. T_DEL_R_ID.COUNT
LOOP
V_SQL_STM := V_SQL_STM||CASE WHEN T_DEL_R_ID(I) IS NOT NULL THEN
''''||T_DEL_R_ID(I)||''', '
END;
END LOOP;
END IF;
IF V_SQL_STM IS NOT NULL THEN
EXECUTE IMMEDIATE 'DELETE /*+ NOLOGGING NOPARALLEL(F_PROVIDER) */ FROM F_PROVIDER '||
'WHERE ROWID IN ('||RTRIM(V_SQL_STM, ', ')||')';
END IF;
COMMIT;
V_SQL_STM := NULL;
-------------------------
V_SQL_STM_2 := T_UPD_R_ID.COUNT;
IF T_UPD_R_ID.COUNT > 0 THEN
V_SQL_STM_2 := NULL;
V_SQL_STM_3 := NULL;
FOR I IN T_UPD_R_ID.FIRST .. T_UPD_R_ID.LAST
LOOP
IF T_UPD_R_ID(I) IS NOT NULL THEN
V_SQL_STM_2 := V_SQL_STM_2||''''||T_UPD_R_ID(I)||''', ';
V_SQL_STM_3 := V_SQL_STM_3||'WHEN R_ID = '''||T_UPD_R_ID(I)||''' THEN '||
' '||T_UPD_TERM_DATE(I)||' ';
END IF;
END LOOP;
V_SQL_STM_2 := '('||RTRIM(V_SQL_STM_2, ', ')||')';
EXECUTE IMMEDIATE 'UPDATE (SELECT ROWID R_ID, '||
' DATE_TERMINATION '||
' FROM F_PROVIDER PARTITION (F_PROVIDER_1) '||
' WHERE ROWID IN '||V_SQL_STM_2||') SET '||
'DATE_TERMINATION = CASE '||V_SQL_STM_3||' END';
COMMIT;
END IF;
--********************
--IF T_UPD_R_ID.COUNT > 0 THEN
-- FORALL I IN T_UPD_R_ID.FIRST .. T_UPD_R_ID.COUNT
-- UPDATE /*+ NOLOGGING NOPARALLEL(F_PROVIDER) */ F_PROVIDER SET
-- DATE_TERMINATION = T_UPD_TERM_DATE(I)
-- WHERE ROWID = T_UPD_R_ID(I);
-- COMMIT;
--END IF;
--********************
-------------------------
V_COUNT := V_COUNT + 1;
END LOOP;
CLOSE C1;
END LOOP;
----------------
T_DEL_R_ID.DELETE;
T_UPD_R_ID.DELETE;
T_UPD_TERM_DATE.DELETE;
END;
/


Tom Kyte
March 24, 2004 - 2:13 pm UTC

sorry -- way too big -- and I cannot even run it!!!!   

the point of a test case is to make it so others can reproduce your findings.  Here I cannot even see how you timed something like this (tkprof, what?)


this code doesn't fly:

               IF T_DEL_R_ID.COUNT > 0 THEN
                   FOR I IN T_DEL_R_ID.FIRST .. T_DEL_R_ID.COUNT
                   LOOP
                       V_SQL_STM := V_SQL_STM||CASE WHEN T_DEL_R_ID(I) IS NOT NULL THEN
                                                         ''''||T_DEL_R_ID(I)||''', '
                                                    END;
                   END LOOP;
               END IF;

And if you .DELETED some of these -- you would be getting "NO DATA FOUND" like this:


ops$tkyte@ORA9IR1> declare
  2          type array is table of number index by binary_integer;
  3          l_data array;
  4          l_string  long;
  5  begin
  6          for i in 1 .. 3
  7          loop
  8                  l_data(i) := i;
  9          end loop;
 10
 11          l_data.delete(2);
 12
 13          for i in l_data.first .. l_data.count
 14          loop
 15                  l_string := l_string || l_data(i) || ',';
 16          end loop;
 17  end;
 18  /
declare
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at line 15
 

sorry -- but this code won't "run" in general.


you do know that "nologging" is "not an option" at all here (that nologging 'hint' is meaningless)



so, I think there is a good chance that the one piece of code, well, never actually "ran" or is very different from the above.


Sooo, here is my complete test, from start to finish, that shows forall = superior for this example.

Also, if you do not bind -- you are dead in the water, just do not pass go, do not collect $200.00, go directly to jail:


ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.08
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select *
  4    from all_objects
  5   where rownum <= 1200;
 
Table created.
 
Elapsed: 00:00:00.13
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add been_updated char(1);
 
Table altered.
 
Elapsed: 00:00:00.06
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from t where been_updated='Y';
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> select count(*) from t;
 
  COUNT(*)
----------
      1200
 
Elapsed: 00:00:00.01
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> declare
  2      type ridArray is table of rowid index by binary_integer;
  3      type dtArray is table of date index by binary_integer;
  4
  5      l_rids ridArray;
  6      l_rids_to_update ridArray;
  7      l_rids_to_delete ridArray;
  8      l_dates dtArray;
  9  begin
 10      select rowid
 11        bulk collect into l_rids
 12        from t;
 13
 14      for i in 1 .. l_rids.count
 15      loop
 16          if (mod(i,2) = 0)
 17          then
 18              l_rids_to_update(l_rids_to_update.count+1) := l_rids(i);
 19              l_dates(l_dates.count+1) := sysdate;
 20          else
 21              l_rids_to_delete(l_rids_to_delete.count+1) := l_rids(i);
 22          end if;
 23      end loop;
 24
 25      forall i in 1 .. l_rids_to_delete.count
 26          delete from t where rowid = l_rids_to_delete(i);
 27
 28      forall i in 1 .. l_rids_to_update.count
 29          update t set created = l_dates(i), been_updated = 'Y' where rowid = l_rids_to_update(i);
 30  end;
 31  /
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.05
ops$tkyte@ORA9IR2> select count(*) from t where been_updated='Y';
 
  COUNT(*)
----------
       600
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> select count(*) from t;
 
  COUNT(*)
----------
       600
 
Elapsed: 00:00:00.00



ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
Elapsed: 00:00:00.11
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select *
  4    from all_objects
  5   where rownum <= 1200;
 
Table created.
 
Elapsed: 00:00:00.10
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> alter table t add been_updated char(1);
 
Table altered.
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from t where been_updated='Y';
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> select count(*) from t;
 
  COUNT(*)
----------
      1200
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> declare
  2      type ridArray is table of rowid index by binary_integer;
  3      l_rids ridArray;
  4      l_to_delete   long;
  5      l_to_update1  long;
  6      l_to_update2  long;
  7  begin
  8      select rowid
  9        bulk collect into l_rids
 10        from t;
 11
 12      for i in 1 .. l_rids.count
 13      loop
 14          if (mod(i,2) = 0)
 15          then
 16              l_to_update1 := l_to_update1 || '''' || l_rids(i) || ''',';
 17              l_to_update2 := l_to_update2 || ' when rowid = ''' || l_rids(i) || ''' then sysdate ';
 18          else
 19              l_to_delete := l_to_delete || '''' || l_rids(i) || ''',';
 20          end if;
 21      end loop;
 22
 23      execute immediate 'delete from t where rowid in ( ' || rtrim(l_to_delete,',') || ')';
 24      dbms_output.put_line( sql%rowcount || ' rows deleted' );
 25
 26      execute immediate 'update t set been_updated = ''Y'', ' ||
 27          ' created = case ' || l_to_update2 || ' end where rowid in ( ' ||
 28                         rtrim( l_to_update1, ',' ) || ')';
 29      dbms_output.put_line( sql%rowcount || ' rows updated' );
 30  end;
 31  /
600 rows deleted
600 rows updated
 
PL/SQL procedure successfully completed.
 
Elapsed: 00:00:00.17
ops$tkyte@ORA9IR2> select count(*) from t where been_updated='Y';
 
  COUNT(*)
----------
       600
 
Elapsed: 00:00:00.00
ops$tkyte@ORA9IR2> select count(*) from t;
 
  COUNT(*)
----------
       600
 
Elapsed: 00:00:00.00



 

Delete not using PK index

Oleg Oleander, April 01, 2004 - 3:04 pm UTC

SQL> delete  from T_BILLS nb 
  2  where nb.BILL_ID in (select ID from TABLE(CAST(NT_ID_LIST_TY(ID_TY(1),ID_TY(2)) as NT_ID_LIST_TY)));

0 rows deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=54934 Card=1 Bytes=1
          35)

   1    0   DELETE OF 'T_BILLS'
   2    1     NESTED LOOPS (SEMI) (Cost=54934 Card=1 Bytes=135)
   3    2       TABLE ACCESS (FULL) OF 'T_BILLS' (Cost=24 Card=3230 By
          tes=436050)

   4    2       COLLECTION ITERATOR (CONSTRUCTOR FETCH)

Why isn't the above query using the PK on T_BILLS (the BILL_ID column)? Naturally I bind the collection in reality, but it's all the same. It's very slow. If I use the FORALL trick previously seen in this article then it uses the index, and it's pretty fast. Please comment.

Oleg 

Tom Kyte
April 02, 2004 - 9:34 am UTC

If I were coding this in PLSQL -- i would use the forall (already have the collection, already in PLSQL -- same net effect but easier to understand and code)

but see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3779680732446#15740265481549 <code>

for a discussion of what's happening when you use this technique and how to help the optimizer understand what it is being asked to do.
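For context, one common way to help the optimizer with a TABLE() collection is to tell it roughly how many rows the collection will return, since it cannot know that on its own. A sketch using the (undocumented but widely used) CARDINALITY hint; the "2" is just the expected element count for this example, not a recommendation:

delete from T_BILLS nb
 where nb.BILL_ID in ( select /*+ cardinality( t 2 ) */ ID
                         from TABLE(CAST(NT_ID_LIST_TY(ID_TY(1),ID_TY(2)) as NT_ID_LIST_TY)) t );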

how detect delete exceptions in bulk processing

Pinguman, September 20, 2004 - 5:01 am UTC

Hi

Using Oracle 8.1.7.4

I have a pl/sql code as follows,


begin
for i in (select * from TAB_X)
loop
begin
delete TAB_Y where x_id = i.x_id; -- the number of rows deleted varies
exception
when others
then
dbms_output.put_line(i.x_id || ' error');
end;
end loop;
commit;
end;
/

this is terribly slow as you can imagine. I am trying to use bulk processing in PL/SQL, but I can't find a way to convert the exception part, i.e. to catch the x_id values that might cause an error in the delete process.

In 9i we have SAVE EXCEPTIONS but in 8i is there a way to do this?



Tom Kyte
September 20, 2004 - 8:41 am UTC

there are so many things wrong with this.

a when others, not followed by a "RAISE"

tell me -- what logic could you possibly have that says "hey, delete as many things as you can and if some fail for whatever reason, well -- thats OK, just sort of ignore it"


The way to optimize that routine would be:

begin
null;
end;
/


the only way to code that would be:


delete from tab_y where x_id in ( select x_id from tab_x );





that was a pseudo code

A reader, September 20, 2004 - 9:50 am UTC

Hi

The code above is just pseudo-code; in fact it's much more complicated inside the loop. There are quite a few calculations and if/else condition tests, so a plain delete won't work.

Thank you very much

Tom Kyte
September 20, 2004 - 10:50 am UTC

then you are stuck with slow-by-slow processing, as save exceptions is a new feature and did not exist in 8i
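For readers on 9i or later, the SAVE EXCEPTIONS approach mentioned above would look roughly like this. This is only a sketch using the pseudo-code's table and column names (tab_x, tab_y, x_id), which are placeholders:

declare
    type idArray is table of tab_x.x_id%type index by binary_integer;
    l_ids        idArray;
    bulk_errors  exception;
    pragma exception_init( bulk_errors, -24381 );  -- ORA-24381: error(s) in array DML
begin
    select x_id bulk collect into l_ids from tab_x;

    begin
        forall i in 1 .. l_ids.count save exceptions
            delete from tab_y where x_id = l_ids(i);
    exception
        when bulk_errors then
            -- report the x_id values whose deletes failed, keep the rest
            for j in 1 .. sql%bulk_exceptions.count
            loop
                dbms_output.put_line(
                    l_ids( sql%bulk_exceptions(j).error_index ) || ' error: ' ||
                    sqlerrm( -sql%bulk_exceptions(j).error_code ) );
            end loop;
    end;
    commit;
end;
/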

Delete from t

A reader, October 13, 2004 - 11:02 pm UTC

big_table@ORA920.LOCALHOST> delete from t where x in ( select id from big_table );

Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE (Cost=38 Card=36 Bytes=216)
   1    0   DELETE OF 'T'
   2    1     NESTED LOOPS (Cost=38 Card=36 Bytes=216)
   3    2       TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=36 Bytes=72)
   4    2       INDEX (UNIQUE SCAN) OF 'BIG_TABLE_PK' (UNIQUE) (Cost=1 Card=1 Bytes=4)

Not quite sure I understand the above. Why does step 4 above show a unique scan of big_table? Wouldn't it have to get all the "ids" from big_table and sort/unique them to compare to t.x and delete?

Tom Kyte
October 14, 2004 - 9:20 am UTC

it is reading T row by row (cause it thinks there are 36 rows in there) and doing an index probe into BIG_TABLE to see if that key exists.

So, it is like:

for x in ( select * from t )  -- expecting 36 iterations
loop
    if ( row exists in big_table )
    then
        delete it
    end if
end loop


it would rather do that than process a really big table -- it did the right thing (assuming the stats were correct)

Frank, January 21, 2005 - 11:12 am UTC

Hi Tom,

I am using Oracle EE 8.1.7.4 and below is my question.

NOTE_ID NOT NULL NUMBER(9) - PK
USER_ID NOT NULL NUMBER(9)
SUBJECT NOT NULL VARCHAR2(100)
NOTE_TEXT VARCHAR2(4000)
NOTE_DATE NOT NULL DATE

# of rows : 257 (not very rapidly growing)
disk_reads : 479411
buffer_gets : 558667

On the statspack reports, the below sql appears on the top for higher disk_reads and buffer_gets.

delete from notes where note_id = :1;

I checked the query plan and the cost is always 1 and the access is via primary key. The table is analyzed every week using estimate statistics and there are no chained rows.

Should these high numbers be a concern? Is there a way to reduce the numbers?

Thanks.

Tom Kyte
January 21, 2005 - 6:38 pm UTC

don't delete -- numbers for deleting will go way down.

Seriously -- they are aggregate numbers for all of the executions in that time frame.

how many deletes did that statspack represent.

restricting number of concurrent deletes

Sanji, June 30, 2005 - 8:11 am UTC

Tom,
We are on 10g rel 2, Win XP. The application is designed such that during off-load hours (mostly at night), there are scheduled jobs that offload data from one table and create the corresponding files. The table is decently big, with more than 50 million rows.
The issue is that since the jobs are scheduled an hour apart, if the first job doesn't finish before the next one starts, the database starts dragging.

I tried suggesting SELECT FOR UPDATE functionality, but it requires a code change and that would be cumbersome.
The next thing I can think of is a BEFORE DELETE trigger that would identify any delete happening on the table and terminate that process.
Does this sound feasible? (Could you demonstrate the functionality with an example?)
Is there a better approach that can be implemented?

Thanks
Sanji

Tom Kyte
June 30, 2005 - 9:48 am UTC

insufficient data, one asks "why have more than one job" if they all do the same thing. seems job1 should call job2 should call job3 and so on, if you do not want them running concurrently.

pretty much anything is going to be a "code change"

Delete Trigger

Sanji, July 11, 2005 - 5:57 am UTC

Tom,
The functionality is such that the jobs have to be executed independently.
I tried creating a trigger that would identify a SQL pattern such as 'delete%from%table_name%where%exists%'
and, in case the pattern exists, would stop any process executing that SQL pattern.
The problem is that this trigger blocks every delete on the table. Fairly so, because the condition I am checking is only the existence of the pattern.
Sample code that I wrote for this trigger:

CREATE OR REPLACE TRIGGER delete_concurrency
BEFORE DELETE ON NGSDMS60.NGCONTENTDETAIL

DECLARE
id1 number;id2 number;id3 number;
BEGIN

select count(*) into id1
from v$session s, v$transaction t, v$sqlarea a
where s.taddr = t.addr
and s.saddr = t.ses_addr
and s.sql_hash_value=0
and s.sql_address='00'
and a.address = s.prev_sql_addr
and a.hash_value = s.prev_hash_value
and upper(a.sql_text) like 'DELETE%FROM%NGSDMS60%NGCONTENTDETAIL%CD%WHERE%EXISTS%(%SEL%';

if id1 > 0 then
raise_application_error(-20000,'Error Encountered due to id1');
end if;

select count(*) into id2
from v$session s, v$transaction t,v$sqlarea a
where s.taddr = t.addr
and s.saddr = t.ses_addr
and upper(a.sql_text) like 'DELETE%FROM%NGSDMS60%NGCONTENTDETAIL%CD%WHERE%EXISTS%(%SEL%'
and a.address = s.sql_address
and a.hash_value = s.sql_hash_value ;


if id2 > 0 then
raise_application_error(-20000,'Error Encountered due to id2');
end if;

END;
/

How do I keep the other delete SQLs (those that do not match the pattern) from being blocked by this trigger (other than removing the trigger :) )?

Regards
Sanji

Tom Kyte
July 11, 2005 - 8:41 am UTC

that is not going to work reliably.

the microsecond after you peek -- (while someone else was peeking looking for the same sql) -- you'll both be doing the thing you want to do sequentially.

time to go back to the application which has a design flaw?

delete issue

Sanji, July 11, 2005 - 9:50 am UTC

Dear Tom,
I agree with your verdict.

Anyway, for educational purposes, I have another doubt (if it's not outside the scope of this discussion).
How do I extract the SQL which has been executed from the same session?
for instance

update table_name set col_name = some_value
where condition;

this sql has not been committed and I need to extract it into some variable from the same session.
The moment I try writing
select sql_text from v$sql/sqlarea/sqltext/open_cursor
where address/ hash_value = ....

it'd give me the SELECT sql itself.

Is there any way I can get the update statement from the same session?

Thanks and Regards
Sanji

Tom Kyte
July 11, 2005 - 11:35 am UTC

triggers can access this via ora_sql_txt
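For what it's worth, ORA_SQL_TXT returns the statement in 64-byte pieces through an ORA_NAME_LIST_T collection. A rough sketch of calling it inside a trigger body (the table name is a placeholder, and whether it captures the triggering DML text in a given version is something to verify yourself):

create or replace trigger capture_triggering_sql
before delete on some_table        -- some_table is a made-up example
declare
    l_sql_pieces ora_name_list_t;
    l_n          binary_integer;
    l_stmt       varchar2(4000);
begin
    l_n := ora_sql_txt( l_sql_pieces );
    for i in 1 .. nvl( l_n, 0 )
    loop
        l_stmt := l_stmt || l_sql_pieces(i);   -- each piece is up to 64 bytes
    end loop;
    -- l_stmt now holds (up to 4000 bytes of) the statement text
end;
/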

Deleting from table by rowid

Thiru, August 24, 2005 - 12:16 pm UTC

Tom,

This is regarding a very slow delete process from a table with ten check constraints, a primary key and a foreign key.
40,000 records take 6-8 minutes to delete. The production table will have around 5 million records to be deleted this way, out of around 20 million records. Is it because of the constraints and the overhead required to maintain the indexes?

DELETE FROM TBL WHERE ROWID IN (SELECT ROWID FROM PURGE_TBL);


Is there any other way to handle this type of delete? When I disabled the primary key and the foreign key, the delete took just a minute.

Thanks.

Tom Kyte
August 24, 2005 - 6:28 pm UTC

the foreign key on this table isn't relevant.
neither are the check constraints.

they don't really have to be validated during the delete.

All the indexes - they will impact this greatly.

Why wouldn't you design a better way to "purge" by using partitioning or something? So you don't delete?
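For illustration only (it would require the table to be partitioned on whatever column drives the purge, which may or may not be feasible here), the purge then becomes a dictionary operation instead of millions of deletes; partition names below are made up:

alter table tbl drop partition p_old_rows update global indexes;

-- or, to keep the partition definition but empty it:
alter table tbl truncate partition p_old_rows update global indexes;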

Problem with Arrays

Kanth, August 25, 2005 - 11:06 am UTC

Tom, I need help with some array logic.

I am doing an audit on every DML performed by users, by creating an array of columns [~ delimited] and passing it
to a function which loops through the array and inserts all column values into an audit table.


-- for Inserts (pseudo code)

insert into table_name(col1,col2..coln) values (p_col1,p_col2..p_coln) --p_col1..p_coln are IN parameters to this procedure
returning [col1],[col2]..[coln] into [colval1],[colval2],..[colvaln]

-- Create an array of columns
-- oldcol1...oldcoln values are nulls in case of inserts

Array_value(1):='column_name ~'||oldcol1||'~'||colval1;
Array_value(2):='column_name ~'||oldcol2||'~'||colval2;
.
.
.
Array_value(n):='column_name ~'||oldcoln||'~'||colvaln;

--and pass this array to an audit function
audit_func(array_values);

-- for Updates (pseudo code)

Update table_name set col2=p_col2..coln=p_coln
where col1=p_col1
returning [col1],[col2]..[coln] into [colval1],[colval2],..[colvaln]

-- Create an array of columns
-- oldcol1...oldcoln values are nulls in case of inserts

Array_value(1):='column_name ~'||oldcol1||'~'||colval1;
Array_value(2):='column_name ~'||oldcol2||'~'||colval2;
.
.
.
Array_value(n):='column_name ~'||oldcoln||'~'||colvaln;

But the problem with updates/deletes is that if the statement affects multiple rows, I need to return all of those rows and audit them by creating an array of columns for every row.
My question is: how do I handle multiple rows as arrays of columns? Is there any simple approach to this?

Hope I am making sense.
[If I am taking this thread down a different path and this question does not belong here, sorry, please let me know and I'll post in the correct thread.]

Thanks


Tom Kyte
August 25, 2005 - 2:27 pm UTC

bulk collect.


ops$tkyte@ORA10GR1> create table t ( x int, y int );

Table created.

ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> insert into t select rownum, rownum from all_users;

67 rows created.

ops$tkyte@ORA10GR1>
ops$tkyte@ORA10GR1> declare
  2          type array is table of number;
  3          l_data array;
  4  begin
  5          update t set y = y*2 where x <= 5 returning y BULK COLLECT into l_data;
  6          for i in 1 .. l_data.count
  7          loop
  8                  dbms_output.put_line( l_data(i) );
  9          end loop;
 10  end;
 11  /
2
4
6
8
10

PL/SQL procedure successfully completed.
 

Single Delete Vs PL/SQL Delete

A reader, August 25, 2005 - 12:29 pm UTC

We have a similar case to this:

delete from big_table where col1 in ( select col2 from x_tables )

I understand a single delete runs faster than a PL/SQL delete. But when the database is very active, I've seen the single delete take more resources, so we used a PL/SQL FORALL with a LIMIT clause and a commit in a loop. We're committing in a loop for two reasons: to avoid RBS errors and for restart/recovery.

Does that make sense, or am I missing something here? Any advice would help.


Tom Kyte
August 25, 2005 - 6:30 pm UTC

doesn't seem to make sense to me, since doing it slow by slow will generate MORE total redo and MORE total undo.

why not delete using ROWNUM set very generously instead of procedural code???
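A sketch of that idea (the 100,000 chunk size is arbitrary, and the table/column names are the placeholders from the question above):

begin
    loop
        delete from big_table
         where col1 in ( select col2 from x_tables )
           and rownum <= 100000;
        exit when sql%rowcount = 0;   -- nothing left to delete
        commit;
    end loop;
    commit;
end;
/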


Problem with Arrays

A reader, August 25, 2005 - 2:42 pm UTC

Thanks for your response,Tom.

So, in my situation I have to implement it this way?

for Updates (pseudo code)

Update table_name set col2=p_col2..coln=p_coln
where col1=p_col1
returning [col1],[col2]..[coln] bulk collect into [colval1],[colval2],..[colvaln]

for i in 1..colval1.count
loop
-- Create an array of columns
-- oldcol1...oldcoln values are nulls in case of inserts

Array_value(1):='column_name ~'||oldcol1(i)||'~'||colval1(i);
Array_value(2):='column_name ~'||oldcol2(i)||'~'||colval2(i);
.
.
.
Array_value(n):='column_name ~'||oldcoln(i)||'~'||colvaln(i);


--and pass this array to an audit function
audit_func(array_values);

End loop;

Is this right?


Tom Kyte
August 25, 2005 - 6:38 pm UTC

I'll not validate your logic after the update - that is just procedural code, but use the bulk collect on the update to get the values they were updated to
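Reusing the table T from Tom's example above, returning more than one column with BULK COLLECT just means more collection variables; a sketch:

declare
    type numArray is table of number;
    l_x numArray;
    l_y numArray;
begin
    update t set y = y*2 where x <= 5
    returning x, y bulk collect into l_x, l_y;

    -- the collections are positionally aligned: l_x(i) and l_y(i) come from the same row
    for i in 1 .. l_x.count
    loop
        dbms_output.put_line( 'x = ' || l_x(i) || ', new y = ' || l_y(i) );
    end loop;
end;
/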

Alexander, June 14, 2007 - 3:22 pm UTC

Is there any difference between

delete from t;

and 

delete t;

Execution plan says no. Wondering why there are two forms.
Tom Kyte
June 14, 2007 - 4:11 pm UTC

they are the same.

select all * from t;
select * from t;


they are the same.

select distinct * from t;
select unique * from t;

they are the same.

why not.

Synonyms

Michel Cadot, June 14, 2007 - 4:47 pm UTC


and:
create procedure ... is

and
create procedure ... as

Regards
Michel

Deleting through rowid

CT, July 30, 2007 - 12:33 pm UTC

We have a batch process where we delete approx 2 million rows from a set of 80 million through an index.
Because of huge db file sequential read waits, it takes about 800 to 900 seconds even when the system is idle.
<quote>
The index fast full scan typically uses db file scattered reads
</quote>

This made me try out something in my idle development instance.

DELETE FROM RC_DTL
WHERE HDR_NBR IN ( SELECT hdr_nbr
FROM rc_hdr
WHERE run_nbr = -9200705
)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 109.99 848.32 139311 53940 8469403 2010561
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 110.00 848.32 139311 53940 8469403 2010561

Misses in library cache during parse: 1
Parsing user id: 1804

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE RC_DTL (cr=33615 pr=112005 pw=31617 time=662796664 us)
2010561 NESTED LOOPS (cr=32863 pr=2501 pw=0 time=24127017 us)
13850 INDEX RANGE SCAN RC_HDR_NDX_4 (cr=64 pr=61 pw=0 time=1440484 us)(object id 239182)
2010561 INDEX RANGE SCAN RC_DTL_NDX2 (cr=32799 pr=2440 pw=0 time=10960085 us)(object id 239179)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 107690 1.23 664.63
log file switch completion 22 0.97 1.34
log buffer space 89 0.98 23.25
latch free 1 0.00 0.00
direct path read temp 31422 0.00 0.10
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00
log file switch (checkpoint incomplete) 48 0.98 45.34
latch: cache buffers lru chain 2 0.00 0.00
log file sync 11 0.97 10.17
********************************************************************************

This takes about 850 seconds to delete 2010561 rows.

NEXT I tried for a different run_nbr just to avoid reading from cache.

DELETE FROM RC_DTL
WHERE ROWID IN (SELECT d.ROWID
FROM rc_dtl d, rc_hdr h
WHERE h.run_nbr = -9200704
AND h.hdr_nbr = d.hdr_nbr
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.02 0 0 0 0
Execute 1 185.92 371.15 228794 2325784 16635836 2003709
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 185.94 371.18 228794 2325784 16635836 2003709

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1804

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE RC_DTL (cr=2325784 pr=228794 pw=0 time=371155531 us)
2003709 NESTED LOOPS (cr=2325012 pr=218262 pw=0 time=206012971 us)
2003709 VIEW VW_NSO_1 (cr=185494 pr=149273 pw=0 time=79763625 us)
2003709 SORT UNIQUE (cr=185494 pr=149273 pw=0 time=69745071 us)
2003709 HASH JOIN (cr=185494 pr=149273 pw=0 time=26094059 us)
13807 INDEX RANGE SCAN RC_HDR_NDX_4 (cr=60 pr=58 pw=0 time=41677 us)(object id 239182)
78332749 INDEX FAST FULL SCAN RC_DTL_NDX2 (cr=185434 pr=149215 pw=0 time=313331158 us)(object id 239179)
2003709 TABLE ACCESS BY USER ROWID RC_DTL (cr=2139518 pr=68989 pw=0 time=147045392 us)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.00
db file sequential read 86210 1.36 163.91
db file scattered read 2487 0.80 18.83
latch: object queue header operation 1 0.00 0.00
latch free 8 0.00 0.00
log file switch completion 17 0.97 2.71
buffer exterminate 1 0.14 0.14
log buffer space 21 0.97 5.70
********************************************************************************



This took about 370 seconds and deleted 2003709 rows.

From tkprof, the second one has done more logical I/O than the first one but still completed much faster.
I believe it is because of the db file scattered reads rather than the sequential reads of the index.


In production the first statement takes about 20 to 23 minutes during busy hours to delete approx 1.8 to 2 million rows.
I am wondering whether it is the right approach to change the statement to the second one, as it is faster in spite of the huge logical I/O.
Also, any thoughts on the physics of the operation would be helpful.

Thank you very much
CT


Tom Kyte
July 30, 2007 - 5:45 pm UTC

one of them does a ton less physical IO, will that always be the case or did you get lucky.

the first one - less cpu, more elapsed (because of more physical io)
second one - lots more cpu, less elapsed (because of less physical io)

is the difference in physical io totally reproducible.

Physical I/O

A reader, July 30, 2007 - 6:16 pm UTC

Hi Tom,
Yes, it is always the case. I have repeated this for different run numbers and it is consistent. When I look through Enterprise Manager, the first statement waits on db file sequential read I/O. For the second I don't see much waiting for db file sequential read; I see scattered reads.

I thought it was the scattered reads (I believe these do multiblock I/O) in the second statement that brought the physical reads down, or am I wrong?

Waiting for your advice on this.

Thank you
CT


Tom Kyte
August 02, 2007 - 9:11 am UTC

seems pretty straightforward - one of them uses the cache a lot (more cpu) but as a result is fortunate to not have to do as much physical IO

and the physical IO is the gating factor for this particular query.


so, if you have the cpu to burn, and the additional latching is not causing other things to go south - you have a solution....

Thank you

A reader, August 02, 2007 - 10:52 am UTC

Thank you Tom. Yes Physical I/O is the problem.

Thanks
CT

bulk delete on table without index

Markus, April 30, 2008 - 11:56 am UTC

hi tom,

we are using Oracle Warehouse Builder (OWB). I had an OWB session that was obviously doing a full table scan (FTS) forever. The code was generated by OWB.

statement:
DELETE
FROM "DS_CASHFLOWAMOUNT"
WHERE ("DS_CASHFLOWAMOUNT"."LIACASHFLOW" IS NULL
AND :B1 IS NULL
OR "DS_CASHFLOWAMOUNT"."LIACASHFLOW" = :B2 )
RETURNING ROWID INTO :O0

level 12 trace:
...
WAIT #25: nam='db file scattered read' ela= 172 file#=6 block#=126229 blocks=16 obj#=71316 tim=2125239483246
WAIT #25: nam='db file scattered read' ela= 174 file#=6 block#=126245 blocks=16 obj#=71316 tim=2125239485409
WAIT #25: nam='db file scattered read' ela= 170 file#=6 block#=126261 blocks=16 obj#=71316 tim=2125239487599
WAIT #25: nam='db file scattered read' ela= 174 file#=6 block#=126277 blocks=16 obj#=71316 tim=2125239489829
...

some session stats:
...
physical read total IO requests
4162543
physical read total multi block requests 3722351
physical read total bytes 877336215552
physical reads 53548353
physical reads cache 53548353
physical read IO requests 4162544
physical read bytes 877336215552
physical reads cache prefetch 49385824
table scan rows gotten 18613319546
...

the table is rather small:
BLOCKS NUM_ROWS LAST_ANALYZED EXTENTS
---------- ---------- ------------------- ----------
30547 9109267 04.04.2008 22:02:55 133

i found a bulk delete in the code generated by owb:
...
FORALL i IN "LCF_CASHFLOWAMOUNT_si".."LCF_CASHFLOWAMOUNT_i" - 1
DELETE FROM
"DS_CASHFLOWAMOUNT"
WHERE

("DS_CASHFLOWAMOUNT"."LIACASHFLOW" is null and "LCF_CASH_2_LIACASHF"(i) is null or "DS_CASHFLOWAMOUNT"."LIACASHFLOW" = "LCF_CASH_2_LIACASHF"(i))
RETURNING ROWID BULK COLLECT INTO get_rowid;
...

there is no appropriate index on this table. I assume the delete is done with the default array size of 100 rows. So if I were to bulk-delete, say, 100,000 rows, the database would do 1,000 full table scans due to the missing index? I also assume I would see this as 1 execution in the v$sql stats, since it is a single statement executed from PL/SQL. Is this correct?

I will create a composite index on "DS_CASHFLOWAMOUNT"."LIACASHFLOW" plus the primary key, because of the "IS NULL" predicate. I think this should reduce the number of full table scans. True?




Tom Kyte
April 30, 2008 - 12:56 pm UTC

unless you have an index that can and should be used, each "delete" (if you have an array size of 100, that is 100 deletes) will full scan the table, yes.

indexing, if appropriate, can reduce the full scans.
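
For illustration, the composite index Markus describes might look something like the following ("ID" is only a stand-in for whatever the primary key column of DS_CASHFLOWAMOUNT actually is):

-- hypothetical column name "ID"; adding a not-null column after LIACASHFLOW
-- ensures rows with a NULL LIACASHFLOW still appear in the index
create index ds_cashflowamount_ix1 on "DS_CASHFLOWAMOUNT" ("LIACASHFLOW", "ID");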


Delete many records at once

Dawar, May 02, 2008 - 6:36 pm UTC

Tom,

This is for bulk delete question.

Assume we have table called emp.

table description:

SQL> desc emp
 Name                      Null?    Type
 --------- ----------------------------
 EMPNO                 NOT NULL NUMBER(6)
 DIVISION              NOT NULL NUMBER(3)
 JOB_NO                NOT NULL NUMBER(4)
 START_DATE            NOT NULL DATE

select * from emp;
EMPNO--------DIVISION--------JOB_NO------------START_DATE
1111---------011-------------8181--------------04/10/1999
1111---------011-------------8181--------------04/10/2004
2222---------011-------------3131--------------05/11/2005
3333---------022-------------8181--------------06/09/2001
3333---------044-------------8181--------------06/08/1988
5555---------011-------------8066--------------01/01/2001
6666---------033-------------9600--------------01/01/1999
7777---------044-------------8181--------------06/24/1996
7777---------033-------------8181--------------12/02/1991
7777---------033-------------8181--------------03/01/2002
9999---------044-------------9191--------------03/05/1980
9999---------055-------------9191--------------03/06/1989


My goal is to delete the records for those employees that have multiple rows with JOB_NO equal to 8181 (JOB_NO = 8181), removing the rows with the newer start dates.
We need to keep only one record per employee for JOB_NO 8181 - the one with the oldest start_date.

Here is the delete statement for the single record.

delete from emp
where empno = 7777 and job_no = 8181 and start_date in ('03/01/2002','06/24/1996');

So how could I delete thousands of records from the table with this logic?

After deleting multiple records table should be as below:

select * from emp;
EMPNO--------DIVISION--------JOB_NO------------START_DATE
1111---------011-------------8181--------------04/10/1999
2222---------011-------------3131--------------05/11/2005
3333---------044-------------8181--------------06/08/1988
5555---------011-------------8066--------------01/01/2001
6666---------033-------------9600--------------01/01/1999
7777---------033-------------8181--------------12/02/1991
9999---------044-------------9191--------------03/05/1980
9999---------055-------------9191--------------03/06/1989



Dawar
LA, USA


Tom Kyte
May 03, 2008 - 8:44 am UTC

this would have been easy to demonstrate

had I had a create table
and insert into statements....
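
(For what it is worth, a minimal sketch of the kind of delete being described - based only on the sample rows shown above, since no create table/insert scripts were given:

delete from emp e
 where e.job_no = 8181
   and e.start_date > ( select min(e2.start_date)
                          from emp e2
                         where e2.empno = e.empno
                           and e2.job_no = e.job_no );

This keeps, per employee, only the row with the oldest start_date for job_no 8181 and leaves all other job numbers untouched.)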


Delete on Fact tables.

Ramakrishnan.A, May 04, 2008 - 6:05 am UTC

Hi,

Our data warehouse system is expected to have ~4,500 fact tables; each table will collect ~500 rows every minute.
Every day at midnight we run a delete job, which is supposed to delete the data older than 15 days.

Now, when we test the system, the delete task is not working, even for 100 tables :-)

One big mistake we are making is that we trigger 100 deletes in parallel; so 100 db connections at the same moment plus 100 tables being read and deleted at the same time.
We decided to change this to call the delete tasks in sequence.

I guess this still will not solve the problem, so I am thinking of trying the following things:

1. We may need to create an index on the date column. (Many of the read operations are also based on the date, so this will help reads anyway.)

2. If we create the fact tables with no logging and no rollback, will it help? This would improve insert performance too.
There is no business logic in these tables; it is always insert, read and delete.

Can I have fact tables with no logging and no rollback, and if I do, what problems will I have?

Can you suggest what I should try?

Best regards
Ramakrishnan.A
Tom Kyte
May 04, 2008 - 10:35 am UTC

umm, well, first - 4,500 fact tables - wow..

second, why would you not just partition this data by date and drop/truncate old partitions.

deleting this data is going to obviously be painfully slow - not the right way to even think about the problem. You should have partitioning in place here for this, no question.


you can have nologging on the tables, but with a delete - it will be "not useful", nologging works with direct path operations only (direct path loads, alter table move, create table as select, insert /*+ APPEND */ - not for delete, update, insert and the like)
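
A rough sketch of the drop-old-data-by-DDL idea (table and column names here are invented; the real fact tables were not shown):

create table fact_sample
( sample_time  date   not null,
  device_id    number,
  reading      number
)
partition by range (sample_time)
( partition p_w18 values less than (date '2008-05-05'),
  partition p_w19 values less than (date '2008-05-12'),
  partition p_w20 values less than (date '2008-05-19') );

-- purging old data then becomes DDL instead of a huge delete:
alter table fact_sample drop partition p_w18;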

Delete on Fact tables

Ramakrishnan.A, May 04, 2008 - 12:37 pm UTC

Thanks for your reply,

I have now checked my system specification with my colleagues; it is 1,500 tables max, and each table might get ~100 to 500 rows per minute.
As you suggested, I will partition each table by week and drop the partitions that are older than 3 weeks.

Between truncate and drop, which is better to use? Dropping a partition frees the space, so I would prefer that.
Our intention is that the system should not keep growing over time.

Best regards
Ramakrishnan.A
Tom Kyte
May 04, 2008 - 2:45 pm UTC

if you want the data to "go away" - drop will be the way to go.


local indexes will be just fine

any global indexes - you'll need to think about. If you just "drop", they go invalid and you have to rebuild them. If you drop and maintain indexes - the drop will take longer (but so what), the redo and undo for the table will not have to be generated - but the undo and redo to maintain the indexes will.

remember - you have to DESIGN things into your system, partitioning - must be designed, thought about and tested. Do not just "turn it on" and expect good things. Understand the obvious side effects that will happen when you do turn it on.
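
Continuing the hypothetical fact_sample example from above, the two variants would look like:

-- any global indexes are left unusable and must be rebuilt afterwards:
alter table fact_sample drop partition p_w18;

-- or maintain the global indexes during the drop (slower drop, indexes stay usable):
alter table fact_sample drop partition p_w19 update global indexes;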



Delete many records at once

Dawar, May 05, 2008 - 3:14 pm UTC

Sorry I did not provide you with the create table & insert scripts (above).
But the issue has been solved.

Here is the link.

http://forums.oracle.com/forums/thread.jspa?threadID=651006&tstart=0

Mitch Sacks, January 17, 2010 - 10:52 pm UTC

Tom,

I wanted to run this by you since I didn't see an example of this with the use of LIMIT (I might have missed it). Assume indexes and foreign keys are dropped before the deletes take place; I just didn't include that to keep the example clear. Secondly, I want to get the fastest delete performance possible, which is why I'm trying to do it with BULK COLLECT and FORALL. This code is for a data warehouse and some of the tables are huge. The table in the example below presently has 50 million records. The code in Example 1 surpasses the performance of anything else I've tried, and deletes the records in question in about 15 minutes.

EXAMPLE 1
DECLARE TYPE tt_delete IS TABLE OF ROWID; -- declare collection type
t_delete tt_delete;                       -- declare the collection
 
CURSOR c_delete IS
SELECT ROWID
  FROM ps_al_chk_memo
 WHERE check_dt  >= (SELECT MAX(check_dt) - 90
                       FROM ps_al_chk_memo);
 
l_delete_buffer PLS_INTEGER := 5000;
 
BEGIN
    dbms_output.put_line('STARTING: '||to_char(sysdate, 'MM/DD/YYYY HH24:MI:SS'));

    OPEN c_delete;
    LOOP
        FETCH c_delete BULK COLLECT INTO t_delete LIMIT l_delete_buffer;
        
        FORALL i IN 1..t_delete.COUNT
            DELETE ps_al_chk_memo
             WHERE ROWID = t_delete (i);
             
            COMMIT;
        EXIT WHEN c_delete%NOTFOUND;
        COMMIT;
    END LOOP;
    CLOSE c_delete;

    dbms_output.put_line('FINISHED: '||to_char(sysdate, 'MM/DD/YYYY HH24:MI:SS'));
END;
/ 

EXAMPLE 2
Performing the delete in this fashion consistently takes about 30 minutes.
DELETE FROM ps_al_chk_memo 
 WHERE check_dt >= (SELECT MAX(check_dt) - 90 
                      FROM ps_al_chk_memo);


Tom Kyte
January 18, 2010 - 5:40 pm UTC

care to share the plan for the single delete and a tkprof (with actual row counts) if possible.

also for the sql in the procedural code if you can.
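
(For anyone following along, one common way to capture such a tkprof - a sketch, with illustrative file names:

SQL> exec dbms_monitor.session_trace_enable( waits => true, binds => false );
SQL> -- run the delete here
SQL> exec dbms_monitor.session_trace_disable;

$ tkprof orcl_ora_12345.trc delete_report.txt sys=no

The trace file and report names are made up; the real trace file lands in the database's user dump / diagnostic trace directory.)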

Delete Performance

Venkat, March 17, 2012 - 3:00 pm UTC

Hi Tom,

Apart from create table as select or partitioning the tables, could you please let me know what is the efficient way of deleting many rows from a table.

For example, we are deleting around 300K rows from a table (with 1.8M rows) every week using some conditions. It's taking around 5 hours to delete that many rows. Below is a sample script (for security purposes I have edited the original query):

delete from <table> where <ROWID> in (SELECT ROWID FROM <subquery> sq, <tab1> t1 where <condition>)

Please note that the select query itself completes in 1 minute. It's only the delete that takes a long time. My concern is that there are around 300M buffer gets on the delete statement. Why would there be that many buffer gets?

There is one primary key on 2 columns in the table.

If you could please throw some light on this it would be great.

dbversion: 11.1.0.7

Sorry if I missed providing any more data.

Thank you









Tom Kyte
March 19, 2012 - 9:58 am UTC

could you please
let me know what is the efficient way of deleting many rows from a table.


The most efficient way to do something is to NOT do it. That said, the alternatives are a create table as select or (even better, much much better) partitioning.

A delete is perhaps the most resource intensive SQL statement ever. It has to log the most stuff in undo (the entire before image of the entire row). It touches pretty much every single index on the table. It generates the most undo - and redo. It does a ton of work. It affects free space algorithms (does the block go back on the freelist or not).


Why there would be those many buffer gets?

index maintenance. undo maintenance. Hopefully a full scan (if not, you really want it to be - you do not want to do 300k table accesses by index rowid)


Why aren't the alternatives available to you?

Delete Performance

Venkat, March 17, 2012 - 3:39 pm UTC

Hi Tom,

Adding some information to the above. I just checked and the delete is in fact running deletes on 8 child tables (foreign key references).

Can you please let me know what would be the best way to delete the rows in this case. Appreciate your help.

Thank you
Tom Kyte
March 19, 2012 - 10:01 am UTC

hah, there you go.

300k records - each of which has to perform 8 index range scans and table access by index rowids - to delete a small subset of child records. 300,000 times - slow by slow at its very best.

On delete cascade - fine for a single row OLTP update.
On delete cascade - death for a batch operation as you are trying to do.

I'd get rid of the on delete cascade and delete from the child tables yourself IN BULK - one delete per child table to get rid of them and then a delete against the parent.

Or, change the way you process entirely so as to make this operation that is killing you efficient.

Delete Performance

Venkat, March 19, 2012 - 4:39 pm UTC

Thank you very much Tom.

"I'd get rid of the on delete cascade and delete from the child tables yourself IN BULK - one delete per child table to get rid of them and then a delete against the parent."

What I understand from the above statement is: delete the rows from the child tables and disable the constraints before deleting against the parent? If so, during the delete what happens to the regular inserts that run in parallel on these tables (possible data corruption)? Please confirm.

And if we do not disable the constraints, it will still check whether the data is there or not, which would have a bit of overhead, correct?

Appreciate all your help!!!
Tom Kyte
March 19, 2012 - 9:22 pm UTC

No, I didn't say drop the constraint, I said get rid of the "on delete cascade" option.


get rid of on delete cascade (slow by slow processing, good for oltp, horrible for batch)

delete from child
delete from parent
commit



but - it will still take hours, really - it will and it should and there isn't anything that will change that.
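
A rough sketch of that shape of processing (table and column names are placeholders, not the poster's real schema):

-- one bulk delete per child table, then the parent, in one transaction
delete from child_table_1 where parent_id in ( select parent_id from parent_table where <your criteria> );
delete from child_table_2 where parent_id in ( select parent_id from parent_table where <your criteria> );
-- ... one such delete for each of the 8 child tables ...
delete from parent_table where <your criteria>;
commit;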

delete Million rows

Kishore, April 13, 2012 - 6:51 pm UTC

Hi Tom,

We are deleting records (around 1M) older than 90 days from a staging table using the delete statement below (calling it through PL/SQL since we need to commit after every 50K records).

delete from cust_etl_stage where is_active='N' and status='P' and rec_date<sysdate-90 and rownum<50000;

Plan for single execution:
-------------------------------------------------------------
| Id | Operation | Name | Rows |
-------------------------------------------------------------
| 0 | DELETE STATEMENT | | 50K|
| 1 | DELETE | CUST_ETL_STAGE | |
|* 2 | COUNT STOPKEY | | |
|* 3 | INDEX RANGE SCAN| CUST_ETL_STG_IDX6 | 10M|
-------------------------------------------------------------

Table size is 35 GB. Number of rows: 50M.

There are around 6 indexes on this table, and one of the indexes has all the columns listed in the where clause. The order of columns in that index is rec_date, status, is_active (most selective to least selective). There aren't any child tables for this table.

It's taking around 1 hour to delete 1M rows from this table. Can you please let me know what we can do to make it faster?

Appreciate all your help.
Tom Kyte
April 14, 2012 - 11:03 am UTC

I would suggest


create table new_cust_etl_stage
as
select * from cust_etl_stage
where NOT( is_active = 'N' and status = 'P' and rec_date < sysdate-90);

drop table cust_etl_stage;
rename new_cust_etl_stage to cust_etl_stage;


I'm not being sarcastic, I'm not joking, I'm dead serious.



<Order of columns in the index is,
rec_date,status,is_active(high selective to low selective)


and you do realize that selectivity has nothing to do with the order of columns in an index right?

Say c1 has two values. say c2 has 1,000,000 values. say table t has 100,000,000 rows.

If I create the index on (c1,c2) or (c2,c1) how many IO's would it take to process the where clause:

where c1 = ? and c2 = ?

would it be different? No, the height of the index would basically be the same for (c1,c2) or (c2,c1) - it would take THE SAME AMOUNT OF WORK!!!


Now, if the predicate was

where c1 = ? and c2 > ?

would the order matter? sure would - having c2 first would be a HUGE MISTAKE. Why? Well, lets say c2 is in the range of 1.. 1,000,000 - if you had an index on (c2,c1) and you queried:

where c1 = 1 and c2 > 999,999

then that index would be pretty efficient - it would look at all of the c2 > 999,999 from the index - and then inspect all of the c1 values (it could not use the index to process the c1=1 part as an ACCESS method, it would have to filter it).

Now if the where clause was:

where c1 = 1 and c2 > 500,000

it would have to look at every index entry such that c2 was > 500,000 and see if c1 = 1 (filter). It would look at HALF of the index.


Now, if the index were on c1,c2 - and the predicate was as above, it would go right to c1=1, c2>500,000 and ONLY INDEX RANGE scan those values - period, it would not see a single c1 value that was not 1, it would not see a single c2 value that was less than or equal to 500,000. It would only index range scan the rows that it should.


Your index is completely backwards for your predicate. Put things you use = on first, things you use > < etc on last.
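
Applied to the predicate from the question (is_active = 'N' and status = 'P' and rec_date < sysdate - 90), that would suggest an index shaped like the following - a sketch only, and the index name is made up:

create index cust_etl_stg_ixN on cust_etl_stage (is_active, status, rec_date);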


Also, your last statement:

it's taking around 1 hour....


It is going to - take at least that long (I'm surprised it is that fast to be honest!)


Think about it - you have 1,000,000 rows to retrieve and delete via an index. We might have to do as many as 1,000,000 physical IO's to retrieve those rows via an index. A physical IO takes about 5ms on a good day on average (a random IO).

Well, 1,000,000 * 5ms is about 1 and 1/3 hours of time on IO.


Try the create table as select, you can do that parallel, nologging - it generates no undo by default. Give it a try, you might be very surprised.

A reader, April 14, 2012 - 11:59 am UTC

Hi Tom. Can following solution be an option if we could not drop/rename original table?

1. select the ROWIDs for those to-be-deleted rows in order, and store them in a temporary table;
2. do bulk delete based on rowid.

Thanks
Tom Kyte
April 15, 2012 - 7:00 am UTC

you would be doing 1,000,000 rowid lookups - 1,000,000 single block IO's.

It would take as long, if not longer than what you are currently doing.

You'd have to run a query, and then delete.



delete Million rows

kishore, April 14, 2012 - 1:08 pm UTC

Thank you very much for the great explanation.

"Create table as select" is a very good option. We do it occasionally.

In this case, we will be scheduling a job to purge the records older than 90 days daily, and this table cannot be taken offline. Sorry, I should have mentioned that beforehand.

I think we are left with deleting it the same way.

Could you please let me know,
1) What happens if there is an "OR" condition in the where clause for one of the columns and all the columns (in the where clause) are indexed?

Ex., create index test1_idx on test1(cola,colb,colc);

delete from test1 where ((cola=2 and colb='A') or cola=5) and colc>50000;

Assuming cola has 3 distinct values, colb has 2 distinct values, and colc is not null; the table has 1M rows.

2) The link below made me think that selectivity does matter for column order. Maybe I am wrong, or maybe it won't be the case in our situation (as we are only talking about this particular delete). Please clarify.

http://richardfoote.wordpress.com/2008/03/10/index-skip-scan-does-index-column-order-matter-any-more-warning-sign/

As always, thank you very much for your wonderful suggestions.

Tom Kyte
April 16, 2012 - 1:02 am UTC

*partitioning*

why isn't partitioning part of your solution. drop would become your purge. It would be measured in milliseconds.


1) if colc has the values 1..1,000,000 - I don't want to use an index for colc > 50,000 probably.

but assuming it would make sense to use an index


ops$tkyte%ORA11GR2> create table t as select * from all_objects where 1=0;

Table created.

ops$tkyte%ORA11GR2> exec dbms_stats.set_table_stats( user, 'T', numrows =>  1000000, numblks => 100000);

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> create index t_idx1 on t(owner,last_ddl_time,object_type);

Index created.

ops$tkyte%ORA11GR2> set autotrace traceonly explain
ops$tkyte%ORA11GR2> delete from t where ((owner='SCOTT' and object_type='TABLE') or owner='A') and last_ddl_time > sysdate-50;

0 rows deleted.


Execution Plan
----------------------------------------------------------
Plan hash value: 3054501207

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |        | 50000 |  1806K|     0   (0)| 00:00:01 |
|   1 |  DELETE            | T      |       |       |            |          |
|   2 |   CONCATENATION    |        |       |       |            |          |
|*  3 |    INDEX RANGE SCAN| T_IDX1 |   500 | 18500 |     0   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN| T_IDX1 | 49500 |  1788K|     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OWNER"='A' AND "LAST_DDL_TIME">SYSDATE@!-50)
   4 - access("OWNER"='SCOTT' AND "LAST_DDL_TIME">SYSDATE@!-50 AND
              "OBJECT_TYPE"='TABLE')
       filter("OBJECT_TYPE"='TABLE' AND LNNVL("OWNER"='A'))

ops$tkyte%ORA11GR2> set autotrace off




2) hah, that is giving a reason for putting the LEAST selective first!!!!

your theory is "most selective" first.


If all things are equal - there are arguments for putting the LEAST SELECTIVE first, there are NO arguments for putting the most selective first!!!


reasons for putting least selective first include:

a) index key compression, the index would compress very very well. If you have an index on "t(owner,object_type,object_name) compress 2", that would compress very well since owner,object_type repeat a lot. If you have an index on (object_name,object_type,owner) - it would not be compressible at all since object_name is nearly unique.

b) index skip scans, they only work when the leading edge of the index is very repetitive - non selective.


But this is only when "all things are equal"


If you have a predicate "where a = ? and b = ?" there are arguments for putting the least selective first, not so the other way around so much.


delete million rows

Kishore, April 15, 2012 - 12:28 pm UTC

That's very nice explanation. Thank you very much Tom.

I am not sure how partitioning would help in cases like this - deleting with certain criteria (for example, is_active, status, rec_date in the where clause). Let's suppose we partition by rec_date; we can't really drop the partitions directly, correct? Or do you mean it would help with partition elimination?

In your example above (where 2 indexes were created), I don't see the 2nd index being used in the plan. Can you please clarify?

Thank you very much for all your help.


Tom Kyte
April 16, 2012 - 1:03 am UTC

subpartitioning would have to be used.

I removed the two index example, I goofed - it didn't work like I wanted it to - the single index was the way to go...

A reader, April 16, 2012 - 3:48 am UTC

Hello Sir,

With respect to your example above, I have pasted the filtering information below:

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("OWNER"='A' AND "LAST_DDL_TIME">SYSDATE@!-50)
4 - access("OWNER"='SCOTT' AND "LAST_DDL_TIME">SYSDATE@!-50 AND
"OBJECT_TYPE"='TABLE')
filter("OBJECT_TYPE"='TABLE' AND LNNVL("OWNER"='A'))


Can you please explain why a filter is applied?


Thank you,


Tom Kyte
April 16, 2012 - 7:32 am UTC

because we can use the index as an access method for "owner=, last_ddl_time>" in a range scan - but we cannot use it as an access method for object_type, since we have the last_ddl_time > ? bit in there. The object_types are not "sorted" anymore within that range scan since we are hitting multiple last_ddl_times. We can access through owner, last_ddl_time but then have to filter all of the object_types we see.

index column order

A reader, April 16, 2012 - 10:54 am UTC

Hi Tom,

In your example above, why do we have "object_type" at the end of the index definition?

Thank you
Tom Kyte
April 16, 2012 - 4:07 pm UTC

so the filter can be done in the index without having to hit the table to figure out if the row 'fits the predicate' or not.

what if owner = ? and last_ddl_time > ? matched 5,000,000 rows - but only one of them was a table...

I wouldn't want to go to the table 4,999,999 times just to be disappointed, I'd rather scan through the index (which I'd have to anyway regardless) and only hit the table once.

delete performance poor

Edgy, April 30, 2013 - 1:15 am UTC

Hi Tom,

I have been suffering from poor delete performance for a long time. In fact, I have reviewed all the related Q&A and comments on the web. However, the issue is still there - at least, I am not satisfied with the performance yet.

I tried a single delete SQL statement and bulk delete in PL/SQL as well. The bulk delete was tried with 1K to 10K rows per loop. Deleting 400K rows takes anywhere from around 400 seconds up to 7000+ seconds - the results vary a lot - but usually 400K took 1500+ seconds. I believe it should be better. (The whole DB has 20-40 million records.)

We have two tables; the delete in question removes rows from the child table, so I assume there is no unindexed foreign key issue. In any case, I am sure the foreign key is indexed.

I did a trace with the single delete statement. Here is the result.

1. Delete rows in child table.
********************************************************************************

SQL ID: 80xyr9wpnxk0x Plan Hash: 1455046954

DELETE FROM FCS_TMETADATA
WHERE
BATCH='KAR' AND DATE='20120920'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 40.67 1503.86 137785 18068 2376235 447830
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 40.68 1503.86 137785 18068 2376235 447830

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 158 (TRANSIENT)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE FCS_TMETADATA (cr=18320 pr=137785 pw=0 time=633122074 us)
447830 447830 447830 INDEX RANGE SCAN TMETADATA_PK (cr=5996 pr=5995 pw=0 time=14858725 us cost=5877 size=58367004 card=399774)(object id 185304)


Rows Execution Plan
------- ---------------------------------------------------
0 DELETE STATEMENT MODE: ALL_ROWS
0 DELETE OF 'FCS_TMETADATA'
447830 INDEX MODE: ANALYZED (RANGE SCAN) OF 'TMETADATA_PK' (INDEX
(UNIQUE))


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
Disk file operations I/O 2 0.00 0.00
db file sequential read 137780 0.80 1468.62
log file switch (private strand flush incomplete)
3 0.09 0.17
db file parallel read 1 0.02 0.02
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.31 0.31
********************************************************************************

From the trace, I noticed the clustering factor is very high in my case. However, it looks like I am not able to do much to decrease the clustering factor, because there are hundreds of jobs adding files (records) to Oracle simultaneously. I think this is one of the key reasons why the clustering factor is so high.

So, my question is: in this case, what can I do to get better delete performance?

Really appreciate any help.

Thanks,
Edgy
Tom Kyte
April 30, 2013 - 3:25 pm UTC

it is all about physical IO here, you are doing 137,780 single block reads (your data is scattered all over the place)

that the foreign key is indexed is part of your performance problem (and likely a necessary evil - meaning, don't just drop it). A delete is going to maintain every index on that table - so to delete you will:

read a few hundred/thousand index blocks to find the rows

read almost half a million database blocks from the table

for every index on that table, you'll do 3 or 4 IO's for every single row to find the index key entry to be removed.


I would suggest partitioning this data by batch and the date, and you can then just truncate/drop the data - no undo, no redo, if the indexes are local - no index maintenance
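
For illustration only, the sort of structure that suggestion points at - every name and datatype below is a guess, since the real DDL was not posted:

create table fcs_tmetadata_demo
( batch        varchar2(10) not null,       -- hypothetical datatype
  load_date    date         not null,       -- standing in for the date column in the predicate
  file_id      number,
  file_size    number,
  file_md5sum  varchar2(32),
  file_adler32 varchar2(16)
)
partition by range (load_date)
subpartition by list (batch)
subpartition template
( subpartition sp_kar   values ('KAR'),
  subpartition sp_other values (default) )
( partition p_2012_09_20 values less than (date '2012-09-21') );

-- the purge of one batch/day then becomes DDL, e.g.:
alter table fcs_tmetadata_demo truncate subpartition p_2012_09_20_sp_kar;

With local indexes, that truncate does no row-by-row index maintenance at all.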

delete performance poor

Edgy, May 01, 2013 - 12:50 am UTC

Hi Tom,

Thank you very much for the suggestion.

Definitely, we have to go with partitioning. It looks like that is the final solution. However, I am still wondering: in a high clustering-factor situation, are there any possible solutions besides partitioning the table?

You might wonder why I ask this. In this design we have parent-child tables (two tables). The previous question was about deletion from the child table. After those records are deleted, another process removes the rows in the parent table that are no longer referenced by the child table. For that step the delete performance is also poor. However, the parent table has no indicator columns like the child table's (batch and date), so it is hard to partition it. So what do you suggest then?

Here is the trace.

Delete rows from the parent table. (In this case, I only delete the first 10K rows.)
********************************************************************************

DELETE FROM fcs_tfiledata
where rowid in
( SELECT
/* + FIRST_ROWS(10000) */
f.rowid
from
fcs_tmetadata m
RIGHT OUTER JOIN
fcs_tfiledata f
ON
m.file_size = f.file_size AND
m.file_md5sum = f.file_md5sum AND
m.file_adler32 = f.file_adler32
WHERE
m.file_id IS NULL
and rownum < 10000
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 2 32.57 317.44 180756 321172 99531 9999
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 32.58 317.45 180756 321172 99531 9999

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 158 (TRANSIENT)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
0 0 0 DELETE FCS_TFILEDATA (cr=321175 pr=180756 pw=257362 time=317444735 us)
9999 9999 9999 NESTED LOOPS (cr=321128 pr=179896 pw=257362 time=343346121 us cost=756569 size=71 card=1)
9999 9999 9999 VIEW VW_NSO_1 (cr=311341 pr=170497 pw=257362 time=239476012 us cost=756567 size=119988 card=9999)
9999 9999 9999 SORT UNIQUE (cr=311341 pr=170497 pw=257362 time=239472466 us)
9999 9999 9999 COUNT STOPKEY (cr=311341 pr=170497 pw=257362 time=240992140 us)
9999 9999 9999 FILTER (cr=311341 pr=170497 pw=257362 time=240989352 us)
49390 49390 49390 HASH JOIN RIGHT OUTER (cr=311341 pr=170497 pw=257362 time=249539927 us cost=756567 size=8615996784 card=79777748)
28891868 28891868 28891868 INDEX FAST FULL SCAN FCS_TMETA_FILE_IDX (cr=304782 pr=114245 pw=0 time=201584436 us cost=74884 size=1469039343 card=28804693)(object id 186318)
113223 113223 113223 INDEX FAST FULL SCAN TFILEDATA_PK (cr=6559 pr=5381 pw=0 time=4392450 us cost=282089 size=5463736539 card=95855027)(object id 77928)
9999 9999 9999 TABLE ACCESS BY USER ROWID FCS_TFILEDATA (cr=9787 pr=9399 pw=0 time=66994693 us cost=1 size=59 card=1)


Rows Execution Plan
------- ---------------------------------------------------
0 DELETE STATEMENT MODE: ALL_ROWS
0 DELETE OF 'FCS_TFILEDATA'
9999 NESTED LOOPS
9999 VIEW OF 'VW_NSO_1' (VIEW)
9999 SORT (UNIQUE)
9999 COUNT (STOPKEY)
9999 FILTER
49390 HASH JOIN (RIGHT OUTER)
28891868 INDEX MODE: ANALYZED (FAST FULL SCAN) OF
'FCS_TMETA_FILE_IDX' (INDEX)
113223 INDEX MODE: ANALYZED (FAST FULL SCAN) OF
'TFILEDATA_PK' (INDEX (UNIQUE))
9999 TABLE ACCESS MODE: ANALYZED (BY USER ROWID) OF
'FCS_TFILEDATA' (TABLE)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 27600 0.42 164.35
asynch descriptor resize 8 0.00 0.00
db file scattered read 12953 0.40 88.30
Disk file operations I/O 1 0.00 0.00
CSS initialization 1 0.00 0.00
CSS operation: query 6 0.00 0.00
CSS operation: action 1 0.00 0.00
direct path write temp 3016 0.64 17.58
direct path read temp 1640 0.16 13.59
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 8.22 8.22
********************************************************************************

Based on this SQL statement, I also tried a PL/SQL bulk delete - delete 10K rows and commit. Because of the high clustering factor, in the PL/SQL version I chose to sort the rowids so that the deleted rows are, as far as possible, in the same blocks. However, the performance is still not as good as I expected. It is about 70 to 500 rows per second. (Not sure why; the performance was not very stable across my many tests.)

Thank you very much again. I really appreciate your comments!!!

Edgy
Tom Kyte
May 06, 2013 - 1:36 pm UTC

why commit? just to make it slower?


the two major problems with your delete above

a) there is a foreign key to the parent table and even if there isn't any data in the child table, we have to verify that and prevent data from being inserted - for every row, one. at. a. time. slow by slow - over and over and over and over again.

b) you are full scanning more than 28,000,000 rows every time you do this delete. and you do this delete over and over and over and over because you THINK you want to commit every 10k rows. So we end up reading and re-reading that index fast full scan over and over and over and over and ..............


just do a simple delete if you are going to persist with delete (partitioning is the right approach here)

I don't want to rewrite this delete right away - because it is very strange:

DELETE FROM fcs_tfiledata
 where rowid in
           ( SELECT
                /* + FIRST_ROWS(10000) */
                f.rowid
                from
                        fcs_tmetadata m
                    RIGHT OUTER JOIN
                        fcs_tfiledata f
                    ON
                        m.file_size = f.file_size AND
                        m.file_md5sum = f.file_md5sum AND
                        m.file_adler32 = f.file_adler32
                    WHERE
                        m.file_id IS NULL
                        and rownum < 10000
               )



are you using the right outer join as a "not in" here? I see the m.file_id is null - is file_id really not null in M and you are outer joining to find the rows in F that are NOT IN M?

or does "select * from fcs_tmetadata where file_id is null" return rows as well?

if you are doing "not in" processing, please just

delete from fcs_tfiledata 
  where (file_size,file_md5sum,file_adler32) not in 
        (select file_size,file_md5sum,file_adler32 
           from fcs_tmetadata 
          where file_size is not null 
            and file_md5sum is not null 
            and file_adler32 is not null);


or

delete from fca_tfiledata f where not exists (
select null
  from fcs_tmetadata m
 where                  m.file_size = f.file_size AND
                        m.file_md5sum = f.file_md5sum AND
                        m.file_adler32 = f.file_adler32
)


Big amount deletion performance is poor

Edgy, May 08, 2013 - 3:45 pm UTC

Hi Tom,

Firstly, thank you very much for helping me on this matter!

And very sorry for misleading you. The 10K limit in the previous post was just from an explain plan in my testing - I wondered how long a 10K-row deletion would take.

The actual solution I am using right now is:

1) create a temp table 'to_delete' which contains all the rowids from fcs_tfiledata (outer-joined with the same logic as in my previous statement; it should be changed as you mentioned).

SQL:
 create table to_delete (row_id) nologging parallel as  SELECT
         f.rowid
        FROM fcs_tfiledata f
        WHERE (file_size, file_md5sum,file_adler32)
          not in (select file_size, file_md5sum,file_adler32 from fcs_tmetadata)


2) use bulk delete to delete.

DECLARE
BEGIN
  EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';
END;

And then:
declare
  type obj_rowid is table of ROWID INDEX BY PLS_INTEGER;
  lr_rowid obj_rowid;
  cursor c IS select row_id from to_delete order by row_id desc;

BEGIN
  open c;
  loop
    fetch c BULK COLLECT INTO lr_rowid LIMIT 10000;
    forall i in lr_rowid.FIRST .. lr_rowid.LAST
      DELETE /*+ PARALLEL(fcs_tmetadata) */ from fcs_tfiledata where rowid in (lr_rowid(i));
  dbms_output.put_line(SQL%ROWCOUNT || ' rows are deleted!');
  commit;
  exit when c%NOTFOUND;
  end loop;
  CLOSE c;
END;

3). Drop temp table 'to_delete'.

Why create a temp table? Because during my testing I found that if I selected all the rowids, sorted them, and fed them straight to the cursor in the bulk delete PL/SQL, it took much, much more time. However, creating a temp table with nologging and parallel and then sorting is much, much faster.

Why did I choose 10K? It came out of my testing. I tried 1K, 5K, 10K and even 100K, etc. 10K looked like reasonable performance.


Before having this bulk delete code, we did a simple delete with the outer-join selection. It took more time. At that point the DB probably had more than 100 million garbage rows, so the simple delete never finished even after running for 36 hours. After that, we developed the bulk delete shown above. But the performance still does not satisfy us.

Also tried :
SQL> select * from fcs_tmetadata where file_id is null;

no rows selected

SQL>

The file_id is null only after the outer join. That is my understanding.

Appreciate your help!!!
Edgy

Tom Kyte
May 08, 2013 - 5:16 pm UTC

re-read my answer above, I would just be typing in it again here. Nothing has changed.

I hate slow by slow
This should be a single statement
and that statement should be DDL not DML.



your parallel 'hint' fortunately won't do anything. Think about it, why would you want to go parallel on a single row delete by rowid??????????????? You are doing 10,000 single row deletes with each call - but they are all single row deletes. Do you really want to full scan the table to find one row (by its rowid!!!!)



if you persist with a delete, I gave you the delete to use above....

if you go with DDL, consider partitioning this table if possible so you can just truncate - but I would use DDL regardless:


create table new_table
as
select * from fcs_tfiledata
where (file_size, file_md5sum,file_adler32) IN (select ... from fcs_tmetadata);

drop table fcs_tfiledata;

rename new_table to fcs_tfiledata - add constraints, indexes, grants, etc...



if you have 100,000,000 rows to delete - the create table as select is 100% definitely the way to go. definitely.

Big amount deletion performance is poor

Edgy, May 08, 2013 - 6:32 pm UTC

Hi Tom,

Thanks for your quick response.

Yes, I do remember you have said many times to use a single SQL statement where possible. Here are the reasons why I did not follow that advice:

1) Deleting a big amount of data takes time. From our understanding, it takes more redo/undo space as well. If it is too big, it could run very slowly.

My question: Is that the case, and should we therefore avoid a single delete statement?

2) If the deletion is interrupted before the commit, it takes additional - and even more - time to roll back. This is also something we tried to avoid.

3) "The keyword FORALL lets you run multiple DML statements very efficiently. It can only repeat a single DML statement, unlike a general-purpose FOR loop." "The FORALL statement, a feature of bulk SQL, sends DML statements from PL/SQL to SQL in batches rather than one at a time." (from the Oracle docs) So my understanding was that Oracle optimizes it to run in parallel. (That might be wrong, according to your comments above.)

4) Since a rowid contains file/block/row information, my understanding is that getting those rows should be O(1) or O(n) - in other words, Oracle can get each row very quickly.

My question: Does FORALL run in parallel internally in Oracle?

5) If we run the cleanup on a daily basis, the daily deletion is about 40K to 70K rows (much less than the 10,000,000 when we first faced this issue). That is less than 5% of the rows in the whole database.

My question: Do you think it is still worth using CTAS? My understanding is that it is only worthwhile when more than 20% of the records are being deleted.

And the quickest way to run the CTAS is with the nologging and parallel options. So after the delete and renaming the table back, do we need the DBA to do a backup right away? We do not want any possible data loss.

Very much appreciate your help!!! Sorry for my weak knowledge. :(

Edgy

Tom Kyte
May 08, 2013 - 7:20 pm UTC

1) it takes more overall redo AND undo to do it bit by bit as you are. Your commits generate extra redo, you are actually generating MORE, not less.

if you do it row by row as you are now, that is slow by slow. a single delete is bulk, you cannot get any bulkier than that. DELETE is by far *the slowest of all DML operations*. period, no getting around it. It is not designed for millions of rows, never will be.

2) why would it be interrupted?


3) your understanding is wrong. FORALL sends - in your case - 10,000 rows to the sql layer and the sql layer executes the single row delete statement 10,000 times. it is 10,000 single row operations.

and even if it were not, did you REALLY want to full scan a table in parallel over and over and over again to find a measly 10,000 rows?


forall just limits the number of times plsql calls the sql layer, it passes an array of data to be processed by that sql statement. it is better than calling sql 10,000 times but not as good as just calling it once.

4) in math, when I was getting my degree, one of the first things you learn is:


a big number multiplied by any other number (regardless of how small) is still a BIG NUMBER.

Your big number - the number of times you'll be calling this delete.
Your other number - not really relevant, we are dealing with a big number here.


so what if an individual delete is fast, if you do something fast 100,000,000 times - it is going to take a very very very very very very long time.


5) I think CTAS is the way to go here, yes.

if you do nologging, yes, a backup would be mandatory.


Big amount deletion performance is poor

Edgy, May 09, 2013 - 1:59 am UTC

Hi Tom,

I tried the single SQL statement as you suggested. Here is the explain plan.

delete from fcs_tfiledata f where not exists (
select null
from fcs_tmetadata m
where m.file_size = f.file_size AND
m.file_md5sum = f.file_md5sum AND
m.file_adler32 = f.file_adler32
)

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 218K| 18M| | 978K (1)| 03:15:44 |
| 1 | DELETE | FCS_TFILEDATA | | | | | |
|* 2 | HASH JOIN ANTI | | 218K| 18M| 1228M| 978K (1)| 03:15:44 |
| 3 | TABLE ACCESS FULL | FCS_TFILEDATA | 21M| 978M| | 629K (1)| 02:05:56 |
| 4 | INDEX FAST FULL SCAN| FCS_TMETA_FILE_IDX | 51M| 2181M| | 149K (1)| 00:29:58 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("M"."FILE_SIZE"="F"."FILE_SIZE" AND "M"."FILE_MD5SUM"="F"."FILE_MD5SUM" AND
"M"."FILE_ADLER32"="F"."FILE_ADLER32")

It looks like the performance is not as good as I expected. During the daily run that uses the bulk delete mentioned above - taking yesterday as an example - it deleted 290552 rows in 3248 seconds.

Real testing is under way. I will update later if there is a major timing difference. BTW, statistics are updated every day, so the explain plan should be close enough to reality.

Thanks,
Edgy
Tom Kyte
May 09, 2013 - 1:23 pm UTC

that is perfect - exactly what I'd want to see.

that plan is *awesome*, best you could hope for.

Big amount deletion performance is poor

Edgy, May 09, 2013 - 3:31 pm UTC

Hi Tom,

Sorry, I don't think I get it.

It shows the deletion needs 3 hours and 15 minutes. In fact, it took 6238.763 seconds to finish. That is longer than the bulk-delete PL/SQL method, which takes around 3200-3500 seconds.

Did I make any mistakes???

Thanks,
Edgy
Tom Kyte
May 09, 2013 - 5:05 pm UTC

how long did the query plus the delete take.

Big amount deletion performance is poor

Edgy, May 09, 2013 - 7:34 pm UTC

Single SQL delete statement:
delete from fcs_tfiledata f where not exists (
select null
from fcs_tmetadata m
where m.file_size = f.file_size AND
m.file_md5sum = f.file_md5sum AND
m.file_adler32 = f.file_adler32
)

This statement took 6238.763 seconds last night.

Thanks,
Edgy
Tom Kyte
May 09, 2013 - 7:46 pm UTC

and the thing you are comparing this to. how long did the create table as select AND the plsql block take to remove the same amount of data?

and you are sure the only plan in the shared pool for that delete is two full scans plus hash anti-join right? no index based plans (that would be bad) other than an index fast full scan right?

Big amount deletion performance is poor

Edgy, May 10, 2013 - 3:00 pm UTC

Hi Tom,

Here is the explain plan for the CTAS method. It does not seem any better.

Normal CTAS version:
=====================
explain plan for
create table fcs_tfiledata_2 as
select * from fcs_tfiledata f
where ( f.file_size, f.file_md5sum, f.file_adler32 ) in
(select m.file_size, m.file_md5sum, m.file_adler32 from fcs_tmetadata m);

Plan hash value: 179069729

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 2477K| 418M| | 1078K (1)| 03:35:38 |
| 1 | LOAD AS SELECT | FCS_TFILEDATA_2 | | | | | |
|* 2 | HASH JOIN RIGHT SEMI | | 2477K| 418M| 2775M| 1067K (1)| 03:33:29 |
| 3 | INDEX FAST FULL SCAN| FCS_TMETA_FILE_IDX | 51M| 2181M| | 149K (1)| 00:29:58 |
| 4 | TABLE ACCESS FULL | FCS_TFILEDATA | 21M| 2769M| | 629K (1)| 02:05:56 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("F"."FILE_SIZE"="M"."FILE_SIZE" AND "F"."FILE_MD5SUM"="M"."FILE_MD5SUM" AND
"F"."FILE_ADLER32"="M"."FILE_ADLER32")

Parallel version of CTAS:
=========================
explain plan for
create table fcs_tfiledata_new parallel as
select * from fcs_tfiledata f
where ( f.file_size, f.file_md5sum, f.file_adler32 ) in
(select m.file_size, m.file_md5sum, m.file_adler32 from fcs_tmetadata m);

Plan hash value: 2703603877

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 2477K| 418M| | 36372 (1)| 00:07:17 | | | |
| 1 | LOAD AS SELECT | FCS_TFILEDATA_NEW | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10002 | 2477K| 418M| | 36036 (1)| 00:07:13 | Q1,02 | P->S | QC (RAND) |
|* 4 | HASH JOIN RIGHT SEMI BUFFERED| | 2477K| 418M| 86M| 36036 (1)| 00:07:13 | Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 51M| 2181M| | 5202 (1)| 00:01:03 | Q1,02 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 51M| 2181M| | 5202 (1)| 00:01:03 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 51M| 2181M| | 5202 (1)| 00:01:03 | Q1,00 | PCWC | |
| 8 | INDEX FAST FULL SCAN | FCS_TMETA_FILE_IDX | 51M| 2181M| | 5202 (1)| 00:01:03 | Q1,00 | PCWP | |
| 9 | PX RECEIVE | | 21M| 2769M| | 21836 (1)| 00:04:23 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10001 | 21M| 2769M| | 21836 (1)| 00:04:23 | Q1,01 | P->P | HASH |
| 11 | PX BLOCK ITERATOR | | 21M| 2769M| | 21836 (1)| 00:04:23 | Q1,01 | PCWC | |
| 12 | TABLE ACCESS FULL | FCS_TFILEDATA | 21M| 2769M| | 21836 (1)| 00:04:23 | Q1,01 | PCWP | |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("F"."FILE_SIZE"="M"."FILE_SIZE" AND "F"."FILE_MD5SUM"="M"."FILE_MD5SUM" AND "F"."FILE_ADLER32"="M"."FILE_ADLER32")

(Note: Even though the explain plan shows very good performance, the actual execution is quite bad. It is still running after almost 13 hours. I believe there must be something wrong.

And I remember the last time I did similar testing on our system, it looked slower when enabling parallel in the CTAS. The only helpful option was nologging (it was much faster); however, I don't want to go with it since a backup is needed afterward, which would involve the DBA. As this is a daily cleanup job run from the application, I would prefer not to bother the DBA. It sounds weird, but that is how it is. Any ideas on this matter?)

So, right now:

Single statement delete: deleted 290K rows, took 6238.763 seconds
PL/SQL bulk delete: deleted 290552 rows, took 3548.595 seconds
CTAS (parallel): still running after more than 12 hours

Is there something wrong with our Oracle? Any suggestions?

Edgy
Tom Kyte
May 10, 2013 - 3:33 pm UTC

what is your pga_aggregate_target? how much memory do you have on your system.

I'm guessing that the bulk statements are spilling to disk (temp) and that is hurting them.

Big amount deletion performance is poor

Edgy, May 10, 2013 - 5:13 pm UTC

Hi Tom,

It takes time to get those values because I have to open tickets and go through the DBA. It is frustrating, but that is how it is. I will update you when I get them.

I believe the DB server has plenty of memory - at least 24 GB, I think - but I am not so sure.

So far it looks like the PL/SQL bulk delete gives the best result according to my testing, even though it shouldn't. But I don't know what is wrong. :(

Edgy

Big amount deletion performance is poor

Edgy, May 13, 2013 - 4:50 pm UTC

Hi Tom,

Just FYI, I did run the CTAS (WITHOUT the nologging and parallel options) during the weekend; it took 6 hours to finish.

SQL> create table fcs_tfiledata_new as
  2  select * from fcs_tfiledata f
  3  where (f.file_size, f.file_md5sum, f.file_adler32 ) in
  4   ( select m.file_size, m.file_md5sum, m.file_adler32 from fcs_tmetadata m);

Table created.

Elapsed: 06:17:40.58
SQL>

It is too long in this case. 

Thanks,
Edgy

Big amount deletion performance is poor

Edgy, May 14, 2013 - 5:31 pm UTC

Hi Tom,

I got a response, and here is the information for our Oracle instance:
"
In Oracle 11g the new automatic memory management parameters are "memory_max_target" and "memory_target", and they are set up as:

memory_max_target 24256M
memory_target 24256M

as you see they are set to 24GB.

The total memory allocated for database server is:
Mem: 60251M total, 38696M used, 21554M free
"

However, the CTAS only seems to need about 2 GB more memory, so the CTAS should not be that slow, right? So what else could be the real cause?

Thanks,
Edgy
Tom Kyte
May 14, 2013 - 9:22 pm UTC

In this particular case (and this is hypothetical, i'm guessing this is what is happening), the CTAS:

create table to_delete (row_id) nologging parallel as SELECT
f.rowid
FROM fcs_tfiledata f
WHERE (file_size, file_md5sum,file_adler32)
not in (select file_size, file_md5sum,file_adler32 from fcs_tmetadata)

you are doing is probably able to complete entirely in memory using the primary key index on fcs_tfiledata and an anti-join to fcs_tmetadata. It can do this because it doesn't need to have the entire row from fcs_tfiledata in the result, just the rowid.

then you retrieved this row_id ordered (and that sort probably fit into memory since you have a generous amount of memory and only had to sort some row_ids).

that limited the number of physical IO's/cache gets you would have to do since you deleted in an ordered fashion.

this is one of those unusual cases where a procedural process can best a big set based one. It is unusual, it is not the norm.

It is probably because your small procedural steps (your CTAS, your order by row_id) could be done in memory - no swapping to temp - and then the procedural delete minimized IO's by hitting the table in sorted rowid order - making it so that we'd never have to read and re-read the same block to delete rows from it

Big amount deletion performance is poor

Edgy, May 22, 2013 - 8:12 pm UTC

Hi Tom,

Sorry for not replying for a couple of days.

Yes, you are right. If the memory is not big enough, Oracle is going to use temp. In my situation our Oracle DB has 24G of memory (memory_max_target => 24256M; memory_target => 24256M); here is the explain plan. It looks like it has not reached max memory in this case.

So, in this case, why is it so slow? (The explain plan says 7 minutes; in fact, it took more than 6 hours.)

explain plan for
create table fcs_tfiledata_new parallel as
select * from fcs_tfiledata f
where ( f.file_size, f.file_md5sum, f.file_adler32 ) in
(select m.file_size, m.file_md5sum, m.file_adler32 from fcs_tmetadata m);
--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |                    |  2477K|   418M|       | 36372   (1)| 00:07:17 |        |      |            |
|   1 |  LOAD AS SELECT                  | FCS_TFILEDATA_NEW  |       |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR                 |                    |       |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10002           |  2477K|   418M|       | 36036   (1)| 00:07:13 |  Q1,02 | P->S | QC (RAND)  |
|*  4 |     HASH JOIN RIGHT SEMI BUFFERED|                    |  2477K|   418M|    86M| 36036   (1)| 00:07:13 |  Q1,02 | PCWP |            |
|   5 |      PX RECEIVE                  |                    |    51M|  2181M|       |  5202   (1)| 00:01:03 |  Q1,02 | PCWP |            |
|   6 |       PX SEND HASH               | :TQ10000           |    51M|  2181M|       |  5202   (1)| 00:01:03 |  Q1,00 | P->P | HASH       |
|   7 |        PX BLOCK ITERATOR         |                    |    51M|  2181M|       |  5202   (1)| 00:01:03 |  Q1,00 | PCWC |            |
|   8 |         INDEX FAST FULL SCAN     | FCS_TMETA_FILE_IDX |    51M|  2181M|       |  5202   (1)| 00:01:03 |  Q1,00 | PCWP |            |
|   9 |      PX RECEIVE                  |                    |    21M|  2769M|       | 21836   (1)| 00:04:23 |  Q1,02 | PCWP |            |
|  10 |       PX SEND HASH               | :TQ10001           |    21M|  2769M|       | 21836   (1)| 00:04:23 |  Q1,01 | P->P | HASH       |
|  11 |        PX BLOCK ITERATOR         |                    |    21M|  2769M|       | 21836   (1)| 00:04:23 |  Q1,01 | PCWC |            |
|  12 |         TABLE ACCESS FULL        | FCS_TFILEDATA      |    21M|  2769M|       | 21836   (1)| 00:04:23 |  Q1,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------------

I am really confused by those performance estimates.

Thanks!
Edgy
Tom Kyte
May 22, 2013 - 9:00 pm UTC

monitor temp usage as it runs.


the tempspc is an *estimate* just like the time. if it is generating more temp than it estimates.. it'll take potentially a lot longer.
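
A simple way to watch that while the statement runs - a sketch only; v$tempseg_usage is the documented name of the old v$sort_usage view:

select s.sid, s.serial#, s.sql_id,
       round( sum(u.blocks) * max(p.value) / 1024 / 1024 ) temp_mb
  from v$tempseg_usage u, v$session s, v$parameter p
 where u.session_addr = s.saddr
   and p.name = 'db_block_size'
 group by s.sid, s.serial#, s.sql_id
 order by temp_mb desc;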
