Skip to Main Content
  • Questions
  • Hierarchical delete from childs back to parent

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Martijn.

Asked: April 22, 2013 - 7:28 pm UTC

Last updated: April 24, 2013 - 6:10 pm UTC

Version: 11.2.0.3

Viewed 1000+ times

You Asked

Hi Tom,

Glad you're back again and taking questions.
This time I got quite an exotic task, but managed to make great progress sofar.
The requirement (from cst) is as follows:
Based on the datadictionary I'm traversing trough a hierarchical query (from leafs to root), deleting all rows that match with PK-values in a table, table name is parameterized.
I've managed to rebuild the existing slow-by-slow code into set based parallel delete statements ( oh man, I love 11.2 :) ). Where the old code needed 7/8 hours to delete 17,000 rows, now we're able to delete (the first) 25,000,000 rows in less than 30 minutes, the total number of rows to delete is 40,000,000. So, ideally this operation should take about an hour, one would think...
BUT:
There are 3 tables (out of a total of 198 tables ), among which the master table, that need lots of time. They end up in v$session_longups, with an estimated time between 6000/7000 seconds, scanning approx. 65,000 blocks. However, for other tables with much much less FK's, but the same amount of blocks, the delete is done almost instantly.

Therefore I'm sure that this is caused by all the FK's that are referencing those specific tables, so I wonder if there are some ways left to speed up these specific delete's in a more or less responsible way.
All I can think of now:
- disable those referencing FK's/indexes (but then I'll have to enable them after processing )
or
- gather table stats for the child tables that were deleted from, before deleting from those specific tables, especially the master/root table. (Some tables will end up 100% empty, so I would need to reset the HWM?)

What would your approach be in this case?
Are there other options left I forgot about/didn't know about?
I tried bulk processing, but it's much slower than issuing a delete statement.
and CTAS is not allowed by cst due to, in short: 'legal stuff' :( .
As always many thanks for your insights and comments on this matter!

Regards,
Martijn

and Tom said...

A foreign key is a slow by slow operation. For every foreign key you have - when you do a delete, we have to do an index range scan (single block io) and some locking.

I personally would never want to delete 40,000,000 rows. I would prefer high speed DDL to keep the rows I want to keep (similar to a reorganization). Delete is the single most resource intensive operation - it is the worst.


it would seem most of the tables would need a reorganization anyway after this - and the indexes could need a bit of looking at as well...

consider DDL to keep the rows you want


short of that - disabling the foreign keys to remove the slow by slow from the bulk delete and re-enabling them afterwards. I don't see how the gathering of stats would help - the cost of full scanning would still be high on a row by row basis even if the tables where empty. (and yes, you'd have to reorganize them first in order to reset the high water mark)

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Rebuild through DBMS_INDEX_UTL?

Hoek, April 24, 2013 - 9:41 am UTC

>I personally would never want to delete 40,000,000 rows. I would prefer high speed DDL to keep the rows I want to keep (similar to a reorganization). Delete is the single most resource intensive operation - it is the worst.
> consider DDL to keep the rows you want
Totally agree, that is what I initially had in mind (ofcourse), however it is not allowed by cst.

> and yes, you'd have to reorganize them first in order to reset the high water mark
Thanks for that reminder!

We're currently following this approach:
- loop through hierarchy bottom up
- delete all rows at once (the complete SET) in PDML from child table (joined to temp PK table with rows to delete)
- alter table child table move
- dbms_index_utl.build_table_indexes (child table)
- delete in PDML from parent table
- alter table parent table move
- dbms_index_utl.build_table_indexes (parent table)
- drop temp table
It's actually very little code, but very fast and powerful.

All (DML/constraint) errors are being handled through DML Error logging. I do not disable any constraints, I do not use any exception handler (not even a WHEN OTHERS ;) )
I'm using the function result cache for generating the delete statements, it works great. It now all works just at a very satisfying speed, after fiddling with several DOP's (and getting familiar with some new ORA-codes, like the ORA-12829 Deadlock ;) ), I've currently managed to delete around. 20 million rows in 500 secs, so we're quite very happy with that (final run will follow soon) :)

One question still:
Why is DBMS_INDEX_UTL undocumented (I found it in Morgan's Library)? It seems like a very handy package?
And since this is a one-time (data-conversion/migration) only action, would you have any objections against using it?

Thanks and regards,
Martijn

Tom Kyte
April 24, 2013 - 12:59 pm UTC

... however it is
not allowed by cst. ...


tell customer to live with the response time then, they are the limiting factor. I hate it when someone tells me "you cannot do X, but make Y go faster" - without any sound technical reasoning. It certainly doesn't motivate me to work on Y at all.


I logged that it was not documented, we'll see if they document it going forward.

you might add a quick query after the index rebuild to verify all were rebuilt and raise an exception if not - that would be the only thing I might add (in answer to objections - I have none other than you might add that query just to be safe - or a query at the end of the entire process to verify all is well, no indexes where missed)

True

Hoek, April 24, 2013 - 2:11 pm UTC


I hate it when someone tells me "you cannot do X, but make Y go faster" - without any sound technical reasoning. It certainly doesn't motivate me to work on Y at all.


From my experience sound technical reasoning is often (or: not always, but more often than one would ideally want) subjected to $$$/'politics'...
Just curious:
What would be your worst Y, by the way?

Oh well, on the sunny side, what motivated me still anyway: I finally got to use some 11.2 features I never got to use before professional/officially (besides a little playing around with them on my personal laptop), learned some great new stuff, improved my 'set based thinking/beliefs' :)

And, where the old 'Y' needed 7/8 hrs for 17k rows, the new 'Y' now needs about 8 mins for 20M rows, so (obviously) cst (and me myself ofcourse) is very satisfied with the response time ;)

Many thanks again for your time and remarks!
Tom Kyte
April 24, 2013 - 4:45 pm UTC

From my experience sound technical reasoning is often (or: not always, but more often than one would ideally want) subjected to $$$/'politics'...


that isn't sound technical reasoning :) I vote to say "we cannot do it fast because <group/person> says we cannot and doesn't give any science behind it".

or "<we/group/person> have consciously made a decision to do it slow"


What would be your worst Y, by the way?


I'm not sure what you mean?


Hoek, April 24, 2013 - 5:23 pm UTC

Doh, I used the wrong word: it should be 'subordinate(d?) to $$$/politics' instead of 'subjected to'...hope that clears things up.
And:
What I meant when asking What would be your worst Y , I was referring to your remark hate it when someone tells me "you cannot do X, but make Y go faster".
In other words:
Have you ever been wanting to do X, but (for whatever non-technical reason(s)) only Y was possible, and if so, I'm just curious what you had in mind originally (that would be X) and to what Y you needed/decided to resort as an alternative you could 'live with'...
Short-circuiting to: what do you consider your own Oracle-WTF?
I realize I'm getting a bit offtopic now, on the other hand I also think it would be an interesting read, not only for me, if you've ever been in such a situation ;)


Tom Kyte
April 24, 2013 - 6:10 pm UTC

I cannot really say what the worst "Y" is, but I can tell you the worst "X".

the worst "X" is "you cannot change the code".

I remember once where the customer told me how they would process an electronic transaction.

they would create a file for each customer transaction (XML of course)

they would save these files up and then have a program read the file and invoke a web service (guess they heard bulk processing was best but didn't really understand what that meant..)

the web service would in turn take the XML sent to it and parse it, process a single row transaction in the database and return some XML.

the program would receive the xml back, parse it and write yet another file

I knew how many of these files they would be processing (10's to 100's of millions). I said that even if they got this to run in say 5ms per file (that was being generous), this would take days to process all of the files.

I showed them how to do this in bulk - it could take seconds - literally seconds - if their program could just connect to a database and do the work directly.

But no, their architect said every interaction with the database had to be via a web service. every single one (I said, but what about the web services - do they have to use a web service??? if not, they break the rule ;) no amusement was to be had there...).

it was their architecture after all.

I walked away, there was no point. Even if you made this go twice as fast - it was still - days...


the right approach - change code. But code seems immutable these days :(


Hoek, April 24, 2013 - 8:34 pm UTC

But no, their architect said every interaction with the database had to be via a web service. every single one (I said, but what about the web services - do they have to use a web service??? if not, they break the rule ;) no amusement was to be had there...).
:D
Sort of an "Over-decoupl'ismized architect". Very recognizable, though.