Sokrates, February 14, 2012 - 2:07 pm UTC
Luke, June 14, 2012 - 9:00 am UTC
Hi Tom - we have a suite of Java applications that depend on a common framework. The framework has a schema to store its own data, and each application creates a new schema at install time. Different deployments will have different applications installed.
We have a case where a designer wants to add a FK from an application table to a framework table and use "on delete cascade". We have never used cascade functionality, and I'm hesistant to go start down that path. The framework data is removed by the Java framework issuing simple 'delete' statements based on some external condition. I'm thinking one solution would be to have the framework performs callbacks into the applications to do their own cleanup first, but adding 'on delete cascade' is appealing because it's a one-line change, and I haven't been able to come up with any compelling arguments against it. Do you think cascade is appropriate in this case, or do you have any other suggestions? Thanks.
June 14, 2012 - 12:15 pm UTC
I'm not a fan of it IN GENERAL, but that doesn't mean to NOT do it.
I don't know what it means to have the "framework perform call backs into the applications to do their own clean up first", but I'm not sure I like it.
on delete cascade might be useful here, if used judiciously after considering any side effects it might have.
thanks for feedback
Luke, June 14, 2012 - 1:19 pm UTC
By "framework perform call backs into the applications to do their own clean up first" I meant allowing applications to 'register' themselves with the framework at installation time, so the framework can notify the applications before performing the delete operation, allowing the apps to clean up dependent data. I'm not sure I like it either. I'm worried that either approach ('callbacks' or delete cascade) could get out of hand if overused. Thanks for your input.
June 14, 2012 - 3:38 pm UTC
If applications do this:
"Different deployments will have different applications installed."
the 'callbacks' will have to be really dynamic - I'd rather have the application register a static, documented, something I can clearly see in the dictionary side effect rather than a dynamic "we'll link you in at runtime, cross our fingers the stars align properly, and run some code" approach.
On delete cascade is done row by row - which can be a performance issue if you are deleting thousands of parent rows (but not so much if you delete say ONE). The dynamic application invocation would have even more overhead.
I'm *probably* leaning heavily towards an on delete cascade with lots of system documentation for future people as to why this stuff might exist. that is over a "I cannot see what is going to happen until it actually happens" approach.
Luke, June 14, 2012 - 4:03 pm UTC
Thank you, that's very helpful.
Martin Rose, June 15, 2012 - 5:41 am UTC
But you must still define the ON-DELETE-CASCADE constraint in your DB whenever a master-child relationship exists, even if you choose to do your DELETEs via a different method, just in case someone inadvertantly executes a normal DELETE against the parent. Otherwise you'll end up with orphaned children, (aka. bad data), and a DB you cannot trust...
June 15, 2012 - 7:57 am UTC
this is why I'm not a fan of that "let us let the application do this stuff".
I'd rather use the on delete cascade than to trust an application framework to know what to do.
I do not understand the example
Galen Boyer, June 15, 2012 - 9:11 am UTC
How could a parent be deleted and orphan child records? Does you
database not have FKs?
I am not a fan of delete cascade for exactly the reason you used in
your example. Without delete cascade, someone could NOT have deleted
the parent without understanding and handling its children. With
delete cascade, someone could come into the database, issue a delete
statement and vhalla, bye-bye all data.
June 15, 2012 - 1:15 pm UTC
If the application did the work, if you relied on the application to do the delete cascade.
If you do that, and anyone does anything outside of that application, all bets are off.
They are (in my mind) discussing having some "frame work" take care of things (like referential integrity) versus the database.
The viola - bye bye all data - would only happen if someone put a declaritive integrity constraint in place with explicitly asking for on delete cascade. meaning - they announced and declared and documented their wish to be deleted when the parent goes away. No matter what application did the delete. If that is not correct - one would assume they would not do it in the first place.
I'm not a huge fan of on delete cascade, I prefer transactional API's (no one does a delete from outside of a stored procedure). But given the choice between "let the application do it" (which is a joke to me, it won't) versus "let the database do it", I go with the database.
A reader, June 15, 2012 - 10:21 am UTC
Well I will give you that, that Yes you could block the cascade deletes entirely, but as Tom said in reply to my previous reply, you now need to code that knowledge about being able to do the delete on the parent, OUTSIDE of the database.
You now need to know that deletes are a logical possibility on the parent & that that parent has delete-able children.
I'm not saying it can't be done (and neither is Tom), but rather by moving this knowledge out of the DB you have lost info. about operations that can be performed on your DB.
> With delete cascade, someone could come into the database, issue a delete statement and vhalla, bye-bye all data.
Well yes. That's what you want to happen when you delete a parent, in order to keep data integrity.
That is what the model does for you
Galen Boyer, June 15, 2012 - 10:51 am UTC
You have a data model with RI constraints. Either the java layer
abides by it or they get errors. Your model guarantees that they must
abide by it. So, whatever layer does the delete must abide. Now, if
you want to move the delete into PLSQL, so it is done in the database,
then the PLSQL must abide by it. Either way, code cannot get it wrong
if the model demands it.
June 15, 2012 - 1:17 pm UTC
As long as they let the database do RI (referential integrity). But the millisecond someone says "our framework" - I know they don't use that stuff.... Guess I'm relying on my empirically observed behavior of programmers here.
Luke, June 18, 2012 - 8:24 am UTC
"As long as they let the database do RI (referential integrity). But the millisecond someone says "our framework" - I know they don't use that stuff"
That's not true in our case - we'd continue to use foreign key constraints, but use "our framework" (maybe I should have used a different term) to tell one app to delete its dependent data before the framework deletes its own data. This would all be done in Java, with the framework and apps each making JDBC calls - all in one transaction. I don't think you will like this approach, but that's the type of environment we are working in.
What about this:
Say I EMP and DEPT tables. I have PL/SQL procedures for adding/deleting employees. Some departments want to create their own tables to manage information about their employees. I have no way of knowing what these tables will be like ahead of time. So dept1 creates table EMP_ACTIVITY with a FK to EMP, and some other tables that have FK's to EMP_ACTIVITY, etc. So now my delete procedure will fail due to RI unless all departments delete their dependent data first. Is there some common approach to dealing with this type of situation? Or would you just say to avoid this altogether (which realistically isn't an option for me).
From a Java perspective, the concept of registering 'callbacks' (or listeners) with the framework would be a pretty standard approach for dealing with this. As I mentioned before, I'm considering allowing apps to use 'on cascade delete' when creating the FK, as long as there is not much app-specific data pointing to the framework tables. Thanks.
June 18, 2012 - 9:14 am UTC
Is there some common approach to dealing with this type
of situation? Or would you just say to avoid this altogether (which
realistically isn't an option for me)
if the desired behavior is "when we delete a row from DEPT, we want our child rows in DEPT_CHILD to be removed", on delete cascade would be the simpliest approach. Document the HECK out of the fact you are doing this.
Why would you care about the volume of data? I'd rather have the database do it over procedural code if at all possible.
the only way I might be swayed differently would be if you frequently BULK DELETE from the parent table - removing dozens or hundreds or more rows in a single delete. If you do that, it would be more efficient to bulk delete from the children first using the same list of parent primary keys to remove.
Luke, June 19, 2012 - 8:09 am UTC
Thanks Tom and everyone else who commented - this discussion was very helpful.