Skip to Main Content
  • Questions
  • Online database maintenance to avoid duplicates

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: June 10, 2024 - 10:55 am UTC

Last updated: June 11, 2024 - 1:08 pm UTC

Version: 19.13

Viewed 1000+ times

You Asked

Hello, Ask TOM Team.

Scenario:

1. We have to deploy a feature in one app. The change consists of avoiding duplicates (mix of three database columns).
2. In the database site I have to add one column to a table (200+ millions) and then update it so the app can validate the rows and avoid duplicated rows. At the end of the database maintenance I have to create an index to avoid the duplicates in th database site.

So, here is the database steps (process 1):

1. Move duplicate rows to another tables (insert into) in another schema (like 6 or 7 tables with the same structure). --2hrs
2. Delete those rows in the original schema. --2 hrs
3. Update the column in one table (almost all rows in the table). --40 mins
4. Create a unique index. -- 12 mins

All of these tasks can be done online because rows can still being inserted. The problem is when deleting the rows (step 2), users can still insert duplicate data in the relational tables and the delete script may fail in some point because of FK constraint.

Process 2

I have the option to make the update first so the app can have the updated data to avoid duplicates despite the database index will not be created at this point. At this point, duplicates rows will not be inserted in the database and I can run the rest of steps without no issue. The problem is that this update takes 40 minutes to complete. I could rename the old table, create a new table with the original name, insert the data avoiding the update using and rename the new table with the original table name. This is much faster but is not an online process.

Questions:

1. What can I do to avoid the FKs constraint error? Process 1
2. How can I do the Process 2 online?

Thanks.

Reagards,

and Chris said...

It's unclear to me what the new column is for. Is this necessary to prevent duplicates? Or a flag the app uses to know there are no duplicates?

So here are some general thoughts.

To stop new duplicates appearing between copying and deleting the current duplicates, you can create an unvalidated unique constraint. Note this requires a non-unique index.

For example:

create table t ( c1 int );

insert into t values ( 1 );
insert into t values ( 1 );
commit;

create index i on t ( c1 );
alter table t add constraint u unique ( c1 ) 
  using index i 
  novalidate;

insert into t values ( 1 );
-- ORA-00001: unique constraint (CHRIS.U) violated on table CHRIS.T columns (C1)


2 hours for the inserts to complete feels like a long time - even for 200M+ rows. If this is taking 2 hours, I would expect there to be ways you can speed this process up.

What can I do to avoid the FKs constraint error? Process 1

Add a WHERE clause to check there NOT EXISTS child rows!

If this is complicated you can use the LOG ERRORS clause to capture the failed rows. This allows the delete to complete and remove rows without children.

However, this does bring the bigger question of how you plan on handling the child rows after the process completes.

How can I do the Process 2 online?

DBMS_redefinition has an execute_update procedure which enables you to do online updates.

Rating

  (1 rating)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

Follow Up

A reader, June 10, 2024 - 1:25 pm UTC

Thanks for the response.

Or a flag the app uses to know there are no duplicates? Yes.

2 hours for the inserts to complete feels like a long time - even for 200M+ rows. If this is taking 2 hours, I would expect there to be ways you can speed this process up. The insert is using a where condition.

If this is complicated you can use the LOG ERRORS clause to capture the failed rows. This allows the delete to complete and remove rows without children.

Can you give me a basic example of this?

delete from child1 c where
c.id where (select p.id from parent1 p where *some stuff*);

delete from parent1 p where *some stuff*; --the error would raised here.

DBMS_redefinition has an execute_update procedure which enables you to do online updates. In theory, the update would no block the table. Or wouldn't it? What I need in process 2 is to do it online to avoid the update because more update time means more duplicates.

Chris Saxon
June 11, 2024 - 1:08 pm UTC

Can you give me a basic example of this?

What exactly are you looking for? You've already deleted rows from the child - how are you getting an FK error?

What I need in process 2 is to do it online to avoid the update because more update time means more duplicates.

I don't understand what you're saying here, please clarify.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library