Thanks for the question, Jim.

Asked: June 17, 2002 - 11:47 am UTC

Last updated: August 31, 2005 - 1:02 pm UTC

Version: 8i

Viewed 10K+ times! This question is

You Asked

Tom,

I am interested in finding the fastest way to save off
the duplicate records to a duplicates table. My source
table has about 13 million rows and the users want to remove
all duplicates from the primary table and store them in a
duplicates table for later review. (I know, not exactly the
smartest thing to do. However, I have not control over this).
I expect there to be a relatively low number of dups.
(less than 1%). Here is what I have come up with.

create table dup_rowids
as
(select rowid
from table_a
minus
select Max(rowid)
from table_a
group by A, B, C);

Using this list, I can then create table for those records
that are dups and those records that are unique

Am I on the right track?

Thanks,
Jim

and Tom said...

how about adding a unique constraint on the columns in question and using "exceptions into". For example:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( a int, b int, c int );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> exec gen_data( 'T', 500 );

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select * from t where rownum < 10;

9 rows created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table exceptions(row_id rowid,
2 owner varchar2(30),
3 table_name varchar2(30),
4 constraint varchar2(30));

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> alter table t add constraint t_unique unique(a,b,c) exceptions into exceptions;
alter table t add constraint t_unique unique(a,b,c) exceptions into exceptions
*
ERROR at line 1:
ORA-02299: cannot validate (OPS$TKYTE.T_UNIQUE) - duplicate keys found

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table dups
2 as
3 select *
4 from t
5 where rowid in ( select row_id from exceptions )
6 /
create table dups
*
ERROR at line 1:
ORA-00955: name is already used by an existing object

ops$tkyte@ORA817DEV.US.ORACLE.COM> delete from t where rowid in ( select row_id from exceptions );

18 rows deleted.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select distinct * from dups;

9 rows created.

Rating

(17 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

One more tip

A Reader, June 17, 2002 - 3:23 pm UTC

Tom,

In your response you have assumed that the duplicate rows are absolutely identical, column by column. Sometimes it's not teh case; only the values in the PK column(s) are identical but the rest may be different. So a distinct * will not work in that case. Rather I suggest the following after the dups table is created.

delete dups o where rowid in
(select rowid from dups where a = o.a
and rownum < 2)

Then the dups table will be free of duplicated. In the last stage just issue : "insert into t select * from dups" instead of select distinct * from dups.

Hope this helps.

Another tip

Raman, June 18, 2002 - 3:03 am UTC

try this :

SELECT * FROM emp a
WHERE rowid > ANY
(SELECT rowid FROM emp b
WHERE a.ename = b.ename
)

will give you all the duplicate copies for records.

Raman.

more ideas

Igor, March 13, 2003 - 5:52 pm UTC

If you dont need to delete rows from dups.

You could

INSERT INTO t
SELECT *
FROM dups
WHERE ROWID IN (SELECT row_id
FROM (SELECT DISTINCT id, MAX (ROWID) AS row_id
FROM dups
GROUP BY id))

Its faster.

March 14, 2003 - 5:51 pm UTC

really? faster?

where is the test that shows this.

How about writing a pl/sql proc to remove dups

Chandru, March 14, 2003 - 9:36 pm UTC

Tom,

If I write a pl/sql proc as below to remove duplicate records, would this be fast enough to clean a huge table, say, with million records? Any comments will be appreciated. Thanks. -- Chandru

create or replace procedure rem_dups
is
cursor dup_rec_cur
is
select n, count(*)
from test
group by n
having count(*) > 1;

cursor record_id_cur (p_record number)
is
select n,rowid
from test
where n = p_record;

v_rowid1 rowid;

begin -- rem_dups

for r in dup_rec_cur loop
for s in record_id_cur(r.n) loop
v_rowid1 := s.rowid;
exit;

end loop;

delete from test
where rowid <> v_rowid1 and n = r.n;

end loop;
commit;
end rem_dups;
/

March 15, 2003 - 9:03 am UTC

just use sql, never write code unless you have to.

Got it!

Chandru, March 15, 2003 - 11:07 am UTC

Thanks. I thought exec a pl/sql proc is simpler than multiple sql statements to identify the dups and then delete them. However, I fully agree with you that if sql can do it more efficiently than a pl/sql procedure, stick with sql.

Listing duplicates

Joe, September 11, 2003 - 3:32 pm UTC

Hi Tom,

I have a table containing Bill of Material items. How do I find the parent BOM items using the same child items?

Here is the data for table bom_list:

bom_nbr item qty
1001 BK7001 3
1001 TL0905 2 <-
1001 JJ1209 1
1002 AA1023 3
1002 AP0922 2
1003 WR7834 1
1003 TL0905 5 <-
1003 XT1129 1
1003 LM9870 1
1003 VB0997 1 <-
1004 SK2545 4
1004 AJ0005 1
1004 SG1212 1
1004 KF5412 1
1004 AS7890 1
1004 VB0997 2 <-
1005 SA1990 1
1005 TL0905 1 <-

This is the output that I am trying to get:

bom_nbr item qty
1001 TL0905 2
1003 TL0905 5
1005 TL0905 1
1003 VB0997 1
1004 VB0997 2

Thanks,
Joe

September 11, 2003 - 7:45 pm UTC

no idea why your result set is what it is.

is there some logic hiding in there? I see your arrows, but don't really know why you have arrows there?

Joe, September 12, 2003 - 10:19 am UTC

Tom,

I have typed in the arrows to indicate that these are the repeating items. I want to find the repeating items in various BOM. So in the output list, I have listed the items that are repeating in different BOMs. And this is the output I am trying to reproduce.

Do this help?

Thanks.

September 12, 2003 - 10:52 am UTC

select *
from bom_table
where item in ( select item
from bom_table
group by item
having count(bom_nbr) > 1)

is one approach.

One more way

Oleksandr Alesinskyy, November 04, 2003 - 5:33 am UTC

What about:

delete from DUPTAB where rowid in (select rid from(
select rowid rid,row_number() over(partition by DUPCOL1,DUPCOL2,...,DUPCOLN order by rowid desc) rn
from DUPTAB
) where rn>1);

I guess it is fastest if no index exists on DUPCOL1...DUPCOLN as this approach saves 1 full scan

why delete?

A reader, March 01, 2004 - 4:18 pm UTC

Cant we simply do this?

insert into emp select * from dup b
where rowid in (select rowid
from dup c
where b.empno = c.empno
and rownum < 2)

March 01, 2004 - 4:56 pm UTC

not sure where it is fitting in -- but you need to delete from EMP to get rid of the dups first no? show the entire process, start with a table with dups and work through it -- then we can say

oops sorry

A reader, March 01, 2004 - 7:00 pm UTC

sorry I was actualy trying to answer the first follow up, anyway I think it fit in your initial example as well

ops$tkyte@ORA817DEV.US.ORACLE.COM> delete from t where rowid in ( select row_id
from exceptions );

18 rows deleted.

ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select distinct * from dups;

9 rows created.

The INSERT part can be something like

insert into t select * from dup b
where rowid in (select rowid
from dup c
where b.a = c.a
and b.b = c.b
and b.c = c.c
and rownum < 2)

instead of select distinct

March 02, 2004 - 7:10 am UTC

I'd rather "bulk" perform than use the correlated update which would be run like a nested loop

for x in ( select * from dup )
loop
select rowid from dup where ....
if x.rowid = that rowid then
insert
end if
end loop

so yes, you could, but I would not want to (you'd need an index on the key columns)

perhaps:

insert into t
select ....
from ( select t.*,
row_number() over (partition by KEY_COLS order by something) rn
from t )
where rn = 1;

that would be nice if the entire row as not "duped" -- just some defining key and you wanted to keep one row for each key.

Continuing to Insert to this table

Russell, May 14, 2004 - 3:42 am UTC

Tom,
based on your original reply,
I am using Oracle version 8.0.6.

I followed your steps, and removed duplicates from the base table. I am now inserting into that table, values from a staging table which may or may not already be in the base table (Effectively, I would like new values to go into the table, and not worry about records already there).

Is there a way you can write something like

Insert into T1
Select *
from T2
EXCEPTIONS INTO EXCEPTIONS;

I tried, and it returned error ORA-00933 SQL Command not properly ended.

I have implemented it as
Insert into T1
Select *
from T2
where (F1, F2) not IN
(SELECT F1, F2
FROM T1)

But this seems to be pretty slow

Thanks

Russell

May 14, 2004 - 10:16 am UTC

not in will be dog slow with the rbo. You are using rather "old" software.

try this:

select t2.*
from t1,t2
where t1.f1(+) = t2.f1
and t1.f2(+) = t2.f2
and t1.f1 is null
and t1.f2 is null;

don't right pl/sql only when you have to

Serge F., June 08, 2005 - 10:25 pm UTC

Hi Tom you gave this answer to one of the readers of this good thread.

Assuming that I have to use pl/sql code with no loops, but using your ideeas , I got the following problem:

If I execute in a SQLPlus Session:

alter table T
ENABLE CONSTRAINT T_PK
EXCEPTIONS INTO T_EXCEPTIONS;

and I have duplicates, indeed those rows are going into T_EXCEPTIONS.

But if I put the same code into a package/stored procedure as follows:

PROCEDURE manage_dups
IS

BEGIN
EXECUTE IMMEDIATE ' alter table T
ENABLE CONSTRAINT T_PK
EXCEPTIONS INTO T_EXCEPTIONS ';
EXCEPTION
WHEN OTHERS THEN
BEGIN INSERT INTO DUPS
select * from t where rowid in
(select row_id from t_exceptions);

.....
-- some other inserts and deletes as in your original example
END;
END;

this procedure (or package) doesn't work !! I mean, there is nothing in exception table, so when it jumps into exception block there are no records to insert into DUPS.

Unless I miss something silly, please provide us your expertise
Thank you.

June 08, 2005 - 10:50 pm UTC

example please.

tkyte@ORA9IR2W> delete from exceptions;

2 rows deleted.

tkyte@ORA9IR2W> create table t ( x int );

Table created.

tkyte@ORA9IR2W> alter table t add constraint t_pk primary key(x) disable;

Table altered.

tkyte@ORA9IR2W> insert into t values ( 1);

1 row created.

tkyte@ORA9IR2W> insert into t values ( 1);

1 row created.

tkyte@ORA9IR2W> commit;

Commit complete.

tkyte@ORA9IR2W>
tkyte@ORA9IR2W> create or replace procedure dups
2 as
3 begin
4 execute immediate 'alter table t enable constraint t_pk exceptions into exceptions';
5 exception
6 when others then
7 for x in (select rowid r from exceptions )
8 loop
9 dbms_output.put_line( 'rid = ' || x.r );
10 end loop;
11 end;
12 /

Procedure created.

tkyte@ORA9IR2W> exec dups
rid = AAAIHvAABAAANMiAAA
rid = AAAIHvAABAAANMiAAB

PL/SQL procedure successfully completed.

pragma autonomous transaction

Serge F., June 09, 2005 - 9:21 am UTC

Thank you Tom, for your answer.

The main procedure doing the initial insert, was calling this separate procedure to enforce back the primary key after initial insert.
Later reviewing my all stuff I noticed that I was using pragma autonomous transaction into that procedure. By commenting out the PRAGMA, it works.

Thanks again.

procedure to remove duplicate after sqlldr direct load

Ravi, August 30, 2005 - 7:03 pm UTC

Hi Tom,
      I am trying to write a procedure to remove duplicates from a table of million records, every day i will be loading 30 tables using direct load and most of the time i get duplicates(not full row but duplicate in composite primary key)i need to copy the duplicate record as a string(record_string = c1,c2,c3..) into some other table, and remove duplicate rows from the main table, I am trying your way but i get an error when i execute it.
can you please help me out, where am i going wrong

 1  CREATE OR REPLACE PROCEDURE duplicates
 2  (p_table_name VARCHAR2,
 3  p_const_name VARCHAR2)
 4  AS
 5  sql_statement VARCHAR2(1000);
 6    first_pass        VARCHAR2(1);
 7     col_string        VARCHAR2(1000);
 8  dup_col_string VARCHAR2(500);
 9  CURSOR get_columns IS
10      SELECT column_name
11       FROM all_cons_columns
12       WHERE table_name = UPPER(p_table_name)
13         AND constraint_name = UPPER(p_const_name)
14       ORDER BY position;
15  BEGIN
16        sql_statement := ('alter table ' || p_table_name ||
17                         ' enable constraint ' ||
18                         p_const_name ||' exceptions into exceptions');
19          execute immediate sql_statement;
20  EXCEPTION WHEN OTHERS THEN
21  sql_statement := 'create table dups  as
22    select *  from '||p_table_name||' where rowid in ( select row_id from exceptions )' ;
23  EXECUTE IMMEDIATE sql_statement;
24   /* Build column string */
25    col_string := NULL;
26    first_pass := 'Y';
27    FOR col_rec IN get_columns LOOP
28       IF (first_pass = 'N') THEN
29         col_string := (col_string ||' AND '|| col_rec.column_name||' = '||'d.'||col_rec.column_name)
30       END IF;
31       col_string := (col_string || col_rec.column_name||' = '||'d.'||col_rec.column_name);
32       first_pass := 'N';
33    END LOOP;
34         sql_statement := 'delete dups d where rowid in
35         (select rowid from dups where '||dup_col_string||' and rownum < 2)' ;
36         EXECUTE IMMEDIATE sql_statement;
37         sql_statement := 'DELETE FROM '||p_table_name||' where rowid in (select row_id from exceptio
38         EXECUTE IMMEDIATE sql_statement;
39          sql_statement := 'INSERT INTO '||p_table_name||' select * from dups';
40         EXECUTE IMMEDIATE sql_statement;
41* END;
42  /

procedure created.

SQL> exec duplicates('EMP','PK_EMP');
BEGIN duplicates('EMP','PK_EMP'); END;

*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SCOTT.DUPLICATES", line 23
ORA-02437: cannot validate (SCOTT.PK_EMP) - primary key violated
ORA-06512: at line 1

create table dup_record(table_name varchar2(30), 
cons_name varchar2(30), record_value varchar2(30) )
/

August 31, 2005 - 1:02 pm UTC

</code> http://asktom.oracle.com/Misc/RolesAndProcedures.html <code>
you need create table granted to you to create a table.

in the above question wrong size for record_value

ravi, August 30, 2005 - 8:02 pm UTC

Hi TOM,
sorry i also dint include the code to insert the values into this table, as i dint know how to do it, do i need to drop dups table each time and recreate it or is there any way to do it dynamically, any help or suggestions please

create table dup_record(table_name varchar2(30),
cons_name varchar2(30), record_value varchar2(2000) )
/

Thank you

RAVI, August 31, 2005 - 6:19 pm UTC

your method of doing this is great, working very fast, compared to the other way of finding duplicates using group by clause and saving them to a table and then run a query again to delete them, I found the diffrence in my case i have around 6 to 10 columns as composite primary key

and with DML error logging... mighty nice

Duke Ganote, November 12, 2009 - 7:24 pm UTC

SQL> CREATE TABLE T ( T INT, U CHAR(1) );

SQL> CREATE INDEX t_pk ON t ( t );

SQL> ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY ( t )  EXCEPTIONS INTO EXCEPTIONS;

SQL> ALTER TABLE t DISABLE PRIMARY KEY;

TABLE altered.

SQL> INSERT INTO t VALUES ( 1, 'a' ) LOG errors REJECT LIMIT 1;

1 ROW created.

SQL> INSERT INTO t VALUES ( 1, 'a' ) LOG errors REJECT LIMIT 1;

1 ROW created.

SQL> INSERT INTO t VALUES ( 2, 'aa' ) LOG errors REJECT LIMIT 1;

0 ROWS created.

SQL> ALTER TABLE t ENABLE PRIMARY KEY;
ALTER TABLE t ENABLE PRIMARY KEY
*
ERROR AT line 1:
ORA-02437: cannot VALIDATE (CIDWWRITE.T_PK) - PRIMARY KEY violated

SQL> SELECT table_name, row_id, CONSTRAINT FROM EXCEPTIONS;

TABLE_NAME                     ROW_ID             CONSTRAINT
------------------------------ ------------------ ------------------------------
T                              AADm78ALRAAAhlDAAB T_UNIQUE
T                              AADm78ALRAAAhlDAAA T_UNIQUE

SQL> SELECT ora_err_mesg$, t, u FROM err$_t;

ORA_ERR_MESG$                  T     U
------------------------------ ----- -----
ORA-12899: VALUE too large FOR 2     aa
 COLUMN "CIDWWRITE"."T"."U" (a
ctual: 2, maximum: 1)

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

One more tip

Another tip

more ideas

How about writing a pl/sql proc to remove dups

Got it!

Listing duplicates

One more way

why delete?

oops sorry

Continuing to Insert to this table

don't right pl/sql only when you have to

pragma autonomous transaction

procedure to remove duplicate after sqlldr direct load

in the above question wrong size for record_value

and with DML error logging... mighty nice

More to Explore

PL/SQL demos

PL/SQL docs

Fastest way to save off Duplicates

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

One more tip

Another tip

more ideas

How about writing a pl/sql proc to remove dups

Got it!

Listing duplicates

One more way

why delete?

oops sorry

Continuing to Insert to this table

don't right pl/sql only when you have to

pragma autonomous transaction

procedure to remove duplicate after sqlldr direct load

in the above question wrong size for record_value

and with DML error logging... mighty nice

More to Explore

PL/SQL demos

PL/SQL docs