Skip to Main Content
  • Questions
  • Deleting Duplicate Rows from a table

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Shaun.

Asked: May 31, 2000 - 11:46 am UTC

Last updated: September 13, 2005 - 12:44 pm UTC

Version: 8.0.5

Viewed 1000+ times

You Asked

I have a table which is comprised of the following:

ACCT_ID CUST_ID LOAD_DATE
------- ------- ---------
1234 ABCD 20000101
1234 DEFG 20000201
5678 STDF 20000201
5678 TRFD 20000101
9111 ADFG 20000201
9112 ASDX 20000201

I need to delete the oldest record based upon ACCT_ID.
I need to delete ACCT_ID=1234, CUST_ID=ABCD because its LOAD_DATE=20000101 which is older than ACCT_ID=1234, CUST_ID=DEFG
Similarly I need to delete ACCT_ID=5678, CUST_ID=TRFD.

I need to do this for approx 20,000 records from a table which contains 800,000 records.


and Tom said...

Your question is a little ambigous. You talk of deleting duplicate rows but also say "I need to delete the oldest record based upon ACCT_ID".

Duplicate record deletion would get rid of 0, 1 or more rows for the same key -- keeping the "newest" load_date.

Deleting the oldest record by acct_id would always delete 1 record per unique acct_id.

Both are demonstrated here....


ops$tkyte@8i> select * from t;

ACCT_ID CUST_ID LOAD_DATE
---------- ------------------------- ----------
1234 ABCD 20000101
1234 DEFG 20000201
5678 STDF 20000201
5678 TRFD 20000101
9111 ADFG 20000201
9112 ASDX 20000201

6 rows selected.

there is our test data...


ops$tkyte@8i> delete from t
2 where load_date <> ( select max(load_date)
3 from t b
4 where b.acct_id = t.acct_id )
5 /

2 rows deleted.


This delete goes after ACCT_ID's that have more then 1 occurence and delete ALL but the "newest" -- the MAX. We get the max(load_date) for a given acct_id and unless you have that load_date -- you are deleted. This will delete 0, 1 or more rows per acct_id.

There should be an index on (acct_id,load_date)


ops$tkyte@8i> select * from t;

ACCT_ID CUST_ID LOAD_DATE
---------- ------------------------- ----------
1234 DEFG 20000201
5678 STDF 20000201
9111 ADFG 20000201
9112 ASDX 20000201

ops$tkyte@8i> rollback;
Rollback complete.

Now we can try again. This time, we pick the OLDEST record for an ACCT_ID and delete it -- regardless of the number of records with that ACCT_ID

ops$tkyte@8i> delete from t
2 where load_date = ( select min(load_date)
3 from t b
4 where b.acct_id = t.acct_id )
5 /

4 rows deleted.

ops$tkyte@8i>
ops$tkyte@8i> select * from t;

ACCT_ID CUST_ID LOAD_DATE
---------- ------------------------- ----------
1234 DEFG 20000201
5678 STDF 20000201


Rating

  (7 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Hi

Praveen, March 12, 2003 - 4:48 am UTC

Tom,

You are so simple....

warm regards

Praveen

Ricky Dionisio, September 15, 2003 - 5:04 pm UTC

simple indeed.

Thanks,

Developer

A reader, September 12, 2005 - 3:14 pm UTC

It is extrem slow if the table has millions records. Is there an another to delete the duplicate record in the large table?

Thanks

Tom Kyte
September 12, 2005 - 3:22 pm UTC

the fastest way I've discovered to date is thus:


delete from t
where rowid in ( select rid
from ( select rowid rid, row_number() over
(partition by YOUR_KEY order by rowid) rn
from t)
where rn <> 1 )



delete statement

shams, September 13, 2005 - 1:03 am UTC

I have a table t1
ID actno type seq src
1 111 CHK 1 L
1 111 SAV 2 L

table t2
ID actno type seq src
1 111 CHK 2 GL
1 111 SAV 3 L

I want to delete the records in t2 for the records that has no sequence matching records in t1 and src != 'GL'.
In the given example
the following record from t2 should be deleted

1 111 SAV 3 L

and not the first record

Regards,
Shams

Try this out

Kamran, September 13, 2005 - 1:52 am UTC

delete from t2
where not exists
(select t1.seq from t1 where t1.seq=t2.seq);

delete

shams, September 13, 2005 - 10:47 am UTC

That does not work because the I do not want to delete the records from t2 if the t2.src = 'GL' .
The records in T2 are deleted only for the following criteria
t1.src = 'L' and t2.src = 'L'
t1.src = 'GL' and t2.src = 'GL'

and the records are not touched in t2 if
t1.src = 'L' and t2.src = 'GL'


Thanks

Tom Kyte
September 13, 2005 - 12:44 pm UTC

how to join those tables - do the id and acct columns have any meaning or is seq all that it is about.

delete

shams, September 13, 2005 - 4:54 pm UTC

id,acct_no and seq combination makes sense as
the extra records in t2 should be checked with the combination and delete the recods if the combination does not exits in the table t2.