One more tip
A Reader, June 17, 2002 - 3:23 pm UTC
Tom,
In your response you have assumed that the duplicate rows are absolutely identical, column by column. Sometimes it's not teh case; only the values in the PK column(s) are identical but the rest may be different. So a distinct * will not work in that case. Rather I suggest the following after the dups table is created.
delete dups o where rowid in
(select rowid from dups where a = o.a
and rownum < 2)
Then the dups table will be free of duplicated. In the last stage just issue : "insert into t select * from dups" instead of select distinct * from dups.
Hope this helps.
Another tip
Raman, June 18, 2002 - 3:03 am UTC
try this :
SELECT * FROM emp a
WHERE rowid > ANY
(SELECT rowid FROM emp b
WHERE a.ename = b.ename
)
will give you all the duplicate copies for records.
Raman.
more ideas
Igor, March 13, 2003 - 5:52 pm UTC
If you dont need to delete rows from dups.
You could
INSERT INTO t
SELECT *
FROM dups
WHERE ROWID IN (SELECT row_id
FROM (SELECT DISTINCT id, MAX (ROWID) AS row_id
FROM dups
GROUP BY id))
Its faster.
March 14, 2003 - 5:51 pm UTC
really? faster?
where is the test that shows this.
How about writing a pl/sql proc to remove dups
Chandru, March 14, 2003 - 9:36 pm UTC
Tom,
If I write a pl/sql proc as below to remove duplicate records, would this be fast enough to clean a huge table, say, with million records? Any comments will be appreciated. Thanks. -- Chandru
create or replace procedure rem_dups
is
cursor dup_rec_cur
is
select n, count(*)
from test
group by n
having count(*) > 1;
cursor record_id_cur (p_record number)
is
select n,rowid
from test
where n = p_record;
v_rowid1 rowid;
begin -- rem_dups
for r in dup_rec_cur loop
for s in record_id_cur(r.n) loop
v_rowid1 := s.rowid;
exit;
end loop;
delete from test
where rowid <> v_rowid1 and n = r.n;
end loop;
commit;
end rem_dups;
/
March 15, 2003 - 9:03 am UTC
just use sql, never write code unless you have to.
Got it!
Chandru, March 15, 2003 - 11:07 am UTC
Thanks. I thought exec a pl/sql proc is simpler than multiple sql statements to identify the dups and then delete them. However, I fully agree with you that if sql can do it more efficiently than a pl/sql procedure, stick with sql.
Listing duplicates
Joe, September 11, 2003 - 3:32 pm UTC
Hi Tom,
I have a table containing Bill of Material items. How do I find the parent BOM items using the same child items?
Here is the data for table bom_list:
bom_nbr item qty
1001 BK7001 3
1001 TL0905 2 <-
1001 JJ1209 1
1002 AA1023 3
1002 AP0922 2
1003 WR7834 1
1003 TL0905 5 <-
1003 XT1129 1
1003 LM9870 1
1003 VB0997 1 <-
1004 SK2545 4
1004 AJ0005 1
1004 SG1212 1
1004 KF5412 1
1004 AS7890 1
1004 VB0997 2 <-
1005 SA1990 1
1005 TL0905 1 <-
This is the output that I am trying to get:
bom_nbr item qty
1001 TL0905 2
1003 TL0905 5
1005 TL0905 1
1003 VB0997 1
1004 VB0997 2
Thanks,
Joe
September 11, 2003 - 7:45 pm UTC
no idea why your result set is what it is.
is there some logic hiding in there? I see your arrows, but don't really know why you have arrows there?
Joe, September 12, 2003 - 10:19 am UTC
Tom,
I have typed in the arrows to indicate that these are the repeating items. I want to find the repeating items in various BOM. So in the output list, I have listed the items that are repeating in different BOMs. And this is the output I am trying to reproduce.
Do this help?
Thanks.
September 12, 2003 - 10:52 am UTC
select *
from bom_table
where item in ( select item
from bom_table
group by item
having count(bom_nbr) > 1)
is one approach.
One more way
Oleksandr Alesinskyy, November 04, 2003 - 5:33 am UTC
What about:
delete from DUPTAB where rowid in (select rid from(
select rowid rid,row_number() over(partition by DUPCOL1,DUPCOL2,...,DUPCOLN order by rowid desc) rn
from DUPTAB
) where rn>1);
I guess it is fastest if no index exists on DUPCOL1...DUPCOLN as this approach saves 1 full scan
why delete?
A reader, March 01, 2004 - 4:18 pm UTC
Cant we simply do this?
insert into emp select * from dup b
where rowid in (select rowid
from dup c
where b.empno = c.empno
and rownum < 2)
March 01, 2004 - 4:56 pm UTC
not sure where it is fitting in -- but you need to delete from EMP to get rid of the dups first no? show the entire process, start with a table with dups and work through it -- then we can say
oops sorry
A reader, March 01, 2004 - 7:00 pm UTC
sorry I was actualy trying to answer the first follow up, anyway I think it fit in your initial example as well
ops$tkyte@ORA817DEV.US.ORACLE.COM> delete from t where rowid in ( select row_id
from exceptions );
18 rows deleted.
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select distinct * from dups;
9 rows created.
The INSERT part can be something like
insert into t select * from dup b
where rowid in (select rowid
from dup c
where b.a = c.a
and b.b = c.b
and b.c = c.c
and rownum < 2)
instead of select distinct
March 02, 2004 - 7:10 am UTC
I'd rather "bulk" perform than use the correlated update which would be run like a nested loop
for x in ( select * from dup )
loop
select rowid from dup where ....
if x.rowid = that rowid then
insert
end if
end loop
so yes, you could, but I would not want to (you'd need an index on the key columns)
perhaps:
insert into t
select ....
from ( select t.*,
row_number() over (partition by KEY_COLS order by something) rn
from t )
where rn = 1;
that would be nice if the entire row as not "duped" -- just some defining key and you wanted to keep one row for each key.
Continuing to Insert to this table
Russell, May 14, 2004 - 3:42 am UTC
Tom,
based on your original reply,
I am using Oracle version 8.0.6.
I followed your steps, and removed duplicates from the base table. I am now inserting into that table, values from a staging table which may or may not already be in the base table (Effectively, I would like new values to go into the table, and not worry about records already there).
Is there a way you can write something like
Insert into T1
Select *
from T2
EXCEPTIONS INTO EXCEPTIONS;
I tried, and it returned error ORA-00933 SQL Command not properly ended.
I have implemented it as
Insert into T1
Select *
from T2
where (F1, F2) not IN
(SELECT F1, F2
FROM T1)
But this seems to be pretty slow
Thanks
Russell
May 14, 2004 - 10:16 am UTC
not in will be dog slow with the rbo. You are using rather "old" software.
try this:
select t2.*
from t1,t2
where t1.f1(+) = t2.f1
and t1.f2(+) = t2.f2
and t1.f1 is null
and t1.f2 is null;
don't right pl/sql only when you have to
Serge F., June 08, 2005 - 10:25 pm UTC
Hi Tom you gave this answer to one of the readers of this good thread.
Assuming that I have to use pl/sql code with no loops, but using your ideeas , I got the following problem:
If I execute in a SQLPlus Session:
alter table T
ENABLE CONSTRAINT T_PK
EXCEPTIONS INTO T_EXCEPTIONS;
and I have duplicates, indeed those rows are going into T_EXCEPTIONS.
But if I put the same code into a package/stored procedure as follows:
PROCEDURE manage_dups
IS
BEGIN
EXECUTE IMMEDIATE ' alter table T
ENABLE CONSTRAINT T_PK
EXCEPTIONS INTO T_EXCEPTIONS ';
EXCEPTION
WHEN OTHERS THEN
BEGIN INSERT INTO DUPS
select * from t where rowid in
(select row_id from t_exceptions);
.....
-- some other inserts and deletes as in your original example
END;
END;
this procedure (or package) doesn't work !! I mean, there is nothing in exception table, so when it jumps into exception block there are no records to insert into DUPS.
Unless I miss something silly, please provide us your expertise
Thank you.
June 08, 2005 - 10:50 pm UTC
example please.
tkyte@ORA9IR2W> delete from exceptions;
2 rows deleted.
tkyte@ORA9IR2W> create table t ( x int );
Table created.
tkyte@ORA9IR2W> alter table t add constraint t_pk primary key(x) disable;
Table altered.
tkyte@ORA9IR2W> insert into t values ( 1);
1 row created.
tkyte@ORA9IR2W> insert into t values ( 1);
1 row created.
tkyte@ORA9IR2W> commit;
Commit complete.
tkyte@ORA9IR2W>
tkyte@ORA9IR2W> create or replace procedure dups
2 as
3 begin
4 execute immediate 'alter table t enable constraint t_pk exceptions into exceptions';
5 exception
6 when others then
7 for x in (select rowid r from exceptions )
8 loop
9 dbms_output.put_line( 'rid = ' || x.r );
10 end loop;
11 end;
12 /
Procedure created.
tkyte@ORA9IR2W> exec dups
rid = AAAIHvAABAAANMiAAA
rid = AAAIHvAABAAANMiAAB
PL/SQL procedure successfully completed.
pragma autonomous transaction
Serge F., June 09, 2005 - 9:21 am UTC
Thank you Tom, for your answer.
The main procedure doing the initial insert, was calling this separate procedure to enforce back the primary key after initial insert.
Later reviewing my all stuff I noticed that I was using pragma autonomous transaction into that procedure. By commenting out the PRAGMA, it works.
Thanks again.
procedure to remove duplicate after sqlldr direct load
Ravi, August 30, 2005 - 7:03 pm UTC
Hi Tom,
I am trying to write a procedure to remove duplicates from a table of million records, every day i will be loading 30 tables using direct load and most of the time i get duplicates(not full row but duplicate in composite primary key)i need to copy the duplicate record as a string(record_string = c1,c2,c3..) into some other table, and remove duplicate rows from the main table, I am trying your way but i get an error when i execute it.
can you please help me out, where am i going wrong
1 CREATE OR REPLACE PROCEDURE duplicates
2 (p_table_name VARCHAR2,
3 p_const_name VARCHAR2)
4 AS
5 sql_statement VARCHAR2(1000);
6 first_pass VARCHAR2(1);
7 col_string VARCHAR2(1000);
8 dup_col_string VARCHAR2(500);
9 CURSOR get_columns IS
10 SELECT column_name
11 FROM all_cons_columns
12 WHERE table_name = UPPER(p_table_name)
13 AND constraint_name = UPPER(p_const_name)
14 ORDER BY position;
15 BEGIN
16 sql_statement := ('alter table ' || p_table_name ||
17 ' enable constraint ' ||
18 p_const_name ||' exceptions into exceptions');
19 execute immediate sql_statement;
20 EXCEPTION WHEN OTHERS THEN
21 sql_statement := 'create table dups as
22 select * from '||p_table_name||' where rowid in ( select row_id from exceptions )' ;
23 EXECUTE IMMEDIATE sql_statement;
24 /* Build column string */
25 col_string := NULL;
26 first_pass := 'Y';
27 FOR col_rec IN get_columns LOOP
28 IF (first_pass = 'N') THEN
29 col_string := (col_string ||' AND '|| col_rec.column_name||' = '||'d.'||col_rec.column_name)
30 END IF;
31 col_string := (col_string || col_rec.column_name||' = '||'d.'||col_rec.column_name);
32 first_pass := 'N';
33 END LOOP;
34 sql_statement := 'delete dups d where rowid in
35 (select rowid from dups where '||dup_col_string||' and rownum < 2)' ;
36 EXECUTE IMMEDIATE sql_statement;
37 sql_statement := 'DELETE FROM '||p_table_name||' where rowid in (select row_id from exceptio
38 EXECUTE IMMEDIATE sql_statement;
39 sql_statement := 'INSERT INTO '||p_table_name||' select * from dups';
40 EXECUTE IMMEDIATE sql_statement;
41* END;
42 /
procedure created.
SQL> exec duplicates('EMP','PK_EMP');
BEGIN duplicates('EMP','PK_EMP'); END;
*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SCOTT.DUPLICATES", line 23
ORA-02437: cannot validate (SCOTT.PK_EMP) - primary key violated
ORA-06512: at line 1
create table dup_record(table_name varchar2(30),
cons_name varchar2(30), record_value varchar2(30) )
/
in the above question wrong size for record_value
ravi, August 30, 2005 - 8:02 pm UTC
Hi TOM,
sorry i also dint include the code to insert the values into this table, as i dint know how to do it, do i need to drop dups table each time and recreate it or is there any way to do it dynamically, any help or suggestions please
create table dup_record(table_name varchar2(30),
cons_name varchar2(30), record_value varchar2(2000) )
/
Thank you
RAVI, August 31, 2005 - 6:19 pm UTC
your method of doing this is great, working very fast, compared to the other way of finding duplicates using group by clause and saving them to a table and then run a query again to delete them, I found the diffrence in my case i have around 6 to 10 columns as composite primary key
and with DML error logging... mighty nice
Duke Ganote, November 12, 2009 - 7:24 pm UTC
SQL> CREATE TABLE T ( T INT, U CHAR(1) );
SQL> CREATE INDEX t_pk ON t ( t );
SQL> ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY ( t ) EXCEPTIONS INTO EXCEPTIONS;
SQL> ALTER TABLE t DISABLE PRIMARY KEY;
TABLE altered.
SQL> INSERT INTO t VALUES ( 1, 'a' ) LOG errors REJECT LIMIT 1;
1 ROW created.
SQL> INSERT INTO t VALUES ( 1, 'a' ) LOG errors REJECT LIMIT 1;
1 ROW created.
SQL> INSERT INTO t VALUES ( 2, 'aa' ) LOG errors REJECT LIMIT 1;
0 ROWS created.
SQL> ALTER TABLE t ENABLE PRIMARY KEY;
ALTER TABLE t ENABLE PRIMARY KEY
*
ERROR AT line 1:
ORA-02437: cannot VALIDATE (CIDWWRITE.T_PK) - PRIMARY KEY violated
SQL> SELECT table_name, row_id, CONSTRAINT FROM EXCEPTIONS;
TABLE_NAME ROW_ID CONSTRAINT
------------------------------ ------------------ ------------------------------
T AADm78ALRAAAhlDAAB T_UNIQUE
T AADm78ALRAAAhlDAAA T_UNIQUE
SQL> SELECT ora_err_mesg$, t, u FROM err$_t;
ORA_ERR_MESG$ T U
------------------------------ ----- -----
ORA-12899: VALUE too large FOR 2 aa
COLUMN "CIDWWRITE"."T"."U" (a
ctual: 2, maximum: 1)