Skip to Main Content
  • Questions
  • Compare 2 tables to see if they are in sync

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Mike .

Asked: September 21, 2004 - 12:52 pm UTC

Last updated: November 07, 2004 - 4:12 pm UTC

Version: 9.2.0

Viewed 10K+ times! This question is

You Asked

Tom,
I have 2 tables namely A & B. They have identical columns and have
the same number of rows via select count(*) from A & from B.

SQL> desc A;
Name Null? Type
----------------------------------------- -------- -----------------
C1 NOT NULL NUMBER
C2 VARCHAR2(2000)
C3 CHAR(255)

SQL> desc B;
Name Null? Type
----------------------------------------- -------- -----------------
C1 NOT NULL NUMBER
C2 VARCHAR2(2000)
C3 CHAR(255)


SQL> select count(*) from A;

COUNT(*)
----------
1471

SQL> select count(*) from B;

COUNT(*)
----------
1471

But the content in one of the rows is different as from the following
query. The only difference is the the last character in column C2. It is an A in table A and a B in table B

SQL> select * from A where C1=1;


C1 C2 C3
---------- ------------ ----------------
1 AAAAAAAAAAAA 100


SQL> select * from B where C1=1;


C1 C2 C3
---------- ------------ ----------------
1 AAAAAAAAAAAB 100


I would like to write a SQL to compare or see if table A and B are in sync in respect to their content rather than the number of rows.
But I don't know how to do it. Please help
Thanks,

Here they are,

create table A (
C1 NUMBER NOT NULL,
C2 VARCHAR2(2000),
C3 CHAR(255)
);
create unique index A_INDEX on A (C1);

create table B (
C1 NUMBER NOT NULL,
C2 VARCHAR2(2000),
C3 CHAR(255)
);
create unique index B_INDEX on B (C1);

insert into A(C1, C2, C3) values(&1, &2, &3 );

Thanks again,
Mike


and Tom said...

Always nice to have *inserts* too -- that populate the table with well thought out data to work with (the less I HAVE to type, the more I CAN type if you get the gist :)

anyway:


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from a;

C1 C2 C3
---------- ---- -----
1 x y
2 xx y
3 x y

ops$tkyte@ORA9IR2> select * from b;

C1 C2 C3
---------- ---- -----
1 x y
2 x y
3 x yy

first rows are the same, second and third rows differ...


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select c1, c2, c3, count(src1), count(src2)
2 from ( select a.*, 1 src1, to_number(null) src2 from a
3 union all
4 select b.*, to_number(null) src1, 2 src2 from b
5 )
6 group by c1,c2,c3
7 having count(src1) <> count(src2)
8 /

C1 C2 C3 COUNT(SRC1) COUNT(SRC2)
---------- ---- ----- ----------- -----------
2 x y 0 1
2 xx y 1 0
3 x y 1 0
3 x yy 0 1

ops$tkyte@ORA9IR2>


that shows the rows that differ, and the count(srcN) flag shows what table it came from -- src1 or src2...




To give credit where credit is due:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15364732017705 <code>

is the origination of this really nice, effective and performat technique.

Rating

  (13 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, September 21, 2004 - 3:16 pm UTC


good solution...

daniel, September 21, 2004 - 3:42 pm UTC

I would just use:
select * from a
minus
select * from b;

Tom Kyte
September 21, 2004 - 4:09 pm UTC

that doesn't work.

think: delete from a;

now what? b is full of stuff, all out of sync with A.

think: b has one row more than A

now what? b and a are out of sync and you won't know that.


so, you have to

(select 'In A', a.* from a minus select 'In A', b.* from b)
UNION ALL
(select 'In B', b.* from b minus select 'In B', a.* from a)

and even then, you won't see "duplicates" (eg: you have the same row 50 times in A, once in B -- minus will nuke all 51 rows -- tables are not in sync, but you think they are)....


which is why the group by trick was 'invented' -- avoiding the 4 full scans ands sorts o'plenty.



I meant...

daniel, September 21, 2004 - 3:52 pm UTC

I meant to say: in the past I would have used 'minus'.

Tom Kyte
September 21, 2004 - 4:09 pm UTC

ahh, very good :)

another option

Aino, September 21, 2004 - 5:10 pm UTC

Hi,

using a unittest framework might come in handy. utPLSQL provides just the required functionality:
utAssert.eqquery (...);
It allows you to compare the data returned by two queries
It only fails or succeeds (which is often enough for unittesting), but it doesn't show the difference in data as in your solution.

Ciao
Aino

musical variation (rock and roll)

Alberto Dell'Era, September 21, 2004 - 5:40 pm UTC

What about this variation:

dellera@ORACLE9I> with base as (
select a.*, +1 addendum from a
union all
select b.*, -1 addendum from b
),
comp as (
select base.*, sum(addendum) over (partition by c1,c2,c3) diff from base
)
select comp.* from comp where diff <> 0;

C1 C2 C3 ADDENDUM DIFF
---------- ----- ---------- ---------- ----------
2 x y -1 -1
2 xx y 1 1
3 x y 1 1
3 x yy -1 -1

That way you may also easily propagate e.g. the rowid of the offending rows, or any other useful identifier (while retaining the "table flag"); not sure if that could be done with the group by (at least not as naturally).

Can this be useful ?

VKOUL, September 21, 2004 - 8:24 pm UTC

SQL> 
SQL> select * from t1;

        C1 C2                 C3
---------- ---------- ----------
         1 ABC               101
         2 DEF               200
         3 XYZ               300
         4 MNO               400
         5 XXX               500

SQL> select * from t2;

        C1 C2                 C3
---------- ---------- ----------
         1 ABC               100
         2 DEF               200
         3 XYZ               301
         4 MNO               400
         6 LXY               600

SQL> 
SQL> get afiedt.buf
  1  SELECT *
  2  FROM  (
  3         (SELECT 't1' origin, t1.* from t1
  4          MINUS
  5          SELECT 't1'       , t2.* from t2
  6         )
  7         UNION ALL
  8         (SELECT 't2' origin, t2.* from t2
  9          MINUS
 10          SELECT 't2'       , t1.* from t1
 11         )
 12*       )
SQL> /

OR         C1 C2                 C3
-- ---------- ---------- ----------
t1          1 ABC               101
t1          3 XYZ               300
t1          5 XXX               500 -- Row in t1 Only
t2          1 ABC               100
t2          3 XYZ               301
t2          6 LXY               600 -- Row in t2 Only

6 rows selected.

SQL> 

You can order by any coulmns you want.
 

Tom Kyte
September 22, 2004 - 7:23 am UTC

that is what I said a comment or two above -- that if you wanted to use minus, you have to do 4 full scans and sort's o'plenty.


The group by trick is much more efficient as it turns out. (and you can order it too if you like)

Point noted now

VKOUL, September 22, 2004 - 11:36 am UTC


Automated script for comparing tables across database link

Darren, September 22, 2004 - 6:22 pm UTC

I developed the script below to compare row data in a destination versus source database to determine data differences resulting from applying data migration scripts to upgrade commercially available products. The script builds comparison queries from the Oracle Data Dictionary using the comparison method detailed by Tom. The script is only useful if you have source and destination database available, hopefully this will be of use to some people.

-- *

-- COMPARE DATABASE ROW DATA OVER DATABASE LINK.

-- This routine queries the data dictionary to create a SQL
-- script that will compare row data between identically named
-- tables over a database link. This script was designed to
-- determine row data differences after a data migration had
-- taken place. To use this script the source database needs
-- to be accessible via a database link from the destination
-- database.

-- source database = the original database that is to be upgraded
-- destination database = the database that has the upgrade applied

-- This routine creates a SQL script that is executed to determine
-- rows that have been added to the destination database or deleted
-- from the source database. Modified rows are shown as deleted in
-- the source database and added to the destination database, these
-- rows appear next to one another in the results.

-- This script fetches the primary key columns sorts by these first.

-- all sql is performed in the destination database

-- *

-- create a database link, you will need the required privileges to
-- do so or else the database link should be created and you should
-- be granted the required privileges.

-- NOTE: Replace SOURCEDB with the TNSNAMES entry for the source database

drop database link SOURCEDB;

create database link SOURCEDB
connect to &username identified by &password using 'SOURCEDB';

-- create a table to store results

drop table temp_compare_database_data;
create table temp_compare_database_data(table_name varchar2(30), query clob);

set linesize 999
column query format a999

-- run the following script

--
-- start of script
--

declare

v_columns varchar2(2000);
v_primary_key varchar2(2000);
v_order_by varchar2(2000);
v_query clob;
v_found varchar2(1);
v_destination_data_type user_tab_columns.data_type%type;

begin

for mytables in
(
select table_name
from user_tables@SOURCEDB
where table_name like 'LM_%') loop

-- NOTE: change list of tables as required.

-- test to see if table appears in the destination database

begin

select 'Y' into v_found
from
user_tables
where
table_name = mytables.table_name;

exception

when no_data_found then v_found := 'N';

end;

if (v_found = 'Y') then

-- we can only compare tables that appear in both the source and
-- destination databases, refer to the database dictionary difference
-- report to view a list of database structure differences.

v_columns := null;
v_primary_key := null;
v_order_by := null;
v_query := null;

for mypk in
(
select ucco.column_name
from
user_constraints@SOURCEDB ucon,
user_cons_columns@SOURCEDB ucco
where
ucon.table_name = mytables.table_name and
ucon.constraint_type = 'P' and
ucco.constraint_name = ucon.constraint_name
order by
ucco.position) loop

if (v_primary_key is null) then
v_primary_key := 'des.'||mypk.column_name||' = src.'||mypk.column_name;
v_order_by := mypk.column_name;
else
v_primary_key := rtrim(v_primary_key )||' and des.'||mypk.column_name||
' = src.'||mypk.column_name;
v_order_by := rtrim(v_order_by)||','||mypk.column_name;
end if;

end loop;

for mycolumns in
(
select column_name, data_type
from
user_tab_columns@SOURCEDB
where
table_name=mytables.table_name
order by column_id) loop

-- test to see if column appears in the destination database

begin

select 'Y',data_type into v_found,v_destination_data_type
from
user_tab_columns
where
table_name = mytables.table_name and
column_name = mycolumns.column_name;

exception

when no_data_found then v_found := 'N';

end;

if (v_found = 'Y' and
mycolumns.data_type = v_destination_data_type and
mycolumns.data_type not in ('BLOB','CLOB','LONG','RAW','DATE')) then

-- NOTE: you should modify the list of data types excluded in the above
-- statement based on your requirements.

-- we can only compare columns that appear in both the source and
-- destination databases, refer to the database dictionary difference
-- report to view a list of database structure differences.

if (v_columns is null) then
v_columns := mycolumns.column_name;
else
v_columns:=rtrim(v_columns)||','||mycolumns.column_name;
end if;

end if;

end loop;

-- output script

v_query :=
'-- '||mytables.table_name||chr(10)||
chr(10)||
'select decode(count(added_to_dest) - count(deleted_from_source),1,''add'',-1,''delete'') change,'||chr(10)||
v_columns||' '||chr(10)||
'from '||chr(10)||
'('||chr(10)||
'select '||v_columns||', 1 added_to_dest, to_number(null) deleted_from_source '||chr(10)||
'from '||mytables.table_name||' destination '||chr(10)||
'union all ' ||chr(10)||
'select '||v_columns||', to_number(null) added_to_dest, 2 deleted_from_source '||chr(10)||
'from '||mytables.table_name||'@SOURCEDB source '||chr(10)||
')'||chr(10)||
'group by '||v_columns||' '||chr(10)||
'having count(added_to_dest) <> count(deleted_from_source) '||chr(10)||
'order by '||v_order_by||',change;';

insert into temp_compare_database_data(table_name,query) values (mytables.table_name,v_query);

end if;

end loop;

end;
/

--
-- end of script
--

--
-- spool generated script from database
--

spool c:\temp_compare_database_data.sql
select query from temp_compare_database_data order by table_name;
spool off

drop table temp_compare_database_data;

-- then run the script created (temp_compare_database_data.sql) to
-- produce the actual comparisons

set linesize 32000
set heading on
set feedback on
set echo on
spool c:\temp_compare_database_data.results
@c:\temp_compare_database_data.sql
spool off;

-- NOTE: there will be an error at the start and end of the .results
-- file due to the select statement and spool off statement.

-- ***********************

drop database link SOURCEDB;


Wonder how this differ from..

A reader, September 23, 2004 - 1:41 pm UTC

select * from (
select a.*, 'A'
from a, b
where a.c1 = b.c1 and (a.c2 <> b.c2 or a.c3 <> b.c3)
union all
select b.*, 'B'
from a, b
where a.c1 = b.c1 and (a.c2 <> b.c2 or a.c3 <> b.c3)
)
order by 1

OR

select 'A', a.*, 'B', b.*
from a, b
where a.c1 = b.c1
and (a.c2 <> b.c2 or a.c3 <> b.c3)


.. besides having different plans, perhaps ?

Tom Kyte
September 24, 2004 - 9:20 am UTC

as noted by someone below -- beware the null

but again, I forsee many full table scans in your future... Also what about the key "c1" that is in A but not B -- or vice versa. You'd need OUTER JOINS.



Beware the nulls

Mike, September 23, 2004 - 4:04 pm UTC

If you are using <> to look for mismatches, you will not detect differences if one of the values is NULL.

I have seen this extended as
NVL(a.c2,'NULL') <> NVL(b.c2,'NULL')
and such, but then you need to worry about whether a column might actually contain the string 'NULL', etc.


Tom Kyte
September 24, 2004 - 9:38 am UTC

the most performant method is:


(a <> b or (a is null and b is not null) or (a is not null and b is null))


short circut evaluation, no function call overhead. And it is always right :)



decode for inequalities

Alberto Dell'Era, September 24, 2004 - 10:43 am UTC

> (a <> b or (a is null and b is not null) or (a is not null and b is null))

in the past, i remember you suggested something like

decode (a,b,0,1) = 1

[or variations like "decode (a,b,0) is not null" ]

which is a tad easier on the keyboard - have you found any quirks lately about decode that makes you prefer the SQL fragment ?

TIA
Alberto

Tom Kyte
September 24, 2004 - 12:02 pm UTC

I said "yes it works" -- cannot remember if I benchmarked it or not. so, lets do it.


select count(*) from t
where object_id <> data_object_id
or (object_id is null and data_object_id is not null)
or (object_id is not null and data_object_id is null)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.12 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.12 0.12 0 1158 0 3
********************************************************************************
select count(*) from t
where decode( object_id, data_object_id, 1 ) is null

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.01 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.13 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.14 0.12 0 1158 0 3
********************************************************************************
select count(*) from t
where decode( object_id, data_object_id, 1, 0 ) <> 1

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.13 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.13 0.13 0 1158 0 3


They appear more or less "equivalent" -- decode is less typing, the other is more "meaningful"

so, if you use decode, make up the reduced keystrokes with a comment :)


(and this only applies in sql, as you cannot call decode() from plsql without using sql and you would not run sql just to run decode, you would use the a<>b or (... ) or (...) )

Awesome

A Happy SQL Guy, November 07, 2004 - 4:12 pm UTC

Thank you so much for this solution!!!

It appears to be working beautifully.

compare table

A reader, November 03, 2011 - 4:39 pm UTC