A reader, September 21, 2004 - 3:16 pm UTC
good solution...
daniel, September 21, 2004 - 3:42 pm UTC
I would just use:
select * from a
minus
select * from b;
September 21, 2004 - 4:09 pm UTC
that doesn't work.
think: delete from a;
now what? b is full of stuff, all out of sync with A.
think: b has one row more than A
now what? b and a are out of sync and you won't know that.
so, you have to
(select 'In A', a.* from a minus select 'In A', b.* from b)
UNION ALL
(select 'In B', b.* from b minus select 'In B', a.* from a)
and even then, you won't see "duplicates" (eg: you have the same row 50 times in A, once in B -- minus will nuke all 51 rows -- tables are not in sync, but you think they are)....
which is why the group by trick was 'invented' -- avoiding the 4 full scans ands sorts o'plenty.
I meant...
daniel, September 21, 2004 - 3:52 pm UTC
I meant to say: in the past I would have used 'minus'.
September 21, 2004 - 4:09 pm UTC
ahh, very good :)
another option
Aino, September 21, 2004 - 5:10 pm UTC
Hi,
using a unittest framework might come in handy. utPLSQL provides just the required functionality:
utAssert.eqquery (...);
It allows you to compare the data returned by two queries
It only fails or succeeds (which is often enough for unittesting), but it doesn't show the difference in data as in your solution.
Ciao
Aino
musical variation (rock and roll)
Alberto Dell'Era, September 21, 2004 - 5:40 pm UTC
What about this variation:
dellera@ORACLE9I> with base as (
select a.*, +1 addendum from a
union all
select b.*, -1 addendum from b
),
comp as (
select base.*, sum(addendum) over (partition by c1,c2,c3) diff from base
)
select comp.* from comp where diff <> 0;
C1 C2 C3 ADDENDUM DIFF
---------- ----- ---------- ---------- ----------
2 x y -1 -1
2 xx y 1 1
3 x y 1 1
3 x yy -1 -1
That way you may also easily propagate e.g. the rowid of the offending rows, or any other useful identifier (while retaining the "table flag"); not sure if that could be done with the group by (at least not as naturally).
Can this be useful ?
VKOUL, September 21, 2004 - 8:24 pm UTC
SQL>
SQL> select * from t1;
C1 C2 C3
---------- ---------- ----------
1 ABC 101
2 DEF 200
3 XYZ 300
4 MNO 400
5 XXX 500
SQL> select * from t2;
C1 C2 C3
---------- ---------- ----------
1 ABC 100
2 DEF 200
3 XYZ 301
4 MNO 400
6 LXY 600
SQL>
SQL> get afiedt.buf
1 SELECT *
2 FROM (
3 (SELECT 't1' origin, t1.* from t1
4 MINUS
5 SELECT 't1' , t2.* from t2
6 )
7 UNION ALL
8 (SELECT 't2' origin, t2.* from t2
9 MINUS
10 SELECT 't2' , t1.* from t1
11 )
12* )
SQL> /
OR C1 C2 C3
-- ---------- ---------- ----------
t1 1 ABC 101
t1 3 XYZ 300
t1 5 XXX 500 -- Row in t1 Only
t2 1 ABC 100
t2 3 XYZ 301
t2 6 LXY 600 -- Row in t2 Only
6 rows selected.
SQL>
You can order by any coulmns you want.
September 22, 2004 - 7:23 am UTC
that is what I said a comment or two above -- that if you wanted to use minus, you have to do 4 full scans and sort's o'plenty.
The group by trick is much more efficient as it turns out. (and you can order it too if you like)
Point noted now
VKOUL, September 22, 2004 - 11:36 am UTC
Automated script for comparing tables across database link
Darren, September 22, 2004 - 6:22 pm UTC
I developed the script below to compare row data in a destination versus source database to determine data differences resulting from applying data migration scripts to upgrade commercially available products. The script builds comparison queries from the Oracle Data Dictionary using the comparison method detailed by Tom. The script is only useful if you have source and destination database available, hopefully this will be of use to some people.
-- *
-- COMPARE DATABASE ROW DATA OVER DATABASE LINK.
-- This routine queries the data dictionary to create a SQL
-- script that will compare row data between identically named
-- tables over a database link. This script was designed to
-- determine row data differences after a data migration had
-- taken place. To use this script the source database needs
-- to be accessible via a database link from the destination
-- database.
-- source database = the original database that is to be upgraded
-- destination database = the database that has the upgrade applied
-- This routine creates a SQL script that is executed to determine
-- rows that have been added to the destination database or deleted
-- from the source database. Modified rows are shown as deleted in
-- the source database and added to the destination database, these
-- rows appear next to one another in the results.
-- This script fetches the primary key columns sorts by these first.
-- all sql is performed in the destination database
-- *
-- create a database link, you will need the required privileges to
-- do so or else the database link should be created and you should
-- be granted the required privileges.
-- NOTE: Replace SOURCEDB with the TNSNAMES entry for the source database
drop database link SOURCEDB;
create database link SOURCEDB
connect to &username identified by &password using 'SOURCEDB';
-- create a table to store results
drop table temp_compare_database_data;
create table temp_compare_database_data(table_name varchar2(30), query clob);
set linesize 999
column query format a999
-- run the following script
--
-- start of script
--
declare
v_columns varchar2(2000);
v_primary_key varchar2(2000);
v_order_by varchar2(2000);
v_query clob;
v_found varchar2(1);
v_destination_data_type user_tab_columns.data_type%type;
begin
for mytables in
(
select table_name
from user_tables@SOURCEDB
where table_name like 'LM_%') loop
-- NOTE: change list of tables as required.
-- test to see if table appears in the destination database
begin
select 'Y' into v_found
from
user_tables
where
table_name = mytables.table_name;
exception
when no_data_found then v_found := 'N';
end;
if (v_found = 'Y') then
-- we can only compare tables that appear in both the source and
-- destination databases, refer to the database dictionary difference
-- report to view a list of database structure differences.
v_columns := null;
v_primary_key := null;
v_order_by := null;
v_query := null;
for mypk in
(
select ucco.column_name
from
user_constraints@SOURCEDB ucon,
user_cons_columns@SOURCEDB ucco
where
ucon.table_name = mytables.table_name and
ucon.constraint_type = 'P' and
ucco.constraint_name = ucon.constraint_name
order by
ucco.position) loop
if (v_primary_key is null) then
v_primary_key := 'des.'||mypk.column_name||' = src.'||mypk.column_name;
v_order_by := mypk.column_name;
else
v_primary_key := rtrim(v_primary_key )||' and des.'||mypk.column_name||
' = src.'||mypk.column_name;
v_order_by := rtrim(v_order_by)||','||mypk.column_name;
end if;
end loop;
for mycolumns in
(
select column_name, data_type
from
user_tab_columns@SOURCEDB
where
table_name=mytables.table_name
order by column_id) loop
-- test to see if column appears in the destination database
begin
select 'Y',data_type into v_found,v_destination_data_type
from
user_tab_columns
where
table_name = mytables.table_name and
column_name = mycolumns.column_name;
exception
when no_data_found then v_found := 'N';
end;
if (v_found = 'Y' and
mycolumns.data_type = v_destination_data_type and
mycolumns.data_type not in ('BLOB','CLOB','LONG','RAW','DATE')) then
-- NOTE: you should modify the list of data types excluded in the above
-- statement based on your requirements.
-- we can only compare columns that appear in both the source and
-- destination databases, refer to the database dictionary difference
-- report to view a list of database structure differences.
if (v_columns is null) then
v_columns := mycolumns.column_name;
else
v_columns:=rtrim(v_columns)||','||mycolumns.column_name;
end if;
end if;
end loop;
-- output script
v_query :=
'-- '||mytables.table_name||chr(10)||
chr(10)||
'select decode(count(added_to_dest) - count(deleted_from_source),1,''add'',-1,''delete'') change,'||chr(10)||
v_columns||' '||chr(10)||
'from '||chr(10)||
'('||chr(10)||
'select '||v_columns||', 1 added_to_dest, to_number(null) deleted_from_source '||chr(10)||
'from '||mytables.table_name||' destination '||chr(10)||
'union all ' ||chr(10)||
'select '||v_columns||', to_number(null) added_to_dest, 2 deleted_from_source '||chr(10)||
'from '||mytables.table_name||'@SOURCEDB source '||chr(10)||
')'||chr(10)||
'group by '||v_columns||' '||chr(10)||
'having count(added_to_dest) <> count(deleted_from_source) '||chr(10)||
'order by '||v_order_by||',change;';
insert into temp_compare_database_data(table_name,query) values (mytables.table_name,v_query);
end if;
end loop;
end;
/
--
-- end of script
--
--
-- spool generated script from database
--
spool c:\temp_compare_database_data.sql
select query from temp_compare_database_data order by table_name;
spool off
drop table temp_compare_database_data;
-- then run the script created (temp_compare_database_data.sql) to
-- produce the actual comparisons
set linesize 32000
set heading on
set feedback on
set echo on
spool c:\temp_compare_database_data.results
@c:\temp_compare_database_data.sql
spool off;
-- NOTE: there will be an error at the start and end of the .results
-- file due to the select statement and spool off statement.
-- ***********************
drop database link SOURCEDB;
Wonder how this differ from..
A reader, September 23, 2004 - 1:41 pm UTC
select * from (
select a.*, 'A'
from a, b
where a.c1 = b.c1 and (a.c2 <> b.c2 or a.c3 <> b.c3)
union all
select b.*, 'B'
from a, b
where a.c1 = b.c1 and (a.c2 <> b.c2 or a.c3 <> b.c3)
)
order by 1
OR
select 'A', a.*, 'B', b.*
from a, b
where a.c1 = b.c1
and (a.c2 <> b.c2 or a.c3 <> b.c3)
.. besides having different plans, perhaps ?
September 24, 2004 - 9:20 am UTC
as noted by someone below -- beware the null
but again, I forsee many full table scans in your future... Also what about the key "c1" that is in A but not B -- or vice versa. You'd need OUTER JOINS.
Beware the nulls
Mike, September 23, 2004 - 4:04 pm UTC
If you are using <> to look for mismatches, you will not detect differences if one of the values is NULL.
I have seen this extended as
NVL(a.c2,'NULL') <> NVL(b.c2,'NULL')
and such, but then you need to worry about whether a column might actually contain the string 'NULL', etc.
September 24, 2004 - 9:38 am UTC
the most performant method is:
(a <> b or (a is null and b is not null) or (a is not null and b is null))
short circut evaluation, no function call overhead. And it is always right :)
decode for inequalities
Alberto Dell'Era, September 24, 2004 - 10:43 am UTC
> (a <> b or (a is null and b is not null) or (a is not null and b is null))
in the past, i remember you suggested something like
decode (a,b,0,1) = 1
[or variations like "decode (a,b,0) is not null" ]
which is a tad easier on the keyboard - have you found any quirks lately about decode that makes you prefer the SQL fragment ?
TIA
Alberto
September 24, 2004 - 12:02 pm UTC
I said "yes it works" -- cannot remember if I benchmarked it or not. so, lets do it.
select count(*) from t
where object_id <> data_object_id
or (object_id is null and data_object_id is not null)
or (object_id is not null and data_object_id is null)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.12 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.12 0.12 0 1158 0 3
********************************************************************************
select count(*) from t
where decode( object_id, data_object_id, 1 ) is null
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.01 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.13 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.14 0.12 0 1158 0 3
********************************************************************************
select count(*) from t
where decode( object_id, data_object_id, 1, 0 ) <> 1
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 6 0.13 0.12 0 1158 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.13 0.13 0 1158 0 3
They appear more or less "equivalent" -- decode is less typing, the other is more "meaningful"
so, if you use decode, make up the reduced keystrokes with a comment :)
(and this only applies in sql, as you cannot call decode() from plsql without using sql and you would not run sql just to run decode, you would use the a<>b or (... ) or (...) )
Awesome
A Happy SQL Guy, November 07, 2004 - 4:12 pm UTC
Thank you so much for this solution!!!
It appears to be working beautifully.
compare table
A reader, November 03, 2011 - 4:39 pm UTC