Skip to Main Content
  • Questions
  • comparing two table data and logging the changes

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, sachin.

Asked: December 06, 2001 - 12:21 pm UTC

Last updated: June 18, 2013 - 2:42 pm UTC

Version: 8.1.6

Viewed 10K+ times! This question is

You Asked


in my application, i have two tables similar in structure say trans_detail and trans_detail_tmp.
now for a trans_id i need to compare the each and every field value in the both tables and
write to another table if any changes were found.

i have written a procedure to the above. i declare the following cursors for those tables
like

cursor c_trans is select * from trans_detail where trans_id = p_trans_id;
cursor c_trans_tmp is select * from trans_detail_tmp where trans_id = p_trans_id;

then i compare each and every field and write to another table if they are not same like below

open c_trans;
open c_trans_tmp;
fetch c_trans into r1;
fetch c_trans_tmp into r2;

if r1.trans_amount != r2.trans_amount then
-- insert the change to the other table.
end if;

so, this i have to do for all the fields.
is it possible to do the same with a single query? or any better you suggest..
any suggestions in this are appreciated.

thanks

and Tom said...

This single statement:

insert into other_table
( select 'in trans_detail, not tmp', trans_detail.*
from trans_detail where trans_id = p_trans_id
MINUS
select 'in trans_detail, not tmp', trans_detail_tmp.*
from trans_detail_tmp where trans_id = p_trans_id
)
UNION ALL
( select 'in tmp, not trans_detail', trans_detail_tmp.*
from trans_detail_tmp where trans_id = p_trans_id
MINUS
select 'in tmp, not trans_detail', trans_detail.*
from trans_detail where trans_id = p_trans_id
)
will do it all for you (no code!!!) -- and it will handle NULLS properly (else you must compare if ( a != b or ( a is null and b is not null ) or ( a is not null and b is null ) ).

I stuck the constants "in tmp, not trans_detail" and so on in there so you could tell which rows came from which table if you needed that.



Rating

  (60 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

i need change in each column to written as a seperate row

sachin, December 06, 2001 - 3:15 pm UTC

thanks for the answer.
but i need to log the changes on each column as a different row into that other table.

for ex:

select '1' a,'2' b,'3' c from dual
minus
select '1' a,'4' b,'6' c from dual

i get the following result

A B C
- - -
1 2 3

i need that other table to contain the following rows.
column_name old_val new_val
b 2 4
c 3 6

Tom Kyte
December 07, 2001 - 8:56 am UTC

You would need a full outer join in order to do that (since the row could be in T1 and not in T2 or in T2 and not in T1).

Is trans_id the primary key?

yes trans_id is a primary key

sachin, December 07, 2001 - 9:52 am UTC

can you give that sample query with that outer join.

thanks

Tom Kyte
December 07, 2001 - 1:43 pm UTC

Say you have the tables setup like this:

tkyte@TKYTE9I.US.ORACLE.COM> create table trans (trans_id int primary key,
2 c1 date,
3 c2 number,
4 c3 varchar2(10)
5 )
6 /

Table created.

Elapsed: 00:00:00.00
tkyte@TKYTE9I.US.ORACLE.COM>
tkyte@TKYTE9I.US.ORACLE.COM> create table trans_tmp (trans_id int primary key,
2 c1 date,
3 c2 number,
4 c3 varchar2(10)
5 )
6 /

Table created.

Elapsed: 00:00:00.03
tkyte@TKYTE9I.US.ORACLE.COM>
tkyte@TKYTE9I.US.ORACLE.COM> create table changes
2 ( trans_id int, cname varchar2(10), val1 varchar2(4000), val2 varchar2(4000) );

Table created.


This would capture the changes:

tkyte@TKYTE9I.US.ORACLE.COM> select *
2 from (
3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
5 decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
6 2, to_char(t1.c2),
7 3, t1.c3) v1,
8 decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
9 2, to_char(t2.c2),
10 3, t2.c3) v2
11 from trans t1,
12 trans_tmp t2,
13 (select rownum r from all_objects where rownum <= 3 )
14 where t1.trans_id = :bv
15 and t1.trans_id = t2.trans_id (+)
16 )
17 where v1 <> v2
18 or (v1 is null and v2 is not null)
19 or (v1 is not null and v2 is null)
20 /

Where 3 in this query represents the number of columns we need to compare. The decodes would convert everything into character strings for comparision...


outer join query

sachin, December 07, 2001 - 11:45 am UTC

tom,

can you post the outer join query that you talked about in your answer please.

trans_id is a primary key.

outer join on the other table

sachin, December 07, 2001 - 2:48 pm UTC

tom,

i think your following query does the outer join only on table not on both. am i right?
what about the rows that are in t2(trans_tmp), but not in t1(trans)


Tom Kyte
December 07, 2001 - 3:18 pm UTC

Ok, try this one on for size then:

select *
from (
select t1.trans_id,
decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
2, to_char(t1.c2),
3, t1.c3) v1,
decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
2, to_char(t2.c2),
3, t2.c3) v2
from
(select rownum rr, trans.* from trans where trans_id = :bv ) t1,
(select rownum rr, trans_tmp.* from trans_tmp where trans_id = :bv) t2,
(select rownum r from all_objects where rownum <= 3 ) t3,
(select rownum rr from dual) t4
where t1.rr(+) = t4.rr
and t2.rr(+) = t4.rr
)
where v1 <> v2
or (v1 is null and v2 is not null)
or (v1 is not null and v2 is null)
/




What about when you just have raw data?

Duncan, January 08, 2004 - 7:04 am UTC

Tom

I have a procedure which has many input parameters representing elements of table data. Once these parameters are read into the procedure I validate each element. Once the data line has been validated I then need to check that the record doesn't already exist in my table.

I do this by checking a few fields of demographic data, eg name, address. If the record doesn't exist then i insert it.

However if the record does exits i need to do an update of all the other fields of the row with the new data fields read into the procedure. I also need to keep track of which fields have actually changed from their original table data. Obviously I can do this the long way around by checking individual fields and comparing their table data to the new data, but I wanted to know if you have a quicker more efficient method since my table is very wide, and checking individual elements is tedious, especially because I have to account for NULL values.

I was wondering about loading the data line into maybe a RECORD type and comparing to the table data, but I understand that you cannot directly compare RECORD types.
Would another type work, eg ORACLE OBJECT? If so how could i get it to log which fields have been changed?

There must be a quick less painful way to do this as I can't believe it is that uncommon?

Many thanks for your help
(db version 8.1.7.4)

Tom Kyte
January 08, 2004 - 1:48 pm UTC

to check for field by field you have one and only one option.

to check field by field.


(it is rather "uncommon", I've not run into it as a requirement very often).

the "easiest" way I can think to do this is:


ops$tkyte@ORA920PC> create table t ( a int, b date, c varchar2(25) );
Table created.

<b>that represents your table.  A is your "key", b, c the data you update</b>
 
ops$tkyte@ORA920PC> insert into t values ( 1, trunc(sysdate), 'hello world' );
ops$tkyte@ORA920PC> insert into t values ( 2, null, 'hello world' );
ops$tkyte@ORA920PC> insert into t values ( 3, trunc(sysdate), null );

ops$tkyte@ORA920PC> create or replace view v
  2  as
  3  select b, 0 b_ind, c, 0 c_ind
  4    from t
  5   where 1=0;
 
View created.

<b>I'm using this view as a template.  Basically, we'll have column/indicator, column/indicator in this view.

The view is used to pick up the datatypes from the data dictionary easily and make it so that change to the base table -- changes the view straight away (and so we don't have to define the record in plsql)...

Instead of a view, this could be a cursor in the package

Here is your routine.  note how we use DECODE to do the compares (null safe)</b>

 
ops$tkyte@ORA920PC> create or replace procedure do_iu( p_a in int, p_b in date, p_c in varchar2 )
  2  as
  3      l_rec v%rowtype;
  4  begin
  5      select b, decode( b, p_b, 0, 1 ),
  6             c, decode( c, p_c, 0, 1 )
  7        into l_rec
  8        from t
  9       where a = p_a;
 10
 11      if ( l_rec.b_ind = 1 )
 12      then
 13          dbms_output.put_line( 'B will be changed from "' || l_rec.b || '" to ' || p_b );
 14      end if;
 15      if ( l_rec.c_ind = 1 )
 16      then
 17          dbms_output.put_line( 'C will be changed from "' || l_rec.c || '" to ' || p_c );
 18      end if;
 19
 20      update t
 21         set b = p_b, c = p_c
 22       where a = p_a;
 23      dbms_output.put_line( 'Updated ' || sql%rowcount );
 24  exception
 25      when no_data_found then
 26          dbms_output.put_line( 'Inserted' );
 27          insert into t (a,b,c) values ( p_a, p_b, p_c );
 28  end;
 29  /
 
Procedure created.
 
<b>and then to test...</b>

ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 1, sysdate-1, 'hello world' );
B will be changed from "08-JAN-04" to 07-JAN-04
Updated 1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> exec do_iu( 1, trunc(sysdate), 'goodbye' );
B will be changed from "07-JAN-04" to 08-JAN-04
C will be changed from "hello world" to goodbye
Updated 1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> exec do_iu( 1, sysdate, 'goodbye' );
B will be changed from "08-JAN-04" to 08-JAN-04
Updated 1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 2, sysdate, 'hello world' );
B will be changed from "" to 08-JAN-04
Updated 1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> exec do_iu( 3, trunc(sysdate), 'hello world' );
C will be changed from "" to hello world
Updated 1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 4, sysdate, 'hello world' );
Inserted
 
PL/SQL procedure successfully completed.
 

Read the tables only once!

Marco Stefanetti, February 03, 2004 - 2:03 pm UTC

I have found this simple solution, the benefit is that i read the two tables only once:

1 select field,
2 value,
3 sum(source) source
4 from (
5 (
6 select 'a' field,1 source, 20 value from dual
7 union
8 select 'b' field,1 source, 15 value from dual
9 union
10 select 'c' field,1 source, 30 value from dual
11 )
12 union all
13 (
14 select 'b' field,2 source, 15 value from dual
15 union
16 select 'c' field,2 source, 24 value from dual
17 union
18 select 'd' field,2 source, 13 value from dual
19 )
20 )
21 group by field, value
22* having sum(source) < 3

F VALUE SOURCE
- ---------- ----------
a 20 1
c 24 2
c 30 1
d 13 2

bye


Tom Kyte
February 03, 2004 - 2:13 pm UTC

huh?

you do realize that you read dual 6 times and not only that but you did a sort/distinct on it....


And you'd read your TABLE 2 times for each column in it in general?

how'd you get "that i read the two tables only once:"?




(and not only that, but it gets "the wrong answer" to boot, what happens with two tables:

T1

1 2
2 1


T2

1 1
2 2



Your technique would say "they are the same", but they are not.


Not sure if you meant what you typed in as the example query -- why don't you work with real tables?

you are too fast on respond

Marco Stefanetti, February 03, 2004 - 2:31 pm UTC

try this :

create table a (field char,source number,value number);
insert into a values('a',1,10);
insert into a values('b',1,10);
insert into a values('c',1,10);
insert into a values('1',1, 2);
insert into a values('2',1, 1);

select * from a;

create table b (field char,source number,value number);
insert into b values('b',2,10);
insert into b values('c',2,12);
insert into b values('d',2,10);
insert into b values('1',2, 1);
insert into b values('2',2, 2);

select * from b;

select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3;

---------------------------------------

results:

select * from a;

F SOURCE VALUE
- ---------- ----------
a 1 10
b 1 10
c 1 10
1 1 2
2 1 1

select * from b;

F SOURCE VALUE
- ---------- ----------
b 2 10
c 2 12
d 2 10
1 2 1
2 2 2

select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3;

F VALUE SOURCE
- ---------- ----------
1 1 2
1 2 1
2 1 1
2 2 2
a 10 1
c 10 1
c 12 2
d 10 2

Record "b 10" doesn't compare!!!!!

I read table a once
I read table b once


see you

Tom Kyte
February 03, 2004 - 6:15 pm UTC

that is not at all what you posted -- not at all -- see what you posted, it would appear in your above example that you were PIVOTING fields named "a", "b", and "c"

that is why I said "do a real example for your does not make sense as presented"

what was the purpose of the union's of duals -- they only confused the issue. I thought you were PIVOTING a table with 3 columns A, B and C -- that would not work.

Now I see your general idea and say "looks good". I would however do it like this:



select id, count(src1), count(src2)
from ( select a.*, 1 src1, to_number(null) src2 from big_table a
union all
select b.*, to_number(null) src1, 2 src2 from big_table b
)
group by ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP,
STATUS, TEMPORARY, GENERATED, SECONDARY
having count(src1) <> count(src2)


That does effectively the same thing -- BUT works even if table A has a row three times and table B does not have it at all (you would get sum(source) = 3 in that case)

This'll show how many rows are are in each table when they are not equal, will tolerate duplicate rows and show the mismatches.

Thanks for the interesting new twist -- it does do less work in most cases -- the group by could be killer on really wide rows.



ps

Marco Stefanetti, February 03, 2004 - 2:44 pm UTC

select field,
sum( decode(source,1,value,null) ) table_a,
sum( decode(source,2,value,null) ) table_b
from
( select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3
)
group by field;

F TABLE_A TABLE_B
- ---------- ----------
1 2 1
2 1 2
a 10
c 10 12
d 10

bye



just another example

Marco Stefanetti, February 03, 2004 - 4:19 pm UTC

You can compare 2 fields of 3 tables


------------------------------------------------------------------------------------------------------------

create table a ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into a values('A',10,'aaa');
insert into a values('B',20,'bbb');
insert into a values('C',30,'ccc');
insert into a values('D',40,'ddd');
insert into a values('E',50,'eee');

create table b ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into b values('B',20,'bbb');
insert into b values('C',30,'ccc');
insert into b values('D',41,'ddd');
insert into b values('E',50,'eee');

create table c ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into c values('C',30,'ccc');
insert into c values('D',40,'ddd');
insert into c values('E',50,'eeX');

------------------------------------------------------------------------------------------------------------

select key,
max( decode( source_code_tot, 1, value1, 3, value1, 5, value1, 7, value1, null ) ) a_v1,
max( decode( source_code_tot, 1, value2, 3, value2, 5, value2, 7, value2, null ) ) a_v2,
max( decode( source_code_tot, 2, value1, 3, value1, 6, value1, 7, value1, null ) ) b_v1,
max( decode( source_code_tot, 2, value2, 3, value2, 6, value2, 7, value2, null ) ) b_v2,
max( decode( source_code_tot, 4, value1, 5, value1, 6, value1, 7, value1, null ) ) c_v1,
max( decode( source_code_tot, 4, value2, 5, value2, 6, value2, 7, value2, null ) ) c_v2,
decode( source_code_tot, 7, 'OK', 'DIFF') descr
from
(
select key,
value1,
value2,
sum(source_code) source_code_tot
from (
( select a.*, 1 source_code from a)
union all
( select b.*, 2 source_code from b)
union all
( select c.*, 4 source_code from c)
)
group by key, value1, value2
)
group by key, decode( source_code_tot, 7, 'OK', 'DIFF')
order by key;

KEY A_V1 A_V2 B_V1 B_V2 C_V1 C_V2 DESCR
---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------------------------
A 10 aaa DIFF
B 20 bbb 20 bbb DIFF
C 30 ccc 30 ccc 30 ccc OK
D 40 ddd 41 ddd 40 ddd DIFF
E 50 eee 50 eee 50 eeX DIFF

------------------------------------------------------------------------------------------------------------

Is it strange to sum the labels 1,2,4?
not so strange, table d would have source_code=8

Useful?
yes, if one of the tables is a complex view on a remote db, and you can read it only once

fast?
I compare 10000 rows of a remote table in 2 seconds.

let me know ...

Multiple records

Marco Stefanetti, February 04, 2004 - 12:35 pm UTC

YOU:
that is not at all what you posted -- not at all -- see what you posted, it
would appear in your above example that you were PIVOTING fields named "a", "b",
and "c"

ME:
Sorry, the union select in the first of my posts was simply a table example,
but it was confusing and, sure, it is better with real tables

YOU:
select id, count(src1), count(src2)
from ( select a.*, 1 src1, to_number(null) src2 from big_table a
union all
select b.*, to_number(null) src1, 2 src2 from big_table b
)
group by ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP,
STATUS, TEMPORARY, GENERATED, SECONDARY
having count(src1) <> count(src2)


That does effectively the same thing -- BUT works even if table A has a row
three times and table B does not have it at all (you would get sum(source) = 3
in that case)

ME:
I'm not sure, you are not considering the content of the fields, only keys.

YOU:
the group by could be killer on really wide rows.

ME:
Yes, but I think that something "killer" would be necessary in any case

YOU: BUT works even if table A has a row three times

ME: I have got another idea, hope i'm right
we can use the product of primes instead of the sum of binary numbers

PROBLEM:

find differences and missing records between similar tables

TABLES:

create table a ( key varchar2(10), value1 varchar2(10) );
insert into a values('A','aaa');
insert into a values('A','aaa');
insert into a values('B','bbb');
insert into a values('B','bbb');
insert into a values('C','ccc');
insert into a values('D','ddd');
insert into a values('E','eee');

create table b ( key varchar2(10), value1 varchar2(10) );
insert into b values('B','bbb');
insert into b values('C','ccc');
insert into b values('C','ccc');
insert into b values('D','ddd');
insert into b values('E','eee');

create table c ( key varchar2(10), value1 varchar2(10) );
insert into c values('C','ccc');
insert into c values('C','ccc');
insert into c values('D','ddd');
insert into c values('E','eeX');

SELECT:

select
key,
decode( mod(source_code_tot,2), 0, value1, null) a,
decode( mod(source_code_tot,3), 0, value1, null) b,
decode( mod(source_code_tot,5), 0, value1, null) c,
source_code_tot p,
mod(source_code_tot,30) r,
decode( mod(source_code_tot,30), 0, 'OK', 'DIFF') res
from
(
select key,
value1,
trunc( exp( sum( ln( source_code) ) ) ) source_code_tot
from (
( select a.*, 2 source_code from a)
union all
( select b.*, 3 source_code from b)
union all
( select c.*, 5 source_code from c)
)
group by key, value1
)
;

RESULTS:

KEY A B C P R RES
----- ----- ----- ----- ----- ----- -----
A aaa 4 4 DIFF
B bbb bbb 12 12 DIFF
C ccc ccc ccc 450 0 OK
D ddd ddd ddd 30 0 OK
E eeX 5 5 DIFF
E eee eee 6 6 DIFF


It seems to work, but may be inaccurate due to exp(sum(ln(x))) because of approximations

STEPS:

1) union all : we consider all records af our tables labelling them whit a prime number
2) group by key, value1 : we consider as different, records with different values ( difference between records),
at the same time we calculate the product of the labels (exp sum ln), this is a single number that memorizes the
provenience of the records in the group
3) decode(mod,30) : we can have a total multiple of 30 (2*3*5) if, and only if, the record was present in the tree sources.

I'm sorry for the long posts, but it seems to me quite interesting.



Tom Kyte
February 04, 2004 - 5:42 pm UTC

watch out - that exp(sum(ln(x))) would be a killer cpu wise!!

I like your approach with the minor modification I have (which permits duplicates -- I'm not assuming "keys" of any sort here at all).

The group by approach (given that you don't get sort key too long) is elegant and efficient.

Sorry

Marco Stefanetti, February 04, 2004 - 1:29 pm UTC

Sorry, you are rigth
considering the above tables it's enought:

select key,
decode( ta, 0, null, value1) a,
decode( tb, 0, null, value1) b,
decode( tc, 0, null, value1) c,
decode( ta, 0, 'DIFF',
decode( tb, 0, 'DIFF',
decode( tc, 0, 'DIFF', 'OK' ) ) ) res
from
(
select key,
value1,
sum(ta) ta,
sum(tb) tb,
sum(tc) tc
from (
( select a.*, 1 ta, 0 tb, 0 tc from a)
union all
( select b.*, 0 ta, 1 tb, 0 tc from b)
union all
( select c.*, 0 ta, 0 tb, 1 tc from c)
)
group by key, value1
)

RESULTS:

KEY A B C RES
---------- ---------- ---------- ---------- ---------------
A aaa DIFF
B bbb bbb DIFF
C ccc ccc ccc OK
D ddd ddd ddd OK
E eeX DIFF
E eee eee DIFF

OK, stop here, I think we have found a good solution.

Comparing two data between two tables with simular data types

Bonnie, November 22, 2004 - 7:51 pm UTC

Hi Tom,

I was also having problems compare data from two simular table, and tried your solution using the decode statement. However, since I'm comparing 202 columns between the two tables, I'm getting an error of 'ORA-00939: too many arguments for function' Any ideas on how I can resolve this?

Your help is much appreciated.

Tom Kyte
November 22, 2004 - 8:11 pm UTC

decode stops at 255.

so, you'd have to do at least 2 decodes

3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) || decode( r, 4, c4, 5, c5 ) cname,


stopping the first decode with 200 or so inputs and starting the next.

Comparing two data between two tables with simular data types

Bonnie, December 01, 2004 - 1:03 pm UTC

Thank you so much for that. It took a while to sort through all the different variables on my two tables as they were not always consistant. Anyhow, I was wondering if you know how I can resolve this dynamic sql:

--this first one returns the out put of

s_PlSqlBlock :=
'UPDATE TRANS
SET '||C1||' = to_date('||c_cursor.V2||',''dd-mon-yyyy hh24:mi:ss'')'||
' WHERE trans_id = '||c_cursor.trans_id;

the out put from a select statement is:

UPDATE TRANS
SET C1= to_date(27-apr-2004 10:45:00,'dd-mon-yyyy hh24:mi:ss')
WHERE trans_id= 3;

--notice there's no quotes around the '27-apr-2004 10:45:00' which caused the update statement to fail, with 'ORA-00904: invalid column name' error.
The following also doesn't work.

s_PlSqlBlock :=
'UPDATE TRANS
SET '||C1||' = to_date('||'''c_cursor.V2'''||',''dd-mon-yyyy hh24:mi:ss'')'||
' WHERE trans_id = '||c_cursor.trans_id;

Any ideas how I can make this work?

Thank you so much!!





Tom Kyte
December 01, 2004 - 1:49 pm UTC

USE BIND VARIABLES!!!!!!!

don't put literals in SQL like that.

comparing two tables

biju, December 08, 2004 - 11:50 am UTC

i Have a two tables(tb1,tb2) with same fields(id(pk),name,age,age_ret)

i want a query which will give me a table(tmp) containing (id,name,age in tb1,age in tb2,age_ret in tb1,age_ret in tb2)
if there is change in age,age_ret in table tb2 compared to tb1

can u help me out

Tom Kyte
December 09, 2004 - 12:38 pm UTC

did you read the above examples? you are 99.9% of the way there.

not too much left to do on your part.

Table/Column Compare

Ivan, January 28, 2005 - 5:49 pm UTC

Regarding Marco's February 04, 2004 result

1) I should think that the 3 table's set of relevant column(s) could be considered equal
only if have an equal count of (any) duplicate values, along with having equal values.

In his example - with a key of 'C' - Table A has only 1 row, other Tables have 2 rows
It could be "DIFF", not "OK"?


2) The DECODE can be replaced with a CASE to eliminate DECODE's above stated limitation.


So another query could be:

select key,
decode( ta, 0, null, value1) a,
decode( tb, 0, null, value1) b,
decode( tc, 0, null, value1) c,
--decode( ta, 0, 'DIFF',
-- decode( tb, 0, 'DIFF',
-- decode( tc, 0, 'DIFF', 'OK' ) ) ) res
(CASE WHEN ta = tb AND ta = tc THEN 'OK' ELSE 'DIFF' END) res
from
(
select key,
value1,
sum(ta) ta,
sum(tb) tb,
sum(tc) tc
from (
( select a.*, 1 ta, 0 tb, 0 tc from a)
union all
( select b.*, 0 ta, 1 tb, 0 tc from b)
union all
( select c.*, 0 ta, 0 tb, 1 tc from c)
)
group by key, value1
);

Hit ENTER to Continue

KEY A B C RES
---------- ---------- ---------- ---------- ----
A aaa DIFF
B bbb bbb DIFF
C ccc ccc ccc DIFF
D ddd ddd ddd OK
E eeX DIFF
E eee eee DIFF
F DIFF


A reader, February 11, 2005 - 3:23 am UTC

Hi Tom,

We want to compare the data contents of table A and B. We know that each record is unique in table A and B, and the number of records in table A and B is the same. If A minus B and B minus B returns no rows, then can we conclude that the data in A and B are the same?

Thanks

Tom Kyte
February 11, 2005 - 8:03 pm UTC

a - b
b - a

yes, but:

</code> https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html <code>

will show a way to do it without full scanning both tables twice.

A reader, February 11, 2005 - 8:13 pm UTC

Tom,

Is there some package in Streams to compare tables like the one in replication.

Thanks.

Tom Kyte
February 12, 2005 - 12:21 pm UTC

</code> https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html <code>

it just takes a query to compare two tables. the rectifer diff stuff is somewhat unique to advanced replication.

Your link is very useful

A reader, February 12, 2005 - 4:22 am UTC

Hi Tom,

Thanks for the link </code> https://asktom.oracle.com/Misc/oramag/on-injecting-and-comparing.html <code>Don't know the previous comments of this question provides the better solution.

Thanks

Mihail Bratu, March 30, 2005 - 10:02 am UTC


Diff's columns for big tables

GP, April 28, 2005 - 9:05 am UTC

Hello Tom,

I have a similar requirement
Table A (180 columns), Table B(180 columns)
Both the tables having approximately 100,000 rows
Need to find all the differences and log them
I am querying it like below

tableA minus tableB
union all
tableB minus tableA

I get certain rows but it's hard to determine which columns have changed

Hence, Referring to your followup earlier on this Posting.

tkyte@TKYTE9I.US.ORACLE.COM> select *
2 from (
3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
5 decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
6 2, to_char(t1.c2),
7 3, t1.c3) v1,
8 decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
9 2, to_char(t2.c2),
10 3, t2.c3) v2
11 from trans t1,
12 trans_tmp t2,
13 (select rownum r from all_objects where rownum <= 3 )
14 where t1.trans_id = :bv
15 and t1.trans_id = t2.trans_id (+)
16 )
17 where v1 <> v2
18 or (v1 is null and v2 is not null)
19 or (v1 is not null and v2 is null)
20 /


Following is the query I am currenlty using for diffs
(lk is the primary key)

(select 180 columns from A@itenv where lk in (select lk from lklnr@itenv)
minus
select 180 columns from A@cbenv where lk in (select lk from lklnr@cbenv))
union all
(select 180 columns from A@cbenv where lk in (select lk from lklnr@cbenv)
minus
select 180 columns from A@itenv where lk in (select lk from lklnr@itenv))

How can I rewrite the above query for my use
to get the differences and columns with both values.

Thankyou.

Tom Kyte
April 28, 2005 - 9:34 am UTC

building on the comparing two tables article here:
https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

which demonstrates what I've found to be fastest approach (see second article on that page)...

You sort of need to have some idea of the "primary key" here (in order to compare what is different -- we need to be able to "join" the two tables row by row right)...

So, you use the query there:

SQL> select c1, c2, c3, 
  2         count(src1) CNT1, 
  3          count(src2) CNT2
  4     from 
  5   ( select a.*, 
  6          1 src1, 
  7          to_number(null) src2 
  8       from a
  9      union all
  10    select b.*, 
  11         to_number(null) src1, 
  12         2 src2 
  13      from b
  14   )
  15  group by c1,c2,c3
  16 having count(src1) <> count(src2)
  17 /
 
  C1   C2    C3   CNT1    CNT2
 ---   --    --   ----    ----
   2    x     y      0       1
   2   xx     y      1       0
   3    x     y      1       0
   3    x    yy      0       1
 
<b>and order by "primary key" (say it was C1 in this example)</b>

then the matching rows appear one under the other.

or you can further use analytics and

count(distinct nvl(c2,'something c2 cannot be')) over (partition by c1) cnt_c2,

and look at the columns where cnt_NN > 1

 

Lk is the primary key in table A and table B

A reader, April 28, 2005 - 12:24 pm UTC


Tom Kyte
April 28, 2005 - 2:03 pm UTC

great -- use it where I used c1..

I like analytics

Vinayak, April 28, 2005 - 2:07 pm UTC

u mean something like this?:

SQL> select * from t1 ;

        ID COL1       COL2
---------- ---------- ----------
         1 x          XXX
         1 x          X
         1 y          YY

SQL> select * from t2 ;

        ID COL1       COL2
---------- ---------- ----------
         1 x          XXX
         1 y          Y

SQL> select * from
  2  (select tname,id,col1,col2,count(*) over (partition by id,col1,col2) cnt
  3  from (select 'T1' tname,t1.* from t1 union all select 'T2' tname,t2.* from t2)
  4  ) where cnt=1 ;

TNAME                                    ID COL1       COL2              CNT
-------------------------------- ---------- ---------- ---------- ----------
T1                                        1 x          X                   1
T2                                        1 y          Y                   1
T1                                        1 y          YY                  1
 

Tom Kyte
April 28, 2005 - 2:10 pm UTC

yes

Tom, I need further help

GP, April 28, 2005 - 2:26 pm UTC

I went through the article and was wonderful.
I tried rewriting my query
but I am missing please guide me further

LK is the Primary key in both the tables

select c1, c2, c3, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
where a.lk = b.lk
group by c1,c2,c3
having count(src1) <> count(src2)

I need the lk, columns with both the values that are different.

Please Guide.


Tom Kyte
April 28, 2005 - 2:34 pm UTC

change c1, c2, c3.... to YOUR COLUMNS

there is not any join.

select YOUR_COLUMNS, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
group by YOUR_COLUMNS
having count(src1) <> count(src2)

I did the column select

GP, April 28, 2005 - 3:32 pm UTC

I am sorry, I guess I did not put my question clear.

Yeap I am putting all my columns in select and group by

but that would give me all the columns in my resultset

but I want to have lk, just the column that changed, value 1,val 2.
(val1 and val2 are value of columns that are different in both rows)

Please help me get through further

Thankyou very much.

Tom Kyte
April 28, 2005 - 4:22 pm UTC

there is where after you get this:

select YOUR_COLUMNS, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
group by YOUR_COLUMNS
having count(src1) <> count(src2)

(call that Q) you'll:



select q.*, count(distinct c1) over (partition by lk) cnt_c1, ....
from (q)


those counts can be used as "flags" to say "look at me"



Last followup on this table comparison thread is confusing to me

Rick, May 09, 2005 - 7:37 pm UTC

This has been an awesome and very useful thread. However,is it possible for you to provide a more complete example on how to return only the primary key column and column(s) whose values have changed please?

Did you mean that the results of the compare rows script be inserted into a table called q and then select from that table to get the counts? You say that the counts can be used as "flags" to say "look at me" Would you please expound on that? Please excuse my obtuseness

Tom Kyte
May 09, 2005 - 8:02 pm UTC

you have to look column by column -- no magic there, the flags where just a way to say "look at me", nothing more.

Maybe this script will help! ;)

LOU, July 14, 2005 - 5:28 pm UTC

undefine TABLE1
undefine TABLE2
define g_table1 = '&&TABLE1'
define g_table2 = '&&TABLE2'
set verify off
set feedback off
set serveroutput on size 1000000
spo temp_file.sql
declare
v_owntab1 varchar2(255) := '&&g_table1';
v_owntab2 varchar2(255) := '&&g_table2';
v_own1 varchar2(255);
v_own2 varchar2(255);
v_tab1 varchar2(255);
v_tab2 varchar2(255);
v_dot1 number := 0;
v_dot2 number := 0;
type t_cols is table of varchar2(255) index by binary_integer; v_cols1
t_cols; v_cols2 t_cols; v_out1 varchar2(255); v_out2 varchar2(255); kq
CONSTANT varchar2(1) := ''''; v_ind number := 0; v_str
varchar2(2000):=null; v_ind_found boolean := FALSE; v_ind_colno number
:= 0;
procedure print_cols (p_cols in t_cols) is
begin
for i in 1..p_cols.count
loop
dbms_output.put_line(','||p_cols(i));
end loop;
end print_cols;

begin
v_dot1 := instr(v_owntab1, '.');
v_dot2 := instr(v_owntab2, '.');

if v_dot1 > 0 then
v_own1 := upper(substr(v_owntab1, 1, v_dot1-1));
v_tab1 := upper(substr(v_owntab1, v_dot1+1));
else
v_own1 := null;
v_tab1 := upper(v_owntab1);
end if;

if v_dot2 > 0 then
v_own2 := upper(substr(v_owntab2, 1, v_dot2-1));
v_tab2 := upper(substr(v_owntab2, v_dot2+1));
else
v_own2 := null;
v_tab2 := upper(v_owntab2);
end if;
select column_name
bulk collect into v_cols1
from all_tab_columns
where table_name = v_tab1
and owner = nvl(v_own1, user)
order by column_id;

select column_name
bulk collect into v_cols2
from all_tab_columns
where table_name = v_tab2
and owner = nvl(v_own2, user)
order by column_id;

if v_cols1.count = 0 or v_cols2.count = 0 then
dbms_output.put_line('Either or Both the tables are invalid');
return;
end if;

dbms_output.put_line('(');
dbms_output.put_line('select '||kq||'TAB1'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own1, user)||'.'||v_tab1);
dbms_output.put_line('MINUS');
dbms_output.put_line('select '||kq||'TAB1'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own2, user)||'.'||v_tab2);
dbms_output.put_line(')');

dbms_output.put_line('UNION');

dbms_output.put_line('(');
dbms_output.put_line('select '||kq||'TAB2'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own2, user)||'.'||v_tab2);
dbms_output.put_line('MINUS');
dbms_output.put_line('select '||kq||'TAB2'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own1, user)||'.'||v_tab1);
dbms_output.put_line(')');

dbms_output.put_line('order by ');
for c1 in (
select b.column_name
from all_indexes a, all_ind_columns b
where a.owner=b.index_owner
and a.index_name=b.index_name
and a.uniqueness = 'UNIQUE'
and a.table_owner = nvl(v_own1, user)
and a.table_name = v_tab1
order by b.index_name, b.column_position
)
loop
v_ind_found := TRUE;
v_ind_colno := v_ind_colno + 1;
if v_ind_colno = 1 then
dbms_output.put_line(c1.column_name);
else
dbms_output.put_line(','||c1.column_name);
end if;
end loop;
if not v_ind_found then
dbms_output.put_line('2 ');
end if;
dbms_output.put_line(';');


end;
/
spo off
set feedback on

Humaira Liaqat, August 15, 2005 - 4:32 am UTC

Hello Tom,

I want to write a procedure which would insert,update and delete records from
10 tables using join query.
The structure of these tables is: they have 3 columns namely OLD,NEW AND VALUE.

The query is like
SELECT A.RED AS OLD,B.RED AS NEW,B.NAME AS VALUE FROM COLOR_S1 A ,COLOR_S B
WHERE A.NAME = B.NAME (+) UNION
SELECT A.RED AS OLD,B.RED AS NEW,A.NAME AS VALUE FROM COLOR_S1 A,COLOR_S B WHERE
B.NAME = A.NAME (+) ;

Here for example COLOR_S1 and COLOR_S2 are similar parent tables where value of
red is different in both the tables but they are joined using NAME which is
common for both the tables.
When ever there is an insert or update or delete in any of these parent
tables(like COLOR_S1 AND COLOR_S2), the procedure shuld fire and respective
child table( 3 column table) should get update.
The number of parent tables is 20(inclusive of all _S1 and _S2 tables).The
number of child tables are 10.(_S tables).

Please guide me how to write this procedure so that the query gets fired
everytime any change occurs in any of the 20 parent tables.

Thanks
Humaira.



Tom Kyte
August 15, 2005 - 7:20 am UTC

did not follow that at all.

Humaira Liaqat, August 16, 2005 - 1:15 am UTC

Hi Tom,

I am sorry I could not explain You properly.
Anyway can you just help me to write a procedure to update a table if any insert,update or delete happens in the parent table? If a child table has 2 parent which are tables similar to each other except their primary key can a join query be used in the procedure to populate the child table?

Thanks
Humaira.

Tom Kyte
August 16, 2005 - 10:50 am UTC

and what might this "update" do -- you'll need to explain it before I can demonstrate it.

What is the goal, why do you need to update "a table" (assuming the child table(s)) when you update the parent? what are you trying to accomplish.

comparing two table data and logging the changes in MS Access

Shailesh, March 16, 2006 - 3:55 am UTC

Now I want to compare two tables having same Structure in MS Access

Tom Kyte
March 16, 2006 - 11:41 am UTC

go for it?

Now you have the differences, process them

Drew DeGraff, July 14, 2006 - 12:22 pm UTC

I built upon what you guys started.

This will compare a “base” table with a “refresh” and put the differences in a “diff” table

We will need to create a diff table from the base table and add three columns; base_tab number, refresh_tab number, and action varchar2(30). These columns will indicate which table the differences come from.

We first populate the diff table. We will then update the diff table with the action, update, delete, insert, old, that the rows represent. The query brings back all records that are different, “old” will represent the records in the base table that were changed in some way by the “update” records. The “old” records can be ignored or deleted.

The basic query for getting the differences is

select pk, column1, column2, column3,
count(src1) base_tab, count(src2) refresh_tab
from
( select a.*,
1 src1,
to_number(null) src2
from base_tab a
union all
select b.*,
to_number(null) src1,
2 src2
from refresh_tab b
)
group by pk, column1, column2, column3
having count(src1) <> count(src2)


This query only returns rows that are different or do no exist in the base and/or the refresh table.

The base_tab and refresh_tab column fields should only contain a 1 or 0 unless the primary key has been violated and there are duplicates. 1 or more means it exists in that respective table, 0 means it doesnÂ’t exist. If there are duplicates then we may need to tweak the code further.

Create table base_tab (pk number, column1 number, column2 varchar2(30), column3 date);
Create table refresh_tab (pk number, column1 number, column2 varchar2(30), column3 date);
Create table diff (pk number, column1 number, column2 varchar2(30), column3 date, base_tab number, refresh_tab number, action varchar2(30));

Insert into base_tab VALUES (1, 1.21,Â’SNACK CAKESÂ’,Â’31-aug-2007Â’);
Insert into base_tab VALUES (2, .96,Â’DORITOSÂ’,Â’30-sep-2007Â’);
Insert into base_tab VALUES (3, .31,Â’SODAÂ’,Â’31-aug-2007Â’);
Insert into base_tab VALUES (4, 2.67,Â’PIZZAÂ’,Â’30-nov-2007Â’);

Insert into refresh_tab VALUES (1, 1.21,Â’SNACK CAKESÂ’,Â’31-aug-2007Â’);
Insert into refresh_tab VALUES (2, .96,Â’DORITOSÂ’,Â’31-OCT-2007Â’);
Insert into refresh_tab VALUES (3, .31,Â’SODA POPÂ’,Â’31-aug-2007Â’);
Insert into refresh_tab VALUES (5, .21,Â’SNICKERSÂ’,Â’30-nov-2007Â’);
Insert into refresh_tab VALUES (6, .45,Â’BEERÂ’,Â’30-nov-2007Â’);

--clean out the differences table
truncate table diff;

--
-- build the difference data
--
insert into diff select pk,column1,column2, column3,
count(src1) base_tab, count(src2) refresh_tab, null action
from
( select a.*,
1 src1,
to_number(null) src2
from base_tab a
union all
select b.*,
to_number(null) src1,
2 src2
from refresh_tab b
)
group by pk,column1,column2, column3
having count(src1) <> count(src2);


--
-- update old and update records. It gets all records where the pk
-- exists more than once, one record would be from the base, the other
-- is the changed record in the refresh table
-- if the base_tab column contains a 1, then itÂ’s a Old, else its an
-- Update
--
update diff set action=decode(base_tab,1,'Old','Update') where
(pk) IN (
select pk from (
select pk, count(*) from diff group by pk
having count(*) >1) );
--
-- update the delete and insert records. It gets all records where the
-- pk exist only once.
-- if the base_tab column contains a 1, then itÂ’s a Delete, else its an -- insert
--
update diff set action=decode(base_tab,1,'Delete','Insert') where
(pk) IN (
select pk from (
select pk, count(*) from diff
group by pk having count(*) =1) );


SELECT * FROM DIFF;
PK col1 col2 col3 bt rt action
2 .96 DORITOS 30-SEP-07 1 0 Old
2 .96 DORITOS 31-OCT-07 0 1 Update
3 .31 SODA 31-AUG-07 1 0 Old
3 .31 SODA POP 31-AUG-07 0 1 Update
4 2.67 PIZZA 30-NOV-07 1 0 Delete
5 .21 SNICKERS 30-NOV-07 0 1 Insert
6 .45 BEER 30-NOV-07 0 1 Insert

Now you can process the orders any way you wish.


Please tell me if I am missing something

Srinivas Narashimalu, July 21, 2006 - 11:27 am UTC

Hi Tom,

I had previously taken your advice on tuning this query. I had tuned it and have also come up with a different approach of getting this report. 

This query essentially displays records for periods (let's say) 2005.01.01 to 2005.07.21 (sysdate-12months) and 2006.01.01 to 2006.07.21 (sysdate). This is mentioned in the "pfact.invoice_date_key" condition in the query (query is given below).

Previously the resultset of this query would be inserted to a table everyday (with truncating it before inserting). 

Now I want to insert into the table only one day's new records each day, instead of truncating and populating it everyday with 'all' the data. 

I created FULL_REP table (to hold all the records i.e upto sysdate (condition for "pfact.invoive_date_key" in the query)) using CTAS using the query given below.

I created PART_REP table (to hold partial the records i.e upto sysdate-1 (condition for "pfact.invoive_date_key" in the query)) using CTAS using the query given below.

But this is what I get when I tested it -

SQL> select count(*) from FULL_REP; (this table has data for periods between 01.01.2005 and 07.19.2005 and 01.01.2006 and 07.19.2006)
 
  COUNT(*)
----------
    251544
 
SQL> select count(*) from PART_REP; (this table has data for periods between 01.01.2005 and 07.18.2005 and 01.01.2006 and 07.18.2006)
 
  COUNT(*)
----------
    250926
 
SQL> select 251544 - 250926 from dual;
 
251544-250926
-------------
          618
 
SQL> create table STAGE_REP as (select * from FULL_REP minus select * from PART_REP);        
 
Table created.
 
SQL> select count(*) from del2;
 
  COUNT(*)
----------
      8909

WHY IS IT CREATED WITH 8909 ROWS? IT MUST HAVE BEEN 618 RIGHT?

And when I insert the STAGE_REP values into PART_REP it's resulting in 
select count(*) from PART_REP;

  COUNT(*)
----------
    259835

Please tell me where I am going wrong. Below is the sql used to create tables FULL_REP,PART_REP and STAGE_REP.

SELECT pfact.invoice_year,
                dlr.dealer_key,
                dlr.dealer_number,
                dlr.name,
                dlr.city||', '||dlr.state AS location,
                dlr.country,
                dlr.classification,
                dlr.start_date_active,
                dlr.end_date_active,
                dlr.old_dealer_number,
                regdist.region_code,
                regdist.region_desc,
                regdist.regional_manager,
                regdist.district_code,
                regdist.district_desc,
                regdist.district_manager,
                dlr.facing_dealer,
                dlr.parts_parent_dealer,
--              dlr.parent_location,    -- to be added to dealer_dim
                '',
                dlr.financial_report_dealer,
                dlr.consolidation_dealer,
                dlr.parts_management_group,
                org.organization_code,
                org.division_code,
                supp.supplier_code,
                supp.name,
                pgrp.product_group_desc,
--              pvdr.parent_code,
--              pvdr.part_classification,
--              pvdr.proprietary_ind,
--              dship.direct_ship_program,
--              ordr.order_type,
--              pri.priority_code,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        DECODE(ordr.order_type,'PDC',
                                DECODE(pri.priority_code,'S',
                                        SUM(pfact.local_ext_sales_price), 0) ,0) ,0), 0) AS pdc_stock,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        DECODE(ordr.order_type,'PDC',
                                DECODE(pri.priority_code,'E',
                                        SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_vor,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        DECODE(ordr.order_type,'DSP',
                                SUM(pfact.local_ext_sales_price), 0), 0), 0) AS vdsp,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        DECODE(ordr.order_type,'SD',
                                SUM(pfact.local_ext_sales_price), 0), 0), 0) AS ship_direct,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        SUM(pfact.local_ext_sales_price), 0), 0) AS gross_sales,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
                        SUM(pfact.local_gross_profit), 0), 0) AS gross_profit,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        DECODE(ordr.order_type,'PDC',
                                DECODE(pri.priority_code,'S',
                                        SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_stock_prev,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        DECODE(ordr.order_type,'PDC',
                                DECODE(pri.priority_code,'E',
                                        SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_vor_prev,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        DECODE(ordr.order_type,'DSP',
                                SUM(pfact.local_ext_sales_price), 0), 0), 0) AS vdsp_prev,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        DECODE(ordr.order_type,'SD',
                                SUM(pfact.local_ext_sales_price), 0), 0), 0) AS ship_direct_prev,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        SUM(pfact.local_ext_sales_price), 0), 0) AS gross_sales_prev,
                NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
                        SUM(pfact.local_gross_profit), 0), 0) AS gross_profit_prev
        FROM    priority_dim             pri,
                product_group_dim              pgrp,
                order_type_dim           ordr,
                organization_dim         org,
                dealer_dim               dlr,
                region_district_dim      regdist,
                price_vendor_dim         pvdr,
               part_dim                 pdim,
                supplier_dim             supp,
                part_order_line_fact     pfact
        WHERE regdist.type(+) = 'PARTS'
--        AND dlr.dealer_number = 'A300'
          AND regdist.region_district_key IN
                                (SELECT MIN(region_district_key)
                                 FROM region_district_dim
                                 WHERE type = 'PARTS'
                                 AND dealer_key = dlr.dealer_key)
          AND (pfact.invoice_date_key BETWEEN TO_NUMBER(TO_CHAR(SYSDATE-1, 'yyyy')- 1||'0101')
                                          AND TO_NUMBER(TO_CHAR(SYSDATE-1, 'yyyymmdd')))
          AND (pfact.invoice_date_key NOT BETWEEN TO_NUMBER(TO_CHAR(ADD_MONTHS(SYSDATE-1, -12), 'yyyymmdd'))
                                              AND TO_CHAR(TO_CHAR(SYSDATE-1, 'YYYY')- 1)||'1231')
--w/ Canada dealers
--        AND (SUBSTR(dlr.dealer_number, 1, 1) NOT IN ('X','Y'))
--w/o Canada dealers
          AND (SUBSTR(dlr.dealer_number, 1, 1) NOT IN ('X','Y','N'))
          AND NVL(gl_code, 'MISC') NOT LIKE 'DDSS%'
          AND NVL(gl_code, 'MISC') NOT LIKE 'SSMX%'
          AND NVL(gl_code, 'MISC') NOT LIKE 'SSMU%'
          AND pdim.part_ind = 'Y'
          AND   dlr.dealer_key = regdist.dealer_key(+)
          AND dlr.dealer_key = pfact.dealer_key
--        AND regdist.region_district_key (+) = pfact.region_district_key       
          AND org.organization_key = pfact.organization_key
        AND pgrp.product_group = pvdr.product_group
--        AND cr.credit_reason_key = pfact.credit_reason_key                    
--        AND dship.direct_ship_program_key(+) = pfact.direct_ship_program_key
          AND ordr.order_type_key = pfact.order_type_key
          AND pri.priority_key = pfact.priority_key
          AND pdim.part_key = pfact.shipped_part_key
          AND pdim.organization_key = pfact.organization_key
          AND pvdr.price_vendor_key = pfact.price_vendor_key
          AND supp.supplier_key = pfact.supplier_key
        GROUP BY pfact.invoice_year,
                dlr.dealer_key,
                dlr.dealer_number,
                dlr.name,
                dlr.city||', '||dlr.state,
                dlr.country,
                dlr.classification,
                dlr.start_date_active,
                dlr.end_date_active,
                dlr.old_dealer_number,
                regdist.region_code,
                regdist.region_desc,
               regdist.regional_manager,
                regdist.district_code,
                regdist.district_desc,
                regdist.district_manager,
                dlr.facing_dealer,
                dlr.parts_parent_dealer,
--              dlr.parent_location,
                dlr.financial_report_dealer,
                dlr.consolidation_dealer,
                dlr.parts_management_group,
                org.organization_code,
                org.division_code,
                supp.supplier_code,
                supp.name,
                pgrp.product_group_desc,
--              pvdr.parent_code,
--              pvdr.part_classification,
--              pvdr.proprietary_ind,
--              dship.direct_ship_program,
                ordr.order_type,
                pri.priority_code;

Sorry to bother with such huge query!

Thanks,
Srinivas 

Tom Kyte
July 23, 2006 - 7:41 am UTC

I'd be looking at my data to figure out what I did wrong? I mean, you have the 8206 rows - debug it?

Not knowing your data as you do....

Not knowing how you built part and full rep (are they "correct" in the first place)

One cannot really say (and that is a rather large query, I'm not going to reverse engineer it here)

Appreciate it!

Srinivas Narashimalu, July 23, 2006 - 10:42 pm UTC

Thanks for the response Tom!

I know it's a large query..I was sort of not comfortable posting that..but somehow wanted to know where I was going wrong as I have been troubleshooting for quiet sometime.

Anyhow thanks again for your response and greatly appreciate your politeness.

Thanks,
Srinivas

Bringing 1 table in sync with the other

A reader, July 28, 2006 - 4:33 pm UTC

Given 2 tables with similar structure (same PK and N optional attributes)

How can the techniques discussed on this thread be used to bring Table A "in sync" with Table B? i.e.

a. if the PK is the same, update the N optional attributes in Table A to be the same as those in Table B.

b. If B doesn't have the row, delete it from A.

c. If A doesn't have the row, add it to A

i.e. B is the master table, I want to make A look like it.

Additionally, I would like to "log" each INSERT/UPDATE/DELETE I am doing to a audit trail, so I guess a set-based approach is out?

Any ideas appreciated. Thanks

Tom Kyte
July 28, 2006 - 9:01 pm UTC

simple.

delete from a;
insert into a select * from b;


not sure of the point of the audit trail since a row in b could have been changed 500 times, or once?

Checksum Based Approach

Thiyag, September 13, 2006 - 2:26 am UTC

Hi Tom:

Can you suggest some checksum based approach for comparing the table contents. For example say there is some algorithm built, which reads the contents of the table and give a checksum value which could be used to compare the content of another table present in some other database. This way the two databases need not be connected to perform the content check.

Tom Kyte
September 13, 2006 - 7:25 am UTC

select sum( checksum( col1||'/'||col2||'/'||....||'/'||colN) )
from t;

perhaps.

What is checksum?

Thiyag, September 13, 2006 - 9:44 am UTC

I am interested in knowing your opinion on how this checksum need to be written? Any suggestions. Is there some already implemented packages available in Oracle which uses this checksum but for a different purpose that could be harnessed?

Tom Kyte
September 13, 2006 - 2:49 pm UTC

dbms_obfuscation_toolkit can create an md5 hash
dbms_crypto has more than just md5

they are checksum like examples.

multiple tables with same structure

satish, November 16, 2006 - 9:52 am UTC

TOM,

I have two tables for each day like (t20060101, o20060101 ...etc say 730 tables per year) with same structure, I need to query data from these tables, given column values and start date and end date.
Is this type of question answered? if so, please give me the link, if not, can you please give me the code?

satish

Tom Kyte
November 16, 2006 - 3:20 pm UTC

I would say you've done a bad thing.

You meant to partition

but you didn't

why not?

A reader, November 17, 2006 - 9:38 am UTC


A similar table comparison problem

sd, March 29, 2007 - 6:22 am UTC

Hello Tom,


i've got a similar constellation as the original poster. I want to compare two
master-detail tables efficiently in a PL/SQL procedure.


These are the simplified example tables. Each entry in the master table can
have an arbitrary amount of entries in the detail table. P/Fkeys and indices
are missing, obviously.


CREATE TABLE app_queue (
id NUMBER NOT NULL,
description VARCHAR2(100) NOT NULL
);

CREATE TABLE app_queue_detail (
queue_id NUMBER NOT NULL,
fieldname VARCHAR2(40) NOT NULL,
fieldvalue VARCHAR2(40) NOT NULL
);



CREATE TABLE app_notification (
id NUMBER NOT NULL,
description VARCHAR2(100) NOT NULL
);

CREATE TABLE app_notification_detail (
notification_id NUMBER NOT NULL,
fieldname VARCHAR2(40) NOT NULL,
fieldvalue VARCHAR2(40) NOT NULL
);


I want to return all app_notification and app_queue id's where all fieldnames
and fieldvalues of the notification match a queue's details. I've had some
approaches with nested cursors but i wasn't really satisfied with them.

Example data following.

Match, return app_queue.id and app_notification.id:
app_queue_detail app_notification_detail
NAME joseph NAME joseph
BDAY 1990

No match:
app_queue_detail app_notification_detail
NAME joseph NAME john
BDAY 1990 BDAY 19%

No match:
app_queue_detail app_notification_detail
NAME joseph FAVC gr%
BDAY 1990
ZIPC 31337


Is there a possibility to turn this problem into an efficient query?


Thanks in advance.
Tom Kyte
March 30, 2007 - 12:47 pm UTC

need some inserts there

and an explanation of why the first one is a match.


differences of records in single table

Rasika, May 24, 2007 - 1:09 pm UTC

Hi Tom,

I have similar question but this is with a single table, let's say we audit changes to the records of a table and we transfer each modfied record into new audit table

ex: if table T1 is with columns (PK, C1, C2, C3) and our audit table is AUD_T1 (PK, C1, C2, C3, OPERATION_TYPE, TIMESTAMP, USER)
(In the audit table actually PK is not primary key column but it just keep the values for primary key column PK of T1)

so when any change happens to T1 that new (or old) record will be copied to AUD_T1 with additional op_type (insert, delete, update) timestamp and the user

so it could be as follows in AUD_T1

PK C1 C2 C3 OPERATION_TYPE TIMESTAMP USER
1 a b c UPDATE t1 u1
1 a1 b c1 UPDATE t2 u1
1 a1 b1 c1 UPDATE t3 u2
2 c d e UPDATE t4 u1
2 c1 d e UPDATE t5 u2

so what I need to generate is table as follows using above date in AUD_T1

to make it easy if we consider records with PK = 1

TIMESTAMP COLUMN OLD_VALUE NEW_VALUE
t2 C1 a a1
t2 C3 c c1

t3 C2 b b1

and actually if I can get with following format that would be even great

At t2 user u1 changed
--
C1 a a1
C3 c c1

At t3 user u2 changed
--
C2 b b1


This is basically yo represent some meaningful audit report to user, we already keep records as
in AUD_T1 but the problem is presenting them as above..

If you could provide few SQL queries or PL/SQL SPs which could do such thing or point to any
existing question & answer link that would be great

Thanks Tom



data from single table

Rasika, May 24, 2007 - 1:11 pm UTC

Hi Tom,

I have similar question but this is with a single table, let's say we audit changes to the records of a table and we transfer each modfied record into new audit table

ex: if table T1 is with columns (PK, C1, C2, C3) and our audit table is AUD_T1 (PK, C1, C2, C3, OPERATION_TYPE, TIMESTAMP, USER)
(In the audit table actually PK is not primary key column but it just keep the values for primary key column PK of T1)

so when any change happens to T1 that new (or old) record will be copied to AUD_T1 with additional op_type (insert, delete, update) timestamp and the user

so it could be as follows in AUD_T1

PK C1 C2 C3 OPERATION_TYPE TIMESTAMP USER
1 a b c UPDATE t1 u1
1 a1 b c1 UPDATE t2 u1
1 a1 b1 c1 UPDATE t3 u2
2 c d e UPDATE t4 u1
2 c1 d e UPDATE t5 u2

so what I need to generate is table as follows using above date in AUD_T1

to make it easy if we consider records with PK = 1

TIMESTAMP COLUMN OLD_VALUE NEW_VALUE
t2 C1 a a1
t2 C3 c c1

t3 C2 b b1

and actually if I can get with following format that would be even great

At t2 user u1 changed
--
C1 a a1
C3 c c1

At t3 user u2 changed
--
C2 b b1


This is basically yo represent some meaningful audit report to user, we already keep records as
in AUD_T1 but the problem is presenting them as above..

If you could provide few SQL queries or PL/SQL SPs which could do such thing or point to any
existing question & answer link that would be great

We're using Oracle 10g

Thanks Tom



tom, please declare...

Ralph Buetikofer, January 14, 2009 - 5:33 am UTC

hi tom

thank you so far for lots of good tipps!

i have to compare several huge tables with millions of records in it. so read this thread and found some good solutions incl. the one from stefano and you. now i have to find the fastest way to compare our data.

therefor i have a table with only two cols, no pk and 5 millions records. it's intressting for me finding out that solution 1 is faster than your solution 2. did i forget something or is there something wrong in my example? i'd like you to let me know.

here the examples:

1) A where A not in B union all B where B not in A

SELECT a.col1, a.col2
FROM tab_a a
WHERE (a.col1, a.col2) NOT IN
(SELECT b.col1, b.col2
FROM tab_b b)
UNION ALL
SELECT b.col1, b.col2
FROM tab_b e60
WHERE (b.col1, b.col2) NOT IN
(SELECT a.col1, a.col2
FROM tab_a a);

result: 0 Records selected (compare was fine), 17seconds

2) your solution described above
SELECT col1,
col2,
COUNT(src1) CNT1,
COUNT(src2) CNT2
FROM (SELECT a.col1, a.col2,
1 src1,
TO_NUMBER(NULL) src2
FROM tab_a a
UNION ALL
SELECT b.col1, b.col2,
TO_NUMBER(NULL) src1,
2 src2
FROM tab_b b)
GROUP BY col1,
col2
HAVING COUNT(src1) <> COUNT(src2);

result: 0 Records selected (compare was fine), 25seconds... (?)

btw: we're using oracle 10g r2

any ideas tom?

thank you for your reply.
Tom Kyte
January 16, 2009 - 4:23 pm UTC

tkprof it, post those results.

wall clock times are not very useful - probably they are "the same"


Your code does not handle nulls at all, hopefully col1 and col2 are NOT NULL.. That is, your query is not useful if any of the attributes are NULLABLE (so it would not matter how fast or slow it is :) )

ops$tkyte%ORA11GR1> create table tab_a ( col1 number, col2 number );

Table created.

ops$tkyte%ORA11GR1> create table tab_b ( col1 number, col2 number );

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into tab_a values ( 1, 1 );

1 row created.

ops$tkyte%ORA11GR1> insert into tab_a values ( 1, null );

1 row created.

ops$tkyte%ORA11GR1> insert into tab_b values ( 1, 1 );

1 row created.

ops$tkyte%ORA11GR1> insert into tab_b values ( null, 1 );

1 row created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> SELECT a.col1, a.col2
  2    FROM tab_a a
  3   WHERE (a.col1, a.col2) NOT IN
  4         (SELECT b.col1, b.col2
  5            FROM tab_b b)
  6  UNION ALL
  7  SELECT b.col1, b.col2
  8    FROM tab_b b
  9   WHERE (b.col1, b.col2) NOT IN
 10         (SELECT a.col1, a.col2
 11            FROM tab_a a);

no rows selected

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> SELECT col1,
  2         col2,
  3         COUNT(src1) CNT1,
  4         COUNT(src2) CNT2
  5    FROM (SELECT a.col1, a.col2,
  6                 1 src1,
  7                 TO_NUMBER(NULL) src2
  8            FROM tab_a a
  9          UNION ALL
 10          SELECT b.col1, b.col2,
 11                 TO_NUMBER(NULL) src1,
 12                 2 src2
 13            FROM tab_b b)
 14   GROUP BY col1,
 15            col2
 16  HAVING COUNT(src1) <> COUNT(src2);

      COL1       COL2       CNT1       CNT2
---------- ---------- ---------- ----------
                    1          0          1
         1                     1          0




Matching from Single Table based on user input

Shravani, July 08, 2011 - 4:47 am UTC

Hi Tom,

Thanks so much for all your help and mentoring.

I have a requirement of comparing the records from same table. The output should be based on the user input.

Script :
----------------------------------------------------
drop table dispatch_records;

create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip varchar2(10));

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '01-Jan-2001', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '11-Jan-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '22-Feb-2001', '08731');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_2', '22-Feb-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_2', '22-Feb-2001', '08731');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_3', '27-Mar-2003', '07001');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_4', '27-Mar-2003', '07001');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '08731');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '07771');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_6', '22-Feb-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_6', '22-Feb-2001', '08731');

insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_7', '22-Feb-2001', '10011');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_7', '22-Feb-2001', '10017');
commit;
----------------------------------------------------

The requirement is

If user selects option "All" then interface should display the customers where all the zipcode are matching

Output For "All ZipCode(s) Matching"
---------------------------------------------
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_3, Cust_4
---------------------------------------------

If user selects option "Any 1" then it means display the customers for which any 1 zipcode is matching.

Output For "Any 1 ZipCode(s) Matching"
-----------------------------------------------
Dispatch Location matching for : Cust_1, Cust_2
Dispatch Location matching for : Cust_1, Cust_3
Dispatch Location matching for : Cust_1, Cust_4
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_1, Cust_6
Dispatch Location matching for : Cust_2, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_3, Cust_4
Dispatch Location matching for : Cust_3, Cust_5
Dispatch Location matching for : Cust_4, Cust_5
Dispatch Location matching for : Cust_5, Cust_6
---------------------------------------------

If user selects option "Any 2" then it means display the customers for which any 2 zip codes are matching.

Output For "Any 2 ZipCode(s) Matching"
-----------------------------------------------
Dispatch Location matching for : Cust_1, Cust_2
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_1, Cust_6
Dispatch Location matching for : Cust_2, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_5, Cust_6
---------------------------------------------

Can you please suggest a effective way in SQL to achieve this?

Thanks & Regards
Shravani
Tom Kyte
July 08, 2011 - 2:59 pm UTC

ops$tkyte%ORA11GR2> with t
  2  as
  3  ( select distinct cust_no, dispatch_zip zip,
  4           count(distinct dispatch_zip) over (partition by cust_no) cnt
  5      from dispatch_records
  6  )
  7  select t1.cust_no, t2.cust_no
  8    from t t1, t t2
  9   where t1.cust_no < t2.cust_no
 10     and t1.zip = t2.zip
 11   group by t1.cust_no, t2.cust_no
 12  having max(t1.cnt) = max(t2.cnt)
 13     and max(t1.cnt) = count(*)
 14   order by 1
 15  /

CUST_NO    CUST_NO
---------- ----------
Cust_1     Cust_5
Cust_2     Cust_6
Cust_3     Cust_4

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with t
  2  as
  3  ( select distinct cust_no, dispatch_zip zip
  4      from dispatch_records
  5  )
  6  select distinct t1.cust_no, t2.cust_no
  7    from t t1, t t2
  8   where t1.cust_no < t2.cust_no
  9     and t1.zip = t2.zip
 10   order by 1
 11  /

CUST_NO    CUST_NO
---------- ----------
Cust_1     Cust_2
Cust_1     Cust_3
Cust_1     Cust_4
Cust_1     Cust_5
Cust_1     Cust_6
Cust_2     Cust_5
Cust_2     Cust_6
Cust_3     Cust_4
Cust_3     Cust_5
Cust_4     Cust_5
Cust_5     Cust_6

11 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with t
  2  as
  3  ( select distinct cust_no, dispatch_zip zip
  4      from dispatch_records
  5  )
  6  select t1.cust_no, t2.cust_no
  7    from t t1, t t2
  8   where t1.cust_no < t2.cust_no
  9     and t1.zip = t2.zip
 10   group by t1.cust_no, t2.cust_no
 11  having count(*) >= 2
 12   order by 1
 13  /

CUST_NO    CUST_NO
---------- ----------
Cust_1     Cust_2
Cust_1     Cust_5
Cust_1     Cust_6
Cust_2     Cust_5
Cust_2     Cust_6
Cust_5     Cust_6

6 rows selected.

Thanks so much for the simple solution.

Shravani, July 11, 2011 - 3:12 am UTC


Result Set is Huge

Shravani, July 27, 2011 - 6:19 am UTC

Hi Tom,

This is regarding the question "Matching from Single Table based on user input" asked in the same thread.

Actually I want to use the query logic you gave to pass input to another program.

Queries are working fine for small volume, however for higher volume it is returning huge data. For example for 100K records in "dispatch_records" table the query is returning 12M records. And in production the volume of this table is more than 1M.

Is it possible to get the output only at customer level and
also is it possible to get the output using "dispatch_records" table just once?

Regards
Shravani
Tom Kyte
July 28, 2011 - 7:01 pm UTC

state the question entirely here - do not make me read up and down on the page. give YOUR example, explained in your words, with your data.


I did not understand this:

Is it possible to get the output only at customer level and
also is it possible to get the output using "dispatch_records" table just once?



Result Set is Huge

Shravani, July 31, 2011 - 10:40 am UTC

Sorry Tom...I will phrase my question properly.

For further processing, want to get out the customers for whom the dispatch zip codes are common. Basically want to group the output based on what is matching.

Here is the table structure and data.

drop table dispatch_records;

create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip
varchar2(10));

------------------------------------------------------

insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_3', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_4', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', '10011');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', '10017');
commit;
------------------------------------------------------

The query you gave in previous thread for this is

with t
as
( select distinct cust_no, dispatch_zip zip
from dispatch_records
)
select distinct t1.cust_no, t2.cust_no
from t t1, t t2
where t1.cust_no < t2.cust_no
and t1.zip = t2.zip
order by 1
/

And the output is

CUST_NO CUST_NO
---------- ----------
Cust_1 Cust_2
Cust_1 Cust_3
Cust_1 Cust_4
Cust_1 Cust_5
Cust_1 Cust_6
Cust_2 Cust_5
Cust_2 Cust_6
Cust_3 Cust_4
Cust_3 Cust_5
Cust_4 Cust_5
Cust_5 Cust_6
--------------------

The query does give me the required output, however since it is returning each combination, the resultset is becoming very huge.

Is it possible to get something like below. This will enable me to pass on lesser records to next program. Basically once I have the below output, I will apply the DISTINCT after removing the ZIP

----------------------------------------
Cust_Number Zip Rank
-----------------------------------------
Cust_1 07771 1
Cust_1 08731 1
Cust_2 07771 1
Cust_2 08731 1
Cust_5 07771 1
Cust_5 08731 1
Cust_6 07771 1
Cust_6 08731 1

Cust_1 07001 2
Cust_1 07771 2
Cust_1 08731 2
Cust_5 07001 2
Cust_5 07771 2
Cust_5 08731 2

Cust_1 07001 3
Cust_3 07001 3
Cust_4 07001 3
Cust_5 07001 3
------------------------------------



Regards
Shravani


Tom Kyte
August 01, 2011 - 11:48 am UTC

explain your output. It is not immediately obvious to me what it represents.


and if you are trying to return less data to the client, why is your desired output LARGER than my original output?

Shravani, August 02, 2011 - 6:48 am UTC

Hi Tom,

Thanks so much for the response. Basically I want to send the customers based on the groups. In the below output I am getting the customers in Pairs. All the customers with RANK column value as 1 are the ones who have 07771 and 08731 zip codes common. The ones with Rank column value as 2, have 3 zip codes common.

Cust_Number Zip Rank
-----------------------------------------
Cust_1 07771 1
Cust_1 08731 1
Cust_2 07771 1
Cust_2 08731 1
Cust_5 07771 1
Cust_5 08731 1
Cust_6 07771 1
Cust_6 08731 1

Cust_1 07001 2
Cust_1 07771 2
Cust_1 08731 2
Cust_5 07001 2
Cust_5 07771 2
Cust_5 08731 2

Cust_1 07001 3
Cust_3 07001 3
Cust_4 07001 3
Cust_5 07001 3
------------------------------------

Basically my final goal is to send the distinct customers based on a common zip codes the represent.

My final goal is to get the output as


Cust_Number Rank
-------------------------
Cust_1 1
Cust_2 1
Cust_5 1
Cust_6 1
Cust_1 2
Cust_2 2
Cust_1 3
Cust_3 3
Cust_4 3
Cust_5 3

The above output is clerly giving the distinct groupings of customers based on the matching zip codes.

Regards
Shravani
Tom Kyte
August 02, 2011 - 7:12 am UTC

how does cust_2 get a rank of 2 in your final output?????

What appears to be "clear" to you is "muddy" to me.


Sorry for the typo

Shravani, August 02, 2011 - 8:00 am UTC

Sorry once again Tom for my stupid mistakes. The final output should be

Cust_Number Rank
-------------------------
Cust_1 1
Cust_2 1
Cust_5 1
Cust_6 1
Cust_1 2
Cust_5 2
Cust_1 3
Cust_3 3
Cust_4 3
Cust_5 3

Regards
Shravani
Tom Kyte
August 02, 2011 - 8:38 am UTC

why isn't cust_3 in rank 1?

clear as mud I'd say?

Here is what I think you want - again, the answer is LARGER - but whatever. Your output does not compute for me at all



ops$tkyte%ORA11GR2> with t
  2  as
  3  ( select distinct cust_no, dispatch_zip zip
  4    from dispatch_records
  5    )
  6    select distinct t1.cust_no t1_cust_no, t2.cust_no t2_cust_no
  7    from t t1, t t2
  8    where t1.cust_no < t2.cust_no
  9    and t1.zip = t2.zip
 10    order by 1
 11  /

T1_CUST_NO T2_CUST_NO
---------- ----------
Cust_1     Cust_2
Cust_1     Cust_3
Cust_1     Cust_4
Cust_1     Cust_5
Cust_1     Cust_6
Cust_2     Cust_5
Cust_2     Cust_6
Cust_3     Cust_4
Cust_3     Cust_5
Cust_4     Cust_5
Cust_5     Cust_6

11 rows selected.

ops$tkyte%ORA11GR2> with t
  2  as
  3  ( select distinct cust_no, dispatch_zip zip
  4    from dispatch_records
  5    )
  6    select distinct new_cust_no, dense_rank() over (order by t1_cust_no)
  7      from (
  8    select t1_cust_no, nvl(t2_cust_no,t1_cust_no) new_cust_no
  9      from (
 10    select distinct t1.cust_no t1_cust_no, t2.cust_no t2_cust_no
 11    from t t1, t t2
 12    where t1.cust_no < t2.cust_no
 13    and t1.zip = t2.zip
 14          )
 15   group by grouping sets((t1_cust_no),(t1_cust_no,t2_cust_no))
 16         )
 17    order by 2
 18  /

NEW_CUST_N DENSE_RANK()OVER(ORDERBYT1_CUST_NO)
---------- -----------------------------------
Cust_1                                       1
Cust_2                                       1
Cust_3                                       1
Cust_4                                       1
Cust_5                                       1
Cust_6                                       1
Cust_2                                       2
Cust_5                                       2
Cust_6                                       2
Cust_3                                       3
Cust_4                                       3
Cust_5                                       3
Cust_4                                       4
Cust_5                                       4
Cust_5                                       5
Cust_6                                       5

16 rows selected.

Sorry Again Tom.....

Shravani, August 03, 2011 - 5:40 am UTC

Hi Tom,

Thanks so much for giving your valuable time on my question.
And I am extremely sorry Tom, I am not explaining the requirements properly.

The cust_3 is not in Rank 1 because it has match on the different zip codes.

This time I will try my best to explain the requirement with expected result set.

I customized your query for pulling the customers to make the groups based on the zip codes and number of zip codes match.

select
c1, c2, rnk, zip
from
(
select distinct c1, c2, zip,
dense_rank() over (order by sum_zip) Rnk
from
(
with t
as
( select distinct cust_no, dispatch_zip zip
from dispatch_records order by 1
)
select t1.cust_no c1, t2.cust_no c2, t1.zip,
sum(t1.zip) over(partition by t1.cust_no, t2.cust_no) sum_zip
from t t1, t t2
where t1.cust_no < t2.cust_no
and t1.zip = t2.zip
)
)
order by 3, 1, 2, 4

Output

===============================
C1 C2 RNK ZIP
===============================
Cust_1 Cust_3 1 07001 --------
Cust_1 Cust_4 1 07001 |
Cust_3 Cust_4 1 07001 Customers in this set have one zip code
Cust_3 Cust_5 1 07001 07001 common and hence are grouped.
Cust_4 Cust_5 1 07001 --------|

Cust_1 Cust_2 2 07771 --------
Cust_1 Cust_2 2 08731 |
Cust_1 Cust_6 2 07771 |
Cust_1 Cust_6 2 08731 |
Cust_2 Cust_5 2 07771 Customers in this set have 08731 and 07771
Cust_2 Cust_5 2 08731 zip codes in common and hence are grouped.
Cust_2 Cust_6 2 07771 |
Cust_2 Cust_6 2 08731 |
Cust_5 Cust_6 2 07771 |
Cust_5 Cust_6 2 08731 --------

Cust_1 Cust_5 3 07001 --------
Cust_1 Cust_5 3 07771 Customers in this set have 3 zip codes in common
Cust_1 Cust_5 3 08731 --------
===============================

I am not able to rollup the above output further to get the final output as

======================
Cust_No Rank
======================
Cust_1 1 --------
Cust_3 1 Customers with Only zip code (07001)
Cust_4 1 matching
Cust_5 1 --------

Cust_1 2 --------
Cust_2 2 Customers with 2 zip codes (08731 and
Cust_5 2 07771) matching.
Cust_6 2 --------

Cust_1 3 ------- Customers with 3 zip codes (07001, 07771 and 08731)
Cust_5 3 --------matching
=====================

I sincerely request you to spend few minutes for the last time on this requirement.

Regards
Shravani

Tom Kyte
August 03, 2011 - 7:54 am UTC

this has nothing to do with the original query. And you are right - you didn't really explain this before.

Assuming your query is correct, just use the same grouping sets technique I used above:

ops$tkyte%ORA11GR2> select distinct nvl(c2, c1), rnk
  2    from (
  3  select c1, c2, rnk, zip
  4  from
  5  (
  6      select distinct c1, c2, zip,
  7             dense_rank() over (order by sum_zip) Rnk
  8      from
  9      (
 10      with t
 11          as
 12          ( select distinct cust_no, dispatch_zip zip
 13              from dispatch_records order by 1
 14          )
 15          select t1.cust_no c1, t2.cust_no c2, t1.zip,
 16                 sum(t1.zip) over(partition by t1.cust_no, t2.cust_no) sum_zip
 17            from t t1, t t2
 18           where t1.cust_no < t2.cust_no
 19             and t1.zip = t2.zip
 20      )
 21  )
 22      )
 23  group by grouping sets( (c1,rnk),(c1,c2,rnk) )
 24  order by 2, 1
 25  /

NVL(C2,C1)        RNK
---------- ----------
Cust_1              1
Cust_3              1
Cust_4              1
Cust_5              1

Cust_1              2
Cust_2              2
Cust_5              2
Cust_6              2

Cust_1              3
Cust_5              3

10 rows selected.



Thanks so much

Shravani, August 03, 2011 - 9:20 am UTC

Thanks so much Tom. This is exactly what I wanted.

I just have 1 question, In the above query since Zip codes was having all numerics it was possible to put the SUM(zip) and then apply the DENSE_RANK. But suppose we have a non-numeric data, then how can we group it.

Here are the scripts for your ready reference.

drop table dispatch_records;

create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip
varchar2(20));

------------------------------------------------------

insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_3', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_4', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', 'Avenel');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', 'South Jersey');
commit;
------------------------------------------------------

With this data, how can we get the output as

======================
Cust_No Rank
======================
Cust_1 1 --------
Cust_3 1 Customers with Only zip code (Edison)
Cust_4 1 matching
Cust_5 1 --------

Cust_1 2 --------
Cust_2 2 Customers with 2 zip codes (Bridgwater and
Cust_5 2 Forked River) matching.
Cust_6 2 --------

Cust_1 3 ------- Customers with 3 zip codes (Edison, Bridgwater and Forked River)
Cust_5 3 --------matching
=====================

Regards
Shravani
Tom Kyte
August 03, 2011 - 9:54 am UTC

question for you, what if the zip codes are:

10000
20000


16000
14000

for two groups - wouldn't that mess up your query as well. Like I said, I'm assuming your query works - maybe it doesn't. I didn't really analyze it.

you'd need to spec out the inner query for us, in detail - very very clearly.

I simply took your work and made it return what you wanted. If your work doesn't work... then you'll need to work on that bit.

Shravani, August 03, 2011 - 10:51 am UTC

Sorry Again Tom.

Believe me this is the first time in AskTom forum, I am not able to explain properly what I want.

Actually my question is more on applying the analytical functions on more than one record, but I am not doing a good job of phrasing it.

For the below data, the grouping should be based on the matching zip codes. I used a very dirty method of doing a SUM(ZIP) OVER (PARTITION BY T1.CUST_NO, T2.CUST_NO).
Thinking here was to set up a base value on which analytical functions will work.

My question is without using SUM(), can we group the data, basically how to tell Oracle to apply the rank based on the matching rows.

Output

===============================
C1 C2 ZIP
===============================
Cust_1 Cust_3 07001
Cust_1 Cust_4 07001
Cust_3 Cust_4 07001
Cust_3 Cust_5 07001
Cust_4 Cust_5 07001

Cust_1 Cust_2 07771
Cust_1 Cust_2 08731
Cust_1 Cust_6 07771
Cust_1 Cust_6 08731
Cust_2 Cust_5 07771
Cust_2 Cust_5 08731
Cust_2 Cust_6 07771
Cust_2 Cust_6 08731
Cust_5 Cust_6 07771
Cust_5 Cust_6 08731

Cust_1 Cust_5 07001
Cust_1 Cust_5 07771
Cust_1 Cust_5 08731
===============================

Regards
Shravani
Tom Kyte
August 03, 2011 - 12:39 pm UTC

what happens when you have customers that match twice but with different zips?

eg:

c1 X
c2 X
c1 Y
c2 Y

c3 A
c4 B
c3 A
c4 B


c1 and c2 match twice, c3 and c4 match twice - but they match on different "zips"


You need to be really really precise. It has nothing to do with asking for the first time in a forum - we are all software developers. Our *MOST IMPORTANT* skill - the skill you need more than any - is the skill of putting on paper what actually needs to be done.

You know, write a specification. From start to finish, in sufficient detail that a programmer can generate code from it.

Otherwise - you get garbage.

So, please spec it out - clearly, in detail, precisely.

Yes - it is hard, yes it is a pain, no - you cannot get out of doing it. It is what we do after all.

Sorry once again Tom

rupalimay21@gmail.com, August 04, 2011 - 11:39 am UTC

Hi Tom,

Hats off to you for the below line

"we are all software developers. Our *MOST IMPORTANT* skill - the skill you need more than any - is the skill of
putting on paper what actually needs to be done. "

Very true Tom.

I will give you little background on the business requirement. The client I am working acquired few small companies and the data got integrated successfully.
Now in few of the ledgers we are finding that there are lot of duplicate customer records. Our funtional experts came up with alogorith to identify the duplicates.
They gave set of rules like
Rule 1. Customers with same registration date
Rule 2. Customers with same zip codes (This rule has to be dynamic, means in program we should have flexibility to increase number of matching zip codes)
Rule 3. Customers who have consumed goods of more than 5M in last 2 recent years
and so on.

note : All the rules goes with AND condition. And the rules are arranged as per the granularity, means Rule-1 is at high granular level than Rule-2.

As per the functional team, rules can change any time hence they are looking for some kind of framework which will identify duplicates based on the defined
rules.

We have created a Stage Table which holds all the customers. Using PL/SQL we process each rule and then flag off the customer codes based on the group.

For example for the below scenario after the rule_1 process
Cust_1, Cust_2, Cust_3 shares same registration date.
Cust_3, Cust_5 shares same registration date.

DUPS_BASE

---------------------------------------------------------------------
CUST_NO Rule_1 Rule_2 Rule_3
----------------------------------------------------------------------
Cust_1 1
Cust_2 1
Cust_3 1
Cust_4 2
Cust_5 2
----------------------------------------------------------------------

Once the Rule_1 is processed, process takes second rule and Cust_1, Cust_2, Cust_3 gets evaluated against Rule_2.

I asked question specific to Rule_2. Basically I want to bundle customers in a group, so that next process will pick them up and apply next rule.

Now to answer your question, if
c1 X
c2 X
c1 Y
c2 Y

then C1 and C2 will grouped under same ranking. Basically, all the customers are grouped based on the matching zip codes, so that next rule will get applied on the set of customers.

Tom, I am not sure whether I explained the requirement properly. I will surely not again request you to spend time on this, because i know I have wasted
lot of your time already.

Thanks so much for all your advices Tom.

Regards
Shravani
Tom Kyte
August 04, 2011 - 11:48 am UTC

I asked about this:

eg:

c1 X
c2 X
c1 Y
c2 Y

c3 A
c4 B
c3 A
c4 B


c1 and c2 match twice, c3 and c4 match twice - but they match on different "zips"


not just c1, c2 - what is the output for c1,c2,c3,c4

Also, cust_1 might appear in MANY groups then right? - did you/do you consider that?

cust_1 might have 5 zip codes, of which 3 match cust_2, 3 match cust_3, 3 match cust_4 - but cust_2,3,4 do not match each other three times. What then? What does the output look like at that point?


Differences in data content

Eric, November 12, 2011 - 9:57 pm UTC

To help me compare the contents of two tables for I project I am working on I looked at ideas for a year and ended writing the following. I hope is it not a really old idea. It populates a table with the primary key, the column name and the data from both tables. In short the primary key is reported as a tilde delimited value, the two tables are joined by their primary keys and the differences reported. I have not used this for tables with more than 10 million rows. On smaller tables it will compare about 20 million values a minute on a slow server. Not sure how it handles big tables. One downside is NVL forces a full table scan but, we are looking at the entire table anyway.
--
create table data_differences ( pkey_string varchar2(200) ,
column_name varchar2(30) , table1_name varchar2(30) ,
table2_name varchar2(30), table1_value varchar2(240) ,
table2_value varchar2(240), job_date date )

create table data_job_status ( column_name varchar2(30) ,
transaction_date date )

create table table_one ( id number(4) , source varchar2(12) ,
name varchar2(20) , transaction_date date , userid varchar2(20) )

create table table_two as select * from table_one

alter table table_one add ( constraint table_one_pk primary key ( id , source ) )
alter table table_two add ( constraint table_two_pk primary key ( id , source ) )

insert into table_one values ( 1 , 'CORP' , 'SMITH' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER10' )
insert into table_one values ( 2 , 'CORP' , 'SMITH' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER11' )
insert into table_one values ( 3 , 'CORP' , 'SMITH' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER15' )

insert into table_two values ( 1 , 'CORP' ,'SMITH' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER1' )
insert into table_two values ( 2 , 'CORP' , 'JOHNSON' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER1' )
insert into table_two values ( 3 , 'CORP' , 'JOHNSON' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER1' )
--
-- 2 of 3 rows returned
PKEY_STRING COLUMN_NAME TABLE1_NAME TABLE2_NAME TABLE1_VALUE TABLE2_VALUE
-------------- ---------------- -------------- -------------- --------------- ------------
2~CORP NAME TABLE_ONE TABLE_TWO SMITH JOHNSON
3~CORP TRANSACTION_DATE TABLE_ONE TABLE_TWO 08-NOV-99 08-NOV-11
--
-- BEGIN
-- COMPARE_DATA2 ( 'GUEST' , 'GUEST' , 'TABLE_ONE' , 'TABLE_TWO' );
-- END
--
CREATE OR REPLACE PROCEDURE compare_table_content
( in_owner1 in varchar , in_owner2 in varchar ,
in_table1 in varchar , in_table2 in varchar ) is
n number(8);
x number(8);
c_col varchar2(30); /* column name */
c_nam varchar2(30); /* primary key name */
d_typ varchar2(12); /* data type */
q_col varchar2(30); /* column for primary key */
char_col varchar2(32); /* column name in quotes */
ch_table1 varchar2(32);
ch_table2 varchar2(32);
q_str varchar2(500); /* query string */
x_str varchar2(1000); /* execute string */
ins_str varchar2(2000); /* insert string */
key_str varchar2(500); /* delimited primary key */
whr_str varchar2(2000); /* where string */
nls_dte varchar2(20) := '''DD-MON-YYYY'''; /* date fmt */
--
-- default values NVL function
--
nvl_chr varchar2(20) := '''QqQqQq''';
nvl_dte varchar2(40) := 'sysdate + 100';
--
-- Get primary key
--
cursor c1 ( in_owner in varchar , in_table in varchar ) is
select a.constraint_name , a.column_name , a.position
from all_cons_columns a , all_constraints b
where a.table_name = upper ( in_table )
and b.table_name = a.table_name
and b.owner = upper ( in_owner )
and a.owner = b.owner
and a.constraint_name = b.constraint_name
and b.constraint_type = 'P'
order by a.position;
--
-- Get column names not in primary key and not excluded
-- In this example has been excluded.
--
cursor c2 ( in_owner in varchar , in_table in varchar ,
pkey_name in varchar ) is
select a.column_name , a.data_type
from all_tab_columns a
where a.table_name = upper ( in_table )
and a.owner = upper ( in_owner )
and a.data_type in ( 'VARCHAR2' , 'NUMBER' , 'DATE' )
and not exists ( select 1 from all_cons_columns b
where a.owner = b.owner
and a.table_name = b.table_name
and b.constraint_name = upper ( pkey_name )
and a.column_name = b.column_name )
and a.column_name != 'USERID'
order by a.column_id;

begin

execute immediate 'truncate table data_job_status reuse storage';
execute immediate 'truncate table data_differences reuse storage';
ch_table1 := chr(39)||upper ( in_table1 )||chr(39);
ch_table2 := chr(39)||upper ( in_table2 )||chr(39);

open c1 ( in_owner1 , in_table1 );
loop
fetch c1 into c_nam , q_col , n;
exit when c1%notfound;

if ( n = 1 ) then

ins_str := 'insert into data_differences select ';
whr_str := ' where a.'||q_col||' = b.'||q_col;
key_str := ' a.'||q_col;

else

whr_str := whr_str||' and a.'||q_col||' = b.'||q_col;
key_str := key_str||'||'||chr(39)||'~'||chr(39)||'|| a.'||q_col;

end if;

end loop;
close c1;

open c2 ( in_owner1 , in_table1 , c_nam );
loop
fetch c2 into c_col , d_typ;
exit when c2%notfound;

char_col := chr(39)||c_col||chr(39);

if ( d_typ = 'NUMBER' ) then

q_str := ' AND NVL ( A.'||c_col||' , -99.9191 ) '||
' != NVL ( B.'||c_col||' , -99.9191 )';

x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' to_char ( a.'||c_col||' ) , to_char ( b.'||c_col||' ) '||
' , sysdate from '||
' '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||
whr_str||' '||q_str;

elsif ( d_typ = 'DATE' ) then

q_str := ' AND NVL ( A.'||c_col||' , '||nvl_dte||' ) '||
' != NVL ( B.'||c_col||' , '||nvl_dte||' ) ';

x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' to_char ( a.'||c_col||' , '||nls_dte||' ) , '||
' to_char ( b.'||c_col||' , '||nls_dte||' ) , sysdate '||
' FROM '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||whr_str||' '||q_str;

else

q_str := ' AND NVL ( A.'||c_col||' , '||nvl_chr||' )'||
' != NVL ( B.'||c_col||' , '||nvl_chr||' )';

x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' a.'||c_col||' , b.'||c_col||' , sysdate '||
' FROM '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||whr_str||' '||q_str;

end if;

insert into data_job_status values ( c_col , sysdate );
commit;

execute immediate ( x_str );
commit;

end loop;
close c2;

commit;

end compare_table_content;
/

Comparing versioned records

Parthiban Nagarajan, March 03, 2012 - 2:39 am UTC

drop table my_cmp purge;

-- This is the main table
-- VERSION_NO is incremented for each UPDATE
-- The record before the change is copied to another table (my_cmp_audit - given below)
create table my_cmp
(
key_col1x varchar2(10),
key_col2x varchar2(10),
info_col1x date,
info_col2x number,
oth_col1x varchar2(10),
oth_col2x date,
oth_col3x number,
version_no number,
constraint my_cmp_pk primary key (key_col1x, key_col2x)
);

begin
insert into my_cmp values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hello', sysdate, 10, 5);
insert into my_cmp values ('k1', '2', to_date('03mar1970', 'dd-mon-yyyy'), 42, 'World', sysdate, 10, 2);
end;
/

drop table my_cmp_audit purge;

-- This table contains the history of records from my_cmp
create table my_cmp_audit
(
key_col1x varchar2(10),
key_col2x varchar2(10),
info_col1x date,
info_col2x number,
oth_col1x varchar2(10),
oth_col2x date,
oth_col3x number,
version_no number,
constraint my_cmp_audit_pk primary key (key_col1x, key_col2x, version_no)
);

-- Sample data
begin
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hello', sysdate , 10, 4);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hell' , trunc(sysdate), 10, 3);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Help' , sysdate , 10, 2);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 1 , 'xxxxx', sysdate-3333 , 5, 1);
--
insert into my_cmp_audit values ('k1', '2', to_date('03mar1970', 'dd-mon-yyyy'), 42, 'World', sysdate - 50, 10, 1);
end;
/

-- This view shows us the record level comparison side-by-side (using self-join)
-- INFO_COL is not compared; OTH_COLs are compared
-- OTH_COL is given $ suffix if it is from the AUDIT table
-- But this view information is just half the way
create or replace view MY_CMP_VU1 as
with all_recs as (
select KEY_COL1X, KEY_COL2X, INFO_COL1X, OTH_COL1X, OTH_COL2X, OTH_COL3X, VERSION_NO
from MY_CMP
union all
select KEY_COL1X, KEY_COL2X, INFO_COL1X, OTH_COL1X as OTH_COL1$, OTH_COL2X as OTH_COL2$, OTH_COL3X as OTH_COL3$, VERSION_NO
from MY_CMP_AUDIT
)
select KEY_COL1X, KEY_COL2X, a.INFO_COL1X, a.OTH_COL1X, a.OTH_COL2X, a.OTH_COL3X, b.OTH_COL1X as OTH_COL1$, b.OTH_COL2X as OTH_COL2$, b.OTH_COL3X as OTH_COL3$, a.VERSION_NO
from (select * from all_recs) a
join (select * from all_recs) b
using (KEY_COL1X, KEY_COL2X)
where b.VERSION_NO = a.VERSION_NO - 1;

select * from my_cmp_vu1 order by key_col1x, key_col2x, version_no;

KEY_COL1X KEY_COL2X INFO_COL1X OTH_COL1X OTH_COL2X OTH_COL3X OTH_COL1$ OTH_COL2$ OTH_COL3$ VERSION_NO
---------- ---------- ---------------- ---------- ---------------- ---------- ---------- ---------------- ---------- ----------
k1 1 16/02/1985 00:00 Help 03/03/2012 14:03 10 xxxxx 17/01/2003 14:03 5 2
k1 1 16/02/1985 00:00 Hell 03/03/2012 00:00 10 Help 03/03/2012 14:03 10 3
k1 1 16/02/1985 00:00 Hello 03/03/2012 14:03 10 Hell 03/03/2012 00:00 10 4
k1 1 16/02/1985 00:00 Hello 03/03/2012 14:03 10 Hello 03/03/2012 14:03 10 5
k1 2 03/03/1970 00:00 World 03/03/2012 14:03 10 World 13/01/2012 14:03 10 2


-- This view gives the complete and expected result
create or replace view my_cmp_vu2 as
select key_col1x, key_col2x, info_col1x,
field_name, old_value, new_value
from
(
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL1X' field_name, to_char(oth_col1x) new_value, to_char(oth_col1$) old_value
from my_cmp_vu1
where oth_col1x <> oth_col1$
union all
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL2X', to_char(oth_col2x), to_char(oth_col2$)
from my_cmp_vu1
where oth_col2x <> oth_col2$
union all
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL3X', to_char(oth_col3x), to_char(oth_col3$)
from my_cmp_vu1
where oth_col3x <> oth_col3$
)
order by 1, 2, version_no;

select * from my_cmp_vu2 order by key_col1x, key_col2x;

KEY_COL1X KEY_COL2X INFO_COL1X FIELD_NAM OLD_VALUE NEW_VALUE
---------- ---------- ---------------- --------- ---------------------------------------- ----------------------------------------
k1 1 16/02/1985 00:00 OTH_COL1X xxxxx Help
k1 1 16/02/1985 00:00 OTH_COL2X 17/01/2003 14:03 03/03/2012 14:03
k1 1 16/02/1985 00:00 OTH_COL3X 5 10
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 14:03 03/03/2012 00:00
k1 1 16/02/1985 00:00 OTH_COL1X Help Hell
k1 1 16/02/1985 00:00 OTH_COL1X Hell Hello
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 00:00 03/03/2012 14:03
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 14:03 03/03/2012 14:03
k1 2 03/03/1970 00:00 OTH_COL2X 13/01/2012 14:03 03/03/2012 14:03

But when I think about the performance, its hard for me to accept this as the optimal solution.
Because, in the production environment, the data volume would be large and the columns to be compared also >200
Please give me some idea.

Thanks and regards
Tom Kyte
March 03, 2012 - 8:54 am UTC

what do you need to optimize for? to retrieve a single record, a hundred records via some predicate, all records? what is large to you?

Re: Comparing versioned records

Parthiban Nagarajan, March 04, 2012 - 2:08 am UTC

Hi Tom

I want to scan all the records of a monthly partition.
This is to report how the records are getting mutated.
So, every month, a new partition is added and at the end of the month, the report is generated for that partition.

Also, there are a couple of tables which are list partitioned on a region flag. This needs to be scanned based on the LAST_UPDATE_DATETIME field.

Thanks and regards
Tom Kyte
March 04, 2012 - 9:17 am UTC

who would read such a lengthly report? Is this *really* a need? Since it can be generated anytime, for any set of records of interest - why would you generate a report no human being on the planet could even ever read/understand/make use of?

Do you really want to do this? truly? It seems like a big waste?

You can either do the self join as you are doing or union all this data together and use lag or lead to marry the current row with the previous or next row to do the comparision.

but - if you ask me - I wouldn't do it, no one could ever look at it.

Re: Comparing versioned records

Parthiban Nagarajan, March 04, 2012 - 10:33 am UTC

Hi Tom

Believe it or not. We are already doing this.
But with a package, object types and its corresponding table types, with row-by-row processing with dynamic SQL to collect the column_name and etc.

I just want to make it simple so that the maintenance of the code would be easy.

I am not sure how the report is used. They might check only for a few keys (or records) based on some other report.
Also, this report is to list only the records that are modified (i.e. version_no > 1 or the records which have at least one matching record in the audit table).

Please give me some idea to tune the 2nd view or if this cannot be tuned, please tell me your comments on the 2nd view at the performance angle.

And my opinion is this should be better when compared to its existing form (package, dynamic sql, etc.)

Thanks and regards
Tom Kyte
March 05, 2012 - 6:53 am UTC

ops$tkyte%ORA11GR2> with data as (select level l from dual connect by level <= 3)
  2  select *
  3    from (
  4  select key_col1x, key_col2x, info_col1x, version_no,
  5         'OTH_COL'||l||'X' field_name,
  6         decode( l, 1, to_char(oth_col1x), 2, to_char(oth_col2x), 3, to_char(oth_col3x) ) new_value,
  7         decode( l, 1, to_char(oth_col1$), 2, to_char(oth_col2$), 3, to_char(oth_col3$) ) old_value
  8    from my_cmp_vu1, data
  9         )
 10   where decode( new_value, old_value, 1, 0 ) = 0
 11  /

KEY_COL1X  KEY_COL2X  INFO_COL1 VERSION_NO FIELD_NAME NEW_VALUE  OLD_VALUE
---------- ---------- --------- ---------- ---------- ---------- ----------
k1         1          16-FEB-85          4 OTH_COL1X  Hello      Hell
k1         1          16-FEB-85          3 OTH_COL1X  Hell       Help
k1         1          16-FEB-85          2 OTH_COL1X  Help       xxxxx
k1         1          16-FEB-85          2 OTH_COL2X  05-MAR-12  19-JAN-03
k1         1          16-FEB-85          2 OTH_COL3X  10         5
k1         2          03-MAR-70          2 OTH_COL2X  05-MAR-12  15-JAN-12

6 rows selected.



try that one on (or look up pivot if you are in 11g)

Re: Comparing versioned records

Parthiban Nagarajan, March 06, 2012 - 10:37 am UTC

Hi Tom

Yes. As you mentioned, I need to UNPIVOT. I did it with the CROSS JOIN and DECODE, as of now. I will try the UNPIVOT operator also.

Thanks for your invaluable guidance on this.

Thanks and regards

Compare Tables & Aggregate Results

Ian Wallace, January 02, 2013 - 5:23 am UTC

Hi Tom,

This has been a great thread.

If I run the following query as suggested above:

SELECT source_code,
lcn_id,
ssm_id,
source_client,
tct_type,
tct_subtype,
value_date,
interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned,
COUNT(src1) base_tab,
COUNT(src2) refresh_tab
from
( SELECT a.*,
1 src1,
TO_NUMBER(NULL) src2
FROM tr_owner2.daily_interest_active2 a
UNION ALL
SELECT b.*,
TO_NUMBER(NULL) src1,
2 src2
FROM tr_owner.daily_interest_active2 b
)
GROUP BY source_code,
lcn_id,
ssm_id,
source_client,
tct_type,
tct_subtype,
value_date,
interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned
HAVING COUNT(src1) <> count(src2);

I get:

source_code lcn_id ssm_id source_client tct_type tct_subtype value_date interest_accrued interest_earned ccy_id interest_rate incremental_earned_home incremental_earned base_tab refresh_tab
----------- ------ ------ ------------- -------- ----------- ---------- ---------------- --------------- ------ ------------- ----------------------- ------------------ -------- -----------
111111 0IRE AC 999999 INT 01/04/2006 526.08 -19.13 GBP 4.6 1 0
111111 0IRE AC 999999 INT 01/04/2006 526.08 -19.13 GBP 4.9 0 1

The PK for both tables is:
source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date

What I would like to do is combine both rows into one row based on PK as follows:

source_code lcn_id ssm_id source_client tct_type tct_subtype value_date interest_accrued_old interest_accrued_new interest_earned_old interest_earned_new interest_rate_old interest_rate_new
----------- ------ ------ ------------- -------- ----------- ---------- -------------------- -------------------- ------------------- ----------------------- ------------------ -----------------
111111 0IRE AC 999999 INT 01/04/2006 526.08 526.08 -19.13 -19.13 4.6 4.9

I'm stuck however on just how to go about doing that?

Aplogies if this has already been answered.

Thanks,
Ian.





Tom Kyte
January 04, 2013 - 2:45 pm UTC

what makes one old or new??? I'll assume base_tab is "old" and refresh_tab is "new"

anyway - you have your query Q above, it returns a "table".

that table includes:

source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date


interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned,


and the base_tab, refresh_tab counts

You want to 'pivot' the interest_accrued, interest_earned and interest_rates. You can use pivot in 11g - or in all releases "max(decode)", I'll do max(decode) since you didn't say...


select source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date,
       max(decode( base_tab, 1, interest_accrued )) ia_old, 
       max(decode( refresh_tab, 1, interest_accrued )) ia_new,
       ... same for the other two columns ...
  from (Q)
 group by source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date


had I been given a create table, some inserts (a test case in short), I could have written the query syntactically correct - but I wasn't so this is just psuedo code... search this site for pivot for full examples....


I don't know how does query work....confused

Texas Cowboy, June 10, 2013 - 4:03 pm UTC

Tom,
your responses to several discussions have been awesome. I just have a question about your query (shown below) to several posts.
I just don't know why you chose 1 for src1 and 2 for src2 in the UNION ALL subquery. I am having trouble understanding how this query compares the data and shows only differences. What effect does 1 and 2 have when you count the records in outer query

select c1, c2, c3,
2 count(src1) CNT1,
3 count(src2) CNT2
4 from
5 ( select a.*,
6 1 src1,
7 to_number(null) src2
8 from a
9 union all
10 select b.*,
11 to_number(null) src1,
12 2 src2
13 from b
14 )
15 group by c1,c2,c3
16 having count(src1) <> count(src2)
17 /
Tom Kyte
June 18, 2013 - 2:42 pm UTC

they could have been 0, 0 - or anything. Just needed something (not null) to count. 1 and 2 seemed "appropriate" given their name source 1 and source 2.



If you have a key (c1,c2,c3) that exists only in A, then "count(src1) cnt1" would be 1 and "count(src2) cnt2" would be 0 - we would see the row exists in A but not B.

and vice versa.

if the key (c1,c2,c3) existed in both tables - then count(src1/src2) would be 1 and we would not display that row.

Yes now I understand....

Texas Cowboy, June 18, 2013 - 4:04 pm UTC

I am such a dolt for not reading it carefully for the first time. I was thinking you were doing SUM() instead of COUNT(), so naturally I thought why would 1 ever equal to 2. Alas, it was COUNT() and your explanation totally made sense. I took this query, modified it to fit my tables, and it worked very well.
Count of 2 to 1 or 1 to 2 means there is a duplicate in one table.
Count of 1 to 0 or 0 to 1 means there is no row of that pk in one table.
count of 1 and 1 will not display because the row is in both tables and it's same.


oracle

A reader, December 29, 2015 - 12:13 pm UTC

nice

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library