I need the change in each column written as a separate row
sachin, December 06, 2001 - 3:15 pm UTC
Thanks for the answer, but I need to log the change to each column as a separate row in that other table.
For example:
select '1' a,'2' b,'3' c from dual
minus
select '1' a,'4' b,'6' c from dual
I get the following result:
A B C
- - -
1 2 3
I need that other table to contain the following rows:
column_name old_val new_val
b 2 4
c 3 6
December 07, 2001 - 8:56 am UTC
You would need a full outer join in order to do that (since the row could be in T1 and not in T2 or in T2 and not in T1).
Is trans_id the primary key?
yes trans_id is a primary key
sachin, December 07, 2001 - 9:52 am UTC
Can you give a sample query with that outer join?
Thanks
December 07, 2001 - 1:43 pm UTC
Say you have the tables set up like this:
tkyte@TKYTE9I.US.ORACLE.COM> create table trans (trans_id int primary key,
2 c1 date,
3 c2 number,
4 c3 varchar2(10)
5 )
6 /
Table created.
Elapsed: 00:00:00.00
tkyte@TKYTE9I.US.ORACLE.COM>
tkyte@TKYTE9I.US.ORACLE.COM> create table trans_tmp (trans_id int primary key,
2 c1 date,
3 c2 number,
4 c3 varchar2(10)
5 )
6 /
Table created.
Elapsed: 00:00:00.03
tkyte@TKYTE9I.US.ORACLE.COM>
tkyte@TKYTE9I.US.ORACLE.COM> create table changes
2 ( trans_id int, cname varchar2(10), val1 varchar2(4000), val2 varchar2(4000) );
Table created.
This would capture the changes:
tkyte@TKYTE9I.US.ORACLE.COM> select *
2 from (
3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
5 decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
6 2, to_char(t1.c2),
7 3, t1.c3) v1,
8 decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
9 2, to_char(t2.c2),
10 3, t2.c3) v2
11 from trans t1,
12 trans_tmp t2,
13 (select rownum r from all_objects where rownum <= 3 )
14 where t1.trans_id = :bv
15 and t1.trans_id = t2.trans_id (+)
16 )
17 where v1 <> v2
18 or (v1 is null and v2 is not null)
19 or (v1 is not null and v2 is null)
20 /
Where 3 in this query represents the number of columns we need to compare. The decodes convert everything into character strings for comparison...
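Just to connect the dots with the CHANGES table created above, the same query can feed it directly -- a sketch only, using the same :bv bind and the same three-column compare:

insert into changes ( trans_id, cname, val1, val2 )
select trans_id, cname, v1, v2
  from (
        select t1.trans_id,
               decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
               decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
                          2, to_char(t1.c2),
                          3, t1.c3) v1,
               decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
                          2, to_char(t2.c2),
                          3, t2.c3) v2
          from trans t1,
               trans_tmp t2,
               (select rownum r from all_objects where rownum <= 3 )
         where t1.trans_id = :bv
           and t1.trans_id = t2.trans_id (+)
       )
 where v1 <> v2
    or (v1 is null and v2 is not null)
    or (v1 is not null and v2 is null)
/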
outer join query
sachin, December 07, 2001 - 11:45 am UTC
tom,
can you post the outer join query that you talked about in your answer please.
trans_id is a primary key.
outer join on the other table
sachin, December 07, 2001 - 2:48 pm UTC
tom,
I think your query above does the outer join on only one table, not on both. Am I right?
What about the rows that are in t2 (trans_tmp) but not in t1 (trans)?
December 07, 2001 - 3:18 pm UTC
Ok, try this one on for size then:
select *
from (
select t1.trans_id,
decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
2, to_char(t1.c2),
3, t1.c3) v1,
decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
2, to_char(t2.c2),
3, t2.c3) v2
from
(select rownum rr, trans.* from trans where trans_id = :bv ) t1,
(select rownum rr, trans_tmp.* from trans_tmp where trans_id = :bv) t2,
(select rownum r from all_objects where rownum <= 3 ) t3,
(select rownum rr from dual) t4
where t1.rr(+) = t4.rr
and t2.rr(+) = t4.rr
)
where v1 <> v2
or (v1 is null and v2 is not null)
or (v1 is not null and v2 is null)
/
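On 9i and later the same thing can be expressed with ANSI join syntax, which some find easier to read -- a sketch only, same bind variable, same three-column compare:

select *
  from (
        select nvl(t1.trans_id, t2.trans_id) trans_id,
               decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
               decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
                          2, to_char(t1.c2),
                          3, t1.c3) v1,
               decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
                          2, to_char(t2.c2),
                          3, t2.c3) v2
          from (select * from trans     where trans_id = :bv) t1
               full outer join
               (select * from trans_tmp where trans_id = :bv) t2
                 on (t1.trans_id = t2.trans_id)
               cross join
               (select rownum r from all_objects where rownum <= 3) t3
       )
 where v1 <> v2
    or (v1 is null and v2 is not null)
    or (v1 is not null and v2 is null)
/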
What about when you just have raw data?
Duncan, January 08, 2004 - 7:04 am UTC
Tom
I have a procedure which has many input parameters representing elements of table data. Once these parameters are read into the procedure I validate each element. Once the data line has been validated I then need to check that the record doesn't already exist in my table.
I do this by checking a few fields of demographic data, e.g. name, address. If the record doesn't exist then I insert it.
However, if the record does exist I need to update all the other fields of the row with the new data fields read into the procedure. I also need to keep track of which fields have actually changed from their original table data. Obviously I can do this the long way around by checking individual fields and comparing their table data to the new data, but I wanted to know if you have a quicker, more efficient method, since my table is very wide and checking individual elements is tedious, especially because I have to account for NULL values.
I was wondering about loading the data line into maybe a RECORD type and comparing to the table data, but I understand that you cannot directly compare RECORD types.
Would another type work, e.g. an Oracle OBJECT? If so, how could I get it to log which fields have been changed?
There must be a quick less painful way to do this as I can't believe it is that uncommon?
Many thanks for your help
(db version 8.1.7.4)
January 08, 2004 - 1:48 pm UTC
to check field by field you have one and only one option:
to check field by field.
(it is rather "uncommon", I've not run into it as a requirement very often).
the "easiest" way I can think to do this is:
ops$tkyte@ORA920PC> create table t ( a int, b date, c varchar2(25) );
Table created.
<b>that represents your table. A is your "key", b, c the data you update</b>
ops$tkyte@ORA920PC> insert into t values ( 1, trunc(sysdate), 'hello world' );
ops$tkyte@ORA920PC> insert into t values ( 2, null, 'hello world' );
ops$tkyte@ORA920PC> insert into t values ( 3, trunc(sysdate), null );
ops$tkyte@ORA920PC> create or replace view v
2 as
3 select b, 0 b_ind, c, 0 c_ind
4 from t
5 where 1=0;
View created.
<b>I'm using this view as a template. Basically, we'll have column/indicator, column/indicator in this view.
The view is used to pick up the datatypes from the data dictionary easily and make it so that a change to the base table changes the view straight away (and so we don't have to define the record in plsql)...
Instead of a view, this could be a cursor in the package
Here is your routine. note how we use DECODE to do the compares (null safe)</b>
ops$tkyte@ORA920PC> create or replace procedure do_iu( p_a in int, p_b in date, p_c in varchar2 )
2 as
3 l_rec v%rowtype;
4 begin
5 select b, decode( b, p_b, 0, 1 ),
6 c, decode( c, p_c, 0, 1 )
7 into l_rec
8 from t
9 where a = p_a;
10
11 if ( l_rec.b_ind = 1 )
12 then
13 dbms_output.put_line( 'B will be changed from "' || l_rec.b || '" to ' || p_b );
14 end if;
15 if ( l_rec.c_ind = 1 )
16 then
17 dbms_output.put_line( 'C will be changed from "' || l_rec.c || '" to ' || p_c );
18 end if;
19
20 update t
21 set b = p_b, c = p_c
22 where a = p_a;
23 dbms_output.put_line( 'Updated ' || sql%rowcount );
24 exception
25 when no_data_found then
26 dbms_output.put_line( 'Inserted' );
27 insert into t (a,b,c) values ( p_a, p_b, p_c );
28 end;
29 /
Procedure created.
<b>and then to test...</b>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 1, sysdate-1, 'hello world' );
B will be changed from "08-JAN-04" to 07-JAN-04
Updated 1
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC> exec do_iu( 1, trunc(sysdate), 'goodbye' );
B will be changed from "07-JAN-04" to 08-JAN-04
C will be changed from "hello world" to goodbye
Updated 1
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC> exec do_iu( 1, sysdate, 'goodbye' );
B will be changed from "08-JAN-04" to 08-JAN-04
Updated 1
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 2, sysdate, 'hello world' );
B will be changed from "" to 08-JAN-04
Updated 1
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC> exec do_iu( 3, trunc(sysdate), 'hello world' );
C will be changed from "" to hello world
Updated 1
PL/SQL procedure successfully completed.
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec do_iu( 4, sysdate, 'hello world' );
Inserted
PL/SQL procedure successfully completed.
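If the changes need to be persisted rather than printed, only the dbms_output calls would change -- a minimal sketch, assuming a hypothetical audit table T_AUDIT (the name and shape are mine, not from the example above):

create table t_audit ( a int, cname varchar2(30), old_val varchar2(4000), new_val varchar2(4000) );

-- then, inside do_iu, in place of the two dbms_output.put_line calls:
if ( l_rec.b_ind = 1 )
then
    insert into t_audit ( a, cname, old_val, new_val )
    values ( p_a, 'B', to_char(l_rec.b,'dd-mon-yyyy hh24:mi:ss'), to_char(p_b,'dd-mon-yyyy hh24:mi:ss') );
end if;
if ( l_rec.c_ind = 1 )
then
    insert into t_audit ( a, cname, old_val, new_val )
    values ( p_a, 'C', l_rec.c, p_c );
end if;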
Read the tables only once!
Marco Stefanetti, February 03, 2004 - 2:03 pm UTC
I have found this simple solution; the benefit is that I read the two tables only once:
1 select field,
2 value,
3 sum(source) source
4 from (
5 (
6 select 'a' field,1 source, 20 value from dual
7 union
8 select 'b' field,1 source, 15 value from dual
9 union
10 select 'c' field,1 source, 30 value from dual
11 )
12 union all
13 (
14 select 'b' field,2 source, 15 value from dual
15 union
16 select 'c' field,2 source, 24 value from dual
17 union
18 select 'd' field,2 source, 13 value from dual
19 )
20 )
21 group by field, value
22* having sum(source) < 3
F VALUE SOURCE
- ---------- ----------
a 20 1
c 24 2
c 30 1
d 13 2
bye
February 03, 2004 - 2:13 pm UTC
huh?
you do realize that you read dual 6 times and not only that but you did a sort/distinct on it....
And you'd read your TABLE 2 times for each column in it in general?
how'd you get "that i read the two tables only once:"?
(and not only that, but it gets "the wrong answer" to boot -- what happens with two tables:
T1
1 2
2 1
T2
1 1
2 2
Your technique would say "they are the same", but they are not.
Not sure if you meant what you typed in as the example query -- why don't you work with real tables?
you are too fast to respond
Marco Stefanetti, February 03, 2004 - 2:31 pm UTC
try this :
create table a (field char,source number,value number);
insert into a values('a',1,10);
insert into a values('b',1,10);
insert into a values('c',1,10);
insert into a values('1',1, 2);
insert into a values('2',1, 1);
select * from a;
create table b (field char,source number,value number);
insert into b values('b',2,10);
insert into b values('c',2,12);
insert into b values('d',2,10);
insert into b values('1',2, 1);
insert into b values('2',2, 2);
select * from b;
select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3;
---------------------------------------
results:
select * from a;
F SOURCE VALUE
- ---------- ----------
a 1 10
b 1 10
c 1 10
1 1 2
2 1 1
select * from b;
F SOURCE VALUE
- ---------- ----------
b 2 10
c 2 12
d 2 10
1 2 1
2 2 2
select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3;
F VALUE SOURCE
- ---------- ----------
1 1 2
1 2 1
2 1 1
2 2 2
a 10 1
c 10 1
c 12 2
d 10 2
Record "b 10" doesn't compare!!!!!
I read table a once
I read table b once
see you
February 03, 2004 - 6:15 pm UTC
that is not at all what you posted -- not at all -- see what you posted, it would appear in your above example that you were PIVOTING fields named "a", "b", and "c"
that is why I said "do a real example, for yours does not make sense as presented"
what was the purpose of the unions of duals -- they only confused the issue. I thought you were PIVOTING a table with 3 columns A, B and C -- that would not work.
Now I see your general idea and say "looks good". I would however do it like this:
select id, count(src1), count(src2)
from ( select a.*, 1 src1, to_number(null) src2 from big_table a
union all
select b.*, to_number(null) src1, 2 src2 from big_table b
)
group by ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP,
STATUS, TEMPORARY, GENERATED, SECONDARY
having count(src1) <> count(src2)
That does effectively the same thing -- BUT works even if table A has a row three times and table B does not have it at all (you would get sum(source) = 3 in that case)
This'll show how many rows are in each table when they are not equal, will tolerate duplicate rows and show the mismatches.
Thanks for the interesting new twist -- it does do less work in most cases -- the group by could be killer on really wide rows.
ps
Marco Stefanetti, February 03, 2004 - 2:44 pm UTC
select field,
sum( decode(source,1,value,null) ) table_a,
sum( decode(source,2,value,null) ) table_b
from
( select field,
value,
sum(source) source
from (
( select * from a )
union all
( select * from b )
)
group by field, value
having sum(source) < 3
)
group by field;
F TABLE_A TABLE_B
- ---------- ----------
1 2 1
2 1 2
a 10
c 10 12
d 10
bye
just another example
Marco Stefanetti, February 03, 2004 - 4:19 pm UTC
You can compare 2 fields of 3 tables
------------------------------------------------------------------------------------------------------------
create table a ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into a values('A',10,'aaa');
insert into a values('B',20,'bbb');
insert into a values('C',30,'ccc');
insert into a values('D',40,'ddd');
insert into a values('E',50,'eee');
create table b ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into b values('B',20,'bbb');
insert into b values('C',30,'ccc');
insert into b values('D',41,'ddd');
insert into b values('E',50,'eee');
create table c ( key varchar2(10), value1 number, value2 varchar2(10) );
insert into c values('C',30,'ccc');
insert into c values('D',40,'ddd');
insert into c values('E',50,'eeX');
------------------------------------------------------------------------------------------------------------
select key,
max( decode( source_code_tot, 1, value1, 3, value1, 5, value1, 7, value1, null ) ) a_v1,
max( decode( source_code_tot, 1, value2, 3, value2, 5, value2, 7, value2, null ) ) a_v2,
max( decode( source_code_tot, 2, value1, 3, value1, 6, value1, 7, value1, null ) ) b_v1,
max( decode( source_code_tot, 2, value2, 3, value2, 6, value2, 7, value2, null ) ) b_v2,
max( decode( source_code_tot, 4, value1, 5, value1, 6, value1, 7, value1, null ) ) c_v1,
max( decode( source_code_tot, 4, value2, 5, value2, 6, value2, 7, value2, null ) ) c_v2,
decode( source_code_tot, 7, 'OK', 'DIFF') descr
from
(
select key,
value1,
value2,
sum(source_code) source_code_tot
from (
( select a.*, 1 source_code from a)
union all
( select b.*, 2 source_code from b)
union all
( select c.*, 4 source_code from c)
)
group by key, value1, value2
)
group by key, decode( source_code_tot, 7, 'OK', 'DIFF')
order by key;
KEY A_V1 A_V2 B_V1 B_V2 C_V1 C_V2 DESCR
---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------------------------
A 10 aaa DIFF
B 20 bbb 20 bbb DIFF
C 30 ccc 30 ccc 30 ccc OK
D 40 ddd 41 ddd 40 ddd DIFF
E 50 eee 50 eee 50 eeX DIFF
------------------------------------------------------------------------------------------------------------
Is it strange to sum the labels 1,2,4?
not so strange, table d would have source_code=8
Useful?
yes, if one of the tables is a complex view on a remote db, and you can read it only once
fast?
I compare 10000 rows of a remote table in 2 seconds.
let me know ...
Multiple records
Marco Stefanetti, February 04, 2004 - 12:35 pm UTC
YOU:
that is not at all what you posted -- not at all -- see what you posted, it
would appear in your above example that you were PIVOTING fields named "a", "b",
and "c"
ME:
Sorry, the union select in the first of my posts was simply a table example,
but it was confusing and, sure, it is better with real tables
YOU:
select id, count(src1), count(src2)
from ( select a.*, 1 src1, to_number(null) src2 from big_table a
union all
select b.*, to_number(null) src1, 2 src2 from big_table b
)
group by ID, OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP,
STATUS, TEMPORARY, GENERATED, SECONDARY
having count(src1) <> count(src2)
That does effectively the same thing -- BUT works even if table A has a row
three times and table B does not have it at all (you would get sum(source) = 3
in that case)
ME:
I'm not sure, you are not considering the content of the fields, only keys.
YOU:
the group by could be killer on really wide rows.
ME:
Yes, but I think that something "killer" would be necessary in any case
YOU: BUT works even if table A has a row three times
ME: I have got another idea, hope i'm right
we can use the product of primes instead of the sum of binary numbers
PROBLEM:
find differences and missing records between similar tables
TABLES:
create table a ( key varchar2(10), value1 varchar2(10) );
insert into a values('A','aaa');
insert into a values('A','aaa');
insert into a values('B','bbb');
insert into a values('B','bbb');
insert into a values('C','ccc');
insert into a values('D','ddd');
insert into a values('E','eee');
create table b ( key varchar2(10), value1 varchar2(10) );
insert into b values('B','bbb');
insert into b values('C','ccc');
insert into b values('C','ccc');
insert into b values('D','ddd');
insert into b values('E','eee');
create table c ( key varchar2(10), value1 varchar2(10) );
insert into c values('C','ccc');
insert into c values('C','ccc');
insert into c values('D','ddd');
insert into c values('E','eeX');
SELECT:
select
key,
decode( mod(source_code_tot,2), 0, value1, null) a,
decode( mod(source_code_tot,3), 0, value1, null) b,
decode( mod(source_code_tot,5), 0, value1, null) c,
source_code_tot p,
mod(source_code_tot,30) r,
decode( mod(source_code_tot,30), 0, 'OK', 'DIFF') res
from
(
select key,
value1,
trunc( exp( sum( ln( source_code) ) ) ) source_code_tot
from (
( select a.*, 2 source_code from a)
union all
( select b.*, 3 source_code from b)
union all
( select c.*, 5 source_code from c)
)
group by key, value1
)
;
RESULTS:
KEY A B C P R RES
----- ----- ----- ----- ----- ----- -----
A aaa 4 4 DIFF
B bbb bbb 12 12 DIFF
C ccc ccc ccc 450 0 OK
D ddd ddd ddd 30 0 OK
E eeX 5 5 DIFF
E eee eee 6 6 DIFF
It seems to work, but may be inaccurate due to exp(sum(ln(x))) because of approximations
STEPS:
1) union all : we consider all records of our tables, labelling them with a prime number
2) group by key, value1 : we consider records with different values as different (difference between records);
at the same time we calculate the product of the labels (exp sum ln) -- this is a single number that memorizes the
provenance of the records in the group
3) decode(mod,30) : we can have a total that is a multiple of 30 (2*3*5) if, and only if, the record was present in all three sources.
I'm sorry for the long posts, but it seems to me quite interesting.
February 04, 2004 - 5:42 pm UTC
watch out - that exp(sum(ln(x))) would be a killer cpu wise!!
I like your approach with the minor modification I have (which permits duplicates -- I'm not assuming "keys" of any sort here at all).
The group by approach (given that you don't get sort key too long) is elegant and efficient.
Sorry
Marco Stefanetti, February 04, 2004 - 1:29 pm UTC
Sorry, you are right.
Considering the above tables, this is enough:
select key,
decode( ta, 0, null, value1) a,
decode( tb, 0, null, value1) b,
decode( tc, 0, null, value1) c,
decode( ta, 0, 'DIFF',
decode( tb, 0, 'DIFF',
decode( tc, 0, 'DIFF', 'OK' ) ) ) res
from
(
select key,
value1,
sum(ta) ta,
sum(tb) tb,
sum(tc) tc
from (
( select a.*, 1 ta, 0 tb, 0 tc from a)
union all
( select b.*, 0 ta, 1 tb, 0 tc from b)
union all
( select c.*, 0 ta, 0 tb, 1 tc from c)
)
group by key, value1
)
RESULTS:
KEY A B C RES
---------- ---------- ---------- ---------- ---------------
A aaa DIFF
B bbb bbb DIFF
C ccc ccc ccc OK
D ddd ddd ddd OK
E eeX DIFF
E eee eee DIFF
OK, stop here, I think we have found a good solution.
Comparing data between two tables with similar data types
Bonnie, November 22, 2004 - 7:51 pm UTC
Hi Tom,
I was also having problems comparing data from two similar tables, and tried your solution using the decode statement. However, since I'm comparing 202 columns between the two tables, I'm getting an 'ORA-00939: too many arguments for function' error. Any ideas on how I can resolve this?
Your help is much appreciated.
November 22, 2004 - 8:11 pm UTC
decode stops at 255.
so, you'd have to do at least 2 decodes
3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) || decode( r, 4, c4, 5, c5 ) cname,
stopping the first decode with 200 or so inputs and starting the next.
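On a small scale the split looks like this (my own illustration, not the actual 202 columns) -- for a 4-column compare the first decode handles r = 1..2 and the second r = 3..4; a decode with no match returns NULL, so the concatenation is safe:

select r,
       decode( r, 1, 'C1', 2, 'C2' ) ||
       decode( r, 3, 'C3', 4, 'C4' ) cname
  from (select rownum r from all_objects where rownum <= 4)
/

The "rownum <= N" inline view just grows to match the total number of columns being compared.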
Comparing data between two tables with similar data types
Bonnie, December 01, 2004 - 1:03 pm UTC
Thank you so much for that. It took a while to sort through all the different variables on my two tables as they were not always consistent. Anyhow, I was wondering if you know how I can resolve this dynamic SQL:
-- this first one returns the output of
s_PlSqlBlock :=
'UPDATE TRANS
SET '||C1||' = to_date('||c_cursor.V2||',''dd-mon-yyyy hh24:mi:ss'')'||
' WHERE trans_id = '||c_cursor.trans_id;
the output from a select statement is:
UPDATE TRANS
SET C1= to_date(27-apr-2004 10:45:00,'dd-mon-yyyy hh24:mi:ss')
WHERE trans_id= 3;
-- notice there are no quotes around the '27-apr-2004 10:45:00', which caused the update statement to fail with an 'ORA-00904: invalid column name' error.
The following also doesn't work.
s_PlSqlBlock :=
'UPDATE TRANS
SET '||C1||' = to_date('||'''c_cursor.V2'''||',''dd-mon-yyyy hh24:mi:ss'')'||
' WHERE trans_id = '||c_cursor.trans_id;
Any ideas how I can make this work?
Thank you so much!!
December 01, 2004 - 1:49 pm UTC
USE BIND VARIABLES!!!!!!!
don't put literals in SQL like that.
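In other words, keep the column name concatenated (identifiers cannot be bound) but bind the values -- a sketch only, assuming c_cursor.V2 holds the date as a 'dd-mon-yyyy hh24:mi:ss' string as in your example:

s_PlSqlBlock := 'update trans set ' || c1 ||
                ' = to_date( :new_val, ''dd-mon-yyyy hh24:mi:ss'' )' ||
                ' where trans_id = :tid';

execute immediate s_PlSqlBlock using c_cursor.v2, c_cursor.trans_id;

The value travels as a bind, so there are no quoting problems, no chance of that ORA-00904, and the statement can be reused instead of hard parsed for every row.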
comparing two tables
biju, December 08, 2004 - 11:50 am UTC
I have two tables (tb1, tb2) with the same fields (id (pk), name, age, age_ret).
I want a query which will give me a table (tmp) containing (id, name, age in tb1, age in tb2, age_ret in tb1, age_ret in tb2)
if there is a change in age or age_ret in table tb2 compared to tb1.
Can you help me out?
December 09, 2004 - 12:38 pm UTC
did you read the above examples? you are 99.9% of the way there.
not too much left to do on your part.
Table/Column Compare
Ivan, January 28, 2005 - 5:49 pm UTC
Regarding Marco's February 04, 2004 result
1) I should think that the 3 tables' sets of relevant column(s) could be considered equal
only if they have an equal count of (any) duplicate values, along with having equal values.
In his example - with a key of 'C' - Table A has only 1 row while the other tables have 2 rows,
so it could be "DIFF", not "OK"?
2) The DECODE can be replaced with a CASE to eliminate DECODE's above stated limitation.
So another query could be:
select key,
decode( ta, 0, null, value1) a,
decode( tb, 0, null, value1) b,
decode( tc, 0, null, value1) c,
--decode( ta, 0, 'DIFF',
-- decode( tb, 0, 'DIFF',
-- decode( tc, 0, 'DIFF', 'OK' ) ) ) res
(CASE WHEN ta = tb AND ta = tc THEN 'OK' ELSE 'DIFF' END) res
from
(
select key,
value1,
sum(ta) ta,
sum(tb) tb,
sum(tc) tc
from (
( select a.*, 1 ta, 0 tb, 0 tc from a)
union all
( select b.*, 0 ta, 1 tb, 0 tc from b)
union all
( select c.*, 0 ta, 0 tb, 1 tc from c)
)
group by key, value1
);
KEY A B C RES
---------- ---------- ---------- ---------- ----
A aaa DIFF
B bbb bbb DIFF
C ccc ccc ccc DIFF
D ddd ddd ddd OK
E eeX DIFF
E eee eee DIFF
F DIFF
A reader, February 11, 2005 - 3:23 am UTC
Hi Tom,
We want to compare the data contents of tables A and B. We know that each record is unique in tables A and B, and the number of records in A and B is the same. If A minus B and B minus A return no rows, then can we conclude that the data in A and B are the same?
Thanks
A reader, February 11, 2005 - 8:13 pm UTC
Tom,
Is there some package in Streams to compare tables like the one in replication.
Thanks.
Your link is very useful
A reader, February 12, 2005 - 4:22 am UTC
Mihail Bratu, March 30, 2005 - 10:02 am UTC
Diff's columns for big tables
GP, April 28, 2005 - 9:05 am UTC
Hello Tom,
I have a similar requirement
Table A (180 columns), Table B(180 columns)
Both the tables having approximately 100,000 rows
Need to find all the differences and log them
I am querying it like below
tableA minus tableB
union all
tableB minus tableA
I get certain rows but it's hard to determine which columns have changed
Hence, referring to your followup earlier on this posting:
tkyte@TKYTE9I.US.ORACLE.COM> select *
2 from (
3 select t1.trans_id,
4 decode( r, 1, 'C1', 2, 'C2', 3, 'C3' ) cname,
5 decode( r, 1, to_char(t1.c1,'dd-mon-yyyy hh24:mi:ss'),
6 2, to_char(t1.c2),
7 3, t1.c3) v1,
8 decode( r, 1, to_char(t2.c1,'dd-mon-yyyy hh24:mi:ss'),
9 2, to_char(t2.c2),
10 3, t2.c3) v2
11 from trans t1,
12 trans_tmp t2,
13 (select rownum r from all_objects where rownum <= 3 )
14 where t1.trans_id = :bv
15 and t1.trans_id = t2.trans_id (+)
16 )
17 where v1 <> v2
18 or (v1 is null and v2 is not null)
19 or (v1 is not null and v2 is null)
20 /
Following is the query I am currently using for diffs
(lk is the primary key)
(select 180 columns from A@itenv where lk in (select lk from lklnr@itenv)
minus
select 180 columns from A@cbenv where lk in (select lk from lklnr@cbenv))
union all
(select 180 columns from A@cbenv where lk in (select lk from lklnr@cbenv)
minus
select 180 columns from A@itenv where lk in (select lk from lklnr@itenv))
How can I rewrite the above query for my use
to get the differences and columns with both values.
Thankyou.
April 28, 2005 - 9:34 am UTC
building on the comparing two tables article here:
https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html
which demonstrates what I've found to be the fastest approach (see the second article on that page)...
You sort of need to have some idea of the "primary key" here (in order to compare what is different -- we need to be able to "join" the two tables row by row right)...
So, you use the query there:
SQL> select c1, c2, c3,
2 count(src1) CNT1,
3 count(src2) CNT2
4 from
5 ( select a.*,
6 1 src1,
7 to_number(null) src2
8 from a
9 union all
10 select b.*,
11 to_number(null) src1,
12 2 src2
13 from b
14 )
15 group by c1,c2,c3
16 having count(src1) <> count(src2)
17 /
C1 C2 C3 CNT1 CNT2
--- -- -- ---- ----
2 x y 0 1
2 xx y 1 0
3 x y 1 0
3 x yy 0 1
<b>and order by "primary key" (say it was C1 in this example)</b>
then the matching rows appear one under the other.
or you can further use analytics and
count(distinct nvl(c2,'something c2 cannot be')) over (partition by c1) cnt_c2,
and look at the columns where cnt_NN > 1
Lk is the primary key in table A and table B
A reader, April 28, 2005 - 12:24 pm UTC
April 28, 2005 - 2:03 pm UTC
great -- use it where I used c1..
I like analytics
Vinayak, April 28, 2005 - 2:07 pm UTC
You mean something like this?:
SQL> select * from t1 ;
ID COL1 COL2
---------- ---------- ----------
1 x XXX
1 x X
1 y YY
SQL> select * from t2 ;
ID COL1 COL2
---------- ---------- ----------
1 x XXX
1 y Y
SQL> select * from
2 (select tname,id,col1,col2,count(*) over (partition by id,col1,col2) cnt
3 from (select 'T1' tname,t1.* from t1 union all select 'T2' tname,t2.* from t2)
4 ) where cnt=1 ;
TNAME ID COL1 COL2 CNT
-------------------------------- ---------- ---------- ---------- ----------
T1 1 x X 1
T2 1 y Y 1
T1 1 y YY 1
April 28, 2005 - 2:10 pm UTC
yes
Tom, I need further help
GP, April 28, 2005 - 2:26 pm UTC
I went through the article and it was wonderful.
I tried rewriting my query,
but I am missing something -- please guide me further.
LK is the Primary key in both the tables
select c1, c2, c3, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
where a.lk = b.lk
group by c1,c2,c3
having count(src1) <> count(src2)
I need the lk, columns with both the values that are different.
Please Guide.
April 28, 2005 - 2:34 pm UTC
change c1, c2, c3.... to YOUR COLUMNS
there is not any join.
select YOUR_COLUMNS, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
group by YOUR_COLUMNS
having count(src1) <> count(src2)
I did the column select
GP, April 28, 2005 - 3:32 pm UTC
I am sorry, I guess I did not put my question clearly.
Yes, I am putting all my columns in the select and group by,
but that gives me all the columns in my result set.
What I want is lk, just the column that changed, val1, val2
(val1 and val2 being the values of that column that differ between the two rows).
Please help me get through further
Thankyou very much.
April 28, 2005 - 4:22 pm UTC
then, after you get this:
select YOUR_COLUMNS, count(src1) CNT1, count(src2) CNT2
from
(
select a.*, 1 src1, to_number(null) src2
from A@itenv a
where lk in (select lk from lklnr@itenv)
union all
select b.*, to_number(null) src1, 2 src2
from B b
where lk in (select lk from lklnr@itenv)
)
group by YOUR_COLUMNS
having count(src1) <> count(src2)
(call that Q) you'll:
select q.*, count(distinct c1) over (partition by lk) cnt_c1, ....
from (q)
those counts can be used as "flags" to say "look at me"
Last followup on this table comparison thread is confusing to me
Rick, May 09, 2005 - 7:37 pm UTC
This has been an awesome and very useful thread. However, is it possible for you to provide a more complete example of how to return only the primary key column and the column(s) whose values have changed, please?
Did you mean that the results of the compare-rows script should be inserted into a table called q, and then selected from to get the counts? You say that the counts can be used as "flags" to say "look at me" -- would you please expound on that? Please excuse my obtuseness.
May 09, 2005 - 8:02 pm UTC
you have to look column by column -- no magic there, the flags were just a way to say "look at me", nothing more.
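For what it's worth, a hedged sketch of how the pieces fit together -- assume two tables A and B keyed on LK with data columns C1 and C2 (your real column list goes where C1/C2 are):

select lk, c1, c2, cnt1, cnt2,
       count( distinct c1 ) over (partition by lk) flag_c1,
       count( distinct c2 ) over (partition by lk) flag_c2
  from ( select lk, c1, c2,
                count(src1) cnt1, count(src2) cnt2
           from ( select a.*, 1 src1, to_number(null) src2 from a
                  union all
                  select b.*, to_number(null) src1, 2 src2 from b )
          group by lk, c1, c2
         having count(src1) <> count(src2) )
 order by lk
/

An LK whose flag_c1 > 1 changed in C1, flag_c2 > 1 changed in C2, and so on -- the flags only tell you which columns to look at; the old and new values are still in the two rows for that LK (wrap nullable columns in nvl(..., 'something the column cannot be'), as mentioned above, since count(distinct) ignores NULLs).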
Maybe this script will help! ;)
LOU, July 14, 2005 - 5:28 pm UTC
undefine TABLE1
undefine TABLE2
define g_table1 = '&&TABLE1'
define g_table2 = '&&TABLE2'
set verify off
set feedback off
set serveroutput on size 1000000
spo temp_file.sql
declare
v_owntab1 varchar2(255) := '&&g_table1';
v_owntab2 varchar2(255) := '&&g_table2';
v_own1 varchar2(255);
v_own2 varchar2(255);
v_tab1 varchar2(255);
v_tab2 varchar2(255);
v_dot1 number := 0;
v_dot2 number := 0;
type t_cols is table of varchar2(255) index by binary_integer;
v_cols1 t_cols;
v_cols2 t_cols;
v_out1 varchar2(255);
v_out2 varchar2(255);
kq CONSTANT varchar2(1) := '''';
v_ind number := 0;
v_str varchar2(2000) := null;
v_ind_found boolean := FALSE;
v_ind_colno number := 0;
procedure print_cols (p_cols in t_cols) is
begin
for i in 1..p_cols.count
loop
dbms_output.put_line(','||p_cols(i));
end loop;
end print_cols;
begin
v_dot1 := instr(v_owntab1, '.');
v_dot2 := instr(v_owntab2, '.');
if v_dot1 > 0 then
v_own1 := upper(substr(v_owntab1, 1, v_dot1-1));
v_tab1 := upper(substr(v_owntab1, v_dot1+1));
else
v_own1 := null;
v_tab1 := upper(v_owntab1);
end if;
if v_dot2 > 0 then
v_own2 := upper(substr(v_owntab2, 1, v_dot2-1));
v_tab2 := upper(substr(v_owntab2, v_dot2+1));
else
v_own2 := null;
v_tab2 := upper(v_owntab2);
end if;
select column_name
bulk collect into v_cols1
from all_tab_columns
where table_name = v_tab1
and owner = nvl(v_own1, user)
order by column_id;
select column_name
bulk collect into v_cols2
from all_tab_columns
where table_name = v_tab2
and owner = nvl(v_own2, user)
order by column_id;
if v_cols1.count = 0 or v_cols2.count = 0 then
dbms_output.put_line('Either or Both the tables are invalid');
return;
end if;
dbms_output.put_line('(');
dbms_output.put_line('select '||kq||'TAB1'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own1, user)||'.'||v_tab1);
dbms_output.put_line('MINUS');
dbms_output.put_line('select '||kq||'TAB1'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own2, user)||'.'||v_tab2);
dbms_output.put_line(')');
dbms_output.put_line('UNION');
dbms_output.put_line('(');
dbms_output.put_line('select '||kq||'TAB2'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own2, user)||'.'||v_tab2);
dbms_output.put_line('MINUS');
dbms_output.put_line('select '||kq||'TAB2'||kq);
print_cols(v_cols1);
dbms_output.put_line(' from '||nvl(v_own1, user)||'.'||v_tab1);
dbms_output.put_line(')');
dbms_output.put_line('order by ');
for c1 in (
select b.column_name
from all_indexes a, all_ind_columns b
where a.owner=b.index_owner
and a.index_name=b.index_name
and a.uniqueness = 'UNIQUE'
and a.table_owner = nvl(v_own1, user)
and a.table_name = v_tab1
order by b.index_name, b.column_position
)
loop
v_ind_found := TRUE;
v_ind_colno := v_ind_colno + 1;
if v_ind_colno = 1 then
dbms_output.put_line(c1.column_name);
else
dbms_output.put_line(','||c1.column_name);
end if;
end loop;
if not v_ind_found then
dbms_output.put_line('2 ');
end if;
dbms_output.put_line(';');
end;
/
spo off
set feedback on
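Usage, as I read the script: save it as a .sql file, run it from SQL*Plus (it prompts for TABLE1 and TABLE2 via the && substitution variables), then run the spooled compare it generates -- roughly (the file name gen_compare.sql is just a placeholder):

@gen_compare.sql
@temp_file.sql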
Humaira Liaqat, August 15, 2005 - 4:32 am UTC
Hello Tom,
I want to write a procedure which would insert, update and delete records from
10 tables using a join query.
The structure of these tables is: they have 3 columns namely OLD,NEW AND VALUE.
The query is like
SELECT A.RED AS OLD,B.RED AS NEW,B.NAME AS VALUE FROM COLOR_S1 A ,COLOR_S B
WHERE A.NAME = B.NAME (+) UNION
SELECT A.RED AS OLD,B.RED AS NEW,A.NAME AS VALUE FROM COLOR_S1 A,COLOR_S B WHERE
B.NAME = A.NAME (+) ;
Here, for example, COLOR_S1 and COLOR_S2 are similar parent tables where the value of
RED is different in the two tables, but they are joined using NAME, which is
common to both tables.
Whenever there is an insert, update or delete on any of these parent
tables (like COLOR_S1 and COLOR_S2), the procedure should fire and the respective
child table (the 3-column table) should get updated.
The number of parent tables is 20 (inclusive of all _S1 and _S2 tables). The
number of child tables is 10 (the _S tables).
Please guide me on how to write this procedure so that the query gets fired
every time any change occurs in any of the 20 parent tables.
Thanks
Humaira.
August 15, 2005 - 7:20 am UTC
did not follow that at all.
Humaira Liaqat, August 16, 2005 - 1:15 am UTC
Hi Tom,
I am sorry I could not explain it to you properly.
Anyway, can you just help me write a procedure to update a table if any insert, update or delete happens in the parent table? If a child table has 2 parents, which are tables similar to each other except for their primary key, can a join query be used in the procedure to populate the child table?
Thanks
Humaira.
August 16, 2005 - 10:50 am UTC
and what might this "update" do -- you'll need to explain it before I can demonstrate it.
What is the goal, why do you need to update "a table" (assuming the child table(s)) when you update the parent? what are you trying to accomplish.
comparing two table data and logging the changes in MS Access
Shailesh, March 16, 2006 - 3:55 am UTC
Now I want to compare two tables having the same structure in MS Access
March 16, 2006 - 11:41 am UTC
go for it?
Now you have the differences, process them
Drew DeGraff, July 14, 2006 - 12:22 pm UTC
I built upon what you guys started.
This will compare a "base" table with a "refresh" table and put the differences in a "diff" table.
We will need to create a diff table from the base table and add three columns: base_tab number, refresh_tab number, and action varchar2(30). These columns will indicate which table the differences come from.
We first populate the diff table. We then update the diff table with the action (Update, Delete, Insert, Old) that each row represents. The query brings back all records that are different; "Old" represents the records in the base table that were changed in some way by the "Update" records. The "Old" records can be ignored or deleted.
The basic query for getting the differences is
select pk, column1, column2, column3,
count(src1) base_tab, count(src2) refresh_tab
from
( select a.*,
1 src1,
to_number(null) src2
from base_tab a
union all
select b.*,
to_number(null) src1,
2 src2
from refresh_tab b
)
group by pk, column1, column2, column3
having count(src1) <> count(src2)
This query only returns rows that are different or do not exist in the base and/or the refresh table.
The base_tab and refresh_tab columns should only contain a 1 or 0 unless the primary key has been violated and there are duplicates. 1 or more means the row exists in that respective table, 0 means it doesn't exist. If there are duplicates then we may need to tweak the code further.
Create table base_tab (pk number, column1 number, column2 varchar2(30), column3 date);
Create table refresh_tab (pk number, column1 number, column2 varchar2(30), column3 date);
Create table diff (pk number, column1 number, column2 varchar2(30), column3 date, base_tab number, refresh_tab number, action varchar2(30));
Insert into base_tab VALUES (1, 1.21,'SNACK CAKES','31-aug-2007');
Insert into base_tab VALUES (2, .96,'DORITOS','30-sep-2007');
Insert into base_tab VALUES (3, .31,'SODA','31-aug-2007');
Insert into base_tab VALUES (4, 2.67,'PIZZA','30-nov-2007');
Insert into refresh_tab VALUES (1, 1.21,'SNACK CAKES','31-aug-2007');
Insert into refresh_tab VALUES (2, .96,'DORITOS','31-OCT-2007');
Insert into refresh_tab VALUES (3, .31,'SODA POP','31-aug-2007');
Insert into refresh_tab VALUES (5, .21,'SNICKERS','30-nov-2007');
Insert into refresh_tab VALUES (6, .45,'BEER','30-nov-2007');
--clean out the differences table
truncate table diff;
--
-- build the difference data
--
insert into diff select pk,column1,column2, column3,
count(src1) base_tab, count(src2) refresh_tab, null action
from
( select a.*,
1 src1,
to_number(null) src2
from base_tab a
union all
select b.*,
to_number(null) src1,
2 src2
from refresh_tab b
)
group by pk,column1,column2, column3
having count(src1) <> count(src2);
--
-- update old and update records. It gets all records where the pk
-- exists more than once, one record would be from the base, the other
-- is the changed record in the refresh table
-- if the base_tab column contains a 1, then it's an Old, else it's an
-- Update
--
update diff set action=decode(base_tab,1,'Old','Update') where
(pk) IN (
select pk from (
select pk, count(*) from diff group by pk
having count(*) >1) );
--
-- update the delete and insert records. It gets all records where the
-- pk exist only once.
-- if the base_tab column contains a 1, then it's a Delete, else it's an Insert
--
update diff set action=decode(base_tab,1,'Delete','Insert') where
(pk) IN (
select pk from (
select pk, count(*) from diff
group by pk having count(*) =1) );
SELECT * FROM DIFF;
PK col1 col2 col3 bt rt action
2 .96 DORITOS 30-SEP-07 1 0 Old
2 .96 DORITOS 31-OCT-07 0 1 Update
3 .31 SODA 31-AUG-07 1 0 Old
3 .31 SODA POP 31-AUG-07 0 1 Update
4 2.67 PIZZA 30-NOV-07 1 0 Delete
5 .21 SNICKERS 30-NOV-07 0 1 Insert
6 .45 BEER 30-NOV-07 0 1 Insert
Now you can process the orders any way you wish.
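One possible way to "process" them -- a sketch of my own (not part of the post above), assuming pk really is unique and the Old rows are simply discarded:

-- rows flagged Delete exist only in base_tab: remove them
delete from base_tab
 where pk in ( select pk from diff where action = 'Delete' );

-- rows flagged Update carry the new values from refresh_tab
update base_tab bt
   set ( column1, column2, column3 ) =
       ( select d.column1, d.column2, d.column3
           from diff d
          where d.pk = bt.pk
            and d.action = 'Update' )
 where bt.pk in ( select pk from diff where action = 'Update' );

-- rows flagged Insert exist only in refresh_tab: add them
insert into base_tab ( pk, column1, column2, column3 )
select pk, column1, column2, column3
  from diff
 where action = 'Insert';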
Please tell me if I am missing something
Srinivas Narashimalu, July 21, 2006 - 11:27 am UTC
Hi Tom,
I had previously taken your advice on tuning this query. I had tuned it and have also come up with a different approach of getting this report.
This query essentially displays records for periods (let's say) 2005.01.01 to 2005.07.21 (sysdate-12months) and 2006.01.01 to 2006.07.21 (sysdate). This is mentioned in the "pfact.invoice_date_key" condition in the query (query is given below).
Previously the resultset of this query would be inserted to a table everyday (with truncating it before inserting).
Now I want to insert into the table only one day's new records each day, instead of truncating and populating it everyday with 'all' the data.
I created the FULL_REP table (to hold all the records, i.e. up to sysdate (the condition on "pfact.invoice_date_key" in the query)) using CTAS with the query given below.
I created the PART_REP table (to hold partial records, i.e. up to sysdate-1 (the condition on "pfact.invoice_date_key" in the query)) using CTAS with the query given below.
But this is what I get when I tested it -
SQL> select count(*) from FULL_REP; (this table has data for periods between 01.01.2005 and 07.19.2005 and 01.01.2006 and 07.19.2006)
COUNT(*)
----------
251544
SQL> select count(*) from PART_REP; (this table has data for periods between 01.01.2005 and 07.18.2005 and 01.01.2006 and 07.18.2006)
COUNT(*)
----------
250926
SQL> select 251544 - 250926 from dual;
251544-250926
-------------
618
SQL> create table STAGE_REP as (select * from FULL_REP minus select * from PART_REP);
Table created.
SQL> select count(*) from STAGE_REP;
COUNT(*)
----------
8909
WHY IS IT CREATED WITH 8909 ROWS? IT MUST HAVE BEEN 618 RIGHT?
And when I insert the STAGE_REP values into PART_REP it's resulting in
select count(*) from PART_REP;
COUNT(*)
----------
259835
Please tell me where I am going wrong. Below is the sql used to create tables FULL_REP,PART_REP and STAGE_REP.
SELECT pfact.invoice_year,
dlr.dealer_key,
dlr.dealer_number,
dlr.name,
dlr.city||', '||dlr.state AS location,
dlr.country,
dlr.classification,
dlr.start_date_active,
dlr.end_date_active,
dlr.old_dealer_number,
regdist.region_code,
regdist.region_desc,
regdist.regional_manager,
regdist.district_code,
regdist.district_desc,
regdist.district_manager,
dlr.facing_dealer,
dlr.parts_parent_dealer,
-- dlr.parent_location, -- to be added to dealer_dim
'',
dlr.financial_report_dealer,
dlr.consolidation_dealer,
dlr.parts_management_group,
org.organization_code,
org.division_code,
supp.supplier_code,
supp.name,
pgrp.product_group_desc,
-- pvdr.parent_code,
-- pvdr.part_classification,
-- pvdr.proprietary_ind,
-- dship.direct_ship_program,
-- ordr.order_type,
-- pri.priority_code,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
DECODE(ordr.order_type,'PDC',
DECODE(pri.priority_code,'S',
SUM(pfact.local_ext_sales_price), 0) ,0) ,0), 0) AS pdc_stock,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
DECODE(ordr.order_type,'PDC',
DECODE(pri.priority_code,'E',
SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_vor,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
DECODE(ordr.order_type,'DSP',
SUM(pfact.local_ext_sales_price), 0), 0), 0) AS vdsp,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
DECODE(ordr.order_type,'SD',
SUM(pfact.local_ext_sales_price), 0), 0), 0) AS ship_direct,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
SUM(pfact.local_ext_sales_price), 0), 0) AS gross_sales,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY'),
SUM(pfact.local_gross_profit), 0), 0) AS gross_profit,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
DECODE(ordr.order_type,'PDC',
DECODE(pri.priority_code,'S',
SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_stock_prev,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
DECODE(ordr.order_type,'PDC',
DECODE(pri.priority_code,'E',
SUM(pfact.local_ext_sales_price), 0), 0), 0), 0) AS pdc_vor_prev,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
DECODE(ordr.order_type,'DSP',
SUM(pfact.local_ext_sales_price), 0), 0), 0) AS vdsp_prev,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
DECODE(ordr.order_type,'SD',
SUM(pfact.local_ext_sales_price), 0), 0), 0) AS ship_direct_prev,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
SUM(pfact.local_ext_sales_price), 0), 0) AS gross_sales_prev,
NVL(DECODE(invoice_year, TO_CHAR(SYSDATE - 1,'YYYY') - 1,
SUM(pfact.local_gross_profit), 0), 0) AS gross_profit_prev
FROM priority_dim pri,
product_group_dim pgrp,
order_type_dim ordr,
organization_dim org,
dealer_dim dlr,
region_district_dim regdist,
price_vendor_dim pvdr,
part_dim pdim,
supplier_dim supp,
part_order_line_fact pfact
WHERE regdist.type(+) = 'PARTS'
-- AND dlr.dealer_number = 'A300'
AND regdist.region_district_key IN
(SELECT MIN(region_district_key)
FROM region_district_dim
WHERE type = 'PARTS'
AND dealer_key = dlr.dealer_key)
AND (pfact.invoice_date_key BETWEEN TO_NUMBER(TO_CHAR(SYSDATE-1, 'yyyy')- 1||'0101')
AND TO_NUMBER(TO_CHAR(SYSDATE-1, 'yyyymmdd')))
AND (pfact.invoice_date_key NOT BETWEEN TO_NUMBER(TO_CHAR(ADD_MONTHS(SYSDATE-1, -12), 'yyyymmdd'))
AND TO_CHAR(TO_CHAR(SYSDATE-1, 'YYYY')- 1)||'1231')
--w/ Canada dealers
-- AND (SUBSTR(dlr.dealer_number, 1, 1) NOT IN ('X','Y'))
--w/o Canada dealers
AND (SUBSTR(dlr.dealer_number, 1, 1) NOT IN ('X','Y','N'))
AND NVL(gl_code, 'MISC') NOT LIKE 'DDSS%'
AND NVL(gl_code, 'MISC') NOT LIKE 'SSMX%'
AND NVL(gl_code, 'MISC') NOT LIKE 'SSMU%'
AND pdim.part_ind = 'Y'
AND dlr.dealer_key = regdist.dealer_key(+)
AND dlr.dealer_key = pfact.dealer_key
-- AND regdist.region_district_key (+) = pfact.region_district_key
AND org.organization_key = pfact.organization_key
AND pgrp.product_group = pvdr.product_group
-- AND cr.credit_reason_key = pfact.credit_reason_key
-- AND dship.direct_ship_program_key(+) = pfact.direct_ship_program_key
AND ordr.order_type_key = pfact.order_type_key
AND pri.priority_key = pfact.priority_key
AND pdim.part_key = pfact.shipped_part_key
AND pdim.organization_key = pfact.organization_key
AND pvdr.price_vendor_key = pfact.price_vendor_key
AND supp.supplier_key = pfact.supplier_key
GROUP BY pfact.invoice_year,
dlr.dealer_key,
dlr.dealer_number,
dlr.name,
dlr.city||', '||dlr.state,
dlr.country,
dlr.classification,
dlr.start_date_active,
dlr.end_date_active,
dlr.old_dealer_number,
regdist.region_code,
regdist.region_desc,
regdist.regional_manager,
regdist.district_code,
regdist.district_desc,
regdist.district_manager,
dlr.facing_dealer,
dlr.parts_parent_dealer,
-- dlr.parent_location,
dlr.financial_report_dealer,
dlr.consolidation_dealer,
dlr.parts_management_group,
org.organization_code,
org.division_code,
supp.supplier_code,
supp.name,
pgrp.product_group_desc,
-- pvdr.parent_code,
-- pvdr.part_classification,
-- pvdr.proprietary_ind,
-- dship.direct_ship_program,
ordr.order_type,
pri.priority_code;
Sorry to bother with such huge query!
Thanks,
Srinivas
July 23, 2006 - 7:41 am UTC
I'd be looking at my data to figure out what I did wrong? I mean, you have the 8909 rows - debug it?
Not knowing your data as you do....
Not knowing how you built part and full rep (are they "correct" in the first place)
One cannot really say (and that is a rather large query, I'm not going to reverse engineer it here)
Appreciate it!
Srinivas Narashimalu, July 23, 2006 - 10:42 pm UTC
Thanks for the response Tom!
I know it's a large query... I was sort of not comfortable posting it... but I somehow wanted to know where I was going wrong, as I have been troubleshooting for quite some time.
Anyhow thanks again for your response and greatly appreciate your politeness.
Thanks,
Srinivas
Bringing 1 table in sync with the other
A reader, July 28, 2006 - 4:33 pm UTC
Given 2 tables with similar structure (same PK and N optional attributes)
How can the techniques discussed on this thread be used to bring Table A "in sync" with Table B? i.e.
a. if the PK is the same, update the N optional attributes in Table A to be the same as those in Table B.
b. If B doesn't have the row, delete it from A.
c. If A doesn't have the row, add it to A
i.e. B is the master table, I want to make A look like it.
Additionally, I would like to "log" each INSERT/UPDATE/DELETE I am doing to an audit trail, so I guess a set-based approach is out?
Any ideas appreciated. Thanks
July 28, 2006 - 9:01 pm UTC
simple.
delete from a;
insert into a select * from b;
not sure of the point of the audit trail since a row in b could have been changed 500 times, or once?
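If wiping and reloading A is too heavy (or you really do want a row-level audit trail), a MERGE plus a delete is another route -- a sketch only, not what Tom suggested above, assuming PK is the key and C1, C2 stand in for the optional attributes:

-- update matching rows / insert missing rows
merge into a
using b
   on ( a.pk = b.pk )
 when matched then
   update set a.c1 = b.c1, a.c2 = b.c2
 when not matched then
   insert ( pk, c1, c2 ) values ( b.pk, b.c1, b.c2 );

-- remove rows that no longer exist in the master
delete from a
 where pk not in ( select pk from b );

The before/after images for an audit trail could be captured beforehand with the count(src1)/count(src2) diff query shown earlier in this thread.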
Checksum Based Approach
Thiyag, September 13, 2006 - 2:26 am UTC
Hi Tom:
Can you suggest some checksum-based approach for comparing table contents? For example, say there is some algorithm which reads the contents of the table and gives a checksum value that could be used to compare against the content of another table present in some other database. This way the two databases need not be connected to perform the content check.
September 13, 2006 - 7:25 am UTC
select sum( checksum( col1||'/'||col2||'/'||....||'/'||colN) )
from t;
perhaps.
What is checksum?
Thiyag, September 13, 2006 - 9:44 am UTC
I am interested in knowing your opinion on how this checksum needs to be written. Any suggestions? Is there some already-implemented package available in Oracle which uses such a checksum, but for a different purpose, that could be harnessed?
September 13, 2006 - 2:49 pm UTC
dbms_obfuscation_toolkit can create an md5 hash
dbms_crypto has more than just md5
they are checksum-like examples.
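A concrete (if simplistic) sketch of the idea using ORA_HASH, available in 10g -- not cryptographically strong like the MD5 routines, but cheap enough for a quick content check; the column names are placeholders:

select sum( ora_hash( col1 || '/' || col2 || '/' || col3 ) ) table_checksum
  from t;

Run the same statement on each database and compare the two numbers. A match is not an absolute guarantee (hash collisions are possible); a mismatch does prove the contents differ.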
multiple tables with same structure
satish, November 16, 2006 - 9:52 am UTC
TOM,
I have two tables for each day, like t20060101, o20060101, etc. -- say 730 tables per year -- with the same structure. I need to query data from these tables, given column values and a start date and end date.
Has this type of question been answered? If so, please give me the link; if not, can you please give me the code?
satish
November 16, 2006 - 3:20 pm UTC
I would say you've done a bad thing.
You meant to partition
but you didn't
why not?
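For what it's worth, a sketch of the single range-partitioned table that replaces the 730 per-day tables (column names are placeholders; one partition per day):

create table daily_data
( dt    date,
  col1  number,
  col2  varchar2(30)
)
partition by range (dt)
( partition p20060101 values less than (to_date('02-jan-2006','dd-mon-yyyy')),
  partition p20060102 values less than (to_date('03-jan-2006','dd-mon-yyyy')),
  partition pmax      values less than (maxvalue)
);

-- the date-range query then needs no knowledge of which "table" holds which day:
select *
  from daily_data
 where dt between :start_date and :end_date
   and col1 = :some_value;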
A reader, November 17, 2006 - 9:38 am UTC
A similar table comparison problem
sd, March 29, 2007 - 6:22 am UTC
Hello Tom,
I've got a similar situation to the original poster. I want to compare two
master-detail tables efficiently in a PL/SQL procedure.
These are the simplified example tables. Each entry in the master table can
have an arbitrary number of entries in the detail table. P/F keys and indexes
are missing, obviously.
CREATE TABLE app_queue (
id NUMBER NOT NULL,
description VARCHAR2(100) NOT NULL
);
CREATE TABLE app_queue_detail (
queue_id NUMBER NOT NULL,
fieldname VARCHAR2(40) NOT NULL,
fieldvalue VARCHAR2(40) NOT NULL
);
CREATE TABLE app_notification (
id NUMBER NOT NULL,
description VARCHAR2(100) NOT NULL
);
CREATE TABLE app_notification_detail (
notification_id NUMBER NOT NULL,
fieldname VARCHAR2(40) NOT NULL,
fieldvalue VARCHAR2(40) NOT NULL
);
I want to return all app_notification and app_queue ids where all fieldnames
and fieldvalues of the notification match a queue's details. I've tried some
approaches with nested cursors but I wasn't really satisfied with them.
Example data following.
Match, return app_queue.id and app_notification.id:
app_queue_detail app_notification_detail
NAME joseph NAME joseph
BDAY 1990
No match:
app_queue_detail app_notification_detail
NAME joseph NAME john
BDAY 1990 BDAY 19%
No match:
app_queue_detail app_notification_detail
NAME joseph FAVC gr%
BDAY 1990
ZIPC 31337
Is there a way to turn this problem into an efficient query?
Thanks in advance.
March 30, 2007 - 12:47 pm UTC
need some inserts there
and an explanation of why the first one is a match.
differences of records in single table
Rasika, May 24, 2007 - 1:09 pm UTC
Hi Tom,
I have a similar question, but this is with a single table. Let's say we audit changes to the records of a table and we transfer each modified record into a new audit table
ex: if table T1 is with columns (PK, C1, C2, C3) and our audit table is AUD_T1 (PK, C1, C2, C3, OPERATION_TYPE, TIMESTAMP, USER)
(In the audit table PK is not actually a primary key column; it just keeps the values of the primary key column PK of T1)
so when any change happens to T1, that new (or old) record will be copied to AUD_T1 with the additional op_type (insert, delete, update), timestamp and user
so it could be as follows in AUD_T1
PK C1 C2 C3 OPERATION_TYPE TIMESTAMP USER
1 a b c UPDATE t1 u1
1 a1 b c1 UPDATE t2 u1
1 a1 b1 c1 UPDATE t3 u2
2 c d e UPDATE t4 u1
2 c1 d e UPDATE t5 u2
so what I need to generate is a table as follows, using the above data in AUD_T1
(to make it easy, let's consider the records with PK = 1)
TIMESTAMP COLUMN OLD_VALUE NEW_VALUE
t2 C1 a a1
t2 C3 c c1
t3 C2 b b1
and actually if I can get it in the following format that would be even better
At t2 user u1 changed
--
C1 a a1
C3 c c1
At t3 user u2 changed
--
C2 b b1
This is basically to present a meaningful audit report to the user; we already keep records
in AUD_T1 but the problem is presenting them as above.
If you could provide a few SQL queries or PL/SQL SPs which could do such a thing, or point to any
existing question & answer link, that would be great.
We're using Oracle 10g
Thanks Tom
tom, please declare...
Ralph Buetikofer, January 14, 2009 - 5:33 am UTC
hi tom
thank you so far for lots of good tips!
I have to compare several huge tables with millions of records in them. So I read this thread and found some good solutions, including the one from Marco Stefanetti and you. Now I have to find the fastest way to compare our data.
Therefore I have a table with only two columns, no PK, and 5 million records. It's interesting for me to find that solution 1 is faster than your solution 2. Did I forget something, or is there something wrong in my example? I'd like you to let me know.
here the examples:
1) A where A not in B union all B where B not in A
SELECT a.col1, a.col2
FROM tab_a a
WHERE (a.col1, a.col2) NOT IN
(SELECT b.col1, b.col2
FROM tab_b b)
UNION ALL
SELECT b.col1, b.col2
FROM tab_b b
WHERE (b.col1, b.col2) NOT IN
(SELECT a.col1, a.col2
FROM tab_a a);
result: 0 records selected (compare was fine), 17 seconds
2) your solution described above
SELECT col1,
col2,
COUNT(src1) CNT1,
COUNT(src2) CNT2
FROM (SELECT a.col1, a.col2,
1 src1,
TO_NUMBER(NULL) src2
FROM tab_a a
UNION ALL
SELECT b.col1, b.col2,
TO_NUMBER(NULL) src1,
2 src2
FROM tab_b b)
GROUP BY col1,
col2
HAVING COUNT(src1) <> COUNT(src2);
result: 0 records selected (compare was fine), 25 seconds... (?)
btw: we're using oracle 10g r2
any ideas tom?
thank you for your reply.
January 16, 2009 - 4:23 pm UTC
tkprof it, post those results.
wall clock times are not very useful - probably they are "the same"
Your code does not handle nulls at all, hopefully col1 and col2 are NOT NULL.. That is, your query is not useful if any of the attributes are NULLABLE (so it would not matter how fast or slow it is :) )
ops$tkyte%ORA11GR1> create table tab_a ( col1 number, col2 number );
Table created.
ops$tkyte%ORA11GR1> create table tab_b ( col1 number, col2 number );
Table created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> insert into tab_a values ( 1, 1 );
1 row created.
ops$tkyte%ORA11GR1> insert into tab_a values ( 1, null );
1 row created.
ops$tkyte%ORA11GR1> insert into tab_b values ( 1, 1 );
1 row created.
ops$tkyte%ORA11GR1> insert into tab_b values ( null, 1 );
1 row created.
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> SELECT a.col1, a.col2
2 FROM tab_a a
3 WHERE (a.col1, a.col2) NOT IN
4 (SELECT b.col1, b.col2
5 FROM tab_b b)
6 UNION ALL
7 SELECT b.col1, b.col2
8 FROM tab_b b
9 WHERE (b.col1, b.col2) NOT IN
10 (SELECT a.col1, a.col2
11 FROM tab_a a);
no rows selected
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> SELECT col1,
2 col2,
3 COUNT(src1) CNT1,
4 COUNT(src2) CNT2
5 FROM (SELECT a.col1, a.col2,
6 1 src1,
7 TO_NUMBER(NULL) src2
8 FROM tab_a a
9 UNION ALL
10 SELECT b.col1, b.col2,
11 TO_NUMBER(NULL) src1,
12 2 src2
13 FROM tab_b b)
14 GROUP BY col1,
15 col2
16 HAVING COUNT(src1) <> COUNT(src2);
COL1 COL2 CNT1 CNT2
---------- ---------- ---------- ----------
1 0 1
1 1 0
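If the columns really are nullable and you still want the NOT IN style, one workaround (a sketch only, assuming a sentinel such as chr(0) that can never appear in the data) is to compare NVL'ed values:

SELECT a.col1, a.col2
  FROM tab_a a
 WHERE ( nvl(to_char(a.col1),chr(0)), nvl(to_char(a.col2),chr(0)) ) NOT IN
       ( SELECT nvl(to_char(b.col1),chr(0)), nvl(to_char(b.col2),chr(0))
           FROM tab_b b )
UNION ALL
SELECT b.col1, b.col2
  FROM tab_b b
 WHERE ( nvl(to_char(b.col1),chr(0)), nvl(to_char(b.col2),chr(0)) ) NOT IN
       ( SELECT nvl(to_char(a.col1),chr(0)), nvl(to_char(a.col2),chr(0))
           FROM tab_a a );

The GROUP BY approach needs no such trick, since GROUP BY already treats NULLs as equal to each other.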
Matching from Single Table based on user input
Shravani, July 08, 2011 - 4:47 am UTC
Hi Tom,
Thanks so much for all your help and mentoring.
I have a requirement to compare records from the same table. The output should be based on the user input.
Script :
----------------------------------------------------
drop table dispatch_records;
create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip varchar2(10));
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '01-Jan-2001', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '11-Jan-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_1', '22-Feb-2001', '08731');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_2', '22-Feb-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_2', '22-Feb-2001', '08731');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_3', '27-Mar-2003', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_4', '27-Mar-2003', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '08731');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '07001');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_5', '01-Jan-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_6', '22-Feb-2001', '07771');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_6', '22-Feb-2001', '08731');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_7', '22-Feb-2001', '10011');
insert into dispatch_records(cust_no, dispatch_date, dispatch_zip) values ('Cust_7', '22-Feb-2001', '10017');
commit;
----------------------------------------------------
The requirement is:
If the user selects the "All" option, the interface should display the customers whose zip codes all match.
Output For "All ZipCode(s) Matching"
---------------------------------------------
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_3, Cust_4
---------------------------------------------
If user selects option "Any 1" then it means display the customers for which any 1 zipcode is matching.
Output For "Any 1 ZipCode(s) Matching"
-----------------------------------------------
Dispatch Location matching for : Cust_1, Cust_2
Dispatch Location matching for : Cust_1, Cust_3
Dispatch Location matching for : Cust_1, Cust_4
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_1, Cust_6
Dispatch Location matching for : Cust_2, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_3, Cust_4
Dispatch Location matching for : Cust_3, Cust_5
Dispatch Location matching for : Cust_4, Cust_5
Dispatch Location matching for : Cust_5, Cust_6
---------------------------------------------
If user selects option "Any 2" then it means display the customers for which any 2 zip codes are matching.
Output For "Any 2 ZipCode(s) Matching"
-----------------------------------------------
Dispatch Location matching for : Cust_1, Cust_2
Dispatch Location matching for : Cust_1, Cust_5
Dispatch Location matching for : Cust_1, Cust_6
Dispatch Location matching for : Cust_2, Cust_5
Dispatch Location matching for : Cust_2, Cust_6
Dispatch Location matching for : Cust_5, Cust_6
---------------------------------------------
Can you please suggest an effective way to achieve this in SQL?
Thanks & Regards
Shravani
July 08, 2011 - 2:59 pm UTC
ops$tkyte%ORA11GR2> with t
2 as
3 ( select distinct cust_no, dispatch_zip zip,
4 count(distinct dispatch_zip) over (partition by cust_no) cnt
5 from dispatch_records
6 )
7 select t1.cust_no, t2.cust_no
8 from t t1, t t2
9 where t1.cust_no < t2.cust_no
10 and t1.zip = t2.zip
11 group by t1.cust_no, t2.cust_no
12 having max(t1.cnt) = max(t2.cnt)
13 and max(t1.cnt) = count(*)
14 order by 1
15 /
CUST_NO CUST_NO
---------- ----------
Cust_1 Cust_5
Cust_2 Cust_6
Cust_3 Cust_4
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> with t
2 as
3 ( select distinct cust_no, dispatch_zip zip
4 from dispatch_records
5 )
6 select distinct t1.cust_no, t2.cust_no
7 from t t1, t t2
8 where t1.cust_no < t2.cust_no
9 and t1.zip = t2.zip
10 order by 1
11 /
CUST_NO CUST_NO
---------- ----------
Cust_1 Cust_2
Cust_1 Cust_3
Cust_1 Cust_4
Cust_1 Cust_5
Cust_1 Cust_6
Cust_2 Cust_5
Cust_2 Cust_6
Cust_3 Cust_4
Cust_3 Cust_5
Cust_4 Cust_5
Cust_5 Cust_6
11 rows selected.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> with t
2 as
3 ( select distinct cust_no, dispatch_zip zip
4 from dispatch_records
5 )
6 select t1.cust_no, t2.cust_no
7 from t t1, t t2
8 where t1.cust_no < t2.cust_no
9 and t1.zip = t2.zip
10 group by t1.cust_no, t2.cust_no
11 having count(*) >= 2
12 order by 1
13 /
CUST_NO CUST_NO
---------- ----------
Cust_1 Cust_2
Cust_1 Cust_5
Cust_1 Cust_6
Cust_2 Cust_5
Cust_2 Cust_6
Cust_5 Cust_6
6 rows selected.
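Since the "All" / "Any N" choice comes from the user, the threshold in the last query could also be driven by a bind variable instead of hard-coding 2 (a sketch only, assuming a numeric bind :n; the "All" option would still use the cnt check from the first query):
with t
as
( select distinct cust_no, dispatch_zip zip
    from dispatch_records
)
select t1.cust_no, t2.cust_no
  from t t1, t t2
 where t1.cust_no < t2.cust_no
   and t1.zip = t2.zip
 group by t1.cust_no, t2.cust_no
having count(*) >= :n    -- :n = 1 for "Any 1", 2 for "Any 2", and so on
 order by 1
/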
Thanks so much for the simple solution.
Shravani, July 11, 2011 - 3:12 am UTC
Result Set is Huge
Shravani, July 27, 2011 - 6:19 am UTC
Hi Tom,
This is regarding the question "Matching from Single Table based on user input" asked in the same thread.
Actually I want to use the query logic you gave to pass input to another program.
The queries work fine for small volumes; however, for higher volumes they return a huge amount of data. For example, for 100K records in the "dispatch_records" table the query returns 12M records, and in production the volume of this table is more than 1M.
Is it possible to get the output only at customer level and
also is it possible to get the output using "dispatch_records" table just once?
Regards
Shravani
July 28, 2011 - 7:01 pm UTC
state the question entirely here - do not make me read up and down on the page. give YOUR example, explained in your words, with your data.
I did not understand this:
Is it possible to get the output only at customer level and
also is it possible to get the output using "dispatch_records" table just once?
Result Set is Huge
Shravani, July 31, 2011 - 10:40 am UTC
Sorry Tom...I will phrase my question properly.
For further processing, I want to get the customers for whom the dispatch zip codes are common. Basically, I want to group the output based on what is matching.
Here is the table structure and data.
drop table dispatch_records;
create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip
varchar2(10));
------------------------------------------------------
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_3', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_4', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '07001');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', '07771');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', '08731');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', '10011');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', '10017');
commit;
------------------------------------------------------
The query you gave in previous thread for this is
with t
as
( select distinct cust_no, dispatch_zip zip
from dispatch_records
)
select distinct t1.cust_no, t2.cust_no
from t t1, t t2
where t1.cust_no < t2.cust_no
and t1.zip = t2.zip
order by 1
/
And the output is
CUST_NO CUST_NO
---------- ----------
Cust_1 Cust_2
Cust_1 Cust_3
Cust_1 Cust_4
Cust_1 Cust_5
Cust_1 Cust_6
Cust_2 Cust_5
Cust_2 Cust_6
Cust_3 Cust_4
Cust_3 Cust_5
Cust_4 Cust_5
Cust_5 Cust_6
--------------------
The query does give me the required output; however, since it returns every combination, the result set becomes very large.
Is it possible to get something like the output below? This would let me pass fewer records to the next program. Basically, once I have the output below, I will apply DISTINCT after removing the ZIP column.
----------------------------------------
Cust_Number Zip Rank
-----------------------------------------
Cust_1 07771 1
Cust_1 08731 1
Cust_2 07771 1
Cust_2 08731 1
Cust_5 07771 1
Cust_5 08731 1
Cust_6 07771 1
Cust_6 08731 1
Cust_1 07001 2
Cust_1 07771 2
Cust_1 08731 2
Cust_5 07001 2
Cust_5 07771 2
Cust_5 08731 2
Cust_1 07001 3
Cust_3 07001 3
Cust_4 07001 3
Cust_5 07001 3
------------------------------------
Regards
Shravani
August 01, 2011 - 11:48 am UTC
explain your output. It is not immediately obvious to me what it represents.
and if you are trying to return less data to the client, why is your desired output LARGER than my original output?
Shravani, August 02, 2011 - 6:48 am UTC
Hi Tom,
Thanks so much for the response. Basically, I want to send the customers based on the groups. In the output below I am getting the customers in pairs. All the customers with a RANK column value of 1 are the ones that have the 07771 and 08731 zip codes in common. The ones with a Rank value of 2 have 3 zip codes in common.
Cust_Number Zip Rank
-----------------------------------------
Cust_1 07771 1
Cust_1 08731 1
Cust_2 07771 1
Cust_2 08731 1
Cust_5 07771 1
Cust_5 08731 1
Cust_6 07771 1
Cust_6 08731 1
Cust_1 07001 2
Cust_1 07771 2
Cust_1 08731 2
Cust_5 07001 2
Cust_5 07771 2
Cust_5 08731 2
Cust_1 07001 3
Cust_3 07001 3
Cust_4 07001 3
Cust_5 07001 3
------------------------------------
Basically, my final goal is to send the distinct customers based on the common zip codes they represent.
My final goal is to get the output as
Cust_Number Rank
-------------------------
Cust_1 1
Cust_2 1
Cust_5 1
Cust_6 1
Cust_1 2
Cust_2 2
Cust_1 3
Cust_3 3
Cust_4 3
Cust_5 3
The above output clearly gives the distinct groupings of customers based on the matching zip codes.
Regards
Shravani
August 02, 2011 - 7:12 am UTC
how does cust_2 get a rank of 2 in your final output?????
What appears to be "clear" to you is "muddy" to me.
Sorry for the typo
Shravani, August 02, 2011 - 8:00 am UTC
Sorry once again Tom for my stupid mistakes. The final output should be
Cust_Number Rank
-------------------------
Cust_1 1
Cust_2 1
Cust_5 1
Cust_6 1
Cust_1 2
Cust_5 2
Cust_1 3
Cust_3 3
Cust_4 3
Cust_5 3
Regards
Shravani
August 02, 2011 - 8:38 am UTC
why isn't cust_3 in rank 1?
clear as mud I'd say?
Here is what I think you want - again, the answer is LARGER - but whatever. Your output does not compute for me at all
ops$tkyte%ORA11GR2> with t
2 as
3 ( select distinct cust_no, dispatch_zip zip
4 from dispatch_records
5 )
6 select distinct t1.cust_no t1_cust_no, t2.cust_no t2_cust_no
7 from t t1, t t2
8 where t1.cust_no < t2.cust_no
9 and t1.zip = t2.zip
10 order by 1
11 /
T1_CUST_NO T2_CUST_NO
---------- ----------
Cust_1 Cust_2
Cust_1 Cust_3
Cust_1 Cust_4
Cust_1 Cust_5
Cust_1 Cust_6
Cust_2 Cust_5
Cust_2 Cust_6
Cust_3 Cust_4
Cust_3 Cust_5
Cust_4 Cust_5
Cust_5 Cust_6
11 rows selected.
ops$tkyte%ORA11GR2> with t
2 as
3 ( select distinct cust_no, dispatch_zip zip
4 from dispatch_records
5 )
6 select distinct new_cust_no, dense_rank() over (order by t1_cust_no)
7 from (
8 select t1_cust_no, nvl(t2_cust_no,t1_cust_no) new_cust_no
9 from (
10 select distinct t1.cust_no t1_cust_no, t2.cust_no t2_cust_no
11 from t t1, t t2
12 where t1.cust_no < t2.cust_no
13 and t1.zip = t2.zip
14 )
15 group by grouping sets((t1_cust_no),(t1_cust_no,t2_cust_no))
16 )
17 order by 2
18 /
NEW_CUST_N DENSE_RANK()OVER(ORDERBYT1_CUST_NO)
---------- -----------------------------------
Cust_1 1
Cust_2 1
Cust_3 1
Cust_4 1
Cust_5 1
Cust_6 1
Cust_2 2
Cust_5 2
Cust_6 2
Cust_3 3
Cust_4 3
Cust_5 3
Cust_4 4
Cust_5 4
Cust_5 5
Cust_6 5
16 rows selected.
Sorry Again Tom.....
Shravani, August 03, 2011 - 5:40 am UTC
Hi Tom,
Thanks so much for giving your valuable time on my question.
And I am extremely sorry, Tom, that I am not explaining the requirements properly.
Cust_3 is not in Rank 1 because it matches on different zip codes.
This time I will try my best to explain the requirement with the expected result set.
I customized your query to pull the customers and make the groups based on the zip codes and the number of matching zip codes.
select
c1, c2, rnk, zip
from
(
select distinct c1, c2, zip,
dense_rank() over (order by sum_zip) Rnk
from
(
with t
as
( select distinct cust_no, dispatch_zip zip
from dispatch_records order by 1
)
select t1.cust_no c1, t2.cust_no c2, t1.zip,
sum(t1.zip) over(partition by t1.cust_no, t2.cust_no) sum_zip
from t t1, t t2
where t1.cust_no < t2.cust_no
and t1.zip = t2.zip
)
)
order by 3, 1, 2, 4
Output
===============================
C1 C2 RNK ZIP
===============================
Cust_1 Cust_3 1 07001 --------
Cust_1 Cust_4 1 07001 |
Cust_3 Cust_4 1 07001 Customers in this set have one zip code
Cust_3 Cust_5 1 07001 07001 common and hence are grouped.
Cust_4 Cust_5 1 07001 --------|
Cust_1 Cust_2 2 07771 --------
Cust_1 Cust_2 2 08731 |
Cust_1 Cust_6 2 07771 |
Cust_1 Cust_6 2 08731 |
Cust_2 Cust_5 2 07771 Customers in this set have 08731 and 07771
Cust_2 Cust_5 2 08731 zip codes in common and hence are grouped.
Cust_2 Cust_6 2 07771 |
Cust_2 Cust_6 2 08731 |
Cust_5 Cust_6 2 07771 |
Cust_5 Cust_6 2 08731 --------
Cust_1 Cust_5 3 07001 --------
Cust_1 Cust_5 3 07771 Customers in this set have 3 zip codes in common
Cust_1 Cust_5 3 08731 --------
===============================
I am not able to rollup the above output further to get the final output as
======================
Cust_No Rank
======================
Cust_1 1 --------
Cust_3 1 Customers with Only zip code (07001)
Cust_4 1 matching
Cust_5 1 --------
Cust_1 2 --------
Cust_2 2 Customers with 2 zip codes (08731 and
Cust_5 2 07771) matching.
Cust_6 2 --------
Cust_1 3 ------- Customers with 3 zip codes (07001, 07771 and 08731)
Cust_5 3 --------matching
=====================
I sincerely request you to spend a few minutes on this requirement one last time.
Regards
Shravani
August 03, 2011 - 7:54 am UTC
this has nothing to do with the original query. And you are right - you didn't really explain this before.
Assuming your query is correct, just use the same grouping sets technique I used above:
ops$tkyte%ORA11GR2> select distinct nvl(c2, c1), rnk
2 from (
3 select c1, c2, rnk, zip
4 from
5 (
6 select distinct c1, c2, zip,
7 dense_rank() over (order by sum_zip) Rnk
8 from
9 (
10 with t
11 as
12 ( select distinct cust_no, dispatch_zip zip
13 from dispatch_records order by 1
14 )
15 select t1.cust_no c1, t2.cust_no c2, t1.zip,
16 sum(t1.zip) over(partition by t1.cust_no, t2.cust_no) sum_zip
17 from t t1, t t2
18 where t1.cust_no < t2.cust_no
19 and t1.zip = t2.zip
20 )
21 )
22 )
23 group by grouping sets( (c1,rnk),(c1,c2,rnk) )
24 order by 2, 1
25 /
NVL(C2,C1) RNK
---------- ----------
Cust_1 1
Cust_3 1
Cust_4 1
Cust_5 1
Cust_1 2
Cust_2 2
Cust_5 2
Cust_6 2
Cust_1 3
Cust_5 3
10 rows selected.
Thanks so much
Shravani, August 03, 2011 - 9:20 am UTC
Thanks so much Tom. This is exactly what I wanted.
I just have one question. In the above query, since the zip codes were all numeric, it was possible to use SUM(zip) and then apply DENSE_RANK. But suppose we have non-numeric data; how can we group it then?
Here are the scripts for your ready reference.
drop table dispatch_records;
create table dispatch_records (cust_no varchar2(10), dispatch_Date date, dispatch_zip
varchar2(20));
------------------------------------------------------
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_1', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_2', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_3', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_4', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Edison');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_5', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', 'Bridgwater');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_6', 'Forked River');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', 'Avenel');
insert into dispatch_records(cust_no, dispatch_zip) values ('Cust_7', 'South Jersey');
commit;
------------------------------------------------------
With this data, how can we get the output as
======================
Cust_No Rank
======================
Cust_1 1 --------
Cust_3 1 Customers with Only zip code (Edison)
Cust_4 1 matching
Cust_5 1 --------
Cust_1 2 --------
Cust_2 2 Customers with 2 zip codes (Bridgwater and
Cust_5 2 Forked River) matching.
Cust_6 2 --------
Cust_1 3 ------- Customers with 3 zip codes (Edison, Bridgwater and Forked River)
Cust_5 3 --------matching
=====================
Regards
Shravani
August 03, 2011 - 9:54 am UTC
question for you, what if the zip codes are:
10000
20000
16000
14000
for two groups - wouldn't that mess up your query as well. Like I said, I'm assuming your query works - maybe it doesn't. I didn't really analyze it.
you'd need to spec out the inner query for us, in detail - very very clearly.
I simply took your work and made it return what you wanted. If your work doesn't work... then you'll need to work on that bit.
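(If it is helpful: one way to avoid both the numeric requirement and the collision problem raised above would be to build the group key from the sorted list of matching zips rather than their SUM. This is only a sketch, under the assumption that 11gR2's analytic LISTAGG is available, and it does not resolve the open questions about the spec:)
with t
as
( select distinct cust_no, dispatch_zip zip
    from dispatch_records
)
select distinct t1.cust_no c1, t2.cust_no c2, t1.zip,
       listagg(t1.zip, ',') within group (order by t1.zip)
         over (partition by t1.cust_no, t2.cust_no) zip_set
  from t t1, t t2
 where t1.cust_no < t2.cust_no
   and t1.zip = t2.zip;
The DENSE_RANK() could then be computed over ZIP_SET instead of SUM_ZIP, since two different sets of zips cannot produce the same delimited string, and the values need not be numeric.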
Shravani, August 03, 2011 - 10:51 am UTC
Sorry Again Tom.
Believe me, this is the first time on the AskTom forum that I am not able to explain properly what I want.
Actually my question is more on applying the analytical functions on more than one record, but I am not doing a good job of phrasing it.
For the below data, the grouping should be based on the matching zip codes. I used a very dirty method of doing a SUM(ZIP) OVER (PARTITION BY T1.CUST_NO, T2.CUST_NO).
The thinking here was to set up a base value on which the analytic functions could work.
My question is: without using SUM(), can we group the data? Basically, how do we tell Oracle to apply the rank based on the matching rows?
Output
===============================
C1 C2 ZIP
===============================
Cust_1 Cust_3 07001
Cust_1 Cust_4 07001
Cust_3 Cust_4 07001
Cust_3 Cust_5 07001
Cust_4 Cust_5 07001
Cust_1 Cust_2 07771
Cust_1 Cust_2 08731
Cust_1 Cust_6 07771
Cust_1 Cust_6 08731
Cust_2 Cust_5 07771
Cust_2 Cust_5 08731
Cust_2 Cust_6 07771
Cust_2 Cust_6 08731
Cust_5 Cust_6 07771
Cust_5 Cust_6 08731
Cust_1 Cust_5 07001
Cust_1 Cust_5 07771
Cust_1 Cust_5 08731
===============================
Regards
Shravani
August 03, 2011 - 12:39 pm UTC
what happens when you have customers that match twice but with different zips?
eg:
c1 X
c2 X
c1 Y
c2 Y
c3 A
c4 B
c3 A
c4 B
c1 and c2 match twice, c3 and c4 match twice - but they match on different "zips"
You need to be really really precise. It has nothing to do with asking for the first time in a forum - we are all software developers. Our *MOST IMPORTANT* skill - the skill you need more than any - is the skill of putting on paper what actually needs to be done.
You know, write a specification. From start to finish, in sufficient detail that a programmer can generate code from it.
Otherwise - you get garbage.
So, please spec it out - clearly, in detail, precisely.
Yes - it is hard, yes it is a pain, no - you cannot get out of doing it. It is what we do after all.
Sorry once again Tom
rupalimay21@gmail.com, August 04, 2011 - 11:39 am UTC
Hi Tom,
Hats off to you for the below line
"we are all software developers. Our *MOST IMPORTANT* skill - the skill you need more than any - is the skill of
putting on paper what actually needs to be done. "
Very true Tom.
I will give you a little background on the business requirement. The client I am working with acquired a few small companies and the data was integrated successfully.
Now in a few of the ledgers we are finding that there are a lot of duplicate customer records. Our functional experts came up with an algorithm to identify the duplicates.
They gave set of rules like
Rule 1. Customers with same registration date
Rule 2. Customers with same zip codes (This rule has to be dynamic, means in program we should have flexibility to increase number of matching zip codes)
Rule 3. Customers who have consumed goods of more than 5M in last 2 recent years
and so on.
note: all the rules are combined with an AND condition, and the rules are arranged by granularity, meaning Rule-1 is at a higher level of granularity than Rule-2.
As per the functional team, the rules can change at any time, hence they are looking for some kind of framework which will identify duplicates based on the defined rules.
We have created a stage table which holds all the customers. Using PL/SQL we process each rule and then flag the customer codes based on their group.
For example for the below scenario after the rule_1 process
Cust_1, Cust_2 and Cust_3 share the same registration date.
Cust_4 and Cust_5 share the same registration date.
DUPS_BASE
---------------------------------------------------------------------
CUST_NO Rule_1 Rule_2 Rule_3
----------------------------------------------------------------------
Cust_1 1
Cust_2 1
Cust_3 1
Cust_4 2
Cust_5 2
----------------------------------------------------------------------
Once Rule_1 is processed, the process takes the second rule and Cust_1, Cust_2 and Cust_3 get evaluated against Rule_2.
I asked a question specific to Rule_2. Basically, I want to bundle customers into groups so that the next process will pick them up and apply the next rule.
Now to answer your question, if
c1 X
c2 X
c1 Y
c2 Y
then C1 and C2 will be grouped under the same ranking. Basically, all the customers are grouped based on the matching zip codes, so that the next rule will be applied to that set of customers.
Tom, I am not sure whether I explained the requirement properly. I will not ask you to spend more time on this, because I know I have already taken a lot of your time.
Thanks so much for all your advices Tom.
Regards
Shravani
August 04, 2011 - 11:48 am UTC
I asked about this:
eg:
c1 X
c2 X
c1 Y
c2 Y
c3 A
c4 B
c3 A
c4 B
c1 and c2 match twice, c3 and c4 match twice - but they match on different "zips"
not just c1, c2 - what is the output for c1,c2,c3,c4
Also, cust_1 might appear in MANY groups then right? - did you/do you consider that?
cust_1 might have 5 zip codes, of which 3 match cust_2, 3 match cust_3, 3 match cust_4 - but cust_2,3,4 do not match each other three times. What then? What does the output look like at that point?
Differences in data content
Eric, November 12, 2011 - 9:57 pm UTC
To help me compare the contents of two tables for a project I am working on, I looked at ideas for a year and ended up writing the following. I hope it is not a really old idea. It populates a table with the primary key, the column name, and the data from both tables. In short, the primary key is reported as a tilde-delimited value, the two tables are joined by their primary keys, and the differences are reported. I have not used this for tables with more than 10 million rows. On smaller tables it will compare about 20 million values a minute on a slow server. Not sure how it handles big tables. One downside is that NVL forces a full table scan, but we are looking at the entire table anyway.
--
create table data_differences ( pkey_string varchar2(200) ,
column_name varchar2(30) , table1_name varchar2(30) ,
table2_name varchar2(30), table1_value varchar2(240) ,
table2_value varchar2(240), job_date date )
create table data_job_status ( column_name varchar2(30) ,
transaction_date date )
create table table_one ( id number(4) , source varchar2(12) ,
name varchar2(20) , transaction_date date , userid varchar2(20) )
create table table_two as select * from table_one
alter table table_one add ( constraint table_one_pk primary key ( id , source ) )
alter table table_two add ( constraint table_two_pk primary key ( id , source ) )
insert into table_one values ( 1 , 'CORP' , 'SMITH' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER10' )
insert into table_one values ( 2 , 'CORP' , 'SMITH' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER11' )
insert into table_one values ( 3 , 'CORP' , 'SMITH' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER15' )
insert into table_two values ( 1 , 'CORP' ,'SMITH' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER1' )
insert into table_two values ( 2 , 'CORP' , 'JOHNSON' ,
TO_DATE ( '11081999' , 'MMDDYYYY' ) , 'USER1' )
insert into table_two values ( 3 , 'CORP' , 'JOHNSON' ,
TO_DATE ( '11082011' , 'MMDDYYYY' ) , 'USER1' )
--
-- 2 of 3 rows returned
PKEY_STRING COLUMN_NAME TABLE1_NAME TABLE2_NAME TABLE1_VALUE TABLE2_VALUE
-------------- ---------------- -------------- -------------- --------------- ------------
2~CORP NAME TABLE_ONE TABLE_TWO SMITH JOHNSON
3~CORP TRANSACTION_DATE TABLE_ONE TABLE_TWO 08-NOV-99 08-NOV-11
--
-- BEGIN
-- COMPARE_TABLE_CONTENT ( 'GUEST' , 'GUEST' , 'TABLE_ONE' , 'TABLE_TWO' );
-- END;
--
CREATE OR REPLACE PROCEDURE compare_table_content
( in_owner1 in varchar , in_owner2 in varchar ,
in_table1 in varchar , in_table2 in varchar ) is
n number(8);
x number(8);
c_col varchar2(30); /* column name */
c_nam varchar2(30); /* primary key name */
d_typ varchar2(12); /* data type */
q_col varchar2(30); /* column for primary key */
char_col varchar2(32); /* column name in quotes */
ch_table1 varchar2(32);
ch_table2 varchar2(32);
q_str varchar2(500); /* query string */
x_str varchar2(1000); /* execute string */
ins_str varchar2(2000); /* insert string */
key_str varchar2(500); /* delimited primary key */
whr_str varchar2(2000); /* where string */
nls_dte varchar2(20) := '''DD-MON-YYYY'''; /* date fmt */
--
-- default values NVL function
--
nvl_chr varchar2(20) := '''QqQqQq''';
nvl_dte varchar2(40) := 'sysdate + 100';
--
-- Get primary key
--
cursor c1 ( in_owner in varchar , in_table in varchar ) is
select a.constraint_name , a.column_name , a.position
from all_cons_columns a , all_constraints b
where a.table_name = upper ( in_table )
and b.table_name = a.table_name
and b.owner = upper ( in_owner )
and a.owner = b.owner
and a.constraint_name = b.constraint_name
and b.constraint_type = 'P'
order by a.position;
--
-- Get column names not in primary key and not excluded
-- In this example, USERID has been excluded.
--
cursor c2 ( in_owner in varchar , in_table in varchar ,
pkey_name in varchar ) is
select a.column_name , a.data_type
from all_tab_columns a
where a.table_name = upper ( in_table )
and a.owner = upper ( in_owner )
and a.data_type in ( 'VARCHAR2' , 'NUMBER' , 'DATE' )
and not exists ( select 1 from all_cons_columns b
where a.owner = b.owner
and a.table_name = b.table_name
and b.constraint_name = upper ( pkey_name )
and a.column_name = b.column_name )
and a.column_name != 'USERID'
order by a.column_id;
begin
execute immediate 'truncate table data_job_status reuse storage';
execute immediate 'truncate table data_differences reuse storage';
ch_table1 := chr(39)||upper ( in_table1 )||chr(39);
ch_table2 := chr(39)||upper ( in_table2 )||chr(39);
open c1 ( in_owner1 , in_table1 );
loop
fetch c1 into c_nam , q_col , n;
exit when c1%notfound;
if ( n = 1 ) then
ins_str := 'insert into data_differences select ';
whr_str := ' where a.'||q_col||' = b.'||q_col;
key_str := ' a.'||q_col;
else
whr_str := whr_str||' and a.'||q_col||' = b.'||q_col;
key_str := key_str||'||'||chr(39)||'~'||chr(39)||'|| a.'||q_col;
end if;
end loop;
close c1;
open c2 ( in_owner1 , in_table1 , c_nam );
loop
fetch c2 into c_col , d_typ;
exit when c2%notfound;
char_col := chr(39)||c_col||chr(39);
if ( d_typ = 'NUMBER' ) then
q_str := ' AND NVL ( A.'||c_col||' , -99.9191 ) '||
' != NVL ( B.'||c_col||' , -99.9191 )';
x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' to_char ( a.'||c_col||' ) , to_char ( b.'||c_col||' ) '||
' , sysdate from '||
' '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||
whr_str||' '||q_str;
elsif ( d_typ = 'DATE' ) then
q_str := ' AND NVL ( A.'||c_col||' , '||nvl_dte||' ) '||
' != NVL ( B.'||c_col||' , '||nvl_dte||' ) ';
x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' to_char ( a.'||c_col||' , '||nls_dte||' ) , '||
' to_char ( b.'||c_col||' , '||nls_dte||' ) , sysdate '||
' FROM '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||whr_str||' '||q_str;
else
q_str := ' AND NVL ( A.'||c_col||' , '||nvl_chr||' )'||
' != NVL ( B.'||c_col||' , '||nvl_chr||' )';
x_str := ins_str||' '||key_str||' , '||char_col||
' , '||ch_table1||' , '||ch_table2||' , '||
' a.'||c_col||' , b.'||c_col||' , sysdate '||
' FROM '||in_owner1||'.'||in_table1||' a , '||in_owner2||
'.'||in_table2||' b '||whr_str||' '||q_str;
end if;
insert into data_job_status values ( c_col , sysdate );
commit;
execute immediate ( x_str );
commit;
end loop;
close c2;
commit;
end compare_table_content;
/
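A small note on the NVL sentinel values above: since a sentinel like 'QqQqQq' breaks if it ever appears in real data, the generated predicate could instead use DECODE, which treats two NULLs as equal - the same trick used elsewhere in this thread. A sketch of what the generated comparison could look like for the NAME column of the sample tables above:
select a.id, a.source, a.name table1_value, b.name table2_value
  from table_one a, table_two b
 where a.id     = b.id
   and a.source = b.source
   and decode( a.name, b.name, 1, 0 ) = 0;  -- 1 = same (two NULLs included), 0 = different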
Comparing versioned records
Parthiban Nagarajan, March 03, 2012 - 2:39 am UTC
drop table my_cmp purge;
-- This is the main table
-- VERSION_NO is incremented for each UPDATE
-- The record before the change is copied to another table (my_cmp_audit - given below)
create table my_cmp
(
key_col1x varchar2(10),
key_col2x varchar2(10),
info_col1x date,
info_col2x number,
oth_col1x varchar2(10),
oth_col2x date,
oth_col3x number,
version_no number,
constraint my_cmp_pk primary key (key_col1x, key_col2x)
);
begin
insert into my_cmp values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hello', sysdate, 10, 5);
insert into my_cmp values ('k1', '2', to_date('03mar1970', 'dd-mon-yyyy'), 42, 'World', sysdate, 10, 2);
end;
/
drop table my_cmp_audit purge;
-- This table contains the history of records from my_cmp
create table my_cmp_audit
(
key_col1x varchar2(10),
key_col2x varchar2(10),
info_col1x date,
info_col2x number,
oth_col1x varchar2(10),
oth_col2x date,
oth_col3x number,
version_no number,
constraint my_cmp_audit_pk primary key (key_col1x, key_col2x, version_no)
);
-- Sample data
begin
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hello', sysdate , 10, 4);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Hell' , trunc(sysdate), 10, 3);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 27, 'Help' , sysdate , 10, 2);
insert into my_cmp_audit values ('k1', '1', to_date('16feb1985', 'dd-mon-yyyy'), 1 , 'xxxxx', sysdate-3333 , 5, 1);
--
insert into my_cmp_audit values ('k1', '2', to_date('03mar1970', 'dd-mon-yyyy'), 42, 'World', sysdate - 50, 10, 1);
end;
/
-- This view shows us the record level comparison side-by-side (using self-join)
-- INFO_COL is not compared; OTH_COLs are compared
-- OTH_COL is given $ suffix if it is from the AUDIT table
-- But this view only gets us halfway there
create or replace view MY_CMP_VU1 as
with all_recs as (
select KEY_COL1X, KEY_COL2X, INFO_COL1X, OTH_COL1X, OTH_COL2X, OTH_COL3X, VERSION_NO
from MY_CMP
union all
select KEY_COL1X, KEY_COL2X, INFO_COL1X, OTH_COL1X as OTH_COL1$, OTH_COL2X as OTH_COL2$, OTH_COL3X as OTH_COL3$, VERSION_NO
from MY_CMP_AUDIT
)
select KEY_COL1X, KEY_COL2X, a.INFO_COL1X, a.OTH_COL1X, a.OTH_COL2X, a.OTH_COL3X, b.OTH_COL1X as OTH_COL1$, b.OTH_COL2X as OTH_COL2$, b.OTH_COL3X as OTH_COL3$, a.VERSION_NO
from (select * from all_recs) a
join (select * from all_recs) b
using (KEY_COL1X, KEY_COL2X)
where b.VERSION_NO = a.VERSION_NO - 1;
select * from my_cmp_vu1 order by key_col1x, key_col2x, version_no;
KEY_COL1X KEY_COL2X INFO_COL1X OTH_COL1X OTH_COL2X OTH_COL3X OTH_COL1$ OTH_COL2$ OTH_COL3$ VERSION_NO
---------- ---------- ---------------- ---------- ---------------- ---------- ---------- ---------------- ---------- ----------
k1 1 16/02/1985 00:00 Help 03/03/2012 14:03 10 xxxxx 17/01/2003 14:03 5 2
k1 1 16/02/1985 00:00 Hell 03/03/2012 00:00 10 Help 03/03/2012 14:03 10 3
k1 1 16/02/1985 00:00 Hello 03/03/2012 14:03 10 Hell 03/03/2012 00:00 10 4
k1 1 16/02/1985 00:00 Hello 03/03/2012 14:03 10 Hello 03/03/2012 14:03 10 5
k1 2 03/03/1970 00:00 World 03/03/2012 14:03 10 World 13/01/2012 14:03 10 2
-- This view gives the complete and expected result
create or replace view my_cmp_vu2 as
select key_col1x, key_col2x, info_col1x,
field_name, old_value, new_value
from
(
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL1X' field_name, to_char(oth_col1x) new_value, to_char(oth_col1$) old_value
from my_cmp_vu1
where oth_col1x <> oth_col1$
union all
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL2X', to_char(oth_col2x), to_char(oth_col2$)
from my_cmp_vu1
where oth_col2x <> oth_col2$
union all
select key_col1x, key_col2x, info_col1x, version_no,
'OTH_COL3X', to_char(oth_col3x), to_char(oth_col3$)
from my_cmp_vu1
where oth_col3x <> oth_col3$
)
order by 1, 2, version_no;
select * from my_cmp_vu2 order by key_col1x, key_col2x;
KEY_COL1X KEY_COL2X INFO_COL1X FIELD_NAM OLD_VALUE NEW_VALUE
---------- ---------- ---------------- --------- ---------------------------------------- ----------------------------------------
k1 1 16/02/1985 00:00 OTH_COL1X xxxxx Help
k1 1 16/02/1985 00:00 OTH_COL2X 17/01/2003 14:03 03/03/2012 14:03
k1 1 16/02/1985 00:00 OTH_COL3X 5 10
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 14:03 03/03/2012 00:00
k1 1 16/02/1985 00:00 OTH_COL1X Help Hell
k1 1 16/02/1985 00:00 OTH_COL1X Hell Hello
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 00:00 03/03/2012 14:03
k1 1 16/02/1985 00:00 OTH_COL2X 03/03/2012 14:03 03/03/2012 14:03
k1 2 03/03/1970 00:00 OTH_COL2X 13/01/2012 14:03 03/03/2012 14:03
But when I think about performance, it's hard for me to accept this as the optimal solution,
because in the production environment the data volume will be large and there are more than 200 columns to be compared.
Please give me some ideas.
Thanks and regards
March 03, 2012 - 8:54 am UTC
what do you need to optimize for? to retrieve a single record, a hundred records via some predicate, all records? what is large to you?
Re: Comparing versioned records
Parthiban Nagarajan, March 04, 2012 - 2:08 am UTC
Hi Tom
I want to scan all the records of a monthly partition.
This is to report how the records are getting mutated.
So, every month, a new partition is added and at the end of the month, the report is generated for that partition.
Also, there are a couple of tables which are list partitioned on a region flag. This needs to be scanned based on the LAST_UPDATE_DATETIME field.
Thanks and regards
March 04, 2012 - 9:17 am UTC
who would read such a lengthy report? Is this *really* a need? Since it can be generated anytime, for any set of records of interest - why would you generate a report no human being on the planet could even ever read/understand/make use of?
Do you really want to do this? truly? It seems like a big waste?
You can either do the self join as you are doing, or union all this data together and use lag or lead to marry the current row with the previous or next row to do the comparison.
but - if you ask me - I wouldn't do it, no one could ever look at it.
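For reference, the union all / LAG alternative mentioned above might look something like this (a sketch over the my_cmp and my_cmp_audit tables from the question, replacing the self-join in MY_CMP_VU1; rows with no prior version come back with NULL previous values):
with all_recs as (
  select key_col1x, key_col2x, info_col1x, oth_col1x, oth_col2x, oth_col3x, version_no
    from my_cmp
  union all
  select key_col1x, key_col2x, info_col1x, oth_col1x, oth_col2x, oth_col3x, version_no
    from my_cmp_audit
)
select key_col1x, key_col2x, info_col1x, version_no,
       oth_col1x, lag(oth_col1x) over (partition by key_col1x, key_col2x order by version_no) oth_col1$,
       oth_col2x, lag(oth_col2x) over (partition by key_col1x, key_col2x order by version_no) oth_col2$,
       oth_col3x, lag(oth_col3x) over (partition by key_col1x, key_col2x order by version_no) oth_col3$
  from all_recs;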
Re: Comparing versioned records
Parthiban Nagarajan, March 04, 2012 - 10:33 am UTC
Hi Tom
Believe it or not, we are already doing this.
But with a package, object types and their corresponding table types, and row-by-row processing with dynamic SQL to collect the column names, etc.
I just want to make it simple so that the maintenance of the code would be easy.
I am not sure how the report is used. They might check only for a few keys (or records) based on some other report.
Also, this report is to list only the records that are modified (i.e. version_no > 1 or the records which have at least one matching record in the audit table).
Please give me some ideas to tune the 2nd view, or if it cannot be tuned, please give me your comments on the 2nd view from a performance angle.
And in my opinion this should still be better than its existing form (package, dynamic SQL, etc.).
Thanks and regards
March 05, 2012 - 6:53 am UTC
ops$tkyte%ORA11GR2> with data as (select level l from dual connect by level <= 3)
2 select *
3 from (
4 select key_col1x, key_col2x, info_col1x, version_no,
5 'OTH_COL'||l||'X' field_name,
6 decode( l, 1, to_char(oth_col1x), 2, to_char(oth_col2x), 3, to_char(oth_col3x) ) new_value,
7 decode( l, 1, to_char(oth_col1$), 2, to_char(oth_col2$), 3, to_char(oth_col3$) ) old_value
8 from my_cmp_vu1, data
9 )
10 where decode( new_value, old_value, 1, 0 ) = 0
11 /
KEY_COL1X KEY_COL2X INFO_COL1 VERSION_NO FIELD_NAME NEW_VALUE OLD_VALUE
---------- ---------- --------- ---------- ---------- ---------- ----------
k1 1 16-FEB-85 4 OTH_COL1X Hello Hell
k1 1 16-FEB-85 3 OTH_COL1X Hell Help
k1 1 16-FEB-85 2 OTH_COL1X Help xxxxx
k1 1 16-FEB-85 2 OTH_COL2X 05-MAR-12 19-JAN-03
k1 1 16-FEB-85 2 OTH_COL3X 10 5
k1 2 03-MAR-70 2 OTH_COL2X 05-MAR-12 15-JAN-12
6 rows selected.
try that one on (or look up pivot if you are in 11g)
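For completeness, an 11g UNPIVOT version of the same step might look like this (a sketch against the MY_CMP_VU1 view from the question; the TO_CHARs keep the unpivoted measures in a single datatype):
select key_col1x, key_col2x, info_col1x, version_no,
       field_name, new_value, old_value
  from ( select key_col1x, key_col2x, info_col1x, version_no,
                to_char(oth_col1x) c1_new, to_char(oth_col1$) c1_old,
                to_char(oth_col2x) c2_new, to_char(oth_col2$) c2_old,
                to_char(oth_col3x) c3_new, to_char(oth_col3$) c3_old
           from my_cmp_vu1 )
 unpivot ( (new_value, old_value)
           for field_name in ( (c1_new, c1_old) as 'OTH_COL1X',
                               (c2_new, c2_old) as 'OTH_COL2X',
                               (c3_new, c3_old) as 'OTH_COL3X' ) )
 where decode( new_value, old_value, 1, 0 ) = 0;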
Re: Comparing versioned records
Parthiban Nagarajan, March 06, 2012 - 10:37 am UTC
Hi Tom
Yes. As you mentioned, I need to UNPIVOT. I did it with the CROSS JOIN and DECODE, as of now. I will try the UNPIVOT operator also.
Thanks for your invaluable guidance on this.
Thanks and regards
Compare Tables & Aggregate Results
Ian Wallace, January 02, 2013 - 5:23 am UTC
Hi Tom,
This has been a great thread.
If I run the following query as suggested above:
SELECT source_code,
lcn_id,
ssm_id,
source_client,
tct_type,
tct_subtype,
value_date,
interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned,
COUNT(src1) base_tab,
COUNT(src2) refresh_tab
from
( SELECT a.*,
1 src1,
TO_NUMBER(NULL) src2
FROM tr_owner2.daily_interest_active2 a
UNION ALL
SELECT b.*,
TO_NUMBER(NULL) src1,
2 src2
FROM tr_owner.daily_interest_active2 b
)
GROUP BY source_code,
lcn_id,
ssm_id,
source_client,
tct_type,
tct_subtype,
value_date,
interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned
HAVING COUNT(src1) <> count(src2);
I get:
source_code lcn_id ssm_id source_client tct_type tct_subtype value_date interest_accrued interest_earned ccy_id interest_rate incremental_earned_home incremental_earned base_tab refresh_tab
----------- ------ ------ ------------- -------- ----------- ---------- ---------------- --------------- ------ ------------- ----------------------- ------------------ -------- -----------
111111 0IRE AC 999999 INT 01/04/2006 526.08 -19.13 GBP 4.6 1 0
111111 0IRE AC 999999 INT 01/04/2006 526.08 -19.13 GBP 4.9 0 1
The PK for both tables is:
source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date
What I would like to do is combine both rows into one row based on PK as follows:
source_code lcn_id ssm_id source_client tct_type tct_subtype value_date interest_accrued_old interest_accrued_new interest_earned_old interest_earned_new interest_rate_old interest_rate_new
----------- ------ ------ ------------- -------- ----------- ---------- -------------------- -------------------- ------------------- ----------------------- ------------------ -----------------
111111 0IRE AC 999999 INT 01/04/2006 526.08 526.08 -19.13 -19.13 4.6 4.9
I'm stuck, however, on just how to go about doing that.
Apologies if this has already been answered.
Thanks,
Ian.
January 04, 2013 - 2:45 pm UTC
what makes one old or new??? I'll assume base_tab is "old" and refresh_tab is "new"
anyway - you have your query Q above, it returns a "table".
that table includes:
source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date
interest_accrued,
interest_earned,
ccy_id,
interest_rate,
incremental_earned_home,
incremental_earned,
and the base_tab, refresh_tab counts
You want to 'pivot' the interest_accrued, interest_earned and interest_rates. You can use pivot in 11g - or in all releases "max(decode)", I'll do max(decode) since you didn't say...
select source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date,
max(decode( base_tab, 1, interest_accrued )) ia_old,
max(decode( refresh_tab, 1, interest_accrued )) ia_new,
... same for the other two columns ...
from (Q)
group by source_code,lcn_id,ssm_id,source_client,tct_type,tct_subtype,value_date
had I been given a create table, some inserts (a test case in short), I could have written the query syntactically correct - but I wasn't, so this is just pseudo code... search this site for pivot for full examples....
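Filled out with the column names from the question, and assuming base_tab marks the "old" row and refresh_tab the "new" one, the pivot might look like this (untested, since no test case was supplied; Q is simply the compare query posted above, repeated in the WITH clause):
with q as (
  select source_code, lcn_id, ssm_id, source_client, tct_type, tct_subtype, value_date,
         interest_accrued, interest_earned, ccy_id, interest_rate,
         incremental_earned_home, incremental_earned,
         count(src1) base_tab, count(src2) refresh_tab
    from ( select a.*, 1 src1, to_number(null) src2
             from tr_owner2.daily_interest_active2 a
           union all
           select b.*, to_number(null) src1, 2 src2
             from tr_owner.daily_interest_active2 b )
   group by source_code, lcn_id, ssm_id, source_client, tct_type, tct_subtype, value_date,
            interest_accrued, interest_earned, ccy_id, interest_rate,
            incremental_earned_home, incremental_earned
  having count(src1) <> count(src2)
)
select source_code, lcn_id, ssm_id, source_client, tct_type, tct_subtype, value_date,
       max(decode( base_tab,    1, interest_accrued )) interest_accrued_old,
       max(decode( refresh_tab, 1, interest_accrued )) interest_accrued_new,
       max(decode( base_tab,    1, interest_earned ))  interest_earned_old,
       max(decode( refresh_tab, 1, interest_earned ))  interest_earned_new,
       max(decode( base_tab,    1, interest_rate ))    interest_rate_old,
       max(decode( refresh_tab, 1, interest_rate ))    interest_rate_new
  from q
 group by source_code, lcn_id, ssm_id, source_client, tct_type, tct_subtype, value_date;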
I don't know how does query work....confused
Texas Cowboy, June 10, 2013 - 4:03 pm UTC
Tom,
your responses to several discussions have been awesome. I just have a question about your query (shown below) to several posts.
I just don't know why you chose 1 for src1 and 2 for src2 in the UNION ALL subquery. I am having trouble understanding how this query compares the data and shows only differences. What effect do 1 and 2 have when you count the records in the outer query?
select c1, c2, c3,
2 count(src1) CNT1,
3 count(src2) CNT2
4 from
5 ( select a.*,
6 1 src1,
7 to_number(null) src2
8 from a
9 union all
10 select b.*,
11 to_number(null) src1,
12 2 src2
13 from b
14 )
15 group by c1,c2,c3
16 having count(src1) <> count(src2)
17 /
June 18, 2013 - 2:42 pm UTC
they could have been 0, 0 - or anything. Just needed something (not null) to count. 1 and 2 seemed "appropriate" given their name source 1 and source 2.
If you have a key (c1,c2,c3) that exists only in A, then "count(src1) cnt1" would be 1 and "count(src2) cnt2" would be 0 - we would see the row exists in A but not B.
and vice versa.
if the key (c1,c2,c3) existed in both tables - then count(src1/src2) would be 1 and we would not display that row.
Yes now I understand....
Texas Cowboy, June 18, 2013 - 4:04 pm UTC
I am such a dolt for not reading it carefully the first time. I was thinking you were doing SUM() instead of COUNT(), so naturally I thought: why would 1 ever equal 2? Alas, it was COUNT(), and your explanation totally made sense. I took this query, modified it to fit my tables, and it worked very well.
A count of 2 to 1 or 1 to 2 means there is a duplicate in one table.
A count of 1 to 0 or 0 to 1 means there is no row with that PK in one table.
A count of 1 and 1 will not be displayed because the row is in both tables and is the same.
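A tiny self-contained demonstration of the duplicate case (hypothetical data built from DUAL): the key 'X' appears twice in source 1 and once in source 2, so it is reported with counts 2 and 1.
select c1, count(src1) cnt1, count(src2) cnt2
  from ( select 'X' c1, 1 src1, to_number(null) src2 from dual
         union all
         select 'X', 1, to_number(null) from dual
         union all
         select 'X', to_number(null), 2 from dual )
 group by c1
having count(src1) <> count(src2);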
oracle
A reader, December 29, 2015 - 12:13 pm UTC
nice