Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Abi.

Asked: May 21, 2012 - 11:56 pm UTC

Last updated: May 25, 2012 - 9:17 am UTC

Version: 10g

Viewed 1000+ times

You Asked

Hi,

Thanks for a forum like this to post our questions and get awesome solutions from you.

My question is

I have 2 huge tables (around 1 million records) TableA and TableB in the same structure. In each of this table I have 28 coulmns. Out of this 28 columns 1 column is primary key. I need to compare the two tables and display the result in the following format.
Note: I don’t have to report the columns that have not changed.

PK ColumnName OldValue NewValue
--------------------------------------------------------------------------
1 Column1 100 200
1 Column5 ABC XYZ

I have tried
(SELECT * FROM TableA
MINUS
SELECT * FROM TableB)
UNION ALL
(SELECT * FROM TableA
MINUS
SELECT * FROM TableB)

This is taking time and also I am not able to display the changes in the required format.

For Demo purpose, please use the following script

CREATE TABLE emp_new
(eno NUMBER PRIMARY KEY,
ename VARCHAR2(100),
salary NUMBER(13, 2),
Commission NUMBER(13, 2),
HireDate DATE,
Designation varchar(100)
);

CREATE TABLE emp_old
(eno NUMBER PRIMARY KEY,
ename VARCHAR2(100),
salary NUMBER(13, 2),
Commission NUMBER(13, 2),
HireDate DATE,
Designation varchar(100)
);

insert into emp_new
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(100, 'Name1', 1000, 0, '13-MAY-2001', 'CEO');

insert into emp_new
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(200, 'Name2', 2000, 0, '13-JUN-2001', 'ACCOUNTANT');

insert into emp_new
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(300, 'Name3', 3000, 0, '13-JUL-2001', 'SENIOR ANALYST');

insert into emp_new
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(400, 'Name4', 4000, 0, '13-AUG-2001', 'ANALYST');

insert into emp_new
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(500, 'Name5', 5000, 0, '13-SEP-2001', 'SALES MANAGER');

insert into emp_old
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(100, 'Name11', 1000, 0, '13-MAY-2011', 'CEO');

insert into emp_old
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(200, 'Name2', 2000, 0, '13-JUN-2001', 'ACCOUNTANT');

insert into emp_old
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(300, 'Name32', 3000, 0, '13-JUL-2011', 'SENIOR ANALYST');

insert into emp_old
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(400, 'Name4', 4000, 0, '13-AUG-2001', 'SENIOR ANALYST');

insert into emp_old
(eno, ename, salary, Commission, HireDate, Designation)
VALUES
(500, 'Name5', 10000, 0, '13-SEP-2001', 'SALES MANAGER');


Thanks in advance.

Regards
Abi


and Tom said...

One million records hasn't been huge for a long long time. In fact, I think of it as rather small - I use that many on my laptop for examples a lot.


See this article:

http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

It does not get the "precise" format you asked for - but it is very close and infinitely more compact/readable I believe. You can at least use that as a starting point for getting your ultimate result if you don't like my format.

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Report format

A reader, May 22, 2012 - 11:50 pm UTC

Tom,
Thanks a lot for the link. It was good.
Tm, After comparing the two tables, I need the report in the following format

PK ColumnName OldValue NewValue
-----------------------------------------------------------1 Column1 100 200
1 Column5 ABC XYZ

Please help me.
Thanks
Regards
Abi
Tom Kyte
May 23, 2012 - 8:46 am UTC

ugh... taking it to the next level for you:



ops$tkyte%ORA11GR2> create table a as select object_id c1, object_name c2, created c3 from all_objects;

Table created.

ops$tkyte%ORA11GR2> create table b as select * from a;

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> update a set c2 = lower(c2) where c1 = (select object_id from all_objects where owner='SCOTT' and object_name='EMP');

1 row updated.

ops$tkyte%ORA11GR2> update b set c2 = lower(c2) where c1 = (select object_id from all_objects where owner='SCOTT' and object_name='DEPT');

1 row updated.

ops$tkyte%ORA11GR2> update a set c3 = c3+1 where c1 = (select object_id from all_objects where owner='SCOTT' and object_name = 'SALGRADE');

1 row updated.

ops$tkyte%ORA11GR2> update b set c3 = c3-1 where c1 = (select object_id from all_objects where owner='SCOTT' and object_name = 'BONUS');

1 row updated.

ops$tkyte%ORA11GR2> delete from a where c1 = (select object_id from all_objects where owner='SYS' and object_name = 'OBJ$');

1 row deleted.

ops$tkyte%ORA11GR2> delete from b where c1 = (select object_id from all_objects where owner='SYS' and object_name = 'COL$');

1 row deleted.

ops$tkyte%ORA11GR2> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

Session altered.

ops$tkyte%ORA11GR2> select c1,
  2         decode( R, 1, decode(old_c2,new_c2,null,'C2'),
  3                        2, decode(old_c3,new_c3,null,'C3') ) cname,
  4             decode( R, 1, old_c2,
  5                        2, old_c3 ) oldValue,
  6             decode( R, 1, new_c2,
  7                        2, new_c3 ) newValue
  8    from (
  9  select c1,
 10         case when cnt1=0 then null    when cnt1=1 then old_c2 end old_c2,
 11         case when cnt1=0 then old_c2  when cnt1=1 then new_c2 end new_c2,
 12         case when cnt1=0 then null    when cnt1=1 then old_c3 end old_c3,
 13         case when cnt1=0 then old_c3  when cnt1=1 then new_c3 end new_c3,
 14             rn,
 15             cnt1,
 16             cnt2
 17    from (
 18  select c1,
 19         c2 old_c2,
 20             lead(c2) over (partition by c1 order by cnt1 desc) new_c2,
 21         c3 old_c3,
 22             lead(c3) over (partition by c1 order by cnt1 desc) new_c3,
 23             row_number() over (partition by c1 order by cnt1 desc) rn,
 24             cnt1, cnt2
 25    from (
 26  select c1, c2, c3,
 27         count(src1) CNT1,
 28          count(src2) CNT2
 29     from
 30   ( select a.*,
 31          1 src1,
 32          to_number(null) src2
 33       from a
 34      union all
 35     select b.*,
 36          to_number(null) src1,
 37          2 src2
 38       from b
 39    )
 40   group by c1,c2,c3
 41  having count(src1) <> count(src2)
 42         )
 43         )
 44   where rn = 1
 45         ), (select level R from dual connect by level <= 2)
 46   where decode( R, 1, decode(old_c2,new_c2,null,'C2'),
 47                        2, decode(old_c3,new_c3,null,'C3') ) is not null
 48  order by 1
 49  /

        C1 CN OLDVALUE                       NEWVALUE
---------- -- ------------------------------ ------------------------------
        18 C2                                OBJ$
        18 C3                                05-sep-2010 15:39:55
        21 C2 COL$
        21 C3 05-sep-2010 15:39:55
     86429 C2 emp                            EMP
     86430 C2 DEPT                           dept
     86431 C3 16-mar-2011 08:23:05           15-mar-2011 08:23:05
     86432 C3 17-mar-2011 08:23:05           16-mar-2011 08:23:05

8 rows selected.




replace my use of 2 in the last inline view with the number of columns you have...

add more to the decode to deal with them


Query to compare and display the results.

Rajeshwaran, Jeyabal, May 23, 2012 - 7:04 am UTC

Does this helps you ?


rajesh@ORA10GR2> variable q1 varchar2(40);
rajesh@ORA10GR2> variable q2 varchar2(40);
rajesh@ORA10GR2> exec :q1 := 'select * from emp_old order by eno'

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
rajesh@ORA10GR2> exec :q2 := 'select * from emp_new order by eno'

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
rajesh@ORA10GR2>
rajesh@ORA10GR2> column rnum format 99999;
rajesh@ORA10GR2> column cname format a20;
rajesh@ORA10GR2> column old_val format a30;
rajesh@ORA10GR2> column new_val format a30;
rajesh@ORA10GR2>
rajesh@ORA10GR2>
rajesh@ORA10GR2> select t1.rnum ,t1.cname, t1.val as old_val,t2.val as new_val
  2  from table (cols_as_rows(:q1)) t1,
  3    table (cols_as_rows(:q2)) t2
  4  where t1.rnum = t2.rnum
  5  and t1.cname = t2.cname
  6  and t1.val <> t2.val;

  RNUM CNAME                OLD_VAL                        NEW_VAL
------ -------------------- ------------------------------ -----------------------------
     1 ENAME                Name11                         Name1
     1 HIREDATE             13-may-2011                    13-may-2001
     3 ENAME                Name32                         Name3
     3 HIREDATE             13-jul-2011                    13-jul-2001
     4 DESIGNATION          SENIOR ANALYST                 ANALYST
     5 SALARY               10000                          5000

6 rows selected.

Elapsed: 00:00:00.10
rajesh@ORA10GR2>
rajesh@ORA10GR2>

Here is the implementation details for the function cols_as_rows.
rajesh@ORA10GR2> create or replace type mytabtype as object(rnum number,cname varchar2(30),val varchar2(4000));
  2  /

Type created.

Elapsed: 00:00:00.06
rajesh@ORA10GR2>
rajesh@ORA10GR2> create or replace type mytab is table of mytabtype;
  2  /

Type created.

Elapsed: 00:00:00.03
rajesh@ORA10GR2>
rajesh@ORA10GR2> create or replace
  2  function cols_as_rows(p_query in VARCHAR2)
  3  return mytab
  4  pipelined
  5  as
  6    l_cursor   number default dbms_sql.open_cursor;
  7    l_column_count  number := 0;
  8    l_desctab          dbms_sql.desc_tab;
  9    l_column_value  varchar2(4000);
 10  l_status        number;
 11   l_rnum             number default 1;
 12  begin
 13     dbms_sql.parse(l_cursor,p_query,dbms_sql.native);
 14     dbms_sql.describe_columns (l_cursor,l_column_count,l_desctab);
 15
 16     for i in 1..l_column_count
 17     loop
 18             dbms_sql.define_column (l_cursor,i,l_column_value,4000);
 19     end loop;
 20
 21          l_status := dbms_sql.execute(l_cursor);
 22     while ( dbms_sql.fetch_rows(l_cursor) > 0 )
 23    loop
 24             for i in 1..l_column_count
 25             loop
 26                     dbms_sql.column_value (l_cursor,i,l_column_value);
 27                     --dbms_output.put_line ( rpad(' ',5,' ')||rpad(l_desctab(i).col_name,20,'-') ||' '|| l_column_value );
 28                     pipe row(mytabtype(l_rnum,l_desctab(i).col_name,l_column_value));
 29             end loop;
 30                    l_rnum := l_rnum + 1;
 31     end loop;
 32
 33    dbms_sql.close_cursor(l_cursor);
 34    return;
 35  end cols_as_rows;
 36  /

Function created.

Elapsed: 00:00:00.06
rajesh@ORA10GR2>



I see these results from your sample data's based on your queries.
rajesh@ORA10GR2> (SELECT * FROM emp_old
  2  MINUS
  3  SELECT * FROM emp_new)
  4  UNION
  5  (SELECT * FROM emp_old
  6  MINUS
  7  SELECT * FROM emp_new) ;

    ENO ENAME           SALARY COMMISSION HIREDATE                  DESIGNATION
------- --------------- ------ ---------- ------------------------- --------------------
    100 Name11            1000          0 13-may-2011               CEO
    300 Name32            3000          0 13-jul-2011               SENIOR ANALYST
    400 Name4             4000          0 13-aug-2001               SENIOR ANALYST
    500 Name5            10000          0 13-sep-2001               SALES MANAGER

Elapsed: 00:00:00.03
rajesh@ORA10GR2>
rajesh@ORA10GR2> (SELECT * FROM emp_old
  2  MINUS
  3  SELECT * FROM emp_new)
  4  UNION ALL
  5  (SELECT * FROM emp_old
  6  MINUS
  7  SELECT * FROM emp_new) ;

    ENO ENAME           SALARY COMMISSION HIREDATE                  DESIGNATION
------- --------------- ------ ---------- ------------------------- --------------------
    100 Name11            1000          0 13-may-2011               CEO
    300 Name32            3000          0 13-jul-2011               SENIOR ANALYST
    400 Name4             4000          0 13-aug-2001               SENIOR ANALYST
    500 Name5            10000          0 13-sep-2001               SALES MANAGER
    100 Name11            1000          0 13-may-2011               CEO
    300 Name32            3000          0 13-jul-2011               SENIOR ANALYST
    400 Name4             4000          0 13-aug-2001               SENIOR ANALYST
    500 Name5            10000          0 13-sep-2001               SALES MANAGER

8 rows selected.

Elapsed: 00:00:01.23
rajesh@ORA10GR2>
rajesh@ORA10GR2>

How about using Table function and make it as generic ?

Rajeshwaran, Jeyabal, May 23, 2012 - 9:01 am UTC

replace my use of 2 in the last inline view with the number of columns you have...
add more to the decode to deal with them


Tom, Rather than adding more decode's How about an generalized solution like this?

drop table a purge;
drop table b purge;
create table a as select object_id c1, object_name c2, created c3 from 
all_objects;
create table b as select * from a;
update a set c2 = lower(c2) where c1 = (select object_id from all_objects where 
owner='SCOTT' and object_name='EMP');
update b set c2 = lower(c2) where c1 = (select object_id from all_objects where 
owner='SCOTT' and object_name='DEPT');
update a set c3 = c3+1 where c1 = (select object_id from all_objects where 
owner='SCOTT' and object_name = 'SALGRADE');
update b set c3 = c3-1 where c1 = (select object_id from all_objects where 
owner='SCOTT' and object_name = 'BONUS');
delete from a where c1 = (select object_id from all_objects where owner='SYS' 
and object_name = 'OBJ$');
delete from b where c1 = (select object_id from all_objects where owner='SYS' 
and object_name = 'COL$');

variable q1 varchar2(40);
variable q2 varchar2(40);
exec :q1 := 'select * from a order by c1'
exec :q2 := 'select * from b order by c1'
column rnum format 99999;
column cname format a20;
column old_val format a30;
column new_val format a30;

rajesh@ORA10GR2>
rajesh@ORA10GR2> select t1.rnum ,t1.cname, t1.val as old_val,t2.val as new_val
  2  from table (cols_as_rows(:q1)) t1,
  3    table (cols_as_rows(:q2)) t2
  4  where t1.rnum = t2.rnum
  5  and t1.cname = t2.cname
  6  and t1.val <> t2.val;

  RNUM CNAME                OLD_VAL                        NEW_VAL
------ -------------------- ------------------------------ ------------------------
    17 C1                   19                             18
    17 C2                   IND$                           OBJ$
    18 C1                   20                             19
    18 C2                   ICOL$                          IND$
    19 C1                   21                             20
    19 C2                   COL$                           ICOL$
 55624 C3                   29-nov-2010                    28-nov-2010
 55625 C3                   30-nov-2010                    29-nov-2010
 55639 C2                   emp                            EMP
 55640 C2                   DEPT                           dept

10 rows selected.

Elapsed: 00:00:02.79
rajesh@ORA10GR2>

Tom Kyte
May 23, 2012 - 11:07 am UTC

cols_as_rows would not be efficient on "large" things. I built that function to deal with a single or very very few rows.

and - you'd have to deal with the fact the join should be on C1 - not on RNUM. Missing/additional rows messes up your join entirely. You'd have to write a "custom" cols_as_rows that output C1 on every row instead of RNUM

and you'd want to used "decode(t1.val,t2.val,1,0) = 0" to deal with nulls



how about my below query?

A reader, May 25, 2012 - 3:57 am UTC

below query should work with the original posted create-insert script.


select pk, cast(regexp_replace(t,',.*') as varchar2(20)) columnname,
cast(regexp_replace(t,'.*,(.*),.*','\1') as varchar2(20)) newvalue,
cast(regexp_replace(t,'.*,') as varchar2(20)) oldvalue from (
select pk, cast(decode(level,1,name,2,DESIGNATION,3,SALARY,4,COMMISSION,5,HIREDATE) as varchar2(80)) t from (
select a.eno pk,decode(a.ename, b.ename,null,'NAME'||', '||a.ename||', '||b.ename) name,
decode(a.DESIGNATION, b.DESIGNATION,null,'DESIGNATION'||', '||a.DESIGNATION||', '||b.DESIGNATION) DESIGNATION,
decode(a.SALARY, b.SALARY,null,'SALARY'||', '||a.SALARY||', '||b.SALARY) SALARY,
decode(a.COMMISSION, b.COMMISSION,null,'COMMISSION'||', '||a.COMMISSION||', '||b.COMMISSION) COMMISSION,
decode(a.HIREDATE, b.HIREDATE,null,'HIREDATE'||', '||a.HIREDATE||', '||b.HIREDATE) HIREDATE
from emp_new a, emp_old b where a.eno = b.eno)
connect by level<=5 and prior pk = pk and prior dbms_random.value is not null)
where t is not null;
Tom Kyte
May 25, 2012 - 9:17 am UTC

why don't you go ahead and benchmark this approach versus the others..

As soon as I see "regexp" and "big tables" in which we'll process every row - I know I'll be shopping for many more cpus and hoping for parallel query to kick in.

and throw a connect by in there - ....


I did not validate this query at all.