Skip to Main Content
  • Questions
  • Compare columns in two tables and report which column is different

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question.

Asked: May 11, 2009 - 11:41 pm UTC

Last updated: April 27, 2022 - 1:28 am UTC

Version: 9i

Viewed 50K+ times! This question is

You Asked

Compare columns in two tables and list out column names which are different. for ex:- create table t1(c1 number(2), c2 varchar2(10));
create table t2(c1 number(2), c2 varchar2(10)); insert into t1 values(1,'a'); insert into t2 values(1,'b'); result should be column c2 is different. Please suggest me how to do this one?


and Tom said...

there is so much missing here...

I'll have to assume that C1 is the primary key of both tables.
I'll have to assume you expect rows to be in C1 and not in C2 sometimes - and vice versa.
I'll have to assume that in general there will be a c3, c4 and so on.
I'll have to assume that if the rows are exactly the same - you don't want to see them at all.

I'll have to assume that if I answered your question as it stands without making lots of other assumptions - you'd be coming back over and over and over and saying "yeah, but what about...."

Asking questions - being very very specific about your needs - that is the most crucial attribute a person in our industry MUST have (well, if they want to be successful that is)...


So, using my assumptions and making your 'test case' (it isn't a test case yet, it is lacking) a better one:

ops$tkyte%ORA10GR2> create table t1(c1 number(2) primary key, c2 varchar2(10), c3 varchar2(10));

Table created.

ops$tkyte%ORA10GR2> create table t2(c1 number(2) primary key, c2 varchar2(10), c3 varchar2(10));

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values(1,'a','c');

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values(1,'b','c');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values(2,'a','b');

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values(2,'a','c');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values(3,'a','b');

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values(3,'a','b');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values(4,'a','b');

1 row created.

ops$tkyte%ORA10GR2> insert into t2 values(4,'x','y');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values(5,'a','b');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t2 values(6,'a','b');

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select case when t1.c1 is null then 'missing in t1'
  2              when t2.c1 is null then 'missing in t2'
  3                  end m1,
  4             decode( t1.c2, t2.c2, '', 'c2' ) c2_flag,
  5             decode( t1.c3, t2.c3, '', 'c3' ) c3_flag,
  6             coalesce(t1.c1,t2.c1) pk,
  7             t1.c2, t2.c2,
  8             t1.c3, t2.c3
  9    from t1 full outer join t2 on (t1.c1 = t2.c1)
 10   where decode( t1.c2, t2.c2, '', 'c2' ) is not null
 11      or decode( t1.c3, t2.c3, '', 'c3' ) is not null
 12   order by pk
 13  /

M1            C2 C3         PK C2         C2         C3         C3
------------- -- -- ---------- ---------- ---------- ---------- ----------
              c2             1 a          b          c          c
                 c3          2 a          a          b          c
              c2 c3          4 a          x          b          y
missing in t2 c2 c3          5 a                     b
missing in t1 c2 c3          6            a                     b



Rating

  (8 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

comparing tables

A reader, May 12, 2009 - 4:59 pm UTC


Another approach

Stew Ashton, May 13, 2009 - 8:19 am UTC


This approach is based on your "compare tables" technique from asktom and Oracle Magazine. When the same PK is present in both tables, I present the data in two rows instead of one. The COL_DIF_ID column mimicks the GROUPING_ID Function: it is a bit vector that indicates which columns actually changed.
select case sum(new_cnt - old_cnt) over(partition by c1)
  when 1 then 'T2 (missing in T1)'
  when -1 then 'T1 (missing in T2)'
  else case old_cnt when 1 then 'T1' else 'T2' end
  end  "TABLE"
, C1 PK, C2, C3
, case when new_cnt = 1 and count(*) over(partition by c1) > 1 then (
  case when lag(c2) over(partition by c1 order by new_cnt) = c2 then 0 else 1 end
  + case when lag(c3) over(partition by c1 order by new_cnt) = c3 then 0 else 2 end
  ) end COL_DIF_ID
from (
  select c1, c2, c3, sum(old_cnt) old_cnt, sum(new_cnt) new_cnt from (
    select o.*, 1 old_cnt, 0 new_cnt from t1 o
    union all
    select n.*, 0 old_cnt, 1 new_cnt from t2 n
  ) group by c1, c2, c3 having sum(old_cnt) <> sum(new_cnt)
) order by c1, new_cnt
/

TABLE              PK    C2         C3         COL_DIF_ID
------------------ ----- ---------- ---------- -----------
T1                 1     a          c
T2                 1     b          c          1
T1                 2     a          b
T2                 2     a          c          2
T1                 4     a          b
T2                 4     x          y          3
T1 (missing in T2) 5     a          b
T2 (missing in T1) 6     a          b

8 rows selected

For your statement: extract from dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Starts | E-Rows | A-Rows | Buffers | Used-Mem |
-----------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY         |              |      1 |      6 |      5 |      23 | 2048  (0)|
|   2 |   VIEW                 |              |      1 |      6 |      5 |      23 |          |
|   3 |    UNION-ALL           |              |      1 |        |      5 |      23 |          |
|*  4 |     FILTER             |              |      1 |        |      4 |      14 |          |
|*  5 |      HASH JOIN OUTER   |              |      1 |      5 |      5 |      14 |  965K (0)|
|   6 |       TABLE ACCESS FULL| T1           |      1 |      5 |      5 |       7 |          |
|   7 |       TABLE ACCESS FULL| T2           |      1 |      5 |      5 |       7 |          |
|   8 |     NESTED LOOPS ANTI  |              |      1 |      1 |      1 |       9 |          |
|*  9 |      TABLE ACCESS FULL | T2           |      1 |      5 |      5 |       7 |          |
|* 10 |      INDEX UNIQUE SCAN | SYS_C0030494 |      5 |      5 |      4 |       2 |          |
-----------------------------------------------------------------------------------------------
For my statement: extract from dbms_xplan.display_cursor(null,null,'ALLSTATS LAST')
------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | A-Rows | Buffers | Used-Mem |
------------------------------------------------------------------------------
|   1 |  WINDOW SORT           |      |      1 |      8 |      14 | 2048  (0)|
|*  2 |   FILTER               |      |      1 |      8 |      14 |          |
|   3 |    SORT GROUP BY       |      |      1 |      9 |      14 | 8192  (0)|
|   4 |     VIEW               |      |      1 |     10 |      14 |          |
|   5 |      UNION-ALL         |      |      1 |     10 |      14 |          |
|   6 |       TABLE ACCESS FULL| T1   |      1 |      5 |       7 |          |
|   7 |       TABLE ACCESS FULL| T2   |      1 |      5 |       7 |          |
------------------------------------------------------------------------------

It appears that one approach requires 3 full scans and the other only does 2.

Compare columns in same table

A reader, January 03, 2012 - 1:49 pm UTC

Hi Tom,

I have a table:

create table demo_test(
column_1 varchar2(100),
column_2 date,
column_3 date,
status varchar2(1));

insert into demo_test values('VAL1','01-JAN-2004','01-JAN-2004',NULL);

insert into demo_test values('VAL2','01-JAN-2004','01-JAN-2009',NULL);

insert into demo_test values('VAL3','01-JAN-2004','01-JAN-2006','D');


I have to select rows where column_2 is not equal to column_1 and status is NULL

I am using this but it is taking long time. Table contains around 3 million records

select a.* from demo_test a, demo_test b
where a.column_1 = b.columns_1
and a.column_2 = b.column_2
and a.column_3 != b.column_3
and a.status is null



Please suggest if you have a better option and indexes if I have to make.
Tom Kyte
January 04, 2012 - 8:31 am UTC

I have to select rows where column_2 is not equal to column_1 and status is NULL


well, that - as stated - would simply be

select * 
  from demo_test
 where decode( column_1, column_2, 1, 0 ) = 0
   and status is null



that decode compares column_1 and 2 - and considers NULLS to be equal - so if colum_1 and 2 are the same or both are null - then it will return 1, else 0.


I don't have any idea why you joined, there is nothing in your specification that would lead one to a join at all?????


Either - you have miscoded your SQL way off - or your specification isn't true to what you really intend on doing.

Thank You

A reader, January 05, 2012 - 1:28 pm UTC

Hi Tom,

Thanks for the reply. Yes I had messed up with that query. There was no need of self join.

I was trying something afterwards as

select * from demo_test a where a.start_date != a.end_date and a.status is null;


But the problem would occur when NULL were present. So your code works perfectly fine.


Thanks again.

New approach

A reader, March 15, 2019 - 8:16 am UTC

Hi
In 2019 and for 12c v1 release. Do you have other approach to do the same comparison with reporting each diff.

My requirements is starting from t1. T1 is the source that really count. For each col in t1 need to check and report whether it exists/equals/ different/ vs t2 or whether find many in t2 for the compared cols all together.
Chris Saxon
March 15, 2019 - 8:32 am UTC

I'm not aware of anything recent that is better than the union-all/group by method for comparing tables.

Re: "New approach"

Stew Ashton, March 15, 2019 - 9:21 am UTC

Hi,

I have worked on techniques for comparing tables for several years. When the MATCH_RECOGNIZE clause came out with version 12c, I tried it. In my tests, the good old union all / group by solution continued to be the fastest and most efficient technique.

If you want to see not only what rows have differences, but also which columns are different, then please see
https://stewashton.wordpress.com/2018/05/21/actually-seeing-the-differences/

If you provide a test case with CREATE TABLE and INSERT statements, I could show you a solution here.

Best regards,
Stew Ashton
Connor McDonald
April 27, 2022 - 1:28 am UTC

For those put off by match_recognize, I know sometimes pattern matching examples can leave you needing some codeine :-) but with practice and time it becomes a lot easier.

Be patient

My case

A reader, March 16, 2019 - 7:29 am UTC

Thank you Chris and Stew.
My requirements is one way comparison so it could be easier to achieve.
What matter for me is to check only my T1 rows vs T2 ones.
Ie. Check whether all cols Name , Sal, dept etc of T1 find exact match in my ref table T2. So T2 could have more rows for more employees I won't care about.
My requirements is to generate report showing what is wrong in my T1 vs T2.
With the option that ie if for one employee say empid 10 , if T2.name equals "*" this means that is joker for my T1.name and hence it matches.

You could consider a business example with the customers bills (fact table T1). Lots of bill ids and comparing each row to a Billing repository (T2) .
For instance common cols in T1 and T2 are about 20 columns say Cust_id, bill_type, state, store, etc..

So reporting for each bill_id what match and what it does not for the rest of cols vs the same cols of T2 and taking in account the optionally "*"


Column Diff Query Template

JStew, November 13, 2021 - 3:12 pm UTC

My requirement is to send out data to managers, they change any/all/none of the data in the columns, and send back to me. I then have to identify each column that has a difference from what I sent, and mark those columns as changed for a central office reviewer to visually scan and approve/deny the changes for integration back into the central data set.

This solution may not fit your needs of course, but a template structure is offered here that you can augment to meet your needs no matter the number of columns.

You could easily write plsql to write this query for you for the tables in question.
This would get very cumbersome if you had to compare columns from 3 or more tables or bi-directional changes. I only care about single direction changes. Did the person change my original row columns or not. If so, what columns did they change, and what was my before value and what is their after value, and show me nothing else please.

In other words, only show me rows with columns that have changes with their before values and nothing else.


create table thing1 (id number, firstname varchar2(10), lastname varchar2(10));
create table thing2 (id number, firstname varchar2(10), lastname varchar2(10));

insert into thing1 values (1,'Buddy', 'Slacker');
insert into thing2 values (1,'Buddy', 'Slacker');
insert into thing1 values (2,'Mary', 'Slacker');
insert into thing2 values (2,'Mary', 'Slacke');
insert into thing1 values (3,'Timmy', 'Slacker');
insert into thing2 values (3,'Timm', 'Slacker');
insert into thing1 values (4,'Missy', 'Slacker');
insert into thing2 values (4,'Missy', 'Slacker');
commit;



Un-comment commented select * queries one at a time after each data set to understand what is in each data set at each stage of the refinement process.

with rowdifferences as
(
select
id
,firstname
,lastname
from thing2
minus
select
id
,firstname
,lastname
from thing1
)
--select * from rowdifferences
,thing1matches as
(
select
t1.id
,t1.firstname
,t1.lastname
from thing1 t1
join rowdifferences rd on t1.id = rd.id
)
--select * from thing1matches
, col1differences as
(
select
id
,firstname
from rowdifferences
minus
select
id
,firstname
from thing1matches
)
--select * from col1differences
, col2differences as
(
select
id
,lastname
from rowdifferences
minus
select
id
,lastname
from thing1matches
)
--select * from col2differences
,truedifferences as
(
select
case when c1.id is not null then c1.id
when c2.id is not null then c2.id
end id
,c1.firstname
,c2.lastname
from col1differences c1
full join col2differences c2 on c1.id = c2.id
)
--select * from truedifferences
select
t1m.id
,case when td.firstname is not null then t1m.firstname end beforefirstname
,td.firstname afterfirstname
,case when td.lastname is not null then t1m.lastname end beforelastname
,td.lastname afterlastname
from thing1matches t1m
join truedifferences td on t1m.id = td.id
;



Connor McDonald
November 16, 2021 - 5:07 am UTC

nice stuff