
Question and Answer

Connor McDonald

Thanks for the question, Priscillia.

Asked: February 08, 2001 - 7:55 pm UTC

Last updated: November 16, 2020 - 3:27 am UTC

Version: oracle 8i

Viewed 100K+ times!

You Asked

Hi Tom,

I have two tables of values (dept1 and dept2). How do I compare all the data in these two tables? It should return true if both tables contain the same data, and false otherwise.
Another thing is that I do not know how to use CREATE OPERATOR in Oracle 8i. Can you explain more and give me specific examples?

Thanks.

Best Rgds,
Priscillia

and Tom said...

select count(*)
from
(
( select * from dept1
minus
select * from dept2 )
union all
( select * from dept2
minus
select * from dept1 )
)

will tell you how many rows are in dept1 but not in dept2, plus how many are in dept2 but not in dept1. A count of zero means the two tables contain the same data.
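The same symmetric-difference check can be sketched outside Oracle. Below is a hypothetical demonstration in Python with sqlite3 (table names and data invented), where SQLite's EXCEPT stands in for Oracle's MINUS:

```python
import sqlite3

# Hypothetical dept1/dept2 data in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept1 (deptno INTEGER, dname TEXT);
    CREATE TABLE dept2 (deptno INTEGER, dname TEXT);
    INSERT INTO dept1 VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO dept2 VALUES (10, 'ACCOUNTING'), (30, 'SALES');
""")

# Count rows in dept1 not in dept2, plus rows in dept2 not in dept1.
# Zero would mean both tables hold the same (distinct) rows.
diff_count = con.execute("""
    SELECT COUNT(*) FROM (
        SELECT * FROM (SELECT * FROM dept1 EXCEPT SELECT * FROM dept2)
        UNION ALL
        SELECT * FROM (SELECT * FROM dept2 EXCEPT SELECT * FROM dept1)
    )
""").fetchone()[0]
print(diff_count)  # 2: (20,'RESEARCH') only in dept1, (30,'SALES') only in dept2
```

Note the two EXCEPT branches are wrapped in FROM-subqueries because SQLite evaluates compound selects strictly left to right; without the wrapping the result would not be the symmetric difference.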


as for the operators, unless you are going to build your own datatype, they are not very relevant to you.

see
http://docs.oracle.com/cd/A81042_01/DOC/appdev.816/a76937/toc.htm
and specifically:

http://docs.oracle.com/cd/A81042_01/DOC/appdev.816/a76937/dci07id3.htm#620608
...



Comments

dimantion update

A reader, October 18, 2002 - 12:39 pm UTC

Hi tom,

I have a user table in NY, and another table user_d in London.

The table structures are different, but some fields are common (userid, user_name, user_country, city, ...).

The db in London is SQL Server 6.5 and the db in NY is Oracle 8i.

I have to append the rows from London to NY every night, and also apply any changes made to rows that changed (e.g. some user changed his address).

What is the best way?

Expected rows in the NY db: 4.2 mil.
Expected rows in the London db: 4.8 mil.

Expected new rows every day: 10K, as a flat file.

Can you please suggest the best way?

Should I truncate the table in NY every night and reload? But it has lots of references!

Or check every night? 4.1 mil rows?

Any other, better solution?

TIA

Tom Kyte
October 18, 2002 - 12:47 pm UTC

whatever the heck "dimantion" means....

are you getting a flat file with updates -- rows to insert and rows to update?

sorry for that "dimention"

A reader, October 18, 2002 - 12:50 pm UTC

I get only one flat file with all the data, in their table structure, and I load it into a temp table.

TIA

Tom Kyte
October 18, 2002 - 1:13 pm UTC

is that what is in

expected new rows every day 10k format flat file


so that is just CHANGED ROWS or WHAT? please be more clear -- pretend I don't know anything about your environment (hey -- I don't)

flat file

A reader, October 18, 2002 - 1:13 pm UTC

I get the file that contains all the data since the beginning (all records, every time).

all records = old not changed + old changed + new




Tom Kyte
October 18, 2002 - 1:13 pm UTC

how could that be 10k in size?

flat file

A reader, October 18, 2002 - 1:26 pm UTC

the only new rows are 10K

so altogether: all previous records + 10K new records

Tom Kyte
October 18, 2002 - 1:30 pm UTC

Well, if you get a file with ALL of the records and no way to distinguish between "new", "modified" and "unchanged", my suggestion would be:

o go to the sqlserver guys
o tell them "hey, give me the ADDS and MODS and nothing else"


otherwise, your processing is going to be heinously bad. That is my suggestion. Get the CHANGES, not the entire database. They should be able to do that pretty easily (even I could tell them how to do that in sqlserver ;)

changed records

A reader, October 18, 2002 - 2:00 pm UTC



If there is no timestamp or flag that indicates the modification or change, how would one identify the changed or modified record ?


TIA

Tom Kyte
October 18, 2002 - 4:22 pm UTC

one would add such a detail in order to make one's processing somewhat efficient.
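One way to "add such a detail" is a last-modified timestamp maintained by a trigger on the source table, so the nightly extract can select only rows touched since the last run. Here is a hypothetical sketch in Python with sqlite3 (table and column names invented; the real source here is SQL Server, where the trigger syntax would differ):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        userid   INTEGER PRIMARY KEY,
        city     TEXT,
        modified TEXT DEFAULT (datetime('now'))
    );
    -- Keep "modified" current on every change; the nightly extract then
    -- selects only rows with modified >= the last extract time.
    CREATE TRIGGER users_touch AFTER UPDATE ON users
    BEGIN
        UPDATE users SET modified = datetime('now') WHERE userid = NEW.userid;
    END;
""")
con.execute("INSERT INTO users (userid, city) VALUES (1, 'London')")
con.execute("UPDATE users SET city = 'New York' WHERE userid = 1")

# Extract only rows modified since a cutoff (here: any time today, UTC)
changed = con.execute(
    "SELECT userid, city FROM users WHERE modified >= date('now')"
).fetchall()
print(changed)  # [(1, 'New York')]
```

(SQLite leaves recursive triggers off by default, so the trigger's own UPDATE of `modified` does not re-fire the trigger.)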

what do you say ?

A reader, October 18, 2002 - 2:49 pm UTC

believe me or not...

When we told them to give us the modified and new record data only, our London group said OK, but they cannot 100% guarantee that they are not "missing some data" or that there is no "repeated data".

Now if I take that data and put it in the db, it will be my responsibility, because they already told me they will not 100% guarantee it.

That's why I asked for all the data. There is no timestamp or flag that will indicate that a record is modified.

please advise...

Tom Kyte
October 18, 2002 - 4:27 pm UTC

they need to change their process, or your choice is:

o dump and load every night.

and that's about all she wrote.

Dave, October 18, 2002 - 3:29 pm UTC

If you are stuck with getting the whole data set, and want to avoid statements that look for differences in every one of multiple columns for a few million rows, you might consider a different slant.

You can potentially speed up the identification of changed data by adding a new column to your target table containing a hash value generated by applying the DBMS_UTILITY.GET_HASH_VALUE() function to a concatenation of your columns (populating and updating this value with a trigger, I guess). Create an index on your PK (or username, whatever you will use to do the comparison with the new data) + this column. As you load the source data, calculate the hash value for the potentially-new records. Then use a MERGE statement to do an update when the hash value is different, or an insert where the record does not exist.

Does that make sense? It's a trade-off between CPU and I/O I suppose.
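Dave's hash idea can be sketched like this; a hypothetical Python/sqlite3 illustration only, with hashlib.sha256 standing in for DBMS_UTILITY.GET_HASH_VALUE and an explicit lookup standing in for MERGE (all names invented):

```python
import hashlib
import sqlite3

def row_hash(*cols):
    """Hash the concatenation of a row's columns (stand-in for
    DBMS_UTILITY.GET_HASH_VALUE)."""
    joined = "|".join("" if c is None else str(c) for c in cols)
    return hashlib.sha256(joined.encode()).hexdigest()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (userid INTEGER PRIMARY KEY, city TEXT, hash TEXT)")

def upsert(userid, city):
    """Insert new rows; update only when the stored hash says the
    incoming row actually changed (the effect of the MERGE above)."""
    h = row_hash(userid, city)
    row = con.execute("SELECT hash FROM target WHERE userid = ?",
                      (userid,)).fetchone()
    if row is None:
        con.execute("INSERT INTO target VALUES (?, ?, ?)", (userid, city, h))
        return "inserted"
    if row[0] != h:
        con.execute("UPDATE target SET city = ?, hash = ? WHERE userid = ?",
                    (city, h, userid))
        return "updated"
    return "unchanged"

print(upsert(1, "London"))    # inserted
print(upsert(1, "London"))    # unchanged -- hash matched, no write
print(upsert(1, "New York"))  # updated
```

The trade-off is as Dave says: one hash computation per incoming row (CPU) in exchange for skipping the write, and the wide column-by-column comparison, for unchanged rows (I/O).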

Tom Kyte
October 18, 2002 - 4:29 pm UTC

since you would have to load everything to do that -- sounds like:

o create a single partition table
o load the data into this
o when you get a new file, load it into a stage table, index it, etc
o alter table exchange partition <<<=== swap the newly loaded table with the existing partition

minimum downtime (just the exchange)....

not possible

A reader, October 24, 2002 - 3:35 pm UTC

Hi tom,

I can do the swap, but:

1.) The data is extracted and transformed into other tables too, and there are audit trails, so I still have to update them.

2.) I am thinking to:

-> create a new_data table
-> add one more column (isused varchar2(1))
3.) whenever a row from that table is extracted, I change the flag to isused = 'T'
4.) next day:
4.1 --> append the new data to the new_data table
4.2 --> compare all rows with the previous day's data;
        if a row matches the previous day's row
        (all fields except load_id)
        then change that row's isused column to 'T'
5.) after that, process all rows with isused = 'F' or null?

Do you think that is OK?

But I don't know how to do step 4.2 -- can you please advise
how to compare like that?

TIA

Tom Kyte
October 24, 2002 - 4:21 pm UTC

sorry -- sounds like a new question and insufficient data to answer anything. I don't know what you mean by "extracted" and how extracting would "update". A row by row audit trail is going to be the slow point here so you might as well just brute force it procedurally anyway if you must do that.

Sounds like you need to sit down and come up with your procedural design.

Table data difference and make the statements like insert , update and delete

kumar, October 24, 2002 - 6:25 pm UTC

Tom,
This question differs slightly from the above discussion; however, the reason I am putting it on the same thread is to find out whether there is any ready-made script available in the scripts bank. I have two identical tables from two different databases, with not much data in them -- at most 15,000 rows. I want to generate the DML statements (delete, update, insert) after comparing table1 with table2. Just let me know if any such ready-made procedure exists in the script bank; I am about to write one, but if one exists I can tweak it easily. All my data comparison goes through the primary key.

Thanks

Tom Kyte
October 24, 2002 - 7:03 pm UTC

bank is empty on that one. sorry.

"How to compare two tables of data????", version oracle 8i

Karl Bergerson, April 03, 2003 - 2:00 am UTC

Simply excellent as usual!

Which is the better execution plan

Angus McTavish aged 75, June 12, 2003 - 6:44 am UTC

Going back to your original answer, I'd like to pose this example and ask which you think is the better method.

Two tables, same structure, 60000 rows, primary key on ID, analyzed. The first example is consistently slower by a long way despite the lower cost and far fewer gets. It is doing a lot of recursive SQL (why?) and more sorts, however. I've run both examples numerous times and the results are pretty consistent.

I have to do a lot of this kind of thing, comparing data from two tables, so I'd appreciate your thoughts.

SQL> set timing on
SQL> set autot traceonly
SQL> select id,'In table 1, not in table 2'
  2  from accounts
  3  minus
  4  select id,'In table 1, not in table 2'
  5  from accounts2
  6  union
  7  select id,'In table 2 not in table 1'
  8  from accounts2
  9   minus
 10  select id,'In table 2 not in table 1'
 11  from accounts
 12  /

49 rows selected.

Elapsed: 00:00:17.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1208 Card=253738 Byt
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=1208 Card=253738 Bytes=1014952)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=302 Card=63450 Bytes=253800)
   6    5             INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cos
   7    4           SORT (UNIQUE) (Cost=302 Card=63419 Bytes=253676)
   8    7             INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cos
   9    3         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9
  10    1     SORT (UNIQUE) (Cost=302 Card=63450 Bytes=253800)
  11   10       INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9 Ca




Statistics
----------------------------------------------------------
         32  recursive calls
         25  db block gets
        544  consistent gets
       1248  physical reads
          0  redo size
       1460  bytes sent via SQL*Net to client
        451  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          0  sorts (memory)
          4  sorts (disk)
         49  rows processed

SQL> select id,'In table 1, not in table 2'
  2  from accounts
  3  where id not in (select id from accounts2)
  4  union
  5   select id,'In table 2 not in table 1'
  6  from accounts2
  7  where id not in (select id from accounts)
  8  /

49 rows selected.

Elapsed: 00:00:09.05

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=6344 Bytes=2
   1    0   SORT (UNIQUE) (Cost=32 Card=6344 Bytes=25376)
   2    1     UNION-ALL
   3    2       FILTER
   4    3         INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9
   5    3         INDEX (UNIQUE SCAN) OF 'PK_AC2' (UNIQUE) (Cost=1 Car
   6    2       FILTER
   7    6         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9
   8    6         INDEX (UNIQUE SCAN) OF 'PK_AC1' (UNIQUE) (Cost=1 Car




Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
     253856  consistent gets
          0  physical reads
          0  redo size
       1460  bytes sent via SQL*Net to client
        418  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         49  rows processed
 

Tom Kyte
June 12, 2003 - 9:12 am UTC

Looks like a poorly sized SORT_AREA_SIZE.  In 9iR2, using workarea_size_policy = auto and a pga_aggregate_target set reasonably, the first plan always bests the second.

You see -- in your output there is lots of physical IO in the first one; the recursive SQL is due to the SORTS TO DISK.  Consider:



ops$tkyte@ORA920> create table t1 as select * from all_objects where owner <> 'AQ';
Table created.

ops$tkyte@ORA920> update t1 set object_id = -object_id;
30228 rows updated.

ops$tkyte@ORA920> insert into t1 select * from all_objects where owner <> 'AQ';
30228 rows created.

ops$tkyte@ORA920> alter table t1 add constraint t1_pk primary key(object_id);
Table altered.

ops$tkyte@ORA920> analyze table t1 compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.

ops$tkyte@ORA920> create table t2 as select * from all_objects where owner <> 'HR';
Table created.

ops$tkyte@ORA920> update t2 set object_id = -object_id;
30219 rows updated.

ops$tkyte@ORA920> insert into t2 select * from all_objects where owner <> 'HR';
30219 rows created.

ops$tkyte@ORA920> alter table t2 add constraint t2_pk primary key(object_id);
Table altered.

ops$tkyte@ORA920> analyze table t2 compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.

Now, we can observe:

ops$tkyte@ORA920> set timing on
ops$tkyte@ORA920> set autotrace traceonly
ops$tkyte@ORA920> select object_id,'In table 1, not in table 2'
  2  from t1
  3  where object_id not in (select object_id from t2)
  4  union
  5   select object_id,'In table 2 not in table 1'
  6  from t2
  7  where object_id not in (select object_id from t1)
  8  /

118 rows selected.

Elapsed: 00:00:00.67

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=34 Card=19 Bytes=190)
   1    0   SORT (UNIQUE) (Cost=34 Card=19 Bytes=190)
   2    1     UNION-ALL
   3    2       NESTED LOOPS (ANTI) (Cost=14 Card=18 Bytes=180)
   4    3         INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=14 Card=60456 Bytes=302280)
   5    3         INDEX (UNIQUE SCAN) OF 'T2_PK' (UNIQUE)
   6    2       NESTED LOOPS (ANTI) (Cost=14 Card=1 Bytes=10)
   7    6         INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=14 Card=60438 Bytes=302190)
   8    6         INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     121168  consistent gets
          0  physical reads
          0  redo size
       2659  bytes sent via SQL*Net to client
        576  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        118  rows processed

ops$tkyte@ORA920> select object_id,'In table 1, not in table 2'
  2  from t1
  3  minus
  4  select object_id,'In table 1, not in table 2'
  5  from t2
  6  union
  7  select object_id,'In table 2 not in table 1'
  8  from t2
  9   minus
 10  select object_id,'In table 2 not in table 1'
 11  from t1
 12  /

118 rows selected.

Elapsed: 00:00:00.65

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=564 Card=120894 Bytes=1208940)
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=564 Card=120894 Bytes=1208940)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=141 Card=60456 Bytes=302280)
   6    5             INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=14 Card=60456 Bytes=302280)
   7    4           SORT (UNIQUE) (Cost=141 Card=60438 Bytes=302190)
   8    7             INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=14 Card=60438 Bytes=302190)
   9    3         INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=14 Card=60438 Bytes=302190)
  10    1     SORT (UNIQUE) (Cost=141 Card=60456 Bytes=302280)
  11   10       INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=14 Card=60456 Bytes=302280)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        540  consistent gets
          0  physical reads
          0  redo size
       2659  bytes sent via SQL*Net to client
        576  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
        118  rows processed


<b>but, if we force a bad thing:



ops$tkyte@ORA920> alter session set workarea_size_policy=manual;

Session altered.

ops$tkyte@ORA920> alter session set sort_area_size=65536;

Session altered.


we see something a tad different</b>

118 rows selected.

Elapsed: 00:00:01.63

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1248 Card=120894 Bytes=1208940)
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=1248 Card=120894 Bytes=1208940)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=312 Card=60456 Bytes=302280)
   6    5             INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=14 Card=60456 Bytes=302280)
   7    4           SORT (UNIQUE) (Cost=312 Card=60438 Bytes=302190)
   8    7             INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=14 Card=60438 Bytes=302190)
   9    3         INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=14 Card=60438 Bytes=302190)
  10    1     SORT (UNIQUE) (Cost=312 Card=60456 Bytes=302280)
  11   10       INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=14 Card=60456 Bytes=302280)




Statistics
----------------------------------------------------------
          0  recursive calls
          9  db block gets
        540  consistent gets
       1184  physical reads
          0  redo size
       2659  bytes sent via SQL*Net to client
        576  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          0  sorts (memory)
          4  sorts (disk)
        118  rows processed

<b>see, 4 sorts to disk, just like you -- with the corresponding PIO that goes with that.  I'd guess that your sort area size is at the silly small default of 64k -- up it.</b>

 

It ain't the sorts to disk...

Angus, June 13, 2003 - 5:06 am UTC

Tom,
Thanks for the suggestion -- I'd overlooked the sort area size, and I thought you'd hit the nail on the head because I was using a small sort area -- 131072.
However, if I up the sort area size to 100 MB, the sorts to disk go away as you would expect, but the PIO remains the same, as do the recursive SQL and the response time. For info, this is not on 9i but 8i (8.1.7.4). Something else is going on here... Is it something to do with the way the index blocks are kept in, or flushed out of, the SGA depending on the execution plans? However many times I run the two bits of SQL, the results are consistent.
SQL> set autot traceonly
SQL> set timing on
SQL> ed
Wrote file afiedt.buf

  1  select id,'In table 1, not in table 2'
  2    from accounts
  3    minus
  4    select id,'In table 1, not in table 2'
  5    from accounts2
  6    union
  7    select id,'In table 2 not in table 1'
  8    from accounts2
  9     minus
 10    select id,'In table 2 not in table 1'
 11*   from accounts
SQL> /

49 rows selected.

Elapsed: 00:00:07.06

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=6344 Bytes=2
   1    0   SORT (UNIQUE) (Cost=32 Card=6344 Bytes=25376)
   2    1     UNION-ALL
   3    2       FILTER
   4    3         INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9 
   5    3         INDEX (UNIQUE SCAN) OF 'PK_AC2' (UNIQUE) (Cost=1 Car
   6    2       FILTER
   7    6         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9 
   8    6         INDEX (UNIQUE SCAN) OF 'PK_AC1' (UNIQUE) (Cost=1 Car



Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
     253856  consistent gets
          0  physical reads
          0  redo size
       1432  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         49  rows processed

SQL> /

49 rows selected.

Elapsed: 00:00:07.06

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=32 Card=6344 Bytes=2
   1    0   SORT (UNIQUE) (Cost=32 Card=6344 Bytes=25376)
   2    1     UNION-ALL
   3    2       FILTER
   4    3         INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9
   5    3         INDEX (UNIQUE SCAN) OF 'PK_AC2' (UNIQUE) (Cost=1 Car
   6    2       FILTER
   7    6         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9
   8    6         INDEX (UNIQUE SCAN) OF 'PK_AC1' (UNIQUE) (Cost=1 Car




Statistics
----------------------------------------------------------
          0  recursive calls
          8  db block gets
     253856  consistent gets
          0  physical reads
          0  redo size
       1460  bytes sent via SQL*Net to client
        442  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         49  rows processed


SQL> ed
Wrote file afiedt.buf

  1  select id,'In table 1, not in table 2'
  2    from accounts
  3    minus
  4    select id,'In table 1, not in table 2'
  5    from accounts2
  6    union
  7    select id,'In table 2 not in table 1'
  8    from accounts2
  9     minus
 10    select id,'In table 2 not in table 1'
 11*   from accounts
SQL> /

49 rows selected.

Elapsed: 00:00:14.09

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1208 Card=253738 Byt
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=1208 Card=253738 Bytes=1014952)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=302 Card=63450 Bytes=253800)
   6    5             INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cos
   7    4           SORT (UNIQUE) (Cost=302 Card=63419 Bytes=253676)
   8    7             INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cos
   9    3         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9
  10    1     SORT (UNIQUE) (Cost=302 Card=63450 Bytes=253800)
  11   10       INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9 Ca




Statistics
----------------------------------------------------------
         32  recursive calls
         25  db block gets
        544  consistent gets
       1248  physical reads
          0  redo size
       1461  bytes sent via SQL*Net to client
        471  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          4  sorts (memory)
          4  sorts (disk)
         49  rows processed

SQL> alter session set sort_area_size=104857600;

Session altered.

Elapsed: 00:00:00.05


SQL> ed
Wrote file afiedt.buf

  1  select id,'In table 1, not in table 2'
  2    from accounts
  3    minus
  4    select id,'In table 1, not in table 2'
  5    from accounts2
  6    union
  7    select id,'In table 2 not in table 1'
  8    from accounts2
  9     minus
 10    select id,'In table 2 not in table 1'
 11*   from accounts
SQL> /


49 rows selected.

Elapsed: 00:00:14.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=528 Card=253738 Byte
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=528 Card=253738 Bytes=1014952)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=132 Card=63450 Bytes=253800)
   6    5             INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cos
   7    4           SORT (UNIQUE) (Cost=132 Card=63419 Bytes=253676)
   8    7             INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cos
   9    3         INDEX (FAST FULL SCAN) OF 'PK_AC2' (UNIQUE) (Cost=9
  10    1     SORT (UNIQUE) (Cost=132 Card=63450 Bytes=253800)
  11   10       INDEX (FAST FULL SCAN) OF 'PK_AC1' (UNIQUE) (Cost=9 Ca




Statistics
----------------------------------------------------------
         32  recursive calls
         24  db block gets
        544  consistent gets
       1248  physical reads
          0  redo size
       1461  bytes sent via SQL*Net to client
        471  bytes received via SQL*Net from client
          6  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
         49  rows processed
 

Tom Kyte
June 13, 2003 - 8:24 am UTC

my testing suggests that you have set sort_area_retained_size too small, and it is writing the results out to TEMP after the sort (sorted in memory and then swapped it all out).

ops$tkyte@ORA817DEV> alter session set sort_area_size=104857600;

Session altered.

Elapsed: 00:00:00.00
ops$tkyte@ORA817DEV> alter session set sort_area_retained_size=0;

Session altered.

Elapsed: 00:00:00.00
ops$tkyte@ORA817DEV> select object_id,'In table 1, not in table 2'
  2  from t1
  3  minus
  4  select object_id,'In table 1, not in table 2'
  5  from t2
  6  union
  7  select object_id,'In table 2 not in table 1'
  8  from t2
  9   minus
 10  select object_id,'In table 2 not in table 1'
 11  from t1
 12  /

Elapsed: 00:00:01.96

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=476 Card=214328 Bytes=1071640)
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=476 Card=214328 Bytes=1071640)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=119 Card=53580 Bytes=267900)
   6    5             INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=4 Card=53580 Bytes=267900)
   7    4           SORT (UNIQUE) (Cost=119 Card=53584 Bytes=267920)
   8    7             INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=4 Card=53584 Bytes=267920)
   9    3         INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=4 Card=53584 Bytes=267920)
  10    1     SORT (UNIQUE) (Cost=119 Card=53580 Bytes=267900)
  11   10       INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=4 Card=53580 Bytes=267900)




Statistics
----------------------------------------------------------
          0  recursive calls
         16  db block gets
        456  consistent gets
          0  physical reads
          0  redo size
        591  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          4  rows processed

ops$tkyte@ORA817DEV> alter session set sort_area_retained_size=65536;

Session altered.

Elapsed: 00:00:00.00
ops$tkyte@ORA817DEV> select object_id,'In table 1, not in table 2'
  2  from t1
  3  minus
  4  select object_id,'In table 1, not in table 2'
  5  from t2
  6  union
  7  select object_id,'In table 2 not in table 1'
  8  from t2
  9   minus
 10  select object_id,'In table 2 not in table 1'
 11  from t1
 12  /

Elapsed: 00:00:07.67

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=476 Card=214328 Bytes=1071640)
   1    0   MINUS
   2    1     SORT (UNIQUE) (Cost=476 Card=214328 Bytes=1071640)
   3    2       UNION-ALL
   4    3         MINUS
   5    4           SORT (UNIQUE) (Cost=119 Card=53580 Bytes=267900)
   6    5             INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=4 Card=53580 Bytes=267900)
   7    4           SORT (UNIQUE) (Cost=119 Card=53584 Bytes=267920)
   8    7             INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=4 Card=53584 Bytes=267920)
   9    3         INDEX (FAST FULL SCAN) OF 'T2_PK' (UNIQUE) (Cost=4 Card=53584 Bytes=267920)
  10    1     SORT (UNIQUE) (Cost=119 Card=53580 Bytes=267900)
  11   10       INDEX (FAST FULL SCAN) OF 'T1_PK' (UNIQUE) (Cost=4 Card=53580 Bytes=267900)




Statistics
----------------------------------------------------------
          0  recursive calls
         20  db block gets
        456  consistent gets
        518  physical reads
          0  redo size
        591  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          4  rows processed

ops$tkyte@ORA817DEV> set autotrace off
ops$tkyte@ORA817DEV>
 

Strange behaviour of sort_area_retained_size

Angus McTavish aged 75, June 14, 2003 - 9:01 am UTC

I think you've got the answer this time - thanks. But I've noticed a very strange thing.
If I run my query with the small sort area, I get the disk sorts and all the physical IO. Run it numerous times - same results. If I up the sort area to 100mb I don't get the disk sorts but I do still get the PIO. Again, run the statement over and over again - same results.
But then, and this is the strange thing, if I 'alter session set sort_area_retained_size=whatever', the PIO goes away and the response time improves dramatically. But it doesn't matter what value I use. If I set it to 65536 the PIO goes away, if I set it to zero the PIO goes away. I've tried it over and over again and always the same result - if I explicitly set the retained size to anything, even zero, the PIO vanishes into thin air.
Oracle always does this to me, just when I think 'Oh good, now I understand what's going on', something inexplicable happens and I end up with more questions than answers.
Anyway, as always, thanks for your help on this.

Developer, April 08, 2004 - 8:21 am UTC

I'd like to get your advice.
I need to compare two large tables with 30 columns each. (The structures of the two tables are different; I only need to compare the fields with matching datatypes.) I'd like to know which method has the best performance. When I use

select col1, col2, .. from tab1
minus
select col1, col2, .. from tab2

The performance is the big concern.



Tom Kyte
April 08, 2004 - 10:47 am UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923 <code>

read that page - the group by trick seems to be "the best"

Compare with table on different box

A reader, April 22, 2004 - 6:02 pm UTC

Hi,
I need to compare 13 million rows (20 columns) with a table in another box/instance. In terms of performance, what is your opinion of the following options?

1. Compare through dblink Or
2. Compare in one box, by exporting and importing from another box
Thanks

Tom Kyte
April 23, 2004 - 10:10 am UTC

depends on many factors -- speed of link for one (eg: i could export, compress, transfer and import perhaps faster than selecting them over)

depends on the plan/query you plan on using as well.  


A relatively efficient way to compare two tables, as long as the 20 columns fit in a block, can be:

ops$tkyte@ORA9IR2> create table t1 as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> create table t2 as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select max(tname), empno, ename, job, mgr, hiredate, sal, comm, deptno
  2    from
  3  (
  4  select 'T1' tname, t1.* from t1
  5  union all
  6  select 'T2' tname, t2.* from t2
  7  )
  8  group by empno, ename, job, mgr, hiredate, sal, comm, deptno
  9  having count(*) <> 2
 10  /
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     SORT (GROUP BY)
   3    2       VIEW
   4    3         UNION-ALL
   5    4           TABLE ACCESS (FULL) OF 'T1'
   6    4           TABLE ACCESS (FULL) OF 'T2'
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2>

<b>that's a nice big, bulky one-pass method -- it reports, table by table, rows that are in one but not to be found in the other (nulls compare "the same" nicely)
</b>


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update t1 set ename = lower(ename) where rownum = 1;
 
1 row updated.
 
ops$tkyte@ORA9IR2> update t2 set comm = 100 where comm is null and sal <> 800 and rownum = 1;
 
1 row updated.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select max(tname), empno, ename, job, mgr, hiredate, sal, comm, deptno
  2    from
  3  (
  4  select 'T1' tname, t1.* from t1
  5  union all
  6  select 'T2' tname, t2.* from t2
  7  )
  8  group by empno, ename, job, mgr, hiredate, sal, comm, deptno
  9  having count(*) <> 2
 10  /
 
MA      EMPNO ENAME      JOB        MGR HIREDATE    SAL       COMM     DEPTNO
-- ---------- ---------- --------- ---- --------- ----- ---------- ----------
T2       7369 SMITH      CLERK     7902 17-DEC-80   800                    20
T1       7369 smith      CLERK     7902 17-DEC-80   800                    20
T1       7566 JONES      MANAGER   7839 02-APR-81  2975                    20
T2       7566 JONES      MANAGER   7839 02-APR-81  2975        100         20



 

Spot on!

A reader, April 23, 2004 - 5:59 pm UTC

This is what I was looking for.
Thanks

Data insert,update+insert,delete from 3 tables

Jeff, July 31, 2004 - 3:48 pm UTC

Hi Tom,
I have emp and emptest tables with different columns and also different records -- different values, additional records, etc. (please see the diff at the bottom). Now I want to sync the differences from emptest into the emp table. Then I want to insert into emp_emptest, where column ename_job should contain the concatenation of ename and job from emp, and column sal_comm should contain the concatenation of sal and comm from emp. Then I want to delete from emp the records that were inserted into emp_emptest. Please help me.
Thanks,
Jeff


SQL> desc emp
 Name                    Null?    Type
 ----------------------- -------- ----------------
 EMPNO                   NOT NULL NUMBER(4)
 ENAME                            VARCHAR2(10)
 JOB                              VARCHAR2(9)
 MGR                              NUMBER(4)
 HIREDATE                         DATE
 SAL                              NUMBER(7,2)
 COMM                             NUMBER(7,2)
 DEPTNO                           NUMBER(2)

SQL> desc emptest
 Name                    Null?    Type
 ----------------------- -------- ----------------
 EMPNOTEST               NOT NULL NUMBER(25)
 ENAMETEST                        VARCHAR2(20)
 JOBTEST                          VARCHAR2(9)
 MGRTEST                          NUMBER(9)
 HIREDATETEST                     DATE
 SALTEST                          NUMBER(7,2)
 COMMTEST                         NUMBER(7,2)
 DEPTNOTEST                       NUMBER(8)

SQL> desc emp_emptest
Name                    Null?    Type
 ----------------------- -------- ----------------
 EMPNO                   NOT NULL NUMBER(25)
 ENAME_JOB                       VARCHAR2(20)
 MGR                             NUMBER(9)
 HIREDATE                        DATE
 SAL_COMM                        NUMBER(7,2)
 DEPTNO                         NUMBER(8)

SQL> set linesize 100
SQL> select * from emp;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO               
---------- ---------- --------- ---------- --------- ---------- ---------- ----------               
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20               
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30               
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30               
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20               
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30               
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30               
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10               
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20               
      7839 KING       PRESIDENT            17-NOV-81       5000                    10               
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30               
      7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20               

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO               
---------- ---------- --------- ---------- --------- ---------- ---------- ----------               
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30               
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20               
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10               
      6666 NEWDEV     ORA DEV         9999 02-JUN-00       8000          0         10               
      9000 TOM        DBA             7566 31-JUL-04       7500                    20               

16 rows selected.

SQL> select * from emptest;

 EMPNOTEST ENAMETEST            JOBTEST      MGRTEST HIREDATET    SALTEST   COMMTEST DEPTNOTEST     
---------- -------------------- --------- ---------- --------- ---------- ---------- ----------     
      7369 SMITH                CLERK           7902 17-DEC-80        800                    20     
      7499 ALLEN                SALESMAN        7698 20-FEB-81       1600        300         30     
      7521 WARD                 SALESMAN        7698 22-FEB-81       1250        500         30     
      7566 JONES                MANAGER         7839 02-APR-81       2975                    20     
      7654 MARTIN               SALESMAN        7698 28-SEP-81       1250       1400         30     
      7698 BLAKE                MANAGER         7839 01-MAY-81       2850                    30     
      7782 CLARK                MANAGER         7839 09-JUN-81       2450                    10     
      7788 SCOTT                ANALYST         7566 19-APR-87       3000                    20     
      7839 KINGFISH             PRESIDENT            17-NOV-81       5000                    10     
      7844 TURNER               SALESMAN        7698 08-SEP-81       1500          0         30     
      7876 ADAMS                CLERK           7788 23-MAY-87       1100                    20     

 EMPNOTEST ENAMETEST            JOBTEST      MGRTEST HIREDATET    SALTEST   COMMTEST DEPTNOTEST     
---------- -------------------- --------- ---------- --------- ---------- ---------- ----------     
      7900 JAMES                ORA MANA        7698 03-DEC-81        950                    30     
      7902 FORDJR               ANALYST         7566 03-DEC-81       3000                    20     
      7934 MILLER               CLERK           7782 23-JAN-82       1300                    10     
      6666 NEWDBA               ORA DBA         9999 02-JUN-00       8000          0         10     

15 rows selected.

SQL> SET LINESIZE 150
SQL> /

MAX(TNAME)        EMPNO ENAME                JOB              MGR HIREDATE         SAL       COMM     DEPTNO                                          
------------ ---------- -------------------- --------- ---------- --------- ---------- ---------- ----------                                          
EMP                7900 JAMES                CLERK           7698 03-DEC-81        950                    30                                          
EMPTEST            7900 JAMES                ORA MANA        7698 03-DEC-81        950                    30                                          
EMP                9000 TOM                  DBA             7566 31-JUL-04       7500                    20                                          
EMPTEST            6666 NEWDBA               ORA DBA         9999 02-JUN-00       8000          0         10                                          
EMP                6666 NEWDEV               ORA DEV         9999 02-JUN-00       8000          0         10                                          
EMP                7839 KING                 PRESIDENT            17-NOV-81       5000                    10                                          
EMPTEST            7839 KINGFISH             PRESIDENT            17-NOV-81       5000                    10                                          
EMP                7902 FORD                 ANALYST         7566 03-DEC-81       3000                    20                                          
EMPTEST            7902 FORDJR               ANALYST         7566 03-DEC-81       3000                    20                                          

9 rows selected.
 

Tom Kyte
July 31, 2004 - 5:09 pm UTC

sorry -- barely followed the description and then when I saw just table describes and the output of selects, said "hmmm".

but sounds vaguely like "merge into emp using emptest" and "insert into emp_emptest select ... from emp, emptest where..." doesn't it.

Data insert,update+insert,delete from 3 tables

Jeff, July 31, 2004 - 5:44 pm UTC

Hi Tom,
Sorry for confusing you. Here is my requirement.

I have three tables a,b and c
Tables a and b have the same datatypes and records (with different column names), but b has the latest information.

Now I want to sync table b's records into table a (to make table a up to date).

Then I want to move some records to table c based on a condition, and then delete from table a the records that were moved to table c.
I hope I've explained it clearly. Please help me with an example.


Tom Kyte
July 31, 2004 - 6:09 pm UTC

sync the table = merge into a using b on (key) when matched then update, when not matched then insert.

move some records from a to c = "insert into c select * from a where <some condition>"

delete from a those records we moved = "delete from a where <some condition>"

Peformance of insert

Vinay, September 15, 2004 - 11:18 am UTC

Hi Tom,

create global temporary table a
(party_id number, value number)
on commit delete rows;

create table hz_parties (party_id number);

create index T1 on hz_parties(party_id);

create table B (party_id number, value number, status varchar2(1));

insert into A select party_id, value from B where status = 'A';
(assume that this inserted 10000 records into A)

After this I want to insert the party records that are present in hz_parties but not in table A.
For this i'm using the below insert statement

insert into a select party_id, null
from hz_parties hz
where not exists (select null from A
where A.party_id = hz.party_id )

But this seems to take too much time.
Records in hz_parties: Few millions

Please let us know how to improve the performance of this insert.

Thanks a lot


Tom Kyte
September 15, 2004 - 11:40 am UTC

hz_parties is a complex view, how long does just "select * from hz_parties" itself take on your system?

using the CBO?

use not in if party_id is defined as NOT NULL in A
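For instance, assuming party_id is declared NOT NULL in A (so NOT IN is safe with respect to nulls) and the CBO is in use, the insert could be sketched as:

```sql
-- with the CBO, this NOT IN can be processed as a hash anti-join
-- instead of a row-by-row correlated filter
insert into a
select hz.party_id, null
  from hz_parties hz
 where hz.party_id not in ( select party_id from a );
```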

Nulls and spaces

A reader, October 28, 2004 - 10:54 am UTC

I have 2 tables with identical structure

create table copy as select * from original;

Data is loaded into "original" and "copy" independently for days. Now I want to compare the data in the 2 tables.

select * from original
minus
select * from copy

works fine, but it gives differences on cases where a column is NULL in one table and spaces in another.

How can I make it such that it compares the TRIM()ed version of all columns?

Thanks

Tom Kyte
October 28, 2004 - 1:51 pm UTC

well NULL is not "spaces" and "spaces" are not nulls, so it did the right thing

you would have to

select trim(c1), trim(c2), ...
minus
select trim(c1), ....


to compare after performing a function

stragg

A reader, October 28, 2004 - 2:20 pm UTC

stragg to the rescue again!

I ended up using

select stragg(case when data_type like '%CHAR%' then 'trim('||column_name||')' else column_name end)
from user_tab_columns
where table_name='T'

to generate the column list with trim's on the string columns. Used this string to do

select that_string from t
minus
select that_string from t_copy

Thanks



compare - generate delta records

karma, March 16, 2005 - 4:35 pm UTC

In this thread you showed efficient method to compare data like :
ops$tkyte@ORA9IR2> set autotrace on explain
ops$tkyte@ORA9IR2> select max(tname), empno, ename, job, mgr, hiredate, sal,
comm, deptno
2 from
3 (
4 select 'T1' tname, t1.* from t1
5 union all
6 select 'T2' tname, t2.* from t2
7 )
8 group by empno, ename, job, mgr, hiredate, sal, comm, deptno
9 having count(*) <> 2
10 /

Is there a way to generate delta records out of this ? e.g.
Assume table t1 has old data and T2 has new data. Is there a efficient way to generate list of records as follows:

If T1 and T2 both have rows and one of the column is different then only generate rows from T2 and mark as update.
If row only exists in T2 then generate rows only from T2 and mark as insert.
If row only exists in T1 then generate rows only from T1 and mark as delete.
If rows exist in T1 and T2 and every column is same then dont do anything.
Also, would if make any difference if both of this tables are external tables or regular tables ?

Thanks


Tom Kyte
March 17, 2005 - 8:27 am UTC

well, you would have to specify what the "key" was (empno in this case)

ops$tkyte@ORA9IR2> create table emp1 as select * from scott.emp;
Table created.
 
ops$tkyte@ORA9IR2> create table emp2 as select * from scott.emp;
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> update emp2 set empno = 1234 where rownum = 1;
1 row updated.

<b>that should generate a delete for the old key and an insert of 1234..</b>
 
ops$tkyte@ORA9IR2> update emp2 set ename = lower(ename) where empno = 7788;
1 row updated.

<b>that should generate an update of that record</b>

 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select case when cnt1 = 1 and tname = 'T1' then 'DELETE ME'
  2              when cnt1 = 1 and tname = 'T2' then 'INSERT ME'
  3              when cnt1 = 2 and tname = 'T2' then 'UPDATE ME TO THIS'
  4              end what,
  5         empno, ename, job, mgr, hiredate, sal, comm, deptno
  6    from (
  7  select count(*) over (partition by empno) cnt1,
  8         count(*) over (partition by empno, ename, job, mgr, hiredate, 
                                        sal, comm, deptno) cnt2,
  9         tname, empno, ename, job, mgr, hiredate, sal, comm, deptno
 10    from
 11  (
 12  select 'T1' tname, t1.* from emp1 t1
 13  union all
 14  select 'T2' tname, t2.* from emp2 t2
 15  )
 16  ) x
 17  where (cnt1 = 1) or (cnt1 = 2 and cnt2 = 1 and tname = 'T2')
 18  /
 
WHAT              EMPNO ENAME      JOB         MGR HIREDATE    SAL COMM DEPTNO
----------------- ----- ---------- --------- ----- --------- ----- ---- ------
INSERT ME          1234 Smith      CLERK      7902 17-DEC-80   800          20
DELETE ME          7369 Smith      CLERK      7902 17-DEC-80   800          20
UPDATE ME TO THIS  7788 scott      ANALYST    7566 09-DEC-82  3000          20


but I'd rather

delete from emp1 where empno not in (select empno from emp2)
merge into emp1 using emp2 on (emp1.empno = emp2.empno)
when matched then update....
when not matched then insert....
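Spelled out as a sketch for the emp1/emp2 tables above (column list taken from scott.emp; the MERGE needs both branches):

```sql
-- remove rows whose key no longer exists in the new data
delete from emp1
 where empno not in ( select empno from emp2 );

-- then upsert everything else
merge into emp1
using emp2
   on (emp1.empno = emp2.empno)
 when matched then update set
        emp1.ename    = emp2.ename,
        emp1.job      = emp2.job,
        emp1.mgr      = emp2.mgr,
        emp1.hiredate = emp2.hiredate,
        emp1.sal      = emp2.sal,
        emp1.comm     = emp2.comm,
        emp1.deptno   = emp2.deptno
 when not matched then insert
        (empno, ename, job, mgr, hiredate, sal, comm, deptno)
        values (emp2.empno, emp2.ename, emp2.job, emp2.mgr,
                emp2.hiredate, emp2.sal, emp2.comm, emp2.deptno);
```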

 

good

A reader, June 03, 2005 - 1:29 pm UTC


comparing data in a table and flat file

Sim, July 08, 2005 - 10:29 pm UTC

I have a comma-delimited flat file and a table in the database. How do I find the diff? The number of columns is the same in both the flat file and the table.
Well, for the comma-delimited flat file, I could load the data into an external table and then do the SQL MINUS that you demonstrated at the beginning of this article. But I don't have the CREATE TABLE privilege to create an external table; I have only SELECT access to the schema. Is there any way to do this from SQL*Plus?

Tom Kyte
July 09, 2005 - 8:54 am UTC

if you do not have the ability to create an external table definition, you are hosed (technical term).

sorry -- not much we can do here in the database if you haven't the privilege to do the simplest of things to accomplish your job.

Need some help

Joe, July 14, 2005 - 2:38 pm UTC

Tom,

How can I compare two tables and their columns dynamically?




Tom Kyte
July 14, 2005 - 3:03 pm UTC

see the other answer, I'll give you the technique, you develop the stored procedure.

GoldenGate Veridata

VA, August 15, 2005 - 10:53 am UTC

Just ran into this product called Goldengate Veridata

</code> http://www.goldengate.com/technology/veridata/index.html <code>

Seems interesting.

how to find difference of data into the same table

A reader, January 30, 2006 - 5:55 am UTC

Dear Tom,

I have a table T as follows

Name Null? Type
------------------------------- -------- ----
COMP VARCHAR2(15)
A NUMBER
B VARCHAR2(4)
C VARCHAR2(4)
E NUMBER
F NUMBER

select * from T

COMP A B C E F
--------------- --------- ---- ---- --------- ---------
11002 315 open new 5 12
11002 315 open new 5 77
31001 15 clos new 13 3
31001 15 clos new 10 3

I would like to find the differences (for each comp) and log the result into a new table T2 described as follows

Name Null? Type
------------------------------- -------- ----
COMP VARCHAR2(15)
A NUMBER
B VARCHAR2(4)
C VARCHAR2(4)
E NUMBER
F NUMBER

and where values are

COMP A B C E F
--------------- --------- ---- ---- --------- ---------
11002 12
11002 77
31001 10
31001 13



Tom Kyte
January 30, 2006 - 8:36 am UTC

you do not say if there can be "three" rows per comp.

using lag/lead with two rows - this would be "easy"
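A sketch of that lag/lead idea, showing only columns e and f for brevity and assuming exactly two rows per comp -- nvl(lag(..), lead(..)) yields the other row's value, and a column is reported only when the two values differ:

```sql
insert into t2 (comp, e, f)
select comp,
       case when e <> other_e then e end,
       case when f <> other_f then f end
  from ( select comp, e, f,
                nvl( lag(e)  over (partition by comp order by rowid),
                     lead(e) over (partition by comp order by rowid) ) other_e,
                nvl( lag(f)  over (partition by comp order by rowid),
                     lead(f) over (partition by comp order by rowid) ) other_f
           from t );
```

You would add a filter if pairs with no differences at all should produce no row.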

Comparing two tables

Adam Musch, January 30, 2006 - 3:21 pm UTC

A query like the following can be highly useful if you're trying to find the differences between two sets of columns, especially if there can be multiple records.

select match_col1, match_col2, ... match_col_n,
       sum(case rec_source when 'A' then 1 else 0 end) as a_count,
       sum(case rec_source when 'B' then 1 else 0 end) as b_count
  from (select match_col1, match_col2, ... match_col_n,
               'A' as rec_source
          from table1
        union all
        select match_col1, match_col2, ... match_col_n,
               'B' as rec_source
          from table2)
 group by match_col1, match_col2, ... match_col_n
having sum(case rec_source when 'A' then 1 else 0 end) !=
       sum(case rec_source when 'B' then 1 else 0 end)

This will find only exact matches for each distinct set of
match columns, and will show you the "not found" values in one table or the other.



thirumaran, February 07, 2006 - 5:41 am UTC

Dear Tom

I got a requirement to compare the data of two similar tables existing in two different schemas. As the data in production is so huge, I could not compare them effectively.

Can you let me know if there is any effective way of comparing the data -- either a query or a tool to compare the table column data.

Thanks
Thirumaran


Tom Kyte
February 07, 2006 - 5:56 am UTC

why would the size prevent you?

see
</code> https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html
(ctl-f for comparing)

or if you just want to know "are they the same", you can possibly use a hash
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:26043356526642#44900835942101 <code>
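As a sketch of the "are they the same" hash idea (ORA_HASH is available from 10g on; schema1/schema2 and the column list are placeholders for your own tables -- and equal sums strongly suggest, but do not absolutely prove, identical data, since hash collisions are possible):

```sql
-- run one query per schema and compare the two sums
select sum(ora_hash(empno || '/' || ename || '/' || sal)) from schema1.emp;
select sum(ora_hash(empno || '/' || ename || '/' || sal)) from schema2.emp;
```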



Why different result in a for loop

Hitesh Bajaj, May 03, 2006 - 4:42 am UTC

Create or replace procedure ddl_modifications AS
begin
declare
l_cnt NUMBER;
l_col_name VARCHAR2(30);
l_table_name VARCHAR2(30);
l_data_type VARCHAR2(30);
l_data_length VARCHAR2(30);
l_nullable VARCHAR2(30);
ddl_add_column VARCHAR2(100);
begin
for x in
(
Select 'IN SCOTTDUPL, NOT SCOTT', table_name tname, column_name cname,data_type,data_length,Nullable
from all_tab_columns
where owner = 'SCOTT' and table_name = 'DEPT'
MINUS
Select 'IN SCOTTDUPL, NOT SCOTT', table_name tname, column_name cname,data_type,data_length, Nullable
from all_tab_columns
where owner = 'SCOTTDUPL' and table_name = 'DEPT'
UNION ALL
Select 'IN SCOTT, NOT SCOTTDUPL', table_name tname, column_name cname,data_type,data_length, nullable
from all_tab_columns
where owner = 'SCOTTDUPL' and table_name = 'DEPT'
MINUS
Select 'IN SCOTT, NOT SCOTTDUPL', table_name tname, column_name cname,data_type,data_length, nullable
from all_tab_columns
where owner = 'SCOTT' and table_name = 'DEPT'
) loop
dbms_output.put_line('Hmm' || x.table_name|| x.column_name );
end loop;
end;
end;
/
scott@ADAM.DOMAIN> exec ddl_modifications;
HmmDEPTDEPTNO
HmmDEPTDNAME
HmmDEPTLOC
HmmDEPTX

PL/SQL procedure successfully completed.

and the data in two tables is like this :

scott@ADAM.DOMAIN> Select * from all_tab_columns where owner='SCOTT' and table_name = 'DEPT';

OWNER TABLE_NAME COLUMN_NAME DATA_TYPE
------------------------------ ------------------------------ ------------------------------ -------
SCOTT DEPT DEPTNO NUMBER
SCOTT DEPT DNAME VARCHAR2
SCOTT DEPT LOC VARCHAR2
SCOTT DEPT X NUMBER
scott@ADAM.DOMAIN> Select * from all_tab_columns where owner = 'SCOTTDUPL' and table_name = 'DEPT';

OWNER TABLE_NAME COLUMN_NAME DATA_TYPE
------------------------------ ------------------------------ ------------------------------ -------
SCOTTDUPL DEPT DEPTNO NUMBER
SCOTTDUPL DEPT DNAME VARCHAR2
SCOTTDUPL DEPT LOC VARCHAR2
SCOTTDUPL DEPT HIREDATE DATE
SCOTTDUPL DEPT X VARCHAR2

Should have been :
'INSCOTTDUPL,NOTSCOTT' TNAME CNAME DATA_TYPE
----------------------- ------------------------------ ------------------------------ --------------
IN SCOTT, NOT SCOTTDUPL DEPT HIREDATE DATE
IN SCOTT, NOT SCOTTDUPL DEPT LOC VARCHAR2
IN SCOTT, NOT SCOTTDUPL DEPT X VARCHAR2
IN SCOTTDUPL, NOT SCOTT DEPT LOC VARCHAR2
IN SCOTTDUPL, NOT SCOTT DEPT X NUMBER

Thanks in advance,



Tom Kyte
May 03, 2006 - 7:06 am UTC

do this, it is NOT the loop, it is a definers rights procedure and ROLES:

ps$tkyte@ORA10GR2> create or replace procedure p1
  2  as
  3  begin
  4          for x in (select count(*) cnt from all_objects)
  5          loop
  6                  dbms_output.put_line( x.cnt );
  7          end loop;
  8  end;
  9  /

Procedure created.

ops$tkyte@ORA10GR2> create or replace procedure p2
  2  authid CURRENT_USER
  3  as
  4  begin
  5          for x in (select count(*) cnt from all_objects)
  6          loop
  7                  dbms_output.put_line( x.cnt );
  8          end loop;
  9  end;
 10  /

Procedure created.

ops$tkyte@ORA10GR2> select count(*) cnt from all_objects;

       CNT
----------
     49962

ops$tkyte@ORA10GR2> exec p1
40874

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> exec p2
49962

PL/SQL procedure successfully completed.


See the different counts - you see a different number of objects when roles are enabled (authid current_user) vs when they are not (by default).  See also:

http://asktom.oracle.com/Misc/RolesAndProcedures.html

Cascade tables

Sanji, May 03, 2006 - 1:14 pm UTC

Hi Tom

How to cascade 2 tables.

Table A (col1, col2, col3)
Table B (col4, col5, col6)

So can a new table C, with cascaded structure and data from tables A & B, be created.
Table C (col1, col2, col3, col4, col5, col6)

Thanks
Sanji


Tom Kyte
May 03, 2006 - 1:33 pm UTC

"cascaded structure", never heard of that one before.


create table c
as
select * from a, b where 1=0;



Merging data from different table

Sanji, May 03, 2006 - 2:33 pm UTC

Thanks for the reply Tom.
Apparently "cascaded structure" was meant to be a table with columns from both the tables. I couldn't think of a better term :)

Anyways, create table c as select * from a, b where 1=2 would create the table with columns from tables a and b.
The requirement is such that I need to merge the data of both tables into a single table, and tables 'a' and 'b' might not necessarily have the same number of columns and datatypes.

If i execute
"create table c as select * from a, b"
This would perform a cartesian product of rows from a and b

Thanks again.
Sanji

Tom Kyte
May 03, 2006 - 2:55 pm UTC

tell me Sanji

what query would you use to join A to B and get the right data?

That query is the query you are looking for.


I don't know how to merge your tables together - perhaps you can tell us that?

Merging data

Sanji, May 03, 2006 - 3:04 pm UTC

It's a peculiar requirement.
(Assuming that the application design cannot change), the application team requests schema changes on a regular basis. Now the issue is that every table in the schema has 7 "fixed" columns, whose positions cannot change. So whenever there is a new column request, it wastes lots of time and effort, coz of obvious reasons. The new column has to be accommodated somewhere in the middle of the table structure.
What i planned was that i would create 2 separate tables that would accomodate columns before and after the fixed column's column id.
I would make changes to the first table and then merge with the other table.
This is the methodology that i have adopted.

create table san3 as select * from san1;

create table san4 as select * from san1;

declare

cursor c1 is
select column_name, column_id
from dba_tab_columns
where table_name = 'SAN3'
and owner = 'SUR0833'
and column_id >= (select column_id from dba_tab_columns
where column_name = 'ENTSTAT_REF'
and table_name = 'SAN3'
and owner = 'SUR0833');

cursor c2 is
select column_name, column_id
from dba_tab_columns
where table_name = 'SAN4'
and owner = 'SUR0833'
and column_id < (select column_id from dba_tab_columns
where column_name = 'ENTSTAT_REF'
and table_name = 'SAN4'
and owner = 'SUR0833');

cname varchar2(100);

cid number;

begin

open c1;
loop
fetch c1 into cname, cid;
exit when c1%notfound;

execute immediate 'alter table SAN3 drop column ' || cname ;

end loop;
close c1;

open c2;
loop
fetch c2 into cname, cid;
exit when c2%notfound;

execute immediate 'alter table SAN4 drop column ' || cname ;

end loop;
close c2;


exception

when OTHERS then
dbms_output.put_line (DBMS_UTILITY.FORMAT_ERROR_STACK);
end;
/


So i make changes(add, drop, modify columns) to table san3 and then i plan to merge structure/ data from table san4 into a new table.
I know, it's a bad design/ requirement, but then it needs to be done.

Thanks
Sanji

Tom Kyte
May 03, 2006 - 3:22 pm UTC

I don't get this AT ALL

there are no obvious reasons here to me - you'll really need to be more clear

(but one word for you: VIEW, with a view you can have your columns in ANY ORDER YOU LIKE)

but I did not follow this discussion one bit. Seems that the seven columns are already there - adding a new column goes to the end - would NEVER interfere with them - so nothing obvious at all.


what is "coz", it is related to this: </code> http://home.centurytel.net/cozland/home.html <code>

Merging data

Sanji, May 03, 2006 - 3:35 pm UTC

Consider this example

table a(
sno number,
addr varchar2(100),
name varchar2(100),
update_by varchar2(100),
updated_on date,
created_by varchar2(100),
created_on date)

The last 4 columns updated_by, updated_on, created_by, created_on, need to be "STATIC" in terms of their locations in the table. If i need to add another column to this table, say "new_column", the table structure has to be like this

table a(
sno number,
addr varchar2(100),
name varchar2(100),

new_column varchar2

update_by varchar2(100),
updated_on date,
created_by varchar2(100),
created_on date)


So the last 4 column's positions remain STATIC. I cannot add the new_column to the end of the table.
So if an existing table has to be modified, then it has to be renamed, dropped and recreated with the new column in between, thus maintaining the locations of the "FIXED" columns, and then the data has to be pumped back to this new table from the old table.

We generate a script like

insert into new_table
select col1, col2, col3, <null or default for new_column>, col5, col6, col7, col8 from old_table;

This procedure is time consuming and i wanted to automate it.

Sorry for "coz", it was "because".

Thanks
Sanji


Tom Kyte
May 03, 2006 - 3:39 pm UTC

one word:

VIEW


ops$tkyte@ORA10GR2> create table a(
  2  sno number,
  3  addr varchar2(100),
  4  name varchar2(100),
  5  update_by varchar2(100),
  6  updated_on date,
  7  created_by varchar2(100),
  8  created_on date)
  9  /

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> alter table a rename to "A_NEVER_TO_BE_USED_AGAIN!"
  2  /

Table altered.

ops$tkyte@ORA10GR2> alter table "A_NEVER_TO_BE_USED_AGAIN!"
  2  add new_column varchar2(20)
  3  /

Table altered.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> create or replace view A
  2  as
  3  select sno, addr, name, NEW_COLUMN, update_by, updated_on, created_by, created_on
  4    from "A_NEVER_TO_BE_USED_AGAIN!"
  5  /

View created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> desc a
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 SNO                                               NUMBER
 ADDR                                              VARCHAR2(100)
 NAME                                              VARCHAR2(100)
 NEW_COLUMN                                        VARCHAR2(20)
 UPDATE_BY                                         VARCHAR2(100)
 UPDATED_ON                                        DATE
 CREATED_BY                                        VARCHAR2(100)
 CREATED_ON                                        DATE



<b>
Do NOT do what you are trying to do
Do USE a view to accomplish this goal without any hassle or work
Period

Last I'll have to say on it, there is no way I'll help optimize a process that should never be done - use the VIEW
</b> 

Best possible solution.

Sanji, May 03, 2006 - 3:59 pm UTC

Tom,
If ever i want to learn something, it'd be simplifying solutions.
It's like spending a million dollars on a pen that can defy gravity instead of just using a pencil ... :)

Thanks a ton
Regards
Sanji

Tom Kyte
May 03, 2006 - 4:39 pm UTC

.. If ever i want to learn something, it'd be simplifying solutions. ...

I think that is one of the nicest things I've had anyone say here.

A reader, May 03, 2006 - 6:54 pm UTC

oh boy, that coz website might have received the most hits today ever in its life.

union in view

A reader, June 22, 2006 - 5:01 pm UTC

I have created a view in Oracle which has a UNION in it.

When I write any query that selects from the view, its cost is 57000. The earlier version of the view didn't have the UNION and had a cost of only 34.
Is there any better way to write the view?

create view v as
select a,b,c from d,e,f where ...
union
select x,y,z from l,e,f where ...


Tom Kyte
June 23, 2006 - 9:49 am UTC

does it run

a) fast
b) slow

who cares about the cost, well besides the optimizer.

union

A reader, June 23, 2006 - 11:23 am UTC

I have not seen the output, as it takes very long and the load on the server grows so much that I have to kill it.


Tom Kyte
June 23, 2006 - 1:16 pm UTC

then tune said query.

before it was not a union - so before it was "a single query"

now, you have two queries (the new one is probably very very different). They are executed and their results are then "distinct"-ed


Make sure

a) you need union and not union all.

b) your two individual queries run "quickly" before adding the union at all.

temp table

A reader, June 26, 2006 - 12:41 pm UTC

Should I use a GTT (global temporary table)?

Tom Kyte
June 26, 2006 - 12:47 pm UTC

I don't know.

I don't know who you are.
I don't know what problem you are trying to solve.

I'd guess "no", because temporary tables are needed very infrequently. So, most of the time, the answer would be "no"
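For what it's worth, since the question was so terse: a global temporary table is declared once, like a permanent table, and its rows are private to each session. A minimal sketch (the table name and columns are illustrative):

```sql
-- created once; the DEFINITION is permanent, the DATA is per-session
create global temporary table gtt_demo (
  id  number,
  txt varchar2(30)
) on commit delete rows;   -- rows vanish at commit; ON COMMIT PRESERVE ROWS keeps them until the session ends

insert into gtt_demo values (1, 'scratch');
-- another session querying gtt_demo at this point sees zero rows
```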

Pat, June 27, 2006 - 2:42 pm UTC

Tom,

You are amazing as usual. Can you please help me with this?

I want the same result duplicated, with one column different. The reason is that I have a very complex SQL whose result I want to use as a table and duplicate.
What I did first was insert the result into a table, re-select it, modify the required column and re-insert. Is it possible to do this in one SQL statement, without the need to re-run the complex SQL? I don't know whether I explained it properly; here is the example.

create table DUP1
(VOL NUMBER,SDT date);

insert into DUP1
values (1000, to_date('01-JAN-2006','DD-MON-YYYY'));

insert into DUP1
values (1000, to_date('02-JAN-2006','DD-MON-YYYY'));

insert into DUP1
values (2000, to_date('03-JAN-2006','DD-MON-YYYY'));

insert into DUP1
values (3000, to_date('03-JAN-2006','DD-MON-YYYY'));


SELECT 'A' TYPE,VOL,SDT FROM DUP1;

TYPE VOL  SDT
---- ---- ------------------------
A    1000 1/1/2006 12:00:00.000 AM
A    1000 1/2/2006 12:00:00.000 AM
A    2000 1/3/2006 12:00:00.000 AM
A    3000 1/3/2006 12:00:00.000 AM


SELECT 'A' TYPE,VOL,SDT FROM DUP1
UNION
SELECT 'B' TYPE,VOL,SDT FROM DUP1;

looking for this Output

TYPE VOL  SDT
---- ---- ------------------------
A    1000 1/1/2006 12:00:00.000 AM
A    1000 1/2/2006 12:00:00.000 AM
A    2000 1/3/2006 12:00:00.000 AM
A    3000 1/3/2006 12:00:00.000 AM
B    1000 1/1/2006 12:00:00.000 AM
B    1000 1/2/2006 12:00:00.000 AM
B    2000 1/3/2006 12:00:00.000 AM
B    3000 1/3/2006 12:00:00.000 AM

I want the output without using the table DUP1 twice.


Tom Kyte
June 27, 2006 - 2:59 pm UTC

ops$tkyte@ORA9IR2> select *
  2    from dup1, (select 'A' type from dual union all select 'B' from dual)
  3   order by type, vol;

       VOL SDT       T
---------- --------- -
      1000 01-JAN-06 A
      1000 02-JAN-06 A
      2000 03-JAN-06 A
      3000 03-JAN-06 A
      1000 01-JAN-06 B
      1000 02-JAN-06 B
      2000 03-JAN-06 B
      3000 03-JAN-06 B

8 rows selected.
 

A reader, June 27, 2006 - 4:08 pm UTC

Thanks Tom,

Can you please point me to books on "analytic functions by example" and "performance tuning", other than the Oracle documentation?

Tom Kyte
June 27, 2006 - 4:55 pm UTC

why, have you read them - which Oracle documents have you already read?



diff in data

space, September 22, 2006 - 8:33 am UTC

hi Tom,

I am trying to write a generic program to find data diffs between 2 tables.
Both tables have the same structure and the same number of rows.
I want to write the diffs to a third table.
I am trying to implement this but am not able to.
Can you please help me with this?


declare

cursor cur_tab1 is select * from all_tab_columns where table_name = 'TRANS';
type tab1 is table of cur_tab1%rowtype;
-- index by binary_integer;
l_tab1 tab1;
l_row1 pls_integer;

cursor cur_tab2 is select * from all_tab_columns where table_name = 'TRANS_TMP';-- where table_name = 'TRANS_TMP' and owner = 'MIC';
type tab2 is table of cur_tab2%rowtype;
--index by binary_integer;
l_tab2 tab2;
l_row2 pls_integer;


begin
dbms_output.put_line(' ');
open cur_tab1;
fetch cur_tab1 bulk collect into l_tab1;

open cur_tab2;
fetch cur_tab2 bulk collect into l_tab2;

-- process the data by scanning through the collection.
l_row1 := l_tab1.first;
l_row2 := l_tab2.first;

while (l_row1 is not null and l_row2 is not null)
loop
if l_tab1(l_row1).column_name <> l_tab2(l_row2).column_name then -- note: whole records cannot be compared with <> in PL/SQL
dbms_output.put_line('here ');
else
dbms_output.put_line(' here 2');
end if;

l_row1 := l_tab1.next (l_row1);
l_row2 := l_tab2.next (l_row2);
end loop;
close cur_tab1;
close cur_tab2;
end;

Thank you very much.


Tom Kyte
September 22, 2006 - 3:17 pm UTC

search for
</code> https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html <code>
"comparing" and you'll see a single sql statement you can use to "diff" two tables.

if you write procedural code for this, you have done it WRONG.

your code, well, please take a peek at it. you are comparing the names of columns of your two tables. You are not even a tiny bit close to comparing the CONTENTS of the two tables.
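The technique in that article boils down to a single GROUP BY over the union of both tables, which (unlike MINUS) also catches rows whose duplicate counts differ. A sketch of that shape, with illustrative table and column names:

```sql
select id, name,
       count(src1) cnt1,   -- how many times this row occurs in t1
       count(src2) cnt2    -- how many times this row occurs in t2
  from ( select a.id, a.name, 1 src1, to_number(null) src2 from t1 a
         union all
         select b.id, b.name, to_number(null) src1, 2 src2 from t2 b
       )
 group by id, name
having count(src1) <> count(src2);   -- keep only rows whose occurrence counts differ
```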

agreed, b

space, September 25, 2006 - 9:35 am UTC

Thank you for the response, Tom.
Yes, I agree that MINUS is a very straightforward approach,
but the problem is that I have more than 300 columns.

Tom Kyte
September 25, 2006 - 3:04 pm UTC

umm - so?

go ahead then and write a snippet of SQL that writes the SQL, and open a ref cursor. do not write code to compare these tables.

by the way, that article suggests against using minus and shows a better approach.

besides, I'll say again, your code doesn't compare two tables - so not sure where you are going with that.

Get to know the comparison of a Column with in a Table

Raj, November 22, 2006 - 4:04 pm UTC

Here is my requirement ..

My oracle Version is 9.2.0.4

I have a Table A

ID_LIST  ID_KEY  ID_AVAIL_PACKS
-------  ------  --------------
1001     101     30
1001     101     30
1001     101     30
1001     102     30

1002     100     10
1002     100     10
1002     100     20

In this case I have to check that rows with the same id_list always have the same id_key; if they differ, I should write an error to an error table.

The second set of rows is also an error because id_avail_packs differs. This also needs to be trapped and written to the error table.

How can I check both scenarios in one query?


Regards,
Raj




Tom Kyte
November 24, 2006 - 12:53 pm UTC

eh?

but there is no create
no inserts
hence no lookie.

but sounds like you are looking for maybe:

select id_list, count(distinct id_key), count(distinct id_avail_packs)
from t
group by id_list
having count(distinct id_key) > 1 or count(distinct id_avail_packs) > 1;
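Reconstructing a create/inserts from the data shown above confirms that query flags both problem groups:

```sql
create table t ( id_list number, id_key number, id_avail_packs number );

insert into t values (1001, 101, 30);
insert into t values (1001, 101, 30);
insert into t values (1001, 101, 30);
insert into t values (1001, 102, 30);  -- id_key varies within 1001
insert into t values (1002, 100, 10);
insert into t values (1002, 100, 10);
insert into t values (1002, 100, 20);  -- id_avail_packs varies within 1002

select id_list, count(distinct id_key), count(distinct id_avail_packs)
  from t
 group by id_list
having count(distinct id_key) > 1 or count(distinct id_avail_packs) > 1;
-- both 1001 and 1002 are returned
```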



Get to know the comparison of a Column with in a Table

RAJ, November 24, 2006 - 6:38 pm UTC

You are absolutely right and great..

It worked for me.



Merging rows of a table to create a new result set

Dheeraj, July 11, 2007 - 10:59 am UTC

Hi Tom,

I have following steps to explain my problem:

SQL> create table tab1 (col1 number, col2 number);

Table created.

SQL> insert into tab1 values(1,10);

1 row created.

SQL> insert into tab1 values(1,20);

1 row created.

SQL> insert into tab1 values(1,30);

1 row created.

SQL> insert into tab1 values(2,10);

1 row created.

SQL> insert into tab1 values(2,20);

1 row created.

SQL> insert into tab1 values(2,30);

1 row created.

SQL> insert into tab1 values(3, 10);

1 row created.

SQL> insert into tab1 values(3, 20);

1 row created.

SQL> select * from tab1;

      COL1       COL2
---------- ----------
         1         10
         1         20
         1         30
         2         10
         2         20
         2         30
         3         10
         3         20

8 rows selected.

Now my requirement is to write SQL that returns col1 = 1, 2 from tab1, because these two values of col1 have the same set of col2 values.

Thanks in advance,

Dheeraj
Tom Kyte
July 11, 2007 - 1:04 pm UTC

and what if there is also

4  10
4  20



what is the result set now?

Merging rows of a table to create a new result set

Dheeraj, July 11, 2007 - 10:53 pm UTC

Hmm...basically the idea is to group col1 based on their common dependency on col2 values...

The example, you pointed out, lets add two more rows:

col1 col2
5 10
6 30

Now, I should get the following result set:
Group 1: col1 = 1,2
Group 2: col1 = 3,4
Group 3: col1 = 5
Group 4: col1 = 6, since these groups have common dependency on col2.

Many thanks,

Dheeraj
Tom Kyte
July 12, 2007 - 10:02 am UTC

here is one possible approach

ops$tkyte%ORA10GR2> select * from tab1;

      COL1       COL2
---------- ----------
         1         10
         1         20
         1         30
         2         10
         2         20
         2         30
         3         10
         3         20
         4         10
         4         20
         5         10
         6         30

12 rows selected.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select col1, scbp, max(rn) over (order by scbp, col1) grp
  2    from (
  3  select col1, scbp,
  4         case when decode( lag( scbp ) over (order by scbp, col1), scbp, 1, 0 ) = 0
  5              then row_number() over (order by scbp, col1)
  6           end rn
  7    from (
  8  select col1, max( sys_connect_by_path( col2, ',' ) ) keep (dense_rank last order by level ) scbp
  9    from (
 10  select col1, col2, row_number() over (partition by col1 order by col2) rn
 11     from tab1
 12         )
 13   connect by prior col1 = col1 and prior rn = rn-1
 14   group by col1
 15         )
 16         )
 17   order by scbp, col1
 18  /

      COL1 SCBP                                            GRP
---------- ---------------------------------------- ----------
         5 ,10                                               1
         3 ,10,20                                            2
         4 ,10,20                                            2
         1 ,10,20,30                                         4
         2 ,10,20,30                                         4
         6 ,30                                               6

6 rows selected.
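On 11g Release 2 and later, LISTAGG collapses the sys_connect_by_path gymnastics above into a single aggregate. A sketch against the same tab1 (it will raise an error on earlier releases):

```sql
select col1, scbp,
       dense_rank() over (order by scbp) grp   -- same list of col2 values => same group number
  from ( select col1,
                listagg(col2, ',') within group (order by col2) scbp
           from tab1
          group by col1
       )
 order by scbp, col1;
```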

RE:Merging rows of a table to create a new result set

Frank Zhou, July 12, 2007 - 11:49 am UTC

Here is an alternative sql solution.

SQL> select * from tab1;

      COL1       COL2
---------- ----------
         1         10
         1         20
         1         30
         2         10
         2         20
         2         30
         3         10
         3         20
         4         10
         4         20
         5         10
         6         30

12 rows selected.

SQL> select col1, rtrim(ltrim(regexp_replace(str,'<X>|</X><X>|</X>',','),','),',') str,
  2         DENSE_RANK() OVER (ORDER BY str) as grp
  3    from (SELECT col1,
  4                 XMLAgg(XMLElement(x,col2) order by col2).getstringval() str
  5            FROM tab1 group by col1
  6         );

      COL1 STR                         GRP
---------- -------------------- ----------
         5 10                            1
         3 10,20                         2
         4 10,20                         2
         1 10,20,30                      3
         2 10,20,30                      3
         6 30                            4

6 rows selected.

SQL> spool off;

Here is another approach using the SQL MODEL clause:

http://oraqa.com/2007/05/24/how-to-assign-the-same-number-identifier-to-different-groups-that-contain-the-identical-set-of-data-in-a-sql-statement/

Frank


RE:Merging rows of a table to create a new result set

Dheeraj, July 13, 2007 - 1:18 am UTC

Thanks Tom and Frank!!!

Let me use them in my application and see how they perform.

Cheers,

Dheeraj

Merging rows of a table to create a new result set

Dheeraj, July 13, 2007 - 9:54 am UTC

Hi folks,

Continuing from what we obtained above in a single SQL statement, can we obtain the same result through PL/SQL?
I don't mind using a couple more SQL statements to get the right result set.

Basically, we can think of tab1 rows as:

col1 col2
1 [10,20,30]
2 [10,20,30]
3 [10,20]
4 [10,20]
5 [10]
6 [30]

Now, we want to generate list of col1 values having the same set of col2 values.
Maybe we could collect the col1 values in an array variable.

Second thoughts on redesign of this table's data so that SQL/PL-SQL becomes simpler are also welcome.

Many thanks,

Dheeraj
Tom Kyte
July 13, 2007 - 10:36 am UTC

we showed you how to turn col2 into a list.


Now, you have col2 "listed", it can be grouped on.

and since you know how to turn things into a list, you can do that with col1


just take what you've learned and re-apply it to this problem!!!!
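Spelling that hint out as a sketch (using 11gR2+ LISTAGG for brevity; the connect-by technique shown earlier works the same way): build the col2 list per col1, then aggregate the col1 values over each distinct list:

```sql
select scbp,
       listagg(col1, ',') within group (order by col1) col1_list
  from ( select col1,
                listagg(col2, ',') within group (order by col2) scbp
           from tab1
          group by col1
       )
 group by scbp;
-- e.g. the col2 list '10,20,30' maps to col1_list '1,2'
```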

How to compare two tables of data dynamically

Gowtham P., October 03, 2008 - 9:16 am UTC

Hello Tom,
I was reading the example you provided at the start of this link for comparing two tables of data.
I am having following query related to comparing two tables of data dynamically:-


I need to write a routine, such that it will have the IN parameters as Sourcetable_Name and DestinationTable_Name.

The routine should compare the data of each column between the source and destination tables dynamically (certain tables have around 100 columns and hold around 35 lakh records). If there is a mismatch for a particular record, delete it from the destination table and insert the corresponding record from the source table; otherwise (a new record), insert it into the destination table directly from the source table.

This routine is run every night to sync up the data.
Some of the tables do not have any primary key or indexes, since they are used for auditing. The tables also have no columns holding timing information, such as when the data was inserted or updated.
Therefore we are forced to compare the data of each column between the source and destination tables.

Please advise me how I should write such a routine.

Thanks in Advance

Tom Kyte
October 03, 2008 - 9:58 am UTC

sorry - you are missing some logic here.

You do not just want to compare two tables - that is easy. You want to merge it sounds like???

Do you really want to "merge"

And if you do, you need to have a way to tell your routine "what the join key is"

Merge can do the update (what you called the delete+insert) and the insert of new data.

What about removing rows in Destination that do not have a key in Source?

Re: How to compare two tables of data dynamically

Stew Ashton, October 04, 2008 - 3:38 am UTC


Funny, I just wrote a routine to do this last week.

My solution requires all the "missing logic" you speak of. I put primary keys on all the destination tables so I know what to compare. I do merges, except when all the columns are in the primary key. I happen to do deletes but could omit them.

What I don't understand is why go this route? I have a special case where materialized views have drawbacks, so I wrote some code, but I still wake up at night wondering if I didn't just reinvent the wheel.

Perhaps Gowtham P. will explain the special circumstances which appear to require a coded solution?

Re: How to compare two tables of data dynamically

Gowtham .P, October 06, 2008 - 6:13 am UTC

Hello Tom,
I can't directly use MERGE, because I don't know the Join Keys. Therefore we are forced to do a flat comparison between the Source table and Destination table records to make both tables in sync.
If you have any sample code for such comparison,could you please share it with me.

Following is the logic my routine is supposed to do:

i) Compare each column values of Source table with the Destination table.
ii) If the record is not present in the destination table, then INSERT into destination table fetching from the source table.
iii) If there is a mismatch in data then DELETE the record from the destination table and insert the corresponding record fetching from the source table

This routine should be used dynamically for synchronizing 180 tables once a day.

Please help me by sharing some sample code for performing such comparison.

Thanks in Advance

Tom Kyte
October 06, 2008 - 3:06 pm UTC

please think about the impact of this statement:

....
I can't directly use MERGE, because I don't know the Join Keys.
......

think about that and what you say next...

... Therefore we
are forced to do a flat comparison between the Source table and Destination
table records to make both tables in sync.
...

now, please detail the pseudo code for that? Please - I'd love to see it because it would involve a level of "smarts" not ever before seen.

Say I have a table:

t1 and t2 have columns (c1, c2, c3)

T1( row1 = 1, 2, 3 )
T2( row1 = 1, 6, 0 )

Now, synchronize those for me please. Go for it, what is the answer?

No matter what answer YOU post, I'll say "nope, you got wrong"

Eg: you say the answer is obviously:

result( row1 = 1, 6, 0 )

because obviously c1 is the primary key. I say "no, that is wrong, c1, c2 is the primary key the obviously correct answer is

result( row1 = 1, 2, 3 | row2 = 1, 6, 0 )

and so on - no matter what answer you say, I'll say "you are wrong"

Now, please, explain to us how this can work.....

And when you do - we'll have the PSEUDO code that implements your algorithm and from that we can write code.

And when you cannot (which is more what I suspect), your "requirement" will magically change, refine and become implementable...



Your I) II) and III) above are so ill defined as to be not worthwhile to read.

I) says - Compare each column values of Source table with the Destination table.

Ok, we have nothing to join on, so therefore, we must compare each column of record 1 in source WITH EVERY ROW - EVERY ROW IN DESTINATION

II) says - If the record is not present in the destination table, then INSERT into
destination table fetching from the source table.

well, I don't see how I) gives us that information, but we could actually accomplish this

iii) says - If there is a mismatch in data then DELETE the record from the destination
table and insert the corresponding record fetching from the source table

uh-oh. how exactly does this work??!?!?! If there is a mismatch? Well, re-read i) again please - unless the table is full of the same record, there will obviously be a mismatch again - obviously



Please - think about this - really. Take your time and describe exactly how to merge my T1 and T2 above....

Re: How to compare two tables of data dynamically

Gowtham .P, October 07, 2008 - 8:52 am UTC

Hello Tom,

Here is my Pseudocode, this will do a flat comparison between the Source table and Destination
table records to make both tables in sync.
That is the data from the table T1 will be stored into T2

Source table name --> T1 (in our case)
Destination table name --> T2 (in our case)

-- My declaration section
TYPE t_REF IS REF CURSOR;
c_REF t_REF;

TYPE t_rowid IS TABLE OF VARCHAR2 (50) INDEX BY VARCHAR2 (50);
tSource t_rowid; -- Holds the ROWID of table
tDestn t_rowid;-- Holds the ROWID of table
tCommonDestn t_rowid;-- Holds the ROWID of table
tCommonSource t_rowid;-- Holds the ROWID of table


vrowid VARCHAR2 (50);-- Holds the ROWID of table
vrowid_Destn VARCHAR2 (50);-- Holds the ROWID of table
vrowid_src VARCHAR2 (50);-- Holds the ROWID of table
i VARCHAR2 (50);

BEGIN

-- STEP #1 :
-- Write a Dynamic SQL(vSQL) to Fetch all rowid from Destination(T2) table.

-- STEP #2 :
-- open REF CURSOR c_REF for above STEP #1 dynamic SQL
-- Loop through and Populate a PL/SQL table "tDestn" with index as well as the element as ROWID of Destination table(T2).
Example:
tDestn (vrowid) := vrowid;
-- CLOSE REF_CURSOR c_REF;

-- STEP #3 :
-- Write a Dynamic SQL(vSQL) to Fetch all rowid from Source(T1) table;

-- STEP #4 :
-- open REF CURSOR c_REF for above STEP #2 dynamic SQL
-- Loop through and Populate a PL/SQL table "tSource" with index as well as the element as ROWID of Source table.
-- CLOSE REF_CURSOR c_REF;

-- STEP #5 :
-- Write a Dynamic SQL(vSQL) to Fetch the common rowid's from Destination table and Source table
-- By comparing the all the column values

Example :
vsql := 'select a.rowid, b.rowid from ' || T2 || ' a, ' || T1 || ' b ' ||
        'where a.col_first = b.col_first ' ||
        'and ............. a.col_last = b.col_last';

-- STEP #6 :
-- open REF CURSOR for above STEP #5 dynamic SQL
-- Loop through and Populate two PL/SQL table "tCommonDestn","tCommonSource" with index as well as the element as ROWID of Destination and Source table respectively;
-- CLOSE REF_CURSOR;

Example :

OPEN crows FOR vsql;
LOOP
FETCH crows INTO vrowid_Destn, vrowid_src;
EXIT WHEN crows%NOTFOUND;
BEGIN
tCommonDestn (vrowid_Destn) := vrowid_Destn;
tCommonSource (vrowid_src) := vrowid_src;
END;
END LOOP;
CLOSE crows;

-- STEP #7 :
-- Delete from the PL/SQL table "tDestn" (got through STEP #2) for all the values in PL/SQL "tCommonDestn" in step #6.
-- After this the PL/SQL table "tDestn" is left out with ROWID which is different from source.
Example:
i := tCommonDestn.FIRST;

WHILE i IS NOT NULL
LOOP
tDestn.DELETE (i); -- deleting from dest pl/sql table
i := tCommonDestn.NEXT (i);
END LOOP;

-- STEP #8 :
-- Now for the ROWID left in PL/SQL table "tDestn"
-- DELETE the records from destination (T2) table at database.
-- The deletion happens using an EXECUTE IMMEDIATE statement

Example:
i := tDestn.FIRST;

WHILE i IS NOT NULL LOOP
BEGIN
EXECUTE IMMEDIATE 'delete from ' || T2 || ' where rowid=:1' USING i;
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE( 'ERROR IN DELETION '||SQLERRM);
END;

i := tDestn.NEXT (i);
END LOOP;

COMMIT;


-- STEP #9 :
-- Delete from the PL/SQL table "tSource" (got through STEP #4) for all the values in PL/SQL "tCommonSource" in step #6.
-- After this the PL/SQL table "tSource" is left out with ROWID which is different from Destination.

-- STEP #10 :
-- Now for the ROWID left in PL/SQL table "tSource" ,select the data for those ROWID's from Source table(T1) and
-- INSERT the records into destination (T2) table at database.
-- The insertion happens using an EXECUTE IMMEDIATE statement
i := tSource.FIRST;

WHILE i IS NOT NULL LOOP
BEGIN
EXECUTE IMMEDIATE 'insert into ' || T2 || ' select col_1, col_2, ..., col_last from '
|| T1 || ' where rowid = :1' USING i;
EXCEPTION WHEN OTHERS THEN
DBMS_OUTPUT.PUT_LINE( 'ERROR IN INSERTION'||SQLERRM);
END;

i := tSource.NEXT (i);
END LOOP;

COMMIT;

-- STEP #11 :
-- Empty all the PL/SQL tables
tSource.DELETE;
tDestn.DELETE;
tCommonDestn.DELETE;
tCommonSource.DELETE;
END;

Tom Kyte
October 08, 2008 - 9:19 pm UTC

...
-- STEP #5 :
-- Write a Dynamic SQL(vSQL) to Fetch the common rowid's from Destination
table and Source table
.


huh??? there is only one case where the rowids would be in common - that would be in a cluster.

In general, your rowids will be unique for each row, even across tables.

You understand a rowid is a physical address right? It has nothing to do with the data contained therein

You have not described a process at all here - unless and until you can identify a KEY FOR YOUR SOURCE AND DESTINATION - your logic will simply be:

delete from destination;
insert into destination select * from source;
commit;

that is all.

Dan, October 07, 2008 - 5:06 pm UTC

"i) Compare each column values of Source table with the Destination table.
ii) If the record is not present in the destination table, then INSERT into destination table
fetching from the source table.
iii) If there is a mismatch in data then DELETE the record from the destination table and insert
the corresponding record fetching from the source table"

In other words, as far as Destination goes:
Keep it if it is present on both tables.
Delete it if it isn't on Source.
Insert it if it is present on Source (and not on Destination).

In which case you might as well just truncate your Destination table, and do a INSERT AS SELECT with append hint after disabling constraints and indexes.
Tom Kyte
October 08, 2008 - 9:56 pm UTC

... In which case you might as well just truncate your Destination table, and do a
INSERT AS SELECT with append hint after disabling constraints and indexes. ...

unless and until they identify the KEY TO JOIN - you are 100% correct.

Display the date witch is not exist in my table

Jamil Alshaibani, November 15, 2008 - 10:07 am UTC

Dear Sir,
I want to get the dates which do not exist in the C_DATE column when I compare between two dates.
My table structure is as follows:

SQL> DESC CHEMICAL_CONSUMPTION
 Name                            Null?    Type
 ------------------------------- -------- ----
 C_DATE                                   DATE
 C_JOB_NO                                 NUMBER(8)
 C_BLOCK_TYPE                             VARCHAR2(10)
 C_COLOR_TYPE                             VARCHAR2(10)
 C_RUN_TIME                               NUMBER(6,2)
 C_OPER_CODE                              VARCHAR2(10)
 REMARKS                                  VARCHAR2(450)
 C_YEAR                                   NUMBER(4)
 C_DAY_REMARKS                            VARCHAR2(2000)
 ORDER_CODE                               NUMBER(12)
 RRQ_DOC_CODE                             NUMBER(12)

Can I compare this table with the DUAL table to display the dates which are not in CHEMICAL_CONSUMPTION?
Waiting for your valuable answer with an example.

Best regards

Jamil Alshaibani


Tom Kyte
November 18, 2008 - 6:36 pm UTC

insufficient data, you'll really need to be a bit more "clear"

A reader, November 30, 2008 - 6:54 am UTC

Hi Tom,
I am comparing data from two tables; if there is any difference, I want to display only the mismatching records, for auditing purposes.

Example:
Case 1
ACCOUNT_CUST_CD (Not matching from both tables) ,account_contact_person_cd (Matching from both tables)

if account_contact_person_cd matching from both tables ,do not display source and target record.
if ACCOUNT_CUST_CD not matches,displays source and Target values on the same.

1-LPPP( ACCOUNT_CUST_CD --source)
1_1696T( ACCOUNT_CUST_CD --Target)

Case 2

ACCOUNT_CUST_CD , account_contact_person_cd not matches ,returns the source and Target values on the same

1-LPPP 1-411FQ( ACCOUNT_CUST_CD
account_contact_person_cd Source)

1_1696T 1-411FT(ACCOUNT_CUST_CD
account_contact_person_cd Target)

Thanks in advance
MR




Tom Kyte
December 01, 2008 - 6:19 am UTC

have fun I guess?

you do not seem to be asking anything
you sure didn't provide a create table or insert statements

your 'explanation' is so vague as to beg more questions than answers...

Look at this case:

Case 1
ACCOUNT_CUST_CD (Not matching from both tables) ,account_contact_person_cd
(Matching from both tables)

if account_contact_person_cd matching from both tables ,do not display source
and target record.
if ACCOUNT_CUST_CD not matches,displays source and Target values on the same.

1-LPPP( ACCOUNT_CUST_CD --source)
1_1696T( ACCOUNT_CUST_CD --Target)



ok, so - how do you JOIN these tables????

by account_cust_cd - apparently not, you said "not matching from both tables"

by account_contact_person_cd - apparently not, you said "if they match by this column from both tables - do not display source and target" - so if that were the join key - we'd never have anything to display, obviously.


Sorry - this is very ill defined.

Adding the ability to have tolerance...

Philip Moore, February 09, 2010 - 1:18 pm UTC

Hi Tom,

I've used your result set comparison in Production (albeit with the de-duplicate flaw) for quite some time to automate data quality checks in PL/SQL ETL for data warehouse applications.

Here is the PL/SQL that we've used for quite some time (using your first approach with the de-duplicate issue):

CREATE OR REPLACE FUNCTION compare_query_results (
   p_query1                     IN   VARCHAR2
 , p_query2                     IN   VARCHAR2
 , p_raise_error_if_not_equal   IN   BOOLEAN DEFAULT FALSE
 , p_raise_error_if_no_rows     IN   BOOLEAN DEFAULT FALSE
)
   RETURN NUMBER
IS
   -- Constants
   c_query_results_equal       CONSTANT PLS_INTEGER       := 0;
   c_query_results_not_equal   CONSTANT PLS_INTEGER       := 1;
   oracr                       CONSTANT VARCHAR2 (1)      := CHR (10);
   -- Variable Declaration
   v_sql_stmt                           VARCHAR2 (32767);
   v_record_count                       PLS_INTEGER;
   v_return_code                        PLS_INTEGER;
   v_record                             DUAL.dummy%TYPE;
   v_result_set_has_rows                BOOLEAN;
   -- Ref Cursors
   v_cursor                             sys_refcursor;
   -- Custom Defined-Exceptions
   result_sets_do_not_match             EXCEPTION;
   query_returns_no_rows                EXCEPTION;
BEGIN
   -- Get the count of differing records between p_query1 and p_query2
   v_sql_stmt               :=
         'WITH query1 AS (SELECT /*+ materialize */'
      || SUBSTR (p_query1, INSTR (UPPER (p_query1)
                                , 'SELECT'
                                , 1
                                , 1
                                 ) + 6)
      || ')
         , query2 AS (SELECT /*+ materialize */'
      || SUBSTR (p_query2, INSTR (UPPER (p_query2)
                                , 'SELECT'
                                , 1
                                , 1
                                 ) + 6)
      || ')
       SELECT ''X''
  FROM (
   (SELECT * FROM query1 MINUS SELECT * FROM query2)
    UNION ALL
   (SELECT * FROM query2 MINUS SELECT * FROM query1)
  )';

   OPEN v_cursor
    FOR v_sql_stmt;

   FETCH v_cursor
    INTO v_record;

   v_result_set_has_rows    := v_cursor%FOUND;

   CLOSE v_cursor;

   -- If there are rows - the result sets do NOT match...
   IF v_result_set_has_rows
   THEN
      v_return_code    := c_query_results_not_equal;

      IF p_raise_error_if_not_equal
      THEN
         RAISE result_sets_do_not_match;
      END IF;
   -- If there are no rows - the result sets do match...
   ELSIF NOT v_result_set_has_rows
   THEN
      IF p_raise_error_if_no_rows
      THEN
         -- Check to make sure that the queries contain rows if desired...
         v_sql_stmt    := 'SELECT ''X''
   FROM (' ||             oracr || p_query1 || oracr || ')';

         OPEN v_cursor
          FOR v_sql_stmt;

         FETCH v_cursor
          INTO v_record;

         IF v_cursor%NOTFOUND
         THEN
            CLOSE v_cursor;

            RAISE query_returns_no_rows;
         END IF;

         CLOSE v_cursor;
      END IF;

      v_return_code    := c_query_results_equal;
   END IF;

   RETURN v_return_code;
EXCEPTION
   WHEN result_sets_do_not_match
   THEN
      raise_application_error (-20101, 'The Queries'' result sets do NOT match. Error returned as requested.');
   WHEN query_returns_no_rows
   THEN
      raise_application_error (-20102, 'The Queries'' result sets match, however they contain no rows. Error returned as requested.');
   WHEN OTHERS
   THEN
      -- Raise the error
      raise_application_error (-20103
                             ,    'There is a syntax or semantical error in one or both queries preventing comparison.'
                               || oracr
                               || 'Error Stack    :'
                               || oracr
                               || DBMS_UTILITY.format_error_stack ()
                               || oracr
                               || 'Error_Backtrace:'
                               || oracr
                               || DBMS_UTILITY.format_error_backtrace ());
END compare_query_results;
/


This seems to work fine for us (at least thus far) - and even works from SQL:
SELECT compare_query_results (/* p_query1 => */ 'SELECT ''APPLES'' AS fruit, 100 AS sales FROM dual'
                            , /* p_query2 => */ 'SELECT ''APPLES'' AS fruit, 100 AS sales FROM dual') AS results_are_equal
     , compare_query_results (/* p_query1 => */ 'SELECT ''APPLES'' AS fruit, 100 AS sales FROM dual'
                            , /* p_query2 => */ 'SELECT ''APPLES'' AS fruit, 150 AS sales FROM dual') AS results_not_equal
  FROM dual;

Results:
RESULTS_ARE_EQUAL      RESULTS_NOT_EQUAL      
---------------------- ---------------------- 
0                      1                      




We often call this from PL/SQL - with the boolean parameters: p_raise_error_if_not_equal and p_raise_error_if_no_rows set to TRUE to immediately raise an error (and thus notify someone that the ETL job has failed due to data quality issues).

One thing I've struggled with, however (due to my math rustiness), is how I could use this technique while allowing small tolerances in the numeric result set columns. My company deals with billions of dollars, so allowing small tolerances in our data quality check queries would be great.

I can't get my head around comparing rows with tolerances - when I can't compare row by row procedurally. I know there must be some clever rounding solution that involves CEIL or FLOOR or something, but I can't figure it out.

I know you have a great math background, and I wanted to see if you would know off-hand how I could modify my two queries to allow for tolerance in comparing my numeric columns - while continuing to use your result set comparison technique.

Thank you!


Tom Kyte
February 15, 2010 - 2:53 pm UTC

In order to compare a row to another row (which you want to do) you need to JOIN


you would have to have some set of key columns, and some set of "have to be close" columns and some set of "have to be the same"

basically you would:


select * 
  from (query1) q1 full outer join (query2) q2 on (q1.key_cols = q2.key_cols)
 where (abs(q1.c1-q2.c1) > tolerance 
        or q1.c1 is null and q2.c1 is not null 
        or q1.c1 is not null and q2.c1 is null)
       OR 
       ....
       OR
       (abs(q1.cN-q2.cN) > tolerance 
        or q1.cN is null and q2.cN is not null 
        or q1.cN is not null and q2.cN is null)
       OR
       decode( q1.x1, q2.x1, 1, 0 ) = 0
       OR
       ...
       OR
       decode( q1.xM, q2.xM, 1, 0 ) = 0




where c1..cn are the Have to be close columns and x1..xM are the have to be the same columns
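Outside the database, the same "join on the keys, then compare with tolerance" idea can be sketched in Python. This is only an illustrative stand-in (the helper name and dict-style rows are assumptions, not anything from Oracle): `close_cols` must agree within a tolerance, `exact_cols` must match exactly, and `None` plays the role of SQL NULL.

```python
def rows_match(r1, r2, close_cols, exact_cols, tol=0.01):
    """Compare two already-joined rows (dicts keyed by column name).

    Numeric columns in close_cols must agree within tol; columns in
    exact_cols must be identical. None stands in for SQL NULL: a NULL
    only matches another NULL, and is never 'close' to a number.
    """
    for c in close_cols:
        a, b = r1[c], r2[c]
        if a is None or b is None:
            if not (a is None and b is None):
                return False          # one side NULL, the other not
        elif abs(a - b) > tol:
            return False              # outside the tolerance
    for c in exact_cols:
        if r1[c] != r2[c]:
            return False              # must-be-identical column differs
    return True
```

A pair of rows that differ by 0.005 in a "close" column would compare equal under the default tolerance of 0.01, while a 0.5 difference would not.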

OK

Rama, February 16, 2010 - 7:28 am UTC

Hi,
How about using the dbms_comparison package?

Tom Kyte
February 16, 2010 - 5:17 pm UTC

for what? what is your goal?

Thanks Tom...

Philip Moore, February 17, 2010 - 9:26 am UTC

Thanks for the excellent idea. I was focused on using a square-peg solution for a round-hole problem.

I'll write another PL/SQL framework to take advantage of this comparison technique :)

I believe it will be quite useful.

Thanks again.

Sincerely,

Phil

disk quota

surender, May 08, 2010 - 7:07 am UTC

i want give a paraaticular space of my user plz give me answer as soon as possible
Tom Kyte
May 08, 2010 - 7:52 am UTC

you have got to be joking right? I checked my calendar, April 1 happened *last month*.


PLZ - German postal codes? I don't get it, what do they have to do with Oracle?

"give me answer as soon as possible", hmmmm



"I want give a particular space of my user" - I have no idea what that means.


If you want to put a quota or see what quotas have been assigned on users:

http://www.oracle.com/pls/db112/search?remark=quick_search&word=quota


It could be as easy as plugging the word quota into the documentation and hitting enter!

Table Comparison

A reader, May 08, 2010 - 12:03 pm UTC

Hi Tom:

Can you give me some hints or a proper approach for data comparison?

Basically, I have table EMP on server P (9i production database).
The table was exported and then imported into an 11g test database on server T.


The import log did not indicate any errors other than character set conversion. The 9i database was ISO8859 and the 11g was MSWIN1252.

In summary I want to run a TEST to compare if the data is identical in both tables.

The problem is that I can't create a DB link from test to production to run a SQL statement to compare data.

However, I can create a SQL*Plus or TOAD session to each database from my Windows XP client.


Is this the correct SQL for data comparison? Is there a way to do it using two client sessions, or do I need to dump the table data to text files and then run some tool to do a DIFF on both files?


select col1, col2, .. from tab1
minus
select col1, col2, .. from tab2

Tom Kyte
May 10, 2010 - 6:22 pm UTC

if you use a dblink of any sort - that itself will change the data :) You have different character sets, the same conversion will take place. The act of selecting the data from db1 with cs1 into db2 with cs2 will run the conversion again.


You could look at the hashing functions available to you - from dbms_utility.get_hash_value to ora_hash (but that was 10g, so ora_hash might not work) to md5 checksums in dbms_crypto/dbms_obfuscation_toolkit - whatever.

That would give you a checksum/hash value per row.

You could add them up, average them, min, max, stddev - etc and compare just those numbers.

If any differ - data is somehow 'different'

since they are different character sets and I don't believe one is a subset of the other - I would expect them to be potentially different.
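The per-row hash idea can be sketched outside the database as well. This hypothetical Python sketch (the helper names and the separator byte are assumptions; it assumes the separator never occurs in the data) computes an MD5-based hash per row and aggregates count, sum, min, and max, so two disconnected sessions only have to compare four numbers per table:

```python
import hashlib

def row_hash(row):
    """Hash one row: NULLs become empty strings, columns are joined with
    a separator byte assumed never to occur in the data, then MD5'd."""
    payload = '\x00'.join('' if v is None else str(v) for v in row)
    return int.from_bytes(hashlib.md5(payload.encode('utf-8')).digest()[:8], 'big')

def table_fingerprint(rows):
    """Order-independent aggregate over the per-row hashes."""
    hashes = [row_hash(r) for r in rows]
    return (len(hashes), sum(hashes),
            min(hashes, default=0), max(hashes, default=0))
```

Equal fingerprints strongly suggest (though, being hashes, do not strictly prove) that the two tables hold the same multiset of rows; any single changed value will almost certainly change the sum.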

data

A reader, May 11, 2010 - 9:02 am UTC

Tom:

MSWIN1252 is a superset of ISO-8859-1. It includes all the ISO-8859-1 characters and has 27 additional code points.

http://en.wikipedia.org/wiki/Windows-1252

I cannot create a DB link. I assume I would need to run 2 reports, one for the same table in each DB, and check the PK and MD5 hash value for each.

Based on what you said if I have "ABC" in one row with CS1 and "ABC" in same row in CS2, oracle might tell me there is a difference if the code points are different.

so the "ABC" is more of a display thing really for the user.

I ran CSSCAN and the reports indicated 100% changeless conversion. A few non-standard Windows characters are not supported in ISO8859 itself; I can correct those.

minus or JOIN

Branka, July 19, 2010 - 2:45 pm UTC

I need to compare data in 2 tables.

When should I use select like
SELECT c1, c2,c3
FROM A JOIN B
ON A.id1=B.id1 AND A.id2=B.id2
WHERE A.c1<>B.c1
OR A.c2<>B.c2
OR A.c3<>B.c3

And when should I use
SELECT id1,id2,c1,c2,c3
FROM A
MINUS
SELECT id1,id2,c1,c2,c3
FROM B
Tom Kyte
July 23, 2010 - 6:37 am UTC

well, the second one shows the unique rows in A that are not in B (unique rows, not all of the rows - eg:


ops$tkyte%ORA10GR2> with a as
  2  (select 1 x from dual
  3   union all
  4   select 1 x from dual
  5   union all
  6   select 2 x from dual
  7  ),
  8  b as
  9  (select 2 x from dual
 10   union all
 11   select 3 x from dual)
 12  select * from a minus select * from b;

         X
----------
         1



the first one returns rows from A that *exist* keywise (A.ID1, A.ID2 must exist in B) and some of the other columns are not equal (NULLS are a problem with that)

in short, they return completely different answers. So there is no answer for you - since you are not giving me two SEMANTICALLY EQUIVALENT queries to compare.

You would have to use the one that returns the answer that answers the question you were asking, these queries answer two entirely different questions.
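The distinction matters partly because MINUS is a *set* operation: duplicate rows collapse before the difference is taken. A quick Python analogue (illustrative only, not Oracle syntax) contrasts the set difference with a multiset difference that keeps duplicate copies:

```python
from collections import Counter

a = [1, 1, 2]   # "table A": the row 1 appears twice
b = [2, 3]      # "table B"

# MINUS-style (set) difference: duplicates in A collapse first
set_minus = sorted(set(a) - set(b))                            # [1]

# Multiset difference keeps the duplicate copies
multiset_minus = sorted((Counter(a) - Counter(b)).elements())  # [1, 1]
```

This is why the MINUS query above returned a single row 1 even though A contained it twice.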

compare tables

Branka, July 26, 2010 - 10:27 am UTC

I realized that for the second query, I need to use the select that you developed on the AskTom site, which I tweaked a little so that it uses only records that exist in both tables. I would have a problem only if the same record exists in both tables and is identical, but one table has 1 copy of it and the other has 2. That can happen when id1 and id2 are the PK in the first table but not in the second.
Other than that, it seems the results should be the same.
Do you see any difference between those 2 queries? Which one would you recommend?

SELECT c1, c2, c3
  FROM ( SELECT b1.*, 1 src1, to_number(NULL) src2 FROM b1
         UNION ALL
         SELECT b2.*, to_number(NULL) src1, 2 src2 FROM b2
       )
 WHERE (id1, id2) IN ( SELECT id1, id2 FROM b1
                       INTERSECT
                       SELECT id1, id2 FROM b2
                     )
 GROUP BY c1, c2, c3, id1, id2
HAVING COUNT(src1) <> COUNT(src2)
 ORDER BY 1, 2, 3
/

SELECT b1.c1, b1.c2,b1.c3
FROM b1 JOIN B2
ON b1.id1=b2.id1 AND b1.id2=b2.id2
WHERE b1.c1<>B2.c1
OR b1.c2<>B2.c2
OR b1.c3<>B2.c3
UNION ALL
SELECT b2.c1, b2.c2,b2.c3
FROM b1 JOIN B2
ON b1.id1=b2.id1 AND b1.id2=b2.id2
WHERE b1.c1<>B2.c1
OR b1.c2<>B2.c2
OR b1.c3<>B2.c3
order by 1,2,3
/

Tom Kyte
July 27, 2010 - 11:57 am UTC

if you wanted to only compare records that have keys in both, it would be:

select ....
  from b1, b2
 where b1.id1 = b2.id1 and b1.id2 = b2.id2
   and ( decode( b1.c1, b2.c1, 0, 1 ) = 1
         OR
         decode( b1.c2, b2.c2, 0, 1 ) = 1 
         OR 
         ....)


decode is "null comparison friendly", if either b1.cN or b2.cN were null, the <> stuff stops working as you would probably want it to work.

Since you are only interested in records that exist IN BOTH, just a single natural join with the or'd decodes is all you need/want.

A reader, July 27, 2010 - 12:36 pm UTC

what if both the column are null?

thanks


Tom Kyte
July 27, 2010 - 12:51 pm UTC

when you tested it out to see, what did you find out?

ops$tkyte%ORA10GR2> select decode( null, null, 0, 1 ) from dual;

DECODE(NULL,NULL,0,1)
---------------------
                    0


Null is equal to null using decode.

not null is NOT equal to null using decode.
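The DECODE null semantics can be mimicked in plain code. Here is a small Python stand-in (a hypothetical helper, not an Oracle API) where `None` plays the role of NULL:

```python
def decode_eq(a, b):
    """Null-safe equality, like DECODE(a, b, 1, 0) in Oracle:
    NULL matches NULL; NULL never matches a non-NULL value."""
    if a is None and b is None:
        return 1          # null equals null, as DECODE sees it
    if a is None or b is None:
        return 0          # null never equals not-null
    return 1 if a == b else 0
```

Contrast this with a plain `a <> b` predicate in SQL, which evaluates to UNKNOWN (and so filters the row out) whenever either side is NULL.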

compare tables

Branka, July 27, 2010 - 1:08 pm UTC

Tom,

Thanks a lot.

What I realy want is to have some report that will say:
'exist in B1 and not in b2',1,2,3
'exist in B1 and not in b2',1,2,4
etc
'exist in B2 and not in b1',2,3,4
'exist in B2 and not in b1',2,3,5
etc
'exist in both but different',3,4,5
'exist in both but different',3,4,6
etc

So, having record in both tables but different is just part of the problem.
In my database first query was much faster because of lack of indexes. (It executes in 1 minute, and second one in 5 hours)

Tom Kyte
July 28, 2010 - 7:29 am UTC

then, ummm, you have a strange way of saying things.

Go back to my ORIGINAL QUERY, it does that.

https://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

"Comparing the Contents of Two Tables"

Simply order the results by your concept of the primary key and you'll get a report showing records in T1 not in T2, records in T2 not in T1, and records with the same primary key but at least one differing column; for that last case, the T1 record and the T2 record will appear right next to each other in the output.
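The logic of that comparison query (UNION ALL the two tables, GROUP BY every column, HAVING count(src1) <> count(src2)) amounts to a multiset difference. A Python sketch of the same idea, treating rows as tuples (the function name is an assumption for illustration):

```python
from collections import Counter

def compare_tables(rows1, rows2):
    """Report rows whose occurrence counts differ between the two
    tables, like GROUP BY ... HAVING COUNT(src1) <> COUNT(src2).

    Returns sorted (row, count_in_table1, count_in_table2) triples;
    rows present the same number of times in both tables are omitted.
    """
    c1, c2 = Counter(rows1), Counter(rows2)
    return sorted((row, c1[row], c2[row])
                  for row in set(c1) | set(c2)
                  if c1[row] != c2[row])
```

Ordering the output by the key columns, as suggested above, puts a changed record's two versions next to each other.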

Syncing un-keyed tables

Kevin, July 27, 2010 - 3:01 pm UTC

TK: "You have not described a process at all here - unless and until you can identify a KEY FOR YOUR SOURCE AND DESTINATION - your logic will simply be:
delete from destination;
insert into destination select * from source;
commit;
that is all. "


I'm not sure if this is what the reader was looking for, but I've developed the following approach for synchronizing key-less tables without emptying and repopulating the entire destination table.

T1 is the source table, T2 is the destination.

DROP TABLE T1
/
DROP TABLE T2
/
CREATE TABLE T1
AS SELECT LEVEL C1, 'TEST'||LEVEL C2 FROM DUAL CONNECT BY LEVEL < 1000001
/
CREATE TABLE T2
AS SELECT LEVEL C1, 'TEST'||LEVEL C2 FROM DUAL CONNECT BY LEVEL < 1000001
/
INSERT INTO T1 VALUES (1, 'TEST1')
/
INSERT INTO T2 VALUES (1, 'TEST1')
/
INSERT INTO T1 VALUES (2, 'TEST2')
/
UPDATE T1 SET C2='DIFF3' WHERE C1 = 3
/
DELETE FROM T1 WHERE C1 = 4
/


BEGIN
FOR L_DIFFS IN (
SELECT C1, C2,
SUM(T1_COUNT) INSERT_COUNT,
SUM(T2_COUNT) DELETE_COUNT,
STRAGG(RID) DELETE_RIDS
FROM
(SELECT /*+ parallel(T1) */ T1.*, NULL RID, 1 T1_COUNT, 0 T2_COUNT FROM T1 UNION ALL
SELECT /*+ parallel(T2) */ T2.*, ROWIDTOCHAR(ROWID) RID, 0 T1_COUNT, 1 T2_COUNT FROM T2)
GROUP BY C1, C2
HAVING SUM(T1_COUNT) <> SUM(T2_COUNT)
)
LOOP
FOR I IN 1..L_DIFFS.INSERT_COUNT LOOP
INSERT INTO T2 VALUES (L_DIFFS.C1, L_DIFFS.C2);
END LOOP;
FOR J IN 1..L_DIFFS.DELETE_COUNT LOOP
DELETE FROM T2
WHERE ROWID = CHARTOROWID(SUBSTR(L_DIFFS.DELETE_RIDS,19*J-18,18));
END LOOP;
END LOOP;
END;
/




Tom Kyte
July 28, 2010 - 7:34 am UTC

ummm, what do you call "C1" and "C2"


I call them "keys" in your example.....


You could do this in a single sql statement you know - c1 and c2 ARE YOUR KEYS.



a new query to diff 2 tables

vasile, July 28, 2010 - 4:11 am UTC

I have a different approach. This displays both tables side by side using a full join between them.
 The only disadvantage is that the tables must have primary key. 

SQL> 
DROP TABLE t1;
DROP TABLE t2;

CREATE TABLE t1
( pk NUMBER(9) primary key, 
  a  char(3),
  b  varchar2(15),
  c  number(10,2)
);


CREATE TABLE t2
( pk NUMBER(9) primary key, 
  a  char(3),
  b  varchar2(15),
  c  number(10,2)
);

INSERT INTO t1 VALUES (1,'ABC', 'foo text1', 3.14);
INSERT INTO t1 VALUES(2,'MNX', 'foo text2', 101);
INSERT INTO t1 VALUES(3,'NBA', 'foo text3', 102);
INSERT INTO t1 VALUES(10,'TOM','foo text4', 103);

INSERT INTO t2 VALUES(1,'ABC', 'foo text1', 3.14);
INSERT INTO t2 VALUES(2,'XXX', 'foo text2', 101);
INSERT INTO t2 VALUES(3,'NBA', 'BOO',       102);
INSERT INTO t2 VALUES(20,'TOM','foo text4', 103);

SELECT t1.*, t2.*
FROM  t1 
FULL JOIN  t2 ON (t1.pk = t2.pk)
WHERE t1.pk is null 
OR t2.pk is null 
OR t1.a <> t2.a
OR t1.b <> t2.b 
OR t1.c <> t2.c;

Tom Kyte
July 28, 2010 - 7:43 am UTC

You better use:

( decode( b1.c1, b2.c1, 0, 1 ) = 1
         OR
         decode( b1.c2, b2.c2, 0, 1 ) = 1 
         OR 
         ....)


if any of your columns is nullable.

... The only disadvantage is that the tables must have primary key. ...

how is that a disadvantage? IT IS A PREREQUISITE TO COMPARING TWO TABLES. If you have no key - you have nothing to compare!!!!!!!!!!!!!!!!!!!!!!!!!!

vasile, July 28, 2010 - 8:53 am UTC

"The only disadvantage is that the tables must have primary key" - that part was supposed to be ironic :).
I don't talk to tables with no primary key.
Tom Kyte
July 28, 2010 - 9:21 am UTC

it is not possible to tell that was meant to be sarcastic, the way it was written...

More on keyless tables

Kevin, July 28, 2010 - 10:52 am UTC

Tom,  

Thanks for the follow-up.  

TK: "ummm, what do you call "C1" and "C2" 
I call them "keys" in your example..... 
You could do this in a single sql statement you know - c1 and c2 ARE YOUR KEYS. "


With respect, they are not keys.  My example explicitly inserted duplicate rows into the table:

SQL> SELECT * FROM T1 WHERE C1=1;

        C1 C2
---------- --------------------------------------------
         1 TEST1
         1 TEST1

SQL> 


I do not disagree that a "keyless" table is intrinsically a "bad thing".  I would never, ever create a table without a PK.  On encountering such a table, no matter the context, I would not bite my tongue in explaining that no-PK == bug-waiting-to-happen, always.

Nevertheless, I have been handed a directive to synchronize (over a dblink) a table lacking keys and containing duplicates with a remote counterpart.  With no other contextual information, I could only assume that duplicates on the source table are there by design; that they have some "business-meaning", and that they must be replicated in content and cardinality (e.g. if a row is duplicated 3 times on the source, and only twice on the destination, I need to effectively add one more duplicate to the destination).  


TK: "IT IS A PREREQUISITE TO COMPARING TWO TABLES. If you have no key - you have nothing to compare!!!!!!!!!!!!!!!!!!!!!!!!!! "

It seems to me that two tables without keys can be compared, and are equivalent if they have the same rows.  For instance, if I sort and print the contents of two tables T1 and T2 to files T1.txt and T2.txt, and T1.txt is bytewise-identical to T2.txt, then the tables have been compared, and are equivalent.  


TK: "You could do this in a single sql statement you know - c1 and c2 ARE YOUR KEYS. "

I am skeptical that a single SQL statement exists which could update table T2 to be equivalent to table T1:

T1
        C1 C2
---------- --------------------------------------------
         1 TEST1
         1 TEST1
         1 TEST1
         2 TEST2
         3 TEST3


T2
        C1 C2
---------- --------------------------------------------
         1 TEST1
         1 TEST1
         2 TEST2
         2 TEST2
         4 TEST4

CREATE TABLE T1 
AS SELECT 1 C1, 'TEST1' C2 FROM DUAL UNION ALL
   SELECT 1 C1, 'TEST1' C2 FROM DUAL UNION ALL
   SELECT 1 C1, 'TEST1' C2 FROM DUAL UNION ALL
   SELECT 2 C1, 'TEST2' C2 FROM DUAL UNION ALL
   SELECT 3 C1, 'TEST3' C2 FROM DUAL 
/
CREATE TABLE T2 
AS SELECT 1 C1, 'TEST1' C2 FROM DUAL UNION ALL
   SELECT 1 C1, 'TEST1' C2 FROM DUAL UNION ALL
   SELECT 2 C1, 'TEST2' C2 FROM DUAL UNION ALL
   SELECT 2 C1, 'TEST2' C2 FROM DUAL UNION ALL
   SELECT 4 C1, 'TEST4' C2 FROM DUAL 
/

Tom Kyte
July 28, 2010 - 11:15 am UTC

your key is in fact the entire row - that they have 'duplicates' for whatever reason is goofy - but still - you have a key - the entire row. That is the basis for your comparison.

and your sync "syncs" such that T2 ends up looking like T1 - it would be easier just to truncate + insert /*+ append */, you end up moving the one table to the other for a complete comparison anyway (you end up doing a lot of work to figure out "have to replace one row", you could - probably should - just replace the rows period). Especially since this is happening over the network (I would have to assume, if not, I cannot fathom why T1 and T2 would be in the same database) - most of the work will be copying T1 over to T2's database for the comparison.


If I was doing this and needed total online availability - I would use partitioning - insert into partition 1 one month and truncate partition 2 (hence committing the insert). Then next month insert into partition 2 and truncate partition 1 - and so on. You would always have a perfect point in time copy of the data (hopefully during your resync - t1 and t2 are locked - otherwise your sync is easily broken).


More context.

Kevin, July 28, 2010 - 5:46 pm UTC

Tom,

You're correct, the tables are separated over the network, and much of the synchronization work is simply pulling the data across.

A wider context may help explain my design: we are running a single-directional Streams process from source to [a read-only] destination for many large tables.

On occasion, the source capture process has a hiccup or otherwise unrecoverable issue which necessitates that we re-create the capture as of a specific "fresh" SCN. When this happens, we must rapidly resync all destination tables to match the source tables as of that SCN before we can turn the Streams process back on. This must be done with zero downtime on the destination tables; and quickly, as each passing second contributes to our reportable average latency (which we seek to minimize).

The partitioned table idea is an interesting one; however, it would necessitate adding a partitioning column on the destination table which doesn't exist in the source table; thus we'd have to customize the Streams process to correctly assign a partitioning-column value as the tables' data was streamed. The fact that many tables already have partitioning schemes further complicates the idea, as we'd have to convert these partitioning schemes into sub-partitioning on a one-by-one basis. Furthermore, the tables are heavily indexed, making the insertion / indexing of all rows of the tables a slower overall operation than the PL/SQL comparison method (see next para).

For the record, the main contending design to the one we're using was to simply pull all of the source data into a temp table on the destination side, build all indexes, constraints, grants, etc., and then use table RENAME operations to swap the data out. Extensive benchmarking showed that the index rebuild operations, even maximally parallelised, tended to take about 2 - 3 times longer than the aforementioned compare-tables-and-sync-with-PL/SQL logic (slower because the tables are heavily indexed and the PL/SQL-syncing is fast as the tables are > 99.95% in sync). Plus, the RENAME DDL left me making the uncomfortable statement that a nonzero timeline exists where the destination table wouldn't exist and code referring to it might get errors accordingly.

I didn't really mean to go to this level of detail, but just wanted to say what a tremendous value it is to present a design to critical analysis - thanks for your time, input, and criticism!
Tom Kyte
July 29, 2010 - 9:34 am UTC

I didn't really mean to go to this level of detail, but just wanted to say what
a tremendous value it is to present a design to critical analysis - thanks for
your time, input, and criticism!


but that is why I always ask "why" :)

you see, without this level of detail - it is impossible to judge whether something is the 'best way'.


Say someone came up to you and said "I need to synchronize two tables, how do I do it".

In my opinion, the only right answer is either:

a) why?
b) it depends.

I usually lead with "why" - because many times the 'best way' turns out to be 'do not do it, do this instead'.

But when the why leads to "ok, you need to do it", then we get into the "it depends".

The answer might be "truncate + insert /*+ APPEND */"

The answer might be "add partitioning column, insert /*+ append */ into empty partition, truncate old"

The answer might be "use a materialized view"

The answer might be your approach

And so on - it all depends - we need that context to be sure. Your bit about:

... as the tables are > 99.95% in sync). ...


adds a ton of context. In your case, it looks like you've chosen the 'best approach' given your context - after bench marking and prototyping the various implementations you thought might work.

Well done :)

re: more on keyless tables by Kevin

Stew Ashton, July 30, 2010 - 12:17 pm UTC


When comparing and synchronizing tables, there are actually three cases:

1) There is a primary key plus other columns not referenced in the index;
2) There is a primary key that contains all the columns in the table;
3) There is no primary key.

When comparing, Tom's method works with all three; the "full join" method works with cases 1) and 2).

When synchronizing, case 1) allows a one-statement compare and MERGE (in 10G and above). See http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1395074000346990883#1405766500346136828
Case 2) could be done with a temporary table and two statements:
1) a multi-table INSERT:
a) insert directly into the target table
b) insert into the temporary table the rows to be deleted
2) a DELETE statement based on the temporary table.

I imagine case 3) could be done in two statements, although I wouldn't want to maintain either one :)


compare values in 2 tables

Branka, August 06, 2010 - 11:43 am UTC

I need to compare 2 tables and create a report.
Both tables have the same PK, but some of the PK values from the first table may not be in the second one, and some of the PK values from the second table may not be in the first one. For every PK value that is the same in both tables, I need to compare the other fields and find out whether they are the same or not. If they are not the same, I need to list them.
Lets say that ID is PK in both tables:

Table tab1
pk col1 col2
1 a1 a2
2 b1 b2
3 c11 c2
5 e1 e2
Table tab2
pk col1 col2
2 b1 b2
3 c1 c2
4 d1 d2


The report that I need

Your query would return all the data that I need, but from that output I cannot form the report that I need. Do you have a solution for the output that I am looking for?

Output that I would get using your query would look like this:

ID COL1 COL2 CNT1 CNT2
---------------------- ---------- ---------- ---------------------- ----------------------
1 a1 a2 1 0
5 e1 e2 1 0
3 c1 c2 0 1
4 d1 d2 0 1
3 c11 c2 1 0


What I need is :

text id col1 col2
-----------------------------------------------------------
Exist in table tab1 but not in tab2 1
Exist in table tab1 but not in tab2 5
Exist in table tab2 but not in tab1 4
Exist in both but different in tab1 3 c11 c2
Exist in both but different in tab2 3 c1 c2





SELECT id,
col1,
col2,
COUNT(src1) CNT1,
COUNT(src2) CNT2
FROM
( SELECT tab1.*, 1 src1, to_number(NULL) src2 FROM tab1

UNION ALL

SELECT tab2.*, to_number(NULL) src1, 2 src2 FROM tab2
)
GROUP BY id,
col1,
col2
HAVING COUNT(src1) <> COUNT(src2)







CREATE TABLE DQADMIN.TAB1
( ID NUMBER,
COL1 VARCHAR2(10 BYTE),
COL2 VARCHAR2(10 BYTE)
)

CREATE TABLE DQADMIN.TAB2
( ID NUMBER,
COL1 VARCHAR2(10 BYTE),
COL2 VARCHAR2(10 BYTE)
)

REM INSERTING into TAB1
Insert into TAB1 (ID,COL1,COL2) values (1,'a1','a2');
Insert into TAB1 (ID,COL1,COL2) values (2,'b1','b2');
Insert into TAB1 (ID,COL1,COL2) values (3,'c11','c2');
Insert into TAB1 (ID,COL1,COL2) values (5,'e1','e2');

REM INSERTING into TAB2
Insert into TAB2 (ID,COL1,COL2) values (2,'b1','b2');
Insert into TAB2 (ID,COL1,COL2) values (3,'c1','c2');
Insert into TAB2 (ID,COL1,COL2) values (4,'d1','d2');




Tom Kyte
August 09, 2010 - 12:24 pm UTC

What I need is :

text            id col1 col2
-----------------------------------------------------------
Exist in table tab1 but not in tab2  1
Exist in table tab1 but not in tab2  5
Exist in table tab2 but not in tab1  4
Exist in both but different in tab1  3  c11  c2
Exist in both but different in tab2  3  c1  c2 



that output does not make sense to me, explain it please. I don't get it.

comapre data in 2 tables

Branka, August 10, 2010 - 2:25 pm UTC

I need to make a report that first lists all records (by PK) that exist in tab1 but not in tab2, then all records that exist in tab2 but not in tab1, then all records that exist in both tables (same PK) but have some column different. Those last records I need to list twice: first how the differing records look in tab1, and then how they look in tab2.
Your select brings back all those records, but I do not see a way to know whether a record is only in tab1 and not in tab2, or is in both tables with different data.
Tom Kyte
August 18, 2010 - 11:32 pm UTC


full outer join would do that - it doesn't look like your output at all but only because your sample output and your supplied text that was to explain the output do not match in any way I can see....

ops$tkyte%ORA10GR2> select a.*, b.*,
  2         case when t1_id is null then 'in t2, not in t1'
  3                  when t2_id is null then 'in t1, not in t2'
  4                          else 'in both, but different data'
  5                  end msg
  6    from (select id t1_id, col1 t1_col1, col2 t1_col2 from tab1) a
  7         full outer join
  8         (select id t2_id, col1 t2_col1, col2 t2_col2 from tab2) b
  9             on (t1_id = t2_id)
 10   where (t1_id is null or t2_id is null or
 11          decode( t1_col1, t2_col1, 1, 0 ) = 0 or
 12          decode( t1_col2, t2_col2, 1, 0 ) = 0)
 13  /

     T1_ID T1_COL1    T1_COL2         T2_ID T2_COL1    T2_COL2    MSG
---------- ---------- ---------- ---------- ---------- ---------- ---------------------------
         3 c11        c2                  3 c1         c2         in both, but different data
         3 c11        c2                  3 c1         c2         in both, but different data
         3 c11        c2                  3 c1         c2         in both, but different data
         3 c11        c2                  3 c1         c2         in both, but different data
         3 c11        c2                  3 c1         c2         in both, but different data
         3 c11        c2                  3 c1         c2         in both, but different data
         5 e1         e2                                          in t1, not in t2
         1 a1         a2                                          in t1, not in t2

8 rows selected.

compare value in 2 tables

Branka, August 10, 2010 - 2:58 pm UTC

There was a formatting problem:

text id col1 col2
-----------------------------------------------------------
Exist in table tab1 but not in tab2 1
Exist in table tab1 but not in tab2 5
Exist in table tab2 but not in tab1 4
Exist in both but different in tab1 3 c11 c2
Exist in both but different in tab2 3 c1 c2

comapre data in 2 tables

Branka, September 20, 2010 - 2:53 pm UTC

Great and elegant solution. Thanks

Comparing two Tables

A reader, June 18, 2012 - 8:38 am UTC

If I need to compare two tables (same database) and create a delta table with insert, update, and delete actions, is the following SQL okay as far as performance is concerned, or do you have a better suggestion or a link?
Database version is Oracle Database 11g Enterprise Edition Release 11.2.0.2.0

WITH old_data
     AS (SELECT 1 AS id, 'A' AS dta FROM DUAL
         UNION ALL
         SELECT 2, 'B' FROM DUAL
         UNION ALL
         SELECT 3, 'C' FROM DUAL),
     new_data
     AS (SELECT 1 AS id, 'A' AS dta FROM DUAL
         UNION ALL
         SELECT 3, 'X' FROM DUAL
         UNION ALL
         SELECT 4, 'Y' FROM DUAL),
     ins_upd
     AS (SELECT * FROM new_data
         MINUS
         SELECT * FROM old_data),
     del_upd
     AS (SELECT * FROM old_data
         MINUS
         SELECT * FROM new_data),
     upd
     AS (SELECT id FROM ins_upd
         INTERSECT
         SELECT id FROM del_upd),
     ins
     AS (SELECT id FROM ins_upd
         MINUS
         SELECT id FROM upd),
     del
     AS (SELECT id FROM del_upd
         MINUS
         SELECT id FROM upd)
--
SELECT 'Inserted' AS action,
       NULL AS old_id,
       NULL AS old_dta,
       new_data.id AS new_id,
       new_data.dta AS new_dta
  FROM new_data JOIN ins ON (ins.id = new_data.id)
UNION ALL
SELECT 'Updated',
       old_data.id,
       old_data.dta,
       new_data.id,
       new_data.dta
  FROM old_data
       JOIN new_data
          ON (old_data.id = new_data.id)
       JOIN upd
          ON (upd.id = new_data.id)
UNION ALL
SELECT 'Deleted',
       old_data.id AS old_id,
       old_data.dta AS old_dta,
       NULL AS new_id,
       NULL AS new_dta
  FROM old_data JOIN del ON (del.id = old_data.id)


Thank you
Tom Kyte
June 18, 2012 - 9:15 am UTC

it depends, if you have a lot of rows to insert/update/delete

drop table some_table;
create table some_table as select * from other_table;


or, if there is a modest number of rows, have some_table be a materialized view of other table and just refresh it.


Comparing two Tables

A reader, June 18, 2012 - 9:23 am UTC

I am sorry if I was not that clear earlier. The requirement is to compare the old and new tables and create a flat file with only the changes (insert, update, and delete), not a copy of the whole new table's data. The old and new tables can have 5 million rows.
Tom Kyte
June 18, 2012 - 9:50 am UTC

here:

http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

I wrote about comparing the contents of two tables.

Comparing the Contents of Two Tables

Using that technique - you would get output such as:

SQL> select c1, c2, c3, 
  2         count(src1) CNT1, 
  3          count(src2) CNT2
  4     from 
  5   ( select a.*, 
  6          1 src1, 
  7          to_number(null) src2 
  8       from a
  9      union all
  10    select b.*, 
  11         to_number(null) src1, 
  12         2 src2 
  13      from b
  14   )
  15  group by c1,c2,c3
  16 having count(src1) <> count(src2)
  17 /
 
  C1   C2    C3   CNT1    CNT2
 ---   --    --   ----    ----
   2    x     y      0       1
   2   xx     y      1       0
   3    x     y      1       0
   3    x    yy      0       1


(my tables had columns c1, c2, c3)


If a row doesn't appear in the output - then that record is the same in both tables.


Now, since you compare EVER SINGLE COLUMN - I don't see how you could ever generate an UPDATE of an existing row - there would only be deletes and inserts. You don't mention the fact of a primary key at all and without one, updates are just not possible

so, by definition and by your example, there are only inserts and deletes...

Using the above you can easily see what to delete - any row where cnt1=0 and cnt2=1 (only exists in second table) and then what to insert - any row where cnt1=1 and cnt2=0
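Putting those two rules together, a hedged sketch that labels each difference row (assuming tables A and B with columns C1, C2, C3 as in the example above, and no duplicate rows within a table):

```sql
-- Sketch only: A is the table whose rows we want to end up with,
-- so rows only in A become INSERTs and rows only in B become DELETEs,
-- per the cnt1/cnt2 rules described above.
select c1, c2, c3,
       case when cnt1 = 1 and cnt2 = 0 then 'INSERT'  -- only in A
            when cnt1 = 0 and cnt2 = 1 then 'DELETE'  -- only in B
       end action
  from ( select c1, c2, c3,
                count(src1) cnt1,
                count(src2) cnt2
           from ( select a.*, 1 src1, to_number(null) src2 from a
                  union all
                  select b.*, to_number(null) src1, 2 src2 from b )
          group by c1, c2, c3
         having count(src1) <> count(src2) );
```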

Comparing two tables

A reader, June 18, 2012 - 11:07 am UTC

Thank you very much for the link.

For my requirement, C1 is the primary key, and the flat file with the differences should not contain duplicate C1 lines.
It should also indicate whether each record is an insert, update or delete record.
An update record is a row we already sent where one or more columns now have different values; I don't have to indicate which columns changed, just send the new values.

Can you please review the SQL at the end and comment on the quality of the SQL?

The following is the script to generate sample records.

drop table sk_a;
drop table sk_b;

create table sk_a
(C1   NUMBER(10) primary key,
 C2   VARCHAR2(10),
 C3   VARCHAR2(10));
 
 create table sk_b
(C1   NUMBER(10) primary key,
 C2   VARCHAR2(10),
 C3   VARCHAR2(10));
 
insert into sk_a
values
(1, 'x', 'y');

insert into sk_a
values
(2, 'xx', 'y');

insert into sk_a
values
(3, 'x', 'y');

insert into sk_a
values
(4, 'x', 'y');


insert into sk_b
values
(1, 'x', 'y');

insert into sk_b
values
(2, 'x', 'y');

insert into sk_b
values
(3, 'x', 'yy');

insert into sk_b
values
(0, 'x', 'y');

commit;

SELECT *
FROM sk_a;

SELECT *
FROM sk_b;




SELECT C1,
       NEW_C2,
       OLD_C2,
       NEW_C3,
       OLD_C3,
       (CASE WHEN NEW_C1 IS NULL THEN
                'DELETE'
             WHEN OLD_C1 IS NULL THEN
                'INSERT'
             ELSE
                'UPDATE'
         END) RECORD_TYPE
  FROM (SELECT DISTINCT
               C1,
               FIRST_VALUE (DECODE (SRC1, 1, C1) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  NEW_C1,
               FIRST_VALUE (DECODE (SRC2, 2, C1) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  OLD_C1,
               FIRST_VALUE (DECODE (SRC1, 1, C2) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  NEW_C2,
               FIRST_VALUE (DECODE (SRC2, 2, C2) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  OLD_C2,
               FIRST_VALUE (DECODE (SRC1, 1, C3) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  NEW_C3,
               FIRST_VALUE (DECODE (SRC2, 2, C3) IGNORE NULLS)
                  OVER (PARTITION BY C1)
                  OLD_C3
          FROM (SELECT A.C1,
                       A.C2,
                       A.C3,
                       1 SRC1,
                       TO_NUMBER (NULL) SRC2
                  FROM SK_A A
                UNION ALL
                SELECT B.C1,
                       B.C2,
                       B.C3,
                       TO_NUMBER (NULL) SRC1,
                       2 SRC2
                  FROM SK_B B))
 WHERE (   NVL (NEW_C1, -1000) != NVL (OLD_C1,-1000)
        OR NVL (NEW_C2, '~') != NVL (OLD_C2, '~')
        OR NVL (NEW_C3, '~') != NVL (OLD_C3, '~'));



Tom Kyte
June 18, 2012 - 11:45 am UTC

ops$tkyte%ORA11GR2> select coalesce( a.c1, b.c1 ) "c1",
  2         a.c2 new_c2, a.c3 new_c3,
  3             b.c2 old_c2, b.c3 old_c3,
  4             case when a.c1 is null then 'delete'
  5                  when b.c1 is null then 'insert'
  6                          else 'update'
  7              end what
  8    from sk_a a full outer join sk_b b
  9      on (a.c1 = b.c1)
 10   where (a.c1 is null or b.c1 is null or decode(a.c2,b.c2,1,0)=0 or decode(a.c3,b.c3,1,0)=0)
 11   order by 1
 12  /

        c1 NEW_C2     NEW_C3     OLD_C2     OLD_C3     WHAT
---------- ---------- ---------- ---------- ---------- ---------------
         0                       x          y          delete
         2 xx         y          x          y          update
         3 x          y          x          yy         update
         4 x          y                                insert


Comparison of two tables

A reader, June 18, 2012 - 11:49 am UTC

Thanks a lot

Hari, September 22, 2016 - 12:43 pm UTC

Hi Tom,
Thanks for your Info provided...

I executed my query with the same MINUS & UNION ALL operators, but Oracle returned results only for the second part of the MINUS
eg:
( select * from dept2
minus
select * from dept1 )
)

Can you guide me? I got stuck here.
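The likely cause is operator precedence: in Oracle, MINUS and UNION ALL have equal precedence and are evaluated left to right, so without parentheses the statement parses as ((dept1 MINUS dept2) UNION ALL dept2) MINUS dept1, which reduces to just dept2 MINUS dept1 - exactly the symptom described. A sketch of the corrected form:

```sql
-- Each MINUS is parenthesized so neither branch swallows the other.
select count(*)
  from (
        ( select * from dept1
          minus
          select * from dept2 )
        union all
        ( select * from dept2
          minus
          select * from dept1 )
       );
```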

Reader, September 22, 2016 - 12:57 pm UTC

Hi Tom,

I am new to Oracle... I have a scenario where I need to compare two tables.


Tab A ------------- Test Env
id  name  e_id  country  salary  dept
1   abc   120   ind      1000    Finance
2   cde   101   aus      1500    Hr
3   edf   105   US       4555    Marketing
4   fgh   134   uk       15132   Analytics
5   ijk   532   eng      24154   Sales
6   lmh   545   nz       52455   Design

Tab B ------------- Prod Env
id  name  e_id  country  salary  dept
1   abc   120   ind      1000    Finance
2   cde   101   aus      1500    Hr
3   edf   188   US       4555    Marketing
4   fgh   134   uk       15132   Analytics
5   ijk   532   eng      24154   Sales
6   lmh   545   nz       52455   Design

I tried the above query (MINUS & UNION ALL) but it did not return the correct values. The difference in my 2nd table (Tab B) is
3 edf 188 US 4555 Marketing
Can you please share a clear idea of how to get the values?

I am handling 300+ columns here.
It would be great to get an answer for this query.


September 23, 2016 - 4:41 am UTC


SQL> drop table table1;

Table dropped.

SQL> drop table table2;

Table dropped.

SQL>
SQL> CREATE TABLE Table1
  2      (id int, name varchar(3), e_id int, country varchar(3), salary int, dept varchar(9))
  3  ;

Table created.

SQL>
SQL>
SQL> INSERT INTO Table1 values    (1, 'abc', 120, 'ind', 1000, 'Finance');

1 row created.

SQL> INSERT INTO Table1 values        (2, 'cde', 101, 'aus', 1500, 'Hr');

1 row created.

SQL> INSERT INTO Table1 values        (3, 'edf', 105, 'US', 4555, 'Marketing');

1 row created.

SQL> INSERT INTO Table1 values        (4, 'fgh', 134, 'uk', 15132, 'Analytics');

1 row created.

SQL> INSERT INTO Table1 values        (5, 'ijk', 532, 'eng', 24154, 'Sales');

1 row created.

SQL> INSERT INTO Table1 values        (6, 'lmh', 545, 'nz', 52455, 'Design');

1 row created.

SQL>
SQL>
SQL>
SQL>
SQL> CREATE TABLE Table2
  2      (id int, name varchar(3), e_id int, country varchar(3), salary int, dept varchar(9))
  3  ;

Table created.

SQL>
SQL>
SQL> INSERT INTO Table2 VALUES    (1, 'abc', 120, 'ind', 1000, 'Finance');

1 row created.

SQL> INSERT INTO Table2 VALUES    (2, 'cde', 101, 'aus', 1500, 'Hr');

1 row created.

SQL> INSERT INTO Table2 VALUES    (3, 'edf', 188, 'US', 4555, 'Marketing');

1 row created.

SQL> INSERT INTO Table2 VALUES    (4, 'fgh', 134, 'uk', 15132, 'Analytics');

1 row created.

SQL> INSERT INTO Table2 VALUES    (5, 'ijk', 532, 'eng', 24154, 'Sales');

1 row created.

SQL> INSERT INTO Table2 VALUES    (6, 'lmh', 545, 'nz', 52455, 'Design');

1 row created.

SQL>
SQL> select * from table1 minus select * from table2
  2  union all
  3  select * from table2 minus select * from table1;

        ID NAM       E_ID COU     SALARY DEPT
---------- --- ---------- --- ---------- ---------
         3 edf        188 US        4555 Marketing

SQL>
SQL>


Alternate way to find the diff

Reader, September 26, 2016 - 6:15 am UTC

Tom,

Thanks for the query provided. Is there an alternate way/query to speed up the performance for a huge number of columns and records in Oracle?
Connor McDonald
September 27, 2016 - 12:05 am UTC

you could explore using something like ora_hash to see if that can short-circuit things a little better, eg

select *
from t1 full outer join t2 on (t1.pk = t2.pk)
where ora_hash(t1.col1||t1.col2....) != ora_hash(t2.col1||t2.col2....)


Speeding up the query

Stew Ashton, September 27, 2017 - 6:17 am UTC

Earlier in this thread, Tom recommended a solution found here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923
This solution scans each table just once. Using the data just above me:
drop table table1 purge;
drop table table2 purge;
CREATE TABLE Table1
(id int, name varchar(3), e_id int, country varchar(3), salary int, dept varchar(9))
;
INSERT INTO Table1 values    (1, 'abc', 120, 'ind', 1000, 'Finance');
INSERT INTO Table1 values        (2, 'cde', 101, 'aus', 1500, 'Hr');
INSERT INTO Table1 values        (3, 'edf', 105, 'US', 4555, 'Marketing');
INSERT INTO Table1 values        (4, 'fgh', 134, 'uk', 15132, 'Analytics');
INSERT INTO Table1 values        (5, 'ijk', 532, 'eng', 24154, 'Sales');
INSERT INTO Table1 values        (6, 'lmh', 545, 'nz', 52455, 'Design');
CREATE TABLE Table2
(id int, name varchar(3), e_id int, country varchar(3), salary int, dept varchar(9))
;
INSERT INTO Table2 VALUES    (1, 'abc', 120, 'ind', 1000, 'Finance');
INSERT INTO Table2 VALUES    (2, 'cde', 101, 'aus', 1500, 'Hr');
INSERT INTO Table2 VALUES    (3, 'edf', 188, 'US', 4555, 'Marketing');
INSERT INTO Table2 VALUES    (4, 'fgh', 134, 'uk', 15132, 'Analytics');
INSERT INTO Table2 VALUES    (5, 'ijk', 532, 'eng', 24154, 'Sales');
INSERT INTO Table2 VALUES    (6, 'lmh', 545, 'nz', 52455, 'Design');
commit;

select
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT",
sum(OLD_CNT) OLD_CNT, sum(NEW_CNT) NEW_CNT
FROM (
  select
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT",
  1 OLD_CNT, 0 NEW_CNT
  from TABLE1 O
  union all
  select
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT",
  0 OLD_CNT, 1 NEW_CNT
  from TABLE2 N
)
group by
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT"
having sum(OLD_CNT) != sum(NEW_CNT)
order by 1, NEW_CNT;

        ID NAM       E_ID COU     SALARY DEPT         OLD_CNT    NEW_CNT
---------- --- ---------- --- ---------- --------- ---------- ----------
         3 edf        105 US        4555 Marketing          1          0
         3 edf        188 US        4555 Marketing          0          1

Assuming the existence of a primary key, starting with database version 12c, we can get output in CDC format: I for insert, D for delete, U for update (and O for the old version of the updated row):

select Z##OP as OP,
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT", Z##RID
from (
  select rowid Z##RID, 
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT"
  from TABLE1 o
  union all
  select null, 
  "ID", "NAME", "E_ID", "COUNTRY", "SALARY", "DEPT"
  from TABLE2 n
)
match_recognize (
  partition by 
      "ID"
  order by Z##RID
  measures translate(classifier(),'N','U') Z##OP
  all rows per match
  pattern( ( ^ D $ ) |  ( ^ I $ ) |  ( ^ O N $ ) )
  define D as Z##RID is not null,
    I as Z##RID is null,
    O as Z##RID is not null,
    N as Z##RID is null and 1 in (
  decode(O."E_ID",N."E_ID",0,1), decode(O."COUNTRY",N."COUNTRY",0,1),
  decode(O."DEPT",N."DEPT",0,1), decode(O."NAME",N."NAME",0,1),
  decode(O."SALARY",N."SALARY",0,1)
  )
);

OP         ID NAM       E_ID COU     SALARY DEPT      Z##RID            
-- ---------- --- ---------- --- ---------- --------- ------------------
O           3 edf        105 US        4555 Marketing AAAVdaAARAAAAWsAAC
U           3 edf        188 US        4555 Marketing

The SQL for the first solution can be generated from
https://stewashton.wordpress.com/2015/01/21/stew_compare_sync-introducing-the-package/

Best regards, Stew Ashton

Comparison

Talha Imran, June 13, 2019 - 3:32 pm UTC

Hi Tom,
I want to compare a user-defined entry with a column of a specific table and I am confused about how to do this. Can you please explain this with an example?
Chris Saxon
June 13, 2019 - 3:36 pm UTC

I'm confused about what exactly you're trying to do!

Can you please give an example?

Searching

Talha Imran, June 13, 2019 - 6:27 pm UTC

Tom, I have a table in my database and I want to search for a specific person by his ID-card number: I enter an ID-card number, it is searched for in that specific table, and if the ID-card number matches, the person's data is shown to me.

Thanks.
Best Regards,
Talha Imran.
Connor McDonald
June 14, 2019 - 2:59 am UTC

HOW_TO_ASK_GOOD_QUESTION

VERY large tables - comparison

Andre, November 08, 2020 - 9:37 pm UTC

Dear Tom et al

I have looked at many cases that are simple to execute for not-too-large tables.
In my case I have a very large table (over 1 billion rows) which is a replicated (time-consistent) clone.
When I tried to execute a simple count(*) on
select * from clone_t0 minus select * from clone_t1
it ran for two hours and finally aborted with
ORA-1652: unable to extend temp
NOTE: TEMP size is 180 GB
I also tried a trick with GROUP BY on the table PK - same error

So I resorted to a PL/SQL procedure which compares sets of rows (100,000 each) with a BETWEEN clause specifying low and high values for the PK range in bind variables.
Native SQL gets executed in a loop 1 .. 10000 and I can see its progress by examining the currently active SQL from v$sqlarea, with a sql_id corresponding to that of the ACTIVE session in v$session.

Projected execution is approx 8 hours.

I am sharing this with you - hoping for your constructive comment suggesting possibly a better approach..?

Thank you
Best wishes
Andre (Switzerland)
Connor McDonald
November 09, 2020 - 5:14 am UTC

My experience with large table operations is that the temporary workareas (hashing and sorting) are the main bottleneck. Generally we can *read* tables very quickly, but the temp space costs are the killer.

With that in mind, carving the task up into chunks which are as large as possible but *just* small enough to fit in memory gives a pretty good result...it just needs a little experimenting to find the sweet spot.

eg I'll start with two largish tables (large for my laptop!)

SQL> create table t1 
  2  as
  3  select
  4   rownum pk
  5  ,owner
  6  ,object_name
  7  ,subobject_name
  8  ,object_id
  9  ,data_object_id
 10  ,object_type
 11  ,created
 12  ,last_ddl_time
 13  from dba_objects,
 14    ( select 1 from dual connect by level <= 2000 );

Table created.

SQL>
SQL> create table t2 
  2  as select
  3   pk
  4  ,owner
  5  ,object_name
  6  ,subobject_name
  7  ,case when rownum != 57123123 then object_id else 0 end object_id
  8  ,data_object_id
  9  ,case when rownum != 123123 then object_type else 'x' end object_type
 10  ,created
 11  ,last_ddl_time
 12  from t1
 13  where mod(pk,93843212)!= 0;

Table created.

SQL> select num_rows, blocks
  2  from   user_tables
  3  where  table_name in ('T1','T2');

  NUM_ROWS     BLOCKS
---------- ----------
 174538000    1799514
 174537999    1799514


So ~175m rows, and there is some small discrepancy in both number of rows and values so that's what I would like to find

I use "decode" as a comparison because it handles nulls nicely and a full outer join to pick up missing entries in either table. So my SQL looks like this:

SQL> select t1.pk, t2.pk
  2  from   t1 full outer join  t2
  3  on (t1.pk = t2.pk )
  4  where t1.pk is null
  5   or t2.pk is null
  6   or decode(t1.owner,t2.owner,1,0)=0
  7   or decode(t1.object_name,t2.object_name,1,0)=0
  8   or decode(t1.subobject_name,t2.subobject_name,1,0)=0
  9   or decode(t1.object_id,t2.object_id,1,0)=0
 10   or decode(t1.data_object_id,t2.data_object_id,1,0)=0
 11   or decode(t1.object_type,t2.object_type,1,0)=0
 12   or decode(t1.created,t2.created,1,0)=0
 13   or decode(t1.last_ddl_time,t2.last_ddl_time,1,0)=0


Now I'll run that for my 175m rows....

SQL> set timing on
SQL> select t1.pk, t2.pk
  2  from   t1 full outer join  t2
  3  on (t1.pk = t2.pk )
  4  where t1.pk is null
  5   or t2.pk is null
  6   or decode(t1.owner,t2.owner,1,0)=0
  7   or decode(t1.object_name,t2.object_name,1,0)=0
  8   or decode(t1.subobject_name,t2.subobject_name,1,0)=0
  9   or decode(t1.object_id,t2.object_id,1,0)=0
 10   or decode(t1.data_object_id,t2.data_object_id,1,0)=0
 11   or decode(t1.object_type,t2.object_type,1,0)=0
 12   or decode(t1.created,t2.created,1,0)=0
 13   or decode(t1.last_ddl_time,t2.last_ddl_time,1,0)=0
 14  /
from   t1 full outer join  t2
       *
ERROR at line 2:
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP


Elapsed: 00:16:44.57



Splat! And 16mins of execution time.

So I want to find the right subset of rows and also dedicate a nice chunk of RAM to my session for sorting etc.

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1024000000;
alter session set sort_area_retained_size = 1024000000;


So first I'll try with just 1million rows

SQL> with
  2  t1x as ( select * from t1 where pk between 1 and 1000000 ),
  3  t2x as ( select * from t2 where pk between 1 and 1000000 )
  4  select t1x.pk, t2x.pk
  5  from   t1x full outer join  t2x
  6  on (t1x.pk = t2x.pk )
  7  where t1x.pk is null
  8   or t2x.pk is null
  9   or decode(t1x.owner,t2x.owner,1,0)=0
 10   or decode(t1x.object_name,t2x.object_name,1,0)=0
 11   or decode(t1x.subobject_name,t2x.subobject_name,1,0)=0
 12   or decode(t1x.object_id,t2x.object_id,1,0)=0
 13   or decode(t1x.data_object_id,t2x.data_object_id,1,0)=0
 14   or decode(t1x.object_type,t2x.object_type,1,0)=0
 15   or decode(t1x.created,t2x.created,1,0)=0
 16   or decode(t1x.last_ddl_time,t2x.last_ddl_time,1,0)=0
 17  /

        PK         PK
---------- ----------
    123123     123123

1 row selected.

Elapsed: 00:00:52.99

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    3594557  consistent gets
    3594534  physical reads
          0  redo size
        623  bytes sent via SQL*Net to client
       1001  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed 


Notice no sorts to disk, and it finished in 52 seconds. But (in my case) that would still be 175 x 50 seconds...a long time.

But now I can explore how far I can push that subset without hitting disk...

SQL> with
  2  t1x as ( select * from t1 where pk between 1 and 3000000 ),
  3  t2x as ( select * from t2 where pk between 1 and 3000000 )
  4  select t1x.pk, t2x.pk
  5  from   t1x full outer join  t2x
  6  on (t1x.pk = t2x.pk )
  7  where t1x.pk is null
  8   or t2x.pk is null
  9   or decode(t1x.owner,t2x.owner,1,0)=0
 10   or decode(t1x.object_name,t2x.object_name,1,0)=0
 11   or decode(t1x.subobject_name,t2x.subobject_name,1,0)=0
 12   or decode(t1x.object_id,t2x.object_id,1,0)=0
 13   or decode(t1x.data_object_id,t2x.data_object_id,1,0)=0
 14   or decode(t1x.object_type,t2x.object_type,1,0)=0
 15   or decode(t1x.created,t2x.created,1,0)=0
 16   or decode(t1x.last_ddl_time,t2x.last_ddl_time,1,0)=0
 17  /

        PK         PK
---------- ----------
    123123     123123

1 row selected.

Elapsed: 00:00:54.13

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    3594557  consistent gets
    3594534  physical reads
          0  redo size
        623  bytes sent via SQL*Net to client
       1001  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


So 3 million is OK, so I try higher

SQL> with
  2  t1x as ( select * from t1 where pk between 1 and 10000000 ),
  3  t2x as ( select * from t2 where pk between 1 and 10000000 )
  4  select t1x.pk, t2x.pk
  5  from   t1x full outer join  t2x
  6  on (t1x.pk = t2x.pk )
  7  where t1x.pk is null
  8   or t2x.pk is null
  9   or decode(t1x.owner,t2x.owner,1,0)=0
 10   or decode(t1x.object_name,t2x.object_name,1,0)=0
 11   or decode(t1x.subobject_name,t2x.subobject_name,1,0)=0
 12   or decode(t1x.object_id,t2x.object_id,1,0)=0
 13   or decode(t1x.data_object_id,t2x.data_object_id,1,0)=0
 14   or decode(t1x.object_type,t2x.object_type,1,0)=0
 15   or decode(t1x.created,t2x.created,1,0)=0
 16   or decode(t1x.last_ddl_time,t2x.last_ddl_time,1,0)=0
 17  /

        PK         PK
---------- ----------
    123123     123123

1 row selected.

Elapsed: 00:00:58.15

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    3594557  consistent gets
    3594534  physical reads
          0  redo size
        623  bytes sent via SQL*Net to client
       1003  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> /

        PK         PK
---------- ----------
    123123     123123

1 row selected.

Elapsed: 00:00:58.33


10 million is OK, so now I'm down to 17 executions of ~60 seconds (i.e. 17 mins). Let us see how far we can push it.

SQL> with
  2  t1x as ( select * from t1 where pk between 1 and 20000000 ),
  3  t2x as ( select * from t2 where pk between 1 and 20000000 )
  4  select t1x.pk, t2x.pk
  5  from   t1x full outer join  t2x
  6  on (t1x.pk = t2x.pk )
  7  where t1x.pk is null
  8   or t2x.pk is null
  9   or decode(t1x.owner,t2x.owner,1,0)=0
 10   or decode(t1x.object_name,t2x.object_name,1,0)=0
 11   or decode(t1x.subobject_name,t2x.subobject_name,1,0)=0
 12   or decode(t1x.object_id,t2x.object_id,1,0)=0
 13   or decode(t1x.data_object_id,t2x.data_object_id,1,0)=0
 14   or decode(t1x.object_type,t2x.object_type,1,0)=0
 15   or decode(t1x.created,t2x.created,1,0)=0
 16   or decode(t1x.last_ddl_time,t2x.last_ddl_time,1,0)=0
 17  /
with
*
ERROR at line 1:
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT


So 20 million is too many for my laptop :-) but typically you would see a big performance dip, which means you're staging to disk. SQL Monitor would also show this in the "Temp" column.

But you can see - we're finding the "sweet spot" for memory. Once we have that, we can look at using (say) DBMS_PARALLEL_EXECUTE to run these in parallel (assuming your server has plenty of RAM).

Notice that I'm using a full scan in all cases. For large volumes of data, just because we carve up by primary key does NOT mean we should do a range scan.
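Connor mentions DBMS_PARALLEL_EXECUTE above; a hedged sketch of how the chunked comparison might be wired up with it (the task name, the DIFF_RESULTS table and the column list are all illustrative, and the chunk size is whatever "sweet spot" the experiments above found):

```sql
-- Sketch only: assumes a numeric primary key PK on T1/T2 and a
-- pre-created DIFF_RESULTS table to capture mismatched keys.
begin
  dbms_parallel_execute.create_task( task_name => 'compare_t1_t2' );

  -- carve T1's PK range into ~10-million-row chunks
  dbms_parallel_execute.create_chunks_by_number_col(
    task_name    => 'compare_t1_t2',
    table_owner  => user,
    table_name   => 'T1',
    table_column => 'PK',
    chunk_size   => 10000000 );

  -- the framework binds :start_id / :end_id per chunk into this SQL
  dbms_parallel_execute.run_task(
    task_name      => 'compare_t1_t2',
    sql_stmt       => q'[insert into diff_results
        select t1x.pk, t2x.pk
          from ( select * from t1 where pk between :start_id and :end_id ) t1x
          full outer join
               ( select * from t2 where pk between :start_id and :end_id ) t2x
            on ( t1x.pk = t2x.pk )
         where t1x.pk is null or t2x.pk is null
            or decode(t1x.owner, t2x.owner, 1, 0) = 0
            or decode(t1x.object_name, t2x.object_name, 1, 0) = 0]',
    language_flag  => dbms_sql.native,
    parallel_level => 4 );

  -- run_task blocks until all chunks are processed
  dbms_parallel_execute.drop_task( task_name => 'compare_t1_t2' );
end;
/
```

The decode list would of course be extended to every column being compared, as in the full outer join query above.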

VERY large tables - comparison

Andre, November 09, 2020 - 1:26 pm UTC

Dear Connor

Many thanks
A very interesting approach with this decode.

This LPAR has plenty of RAM - however in my case I have two tables with 1,010,000,000 rows each.
The original is HASH partitioned - these 2 clones are NOT... yet,
as I was considering partitioning them into 32 HASH partitions.
Then it would be processing approx 30 mil rows in each table.

I am guessing that you are NOT in favor of a PL/SQL procedure.

Another method could be to run 32 jobs, each processing matching HASH partitions from the UNION of the two tables, thus effectively comparing the contents of 32 pairs of tables instead of just one set of over 1 billion rows.

In my case PL/SQL procedure executed overnight OK in just over 8 hours as estimated.

Just to sum up, my basic question to you would be:
what is YOUR recommended approach, with justification for why you would opt for one approach and NOT choose another?

It appears that you were making a point of the processing time, i.e. performance, being the main choice criterion... correct?

In my case the PL/SQL procedure had revealed 250000 different rows between the two tables = i.e. 0.025 %

I was monitoring using v$sqlarea sql_text but I could track it within PL/SQL in several different ways to monitor the progress.

The other issue is the impact of this comparison query or job on other application modules - whether I would starve them of memory or CPU and impact them too much.

Again - I would welcome your recommended approach with pros and cons = hence a couple or a few different ones...?

Thank you again
Best wishes
Andre



Connor McDonald
November 11, 2020 - 6:40 am UTC

But notice my example was NOT about partitions.

Each time I scanned the *entire* table, but I only was comparing a subset.

In your case, you would scan the 1 billion rows (you'll be surprised how fast that can be done) but only do the join on the largest subset you can handle without staging to temp.

So your first run is:

where pk between "x" and "x+10million"

The second one is:

where pk between "x+10million" and "x+20million"

and so forth. If your machine has the necessary grunt, these can be parallelised by you, or via dbms_parallel_execute.

Hope that makes sense. You definitely do *not* want to be doing index range scans here.

VERY large tables - comparison

Andre, November 11, 2020 - 9:14 pm UTC

Dear Connor,

YES - it DOES make sense.

Thank you for this comprehensive response and very good technique with decode.

Sure - it goes without saying that I would not wish to invoke INDEX range.

I will implement this + monitor and trace to obtain a performance profile.

Thanks a bunch - again
Best wishes
Andre

Connor McDonald
November 12, 2020 - 12:54 am UTC

Keep us posted on how it goes !

VERY large tables - comparison

Andre, November 12, 2020 - 3:28 pm UTC

Dear Connor

Brief summary
(1) coded SQL similar to yours and ran it with the ranges 1 - 1 mil,
1 - 10 mil and 1 - 20 mil
(2) average execution time for the 1 - 20 mil range = 11 minutes
(3) executed the same range using my previous approach with native SQL = elapsed approx 4 minutes
(4) executed with your method - tracing with DBMS_MONITOR and processed the trace file with Method-R

Would be happy to share the html file - if you want me to upload the file or send it to you via email - please let me know either here or //preferably// via email to
andre.infoptima@gmail.com

Cheers
A

Connor McDonald
November 16, 2020 - 3:27 am UTC

Feel free to flick it along to: asktom_us@oracle.com with "1417403971710" in the subject line

Addenda:

Ah...I see one big glaring issue... 10.2 :-)

I recommended a full outer join because we have a native implementation of that in *supported* versions. With 10.2, because we didn't have it back then, it is converted into a UNION ALL of 2 separate joins (which of course doubles the work).

Similarly, 10.2 has done a temp table transformation in the WITH section, which means we're also doing two loads of data into temp storage before we proceed. You might be able to overcome that with an INLINE hint, but the lack of full outer join is going to hurt you.