You Asked

Hi Tom,

I have a very large table with over 850 million rows of data. We are using CDC to extract the data from the source system to a target for publication and etl to a datawarehouse and ODS. I have a requirement to run periodic checks to ensure the data between the two systems are in sync and if not, bring them in sync.

Because of the massive data size, I thought DBMS_COMPARISON.CREATE_COMPARISON would be a great candidate for this. However, when I tried to run it, I get the error: ora-23626: 'schema.indexname' not eligible index error. Going through the docs, I discover that the tables primary key is a composite made up of 7 columns with a combination of number and nchar data types. As a result, it does not pass the requirements for DBMS_COMPARISON.

What else can you recommend? Is a minus query my only option?

Thank you so much for your help.
Bonnie

and Connor said...

Here are some options you could explore

SQL> create table t1 as select * from dba_objects where object_id is not null;

Table created.

SQL> create table t2 as select * from t1;

Table created.

--
-- Now I'll 'manipulate' t2 so its slightly different from t1
--
SQL> delete from t2 where owner = 'SCOTT' and object_name = 'EMP';

1 row deleted.

SQL> update t2 set owner = 'SCOTT2' where object_name = 'DEPT' and owner = 'SCOTT';

1 row updated.

So we'll start with a conventional minus

SQL> set autotrace on
SQL> select *
  2  from
  3  (
  4  select * from t1
  5  minus
  6  select * from t2
  7  union all
  8  select * from t2
  9  minus
 10  select * from t1
 11  );

OWNER                          OBJECT_NAME
------------------------------ ----------------------------------------
SUBOBJECT_NAME
----------------------------------------------------------------------------------------------------
 OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE             CREATED   LAST_DDL_ TIMESTAMP           STATUS  T
---------- -------------- ----------------------- --------- --------- ------------------- ------- -
G S  NAMESPACE
- - ----------
EDITION_NAME
----------------------------------------------------------------------------------------------------
SHARING       E O
------------- - -
SCOTT2                         DEPT

     92609         170186 TABLE                   20-NOV-15 02-MAR-17 2015-11-20:15:05:06 VALID   N
N N          1

NONE            N


1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 633555309

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   208K|    73M|       | 12003   (1)| 00:00:01 |
|   1 |  VIEW                   |      |   208K|    73M|       | 12003   (1)| 00:00:01 |
|   2 |   MINUS                 |      |       |       |       |            |          |
|   3 |    SORT UNIQUE          |      |   208K|    45M|    62M| 12003   (1)| 00:00:01 |
|   4 |     UNION-ALL           |      |       |       |       |            |          |
|   5 |      MINUS              |      |       |       |       |            |          |
|   6 |       SORT UNIQUE       |      |   104K|    11M|    15M|  3001   (1)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| T1   |   104K|    11M|       |   313   (1)| 00:00:01 |
|   8 |       SORT UNIQUE       |      |   104K|    11M|    15M|  3001   (1)| 00:00:01 |
|   9 |        TABLE ACCESS FULL| T2   |   104K|    11M|       |   313   (1)| 00:00:01 |
|  10 |      TABLE ACCESS FULL  | T2   |   104K|    11M|       |   313   (1)| 00:00:01 |
|  11 |    SORT UNIQUE          |      |   104K|    11M|    15M|  3001   (1)| 00:00:01 |
|  12 |     TABLE ACCESS FULL   | T1   |   104K|    11M|       |   313   (1)| 00:00:01 |
----------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       7020  consistent gets
          0  physical reads
          0  redo size
       1856  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
          1  rows processed

and the question is - can we do better ? One of the things the database can be best is hash joins. So (at the cost of perhaps more SQL code) we can do a similar comparison using the FULL OUTER JOIN syntax. In the following example, we're assuming the logical primary key for the table is "object_id".


SQL> select *
  2  from (
  3    select t1.object_id t1r, t2.object_id t2r
  4    from t1 full outer join t2
  5    on t1.object_id = t2.object_id
  6  )
  7  where t1r is null or t2r is null;

       T1R        T2R
---------- ----------
     92608

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 53297166

------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   104K|  2653K|       |   793   (1)| 00:00:01 |
|*  1 |  VIEW                 | VW_FOJ_0 |   104K|  2653K|       |   793   (1)| 00:00:01 |
|*  2 |   HASH JOIN FULL OUTER|          |   104K|  1020K|  1736K|   793   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | T1       |   104K|   510K|       |   312   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2       |   104K|   510K|       |   312   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_ID" IS NULL OR "T2"."OBJECT_ID" IS NULL)
   2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3510  consistent gets
          0  physical reads
          0  redo size
        605  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Now that is way faster, and "works" in that it found missing *rows*, but it did not find rows for which a key was present in both source and target but the data was different. If that is a drama, then we have the augment the query with the non-key columns and use some decodes for comparison.

SQL>
SQL> select *
  2  from (
  3    select t1.object_id t1r, t2.object_id t2r, t1.owner t1o, t2.owner t2o
  4    from t1 full outer join t2
  5    on t1.object_id = t2.object_id
  6  )
  7  where
  8    t1r is null or
  9    t2r is null or
 10    decode(t1o,t2o,1,0)=0;

       T1R        T2R
---------- ----------
T1O
----------------------------------------------------------------------------------------------------
T2O
----------------------------------------------------------------------------------------------------
     92609      92609
SCOTT
SCOTT2

     92608
SCOTT



2 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 53297166

------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   104K|    15M|       |   863   (1)| 00:00:01 |
|*  1 |  VIEW                 | VW_FOJ_0 |   104K|    15M|       |   863   (1)| 00:00:01 |
|*  2 |   HASH JOIN FULL OUTER|          |   104K|  2449K|  2456K|   863   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | T1       |   104K|  1224K|       |   312   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2       |   104K|  1224K|       |   312   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."OBJECT_ID" IS NULL OR "T2"."OBJECT_ID" IS NULL OR
              DECODE("T1"."OWNER","T2"."OWNER",1,0)=0)
   2 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3511  consistent gets
          0  physical reads
          0  redo size
        816  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

So we can achieve the requirement and still use a join.

The other thing to be aware of is that one of the big costs in a true "source" vs "target" comparison is that if they are on different databases, then we have to drag one of those tables across the network. In such an instance, you might want to consider a potential compromise where we do some hashing ourselves, so we are only dragging hashkeys across the wire, eg

SQL>
SQL> select * from
  2    ( select rowid t1rid, ora_hash(t1.owner||t1.object_name||t1.object_id) t1colhash from t1) t1
  3    full outer join
  4    ( select rowid t2rid, ora_hash(t2.owner||t2.object_name||t2.object_id) t2colhash from t2) t2
  5    on t1.t1colhash = t2.t2colhash
  6  where   t1colhash is null
  7  or t2colhash is null;

T1RID               T1COLHASH T2RID               T2COLHASH
------------------ ---------- ------------------ ----------
                              AAAyZkAAGAAAXOeAA0 2029513192
AAAyZjAAGAAAOCeAA0 1656564078
AAAyZjAAGAAAOCeAAz  583804402

3 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1042339821

------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |   109M|  5216M|       |  1275  (23)| 00:00:01 |
|*  1 |  VIEW                 | VW_FOJ_0 |   109M|  5216M|       |  1275  (23)| 00:00:01 |
|*  2 |   HASH JOIN FULL OUTER|          |   109M|  5216M|  3776K|  1275  (23)| 00:00:01 |
|   3 |    VIEW               |          |   104K|  2551K|       |   312   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL | T1       |   104K|  5000K|       |   312   (1)| 00:00:01 |
|   5 |    VIEW               |          |   104K|  2551K|       |   312   (1)| 00:00:01 |
|   6 |     TABLE ACCESS FULL | T2       |   104K|  5000K|       |   312   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1COLHASH" IS NULL OR "T2COLHASH" IS NULL)
   2 - access("T1"."T1COLHASH"="T2"."T2COLHASH")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3511  consistent gets
          0  physical reads
          0  redo size
        877  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL>
SQL>
SQL>

Finally, sometimes a "layered" approach is perhaps a means of doing a "good enough" check. For example, you might do something carved up by owner, eg

SQL> select * from
  2    ( select owner t1own, sum(ora_hash(owner||object_name||object_id)) t1colhash , count(*) t1cnt from t1 group by owner)
  3    full outer join
  4    ( select owner t2own, sum(ora_hash(owner||object_name||object_id)) t2colhash , count(*) t2cnt from t2 group by owner)
  5    on t1.t1own = t2.t2own
  6  where t1own is null
  7  or t1own is null
  8  or t1colhash != t2colhash
  9  or t1cnt != t2cnt;

T1OWN
----------------------------------------------------------------------------------------------------
 T1COLHASH      T1CNT
---------- ----------
T2OWN
----------------------------------------------------------------------------------------------------
 T2COLHASH      T2CNT
---------- ----------


SCOTT2
2029513192          1

SCOTT
4.6253E+10         21
SCOTT
4.4013E+10         19


2 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 632540738

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |    42 |  7728 |   630   (2)| 00:00:01 |
|*  1 |  VIEW                 | VW_FOJ_0 |    42 |  7728 |   630   (2)| 00:00:01 |
|*  2 |   HASH JOIN FULL OUTER|          |    42 |  7728 |   630   (2)| 00:00:01 |
|   3 |    VIEW               |          |    42 |  3864 |   315   (2)| 00:00:01 |
|   4 |     HASH GROUP BY     |          |    42 |  1554 |   315   (2)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| T1       |   104K|  3775K|   312   (1)| 00:00:01 |
|   6 |    VIEW               |          |    42 |  3864 |   315   (2)| 00:00:01 |
|   7 |     HASH GROUP BY     |          |    42 |  1554 |   315   (2)| 00:00:01 |
|   8 |      TABLE ACCESS FULL| T2       |   104K|  3775K|   312   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1OWN" IS NULL OR "T1COLHASH"<>"T2COLHASH" OR
              "T1CNT"<>"T2CNT")
   2 - access("T1"."T1OWN"="T2"."T2OWN")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3510  consistent gets
          0  physical reads
          0  redo size
        984  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

which means minimal data across the network, but further work is required if you *do* find a difference.

Hope this helps.

Rating

(1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, March 07, 2017 - 11:47 pm UTC

Thank you Connor. Really appreciate the detailed analysis. I am going to try to use the last method. I will capture the pk value to find the values that are different and fish them out to a staging table and reprocess them as needed.

Thank you so much. I hope they will include nchar values as part of the index key for future releases on the dbms_comparison.

March 09, 2017 - 1:47 am UTC

Glad we could help

More to Explore

DBMS_COMPARISON

More on PL/SQL routine DBMS_COMPARISON here

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

DBMS_COMPARISON

DBMS_COMPARISON: ora-23626: 'schema.indexname' not eligible index error

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

DBMS_COMPARISON