Question and Answer

Connor McDonald

Thanks for the question, Saurabh.

Asked: August 08, 2018 - 6:00 pm UTC

Last updated: September 04, 2018 - 5:16 am UTC

Version: ODI 12C

Viewed 1000+ times

You Asked

Hi Tom,

I am stuck in a scenario where I have to extract 85 million records from a table, and that table does not have a primary key.
After extracting all the data (full load) into the target table, I want to make this data load incremental (the first full load, i.e. 85 million rows, is loaded into the table, and after that I want to extract only updated records from the source). How can I achieve this, given there is no primary key in the source?

and Connor said...

Thanks for your patience. Even without a primary key there must be *some* indicator of what you have extracted, otherwise there is no way to compare what you have already got with what is new. Worst case is that you need to examine every column to make that decision.

In any event, 85 million rows is large but not huge.

For NOT NULL columns you can do a direct comparison. For nullable columns, you can map them to a special value just for matching purposes (the NVL on CREATED in the example below does exactly that). So you can do something like this:

SQL> create table t1 as
  2  select d.owner, d.object_name, d.object_type, d.created
  3  from  dba_objects d,
  4    ( select 1 from dual connect by level <= 1000 );

Table created.

SQL> desc t1
 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- -------------
 OWNER                                                             NOT NULL VARCHAR2(128)
 OBJECT_NAME                                                       NOT NULL VARCHAR2(128)
 OBJECT_TYPE                                                       NOT NULL VARCHAR2(23)
 CREATED                                                                    DATE

SQL> select count(*) from t1;

  COUNT(*)
----------
  72952002

--
-- our initial copy
--
SQL> create table t2 as
  2  select * from t1;

Table created.

--
-- some new rows
--
SQL> insert into t1
  2  values ('NEW','CONNOR','INDEX',sysdate);

1 row created.

SQL>
SQL> insert into t1
  2  values ('NEW','CONNOR','TABLE',sysdate);

1 row created.

SQL>
SQL> commit;

Commit complete.

SQL> set timing on
SQL> with
  2    t1x as ( select rowid rid,owner,object_name,object_type,nvl(created,date '2000-01-01') created from t1),
  3    t2x as ( select rowid rid,owner,object_name,object_type,nvl(created,date '2000-01-01') created from t2)
  4  select t1x.rid
  5  from   t1x
  6  where  owner is not null
  7  and    object_name is not null
  8  and    object_type is not null
  9  and    created     is not null
 10  and (owner,object_name,object_type,created) not in
 11        ( select owner,object_name,object_type,created
 12          from   t2x
 13          where  owner is not null
 14          and    object_name is not null
 15          and    object_type is not null
 16          and    created     is not null
 17        );

RID
------------------
AAASN7AANAACZzZAAB
AAASN7AANAACZzZAAA

2 rows selected.

Elapsed: 00:02:53.81


So 3 mins isn't too bad (on my home machine). Obviously this will not scale over time as the volumes get larger, but it should get you by.

And if needs be, you could smash it with parallel:

SQL> set timing on
SQL> with
  2    t1x as ( select rowid rid,owner,object_name,object_type,nvl(created,date '2000-01-01') created from t1),
  3    t2x as ( select rowid rid,owner,object_name,object_type,nvl(created,date '2000-01-01') created from t2)
  4  select /*+ parallel */ t1x.rid
  5  from   t1x
  6  where  owner is not null
  7  and    object_name is not null
  8  and    object_type is not null
  9  and    created     is not null
 10  and (owner,object_name,object_type,created) not in
 11        ( select owner,object_name,object_type,created
 12          from   t2x
 13          where  owner is not null
 14          and    object_name is not null
 15          and    object_type is not null
 16          and    created     is not null
 17        );

RID
------------------
AAASN7AANAACZzZAAA
AAASN7AANAACZzZAAB

Elapsed: 00:01:11.71



Comments

Worst case

A reader, August 31, 2018 - 6:39 am UTC

"Worst case is that you need to examine every column to make that decision."

Unfortunately I am in that case.
I also need to detect any change, not only new inserts.
Additionally, my source table can have duplicate rows, i.e. the same set of compared columns can occur many times, and I have to log every single row change.
The good news is that the distinct count of the column sets to be compared is small: for the 85 million rows in the source there are only about 3000 distinct combinations.

What is the best approach here?
Connor McDonald
September 04, 2018 - 5:13 am UTC

The approach is no different - you just change the SQL method. You could use MERGE, or FULL OUTER JOIN.
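
For the duplicate-rows case above, a possible sketch (not part of the original answer, and reusing the T1/T2 tables and columns from the earlier examples purely for illustration) is to compare per-group row counts, since only about 3000 distinct column sets exist:

-- Count the occurrences of each distinct column set in source (t1)
-- and target (t2); a non-zero delta flags a change to log and apply.
with src as (
  select owner, object_name, object_type,
         nvl(created, date '2000-01-01') created,
         count(*) cnt
  from   t1
  group  by owner, object_name, object_type, nvl(created, date '2000-01-01')
),
tgt as (
  select owner, object_name, object_type,
         nvl(created, date '2000-01-01') created,
         count(*) cnt
  from   t2
  group  by owner, object_name, object_type, nvl(created, date '2000-01-01')
)
select nvl(s.owner, t.owner)             owner,
       nvl(s.object_name, t.object_name) object_name,
       nvl(s.object_type, t.object_type) object_type,
       nvl(s.created, t.created)         created,
       nvl(s.cnt, 0) - nvl(t.cnt, 0)     delta
from   src s
       full outer join tgt t
       on (    s.owner       = t.owner
           and s.object_name = t.object_name
           and s.object_type = t.object_type
           and s.created     = t.created )
where  nvl(s.cnt, 0) <> nvl(t.cnt, 0);

A positive DELTA means that many copies of that row still need to be added to the target; a negative DELTA means that many copies need to be removed.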

How about a FULL OUTER JOIN

Rajeshwaran, Jeyabal, August 31, 2018 - 6:48 am UTC

How about a full outer join - something like this:

demo@ORA12C> create table t1 as
  2  select d.owner, d.object_name, d.object_type, d.created
  3  from  dba_objects d ;

Table created.

demo@ORA12C> create table t2 as
  2  select *
  3  from t1;

Table created.

demo@ORA12C> insert into t1(owner,object_name,object_type,created)
  2     values('NEW','CONNOR','TABLE',sysdate);

1 row created.

demo@ORA12C> insert into t2(owner,object_name,object_type,created)
  2     values('NEW','CONNOR','INDEX',sysdate);

1 row created.

demo@ORA12C> commit;

Commit complete.

demo@ORA12C> select t1.rowid , t2.rowid,
  2     case when t1.rowid is null then 'Missing/changes in T1'
  3              when t2.rowid is null then 'Missing/changes in T2'
  4              end as results_output
  5  from t1 full outer join t2
  6  on ( t1.owner = t2.owner
  7  and t1.object_name = t2.object_name
  8  and t1.object_type = t2.object_type
  9  and t1.created = t2.created )
 10  where t1.rowid is null or
 11     t2.rowid is null ;

ROWID              ROWID              RESULTS_OUTPUT
------------------ ------------------ ---------------------
                   AAAa8bAAIAAAB3XAAA Missing/changes in T1
AAAa8aAAIAAABtXAAA                    Missing/changes in T2

demo@ORA12C>

Using analytics

Stew Ashton, August 31, 2018 - 8:08 am UTC

I'm borrowing Connor's example, but with one million rows only. Since this solution uses analytic functions, it probably needs a fair amount of memory to run fast.

You want to synchronize the target table with the source table. When there is no primary or unique, non-null key, there is no way to do an UPDATE but we can synchronize with INSERTs and DELETEs. The MERGE statement can do both at once.

I start by comparing the two tables and creating a CDC-style result set: Z##OP contains a flag ('I' for insert, 'D' for delete), and Z##RID contains the ROWID of rows to be deleted. The MERGE matches this result set to the target using ROWIDs to avoid an extra table scan of the target table.

SQL> define d_tab_size = 1000000;
SQL> create table t1 as
  2  select d.owner, d.object_name, d.object_type, d.created
  3  from dba_objects d,
  4  (
  5    select null from dual
  6    connect by level <= 1 + &d_tab_size / (select count(*) from dba_objects)
  7  )
  8  where rownum <= &d_tab_size;

Table T1 created.

SQL> create table t2 as
  2  select * from t1 where rownum <= &d_tab_size - 100;

Table T2 created.

SQL> insert into t2 select * from t2 where rownum <= 100;

100 rows inserted.

SQL> commit;

Commit complete.

SQL> merge /*+ qb_name(SYNC_PARTITION) USE_NL(O) */ into (
  2    select /*+ qb_name(target) */
  3      "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED", rowid Z##RID
  4    from T1
  5  ) O
  6  using (
  7  select /*+ qb_name(CDC_PARTITION) */ * from (
  8    select /*+ qb_name(before_filter) */
  9      "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED",
 10      case
 11        when Z##NEW = 1
 12          and sum(Z##NEW) over(partition by
 13            "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED"
 14          order by null rows unbounded preceding) > sum(Z##OLD) over(partition by
 15            "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED"
 16          )
 17          then 'I'
 18        when Z##OLD = 1
 19          and sum(Z##OLD) over(partition by
 20            "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED"
 21          order by null rows unbounded preceding) > sum(Z##NEW) over(partition by
 22            "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED"
 23          )
 24          then 'D'
 25      end Z##OP, Z##RID
 26    FROM (
 27      select /*+ qb_name(old) */
 28      "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED",
 29      1 Z##OLD, 0 Z##NEW, rowid Z##RID
 30      from T1 O
 31      union all
 32      select /*+ qb_name(new) */
 33      "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED",
 34      0, 1, null
 35      from t2 N
 36    )
 37  )
 38  where Z##OP is not null
 39  ) N
 40  on (
 41    O.Z##RID = n.Z##RID
 42  )
 43  when matched then update set
 44    "OWNER"=N."OWNER"
 45    delete where N.Z##OP = 'D'
 46  when not matched then insert (
 47    "OWNER", "OBJECT_NAME", "OBJECT_TYPE", "CREATED"
 48  ) values(
 49    N."OWNER", N."OBJECT_NAME", N."OBJECT_TYPE", N."CREATED"
 50  );

200 rows merged.

Elapsed: 00:00:08.392

If you run this statement a second time, you will see that no rows are merged, since the comparison result set is empty.

Best regards,
Stew Ashton

P.S. I generated this code using a package that I explain here:
https://stewashton.wordpress.com/2018/02/12/comp_sync-1-a-new-table-compare-sync-package/
Connor McDonald
September 04, 2018 - 5:16 am UTC

Thanks for dropping by. I did some tests with MERGE on a couple of 100-million-row sets - they did nasty things to my laptop :-) but the approach is sound, assuming reasonable hardware.

Ultimately, of course, the solution here is to find a logical key structure for the table and implement it.
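
A minimal sketch of that last step, using the three NOT NULL columns from the demo tables purely as stand-ins for whatever real candidate key the analysis identifies:

-- Verify the hypothetical candidate key is actually unique
select owner, object_name, object_type, count(*)
from   t1
group  by owner, object_name, object_type
having count(*) > 1;

-- If that returns no rows, enforce the key; incremental loads can then
-- key on these columns instead of comparing every column.
alter table t1
  add constraint t1_pk primary key (owner, object_name, object_type);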
