Skip to Main Content
  • Questions
  • Effective way to compare/merge large tables

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Alex.

Asked: March 21, 2017 - 10:28 am UTC

Last updated: April 07, 2017 - 2:06 am UTC

Version: 11.2

Viewed 10K+ times! This question is

You Asked

Dear Oracle Gurus ;)
I have two big tables (client accounts, 50+ million rows, same fields but different rows) in DWH environment:

- first table coming daily as raw data from OLTP server to staging area of DWH server, contains all client accounts as
a snapshot for today

- second table in DWH details area, contains data as a snapshot for previous load

First question is "what is most effective way to compare first table with second and perform MERGE (i.e. insert new
rows + update changed + mark as 'deleted' rows, which are not present in OLTP anymore)

Options are:

1) single MERGE statement upon full outer join from first+second table

2) multiple statements (insert + update + delete), which allows to use APPEND hint

3) ...?

Second question is (as a subquestion for INSERT statement for 2nd option above) "what is most effective way to find rows existing in first table and not existing in second table.

Options are:

1) first_table left join second_table on key=key WHERE second_table.key is null (FULL SCAN + HASH JOIN)

2) something with ...not in() or ...not exist()

3) .. ?


I intentionally do not mention existing indexes - third question is:
"may any indexes help (=increase performance), in which options, why...?"

As questions are mostly theoretical I do not attach any liveSQL, but if you have examples which strongly prove one of options please share it...

Thanks in advance!

and Connor said...

For me, MERGE is the way to go. To only MERGE new things and those that have changed, you can use a simple decode. eg

  merge into TARGET_TABLE w
  using ( select
             PK_COL
            ,COL2
            ,COL3
            ,...
          from SOURCE_TABLE ) p
   on (  p.PK_COL = w.PK_COL
  )
  when matched then
  update set
    w.COL2           = p.COL2
  , w.COL3           = p.COL3
  , ...
  where   decode(w.COL2,p.COL2,1,0)=0
     or   decode(w.COL2,p.COL2,1,0)=0
     or   ...
  when not matched then
  insert
    ( w.PK_COL
    , w.COL2
    , w.COL3
    , ...
  )
  values (p.PK_COL
         ,p.COL2
         ,p.COL3
         ,...
  );


Using DECODE takes care of null's as well

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

What about DELETEs?

Stew Ashton, March 24, 2017 - 10:37 am UTC

Hi,

Connor's answer is perfect as far as it goes.

If the MERGE does only updates, it can do an inner join of the two tables.

If the MERGE does updates and inserts, it can do "target right join source". Rows in source but not in target are then inserted.

The MERGE cannot directly do the equivalent of a full outer join - at least in Oracle - because a row must be updated in order to be eligible for delete.

In order to do DELETEs, the USING clause must contain a query that compares the two tables and produces rows for all three operations. I use the "group by" method popularized by Tom Kyte instead of a full outer join.
merge /*+ append */ into TARGET_TABLE O
using (
  select * from (
    select PK_COL, COL2, COL3,
      COUNT(*) over(partition by PK_COL) - SUM(Z##_CNT) Z##IUD_FLAG
    from (
      select PK_COL, COL2, COL3,
        -1 Z##_CNT
      from TARGET_TABLE O
      union all
      select PK_COL, COL2, COL3,
        1 Z##_CNT
      from SOURCE_TABLE N
    )
    group by PK_COL, COL2, COL3
    having SUM(Z##_CNT) != 0
  )
  where Z##IUD_FLAG < 3
) N
on (O.PK_COL=N.PK_COL)
when matched then update set
  COL2=N.COL2, COL3=N.COL3
  delete where N.Z##IUD_FLAG = 2
when not matched then insert (PK_COL, COL2, COL3)
  values(N.PK_COL, N.COL2, N.COL3);

Note the APPEND hint will be honored for any insertions.

Best regards, Stew

P.S. You can generate the MERGE statement using a package I wrote. See
https://stewashton.wordpress.com/2015/01/21/stew_compare_sync-introducing-the-package/

Alex, April 04, 2017 - 10:56 am UTC

Thanx for your reply, Connor!

2) several months ago I have compared performance of MERGE approach versus truncate+insert(append) on big fact table (10 mln rows per day, partitioned by one day interval)
results on 11g2 on EXADATA were:
- when you have small rowcont to merge (under 10 mln in several dates) - MERGE performs well;
- on larger rowcounts truncate changed partitions (days) and insert(append) is much more faster than MERGE

should one tale into account this changed rowcount when choosing between MERGE or TRUNC+APPEND approach? what is best practice?

2) Do you have answers for question 2 and 3 from my original post?
Connor McDonald
April 04, 2017 - 12:38 pm UTC

"should one tale into account this changed rowcount when choosing between MERGE or TRUNC+APPEND approach? what is best practice? "

If there was one mechanism which was best in all cases... guess what, we'd ditch the others. I *hate* the term "best practice" unless the term is used to mean: "Use a sensible means to decide on what is best for you".

That sort of answers your other question as well. I've written code in the past to find deltas between two sets with:

a) where (a,b,c) not in ( select ... )

and

b) from t1, t2 where t1.x = t2.y(+) and t2.y is null

Which is one best ? I choose the one that works best for me for that circumstance. We often get swept up in the technology and syntax, when really, the only concern is - what best matches the business requirement.

If that requirement is (as it commonly us) "what gets the job done fastest" then I'm always going to use whatever is correct and finishes first. I've no bias toward a particular method...only to what gives the best business resolution.

why not

gh.., April 04, 2017 - 2:24 pm UTC

Insert /*+ ignore_row_on_dupkey_index(tgt , PK_COL) */
into TARGET_TABLE tgt
select PK_COL, COL2, COL3, COL4, COL5
from SOURCE_TABLE;

Merge
into TARGET_TABLE tgt
using ( select
PK_COL
, COL2
, COL3
, COL4
, COL5
, ora_hash(COL2||COL3||COL4||COL5) as chk
from SOURCE_TABLE
) src
on ( src.PK_COL = tgt.PK_COL
)
when matched then
update set tgt.COL2 = src.COL2
, tgt.COL3 = src.COL3
, tgt.COL4 = src.COL4
, tgt.COL5 = src.COL5
where src.chk <> ora_hash(tgt.COL2||tgt.COL3||tgt.COL4||tgt.COL5);


or if possible :

alter table TARGET_TABLE add (CHECKSUM number generated always as ora_hash(COL2||COL3||COL4||COL5) );
+ index on it


Connor McDonald
April 07, 2017 - 2:06 am UTC

If you are planning on:

ora_hash(COL2||COL3||COL4||COL5)

you would want some delimiters between the cols...and there is even then, still the issue of hash collisions.