Skip to Main Content
  • Questions
  • Large Updates While Resolving Surrogate Keys

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Steven.

Asked: April 11, 2017 - 7:38 pm UTC

Last updated: April 12, 2017 - 3:24 pm UTC

Version: 12c

Viewed 1000+ times

You Asked

We have the need to run a very large import for on hand counts. The table we are updating -- PRODUCT_STORE -- is keyed by a store_id and product_id. However, product_id is a surrogate key and we are given the product_code and store_id and we need to resolve the product_id in order to do the SQL statement.

Because the row could either be an update or an insert, we are generating a batch of prepared statements that perform a MERGE operation. However, I need to resolve the product_id three times in the statement:

1. As part of the MERGE itself to determine MATCHED or NOT MATCHED
2. As part of the INSERT in the NOT MATCHED clause
3. As part of a WHERE clause in the NOT MATCHED instruction to ensure we don't try to insert bad data into the table (in other words, if we cannot resolve the product_id skip it).

The product_id can be resolved with a query -- SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?

So this makes my MERGE statement something like this:

MERGE INTO STORE_INVENTORY USING DUAL ON
(PRODUCT_ID = (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) and STORE_ID = ?)
WHEN NOT MATCHED THEN INSERT (STORE_ID,PRODUCT_ID,AVAILABLE_QTY)
values (?,(SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE SYSTEM_ID = ? and PRODUCT_CD = ?),?)
WHERE (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) IS NOT NULL
WHEN MATCHED THEN UPDATE SET AVAILABLE_QTY=?

This seems to work fine; my concern is that since I am running the same subquery three times in a single statement, and since there will be millions of rows to process (although I am running them in batches of 500,000) I'm wondering if there a more efficient way to do this?

Thanks!

and Chris said...

I'm not sure I fully understand what you're doing. But it feels like you've missed the point of merge.

If you want to merge product_ids for a particular store and code, then select them in a subquery in the using clause. Then you've got the "valid" product_ids and can refer to them in the update/insert clauses. e.g.:

merge into store_inventory i using 
  ( select product_id from store_product 
    where store_id = ? and product_cd = ? 
  ) p
on ( i.product_id = p.product_id )
when not matched then
  insert ( store_id,product_id,available_qty )
  values ( ?, p.product_id, ? )
when matched then
  update set available_qty=?


I'm a bit concerned when you say you're running in batches of 500,000. Are you looping through half a million store and product ids, calling the merge for each?

If so this will almost certainly take a long time. Just put your query to find all the products, stores and quantities in the using clause. e.g.:

merge into store_inventory i using 
  ( select store_id, product_id,available_qty from store_product 
    where store_id is not null and product_cd is not null
  ) p
on ( i.product_id = p.product_id
when not matched then
  insert ( store_id,product_id,available_qty )
  values ( p.store_id, p.product_id, p.available_qty )
when matched then
  update set available_qty=p.available_qty


Is this answer out of date? If it is, please let us know via a Comment