Large Updates While Resolving Surrogate Keys

Thanks for the question, Steven.

Asked: April 11, 2017 - 7:38 pm UTC

Last updated: April 12, 2017 - 3:24 pm UTC

Version: 12c

Viewed 1000+ times

You Asked

We have the need to run a very large import for on hand counts. The table we are updating -- PRODUCT_STORE -- is keyed by a store_id and product_id. However, product_id is a surrogate key and we are given the product_code and store_id and we need to resolve the product_id in order to do the SQL statement.

Because the row could either be an update or an insert, we are generating a batch of prepared statements that perform a MERGE operation. However, I need to resolve the product_id three times in the statement:

1. As part of the MERGE itself to determine MATCHED or NOT MATCHED
2. As part of the INSERT in the NOT MATCHED clause
3. As part of a WHERE clause in the NOT MATCHED instruction to ensure we don't try to insert bad data into the table (in other words, if we cannot resolve the product_id skip it).

The product_id can be resolved with a query -- SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?

So this makes my MERGE statement something like this:

MERGE INTO STORE_INVENTORY USING DUAL ON
(PRODUCT_ID = (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) and STORE_ID = ?)
WHEN NOT MATCHED THEN INSERT (STORE_ID,PRODUCT_ID,AVAILABLE_QTY)
values (?,(SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE SYSTEM_ID = ? and PRODUCT_CD = ?),?)
WHERE (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) IS NOT NULL
WHEN MATCHED THEN UPDATE SET AVAILABLE_QTY=?

This seems to work fine; my concern is that since I am running the same subquery three times in a single statement, and since there will be millions of rows to process (although I am running them in batches of 500,000) I'm wondering if there a more efficient way to do this?

Thanks!

and Chris said...

I'm not sure I fully understand what you're doing. But it feels like you've missed the point of merge.

If you want to merge product_ids for a particular store and code, then select them in a subquery in the using clause. Then you've got the "valid" product_ids and can refer to them in the update/insert clauses. e.g.:

merge into store_inventory i using 
  ( select product_id from store_product 
    where store_id = ? and product_cd = ? 
  ) p
on ( i.product_id = p.product_id )
when not matched then
  insert ( store_id,product_id,available_qty )
  values ( ?, p.product_id, ? )
when matched then
  update set available_qty=?

I'm a bit concerned when you say you're running in batches of 500,000. Are you looping through half a million store and product ids, calling the merge for each?

If so this will almost certainly take a long time. Just put your query to find all the products, stores and quantities in the using clause. e.g.:

merge into store_inventory i using 
  ( select store_id, product_id,available_qty from store_product 
    where store_id is not null and product_cd is not null
  ) p
on ( i.product_id = p.product_id
when not matched then
  insert ( store_id,product_id,available_qty )
  values ( p.store_id, p.product_id, p.available_qty )
when matched then
  update set available_qty=p.available_qty

Is this answer out of date? If it is, please let us know via a Comment

Live SQL

Share and learn SQL and PL/SQL; free access to the latest version of Oracle Database!

Dev Gym

Classes, workouts and quizzes on Oracle Database technologies. Expertise through exercise!

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Live SQL

Dev Gym