We have the need to run a very large import for on hand counts. The table we are updating -- PRODUCT_STORE -- is keyed by a store_id and product_id. However, product_id is a surrogate key and we are given the product_code and store_id and we need to resolve the product_id in order to do the SQL statement.
Because the row could either be an update or an insert, we are generating a batch of prepared statements that perform a MERGE operation. However, I need to resolve the product_id three times in the statement:
1. As part of the MERGE itself to determine MATCHED or NOT MATCHED
2. As part of the INSERT in the NOT MATCHED clause
3. As part of a WHERE clause in the NOT MATCHED instruction to ensure we don't try to insert bad data into the table (in other words, if we cannot resolve the product_id skip it).
The product_id can be resolved with a query -- SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?
So this makes my MERGE statement something like this:
MERGE INTO STORE_INVENTORY USING DUAL ON
(PRODUCT_ID = (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) and STORE_ID = ?)
WHEN NOT MATCHED THEN INSERT (STORE_ID,PRODUCT_ID,AVAILABLE_QTY)
values (?,(SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE SYSTEM_ID = ? and PRODUCT_CD = ?),?)
WHERE (SELECT PRODUCT_ID FROM STORE_PRODUCT WHERE STORE_ID = ? and PRODUCT_CD = ?) IS NOT NULL
WHEN MATCHED THEN UPDATE SET AVAILABLE_QTY=?
This seems to work fine; my concern is that since I am running the same subquery three times in a single statement, and since there will be millions of rows to process (although I am running them in batches of 500,000) I'm wondering if there a more efficient way to do this?
Thanks!
I'm not sure I fully understand what you're doing. But it feels like you've missed the point of merge.
If you want to merge product_ids for a particular store and code, then select them in a subquery in the using clause. Then you've got the "valid" product_ids and can refer to them in the update/insert clauses. e.g.:
merge into store_inventory i using
( select product_id from store_product
where store_id = ? and product_cd = ?
) p
on ( i.product_id = p.product_id )
when not matched then
insert ( store_id,product_id,available_qty )
values ( ?, p.product_id, ? )
when matched then
update set available_qty=?
I'm a bit concerned when you say you're running in batches of 500,000. Are you looping through half a million store and product ids, calling the merge for each?
If so this will almost certainly take a long time. Just put your query to find all the products, stores and quantities in the using clause. e.g.:
merge into store_inventory i using
( select store_id, product_id,available_qty from store_product
where store_id is not null and product_cd is not null
) p
on ( i.product_id = p.product_id
when not matched then
insert ( store_id,product_id,available_qty )
values ( p.store_id, p.product_id, p.available_qty )
when matched then
update set available_qty=p.available_qty