We have an ETL process on a table with 38 updatable columns. The source data is a CSV file with a single column unique record identifier and we set this file up as an external table. The target table has a corresponding primary key column. The source data can include new and updated records, and will not contain records that have been deprecated. Inserting the new records works fine, as does updating the deprecated records to an inactive status. The problem is updating the modified rows. The table currently has about 85,000 rows and 6,800 will be updated. Using the following syntax, the update takes 65 minutes.
update target_table tgt
set (
    column1, ..., column38
) = (
    select column1, ..., column38
    from external_source_table src
    where src.key_value = tgt.primary_key
)
where primary_key in (
    select key_value
    from (
        select key_value, column1, ..., column38
        from external_source_table
        minus
        select primary_key as key_value, column1, ..., column38
        from target_table
    )
);
Might there be a more efficient way to accomplish the update?
Thanx, D
MERGE is probably your best bet here, because the INSERT phase is optional. You'll have something like:
merge into target_table tgt
using ( select ... from external_source_table ) src
on ( tgt.primary_key = src.key_value )
when matched then
    update
    set tgt.col1 = src.col1,
        tgt.col2 = src.col2,
        ...
although I'm a little confused by your SQL, because the MINUS subquery seems to be picking up only rows that are *not* already identical in the table, which suggests an insert rather than an update?
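One way to square that circle: feed the MINUS result into the USING clause, so the MERGE only sees rows that are new or changed, then let WHEN MATCHED handle the 6,800 updates and WHEN NOT MATCHED handle the inserts. A rough sketch, reusing the table and column names from your post (the elided columns follow the same pattern, and I'm assuming primary_key and key_value are the only join keys):

```sql
-- Restrict USING to rows that differ from the target, so MERGE touches
-- only the ~6,800 changed/new rows instead of all 85,000.
merge into target_table tgt
using (
    select key_value, column1, ..., column38
    from external_source_table
    minus
    select primary_key as key_value, column1, ..., column38
    from target_table
) src
on ( tgt.primary_key = src.key_value )
when matched then
    update
    set tgt.column1  = src.column1,
        ...
        tgt.column38 = src.column38
when not matched then
    insert ( primary_key, column1, ..., column38 )
    values ( src.key_value, src.column1, ..., src.column38 );
```

Untested against your schema, but with a single hash join against the external table this should avoid the per-row correlated subquery that is likely behind the 65-minute run time.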