I have 2 tables in Oracle 12c database. Table A has incoming data from an application with modified date+time timestamps for a given product, each day we may get around 50,000 rows in table_A. The goal is to use table A's data and insert into the final target table B(usually has billions of rows) by using table_A's data as the driving data set.
A record needs to be inserted/updated in table_B only when there is a change in the incoming dataset for given product attributes.
The purpose is to track the history/journey of a given product with valid timestamps only when there are changes in its attributes such as state and zip_cd. There should be any overlapping dates or gaps between valid_from and valid_to date fields in the final target table B after inserting the data from A. Think of table_B as slowly changing dimension with Valid_date, valid_to date intervals to indicate the change of a given product
Looking for some guidance as to what would be the best approach to solve this problem. It's ok to create one or multiple temporary/intermediate tables to achieve this.
There are all sorts of problems with the sample data. Valid_to dates before valid_from, overlapping dates, ... This makes it tough to verify the solution does what you want!
Anyhow, you should be able use this as the basis for what you're doing:
- Merge table B with a query on A that:
- Generates valid from/to dates by using lead ( dt ) to find the next date for each product
- Match on product ids and valid from date = modified date
This gives:
create table table_a (
product_id varchar2(30), state varchar2(30), zip_cd varchar2(30), modified_dt date
);
create table table_b (
sequence_key number, product_id varchar2(30), state varchar2(30), zip_cd varchar2(30), valid_from date, valid_to date, latest_flag varchar2(1)
);
begin
insert into table_a values ('abc', 'MN', '123', to_timestamp('3/5/2020 12:01:00 AM', 'mm/dd/yyyy hh:mi:ss AM'));
insert into table_a values ('abc', 'MN', '123', to_timestamp('3/5/2020 6:01:13 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('abc', 'IL', '223', to_timestamp('3/5/2020 7:01:15 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('abc', 'OH', '333', to_timestamp('3/5/2020 6:01:16 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('abc', 'NY', '722', to_timestamp('3/5/2020 4:29:00 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('abc', 'KS', '444', to_timestamp('3/5/2020 4:31:41 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('bbc', 'MN', '123', to_timestamp('3/19/2020 2:47:08 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('bbc', 'IL', '223', to_timestamp('3/19/2020 2:50:37 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('ccb', 'MN', '123', to_timestamp('3/19/2020 2:56:24 PM', 'mm/dd/yyyy hh:mi:ss PM'));
insert into table_a values ('dbd', 'KS', '444', to_timestamp('3/19/2020 12:00:00 AM', 'mm/dd/yyyy hh:mi:ss AM'));
commit;
end;
/
create or replace procedure load_data as
begin
merge into table_b b
using (
select a.*,
lead ( modified_dt, 1, date'2099-12-31' ) over (
partition by product_id
order by modified_dt
) next_date
from table_a a
) a
on ( b.product_id = a.product_id and b.valid_from = a.modified_dt )
when not matched then
insert ( b.product_id, b.state, b.zip_cd, b.valid_from, b.valid_to, b.latest_flag )
values ( a.product_id, a.state, a.zip_cd, a.modified_dt, a.next_date, case
when next_date = date'2099-12-31' then 'Y' else 'N' end
)
when matched then
update set b.valid_to = a.next_date, b.latest_flag = 'N'
where a.next_date <> date'2099-12-31';
end load_data;
/
exec load_data();
select * from table_b
order by product_id, valid_from;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FROM VALID_TO LATEST_FLAG
<null> abc MN 123 05-MAR-2020 00:01:00 05-MAR-2020 16:29:00 N
<null> abc NY 722 05-MAR-2020 16:29:00 05-MAR-2020 16:31:41 N
<null> abc KS 444 05-MAR-2020 16:31:41 05-MAR-2020 18:01:13 N
<null> abc MN 123 05-MAR-2020 18:01:13 05-MAR-2020 18:01:16 N
<null> abc OH 333 05-MAR-2020 18:01:16 05-MAR-2020 19:01:15 N
<null> abc IL 223 05-MAR-2020 19:01:15 31-DEC-2099 00:00:00 Y
<null> bbc MN 123 19-MAR-2020 14:47:08 19-MAR-2020 14:50:37 N
<null> bbc IL 223 19-MAR-2020 14:50:37 31-DEC-2099 00:00:00 Y
<null> ccb MN 123 19-MAR-2020 14:56:24 31-DEC-2099 00:00:00 Y
<null> dbd KS 444 19-MAR-2020 00:00:00 31-DEC-2099 00:00:00 Y
insert into table_a values ( 'abc', 'TX', 654, sysdate );
exec load_data();
select * from table_b
order by product_id, valid_from;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FROM VALID_TO LATEST_FLAG
<null> abc MN 123 05-MAR-2020 00:01:00 05-MAR-2020 16:29:00 N
<null> abc NY 722 05-MAR-2020 16:29:00 05-MAR-2020 16:31:41 N
<null> abc KS 444 05-MAR-2020 16:31:41 05-MAR-2020 18:01:13 N
<null> abc MN 123 05-MAR-2020 18:01:13 05-MAR-2020 18:01:16 N
<null> abc OH 333 05-MAR-2020 18:01:16 05-MAR-2020 19:01:15 N
<null> abc IL 223 05-MAR-2020 19:01:15 28-JUL-2020 14:54:52 N
<null> abc TX 654 28-JUL-2020 14:54:52 31-DEC-2099 00:00:00 Y
<null> bbc MN 123 19-MAR-2020 14:47:08 19-MAR-2020 14:50:37 N
<null> bbc IL 223 19-MAR-2020 14:50:37 31-DEC-2099 00:00:00 Y
<null> ccb MN 123 19-MAR-2020 14:56:24 31-DEC-2099 00:00:00 Y
<null> dbd KS 444 19-MAR-2020 00:00:00 31-DEC-2099 00:00:00 YNote the case expression to set the latest_flag - this assumes that 31 Dec 2099 is "infinity", so identifies current values. The where clause in the when matched clause stops this overwriting latest to N if you run the merge twice with no changes in between.