Thanks for the question, tahi.
Asked: July 17, 2004 - 6:56 pm UTC
Last updated: July 19, 2004 - 7:22 am UTC
Version: 9R2
Viewed 1000+ times
You Asked
Hi Tom,
This is the Scenario of my problem
This process will read all records in the STAGING_TABLE and process the data into the TARGET_DATABASE tables.
STG_ID will be used to populate the TAR_ID with in TARGET_DATABASE.
(for each STAGING_ID there could be more than one record in the Target table.)
Any records that fail the load process will be flagged as 'error' by populating the ERROR_TABLE with the reason for the failure. After the load has been performed, all error records will be corrected and reloaded.
outline as below:
1. Read all staging table records.
2. Use the STAGING_ID value to determine whether the record already exists with in the TARGET_DATABASE tables.
3. If not exist, create the one with associated records.
4. If exists, compare the staging data to the TARGET_DATABASE data as per the rules. Then update the appropriate data in TARGET_DATABASE.
5. If any records are invalid, update the staging table record with the error message.
6. create a record in MY_LOGS table to indicate successfull completion of the process as well as the number of records processed.
STAGING_TABLE:
STG_ID VARCHAR2(15)
STG_FLD_1 VARCHAR2(5)
STG_FLD_2 NUMBER(10)
STG_ERROR_LOG VARCHAR2(250)
STG_CREATED_DT DATE
STG_VERIFIED_DT DATE
STG_UPDATED_DT DATE
TARGET_TABLE:
TARGET_ID VARCHAR2(15)
TAR_FLD_1 VARCHAR2(5)
TAR_FLD_2 NUMBER(10)
TAR_CREATED_DT DATE
TAR_VERIFIED_DT DATE
TAR_UPDATED_DT DATE
NOW here is the date rules:
If the staging table field is blank, then the target field is NOT updated (except for certain fields where a blank DOES override the target value).
If both staging and target fields have a value, then the date fields will determine which value is retained. Use the first non-blank date when comparing:
- If the record Update date on Staging (STG_UPDATED_DT) is greater than the TAR_UPDATED_DT, overwrite the target value
- If the record record verified date on Staging (STG_VERIFIED_DT) is greater than the TAR_VERIFIED_DT, overwrite the target value
- If the record Created date on Staging (STG_CREATED_DT) is greater than the TAR_CREATED_DT, then overwrite the target value
All errors should be written to the staging table into STG_ERROR_LOG field.
There are many other validations and processing involved in it.
My first thought to this Scenario is to use MERGE INTO statement, but since there are many complex business logics and lot of validations that approach seems to be useless.(specially when it comes to error logging into different table and there are many target tables involved in it). Secondly MERGE INTO is a deterministic approach (either to be completed or rollback) which may cause the whole process to stop when any exceptions occur.
Finally, i am using PL/SQL for this kind of ETL.
The outline is:
Procedure StartÂ….
cusor C is select * from STAGING_TABLE order by STG_ID;
BEGIN
FOR i IN C
BEGIN
SELECT TARGET_ID, STG_FLD_1, STG_FLD_2,TAR_CREATED_DT,TAR_VERIFIED_DT,TAR_UPDATED_DT
INTO vTARGET_ID, vTAR_FLD_1, vTAR_FLD_2, vTAR_CREATED_DT, vTAR_VERIFIED_DT, vTAR_UPDATED_DT;
(check the date rules here.)
bUpdateRec := false;
IF i.STG_UPDATED_DT IS NOT NULL THEN
IF i.STG_UPDATED_DT > vSTG_UPDATED_DT THEN
bUpdateRec := true;
END IF;
ELSIF i.STG_VERIFIED_DT IS NOT NULL THEN
IF i.STG_VERIFIED_DT > vSTG_VERIFIED_DT THEN
bUpdateRec := true;
END IF;
ELSIF i.STG_CREATED_DT IS NOT NULL THEN
IF i.STG_CREATED_DT > vSTG_CREATED_DT THEN
bUpdateRec := true;
END IF;
END IF;
(end date rules)
(process each field)
bUpdateField := false;
IF vTAR_FLD_1 IS NULL THEN
bUpdateField := TRUE;
ELSIF
IF i.STG_FLD_1 IS NOT NULL AND bUpdateRec = TRUE THEN
bUpdateField := TRUE;
END IF;
END IF;
END IF;
IF bUpdateField = TRUE THEN
vTAR_FLD_1 := 'A';
sSQL := 'UPDATE TARGET_TABLE SET ';
sSQL := sSQL || 'TAR_FLD_1 =' || vTAR_FLD_1;
(DO THE SAME THING FOR EACH FIELD AND IN THE END OF THE LOOP EXECUTE sSQL)
ELSE (INSERT THE FEILD)
EXCEPTION
WHEN DUP_VAL_ON_INDEX THEN
Log error on staging table .
(indicate the row or ID that failed in the error). GOTO NEXT_RECORD;
WHEN NO_DATA_FOUND THEN
Log error on staging table.
GOTO NEXT_RECORD;
EtcÂ…
END;
EXEC sSQL;
Commit;
END FOR;
EXCEPTION
WHEN OTHER THEN
--RAISE APPLICATION ERROR AND STOP PROCESSING.
END;
Sorry if I am making it real hard. Do you think that selecting the target fields into variables(SELECT INTO) , and then comparing them with fields in the cursor is good idea ?
What other options do i have ?
It will be great if you explain it with some example.
Thanks!
and Tom said...
this:
(for each STAGING_ID there could be more than one record in the Target table.)
does not seem to make sense, are you sure that is correct? you cannot use select into in that case.
basically - i would do two statements in bulk:
cursor c is
select * from stage where stg_id not in ( select tar_id from target );
cursor c2 is
select stage.*
from stage, target
where stage.stg_id = target.tar_id
and (STG_CREATED_DT > tar_create_dt
or
STG_VERIFIED_DT > tar_verified_dt
or
STG_UPDATED_DT > tar_updated_dt );
begin
open c;
loop
fetch c bulk collect into arrays LIMIT 500;
forall i insert into target save exceptions
process exceptions;
exit when c%notfound;
end loop
close c;
open c2;
loop
fetch c2 bulk collect into arrays LIMIT 500;
forall i update target save exceptions
process exceptions;
exit when c2%notfound;
end loop
close c2;
have as little procedural code as possible.
search for "save exceptions" on this site for discussions as to how save exceptions works with bulk processing.
Rating
(1 rating)
We're not taking comments currently, so please try again later if you want to add a comment.