Skip to Main Content

Breadcrumb

XMAS_TREE

The AskTOM team is taking a break over the holiday season, so we're not taking questions or responding to comments. Please have a happy and safe festive season and as always, thanks for being a member of the AskTOM community.

Question and Answer

Tom Kyte

Thanks for the question, tahi.

Asked: July 17, 2004 - 6:56 pm UTC

Last updated: July 19, 2004 - 7:22 am UTC

Version: 9R2

Viewed 1000+ times

You Asked

Hi Tom,


This is the Scenario of my problem


This process will read all records in the STAGING_TABLE and process the data into the TARGET_DATABASE tables.

STG_ID will be used to populate the TAR_ID with in TARGET_DATABASE.

(for each STAGING_ID there could be more than one record in the Target table.)

Any records that fail the load process will be flagged as 'error' by populating the ERROR_TABLE with the reason for the failure. After the load has been performed, all error records will be corrected and reloaded.

outline as below:

1. Read all staging table records.

2. Use the STAGING_ID value to determine whether the record already exists with in the TARGET_DATABASE tables.

3. If not exist, create the one with associated records.

4. If exists, compare the staging data to the TARGET_DATABASE data as per the rules. Then update the appropriate data in TARGET_DATABASE.

5. If any records are invalid, update the staging table record with the error message.

6. create a record in MY_LOGS table to indicate successfull completion of the process as well as the number of records processed.

STAGING_TABLE:


STG_ID VARCHAR2(15)
STG_FLD_1 VARCHAR2(5)
STG_FLD_2 NUMBER(10)
STG_ERROR_LOG VARCHAR2(250)
STG_CREATED_DT DATE
STG_VERIFIED_DT DATE
STG_UPDATED_DT DATE



TARGET_TABLE:

TARGET_ID VARCHAR2(15)
TAR_FLD_1 VARCHAR2(5)
TAR_FLD_2 NUMBER(10)
TAR_CREATED_DT DATE
TAR_VERIFIED_DT DATE
TAR_UPDATED_DT DATE


NOW here is the date rules:

If the staging table field is blank, then the target field is NOT updated (except for certain fields where a blank DOES override the target value).

If both staging and target fields have a value, then the date fields will determine which value is retained. Use the first non-blank date when comparing:

- If the record Update date on Staging (STG_UPDATED_DT) is greater than the TAR_UPDATED_DT, overwrite the target value

- If the record record verified date on Staging (STG_VERIFIED_DT) is greater than the TAR_VERIFIED_DT, overwrite the target value

- If the record Created date on Staging (STG_CREATED_DT) is greater than the TAR_CREATED_DT, then overwrite the target value



All errors should be written to the staging table into STG_ERROR_LOG field.

There are many other validations and processing involved in it.

My first thought to this Scenario is to use MERGE INTO statement, but since there are many complex business logics and lot of validations that approach seems to be useless.(specially when it comes to error logging into different table and there are many target tables involved in it). Secondly MERGE INTO is a deterministic approach (either to be completed or rollback) which may cause the whole process to stop when any exceptions occur.

Finally, i am using PL/SQL for this kind of ETL.
The outline is:


Procedure StartÂ….

cusor C is select * from STAGING_TABLE order by STG_ID;

BEGIN
FOR i IN C
BEGIN
SELECT TARGET_ID, STG_FLD_1, STG_FLD_2,TAR_CREATED_DT,TAR_VERIFIED_DT,TAR_UPDATED_DT
INTO vTARGET_ID, vTAR_FLD_1, vTAR_FLD_2, vTAR_CREATED_DT, vTAR_VERIFIED_DT, vTAR_UPDATED_DT;


(check the date rules here.)
bUpdateRec := false;

IF i.STG_UPDATED_DT IS NOT NULL THEN
IF i.STG_UPDATED_DT > vSTG_UPDATED_DT THEN
bUpdateRec := true;
END IF;

ELSIF i.STG_VERIFIED_DT IS NOT NULL THEN
IF i.STG_VERIFIED_DT > vSTG_VERIFIED_DT THEN
bUpdateRec := true;
END IF;

ELSIF i.STG_CREATED_DT IS NOT NULL THEN
IF i.STG_CREATED_DT > vSTG_CREATED_DT THEN
bUpdateRec := true;
END IF;
END IF;
(end date rules)


(process each field)
bUpdateField := false;

IF vTAR_FLD_1 IS NULL THEN
bUpdateField := TRUE;

ELSIF
IF i.STG_FLD_1 IS NOT NULL AND bUpdateRec = TRUE THEN
bUpdateField := TRUE;
END IF;
END IF;
END IF;

IF bUpdateField = TRUE THEN

vTAR_FLD_1 := 'A';

sSQL := 'UPDATE TARGET_TABLE SET ';
sSQL := sSQL || 'TAR_FLD_1 =' || vTAR_FLD_1;

(DO THE SAME THING FOR EACH FIELD AND IN THE END OF THE LOOP EXECUTE sSQL)

ELSE (INSERT THE FEILD)

EXCEPTION
WHEN DUP_VAL_ON_INDEX THEN
Log error on staging table .
(indicate the row or ID that failed in the error). GOTO NEXT_RECORD;
WHEN NO_DATA_FOUND THEN
Log error on staging table.
GOTO NEXT_RECORD;
EtcÂ…

END;

EXEC sSQL;

Commit;
END FOR;
EXCEPTION
WHEN OTHER THEN
--RAISE APPLICATION ERROR AND STOP PROCESSING.
END;



Sorry if I am making it real hard. Do you think that selecting the target fields into variables(SELECT INTO) , and then comparing them with fields in the cursor is good idea ?

What other options do i have ?

It will be great if you explain it with some example.

Thanks!



and Tom said...

this:

(for each STAGING_ID there could be more than one record in the Target table.)

does not seem to make sense, are you sure that is correct? you cannot use select into in that case.


basically - i would do two statements in bulk:


cursor c is
select * from stage where stg_id not in ( select tar_id from target );

cursor c2 is
select stage.*
from stage, target
where stage.stg_id = target.tar_id
and (STG_CREATED_DT > tar_create_dt
or
STG_VERIFIED_DT > tar_verified_dt
or
STG_UPDATED_DT > tar_updated_dt );

begin
open c;
loop
fetch c bulk collect into arrays LIMIT 500;
forall i insert into target save exceptions
process exceptions;
exit when c%notfound;
end loop
close c;

open c2;
loop
fetch c2 bulk collect into arrays LIMIT 500;
forall i update target save exceptions
process exceptions;
exit when c2%notfound;
end loop
close c2;

have as little procedural code as possible.


search for "save exceptions" on this site for discussions as to how save exceptions works with bulk processing.



Rating

  (1 rating)

We're not taking comments currently, so please try again later if you want to add a comment.

Comments

feild by feild ETL

TAHI, July 18, 2004 - 7:10 pm UTC

Tom,

Your statment

"fetch c bulk collect into arrays LIMIT 500;
forall i insert into target save exceptions"

means that i have to create a pl/sql table ...
i.e

type arrays is table of stage%rowtype;
p_data arrays;

.......

fetch c bulk collect into arrays LIMIT 500
forall i in 1 ..p_data.count
insert into target save exceptions .......

is that right ?

Please explain that, how can i validate the date rules inside the loop? Do i have to open cursor c2 inside the cursor c for that thing?

Thanks


Tom Kyte
July 19, 2004 - 7:22 am UTC

yes, you create a plsql table type.


you validate data in the loop by looking at the stuff you just fetched into the array?

You are fetching two arrays here -- one of "new data", one of "data to update", you simply look at the values you processed to "validate" and catch any errors during the insert/update phase.

More to Explore

PL/SQL demos

Check out more PL/SQL tutorials on our LiveSQL tool.

PL/SQL docs

PL/SQL reference manual from the Oracle documentation library