Hi Tom and Team,
I am looking to remove the invalid records before applying PK
create table OLD_ABC
(
PRODUCT,
CLASS,
SEASON_KEY,
DATE,
PRICE
)
KEY -- (PRODUCT, CLASS, SEASON_KEY, DATE)
TABLE records --
Ex. ABC , 121 , S ,1-JAN-21, 15
PQR , 121 , W ,1-JAN-21, 15 -- Incorrect record because of wrong SEASON NAME
Due to PK on all above field, it was allowing to load the data for same class with invalid Season due to wrong data provided by Source.
CLASS Table holds SEASON Information as it's 1:1 relation
Class table records -
CLASS NAME SEASON_NAME SEASON_KEY (LAST DIGIT OF CLASS)
121 PL1 S 1
122 SL1 W 2
123 IR1 A 3
In new database,the column Season has been removed as part of normlization and to avoid incorrect data to be get loaded.
create table NEW_ABC
(
PRODUCT,
CLASS,
DATE,
PRICE
)
PRM KEY -- (PRODUCT, CLASS, DATE) -- Currently PK is disabled to load historical records and also to analyze data between both databases.
PRODUCT CLASS DATE PRICE
PQR 121 1-JAN-21 15
ABC 121 1-JAN-21 15
Problem statement -
The incorrect records need to remove from OLD database as well as new database. (ex. PQR , 121 , W ,1-JAN-21, 30)
The incorrect data easy to from OLD_ABC by joining with CLASS TABLE but that's not the case with NEW_ABC Table as there is no option to join CLASS Table to find incorrect records.
The option I think is -
OLD_ABC -
create hsah key for each Incorrect record in OLD_ABC table by excluding SEASON_KEY column and crete similar ORA_hash key on NEW_ABC and remove the matching key records.
ex -
OLD_DATABASE -
select O.PRODUCT, O.CLASS, O.SEASON_KEY, O.DATE, O.PRICE, ORA_HASH(O.PRODUCT||','||CLASS||,||DATE||,||PRICE) as OLD_INVALID_KEY
from OLD_ABC O join CLASS C where O.CLASS = C.CLASS and O.SEASON_KEY <> C.SEASON_KEY
NEW_DATABASE
select O.PRODUCT, O.CLASS, O.DATE, O.PRICE, ORA_HASH(O.PRODUCT||','||CLASS||,||DATE||,||PRICE) as NEW_KEY
from NEW_ABC
Then compare -
OLD_ABC.OLC_INVALID_KEY = NEW_ABC.NEW_KEY -- If matches -- delete the record
Main problem is there are alomost
15-20 tables having such incorrect data and all contains different number of columns (from 8 - 25) and each table contains over a million+ records ( historical data) Please advise any alternate best way (other than comparing records using hash key) to clean the invalid data
dynamically from OLD and NEW Table.
** The OLD_ABC and NEW_ABC are on 2 different database servers and New database has diff table name (proper naming convention) and can only access only through DB link between NEW TO OLD.
Will use below process to Copy affected records into another table for future reference.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4274289639999 Thanks,
Sachin
First question:
Are you sure you need to
remove all the invalid rows? Or is it just that the season is incorrect and should be
updated?
If it's the latter, normalizing resolves the issue for you!
Assuming you do need to delete the rows, you could try something like this:
delete new_abc n
where ( n.product, n.class, n.date, n.price ) in (
select o.product, o.class, o.season_key, o.date, o.price
from old_abc@olddblink o join class@olddblink c
where o.class = c.class and o.season_key <> c.season_key
)
You'll need to test whether this is faster than using a hash. As-is I suspect this method will be faster because:
- You avoid the work calculating the hash
- It can use the existing PK index
Adding a virtual column containing the hash and indexing that
might work out better though.
Ultimately you need to test and try yourself!
clean the invalid data dynamicallyI'm unsure what you mean by dynamically, but realistically it's highly likely you'll need to hand-code these scripts. Data cleansing like this often throws up unexpected issues you have to resolve manually. The effort you spend writing a dynamic framework for a one-off operation like this is often better spent just writing the SQL to do the changes!