Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question.

Asked: December 09, 2021 - 8:05 am UTC

Last updated: January 20, 2022 - 2:47 pm UTC

Version: 19C

Viewed 1000+ times

You Asked

Hi Tom and Team,

I am looking to remove the invalid records before applying PK

create table OLD_ABC
(
  PRODUCT,
  CLASS,
  SEASON_KEY,
  DATE,
  PRICE
)

KEY -- (PRODUCT,  CLASS,  SEASON_KEY, DATE)

TABLE records -- 
Ex.    ABC , 121 , S ,1-JAN-21, 15
       PQR , 121 , W ,1-JAN-21, 15 -- Incorrect record because of wrong SEASON NAME 


Due to PK on all above field, it was allowing to load the data for same class with invalid Season due to wrong data provided by Source.

CLASS Table holds SEASON Information as it's 1:1 relation

Class table records - 
  CLASS  NAME   SEASON_NAME  SEASON_KEY (LAST DIGIT OF CLASS)
  121    PL1       S         1
  122    SL1       W         2
  123    IR1       A         3



In new database,the column Season has been removed as part of normlization and to avoid incorrect data to be get loaded.

create table NEW_ABC
(
  PRODUCT,
  CLASS,
  DATE,
  PRICE
)
PRM KEY --  (PRODUCT, CLASS,  DATE) -- Currently PK is disabled to load historical records and also to analyze data between both databases.

  PRODUCT  CLASS   DATE        PRICE
  PQR      121     1-JAN-21     15
  ABC      121     1-JAN-21     15


Problem statement -

The incorrect records need to remove from OLD database as well as new database. (ex. PQR , 121 , W ,1-JAN-21, 30)

The incorrect data easy to from OLD_ABC by joining with CLASS TABLE but that's not the case with NEW_ABC Table as there is no option to join CLASS Table to find incorrect records.

The option I think is -
OLD_ABC -
create hsah key for each Incorrect record in OLD_ABC table by excluding SEASON_KEY column and crete similar ORA_hash key on NEW_ABC and remove the matching key records.
ex -

OLD_DATABASE -

select O.PRODUCT, O.CLASS, O.SEASON_KEY, O.DATE, O.PRICE, ORA_HASH(O.PRODUCT||','||CLASS||,||DATE||,||PRICE) as OLD_INVALID_KEY
  from OLD_ABC O join CLASS C where O.CLASS = C.CLASS and O.SEASON_KEY <> C.SEASON_KEY


NEW_DATABASE

select O.PRODUCT, O.CLASS,  O.DATE, O.PRICE, ORA_HASH(O.PRODUCT||','||CLASS||,||DATE||,||PRICE) as NEW_KEY
   from NEW_ABC


Then compare -
OLD_ABC.OLC_INVALID_KEY = NEW_ABC.NEW_KEY -- If matches -- delete the record


Main problem is there are alomost 15-20 tables having such incorrect data and all contains different number of columns (from 8 - 25) and each table contains over a million+ records ( historical data)

Please advise any alternate best way (other than comparing records using hash key) to clean the invalid data dynamically from OLD and NEW Table.

** The OLD_ABC and NEW_ABC are on 2 different database servers and New database has diff table name (proper naming convention) and can only access only through DB link between NEW TO OLD.


Will use below process to Copy affected records into another table for future reference.

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4274289639999


Thanks,
Sachin


and Chris said...

First question:

Are you sure you need to remove all the invalid rows? Or is it just that the season is incorrect and should be updated?

If it's the latter, normalizing resolves the issue for you!

Assuming you do need to delete the rows, you could try something like this:

delete new_abc n
where ( n.product, n.class, n.date, n.price ) in (
  select o.product, o.class, o.season_key, o.date, o.price
  from   old_abc@olddblink o join class@olddblink c 
  where o.class = c.class and o.season_key <> c.season_key
)


You'll need to test whether this is faster than using a hash. As-is I suspect this method will be faster because:

- You avoid the work calculating the hash
- It can use the existing PK index

Adding a virtual column containing the hash and indexing that might work out better though.

Ultimately you need to test and try yourself!

clean the invalid data dynamically

I'm unsure what you mean by dynamically, but realistically it's highly likely you'll need to hand-code these scripts. Data cleansing like this often throws up unexpected issues you have to resolve manually. The effort you spend writing a dynamic framework for a one-off operation like this is often better spent just writing the SQL to do the changes!

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Sachin, January 20, 2022 - 8:11 am UTC

Sorry for the late reply.

Thank you for your response and suggestions.

I have updated the delete record query as there was a mismatch in the number of columns.

Regarding Delete or update --
If the table contains two records with details having different SEASON KEYs then, remove the record with incorrect SEASON KEY Or update the Season Key only if there is only one record with an invalid season Key.

Thanks,
Sachin
Chris Saxon
January 20, 2022 - 2:47 pm UTC

If you need to remove duplicate rows, read:

https://blogs.oracle.com/sql/post/how-to-find-and-delete-duplicate-rows-with-sql

I'd try removing the duplicates first, then running the updates to fix any remaining invalid entries.

More to Explore

SQL

The Oracle documentation contains a complete SQL reference.