Skip to Main Content
  • Questions
  • Update production database from ETL Process

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Geraldo.

Asked: February 13, 2019 - 3:59 pm UTC

Last updated: February 14, 2019 - 11:13 am UTC

Version: 18.3.0

Viewed 1000+ times

You Asked

Hello, Ask Tom Team.

I am using a Microsoft SSIS ETL Package to insert data to Oracle (source and destination).

I want to do an incremental load every day but I have to flag the rows that are already loaded in the destination. It is not just select and insert row by row. There are some transformations and conditions based on dates. So, I can not take into account a row_date field because maybe I want some row that arrived today but it has to meet some date condition and maybe it would be load some days in the future if the condition is met.

Would it be good to update flag those records in the source production database (using a SQL task inside the ETL package) so I can load only the rows that meet the condition and flagged as 0, for example?

Thanks in advanced.

and Chris said...

If the logic to identify which rows to extract is complex, then a list of "which rows to process" is a good idea.

Without knowing more about your approach it's hard to pick a "best" way. But a to_process flag or similar is an easy way.

Note Oracle Database doesn't index rows where every indexed column is null. As the number of "to process" rows tends to be small, the following can be an efficient way to find rows to extract:

- Create an index on to_process
- Set this to Y based on your logic for reading them (or default it for new rows)
- Set it to null once you've processed a row

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Review

Geraldo Peralta, February 13, 2019 - 4:58 pm UTC

Great!

Thanks for the quick response. I'll do it.

Review

Geraldo Peralta, February 13, 2019 - 9:13 pm UTC

Chris.

About what you wrote:

Note Oracle Database doesn't index rows where every indexed column is null and - Set it to null once you've processed a row

In my query I have other where conditions. Still is better to set the value to null once processed?
Chris Saxon
February 14, 2019 - 11:13 am UTC

Over time I would the number of to_process rows become a minuscule fraction of the rows in the table.

Thus an index on just this column should be able to efficiently find rows, even if you have other conditions in your where clause.

Review

Geraldo Peralta, February 14, 2019 - 11:42 am UTC

Got it!

Thanks, Chris.

More to Explore

Design

New to good database design? Check out Chris Saxon's full fundamentals class.