Design question around automatic data change notifications


Chris Saxon

Thanks for the question, Prasad.

Asked: August 08, 2018 - 1:28 pm UTC

Last updated: August 10, 2018 - 2:30 pm UTC

Version: 11g

Viewed 1000+ times

You Asked

Hello Oracle experts. Thank you in advance for going through my question.

In our organisation, we are planning to implement a solution that automatically pushes data changed in our databases out to consumers. These are not high-traffic systems, so the volume and frequency of changes should be modest. At the moment, consumers receive changed data as a batch at the end of the day/week/month. While designing this solution, we stumbled across a few design challenges that some of you may have already solved or come across. We actually have both SQL Server and Oracle databases on premises needing the same solution, so any input on the following question will be helpful for both database types.

Take a simple example of three relational tables, Customer, Accounts and Address, with PK and FK relations to each other as Customer[1]-->[1..*]Accounts[1..*]-->[1]Address. Currently, the consumers receive a report of the data after joining these three tables. In the new solution, if there is a change to (say) the Address table, sending the address details alone will have no context. So we need to do a reverse lookup in the Accounts table to see which Account rows use the changed address, and traverse all the way back to the Customer table. At the end I have the list of all Customers and their associated Accounts that use the newly changed address, which I send to the consumers as a data change notification.

I'm wondering if there are any good methods to achieve the reverse lookup and get the parent tables' primary key values. At the moment I'm thinking of using the redo logs to get the actual changes and establish that the change is in the Address table, then querying the system objects to get the foreign key relationships to the Address table recursively until I reach the top parent table, and finally building a SQL query dynamically to get the AccountNo value from the Accounts table and the CustomerNo value from the Customer table (AccountNo and CustomerNo being the PK columns of the respective tables) for rows that use the changed address. Are there any issues you can think of in terms of generalising this and making the query performant?
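To make the reverse lookup concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle/SQL Server. The column names AccountNo and CustomerNo follow the question; the AddressId column and the sample rows are illustrative assumptions.

```python
import sqlite3

# A minimal sketch of the reverse lookup, using sqlite3 as a stand-in for
# Oracle / SQL Server. AccountNo and CustomerNo follow the question;
# AddressId and the sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerNo INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Address  (AddressId  INTEGER PRIMARY KEY, City TEXT);
    CREATE TABLE Accounts (AccountNo  INTEGER PRIMARY KEY,
                           CustomerNo INTEGER REFERENCES Customer,
                           AddressId  INTEGER REFERENCES Address);
    INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Address  VALUES (10, 'London'), (11, 'Paris');
    INSERT INTO Accounts VALUES (100, 1, 10), (101, 2, 10), (102, 2, 11);
""")

def customers_for_changed_address(address_id):
    """Traverse back from the changed Address row to the parent keys."""
    return conn.execute("""
        SELECT c.CustomerNo, a.AccountNo
        FROM   Accounts a
        JOIN   Customer c ON c.CustomerNo = a.CustomerNo
        WHERE  a.AddressId = ?
        ORDER  BY c.CustomerNo, a.AccountNo
    """, (address_id,)).fetchall()

print(customers_for_changed_address(10))   # → [(1, 100), (2, 101)]
```

The same join, run against the real source tables, is what any dynamically built query would ultimately have to produce.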

Hope it makes sense. Please let me know if you need any further clarification on this.

Kind regards,
Prasad Panchagnula

and Chris said...

So you want your consumers to join changes with the relevant parent data?

Usually with change data capture solutions you have an initial load process. The consumer gets a copy of all tables they'll use from the source system. Then they can join the changes with their data store however they want.

Trying to build a generic solution to get the values from the source system on demand sounds bad to me.

You could follow the FK relationships using *_cons_columns views. But this can run into problems such as circular FK references. And it can end up building queries that select from almost every table in the database! That's a bad omen for performance.

You also have to stop it following irrelevant relationships. For example, addresses could also link to countries. And accounts might chain onto currencies, etc.

How will your generic solution avoid this?
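To illustrate both pitfalls, here is a sketch of a cycle-safe walk over an FK graph with an explicit allow-list of relevant tables. The fk_neighbours mapping is hand-coded for the sketch; in Oracle you would derive it from the *_constraints and *_cons_columns views.

```python
from collections import deque

# Hypothetical FK graph: table -> tables it is linked to by a foreign key
# (in either direction). In Oracle you would build this from the
# *_constraints / *_cons_columns dictionary views.
fk_neighbours = {
    "ADDRESS":  {"ACCOUNTS", "COUNTRY"},
    "ACCOUNTS": {"ADDRESS", "CUSTOMER"},
    "CUSTOMER": {"ACCOUNTS"},
    "COUNTRY":  {"ADDRESS"},      # an irrelevant relationship to skip
}

RELEVANT = {"ADDRESS", "ACCOUNTS", "CUSTOMER"}   # explicit allow-list

def tables_to_visit(changed_table):
    """Breadth-first FK walk; the seen set guards against circular FKs."""
    seen, queue = {changed_table}, deque([changed_table])
    while queue:
        table = queue.popleft()
        for neighbour in fk_neighbours.get(table, ()):
            # Skip tables outside the allow-list and already-visited ones.
            if neighbour in RELEVANT and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(sorted(tables_to_visit("ADDRESS")))   # → ['ACCOUNTS', 'ADDRESS', 'CUSTOMER']
```

Without the allow-list, the walk happily follows COUNTRY and any other incidental relationship, which is exactly how a generic solution ends up selecting from almost every table in the database.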

I'm not aware of products that will help you here. And writing your own is likely to lead to a world of pain!


Comments

Incremental

Gh, August 08, 2018 - 6:51 pm UTC

You could build a data lake where all changes are logged as simple rows in fact tables, then build a set of materialized views on top, refreshed incrementally using analytics.


A reader, August 08, 2018 - 9:27 pm UTC

Thanks for your response Chris.

"So you want your consumers to join changes with the relevant parent data?"

This is the first time we are looking to use the CDC design pattern; we are in the very early stages and getting mixed up between traditional batch data extracts and CDC. So far we have discussed using batch data extracts as the initial data loads, but these extracts won't have primary keys since they are de-normalized. So any subsequent CDC updates will still need to be in de-normalized format, hence my original question about getting the PK values of the parent tables.

After your response, I read a little about the CDC design pattern. In every case, CDC is described in the context of data warehouses or analytics (i.e. internal systems only). Is it common practice to send CDC data changes to external clients and ask them to replicate part of our tables and do the joins themselves? And how do you handle future schema/table enhancements? Thanks.


- Prasad P
Chris Saxon
August 09, 2018 - 9:55 am UTC

What exactly are your consumers? What is their purpose and why do they need to reconstruct the full data set?

these batch extracts won't have primary keys since they are de-normalized

Any particular reason to exclude the primary keys? And, assuming you mean sequence-assigned PKs (or similar), the data will still include the business key, right?

Can you join on that?

Prasad Panchagnula, August 09, 2018 - 1:47 pm UTC

"What exactly are your consumers? What is their purpose and why do they need to reconstruct the full data set?"

The data is financial in nature: investments, stocks, assets, etc. Sometimes the data is sent to marketing teams along with performance figures. The consumers don't need to reconstruct the full data set, but the CDC design pattern seems to rely on mirrored tables in both the source and target systems, which makes it easy to push (synchronise) changed data in one table to the target system and let the target system join the data. In this way we hope to send our data changes as soon as possible. We are still contemplating whether to send the data at table level and let the target systems do the joins, or to join the data up ourselves and then send it.

"Any particular reason to exclude the primary keys? And, assuming you mean sequence assigned PKs (or similar), the data will still include the business key, right?"
Correct, they still have the business keys but not the sequence-number primary keys. In some tables the business keys are the PKs; in others they are not. So, in those tables where sequence numbers are used as the PKs, we need to put some context around the change in order to define what data it relates to, hence the reverse traversal to the parent tables for the parents' business keys.

- Prasad P
Chris Saxon
August 09, 2018 - 4:05 pm UTC

It depends on where you want to put your effort.

If you're doing the joins on the source system, it's

- More load on the source system
- More work for you to put the propagation in place
- If the consumers want to change the data in their report, you need to recode the propagation.

Whereas if you do the joins on the consumers, it's

- Less load on the source system
- Less work for you to put the propagation in place
- Consumers have the flexibility to write their own reports (assuming they don't want data from new tables).

I'm struggling to see what benefit you get from doing the joins yourself on the source.

Prasad Panchagnula, August 09, 2018 - 4:22 pm UTC

"I'm struggling to see what benefit you get from doing the joins yourself on the source. "

This is what we do at the moment: we send de-normalized data as a batch at the end of the month (or fortnight).

We believe (or at least someone within our organisation believes) that we can push data changes faster by adopting the CDC design pattern. The only thing I'm still not sure about is whether anyone has implemented this CDC pattern to push data changes to external parties, and whether anyone has done so for data of the nature I'm dealing with, i.e. financial/asset-related.
Chris Saxon
August 10, 2018 - 8:22 am UTC

CDC should allow the consumers to get changes faster.

Forget your current approach. Push changes for all the tables each consumer cares about as they come in. You may want/need to build a transformation step at the end to convert the data to a format the consumer wants.

This is the basis of ETL & data warehousing. There are various ETL design patterns available. But the exact details of how you do this depends on your goals. Understand what you're trying to achieve by doing this before rushing in with CDC!
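As a sketch of that final transformation step, the function below flattens one hypothetical change event into the shape a consumer might expect. The event fields (operation, commit_time, new_values) are illustrative assumptions, not the output of any particular CDC tool.

```python
# Sketch of a transformation step at the end of a CDC pipeline: reshape a
# raw change event into the flat record a consumer expects. All field names
# here are illustrative assumptions.
def to_consumer_format(change):
    """Flatten one CDC event (operation + new row image) for delivery."""
    row = change["new_values"]
    return {
        "event":       change["operation"],     # INSERT / UPDATE / DELETE
        "customer_no": row["CustomerNo"],
        "account_no":  row["AccountNo"],
        "changed_at":  change["commit_time"],
    }

event = {
    "operation":   "UPDATE",
    "commit_time": "2018-08-10T08:22:00Z",
    "new_values":  {"CustomerNo": 1, "AccountNo": 100, "AddressId": 10},
}
print(to_consumer_format(event)["event"])   # → UPDATE
```

The point is that the push side stays table-by-table and generic, while any consumer-specific shaping lives in one small, replaceable function like this.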

A reader, August 10, 2018 - 10:33 am UTC

Thank you Chris for all your inputs. Much appreciated for your time in responding to my queries as well.

- Prasad P
Chris Saxon
August 10, 2018 - 2:30 pm UTC

Happy to help
