best way to derive deltas when there is nothing set up on source

Question and Answer

Connor McDonald

Thanks for the question, Jess.

Asked: June 12, 2017 - 7:32 pm UTC

Last updated: June 29, 2017 - 2:22 am UTC

Version: 11gR1

Viewed 1000+ times

You Asked

We are looking at a gradual migration of a legacy system running on 11gR1. We do not have control over the source system (although their DBAs are willing to accommodate small requests we might have). The data model is being changed drastically, but we're unable to do a one-shot migration. We will get a backup-based cut of the database to give us a head start, and then deltas will be replicated from the source system while the legacy and the new one are running in parallel (though no information will be going from new to legacy). Multipart question on deltas...

1. We're looking at Data Guard for replicating deltas. However, there is nothing set up on the source system to help us: there is no CDC or GoldenGate or anything. The DB is massive (double-digit TB), not all tables have a last_update column (and the option of adding one to every table and reindexing seems rather daunting). What is the best way to derive the deltas? We're contemplating parsing the redo logs using LogMiner to get the DML we could read and understand (there should be no DDL changes other than partition creation/exchange). We've also looked into SCN watermarking, which looks like it would produce false positives. In the absence of paid features and/or things that are not trivial to set up on source, what is the recommended way to get a meaningful set of deltas?
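For illustration, the kind of LogMiner pass we have in mind would be roughly the following (the log path, schema and table names are placeholders; with TDE the wallet would need to be open wherever the mining runs, and the dictionary option depends on where the logs are mined):

  BEGIN
    -- register an archived log to mine (placeholder path)
    DBMS_LOGMNR.ADD_LOGFILE(
      logfilename => '/arch/legacy_arch_12345.arc',
      options     => DBMS_LOGMNR.NEW);
    -- start LogMiner, here using the online catalog as the dictionary
    DBMS_LOGMNR.START_LOGMNR(
      options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
  END;
  /

  -- pull out the DML we care about (schema name is a placeholder)
  SELECT scn, timestamp, operation, seg_owner, table_name, sql_redo
  FROM   v$logmnr_contents
  WHERE  seg_owner = 'LEGACY_APP'
  AND    operation IN ('INSERT','UPDATE','DELETE');

  BEGIN
    DBMS_LOGMNR.END_LOGMNR;
  END;
  /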

2. As part of migration, we'd like to tag records as having been migrated, delete records we don't need to take forward, etc. Since we have a physical standby and just Data Guard (not Active), we think we need to pause for migration, updates, etc., then flash it back, which doesn't look promising, as we'd lose the data we wrote. We think we may need to have a 'migration' schema on the ultimate target db and keep track of these things there (and nuke the schema once migration is done). Have we missed something, or is there no easier way to go about it?

3. The ultimate target db (the one with the new data model) will be 12cR2. We think our staging server where migration scripts will run should be 11gR1 to match the source. Do you agree, so it's like-for-like, or is there room to make it 12c for more feature richness?

Thanks as always!

and Connor said...

Yeah, if you can't change the source (or at least not much), and you can't use GoldenGate, then you're not in for a fun time.

It would be worth talking to your local Oracle account rep, because (I think) it's possible to get GoldenGate licenses on an interim basis for upgrade cases, but please don't take my word for it. But obviously GoldenGate is a natural fit here.

I'm not sure what you mean by "Data Guard for replicating deltas". You'll have a replica of the source system, but you'd still be none the wiser as to what has changed and when it changed, without opening up the Data Guard node (read only or read write) and running comparison scripts.

If you do need to head down the path of data comparison scripts, then this post may help

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:9534169900346876704

I've had to do similar in the past, and whilst we were successful, it was a lot of plain old hard work, ie, writing comparison scripts on a table by table basis - because some tables had a last_update column, others had a sequential primary key we could use to pick up inserts, and others had nothing, so it was a full scan of source/dest each time, etc... and in some cases all of it needed to be SCN consistent.
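For what it's worth, the full-scan flavour of those comparison scripts can usually be done in one pass per table pair with the old GROUP BY trick, something like the sketch below (table and column names are made up, and you'd typically run both sides AS OF the same SCN if consistency matters):

  SELECT pk_col, col1, col2,
         COUNT(src) AS src_rows,
         COUNT(dst) AS dst_rows
  FROM  (SELECT pk_col, col1, col2, 1 AS src, TO_NUMBER(NULL) AS dst
           FROM legacy_table
         UNION ALL
         SELECT pk_col, col1, col2, TO_NUMBER(NULL), 1
           FROM staging_copy)
  GROUP BY pk_col, col1, col2
  HAVING COUNT(src) <> COUNT(dst);

Any row that comes back exists in one table but not the other (or differs between them), which gives you the delta to investigate.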

It...is....not....fun.... :-)

More "out there" options would be Streams, or a logical standby and try to intercept the incoming LCR's etc .... but all of that to me sounds like high risk, high effort.

Sorry I don't have any great answer for you.


Comments

A similar experience

Stew Ashton, June 19, 2017 - 7:30 am UTC

I'll jump in here because I've done something somewhat similar.

First, Connor provided a link to another asktom thread where I mentioned a tool I wrote to synchronize tables. I won't repeat the link here, but I suggest you read the thread to the end.

I have one big question: are you going to apply deltas to the "new" tables, and simultaneously modify them in the new system? How can that possibly work? At the very least, each row needs to be "owned" and modified by only one system, either legacy or new.

I would try to separate "legacy" rows into separate tables or partitions, so the delta apply mechanism would not override changes made by the new system.

Second question: as Connor mentioned, do the deltas need to be "SCN consistent"? That would be because of foreign keys, in which case they need to be deferrable.
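For example, a foreign key has to be declared deferrable up front, along these lines (names here are invented):

  ALTER TABLE ticket ADD CONSTRAINT ticket_user_fk
    FOREIGN KEY (user_id) REFERENCES app_user (user_id)
    DEFERRABLE INITIALLY DEFERRED;

With that in place you can apply parent and child changes in any order within a transaction, and the constraint is only checked at commit.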

Concerning your point 1.

- last_update columns are dangerous because the time in the column is the time of the change, not the time of the commit. If you query the changes since noon, any changes made before noon but committed after noon will not show up.

- ORA_ROWSCN tells you the SCN of the latest change to the block, not to the row. You will get false positives but is that a big problem? You will still have to scan the whole table because ORA_ROWSCN cannot be indexed. I have never done "replication" using ORA_ROWSCN so I won't say any more.
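A typical watermark query would look roughly like this (placeholder table name; without ROWDEPENDENCIES the SCN is tracked per block, hence the false positives, and the predicate still forces a full scan):

  SELECT t.*, ORA_ROWSCN
  FROM   some_legacy_table t
  WHERE  ORA_ROWSCN > :last_processed_scn;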

If you have to scan both tables to compare, it would be better to have them in the same database! Comparing large tables over a dblink is a huge pain. You could then apply the delta to the local target and to the remote target on the new system.

If you do need SCN consistency, my tool should be compatible with that.

Personally, I can think of a third party tool that might help you considerably, but I'm not sure I should mention it here.

Best regards, Stew Ashton
Connor McDonald
June 20, 2017 - 5:32 am UTC

Feel free - we're all friends here :-) or I'll just delete the review :-)

Jess, June 26, 2017 - 1:37 am UTC

Hi Connor,
Thanks for your reply.

"Data Guard replicating deltas" was meant to indicate that we intend to pick up any changes to the source system by having them replicated to the staging server. As you say though, it's picking out the exact bits of what happened that's the interesting part.

Thanks for the link to the other thread... You're right about the high risk stuff, but we are where we are. At the moment, we're thinking that the way forward is to let Data Guard apply the changes, but also grab the archive logs and do something with LogMiner to get the DML out. That should give us an idea of what's changing. Though since source/target have TDE enabled and all tablespaces are encrypted, the logs are encrypted too, so we're a bit at a loss...

It's the "comparison scripts" notion that makes me uneasy. To compare, say, table by table, you need a copy of said table before and after. If all we have is the target to which Data Guard replicates (and, obviously, the new production to which we're migrating the data), then there is no comparison as such--once DG applies the changes, there is no more 'before'. We can keep some shadow tables (in the new prod, temporarily) with some PKs and dates, but that won't handle updates in any sort of graceful way, which is why we thought of ROWSCN, as false positives shouldn't matter much--it'd be worse if we missed something.

Looked into GoldenGate. It appears one could get a fixed-term license (either a year or 6 months, the latter costing 70-80% of the annual cost). Alternatively, there is a subscription model if you're running it through Oracle Cloud, which I'm not sure is possible in our case.


Stew Ashton,
Thanks, I did see your 'COMPARE_SYNC' post off Connor's link. I'll take some time this week to set up a test environment and give it a go. As I understand it, though, it would throw another server into the mix (since we can't connect to prod, the staging server would get changes via Data Guard, and we'd need a copy of the staging server from before the changes to do the comparison). I'm also curious how it performs on 80-odd million row unpartitioned tables... We are definitely in for the fun times.

Re your question: "are you going to apply deltas to the "new" tables, and simultaneously modify them in the new system? How can that possibly work? At the very least, each row needs to be "owned" and modified by only one system, either legacy or new."
It is a staged migration. Say the system lets users create all sorts of support tickets. The first thing being migrated are tickets of type A. This means that, starting with a cutover date, no new tickets of type A can be opened in the old system--only the new system will let you create them. However, type A tickets already open in the old system won't be migrated until they are closed. Similarly, new users can be added only in the legacy system. This means they need to be migrated to the new system. Then, if they want to open type A tickets, they'll do it in the new, else in the old system. Over time, more ticket types will get migrated to the new system until there is nothing left in the legacy. So no, the same activities are not happening in 2 systems at once, nor is there data flow from the new system to the legacy.

The tables aren't in the same database--they can't be. The staging server (to which DG is replicating) is a physical standby (well, we'll convert it to a snapshot standby). It can't be a logical standby because of unsupported datatypes. We're looking at excluding those, but that may not be possible. For us to keep a copy of the "as is" data to compare with changes after DG logs have been applied, we need another copy of the dataset. The database is 30-odd TB. It's not trivial to just get another server for comparison.
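For reference, the snapshot standby round trip we have in mind is the standard one, roughly as below - a sketch only, assuming flashback is enabled on the standby and redo apply has been stopped:

  -- on the standby: stop redo apply and convert
  ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
  ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;
  ALTER DATABASE OPEN;

  -- ... run read-write migration/comparison work ...

  -- later: discard the local changes and go back to being a physical standby
  SHUTDOWN IMMEDIATE
  STARTUP MOUNT
  ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

Redo keeps arriving while it is a snapshot standby, but it is only applied once the database is converted back.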


Re: 3rd party tool, I have looked at quite a few, and I am not certain I saw anything that could genuinely help in terms of providing replication and spitting out exactly what's been changed... It's possible I missed something in my research, so any suggestions would be appreciated.

Connor McDonald
June 28, 2017 - 1:25 am UTC

Some more thoughts for you.

1)

"We do not have control over the source system (although their DBAs are willing to accommodate small requests we might have)"

rings alarm bells for me. You're undertaking a significant application restructure of a 30TB system, and the DBAs are going to be quibbling over what can and cannot be done on the source. Man, I hate it when DBAs are like that - they are meant to be enablers, not obstacles. I'd be raising that issue first and foremost. For a migration of this size and complexity, everyone needs to be rowing the boat in the same direction.

2)

"The first thing being migrated are tickets of type A. This means that, starting with a cutover date, no new tickets of type A can be opened in the old system--only the new system will let you create them. However, type A tickets already open in the old system won't be migrated until they are closed"

seems to be worth a rethink to me. Perhaps a staged migration where type A tickets move in their entirety to the new system is a less risky option? You obviously still have data sync challenges, but at least you don't have to face things like:

- reporting on type A must cross both systems
- enquiries on a single type A must have intelligence to know which system to talk to etc.

3)

(This is related to 1 above).

Since you are going to be running both systems during the migration process, it's self-evident (and should be to your DBAs) that the source system is going to need changes (some non-trivial) to accommodate this. For example, you might consider AQ on several structures (eg user addition, like the example you provided) to capture those events and propagate them to the new system.

IMHO, you're going to need that flexibility on the *source* system if you are going to get this to work.
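To make the AQ idea concrete, a minimal sketch of capturing a "user added" event might look like the following (all names and the payload shape are invented, and it assumes the source DBAs will allow a queue and a trigger):

  CREATE TYPE user_event_t AS OBJECT
    (user_id NUMBER, event_type VARCHAR2(30), event_time DATE);
  /

  BEGIN
    DBMS_AQADM.CREATE_QUEUE_TABLE(
      queue_table        => 'user_event_qt',
      queue_payload_type => 'user_event_t');
    DBMS_AQADM.CREATE_QUEUE(
      queue_name  => 'user_event_q',
      queue_table => 'user_event_qt');
    DBMS_AQADM.START_QUEUE(queue_name => 'user_event_q');
  END;
  /

  CREATE OR REPLACE TRIGGER user_added_capture
  AFTER INSERT ON app_user
  FOR EACH ROW
  DECLARE
    l_opts  DBMS_AQ.ENQUEUE_OPTIONS_T;
    l_props DBMS_AQ.MESSAGE_PROPERTIES_T;
    l_msgid RAW(16);
  BEGIN
    -- record the business event at the moment it happens
    DBMS_AQ.ENQUEUE(
      queue_name         => 'user_event_q',
      enqueue_options    => l_opts,
      message_properties => l_props,
      payload            => user_event_t(:NEW.user_id, 'USER_ADDED', SYSDATE),
      msgid              => l_msgid);
  END;
  /

The new system (or a propagation job) then dequeues those messages and applies whatever transformation it needs.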

Jess, June 28, 2017 - 5:39 am UTC

Thanks Connor!

Re: (3), first and foremost...
Now I am concerned... You are anticipating non-trivial changes to source, like AQ, etc., to propagate changes. What are we missing, as we weren't expecting that? The plan was to replicate via Data Guard, which should give us all the changes. We're working on being able to exclude unsupported columns so we can have a logical standby (we suspect we'll need supplemental logging turned on on the source too). This is why we're trying to derive what's changed from the logs (for the migration; DG will apply the changes as it always does). And that's why I am asking for ideas on deriving DML from archive logs off a TDE database...
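The source-side piece we have in mind is just the standard supplemental logging setting, plus skipping the unsupported objects on the logical standby side - roughly as below (schema/table names are placeholders, and the SKIP call would be made while SQL Apply is stopped):

  -- on the source: make sure redo carries identifying key columns
  ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

  -- on the logical standby: skip objects with unsupported datatypes
  BEGIN
    DBMS_LOGSTDBY.SKIP(
      stmt        => 'DML',
      schema_name => 'LEGACY_APP',
      object_name => 'TABLE_WITH_UNSUPPORTED_TYPES');
  END;
  /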

Setting up AQ/CDC... We haven't even raised that, knowing what the response would be. We're looking at 25+ tables just to get "customer"... Plan A is definitely minimal changes to source...

=======


Re: Control over source system, agree with you fully, but when a system has been supported by a vendor for a decade, poorly, and the company is finally bringing it in-house, the vendor going out the door ain't going to go above and beyond. They'll do small bits, and anything more will have cost/time unacceptable to the company. Is what it is. The key here is that source DBAs aren't going to volunteer ideas/help. If we give them a list of specific tasks, they'll coast it. Whatcha gonna do....

Re: Staged migration, we've raised numerous risks (including things you pointed out) to no avail. The business have formally accepted the risks, and so it's up to us to make the technology support their choices. The workflow between 2 systems is radically different, so migrating open tickets would be non-trivial (e.g., current state might not exist), and tickets can be open for months. Moreover, there is talk of private beta, whereby both systems will accept type A tickets... There are, thankfully, no reconciliation requirements or reqs to push data from new system to the old...

Thanks again!

Connor McDonald
June 29, 2017 - 2:22 am UTC

"The plan was to replicate via Data Guard, which should give us all the changes'

Yes, but the issue (if I'm interpreting your descriptions correctly) is not *capturing* all of the changes, it is *identifying* and *applying* those changes to your new system.

In any given (say) hour, 1000 changes will get applied via redo logs to your Data Guard system. Out of those 1000 changes (as I understand it) you need to:

- pick out the 647 of those that are relevant to you

- with each one, take whatever operation it was (eg "add a row to table X") that occurred in the source, and potentially apply a transformation to the logic, because in your new system "add a row to table X" is now "update table Y, delete from Z, add a row to table W"

And as you say, you're starting from an old vendor system that isn't a pretty sight :-) A trivial (fictional) example could be user maintenance.

From the application perspective, we are updating a user's details. What if the vendor system implements that with a delete of the old record and an insert of a new record?

When you're intercepting that in the redo log stream, you don't see "Oh, this is user maintenance"; you see "User has been deleted" followed by "User has been added", etc. Whereas if you have access to make modifications to the source system, you could (perhaps) record that the "following operations are user maintenance" in a table, etc.
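To make that concrete: about the best you can do from mined redo alone is pull the row changes back together by transaction and then infer the business operation yourself, roughly as below (the schema name is a placeholder, and the inference step is entirely yours to write):

  SELECT xidusn, xidslt, xidsqn,          -- transaction identifier components
         scn, operation, seg_owner, table_name, sql_redo
  FROM   v$logmnr_contents
  WHERE  seg_owner = 'LEGACY_APP'
  AND    operation IN ('INSERT','UPDATE','DELETE')
  ORDER  BY xidusn, xidslt, xidsqn, scn;

A delete of a user row and an insert of a new one in the same transaction is still just two row changes - mapping that back to "user maintenance" is application knowledge the redo stream does not carry.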

I'm just letting you know my thoughts here - I'd hate you to get halfway down the track and hit insurmountable problems. Have a good long think before you get started.