I'm guessing this is all about read consistency and row versioning...
Oracle - Has always had this. So a row modified while the changes are being pulled might be rolled back to a previous version of that block?
SQL Server - Has "allow_snapshot_isolation" and "read_committed_snapshot" options for databases now, so depending on whether you enable those it might work, or it might not?
DB2 - Doesn't have read consistency, so it's not a problem and will work?
April 12, 2012 - 9:27 am UTC
correct.
Here is the long story....
Ok, you have a trigger or some application code that sets the timestamp on a record. The timestamp is set AS OF the time of modification (not the time of commit). So, if I update a record at 11:59 - the record will be timestamped 11:59. If my transaction does not commit until 12:01 - it will still be timestamped 11:59.
Now, assume this flow of events in Oracle:
11:59:00 - update and timestamp record X in some transaction - do not commit yet.
12:00:00 - starting the 'replication', so you remember the time
12:00:02 - you start the data pull - you do insert into local_table select * from remote_table. Due to read consistency you will read each block of the remote table AS OF 12:00:02 - you will only see the committed records, nothing else. In particular - you will NOT see the 11:59:00 update - NOR will you be blocked by it. Multi-versioning will roll back the change made to the block to the last committed state prior to your statement beginning. So you read right over the lock with a consistent read.
12:01:00 - you commit the transaction that modified record X. This is not relevant to the data pulling process. It cannot, will not see this change this time - no matter what.
12:05:00 - you finish your data pull, you commit the insert into local_table command.
13:00:00 - you decide to synchronize the two systems again. You read out the last timestamp (12:00:00) and remember the current timestamp (13:00:00) in the table.
13:00:01 - you merge into local table select * from remote table where timestamp >= 12:00:00. You miss the record that was updated at 11:59:00 - its timestamp falls before the 12:00:00 cutoff, so it is never picked up.
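The Oracle timeline above can be sketched as a tiny simulation. Everything here is hypothetical (row names, the HH:MM:SS strings, the `consistent_read` helper) - it only models the rule stated above: a read-consistent query at time T sees a row version only if it was committed before T, and the incremental merge filters on the modification timestamp.

```python
# Each remote row is (pk, modified_ts, committed_ts); times are HH:MM:SS
# strings, which compare correctly with plain string comparison.
remote = [
    ("X", "11:59:00", "12:01:00"),  # updated at 11:59, committed at 12:01
    ("Y", "11:30:00", "11:31:00"),  # ordinary row, long since committed
]

def consistent_read(rows, as_of):
    """Oracle-style consistent read: only versions committed before as_of."""
    return [r for r in rows if r[2] <= as_of]

# 12:00:02 - first pull: X is invisible (its transaction is still open).
first_pull = consistent_read(remote, "12:00:02")
assert [r[0] for r in first_pull] == ["Y"]

# 13:00:01 - incremental merge using the remembered timestamp 12:00:00.
# X is committed by now, but its timestamp (11:59:00) fails the filter.
second_pull = [r for r in consistent_read(remote, "13:00:01")
               if r[1] >= "12:00:00"]
assert second_pull == []  # record X is silently lost
```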
Now, what would happen in DB2?
11:59:00 - update and timestamp record X in some transaction - do not commit yet.
12:00:00 - starting the 'replication', so you remember the time
12:00:02 - you start the data pull - you do insert into local_table select * from remote_table. If your read hits the updated record, you will BLOCK and wait. If your read doesn't hit the updated record yet, you'll just keep reading (see below for what happens if you don't hit that updated record before the next step)
12:01:00 - you commit the transaction that modified record X. This is VERY relevant to the data pull process for one of two reasons. Reason 1: it is blocked on the updated record. It will now unblock and read the currently committed version of the record - it will in fact see the 11:59:00 change. Reason 2: it hasn't yet gotten to the updated record - but when it does - it will read the currently committed version (no read consistency, no multi-versioning). In either case - in DB2 you would read the record that was updated but not committed at 11:59:00 - by waiting for it if necessary.
12:05:00 - you finish your data pull, you commit the insert into local_table command.
13:00:00 - you decide to synchronize the two systems again. You read out the last timestamp (12:00:00) and remember the current timestamp (13:00:00) in the table.
13:00:01 - you merge into local table select * from remote table where timestamp >= 12:00:00. You miss nothing, because you block and wait for things - possibly you deadlock, possibly you block for hours and hours - whatever.
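The read-committed case reduces to one bit of arithmetic: with no multi-versioning, the scan blocks on record X's row lock, so the effective moment X is read is whichever comes later - when the scan reaches it or when its transaction commits. A minimal sketch (both times are hypothetical):

```python
x_commit_ts = "12:01:00"     # when record X's transaction commits
scan_reaches_x = "12:00:30"  # when the pull's scan happens to reach X (assumed)

# Blocking on the row lock means the effective read time of X is the
# later of the two moments - so the committed 11:59 update is seen.
effective_read = max(scan_reaches_x, x_commit_ts)
assert effective_read == "12:01:00"
```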
In CURRENT RELEASES of sql server (so this is something that has changed - it didn't used to be true, you have to keep up with feature sets...) you can either do multi-versioning (similar to Oracle, but with a totally different architecture for doing it) or read committed (like DB2).
So, the question becomes - how do you fix it?
Well, in DB2, you are "fixed" for this particular algorithm - as long as you can live with the intermittent deadlocks, blocking issues and the like. The functionality is there; performance - eh...
In sql server you might be fixed, but if not, you have to figure out what table in sql server tells you the timestamp of the oldest outstanding transaction (I don't know what that table is, or even if such a table exists, but you need it). You'll read the minimum of the current time and the times from that table. That is the time you'll remember.
In oracle, you'll use v$transaction and you'll select the minimum of systimestamp and the start times of the open transactions recorded there. That is the time you'll remember. In this example, that would have been 11:59:00 instead of 12:00:00 - and the refresh at 13:00:00 would have gotten that change.
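The fix above boils down to: remember the minimum of "now" and every open transaction's start time, instead of just "now". A small sketch of that computation - the variable names and times are hypothetical, and in Oracle the open-transaction start times would come from v$transaction:

```python
now = "12:00:00"                 # wall-clock time when the pull starts
open_txn_starts = ["11:59:00"]   # record X's transaction began at 11:59, still open

# Remember the oldest of "now" and all open transaction start times;
# HH:MM:SS strings compare correctly as plain strings.
safe_ts = min([now] + open_txn_starts)
assert safe_ts == "11:59:00"

# The 13:00:00 refresh then filters with "where timestamp >= :safe_ts",
# so the row stamped 11:59:00 passes the filter once it commits.
assert "11:59:00" >= safe_ts
```

Note that because the filter is `>=` and the remembered time is pushed backwards, some already-pulled rows may be read again - the merge makes that harmless.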
In short, you need three different sql statements for three different databases. You need different logic for sql server (you have to run a query to figure out what mode the database is in; not so in DB2 or Oracle) - and more/MOST importantly - you need a developer who knew in the first place that this was going to be necessary. In my experience, as a developer, I say "good luck" with that last one.
The reason I say good luck is not because I think developers are "dumb". I just know from my own past experiences - from years gone by, before I even joined Oracle, when I was the lead 'database engineer' on our development team - that I was not smart enough to know all of this stuff at the tender age of 27-28. I learned it eventually - but there is a lot of stuff to know. I had read all of the manuals for sybase, informix, oracle, ingres, gupta sqlbase, sql/ds, db2 and was the 'lead' because I appeared to know more than anyone else on the team - but I didn't yet know how much I did not know. I found that out over time.
And today, I think there is even more stuff to know.
Which leads me back to - it is even more important today to use the heck out of the tools you buy.