Database Independence Vs Database Dependence
Question and Answer

Tom Kyte

Thanks for the question.

Asked: April 11, 2012 - 7:54 am UTC

Last updated: April 14, 2012 - 10:49 am UTC

Version: 11.2.0.3


You Asked

Tom -


I am 100% on your side on Database Dependent Applications. I am not a big fan of the database being treated like a black box.

But the other side (the so-called object purists / database independent folks) attribute the success of PeopleSoft / Siebel to DB independence.

How do I counter this, given that PeopleSoft / Siebel are now "Oracle" products? Is there any roadmap to make them DB dependent applications?


and Tom said...

Those big database independent applications:

run on very specific versions of the LIMITED databases they choose to support
with very specific dot releases of those databases
with very specific patches applied to those databases
with very specific initialization parameters that must be set in those databases
on a limited set of specific platforms (OS/hardware)
with specific versions of those platforms supported

To call them database independent is wrong; they physically wrote code to support a handful of databases, on a smaller handful of versions, on specific platforms. If they were database independent, you could run them on pretty much any database - that is the promise of database independence, isn't it?

Do those applications support informix? mysql? postgres?

And they each have separate code bases for separate databases. Meaning - they have to implement things differently for Oracle than for DB2 than for SQL-Server.

And they each re-invent database features over and over again (increased development time) rather than use built-in features. They re-implement integrity constraints (and, by the way, often do it wrong - it is non-trivial to do that sort of thing in the middle tier, as the sketch below shows), and so on.
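
For example, the classic check-then-insert approach to enforcing uniqueness from the application tier (the table and column names below are made up for illustration):

-- session 1: select count(*) from accounts where email = 'x@y.com';  -- sees 0
-- session 2: select count(*) from accounts where email = 'x@y.com';  -- also sees 0
-- session 1: insert into accounts (email) values ('x@y.com'); commit;
-- session 2: insert into accounts (email) values ('x@y.com'); commit;
--
-- both middle tier "checks" passed, and you have a duplicate anyway.
-- the database can close that window declaratively:
alter table accounts add constraint accounts_email_uk unique (email);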

They spend a lot of time trying to be database 'somewhat flexible' and make certain performance decisions (decisions to not perform as well as they could) to do it. And they need to have people that are very knowledgeable of ALL of the databases in order to develop algorithms that work the same in all of the databases.

That last bit is the biggest thing - do your purists know enough to safely develop a 'database independent' application? I doubt it (I'm not saying they are dumb, I'm saying they don't understand the differences between the databases and how one algorithm on one database might not work the same at all on another)

Here is a short example:

let's say to be database independent - you decide to write your own replication. You will timestamp every record during insert/update of the record. This will allow you to identify 'new' or 'changed' records easily. So, what you do after getting everything timestamped and having the code in place to maintain the timestamps is:

a) remember what time it is right now - remember this in a table somewhere (assume clock synchronization is not a problem for this exercise)

b) pull all data from remote site and commit;

c) later, to do a refresh - you read out the time from (a) - a time that is older than your first 'refresh'.

d) you remember what time it is now

e) you pull all of the changes from the remote system whose timestamps are greater than or equal to the time from (a) (a rough SQL sketch of this follows below)
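
Roughly, in SQL (the table, column and database link names here are made up; this is the naive version of the algorithm, before we get to the problem with it):

-- (a) remember "now" in a control table somewhere
insert into refresh_control (last_refresh) values (current_timestamp);

-- (b) the initial pull
insert into local_table select * from remote_table@remote_site;
commit;

-- (c)/(d) on each refresh: read the previously saved time into :last_refresh,
--         then store the new "now" in its place
-- (e) pull everything stamped at or after :last_refresh
merge into local_table l
using (select * from remote_table@remote_site
        where last_modified >= :last_refresh) r
   on (l.id = r.id)
 when matched then update set l.payload = r.payload,
                              l.last_modified = r.last_modified
 when not matched then insert (id, payload, last_modified)
                       values (r.id, r.payload, r.last_modified);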


Now, ask the database agnostics which databases this will work in and which ones this will not work in. And ask them if there are some databases that this will work in sometimes and not work in other times - depending on how the database is configured.


The bottom line is, for even something as simple sounding as that - there are databases this will work in (db2), there are databases this will not work in (Oracle) and there are databases where this will work sometimes and not others depending on how the database was configured (sqlserver).

Ask them to explain HOW this is possible - what causes these differences. Ask them to explain HOW this algorithm needs to be changed - for each database (they'll all need a slightly different approach!!)


There are no current plans that I'm aware of to make the applications mentioned "db dependent" however - there would be a bit of backlash over something like that.


Comments

A reader, April 12, 2012 - 3:28 am UTC

Tom,

Could you please explain why in your example this approach doesn't work using Oracle? (And why it does using DB2)
Tom Kyte
April 12, 2012 - 7:54 am UTC

Let us open this up - see if people can answer this.


So, anyone, what is wrong with the above logic? Why won't it work in Oracle, why will it work in DB2, under what circumstances will it work in SQL Server and then not work in SQL Server?


mfz, April 12, 2012 - 8:00 am UTC

To the above reader: you may want to read about "multi-versioning" in Oracle.

To get a jumpstart, see
http://www.databasejournal.com/features/oracle/article.php/10893_1403591_1/Expert-One-on-One-Oracle-Pt-1.htm

Alexander, April 12, 2012 - 9:01 am UTC

I'm guessing this is all about read consistency and row versioning...

Oracle - has always had this. So a row modified in between the time the changes are pulled might be rolled back to a previous version of that block (and the change missed)?

SQL Server - has "allow_snapshot_isolation" and "read_committed_snapshot" options for databases now, so depending on whether you enable those it might work or it might not?

DB2 - doesn't have read consistency, so it's not a problem and it will work?
Tom Kyte
April 12, 2012 - 9:27 am UTC

correct.

Here is the long story....


Ok, you have a trigger or some application code that sets the timestamp on a record. The timestamp is set AS OF the time of modification (not the time of commit). So, if I update a record at 11:59 - the record will be timestamped 11:59. If my transaction does not commit until 12:01 - it will still be timestamped 11:59.
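
As a minimal Oracle sketch of that kind of trigger (table and column names made up):

create or replace trigger remote_table_stamp
before insert or update on remote_table
for each row
begin
  -- stamped at modification time, NOT at commit time
  :new.last_modified := systimestamp;
end;
/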

Now, assume this flow of events in Oracle:

11:59:00 - update and timestamp record X in some transaction - do not commit yet.

12:00:00 - starting the 'replication', so you remember the time

12:00:02 - you start the data pull - you do insert into local_table select * from remote_table. Due to read consistency you will read each block of the remote table AS OF 12:00:02 - you will only see the committed records, nothing else. In particular - you will NOT see the 11:59:00 update - NOR will you be blocked by it. Multi-versioning will roll back the change made to the block to the last committed state prior to your statement beginning. So you read right over the lock with a consistent read.

12:01:00 - you commit the transaction that modified record X. This is not relevant to the data pulling process. It cannot, will not see this change this time - no matter what.

12:05:00 - you finish your data pull, you commit the insert into local_table command.

13:00:00 - you decide to synchronize the two systems again. You read out the last timestamp (12:00:00) and remember the current timestamp (13:00:00) in the table.

13:00:01 - you merge into local table select * from remote table where timestamp >= 12:00:00. you miss the record that was updated at 11:59:00


Now, what would happen in DB2?



11:59:00 - update and timestamp record X in some transaction - do not commit yet.

12:00:00 - starting the 'replication', so you remember the time

12:00:02 - you start the data pull - you do insert into local_table select * from remote_table. If your read hits the updated record, you will BLOCK and wait. If your read doesn't hit the updated record yet, you'll just keep reading (see below for what happens if you don't hit that updated record before the next step)

12:01:00 - you commit the transaction that modified record X. This is VERY relevant to the data pull process for one of two reasons. Reason 1: it is blocked on the updated record. It will now unblock and read the currently committed version of the record - it will in fact see the 11:59:00 change. Reason 2: it hasn't yet gotten to the updated record - but when it does - it will read the currently committed version (no read consistency, no multi-versioning). In either case - in DB2 you would read the record that was updated but not committed at 11:59:00 - by waiting for it if necessary.

12:05:00 - you finish your data pull, you commit the insert into local_table command.

13:00:00 - you decide to synchronize the two systems again. You read out the last timestamp (12:00:00) and remember the current timestamp (13:00:00) in the table.

13:00:01 - you merge into local table select * from remote table where timestamp >= 12:00:00. you miss nothing, because you block and wait for things, possibly you deadlock, possibly you block for hours and hours - whatever




In CURRENT RELEASES of sql server (so this is something that has changed, it didn't use to be true, you have to keep up with feature sets...) you can either do multi-versioning (similar to Oracle but a totally different architecture for doing it) or blocking read committed (like DB2).
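
For reference, these appear to be the database-level switches Alexander mentioned above, plus a way to see which mode a given database is in ('mydb' is a placeholder name):

-- turn on the multi-versioning behaviors for a database
alter database mydb set allow_snapshot_isolation on;
alter database mydb set read_committed_snapshot on;  -- typically needs the database quiesced to take effect

-- check which mode a database is currently in
select name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
  from sys.databases
 where name = 'mydb';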




So, the question becomes - how do you fix it?


Well, in DB2, you are "fixed" for this particular algorithm - as long as you can live with the intermittent possible deadlocks, blocking issues and the like. The functionality is there; performance - eh...

In sql server you might be fixed, but if not, you have to figure out what table in sqlserver tells you what the timestamp of the oldest outstanding transaction is (I don't know what that table is or even if such a table exists, but you need it). You'll read the minimum of the current time and the times from this table. That is the time you'll remember.


In Oracle, you'll use v$transaction and you'll select the minimum of systimestamp and the start time of the oldest open transaction. That is the time you'll remember. In this example, that would have been 11:59:00 instead of 12:00:00, and the refresh at 13:00:00 would have gotten that change.
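
One way to write that, as a sketch (start_time in v$transaction is a varchar2 in MM/DD/YY HH24:MI:SS format; sysdate is used here instead of systimestamp to keep the types simple):

select least(sysdate,
             nvl(min(to_date(start_time, 'MM/DD/RR HH24:MI:SS')), sysdate)) as refresh_time
  from v$transaction;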


In short, you need three different sql statements for three different databases. You need different logic for sqlserver (you have to run a query to figure out what mode the database is in, not so in DB2 or Oracle) - and more/MOST importantly - you need a developer that knew in the first place that this was going to be necessary. In my experience, as a developer, I say "good luck" with that last one.

The reason I say good luck is not because I think developers are "dumb", I just know from my own past experiences - from years gone by, before I even joined Oracle, when I was the lead 'database engineer' on our development team - that I was not smart enough to know all of this stuff at the tender age of 27-28. I learned it eventually - but there is a lot of stuff to know. I had read all of the manuals for sybase, informix, oracle, ingres, gupta sqlbase, sql/ds, db2 and was the 'lead' because I appeared to know more than anyone else on the team - but I didn't yet know how much I did not know. I found that out over time.

And today, I think there is even more stuff to know.

Which leads me back to - it is even more important today to use the heck out of the tools you buy.

commits which don't commit

Gabe, April 13, 2012 - 6:16 pm UTC

In my experience, as a developer, I say "good luck"

Even when database "independence" is implemented in rdbms-specific layers, porting the code set is easier said than done. Consider something as simple as managing your transactions:

-- prep sqlserver

create table t (n integer)
go

create procedure dbo.p @a integer
as
begin
  begin transaction;
  insert into t values (@a);
  commit transaction;
end
go

create procedure dbo.q
as
begin
  begin transaction
  exec p 1;
  commit transaction;
  begin transaction
  exec p 2;
  rollback transaction;
  begin transaction
  exec p 3;
  commit transaction;
end;

-- prep oracle

create table t (n integer);

create or replace procedure p (a integer)
as
begin
  insert into t values (a);
  commit;
end;
/

create or replace procedure q
as
begin
  p(1);
  commit;
  p(2);
  rollback;
  p(3);
  commit;
end;
/


The code looks straightforward enough (not that I'm advocating for commits inside procedures) and one might expect similar results.

--------------------------------------------------
-- exec sqlserver

C:\>sqlcmd -S xxx -d xx -b
1> select * from t;
2> go
n
-----------

(0 rows affected)
1> exec q;
2> go

(1 rows affected)
1> select * from t;
2> go
n
-----------
          1
          3

(2 rows affected)
1>

--------------------------------------------------
-- exec oracle

C:\>sqlplus yyy@yy

SQL*Plus: Release 11.2.0.1.0 Production on Fri Apr 13 17:04:29 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from t;

no rows selected

SQL> exec q

PL/SQL procedure successfully completed.

SQL> select * from t;

         N
----------
         1
         2
         3

SQL>


Tom Kyte
April 14, 2012 - 10:49 am UTC

I've always been flabbergasted at the way sqlserver (first sybase and then MS) dealt with transactions and transaction semantics. It feels so arbitrary - rolling back in triggers, this nonsense above where a COMMIT (a COMMIT!!!!) is rolled back. Ugh. I cannot imagine why that is not considered a bug.
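
The mechanics behind Gabe's output, sketched in terms of @@TRANCOUNT (SQL Server's nested-transaction counter), as I understand them:

begin transaction;        -- @@TRANCOUNT goes 0 -> 1
  begin transaction;      -- @@TRANCOUNT goes 1 -> 2: just a counter bump, not a new transaction
  insert into t values (2);
  commit transaction;     -- @@TRANCOUNT goes 2 -> 1: nothing is actually made durable here
rollback transaction;     -- rolls back to the OUTERMOST begin, undoing the insert of 2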

No database independent application at all

Le, April 14, 2012 - 11:32 am UTC

In my own experience, anyone who believes he or she could end up with a database independent application is actually not a database-aware person at all. No exceptions. They know little about databases and treat the database as something file-like, where they store data and get it back. Of course, those attempts all failed in the end. Interesting.

Data

Enrique Aviles, April 16, 2012 - 7:40 am UTC

The main purpose of the vast majority of applications is to process data. This data is kept (and hopefully processed) in a database, so by definition all applications are database dependent.

I understand the dependent/independent debate, but the bottom line is that database independence is just a (bad) idea.

Database Tools

Nitin Goyal, April 19, 2012 - 7:44 am UTC

Considering the craze in the marketplace for object-relational tools like Hibernate, which seem to be the "in thing" for making applications portable "across databases" (whatever that means ;-) ), I think it's a fair ask for you to highlight that as well. Most of the programmers I have come across feel that using Hibernate will make it so much easier for them to port their applications across databases. All of them should go through the replication example you outline above.

Mark, May 13, 2014 - 3:13 pm UTC

I love threads like this. I'm curious if anyone has any good, striking examples, like the replication one, where the example is confined to a single database server.