Data Model related Question -- history information.
Question and Answer

Connor McDonald

Thanks for the question.

Asked: June 08, 2009 - 4:47 pm UTC

Last updated: April 07, 2017 - 2:10 am UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Hi Tom,

In my latest project I have come across the following data model.

All the application tables are modelled on similar lines.

E.g., if we have a customer table with the following information:

Column          Datatype
------------------------------
Customer_id     number
Account_type    varchar2(20)
Address_line1   varchar2(2000)
Office_ID       number
Salary          number
Eff_date        date
Inactive_date   date

The customer table will always hold one record per customer ID, and all the updates and history are stored in a common table, say Appl_hist, which has the following structure:

Column      Datatype
---------------------------
Table_id    number
Column_id   number
char_from   varchar2(4000)
char_to     varchar2(4000)
Num_from    number
Num_to      number
Date_from   date
Date_to     date

Each table and column is given a unique ID, and every change to every column goes in as a record in this table.

My questions:

1) Isn't this another case of the generic way to store data?

2) Do you see any merits in storing the history information this way?

3) If the currently active data is queried most of the time and the history information is queried rarely, how should we design the model?

Thanks,
Rajesh

and Tom said...

1) yes and no.

The history table is a history, not the current data. Current data needs data integrity constraints, rules, etc. History is *known* to have been correct at that point in time - so technically (since it is read only) we can relax that a bit.

So, it is a history of changes to a column. It is stored very generically (one table for all) and is virtually unquery-able - it takes great SQL prowess to materialize the values of a row as they stood 12 months ago. It is also definitely non-performant in a generic sense: it would be very hard to query up everyone who lived in Virginia and had an age between 30 and 50 twelve months ago - no indexing scheme would be helpful.
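To make that concrete, here is a hypothetical sketch of "show me customer 12345 as of a year ago" against the generic table. It assumes Appl_hist also carries a row key (row_pk) and a change timestamp (change_ts) - the posted structure has neither - and that column_ids 2 through 5 were assigned to the customer columns:

select max(decode(h.column_id, 2, h.char_from)) as account_type,
       max(decode(h.column_id, 3, h.char_from)) as address_line1,
       max(decode(h.column_id, 4, h.num_from))  as office_id,
       max(decode(h.column_id, 5, h.num_from))  as salary
  from appl_hist h
 where h.table_id = 1         -- hypothetical id assigned to CUSTOMER
   and h.row_pk   = 12345
   -- per column: the oldest change made AFTER the as-of date; its
   -- "from" value is what the row held at the as-of date
   and h.change_ts = ( select min(h2.change_ts)
                         from appl_hist h2
                        where h2.table_id  = h.table_id
                          and h2.row_pk    = h.row_pk
                          and h2.column_id = h.column_id
                          and h2.change_ts > add_months(sysdate, -12) );

And even that returns NULL for any column untouched in the last year - those have to be patched in from the current CUSTOMER row with an outer join. That is one query per table, per as-of date, and nothing in it is usefully indexable.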

Since it is hardly ever - if ever - queried (my guess, as most audit trails are wont to be)... it is probably "ok".

I prefer a shadow table personally. Does that take a bit more space? Sure, but so what - the overhead of maintaining that column-by-column history (slow-by-slow inserts into it after exhaustively comparing columns to columns...) and its complete inability to actually be queried make me stay away.

(in 11gR1 we have Flashback Data Archive, which automates this history process - it uses a row image, not column-by-column changes)


2) not really, not a fan, but in the grand scheme of things - this is something I'd not raise a huge fuss about (unless I was forced to use it myself :) )

3) I'd still go with a shadow table - that had the user that modified the data, and the effective date range the row value was valid for.

Easy to query
comprehensive
easy to maintain
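For contrast, a minimal sketch of the same "as of" question against a shadow table (assuming a CUSTOMER_HIST table that carries the full row image plus the modifying user and an effective date range):

select *
  from customer_hist
 where customer_id = 12345
   and date '2008-06-01' between eff_date
                             and nvl(inactive_date, date '9999-12-31');

Easy to index, and the row comes back whole.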


Rating

  (14 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thank you....!!

A reader, June 09, 2009 - 7:07 pm UTC

Thanks for the response Tom.

1. I think the shadow table makes a lot of sense in this case as we do have effective and inactive dates for the record in most of the tables. But.....

What if we store the data in the same table with an "active_flag" that indicates whether the record is the currently active one? That way, if the user wants the current data, he queries the record with Active_flag = 'Y', and if he wants the data for a given date, he can use eff_date and inactive_date - without the need to look into two tables.

Thanks,
Rajesh.


Tom Kyte
June 10, 2009 - 11:50 am UTC

If you do that, then there will be some overhead in retrieving current data - you'll have to add active_flag to every index, and if you full scan the data you will have to read over the history.

You'll have issues dealing with constraints as well. What if, going forward, you have a constraint "x must be greater than zero"? In the past x could be less than or equal to zero, but no longer - you'd have to write constraints in a fashion that obeys the active_flag as well. Unique constraints would be a hassle - you'd have to program them as "create unique index I on t( case when active_flag = 'Y' then c1 end, case when active_flag = 'Y' then c2 end )". Primary keys - well, forget it. And since you don't have true unique or primary key constraints, you get no foreign keys and no data integrity...

I would not suggest it in general.
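That said, the unique-index workaround Tom describes does work; a minimal runnable sketch:

create table cust
( customer_id  number      not null
, active_flag  varchar2(1) not null
);

-- only rows with active_flag = 'Y' contribute a key; the others map to
-- an entirely-null entry, which a b-tree index does not store, so at
-- most one active row per customer_id is permitted
create unique index cust_one_active
   on cust ( case when active_flag = 'Y' then customer_id end );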

If you want to do it in a single table, we've already done that for you since 8i - it is called "workspace manager", search for dbms_wm on this site for examples.
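A sketch of the Workspace Manager route, assuming a table named CUSTOMER ('VIEW_WO_OVERWRITE' keeps every intermediate version rather than just the latest):

begin
   -- versioning happens in-table, behind views that dbms_wm creates
   dbms_wm.EnableVersioning( 'CUSTOMER', 'VIEW_WO_OVERWRITE' );
end;
/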


History data

Vikas Atrey, June 09, 2009 - 11:47 pm UTC

Yes, Rajesh, you can do that, provided the table is not updated very frequently and you do not expect much data growth from storing the audit of changes in the table itself, along the same lines as an SCD2 (slowly changing dimension, type 2).

A reader, June 10, 2009 - 11:27 am UTC

What are the merits?

<quote>
3) I'd still go with a shadow table - that had the user that modified the data, and the effective date range the row value was valid for.

Easy to query
comprehensive
easy to maintain
</quote>
Tom Kyte
June 10, 2009 - 3:00 pm UTC

I'm not sure what your point was?

Thank you...!!!

A reader, June 10, 2009 - 12:58 pm UTC

thanks for the detailed explanation...
Your site and your books are fantastic too..!


auditing

A reader, June 10, 2009 - 7:40 pm UTC

Tom:

One book on auditing talks negatively on shadow tables due to:

1. This effectively doubles the number of tables in the entire database.
2. It may be difficult to turn logging on or off for individual tables.
3. It is usually only possible to record one set of values - those that were written to the database - which means you must look elsewhere to find the original value.
4. If the structure of any application table is changed, then the audit table must also be changed to keep in line.
5. Due to the potentially large number of audit tables, it is difficult to have a single online enquiry capable of showing the contents of those tables. The usual solution is therefore to have a separate screen to show the contents of each audit table, which requires a great deal of effort, and the online enquiry is clumsy.

He prefers the generic approach. Do you agree with the above?


Tom Kyte
June 11, 2009 - 8:07 am UTC

1) so what

2) how so???? how would it be harder than the above approach???!?!?! You either have the logging done in the application code (stored procedure we hope) or in a trigger. You turn it on, you turn it off.

I fail to see how a shadow table implementation would be any easier OR harder to turn on/off versus implementation "X"

3) that doesn't compute, I'm not sure what is trying to be said there at all. You obviously have access to the before AND after images (you are the application, you are modifying the data, you obviously have both regardless of the database or application development environment you are using)

4) duh, as in big old "no duh". Your audit trail is an integral part/component of your schema - as important as ANY OTHER TABLE. And in fact, it might not look exactly like your production schema.

Schema changes possible:

a) drop a column. Go ahead, drop it from your production current schema, but I'd say you should make a design decision to KEEP it in the audit trail. Net change: nothing

b) add a column. Go ahead, add it in both (no brainer)

c) rename a column. Decision time. Probably rename it in both.

d) change datatype. This is harder - you might change a varchar2 to a number because all current data is now a number, but historical data might not be a number (in the past, the letter 'A' was valid in that column but not anymore). Here you would want to probably rename the existing audit trail column to column_name_before_date (before date would be yyyymmdd or something) and then add a new column to the audit trail
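A sketch of option (d) with hypothetical names - the historical varchar2 column keeps the old data under a dated name, and a fresh number column takes over:

alter table customer_hist
   rename column account_code to account_code_before_20090611;

alter table customer_hist
   add ( account_code number );   -- new history rows use this one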


But in short - NO DUH. If you make a schema change it will have an impact on your application - part of which is your audit trail. So be it - I don't see the point (other than blatant laziness)

5) again - HUH? You cannot envision an application that could query up any table? (hint, you run dozens of them every day)

No, I don't agree with anything listed.



A reader, June 10, 2009 - 10:16 pm UTC

Book name, author, ISBN please.
Does the author give any benchmarking results?

audit

A reader, June 11, 2009 - 9:36 am UTC

Tom:

On another thread you simply say to make a design decision based on the requirement:

If the reqt is to TRACK only changes to columns then go with one generic table.

If the reqt is to audit the WHOLE ROW (even if one column only changed) go with shadow table.


What you said above is mostly true, except that, practically speaking, it is much harder to manage a 100-table schema than a 50-table one and to keep them in sync. I can also have one user screen to query the one generic table, while you need 50 query screens to query the 50 audit tables.

It is easier to code for a shadow table but it is harder to manage/maintain and query using a user GUI screen.
Tom Kyte
June 11, 2009 - 11:04 am UTC

it is not harder to manage 100 tables versus 50 tables. If you think it is, please describe in painstaking detail - do not make blanket statements.

The keyword in the comments was REQUIREMENT. You have seen many times on this site, Sam (SMK), that I vehemently disagree with people's so-called requirements. You can log the changes column by column - but it is not the approach I would suggest. Shadow tables - easiest to implement, maintain and especially USE.


Describe what synchronization issues you would see - again, in PAINSTAKING detail. The way I see it, you either transactionally maintain your audit trail or you do not; it matters not if it is ONE TABLE or 50 TABLES. You are either doing it right or not, it is a binary thing. There are no "synchronization" issues - you maintain the audit trail as you maintain the source data: when you commit, either BOTH are committed or NEITHER is.

it is not harder to manage a shadow table, if you say it is - you better list the reasons - I see none, so I say "you are blowing smoke"

it is not harder to maintain a shadow table, ditto on what I just said.

it is not harder to query (in fact the OPPOSITE IS TRUE, that single generic table is a $#!$#! to query - it is harder than heck to query - not to mention performance just does not exist against it), it is EASIER to query.


audit

sam, June 11, 2009 - 1:41 pm UTC

1. This is where you suggested creating a shadow table if the requirement is to audit the WHOLE row, and using a generic table if not.


http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:59412348055

<< 3) depends - think about it again. If you were auditing all columns every time, a table per table would make sense (I call that a shadow table). If you are auditing JUST the columns that were modified and doing it in a fashion that has a row per modified column, then a single table makes sense. >>



2. You have always said that the more databases you have, the more DBA work you need, and that your preference is to have all applications in one database (different schemas).

So creating 100 extra auditing tables would not be more effort? You just have to make sure they are always in sync - which is not much effort.


But, Let us say the user wants to you give him the ability to query the audit trail.

You would now need 200 procedures: for each table, one for the presentation and one for searching based on user input and displaying the results.

With a generic design you only need 2 procedures: one for presentation and one for searching one table based on user input and displaying the results.
Tom Kyte
June 11, 2009 - 3:24 pm UTC

1) I don't get your point. I said "if you want to do X, here is how to do X"

I don't see anything there that says "this is the way you must do it", not by me


2) more databases IS NOT more tables. How did you make that leap of intuition? How do you even equate the two ideas??

If you cannot figure out how to write an ad-hoc query screen, and you truly believe you would need to write 200 procedures, well, ok...

I would not write 200 procedures.

But seriously, Sam... Think about it. Does SQL*Plus have to be recoded and recompiled every time you add a table you might want to query? Or SQL Developer? Or <billions of tools>?

Do you think a software developer just might be able to do this rather simple thing - "dump a table to a screen, perhaps after giving the end user a dialog to let them filter the data first"?

In a single routine? Just maybe?
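A bare-bones sketch of that single routine - one function that serves up any audit table as a ref cursor for a generic screen to render (the change_date ordering column is an assumption, and dbms_assert vets the identifier so the dynamic SQL cannot be abused):

create or replace function show_audit( p_table in varchar2 )
   return sys_refcursor
as
   l_rc sys_refcursor;
begin
   -- simple_sql_name raises an error for anything that is not a
   -- legal identifier, blocking injection via p_table
   open l_rc for 'select * from '
                 || dbms_assert.simple_sql_name( p_table )
                 || ' order by change_date desc';
   return l_rc;
end;
/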



Very true....

A reader, June 11, 2009 - 2:09 pm UTC

<quote>

it is not harder to query (in fact the OPPOSITE IS TRUE, that single generic table is a $#!$#! to query - it is harder than heck to query - not to mention performance just does not exist against it), it is EASIER to query.

</quote>

Can't agree more... (looking at the data and the task that I have at hand...) With the shadow table concept, to get a history record, all you need is a single query and the date for which you want the record.
With the generic table concept, where you break up each row into individual columns, the size of the table increases in no time, and to retrieve the same data as from a shadow table you would have to use max(decode) and group by over a large set of rows in the generic table.

Thanks,
Rajesh.




Audit

sam, June 11, 2009 - 4:00 pm UTC

Tom:

Exactly, we are back to the requirement. If the customer says he wants to audit only changes, why audit the whole row with the shadow table concept? If you do that, then you have to compare each row to the previous one to see the changes.

Actually, I would use Workspace Manager: a simple command, and no need to reinvent the wheel - though it saves all these versions into the same table.

I see your point about the query tool. I think you can use dynamic SQL and let the user select which table he wants to view and the criteria, build the SQL on the fly, and display the results in one procedure.

I do not understand why people seem to be concerned with the size of a table. Does the Oracle engine care whether a table has 1,000 records or 1 million records (unless you are doing a full scan for searches)?
Tom Kyte
June 11, 2009 - 4:57 pm UTC

and I'd be right back to

"Mr customer, you do not really want what you ask for and let me tell you why"

that is our job, to actually explain to the 'business' why their technical solution (which they should not be coming up with in the first place) isn't what they really want. You can sit back and just do what you are told if you like - and you'll pay for it many times over as you get stuck with totally unworkable stuff. I don't.


I would use Workspace Manager IF and ONLY IF I had benchmarked the actual application, since there are very real performance considerations with that implementation. It works well if you always access via an index and touch a small subset of rows at a time.


... Does the Oracle engine care whether a table has 1,000 records or 1 million records (unless you are doing a full scan for searches)? ...

and with your single generic table - guess what you would be doing, frequently: full scanning (indexing that generic table for ad-hoc queries is not really possible)

Stay with the Shadow Table

Kevin, June 21, 2009 - 6:39 pm UTC

I have been chasing history in an Oracle database since 1994, when, on a job, my client had a problem with a BIG customer. In my travels I have read many a paper on what would be cool, what might be doable, what people actually do, and what Oracle provides for solutions. I have concluded that, for the time being, the best solution is an audit-style whole-row copy, aka Tom's Shadow Table.

There will always be people who resist the idea. I sometimes think people like to argue simply because they did not think of the idea, or because it is something they have never done before, or because they don't want to change anything, or maybe just because they like to hear the sound of their own voices. The generic solution described is an old one. It works. But if you want to talk about practicality, my experience is that it is practically useless for anything real. Tom gave some excellent examples of why.

For those who don't know what a shadow table is, here are two basic shadow tables one might create depending upon need. Obviously the second has slightly more code involved in maintaining it, but it offers correspondingly more functionality and ease of use at query time.

--
-- root table
--
create table dept
(
    dept_id number not null
  , dept_name varchar2(30) not null
)
/

--
-- shadow table example 1
--
create table dept_a
(
    dept_id number not null
  , dept_name varchar2(30) not null
--
  , change_user varchar2(30) not null
  , change_date date not null
)
/

--
-- shadow table example 2
--
create table dept_a
(
    dept_id number not null
  , dept_name varchar2(30) not null
--
  , change_user varchar2(255) not null
  , start_dt date not null
  , end_dt date not null
  , operation varchar2(1) not null
--
  , transaction_no number not null
  , seq_no number not null
--
  -- plus any other columns you might think would help you do something interesting
)
/
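Kevin doesn't show the maintenance code, but for example 1 the trigger is only a few lines (a sketch; whether you log the :new or the :old image is a design decision - see Tom's note further down about logging only the :old record):

create or replace trigger dept_aud
   after insert or update or delete on dept
   for each row
begin
   if deleting then
      insert into dept_a values ( :old.dept_id, :old.dept_name, user, sysdate );
   else
      insert into dept_a values ( :new.dept_id, :new.dept_name, user, sysdate );
   end if;
end;
/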


You should consider the following when doing a history implementation:

1) the need to recreate prior results. For example, the ability in July 2009 to recreate the January 2009 month-end report. Indeed, you should be able to rerun any query with a date and get the same answer you got at the time you originally ran it. This includes selects from views, and complex queries with many joins/subqueries/function calls.

2) the opportunity to use code generators to create your history tables and any trigger or instead-of-trigger and view components. It is much easier and more reliable to write a generator to create these things. With a correct generator, there are fewer mistakes, everything looks consistent, and it takes no longer to produce 2000 of something than it does to produce 2.

3) more tables does mean more work for your DBA. But hey, that is their job, so tell them to stop their whining and get working.

4) how to deal with nay-sayers. Whenever you are doing anything good with Oracle, there will always be someone who steps forward to point out that you are wrong, or that you will have x, y, z problems, or something else. I always reply with this: "thanks, I appreciate anyone who can help me make a better mousetrap. So what should I do instead? What is your better mousetrap and why is it better?". Usually they have little to offer. Do not ever let someone tell you your stuff sucks without forcing them to put something of their own on the table too. People who consistently put down working solutions without having an alternative are useless. If they have nothing to offer as an alternative, then I say whatever you've got is infinitely better than what they've got.

5) listen to your users, but remember, you are the IT specialist. They are supposed to tell you what they need; you are supposed to go get it for them. But they don't do design. They don't tell you how to solve a technical problem. That is your job.

So in the end, my experience says go with the Shadow Table. What you will hear about the generic solution may be true, but for most people the bottom line is, it just won't give you what you want, and finding this out eighteen months down the road will be a bad deal.

Good luck, Kevin

shadow table design

James Su, January 24, 2010 - 11:13 am UTC

Hi Tom,
I have questions regarding this shadow table design.
If the requirement is to display the before and after images and show which fields changed, do you think it necessary to store both of them in one row - for example, firstname_before and firstname_after? Or should the after image be retrieved with a lead/lag function over the history? And do we need to store the changed column names in the row, or just compute them in real time by comparing the before and after images?
Thank you.
Tom Kyte
January 26, 2010 - 1:44 am UTC

you only need to store the before image, you never need the after image. The after image always exists as either

a) the next before image
b) the current row image in the real table.

Just log the entire :old record.
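In that scheme the before/after display becomes a simple analytic query (a sketch, assuming a CUSTOMER_HIST of :old images with a change_date column): the next row's before image is this row's after image, and the live row closes the chain.

select h.customer_id,
       h.change_date,
       h.salary                                        as salary_before,
       lead( h.salary ) over ( partition by h.customer_id
                               order by h.change_date ) as salary_after
  from customer_hist h;

The changed-columns list falls out the same way - compare each before value to its lead() at query time - so nothing extra needs to be stored.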

Jess, March 30, 2017 - 5:26 pm UTC

Hi Tom,

We have a scenario where external feeds come into the system and get loaded into the ODS, with records merged into the tables (inserts, updates, and deletes are all possible). For regular business usage we only need to display the current state of a record, but sometimes we need to show its full history (e.g., when there is a complaint and the history of changes needs to be audited).

My go-to move in this case would be to create trigger-driven history tables ('shadow' tables as you called them). However, the client will not sign off on this approach out of fear that the triggers could [maliciously/accidentally] be dropped, and then the audit wouldn't happen, and no one would know.

So... now I'm considering alternatives. The last time I had to do something different, we had AQ running, feeding changes into another instance. That's deprecated in 12c. CDC has been replaced with GoldenGate, but that has additional costs, which the client won't accept.

What is a cheap and cheerful way of getting an audit trail in this case? I looked at flashback archive (it appears to be standard, no extra cost, in EE edition; did I get that correctly?), which seems like the way to go. Do you agree, or is there a better option (or performance implications)?

If flashback archive is the way to go, then I'm slightly confused as to how it would be used... We'd need to define an archive for each of the tables in ODS (about 15 in this highly audited schema), but how would the business query "from the beginning of time" for a given record?

As an aside, looking at it from a "level of complexity" angle, does FDA add too much, making it worthwhile to persuade the client to accept triggers, or is FDA actually as good or better? And if they decide to leave Oracle at some point in the future, is it easy to take the data out of the archive tables?

Many thanks, as always!

Connor McDonald
March 31, 2017 - 4:43 am UTC

The issue here is not a tech one, it's:

"the triggers could [maliciously/accidentally] be dropped, and then the audit wouldn't happen, and no one would know."

The same would apply to

CDC
GG
FDA
etc
etc



Jess, April 04, 2017 - 3:54 pm UTC

Hi Connor,

You are right - there is risk in all of these. At the end of the day, they are willing to accept some risks and not others. It's easy enough to disable a trigger and forget to re-enable it. Seen it done a million times. Harder to do that with FDA.

My question is still around alternatives to triggers and recommendations, FDA or otherwise....
Connor McDonald
April 07, 2017 - 2:10 am UTC

Triggers give you the most control... but of course with the most coding, the most complexity, and the most overhead.

FDA (with 12c and the context extensions) should yield a similar solution. The queries for "changes over time" end up looking like:

SQL> select empno, ename, job, sal, comm,
  2      VERSIONS_STARTTIME TS
  3     ,nvl(VERSIONS_OPERATION,'I') op
  4  from EMP
  5  versions between timestamp
  6    timestamp '2014-02-11 20:12:00' and systimestamp
  7  order by empno, ts;

     EMPNO ENAME      JOB              SAL       COMM TS           O
---------- ---------- --------- ---------- ---------- ------------ -
      7369 SMITH      CLERK            806            08.10.51 PM  I
      7369 SMITH      SALES           8060       1000 08.12.10 PM  U
      7499 ALLEN      SALESMAN        1606  300000000 08.10.51 PM  I
      7521 WARD       SALESMAN        1256  500000000 08.10.51 PM  I
      7566 JONES      MANAGER         2981            08.10.51 PM  I
      ...
      7900 JAMES      CLERK            956            08.10.51 PM  I
      7902 FORD       ANALYST         3006            08.10.51 PM  I
      7934 MILLER     CLERK           1306            08.10.51 PM  I
      7934 MILLER     CLERK           1306            08.12.10 PM  D
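For completeness, the FDA setup itself is short (a sketch, assuming a tablespace named FDA_TS and the FLASHBACK ARCHIVE ADMINISTER privilege), and a "from the beginning of time" query just uses MINVALUE/MAXVALUE - bearing in mind history only exists from the moment the archive was enabled:

create flashback archive ods_fda
   tablespace fda_ts
   retention 7 year;

alter table emp flashback archive ods_fda;   -- repeat for each audited table

-- full recorded history of one record
select empno, ename, sal, versions_starttime, versions_operation
  from emp versions between scn minvalue and maxvalue
 where empno = 7369;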