
Question and Answer

Tom Kyte

Thanks for the question.

Asked: December 27, 2004 - 11:35 am UTC

Last updated: June 12, 2007 - 10:29 am UTC

Version: 8.1.7

Viewed 10K+ times
You Asked

Hi

I have been handed a DWH project. The thing is, the company always calls their DWH an ODS (Operational Data Store). My question is: what are the differences between them?

I read that an ODS is used for tactical decisions and a DWH for strategic decisions, and that an ODS contains current data (from seconds to 24 hours old) while a DWH contains historical data, but I don't see why we need an ODS at all when we can query the OLTP systems directly.

An ODS, for example, has an almost identical data model to the OLTP system. What happens if the OLTP system is a CRM with 1000 tables? Do we still replicate all 1000?

When should we use ODS?

thx

and Tom said...

DWH's are typically "read only, batch updated on a schedule"

ODS's are "maintained in more real time, trickle fed constantly"


Some databases do not support concurrent READ and WRITE access to the same data simultaneously; they block and lock and deadlock all over the place. Hence, artificial systems were created to try to alleviate that. In these systems, the OLTP systems are allowed to proceed nicely, without any blocking from the readers of the data -- and the ODS is the thing that suffers the contention.

In Oracle an ODS would be a logical standby database -- it is really there for failover but people "hate to see hardware wasted" (using airquotes on purpose there) like that -- so they want to run reports over there.


ODS's could also be the side effect of having distributed complexity -- in order to get the view of the enterprise, you have to conglomerate N systems together -- so you have this ODS as a sort of staging area, but also as a source of "global, relatively current" truth for your enterprise. Again -- this screams "consolidate me" instead of artificially "fix me by adding yet another database".

Rating

  (34 ratings)


Comments

Data Mart and ODS

Dipsy, December 28, 2004 - 3:09 am UTC

Hi

Is it correct/feasible to develop a Data Mart using an ODS?

A question about "historical data" in a DWH: if, for example, my customer's address changes twice in a day, should both modifications be in the DWH?

I have seen ETL processes for an ODS work this way: insert if the record is not present, otherwise update (a sort of MERGE statement). This kind of process is quite time consuming if the source has a lot of data; can this workflow work for a DWH too?

Sorry if the questions are stupid, I am a newbie in this business intelligence field

regards

Tom Kyte
December 28, 2004 - 10:27 am UTC

the ODS is technically designed to be used as the feeder for the DW and other DM's -- yes. It is to be the source of truth.

only you can answer the second paragraph -- what does your business say? Maybe they don't care, they just want the last address (but if I were doing this DW for an investigative agency -- I might well care a lot)....

I would prefer to update and if not there insert (faster generally, many times faster generally -- stress generally, if the record hardly ever, hardly ever exists, then the converse is true, but if the record exists more than occasionally -- update/insert is better)....

but yes, people use that all of the time.
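As a rough illustration of the update-then-insert approach Tom describes, a minimal PL/SQL sketch (the tables a and b and their columns are hypothetical, with a unique key assumed on b_xx):

```
begin
  for i in (select * from a where mod_date > trunc(sysdate - 1))
  loop
    -- try the update first ...
    update b set b_yy = i.yy where b_xx = i.xx;
    -- ... and only insert when no row was there to update
    if sql%rowcount = 0 then
      insert into b (b_xx, b_yy) values (i.xx, i.yy);
    end if;
  end loop;
  commit;
end;
/
```

Per Tom's caveat, this wins when the row usually already exists; if it almost never exists, insert-then-handle-DUP_VAL_ON_INDEX is the faster order.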

Answer to Dipsy's question

Meghana, December 28, 2004 - 6:29 am UTC

Hi Dipsy,
What we should remember is that a DWH contains "Historic Data". How "Historic" will normally be decided by the end users, but normally a DWH must not have daily updates as it ends up defeating the principle of non-volatility which is the basis of a DWH.

If you have constant or daily updates, then there is no point in maintaining a separate DWH, have it on your operational system instead. A DWH is basically meant for Analytical Processing and not for Transaction Processing.

However, the question does arise as to what should be done if your users do demand daily updates. Bill Inmon suggests in this case giving your users an ODS, which will be more frequently updated than your DWH. In this regard I completely agree with Tom when he says that an ODS can actually be a standby database. In this case, as Tom says, instead of "wasting the resources", use them for analytical purposes (read: an ODS).

To Meghana ...

Gabe, December 28, 2004 - 12:09 pm UTC

<quote>but normally a DWH must not have daily updates as it ends up defeating the principle of non-volatility which is the basis of a DWH.</quote>

Non-volatile means you don't change history ... having daily/frequent updates does not automatically imply history gets changed. Hence there is no correlation between the frequency of updates and the _purity_ of a dw.

<quote>Is it correct/feasible develop a Data Mart using an ODS?</quote>
If that means modeling an ODS using dimensional techniques (that is, represented as one or more conformed stars)? .. then, Yes ... it is feasible and an often used approach. Correct/Incorrect though cannot be generally asserted ... the original question actually proves people often cannot even agree if a system is an ODS or something else.


confused by this paragraph

A reader, December 28, 2004 - 3:23 pm UTC

<quote>
I would prefer to update and if not there insert (faster generally, many times faster generally -- stress generally, if the record hardly ever, hardly ever exists, then the converse is true, but if the record exists more than occasionally -- update/insert is better)....
</quote>

Hi

I am confused by this sentence -- aren't you saying the same thing? Are you saying Update/Insert is faster, or Insert/Update...?!


Tom Kyte
December 28, 2004 - 3:49 pm UTC

first sentence says it all:

I would prefer to update and if not there insert




A reader, December 28, 2004 - 4:58 pm UTC

Hi Tom,
I am confused by these abbreviations, e.g. OLTP, OLAP, DSS, etc.
What is the difference between OLAP, DSS, and a data warehouse?

Thanks






Tom Kyte
December 28, 2004 - 6:12 pm UTC

google them, lots and lots of data out there.

To Meghana

Peter, December 29, 2004 - 4:16 am UTC

<quote>If you have constant or daily updates, then there is no point in maintaining a separate DWH, have it on your operational system instead.</quote>

I manage several DWHs for leading retailers and restaurant chains in the UK. In my experience, these business sectors require daily data load/update; after all, tomorrow's transaction is next week's history (it is a business call as to whether historic data in the DWH ever gets updated to reflect changes in dimensional data). Most of the DWH systems I support acquire their data from multiple sources (point of sale, warehouse inventory management, as well as corporate ERP); none of these sources hold the depth of history required to support a useful analytic DWH, so consolidation in the DWH is the only way (for my customers) to go.

<quote>A DWH is basically meant for Analytical Processing and not for Transaction Processing.</quote>
Yes, and that is why running DWH-type queries on an operational system is bad news. Long-running queries (minutes) would cause mayhem for short OLTP transactions.
DWHs typically use performance-enhancing techniques that would kill OLTP systems (we use bitmap indexes, star transforms, table compression, and partitioning schemes that optimize partition pruning).

To Peter

Meghana, December 30, 2004 - 4:28 am UTC

Hi Peter!
<quote>A DWH is basically meant for Analytical Processing and not for Transaction Processing.
Yes, and that is why running DWH type queries on an operational system is bad news. Long running queries (minutes) would cause mayhem on short OLTP transactions.
DWH typically use performance enhancing techniques that would kill OLTP systems. (We use bitmap indexes, star transforms, table partitioning schemes that optimize partition pruning and table compression).</quote>

This is exactly what I meant. That's the reason we are talking about an ODS, right?

When I talked about running analytical queries on OLTP systems, I really wanted to explain why not to do it.

Also, I realize that business users may require daily updates. I got it a little wrong when I said that it defeats the concept of non-volatility.

Thanx anyway

ETL process approach

Unai, January 04, 2005 - 2:26 pm UTC

Hi

From this thread I see that an ODS process is mainly insert-or-update. Is it possible to do this in bulk in Oracle 8i? In 9i we could use the MERGE command to run this process in bulk.

Something like

select *
bulk collect ...

then forall and array processing.

I have developed a process which does insert/update (but not for an ODS/DWH) which was something like this (assuming a unique index on b_xx):

for i in (select * from a where mod_date > trunc(sysdate - 1))
loop
  begin
    insert into b (b_xx, b_yy) values (i.xx, i.yy);
  exception
    when dup_val_on_index then
      -- the row already exists: update the dependent column instead
      update b set b_yy = i.yy where b_xx = i.xx;
  end;
end loop;
commit;

Is it possible to transform this using bulk processing in 8i?

Cheers



Tom Kyte
January 05, 2005 - 8:56 am UTC

search this site for

"save exceptions"

save exceptions

Unai, January 05, 2005 - 11:13 am UTC

Hi

save exceptions is in 9i but my databases are in 8.1.7

is there something similar in 8i?

thx

Tom Kyte
January 05, 2005 - 11:26 am UTC

nope
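For what it's worth, one 8i-era workaround (a sketch under assumed table and column names, not a drop-in solution) is to reverse the logic so no exception handling is needed at all: bulk update first, inspect SQL%BULK_ROWCOUNT to find the rows that matched nothing, then bulk insert just those:

```
declare
  type t_xx is table of b.b_xx%type index by binary_integer;
  type t_yy is table of b.b_yy%type index by binary_integer;
  l_xx t_xx;      l_yy t_yy;       -- all changed rows from the source
  l_ins_xx t_xx;  l_ins_yy t_yy;   -- rows that need inserting
  j binary_integer := 0;
begin
  select xx, yy bulk collect into l_xx, l_yy
    from a
   where mod_date > trunc(sysdate - 1);

  -- bulk update everything that already exists
  forall i in 1 .. l_xx.count
    update b set b_yy = l_yy(i) where b_xx = l_xx(i);

  -- rows that updated nothing must be inserted
  for i in 1 .. l_xx.count loop
    if sql%bulk_rowcount(i) = 0 then
      j := j + 1;
      l_ins_xx(j) := l_xx(i);
      l_ins_yy(j) := l_yy(i);
    end if;
  end loop;

  forall i in 1 .. j
    insert into b (b_xx, b_yy) values (l_ins_xx(i), l_ins_yy(i));

  commit;
end;
/
```

This follows the update-first preference discussed earlier in the thread; the insert-first order can be handled the same way only in 9i, where SAVE EXCEPTIONS lets the FORALL insert survive DUP_VAL_ON_INDEX.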

Bijay, MI-USA, January 05, 2005 - 4:25 pm UTC

To me, an ODS is somewhere the data arrives frequently in raw format... and the DWH is the history, after data cleansing and the ETL process, as required...

we only have ODS

A reader, January 09, 2005 - 3:12 am UTC

The customer I am working with right now has an ODS but no DWH; all Data Marts are fed from the ODS.

I am not sure if this is a good approach?

Tom Kyte
January 09, 2005 - 11:44 am UTC

are they happy ?

if so, it is a very good approach.

if not, it is not so good.

data marts are just little data warehouses, so you probably have lots of baby DWs. I'd personally prefer (opinion here) to have just an ODS and a DW and kill the marts -- they are just more to admin, audit, etc. (and with all of the regulations getting piled on... auditing and access control become pretty important in many cases)

DWh vs DM

David Aldridge, January 09, 2005 - 11:54 am UTC

I think that the difference between the two is not just in the scope of data addressed, but in the structure and optimization also. A data warehouse using a generic star schema is not efficient at answering such questions as "show me all of the items for which the current end-of-month inventory level is more than 25% higher than the minimum end-of-month level over the past 12 months". It can be done, but if the business is going to repeatedly ask such questions then that is the time to start denormalizing by embedding history into each record, like ...

item#
location#
end_of_month#
qty_on_hand_eom
qty_on_hand_eom_minus01
qty_on_hand_eom_minus02
qty_on_hand_eom_minus03
...
qty_on_hand_eom_minus22
qty_on_hand_eom_minus23
qty_on_hand_eom_minus24
min_qty_on_hand_12mth
min_qty_on_hand_24mth

etc.

Philosophically speaking, the query effort is being moved from the level of individual reports into the ETL process. These aren't the kind of structures that are always amenable to optimization through aggregate tables (MVs) and query rewrite, either.
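As an aside, the kind of question described above can at least be expressed against a conventional one-row-per-month fact table with analytic functions; a hedged sketch (the table inventory_eom_fact and its columns are hypothetical, and it assumes a dense row per item/location/month):

```
-- "current EOM qty more than 25% above the 12-month minimum"
select item#, location#, qty_on_hand_eom
from (
  select item#, location#, eom_date, qty_on_hand_eom,
         -- rolling 12-month minimum, counting the current month
         min(qty_on_hand_eom) over (
           partition by item#, location#
           order by eom_date
           rows between 11 preceding and current row) as min_12m,
         -- used to pick out the most recent month per item/location
         row_number() over (
           partition by item#, location#
           order by eom_date desc) as rn
  from inventory_eom_fact
)
where rn = 1
  and qty_on_hand_eom > 1.25 * min_12m;
```

Whether that performs acceptably, or the denormalized columns are still needed, is exactly the cost-benefit question being debated here.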

Tom Kyte
January 09, 2005 - 12:21 pm UTC

but a DW doesn't have to have just one structure. I would put the DM's in the DW.

a single database is my goal.




To Dave

Peter, January 10, 2005 - 11:38 am UTC

I'm with Tom here.
A major problem with building these time-based aggregates as denormalized fact columns is the overhead in maintaining them. Instead of a clean insert of n million rows, you would end up reading many millions of history rows, perhaps even (worst case) the whole table, to allow the calculation of these facts, especially if insertion of late data is allowed.

We had a similar problem with one of our customers' choice of query tool - it could not do year-to-date! And the tool vendor insisted that we pre-build the aggregates. We showed them that it was possible to access custom structures (tables, pipelined functions, etc.) outside of the normal summary tables with little degradation in query time but significantly less impact on batch. It was just a matter of telling the vendor they were looking at the wrong problem.

David Aldridge, January 10, 2005 - 3:49 pm UTC

Well if there is a way of avoiding the complexity of building a denormalized table then that's fine, but I'd suggest that calculations of year-to-date metrics are generally simple cases that are completely amenable to optimization against standard fact tables -- through intermediate date-to-YTD mapping tables or whatever.

However not every case is so amenable -- what is important is to recognise the cases where the normalized structure cannot meet, or be tweaked to meet, the user requirements, and to apply a cost-benefit analysis to see if it's worth helping the front-end out by ETL'ing a more complex table structure.

TK's point of consolidating the DM's in the same instance as the DWh is a valid matter of interest for performance, maintenance, maybe licensing costs ;) etc., but I don't think it addressed the matter of whether these denormalized structures need to be built or not.

Tom Kyte
January 10, 2005 - 5:17 pm UTC

I don't mind having the data stored in the way that makes best sense to be used -- I mind having it in 15 billion places, never really being sure "what is what, what is current, what is correct, who is using what (and how)"...

David Aldridge, January 10, 2005 - 5:41 pm UTC

Yes, the usual textbook diagrams of the relationships between ODS, DWh and DM always seem to indicate the DM as a separate physical entity from the DWh, located on some "departmental server", but this has never made much sense to me either -- it's not as if the Accounts Receivable Dept. has their own crack team of DBAs! Keeping DMs and the source DWh on the same instance is better.

Regarding an ODS though, it seems that there could easily be a case for keeping that as a separate physical entity. The emphasis in the corporate world is to never lose data and to keep an ODS much more closely in sync with the source OLTP systems, and when you add to that the different nature (and probably tuning) requirements of an ODS instance, that spells "different instance".

Fair points.

Peter, January 11, 2005 - 5:17 am UTC

The main reason I don't like the denormalized aggregate fact approach is space usage. For one of my systems (and yours, Dave, are probably very different) we store sales by date, customer, store, item. At high levels, "how much X did we sell in 2005" is trivial and could be written into a high-level summary as a week-by-week YTD, but at base level it really wastes storage. If customer Y buys X at store Z in week 1, we need to populate YTD figures (even if they never buy X at Z ever again!) for each week going forward.... yuk, my sparsity has gone!

I personally go with the one-and-only-truth approach; I have never seen a true need (in terms of security, flexibility, functionality) for a local data mart. Maybe for remote locations, where network considerations make direct DWH access less useful, I would consider a replicated subset of the DWH. I guess that DMs were shown in textbooks as separate servers to emphasize a logical disconnect between them and the DWH.
I agree with Dave that an ODS is not part of a DWH instance, has different performance characteristics (and, to my mind, no sense of history!) and belongs in another instance (not always its own instance, though) -- but not necessarily, in these days of RAC, SAN and multi-domain machines, on physically different tin.

David Aldridge, January 11, 2005 - 10:43 am UTC

Yes, sparsity failure is a very real problem of course, especially if you want to materialize year-to-date metrics on data sets like those you describe. However, I'm not averse to using a bunch of space for these things if there is a clear benefit. Storage technology being what it is, you usually find that these systems run out of I/O capacity long before they run out of storage capacity, and dedicating all of those excess GBs to the redundant storage of query-optimizing structures (simple aggregates or whatever) therefore costs you (theoretically) nothing in terms of disks.

Oh, and it also stops management from thinking that you have vastly over-spent on disk capacity ;)

HO HO

Peter Scott, January 12, 2005 - 4:26 am UTC

Guess that's OK if you already have the disk ;-)

Perhaps I'll sit in a dark room for a few weeks estimating how much space I'd need... now, 200 stores, 600,000 named customers, 500,000 products.... most customers shop at only one store, most customers shop once a week....

No, I'll get a beer instead ;-)

David Aldridge, January 12, 2005 - 10:04 am UTC

Well in those circumstances a beer would work for me also.

Back to the original question

Abdel, January 12, 2005 - 3:53 pm UTC

The differences between an ODS and a DW are:
- As you said, an ODS is used for tactical decisions and a DW for strategic ones. There is a big difference between the two: tactical decisions decide what to do right now in order to correct or improve the business process, while strategic decisions concern planning for the long term (generally one to five years).
- An ODS is subject-oriented, current-valued, integrated, and contains only detailed data. The DW, by contrast, while also subject-oriented, contains historical and integrated data and, depending on the business need, a certain degree of detail (granularity).
- An ODS does not contain historical data.
- Personally, I think that if the ODS is used only to feed the DW, and not to distribute information to tactical decision makers, we should use a staging area instead. And if we already have one, there is no need to add an ODS to the DW melting pot ;-)



A reader, January 13, 2005 - 3:41 pm UTC

An ODS is not always tactical - our ODS systems act as the repository of the "single truth" - no decisions (of any time frame) are made from the ODS content.

I don't think DMs are useful

A reader, June 03, 2005 - 4:07 am UTC

Hi

We have had experience with a DWH/DM setup, and then from this DM to another, smaller DM.
It is a financial Data Mart, so we basically have this setup:

origin: Oracle Financials
first dest: DWH
second dest: DM
third dest: DM

We have had a bad experience with this setup: too many data hops. There are in fact 3 ETL processes: from OLTP to DWH, from DWH to DM, and from DM to another DM. All this leads to problems; when the DWH ETL process fails for some business logic reason, all the data propagated to the other DMs is wrong. It took us almost 10 months to normalize all the data.

Too long a development time, and too much maintenance!

A small DWH design question

PinguMan, June 03, 2005 - 4:16 am UTC

Hi

We want to store our order workflow in our DWH so we can query the DWH to understand how long an order takes to finish, from ordering to shipping.

Basically there can be four stages:

1. Place Order
2. Preparing Order
3. Packaging
4. Shipping

There is a characteristic of our company: we mark orders shipped more than one year ago as EXPIRED.

My users are normally interested in NON-EXPIRED orders.

My question is as follows: I will create a fact table to store the whole order workflow (all four stages); there will be a column indicating the stage, for example

1 -> Place Order
2 -> Preparing
3 -> Packaging
4 -> Shipping

I also need a flag to indicate whether the order is expired or not. I am thinking of creating a dimension with the order id and its expiry status, so the user can query as follows:

select *
from hist_order, orders   /* "orders" rather than "order": ORDER is a reserved word */
where hist_order.order_id = orders.order_id
and orders.status = 'ACTIVE' /* ACTIVE or EXPIRED are the possible values */

Is this a good approach? Or shall I simply add this flag to the fact table and update the flag in my ETL process (when the order is one year old), thus avoiding the join in my users' reports?

Tom Kyte
June 03, 2005 - 7:26 am UTC

seems to me that 1,2,3,4 are DATES, and status = ACTIVE is really shipping >= add_months(sysdate,-12), isn't it?

not really, we can have an extra stage under some circumnstances

PinguMan, June 03, 2005 - 7:36 am UTC

Hi

Usually we have these four stages

1 -> Place Order
2 -> Preparing
3 -> Packaging
4 -> Shipping

But there can be cases where the order is cancelled, so we can have Cancelling as a 5th stage.

We have a column called mod_date that contains sysdate; basically the historical table will use that mod_date and the stage indicator to determine when the order was prepared, packaged, shipped, etc.

Tom Kyte
June 03, 2005 - 7:40 am UTC

so these stages are "rows" not columns? there is a row/stage?

They are rows

PinguMan, June 03, 2005 - 7:56 am UTC

Hi

Yes, they are rows; the table is something like this

order_id
stage
mod_date

We can have several rows for the same order_id to keep the historical movements, and we can use this table to track the workflow of the order. So far we can have 5 stages (including the cancelling part), but in the future there might be more, so making the stages into columns is ruled out. To explain why: right now, when a customer cancels his order and later wants to reorder the same items, he has to place a new order. In the future we might give the customer a choice: if the cancellation is less than, say, 24 hours old, he would be able to recover his cancelled order. This leads to a (rare but possible) case where a customer places an order, cancels it, recovers it, and cancels it again, all in the same day. So the number of stages would be dynamic because of the cancelling operation -- an order can be cancelled X times, not just once.

Cheers

Tom Kyte
June 03, 2005 - 8:11 am UTC

this seems like a dimension that doesn't need to be factored out then. I'd probably just rather have the status (not the entire word ACTIVE -- just a "Y" in a varchar2(1) field would do) in this case. It is not like you have other dependent attributes to go with it or anything (eg: it isn't like a calendar table that turns date -> week -> month -> quarter -> fiscal year).


Updating in ETL process

A reader, June 03, 2005 - 8:31 am UTC

Hi

If I don't create the dimension, then when the order expires I would have to update all rows corresponding to that order's status to EXPIRED, correct? Which is my second approach.

I was wondering which approach would be better: updating the dimension (if I create the dimension in the end) or updating all rows for the same order in the fact table (hist_order). If I update the dimension, then the reports need an extra join; if I denormalize the fact table and include the status, I would need to update the rows when the order expires.

Which one is better, considering that right now an order can have a maximum of 5 stages but in the future it could have X (although I don't think more than 10; more would be unusual)?

Cheers

Tom Kyte
June 03, 2005 - 8:43 am UTC

that is what denormalization is about; instead of

update dimension set status = 'N' where order = :x

you'll

update fact set status = 'N' where order = :x

You need to update when the order expires, period. Since you'll probably query this a lot more than you update it, and it seems doubtful that status would be indexed, it seems on the face of it a safe thing to do.

update-insert, insert-update, merge

A reader, June 03, 2005 - 10:26 am UTC

Hi

You say that, depending on the data, we should do the update first, which is perfectly understandable if we know that most of the data coming from OLTP is already in the DWH/ODS.

I have a question about MERGE: does this command do the Update or the Insert first?

Tom Kyte
June 03, 2005 - 10:58 am UTC

does them "simultaneously" as far as we are concerned. No order of operation can be inferred or relied on.
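For reference, a 9i-style MERGE equivalent of the earlier insert/update loop might look something like this (table and column names are hypothetical, carried over from the earlier example; in 9i both WHEN clauses are mandatory):

```
-- Oracle decides per source row whether to update or insert;
-- no order of operation between the two branches is implied.
merge into b
using (select xx, yy
         from a
        where mod_date > trunc(sysdate - 1)) src
on (b.b_xx = src.xx)
when matched then
  update set b.b_yy = src.yy
when not matched then
  insert (b_xx, b_yy) values (src.xx, src.yy);
```

The whole thing is one set-based statement, which is exactly why questions about "which happens first" don't apply to it.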

update-insert, insert-update, merge

A reader, June 03, 2005 - 4:18 pm UTC

Does that mean that MERGE can be slower in some situations than Update-Insert or Insert-Update?

Tom Kyte
June 03, 2005 - 5:33 pm UTC

No, it means you cannot ascribe to a non-procedural statement like MERGE anything we say or talk about with regards to procedural code!

they are apples and toaster ovens.

In general, set based code is faster, more efficient and uses less resources than what we can do ourselves with slow by slow code.

A reader, October 27, 2006 - 4:43 pm UTC

Tom,

Basic question :

a) What is the difference between a data mart, a data warehouse, and OLTP?

b) What is the difference between an ODS and OLTP?



Tom Kyte
October 27, 2006 - 8:14 pm UTC

a) data marts are data warehouses, typically extracts of a data warehouse. In my opinion, they are "not too useful", I'd rather have fewer databases to deal with.

OLTP is online transaction processing, typically many people run one database for their transaction processing and one for their "archive of lots of data we run really big queries against" - the warehouse.

b) oltp's generally feed an ODS (operational data store) and the ODS feeds the warehouse.

ODS's are not necessary in most cases. They are a near "real time" view of many OLTP systems with current information - one that then feeds the long-term view of data in the warehouse.


Duplication of data?

Bhagya, June 12, 2007 - 9:28 am UTC

I appreciate the difference brought up between ODS and DWH. Just a small clarification...
When we have the data in the transactional source systems, the same data in the ODS, and the ODS feeds the DWH, doesn't this mean there is duplication of data and hence a waste of valuable storage space?
Tom Kyte
June 12, 2007 - 10:29 am UTC

waste? only if you consider it waste.

which - well - most people would not.

transactional system - live, right now, in OLTP format. You have 1..N of these

ODS - lags, centralized, consolidated - near real time image of multiple data stores. Typically transactionally oriented table designs.

DWH - lags way behind (typically), structures suited for warehouse queries, updated on a schedule, a predictable set of static data


three sets of data, for three distinctly different purposes.

Duplication of data

Bhagya, July 16, 2007 - 7:29 am UTC

Thank you so much for the convincing answer :)

My two cents

Jon, November 05, 2008 - 11:18 am UTC

From my experience in data warehousing, the ODS is an OLTP-type database structured for daily processing that adheres to the Inmon methodology outlined in his book "Building the Data Warehouse", which tends towards Date & Codd third normal form for OLTP and reporting structures (which still include denormalized reporting tables).

Data warehouses that base their structure on the star schema were pioneered by Ralph Kimball in his groundbreaking classic "The Data Warehouse Toolkit", which tends to denormalize tables and discourages normalization or snowflaking unless you have a really good reason.

There are many methods for feeding a data warehouse, but when an ODS is involved, Change Data Capture is now coming to the forefront.

Hope this helps