Data Warehousing database & a regular Database

Question and Answer

Tom Kyte

Thanks for the question, Nag.

Asked: September 25, 2002 - 7:20 pm UTC

Last updated: August 02, 2013 - 7:40 pm UTC

Version: 8.1.7

Viewed 10K+ times

You Asked

Tom

How is a data warehousing database different from a regular database?

I usually find that data warehousing is seen as a separate skill. Why?

What are the areas (in a nutshell) where a data warehouse database differs from a regular database?

Thanks

and Tom said...

Most datawarehouses are characterized by:

o hundreds of gigabytes to terabytes in size. Dealing with that scale takes knowledge.

o periods of read mostly activity, with periods of short bursts of FURIOUS load, reorg, rebuild activity

o use of specialized features that many people have read about but fewer have used such as bitmap indexes, materialized views, star query schemas, partitioning, etc.

o use of parallel query -- similar to the point right above.

o constantly shifting set of questions - requiring shifting sets of data, MV's, indexes, etc (most systems, once production, are relatively static. DW's -- every day is a brand new day)

To name a few. Dealing with terabytes of data is hard and takes experience. Lots of know-how required.
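For illustration only -- a minimal sketch of two of the features above (range partitioning and parallel query); the table, column and partition names are hypothetical:

create table sales_fact
( sale_date  date,
  product_id number,
  amount     number
)
partition by range (sale_date)
( partition p_2002_q3 values less than (to_date('2002-10-01','yyyy-mm-dd')),
  partition p_2002_q4 values less than (to_date('2003-01-01','yyyy-mm-dd'))
)
parallel;

-- queries can now use partition elimination and parallel query
select /*+ parallel(s, 4) */ product_id, sum(amount)
  from sales_fact s
 where sale_date >= to_date('2002-10-01','yyyy-mm-dd')
 group by product_id;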


Comments

elaborate

Nag, September 25, 2002 - 8:12 pm UTC

"Lots of know how required. " ... can you substantiate this statement required. A person might be very experienced in a regular database , do you think he will not be able to handle datawarehouse database if he is a novicein datawarehousing?

"constantly shifting set of questions""DW's --
every day is a brand new day", do you mean data is moved to datawarehouse every day. It is not entirely true, some datawarehouses are loaded once in a month and some once in a quarter?

Tom Kyte
September 25, 2002 - 9:17 pm UTC

Nag,

have you worked with TERABYTES of data?

It is very different from GIGABYTES of data, by an order of magnitude. It can be a world of difference (or a world of hurt, depending on who you ask)

What I meant by "constantly shifting set of questions" is that

o you set up a data warehouse on day 1
o the question is "how many red shoes did we sell in NC that day?"
o tomorrow it is "how many widgets did we try to sell vs actually sell in the Northwest for the last 5 years?"
o after that, they want to know "so, what should we forecast for widgets in the south based on the last 17 years -- taking into consideration that El Niño (the weather pattern) was in place in '94?"

and so on. Once you open the floodgates on a DW -- all of the "analysis" you did for what questions they would ask is moot. They will ask whatever they want to ask, when they want to ask it. What are you going to do about that? Are you ready to respond, to analyze, to create new MV's, new indexes? Do you know where to begin to look to find out what you need to do to respond?



Could you also talk about DW backup strategy?

Ramakrishna, September 26, 2002 - 3:07 am UTC

Hi Tom,
Sometimes, it is not feasible to run a data warehouse in ARCHIVELOG mode due to the enormous amount of redo generated for various reasons (unable to use direct path loading, stored procs computing lots of derived values, etc). Since the data is available in the staging area anyway, is it OK to run in NOARCHIVELOG mode with nightly cold backups (50GB currently)?
What is the recommended way of backing up a data warehouse?

thanks
Ramakrishna


Tom Kyte
September 26, 2002 - 7:52 am UTC

As long as you can afford to lose the entire database, rebuild the whole thing from scratch -- go for it.

If you cannot, then think 50 times about this.


If you are unable to use direct path loading, I would look into why that is so and perhaps change my methods to permit it. In 9i with EXTERNAL tables, PLSQL pipelined functions, Multi-table direct path inserts -- there will be NO logical reason you cannot.
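For illustration, a minimal sketch of those 9i features (an external table feeding a direct path, multi-table insert); the directory, file and table names are hypothetical:

create directory load_dir as '/data/feeds';

create table sales_ext
( cust_id number,
  amount  number
)
organization external
( type oracle_loader
  default directory load_dir
  access parameters ( records delimited by newline
                      fields terminated by ',' )
  location ('feed.csv')
);

-- direct path, multi-table insert straight from the flat file
insert /*+ append */ all
  when amount >= 0 then into sales_fact  (cust_id, amount) values (cust_id, amount)
  when amount <  0 then into refund_fact (cust_id, amount) values (cust_id, amount)
select cust_id, amount from sales_ext;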

In order to answer "what is the recommended way...", you would have to look at your requirements for

o MTTR
o ability to rebuild the entire thing from scratch
o cost of not having the data available

and then we could answer

Nathan, September 26, 2002 - 4:26 am UTC

Thanks Tom,
Can you please explain some issues with terabyte databases compared to gigabyte ones?

I believe that, with regard to backup, rebuilding the database would be better than recovering from an archive. Periodic exports, I guess, could be a good backup strategy.

Thanks Tom,


Tom Kyte
September 26, 2002 - 7:56 am UTC

See -- you got it backwards 100%

That is why people want experience.


Suppose you have a terabyte of data, 1 block goes bad (disk failure). You would really want to rebuild the database??? Not me, I would prefer to be using 9i with RMAN and block level recovery that lets me patch a block in place without taking the datafile offline, without taking the tablespace offline.


Export? On a terabyte database? Your great grandchildren would be running the system by the time the export you kick off today finished. And trying to import that export? Their great great great great grandchildren might be able to monitor the end of the import. Maybe you mean to "transport the tablespaces" -- that would work -- but it would be easier to just have backups.

Also, you would have large portions of the DW in read only mode -- meaning you need not back up the entire database -- much of a DW can (should) be read only and you add to it over time (partitioning helps here). Design (physical design) plays a huge role in a DW.
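For reference, a minimal sketch of both points (9i-era syntax; the datafile, block and tablespace names are hypothetical):

-- patch a single corrupt block in place, from RMAN:
RMAN> blockrecover datafile 7 block 1234;

-- make historical tablespaces read only and stop backing them up repeatedly:
alter tablespace sales_2001 read only;

RMAN> backup database skip readonly;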

A reader, September 26, 2002 - 9:47 am UTC

Datawarehousing is a different world. There is no way someone with OLTP experience will feel at home with a DW system from day one.

In addition to what Tom already listed, an area where you need experience is ETL (extraction, transformation & loading). The data could be coming from literally hundreds of sources running on a variety of platforms. There could be issues with different conventions, formatting, filling in absent data, etc. (That is why data warehouses generally use surrogate keys.) You might have one or more ODSs (operational data stores) that hold the data to be loaded into the warehouse. You might be doing a lot of "data massaging", summation/pre-processing, and you might have to validate the data/data samples. Data loads are done using direct/parallel options with data pre-sorted wherever possible. After the data is loaded into the warehouse comes the performance part. Tuning and re-writing of multipage SQL queries, star transformation, bitmap indexing, database resource management -- it is a whole lot which is not generally used in other systems.

Tim Gorman's book, Essential Oracle8i Data Warehousing, is a very good starting point for beginners.

Cheers,

AR


Great summary of issues

Robert, September 26, 2002 - 9:57 am UTC


What is DW anyway?

A Reader, September 26, 2002 - 11:54 am UTC

Do data warehouses have to be in TERABYTES? What is the definition of a DW in terms of size? If I have some OLTP experience and want to learn some DW work (Jr. level), where should I start, or how can we get into the DW business? We will never have a database of this size. I am also scared by the comment:
"There is no way someone with OLTP experience will feel at home with a DW system from day one."
What about the other way around? Those who have DW experience, can they feel at home when working on OLTP? I mean, is DW more difficult than OLTP?
Thanks


Tom Kyte
September 26, 2002 - 12:01 pm UTC

No, I started with:


...
Most datawarehouses are characterized by:

o hundreds of gigabytes to terabytes in size. Dealing with that scale takes
knowledge.
....


good book referenced above....


the converse is true, DW != OLTP. they are each equally difficult with their own challenges. I know of OLTP envs that are bigger than some DW's

It's all about experience....

Robert C, September 26, 2002 - 12:34 pm UTC

>>? If I have some OLTP experience and want to learn some >>DW work (Jr.level), where should I start with, or how can >>we get into the DW business?

Just my 2 cents re the above...
Going into DW with "some OLTP experience " ?
I'd say that's almost impossible...you'd really have to impress the other guy with other qualities/soft skills like business knowledge.
(and re hard-skills, there're monster truck loads of technologies, products, tools etc)
With LOTS of OLTP experience? Maybe... but that's still a hard sell.
As Tom has mentioned about experience... it (perceived or true) is a BIG thing in the DW world, especially when they are looking to bring in someone from OUTSIDE.
Best place to start: your current place... try to make a lateral move and get a foot in the door.

star transformation slow

Sreejith, October 21, 2002 - 2:49 pm UTC

The information was very useful. I have a performance issue with star transformation. We are using a star schema for a data mart. There is a single fact table and 8 dimensions. There are 8 million records in the fact table and 2 of the biggest dimensions have 10,000 records each. All others are small (hundreds of rows). When I checked the execution plan it shows the usage of bitmap indexes to get rowids from the fact table. To retrieve 10 records, it takes around 60 sec. The same query, after creating a b-tree index, takes less than 1 sec. I tried all the ways -- analyzed tables, indexes, histograms, etc. I would appreciate your suggestions to make the star transformation query faster.

Thanks in advance,
Sreejith

Tom Kyte
October 21, 2002 - 6:59 pm UTC

Umm, I'm always confused when I get things like:

I tried "A"
It went to slow
When I use "B" it works greeeaattt, really fast

Me, I would stop at B -- works great, less filling. My job is done, move onto next problem please.

Maybe -- just maybe -- in this case, a star query is not what you want, an index plan that uses b*trees is?

If star queries were the correct answer for every DW query -- we would only supply them and nothing else.

Star transformation

sreejith, October 21, 2002 - 10:53 pm UTC

The issue is that if I alter the query, I need to create another set of indexes. Queries are not fixed, as they are generated from Cognos. If star transformation gave better performance, that would solve everything.

Tom Kyte
October 22, 2002 - 7:07 am UTC

Given the level of information -- it is impossible for me to deliver the "fast = true" setting you are looking for.

Designing a Dataware house

Nag, October 23, 2002 - 10:41 pm UTC

Tom
1.
Are the basics of designing a data warehouse different from the ones followed in designing a regular database?

I know for sure that global indexes are not favored for DSS systems, and local indexes are favored for the purpose of ease of maintenance.

Are there any similar dos and don'ts when it comes to a data warehouse versus a regular database?


2. When we dump data into a data warehouse, we usually disable the constraints and put the data in (of course, we clean the data as much as possible before doing this).

But what if the loaded data has some bad data which violates integrity constraints? Once we enable the constraints again, wouldn't the table be in an unusable state, as the constraints will find that some data is a misfit for the table? How is this situation handled?

Tom Kyte
October 24, 2002 - 6:51 am UTC

Yes, the basics are different.

Global indexes may well come into vogue with 9i and "maintain global indexes". And in a DW/DSS/whatever -- global indexes may well come into play in 8i and before -- all depends on your needs.

As for the constraints -- it depends. If you enable the constraint without validating the data - you could have primary keys that have dups, foreign keys that have no parents, check constraints that aren't satisfied. We "allow" you to do that. If you validate the data, you can use exceptions into tablename to capture the "bad" rows.
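For illustration, a minimal sketch of the EXCEPTIONS INTO approach (the EXCEPTIONS table comes from the supplied utlexcpt.sql script; the table and constraint names are hypothetical):

-- create the default EXCEPTIONS table
@?/rdbms/admin/utlexcpt.sql

alter table sales_fact
  enable validate constraint sales_fact_cust_fk
  exceptions into exceptions;

-- if the enable fails, the offending rows can be found via:
select * from sales_fact
 where rowid in ( select row_id from exceptions );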

Syntax

A reader, October 24, 2002 - 9:05 am UTC

"exceptions into tablename to
capture the "bad" rows"

Can you give us the syntax for doing the above? Is this a constraint-related setting, or do you do that while creating the table itself?

Tom Kyte
October 24, 2002 - 2:23 pm UTC

star query schemas

Reddy, October 24, 2002 - 12:35 pm UTC

Hi tom

can you please explain star query schemas more elaborately?

Thanks

Reddy

Tom Kyte
October 24, 2002 - 3:15 pm UTC

Could you explain what's a fully normalized enterprise data warehouse

Sean, January 03, 2003 - 7:30 pm UTC

I have just joined a data warehouse design team. To my surprise, they claim to design a fully normalized enterprise data warehouse. The data warehouse will ETL data from OLTP databases. And there will be some data marts, which are denormalized, that ETL data from the enterprise data warehouse. This is the first time I have ever heard of a "FULLY NORMALIZED DATA WAREHOUSE". It seems no different from an OLTP database, because most OLTP databases are normalized?

Tom Kyte
January 04, 2003 - 9:15 am UTC

Surprisingly -- most OLTP databases are denormalized to a degree as well.

You know what -- I've never paid much attention to whether I'm in 3rd normal form, 2nd normal form or 99th (ask me to define 3rd normal form, I'd have to go look it up)

The goal, the real goal is to have the data organized in such a way so as

o people can as easily as possible get their information
o the response times are as good as you can get
o the integrity of the data is not compromised

Everything else -- well so what? Who cares if a data warehouse is normalized, I mean -- it is not transactional after all. Sounds like this data warehouse is really the aggregated OLTP schema in disguise -- and performance would not be sufficient to satisfy end users so they end up making "data marts" (always found that a quaint term) which are really the data warehouses. They built a staging area, the data marts could probably just be in the data warehouse as materialized views -- providing for better security, easier managing, greater access, less redundancy, less work.



real warehouses

Stu Charlton, January 04, 2003 - 11:55 am UTC

Bill Inmon style data warehouses are usually highly granular and have maximum flexibility to meet different departments' business needs... that usually means something more complex than a star schema -- in effect a "normalized" database.

Most Teradata warehouses are designed in this way (their own industry data models aren't star schemas), and I believe some Oracle DW's are designed this way (for example, the Richard Winter paper:)
http://www.oracle.com/ip/collateral/vldw_winter.pdf

If your team is going normalized for the "purity", it's probably wrong-headed. If it's for the flexibility to handle different informational needs (finance, marketing, sales, engineering, etc.) then it might be appropriate.... assuming that's your business need. But it IS a tradeoff -- flexibility for a more resource-intensive approach. You'll have your work cut out for you.

Data marts & star schemas are usually highly specific to departmental needs... they are great if that's what you need (and usually it is).

Tom Kyte
January 04, 2003 - 12:00 pm UTC

...
If your team is going normalized for the "purity", it's probably wrong-headed.
.....

you could have left probably out of there and still be 100% dead on.

criteria for DW implementation

Will, January 04, 2003 - 3:08 pm UTC

Where I work, we have ten production Oracle databases ranging in size from 300MB to 17GB totaling about 50GB. Puny by most standards, I would think. We do not have any other significant stores of data in non-Oracle format.

Yet we have on the books for this FY, a data warehouse project. I'm not sure what the budgeted amount for this project is but I believe it's in the high $10K's to low $100K's.

The one-liner description is:

"Provide a centralized up-to-date data store to allow analysis and reporting from multiple databases"

and the justification statement is:

"The lack of an ad-hoc reporting tool and easy-to-use data repository is the number one issue end users complain about"


Basically, the idea is to bring in some consultants to build a data warehouse for us.

The DW concept has been thrown around in our org for a few years now. I have chimed in on a few occasions (apparently to no avail) with the following arguments:

o Many of our users don't seem that hungry for information and as such do not really know what they want. Sometimes we "decide" what they want, give it to them and let them elaborate.

o Some consultant that comes in has no clue what we want and need and must ask the users who also do not know. We've educated too many consultants already anyway!

o Most all of our data is in Oracle which means it is virtually centralized already.

o We already have a DW-like database that replicates certain data away from the OLTP's, we can continue to enhance this.

o Oracle, especially in later versions, has tons of DW functionality that we are already familiar with in-house.

o We're too small of a shop to be considering a full-fledged DW.

...

I agree with your statement that a DW is not something you buy but rather something you do.


I am curious if I am making valid points here and am interested in your input about the decision process regarding the development of a data warehouse.

Thanks, as always.

Tom Kyte
January 04, 2003 - 3:20 pm UTC

with a budget of 10k to low 100k's you probably won't get very far. If you go the consulting route, you'll burn that doing requirements collection!

If the end users don't have requirements -- tell them to stop complaining or develop some requirements. I agree with the consultant route you talk of. Better to bring in warm bodies (temps) to admin the existing systems whilst you (who will be there AFTERWARDS) do the knowledge gathering. You will after all be responsible for it after the fact. If you don't have DW experience, maybe you bring in one "expert" you trust to help you with coming up with the plan -- but do the work yourself.

If you already have a DW like database -- it sounds like the place to start and grow from....

Business reports

Raj, January 05, 2003 - 12:02 pm UTC

Hi Tom,

As I understand, the main objective of a DWH is for reporting. You can get sales, invoice, inventory etc etc information from a DWH.

Can you suggest some books which can give us a detailed idea about the kinds of reports/data we can extract from a DWH and how that data will be useful to the business.

Thanks.

Sean, January 07, 2003 - 9:05 pm UTC

I might certainly have missed something here. How could a reporting application perform well on a normalized database, which someone claims is the data warehouse? Most reporting apps are basically ad hoc queries, with lots and lots of aggregated summing, sorting and grouping, and the joins are complicated and involve many tables. Certainly this kind of application would not perform on the "data warehouse".

what and how to define dimension tables, fact tables are normalized?

A reader, November 20, 2003 - 2:25 pm UTC

Hi

How do we define dimensions in a datawarehouse?
For example, we have data for a product category dimension which would have items such as drinks, food and clothing, and a product sub-category which would have items such as cold drinks, hot drinks, sea food, fast food, t-shirts, shirts, pants, etc.

We would like to do drill-down and rollup analysis on a sales fact table, how would we define the dimensions?

I also read that the fact table is normalized, but I always understood that fact tables are denormalized -- could you give a brief explanation?

Cheers

Tom Kyte
November 21, 2003 - 4:32 pm UTC

have you read the data warehousing guide ? it is on OTN. it describes such topics. for the nitty gritty details, a book on data warehousing concepts would be called for.

difference between Oracle Dimensions and Plain table dimensions

A reader, November 22, 2003 - 3:19 am UTC

Hi

What's the difference between dimensions created by

CREATE DIMENSION

and dimensions created by us, such as a simple

CREATE TABLE?

Tom Kyte
November 22, 2003 - 9:06 am UTC

A dimension is a schema object that defines hierarchical relationships between columns or column sets. A hierarchical relationship is a functional dependency from one level of a hierarchy to the next one. A dimension is a container of logical relationships and does not require any space in the database. A typical dimension is city, state (or province), region, and country.

In short, the create dimension adds META DATA to the database that can be used for query optimization.

Not to be confused with dimension tables you create for star schemas.

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96520/dimensio.htm#11221
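For illustration, a minimal sketch of such a dimension object (the LOCATIONS table and its columns are hypothetical):

create dimension geog_dim
  level city    is locations.city
  level state   is locations.state
  level country is locations.country
  hierarchy geo_rollup
  ( city    child of
    state   child of
    country
  );

This tells the optimizer that city rolls up to state and state rolls up to country, so query rewrite can use a materialized view summarized at one level to answer a question asked at a higher level.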



Empty DW

Dan Clamage, November 24, 2003 - 2:02 pm UTC

The dumbest DW-related remark I ever heard was from a project manager who wanted to build an "empty" Data Warehouse as a "proof of concept". I said you mean don't actually load any data into it? What kind of proof of concept is that? That's not what he meant. I asked him what data did he want to model? He said no data modeling, just a DW engine. I said like OWB? He said no just a DW that has no specific data stores defined. That's when I realized he had no clue what he was talking about, and I pretty much said so. Finally, he said "Let's agree to disagree". I said NO you're wrong! The idea of building a non-descript "empty" Data Warehouse where you have no clue as to what you're putting in it, is nonsensical; it can't be built because there's nothing to build. This from the same idiot who wrote "Assumptions identify things that we are considering and not taking into consideration." A AND NOT A.
If you don't know what you're doing, it'll manifest itself eventually.

Dimension is exactly a star schema dimension

Stewart Bryson, December 03, 2003 - 4:01 pm UTC

The use of the "create dimension..." SQL statement is to specify a dimension table as such for Oracle. The reason this is done is so that the Summary Advisor is knowledgeable of the relationship, as is the query rewrite functionality.

To say that a "dimension" should not be confused with a dimension table in a star schema is false. In order for many of Oracle's advanced warehousing features to be used, it has to be aware of which tables are dimension tables. This is the functionality that does this.

If you carefully read the very page that you included a link for, this is made quite clear.


Tom Kyte
December 03, 2003 - 6:51 pm UTC

how is it false exactly?

I use create dimension to specify rollup hierarchies for MV rewrite.


A dimension table in a star schema is a dimension table.

A 'dimension' created by "create dimension" is NOT a table, it is meta data that describes relationships.

If you believe they are the same, you are mistaken. They are not even remotely "the same"

that would be like saying a primary key is a table. a table is a primary key. (loosely speaking)

dimension description

Stewart Bryson, December 04, 2003 - 12:11 pm UTC

I never said they were "the same". But they are related.

I don't think your primary key/table example holds water, nor is it exactly representative of the point you and I are debating.

You could say that a many-to-one relationship existed between two tables. To ensure that relationship is maintained, you could put a foreign key on the "many" table pointing to the primary key on the "one" table. It is true to say that the foreign key is not the same as the relationship, I agree with you there. However, I think it is misleading to say that the foreign key should not be confused with the relationship... it should be considered representative of the relationship.

I thought the statement "Not to be confused with dimension tables you create for star schemas" was confusing because, just as a foreign key represents a certain relationship to the RDBMS, so does the dimension facility. When you have a dimension table that you are planning on rolling-up, you should designate it as such with the logical functionality of "create dimension..." so query rewrite will know about it.

I do concede your point... and I shouldn't have said your statement was false. But I do believe it was misleading.

Thanks for the dialogue.

Tom Kyte
December 04, 2003 - 12:35 pm UTC

we'll have to agree to disagree i suppose.


the question was:

What's the difference between dimensions created by CREATE DIMENSION
and dimensions created by us, such as a simple CREATE TABLE?


I would answer the same way. A dimension created via "create dimension" is not a table. A table created via "create table" is not a "dimension that describes a hierarchy".


I would say it the same all over again.

from the doc:

"In Oracle9i, the dimensional information itself is stored in a dimension table. In addition, the database object dimension helps to organize and group dimensional information into hierarchies."


the dimensional information is stored in a dimension table.

the database object "dimension" helps to organize and group dimensional information into hierarchies.



the "create dimension" object should NOT be confused with a dimension database table. they are two distinctly different beasts that may well be used at about the same time -- but they are two totally different things entirely.


"create table" --> something to store data in
"create dimension" --> method for you to put more metadata into the dictionary.

Riddle

noname, December 31, 2003 - 10:55 am UTC

When is a door not a door?
When it's ajar.

>primary key is a table. a table is a primary key
In the narrowest technical sense, this could be true when all columns in the table are also in the primary key. But that's not what you meant of course.

Metadata is data about the data, not the data itself. And a table is not the data, but the organization of the data. And the representation of an object is not the same thing as the real object being modeled.

This was not intended to be a philosophical discussion.

Indexes and FTS

Arun Gupta, February 13, 2004 - 9:23 am UTC

Tom,
Is it normal to NOT have indexes on fact tables in a star schema? Is it normal to do a full table scan of a 13 million row fact table and hash join with a dimension table?
Thanks


Tom Kyte
February 13, 2004 - 10:35 am UTC

not really -- and indexes are NOT the only game in town

star schemas can bitmap index the foreign keys to enable star transformations.

you use bitmap join indexes sometimes to denormalize the structures without denormalizing them.

many fact tables are partitioned (similar to the concept behind an index -- reduce the data that needs to be scanned, via partition elimination)

MV's are used to pre-aggregate/join things.
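For illustration -- a minimal sketch of a couple of these techniques (table, column and index names are hypothetical; LOCAL assumes the fact table is partitioned, and the join index assumes cust_id is the primary key of customers):

alter session set star_transformation_enabled = true;

create bitmap index sales_fact_cust_bix
  on sales_fact (cust_id) local nologging;

-- a bitmap join index "denormalizes without denormalizing":
create bitmap index sales_fact_cust_state_bjx
  on sales_fact (customers.state)
  from sales_fact, customers
  where sales_fact.cust_id = customers.cust_id;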

Question about laying out Facts and Dimensions

A reader, April 23, 2004 - 8:51 am UTC

Tom,
I have a fact table called say automobile and dimensions around this fact table called automobile_type, automobile_segment.
I have a second fact table called automobile_certificates (each automobile has 100's of certificates) with multiple dimensions around it.
The question I have is
1) Can the fact table automobile become a dimension table for the 2nd fact table automobile_certificates?
In general can a fact table become a dimension table for a 2nd fact table?

Thank you


Tom Kyte
April 23, 2004 - 1:32 pm UTC

in a manner of speaking sure, if the cert table is the "center" and you join it to N other tables, one of which is the automobile table -- it would be a de-facto "dimension" of sorts.

A reader, September 01, 2004 - 10:43 am UTC

My current database is a DSS of sorts with nightly ETL and batch loads. During the day, there are a handful of legacy client-server apps, VB apps, ASPX pages, etc that mostly query the data. Total size is about 20GB

Business has expressed the need to capture historical information, report off of it, etc. One school of thought is to create a separate database on a new server to store this, because the history (DW) could "bring the server to its knees". It's already a DSS and any query today could do the same!

What are your thoughts on this? I prefer to keep everything in one server one database (adding hardware if necessary) and tuning resource allocation using Resource Manager, user profiles, etc

Tom Kyte
September 01, 2004 - 10:54 am UTC

i agree 100% with your last paragraph.

the fewer servers, the better off you are. the fewer databases, the more information you have.

A reader, September 01, 2004 - 1:33 pm UTC

Right, I knew you would agree with me, but can you provide me with some "ammunition" to support my position?

I hate it when people suggest cloning/copying databases for no good reason. It is this whole MS mentality; they think copying an Oracle database is just like Ctrl-C/Ctrl-V on an MDB file, for god's sake!

Tom Kyte
September 01, 2004 - 2:24 pm UTC

the ammo would be

a) oracle can handle lots of stuff in a single database
b) a single database is cheaper and easier to manage
c) distributed complexity is just that -- complex
d) did i mention - it will be cheaper and we can have it tomorrow


I always turn the tables around, what is their evidence or science that says they need another database.

It only makes for confused users …

Gabe, September 02, 2004 - 1:19 pm UTC

<quote>In general can a fact table become a dimension table for a 2nd fact table?</quote>

<quote>in a manner of speaking sure</quote>

Well, it would be a very confusing manner of speaking.

What exactly would they “query by” from the fact-now-dimension table?
… by surrogate PK values? … they have no meaning.
… by measure values? … they have no meaning either if not put in the context of the original dimensions.

So a query from the 2nd fact table which uses _something_ from the 1st fact table would have to get back to the original dimensions to get some meaning/context … if this is the case, why not implement FKs for the 2nd fact table straight to the original dimensions (hence keeping, conceptually, facts as facts and dimensions as dimensions)? …

This would be consistent with a DW having conformed dimensions … having a query (or chain of queries) showing and/or even filtering by measure values from both fact tables would be in line with drilling across (jumping from star to star by _anchoring_ to common dimensions).

To me, implementing the manner of speaking “fact table become a dimension table for a 2nd fact table” would require implementing a FK in the 2nd fact to the PK from the 1st one … and what exactly would be the point of that? Now an “automobile” having “certificates” is predicated on the prior existence of the measures being tracked in the “automobile” fact … what if they don’t exist (or don’t exist yet) … cannot get my “certificates” in!

Alternatively, let us forget about dimension and fact tables and just use the more traditional parent and child terminology.

Thanks.


A reader, October 28, 2004 - 7:34 am UTC

"a) oracle can handle lots of stuff in a single database
b) a single database is cheaper and easier to manage
c) distributed complexity is just that -- complex
d) did i mention - it will be cheaper and we can have it tomorrow"

Regarding (a), yes, Oracle can handle lots of stuff. But let's say I put a large DW and a smallish DSS/OLTP-type system with a few web-based and client-server apps all in one Oracle database.

How can I prevent the DW-type queries, with their FTS and large data retrievals, etc., from overwhelming the resources of the database? They would consume most of the buffer cache, use up a lot of IO bandwidth, etc. How can I make it such that everyone is happy?

[Yes, I know RAC is one solution, just put the DW on its own RAC node and everyone else on the other RAC node, but we are not quite there yet, I am curious to know if there are any non-RAC solutions]

Tom Kyte
October 28, 2004 - 12:26 pm UTC

you can use resource management to allocate resources. and using physical separation of applications by physical disk/device would speak also to the IO contention.

keep pools, recycle pools -- maybe -- maybe -- maybe even different block sizes address buffer pool issues (however, a cache is pretty darn good at keeping things it needs in the cache and things it doesn't need out of the cache -- a DW isn't going to "flush the cache" necessarily).

you can use clustering to physically dedicate resources.
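For illustration, a minimal sketch of using the Resource Manager to cap what DW-type sessions can consume (the plan and group names are hypothetical):

begin
  dbms_resource_manager.create_pending_area;

  dbms_resource_manager.create_plan( plan => 'MIXED_PLAN', comment => 'OLTP + DW on one database' );
  dbms_resource_manager.create_consumer_group( consumer_group => 'DW_GROUP', comment => 'warehouse queries' );

  dbms_resource_manager.create_plan_directive(
    plan => 'MIXED_PLAN', group_or_subplan => 'DW_GROUP',
    comment => 'cap the warehouse work', cpu_p1 => 25 );
  dbms_resource_manager.create_plan_directive(
    plan => 'MIXED_PLAN', group_or_subplan => 'OTHER_GROUPS',
    comment => 'everything else', cpu_p1 => 75 );

  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
end;
/

alter system set resource_manager_plan = 'MIXED_PLAN';

You would then map the warehouse users to DW_GROUP with dbms_resource_manager_privs.grant_switch_consumer_group and dbms_resource_manager.set_initial_consumer_group.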



Clustering

A reader, October 28, 2004 - 12:45 pm UTC

"and using physical separation of applications by physical disk/device would speak also to the IO contention"

We are most probably going to upgrade to 10g within the next few months. 10g's ASM would make all these kinds of IO hotspot and contention issues moot, right? Just throw a bunch of disks at it and voila, IO nirvana? :)

"you can use clustering to physically dedicate resources"

Not sure what you mean by that, are you refering to RAC? Or can I use clustering without using RAC? How?

Tom Kyte
October 28, 2004 - 7:21 pm UTC

well, it depends -- if you want the best overall performance for everyone -- stripe it everywhere.

if you want to keep DW over there and OLTP over there, you'll be using disk groups to segregate the IO over there and over here.


Yes, I was referring to RAC if you want to further say "I want 8 cpus for DW and 4 for OLTP and 8 gig of ram for OLTP and 4 gig for DW"

Bill Inmon or Ralph Kimball ?

A reader, February 24, 2005 - 7:45 am UTC

Hi Tom,

I have been working as an Oracle DBA and would like to get into data warehousing. After doing some initial research, I came to know that there are 2 camps -- Bill Inmon and Ralph Kimball. Now I want to buy a book, but I am not sure whom I should follow. Will you please help me out of this?



Tom Kyte
February 24, 2005 - 8:26 am UTC

read both and use the best of each - neither is 100% right or wrong, all have techniques, techniques that are applicable in different cases. Read them as "advice", not as scripture. (like my stuff, advice, no absolutes)

To a reader - Bill Inmon or Ralph Kimball?

Peter, February 24, 2005 - 11:05 am UTC

My view: neither of them is 100% right (or 100% wrong)! There is no such thing as the "one and only true" methodology for DWH.
I manage many multi-terabyte systems for different clients, each has a distinct approach that fits their business needs (such as how slowly changing dimensions are handled) and performance requirements. In my book (and perhaps I should write one!) getting the most from DWH requires an open and questioning approach and a lack of fear in trying (and benchmarking) alternative techniques.
The other thing to note for aspiring DWH people is that the techniques available to maximize performance vary greatly between database vendors -- DWH design is not a commodity or black box project.

Tom - like the new fair warning

Tom Kyte
February 24, 2005 - 5:01 pm UTC

hey, I could have written that response!....



Building a reporting model

Atika, February 24, 2005 - 11:25 am UTC

Tom,

We are developing an OLTP system with 500+ tables, highly normalized with several associative tables. I am planning to build a separate reporting model (a kind of data mart) to serve the needs of our reports so that we will have a clean OLTP system (no denormalization and no views). Should I still build a dimensional model, or maybe I can create a data mart with denormalized tables (similar to fact tables) with foreign key relationships to other tables? I do have hierarchy and level needs for my reports.

Thanks

Atika

Tom Kyte
February 24, 2005 - 5:02 pm UTC

<quote>
Should I still build a
dimensional model, or maybe I can create a data mart with denormalized tables
</quote>

yes yes and yes
or
no maybe yes
or
yes no maybe
or
maybe maybe maybe
or
....................


what makes the most sense for you with your requirements?

Bill Inmon or Ralph Kimball?

A reader, February 24, 2005 - 11:27 pm UTC

Thanks Tom and Peter.

<Peter>
The other thing to note for aspiring DWH people is that the techniques available to maximize performance vary greatly between database vendors
</Peter>

Does this mean that I will have to master other RDBMSs too, to be a successful DW engineer?

Thanks


Tom Kyte
February 25, 2005 - 4:50 pm UTC

Only if you want to use them -- yes.

Otherwise you'll try to do things "Oracle wise" in DB2 and that isn't going to work very well. And just try to apply Oracle concepts to Teradata -- like "trickle feeds" -- constantly loading an operational data store -- not going to happen in that database, could easily in Oracle.

The databases have radically different features.

master other RDBMS ?

Peter, February 25, 2005 - 4:17 am UTC

Well....
I only do Oracle DWH and make a good living from it -- I do have a general awareness of other vendors' products (Teradata, SQL Server), but not (in my opinion) strong enough to run a project using them (am I too honest for this job?). There are a lot of Oracle DWH projects out there, and a good understanding of how Oracle works is an enormous boost. I would go with "know one vendor well".

Tom - Sorry, I think I rewrote your reply above! - what a waste of rows in your database!

Bill Inmon or Ralph Kimball?

A reader, February 26, 2005 - 12:19 am UTC

Thanks Tom and Peter once again.

I think I will learn DW the following way:

1.The Data Warehouse Lifecycle Toolkit : Expert Methods for Designing, Developing, and Deploying Data Warehouses by Ralph Kimball
2.Essential Oracle8i Data Warehousing: Designing, Building, and Managing Oracle Data Warehouses by Gary Dodge, Tim Gorman
and finally
3. DB2 UDB v8 Handbook for Windows and UNIX/Linux
by Philip K. Gunning

Any more Resources?



Tom Kyte
February 26, 2005 - 7:51 am UTC

don't understand #3 :)

Bill Inmon or Ralph Kimball?

A reader, February 27, 2005 - 11:42 pm UTC

<TOM>
Otherwise you'll try to do things "Oracle wise" in DB2 and that isn't going to work very well.
</TOM>

#3. Just to get a FEEL for it, because I suppose after Oracle it is the most Popular RDBMS and will likely encounter it someday.

Also forgot to include
Data Warehousing Guide from Oracle Documentation ;)

Real-Time DW using Oracle

Tony, March 28, 2005 - 5:24 am UTC

Is it possible to build a real-time data warehouse using Oracle DB? If yes, how? Is there any DB feature or tool from Oracle Corp for this purpose?



Tom Kyte
March 28, 2005 - 7:51 am UTC

define "real time" for me. If you mean "trickle fed" (continously updated), sure. replication, streams, change data capture, lots of stuff do that.

A Strange Situation - help appreciated

Mark, June 27, 2005 - 2:30 pm UTC

RE: Oracle Standard 8.1.7.4 Windows

No partitioning, etc...

My situation is thus:

We created a mechanism to create base tables in our OLTP application based on a high-level company id, to reduce the number of rows in the base table for each company. Our problem, however, is that since we are doing this on the LIVE database and need it archived to the Read Only database, it creates amazing amounts of archive logs. This problem will only get worse as we enable more clients with our reporting tool.

So now, the 'Big Cheese' has decided that he wants me to develop a set of Oracle views to replace the base tables to eliminate the archive logging. This I can do, no problem, but the performance ... bites.

Am I missing some mechanism to accomplish good performance on my Read Only DB with respect to the reporting we are doing? They completely balked at my suggestion of using MV's due to the archive logging taking place...

At a loss.

Thanks in Advance


Tom Kyte
June 27, 2005 - 2:51 pm UTC

You'll have to detail why your approach generates an "amazing amount of archives" and how a view will help anything?

The Deal

Mark, June 27, 2005 - 4:04 pm UTC

I suspect you are leading me to the answer here, so I will give you more information, as you are the best...

Ok, what we did was write Packages to create base tables named in a standard way with the company Id appended to the end of them, like MY_DATA_1234 and MY_DATA_1235, for example. There might be 7 or 8 per company. This table creation happened once, when you turned a company 'on' for reporting.

Then, daily, beginning at 8:30 PM, we would insert rows into these company-specific base tables for all companies that are turned 'on' for reporting, based on some settings. This would generally be the previous day's data. This is where the copious amounts of archive logs were generated from. We applied our logs every few hours, but a problem kept creeping in: the system would become bogged down at around 11 PM in the evening, because of the archive space filling up, especially if we 'turned on' several companies at once. It would need to construct the historical data from the actual tables.

Now, we have around 40 GIG of drive space available for archive duplexing, and this is periodically filling up.

Using views would preclude any archiving and use of the base tables by simply querying off the actual tables themselves, but performance is slow (fairly complex medical information).

I personally think the MV's are the way to go here with some creative CUBE and ROLLUP queries, but it is simply not my decision at this point. It's been dropped on me - sort of a 'Hey, you try it, I give up' thing from 'the other guy'.

What are your thoughts? Hope that is enough info for you.

p.s. The Backup/Read Only is available until 8:30 PM for reporting. Applying Archives continually is not an option at the moment (unless I am unaware of some trick or feature of Oracle that allows this).

Regards,
Mark

Tom Kyte
June 27, 2005 - 5:19 pm UTC

Sounds like you might want to direct path the data (insert /*+ APPEND */ into t select * from another_t)

Using that you will bypass UNDO on the table insert (but not the index, you will minimize undo generation on the indexes that are maintained, but you cannot bypass it)

Further, using the direct path, you can if you want bypass REDO generation on the TABLE (but not indexes) by setting the table nologging.


Only a full refresh MV would be able to bypass UNDO and REDO (it would use a truncate + insert /*+ APPEND */ to re-populate the table)
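A minimal sketch of what that looks like (the table names are hypothetical):

-- optionally, to skip redo on the table as well (the indexes still log):
alter table company_report_t nologging;

insert /*+ append */ into company_report_t
select * from live_source_t
 where company_id = 1234;

commit;

-- note: indexes on company_report_t still generate undo and redo; for a big
-- load it can pay to set them unusable first and rebuild them afterwards.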



hmmmm....

Mark, June 28, 2005 - 11:26 am UTC

"Further, using the direct path, you can if you want bypass REDO generation on the TABLE (but not indexes) by setting the table nologging."

Doesn't that defeat the purpose? If set to nologging, will the inserts on the live database not be entered into the archive logs for application to the backup/Read only/reporting database?

From Data Warehouse Guide:
"The no-logging mode improves performance because it generates much less log data. The user is responsible for backing up the data after a no-logging insert operation in order to be able to perform media recovery."

Tom, I know the right way to do this is to figure out what questions I want answered and develop properly indexed MV's for them. Unfortunately, it's not my decision to make, and if I had a dime for every time I told them this.... Am I missing something here?





Tom Kyte
June 28, 2005 - 11:36 am UTC

defeat the purpose of what exactly?

Yes, if you use non-logged operations, you must consider that in your backup procedures. however, if these tables are *BUILT* from other tables for reporting purposes,..... do you need to back them up.


Your stated goal seemed to be "lots of redo, how can we reduce that"

I don't know what you are missing, I don't fully understand the issue. If your goal is reduce redo, direct path operations to move data are the way to go

ok, more specifics.

Mark, June 28, 2005 - 11:59 am UTC

Ok, my fault. Sorry I did not explain more clearly.

2 databases: LIVE, and BACKUP/REPORTING

8.1.7.4 STANDARD

Our 'Backup' Database is used as a Reporting Database. It is simply flipped into Read Only mode during operating hours so clients can generate reports off it. It is flipped back into Back Up mode at night and the archive logs applied. Therefore, data typically is 1 day behind the LIVE database. That itself is not a problem. We can live with that. The goal is to move all reporting to the back-up/reporting database eventually to alleviate any resource problems on the LIVE database server.

"if these tables are *BUILT* from other tables for
reporting purposes,..... do you need to back them up."

The company-specific tables need to be backed up -- so they appear on the Reporting/Back Up database. Therefore, NOLOGGING on these tables is not an option, as it would not make entries into the archive logs and would not propagate across.

I guess I should use the APPEND hint, which will bypass UNDO. I'm a little fuzzy on what that really means. By using the APPEND hint on my LIVE database to do INSERTs into my company tables, will those newly inserted rows make it to my back-up database overnight when the archive logs are applied? And from an archiving perspective, what kind of savings on space can I expect percentage-wise? 25%? 50%?

Thanks a ton for helping me.


Tom Kyte
June 28, 2005 - 12:47 pm UTC

as long as the table is LOGGING, the redo will always be generated, yes.

In order to understand what to expect, you'll need to benchmark (test) your situation. It is always a case of "your mileage may vary".

25% or more would be probably in the realm of "unobtainable" by simply using an append. To achieve leaps that large, you'll be looking at totally new approaches, not tweaks of the existing.


Kalyan

Kalyana Chakravarthy k, August 24, 2005 - 10:10 am UTC

Hi There
I am interested in data warehousing and would like to know if it's necessary to be a DBA in order to learn data warehousing. If so, what could be the possible hassles in this regard? I am not a DBA and want to pursue a career in data warehousing. Please let me know if this will be a problem. In case it is, what can I do about it?

Kind regards
kalyan

Tom Kyte
August 24, 2005 - 5:36 pm UTC

define to me what you mean by being in "data warehousing"...

do you want to be the DBA of a data warehouse?
the architect behind it? the modeler?
the developer?

Data warehouse

Adrian, August 31, 2005 - 2:42 am UTC

However, I would like to know, why does a business have separate databases and a data warehouse?

Tom Kyte
August 31, 2005 - 1:25 pm UTC

data warehouses are typically long term, big, consolidations from multiple transactional systems

they need not be separate, it is just the convention.

David Aldridge http://oraclesponge.blogspot.com, August 31, 2005 - 1:44 pm UTC

My 2cents

There are some other issues, I think, whereby the tuning goals and query type on a data warehouse are so different to those of an OLTP system that it would make it difficult for the two to co-exist.

For example the widespread use of nologging operations in a DWh makes archive logging mode practically redundant as it would only be logging a small fraction of the data changes. So you could run a DWh in nologging mode, whereas you'd be crazy to do so with an OLTP system.

It would also be tricky, Resource Manager notwithstanding, to maintain a balance between required OLTP performance and DWh activities.

I expect there are some other issues -- the relative sizing of PGA and SGA for example -- that would push me towards separate databases even if the DWh was fed from a single OLTP system.

Tom Kyte
August 31, 2005 - 2:17 pm UTC

think clusters ;)

materialized views to ETL some of the data from OLTP to DW, separation of disk almost, separate instances, separate cpu, separate memory.

Tom Kyte Rockssssss !!!!

Jay, August 31, 2005 - 2:51 pm UTC

Tom....

Only one question.......

'Do you get paid for answering these questions???'

If so...how much..??

If u need an assistant, I'm ready to assist you..!!!!



Tom Kyte
August 31, 2005 - 3:32 pm UTC

yes, I get my salary.

SORRY and thank god u arent angry!!!!

Jay, September 01, 2005 - 12:59 am UTC

Hey tom...thanks for taking ur time to reply to that dumb message!!!!

I was just so mad at myself after sending that message...(i'm talking about the 'if so..how much?' part)..I thought of editing it but oh well, it was just too late for that.

I actually didnt mean it the way it sounded...hope u took that as a joke. I know very well that its totally unethical to ask anyone such a question...I just got a little bit carried away I guess. I have been following your messages for a while now though I aint an oracle person.

I just totally admire your skills tom..Like I said..YOU ROCK!!!!

Cheers
Jay


P.S. I'm still waiting to hear on the response to my offer of becoming your assistant...(typing, p.a. etc., :-) ) Do you have some one who sorts all this for you or is all done by JUST U. How do you even get the time to do this??? Okay....i'll stop bugging you with this message..have a wonderful day Tom.

Tom Kyte
September 01, 2005 - 1:41 am UTC

no worries, I wasn't "angry".

And I pretty much ignore "personal" questions, things I don't think others need to know (about me)

Building a Data warehouse

Vikas Khanna, September 29, 2005 - 9:08 am UTC

Hi Tom,

We are building a DW where the captured logs in RAW format are pushed in using the SQL*Loader utility. The cron jobs are executed every hour and the load keeps on getting into the tables. The whole process is extremely slow. What I suggested to the developers:

Right now we should try to improve on these numbers in the same environment:

1. by loading the data using DIRECT option where in sqlldr will bypass the SQL evaluation buffer and will write directly to the datafiles, thus generating minimal redo.

2. We should alter the indexes to an unusable state before loading the data so that the indexes aren't automatically updated for every row that gets loaded into the table. After the loading of data is done we should rebuild the indexes online for the objects we have pumped the data into.

3. We shouldn't commit as frequently as the default sqlldr options suggest. Rather, we should give ROWS = some big number so that the commit occurs only once, thus making LGWR flush the redo contents from the log buffer occasionally rather than making it work more frequently.

However, the answer was that we cannot use the direct=true option because we have referential integrity constraints enabled and we are not sure the parent record is with us when we have the child records. It's just a question of timing -- when the log puller pulls the raw log file and SQL*Loader fills the data in.

I think we still can create scratch tables with the table name appended with _tmp, flush the data in using the direct=true option, and then, using the INSERT /*+ APPEND */ hint, move the data back into the original tables. This will move the HWM to a new value and generate minimal redo (provided the tables are NOLOGGING).
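Roughly, something along these lines (the file, control-file and table names are hypothetical):

-- load the scratch table with direct path
sqlldr userid=dw_owner control=raw_logs.ctl data=raw_logs.dat direct=true

-- then move the rows into the real table
insert /*+ append */ into raw_logs
select * from raw_logs_tmp;

commit;

truncate table raw_logs_tmp;

-- note: if raw_logs has enabled referential constraints or triggers, the APPEND
-- hint is silently ignored and a conventional insert is done instead.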

We are still to evaluate the partitioning concepts, since we are loading the data every hour into the real tables. Is there any other suggested approach which can make the load faster?

Thanks for your comments!

Tom Kyte
September 30, 2005 - 7:45 am UTC

have you looked at "what is slow" - before I even look at something like this, I want to understand "what is the bottleneck now".

you have good suggestions, but if they don't address the problem, they are not useful. In fact, they could make it all much slower (if you add 1% new records to a table, disabling and then rebuilding the indexes would probably not make anything faster).

What is the root cause of the "slowness".

I cannot even define "extremely slow", you may well have unreasonable expectations (or not)


I would be more tempted to parallelize this -- have the input files split into many smaller ones and load them concurrently. But, one would need to know more about "what's wrong" first.

datawarehousing expertise

daniel, October 15, 2005 - 5:10 pm UTC

Hi Tom, I quite agree with you regarding the data warehouse expertise required, especially your joke about the grandchildren finishing the export/import.
Unfortunately I am in a mess: I have been asked to organise the project of moving a 900 GB data warehouse from AIX/9i to HP/9i. The platforms are different and I am on 9i. What options do I have, besides exp/imp?
I am thinking of the SQL*Plus COPY command?

Also, since the data is partitioned, I would start copying old months' data, and hopefully within a month I would be able to populate at least 95% of the data on HP via exp/imp.
Then I would announce a cut-off weekend, transfer the current month's partition and ask production to run the batch on the new HP system from then onwards.
But is there any better solution to speed this up?
cheers


Tom Kyte
October 16, 2005 - 7:53 am UTC

sqlplus copy for some.

database links with create table as select or insert /*+ append */

unload to flat file, reload with external table/sqlldr

running many exports in parallel and imports in parallel
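For illustration, a minimal sketch of the database link approach (the link and table names are hypothetical):

create table sales_2004 nologging parallel
as
select * from sales_2004@aix_prod;

-- or, into a pre-created (empty) table:
insert /*+ append */ into sales_2004
select * from sales_2004@aix_prod;

commit;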

using Views in DWH

A reader, October 25, 2005 - 9:04 am UTC

Hi

I have just been asked to support a DWH.

I have had a look at some queries and I noticed extensive use of views to hide complex queries. There are queries which query views, and those views are made up of more views.

I wonder if this can make the CBO go MAD and generate not-so-optimal execution plans? Or will the CBO simply rewrite the query into plain SQL?

Cheers




Tom Kyte
October 26, 2005 - 7:18 am UTC

In many cases - it won't "matter", the CBO rewrites the views into the query and optimizes that.

However, if you

create view v
as
select t1.x, t2.y
from t1 left join t2 on (t1.id = t2.id)
/

but you only query:


select x from v;


then you are doing some work you "don't need to have done" IF T1 is one to one with T2 - the left join to T2 - didn't have to happen for "select x from v", but the database will do it anyway.

Also, there are cases with aggregates and analytics whereby predicates cannot be "pushed" into the view(s) (because it would change the answer) - but sometimes that is what you wanted anyway (the changed answer). In that case the view can get in the way.


In general - views are good, but you do need to understand what the view is doing and judge whether it is appropriate for the question being asked....



RE: junk dimension

Bakunian, February 28, 2006 - 12:40 pm UTC

Tom,

I could not find any reference to this on your site, therefore I am posting this question here.
I assume that you are familiar with "junk dimensions" -- could you shed some light on the instances in which you think it is appropriate to use them?

Thank you in advance for your time


Tom Kyte
March 01, 2006 - 7:53 am UTC

You assumed incorrectly.

but google seemed to know.

Junk dimension

Bakunian, March 01, 2006 - 12:35 pm UTC

Heh-heh, but I don't trust everything Google says. Can you suggest a good book on data warehousing?


Tom Kyte
March 01, 2006 - 1:51 pm UTC

I'm not personally familiar with one.

OLTP & DW together

A reader, March 01, 2006 - 2:40 pm UTC

Tom, you're one that always advises running One Database -> Many applications, but what about when the applications are of different architectures? That is, applications running on the same DB but in different schemas are of different kinds of architecture. Would this advice still be valid? I would assume DW and OLTP have different requirements at the DB level (parameters, datafile organization, security, etc.).

Thanks!

Tom Kyte
March 02, 2006 - 8:36 am UTC

different schemas = perfection.

but ask yourself - would you run a DW and OLTP system on the same machine with two databases?

My point is:

ONE machine = ONE database instance


So, would I run OLTP and DW on the same machine? Likely not.

So, it is not really an answerable question since you would not have put your DW on the same box as OLTP in the first place.




largest oracle database you know...

Ryan, March 01, 2006 - 5:37 pm UTC

can you tell us the largest oracle database in terms of bytes that you know exists?

also, do you know the largest transactional Oracle database that exists and how much traffic it has?

what are the biggest databases you ever worked on?



Tom Kyte
March 02, 2006 - 9:09 am UTC

go to the Winter Corp site, they have information on large databases.

http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenWinners_2005.asp

convert oltp database to dataware house database

sachin, April 07, 2006 - 1:47 am UTC

Hi Tom,

I have created the database in interactive mode without selecting any options like data warehouse. Now users are complaining that the Java components are not working and are asking me to create the database using the data warehouse option. Is there any way to convert this database to a data warehouse database using any packages or SQL scripts?

Thanks in advance,
Sachin

Tom Kyte
April 08, 2006 - 8:40 am UTC

er?


every Oracle database is an "oltp" database, and every one could be a data warehouse as well.


What java components are not working - and why does that make you want to install a "data warehouse" version of Oracle (not that such a beast exists really)

DW and OLTP together

Syed, May 08, 2006 - 7:31 am UTC

Hi Tom,

We have a clustered environment where we are planning to have the data warehouse and OLTP on a single machine. They will be in different schemas.

Please tell me the pros and cons which I need to consider now.

Thanks,

Tom Kyte
May 08, 2006 - 8:24 am UTC

"we have a clustered environment..."
"on a single machine..."

can you clarify - will you have a clustered environment, or a single machine?

Data Warehouse and OLTP together

Syed, May 08, 2006 - 10:03 am UTC

Hi Tom,

In the clustered environment, should we have a single instance of Oracle running in support of both the OLTP and DW environments, or a separate instance for each? What are the pros and cons?

Thanks

Tom Kyte
May 08, 2006 - 10:51 am UTC

still not following - if you have a clustered environment, you by definition would have an instance of Oracle on each node in the cluster.

So, is the word "clustered" here a red herring, are you using RAC or not?

DW and OLTP together

Syed, May 08, 2006 - 11:00 am UTC

Yes we are using RAC

Tom Kyte
May 08, 2006 - 1:27 pm UTC

Ok, so there is a SINGLE DATABASE

there are many instances, and each instance is on its own machine.

You will NOT have a single instance - you will not have a single machine.


so, going back to the original question - why would we be talking about a single instance on a single machine?

Dave, May 09, 2006 - 4:12 am UTC

I think he means putting the dw and oltp stuff in the same database as opposed to having two databases.



Tom Kyte
May 09, 2006 - 7:59 am UTC

I'm trying to get him to say what he means.

Not guess.

With RAC there would be a single database, multiple instances.

methods of copying large data from production to warehouse

Kubilay, May 26, 2006 - 7:44 am UTC

Hello Tom!

I have to move some tables from the production system to the data warehouse in order to avoid doing joins over db_links between production and datawarehouse, as tables in the data warehouse are not sufficient to answer all queries from management. The move is towards creating an ODS.

Some of these production tables are already populating warehouse tables via nightly ETL, which is fine.

But there are some brand new tables, mostly static lookup info, which I have to move and get refreshed every night via ETL or MV replication.

In particular, there is a highly transactional one - a users table with 2 million rows - which we want to have available on the warehouse as well for querying.

We need this table at the warehouse since we don't want warehouse queries to reference this table remotely on production - it is already a very busy table on production, and that makes the queries very expensive.

1) What would you recommend as a method for the initial copy of this 2-million-row (and growing) table from production to the warehouse?

* Impdp/expdp
* Create table or maybe materialized view as... via db_link with nologging?
* SQL*Loader load
* Tablespace transport, and then drop the other objects which I don't want from the tablespace?

2) Afterwards, how would you recommend refreshing this table, which at times can be highly transactional on the production system?

I think MV replication is the one I must choose, since the data on the warehouse sometimes also needs to be quite up to date - or at least I want the choice of when to update it during the day via a refresh.

Many thanks for all your help and guidance.

Best Regards

Kubilay




Tom Kyte
May 26, 2006 - 8:57 am UTC

probably - PROBABLY - an incrementally refreshed materialized view. 2 million rows is pretty small.

Your users table is "highly transactional" - you'll need to be more clear. a users table sounds like "static stuff" mostly.

Data warehouse and OLTP apps on same DB/Server

A reader, May 26, 2006 - 9:21 am UTC

Tom, what is your opinion about having OLTP and DW apps both running on the same server/DB? One would ask if the underlying configuration of a DB needs to be changed depending on the type of application it will run. I would expect this to be _true_.

Tom Kyte
May 27, 2006 - 9:16 pm UTC

It can work, it does work, many people do it. All depends on your definition of "DW" as well - for many people that is really "i do some big reports".

The underlying configuration need not be different - most things are session settable these days

Rahul, May 27, 2006 - 12:20 am UTC

Reader,

The number of databases per server that Tom has recommended multiple times on this site is "one".




methods of copying large data from production to warehouse

Kubilay, May 30, 2006 - 6:25 pm UTC

Hi Tom

Many thanks for your answer!

OK, I agree with you, I need to clarify, sorry for not doing it in the first posting.

The users (big_table) table is not highly transactional in the sense of hundreds of hits per second; it is a large table, though - lots of users, 2 million rows and 50 columns (a 400MB+ segment).

I really want to find out how I can create this materialized view. I am planning to add the creation of the materialized view(s) to the nightly ETL process, get everything copied in one night from production to the warehouse, and then let it incrementally refresh once a night via its mlog$ log.

Are you suggesting I create it over a db_link like

Create Materialized view blah nologging
refresh fast on demand with primary key
as select * from big_table@someplace;

Is there any other method for this initial creation of the MV which will make it quicker on such a large table?

What if the big_table table is already composite range-hash partitioned? Will it make any difference to the refresh times (speed)?

Many thanks

Kubilay



Tom Kyte
May 30, 2006 - 7:14 pm UTC

400mb - tiny, I have USB drives for my laptop that are much larger.

Are you "hypothesizing" this is "big and hence slow" or..... :)

I do not anticipate this being "slow". It really isn't that big.
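To make the approach concrete, here is a minimal sketch under assumed names (prod_link is a hypothetical database link, big_table the production table):

-- on the production database: create the change log used for fast refresh
create materialized view log on big_table with primary key;

-- on the warehouse: the initial build is a one-time complete load...
create materialized view big_table_mv
nologging
refresh fast on demand with primary key
as select * from big_table@prod_link;

-- ...after which the nightly ETL only pulls the changes recorded in the log
exec dbms_mview.refresh( 'BIG_TABLE_MV', 'F' )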

IOT for Dimension tables

Suvendu, August 19, 2006 - 9:08 am UTC

Hi Tom,

Is it advisable to create all dimension tables in a data warehouse as index organized tables?
Could you please elaborate on the advantages and disadvantages of choosing IOTs for dimension tables?

I was going through one URL: http://www.databasejournal.com/features/oracle/article.php/3304791 - they are going for IOTs for dimension tables.

Your recommendations are always highly appreciated.

Thanking you.

Regards,
Suvendu


Tom Kyte
August 19, 2006 - 10:36 am UTC

they are not ALWAYS going (they being David Aldridge in this case). He points out at least one case where you might consider *not* going for them.

I'll add another - if you frequently access your dimension table via secondary (non primary key) indices as well, and the dimensions are inserted into over time, we might reconsider an IOT as the secondary indices can become inefficient over time due to the inserts and constant movement of data in the IOT.
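For reference, a minimal sketch of a dimension built as an IOT (the names and the INCLUDING split are assumptions, not a recommendation):

create table product_dim
( product_id    number primary key,
  product_name  varchar2(100),
  category      varchar2(50),
  long_desc     varchar2(4000)
)
organization index
including category   -- columns up to CATEGORY live in the index leaf blocks
overflow;            -- wider, rarely used columns spill to the overflow segment

Any secondary index you later add (say on product_name) stores logical rowids with "guesses", which is exactly the structure that can degrade over time as rows move, per the caveat above.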

DW Analyst

HK, November 28, 2006 - 2:13 pm UTC

Tom,

What key skills (technical or otherwise) are required for a data warehouse analyst?

We are currently looking for a person for this role, and we would like to define the key required skills. Any help in this direction will be much appreciated.

Thanks & Regards,

HK

Tom Kyte
November 28, 2006 - 7:56 pm UTC

I would say if you are hiring someone that you want to have run your warehouse - to be the "lead", then:

o a proven track record backed up with references managing a database of size (near what you expect)


beyond that - it is hard to say - sometimes parallel query is the most important, sometimes partitioning, indexing capabilities, in depth understanding of materialized views...

I'll let others add their thoughts - you might take the table of contents from:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14223/toc.htm

and ask them to sort of flesh it out (but that presumes you have someone that can judge the fleshing out - which means you would be able to build the list :)



Data Warehouse Tuning

A reader, April 15, 2007 - 1:49 am UTC

Tom,

Is it OK to have NULLs in a fact table? For example, we have a 20 GB fact table with 200 million rows and a dimension table with 30,000 rows. The current corporate design docs recommend storing a surrogate key instead of NULL, the justification being that an inner join to a dimension table will outperform an outer join - however, 99% of the values in some of the fact table columns are actually NULL. So even if I go to the trouble of updating millions of rows with a surrogate key, the index will not be selective enough. Additionally, those NULL columns have b-tree indexes built on them, wasting lots of space.
What is your opinion on this?
1. Can there be NULLs in a fact table?
2. If "no", then what about the fact that whenever new columns are added to the table I'll have to update the fact table, setting hundreds of millions of rows to some value? I've done that in the past and it isn't pretty, considering the table is not partitioned.

Got your "Effective Oracle" - very helpful as always - but have you thought of writing a book on how to deal with a data warehouse? It is a beast that few know how to tame.

Grateful for your time.


Tom Kyte
April 16, 2007 - 1:00 pm UTC

Nulls in a fact table are just fine.

Given that most times - the outer join would not be necessary

And when it is, it is - and not really any less efficient than an inner join - you'll be full scanning and hashing (or star transforming)
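As an illustration, a hedged sketch with assumed names (a fact where promo_id is NULL for most rows):

select p.promo_name, sum(f.amount_sold)
  from sales_fact f left outer join promo_dim p
    on f.promo_id = p.promo_id
 group by p.promo_name;

Against a large fact table the plan will typically be a full scan of sales_fact hash (outer) joined to the small dimension - essentially the same work as the inner-join version, which is the point being made above.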

tablespace design in DW db

Sean, May 10, 2007 - 6:24 pm UTC

Hi Tom,

We are in the process of creating a DW database on a single 500 GB LUN (10gR2). I intend to create just one tablespace for all fact and dimension data, but one of our developers wants to create at least four tablespaces: fact_tbs, fact_index_tbs, dimension_tbs, dimension_index_tbs. What is your suggestion in terms of tablespaces in a DW db? Thanks so much for your help.

-- Sean
Tom Kyte
May 11, 2007 - 11:23 am UTC

use tablespaces to improve the quality of your life.

that is all.

use them as an administrative tool to organize things in a fashion that makes the most sense to you.

because that is pretty much what they do - they are not a performance thing really, they are an organizational tool

David Aldridge, May 11, 2007 - 3:21 pm UTC

I think that one of the practical differences between OLTP and DW is in the tablespace handling. DWs tend to be pretty tablespace-happy because of the ease of moving bulk data around with transportable tablespaces. Hence I personally like to have tablespaces dedicated to particular month partitions (for example) of a fact table, probably including the partitions of locally partitioned indexes in there and even the relevant partitions of materialized views. This lets you get the tablespace into a read-only state, which leads to more happiness for everyone.

Hence DBA's have to be "trained" to use tablespaces more as a convenient encapsulator of data files (as Tom says) and less as a major database configuration item.
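A minimal sketch of that pattern, under assumed names and assuming Oracle-managed files (db_create_file_dest is set):

create tablespace sales_2006_04 datafile size 10g;

create table sales_fact
( sale_date  date,
  amount     number
)
partition by range (sale_date)
( partition p_2006_04 values less than (date '2006-05-01') tablespace sales_2006_04,
  partition p_2006_05 values less than (date '2006-06-01')
);

-- once the month is closed, the whole tablespace can be made read only,
-- so it needs to be backed up once and can be transported easily
alter tablespace sales_2006_04 read only;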
Tom Kyte
May 11, 2007 - 3:45 pm UTC

see, you use tablespaces to make your life better, more enjoyable :)


from db2 udb world

Prakash, May 14, 2007 - 9:57 am UTC

I have been maintaining a database which is approximately 50TB in size. It is a database using IBM DB2 UDB technology.

The most important factor, I feel, is to keep your databases at the latest version and to utilize the new features.

Some examples from the DB2 UDB world:

1. New options available with LOAD utility
2. Automatic tablespace extension
3. Using "UNION ALL" for view creation
4. ATTACH and DETACH operation
5. Use of DPF (database partitioning): as I understand it, this is similar to RAC in Oracle?
6. As already mentioned by "David Aldridge" : intelligent tablespace management
7. Use of MQTs (materialized query tables)
8. Table level compression; operating system (file system) level compression
9. Incremental/delta backups: this is a very tricky one
10. Use of split-mirror (if one can afford it)

My experience may not be very important here as this is an Oracle forum..
We may need a guru from the Oracle+DB2 world who can explain the above-mentioned DB2 features in Oracle language...
Cheers
Prakash

Joining & Grouping

Ravi, February 22, 2008 - 1:41 am UTC

Hi Tom
I am using a query which contains joins of tables, grouping, and a WHERE clause.
My question is: what will be the difference between the execution plan on a database server and on a data warehouse server (supposing that I am using a star schema)?
Tom Kyte
February 22, 2008 - 7:10 am UTC

eh?

they will be the same - assuming everything is constant - because a database server is a datawarehouse server is a database server.

vikas atrey, February 27, 2008 - 11:10 am UTC

Do you mean a star transformation? If yes, then please enjoy Jonathan Lewis's book (Cost-Based Oracle: Fundamentals, Chapter 9: Query Transformation).

What is the sense of Dimensions with NORMALIZED tables?

Marat, April 19, 2008 - 6:03 am UTC

Dear Tom,
could you explain the following:

What is the sense of dimensions if we have fully normalized tables? Let's say, three tables:

T1 (id);
T2 (id, t1_id);
T3 (t2_id, some_fact);

CREATE MATERIALIZED VIEW my_mv
ENABLE QUERY REWRITE
AS
SELECT t2_id, count(some_fact)
  FROM t3
 GROUP BY t2_id;


Then, if we SELECT from just t3, or from all of t1, t2, and t3, my_mv will be used somehow - without any dimensions.

So, could you give an example, when a DIMENSION is necessary with fully normalized tables?

Thank you.
Tom Kyte
April 19, 2008 - 3:27 pm UTC

given your example, you would not need a dimension of course. They are useful when you have a hierarchy of relationships modeled in your dimension tables.

and it can happen with "normalized" data


fact table( person_id, name, salary, date_of_birth )

dim table ( the_date, zodiac sign, season )


the_date implies zodiac sign
the_date implies season


if you created a materialized view "select date_of_birth, sum(salary)", you could create a dimension that would allow you to answer "how much do pisces make" (since date of birth -> zodiac sign) and "how much do people born in the summer make"...
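A hedged sketch of such a dimension declaration (column names like zodiac_sign are assumptions):

create dimension birth_date_dim
  level the_date is (dim_table.the_date)
  attribute the_date determines (zodiac_sign, season);

The DETERMINES clause simply declares the two implications above to the optimizer; the follow-ups below explore exactly when that declaration becomes necessary for query rewrite.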

Thank you!

Marat, April 20, 2008 - 12:28 am UTC

Now it is clear.
Thank you, Tom!

ooops... Actually, not everything is clear..

Marat, April 21, 2008 - 7:15 am UTC

Dear Tom,
I tried to implement your idea from the previous followup, and I have realized that again a dimension is not needed:

fact table( person_id, name, salary, date_of_birth ) 
dim table ( the_date, zodiac sign, season ) 

CREATE mv as select date_of_birth, sum(salary)...


if we need to know how much Libras make, we will select from the fact table through the dim table where dim_table.zodiac = 'Libra', and the dim table will be joined with the fact through date_of_birth. And again the MV will be used without a dimension.

Tom Kyte
April 23, 2008 - 5:01 pm UTC

Ok, right,

say date -> zodiac -> season (that is my rule)

now, create materialized view at level of zodiac

then try to use it to get seasonal results.
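A hedged sketch of that experiment, reusing the assumed names from above (fact_table has date_of_birth and salary; dim_table has the_date, zodiac_sign and season):

create materialized view sal_by_zodiac
enable query rewrite
as
select d.zodiac_sign, sum(f.salary) sum_sal
  from fact_table f, dim_table d
 where f.date_of_birth = d.the_date
 group by d.zodiac_sign;

-- without a dimension, a season-level query cannot be answered from this MV,
-- because nothing tells the optimizer that zodiac_sign rolls up to season
create dimension date_dim
  level the_date    is (dim_table.the_date)
  level zodiac_sign is (dim_table.zodiac_sign)
  level season      is (dim_table.season)
  hierarchy cal_rollup
  ( the_date child of zodiac_sign child of season );

-- dimension-based rewrite requires query_rewrite_integrity = trusted,
-- since a dimension is declarative and never enforced by the database
select d.season, sum(f.salary)
  from fact_table f, dim_table d
 where f.date_of_birth = d.the_date
 group by d.season;

With the dimension in place, the season-level query can be rewritten to join sal_by_zodiac back to dim_table and roll the zodiac subtotals up to season.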

Dimensional v Relational Schemas.

Bill, April 23, 2008 - 3:18 am UTC

First of all, if the database is not built on dimensional schemas, then it probably isn't a data warehouse; it is probably more of an archive, or a muddled repository of data. The database probably doesn't correspond to the classic definition of being subject-oriented, time-variant, conformed, non-volatile, and (very) large.

And could we also knock this old chestnut of normalising in DWHs on the head please. Normalising or snowflaking is on the whole not a good thing, without sound justification. Just read some Ralph Kimball to convince yourself, if you don't think otherwise.

And if you're not getting star transformations, then there is almost certainly something wrong - or your database is so small (< a few hundred GB) that it doesn't make too much difference. Or perhaps your business users are not carrying out real DSS/OLAP - and are probably simply using canned BO/Discoverer reports and the like.

However, all this is really a diversion: any good OLTP designer/DBA can, with the right training and an open mindset, become a good DWH designer/DBA. Nearly everyone, bar one poster above, has missed out on the fundamental point about a DWH.

The real challenge of a DWH is not the design of the dimensions and fact tables. That is very important, but it should not be a problem to a decent DWH architect. The challenge is not in the front end reports, queries, and datamarts.

The real challenge of any proper DWH is in the ETL. Because that is where the politics come to the fore. You have multiple systems of reference - possibly the domains of non-co-operative, sceptical, or even downright hostile DBAs/developers/users. You have to conform the data from disparate sources and you have to decide on common definitions. That is above all a political, not a technical challenge. You need strong, committed management. You need to sell the DWH to business departments. You need to have conviction, determination, foresight, communication skills, diplomacy, and doggedness, as well as your fancy technical skills.

The technical stuff is manageable; it's the politics where a true, corporate-wide DWH really differs from departmental / stovepipe / OLTP systems.

Now it is really clear

A reader, April 25, 2008 - 5:39 am UTC

"say date -> zodiac -> season (that is my rule)
now, create materialized view at level of zodiac
then try to use it to get seasonal results. "

ok, I tried it and now I see how it works. Thank you, Tom!

Dimensional Versus Relational

SeánMacGC, April 25, 2008 - 5:52 am UTC

Bill from Brussels said:

And could we also knock this old chestnut of normalising in DWHs on the head please. Normalising or
snowflaking is on the whole not a good thing, without sound justification. Just read some Ralph
Kimball to convince yourself, if you don't think otherwise.


Wow! There's a generalisation that would half scare me to death! How about reading some Ted Codd or Chris Date to convince yourself of the soundness of a scientific Relational approach? :o)

A few points:

* Normalisation of data was formulated for very sound, scientific reasons (prevention of UPDATE anomalies, prevention of DELETE anomalies, etc.)

* Equating data warehouses (or any other flavour of relational database storage) with mandatory denormalisation is a recipe for very confused data structures, in my experience. It seems that the licence, once given, is soundly and roundly abused, with repeated data elements hither and thither

* Kimball came to his deductions (yes, I've read him) when DBMS engines were somewhat unwieldy affairs perhaps, where joins slaughtered performance

* I have a lot of respect for Kimball, but his philosophy has been horribly abused in too many cases; his disciples have run amok with the surrogates for everything, with the consequent obfuscation of the most elementary and rudimentary of data

* Kimball mixes the physical with the logical, which I concede in certain specific circumstances is a necessity for efficient data manipulation, but any logical inconsistencies arising as a result thereof are, without fail, BIG inconsistencies

* The initial reasons cited for the 'need' for denormalisation (well over a decade ago now in the current era) diminish with each (Oracle) DBMS release:- table clustering, bitmap joins, index organized tables, etc.

* I am not against denormalisation per se in some sets of specific circumstances, i.e., static dimensional data, such as time hierarchies, but it has to be extremely tightly controlled

* And yes, I have had to furnish data mining, OLAP cubes, etc., and the sound science has never let me down, never! ;o)


Denormalise in your views to your heart's content, but LEAVE THE BASE DATA ALONE!



index on warehousing

A reader, August 13, 2008 - 7:40 pm UTC

Hi Tom,

I am working on a data warehousing project, and I have 3 very big tables. Each table contains more than 60M records. My question is: do I need to create indexes on those big big tables? I am joining these 2 big tables and, on top of this, doing some grouping and storing the summary result into a final table. I have not created a primary key on any of the tables, and no indexes. When one of my seniors saw this, he asked me why u have not created a PK, indexes, etc. I told him I am accessing more than 10% of the data from both tables, so then why do we need an index? Can u suggest whether we need indexes or not in data warehousing, especially on big tables?

Thanks

Tom Kyte
August 18, 2008 - 9:49 am UTC

... do I need to create indexes on those big big tables ...


10% - where did you make that number up from? You might use an index to

a) retrieve 100% of the data from a table
b) most of the data from a table
c) sometimes an index is inappropriate to retrieve 3% of the data from a table


"U" is not available, will it be OK if "I" answer instead?


You will, in this warehouse, use things like partitioning, materialized views, bitmap indexes, bitmap join indexes, b*tree indexes - whatever you need - to make things "more efficient"

do you need an index?

hmm,

select t1.a, t2.b
from t1, t2
where t1.key = t2.other_key;


well, if t1 has 50 other columns and is 10gb in size (rows, smoes - the SIZE counts most - 10gb is relevant, 60,000,000 is just a number and one that doesn't tell us very much)....

and t2 has 50 other columns and is 10gb....

having an index on t1(key,a) that is about 250mb and t2(other_key,b) that is about 250mb (or b,other_key and a,key - doesn't matter) might be really really useful - we'd use them as skinny versions of our tables..... so we don't have to full scan 20gb, we full scan 500mb.....
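In other words, a hedged sketch of the "skinny table" trick (the index names are made up; it assumes key and other_key are NOT NULL so every row appears in its index):

create index t1_skinny on t1 (key, a);
create index t2_skinny on t2 (other_key, b);

-- the original join can now be satisfied by index fast full scans of the two
-- "skinny" structures (roughly 250mb each) instead of 20gb of table scans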


The answer is: it totally absolutely, 100% depends on your questions, what you ask of the data - you will study this guide:

http://docs.oracle.com/docs/cd/B28359_01/server.111/b28313/toc.htm

and use all of the techniques - partitioning, indexing, materialized views, etc to accomplish your goal

index on warehousing

A reader, August 19, 2008 - 1:34 pm UTC

Thank you very much Tom.

I need all the columns, i.e. the non-key columns also, so in that case it would be better to go without an index. I checked the execution plan after creating an index, but the index is not being used at all. Now I understand when an index is required and when it is not. Thanks once again.

many-to-many in Datawarehouse

A reader, September 23, 2008 - 9:57 pm UTC

Hi Tom,

In my OLTP database I have tables with a many-to-many relationship. For example, I have the tables Product, Product_Warehouse (a mapping table) and Warehouse. The mapping table has only two columns: product_id and warehouse_id.
1. In the data warehouse, can I merge those two tables into one dimension table to avoid a snowflake schema?
2. If yes, how would I maintain the M:N relationship in one table?
3. If no, what would you do?

Thanks for your help
Tom Kyte
September 24, 2008 - 7:01 pm UTC

1) you can do whatever you like

whether it would be a good idea or not is another thing entirely.

it depends on the questions you ask - totally. Joins are not evil...

2) it would be "denormalized"

3) like I said - it depends - it depends on the questions asked of the data, how much work it would be to maintain the denormalized data and the expected payoff in reduced workload the denormalized model would result in (or not)

To "A reader from CA"

A reader, September 25, 2008 - 10:50 am UTC


The model of the dw would be driven by the semantics of your business and not by "patterns" in your source oltp models.

It very much depends on what that relationship between products and warehouses represents and if it is relevant from an analysis/reporting perspective. If it is relevant then there are likely things you want to measure about it.

You could keep Product as one dimension, Warehouse as another dimension ... the many-to-many relationship would be intrinsic to a fact table having these two dimensions (maybe others too).
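A minimal sketch of that layout under assumed names - the M:N between products and warehouses is carried implicitly by the fact rows:

create table product_dim
( product_id    number primary key,
  product_name  varchar2(100)
);

create table warehouse_dim
( warehouse_id   number primary key,
  warehouse_name varchar2(100)
);

create table inventory_fact
( product_id    number not null references product_dim,
  warehouse_id  number not null references warehouse_dim,
  snapshot_date date   not null,
  qty_on_hand   number
);

Which facts (inventory snapshots, shipments, and so on) belong in that table is exactly the "what questions do you ask of the data" decision described above.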

Tom Kyte
September 25, 2008 - 3:35 pm UTC

meaning in short:

it depends on the questions you need to ask of the data.

Why Data Mart?

A reader, July 26, 2009 - 8:10 am UTC

Hi Tom,

I have come to know that one of the main reasons for building data marts is performance, and that Teradata does not promote data marts because of their database architecture - they promote views instead, i.e. no movement of data. In Oracle, can the same thing be achieved by building views? If yes, then why does Oracle have ETL tools (OWB/Sunopsis)?
Thanks
Tom Kyte
July 26, 2009 - 8:18 am UTC


I have come to know that one of the main reasons for building data marts is performance


really - I always was under the impression it was for politics and "i want to own my own data".

We do not overly promote data marts either - not really a good idea. Can we do them? Yes. should you do them? Probably not.



OWB is to build a WAREHOUSE - the W in OWB is "Warehouse", it is not oDMb, it is oWb

Data Warehouse Parameters

Arvind Mishra, September 29, 2009 - 1:56 am UTC

Hello Tom,

Which database initialization parameters need special attention when we are creating a data warehouse, and what should their values be?

Thanks,

Arvind Mishra


Tom Kyte
October 02, 2009 - 8:22 am UTC

none, let them all default - except for the memory parameters - and their values would depend on the amount of memory you have.

If there was a list, it would be in chapter 0 of the data warehousing guide - or they would be set by default.

Export Speed

Arvind Mishra, September 29, 2009 - 4:31 am UTC

Hello Tom,

How can I estimate export speed? I want to do a schema-level export, and the schema has around 100 GB of data (Oracle 9i). How much time will it take? Is export/import a good idea, or should I use DBUA?

Thanks,

Arvind Mishra
Tom Kyte
October 02, 2009 - 9:04 am UTC

the only way I know is to test it (data pump in 10g will attempt to estimate if you want, but not in 9i)


100gb of data might be

40gb of table data + 60gb of indexes - that will take less time to export than
90gb of table data + 10gb of indexes


the only answer is "it totally depends, on your actual table data size, the speed of your disks, the amount of cpu you have available, the speed of your network"


If you are upgrading a database (i have to presume that is what you mean since you referenced dbua - the upgrade assistant) then you would just UPGRADE, you would NOT export/import.

Data Warehouse Parameters

Arvind Mishra, October 05, 2009 - 1:23 am UTC

Hello Tom,

1. When you say memory parameters, the following parameters come to my mind:
SHARED_POOL_SIZE
SHARED_POOL_RESERVED_SIZE
SHARED_POOL_MIN_ALLOC
SORT_AREA_RETAINED_SIZE
SORT_AREA_SIZE
HASH_AREA_SIZE
HASH_MULTIBLOCK_IO_COUNT
BITMAP_MERGE_AREA

Please let me know if I am missing something.

2. I am moving one user's data from Oracle 9i to Oracle 10g. The data is around 100 GB. I know that I cannot do it using DBUA (sorry for the confusion created by the earlier question). What will be the fastest way of doing it?

Thanks,

Arvind



Tom Kyte
October 08, 2009 - 6:36 am UTC

1) the only one that you should/would consider in the year 2009 would be...

shared pool size

the other shared pool ones - you would never set, regardless of year.

the sort_area, hash_area, bitmap_merge_area, you would not set since 9i with automatic pga memory management

but you are missing the big ones, the buffer cache and pga_aggregate_target
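A hedged illustration of those two (the sizes are placeholders, not recommendations):

alter system set db_cache_size        = 6g scope = spfile;
alter system set pga_aggregate_target = 2g scope = spfile;

With automatic PGA memory management in place, the per-session sort_area_size / hash_area_size style parameters are left alone, as noted above.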


2) you have at your disposal:
a) database links
b) exp/imp
c) exp/imp with transportable tablespaces

however, as I've written a couple of times - the best way will be a function of your data - how much is index, how much is data - as well as the machine you have to work with.

100gb is tiny, any approach would probably move it faster than fast enough, it really isn't very much at all.

Data Warehouse Parameters

Arvind Mishra, October 09, 2009 - 1:03 am UTC

Thanks Tom....Each day I read one question on this site and it is helping me take my knowledge level to new heights. Your answers and tips are really helpful. Thanks again.

Warm regards,

Arvind

David Aldridge, October 21, 2009 - 12:24 pm UTC

With regard to parameters for data warehousing, I'd probably think about setting PARALLEL_MAX_SERVERS, simply because the tendency seems to be to provision data warehouses with too high a ratio of CPU power to I/O throughput. The default tends towards being too high IMHO, but the best practice might be to benchmark your i/o subsystem to see how it responds to degrees of parallelism up to the default.

More a problem with the provisioning than with Oracle, of course.
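For example (the value is purely illustrative - it should come out of an I/O benchmark, not a formula):

alter system set parallel_max_servers = 32 scope = both;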

A reader, March 11, 2010 - 4:06 pm UTC

Hi Tom - We are building an 11g data warehouse. Are there any new and improved features in 11g (generally, plus related to data warehousing) that we can leverage? I am new to data warehousing and so is our DBA team. The architects involved have some prior experience, but we are trying to think about some things like maintenance. What are the general maintenance tasks that come to your mind for a data warehouse? Are there any best practices for maintaining stats? Any help is appreciated.
Tom Kyte
March 12, 2010 - 2:57 pm UTC

The architects involved have some prior experience, but we are trying to think about some things like maintenance. What are the general maintenance tasks that come to your mind for a data warehouse? Are there any best practices for maintaining stats? Any help is appreciated.


that is sort of like asking for a 3 year course in a paragraph.

I would say "start by getting some real experience, some real grey hairs, on board".

here are new data warehouse features:

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10810/whatsnew.htm#sthref6

but features without experience are not nearly as useful...

Without understanding your entire design, it is not really possible to describe what tasks are going to be involved (in something shorter than a book...)

Foreign keys in fact table

Arvind Mishar, July 24, 2010 - 12:59 am UTC

Hello Tom,

While designing a data warehouse, which of the following is better:
1. Using foreign key constraints in the fact table to relate it to the dimensions, or
2. Not using any foreign keys in the fact table and handling/validating referential integrity in the ETL?

If we use foreign keys, then what is the best way to deal with a late arriving dimension?

Thanks,

Arvind
Tom Kyte
July 26, 2010 - 8:14 am UTC

foreign keys - absolutely - 100%.


http://asktom.oracle.com/Misc/stuck-in-rut.html

constraints are used by the optimizer to optimize queries, every possible constraint you can tell us is a good thing in a warehouse for performance.

if you have a true 'late arriving dimension' (they are rare), then you do not have the ability to have data integrity and you will miss out on any optimizations that would otherwise be possible. Meaning - your data does not conform to foreign key relations - you cannot be ensured that the data is key preserved, or would be visible in a join, so you'll miss out on any optimizations that could be made otherwise.

Foreign keys in fact table

Arvind Mishar, July 29, 2010 - 7:15 am UTC

Thanks Tom.

CUBE in warehouse

Arvind Mishra, March 05, 2011 - 9:19 am UTC

Hello Tom,

Can you explain what a cube is in a warehouse, and how and where it is helpful, with an example? If possible, please provide links to some good documents. Also, how can we build a cube in Oracle?

Thanks,

Arvind


CUBE

Arvind Mishra, March 08, 2011 - 12:29 am UTC

Thanks Tom

active data warehouse

Arvind, May 17, 2011 - 1:28 am UTC

Hello Tom,

Please can you explain what an active data warehouse is? How can you implement it in Oracle?

Any pointers to good documents will be really appreciated.

Thanks,

Arvind
Tom Kyte
May 18, 2011 - 9:16 am UTC

google the definition - in a nutshell, it is generally considered to be an "up to date" warehouse, more 'real time' than a traditional one.

There are many ways to have them - sometimes having one just means you don't have one (you use your oltp database directly). It depends on how you decide to define it and what your needs are.

and yes, we have the tools to create any of them - from GoldenGate, to Streams, to CDC (change data capture), to materialized views - etc.

active data warehouse

Arvind, May 24, 2011 - 11:46 pm UTC

Hi Tom,

Thanks for your reply, but does an active data warehouse mean that all the changes made to the OLTP systems should be immediately propagated to the data warehouse?

Thanks,

Arvind
Tom Kyte
May 25, 2011 - 11:25 am UTC

you tell me, it is your database after all.

I would say to you "define 'immediately propagate'" first, I don't know what exactly that means.

Does it mean "two phase commit"
Does it mean "the data starts moving right away - say from the redo logs - but it might be a while before it gets there"

Materialized view as fact table

A reader, July 11, 2011 - 9:15 am UTC

Tom,

Do you see any issue if a materialized view is used as a fact table?

Thanks.
Tom Kyte
July 13, 2011 - 1:24 pm UTC

it is just a table, use it as you want, it will perform *no differently* than any other table during retrieval.

Sh schema TIMES table script

Arvind Mishra, August 14, 2011 - 10:38 am UTC

Hello Tom,

Can you please share a link to the script which Oracle uses to populate the TIMES table of the SH sample schema?

Thanks,

Arvind

Performance issues.. data loading

A Reader, January 17, 2012 - 11:33 pm UTC

Tom,
We have a data warehouse where it has been found that data loading is taking a huge amount of time (sometimes overrunning the window).
We use OWB (Oracle Warehouse Builder).

I have gone through the Real-World Performance presentation which you shared last year.
I have a few questions:

a) Is HCC only applicable to Exadata?
b) Any pointers - links/documents - on data warehouse performance tuning?

(We are at 11gR1 and 11gR2.)

regards
Tom Kyte
January 18, 2012 - 7:15 am UTC

a) and other storage devices:

http://www.oracle.com/us/corporate/press/508020

b) data warehousing guide, if you haven't read it, you should.


with OWB - are you using the row by row option or the set option?

and - with ALL 'etl' tools - with all of them - you will drop down and write custom code for the really big stuff if you want it to perform at all. Think of them as being an 80-20 sort of thing, 80% of the time, they generate code that is acceptably slow, 20% of the time - they do not. We (developers) have to fill in the gaps.

Performance issues.. data loading

A reader, January 18, 2012 - 8:48 am UTC

Thanks Tom,

In OWB we are using set-based operations.

Further, agreed - I should read the data warehousing guide.

I have read you saying once that DBAs for a data warehouse need to be different from DBAs on an OLTP system
(DBAs for DW need to be trained separately).
How true is that?

regards






Tom Kyte
January 18, 2012 - 9:54 am UTC

How true is that?


very true. An OLTP DBA tries to get queries to run subsecond, uses indexes to tune, and does things with updates, deletes and normal inserts (without append). A data warehouse DBA will be OK with long running queries - they know the goal is to get them to run as fast as possible, but that they can take a while; they know that updates, deletes and normal inserts (without append) are to be avoided - direct path operations in bulk are the way to go; and they are afraid of indexes and get worried when they see them being used a lot.

sort of polar opposites.
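For instance, a minimal sketch of the bulk, direct-path style of load (the table names are assumed):

insert /*+ append */ into sales_fact
select *
  from staging_sales;

-- direct path writes formatted blocks above the high water mark, bypassing the
-- row-by-row conventional path; the table cannot be re-queried in this session
-- until the transaction is committed
commit;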

What about constraints?

Dhruva, July 22, 2013 - 2:36 pm UTC

Tom,

In a data warehouse, given that the data has been validated before loading, if we create primary keys on dimension tables and foreign keys on fact tables with a RELY DISABLE NOVALIDATE clause to speed up the load, and leave them that way for good, will it cause any "general" query performance issues?

Are there any best practice guidelines?

In case you need to know, there are MVs that are created using joins between the dim and fact tables and most if not all queries that are likely to be fired will use the MV.

Thanks
Tom Kyte
July 31, 2013 - 4:24 pm UTC

... will it cause any "general" query performance issues? ...

yes, it will make some of your queries run much faster.


... Are there any best practice guidelines?....

declare to the database as many constraints as humanly possible (making sure they are VALID, that they are true!). tell us all of the unique/primary keys, foreign keys, check constraints, NOT NULL constraints as you can. All of them. It helps you document, it helps ad-hoc systems build queries, and it helps the optimizer come up with the optimal plan - ESPECIALLY when using materialized views with query rewrite - especially with that!

read:

http://asktom.oracle.com/Misc/stuck-in-rut.html
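A hedged sketch of how that typically looks in a warehouse (names are assumed; RELY constraints cost nothing to maintain during the load, while TRUSTED rewrite integrity lets the optimizer still use them):

alter table sales_fact
  add constraint sales_fact_prod_fk
  foreign key (product_id) references product_dim
  rely disable novalidate;

alter session set query_rewrite_enabled   = true;
alter session set query_rewrite_integrity = trusted;

The trade-off, of course, is the one stated above: RELY means "trust me" - if the data does not actually conform, rewritten results can be wrong.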

But...

Dhruva, August 02, 2013 - 3:26 pm UTC

...the caveat being that one cannot create bitmap join indexes if the primary key of the dimension table is either in DISABLE or NOVALIDATE state, nullifying the advantage of these constraint states. Any workarounds gladly accepted.
SQL> CREATE TABLE dim_test (a NUMBER, b NUMBER);
Table created.

SQL> ALTER TABLE dim_test ADD CONSTRAINT dim_test_pk PRIMARY KEY (a) RELY DISABLE;
Table altered.

SQL> CREATE TABLE fact_test (a NUMBER);
Table created.

SQL> ALTER TABLE fact_test ADD CONSTRAINT fact_test_fk FOREIGN KEY (a) REFERENCES dim_test RELY DISABLE;
Table altered.

SQL> CREATE BITMAP INDEX b1 ON fact_test(d.b)
  2  FROM fact_test f, dim_test d
  3  WHERE f.a = d.a;
FROM fact_test f, dim_test d
                  *
ERROR at line 2:
ORA-25954: missing primary key or unique constraint on dimension 

SQL> DROP TABLE fact_test;
Table dropped.

SQL> DROP TABLE dim_test;
Table dropped.

SQL> CREATE TABLE dim_test (a NUMBER, b NUMBER);
Table created.

SQL> ALTER TABLE dim_test ADD CONSTRAINT dim_test_pk PRIMARY KEY (a) RELY NOVALIDATE;
Table altered.

SQL> CREATE TABLE fact_test (a NUMBER);
Table created.

SQL> ALTER TABLE fact_test ADD CONSTRAINT fact_test_fk FOREIGN KEY (a) REFERENCES dim_test RELY NOVALIDATE;
Table altered.

SQL> CREATE BITMAP INDEX b1 ON fact_test(d.b)
  2  FROM fact_test f, dim_test d
  3  WHERE f.a = d.a;
FROM fact_test f, dim_test d
                  *
ERROR at line 2:
ORA-25954: missing primary key or unique constraint on dimension 

Tom Kyte
August 02, 2013 - 7:40 pm UTC

dimension tables are typically small/manageable - why do you need to have them in a disable/novalidate state?

only the primary key on the smaller dimension table needs to be validated.
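A hedged variation on the test case above (the expectation, not a guarantee, being that a validated dimension primary key satisfies ORA-25954 while the fact-side foreign key stays cheap):

create table dim_test (a number, b number);

-- enabled and validated (the default states) on the dimension side
alter table dim_test add constraint dim_test_pk primary key (a);

create table fact_test (a number);

-- the fact-side foreign key can stay cheap: rely disable novalidate
alter table fact_test add constraint fact_test_fk
  foreign key (a) references dim_test rely disable novalidate;

-- with a validated primary key on the dimension, the bitmap join index
-- should no longer raise ORA-25954
create bitmap index b1 on fact_test(d.b)
from fact_test f, dim_test d
where f.a = d.a;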

Got it

Dhruva, August 05, 2013 - 2:05 pm UTC

So RELY for PK constraints on the dimension tables (the largest dimension is 13M rows with approx 100k inserts/updates on each run), and RELY DISABLE NOVALIDATE for FK constraints on the fact tables.

I configured the tables this way and query/load performance is good, MVs are being used, and of course the data dictionary is showing the correct relationships. It also saves the hassle of enabling/disabling constraints.

Thanks a lot.
