Table Design for DataMart/Datawarehouse

Question and Answer

Tom Kyte

Thanks for the question, Saeed.

Asked: August 29, 2004 - 1:04 pm UTC

Last updated: July 27, 2009 - 6:03 am UTC

Version: 8.1.7


You Asked

Hi Tom,

I have a question for you. I am in the process of designing a data mart; the source data is in DB2 and in 3NF. For my staging area, I designed the model exactly the same as the source data model, except that I resolved one-to-one relationships and subtypes into single tables. By doing this, some of my tables have around 200 to 225 columns. My staging area is not for query purposes; most of the time there are only inserts, and a few updates. I want to know: is there any performance or other impact from tables this wide (in number of columns)?

Thanks in advance

Saeed

and Tom said...

under 255 columns, no.

over 255 columns, Oracle will internally chain every row. It'll take two logical IO's to read an entire record instead of the normal one.

search for

255 columns chained

on this site to read more about that if interested.
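
If you want to see this for yourself, here is a minimal sketch (the table and column names are made up) that generates a 300-column table, inserts one row with a value past column 255, and uses ANALYZE to count the chained rows:

declare
    l_sql varchar2(32767) := 'create table wide_t ( c1 number';
begin
    -- generate the DDL for a 300-column table rather than typing it out
    for i in 2 .. 300 loop
        l_sql := l_sql || ', c' || i || ' number';
    end loop;
    execute immediate l_sql || ' )';
end;
/
-- populate a column past position 255, so all 300 column slots are stored
insert into wide_t ( c1, c300 ) values ( 1, 1 );
commit;

-- ANALYZE (unlike dbms_stats) populates CHAIN_CNT
analyze table wide_t compute statistics;
select chain_cnt from user_tables where table_name = 'WIDE_T';

You would expect CHAIN_CNT = 1 even though the row easily fits in a single block -- that is the intra-block chaining described above.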

Comments

Dimensional modeling -- Star schema -- is it not redundant?

Ajeet, August 30, 2004 - 5:03 am UTC

Tom,
I want to share a performance issue I faced with a typical star schema design -- and the solution I found.

Per my initial design -- I have 5 dimensions:

Engine (500 records), Aircraft (200 records), Fleet (106 records), Engine_position (15 records), Flight_phase (4 records),
and a fact table which has FKs from all of the above dimensions plus 3 additional fields -- parameter value (the fact data), flight_datetime, and fleet. flight_datetime is used to partition this table by range + list (composite partitioning).

The fact table has 125 million rows.

Now when I went by the above approach, my main queries were performing a lot of LIOs, and so the performance was not able to meet my customer's response time requirements.

I did a redesign (in development) --

and merged the above five dimensions into one dimension and used the new key as the FK to the fact table...

and I saw a 20 times better response time, 6 times fewer LIOs, and I was able to do everything I wanted to do.

My question is -- did I take a correct design approach?

You may ask about the many attributes a dimension table can have -- I can still use the dimension tables. (In my case I don't need them, but many times they are required.)

The point I want to make is -- is it redundant, and does it increase the number of reads (LIOs as well), to model the star schema the way Ralph et al. recommend?

To be more clear -- if I know I will have just 50,000 different combinations of aircraft_id + engine_id + fleet + flight_phase + engine_position, why would I keep 5 different fields in the fact table for 125 million records? Instead I can keep just 1 field for 125 million records. Wouldn't that be a more efficient and performant way to design it?

The drilldowns and analytical queries are all doable in the second approach I took.

Please advise -- assuming that I have captured the business requirements correctly and the partitioning scheme is based on the queries we make from the above tables.

Thanks
Ajeet




Tom Kyte
August 30, 2004 - 8:46 am UTC

you'd need to understand the types of queries you execute on your system.

You basically did what some query plans would do for you - a cartesian product of the dims.

If you have something maintainable.
That can be loaded efficiently.
That is much more efficient than what you had.

One might say "you win"?
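
For illustration, a sketch of the collapse Ajeet describes (all object names are hypothetical): compute the distinct key combinations once at load time, assign each a surrogate, and carry that single key in the fact:

create table combo_dim as
select rownum as combo_key, x.*
  from ( select distinct engine_id, aircraft_id, fleet_id,
                engine_position_id, flight_phase_id
           from fact_staging ) x;

-- the fact table then carries one FK (combo_key) instead of five;
-- the "cartesian product of the dims" is done once, at load time

The trade-off is the one Tom alludes to: this pre-joins the dimensions for you, which pays off only if your queries really do filter across those attributes together.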

"Ralph et all" ... hmmm

Gabe, September 01, 2004 - 4:13 pm UTC

I know Tom doesn't quite subscribe to hypotheses and wild assumptions but I'll indulge myself (I may be totally off).

Subject Area:

Trend analysis for standard aircraft engine parameters by fleet.

Dimension Tables:

1. "Fleet"
2. "Engine Type" = ( Engine Code, Engine Position, Aircraft Type, etc.)
3. "Day" ... Note: it is optional ... some may be comfortable just with a timestamp column in the Fact table

Fact Table ("Engine Parameter Reading"):

a. FK to "Fleet"
b. FK to "Engine Type"
c. "Engine Parameter" (logically a dimension but implemented as a degenerate dimension)
d. FK to "Day" (if "Day" dimension table implemented)
e. "Reading Timestamp" (just the Time part if "Day" implemented ... Date+Time otherwise)
f. Pre-Flight Check Value
g. Take-off Value
h. In-Flight (Cruise) Value
i. Landing Value
(f,g,h, and i are my assumptions of what the "Flight Phases" may be)

Note: the "Value" measures are semi-additive ... that is, some math (like AVG, etc) makes sense only in the context of the same engine parameter (adding oil pressure with temperature means ... zilch).

So maybe, just maybe, star schema is not <quote>redundant</quote> … and <quote>assuming that I have captured the business requirements correctly</quote> is too much to ask.
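
To make that outline concrete, a rough DDL sketch (names and datatypes are illustrative guesses):

create table fleet_dim (
    fleet_key  number primary key,
    fleet_name varchar2(100)
);

create table engine_type_dim (
    engine_type_key number primary key,
    engine_code     varchar2(30),
    engine_position number,
    aircraft_type   varchar2(30)
);

create table engine_parameter_reading (
    fleet_key        number not null references fleet_dim,
    engine_type_key  number not null references engine_type_dim,
    engine_parameter varchar2(30) not null, -- degenerate dimension
    reading_ts       date not null,
    preflight_value  number, -- the four "Flight Phase" values,
    takeoff_value    number, -- pivoted into columns instead of
    cruise_value     number, -- being kept as a dimension of
    landing_value    number  -- the fact
);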


Engine Parameter values are not additive at all..

Ajeet, September 02, 2004 - 3:20 am UTC

Gabe -- thanks for your input.

Engine parameter values are not additive -- we never need to do any math on these values. It is a different kind of requirement from what we typically get for a data warehouse project, and I was trying to get an optimal design for the 90% of queries we will ask of this database. As I am not making any hypothesis, I did just that, and we are getting a 22 times better response time now. It may not be a generic solution, but it fits my requirements.
I have designed at least a dozen very large DWs and followed Ralph Kimball 100% on those. This was a one-off requirement.
I am still validating this approach in general cases -- going back to old systems and trying to see if this design can meet all the requirements which a typical DW has.
Maybe in the future I will come back with more answers on this page or some other forum.

Thanks

_suspect_ conclusions

Gabe, September 02, 2004 - 10:45 am UTC

Ajeet,

From your comment I gather that, at least in part, I did _get_ the business of your subject area.

My suggestion was that "Flight Phase" did not need to be implemented as a dimension ... rather than one generic "Value" measure, you should have specific measures corresponding to each of the 4 "Flight Phase"s ... this change alone would've reduced the size of your fact table from 125Mil to ~31Mil.

My other comment was that 3 of the existing dimensions "Engine Type", "Engine Position" and "Aircraft Type" very likely belong together into one single dimension table called "Engine Type" (it is not that we combine them in order to have just one integer surrogate PK migrated into the fact ... they _just_ belong together) ... hence the number of FKs in your fact should in fact be reduced from the existing 5 to 2.

Implementing a DW (star schema) requires some degree of denormalization ... it is a balancing act in many respects ... I argue the improvements you see are not proof that Dimensional Modeling (as introduced and described by Kimball) is somehow obsolete and its principles/methods not quite applicable to your particular case ... rather, the original star having 5 (actually 6) dimensions was overly denormalized (someone was too quick in pulling out dimension tables) and what you're actually doing is correcting a modeling weakness. In a sense I was trying to convince you not to go (too quickly) to the other extreme of combining all of your dimensions into one single dimension table ... the technique is valid and it does have its place (hey, it may even be OK in your case if this is an isolated star) ... but think about it, what if a _related_ star is added to the DW ... I guess conformed dimensions (if you agree with its usefulness of course) would be "bye, bye".

In the end you have all the facts and I'm just speculating ... the _veiled_ message (whether intended or not) that star modeling is redundant is what ticked me off.

Good Luck.


star schema

A reader, March 05, 2009 - 5:30 am UTC

Hi Tom,

We are designing a data mart and need your opinion on the following scenario:
All dimension tables have surrogate keys joined to the fact table. If we create the month key as a surrogate key, will Oracle eliminate partitions in the monthly fact table under the join below?
SELECT ...
FROM fact, month_dim dim
WHERE fact.month_key = dim.month_key
and dim.month_no = 200808

We are using Oracle 10gR1, and can create bitmap/bitmap join indexes.

Regards,
Tom Kyte
March 05, 2009 - 1:53 pm UTC

funny, I was just writing about natural keys:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1460004570029#1548632300346251599


this month_key is a bad idea. why would you have a surrogate there?


I cannot answer your question, I don't have any clue how you've partitioned the data. But in general, it'll depend. I don't see how the table could be partitioned by fact.month_key, since that is just a surrogate (range partition? no; hash partition - maybe, but you'd want the hash partition column to be much more unique; list partition - impractical)

Month Key

A reader, March 07, 2009 - 2:10 am UTC

Hi Tom.

Thanks.

As the DM is being used for lots of ad-hoc queries/users, putting month_key as a smart key in the fact table may result in its misuse; for example, users looking at the value may think "that is good, I can apply functions on it", thus defeating the purpose. So,
will the below help with partition elimination?
Each row in the fact table will have the surrogate month key and also month_no (YYYYMM). Month_no is not exposed to users. When users join the month dim and fact for dim.month_name = 'JAN 2008' and dim.month_key = fact.month_key, we force (under the covers) the join of dim.month_no = fact.month_no. This way partition elimination would happen and month_no won't be misused.

Regards,
Tom Kyte
March 07, 2009 - 12:34 pm UTC

huh, it's misuse???????? It is a value, it is data. You are making things harder by using a surrogate for a date.

I'll reiterate:

this month_key is a bad idea. why would you have a surrogate there?


... month_no (YYYYMM). Month_no is not exposed to users. ...

from bad to worse, you just want to make it harder for those people.... You see, they can GET this month_no thing by joining and then "misuse it", you have prevented no "misuse" whatever that means.

Look at your example:

SELECT ...
FROM fact, month_dim dim
WHERE fact.month_key = dim.month_key
and dim.month_no = 200808


Ok, I have a result set with.... ta-dah - month_no - I'll abuse that (albeit, more slowly and less efficiently than if you just let me have at it in the first place)


...
Each row in the fact table will have the surrogate month key and also month_no (YYYYMM). Month_no is not exposed to users. When users join the month dim and fact for dim.month_name = 'JAN 2008' and dim.month_key = fact.month_key, we force (under the covers) the join of dim.month_no = fact.month_no. This way partition elimination would happen and month_no won't be misused.

......

explain via example how someone would "mis-use" this data. You know, if I wanted Jan-2008 and you had, well, YYYYMM - why wouldn't I query for 200801????? How is that mis-using anything????

month_key

A reader, March 19, 2009 - 5:59 am UTC

Hi Tom,
Thanks.
This leads to a basic question.

What is then the purpose of month dimension in data mart?

Regards,
Tom Kyte
March 19, 2009 - 10:55 am UTC

typically, your month dimension would look like:

the_day date, -- represented as a date without the time component
the_month date, -- represented as the first day of the month (year/month only)
the_qtr ...., -- sometimes represented in some format of your choosing, 2009-1 for example
the_year ...., -- maybe as a number 2009
the_fy_qtr ..., -- fiscal year qtr
the_fy ..., -- fiscal year


that is, like what ALL dimensions are used for - to describe other derivable attributes.
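
Rendered as DDL, that might look like the sketch below (names and datatypes are just one choice; note the key is the natural date, not a surrogate):

create table time_dim (
    the_day    date primary key, -- date, no time component
    the_month  date not null,    -- first day of the month
    the_qtr    varchar2(7),      -- e.g. '2009-1'
    the_year   number(4),
    the_fy_qtr varchar2(7),      -- fiscal columns depend on
    the_fy     number(4)         -- your fiscal year rules
);

-- populate with a row generator, one row per day
insert into time_dim
select d,
       trunc( d, 'MM' ),
       to_char( d, 'YYYY' ) || '-' || to_char( d, 'Q' ),
       to_number( to_char( d, 'YYYY' ) ),
       null, null -- fill in from your fiscal calendar
  from ( select date '2009-01-01' + level - 1 d
           from dual connect by level <= 365 );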

time dimension created using OWB

gary, March 23, 2009 - 1:04 am UTC

Hi Tom,

This is a time dimension with a surrogate dimension_key, created and populated using the Oracle Warehouse Builder wizards.

Please clarify.

Thanks


SQL> desc owb_times
 Name                           Null?    Type
 ------------------------------ -------- ----------------
 DIMENSION_KEY                  NOT NULL NUMBER
 CALENDAR_MONTH_END_DATE                 DATE
 CALENDAR_MONTH_CAL_MONTH_CODE           NUMBER
 MONTH_OF_YEAR                           NUMBER
 CALENDAR_MONTH_TIME_SPAN                NUMBER
 CAL_MONTH_NUMBER                        NUMBER
 CALENDAR_MONTH_DESCRIPTION              VARCHAR2(2000)
 MONTH_OF_QUARTER                        NUMBER
 CALENDAR_MONTH_START_DATE               DATE
 CALENDAR_MONTH_NAME                     VARCHAR2(25)
 CALENDAR_MONTH_ID                       NUMBER
 CALENDAR_QUARTER_TIME_SPAN              NUMBER
 CALENDAR_QUART_CAL_QUARTER_CO           NUMBER
 CALENDAR_QUARTER_START_DATE             DATE
 CALENDAR_QUARTER_END_DATE               DATE
 QUARTER_OF_YEAR                         NUMBER
 CALENDAR_QUARTER_NAME                   VARCHAR2(25)
 CALENDAR_QUARTER_DESCRIPTION            VARCHAR2(2000)
 CAL_QUARTER_NUMBER                      NUMBER
 CALENDAR_QUARTER_ID                     NUMBER
 CALENDAR_YEAR_END_DATE                  DATE
 CALENDAR_YEAR_CAL_YEAR_CODE             NUMBER
 CALENDAR_YEAR_DESCRIPTION               VARCHAR2(2000)
 CAL_YEAR_NUMBER                         NUMBER
 CALENDAR_YEAR_ID                        NUMBER
 CALENDAR_YEAR_NAME                      VARCHAR2(25)
 CALENDAR_YEAR_TIME_SPAN                 NUMBER
 CALENDAR_YEAR_START_DATE                DATE

SQL>

SQL> set lines 1000
SQL> select DIMENSION_KEY,CALENDAR_MONTH_END_DATE
2 ,CALENDAR_MONTH_CAL_MONTH_CODE ,MONTH_OF_YEAR
3 from owb_times
4 /

DIMENSION_KEY CALENDAR_ CALENDAR_MONTH_CAL_MONTH_CODE MONTH_OF_YEAR
------------- --------- ----------------------------- -------------
-2
-7
-8
-9
-10
23 31/JAN/00 200001 1
24 29/FEB/00 200002 2
25 31/MAR/00 200003 3
26 30/APR/00 200004 4
27 31/MAY/00 200005 5
28 30/JUN/00 200006 6

DIMENSION_KEY CALENDAR_ CALENDAR_MONTH_CAL_MONTH_CODE MONTH_OF_YEAR
------------- --------- ----------------------------- -------------
29 31/JUL/00 200007 7
30 31/AUG/00 200008 8
31 30/SEP/00 200009 9
32 31/OCT/00 200010 10
33 30/NOV/00 200011 11
34 31/DEC/00 200012 12

17 rows selected.

SQL>



Tom Kyte
March 26, 2009 - 12:30 pm UTC

clarify what??

time_dimension with surrogate key

gary, March 27, 2009 - 12:44 am UTC

you said

"
this month_key is a bad idea. why would you have a surrogate there?
"


But OWB creates the time dimension with a surrogate key.
Should the time dimension have a surrogate key, as created by OWB?

Tom Kyte
March 30, 2009 - 3:25 pm UTC

I don't believe so, it isn't necessary and makes things more complex.

month key

A reader, March 30, 2009 - 3:40 pm UTC

Hi Tom,

If the month key is a surrogate key, I don't think that Oracle will be able to eliminate partitions (from the fact table) with a month key join to the fact and selecting a specific month number from the month dim. If correct, then this approach (apart from its complexity) may not result in high-performance queries.

Regards,
Tom Kyte
March 30, 2009 - 6:02 pm UTC

it can partition eliminate.

Month Key

A reader, April 03, 2009 - 5:45 am UTC

Hi Tom,
No, it does not eliminate partitions, as shown below:

-- Query below does not eliminate partitions
select count(1)
from FACT fct
inner join mth_dim mth on fct.mth_key = mth.mth_key
inner join UNIT orb on FCT.UNIT_KEY = ORB.UNIT_KEY
AND mth.CLNDR_YEAR_MTH= 200810
--AND FCT.CLNDR_MTH = 200810


Execution Plan
---------------------------------------------------------

---------------------------------------------------------------------------------------------------
Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Pst
---------------------------------------------------------------------------------------------------
0 | SELECT STATEMENT | | 1 | 23 | 23580 (6)|
1 | SORT AGGREGATE | | 1 | 23 | |
2 | HASH JOIN | | 1416K| 31M| 23580 (6)|
3 | MERGE JOIN CARTESIAN | | 26770 | 392K| 13 (0)|
4 | TABLE ACCESS FULL | MTH_DIM | 1 | 10 | 2 (0)|
5 | BUFFER SORT | | 26770 | 130K| 11 (0)|
6 | INDEX FAST FULL SCAN| XPKFINANCE_ORGANISATION_UNIT_B | 26770 | 130K| 11 (0)|
7 | PARTITION RANGE ALL | | 8500K| 64M| 23177 (4)|
8 | TABLE ACCESS FULL | FACT | 8500K| 64M| 23177 (4)|
---------------------------------------------------------------------------------------------------
---------------------------------------------------
-- Query below does eliminate partitions

select count(1)
from FACT fct
inner join mth_dim mth on fct.mth_key = mth.mth_key
inner join UNIT orb on FCT.UNIT_KEY = ORB.UNIT_KEY
AND FCT.CLNDR_MTH = 200810

Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%C
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | | 6515 (
| 1 | SORT AGGREGATE | | 1 | 21 | |
| 2 | HASH JOIN | | 1416K| 28M| 6280K| 6515 (
| 3 | MERGE JOIN CARTESIAN | | 321K| 2509K| | 113
| 4 | INDEX FULL SCAN | XPKMONTH_DIM | 12 | 36 | | 1
| 5 | BUFFER SORT | | 26770 | 130K| | 112
| 6 | INDEX FAST FULL SCAN| XPKFINANCE_ORGANISATION_UNIT_B | 26770 | 130K| | 9
| 7 | PARTITION RANGE SINGLE| | 1416K| 17M| | 4263 (
| 8 | TABLE ACCESS FULL | FACT | 1416K| 17M| | 4263 (
----------------------------------------------------------------------------------------------------

Tom Kyte
April 03, 2009 - 8:09 am UTC

unless I have a 100% complete thing to look at, I'm not looking. I don't see any create table/dbms_stats.set_table_stats etc to reproduce with (and even if they are somewhere up there on the page - I don't hunt and poke around trying to reconstruct the thought)

so, if you give me an entire thing to look at, with simple simple creates (no tablespaces, minimal create statements), I'll look. Otherwise - I don't really have any context to go on here, you are just "a reader"


and that a particular query did not partition eliminate does not mean "it cannot", it can also mean "optimizer chose not to"

month key - test case

A reader, April 03, 2009 - 11:14 pm UTC

Hi Tom,

The script is as below:
DROP TABLE FACT1 CASCADE CONSTRAINTS;
CREATE TABLE FACT1
(
MTH_KEY NUMBER(6) NOT NULL,
UNIT_KEY NUMBER(15) NOT NULL,
AMT NUMBER(15,2),
CLNDR_MTH NUMBER(6) NOT NULL
)
PARTITION BY RANGE (CLNDR_MTH)
(
PARTITION FACT1_PART1 VALUES LESS THAN (200811),
PARTITION FACT1_PART2 VALUES LESS THAN (200812),
PARTITION FACT1_PART3 VALUES LESS THAN (200901),
PARTITION FACT1_PART4 VALUES LESS THAN (200902),
PARTITION FACT1_PART5 VALUES LESS THAN (200903),
PARTITION FACT1_PART6 VALUES LESS THAN (200904)
);
INSERT INTO FACT1 VALUES (10,10,100,200810);
INSERT INTO FACT1 VALUES (11,11,200,200811);
INSERT INTO FACT1 VALUES (12,12,100,200812);
INSERT INTO FACT1 VALUES (1,1,100,200901);
INSERT INTO FACT1 VALUES (2,2,100,200902);
INSERT INTO FACT1 VALUES (3,3,100,200903);
COMMIT;


CREATE BITMAP INDEX IND_MTH_KEY ON FACT1 (MTH_KEY) LOCAL;
CREATE BITMAP INDEX IND_UNIT_KEY ON FACT1 (UNIT_KEY) LOCAL;

exec dbms_stats.gather_table_stats('USER','FACT1',ESTIMATE_PERCENT=>100, -
METHOD_OPT=>'for all indexed columns size auto',CASCADE=>True);
--
DROP TABLE MTH_DIM1;
CREATE TABLE MTH_DIM1
(
MTH_KEY NUMBER(6) NOT NULL,
CLNDR_YEAR_MTH VARCHAR2(10 BYTE) NOT NULL,
CLNDR_YEAR_MTH_NAME VARCHAR2(100 BYTE) NOT NULL
);

INSERT INTO mth_dim1 VALUES (1,'200901','Jan 2009');
INSERT INTO mth_dim1 VALUES (2,'200902','Feb 2009');
INSERT INTO mth_dim1 VALUES (3,'200903','Mar 2009');
INSERT INTO mth_dim1 VALUES (10,'200810','Oct 2008');
INSERT INTO mth_dim1 VALUES (11,'200811','Nov 2008');
INSERT INTO mth_dim1 VALUES (12,'200812','Dec 2008');
COMMIT;


CREATE UNIQUE INDEX U1_MTH_DIM1 ON MTH_DIM1(MTH_KEY);


ALTER TABLE MTH_DIM1 ADD (
CONSTRAINT PKMONTH_DIM1
PRIMARY KEY
(MTH_KEY)
);

exec dbms_stats.gather_table_stats('USER','MTH_DIM1',ESTIMATE_PERCENT=>100, -
METHOD_OPT=>'for all indexed columns size auto',CASCADE=>True);

--
DROP TABLE UNIT_DIM1;

CREATE TABLE UNIT_DIM1
(
UNIT_KEY NUMBER(6) NOT NULL,
UNIT_CODE VARCHAR2(10 BYTE) NOT NULL,
UNIT_NAME VARCHAR2(100 BYTE) NOT NULL
);
INSERT INTO unit_dim1 VALUES (1,'1','Unit 1');
INSERT INTO unit_dim1 VALUES (2,'2','Unit 2');
INSERT INTO unit_dim1 VALUES (3,'3','Unit 3');
INSERT INTO unit_dim1 VALUES (10,'10','Unit 10');
INSERT INTO unit_dim1 VALUES (11,'11','Unit 11');
INSERT INTO unit_dim1 VALUES (12,12,'Unit 12');
COMMIT;

CREATE UNIQUE INDEX U1_UNIT_DIM1 ON UNIT_DIM1(UNIT_KEY);

exec dbms_stats.gather_table_stats('USER','UNIT_DIM1',ESTIMATE_PERCENT=>100, -
METHOD_OPT=>'for all indexed columns size auto',CASCADE=>True);


ALTER TABLE UNIT_DIM1 ADD (
CONSTRAINT PKUNIT_DIM1
PRIMARY KEY
(UNIT_KEY)
);

ALTER TABLE MTH_DIM1 MODIFY CONSTRAINT PKMONTH_DIM1 RELY;
ALTER TABLE UNIT_DIM1 MODIFY CONSTRAINT PKUNIT_DIM1 RELY;

ALTER TABLE FACT1 ADD
CONSTRAINT FK_FACT1_MTH_KEY
FOREIGN KEY (MTH_KEY)
REFERENCES MTH_DIM1 (MTH_KEY) RELY disable NOVALIDATE;

ALTER TABLE FACT1 ADD
CONSTRAINT FK_FACT1_UNIT_KEY
FOREIGN KEY (UNIT_KEY)
REFERENCES UNIT_DIM1 (UNIT_KEY) RELY disable NOVALIDATE;

SET AUTOTRACE TRACEONLY EXPLAIN

select count(1)
from FACT1 fct
inner join mth_dim1 mth on fct.mth_key = mth.mth_key
inner join unit_dim1 unit on FCT.UNIT_KEY = unit.UNIT_KEY
AND FCT.CLNDR_MTH = 200810
;
select count(1)
from FACT1 fct
inner join mth_dim1 mth on fct.mth_key = mth.mth_key
inner join unit_dim1 unit on FCT.UNIT_KEY = unit.UNIT_KEY
AND mth.CLNDR_YEAR_MTH_NAME = 'Oct 2008'
;


Regards,


month key

A reader, April 12, 2009 - 6:22 pm UTC

Hi Tom,

Could you please advise on the example above.

Regards,


Tom Kyte
April 13, 2009 - 5:17 pm UTC

you partition by clndr_mth

you put a where clause on mth_key

so, I'm missing something here - of course it cannot, you are comparing apples to flying toaster ovens here???


In general, you can partition eliminate, WHEN YOU QUERY ON THE PARTITION KEY. But, you are not?


ops$tkyte%ORA10GR2> DROP TABLE FACT1;

Table dropped.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> CREATE TABLE FACT1
  2  (
  3  MTH_KEY                NUMBER(6)  NOT NULL,
  4  UNIT_KEY               NUMBER(15)  NOT NULL,
  5  AMT                    NUMBER(15,2),
  6  CLNDR_MTH              NUMBER(6)  NOT NULL
  7  )
  8  PARTITION BY RANGE (CLNDR_MTH)
  9  (
 10  PARTITION FACT1_PART1 VALUES LESS THAN (200811),
 11  PARTITION FACT1_PART2 VALUES LESS THAN (200812),
 12  PARTITION FACT1_PART3 VALUES LESS THAN (200901),
 13  PARTITION FACT1_PART4 VALUES LESS THAN (200902),
 14  PARTITION FACT1_PART5 VALUES LESS THAN (200903),
 15  PARTITION FACT1_PART6 VALUES LESS THAN (200904)
 16  );

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'FACT1', numrows => 10000000, numblks => 100000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> DROP TABLE MTH_DIM1;

Table dropped.

ops$tkyte%ORA10GR2> CREATE TABLE MTH_DIM1
  2  (
  3  MTH_KEY                NUMBER(6)        NOT NULL,
  4  CLNDR_MTH              number(6) not null,
  5  CLNDR_YEAR_MTH         VARCHAR2(10 BYTE)  NOT NULL,
  6  CLNDR_YEAR_MTH_NAME    VARCHAR2(100 BYTE)  NOT NULL
  7  );

Table created.

ops$tkyte%ORA10GR2> ALTER TABLE MTH_DIM1 ADD ( CONSTRAINT PKMONTH_DIM1 PRIMARY KEY (MTH_KEY));

Table altered.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'MTH_DIM1', numrows => 100000, numblks => 1000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from fact1 where clndr_mth = 200812;

Execution Plan
----------------------------------------------------------
Plan hash value: 103118070

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |   100K|  5078K|  5276   (4)| 00:00:26 |       |       |
|   1 |  PARTITION RANGE SINGLE|       |   100K|  5078K|  5276   (4)| 00:00:26 |     3 |     3 |
|*  2 |   TABLE ACCESS FULL    | FACT1 |   100K|  5078K|  5276   (4)| 00:00:26 |     3 |     3 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CLNDR_MTH"=200812)

ops$tkyte%ORA10GR2> select *
  2    from fact1, mth_dim1
  3   where fact1.mth_key = mth_dim1.mth_key
  4     AND fact1.clndr_mth = mth_dim1.clndr_mth
  5     AND mth_dim1.mth_key = 5;

Execution Plan
----------------------------------------------------------
Plan hash value: 1193018093

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |              |     2 |   274 |  5277   (4)| 00:00:26 |       |       |
|   1 |  NESTED LOOPS                |              |     2 |   274 |  5277   (4)| 00:00:26 |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| MTH_DIM1     |     1 |    85 |     2   (0)| 00:00:01 |       |       |
|*  3 |    INDEX UNIQUE SCAN         | PKMONTH_DIM1 |     1 |       |     1   (0)| 00:00:01 |       |       |
|   4 |   PARTITION RANGE ITERATOR   |              |     2 |   104 |  5275   (4)| 00:00:26 |   KEY |   KEY |
|*  5 |    TABLE ACCESS FULL         | FACT1        |     2 |   104 |  5275   (4)| 00:00:26 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("MTH_DIM1"."MTH_KEY"=5)
   5 - filter("FACT1"."MTH_KEY"=5 AND "FACT1"."CLNDR_MTH"="MTH_DIM1"."CLNDR_MTH")

ops$tkyte%ORA10GR2> set autotrace off




you do need to where on the partition key....
in order to partition eliminate....

Month Key

A reader, April 13, 2009 - 5:43 pm UTC

Hi Tom,

Thanks for your help. This is what I was expecting. But now back to the real question:
In a typical DW environment, where OWB creates the time dimension with the month key as a surrogate key, how will Oracle leverage partition elimination for queries against specific month(s)? It looks to me that either we need to change the design (use a smart key / the partition key - YYYYMM) or we cannot get the performance benefit of partition elimination. Am I correct?

Regards,
Tom Kyte
April 14, 2009 - 9:08 am UTC

well, tell me this - how would you have the date both in the fact table AND the dimension table? and if you do, why would you bother with the dimension at all?

date dimension

A reader, April 14, 2009 - 5:45 pm UTC

Hi Tom,

There are lots of other attributes in a typical date dimension, like the total number of days in the calendar month/financial year month, the financial year month number/name, etc., typically needed by BI users for analysis and reporting.
Thanks
Tom Kyte
April 14, 2009 - 6:37 pm UTC

I understand that. I don't understand the relevance to my question.

My question is - why isn't your calendar date in your fact table related to your time DIMENSION. why are you maintaining two bits of data that in fact have to be the same in two different places without any sort of relationship between them.

Your fact table should have the date as the foreign key
Your fact table would be partitioned on that date
That date would be the partition key
You would be 'whereing' on the partition key when you join to the time dimension.
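
A sketch of that layout (hypothetical names, and assuming a time_dim keyed on the natural date, as sketched earlier): the date is simultaneously the foreign key and the partition key, so the predicate reaches the partition key through the join:

create table sales_fact (
    the_day date not null references time_dim ( the_day ),
    amt     number
)
partition by range ( the_day )
(
    partition p200901 values less than ( date '2009-02-01' ),
    partition p200902 values less than ( date '2009-03-01' )
);

-- the join hands the optimizer a predicate on the partition key,
-- giving it a path to prune (Pstart/Pstop = KEY, as in the plan above)
select sum( f.amt )
  from sales_fact f, time_dim t
 where f.the_day = t.the_day
   and t.the_month = date '2009-01-01';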

Terabyte-size tables: need suggestions on constraints

Manjunath, July 24, 2009 - 7:57 am UTC

Hi Tom,
We have a DWH application, and Datamarts is one of the schemas. Most of the tables in the Datamarts are very large facts that will be 1TB+ after compression of data and indexes.
The development is still in the initial stage and with very little data (mostly mock-ups, about a few MB).
All the tables have been defined as:
1. A PK on the combination of all the surrogate keys from the dimensions. In some of the facts the PK is as large as 17 columns. In many of the tables it is between 7-11 columns. Apart from this we have individual indexes on all the FKs (FKs are defined on all the surrogate keys of the dimensions). This has created tables whose indexes are between 2-4 times the size of the actual data itself. These PKs are not referred to by any of the other tables and really just make records unique.
2. We use Informatica for ETL, and before the monthly loads to these datamart tables we check the records using an aggregator function in Informatica to ensure that a unique record is being inserted. Similarly, the FK values are checked in the app itself to ensure suitable values are in the parent dimensions.
3. In the above case, do we need to have the PK, the additional FK relations, and all the related indexes?
I anticipate problems in administering these tables, especially with insert speeds and compression in mind. The inserts will be slow, and to maintain compression for the newer inserts we will need to do extra work (unless we disable the constraints/indexes we will not be able to do direct loads, and non-direct loads imply no compression).

I was therefore looking to remove the PKs and the FKs and instead retain those indexes that are needed for querying. The other view, from the architecture team, is that we should be depending more on the database for data integrity issues. (They are quoting your article https://asktom.oracle.com/magazine-archive.htm )
While I agree with that in a broader sense (I thought the article had more in common with an OLTP environment), these datamart tables are not the primary transaction tables. We load them from a separate set of flat files that come from transactional data. On top of that, we check for uniqueness in the app itself. So, in this case do we really need to have a PK on a very long list of columns, FKs on all of them, and then indexes additionally?

Also, please suggest if there are any alternate ways of handling such large tables in a typical data warehouse?

Thanks and regards
Manjunath
Tom Kyte
July 26, 2009 - 7:28 am UTC

1. A PK on the combination of all the surrogate keys from the dimensions.

why? that is surprising to me. Why would you do that - and what says that constitutes the primary key???

2. We use Informatica for ETL
slow by slow processing, be prepared to wait (and wait and wait)



constraints are used by the optimizer to OPTIMIZE queries. Remove the constraints and lose the ability to have queries run fast.


https://www.oracle.com/technetwork/issue-archive/2009/09-may/o39asktom-096149.html
http://asktom.oracle.com/Misc/stuck-in-rut.html

constraints are so vitally important in a data warehouse FOR PERFORMANCE reasons - do not skip them.
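
One concrete illustration of the performance angle (hypothetical names; join elimination requires a release recent enough to support it): with a validated foreign key and a NOT NULL join column, the optimizer can remove a join to the dimension entirely when no dimension columns are selected:

-- fact_t.dim_key is NOT NULL with a validated FK to dim_t(dim_key)
select f.amt
  from fact_t f join dim_t d on ( f.dim_key = d.dim_key );

-- every fact row is guaranteed exactly one match, so the plan can
-- touch FACT_T alone; drop the constraint and the join must be done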


I was therefore looking to remove the PKs and the FKs and instead retain
those indexes that are needed for querying.


do you not see that the index on the unique set of attributes IS the primary key??? why would removing a primary key (but keeping the index) do anything for your load? Have you thought your thoughts all of the way through?
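
To illustrate the point, the constraint and its index can even be tied together explicitly (hypothetical names) -- the unique index is what does the enforcement work, so keeping the index while dropping the constraint saves nothing at load time and only costs the optimizer its knowledge:

alter table fact_t add constraint fact_pk
    primary key ( k1, k2, k3 )
    using index ( create unique index fact_pk_ix
                  on fact_t ( k1, k2, k3 ) );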

Thank you.

Manjunath, July 27, 2009 - 5:25 am UTC

Hi Tom,
1. That the PK was not needed has been recognized, and it is being removed. Along with it goes the associated index too.
2. Nothing can be done on the choice of the tool. Some decisions are beyond consideration after some time into the development. But this is my first experience with it, and I realize the limitations. :-)

On the removal of FKs: Actually, I will not be removing them altogether. I will convert them with the 'RELY' tag. I think that will address the CBO need as well as the insert performance. I will convert all those indexes on FKs to be bitmap indexes so as to take advantage of star transformation. I will create additional indexes if they are part of the predicates in the reporting queries.

Thanks a lot for your suggestions as always, this site has been a very good source in gaining confidence for me.

Regards
Manjunath

Tom Kyte
July 27, 2009 - 6:03 am UTC

2) that is silly, of course anything can be done. You choose to not do something. Things can always be changed.


Be careful with rely, if you lie to us, we will lie to you. Put on a novalidate rely constraint - you had better be more than certain it is in fact valid.
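
A sketch of that danger, reusing the names from the test case above (the FK on FACT1 there was created RELY DISABLE NOVALIDATE):

alter session set query_rewrite_integrity = trusted;

-- if FACT1 holds an orphan MTH_KEY, the optimizer may eliminate
-- this join on the strength of the RELY constraint and count the
-- orphan row that a real join would have discarded
select count(*)
  from fact1 f, mth_dim1 m
 where f.mth_key = m.mth_key;

With TRUSTED rewrite integrity, RELY constraints are taken at their word for join elimination and materialized view rewrite -- if the data is actually bad, wrong answers come back silently.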