
Question and Answer

Tom Kyte

Thanks for the question, cory.

Asked: September 08, 2005 - 10:43 am UTC

Last updated: November 21, 2012 - 9:46 am UTC

Version: 920

Viewed 1000+ times

You Asked

Hi Tom,
I have one question about dimensions.
Is a dimension just metadata about the dimension tables in my star/snowflake schema, or can dimensions also be used for query rewrite?

Can you briefly explain (or provide some example)?

Thanks a lot.

and Tom said...

A quote from Expert One-on-One Oracle:

Dimensions

Use of Dimensions is another method by which we can give even more information to Oracle. Suppose we have a table of details. In the details are a transaction date and a customer id. The transaction date points to another table that gives full details of what month the transaction date was in, what quarter of your fiscal year that represents, what fiscal year it was in, and so on. Now, suppose you created a materialized view that stored aggregated sales information at the monthly level. Can Oracle use that view to answer a query for sales data for a particular quarter or year? Well, we know that transaction date implies month, month implies quarter and quarter implies year, so the answer is that it can - but Oracle doesn't know this (yet) - so although it can, it won't.

Using a database object called a dimension, we can alert Oracle to these facts and it will use them to rewrite queries in more cases. A dimension declares a parent/child relationship between pairs of columns. We can use it to describe to Oracle that within a row of a table - the MONTH column implies the value you'll find in the QTR column, the QTR column implies the value you'll find in the YEAR column, and so on. Using a dimension, we can have a materialized view that has fewer details than the detailed records do (summarized to the monthly level perhaps) - but is still at a higher level of aggregation than the query requests (the query wants data by quarter) - and Oracle will recognize that it can use the materialized view to get the answer.

Here is a simple example. We will set up a "sales" table. This table will store the transaction date, a customer id and the total number of sales. This table will have about 350,000 rows in it. Another table, time_hierarchy, will store the mapping of transaction date to month, to quarter, to year. If we join the two together, we can obtain aggregate sales by month, quarter, year and so on. Likewise, if we had a table that mapped a customer id to a zip code and the zip code to a region - we could easily join this table to SALES, and aggregate by zip code or region.

In a conventional database schema (one without materialized views and other structures) these operations would succeed but they would be very slow. For every row in the sales data, we would have to perform an indexed read into the lookup table to convert either the transaction date or customer id into some other value (a NESTED LOOPS JOIN) in order to group by that other value. Enter the materialized view. We can store a summarized rollup of the details - perhaps at the monthly level for the transaction date and at the zip code level for the customer information. Now, rolling up to quarters or by region becomes a very fast operation.

This is what this might look like. We'll start by generating some random test data. We'll create the SALES table and load some random data into it using the ALL_OBJECTS view to create rows for us. My ALL_OBJECTS has about 22,000 rows in it, so after 4 inserts that each double the size of the table, we'll have about 350k records. I am using the /*+ APPEND */ hint simply to reduce the undo/redo that would otherwise be generated by these large inserts:


tkyte@TKYTE816> create table sales
2 (trans_date date, cust_id int, sales_amount number );
Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into sales
2 select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
3 mod(rownum,100) CUST_ID,
4 abs(dbms_random.random)/100 SALES_AMOUNT
5 from all_objects
6 /
21921 rows created.

tkyte@TKYTE816> commit;
Commit complete.

tkyte@TKYTE816> begin
2 for i in 1 .. 4
3 loop
4 insert /*+ APPEND */ into sales
5 select trans_date, cust_id, abs(dbms_random.random)/100
6 from sales;
7 commit;
8 end loop;
9 end;
10 /
PL/SQL procedure successfully completed.

tkyte@TKYTE816> select count(*) from sales;

COUNT(*)
----------
350736

Those are the details representing one year's worth of data. I set up the TRANS_DATE to simply be the first day of this year plus a number between 0 and 365. The CUST_ID is a number between 0 and 99. The SALES_AMOUNT is some typically 'large' number (it was a really good year). Now, we need to set up our TIME_HIERARCHY table to roll up the date field by month, quarter, year, and so on:

tkyte@TKYTE816> create table time_hierarchy
2 (day, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
3 as
4 select distinct
5 trans_date DAY,
6 cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
7 to_char(trans_date,'mon-yyyy') MON_YYYY,
8 'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
9 || to_char(trans_date,'yyyy') QTR_YYYY,
10 cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
11 from sales
12 /
Table created.

In this case, that was simple enough. We generated:

- MMYYYY: The month, including the year
- MON_YYYY: Same as above, but with the month 'spelled' out
- QTR_YYYY: The quarter of the year, including the year
- YYYY: The year itself

In general, the computations needed to create this table could be much more complex. For example, fiscal year quarters typically are not so easily computed, and neither are fiscal years; they do not generally follow the calendar year. Continuing on, we'll create the materialized view. The summary we are creating rolls the data up from individual days to months. We would expect our materialized view to have about 1/30th the number of rows of our SALES table, if the data was evenly distributed:
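To make that point concrete, here is a hedged sketch of how a non-calendar fiscal hierarchy might be built, assuming a fiscal year that starts on April 1. The table name, column names and the FY-labeling convention here are illustrative, not part of the original example:

```sql
-- Hypothetical variant of TIME_HIERARCHY for a fiscal year starting April 1.
-- ADD_MONTHS shifts each date back three months, so ordinary calendar
-- arithmetic on the shifted date yields the fiscal month, quarter and year.
create table fiscal_time_hierarchy
( day, fiscal_mmyyyy, fiscal_qtr_yyyy, fiscal_yyyy )
as
select distinct
       trans_date DAY,
       cast( to_char( add_months(trans_date,-3), 'mmyyyy' ) as number ) FISCAL_MMYYYY,
       'Q' || ceil( to_char( add_months(trans_date,-3), 'mm' )/3 )
           || ' FY' || to_char( add_months(trans_date,-3), 'yyyy' )    FISCAL_QTR_YYYY,
       cast( to_char( add_months(trans_date,-3), 'yyyy' ) as number )  FISCAL_YYYY
  from sales
/
```

Under this convention a transaction dated April 2001 falls in Q1 of FY2001; a fiscal year whose months do not align on quarter boundaries would need a lookup table instead of date arithmetic.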

tkyte@TKYTE816> exec dbms_stats.gather_table_stats( user, 'SALES', cascade=>true );

tkyte@TKYTE816> exec dbms_stats.gather_table_stats( user, 'TIME_HIERARCHY', cascade=>true );

tkyte@TKYTE816> create materialized view sales_mv
2 build immediate
3 refresh on demand
4 enable query rewrite
5 as
6 select sales.cust_id, sum(sales.sales_amount) sales_amount,
7 time_hierarchy.mmyyyy
8 from sales, time_hierarchy
9 where sales.trans_date = time_hierarchy.day
10 group by sales.cust_id, time_hierarchy.mmyyyy
11 /
Materialized view created.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.mmyyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.mmyyyy
5 /

MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
12001 3.2177E+11
12002 1.0200E+10
22001 2.8848E+11
32001 3.1944E+11
42001 3.1012E+11
52001 3.2066E+11
62001 3.0794E+11
72001 3.1796E+11
82001 3.2176E+11
92001 3.0859E+11
102001 3.1868E+11
112001 3.0763E+11
122001 3.1305E+11

13 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=850
1 0 SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)
2 1 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Bytes

So far, so good - Oracle rewrote the query to use the view, SALES_MV. However, if we issue a query that calls for a higher level of aggregation:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.qtr_yyyy
5 /

QTR_YYYY SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2001 9.2969E+11
Q1 FY2002 1.0200E+10
Q2 FY2001 9.3872E+11
Q3 FY2001 9.4832E+11
Q4 FY2001 9.3936E+11

Elapsed: 00:00:05.58

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=5 Bytes=14
1 0 SORT (GROUP BY) (Cost=8289 Card=5 Bytes=145)
2 1 NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
3 2 TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736 B
4 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
15 db block gets
351853 consistent gets


We see that Oracle doesn't have the knowledge we have. It does not yet know that it could have used the materialized view to answer this particular query, so it used the original SALES table instead, and had to do a lot of work to get the answer. The same would be true for a fiscal year query.

So, let's use a dimension to alert Oracle to the fact that the materialized view would be useful in answering this question. First, we'll create the dimension:

tkyte@TKYTE816> create dimension time_hierarchy_dim
2 level day is time_hierarchy.day
3 level mmyyyy is time_hierarchy.mmyyyy
4 level qtr_yyyy is time_hierarchy.qtr_yyyy
5 level yyyy is time_hierarchy.yyyy
6 hierarchy time_rollup
7 (
8 day child of
9 mmyyyy child of
10 qtr_yyyy child of
11 yyyy
12 )
13 attribute mmyyyy
14 determines mon_yyyy;

Dimension created.

That tells Oracle that the DAY column of the TIME_HIERARCHY table implies MMYYYY, which in turn implies QTR_YYYY, which finally implies YYYY. Also stated is the fact that MMYYYY and MON_YYYY are synonymous - there is a one-to-one mapping between the two. So, any time Oracle sees MON_YYYY used, it understands it as if MMYYYY were used. Now that Oracle has a greater understanding of the relationships in the data, we see a marked improvement in our query response times:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.qtr_yyyy
5 /

QTR_YYYY SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2001 9.2969E+11
Q1 FY2002 1.0200E+10
Q2 FY2001 9.3872E+11
Q3 FY2001 9.4832E+11
Q4 FY2001 9.3936E+11

Elapsed: 00:00:00.20

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=5 Bytes=195)
1 0 SORT (GROUP BY) (Cost=7 Card=5 Bytes=195)
2 1 HASH JOIN (Cost=6 Card=150 Bytes=5850)
3 2 VIEW (Cost=4 Card=46 Bytes=598)
4 3 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (UNI
6 2 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Byt

Statistics
----------------------------------------------------------
0 recursive calls
16 db block gets
12 consistent gets


Well, we went from 350,000-plus logical reads to 12 - that is not too bad, not too bad at all. If you run this example, you'll be able to see the difference: the first query took a while (about 6 seconds); the second was answered on screen before our hands left the Enter key (about 1/5 of a second).
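As a side note, because the dimension also declares "attribute mmyyyy determines mon_yyyy", a query against the spelled-out month column should be eligible for rewrite as well, even though SALES_MV stores only MMYYYY. A sketch, not run as part of the original session:

```sql
-- MON_YYYY is not stored in SALES_MV, but the DETERMINES clause tells
-- Oracle it is functionally equivalent to MMYYYY, so this query can
-- still rewrite (a join back to TIME_HIERARCHY supplies the labels).
select time_hierarchy.mon_yyyy, sum(sales_amount)
  from sales, time_hierarchy
 where sales.trans_date = time_hierarchy.day
 group by time_hierarchy.mon_yyyy
/
```

Whether the rewrite actually happens also depends on settings such as QUERY_REWRITE_INTEGRITY, since dimension relationships are declared, not enforced.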

We can use this dimension feature many times on the same base fact table. Consider what happens if we assign a ZIP_CODE and REGION attribute to every customer in our database:

tkyte@TKYTE816> create table customer_hierarchy
2 ( cust_id primary key, zip_code, region )
3 organization index
4 as
5 select cust_id,
6 mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
7 mod( rownum, 6 ) region
8 from ( select distinct cust_id from sales)
9 /
Table created.

tkyte@TKYTE816> analyze table customer_hierarchy compute statistics;
Table analyzed.

And we recreate our materialized view to be a summary that shows us SALES_AMOUNT by ZIP_CODE and MMYYYY:

tkyte@TKYTE816> drop materialized view sales_mv;
Materialized view dropped.

tkyte@TKYTE816> create materialized view sales_mv
2 build immediate
3 refresh on demand
4 enable query rewrite
5 as
6 select customer_hierarchy.zip_code,
7 time_hierarchy.mmyyyy,
8 sum(sales.sales_amount) sales_amount
9 from sales, time_hierarchy, customer_hierarchy
10 where sales.trans_date = time_hierarchy.day
11 and sales.cust_id = customer_hierarchy.cust_id
12 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
13 /
Materialized view created.

We'll now execute a query that shows sales by ZIP_CODE and MMYYYY, and see that it uses the materialized view as expected:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select customer_hierarchy.zip_code,
2 time_hierarchy.mmyyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
8 /
1250 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=409 Bytes=204
1 0 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 Bytes=2

Statistics
----------------------------------------------------------
28 recursive calls
12 db block gets
120 consistent gets

However, when we ask for information at a different level of aggregation (rolling MMYYYY up to YYYY and ZIP_CODE up to REGION) we see that Oracle does not recognize that it can use the materialized view:

tkyte@TKYTE816> select customer_hierarchy.region,
2 time_hierarchy.yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.yyyy
8 /
9 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=9 Bytes=26
1 0 SORT (GROUP BY) (Cost=8289 Card=9 Bytes=261)
2 1 NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
3 2 NESTED LOOPS (Cost=169 Card=350736 Bytes=6663984)
4 3 TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736
5 3 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)
6 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
15 db block gets
702589 consistent gets

Oracle understands the time dimension we set up, but it does not yet have any information about how CUST_ID, ZIP_CODE and REGION relate to each other in our customer hierarchy table. To correct that, we rebuild our dimension with two hierarchies in it - one describing the TIME_HIERARCHY table and the other describing the CUSTOMER_HIERARCHY table:

tkyte@TKYTE816> drop dimension time_hierarchy_dim
2 /
Dimension dropped.

tkyte@TKYTE816> create dimension sales_dimension
2 level cust_id is customer_hierarchy.cust_id
3 level zip_code is customer_hierarchy.zip_code
4 level region is customer_hierarchy.region
5 level day is time_hierarchy.day
6 level mmyyyy is time_hierarchy.mmyyyy
7 level qtr_yyyy is time_hierarchy.qtr_yyyy
8 level yyyy is time_hierarchy.yyyy
9 hierarchy cust_rollup
10 (
11 cust_id child of
12 zip_code child of
13 region
14 )
15 hierarchy time_rollup
16 (
17 day child of
18 mmyyyy child of
19 qtr_yyyy child of
20 yyyy
21 )
22 attribute mmyyyy
23 determines mon_yyyy;

Dimension created.

We dropped the original time dimension and created a new, more descriptive one explaining all of the relevant relationships. Now Oracle will understand that the SALES_MV we created is able to answer many more questions. For example, if we ask our REGION by YYYY question again:

tkyte@TKYTE816> select customer_hierarchy.region,
2 time_hierarchy.yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.yyyy
8 /

REGION YYYY SALES_AMOUNT
---------- ---------- ------------
0 2001 5.9598E+11
0 2002 3123737106
1 2001 6.3789E+11
2 2001 6.3903E+11
2 2002 3538489159
3 2001 6.4069E+11
4 2001 6.3885E+11
4 2002 3537548948
5 2001 6.0365E+11

9 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=9 Bytes=576)
1 0 SORT (GROUP BY) (Cost=11 Card=9 Bytes=576)
2 1 HASH JOIN (Cost=9 Card=78 Bytes=4992)
3 2 HASH JOIN (Cost=6 Card=78 Bytes=4446)
4 3 VIEW (Cost=3 Card=19 Bytes=133)
5 4 SORT (UNIQUE) (Cost=3 Card=19 Bytes=133)
6 5 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (U
7 3 TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 B
8 2 VIEW (Cost=3 Card=100 Bytes=700)
9 8 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
10 9 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
16 db block gets
14 consistent gets


Oracle was able to make use of BOTH hierarchies in the dimension here, and can now make use of the materialized view. Due to the dimension we created, it performed simple lookups to convert the CUST_ID column into REGION (since CUST_ID implies ZIP_CODE, which implies REGION) and the MMYYYY column into QTR_YYYY, and answered our question almost instantly. Here, we reduced the number of logical I/Os from 700,000-plus to 14. When you consider that the size of the SALES table will only grow over time, while SALES_MV will grow much more slowly (180 records or so per month), we can see that this query will scale very well over time. No one would complain about that.



Comments

David Aldridge, September 08, 2005 - 4:14 pm UTC

Tom,

I'm intrigued by the way you specified two entirely separate hierarchies in a single dimension statement. Was there a reason why you chose to do that instead of creating a new dimension like:

create dimension sales_hierarchy_dim
level cust_id is customer_hierarchy.cust_id
level zip_code is customer_hierarchy.zip_code
level region is customer_hierarchy.region
hierarchy cust_rollup
(
cust_id child of
zip_code child of
region
);

It appears to give the same rewrite on my own tests.

Tom Kyte
September 08, 2005 - 6:04 pm UTC

because you can?

(I honestly don't recall what I was thinking over 4 years ago when I first wrote that :) there will be new/better examples the next time around though, I learned a lot in 4 years.. and a lot has changed)

David Aldridge, September 09, 2005 - 9:34 am UTC

One difference that occurs to me is that with a single dimension statement covering multiple independent hierarchies, you can validate them all using a single call to DBMS_OLAP.Validate_Dimension().

This would be handy for cutting down on the amount of validation code you need, although it would also mean you'd have to validate all of them in one go every time.

Query with dimensions won’t rewrite without /*+ REWRITE */ hint.

Graeme, October 21, 2005 - 12:36 pm UTC

Hi Tom,

I am having real trouble getting a query to rewrite when using a denormalised hierarchy. In general, the queries on our data warehouse are built through Business Objects, so I do not have the option of putting hints on queries (not that I would want to in any case). I am sure I can solve the issue using a partially normalised hierarchy, but before taking such a drastic step I wondered if you could see any other way around my problem.

We are running 10.1.0.4

We have the following materialized view, based on a fact table and a couple of dimensions, rolling the SKU/day data up to department/week level. (Note: the ftt.sk_trans_cal_dt > 2699 predicate is simply to ensure queries rewrite to this test MV rather than any other.)


CREATE MATERIALIZED VIEW GHV_MD_DEPT_LOC_TRANS_WK
REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT COUNT(*) AS rf_count,
tim.sk_merch_yr_wk AS sk_merch_yr_wk,
prd.sk_dept_id AS sk_dept_id,
ftt.sk_location_id AS sk_location_id,
SUM(ftt.md_qty) AS md_qty,
COUNT(ftt.md_qty) AS rf_cnt_md_qty
FROM dim_time tim,
dim_product_sku prd,
ftt_md_sku_loc_trans ftt
WHERE ftt.sk_trans_cal_dt = tim.sk_cal_dt
AND ftt.sk_trans_cal_dt > 2699
AND ftt.sk_sku_id = prd.sk_sku_id
GROUP BY tim.sk_merch_yr_wk,
prd.sk_dept_id,
ftt.sk_location_id


CREATE DIMENSION product_sku
LEVEL sk_sku_id IS (dim_product_sku.sk_sku_id)
LEVEL sk_option_id IS (dim_product_sku.sk_option_id)
LEVEL sk_range_id IS (dim_product_sku.sk_range_id)
LEVEL sk_dept_id IS (dim_product_sku.sk_dept_id)
LEVEL sk_group_id IS (dim_product_sku.sk_group_id)
LEVEL sk_div_id IS (dim_product_sku.sk_div_id)
LEVEL all_prd_id IS (dim_product_sku.all_prd_id)
HIERARCHY product_sku_hier (
sk_sku_id CHILD OF
sk_option_id CHILD OF
sk_range_id CHILD OF
sk_dept_id CHILD OF
sk_group_id CHILD OF
sk_div_id CHILD OF
all_prd_id
)
ATTRIBUTE sk_sku_id DETERMINES sku_id
ATTRIBUTE sk_sku_id DETERMINES sku_descr
ATTRIBUTE sk_sku_id DETERMINES size_fit_id
ATTRIBUTE sk_option_id DETERMINES option_id
ATTRIBUTE sk_option_id DETERMINES option_descr
ATTRIBUTE sk_option_id DETERMINES sk_season_id
ATTRIBUTE sk_option_id DETERMINES season_id
ATTRIBUTE sk_option_id DETERMINES colour_id
ATTRIBUTE sk_option_id DETERMINES colour_descr
ATTRIBUTE sk_range_id DETERMINES range_id
ATTRIBUTE sk_range_id DETERMINES range_descr
ATTRIBUTE sk_dept_id DETERMINES dept_id
ATTRIBUTE sk_dept_id DETERMINES dept_descr
ATTRIBUTE sk_group_id DETERMINES group_id
ATTRIBUTE sk_group_id DETERMINES group_descr
ATTRIBUTE sk_div_id DETERMINES div_id
ATTRIBUTE sk_div_id DETERMINES div_descr
ATTRIBUTE all_prd_id DETERMINES all_prd_descr

From this you can see I have a denormalized hierarchy in the dim_product_sku table to allow me to easily roll up the details to a higher level.

The tables are analyzed using the following commands:

begin
dbms_stats.GATHER_TABLE_STATS
('dwh',
'FTT_MD_SKU_LOC_TRANS',
estimate_percent => 10,
method_opt => 'for all indexed columns size auto',
cascade => true,
degree => 4);
end;

The fact table has about 3.5 million rows and a bitmap index on sk_trans_cal_dt.


begin
dbms_stats.GATHER_TABLE_STATS
('dwh',
'DIM_PRODUCT_SKU',
estimate_percent => 40,
method_opt => 'for all columns size auto',
cascade => true,
degree => 4);
end;

The dimension table has 350000 rows

begin
dbms_stats.GATHER_TABLE_STATS
('dwh',
'GHV_MD_DEPT_LOC_TRANS_WK',
estimate_percent => 40,
method_opt => 'for all columns size auto',
cascade => true,
degree => 4);
end;

The materialized view has 110000 rows.

Just to show the rewrite works under simple circumstances:

BOQTST@dwhtst > set autotrace traceonly explain
BOQTST@dwhtst > SELECT
2 sku.sk_dept_id,
3 SUM (md_qty)
4 FROM
5 ftt_md_sku_loc_trans ftt,
6 dim_product_sku sku
7 WHERE
8 ftt.sk_trans_cal_dt > 2699 AND
9 ftt.sk_sku_id = sku.sk_sku_id
10 GROUP BY
11* sk_dept_id
BOQTST@dwhtst > /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=95 Card=56 Bytes=504)
1 0 SORT (GROUP BY) (Cost=95 Card=56 Bytes=504)
2 1 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=109603 Bytes=986427)

and that we can roll up further using the dimension:

1 SELECT
2 sku.sk_group_id,
3 SUM (md_qty)
4 FROM
5 ftt_md_sku_loc_trans ftt,
6 dim_product_sku sku
7 WHERE
8 ftt.sk_trans_cal_dt > 2699 AND
9 ftt.sk_sku_id = sku.sk_sku_id
10 GROUP BY
11* sk_group_id
BOQTST@dwhtst > /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1812 Card=7 Bytes=133)
1 0 SORT (GROUP BY) (Cost=1812 Card=7 Bytes=133)
2 1 HASH JOIN (Cost=1602 Card=534315 Bytes=10151985)
3 2 VIEW (Cost=1540 Card=273 Bytes=2730)
4 3 SORT (UNIQUE) (Cost=1540 Card=273 Bytes=2730)
5 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=3487950)
6 2 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=109603 Bytes=986427)


If, however, I ask for more detail at the group level, it starts to go horribly wrong:

BOQTST@dwhtst > SELECT
2 sku.div_id,
3 sku.div_descr,
4 sku.group_id,
5 sku.group_descr,
6 SUM (md_qty)
7 FROM
8 ftt_md_sku_loc_trans ftt,
9 dim_product_sku sku
10 WHERE
11 ftt.sk_trans_cal_dt > 2699 AND
12 ftt.sk_sku_id = sku.sk_sku_id
13 GROUP BY
14 sku.div_id,
15 sku.div_descr,
16 sk_group_id,
17 sku.group_id,
18 sku.group_descr
19 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=12051 Card=772 Bytes=44004)
1 0 SORT (GROUP BY) (Cost=12051 Card=772 Bytes=44004)
2 1 HASH JOIN (Cost=10486 Card=3493039 Bytes=199103223)
3 2 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=15695775)
4 2 TABLE ACCESS (BY INDEX ROWID) OF 'FTT_MD_SKU_LOC_TRANS' (TABLE) (Cost=5394 Card=3493039 Bytes= 41916468)
5 4 BITMAP CONVERSION (TO ROWIDS)
6 5 BITMAP INDEX (RANGE SCAN) OF 'FTT_MD_SK_LC_DIM_TIME_FK1_BMI' (INDEX (BITMAP))



I can force it to rewrite; let's see what happens and get the results back:

BOQTST@dwhtst > col div_id form a3
BOQTST@dwhtst > col div_descr form a25
BOQTST@dwhtst > col group_id form a3
BOQTST@dwhtst > col group_descr form a25
1 SELECT /*+ REWRITE */
2 sku.div_id,
3 sku.div_descr,
4 sku.group_id,
5 sku.group_descr,
6 SUM (md_qty)
7 FROM
8 ftt_md_sku_loc_trans ftt,
9 dim_product_sku sku
10 WHERE
11 ftt.sk_trans_cal_dt > 2699 AND
12 ftt.sk_sku_id = sku.sk_sku_id
13 GROUP BY
14 sku.div_id,
15 sku.div_descr,
16 sk_group_id,
17 sku.group_id,
18* sku.group_descr
BOQTST@dwhtst > /
DIV DIV_DESCR GRO GROUP_DESCR SUM(MD_QTY)
--- ------------------------- --- ------------------------- -----------
001 Womenswear 001 Womens Clothing 3306297
001 Womenswear 002 Womens Non-clothing 1077890
001 Womenswear 003 Womens Footwear 422264
002 Menswear 004 Mens Clothing 1353178
002 Menswear 006 Mens Footwear 164644
002 Menswear 005 Mens Non-clothing 258622
003 Womenswear - Concessions 007 Womenswear - Concessions 4175
004 Menswear - Concessions 008 Menswear - Concessions 341
8 rows selected.
Elapsed: 00:00:01.38
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=33829 Card=772 Bytes=41688)
1 0 SORT (GROUP BY) (Cost=33829 Card=772 Bytes=41688)
2 1 HASH JOIN (Cost=2601 Card=58745251 Bytes=3172243554)
3 2 VIEW (Cost=2537 Card=30015 Bytes=1350675)
4 3 SORT (UNIQUE) (Cost=2537 Card=30015 Bytes=1350675)
5 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=15695775)
6 2 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=
109603 Bytes=986427)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3062 consistent gets
0 physical reads
0 redo size
1183 bytes sent via SQL*Net to client
660 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
8 rows processed
BOQTST@dwhtst >


Obviously it hasn't rewritten without the hint because the estimated cost is higher. If we compare the results, without the REWRITE hint the elapsed time is significantly longer:

1 SELECT
2 sku.div_id,
3 sku.div_descr,
4 sku.group_id,
5 sku.group_descr,
6 SUM (md_qty)
7 FROM
8 ftt_md_sku_loc_trans ftt,
9 dim_product_sku sku
10 WHERE
11 ftt.sk_trans_cal_dt > 2699 AND
12 ftt.sk_sku_id = sku.sk_sku_id
13 GROUP BY
14 sku.div_id,
15 sku.div_descr,
16 sk_group_id,
17 sku.group_id,
18* sku.group_descr
BOQTST@dwhtst > /
DIV DIV_DESCR GRO GROUP_DESCR SUM(MD_QTY)
--- ------------------------- --- ------------------------- -----------
001 Womenswear 001 Womens Clothing 3306297
001 Womenswear 002 Womens Non-clothing 1077890
001 Womenswear 003 Womens Footwear 422264
002 Menswear 004 Mens Clothing 1353178
002 Menswear 006 Mens Footwear 164644
002 Menswear 005 Mens Non-clothing 258622
003 Womenswear - Concessions 007 Womenswear - Concessions 4175
004 Menswear - Concessions 008 Menswear - Concessions 341
8 rows selected.
Elapsed: 00:00:13.24
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=12051 Card=772 Bytes=44004)
1 0 SORT (GROUP BY) (Cost=12051 Card=772 Bytes=44004)
2 1 HASH JOIN (Cost=10486 Card=3493039 Bytes=199103223)
3 2 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=15695775)
4 2 TABLE ACCESS (BY INDEX ROWID) OF 'FTT_MD_SKU_LOC_TRANS' (TABLE) (Cost=5394 Card=3493039 Bytes=
41916468)
5 4 BITMAP CONVERSION (TO ROWIDS)
6 5 BITMAP INDEX (RANGE SCAN) OF 'FTT_MD_SK_LC_DIM_TIME_FK1_BMI' (INDEX (BITMAP))


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
24524 consistent gets
0 physical reads
0 redo size
1183 bytes sent via SQL*Net to client
660 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
8 rows processed
BOQTST@dwhtst >


So what is going on here? From my point of view, the cardinality of the distinct rows in dim_product_sku for the join-back in the rewritten query is being estimated incorrectly, and it is this that causes the optimizer to think the rewritten query will be more expensive than it actually is. I am assuming that the rewritten query would look something like this:

BOQTST@dwhtst > set autotrace traceonly explain
BOQTST@dwhtst > SELECT
2 grp.div_id,
3 grp.div_descr,
4 grp.group_id,
5 grp.group_descr,
6 SUM (md_qty)
7 FROM
8 ghv_md_dept_loc_trans_wk mv,
9 (SELECT DISTINCT
10 div_id,
11 div_descr,
12 group_id,
13 group_descr,
14 sk_dept_id
15 FROM dim_product_sku) grp
16 WHERE
17 mv.sk_dept_id = grp.sk_dept_id
18 GROUP BY
19 grp.div_id,
20 grp.div_descr,
21 grp.group_id,
22 grp.group_descr
23 /
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=7348 Card=156 Bytes=7644)
1 0 SORT (GROUP BY) (Cost=7348 Card=156 Bytes=7644)
2 1 HASH JOIN (Cost=1602 Card=11868439 Bytes=581553511)
3 2 VIEW (Cost=1540 Card=6064 Bytes=242560)
4 3 SORT (UNIQUE) (Cost=1540 Card=6064 Bytes=242560)
5 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=13951800)
6 2 MAT_VIEW ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW) (Cost=56 Card=109603 Bytes=986
427)

Though I obviously haven't got it quite right, as the cardinality is different. Is there any way to see the exact query a rewritten query is rewritten to?

The actual cardinality is 63:
BOQTST@dwhtst > set autotrace off
BOQTST@dwhtst > select count(distinct sk_dept_id ) from dim_product_sku
2 /

COUNT(DISTINCTSK_DEPT_ID)
-------------------------
63

Elapsed: 00:00:00.48

Am I right in thinking that the reason the optimizer has got a different estimated cardinality is that it looks at the cardinality of each individual column and calculates the overall cardinality assuming the columns are independent of each other? Unfortunately, they are not - all of the columns are the same within a single sk_dept_id (as shown in the dimension). By taking columns out, I can get the estimated cardinality to drop to a point where the query will rewrite without the hint; it still gets the cardinality wrong, just not so wrong!

BOQTST@dwhtst > set autotrace traceonly explain
BOQTST@dwhtst > SELECT
2 sku.group_id,
3 sku.group_descr,
4 SUM (md_qty)
5 FROM
6 ftt_md_sku_loc_trans ftt,
7 dim_product_sku sku
8 WHERE
9 ftt.sk_trans_cal_dt > 2699 AND
10 ftt.sk_sku_id = sku.sk_sku_id
11 GROUP BY
12 sku.group_id,
13 sku.group_descr
14 /
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2762 Card=35 Bytes=1190)
1 0 SORT (GROUP BY) (Cost=2762 Card=35 Bytes=1190)
2 1 HASH JOIN (Cost=1602 Card=2638301 Bytes=89702234)
3 2 VIEW (Cost=1540 Card=1348 Bytes=33700)
4 3 SORT (UNIQUE) (Cost=1540 Card=1348 Bytes=33700)
5 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1407 Card=348795 Bytes=8719875)
6 2 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=
109603 Bytes=986427)

Sorry to have gone on so long. My main questions are:

1) Is there any way to see the exact query a rewritten query is rewritten to?
2) Is there any way to inform the optimizer that columns are within a hierarchy and not independent, so it will estimate the cardinality accurately? I can't use hints, and I would rather not use stored outlines, as the nature of the data warehouse is that I can't know what all the queries will be.
3) Does this mean that the denormalised single product hierarchy often recommended for data warehouse applications is inappropriate if we are using materialized views? I understand I could incorporate all the department attributes into the materialized view, but lower down the hierarchy we have materialized views with more than 100 million rows (and growing). To include all the attributes (for the product and other dimensions) in these would have a massive impact on disk space and full tablescan performance.
4) Any other things you feel I should try before changing the data model?


Tom Kyte
October 22, 2005 - 9:38 am UTC

... calculates the overall cardinality assuming the columns are
independent of each other? ...

yes - I like an example Jonathan Lewis has used.


"Ok room - how many people are of the zodiac sign Pisces? (1/12 of the room....). Ok, how many were born in December? (1/12 of the room). Ok, therefore 1/144 of you are pisces born in december.... (wrong.... but that is the way the CBO looks at it)"
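The arithmetic in that example can be sketched directly (a toy illustration of the independence assumption, not actual optimizer code):

```python
# Toy illustration of the CBO independence assumption from the zodiac
# example: sign and birth month are highly correlated, but the optimizer
# multiplies the selectivities as if they were independent.
rows = 1440

sel_sign = 1 / 12    # selectivity of "zodiac sign = Pisces"
sel_month = 1 / 12   # selectivity of "born in December"

# Optimizer-style estimate: 1/144 of the room.
estimated = rows * sel_sign * sel_month

# Reality for correlated columns: one predicate largely implies the
# other, so the combined selectivity stays near 1/12.
actual = rows * sel_sign

print(estimated, actual)  # prints 10.0 120.0
```

The gap between those two numbers is exactly the gap Graeme sees between the plan's Card= estimates and the real distinct counts.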


Can you try this with dynamic sampling?

Dynamic sampling doesn't seem to make much difference

Graeme, October 22, 2005 - 12:53 pm UTC

Dynamic sampling doesn’t seem to make much difference; while it helps with the scan cardinality, it doesn’t seem to improve the estimate of the number of rows coming back from a SELECT DISTINCT.

I deleted the stats for dim_product_sku and then switched off dynamic sampling (it was set to 2 in the previous example, but I believe that level only kicks in if the tables are unanalyzed?).

BOQTST@dwhtst > set autotrace traceonly explain
BOQTST@dwhtst > alter session set optimizer_dynamic_sampling = 0
2 /
Session altered.
BOQTST@dwhtst > SELECT
2 sku.div_id,
3 sku.div_descr,
4 sku.group_id,
5 sku.group_descr,
6 SUM (md_qty)
7 FROM
8 ftt_md_sku_loc_trans ftt,
9 dim_product_sku sku
10 WHERE
11 ftt.sk_trans_cal_dt > 2699 AND
12 ftt.sk_sku_id = sku.sk_sku_id
13 GROUP BY
14 sku.div_id,
15 sku.div_descr,
16 sk_group_id,
17 sku.group_id,
18 sku.group_descr
19 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=38616 Card=24744810 Bytes=2400246570)
1 0 SORT (GROUP BY) (Cost=38616 Card=24744810 Bytes=2400246570)
2 1 MERGE JOIN (Cost=26097 Card=24744810 Bytes=2400246570)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=749 Card=962019 Bytes=81771615)
4 3 INDEX (FULL SCAN) OF 'DIM_PRD_SKU_PK' (INDEX (UNIQUE)) (Cost=26 Card=962019)
5 2 SORT (JOIN) (Cost=25348 Card=3493039 Bytes=41916468)
6 5 TABLE ACCESS (FULL) OF 'FTT_MD_SKU_LOC_TRANS' (TABLE) (Cost=17405 Card=3493039 Bytes=41916468)

Results are much the same, except the cost goes up because, without stats or dynamic sampling, the cardinality of dim_product_sku is estimated at 962019. This really messes up the plan when the query is forced to rewrite:
1 SELECT /*+ REWRITE */
2 sku.div_id,
3 sku.div_descr,
4 sku.group_id,
5 sku.group_descr,
6 SUM (md_qty)
7 FROM
8 ftt_md_sku_loc_trans ftt,
9 dim_product_sku sku
10 WHERE
11 ftt.sk_trans_cal_dt > 2699 AND
12 ftt.sk_sku_id = sku.sk_sku_id
13 GROUP BY
14 sku.div_id,
15 sku.div_descr,
16 sk_group_id,
17 sku.group_id,
18* sku.group_descr
BOQTST@dwhtst > /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1199570 Card=1107 Bytes=94095)
1 0 SORT (GROUP BY NOSORT) (Cost=1199570 Card=1107 Bytes=94095)
2 1 VIEW (Cost=1199570 Card=1107 Bytes=94095)
3 2 SORT (UNIQUE) (Cost=1199570 Card=1107 Bytes=111807)
4 3 HASH JOIN (Cost=4694 Card=1882860151 Bytes=190168875251)
5 4 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=109603 Bytes=1753648)
6 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1436 Card=962019 Bytes=81771615)
If we now add dynamic sampling at level 2:

1 SELECT /*+ DYNAMIC_SAMPLING (2)
2 REWRITE */
3 sku.div_id,
4 sku.div_descr,
5 sku.group_id,
6 sku.group_descr,
7 SUM (md_qty)
8 FROM
9 ftt_md_sku_loc_trans ftt,
10 dim_product_sku sku
11 WHERE
12 ftt.sk_trans_cal_dt > 2699 AND
13 ftt.sk_sku_id = sku.sk_sku_id
14 GROUP BY
15 sku.div_id,
16 sku.div_descr,
17 sku.group_id,
18* sku.group_descr
BOQTST@dwhtst > /
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=380384 Card=1107 Bytes=94095)
1 0 SORT (GROUP BY NOSORT) (Cost=380384 Card=1107 Bytes=94095)
2 1 VIEW (Cost=380384 Card=1107 Bytes=94095)
3 2 SORT (UNIQUE) (Cost=380384 Card=1107 Bytes=111807)
4 3 HASH JOIN (Cost=2595 Card=627597962 Bytes=63387394162)
5 4 MAT_VIEW REWRITE ACCESS (FULL) OF 'GHV_MD_DEPT_LOC_TRANS_WK' (MAT_VIEW REWRITE) (Cost=56 Card=109603 Bytes=1753648)
6 4 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1398 Card=320662 Bytes=27256270)

There is essentially very little difference in the plan; despite the improved cardinality estimate of dim_product_sku (320662), the estimated cardinality after the sort unique was identical. If we raise dynamic sampling to a much higher level, like 8, all we get is an improved estimate of the dim_product_sku cardinality.

To take some simpler examples:

BOQTST@dwhtst > select /*+ DYNAMIC_SAMPLING (0) */distinct sk_dept_id
2 from dim_product_sku
3 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3663 Card=962019 Bytes=12506247)
1 0 SORT (UNIQUE) (Cost=3663 Card=962019 Bytes=12506247)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1436 Card=962019 Bytes=12506247)

1 select /*+ DYNAMIC_SAMPLING (4) */distinct sk_dept_id
2* from dim_product_sku
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2132 Card=320662 Bytes=4168606)
1 0 SORT (UNIQUE) (Cost=2132 Card=320662 Bytes=4168606)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1398 Card=320662 Bytes=4168606)

Again showing improvement in the cardinality estimate on dim_product_sku but nothing after the sort.

1 select /*+ DYNAMIC_SAMPLING (0) */distinct sk_dept_id
2 from dim_product_sku
3* where div_id = '001'
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1699 Card=9620 Bytes=211640)
1 0 SORT (UNIQUE) (Cost=1699 Card=9620 Bytes=211640)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1696 Card=9620 Bytes=211640)

1 select /*+ DYNAMIC_SAMPLING (4) */distinct sk_dept_id
2 from dim_product_sku
3* where div_id = '001'
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2137 Card=213308 Bytes=4692776)
1 0 SORT (UNIQUE) (Cost=2137 Card=213308 Bytes=4692776)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1485 Card=213308 Bytes=4692776)

With a predicate, dynamic sampling again shows improvement on the scan but not on the sort.

1 select /*+ DYNAMIC_SAMPLING (0) */
2 sk_dept_id,
3 count(*)
4 from dim_product_sku
5 where div_id = '001'
6* group by sk_dept_id
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1699 Card=9620 Bytes=211640)
1 0 SORT (GROUP BY) (Cost=1699 Card=9620 Bytes=211640)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1696 Card=9620 Bytes=211640)

1 select /*+ DYNAMIC_SAMPLING (4) */
2 sk_dept_id,
3 count(*)
4 from dim_product_sku
5 where div_id = '001'
6* group by sk_dept_id
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1563 Card=213308 Bytes=4692776)
1 0 SORT (GROUP BY) (Cost=1563 Card=213308 Bytes=4692776)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1485 Card=213308 Bytes=4692776)

And finally, the GROUP BY shows the same behaviour.
I see exactly what you mean with Jonathan's example, though mine is the other way round (how many were born on Christmas day (1/365) combined with how many were born in December (1/12) – how can I get the answer 1/365?).

Any more ideas, or feedback on original questions 1 and 3?

Thanks!



Tom Kyte
October 23, 2005 - 4:37 am UTC

you would want something more like level 3 or 4:

Level 3: Same as level 2, but also include tables for which standard selectivity estimation used a guess for some predicate. Use default sampling amounts.

Level 4: Same as level 3, but also include tables that have single-table predicates referencing two or more columns. Use default sampling amounts.
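As an aside, the DYNAMIC_SAMPLING hint can also target a single table rather than the cursor as a whole; a sketch against the tables in this thread (alias-level syntax, level chosen for illustration):

```sql
-- Sample only the dimension table at level 4, leaving the fact
-- table's estimates to its existing statistics (syntax sketch).
SELECT /*+ DYNAMIC_SAMPLING(sku 4) */
       sku.group_id, sku.group_descr, SUM(md_qty)
  FROM ftt_md_sku_loc_trans ftt,
       dim_product_sku sku
 WHERE ftt.sk_trans_cal_dt > 2699
   AND ftt.sk_sku_id = sku.sk_sku_id
 GROUP BY sku.group_id, sku.group_descr;
```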


Still no Joy I'm afraid

Graeme, October 24, 2005 - 5:36 am UTC

Do weekends and nights mean nothing to you? ;-) Thanks for the quick response.

As mentioned above, I used up to level 8 without any significant difference. Dynamic sampling seems great at using the predicates to calculate the number of rows it is going to have to tackle, but not so useful on the number of rows returned by a DISTINCT or GROUP BY.

For example, if we put stats back on this dimension table:

DWH@dwhtst > begin dbms_stats.GATHER_TABLE_STATS
2 ('dwh',
3 'DIM_PRODUCT_SKU',
4 estimate_percent => 40,
5 method_opt => 'for all columns size auto',
6 cascade => true,
7 degree => 4);
8 end;
9 /

PL/SQL procedure successfully completed.

And then look at the plan for predicates on a single column and on two columns when we have stats. Note I have chosen a rather skewed value: though we have 60 departments, dept 115 has almost 10% of the rows. We can see the actual cardinality is 30222 rows for this department:

BOQTST@dwhtst > select dept_id, sk_dept_id, group_id, div_id, count(*)
2 from dim_product_sku
3 where dept_id = '115'
4 group by dept_id, sk_dept_id, group_id, div_id
5 /

DEPT_ID SK_DEPT_ID GROUP_ID DIV_ID COUNT(*)
--------------- ---------- --------------- --------------- ----------
115 193093 001 001 30222


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1481 Card=30413 Bytes=517021)
1 0 SORT (GROUP BY) (Cost=1481 Card=30413 Bytes=517021)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1470 Card=30413 Bytes=517021)

That’s pretty good on the predicate, but the cardinality after the group by is way out, as the optimizer is assuming total independence between these columns. If we use two columns in the where clause, then the cardinality from the predicate goes haywire for the same reason.

1 select count(*)
2 from dim_product_sku
3* where dept_id = '115' and sk_dept_id = 193093
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=2660 Bytes=23940)

If we use dynamic sampling level 4 this improves matters; however, the skew of the data probably means we need to sample more:

1 select /*+ DYNAMIC_SAMPLING (4) */
2 count(*)
3 from dim_product_sku
4* where dept_id = '115' and sk_dept_id = 193093
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=14072 Bytes=126648)

Interestingly, trying level 6 isn’t a major improvement:

1 select /*+ DYNAMIC_SAMPLING (6) */
2 count(*)
3 from dim_product_sku
4* where dept_id = '115' and sk_dept_id = 193093
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=17800 Bytes=160200)

But level 8 is:

1 select /*+ DYNAMIC_SAMPLING (8) */
2 count(*)
3 from dim_product_sku
4* where dept_id = '115' and sk_dept_id = 193093
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=1 Bytes=9)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=32952 Bytes=296568)

If we incorporate a single column in the group by, then dynamic sampling seems just to fall back on the histogram to work out the cardinality after the group by. This is despite the column we are grouping by being one of those in the predicate.

1 select /*+ DYNAMIC_SAMPLING (8) */
2 dept_id, count(*)
3 from dim_product_sku
4 where dept_id = '115' and sk_dept_id = 193093
5* group by dept_id
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=60 Bytes=540)
1 0 SORT (GROUP BY NOSORT) (Cost=1469 Card=60 Bytes=540)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=32952 Bytes=296568)

A second column in the group by just makes it worse:

1 select /*+ DYNAMIC_SAMPLING (8) */
2 dept_id, sk_dept_id, count(*)
3 from dim_product_sku
4 where dept_id = '115' and sk_dept_id = 193093
5* group by dept_id, sk_dept_id
BOQTST@dwhtst > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1469 Card=32952 Bytes=296568)
1 0 SORT (GROUP BY NOSORT) (Cost=1469 Card=32952 Bytes=296568)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1469 Card=32952 Bytes=296568)


If we drop the stats, including the histograms, to see what happens using dynamic sampling alone, we see no improvement:

DWH@dwhtst > BEGIN
2 SYS.DBMS_STATS.DELETE_TABLE_STATS (
3 OwnName => 'DWH'
4 ,TabName => 'DIM_PRODUCT_SKU'
5 ,Cascade_Columns => TRUE
6 ,Cascade_Indexes => TRUE
7 ,Cascade_Parts => TRUE
8 ,No_Invalidate => FALSE);
9 END;
10 /

PL/SQL procedure successfully completed.

BOQTST@dwhtst > select /*+ DYNAMIC_SAMPLING (8) */
2 dept_id, count(*)
3 from dim_product_sku
4 where dept_id = '115' and sk_dept_id = 193093
5 group by dept_id
6 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1462 Card=30819 Bytes=678018)
1 0 SORT (GROUP BY NOSORT) (Cost=1462 Card=30819 Bytes=678018)
2 1 TABLE ACCESS (FULL) OF 'DIM_PRODUCT_SKU' (TABLE) (Cost=1462 Card=30819 Bytes=678018)

So in conclusion: while dynamic sampling seems great at sorting out the cardinality when the predicate refers to more than one column (i.e. much better than histograms), there is no improvement in estimating cardinality after a sort. Therefore it is no help with my materialized view joining back to a denormalised product dimension.

Maybe, going forward, the optimizer could use the information in the dimension to determine the cardinality at each level, as well as using it to determine that it can rewrite. It is a shame that it seems to disregard the very information that allows the rewrite when it works out the plan for the rewritten query.

In the meantime, unless you have any other ideas, it would seem the best thing for me to do is to have a product dimension table at each level I summarise to in the materialized views, denormalised for all the levels above it. So in this example I would have a dim_product_dept, with a primary key of sk_dept_id, denormalised to hold all the group and division details.
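That per-level dimension table might be sketched as follows, using the column names that appear in this thread (illustrative only; in practice the table would be maintained as part of the warehouse load rather than created like this):

```sql
-- Hypothetical department-level dimension, denormalised with the
-- levels above it (group, division). One row per sk_dept_id, since
-- those columns are determined by the department in the hierarchy.
create table dim_product_dept as
select distinct sk_dept_id, dept_id,
       group_id, group_descr,
       div_id,   div_descr
  from dim_product_sku;

alter table dim_product_dept
  add constraint dim_product_dept_pk primary key (sk_dept_id);
```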


not getting rewrite when using sum without group by

Graeme, February 16, 2006 - 8:39 am UTC

Hi Tom, 

I have a problem with rewriting to materialized views when using a SUM without a GROUP BY. Using the definitions of the tables, materialized views and dimensions in your original example above, I fail to get a rewrite to the materialized view when I want the sum of the sales for some given quarters:

SQL> set autotrace on
SQL> col qtr_yyyy form a10

SQL> select  sum(sales.sales_amount) as amt
  2     from sales, time_hierarchy
  3   where sales.trans_date = time_hierarchy.day
  4   AND   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
  5  /

       AMT
----------
3.5161E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=499 Card=1 Bytes=33)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=499 Card=264216 Bytes=8719128)
   3    2       TABLE ACCESS (FULL) OF 'TIME_HIERARCHY' (TABLE) (Cost=4 Card=146 Bytes=2628)
   4    2       TABLE ACCESS (FULL) OF 'SALES' (TABLE) (Cost=455 Card=660539 Bytes=9908085)




Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1047  consistent gets

In this case it would seem the dimension is not telling the optimizer that the columns in the where clause are above month in the hierarchy, which would allow the rewrite to occur.

Using the dbms_mview.explain_rewrite procedure I get the following error:

QSM-01050: roll-up check failed for materialized view, SALES_MV

If I use a group by clause, then the materialized view is used:


SQL> select time_hierarchy.qtr_yyyy, sum(sales.sales_amount) as amt 
  2     from sales, time_hierarchy 
  3   where sales.trans_date = time_hierarchy.day 
  4   AND   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006') 
  5   group by time_hierarchy.qtr_yyyy 
  6  /

QTR_YYYY          AMT
---------- ----------
Q1 FY2006  1.7497E+12
Q2 FY2006  1.7663E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=11 Card=2 Bytes=82)
   1    0   SORT (GROUP BY) (Cost=11 Card=2 Bytes=82)
   2    1     HASH JOIN (Cost=10 Card=1827 Bytes=74907)
   3    2       VIEW (Cost=5 Card=19 Bytes=285)
   4    3         SORT (UNIQUE) (Cost=5 Card=19 Bytes=285)
   5    4           TABLE ACCESS (FULL) OF 'TIME_HIERARCHY' (TABLE) (Cost=4 Card=146 Bytes=2190)
   6    2       MAT_VIEW REWRITE ACCESS (FULL) OF 'SALES_MV' (MAT_VIEW REWRITE) (Cost=4 Card=1250 Bytes=32500)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets



Of course I can get my single row using an inline view, but this is far from obvious to the individual developer, and getting a tool like Business Objects to do the same thing while supporting other sales queries is extremely difficult:

SQL> select sum(amt) from
  2  (
  3  select time_hierarchy.qtr_yyyy, sum(sales.sales_amount) as amt
  4     from sales, time_hierarchy
  5   where sales.trans_date = time_hierarchy.day
  6   AND   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
  7   group by time_hierarchy.qtr_yyyy
  8  )
  9  /

  SUM(AMT)
----------
3.5161E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=11 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     VIEW (Cost=11 Card=2 Bytes=26)
   3    2       SORT (GROUP BY) (Cost=11 Card=2 Bytes=82)
   4    3         HASH JOIN (Cost=10 Card=1827 Bytes=74907)
   5    4           VIEW (Cost=5 Card=19 Bytes=285)
   6    5             SORT (UNIQUE) (Cost=5 Card=19 Bytes=285)
   7    6               TABLE ACCESS (FULL) OF 'TIME_HIERARCHY' (TABLE) (Cost=4 Card=146 Bytes=2190)
   8    4           MAT_VIEW REWRITE ACCESS (FULL) OF 'SALES_MV' (MAT_VIEW REWRITE) (Cost=4 Card=1250 Bytes=32500)

Statistics 
---------------------------------------------------------- 
         10  recursive calls 
          0  db block gets 
         23  consistent gets


Would you expect the SUM to work without the GROUP BY, or am I just expecting too much? I have only tested this on 10.1.0.4 and don't know if the behaviour is consistent across versions.

I can get the query to rewrite if I include qtr_yyyy in the materialized view definition, but I want to find some other method, as this would mean including every column that might be used to restrict a sum, massively increasing the size of my materialized view. Any other suggestions on how I could get my original query to rewrite?

Thanks for your help
 

Tom Kyte
February 16, 2006 - 12:05 pm UTC

no example.... (no full example).

I will ask you to use dbms_mview.explain_rewrite and explain_mview first to see what they say (flying through these followups/reviews very quickly today). explain_rewrite might be very useful.

Not sure what else I should have provided :-(

Graeme, February 17, 2006 - 4:32 am UTC

Hi Tom, 

I'm not sure what else I should have provided. I thought that using the tables, materialized views and dimensions of YOUR example would help you to tell me whether what I am regarding as an issue is what you would expect in any case. I followed exactly the code in your example to create my structures.

As stated above, I used the explain_rewrite procedure and got "QSM-01050: roll-up check failed for materialized view, SALES_MV". The full output from the rewrite table:

SQL > select sequence, message from rewrite_table order by sequence
  2  /

  SEQUENCE
----------
MESSAGE
---------------------------------------------------------------------------------------
---------------------------------------------
         1
QSM-01050: roll-up check failed for materialized view, SALES_MV

         2
QSM-01102: materialized view, SALES_MV, requires join back to table, TIME_HIERARCHY, on
 column, QTR_YYYY


results of explain_mview:

SQL> select mvname, capability_name, possible, msgtxt, seq
  2  from mv_capabilities_table
  3  /

MVNAME   CAPABILITY_NAME                P
-------- ------------------------------ -
MSGTXT                                                                                                 SEQ
----------------------------------------------------------------------------------------------- ----------
SALES_MV PCT                            N
                                                                                                         1

SALES_MV REFRESH_COMPLETE               Y
                                                                                                      1002

SALES_MV REFRESH_FAST                   N
                                                                                                      2003

SALES_MV REWRITE                        Y
                                                                                                      3004

SALES_MV PCT_TABLE                      N
relation is not a partitioned table                                                                   4005

SALES_MV PCT_TABLE                      N
relation is not a partitioned table                                                                   4006

SALES_MV REFRESH_FAST_AFTER_INSERT      N
the detail table does not have a materialized view log                                                5007

SALES_MV REFRESH_FAST_AFTER_INSERT      N
the detail table does not have a materialized view log                                                5008

SALES_MV REFRESH_FAST_AFTER_ONETAB_DML  N
SUM(expr) without COUNT(expr)                                                                         6009

SALES_MV REFRESH_FAST_AFTER_ONETAB_DML  N
see the reason why REFRESH_FAST_AFTER_INSERT is disabled                                              6010

SALES_MV REFRESH_FAST_AFTER_ONETAB_DML  N
COUNT(*) is not present in the select list                                                            6011

SALES_MV REFRESH_FAST_AFTER_ONETAB_DML  N
SUM(expr) without COUNT(expr)                                                                         6012

SALES_MV REFRESH_FAST_AFTER_ANY_DML     N
see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled                                          7013

SALES_MV REFRESH_FAST_PCT               N
PCT is not possible on any of the detail tables in the materialized view                              8014

SALES_MV REWRITE_FULL_TEXT_MATCH        Y
                                                                                                      9015

SALES_MV REWRITE_PARTIAL_TEXT_MATCH     Y
                                                                                                     10016

SALES_MV REWRITE_GENERAL                Y
                                                                                                     11017

SALES_MV REWRITE_PCT                    N
general rewrite is not possible or PCT is not possible on any of the detail tables                   12018

SALES_MV PCT_TABLE_REWRITE              N
relation is not a partitioned table                                                                  13019

SALES_MV PCT_TABLE_REWRITE              N
relation is not a partitioned table                                                                  13020


20 rows selected.


Thanks for your help
 

Anything more I can add?

Graeme, February 22, 2006 - 5:33 am UTC

Hi Tom,

Is there any more information I can give you to help with my query?

Thanks for your help

Tom Kyte
February 22, 2006 - 8:43 am UTC

it is just that "my example was large" and I don't know what bits and bytes you used from it; I don't have time to rip it all apart - I like self-contained test cases - you know

create this
create that
do this
observe that.

I have more than one version of sales_mv, didn't know which to use.



[tkyte@dellpe ~]$ oerr qsm 1050
01050, 00000, "roll-up check failed for materialized view, %s"
// *Cause: A column in the query cannot be computed from the materialized view.
// *Action: Consider creating a dimension to supply the missing information
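For reference, a dimension over the time_hierarchy table from the original example would look roughly like this (a sketch: the dimension and hierarchy names, and the attribute clause, are assumptions based on the column list, and may differ from the version actually created):

```sql
-- Sketch: declare the day -> month -> quarter -> year rollup so the
-- optimizer can rewrite quarter/year queries against a monthly MV.
create dimension time_hierarchy_dim
  level day      is time_hierarchy.day
  level mmyyyy   is time_hierarchy.mmyyyy
  level qtr_yyyy is time_hierarchy.qtr_yyyy
  level yyyy     is time_hierarchy.yyyy
  hierarchy time_rollup
  ( day child of mmyyyy child of qtr_yyyy child of yyyy )
  attribute mmyyyy determines mon_yyyy;
```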


if you give me a small self contained example, with as little extras as possible, I can take a look.

From the top then

Graeme Hobbs, February 23, 2006 - 9:43 am UTC

Hi Tom, 

I understand your point; I was just trying to help by making the question shorter and using your examples. I hadn't noticed the change to sales_mv, apologies for that. This time I'll take it from the top!

CREATE TABLES AND MATERIALIZED VIEWS AND DIMENSIONS:

Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

SQL>    create table sales
  2  (trans_date date, cust_id int, sales_amount number );

Table created.

SQL> insert /*+ APPEND */ into sales
  2  select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
  3            mod(rownum,100) CUST_ID,
  4            abs(dbms_random.random)/100 SALES_AMOUNT
  5      from all_objects
  6  /

41387 rows created.
SQL> commit;

Commit complete.

SQL> begin
  2        for i in 1 .. 4
  3       loop
  4            insert /*+ APPEND */ into sales
  5            select trans_date, cust_id, abs(dbms_random.random)/100
  6              from sales;
  7            commit;
  8        end loop;
  9   end;
 10  /

PL/SQL procedure successfully completed.

SQL> select count(*) from sales;

  COUNT(*)
----------
    662192

SQL> create table time_hierarchy
  2    (day, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
  3    as
  4    select distinct
  5       trans_date    DAY,
  6       cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
  7       to_char(trans_date,'mon-yyyy') MON_YYYY,
  8       'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
  9           || to_char(trans_date,'yyyy') QTR_YYYY,
 10       cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
 11      from sales
 12  /

Table created.

SQL> exec dbms_stats.gather_table_stats( user, 'SALES', cascade=>true);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats( user, 'TIME_HIERARCHY',cascade=>true);

PL/SQL procedure successfully completed.

SQL> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select sales.cust_id, sum(sales.sales_amount) sales_amount,
  7         time_hierarchy.mmyyyy
  8  from sales, time_hierarchy
  9  where sales.trans_date = time_hierarchy.day
 10  group by sales.cust_id, time_hierarchy.mmyyyy
 11  /

Materialized view created.

SQL> exec dbms_stats.gather_table_stats( user, 'SALES_MV', cascade=>true);

PL/SQL procedure successfully completed.


So we can do a query with a group by that rewrites to the materialized view without issues:


SQL> set autotrace on
SQL> select time_hierarchy.qtr_yyyy, sum(sales.sales_amount) as amt
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  and   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
  5  group by time_hierarchy.qtr_yyyy
  6  /

QTR_YYYY                                                AMT
------------------------------------------------ ----------
Q1 FY2006                                        1.7486E+12
Q2 FY2006                                        1.7721E+12

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=11 Card=2 Bytes=56)
   1    0   SORT (GROUP BY) (Cost=11 Card=2 Bytes=56)
   2    1     HASH JOIN (Cost=10 Card=1827 Bytes=51156)
   3    2       VIEW (Cost=5 Card=19 Bytes=285)
   4    3         SORT (UNIQUE) (Cost=5 Card=19 Bytes=285)
   5    4           TABLE ACCESS (FULL) OF 'TIME_HIERARCHY' (TABLE) (Cost=4 Card=146 Bytes=2190)
   6    2       MAT_VIEW REWRITE ACCESS (FULL) OF 'SALES_MV' (MAT_VIEW REWRITE) (Cost=4 Card=1250 Bytes=16250)

However, if I just want the combined sum for these 2 quarters and drop the group by, then it stops using the materialized view. This was completely unexpected (to me; maybe I am expecting too much?), as the data can be completely derived from the materialized view.

  1  select  sum(sales.sales_amount) as amt
  2          from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4* and   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
SQL> /

       AMT
----------
3.5207E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=500 Card=1 Bytes=33)
   1    0   SORT (AGGREGATE)
   2    1     HASH JOIN (Cost=500 Card=264554 Bytes=8730282)
   3    2       TABLE ACCESS (FULL) OF 'TIME_HIERARCHY' (TABLE) (Cost=4 Card=146 Bytes=2628)
   4    2       TABLE ACCESS (FULL) OF 'SALES' (TABLE) (Cost=457 Card=661384 Bytes=9920760)

As mentioned previously, putting the query through dbms_mview.explain_rewrite produced the QSM-01050 error, but the recommended action (create a dimension) has already been done.

Of course I can get my single row using an inline view, but this is far from obvious to the individual developer, and getting a tool like Business Objects to do the same thing while supporting other sales queries is extremely difficult:

SQL> select sum(amt) from
  2  (
  3  select time_hierarchy.qtr_yyyy, sum(sales.sales_amount) as amt
  4     from sales, time_hierarchy
  5   where sales.trans_date = time_hierarchy.day
  6   AND   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
  7   group by time_hierarchy.qtr_yyyy
  8  )
  9  /

  SUM(AMT)
----------
3.5207E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     VIEW (Cost=5 Card=2 Bytes=26)
   3    2       SORT (GROUP BY) (Cost=5 Card=2 Bytes=36)
   4    3         MAT_VIEW REWRITE ACCESS (FULL) OF 'SALES_MV' (MAT_VIEW REWRITE) (Cost=4 Card=500 Bytes=9000)


I can get the original query to rewrite if I include qtr_yyyy in the materialized view, but I think this negates the whole point of having the dimension: in theory I'd have to put the whole hierarchy into the materialized view, massively increasing its size. Any other suggestions on how I could get my original query to rewrite?


SQL> drop materialized view sales_mv;

Materialized view dropped.

SQL> create materialized view sales_mv
  2  build immediate
  3    refresh on demand
  4    enable query rewrite
  5    as
  6    select sales.cust_id, sum(sales.sales_amount) sales_amount,
  7           time_hierarchy.qtr_yyyy, time_hierarchy.mmyyyy
  8      from sales, time_hierarchy
  9     where sales.trans_date = time_hierarchy.day
 10     group by sales.cust_id, time_hierarchy.qtr_yyyy, time_hierarchy.mmyyyy
 11  /

Materialized view created.

SQL> exec dbms_stats.gather_table_stats( user, 'SALES_MV', cascade=>true);

PL/SQL procedure successfully completed.

SQL> select  sum(sales.sales_amount) as amt
  2          from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  and   time_hierarchy.qtr_yyyy  IN ( 'Q1 FY2006', 'Q2 FY2006')
  5  /

       AMT
----------
3.5207E+12


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=1 Bytes=18)
   1    0   SORT (AGGREGATE)
   2    1     MAT_VIEW REWRITE ACCESS (FULL) OF 'SALES_MV' (MAT_VIEW REWRITE) (Cost=4 Card=500 Bytes=9000)


Thanks for your help, 
 

Tom Kyte
February 23, 2006 - 10:24 am UTC

Well, it is still missing bits (the dimensions and all), but I read through it all this time.

I believe you are encountering this documented restriction:

...
If a column or expression is present in the GROUP BY clause of the materialized view, it must also be present in the SELECT list.
......


I know both you and I can see that "it obviously can be done, there is no GROUP BY at all", but the software does not yet see it that way: because your query has no GROUP BY, there is no way for the column to appear in the SELECT list, so the restriction applies.
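For completeness, a minimal sketch of the workaround implied here, assuming the SALES and TIME_HIERARCHY tables from earlier in this thread: give the query a GROUP BY so the grouping column can legally appear in the SELECT list, and the restriction no longer applies.

```sql
-- Grouping by qtr_yyyy (and selecting it) satisfies the documented
-- restriction; the query then returns one row per quarter instead of
-- a single total, which the caller can sum, as the inline-view
-- version earlier in this thread does.
SELECT time_hierarchy.qtr_yyyy,
       SUM(sales.sales_amount) AS amt
  FROM sales, time_hierarchy
 WHERE sales.trans_date = time_hierarchy.day
   AND time_hierarchy.qtr_yyyy IN ('Q1 FY2006', 'Q2 FY2006')
 GROUP BY time_hierarchy.qtr_yyyy;
```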


One more question!

Graeme, March 08, 2006 - 12:47 pm UTC

Hi Tom, sorry to bother you further, but when a query can be rewritten to more than one materialized view how does the optimizer determine which MV to use?

I would have guessed it would compare the costs of the original query and all the possible rewritten queries and pick the lowest cost.

However, I have a situation where a query sometimes rewrites to one MV and sometimes to the other. Between the change is a massive amount of overnight processing. The cost is always lower for one particular MV (if I force rewrite to it using the REWRITE(mvname) hint), but it is not always this one that is selected.

If I use the explain_rewrite procedure I get a
QSM-01022: a more optimal materialized view than .... was used to rewrite.

A 10053 trace seems to suggest that only one of the MVs is costed.

This probably means my guess is wrong! Putting a test case together will be difficult, but it may be possible if I get a bit further down the line. To help, can you tell me what I should be looking for that might affect which MV is used (and therefore what may be changing to influence the flip from one to the other)? Hopefully I will then be able to narrow down what is happening.

Thanks


Tom Kyte
March 09, 2006 - 12:34 pm UTC

it costs them out.


Using hints changes the costs, so once you add a hint, all bets are off.


...
Between the change is a massive amount of overnight processing.
.....

and so, what happens to the statistics on the base objects?
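One way to act on that hint, sketched here with the object names used earlier in the thread (substitute your own): compare the optimizer statistics before and after the overnight run.

```sql
-- If NUM_ROWS or LAST_ANALYZED changes for the base tables or the
-- MVs between the two runs, the cost comparison the optimizer makes
-- changes too, which could explain the flip from one MV to the other.
SELECT table_name, num_rows, blocks, last_analyzed
  FROM user_tab_statistics
 WHERE table_name IN ('SALES', 'TIME_HIERARCHY', 'SALES_MV');
```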

You are right, but I think there are exceptions!

Graeme Hobbs, March 10, 2006 - 12:44 pm UTC

Hi Tom, I have managed to investigate further and believe I know what is happening in our production system.

I agree that the optimizer often does cost them out, but I have come across circumstances where I believe it doesn't, and this is what is happening to us at present. I have managed to reproduce this in a test case (which follows, but is fairly long), though only by manually changing the stats - could you check whether what I have done is valid?

My theory is that when more than one MV could be used for rewrite, under certain circumstances the optimizer skips costing one of them because it believes another MV is much more optimal and that there is no point wasting resources costing the non-optimal one. This is the source of the explain_rewrite message QSM-01022:

[rits55] /apps/projects/stg/tst-> oerr qsm 1022
01022, 00000, "a more optimal materialized view than %s was used to rewrite"
// *Cause: Query Rewrite always chooses the most optimal materialized view
// based on certain parameters, such as cardinality of the
// materialized view, number of joins etc. The materialized view
// specified was found to be less optimal.
// *Action: To force using a materialized view, consider using a rewrite hint.

As you can see it is very specific ('certain parameters', 'such as', 'etc') ;-)

Assuming that in my production system the only thing that changes from day to day is the cardinality of the MVs (estimated by dbms_stats), I took the statistics to a test system and found that modifying the row-count statistic by a very small margin could cause the optimizer to flip which materialized view it would rewrite to, despite the fact that the costs of the queries using the respective MVs are more than an order of magnitude apart.


I think I can prove this. I have set up a test case with two MVs. With the data as created and analyzed, the query goes to the lowest-cost MV without any issues. The explain_rewrite results also seem to indicate this (there is no QSM-01022 error).

What I have then done is set the cardinality of the lowest-cost MV, using dbms_stats, to a much higher number (though in my more complex production case an increase of less than 0.05% caused similar behaviour). Using the REWRITE hint, the cost of using that materialized view goes up, but it is still less than the cost of using the other MV or of the non-rewritten query. However, the other MV is then selected and explain_rewrite produces a QSM-01022 error.

If you believe that my example is valid, would it be possible for you to find out what the 'certain parameters' are and how they work?

The example is rather long-winded, but here we go. Please bear with some of the quirks (e.g. sometimes having normalized hierarchy tables and sometimes not) as it has taken some time and twiddling about to reproduce my issue!


Initially create the dimension and fact tables:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options


ITGH@dwhdev > CREATE TABLE dim_loc (
2 CONSTRAINT dim_loc_pk PRIMARY KEY (sk_location_id),
3 sk_location_id NUMBER,
4 location_id NUMBER(12),
5 location_descr VARCHAR2(50),
6 sk_fascia_id NUMBER,
7 fascia_id NUMBER(12),
8 fascia_descr VARCHAR2(50))
9 /

Table created.

ITGH@dwhdev > CREATE TABLE dim_dept
2 ( CONSTRAINT dim_dept_pk PRIMARY KEY (sk_dept_id),
3 sk_dept_id NUMBER,
4 dept_id NUMBER(12),
5 dept_descr VARCHAR2(50))
6 /

Table created.

ITGH@dwhdev > CREATE TABLE dim_sku (
2 CONSTRAINT dim_sku_pk PRIMARY KEY (sk_sku_id),
3 sk_sku_id NUMBER,
4 sku_id NUMBER(12),
5 sku_descr VARCHAR2(50),
6 option_id NUMBER(12),
7 sk_dept_id NUMBER,
8 dept_id NUMBER(12),
9 dept_descr VARCHAR2(50))
10 /


Table created.

ITGH@dwhdev > ALTER TABLE dim_sku ADD ( CONSTRAINT dim_sku_dept_fk FOREIGN KEY (sk_dept_id)
2 REFERENCES dim_dept (sk_dept_id));

Table altered.

ITGH@dwhdev > CREATE TABLE ftt_sal (
2 sk_sku_id NUMBER,
3 sk_location_id NUMBER,
4 sal_qty NUMBER)
5 /

Table created.

ITGH@dwhdev > ALTER TABLE ftt_sal ADD ( CONSTRAINT ftt_sal_sku_FK FOREIGN KEY (sk_sku_id)
2 REFERENCES dim_sku (sk_sku_id));

Table altered.

ITGH@dwhdev > ALTER TABLE ftt_sal ADD ( CONSTRAINT ftt_sal_loc_FK FOREIGN KEY (sk_location_id)
2 REFERENCES dim_loc (sk_location_id));

Table altered.


Populate the tables: essentially there is an organisation dimension of 200 locations within 2 fascias, and a product dimension of 40,000 SKUs within 5,000 options within 60 departments. I have left off the time dimension to try to keep things simpler!

There is then a fact table at the SKU/location level that can have duplicates within a SKU/location.

1 insert into dim_loc
2 select rownum, rownum, 'location ' || rownum,
3 mod(rownum,2), mod(rownum,2), 'fascia '|| mod(rownum,2)
4 from all_objects
5* where rownum < 201
ITGH@dwhdev > /

200 rows created.

1 insert into dim_dept
2 select rownum, rownum, 'dept ' || rownum
3 from all_objects
4* where rownum < 61
ITGH@dwhdev > /

60 rows created.

1 insert into dim_sku
2 select rownum, rownum, 'sku ' || rownum, mod(rownum,5000) AS option_id,
3 mod(rownum,60)+1, mod(rownum,60)+1, 'dept '|| (mod(rownum,60)+1)
4* from all_objects
ITGH@dwhdev > /

41360 rows created.

1 insert into ftt_sal
2 select trunc(1 + mod(sqrt(rownum) * sk_location_id, 10000)),
3 trunc(1 + mod(rownum * sqrt(sk_sku_id), 200)),
4 trunc( mod(sqrt(sk_location_id) * sqrt(sk_sku_id), 100))
5* from dim_sku, (select * from dim_loc where rownum < 50)
ITGH@dwhdev > /

41360 rows created.

Gather stats:

ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'dim_sku');

PL/SQL procedure successfully completed.

ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'dim_dept');

PL/SQL procedure successfully completed.

ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'dim_loc');

PL/SQL procedure successfully completed.
ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'ftt_sal');

PL/SQL procedure successfully completed.


Create the MVs: one at dept/location, the other at option/fascia.

ITGH@dwhdev > CREATE MATERIALIZED VIEW MVW_SAL_DEPT_LOC
2 ENABLE QUERY REWRITE
3 AS
4 SELECT
5 dept.sk_dept_id AS sk_dept_id,
6 ftt.sk_location_id AS sk_location_id,
7 SUM(ftt.sal_qty) AS sal_qty
8 FROM dim_dept dept,
9 dim_sku prd,
10 ftt_sal ftt
11 WHERE ftt.sk_sku_id = prd.sk_sku_id
12 AND prd.sk_dept_id = dept.sk_dept_id
13 GROUP BY
14 dept.sk_dept_id,
15 ftt.sk_location_id;
ITGH@dwhdev > /

Materialized view created.

1 CREATE MATERIALIZED VIEW MVW_SAL_OPTION_FASCIA
2 ENABLE QUERY REWRITE
3 AS
4 SELECT
5 prd.option_id AS option_id,
6 org.sk_fascia_id AS sk_fascia_id,
7 SUM(ftt.sal_qty) AS sal_qty
8 FROM dim_loc org,
9 dim_sku prd,
10 ftt_sal ftt
11 WHERE ftt.sk_sku_id = prd.sk_sku_id
12 AND ftt.sk_location_id = org.sk_location_id
13 GROUP BY
14 prd.option_id,
15* org.sk_fascia_id
ITGH@dwhdev > /

Materialized view created.


Count the rows and get stats


ITGH@dwhdev > select count(*) from MVW_SAL_dept_loc
2 /

COUNT(*)
----------
12000

ITGH@dwhdev > select count(*) from MVW_SAL_OPTION_FASCIA;

COUNT(*)
----------
10000

ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'MVW_SAL_OPTION_FASCIA');

PL/SQL procedure successfully completed.

ITGH@dwhdev > exec dbms_stats.gather_table_stats ('itgh', 'MVW_SAL_DEPT_LOC');

PL/SQL procedure successfully completed.


Create dimensions to allow rewrite up the hierarchy

ITGH@dwhdev > CREATE DIMENSION org
2 LEVEL sk_location_id IS (dim_loc.sk_location_id)
3 LEVEL sk_fascia_id IS (dim_loc.sk_fascia_id)
4 HIERARCHY org_hier (
5 sk_location_id CHILD OF
6 sk_fascia_id
7 )
8 ATTRIBUTE sk_location_id DETERMINES location_id
9 ATTRIBUTE sk_location_id DETERMINES location_descr
10 ATTRIBUTE sk_fascia_id DETERMINES fascia_id
11 ATTRIBUTE sk_fascia_id DETERMINES fascia_descr
12 /

Dimension created.

1 CREATE DIMENSION prd
2 LEVEL sk_sku_id IS (dim_sku.sk_sku_id)
3 LEVEL option_id IS (dim_sku.option_id)
4 LEVEL sk_dept_id IS (dim_sku.sk_dept_id)
5 HIERARCHY product_sku_hier (
6 sk_sku_id CHILD OF
7 option_id CHILD OF
8 sk_dept_id
9 )
10 ATTRIBUTE sk_sku_id DETERMINES sku_id
11 ATTRIBUTE sk_sku_id DETERMINES sku_descr
12 ATTRIBUTE sk_dept_id DETERMINES dept_id
13* ATTRIBUTE sk_dept_id DETERMINES dept_descr
ITGH@dwhdev > /

Dimension created.

If we run the query without any hints it rewrites to the dept_loc MV. (If we use /*+ REWRITE(MVW_SAL_DEPT_LOC) */ the plan is identical.)

ITGH@dwhdev > set autotrace traceonly explain
ITGH@dwhdev > SELECT
2 dept.dept_descr,
3 SUM(ftt.sal_qty) AS sal_qty
4 FROM dim_dept dept,
5 dim_sku prd,
6 dim_loc org,
7 ftt_sal ftt
8 WHERE ftt.sk_sku_id = prd.sk_sku_id
9 AND prd.sk_dept_id = dept.sk_dept_id
10 AND ftt.sk_location_id = org.sk_location_id
11 AND org.fascia_id = 1
12 GROUP BY
13 dept.dept_descr
14 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=44 Card=60 Bytes=1800)
1 0 SORT (GROUP BY) (Cost=44 Card=60 Bytes=1800)
2 1 HASH JOIN (Cost=41 Card=6000 Bytes=180000)
3 2 TABLE ACCESS (FULL) OF 'DIM_DEPT' (TABLE) (Cost=25 Card=60 Bytes=660)
4 2 MERGE JOIN (Cost=15 Card=6000 Bytes=114000)
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'DIM_LOC' (TABLE) (Cost=2 Card=100 Bytes=700)
6 5 INDEX (FULL SCAN) OF 'DIM_LOC_PK' (INDEX (UNIQUE)) (Cost=1 Card=200)
7 4 SORT (JOIN) (Cost=13 Card=12000 Bytes=144000)
8 7 MAT_VIEW REWRITE ACCESS (FULL) OF 'MVW_SAL_DEPT_LOC' (MAT_VIEW REWRITE) (Cost=8 Card=12000 Bytes=144000)


If we use the /*+ NOREWRITE */ hint we can see why it has gone to the MV:

1 SELECT /*+NOREWRITE */
2 dept.dept_descr,
3 SUM(ftt.sal_qty) AS sal_qty
4 FROM dim_dept dept,
5 dim_sku prd,
6 dim_loc org,
7 ftt_sal ftt
8 WHERE ftt.sk_sku_id = prd.sk_sku_id
9 AND prd.sk_dept_id = dept.sk_dept_id
10 AND ftt.sk_location_id = org.sk_location_id
11 AND org.fascia_id = 1
12 GROUP BY
13* dept.dept_descr
ITGH@dwhdev > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1881 Card=60 Bytes=2160)
1 0 SORT (GROUP BY) (Cost=1881 Card=60 Bytes=2160)
2 1 HASH JOIN (Cost=1338 Card=1026282 Bytes=36946152)
3 2 TABLE ACCESS (FULL) OF 'DIM_LOC' (TABLE) (Cost=25 Card=100 Bytes=700)
4 2 HASH JOIN (Cost=1191 Card=2062827 Bytes=59821983)
5 4 MERGE JOIN (Cost=71 Card=41360 Bytes=785840)
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'DIM_DEPT' (TABLE) (Cost=2 Card=60 Bytes=660)
7 6 INDEX (FULL SCAN) OF 'DIM_DEPT_PK' (INDEX (UNIQUE)) (Cost=1 Card=60)
8 5 SORT (JOIN) (Cost=70 Card=41360 Bytes=330880)
9 8 TABLE ACCESS (FULL) OF 'DIM_SKU' (TABLE) (Cost=52 Card=41360 Bytes=330880)
10 4 TABLE ACCESS (FULL) OF 'FTT_SAL' (TABLE) (Cost=996 Card=2034299 Bytes=20342990)

And if we force it to rewrite to the option_fascia MV, we can see why that one has not been selected:

1 SELECT /*+REWRITE(MVW_SAL_OPTION_FASCIA) */
2 dept.dept_descr,
3 SUM(ftt.sal_qty) AS sal_qty
4 FROM dim_dept dept,
5 dim_sku prd,
6 dim_loc org,
7 ftt_sal ftt
8 WHERE ftt.sk_sku_id = prd.sk_sku_id
9 AND prd.sk_dept_id = dept.sk_dept_id
10 AND ftt.sk_location_id = org.sk_location_id
11 AND org.fascia_id = 1
12 GROUP BY
13* dept.dept_descr
ITGH@dwhdev > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=169 Card=60 Bytes=2760)
1 0 SORT (GROUP BY) (Cost=169 Card=60 Bytes=2760)
2 1 HASH JOIN (Cost=133 Card=82720 Bytes=3805120)
3 2 VIEW (Cost=26 Card=2 Bytes=32)
4 3 SORT (UNIQUE) (Cost=26 Card=2 Bytes=12)
5 4 TABLE ACCESS (FULL) OF 'DIM_LOC' (TABLE) (Cost=25 Card=100 Bytes=600)
6 2 HASH JOIN (Cost=101 Card=82720 Bytes=2481600)
7 6 MAT_VIEW REWRITE ACCESS (FULL) OF 'MVW_SAL_OPTION_FASCIA' (MAT_VIEW REWRITE) (Cost=8 Card=10000 Bytes=120000)

8 6 MERGE JOIN (Cost=89 Card=41360 Bytes=744480)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'DIM_DEPT' (TABLE) (Cost=2 Card=60 Bytes=660)
10 9 INDEX (FULL SCAN) OF 'DIM_DEPT_PK' (INDEX (UNIQUE)) (Cost=1 Card=60)
11 8 SORT (JOIN) (Cost=87 Card=41360 Bytes=289520)
12 11 VIEW (Cost=70 Card=41360 Bytes=289520)
13 12 SORT (UNIQUE) (Cost=70 Card=41360 Bytes=289520)
14 13 TABLE ACCESS (FULL) OF 'DIM_SKU' (TABLE) (Cost=52 Card=41360 Bytes=289520)


If we then run explain_rewrite and examine the results, it seems both MVs are considered:



1 BEGIN
2 DBMS_MVIEW.EXPLAIN_REWRITE ('
3 SELECT
4 dept.dept_descr,
5 SUM(ftt.sal_qty) AS sal_qty
6 FROM dim_dept dept,
7 dim_sku prd,
8 dim_loc org,
9 ftt_sal ftt
10 WHERE ftt.sk_sku_id = prd.sk_sku_id
11 AND prd.sk_dept_id = dept.sk_dept_id
12 AND ftt.sk_location_id = org.sk_location_id
13 AND org.fascia_id = 1
14 GROUP BY
15 dept.dept_descr
16 ',NULL,'QRY1'
17 );
18* END;
19 /

PL/SQL procedure successfully completed.

ITGH@dwhdev > select sequence, message, mv_in_msg, original_cost, rewritten_cost
2 from rewrite_table where statement_id = 'QRY1';

SEQUENCE
----------
MESSAGE
--------------------------------------------------------------------------------------------------------------
----------------------
MV_IN_MSG ORIGINAL_COST REWRITTEN_COST
------------------------------ ------------- --------------
1
QSM-01033: query rewritten with materialized view, MVW_SAL_DEPT_LOC
MVW_SAL_DEPT_LOC 1881 40

2
QSM-01101: rollup(s) took place on mv, MVW_SAL_DEPT_LOC
MVW_SAL_DEPT_LOC 1881 40

3
QSM-01102: materialized view, MVW_SAL_DEPT_LOC, requires join back to table, DIM_DEPT, on column, DEPT_DESCR
MVW_SAL_DEPT_LOC 1881 40

4
QSM-01101: rollup(s) took place on mv, MVW_SAL_OPTION_FASCIA
MVW_SAL_OPTION_FASCIA 1881 40

5
QSM-01102: materialized view, MVW_SAL_OPTION_FASCIA, requires join back to table, DIM_SKU, on column, SK_DEPT_ID
MVW_SAL_OPTION_FASCIA 1881 40

6
QSM-01102: materialized view, MVW_SAL_OPTION_FASCIA, requires join back to table, DIM_LOC, on column, FASCIA_ID
MVW_SAL_OPTION_FASCIA 1881 40


6 rows selected.

ITGH@dwhdev >


Now here is the dodgy bit: if I increase the cardinality statistic of the dept_loc MV by a factor of five, the behavior completely changes.

ITGH@dwhdev > begin
2 dbms_stats.set_table_stats ('itgh','mvw_sal_dept_loc', null,null, null, 60000);
3 end;
4 /

PL/SQL procedure successfully completed.

1 BEGIN
2 DBMS_MVIEW.EXPLAIN_REWRITE ('
3 SELECT
4 dept.dept_descr,
5 SUM(ftt.sal_qty) AS sal_qty
6 FROM dim_dept dept,
7 dim_sku prd,
8 dim_loc org,
9 ftt_sal ftt
10 WHERE ftt.sk_sku_id = prd.sk_sku_id
11 AND prd.sk_dept_id = dept.sk_dept_id
12 AND ftt.sk_location_id = org.sk_location_id
13 AND org.fascia_id = 1
14 GROUP BY
15 dept.dept_descr
16 ',NULL,'QRY2'
17 );
18* END;
ITGH@dwhdev > /

PL/SQL procedure successfully completed.

ITGH@dwhdev > select sequence, message, mv_in_msg, original_cost, rewritten_cost
2 from rewrite_table where statement_id = 'QRY2'
3 /
SEQUENCE
----------
MESSAGE
--------------------------------------------------------------------------------------------------------------
----------------------
MV_IN_MSG ORIGINAL_COST REWRITTEN_COST
------------------------------ ------------- --------------
1
QSM-01033: query rewritten with materialized view, MVW_SAL_OPTION_FASCIA
MVW_SAL_OPTION_FASCIA 1881 152

2
QSM-01022: a more optimal materialized view than MVW_SAL_DEPT_LOC was used to rewrite
MVW_SAL_DEPT_LOC 1881 152

3
QSM-01101: rollup(s) took place on mv, MVW_SAL_OPTION_FASCIA
MVW_SAL_OPTION_FASCIA 1881 152

4
QSM-01102: materialized view, MVW_SAL_OPTION_FASCIA, requires join back to table, DIM_SKU, on column, SK_DEPT_ID
MVW_SAL_OPTION_FASCIA 1881 152

5
QSM-01102: materialized view, MVW_SAL_OPTION_FASCIA, requires join back to table, DIM_LOC, on column, FASCIA_ID
MVW_SAL_OPTION_FASCIA 1881 152


ITGH@dwhdev >

This time we get the QSM-01022 message, and the query now rewrites to the option_fascia MV:

ITGH@dwhdev > SELECT
2 dept.dept_descr,
3 SUM(ftt.sal_qty) AS sal_qty
4 FROM dim_dept dept,
5 dim_sku prd,
6 dim_loc org,
7 ftt_sal ftt
8 WHERE ftt.sk_sku_id = prd.sk_sku_id
9 AND prd.sk_dept_id = dept.sk_dept_id
10 AND ftt.sk_location_id = org.sk_location_id
11 AND org.fascia_id = 1
12 GROUP BY
13 dept.dept_descr
14 /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=169 Card=60 Bytes=2760)
1 0 SORT (GROUP BY) (Cost=169 Card=60 Bytes=2760)
2 1 HASH JOIN (Cost=133 Card=82720 Bytes=3805120)
3 2 VIEW (Cost=26 Card=2 Bytes=32)
4 3 SORT (UNIQUE) (Cost=26 Card=2 Bytes=12)
5 4 TABLE ACCESS (FULL) OF 'DIM_LOC' (TABLE) (Cost=25 Card=100 Bytes=600)
6 2 HASH JOIN (Cost=101 Card=82720 Bytes=2481600)
7 6 MAT_VIEW REWRITE ACCESS (FULL) OF 'MVW_SAL_OPTION_FASCIA' (MAT_VIEW REWRITE) (Cost=8 Card=10000 Bytes=120000)

8 6 MERGE JOIN (Cost=89 Card=41360 Bytes=744480)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'DIM_DEPT' (TABLE) (Cost=2 Card=60 Bytes=660)
10 9 INDEX (FULL SCAN) OF 'DIM_DEPT_PK' (INDEX (UNIQUE)) (Cost=1 Card=60)
11 8 SORT (JOIN) (Cost=87 Card=41360 Bytes=289520)
12 11 VIEW (Cost=70 Card=41360 Bytes=289520)
13 12 SORT (UNIQUE) (Cost=70 Card=41360 Bytes=289520)
14 13 TABLE ACCESS (FULL) OF 'DIM_SKU' (TABLE) (Cost=52 Card=41360 Bytes=289520)

However, if we force the query to rewrite to the dept_loc MV, the cost is less than half of that of the rewrite to the option_fascia MV:

1 SELECT /*+REWRITE (MVW_SAL_DEPT_LOC) */
2 dept.dept_descr,
3 SUM(ftt.sal_qty) AS sal_qty
4 FROM dim_dept dept,
5 dim_sku prd,
6 dim_loc org,
7 ftt_sal ftt
8 WHERE ftt.sk_sku_id = prd.sk_sku_id
9 AND prd.sk_dept_id = dept.sk_dept_id
10 AND ftt.sk_location_id = org.sk_location_id
11 AND org.fascia_id = 1
12 GROUP BY
13* dept.dept_descr
ITGH@dwhdev > /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=81 Card=60 Bytes=1800)
1 0 SORT (GROUP BY) (Cost=81 Card=60 Bytes=1800)
2 1 HASH JOIN (Cost=69 Card=30000 Bytes=900000)
3 2 TABLE ACCESS (FULL) OF 'DIM_DEPT' (TABLE) (Cost=25 Card=60 Bytes=660)
4 2 HASH JOIN (Cost=41 Card=30000 Bytes=570000)
5 4 TABLE ACCESS (FULL) OF 'DIM_LOC' (TABLE) (Cost=25 Card=100 Bytes=700)
6 4 MAT_VIEW REWRITE ACCESS (FULL) OF 'MVW_SAL_DEPT_LOC' (MAT_VIEW REWRITE) (Cost=12 Card=60000 Bytes=720000)



I am sorry this has gone on so long, but it looks to me as though the dept_loc MV has been discarded WITHOUT being costed, and this is exactly what happens with our original production query, although there the tolerance was far tighter: a change of less than 0.5% in the row count of one MV (or the same amount off the other) would cause the flip to the inefficient MV. I would be very grateful for your opinion on this!



Graeme, March 16, 2006 - 11:41 am UTC

Hi Tom, any feedback on the above? Essentially I am trying to find out:

1) Is there another method for selecting which materialized view is used, apart from costing all the options? If not, what is the cause of the QSM-01022 message?

2) If the answer to 1 is yes, do you have any detail on the factors that influence the decision that one materialized view is more optimal than another?

Is there more information I can supply to help?

Thanks!


Tom Kyte
March 16, 2006 - 2:46 pm UTC

Graeme - sorry, but when I have to page down 14 times, I just have to skip it. This space is for quick followup reviews. If I cannot see anything obvious, I have to move on. I see these once (they are not in a queue), and I have minutes to answer them. I judged this one as simply "too big to look at here and now".

I understand!

Graeme, March 20, 2006 - 11:42 am UTC

Thanks Tom, I understand your point exactly: in trying to give a complete example of something a little obscure, it does run to pages of stuff. I apologise for trying to sneak in an extra question; it's just that I have never before been on this site and found you open for questions! I am actually very conscious that you are only one man (and I only ask something if I am really stuck), and I am amazed at how much you can contribute here while doing a real job at the same time!

Perhaps you need to take on an apprentice before becoming a victim of your own success? This site must generate such good PR for Oracle that surely they could spare someone to help, without the loss of your 'distinctive' style ;-)

Same dimension on two levels of facts?

Duke Ganote, August 02, 2006 - 12:49 pm UTC

Suppose I've been provided with three monthly files for a small data mart. The files feed into three tables:
1. a hierarchical dimension table;
2. two fact tables, each at a different level of the hierarchy, but with different measures (that is, one is not a summary of the other).
Can I store metadata telling the optimizer that the more summarized fact table can use the same dimension for its information?

Here's a simple example:

DROP TABLE account;
DROP TABLE bank_hierarchy;
DROP TABLE edi_transaction;
CREATE TABLE bank_hierarchy
( bank_id NUMBER
, branch_id NUMBER
, bank_nm varchar2(20)
, branch_nm varchar2(20)
, CONSTRAINT bank_hierarchy_pk
PRIMARY KEY ( bank_id, branch_id )
);
CREATE TABLE account
( bank_id NUMBER
, branch_id NUMBER
, account_id NUMBER
, account_amt NUMBER
, CONSTRAINT account_pk
PRIMARY KEY ( bank_id, branch_id, account_id )
);
CREATE TABLE edi_transaction
( bank_id NUMBER
, transaction_cnt NUMBER
, CONSTRAINT edi_transaction_pk
PRIMARY KEY ( bank_id )
);
CREATE DIMENSION bank_hiearchy_dim
LEVEL bank IS BANK_HIERARCHY.BANK_ID
LEVEL branch IS BANK_HIERARCHY.BRANCH_ID
LEVEL bank_nm IS BANK_HIERARCHY.BANK_NM
LEVEL branch_nm IS BANK_HIERARCHY.BRANCH_NM
HIERARCHY bank_branches
( branch CHILD OF bank )
ATTRIBUTE bank DETERMINES bank_nm
ATTRIBUTE branch DETERMINES branch_nm
;
ALTER TABLE account
ADD CONSTRAINT ACCOUNT_BANK_HIER
FOREIGN KEY
( bank_id, branch_id )
REFERENCES bank_hierarchy
( bank_id, branch_id )
;

All good so far -- how can I do this (which doesn't work):

ALTER TABLE EDI_TRANSACTION
ADD CONSTRAINT EDI_TRX_BANK
FOREIGN KEY
( bank_id )
REFERENCES bank_hierarchy
( bank_id )
;

Our reporting tool is set up to "know" that edi_transaction.bank_id references bank_hierarchy.bank_id; it can look up the bank_nm in the dimension table.
Can I tell Oracle the same meta-information?

Tom Kyte
August 02, 2006 - 12:58 pm UTC

...but with different measures (that is, one's not a summary of the other).
Can I store the metadata to tell the optimizer that the more summarized one can
use the same dimension for information?...

I don't understand that. If they are not summaries of each other, how can one be "more summarized" than the other?

Same dimension = 2 logical tables

Duke Ganote, August 03, 2006 - 8:58 am UTC

> how can one be "more summarized" than the other?

The ACCOUNT fact is more granular (at the bank branch level) than the EDI_TRANSACTION fact (at the bank level).

Of course, the BANK_HIERARCHY dimension table is logically two tables:

CREATE TABLE bank
( bank_id NUMBER
, bank_nm varchar2(20)
, CONSTRAINT bank_pk
PRIMARY KEY ( bank_id )
);
CREATE TABLE branch
( bank_id NUMBER
, branch_id NUMBER
, branch_nm varchar2(20)
, CONSTRAINT branch_pk
PRIMARY KEY ( bank_id, branch_id )
, CONSTRAINT bank_fk
FOREIGN KEY
(bank_id) REFERENCES bank(bank_id)
);

Without actually normalizing the data into 2 tables, I can inform the optimizer of this hierarchical relationship via the dimension statement:

CREATE DIMENSION bank_hiearchy_dim
LEVEL bank IS BANK_HIERARCHY.BANK_ID
LEVEL branch IS BANK_HIERARCHY.BRANCH_ID
LEVEL bank_nm IS BANK_HIERARCHY.BANK_NM
LEVEL branch_nm IS BANK_HIERARCHY.BRANCH_NM
HIERARCHY bank_branches
( branch CHILD OF bank )
ATTRIBUTE bank DETERMINES bank_nm
ATTRIBUTE branch DETERMINES branch_nm
;

My question is a follow-up on the original question "Is a dimension some metadata about dimension tables in my star/snowflake schema?"

I know I could normalize the data by creating a BANK table, then inform Oracle of its relationship with the EDI_TRANSACTION fact like this:

ALTER TABLE EDI_TRANSACTION
ADD CONSTRAINT EDI_TRX_BANK
FOREIGN KEY
( bank_id )
REFERENCES bank
( bank_id )
;

However (without creating the explicit, normalized BANK table), can we inform the optimizer that the BANK_HIERARCHY dimension table contains the bank information needed when joining to the EDI_TRANSACTION fact?

Tom Kyte
August 03, 2006 - 9:49 am UTC

Without the foreign key, the optimizer cannot assume that joining to the edi_transaction table is "safe" (it might have bank_ids the other table does not, and vice versa).

It would have insufficient data.

(And if they are not summaries of each other... I'm not sure where we are going with this.)
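If creating the normalized BANK table turns out to be acceptable after all, one hedged sketch of supplying the missing guarantee (table and constraint names are the ones proposed in the question; the RELY states are my own suggestion, not something prescribed above):

```sql
-- Normalized BANK table built from the denormalized hierarchy.
-- RELY NOVALIDATE asks Oracle to trust the relationship without
-- checking existing rows; query rewrite honors such constraints
-- only when QUERY_REWRITE_INTEGRITY is TRUSTED (or STALE_TOLERATED).
CREATE TABLE bank AS
  SELECT DISTINCT bank_id, bank_nm FROM bank_hierarchy;

ALTER TABLE bank
  ADD CONSTRAINT bank_pk PRIMARY KEY (bank_id) RELY;

ALTER TABLE edi_transaction
  ADD CONSTRAINT edi_trx_bank FOREIGN KEY (bank_id)
  REFERENCES bank (bank_id) RELY NOVALIDATE;
```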

Hierarchy insert questions

A Reader, October 31, 2006 - 6:04 am UTC

Hi Tom.

I have the following two questions:

1. During ETL, does the fact table get populated first, or is it the dimension table?

2. My hierarchy table is a simple list of distinct surrogate keys from the fact table, and I have a dimension object that defines the levels of the hierarchy. As new keys may be added daily, how do I make sure that my hierarchy table has all the latest keys? Do I create it as a materialized view, or do I need to incrementally add to it daily?

Thanks for any help.

Tom Kyte
October 31, 2006 - 8:48 am UTC

1) since I don't have your code, I cannot really answer.

Meaning - it could be:

a) fact then dimension
b) dimension then fact
c) both at the same exact time

2) Again, see #1: since I don't have your code, I cannot really answer. I don't know what you mean by a "hierarchy table". If you mean a dimension, it would be up to you how you maintain it.
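For the second question, a sketch of one incremental approach (HIER_TABLE, FACT_TABLE, and SURROGATE_KEY are placeholder names, not objects from the question): MERGE in any surrogate keys that have appeared in the fact table since the last refresh.

```sql
-- Inserts only the keys not already present; safe to re-run daily.
MERGE INTO hier_table h
USING (SELECT DISTINCT surrogate_key FROM fact_table) f
   ON (h.surrogate_key = f.surrogate_key)
 WHEN NOT MATCHED THEN
   INSERT (surrogate_key) VALUES (f.surrogate_key);
```

A materialized view over SELECT DISTINCT surrogate_key would work as well; the trade-off is refresh cost versus having to schedule the MERGE yourself.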

Non unique company names in hierarchy

Dave, January 05, 2007 - 10:33 am UTC

Tom

I have a Company dimension which contains 4 levels of Company associated to a hierarchy.

Company_Level_1 is the highest level (owning) company, down to Company_Level_4.

My issue is that some company names at level 4 are not unique in themselves but are uniquely defined by their owner. For example, "AI Holdings" is a company owned by "AI Systems Ltd", and a completely different "AI Holdings" is owned by "XBASE Ltd".

I'm using a materialized view to summarize fact table data at the lowest level of both hierarchies, and this successfully "rolls up" when a higher level summary is required, using the hierarchies in the dimensions.

The problem is that data associated with both "AI Holdings" (1) AND "AI Holdings" (2) is getting rolled up into both parents. Is there any way to avoid this, other than changing the data to make it unique within a level in the hierarchy?

Thanks for your ongoing help; it's invaluable out here in "real world" land.
Tom Kyte
January 06, 2007 - 8:17 am UTC

I'm not sure what else would happen if you just aggregate by this fourth level alone. You would have to aggregate by some set of dimension columns that is "unique".

That does not mean you need to change what "level 4" is; rather, you would aggregate by more than just that column.
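A sketch of that suggestion, using hypothetical names (COMPANY_LEVEL_3 and COMPANY_LEVEL_4 as in the question; the fact table, dimension table, and key columns are invented for illustration): group by the owning company as well as the ambiguous level-4 name, so the grouping key is unique again.

```sql
-- "AI Holdings" under "AI Systems Ltd" and "AI Holdings" under
-- "XBASE Ltd" now land in separate groups instead of being merged.
SELECT d.company_level_3,
       d.company_level_4,
       SUM(f.sales_amt) AS sales_amt
  FROM company_dim d, sales_fact f
 WHERE f.company_key = d.company_key
 GROUP BY d.company_level_3, d.company_level_4;
```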

Understand that now, but...

Dave, January 08, 2007 - 3:45 am UTC

Yes, I can see that, and it's quite clear now that you've pointed it out. I've cured the problem by defining the view correctly.
However, is it not concerning that all of the received wisdom says that building materialized views and hierarchy metadata is benign, i.e. that Oracle will only use them if there is some benefit, and there is no harm in having them hanging around? Frequently we are told that they are "just like indexes". Up to now this has been true for me (I've used them a lot), but this particular data circumstance caused Oracle to give an incorrect result. The query I wrote gives the right result with rewrite turned off, and the wrong result with rewrite turned on.

Sorry Tom, all is clear now

Dave, January 08, 2007 - 4:02 am UTC

And that's why we have TRUSTED and ENFORCED modes for query rewrite. The clue is in the name "TRUSTED", but there should be a big HEALTH WARNING stamped on it.
Tom Kyte
January 08, 2007 - 12:27 pm UTC

;)
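For readers following along, the safety valve being discussed can be sketched as a session setting (the parameter is real; which mode is appropriate depends on how far you trust your dimensions and RELY constraints):

```sql
-- ENFORCED (the default): rewrite uses only relationships Oracle has
-- itself validated. TRUSTED: rewrite also believes dimensions and
-- RELY constraints, and a wrong declaration can then produce wrong
-- results, as in the case above.
ALTER SESSION SET query_rewrite_integrity = ENFORCED;
-- ALTER SESSION SET query_rewrite_integrity = TRUSTED;
```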

Dimension isn't doing anything for me

Jack Mulhollan, March 01, 2007 - 6:22 pm UTC

Tom,

I am trying something very similar to the example in your book, but the dimension isn't doing anything for me. Please see the example below.

I can get rewrite to happen on all three queries if I add the year and quarter columns to my materialized view (e.g. "group by b.year, b.quarter, b.month"), but that works even in the absence of the dimension. Therefore, I am not seeing the point of creating the dimension.

Can you explain? Thanks. (version 10.1.0.4)

- Jack


SQL> ----------------------
SQL> -- Create input table.
SQL> ----------------------
SQL> 
SQL> create table sales_tb
  2  as
  3  select trunc(sysdate - trunc(dbms_random.value(1,(365*3)))) sale_date,
  4         round(dbms_random.value(1,500),2) price,
  5         'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' long_string
  6  from all_objects
  7  UNION ALL
  8  select trunc(sysdate - trunc(dbms_random.value(1,(365*3)))) sale_date,
  9         round(dbms_random.value(1,500),2) price,
 10         'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' long_string
 11  from all_objects
 12  UNION ALL
 13  select trunc(sysdate - trunc(dbms_random.value(1,(365*3)))) sale_date,
 14         round(dbms_random.value(1,500),2) price,
 15         'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' long_string
 16  from all_objects
 17  UNION ALL
 18  select trunc(sysdate - trunc(dbms_random.value(1,(365*3)))) sale_date,
 19         round(dbms_random.value(1,500),2) price,
 20         'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG' long_string
 21  from all_objects
 22  ;

Table created.

Elapsed: 00:00:05.37
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from sales_tb;

        COUNT(*)
----------------
         165,696

Elapsed: 00:00:00.20
SQL> 
SQL> --------------------------------------
SQL> -- Create time dimension lookup table.
SQL> --------------------------------------
SQL> 
SQL> create table sale_date_lookup_tb
  2  (sale_date, month, quarter, year)
  3  as
  4  select sale_date,
  5         to_char(sale_date,'YYYY-MM') month,
  6         cast(
  7             to_char(sale_date,'YYYY') ||
  8               '-Q' ||
  9               ceil( to_char(sale_date,'MM')/3)
 10             as varchar2(7)
 11             ) quarter,
 12         to_char(sale_date,'YYYY') year
 13  from (
 14       select distinct sale_date
 15       from sales_tb
 16       )
 17  ;

Table created.

Elapsed: 00:00:00.20
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from sale_date_lookup_tb;

        COUNT(*)
----------------
           1,094

Elapsed: 00:00:00.01
SQL> 
SQL> -----------------------------------------
SQL> -- Create month rollup materialized view.
SQL> -----------------------------------------
SQL> 
SQL> create materialized view month_rollup_mv
  2    enable query rewrite
  3  as
  4  select b.month,
  5         count(price),
  6         count(*),
  7         sum(a.price)
  8  from sales_tb a,
  9       sale_date_lookup_tb b
 10  where a.sale_date = b.sale_date
 11  group by b.month
 12  ;

Materialized view created.

Elapsed: 00:00:00.59
SQL> 
SQL> ---------
SQL> -- Count.
SQL> ---------
SQL> 
SQL> select count(*) from month_rollup_mv;

        COUNT(*)
----------------
              36

Elapsed: 00:00:00.00
SQL> 
SQL> ------------------------------
SQL> -- Create sale_date dimension.
SQL> ------------------------------
SQL> 
SQL> create dimension sale_date_dim
  2    level sale_date is sale_date_lookup_tb.sale_date
  3    level month is sale_date_lookup_tb.month
  4    level quarter is sale_date_lookup_tb.quarter
  5    level year is sale_date_lookup_tb.year
  6  hierarchy sale_date_hier
  7  (
  8   sale_date child of
  9   month child of
 10   quarter child of
 11   year
 12  )
 13  ;

Dimension created.

Elapsed: 00:00:00.02
SQL> 
SQL> -----------
SQL> -- Analyze.
SQL> -----------
SQL> 
SQL> analyze table sales_tb compute statistics;

Table analyzed.

Elapsed: 00:00:02.48
SQL> analyze table sale_date_lookup_tb compute statistics;

Table analyzed.

Elapsed: 00:00:00.04
SQL> analyze table month_rollup_mv compute statistics;

Table analyzed.

Elapsed: 00:00:00.01
SQL> 
SQL> ----------------------------
SQL> -- Set autotrace on explain.
SQL> ----------------------------
SQL> 
SQL> set autotrace on explain
SQL> 
SQL> col id_plus_exp noprint
SQL> col parent_id_plus_exp noprint
SQL> -- col plan_plus_exp for a78 truncated
SQL> col object_node_plus_exp noprint
SQL> 
SQL> ------------------
SQL> -- Rollup on year.
SQL> ------------------
SQL> 
SQL> break on report
SQL> compute sum of count(*) on report
SQL> compute sum of sum(a.price) on report
SQL> 
SQL> select b.year,
  2         count(*),
  3         sum(a.price)
  4  from sales_tb a,
  5       sale_date_lookup_tb b
  6  where a.sale_date = b.sale_date
  7  group by b.year
  8  order by 1
  9  ;

YEAR         COUNT(*)     SUM(A.PRICE)
---- ---------------- ----------------
2004           45,846       11,429,197
2005           55,461       13,885,763
2006           55,318       13,812,632
2007            9,071        2,281,449
     ---------------- ----------------
sum           165,696       41,409,040

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=182 Card=4 Bytes=88)
  SORT (GROUP BY) (Cost=182 Card=4 Bytes=88)
    HASH JOIN (Cost=175 Card=165696 Bytes=3645312)
      TABLE ACCESS (FULL) OF 'SALE_DATE_LOOKUP_TB' (TABLE) (Cost=3 Card=1094 Bytes=12034)

      TABLE ACCESS (FULL) OF 'SALES_TB' (TABLE) (Cost=170 Card=165696 Bytes=1822656)


SQL> 
SQL> ---------------------
SQL> -- Rollup on quarter.
SQL> ---------------------
SQL> 
SQL> select b.quarter,
  2         count(*),
  3         sum(a.price)
  4  from sales_tb a,
  5       sale_date_lookup_tb b
  6  where a.sale_date = b.sale_date
  7  group by b.quarter
  8  order by 1
  9  ;

QUARTER         COUNT(*)     SUM(A.PRICE)
------- ---------------- ----------------
2004-Q1            4,514        1,132,300
2004-Q2           13,387        3,348,818
2004-Q3           13,935        3,473,769
2004-Q4           14,010        3,474,310
2005-Q1           13,705        3,452,055
2005-Q2           13,801        3,460,922
2005-Q3           13,935        3,477,606
2005-Q4           14,020        3,495,180
2006-Q1           13,614        3,395,926
2006-Q2           13,764        3,427,735
2006-Q3           13,835        3,466,381
2006-Q4           14,105        3,522,590
2007-Q1            9,071        2,281,449
        ---------------- ----------------
sum              165,696       41,409,040

13 rows selected.

Elapsed: 00:00:00.25

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=182 Card=13 Bytes=325)
  SORT (GROUP BY) (Cost=182 Card=13 Bytes=325)
    HASH JOIN (Cost=175 Card=165696 Bytes=4142400)
      TABLE ACCESS (FULL) OF 'SALE_DATE_LOOKUP_TB' (TABLE) (Cost=3 Card=1094 Bytes=15316)

      TABLE ACCESS (FULL) OF 'SALES_TB' (TABLE) (Cost=170 Card=165696 Bytes=1822656)


SQL> 
SQL> -------------------
SQL> -- Rollup on month.
SQL> -------------------
SQL> 
SQL> select b.month,
  2         count(*),
  3         sum(a.price)
  4  from sales_tb a,
  5       sale_date_lookup_tb b
  6  where a.sale_date = b.sale_date
  7  group by b.month
  8  order by 1
  9  ;

MONTH           COUNT(*)     SUM(A.PRICE)
------- ---------------- ----------------
2004-03            4,514        1,132,300
2004-04            4,464        1,105,893
2004-05            4,547        1,141,467
2004-06            4,376        1,101,458
2004-07            4,855        1,217,625
2004-08            4,598        1,142,277
2004-09            4,482        1,113,867
2004-10            4,721        1,164,504
2004-11            4,579        1,151,553
2004-12            4,710        1,158,253
2005-01            4,671        1,179,291
2005-02            4,256        1,081,182
2005-03            4,778        1,191,582
2005-04            4,466        1,116,988
2005-05            4,689        1,191,928
2005-06            4,646        1,152,006
2005-07            4,685        1,171,027
2005-08            4,760        1,185,399
2005-09            4,490        1,121,180
2005-10            4,685        1,162,089
2005-11            4,626        1,160,376
2005-12            4,709        1,172,715
2006-01            4,588        1,146,959
2006-02            4,333        1,073,948
2006-03            4,693        1,175,018
2006-04            4,538        1,149,584
2006-05            4,691        1,164,339
2006-06            4,535        1,113,812
2006-07            4,588        1,146,536
2006-08            4,771        1,184,631
2006-09            4,476        1,135,215
2006-10            4,723        1,183,366
2006-11            4,529        1,126,150
2006-12            4,853        1,213,074
2007-01            4,719        1,178,439
2007-02            4,352        1,103,010
        ---------------- ----------------
sum              165,696       41,409,040

36 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=36 Bytes=576)
  SORT (ORDER BY) (Cost=3 Card=36 Bytes=576)
    MAT_VIEW REWRITE ACCESS (FULL) OF 'MONTH_ROLLUP_MV' (MAT_VIEW REWRITE) (Cost=2 Card=36 Bytes=576)


SQL> 
SQL> spool off

Query Rewrite

Dave, March 26, 2007 - 6:57 am UTC

Jack - I tried the script above and got it to do the rewrite for the quarter and year by setting the session parameters:

alter session set query_rewrite_enabled = TRUE;
alter session set query_rewrite_integrity = TRUSTED;

I also used dbms_stats for the analyze.

Hope this helps

Thanks Dave

Jack Mulhollan, April 05, 2007 - 4:03 pm UTC

Dave,

Setting query_rewrite_integrity to TRUSTED did the trick. Thanks!

- Jack



-----------------------------------
From Data Warehousing Guide 10g R2:
-----------------------------------

In TRUSTED mode, the optimizer trusts that the relationships declared in dimensions and RELY constraints are correct. In this mode, the optimizer also uses prebuilt materialized views or materialized views based on views, and it uses relationships that are not enforced as well as those that are enforced. In this mode, the optimizer also trusts declared but not ENABLED VALIDATED primary or unique key constraints and data relationships specified using dimensions. This mode offers greater query rewrite capabilities but also creates the risk of incorrect results if any of the trusted relationships you have declared are incorrect.

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14223/qrbasic.htm#sthref1318
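
Given that TRUSTED mode will happily rewrite using declared relationships even when they are wrong, it is worth validating a dimension after each data load. A hedged sketch (check the PL/SQL Packages reference for your release; SALE_DATE_DIM is the dimension from the example above):

begin
   -- flags rows that violate the declared hierarchy
   dbms_dimension.validate_dimension(
      dimension   => 'SALE_DATE_DIM',
      incremental => false,   -- check all rows, not just new ones
      check_nulls => true );  -- also verify level columns are not null
end;
/

Rowids of offending rows are written to the DIMENSION_EXCEPTIONS table (created by utldim.sql), so an empty table after the call is the "safe to trust" signal.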

using dimensions for query rewrite

Ramachandran, October 10, 2008 - 4:38 am UTC

Hi Tom,
I tried to replicate your scenario, but the materialized view is not used in my case even after I have created the dimension. Can you please help me?

Regards,
Ram




alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity = TRUSTED;


CREATE TABLE SMM_TOPIC_SUMMARY_BK
(
  TOPIC_SUMMARY_ID NUMBER(8) NOT NULL,
  TOPIC_ID         NUMBER(8) NOT NULL,
  BUZZ             NUMBER(4),
  SENTIMENT        NUMBER(5,3),
  POSITIVE_COUNT   NUMBER(8),
  NEGATIVE_COUNT   NUMBER(8),
  NEUTRAL_COUNT    NUMBER(8),
  DATE1            DATE
);


SET DEFINE OFF;
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(1, 1, 86, 0.78, 45, 89, 23, TO_DATE('01/01/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(2, 1, 78, 0.86, 65, 45, 27, TO_DATE('01/02/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(3, 1, 56, 0.32, 42, 45, 30, TO_DATE('01/03/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(4, 1, 89, 0.81, 66, 77, 17, TO_DATE('01/04/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(5, 2, 45, 0.56, 45, 45, 10, TO_DATE('01/05/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(6, 2, 86, 0.79, 30, 30, 40, TO_DATE('01/06/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(7, 2, 45, 0.55, 35, 35, 30, TO_DATE('01/07/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(8, 2, 55, 0.45, 40, 40, 20, TO_DATE('01/08/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(9, 3, 45, 0.66, 45, 45, 10, TO_DATE('01/09/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(10, 3, 23, 0.78, 33, 33, 34, TO_DATE('01/10/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(11, 3, 77, 0.21, 20, 20, 60, TO_DATE('01/11/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(12, 3, 45, 0.61, 30, 35, 35, TO_DATE('01/12/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(13, 4, 56, 0.25, 50, 30, 20, TO_DATE('01/13/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(14, 4, 63, 0.33, 45, 45, 10, TO_DATE('01/14/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(15, 4, 56, 0.22, 23, 50, 27, TO_DATE('01/15/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
Insert into SMDASHBOARD.SMM_TOPIC_SUMMARY_BK
(TOPIC_SUMMARY_ID, TOPIC_ID, BUZZ, SENTIMENT, POSITIVE_COUNT, NEGATIVE_COUNT, NEUTRAL_COUNT, DATE1)
Values
(16, 4, 89, 0.95, 90, 5, 5, TO_DATE('01/16/2008 00:00:00', 'MM/DD/YYYY HH24:MI:SS'));
COMMIT;




DESC SMM_TOPIC_SUMMARY_BK
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------
 TOPIC_SUMMARY_ID                          NOT NULL NUMBER(8)
 TOPIC_ID                                  NOT NULL NUMBER(8)
 BUZZ                                               NUMBER(4)
 SENTIMENT                                          NUMBER(5,3)
 POSITIVE_COUNT                                     NUMBER(8)
 NEGATIVE_COUNT                                     NUMBER(8)
 NEUTRAL_COUNT                                      NUMBER(8)
 DATE1                                              DATE

SELECT COUNT(*) FROM SMM_TOPIC_SUMMARY_BK;

COUNT(*)
----------
16

SELECT * FROM SMM_TOPIC_SUMMARY_BK;

TOPIC_SUMMARY_ID TOPIC_ID BUZZ SENTIMENT POSITIVE_COUNT NEGATIVE_COUNT NEUTRAL_COUNT DATE1
---------------- ---------- ---------- ---------- -------------- -------------- ------------- ---------
1 1 86 .78 45 89 23 01-JAN-08
2 1 78 .86 65 45 27 02-JAN-08
3 1 56 .32 42 45 30 03-JAN-08
4 1 89 .81 66 77 17 04-JAN-08
5 2 45 .56 45 45 10 05-JAN-08
6 2 86 .79 30 30 40 06-JAN-08
7 2 45 .55 35 35 30 07-JAN-08
8 2 55 .45 40 40 20 08-JAN-08
9 3 45 .66 45 45 10 09-JAN-08
10 3 23 .78 33 33 34 10-JAN-08
11 3 77 .21 20 20 60 11-JAN-08

TOPIC_SUMMARY_ID TOPIC_ID BUZZ SENTIMENT POSITIVE_COUNT NEGATIVE_COUNT NEUTRAL_COUNT DATE1
---------------- ---------- ---------- ---------- -------------- -------------- ------------- ---------
12 3 45 .61 30 35 35 12-JAN-08
13 4 56 .25 50 30 20 13-JAN-08
14 4 63 .33 45 45 10 14-JAN-08
15 4 56 .22 23 50 27 15-JAN-08
16 4 89 .95 90 5 5 16-JAN-08





CREATE TABLE TIME_HIERARCHY
(DAY PRIMARY KEY, MMYYYY, MON_YYYY, QTR_YYYY, YYYY)
ORGANIZATION INDEX
AS
SELECT DISTINCT
DATE1 DAY,
CAST (TO_CHAR(DATE1,'MMYYYY') AS NUMBER) MMYYYY,
TO_CHAR (DATE1, 'MON-YYYY') MON_YYYY,
'Q' || CEIL (TO_CHAR(DATE1,'MM')/3) || 'FY' || TO_CHAR(DATE1,'YYYY') QTR_YYYY,
CAST (TO_CHAR(DATE1,'YYYY') AS NUMBER) YYYY
FROM SMM_TOPIC_SUMMARY_BK;



CREATE MATERIALIZED VIEW SENTIMENT100
BUILD IMMEDIATE
REFRESH ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT
TH.DAY,
AVG(STS.SENTIMENT) AS "SENTIMENT"
FROM
SMM_TOPIC_SUMMARY_BK STS,
TIME_HIERARCHY TH
WHERE
STS.DATE1=TH.DAY
GROUP BY
TH.DAY;


SELECT
TH.DAY,
AVG(STS.SENTIMENT) AS "SENTIMENT"
FROM
SMM_TOPIC_SUMMARY_BK STS,
TIME_HIERARCHY TH
WHERE
STS.DATE1=TH.DAY
GROUP BY
TH.DAY;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=16 Bytes=352)
1    0   MAT_VIEW REWRITE ACCESS (FULL) OF 'SENTIMENT100' (MAT_VIEW REWRITE) (Cost=3 Card=16 Bytes=352)


SELECT
TH.QTR_YYYY,
AVG(STS.SENTIMENT) AS "SENTIMENT"
FROM
SMM_TOPIC_SUMMARY_BK STS,
TIME_HIERARCHY TH
WHERE
STS.DATE1=TH.DAY
GROUP BY
TH.QTR_YYYY;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=16 Bytes=896)
1    0   HASH (GROUP BY) (Cost=4 Card=16 Bytes=896)
2    1     NESTED LOOPS (Cost=3 Card=16 Bytes=896)
3    2       TABLE ACCESS (FULL) OF 'SMM_TOPIC_SUMMARY_BK' (TABLE) (Cost=3 Card=16 Bytes=352)
4    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_175218' (INDEX (UNIQUE)) (Cost=0 Card=1 Bytes=34)


create dimension time_hierarchy_dim
level day is TIME_HIERARCHY.day
level mmyyyy is TIME_HIERARCHY.mmyyyy
level qtr_yyyy is TIME_HIERARCHY.qtr_yyyy
level yyyy is TIME_HIERARCHY.yyyy
hierarchy time_rollup
(
day child of
mmyyyy child of
qtr_yyyy child of
yyyy
)
attribute mmyyyy
determines mon_yyyy;



SELECT
TH.QTR_YYYY,
AVG(STS.SENTIMENT) AS "SENTIMENT"
FROM
SMM_TOPIC_SUMMARY_BK STS,
TIME_HIERARCHY TH
WHERE
STS.DATE1=TH.DAY
GROUP BY
TH.QTR_YYYY;

Execution Plan
----------------------------------------------------------
0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=16 Bytes=896)
1    0   HASH (GROUP BY) (Cost=4 Card=16 Bytes=896)
2    1     NESTED LOOPS (Cost=3 Card=16 Bytes=896)
3    2       TABLE ACCESS (FULL) OF 'SMM_TOPIC_SUMMARY_BK' (TABLE) (Cost=3 Card=16 Bytes=352)
4    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_175218' (INDEX (UNIQUE)) (Cost=0 Card=1 Bytes=34)
Tom Kyte
October 13, 2008 - 2:26 am UTC

look at the row counts, this example is "way too small"

think about it - would it cost more, less, or the same to use any materialized view here?

Probably "about the same", there is hardly ANY DATA.

why query rewrite is not taking place

A reader, June 18, 2010 - 9:58 am UTC

Hi Tom,

I created an MV as given below.

CREATE MATERIALIZED VIEW WCC701.WCC_LASTNAME_SEARCH_MV
NOCACHE
NOLOGGING
NOCOMPRESS
NOPARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
WITH ROWID
ENABLE QUERY REWRITE
AS
SELECT ps.given_name_one sgivenname151,
ps.given_name_two sgivenname251,
ps.given_name_three sgivenname351,
ps.given_name_four sgivenname451,
ps.last_name,
pn.given_name_one pngivenname151,
pn.given_name_two pngivenname251,
pn.given_name_three pngivenname351,
pn.given_name_four pngivenname451,
pn.last_name pnlastname51,
pn.cont_id contid51,
pn.suffix_desc pnsuffixdesc51,
p.birth_dt birthdt51,
p.gender_tp_code gendertpcode51,
c.inactivated_dt inactivateddt51
FROM wcc701.personsearch ps,
wcc701.personname pn,
wcc701.person p,
wcc701.contact c
WHERE ps.person_name_id = pn.person_name_id
AND ps.cont_id = p.cont_id
AND ps.cont_id = c.cont_id
ORDER BY ps.last_name, pn.given_name_one
;


Then I ran a query as given below, and query rewrite is not taking place.

SELECT *
FROM ( SELECT PERSONSEARCH.GIVEN_NAME_ONE SGIVENNAME151,
PERSONSEARCH.GIVEN_NAME_TWO SGIVENNAME251,
PERSONSEARCH.GIVEN_NAME_THREE SGIVENNAME351,
PERSONSEARCH.GIVEN_NAME_FOUR SGIVENNAME451,
PERSONSEARCH.LAST_NAME SLASTNAME51,
PERSONNAME.GIVEN_NAME_ONE PNGIVENNAME151,
PERSONNAME.GIVEN_NAME_TWO PNGIVENNAME251,
PERSONNAME.GIVEN_NAME_THREE PNGIVENNAME351,
PERSONNAME.GIVEN_NAME_FOUR PNGIVENNAME451,
PERSONNAME.LAST_NAME PNLASTNAME51,
PERSONNAME.CONT_ID CONTID51,
PERSONNAME.SUFFIX_DESC PNSUFFIXDESC51,
PERSON.BIRTH_DT BIRTHDT51,
PERSON.GENDER_TP_CODE GENDERTPCODE51,
CONTACT.INACTIVATED_DT INACTIVATEDDT51
FROM PERSONSEARCH,
PERSONNAME,
PERSON,
CONTACT
WHERE ( (PERSONSEARCH.LAST_NAME = :1)
AND (PERSONSEARCH.PERSON_NAME_ID =
PERSONNAME.PERSON_NAME_ID)
AND ( (PERSONSEARCH.CONT_ID = PERSON.CONT_ID))
AND (PERSONSEARCH.CONT_ID = CONTACT.CONT_ID)
AND (CONTACT.INACTIVATED_DT IS NULL
OR CONTACT.INACTIVATED_DT > :2))
ORDER BY PERSONNAME.LAST_NAME, PERSONNAME.GIVEN_NAME_ONE)

The exception says that it is happening because of the condition CONTACT.INACTIVATED_DT > :2.

My question is why. Can you see a reason behind it?

Thanks

1:n relationship between a parent and children

Chris, November 20, 2012 - 2:54 pm UTC

Hi Tom,

Could you please provide an explanation on why the restriction on having a 1:n relationship between a parent and children exists when creating a dimension?
How can rolling up on a dimension which violates this requirement lead to incorrect results?

Thanks
Tom Kyte
November 20, 2012 - 3:53 pm UTC

the dimension is the parent - and the fact table is the child. We need to know that every row in the child (the fact) that has a value for the dimension attribute will actually have a parent row to join to.


take a time dimension that tells us how to turn a date without the time into

a) the week
b) the month/year
c) the quarter
etc....


we need to know that for every row in the fact table that has a date (is not null), there is a row in the time dimension to join to - if not, we'd lose that row and the time dimension would be "not useful" for rolling up data.

if the fact table had 01-jan-2012 in it, but the time dimension did not - we couldn't turn 01-jan-2012 into a quarter and hence a quarter rollup using it would be wrong.
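
The check above can be written as a query. A minimal sketch (fact_sales and time_dim are hypothetical stand-ins for your fact table and time dimension lookup table):

-- rows in the fact table whose date has no match in the time dimension;
-- any row returned here would silently disappear from a rollup that
-- joins through the dimension table
select f.sale_date
from   fact_sales f
where  f.sale_date is not null
and    not exists (select null
                   from   time_dim t
                   where  t.day = f.sale_date);

If this returns zero rows, every fact row has a parent to join to and the declared hierarchy is safe to trust for that load.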

1:n relationship between a parent and children

Chris, November 21, 2012 - 8:17 am UTC

Thanks,

The explanation is pretty clear in case a record in the fact table does not have an entry in the dimension.

What if, within a dimension, a child has more than one parent?

Example:
CREATE DIMENSION sponsor_grp_dim
LEVEL sponsor_grp IS (sponsor_grp.sponsor_grp_id)
LEVEL sponsor IS (sponsor_grp.sponsor_id)
LEVEL advisor IS (sponsor_grp.advisor_id)
HIERARCHY sponsor_grp_rollup (
sponsor_grp CHILD OF
sponsor CHILD OF
advisor);

We aggregate an MV at the lowest granularity, which is sponsor_grp, and we suppose the following functional relationships between the levels:

Relation between sponsor_grp and sponsor is 1:n
Relation between sponsor and advisor is n:n (the same sponsor might be part of one or more advisors)

Should we let the rewrite roll up to the sponsor granularity, the results would be correct, since the relationship between sponsor_grp and sponsor complies with the 1:n restriction.

However, should we let the rewrite roll up to the advisor granularity, would the results be incorrect due to the fact that a sponsor can have multiple advisors?

I can't wrap my head around this. I am trying to figure out an example of where rolling up such a dimension that has an n:n relationship could lead to incorrect aggregations.

Thanks,
Chris

Tom Kyte
November 21, 2012 - 9:46 am UTC


as long as sponsor IMPLIES advisor - you can do these rollups.

if sponsor does NOT imply advisor - tell me how you could possibly do a rollup.



think about it. using the time hierarchy again - where you have

DATE -> MONTH_YEAR -> QUARTER


if a month_year had MANY quarters associated with it, how could you roll up from a month_year to a quarter? you'd be duplicating that fact record all over the place. you would be double counting.


relation between sponsor and advisor - by definition - cannot be n:n, it has to be 1:m - a sponsor implies advisor, you can derive advisor from sponsor.

nothing else makes sense.
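

To see the double counting concretely, a hypothetical sketch (month_rollup stands in for a monthly materialized view; bad_dim is a dimension table that, wrongly, maps one month to two quarters):

select d.quarter,
       sum(m.total_sales)
from   month_rollup m,
       bad_dim      d
where  d.month = m.month
group  by d.quarter;

-- if month '2012-01' appears in bad_dim under both 'Q1' and 'Q2',
-- that monthly row joins twice, so its total_sales is added into
-- both quarters and the grand total across quarters is inflated

That is exactly the wrong answer a TRUSTED rewrite would produce if the declared hierarchy did not really hold.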



