Unnecessary table access in query when using multiple (> 2) bitmap indexes
Tom Kyte

Thanks for the question, Jay.

Asked: May 03, 2005 - 3:57 pm UTC

Last updated: May 18, 2005 - 3:36 pm UTC

Version: 9.2.0.4

Viewed 1000+ times

You Asked

Consider the following example (on Enterprise Edition Release 9.2.0.4.0 - 64bit):

create table big_table (
c1 number not null,
c2 number not null,
c3 date not null,
c4 number
);

insert into big_table
select 1, 10, trunc(sysdate) + trunc(rownum/100), 1
from all_objects
union all
select 2, 20, trunc(sysdate) + trunc(rownum/100), 2
from all_objects
union all
select 3, 30, trunc(sysdate) + trunc(rownum/100), 3
from all_objects
union all
select 4, 40, trunc(sysdate) + trunc(rownum/100), 4
from all_objects
union all
select 5, 50, trunc(sysdate) + trunc(rownum/100), 5
from all_objects
;
commit;

select count(*) from big_table;
COUNT(*)
----------
113740

create bitmap index c1_bmidx on big_table (c1) compute statistics;
create bitmap index c2_bmidx on big_table (c2) compute statistics;
create bitmap index c3_bmidx on big_table (c3) compute statistics;

exec dbms_stats.gather_table_stats(ownname=>'<Schema Name>',tabname=>'BIG_TABLE',method_opt=>'FOR ALL COLUMNS SIZE 225',degree=>1,cascade=>FALSE);

CASE 1:
------
explain plan set statement_id = 'p1' for
select c3, count(*)
from big_table b
where 1=1
and c1 = :1
and c2 = :2
and c3 > :3
and c3 <= :4
GROUP BY c3;

select * from table(dbms_xplan.display('PLAN_TABLE','p1'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------

------------------------------------------------------------------------------
| Id  | Operation                        | Name      | Rows  | Bytes | Cost  |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |           |    11 |   154 |    21 |
|   1 |  SORT GROUP BY                   |           |    11 |   154 |    21 |
|*  2 |   FILTER                         |           |       |       |       |
|   3 |    TABLE ACCESS BY INDEX ROWID   | BIG_TABLE |    11 |   154 |     7 |
|   4 |     BITMAP CONVERSION TO ROWIDS  |           |       |       |       |
|   5 |      BITMAP AND                  |           |       |       |       |
|*  6 |       BITMAP INDEX SINGLE VALUE  | C1_BMIDX  |       |       |       |
|*  7 |       BITMAP INDEX SINGLE VALUE  | C2_BMIDX  |       |       |       |
|   8 |      BITMAP MERGE                |           |       |       |       |
|*  9 |       BITMAP INDEX RANGE SCAN    | C3_BMIDX  |       |       |       |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_DATE(:Z)>TO_DATE(:Z) AND TO_DATE(:Z)<TO_DATE(:Z))
6 - access("B"."C1"=TO_NUMBER(:Z))
7 - access("B"."C2"=TO_NUMBER(:Z))
9 - access("B"."C3">:Z AND "B"."C3"<=:Z)

Note: cpu costing is off


CASE 2:
------
explain plan set statement_id = 'p2' for
select c3, count(*)
from big_table b
where 1=1
--and c1 = :1
--and c2 = :2
and c3 > :3
and c3 <= :4
GROUP BY c3;

select * from table(dbms_xplan.display('PLAN_TABLE','p2'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------

--------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   163 |  1304 |     2 |
|   1 |  SORT GROUP BY NOSORT       |          |   163 |  1304 |     2 |
|*  2 |   FILTER                    |          |       |       |       |
|   3 |    BITMAP CONVERSION COUNT  |          |       |       |       |
|*  4 |     BITMAP INDEX RANGE SCAN | C3_BMIDX |       |       |       |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(TO_DATE(:Z)>TO_DATE(:Z) AND TO_DATE(:Z)<TO_DATE(:Z))
4 - access("B"."C3">:Z AND "B"."C3"<=:Z)

Note: cpu costing is off

Questions:
---------

1. Why is the table being accessed in CASE 1 when it is (theoretically) possible to get the answer from the bitmap indexes alone? CASE 2 demonstrates that the optimizer can actually do it (i.e. without table access) in some cases.
2. What is the need for the filter step (Id 2) in the plans of both CASE 1 and CASE 2? The predicate information for this filter shows that it involves only bind variables. What exactly is the filter step doing?


and Tom said...

It isn't "unnecessary" given the plan; it was the best path found for the query.

It was/is the BITMAP MERGE operation. An alternate plan is:

----------------------------------------------------------------
| Id  | Operation                         | Name             |
----------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                  |
|   1 |  SORT GROUP BY                    |                  |
|*  2 |   FILTER                          |                  |
|*  3 |    VIEW                           | index$_join$_001 |
|*  4 |     HASH JOIN                     |                  |
|*  5 |      HASH JOIN                    |                  |
|   6 |       BITMAP CONVERSION TO ROWIDS |                  |
|*  7 |        BITMAP INDEX RANGE SCAN    | C3_BMIDX         |
|   8 |       BITMAP CONVERSION TO ROWIDS |                  |
|*  9 |        BITMAP INDEX SINGLE VALUE  | C1_BMIDX         |
|  10 |      BITMAP CONVERSION TO ROWIDS  |                  |
|* 11 |       BITMAP INDEX SINGLE VALUE   | C2_BMIDX         |
----------------------------------------------------------------

The merge is what makes us go back to the table, to pick up the values to group by.
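A rough way to see why the merge forces the table visit (a Python sketch of the idea, not Oracle internals; the bitmaps are made up): the range scan produces one bitmap per distinct c3 value, and BITMAP MERGE ORs them into a single bitmap, so the information about which c3 value each row had is gone and must be re-read from the table for the GROUP BY.

```python
# Hypothetical per-value bitmaps produced by a range scan on C3_BMIDX;
# bit i represents row i of the table.
per_value = {
    "05-MAY-2005": 0b10111,
    "06-MAY-2005": 0b10110,
    "07-MAY-2005": 0b11100,
}

# BITMAP MERGE: OR the per-value bitmaps into one combined bitmap.
merged = 0
for bm in per_value.values():
    merged |= bm

# The merged bitmap identifies WHICH rows fall in the range...
print(bin(merged))          # 0b11111
# ...but it no longer says which c3 value each row has, so the
# GROUP BY c3 must visit the table to pick the values back up.
```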


Now, EXPLAIN PLAN cannot do bind variable peeking -- so in order to model this correctly with explain plan, you probably want to put constants in there. Given that it estimates 11 rows, it does the right thing here. If you do this:


ops$tkyte@ORA9IR2> explain plan for &1;
old 1: explain plan for &1
new 1: explain plan for select c3, count(*) from big_table b where 1=1 and c1 = :1 and c2 = :2 and c3 > to_date('04-may-2005') and c3 <= to_date('07-may-2005') GROUP BY c3

Explained.

ops$tkyte@ORA9IR2> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------
| Id  | Operation                         | Name             | Rows  |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                  |   177 |
|   1 |  SORT GROUP BY                    |                  |   177 |
|*  2 |   FILTER                          |                  |       |
|*  3 |    VIEW                           | index$_join$_001 |   279 |
|*  4 |     HASH JOIN                     |                  |   279 |
|*  5 |      HASH JOIN                    |                  |   279 |
|   6 |       BITMAP CONVERSION TO ROWIDS |                  |       |
|*  7 |        BITMAP INDEX SINGLE VALUE  | C1_BMIDX         |       |
|   8 |       BITMAP CONVERSION TO ROWIDS |                  |       |
|*  9 |        BITMAP INDEX SINGLE VALUE  | C2_BMIDX         |       |
|  10 |      BITMAP CONVERSION TO ROWIDS  |                  |       |
|* 11 |       BITMAP INDEX RANGE SCAN     | C3_BMIDX         |       |
----------------------------------------------------------------------

(Bear in mind that it was May 4th the day I ran this -- future readers, adjust the dates accordingly!) It'll drop the table access when too many rows would result.


The filter step lets us prevent the query from executing at all if the binds are NULL :) If you run the query with null binds:

select c3, count(*)
from
big_table b where 1=1 and c1 = :c1 and c2 = :c2 and c3 > to_date(:c3_a) and
c3 <= to_date( :c3_b ) GROUP BY c3


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          0          0           0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 245

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT GROUP BY (cr=0 r=0 w=0 time=22 us)
      0   FILTER (cr=0 r=0 w=0 time=1 us)
      0    TABLE ACCESS BY INDEX ROWID OBJ#(38103)
      0     BITMAP CONVERSION TO ROWIDS
      0      BITMAP AND
      0       BITMAP INDEX SINGLE VALUE OBJ#(38104) (object id 38104)
      0       BITMAP INDEX SINGLE VALUE OBJ#(38105) (object id 38105)
      0      BITMAP MERGE
      0       BITMAP INDEX RANGE SCAN OBJ#(38106) (object id 38106)


No IO.
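The FILTER behaviour can be sketched as a guard evaluated once, before any row source is started (a Python analogy, not Oracle code; `run_query` and `scan` are made-up names, with `scan` standing in for the whole index/table access subtree):

```python
def run_query(c3_lo, c3_hi, scan):
    # The FILTER step: a bind-only predicate checked once, up front.
    # With NULL binds the comparison is unknown (treated as false here),
    # so the child row source is never started -- hence zero I/O.
    if c3_lo is None or c3_hi is None or not c3_hi > c3_lo:
        return []                    # filter fails: no index/table access
    return scan(c3_lo, c3_hi)        # filter passes: run the plan below it

# With NULL binds, the scan callback is never invoked:
print(run_query(None, None, lambda lo, hi: ["row"]))   # []
```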


Comments

Unnecessary table access in query when using multiple (> 2) bitmap indexes

Jay, May 04, 2005 - 4:14 pm UTC

Thanks for the response.

A. Consider the same query but with constants for predicates on c3:

SQL> explain plan set statement_id = 'p1' for
  2   select c3, count(*)
  3    from big_table b
  4   where 1=1
  5   and c1 = 1
  6   and c2 = 10     
  7    and c3 > to_date('04-MAY-2005','DD-MON-YYYY')
  8    and c3 <= to_date('07-MAY-2005','DD-MON-YYYY')
  9   GROUP BY c3;
  
Explained.

SQL> select * from table(dbms_xplan.display('PLAN_TABLE','p1'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                       |  Name             | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |     3 |    42 |    23 |
|   1 |  SORT GROUP BY                  |                   |     3 |    42 |    23 |
|*  2 |   VIEW                          | index$_join$_001  |    61 |   854 |     9 |
|*  3 |    HASH JOIN                    |                   |    61 |   854 |       |
|*  4 |     HASH JOIN                   |                   |    61 |   854 |       |
|   5 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  6 |       BITMAP INDEX SINGLE VALUE | C1_BMIDX          |       |       |       |
|   7 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  8 |       BITMAP INDEX SINGLE VALUE | C2_BMIDX          |       |       |       |
|   9 |     BITMAP CONVERSION TO ROWIDS |                   |       |       |       |
|* 10 |      BITMAP INDEX RANGE SCAN    | C3_BMIDX          |       |       |       |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("B"."C1"=1 AND "B"."C2"=10 AND "B"."C3">TO_DATE('2005-05-04 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND "B"."C3"<=TO_DATE('2005-05-07 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))
   3 - access("indexjoin$_alias$_004".ROWID="indexjoin$_alias$_003".ROWID)
   4 - access("indexjoin$_alias$_003".ROWID="indexjoin$_alias$_002".ROWID)
   6 - access("indexjoin$_alias$_002"."C1"=1)
   8 - access("indexjoin$_alias$_003"."C2"=10)
  10 - access("indexjoin$_alias$_004"."C3">TO_DATE('2005-05-04 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND "indexjoin$_alias$_004"."C3"<=TO_DATE('2005-05-07
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

Note: cpu costing is off

Questions:
---------
1. As you have shown, it uses a hash join here. You explained in your post above that when the expected number of rows increases, it chooses the hash join. But in my case the expected rows decreased from 11 (with bind variables) to 3 (with constants), yet a hash join was chosen.

2. Why is there a filter (see predicate info above) in step 2? All the conditions in this filter appear to be applied already in steps 6, 8 and 10, and there are no bind variables to check for NULL values.

B. Table Access required (or Not) ?

   Look at the following query and pseudo-plan:
   Query:
   -----
   select c3, count(*)
   from gsp_dbo.big_table b
   where 1=1
   and c1 = 1
   and c2 = 10
   and c3 >  to_date('04-MAY-2005','DD-MON-YYYY')
   and c3 <= to_date('07-MAY-2005','DD-MON-YYYY')
   GROUP BY c3;
   
   Pseudo-plan:
   -----------
   
   1. BITMAP INDEX SINGLE VALUE on C1_BMIDX
   2. BITMAP INDEX SINGLE VALUE on C2_BMIDX
   3. BITMAP AND on 1 and 2 above. Lets call it C1_C2_BITMAP
   4. FOR i IN (BITMAP INDEX RANGE SCAN on C3_BMIDX) LOOP  -- fetches one bitmap at a time
        BITMAP AND of i and C1_C2_BITMAP
        BITMAP CONVERSION COUNT of the result of the above step
      END LOOP
   5. Sort on c3 if required
   
   For example if big_table had 5 rows with the following bitmaps in its indexes:
   
   Bitmap Index   Value           Bitmap
   ------------   -----           ---------
   C1_BMIDX         1             1 0 1 0 1
   C2_BMIDX        10             1 1 1 0 1
   C3_BMIDX       5th May 2005    1 0 1 1 1
   C3_BMIDX       6th May 2005    1 0 1 1 0
   C3_BMIDX       7th May 2005    1 1 1 0 0
   
   
                       5th May 2005   6th May 2005   7th May 2005
                       ------------   ------------   ------------
   C1_C2_BITMAP        1 0 1 0 1      1 0 1 0 1      1 0 1 0 1
   C3_BMIDX            1 0 1 1 1      1 0 1 1 0      1 1 1 0 0
   ------------        ------------  -------------   ------------
   BITMAP AND          1 0 1 0 1      1 0 1 0 0      1 0 1 0 0
   ------------        ------------  -------------   ------------
   BITMAP Conv. count      3              2              2
                           
   This pseudo plan only involves bitwise operations - Bitmap And and Bitmap count.
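The bitmap arithmetic above can be checked with a few lines of Python (a sketch of the pseudo-plan, not Oracle internals; bit i stands for row i, and the bitmaps are the made-up ones from the example):

```python
# Bitmaps from the five-row example (leftmost bit above = lowest row).
c1_eq_1  = 0b10101        # C1_BMIDX, value 1
c2_eq_10 = 0b11101        # C2_BMIDX, value 10
c3 = {                    # C3_BMIDX, one bitmap per date
    "05-MAY-2005": 0b10111,
    "06-MAY-2005": 0b10110,
    "07-MAY-2005": 0b11100,
}

# Step 3 of the pseudo-plan: BITMAP AND of the equality predicates.
c1_c2 = c1_eq_1 & c2_eq_10

# Step 4: for each date bitmap, AND with c1_c2 and count the set bits
# (the BITMAP CONVERSION COUNT step) -- no table access involved.
counts = {day: bin(bm & c1_c2).count("1") for day, bm in c3.items()}
print(counts)   # {'05-MAY-2005': 3, '06-MAY-2005': 2, '07-MAY-2005': 2}
```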
   
   Questions:
   
   3. Since it involves only bitwise operations, wouldn't this pseudo-plan be far better than these alternatives (that the CBO seems to prefer):
      a. Bitmap Access -> Bitmap Merge -> Bitmap And -> Bitmap conv. to RowId -> Table access
      b. Bitmap Access -> Bitmap conv. to RowId -> Hash join (on rowid)   
   
   4. If the pseudo-plan does look promising, does the CBO lack the capability to recognize it?

   5. Can we make the CBO execute the pseudo-plan listed above, so we can see how it fares compared to the other alternatives (in terms of IO and execution time)?

   6. Even if the query is changed to a single-value condition on c3 with all constants, the CBO still prefers the hash join plan over the plan with only bitwise operations (BITMAP AND and BITMAP CONVERSION COUNT) that should be possible.
In fact, in this plan (see below) the expected row count is 1 -- all the more surprising that it prefers a hash join plan.
      
SQL> explain plan set statement_id = 'p1' for
  2   select c3, count(*)
  3    from big_table b
  4   where 1=1
  5   and c1 = 1
  6   and c2 = 10     
  7    and c3 = to_date('07-MAY-2005','DD-MON-YYYY')
  8   GROUP BY c3;

Explained.

SQL> select * from table(dbms_xplan.display('PLAN_TABLE','p1'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                       |  Name             | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |     1 |    14 |     6 |
|   1 |  SORT GROUP BY NOSORT           |                   |     1 |    14 |     6 |
|*  2 |   VIEW                          | index$_join$_001  |    20 |   280 |     6 |
|*  3 |    HASH JOIN                    |                   |    20 |   280 |       |
|*  4 |     HASH JOIN                   |                   |    20 |   280 |       |
|   5 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  6 |       BITMAP INDEX SINGLE VALUE | C3_BMIDX          |       |       |       |
|   7 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  8 |       BITMAP INDEX SINGLE VALUE | C1_BMIDX          |       |       |       |
|   9 |     BITMAP CONVERSION TO ROWIDS |                   |       |       |       |
|* 10 |      BITMAP INDEX SINGLE VALUE  | C2_BMIDX          |       |       |       |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("B"."C1"=1 AND "B"."C2"=10 AND "B"."C3"=TO_DATE('2005-05-07 00:00:00',
              'yyyy-mm-dd hh24:mi:ss'))
   3 - access("indexjoin$_alias$_004".ROWID="indexjoin$_alias$_003".ROWID)
   4 - access("indexjoin$_alias$_003".ROWID="indexjoin$_alias$_002".ROWID)
   6 - access("indexjoin$_alias$_002"."C3"=TO_DATE('2005-05-07 00:00:00',
              'yyyy-mm-dd hh24:mi:ss'))
   8 - access("indexjoin$_alias$_003"."C1"=1)
  10 - access("indexjoin$_alias$_004"."C2"=10)

Note: cpu costing is off      
      
  Your thoughts on this are very much appreciated.
 

Tom Kyte
May 04, 2005 - 6:07 pm UTC

1) 3 was the final result, 61 was the intermediate rowcount. 61 was the relevant value, before the group by.

2) as it combines the rows together -- it has to make sure all three conditions are met. I agree that in this case, it appears redundant.

3) benchmark it? for such small rowcounts -- it isn't going to matter one way or the other. Does this really model what you are going to be doing in the end? With things like "11 rows" -- well, it is small.



Unnecessary table access in query when using multiple (> 2) bitmap indexes

Jay, May 04, 2005 - 6:24 pm UTC

1. How can I benchmark it without knowing how to get the CBO to execute the pseudo-plan? Question 5 in my earlier post (which you skipped) asked how to get the CBO to execute the pseudo-plan.

2. This is a test case that replicates a query in our production environment. The plan in this test case (Case 1 of my first post) mirrors the plan in production for a similar query. The table in production has about 300 million rows. The plan remains the same irrespective of whether the 300 million row table is partitioned (on c3) or not.

3. In general shouldn't bitwise operations be a lot faster than hash joins or table reads ?

Tom Kyte
May 04, 2005 - 6:58 pm UTC

do your plans have 11 rows? are you doing things on that scale?

Possible strategy

Jonathan Lewis, May 05, 2005 - 10:09 am UTC

Your iteration through dates can't work because Oracle cannot detect that your data is 'date-only', so you will need to change the predicate to an IN-LIST, then add the USE_CONCAT hint to get a UNION-ALL effect of the single-date plans.

So the query became:

select /*+ use_concat */
c3, count(*)
from
big_table b
where
1=1
and c1 = 1
and c2 = 10
and c3 in (
to_date('06-May-2005','DD-MON-YYYY'),
to_date('07-May-2005','DD-MON-YYYY'),
to_date('08-May-2005','DD-MON-YYYY')
)
GROUP BY c3
;


On 9.2.0.6, I got the following plan:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------
| Id  | Operation                       | Name     | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |     1 |    14 |    29 |
|   1 |  SORT GROUP BY                  |          |     1 |    14 |       |
|   2 |   CONCATENATION                 |          |       |       |       |
|   3 |    BITMAP CONVERSION TO ROWIDS  |          |       |       |       |
|   4 |     BITMAP AND                  |          |       |       |       |
|*  5 |      BITMAP INDEX SINGLE VALUE  | C3_BMIDX |       |       |       |
|*  6 |      BITMAP INDEX SINGLE VALUE  | C1_BMIDX |       |       |       |
|*  7 |      BITMAP INDEX SINGLE VALUE  | C2_BMIDX |       |       |       |
|   8 |    BITMAP CONVERSION TO ROWIDS  |          |       |       |       |
|   9 |     BITMAP AND                  |          |       |       |       |
|* 10 |      BITMAP INDEX SINGLE VALUE  | C3_BMIDX |       |       |       |
|* 11 |      BITMAP INDEX SINGLE VALUE  | C1_BMIDX |       |       |       |
|* 12 |      BITMAP INDEX SINGLE VALUE  | C2_BMIDX |       |       |       |
|  13 |    BITMAP CONVERSION TO ROWIDS  |          |       |       |       |
|  14 |     BITMAP AND                  |          |       |       |       |
|* 15 |      BITMAP INDEX SINGLE VALUE  | C3_BMIDX |       |       |       |
|* 16 |      BITMAP INDEX SINGLE VALUE  | C1_BMIDX |       |       |       |
|* 17 |      BITMAP INDEX SINGLE VALUE  | C2_BMIDX |       |       |       |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("B"."C3"=TO_DATE('2005-05-08 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))
6 - access("B"."C1"=1)
7 - access("B"."C2"=10)
10 - access("B"."C3"=TO_DATE('2005-05-07 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))
11 - access("B"."C1"=1)
12 - access("B"."C2"=10)
15 - access("B"."C3"=TO_DATE('2005-05-06 00:00:00', 'yyyy-mm-dd
hh24:mi:ss'))
16 - access("B"."C1"=1)
17 - access("B"."C2"=10)



I have not yet checked why the bitmap to btree conversion happens (I had to use a slightly smaller range of dates to get this plan on my system, so we probably have a slight variation in optimizer-related parameters somewhere - or there is a slight change in the code in my version, which was 9.2.0.6). It is possible that it is an error in the cost model - there are some very odd buggy bits in the costing of bitmap indexes.

Curious on index statistics collected above

Scot, May 05, 2005 - 4:02 pm UTC

Statistics on the bitmaps in the original example were created at the time of bitmap index creation, and the dbms_stats call used a cascade=>false.

I thought that I've read in the past to shy away from this sort of statistics collection, in favor of dbms_stats, but I don't know where I read that, and I might be remembering a myth.

Would changing the index statistic collection method in this case (and/or in general) make any difference?


Tom Kyte
May 05, 2005 - 6:02 pm UTC

in general, there are issues with the create/compute. I do not use it. I did here just to emulate your environment.

I doubt it had an effect in this case, but -- try it out :)



relevant?

Gabe, May 05, 2005 - 5:27 pm UTC

This thread brought memories of a discussion we had regarding the implementation of a DAY dimension …

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:4632159445946#21994567492814 <code>

… where I said:

<quote>My worry (justified or not … maybe you can comment on it) is
that, by choosing to use the FK DATE from the fact table, the action of the end-users may preclude the usage of some optimization techniques (the usage of the index, the applicability of the star transformation with bitmap indexes, etc.).</quote>

Essentially, I was (still am) apprehensive of DATE manipulations in the fact table especially when it comes to end-users or BI tools.

I left that discussion with … <quote>I’ll try to see if I can materialize some technical (contra)arguments for <Tom>I don't see how querying directly against the attributes of the fact table would preclude anything</Tom></quote>

Having read Jonathan’s answer to this thread … <quote>Your iteration through dates can't work because Oracle cannot detect that your data is 'date-only' … </quote> (that is, ‘DAY-only’) I had my own moment of hand-smacking-forehead.

If this BIG_TABLE were a fact table with C3 the migrated FK from some DAY dimension table then the optimizer would have better insight into that C3-related predicate … hence it should/may use the nice BITMAP AND on all 3 bitmap indexes (without having to be nudged along with the hint and the [expert] query re-write).

Just expanding on the table/data presented here …

flip@FLOP> create table day_dim
2 ( id number(9) not null
3 ,dy date not null
4 ,constraint daypk primary key (id)
5 ) organization index
6 ;

Table created.

Elapsed: 00:00:00.00
flip@FLOP> insert into day_dim
2 select rownum as id, c3 as dy
3 from (select distinct c3 from big_table)
4 ;

272 rows created.

Elapsed: 00:00:00.00
flip@FLOP> exec dbms_stats.gather_table_stats(ownname=> USER,tabname=>'DAY_DIM',method_opt=>'FOR ALL INDEXES FOR ALL COLUMNS SIZE 225' ,degree=>1,cascade=>FALSE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.02
flip@FLOP> create table big_fact
2 (
3 c1 number not null,
4 c2 number not null,
5 c3 number not null references day_dim(id),
6 c4 number
7 );

Table created.

Elapsed: 00:00:00.00
flip@FLOP>
flip@FLOP> insert into big_fact
2 select t.c1, t.c2, d.id, t.c4
3 from big_table t, day_dim d
4 where t.c3 = d.dy
5 ;

135745 rows created.

Elapsed: 00:00:01.04
flip@FLOP> create bitmap index c1_bfidx on big_fact (c1) compute statistics;

Index created.

Elapsed: 00:00:00.01
flip@FLOP> create bitmap index c2_bfidx on big_fact (c2) compute statistics;

Index created.

Elapsed: 00:00:00.01
flip@FLOP> create bitmap index c3_bfidx on big_fact (c3) compute statistics;

Index created.

Elapsed: 00:00:00.01
flip@FLOP>
flip@FLOP> exec dbms_stats.gather_table_stats(ownname=> USER,tabname=>'BIG_FACT',method_opt=>'FOR ALL INDEXES FOR ALL COLUMNS SIZE 225' ,degree=>1,cascade=>FALSE);

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.04

flip@FLOP> explain plan set statement_id = 'p3' for
2 select d.dy, count(*)
3 from big_fact b, day_dim d
4 where 1=1
5 and b.c3 = d.id
6 and b.c1 = 1
7 and b.c2 = 10
8 and d.dy > to_date('04-MAY-2005','DD-MON-YYYY')
9 and d.dy <= to_date('07-MAY-2005','DD-MON-YYYY')
10 group by d.dy;

Explained.

Elapsed: 00:00:00.00
flip@FLOP> select * from table(dbms_xplan.display('PLAN_TABLE','p3'))
2 ;

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     2 |    42 |    15  (20)|
|   1 |  SORT GROUP BY                 |          |     2 |    42 |    15  (20)|
|   2 |   NESTED LOOPS                 |          |    36 |   756 |    13  (24)|
|*  3 |    INDEX FULL SCAN             | DAYPK    |     2 |    22 |     1   (0)|
|   4 |    BITMAP CONVERSION TO ROWIDS |          |       |       |            |
|   5 |     BITMAP AND                 |          |       |       |            |
|*  6 |      BITMAP INDEX SINGLE VALUE | C3_BFIDX |       |       |            |
|*  7 |      BITMAP INDEX SINGLE VALUE | C1_BFIDX |       |       |            |
|*  8 |      BITMAP INDEX SINGLE VALUE | C2_BFIDX |       |       |            |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("D"."DY">TO_DATE('2005-05-04 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND "D"."DY"<=TO_DATE('2005-05-07 00:00:00', 'yyyy-mm-dd hh24:mi:
ss'))
6 - access("B"."C3"="D"."ID")
7 - access("B"."C1"=1)
8 - access("B"."C2"=10)

24 rows selected.

Elapsed: 00:00:00.00

And just to contrast, here is the plan I get for the original query against BIG_TABLE:

flip@FLOP> explain plan set statement_id = 'p1' for
2 select c3, count(*)
3 from big_table b
4 where 1=1
5 and c1 = 1
6 and c2 = 10
7 and c3 > to_date('04-MAY-2005','DD-MON-YYYY')
8 and c3 <= to_date('07-MAY-2005','DD-MON-YYYY')
9 GROUP BY c3;

Explained.

Elapsed: 00:00:00.00
flip@FLOP> select * from table(dbms_xplan.display('PLAN_TABLE','p1'))
2 ;

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     1 |    14 |    10  (30)|
|   1 |  SORT GROUP BY                 |           |     1 |    14 |    10  (30)|
|   2 |   TABLE ACCESS BY INDEX ROWID  | BIG_TABLE |    24 |   336 |     8  (38)|
|   3 |    BITMAP CONVERSION TO ROWIDS |           |       |       |            |
|   4 |     BITMAP AND                 |           |       |       |            |
|*  5 |      BITMAP INDEX SINGLE VALUE | C1_BMIDX  |       |       |            |
|*  6 |      BITMAP INDEX SINGLE VALUE | C2_BMIDX  |       |       |            |
|   7 |     BITMAP MERGE               |           |       |       |            |
|*  8 |      BITMAP INDEX RANGE SCAN   | C3_BMIDX  |       |       |            |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("B"."C1"=1)
6 - access("B"."C2"=10)
8 - access("B"."C3">TO_DATE('2005-05-04 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND "B"."C3"<=TO_DATE('2005-05-07 00:00:00', 'yyyy-mm-dd hh24:mi:
ss'))

23 rows selected.

Elapsed: 00:00:00.00

Now, I’m not suggesting one plan is better than the other … only that by always having a TIME (DAY in this case) dimension table seems to give the CBO better insight. Of course, the PK on DAY_DIM could’ve been the DATE column … probably the same effect … the notable exception would be that the DATE PK would get migrated into the fact and one would again be tempted to use the DATE from there (“what would be the point of joining to the time dimension?” … would likely be the explanation).

I’m curious if my hand-smacking-forehead has only produced a … sore forehead?

Cheers.

Tom Kyte
May 05, 2005 - 6:18 pm UTC

that is interesting, really. that is a sort of use concat with a nested loops join (rambling to myself)

I'll point Jonathan here as well. but that does make sense. I'll have to think about that -- to convince myself that 'in general' -- interesting as always

Dimension Table

Jonathan Lewis, May 06, 2005 - 3:43 am UTC

I think the answer to your primary question - does having the dimension table assist the optimizer - can be answered by my standard comment: the better you describe your system to the optimizer, the better choice of path it makes.

I could PROBABLY take your time dimension table and really foul up the optimisation by inserting thousands of values that do not (yet) exist in the fact table. On the other hand, if your time dimension has a row IF AND ONLY IF there is corresponding data in the fact table you have given the optimizer a very accurate idea of how many distinct values exist for the time column in the big fact table without doing a compute on the big fact table.

You also have the benefit that the optimizer can work out whether it is cheaper to use a mechanism that visits the dimension table to pick up values for your specific query, or whether it should use a strategy that is cheaper on index accesses but more expensive on table accesses because it has to visit the fact table.

This is one reason why carrying normalization into the table level is such a good thing - it allows the optimizer maximum flexibility. (Alas, this also allows maximum room for error, and increased time required to find the optimum plan).



Thank you.

Gabe, May 06, 2005 - 9:44 am UTC

Appreciate the follow-up.

Unnecessary table access in query when using multiple (> 2) bitmap indexes

Jay, May 06, 2005 - 7:02 pm UTC

Thanks, Jonathan, for your example of how to force the optimizer to follow the pseudo-plan.
It is a puzzle why the optimizer insists on doing the bitmap conversion to rowids (in the plan for Jonathan's query) when it could just as well count the bits in the bitmap after the BITMAP AND step.

I ran Jonathan's query on my system (9.2.0.4 EE) and got the exact same plan (except that the cost was 37).

On Jonathan's comment that "Oracle cannot detect that your data is 'date-only'": I did not understand how knowing that column c3 has no time component would help the optimizer.
How different would it be if c3 were a NUMBER column and the conditions on c3 were c3 > 4 and c3 <= 7?
I thought the optimizer would figure out the number of distinct values in the date range on the c3 condition from the table statistics. This would give an idea of how many bitmaps it would scan on index c3_bmidx.
Of course this would only be possible with constants on c3 (or, failing that, with bind variable peeking).
But even when not using bind variables, the optimizer prefers a hash join (as shown in my earlier post of May 4th) for a 3-day range on c3.

create table big_table2 (
  c1  number not null,
  c2  number not null,
  c3 number not null,
  c4 number
);

insert into big_table2 
select 1, 10, trunc(rownum/100), 1
from all_objects
union all
select 2, 20, trunc(rownum/100), 2
from all_objects
union all
select 3, 30, trunc(rownum/100), 3
from all_objects
union all
select 4, 40, trunc(rownum/100), 4
from all_objects
union all
select 5, 50, trunc(rownum/100), 5
from all_objects
;
commit;

select count(*) from big_table2;
  COUNT(*)
----------
    113740

create bitmap index c1_bmidx2 on big_table2 (c1);
create bitmap index c2_bmidx2 on big_table2 (c2);
create bitmap index c3_bmidx2 on big_table2 (c3);

/* Now collecting stats on indexes using dbms_stats (rather than during creation. Also histogram size increased to max. allowable 254 */

exec dbms_stats.gather_table_stats(ownname=>'<Schema Name>',tabname=>'BIG_TABLE2',method_opt=>'FOR ALL COLUMNS SIZE 254',degree=>1,cascade=>TRUE);

SQL> explain plan set statement_id = 'p1' for
  2    select c3, count(*)
  3      from big_table2 b
  4      where 1=1
  5     and c1 = 1
  6     and c2 = 10     
  7    and c3 > 4
  8     and c3 <= 7
  9      GROUP BY c3;

Explained.

SQL> select * from table(dbms_xplan.display('PLAN_TABLE','p1'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                       |  Name             | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |     3 |    30 |    23 |
|   1 |  SORT GROUP BY                  |                   |     3 |    30 |    23 |
|*  2 |   VIEW                          | index$_join$_001  |    54 |   540 |     9 |
|*  3 |    HASH JOIN                    |                   |    54 |   540 |       |
|*  4 |     HASH JOIN                   |                   |    54 |   540 |       |
|   5 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  6 |       BITMAP INDEX SINGLE VALUE | C1_BMIDX2         |       |       |       |
|   7 |      BITMAP CONVERSION TO ROWIDS|                   |       |       |       |
|*  8 |       BITMAP INDEX SINGLE VALUE | C2_BMIDX2         |       |       |       |
|   9 |     BITMAP CONVERSION TO ROWIDS |                   |       |       |       |
|* 10 |      BITMAP INDEX RANGE SCAN    | C3_BMIDX2         |       |       |       |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("B"."C1"=1 AND "B"."C2"=10 AND "B"."C3">4 AND "B"."C3"<=7)
   3 - access("indexjoin$_alias$_004".ROWID="indexjoin$_alias$_003".ROWID)
   4 - access("indexjoin$_alias$_003".ROWID="indexjoin$_alias$_002".ROWID)
   6 - access("indexjoin$_alias$_002"."C1"=1)
   8 - access("indexjoin$_alias$_003"."C2"=10)
  10 - access("indexjoin$_alias$_004"."C3">4 AND "indexjoin$_alias$_004"."C3"<=7)
  
This is the same plan that I got when I used constants for the predicates on c3 with c3 as a date column.
So I am not sure that c3 being a date column makes any difference.

On Tom's question - "do your plans have 11 rows?  are you doing things on that scale? "
The plan in production with no bind variables shows rows = 1. This is probably because we have not collected column histograms.
The actual data shows about 4000 rows for each date (i.e. c3 value). I would like to avoid the table access for those 4000 rows per date.
 

A problem with Dimension-Fact design?

Charu Joshi, May 09, 2005 - 12:24 pm UTC

I would like to mention one problem that I am encountering with the Dimension-Fact kind of 'star schema' design.

In such a design, filter conditions are almost always on the dimension table and get applied to the fact table through the join with the dimension table. My problem is that because the condition is on the dimension table, the CBO is almost NEVER able to accurately estimate the cardinality of the resulting join. The CBO will only use the 'density' statistic of the join column, due to which the cardinality estimate (and consequently the cost estimate, and consequently the execution plan) goes wrong! This makes queries on a Dimension-Fact design absolutely nightmarish to tune.


Reusing Gabe's example:

The estimates for the two columns are:


TABLE_NAME COLUMN_NAM NUM_BUCKETS NUM_DISTINCT    DENSITY
---------- ---------- ----------- ------------ ----------
BIG_FACT   C3                  70           71 .000014235
DAY_DIM    ID                  70           71 .007042254


This shows that both columns are analyzed to the best possible extent (frequency-based histograms).

Now running the query directly on the Fact table:

cj@CJ>select * from big_fact
2 where c3 > 1 and c3 <= 4;

1500 rows selected.

Elapsed: 00:00:04.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1500 Bytes=18000)
1 0 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=5 Card=1500 Bytes=18000)

The estimated cardinality is 1500, which is exactly equal to the actual cardinality.

Now trying a similar query, but through the dimension table:


cj@CJ>select * from day_dim d, big_fact b
2 where d.id = b.c3 and d.dy > to_date('09-MAY-2005','DD-MON-YYYY')
3 and d.dy <= to_date('12-MAY-2005','DD-MON-YYYY');

1500 rows selected.

Elapsed: 00:00:04.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=742 Bytes=17066)
1 0 NESTED LOOPS (Cost=5 Card=742 Bytes=17066)
2 1 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=5 Card=35125 Bytes=421500)
3 1 INDEX (UNIQUE SCAN) OF 'DAYPK' (UNIQUE)


Now the estimated cardinality greatly differs from the actual cardinality, obviously because the cardinality is now:

35125*3*.007042254 ~= 742

It seems to me that the only way to solve this is to denormalize: add the date column directly to the fact table and apply the predicate on the fact table.
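As a rough sketch of that denormalization (the column name c3_dy is hypothetical, using the tables from the example above):

```sql
-- Hypothetical sketch: carry the dimension's date attribute directly on the
-- fact table, so the range predicate (and hence the cardinality estimate)
-- applies to BIG_FACT itself.
alter table big_fact add (c3_dy date);

update big_fact b
   set b.c3_dy = (select d.dy from day_dim d where d.id = b.c3);

-- Queries can then filter the fact table directly:
select count(*)
  from big_fact
 where c3_dy > to_date('09-MAY-2005','DD-MON-YYYY')
   and c3_dy <= to_date('12-MAY-2005','DD-MON-YYYY');
```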

Hope I have been able to convey the 'problem'. I am desperately trying to find a better solution. Your thoughts on this, please.

Thanks & regards,
Charu




Tom Kyte
May 09, 2005 - 1:39 pm UTC

and if you added /*+ DYNAMIC_SAMPLING(d 4) */ to the second query.

odd numbers ...

Gabe, May 09, 2005 - 5:42 pm UTC

Charu:

<quote>
Reusing Gabe's example:
</quote>

Something is probably quite different in your environment … [clearly you have 70 buckets … I had 225 as Jay had originally used].

Here are the plans I get for your queries (just used different ids for obvious reasons):

flip@FLOP> select * from big_fact
2 where c3 > 5 and c3 <= 8;

1500 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=1207 Bytes=1
4484)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_FACT' (Cost=23 Card=
1207 Bytes=14484)

2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'C3_BFIDX'




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
124 consistent gets
0 physical reads
0 redo size
24683 bytes sent via SQL*Net to client
1588 bytes received via SQL*Net from client
101 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1500 rows processed

flip@FLOP> select * from day_dim d, big_fact b
2 where d.id = b.c3 and d.dy > to_date('09-MAY-2005','DD-MON-YYYY')
3 and d.dy <= to_date('12-MAY-2005','DD-MON-YYYY');

1500 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=302 Bytes=69
46)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_FACT' (Cost=20 Card=
499 Bytes=5988)

2 1 NESTED LOOPS (Cost=20 Card=302 Bytes=6946)
3 2 INDEX (FULL SCAN) OF 'DAYPK' (UNIQUE) (Cost=1 Card=1 B
ytes=11)

4 2 BITMAP CONVERSION (TO ROWIDS)
5 4 BITMAP INDEX (SINGLE VALUE) OF 'C3_BFIDX'




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
132 consistent gets
0 physical reads
0 redo size
29307 bytes sent via SQL*Net to client
1588 bytes received via SQL*Net from client
101 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1500 rows processed

And for the original BIG_TABLE:

flip@FLOP> select * from big_table b
2 where b.c3 > to_date('09-MAY-2005','DD-MON-YYYY')
3 and b.c3 <= to_date('12-MAY-2005','DD-MON-YYYY');

1500 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=1207 Bytes=2
0519)

1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BIG_TABLE' (Cost=26 Card
=1207 Bytes=20519)

2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'C3_BMIDX'




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
123 consistent gets
0 physical reads
0 redo size
20207 bytes sent via SQL*Net to client
1588 bytes received via SQL*Net from client
101 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1500 rows processed



Problem with Dimension-Fact design?

Charu Joshi, May 10, 2005 - 12:56 am UTC


<TK>
and if you added /*+ DYNAMIC_SAMPLING(d 4) */ to the second query.
</TK>

A. Doesn't do the job. See below.

cj@CJ> select /*+ DYNAMIC_SAMPLING(d 4) */ * from day_dim d, big_fact b
2 where d.id = b.c3 and d.dy > to_date('09-MAY-2005','DD-MON-YYYY')
3 and d.dy <= to_date('12-MAY-2005','DD-MON-YYYY');

1500 rows selected.

Elapsed: 00:00:04.05

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=742 Bytes=17066)
1 0 NESTED LOOPS (Cost=5 Card=742 Bytes=17066)
2 1 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=5 Card=35125 Bytes=421500)
3 1 INDEX (UNIQUE SCAN) OF 'DAYPK' (UNIQUE)

I checked the 10046 level 8 trace and found the following statement executed just before the actual query:

SELECT /*+ ALL_ROWS IGNORE_WHERE_CLAUSE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
FROM
(SELECT /*+ IGNORE_WHERE_CLAUSE NOPARALLEL("D") */ 1 AS C1, CASE WHEN
"D"."DY">TO_DATE('2005-05-09 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
"D"."DY"<=TO_DATE('2005-05-12 00:00:00', 'yyyy-mm-dd hh24:mi:ss') THEN 1
ELSE 0 END AS C2 FROM "DAY_DIM" "D") SAMPLESUB

So the hint IS getting picked up.


B. Even if this were to work, it won't be useful with our front-end tool, which is Siebel Analytics. SA generates queries at runtime and it is impossible to set hints for individual queries.

C. I wanted to highlight the generic issue that underlies the problem -- For queries based on dimension-fact design where the filtering conditions are on dimension tables, the CBO will almost NEVER predict the correct cardinality (due to the way it is built) and consequently choose sub-optimal execution plans. I wanted your comment on this. I will try to contrive a more demonstrative example later in the day.


To Gabe,

Yes. I think the reason for the difference lies in the number of records returned by the 'all_objects' view.

cj@CJ>select count(*) from big_table;

COUNT(*)
----------
35125

1 row selected.

Elapsed: 00:00:00.05
cj@CJ>select count(*) from all_objects;

COUNT(*)
----------
7032

Thanks & regards,
Charu.

Tom Kyte
May 10, 2005 - 7:58 am UTC

what version? for me, it got it almost precisely.

Problem with Dimension-Fact design?

A reader, May 10, 2005 - 8:47 am UTC

Sorry, I should have mentioned it at the beginning.

cj@CJ>select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production


1 select /*+ DYNAMIC_SAMPLING(d 4) */ * from day_dim d, big_fact b
2 where d.id = b.c3 and d.dy > to_date('09-MAY-2005','DD-MON-YYYY')
3* and d.dy <= to_date('12-MAY-2005','DD-MON-YYYY')
siebel@VLCRMDU>/

1500 rows selected.

Elapsed: 00:00:06.09

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=742 Bytes=17066)
1 0 NESTED LOOPS (Cost=5 Card=742 Bytes=17066)
2 1 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=5 Card=35125 Bytes=421500)
3 1 INDEX (UNIQUE SCAN) OF 'DAYPK' (UNIQUE)


I couldn't understand how the hint would help the cardinality estimate, since even without the hint Oracle accurately calculates the cardinality of 'd'.

Join Selectivity = Max(Density(d.id), Density(b.c3))
* (Card(d) - Num_Nulls(d))/Card(d)
* (Card(b) - Num_Nulls(b))/Card(b)

(Thanks to Mr. Breitling)

= Max(.007042254, .000014235)
* (71 - 0)/71 * (35125 - 0)/35125

= .007042254 * 1 * 1 = .007042254

Join Cardinality = JS * Card(d) * Card(b)
= .007042254 * 3 * 35125 = 742.077515
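As a quick sanity check of that arithmetic in a SQL*Plus session (just reproducing the figures above, not an optimizer internal):

```sql
-- Join Selectivity = max of the two densities; Join Cardinality =
-- JS * card(d after filter) * card(b), using the statistics shown above.
select greatest(.007042254, .000014235) * 3 * 35125 as join_card
from dual;
-- JOIN_CARD comes out to roughly 742, matching Card=742 in the plan
```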

Where & how would the hint influence these calculations?

By any chance did you analyze the column BIG_FACT.C3 with SIZE 1? In this particular case, dropping the histogram on C3 brought the estimated cardinality close to accurate.

Could you please comment on the point C. in my earlier note, which is my main concern, and is causing me some grief. Any new light on the topic would be much appreciated.

Thanks & regards,
Charu.



Tom Kyte
May 10, 2005 - 9:20 am UTC

c) dynamic sampling and sql profiles.


I ran the above stuff verbatim, I changed nothing.

why star-design problem?

Gabe, May 10, 2005 - 9:57 am UTC

Charu:

Sorry to keep butting in … regarding your point C:

<quote>For queries based on dimension-fact design where the filtering conditions are on dimension tables, the CBO will almost NEVER predict the correct cardinality (due to the way it is built) … [to be continued]</quote>

I’m curious why you see this as a star-design _issue_ … after all, what we have here is just a master-detail relationship (true, the FK has a bitmap index rather than the more common b-tree index one would have in an OLTP system … but still).

Why is the estimated cardinality _the_ goal? An indicator? perhaps.

For instance, if I change optimizer_index_cost_adj and optimizer_index_caching back to their defaults, I do get your plans … but note the estimates for the two queries are still the same (1207 and 302) which tells me the estimated cardinality is not all one should look for and hence [to continue your quote above]<quote> … and consequently choose sub-optimal execution plans.</quote> is not necessarily correct (assuming one _knows_ which is the optimal plan).

flip@FLOP> alter session set optimizer_index_cost_adj=100;

Session altered.

Elapsed: 00:00:00.00
flip@FLOP> alter session set optimizer_index_caching=0;

Session altered.

Elapsed: 00:00:00.00
flip@FLOP> select * from big_fact
2 where c3 > 5 and c3 <= 8;

1500 rows selected.

Elapsed: 00:00:00.09

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=1207 Bytes=1
4484)

1 0 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=37 Card=1207 Bytes
=14484)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
474 consistent gets
362 physical reads
0 redo size
24593 bytes sent via SQL*Net to client
1588 bytes received via SQL*Net from client
101 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1500 rows processed

flip@FLOP>
flip@FLOP>
flip@FLOP> select * from day_dim d, big_fact b
2 where d.id = b.c3 and d.dy > to_date('09-MAY-2005','DD-MON-YYYY')
3 and d.dy <= to_date('12-MAY-2005','DD-MON-YYYY');

1500 rows selected.

Elapsed: 00:00:01.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=37 Card=302 Bytes=69
46)

1 0 NESTED LOOPS (Cost=37 Card=302 Bytes=6946)
2 1 TABLE ACCESS (FULL) OF 'BIG_FACT' (Cost=37 Card=135745 B
ytes=1628940)

3 1 INDEX (UNIQUE SCAN) OF 'DAYPK' (UNIQUE)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
576 consistent gets
354 physical reads
0 redo size
29313 bytes sent via SQL*Net to client
1588 bytes received via SQL*Net from client
101 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1500 rows processed

Cheers.


Estimated cardinality

Jay, May 11, 2005 - 9:02 am UTC

On Gabe's statement above 'Why is the estimated cardinality _the_ goal?'

Not speaking for Charu here, but the estimated cardinality is not the goal. The goal is to get the optimizer to choose an optimal plan (also assuming that we humans wouldn't recognize an optimal plan if we saw one, let alone suggest one). To do that, the optimizer needs data distribution information that is as accurate as possible. One indicator that it has such information is the estimated cardinality it comes up with. The estimated cardinality (or the actual cardinality) does not depend on the plan chosen, but the plan chosen is influenced by the estimated cardinality. Hence the fixation on cardinality (estimated vs. actual).

I think it is important to see that the optimizer comes up with the correct cardinality at every row source operation in the plan (and not just the final cardinality of the query itself).

Tom Kyte
May 11, 2005 - 12:04 pm UTC

bad cardinalities are the things I generally look for first, yes.

Dimension-Fact problem and CBO Cardinality estimate

Charu Joshi, May 17, 2005 - 10:26 am UTC

Hi Tom & Gabe,

Sorry for not following up sooner.

Okay, a word about why I am repeatedly harping on this issue and why I have suffered because of this inherent "feature/drawback" of the CBO while calculating join cardinality:

Ours is a D/W application where the fact tables contain 25 million to 75 million records. We do not use bitmap indexes and star transformation yet (because of the strong reservations of a senior DBA), but now we are seriously looking into that option. Yet, even with B*Tree indexes, we have found that the queries perform quite satisfactorily if the execution plan is correct, i.e. if the CBO correctly devises the join order and the predicate order. But due to the above-mentioned inherent "drawback", the CBO often chooses the wrong predicate/join order. I have had to manipulate statistics to force the right plan, and so far I have been successful. But I am looking for a 'cleaner' alternative, since I suspect I am close to hitting a dead-end with this methodology. Hope this makes sense. Now to the illustration:

Given below is a tiny example to illustrate the cardinality estimation problem that leads to sub-optimal plans. Would really appreciate if you could suggest a way out of this. The constraints are:

-- No hints.
-- No stored outlines.
-- No touching the queries.

These are what we face in our environment which uses Siebel Analytics front-end.

Version: 9.2.0.6 EE
OS: Solaris 8

-- Dimension table.
create table dim tablespace data_sml
as select rownum user_id, username from dba_users;

create index dim_idx1 on dim(user_id) tablespace indx_sml;

create index dim_idx2 on dim(username) tablespace indx_sml;

begin
dbms_stats.gather_table_stats(ownname=>user, tabname=>'DIM',
method_opt=>'FOR COLUMNS USER_ID SIZE 100, USERNAME SIZE 100', cascade=>TRUE);
end;
/

-- Fact table.
create table fact2 tablespace data_lge
as select * from dba_objects where 1=2;

alter table fact2 add (user_id number not null);

/* NOTE
I have looped 200 times to create ~2M records. You can
change the counter based on records in DBA_OBJECTS */

begin
for i in 1..200
loop
insert into fact2
select a.*, 0 from dba_objects a;
end loop;
commit;
end;
/

update fact2 set user_id = rownum;

update fact2
set user_id = (case when user_id < 1001 then 2
when user_id between 1001 and 10000 then 3
when user_id between 10001 and 50000 then 4
when user_id between 50001 and 200000 then 5
when user_id between 200001 and 1000000 then 6
when user_id > 1000000 then 1 end);

create index fact2_idx1 on fact2(user_id) tablespace data_lge
nologging parallel 4 compute statistics;

alter index fact2_idx1 logging noparallel;

begin
dbms_stats.gather_table_stats
(ownname=>user, tabname=>'FACT2',
method_opt=>'FOR COLUMNS user_id SIZE 100, object_id SIZE 100', cascade=>TRUE);
end;
/


So the data we have is:

cj@CJ>select user_id, username from dim;

USER_ID USERNAME
---------- ----------
1 SYS
2 SYSTEM
3 OUTLN
4 ORCLADMIN
5 SIEBEL
6 INFAREP
7 JOSHIC
8 DBAMENU
9 INFABKP
10 PERF
11 SADMIN
12 PROXYE
13 DBSNMP
14 WMSYS

14 rows selected.


1 select dim.username, count(*) from dim, fact2
2 where dim.user_id=fact2.user_id
3* group by dim.username
cj@CJ>/

USERNAME COUNT(*)
---------- ----------
INFAREP 800000
ORCLADMIN 40000
OUTLN 9000
SIEBEL 150000
SYS 1063000
SYSTEM 1000


The following query touches more than half of the FACT2 table, so it should ideally use a full table scan.

The last clause is just to make Oracle hit the FACT2 table and not just the index.

1 select count(*) from dim, fact2
2 where dim.user_id = fact2.user_id
3 and dim.username in ('SYS', 'SIEBEL', 'INFAREP')
4* and fact2.object_id >= -1
cj@CJ>/

1 row selected.

Elapsed: 00:00:30.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2506 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'FACT2' (Cost=2504 Card=102220 Bytes=817760)
3 2 NESTED LOOPS (Cost=2506 Card=153330 Bytes=2606610)
4 3 INLIST ITERATOR
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'DIM' (Cost=3 Card=1 Bytes=9)
6 5 INDEX (RANGE SCAN) OF 'DIM_IDX2' (NON-UNIQUE) (Cost=2 Card=1)
7 3 INDEX (RANGE SCAN) OF 'FACT2_IDX1' (NON-UNIQUE) (Cost=334 Card=343833)


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
14583 consistent gets
13378 physical reads
0 redo size
218 bytes sent via SQL*Net to client
272 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


As you can see, the cardinality estimation for the fact table is off-target by a mile. Now just to show that the above one is not the optimal plan:

1 select /*+ USE_HASH(dim) FULL(fact2) */ count(*) from dim, fact2
2 where dim.user_id = fact2.user_id
3 and dim.username in ('SYS', 'SIEBEL', 'INFAREP')
4* and fact2.object_id >= -1
cj@CJ>/

1 row selected.

Elapsed: 00:00:11.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3666 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=3666 Card=153330 Bytes=2606610)
3 2 INLIST ITERATOR
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'DIM' (Cost=3 Card=1 Bytes=9)
5 4 INDEX (RANGE SCAN) OF 'DIM_IDX2' (NON-UNIQUE) (Cost=2 Card=1)
6 2 TABLE ACCESS (FULL) OF 'FACT2' (Cost=3542 Card=2062400 Bytes=16499200)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12976 consistent gets
4930 physical reads
0 redo size
218 bytes sent via SQL*Net to client
272 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed


The reduced number of LIOs shows that the latter plan is more efficient. Although the difference is not significant in this case, you can understand that it magnifies greatly when the fact table has 75 million odd rows.

Now suppose I denormalize the design, so that I can run the query directly on the fact table:

cj@CJ>select count(*) from fact2
2 where user_id in (1, 5, 6) and object_id > -1;

1 row selected.

Elapsed: 00:00:10.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3657 Card=1 Bytes=8)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'FACT2' (Cost=3657 Card=2012415 Bytes=16099320)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12970 consistent gets
5017 physical reads
0 redo size
218 bytes sent via SQL*Net to client
272 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

I know that denormalization would make the fact table far bigger than it is and is in general a bad practice, but at least the CBO can choose the most efficient plan!

Your comments please. I would also request that you suggest a good source on SQL profiles. A search for the words 'sql profile' in the Concepts manual and the Performance Tuning manual didn't return any hits.


Thanks & regards,
Charu

Tom Kyte
May 17, 2005 - 1:37 pm UTC

sql profiles are 10g, you won't find them in 9i

Join cardinality issue

Charu Joshi, May 18, 2005 - 1:32 am UTC

Tom,

>>sql profiles are 10g, you won't find them in 9i

Right. Thanks. No comment on the test case provided? Is there no way we can overcome this in 9i?

Regards,
Charu.

Tom Kyte
May 18, 2005 - 8:49 am UTC

well, you've seen both sides of the coin as far as dimension vs not dimension table goes. you might well want to look at the star transformation -- have you read Jonathan's paper on that topic?

use all available tools ...

Gabe, May 18, 2005 - 2:32 pm UTC

Charu:

<quote>
Given below is a tiny example to illustrate the cardinality estimation problem that leads to sub-optimal plans. Would really appreciate if you could suggest a way out of this. The constraints are:

-- No hints.
-- No stored outlines.
-- No touching the queries.
</quote>

Well, in addition to bitmap indexes and star transformation, you could also try materialized views (hope those were not 'banned' as well) … given your test example …

flip@FLOP> select dim.username, count(*) from dim, fact2
2 where dim.user_id=fact2.user_id
3 group by rollup(dim.username)
4 ;

USERNAME COUNT(*)
------------------------------ ----------
DBSNMP 40000
ORDSYS 800000
OUTLN 9000
SYS 1176800
SYSTEM 1000
WMSYS 150000
2176800

7 rows selected.

Elapsed: 00:00:02.08

flip@FLOP> set autotrace on
flip@FLOP> select count(*) from dim, fact2
2 where dim.user_id = fact2.user_id
3 and dim.username in ('SYS', 'SYSTEM', 'DBSNMP')
4 and fact2.object_id >= -1
5 ;

COUNT(*)
----------
1217800

Elapsed: 00:00:54.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2591 Card=1 Bytes=17
)

1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'FACT2' (Cost=1295 Card
=45103 Bytes=360824)

3 2 NESTED LOOPS (Cost=2591 Card=67655 Bytes=1150135)
4 3 INLIST ITERATOR
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'DIM' (Cost=2 Car
d=2 Bytes=18)

6 5 INDEX (RANGE SCAN) OF 'DIM_IDX2' (NON-UNIQUE) (C
ost=1 Card=2)

7 3 INDEX (RANGE SCAN) OF 'FACT2_IDX1' (NON-UNIQUE)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
19820 consistent gets
19707 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

flip@FLOP> alter session set query_rewrite_enabled=true;

Session altered.

Elapsed: 00:00:00.00
flip@FLOP> select count(*) from dim, fact2
2 where dim.user_id = fact2.user_id
3 and dim.username in ('SYS', 'SYSTEM', 'DBSNMP')
4 and fact2.object_id >= -1
5 ;

COUNT(*)
----------
1217800

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=30)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'MV' (Cost=2 Card=3 Bytes=90)




Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
1 physical reads
0 redo size
381 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed



Tom Kyte
May 18, 2005 - 3:36 pm UTC

good point, slipped my mind

"materialized views, the indexes of your data warehouse"

Tuning in D/W environment.

Charu Joshi, May 19, 2005 - 5:04 am UTC

Tom and Gabe,

Thanks for your comments. This interaction has convinced me that I need to shake up the D/W design to introduce bitmap indexes, star transformation and MV (where possible).

Star transformation.. bitmap indexes.. design change.. resistance.. conflict with designers.. sigh..

Thanks & regards,
Charu
