Thanks for the question, Aru.

Asked: May 04, 2005 - 10:08 pm UTC

Last updated: January 18, 2017 - 1:33 am UTC

Version: 9.2.0

Viewed 50K+ times! This question is

You Asked

Hi Tom,
Please can you explain what the term cardinality Really means and what is the importance and what role it has to play in various situations and how it influences the optimizer?? I have searched far and wide but the real meaning and importance has eluded me so far.
Please help..
Regards,
Aru.

and Tom said...

1 entry found for cardinality.

cardinality

<mathematics> The number of elements in a set. If two sets
have the same number of elements (i.e. there is a bijection
between them) then they have the same cardinality. A
cardinality is thus an isomorphism class in the category
of sets.
---------------------------------------------------------------

It literraly means that, in a PLAN Card=NNN is the number of rows the optimizer expects to flow from one row source operation to the next:

new 1: explain plan for select * from emp, dept where emp.deptno =dept.deptno
Explained.
scott@ORA10G> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT

----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 798 |
| 1 | MERGE JOIN | | 14 | 798 |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 80 |
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | |
|* 4 | SORT JOIN | | 14 | 518 |
| 5 | TABLE ACCESS FULL | EMP | 14 | 518 |
----------------------------------------------------------------

The optmizer thinks that a full scan of emp in this case will result in 14 rows, a index full scan to read dept will result in 4, after joining emp to dept, there will be 14 rows.

If the cardinality is way off -- it can be disasterous, consider:

ops$tkyte@ORA10G> create table t as select * from all_objects;

Table created.

ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> create index t_idx on t(object_id);

Index created.

ops$tkyte@ORA10G> exec dbms_stats.gather_index_stats( user, 'T_IDX' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10G>
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select * from t where object_id = 42;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=482 Bytes=44826)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=482 Bytes=44826)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=193)

right now, due to the lack of any column stats (we gathered them for all indexed columns BEFORE creating the index in this example!), the optimizer got the estimated cardinality way wrong. In this case "so what", but what if this were a subquery???? and the subquery is thought to return almost 500 rows -- not the ONE row we know!

ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all indexed columns');

PL/SQL procedure successfully completed.

ops$tkyte@ORA10G> select * from t where object_id = 42;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=93)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=1 Bytes=93)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=1)

ops$tkyte@ORA10G> set autotrace off

Now it got it right and the resulting plans that incorporate this query -- will be better for it.

Rating

(34 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Cardinality of multiple predicates on same table

Jay, May 05, 2005 - 12:53 pm UTC

Can the optimizer estimate the cardinality of a combination of predicates on the same table with reasonable accuracy ?

For example:

select *
from t
where c1 = 1
and c2 = 20
;

Analyzing the table t would provide cardinality estimates for predicates [c1 = 1] and [c2 = 20]. But what about the cardinality of the combination of both predicates ? How does the optimizer guess this ? How do we improve the optimizer guess about this ?

The cardinality of the combination is the most important info here, because if table t was part of a larger query (with other tables and joins), and if the the predicates above were evaluated first on table t, then the cardinality estimate of the predicate combination might dictate what operation (Hash join, nested loops etc.) the optmizer chooses to join the row source from table t (after applying the predicates) with other row sources.

May 05, 2005 - 1:25 pm UTC

This is where sql profiles in 10g come in really nice.  You can in effect "analyze a query".

But even before, dynamic sampling can be used to great affect.  It comes into play during the hard parse phase (optimization).  


consider:

ops$tkyte@ORA9IR2> create table t ( x int, y int );
Table created.
 
ops$tkyte@ORA9IR2> insert into t
  2  select mod(rownum,100), mod(rownum,250)
  3    from all_objects;
27951 rows created.
 
ops$tkyte@ORA9IR2> update t set y = 42 where x = 55;
279 rows updated.
 
ops$tkyte@ORA9IR2> commit;
Commit complete.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 254' );
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where y = 42 and x = 55;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=3 Bytes=18)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=3 Bytes=18)
 
<b>it thinks 3 based on x with 100 values and y with 250, if we dynamic sample at level 4 (see the performance and tuning guide for what the levels are), it ran a sampling query to discover this information in order to parse</b>
 
ops$tkyte@ORA9IR2> select /*+ DYNAMIC_SAMPLING(t 4) */ * from t where y = 42 and x = 55;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=279 Bytes=1674)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=279 Bytes=1674)
 
<b>quite different -- below we'll see why</b>

 
ops$tkyte@ORA9IR2> select * from t where y = 42 and x = 56;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=3 Bytes=18)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=3 Bytes=18)
 
 
 
ops$tkyte@ORA9IR2> select /*+ DYNAMIC_SAMPLING(t 4) */ * from t where y = 42 and x = 56;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=1 Bytes=6)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=1 Bytes=6)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select count(*) from t where y = 42 and x = 55;
 
  COUNT(*)
----------
       279
 
ops$tkyte@ORA9IR2> select count(*) from t where y = 42 and x = 56;
 
  COUNT(*)
----------
         0

Dynamic sampling

Jay, May 05, 2005 - 3:10 pm UTC

Thanks for the info. Dynamic sampling does open a whole new avenue for estimating single table cardinality for multiple predicates.
1. Since you mention that dynamic sampling occurs during hard parse, how is it done for query with bind variables ? Does bind variable peeking happen and sampling done for bind variable values ? If so one should be aware that the cardinality estimates might be totally off-mark when the same plan is re-used for different bind variable values.

2. In the Oracle 9i R2 database performance guide and reference manual page 6-8, the following is mentioned:
"Dynamic sampling is repeatable if no rows have been inserted, deleted, or updated in the table being sampled". What does the above statement mean ?

3. Also what does dynamic sampling at the cursor level (i.e. with hint /*+ DYNAMIC_SAMPLING(<integer>) */ do ? Does it means that dynamic sampling is done for all tables (with predicates applied) in the query ?

May 05, 2005 - 5:57 pm UTC

1) it'll use bind variable peeking with all that entails and implies. The first one to hard parse sets the plan.

2) means (my interpretation) that if you hard parse a query, and then 5 minutes later hard parse the same query -- the plan could be different since the dynamic sample is dependent on the data that exists RIGHT THEN. It is like doing a mini-analyze during each hard parse. (it it more than like that, you could say it is a small analyze)

3) see the docs for the hint -- it changes the meaning of level. chapter 5 page 40 in the 9ir2 doc set.

but does selectivity get confused?

Anonymous :-), May 06, 2005 - 12:02 am UTC

That was a nice definition of cardinality (where did it come from?). What I at least THINK I've observed is that the term gets confused with selectivity, which I think is used in 2 different ways. When one talks about an index selectivity, I've seen it as - number of distinct rows divided by the total number of rows (cardinality of the table?) so that as you approach 1, you have a selective index. I think I've also seen it as how many rows come back on average from the total so that in the case of a 100 distinct row table, a selectivity of .01 would "be good", or as you approach zero that is "good" as oppossed to 1. Would you agree with this observation? Also, cardinality is the # of rows within the context of an operation, and has nothing to do with # of distinct rows, correct?

May 06, 2005 - 7:24 am UTC

I've been using </code> http://www.dictionary.com/ <code>recently. fast, minimal UI.

cardinality has to be taken in context.

There is the cardinality of a table (number of rows in table)

There is the cardinality of a row source operation (step in a plan) -- the number of rows flowing out of that step.

A result set (which technically is a table!) has a cardinality.

The steps in the plan have cardinality.

cardinality is the count.

Cardinality as a Collection function

Rahul, May 10, 2005 - 6:57 pm UTC

Hi Tom,

I was thinking that CARDINALITY is a collection function like here in the link.

</code> http://www.cis.unisa.edu.au/oracle/server.101/b10759/functions014.htm#i1269375 <code>

But here it is discussed as a HINT for optimizer.

I am guessing that they are not related. You mentioned that cardinality was in 9i itself( as a statement for my earlier post in sqlloader). Which cardinality did you mean by that?

Thank you,
Rahul

May 10, 2005 - 11:30 pm UTC

cardinality in that context is a function that returns the count of the elements in a collection.

-

Ik, May 11, 2005 - 4:43 am UTC

Tom,

This is on your follow up to the "original" question.

The cardinality shown here (below) is way off. But, how come the cost is accurate? I mean, how did the optimiser make the right guess that it just has to do 1 LIO to get the row? Is it from the stats in user_indexes? (leaf_blocks?)

Pasted below (from your follow up)

ops$tkyte@ORA10G> select * from t where object_id = 42;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=482 Bytes=44826)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=482
Bytes=44826)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=193)

Thank You,

May 11, 2005 - 7:35 am UTC

the only stats that where missing were the column stats, everything else (table, index) were there.

clustering factor, height, all of that stuff.

About Cost

Muhammad Riaz Shahid, May 12, 2005 - 12:50 am UTC

So Tom! how optimizer calculates the cost for individual operation (FTS, IRS, IFFS etc) ?

May 12, 2005 - 7:44 am UTC

it takes the various parameter settings in place and using the built in model -- assigns costs. There is no single constant.

Getting the right cardinality in 9i

Jay, June 30, 2005 - 6:21 pm UTC

Consider the following example:

create table t
as
select trunc(rownum/10) c1, trunc(rownum/10) + 5 c2
  -- c1 and c2 values go together i.e. when c1 = 5 then c2 = 10
  -- and when c2 = 10 then c1 = 5
from all_objects
;

exec dbms_stats.gather_table_stats(ownname=>'GSP_DBO',tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE 254');

SQL> show parameter dynamic

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
optimizer_dynamic_sampling           integer     1


explain plan for
select * 
from t
where c1 = 5
;
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |    10 |    70 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |    10 |    70 |     3 |
--------------------------------------------------------------------

explain plan for
select * 
from t
where c2 = 10
;
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |    10 |    70 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |    10 |    70 |     3 |
--------------------------------------------------------------------

explain plan for
select   * 
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |     7 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |     1 |     7 |     3 |
--------------------------------------------------------------------
^^^^ Optimizer cardinality estimate way off here since it does
     not know that cardinality of the combination of predicates
     is the same as for one of those predicates
     
explain plan for 
select /*+ DYNAMIC_SAMPLING(t 1) */  * 
from t
where c1 = 5
and c2 = 10
;
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |    10 |    70 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |    10 |    70 |     3 |
--------------------------------------------------------------------

Q1. The right cardinality here, maybe because of dynamic sampling hint, 
but this is dynamic sampling level 1 (and level 1 conditions are not satisfied 
here). So how come correct cardinality ?

Infact the optimizer comes up with the correct cardinality 
with dynamic sampling hints of levels 2 and 3 also.
The definition of level 2 does not apply here.
Level 3 not sure if it applies.

Q2. If dynamic sampling level 1 does apply here, why wasn't it
applied without the hint since optimizer_dynamic_sampling = 1 ?
Infact changing optimizer_dynamic_sampling to 2,3 does not
have any effect - the cardinality is still incorrect.
Only changing optimizer_dynamic_sampling to 4 
results in the correct cardinality (level 4 does apply here
 - single-table predicates that reference 2 or more columns ).

Q3. In cases where such SQLs (as demonstrated in the example above)
are issued by a third party application, we have no control over SQL
and hence cannot insert hints. To get the correct cardinality, is 
there any option (in Oracle 9i) other than changing 
optimizer_dynamic_sampling parameter ?

Excerpt from Oracle 9.2 Performance tuning guide
-----------------------------------------------
The sampling levels are as follows if the dynamic sampling level used is from a
cursor hint or from the optimizer_dynamic_sampling parameter:

Level 1: Sample all tables that have not been analyzed if the following criteria
are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed
table is joined to another table or appears in a subquery or non-mergeable view;
(3) this unanalyzed table has no indexes; (4) this unanalyzed table has more
blocks than the number of blocks that would be used for dynamic sampling of
this table. The number of blocks sampled is the default number of dynamic
sampling blocks (32).

Level 2: Apply dynamic sampling to all unanalyzed tables. The number of
blocks sampled is the default number of dynamic sampling blocks.

Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all
tables for which standard selectivity estimation used a guess for some predicate
that is a potential dynamic sampling predicate. The number of blocks sampled
is the default number of dynamic sampling blocks.

Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all
tables that have single-table predicates that reference 2 or more columns. The
number of blocks sampled is the default number of dynamic sampling blocks.
---------------------------------------------

June 30, 2005 - 8:42 pm UTC

if you trace the hinted one:

SELECT /*+ ALL_ROWS IGNORE_WHERE_CLAUSE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
FROM
(SELECT /*+ IGNORE_WHERE_CLAUSE NOPARALLEL("T") */ 1 AS C1, CASE WHEN
"T"."C1"=5 AND "T"."C2"=10 THEN 1 ELSE 0 END AS C2 FROM "T" "T") SAMPLESUB

it did sample it.

You need to page down in the performance guide:

...
The sampling levels are as follows if the dynamic sampling level used is from a table hint:

Level 0: Do not use dynamic sampling.
Level 1: The number of blocks sampled is the default number of dynamic
sampling blocks (32).

You put a table in there and said "DO IT"

Getting the right cardinality

Jay, June 30, 2005 - 9:23 pm UTC

Thanks for pointing that out.
Could you please respond to Question 3 in my post above.

June 30, 2005 - 9:31 pm UTC

but even the optimizer dynamic sampling parameter won't do it, it is the hint that lets you have the table in there.

10g - SQL Profiles, yes.

getting the right cardinality

Jay, July 01, 2005 - 11:24 am UTC

But changing optimizer_dynamic_sampling to 4 will sample table t in the query (with predicates on c1 and c2) without the dynamic sampling hint.

July 01, 2005 - 12:07 pm UTC

give it a try (it is what I would do, test your theory out)

optimizer_dynamic_sampling

Jay, July 01, 2005 - 12:54 pm UTC

Ok here is the proof:

create table t
as
select trunc(rownum/10) c1, trunc(rownum/10) + 5 c2
  -- c1 and c2 values go together i.e. when c1 = 5 then c2 = 10
  -- and when c2 = 10 then c1 = 5
from all_objects
;

exec dbms_stats.gather_table_stats(ownname=>'<Schema Name>',tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE 254');

SQL> show parameter dynamic

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling           integer     1

explain plan for
select   * 
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |     7 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |     1 |     7 |     3 |
--------------------------------------------------------------------

alter session set optimizer_dynamic_sampling=4;

SQL> show parameter dynamic

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling           integer     4

explain plan for
select   * 
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |    10 |    70 |     3 |
|*  1 |  TABLE ACCESS FULL   | T           |    10 |    70 |     3 |
--------------------------------------------------------------------

July 01, 2005 - 1:53 pm UTC

there you go then.

cardinality hint vs. literals

A reader, January 19, 2006 - 8:04 pm UTC

Oracle 9.2.0.6 with CBO

select * from foo
where col1 in (1,2,3)

select * from foo
where col1 in (select /*+ cardinality(t 3) */ c1 from bar t)

Why is the execution plan different for the 2 statements above? The first one does a INLIST ITERATOR for col1 (indexed) so everythins is great.

The second one does some crazy things and slows things down.

As far as the CBO knows, the IN is going to give it 3 rows, so why doesn't it do the INLIST ITERATOR thing for both cases?

Thanks

January 20, 2006 - 9:22 am UTC

need full example, as to what the "crazy" thing is - but the inlist interator is for hard coded inlists.  

I would expect the plans to be different, but sort of the same, like this:


ops$tkyte@ORA9IR2> create table foo ( col1 int primary key, col2 int, data char(80) );
 
Table created.
 
ops$tkyte@ORA9IR2> drop table bar;
 
Table dropped.
 
ops$tkyte@ORA9IR2> create table bar ( c1 int );
 
Table created.
 
ops$tkyte@ORA9IR2> insert /*+ append */ into foo select rownum, rownum, rownum from all_objects;
 
28429 rows created.
 
ops$tkyte@ORA9IR2> insert /*+ append */ into bar select rownum from all_objects where rownum <= 3;
 
3 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'FOO', cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'BAR', cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from foo
  2  where col1 in (1,2,3);
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=3 Bytes=270)
   1    0   INLIST ITERATOR
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'FOO' (Cost=3 Card=3 Bytes=270)
   3    2       INDEX (RANGE SCAN) OF 'SYS_C009866' (UNIQUE) (Cost=2 Card=3)
 
 
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from foo
  2  where col1 in (select /*+ cardinality(t 3) */ c1 from bar t);
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=52 Card=3 Bytes=279)
   1    0   NESTED LOOPS (Cost=52 Card=3 Bytes=279)
   2    1     SORT (UNIQUE)
   3    2       TABLE ACCESS (FULL) OF 'BAR' (Cost=2 Card=3 Bytes=9)
   4    1     TABLE ACCESS (BY INDEX ROWID) OF 'FOO' (Cost=1 Card=1 Bytes=90)
   5    4       INDEX (UNIQUE SCAN) OF 'SYS_C009866' (UNIQUE)
 
 
 
ops$tkyte@ORA9IR2> set autotrace off

Incorrect Cardinality

Jara, July 31, 2006 - 7:57 am UTC

Hi Tom,

Please can you help me getting to the bottom of why the cardinalities are incorrect in the following query.

Here are all the details ...

user01@MYDB>select count(*) from test_rship;

COUNT(*)
----------
2875047

Elapsed: 00:00:00.78
user01@MYDB>select count(*) from test_rship
2 where prod_parent_type = 'BP'
3 and prod_child_type = 'CUA';

COUNT(*)
----------
683442

Elapsed: 00:00:02.18
user01@MYDB>select count(*) from test_rship
2 where prod_parent_type = 'CUA'
3 and prod_child_type = 'CUV';

COUNT(*)
----------
741223

Elapsed: 00:00:02.26

1 begin
2 dbms_stats.gather_table_stats(user,'TEST_RSHIP',
3 estimate_percent => dbms_stats.auto_sample_size,
4 method_opt => 'for all columns size auto');
5* end;
user01@MYDB>/

PL/SQL procedure successfully completed.

Elapsed: 00:01:22.23

user01@MYDB>select column_name,num_distinct,to_char(last_analyzed,'dd/mm/yyyy')
2 from user_tab_col_statistics
3 where table_name = 'TEST_RSHIP'
4 and column_name in ('PROD_PARENT_TYPE','PROD_CHILD_TYPE');

COLUMN_NAME NUM_DISTINCT TO_CHAR(LA
------------------------------ ------------ ----------
PROD_PARENT_TYPE 3 31/07/2006
PROD_CHILD_TYPE 4 31/07/2006

user01@MYDB>explain plan
2 for
3 select *
4 from
5 (
6 select prod_parent_id pb_prod_gentd_id
7 , parent_iso_ctry_code pb_ctry_code
8 , parent_auth_status pb_auth_status
....
....
....
26 , child_ean cua_ean
27 from test_rship
28 where prod_parent_type = 'BP'
29 and prod_child_type = 'CUA'
30 ) pb_cua
31 join
32 ( select prod_parent_id cua_prod_gentd_id2
33 , parent_iso_ctry_code cua_ctry_code2
34 , prod_child_id cuv_prod_gentd_id
....
.....
....
42 , child_last_auth_upd_dt cuv_last_auth_upd_dt
43 from test_rship
44 where prod_parent_type = 'CUA'
45 and prod_child_type = 'CUV'
46 ) cua_cuv
47 on pb_cua.cua_prod_gentd_id1 = cua_cuv.cua_prod_gentd_id2
48 and pb_cua.cua_ctry_code1 = cua_cuv.cua_ctry_code2;

user01@MYDB>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 15M| | 8283 |
|* 1 | HASH JOIN | | 105K| 15M| 16M| 8283 |
|* 2 | TABLE ACCESS FULL | TEST_RSHIP | 163K| 14M| | 3798 |
|* 3 | TABLE ACCESS FULL | TEST_RSHIP | 385K| 22M| | 3798 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------

1 - access("TEST_RSHIP"."PROD_CHILD_ID"="TEST_RSHIP"."PROD_PARENT_ID"
AND "TEST_RSHIP"."CHILD_ISO_CTRY_CODE"="TEST_RSHIP"."PARENT_ISO_CT
RY_CODE")

2 - filter("TEST_RSHIP"."PROD_PARENT_TYPE"='BP' AND
"TEST_RSHIP"."PROD_CHILD_TYPE"='CUA')
3 - filter("TEST_RSHIP"."PROD_PARENT_TYPE"='CUA' AND
"TEST_RSHIP"."PROD_CHILD_TYPE"='CUV')

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Note: cpu costing is off

21 rows selected.

So, as can be seen the cardinality value for the full scan on the TEST_RSHIP table after the filter conditions have been applied is way off from the actual value. I have collected the stats/histograms but still no joy. I have even run the query and have looked at the cardinality values in the V$SQL_PLAN table and they are same as in the explain plan.

Please can you tell me why the cardinality is not right and what I can do to get it as near as the actual values.

Thanks

July 31, 2006 - 8:38 am UTC

the optimizer sees "discrete things"

It gets statistics on

o A table
o A column in that table
o An index
o The system

all individual statistics.  It then uses 12th grade statistics to guess from there.  Suppose you had 1000 people in a room (example gratuitously borrowed from Jonathan Lewis).  You poll the room and ask "how many of you are born in the month of December?" - 1/12th of the room will raise their hands approximiately.  Then you ask "How many of you are of the zodiac sign Pisces?"  Again, about 1/12th of the room will raise their hands.

Now, you ask - "and how many of you Pisces are born in December?".  The optimizer will guess 1/144th of the room (about 7).  The truth?  ZERO - all pisces are born in feb/mar.  There is a correlation between the two columns and the optimizer doesn't see it because the optimizer has statistics about individual things.  

Enter perhaps dynamic sampling.  Consider:

ops$tkyte@ORA10GR2> create table t
  2  as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
  3            decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
  4    from all_objects a
  5  /
                                                                                                                                       
Table created.
                                                                                                                                       
ops$tkyte@ORA10GR2> create index t_idx on t(flag1,flag2);
                                                                                                                                       
Index created.
                                                                                                                                       
ops$tkyte@ORA10GR2> begin
  2          dbms_stats.gather_table_stats
  3          ( user, 'T',
  4            method_opt=>'for all indexed columns size 254' );
  5  end;
  6  /
                                                                                                                                       
PL/SQL procedure successfully completed.

<b>we have a table such that when flag1 = 'Y', flag2 will be 'N' and vice versa.  NEVER are the two columns set to Y,Y or N,N. 

There are:</b>

                                                                                                     ops$tkyte@ORA10GR2> select num_rows, num_rows/2, num_rows/2/2 from user_tables where table_name = 'T';
                                                                                                                                       
  NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
     50295    25147.5     12573.75

<b>50,000 rows - when we use "where flag1 = 'N'" the optimizer will rightly guess about 25,000 rows.  

It knows "50% of the rows are flag1='N'"
It knows "50% of the rows are flag2='N'"

It will guess therefore that about 25% of the rows are flag1=N and flag2=N, we can see this:</b>

ops$tkyte@ORA10GR2> select * from t where flag1='N';
                                                                                                                                       
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
                                                                                                                                       
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 25268 |  2393K|   167   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 25268 |  2393K|   167   (2)| 00:00:03 |
--------------------------------------------------------------------------
                                                                                                                                       
Predicate Information (identified by operation id):
---------------------------------------------------
                                                                                                                                       
   1 - filter("FLAG1"='N')

ops$tkyte@ORA10GR2> select * from t where flag2='N';
                                                                                                                                       
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
                                                                                                                                       
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 25027 |  2370K|   167   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 25027 |  2370K|   167   (2)| 00:00:03 |
--------------------------------------------------------------------------
                                                                                                                                       
Predicate Information (identified by operation id):
---------------------------------------------------
                                                                                                                                       
   1 - filter("FLAG2"='N')


ops$tkyte@ORA10GR2> select * from t where flag1='N' and flag2='N';
                                                                                                                                       
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
                                                                                                                                       
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 12573 |  1190K|   167   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 12573 |  1190K|   167   (2)| 00:00:03 |
--------------------------------------------------------------------------
                                                                                                                                       
Predicate Information (identified by operation id):
---------------------------------------------------
                                                                                                                                       
   1 - filter("FLAG2"='N' AND "FLAG1"='N')

<b>that last bit isn't right - it is way off.  Enter dynamic sampling:</b>

ops$tkyte@ORA10GR2> select /*+ dynamic_sampling(t 3) */ * from t where flag1='N' and flag2='N';
                                                                                                                                       
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
                                                                                                                                       
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     4 |   388 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     4 |   388 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     4 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
                                                                                                                                       
Predicate Information (identified by operation id):
---------------------------------------------------
                                                                                                                                       
   2 - access("FLAG1"='N' AND "FLAG2"='N')
                                                                                                                                       
Note
-----
   - dynamic sampling used for this statement

<b>here the optimizer ran a query to hard parse our query - it ran this query because we had dynamic sampling set to 3 and the optimizer knew it "guessed" at a cardinality.  It did a quick scan of the data to figure out how close it was and adjusted the cardinality appropriately</b>

My Two Questions

Arindam Mukherjee, August 19, 2006 - 3:00 am UTC

Respected Mr. Tom,

I have already read three (3) threads of your site Cardinality, density and Index rebuild. But still my idea is not clear on the following three terms. You please explain it in very simple way with your inimitable style. I have read as follows from Oracle document.

Cardinality represents the number of rows in a row set. I think Its COUNT of ROWS

The first measure, selectivity, represents a fraction of rows from a row set. Another form of Count may be percentage.

So on reading those lines, my questions are as follows.

Q1> What are low selectivity, poor selectivity and low cardinality? Is there any relationship with density? If yes, How?

Q2> Quote from Oracle 9i concept (Chapter : 10) - The optimizer can use an existing index to build another index. This results in a much faster index build.

Does it speak about Index Rebuild? If yes, on what situation does Optimizer itself build a new index?

August 19, 2006 - 4:48 am UTC

cardinality is simply the "count of elements in a set". Many times however, we use cardinality meaning "distinct cardinality" when discussing selectivity. But, cardinality is defined in the original answer above.

q1) selectivity is a measure, a ratio if you will, of how many tuples/rows out of a set a where clause will return.

For example, using ALL_OBJECTS:

select * from all_objects where owner = 'SYS'

where owner = 'SYS' could be said to be "not very selective, having low selectivity" because many rows are such that "owner='SYS'" is true. However

where owner = 'SCOTT'

could be said to have high selectivity, it is very selective, because few rows are such that owner = 'SCOTT' is true.

low selectivity, not very selective, poor selectivity are all the same.

low cardinality is "few rows", typically would be synonymous with "highly selective"

density, from an Oracle cost base optimizer perspective, is a number used to guess the cardinality. It is a measure of how many rows are expected back from a predicate of the general form: 'where column = constant/:bind'. If you do not have histograms, Oracle will use the 'density' to guess how many rows that predicate will return.

q2) it is referring to what it will do when you issue a create index statement, the optimizer itself doesn't choose to build a new index.

Excellent

Arindam Mukherjee, August 19, 2006 - 6:51 am UTC

Why the cardinality seems not to apply here?

Orlando Reyes, January 10, 2007 - 3:51 pm UTC

Tom,

I have a table with over 30M records (31,705,533) with an index on field prmk (OPT_MEMBER_ATTRIBUTE_IDX2) and the number of unique keys for this index is 60. So when running query ¿SELECT * FROM facets.opt_member_attribute WHERE prmk = '538';¿ I would not expect Oracle to use the index at all, however, the explain plan looks like:

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 422K| 40M| 4240 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| OPT_MEMBER_ATTRIBUTE | 422K| 40M| 4240 (1)|
|* 2 | INDEX RANGE SCAN | OPT_MEMBER_ATTRIBUTE_IDX28 | 422K| | 486 (0)|
------------------------------------------------------------------------------------------------

Both the table and the index are analyzed and the total number of records in the table is 55M and the clustering factor for the index is 1,421,538.

And in fact when I run the query it works in a couple of seconds. I was actually expecting a full table scan.

I am using otimizer_index_caching = 90 and optimizer_index_cost_adj =30.

Can you give me your ideas of why this happens? Shouldn¿t Oracle ignore the index because of the low cardinality? Isn¿t this a very lousy selectivity for the index? And a poor choice for the index?

As always,

Thanks for you help.

Sincerely yours,

Orlando Reyes

Cardinality

Anand Varadarajan, May 23, 2007 - 7:20 am UTC

Th first set of output is from production and the second set of values from testing. The number of records retrieved is 3500. The table is apprx <1000000 records.
Individuall searched only based on the indexed column(without the other filters) we get 70000 rows in both Prod and Test. Other than cardinality for index and cost for the suceeding table everything looks same. Production works in 4-10 minute range and Testing in 4-10 Seconds range.
We are absolutely stumped. Kindly advice.
regards

OPERATION OPTIONS OBJECT_NAME OPTIMIZER COST CARDINALITY
     
SELECT STATEMENT   CHOOSE 873 
NESTED LOOPS    873 1
TABLE ACCESS BY INDEX ROWID VH_PSF ANALYZED 871 1
INDEX RANGE SCAN IDX_VH_PSF_TYPE ANALYZED 11 2623
TABLE ACCESS BY GLOBAL INDEX ROWID GM_VIN ANALYZED 2 1
INDEX UNIQUE SCAN SYS_C0014163 ANALYZED 1 1
     
     
     
     
SELECT STATEMENT   CHOOSE 7 
NESTED LOOPS    7 1
TABLE ACCESS BY INDEX ROWID VH_PSF ANALYZED 5 1
INDEX RANGE SCAN IDX_VH_PSF_TYPE ANALYZED 3 4
TABLE ACCESS BY GLOBAL INDEX ROWID GM_VIN ANALYZED 2 1
INDEX UNIQUE SCAN SYS_C0020641 ANALYZED 1 1

May 23, 2007 - 8:37 am UTC

i cannot read that.

but to tune a query, you want a tkprof, not an explain plan.

Cardinality

Anand Varadarajan, May 24, 2007 - 7:57 am UTC

The above output is from v$sql_plan. Getting tkprof is not easy as we are only given "read only" access to the database. We will have to wait to get the tkprof output.

May 26, 2007 - 11:26 am UTC

query v$sql_plan_statistics

A reader, May 27, 2008 - 4:12 pm UTC

Tom,

If Optimizer chosen to sample a table (for e.g. using default number of dynamic sampling blocks of 32), will optimizer read the first 32 blocks of the table? What if the data in the table ordered chronologically? For e.g. I have a table orders with order_date, delivery_date etc. and the table contains order history for the last five years. So if my query filter order for the last two weeks and optimizer will sample the table and read first 32 blocks in the table this will not help optimizer to choose the best plan

May 27, 2008 - 6:44 pm UTC

it will not read the first N, it'll randomly sample.

Cardinality for joins

Kalita, August 01, 2008 - 5:04 am UTC

Hi Tom,
   I was wondering how Oracle estimates cardinality for unknown values for a column. Please see the example below

SQL> create table test_card_number as
  2  select *
  3  from
  4  (
  5  select rownum test_num
  6  from   dual
  7  connect by 1=1
  8  )
  9  where rownum < 100000;

Table created.

SQL> create index test_card_number_ix on test_card_number(test_num);

Index created.

SQL> exec dbms_stats.gather_table_stats('KALITA','TEST_CARD_NUMBER');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly;
SQL> select  *
  2  from   test_card_number a,(select ( rownum + 100000) rn from dual )b
  3  where  test_num >= b.rn;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 2093273829

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                     |  5000 | 85000 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS     |                     |  5000 | 85000 |     4   (0)| 00:00:01 |
|   2 |   VIEW            |                     |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |    COUNT          |                     |       |       |            |          |
|   4 |     FAST DUAL     |                     |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   INDEX RANGE SCAN| TEST_CARD_NUMBER_IX |  5000 | 20000 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("TEST_NUM">="B"."RN")
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        329  bytes sent via SQL*Net to client
        373  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


SQL> select *
  2  from   test_card_number a,(select (rownum + 100000) rn from dual )b
  3  where  test_num <= b.rn;

99999 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2093273829
-----------------------------------------------------------------------------------------

| Id  | Operation         | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                     |  5000 | 85000 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS     |                     |  5000 | 85000 |     4   (0)| 00:00:01 |
|   2 |   VIEW            |                     |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |    COUNT          |                     |       |       |            |          |
|   4 |     FAST DUAL     |                     |     1 |       |     2   (0)| 00:00:01 |
|*  5 |   INDEX RANGE SCAN| TEST_CARD_NUMBER_IX |  5000 | 20000 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("TEST_NUM"<="B"."RN")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       6875  consistent gets
          0  physical reads
          0  redo size
    1835957  bytes sent via SQL*Net to client
      73710  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      99999  rows processed

For both >= and <= the cardinality is estimated to be 5000(5% of the rows). I have tried a few more cases and looks like Oracle estimates cardinality to 5% of rows for unknown values of a column. My problem is similar to the <= case in my example above. I get a date from a table which is evaluated for <= in the other table and the expected cardinality is approxmiately 5% of the total rows but actual cardinality would be close to 50%. So, in stead of doing a full table scan, its doing an index range scan. Is there any way of fixing this issue other than hinting the queries?

Thanks,
Kalita

August 03, 2008 - 1:49 pm UTC

the issue is, the plan is generated before the query is executed. So, in the case of a query like:

select *
2 from test_card_number a,(select (rownum + 100000) rn from dual )b
3 where test_num <= b.rn;

a) it cannot presume, assume - just one row from B.
b) it does not know what that value is at all, when building the plan.

so, it has to say "we do a join", and "we do not know what the value is, hence we have to use a heuristic"

I would prefer to write a query like that as:

select * from t where col <= (select X from something_else);

you'll get basically a similar plan and estimated card= values, but at least we know "one row". But since "select x from something_else" is not known until AFTER the plan is already generated.

Thanks Tom..

Kalita, August 04, 2008 - 1:05 am UTC

cardinality , confuse.

jian huang zheng, August 10, 2008 - 1:02 am UTC

Hello Tom

from this example before:
ops$tkyte@ORA10G> select * from t where object_id = 42;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=482 Bytes=44826)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=482 Bytes=44826)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=193)
 
how come index range scan with card 193 to produce the table access by index rowid of card 482, I think that is at most one to one card, should be 193. any idea why?

Thanks!

dynamic_sampling vs skewness

MH, August 28, 2008 - 7:49 am UTC

After reading up on this hint I found some queries (over skewed data in very large tables) performed waay much faster when using the hint /*+ dynamic_sampling(3) */.

F.i:

With hint:

107867 records selected.

Elapsed: 00:22:20.07

Uitvoeringsplan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=211975 Card=101631 Bytes=21342510)

   1    0   SORT (GROUP BY) (Cost=211975 Card=101631 Bytes=21342510)
   2    1     HASH JOIN (OUTER) (Cost=207337 Card=101631 Bytes=21342510)

   3    2       HASH JOIN (Cost=183935 Card=101631 Bytes=6199491)
   4    3         HASH JOIN (Cost=7787 Card=124222 Bytes=5714212)
   5    4           TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1512 Card=124222 Bytes=2981328)

   6    4           TABLE ACCESS (FULL) OF 'PAN_RESPONDENTS' (Cost=4269 Card=1103180 Bytes=24269960)

   7    3         TABLE ACCESS (FULL) OF 'PAN_RESPONDENT_VALUES' (Cost=174638 Card=902553 Bytes=13538295)

   8    2       VIEW (Cost=22597 Card=59183 Bytes=8818267)
   9    8         SORT (GROUP BY) (Cost=22597 Card=59183 Bytes=3610163)

  10    9           HASH JOIN (Cost=21717 Card=59183 Bytes=3610163)
  11   10             TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1510 Card=124222 Bytes=3726660)

  12   10             TABLE ACCESS (BY INDEX ROWID) OF 'PAN_PROJECT_MEMBERS' (Cost=39 Card=478 Bytes=10516)

  13   12               NESTED LOOPS (Cost=19459 Card=243521 Bytes=7549151)

  14   13                 VIEW OF 'index$_join$_003' (Cost=36 Card=510 Bytes=4590)

  15   14                   HASH JOIN
  16   15                     INDEX (RANGE SCAN) OF 'PMS_SAMP_PANL_FK'(NON-UNIQUE) (Cost=4 Card=510 Bytes=4590)

  17   15                     INDEX (FAST FULL SCAN) OF 'PMS_SAMP_PK' (UNIQUE) (Cost=4 Card=510 Bytes=4590)

  18   13                 INDEX (RANGE SCAN) OF 'PMS_PBR_MULTI_IDX1' (NON-UNIQUE) (Cost=7 Card=1809)





Statistics
----------------------------------------------------------
         17  recursive calls
          0  db block gets
    1122897  consistent gets
     916871  physical reads
      69668  redo size
    7988329  bytes sent via SQL*Net to client
      81634  bytes received via SQL*Net from client
       7193  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
     107867  rows processed

without hint:

107869 records selected.

Elapsed: 01:20:04.02

Uitvoeringsplan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=40760 Card=5174 Bytes=1086540)

   1    0   SORT (GROUP BY) (Cost=40760 Card=5174 Bytes=1086540)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'PAN_RESPONDENT_VALUES' (Cost=5 Card=2 Bytes=30)

   3    2       NESTED LOOPS (Cost=40521 Card=5174 Bytes=1086540)
   4    3         HASH JOIN (OUTER) (Cost=26761 Card=3438 Bytes=670410)

   5    4           HASH JOIN (Cost=5790 Card=3438 Bytes=158148)
   6    5             TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1512 Card=3438 Bytes=82512)

   7    5             TABLE ACCESS (FULL) OF 'PAN_RESPONDENTS' (Cost=4269 Card=1103180 Bytes=24269960)

   8    4           VIEW (Cost=20970 Card=355 Bytes=52895)
   9    8             SORT (GROUP BY) (Cost=20970 Card=355 Bytes=21655)

  10    9               HASH JOIN (Cost=20969 Card=355 Bytes=21655)
  11   10                 TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1510 Card=3438 Bytes=103140)

  12   10                 TABLE ACCESS (BY INDEX ROWID) OF 'PAN_PROJECT_MEMBERS' (Cost=39 Card=90 Bytes=1980)

  13   12                   NESTED LOOPS (Cost=19459 Card=46085 Bytes=1428635)

  14   13                     VIEW OF 'index$_join$_003' (Cost=36 Card=510 Bytes=4590)

  15   14                       HASH JOIN
  16   15                         INDEX (RANGE SCAN) OF 'PMS_SAMP_PANL_FK' (NON-UNIQUE) (Cost=4 Card=510 Bytes=4590)

  17   15                         INDEX (FAST FULL SCAN) OF 'PMS_SAMP_PK' (UNIQUE) (Cost=4 Card=510 Bytes=4590)

  18   13                     INDEX (RANGE SCAN) OF 'PMS_PBR_MULTI_IDX1' (NON-UNIQUE) (Cost=7 Card=1809)

  19    3         INDEX (RANGE SCAN) OF 'PMS_REVA_RESP_FK' (NON-UNIQUE) (Cost=4 Card=2)





Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
    1507643  consistent gets
     766208  physical reads
      26840  redo size
    7988422  bytes sent via SQL*Net to client
      81599  bytes received via SQL*Net from client
       7193  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
     107869  rows processed

(2 records were added during tests, so that explains the different results)

With hint the plan looks like I would expect it to be.
(I don't want an Index range scan over a table containing almost 200 million records...it's 'db file sequential reading' forever then)

All tables are analyzed daily (GATHER_STALE) and indexed but quite skewed over certain FK-columns. So sampling (level = 3) is only applied for (quoting from doc.:) "all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate"?

(When I use level = 2, the optimezer chooses the 'wrong plan' again).

Is this a 'safe hint-approach' to follow when tuning large queries that perform bad due to skewness?
Or should special histograms be preferred?
(Or am I comparing apples to toaster ovens now?)
I now tend to use this hint more often, but would like to know if there are any caveats?

I'm really amazed by the results: almost one hour faster, just by adding this hint...thanks!
Oracle fancies big brutal hash joins the most, i guess ;)

Regards,
Martijn

Why the difference?

A reader, September 02, 2009 - 12:25 pm UTC

Suppose I have a query typically used in web-based reporting tools that allow for optional search filters.

Consider the following 3 variants of this type of query.

1.

where 1=1
and <join conditions>
and (:b1 is null or column in (select ...))

where 1=1
and <join conditions>
and (column in (:b2,:b3,:b4))

where 1=1
and <join conditions>
and (:b1 is null or column in (select /*+ cardinality(t 3) */ ...))

The user input is both optional and variable length. So I use the table() function to convert the comma separated string with the in-list using the code you posted on this site.

When :b1 is not null, I thought #3 would have the same execeution plan/performance as #2 (which performs the best) because the optimizer would optimize away the :b1 is null part and the cardinality hint would tell it that the SELECT returns 3 values which is the same as entering the 3 values into an IN list, as in #2.

But it doesn't do that. It does a hash join instead of a INLIST iterator and the overall performance is much slower.

Any idea why?

Thanks

September 02, 2009 - 1:01 pm UTC

in general

where c in (a,b,c)

will use what is known as an inlist interator - it knows at least three and at most three, it is 100% sure of everything

where (:b1 is null or c in (set))

is completely different - hugely different. And you would need to verify that the estimated card was really 3 - remember - the cardinality hint is "undocumented", it might not be 'taking' there.

It is taking

A reader, September 02, 2009 - 1:55 pm UTC

> it might not be 'taking' there.

When I see the explain plan (card= for that step), I do see it taking, yet the overall execution plan is different.

Is there a way I can force the inlist iterator to be used?

Sometimes, for other queries, it does work, I see something called 'inlist iterator pickler fetch' in the plan and it works great.

Besides, given a predicate like (1=2 or c in (set)), i.e. any predicate where the first part is constant and evaluates to false, isn't it equivalent to not specifying that predicate all? In other words, all else remaining the same, the execution plan for the above predicate and for (c in (set)) should be identical, right? But it isn't. Why is this?

September 02, 2009 - 2:20 pm UTC

where (:x is null or c in (set))

is radically, hugely different than

where (c in (a,b,c))

where (c in (set))

is radically, hugely different than

where (c in (a,b,c))

... the
execution plan for the above predicate and for (c in (set)) should be
identical, right? ...

why ? why would they be identical?

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm#sthref1416

inlist iterator is for in's with lists.

ops$tkyte%ORA10GR2> create table t
  2  as
  3  select *
  4    from all_objects;

Table created.

ops$tkyte%ORA10GR2> alter table t add constraint t_pk primary key(object_id);

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where object_id in (42, 55);

Execution Plan
----------------------------------------------------------
Plan hash value: 2629880895

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |    22 |  2816 |     8   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |    22 |  2816 |     8   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | T_PK |   163 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("OBJECT_ID"=42 OR "OBJECT_ID"=55)

Note
-----
   - dynamic sampling used for this statement

ops$tkyte%ORA10GR2> select * from t where object_id in (select 42 from dual union all select 55 from dual);

Execution Plan
----------------------------------------------------------
Plan hash value: 3667190664

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     2 |   262 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |          |     2 |   262 |     6   (0)| 00:00:01 |
|   2 |   VIEW                       | VW_NSO_1 |     2 |     6 |     4   (0)| 00:00:01 |
|   3 |    HASH UNIQUE               |          |     2 |       |     4  (50)| 00:00:01 |
|   4 |     UNION-ALL                |          |       |       |            |          |
|   5 |      FAST DUAL               |          |     1 |       |     2   (0)| 00:00:01 |
|   6 |      FAST DUAL               |          |     1 |       |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID| T        |     1 |   128 |     1   (0)| 00:00:01 |
|*  8 |    INDEX UNIQUE SCAN         | T_PK     |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("OBJECT_ID"="$nso_col_1")

Note
-----
   - dynamic sampling used for this statement

it would be able to nested loop (which is a lot like an inlist interator) or hash join or sort merge or any of the JOIN techniques.

Apex collections

A reader, September 02, 2009 - 2:49 pm UTC

OK my actual use case is for an Apex report region where a list of ids is stored in a collection. A page item :x determines whether or not the collection is populated. So the predicate takes the form

(:x is null or c in (select /*+ cardinality(t 10) */ c001 from apex_collections t 
where t.collection_name='X'))

I *have* to use the cardinality hint when using Apex collections because it is generic view that is used by all applications, only I know how much data is expected in any given situation. This query doesn't perform too well, how would one go about speeding it up? Thanks

September 02, 2009 - 3:33 pm UTC

are the estimated card= values in the plan for all of the other steps anywhere near "correct", if the result set is small - it would opt for nested loops -big - hash join. I'm guess the problem lies elsewhere. not with the collection

use the gather_plan_statistics technique outlined here

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8764517459743#1929856400346576884

and show us

1=2

A reader, September 02, 2009 - 5:02 pm UTC

I asked "Besides, given a predicate like (1=2 or c in (set)), i.e. any predicate where the first part is constant and evaluates to false, isn't it equivalent to not specifying that predicate all? In other words, all else remaining the same, the execution plan for the above predicate and for (c in (set)) should be identical, right? But it isn't. Why is this?"

I am not sure I understood your response. Adding a constant, known-at-parse-time expression with a OR into a predicate that evaluates to false i.e. 1=2 OR <whatever> should have absolutely no effect on the query plan, right? But it does.

Here is a (trivial) example

SQL> create table x as select level i from dual connect by level<=10000;

Table created.

SQL> create table y as select level i from dual connect by level<=10000;

Table created.

SQL> create index i1 on x(i);

Index created.

SQL> create index i2 on y(i);

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'x',cascade=>true);

PL/SQL procedure successfully completed.

SQL>  exec dbms_stats.gather_table_stats(user,'y',cascade=>true)

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly exp
SQL> select * from x where (1=2 or x.i in (select y.i from y));

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=503 Card=500 Bytes=1
          500)
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=500 Bytes=1500)
   3    1     INDEX (RANGE SCAN) OF 'I2' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)



SQL>  select * from x where  x.i in (select y.i from y);

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10000 Bytes=60000)

   1    0   NESTED LOOPS (SEMI) (Cost=4 Card=10000 Bytes=60000)
   2    1     INDEX (FULL SCAN) OF 'I1' (NON-UNIQUE) (Cost=22 Card=10000 Bytes=30000) 
   3    1     INDEX (RANGE SCAN) OF 'I2' (NON-UNIQUE)

Shouldn't the plan for the 2 queries be the same since the constant predicate 1=2 isn't really affecting the answer? But the cost of the query with the OR is much higher. Why? I realize that the CBO (probably) doesn't really evaluate the constant predicate but with bind variable peeking, it could do that and optimize away the expression thus arriving at the lower cost plan, couldn't it?

September 03, 2009 - 7:26 am UTC

... I am not sure I understood your response. ...

my response was that

where (c in (a,b,c))

is not in any way the same as

where (c in (set))

that is all - forget the 1=2, it goes away.

they were asking why "c in (a,b,c)" results in inlist iterator and "c in (set)" does not.

A reader, September 03, 2009 - 10:36 am UTC

That wasn't the question. I posted it again to clarify. It was about the 1=2 (or any expression that evaluates to false and has a OR in the predicate). You said the "1=2 goes away" but it doesn't appear to. I gave an example to show that execution plan with it and without are different. Why is that?

September 04, 2009 - 2:03 pm UTC

well, it does go away - you chopped off the predicate bit. the plans are different (that is permitted, there is no reason that might now happen).

ops$tkyte%ORA10GR2> set autotrace traceonly exp
ops$tkyte%ORA10GR2> select * from x where (1=2 or x.i in (select y.i from y));

Execution Plan
----------------------------------------------------------
Plan hash value: 858363292

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 | 30000 |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| X    | 10000 | 30000 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | I2   |     1 |     3 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "Y" "Y" WHERE "Y"."I"=:B1))
   3 - access("Y"."I"=:B1)

ops$tkyte%ORA10GR2> select * from x where (       x.i in (select y.i from y));

Execution Plan
----------------------------------------------------------
Plan hash value: 3518811015

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 10000 | 60000 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |      | 10000 | 60000 |     7  (15)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| I1   | 10000 | 30000 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| I2   | 10000 | 30000 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("X"."I"="Y"."I")

ops$tkyte%ORA10GR2> set autotrace off

but the 1=2 is not there.

tervor, August 14, 2012 - 2:03 am UTC

Tom what was the script you ran that did these steps please
>new 1: explain plan for select * from emp, dept where >emp.deptno =dept.deptno
>Explained.

great explanation on cardinality one of the most common confused terms regarding databases

August 17, 2012 - 2:05 pm UTC

just a really old script, with dbms_xplan/autotrace I don't use it anymore.

It just did a delete on the plan table

then explain plan for &1

then a query against the plan table to format it.

just use

set autotrace traceonly explain
select ....
set autotrace off

much easier. better plan display too.

thankyou

Trevor, August 20, 2012 - 11:39 pm UTC

Thanks very much

Anand, January 18, 2013 - 12:42 am UTC

Hi Tom,

What could be the other reason for explain plan to show not correct cardinality evenif my all table are analyzed.

January 18, 2013 - 10:23 am UTC

http://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html

Anand, January 19, 2013 - 2:41 pm UTC

Hi Tom,

Thanks a lot.

Still i have one question.if there a query join of multiple table then how to decide to give hint of dynamic sampling on which table.

Lets say i have below query :

select * from a,b,c,d where .....

i should give hint /*+ dynamic_sampling(a 0) */ or /*+ dynamic_sampling(b 0) */ or dynamic_sampling(c 0) */

or it doesn't matter.

January 21, 2013 - 7:29 am UTC

well, what do you want to have happen?

there is no single answer here - do you want dynamic sampling to happen? or not? it is up to you.

Cardinality mismatch

Amit, February 13, 2014 - 5:10 pm UTC

Hi Tom,

I have following query and I generated execution plan for it. I am not able to understand the cardinality.

Query :

SELECT DISTINCT NPF_ACCOUNT.AC_SK,
NPF_AC_POSN_SUM.ACCT_BAL,
T_ACCOUNT.AC_ID
FROM T_ACCOUNT,
T_ACCT_SUM
WHERE T_ACCOUNT.AC_SK IN (167056,276364,292510,410216,550806,584324,584325,585607,594305,594306,375386,375387,597632,597633,597634,597636,597637,597638,597639,597640,597641,597642,597643,597644,597645,597646,597647,597648,597649,597650,597651,597652,597653,597654,597655,597665,597656,597657,597658,597659,597660,597661,597662,597663,597664,597625,597626,597627,608683,709753,739855,769598,769599,769600,789267,831812,831816,831815,831814,831813,831817,886799,891301,891302,896746,905441,928452,952683,971664,971665,971666,971667,971668,971669,971670,597628,597629,986885,987360,987859,992765,1069330,1070973,1075750,1201860,1201862,1201861,1283345,1283346,1312703,1312704,1312702,1312578,1312579,1415629,1446799,1446798,1520530,1834159,1859876,1894444,1932920,1964377,2033976,2077465,2185914)
AND T_ACCT_SUM.ACCT_BAL <= 1000.00
AND T_ACCOUNT.AC_SK = T_ACCT_SUM.AC_SK
AND T_ACCT_SUM.BAL_DATE = '31-OCT-13';

Execution plan :

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 132 | 4752 | 124 (2)|
| 1 | HASH UNIQUE | | 132 | 4752 | 124 (2)|
| 2 | MERGE JOIN | | 132 | 4752 | 123 (1)|
| 3 | PARTITION RANGE SINGLE | | 128K| 2501K| 85 (0)|
| 4 | INLIST ITERATOR | | | | |
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID| T_ACCT_SUM | 128K| 2501K| 85 (0)|
|* 6 | INDEX RANGE SCAN | PK_T_ACCT_SUM | 5 | | 32 (0)|
|* 7 | SORT JOIN | | 106 | 1696 | 39 (3)|
| 8 | INLIST ITERATOR | | | | |
| 9 | TABLE ACCESS BY INDEX ROWID | T_ACCOUNT | 106 | 1696 | 38 (0)|
|* 10 | INDEX UNIQUE SCAN | PK_T_ACCOUNT | 106 | | 29 (0)|
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("NPF_AC_POSN_SUM"."ACCT_BAL"<=1000.00)
6 - access(("NPF_AC_POSN_SUM"."AC_SK"=167056 OR "NPF_AC_POSN_SUM"."AC_SK"=276364 OR
"NPF_AC_POSN_SUM"."AC_SK"=292510 OR "NPF_AC_POSN_SUM"."AC_SK"=375386 OR "NPF_AC_POSN_SUM"."AC_SK"=375387 OR
"NPF_AC_POSN_SUM"."AC_SK"=410216 OR "NPF_AC_POSN_SUM"."AC_SK"=550806 OR "NPF_AC_POSN_SUM"."AC_SK"=584324 OR
"NPF_AC_POSN_SUM"."AC_SK"=584325 OR "NPF_AC_POSN_SUM"."AC_SK"=585607 OR "NPF_AC_POSN_SUM"."AC_SK"=594305 OR
"NPF_AC_POSN_SUM"."AC_SK"=594306 OR "NPF_AC_POSN_SUM"."AC_SK"=597625 OR "NPF_AC_POSN_SUM"."AC_SK"=597626 OR
"NPF_AC_POSN_SUM"."AC_SK"=597627 OR "NPF_AC_POSN_SUM"."AC_SK"=597628 OR "NPF_AC_POSN_SUM"."AC_SK"=597629 OR
"NPF_AC_POSN_SUM"."AC_SK"=597632 OR "NPF_AC_POSN_SUM"."AC_SK"=597633 OR "NPF_AC_POSN_SUM"."AC_SK"=597634 OR
"NPF_AC_POSN_SUM"."AC_SK"=597636 OR "NPF_AC_POSN_SUM"."AC_SK"=597637 OR "NPF_AC_POSN_SUM"."AC_SK"=597638 OR
"NPF_AC_POSN_SUM"."AC_SK"=597639 OR "NPF_AC_POSN_SUM"."AC_SK"=597640 OR "NPF_AC_POSN_SUM"."AC_SK"=597641 OR
"NPF_AC_POSN_SUM"."AC_SK"=597642 OR "NPF_AC_POSN_SUM"."AC_SK"=597643 OR "NPF_AC_POSN_SUM"."AC_SK"=597644 OR
"NPF_AC_POSN_SUM"."AC_SK"=597645 OR "NPF_AC_POSN_SUM"."AC_SK"=597646 OR "NPF_AC_POSN_SUM"."AC_SK"=597647 OR
"NPF_AC_POSN_SUM"."AC_SK"=597648 OR "NPF_AC_POSN_SUM"."AC_SK"=597649 OR "NPF_AC_POSN_SUM"."AC_SK"=597650 OR
"NPF_AC_POSN_SUM"."AC_SK"=597651 OR "NPF_AC_POSN_SUM"."AC_SK"=597652 OR "NPF_AC_POSN_SUM"."AC_SK"=597653 OR
"NPF_AC_POSN_SUM"."AC_SK"=597654 OR "NPF_AC_POSN_SUM"."AC_SK"=597655 OR "NPF_AC_POSN_SUM"."AC_SK"=597656 OR
"NPF_AC_POSN_SUM"."AC_SK"=597657 OR "NPF_AC_POSN_SUM"."AC_SK"=597658 OR "NPF_AC_POSN_SUM"."AC_SK"=597659 OR
"NPF_AC_POSN_SUM"."AC_SK"=597660 OR "NPF_AC_POSN_SUM"."AC_SK"=597661 OR "NPF_AC_POSN_SUM"."AC_SK"=597662 OR
"NPF_AC_POSN_SUM"."AC_SK"=597663 OR "NPF_AC_POSN_SUM"."AC_SK"=597664 OR "NPF_AC_POSN_SUM"."AC_SK"=597665 OR
"NPF_AC_POSN_SUM"."AC_SK"=608683 OR "NPF_AC_POSN_SUM"."AC_SK"=709753 OR "NPF_AC_POSN_SUM"."AC_SK"=739855 OR
"NPF_AC_POSN_SUM"."AC_SK"=769598 OR "NPF_AC_POSN_SUM"."AC_SK"=769599 OR "NPF_AC_POSN_SUM"."AC_SK"=769600 OR
"NPF_AC_POSN_SUM"."AC_SK"=789267 OR "NPF_AC_POSN_SUM"."AC_SK"=831812 OR "NPF_AC_POSN_SUM"."AC_SK"=831813 OR
"NPF_AC_POSN_SUM"."AC_SK"=831814 OR "NPF_AC_POSN_SUM"."AC_SK"=831815 OR "NPF_AC_POSN_SUM"."AC_SK"=831816 OR
"NPF_AC_POSN_SUM"."AC_SK"=831817 OR "NPF_AC_POSN_SUM"."AC_SK"=886799 OR "NPF_AC_POSN_SUM"."AC_SK"=891301 OR
"NPF_AC_POSN_SUM"."AC_SK"=891302 OR "NPF_AC_POSN_SUM"."AC_SK"=896746 OR "NPF_AC_POSN_SUM"."AC_SK"=905441 OR
"NPF_AC_POSN_SUM"."AC_SK"=928452 OR "NPF_AC_POSN_SUM"."AC_SK"=952683 OR "NPF_AC_POSN_SUM"."AC_SK"=971664 OR
"NPF_AC_POSN_SUM"."AC_SK"=971665 OR "NPF_AC_POSN_SUM"."AC_SK"=971666 OR "NPF_AC_POSN_SUM"."AC_SK"=971667 OR
"NPF_AC_POSN_SUM"."AC_SK"=971668 OR "NPF_AC_POSN_SUM"."AC_SK"=971669 OR "NPF_AC_POSN_SUM"."AC_SK"=971670 OR
"NPF_AC_POSN_SUM"."AC_SK"=986885 OR "NPF_AC_POSN_SUM"."AC_SK"=987360 OR "NPF_AC_POSN_SUM"."AC_SK"=987859 OR
"NPF_AC_POSN_SUM"."AC_SK"=992765 OR "NPF_AC_POSN_SUM"."AC_SK"=1069330 OR "NPF_AC_POSN_SUM"."AC_SK"=1070973 OR
"NPF_AC_POSN_SUM"."AC_SK"=1075750 OR "NPF_AC_POSN_SUM"."AC_SK"=1201860 OR "NPF_AC_POSN_SUM"."AC_SK"=1201861 OR
"NPF_AC_POSN_SUM"."AC_SK"=1201862 OR "NPF_AC_POSN_SUM"."AC_SK"=1283345 OR "NPF_AC_POSN_SUM"."AC_SK"=1283346 OR
"NPF_AC_POSN_SUM"."AC_SK"=1312578 OR "NPF_AC_POSN_SUM"."AC_SK"=1312579 OR "NPF_AC_POSN_SUM"."AC_SK"=1312702 OR
"NPF_AC_POSN_SUM"."AC_SK"=1312703 OR "NPF_AC_POSN_SUM"."AC_SK"=1312704 OR "NPF_AC_POSN_SUM"."AC_SK"=1415629 OR
"NPF_AC_POSN_SUM"."AC_SK"=1446798 OR "NPF_AC_POSN_SUM"."AC_SK"=1446799 OR "NPF_AC_POSN_SUM"."AC_SK"=1520530 OR
"NPF_AC_POSN_SUM"."AC_SK"=1834159 OR "NPF_AC_POSN_SUM"."AC_SK"=1859876 OR "NPF_AC_POSN_SUM"."AC_SK"=1894444 OR
"NPF_AC_POSN_SUM"."AC_SK"=1932920 OR "NPF_AC_POSN_SUM"."AC_SK"=1964377 OR "NPF_AC_POSN_SUM"."AC_SK"=2033976 OR
"NPF_AC_POSN_SUM"."AC_SK"=2077465 OR "NPF_AC_POSN_SUM"."AC_SK"=2185914) AND
"NPF_AC_POSN_SUM"."VALN_AS_OF_DATE"='31-OCT-13')
7 - access("NPF_ACCOUNT"."AC_SK"="NPF_AC_POSN_SUM"."AC_SK")
filter("NPF_ACCOUNT"."AC_SK"="NPF_AC_POSN_SUM"."AC_SK")
10 - access("NPF_ACCOUNT"."AC_SK"=167056 OR "NPF_ACCOUNT"."AC_SK"=276364 OR "NPF_ACCOUNT"."AC_SK"=292510 OR
"NPF_ACCOUNT"."AC_SK"=375386 OR "NPF_ACCOUNT"."AC_SK"=375387 OR "NPF_ACCOUNT"."AC_SK"=410216 OR
"NPF_ACCOUNT"."AC_SK"=550806 OR "NPF_ACCOUNT"."AC_SK"=584324 OR "NPF_ACCOUNT"."AC_SK"=584325 OR
"NPF_ACCOUNT"."AC_SK"=585607 OR "NPF_ACCOUNT"."AC_SK"=594305 OR "NPF_ACCOUNT"."AC_SK"=594306 OR
"NPF_ACCOUNT"."AC_SK"=597625 OR "NPF_ACCOUNT"."AC_SK"=597626 OR "NPF_ACCOUNT"."AC_SK"=597627 OR
"NPF_ACCOUNT"."AC_SK"=597628 OR "NPF_ACCOUNT"."AC_SK"=597629 OR "NPF_ACCOUNT"."AC_SK"=597632 OR
"NPF_ACCOUNT"."AC_SK"=597633 OR "NPF_ACCOUNT"."AC_SK"=597634 OR "NPF_ACCOUNT"."AC_SK"=597636 OR
"NPF_ACCOUNT"."AC_SK"=597637 OR "NPF_ACCOUNT"."AC_SK"=597638 OR "NPF_ACCOUNT"."AC_SK"=597639 OR
"NPF_ACCOUNT"."AC_SK"=597640 OR "NPF_ACCOUNT"."AC_SK"=597641 OR "NPF_ACCOUNT"."AC_SK"=597642 OR
"NPF_ACCOUNT"."AC_SK"=597643 OR "NPF_ACCOUNT"."AC_SK"=597644 OR "NPF_ACCOUNT"."AC_SK"=597645 OR
"NPF_ACCOUNT"."AC_SK"=597646 OR "NPF_ACCOUNT"."AC_SK"=597647 OR "NPF_ACCOUNT"."AC_SK"=597648 OR
"NPF_ACCOUNT"."AC_SK"=597649 OR "NPF_ACCOUNT"."AC_SK"=597650 OR "NPF_ACCOUNT"."AC_SK"=597651 OR
"NPF_ACCOUNT"."AC_SK"=597652 OR "NPF_ACCOUNT"."AC_SK"=597653 OR "NPF_ACCOUNT"."AC_SK"=597654 OR
"NPF_ACCOUNT"."AC_SK"=597655 OR "NPF_ACCOUNT"."AC_SK"=597656 OR "NPF_ACCOUNT"."AC_SK"=597657 OR
"NPF_ACCOUNT"."AC_SK"=597658 OR "NPF_ACCOUNT"."AC_SK"=597659 OR "NPF_ACCOUNT"."AC_SK"=597660 OR
"NPF_ACCOUNT"."AC_SK"=597661 OR "NPF_ACCOUNT"."AC_SK"=597662 OR "NPF_ACCOUNT"."AC_SK"=597663 OR
"NPF_ACCOUNT"."AC_SK"=597664 OR "NPF_ACCOUNT"."AC_SK"=597665 OR "NPF_ACCOUNT"."AC_SK"=608683 OR
"NPF_ACCOUNT"."AC_SK"=709753 OR "NPF_ACCOUNT"."AC_SK"=739855 OR "NPF_ACCOUNT"."AC_SK"=769598 OR
"NPF_ACCOUNT"."AC_SK"=769599 OR "NPF_ACCOUNT"."AC_SK"=769600 OR "NPF_ACCOUNT"."AC_SK"=789267 OR
"NPF_ACCOUNT"."AC_SK"=831812 OR "NPF_ACCOUNT"."AC_SK"=831813 OR "NPF_ACCOUNT"."AC_SK"=831814 OR
"NPF_ACCOUNT"."AC_SK"=831815 OR "NPF_ACCOUNT"."AC_SK"=831816 OR "NPF_ACCOUNT"."AC_SK"=831817 OR
"NPF_ACCOUNT"."AC_SK"=886799 OR "NPF_ACCOUNT"."AC_SK"=891301 OR "NPF_ACCOUNT"."AC_SK"=891302 OR
"NPF_ACCOUNT"."AC_SK"=896746 OR "NPF_ACCOUNT"."AC_SK"=905441 OR "NPF_ACCOUNT"."AC_SK"=928452 OR
"NPF_ACCOUNT"."AC_SK"=952683 OR "NPF_ACCOUNT"."AC_SK"=971664 OR "NPF_ACCOUNT"."AC_SK"=971665 OR
"NPF_ACCOUNT"."AC_SK"=971666 OR "NPF_ACCOUNT"."AC_SK"=971667 OR "NPF_ACCOUNT"."AC_SK"=971668 OR
"NPF_ACCOUNT"."AC_SK"=971669 OR "NPF_ACCOUNT"."AC_SK"=971670 OR "NPF_ACCOUNT"."AC_SK"=986885 OR
"NPF_ACCOUNT"."AC_SK"=987360 OR "NPF_ACCOUNT"."AC_SK"=987859 OR "NPF_ACCOUNT"."AC_SK"=992765 OR
"NPF_ACCOUNT"."AC_SK"=1069330 OR "NPF_ACCOUNT"."AC_SK"=1070973 OR "NPF_ACCOUNT"."AC_SK"=1075750 OR
"NPF_ACCOUNT"."AC_SK"=1201860 OR "NPF_ACCOUNT"."AC_SK"=1201861 OR "NPF_ACCOUNT"."AC_SK"=1201862 OR
"NPF_ACCOUNT"."AC_SK"=1283345 OR "NPF_ACCOUNT"."AC_SK"=1283346 OR "NPF_ACCOUNT"."AC_SK"=1312578 OR
"NPF_ACCOUNT"."AC_SK"=1312579 OR "NPF_ACCOUNT"."AC_SK"=1312702 OR "NPF_ACCOUNT"."AC_SK"=1312703 OR
"NPF_ACCOUNT"."AC_SK"=1312704 OR "NPF_ACCOUNT"."AC_SK"=1415629 OR "NPF_ACCOUNT"."AC_SK"=1446798 OR
"NPF_ACCOUNT"."AC_SK"=1446799 OR "NPF_ACCOUNT"."AC_SK"=1520530 OR "NPF_ACCOUNT"."AC_SK"=1834159 OR
"NPF_ACCOUNT"."AC_SK"=1859876 OR "NPF_ACCOUNT"."AC_SK"=1894444 OR "NPF_ACCOUNT"."AC_SK"=1932920 OR
"NPF_ACCOUNT"."AC_SK"=1964377 OR "NPF_ACCOUNT"."AC_SK"=2033976 OR "NPF_ACCOUNT"."AC_SK"=2077465 OR
"NPF_ACCOUNT"."AC_SK"=2185914)

Table T_ACCT_SUM is range partitioned on BAL_DATE column. ACCT_BAL column has height balanced histogram.
Database is 11.2.0.3 version 4 node RAC system.
Execution plan is collected using "SET AUTOTRACE ON" command.

On line 6, cardinality calculated is "5" using index range scan. But on line 5, cardinality has increased to 128K for same table. My assumption was that, cardinality returned by index should be greater than or same as the cardinality for its corresponding table. Here, the increase is huge. Can you please help me understand what I am missing?

Thanks!

Simple vocabulary would have made the answer better

fero31@yahoo.co.in, March 23, 2015 - 8:08 pm UTC

I normally dont write reviews but I had to say this, the answers could have been better if it were written with simple vocabulary. I was looking for different oracle questions, always this website came up first, so I would read this answer and then had to move to other websites as I dont comprehend the style of writing that much.

Histogram and Explain plan commands

Rajeshwaran, Jeyabal, January 17, 2017 - 6:37 am UTC

Team,

Looking into the histogram the estimated cardinality for the bind value SYS is (77043-41873) 35170, but the explain plan reports the cardinality as 4075 (which is num_rows/NDV i.e. 77426/19), any reason why the explain plan doesn't make use of Histograms to report the correct cardinality ?

demo@ORA12C> create table t as
  2  select *
  3  from all_objects;

Table created.

demo@ORA12C>
demo@ORA12C> create index t_idx on t(owner);

Index created.

demo@ORA12C> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for all indexed columns size 20');
  6  end;
  7  /

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> select count(*),count(distinct owner) from t;

  COUNT(*) COUNT(DISTINCTOWNER)
---------- --------------------
     77426                   19

1 row selected.

demo@ORA12C>
demo@ORA12C> select num_distinct,num_buckets,histogram,num_nulls
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

NUM_DISTINCT NUM_BUCKETS HISTOGRAM        NUM_NULLS
------------ ----------- --------------- ----------
          19          19 FREQUENCY                0

1 row selected.

demo@ORA12C>
demo@ORA12C> column endpoint_actual_value format a20
demo@ORA12C> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU
--------------- --------------------
            248 APEX_040200
            356 CTXSYS
            365 DEMO
            382 DVF
            393 DVSYS
            412 GSMADMIN_INTERNAL
            494 LBACSYS
           1658 MDSYS
           1676 OLAPSYS
           1681 ORDDATA
           1686 ORDPLUGINS
           4786 ORDSYS
           4793 OUTLN
          41869 PUBLIC
          41873 SCOTT
          77043 SYS
          77056 SYSTEM
          77126 WMSYS
          77426 XDB

19 rows selected. 

demo@ORA12C> variable x varchar2(20);
demo@ORA12C> exec :x := 'SYS';

PL/SQL procedure successfully completed.

demo@ORA12C>
demo@ORA12C> explain plan for
  2  select * from t
  3  where owner =:x;

Explained.

demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 767293772

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |  4075 |   461K|   141   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |  4075 |   461K|   141   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |  4075 |       |    10   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"=:X)

14 rows selected.

demo@ORA12C> exec :x := 'SCOTT';

PL/SQL procedure successfully completed.

demo@ORA12C> explain plan for
  2  select * from t
  3  where owner =:x;

Explained.

demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 767293772

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |  4075 |   461K|   141   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |  4075 |   461K|   141   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |  4075 |       |    10   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"=:X)

14 rows selected.

demo@ORA12C>

January 18, 2017 - 1:33 am UTC

An explain plan *command* doesn't peek. It's one of the flaws of using it. Better to run it and use dbms_xplan.display_cursor


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t as
  2  select *
  3  from all_objects;

Table created.

SQL>
SQL> create index t_idx on t(owner);

Index created.

SQL>
SQL> begin
  2     dbms_stats.gather_table_stats(
  3             ownname=>user,
  4             tabname=>'T',
  5             method_opt=>'for all indexed columns size 35');
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*),count(distinct owner) from t;

  COUNT(*) COUNT(DISTINCTOWNER)
---------- --------------------
    101321                   39

1 row selected.

SQL>
SQL> select num_distinct,num_buckets,histogram,num_nulls
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='OWNER';

NUM_DISTINCT NUM_BUCKETS HISTOGRAM        NUM_NULLS
------------ ----------- --------------- ----------
          39          35 TOP-FREQUENCY            0

1 row selected.

SQL>
SQL> column endpoint_actual_value format a20
SQL> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU
--------------- --------------------
           3021 APEX_040200
           6378 APEX_050000
           9542 APEX_050100
           9564 APEX_LISTENER
           9724 ASKTOM
           9733 AUDSYS
           9741 BI
          10142 CTXSYS
          10197 DBSNMP
          10216 DVF
          10508 DVSYS
          10520 FLOWS_FILES
          10624 GSMADMIN_INTERNAL
          10658 HR
          10895 LBACSYS
          14055 MCDONAC
          16154 MDSYS
          16171 OJVMSYS
          16196 OLAPSYS
          16470 ORDDATA
          16480 ORDPLUGINS
          19640 ORDSYS
          19848 ORDS_METADATA
          19856 OUTLN
          57418 PUBLIC
          57428 RASADM
          57444 SCOTT
          57735 SH
          57743 SI_INFORMTN_SCHEMA
          57790 SOE
          99915 SYS
         100513 SYSTEM
         100531 WHS_MD
         100902 WMSYS
         101306 XDB

35 rows selected.

SQL>
SQL> variable x varchar2(20);
SQL> exec :x := 'SCOTT';

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where owner =:x;

...

16 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7yvn3fqx5x8tp, child number 0
-------------------------------------
select * from t where owner =:x

Plan hash value: 767293772

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |    16 |  1824 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |    16 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OWNER"=:X)


19 rows selected.

SQL>
SQL>

Breadcrumb

Question and Answer

You Asked

and Tom said...

Rating

Comments

Cardinality of multiple predicates on same table

Dynamic sampling

but does selectivity get confused?

Cardinality as a Collection function

-

About Cost

Getting the right cardinality in 9i

Getting the right cardinality

getting the right cardinality

optimizer_dynamic_sampling

cardinality hint vs. literals

Incorrect Cardinality

My Two Questions

Excellent

Why the cardinality seems not to apply here?

Cardinality

Cardinality

Cardinality for joins

Thanks Tom..

cardinality , confuse.

dynamic_sampling vs skewness

Why the difference?

It is taking

Apex collections

1=2

thankyou

Cardinality mismatch

Simple vocabulary would have made the answer better

Histogram and Explain plan commands

More to Explore

PL/SQL demos

PL/SQL docs