Cardinality of multiple predicates on same table
Jay, May 05, 2005 - 12:53 pm UTC
Can the optimizer estimate the cardinality of a combination of predicates on the same table with reasonable accuracy ?
For example:
select *
from t
where c1 = 1
and c2 = 20
;
Analyzing the table t would provide cardinality estimates for predicates [c1 = 1] and [c2 = 20]. But what about the cardinality of the combination of both predicates ? How does the optimizer guess this ? How do we improve the optimizer guess about this ?
The cardinality of the combination is the most important info here, because if table t was part of a larger query (with other tables and joins), and if the the predicates above were evaluated first on table t, then the cardinality estimate of the predicate combination might dictate what operation (Hash join, nested loops etc.) the optmizer chooses to join the row source from table t (after applying the predicates) with other row sources.
May 05, 2005 - 1:25 pm UTC
This is where sql profiles in 10g come in really nice. You can in effect "analyze a query".
But even before, dynamic sampling can be used to great affect. It comes into play during the hard parse phase (optimization).
consider:
ops$tkyte@ORA9IR2> create table t ( x int, y int );
Table created.
ops$tkyte@ORA9IR2> insert into t
2 select mod(rownum,100), mod(rownum,250)
3 from all_objects;
27951 rows created.
ops$tkyte@ORA9IR2> update t set y = 42 where x = 55;
279 rows updated.
ops$tkyte@ORA9IR2> commit;
Commit complete.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 254' );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where y = 42 and x = 55;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=3 Bytes=18)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=3 Bytes=18)
<b>it thinks 3 based on x with 100 values and y with 250, if we dynamic sample at level 4 (see the performance and tuning guide for what the levels are), it ran a sampling query to discover this information in order to parse</b>
ops$tkyte@ORA9IR2> select /*+ DYNAMIC_SAMPLING(t 4) */ * from t where y = 42 and x = 55;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=279 Bytes=1674)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=279 Bytes=1674)
<b>quite different -- below we'll see why</b>
ops$tkyte@ORA9IR2> select * from t where y = 42 and x = 56;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=3 Bytes=18)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=3 Bytes=18)
ops$tkyte@ORA9IR2> select /*+ DYNAMIC_SAMPLING(t 4) */ * from t where y = 42 and x = 56;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=9 Card=1 Bytes=6)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=9 Card=1 Bytes=6)
ops$tkyte@ORA9IR2> set autotrace off
ops$tkyte@ORA9IR2> select count(*) from t where y = 42 and x = 55;
COUNT(*)
----------
279
ops$tkyte@ORA9IR2> select count(*) from t where y = 42 and x = 56;
COUNT(*)
----------
0
Dynamic sampling
Jay, May 05, 2005 - 3:10 pm UTC
Thanks for the info. Dynamic sampling does open a whole new avenue for estimating single table cardinality for multiple predicates.
1. Since you mention that dynamic sampling occurs during hard parse, how is it done for query with bind variables ? Does bind variable peeking happen and sampling done for bind variable values ? If so one should be aware that the cardinality estimates might be totally off-mark when the same plan is re-used for different bind variable values.
2. In the Oracle 9i R2 database performance guide and reference manual page 6-8, the following is mentioned:
"Dynamic sampling is repeatable if no rows have been inserted, deleted, or updated in the table being sampled". What does the above statement mean ?
3. Also what does dynamic sampling at the cursor level (i.e. with hint /*+ DYNAMIC_SAMPLING(<integer>) */ do ? Does it means that dynamic sampling is done for all tables (with predicates applied) in the query ?
May 05, 2005 - 5:57 pm UTC
1) it'll use bind variable peeking with all that entails and implies. The first one to hard parse sets the plan.
2) means (my interpretation) that if you hard parse a query, and then 5 minutes later hard parse the same query -- the plan could be different since the dynamic sample is dependent on the data that exists RIGHT THEN. It is like doing a mini-analyze during each hard parse. (it it more than like that, you could say it is a small analyze)
3) see the docs for the hint -- it changes the meaning of level. chapter 5 page 40 in the 9ir2 doc set.
but does selectivity get confused?
Anonymous :-), May 06, 2005 - 12:02 am UTC
That was a nice definition of cardinality (where did it come from?). What I at least THINK I've observed is that the term gets confused with selectivity, which I think is used in 2 different ways. When one talks about an index selectivity, I've seen it as - number of distinct rows divided by the total number of rows (cardinality of the table?) so that as you approach 1, you have a selective index. I think I've also seen it as how many rows come back on average from the total so that in the case of a 100 distinct row table, a selectivity of .01 would "be good", or as you approach zero that is "good" as oppossed to 1. Would you agree with this observation? Also, cardinality is the # of rows within the context of an operation, and has nothing to do with # of distinct rows, correct?
May 06, 2005 - 7:24 am UTC
I've been using </code>
http://www.dictionary.com/ <code>recently. fast, minimal UI.
cardinality has to be taken in context.
There is the cardinality of a table (number of rows in table)
There is the cardinality of a row source operation (step in a plan) -- the number of rows flowing out of that step.
A result set (which technically is a table!) has a cardinality.
The steps in the plan have cardinality.
cardinality is the count.
Cardinality as a Collection function
Rahul, May 10, 2005 - 6:57 pm UTC
Hi Tom,
I was thinking that CARDINALITY is a collection function like here in the link.
</code>
http://www.cis.unisa.edu.au/oracle/server.101/b10759/functions014.htm#i1269375 <code>
But here it is discussed as a HINT for optimizer.
I am guessing that they are not related. You mentioned that cardinality was in 9i itself( as a statement for my earlier post in sqlloader). Which cardinality did you mean by that?
Thank you,
Rahul
May 10, 2005 - 11:30 pm UTC
cardinality in that context is a function that returns the count of the elements in a collection.
-
Ik, May 11, 2005 - 4:43 am UTC
Tom,
This is on your follow up to the "original" question.
The cardinality shown here (below) is way off. But, how come the cost is accurate? I mean, how did the optimiser make the right guess that it just has to do 1 LIO to get the row? Is it from the stats in user_indexes? (leaf_blocks?)
Pasted below (from your follow up)
ops$tkyte@ORA10G> select * from t where object_id = 42;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=482 Bytes=44826)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=482
Bytes=44826)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=193)
Thank You,
May 11, 2005 - 7:35 am UTC
the only stats that where missing were the column stats, everything else (table, index) were there.
clustering factor, height, all of that stuff.
About Cost
Muhammad Riaz Shahid, May 12, 2005 - 12:50 am UTC
So Tom! how optimizer calculates the cost for individual operation (FTS, IRS, IFFS etc) ?
May 12, 2005 - 7:44 am UTC
it takes the various parameter settings in place and using the built in model -- assigns costs. There is no single constant.
Getting the right cardinality in 9i
Jay, June 30, 2005 - 6:21 pm UTC
Consider the following example:
create table t
as
select trunc(rownum/10) c1, trunc(rownum/10) + 5 c2
-- c1 and c2 values go together i.e. when c1 = 5 then c2 = 10
-- and when c2 = 10 then c1 = 5
from all_objects
;
exec dbms_stats.gather_table_stats(ownname=>'GSP_DBO',tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> show parameter dynamic
NAME TYPE VALUE
------------------------------------ ----------- ------
optimizer_dynamic_sampling integer 1
explain plan for
select *
from t
where c1 = 5
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 70 | 3 |
|* 1 | TABLE ACCESS FULL | T | 10 | 70 | 3 |
--------------------------------------------------------------------
explain plan for
select *
from t
where c2 = 10
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 70 | 3 |
|* 1 | TABLE ACCESS FULL | T | 10 | 70 | 3 |
--------------------------------------------------------------------
explain plan for
select *
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 3 |
|* 1 | TABLE ACCESS FULL | T | 1 | 7 | 3 |
--------------------------------------------------------------------
^^^^ Optimizer cardinality estimate way off here since it does
not know that cardinality of the combination of predicates
is the same as for one of those predicates
explain plan for
select /*+ DYNAMIC_SAMPLING(t 1) */ *
from t
where c1 = 5
and c2 = 10
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 70 | 3 |
|* 1 | TABLE ACCESS FULL | T | 10 | 70 | 3 |
--------------------------------------------------------------------
Q1. The right cardinality here, maybe because of dynamic sampling hint,
but this is dynamic sampling level 1 (and level 1 conditions are not satisfied
here). So how come correct cardinality ?
Infact the optimizer comes up with the correct cardinality
with dynamic sampling hints of levels 2 and 3 also.
The definition of level 2 does not apply here.
Level 3 not sure if it applies.
Q2. If dynamic sampling level 1 does apply here, why wasn't it
applied without the hint since optimizer_dynamic_sampling = 1 ?
Infact changing optimizer_dynamic_sampling to 2,3 does not
have any effect - the cardinality is still incorrect.
Only changing optimizer_dynamic_sampling to 4
results in the correct cardinality (level 4 does apply here
- single-table predicates that reference 2 or more columns ).
Q3. In cases where such SQLs (as demonstrated in the example above)
are issued by a third party application, we have no control over SQL
and hence cannot insert hints. To get the correct cardinality, is
there any option (in Oracle 9i) other than changing
optimizer_dynamic_sampling parameter ?
Excerpt from Oracle 9.2 Performance tuning guide
-----------------------------------------------
The sampling levels are as follows if the dynamic sampling level used is from a
cursor hint or from the optimizer_dynamic_sampling parameter:
Level 1: Sample all tables that have not been analyzed if the following criteria
are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed
table is joined to another table or appears in a subquery or non-mergeable view;
(3) this unanalyzed table has no indexes; (4) this unanalyzed table has more
blocks than the number of blocks that would be used for dynamic sampling of
this table. The number of blocks sampled is the default number of dynamic
sampling blocks (32).
Level 2: Apply dynamic sampling to all unanalyzed tables. The number of
blocks sampled is the default number of dynamic sampling blocks.
Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all
tables for which standard selectivity estimation used a guess for some predicate
that is a potential dynamic sampling predicate. The number of blocks sampled
is the default number of dynamic sampling blocks.
Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all
tables that have single-table predicates that reference 2 or more columns. The
number of blocks sampled is the default number of dynamic sampling blocks.
---------------------------------------------
June 30, 2005 - 8:42 pm UTC
if you trace the hinted one:
SELECT /*+ ALL_ROWS IGNORE_WHERE_CLAUSE */ NVL(SUM(C1),0), NVL(SUM(C2),0)
FROM
(SELECT /*+ IGNORE_WHERE_CLAUSE NOPARALLEL("T") */ 1 AS C1, CASE WHEN
"T"."C1"=5 AND "T"."C2"=10 THEN 1 ELSE 0 END AS C2 FROM "T" "T") SAMPLESUB
it did sample it.
You need to page down in the performance guide:
...
The sampling levels are as follows if the dynamic sampling level used is from a table hint:
Level 0: Do not use dynamic sampling.
Level 1: The number of blocks sampled is the default number of dynamic
sampling blocks (32).
You put a table in there and said "DO IT"
Getting the right cardinality
Jay, June 30, 2005 - 9:23 pm UTC
Thanks for pointing that out.
Could you please respond to Question 3 in my post above.
June 30, 2005 - 9:31 pm UTC
but even the optimizer dynamic sampling parameter won't do it, it is the hint that lets you have the table in there.
10g - SQL Profiles, yes.
getting the right cardinality
Jay, July 01, 2005 - 11:24 am UTC
But changing optimizer_dynamic_sampling to 4 will sample table t in the query (with predicates on c1 and c2) without the dynamic sampling hint.
July 01, 2005 - 12:07 pm UTC
give it a try (it is what I would do, test your theory out)
optimizer_dynamic_sampling
Jay, July 01, 2005 - 12:54 pm UTC
Ok here is the proof:
create table t
as
select trunc(rownum/10) c1, trunc(rownum/10) + 5 c2
-- c1 and c2 values go together i.e. when c1 = 5 then c2 = 10
-- and when c2 = 10 then c1 = 5
from all_objects
;
exec dbms_stats.gather_table_stats(ownname=>'<Schema Name>',tabname=>'T',method_opt=>'FOR ALL COLUMNS SIZE 254');
SQL> show parameter dynamic
NAME TYPE VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling integer 1
explain plan for
select *
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 3 |
|* 1 | TABLE ACCESS FULL | T | 1 | 7 | 3 |
--------------------------------------------------------------------
alter session set optimizer_dynamic_sampling=4;
SQL> show parameter dynamic
NAME TYPE VALUE
------------------------------------ ----------- -----
optimizer_dynamic_sampling integer 4
explain plan for
select *
from t
where c1 = 5
and c2 = 10
-- one of the conditions is redundant
;
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 70 | 3 |
|* 1 | TABLE ACCESS FULL | T | 10 | 70 | 3 |
--------------------------------------------------------------------
July 01, 2005 - 1:53 pm UTC
there you go then.
cardinality hint vs. literals
A reader, January 19, 2006 - 8:04 pm UTC
Oracle 9.2.0.6 with CBO
select * from foo
where col1 in (1,2,3)
select * from foo
where col1 in (select /*+ cardinality(t 3) */ c1 from bar t)
Why is the execution plan different for the 2 statements above? The first one does a INLIST ITERATOR for col1 (indexed) so everythins is great.
The second one does some crazy things and slows things down.
As far as the CBO knows, the IN is going to give it 3 rows, so why doesn't it do the INLIST ITERATOR thing for both cases?
Thanks
January 20, 2006 - 9:22 am UTC
need full example, as to what the "crazy" thing is - but the inlist interator is for hard coded inlists.
I would expect the plans to be different, but sort of the same, like this:
ops$tkyte@ORA9IR2> create table foo ( col1 int primary key, col2 int, data char(80) );
Table created.
ops$tkyte@ORA9IR2> drop table bar;
Table dropped.
ops$tkyte@ORA9IR2> create table bar ( c1 int );
Table created.
ops$tkyte@ORA9IR2> insert /*+ append */ into foo select rownum, rownum, rownum from all_objects;
28429 rows created.
ops$tkyte@ORA9IR2> insert /*+ append */ into bar select rownum from all_objects where rownum <= 3;
3 rows created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'FOO', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'BAR', cascade=>true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from foo
2 where col1 in (1,2,3);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=3 Bytes=270)
1 0 INLIST ITERATOR
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'FOO' (Cost=3 Card=3 Bytes=270)
3 2 INDEX (RANGE SCAN) OF 'SYS_C009866' (UNIQUE) (Cost=2 Card=3)
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from foo
2 where col1 in (select /*+ cardinality(t 3) */ c1 from bar t);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=52 Card=3 Bytes=279)
1 0 NESTED LOOPS (Cost=52 Card=3 Bytes=279)
2 1 SORT (UNIQUE)
3 2 TABLE ACCESS (FULL) OF 'BAR' (Cost=2 Card=3 Bytes=9)
4 1 TABLE ACCESS (BY INDEX ROWID) OF 'FOO' (Cost=1 Card=1 Bytes=90)
5 4 INDEX (UNIQUE SCAN) OF 'SYS_C009866' (UNIQUE)
ops$tkyte@ORA9IR2> set autotrace off
Incorrect Cardinality
Jara, July 31, 2006 - 7:57 am UTC
Hi Tom,
Please can you help me getting to the bottom of why the cardinalities are incorrect in the following query.
Here are all the details ...
user01@MYDB>select count(*) from test_rship;
COUNT(*)
----------
2875047
Elapsed: 00:00:00.78
user01@MYDB>select count(*) from test_rship
2 where prod_parent_type = 'BP'
3 and prod_child_type = 'CUA';
COUNT(*)
----------
683442
Elapsed: 00:00:02.18
user01@MYDB>select count(*) from test_rship
2 where prod_parent_type = 'CUA'
3 and prod_child_type = 'CUV';
COUNT(*)
----------
741223
Elapsed: 00:00:02.26
1 begin
2 dbms_stats.gather_table_stats(user,'TEST_RSHIP',
3 estimate_percent => dbms_stats.auto_sample_size,
4 method_opt => 'for all columns size auto');
5* end;
user01@MYDB>/
PL/SQL procedure successfully completed.
Elapsed: 00:01:22.23
user01@MYDB>select column_name,num_distinct,to_char(last_analyzed,'dd/mm/yyyy')
2 from user_tab_col_statistics
3 where table_name = 'TEST_RSHIP'
4 and column_name in ('PROD_PARENT_TYPE','PROD_CHILD_TYPE');
COLUMN_NAME NUM_DISTINCT TO_CHAR(LA
------------------------------ ------------ ----------
PROD_PARENT_TYPE 3 31/07/2006
PROD_CHILD_TYPE 4 31/07/2006
user01@MYDB>explain plan
2 for
3 select *
4 from
5 (
6 select prod_parent_id pb_prod_gentd_id
7 , parent_iso_ctry_code pb_ctry_code
8 , parent_auth_status pb_auth_status
....
....
....
26 , child_ean cua_ean
27 from test_rship
28 where prod_parent_type = 'BP'
29 and prod_child_type = 'CUA'
30 ) pb_cua
31 join
32 ( select prod_parent_id cua_prod_gentd_id2
33 , parent_iso_ctry_code cua_ctry_code2
34 , prod_child_id cuv_prod_gentd_id
....
.....
....
42 , child_last_auth_upd_dt cuv_last_auth_upd_dt
43 from test_rship
44 where prod_parent_type = 'CUA'
45 and prod_child_type = 'CUV'
46 ) cua_cuv
47 on pb_cua.cua_prod_gentd_id1 = cua_cuv.cua_prod_gentd_id2
48 and pb_cua.cua_ctry_code1 = cua_cuv.cua_ctry_code2;
user01@MYDB>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 15M| | 8283 |
|* 1 | HASH JOIN | | 105K| 15M| 16M| 8283 |
|* 2 | TABLE ACCESS FULL | TEST_RSHIP | 163K| 14M| | 3798 |
|* 3 | TABLE ACCESS FULL | TEST_RSHIP | 385K| 22M| | 3798 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------
1 - access("TEST_RSHIP"."PROD_CHILD_ID"="TEST_RSHIP"."PROD_PARENT_ID"
AND "TEST_RSHIP"."CHILD_ISO_CTRY_CODE"="TEST_RSHIP"."PARENT_ISO_CT
RY_CODE")
2 - filter("TEST_RSHIP"."PROD_PARENT_TYPE"='BP' AND
"TEST_RSHIP"."PROD_CHILD_TYPE"='CUA')
3 - filter("TEST_RSHIP"."PROD_PARENT_TYPE"='CUA' AND
"TEST_RSHIP"."PROD_CHILD_TYPE"='CUV')
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Note: cpu costing is off
21 rows selected.
So, as can be seen the cardinality value for the full scan on the TEST_RSHIP table after the filter conditions have been applied is way off from the actual value. I have collected the stats/histograms but still no joy. I have even run the query and have looked at the cardinality values in the V$SQL_PLAN table and they are same as in the explain plan.
Please can you tell me why the cardinality is not right and what I can do to get it as near as the actual values.
Thanks
July 31, 2006 - 8:38 am UTC
the optimizer sees "discrete things"
It gets statistics on
o A table
o A column in that table
o An index
o The system
all individual statistics. It then uses 12th grade statistics to guess from there. Suppose you had 1000 people in a room (example gratuitously borrowed from Jonathan Lewis). You poll the room and ask "how many of you are born in the month of December?" - 1/12th of the room will raise their hands approximiately. Then you ask "How many of you are of the zodiac sign Pisces?" Again, about 1/12th of the room will raise their hands.
Now, you ask - "and how many of you Pisces are born in December?". The optimizer will guess 1/144th of the room (about 7). The truth? ZERO - all pisces are born in feb/mar. There is a correlation between the two columns and the optimizer doesn't see it because the optimizer has statistics about individual things.
Enter perhaps dynamic sampling. Consider:
ops$tkyte@ORA10GR2> create table t
2 as select decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
3 decode( mod(rownum,2), 0, 'Y', 'N' ) flag2, a.*
4 from all_objects a
5 /
Table created.
ops$tkyte@ORA10GR2> create index t_idx on t(flag1,flag2);
Index created.
ops$tkyte@ORA10GR2> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T',
4 method_opt=>'for all indexed columns size 254' );
5 end;
6 /
PL/SQL procedure successfully completed.
<b>we have a table such that when flag1 = 'Y', flag2 will be 'N' and vice versa. NEVER are the two columns set to Y,Y or N,N.
There are:</b>
ops$tkyte@ORA10GR2> select num_rows, num_rows/2, num_rows/2/2 from user_tables where table_name = 'T';
NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2
---------- ---------- ------------
50295 25147.5 12573.75
<b>50,000 rows - when we use "where flag1 = 'N'" the optimizer will rightly guess about 25,000 rows.
It knows "50% of the rows are flag1='N'"
It knows "50% of the rows are flag2='N'"
It will guess therefore that about 25% of the rows are flag1=N and flag2=N, we can see this:</b>
ops$tkyte@ORA10GR2> select * from t where flag1='N';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25268 | 2393K| 167 (2)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| T | 25268 | 2393K| 167 (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAG1"='N')
ops$tkyte@ORA10GR2> select * from t where flag2='N';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25027 | 2370K| 167 (2)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| T | 25027 | 2370K| 167 (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAG2"='N')
ops$tkyte@ORA10GR2> select * from t where flag1='N' and flag2='N';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 12573 | 1190K| 167 (2)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| T | 12573 | 1190K| 167 (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("FLAG2"='N' AND "FLAG1"='N')
<b>that last bit isn't right - it is way off. Enter dynamic sampling:</b>
ops$tkyte@ORA10GR2> select /*+ dynamic_sampling(t 3) */ * from t where flag1='N' and flag2='N';
Execution Plan
----------------------------------------------------------
Plan hash value: 470836197
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 388 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 4 | 388 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 4 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("FLAG1"='N' AND "FLAG2"='N')
Note
-----
- dynamic sampling used for this statement
<b>here the optimizer ran a query to hard parse our query - it ran this query because we had dynamic sampling set to 3 and the optimizer knew it "guessed" at a cardinality. It did a quick scan of the data to figure out how close it was and adjusted the cardinality appropriately</b>
My Two Questions
Arindam Mukherjee, August 19, 2006 - 3:00 am UTC
Respected Mr. Tom,
I have already read three (3) threads of your site – “Cardinality”, “density” and “Index rebuild”. But still my idea is not clear on the following three terms. You please explain it in very simple way with your inimitable style. I have read as follows from Oracle document.
Cardinality represents the number of rows in a row set. – I think It’s COUNT of ROWS
The first measure, selectivity, represents a fraction of rows from a row set. – Another form of Count may be percentage.
So on reading those lines, my questions are as follows.
Q1> What are low selectivity, poor selectivity and low cardinality? Is there any relationship with “density”? If yes, How?
Q2> Quote from Oracle 9i concept (Chapter : 10) - “The optimizer can use an existing index to build another index. This results in a much faster index build.”
Does it speak about Index Rebuild? If yes, on what situation does Optimizer itself build a new index?
August 19, 2006 - 4:48 am UTC
cardinality is simply the "count of elements in a set". Many times however, we use cardinality meaning "distinct cardinality" when discussing selectivity. But, cardinality is defined in the original answer above.
q1) selectivity is a measure, a ratio if you will, of how many tuples/rows out of a set a where clause will return.
For example, using ALL_OBJECTS:
select * from all_objects where owner = 'SYS'
where owner = 'SYS' could be said to be "not very selective, having low selectivity" because many rows are such that "owner='SYS'" is true. However
where owner = 'SCOTT'
could be said to have high selectivity, it is very selective, because few rows are such that owner = 'SCOTT' is true.
low selectivity, not very selective, poor selectivity are all the same.
low cardinality is "few rows", typically would be synonymous with "highly selective"
density, from an Oracle cost base optimizer perspective, is a number used to guess the cardinality. It is a measure of how many rows are expected back from a predicate of the general form: 'where column = constant/:bind'. If you do not have histograms, Oracle will use the 'density' to guess how many rows that predicate will return.
q2) it is referring to what it will do when you issue a create index statement, the optimizer itself doesn't choose to build a new index.
Excellent
Arindam Mukherjee, August 19, 2006 - 6:51 am UTC
Why the cardinality seems not to apply here?
Orlando Reyes, January 10, 2007 - 3:51 pm UTC
Tom,
I have a table with over 30M records (31,705,533) with an index on field prmk (OPT_MEMBER_ATTRIBUTE_IDX2) and the number of unique keys for this index is 60. So when running query ¿SELECT * FROM facets.opt_member_attribute WHERE prmk = '538';¿ I would not expect Oracle to use the index at all, however, the explain plan looks like:
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 422K| 40M| 4240 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| OPT_MEMBER_ATTRIBUTE | 422K| 40M| 4240 (1)|
|* 2 | INDEX RANGE SCAN | OPT_MEMBER_ATTRIBUTE_IDX28 | 422K| | 486 (0)|
------------------------------------------------------------------------------------------------
Both the table and the index are analyzed and the total number of records in the table is 55M and the clustering factor for the index is 1,421,538.
And in fact when I run the query it works in a couple of seconds. I was actually expecting a full table scan.
I am using otimizer_index_caching = 90 and optimizer_index_cost_adj =30.
Can you give me your ideas of why this happens? Shouldn¿t Oracle ignore the index because of the low cardinality? Isn¿t this a very lousy selectivity for the index? And a poor choice for the index?
As always,
Thanks for you help.
Sincerely yours,
Orlando Reyes
Cardinality
Anand Varadarajan, May 23, 2007 - 7:20 am UTC
Th first set of output is from production and the second set of values from testing. The number of records retrieved is 3500. The table is apprx <1000000 records.
Individuall searched only based on the indexed column(without the other filters) we get 70000 rows in both Prod and Test. Other than cardinality for index and cost for the suceeding table everything looks same. Production works in 4-10 minute range and Testing in 4-10 Seconds range.
We are absolutely stumped. Kindly advice.
regards
OPERATION OPTIONS OBJECT_NAME OPTIMIZER COST CARDINALITY
SELECT STATEMENT CHOOSE 873
NESTED LOOPS 873 1
TABLE ACCESS BY INDEX ROWID VH_PSF ANALYZED 871 1
INDEX RANGE SCAN IDX_VH_PSF_TYPE ANALYZED 11 2623
TABLE ACCESS BY GLOBAL INDEX ROWID GM_VIN ANALYZED 2 1
INDEX UNIQUE SCAN SYS_C0014163 ANALYZED 1 1
SELECT STATEMENT CHOOSE 7
NESTED LOOPS 7 1
TABLE ACCESS BY INDEX ROWID VH_PSF ANALYZED 5 1
INDEX RANGE SCAN IDX_VH_PSF_TYPE ANALYZED 3 4
TABLE ACCESS BY GLOBAL INDEX ROWID GM_VIN ANALYZED 2 1
INDEX UNIQUE SCAN SYS_C0020641 ANALYZED 1 1
May 23, 2007 - 8:37 am UTC
i cannot read that.
but to tune a query, you want a tkprof, not an explain plan.
Cardinality
Anand Varadarajan, May 24, 2007 - 7:57 am UTC
The above output is from v$sql_plan. Getting tkprof is not easy as we are only given "read only" access to the database. We will have to wait to get the tkprof output.
May 26, 2007 - 11:26 am UTC
query v$sql_plan_statistics
A reader, May 27, 2008 - 4:12 pm UTC
Tom,
If Optimizer chosen to sample a table (for e.g. using default number of dynamic sampling blocks of 32), will optimizer read the first 32 blocks of the table? What if the data in the table ordered chronologically? For e.g. I have a table orders with order_date, delivery_date etc. and the table contains order history for the last five years. So if my query filter order for the last two weeks and optimizer will sample the table and read first 32 blocks in the table this will not help optimizer to choose the best plan
May 27, 2008 - 6:44 pm UTC
it will not read the first N, it'll randomly sample.
Cardinality for joins
Kalita, August 01, 2008 - 5:04 am UTC
Hi Tom,
I was wondering how Oracle estimates cardinality for unknown values for a column. Please see the example below
SQL> create table test_card_number as
2 select *
3 from
4 (
5 select rownum test_num
6 from dual
7 connect by 1=1
8 )
9 where rownum < 100000;
Table created.
SQL> create index test_card_number_ix on test_card_number(test_num);
Index created.
SQL> exec dbms_stats.gather_table_stats('KALITA','TEST_CARD_NUMBER');
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly;
SQL> select *
2 from test_card_number a,(select ( rownum + 100000) rn from dual )b
3 where test_num >= b.rn;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2093273829
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 85000 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 5000 | 85000 | 4 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN| TEST_CARD_NUMBER_IX | 5000 | 20000 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TEST_NUM">="B"."RN")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
329 bytes sent via SQL*Net to client
373 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL> select *
2 from test_card_number a,(select (rownum + 100000) rn from dual )b
3 where test_num <= b.rn;
99999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2093273829
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5000 | 85000 | 4 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 5000 | 85000 | 4 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
| 4 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN| TEST_CARD_NUMBER_IX | 5000 | 20000 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TEST_NUM"<="B"."RN")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6875 consistent gets
0 physical reads
0 redo size
1835957 bytes sent via SQL*Net to client
73710 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99999 rows processed
For both >= and <= the cardinality is estimated to be 5000(5% of the rows). I have tried a few more cases and looks like Oracle estimates cardinality to 5% of rows for unknown values of a column. My problem is similar to the <= case in my example above. I get a date from a table which is evaluated for <= in the other table and the expected cardinality is approxmiately 5% of the total rows but actual cardinality would be close to 50%. So, in stead of doing a full table scan, its doing an index range scan. Is there any way of fixing this issue other than hinting the queries?
Thanks,
Kalita
August 03, 2008 - 1:49 pm UTC
the issue is, the plan is generated before the query is executed. So, in the case of a query like:
select *
2 from test_card_number a,(select (rownum + 100000) rn from dual )b
3 where test_num <= b.rn;
a) it cannot presume, assume - just one row from B.
b) it does not know what that value is at all, when building the plan.
so, it has to say "we do a join", and "we do not know what the value is, hence we have to use a heuristic"
I would prefer to write a query like that as:
select * from t where col <= (select X from something_else);
you'll get basically a similar plan and estimated card= values, but at least we know "one row". But since "select x from something_else" is not known until AFTER the plan is already generated.
Thanks Tom..
Kalita, August 04, 2008 - 1:05 am UTC
cardinality , confuse.
jian huang zheng, August 10, 2008 - 1:02 am UTC
Hello Tom
from this example before:
ops$tkyte@ORA10G> select * from t where object_id = 42;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=482 Bytes=44826)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=2 Card=482 Bytes=44826)
2 1 INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=1 Card=193)
how come index range scan with card 193 to produce the table access by index rowid of card 482, I think that is at most one to one card, should be 193. any idea why?
Thanks!
dynamic_sampling vs skewness
MH, August 28, 2008 - 7:49 am UTC
After reading up on this hint I found some queries (over skewed data in very large tables) performed waay much faster when using the hint /*+ dynamic_sampling(3) */.
F.i:
With hint:
107867 records selected.
Elapsed: 00:22:20.07
Uitvoeringsplan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=211975 Card=101631 Bytes=21342510)
1 0 SORT (GROUP BY) (Cost=211975 Card=101631 Bytes=21342510)
2 1 HASH JOIN (OUTER) (Cost=207337 Card=101631 Bytes=21342510)
3 2 HASH JOIN (Cost=183935 Card=101631 Bytes=6199491)
4 3 HASH JOIN (Cost=7787 Card=124222 Bytes=5714212)
5 4 TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1512 Card=124222 Bytes=2981328)
6 4 TABLE ACCESS (FULL) OF 'PAN_RESPONDENTS' (Cost=4269 Card=1103180 Bytes=24269960)
7 3 TABLE ACCESS (FULL) OF 'PAN_RESPONDENT_VALUES' (Cost=174638 Card=902553 Bytes=13538295)
8 2 VIEW (Cost=22597 Card=59183 Bytes=8818267)
9 8 SORT (GROUP BY) (Cost=22597 Card=59183 Bytes=3610163)
10 9 HASH JOIN (Cost=21717 Card=59183 Bytes=3610163)
11 10 TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1510 Card=124222 Bytes=3726660)
12 10 TABLE ACCESS (BY INDEX ROWID) OF 'PAN_PROJECT_MEMBERS' (Cost=39 Card=478 Bytes=10516)
13 12 NESTED LOOPS (Cost=19459 Card=243521 Bytes=7549151)
14 13 VIEW OF 'index$_join$_003' (Cost=36 Card=510 Bytes=4590)
15 14 HASH JOIN
16 15 INDEX (RANGE SCAN) OF 'PMS_SAMP_PANL_FK'(NON-UNIQUE) (Cost=4 Card=510 Bytes=4590)
17 15 INDEX (FAST FULL SCAN) OF 'PMS_SAMP_PK' (UNIQUE) (Cost=4 Card=510 Bytes=4590)
18 13 INDEX (RANGE SCAN) OF 'PMS_PBR_MULTI_IDX1' (NON-UNIQUE) (Cost=7 Card=1809)
Statistics
----------------------------------------------------------
17 recursive calls
0 db block gets
1122897 consistent gets
916871 physical reads
69668 redo size
7988329 bytes sent via SQL*Net to client
81634 bytes received via SQL*Net from client
7193 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
107867 rows processed
without hint:
107869 records selected.
Elapsed: 01:20:04.02
Uitvoeringsplan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=40760 Card=5174 Bytes=1086540)
1 0 SORT (GROUP BY) (Cost=40760 Card=5174 Bytes=1086540)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'PAN_RESPONDENT_VALUES' (Cost=5 Card=2 Bytes=30)
3 2 NESTED LOOPS (Cost=40521 Card=5174 Bytes=1086540)
4 3 HASH JOIN (OUTER) (Cost=26761 Card=3438 Bytes=670410)
5 4 HASH JOIN (Cost=5790 Card=3438 Bytes=158148)
6 5 TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1512 Card=3438 Bytes=82512)
7 5 TABLE ACCESS (FULL) OF 'PAN_RESPONDENTS' (Cost=4269 Card=1103180 Bytes=24269960)
8 4 VIEW (Cost=20970 Card=355 Bytes=52895)
9 8 SORT (GROUP BY) (Cost=20970 Card=355 Bytes=21655)
10 9 HASH JOIN (Cost=20969 Card=355 Bytes=21655)
11 10 TABLE ACCESS (FULL) OF 'PMR_CURRENT_STATUS' (Cost=1510 Card=3438 Bytes=103140)
12 10 TABLE ACCESS (BY INDEX ROWID) OF 'PAN_PROJECT_MEMBERS' (Cost=39 Card=90 Bytes=1980)
13 12 NESTED LOOPS (Cost=19459 Card=46085 Bytes=1428635)
14 13 VIEW OF 'index$_join$_003' (Cost=36 Card=510 Bytes=4590)
15 14 HASH JOIN
16 15 INDEX (RANGE SCAN) OF 'PMS_SAMP_PANL_FK' (NON-UNIQUE) (Cost=4 Card=510 Bytes=4590)
17 15 INDEX (FAST FULL SCAN) OF 'PMS_SAMP_PK' (UNIQUE) (Cost=4 Card=510 Bytes=4590)
18 13 INDEX (RANGE SCAN) OF 'PMS_PBR_MULTI_IDX1' (NON-UNIQUE) (Cost=7 Card=1809)
19 3 INDEX (RANGE SCAN) OF 'PMS_REVA_RESP_FK' (NON-UNIQUE) (Cost=4 Card=2)
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
1507643 consistent gets
766208 physical reads
26840 redo size
7988422 bytes sent via SQL*Net to client
81599 bytes received via SQL*Net from client
7193 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
107869 rows processed
(2 records were added during tests, so that explains the different results)
With hint the plan looks like I would expect it to be.
(I don't want an Index range scan over a table containing almost 200 million records...it's 'db file sequential reading' forever then)
All tables are analyzed daily (GATHER_STALE) and indexed but quite skewed over certain FK-columns. So sampling (level = 3) is only applied for (quoting from doc.:) "all tables for which standard selectivity estimation used a guess for some predicate that is a potential dynamic sampling predicate"?
(When I use level = 2, the optimezer chooses the 'wrong plan' again).
Is this a 'safe hint-approach' to follow when tuning large queries that perform bad due to skewness?
Or should special histograms be preferred?
(Or am I comparing apples to toaster ovens now?)
I now tend to use this hint more often, but would like to know if there are any caveats?
I'm really amazed by the results: almost one hour faster, just by adding this hint...thanks!
Oracle fancies big brutal hash joins the most, i guess ;)
Regards,
Martijn
Why the difference?
A reader, September 02, 2009 - 12:25 pm UTC
Suppose I have a query typically used in web-based reporting tools that allow for optional search filters.
Consider the following 3 variants of this type of query.
1.
where 1=1
and <join conditions>
and (:b1 is null or column in (select ...))
2.
where 1=1
and <join conditions>
and (column in (:b2,:b3,:b4))
3.
where 1=1
and <join conditions>
and (:b1 is null or column in (select /*+ cardinality(t 3) */ ...))
The user input is both optional and variable length. So I use the table() function to convert the comma separated string with the in-list using the code you posted on this site.
When :b1 is not null, I thought #3 would have the same execeution plan/performance as #2 (which performs the best) because the optimizer would optimize away the :b1 is null part and the cardinality hint would tell it that the SELECT returns 3 values which is the same as entering the 3 values into an IN list, as in #2.
But it doesn't do that. It does a hash join instead of a INLIST iterator and the overall performance is much slower.
Any idea why?
Thanks
September 02, 2009 - 1:01 pm UTC
in general
where c in (a,b,c)
will use what is known as an inlist interator - it knows at least three and at most three, it is 100% sure of everything
where (:b1 is null or c in (set))
is completely different - hugely different. And you would need to verify that the estimated card was really 3 - remember - the cardinality hint is "undocumented", it might not be 'taking' there.
It is taking
A reader, September 02, 2009 - 1:55 pm UTC
> it might not be 'taking' there.
When I see the explain plan (card= for that step), I do see it taking, yet the overall execution plan is different.
Is there a way I can force the inlist iterator to be used?
Sometimes, for other queries, it does work, I see something called 'inlist iterator pickler fetch' in the plan and it works great.
Besides, given a predicate like (1=2 or c in (set)), i.e. any predicate where the first part is constant and evaluates to false, isn't it equivalent to not specifying that predicate all? In other words, all else remaining the same, the execution plan for the above predicate and for (c in (set)) should be identical, right? But it isn't. Why is this?
September 02, 2009 - 2:20 pm UTC
where (:x is null or c in (set))
is radically, hugely different than
where (c in (a,b,c))
where (c in (set))
is radically, hugely different than
where (c in (a,b,c))
... the
execution plan for the above predicate and for (c in (set)) should be
identical, right? ...
why ? why would they be identical?
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14211/ex_plan.htm#sthref1416 inlist iterator is for in's with lists.
ops$tkyte%ORA10GR2> create table t
2 as
3 select *
4 from all_objects;
Table created.
ops$tkyte%ORA10GR2> alter table t add constraint t_pk primary key(object_id);
Table altered.
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where object_id in (42, 55);
Execution Plan
----------------------------------------------------------
Plan hash value: 2629880895
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 2816 | 8 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T | 22 | 2816 | 8 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | T_PK | 163 | | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("OBJECT_ID"=42 OR "OBJECT_ID"=55)
Note
-----
- dynamic sampling used for this statement
ops$tkyte%ORA10GR2> select * from t where object_id in (select 42 from dual union all select 55 from dual);
Execution Plan
----------------------------------------------------------
Plan hash value: 3667190664
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 262 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 2 | 262 | 6 (0)| 00:00:01 |
| 2 | VIEW | VW_NSO_1 | 2 | 6 | 4 (0)| 00:00:01 |
| 3 | HASH UNIQUE | | 2 | | 4 (50)| 00:00:01 |
| 4 | UNION-ALL | | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID| T | 1 | 128 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | T_PK | 1 | | 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("OBJECT_ID"="$nso_col_1")
Note
-----
- dynamic sampling used for this statement
it would be able to nested loop (which is a lot like an inlist interator) or hash join or sort merge or any of the JOIN techniques.
Apex collections
A reader, September 02, 2009 - 2:49 pm UTC
OK my actual use case is for an Apex report region where a list of ids is stored in a collection. A page item :x determines whether or not the collection is populated. So the predicate takes the form
(:x is null or c in (select /*+ cardinality(t 10) */ c001 from apex_collections t
where t.collection_name='X'))
I *have* to use the cardinality hint when using Apex collections because it is generic view that is used by all applications, only I know how much data is expected in any given situation. This query doesn't perform too well, how would one go about speeding it up? Thanks
1=2
A reader, September 02, 2009 - 5:02 pm UTC
I asked "Besides, given a predicate like (1=2 or c in (set)), i.e. any predicate where the first part is constant and evaluates to false, isn't it equivalent to not specifying that predicate all? In other words, all else remaining the same, the execution plan for the above predicate and for (c in (set)) should be identical, right? But it isn't. Why is this?"
I am not sure I understood your response. Adding a constant, known-at-parse-time expression with a OR into a predicate that evaluates to false i.e. 1=2 OR <whatever> should have absolutely no effect on the query plan, right? But it does.
Here is a (trivial) example
SQL> create table x as select level i from dual connect by level<=10000;
Table created.
SQL> create table y as select level i from dual connect by level<=10000;
Table created.
SQL> create index i1 on x(i);
Index created.
SQL> create index i2 on y(i);
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'x',cascade=>true);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user,'y',cascade=>true)
PL/SQL procedure successfully completed.
SQL> set autotrace traceonly exp
SQL> select * from x where (1=2 or x.i in (select y.i from y));
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=503 Card=500 Bytes=1
500)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'X' (Cost=3 Card=500 Bytes=1500)
3 1 INDEX (RANGE SCAN) OF 'I2' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)
SQL> select * from x where x.i in (select y.i from y);
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=10000 Bytes=60000)
1 0 NESTED LOOPS (SEMI) (Cost=4 Card=10000 Bytes=60000)
2 1 INDEX (FULL SCAN) OF 'I1' (NON-UNIQUE) (Cost=22 Card=10000 Bytes=30000)
3 1 INDEX (RANGE SCAN) OF 'I2' (NON-UNIQUE)
Shouldn't the plan for the 2 queries be the same since the constant predicate 1=2 isn't really affecting the answer? But the cost of the query with the OR is much higher. Why? I realize that the CBO (probably) doesn't really evaluate the constant predicate but with bind variable peeking, it could do that and optimize away the expression thus arriving at the lower cost plan, couldn't it?
September 03, 2009 - 7:26 am UTC
... I am not sure I understood your response. ...
my response was that
where (c in (a,b,c))
is not in any way the same as
where (c in (set))
that is all - forget the 1=2, it goes away.
they were asking why "c in (a,b,c)" results in inlist iterator and "c in (set)" does not.
A reader, September 03, 2009 - 10:36 am UTC
That wasn't the question. I posted it again to clarify. It was about the 1=2 (or any expression that evaluates to false and has a OR in the predicate). You said the "1=2 goes away" but it doesn't appear to. I gave an example to show that execution plan with it and without are different. Why is that?
September 04, 2009 - 2:03 pm UTC
well, it does go away - you chopped off the predicate bit. the plans are different (that is permitted, there is no reason that might now happen).
ops$tkyte%ORA10GR2> set autotrace traceonly exp
ops$tkyte%ORA10GR2> select * from x where (1=2 or x.i in (select y.i from y));
Execution Plan
----------------------------------------------------------
Plan hash value: 858363292
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 30000 | 4 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| X | 10000 | 30000 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | I2 | 1 | 3 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT /*+ */ 0 FROM "Y" "Y" WHERE "Y"."I"=:B1))
3 - access("Y"."I"=:B1)
ops$tkyte%ORA10GR2> select * from x where ( x.i in (select y.i from y));
Execution Plan
----------------------------------------------------------
Plan hash value: 3518811015
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 60000 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN SEMI | | 10000 | 60000 | 7 (15)| 00:00:01 |
| 2 | INDEX FAST FULL SCAN| I1 | 10000 | 30000 | 3 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| I2 | 10000 | 30000 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("X"."I"="Y"."I")
ops$tkyte%ORA10GR2> set autotrace off
but the 1=2 is not there.
tervor, August 14, 2012 - 2:03 am UTC
Tom what was the script you ran that did these steps please
>new 1: explain plan for select * from emp, dept where >emp.deptno =dept.deptno
>Explained.
great explanation on cardinality one of the most common confused terms regarding databases
August 17, 2012 - 2:05 pm UTC
just a really old script, with dbms_xplan/autotrace I don't use it anymore.
It just did a delete on the plan table
then explain plan for &1
then a query against the plan table to format it.
just use
set autotrace traceonly explain
select ....
set autotrace off
much easier. better plan display too.
thankyou
Trevor, August 20, 2012 - 11:39 pm UTC
Thanks very much
Anand, January 18, 2013 - 12:42 am UTC
Hi Tom,
What could be the other reason for explain plan to show not correct cardinality evenif my all table are analyzed.
January 18, 2013 - 10:23 am UTC
Anand, January 19, 2013 - 2:41 pm UTC
Hi Tom,
Thanks a lot.
Still i have one question.if there a query join of multiple table then how to decide to give hint of dynamic sampling on which table.
Lets say i have below query :
select * from a,b,c,d where .....
i should give hint /*+ dynamic_sampling(a 0) */ or /*+ dynamic_sampling(b 0) */ or dynamic_sampling(c 0) */
or it doesn't matter.
January 21, 2013 - 7:29 am UTC
well, what do you want to have happen?
there is no single answer here - do you want dynamic sampling to happen? or not? it is up to you.
Cardinality mismatch
Amit, February 13, 2014 - 5:10 pm UTC
Hi Tom,
I have following query and I generated execution plan for it. I am not able to understand the cardinality.
Query :
SELECT DISTINCT NPF_ACCOUNT.AC_SK,
NPF_AC_POSN_SUM.ACCT_BAL,
T_ACCOUNT.AC_ID
FROM T_ACCOUNT,
T_ACCT_SUM
WHERE T_ACCOUNT.AC_SK IN (167056,276364,292510,410216,550806,584324,584325,585607,594305,594306,375386,375387,597632,597633,597634,597636,597637,597638,597639,597640,597641,597642,597643,597644,597645,597646,597647,597648,597649,597650,597651,597652,597653,597654,597655,597665,597656,597657,597658,597659,597660,597661,597662,597663,597664,597625,597626,597627,608683,709753,739855,769598,769599,769600,789267,831812,831816,831815,831814,831813,831817,886799,891301,891302,896746,905441,928452,952683,971664,971665,971666,971667,971668,971669,971670,597628,597629,986885,987360,987859,992765,1069330,1070973,1075750,1201860,1201862,1201861,1283345,1283346,1312703,1312704,1312702,1312578,1312579,1415629,1446799,1446798,1520530,1834159,1859876,1894444,1932920,1964377,2033976,2077465,2185914)
AND T_ACCT_SUM.ACCT_BAL <= 1000.00
AND T_ACCOUNT.AC_SK = T_ACCT_SUM.AC_SK
AND T_ACCT_SUM.BAL_DATE = '31-OCT-13';
Execution plan :
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 132 | 4752 | 124 (2)|
| 1 | HASH UNIQUE | | 132 | 4752 | 124 (2)|
| 2 | MERGE JOIN | | 132 | 4752 | 123 (1)|
| 3 | PARTITION RANGE SINGLE | | 128K| 2501K| 85 (0)|
| 4 | INLIST ITERATOR | | | | |
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID| T_ACCT_SUM | 128K| 2501K| 85 (0)|
|* 6 | INDEX RANGE SCAN | PK_T_ACCT_SUM | 5 | | 32 (0)|
|* 7 | SORT JOIN | | 106 | 1696 | 39 (3)|
| 8 | INLIST ITERATOR | | | | |
| 9 | TABLE ACCESS BY INDEX ROWID | T_ACCOUNT | 106 | 1696 | 38 (0)|
|* 10 | INDEX UNIQUE SCAN | PK_T_ACCOUNT | 106 | | 29 (0)|
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("NPF_AC_POSN_SUM"."ACCT_BAL"<=1000.00)
6 - access(("NPF_AC_POSN_SUM"."AC_SK"=167056 OR "NPF_AC_POSN_SUM"."AC_SK"=276364 OR
"NPF_AC_POSN_SUM"."AC_SK"=292510 OR "NPF_AC_POSN_SUM"."AC_SK"=375386 OR "NPF_AC_POSN_SUM"."AC_SK"=375387 OR
"NPF_AC_POSN_SUM"."AC_SK"=410216 OR "NPF_AC_POSN_SUM"."AC_SK"=550806 OR "NPF_AC_POSN_SUM"."AC_SK"=584324 OR
"NPF_AC_POSN_SUM"."AC_SK"=584325 OR "NPF_AC_POSN_SUM"."AC_SK"=585607 OR "NPF_AC_POSN_SUM"."AC_SK"=594305 OR
"NPF_AC_POSN_SUM"."AC_SK"=594306 OR "NPF_AC_POSN_SUM"."AC_SK"=597625 OR "NPF_AC_POSN_SUM"."AC_SK"=597626 OR
"NPF_AC_POSN_SUM"."AC_SK"=597627 OR "NPF_AC_POSN_SUM"."AC_SK"=597628 OR "NPF_AC_POSN_SUM"."AC_SK"=597629 OR
"NPF_AC_POSN_SUM"."AC_SK"=597632 OR "NPF_AC_POSN_SUM"."AC_SK"=597633 OR "NPF_AC_POSN_SUM"."AC_SK"=597634 OR
"NPF_AC_POSN_SUM"."AC_SK"=597636 OR "NPF_AC_POSN_SUM"."AC_SK"=597637 OR "NPF_AC_POSN_SUM"."AC_SK"=597638 OR
"NPF_AC_POSN_SUM"."AC_SK"=597639 OR "NPF_AC_POSN_SUM"."AC_SK"=597640 OR "NPF_AC_POSN_SUM"."AC_SK"=597641 OR
"NPF_AC_POSN_SUM"."AC_SK"=597642 OR "NPF_AC_POSN_SUM"."AC_SK"=597643 OR "NPF_AC_POSN_SUM"."AC_SK"=597644 OR
"NPF_AC_POSN_SUM"."AC_SK"=597645 OR "NPF_AC_POSN_SUM"."AC_SK"=597646 OR "NPF_AC_POSN_SUM"."AC_SK"=597647 OR
"NPF_AC_POSN_SUM"."AC_SK"=597648 OR "NPF_AC_POSN_SUM"."AC_SK"=597649 OR "NPF_AC_POSN_SUM"."AC_SK"=597650 OR
"NPF_AC_POSN_SUM"."AC_SK"=597651 OR "NPF_AC_POSN_SUM"."AC_SK"=597652 OR "NPF_AC_POSN_SUM"."AC_SK"=597653 OR
"NPF_AC_POSN_SUM"."AC_SK"=597654 OR "NPF_AC_POSN_SUM"."AC_SK"=597655 OR "NPF_AC_POSN_SUM"."AC_SK"=597656 OR
"NPF_AC_POSN_SUM"."AC_SK"=597657 OR "NPF_AC_POSN_SUM"."AC_SK"=597658 OR "NPF_AC_POSN_SUM"."AC_SK"=597659 OR
"NPF_AC_POSN_SUM"."AC_SK"=597660 OR "NPF_AC_POSN_SUM"."AC_SK"=597661 OR "NPF_AC_POSN_SUM"."AC_SK"=597662 OR
"NPF_AC_POSN_SUM"."AC_SK"=597663 OR "NPF_AC_POSN_SUM"."AC_SK"=597664 OR "NPF_AC_POSN_SUM"."AC_SK"=597665 OR
"NPF_AC_POSN_SUM"."AC_SK"=608683 OR "NPF_AC_POSN_SUM"."AC_SK"=709753 OR "NPF_AC_POSN_SUM"."AC_SK"=739855 OR
"NPF_AC_POSN_SUM"."AC_SK"=769598 OR "NPF_AC_POSN_SUM"."AC_SK"=769599 OR "NPF_AC_POSN_SUM"."AC_SK"=769600 OR
"NPF_AC_POSN_SUM"."AC_SK"=789267 OR "NPF_AC_POSN_SUM"."AC_SK"=831812 OR "NPF_AC_POSN_SUM"."AC_SK"=831813 OR
"NPF_AC_POSN_SUM"."AC_SK"=831814 OR "NPF_AC_POSN_SUM"."AC_SK"=831815 OR "NPF_AC_POSN_SUM"."AC_SK"=831816 OR
"NPF_AC_POSN_SUM"."AC_SK"=831817 OR "NPF_AC_POSN_SUM"."AC_SK"=886799 OR "NPF_AC_POSN_SUM"."AC_SK"=891301 OR
"NPF_AC_POSN_SUM"."AC_SK"=891302 OR "NPF_AC_POSN_SUM"."AC_SK"=896746 OR "NPF_AC_POSN_SUM"."AC_SK"=905441 OR
"NPF_AC_POSN_SUM"."AC_SK"=928452 OR "NPF_AC_POSN_SUM"."AC_SK"=952683 OR "NPF_AC_POSN_SUM"."AC_SK"=971664 OR
"NPF_AC_POSN_SUM"."AC_SK"=971665 OR "NPF_AC_POSN_SUM"."AC_SK"=971666 OR "NPF_AC_POSN_SUM"."AC_SK"=971667 OR
"NPF_AC_POSN_SUM"."AC_SK"=971668 OR "NPF_AC_POSN_SUM"."AC_SK"=971669 OR "NPF_AC_POSN_SUM"."AC_SK"=971670 OR
"NPF_AC_POSN_SUM"."AC_SK"=986885 OR "NPF_AC_POSN_SUM"."AC_SK"=987360 OR "NPF_AC_POSN_SUM"."AC_SK"=987859 OR
"NPF_AC_POSN_SUM"."AC_SK"=992765 OR "NPF_AC_POSN_SUM"."AC_SK"=1069330 OR "NPF_AC_POSN_SUM"."AC_SK"=1070973 OR
"NPF_AC_POSN_SUM"."AC_SK"=1075750 OR "NPF_AC_POSN_SUM"."AC_SK"=1201860 OR "NPF_AC_POSN_SUM"."AC_SK"=1201861 OR
"NPF_AC_POSN_SUM"."AC_SK"=1201862 OR "NPF_AC_POSN_SUM"."AC_SK"=1283345 OR "NPF_AC_POSN_SUM"."AC_SK"=1283346 OR
"NPF_AC_POSN_SUM"."AC_SK"=1312578 OR "NPF_AC_POSN_SUM"."AC_SK"=1312579 OR "NPF_AC_POSN_SUM"."AC_SK"=1312702 OR
"NPF_AC_POSN_SUM"."AC_SK"=1312703 OR "NPF_AC_POSN_SUM"."AC_SK"=1312704 OR "NPF_AC_POSN_SUM"."AC_SK"=1415629 OR
"NPF_AC_POSN_SUM"."AC_SK"=1446798 OR "NPF_AC_POSN_SUM"."AC_SK"=1446799 OR "NPF_AC_POSN_SUM"."AC_SK"=1520530 OR
"NPF_AC_POSN_SUM"."AC_SK"=1834159 OR "NPF_AC_POSN_SUM"."AC_SK"=1859876 OR "NPF_AC_POSN_SUM"."AC_SK"=1894444 OR
"NPF_AC_POSN_SUM"."AC_SK"=1932920 OR "NPF_AC_POSN_SUM"."AC_SK"=1964377 OR "NPF_AC_POSN_SUM"."AC_SK"=2033976 OR
"NPF_AC_POSN_SUM"."AC_SK"=2077465 OR "NPF_AC_POSN_SUM"."AC_SK"=2185914) AND
"NPF_AC_POSN_SUM"."VALN_AS_OF_DATE"='31-OCT-13')
7 - access("NPF_ACCOUNT"."AC_SK"="NPF_AC_POSN_SUM"."AC_SK")
filter("NPF_ACCOUNT"."AC_SK"="NPF_AC_POSN_SUM"."AC_SK")
10 - access("NPF_ACCOUNT"."AC_SK"=167056 OR "NPF_ACCOUNT"."AC_SK"=276364 OR "NPF_ACCOUNT"."AC_SK"=292510 OR
"NPF_ACCOUNT"."AC_SK"=375386 OR "NPF_ACCOUNT"."AC_SK"=375387 OR "NPF_ACCOUNT"."AC_SK"=410216 OR
"NPF_ACCOUNT"."AC_SK"=550806 OR "NPF_ACCOUNT"."AC_SK"=584324 OR "NPF_ACCOUNT"."AC_SK"=584325 OR
"NPF_ACCOUNT"."AC_SK"=585607 OR "NPF_ACCOUNT"."AC_SK"=594305 OR "NPF_ACCOUNT"."AC_SK"=594306 OR
"NPF_ACCOUNT"."AC_SK"=597625 OR "NPF_ACCOUNT"."AC_SK"=597626 OR "NPF_ACCOUNT"."AC_SK"=597627 OR
"NPF_ACCOUNT"."AC_SK"=597628 OR "NPF_ACCOUNT"."AC_SK"=597629 OR "NPF_ACCOUNT"."AC_SK"=597632 OR
"NPF_ACCOUNT"."AC_SK"=597633 OR "NPF_ACCOUNT"."AC_SK"=597634 OR "NPF_ACCOUNT"."AC_SK"=597636 OR
"NPF_ACCOUNT"."AC_SK"=597637 OR "NPF_ACCOUNT"."AC_SK"=597638 OR "NPF_ACCOUNT"."AC_SK"=597639 OR
"NPF_ACCOUNT"."AC_SK"=597640 OR "NPF_ACCOUNT"."AC_SK"=597641 OR "NPF_ACCOUNT"."AC_SK"=597642 OR
"NPF_ACCOUNT"."AC_SK"=597643 OR "NPF_ACCOUNT"."AC_SK"=597644 OR "NPF_ACCOUNT"."AC_SK"=597645 OR
"NPF_ACCOUNT"."AC_SK"=597646 OR "NPF_ACCOUNT"."AC_SK"=597647 OR "NPF_ACCOUNT"."AC_SK"=597648 OR
"NPF_ACCOUNT"."AC_SK"=597649 OR "NPF_ACCOUNT"."AC_SK"=597650 OR "NPF_ACCOUNT"."AC_SK"=597651 OR
"NPF_ACCOUNT"."AC_SK"=597652 OR "NPF_ACCOUNT"."AC_SK"=597653 OR "NPF_ACCOUNT"."AC_SK"=597654 OR
"NPF_ACCOUNT"."AC_SK"=597655 OR "NPF_ACCOUNT"."AC_SK"=597656 OR "NPF_ACCOUNT"."AC_SK"=597657 OR
"NPF_ACCOUNT"."AC_SK"=597658 OR "NPF_ACCOUNT"."AC_SK"=597659 OR "NPF_ACCOUNT"."AC_SK"=597660 OR
"NPF_ACCOUNT"."AC_SK"=597661 OR "NPF_ACCOUNT"."AC_SK"=597662 OR "NPF_ACCOUNT"."AC_SK"=597663 OR
"NPF_ACCOUNT"."AC_SK"=597664 OR "NPF_ACCOUNT"."AC_SK"=597665 OR "NPF_ACCOUNT"."AC_SK"=608683 OR
"NPF_ACCOUNT"."AC_SK"=709753 OR "NPF_ACCOUNT"."AC_SK"=739855 OR "NPF_ACCOUNT"."AC_SK"=769598 OR
"NPF_ACCOUNT"."AC_SK"=769599 OR "NPF_ACCOUNT"."AC_SK"=769600 OR "NPF_ACCOUNT"."AC_SK"=789267 OR
"NPF_ACCOUNT"."AC_SK"=831812 OR "NPF_ACCOUNT"."AC_SK"=831813 OR "NPF_ACCOUNT"."AC_SK"=831814 OR
"NPF_ACCOUNT"."AC_SK"=831815 OR "NPF_ACCOUNT"."AC_SK"=831816 OR "NPF_ACCOUNT"."AC_SK"=831817 OR
"NPF_ACCOUNT"."AC_SK"=886799 OR "NPF_ACCOUNT"."AC_SK"=891301 OR "NPF_ACCOUNT"."AC_SK"=891302 OR
"NPF_ACCOUNT"."AC_SK"=896746 OR "NPF_ACCOUNT"."AC_SK"=905441 OR "NPF_ACCOUNT"."AC_SK"=928452 OR
"NPF_ACCOUNT"."AC_SK"=952683 OR "NPF_ACCOUNT"."AC_SK"=971664 OR "NPF_ACCOUNT"."AC_SK"=971665 OR
"NPF_ACCOUNT"."AC_SK"=971666 OR "NPF_ACCOUNT"."AC_SK"=971667 OR "NPF_ACCOUNT"."AC_SK"=971668 OR
"NPF_ACCOUNT"."AC_SK"=971669 OR "NPF_ACCOUNT"."AC_SK"=971670 OR "NPF_ACCOUNT"."AC_SK"=986885 OR
"NPF_ACCOUNT"."AC_SK"=987360 OR "NPF_ACCOUNT"."AC_SK"=987859 OR "NPF_ACCOUNT"."AC_SK"=992765 OR
"NPF_ACCOUNT"."AC_SK"=1069330 OR "NPF_ACCOUNT"."AC_SK"=1070973 OR "NPF_ACCOUNT"."AC_SK"=1075750 OR
"NPF_ACCOUNT"."AC_SK"=1201860 OR "NPF_ACCOUNT"."AC_SK"=1201861 OR "NPF_ACCOUNT"."AC_SK"=1201862 OR
"NPF_ACCOUNT"."AC_SK"=1283345 OR "NPF_ACCOUNT"."AC_SK"=1283346 OR "NPF_ACCOUNT"."AC_SK"=1312578 OR
"NPF_ACCOUNT"."AC_SK"=1312579 OR "NPF_ACCOUNT"."AC_SK"=1312702 OR "NPF_ACCOUNT"."AC_SK"=1312703 OR
"NPF_ACCOUNT"."AC_SK"=1312704 OR "NPF_ACCOUNT"."AC_SK"=1415629 OR "NPF_ACCOUNT"."AC_SK"=1446798 OR
"NPF_ACCOUNT"."AC_SK"=1446799 OR "NPF_ACCOUNT"."AC_SK"=1520530 OR "NPF_ACCOUNT"."AC_SK"=1834159 OR
"NPF_ACCOUNT"."AC_SK"=1859876 OR "NPF_ACCOUNT"."AC_SK"=1894444 OR "NPF_ACCOUNT"."AC_SK"=1932920 OR
"NPF_ACCOUNT"."AC_SK"=1964377 OR "NPF_ACCOUNT"."AC_SK"=2033976 OR "NPF_ACCOUNT"."AC_SK"=2077465 OR
"NPF_ACCOUNT"."AC_SK"=2185914)
Table T_ACCT_SUM is range partitioned on BAL_DATE column. ACCT_BAL column has height balanced histogram.
Database is 11.2.0.3 version 4 node RAC system.
Execution plan is collected using "SET AUTOTRACE ON" command.
On line 6, cardinality calculated is "5" using index range scan. But on line 5, cardinality has increased to 128K for same table. My assumption was that, cardinality returned by index should be greater than or same as the cardinality for its corresponding table. Here, the increase is huge. Can you please help me understand what I am missing?
Thanks!
Simple vocabulary would have made the answer better
fero31@yahoo.co.in, March 23, 2015 - 8:08 pm UTC
I normally dont write reviews but I had to say this, the answers could have been better if it were written with simple vocabulary. I was looking for different oracle questions, always this website came up first, so I would read this answer and then had to move to other websites as I dont comprehend the style of writing that much.
Histogram and Explain plan commands
Rajeshwaran, Jeyabal, January 17, 2017 - 6:37 am UTC
Team,
Looking into the histogram the estimated cardinality for the bind value SYS is (77043-41873) 35170, but the explain plan reports the cardinality as 4075 (which is num_rows/NDV i.e. 77426/19), any reason why the explain plan doesn't make use of Histograms to report the correct cardinality ?
demo@ORA12C> create table t as
2 select *
3 from all_objects;
Table created.
demo@ORA12C>
demo@ORA12C> create index t_idx on t(owner);
Index created.
demo@ORA12C> begin
2 dbms_stats.gather_table_stats(
3 ownname=>user,
4 tabname=>'T',
5 method_opt=>'for all indexed columns size 20');
6 end;
7 /
PL/SQL procedure successfully completed.
demo@ORA12C>
demo@ORA12C> select count(*),count(distinct owner) from t;
COUNT(*) COUNT(DISTINCTOWNER)
---------- --------------------
77426 19
1 row selected.
demo@ORA12C>
demo@ORA12C> select num_distinct,num_buckets,histogram,num_nulls
2 from user_tab_col_statistics
3 where table_name ='T'
4 and column_name ='OWNER';
NUM_DISTINCT NUM_BUCKETS HISTOGRAM NUM_NULLS
------------ ----------- --------------- ----------
19 19 FREQUENCY 0
1 row selected.
demo@ORA12C>
demo@ORA12C> column endpoint_actual_value format a20
demo@ORA12C> select endpoint_number,endpoint_actual_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='OWNER';
ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU
--------------- --------------------
248 APEX_040200
356 CTXSYS
365 DEMO
382 DVF
393 DVSYS
412 GSMADMIN_INTERNAL
494 LBACSYS
1658 MDSYS
1676 OLAPSYS
1681 ORDDATA
1686 ORDPLUGINS
4786 ORDSYS
4793 OUTLN
41869 PUBLIC
41873 SCOTT
77043 SYS
77056 SYSTEM
77126 WMSYS
77426 XDB
19 rows selected.
demo@ORA12C> variable x varchar2(20);
demo@ORA12C> exec :x := 'SYS';
PL/SQL procedure successfully completed.
demo@ORA12C>
demo@ORA12C> explain plan for
2 select * from t
3 where owner =:x;
Explained.
demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 767293772
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4075 | 461K| 141 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 4075 | 461K| 141 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 4075 | | 10 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:X)
14 rows selected.
demo@ORA12C> exec :x := 'SCOTT';
PL/SQL procedure successfully completed.
demo@ORA12C> explain plan for
2 select * from t
3 where owner =:x;
Explained.
demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------
Plan hash value: 767293772
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4075 | 461K| 141 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 4075 | 461K| 141 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 4075 | | 10 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:X)
14 rows selected.
demo@ORA12C>
January 18, 2017 - 1:33 am UTC
An explain plan *command* doesn't peek. It's one of the flaws of using it. Better to run it and use dbms_xplan.display_cursor
SQL> drop table t purge;
Table dropped.
SQL>
SQL> create table t as
2 select *
3 from all_objects;
Table created.
SQL>
SQL> create index t_idx on t(owner);
Index created.
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname=>user,
4 tabname=>'T',
5 method_opt=>'for all indexed columns size 35');
6 end;
7 /
PL/SQL procedure successfully completed.
SQL>
SQL> select count(*),count(distinct owner) from t;
COUNT(*) COUNT(DISTINCTOWNER)
---------- --------------------
101321 39
1 row selected.
SQL>
SQL> select num_distinct,num_buckets,histogram,num_nulls
2 from user_tab_col_statistics
3 where table_name ='T'
4 and column_name ='OWNER';
NUM_DISTINCT NUM_BUCKETS HISTOGRAM NUM_NULLS
------------ ----------- --------------- ----------
39 35 TOP-FREQUENCY 0
1 row selected.
SQL>
SQL> column endpoint_actual_value format a20
SQL> select endpoint_number,endpoint_actual_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='OWNER';
ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALU
--------------- --------------------
3021 APEX_040200
6378 APEX_050000
9542 APEX_050100
9564 APEX_LISTENER
9724 ASKTOM
9733 AUDSYS
9741 BI
10142 CTXSYS
10197 DBSNMP
10216 DVF
10508 DVSYS
10520 FLOWS_FILES
10624 GSMADMIN_INTERNAL
10658 HR
10895 LBACSYS
14055 MCDONAC
16154 MDSYS
16171 OJVMSYS
16196 OLAPSYS
16470 ORDDATA
16480 ORDPLUGINS
19640 ORDSYS
19848 ORDS_METADATA
19856 OUTLN
57418 PUBLIC
57428 RASADM
57444 SCOTT
57735 SH
57743 SI_INFORMTN_SCHEMA
57790 SOE
99915 SYS
100513 SYSTEM
100531 WHS_MD
100902 WMSYS
101306 XDB
35 rows selected.
SQL>
SQL> variable x varchar2(20);
SQL> exec :x := 'SCOTT';
PL/SQL procedure successfully completed.
SQL>
SQL> select * from t
2 where owner =:x;
...
16 rows selected.
SQL>
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7yvn3fqx5x8tp, child number 0
-------------------------------------
select * from t where owner =:x
Plan hash value: 767293772
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 16 | 1824 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 16 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:X)
19 rows selected.
SQL>
SQL>