Stephan Bressler, November 21, 2002 - 8:00 am UTC
Hi,
what's the formula
Density = Number of non-popular values
----------------------------
total number of values
In case there are no popular values, is the density = 1?
I thought there would be a density per bucket? Wouldn't that make sense?
Selectivity with bind variables (Oracle 9.2.0.5)
Vladimir Sadilovskiy, November 26, 2005 - 1:03 am UTC
Tom,
Could you look into this test case:
SQL> create table t1 as select 1 f from all_objects;
Table created.
SQL> create index i_t1 on t1(f);
Index created.
SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>100, cascade=>true);
PL/SQL procedure successfully completed.
SQL> variable f number;
SQL> exec :f := 1;
PL/SQL procedure successfully completed.
SQL> set autotrace on
<b> with constants the plan is nice</b>
SQL> select count(*) from t1 where f=2;
COUNT(*)
----------
0
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
Bytes=3)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
490 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from t1 where f=1;
COUNT(*)
----------
45765
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
295)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
73 consistent gets
0 physical reads
0 redo size
493 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
<b>but with bind variable the plan is different</b>
SQL> select count(*) from t1 where f=:f;
COUNT(*)
----------
45765
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
58 Bytes=1374)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
73 consistent gets
0 physical reads
0 redo size
493 bytes sent via SQL*Net to client
655 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
<b>1/NDV doesn't seem to match calculated density </b>
SQL> select object_id from dba_objects where object_name='T1';
OBJECT_ID
----------
56142
SQL> set pages 999
SQL> select * from hist_head$ where obj#=56142;
OBJ# COL# BUCKET_CNT ROW_CNT CACHE_CNT NULL_CNT TIMESTAMP
---------- ---------- ---------- ---------- ---------- ---------- ---------
SAMPLE_SIZE MINIMUM MAXIMUM DISTCNT
----------- ---------- ---------- ----------
LOWVAL
----------------------------------------------------------------
HIVAL DENSITY
---------------------------------------------------------------- ----------
INTCOL# SPARE1 SPARE2 AVGCLN SPARE3 SPARE4
---------- ---------- ---------- ---------- ---------- ----------
56142 1 45765 1 1 0 25-NOV-05
45765 1 1 1
C102
C102 .000010925
1 1 2 3
Thank you.
- Vladimir
November 26, 2005 - 12:23 pm UTC
autotrace (explain plan) do not BIND PEEK - so they can "lie"
To demnonstrate this and the effect of bind variable peeking, I'll run this script:
create table t
as
select 99 id, a.* from all_objects a;
update t set id = 1 where rownum = 1;
create index t_idx on t(id);
begin
dbms_stats.gather_table_stats
( user, 'T',
method_opt=>'for all indexed columns size 254',
cascade=>TRUE );
end;
/
variable n number
alter session set sql_trace=true;
exec :n := 1
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_99_first where id = :n;
exec :n := 1
select count(object_type) from t n_is_99_first where id = :n;
The tkprof was generated with:
tkprof tracefile outputfile aggregate=no explain=u/p
to get both an EXPLAIN PLAN as well as getting all four statements printed individually:
<b>we start with :N of one, the optimizer will actually optimize this statement as if the literal number one was in the query itself:</b>
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70 (OPS$TKYTE)
<b>we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer said "use index! it'll only get one row!". In the following the row source operation plan is the ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give us. NOTE THEY ARE DIFFERENT. Explain plan doesn't have access to the binds:</b>
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
1 TABLE ACCESS BY INDEX ROWID T
1 INDEX RANGE SCAN T_IDX (object id 36010)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
1 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70 (OPS$TKYTE)
<b>This query was soft parsed, the plan already existed. This too will therefore use an index range scan - even though in this case, a full scan would have been "better". Again, explain plan "lies" to us here</b>
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
30689 TABLE ACCESS BY INDEX ROWID T
30689 INDEX RANGE SCAN T_IDX (object id 36010)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
30689 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n
<b>Note that this is a hard parse (changed the correlation name in the query, new query). This time - the optimizer plugs 99 into the query and optimizes. We do full scan this time:</b>
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70 (OPS$TKYTE)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
30689 TABLE ACCESS FULL T
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
30689 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n
<b>but note that this is a soft parse, the full scan plan will be used - first one to run the query with bind variable inputs (first hard parse of it) gets to to "pick the plan".... So, basically the same query - just changed the correlation name and we have different plans:</b>
Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70 (OPS$TKYTE)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE
1 TABLE ACCESS FULL OBJ#(36009)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
1 SORT (AGGREGATE)
1 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'T'
<b>You would need to use a 10053 trace to see the "true" cardinality in this case</b>
Unexpected cardinality
Jonathan Lewis, November 26, 2005 - 1:39 pm UTC
Test case does not reproduce on my 9.2.0.6
Note that what you have looks like the standard 1% estimate for function(col) = constant. As if Oracle is treating this like a coercion problem.
select count(*) from t1 where f=:f;
COUNT(*)
----------
45765
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374)
card = 458 = ceil(45765/100).
My plan gives card={actual number of rows in table).
What do you see as the filter_predicates if you run the bind variable version (with a to_number() on the bind) through dbms_xplan ?
November 26, 2005 - 4:38 pm UTC
I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal".....
but yes, I don't see it in 9206 - but in 9iR1:
ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;
Table created.
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> create index i_t1 on t1(f);
Index created.
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>99.99, cascade=>true);
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain
ops$tkyte@ORA9IR1> select * from t1 where f=1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
1 0 TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)
ops$tkyte@ORA9IR1> select * from t1 where f=2;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3)
1 0 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)
ops$tkyte@ORA9IR1> select * from t1 where f=:f;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
1 0 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)
ops$tkyte@ORA9IR1> set autotrace off
it does the 1%, in 9iR2 - 9206 - it did this:
ops$tkyte@ORA9IR2> select * from t1 where f=:f;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
1 0 INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)
Odd cardinality
Jonathan Lewis, November 26, 2005 - 2:00 pm UTC
I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create a histogram on a column with just one value. I think there's an article on his site www.centrexcc.com which mentions it.
question about the density calculation
William Moffatt, November 28, 2005 - 4:52 am UTC
Hi Tom,
I'm just curious about something in the support note: it says about the density calculation :
"The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:
select distinct count(COL1)
from TABLE1
where COL1 is not null;
"
Firstly, isn't the "not null" implied by the count?
Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?
This is just idle curiosity, since the intent is obvious, and I presume the calculation works, however it's done :)
(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query too). I've seen you raise documentation bugs before, and similarly if I've made a mistake somewhere, I'd rather know what and why :)
BTW, many thanks for the marvelous resource that this site is.
regards,
William
November 28, 2005 - 7:40 am UTC
it should just be
select count(distinct col1) from table1;
the query presented in the note would not make sense.
select distinct count(col1) from table1 where col1 is not null;
is the same as
select count(col1) from table1;
which would count non-null col1 values - they meant count(distinct col1)
Calculation of density
Mette, July 30, 2008 - 6:41 am UTC
Thanks to this entry I now understand how we get the estimated number of rows when using histograms - thanks.
But I dont understand how Oracle calculates the density (used for the non popular values).
I'm on 10.2.0.3 on AIX
This is the situation:
250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1.799.977 rows
stats run with 100%
The density is 0.002267921.... (taken from TAB_COLUMNS)
How does he get that number?
I tried with the formula from above - and got nothing like this number.
215 / 23644 = 0.009093216
What have I missed here?
Best regards