Thanks for the question, Matt.

Asked: March 06, 2002 - 11:50 am UTC

Last updated: November 28, 2005 - 7:40 am UTC

Version: 8.1.6

Viewed 10K+ times! This question is

You Asked

What does the density column in dba_tab_columns mean?

and Tom said...

Here is a support note on the topic:

Article-ID: <Note:43041.1>
Circulation: PUBLISHED (LIMITED)
Folder: server.Performance.SqlTuning
Topic: Optimizer Terms
Title: QTUNE: What is Density?
Document-Type: BULLETIN
Impact: MEDIUM
Skill-Level: NOVICE
Updated-Date: 05-APR-2000 21:20:35
References: <NOTE:50750.1> Shared-Refs:

Content-Type: TEXT/PLAIN
Keywords: CBO; FEATURE;
Products: 5/RDBMS (V7.X to V8.X);
Platforms: GENERIC;

PURPOSE

To give an explanation of 'Density' as used by the CBO.

SCOPE & APPLICATION

For Oracle Support Analysts trying to understand how density is used by the
CBO when determining the best access path.

Density is a statistic used by the Cost Based Optimizer to give selectivity
estimates for columns where better information is unavailable (i.e. from
histograms etc.).

You must gather statistics on your table to get density statistics.

You can see density statistics by looking at the following views:

USER_TAB_COLUMNS
ALL_TAB_COLUMNS
DBA_TAB_COLUMNS

Density is a column statistic and provides selectivity estimates for
equi-join predicates (e.g. and A.COL1 = B.COL1) and equality predicates
(e.g. COL1 = 5).

The density is expressed as a decimal number between 0 and 1.
Values close to 1 indicate that this column is unselective
Values close to 0 indicate that this column is highly selective

The more selective a column, the less rows are likely to be returned by a
query referencing this column in its predicate list.

The column selectivity is part of the equation used to decide on the best
path for a query to take to retrieve the data required in the most effective
manner and hence impacts the final cost value for the query.

Density is calculated as follows:

Pre 7.3
~~~~~~~

Density = 1 / Number of distinct NON null values

The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from TABLE1
where COL1 is not null;

7.3+
~~~~

The Density calculation has been refined by the use of histograms. If
you have created histograms on your columns we can now use the histogram
information to give more accurate information. Otherwise the Density is
calculated as before. With histograms we can use information on
popular and non-popular values to determine the selectivity.

A non-popular value is one that does not span multiple bucket end points.
A popular value is one that spans multiple end points.

(Refer to <Note:50750.1> for details on histograms)

For non-popular values the density is calculated as the number of non-popular
values divided by the total number of values. Formula:

Density = Number of non-popular values
----------------------------
total number of values

We only use the density statistic for non-popular values.

Popular values calculate the selectivity of a particular column values by
using histograms as follows:

The Selectivity for popular values is calculated as the number of end points
spanned by that value divided by the total number of end points. Formula:

Selectivity = Number of end points spanned by this value
------------------------------------------
total number of end points

[Top of Page]

Rating

(6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Stephan Bressler, November 21, 2002 - 8:00 am UTC

Hi,

what's the formula
Density = Number of non-popular values
----------------------------
total number of values

In case there are no popular values, is the density = 1?

I thought there would be a density per bucket? Wouldn't that make sense?

Selectivity with bind variables (Oracle 9.2.0.5)

Vladimir Sadilovskiy, November 26, 2005 - 1:03 am UTC

Tom,

Could you look into this test case:

SQL> create table t1 as select 1 f from all_objects;

Table created.

SQL> create index i_t1 on t1(f);

Index created.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>100, cascade=>true);

PL/SQL procedure successfully completed.

SQL> variable f number;
SQL> exec :f := 1;

PL/SQL procedure successfully completed.

SQL> set autotrace on
<b> with constants the plan is nice</b>
SQL> select count(*) from t1 where f=2;

  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1
           Bytes=3)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        490  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(*) from t1 where f=1;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=45765 Bytes=137
          295)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>but with bind variable the plan is different</b>
SQL> select count(*) from t1 where f=:f;

  COUNT(*)
----------
     45765


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
   1    0   SORT (AGGREGATE)
   2    1     INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=4
          58 Bytes=1374)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         73  consistent gets
          0  physical reads
          0  redo size
        493  bytes sent via SQL*Net to client
        655  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

<b>1/NDV doesn't seem to match calculated density </b>
SQL> select object_id from dba_objects where object_name='T1';

 OBJECT_ID
----------
     56142

SQL> set pages 999
SQL> select * from hist_head$ where obj#=56142;

      OBJ#       COL# BUCKET_CNT    ROW_CNT  CACHE_CNT   NULL_CNT TIMESTAMP
---------- ---------- ---------- ---------- ---------- ---------- ---------
SAMPLE_SIZE    MINIMUM    MAXIMUM    DISTCNT
----------- ---------- ---------- ----------
LOWVAL
----------------------------------------------------------------
HIVAL                                                               DENSITY
---------------------------------------------------------------- ----------
   INTCOL#     SPARE1     SPARE2     AVGCLN     SPARE3     SPARE4
---------- ---------- ---------- ---------- ---------- ----------
     56142          1      45765          1          1          0 25-NOV-05
      45765          1          1          1
C102
C102                                                             .000010925
         1          1          2          3

Thank you.

- Vladimir

November 26, 2005 - 12:23 pm UTC

autotrace (explain plan) do not BIND PEEK - so they can "lie"

To demnonstrate this and the effect of bind variable peeking, I'll run this script:


create table t
as
select 99 id, a.* from all_objects a;

update t set id = 1 where rownum = 1;

create index t_idx on t(id);
begin
   dbms_stats.gather_table_stats
   ( user, 'T',
     method_opt=>'for all indexed columns size 254',
     cascade=>TRUE );
end;
/
variable n number
alter session set sql_trace=true;
exec :n := 1
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_1_first where id = :n;
exec :n := 99
select count(object_type) from t n_is_99_first where id = :n;
exec :n := 1
select count(object_type) from t n_is_99_first where id = :n;



The tkprof was generated with:

tkprof tracefile outputfile aggregate=no explain=u/p

to get both an EXPLAIN PLAN as well as getting all four statements printed individually:

<b>we start with :N of one, the optimizer will actually optimize this statement as if the literal number one was in the query itself:</b>


BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

<b>we did a hard parse (misses = 1), the number one was put in place of :n and the query optimizer said "use index! it'll only get one row!".  In the following the row source operation plan is the ACTUAL PLAN used, the EXECUTION PLAN is what explain plan - which DOES NOT BIND PEEK - would give us.  NOTE THEY ARE DIFFERENT.  Explain plan doesn't have access to the binds:</b>

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS BY INDEX ROWID T
      1    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_1_first where id = :n

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

<b>This query was soft parsed, the plan already existed.  This too will therefore use an index range scan - even though in this case, a full scan would have been "better".  Again, explain plan "lies" to us here</b>


Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS BY INDEX ROWID T
  30689    INDEX RANGE SCAN T_IDX (object id 36010)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 99; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

<b>Note that this is a hard parse (changed the correlation name in the query, new query).  This time - the optimizer plugs 99 into the query and optimizes.  We do full scan this time:</b>

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
  30689   TABLE ACCESS FULL T


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
  30689    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'
********************************************************************************
BEGIN :n := 1; END;
********************************************************************************
select count(object_type) from t n_is_99_first where id = :n

<b>but note that this is a soft parse, the full scan plan will be used - first one to run the query with bind variable inputs (first hard parse of it) gets to to "pick the plan".... So, basically the same query - just changed the correlation name and we have different plans:</b>


Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 70  (OPS$TKYTE)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
      1   TABLE ACCESS FULL OBJ#(36009)


Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
      1    TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'T'


<b>You would need to use a 10053 trace to see the "true" cardinality in this case</b>

Unexpected cardinality

Jonathan Lewis, November 26, 2005 - 1:39 pm UTC

Test case does not reproduce on my 9.2.0.6

Note that what you have looks like the standard 1% estimate for function(col) = constant. As if Oracle is treating this like a coercion problem.

select count(*) from t1 where f=:f;

COUNT(*)
----------
45765

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=3)
1 0 SORT (AGGREGATE)
2 1 INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=458 Bytes=1374)

card = 458 = ceil(45765/100).

My plan gives card={actual number of rows in table).

What do you see as the filter_predicates if you run the bind variable version (with a to_number() on the bind) through dbms_xplan ?

November 26, 2005 - 4:38 pm UTC

I didn't even look at the numbers to tell the truth, I saw "bind variable vs literal".....


but yes, I don't see it in 9206 - but in 9iR1:

ops$tkyte@ORA9IR1> create table t1 as select 1 f from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> create index i_t1 on t1(f);
 
Index created.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 254',estimate_percent=>99.99, cascade=>true);
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR1>
ops$tkyte@ORA9IR1> variable f number
ops$tkyte@ORA9IR1> set autotrace traceonly explain
ops$tkyte@ORA9IR1> select * from t1 where f=1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=31020 Bytes=93060)
   1    0   TABLE ACCESS (FULL) OF 'T1' (Cost=8 Card=31020 Bytes=93060)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=2;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=3)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=1 Bytes=3)
 
 
 
ops$tkyte@ORA9IR1> select * from t1 where f=:f;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=310 Bytes=930)
   1    0   INDEX (RANGE SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=1 Card=310 Bytes=930)
 
 
 
ops$tkyte@ORA9IR1> set autotrace off


it does the 1%, in 9iR2 - 9206 - it did this:


ops$tkyte@ORA9IR2> select * from t1 where f=:f;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=30693 Bytes=92079)
   1    0   INDEX (FAST FULL SCAN) OF 'I_T1' (NON-UNIQUE) (Cost=7 Card=30693 Bytes=92079)

Odd cardinality

Jonathan Lewis, November 26, 2005 - 2:00 pm UTC

I forgot to mention - Wolfgang Breitling has mentioned to me an oddity that appears when you create a histogram on a column with just one value. I think there's an article on his site www.centrexcc.com which mentions it.

question about the density calculation

William Moffatt, November 28, 2005 - 4:52 am UTC

Hi Tom,

I'm just curious about something in the support note: it says about the density calculation :

"The number of distinct NON-null values for a column (COL1) on table TABLE1
can be obtained as follows:

select distinct count(COL1)
from TABLE1
where COL1 is not null;
"

Firstly, isn't the "not null" implied by the count?

Secondly, shouldn't that be "count(distinct COL1)", rather than "distinct count(COL1)"?

This is just idle curiosity, since the intent is obvious, and I presume the calculation works, however it's done :)

(I did test it out, on 8.1.7.4, and it does appear that the query as written is incorrect, however the note does say pre 7.3 - but then again, the "post 7.3" section refers you back to this query too). I've seen you raise documentation bugs before, and similarly if I've made a mistake somewhere, I'd rather know what and why :)

BTW, many thanks for the marvelous resource that this site is.

regards,

William

November 28, 2005 - 7:40 am UTC

it should just be

select count(distinct col1) from table1;

the query presented in the note would not make sense.

select distinct count(col1) from table1 where col1 is not null;

is the same as

select count(col1) from table1;

which would count non-null col1 values - they meant count(distinct col1)

Calculation of density

Mette, July 30, 2008 - 6:41 am UTC

Thanks to this entry I now understand how we get the estimated number of rows when using histograms - thanks.

But I dont understand how Oracle calculates the density (used for the non popular values).

I'm on 10.2.0.3 on AIX

This is the situation:

250 buckets
23644 distinct values
34 popular (spanning more than one bucket)
1.799.977 rows
stats run with 100%

The density is 0.002267921.... (taken from TAB_COLUMNS)

How does he get that number?

I tried with the formula from above - and got nothing like this number.

215 / 23644 = 0.009093216

What have I missed here?

Best regards

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.

Breadcrumb

Question and Answer