Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question.

Asked: August 19, 2004 - 11:16 am UTC

Last updated: October 29, 2004 - 8:15 am UTC

Version: 9.2

Viewed 1000+ times

You Asked

Hi Tom
I have a question regarding DBA_INDEXES.. we use DBMS_STATS to analyze our tables and the indexes.I queried the DBA_INDEXES and i found some interesting results.
The NUM_ROWS and CLUSTERING_FACTOR are very close.we do periodic deletes and lots of insert appends. Does this Indicate any upcoming danger?
thanks


and Tom said...

no, not at all.

the clustering factor is a measure of how sorted the data in the database table is with regards to the index.

take this table:

ops$tkyte@ORA9IR2> create table t ( id int, name varchar2(30) );

Table created.

ops$tkyte@ORA9IR2> insert into t
2 select object_id, object_name
3 from all_objects
4 order by 1;

30679 rows created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_id_idx on t(id);

Index created.

ops$tkyte@ORA9IR2> create index t_name_idx on t(name);

Index created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics for table for all indexes;

Table analyzed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select a.index_name, a.num_rows, a.clustering_factor, b.blocks
2 from user_indexes a, user_tables b
3 where b.table_name = 'T'
4 and a.table_name = b.table_name;

INDEX_NAME NUM_ROWS CLUSTERING_FACTOR BLOCKS
------------------------------ ---------- ----------------- ----------
T_ID_IDX 30679 141 142
T_NAME_IDX 30679 12330 142

this shows us that since the clustering factor for t_id_idx is close to the number of blocks and far from the number of rows -- the table is "sorted" on disk in the same sort order as the index.

the clustering factor for t_name_idx is far from the number of blocks, closer to the number of rows - that indicates the sorted order of the table is NOT the same as the index on name.

but -- how many ways can a table be sorted??? this table can EITHER be sorted by name or by id:


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> truncate table t;

Table truncated.

ops$tkyte@ORA9IR2> drop index t_id_idx;

Index dropped.

ops$tkyte@ORA9IR2> drop index t_name_idx;

Index dropped.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t
2 select object_id, object_name
3 from all_objects
4 order by 2;

30679 rows created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_id_idx on t(id);

Index created.

ops$tkyte@ORA9IR2> create index t_name_idx on t(name);

Index created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics for table for all indexes;

Table analyzed.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select a.index_name, a.num_rows, a.clustering_factor, b.blocks
2 from user_indexes a, user_tables b
3 where b.table_name = 'T'
4 and a.table_name = b.table_name;

INDEX_NAME NUM_ROWS CLUSTERING_FACTOR BLOCKS
------------------------------ ---------- ----------------- ----------
T_ID_IDX 30679 14254 142
T_NAME_IDX 30679 141 142


so, if you "fix one" you "break the other"

remember a table can only be sorted one way.

rebuilding an index cannot affect its clustering factor at all

a table rebuild can -- but only one index would likely benefit.


It is normal for the clustering factor to be between the two -- data is usually NOT stored sorted (maybe a primary key on a sequence could be)



Rating

  (6 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Good Explanation

A reader, August 20, 2004 - 9:51 am UTC

Does this mean that rebuiding tables(to reduce the HWM) can also have a detrimental effect on the performance?

What about if you don't specify an order at table creation?

Gj, August 20, 2004 - 9:54 am UTC

Purely for my own interest...

I created a table:

Name Null? Type
----------------------------------------- -------- ----------------------------
C1 VARCHAR2(32)
C2 NUMBER

I filled it with 500 rows using this PL/SQL block:

begin
for i in 1..500
loop
insert into tab1 values(i, i+i-(i*3));
end loop;
end;


I then added a standard normal index to each column, analyzed the indexes and the table.

I then ran the cluster display SQL, it reported that the c2 column (which contains the -1 to -500 number), was clustered better.

Why? I specified no order when I created the table. Why did it chose the second column with the minus values over column1? I've probably missed something really blindly obvious, but I just can't see why!



Tom Kyte
August 20, 2004 - 8:22 pm UTC

500 rows in a teeny tiny table like that isn't a good test case.

num rows and num blocks are far far too "wacky"



the clustering factor is a measure of the number of blocks that would be read (LIO wise) if you range scanned the index -> table from beginning to end.

but really, the clustering factor would be tiny here -- both should have tbeen the "same" -- 1.  this is a one block table!

ops$tkyte@ORA9IR2> edit
Wrote file afiedt.buf
 
  1  begin
  2  for i in 1..500
  3  loop
  4      insert into t values(i, i+i-(i*3));
  5  end loop;
  6* end;
ops$tkyte@ORA9IR2> /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> create index t_c1 on t(c1);
 
Index created.
 
ops$tkyte@ORA9IR2> create index t_c2 on t(c2);
 
Index created.
 
ops$tkyte@ORA9IR2> select num_rows from user_tables where table_name = 'T';
 
  NUM_ROWS
----------
 
 
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select num_rows from user_tables where table_name = 'T';
 
  NUM_ROWS
----------
       500
 
ops$tkyte@ORA9IR2> select blocks from user_tables where table_name = 'T';
 
    BLOCKS
----------
         5
 
ops$tkyte@ORA9IR2> select clustering_factor from user_indexes where table_name= 'T';
 
CLUSTERING_FACTOR
-----------------
                1
                1
 
ops$tkyte@ORA9IR2> select dbms_rowid.rowid_block_number(rowid), count(*) from t
  2  group by  dbms_rowid.rowid_block_number(rowid);
 
DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)   COUNT(*)
------------------------------------ ----------
                                 246        500
 

 

Column types

Gj, August 20, 2004 - 10:15 am UTC

Having sat and thought this through, I believe it's the column types that are affecting the outcome. I am happy to be corrected.

How is it calculated

A reader, August 21, 2004 - 12:40 am UTC

How is clustering factor calculated?

Tom Kyte
August 21, 2004 - 11:52 am UTC

it is calculated conceptually by scanning the index and incrementing a counter everytime two adjacent index entries point to different database blocks.

if the counter is close to the number of blocks -- then the table is sorted by the index key values (as we scanned through the index, we incremented the counter very infrequently -- most of the adjacent index entries pointed to the same block)

if the counter is close to the number of rows -- then the table is not sorted at all by the index key values -- every index entry points to a different block than it's neighbors do.




and what with db_block_size en TPS's Data, & Index

Julio David Moreno, October 28, 2004 - 11:06 pm UTC

tablespace's data with db_block_size of 16K and tablespace's index with db_block_size of 8K.
* How change clustering factor?
thanks.
J.David


Tom Kyte
October 29, 2004 - 8:15 am UTC

don't do that.  not what they were designed for.  makes your database harder to manage.


the cluster factor is a measure of how sorted the table is with respect to the index key.  it is a measure of how many consistent gets would have to be performed to read the entire table via the index -- eg: 

create table t ( x int, y int );
alter table t add constraint t_pk primary key(x);

select /*+ index( t t_pk ) */ y from t;


so we would 

  index range scan -> table access by index rowid;

how many LIO's did we need to do that (on the table) with an array size of "ALL ROWS".  that is what the cluster factor would be.

if the table is sorted by the primary key -- the cluster factor would equal the number of blocks.  if the 16k table is 1/2 the number of blocks of the 8k table, it would halve it.  if the data was totally scrambled, it would do basically nothing.

Really, the only way to "affect" the clustering factor in a meaningful way is to have the data on disk "sorted" -- but ask yourself this important question first:  how many ways can a tables data be sorted?  

If maintaining order (or clustering) is vital you want to read about IOT's, b*tree clusters and hash clusters -- they are all structures that make data "have a location"

ops$tkyte@ORA9IR2> drop table t1;
 
Table dropped.
 
ops$tkyte@ORA9IR2> drop table t2;
 
Table dropped.
 
ops$tkyte@ORA9IR2> create table t1 as select * from all_objects order by object_id;
 
Table created.
 
ops$tkyte@ORA9IR2> create index t1_idx on t1(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_index_stats( user, 'T1_IDX' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> create table t2 tablespace ts_16k as select * from t1 order by object_id;
 
Table created.
 
ops$tkyte@ORA9IR2> create index t2_idx on t2(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_index_stats( user, 'T2_IDX' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select index_name, clustering_factor from user_indexes where index_name like 'T__IDX';
 
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
T1_IDX                                       420
T2_IDX                                       207
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> drop table t1;
 
Table dropped.
 
ops$tkyte@ORA9IR2> drop table t2;
 
Table dropped.
 
ops$tkyte@ORA9IR2> create table t1 as select * from all_objects order by reverse(object_id);
 
Table created.
 
ops$tkyte@ORA9IR2> create index t1_idx on t1(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_index_stats( user, 'T1_IDX' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> create table t2 tablespace ts_16k as select * from t1 order by reverse(object_id);
 
Table created.
 
ops$tkyte@ORA9IR2> create index t2_idx on t2(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_index_stats( user, 'T2_IDX' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select index_name, clustering_factor from user_indexes where index_name like 'T__IDX';
 
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
T1_IDX                                     30679
T2_IDX                                     30678
 

 

Excellent explanation

Kai Quale, August 15, 2006 - 8:11 am UTC


More to Explore

DBMS_STATS

More on PL/SQL routine DBMS_STATS here