Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Connor McDonald

Thanks for the question, Nikhil.

Asked: January 18, 2017 - 4:31 am UTC

Last updated: October 10, 2018 - 1:52 am UTC

Version: 11.2.0

Viewed 10K+ times! This question is

You Asked

Hi Team,

We encountered production issue related 100% CPU usage.

New script introduced that was rebuilding indexes on table having size 450GB with parallel degree 8.

I read one arrival on asktom where they explained rebuild recreates the Index and we do rebuild of Index "almost never".

Could you please give some info like what is th difference between rebuild Index and gather Index stats.

and Connor said...

Huge difference !

gather index stats => reads the index, calculates some metrics and stores in them the data dictionary so that queries can be optimized well. Has no impact whatsoever on the *structure* of the index.

rebuild index => effectively trashes the index and builds a new version of the index in its place. (Its very similar to a 'create index' command).

It's rare you need to rebuild indexes - here's a great paper on the topic

https://richardfoote.files.wordpress.com/2007/12/index-internals-rebuilding-the-truth.pdf

It's generally rare that you need to explicitly gather index stats - by default there is a nightly task that will take care of that for you in *most* circumstances.

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Elaborate

A reader, January 18, 2017 - 6:45 am UTC

"reads the index, calculates some metrics and stores in them the data dictionary so that queries can be optimized "

From which dictionary views?
To which metric components?

Please details
Connor McDonald
January 19, 2017 - 1:00 am UTC

USER_INDEXES
USER_IND_STATISTICS

and others if partitioning is involved.

Elaborate

Rajeshwaran, Jeyabal, January 18, 2017 - 8:23 am UTC

"reads the index, calculates some metrics and stores in them the data dictionary so that queries can be optimized "

From which dictionary views?
To which metric components?


The dictionary view related to index statistics, would be user_indexes and the details available here are used to optimize the sql at run time

To start with, create a Table with index on it.
demo@ORA11G> create table t
  2  as
  3  select *
  4  from all_objects
  5  where 1 = 0 ;

Table created.

demo@ORA11G> create index t_idx on t(object_id);

Index created.

demo@ORA11G> insert into t select * from all_objects;

84884 rows created.

demo@ORA11G> commit;

Commit complete.

demo@ORA11G> column index_name format a15
demo@ORA11G> select i.index_name, i.leaf_blocks,i.blevel,i.distinct_keys,
  2         i.num_rows,i.clustering_factor,t.blocks,
  3        i.last_analyzed
  4  from user_indexes i ,
  5       user_tables t
  6  where t.table_name = i.table_name
  7  and t.table_name ='T';

INDEX_NAME      LEAF_BLOCKS     BLEVEL DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR     BLOCKS LAST_ANALYZED
--------------- ----------- ---------- ------------- ---------- ----------------- ---------- --------------------
T_IDX                     0          0             0          0                 0            18-JAN-2017 13:28:33

1 row selected.

Since no stats gathering is done, we don't have any info available in user_indexes.
demo@ORA11G> exec dbms_stats.gather_Table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.

demo@ORA11G> select i.index_name, i.leaf_blocks,i.blevel,i.distinct_keys,
  2         i.num_rows,i.clustering_factor,t.blocks,
  3        i.last_analyzed
  4  from user_indexes i ,
  5       user_tables t
  6  where t.table_name = i.table_name
  7  and t.table_name ='T';

INDEX_NAME      LEAF_BLOCKS     BLEVEL DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR     BLOCKS LAST_ANALYZED
--------------- ----------- ---------- ------------- ---------- ----------------- ---------- --------------------
T_IDX                   310          1         84884      84884              1259       1252 18-JAN-2017 13:29:38

1 row selected.

demo@ORA11G>

Running the stats gathering process, updates the info at user_indexes and now the Clustering factor (organization of data in the table with respect to the sorted order in the index) is close to the BLOCKS - so data in the table is organized similar to your index. - the chances of getting the index scan is highly possible.
demo@ORA11G> set autotrace traceonly explain
demo@ORA11G> select * from t where object_id between 5000 and 20000;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       | 11803 |  1129K|   225   (2)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     | 11803 |  1129K|   225   (2)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX | 11803 |       |    46   (3)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">=5000 AND "OBJECT_ID"<=20000)

demo@ORA11G> set autotrace off

now, let us Trunc and reload this data, in a different order.
demo@ORA11G> truncate table t;

Table truncated.

demo@ORA11G> insert into t select * from all_objects order by dbms_random.random;

84884 rows created.

demo@ORA11G> commit;

Commit complete.

demo@ORA11G> exec dbms_stats.gather_Table_stats(user,'T',cascade=>true);

PL/SQL procedure successfully completed.

demo@ORA11G> select i.index_name, i.leaf_blocks,i.blevel,i.distinct_keys,
  2         i.num_rows,i.clustering_factor,t.blocks,
  3        i.last_analyzed
  4  from user_indexes i ,
  5       user_tables t
  6  where t.table_name = i.table_name
  7  and t.table_name ='T';

INDEX_NAME      LEAF_BLOCKS     BLEVEL DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR     BLOCKS LAST_ANALYZED
--------------- ----------- ---------- ------------- ---------- ----------------- ---------- --------------------
T_IDX                   252          1         84884      84884             84806       1252 18-JAN-2017 13:39:45

1 row selected.

demo@ORA11G> 

The clustering factor changes now, not even close to the BLOCKS, instead close to the number of rows. which confirms that each entry in the leaf block of index points to different blocks table blocks ( table is sorted completely different from the index sorting order).
so given this query at runtime, optimizer feels that full scan is better than jumping back and forth between the leaf block to different Table blocks (since each entry in the leaf block points to different table blocks)
demo@ORA11G> set autotrace traceonly explain
demo@ORA11G> select * from t where object_id between 5000 and 20000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 11803 |  1129K|   642   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    | 11803 |  1129K|   642   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"<=20000 AND "OBJECT_ID">=5000)

demo@ORA11G> set autotrace off

Rearding stats and accurate distinct value or keys

A reader, October 08, 2018 - 5:27 pm UTC

My requirements is to get an exact distinct values .
Say I have a very big subpartitioned table .
Each partition is loaded daily within billions of rows. No pk on. But a nonunique index on (id1, id2)

These 2 cols could be the cols of a unique index but we don't want it because of impact on load load should not be interrupted.

What stats should I do in order to get the ndv or the distinct keys on index ?
So if I compare the num_rows vs the distinct this ensure non duplicate??

Ideally a unique constraint. But as said cannot make load fails in order to report an cannot make it differed cause huge load of billions and commit occur regularly whilst loading.
Connor McDonald
October 10, 2018 - 1:52 am UTC

My requirements is to get an exact distinct values .


Then you need to enforce it either *as you load* (with a unique constraint or index) or check for it afterwards using a query.

A basic query is going to outperform a gather stats call, because gather stats does more work, ie

select count(*)
from ( select distinct id1,id2 from my_table );


That is the fundamental concept - then its just a case of making sure its fast enough, via *potential* options such as:

- parallelism
- indexing id1,id2 to allow a index fast full scan rather than table scan
- if id1,id2 are part of partitioning keys, then only checking partitions touched by the most recent load rather than all of them