Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, trinh.

Asked: May 29, 2001 - 10:27 pm UTC

Last updated: September 15, 2007 - 3:26 pm UTC

Version: 8.0.6

Viewed 10K+ times! This question is

You Asked

Tom,
In your quetion answers, you prefer:

analyze table t compute statistics
for table
for all indexes
for all indexed columns;

does "analyze table t compute statistics
for table"
cover all the rest?

and
"analyze table t compute statistics
for all indexes" covers "analyze table t compute statistics
for all indexed columns"?
Thanks.

and Tom said...

They all do different things.

"for table" analyzes just the table.
"for all indexes" grabs the indexes for the table and analyzes all of them.
"for all indexed columns" computes histograms for any column that is indexed.

Here is an exmaple showing that each adds its own bit of data:

tkyte@TKYTE816> create table t ( x int primary key );
Table created.

tkyte@TKYTE816> insert into t values ( 1 );
1 row created.


tkyte@TKYTE816> select table_name, num_rows from user_tables;

TABLE_NAME NUM_ROWS
---------- ----------
T

tkyte@TKYTE816> select index_name, num_rows from user_indexes;

INDEX_NAME NUM_ROWS
------------------------------ ----------
SYS_C005003

tkyte@TKYTE816> select * from user_tab_histograms;

no rows selected

So, when we start -- the optimizer knows NOTHING about the table/index/columns...

tkyte@TKYTE816> analyze table t compute statistics for table;
Table analyzed.

tkyte@TKYTE816> select table_name, num_rows from user_tables;

TABLE_NAME NUM_ROWS
---------- ----------
T 1

tkyte@TKYTE816> select index_name, num_rows from user_indexes;

INDEX_NAME NUM_ROWS
------------------------------ ----------
SYS_C005003

tkyte@TKYTE816> select * from user_tab_histograms;

no rows selected

after analyzing the TABLE, the optimizer understands something about the table itself but not the indexes and not the columns in the indexes

tkyte@TKYTE816> analyze table t compute statistics for all indexes;
Table analyzed.

tkyte@TKYTE816> select table_name, num_rows from user_tables;

TABLE_NAME NUM_ROWS
---------- ----------
T 1

tkyte@TKYTE816> select index_name, num_rows from user_indexes;

INDEX_NAME NUM_ROWS
------------------------------ ----------
SYS_C005003 1

tkyte@TKYTE816> select * from user_tab_histograms;

no rows selected

After analyzing all of the indexes, the optimizer knows a little more...

tkyte@TKYTE816> analyze table t compute statistics for all indexed columns;
Table analyzed.

tkyte@TKYTE816> select table_name, num_rows from user_tables;

TABLE_NAME NUM_ROWS
---------- ----------
T 1

tkyte@TKYTE816> select index_name, num_rows from user_indexes;

INDEX_NAME NUM_ROWS
------------------------------ ----------
SYS_C005003 1

tkyte@TKYTE816> select * from user_tab_histograms;

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL
---------- ----------- --------------- -------------- ---------------
T X 0 1
T X 1 1

tkyte@TKYTE816>

and finally, the histograms for the indexed columns are there. What stats you compute depend on your needs. Histograms are generally not computed in an OLTP system but would be in a data warehouse. In many cases, just having the statistics on the tables is sufficient for good performance.

Rating

  (44 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Nicely explained

Gururaj, May 30, 2001 - 10:36 am UTC

Thanks Tom, explanation is very good. Looks even better with examples

Still Confused...

Vineet, November 16, 2001 - 3:52 pm UTC

Tom,

I am still confused. I read this documetn before asking you. My question is for analyze table TABLE_NAME compute statistics command not for

analyze table t compute statistics
for table
for all indexes
for all indexed columns;


I mean analyze table without any option like for table or for all indexes or for all indexed columns...

Don't mean to beat a dead horse...

Steve McKee, February 14, 2002 - 4:11 pm UTC


Hi Tom,

I just want to make sure I understand the documentation. From the documentation (8.1.7):

"Specify a table to be analyzed. When you collect statistics for a table, Oracle also automatically collects the statistics for each of the table’s indexes and domain
indexes, provided that no for_clauses are used."

Therefore, is "analyze table <table> compute statistics"

the same as:

analyze table t compute statistics
for table
for all indexes
for all indexed columns;


One of our main designers and I too want to make sure.

I appreciate it Tom.

Steve

Tom Kyte
February 22, 2002 - 9:09 am UTC

No, it is not.  Consider:



ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int primary key, y int );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t values ( 1, 1 );

1 row created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
  2  union all
  3  select 'INDEX', index_name, to_char(num_rows) from user_indexes
  4  union all
  5  select 'HIST', table_name, column_name from user_tab_histograms;

'TABL TABLE_NAME                     NUM_ROWS
----- ------------------------------ --------------------
TABLE T
INDEX SYS_C005115

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
  2  union all
  3  select 'INDEX', index_name, to_char(num_rows) from user_indexes
  4  union all
  5  select 'HIST', table_name, column_name from user_tab_histograms;

'TABL TABLE_NAME                     NUM_ROWS
----- ------------------------------ --------------------
TABLE T                              1
INDEX SYS_C005115                    1
HIST  T                              X
HIST  T                              Y
HIST  T                              X
HIST  T                              Y

6 rows selected.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics
  2  for table
  3  for all indexes
  4  for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
  2  union all
  3  select 'INDEX', index_name, to_char(num_rows) from user_indexes
  4  union all
  5  select 'HIST', table_name, column_name from user_tab_histograms;

'TABL TABLE_NAME                     NUM_ROWS
----- ------------------------------ --------------------
TABLE T                              1
INDEX SYS_C005115                    1
HIST  T                              X
HIST  T                              X

ops$tkyte@ORA817DEV.US.ORACLE.COM> 


The analyze table t gets all of the columns, for all indexed columns does not. 

Subrat

Subrat, March 23, 2002 - 8:02 pm UTC

Hi tom,
I appreciate your explanations a lot..
Thats too good for gaining knowledge ...
Hats off to you .
I just want to know 'Will you advise to analyze the Primary Columns too ?'
As When you give 'Analyze table compute statistics for table for all indexes for all indexed columns ' , you are also analyzing the primary key and Oracle strongly says that only the skewed data should be analyzed .
And the other part is that when we Analyze the primary key columns we are generating a lot of records for the primary key in the histograms ,as a result during the parse phase the optimizer will call those records and will consume the space from the DB Block Buffer and Dictionary Cache . Don't you think it will adversely affect the performance in case of heavily used big size database in a OLTP environment .
Above all it adds to the size of the sys tablespace .
Please rectify if I am wrong .
I have a doubt when the optimizer looks for the histograms does it brings the full table of histograms to SGA or brings only the data required for the selected tables during the Parse Process to db_block_buffer and dictionary cache ?



Tom Kyte
March 23, 2002 - 8:24 pm UTC

I use for all indexed columns in my examples frequently -- its a short cut for me. It is easier then running N analyzes to get what I want for what is a very short, quick demo.

I don't see it adversely affecting performance in most normal systems no.

the histograms are retrieved as needed, not all at once.

Analyze Table

R.V.Reddy, March 24, 2002 - 12:34 am UTC

Hi tom

This is a beautiful explanation regarding analyze table. I have one doubt. so for we developed some applications using around 200 tables. But I never ANALYZED the tables. If I am not analyzing the tables is there any impact on the performance or any other issues. If I do what I can achieve in my application.

Thanks

Reddy

Tom Kyte
March 24, 2002 - 3:35 pm UTC

By not analyzing the tables you are missing out on many features such as:

desc indexes,
function based indexes,
bitmap indexes,
partitioning,
parallel query,
.......

and the list grows longer and longer each release.

Analze tables compute statistics

femi, March 24, 2002 - 2:06 pm UTC

Tom,

Correct me if I am wrong, but when I issue the command "analyze table compute statistics", It computes statistics for the indexes as well.

Tom Kyte
March 24, 2002 - 3:48 pm UTC

read the answer, see the way to tell and try it!
(or read the comment a couple of lines up "Dont mean to beat a dead horse..."

for all table

Ak, March 24, 2002 - 8:51 pm UTC

Hi tom,
1)Assume i have 500 tables in a schema,how do i analyze
all tables/indexes at a time,is there any way or i have to do one by one.

2)How frequently we should analyze the tables,assuming it is
oltp system.

3)Is there any performance problem ,when we analyze during
peek time,or it is advisable to do during non-peek hour.

Thanks,
Ak




Tom Kyte
March 24, 2002 - 8:59 pm UTC

well, parallel 500 would be a bit extreme, I haven't seen a machine that could do 500 analyzes at the same time effciently yet.

1) use dbms_stats, use alter table monitoring, gather stats only on stale tables, and if you have the resources -- do so on parallel.

2) use alter table monitoring, gather stats on stale tables when capacity is at its maximum. depends on the system. some systems -- once a day, others once a week, others yet -- once a month.

3) during peak, if you do the analyze, you'll be incurring massive IO overhead. It is something you want to try to do during "off" times if at all possible.

analyze

Ak, March 24, 2002 - 10:59 pm UTC

Hi tom,
Thanks for Your reply !!! Here according to your response
it look like i have to say
analyze table A compute statistics
analyze table B compute statistics
analyze table C compute statistics
analyze table D compute statistics

Like wise for all the 500 tables and then for indexes as well ?? Please correct me if i am wrong.




Tom Kyte
March 25, 2002 - 7:53 am UTC

Read all about DBMS_STATS. It can be done in as little as a single command.

</code> http://docs.oracle.com/docs/cd/A87860_01/doc/appdev.817/a76936/dbms_sta.htm#999107 <code>
....

This is always something taht has bugged me

Chris Slattery, March 25, 2002 - 5:46 am UTC

I always thought a stats gather (default compute) did the whole lot. The manual bears this out,,, And I quote from tahiti.oracle.com SQL 9.0 ref.

"TABLE table
Specify a table to be analyzed. When you collect statistics for a table, Oracle also automatically collects the statistics for each of the table's indexes and domain indexes, as long as

no for_clauses are used.

When you analyze a table, Oracle collects statistics about expressions occurring in any function-based indexes as well. Therefore, be sure to create function-based indexes on the table before analyzing the table"


and so on.


All I am looking for is a clear statement of the difference. There seems to be a discrepancy in the manual, from what Mr. Kyte writes. (and I am quite happy to follow his example).

as an interesting note I see from the same SQL reference that Oracle now wish you to use DBMS_STATS rather than ANALYZE and in fact give an ominous warning that DBMS_STATS will be the only method for generating stats that will actually be used (by the CBO) in some future release.
Any idea as to what release this will actually be ?


Tom Kyte
March 25, 2002 - 8:12 am UTC

What discrepancy do you see? I see NONE.

where is there a discrepancy??

I demonstrated (see comment Don't mean to beat a dead horse.) that analyze table compute stats does more then

analyze table t compute statistics
for table
for all indexes
for all indexed columns;


I also demonstrated that when you use for, you get only what you ask for.


the original QUESTION was:

In your quetion answers, you prefer:

analyze table t compute statistics
for table
for all indexes
for all indexed columns;

does "analyze table t compute statistics
for table"
cover all the rest?


and my answer was (and still is) no, no it does not.




For AK from India

Pawan, March 25, 2002 - 7:30 am UTC

AK,
No you don't have to Analyze all the tables in the schema individually. u can analyze the schema in one command :
This is what I use. I have the following script saved as analyze_schema.sql and at the SQL prompt just run it and input the schema name when prompted.
------
accept user char format 'A40'-
prompt'Enter user name:'
begin
dbms_utility.analyze_schema(upper('&user'),'COMPUTE');
end;
/
-------
Tom,
Correct me if I am wrong.

Proper Analyze

Randy Richardson, March 25, 2002 - 9:17 am UTC

Tom, Thanks for sharing the knowledge. I just bought your latest book. Love it!!. Do you have one covering Oracle9i in the works?

As for the above question: I too use

1. dbms_utility.analyze_schema(USER_NAME),'COMPUTE');

Does this differ to
2. analyze table t compute statistics
for table
for all indexes
for all indexed columns;

and if it does why would you want to do it 1 rather than 2?
And if two is better why not change 1 to do what’s best.

Thanks again!!


Tom Kyte
March 25, 2002 - 12:22 pm UTC


Here is the lowdown on analyzing objects.


in the real world, outside of the examples you see here, you will use DBMS_STATS to compute statistics. It is more flexible, has more features (eg: a parallel analyze that analyze cannot do). In examples here, I analyze only what I need since I tend to CREATE something, use it, and drop it. For me, analyzing my entire schema would be a waste of time.

Never use dbms_utility, it is a legacy from 7.3. Use dbms_stats instead.

(i'm in the process of outlining a performance book that would cover upto 9iR2)

Post-laceration !

Chris Slattery, March 25, 2002 - 9:42 am UTC



Yes Tom ...

You (as always ) are right. Stemmed from misreading the syntax diags. a long time ago and being sucked into the myth (that I need to debunk) that a qualifier (such as for index or for table ) must be added for ANALYZE to work.

Deleting Statistics

Rob, June 06, 2003 - 5:16 pm UTC

Is there anything to watch out for when
deleting stats in the fashion below ?


analyze table
ARCH_CANCEL_NOTICES
delete statistics ;

Thanks

Tom Kyte
June 06, 2003 - 8:02 pm UTC

well, if you join or reference arch_cancel_notices with other analyzed tables, we'll use default stats for arch_cancel_notices -- with disastrous results (gartner says 99.99% probability ;)

DBMS_UTILITY.ANALYZE_SCHEMA is not very smart

A reader, July 31, 2003 - 11:39 pm UTC

Hi Tom,
the package DBMS_UTILITY.ANALYZE_SCHEMA is not very smart it picks up all tables in a schema to analyse them and fails if the schema has a external table, i think the package should omit external tables when it knows that analyze does not work on external tables.


Tom Kyte
August 01, 2003 - 7:44 am UTC

well, you shouldn't be using dbms_utility -- dbms_stats is the only way to go.  and then, you CAN in fact analyze them:

ops$tkyte@ORA920LAP.LOCALHOST> create table external_table
  2  (EMPNO NUMBER(4) ,
  3   ENAME VARCHAR2(10),
  4   JOB VARCHAR2(9),
  5   MGR NUMBER(4),
  6   HIREDATE DATE,
  7   SAL NUMBER(7, 2),
  8   COMM NUMBER(7, 2),
  9   DEPTNO NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  ( type oracle_loader
 13    default directory data_dir
 14    access parameters
 15    ( fields terminated by ',' )
 16    location ('emp.dat')
 17  )
 18  /

Table created.

ops$tkyte@ORA920LAP.LOCALHOST> host flat scott/tiger emp > /tmp/emp.dat

ops$tkyte@ORA920LAP.LOCALHOST>
ops$tkyte@ORA920LAP.LOCALHOST> exec dbms_stats.gather_table_stats( user, 'EXTERNAL_TABLE' )

PL/SQL procedure successfully completed.

ops$tkyte@ORA920LAP.LOCALHOST> select table_name, num_rows, last_analyzed from
  2  user_tables where table_name = 'EXTERNAL_TABLE';

TABLE_NAME                       NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
EXTERNAL_TABLE                         14 01-AUG-03



<b>plus have a whole lot more control</b> 

what the point of FOR ALL COLUMNS

A reader, September 01, 2003 - 1:31 pm UTC

Hi

Why we dont use FOR ALL INDEXED COLUMNS instead of FOR ALL COLUMNS? I dont see the point analyzing columns whcih are not indexed. Any suggestion?

Tom Kyte
September 01, 2003 - 2:29 pm UTC

think of a join....


select *
from t1, t2
where t1.indexed_column = t2.column
and t2.unindexed_column = 'X';


tell me, should we nested loops join to t1 after full scanning t2 or...
should we hash join both

whatever you say is best, I'll put a counter example that shows the other was really best.

I dont see how statistics can help

A reader, September 01, 2003 - 3:52 pm UTC

Hi

I dont see how statistics of a column help in a join, if let's say hash join is best then shouldnt that decision based on number of rows in both tables...? If nested loop is used shouldnt the statistics on the indexed column be enough?

Tom Kyte
September 01, 2003 - 5:58 pm UTC

select * 
 from t1, t2
where t1.indexed_column = t2.column
  and t2.unindexed_column = 'X';


say unindex_colulmn = 'X' is known (via histograms) to return 1 row and that t1.indexed_column is know to be unique.

full scan t2, nested loops using index to t1


say  unindex_column = 'X' is known (via histograms) to return 500,000 rows.

full scan t2, filter, hash join to t1



the decision to hash join IS based to some degree on the number of rows in both tables AFTER FILTERING.  After "unindex_column = 'X'" is evaluated.  If the optimizer has no clue about the distribution of 'X' -- well, there you go, it won't know -- it'll guess and could well be *wrong*


consider:

ops$tkyte@ORA920> create table t1
  2  as
  3  select *
  4    from all_objects;
 
Table created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> alter table t1 add constraint t1_pk primary key(object_id);
 
Table altered.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t1
  2  compute statistics
  3  for table
  4  for all indexes
  5  for all indexed columns;
 
Table analyzed.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> create table t2
  2  as
  3  select decode( mod(rownum,100), 0, to_char(rownum), 'x' ) unindex_column, all_objects.*
  4    from all_objects;
 
Table created.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t2
  2  compute statistics
  3  for table;
 
Table analyzed.
 
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = '100';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
   1    0   HASH JOIN (Cost=100 Card=317 Bytes=61498)
   2    1     TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
   3    1     TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
 
 
 
ops$tkyte@ORA920> select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = 'x';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
   1    0   HASH JOIN (Cost=100 Card=317 Bytes=61498)
   2    1     TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
   3    1     TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
 
 
<b>
ops$tkyte@ORA920> analyze table t2 compute statistics for columns unindex_column size 100;
 
Table analyzed.
 </b>
ops$tkyte@ORA920> Select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = '100';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=1 Bytes=194)
   1    0   NESTED LOOPS (Cost=49 Card=1 Bytes=194)
   2    1     TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
   4    3       INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1)
 
 
 
ops$tkyte@ORA920> Select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = 'x';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=214 Card=31710 Bytes=6151740)
   1    0   HASH JOIN (Cost=214 Card=31710 Bytes=6151740)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
   3    1     TABLE ACCESS (FULL) OF 'T2' (Cost=52 Card=31710 Bytes=3107580)
 
 
 
ops$tkyte@ORA920> set autotrace off

 

Too good explanation!!!!

A reader, September 02, 2003 - 5:47 am UTC


dbms_stats

Michael, October 10, 2003 - 7:06 pm UTC

Hi Tom,

is dbms_stats.gather_table_stats(ownname => 'OWNER',
tabname => 'TABLE',
cascade => 'TRUE');
the same as
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
If not, where are the differences and is there a way to let dbms_stats do the same?
You're answer is as always appreciated.

Tom Kyte
October 11, 2003 - 9:56 am UTC

it is like

analyze table T
compute statistics
for table
for all columns size 1
for all indexes;

the method opt defaults to for all columns size 1

</code> http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003993 <code>

you would use method_opt => 'for all indexed columns'

Analyze

Yogesh Hingmire, January 11, 2004 - 12:55 pm UTC

Hi Tom ,
i have read this in the Oracle documentation

"Specify a table to be analyzed. When you collect statistics for a table, Oracle also automatically collects the statistics for each of the table's indexes and domain indexes, as long as no for_clauses are used."

it is mentiones here

</code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/statements_46a.htm#SQLRF01105 <code>

is it wrong or what do they have to say ????



Tom Kyte
January 11, 2004 - 3:16 pm UTC

it is correct --

analyze table t compute statistics;


does all three -- if you use NO for clauses.

A reader, March 18, 2004 - 4:47 pm UTC

Tom,

Is there any other way to figure out if my tables my been analyzed other than looking at *_tables.

Thanks.

Tom Kyte
March 18, 2004 - 5:07 pm UTC

guess you could use dbms_stats and catch the exception, but sql seems easier

ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable numrows number
ops$tkyte@ORA9IR2> variable numblks number
ops$tkyte@ORA9IR2> variable avgrlen number
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autoprint on
ops$tkyte@ORA9IR2> exec dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen );
BEGIN dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen ); END;
 
*
ERROR at line 1:
ORA-20000: Unable to get values for table T
ORA-06512: at "SYS.DBMS_STATS", line 2582
ORA-06512: at line 1
 
 
 
   AVGRLEN
----------
 
 
 
   NUMBLKS
----------
 
 
 
   NUMROWS
----------
 
 
ops$tkyte@ORA9IR2> analyze table t compute statistics for table;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen );
 
PL/SQL procedure successfully completed.
 
 
   AVGRLEN
----------
        96
 
 
   NUMBLKS
----------
       435
 
 
   NUMROWS
----------
     30628
 
ops$tkyte@ORA9IR2>
 

A reader, March 18, 2004 - 5:09 pm UTC

Duh! Totally forgot about that.


Thanks alot.

Analyze tables created from TYPE

surjam, April 27, 2004 - 4:27 pm UTC

Hi Tom,

We have tables created from object TYPE. When we analyze the tables, we dont see the table on user_tables for the table analyzed. Is there any problem or any other way for analyzing tables created from TYPE.

create type my_typ as object
(a number);
/

create table my_tbl of my_typ;
/
insert into my_tbl values(100);
/
select table_name,num_rows from user_tables where table_name='MY_TBL';

no records found.

Thanks




Tom Kyte
April 28, 2004 - 2:04 pm UTC

ops$tkyte@ORA9IR2> create or replace type myType as object ( a number );
  2  /
 
Type created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t of myType
  2  /
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 100 );
 
1 row created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics
  2  /
 
Table analyzed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select num_rows from user<b>_OBJECT_</b>tables where table_name = 'T';
 
  NUM_ROWS
----------
         1
 

dbms_ddl vs dbms_stats

surjam, April 28, 2004 - 2:36 pm UTC

Tom,

I was going through all your explanations regarding dbms_stats and you recommend this package. Am I right in saying that dbms_ddl.analyze_object is the same as the analyze table command. And, if I use your method of analyzing for table,all columns, all indexed columns will it give the same benefits of dbms_stats?

Also not very sure of how to set up DBMS_STATS.

You are the best Tom. Thanks again.

Tom Kyte
April 28, 2004 - 7:34 pm UTC

dbms_stats is FULLY documented

have you read about it?

surjam, April 29, 2004 - 10:49 am UTC

Tom,

I have read your chapter in Effective Oracle by design book and I now understand the way it is setup. Still if you can let me know the diff between dbms_ddl.analyze_object vs dbms_stats, that would help my understanding. My thought is that dbms_ddl.analyze_object is the same as
analyze table tname compute statistics for table for indexes
for all indexed columns
Is this right? Thanks

Tom Kyte
April 29, 2004 - 11:27 am UTC

dbms_ddl.analyze_object should be ignored, forgotten. it just does

"analyze table t compute statistics"

(if method is compute) -- all columns, all indexes.

dbms_stats should be the only game in town.





For OLTP

Sanjaya Balasuriya, May 12, 2004 - 7:14 am UTC

Hi Tom,

Is it a bad idea for creating histograms for an OLTP ?
This system is active only between 08.00 and 19.00. Even if I do an analyze for table columns after 19.00, isn't OK ?

Or should I just run dbms_stats.gather_schema_ststs("schema_name") ?

Thanks in advance.


Tom Kyte
May 12, 2004 - 7:55 am UTC

it is not "bad" as in evil.

it is generally "not so useful" as you sort of expect short, fast, quick, index reads in true oltp.

last time I saw true oltp however was sometime in 1994 i think. reports and other things creep into it.

it might be that you are spinning extra resources to compute them (gather schema stats defaults to getting histograms). but if it is something you are doing -- keep doing it (don't make big changes without testing), unless and until you test it without and see how it performs

dbms_stats taking more time

nn, May 12, 2004 - 9:05 am UTC

I am working on implementation of DBMS_STATS , Everybody knows
the comparison of dbms_stats and Analyze table but the main issue
is time elapsed. My client DBAs are not convinced to implement
it as it takes more time than ANALYZE.Following are the stats which
i am planning to put in front of them.

1. SELECT COUNT(*) FROM aa
26710372

a.Analyze table aa estimate statistics sample 20
percent;

SAMPLE_SIZE NUM_ROWS Time Elasped
4783565 26710372 04:42:26.42

Execute DBMS_STATS.GATHER_TABLE_STATS
(OWNNAME =>'GCA_USER',TABNAME =>
'GCA_BOOKING_DETAIL',ESTIMATE_PERCENT =>20,degree => 3);


SAMPLE_SIZE NUM_ROWS Time Elasped
5342192 26710960 3:06:13 30%
improvement


B.SELECT COUNT(*) FROM ab
57207324


Analyze table aB estimate statistics
sample 20 percent;

SAMPLE_SIZE NUM_ROWS Time Elasped
9051093 57207324 002:53:01

Execute DBMS_STATS.GATHER_TABLE_STATS
(OWNNAME =>'GCA_USER',TABNAME =>
'aB',ESTIMATE_PERCENT =>20,degree => 3);

SAMPLE_SIZE NUM_ROWS Time Elasped
5722651 57226510 009:45:30


Its really an issue to convience time elapsed it too high

c.SELECT COUNT(*) FROM GL;

24046022
Analyze table GL estimate statistics sample 20
percent;

SAMPLE_SIZE NUM_ROWS Time Elasped
4183321 24046022 1:57:15

Execute DBMS_STATS.GATHER_TABLE_STATS
(OWNNAME =>'GCA_USER',TABNAME =>
'GL',ESTIMATE_PERCENT =>20,degree => 3);

SAMPLE_SIZE NUM_ROWS Time Elasped
25881265 258812650 5:23:18


So Tom Please suggest on this?


Tom Kyte
May 12, 2004 - 6:45 pm UTC

run the dbms_stats with a 10046 level 12 trace and see what it is doing that takes the most time...

Reader

A reader, May 13, 2004 - 7:28 am UTC

Is there a way to verify that the statistics are deleted
when "analyze table delete statistics" is issued

Will the statistics gathered previously still remain.
I was checking at the "density" before and after
"analyze delete". It seems it still remained. How does
the optimizer know that the statistics are deleted

Tom Kyte
May 13, 2004 - 10:31 am UTC

select num_rows from user_tables where table_name = 't';
select count(*) from user_tab_histograms where table_name = 't';
select num_rows, index_name from user_indexes where table_name = 't';


that'll show you if table, column or index stats exist (modify as needed to check partitions or subpartitions)

num_rows is null -> no stats
count(*) = 0 -> no stats

Analyze

arc, August 17, 2004 - 10:15 am UTC

Hi Tom,
Is there any way to identify a table to be analyzed?
I am using Oracle 8i.

Thanks
ARC

Tom Kyte
August 17, 2004 - 10:19 am UTC

um? huh?

if you are using the cbo, all of your tables should be analyzed.


if you mean "which tables need to be reanalyzed or are candidates for being re-analyzed", then you

alter table t monitoring;

and use the gather stale/list stale option of dbms_stats.

Ravi Kumar, August 29, 2004 - 6:41 am UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:718620715884#12100114489478 <code>

Tom in the example where you have compared Hash Join with Nested loop for table T2 with Unindexed column, I have a question.

1)Why does the Cardinality of the Full table Scan should become
TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)

from

TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)

I mean FTS means it has to visit ALL the blocks and I'd expect the cardinality to remain 31708 after creating Histograms.

2)Why am I wrong, how do the Histograms WORK to reduce cardinality.

3)Is the cardinality really reduced in a FTS w.r.t Logical reads due to the presences of a Histogram?

4) If logical Reads do get reduced arent Histograms working like Indexes to point the data rows?


Tom Kyte
August 29, 2004 - 11:55 am UTC

look at the predicate -- the estimated row counts are the rows resulting from the operation, to be fed into the next step.  they are not the estimated rows that would be "inspected", they are the estimated rows that would be produced:


ops$tkyte@ORA9IR2> drop table t;
 
Table dropped.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
  2  as
  3  select 1 x, object_id y
  4    from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
  2          dbms_stats.gather_table_stats
  3          ( user, 'T', method_opt=>'for all columns size 254' );
  4  end;
  5  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where x = 1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=29509 Bytes=206563)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=29509 Bytes=206563)
 
 
 
ops$tkyte@ORA9IR2> select * from t where y = 1;
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=7)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=1 Bytes=7)
 
 
 


The cost of the full scan - the same.

The estimated number of rows produced -- different.


So, the cost for finding 1 or 30,000 rows -- 10.  

If this were a more complex query, say joining T to something else, the optimizer would take the 30k rows from one query and say "you know, using an index 30k times is stupid for joining, so lets hash".  It would likewise take the 1 row and say "wouldn't an index be a wonderful thing to join with since we have to do it only once"


 

Why does FTS sees lesser cardinality?

Ravi Kumar, August 29, 2004 - 6:49 am UTC

Sorry tom ignore my previous review as I have made a crucial mistake!

Here's the corrected one:

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:718620715884#12100114489478 <code>

Tom in the example where you have compared Hash Join with Nested loop for table
T2 with Unindexed column, I have a question.

1)Why does the Cardinality of the Full table Scan should become

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=1 Bytes=194)
1 0 NESTED LOOPS (Cost=49 Card=1 Bytes=194)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
4 3 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1)

To me it appears the Nested Loop gets just ONE row from the FTS of T2 to loop through T1

FROM

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
1 0 HASH JOIN (Cost=100 Card=317 Bytes=61498)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)



I mean FTS means it has to visit ALL the blocks and I'd expect the cardinality
to remain 317 after creating Histograms.

2)Why am I wrong, how do the Histograms WORK to reduce cardinality.

3)Is the cardinality really reduced in a FTS w.r.t Logical reads due to the
presences of a Histogram?

4) If logical Reads do get reduced aren’t Histograms working like Indexes to
point the data rows?

Note: I have run the above queries locally to verify the results and they were as you had given, with a different mileage as expected!


Tom Kyte
August 29, 2004 - 11:59 am UTC

see above.

Analyze Table / Index / Indexed Columns

Vivek Sharma, November 29, 2004 - 3:47 am UTC

Dear Tom,

Please explain me this behaviour. 

Create table t as select * from all_objects;

select count(*) from t;

  COUNT(*)
----------
     29098

SQL> select count(*) from t where data_object_Id is null;

  COUNT(*)
----------
        50

create bitmap index t_inx on t(data_object_id);

The Scenario for query 

select * from t where data_object_id is null

is :

When I do 

1. Analyze table T compute statistics, the index is used.
2. Delete the Statistics and Analyze table t compute statistics for table, the index is not used.
3. With Step 2 and Analyze index t_inx compute statistics, again the index is not used,
4. With Step 2 and Analyze index t_inx compute statistics for all indexed columns, again the index is not used.
5. Analyze table t compute statistics for table for all indexes for all indexed columns, the index is used.


Scenario I
-----------
SQL> analyze table t compute statistics;

Table analyzed.

Elapsed: 00:00:02.00
SQL> select * from t where data_object_id is null;

50 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=50 Bytes=440
          0)

   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=12 Card=50 Byte
          s=4400)

   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'T_INX'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       5314  bytes sent via SQL*Net to client
        532  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

Scenario II
-----------

SQL> analyze table t delete statistics;

Table analyzed.

Elapsed: 00:00:00.00
SQL> analyze table t compute statistics for table;

Table analyzed.

Elapsed: 00:00:00.02
SQL> select * from t where data_object_id is null;

50 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
          44045)

   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
          )





Statistics
----------------------------------------------------------
        262  recursive calls
          0  db block gets
        431  consistent gets
          0  physical reads
          0  redo size
       5314  bytes sent via SQL*Net to client
        532  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

Scenario III
-------------

SQL> analyze index t_inx compute statistics;

Index analyzed.

Elapsed: 00:00:00.02
SQL> select * from t where data_object_id is null;

50 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
          44045)

   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
          )





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        405  consistent gets
          0  physical reads
          0  redo size
       5314  bytes sent via SQL*Net to client
        532  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

Scenario IV
-----------

SQL> analyze index t_inx compute statistics for all indexed columns;

Index analyzed.

Elapsed: 00:00:00.00
SQL> select * from t where data_object_id is null;

50 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
          44045)

   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
          )





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        405  consistent gets
          0  physical reads
          0  redo size
       5314  bytes sent via SQL*Net to client
        532  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

Scenario V
----------

SQL> analyze table t compute statistics for table for all indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:00.04
SQL> select * from t where data_object_id is null;

50 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=50 Bytes=495
          0)

   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=12 Card=50 Byte
          s=4950)

   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'T_INX'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       5314  bytes sent via SQL*Net to client
        532  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

Why is my Index not getting used when I compute statistics for Indexes separately ?

Thanks
Vivek

 

Tom Kyte
November 29, 2004 - 8:20 am UTC

analyze table t compute statistics for table;
^^^^^^^^^

for table by itself means "for the table, NOT for the columns NOT for the indexes"

hence, Oracle doesn't have a clue that there are only 50 rows null anymore. you took that information away from it.

In Continuation to my previous Query

Vivek Sharma, November 29, 2004 - 8:53 am UTC

Hi Tom,

Thanks for your prompt reply.

I know that since I analyzed only the table, the index is not geeting used. But when I analyzed the Index using Analyze Index, the INdex was not used. Why this behaviour ?

When I analyzed the table and Index using a single commad, the query used the index.

Regards
Vivek

Tom Kyte
November 29, 2004 - 2:54 pm UTC

because you had no COLUMN statistics.


the index doesn't tell us COLUMN statistics.  for all columns does stuff only when you scan the table.


ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2> create index t_idx on t(data_object_id);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> create or replace view v
  2  as
  3  select 'T' what, num_rows rows_cname_cnt, null cnt from user_tables where table_name = 'T'
  4  union all
  5  select 'I', num_rows, null from user_indexes where table_name = 'T'
  6  union all
  7  select 'C', count(distinct column_name), count(*) from user_tab_histograms where table_name = 'T';
 
View created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T                                       27938
I                                        1389
C                                          13         26
 
ops$tkyte@ORA9IR2> analyze table t delete statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T
I
C                                           0          0
 
ops$tkyte@ORA9IR2> analyze table t compute statistics for table;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T                                       27938
I
C                                           0          0
 
ops$tkyte@ORA9IR2> analyze index t_idx compute statistics;
 
Index analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T                                       27938
I                                        1389
C                                           0          0
 
ops$tkyte@ORA9IR2> analyze index t_idx compute statistics for all indexed columns;
 
Index analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T                                       27938
I                                        1389
C                                           0          0
 
ops$tkyte@ORA9IR2> analyze table t compute statistics for table for all indexes for all indexed columns;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select * from v;
 
WHAT                           ROWS_CNAME_CNT        CNT
------------------------------ -------------- ----------
T                                       27938
I                                        1389
C                                           1         74
 
ops$tkyte@ORA9IR2>
 

I think I understand, but why gather histogram data on non_indexed columns?

Ken, March 02, 2005 - 11:36 am UTC

Tom,

I'm a contract DBA for a big company. They had a well respected consultant in the past who advised that they use "FOR ALL COLUMNS SIZE 1" when gathering stats (I think it's the default in 9.2). I understand that humans might be interested in the histogram data for an unindexed column once in a while(it provides insight on whether to index it or not, if the number of buckets is greater than 1) but why should we use this option for everything, all the time? Can the optimizer come up with a better plan by looking as histogram data for an unindexed column? It will still have to do a FTS is there is not index on the predicated column, right?

I plan to test by gathering stats with "FOR INDEXED COLUMNS" but would greatly appreciate your input first.

Tom Kyte
March 02, 2005 - 11:41 am UTC

select *
from big_table_1 A, big_table_2 B
where A.non_indexed_column = 42
and A.column = B.indexed_column;


Would you like to know that of the 1,000,000,000 rows in big_table_1, there are say 100 values of non_indexed_column, and only 2 rows with the value of 42

so, we get a full scan of big_table_1 and a nested loops join to big_table_2

instead of a full scan of big_table_1, hash it, and a full scan of big_table_2 and hash join?


that is one reason to gather histograms on non-indexed columns.


where did you see to use something "for everything, all the time" -- I need to erase that if true.

Great Information!

Ken, March 02, 2005 - 12:30 pm UTC

Thanks Tom, you are the best!

By using "SIZE 1" the optimizer would put everything into 1 bucket and would therefore only know the low value and high value, correct? So you would have to use a more appropriate value for size to get the benefit you describe, right?

I don't know if the consultant told my client to "always" gather stats this way, but told them to do it that way and it is the only way they do it.

Thanks again. I should be able to come up with some good tests, keeping your comments in mind.

Tom Kyte
March 02, 2005 - 12:39 pm UTC

to let the optimizer understand the skew of the data, yes you would have to have more than one bucket.

Histograms

Kumar, April 30, 2005 - 4:24 pm UTC

Hello Tom,

We are using 9i R2. We were analyzing the tables every weekend using the following command,

Analyze Table <T> compute statistics;

Our DBA had changed this analyze job to use DBMS_STATS two weeks back, this resulted in poor performance of all of our sqls.

The DBA had changed the above Analyze command to the following DBMS_STATS syntax,

exec dbms_stats.gather_table_stats(
<user>, 'T', estimate_percent => null, degree => null, cascade => true);

I think the difference is the histograms( missing method_opt parameter), am I correct?

Can you please advise an equivalent value for method_opt parameter for the analyze command 'Analyze Table <T> compute statistics;'

Thanks.



Tom Kyte
April 30, 2005 - 4:40 pm UTC

method opt defaults to 'for all columns size 1', it is the same.  look somewhere else for the root cause of your problem for now I think:



ops$tkyte@ORA9IR2> create table t
  2  as
  3  select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2> alter table t add constraint t_pk primary key(object_id);
 
Table altered.
 
ops$tkyte@ORA9IR2> create index t_idx on t(object_name);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   27812
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
CREATED                       2
DATA_OBJECT_ID                2
GENERATED                     2
LAST_DDL_TIME                 2
OBJECT_ID                     2
OBJECT_NAME                   2
OBJECT_TYPE                   2
OWNER                         2
SECONDARY                     2
STATUS                        2
SUBOBJECT_NAME                2
TEMPORARY                     2
TIMESTAMP                     2
 
13 rows selected.
 
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name = 'T';
 
INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX                               27812
T_PK                                27812
 
ops$tkyte@ORA9IR2> analyze table t delete statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => null, degree => null, cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                   27812
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
CREATED                       2
DATA_OBJECT_ID                2
GENERATED                     2
LAST_DDL_TIME                 2
OBJECT_ID                     2
OBJECT_NAME                   2
OBJECT_TYPE                   2
OWNER                         2
SECONDARY                     2
STATUS                        2
SUBOBJECT_NAME                2
TEMPORARY                     2
TIMESTAMP                     2
 
13 rows selected.
 
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name = 'T';
 
INDEX_NAME                       NUM_ROWS
------------------------------ ----------
T_IDX                               27812
T_PK                                27812
 
 

Histograms

Kumar, May 02, 2005 - 2:53 pm UTC

Thanks Tom. Your explanation was very useful. I see one another difference in SAMPLE_SIZE in USER_TAB_COLUMNS.

Analyze sets this to total number of rows in the table but DBMS_STATS sets the actual not null values in column.

This should not be a problem right?

And also our OPTIMIZER_MODE is set to CHOOSE in the init.ora file.

Since the tables are analyzed this will use RBO,but can you please let  me know which one this will use FIRST_ROWS, FIRST_ROWS_n or ALL_ROWS as a default?

Would this make any difference? Our application is not a true OLTP, it has lots of reports and batch jobs also.

Do you advise me to change this from CHOOSE?

Thanks for your help.


Here are the output from that shows the SAMPLE_SIZE differently.

SQL*Plus: Release 9.2.0.1.0 - Production on Mon May 2 10:20:26 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

SQL> set pagesize 100
SQL> column COLUMN_NAME format a20


SQL>  analyze table account compute statistics;

Table analyzed.


SQL> select table_name, num_rows,GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, LAST_ANALYZED
  2  from user_tables where table_name='ACCOUNT';

TABLE_NAME                       NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT                           4565299 NO  NO      4565299 02-MAY-05


SQL>select column_name, count(*) from user_tab_histograms where 
  2  table_name = 'ACCOUNT' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
ACCT_NBR                      2
ACCT_NM                       2
ACCT_OPEN_DT                  2
ACCT_STS_EFF_DT               2
ASC_CD                        2
ASC_OID                       2
CHGD_BY                       2
CHGD_DT                       2
EFFECTIVE_DT                  2
LOB_ID                        2
LOCAL_ORG_CODE                2
OID                           2
PRD_OID                       2
VERFD_BY                      2
VERFD_DT                      2
VERIFY_STS                    2

16 rows selected.


SQL> select index_name, num_rows, GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, 
  2  LAST_ANALYZED from user_indexes where table_name = 'ACCOUNT';

INDEX_NAME                       NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT_IDX1                      4565299 NO  NO      4565299 02-MAY-05
ACCOUNT_IDX2                      2151656 NO  NO      2151656 02-MAY-05
ACCOUNT_IDX3                      4565299 NO  NO      4565299 02-MAY-05
ACCOUNT_IDX4                      2197868 NO  NO      2197868 02-MAY-05
ACCOUNT_IDX5                      1952886 NO  NO      1952886 02-MAY-05
ACCOUNT_PK                        4565299 NO  NO      4565299 02-MAY-05

6 rows selected.

SQL> select column_name,NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
  2  USER_STATS from user_tab_columns where table_name='ACCOUNT';

COLUMN_NAME          NUM_DISTINCT  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE USE
-------------------- ------------ ---------- ----------- --------- ----------- ---
OID                       4565299          0           1 02-MAY-05     4565299 NO
PRD_OID                    605402    2413643           1 02-MAY-05     4565299 NO
ASC_OID                       489          0           1 02-MAY-05     4565299 NO
ACCT_NBR                  4502947          0           1 02-MAY-05     4565299 NO
ACCT_NM                    477065    2367431           1 02-MAY-05     4565299 NO
ACCT_OPEN_DT                27944    3510061           1 02-MAY-05     4565299 NO
ASC_CD                         39     129563           1 02-MAY-05     4565299 NO
ACCT_STS_EFF_DT            211095    3539583           1 02-MAY-05     4565299 NO
LOB_ID                         17    2610012           1 02-MAY-05     4565299 NO
LOCAL_ORG_CODE               5749    2612413           1 02-MAY-05     4565299 NO
EFFECTIVE_DT                80661    2114806           1 02-MAY-05     4565299 NO
EXPIRY_DT                       0    4565299           1 02-MAY-05             NO
VERIFY_STS                      1    2174754           1 02-MAY-05     4565299 NO
VERFD_BY                    59812    2183161           1 02-MAY-05     4565299 NO
VERFD_DT                     3639    2183455           1 02-MAY-05     4565299 NO
CHGD_BY                      6972          0           1 02-MAY-05     4565299 NO
CHGD_DT                    353852          0           1 02-MAY-05     4565299 NO

17 rows selected.


SQL>  exec dbms_stats.gather_table_stats( user, 'ACCOUNT',estimate_percent => null, degree => null, 
cascade => true );

PL/SQL procedure successfully completed.


 

SQL> select table_name, num_rows,GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, LAST_ANALYZED
  2  from user_tables where table_name='ACCOUNT';

TABLE_NAME                       NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT                           4565299 YES NO      4565299 02-MAY-05


SQL> select column_name, count(*) from user_tab_histograms where 
  2  table_name = 'ACCOUNT' group by column_name;

COLUMN_NAME            COUNT(*)
-------------------- ----------
ACCT_NBR                      2
ACCT_NM                       2
ACCT_OPEN_DT                  2
ACCT_STS_EFF_DT               2
ASC_CD                        2
ASC_OID                       2
CHGD_BY                       2
CHGD_DT                       2
EFFECTIVE_DT                  2
EXPIRY_DT                     2
LOB_ID                        2
LOCAL_ORG_CODE                2
OID                           2
PRD_OID                       2
VERFD_BY                      2
VERFD_DT                      2
VERIFY_STS                    2

17 rows selected.


SQL> select index_name, num_rows, GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, 
  2  LAST_ANALYZED from user_indexes where table_name = 'ACCOUNT';

INDEX_NAME                       NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT_IDX1                      4565299 YES NO      4565299 02-MAY-05
ACCOUNT_IDX2                      2151656 YES NO      2151656 02-MAY-05
ACCOUNT_IDX3                      4565299 YES NO      4565299 02-MAY-05
ACCOUNT_IDX4                      2197868 YES NO      2197868 02-MAY-05
ACCOUNT_IDX5                      1952886 YES NO      1952886 02-MAY-05
ACCOUNT_PK                        4565299 YES NO      4565299 02-MAY-05

6 rows selected.


SQL> select column_name,NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
  2  USER_STATS from user_tab_columns where table_name='ACCOUNT'
  3  /

COLUMN_NAME          NUM_DISTINCT  NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE USE
-------------------- ------------ ---------- ----------- --------- ----------- ---
OID                       4565299          0           1 02-MAY-05     4565299 NO
PRD_OID                    605402    2413643           1 02-MAY-05     2151656 NO
ASC_OID                       489          0           1 02-MAY-05     4565299 NO
ACCT_NBR                  4502947          0           1 02-MAY-05     4565299 NO
ACCT_NM                    484325    2367431           1 02-MAY-05     2197868 NO
ACCT_OPEN_DT                27944    3510061           1 02-MAY-05     1055238 NO
ASC_CD                         39     129563           1 02-MAY-05     4435736 NO
ACCT_STS_EFF_DT            211095    3539583           1 02-MAY-05     1025716 NO
LOB_ID                         17    2610012           1 02-MAY-05     1955287 NO
LOCAL_ORG_CODE               5749    2612413           1 02-MAY-05     1952886 NO
EFFECTIVE_DT                80661    2114806           1 02-MAY-05     2450493 NO
EXPIRY_DT                       0    4565299           1 02-MAY-05             NO
VERIFY_STS                      1    2174754           1 02-MAY-05     2390545 NO
VERFD_BY                    59812    2183161           1 02-MAY-05     2382138 NO
VERFD_DT                     3639    2183455           1 02-MAY-05     2381844 NO
CHGD_BY                      6972          0           1 02-MAY-05     4565299 NO
CHGD_DT                    353852          0           1 02-MAY-05     4565299 NO

17 rows selected.










 

Tom Kyte
May 03, 2005 - 7:19 am UTC

the optimizer is written expecting dbms_stats was used to collect stats. dbms_stats should be the baseline.

I think you meant

Since the tables are analyzed this will use CBO,
not
Since the tables are analyzed this will use RBO,

and with choose, you are basically "all_rows", i would leave at choose.

Kumar, May 03, 2005 - 12:59 pm UTC

Thanks Tom. You are correct, I meant CBO in the above question.sorry for the confusion.

>>>with choose, you are basically "all_rows", i would leave at choose.

Our application is having online, batch and reports, will leaving the parameter to "choose" in turn to "all_rows" create issues to online screens?

Should I make this as "first_rows_n" and ask the developers to change it to "all_rows" in batch/reports connections?

And also I had come across one of the oracle white papers that says "first_rows" is obsolete and developers should use first_rows_n.

Can you please give your advise on this.


Tom Kyte
May 03, 2005 - 2:43 pm UTC

let things that need it use first_rows at the session level.

the less you change globally, the better.

Estimate time for analyze

JJ, March 02, 2006 - 6:41 am UTC

Hi Tom,

Is there any way to estimate the total time required for analyzing a table without actually analyzing it?




Tom Kyte
March 02, 2006 - 12:27 pm UTC

you mean for dbms_stats to run of course ;)

Not really, you could do a really small estimate and a slightly larger one and attempt to extrapolate - given that it won't necessarily straight line.

Index creation and analyze table

Neeraj R Rath, March 10, 2006 - 12:15 am UTC

Hi TOM,
This Neeraj R Rath from India.
I am working as a performance engineer in a large consultacy.
I was checking one of the application where i found that non of production tables are analyzed.
Then i contacted the concerned application team ,they told me that there is a clear instruction of the vender not to analyze the tables as it may degrade the performance of the application .They are many indexes existing in production tables.
Please suggest if am creating indexes in a particular table to get better performance of a query and am not analyzing the table then what are the chances that query response will not be optimal.

Please suggest me the importance of analyzing the production tables.

OS:Sun Solaris,
DB:Ortacle 8.1.7.4.0



Tom Kyte
March 10, 2006 - 12:11 pm UTC

please contact application vendor - they are the ones that tell you what you are allowed to do with their application.

Excellent

Saraswathi, July 25, 2006 - 3:20 am UTC

Tom,your explanations are excellent.

Analyze Frequency?

Steve, August 17, 2007 - 10:45 am UTC

Tom,
We have an equity trading system that populates a "clean" database on a daily basis. Basically, the data in our core transactional tables is cleared out at the end of each day, and we start with empty tables at the beginning of the next trading day.

We currently analyze (using DBMS_STATS) at 3am,7am,10:30am,12:30pm,2pm,4pm. (EST)
These times represent a 1/2 hour to an hour after our typical spikes in volume.
There is generally a 20-30% increase in rowcount per hour, with certain hourly spikes around 200% or so.

At the end of the day, the number of rows in the all of the tables ranges from a few thousand to several hundred thousand rows.

Do you have any recommendations for the frequency of gathering statistics in this scenario.

Thanks.
Tom Kyte
August 22, 2007 - 9:20 am UTC

maybe you do it once at 2pm (average sized tables) and never again (retain plans as they are)

actually - what you should do is ask yourself (you can query v$ tables to answer this) - DO MY PLANS ACTUALLY CHANGE MATERIALLY as the day progresses.

If the plans do not really change much - perhaps you just want one plan. For what you are doing with the gather stats ever period of time is the logical equivalent of "flush my shared pool"

so, before you do anything, analyze what plans change - if any - and whether those changes are relevant (eg: if the last plan is good enough for all day, then just gather stats one more time - near the end of the day when the tables are almost full and run with those)

RE: "Analyze Frequency?" above

Steve, August 20, 2007 - 1:43 pm UTC

Oracle Version 10.1.0.5

Whose job is this?

karthick, September 06, 2007 - 8:50 am UTC

Who is responsible for analyzing the tables DBA or a Developer?

For example I have a process which uses a table. At the start the table is empty. The process populates it with lots of data (really lots of data!!). And then it calls some more process which work with that table. And once every thing is done the table is truncated.

So now is it a good thing to analyze that table on the fly when the data is populated to that table so that the other process using the table can use it in a optimal way.

As a developer can I put code in my process to analyze the table on the fly?

Regards,

Karthick.



Tom Kyte
September 11, 2007 - 8:08 am UTC

typically a DBA would be responsible for actually scheduling and monitoring the dbms_stats

however, a developer may have very relevant information needed by the DBA to properly do this.

so, both.

as a developer, you may call dbms_stats in your code, you might not even have to gather stats on the table, you might just SET THEM

actually, you might gather stats ONCE to get pretty good representative stats and just leave them there - never having the database think 'table is entirely empty' or 'table is entirely full'

ok or very bad

karthick, September 11, 2007 - 9:56 am UTC

as i told the table is populated only in the process and hence the following is coded in the process

EXECUTE IMMEDIATE 'ANALYZE TABLE MY_STG_TBL ESTIMATE STATISTICS';

is this ok or very bad when considering the overall perfomance.
Tom Kyte
September 15, 2007 - 3:26 pm UTC

as I said - just put representative stats in place and keep them.


and as I said, use dbms_stats, do NOT use your execute immediate.


More to Explore

Performance

Get all the information about database performance in the Database Performance guide.