
Question and Answer

Tom Kyte

Thanks for the question, Brenda.

Asked: October 07, 2002 - 2:29 pm UTC

Last updated: May 21, 2012 - 8:03 am UTC

Version: 8.1.7.3.0

Viewed 50K+ times

You Asked

Hi Tom:
we are having some performance issues in our 500GB database environment. We analyze tables every 4 hours throughout the day, since data loads arrive at different times and we are not sure when loading will finish. I have seen different kinds of 'analyze':

a. analyze table xxxx estimate statistics sample 10 percent;

b. analyze table xxxx estimate statistics sample 10 percent for table for all indexes for all indexed columns;

c. analyze table xxxx validate structure;

d. analyze table t compute statistics
for table
for all indexes
for all indexed columns;

e. analyze table xxxx compute statistics for table


could you tell me the differences between these 5 analyzes? And which is best for a DW when we have daily loading going on during the day while the analyze job is running as well?


and Tom said...

I don't like any of them.

DBMS_STATS is what you want to use. If you

ALTER TABLE <tname> MONITORING

then you can gather stats JUST on tables that need it (we'll keep track for you -- and only analyze tables in which about 10% of the data has changed). That way the analyzes will go much faster.
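For example, a minimal sketch of that approach (the table name and options are illustrative):

alter table t monitoring;

begin
    dbms_stats.gather_schema_stats
    ( ownname => user,
      options => 'GATHER STALE',
      cascade => TRUE );
end;
/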

Me, I'm sort of a fan of a method that would analyze the table, the indexes and in most cases -- the indexed columns (although for unique columns or data with fairly uniform distributions, you could be more specific and just analyze the indexes that are on skewed data).

Here is a synopsis of the differences for the above (but, don't use them! use dbms_stats to do what they do. Which is best???? well, compute is preferred but for really big things, not feasible. I said what I like -- table, indexes and the proper set of indexed columns)


ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t as select * from all_objects;

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create index t_idx on t(object_id);

Index created.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t estimate statistics sample 10 percent;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T' ) c
5 /

NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
23634 23634 12

that shows we got the table analyzed, the index(es) analyzed and histograms for all of the columns in the table that had at least one not null value (expensive!)


ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t estimate statistics sample 10 percent for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T' ) c
5 /

NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
23634 23634 1

Here we got the table, the indexes and a histogram on ONE column -- the one we have an index on (not particularly relevant in this case as object id is unique and hence distributed nicely, not skewed)



ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t validate structure;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T' ) c
5 /

NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
0

No stats, just a diagnostic tool -- not for statistics gathering!

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics for table for all indexes for all indexed columns;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T' ) c
5 /

NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
23634 23634 1

ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;

Table analyzed.

same as the estimate but does the full table -- if I had a bigger set, the num_rows here might be more accurate using compute than estimate


ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'T') t,
3 (select num_rows from user_indexes where table_name = 'T' ) i,
4 (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'T' ) c
5 /

NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
23634 23634 12

same as the corresponding estimate but does the entire table




Comments

What do you think of this analyze script?

Alan Rosenthal, October 08, 2002 - 12:59 am UTC

Hi Tom,

I use the following sqlplus script to analyze my db's. I have a good window of time when I can run this so I am not concerned with the time to run the analyze. Is this script good? Is there anything more that I should do?

=====================================
prompt "Gathering all stats"
execute dbms_stats.gather_database_stats( cascade => TRUE ) ;

prompt "Deleting sys stats"
execute dbms_stats.delete_schema_stats( 'SYS' ) ;

prompt "Deleting system stats"
execute dbms_stats.delete_schema_stats( 'SYSTEM' ) ;

exit ;
===============================

I'm definitely going to look into monitoring.

Thanks,

Alan


Tom Kyte
October 08, 2002 - 7:32 am UTC

Instead of gathering on SYS and SYSTEM and then removing - you might:

begin
    for x in ( select username
                 from dba_users
                where username not in ( 'SYS', 'SYSTEM' ) )
    loop
        dbms_stats.gather_schema_stats( ownname => x.username /* ... options for that username ... */ );
    end loop;
end;
/

Review

Irfan, October 08, 2002 - 11:58 am UTC

Q1. Your advice is to use dbms_stats. There is another package, dbms_utility. Do you prefer dbms_stats over dbms_utility? If yes, then why?

Q2. In your book Expert One-on-One, in Appendix A regarding dbms_utility, you have written that "WE START BY FIRST DELETING AND THEN COLLECTING STATISTICS". Why is there a need to delete the statistics? The Oracle manual says that ENTRIES ARE UPDATED FOR AN OBJECT EACH TIME WE ANALYZE THE OBJECT.

Thanks and regards.

Tom Kyte
October 08, 2002 - 12:48 pm UTC

q1) dbms_utility has a "pre dbms_stats" procedure that offers no good functionality (nothing like dbms_stats)

ignore that dbms_utility procedure
use dbms_stats

q2) I did that (the delete) just to show "look, there is nothing up my sleeve". I deleted -- showed there were no stats and then analyzed showing there are now stats.

That's all, just an example.

Histograms with bind variables

Todd, October 08, 2002 - 1:06 pm UTC

Tom, if we have built an app to use bind variables instead of literal values, is there any benefit to building histograms on even just indexed columns?

Tom Kyte
October 08, 2002 - 2:14 pm UTC

could be -- you can (and should sometimes) have literals in there, eg:

select count(*)
from t
where dname = 'SALES'
and sal > :x;


There is nothing wrong with that -- if the query is executed over and over and over. A histogram on dname might well be appropriate. A well tuned system uses bind variables where appropriate and literals when it wants to. See -- if that query was ALWAYS executed with dname = 'SALES', there would be no point in binding 'SALES' in -- we only bind in values that change from execution to execution.
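For what it is worth, a sketch of gathering a histogram on just such a column (the table/column names and bucket count are illustrative):

begin
    dbms_stats.gather_table_stats
    ( ownname    => user,
      tabname    => 'T',
      method_opt => 'FOR COLUMNS DNAME SIZE 254' );
end;
/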

DBMS_STATS

brenda zhang, October 08, 2002 - 1:49 pm UTC

Hi Tom,
thank you very much! very thorough.... but I still have one question here: for instance, we have an 8 million row table cust_dim; we created the following procedure to analyze tables, but this table cust_dim will not get analyzed until more than 10% of its data changes, as you said:
"you can gather stats JUST on tables that need it (we'll keep track for you and only analyze tables in which about 10% of the data has changed)."

so the statistics will not be accurate all the time if the data change is less than 10%. Do we still have to use a scheduler to analyze on a regular basis besides DBMS_STATS.GATHER_DATABASE_STATS?

create or replace procedure anal_schemas is
begin
begin
DBMS_STATS.GATHER_DATABASE_STATS (
estimate_percent => 10,
block_sample => FALSE,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1',
degree => 8,
granularity =>'DEFAULT',
cascade => TRUE,
stattab => NULL,
statid => NULL,
options => 'GATHER STALE'
-- objlist =>'LIST',
-- statown => NULL
);
end;

begin
DBMS_STATS.DELETE_SCHEMA_STATS ('SYS');
end;

begin
DBMS_STATS.DELETE_SCHEMA_STATS ('SYSTEM');
end;
end;
/


Tom Kyte
October 08, 2002 - 2:20 pm UTC

well, I still would gather for SCHEMAS not database...

and yes, you'll need to schedule this to run from time to time.

DBMS_STATS.GATHER_SCHEMA_STATS doesn't work

Brenda Zhang, October 08, 2002 - 5:45 pm UTC

Tom: is there a bug in DBMS_STATS.GATHER_SCHEMA_STATS??
<Bug 1890016>: GATHER STALE DOES NOT SEEM TO BE WORKING FOR DBMS_STATS.GATHER_SCHEMA_STATS

Tom Kyte
October 08, 2002 - 6:22 pm UTC

but the bug also has in it:

Rediscovery Information:
You will see this bug when using dbms_stats.gather_xxx_stats and :
1. An invalid granularity is supplied
2. The GATHER STALE or GATHER EMPTY options are used
3. The procedure call does not seem to work
.
Workaround:
Correct the granularity parameter


simple workaround (and its fixed in the current releases)

Review

Irfan, October 09, 2002 - 8:54 am UTC

You mean to say that Bug 1890016 will not affect me if I avoid GATHER STALE or GATHER EMPTY? In this case, should the following work in Oracle 9.0.1.3 to compute statistics for a schema?

execute DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT',NULL);

Thanks

Tom Kyte
October 09, 2002 - 4:35 pm UTC

yes.

Advice on "SIZE 1" histogram

Brad, October 14, 2002 - 7:49 pm UTC

The manual describing the DBMS_STATS options makes the point that you must specify "SIZE 1" in order to enable parallel processing for histograms. Doesn't a 1-bucket histogram limit its usefulness?

Any advice on this?

E.g.:

DBMS_STATS.GATHER_DATABASE_STATS (
estimate_percent => 15,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1',
degree => 8,
cascade => TRUE );


Tom Kyte
October 14, 2002 - 8:35 pm UTC

There may be certain specific columns you won't do in parallel... computing histograms is very expensive -- do it only for those columns you have identified a need for (eg: wanna make a system go faster at night, stop using "analyze table compute statistics", you are computing histograms on everything)

Review

Irfan, October 26, 2002 - 5:04 pm UTC

I am using the following to analyze a schema with the COMPUTE option
(I am specifying NULL for COMPUTE):
execute DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT',NULL);

For gathering statistics on indexes as well, do I need to specify CASCADE=>true? Or will the above collect index statistics by default?

Thanks.

Tom Kyte
October 27, 2002 - 11:07 am UTC

Here is a way you can determine what is happening when you analyze (what's getting gathered):


ops$tkyte@ORA920.US.ORACLE.COM> drop user a cascade;
User dropped.

ops$tkyte@ORA920.US.ORACLE.COM> grant dba to a identified by a;
Grant succeeded.

ops$tkyte@ORA920.US.ORACLE.COM> @connect a/a

a@ORA920.US.ORACLE.COM> create table t ( x int primary key, y int );
Table created.

a@ORA920.US.ORACLE.COM> insert into t values ( 1, 1 );
1 row created.

a@ORA920.US.ORACLE.COM> exec dbms_stats.gather_schema_stats( 'A', NULL );
PL/SQL procedure successfully completed.

a@ORA920.US.ORACLE.COM> select count(*) from user_tables where num_rows is not null;

  COUNT(*)
----------
         1

<b>indicates TABLE got analyzed</b>

a@ORA920.US.ORACLE.COM> select count(*) from user_indexes where num_rows is not null;

  COUNT(*)
----------
         0

<b>but not any indexes</b>

a@ORA920.US.ORACLE.COM> select count(distinct column_name) from user_tab_histograms;

COUNT(DISTINCTCOLUMN_NAME)
--------------------------
                         2

a@ORA920.US.ORACLE.COM>

<b>but it did get histograms for every column in the table</b>


 

which granularity should be used to analyze a schema?

Leo, October 27, 2002 - 11:41 am UTC

I have a schema that has partitioned tables and non-partitioned tables. I want to use DBMS_STATS.GATHER_SCHEMA_STATS to analyze all tables in the schema. Which granularity should be used: 'DEFAULT', 'ALL' or 'PARTITION'?

Tom Kyte
October 27, 2002 - 12:14 pm UTC

only you can answer that. which do you want to use (best to understand what they do, and apply the one that is correct and proper for your environment)

if there was a single way that was superior, there wouldn't be a choice.

You might find gather schema stats to be too coarse in a higher end system. you may find you need to do some things table by table.

leo, October 27, 2002 - 12:35 pm UTC

One advantage of DBMS_STATS is that it can analyze only stale tables, but this option is only available in the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. Using the right granularity is critical for analyzing both partitioned and non-partitioned tables in a schema.

Tom Kyte
October 27, 2002 - 1:56 pm UTC

hmmm, well, lets see --

I can use LIST STALE on a schema and then do it table by table based on the results.

So, while you must use stale at the database or schema level, you certainly are not forced to analyze stale only at the schema or database level, you can just ask for a list of stale "things" and analyze them as you see fit (based off of a table perhaps you maintain of analyze options for a given segment?)

So, I disagree that stale is limited to schema or database. It is that you would ask for a list of stale objects at the database or schema level and then proceed with analyzing if you like.

Review

Irfan, October 27, 2002 - 3:46 pm UTC

I went through your example to see what is happening when we analyze a schema. I used dbms_utility.analyze_schema with your example. The following is the outcome.

SQL> create table t ( x int primary key, y int );

Table created.

SQL> insert into t values ( 1, 1 );

1 row created.

SQL> execute dbms_utility.analyze_schema('A','COMPUTE');

PL/SQL procedure successfully completed.

SQL> select count(*) from user_tables where num_rows is not
  2  null;

  COUNT(*)
----------
         1

SQL> select count(*) from user_indexes where num_rows is not
  2  null;

  COUNT(*)
----------
         1
SQL> select count(distinct column_name) from
  2  user_tab_histograms;

COUNT(DISTINCTCOLUMN_NAME)
--------------------------
                         2


So, when I use DBMS_UTILITY.ANALYZE_SCHEMA I am getting statistics on INDEXES also. Correct?

Then in this case DBMS_UTILITY should be preferred over DBMS_STATS. Please suggest.

Tom Kyte
October 27, 2002 - 4:12 pm UTC

do not use dbms_utility -- it is there for backwards support to 7.3.

Use dbms_stats, ignore dbms_utility for analyzing tables. It is never preferred. It has so few features as to make it just pale in comparison.

and remember this -- more is NOT better. more is sometimes "more".

It is RARE that you want histograms on everything -- it is very very very time consuming and resource consuming to compute them.

REVIEW

Irfan, October 27, 2002 - 4:32 pm UTC

So, I am going to use DBMS_STATS. I want to get the statistics on indexes also. In this case, how should I write the DBMS_STATS call?

I tried the following to analyze the ENTIRE schema, indexes included:

exec DBMS_STATS.GATHER_SCHEMA_STATS('SCOTT',NULL,FALSE,'FOR ALL COLUMNS SIZE 1',NULL, 'DEFAULT',TRUE)

Is the above correct? (I could not understand block_sample and method_opt, that's why I put default values.)

Is there any other better way to do this ?



Tom Kyte
October 27, 2002 - 5:30 pm UTC

well, lets see:

@ORA920.US.ORACLE.COM> create table t ( x int primary key, y int );

Table created.

a@ORA920.US.ORACLE.COM> insert into t values ( 1, 1 );

1 row created.

a@ORA920.US.ORACLE.COM> exec DBMS_STATS.GATHER_SCHEMA_STATS('A',NULL,FALSE,'FOR ALL COLUMNS SIZE 1',NULL, 'DEFAULT',TRUE)

PL/SQL procedure successfully completed.

a@ORA920.US.ORACLE.COM>
a@ORA920.US.ORACLE.COM> select count(*) from user_tables where num_rows is not null;

COUNT(*)
----------
1

a@ORA920.US.ORACLE.COM> select count(*) from user_indexes where num_rows is not null;

COUNT(*)
----------
1

a@ORA920.US.ORACLE.COM> select count(distinct column_name) from user_tab_histograms;

COUNT(DISTINCTCOLUMN_NAME)
--------------------------
2


it does accomplish your goal of analyzing the indexes, yes. But -- are you sure you want 100% column level histograms -- that, only you can answer.

Review

Irfan, October 27, 2002 - 6:14 pm UTC

Thanks for the detailed explanation; one more question. Instead of typing all the default values, can I use the following to save typing effort? Will it take all the default values?

execute dbms_stats.gather_schema_stats ('A',NULL,cascade=>true);

Thanks again.

Tom Kyte
October 27, 2002 - 6:28 pm UTC

fortunately, we document all of the defaults and how to use these things:

http://docs.oracle.com/docs/cd/A87860_01/doc/appdev.817/a76936/dbms_st2.htm#1003995

is the documentation for gather schema stats -- just check it out and make sure the defaults are the ones you want.

by the way .... most people would SCHEDULE this using dbms_job to happen on a recurring basis -- so you would save on typing by scheduling it once or by creating a script that has the "long version" in it
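For instance, a minimal dbms_job sketch (the schema name and interval are assumptions to adjust):

declare
    l_job number;
begin
    dbms_job.submit
    ( job       => l_job,
      what      => 'dbms_stats.gather_schema_stats( ownname => ''SCOTT'', options => ''GATHER STALE'', cascade => TRUE );',
      next_date => sysdate,
      interval  => 'trunc(sysdate)+1+4/24' );  -- nightly at 4am, an assumption
    commit;
end;
/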


is the 8.1.7 doc wrong?

A reader, October 28, 2002 - 6:34 am UTC

Hi

In SQL ref guide it says this about analyze command:

===========================================================
If you want to collect statistics on both the table as a whole and on one or more columns, be sure to generate the statistics for the table first, and then for the columns. Otherwise, the table-only ANALYZE will overwrite the histograms generated by the column ANALYZE. For example, issue the following statements:

ANALYZE TABLE emp ESTIMATE STATISTICS;
ANALYZE TABLE emp ESTIMATE STATISTICS
FOR ALL COLUMNS;

===========================================================

I mean, why do we have to do it twice if the first analyze command gets stats for all columns already?


Tom Kyte
October 28, 2002 - 7:38 am UTC

not technically wrong -- just a bad example. The premise is right -- the example is not "good". you do not need to do that twice. It is pointing out that if you do some analyzes in certain orders -- the last one "wins". They should have done it differently, yes.
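A less redundant version of what the doc is getting at might be (just a sketch):

ANALYZE TABLE emp ESTIMATE STATISTICS FOR TABLE;
ANALYZE TABLE emp ESTIMATE STATISTICS FOR ALL COLUMNS;

the first pass gathers only the table statistics; the column pass that follows adds the histograms and nothing overwrites them.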

Clarification

Steve, November 05, 2002 - 6:13 am UTC

Hi Tom

We've identified about 3 tables that have indexed columns that require histograms - so we issue the following:

exec dbms_stats.gather_table_stats(ownname=>'<schema>',tabname=>'<tabname>',method_opt=>'FOR COLUMNS <columnname> SIZE 3');

If we want to analyse the entire schema - will we need to run the above command after gathering stats for the entire schema or is there a way of getting the database to look after it for us?

Cheers

Steve


Tom Kyte
November 05, 2002 - 9:51 am UTC

you would run it after -- not sure what you mean by "getting the database to look after it for us"
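For what it's worth, a sketch of that two-pass approach (skew_columns is a hypothetical driver table listing the columns that need the extra buckets):

begin
    dbms_stats.gather_schema_stats( ownname => 'SCOTT' );
    for x in ( select table_name, column_name from skew_columns )  -- hypothetical
    loop
        dbms_stats.gather_table_stats
        ( ownname    => 'SCOTT',
          tabname    => x.table_name,
          method_opt => 'FOR COLUMNS ' || x.column_name || ' SIZE 3' );
    end loop;
end;
/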

Steve, November 05, 2002 - 9:57 am UTC

Thanks Tom,

I guess what I meant was whether we had to run the command separately afterwards as you've now said, or whether, once it's analysed with the extra buckets, Oracle would always use the same number of buckets with a standard dbms_stats.gather_schema_stats command on those columns.

Why DBMS_STATS.gather_schema_stats('SCHEMA') does not work?

A Reader, November 12, 2002 - 11:21 am UTC

Hi Tom,
I have a database that is used for an online transaction system. There is something about the statistics I don't understand. When I run DBMS_STATS.gather_schema_stats('SCHEMA'), the tables in the schema are analyzed (I check last_analyzed). However, when I run a query, the performance does not improve at all. I also ran dbms_utility.analyze_schema('SCHEMA','COMPUTE'); the same thing happened, no performance gain. Both methods worked in other databases. Since I know which table I need to analyze, I ran "analyze table t compute statistics". After this, the response time for the query improved from 1.1 s to 80 ms. The execution plan never changes according to the plans generated before and after all the analysis. Another thing I notice is that, after a few logins using the application interface, the query slows down again. Questions:
1. Why does DBMS_STATS.gather_schema_stats('SCHEMA') or dbms_utility.analyze_schema not work, while analyzing the specific table does?
2. Why does the plan not change, but the performance of the query (which always uses a full table scan) change?
3. Why does the query slow down again after a few logins? How can I "cache" the statistics or the plan?

Thank you very much for your help.

Tom Kyte
November 13, 2002 - 12:26 pm UTC

Oh - would it not be totally cool if just running dbms_stats was like setting fast=true!

what you need to do is
a) run the queries with sql_trace before and after having statistics
b) compare the query plans and results

Same with all of the other questions you have -- sql_tracing will tell you what is going on.

With the amount of detail provided here -- no tkprofs, no examples -- there is not much more I can say (well, I can say that if you have my book "Expert one on one Oracle" read chapter 10 and follow the procedures outlined there to see what is going on)
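For instance, a minimal sketch of that before/after procedure (the trace file name is illustrative):

alter session set sql_trace=true;
-- run the query in question...
alter session set sql_trace=false;

then format the trace file from user_dump_dest at the OS prompt:

$ tkprof ora_12345.trc report.txt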

please guide

Yogeeeraj, November 29, 2002 - 3:02 am UTC

hello,

From above:
<quote>
I can use LIST STALE on a schema and then do it table by table based on the results.
</quote>

Can you please direct me to the document where this is explained in more details or give an example?

thanks a lot.
Best Regards
Yogeeraj

Tom Kyte
November 29, 2002 - 10:24 am UTC

have you read the supplied packages guide?  all supplied packages are documented there.

this is a rather simple exercise in plsql programming.  Here is a simple example:


ops$tkyte@ORA920.LOCALHOST> create table t as select * from all_objects where rownum < 100;
Table created.

<b>
ops$tkyte@ORA920.LOCALHOST> alter table t monitoring;
Table altered.

-- that is needed for "staleness" measuring
</b>


ops$tkyte@ORA920.LOCALHOST> analyze table t compute statistics;
Table analyzed.

ops$tkyte@ORA920.LOCALHOST> declare
  2          l_objList  dbms_stats.objectTab;
  3  begin
  4          dbms_stats.gather_schema_stats
  5          ( ownname        => USER,
  6            options        => 'LIST STALE',
  7            objlist        => l_objList );
  8
  9          for i in 1 .. l_objList.count
 10          loop
 11                  dbms_output.put_line( l_objList(i).objType );
 12                  dbms_output.put_line( l_objList(i).objName );
 13          end loop;
 14  end;
 15  /

PL/SQL procedure successfully completed.

<b>
that tells me that no objects need stats right now.  If they did, we would only list them out -- not actually gather stats on them</b>

ops$tkyte@ORA920.LOCALHOST>
ops$tkyte@ORA920.LOCALHOST> insert into t select * from t;

99 rows created.

ops$tkyte@ORA920.LOCALHOST> commit;

Commit complete.

ops$tkyte@ORA920.LOCALHOST>
ops$tkyte@ORA920.LOCALHOST> declare
  2          l_objList  dbms_stats.objectTab;
  3  begin
  4          dbms_stats.gather_schema_stats
  5          ( ownname        => USER,
  6            options        => 'LIST STALE',
  7            objlist        => l_objList );
  8
  9          for i in 1 .. l_objList.count
 10          loop
 11                  dbms_output.put_line( l_objList(i).objType );
 12                  dbms_output.put_line( l_objList(i).objName );
 13          end loop;
 14  end;
 15  /
TABLE
T

PL/SQL procedure successfully completed.

<b>Ahh, now T needs stats, more than about 10% of the data was modified.  This shows that we just listed this fact:</b>


ops$tkyte@ORA920.LOCALHOST> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
        99

ops$tkyte@ORA920.LOCALHOST> exec dbms_stats.gather_schema_stats( ownname => user, options => 'GATHER STALE' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA920.LOCALHOST> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
       198

ops$tkyte@ORA920.LOCALHOST>

<b>it told us to gather stats, it didn't actually do it when we used LIST STALE</b>
 

Great explanation !!

Yogeeraj, November 30, 2002 - 12:16 am UTC

Hello,

Thank you a lot for this detailed explanation. It really helps!

I will go through the supplied package guide again.

Best Regards
Yogeeraj


New v9.2 GATHER_AUTO feature

Brad, December 05, 2002 - 4:19 pm UTC

The GATHER_AUTO option is new in v9.x. The 9.2 doc says that when you use the GATHER_AUTO option you may not specify other options that direct which statistics are gathered, and seems to imply that it decides which stats to gather in an automated fashion. Do you happen to know if it determines which stats to gather on a table-by-table basis, or if it just has a standard set of stats that it gathers?

-- Turn on monitoring for all the tables in the schema.
-- This only needs to be done as new tables are added.
begin
dbms_stats.alter_schema_tab_monitoring(
ownname => 'MYSCHEMA'
);
end;
/

-- The GATHER AUTO option causes statistics to be gathered
-- for tables that Oracle thinks need updated stats.
begin
dbms_stats.gather_schema_stats(
ownname => 'MYSCHEMA'
,options => 'GATHER AUTO'
);
end;
/


Tom Kyte
December 05, 2002 - 6:45 pm UTC

it is table by table, column by column. It is also wholly undocumented. It samples to decide what to do.

If you look at a sql_trace (be prepared to wait -- sql_trace really slows this puppy down), you'll see really strange looking queries like:

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring no_expand
index_ffs(t,"T1_IDX1") parallel_index(t,"T1_IDX1",32767) */ count(*) as nrw,
count(distinct sys_op_lbid(34421,'L',t.rowid)) as nlb,null as ndk,null as
clf
from
"A"."T1" sample block (.00001) t where "OBJECT_ID" is not null

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring no_expand
index_ffs(t,"T1_IDX1") parallel_index(t,"T1_IDX1",32767) */ count(*) as nrw,
count(distinct sys_op_lbid(34421,'L',t.rowid)) as nlb,null as ndk,null as
clf
from
"A"."T1" sample block (.001) t where "OBJECT_ID" is not null

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring no_expand
index_ffs(t,"T1_IDX1") parallel_index(t,"T1_IDX1",32767) */ count(*) as nrw,
count(distinct sys_op_lbid(34421,'L',t.rowid)) as nlb,null as ndk,null as
clf
from
"A"."T1" sample block (.1) t where "OBJECT_ID" is not null

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring parallel(t,
32767) parallel_index(t,32767) */ count(*)
from
"A"."T1" sample block (.001)

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring parallel(t,
32767) parallel_index(t,32767) */ count(*)
from
"A"."T1" sample block (.1)

....
insert /*+ append */ into sys.ora_temp_1_ds_110 select /*+
cursor_sharing_exact dynamic_sampling(0) no_monitoring parallel(t,32767)
parallel_index(t,32767) */ "OWNER","OBJECT_NAME","SUBOBJECT_NAME",
"OBJECT_ID","DATA_OBJECT_ID","OBJECT_TYPE","CREATED","LAST_DDL_TIME",
"TIMESTAMP","STATUS","TEMPORARY","GENERATED","SECONDARY" from "A"."T1"
sample (14.70588235294117647058823529411764705882) t
....

(please don't ask me about undocumented hints and functions you see -- pretend they do not exist because for all intents and purposes -- they don't!)



Again dbms_stats vs analyze

Igor, January 20, 2003 - 9:46 am UTC

Hi

We were doing daily
dbms_stats.gather_schema_stats( OWNNAME => 'SCHEMA', DEGREE => 2, CASCADE => true, method_opt => 'for all columns size 1' )
but found that some complex queries had suboptimal execution paths.

After running,
analyze table &tab
compute statistics
for table
for all indexes
for all columns

on the tables in these queries, the optimizer found a much more efficient execution plan.
My conclusion was that analyze collected more info for the CBO.
What would be the equivalent of this analyze in dbms_stats?

Igor

P.S. What also confuses me a bit is that you are advertising dbms_stats but use analyze at the table level instead of gather_table_stats in most of your examples (I don't mean things like chaining, validate structure and such).

Thanks for your clarification


Igor


Tom Kyte
January 20, 2003 - 10:58 am UTC

No, they just gathered DIFFERENT statistics. Your method_opt in dbms_stats is different than in the analyze.


In my "demos" -- I have toy tables. I just find it easier to type in (as that is what I do, type it in) analyze then the corresponding dbms_stats command. Just laziness on my part as far as that goes.

do as i say, not as i do (in this case) ;)



Yes, but

Igor, January 20, 2003 - 1:05 pm UTC

Thanks for your fast response

When you look at the method_opt options, they are not as versatile as with 'analyze table'.
I would like to run DBMS_STATS.GATHER_SCHEMA_STATS at a level equal to what the prior analyze table collected, so that the CBO is able to build an efficient execution plan.
Maybe it is a good idea to play with the number of buckets (size x)? (I don't know what number Oracle chooses in the ANALYZE statement.)

Thanks for your response

Igor


Tom Kyte
January 20, 2003 - 1:13 pm UTC

they are as versatile? what do you mean?

More description of GATHER_SCHEMA_STATS method_opt

David Penington, January 20, 2003 - 10:33 pm UTC

Tom,
I have read the 9iR1 Supplied PL/SQL Packages documentation for DBMS_STATS, and the chapter on gathering statistics in the 9iR1 Database Performance Guide. Is there any more information available on the meaning and impact of the METHOD_OPT of GATHER_SCHEMA_STATS and GATHER_TABLE_STATS?
I know I could work it out by repeated testing, but something formal is nice (and more reliable).
E.g. Does it always gather table stats?
What do SIZE REPEAT, SIZE AUTO, SIZE SKEWONLY do?
What will it do when a column is named but doesn't occur in some of the tables (testing tells me what happens in the version I'm testing, but not on other versions)?



Tom Kyte
January 21, 2003 - 9:50 am UTC

Ok, the method opt dictates how histograms will be computed.

DBMS_STATS in general always gets table statistics (gather_table,gather_schema,gather_database all do).

If you use CASCADE=>TRUE, it'll get indexes.

METHOD_OPT dictates how histograms are done.

Take the analyze command for example:

analyze table T compute statistics
   for table              <<=== dbms_stats does this when you gather table, schema, or database stats
   for all indexes        <<=== when you do table/schema/database AND cascade=>TRUE, or you use gather_index_stats directly
   for all columns size 1 <<=== the default method_opt; can be overridden

Now, we have method_opts of:

o for all columns - gather histograms on all columns -- hugely expensive and rarely should be used

o for all indexed columns - gather histograms on all columns that are currently indexed. I find this typically to be the "most useful" although you can arguably skip nearly unique columns such as primary keys

o for all hidden columns - gather histograms on columns you cannot see, useful if using the object relational features which incorporate tons of hidden columns in your objects.

o for columns <list> - gather histograms on these specific columns, could be useful to skip primary key indexes or just to get the 2 or 3 columns of interest


Each of the above can be modified with SIZE clauses where SIZE can be:

o SIZE <N>. N is an integer between 1 and 254 representing the number of buckets. See note at
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3126073805757
to understand what this means and what happens if for example you use N = 10 and there are 5 distinct column values or you use N = 10 and there are 15 distinct values...

o SIZE REPEAT -- says to do the histograms just like you did them last time. It reads the data dictionary to figure out what to do

o SIZE AUTO -- Oracle looks at the data and, using a magical, undocumented and changing algorithm, figures out all by itself which columns to gather stats on and how many buckets to use. It'll collect histograms in memory only for those columns which are used by your applications (those columns appearing in a predicate involving equality, range, or like operators). We know that a particular column was used by an application because at parse time, we'll store workload information in the SGA. Then we'll store histograms in the data dictionary only if the column has skewed data (and is worthy of a histogram).

o SIZE SKEWONLY when you collect histograms with the SIZE option set to SKEWONLY, we collect histogram data in memory for all specified columns (if you do not specify any, all columns are used). Once an "in-memory" histogram is computed for a column, it is stored inside the data dictionary only if it has "popular" values (multiple end-points with the same value which is what we define by "there is skew in the data").
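For example (a sketch; the table name is illustrative):

begin
    -- let Oracle pick the columns and bucket counts from the observed workload
    dbms_stats.gather_table_stats( user, 'T', method_opt => 'FOR ALL COLUMNS SIZE AUTO' );

    -- keep histograms only where the data is actually skewed
    dbms_stats.gather_table_stats( user, 'T', method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY' );

    -- re-gather using the same bucket counts as last time
    dbms_stats.gather_table_stats( user, 'T', method_opt => 'FOR ALL COLUMNS SIZE REPEAT' );
end;
/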



don't know what you mean by "What will it do when a column is named but doesn't occur in some of the tables"... guess you mean "I listed a column name in the method_opt, but no such column exists" -- it should ignore it.

Igor, January 22, 2003 - 6:19 am UTC

Hi

OK. Playing with buckets I got the wanted result.
Thanks for your help.

gather schema & unusable indexes

Igor, January 24, 2003 - 4:33 am UTC

Hi

 Is there a workaround without an index rebuild?
 (I would like to recollect statistics, skipping unusable indexes but doing it on the rest.)

drop table t;

Table dropped.

SQL> create table t as select * from dba_objects;

Table created.

SQL> create index ti on t ( object_id );

Index created.

SQL> alter index ti unusable;

Index altered.

SQL> alter session set skip_unusable_indexes=true;

Session altered.


SQL> create table t2 as select * from dba_objects;

Table created.

SQL> create index ti2 on t2 ( object_id );

Index created.

SQL> exec dbms_stats.gather_schema_stats( OWNNAME => 'T', DEGREE => 4, CASCADE => true, method_opt => 'for all columns size 1' )
BEGIN dbms_stats.gather_schema_stats( OWNNAME => 'T', DEGREE => 4, CASCADE => true, method_opt => 'for all columns size 1' ); END;

*
ERROR at line 1:
ORA-01502: index 'T.TI' or partition of such index is in unusable state
ORA-06512: at "SYS.DBMS_STATS", line 4481
ORA-06512: at "SYS.DBMS_STATS", line 4612
ORA-06512: at "SYS.DBMS_STATS", line 4722
ORA-06512: at "SYS.DBMS_STATS", line 4704
ORA-06512: at line 1


SQL> select table_name, last_analyzed from user_tables;

TABLE_NAME                     LAST_ANAL
------------------------------ ---------
T                              24-JAN-03
T2

 Thanks for your answer

                                               Igor 

Tom Kyte
January 24, 2003 - 7:29 am UTC

obvious solution:

cascade => false, then gather stats on the valid indexes yourself:

begin
    for x in ( select index_name from user_indexes where status = 'VALID' )
    loop
        dbms_stats.gather_index_stats( user, x.index_name );
    end loop;
end;
/

Igor, January 24, 2003 - 1:32 pm UTC

Hah, but it's not so elegant ;-)

Anyway, what is a bit confusing is that even though these indexes should be skipped (and Oracle really does skip them in DML statements), when you start the procedure to collect statistics -- which is essentially a select (OK, a bit "strange") -- it doesn't.
It smells like a "feature", no? :-)

Igor


Tom Kyte
January 24, 2003 - 2:29 pm UTC

well, the docs do say

...
This setting allows all operations (inserts, deletes, updates, and selects) on tables with unusable indexes or index partitions.
......


nothing about other operations.



A reader, March 06, 2003 - 9:36 am UTC

Tom,

What should be the frequency of issuing analyze commands in OLTP vs DSS systems?

Thanks.

Tom Kyte
March 06, 2003 - 9:53 am UTC

I don't know.

Really.

I would recommend you use alter table T monitoring and let the system tell you how often for OLTP....

In DW -- it is usually a little more clear -- after you load.



Follow-up question

Bob Rich, March 17, 2003 - 8:33 am UTC

Tom,
I'm trying to test the gather_stale option on multiple tables having been analyzed with different percentages. I've set up a test with two identical tables with more than 1 million rows each. I analyzed table 'A' using 100% of the rows, and table 'B' using an estimate of 15%. I then turned on table monitoring, truncated and reloaded each table. I then used dbms_stats.gather_schema_stats( ownname => 'XXXX', options => 'GATHER STALE'); It appears each table was reanalyzed using 100% of the rows. I expected table 'B' to use 15%. Is this the way it is supposed to work?

Thank you

Tom Kyte
March 17, 2003 - 9:34 am UTC

yes it is. If you want "custom collection methods" on the objects (different percents, whatever) you would (could) use list stale and maintain a database table of parameters to dbms_stats to be used

so you would list stale
iterate over the list and fetch the options
gather table stats for that object
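A sketch of that pattern (stats_options is a hypothetical table of per-object parameters you would maintain yourself):

declare
    l_list dbms_stats.objectTab;
begin
    dbms_stats.gather_schema_stats
    ( ownname => user, options => 'LIST STALE', objlist => l_list );

    for i in 1 .. l_list.count
    loop
        for p in ( select estimate_pct
                     from stats_options               -- hypothetical parameter table
                    where object_name = l_list(i).objName )
        loop
            dbms_stats.gather_table_stats
            ( ownname          => user,
              tabname          => l_list(i).objName,
              estimate_percent => p.estimate_pct,
              cascade          => true );
        end loop;
    end loop;
end;
/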

One more follow-up

Bob Rich, March 17, 2003 - 12:33 pm UTC

Tom,
Okay, I can build a script to read the dba_tab_modifications table and analyze the tables, but how do I "clean out" the dba_tab_modifications table once the analyze script runs? Or, do I only use table monitoring on tables for which I am willing to use estimate_percent = 100? I can't find anything on Metalink.

Granularity

Matt, April 24, 2003 - 6:36 am UTC

Tom,

We have a very large partitioned table, around 1000 million rows; each period we add extra partitions and analyze them.

Initially we were doing a dbms_stats gather specifying the table name and the new partitions with granularity set to 'ALL' -- over time this has started to slow down.

I think we would be better changing this to 'PARTITION' and then updating the global table stats once at the end; is this correct? What is the best method for gathering the global table stats? Is there any way we can get it to use the already analyzed partition stats, or will it do a scan?



Tom Kyte
April 24, 2003 - 8:30 am UTC

in order to gather global stats -- the data as a whole must be inspected (an estimate will help there, maybe parallel as well).


We cannot correctly determine data skew and distributions from the individual partition stats, we need to "globally" look at the table.


Now, if your queries all do partition pruning down to a single partition -- you do not really need global stats...
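For example, a sketch of gathering the new partition and then a cheaper global pass (all names and percentages are illustrative):

begin
    -- just the freshly loaded partition
    dbms_stats.gather_table_stats
    ( ownname     => user,
      tabname     => 'BIG_TAB',
      partname    => 'P_200304',
      granularity => 'PARTITION' );

    -- a sampled, parallel pass for the global statistics
    dbms_stats.gather_table_stats
    ( ownname          => user,
      tabname          => 'BIG_TAB',
      estimate_percent => 5,
      degree           => 4,
      granularity      => 'GLOBAL' );
end;
/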

Tom, what's going on here ?

Fan, July 15, 2003 - 2:30 pm UTC

(9iR2/LINUX)

Tom, please help, thanks

begin DBMS_STATS.GATHER_SCHEMA_STATS('scott',NULL); end;

ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file client_UNW.bad in UTL_FILE_DIR not found
ORA-06512: at "SYS.DBMS_STATS", line 9136
ORA-06512: at "SYS.DBMS_STATS", line 9616
ORA-06512: at "SYS.DBMS_STATS", line 9800
ORA-06512: at "SYS.DBMS_STATS", line 9854
ORA-06512: at "SYS.DBMS_STATS", line 9831
ORA-06512: at line 1

Tom Kyte
July 15, 2003 - 3:12 pm UTC

scott has an external table

the external file is "bogus" -- it doesn't work; it gives an error when you query it.

either

a) fix it
b) skip it
c) drop it

query

USER_EXTERNAL_TABLES



how does estimate statistics work?

A reader, July 23, 2003 - 8:15 am UTC

Hi

I would like to know how estimate statistics works. If I do

analyze table mytable estimate statistics sample 2000 rows

and mytable has 1,000,000 rows, then how does Oracle estimate?

Tom Kyte
July 23, 2003 - 8:40 am UTC

by pulling a random sample of about 2000 rows from the table and using that as the basis for the statistics.

You can see it looks at pretty much the entire thing to get this random sample of rows via:


ops$tkyte@ORA920LAP> create table t1 as select * from big_table.big_table where rownum <= 10000;

Table created.

ops$tkyte@ORA920LAP> create table t2 as select * from big_table.big_table where rownum <= 50000;

Table created.

ops$tkyte@ORA920LAP> create table t3 as select * from big_table.big_table where rownum <= 100000;

Table created.

ops$tkyte@ORA920LAP> create table t4 as select * from big_table.big_table where rownum <= 500000;

Table created.

ops$tkyte@ORA920LAP> create table t5 as select * from big_table.big_table where rownum <= 1000000;

Table created.

ops$tkyte@ORA920LAP>
ops$tkyte@ORA920LAP> alter session set sql_trace=true;

Session altered.

ops$tkyte@ORA920LAP> analyze table t1 estimate statistics sample 2000 rows;

Table analyzed.

ops$tkyte@ORA920LAP> analyze table t2 estimate statistics sample 2000 rows;

Table analyzed.

ops$tkyte@ORA920LAP> analyze table t3 estimate statistics sample 2000 rows;

Table analyzed.

ops$tkyte@ORA920LAP> analyze table t4 estimate statistics sample 2000 rows;

Table analyzed.

ops$tkyte@ORA920LAP> analyze table t5 estimate statistics sample 2000 rows;

Table analyzed.


analyze table t1 estimate statistics sample 2000 rows

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.08       0.84        139        148          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.08       0.84        139        148          2           0
********************************************************************************
analyze table t2 estimate statistics sample 2000 rows

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.13       0.88        734        726          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.14       0.88        734        726          2           0
********************************************************************************
analyze table t3 estimate statistics sample 2000 rows

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.20       1.41       1480       1439          3           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.20       1.41       1480       1439          3           0
********************************************************************************
analyze table t4 estimate statistics sample 2000 rows

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.42       3.42       6111       2009          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.42       3.42       6111       2009          2           0
********************************************************************************
analyze table t5 estimate statistics sample 2000 rows

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.58       1.88      10322       2009          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.59       1.88      10322       2009          2           0


The more data in the table, the more work it takes to get the required number of rows.

also note the 2000 is "about" 2000:

ops$tkyte@ORA920LAP> select table_name, sample_size from user_tables;

TABLE_NAME                     SAMPLE_SIZE
------------------------------ -----------
T1                                    1724
T2                                    1914
T3                                    1914
T4                                    1983
T5                                    1838
 

cool

robert, July 23, 2003 - 10:19 am UTC

but aren't you on vacation ?
so what's all this VPNing :}

I dont understand

A reader, July 23, 2003 - 2:11 pm UTC

Hi

From your last example, how does Oracle determine the total number of rows by just reading roughly 2000 rows? Does it do this: determine how many blocks are used to store these 2000 rows, then get the average rows per block, then multiply that by the number of blocks? If so, where does it get the number of blocks?

Tom Kyte
July 23, 2003 - 7:12 pm UTC

not documented, but it certainly knows where the HWM of the table is (it knows how many blocks might contain rows) -- even the RBO needed that in order to full scan.

if HWM is always known

A reader, July 24, 2003 - 6:26 pm UTC

Hi

As far as I know we must analyze the tables to get the HWM; if, as you said, the RBO also needs the HWM, then where does the RBO obtain that data?

Tom Kyte
July 25, 2003 - 6:44 am UTC

you do not need to analyze, the package dbms_space can be used to find this information out. it is all on the segment headers.

otherwise, the rbo would NOT be able to full scan!
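For instance, a sketch using dbms_space.unused_space to see the blocks below the HWM without analyzing (the table name is illustrative):

declare
    l_total_blocks  number;  l_total_bytes  number;
    l_unused_blocks number;  l_unused_bytes number;
    l_file_id       number;  l_block_id     number;
    l_last_block    number;
begin
    dbms_space.unused_space
    ( segment_owner             => user,
      segment_name              => 'T',
      segment_type              => 'TABLE',
      total_blocks              => l_total_blocks,
      total_bytes               => l_total_bytes,
      unused_blocks             => l_unused_blocks,
      unused_bytes              => l_unused_bytes,
      last_used_extent_file_id  => l_file_id,
      last_used_extent_block_id => l_block_id,
      last_used_block           => l_last_block );

    -- allocated blocks minus the never-used ones = blocks below the HWM
    dbms_output.put_line( 'blocks below HWM: ' || ( l_total_blocks - l_unused_blocks ) );
end;
/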

What happens if SYS/SYSTEM tables are analyzed also

Dilbert, July 25, 2003 - 2:42 pm UTC

Hi Tom,

What I understand from your answers is that sys and system accounts should not be analyzed as they work best with RULE. In one of our instances, I found that sys and system have been analyzed. Would it have a negative effect on performance? I am a PL/SQL developer, so it's not an easy task for me to play around with the statistics to create a test case and prove it :(



Tom Kyte
July 25, 2003 - 5:00 pm UTC

in 9i -- it is OK and supported. in 8i it might be an issue (but you would know, it would be obvious that there is a problem). in the next release, it'll be the way to go.

dbms_stats equivalent of estimate statistics

Peter Tran, July 25, 2003 - 6:48 pm UTC

Tom,

I've recommended for our company to convert over to using dbms_stats rather than analyze table.

One question I got was: what is the equivalent dbms_stats parameter setting to emulate the runtime of analyze table ... estimate statistics?

I tried:

DBMS_STATS.GATHER_TABLE_STATS (
ownname=>NULL,
tabname=><table_name>,
partname=>NULL,
estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO',
degree=>4,
cascade=>TRUE);

On a 15 million row table, the analyze table estimate statistics ran in seconds, while the above dbms_stats.gather_table_stats() took over 3 minutes!

Are they so different that we should expect dbms_stats to run much longer than analyze table <table_name> estimate statistics?

Is trying to compare the runtime meaningless, since one (dbms_stats) is much better than the other?

Thanks,
-Peter

Tom Kyte
July 25, 2003 - 8:45 pm UTC

3 minutes! good gosh, so? really -- if you would like, you can use a fixed estimate percent just like before. but hey, 3 minutes on a 15million row table doesn't sound too bad given the level of stats you are gathering.

you do not give me the command you used to use -- did you cascade there? did you gather histograms? what?

are you sure parallel makes sense even -- is the data nicely spread out?

The gory details

Peter Tran, July 25, 2003 - 9:41 pm UTC

SQL> select count(*) from big_table;

  COUNT(*)
----------
  15179090

SQL> analyze table big_table delete statistics;

Table analyzed.

SQL> analyze table big_table estimate statistics;

Table analyzed.

Elapsed: 00:00:02.00

SQL> @check_table

INDEX_NAME                     T_NUM_ROWS I_NUM_ROWS UQ_COLUMNS
------------------------------ ---------- ---------- ----------
big_table_PK                     15195279 14946619.2          5
big_table_IX                     15195279   12665663          5
big_table_UQ                     15195279   12390453          5

SQL> analyze table big_table delete statistics;

Table analyzed.

SQL> BEGIN
  2    DBMS_STATS.GATHER_TABLE_STATS (
  3       ownname=>NULL,tabname=>'big_table',
  4       partname=>NULL,
  5       estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE,
  6       method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO',
  7       degree=>4,
  8       cascade=>true);
  9  END;
 10  /

PL/SQL procedure successfully completed.

Elapsed: 00:03:36.09

SQL> @check_table

INDEX_NAME                     T_NUM_ROWS I_NUM_ROWS UQ_COLUMNS
------------------------------ ---------- ---------- ----------
big_table_PK                     15179090   14712725          5
big_table_IX                     15179090   14593628          5
big_table_UQ                     15179090   14705608          5


Where @check_table.sql is:

select i.index_name, 
       t.num_rows T_NUM_ROWS, 
       i.num_rows I_NUM_ROWS, c.cnt UQ_COLUMNS
from (select num_rows from user_tables where table_name = 'big_table') t,
     (select index_name, num_rows from user_indexes where table_name = 'big_table' ) i,
     (select count(distinct column_name) cnt from user_tab_histograms where table_name = 'big_table' ) c
/

I guess it's just a shock to the developers when they first start using dbms_stats and realize that they're not going to get response times in the seconds.

I thought the degree parameter is used to specify how much parallelism you can use while gathering the statistics.  I noticed that if I leave out the degree=>4 part, it takes double the time to execute the dbms_stats.

In the first version, I forgot to specify the estimate_percent, so it defaulted to compute.  The job ran for more than 30 minutes.  I finally had to kill it.

-Peter 

Tom Kyte
July 26, 2003 - 7:03 am UTC

If you check out the sample size, you'll find you were analyzing about 1064 rows with that estimate. You might have been lucky; you might have the most well-behaved data on the planet, whereby 1064 rows from 15 million is representative enough to represent your data.

In most cases, it will not be. you can obviously set the estimate size and such so as to "correct" this, to make the estimate go faster if you like.
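For example, a sketch with a fixed sample instead of AUTO_SAMPLE_SIZE (the percent and degree are assumptions to tune):

begin
    dbms_stats.gather_table_stats
    ( ownname          => user,
      tabname          => 'BIG_TABLE',
      estimate_percent => 10,
      method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
      degree           => 4,
      cascade          => true );
end;
/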

9i

Tatiane, July 29, 2003 - 11:10 am UTC

>in 9i -- it is OK and supported. in 8i it might be an issue (but you would know, it would be obvious that there is a problem). in the next release, it'll be the way to go.

You mean in 9i (and beyond) could I simply do (without problems with sys or system):

DBMS_STATS.GATHER_DATABASE_STATS (
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
cascade => TRUE
);

?

Tom Kyte
July 29, 2003 - 11:26 am UTC

after TESTING in your TEST system, running your applications in TEST, yes.

although you might consider table monitoring.... and gather stale...

remember your data dictionary is not very volatile, it need not be analyzed frequently.

so, gather database stats -- not sure; schema stats -- yes.

statistics connected as SYS

Juancarlosreyesp@yahoo.com, August 01, 2003 - 11:34 am UTC

Hi Tom, what is your opinion about connecting as SYS to gather full statistics with the following command?
begin
DBMS_STATS.GATHER_DATABASE_STATS();
end;

I am almost completely sure I'm wrong; which privileges do you suggest for executing this, only DBA?


Tom Kyte
August 01, 2003 - 11:38 am UTC

i would prefer to gather them schema by schema using the gather stale option instead of "gather everything".

have each schema gather its own stats on its own schedule.

A reader, August 01, 2003 - 12:02 pm UTC

Thank you :)

estimate vs compute

Reader, August 15, 2003 - 12:46 pm UTC

Tom, if I use estimate statistics with some % in the analyze table command, is there anything like 25% or 30% for estimate that will do the analysis just the same as if I used the compute statistics option? In other words, is there an optimum % Oracle recommends for estimate that would do the same as compute, so that for really "huge" tables we don't have to use the compute option?

Also, does the analyze table t compute statistics command by default analyze the table as well as all the columns and indexes associated with it? Thanks.

Tom Kyte
August 15, 2003 - 12:59 pm UTC

if there were an "optimum", that would be the only setting ;)

anything at 49 or above is COMPUTE.

anything below tries to do what you ask.  In many cases, given a large enough evenly distributed set of data 10% can do it.  the more skewed the data, the larger the sample required to get a feel for the degree of skewedness.


A simple test like this can show you what is getting gathered with what commands:

ops$tkyte@ORA920> create table t ( x int primary key );

Table created.

ops$tkyte@ORA920> insert into t values(1);

1 row created.

ops$tkyte@ORA920> commit;

Commit complete.

ops$tkyte@ORA920>
ops$tkyte@ORA920> select 'table', decode( num_rows, null, 'nope', 'yep' ) from user_tables where table_name = 'T'
  2  union all
  3  select 'index', decode( num_rows, null, 'nope', 'yep' ) from user_indexes where table_name = 'T'
  4  union all
  5  select 'histograms', decode( count(*), 0, 'nope', 'yep' ) from user_tab_histograms where table_name = 'T' and rownum = 1
  6  /

'TABLE'    DECO
---------- ----
table      nope
index      nope
histograms nope

ops$tkyte@ORA920> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA920> select 'table', decode( num_rows, null, 'nope', 'yep' ) from user_tables where table_name = 'T'
  2  union all
  3  select 'index', decode( num_rows, null, 'nope', 'yep' ) from user_indexes where table_name = 'T'
  4  union all
  5  select 'histograms', decode( count(*), 0, 'nope', 'yep' ) from user_tab_histograms where table_name = 'T' and rownum = 1
  6  /

'TABLE'    DECO
---------- ----
table      yep
index      yep
histograms yep

ops$tkyte@ORA920> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA920> analyze table t compute statistics for table;

Table analyzed.

ops$tkyte@ORA920> select 'table', decode( num_rows, null, 'nope', 'yep' ) from user_tables where table_name = 'T'
  2  union all
  3  select 'index', decode( num_rows, null, 'nope', 'yep' ) from user_indexes where table_name = 'T'
  4  union all
  5  select 'histograms', decode( count(*), 0, 'nope', 'yep' ) from user_tab_histograms where table_name = 'T' and rownum = 1
  6  /

'TABLE'    DECO
---------- ----
table      yep
index      nope
histograms nope

ops$tkyte@ORA920> analyze table t delete statistics;

Table analyzed.

ops$tkyte@ORA920> analyze table t compute statistics for all indexes;

Table analyzed.

ops$tkyte@ORA920> select 'table', decode( num_rows, null, 'nope', 'yep' ) from user_tables where table_name = 'T'
  2  union all
  3  select 'index', decode( num_rows, null, 'nope', 'yep' ) from user_indexes where table_name = 'T'
  4  union all
  5  select 'histograms', decode( count(*), 0, 'nope', 'yep' ) from user_tab_histograms where table_name = 'T' and rownum = 1
  6  /

'TABLE'    DECO
---------- ----
table      nope
index      yep
histograms nope

 

Very helpful. Thanks much.

reader, August 15, 2003 - 3:02 pm UTC


dbms_stats does it differently ....

reader, August 15, 2003 - 3:20 pm UTC

Tom, I did the above test with the dbms_stats package as follows; could you please suggest how I get my index analyzed also? A is the same as table t in your demo above, with a primary key defined.

SQL> exec dbms_stats.gather_table_stats('DEMO','A');

PL/SQL procedure successfully completed.

SQL> select 'table', decode( num_rows, null, 'NO', 'YES' ) from 
  2  dba_tables where table_name = 'A'
  3    union all
  4  select 'index', decode( num_rows, null, 'NO', 'YES' ) from dba_indexes 
  5  where table_name = 'A'
  6    union all
  7  select 'histograms', decode( count(*), 0, 'NO', 'YES' ) from 
  8  dba_tab_histograms where table_name = 'A' and rownum = 1
  9  /

'TABLE'    DEC
---------- ---
table      YES
index      NO
histograms YES

SQL> analyze table a delete statistics;

Table analyzed.

SQL> select 'table', decode( num_rows, null, 'NO', 'YES' ) from 
  2  dba_tables where table_name = 'A'
  3    union all
  4  select 'index', decode( num_rows, null, 'NO', 'YES' ) from dba_indexes 
  5  where table_name = 'A'
  6    union all
  7  select 'histograms', decode( count(*), 0, 'NO', 'YES' ) from 
  8  dba_tab_histograms where table_name = 'A' and rownum = 1
  9  /

'TABLE'    DEC
---------- ---
table      NO
index      NO
histograms NO

SQL> exec dbms_stats.gather_table_stats('DEMO','A', METHOD_OPT => 'FOR ALL INDEXED cOLUMNS');

PL/SQL procedure successfully completed.

SQL> select 'table', decode( num_rows, null, 'NO', 'YES' ) from 
  2  dba_tables where table_name = 'A'
  3    union all
  4  select 'index', decode( num_rows, null, 'NO', 'YES' ) from dba_indexes 
  5  where table_name = 'A'
  6    union all
  7  select 'histograms', decode( count(*), 0, 'NO', 'YES' ) from 
  8  dba_tab_histograms where table_name = 'A' and rownum = 1
  9  /

'TABLE'    DEC
---------- ---
table      YES
index      NO
histograms YES 

guess I have to use gather_index_stats proc?

reader, August 15, 2003 - 3:23 pm UTC


Figured it out by reading this thread fully

reader, August 15, 2003 - 3:40 pm UTC

I should use the cascade option. Tom, you are great!

SQL> exec dbms_stats.gather_table_stats('DEMO','A', CASCADE => TRUE);

PL/SQL procedure successfully completed.

SQL> select 'table', decode( num_rows, null, 'NO', 'YES' ) from 
  2  dba_tables where table_name = 'A'
  3    union all
  4  select 'index', decode( num_rows, null, 'NO', 'YES' ) from dba_indexes
  5  where table_name = 'A'
  6    union all
  7  select 'histograms', decode( count(*), 0, 'NO', 'YES' ) from 
  8  dba_tab_histograms where table_name = 'A' and rownum = 1
  9  /

'TABLE'    DEC
---------- ---
table      YES
index      YES
histograms YES 

difference between dbms_stats and analyze

Ramu, August 15, 2003 - 7:19 pm UTC

Other than the ability to analyze in parallel, what are the advantages of using the dbms_stats package to analyze tables and indexes?

When I use either the analyze table command or the dbms_stats.gather_table_stats procedure, do the indexes associated with the table also get analyzed? If so, why is there a separate analyze index command and a separate dbms_stats.gather_index_stats procedure to analyze indexes? Thanks.

Tom Kyte
August 15, 2003 - 8:33 pm UTC

analyze table T compute statistics

is much like

dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all columns', cascade=>true);


analyze index i compute statistics

is much like

dbms_stats.gather_index_stats


and so on. dbms_stats.gather_schema_stats -- well, that has no corresponding equal in analyze.

you cannot export/import stats with analyze

you cannot gather system stats with analyze

you cannot set stats with analyze

you cannot gather STALE <<<=== big reason, huge reason >>> with analyze

it is the stated direction (dbms_stats is)
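As one example of the export/import point: moving stats from one database to another is just a few dbms_stats calls. A sketch, assuming a table T and a statistics holding table named STATS_TABLE (both names are illustrative):

begin
    dbms_stats.create_stat_table( user, 'STATS_TABLE' );
    dbms_stats.export_table_stats( user, 'T', stattab => 'STATS_TABLE' );
    -- move STATS_TABLE to the other database (e.g. via exp/imp), then there:
    -- dbms_stats.import_table_stats( user, 'T', stattab => 'STATS_TABLE' );
end;
/

And "setting" stats by hand -- say, to make a small test table look big to the optimizer -- would be a call like dbms_stats.set_table_stats( user, 'T', numrows => 1000000, numblks => 10000 ).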





stale?

A Reader, August 16, 2003 - 11:34 am UTC

<quote>you cannot gather STALE</quote>

Tom, what does stale do? Thanks.

Tom Kyte
August 16, 2003 - 11:46 am UTC

oracle can monitor tables.

you only need to gather stats on tables where the data changed enough to warrant it.

dbms_stats can use this monitoring data to decide what tables are "stale" and in need of gathering.

so, some nights, dbms_stats might gather stats on ZERO tables, others on all, others on some -- the amount of work it does is proportional to the amount of work it needs to do.

in general, some tables, but not all, will be stale -- so every night, dbms_stats will gather stats on some subset of your tables, not all of them.

analyze does everything, regardless of whether it needs it or not.
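To see which tables dbms_stats currently considers stale -- without gathering anything -- there is a LIST STALE option. A sketch (assumes monitoring is already on and serveroutput is enabled):

declare
    l_stale dbms_stats.objectTab;
begin
    dbms_stats.gather_schema_stats( ownname => user,
                                    options => 'LIST STALE',
                                    objlist => l_stale );
    for i in 1 .. l_stale.count
    loop
        dbms_output.put_line( l_stale(i).objtype || ' ' || l_stale(i).objname );
    end loop;
end;
/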

great explanation - how about new tables

Sravan, September 17, 2003 - 5:18 pm UTC

Tom,
I am monitoring all the schema objects and using the stale option with "GATHER AUTO" for the dbms_stats job. It is running fine now on the existing tables.

What about new tables that get added to the database?
1) Does the job automatically recognize the new table?
2) Will the table be monitored? (Is it picked up automatically, or should I set up a job to run ALTER_SCHEMA_TAB_MONITORING frequently to add it to the list?)
3) Will gather_schema_stats recognize the new tables and gather their statistics?


=================================================
Monitoring of the tables is done using:

exec dbms_stats.ALTER_SCHEMA_TAB_MONITORING('SCOTT',TRUE);
================================================
dbms_stats definition looks like this (runs 3 times a week):

exec dbms_stats.gather_schema_stats(
       ownname          => 'SCOTT',
       options          => 'GATHER AUTO',
       estimate_percent => dbms_stats.auto_sample_size,
       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
       degree           => 3
     );
=================================================

Tom Kyte
September 17, 2003 - 6:12 pm UTC

it'll get the new tables...
it will not be monitored
you might add the alter schema tab monitoring to the script...

ops$tkyte@ORA920> @connect a/a
ops$tkyte@ORA920> set termout off

a@ORA920> set termout on
a@ORA920>
a@ORA920> create table t1 ( x int );
 
Table created.
 
a@ORA920>
a@ORA920> exec dbms_stats.ALTER_SCHEMA_TAB_MONITORING('A',TRUE);
 
PL/SQL procedure successfully completed.
 
a@ORA920>
a@ORA920> create table t2 ( x int );
 
Table created.
 
a@ORA920>
a@ORA920> select table_name, monitoring from user_tables;
 
TABLE_NAME                     MON
------------------------------ ---
T1                             YES
T2                             NO
 
a@ORA920>
a@ORA920>
a@ORA920> begin dbms_stats.gather_schema_stats(
  2       ownname          => user,
  3       options          => 'GATHER AUTO',
  4       estimate_percent => dbms_stats.auto_sample_size,
  5       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
  6       degree           => 3
  7     );
  8  end;
  9  /
 
PL/SQL procedure successfully completed.
 
a@ORA920>
a@ORA920> create table t3 ( x int );
 
Table created.
 
a@ORA920> select table_name, monitoring, num_rows from user_tables;
 
TABLE_NAME                     MON   NUM_ROWS
------------------------------ --- ----------
T1                             YES          0
T2                             NO           0
T3                             NO
 
<b>stats on both tables -- no monitoring tho on t2 and t3</b>

a@ORA920>
a@ORA920> begin dbms_stats.gather_schema_stats(
  2       ownname          => user,
  3       options          => 'GATHER AUTO',
  4       estimate_percent => dbms_stats.auto_sample_size,
  5       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
  6       degree           => 3
  7     );
  8  end;
  9  /
 
PL/SQL procedure successfully completed.
 
a@ORA920>
a@ORA920> select table_name, monitoring, num_rows from user_tables;
 
TABLE_NAME                     MON   NUM_ROWS
------------------------------ --- ----------
T1                             YES          0
T2                             NO           0
T3                             NO           0

<b>shows it picks up all tables -- but no monitoring</b>
 
a@ORA920> exec dbms_stats.ALTER_SCHEMA_TAB_MONITORING('A',TRUE);
 
PL/SQL procedure successfully completed.
 
a@ORA920> select table_name, monitoring, num_rows from user_tables;
 
TABLE_NAME                     MON   NUM_ROWS
------------------------------ --- ----------
T1                             YES          0
T2                             YES          0
T3                             YES          0
 
 

new tables - cleared

A reader, September 18, 2003 - 1:47 pm UTC

Tom, great illustration. Thanks for your help.
Sravan

Splendid info!

horace, October 28, 2003 - 6:53 am UTC

Thanks for the excellent tutorial

We have tables where all the columns are indexed. We use dbms_stats with cascade=true to analyze the tables and indexes.

The default for method_opt is 'for all columns size 1' with the other options being.....

o for all columns
o for all indexed columns
o for all hidden columns
o for columns <list>

Is there a way of analyzing the table and indexes without gathering histogram stats using dbms_stats, i.e., the equivalent of 'analyze table table_name estimate statistics;'

Tom Kyte
October 28, 2003 - 8:03 am UTC

method_opt => null
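So the full call would look something like this sketch (table name T is illustrative):

exec dbms_stats.gather_table_stats( user, 'T', method_opt => NULL, cascade => TRUE );

cascade => TRUE still gets the indexes done; the NULL method_opt skips the histogram gathering.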



Dillip, October 29, 2003 - 7:52 pm UTC

Hi Tom,

If I collect the stats using the following command, is there anything missing for the CBO to generate a good execution plan? Or, what is the best command to collect the stats of a table (including indexes and histograms) so that the optimizer gets all the information it needs?

execute dbms_stats.gather_table_stats(ownname => 'TEST',
tabname => 'TEST',
method_opt => 'FOR ALL INDEXED COLUMNS',
granularity => 'ALL',
cascade => TRUE,
degree => DBMS_STATS.DEFAULT_DEGREE)

Tom Kyte
October 30, 2003 - 7:01 am UTC


that is a good "general purpose" one. many times the histograms are in fact not necessary -- that depends on your data and queries.

DBMS_STATS was very useful for me..Thanks

M S Reddy, November 07, 2003 - 10:16 am UTC

Tom
what happens if we run an explain plan while dbms_stats is gathering a table's statistics? i can see that the plan changes even before the procedure is completed. does this do any harm or affect my gathered statistics on that table? what happens if i run a simple select count(*) on that table while we are gathering stats?
i expect that this should not cause any harm, but just wanted to confirm it with you.
Thanks,
M S Reddy.

Tom Kyte
November 07, 2003 - 10:42 am UTC

stats are consulted to generate the plan, then they are not used. so, it is really only during parse that anything happens here -- the parse will read whatever is in the data dictionary at that point in time.

no harm.

SKEWONLY usage

A reader, December 16, 2003 - 5:05 pm UTC

Hi Tom, recently I received a tip from Burleston;
at the end it shows:

--*******************************************************
-- SKEWONLY option - Detailed analysis
--
-- Use this method for a first-time analysis
-- for skewed indexes
-- This runs a long time because all indexes are examined
--*******************************************************

begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size skewonly',
degree => 7
);
end;
/

My question is: if I'm using bind variables and gather that kind of statistics, in theory I'm not going to benefit from the histogram statistics. Or am I wrong, and this information helps the optimizer in some way?

Tom Kyte
December 16, 2003 - 6:51 pm UTC

there is bind variable peeking that happens

there will be times you have constants.

so yes, they can and do come into play.

and it is funny his comment says "indexes" -- but this call doesn't even gather stats on indexes; cascade defaults to false -- the for all columns is about the table columns, not indexed columns... it is not looking at just indexed columns at all (which is OK, just the comment is muffed up)

using SIZE SKEWONLY means we collect histogram data in memory for all specified columns (if you do not specify any, all columns are used). Once an "in-memory" histogram is computed for a column, it is stored inside the data dictionary only if it has "popular" values (multiple end points with the same value, which is what we mean by "there is skew in the data")

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5792247321358
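A per-table sketch of the same idea, plus a quick way to see what SKEWONLY decided to keep (table name T is illustrative):

exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size skewonly' );

select column_name, count(*)
  from user_tab_histograms
 where table_name = 'T'
 group by column_name;

roughly speaking, columns left with just the two default end points were judged "not skewed"; columns with more buckets had popular values.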





More information

A reader, December 17, 2003 - 9:14 am UTC

Thanks Tom
Sorry, I should have given you the link. This is the tip:
http://www.dba-oracle.com/oracle_tips_skewonly.htm

Something funny is that Burleston has on this page
http://www.dba-oracle.com/articles.htm#tips
tips dated until January 24, 2004 ;)

Let me clarify my question. Suppose I only use bind variables (select a from b where c=:1), not constants (select a from b where c='hi'). As I understand it -- maybe wrongly -- the additional information gathered with skewonly still helps the optimizer. Am I right or not?

Or, to put the question another way: if I don't use constant values in the where clauses, which are required to get the advantage of histograms, will I benefit from using skewonly as explained in the article?

Thanks Tom :)


Tom Kyte
December 18, 2003 - 8:26 am UTC

BIND VARIABLE PEEKING -- yes, histograms are relevant even with binds.







A reader, December 18, 2003 - 12:38 pm UTC

Thanks a lot tom :)

a quote

A reader, December 18, 2003 - 12:55 pm UTC

If you don't have any idea what bind variable peeking is:

"In summary, bind variable peeking in Oracle 9i does not help us when we need different execution plans for different values of a bind variable. Bind variable peeking allows the optimizer to take the value of a bind variable into account when first generating an execution plan for a SQL statement. If we know that the data distribution is heavily skewed and that future executions of the SQL statement may involve different data distributions for the bind variable value, we must still resort to using literals to get the best performance."

from here
http://www.dbspecialists.com/specialists/specialist2003-11.html
:)

Dillip, December 29, 2003 - 2:23 pm UTC

Tom,

With the following command, the column statistics (NUM_DISTINCT, NUM_NULLS and DENSITY) for the non-indexed columns are not collected. Is there a way to get the column statistics for all the columns and histograms for the indexed columns in a single dbms_stats command?

execute dbms_stats.gather_table_stats(ownname => 'TEST', tabname => 'TEST', method_opt => 'FOR ALL INDEXED COLUMNS', granularity => 'ALL', cascade => TRUE, degree => DBMS_STATS.DEFAULT_DEGREE)

Tom Kyte
December 29, 2003 - 2:53 pm UTC

those are gathered as part of histogram processing -- column statistics & histograms are sort of synonymous here.

Dillip, December 30, 2003 - 2:01 pm UTC

I am not clear how they are synonymous. Can you please explain? Please see the test case I have provided: I do not see any stats for the non-indexed columns, as the method of stat collection is only indexed columns. When I collect stats using the analyze command, I see stats for all the columns of the table, and then I collect histograms for the indexed columns separately.

SQL> create table test as select * from dba_objects;

Table created.

SQL> create index test_idx on test(owner, object_name);

Index created.

SQL> create index test_idx1 on test(created);

Index created.

SQL> execute dbms_stats.gather_table_stats(ownname => 'ORAPERF', tabname => 'TEST', method_opt => 'FOR ALL INDEXED COLUMNS', granularity => 'ALL', cascade => TRUE, degree => DBMS_STATS.DEFAULT_DEGREE);

PL/SQL procedure successfully completed.

SQL> @stat test

Table ....

TABNAME              LAST_ANALYZED         NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
-------------------- ------------------- ---------- ---------- ----------- -----------
TEST                 12-30-2003 18:55:36      23429        301         100       23429

Columns(s) ....

COLUMN_NAME                    LAST_ANALYZED       NUM_DISTINCT  NUM_NULLS         DENSITY NUM_BUCKETS
------------------------------ ------------------- ------------ ---------- --------------- -----------
OWNER                          12-30-2003 18:55:36            5          0       .00002134           4
OBJECT_NAME                    12-30-2003 18:55:36        22055          0       .00004851          75
SUBOBJECT_NAME
OBJECT_ID
DATA_OBJECT_ID
OBJECT_TYPE
CREATED                        12-30-2003 18:55:36         3257          0       .00050874          75
LAST_DDL_TIME
TIMESTAMP
STATUS
TEMPORARY
GENERATED
SECONDARY

Index(s) ....

INDNAME            LAST_ANALYZED            LFBLK      DKEYS    ALVLBLK    AVGDBLK         CF     BLEVEL
------------------ ------------------- ---------- ---------- ---------- ---------- ---------- ----------
TEST_IDX           12-30-2003 18:55:41        113      23062          1          1        414          1
                                                                      1          1
TEST_IDX1          12-30-2003 18:55:42         63       3257          1          1       3897          1

 

Tom Kyte
December 30, 2003 - 2:18 pm UTC

you don't show your analyze, but analyze by default gets histograms:

ops$tkyte@ORA9IR2> create table t as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select count(*) from user_tab_histograms where table_name = 'T';
 
  COUNT(*)
----------
        16


so, your "collection of them after the fact" isn't useful.  they are already there, for all columns.




 

Dillip, December 30, 2003 - 4:00 pm UTC

Please see this: the two entries with end points 0 and 1 are collected by default.

Please see below and comment.

SQL> analyze table test delete statistics;

Table analyzed.

SQL> analyze table test compute statistics;

Table analyzed.

SQL> @stat test

Table ....

TABNAME              LAST_ANALYZED         NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
-------------------- ------------------- ---------- ---------- ----------- -----------
TEST                 12-30-2003 20:57:31      23429        301          90       23429

Columns(s) ....

COLUMN_NAME                    LAST_ANALYZED       NUM_DISTINCT  NUM_NULLS         DENSITY NUM
------------------------------ ------------------- ------------ ---------- --------------- ---
OWNER                          12-30-2003 20:57:31            5          0       .20000000
OBJECT_NAME                    12-30-2003 20:57:31        22055          0       .00004534
SUBOBJECT_NAME                 12-30-2003 20:57:31            2      23378       .50000000
OBJECT_ID                      12-30-2003 20:57:31        23429          0       .00004268
DATA_OBJECT_ID                 12-30-2003 20:57:31        18441       4952       .00005423
OBJECT_TYPE                    12-30-2003 20:57:31           22          0       .04545455
CREATED                        12-30-2003 20:57:31         3257          0       .00030703
LAST_DDL_TIME                  12-30-2003 20:57:31         2110          0       .00047393
TIMESTAMP                      12-30-2003 20:57:31         3471          0       .00028810
STATUS                         12-30-2003 20:57:31            1          0      1.00000000
TEMPORARY                      12-30-2003 20:57:31            2          0       .50000000
GENERATED                      12-30-2003 20:57:31            2          0       .50000000
SECONDARY                      12-30-2003 20:57:31            1          0      1.00000000

Index(s) ....

INDNAME            LAST_ANALYZED            LFBLK      DKEYS    ALVLBLK    AVGDBLK         CF
------------------ ------------------- ---------- ---------- ---------- ---------- ----------
TEST_IDX           12-30-2003 20:57:31        113      23062          1          1        305
                                                                      1          1
TEST_IDX1          12-30-2003 20:57:31         63       3257          1          1       3897
SQL> @hist test.object_id

Histograms ....

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              2
              1         187780
SQL> analyze table test compute statistics for columns object_id;

Table analyzed.

SQL> @hist test.object_id

Histograms ....

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              2
              1            325
              2            652
              3            967
              4           1280
              5           1593
              6           1906
              7           2221
              8           2534
              9           2847
             10           3167
             11           3489
             12           3823
             13           4144
             14           4916
             15           5249
             16           5569
             17           5882
             18           7895
             19           8213
             20           8526
             21           8839
             22           9156
             23          14043
             24          14501
             25          14875
             26          15206
             27          15576
             28          15913
             29          16234
             30          16549
             31          16881
             32          17360
             33          17744
             34          18147
             35          18546
             36          19003
             37          19395
             38          19726
             39          20053
             40          20373
             41          20808
             42          21160
             43          21492
             44          21867
             45          22207
             46          22642
             47          23052
             48          23522
             49          23972
             50          52147
             51          82430
             52         102176
             53         109848
             54         128984
             55         141920
             56         181841
             57         182154
             58         182467
             59         182780
             60         183093
             61         183406
             62         183719
             63         184032
             64         184345
             65         184658
             66         184971
             67         185284
             68         185597
             69         185910
             70         186223
             71         186536
             72         186849
             73         187162
             74         187475
             75         187780

 

Tom Kyte
December 30, 2003 - 4:55 pm UTC

so -- you HAD histograms, the analyze command gathered histograms for all columns. dbms_stats can do that as well -- but that goes against your stated desire, hence my point that "you cannot get there from here -- to get the data you are asking for requires histograms!"

Dillip, December 30, 2003 - 5:44 pm UTC

Let me rephrase my question then.

I want to collect default histograms (two values) for all columns, and histograms for all indexed columns (75 buckets).

Can you let me know the dbms_stats command to do that ?

Tom Kyte
December 30, 2003 - 6:19 pm UTC

it takes two of them.  just like with analyze. 

ops$tkyte@ORA9IR2> create table t as select * from scott.emp;
 
Table created.
 
ops$tkyte@ORA9IR2> create index empno_idx on t(empno);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2><b>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', cascade=>TRUE );</b>
 
PL/SQL procedure successfully completed.

<b>that is just like "analyze table compute statistics"</b>
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select column_name, num_distinct
  2    from user_tab_columns
  3   where table_name = 'T'
  4  /
 
COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
EMPNO                                    14
ENAME                                    14
JOB                                       5
MGR                                       6
HIREDATE                                 13
SAL                                      12
COMM                                      4
DEPTNO                                    3
 
8 rows selected.
 
ops$tkyte@ORA9IR2> select column_name, count(*)
  2    from user_tab_histograms
  3   where table_name = 'T'
  4   group by column_name
  5  /
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
COMM                                    2
DEPTNO                                  2
EMPNO                                   2
ENAME                                   2
HIREDATE                                2
JOB                                     2
MGR                                     2
SAL                                     2
 
8 rows selected.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2><b>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all indexed columns size 75' );</b>
 
PL/SQL procedure successfully completed.
 
<b>and those are your "special" histograms</b>


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select column_name, num_distinct
  2    from user_tab_columns
  3   where table_name = 'T'
  4  /
 
COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
EMPNO                                    14
ENAME                                    14
JOB                                       5
MGR                                       6
HIREDATE                                 13
SAL                                      12
COMM                                      4
DEPTNO                                    3
 
8 rows selected.
 
ops$tkyte@ORA9IR2> select column_name, count(*)
  2    from user_tab_histograms
  3   where table_name = 'T'
  4   group by column_name
  5  /
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
COMM                                    2
DEPTNO                                  2
EMPNO                                  14
ENAME                                   2
HIREDATE                                2
JOB                                     2
MGR                                     2
SAL                                     2
 
8 rows selected.
 
 

Dillip, December 30, 2003 - 7:18 pm UTC

Tom, Thanks a lot. So there is no single command to do this
in dbms_stats (that was my initial question).

Thanks again for the clarification.

Dillip, January 23, 2004 - 12:41 am UTC

Tom,

I have one more question: if I use these two commands, will the second command collect the table stats again?
When I test, I see the LAST_ANALYZED timestamp changing, but I am not sure if it is collecting the stats again. If it is going to collect the table stats again, is there a way not to, without using dbms_stats.gather_index_stats?

(1)
execute dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'TEST');

(2)
execute dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'TEST', method_opt => 'FOR ALL INDEXED COLUMNS', cascade => TRUE);


Tom Kyte
January 23, 2004 - 7:26 am UTC

table stats are always gathered with dbms_stats.gather_table_stats

it can additionally gather histograms and indexes.


you cannot use "gather table stats" (which by its very name says something!) without gathering table stats. if you only want index stats, well, you'd use that api call.

pl/sql numeric error

A reader, February 17, 2004 - 9:26 am UTC

Hi

I have the following procedure (part of a package):

PROCEDURE ANALYZE_DAILY
AS
  type array is table of varchar2(256);
  l_schema        array := array('LSC', 'PERFSTAT');
  l_weekday       number;
  l_degree        number := 1;
  l_percent       number := 20;
  l_table_name    array;
  l_columns       array;
  l_monitoring    array;
  l_partitioned   array;
  l_last_analyzed array;
  l_objlist       dbms_stats.objectTab;
  l_granularity   varchar2(10) := 'DEFAULT';
  l_method        varchar2(50) := 'FOR ALL INDEXED COLUMNS SIZE 75';
BEGIN
  execute immediate 'alter session set nls_territory = ''SPAIN''';
  execute immediate 'alter session set sort_area_size = 33554432';
  l_weekday := to_char(sysdate, 'D');
  if l_weekday in (4) then
    for i in 1 .. l_schema.count
    loop
      select a.table_name, b.columns, a.monitoring, a.partitioned, a.last_analyzed
        bulk collect into l_table_name, l_columns, l_monitoring, l_partitioned, l_last_analyzed
        from dba_tables a, dbstat_tab b
       where a.owner = b.owner(+)
         and a.table_name = b.table_name(+)
         and a.owner = l_schema(i);
      dbms_stats.gather_schema_stats(OWNNAME => l_schema(i),
                                     OPTIONS => 'LIST STALE',
                                     OBJLIST => l_objlist);
      for x in l_table_name.first..l_table_name.last
      loop
        if l_monitoring(x) = 'NO' then
          execute immediate 'alter table '||l_schema(i)||'.'||l_table_name(x)||' monitoring';
        end if;
        if l_last_analyzed(x) is NULL then
          analyze_table(p_owner       => l_schema(i),
                        p_table       => l_table_name(x),
                        p_percent     => l_percent,
                        p_method      => 'FOR COLUMNS '||l_columns(i)||' SIZE 75',
                        p_granularity => l_granularity,
                        p_degree      => l_degree);
        end if;
      end loop;
      for y in 1..l_objlist.count
      loop
        analyze_table(p_owner       => l_schema(i),
                      p_table       => l_objlist(y).objname,
                      p_percent     => l_percent,
                      p_method      => 'FOR COLUMNS '||l_columns(i)||' SIZE 75',
                      p_granularity => l_granularity,
                      p_degree      => l_degree);
      end loop;
    end loop;
  elsif l_weekday = 7 then
    for i in 1 .. l_schema.count
    loop
      analyze_schema(p_owner       => l_schema(i),
                     p_percent     => l_percent,
                     p_method      => l_method,
                     p_granularity => l_granularity,
                     p_degree      => l_degree);
    end loop;
  else
    null;
  end if;
END ANALYZE_DAILY;

it works well, however if I change

for y in 1..l_objlist.count

to

for y in l_objlist.first..l_objlist.last

I get an error saying

exec pkg_dbstat.analyze_daily;
BEGIN pkg_dbstat.analyze_daily; END;

*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at "SYSTEM.PKG_DBSTAT", line 112
ORA-06512: at line 1

However, it does what I need: it analyzes the tables.

What is the difference between

for y in 1..l_objlist.count
and
for y in l_objlist.first..l_objlist.last


???

Tom Kyte
February 17, 2004 - 10:06 am UTC

if l_objlist is "empty"

- count will be 0
- but first and last will be null

ops$tkyte@ORA920PC> begin
  2    dbms_output.put_line( 'before loop' );
  3    for i in 1 .. 0
  4    loop
  5          dbms_output.put_line( 'in loop' );
  6    end loop;
  7    dbms_output.put_line( 'after loop' );
  8  end;
  9  /
before loop
after loop
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> begin
  2    dbms_output.put_line( 'before loop' );
  3    for i in null..null
  4    loop
  5          dbms_output.put_line( 'in loop' );
  6    end loop;
  7    dbms_output.put_line( 'after loop' );
  8  end;
  9  /
before loop
begin
*
ERROR at line 1:
ORA-06502: PL/SQL: numeric or value error
ORA-06512: at line 3


You cannot "loop" with NULL -- that is the error.
 

reg table monitoring

sam, March 19, 2004 - 1:00 pm UTC

Tom,

How much overhead does table monitoring put on a table?

Is it advisable to have table monitoring on, both on OLTP and DSS systems?

Or, if we know that on our system an order entry table has lots of inserts/updates etc., should we not monitor that one (just gather stats by default, since we know it changes a lot -- maybe 10%) and monitor only the other tables where the changes are random?
Is that a good approach?




Tom Kyte
March 19, 2004 - 2:06 pm UTC

pretty much nil overhead.

in 10g it'll actually be turned on for you pretty much.

Delete Stats

A reader, March 31, 2004 - 9:31 am UTC

In what cases should I delete statistics ? What is it useful for ?

Tom Kyte
March 31, 2004 - 9:46 am UTC

you generally never need it. it's there for completeness. I use it for testing (get rid of existing stats, gather in a different fashion).
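If you do want to do it via dbms_stats rather than analyze, it would be a call like this (table name T is illustrative):

exec dbms_stats.delete_table_stats( user, 'T' );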

confused at analyze

Jerry, April 01, 2004 - 9:32 pm UTC

Tom,

This site is really a wonderland. Thank you so much for your hard work.

A follow up on the analyze question. I found that different analyze options can bring totally different results. In my company we run analyze like this:

exec dbms_stats.gather_table_stats (ownname=>'XYZ',tabname=> 'EMPLOYEES',estimate_percent=>25,method_opt=> 'FOR ALL COLUMNS',degree=>4,cascade=>TRUE );

It runs for all user tables.
My first question is why we need 'FOR ALL COLUMNS'. At most, 'for all indexed columns' should be enough, right?

Also, if I run the analyze using 'for all columns' or 'for all indexed columns' in one database, one important query runs really slow because the execution plan does not utilize a certain index. If I delete the stats, then run the analyze without the method_opt, just

exec dbms_stats.gather_table_stats (ownname=>'XYZ',tabname=> 'EMPLOYEES',estimate_percent=>25,degree=>4,cascade=>TRUE );

Then the query uses the index and comes back fast.

However, the same analyze options tested in another database with exactly the same tables/views but different data showed no difference in results.

I looked at the query. It's actually fairly simple: one subquery, then an outer join. I don't think there is any way to rewrite it to make it simpler.

Is the analyze method the issue here? How should I go about diagnosing this type of issue?

Thank you!


Tom Kyte
April 02, 2004 - 9:52 am UTC

it doesn't run for all user tables -- it should only run for one?

you'd have to tell ME why you need or want for all columns. In a DW, it can be relevant. In a transactional system -- probably not.

that dbms_stats without the method opt is using for 'all columns size 1' by default.

I'd need an example to look at.

Jerry, April 02, 2004 - 1:58 pm UTC

This is an internal application we built. Because we didn't know what the data would be like, there are lots of indexes built -- on all foreign keys and all id columns. Many columns have their own index and are also part of another index. Also, the analyze is scripted, and histogram info is collected for all indexed columns.

The application is primarily OLTP but also hosts some DW types of queries, sort of hybrid. Ugly but for history and cost reason, not much can be done.

As for the query, please see below:
sql>analyze table MAPPING delete statistics;

Table analyzed.

sql>exec dbms_stats.gather_table_stats (ownname=>'TMP',tabname=>'MAPPING',estimate_percent=>25,method_opt=>'FOR ALL COLUMNS',degree=>4,cascade=>TRUE );

PL/SQL procedure successfully completed.

sql>explain plan for
2 select * from shared;

Explained.

sql>@utlxpls

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 103K| 85M| | 177K|
| 1 | VIEW | SHARED | 103K| 85M| | 177K|
| 2 | SORT UNIQUE | | 103K| 12M| 27M| 177K|
|* 3 | HASH JOIN | | 103K| 12M| | 175K|
| 4 | VIEW | | 1285 | 7710 | | 161 |
|* 5 | FILTER | | | | | |
| 6 | SORT GROUP BY | | 1285 | 12850 | | 161 |
|* 7 | INDEX FAST FULL SCAN| DEVICE_UNIQ | 49913 | 487K| | 16 |
| 8 | NESTED LOOPS OUTER | | 87717 | 10M| | 175K|
| 9 | TABLE ACCESS FULL | ALLOCATION | 87717 | 6852K| | 240 |
|* 10 | VIEW PUSHED PREDICATE | | 1 | 43 | | 2 |
|* 11 | HASH JOIN | | 190 | 8170 | 1736K| 104 |
| 12 | TABLE ACCESS FULL | HOSTDEVICE | 41204 | 1247K| | 74 |
|* 13 | INDEX RANGE SCAN | MAPPING_IDX1 | 190 | 2280 | | 2 |
---------------------------------------------------------------------------------------------------


31 rows selected.


sql>analyze table MAPPING delete statistics;

Table analyzed.

sql>exec dbms_stats.gather_table_stats (ownname=>'TMP',tabname=>'MAPPING',estimate_percent=>25,degree=>4,cascade=>TRUE );

PL/SQL procedure successfully completed.


sql>delete from plan_table;

14 rows deleted.

sql>commit;

Commit complete.

sql>explain plan for
2 select * from shared;

Explained.

sql>@utlxpls

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 103K| 85M| | 177K|
| 1 | VIEW | SHARED | 103K| 85M| | 177K|
| 2 | SORT UNIQUE | | 103K| 12M| 27M| 177K|
|* 3 | HASH JOIN | | 103K| 12M| | 175K|
| 4 | VIEW | | 1285 | 7710 | | 161 |
|* 5 | FILTER | | | | | |
| 6 | SORT GROUP BY | | 1285 | 12850 | | 161 |
|* 7 | INDEX FAST FULL SCAN | DEVICE_UNIQ | 49913 | 487K| | 16 |
| 8 | NESTED LOOPS OUTER | | 87717 | 10M| | 175K|
| 9 | TABLE ACCESS FULL | ALLOCATION | 87717 | 6852K| | 240 |
|* 10 | VIEW PUSHED PREDICATE | | 1 | 43 | | 2 |
| 11 | NESTED LOOPS | | 12 | 516 | | 14 |
|* 12 | INDEX RANGE SCAN | MAPPING_IDX1 | 12 | 144 | | 2 |
| 13 | TABLE ACCESS BY INDEX ROWID| HOSTDEVICE | 1 | 31 | | 1 |
|* 14 | INDEX UNIQUE SCAN | PK_HOSTDEVICE | 1 | | | |
----------------------------------------------------------------------------------------------------

32 rows selected.


The second query runs very fast. The first one did not come back after 4-5 hours. There is no data change in this static testing environment. It seems that Oracle picks the wrong execution plan when given too much information. The mapping table is all id columns, joining several tables together.


Statistics for database.

Kashif, April 20, 2004 - 11:22 am UTC

Hi Tom,

You've mentioned in some earlier posts that you prefer gathering stats on a schema-by-schema basis instead of on the entire database. I've currently scheduled statistics to be gathered for the entire database for only stale objects. So the command I use is similar to the following:

dbms_stats.gather_database_stats ( method_opt => 'for all indexed columns',
granularity => 'DEFAULT',
cascade => TRUE,
options => 'GATHER STALE',
objlist => StaleObjectsArray );

No chance there of stats being generated inadvertently on the SYS or SYSTEM or any other schema, unless MONITORING is turned on for each of the individual tables. We have a DW here, the load of the data into the DW happens at the beginning of the month, and the schemas I have turned monitoring on for all get loaded at the same time, i.e. at the beginning of the month. I've scheduled the stats to be gathered at the middle and the end of the month.

I like the ability to turn MONITORING on for any table in the database and be done with it, instead of having to schedule the individual schemas, although admittedly we don't really bring schemas online frequently at all (we have two schemas being monitored at the moment). Is this a feasible scenario for gathering stats at the database level? Any insight will be useful. Thanks.

Kashif

P.S. In asking you the question, I sort of reached my own conclusion, i.e. just gather the stats per schema, but I'm still curious to know your thoughts.

Tom Kyte
April 20, 2004 - 11:26 am UTC

everything is feasible - i look at schemas as "independent entities with their own schedules" -- IN GENERAL.

in your unique set of circumstances, this is not the case. so, what you are doing makes sense for you.

HIDDEN COLUMNS

tHIRU, May 12, 2004 - 12:01 pm UTC

Tom,

Pasting your earlier comments on hidden columns:

o for all hidden columns - gather histograms on columns you cannot see, useful if using the object relational features which incorporate tons of hidden columns in your objects.

We make heavy use of the object relational features. Is it ok just to go with
exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE => TRUE,
METHOD_OPT => 'FOR ALL HIDDEN COLUMNS');

AND THEN REFRESH WITH:

exec DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => NULL, TABNAME => 'T1Tab', CASCADE=> FALSE,
METHOD_OPT => 'FOR ALL HIDDEN COLUMNS');

WILL THIS TAKE CARE OF ALL INDEXED COLUMNS TOO?



Tom Kyte
May 13, 2004 - 8:48 am UTC

Looks like I got that one wrong; researching it further, it seems to seek out not object hidden columns, but rather the hidden virtual columns from function-based indexes:

ops$tkyte@ORA9IR2> create table t ( x int, y int );
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t
  2  select rownum, rownum/2 from all_users;
 
43 rows created.
 
ops$tkyte@ORA9IR2> create index t_idx on t( (x+y)/2 );
 
Index created.
 
ops$tkyte@ORA9IR2> create index t_idx2 on t(x);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
  2    DBMS_STATS.GATHER_TABLE_STATS
  3    (OWNNAME=> NULL,
  4     TABNAME => 'T',
  5     CASCADE => TRUE,
  6     METHOD_OPT => 'FOR ALL hidden COLUMNS');
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      43
 
ops$tkyte@ORA9IR2> select table_name, column_name, count(*) from user_tab_histograms
  2  where table_name = 'T' group by table_name, column_name;
 
TABLE_NAME                     COLUMN_NAME    COUNT(*)
------------------------------ ------------ ----------
T                              SYS_NC00003$         43
 
ops$tkyte@ORA9IR2> select table_name, index_name, num_rows from user_indexes where table_name = 'T';
 
TABLE_NAME                     INDEX_NAME                       NUM_ROWS
------------------------------ ------------------------------ ----------
T                              T_IDX                                  43
T                              T_IDX2                                 43
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.delete_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
  2    DBMS_STATS.GATHER_TABLE_STATS
  3    (OWNNAME=> NULL,
  4     TABNAME => 'T',
  5     CASCADE => TRUE,
  6     METHOD_OPT => 'FOR ALL indexed COLUMNS');
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T                                      43
 
ops$tkyte@ORA9IR2> select table_name, column_name, count(*) from user_tab_histograms
  2  where table_name = 'T' group by table_name, column_name;
 
TABLE_NAME                     COLUMN_NAME    COUNT(*)
------------------------------ ------------ ----------
T                              X                    43
T                              SYS_NC00003$         43
 
ops$tkyte@ORA9IR2> select table_name, index_name, num_rows from user_indexes where table_name = 'T';
 
TABLE_NAME                     INDEX_NAME                       NUM_ROWS
------------------------------ ------------------------------ ----------
T                              T_IDX                                  43
T                              T_IDX2                                 43


but -- why would you want to stop analyzing the indexes? 

A reader, May 13, 2004 - 10:07 am UTC


The refresh option of CASCADE => FALSE I got from Metalink, so my guess is it speeds up the refresh process. So you recommend that I give CASCADE => TRUE even the second time, and "for all indexed columns" -- right, Tom? One more thing, just to know that I am doing it right: every day in the morning the tables get loaded fresh through a batch job. I have planned to do the initial "gather_stats" after the load and then the refresh in the afternoon when the database is hot. Is this a suggested way, or are there any other tips for doing this? Also, will the users see a performance degradation while the stats are being refreshed? Thanks.

Tom Kyte
May 13, 2004 - 10:55 am UTC

I

o analyze tables first time after creation
o alter them to be monitoring
o gather STALE after that (this is what 10g automates actually)


in that fashion, only objects that need to be regathered will be (stale).
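As a sketch of that recipe (table and schema names are illustrative):

-- once, right after creating/loading the table:
exec dbms_stats.gather_table_stats( user, 'T', cascade => TRUE );
alter table t monitoring;

-- thereafter, the scheduled job only regathers what is stale:
begin
    dbms_stats.gather_schema_stats( ownname => user,
                                    options => 'GATHER STALE',
                                    cascade => TRUE );
end;
/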


do you change the data so dramatically during the morning/early afternoon that you would need to refresh stats all over again? (they will be like a big batch job, that is the impact they'll have -- that and when they complete -- they will by default invalidate cursors so you'll have a hard parse bump as well)

thiru

A reader, May 13, 2004 - 11:29 am UTC

Yes, the data changes, though the number of records doesn't jump up much. I will follow your steps to implement stats. Is dbms_stats.ALTER_SCHEMA_TAB_MONITORING the right procedure to use instead of doing it for each table? Almost all the tables are built on object types. Also, if you could highlight the steps to gather STALE, it would help immensely.

Tom Kyte
May 13, 2004 - 2:10 pm UTC

you can use that procedure or just alter tables, whichever you like better.

a gather stale example:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:1154434873552#8727886618633

Thiru, May 13, 2004 - 2:43 pm UTC

Thanks for the input. One last thing:

a. During "GATHER STALE" which option is better ->"FOR ALL COLUMNS" or
"FOR ALL INDEXED COLUMNS" or "FOR ALL COLUMNS SIZE 1" . How to determine
the option to be used.
DECLARE jobno number;
BEGIN
DBMS_JOB.SUBMIT(job => jobno,
what =>'DBMS_STATS.GATHER_SCHEMA_STATS (''TEST_USER'',
NULL,FALSE,''FOR ALL COLUMNS'',NULL,''DEFAULT'',TRUE,NULL,NULL,''GATHER STALE'',''LIST'' ); ',
next_date => SYSDATE, interval => 'Sysdate + 1/24');
END;


b. Will the cursors be invalidated for tables that are refreshed with "GATHER STALE"? In that case, the first few queries would take longer than before, right? Any way to avoid this?

c. Are there any performance issues while these statistics jobs run during heavy load?

I am reading your book "Effective Oracle..." It is excellent.

Tom Kyte
May 13, 2004 - 3:32 pm UTC

a) use what you used in the first place? on the initial gather? or just use "repeat"

b) after you gather stats, plans need to be re-evaluated, so sure.

to avoid it, don't gather stats, don't generate new plans :)

there is a no_invalidate parameter, but if you don't plan on creating new plans, why bother gathering?
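A sketch of what using it would look like (table name T is illustrative):

exec dbms_stats.gather_table_stats( user, 'T', cascade => TRUE, no_invalidate => TRUE );

existing cursors keep their current plans until they age out naturally; only new hard parses see the new statistics.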

is it only for statistics gathering?

Raaghid, May 15, 2004 - 8:06 am UTC

Is analyze used only for statistics gathering, or will it also increase the performance of a query if the tables used in the query are analyzed? In a nutshell, what are the uses of analyze?

Tom Kyte
May 15, 2004 - 12:42 pm UTC

Please see the admin and performance guides.

analyze is for gathering statistics and/or validating structures.

it is predominantly to be used for the latter.



A reader, May 31, 2004 - 3:40 pm UTC

1) It's not clear to me in the docs, but I see in your example that the default for "analyze table t compute/estimate statistics" is:

for table
for all columns
for all indexed columns

right ?

2) What is the default sample for the estimate clause ?

Tom Kyte
May 31, 2004 - 3:59 pm UTC

1) yes.

2)
....
SAMPLE Specify the amount of data from the analyzed object Oracle should sample to estimate statistics. If you omit this parameter, then Oracle samples 1064 rows.
......

analyze_schema gives better plan than gather_schema_stats ??

A reader, June 03, 2004 - 6:22 pm UTC

Hi Tom,

I'm in the process of replacing analyze_schema scripts with gather_schema_stats.

Earlier someone stated that they saw a more efficient plan with 'analyze' than with gather_schema_stats. I'm observing something similar, but I will let your expertise decide which plan is "better", owing to the fact that I know very little about the CBO.

Why does gather_schema_stats result in full table scans? Is it because the tables are pretty small? Although the request table does have 107K rows.

Thanks so much, as always.

Here are the facts:

$ cat j1.sql

select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
  select r.REQ_ID as REQ_ID,
  r.last_updated as LAST_UPDATED,
  r.REQ_STATUS as REQ_STATUS,
  dc.ACTION_DESC as ACTION_DESC,
  dc.CMD_DESC as CMD_DESC,
  dc.GUI_CMD_DESC as GUI_CMD_DESC,
 dc.oma as OMA,
  u.first_name||' '||u.last_name as USER_NAME
  from REQ r, DEV_COM dc, users u
  where r.dev_id = 6
  and r.cmd_id = dc.cmd_id
  and r.user_id = u.user_id
  order by 1 desc
  )
  )
where ROW_NUM between 1 and 5
;

**************************************************************

SQL> select count(*) from users;
  COUNT(*)
----------
        29

SQL> select count(*) from dev_com;

  COUNT(*)
----------
       226

SQL> select count(*) from req;

  COUNT(*)
----------
    107520

**************************************************************

exec dbms_stats.delete_schema_stats(user);

exec dbms_stats.gather_schema_stats (ownname=>USER, estimate_percent=>dbms_stats
.auto_sample_size, cascade=>TRUE, method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO
');

select count(*) from user_tab_histograms where table_name='REQ';
  COUNT(*)
----------
       349


SQL> @j1
Elapsed: 00:00:04.92

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1710 Card=45723 Byte
          s=6767004)

   1    0   VIEW (Cost=1710 Card=45723 Bytes=6767004)
   2    1     COUNT
   3    2       VIEW (Cost=1710 Card=45723 Bytes=6172605)
   4    3         SORT (ORDER BY) (Cost=1710 Card=45723 Bytes=6675558)
   5    4           HASH JOIN (Cost=675 Card=45723 Bytes=6675558)
   6    5             TABLE ACCESS (FULL) OF 'USERS' (Cost=10 Card=29
          Bytes=1102)

   7    5             HASH JOIN (Cost=664 Card=45723 Bytes=4938084)
   8    7               TABLE ACCESS (FULL) OF 'DEV_COM' (Cost=
          2 Card=226 Bytes=12430)

   9    7               TABLE ACCESS (FULL) OF 'REQ' (Cost=661 Car
          d=45723 Bytes=2423319)



exec dbms_stats.delete_schema_stats(user);


exec dbms_utility.analyze_schema(user,'compute');

select count(*) from user_tab_histograms where table_name='REQ';
  COUNT(*)
----------
       226


SQL> @j1
Elapsed: 00:00:01.89

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=188 Card=1991 Bytes=
          294668)

   1    0   VIEW (Cost=188 Card=1991 Bytes=294668)
   2    1     COUNT
   3    2       VIEW (Cost=188 Card=1991 Bytes=268785)
   4    3         SORT (ORDER BY) (Cost=188 Card=1991 Bytes=157289)
   5    4           HASH JOIN (Cost=159 Card=1991 Bytes=157289)
   6    5             TABLE ACCESS (FULL) OF 'DEV_COM' (Cost=2
          Card=226 Bytes=8814)

   7    5             HASH JOIN (Cost=156 Card=1991 Bytes=79640)
   8    7               TABLE ACCESS (FULL) OF 'USERS' (Cost=10 Card=2
          9 Bytes=377)

   9    7               TABLE ACCESS (BY INDEX ROWID) OF 'REQ' (Co
          st=145 Card=1991 Bytes=53757)

  10    9                 INDEX (RANGE SCAN) OF 'DUP_REQ_2' (NON-U
          NIQUE) (Cost=7 Card=1991)
 

Tom Kyte
June 03, 2004 - 7:37 pm UTC

always gotta have the tkprofs.

I would recommend always to write queries like that as:


select /*+ FIRST_ROWS */ *
  from ( select /*+ FIRST_ROWS */ a.*, rownum r
           from ( YOUR_FIRST_ROWS_QUERY_HERE ) a
          where rownum <= :max )
 where r >= :min;

using between is way sub-optimal. If you have an index on r(dev_id,req_id) -- this could be quite speedy indeed.
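As a sketch of that -- stripping the joins out for clarity, and with a made-up index name and bind names:

create index req_dev_req_idx on req(dev_id, req_id);

select *
  from ( select /*+ FIRST_ROWS */ a.*, rownum r
           from ( select req_id, last_updated, req_status
                    from req
                   where dev_id = 6
                   order by req_id desc ) a
          where rownum <= :max_row )
 where r >= :min_row;

with that index, the inner query can walk the index and stop after :max_row rows, instead of sorting all ~46,000 matching rows first.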


Strategy for Gather Stat.

Goh Seong Hin, June 03, 2004 - 11:42 pm UTC

Dear Tom,

   What is the recommended strategy for gathering statistics in an OLTP environment? One huge table (10 million records) is very active in DML (insert, update and delete) throughout the day, and the rest of the tables are not very active (probably 200-500 records updated/inserted/deleted per day).
Assume that we are on Oracle 8i, which does not have the auto-monitoring features to assist us. Based on your experience, what would be the best way to gather the statistics?

  How about the following strategy?
  ---------------------------------
  Huge table -> gather statistics with estimate 5% of data every 6 hours.
  The rest of the tables -> gather full statistics every Sunday night.

  What is the difference between analyze table x estimate statistics sample 5 rows and sample 10 rows, besides the number of rows inspected to gather the statistics? I tried to analyze a table called emp (containing 11 rows) with a 3-row sample and found that the system shows the correct total number of distinct values in dba_tab_columns, even though it did not go through the entire table. Is it recommended that we gather a certain percentage of rows for all of our tables (to shorten the gathering time)?

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
EMP_NO
EMP_NAME
DEPT_NO

SQL> analyze table emp estimate statistics sample 3 rows;

Table analyzed.

SQL> select column_name, num_distinct
  2    from dba_tab_columns
  3   where table_name = 'EMP'
  4  /


COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
EMP_NO                                   11
EMP_NAME                                 11
DEPT_NO                                   3 


     Thanks in advance.

Rgds
SHGoh

 

Tom Kyte
June 04, 2004 - 7:38 am UTC

8i has alter table T monitoring and GATHER STALE with dbms_stats.

use that to identify just the tables that need to be analyzed. 10,000,000 records without histograms won't take very long to compute -- only if you need histograms on many columns (doubtful, probably don't need any) would you look at estimate.

testing against really small things like that isn't "meaningful". 3 rows -- not a good test case.

Here is the tkprof from FTS ...

A reader, June 04, 2004 - 11:31 am UTC

Hi Tom,

This is the tkprof for the gather_schema_stats case, where the plan does a FTS on the REQ table:
============================================

TKPROF: Release 9.2.0.4.0 - Production on Fri Jun 4 10:35:16 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Trace file: itdb_ora_2508.trc
Sort options: default

********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

alter session set sql_trace = true


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.04 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.04 0 0 0 0

Misses in library cache during parse: 0
Misses in library cache during execute: 1
Optimizer goal: CHOOSE
Parsing user id: 433
********************************************************************************

select default$
from
col$ where rowid=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 0
Fetch 4 0.00 0.00 0 8 0 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.00 0.00 0 8 0 4

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY USER ROWID COL$ (cr=1 r=0 w=0 time=91 us)

********************************************************************************

select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#,
sample_size, minimum, maximum, distcnt, lowval, hival, density, col#,
spare1, spare2, avgcln
from
hist_head$ where obj#=:1 and intcol#=:2


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 44 0.00 0.00 0 0 0 0
Fetch 44 0.00 0.00 0 88 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 92 0.00 0.01 0 88 0 0

Misses in library cache during parse: 1
Optimizer goal: RULE
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
select r.REQ_ID as REQ_ID,
r.last_updated as LAST_UPDATED,
r.REQ_STATUS as REQ_STATUS,
dc.ACTION_DESC as ACTION_DESC,
dc.CMD_DESC as CMD_DESC,
dc.GUI_CMD_DESC as GUI_CMD_DESC,
dc.oma as OMA,
u.first_name||' '||u.last_name as USER_NAME
from REQUEST r, DEVICE_COMMAND dc, users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
)
)
where ROW_NUM between 1 and 5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.18 0.17 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 4.33 11.19 6308 8746 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 4.51 11.36 6308 8746 0 10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
5 VIEW (cr=4373 r=3253 w=0 time=5458161 us)
45952 COUNT (cr=4373 r=3253 w=0 time=5341429 us)
45952 VIEW (cr=4373 r=3253 w=0 time=5182836 us)
45952 SORT ORDER BY (cr=4373 r=3253 w=0 time=5019472 us)
45952 HASH JOIN (cr=4373 r=3253 w=0 time=4493816 us)
29 TABLE ACCESS FULL USERS (cr=21 r=0 w=0 time=703 us)
45952 HASH JOIN (cr=4352 r=3253 w=0 time=4123196 us)
226 TABLE ACCESS FULL DEVICE_COMMAND (cr=5 r=3 w=0 time=24315 us)
45952 TABLE ACCESS FULL REQUEST (cr=4347 r=3250 w=0 time=3698938 us)

********************************************************************************

SELECT DECODE('A','A','1','2')
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.01 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.01 0.00 0 6 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=201 us)

********************************************************************************

select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname,
o.dataobj#,o.flags
from
obj$ o where o.obj#=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.00 0 6 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

SELECT ID ID_PLUS_EXP,PARENT_ID PARENT_ID_PLUS_EXP,LPAD(' ',2*(LEVEL-1))
||OPERATION||DECODE(OTHER_TAG,NULL,'','*')||DECODE(OPTIONS,NULL,'','
('||OPTIONS||')')||DECODE(OBJECT_NAME,NULL,'',' OF '''||OBJECT_NAME||'''')
||DECODE(OBJECT_TYPE,NULL,'',' ('||OBJECT_TYPE||')')||DECODE(ID,0,
DECODE(OPTIMIZER,NULL,'',' Optimizer='||OPTIMIZER))||DECODE(COST,NULL,'','
(Cost='||COST||DECODE(CARDINALITY,NULL,'',' Card='||CARDINALITY)
||DECODE(BYTES,NULL,'',' Bytes='||BYTES)||')') PLAN_PLUS_EXP,OBJECT_NODE
OBJECT_NODE_PLUS_EXP
FROM
PLAN_TABLE START WITH ID=0 AND STATEMENT_ID=:1 CONNECT BY PRIOR ID=PARENT_ID
AND STATEMENT_ID=:1 ORDER BY ID,POSITION


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.01 0.00 0 0 0 0
Fetch 3 0.02 0.02 0 31 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.03 0.02 0 31 0 10

Misses in library cache during parse: 2
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 r=0 w=0 time=239 us)
0 CONNECT BY WITH FILTERING (cr=3 r=0 w=0 time=219 us)
0 NESTED LOOPS (cr=3 r=0 w=0 time=156 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=150 us)
0 TABLE ACCESS BY USER ROWID PLAN_TABLE (cr=0 r=0 w=0 time=0 us)
0 HASH JOIN (cr=0 r=0 w=0 time=0 us)
0 CONNECT BY PUMP (cr=0 r=0 w=0 time=0 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=0 r=0 w=0 time=0 us)

********************************************************************************

SELECT USERENV('SESSIONID')
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 3 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=130 us)

********************************************************************************

select text
from
view$ where rowid=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 1 4 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.01 1 4 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY USER ROWID VIEW$ (cr=1 r=1 w=0 time=9114 us)

********************************************************************************

SELECT DISTINCT SID
FROM
V$MYSTAT


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 1 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.01 1 2 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT UNIQUE (cr=0 r=0 w=0 time=3015 us)
248 FILTER (cr=0 r=0 w=0 time=2347 us)
280 FIXED TABLE FULL X$KSUMYSTA (cr=0 r=0 w=0 time=1369 us)
1 FIXED TABLE FULL X$KSUSGIF (cr=0 r=0 w=0 time=7 us)

********************************************************************************

SELECT STATISTIC# S, NAME
FROM
SYS.V_$STATNAME WHERE NAME IN ('recursive calls','db block gets','consistent
gets','physical reads','redo size','bytes sent via SQL*Net to client',
'bytes received via SQL*Net from client','SQL*Net roundtrips to/from
client','sorts (memory)','sorts (disk)') ORDER BY S


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.06 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 0 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.06 0 2 0 10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
10 FIXED TABLE FULL X$KSUSD (cr=0 r=0 w=0 time=730 us)

********************************************************************************

insert into sqm_alarm_config(sqm_alarm_config_id,ALARM_MSG_ID,MSG_TYPE_ID,
ALARM_LEVEl,ALARM_LEVEL_VALUE,alarm_thresh,create_user)
values
(102,34,251,'LAI','q' ,'10',1)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 0 0 0
Execute 1 0.00 0.04 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.01 0.04 0 0 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433
********************************************************************************

DELETE FROM PLAN_TABLE
WHERE
STATEMENT_ID=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 6 12 10
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 6 12 10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE (cr=3 r=0 w=0 time=134 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=123 us)

********************************************************************************

EXPLAIN PLAN SET STATEMENT_ID='PLUS7522' FOR select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
select r.REQ_ID as REQ_ID,
r.last_updated as LAST_UPDATED,
r.REQ_STATUS as REQ_STATUS,
dc.ACTION_DESC as ACTION_DESC,
dc.CMD_DESC as CMD_DESC,
dc.GUI_CMD_DESC as GUI_CMD_DESC,
dc.oma as OMA,
u.first_name||' '||u.last_name as USER_NAME
from REQUEST r, DEVICE_COMMAND dc, users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
)
)
where ROW_NUM between 1 and 5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.01 0 0 0 0
Execute 1 0.01 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.03 0.02 0 0 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433
********************************************************************************

insert into plan_table (statement_id, timestamp, operation, options,
object_node, object_owner, object_name, object_instance, object_type,
search_columns, id, parent_id, position, other,optimizer, cost, cardinality,
bytes, other_tag, partition_start, partition_stop, partition_id,
distribution, cpu_cost, io_cost, temp_space, access_predicates,
filter_predicates )
values
(:1,SYSDATE,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,
:20,:21,:22,:23,:24,:25,:26,:27)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 10 0.01 0.01 0 2 11 10
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 0.01 0.01 0 2 11 10

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer goal: CHOOSE
Parsing user id: 433 (recursive depth: 1)
********************************************************************************

select o.name, u.name
from
sys.obj$ o, sys.user$ u where obj# = :1 and owner# = user#


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 0 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.00 0 0 0 0

Misses in library cache during parse: 1
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

SELECT ID ID_PLUS_EXP,OTHER_TAG OTHER_TAG_PLUS_EXP,OTHER OTHER_PLUS_EXP
FROM
PLAN_TABLE WHERE STATEMENT_ID=:1 AND OTHER_TAG IS NOT NULL ORDER BY ID


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 3 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 3 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 r=0 w=0 time=259 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=175 us)




********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 14 0.23 0.28 1 4 0 0
Execute 15 0.02 0.10 0 6 12 10
Fetch 14 4.35 11.22 6308 8789 0 34
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 43 4.60 11.61 6309 8799 12 44

Misses in library cache during parse: 11
Misses in library cache during execute: 1


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 14 0.00 0.00 0 0 0 0
Execute 62 0.01 0.01 0 2 11 10
Fetch 52 0.00 0.01 1 106 0 8
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 128 0.01 0.04 1 108 11 18

Misses in library cache during parse: 6
Misses in library cache during execute: 1

16 user SQL statements in session.
13 internal SQL statements in session.
29 SQL statements in session.
********************************************************************************
Trace file: itdb_ora_2508.trc
Trace file compatibility: 9.00.01
Sort options: default

3 sessions in tracefile.
30 user SQL statements in trace file.
28 internal SQL statements in trace file.
29 SQL statements in trace file.
17 unique SQL statements in trace file.
425 lines in trace file.



Tom Kyte
June 04, 2004 - 12:25 pm UTC

really only needed a tiny bit, the relevant portion.

Here is the tkprof from INDEX RANGE SCAN...

A reader, June 04, 2004 - 11:32 am UTC

This is the tkprof from analyze_schema that's doing a INDEX RANGE SCAN on REQ table:
============================================


TKPROF: Release 9.2.0.4.0 - Production on Fri Jun 4 10:55:10 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Trace file: itdb_ora_2508.trc
Sort options: default

********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************

alter session set sql_trace = true


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 433
********************************************************************************

SELECT DECODE('A','A','1','2')
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.00 0 6 0 2

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=199 us)

********************************************************************************

SELECT ID ID_PLUS_EXP,PARENT_ID PARENT_ID_PLUS_EXP,LPAD(' ',2*(LEVEL-1))
||OPERATION||DECODE(OTHER_TAG,NULL,'','*')||DECODE(OPTIONS,NULL,'','
('||OPTIONS||')')||DECODE(OBJECT_NAME,NULL,'',' OF '''||OBJECT_NAME||'''')
||DECODE(OBJECT_TYPE,NULL,'',' ('||OBJECT_TYPE||')')||DECODE(ID,0,
DECODE(OPTIMIZER,NULL,'',' Optimizer='||OPTIMIZER))||DECODE(COST,NULL,'','
(Cost='||COST||DECODE(CARDINALITY,NULL,'',' Card='||CARDINALITY)
||DECODE(BYTES,NULL,'',' Bytes='||BYTES)||')') PLAN_PLUS_EXP,OBJECT_NODE
OBJECT_NODE_PLUS_EXP
FROM
PLAN_TABLE START WITH ID=0 AND STATEMENT_ID=:1 CONNECT BY PRIOR ID=PARENT_ID
AND STATEMENT_ID=:1 ORDER BY ID,POSITION


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 0
Fetch 6 0.05 0.07 1 68 0 22
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 14 0.05 0.08 1 68 0 22

Misses in library cache during parse: 3
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 r=0 w=0 time=382 us)
0 CONNECT BY WITH FILTERING (cr=3 r=0 w=0 time=324 us)
0 NESTED LOOPS (cr=3 r=0 w=0 time=229 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=219 us)
0 TABLE ACCESS BY USER ROWID PLAN_TABLE (cr=0 r=0 w=0 time=0 us)
0 HASH JOIN (cr=0 r=0 w=0 time=0 us)
0 CONNECT BY PUMP (cr=0 r=0 w=0 time=0 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=0 r=0 w=0 time=0 us)

********************************************************************************

SELECT USERENV('SESSIONID')
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.00 0 6 0 2

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS FULL DUAL (cr=3 r=0 w=0 time=192 us)

********************************************************************************

select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
select r.REQ_ID as REQ_ID,
r.last_updated as LAST_UPDATED,
r.REQ_STATUS as REQ_STATUS,
dc.ACTION_DESC as ACTION_DESC,
dc.CMD_DESC as CMD_DESC,
dc.GUI_CMD_DESC as GUI_CMD_DESC,
dc.oma as OMA,
u.first_name||' '||u.last_name as USER_NAME
from REQ r, DEV_COM dc, users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
)
)
where ROW_NUM between 1 and 5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.01 0.01 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 4 4.14 4.37 70 8776 0 10
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 4.15 4.38 70 8776 0 10

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
5 VIEW (cr=4388 r=0 w=0 time=2068784 us)
45952 COUNT (cr=4388 r=0 w=0 time=1958742 us)
45952 VIEW (cr=4388 r=0 w=0 time=1799604 us)
45952 SORT ORDER BY (cr=4388 r=0 w=0 time=1642682 us)
45952 HASH JOIN (cr=4388 r=0 w=0 time=1175554 us)
226 TABLE ACCESS FULL DEV_COM (cr=5 r=0 w=0 time=1095 us)
45952 HASH JOIN (cr=4383 r=0 w=0 time=778086 us)
29 TABLE ACCESS FULL USERS (cr=21 r=0 w=0 time=728 us)
45952 TABLE ACCESS BY INDEX ROWID REQ (cr=4362 r=0 w=0 time=406214 us)
45952 INDEX RANGE SCAN DUP_REQ_2 (cr=137 r=0 w=0 time=123214 us)(object id 213014)

********************************************************************************

DELETE FROM PLAN_TABLE
WHERE
STATEMENT_ID=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.01 0.00 0 0 0 0
Execute 4 0.00 0.00 0 12 24 22
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 8 0.01 0.00 0 12 24 22

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 DELETE (cr=3 r=0 w=0 time=161 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=150 us)

********************************************************************************

select default$
from
col$ where rowid=:1


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 6 0.00 0.00 0 0 0 0
Execute 6 0.00 0.00 0 0 0 0
Fetch 6 0.00 0.00 0 12 0 6
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 18 0.00 0.00 0 12 0 6

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY USER ROWID COL$ (cr=1 r=0 w=0 time=86 us)

********************************************************************************

EXPLAIN PLAN SET STATEMENT_ID='PLUS7524' FOR select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
select r.REQ_ID as REQ_ID,
r.last_updated as LAST_UPDATED,
r.REQ_STATUS as REQ_STATUS,
dc.ACTION_DESC as ACTION_DESC,
dc.CMD_DESC as CMD_DESC,
dc.GUI_CMD_DESC as GUI_CMD_DESC,
dc.oma as OMA,
u.first_name||' '||u.last_name as USER_NAME
from REQ r, DEV_COM dc, users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
)
)
where ROW_NUM between 1 and 5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.14 0.14 0 0 0 0
Execute 1 0.01 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.15 0.14 0 0 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433
********************************************************************************

select user#
from
sys.user$ where name = 'OUTLN'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 0 2 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 0 2 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: SYS (recursive depth: 2)

Rows Row Source Operation
------- ---------------------------------------------------
1 TABLE ACCESS BY INDEX ROWID USER$ (cr=2 r=0 w=0 time=81 us)
1 INDEX UNIQUE SCAN I_USER1 (cr=1 r=0 w=0 time=45 us)(object id 44)

********************************************************************************

insert into plan_table (statement_id, timestamp, operation, options,
object_node, object_owner, object_name, object_instance, object_type,
search_columns, id, parent_id, position, other,optimizer, cost, cardinality,
bytes, other_tag, partition_start, partition_stop, partition_id,
distribution, cpu_cost, io_cost, temp_space, access_predicates,
filter_predicates )
values
(:1,SYSDATE,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19,
:20,:21,:22,:23,:24,:25,:26,:27)


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.01 0.00 0 0 0 0
Execute 22 0.00 0.00 0 3 24 22
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 24 0.01 0.01 0 3 24 22

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer goal: CHOOSE
Parsing user id: 433 (recursive depth: 1)
********************************************************************************

select o.name, u.name
from
sys.obj$ o, sys.user$ u where obj# = :1 and owner# = user#


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 0 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 0

Misses in library cache during parse: 0
Parsing user id: SYS (recursive depth: 1)
********************************************************************************

SELECT ID ID_PLUS_EXP,OTHER_TAG OTHER_TAG_PLUS_EXP,OTHER OTHER_PLUS_EXP
FROM
PLAN_TABLE WHERE STATEMENT_ID=:1 AND OTHER_TAG IS NOT NULL ORDER BY ID


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 0 0 0
Execute 2 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.00 0.00 0 6 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT ORDER BY (cr=3 r=0 w=0 time=257 us)
0 TABLE ACCESS FULL PLAN_TABLE (cr=3 r=0 w=0 time=173 us)

********************************************************************************

EXPLAIN PLAN SET STATEMENT_ID='PLUS7525' FOR select ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from (
select rownum as ROW_NUM, REQ_ID, LAST_UPDATED, REQ_STATUS,
ACTION_DESC, CMD_DESC, GUI_CMD_DESC, OMA, USER_NAME from
(
select r.REQ_ID as REQ_ID,
r.last_updated as LAST_UPDATED,
r.REQ_STATUS as REQ_STATUS,
dc.ACTION_DESC as ACTION_DESC,
dc.CMD_DESC as CMD_DESC,
dc.GUI_CMD_DESC as GUI_CMD_DESC,
dc.oma as OMA,
u.first_name||' '||u.last_name as USER_NAME
from REQ r, DEV_COM dc, users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
)
)
where ROW_NUM between 1 and 5

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.02 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.03 0.02 0 0 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 433



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 18 0.17 0.18 0 0 0 0
Execute 20 0.03 0.02 0 12 24 22
Fetch 16 4.19 4.44 71 8862 0 36
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 54 4.39 4.65 71 8874 24 58

Misses in library cache during parse: 8


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 11 0.01 0.00 0 0 0 0
Execute 29 0.00 0.01 0 3 24 22
Fetch 7 0.00 0.00 0 14 0 7
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 47 0.01 0.01 0 17 24 29

Misses in library cache during parse: 2
Misses in library cache during execute: 1

22 user SQL statements in session.
9 internal SQL statements in session.
31 SQL statements in session.
********************************************************************************
Trace file: itdb_ora_2508.trc
Trace file compatibility: 9.00.01
Sort options: default

2 sessions in tracefile.
33 user SQL statements in trace file.
13 internal SQL statements in trace file.
31 SQL statements in trace file.
13 unique SQL statements in trace file.
424 lines in trace file.





Tom Kyte
June 04, 2004 - 12:32 pm UTC

interesting.


autotraces would lead a reasonable person to believe we were using the same tables and this was apples to apples.

but tkprof, which shows reality, reveals you are using totally different sets of tables.


HMMM.....

but the bottom line here is that the first one got hit with some physical IO; the other did not.

but they processed the same "amount" of data (8,746 vs 8,776 LIO's).


But again -- take the advice to rewrite this query using just "r" -- ordering it, getting the first N, and then joining to that and you'll find this query runs "sub second" in all cases.
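
A sketch of that rewrite, using the table and column names from the trace (it assumes every REQUEST row finds its DEVICE_COMMAND and USERS match, as it would with enforced foreign keys):

select r.req_id, r.last_updated, r.req_status,
       dc.action_desc, dc.cmd_desc, dc.gui_cmd_desc, dc.oma,
       u.first_name || ' ' || u.last_name as user_name
  from ( select *
           from ( select req_id, last_updated, req_status, cmd_id, user_id
                    from request
                   where dev_id = 6
                   order by req_id desc )
          where rownum <= 5 ) r,
       device_command dc,
       users u
 where r.cmd_id = dc.cmd_id
   and r.user_id = u.user_id
 order by r.req_id desc;

That way the top-N is computed against REQUEST alone (where an index on (dev_id, req_id) can serve it), and only five rows are joined.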




Confession to make!!

A reader, June 07, 2004 - 6:56 pm UTC

Well, I was trying to hide the real table names by editing the tkprof output... I did it in one but not the other... that didn't escape your eagle eyes :)

Anyway, I
i) simplified the query and got the BETWEEN out of the equation.
ii) did this on a UNIX box and got windoze out of the equation ;)
iii) changed optimizer_index_cost_adj to 35 and optimizer_index_caching to 90.

None of these made a difference -- gather_stats is still doing a FTS on the REQUEST table. I really want to move away from analyze_schema and am trying to build some confidence in gather_stats. This is the first candidate query I tried for comparison.

Can you help me identify why the FTS with gather_stats? Is there anything else I can email you?

Thanks.


Tom Kyte
June 07, 2004 - 7:37 pm UTC

but they do the same amount of work.

one did not have to do PIO
the other did have to do PIO

does the other do repeated PIO if you run it over and over?

Here is the simplified query and its explain plans

A reader, June 07, 2004 - 6:59 pm UTC

Simplified Query (with real table names :)
==========================================

select req_id
from REQUEST r, DEVICE_COMMAND dc , users u
where r.dev_id = 6
and r.cmd_id = dc.cmd_id
and r.user_id = u.user_id
order by 1 desc
;

Plan with analyze schema:
=========================

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=31 Bytes=620
)
1 0 SORT (ORDER BY) (Cost=12 Card=31 Bytes=620)
2 1 NESTED LOOPS (Cost=5 Card=31 Bytes=620)
3 2 NESTED LOOPS (Cost=4 Card=31 Bytes=496)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'REQUEST' (Cost=3 C
ard=31 Bytes=403)
5 4 INDEX (RANGE SCAN) OF 'DUP_REQUEST_2' (NON-UNIQUE)
(Cost=1 Card=31)
6 3 INDEX (UNIQUE SCAN) OF 'PK_USERS_1' (UNIQUE)
7 2 INDEX (UNIQUE SCAN) OF 'PK_DEVICE_COMMAND' (UNIQUE)


Plan with gather_stats:
=========================

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=718 Bytes=18
668)

1 0 SORT (ORDER BY) (Cost=19 Card=718 Bytes=18668)
2 1 NESTED LOOPS (Cost=9 Card=718 Bytes=18668)
3 2 NESTED LOOPS (Cost=8 Card=718 Bytes=15078)
4 3 TABLE ACCESS (FULL) OF 'REQUEST' (Cost=6 Card=718 By
tes=12206)

5 3 INDEX (UNIQUE SCAN) OF 'PK_USERS_1' (UNIQUE)
6 2 INDEX (UNIQUE SCAN) OF 'PK_DEVICE_COMMAND' (UNIQUE)



dbms_stats.alter_database_tab_monitoring

A reader, June 29, 2004 - 7:52 pm UTC

I tried dbms_stats.alter_database_tab_monitoring a few times on my database and it failed with 'Resource busy, update with NOWAIT requested' every time.

Why would this be? All this should do is loop thru all the schemas and set ALTER TABLE ... MONITORING, right? Why would that fail like this?

Thanks

[BTW, the 9iR2 docs refer to this as alter_database_tabLE_monitoring which is more than the allowed 30-character length for identifiers!]

Tom Kyte
June 29, 2004 - 8:24 pm UTC

the alter table needs a table lock for an instant, that is what is failing.

either wait for a period of inactivity -- or simply DIY:

begin
    for x in ( select .... )
    loop
        execute immediate 'alter ....';
    end loop;
end;

and put an exception handler in there to catch the resource busy errors.
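
A minimal, runnable version of that idea -- the schema filter is only an example; restrict it to your own application schemas:

declare
    resource_busy exception;
    pragma exception_init( resource_busy, -54 );
begin
    for x in ( select owner, table_name
                 from dba_tables
                where owner not in ('SYS','SYSTEM') )
    loop
        begin
            -- each alter locks the table for an instant
            execute immediate 'alter table "' || x.owner || '"."' ||
                              x.table_name || '" monitoring';
        exception
            when resource_busy then
                null;  -- table was locked; pick it up on the next run
        end;
    end loop;
end;
/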

Jane, July 01, 2004 - 4:41 pm UTC

Tom,
Whenever we enable monitoring for a table, how does the database remember whether it needs to be analyzed or not?

Is it something that is triggered when we run dbms_stats?

Thanks in advance

Tom Kyte
July 01, 2004 - 7:59 pm UTC

we maintain the modification counts in memory and flush them to disk every 3 hours (every 15 minutes in 9i and up). dba_tab_modifications lets you peek at this (also, in 9i you can flush the counts out on demand using dbms_stats)


then, you would use the GATHER STALE option with dbms_stats gather schema stats (or database stats) to gather stats just on the tables that need it.
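
Putting the pieces together in 9i, a sketch of the whole cycle for the current schema:

-- flush the in-memory modification counts to the dictionary (9i and up)
exec dbms_stats.flush_database_monitoring_info;

-- peek at what has changed since the last gather
select table_name, inserts, updates, deletes
  from user_tab_modifications;

-- gather stats only on the tables flagged as stale (about 10% changed)
exec dbms_stats.gather_schema_stats( ownname => user, options => 'GATHER STALE', cascade => true );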

Statistics on SYS

Ross, July 02, 2004 - 12:16 pm UTC

Hi Tom,

This has been an informative discussion, but I have another question: In many of the posts, people have analyzed the entire database and then deleted the statistics for SYS and SYSTEM.
What are the possible effects of having statistics for these users?

The reason I ask is because one of our testers was struggling with running a query on one of our views based on the following query:

select P.param_value client
,VI.host_name hostname
,user schema
,sysdate snapshot_dt
,object_type
,object_name
,OTH.version
,NVL(vid,'NO VID') vid
,create_dt
,modified_dt
,OTH.ora_status
from PARAMETER P,
V$INSTANCE VI,
(
select object_type
,object_name
,TDP_WHAT.get_vers(vid) version
,create_dt
,modified_dt
,ora_status
,vid
from
(
select UO.object_type,
UO.object_name,
vid,
created create_dt,
last_ddl_time modified_dt,
status ora_status
from user_objects UO,
(
select type object_type,
name object_name,
TDP_WHAT.get_vid(type,name) vid
from user_source
where instr(text, '@'||'(#)', 1 ) > 0
and instr(text,'$Header',1) > 0
and line < 1000
UNION
select object_type,
object_name,
TDP_WHAT.get_vid(object_type, object_name) vid
from USER_OBJECTS
where object_type = 'VIEW'
or ( object_type = 'TRIGGER' and ( substr(object_name, length(object_name)-1, 1 ) != '_' and substr(object_name, length(object_name)-2, 1 ) != '_' ))
) SRC
where UO.object_type in ( 'PACKAGE', 'PACKAGE BODY', 'FUNCTION', 'PROCEDURE', 'TRIGGER', 'VIEW', 'TYPE', 'TYPE BODY' )
and UO.object_name = SRC.object_name(+)
and UO.object_type = SRC.object_type(+)
and ( UO.object_type != 'TRIGGER' or ( substr(UO.object_name, length(UO.object_name)-1, 1 ) != '_' and substr(UO.object_name, length(UO.object_name)-2, 1 ) != '_' ))
and instr(UO.object_name, '$' ) = 0
)
) OTH
where param_name = 'SITE_NAME'
order by 1,2,3

This was generally taking around 30 seconds to return a single row until he tried logging on as SYS and analysing system tables SOURCE$, VIEW$ and TRIGGER$ (using DBMS_DDL.analyze_object - don't worry, I know your views on this package, but anyway).
This reduced the run time of the query for a given object/object type by a factor of roughly 200.

We are considering incorporating this into the nightly statistics gathering operation, but I wanted to request your views on analyzing SYS-owned objects. Are there specific dangers to this?

Many thanks,
Ross

Tom Kyte
July 02, 2004 - 12:30 pm UTC

in 8i, you can, but probably don't want to.
in 9i, you can -- as long as you did so on your test environment as well
in 10g, it just happens.
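
In 9i that can be as simple as the following sketch -- to be tried on the test environment first, as noted:

exec dbms_stats.gather_schema_stats( 'SYS',    cascade => true );
exec dbms_stats.gather_schema_stats( 'SYSTEM', cascade => true );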

Delete column statistics

Sami, August 11, 2004 - 10:12 am UTC

Dear Tom,

Is there a way to delete column statistics alone without deleting table statistics?

Tom Kyte
August 11, 2004 - 1:11 pm UTC

well, we should be able to:

export stats with cascade=> false
delete stats
import stats

but, that didn't work... so, this is a workaround:

ops$tkyte@ORA9IR2> create table t as select * from all_users;
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select num_rows from user_tables where table_name = 'T';
 
  NUM_ROWS
----------
        53
 
ops$tkyte@ORA9IR2> select count(*) from user_tab_histograms where table_name = 'T';
 
  COUNT(*)
----------
         6
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.create_stat_table( user, 'STATS' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.export_table_stats( user, 'T', null, 'STATS' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> delete from stats where type = 'C' and C1 = 'T' and c5 = user;
 
3 rows deleted.
blow away the column stats for table USER.T

 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.import_table_stats( user, 'T', null, 'STATS' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select num_rows from user_tables where table_name = 'T';
 
  NUM_ROWS
----------
        53
 
ops$tkyte@ORA9IR2> select count(*) from user_tab_histograms where table_name = 'T';
 
  COUNT(*)
----------
         0
 

Why don't we use dbms_stats.delete_column_stats

A reader, August 11, 2004 - 5:09 pm UTC

Tom,

Simply put, we can use dbms_stats.delete_column_stats, right? Correct me if I am wrong.

Tom Kyte
August 12, 2004 - 8:29 am UTC

you know, I could have sworn I tried that -- I could have *sworn* I did and couldn't get it to work. I must have been typing way too fast, for this morning it does work:

ops$tkyte@ORA9IR2> create table t as select * from all_users;
 
Table created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'c', count(*) from user_tab_histograms where table_name = 'T';
 
'   NUM_ROWS
- ----------
t         52
c          6
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_column_stats( user, 'T', 'USERNAME' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_column_stats( user, 'T', 'CREATED' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_column_stats( user, 'T', 'USER_ID' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'c', count(*) from user_tab_histograms where table_name = 'T';
 
'   NUM_ROWS
- ----------
t         52
c          0
 

A reader, August 11, 2004 - 7:00 pm UTC


A must for every Oracle professional

umesh, September 13, 2004 - 2:29 pm UTC

It is just like a newspaper to me; without going through this site I won't start my day.

Stats with Histograms

reader, September 21, 2004 - 10:55 am UTC

Tom
you wrote:

"o for all indexed columns - gather histograms on all   columns that are currently indexed.  I find this typically to be the "most useful" although you can arguably skip nearly unique columns such as primary keys"


Regarding your comment above: is there a clause to exclude primary key columns from histogram stats?


SQL> exec dbms_stats.gather_schema_stats('XXX', method_opt=>'for all indexed columns size auto', cascade=>true);

thanks

 

Tom Kyte
September 21, 2004 - 11:14 am UTC

method_opt => and list the columns you want


there is also skewonly, but it won't "skip them" -- it just adjusts what it does based on the skewness of the data

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5792247321358#7417227783861
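
For instance, a sketch (table and column names are hypothetical) that gathers histograms only on the columns you name, leaving the primary key out:

exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns size 254 status, region', cascade => true );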



USAGE

Raaghid, September 23, 2004 - 8:20 am UTC

Can you please list out the usages of the analyze briefly. (and also give link to do read full document)

Tom Kyte
September 24, 2004 - 8:23 am UTC

otn.oracle.com -> documentation -> go to your version and read the SQL Reference guide. It is all there.


dbms_stats vs analyze

Prashant Mahajan, October 23, 2004 - 7:28 pm UTC

Tom,

In the OTN Oracle 9i Performance Tuning document, there is following highlighted note:

"the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS. " (and not the staticstics generated by ANALYZE command???)

Does the above statement mean the same thing as this one, which appears a few paragraphs later?

"DBMS_STATS gathers only statistics needed for cost-based optimization; it does not gather other statistics. For example, the table statistics gathered by DBMS_STATS include the number of rows, number of blocks currently containing data, and average row length, but not the number of chained rows, average free space, or number of unused data blocks."

The reason we want to know the difference is that we have a weekly scheduled job that analyzes each table in our application schema using "ANALYZE" rather than DBMS_STATS. The way it is done is using the query

select 'ANALYZE TABLE ' || table_name || ' COMPUTE STATISTICS'
from all_tables
where owner = <schema_name>;

And then executing the resulting statements.

Our application is a web application, and it's a relatively small database compared to many out there. There are 300-350 tables and only 15% of them are large, meaning they have about 500,000 rows. The job runs on a weekly schedule, and we are getting the performance we want.

If we were to use dbms_stats will it be any different? Are we doing anything wrong by not using dbms_stats?

Please explain.

Thanks


Tom Kyte
October 23, 2004 - 7:34 pm UTC

the CBO is built to believe that DBMS_STATS gathered the stats.

dbms_stats and analyze come to slightly different results for some things. This could affect you.

dbms_stats would simplify your process -- you would just have to say "gather schema stats".

but frankly, right now, today, if you have a working system, that you are not really changing much if at all -- the path of least resistance is "leave it be".

The caveat is that at some point in the future, analyze will most likely stop collecting stats for the optimizer (just as dbms_stats ONLY collects things used by the optimizer)
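
If you did eventually switch, the generated ANALYZE statements would collapse to a single call -- a sketch, with a made-up schema name:

exec dbms_stats.gather_schema_stats( ownname => 'APP_OWNER', cascade => true );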

That was real quick and useful response.

Prashant Mahajan, October 23, 2004 - 7:43 pm UTC

You are great!

A reader, November 03, 2004 - 4:28 pm UTC

Tom,

Is dbms_stats available in 8i?

Tom Kyte
November 05, 2004 - 11:01 am UTC

yes

Anto, November 21, 2004 - 9:45 am UTC

Hi Tom,

This may sound like a silly question, but I am still not sure about it.

Suppose a table has up-to-date statistics on it.
We then do a delete or truncate of the table and then do an insert into this table. All the statistics (number of rows, blocks, empty blocks, avg row length etc.) are almost the same before the truncate/delete and after the insert.
My question is:
Is it still necessary to gather statistics after the insert operation, or will the table work fine with the currently existing statistics?
The same situation arises when we do a full refresh of a materialized view (MV) using dbms_refresh('Dummy MV','MV').
It does a delete and then an insert of the materialized view (MV). If the statistics have not changed much after the refresh, is it still necessary to do a 'gather table stats' on the MV?

Tom Kyte
November 21, 2004 - 7:04 pm UTC

statistics are just numbers used by the optimizer to figure out "what plan"

if the numbers would not change, then they need not be re-gathered....

however, I am pretty sure the cluster factor on the indexes would be different unless the data was sorted exactly the same -- so the indexes might well be affected here.
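
You can peek at it before and after a refresh; a clustering factor near BLOCKS means the table rows are ordered like the index, one near NUM_ROWS means they are not (MV_T is a hypothetical name):

select i.index_name, i.clustering_factor, t.blocks, t.num_rows
  from user_indexes i, user_tables t
 where i.table_name = t.table_name
   and t.table_name = 'MV_T';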

A reader, November 29, 2004 - 5:09 am UTC


Elaborate on cluster factor......

Prashant, December 28, 2004 - 3:26 pm UTC

Hi Tom,

The earlier review was extremely useful, since I have been working on preparing a strategy for analyzing tables and indexes in our data warehouse. We truncate and populate the tables daily, and neither the volume nor the type of data changes much -- perhaps a 2-3% increase daily -- so I was planning on running the analyze during the weekend instead of after every load.

But as you mentioned about the cluster factor for indexes, could you please elaborate on the same.

Thanks and Regards
Prashant

Tom Kyte
December 28, 2004 - 3:51 pm UTC

Do you know what a clustering factor is? In Expert One-on-One Oracle I go into it in some detail. As well, a search for it on this site will turn up many articles.

COMPOSITE INDEX

Ashutosh Upadhyay, December 29, 2004 - 12:30 am UTC

I want to search on a large number of columns. I have different queries (dynamically created), each searching on different columns -- it may be 1 column or it may go up to 40 columns, in any sequence. This is totally dependent on the user and what search criteria he/she wants.

I want to know whether building a composite index on 40 columns would help in this situation.

Tom Kyte
December 29, 2004 - 9:56 am UTC

You'd be looking at Oracle Text, I think. One approach would be to sort of "xml-ise" the data in the index (tag it) so you can do either "any column" searching or "search this specific column" searching --

all with a single index on all 40 columns.


see
https://docs.oracle.com#index-TEX

method_opt => null

daxu, January 10, 2005 - 12:01 pm UTC

Tom,
I saw this question on this web page:

"Is there a way of analyzing the table and indexes without gathering histographical stats using dbms_stats, i.e., the equivalent of 'analyze table table_name estimate statistics;'

Followup:
method_opt => null"

I tried the following command

execute dbms_stats.gather_schema_stats(user, method_opt => null);

and the columns still got analyzed. I checked both all_tab_columns (last_analyzed) and all_tab_histograms, and they showed the columns are analyzed. It looks like method_opt => null does not work properly.

Thanks,

Daxu






Tom Kyte
January 10, 2005 - 1:38 pm UTC

actually, the default method opt won't collect full blown histograms, just column level stats (hi/lo/number of distinct values) -- things you probably really really REALLY want.


do you really not want the hi/lo and number of distinct values?
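
You can see exactly what that default leaves behind; for a table T, something like:

select column_name, num_distinct, num_buckets, low_value, high_value
  from user_tab_columns
 where table_name = 'T';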

default method_opt

daxu, January 10, 2005 - 3:41 pm UTC

Tom,
Thank you for the prompt answer and this helps a lot as usual. I still have question about the method_opt you mentioned.

Is method_opt => null the same as method_opt => 'for all columns size 1' or something like that? I ask because you said
"the default method opt won't collect full blown histograms, just column level stats (hi/lo/number of distinct values)".

Would it be more efficient to use
method_opt => 'for all indexed columns' if I normally only join tables on the indexed columns?

And I see 'for all indexed columns' used a lot on this web site without specifying a size bucket -- is the default size 1, or auto, if the SIZE is skipped?

Thanks,

Daxu




Tom Kyte
January 10, 2005 - 4:25 pm UTC

you want high/low values pretty much all of the time -- for any column that has a predicate on it (they help get the CARD= values somewhere "near reality", which is vital for the correct plan)


i would let it default unless you needed specific histograms on some set of columns.
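
A sketch of the two options side by side (the column name is hypothetical):

-- default: hi/lo and number-of-distinct-values per column, no real histograms (9i)
exec dbms_stats.gather_table_stats( user, 'T' );

-- histograms only on a column you know to be skewed
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for columns size 75 status' );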

What does DBMS_STATS.GATHER_SCHEMA_STATS do?

A Reader, January 10, 2005 - 5:27 pm UTC

Tom,
When we run
exec DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>'SCHEMA');
What does it do?
We used to run analyze table compute statistics and everything worked fine (9iR2). When we switched to DBMS_STATS.GATHER_SCHEMA_STATS, some queries did not work well anymore (that's what I am told) and we had to go back to analyzing tables the old way.
My questions here are: what does GATHER_SCHEMA_STATS do? Does it analyze indexes as well? Do we need some specific settings to gather all the necessary stats? I thought GATHER_SCHEMA_STATS would analyze all the tables and collect statistics for the CBO.
I read the Oracle documentation on the topic and still don't have clear answers.
Thank you.

Tom Kyte
January 10, 2005 - 6:06 pm UTC

well, the documentation actually covers what it does:

http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003995


and you don't really say "how" you analyzed the tables, so I'll guess "analyze table t compute statistics"

that analyze got table, index and column information.

your dbms_stats got table and column stats, but no index stats, since cascade defaults to false.

So, let's do what you should have done in your TEST system prior to even thinking about doing this (eg: figuring out "what is different, what will happen" -- before it happens...)


ops$tkyte@ORA9IR2> create table t as select * from all_objects;
Table created.
 
ops$tkyte@ORA9IR2> create index t_idx on t(object_name);
Index created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_schema_stats( user, CASCADE=>true );
PL/SQL procedure successfully completed.
 
if you run this with cascade=> FALSE, you can see the difference that makes


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table tabinfo as select 'DBMS' what, user_tables.* from user_tables where table_name = 'T';
 
Table created.
 
ops$tkyte@ORA9IR2> create table indexinfo as select 'DBMS' what, user_indexes.* from user_indexes;
 
Table created.
 
ops$tkyte@ORA9IR2> create table colinfo as select 'DBMS' what, x.* from user_tab_histograms x;
 
Table created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_schema_stats( user );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> insert into tabinfo select 'AN' what, user_tables.* from user_tables where table_name = 'T';
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into indexinfo select 'AN' what, user_indexes.* from user_indexes;
 
1 row created.
 
ops$tkyte@ORA9IR2> insert into colinfo select 'AN' what, x.* from user_tab_histograms x;
 
26 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec compare_first_two( 'TABINFO', ' what = ''AN'' ', ' what = ''DBMS'' ' );
AVG_ROW_LEN                   : 96                             93
AVG_SPACE                     : 879                            0
EMPTY_BLOCKS                  : 115                            0
GLOBAL_STATS                  : NO                             YES
WHAT                          : AN                             DBMS
 
PL/SQL procedure successfully completed.

that shows the differences in the table information.  the optimizer doesn't use avg_space, empty_blocks -- so, nothing there.  Else, they are basically "the same" here.  num_rows and all were "equal" (this routine only prints out differences)


 
ops$tkyte@ORA9IR2> exec compare_first_two( 'INDEXINFO', ' what = ''AN'' ', ' what = ''DBMS'' ' );
GLOBAL_STATS                  : NO                             YES
WHAT                          : AN                             DBMS
 
PL/SQL procedure successfully completed.

nary a difference in the index stuff

 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select what, column_name, count(*), min(ENDPOINT_NUMBER) minep, max(ENDPOINT_NUMBER)maxep
  2    from colinfo
  3   group by what, column_name
  4  /
 
WHAT COLUMN_NAME            COUNT(*)      MINEP      MAXEP
---- -------------------- ---------- ---------- ----------
AN   OWNER                         2          0          1
AN   STATUS                        2          0          1
AN   CREATED                       2          0          1
AN   GENERATED                     2          0          1
AN   OBJECT_ID                     2          0          1
AN   SECONDARY                     2          0          1
AN   TEMPORARY                     2          0          1
AN   TIMESTAMP                     2          0          1
AN   OBJECT_NAME                   2          0          1
AN   OBJECT_TYPE                   2          0          1
AN   LAST_DDL_TIME                 2          0          1
AN   DATA_OBJECT_ID                2          0          1
AN   SUBOBJECT_NAME                2          0          1
DBMS OWNER                         2          0          1
DBMS STATUS                        2          0          1
DBMS CREATED                       2          0          1
DBMS GENERATED                     2          0          1
DBMS OBJECT_ID                     2          0          1
DBMS SECONDARY                     2          0          1
DBMS TEMPORARY                     2          0          1
DBMS TIMESTAMP                     2          0          1
DBMS OBJECT_NAME                   2          0          1
DBMS OBJECT_TYPE                   2          0          1
DBMS LAST_DDL_TIME                 2          0          1
DBMS DATA_OBJECT_ID                2          0          1
DBMS SUBOBJECT_NAME                2          0          1
 
26 rows selected.

and the column info is the same


So, perhaps you were missing "cascade=>true" -- but this stuff is really easy to figure out (the dictionary shows you what you have....)




create or replace type myScalarType as object
( rnum number, cname varchar2(30), val varchar2(4000) )
/
create or replace type myTableType as table of myScalarType
/
                                                                                                                               
create or replace
function cols_as_rows( p_query in varchar2 ) return myTableType
-- this function is designed to be installed ONCE per database, and
-- it is nice to have ROLES active for the dynamic sql, hence the
-- AUTHID CURRENT_USER
authid current_user
-- this function is a pipelined function -- meaning, it'll send
-- rows back to the client before getting the last row itself
-- in 8i, we cannot do this
PIPELINED
as
    l_theCursor     integer default dbms_sql.open_cursor;
    l_columnValue   varchar2(4000);
    l_status        integer;
    l_colCnt        number default 0;
    l_descTbl       dbms_sql.desc_tab;
    l_rnum          number := 1;
begin
    -- parse, describe and define the query.  Note, unlike print_table
    -- i am not altering the session in this routine.  the
    -- caller would use TO_CHAR() on dates to format and if they
    -- want, they would set cursor_sharing.  This routine would
    -- be called rather infrequently, I did not see the need
    -- to set cursor sharing therefore.
    dbms_sql.parse(  l_theCursor,  p_query, dbms_sql.native );
    dbms_sql.describe_columns( l_theCursor, l_colCnt, l_descTbl );
    for i in 1 .. l_colCnt loop
        dbms_sql.define_column( l_theCursor, i, l_columnValue, 4000 );
    end loop;
                                                                                                                               
    -- Now, execute the query and fetch the rows.  Iterate over
    -- the columns and "pipe" each column out as a separate row
    -- in the loop.  increment the row counter after each
    -- dbms_sql row
    l_status := dbms_sql.execute(l_theCursor);
    while ( dbms_sql.fetch_rows(l_theCursor) > 0 )
    loop
        for i in 1 .. l_colCnt
        loop
            dbms_sql.column_value( l_theCursor, i, l_columnValue );
            pipe row
            (myScalarType( l_rnum, l_descTbl(i).col_name, l_columnValue ));
        end loop;
        l_rnum := l_rnum+1;
    end loop;
                                                                                                                               
    -- clean up and return...
    dbms_sql.close_cursor(l_theCursor);
    return;
end cols_as_rows;
/
                                                                                                                               
create or replace procedure compare_first_two( p_tname in varchar2,
                                              p_where1 in varchar2,
                                              p_where2 in varchar2 )
as
begin
    for x in ( select a.rnum, a.cname, a.val t1, b.val  t2
                 from table( cols_as_rows( 'select * from ' || p_tname ||
                                           ' where ' || p_where1 ) ) a,
                      table( cols_as_rows( 'select * from ' || p_tname || 
                                           ' where ' || p_where2 ) ) b
                where a.rnum = b.rnum
                  and a.cname = b.cname
                  and ( a.val <> b.val or ((a.val is null or b.val is null) 
                               and a.val||b.val is not null) )
              )
    loop
        dbms_output.put_line( rpad( x.cname, 30 ) || ': ' || 
         rpad( substr(x.t1,1,30), 30 ) || ' ' || rpad(substr(x.t2,1,30), 30 ) );
    end loop;
end;
/


(and remember these utilities are run to demonstrate something, not in "real code" so the lack of binds in this particular case is OK) 

Oracle 9204

A reader, January 13, 2005 - 5:21 pm UTC

Tom,

One of my colleagues pointed this out; can you please provide your expertise on this?

begin
    dbms_output.put_line('UTILITY START ' || to_char(sysdate,'mm/dd/yy hh24:mi:ss'));
    dbms_utility.analyze_schema( schema           => 'HA',
                                 method           => 'ESTIMATE',
                                 estimate_percent => 1,
                                 method_opt       => 'for all indexed columns size 254');
    dbms_output.put_line('UTILITY STOP ' || to_char(sysdate,'mm/dd/yy hh24:mi:ss'));
    dbms_output.put_line('STATS START ' || to_char(sysdate,'mm/dd/yy hh24:mi:ss'));
    dbms_stats.gather_schema_stats( ownname          => 'HA',
                                    estimate_percent => 1,
                                    method_opt       => 'for all indexes for all indexed columns size 254',
                                    degree           => 8,
                                    cascade          => true );
    dbms_output.put_line('STATS STOP ' || to_char(sysdate,'mm/dd/yy hh24:mi:ss'));
end;
/

===== results
15:49:46 SQL> /
UTILITY START 01/13/05 15:49:46
UTILITY STOP 01/13/05 16:00:41
STATS START 01/13/05 16:00:41
STATS STOP 01/13/05 16:16:58

I understand that parallel=fast is not always true, but why is the first one faster than the second?

The HA schema has around 400 tables and some of them have about 30 million records.

OS: Solaris 8.0 with 2 CPUs.

Thanks.
 

Tom Kyte
January 13, 2005 - 6:39 pm UTC

we are talking about 11 minutes versus 16 minutes.

if it takes about 16.5 seconds per table, and PQ didn't really have a chance to massively speed anything up (the max degree of parallelism should really be about 2xCPU, not 8), and parallel setup added say 2-4 seconds per segment extra (not much in the grand scheme of things)...

you see where I'm going. You had LOTS of small stuff -- to which applying parallel adds considerable overhead. Add considerable overhead 400 times and you get "lots of seconds".



A reader, January 14, 2005 - 1:12 pm UTC

Tom,

I modified the degree but to no avail; please see the results below.

=== degree 0
UTILITY START 01/14/05 08:51:11
UTILITY STOP 01/14/05 09:04:25
STATS START 01/14/05 09:04:25
STATS STOP 01/14/05 09:29:36

=== degree 2
10:05:08 SQL> /
UTILITY START 01/14/05 10:05:09
UTILITY STOP 01/14/05 10:14:56
STATS START 01/14/05 10:14:56
STATS STOP 01/14/05 10:29:58

=== degree 4
10:36:11 SQL> /
UTILITY START 01/14/05 10:36:11
UTILITY STOP 01/14/05 10:42:36
STATS START 01/14/05 10:42:36
STATS STOP 01/14/05 10:57:02

=== degree 8
11:44:56 SQL> /
UTILITY START 01/14/05 11:44:57
UTILITY STOP 01/14/05 11:51:13
STATS START 01/14/05 11:51:13
STATS STOP 01/14/05 12:06:07
 

Tom Kyte
January 14, 2005 - 8:20 pm UTC

you missed my main point.

I did say "4" would be your max degree of parallelism

but, you have 400 things that each take a small bit of time. PQ is not appropriate here, the setup time for the PQ is probably longer than the execution time for many of those objects.

A reader, January 14, 2005 - 10:26 pm UTC

Tom,

I understand your point. After reading your response I tried with degree 0 (please see the first option); the others were test cases.

Now the point is why is dbms_utility faster than dbms_stats?

Thanks.



Tom Kyte
January 15, 2005 - 9:36 am UTC

dbms_utility is only there for backwards compatibility with older releases. It is not to be used.

We are talking *small minutes* here, nothing relevant in the grand scheme of things when considering that you are gathering stats for an entire schema.

dbms_stats collects the precise statistics the optimizer is developed to expect. If you compare the numbers gathered by each, you will see they are slightly different.

A reader, February 14, 2005 - 11:08 am UTC

Hi Tom,

From one of your previous posts in this thread :

o SIZE AUTO -- Oracle looks at the data and using a magical, undocumented and
changing algorithm, figures out all by itself what columns to gather stats on
and how many buckets and all. It'll collect histograms in memory only for those
columns which are used by your applications (those columns appearing in a
predicate involving an equality, range, or like operators). we know that a
particular column was used by an application because at parse time, we'll store
workload information in SGA. Then we'll store histograms in the data dictionary
only if it has skewed data (and it worthy of a histogram)

From the above "It'll collect histograms in memory only for those
columns which are used by your applications (those columns appearing in a
predicate involving an equality, range, or like operators). we know that a
particular column was used by an application because at parse time, we'll store
workload information in SGA."

Does this mean that SIZE AUTO will not work as intended after an instance bounce (shutdown/restart)?

thanks
Anto


Tom Kyte
February 14, 2005 - 1:52 pm UTC

the static dictionary table, sys.col_usage$, maintains this persistently across reboots.
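
For the curious, here is a sketch of how to peek at that workload information. It queries SYS-internal tables (col_usage$, obj$, col$), so it must run as a suitably privileged user, and the layout is undocumented and may change between releases:

select o.name table_name, c.name column_name,
       u.equality_preds, u.range_preds, u.like_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where o.obj#    = u.obj#
   and c.obj#    = u.obj#
   and c.intcol# = u.intcol#
   and o.name    = 'T';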

A reader, February 14, 2005 - 2:02 pm UTC

Thanks, Tom

Anto

Analyze index

Anil, March 04, 2005 - 8:49 am UTC

Hi Tom


What is the impact of not analyzing an index, or how does an index analyze influence the optimizer? Would you show examples? Normally the optimizer will prepare the plan based on the presence of indexes and the data distribution based on histograms, if any exist. I am not clear what values it looks at in the index stats to decide the plan. Appreciate your comments. We had one incident where a DBA analyzed only the tables and forgot to analyze the indexes for a long time. When he analyzed the indexes, many queries started performing badly with different plans..

Rgds
Anil

Tom Kyte
March 04, 2005 - 9:33 am UTC

you can see it yourself, the optimizer has less information, doesn't understand everything it should and gets the wrong plan.


index stats have things like "this index has N blocks" (the optimizer might use the index for a fast full scan if it knows it is smaller than the table)

"this index and the table are sorted well with respect to each other" (a good clustering factor) -- so it can use the index to range scan over bigger ranges in a performant manner.
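
You can inspect those index statistics yourself in the dictionary; a quick sketch:

select index_name, blevel, leaf_blocks, distinct_keys,
       clustering_factor, num_rows
  from user_indexes
 where table_name = 'T';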

Some more clarifications needed

Sanjaya Balasuriya, March 07, 2005 - 3:56 am UTC

Hi Tom,

I have a table, t1, with 968516 rows. In this table there is a column, c1, with 13173 distinct values. But since c1 is a foreign key, it has been indexed.

After gathering stats, this index should be skipped, shouldn't it?

begin
  dbms_stats.gather_table_stats(ownname => 'BORDER'
                               ,tabname => 'BC_T_DEPARTURES'
                               ,method_opt => 'for all indexed columns size 200'
                               ,cascade => true);
end;
/

But still the index on c1 is accessed.

Here is the autotrace output;

  1  SELECT DISTINCT BC_T_DEPARTURES.DEP_FLIGHT_NO,
  2    BC_T_DEPARTURES.DEP_FLIGHT_DATE, BC_T_DEPARTURES.DEP_FLT_SEQ_NO
  3    FROM BC_T_DEPARTURES,BC_M_FLIGHTS
  4    WHERE BC_M_FLIGHTS.FLT_SEQ_NO = BC_T_DEPARTURES.DEP_FLT_SEQ_NO
  5    AND FLT_FLIGHT_DATE BETWEEN
  6    (SELECT SYSDATE -
  7    (SELECT SPM_NUMBER_VALUE FROM BC_P_SYS_PARAMS WHERE SPM_INDEX_NO = '07')
FROM DUAL)
  8    AND (SELECT SYSDATE +
  9    (SELECT SPM_NUMBER_VALUE FROM BC_P_SYS_PARAMS WHERE SPM_INDEX_NO = '07')
FROM DUAL)
 10   AND BC_M_FLIGHTS.FLT_LOC_CODE='BIA'
 11*  ORDER BY BC_T_DEPARTURES.DEP_FLIGHT_NO
SQL> /


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=88 Card=1618 Byt
          es=97080)

   1    0   SORT (UNIQUE) (Cost=71 Card=1618 Bytes=97080)
   2    1     FILTER
   3    2       NESTED LOOPS (Cost=53 Card=1618 Bytes=97080)
   4    3         INDEX (FAST FULL SCAN) OF 'INDX_FLIGHTS' (NON-UNIQUE
          ) (Cost=9 Card=22 Bytes=638)

   5    3         TABLE ACCESS (BY INDEX ROWID) OF 'BC_T_DEPARTURES' (
          Cost=2 Card=968516 Bytes=30023996)

   6    5           INDEX (RANGE SCAN) OF 'DEP_FLT_FK_I' (NON-UNIQUE)
   7    2       TABLE ACCESS (FULL) OF 'DUAL' (Cost=1 Card=1)
   8    2       TABLE ACCESS (FULL) OF 'DUAL' (Cost=1 Card=1)




Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
      85823  consistent gets
       6737  physical reads
          0  redo size
        309  bytes sent via SQL*Net to client
        708  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

I think the problem with this SQL is that all the rows of the table are accessed through the index. Why is this index not skipped?
Is there a way I can improve this SQL? Would it be effective to change "size" in "method_opt"?

Tom Kyte
March 07, 2005 - 8:17 am UTC

you flipped from t1, c1 to real table names -- lost me there.

how about a consistent example -- all t1/c1 or none. What index do you think should not be used? Correlation names on all columns are really necessary -- you know immediately what columns go with what tables; I do not.

Sorry for the mistake

Sanjaya Balasuriya, March 07, 2005 - 10:23 am UTC

Sorry Tom.

I'll come with real names.

The table with 968516 rows is BC_T_DEPARTURES, and the column with 13173 distinct values is DEP_FLT_SEQ_NO.
The index on DEP_FLT_SEQ_NO is DEP_FLT_FK_I.



Tom Kyte
March 07, 2005 - 3:56 pm UTC

start over - tables, columns in each table, what indexes on each table...

Clarification on SYS and SYSTEM statistics

Flemming Andersen, March 09, 2005 - 4:48 am UTC

Hi Tom
I'm using 9.2.0.5. In some of the examples above, statistics are gathered for the entire database, and afterwards the statistics are deleted for SYS and SYSTEM. I cannot find any comments on that in the Oracle documentation for gather_database_stats; maybe it's just common sense. But could you clarify:
If using gather_database_stats, must the SYS and SYSTEM stats be deleted afterwards? In that case, doesn't Oracle need any statistics for SYS and SYSTEM, or does deleting those stats have no impact?

Best Regards Flemming

Tom Kyte
March 09, 2005 - 7:48 am UTC

You may gather stats on sys and system in 9i -- it is fully supported (and sort of mandatory in 10g)

make sure you've tested this before changing the way you gather anything in production please.
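
For example (a sketch -- run it on a test system first, as just noted):

exec dbms_stats.gather_schema_stats( ownname => 'SYS', cascade => true );
exec dbms_stats.gather_schema_stats( ownname => 'SYSTEM', cascade => true );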

A question on analyze of SYS

A reader, March 16, 2005 - 10:11 am UTC

I was having difficulty getting a query involving the view DBA_EXTENTS to perform well. I eventually analyzed the SYS schema and it then generated a different plan that was much faster. However, when I deleted the schema stats for SYS, the query did not return to the original plan that was generated before stats were gathered. Why is that?

SQL> select count(*) from dba_tables where owner='SYS' and last_analyzed is not null;
         0

SQL> select count(*) from dba_indexes where owner='SYS' and last_analyzed is not null;
         0

SQL> set autotrace on
SQL> set timing on
SQL> SELECT DISTINCT tablespace_name
  2  FROM DBA_EXTENTS WHERE owner = UPPER('mageor')
  3  AND segment_name = UPPER('TEST')
  4  /
USERS

Elapsed: 00:01:45.00

Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view


SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
         22  recursive calls
          0  db block gets
   12629869  consistent gets
          0  physical reads
          0  redo size
        313  bytes sent via SQL*Net to client
        495  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

**** - ran exec dbms_stats.gather_schema_stats(ownname=>'SYS', cascade=>true);

SQL> set autotrace off
SQL> set timing off
SQL> select count(*) from dba_tables where owner='SYS' and last_analyzed is not null;
       304

SQL> select count(*) from dba_indexes where owner='SYS' and last_analyzed is not null;
       303
       
SQL> set autotrace on
SQL> set timing on
SQL> SELECT DISTINCT tablespace_name
  2  FROM DBA_EXTENTS WHERE owner = UPPER('mageor')
  3  AND segment_name = UPPER('TEST')
  4  /
USERS

Elapsed: 00:00:01.06

Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view


SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
         85  recursive calls
          0  db block gets
       3487  consistent gets
       2360  physical reads
          0  redo size
        313  bytes sent via SQL*Net to client
        495  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
          1  rows processed

*** ran exec dbms_stats.delete_schema_stats(ownname=>'SYS');

SQL> set timing off
SQL> set autotrace off
SQL> select count(*) from dba_tables where owner='SYS' and last_analyzed is not null;
         0

SQL> select count(*) from dba_indexes where owner='SYS' and last_analyzed is not null;
         0

SQL> set timing on
SQL> set autotrace on
SQL> SELECT DISTINCT tablespace_name
  2  FROM DBA_EXTENTS WHERE owner = UPPER('mageor')
  3  AND segment_name = UPPER('TEST')
  4  /
USERS

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
ERROR:
ORA-01039: insufficient privileges on underlying objects of the view


SP2-0612: Error generating AUTOTRACE EXPLAIN report

Statistics
----------------------------------------------------------
       1379  recursive calls
          0  db block gets
       4323  consistent gets
         17  physical reads
          0  redo size
        313  bytes sent via SQL*Net to client
        495  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         27  sorts (memory)
          0  sorts (disk)
          1  rows processed
 

Tom Kyte
March 16, 2005 - 10:14 am UTC

I can only guess that you had at some point in the past gathered stats. (state 1)

they became stale, leading to a bad plan... (state 2)

regathering stats fixed the stale stats and fixed the plan (state 3)

removing the stats put you into a new state 4 -- the rbo

analyze --cont --

A reader, March 16, 2005 - 2:55 pm UTC

I queried dba_indexes and dba_tables before I began, to verify there were no stats. I was able to repeat this behavior on 3 different databases on 3 different machines. The query was causing a problem for the app. The only way I could get a good plan was to gather stats on the SYS schema. I am just trying to understand all the variables that could affect the CBO. The other behavior I noticed was that when I ran the query against dba_extents before gathering stats and added the RULE hint, it ran pretty fast. Remove the hint -- back to running slow. I still have a database on which the query runs slow; no SYS stats have been gathered on this database, if there is anything I can check to explain the behavior.

Tom Kyte
March 17, 2005 - 7:48 am UTC

no stats on what...

for you:

a) run a query with RBO
b) analyze and run with CBO
c) delete stats and run with RBO

and if the plans for A and C are *different*, something funky is going on.


but -- if collecting stats fixes the problem -- you have a query that under the RBO runs "slow" and under the cbo runs "fast"

stmt to exec

Jairo Ojeda M., March 22, 2005 - 12:16 pm UTC

TOM, you said
" exec dbms_stats.gather_table_stats( USER, 'T', CASCADE=>TRUE );
that IS just like "analyze table compute statistics" "
so, what is the equivalent of "ANALYZE TABLE T COMPUTE STATISTICS FOR TABLE FOR ALL INDEXES FOR ALL INDEXED COLUMNS;" using dbms_stats.gather_table_stats()?

thanks,

Tom Kyte
March 22, 2005 - 6:00 pm UTC

gather_table_stats( user, 't', method_opt=> 'for all indexed columns', cascade=>true)

:)

Jairo Ojeda, March 22, 2005 - 6:07 pm UTC

I have read that page again and I find the answer here
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5792247321358#7417227783861 <code>



null statistics

A reader, April 02, 2005 - 7:51 am UTC

Hi Tom,
When I delete statistics, I get a value of 0. How can I get null stats? I want to distinguish 0 (stats gathered, with value 0) from null (not yet gathered).
Thanks

Tom Kyte
April 02, 2005 - 9:28 am UTC

show me what you mean

ops$tkyte@ORA9IR2> create table t as select * from all_users;
 
Table created.
 
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> analyze table t delete statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select last_analyzed, num_rows from user_tables where table_name = 'T';
 
LAST_ANAL   NUM_ROWS
--------- ----------
 
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select last_analyzed, num_rows from user_tables where table_name = 'T';
 
LAST_ANAL   NUM_ROWS
--------- ----------
 
 
ops$tkyte@ORA9IR2>
 

delete stats

A reader, April 02, 2005 - 6:12 pm UTC

Hi Tom,
Thanks for help. Here is the example:
select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name ='TEST';

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART1                      0 20-MAR-05
CVSPNSTYFACT_PART10                     0 20-MAR-05
CVSPNSTYFACT_PART11                     0 20-MAR-05
CVSPNSTYFACT_PART12                     0 20-MAR-05
CVSPNSTYFACT_PART13                     0 20-MAR-05
CVSPNSTYFACT_PART14                     0 20-MAR-05
CVSPNSTYFACT_PART15                     0 20-MAR-05
CVSPNSTYFACT_PART16                     0 20-MAR-05
CVSPNSTYFACT_PART17                     0 20-MAR-05
CVSPNSTYFACT_PART18                     0 20-MAR-05
CVSPNSTYFACT_PART19                     0 20-MAR-05

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART2                      0 20-MAR-05
CVSPNSTYFACT_PART20                     0 20-MAR-05
CVSPNSTYFACT_PART21                     0 20-MAR-05
CVSPNSTYFACT_PART22                     0 20-MAR-05
CVSPNSTYFACT_PART23                     0 20-MAR-05
CVSPNSTYFACT_PART24                     0 20-MAR-05
CVSPNSTYFACT_PART25                     0 20-MAR-05
CVSPNSTYFACT_PART3                      0 20-MAR-05
CVSPNSTYFACT_PART4                      0 20-MAR-05
CVSPNSTYFACT_PART5                      0 20-MAR-05
CVSPNSTYFACT_PART6                      0 20-MAR-05

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART7                      0 20-MAR-05
CVSPNSTYFACT_PART8              135203648 20-MAR-05
CVSPNSTYFACT_PART9                      0 20-MAR-05

25 rows selected.

SQL> analyze table TEST partition (CVSPNSTYFACT_PART9) delete statistics;

Table analyzed.

SQL> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name ='TEST';

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART1                      0 20-MAR-05
CVSPNSTYFACT_PART10                     0 20-MAR-05
CVSPNSTYFACT_PART11                     0 20-MAR-05
CVSPNSTYFACT_PART12                     0 20-MAR-05
CVSPNSTYFACT_PART13                     0 20-MAR-05
CVSPNSTYFACT_PART14                     0 20-MAR-05
CVSPNSTYFACT_PART15                     0 20-MAR-05
CVSPNSTYFACT_PART16                     0 20-MAR-05
CVSPNSTYFACT_PART17                     0 20-MAR-05
CVSPNSTYFACT_PART18                     0 20-MAR-05
CVSPNSTYFACT_PART19                     0 20-MAR-05

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART2                      0 20-MAR-05
CVSPNSTYFACT_PART20                     0 20-MAR-05
CVSPNSTYFACT_PART21                     0 20-MAR-05
CVSPNSTYFACT_PART22                     0 20-MAR-05
CVSPNSTYFACT_PART23                     0 20-MAR-05
CVSPNSTYFACT_PART24                     0 20-MAR-05
CVSPNSTYFACT_PART25                     0 20-MAR-05
CVSPNSTYFACT_PART3                      0 20-MAR-05
CVSPNSTYFACT_PART4                      0 20-MAR-05
CVSPNSTYFACT_PART5                      0 20-MAR-05
CVSPNSTYFACT_PART6                      0 20-MAR-05

PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
CVSPNSTYFACT_PART7                      0 20-MAR-05
CVSPNSTYFACT_PART8              135203648 20-MAR-05
CVSPNSTYFACT_PART9                      0 20-MAR-05

25 rows selected.
Thanks 

Tom Kyte
April 03, 2005 - 9:05 am UTC

cannot reproduce, no idea what version you are using or how the table was created.

ops$tkyte@ORA9IR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(25)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) tablespace users,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) tablespace tools,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t select to_date('12-mar-2003')+mod(rownum,5), rownum, rownum from all_users;
 
25 rows created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
PART1                                   5 03-APR-05
PART2                                   5 03-APR-05
JUNK                                   15 03-APR-05
 
ops$tkyte@ORA9IR2> exec dbms_stats.delete_table_stats( user, 'T', partname => 'PART1' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
PART1
PART2                                   5 03-APR-05
JUNK                                   15 03-APR-05
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.delete_table_stats(user,'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
PART1
PART2
JUNK
 
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
PART1                                   5 03-APR-05
PART2                                   5 03-APR-05
JUNK                                   15 03-APR-05
 
ops$tkyte@ORA9IR2> analyze table t partition (part1) delete statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select partition_name, num_rows, last_analyzed from user_tab_partitions where table_name = 'T';
 
PARTITION_NAME                   NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
PART1
PART2                                   5 03-APR-05
JUNK                                   15 03-APR-05
 

delete stats - reason

A reader, April 02, 2005 - 11:56 pm UTC

Hi Tom,
Sorry to bother you. I found that first I had to delete the stats from the tables through exec dbms_stats.delete_table_stats(user,'TEST'). Perhaps you can tell the reason: "analyze table test delete statistics" did not delete the table/partition stats (though the command did not return any error). Once I deleted the stats at the table level, I was able to gather stats at the specific partition level, as our tables will be accessed through the partition key.
Thanks

Tom Kyte
April 03, 2005 - 9:12 am UTC

you should either use dbms_stats for everything or analyze for everything.

dbms_stats is the recommended choice in most all cases.

Thanks a lot!

A reader, April 09, 2005 - 5:10 pm UTC

Hi Tom,
How do you know all this?

btw: Don't worry about Don Burleson. Don should have looked closely at this site and seen how your crystal clear and timely approach (free too!) has helped thousands in the real world.

Q: Do you know how Don can charge huge money an hour?

Ans: At least not with such a simple and clear approach.


But what if the statistics become obsolete all the time?

Oren Nakdimon, April 19, 2005 - 3:20 pm UTC

Dear Tom.
What do you recommend doing in cases where the execution plan of some query depends heavily on data entered only a short time prior to the query execution (and therefore the statistics are not up-to-date)?
More specifically, here is an abstracted sample of my real problem:

There are many "elements", with various attributes.
In the (OLTP) application, the user creates groups of elements. Each group is a sub-group of another group, filtered by conditions on attributes of the elements included in the original group.

In the example I populate the elements table with approximately 1,300,000 records:

ops$orenn@TIRAMISU> create table elements (
2 id,
3 attribute1 ,
4 attribute2 ,
5 attribute3 ,
6 attribute4 ,
7 attribute5 ,
8 constraint e_pk primary key(id)
9 )
10 as
11 select
12 rownum,
13 mod(rownum,3),
14 mod(rownum,5),
15 mod(rownum,7),
16 mod(rownum,10),
17 mod(rownum,100)
18 from
19 all_objects,
20 all_users;

Table created.

ops$orenn@TIRAMISU> create bitmap index e_i1 on elements(attribute1);

Index created.

ops$orenn@TIRAMISU> create bitmap index e_i2 on elements(attribute2);

Index created.

ops$orenn@TIRAMISU> create bitmap index e_i3 on elements(attribute3);

Index created.

ops$orenn@TIRAMISU> create bitmap index e_i4 on elements(attribute4);

Index created.

ops$orenn@TIRAMISU> create bitmap index e_i5 on elements(attribute5);

Index created.

ops$orenn@TIRAMISU> create table groups (
2 id number(10),
3 description varchar2(100),
4 constraint g_pk primary key(id)
5 );

Table created.

ops$orenn@TIRAMISU>
ops$orenn@TIRAMISU> create table elements_in_groups (
2 group_id references groups(id),
3 element_id references elements(id),
4 constraint eig_pk primary key (group_id,element_id)
5 );

Table created.


When the user creates a new group, the executed SQL statements look like this:

insert into groups (id,description) values (:new_group_id,:group_desc);

insert into elements_in_groups (group_id,element_id)
select
:new_group_id,
e.id
from
elements e,
elements_in_groups eig
where
eig.group_id={original_group_id} and
e.id=eig.element_id and
e.{column}={value} [and e.{column}={value}...];



Suppose we have 2 existing groups, a small one and a big one, and that the statistics are up-to-date:

ops$orenn@TIRAMISU> insert into groups (id,description) values (1,'Small group, analyzed');

1 row created.

ops$orenn@TIRAMISU> insert into groups (id,description) values (2,'Big group, analyzed');

1 row created.

ops$orenn@TIRAMISU> insert into elements_in_groups (group_id,element_id)
2 select 1,e.id from elements e where rownum<=100;

100 rows created.

ops$orenn@TIRAMISU> insert into elements_in_groups (group_id,element_id)
2 select 2,e.id from elements e where rownum<=1000000;

1000000 rows created.

ops$orenn@TIRAMISU> commit;

Commit complete.

ops$orenn@TIRAMISU> begin
2 dbms_stats.gather_table_stats (user,'ELEMENTS', cascade => true);
3 dbms_stats.gather_table_stats (user,'GROUPS', cascade => true);
4 dbms_stats.gather_table_stats (user,'ELEMENTS_IN_GROUPS', method_opt => 'for columns group_id', cascade => true);
5 end;
6 /

PL/SQL procedure successfully completed.



When the user now creates 2 new groups, one refining the small group and the other refining the big group, the optimizer beautifully chooses different plans - good plans:


ops$orenn@TIRAMISU> explain plan for
2 insert into elements_in_groups (group_id,element_id)
3 select
4 :group_id,
5 e.id
6 from
7 elements e,
8 elements_in_groups eig
9 where
10 eig.group_id=1 and
11 e.id=eig.element_id and
12 e.attribute1=0 and
13 e.attribute3=0 and
14 e.attribute5=0;

Explained.

ops$orenn@TIRAMISU> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 100 | 3100 | 22 (19)|
| 1 | NESTED LOOPS | | 100 | 3100 | 22 (19)|
|* 2 | INDEX RANGE SCAN | EIG_PK | 100 | 1600 | 3 (0)|
| 3 | BITMAP CONVERSION TO ROWIDS | | | | |
| 4 | BITMAP AND | | | | |
|* 5 | BITMAP INDEX SINGLE VALUE | E_I5 | | | |
|* 6 | BITMAP INDEX SINGLE VALUE | E_I3 | | | |
|* 7 | BITMAP INDEX SINGLE VALUE | E_I1 | | | |
| 8 | BITMAP CONVERSION FROM ROWIDS| | | | |
|* 9 | INDEX RANGE SCAN | E_PK | 1 | | |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("EIG"."GROUP_ID"=1)
5 - access("E"."ATTRIBUTE5"=0)
6 - access("E"."ATTRIBUTE3"=0)
7 - access("E"."ATTRIBUTE1"=0)
9 - access("E"."ID"="EIG"."ELEMENT_ID")

24 rows selected.


ops$orenn@TIRAMISU> explain plan for
2 insert into elements_in_groups (group_id,element_id)
3 select
4 :group_id,
5 e.id
6 from
7 elements e,
8 elements_in_groups eig
9 where
10 eig.group_id=2 and
11 e.id=eig.element_id and
12 e.attribute1=0 and
13 e.attribute3=0 and
14 e.attribute5=0;

Explained.

ops$orenn@TIRAMISU> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------


----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 477 | 14787 | 46 (0)|
| 1 | NESTED LOOPS | | 477 | 14787 | 46 (0)|
| 2 | TABLE ACCESS BY INDEX ROWID | ELEMENTS | 642 | 9630 | 45 (0)|
| 3 | BITMAP CONVERSION TO ROWIDS| | | | |
| 4 | BITMAP AND | | | | |
|* 5 | BITMAP INDEX SINGLE VALUE| E_I5 | | | |
|* 6 | BITMAP INDEX SINGLE VALUE| E_I1 | | | |
|* 7 | BITMAP INDEX SINGLE VALUE| E_I3 | | | |
|* 8 | INDEX UNIQUE SCAN | EIG_PK | 1 | 16 | |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("E"."ATTRIBUTE5"=0)
6 - access("E"."ATTRIBUTE1"=0)
7 - access("E"."ATTRIBUTE3"=0)
8 - access("EIG"."GROUP_ID"=2 AND "E"."ID"="EIG"."ELEMENT_ID")

22 rows selected.



Please note I use a literal and not a bind variable in the "where eig.group_id=1" predicate, since different values may generate significantly different plans (it is one of those cases where we prefer literals over bind variables, right?)


However, if the original group has just been created (also by the user, using the same application), the statistics are not up-to-date; the optimizer assumes the group is empty and therefore chooses a bad plan:

ops$orenn@TIRAMISU> insert into groups (id,description) values (3,'Small group, not analyzed');

1 row created.

ops$orenn@TIRAMISU> insert into groups (id,description) values (4,'Big group, not analyzed');

1 row created.

ops$orenn@TIRAMISU> insert into elements_in_groups (group_id,element_id)
2 select group_id+2,element_id from elements_in_groups;

1000100 rows created.

ops$orenn@TIRAMISU> commit;

Commit complete.


ops$orenn@TIRAMISU> explain plan for
2 insert into elements_in_groups (group_id,element_id)
3 select
4 :group_id,
5 e.id
6 from
7 elements e,
8 elements_in_groups eig
9 where
10 eig.group_id=4 and
11 e.id=eig.element_id and
12 e.attribute1=0 and
13 e.attribute3=0 and
14 e.attribute5=0;

Explained.

ops$orenn@TIRAMISU> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------


---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 31 | 3 (34)|
| 1 | NESTED LOOPS | | 1 | 31 | 3 (34)|
|* 2 | INDEX RANGE SCAN | EIG_PK | 1 | 16 | 3 (0)|
|* 3 | TABLE ACCESS BY INDEX ROWID| ELEMENTS | 1 | 15 | 2 (50)|
|* 4 | INDEX UNIQUE SCAN | E_PK | 1 | | |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("EIG"."GROUP_ID"=4)
3 - filter("E"."ATTRIBUTE1"=0 AND "E"."ATTRIBUTE3"=0 AND "E"."ATTRIBUTE5"=0)
4 - access("E"."ID"="EIG"."ELEMENT_ID")

17 rows selected.



Is there some kind of "quick update" of the statistics, perhaps only on the portion of the table that I've just added -- a kind of incremental statistics update? (I know, I know, I only need to set the "fast=true" init.ora parameter to get quick updates. But seriously, if I have to collect statistics to make the insert work fast, and the statistics gathering takes a long time, it doesn't really solve the problem, right?)


I guess this kind of problem is not very rare. Are there best practices for such a situation?

Many thanks,
Oren.

P.S.
I'm on Oracle 9iR2/10g.


Tom Kyte
April 19, 2005 - 7:40 pm UTC

wait -- you said

"OLTP"

and

"bitmap indexes"

in the same paragraph?


followed by OLTP and no bind variables?


I'm confused - because neither of those can really go with OLTP?



Well, it's not OLTP from end to end...

Oren Nakdimon, April 19, 2005 - 11:09 pm UTC

In my case, the "elements" table is "read only" for the OLTP users. There is a single process in the system (a batch process) that fills this table (and does it in bulk operations). There are no bitmap indexes on the tables manipulated by the end users ("groups" and "elements_in_groups" in the example).

Regarding the bind variables issue - I use a literal on the "elements_in_groups.group_id" column, which is a very skewed column. I must use literals to take advantage of histograms, don't I?
Also, from knowing the application, I can tell that only a few users will issue these "literal statements" and there will be quite a moderate number of total executions of them (the SGA is not going to be flooded).

Tom Kyte
April 20, 2005 - 6:48 am UTC

I can only go by what I read and see - I see "no binds", I see "bitmaps", I see "OLTP", they do not compute.

you can either use dbms_stats to set the column stats after a big insert or use hints to get the query you want.

I might in this case, since you always want that special plan, use hints cautiously -- but also use BINDS as you don't need literals if you are hinting anyway.



I'll try to make myself clearer

Oren Nakdimon, April 20, 2005 - 7:59 am UTC

Tom, sorry for not making myself clear enough.
The major issue here is that I do NOT want the same plan every time. I think the optimizer does an excellent job here, and chooses the best plan according to the size of the group and the cardinality of the refining conditions (on elements.attributeX, ...). The problem is that it must know the size of the group for that, and the group may have been created 2 seconds before executing the statement.

The general question I raise here is: what to do in cases where the execution plan of some query depends heavily on data entered only a short time prior to the query execution (and therefore the statistics are not up-to-date)?
I think the sample I gave above emphasizes the problematic situation.

Many thanks,
Oren.


Tom Kyte
April 20, 2005 - 8:21 am UTC

then, the optimizer must know the correct and proper information. You can use dbms_stats to spoon-feed that information (using the set_..... functions),

or you would have to gather statistics. It can only do what it can do given the information it has available to it.
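
A sketch of that spoon-feeding approach, run right after the bulk insert. The numbers below are hypothetical -- you would plug in values you know to be representative of the data you just loaded:

begin
    -- tell the optimizer how big the table really is, without re-gathering
    dbms_stats.set_table_stats( ownname => user,
                                tabname => 'ELEMENTS_IN_GROUPS',
                                numrows => 2000100,
                                numblks => 10000 );
    -- and describe the skewed join column explicitly
    -- (a plain distcnt/density is a simplification; a histogram would be more faithful)
    dbms_stats.set_column_stats( ownname => user,
                                 tabname => 'ELEMENTS_IN_GROUPS',
                                 colname => 'GROUP_ID',
                                 distcnt => 4,
                                 density => 1/4 );
end;
/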

Histograms lost?

A reader, April 20, 2005 - 12:52 pm UTC

Tom,

Quick question...

We gather stats using gather_table_stats on a daily basis. I'm testing how collecting histograms improves the execution of some queries. If I gather stats during the day and use the method_opt parameter to specify histograms, and at night run stats again without specifying anything for method_opt, do I lose the original histograms?



Tom Kyte
April 20, 2005 - 9:07 pm UTC

ops$tkyte@ORA9IR2.US.ORACLE.COM> create table t ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2.US.ORACLE.COM> insert into t select rownum from all_users;
 
26 rows created.
 
ops$tkyte@ORA9IR2.US.ORACLE.COM>
ops$tkyte@ORA9IR2.US.ORACLE.COM> exec dbms_stats.gather_table_stats( user, 'T', method_opt=>'for all columns size 254' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2.US.ORACLE.COM> select count(*) from user_tab_histograms where table_name = 'T';
 
  COUNT(*)
----------
        26
 
ops$tkyte@ORA9IR2.US.ORACLE.COM> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2.US.ORACLE.COM> select count(*) from user_tab_histograms where table_name = 'T';
 
  COUNT(*)
----------
         2
 
 

Set/Import stats - do not collect them

Vadim Gotovsky, April 20, 2005 - 4:55 pm UTC

Oren,

in your case the best solution is to either set stats (as Tom suggested) or, even easier, import stats, having exported them into a stats table earlier from a populated table.

We have a 500GB partitioned database where we do not collect stats at all. Every day we insert data into new partitions and immediately read from them, so there is no time to collect stats for the newly populated partitions. We have a stats table that contains stats exported from other populated partitions earlier (1 year ago). Right after populating new partitions we update the stats table with the new partition names and then import the stats. The data profile does not change significantly from partition to partition, so running queries always on the same stats is just fine.

Hope this helps
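
A sketch of the flow Vadim describes (the table, partition and stat-table names are hypothetical):

-- one time: create the holding table for statistics
exec dbms_stats.create_stat_table( ownname => user, stattab => 'STATS_TAB' );

-- capture stats from a representative, already-populated partition
exec dbms_stats.export_table_stats( ownname => user, tabname => 'FACT', partname => 'P_OLD', stattab => 'STATS_TAB' );

-- point the saved rows at the freshly loaded partition
-- (the partition name is held in column C2 of the stat table layout -- verify in your release)
update stats_tab set c2 = 'P_NEW' where c2 = 'P_OLD';
commit;

-- and apply them, no gathering needed
exec dbms_stats.import_table_stats( ownname => user, tabname => 'FACT', partname => 'P_NEW', stattab => 'STATS_TAB' );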

Thanks a lot!!

A reader, April 21, 2005 - 9:27 am UTC

Tom,

You could have just answered "yes" to my question. Instead, you replied with a simple and clear example. Brilliant!

A reader, May 14, 2005 - 6:00 pm UTC

Hi Tom,
we are using the Siebel CRM application version 7.7.2 with the cost based optimizer. Siebel has released the following alert. Let me know whether it has any merit or not.

---------------------------------------
Alert 1162: Sub-optimal Execution Plan can be Generated by the Oracle Optimizer if the Statistics are Generated on Empty Tables 
 

--------------------------------------------------------------------------------



Description

For a single SQL statement, there can be many execution paths that all provide the correct answer but differ significantly in execution time. The challenge for the Oracle optimizer is to quickly find a cost-effective plan. 
 
The cost based optimizer uses table, column and index statistics to determine the access path for a query. Database administrators (DBA) should gather statistics periodically in order to reflect the existent data in the database. 
 
However, there have been reported cases where the Oracle optimizer will generate a sub-optimal execution plan for a SQL statement if it includes tables that have 0 (zero) rows and the DBA has collected statistics on them. 
 
Change Request 12-WSLYQD has been logged to address this documentation enhancement request. 



Likelihood of Occurence

This behavior is likely to occur on queries that include tables which have 0 rows and for which statistics have been collected. 
 
This is only applicable when using the Oracle cost based optimizer mode. Please refer to the System Requirements and Supported Platforms Guide to evaluate if your Siebel application supports the Oracle cost based optimizer. Technical Note 582 contains additional information on Oracle CBO and Siebel version 7.7. 



Possible Symptoms

Users can experience some performance degradation as a consequence of the sub-optimal execution plan generated by the Oracle optimizer for the query affected. 



Workaround or Resolution

Delete the statistics on tables that are included on the affected SQL statement and has 0 rows. An example of the command used to delete the Oracle statistics is shown below: 
 
SQL> analyze table s_contact_ss delete statistics; 



Maintenance Release Number

Not Applicable  

------------------------------------------------------------------------

 

Tom Kyte
May 14, 2005 - 6:56 pm UTC

Umm, if the table has zero rows, this shouldn't be a problem.

It would be a problem if when the stats were gathered the table was empty - BUT when the query is executed, it actually has rows.

That is most likely the case here: they have some "temporary tables" in their schema that are empty when you gather stats (because they are only full when the application is actually running with them; the gather session sees them as empty).

So, they gathered stats on empty tables, but your session fills them up, and the optimizer -- believing that they are in fact empty -- generates the optimal plan for when they are empty. But since they are not, you get a "not so optimal" plan.

So, their solution is to delete the statistics on those tables, having the optimizer use DEFAULTS, which make the table appear non-empty instead of empty, and run with that.

It is one workaround, but they didn't word it very well.
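
In dbms_stats terms their workaround is simply (a sketch, using the table from their example):

exec dbms_stats.delete_table_stats( ownname => user, tabname => 'S_CONTACT_SS' );

-- or, rather than relying on defaults, tell the optimizer what to expect
-- (the numbers here are hypothetical placeholders):
exec dbms_stats.set_table_stats( ownname => user, tabname => 'S_CONTACT_SS', numrows => 1000, numblks => 100 );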

A related question...

A reader From Houston, May 20, 2005 - 12:52 pm UTC

Can we do the following? If not, why not? I don't understand why doing the following gives an error (see the end).
PROCEDURE kcmstat_job IS

    kcmOwner       VARCHAR2(100);
    tableExist     integer;
    burped         EXCEPTION;
    pragma exception_init(burped, -900);
    stmt           VARCHAR2(2000);
    errormsg       VARCHAR2(1000);
    l_theCursor    integer default dbms_sql.open_cursor;
    rows_processed integer;
    l_columnValue  varchar2(2000);

BEGIN
    -- get the schema owner
    SELECT user INTO kcmOwner FROM dual;

    SELECT NVL(min(rownum),0) INTO tableExist FROM user_tables WHERE table_name='KCMSTATS';

    IF (tableExist = 0) THEN
        stmt := 'DBMS_STATS.CREATE_STAT_TABLE(ownname => '''||kcmOwner||''', stattab => ''KCMSTATS'')';
        dbms_sql.parse (l_theCursor, stmt, dbms_sql.native);
        dbms_sql.define_column( l_theCursor, 1, l_columnValue, 2000 );
        rows_processed := dbms_sql.execute (l_theCursor);
        dbms_sql.close_cursor(l_theCursor);
    -- ELSIF
    -- Do something. Yeah we will perform other stuff here

    END IF;

EXCEPTION
    when burped then
        dbms_sql.close_cursor(l_theCursor);
        dbms_output.put_line('The kcmstat_job procedure failed during execution of:');
        dbms_output.put_line(SUBSTR(stmt,1,80));
        dbms_output.put_line(SUBSTR(stmt,81,160));
        dbms_output.put_line(SUBSTR(stmt,161,240));
        dbms_output.put_line('Oracle error is:');
        dbms_output.put_line(SUBSTR(SQLERRM,1,80));
        dbms_output.put_line('-');
        dbms_output.put_line('--------- WARNING WARNING WARNING -----------');
        dbms_output.put_line('-');
        dbms_output.put_line('The SQL statement that you are trying to process is an invalid statement.');
        dbms_output.put_line('-');

END kcmstat_job;

-- ************************* End of Functions/Procedures Definition **************
BEGIN -- start of package main block
    ROLLBACK;
END kcmstat;

This is throwing the following error. Why?
The kcmstat_job procedure failed during execution of:
DBMS_STATS.CREATE_STAT_TABLE(ownname => 'MYTEST_TEST', stattab => 'KCMSTATS')
Oracle error is:
ORA-00900: invalid SQL statement
-
--------- WARNING WARNING WARNING -----------
-
The SQL statement that you are trying to process is an invalid statement.
-

PL/SQL procedure successfully completed.


Tom Kyte
May 20, 2005 - 6:40 pm UTC

stmt := 'begin dbms_stats....... ); end;';


and please USE BINDS

but one question: why dynamic SQL AT ALL????? It is not even needed or desirable here. Just call dbms_stats directly.
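
Spelled out, either of these would work (a sketch of the two suggestions, reusing the stmt and kcmOwner variables from the code above):

-- fix the dynamic version: wrap the call in an anonymous block and bind
stmt := 'begin dbms_stats.create_stat_table( ownname => :o, stattab => :t ); end;';
execute immediate stmt using kcmOwner, 'KCMSTATS';

-- or, far simpler -- no dynamic SQL at all:
dbms_stats.create_stat_table( ownname => kcmOwner, stattab => 'KCMSTATS' );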

A related question...thanks

A reader From Houston, May 23, 2005 - 3:07 pm UTC

Why dynamic SQL? Well, sometimes we get stuck with an inefficient design and cannot change it, as that would break the consistency/rules of the existing application. I agree with you that we don't need dynamic SQL here and just need a call to dbms_stats. In my case the parse and execute are defined in another package that handles all other job-related stuff like DML and DDL queries. So we have to use this inefficient approach... and yes, I will use binds.

Thanks for the help! I hate to ask debugging questions :)

Newbie, June 22, 2005 - 7:45 am UTC

Hi Tom,

Is there a "best" time to execute dbms_stats.gather_table_stats? Does the database need to be idle?

Tom Kyte
June 22, 2005 - 1:55 pm UTC

no and no :)

probably during "off peak time" could be considered "best", but the database need not be idle

estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE

Craig, July 05, 2005 - 5:41 pm UTC

Tom,

Any way for us to peek at the actual percentage(s) Oracle uses/has used when we use DBMS_STATS.AUTO_SAMPLE_SIZE?

Tom Kyte
July 05, 2005 - 5:44 pm UTC

the number of sampled items is in the dictionary.

ops$tkyte@ORA10GR2> desc user_tables
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 TABLE_NAME                               NOT NULL VARCHAR2(30)
 TABLESPACE_NAME                                   VARCHAR2(30)
 CLUSTER_NAME                                      VARCHAR2(30)
 IOT_NAME                                          VARCHAR2(30)
....
 SAMPLE_SIZE                                       NUMBER
 LAST_ANALYZED                                     DATE
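
So something like this approximates the percentage actually sampled (a sketch -- num_rows is itself scaled up from the sample, so the ratio is approximate):

select table_name, sample_size, num_rows,
       round( 100 * sample_size / num_rows, 1 ) approx_pct
  from user_tables
 where sample_size is not null
   and num_rows > 0;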
 

analyze table...query is not using indexes !!!

Utpal, July 23, 2005 - 6:21 am UTC

hi,

there is a 300GB database, and I am doing performance tuning of the application. The tables have millions of rows.
I created indexes to get better query results, and after that I analyzed the tables.
Before analyzing the tables the queries were using the indexes; now they are not using the indexes at all, and that is degrading performance.
But some queries that are not using indexes are also giving better performance... I am confused about whether I have to analyze the tables or not.

I used: analyze table <table_name> estimate statistics.

Please suggest: do we have to analyze the tables? Any other suggestions or guidance on doing the performance tuning are welcome.

Thanks in advance!



Utpal


Tom Kyte
July 23, 2005 - 10:36 am UTC


first, dbms_stats is the tool to use to gather statistics.

What you'll want to do is gather an autotrace/explain plan for some of the poorly performing queries and a tkprof for the same. (keyword here is "you", not me :)

compare the estimated cardinalities from the explain plans to the REAL cardinalities in the tkprof -- are they even remotely close? If not, that is what needs to be fixed first.
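
A sketch of that workflow in SQL*Plus (the script and trace file names are placeholders):

-- 1) estimated cardinalities:
set autotrace traceonly explain
@your_query.sql
set autotrace off

-- 2) actual cardinalities:
alter session set sql_trace = true;
@your_query.sql
alter session set sql_trace = false;

-- 3) format the trace file written to user_dump_dest, e.g.:
-- $ tkprof ora_12345.trc report.txt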

Utpal, July 26, 2005 - 11:30 pm UTC

Hi,

I did what you said earlier, and there is no difference between the explain plan and the tkprof.

but my problem is:

1) Before analyzing the tables (query listed below with explain plan - Example-1), the query used the index.
But after doing the analyze (see Example-2 below), the same query is not hitting the index,
and when I deleted the statistics on the same tables, the query used the index again.

So while doing performance tuning, do I need to analyze the tables or not???

Example-1:
SELECT a.strPolNbr,
b.nBenefitNo,
a.dtpolIssue,
a.dtpolcomc,
b.dtProdComc,
b.dtriskcomc,
DECODE(a.npmtmode,99,1,a.npmtmode) npmtmode,
a.npmttype,
a.npolstatcd,
b.nProdType,
c.dmodalfactor,
SUM(NVL(b.DTABULARPRM,0)) org_prem,
SUM(NVL(b.dsa,0)) org_sa
FROM com_pol_prod_dtl_org b,
com_policy_m_org a,
prd_modal_factor c
WHERE b.dtProdComc >= '1-jan-05'
AND b.dtProdComc <= '31-dec-05'
AND a.npolstatcd NOT IN (SELECT nstatus
FROM chm_perst_pol_dmr
WHERE strpersttype = 'BNMPERST_1'
)
AND ((a.npmtmode != 99 AND 0 = 0) OR (0 = 1 ))
AND a.nPmtMode = c.nPmtMode
AND a.nPmtType = c.nPmtType
AND b.strPolNbr = a.strPolNbr
AND b.nProdType = c.nProdType
GROUP BY a.strPolNbr,
b.nBenefitNo,
a.dtpolIssue,
a.dtpolcomc,
b.dtProdComc,
b.dtriskcomc,
a.npmtmode,
a.npmttype,
a.npolstatcd,
b.nProdType,
c.dmodalfactor
ORDER BY 1,2,3,4,5

and explain for above query is
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 404 K 2773032
SORT ORDER BY 404 K 24 M 2773032
CONCATENATION
TABLE ACCESS BY INDEX ROWID CMS.FIN_JV_STAGE 55 K 434 K 1816
NESTED LOOPS 55 K 434 K
TABLE ACCESS BY INDEX ROWID CMS.COM_ACC_PAY 289 K 15 M 152
INDEX RANGE SCAN CMS.IDX_ACC_PAY_05 371 K 56
INDEX RANGE SCAN CMS.IDX_FIN_JV_STAGE1 55 K 121
HASH JOIN 404 K 24 M 19235
TABLE ACCESS BY INDEX ROWID CMS.FIN_JV_STAGE 55 K 434 K 1816
INDEX RANGE SCAN CMS.FIN_JV_STAGE 55 K 121
TABLE ACCESS BY INDEX ROWID CMS.COM_ACC_PAY 289 K 15 M 152
INDEX RANGE SCAN CMS.IDX_ACC_PAY_05 371 K 56


Example-2:
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

SELECT STATEMENT Optimizer Mode=CHOOSE 8 M 4124682
SORT ORDER BY 8 M 493 M 4124682
HASH JOIN 8 M 493 M 1370897
TABLE ACCESS FULL CMS.FIN_JV_STAGE 55 K 434 K 4177
TABLE ACCESS FULL CMS.COM_ACC_PAY 5 M 309 M 1051700



Utpal.


Tom Kyte
July 27, 2005 - 9:19 am UTC

*are the CARDINALITIES* correct? (Not "are the plans identical" -- that we sort of expect -- but are the ESTIMATED (autotrace) cardinalities even remotely near the ACTUALs (tkprof)?)


Yes, in order to use the CBO, you must have proper statistics.

Utpal, July 27, 2005 - 11:33 pm UTC

I am not getting what you want me to do in this respect...

Tom Kyte
July 28, 2005 - 8:47 am UTC

take an autotrace (full of ESTIMATED card= values)
take a tkprof (full of ACTUAL cardinalities)

compare them: are they even remotely similar? Is the optimizer GUESSING the right card= values given your statistics?

If not, that is a problem and we'll need to figure out "why"

Index doesn't do parallel..!

Shaik, August 06, 2005 - 10:24 pm UTC

Tom,

one of the dbms_stats features is the ability to do the job in parallel; for a table, 'column size 1' ensures that the job is done in parallel.
Is there any option to do the same for an index?
And how about while doing schema stats?

Tom Kyte
August 07, 2005 - 9:20 am UTC

that does not "ensure", in much earlier releases it used to be a prerequisite.

the dbms_stats documentation includes gather_index_stats
</code> http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1001337 <code>
which begins with:

GATHER_INDEX_STATS Procedure

This procedure gathers index statistics. It attempts to parallelize as much of the work as possible
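
So a degree can be supplied for an individual index, or for the schema gather as a whole (a sketch; T_IDX is a hypothetical index name):

exec dbms_stats.gather_index_stats( ownname => user, indname => 'T_IDX', estimate_percent => 15, degree => 4 );

exec dbms_stats.gather_schema_stats( ownname => user, estimate_percent => 15, degree => 4, cascade => true );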

dbms_stats slow..

Sabir, August 11, 2005 - 11:59 pm UTC

Tom,
In my env we have around 400 tables and we usually analyze using "analyze table < > estimate statistics 15 percent", which takes about half an hour to complete.

This time we tried with
exec dbms_stats.gather_schema_stats( -
10:07:29 > ownname => 'PARTNERDB', -
10:07:33 > estimate_percent => dbms_stats.auto_sample_size -
10:07:38 > )

PL/SQL procedure successfully completed.

Elapsed: 02:29:33.96

this we did on the schema itself... see the time it takes: 2.5 hours. Kindly let me know the workaround.



Tom Kyte
August 12, 2005 - 8:35 am UTC

try starting with the same inputs, if you want the same outputs or something to compare to.



dbms_stats slow..

Sabir, August 14, 2005 - 9:45 pm UTC

Tom,

I didn't catch your answer... anyway, I tried again and it takes 1 hr 15 min to complete,
but that is still slower than analyze.
Is there anything I have to do to speed up dbms_stats?
Please pardon me and give an answer.

Tom Kyte
August 15, 2005 - 6:55 am UTC

try doing the dbms_stats with parameters equivalent to what you are comparing to -- the analyze.


You used to estimate 15%; do the same with dbms_stats.
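
That is, something like this mirrors the old analyze inputs (a sketch):

exec dbms_stats.gather_schema_stats( ownname => 'PARTNERDB', estimate_percent => 15 );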


What do you think of this analyze script?

Alf, August 15, 2005 - 1:38 pm UTC

Hi Tom,

I'm trying the suggestion in your follow-up to Alan's post.
I need to analyze the objects for 3 schemas in a DW environment. So I tried to enclose a procedure within a package as shown below. Later I'll try setting this up in the dbms_job scheduler to have it run every Monday morning at 1:00am.

Is there any recommendation to better accomplish this?

create or replace package pkg_analyze_schema
as
procedure analyze_schema;
end;

create or replace package body pkg_analyze_schema
as
procedure analyze_schema
is
begin

for x in ( select username
from dba_users
where username not in ( 'SYS', 'SYSTEM' ) )
loop
dbms_stats.gather_SCHEMA_stats(
ownname=>'schema_one',
tabname=>'big_fact_table',
partname=>'NULL',
estimate_percent=> 25,
degree => 3,
cascade => TRUE)
end loop;
end;
end;

Tom Kyte
August 15, 2005 - 10:20 pm UTC

you said "for 3 schemas", so -- why all but sys and system -- why not the 3?

you are not passing in the schema name, you are always sending in 'schema_one'

You don't give table names with gather_schema_stats

leave out partname as well.

degree 3? that's a strange pick.

What do you think of this analyze script?

Alf, August 16, 2005 - 10:15 am UTC

Hi Tom,

Thanks for your help
The tabname=>'big_fact_table' is something I found in documentation left by the dev/vendor; I left it out in the revised version below.
degree => 3 was the vendor's recommendation as well; what would be a good/better degree?

What about this revision?

create or replace package pkg_analyze_schema
as
procedure analyze_schema;
end;

create or replace package body pkg_analyze_schema
as
procedure analyze_schema
is
begin
for x in ( select username
from dba_users
where username in('schema_one', 'schema_two','schema_three' ) )
loop
dbms_stats.gather_SCHEMA_stats(
ownname=>'x',
estimate_percent=> 25,
degree => 3,
cascade => TRUE)
end loop;
end;
end;

Tom Kyte
August 17, 2005 - 10:44 am UTC

(when vendor makes recommendation, one must ask vendor what they were thinking when making such a recommendation)

ownname=>'x' will not *work*, think about it, it'll analyze 'x', not the schemas you ask to analyze.

missing semicolons, degree should probably "not be there" at all and 3 is a very strange number.

dbms_stats slow..!

sabir, August 17, 2005 - 4:48 am UTC

Tom,

As you said, I changed the script to a 15% estimate;
it is greatly reduced, to 12 min, where analyze would take 30 min.

Great..!

Kindly let me know why it takes so much time when we hand over the percentage decision to Oracle itself, as recommended.





Tom Kyte
August 17, 2005 - 1:17 pm UTC

it decided "full is OK" would be the likely answer. it is software, it doesn't really care how long it takes.

What do you think of this analyze script?

Alf, August 17, 2005 - 4:17 pm UTC

Hi Tom,

It's not that easy; as a liaison between vendor and management I have written down things I must ask the vendor. It would be just great if I could get straight answers any time I ask the vendor to explain something. Their usual reply is that they'll get back to us. So here I am, guessing around how to fix things with little or no proper documentation for our DBAs.

Back to the package I'm trying to put together to analyze some schemas' objects.

Based on some research I did in the PL/SQL guide, I was missing the proper setting of the x variable, and since I don't really understand yet what a recommended degree # would be, I just cut it out.

Would you please point out what I'm still missing? Thanks.


create or replace package pkg_analyze_schema
as
procedure analyze_schema;

end;

create or replace package body pkg_analyze_schema
as
procedure analyze_schema
is
user_x varchar2 := x
begin

for x in ( select username
from dba_users
where username in ('schema_one','schema_two','schema_three' ) )
loop
dbms_stats.gather_SCHEMA_stats(
ownname=>':user_x',
estimate_percent=> 25,
cascade => TRUE)
end loop;
end;
end;

Tom Kyte
August 17, 2005 - 5:17 pm UTC

for x in ( select username
             from dba_users
            where username in ('schema_one','schema_two','schema_three' ) )
loop
    dbms_stats.gather_schema_stats(
        ownname          => x.username,
        estimate_percent => 25,
        cascade          => TRUE );
end loop;


remember, schemas are case sensitive.
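
And for the "every Monday at 1:00am" schedule mentioned above, a dbms_job sketch (the NEXT_DAY call assumes an English NLS date language):

declare
    l_job number;
begin
    dbms_job.submit( job       => l_job,
                     what      => 'pkg_analyze_schema.analyze_schema;',
                     next_date => trunc(next_day(sysdate,'MONDAY')) + 1/24,
                     interval  => 'trunc(next_day(sysdate,''MONDAY'')) + 1/24' );
    commit;
end;
/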

Brilliant! What do you think of this analyze script?

Alf, August 18, 2005 - 3:34 pm UTC

Thanks a billion Tom,

Thanks for case sensitive reminder.

Cheers
Alf


Can we use analyze for verification ?

Yogesh Purabiya, August 28, 2005 - 5:33 am UTC

We have an old (Legacy) Materials System
on Oracle 7.3 HP-UX 11 - Instance O73
Legacy - that is, no more in production.
It has only READ-ONLY history data.

Two users : stores (nearly 250 tables & 700 MB )
& purchase (nearly 75 tables & 300 MB )

We are now migrating to Oracle 8.1.x
on windows/2003 - Instance O81

- Some tables were copied by exp / imp.
- Most of the tables were CREATEd-AS-SELECT-FROM-@DB_LINK
- Some tables were created with Partitioning (on O81, of course!)

Then, to verify whether the data are properly copied,
I carried out the following steps:

(1) exec DBMS_UTILITY.ANALYZE_SCHEMA (user, 'Compute')

(both O73 and O81 and both the users in both the instances)

(2) Took difference as stores (as well as purchase) user on O81

SELECT Table_Name, Num_Rows, Num_Distinct, Low_Value, High_Value
FROM User_Tab_Columns@o73
MINUS
SELECT Table_Name, Num_Rows, Num_Distinct, Low_Value, High_Value
FROM User_Tab_Columns

(3) Then, took difference as stores (as well as purchase) user on O73

SELECT Table_Name, Num_Rows, Num_Distinct, Low_Value, High_Value
FROM User_Tab_Columns@O81
MINUS
SELECT Table_Name, Num_Rows, Num_Distinct, Low_Value, High_Value
FROM User_Tab_Columns

(4) I found that
- Num_Rows are all equal - no rows missing OR added
- Low_Value and/or High_Value and/or Num_Distinct are not matching in many tables
- The differences were found for the tables with
- exp/imp
- CREATE-AS-SELECT-FROM
- Without / With Partition

(5) For some tables where I found the differences (see 4 above)
I checked the whole tables as follows.
But, found no difference (no rows selected)

(5.1)
SELECT *
FROM <table-name>@O73
MINUS
SELECT *
FROM <table-name>

(5.2)
SELECT *
FROM <table-name>@O81
MINUS
SELECT *
FROM <table-name>


While going through this page,
I find that it is NOT necessary to 'DELETE' statistics before 'COMPUTE'
(See your reply to the review by Irfan from Kuwait - October 08, 2002 above)

What may be the reason ?


Tom Kyte
August 28, 2005 - 5:56 am UTC

In my reply, I said just to gather the schema stats? You need not delete before gathering new (that would leave you without stats during the gather!)

But anyway, I don't really see a question here?

Can we use analyze for verification ?

Yogesh Purabiya, August 29, 2005 - 10:17 am UTC

The question is "Why are there discrepancies?"

Though the "MINUS" operations show that the table contents are
identical, the Low_Value and/or High_Value and/or
Num_Distinct are not matching for many tables.

What may be the reason ?





Tom Kyte
August 29, 2005 - 1:50 pm UTC

as I said "i did *not* really see a clear question here"

you are using analyze (dbms_utility is analyze). Not sure if it should return the same exact values given different versions, no reason it should (bug fixes, adjustments, etc)

both versions are so very old as well, hard to say what could have changed between 7.3 and 8.1.7.

How those values are computed ?

Yogesh Purabiya, August 30, 2005 - 12:40 pm UTC

So, values of Low_Value, High_Value and Num_Distinct do not depend *only* on the data (values), but there are some functions applied on the data; and these functions may vary from version to version; is it so ?

Or, analyze is less reliable compared to gather_stats ?


Tom Kyte
August 30, 2005 - 1:38 pm UTC

you are going back to 7.3.

the web as we know it did not really exist.
commercials did not include a URL.
we used stamps to get stuff in the mail.
this girl was just being born (yup, my daughter)
</code> http://photos23.flickr.com/35895898_b212ecd519.jpg <code>

1995 - anything can change, you are talking a decade into the past here.

Heck ROWIDS looked different back then.

ORA-01039: insufficient privileges on underlying objects of the view

Shailesh Saraff, August 31, 2005 - 2:28 am UTC

Hello Tom,

Thank you very much for your replies. We have written a function which is used in our application to check the cost of queries. Although not helpful, the application has passed queries on the data dictionary to this function, and this raised the error "ORA-01039: insufficient privileges on underlying objects of the view". Our user has the required privileges on view <DBA_TAB_COLUMNS>; on Oracle 8i it was working, but when we migrated from 8i to 10g it gives this error. In <USER_TAB_PRIVS> the following privileges exist:

On 8i: INSERT, UPDATE, DELETE, SELECT
On 10g: INSERT, UPDATE, DELETE, SELECT, REFERENCES, ON COMMIT REFRESH, FLASHBACK, DEBUG, QUERY REWRITE

Is there any privilege missing for our user to access DBA_TAB_COLUMNS? Query and sample function code given below please help us to solve this issue.

SELECT Getcost('SELECT * FROM SYS.Dba_Tab_Columns WHERE Table_Name = ''TABLENAME''') FROM Dual;

-----------------------------------------------------
FUNCTION GetCost(Sqltext In Varchar2) RETURN NUMBER is
PRAGMA AUTONOMOUS_TRANSACTION;
nCost NUMBER;
nPlanVal NUMBER;
Name Varchar2(30) := NULL;
Stmt Varchar2 (4000) := NULL;

Type Cur1 IS REF Cursor;
C1 Cur1;
Begin

Execute Immediate 'SELECT SeqExplainPlan.Nextval FROM Dual' Into nPlanVal;

Name := 'I1PLAN'||nPlanVal;

Stmt:= 'Explain Plan Set Statement_Id=' ;
Stmt:= Stmt||''''||Name||''''||' For '||Sqltext;
Execute Immediate Stmt;
Commit;

Open C1 for
SELECT Cost
FROM Plan_Table
WHERE Statement_Id = Name
AND Id = 0;
Fetch C1 into nCost ;
Close C1;
RETURN nCost;

End;
/


Thanks & Regards,

Shailesh

Tom Kyte
August 31, 2005 - 1:20 pm UTC

</code> http://asktom.oracle.com/pls/ask/search?p_string=O7_DICTIONARY_ACCESSIBILITY <code>

the dictionary is much more secure in 9i and 10g than in 8i by default.

Thanks a ton for the suggestions !!

Shailesh Saraff, September 01, 2005 - 3:16 am UTC

GRANT SELECT ANY DICTIONARY TO <User> ;

has solved my problem.

Regards,

Shailesh

How to check table is analyzed or not

krishnakant Nirala, September 05, 2005 - 7:00 am UTC

I want to check whether a table is analyzed or not - how can I check?

I have 378 tables, of which 207 tables are analyzed and the rest are not analyzed.


Tom Kyte
September 05, 2005 - 11:00 am UTC

see the last_analyzed column in the dictionary view for user_tables
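for example, a minimal query to list the tables that have never been analyzed:

select table_name, last_analyzed
  from user_tables
 where last_analyzed is null;   -- NULL means never analyzed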

atul, September 30, 2005 - 1:58 am UTC

Hi,

why are the stats different when using table_stats vs. schema stats?

We have a situation where performance is slow after doing SCHEMA_STATS, but if we do individual TABLE_STATS then performance gets good.

We are using oracle 9.2.0.6.

Could u tell us?

Thanks,
Atul

Tom Kyte
September 30, 2005 - 9:03 am UTC

"u" isn't here, I've yet to meet that person (don't even know the gender! I'm curious though)

you need to give an "example", show me the stats being different between the two and the precise commands involved.

Can the "cardinality" hint solve my problem?

Oren Nakdimon, January 07, 2006 - 7:26 pm UTC

Tom, I've recently discovered the "cardinality" hint in one of your answers. I believe this is exactly the answer to my question from 19-April-2005 in this thread, isn't it?
Thanks,
Oren.

P.S.
It's great you're coming to Israel. I'm waiting for your lectures excitedly.

Tom Kyte
January 08, 2006 - 11:37 am UTC

No, I think dbms_stats and getting the correct statistics is what you are looking for.

Reasoning:

a) cardinality hint is not documented

b) you have real tables - therefore, you can set real statistics in the dictionary (the only time I've shown how to use the cardinality hint is for PIPELINED functions - for which there is no reasonable alternative, you cannot set stats on them)

c) you have the information necessary to set the stats, so go ahead and do that.

SQL

Suresh N, January 10, 2006 - 1:33 am UTC

what is the use of Analyse table with estimate Statistics
in Oracle?


Tom Kyte
January 10, 2006 - 8:09 am UTC

depending on the other options you use with it - it would be the version 8.0 and before way of gathering statistics on table/index/set of columns using a sample (estimation).

Consider it deprecated in favor of dbms_stats with 8i and above.

Index Vs Table Stats

Mita, January 20, 2006 - 9:27 am UTC

Tom,

which is better in case I need to get both table and index stats ??

1. gathering index stats & gathering table stats
2. gathering table stats with cascade option for index stats

Thanks
Mita


Tom Kyte
January 20, 2006 - 10:36 am UTC

depends, in 10g if you

a) create table
b) load table
c) gather table stats
d) create indexes (which compute stats as they are created)

no cascade needed.... (in 9i, create index does not gather stats, 10g it does)

neither is "better", both are a "choice"

if you want different options for each - you have less choice (you need to do them independently of course)

Follow up

Mita, January 20, 2006 - 10:49 am UTC

Tom,

Thanks.

if we are using same options for both gather_stats, would there be any difference in time it takes in both cases ??

automatic stats gathering on index creation is from 10gR1 or 10gR2 ??



Tom Kyte
January 20, 2006 - 12:29 pm UTC

I would expect it to be about the same - but I have never personally benchmarked it


ops$tkyte@ORA10G> select * from v$version;
                                                                                                                                                          
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Prod
PL/SQL Release 10.1.0.4.0 - Production
CORE    10.1.0.4.0      Production
TNS for Linux: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production
 
ops$tkyte@ORA10G> @test
ops$tkyte@ORA10G> drop table t;
 
Table dropped.
 
ops$tkyte@ORA10G> create table t as select * from all_users;
 
Table created.
 
ops$tkyte@ORA10G> create index t_idx on t(user_id);
 
Index created.
 
ops$tkyte@ORA10G> select num_rows from user_indexes where index_name = 'T_IDX';
 
  NUM_ROWS
----------
        39
 
ops$tkyte@ORA10G>
 

gather_table_stats and gather_index_stats

Michel Cadot, January 20, 2006 - 11:30 am UTC


A small difference between gather_table_stats with cascade=true and gather_index_stats is that in the former case index statistics gathering is not parallelized.

Regards
Michel

Su Baba, March 13, 2006 - 7:00 pm UTC

Why does dbms_stats.gather_table_stats not populate user_tables.avg_space and user_tables.avg_row_len the way "analyze table" does?


SQL> CREATE TABLE in_row_t (
  2  id     INT PRIMARY KEY,
  3  in_row CLOB
  4  )
  5  LOB (in_row) STORE AS (ENABLE STORAGE IN ROW);

Table created.

SQL> 
SQL> INSERT INTO in_row_t
  2  SELECT ROWNUM, RPAD('x', 1000, 'x')
  3  FROM   all_objects
  4  WHERE  rownum < 100;

99 rows created.

SQL> commit;

Commit complete.

SQL> 
SQL> BEGIN
  2        dbms_stats.gather_table_stats(
  3           ownname => user,
  4           tabname => 'IN_ROW_T',
  5           cascade => TRUE
  6        );
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL> 
SQL> SELECT num_rows, avg_space, chain_cnt, avg_row_len
  2  FROM   user_tables
  3  WHERE  table_name = 'IN_ROW_T';

  NUM_ROWS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ---------- -----------
        99          0          0          90

SQL> ANALYZE TABLE in_row_t COMPUTE STATISTICS;

Table analyzed.

SQL> 
SQL> SELECT num_rows, avg_space, chain_cnt, avg_row_len
  2  FROM   user_tables
  3  WHERE  table_name = 'IN_ROW_T';

  NUM_ROWS  AVG_SPACE  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ---------- -----------
        99       2888          0        1045

 

Tom Kyte
March 14, 2006 - 10:05 am UTC

dbms_stats gathers only things used by the CBO - it does not compute things like chain_cnt for example that the CBO doesn't use. And it gathers these statistics using algorithms the CBO is expecting, as opposed to analyze which gathers things the CBO doesn't use and does not gather them entirely in the manner the CBO is expecting.

dbms_stats doesn't run

Elaine H, June 13, 2006 - 11:04 am UTC

Issued the following:

exec dbms_stats.gather_table_stats(ownname=>'DI', -
tabname=>'DI_NAME2', -
estimate_percent=>'.1', -
method_opt=>'FOR ALL INDEXED COLUMNS', -
degree=>'9', -
cascade=>FALSE);

at 3am -- it didn't error but nothing appears to be happening (currently 10:50am).

the table against which it is processing is about 3.5 billion rows partitioned 9 ways. Any ideas on what went wrong or what to look at?

Thanks.


Tom Kyte
June 13, 2006 - 12:29 pm UTC

is it still "running"

don't know what you mean...

running

Elaine H, June 13, 2006 - 1:17 pm UTC

at the time that i originally posted, the session was still active in the database. the gather process took 8.5 hours.


Tom Kyte
June 13, 2006 - 5:02 pm UTC

so, now it is done?

3.5 billion rows is not meaningful

4.2 terabytes - or whatever - that would be...

purely a function of the size of the data to be read and the number of histograms you are computing (which is not clear either)

but now you know how long it takes.. given the methods you used to gather stats.

size and histograms

Elaine H, June 14, 2006 - 9:21 am UTC

the actual size of the table is now around 3.8 billion rows and takes up roughly 2.6 Tb of space.

There are 17 distinct indexed columns; these appear in 9 indexes. We also ran statistics for these 9 indexes and that took approximately 1.5 hours.

Using your previously provided script:
1 select t.num_rows, i.num_rows, c.cnt
2 from (select num_rows from user_tables where table_name = 'DI_NAME2') t,
3 (select num_rows from user_indexes where table_name = 'DI_NAME2') i,
4 (select count(distinct column_name) cnt from user_tab_histograms
5* where table_name = 'DI_NAME2') c

I got the following results:
NUM_ROWS NUM_ROWS CNT
---------- ---------- ----------
3824663000 3862146000 17
3824663000 3101536000 17
3824663000 3318100000 17
3824663000 556190000 17
3824663000 3858204000 17
3824663000 3818052000 17
3824663000 3851250000 17
3824663000 3833374000 17
3824663000 3832497000 17


What I was asking is whether there was a more efficient and less time consuming method for gathering statistics. Support wants us to drop the statistics for table and indexes to run a test against our production and I would like to cut down on the 10 hours that it will take to rerun them.

Tom Kyte
June 14, 2006 - 12:31 pm UTC

you can attempt using block sampling instead of row sampling.

row sampling reads the entire table and just looks at "x % of the rows"

block sampling will read only "x %" of the allocated space (but beware, can be subject to issues with badly skewed data because of the different sampling technique)
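for what it is worth, a sketch of the same gather using block sampling - block_sample is a boolean parameter of dbms_stats.gather_table_stats, and the other values just mirror the call posted above:

begin
    dbms_stats.gather_table_stats(
        ownname          => 'DI',
        tabname          => 'DI_NAME2',
        estimate_percent => 0.1,
        block_sample     => TRUE,    -- sample blocks rather than rows
        method_opt       => 'FOR ALL INDEXED COLUMNS',
        degree           => 9,
        cascade          => FALSE );
end;
/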



RE: size and historgrams

Lance Hoover, June 14, 2006 - 2:11 pm UTC

If the database is 10 (not sure what point release), couldn't they just use this:

</code> http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i41814 <code>

and re-apply the old statistics?

Tom Kyte
June 15, 2006 - 8:14 am UTC

how so?

gather_table_stats strategy

A reader, August 20, 2006 - 7:43 pm UTC

hi tom,

we have a datawarehousing environment where the DW gets loaded weekly. Currently the strategy we follow during the load is

- diable indexes
- load data in the table (fact, dimenion, aggregate)
- rebuild indexes
- analyze using (dbms_stats.gather_table_stats)
method_opt -> 'FOR ALL INDEXED COLUMNS SIZE 1'
estimate_percent -> 5
degree -> 5

since we have large tables in the WH - the overall time added by dbms_stats.gather_table_stats for all tables is close to 2 to 3 hours.

since the load to the WH is weekly - i want to change the process as follows

- middle of the week, during off hours do
dbms_stats.gather_table_stats
- then in the weekly load - after the tables are loaded etc
dont do dbms_stats.gather_table_stats

so stats are in effect one 'cycle' behind - but since the loads are incremental - i have a feeling that should be ok,

i would like to hear your thoughts on the approach - and if you see a big no-no even in trying this approach,

thanks as always for your insightful answers and comments.

Tom Kyte
August 27, 2006 - 8:55 am UTC

I'm not sure why you are using "for all indexed columns size 1", what was the thought process behind that.

define "incremental" for me.

and tell me why gathering in the middle of the week is better than right after the load or whatever? I mean, if you are going to run on stale statistics (you are entirely rebuilding indexes and such), why wait, you can query whilst stats gathering is taking place.

gather_table_stats strategy follw-up

A Reader, August 27, 2006 - 11:29 pm UTC

Hi Tom

Thanks for your response - my answers and follow-up q's
preceded with --> below :

I'm not sure why you are using "for all indexed columns size 1", what was the thought process behind that.

-->ALL INDEXED COLUMNS
-->being a DW environment most columns which are used in
-->predicates are indexed - i assume the histograms on
-->these would be most helpful

-->SIZE 1
-->i had picked SIZE 1 based on the documentation that
-->you have to specify SIZE as 1 to build histograms
-->in parallel

-->actually i am very confused about this option - one
-->basic question i have is - does the SIZE parameters
-->affect for which columns the histograms are built?
--> OR
-->does it only specify the number of buckets in each
-->histogram
-->OR
-->both of the above

--> and what about using SIZE 1 for parallel execution

-->will greatly appreciate your clarification on this

define "incremental" for me.

-->the warehouse has 260 weeks of data (5 years) - each
-->weekly load is adding one week of very similar
-->data to the fact tables and dimension tables. Even
-->though the number of rows in most tables would likely
-->increase each month - the data profile (eg skewness of
-->dept in the employee table) is not going to change from
-->month to month. that is the assumption which makes
-->me feel that rather then spending time regathering stats
-->as the weekly load is running - its worth trying to load
-->with stale stats - once the data is loaded - it can be
-->released to users - and then recalc stats as you suggest
-->- immediately after the load is over.
-->Have you seen this done ? with any success ?

and tell me why gathering in the middle of the week is better than right after the load or whatever? I mean, if you are going to run on stale statistics (you are entirely rebuilding indexes and such), why wait, you can query whilst stats gathering is taking place.

--> agreed, its best to 'refresh' immediately after
--> processing the weekly load


Tom Kyte
August 28, 2006 - 8:51 am UTC

all indexed columns
- being a data warehouse, that doesn't seem right, as end users tend to ad-hoc
the system. but so be it, if that is true in your case - it would be "not
the usual case"

size 1
- is like saying "no histograms", it is the buckets. it used to be that size 1
was necessary for parallel but hasn't been for a while.


incremental

- but one would presume that high and maybe low values would change dramatically, such as the DATES themselves.

DBMS_STATS does not generate information about chained rows

koshal, November 01, 2006 - 11:46 am UTC

Quote
"DBMS_STATS does not generate information about chained rows and the structural integrity of segments."

Is this going to affect the execution plan? If not, why does the analyze command gather these stats? Where is the count of chained rows going to help us?

Can you please explain structural integrity of segments?

Thanks
Koshal


Tom Kyte
November 01, 2006 - 6:14 pm UTC

the CBO never used chained row counts, that is why dbms_stats does not gather it, it is not anything the CBO ever looked at.

analyze can be used with "validate structure"; dbms_stats has no such option.

When/how the column CHAIN_CNT gets updated without analyze

koshal, November 02, 2006 - 11:44 am UTC


If DBMS_STATS does not generate information about chained rows, when/how does the column CHAIN_CNT in ALL_TABLES (the underlying table) get updated / how can it be updated?

If we want to know the number of chained rows, do we need to scan through the whole table again with analyze or a different package/procedure?

Can I infer that, with the CBO, whether the number of chained rows is 0.01% or 90% of the total rows, the plan is going to be one and the same?

what is the advantage of analyze with validate structure?

Thanks
Koshal


Tom Kyte
November 02, 2006 - 12:13 pm UTC

you need to use the analyze command to populate that column.

the CBO does not look at this number at all for anything, nothing


validate structure does what it says in the documentation it does, it is a structural integrity testing tool
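a sketch of both, using a hypothetical table T (LIST CHAINED ROWS needs the CHAINED_ROWS table created by the utlchain.sql script):

-- analyze with compute/estimate statistics populates chain_cnt:
analyze table t compute statistics;

select chain_cnt from user_tables where table_name = 'T';

-- or identify the individual chained/migrated rows:
analyze table t list chained rows into chained_rows;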

Great!

koshal, November 02, 2006 - 2:34 pm UTC

Thanks a lot!

Thanks
Koshal


HOW DBMS_STATS.GATHER_TABLE_STATS WORKS??

Anil, March 02, 2007 - 4:39 am UTC

why is gather table statistics taking a long time?
i have a table named QM_SAMPLE which has nearly 1 billion records. The total number of blocks is 7,761,455. There are no indexes on this table. I gather using:

exec dbms_stats.gather_table_stats(ownname => 'S_SAMPLE', tabname => 'QM_SAMPLE', estimate_percent => 5, cascade => true);

why is it scanning, sort/merging, and outputting the temp table repeatedly??
OPNAME TARGET SOFAR TOTALWORK TIME_REMAINING ELAPSED_SECONDS
Gather Table's Index Statistics  0 0 Indexes 0
Table Scan SYS.ORA_TEMP_1_DS_6159 120632 387628 387628 Blocks 15
Table Scan SYS.ORA_TEMP_1_DS_6159 120632 387628 387628 Blocks 12
Sort Output    181071 223536 37 157
Sort/Merge    223584 223584 0 195
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 151
Sort Output    60661 60661 0 48
Sort/Merge    60676 60676 0 81
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 38
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 25
Sort Output    221697 221697 0 199
Sort/Merge    221760 221760 0 264
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 100
Sort Output    119823 119823 0 111
Sort/Merge    119876 119876 0 122
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 82
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 27
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 15
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 15
Table Scan SYS.ORA_TEMP_1_DS_6159 387628 387628 0 190
Table Scan S_SAMPLE.QM_SAMPLE 7761455 7761455 0 1042
TOTAL TIME TAKEN: 59 MIN

now, when i change estimate_percent to 0.5%, the result is:
Gather Table's Index Statistics  0 0  0
Table Scan (stale) obj# 120631 38763 38763 0 16
Table Scan (stale) obj# 120631 38763 38763 0 8
Table Scan (stale) obj# 120631 38763 38763 0 20
Table Scan S_SAMPLE.QM_SAMPLE 7761455 7761455 0 880
TOTAL TIME: 17 min

So why is it doing multiple sort/merges and table scans? It would be very helpful if you would please explain how gather_table_stats works internally.

Tom Kyte
March 04, 2007 - 12:53 pm UTC

just turn on sql_trace=true and you'll see what it does in the trace file/tkprof report.
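that is, something along these lines (the gather call is the one you posted):

alter session set sql_trace = true;

exec dbms_stats.gather_table_stats( ownname => 'S_SAMPLE', tabname => 'QM_SAMPLE', estimate_percent => 5, cascade => true );

alter session set sql_trace = false;

-- then format the resulting trace file with tkprof to see
-- the recursive SQL dbms_stats actually ran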

Why removing SYS and SYSTEm stats

dieter, March 27, 2007 - 3:59 pm UTC

Hi Tom:

In this discussion a few times people discussed with you how to gather database statistics (DBMS_STATS.GATHER_DATABASE_STATS) and then removed (DBMS_STATS.DELETE_SCHEMA_STATS) the statistics for the SYS and SYSTEM schemas.
Why should this be done? I would have expected that proper statistics for SYS and SYSTEM are as important as the "own" schemas'.


Tom Kyte
March 28, 2007 - 11:20 am UTC

that depended on the release: 8i - not recommended on the dictionary; 9i - OK to have, as long as you tested before just gathering them in production for the first time; 10g - they will and should be gathered.
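in 10g you could, for example, gather them explicitly:

exec dbms_stats.gather_dictionary_stats;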

You are like GOD in Oracle.

A Reader, April 05, 2007 - 12:46 pm UTC

Thanks a lot Tom for your help. I appreciate your dedication and your support.

You are great Tom.

Analyze table for indexed columns only

Mahomed Suria, May 08, 2007 - 10:44 am UTC

Are there any circumstances where a user would compute statistics for indexed columns only?

e.g analyze table <table_name> compute statistics
for all indexed columns
Tom Kyte
May 10, 2007 - 8:34 pm UTC

not anymore, because we all know never to use analyze in 8i and up to gather statistics, dbms_stats is the way to go.

so, you might use dbms_stats and you might use this, but probably not.

Analyze schema after Nightly Refresh is taking too long

Murali, August 05, 2007 - 5:17 am UTC

Hi Tom,

We have a daily overnight RMAN refresh from Production to the Reporting database, which takes a backup of the production database and restores it to Reporting. After this we analyze one main schema which is used for reporting using
DBMS_STATS.GATHER_SCHEMA_STATS (ownname => 'XYZ', method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1', degree => 4, cascade => TRUE);

Earlier it used to take approx 3hrs. As the database grows, the time it takes to run the analyze also increases. So, our DBA truncated one duplicate table having some ten indexes on it. From the next day onwards ANALYZE has been taking around 6 to 7hrs. Users are reporting severely slow response during the analyze. The analyze should finish before business hours start, but it is running almost until evening, causing slow response.

Can you tell me whether truncating a table can cause the analyze to be that much slower? It is a critical database and we need to provide them a resolution ASAP.

Thanks for your help
Murali
Tom Kyte
August 05, 2007 - 2:16 pm UTC

why not estimate

and are you sure you need to gather statistics? aren't the ones from the original production database better than good enough?

Analyze schema after Nightly Refresh is taking too long

Murali, August 06, 2007 - 8:42 am UTC

I have suggested these options to the business. But my question here is: until 31st July the analyze with compute used to take 3hrs, whereas from 1st August it is taking more than 6hrs.

What might have caused this sudden increase in the time taken to complete the analyze? Please note that there were no bulk loads that day. The database growth is normal.

Thanks for your help
Murali.
Tom Kyte
August 06, 2007 - 12:06 pm UTC

what might have caused that?

oh, about a billion things - none of which I can guess accurately.


If you have been monitoring your database using statspack or AWR - you should be able to report out what is different between the two points.

if you have no information about how the system USED to work (when it was 3 hours), then it'll be virtually impossible to make any educated statements about what changed, you can hypothesize till the cows come home - but you'll never know.


Analyze schema after Nightly Refresh is taking too long

Murali, August 06, 2007 - 12:39 pm UTC

Thanks much for your help Tom

Murali

Table Analyze

Vinay, August 18, 2007 - 3:18 pm UTC

Hi Tom,
As part of a daily process, we need to execute several stored procedures in sequence.. prior to execution start, we analyze the schema. However, during the course of the run, the functions and procedures being run make massive changes to table data which would be required by the subsequent function(s). So I'd need to recompute statistics for individual tables and table indexes during the course of the run, am I right?

Also, I'd like your opinion on an approach which I've thought of to accomplish this. Prior to the start of a given procedure, in *parallel* to the execution of the preceding procedure, dynamically find out all the dependent snapshots and tables of the upcoming procedure (using user_dependencies), and compute stats for those tables and snapshots that will be used in the next procedure..

Another question I have regarding dbms_stats.gather_table_stats is: What happens if I try to gather stats on a table that is currently being manipulated by another query/function? Of course, statistics might not be proper since the other query might insert/delete data from the table, but is there any other harm that this might cause?

Thanks
Tom Kyte
August 22, 2007 - 10:17 am UTC

... am I right? ...

maybe yes
maybe no

It depends.


see
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:718620715884#408018600346441550

gathering stats on a table will not affect currently executing statements, their plans are already set and running.

Table locks during analyze

Vinay, August 21, 2007 - 12:41 am UTC

Hi Tom,
In addition to the question I've posted above, I have a question on table locks during execution of dbms_stats. My understanding was that gathering statistics only involves issuing queries to find out stuff about the table. This should not lock tables. However, I came across a web page where they mentioned that gathering statistics on indexes locks the tables of that index so that no DML or DDL would be permitted on the table.
Is that right? Does computing stats on indexes really lock the table on which the index is created?
Please do reply whenever you find time..

Thanks,
Vinay
Tom Kyte
August 22, 2007 - 11:20 am UTC

I come across lots of web pages.

sometimes I misread what is on them
sometimes they are wrong
occasionally I read them correctly and they are right

before I say "what a load of ...." - I'd want to see the quote in context.

No, it does not lock the table. ANALYZE validate structure might, but dbms_stats - no.


In the following, we leave a transaction open on table T - that will prevent another transaction from gaining a lock on table T obviously.

We then gather stats and commit in another transaction - and just to prove it is another transaction - we query T and see two rows - rollback and see just one.

And the num_rows returned from the dictionary reflect just a single row.

So, no table lock


ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t ( x int, y int );

Table created.

ops$tkyte%ORA10GR2> insert into t values (1,1);

1 row created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t values ( 2, 2 );

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          dbms_stats.gather_table_stats( user, 'T', cascade=>true );
  5          commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select * from t;

         X          Y
---------- ----------
         1          1
         2          2

ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2> select * from t;

         X          Y
---------- ----------
         1          1

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
         1

ops$tkyte%ORA10GR2> select num_rows from user_indexes where table_name = 'T';

  NUM_ROWS
----------
         1


Regarding table locks during analyze

Vinay, August 23, 2007 - 7:59 am UTC

Hi Tom,
Thanks a lot.. I performed a test similar to the one you've provided above on our DB here - confirmed that there is no lock.. here's the text on the web site in context:

<QUOTE>
"Analyzing an index puts a shared lock on the related table; hence, neither DML nor DDL can be performed. Preferably avoid all activities during the statistics generation phase"
</QUOTE> on http://www.pafumi.net/Gather_Statistics.html

Regarding your answer to my previous question above: Yes, I'll perform a few checks and look at whether my execution plan is actually changing drastically - I'll decide where I need to compute stats and where I do not, based on that information..

Thanks,
Vinay

Tom Kyte
August 23, 2007 - 12:19 pm UTC

A normal analyze will not (for statistics) - but as I said a validate structure (done normally - without the online keyword) will

ops$tkyte%ORA10GR2> create table t ( x int, y int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2> insert into t values (1,1);

1 row created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          execute immediate 'analyze index t_idx compute statistics';
  5  end;
  6  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> declare
  2          pragma autonomous_transaction;
  3  begin
  4          execute immediate 'analyze index t_idx VALIDATE STRUCTURE';
  5  end;
  6  /
declare
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified
ORA-06512: at line 4




I suggest you write to the author of that paper and ask them to clarify/fix it.

incremental stats gathering

Alex, September 27, 2007 - 11:02 am UTC

Tom,

When we switched our statistics gathering from 'GATHER' to 'GATHER AUTO', we immediately found several queries degraded. What did we do wrong? Should we use 'GATHER STALE' plus 'GATHER EMPTY' instead?

Below is 1) the working gathering, which of course is very slow, and 2) the incremental gathering which caused the sql degradation:

1) exec dbms_stats.gather_schema_stats(ownname=>'${SCHEMA}', options=>'GATHER', cascade=>TRUE, estimate_percent=>${PERCENT})

2) exec dbms_stats.alter_schema_tab_monitoring(ownname=>'${SCHEMA}', monitoring => TRUE );

exec dbms_stats.gather_schema_stats(ownname=>'${SCHEMA}', options=>'GATHER AUTO', cascade=>TRUE, estimate_percent=>${PERCENT});
Thank you very much,

Alex



Tom Kyte
September 28, 2007 - 4:05 pm UTC

Well, gather auto simply says "we'll figure out what we feel like gathering"

You went from gathering statistics on everything - to - you have no idea.


I cannot tell you which to use, you should look at what is happening under the covers - what object(s) are getting missed that cause an issue.

"of course is very slow" is not a truism - gathering all stats is not "very slow" for everything, for everyone.

It might be that a gather_stale, coupled with a few specific gather_table_stats for "special" tables, would do it (you have to find those - look for what is getting missed that counts)

you can also specify a percent to sample (say 10%) and see what that does for you as well.


If you have to use gather empty, implying that you actually create objects on the fly in PRODUCTION, you might want to ask "why are we doing that". You shouldn't need to do that in production.

how to determine whether dbms_stats package was used to analyze

Rish G., November 23, 2007 - 12:12 pm UTC

Tom,
How can we determine whether dbms_stats package or the analyze table compute/estimate statistics command was used to analyze a table?
Tom Kyte
November 26, 2007 - 11:42 am UTC

dbms_stats will not fill in avg_space, so, while not definitive, avg_space = 0 would likely be dbms_stats. Same with chain_count (but again, not definitive)
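so, a heuristic query (again, not definitive) might look like:

select table_name, last_analyzed, avg_space, chain_cnt
  from user_tables
 where last_analyzed is not null;

-- avg_space = 0 (and chain_cnt = 0) suggests dbms_stats was used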

Still dbms_stat Required

Rajeswari, February 13, 2009 - 7:11 am UTC

DB Version: 9.2

We have the following programs scheduled in PERT - run once every 2 days, i.e. once per cycle - we have 14 cycles in a month

a) pgm1 - delete and bulk insert into table X (~150000)

For each cycle we will insert ~150000 records.
In the next month's run we will delete the records of that cycle and insert fresh records,
e.g
Jan process => insert cycle =1 count = 150000
Feb process => delete jan cycle 1 data and insert for cycle 1

Currently we are keeping data for a month for any support issues.

b) pgm2 - access table X and do some process

Program2 is running more than 4 hours due to complex process.
We planned to remove the delete and replace with truncate for pgm1 (previous/current month data is not required once pgm2 completes)

After reading your book I am insisting on stats collection after the insert, but we have a problem scheduling the stats collection, as the starting time of pgm1 varies for each cycle - it is dependent on another sequence of programs.

My question is whether dbms_stats is required after every bulk insert in pgm1, or whether there is some number of records below which stats collection is unnecessary.

Tom Kyte
February 16, 2009 - 10:47 am UTC

if you bulk delete and repopulate X on a schedule like that, I'm very confused as to why scheduling the dbms_stats for it would be hard? Just add that call to the program that fills the table???

pgm1 - last step, gather OR set table statistics for X.
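a sketch of that last step, assuming the table is named X and using the ~150,000 rows per cycle described above:

begin
    -- either gather after each bulk insert:
    dbms_stats.gather_table_stats( user, 'X' );

    -- or, since the row count per cycle is known, simply set it:
    -- dbms_stats.set_table_stats( user, 'X', numrows => 150000 );
end;
/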


Thank You

Rajeswari, February 17, 2009 - 7:09 am UTC

Thanks a lot Tom.

Initially the DBA refused to provide access to dbms_stats, as they have separate jobs to gather statistics, which they execute in non-peak hours.

Just today they agreed to provide access to the procedure through which we can do stats collection.

Whenever I ask inputs/solution from you, it gets done immediately here :-)

Best Analyze

A reader, March 11, 2009 - 11:56 am UTC

Greetings Thomas,

and thanks like always,

1) What is better, analyze indexes first and then tables, or analyze tables then indexes?

2) if i create an index using the "compute statistics" option, performance improves without the need for analyzing the table. So if i create the index using this option, is it necessary to analyze the table? If yes, why?

Thanks
Tom Kyte
March 12, 2009 - 7:46 am UTC

you would typically use dbms_stats with cascade=>true to gather the table and indexes at more or less the same time.

in 10g, create/rebuild of an index computes statistics already (and you cannot skip it - if the table has at least one row). So, if your process is a) load data, b) index data - then step c would be gather statistics just on table.


why gather on table? Because a table in general has a lot more stuff in it than a single index on the table would. The index cannot tell us how many rows are in the table (in general); if the index is on column X and only on X, it cannot tell us anything about column Y and so on. You of course need statistics on the table, not just the index.
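a sketch of that 10g sequence (table T and its index are hypothetical):

create table t ( object_id number, object_name varchar2(30) );

insert /*+ append */ into t
select object_id, object_name from all_objects;        -- a) load
commit;

create index t_idx on t(object_id);                    -- b) index; 10g computes its stats here

exec dbms_stats.gather_table_stats( user, 'T', cascade => false );  -- c) table only, no cascade needed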

True in all cases? Version 10.2.0.3.0

David Tait, May 13, 2009 - 5:35 pm UTC

Tom:

Very helpful advice, as always.

One example I did not see:

On a Table of 38,000,000 rows, with partitions and sub-partitions, built up over a 2 year period, a project has just completed that added 2 more of the company's regions overnight.

This project added 1,800,000 rows, well below the 10% threshold.

Would running a Table, Index, Partition and Sub-partitioned Analyze be advised?

Thanks again,

Dave

Tom Kyte
May 13, 2009 - 5:45 pm UTC

is what true in all cases? Not sure what you are referring to.

I can say "pretty near NOTHING is true EVERY TIME"


I don't know enough about your data, your queries and how the 1.8 million rows were added.

Knowing what you know about the data - would you expect plans to probably change? Did all of the data go into a single partition? Or spread over many? did the insert radically change the skew of some columns? did the insert radically change the high/low values for some columns?

dbms_stats not generating correct execution plan

Ashish Goel, June 16, 2009 - 8:06 am UTC

Hi Tom,

Few days back I faced some application performance problem, on investigation I found that execution plan of few sql's is not good.
I updated table statistics and problem was resolved.


The issue here is that when I use dbms_stats.gather_table_stats, it does not produce the correct execution plan.

but when I used
Analyze Table <TableName> Estimate Statistics Sample 100 Percent;

followed by dbms_stats.gather_table_stats, it worked fine (correct execution plan)

Analyze Table alone is also not sufficient.

This is an unusual problem I noticed, please suggest.

My Oracle version is 10.2

Thanks
Ashish

Tom Kyte
June 16, 2009 - 2:27 pm UTC

do not use analyze.

do read about dbms_stats, understand the INPUTS to it, read about them, use the ones you want to use. there are quite a few inputs to dbms_stats, you let them all default - read them, understand them, then use them.

you can use estimate_percent => 100 with dbms_stats too you know (instead of auto)
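for example (the table name here is hypothetical):

exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100, cascade => true );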

dbms_stats not generating correct execution plan

Ashish Goel, June 17, 2009 - 2:12 am UTC

Thanks Tom,

I am using following statements to gather statistics.

EXECUTE dbms_stats.gather_schema_stats(ownname => 'NDL807_MDB', estimate_percent => 100 , method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE, options=>'GATHER' );

or

execute dbms_stats.gather_table_stats(ownname => 'NDL807_MDB', tabname => 'COREBASECLASS' , method_opt => 'for all columns size auto', degree => 4, cascade => TRUE);

but these statements only work after I do
Analyze Table NDL807_MDB.COREBASECLASS Estimate Statistics Sample 100 Percent;


My current problem is with three tables that have a large number of records. We also created a nightly job that gathers statistics for all schemas, but that also didn't help.

Please suggest what is missing.

Tom Kyte
June 17, 2009 - 11:26 am UTC

did you consider comparing the statistics so you could tell us in general "what is radically different"


your gather_table_stats, why do you use 100% on schema, 100% when you ANALYZE, but not anything resembling 100% when you gather_table_stats?? that is a big difference don't you think?

dbms_stats not generating correct execution plan

Ashish Goel, June 18, 2009 - 6:53 am UTC

Hello Tom,

I compared statistics after using both Analyze and DBMS_STATS.

Statement used:

Analyze Table FKR806_MDB.COREBASECLASS Estimate Statistics Sample 100 Percent;

and

execute dbms_stats.gather_table_stats(ownname => 'FKR806_MDB', tabname => 'COREBASECLASS' , estimate_percent => 100, method_opt => 'FOR ALL COLUMNS', cascade => TRUE);

After comparing the statistics I found that Analyze calculated EMPTY_BLOCKS and AVG_SPACE; all other values remain the same.

Please note: neither Analyze nor Dbms_stats helps alone. It only works when i use Analyze followed by dbms_stats.

Regards
Ashish





Tom Kyte
June 18, 2009 - 3:17 pm UTC

if there were no other differences, then I'd say your analysis is wrong (we don't use those metrics for optimization).

prove it - show us the plans before and after WITH CUTS AND PASTES of actual sqlplus interaction that shows us precisely what you are doing


your analyze computes statistics on the table, but not on the columns nor indexes. your dbms_stats does an estimate on the table, histograms on columns it feels like, and indexes. It would overwrite what the analyze did. There has to be something *different* between the analyze and dbms_stats

that is, you need to "unset statistics"
you would then analyze - and remember those values
you would then unset statistics
you would then dbms_stats - and remember those values
you would then unset again
you would then analyze + dbms_stats

then compare the three.

dbms_stats not generating correct execution plan

Ashish Goel, June 18, 2009 - 8:17 am UTC

Hi,

I have one table that is an IOT - does it have any different behaviour?

Thanks
Ashish

A reader, September 02, 2009 - 3:10 pm UTC

Trying to revisit the method of statistics collection in our production database.

1. Need some information on how we can use dba_tab_histograms to get some details and how that information can be used to tune our statistics collection.
2. Stale statistics - Can we just do this on a weekly basis ? What does this do ?
Tom Kyte
September 02, 2009 - 3:35 pm UTC

1) what would you like to know? basically it is a binary thing - you either

a) want histograms because you have skewed data and
b) you do not bind (you use literals) against that column

or

you do not want them.

2) it depends - do you start with an empty table and put 1,000,000 rows per day in it? waiting a week would not be a good thing. do you start with a 1,000,000 row table and add 1,000 rows per day to it?



A reader, September 02, 2009 - 4:21 pm UTC

1. How do we determine how skewed the data is ?
2. I have checked dba_tab_modifications view and that has a lot of rows with some number of inserts, updates, deletes. Do those numbers signify the number of inserts, updates, deletes since the last time the stats were collected ?

Tom Kyte
September 03, 2009 - 7:21 am UTC

1) select column, count(*) from t group by column order by column;

if you want a 'picture', you can

select column, cnt,
substr( rpad('*',30*round( cnt/max(cnt)over(),2),'*'),1,30) hg
from
( select column, count(*) cnt from t group by column )
order by column
/
ops$tkyte%ORA10GR2> select owner, cnt,
  2         substr( rpad('*',30*round( cnt/max(cnt)over(),2),'*'),1,30) hg
  3    from
  4  ( select owner , count(*) cnt from all_objects group by owner )
  5   order by owner
  6  /

OWNER                                 CNT HG
------------------------------ ---------- ------------------------------
A                                       2
ABC                                    10
BIG_TABLE                               4
CTXSYS                                313
DBSNMP                                 46
DMSYS                                 189
EXFSYS                                279
HR                                     34
IM                                      2
MDSYS                                 863 *
OLAPSYS                               718
OPS$TKYTE                              16
ORACLE_OCM                              8
ORDPLUGINS                             10
ORDSYS                               1721 **
OUTLN                                   8
PERFSTAT                              139
PUBLIC                              20094 **************************
RT_TEST                                 4
SCOTT                                   5
SI_INFORMTN_SCHEMA                      8
SYS                                 22936 ******************************
SYSMAN                               1291 *
SYSTEM                                425
TEST                                    1
TESTCASE1                               3
TESTCASE2                               4
TSMSYS                                  2
USER_1                                  3
WMSYS                                 305
XDB                                   346

31 rows selected.




that shows 1) there are fewer distinct values than 254 (the max number of buckets) and 2) the data is skewed - so if you did not bind against owner (warehouse/reporting system only) then a histogram against this might make sense, since you would get a frequency histogram (a perfect picture of the data) and your literal SQL would be able to use it nicely (skewed)

2) not unless you use dbms_stats to flush the monitoring info to it - we cache things and update the underlying tables from time to time. I'm not saying you must flush it, just saying that unless you do, the information in this view is stale.

Actually, if you peeked at:
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/statviews_4149.htm#sthref1901

you could have read that :)
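a sketch of making the view current before querying it (flush_database_monitoring_info is the documented dbms_stats call):

exec dbms_stats.flush_database_monitoring_info;

select table_name, inserts, updates, deletes
  from user_tab_modifications;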

dbms_stats.delete_table_stats and dbms_stats.gather_table_stats

Sarayu K.S., March 18, 2010 - 8:31 am UTC

Tom,

We see two different situations using DBMS_STATS.

Case 1) oracle 10.2.0.4

we gathered stats using

1) EXEC dbms_stats.gather_table_stats(ownname=>'OWNER',tabname=>'TABLE_NAME',partname=>NULL,estimate_percent=>25,cascade=>TRUE,DEGREE=>8);

After we completed gather stats for all tables, there was no improvement in query response. Then we did delete_stats

1) EXEC dbms_stats.delete_table_stats(ownname=>'OWNER_NAME',tabname=>'TABLE_NAME');

and again did gather stats using

2) EXEC dbms_stats.gather_table_stats(ownname=>'OWNER',tabname=>'TABLE_NAME',partname=>NULL,estimate_percent=>25,cascade=>TRUE,DEGREE=>8);

Now the query ran very fast.

Case 2)

On another database, Which is also on oracle 10.2.0.4

Delete stats was executed first and then Gather stats was executed with same parameters.

EXEC dbms_stats.delete_table_stats(ownname=>'OWNER_NAME',tabname=>'TABLE_NAME');
EXEC dbms_stats.gather_table_stats(ownname=>'OWNER',tabname=>'TABLE_NAME',partname=>NULL,estimate_percent=>25,cascade=>TRUE,DEGREE=>8);

Query was running very slowly.

Then we restored the stats which was done without delete stats and just gather_table_stats. Query was running very fast.


Would you please let us know when to use delete_table_stats?
When not to use delete_table_stats before gather_table_stats?

Thank You
Sarayu K.S.

Tom Kyte
March 18, 2010 - 8:58 am UTC

case1) you likely had very different statistics after doing the delete.

Here is my educated guess.

At some point, someone gathered statistics for some objects in a manner different from the way you normally did it. They gathered some details - like histograms - that your normal process did not. So, you had those legacy statistics sitting around.

When you delete stats, you wiped them out

And then we didn't have them anymore. So you had completely different statistics in the two cases.


case2) see #1, same thing happened.



solution: get consistent, use the same technique so you actually know what is in there over time.

Thanks

Sarayu K.S., March 18, 2010 - 11:44 pm UTC

Thanks Tom.

So delete stats will be useful only when we want to gather stats in a way different from the normal way. Otherwise, if we have only one way of gathering stats for a table, there is no need to use delete stats, right?

Thanks

A reader, December 17, 2011 - 7:29 am UTC

Hi Tom,

We use below script to gather stats of table.
Suppose I have a table where column a has unique values but column b is non-unique, and there is an index on both columns. Will the script below create a histogram for both columns? Is there any harm in gathering one for the unique column, or would skipping that column give a performance benefit?

sys.dbms_stats.gather_table_stats('NCBSHOST',
'LN_TXNLOG_ROW_CNT',
estimate_percent => 40,
cascade => true,
method_opt => 'for all columns size 1',
degree => 4);
Tom Kyte
December 17, 2011 - 9:25 am UTC

it won't generate a histogram really, you have size 1.

you don't have any histograms really.

Gather stats

A reader, December 18, 2011 - 1:03 am UTC

Hi Tom,

We are using below script to gather stats :


sys.dbms_stats.gather_table_stats('NCBSHOST',
'LN_TXNLOG_ROW_CNT',
estimate_percent => 40,
cascade => true,
method_opt => 'for all columns size 1',
degree => 4);

But planning to replace with below script :


sys.dbms_stats.gather_table_stats('NCBSHOST',
'LN_TXNLOG_ROW_CNT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
cascade => true,
method_opt => 'for all indexes for all indexed columns size auto',
degree => 4);

can we go ahead with this? Of course we will check the plan difference before and after for a sample table. But we have more than 10,000 tables, and we cannot check each and every one.
Tom Kyte
December 18, 2011 - 5:31 am UTC

just use this

sys.dbms_stats.gather_table_stats('NCBSHOST',
'LN_TXNLOG_ROW_CNT',
cascade => true,
degree => 4);


read:

http://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html

the "why does my plan change" section.

If you let method opt default, we'll figure out what columns should have histograms. And - here is a hint - it might be a bunch of columns that do not have indexes on them.

stats

A reader, December 20, 2011 - 10:00 am UTC

Hi Tom,

We are planning to run the modified script to get the histograms on columns, but there is some concern: what if, in the worst case, something goes wrong in production in the batch process? I mean the process running longer than usual - this is very critical and has a time limit to finish. Obviously we will do a test run in an environment the same as production.
Tom Kyte
December 20, 2011 - 10:48 am UTC

Obviously we will do a test run in an environment the same as production.


and that would give you a good indication of what to expect in real life then.

Date

A reader, December 21, 2011 - 1:05 am UTC

Hi Tom,

Is it useful to gather a histogram on a date column of a table?
Tom Kyte
December 21, 2011 - 7:58 am UTC

yes, it can be. If the dates are skewed.

Just like it might be for strings and numbers...

Thank you for your detailed explanation...

A reader, May 18, 2012 - 9:56 am UTC

Hi,
Thank you for your detailed explanation :) ...
One small clarification...
Recently there was a performance problem. One of my colleagues analysed a global temporary table and the issue got resolved.
Do we need to analyse global temporary tables (no indexes are present on the temp table)?

Thanks in advance...

Best Regards,
Sree
Tom Kyte
May 21, 2012 - 8:03 am UTC

You need to have representative statistics in place. We'd need to know the table size, number of rows, perhaps the natural skew of data.


for example, if you did:


select * from some_table where x in (select y from gtt);


we'd like to know if gtt (global temporary table) was 1 row or 1,000,000 rows. We'd like to know about how many distinct values of Y there would be.

That would tell us whether to full scan some_table and hash join to the GTT or use an index (assuming one exists) against some_table.x

Statistics are important to discover estimated cardinalities. They are not just useful on "indexed things". We need them on all things.
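One way to put representative statistics in place for a GTT is to set them directly - a sketch, with a hypothetical table name and row count:

begin
    -- tell the optimizer the GTT will typically hold ~1,000,000 rows
    dbms_stats.set_table_stats( user, 'GTT', numrows => 1000000 );
end;
/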
