Home>Question Details



Suresh -- Thanks for the question regarding "Analyze and DBMS_STATS", version 8.1.7

Submitted on 23-Jun-2002 15:43 Central time zone
Last updated 26-Jan-2010 1:48

You Asked

Tom,
   Could you please tell me if there are any other important differences, advantages with 
DBMS_STATS over ANALYZE other than the points listed below.

1. DBMS_STATS can be done in parallel
2. Monitoring can be done and stale statistics can be collected for 
   changed rows using DBMS_STATS.

Thanks,
Suresh

 

and we said...

you can import/export/set statistics directly with dbms_stats

it is easier to automate with dbms_stats (it is procedural, analyze is just a command)

dbms_stats is the stated, preferred method of collecting statisttics.

dbms_stats can analyze external tables, analyze cannot.

DBMS_STATS gathers statistics only for cost-based optimization; it does not gather other 
statistics. For example, the table statistics gathered by DBMS_STATS include the number 
of rows, number of blocks currently containing data, and average row length but not the 
number of chained rows, average free space, or number of unused data blocks.

dbms_stats (in 9i) can gather system stats (new)

ANALYZE calculates global statistics for partitioned tables and indexes instead
of gathering them directly. This can lead to inaccuracies for some statistics, such as 
the number of distinct values.  DBMS_Stats won't do that.

Most importantly, in the future, ANALYZE will not collect statistics needed by
the cost-based optimizer. 

Reviews    
3 stars Incomplete statistics?   June 23, 2002 - 8pm Central time zone
Reviewer: Jayaraman Ashok from India
Hi Tom
You have replied that dbms_stats does not gather information on chained rows,unused data blocks and 
average free space available in blocks. Does it mean that to get the above mentioned information we 
have to run analyze command again? Also you mention that dbms_stats gathers statistics required by 
the CBO. Are there any special statistics that are gathered apart from number of rows, number of 
blocks occupied? If it is only the above two, then i believe that analyze command already 
accomplishes it. The only feature that is of use may be the set_stats procedure for tuning queries 
before using them in production.

Thanks & regards,
Ashok 


Followup   June 23, 2002 - 9pm Central time zone:

dbms_stats only gathers things the CBO cares about.  The CBO does not care about that information. 

In the future, analyze will not gather CBO info, only these other things CBO doesn't care about.  
DBMS_STATS currently and will only gather CBO info.

I think you need to reread the answer.  dbms_stats gathers everything that the analyze command does 
but.... can do it in parallel, can do it for stale tables, can export/import stats, can analyze 
things analyze cannot (the system for example) and so on.  There are lots more stats then just num 
rows and num blocks -- and both commands collect them.  There are things dbms_stats will do that 
analyze does not (especially on a partitioned table).  

Recommendation: use dbms_stats.   

5 stars   June 24, 2002 - 9am Central time zone
Reviewer: Dave from Dayton, OH
When I "CREATE INDEX my_index ... COMPUTE STATISTICS" is the analyze being performed by the 
DBMS_STATS method, or the ANALYZE INDEX method?

If by the latter method, am I likely to gain by running DBMS_STATS post-CREATE or post-REBUILD 
mostly for partitioned segments (ref. your comments on gathering global statistics), with "ANALYZE 
..." still being a reasonable choice at the moment for non-partitioned segments? 


Followup   June 24, 2002 - 9am Central time zone:

It is closer to the analyze method -- only because it is DDL (same results from analyze, 
dbms_stats, create index ultimately).

But since you only create an index once, it is good only for the initial build, after that -- you 
are back to dbms_stats.

It is a general recommendation to use dbms_stats, period.  There will come a time when dbms_stats 
is the only way. 

5 stars external table, index and dbms_stats   June 24, 2002 - 10am Central time zone
Reviewer: Suresh from New York
Tom,
   You have mentioned that DBMS_STATS can analyze external tables. I tried to create an index on an 
external table and it came up with an error "not supported on external table".
My question is if dbms_stats could analyze external tables, why not allow an index on external 
table as well and analyze it with dbms_stats. 
 


Followup   June 25, 2002 - 8am Central time zone:

you cannot index something that we have no control over.

Consider if we did.  say the file started with:

1,a,hello
2,b,goodbye


and we indexed that (the first column).  Now, you come along and replace that file with:

2,b,goodbye
1,a,hello


and issue: select * from external_table where column1 = 1;

what do you think would happen if

a) we used an index
b) we full scanned

I think we would get two answers.  We cannot index a file that is changeable (and changeable by 
forces outside of our control)


 

5 stars Queries on Analyze   June 24, 2002 - 4pm Central time zone
Reviewer: Sanjay Jha from Toronto, Canada
Tom,
The ongoing discussion has generated few more related queries in me. I am listing them below, for 
you crisp and clear answers :
1. What are different packages for running analyze (eg. dbms_utulity, dbms_stats, db_maintenance 
etc.) and what are the advantages of each?
2. What is the best way to analyze one particular table from CBO's perspective(BTW, what are all 
possible ways)?
3. If someone has to automate the analyze, what do you recommend and why - scheduling through cron 
or dba_jobs? (For that matter what is your opinion on scheduling any job for database- OS level or 
through Oracle jobs?)
4. This question is not related to analyze but to the Oracle supplied packages- Where in a databse 
one can get info about these packages (like v$ views etc.) or give me an url for reference please.

Thanks.

 


Followup   June 25, 2002 - 8am Central time zone:

1) dbms_stats is the one you should use. dbms_utility has some very very primitive functions left 
in there for backwards compatibility only.

2) for that, please read the ADMIN guide, it covers the methods and when to use what.

You know -- if there was a "best way", it would be the only way...  

3) If you have my book -- you know my opinion on this.  Do it in the database.

Cron -- can cron tell if the database is up and running?  cron will try to analyze even when the 
database is not ready.  DBMS_JOBS is the right tool for this in my opinion (barring that, OEM would 
be the right place).  I want all of my database related stuff going on in the database -- one place 
to look for issues (my alert log or OEM logs) and so on.

4) how about the supplied packages guide.  Instead of a direct link, I'll teach you to fish.

goto otn.oracle.com
click on documentation
pick what database version you have

go after the supplied packages guide (in 8i and before, this was with the "database" set of docs.  
In 9i -- there is just one big list)

 

5 stars DBMS_STATS and statistics   May 2, 2003 - 7pm Central time zone
Reviewer: Arun Gupta from Harrisburg, PA USA
Tom
I did the following:
a) Using dbms_stats, analyzed all the tables in schema X of database A and using 
dbms_stats.export_schema_stats, exported the statistics into stattab.
b) Extracted the table/index creation scripts for schema X, excluding the stattab table.
c) Using Oracle export (exp), exported the stattab table.
d) Using the scripts in b), I created all the tables and indexes in schema Y of database B. These 
tables do not contain even a single row of data.
e) Using Oracle import(imp), imported the stattab into schema Y of database B. 
f) Using dbms_stats.import_schema_stats, I imported all the statistics from stattab.

I cannot see the statistics like num_rows, avg_row_len etc. in user_tables. How does the CBO read 
these statistics?

Thanks 


Followup   May 2, 2003 - 8pm Central time zone:

dbms_stats was designed to move the stats for user A from db1 to user A in db2.

It is the X to Y that is messing you up.  This demonstrates what you are seeing, but shows how to 
adapt it to work anyway:



ops$tkyte@ORA920LAP> drop user a cascade;
User dropped.

ops$tkyte@ORA920LAP> drop user b cascade;
User dropped.

ops$tkyte@ORA920LAP> grant dba to a identified by a;
Grant succeeded.

ops$tkyte@ORA920LAP> grant dba to b identified by b;
Grant succeeded.

ops$tkyte@ORA920LAP> @connect a/a

a@ORA920LAP> create table t as select * from all_objects;
Table created.

a@ORA920LAP> create index t_idx on t(object_id);
Index created.

a@ORA920LAP> analyze table t compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.

a@ORA920LAP> exec dbms_stats.create_stat_table( user, 'st' );
PL/SQL procedure successfully completed.

a@ORA920LAP> exec dbms_stats.export_schema_stats( user, 'st' );
PL/SQL procedure successfully completed.

a@ORA920LAP> !exp userid=a/a tables=st

Export: Release 9.2.0.3.0 - Production on Fri May 2 20:08:56 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
Export done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                             ST         78 rows exported
Export terminated successfully without warnings.


a@ORA920LAP> @connect b/b

b@ORA920LAP> create table t as select * from all_objects where 1=0;
Table created.

b@ORA920LAP> create index t_idx on t(object_id);
Index created.

b@ORA920LAP> !imp userid=b/b fromuser=a touser=b

Import: Release 9.2.0.3.0 - Production on Fri May 2 20:08:56 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by A, not by you

import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
. . importing table                           "ST"         78 rows imported
Import terminated successfully without warnings.

b@ORA920LAP> exec dbms_stats.import_schema_stats( user, 'st' );

PL/SQL procedure successfully completed.

b@ORA920LAP> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
ST
T

that is what you currently see but:

b@ORA920LAP> update st
  2  set c5 = USER
  3  /

78 rows updated.

b@ORA920LAP> exec dbms_stats.import_schema_stats( user, 'st' );

PL/SQL procedure successfully completed.

b@ORA920LAP> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
ST
T                                   30557

b@ORA920LAP>

that'll "rename the user" in the stats tab so it'll be looking for B.T, not A.T

 

5 stars It works   May 2, 2003 - 9pm Central time zone
Reviewer: Arun Gupta from Harrisburg, PA USA
After updating c5 in stattab to new schema name, it works. 
Thanks...!! 


5 stars What about execution statistics   May 6, 2003 - 9am Central time zone
Reviewer: Arun Gupta from Harrisburg, PA USA
Tom
I am a bit unclear about the purpose of importing statistics from one database to another. My 
questions are:

a) If I set a table with 0 rows with the statistics of a 1 million row table, can I run a query 
against this 0 row table and actually see the same buffer gets as I would see against a 1 million 
row table? 
b) When the query actually hits a 1 million row table, can Oracle change the plan? 


Followup   May 6, 2003 - 10am Central time zone:

a) no
b) not sure what you mean.  if you mean "will the plan for a table with 900,000 rows be different 
then a plan for 1,000,000" -- the answer is "yes, it could be".  If you mean "if the statistics say 
there are N rows but there are really M rows -- will Oracle change the plan after finding there are 
not N rows?" -- the answer is "no, once formulated -- the plan is the plan"



One use of the export/import stats trick is

a) you have a logical standby or some reporting instance that is a close copy of production (using 
replication, whatever)

b) you run stats on the copy.  move them into production.


 

4 stars Problems in anlayze   May 7, 2003 - 1am Central time zone
Reviewer: SUBBARAO. 
There is much useful information on ANALYZE in this article. It is helpful. My scenario is like 
this, looks strange but it is.

  In my production system, we have some problems in getting some reports from the application.
For one report query has 8 tables, out of which 2 tables have huge number of records, both these 
tables are not analyzed, rest all the tables are analyzed. This report is not launching.
We have now analyzed the entire schema. Then the report started working. The cost of the query 
before and after analyze has drastically come down. Some tables will be getting different kinds of 
data, and some tables will be getting same type of data.  I am not sure how long this report will 
work and when I should analyze tables. I have read about the table monitoring feature and document 
says that Oracle will automatically analyze the table if it requires analyze. 

My questions:
1) So enabling this feature, what will be the side effects in performance. I know that if we want 
some thing we should scarifice few things, but I would like to highlight these side effects to my 
boss. 

2) If a query is having 5 tables and out of which 3 tables are not analyzed and 2 tables are 
analyzed, will the cost calculation be accurate? I mean will the optimizer gives a better plan.
Is it required that all the tables in the query are to be analyzed or all not to be analyzed.
 


Followup   May 7, 2003 - 7am Central time zone:

1) monitoring is extremely low impact, not even measurable.  YOU still have to analyze, you just 
use dbms_stats with the gather stale option.  it'll collect stats only on tables you need.

2) of course the costing won't be accurate.  you should either have all or no objects refered to by 
a query analyzed. 

4 stars Excellent and Timely discussion....   May 7, 2003 - 10am Central time zone
Reviewer: A reader from Mohsin/Dar-es-salaam/Tanzania
Hi Tom:

As usual, your answers allows us (DBA's) to have a good night's sleep.

On the issue of monitoring...

What criteria does oracle use to determine that the stats are stale ?
Can these thresholds be changed/set by the user ?
Are the stats in the user_tab_modifications cumulative ?
If I need to preserve the stats in the above view, can you please suggest a procedure/script as to 
how I go about saving these. I am interested in preserving the amount of dml activity by segment.

Thanks
Mohsin 


Followup   May 7, 2003 - 10am Central time zone:

o when about 10% of the table is changed

o no

o yes

o you would have to "insert into some_other_table select sysdate, a.* from the modifications table 
A" before gathering stats.  Make that part of your 'process'. 

4 stars Excessive Recursive calls after stats   May 7, 2003 - 11am Central time zone
Reviewer: A reader from Mohsin/Dar-es-salaam/Tanzania
Hi Tom:

I have observed that the statistics section of the sql+ trace reports excessive recursive call 
count for queries agains the tables just after the tables have been analyzed.

What does this mean and what are the side effects of stats gathering ? Has the stats gathering 
invalidated the parsed sql in the shared pool ?

Would this be a good way of implementing my own thresholds:
I'd turn on monitoring for the segments.
Collect the dml activity from the user_tab_modifications to my own table and then have my procedure 
re-compute the stats after say x% of change to the segment and not use the gather stale option of 
the dbms_stats.
What is the timestamp column for ? If the values are cummulative then how does it determine the 
incremental from the last stale stats run ?

Thanks. 


Followup   May 7, 2003 - 1pm Central time zone:

hows about an example ? 

5 stars Here it is.   May 7, 2003 - 2pm Central time zone
Reviewer: A reader from Mohsin/Dar-es-salaam/Tanzania
09:11:53 SQL> connect scott/tiger
Connected.
09:16:14 SQL> 
09:16:14 SQL> 
09:16:14 SQL> analyze table junk compute statistics for table for all indexed columns for all 
indexes;

Table analyzed.

09:17:06 SQL> show user
USER is "SCOTT"
14:36:01 SQL> set autotrace traceonly explain statistics;
14:36:21 SQL> 
14:36:23 SQL> select * from junk where owner='MMJ';

1 row selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=86)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'JUNK' (Cost=4 Card=1 Byt
          es=86)

   2    1     INDEX (RANGE SCAN) OF 'IDX_JUNK_OWNER' (NON-UNIQUE) (Cos
          t=3 Card=1)





Statistics
----------------------------------------------------------
        330  recursive calls
          0  db block gets
         73  consistent gets
          3  physical reads
          0  redo size
       1114  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

14:36:37 SQL> /

1 row selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=86)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'JUNK' (Cost=4 Card=1 Byt
          es=86)

   2    1     INDEX (RANGE SCAN) OF 'IDX_JUNK_OWNER' (NON-UNIQUE) (Cos
          t=3 Card=1)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1114  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

14:36:58 SQL>  


Followup   May 8, 2003 - 9am Central time zone:

analyzing the table will invalidate open cursors on that object in the shared pool.  the next parse 
against that object will be "hard".  hard does lots more recursive sql then "soft"

It is not the analyze so much as "you did something to cause hard parsing to have to take place"

here is an example using the rbo and EMP:


ops$tkyte@ORA817DEV> select * from emp;

14 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'EMP'




Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1979  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         14  rows processed

ops$tkyte@ORA817DEV> alter table emp add x number;

Table altered.

ops$tkyte@ORA817DEV> select * from emp;

14 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'EMP'




Statistics
----------------------------------------------------------
        148  recursive calls
         15  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
       2042  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         14  rows processed


Of course, SHUTDOWN/STARTUP would have the same effect.

normal, natural, expected and one time -- you hard parse and soft after that. 

5 stars Some more info....   May 7, 2003 - 3pm Central time zone
Reviewer: A reader 
Hi Tom.

Here is some more info that might help you.

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production


The junk table is a mirror of dba_objects view.

And there was no activity in the d/b between the time 9:16  and 14:36 in my sql+ session results I 
posted earlier.


Please let me know if you need more info.

Thanks
Mohsin 


3 stars effect of partial analyze   May 8, 2003 - 7am Central time zone
Reviewer: kumar from India
Tom,

You have mentioned in one of your replies that we should not analyze certain objects used in a 
query and leave the rest un-analyzed. I have read similar statements in lots of places, but i could 
not really understand why. Can you please explain why is it so dangerous to do so? 


Followup   May 8, 2003 - 9am Central time zone:

analogy time.

You are a manager now.

I give you a team of 4 people.

I tell you all about 2 of them -- their capabilities, experience, level of comprehension, lots of 
stuff.

I tell you NOTHING about the other two.

I give you a problem to solve using these 4 people.  You need to make assignments based on your 
knowledge of them.  You have good knowledge of 2, no knowledge of the other 2.  You look at them 
and size them up.  Maybe you don't like the way the 4th guy dressed -- he "looks dumb" to you.

You dole out responsibilities -- not giving much hard stuff to #4.  Unfortunately for you he was 
the smartest in the group and if he had been given 99% of the work -- it would have been a success. 
 As it was, due to inefficient resource allocation -- you never finish your project!


So it is with statistics.  You give the optimizer only a little information, it has to "make up the 
rest" and can come to some very bad decisions as to the correct approach. 

5 stars Hat's off to you...   May 8, 2003 - 11am Central time zone
Reviewer: A reader from Mohsin/Dar/Tanzania
Hi Tom:

Excellent analogy about the CBO....

I will confirm your point, but I am positive nothing was happening on my d/b to have caused 
invalidation of my cursors resulting in hard parse. I conducted the test on my laptop, so I know 
what was happening.

Could the Oracle version have made the difference in this case?

I was using Oracle 9i Rel 2, and from your SQL+ prompt it seems you were using 8i.

Thanks
Mohsin 


5 stars Ahhh the language barrier strikes again...   May 8, 2003 - 11am Central time zone
Reviewer: A reader from Mohsin/Dar/Tanzania.
Sorry Tom:

I read and re-read your response. I get it.

Once in a while in between Swahili, Gujarati, Katchi, Urdu, Hindi and a sprinkling of Arabic I 
forget my English...

As they say in Swahili - Hakuna Matata (No Problem)

Kwaheri.

Thanks again.
 


5 stars Analogy worth more than 5 stars!!!!   May 8, 2003 - 11am Central time zone
Reviewer: A reader 


3 stars Both Analyze and DBMS_STATS?   May 16, 2003 - 7pm Central time zone
Reviewer: Dan Jatnieks from Redwood City, CA USA
In your original response you stated there are some statistics that DBMS_STATS does not collect:

"...but not the number of chained rows, average free 
space, or number of unused data blocks"

I would like to switch to using DBMS_STATS for CBO statistics on my 9i database, but I also want to 
collect the statistics that DBMS_STATS does not; I have monitoring software that collects and uses 
chained rows, avg free space and unused data blocks. 

Can I use Analyze first and then DBMS_STATS (in that order) to accomplish this? 

The reasoning would be that first using Analyze will collect the data that DBMS_STATS does not; 
then DBMS_STATS will overwrite the CBO statistics that Analyze collected, but leave alone the other 
Analyze data.

Will that work?
 


Followup   May 17, 2003 - 10am Central time zone:

it would work but it would be like using two cars to goto work -- it would be double the work in 
most cases (partitioned tables being the exception).

can the monitoring software use "list chained rows" instead and can it use dbms_space to get the 
other info. 

4 stars Statistics for Partioned tables   February 5, 2004 - 10am Central time zone
Reviewer: Nathan from United Kingdom
Dear Tom, 
 Your answers have been comprehensive and helpful all the time, just another quick one ( hopefully) 
from me.

When we the Estimate_percent parameter and sent it to 0.00001 in DBMS_STATS.GATHER_TABLE_STATS , 
would it mean only that percent of the blocks be scanned for analysis ? Our gather_stats takes with 
estimate takes hours to complete , My Senior DBA is skeptical of the percentage option and reckons 
that it could be a bug , he concluded by watching V$session_longops ( as the same number of blocks 
were scanned ) . I have not checked it yet but apparently the percentage does work on 
non-partitioned tables . Is this true ? Is there a bug in oracle 9i release 1.0 software

Thanks and Regards
Vish 


Followup   February 5, 2004 - 7pm Central time zone:

hows about you share the details -- such as the entire, full command you are using plus the type of 
table you are doing this to and some facts about the table itself.

0.00001 would be a "not so good number" to use, pretty darn small it would be. 

2 stars Analyze and DBMS_STATS   February 6, 2004 - 12am Central time zone
Reviewer: chandu singh from Bangalore, India
Hi 
 i think it is a best to have the percentage of minium to 10% . is that right if not pls explain. 


Followup   February 7, 2004 - 12pm Central time zone:

why -- whats the reasoning (not that you are wrong, but -- why do you think that :)

it could be 1%
it could be 10%
it could be 100%

it just depends on "how evenly distributed the data is in general".  If the data is large-ish AND 
pretty much evenly distributed, or you are not computing histograms -- 1% might be dandy.

Me, I like to compute whenever possible personally.  Only when it gets really too large -- and then 
estimate as high as you can (realizing of course anything over 49% is really compute in disguise) 

2 stars Analyze and DBMS_STATS   February 6, 2004 - 12am Central time zone
Reviewer: chandu singh from Bangalore, India
Hi 
 i think it is a best to have the percentage of minium to 10% . is that right if not pls explain. 


5 stars DBMS_STATS.AUTO_SAMPLE_SIZE Issues   February 6, 2004 - 12pm Central time zone
Reviewer: RyuO from Washington, DC USA
My experience was that DBMS_STATS is wonderful, but requires a lot of wrapping. Here is the wrapper 
proc I evolved:

CREATE OR REPLACE PROCEDURE analyze_schema(
   p_schema_name IN VARCHAR2 := USER,
   p_Estimate_Pct IN PLS_INTEGER := NULL) IS
   -- like DBMS_UTILITY.ANALYZE_SCHEMA, but using DBMS_STATS
   c_proc_name          CONSTANT VARCHAR2(128) := UPPER('analyze_schema');
   c_Def_Estimate_Pct   CONSTANT PLS_INTEGER := 35;
   c_Min_Degree         CONSTANT PLS_INTEGER := 1;
   c_Def_Degree         CONSTANT PLS_INTEGER := LEAST(c_Min_Degree,DBMS_STATS.DEFAULT_DEGREE);
      -- default degree is somehow 32767 at the moment
   c_Def_Options        CONSTANT VARCHAR2(30) := 'GATHER AUTO';
      -- 'GATHER' means analyze all objects.
      -- 'GATHER AUTO' means only objects that have changed.
   v_Estimate_Pct PLS_INTEGER;
      -- v_Estimate_Pct can also be set to:
      --    DBMS_STATS.AUTO_SAMPLE_SIZE (Oracle sets this, but it has been known to be 0...)
      --    NULL (which means, look at everything, e.g., 'COMPUTE')
   v_schema_name        ALL_TABLES.OWNER%TYPE := NVL(UPPER(p_schema_name),USER);
   v_Msg_Text           VARCHAR2(2000);
BEGIN
   CASE p_Estimate_Pct
      WHEN NULL THEN
         v_Estimate_Pct := 100;
      WHEN 100 THEN
         v_Estimate_Pct := 100;
      ELSE
         IF DBMS_STATS.AUTO_SAMPLE_SIZE = 0 THEN
            IF p_Estimate_Pct BETWEEN 1 AND 99 THEN
               v_Estimate_Pct := p_Estimate_Pct;
            ELSE
               v_Estimate_Pct := c_Def_Estimate_Pct;
            END IF;
         ELSE
            v_Estimate_Pct := DBMS_STATS.AUTO_SAMPLE_SIZE;
         END IF;
   END CASE;
   DBMS_STATS.GATHER_SCHEMA_STATS(
      ownname => v_schema_name,
      estimate_percent => v_Estimate_Pct,
      cascade => TRUE,
      degree => c_Def_Degree,
      options => c_Def_Options);
      -- "cascade" means analyze indexes too.
   v_Msg_Text := c_proc_name||'-> Schema['||v_schema_name||'], Option['||c_Def_Options||'], 
Sampling['||v_Estimate_Pct||'%]';
   DBMS_OUTPUT.PUT_LINE(v_Msg_Text);
EXCEPTION
   WHEN OTHERS THEN
      v_Msg_Text := c_proc_name||'-> EXCEPTION: '||SQLERRM;
      RAISE_APPLICATION_ERROR(-20199,v_Msg_Text);
END analyze_schema;

My questions are related to AUTO_SAMPLE_SIZE:
1. How come it is always 0? That's what I always see, generally in 9.2.0.3 and 9.2.0.4 on Solaris.
2. What should the default estimate percentage be, if DBMS_STATS can't tell me? Anecdotal evidence 
suggests that the ANALYZE/ESTIMATE percentage should be 35-40%, but then ANALYZE and DBMS_STATS are 
different... 


Followup   February 7, 2004 - 1pm Central time zone:

requires....

OR "permits".

it requires NO WRAPPING.

It quite handily permits it easily.


you'll have to provide me a reproducible, small, simple, yet 100% complete test case to demonstrate 
what you mean by "0"

ops$tkyte@ORA9IR2>  exec dbms_stats.gather_table_stats( user,'BIG_TABLE',estimate_percent => 
dbms_stats.auto_sample_size );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select sample_size from user_tables where table_name = 'BIG_TABLE';
 
SAMPLE_SIZE
-----------
     100000


(which also shows "does not require", but would "permit" wrapping.... ) 

3 stars understanding sample_percent   February 25, 2004 - 3pm Central time zone
Reviewer: A reader 
Oracle 9.2.0.2
I observed that when estimate_percent is specified, the gather_schema_stats is still doing a 
compute statistics. The sample_size=num_rows. Am I incorrect in concluding that its ignoring the 
25% and estimating 100% of rows ? If so, why ?
OR have I overlooked at any setting ?

Please look at the following Case :
------- # of rows from T --------
SQL> select count(*) from T
  COUNT(*)
  --------
    600980

---- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') 
ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES     600980      600980 15-39-16

------ inserting data ----------
SQL> insert into T select * from T

600980 rows created.

SQL> commit;
Commit complete.

--------- row count from T ---------
SQL> select count(*) from T;
  COUNT(*)
----------
   1201960

-------- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') 
ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES     600980      600980 15-39-16

---------- gather_schema_stats ---------
begin
  dbms_stats.gather_schema_stats(
       ownname          => 'TEST',
       estimate_percent => 25,
       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
       degree           => 2,
       cascade          => FALSE,
       options          => 'GATHER AUTO'
       );
  end;
/ 
PL/SQL procedure successfully completed.

-------- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') 
ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES    1201960     1201960 15-48-45

----------------------------------- 


Followup   February 25, 2004 - 7pm Central time zone:

straight from the docs here:


GATHER AUTO: Gathers all necessary statistics automatically. Oracle implicitly determines which 
objects need new statistics, and determines how to gather those statistics. When GATHER AUTO is 
specified, the only additional valid parameters are ownname, stattab, statid, objlist and statown; 
all other parameter settings are ignored. Returns a list of processed objects.





 

5 stars got it ... but   February 26, 2004 - 3pm Central time zone
Reviewer: A reader 
Tom, I understand it better now. 
I took out the invalid parameters.
But my question still lingers :
Why is the gather_schema_stats analyzing all 100% of rows. I changed only 10%(50K of rows) out of 
450K rows.
My dbms_job wakes up every few days to execute gather_schema_stats. Now, if I have 10-15 tables 
which have changed by more than 10%, then analyzing these tables 100%(compute) will take lot of 
time.
I know oracle automatically calculates how much percentage to analyze. But everytime why is it 
analyzing all 100% of the rows? Why is it not estimating ? 

=========
begin
   dbms_stats.gather_schema_stats(
         ownname          => 'TEST',
         options          => 'GATHER AUTO'
         );
    end;
/
==============

 


Followup   February 26, 2004 - 4pm Central time zone:

it does not take that long to analyze 1,000,000 rows.

gather stale does not re-analyze just "changed rows"
gather stale re-analyzes the entire object.


If you don't want AUTO, don't use AUTO, use gather stale and fill in the remaining blanks (but for 
such small tables, i would get it all) 

4 stars hmmm   February 27, 2004 - 2pm Central time zone
Reviewer: A reader 
Hi Tom,
the thing am fighting is I like to use "stale" because I can specify the percentage to analyze and 
take advantage of the degree. AUTO is good because it automatically analyzes newly added tables.

Best of both worlds would be :
1) analyze newly added tables -> AUTO does this. STALE doesn't.

2) analyze only percentage of rows, not 100% -> AUTO is  analyzing 100%. STALE can do percentage.

Is it possible ?

Thanks as always 


Followup   February 27, 2004 - 3pm Central time zone:

sounds like you might want:

a) alter schema tab monitoring
b) gather empty
c) gather stale

for your degree of control.  consider:




ops$tkyte@ORA920PC> create table t1 as select * from all_users;
 
Table created.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec dbms_stats.alter_schema_tab_monitoring
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather empty' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45

that picked it up, this

 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45

did nothing..

 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> create table t2 as select * from all_users;
 
Table created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
45 rows created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
90 rows created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
180 rows created.
 
ops$tkyte@ORA920PC> commit;
 
Commit complete.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec dbms_stats.alter_schema_tab_monitoring
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45
T2
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather empty' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45
T2                                     45

that got t2 and...
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                    360
T2                                     45
 
ops$tkyte@ORA920PC>

that got t1...

 

5 stars perfecto   February 27, 2004 - 10pm Central time zone
Reviewer: A reader 
pretty GOOD !
Thank you much.
Have a good weekend. 


5 stars gather_schema_stats('SYS');   April 27, 2004 - 9am Central time zone
Reviewer: A reader 
Tom,

Please let us know, when we need to do the following

execute dbms_stats.gather_schema_stats('SYS');

at what point of time, do you recommend us executing the above sql


Thanks a lot 


Followup   April 28, 2004 - 12pm Central time zone:

in your test system before going production, to ensure you are not going to mess up an existing 
system.


in 10g, it'll be done for you out of the box.

in 9i, it is "ok and safe in most cases to do" -- but TEST it, don't just DO IT

in 8i and before, I would not do it, not necessary and probably would lead only to troubles.  but 
again, you would TEST it of course. 

3 stars Confusing situation   May 14, 2004 - 2pm Central time zone
Reviewer: Bruno Di Rei Araujo from Uberlândia, MG, Brazil
Tom,

I've read you're advice about utilizing DBMS_STAT instead of ANALYZE statement. But, I had an 
experience that makes me get confused: with analyze, the cost got Higher, but the response time was 
the best; with DBMS_STATS, little tiny costs, but 12 times the response time (comparing TKPROF 
results). Is there any reason for that situation?

(TKPROF of ANALYZEd query)
********************************************************************************

SELECT a.seq_mov, a.empresa, a.filial, a.LOCAL, a.produto, a.qtde_mov,
a.vlr_tot_mov, a.num_docto, a.cod_oper, a.dt_mov, a.historico,
a.lote_cont, a.lcto_direto, a.disp_trigger, a.dt_registro,
a.qtde_val,
b.entrada_saida
FROM ce_movest a, ce_operacoes b
WHERE (:b5 IS NULL OR a.empresa = :b5)
AND (:b4 IS NULL OR a.filial = :b4)
AND (:b3 IS NULL OR a.produto = :b3)
AND (:b2 IS NULL OR a.dt_mov >= :b2)
AND (:b1 IS NULL OR a.dt_mov <= (TO_DATE (:b1) + .99999))
AND b.empresa = a.empresa
AND b.cod_oper = a.cod_oper
order by a.empresa, a.filial, a.produto, a.dt_mov, b.ajuste_estoque,
      b.entrada_saida /*decode(b.entrada_saida, 'E', decode(b.transferencia, 'S', 3, 1), 2)*/,
    a.local, a.qtde_mov

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.27       5.99         55      19061          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.27       5.99         55      19061          0           4

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 47  

Rows     Row Source Operation
-------  ---------------------------------------------------
      4  SORT ORDER BY (cr=19061 r=55 w=0 time=5997116 us)
      4   HASH JOIN  (cr=19061 r=55 w=0 time=5996843 us)
    103    INDEX FULL SCAN CE_OPERACOES_AK1 (cr=1 r=0 w=0 time=288 us)(object id 553481)
      4    TABLE ACCESS FULL CE_MOVEST (cr=19060 r=55 w=0 time=5992324 us)

********************************************************************************




(TKPROF of DBMS_STATSed)
********************************************************************************

SELECT a.seq_mov, a.empresa, a.filial, a.LOCAL, a.produto, a.qtde_mov,
a.vlr_tot_mov, a.num_docto, a.cod_oper, a.dt_mov, a.historico,
a.lote_cont, a.lcto_direto, a.disp_trigger, a.dt_registro,
a.qtde_val,
b.entrada_saida
FROM ce_movest a, ce_operacoes b
WHERE (:b5 IS NULL OR a.empresa = :b5)
AND (:b4 IS NULL OR a.filial = :b4)
AND (:b3 IS NULL OR a.produto = :b3)
AND (:b2 IS NULL OR a.dt_mov >= :b2)
AND (:b1 IS NULL OR a.dt_mov <= (TO_DATE (:b1) + .99999))
AND b.empresa = a.empresa
AND b.cod_oper = a.cod_oper
order by a.empresa, a.filial, a.produto, a.dt_mov, b.ajuste_estoque,
      b.entrada_saida /*decode(b.entrada_saida, 'E', decode(b.transferencia, 'S', 3, 1), 2)*/,
    a.local, a.qtde_mov

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     13.03      74.99          0     104482          0           2
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     13.03      75.00          0     104482          0           2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 47  

Rows     Row Source Operation
-------  ---------------------------------------------------
      2  SORT ORDER BY (cr=104482 r=0 w=0 time=74999502 us)
      2   TABLE ACCESS BY INDEX ROWID CE_MOVEST (cr=104482 r=0 w=0 time=74999276 us)
1276536    NESTED LOOPS  (cr=4901 r=0 w=0 time=29219344 us)
    103     TABLE ACCESS FULL CE_OPERACOES (cr=3 r=0 w=0 time=728 us)
1276432     INDEX RANGE SCAN CE_MOVEST_FK3 (cr=4898 r=0 w=0 time=10600999 us)(object id 29012)

********************************************************************************

As we can see, the time for the query havin tables DBMS_STATSed were 75 seconds; when ANALYZEd, it 
took about 6 seconds... Is there a reason for that?

Thanks,
Bruno 


Followup   May 15, 2004 - 12pm Central time zone:

umm, exact commands used please? 

3 stars Related questions for a data load(9.2.0.4.0)   May 16, 2004 - 9pm Central time zone
Reviewer: Matt from Australia
I have a third party system that initially has no data. I need to load alot of data into this 
system. The product provides a number of interface tables into which I can load my data. I then 
initiate a third party supplied procedure that will pick up the data in the interface tables, 
validate  and "interalize"  the data.

I want the load to run efficiently. I believe my best bet is to re-generate stats (on the products 
internal tables) whilst the load is in progress (a 10-20 hrs process). Here are my questions:

Will open cursors necessarily be reparsed when the stats have been generated?

I plan to also generate system stats (io, cpu etc). I am also wondering about generating stats on 
'SYS' and 'SYSTEM'. Is this recommended on 9.2.0.4.0?

Many Thanks.


 


Followup   May 17, 2004 - 7am Central time zone:

"Will open cursors necessarily be reparsed when the stats have been generated?"

if not, why bother generating stats at all -- they will have their entire working set already 
cached in memory (in the shared pool).  if we did not invalidate the cursors, the stats would "not 
be used".  dbms_stats does have an invalidate flag you can use to control that --- but, I would ask 
the vendor for their best practice with their product (make sure they support the cbo first and 
foremost)

as long as you always do gather stats on sys and system, it is definitely a supported option in 
9ir2 -- in 10g, it will be "the only way" in fact. 

3 stars What is the difference between the following options..   May 28, 2004 - 10am Central time zone
Reviewer: Matt from Australia
to dbms_stats.gather_schema_stats:

method_opt => 'FOR ALL INDEXED COLUMNS'
and
method_opt => 'FOR ALL COLUMNS SIZE AUTO' ?

I just drastically improved the performance of a query by using the first. The first appears to 
have difference use different bucket sizes and generates more stats. Can you please comment on what 
should be "best practice" as far as generating stats is concerned?

Many Thanks.

I've noted the follwing difference in stats:

 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------

 VMETERID                                  NOT NULL NUMBER(38)
 CUI_TID                                   NOT NULL NUMBER(38)
 INTERVAL_ENDDT                            NOT NULL DATE
 INTERVAL_STARTDT                          NOT NULL DATE
 VALUE                                              FLOAT(126)
 READ_TYPE_FLAG                                     VARCHAR2(5)
 TRANS_TID                                          NUMBER(38)
 TRANS_UID                                          NUMBER(38)
 TRANS_STARTTS                                      DATE

method_opt => 'FOR ALL INDEXED COLUMNS':
=======================================

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS NUM_BUCKETS          DENSITY LAST_ANALYZED   
                                
------------------------------ ------------ ---------- ----------- ---------------- 
-------------------                             
CUI_TID                                   1          0           1     1.0000000000 05/28/2004 
23:03:14                             
INTERVAL_ENDDT                          796          0           1      .0012562814 05/28/2004 
23:03:14                             
INTERVAL_STARTDT                        792          0           1      .0012626263 05/28/2004 
23:03:14                             
READ_TYPE_FLAG                            3          0           1      .3333333333 05/28/2004 
23:03:14                             
TRANS_STARTTS                          7894          0           1      .0001266785 05/28/2004 
23:03:14                             
TRANS_TID                                 1          0           1     1.0000000000 05/28/2004 
23:03:14                             
TRANS_UID                                 1          0           1     1.0000000000 05/28/2004 
23:03:14                             
VALUE                                 57746          0           1      .0000173172 05/28/2004 
23:03:14                             
VMETERID                             487822          0           1      .0000020499 05/28/2004 
23:03:14    

method_opt => 'FOR ALL COLUMNS SIZE AUTO':
=========================================

 Name                                                                     Null?    Type
 ------------------------------------------------------------------------ -------- 
-------------------------------------------------
 VMETERID                                                                 NOT NULL NUMBER(38)
 CUI_TID                                                                  NOT NULL NUMBER(38)
 INTERVAL_ENDDT                                                           NOT NULL DATE
 INTERVAL_STARTDT                                                         NOT NULL DATE
 VALUE                                                                             FLOAT(126)
 READ_TYPE_FLAG                                                                    VARCHAR2(5)
 TRANS_TID                                                                         NUMBER(38)
 TRANS_UID                                                                         NUMBER(38)
 TRANS_STARTTS                                                                     DATE


INDEXES FOR TABLE: VMC_DATA
================================================


INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION                       
                                
------------------------------ ------------------------------ ---------------                       
                                
PKVMC_DATA                     VMETERID                                     1                       
                                
                               CUI_TID                                      2                       
                                
                               INTERVAL_STARTDT                             3                       
                                


TABLE STATISTICS FOR : VMC_DATA
===================================================


  NUM_ROWS     BLOCKS AVG_ROW_LEN LAST_ANALYZED                                                     
                                
---------- ---------- ----------- -------------------                                               
                                
   4680488      35452         100 05/28/2004 22:27:10                                               
                                


COLUMN STATISTICS FOR : VMC_DATA
====================================================


COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS NUM_BUCKETS          DENSITY LAST_ANALYZED   
                                
------------------------------ ------------ ---------- ----------- ---------------- 
-------------------                             
CUI_TID                                   1          0           1     1.0000000000 05/28/2004 
22:27:10                             
INTERVAL_STARTDT                        792          0          68      .0012626263 05/28/2004 
22:27:10                             
VMETERID                             487489          0          75      .0000031279 05/28/2004 
22:27:10  


3 stars Confusing Situation (Continued)   June 5, 2004 - 1pm Central time zone
Reviewer: Bruno Di Rei Araujo from Uberlândia, MG, Brazil
Tom,

As you asked for (from my Review above), the exact commands were:

DBMS Stats
----------
exec dbms_stats.delete_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.delete_table_stats('SAGE', 'CE_OPERACOES');

exec dbms_stats.gather_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.gather_table_stats('SAGE', 'CE_OPERACOES');


ANALYZE
-------
exec dbms_stats.delete_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.delete_table_stats('SAGE', 'CE_OPERACOES');

analyze table ce_movest delete statistics;
analyze table ce_operacoes delete statistics;

analyze table ce_movest estimate statistics for table for all indexes for all indexed columns;
analyze table ce_operacoes estimate statistics for table for all indexes for all indexed columns;


Note that my approach was, first, remove all relevant statistics for the method used and then 
gather them brand new.

Just can't understand why is it so different... Can you see any reason for it?

Thanks again,
Bruno. 


Followup   June 5, 2004 - 1pm Central time zone:

add 

cascade => true 

to the dbms_stats and retry (so you compare apples to apples).  you have no index stats. 

4 stars Statistics   September 21, 2004 - 2am Central time zone
Reviewer: Aru from nz
Hi Tom,

Please can you clarify in what circumstances and why would you use the 3 different(listed below ) 
methods to analyze tables,

1). a@ORA920LAP> analyze table t compute statistics
    for table for all indexes for all indexed columns;
2). a@ORA920LAP> analyze table t compute statistics;
3). dbms_stats(....

I,m using oracle8i.

Q1) Why would I need to compute stats for 
all indexed columns when the whole table is getting 
analyzed.

Q2) Does dbms_stats analyze the table, it's indexes and the indexed columns too or what exactly 
does it do.If I specify a schema then does it to all the above mentioned statistics.

Thanks lots as always Tom,
Regards,
Aru.



 


Followup   September 21, 2004 - 7am Central time zone:

1) i would not (do not) -- i have old examples that do but promise never to do it in an example 
again

2) see #1

3) yes.

q1) because you wanted to have histograms for all indexed columns (not saying you do, but if you 
did "for all indexed columns" that is the outcome -- you have histograms for indexed columns)

q2) see the documentation -- it'll tell you that method_opt defaults to 'for all columns size 1' 
and cascade (to indexes) defaults to FALSE, so you would have to pass TRUE to get indexes done. 

5 stars cascade=true -- > local indexes and global indexes?   January 2, 2005 - 7am Central time zone
Reviewer: Pravesh Karthik from India 
Tom,


For a partitioned table, if i say cascade=true. It will collect stats on all local indexes and 
global indexes?

If yes,i dont need to use GATHER_INDEX_STATS - right?

Please confirm.

Thanks a lot
Pravesh Karthik  


Followup   January 2, 2005 - 11am Central time zone:

depends on your granularity -- but left to the default, 

exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );

will get table, table partition, index, index partition statistics.
 

5 stars if yes -- then index statistics in which mode?   January 2, 2005 - 7am Central time zone
Reviewer: Pravesh Karthik from India 
...contd

and if yes for local and global indexes. what mode does it do for the indexes ?

"It's not recommended to compute index statistics in estimated mode, as the Cost Optimizer  is 
almost based on index statistics, particularly on the CLUSTERING_FACTOR value. It needs the 
statistics  would be as accurate as possible. " 

above lines are from -- 
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_id=175258.1&p_database_id=NOT
Thanks again .. 


Followup   January 2, 2005 - 11am Central time zone:

that is a broad sweeping generalization that is not really accurate (or well stated -- "... to 
COMPUTE index statistics in ESTIMATED mode..." -- bummer.  You either COMPUTE or you ESTIMATE)

and to say the CBO is almost based on index statistcs -- well, I sent an email to the author of the 
note asking them to remove that paragraph. 

5 stars awesome !!   January 2, 2005 - 11am Central time zone
Reviewer: Pravesh Karthik from India 


5 stars   January 3, 2005 - 2am Central time zone
Reviewer: chong 
Hi Tom,
1)exec dbms_stats.gather_database_stats(method_opt=>'FOR ALL INDEXED 
COLUMNS',degree=>7,cascade=>true);

2)exec dbms_stats.gather_database_stats(method_opt=>'FOR ALL INDEXED COLUMNS SIZE 
AUTO',degree=>7,cascade=>true);

Is the first command will collecting histogram? If yes what is the different between first and 
second command?

thanks 


Followup   January 3, 2005 - 8am Central time zone:

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5792247321358#7417227783861
they are very different.  size auto sort of needs a historical workload to peek at (queries) as 
well -- that is part of it's algorithm.  

consider:

ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx1 on t(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> create index t_idx2 on t(owner);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed 
columns', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' 
group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                    76
OWNER                        22
 
histograms collected for both...

ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed 
columns size auto', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' 
group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                     2
OWNER                         2

no histograms -- just hi/lo values and such.  there were no queries that ever referenced either 
column so histograms were not collected

 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed 
columns size skewonly', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' 
group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                     2
OWNER                        22
 
it looked at the columns and said "object id, unique -- no histograms needed there", owner -- not 
so unique and pretty skewed, lets get them



But see that link, the meanings are defined there. 

5 stars   January 3, 2005 - 10pm Central time zone
Reviewer: chong 
Hi Tom,
Based on the link, size auto means:
It'll collect histograms in memory only for those columns which are used by your applications 
(those columns appearing in a predicate involving an equality, range, or like operators). we know 
that a particular column was used by an application because at parse time, we'll store workload 
information in SGA. Then we'll store histograms in the data dictionary only if it has skewed data 
(and it worthy of a histogram)
What i understand from the above definition is (please correct me if i'm wrong), oracle will only 
collect histogram on those tables' columns which are ever been used since database startup and has 
skewed data, it will not collect histogram for others tables' columns that never been used since 
database startup even skewed data exists. If my understanding is true, how long oracle will store 
those table parsing information in SGA (so that dbms_stats will know which column to analyze)? Does 
it means this option is not appropriate for a fresh database which is just starting up? 
I see some of the oracle notes highly recommend to used this option to analyze the histogram 
information, is it true?
 


Followup   January 3, 2005 - 10pm Central time zone:

traveling right now but this is worthy of followup later after some research - so, check back maybe 
later this week.

(see, I don't know everything off the top of my head :)



Update:

looks like sys.col_usage$, it is persistently stored in the dictionary.  The name of the columns 
that represent "flags" in there says it all.


 

3 stars dbms_stats.gather_schema_stats - Oracle9i   January 4, 2005 - 4pm Central time zone
Reviewer: Altieri from Sao Paulo, SP Brazil
Tom,
    I'm using dbms_stats.gather_schema_stats( ownname=>'XXX') to analyze a whole schema. 
    I dont want to analyze just a table in this schema. What I have to do ?. 
    I have been verifying all parameters, but and I didnt find any options yet. 

Thanks,
Altieri
 


Followup   January 5, 2005 - 9am Central time zone:

there is no "anti gather" parameter.


you would have to use gather table stats on the tables you wanted stats on, or move the table to 
another schema.

seems strange to not want stats on a single table doesn't it? 

5 stars How to determine if statistics were collected by ANALYZE?   January 7, 2005 - 3pm Central time zone
Reviewer: Alexander from Chicago
Tom, how to determine if the statistics were collected by ANALYZE command or by the package?

 


Followup   January 8, 2005 - 4pm Central time zone:

if you look at columns that are filled in by ANALYZE but not dbms_stats - avg_space is one - you 
might be able to tell if ANALYZE was at some time in the past used:

ops$tkyte@ORA9IR2> create table t as select * from all_Users;
 
Table created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
         0
 
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
      7488
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
      7488


but dbms_stats won't overwrite the value -- so it is not conclusive proof.
 

5 stars followup on 3 jan 2005   January 11, 2005 - 2am Central time zone
Reviewer: chong 
Hi Tom,
Do you find out any research on 3rd jan 2005 thread which relate to dbms_stats with size auto 
option? 

thanks 


3 stars dbms stats failure   January 11, 2005 - 12pm Central time zone
Reviewer: Rob H from Winnipeg Canada
I have the database setup (9i) to gather statistics however I am running into a strange error

BEGIN DBMS_STATS.GATHER_DATABASE_STATS(cascade=>TRUE); END;

*
ERROR at line 1:
ORA-00904: : invalid identifier 
ORA-06512: at "SYS.DBMS_STATS", line 9375 
ORA-06512: at "SYS.DBMS_STATS", line 9857 
ORA-06512: at "SYS.DBMS_STATS", line 10041 
ORA-06512: at "SYS.DBMS_STATS", line 10134 
ORA-06512: at "SYS.DBMS_STATS", line 10114 
ORA-06512: at line 1 

I don't understand what 
"ORA-00904: : invalid identifier" means with nothing between the 2 colon's.  Any ideas?  I think I 
know the schema its failing in. 


Followup   January 11, 2005 - 1pm Central time zone:

trace it.

any 'bad' sql will be in the trace file.  if you think you know the schema, just gather schema 
stats and see if it repros in there, tracefile would be smaller. 

3 stars structure not sql   January 11, 2005 - 1pm Central time zone
Reviewer: Rob H from Winnipeg Canada
Because I'm doing an analyze, isn't it an issue of object structure, not sql?  Regardless, I am 
tracing it now. 


Followup   January 11, 2005 - 2pm Central time zone:

dbms_stats runs sql

ora-904 is raised from sql. 

5 stars interesting....   January 12, 2005 - 12pm Central time zone
Reviewer: Rob H from Winnipeg Canada
after analyzing the tkprof for the failure I found that the table that was failing.  What was 
interesting was that I found that the table being analyzed was failing on a specific index.  One 
that is incredibly old (also note that table contains hourly transactions in the realm of only 6 
million rows).  Dropping the index and using dbms_stats again worked.  Adding the index back and 
using dbms_stats worked.

TKPROF

********************************************************************************

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring no_expand 
  index(t,"INDX_TAB_CAMP") */ count(*) as nrw,count(distinct 
  sys_op_lbid(30519,'L',t.rowid)) as nlb,count(distinct "CAMP") as ndk,
  sys_op_countchg(substrb(t.rowid,1,15),1) as clf 
from
 "KCAGENT"."TAB_ALL"  sample block 
  (5.77406383513445589344056295551646142247)  t where "CAMP" is not null


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          1         14          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          1         14          0           0

Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: 43     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT GROUP BY (cr=0 r=0 w=0 time=0 us)
      1   INDEX SAMPLE FAST FULL SCAN INDX_TAB_ALL_ID_DATE (cr=14 r=1 w=0 time=6851 us)(object id 
49563)

********************************************************************************


But the index in the hint is different than the index in the explain.  

Regardless, the immediate problem is solved, thank you. 


4 stars will the statistics be automatically be updated ?   January 30, 2005 - 12pm Central time zone
Reviewer: Chenna 
I'm interested in knowing in the following

1.Lets say table t has 1000 rows

2.Generate statistics for table t and the associated indexes.

3.Now lets truncate table t.

--will the statistics be automatically be updated ?

4.In the next scenario , lets delete all the rows without truncating.

--will the statistics be automatically be updated ?

If they dont can we do something to update the statistics immedietly based on a certain operation.

Thanks






 


Followup   January 30, 2005 - 12pm Central time zone:

when you tried it, what happened?  I mean, the scenario you scoped out here would be pretty 
*trivial* to test?  That is what I'm trying to teach here -- you "spec"ed out the question very 
well, the test case you would need to see would be very easy.


statistics are gathered when you ask us to gather them, or some job you run gathers them.   

5 stars Very interesting discussion.   May 22, 2005 - 9pm Central time zone
Reviewer: Bob from Michigan USA
Thanks everyone.

My question - 10g's "black box" statistics collection mechanism, does it collect histograms and if 
so, through what parameters?

I've been reasonably satisfied with the execution plans we're getting and collecting system 
statistics in particular has been a boon.  However there still are some zingers and they typically 
involve tables with what I know to be rather skewed data distribution on joined columns.  I'd like 
to keep the process as "black box" as possible (our DBAs have bigger problems to worry about) so 
I've been wondering how I could verify and/or incorporate histogram collection into the black box, 
or whether I need to disable the auto collection and wrap up my own.

Thanks in advance - this forum is great! 


Followup   May 23, 2005 - 8am Central time zone:

it uses "auto" and "auto" uses sys.col_usage$ to determine what columns have predicates applied to 
them and how the predicates use the column (equality, range scans.

so the set of columns that histograms are collected on can vary over time by simply introducing a 
new query into the system that "where's" or joins on columns that were never used that way before. 

5 stars   May 25, 2005 - 8am Central time zone
Reviewer: A reader 
Once a table is set up as Monitoring, how do I disable monitoring, if I need to? 


Followup   May 25, 2005 - 11am Central time zone:

ops$tkyte@ORA9IR2> alter table t monitoring;
                                                                                
Table altered.
                                                                                
ops$tkyte@ORA9IR2> alter table t NOmonitoring;
                                                                                
Table altered.
 

5 stars   May 25, 2005 - 8am Central time zone
Reviewer: Ravi 
How do I disable MONITORING on a table? 


3 stars DBMS_STATS with low sample size produces wrong DISTINCT_KEYS on nonunique index   May 27, 2005 - 3pm Central time zone
Reviewer: Marco Coletti from Trento, ITALY
DBMS_STATS seems very unaccurate at estimating number of distinct keys on nonunique indexes.

I think it could be a bug.

/*
  Test case:
  DBMS_STATS gives wrong DISTINCT_KEYS for nonunique index   
*/

select * from V$VERSION;

/*
BANNER
------------------------------------------------------------Oracle9i Enterprise Edition Release 
9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE    9.2.0.6.0       Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production
*/


create table NUMBERS as
select mod(N,2000001) N from (select ROWNUM N from DUAL connect by ROWNUM <= 6000000)
;

create index IX_N on NUMBERS (N) compute statistics;

select count(distinct N) from NUMBERS;
/*
COUNT(DISTINCTN)
----------------
         2000001
*/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
      2000001     6000000 2005-05-25 16:25:12
*/


-- begin bug ------------------------------------
begin
  DBMS_STATS.GATHER_INDEX_STATS (
    ownname => user, 
    indname => 'IX_N',
    estimate_percent => 20,
    degree => 1);
end;
/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
       414785     1188855 2005-05-25 16:27:39
*/
-- end bug ---------------------------------------


analyze index IX_N estimate statistics sample 20 percent; 

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
      2023171     1415367 2005-05-25 16:47:17
*/


begin
  DBMS_STATS.GATHER_INDEX_STATS (
    ownname => user, 
    indname => 'IX_N',
    estimate_percent => NULL,
    degree => 1);
end;
/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
      2000001     6000000 2005-05-25 16:48:53
*/

 


Followup   May 27, 2005 - 5pm Central time zone:

it is being computed the way the intended (believe it or not).

do you have a compelling case where by the number of distinct values results in a really wrong 
plan?

There is an enhancement request in to have a "re-look" at this, but if you have a compelling "bad 
plan", I'll be glad to take a look and see if we cannot get that into the system. 

1 stars No compelling bad plan due to bad NDV for nonunique index   May 31, 2005 - 9am Central time zone
Reviewer: Marco Coletti from Trento, ITALY
We have lots of almost-unique indexes in our system.

Probably this is a simptom of bad design, but in development we were heavily constrained by the 
requirements, which are basically:
* a linear chain of four external systems that exchange messages bidirectionally about entities via 
Tibco bus; the systems do not employ an entity omogeneus unique key, but only entity unique keys 
specific for each system (i.e. each system marks an entity with its own key format), and 
correlation keys from system to system
* collect entities and related events from these systems intercepting the messages in near 
real-time
* aggregate the events every 1 hour
* make a entity flat warehouse table to be queried by a GUI; refresh it every 1 hour
* delete "dead" entities each day
* the average number of live entities is about 3.5 million per system

I don't remember a specific query which performs bad due to this problem.
 


Followup   May 31, 2005 - 10am Central time zone:

that is the issue though.

dbms_stats uses block sampling to quickly guestimate these statistics.  the block sampling tends to 
read "sorted data" in the index.  So, the number of distinct values is hard to compute.  Analyze on 
the other hand used a totally different technique.

But unless you can show this is leading to "bad things happening", that the NDV is off materially 
affects somethnig -- there isn't an issue.  The sampling is happening the way they intended it to 
happen, the bug was opened and closed on this.  They would need a compelling case in order to 
re-look at it.  

Is NDV accurate?  No, observably so.
Is that fact an issue?  That is what they would need to see. 

5 stars External tables   June 6, 2005 - 9am Central time zone
Reviewer: A reader 
We thought we'd create a DDL trigger to automatically turn on Monitoring for new tables, but this 
DID NOT work for External tables.
Is there some way, (considering it is an AFTER trigger) to know, for SURE, that the current table

1)Is NOT an External table and set them to monitoring
or
2)IS an External table and dont set Monitoring for it?

CREATE OR REPLACE TRIGGER object_change AFTER
CREATE ON SCHEMA
DECLARE
alt_mont varchar2(1000);
BEGIN
 if (ora_sysevent in  ('CREATE') and
                        ora_dict_obj_type = 'TABLE')  then
 alt_mont :='ALTER TABLE ' ||ora_dict_obj_name||' MONITORING ';
  execute immediate alt_mont;
 end if;
end;
/ 


Followup   June 6, 2005 - 10am Central time zone:

either query the dictionary or just "try" and catch the exception (expect it, ignore it) 

5 stars   June 6, 2005 - 2pm Central time zone
Reviewer: A reader 
Thanks, accept catching exception is a brilliant solution, but  can't find anything in the Data 
Dictionery that suggests the table being created is an External one.
By looking at xxx_objects or xxx_tables, what is the DEFINITIVE way to categorise Tables? 


Followup   June 6, 2005 - 3pm Central time zone:

ops$tkyte@ORA9IR2> select * from user_external_tables;
 
no rows selected
 

4 stars replacing analyze with dbms_stats in 9.2.0.6   July 18, 2005 - 1pm Central time zone
Reviewer: James K. Purtzer from Lincoln, NE USA
Currently using:
DBMS_UTILITY.ANALYZE_SCHEMA('ESD', 'COMPUTE');
It takes about 70 minutes to run on about 30 Gb of Data warehouse weekly.

Would like to replace with:

dbms_stats.gather_schema_stats(ownname=>'ESD',
estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'AUTO', cascade=>TRUE);

1. I'm struggling with how to assess impact on the production server. I can run the new
query on an RMAN restore of the data, without a representative connection load. Is that a fair test 
per your earlier recommendations? 

2. Does the old analyze data need to be dropped before I run the new statement? 

3.Looking at the DBMS_STATS statement above, as its specified
It will only gather stale statistics from objects the CBO is using?

 


Followup   July 18, 2005 - 1pm Central time zone:

1) I'd be looking to compare plans.  Biggest thing will be auto in the method_opt, in order for 
that to work, you need to have had some queries going over time.  auto (you have the syntax wrong, 
see below) will not work "good" until it knows the predicates you use:

ops$tkyte-ORA9IR2> create table ttt as select object_name, owner, object_type from all_objects;
 
Table created.
 
ops$tkyte-ORA9IR2> create index t_idx1 on ttt(owner);
 
Index created.
 
ops$tkyte-ORA9IR2> exec dbms_stats.gather_schema_stats( ownname=> user, 
estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for all columns size 
AUTO',cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> set autotrace traceonly explain
ops$tkyte-ORA9IR2> select * from ttt t1 where owner = 'SCOTT';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1271 Bytes=48298)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TTT' (Cost=12 Card=1271 Bytes=48298)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX1' (NON-UNIQUE) (Cost=3 Card=1271)
 
see the card = 1271 
 
ops$tkyte-ORA9IR2> set autotrace off
ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'TTT' 
group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
OBJECT_NAME                             2
OBJECT_TYPE                             2
OWNER                                   2
 
only two buckets, no histograms really

ops$tkyte-ORA9IR2> exec dbms_stats.gather_schema_stats( ownname=> user, 
estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for all columns size 
AUTO',cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte-ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'TTT' 
group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
OBJECT_NAME                             2
OBJECT_TYPE                             2
OWNER                                  22
 
but it saw the predicate now, so it got them

ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> set autotrace traceonly explain
ops$tkyte-ORA9IR2> select * from ttt t2 where owner = 'SCOTT';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=40 Bytes=1520)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TTT' (Cost=2 Card=40 Bytes=1520)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX1' (NON-UNIQUE) (Cost=1 Card=40)
 
 
 
ops$tkyte-ORA9IR2> set autotrace off
ops$tkyte-ORA9IR2> spool off

and that card= is more accurate



2) no

3) no, it'll gather them all.  you would need to use the alter table T monitoring command and then 
do a gather stale option. 

5 stars so method_opt updates as queries run on new stats?   July 18, 2005 - 3pm Central time zone
Reviewer: James K. Purtzer from Lincoln, NE USA
How does the method_opt=>'for all column size AUTO' update as the CBO parses queries? Your example 
showed cardiality decreasing as more queries run on the table, correct?  


Followup   July 18, 2005 - 4pm Central time zone:

ctl-f for col_usage$ on this page. 

3 stars transport stats to another table   July 18, 2005 - 4pm Central time zone
Reviewer: valent from slovakia
Hello Tom,

let's say i want to transport statistics from one big table to another. 

create table t2 as select * from big_table

big table has up to date statistics and i want to transport them to t2

import_table_stats will not work, because it uses same table name.

thanks for your option 


Followup   July 18, 2005 - 5pm Central time zone:

you can use SET_TABLE_STATS  (you do know the stats will be different right -- t2 will be nicely 
compacted, no whitespace, the clustering factor on indexes will possibly be different.....)

the stats are not made to go from table to table.

but if you look at the stats table you export to, you'll see table names in there, it is not hard 
to figure out what you would have to update in the stats table after you export to it to touch up 
the table name. 

3 stars   July 19, 2005 - 4am Central time zone
Reviewer: valent from slovakia
thank you Tom,

we have large reporting table (no updates) with 15+ million rows par partition, created with CTAS. 
Statistics should be very similar for my new table copy. 


Followup   July 19, 2005 - 7am Central time zone:

go for it then 

3 stars suggestion..!   July 20, 2005 - 5am Central time zone
Reviewer: Reader from japan.
Tom,

Every week we are analyzing the database schema using DBMS_STAT package as a batch process in our 
production environment. 
But the this job is getting failed often with ORA-01555 Snapshot too old.We could understand the 
cause of the problem that is,this analyze job and another job which is schedule in the batch 
running parallely getting clashed at some point and end with ORA-01555 error.Here my question is,if 
we re-run the DBMS_STAT package after it failed,its analyzing all the objects from the begining.But 
we want to analyze objects from the point it failed previously.Is there any option in DBMS_STAT 
package?.Pls advice.I hope you understand the scenario .







 

 

 


Followup   July 20, 2005 - 6am Central time zone:

why don't we fix the problem?  the 1555?  make your UNDO permanently larger -- sized right to 
fullful the requirements of your system

the other path will be "write code" as dbms_stats doesn't have a "remember where you were and start 
over", we'd have to help it out that way. 

4 stars dbms_stats hanging,   July 20, 2005 - 9am Central time zone
Reviewer: sns from austin,tx
Lately we are experiencing an issue with DBMS_STATS package when we try to analyze the table.

This is what we did:
BEGIN dbms_stats.gather_table_stats(ownname => 'EURO_TSTAT',  estimate_percent => 15,  tabname => 
'CALL_SYSTEM_EURO',  block_sample => FALSE,  method_opt => 'FOR ALL INDEXED COLUMNS SIZE 1',  
degree => 4,  granularity => 'ALL',  cascade => TRUE ); END;


What are the things we need to see while dbms_stats are in hung state?
Any steps to resolve this?

thanks, 


Followup   July 20, 2005 - 12pm Central time zone:

you could trace it, you could query v$ views (eg: v$session_event) to see what it is waiting on



sql_trace with waits (10046 trace) could be used.

 

3 stars need to analyze index separately..!   August 6, 2005 - 10pm Central time zone
Reviewer: Reader from Trichy,India.
Tom,

I have noted that gather schema stats would not analyze indexes until we specify cascade=>true,
is it true?
here the procesure I'm using,kindly give suggestion on this to do the normal process with indexes 
as well..
exec dbms_stats.gather_schema_stats( -
ownname          => 'OMSTNX', -
options          => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
degree           => DBMS_STATS.DEFAULT_DEGREE  -
)

 


Followup   August 7, 2005 - 9am Central time zone:

that is the documented purpose of cascade, yes.   

5 stars Trying to understand this behavior...   August 15, 2005 - 10am Central time zone
Reviewer: Mark from Boston, MA
Hi Tom,

We have this as a job run Weekly (Sunday, Early AM):
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4'
, estimate_percent => 40
, block_sample => FALSE
, method_opt => 'FOR ALL COLUMNS SIZE 1'
, degree => NULL
, granularity => 'DEFAULT'
, cascade => TRUE
);

Now, I came into work Monday and the performance was 'yukky', so I did this:
DECLARE
BEGIN
   EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_INSURANCE ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED 
COLUMNS FOR ALL INDEXES';
   EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_PROCEDURE ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED 
COLUMNS FOR ALL INDEXES';
   EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_PROCEDURE_BALANCE ESTIMATE STATISTICS FOR TABLE FOR ALL 
INDEXED COLUMNS FOR ALL INDEXES';
   EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_FINANCIAL_TRANSACTION ESTIMATE STATISTICS FOR TABLE FOR ALL 
INDEXED COLUMNS FOR ALL INDEXES';
   EXECUTE IMMEDIATE 'ANALYZE TABLE ALL_NAME ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS 
FOR ALL INDEXES';
EXCEPTION
   WHEN OTHERS
   THEN
      NULL;
END;

... And all was good again. Any ideas what is incorrect perhaps on the 
DBMS_STATS.GATHER_SCHEMA_STATS parameter settings, and could you suggest a good method using this 
procedure?

Thanks! 


Followup   August 15, 2005 - 4pm Central time zone:

you would need to tell us what happened between sunday early am and monday morning.

and why you have a when other then null???????? 
erase that and stop using it, you want to just silently ignore ERRORS!!!!!???

I can say that the dbms_stats -- you told it "no histograms", and the analyze you let it "get 
histograms" on the indexed columns -- that jumps right out. 

5 stars Yes, ok.   August 17, 2005 - 2pm Central time zone
Reviewer: Mark from Boston, MA
Thanks Tom.

The Exception handler was quick adhoc code and I don't usually code that way.

Anyway, this is what I've done:

1) turned table monitoring on
2) Via JOB at 1AM:
/* Gather Stale Stats Nightly */
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4',
estimate_percent => 40, block_sample => FALSE, method_opt =>
'FOR ALL INDEXED COLUMNS', degree => NULL, granularity =>
'DEFAULT', options => 'GATHER STALE', cascade => TRUE);

3) Once per week (Sunday Early AM):
/* Gather All Stats Weekly */
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4',
estimate_percent => 40, block_sample => FALSE, method_opt =>
'FOR ALL INDEXED COLUMNS', degree => NULL, granularity =>
'DEFAULT', options => 'GATHER', cascade => TRUE);

This seems to give me a more consistent performance.

Thanks! 


3 stars collecting initial stats using dbms_stats   September 2, 2005 - 1pm Central time zone
Reviewer: JK Purtzer from Lincoln, Nebraska USA
running the package mentioned above in my previous review

begin
dbms_stats.gather_schema_stats(ownname=> 'ESD',estimate_percent=>dbms_stats.auto_sample_size, 
method_opt=>'for all columns size 
AUTO',cascade=>true );
commit;
end;

I can't seem to verify with
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/stats.htm#33117
SELECT TABLE_NAME, NUM_ROWS, BLOCKS, AVG_ROW_LEN,
       TO_CHAR(LAST_ANALYZED, 'MM/DD/YYYY HH24:MI:SS')
     FROM DBA_TABLES
     WHERE TABLE_NAME IN ('SO_LINES_ALL','SO_HEADERS_ALL','SO_LAST_ALL');
I get 0 rows
thinking I'm missing something about initially gathering stats, I looked at ch6 of effective oracle 
by design also
the procedure only takes 1 sec to run, it can't be gathering stats quite that fast or can it? 


Followup   September 3, 2005 - 7am Central time zone:

well, that dbms_stats will gather stats for the ESD schema

could it take 1 second?  depends on the size of the ESD schema.

That query will return rows for those exact table names, if nothing is coming back, you don't have 
any tables by those names. 

3 stars Nice   September 15, 2005 - 2pm Central time zone
Reviewer: Catherine from OH,USA
Hi Tom,
How Statistics gathering is useful for the Optimizer??
In what ways it uses the statistics??

Please provide a detailed answer.

Bye!!
 


Followup   September 15, 2005 - 2pm Central time zone:

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/optimops.htm#51003

5 stars Support of Analyze   September 28, 2005 - 9am Central time zone
Reviewer: Bill from Wilmington, Delaware
Hi Tom,

Both you and the Oracle documentation (9i) state the cost-based optimizer will eventually use only 
statistics that have been collected by DBMS_STATS.

Do you know when this will be the case?

Also, is the DBMS_DDL.ANALYZE_OBJECT procedure equivalent to ANALYZE TABLE?

Thanks! 


Followup   September 28, 2005 - 10am Central time zone:

The optimizer expects the statistics to be gathered by dbms_stats, the stats generated by analyze 
can and will be different than dbms_stats, but the optimizer is assuming "dbms_stats" did that.

There is no "end of life" on analyze for statistics gathering 

4 stars   October 3, 2005 - 5pm Central time zone
Reviewer: JK Purtzer from Lincoln, NE
I ran the sql below to gather stats:

begin
dbms_stats.gather_schema_stats(ownname=> 'schema',
estimate_percent=>dbms_stats.auto_sample_size, 
cascade=>true); 
commit;
end;

This caused several of the plans for procedures that are "unhinted" and were using indexes to stop 
using them and do full table scans, what would you suggest to look at to find out why.

Thanks again for the great book from the OTN lounge at OOW05 


Followup   October 3, 2005 - 9pm Central time zone:

tell me you tested this in a test system first please????


you just turned on the CBO, entirely.  Some things will go better - some will be unchanged and 
invariably, there will be others that do not go so well the first time (they are the ones you 
NOTICE, the others you do not).

They are something to be analyzed in a testing environment.  Without version info and other stuff - 
no comment. 

5 stars analyze and cursor invalidation   October 4, 2005 - 11am Central time zone
Reviewer: Al from Washington DC USA
In a followup above you state:
----------------
analyzing the table will invalidate open cursors on that object in the shared pool.  the next parse 
against that object will be "hard".  hard does lots more recursive sql then "soft"
-----------------

I don't fully understand what you mean by 'invalidate open cursors".  An anaylze of a table will 
not cause some other session's executing sql to fail, will it?

For complicated reasons I will need to run a bunch of dbms_stats.gather_table_stats after an 
application upgrade, and there will be real users on the application at that point.  I am assuming 
that as the stats are gathered, any parse (soft or hard) of any sql on an analyzed object will 
cause complete re-optimization, but that in no case will an executing statement fail because of the 
analyze.  Is this correct?

The goal here, obviously, is to get users on the application as quickly as possible after the 
upgrade, even if performance isn't what we'd like; then gather the stats as the system runs to 
bring it back to its 'normal' statistics state and 'normal' performance. 


Followup   October 4, 2005 - 4pm Central time zone:

it'll invalidate the shared sql, the cursors in the shared pool so they next time they are 
executed, they'll hard parse.

it will not break "already running" stuff. 

5 stars anaylze table, open cursors and parsing ...   October 4, 2005 - 2pm Central time zone
Reviewer: Catalin CIONTU from Craiova, Romania .
Tom, please correct me if i'm wrong, but by analyzing the table, its corresponding open cursors 
will be invalidated in the shared_pool and the only difference is that the cursors have to do a 
hard parse instead of a soft parse . The process will complete, but will be less effective, this 
time. Regards, catalin .  


Followup   October 4, 2005 - 6pm Central time zone:

the process will complete, yes, as to it "effectiveness", one of three things can happen..... 
(slower, faster, the same) 

4 stars dbms_stats quibble   October 4, 2005 - 6pm Central time zone
Reviewer: JK Purtzer from Lincoln, NE
9.2.0.6 DB
We did some testing,(see general question above, sorry about that one, panic'd)Here is a more 
specific one.

We are now using the DBMS_STATS package for collecting stats. We were previously using the old 
DBMS_UTILITY.ANALYZE_SCHEMA method. 

The following is the new method we are using to collect stats:


dbms_stats.gather_schema_stats(

ownname=>’XXX', 

estimate_percent=>dbms_stats.auto_sample_size, 

method_opt=>'for all columns size AUTO', 

cascade=>true);

 
Since we changed over to the new statistics gathering, we have seen significant degradation in some 
of our existing queries. In analyzing one of the queries, we found some strange behavior by the CBO 
when evaluating a TO_DATE conversion in the where predicate:


When the date string includes a 4-digit year such as this:


WHERE OA.CREATE_DATE >= TO_DATE('7/28/2005', 'MM/DD/RR')


It uses the index on the CREATE_DATE column which is desired.

 

When the date string uses a 2-digit year such as this:

 

WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')

 

It does NOT use the index on the CREATE_DATE column even though this also converts the string to 
7/28/2005.

 
The difference in execution times between these two execution plans is less than a second when the 
index is used to minutes when the index is not used.

This is quite confusing and was hoping you could shed some light on this.
 


Followup   October 4, 2005 - 9pm Central time zone:

WHERE OA.CREATE_DATE >= TO_DATE('7/28/2005', 'MM/DD/RR')

THAT is a constant....

WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')

THAT is an "unknown", could be 1905, 2005, 2105 - depends on when you execute it (I know, a human 
would be able to sort of make a good guess - but the date format is not constant)


Are you sure this is due to stats?  can we see an autotrace traceonly explain?



 

3 stars date quibble with explain plans   October 4, 2005 - 9pm Central time zone
Reviewer: JK Purtzer from Lincoln, NE
Hope you can read this, my email cut and paste got rid of the formatting, need a clarification
 statement that the 2-digit year is an unknown. I am issuing the query today and by Oracle's rules 
that would mean an 05 year with an RR date format elelment would return year 2005. Tha makes it a 
constant as of right now doesn't it?


Using a 4-digit year in the to_date string:

 

 
  
  
 Cost
 Cardinality
 Bytes
 
SELECT STATEMENT, GOAL = CHOOSE
  
  
 3102
 207
 40986
 
 SORT ORDER BY
  
  
 3102
 207
 40986
 
  HASH JOIN OUTER
  
  
 3094
 207
 40986
 
   HASH JOIN OUTER
  
  
 2717
 206
 30488
 
    NESTED LOOPS
  
  
 2714
 206
 29046
 
     HASH JOIN OUTER
  
  
 2302
 206
 26162
 
      NESTED LOOPS OUTER
  
  
 2227
 206
 23484
 
       NESTED LOOPS
  
  
 2227
 206
 21836
 
        HASH JOIN
  
  
 2021
 206
 16686
 
         TABLE ACCESS BY INDEX ROWID
 WAREHOUSE
 ORDER_AGGREGATE
 1988
 206
 14214
 
          INDEX RANGE SCAN
 WAREHOUSE
 OA_CREATE_DATE_IDX
 30
 5041
  
 
         TABLE ACCESS FULL
 WAREHOUSE
 PUBLISHER_PROFILE
 32
 11898
 142776
 
        TABLE ACCESS BY INDEX ROWID
 ESD
 PUBLISHERS
 1
 1
 25
 
         INDEX UNIQUE SCAN
 ESD
 PUBLISHERS_PK
  
 1
  
 
       INDEX UNIQUE SCAN
 ESD
 SALES_AGENT_ID_PK
  
 1
 8
 
      TABLE ACCESS FULL
 WAREHOUSE
 FULFILLMENTS
 74
 48036
 624468
 
     TABLE ACCESS BY INDEX ROWID
 WAREHOUSE
 ORDER_EXT_AGGREGATE
 2
 1
 14
 
      INDEX UNIQUE SCAN
 WAREHOUSE
 ORDERS_EXT_AGG_PK
 1
 1
  
 
    TABLE ACCESS FULL
 WAREHOUSE
 CURRENCIES
 2
 169
 1183
 
   VIEW
 WAREHOUSE
  
 375
 38293
 1914650
 
    SORT GROUP BY
  
  
 375
 38293
 1799771
 
     HASH JOIN
  
  
 66
 38293
 1799771
 
      TABLE ACCESS FULL
 WAREHOUSE
 RETURN_REASONS
 3
 696
 20880
 
      TABLE ACCESS FULL
 WAREHOUSE
 RETURNS
 62
 38293
 650981
 

 

Using a 2-digit year in the to_date string:

 

05, RR
  
  
  
  
  
 
SELECT STATEMENT, GOAL = CHOOSE
  
  
 21420
 234
 46332
 
 SORT ORDER BY
  
  
 21420
 234
 46332
 
  FILTER
  
  
  
  
  
 
   HASH JOIN OUTER
  
  
 21411
 234
 46332
 
    HASH JOIN OUTER
  
  
 21034
 233
 34484
 
     NESTED LOOPS
  
  
 21031
 233
 32853
 
      HASH JOIN OUTER
  
  
 20565
 233
 29591
 
       NESTED LOOPS OUTER
  
  
 20490
 233
 26562
 
        NESTED LOOPS
  
  
 20490
 233
 24698
 
         HASH JOIN
  
  
 20257
 233
 18873
 
          TABLE ACCESS FULL
 WAREHOUSE
 ORDER_AGGREGATE
 20224
 233
 16077
 
          TABLE ACCESS FULL
 WAREHOUSE
 PUBLISHER_PROFILE
 32
 11898
 142776
 
         TABLE ACCESS BY INDEX ROWID
 ESD
 PUBLISHERS
 1
 1
 25
 
          INDEX UNIQUE SCAN
 ESD
 PUBLISHERS_PK
  
 1
  
 
        INDEX UNIQUE SCAN
 ESD
 SALES_AGENT_ID_PK
  
 1
 8
 
       TABLE ACCESS FULL
 WAREHOUSE
 FULFILLMENTS
 74
 48036
 624468
 
      TABLE ACCESS BY INDEX ROWID
 WAREHOUSE
 ORDER_EXT_AGGREGATE
 2
 1
 14
 
       INDEX UNIQUE SCAN
 WAREHOUSE
 ORDERS_EXT_AGG_PK
 1
 1
  
 
     TABLE ACCESS FULL
 WAREHOUSE
 CURRENCIES
 2
 169
 1183
 
    VIEW
 WAREHOUSE
  
 375
 38293
 1914650
 
     SORT GROUP BY
  
  
 375
 38293
 1799771
 
      HASH JOIN
  
  
 66
 38293
 1799771
 
       TABLE ACCESS FULL
 WAREHOUSE
 RETURN_REASONS
 3
 696
 20880
 
       TABLE ACCESS FULL
 WAREHOUSE
 RETURNS
 62
 38293
 650981
 
 


Followup   October 5, 2005 - 6am Central time zone:

... That makes it a constant as of right now doesn't it? ...


no more than SYSDATE is a constant as of right now.  5 minutes from now - sysdate can be different 
(can be, doesn't have to be! you can freeze time)

WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')

Tell me, what date is that - no matter what you answer, I'll say "nope, I was thinking of another 
date"


would need to see the plan part that represented the estimated cardinality for this step.

(you can format plans skinny, I do it all of the time...) 

5 stars How do you know if a table was analyzed with dbms_stats or analyze table?   December 15, 2005 - 7pm Central time zone
Reviewer: Peter Tran from Houston, TX USA
Hi Tom,

Someone asked me this today, but I haven't been able to figure out the answer.

Is there anyway you can tell whether someone analyzed a table using "DBMS_STATS" vs "ANALZE TABLE"?

My gut call is no.

Thanks,
-Peter 


Followup   December 16, 2005 - 7am Central time zone:

it won't be "definitive", but you can take an educated guess.

I'll use dbms_stats, then analyze - compare, then dbms_stats and recompare..


ops$tkyte@ORA10GR2> create table t ( x int, y varchar2(4000), z varchar2(4000) );
Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into t (x) select rownum from all_objects where rownum <= 10;
10 rows created.

ops$tkyte@ORA10GR2> commit;
Commit complete.

ops$tkyte@ORA10GR2> update t set y = rpad('*',4000,'*'), z = rpad('*',4000,'*');
10 rows updated.

ops$tkyte@ORA10GR2> commit;
Commit complete.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> create table tstats as
  2  select 'DBMS_STATS' name, a.* from user_tables a where table_name = 'T';

Table created.

ops$tkyte@ORA10GR2> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA10GR2> insert into tstats
  2  select 'ANALYZE' name, a.* from user_tables a where table_name = 'T';

1 row created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> column cname format a20
ops$tkyte@ORA10GR2> column dbms_stats format a25
ops$tkyte@ORA10GR2> column analyze format a25
ops$tkyte@ORA10GR2> select a.cname, a.val dbms_stats, b.val analyze
  2    from table( cols_as_rows( q'|select *
  3                                   from tstats
  4                                  where name = 'DBMS_STATS' |' ) ) a,
  5         table( cols_as_rows( q'|select *
  6                                   from tstats
  7                                  where name = 'ANALYZE' |' ) ) b
  8       where a.cname = b.cname
  9             and decode( a.val, b.val, 0, 1 ) = 1
 10  /

CNAME                DBMS_STATS                ANALYZE
-------------------- ------------------------- -------------------------
NAME                 DBMS_STATS                ANALYZE
EMPTY_BLOCKS         0                         3
AVG_SPACE            0                         1878
CHAIN_CNT            0                         10
AVG_ROW_LEN          8005                      8022
GLOBAL_STATS         YES                       NO

6 rows selected.

dbms_stats won't gather things not used by the CBO such as chain_cnt/avg_space in particular.  If 
they are 0 or null - it is unlikely that analyze has been used in the past - dbms_stats would be 
the one.  

But global_stats for a non-partitioned table could be a "tell", avg_space, etc is not "good enough" 
since:

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> delete from tstats where name = 'DBMS_STATS';

1 row deleted.

ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> insert into tstats
  2  select 'DBMS_STATS' name, a.* from user_tables a where table_name = 'T';

1 row created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select a.cname, a.val dbms_stats, b.val analyze
  2    from table( cols_as_rows( q'|select *
  3                                   from tstats
  4                                  where name = 'DBMS_STATS' |' ) ) a,
  5         table( cols_as_rows( q'|select *
  6                                   from tstats
  7                                  where name = 'ANALYZE' |' ) ) b
  8       where a.cname = b.cname
  9             and decode( a.val, b.val, 0, 1 ) = 1
 10  /

CNAME                DBMS_STATS                ANALYZE
-------------------- ------------------------- -------------------------
NAME                 DBMS_STATS                ANALYZE
AVG_ROW_LEN          8005                      8022
GLOBAL_STATS         YES                       NO

dbms_stats won't 'reset' them, their being set indicates analyze was used at some time in the 
past - but since analyze and dbms_stats set and reset global_stats - that could be a "tell" for 
many normal tables...

 

5 stars Wow...Awesome.   December 16, 2005 - 12pm Central time zone
Reviewer: Peter Tran from Houston, TX USA
Perfect...

Thanks again Tom. 


5 stars CHAIN_CNT   January 5, 2006 - 10am Central time zone
Reviewer: A reader 
Hi

Since dbms_stats doesnt get CHAIN_CNT for us, if we want to find out if we have chained rows on 
whic tablas how can we achieve it? By analyzing all tablas manually and overwrtting statistics 
gathered by dbms_stats? 


Followup   January 5, 2006 - 11am Central time zone:

ops$tkyte@ORA10GR2> @?/rdbms/admin/utlchain

Table created.

ops$tkyte@ORA10GR2> analyze table t list chained rows;

Table analyzed.

ops$tkyte@ORA10GR2> desc chained_rows
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 OWNER_NAME                                        VARCHAR2(30)
 TABLE_NAME                                        VARCHAR2(30)
 CLUSTER_NAME                                      VARCHAR2(30)
 PARTITION_NAME                                    VARCHAR2(30)
 SUBPARTITION_NAME                                 VARCHAR2(30)
 HEAD_ROWID                                        ROWID
 ANALYZE_TIMESTAMP                                 DATE

 

5 stars   January 5, 2006 - 10pm Central time zone
Reviewer: A reader 
Tom,

What is the reason for dbms_stats not gathering info about chained rows whereas analyze command 
does?

Thanks. 


Followup   January 6, 2006 - 1pm Central time zone:

dbms_stats gathers only that which is used by the optimizer - it gathers no other bits of 
information. 

4 stars   January 6, 2006 - 9am Central time zone
Reviewer: Ramesh from MN USA
Tom,
How setting "mbrc" through dbms_stats and  parameter db_file_multiblock_read_count  influence 
differently.
Is that some thing like "mbrc" used during cost evaluation phase and db_file_multiblock_read_count 
used during execution phase.
Thanks for your time.
Ramesh. 


Followup   January 6, 2006 - 2pm Central time zone:

the system statistics mbrc is the "observed actual multi-block read count on the system, regardless 
of the db file multiblock read count init.ora parameter"

the optimizer will use the "observed value", the "retrieval" engine will use 
db_file_multiblock_read_count to do the actual IO however.

So, if you tell the system via dbms_stats "we only get 8 blocks on multi-block reads" but set the 
multiblock read count to the max (say 128 blocks) - when the optimizer decides to to a full scan 
(multi-block read), we'll do as large an IO as possible, but the costing engine would consider "8" 
to be the number we would expect.

 

5 stars dbms_stats in 10gR2   February 21, 2006 - 5pm Central time zone
Reviewer: Oraboy 
Guess I am really tired and my eyes cant catch what I am missing here..Could anyone please help?

Issue: The index associated with Primary key Constraint never gets updated with correct statistics 
and hence my query is throwing wrong results

Commodity_DIM is a table with 700+ rows

-- Version check 
SQL> select * from V$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

-- Table details 
SQL> select table_name,num_rows,sample_size,last_analyzed from user_tables
  2  where table_name='COMMODITY_DIM'
  3  /

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
COMMODITY_DIM                         730         730 21-FEB-06

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
         0

-- Index details 

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

-- Index associates to Primary Key constraint in the table 

SQL> select constraint_name,constraint_type from user_constraints
  2  where table_name='COMMODITY_DIM' and index_name='SYS_C009071';

CONSTRAINT_NAME                C
------------------------------ -
SYS_C009071                    P

-- and the constraint is Valid 

SQL> select constraint_name,constraint_type,status from user_constraints
  2  where table_name='COMMODITY_DIM' and index_name='SYS_C009071';

CONSTRAINT_NAME                C STATUS
------------------------------ - --------
SYS_C009071                    P ENABLED

-- analyzing the index - no help 

SQL> analyze index SYS_C009071 compute statistics;

Index analyzed.

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

-- dbms_stats ; perform compute  

SQL> exec dbms_stats.gather_index_stats(ownname=>'GOLDDEMO',-
> indname=>'SYS_C009071',estimate_percent=>NULL);

PL/SQL procedure successfully completed.

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

SQL> desc commodity_dim
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COMM_DIM_PK                               NOT NULL NUMBER
 COMM_LOW_CATG_CODE                        NOT NULL VARCHAR2(10)
 COMM_LOW_CATG_DESC                                 VARCHAR2(50)
 COMM_MID_CATG_CODE                        NOT NULL VARCHAR2(50)
 COMM_MID_CATG_DESC                                 VARCHAR2(50)
 COMM_HI_CATG_CODE                         NOT NULL VARCHAR2(50)
 COMM_HI_CATG_DESC                                  VARCHAR2(50)
 COMM_ALL_CATG_CODE                                 VARCHAR2(10)
 COMM_ALL_CATG_DESC                                 VARCHAR2(50)

-- Here is the problem 

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
         0

SQL> select count(distinct comm_dim_pk) from commodity_dim;

COUNT(DISTINCTCOMM_DIM_PK)
--------------------------
                         0

SQL> select /*+ FULL(c)*/
  2         count(distinct comm_dim_pk) from commodity_dim c;

COUNT(DISTINCTCOMM_DIM_PK)
--------------------------
                       730

-- Verifying Optimizer plans  

SQL> explain plan for select count(*) from commodity_dim;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 818636992

------------------------------------------------------------------------
| Id  | Operation        | Name        | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT |             |     1 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |             |     1 |            |          |
|   2 |   INDEX FULL SCAN| SYS_C009071 |   730 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------


SQL> explain plan for 
  2  select /*+ FULL(c)*/
  3         count(distinct comm_dim_pk) from commodity_dim c;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
Plan hash value: 3961897095

-------------------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Bytes | Cost (%CPU)| T
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               |     1 |     4 |     8   (0)| 0
|   1 |  SORT GROUP BY     |               |     1 |     4 |            |  
|   2 |   TABLE ACCESS FULL| COMMODITY_DIM |   730 |  2920 |     8   (0)| 0
---------------------------------------------------------------------------

-- parameter listing , if this helps  

SQL> show parameter opt

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
filesystemio_options                 string
object_cache_optimal_size            integer     102400
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      10.2.0.1
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE
plsql_optimize_level                 integer     2 


Followup   February 22, 2006 - 8am Central time zone:

analyze table commodity_dim validate structure cascade;

what does that return. 

5 stars Ora-1499   February 22, 2006 - 9am Central time zone
Reviewer: oraboy 
Thanks Tom.. Looks like there is much bigger problem here.

SQL> analyze table commodity_dim validate structure;

Table analyzed.

SQL> analyze table commodity_dim validate structure cascade;
analyze table commodity_dim validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

I 'll follow-up this in metalink.

Thanks for your pointers, as always.
 


5 stars Weird but worked   February 22, 2006 - 9am Central time zone
Reviewer: Oraboy 
SQL> analyze table commodity_dim validate structure cascade;
analyze table commodity_dim validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file


SQL> alter table commodity_dim move;

Table altered.

SQL> select index_name from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME
------------------------------
SYS_C009071

SQL> alter index SYS_C009071 rebuild ;

Index altered.

SQL> analyze table commodity_dim validate structure cascade;

Table analyzed.

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
       730

SQL> explain plan for select count(*) from commodity_dim;

Explained.


SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 2809477265

-----------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_C009071 |   730 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

 


5 stars Comparable test, different results...   March 9, 2006 - 2pm Central time zone
Reviewer: Ben from USA
This is a great discussion, it has brought to light the concept of histograms and generating 
statistics overall for me. The argument for how bright that light is, is still up in the air ;)

After reading the portion, dated Jan 05, that gave examples of the differences between the 
different method_opt choices, I decided to experiment a little for myself. I am using a test data 
warehouse that is composed of only 7 tables, each of which have a primary key index and two of 
which have an additional one column non-unique index. I show those below.

I found my results to be un-expectantly different from yours and I wondered if you could help me to 
understand why. 

The first test was using 'FOR ALL INDEXED COLUMNS'. The second test was with 'FOR ALL INDEXED 
COLUMNS SIZE AUTO'. Finally, the third test was using 'FOR ALL INDEXED COLUMNS SIZE SKEWONLY'. The 
third test gave the most confusing results to me. I will include my observation and question 
following the results from each test, within the spooled listing below. 

SQL> show user
USER is "DMR"

SQL> select USER || '.' || table_name, index_name, column_name
  2  from user_ind_columns;

USER||'.'||TABLE_NAME INDEX_NAME             COLUMN_NAME
--------------------- ---------------------- -------------------------
DMR.BRANCH_BO         BRANCH_BO_1            REPORT_DATE 
DMR.BRANCH_BO         BRANCH_BO_1            PRODUCT  
DMR.BRANCH_BO         BRANCH_BO_1            BRANCH_PLANT
DMR.BRANCH_BO         BRANCH_BO_1            TRX_CLASS 

DMR.INV_FACT          INV_FACT_1             REPORT_DATE  
DMR.INV_FACT          INV_FACT_1             PRODUCT 
DMR.INV_FACT          INV_FACT_1             BRANCH_PLANT 
DMR.INV_FACT          INV_FACT_1             CATEGORY      
DMR.INV_FACT          INV_FACT_1             OBSOLETE 

DMR.ITEM              ITEM_1                 PRODUCT 
DMR.ITEM              ITEM_2**               D_BUS_UNIT

DMR.ITEM_BRANCH       ITEM_BRANCH_1          PRODUCT 
DMR.ITEM_BRANCH       ITEM_BRANCH_1          BRANCH_PLANT

DMR.ITEM_FILL         ITEM_FILL_1            REPORT_DATE 
DMR.ITEM_FILL         ITEM_FILL_1            PRODUCT  
DMR.ITEM_FILL         ITEM_FILL_2**          PRODUCT 

DMR.OSB_FACT          OSB_FACT_1             REPORT_DATE 
DMR.OSB_FACT          OSB_FACT_1             PRODUCT
DMR.OSB_FACT          OSB_FACT_1             TRX_CLASS 

DMR.PROD_FACT         PROD_FACT_1            REPORT_DATE
DMR.PROD_FACT         PROD_FACT_1            PRODUCT 
DMR.PROD_FACT         PROD_FACT_1            BRANCH_PLANT

DMR.TIME              TIME_1                 CURDATE 

** signifies non unique indexes/columns

23 rows selected.

############## 
The following shows the skewdness of d_bus_unit, curdate, and obsolete
##############


SQL> select count(distinct D_BUS_UNIT)
  2  from item;
 
COUNT(DISTINCTD_BUS_UNIT)
-------------------------
                       12
 
SQL> select count(*)
  2  from item;
 
  COUNT(*)
----------
     65119
 
SQL> select D_BUS_UNIT, count(*)
  2  from item
  3  group by D_BUS_UNIT;
 
D_BUS_UNIT                       COUNT(*)
------------------------------ ----------
                                      617
Acute Care                          29298
Corporate                              10
Drop Ship Fee                           1
Freight                                 2
Min Order Qty Fee                       1
NonStock Expense Item                   8
OEM                                  1920
Patient Care                        30547
REBATE CREDIT                           1
Raw Material Samples                    2
 
D_BUS_UNIT                       COUNT(*)
------------------------------ ----------
Wound Care                           2699
 
12 rows selected.
SQL> select count(distinct obsolete)
  2  from inv_fact;
 
COUNT(DISTINCTOBSOLETE)
-----------------------
                      2
 
SQL> select count(*)
  2  from inv_fact;
 
  COUNT(*)
----------
   2743569
 
SQL> select obsolete, count(*)
  2  from inv_fact
  3  group by obsolete;
 
O   COUNT(*)
- ----------
N    2649249
Y      94320

SQL> select count(*)
  2  from time;
 
  COUNT(*)
----------
       156
 
SQL> select count(distinct curdate)
  2  from time;
 
COUNT(DISTINCTCURDATE)
----------------------
                   156

#################################

SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname         =>'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS');

PL/SQL procedure successfully completed.

SQL> l
  1  select table_name, column_name, count(*)
  2  from user_tab_histograms
  3* group by table_name, column_name
SQL> /

TABLE_NAME            COLUMN_NAME     COUNT(*)             
--------------------- --------------  ----------       
ITEM                  PRODUCT         2             
ITEM                  D_BUS_UNIT      7             
TIME                  CURDATE         76             
INV_FACT              PRODUCT         76             
INV_FACT              CATEGORY        2             
INV_FACT              OBSOLETE        2             
INV_FACT              REPORT_DATE     74             
INV_FACT              BRANCH_PLANT    25             
OSB_FACT              PRODUCT         76             
OSB_FACT              TRX_CLASS       6             
OSB_FACT              REPORT_DATE     74             
BRANCH_BO             PRODUCT         76             
BRANCH_BO             TRX_CLASS       6             
BRANCH_BO             REPORT_DATE     62             
BRANCH_BO             BRANCH_PLANT    24             
ITEM_FILL             PRODUCT         76             
ITEM_FILL             REPORT_DATE     3             
PROD_FACT             PRODUCT         76             
PROD_FACT             REPORT_DATE     44             
PROD_FACT             BRANCH_PLANT    32             
ITEM_BRANCH           PRODUCT         76             
ITEM_BRANCH           BRANCH_PLANT    33             

22 rows selected.

######### OBSERVATIONS/QUESTIONS ########
When you did this test on table "T" that was built from 
create table t as select * from all_objects
you got a 76 bucket histogram on object_id, a unique column, and a 22 bucket histogram on owner, a 
non-unique column.

In my sample item.product is the primary key, I would have expected more than a two bucket 
histogram.
ITEM.D_BUS_UNIT is non-unique and very skewed but yet I only got a 7 bucket histogram.
What could be the cause of this? It seems like I got the opposite of what your example shows.
################# END of Q1


SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname        => 'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

TABLE_NAME                     COLUMN_NAME                 COUNT(*)             
------------------------------ ------------------------- ----------             
ITEM                           PRODUCT                            2             
ITEM                           D_BUS_UNIT                         2             
TIME                           CURDATE                            2             
INV_FACT                       PRODUCT                            2             
INV_FACT                       CATEGORY                           2             
INV_FACT                       OBSOLETE                           2             
INV_FACT                       REPORT_DATE                        2             
INV_FACT                       BRANCH_PLANT                       2             
OSB_FACT                       PRODUCT                            2             
OSB_FACT                       TRX_CLASS                          2             
OSB_FACT                       REPORT_DATE                        2             
BRANCH_BO                      PRODUCT                            2             
BRANCH_BO                      TRX_CLASS                          2             
BRANCH_BO                      REPORT_DATE                        2             
BRANCH_BO                      BRANCH_PLANT                       2             
ITEM_FILL                      PRODUCT                            2             
ITEM_FILL                      REPORT_DATE                        2             
PROD_FACT                      PRODUCT                            2             
PROD_FACT                      REPORT_DATE                        2             
PROD_FACT                      BRANCH_PLANT                       2             
ITEM_BRANCH                    PRODUCT                            2             
ITEM_BRANCH                    BRANCH_PLANT                       2             

22 rows selected.


##########################
No confusion here, got the exact results as your sample. This is a test db and hasn't had any sql 
ran for some time.
###########################

SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname        =>'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

TABLE_NAME                     COLUMN_NAME                 COUNT(*)             
------------------------------ ------------------------- ----------             
ITEM                           PRODUCT                            2             
ITEM                           D_BUS_UNIT                         2             
TIME                           CURDATE                           76             
INV_FACT                       PRODUCT                          201             
INV_FACT                       CATEGORY                           2             
INV_FACT                       OBSOLETE                           2             
INV_FACT                       REPORT_DATE                       99             
INV_FACT                       BRANCH_PLANT                      48             
OSB_FACT                       PRODUCT                          201             
OSB_FACT                       TRX_CLASS                          6             
OSB_FACT                       REPORT_DATE                      100             
BRANCH_BO                      PRODUCT                          201             
BRANCH_BO                      TRX_CLASS                          6             
BRANCH_BO                      REPORT_DATE                       89             
BRANCH_BO                      BRANCH_PLANT                      24             
ITEM_FILL                      PRODUCT                          102             
ITEM_FILL                      REPORT_DATE                        3             
PROD_FACT                      PRODUCT                          201             
PROD_FACT                      REPORT_DATE                      108             
PROD_FACT                      BRANCH_PLANT                      31             
ITEM_BRANCH                    PRODUCT                          201             
ITEM_BRANCH                    BRANCH_PLANT                      57             

22 rows selected.
################################
Here's where I really get confused. I listed some statements at the top that shows the skewdness of
item.d_bus_unit, time.curdate, and inv_fact.obsolete. item.product is the primary key of item, so 
it's unique, and from example I would expect the 2 bucket histogram, but d_bus_unit is non_unique 
and very skewed. Why no histograms here? Time.curdate is unique, 
why does it have histograms? inv_fact.obsolete is non-unique and very skewed, but yet again, no 
histograms. What could be causing this?

Also, I might be a little confused with the terminology and referring to that count as buckets. I 
saw where you were saying that 2 meant no histograms, just hi/lo endpoints, and that >2 meant 
histograms.

Thank you for any explanations you could give me, and let me know if you need to see other settings 
"maybe init.ora" or anything else.


 


Followup   March 9, 2006 - 3pm Central time zone:

what version here - dbms_stats behavior is driven by that greatly.

Also, I likely won't address everything here - this is sort of a review followup area, this is 
"big" (and I cannot really reproduce, I do not have your data, that makes it "hard")
 

5 stars in addition to above...   March 9, 2006 - 3pm Central time zone
Reviewer: Ben from USA
Sorry, I forgot to include the banner information

SQL> select banner from v$version;
 
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
PL/SQL Release 9.2.0.5.0 - Production
CORE    9.2.0.6.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.5.0 - Production
NLSRTL Version 9.2.0.5.0 - Production 


5 stars another addition, your response wasn't posted before last.   March 9, 2006 - 4pm Central time zone
Reviewer: Ben from USA
sorry, again for the lack of data. I did think of something else that might impact this. 
In my init.ora the compatible parameter is set to 8.1.0 BUT the optimizer_features_enable is set to 
9.2.0

I just recently inherited the database so I can't give you a reason why these are the way they are. 
 


Followup   March 9, 2006 - 4pm Central time zone:

I'll take a look later when I get back from travel if I get a chance.


Meanwhile and faraway, can you try to generate a test case using mod, rownum, decode whatever to 
generate dummy data that we can test with ? 

5 stars a closer comparison   March 10, 2006 - 1pm Central time zone
Reviewer: Ben from USA
I re-created your exact example by using the all_objects table. 
Those stats were almost exact. I think the difference with using skewonly is probably due to my db 
having fewer schemas than the one you were using. 

I'm still confused on some things with my specific data tables. 

The time table has a record for each day. with curdate being the actual date and primary key. The 
other fields are for day, month, year, day of week, day of year, etc. This is a test schema so 
there may not be a record for every day begining to end. Could this cause the need for histograms 
on that column? For example the min date could be 1/1/03 and the max 1/1/06 but it could be missing 
three months in between or even all of 2005. 



SQL> show user
USER is "DMR"
SQL> create table t as select * from all_objects;

Table created.

SQL> create index t_idx1 on t(object_id);

Index created.

SQL> create index t_idx2 on t(owner);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns', cascade 
=> true );

PL/SQL procedure successfully completed.

SQL> col column_name for a20
SQL> select column_name, count(*)
  2  from user_tab_histograms
  3  where table_name = 'TIME';
  4  group by column_name
SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                    76                                                 
OWNER                         4                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size 
auto', cascade => true );

PL/SQL procedure successfully completed.

SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                     2                                                 
OWNER                         2                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size 
skewonly', cascade => true );

PL/SQL procedure successfully completed.

SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                     2                                                 
OWNER                         4                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'TIME', method_opt=> 'for all indexed columns size 
skewonly', cascade => true );

PL/SQL procedure successfully completed.

SQL> select column_name, count(*)
  2  from user_tab_histograms
  3  where table_name = 'TIME';
  4  group by column_name
SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
CURDATE                      76 


Followup   March 10, 2006 - 8pm Central time zone:

we flipped from T to TIME?  I didn't follow?  What is TIME exactly? 

5 stars explanation   March 14, 2006 - 8am Central time zone
Reviewer: Ben from USA
I tried to explain in the last paragraph at the top before the example. I went through and 
recreated your example with almost exact results. I then posted numbers from a TIME.CURDATE field 
that is a primary key that holds the date. In production there is a record for every day, but the 
test instance has some gaps. I was asking if that is why I would get histograms on that column.

If I am running dbms_stats.gather_schema_stats(user, 'TIME', method_opt=> 'for all 
indexed columns size skewonly', cascade => true );
Why would a single column primary key of varchar2 datatype get a histogram generated. That could 
explain a lot to me. I thought if a column is unique it would only need 1 bucket. Is that not true 
with varchar2 columns. 
If, on a smaller scale, my column looks like:
10-245
10-456
10-984
10-999
15-875
50-3000
50-450
50-451
50-780
90-800
That is a unique column but would it get histograms generated if I used "for all indexed columns 
size skewonly"?


On the other hand, I also have a char(1) column that is either Y or N. Most of the 100,000 records 
have an N but yet I only get 1 bucket for this column. I wouldn't think that is sufficient. 
example:
N
N
N
Y
N
N
N
N
N
Y
Y
N
N
I am only getting a 1 bucket histogram for a column like this, of course on a much larger scale.

If one value out of two possible makes up less than 5% or 10% are histograms not created?

 


2 stars analyze, and gather_stats schemas   April 24, 2006 - 4am Central time zone
Reviewer: Mohammed Abdul Samad from KSA
this is good and useful but I need more detail with 
example like if we are gathering stats every week
so how to make scheduling with dbms_jobs,
can you give me with complete examples
hope you got the point...... 


Followup   April 24, 2006 - 5am Central time zone:

what issue do you have with dbms_job?  It is pretty straightforward.

You have a procedure you write to gather stats (or just call dbms_stats)
You schedule it.

 

4 stars Problem with table analyze.   May 2, 2006 - 4pm Central time zone
Reviewer: Abdul Seerae from Mitsubishi Motors - IL - USA
Dear Tom,

We have the following script running daily (AT 2.00 AM) to gather statistics on a specific schema.

[ dbms_stats.gather_schema_stats(ownname=>'ABCDOWNER',cascade=>true); ]

We are facing a problem with this script. The table analyze stops at a specific table 
(say 'Problem_tbl') and it will not proceed any further. It gives no error message or updates any 
logs. It stays there for ever. When checking through OEM, or sqlplus, we can see 'last analyzed 
date' updated for all tables before 'Problem_tbl' table, in this schema. 
This “problem_tbl” is not very big in size (only16,000 rows) and we have many big ones in this 
schema.

When we analyze this table individually, we got same result.  It stays for ever. 
[ dbms_stats.gather_table_stats(ownname=> 'ABCDOWNER', tabname=> 'PROBLEM_TBL', partname=> NULL); 
]. 

We can analyze all other tables without any problem. To rule out any table corruption, we exported 
and imported this table to our development server.  The data seems OK. 

Once we restart database, this problem goes away for few weeks before it reappear.

Any suggestions / thoughts in this respect will be highly appreciated.


O/S:  Windows 2003.
Oracle : 9i (9.2.0.6.0)


 


Followup   May 3, 2006 - 1am Central time zone:

(this does seem like an OBVIOUS support issue, I always wonder why that path isn't taken...)

I would peek in the v$ views to see what this session is waiting for when it gets "stuck".

v$session_wait
v$session_event 

5 stars Locks   May 8, 2006 - 6am Central time zone
Reviewer: Ik from BG
Tom,

Two questions:

Does ANALYZE or DBMS_STATS against a table, lock the table in any mode? I came across an old Oracle 
press documentation which said so, but could not simulate it (10g R2). Was it true in older 
versions of Oracle?


If the table is locked by another session, would analyze/dbms_stats wait for that lock to be 
released?

Thanks,
 


Followup   May 8, 2006 - 8am Central time zone:

certain variations of analyze can (to validate structures and such), but dbms_stats just runs 
queries basically. 

5 stars Doubts   May 9, 2006 - 10am Central time zone
Reviewer: Ik from BG
Thanks a lot Tom.

I have some related questions.

1. Why do we have parameter force for gather_table_stats? This is FALSE by default. This means, 
dbms_stats.gather_table_stats errors out (or does it wait?) if the table is locked?

2. no_invalidate clause -- The documentation says "Does not invalidate the dependent cursors if set 
to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE." 
Invalidating dependent cursors means, all the queries that have used this table and are present in 
the shared pool will be invalidated and will be hard parsed if executed again?

Was this the default behaviour with ANALYZE?

3. DBMS_STATS.GATHER_TABLE_STATS

BEGIN
DBMS_STATS.GATHER_TABLE_STATS     
   (     
   OWNNAME => 'TEST'     
   ,TABNAME => 'RTEST'     
   ,ESTIMATE_PERCENT => 1
   ,GRANULARITY => 'GLOBAL'     
   ,CASCADE => FALSE     
   ); 
END; 

This table has 140 partitions, 6018191841 rows and 80538932 blocks.

The above command takes 80 minutes. 

If i change block_sample => TRUE, it completes in less than a minute.

I found this query (modified) to be the most time consuming one and waits on "db file scattered 
reads"

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact 
 use_weak_name_resl dynamic_sampling(0) no_monitoring */ * 
 from rtest sample  (1) t 

When the block_sample is made TRUE, query changes to 

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact 
 use_weak_name_resl dynamic_sampling(0) no_monitoring */ * 
 from rtest sample block (1) t 

and is much faster.

Tom, i found sample rows and sample block to return different amount of rows. Which is more 
accurate... when used within dbms_stats , and why?

Thanks a lot for your time.

Thanks,




 


Followup   May 9, 2006 - 11am Central time zone:

1) force overrides statistics that were "locked", you can "lock" statistics - make it so a normal 
"gather" won't overwrite them.  Maybe they could have called it "freezing" statistics instead of 
locking them.

....
force     When value of this argument is TRUE, deletes column statistics even if locked
.......


2) correct. analyze would invalidate cursors as well.

3) You need to understand the difference between block and row sampleing.

One of them samples 1% of the BLOCKS and processes any rows it finds on them.

The other samples 1% of the rows (read the rows, but only look at 1 out of every 100 of them).

You would certainly expect them to return different statistics since they look at radically 
different data in very different ways. 

5 stars Thanks   May 9, 2006 - 11am Central time zone
Reviewer: Ik from BG
Many thanks Tom for the very fast and precise response.

When row sampling (1%) is used, oracle reads every row from the table, but considers only 1 out of 
every hundred rows. In effect the whole table is read.

The difference between this and COMPUTE statistics would then be that COMPUTE considers all the 
rows whereas sample rows considers only 1% of the rows. But, in both cases the entire table is 
read.

So, we save on CPU mostly by using sample rows when compared to COMPUTE? Am i correct in saying so?


Also, the Oracle documentation says "Random block sampling is more efficient, but if the data is 
not randomly distributed on disk, then the sample values may be somewhat correlated." How is random 
block sampling more efficient?

Iam sorry for asking too many questions, Tom, but these are doubts for which very few people have 
an answer to.




 


Followup   May 9, 2006 - 12pm Central time zone:

you saw how random block sampling is more efficient - it only read 1% of the blocks, it ran a lot 
faster.  

But it might get entirely the wrong answer by accident since it reads 1% of the blocks and could be 
subject to skew (load sorted data and it'll see lots of "the same values" on any given block which 
could be unnatural)

 

5 stars   May 9, 2006 - 11am Central time zone
Reviewer: Alberto Dell'Era from Milan, Italy
> Maybe they could have called it "freezing" statistics 
> instead of locking them.

It's worth mentioning the different behaviour when trying to gather stats on a locked table versus 
the schema/db that contains it:

dellera@ORACLE10> exec  dbms_stats.lock_table_stats (user, 't');

PL/SQL procedure successfully completed.

dellera@ORACLE10> exec dbms_stats.gather_table_stats (user, 't');
BEGIN dbms_stats.gather_table_stats (user, 't'); END;

*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "SYS.DBMS_STATS", line 13056
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1

dellera@ORACLE10> exec dbms_stats.gather_schema_stats (user);

PL/SQL procedure successfully completed.

The first is definetely a "lock" as in "you bang your head into the locked door" ;) - as a 
non-native speaker would express it ... 


Followup   May 9, 2006 - 12pm Central time zone:

but not to be confused with a table lock or a row lock :) 

5 stars   May 9, 2006 - 12pm Central time zone
Reviewer: David Aldridge from Colorado Springs, CO
>> But it might get entirely the wrong answer by accident since it reads 1% of the 
blocks and could be subject to skew (load sorted data and it'll see lots of "the 
same values" on any given block which could be unnatural)

It's also sensitive to skewed numbers of rows per block, and the 1% sample is by no means 
guaranteed

http://oraclesponge.blogspot.com/2005/06/statistics-by-block-sampling.html


5 stars dbms_stats for IOTs   May 16, 2006 - 8am Central time zone
Reviewer: Ik from BG
Tom,

This is 10g R2.

I noticed that when gathering stats on 10g using dbms_stats (specifying partname, granularity => 
'PARTITION' and estimate_percent => <some number>)
performance on IOTs are very poor when compared to tables of similar/much bigger size.

The SQL it is waiting for is 

INSERT /*+ append*/ into sys.<temp table>
SELECT .... FROM <iot> sample (0.001) t
WHERE TBL$OR$IDX$PART$NUM(<iot>,0,4,0,ROWID) = :objn

It is waiting on 'db_file_sequential_reads' and in the P1 column of v$session_wait, it shows up 
almost every datafile that this IOT spans across. This includes other partitions too. My question 
is that the granularity keyword seems to have no impact on IOTs.

Is this how it should be?

This command is taking more than an hour to run. We have around 100 partitions.


Tom, whenever granularity of PARTITION is specified for a table, index or IOT, shouldnt it go 
against that partition alone?

In cases when the granularity => 'PARTITION', would Oracle attempt to gather global statistics also 
by simply rolling up from individual partitions each time?


Please let us know. 


4 stars   June 3, 2006 - 10pm Central time zone
Reviewer: A reader 
Hi Tom,
        A basic question but I am bit confused about it. I create one schema with some tables in 
it. Do I need to gateher stats for them before using them? Or Oracle will take care once they start 
getting populated? I am bit confused if we do need to take the stats for empty tables. This I am 
asking as I was asked to do so.May be I did understand it properly and I might have to go back and 
confirm.

Thx
Santosh. 


Followup   June 4, 2006 - 8am Central time zone:

The only answers to all questions are:

a) why 
b) it depends

The answer to this is "it depends"

are you using an older release and using the rule based optimizer (RBO) - then the answer is no, 
because you are using the RBO and the RBO doesn't want statistics

are you using current software and using the cost based optimizer (CBO)?  Then we need to know how 
current - is it 10g current where dynamic sampling will kick in automatically or is it 8i/9i 
current where sampling does not kick in automatically.  

In 8i/9i - you would be responsible for collecting the statistics.  In 10g, it'll dynamically 
sample the table the first time it hard parses a query and if you are using the automatic job that 
installs with the database, it'll gather statistics for you on these objects in the future as the 
job runs over time.

 

3 stars Dbms_stats.gather_schema_stats   June 4, 2006 - 9am Central time zone
Reviewer: Vikas from INDIA
Hi Tom,

We are currently managing a database worth 461G worth of data. This is on the database server 
hosted on Linux machine having 4 CPU's , 8 disk devices supported by 4 disk controllers and 16 G of 
RAM.

We have been running the package to collect the statistics which will help the optimizer to 
generate the optimised execution plans using dbms_stats.gather_database_stats, but recently data 
being growing everyday to the tune of 1G the capturing of statistics has really slowed down and is 
taking 5 hrs to complete the job.

Create or replace procedure gather_db_stats AS
BEGIN
declare
    Path                     VARCHAR2(100) ;
    Filename                 VARCHAR2(100) := 'Dbmsstats.txt';
    Output_file              utl_file.file_type;
    l_Start                  NUMBER;
    l_tname                 VARCHAR2(30);
    l_partition_name         VARCHAR2(30);
    rebuild_partition_index  VARCHAR2(500);
    err_num                  NUMBER;
    err_msg                  VARCHAR2(500);
BEGIN 
    Select value into path from v$parameter where name = 'utl_file_dir';
    output_file := utl_file.fopen (path,filename, 'a');

    for x in (Select * from DWUSER.Rolling_tables where status = 'Y') loop
        l_tname := x.table_name;
        for table_specific_partitions in (Select all_ind_partitions.partition_name part_name from 
all_indexes,all_ind_partitions 
                                          Where  all_indexes.index_name    = 
all_ind_partitions.index_name 
                      And    all_indexes.table_name    = l_tname
                                          And    all_ind_partitions.Status = 'UNUSABLE') loop
        begin 
               l_partition_name := table_specific_partitions.part_name;
               rebuild_partition_index := 'Alter table ' || l_tname || ' modify partition ' || 
l_partition_name || ' rebuild unusable local indexes';
               execute immediate rebuild_partition_index;
           EXCEPTION
             When OTHERS then
                err_num := SQLCODE;
                err_msg := Substr(SQLERRM,1,500);
                utl_file.put_line (output_file,'Date : ' || to_Char(sysdate,'dd/mm/yyyy 
hh24:mi:ss') || ' Error generated while rebuilding the UNUSABLE local indexes with an error message 
' || err_msg);
                utl_file.fclose(output_file);
                return;
           end;
        end loop;
     end loop;

     for x in (select Username from dba_users where username not in ('SYS','SYSTEM'))
     lOOP
        l_start := dbms_utility.get_time;
        dbms_stats.gather_schema_stats(ownname           => x.username,
                                       estimate_percent  => dbms_stats.auto_sample_size,
                                       method_opt        => 'FOR ALL INDEXED COLUMNS SIZE 
SKEWONLY',
                                       degree            => 16,
                                       cascade           => true,
                                       options           => 'GATHER AUTO');
        utl_file.put_line (output_file,'Date : ' || to_Char(sysdate,'dd/mm/yyyy hh24:mi:ss') || ' 
The total time spent to gather stats for the schema owner : ' || x.username || ' is ' || 
round((dbms_utility.get_time - l_start)/100,2) || 'seconds');
    end loop;
END;
END;
/

We just don't have the bandwidth of 5 hrs to get this done. We have 4 partitioned tables which do 
have most of the data (could be 350M). 

Can we use gather_table_stats instead of gather_schema_stats for only those partitions which have 
the LAST_ANALYZED_DATE as NULL, since our application is such which doesn't perforn any delete or 
insert activity once the End of day completes. We have Range partitition based on Date and have 
daily partitoins.

Will this help, or let us know if we can configure some of the parameter values for the 
gather_schema_stats to get the job done in a much faster way. We don't need to build histograms for 
the columns.

Thanks in anticipation 

 


Followup   June 4, 2006 - 4pm Central time zone:

The answer is as always one of two:

a) why
b) it depends

your answer is b), it depends.

Can you just do partition stats?  do your queries always hit a SINGLE partition, and is that 
partition known at hard parse time?

If your query hits more than one partition OR your query doesn't let us figure out which partition 
it will hit - then GLOBAL TABLE stats are used to optimize (not local partition statistics).

And the global table stats could be slightly "wrong" for columns if you don't maintain them over 
time.

eg: if you have 2 partitions - and each partition has 500,000 rows - it is easy enough to know 
there are 1,000,000 rows - that works OK from the local partition statistics.

however, if in those 2 partitions there it a column X and it has 10,000 distinct values in 
partition1 and 10,000 distinct values in partition2 - how many distinct values does it have 
"globally"?  somewhere between 10,000 and 20,000 - which can be the difference between "night and 
day" to the optimizer.



There is no cut and dry answer here - you might have to "help" on some columns using 
dbms_stats.set_***  to set the column stats for us if you know them (and you are gathering 
histograms there - you said you don't need to so why?) 

4 stars dbms_stats   July 5, 2006 - 4am Central time zone
Reviewer: geepee 
Dear Tom,

We have around 150 fast growing/changing transactional data tables in our production system.

Database is 10gR2. 
Growth of these tables is from 0.1 million to 1 million in six months. 
A few tables are partitioned tables.

Which way is good to analyze these fast growing tables? Dynamic sampling? or scheduling the script 
run using dbms_stats for these tables? Pl. suggest.

Regards.
 


Followup   July 8, 2006 - 7am Central time zone:

dynamic sampling will happen only at the first hard parse time (so if you never shutdown - you 
never get different plans - assuming you have used bind variables - which you must be since you 
said "we are transactional").  It would not be appropriate.

More likely would be to use dbms_stats to gather stats on STALE tables (tables are all set to 
monitoring in 10g you can use "gather stale" as the option, to avoid gathering stats on tables that 
haven't changed much). 

4 stars Table statistics and 3113 end-of-file on comm channel   July 12, 2006 - 11am Central time zone
Reviewer: Jon Waterhouse from Newfoundland, Canada
I ran into the following problem on XE, which I find difficult to explain. I had the following 
query:

select count(*),urban
from  case@caps s 
inner join res_address@caps r ON r.res_address_id = s.res_address_id
inner join ref.good_postal_code p ON p.postal_code=r.postal_code 
WHERE s.adults=1 and s.children > 0  -- is single parent
AND exists (select * from payment@caps p
           WHERE s.case_id = p.case_id
           AND p.payment_date between to_date('200605','yyyymm') and to_date('200606','yyyymm')-1)
GROUP BY urban

which worked fine. Note that there is one table on XE and the other two tables are remote. I then 
modified the query to take out the two items that restricted the query to single parents. I got a 
3113 error running an explain plan. I found that adding in some restrictions (e.g. s.adults in 
(1,2,3)) would allow the query to run -- even though this in no way restricts the query - all 
records would satisfy this. I also found that running an analyze table on ref.good_postal_code 
allowed the query to run.

Since I wasn't sure if it was possible to delete stats I then did a CTAS to dreate a new table 
exactly the same as good_postal_code but with no stats. Surprisingly the query ran no problem.

I read somewhere that resource limits can generate 3113s. Is that what is happening here? i.e. the 
explain plan process decides that the query is going to be too big and returns the 3113?  


Followup   July 12, 2006 - 5pm Central time zone:

doubtful - can you reproduce in "not XE"? 

5 stars   July 24, 2006 - 12pm Central time zone
Reviewer: ZB 
system@oip1> create or replace procedure ods_analyze
  2  is
  3  begin
  4  execute immediate 'execute DBMS_STATS.GATHER_SCHEMA_STATS(''ODS'')';
  5  end;
  6  /

Procedure created.

system@oip1> exec ods_analyze;
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-00900: invalid SQL statement
ORA-06512: at "SYSTEM.ODS_ANALYZE", line 4
ORA-06512: at line 1

Tom, why did I get the error?
 


Followup   July 24, 2006 - 2pm Central time zone:

why would you try to dynamically execute plsql - plsql that never changes???????

...
is
begin
   dbms_stats.gather_schema_stats( 'ODS' );
end;


you have plsql
you are writing plsql
just call it


execute is a "sqlplus-ism".  you can see this from your own example:

system@oip1> exec ods_analyze;
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-00900: invalid SQL statement


see - it says "BEGIN ods_analyze; END;" - exec is just shorthand for sqlplus to glue begin and end 
on it.  execute is NOT plsql - it is not part of the language at all. 

5 stars   July 24, 2006 - 3pm Central time zone
Reviewer: ZB 
thanks Tom, now i got the procedure compiled.
But got the new:
system@oip1> exec ods_analyze
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges to analyze an object in Schema
ORA-06512: at "SYS.DBMS_STATS", line 7649
ORA-06512: at "SYS.DBMS_STATS", line 7779
ORA-06512: at "SYS.DBMS_STATS", line 7830
ORA-06512: at "SYS.DBMS_STATS", line 7809
ORA-06512: at "SYSTEM.ODS_ANALYZE", line 4
ORA-06512: at line 1


system@oip1>   exec DBMS_STATS.GATHER_SCHEMA_STATS('ODS');

PL/SQL procedure successfully completed.

But i could do it directly. 


Followup   July 24, 2006 - 4pm Central time zone:

STOP USING SYSTEM

you have no reason to use system, it is "ours", stop using it.  

create your own account 

and read:

http://asktom.oracle.com/~tkyte/Misc/RolesAndProcedures.html

grant it "analyze any" directly (no roles)


or better yet....

just write this as the ODS user, schedule it as the ODS user.  


but don't use system! 

5 stars Can a table be ANALYZE during the DML operations?   August 9, 2006 - 1am Central time zone
Reviewer: Ravi from India
Hi Tom,

1. Can a table be ANALYZE during (to collect statistics) the DML (INSERT,
    UPDATE, DELETE) operations?

2. If the column in WHERE caluse has an index on it, whether the statistics are
    much concerns to use the index during DMLs?

3. Having statistics and not having statisttics will affect the use of the
    index during DMLs? how?

4. If we use (within the same database) ANALYZE for few tables and DBMS_STATS
    package to gather statistics for some other tables, will there be a performance
    issue by using these two together?


Thanks,
Ravi 


Followup   August 9, 2006 - 10am Central time zone:

1) you may use dbms_stats to gather statistics while DML is happening, yes.

2) sure.  It is all about estimated cardinality.

3) you need to have statistics to use the cost based optimizer (CBO) safely.  If you do not, you 
will not be able to use the CBO safely.

4) do not use analyze, use dbms_stats 

4 stars DBMS_STATS & ANALYZE statement Oracle10g R2   August 19, 2006 - 7am Central time zone
Reviewer: Suvendu from Bangalore, INDIA
Hi Tom,

Below is the test case which shows the wrong information with DBMS_STATS in Oracle 10g R2. It has 
the correct information in Oracle 9i R2. As per metalink, CBO will get exact information when we 
will go by COMPUTE STATISTICS where NUM_ROWS and SAMPLE_SIZE columns will point same values. And, I 
also checked out on query execution.

SQL> create table t2 as select * from t1;

Table created.

SQL> select count(1) from t2;

  COUNT(1)
----------
    735392

SQL>  exec DBMS_STATS.GATHER_TABLE_STATS(user, TABNAME => 'T2',GRANULARITY => 'DEFAULT' ,METHOD_OPT 
=> 'FOR COLUMNS');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, sample_size from user_tables where table_name='T2';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T2                                 735392       51451

SQL> REM Here is the wrong number of rows compare to 735392 and 735392.

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> select table_name, num_rows, sample_size from user_tables where table_name='T2';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T2                                 735392      735392

SQL> REM Here all values are accurate.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL>

======================
Oracle 9i R2
======================

SQL> select count(1) from t1;

  COUNT(1)
----------
  27525120

Elapsed: 00:01:07.05
SQL>  exec DBMS_STATS.GATHER_TABLE_STATS(user, TABNAME => 'T1',GRANULARITY => 'DEFAULT' ,METHOD_OPT 
=> 'FOR COLUMNS', DEGREE=>8);

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.22
SQL> select table_name, num_rows, sample_size from user_tables where table_name='T1';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T1                               27525120    27525120

Elapsed: 00:00:00.08
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE    9.2.0.6.0       Production
TNS for HPUX: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

Elapsed: 00:00:00.08
albprod@ALBBPM>


Could you please, elaborate the difference of DBMS_STATS in Oracle 10g ?

Thanking you.

Regards,
Suvendu

 


Followup   August 19, 2006 - 9am Central time zone:

that is not "wrong"

That is "correct" actually.

If you check out the docs for dbms_stats, you'll find the specification of gather table stats in 
10gr2 is:

http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#sthref8129
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2, 
   tabname          VARCHAR2, 
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type 
                                                (get_param('ESTIMATE_PERCENT')), 
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'), 
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL, 
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT  to_no_invalidate_type (
                                     get_param('NO_INVALIDATE')),
   force            BOOLEAN DEFAULT FALSE);


and that:

ops$tkyte%ORA10GR2> select dbms_stats.get_param( 'estimate_percent' ) from dual;

DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT')
-------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE



you are using the auto_sample_size, therefore, it is free to sample.


It performed as documented - but things change (and the fact that the row count is pretty much dead 
on in 10gr2....) 

3 stars Help needed   August 23, 2006 - 5am Central time zone
Reviewer: Anil P from India
Hello,

This query is taking more than 12 hrs to execute. Can you let me know which areas I need to look to 
tune.

Thanks
Anil P

SQL> select z.IND_ID,
  2         z.BLK_OPEN_DT,
  3         z.ORG_SRC_CD,
  4         z.PRM_SRC_CD,
  5         c.calendar_month_name_abbr ACC_OPEN_MTH_ABR,
  6         c.calendar_month ACC_OPEN_MTH,
  7         c.calendar_year ACC_OPEN_YEAR
  8    from mrtlookup.calendar_date c,
  9         (select ind.individual_id IND_ID,
 10                 x.BLK_OPEN_DT,
 11                 ind.original_source_cd ORG_SRC_CD,
 12                 ind.primary_source_cd PRM_SRC_CD
 13            from mrtcustomer.individual ind,
 14                 (select prp.individual_id IND_ID, min(prp.open_dt) BLK_OPEN_DT
 15                    from mrtcustomer.proprietary_account prp,
 16                         mrtcustomer.individual_segment isg,
 17                         (select (c.calendar_month_end_dt + 1) BLK_OPEN_START_DT,
 18                                 (select max(c.calendar_month_end_dt)
 19                                    from mrtlookup.calendar_date c,
 20                                         (SELECT PERIOD_DATES.WEEK_END_DT computation_dt
 21                                            FROM MRTLOOKUP.PERIOD_DATES PERIOD_DATES) e
 22                                   where c.calendar_month_end_dt <=
 23                                         e.computation_dt) END_DT
 24                            from mrtlookup.calendar_date c
 25                           where c.calendar_date =
 26                                 (select min(ind.individual_create_dt)
 27                                    from mrtcustomer.individual ind
 28                                   where ind.original_source_cd = 'TEL')) y
 29                   where y.BLK_OPEN_START_DT <= prp.open_dt
 30                     and y.END_DT >= prp.open_dt
 31                     and prp.division_ind is not null
 32                     and isg.individual_id = prp.individual_id
 33                     and isg.marketing_segment_cd = 'MAILABLE'
 34                   group by prp.individual_id) x
 35           where ind.primary_source_cd = 'GEC'
 36             and ind.individual_id = x.IND_ID) z
 37   where c.calendar_date = z.BLK_OPEN_DT;

1143156 rows selected.



Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=204854 Card=37843
          Bytes=2194894)

   1    0   HASH JOIN (Cost=204854 Card=37843 Bytes=2194894)
   2    1     TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (TABLE) (Cost=68
          Card=14640 Bytes=278160)

   3    1     NESTED LOOPS (Cost=204785 Card=37843 Bytes=1475877)
   4    3       VIEW (Cost=166771 Card=37843 Bytes=832546)
   5    4         SORT (GROUP BY) (Cost=166771 Card=37843 Bytes=200567
          9)

   6    5           NESTED LOOPS (Cost=58475 Card=37843 Bytes=2005679)
   7    6             NESTED LOOPS (Cost=27419 Card=31019 Bytes=108566
          5)

   8    7               TABLE ACCESS (BY INDEX ROWID) OF 'CALENDAR_DAT
          E' (TABLE) (Cost=2 Card=1 Bytes=16)

   9    8                 INDEX (UNIQUE SCAN) OF 'PK_CD' (INDEX (UNIQU
          E)) (Cost=1 Card=1)

  10    9                   SORT (AGGREGATE)
  11   10                     PARTITION RANGE (ALL) (Cost=107721 Card=
          1302415 Bytes=15628980)

  12   11                       TABLE ACCESS (FULL) OF 'INDIVIDUAL' (T
          ABLE) (Cost=107721 Card=1302415 Bytes=15628980)

  13    7               TABLE ACCESS (FULL) OF 'PROPRIETARY_ACCOUNT' (
          TABLE) (Cost=27417 Card=31019 Bytes=589361)

  14   13                 SORT (AGGREGATE)
  15   14                   NESTED LOOPS (Cost=78 Card=732 Bytes=11712
          )

  16   15                     TABLE ACCESS (FULL) OF 'PERIOD_DATES' (T
          ABLE) (Cost=9 Card=1 Bytes=8)

  17   15                     TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (
          TABLE) (Cost=69 Card=732 Bytes=5856)

  18    6             PARTITION RANGE (ITERATOR) (Cost=1 Card=1 Bytes=
          18)

  19   18               INDEX (UNIQUE SCAN) OF 'PK_ISEG_A' (INDEX (UNI
          QUE)) (Cost=1 Card=1 Bytes=18)

  20    3       PARTITION RANGE (ITERATOR) (Cost=2 Card=1 Bytes=17)
  21   20         TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'INDIVIDUAL'
          (TABLE) (Cost=2 Card=1 Bytes=17)

  22   21           INDEX (UNIQUE SCAN) OF 'PK_IND' (INDEX (UNIQUE)) (
          Cost=1 Card=1)






Statistics
----------------------------------------------------------
        274  recursive calls
          4  db block gets
   14526033  consistent gets
    1299590  physical reads
        232  redo size
   38571751  bytes sent via SQL*Net to client
     838832  bytes received via SQL*Net from client
      76212  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
    1143156  rows processed
 


Followup   August 27, 2006 - 3pm Central time zone:

are the ESTIMATED cardinalities in the explain plan above "realistic"  (I'd say "no" based on the 
returned rows and the estimated returned rows)

are the statistics "up to date" 

3 stars Thanks   August 29, 2006 - 9am Central time zone
Reviewer: Anil P from Chennai
Hello Tom,
Yes it's latest and realistic status. And it's a production db on Oracle 10g. Here's the latest 
explain plan output. 

This query is been generated by a cognos tool. 

Can you tell me which areas needs to be looked into?

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3257 Card=1 Bytes=
          103)

   1    0   FILTER
   2    1     SORT (GROUP BY) (Cost=3257 Card=1 Bytes=103)
   3    2       MERGE JOIN (CARTESIAN) (Cost=1344 Card=445 Bytes=45835
          )

   4    3         NESTED LOOPS (Cost=1281 Card=1 Bytes=77)
   5    4           NESTED LOOPS (Cost=1280 Card=1 Bytes=59)
   6    5             NESTED LOOPS (Cost=369 Card=455 Bytes=15925)
   7    6               TABLE ACCESS (BY INDEX ROWID) OF 'CALENDAR_DAT
          E' (TABLE) (Cost=2 Card=1 Bytes=16)

   8    7                 INDEX (UNIQUE SCAN) OF 'PK_CD' (INDEX (UNIQU
          E)) (Cost=1 Card=1)

   9    8                   SORT (AGGREGATE)
  10    9                     TABLE ACCESS (FULL) OF 'INDIVIDUAL' (TAB
          LE) (Cost=1847 Card=81250 Bytes=975000)

  11    6               TABLE ACCESS (FULL) OF 'PROPRIETARY_ACCOUNT' (
          TABLE) (Cost=367 Card=455 Bytes=8645)

  12   11                 SORT (AGGREGATE)
  13   12                   NESTED LOOPS (Cost=66 Card=740 Bytes=11840
          )

  14   13                     TABLE ACCESS (FULL) OF 'PERIOD_DATES' (T
          ABLE) (Cost=2 Card=1 Bytes=8)

  15   13                     TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (
          TABLE) (Cost=64 Card=740 Bytes=5920)

  16    5             TABLE ACCESS (BY INDEX ROWID) OF 'INDIVIDUAL' (T
          ABLE) (Cost=2 Card=1 Bytes=24)

  17   16               INDEX (UNIQUE SCAN) OF 'PK_IND' (INDEX (UNIQUE
          )) (Cost=1 Card=1)

  18    4           INDEX (UNIQUE SCAN) OF 'PK_ISEG' (INDEX (UNIQUE))
          (Cost=1 Card=1 Bytes=18)

  19    3         BUFFER (SORT) (Cost=1344 Card=14792 Bytes=384592)
  20   19           TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (TABLE) (Co
          st=63 Card=14792 Bytes=384592)


Thanks
Anil 


Followup   August 29, 2006 - 3pm Central time zone:

eh?  this is totally different from above.

anyway - I will not be able to automagically fix this 

3 stars Initial Load and statistics   September 8, 2006 - 8am Central time zone
Reviewer: A reader 
Hi Tom,

I am doing a massif initial load. During this initial load I am inserting into several tables and 
in the same time I am selecting from theses tables. This is why I kept indexes usable. In addition, 
when I am in 50% of my initial load I compute statistics using:

 DBMS_STATS.GATHER_SCHEMA_STATS (shema_name, 30)

I wonder if you will advise me to do this or not?

I want to suppress this call to DBMS_STATS but want to be sure that this will improve performance 
of my process

Thanks for your advise 


Followup   September 9, 2006 - 11am Central time zone:

depends on whether the queries you issue would benefit from this regather.

It is quite possible that you need not gather stats - but since YOU are doing the load (you know 
the stats already!!! you know the number of rows loaded, you know the average row width, you can 
guess at the number of blocks and so on) you could just call SET statistics and provide the correct 
values. 

4 stars GATHER STALE   September 15, 2006 - 2pm Central time zone
Reviewer: Neeraj Nagpal 
Hi Tom,


I have a question for you regarding the GATHER STALE option of DBMS_STATS. Here is a quick little 
background about my database setup: I have 4 very large composite-partitioned tables -- with 50 
partitions and 64 (hash) sub-partitions in each range partition. Each one of these 4 tables has on 
an average around 120 Million rows, with somewhere around 250 columns each.

Now, the data in these 50 partitions is pretty skewed, in terms of no. of records in them. And each 
one of these partitions has its own peculiar update cycle -- some states are updated more (much 
more) than others and some are inserted more than others -- Now what I want to do is that instead 
of indiscriminately issuing 
"Exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );" I want to get just the 
listing of the skewed partitions within the tables and selectively analyze these partitions which 
require them the most. Is there a system table which can give me the information about the 
particular partitions which are STALE in a table?? 


Thanks so much for your help,
Neeraj

 


Followup   September 16, 2006 - 2pm Central time zone:

look at "LIST STALE" as the option 

3 stars Please answer   September 21, 2006 - 3am Central time zone
Reviewer: Reader 
Sir,
Please clarify my doubt.

q1) Why is CUSLED_KEY_1 Index being used instead of CUSLED_PK Primary Key index? 
q2) Why its filtering ACCOUNT and AGE_PERIOD instead of using the PK?
q3) Or whatever is happening is correct? 
But when I disable CUSLED index it uses CUSLED_PK index. 


SQL> select index_name, substr(column_name,1,40) cname from all_ind_columns
  2  where table_name = 'CUSLED';

INDEX_NAME                     CNAME
------------------------------ ----------------------------------------
CUSLED_PK                      LEDGER
CUSLED_PK                      ACCOUNT
CUSLED_PK                      AGE_PERIOD
CUSLED_PK                      DOC_REF
CUSLED_KEY_1                   LEDGER
CUSLED_KEY_1                   DOC_REF
CUSLED_KEY_2                   LEDGER
CUSLED_KEY_2                   ACCOUNT_REF
CUSLED_KEY_3                   LEDGER
CUSLED_KEY_3                   CHEQUE_NO
CUSLED_ACCOUNT_FK              ACCOUNT

SQL> SELECT *
  2  FROM CUSLED
  3  WHERE CUSLED.LEDGER = 'UD'
  4  AND CUSLED.ACCOUNT = '0000070101'
  5  AND CUSLED.AGE_PERIOD = 'A510'
  6  AND CUSLED.DOC_REF = '30001013'
  7  AND CUSLED.REC_STA != 'X';

Execution Plan
----------------------------------------------------------
Plan hash value: 2871354732

--------------------------------------------------------------------------------
------------

| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)
| Time     |

--------------------------------------------------------------------------------
------------

|   0 | SELECT STATEMENT            |              |     1 |   405 |     2   (0)
| 00:00:01 |

|*  1 |  TABLE ACCESS BY INDEX ROWID| CUSLED       |     1 |   405 |     2   (0)
| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CUSLED_KEY_1 |     3 |       |     1   (0)
| 00:00:01 |

--------------------------------------------------------------------------------
------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUSLED"."ACCOUNT"='0000070101' AND "CUSLED"."AGE_PERIOD"='A510' A
ND

              "CUSLED"."REC_STA"<>'X')
   2 - access("CUSLED"."LEDGER"='UD' AND "CUSLED"."DOC_REF"='30001013')



SQL> ALTER INDEX CUSLED_KEY_1 UNUSABLE;

Index altered.

SQL> SELECT *
  2  FROM CUSLED
  3  WHERE CUSLED.LEDGER = 'UD'
  4  AND CUSLED.ACCOUNT = '0000070101'
  5  AND CUSLED.AGE_PERIOD = 'A510'
  6  AND CUSLED.DOC_REF = '30001013'
  7  AND CUSLED.REC_STA != 'X';

Execution Plan
----------------------------------------------------------
Plan hash value: 4156143435

--------------------------------------------------------------------------------
---------

| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| T
ime     |

--------------------------------------------------------------------------------
---------

|   0 | SELECT STATEMENT            |           |     1 |   405 |     2   (0)| 0
0:00:01 |

|*  1 |  TABLE ACCESS BY INDEX ROWID| CUSLED    |     1 |   405 |     2   (0)| 0
0:00:01 |

|*  2 |   INDEX UNIQUE SCAN         | CUSLED_PK |     1 |       |     1   (0)| 0
0:00:01 |

--------------------------------------------------------------------------------
---------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUSLED"."REC_STA"<>'X')
   2 - access("CUSLED"."LEDGER"='UD' AND "CUSLED"."ACCOUNT"='0000070101' AND
              "CUSLED"."AGE_PERIOD"='A510' AND "CUSLED"."DOC_REF"='30001013')


Thanks 


Followup   September 22, 2006 - 1am Central time zone:

the costs in this case where so close, that they "tied", either index was in this case equally 
appropriate. 

1 stars Still awaiting for your reply   September 22, 2006 - 12am Central time zone
Reviewer: Reader 
Still awaiting for your reply 


Followup   September 22, 2006 - 2am Central time zone:

gee, thanks for letting me know?

I was hungry last night and went out for food....   

4 stars LIST STALE PARTITIONS   September 26, 2006 - 2pm Central time zone
Reviewer: Neeraj Nagpal 
Tom,

Thanks for your answer to my question -- titled "GATHER STALE"  asked on September 15, 2006. 
Breifly searching through your website, I came across this procedure that lists the STALE TABLES -- 
changed since the last-analyzed date -- but I am more interested in finding out the STALE 
PARTITIONS rather than just the STALE TABLE. How can one display the names of the STALE PARTITIONS 
using the following script.

 declare
        l_objList  dbms_stats.objectTab;
begin
        dbms_stats.gather_schema_stats
        ( ownname        => USER,
          options        => 'LIST STALE',
          objlist        => l_objList );
        for i in 1 .. l_objList.count
        loop
                dbms_output.put_line( l_objList(i).objType );
                dbms_output.put_line( l_objList(i).objName );
        end loop;
end;





Appreciate Your Help,
Neeraj 


Followup   September 26, 2006 - 5pm Central time zone:

Ok, my help:

when you glanced at the documentation to see what else might be available in the 
dbms_stats.objectTable type - what did you see?

http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#sthref7889
I saw this:

TYPE ObjectElem IS RECORD (
  ownname     VARCHAR2(30),     -- owner
  objtype     VARCHAR2(6),      -- 'TABLE' or 'INDEX'
  objname     VARCHAR2(30),     -- table/index
  partname    VARCHAR2(30),     -- partition
  subpartname VARCHAR2(30),     -- subpartition
  confidence  NUMBER);          -- not used
type ObjectTab is TABLE of ObjectElem;

which would indicate that other attributes of use exist there :)

 

5 stars Thanks Tom   September 26, 2006 - 6pm Central time zone
Reviewer: Neeraj Nagpal 
Thanks Tom, Yes ..it was my shortsighted search on the topic!

Neeraj Nagpal 


4 stars analyze vs. dbms_stats   October 10, 2006 - 4pm Central time zone
Reviewer: Rob from New Zealand
Tom,

Very useful - Thank You.

I have seen in your examples the use of:
analyze table t compute statistics for table for all indexes for all indexed columns

but then you'll say something like
In real life, however, one would want to use DBMS_STATS instead.

Could you please supply the equivalent statement using dbms_stats ?

Thanks 


Followup   October 10, 2006 - 8pm Central time zone:

those are all old examples, many of which pre-date the existance of dbms_stats.

dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all indexed columns', cascade => true 
);

would be the "same" as that particular analyze. 

4 stars DBMS_STATS - Locking   November 2, 2006 - 7pm Central time zone
Reviewer: A reader 
Tom,

Somewhere up there in this thread, you said “certain variations of analyze can lock (to validate 
structures and such), but dbms_stats just runs queries basically." 

However while running the following command, 

        DBMS_STATS.gather_table_stats (
        ownname=> 'ADBPROD',
        tabname=> ‘OP_LAYER’,
        estimate_percent=> dbms_stats.auto_sample_size,
        partname=> ‘CALIFORNIA’,
        granularity=> 'PARTITION');

I noticed intermittently that the table 'OP_LAYER' was being locked by the Gather_Table_Stats 
command, and a few users, who wanted to acquire locks on this table, had to wait.

See the screen dump.

(Pardon the bad text formatting here …)

SQL> Select a.username, b.owner||'.'||b.name,b.type, b.mode_held,b.mode_requested 
         from v$session a, dba_ddl_locks b
         where a.sid=b.session_id
        and  b.mode_held = 'None' and b.mode_requested <> 'None'
11:29:19   5  /  

USERNAME                       B.OWNER||'.'||B.NAME
------------------------------ -------------------------------------------------------------
TYPE                                     MODE_HELD MODE_REQU
---------------------------------------- --------- ---------
ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share  &#61663; (<-Share lock requested on the 
table)

ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share &#61663; (<-Share lock requested on the 
table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share &#61663; (<-Share lock requested on the 
table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share &#61663; (<-Share lock requested on the 
table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share &#61663; (<-Share lock requested on the 
table)




11:37:24 ADBPROD@PROD SQL> Select a.username, b.owner||'.'||b.name,b.type, 
b.mode_held,b.mode_reques
ted
12:02:21   2  from v$session a, dba_ddl_locks b
12:02:21   3  where a.sid=b.session_id
12:02:21   4  and  b.mode_held = 'None' and b.mode_requested <> 'None'
12:02:21   5  /

no rows selected  <-NO LOCKS ON THE TABLE, AFTER A SHORT INTERVAL 

Could you please tell me that is it one of those “certain variations” of the DBMS_STATS which can 
lock the table being analyzed???





 


Followup   November 2, 2006 - 9pm Central time zone:

would have been useful to see the blocker and their lock mode? 

4 stars Blockers   November 3, 2006 - 1pm Central time zone
Reviewer: A reader 
Well I did check the blockers/waiters using the query below
, but got back -- no rows returned --

SELECT sid, DECODE( block, 0, 'NO', 'YES' ) BLOCKER, DECODE( request, 0, 'NO', 'YES' ) WAITER FROM 
v$lock WHERE request > 0 OR block > 0 ORDER BY block DESC;

Although I did'nt see any blockers OR waiters in the database at that time, the users were still 
getting breifly locked (like for 2-3 minutes and then released) on the table OP_LAYER. Not sure -- 
why. And I know that only process accessing the table 'OP_LAYER' was the DBMS_STATS.


 


Followup   November 4, 2006 - 12pm Central time zone:

v$sesstat
v$sessevent 

would be useful if you see it again. 

5 stars Is Commit after each table Analyze can make difference?   November 21, 2006 - 12pm Central time zone
Reviewer: Maulesh Jani from India
HI TOM, 
    We are analyzing the schema in following way :
for rec_table_name in (
        select table_name from user_tables
        minus
        select table_name from user_external_tables
    )    loop
dbms_stats.gather_table_stats('XXX',''||rec_table_name.TABLE_NAME||'',estimate_percent=>40,DEGREE=>4
,CASCADE => TRUE);
end loop;

Now according to you if I put commit after each Loop iteration , than can it reduce overhead on 
UNDO/REDO/TEMP tablespace or its nothing to relate with it.        
 


Followup   November 22, 2006 - 3pm Central time zone:

according to me, it would be a waste of time, since dbms_stats already commits.

ops$tkyte%ORA10GR2> create table t1 ( x int );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( x int );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values ( 1 );

1 row created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2> select * from t1;

         X
----------
         1



and there is no such thing as "overhead on redo/undo/temp" - I am not familar with what you might 
be thinking. 

5 stars Can you please help me on this ..   November 22, 2006 - 12am Central time zone
Reviewer: Maulesh Jani from INdia
Hi Tom ,
       I need your help on this . As I have mentioned in my above question .We have one Analyze 
Schema stats process in Production system since 3 years . Now suddenly this process fails with 
below error :
Error While Analyzing Schema tables :

declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10$"
too small
ORA-06512: at line 12

Can you please tell me on it , that during stats gathering large amount of Redo generated . I am 
not sure but I am not been able to find any conviencing document on it. And the error is typical 
snapshot too old error , how it came during Analyze table stats ?

 Parallel with this process no other process is executing .

 
Regards
Maulesh  


Followup   November 24, 2006 - 12pm Central time zone:

that has nothing to do with redo

everything to do with undo.


snapshot too old happens when you have a query (and that is what dbms_stats does, runs queries) 
that is in need of undo that was overwritten.

increase your undo retention and/or review v$undostats to ensure you are not expiring undo 
prematurely. 

5 stars Mayday   November 30, 2006 - 1pm Central time zone
Reviewer: Apurva from India
We are handling massive volumes of data in our warehouse. Most of the fact tables will be appended 
with more than 200 million rows each month, and in the steady state the facts will have close to 
2.5 billion rows.

The end-users of our tables are some analysts -- so to give these guys a low downtime (or, 
sleepless nights - speaking dysphemistically), we came up with the following strategy:

* We'll have two sets of tables
  -> Temporary tables - in which we will load each month's data; these tables will not be 
partitioned 
  -> Final tables - each month we'll append Temporary tables' content to these tables through 
partition-exchange; Final tables will we range-partitioned on Month_ID

* As indexing is a time-intensive process -- we will build all the indexes in the world on the set 
of Temporary tables and have 'including indexes' clause in the partition-exchange statement

Now, here's where I am stumped (assuming you have played cricket)
 * As my tables are pretty big in size (though their sheer size makes them ugly), and they will be 
used by analysts -- it's almost mandatory to gather statistics for these tables.   Unfortunately, 
there is no option for a clause like 'including statistics' in partition-exchange statement -- by 
which I'll gather statistics for the Temporary table and carry the stats to the Final table (it 
doesn't make much sense either). So, the only option I am left with is to dump data in the Final 
table, and analyze the entire Final table for statistics -- but with 2.5 billion rows it'll be eons 
before it gets analyzed. Or is there a way of escaping this?, which'll help me realize my sadistic 
objectives.

Thanks Tom.

 
 * We load each month's data in a Temporary table
 * We build all the indexes we want on the temporary table
 * Partition-exchange Temporary table with the Final table 


Followup   November 30, 2006 - 3pm Central time zone:

from above:

...
you can import/export/set statistics directly with dbms_stats
.....
                      ^^^


the partition stats will go, you'll just roll up the global stats you want to roll up - using your 
knowledge of the data



ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int not null,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /

Table created.

ops$tkyte%ORA10GR2> insert into t select to_date( '12-mar-2003','dd-mon-yyyy')+mod(rownum,3), 
user_id, username from all_users;

31 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x) local;

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
        31

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name =
  2  'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  10
PART2                                  11

ops$tkyte%ORA10GR2> select num_rows from user_indexes where table_name = 'T';

  NUM_ROWS
----------
        31

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_ind_partitions where index_name =
  2  'T_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  10
PART2                                  11

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table temp (dt date, x int not null, y varchar2(30) );

Table created.

ops$tkyte%ORA10GR2> insert into temp
  2  select to_date( '12-mar-2003'), user_id, username from all_users;

31 rows created.

ops$tkyte%ORA10GR2> create index temp_idx on temp(x);

Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'TEMP' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t
  2  exchange partition part1
  3  with table temp
  4  including indexes
  5  without validation
  6  /

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2          for x in (select sum(num_rows) nrows from user_tab_partitions where table_name = 'T')
  3          loop
  4                  dbms_stats.set_table_stats( user, 'T', numrows => x.nrows );
  5                  dbms_stats.set_index_stats( user, 'T_IDX', numrows => x.nrows );
  6          end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
        52

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name = 
'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  31
PART2                                  11

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select num_rows from user_indexes where table_name = 'T';

  NUM_ROWS
----------
        52

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_ind_partitions where index_name =
  2  'T_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  31
PART2                                  11
 

5 stars Thanks gazillion   December 1, 2006 - 1am Central time zone
Reviewer: Apurva 


5 stars Local vs. (Global + Local) statistics   December 6, 2006 - 6am Central time zone
Reviewer: Apurva 
Tom,

Consider this scenario:

My table has 12 partitions, with ~200 million rown in each partition. I have statistics gathered 
for each individual partition, but don't have the Global statistics for the table.

Can I expect to achieve a significant increase in query performance with Global indexes gathered? 

Or, gathering Global stats will only be too consuming without much improvement in query response 
(as I already have stats for individual partitions)?

I understand this question is a bit abstract, and that I can test it myself -- but I don't enough 
data or resources to test it.

Please share your invaluable views.
Thanks 


Followup   December 7, 2006 - 8am Central time zone:

...
Can I expect to achieve a significant increase in query performance with Global 
indexes gathered? 

....

one of three things can be expected:

a) increase in performance
b) decrease in performance (unlikely, but possible)
c) no change at all in performance

in other words - IT DEPENDS.

if your queries all include the partition key in such a way that the optimizer knows "only one 
partition will be hit", local statistics will be used to optimize that query.

global statistics may never be used (meaning C is likely)

on the other hand, if you run queries that do not use partition elimination down to a single 
partition - then global statistics will be used...


Now, are you sure you don't have SOME global stats :)


ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t
  2  select to_date( '12-mar-2003','dd-mon-yyyy')+mod(rownum,3), object_id, object_name
  3    from all_objects;

50122 rows created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'PART1', granularity 
=> 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------


ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'PART2', granularity 
=> 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------


ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'JUNK', granularity 
=> 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     50122

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name = 
'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                16707
PART1                               16707
PART2                               16708

 

5 stars Thanks Tom!   December 7, 2006 - 9am Central time zone
Reviewer: Apurva 


2 stars Off-the-topic   December 7, 2006 - 4pm Central time zone
Reviewer: Apurva 
"What gets us into trouble is not what we don't know;
it's what we know for sure that just ain't so."

-- Twain 


5 stars sys_op_countchg - Usage   January 4, 2007 - 4am Central time zone
Reviewer: Arindam Mukherjee from Kolkata, India
Sir,

Today I was reading a chapter "The Clustering Factor" written by Mr. Jonathan Lewis in his new book (Chapter - 5) and have found one function "sys_op_countchg ()". I have found this explanation rather opaque due to my poor knowledge on Oracle. Could you kindly explain its usage with your inimitable style for us?
Why and How to use this function in live or production database Oracle 9i and 10g?


Followup   January 5, 2007 - 9am Central time zone:

you do not, it is not documented, do not use it.
5 stars Last_analyzed shows the incorrect date   January 22, 2007 - 7am Central time zone
Reviewer: Maulesh from India
Hi TOM,
    Thanks for the support and sharing knowledge about Oracle. We have implemented Schedule Job for statistics gathering . For which we are using Analyze statement .
But currently we come to know that values for Last_analyzed is not geeting updated daily and it is just remain as it is . Can you please help me on this ?

Thanks

5 stars system stats need updating?   February 21, 2007 - 11am Central time zone
Reviewer: steve 
Oracle 10G
Solaris 9


Regarding the "out of the box" gathering of system statistics, I'm wondering why this query is so slow. It takes well over 1 minute to run and there's not that many connections/processes (< 500).

select p.username pu,
    s.username su,
    s.status stat,
    s.sid ssid,
    s.serial# sser,
    lpad(p.spid,7) spid,
    to_char(logon_time,'YYYY-MM-DD HH24:MI:SS') logon_time,
    substr(sa.sql_text,1,540) txt
from v$process p, v$session s, v$sqlarea sa
where  p.addr=s.paddr
and    s.username is not null
and    s.sql_address=sa.address(+)
and    s.sql_hash_value=sa.hash_value(+)
order by 1,2,7



The explain plan is:
====================

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1091464685

----------------------------------------------------------------------------------------------
| Id | Operation            | Name        | Rows | Bytes | Cost (%CPU)| Time  |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT      |          |  1 |  738 |  100  (4)| 00:00:02 |
|  1 | SORT ORDER BY        |          |  1 |  738 |  100  (4)| 00:00:02 |
|* 2 |  HASH JOIN OUTER      |          |  1 |  738 |  99  (4)| 00:00:02 |
|  3 |  NESTED LOOPS        |          |  1 |  204 |  73  (2)| 00:00:01 |
|* 4 |  HASH JOIN        |          |  1 |  191 |  49  (3)| 00:00:01 |
|* 5 |    FIXED TABLE FULL    | X$KSUSE      |  1 |  142 |  24  (0)| 00:00:01 |
|* 6 |    FIXED TABLE FULL    | X$KSUPR      |  1 |  49 |  24  (0)| 00:00:01 |
|* 7 |  FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |  1 |  13 |  24  (0)| 00:00:01 |
|* 8 |  VIEW            | GV$SQLAREA    |  100 | 53400 |  25  (4)| 00:00:01 |
|* 9 |  FILTER          |          |    |    |        |      |
| 10 |    SORT GROUP BY      |          |  100 | 72300 |  25  (4)| 00:00:01 |
| 11 |    FIXED TABLE FULL  | X$KGLOB      |  100 | 72300 |  24  (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - access("S"."KSUSESQL"="ADDRESS"(+) AND "S"."KSUSESQH"="HASH_VALUE"(+))
  4 - access("ADDR"="S"."KSUSEPRO")
  5 - filter("S"."KSUUDLNA" IS NOT NULL AND "S"."INST_ID"=:B1 AND
        BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0)
  6 - filter("INST_ID"=:B1 AND BITAND("KSSPAFLG",1)<>0)
  7 - filter("S"."KSUSEOPC"="E"."INDX")
  8 - filter("INST_ID"(+)=:B1)
  9 - filter(SUM(DECODE("KGLOBT09",0,"KGLOBT02",0))<>0)

30 rows selected.

SQL> select count(*) from X$KSUSE;

COUNT(*)
----------
    1200

SQL> select count(*) from X$KSUPR;

COUNT(*)
----------
    700

SQL> select count(*) from X$KSLED;

COUNT(*)
----------
    812


The last count(*) took a long time!
====================================

SQL> set timing on
SQL> select count(*) from X$KGLOB;

COUNT(*)
----------
  933900

Elapsed: 00:01:08.83


Followup   February 21, 2007 - 12pm Central time zone:

you might want up to date stats on the fixed tables, yes.
5 stars DBMS_STAT   February 21, 2007 - 1pm Central time zone
Reviewer: AD from UK
Hi Tom,

Some of my team members think that we should have a wrapper (package) that would call DBMS_STATS to gather statistics instead of using the DBMS_STATS directly. What is your opinion about this approach.


Regards


Followup   February 21, 2007 - 3pm Central time zone:

depends on the goal of the wrapper package

if it simply calls dbms_stats - call for call - it would not be "smart"

if it does something "intelligent" (and you have to define intelligent in your context), then perhaps - but it would not really be a wrapper package at that point, but a set of procedural logic that gathers statistics on your system in the manner you find best.
5 stars   February 25, 2007 - 8am Central time zone
Reviewer: SEan 
Tom,
i used dbms_stats.gather_database_stats as :
begin sys.DBMS_STATS.GATHER_DATABASE_STATS (); end;

and found out some of the schemas ware not gathered.

does it gather all the schemas in the database as supposed?

TIA



Followup   February 26, 2007 - 1pm Central time zone:

umm,

versions?
what schemas?

got example?
5 stars   February 27, 2007 - 10am Central time zone
Reviewer: Sean 
it's 9.2.0.7
the job completed via dbms_job and also checked no errors in alert.log. the job have a few time after Feb 25.

the schemas ware not completed:
SQL> select owner, count(*) not_c from dba_tables where last_analyzed <'24-FEB-07' group by owner;

OWNER                    NOT_C
------------------------------ ---------
BIM                      61
BIS                      102
ENI                      46
FII                      76
GL                        1
HRI                      53
ISC                      10
JTF                        6
OPI                      24
OSM                        1
POA                      36
PORTAL30_DEMO                1
SYS                      316

13 rows selected.

the follwoing t_count is the total no. of the tables owned by the schema.

SQL> select b.owner, count(*) t_count from dba_tables b where b.owner in (select a.owner from dba_ta
bles a
2 where a.last_analyzed <'24-FEB-07') group by b.owner;

OWNER                  T_COUNT
------------------------------ ---------
BIM                      261
BIS                      460
ENI                      140
FII                      278
GL                      176
HRI                      185
ISC                      37
JTF                      403
OPI                      122
OSM                      198
POA                      138
PORTAL30_DEMO                9
SYS                      350

13 rows selected.

SQL>


Followup   February 27, 2007 - 11am Central time zone:

so, lets see the JOB.
5 stars   February 27, 2007 - 1pm Central time zone
Reviewer: Sean 
those are not LOBs. not sure why dbms_stats.gather_database_stats does not pick up those.
analyze table does work for those...

SQL> select b.owner, b.object_name, b.object_type from
2 dba_objects b, dba_tables a
3 where a.owner=b.owner
4 and b.object_name=a.table_name
5 and a.last_analyzed <'24-FEB-07' and a.owner='SYS'
6 and rownum<3
7 /

OWNER  OBJECT_NAME          OBJECT_T
-------- ------------------------- --------
SYS    FILE$              TABLE
SYS    BOOTSTRAP$          TABLE

SQL> desc BOOTSTRAP$ 
ERROR:
ORA-04043: object BOOTSTRAP$ does not exist


SQL> desc sys.BOOTSTRAP$
Name                                Null?  Type
----------------------------------------------------- -------- -----------------------
LINE#                                NOT NULL NUMBER
OBJ#                                NOT NULL NUMBER
SQL_TEXT                              NOT NULL VARCHAR2(4000)

SQL> select last_analyzed from dba_tables where table_name='BOOTSTRAP$';

LAST_ANAL
---------
07-OCT-06

SQL> analyze table sys.BOOTSTRAP$ compute statistics;

Table analyzed.

SQL> select last_analyzed from dba_tables where table_name='BOOTSTRAP$';

LAST_ANAL
---------
27-FEB-07

SQL>

SQL> select job
2 , log_user
3 , priv_user
4 , schema_user
5 ,    last_date
6 , last_sec
7 ,    this_date
8 , this_sec
9 ,    next_date
10 , next_sec
11 ,    total_time
12 , broken
13 , interval
14 ,    failures
15 , what
16 from sys.dba_jobs where log_user='SYS'
17 /

    LOG    PRIV  SCHEMA
  JOB USER  USER  USER  LAST_DATE LAST_SEC THIS_DATE THIS_SEC
------- -------- -------- -------- --------- -------- --------- --------
            TOTAL
NEXT_DATE NEXT_SEC  TIME B INTERVAL        FAIL WHAT
--------- -------- ------ - -------------------- ---- --------------------
 
40704 SYS    SYS    SYS    26-FEB-07 19:33:42
04-MAR-07 19:33:42 41484 N sysdate + 6        0 begin sys.DBMS_STATS
                                    .GATHER_DATABASE_STA
                                    TS (); end;


Followup   February 27, 2007 - 2pm Central time zone:

LOBS? I said "job", just wanted to see the parameters.

where those tables EVER analyzed?
5 stars   February 27, 2007 - 2pm Central time zone
Reviewer: Sean 
use portal30_demo as example:
SQL> select table_name ,last_analyzed from dba_tables where owner='PORTAL30_DEMO';

TABLE_NAME              LAST_ANAL
------------------------------ ---------
EMP                  27-FEB-07
DEPT                  27-FEB-07
MLOG$_EMP              14-NOV-04
RUPD$_EMP
EMP_SNAPSHOT            27-FEB-07
PEOPLE_INFO$            27-FEB-07
TASK_CATEGORY$          27-FEB-07
TASK_DEVELOPER$          27-FEB-07
TASK$                27-FEB-07

there are two tables were not picked by dbms_stats.gather_database_stats.

Followup   February 27, 2007 - 3pm Central time zone:

system generated things, so - got version?

older releases didn't really want statistics on some things.
5 stars it's 9.2.0.7   February 27, 2007 - 5pm Central time zone
Reviewer: Sean 


Followup   February 28, 2007 - 3pm Central time zone:

gather database stats skips those two particular segments.
4 stars dynamic sampling   March 1, 2007 - 12pm Central time zone
Reviewer: Amit 
Hi Tom,

Thanks for sharing your vast experience and knowledge with us.

From discussions about dynamic sampling above, I gathered the following information-
1)In 9i we have to collect the stats ourself,
2)In 10g, Oracle calc the stats for us (when we first fire the query during its hard parse)

Assuming point 2 above is right, I tried a test in 10g R2
But what I have noticed is dynamic sampling is not kicked in automatically as you can see in Area 1 below ?
and other issues in area 2 and area 3

10g Rel 2
=========================================================
AREA 0)
SQL> create table temp
2 (a number,
3 b number);

Table created.

SQL> declare
2 begin
3 for i in 1..100000
4 loop
5 insert into temp values(i,i+1);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.
============================================================
AREA 1)

SQL> explain plan for select a from temp;
Explained.

SQL> select * from table( dbms_xplan.display );
PLAN_TABLE_OUTPUT
-----------------------------------------------------------
| Id | Operation      | Name |
----------------------------------
|  0 | SELECT STATEMENT |    |
|  1 | TABLE ACCESS FULL| TEMP |
----------------------------------
Note
-----
  - 'PLAN_TABLE' is old version
  - rule based optimizer used (consider using cbo)
12 rows selected.
===========================================================
So in Above it is not using dynamic_sampling automatically because it is using RBO and not CBO

But if I hint it to use dynamic sampling (as follows) then it shift to CBO and Rows are guesses nearly Right
=======================================
AREA 2)
SQL> explain plan for select /*+ dynamic_sampling (t 10) */ a from temp;

Explained.

SQL> select * from table( dbms_xplan.display );

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------

----------------------------------------------------------
| Id | Operation      | Name | Rows | Bytes | Cost |
----------------------------------------------------------
|  0 | SELECT STATEMENT |    | 83932 | 1065K|  57 |
|  1 | TABLE ACCESS FULL| TEMP | 83932 | 1065K|  57 |
----------------------------------------------------------

Note
-----
  - 'PLAN_TABLE' is old version

11 rows selected.
========================================================
Now dynamic sampling is done in previous step so it should hard parsed and plan is ready in shared pool. This same plan should be used if I fire syntactically same query again but again it is going back to RBO (as follows)
===============================================
AREA 3)
SQL> explain plan for select a from temp;

Explained.

SQL> select * from table( dbms_xplan.display );

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------
| Id | Operation      | Name |
----------------------------------
|  0 | SELECT STATEMENT |    |
|  1 | TABLE ACCESS FULL| TEMP |
----------------------------------

Note
-----
  - 'PLAN_TABLE' is old version
  - rule based optimizer used (consider using cbo)

12 rows selected.
================================================


Followup   March 2, 2007 - 12pm Central time zone:

1) you have dynamic sampling in 9i as well - it is just defaulting to a "lower level" than 10g did.



the RBO doesn't sample because the rbo would never use the information. Do not use the RBO in 10g.
5 stars Tables with stale stats ignored by dbms_stats   March 1, 2007 - 4pm Central time zone
Reviewer: A reader 
Tom,
This is 10gr2. We use the following command to gather statistics:
DBMS_STATS.GATHER_SCHEMA_STATS(ownname=>'ABC',GRANULARITY=>'ALL',CASCADE
=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS size 1',DEGREE=>DBMS_STATS.AUTO_DEGREE);

The tables are in monitoring mode. We were told by Oracle support that the above method will ignore tables with stale statistics. Here is is quote from analyst:
=============
It is because some of the objects become stale and are ignored while gathering schema statistics. It is better to use
options=>'GATHER STALE' in the procedure.
=============

I always thought that if I do not mention any option, it will gather statistics for all tables in the schema, stale or not.

Thanks


5 stars Analyze process is taking very long   March 5, 2007 - 11am Central time zone
Reviewer: Apurva 
Tom,

I am using the following piece of PL-SQL code to analyze a 20-column table which has ~650 million rows.

<SQL>
    begin
    dbms_stats.gather_table_stats(ownname => 'ops\$datamart', tabname => 'FCT_SALES', estimate_percent => 15 ,
                          granularity => 'ALL', cascade => true, force => true);
    end;
    /
</SQL>

The problem is that the analyze process is taking close to 21 hours!, although we have a very good architecture.

What would be your take on this?

Thanks in anticipation



Followup   March 5, 2007 - 2pm Central time zone:

... although we have a very good architecture. ...

well, a pretty building won't make anything run faster (sorry, I could not resist - although we have a very good architecture... don't even know what to make of that!! :)

if the table is dense, you could try block sampling (do 15% of the IO to read).

You could also introduce "parallel" into it.

even if you do not use parallel on the table itself, you could run more than one gather against the table + indexes, instead of cascade...

or if you are on 10g, you might not even need cascade if you are re-creating the indexes after a load since the indexes collect stats during a create now.


but - you really don't give us much to work with. 20 columns and 650 million rows and a "very good archicture" plus $1 will get you a cup of coffee (not at starbucks of course).

you don't mention why you gather statistics (what do you do that causes this, a truncate and reload, a load of a new partition, "time" - what)

you don't tell us about the structure of this table

you don't tell us very much........
5 stars Table hung when dbms_stats   March 8, 2007 - 2pm Central time zone
Reviewer: A reader 
Hi Tom,

We have one partitioned table with 3986259 rows, we have script scheduled (twice a week) to gather stats on this table. This table usually takes maximum 10Mins to gather stats without any issues. But yesterday this job did not finished at all, we had to kill that process after 14Hours. When this was running any query on this table was hanging, even "desc" from sqlplus hung. As soon as this process was killed all was normal. Due to this reason any application that was trying to access this table was just freezing and brought down the whole system.


DBMS_STATS.gather_table_stats ('AUSER',
                    'ATABLE',
                      estimate_percent    => 5,
                      CASCADE          => TRUE
                      );

Do you know what could be the problem?

Many Thanks




Followup   March 8, 2007 - 3pm Central time zone:

please utilize support for this one.
5 stars   March 8, 2007 - 2pm Central time zone
Reviewer: A reader 
Sorry forgot to mention that this Oracle 9.2.0.6 RAC on Sun Solaris

Thanks

3 stars   March 12, 2007 - 4am Central time zone
Reviewer: A reader 
Thanks

5 stars Thanks, but...   March 16, 2007 - 4pm Central time zone
Reviewer: Apurva 
Tom,

Thanks a lot for all your help, and sorry for following-up a bit late: the system as well the building it's housed in have a wonderful architecture :-)

* 8 CPUs
* 32 GB RAM
* Oracle 10g R2 - both client and server run on Linux
* 1071, Fifth Avenue at 89th Street

Here are the details of how we load out fact table:
* The table is partitioned by MONTH (date format)
* In every refresh we first truncate the entire table
* Then using SQL loader we populate historical as well as current month's data [and in a given month we load about 200 million rows]
* Then we build 12 local bitmap indexes on 12 non-fact columns
* Then we would gather the statistics of the table using the following procedure:

<SQL>
  begin
  dbms_stats.gather_table_stats(ownname => 'ops\$datamart', tabname => 'FCT_SALES', estimate_percent => 15 ,
                granularity => 'ALL', cascade => true, force => true);
  end;
  /
</SQL>

...but it would take close to a million years (~21 hours)

I came to you and as per your suggestion I removed 'force => true' option, but Oracle is misbehaving and is still gathering statistics for the indexes?! such that the overall and the time taken is still ~20 hours. Moreover, it's index analyzation that is eating most of the time - we found that out of 20 odd hours that the entire process takes, Oracle spends close to 18 hrs to gather stats for indexes...and quite obviously it's not a good idea as - as you remarked - in 10g index stats are automatically gathered when the index is built.

Seems it's an Oracle bug? What are your thoughts?

Thanks in anticipation




Followup   March 17, 2007 - 4pm Central time zone:

cascade is what affects indexes, oracle is not misbehaving, it is simply doing what you told it to.

cascade=>false, not force.

did you try ANY of the other things I mentioned??!!??


5 stars   March 17, 2007 - 11pm Central time zone
Reviewer: Apurva 
that's what happens when you write your mails in anticipation of St. Patrick's day...

I simply removed 'cascade=>true' (and not 'force=>true' - as I miswrote) and did not explicitly write 'cascade=>false', in the fond hope that default value for cascade parameter is 'false'

I tried your 'block sampling' suggestion and found that it did not affect analyze time substantially - index-analyzation is the killer

Tom,

Thanks *A LOT* again.



5 stars it's working!!!!!!   March 17, 2007 - 11pm Central time zone
Reviewer: Apurva 


3 stars wrapper for DBMS_STATS   March 20, 2007 - 7am Central time zone
Reviewer: Chris Gould from London, UK
I noticed recently that if you use DBMS_STATS.GATHER_TABLE_STAS with CASCADE=>TRUE then it gathers stats for the table and each index in turn. This can take a long time, especially if there are a lot of indexes on the table (eg. in a warehouse). One way to speed this up is to use DBMS_SCHEDULER to create and run jobs for each index on the table simultaneously. Scheduler and the resource manager should ensure the database is not swamped and the jobs all complete as efficiently as possible. The following code indicates the idea :
begin
  for r in (select ui.index_name
                  ,uip.partition_name
                  ,dbms_scheduler.generate_job_name('IXAZ_') as jobname
              from user_indexes ui
            inner join user_ind_partitions uip
             on (uip.index_name = ui.index_name)
            where ui.table_name = 'ASSET'
              and uip.partition_name = 'P200703'
              order by ui.index_name,uip.partition_name )
  loop
  
        execute immediate
'BEGIN
   DBMS_SCHEDULER.CREATE_JOB (
   job_name             => :jobname, 
   job_type             => :jobtype,
   job_action           => :jobaction,
   start_date           => NULL,
   repeat_interval      => NULL, 
   end_date             => NULL,
   enabled              => TRUE,
   auto_drop            => TRUE,
   comments             => :jobcomments);
END;'
using 
        r.jobname
      ,'PLSQL_BLOCK'
      ,'begin dbms_stats.gather_index_stats(
         ownname=>'''||user||'''
         ,indname=>'''||r.index_name||'''
         ,partname=>'''||r.partition_name||''''||');end;'
      ,'user='||user||',index='||r.index_name||',ptn='||r.partition_name;
      
end loop;

end;


The code above only considers the indexes for one partition on the table, but the total time to analyze all the indexes will be approx the time taken for the largest index if analyzed individually.


3 stars wrapper for DBMS_STATS - additional info   March 21, 2007 - 5am Central time zone
Reviewer: Chris Gould from London, UK
One suggestion I'd make regarding the DBMS_SCHEDULER wrapper to DBMS_STATS is not to specify a degree of parallelism > 1 when analyzing. Using a large degree of parallelism seems to confuse the resource manager - I guess because it thinks it's executing one job but this then spawns multiple sessions for the additional degrees of parallelism - and you can quickly exhaust the supply of available sessions.

3 stars Need Guidance on Partitioned tables   September 20, 2007 - 5am Central time zone
Reviewer: Vk from India
Hi Tom,

We have huge patririoned tables which are continually loaded on a daily basis based on Range 
partition, and we do maintain 30 days worth of retention. The data for the past 29 days is static 
for the 30th day and we don't want to re-compute the partition level statistics again for them as 
they have become stale.

However we drop the old partition and adds a new partition on a daily basis , what is termed as 
Rolling Window.

Now please provide your suggestions:

1. We are doing gather_table_stats in a loop for the complete schema using 

dbms_stats.gather_table_stats(ownname=> l_username,
tabname=> l_table_name,
estimate_percent  => 5,
method_opt        => l_method_opt,
degree            => 16,
cascade           => true);

Where l_method_opt is :
'for columns ' || p_col_name || ' SIZE 1'; for the partitioned tables and p_col_name is the column 
on which the data is partitioned.

However if the table is non-partitioned then 
l_method_opt is :
'for all indexed columns SIZE 1';

Now this complete gathering of stats takes around 3 hrs and for us it is quite significant.

Can you please provide us suggestions as to how we can capture global and partition level 
statistics without recomputing them all over again and again.

Your advice/help is appreciated.


Followup   September 24, 2007 - 7am Central time zone:

why would you gather stats on the partition key only? that seems a little strange, what is the thought process behind that?


If you gather local stats on only the single new partition - we'll do our best to "guess" the global stats. In many cases - we'll get it right (eg: number of total rows). In other cases, we might not - eg: number of distinct values - if you have 30 partitions and X has 10 distinct values in each partition - you either have a) 10 distinct values of X, b) 300 distinct values of X, or c) between 11 and 299 distinct values of X - the only way to truly know would be to analyze everything OR have you use dbms_stats.set_xxxxx routines to tell us.
3 stars Thought Process of only gather stats on Partitioned Column   September 24, 2007 - 1pm Central time zone
Reviewer: Vk from INDIA
Hi Tom,

Since every query which comes to us has to have the WHERE clause suppoted by the PARTITIONED COLUMN 
we thought we don't need to gather stats on any other column, since this will drive the rows 
flowing from this step.

I am pretty okay in gathering only the stats for PARTITITIONED COLUMN and not on all the columns of 
the table as is done by Gather Stats by default. One reson is the time it takes to gather stats is 
very high on nearly a 1TB of database.

I am also thinking to gather the partition level stats only for the latest partition and not 
capture global level stats if Oracle's CBO can guess it 90% correct, I am happy with it.

Thanks,
Vikas khanna


Followup   September 26, 2007 - 8pm Central time zone:

umm, that is flawed

if you only get column stats on the partition column, what would the estimated card= value be for something like:

select from t where x in (select Y from partitioned_table where key=val and SOME_COL='abc')

if we know nothing about some_col, we cannot get the card=value.

so, please - go into more detail as to why you did this? what facts did you use in coming to this conclusion?

the "partition_key=value" genearally is used for partition elimination. after that, we still - well - need statistics.
5 stars Analyze Vs DBMS_STATS   October 17, 2007 - 12am Central time zone
Reviewer: Karthick from India
for a particular table iam able to use the ANALYZE command but not DBMS_STATS.

see the test case below.

Enter user-name: sysadm/sysadm@inlabtst

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select count(1) from HX_SSN_GEN_STG_TBL;

  COUNT(1)
----------
    162439

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL');
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "SYSADM"."HX_SSN_GEN_STG_TBL", insufficient
privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13046
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1


SQL> ANALYZE TABLE HX_SSN_GEN_STG_TBL COMPUTE STATISTICS;

Table analyzed.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_NAMES_STG');

PL/SQL procedure successfully completed.

just for that one table iam not able to use DBMS_STATS for other tables i can. but i can use 
ANALYZE command on that table. any idea what could be the problem.


5 stars Found the answer   October 17, 2007 - 12am Central time zone
Reviewer: Karthick from India
Sorry for posting the pervious one tom. After some search in orafaq got the answer..

SQL>  select username, privilege from user_sys_privs;

USERNAME                       PRIVILEGE
------------------------------ ----------------------------------------
SYSADM                         UNLIMITED TABLESPACE
SYSADM                         CREATE TABLE
SYSADM                         CREATE SESSION
SYSADM                         ANALYZE ANY

SQL> GRANT SELECT ANY TABLE TO SYSADM;

Grant succeeded.

SQL> select username, privilege from user_sys_privs;

USERNAME                       PRIVILEGE
------------------------------ ----------------------------------------
SYSADM                         UNLIMITED TABLESPACE
SYSADM                         SELECT ANY TABLE
SYSADM                         CREATE TABLE
SYSADM                         CREATE SESSION
SYSADM                         ANALYZE ANY

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL');

PL/SQL procedure successfully completed.


5 stars About the first entry   February 21, 2008 - 7pm Central time zone
Reviewer: Alvin from New Zealand.
On a 10g box.

Will running an analyze table overwrite the statistics gathered by dbms_stats earlier ?

Dbms_stats ran at 10am 
analyze table ran at 11am

Do both (analyze table and DBMS_Stat) store statistics at the same location ? (hence the earlier 
question of overwritting).


Followup   February 21, 2008 - 10pm Central time zone:

statistics are stored in one place. so, sure they would.

4 stars Killind DBMS_STATS in the middle of a large table   March 16, 2008 - 10am Central time zone
Reviewer: Ricardo Masashi from Brazil
Hi, Tom,

I was wondering what happen if we kill a DBMS_STATS running job, by some reason, in the middle of 
gathering statistics of a large/huge table.

That table would contain stale, last analyzed or no stats ?

My concern is if the dbms_stats:
- Delete stats before collecting;
- Collect before overwriting;

Thanks!


Followup   March 24, 2008 - 8am Central time zone:

well, statistics are gathered and updated and gathered and updated - and then finally committed at the end - as a transaction.

So, dbms_stats works like DDL in that sense, all or nothing.
4 stars different execution plans when using ANALYZE versus DBMS_STATS   May 16, 2008 - 10am Central time zone
Reviewer: A reader 
hi tom,

i analyzed one table, one time with ANALYZE the other time with DBMS_STATS.GATHER_TABLE_STATS 
(commands below). i got different executions plans for the same query. good with analyze, bad with 
DBMS_STATS.GATHER_TABLE_STATS. when comparing all statistics i found they to be the same with the 
excpetion of the histograms having different slot sizes (see below). 


analyze table gfm.SUBCONFIG_CONFIGURABLE_OBJ_T estimate statistics
for table
for all indexes
for all indexed columns
sample 40 percent;


exec dbms_stats.gather_table_stats ( -
  'GFM', 'SUBCONFIG_CONFIGURABLE_OBJ_T', estimate_percent=>40 -
 ,method_opt=>'for all indexed columns', cascade=>true);


SQL> select * from (
  2  select 'ANALYZE' meth, a.* from system.t_analyze_hist a
  3  UNION ALL
  4  select 'DBMSSTATS', s.* from system.t_stats_hist s
  5  ) where column_name = 'SUBCONFIG_ID'
  6  order by 1;

METH       OWNER      TABLE_NAME                     COLUMN_NAME          ENDPOINT_NUMBER 
ENDPOINT_VALUE ENDPOINT_A
---------- ---------- ------------------------------ -------------------- --------------- 
----------
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                     782           
   0
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    1662           
   1
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    2546          
19371
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    3423          
19390
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    4315          
19411
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    5191          
19471
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    6075          
19490
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    6943          
19510
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    7826          
19531
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    8720          
19571
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    9636          
19591
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   10531          
19611
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   11414          
19631
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   12301          
19651
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   13184          
19671
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   14056          
19673
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   14930          
19691
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   15816          
19711
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    1526           
   0
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    3033           
   1
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    4396          
19371
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    5745          
19390
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    7119          
19411
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    8486          
19471
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    9894          
19490
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   11234          
19510
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   12627          
19531
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   13996          
19571
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   15363          
19591
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   16761          
19611
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   18149          
19631
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   19540          
19651
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   20926          
19671
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   22320          
19673
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   23722          
19691
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   25042          
19711

36 rows selected.

Q1) could the slot size difference of the histograms be the reason for the different plan?
Q2) what should i do in the future? Going back to ANALYZE?


Followup   May 19, 2008 - 3pm Central time zone:

q1) yes
q2) ummm, no. your output is hard to read - not sure how many buckets each one generated, but use something that generates the same buckets

and realize that since you have histograms, you'll get different plans for different "inputs" over time. sometimes (since you do not have a bucket PER VALUE) dbms_stats will get the better plan than analyze did - just change your "predicate" and you'll see - you have an imperfect picture of your data here.
3 stars different execution plans when using ANALYZE versus DBMS_STATS   May 20, 2008 - 10am Central time zone
Reviewer: A reader 
hi tom,

thanks for your answer. while doing further investigations on ANALYZE versus DBMS_STATS i found 
that i get the same histograms on COMPUTE statistics, but different ones on ESTIMATE SAMPLE x 
percent.

i would assume now that the internal sampling is done differently by ANALYZE than DBMS_STATS. 
correct?


Followup   May 20, 2008 - 11am Central time zone:

if you ran analyze over and over - you would discover IT generates different statistics.

they are random samples, if you randomly sample something over and over - you'll get different results....


5 stars how oracle will use the statistics after rename a table   May 30, 2008 - 9am Central time zone
Reviewer: Yuna from canada
Hi Tom,

We are going to delete tons of rows from a table. We will do the following:

1) create a temp table;
2) copy the good data  from old table (A) to the temp table;
3) create index and trigger in the temp table;
4) rename old table to A_archive table;
5) rename temp table to A old table;
6) trucate A_archive table;

Thw question is after the temp is rename to A old table ( the same name). Whether Oralce will use 
the old statistic from the old table? in that case we need to analyze the new table right way?



Followup   May 30, 2008 - 3pm Central time zone:

the statistics are tracked by object ids, not the 'name'.


ops$tkyte%ORA10GR2> create table t1 as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> create table t2 as select * from t where owner = 'SCOTT';

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name in ( 'T1', 'T2' 
);

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                  49904
T2                                     16

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table T1 rename to A_archive;

Table altered.

ops$tkyte%ORA10GR2> alter table T2 rename to T1;

Table altered.

ops$tkyte%ORA10GR2> truncate table A_archive;

Table truncated.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name in ( 'T1', 'T2', 
'A_ARCHIVE' );

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
A_ARCHIVE                           49904
T1                                     16


3 stars about the statisitcs   June 10, 2008 - 2pm Central time zone
Reviewer: yuna from canada
Thank you very much Tom!

I have the following questions.

1) After creating a table T1 and doing work for a while.

a:  Without anylyze it :  select table_name, num_rows from user_tables where table_name in = 'T1';

No rows returned. But we can see the statistics from ORACLE enterprise manager console. 

b: Anylyze it :  select table_name, num_rows from user_tables where table_name in = 'T1';

Row will be returned.

Does this mean after creating a table and never analyze that table manually, Oracle database still 
getting the related statistics for it?

2) If I want to calculate the table size using user_extents, whether I need to analyze the table in 
order to get the correct number? Whether the statisitcs by analyze or dbms_ stats is only for 
execution plan?

Thank you again for your wisdom.

Best regards, Yuna 


Followup   June 10, 2008 - 2pm Central time zone:

a: Without anylyze it : select table_name, num_rows from user_tables where
table_name in = 'T1';

No rows returned. But we can see the statistics from ORACLE enterprise manager
console.



sorry, but that is not true, that returns a row as soon as the table is 'there', you have made a mistake somewhere.



2) user extents shows you allocated space, it is always accurate about what is allocated for the table. 
4 stars still the statistics after creating that table (2)   June 11, 2008 - 9am Central time zone
Reviewer: Yuna from canada
Hi Tom, 

Please see the following scripts about creating table without statistics in user_tables before 
analyzing.


SQL S3 >create table temptest2 as select * from JCL_SCHEDULE;

SQL S3 >select count(*) from temptest2;

  COUNT(*)
----------
        10

SQL S3  >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

  NUM_ROWS
----------


 SQL S3  >analyze table temptest2 compute statistics for table for all indexed columns for all
indexes;  2

Table analyzed.

 SQL S3  >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

  NUM_ROWS
----------
        10

Thank you very much for your comments.

Best regards


Followup   June 11, 2008 - 9am Central time zone:

yes, so, this is how it is supposed to work and you got a record EACH TIME - both times you got a record.

so, what is the question?


(do not use analyze to gather statistics, use dbms_stats)
4 stars please read it again   June 11, 2008 - 10am Central time zone
Reviewer: A reader 
Please read the process again. There is no record for the first time run of :

SQL S3  >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

  NUM_ROWS
----------


Thank you!

Best regards, Yuna


Followup   June 11, 2008 - 11am Central time zone:

sure there is.

select ROWNUM, num_rows from user_tables where table_name = 'TEMPTEST2';


you cannot "see" null.


ops$tkyte%ORA9IR2> select null from dual where 1=0;

no rows selected

ops$tkyte%ORA9IR2> select null from dual where 1=1;

N
-


ops$tkyte%ORA9IR2>


see the difference.
3 stars no record ( or null record) returned   June 11, 2008 - 1pm Central time zone
Reviewer: A reader 
Hi Tom,

Thank you for the explanation. maybe my question can be satted as this way, Why before analyzing 
the table ( with 10 rows in that table), running "select NUM_ROWS from user_tables where TABLE_NAME 
='TEMPTEST2';" returns NUll ( no records) from the SQLPLUS, but the result will show in DB console. 
After analyzing the table and run the same query, it will return the real num_rows to the SQLPLUS 
and db console? 

SQL S3 >create table temptest2 as select * from JCL_SCHEDULE;

SQL S3 >select count(*) from temptest2; ( this is the real number)

  COUNT(*)
----------
        10

SQL S3  >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2'; ( NULL returned before 
analyzing)

  NUM_ROWS
----------


 SQL S3  >analyze table temptest2 compute statistics for table for all indexed columns for all
indexes;  2

Table analyzed.

 SQL S3  >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2'; ( the real number 
returned).

  NUM_ROWS
----------
        10


Thank you! 
Best regards, Yuna



Followup   June 11, 2008 - 5pm Central time zone:

(please stop using analyze)

tell me the precise steps you undertake in dbconsole from the home page in order to see what you say you see.
4 stars analize table   June 12, 2008 - 9am Central time zone
Reviewer: A reader 
Thank you very much Tom for the answer.

I am sorry that I gave you wrong info about the db console. I could not see the stats either until 
I use dbms_stats to gather the infomation (exec  dbms_stats.gather_table_stats(user, 'TEMPTEST2'). 
There was another long existing table before. I could get info from the dbconsole, but not from the 
user_tables. So I applied that for the newly created table.

the step is schema-table-and the specific table.

So I could not get the stats either from user_tables or the DB console before analyzing the newly 
created table.

Best regards, Yuna


4 stars privileges   June 12, 2008 - 11am Central time zone
Reviewer: A reader 
Hi Tom,

I'd like to know what specific privilege is used to analyze a table?

Thank you!

Best regards, Yuna


5 stars MV logs need gather stats ?   June 22, 2008 - 6pm Central time zone
Reviewer: Megala from NJ
Tom:

Does materialized view log tables need to be analyzed or gather stats (dbms_stats)? Thanks


Followup   June 22, 2008 - 9pm Central time zone:

they could, sure, they are a table, in 10g if queried without - we will dynamically sample them at parse time.

so you probably do not need to do anything special for them, we'll sample them at parse time and given how simply they are used - that is probably more than sufficient.
5 stars MV log table analyze and estimate_percent => null question   June 22, 2008 - 11pm Central time zone
Reviewer: Megala 
Thanks Tom.

In Oracle 9i,Does MV log table always needs to be analyzed with current stats because after MVs 
refreshed, MV log will be emptied anyway. ?

Will there be performance problem if MV log tables are not analyzed ?

Thanks for your insight.


2) In oracle 9.2.0.6, When estimate_percent parameter is not specified, Will Oracle gather 100% 
table stats or ?

Example:

exec dbms_stats.gather_table_stats('CUR_DBA2', tabname => 'CTBILLINGTABLE', degree => 
dbms_stats.default_degree, cascade =>
true ) ;


Followup   June 23, 2008 - 7am Central time zone:


1) probably not. In 9i, dynamic sampling is set to 1

"Sample tables that have not been analyzed if there is more than one table in the query, the table in question has not been analyzed and it has no indexes, and the optimizer determines that the query plan would be affected based on the size of this object."


Hence, if you are using the CBO and we access this table using the CBO in a multi-table query, and this mlog$ table is 'large' (allocated space is big), it'll sample it. for example:


ops$tkyte%ORA9IR2> create table t ( x int primary key, y int );

Table created.

ops$tkyte%ORA9IR2> insert into t select rownum, rownum from all_users;

35 rows created.

ops$tkyte%ORA9IR2> create materialized view log on t;

Materialized view log created.

ops$tkyte%ORA9IR2> select object_name, object_type from user_objects order by object_name;

OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------
MLOG$_T                        TABLE
RUPD$_T                        TABLE
SYS_C004615                    INDEX
T                              TABLE

ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> insert into mlog$_t
  2  select rownum, sysdate, 'x', 'x', utl_raw.cast_to_raw( rpad('x',255,'x') ) from all_objects;

30662 rows created.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2> delete from mlog$_t;

30662 rows deleted.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> show parameter optimizer_dynamic_sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     1
ops$tkyte%ORA9IR2> set autotrace traceonly explain
ops$tkyte%ORA9IR2> select * from t, mlog$_t where t.x = mlog$_t.x;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=161)
   1    0   NESTED LOOPS (Cost=3 Card=1 Bytes=161)
   2    1     TABLE ACCESS (FULL) OF 'MLOG$_T' (Cost=2 Card=1 Bytes=155)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=1 Card=1 Bytes=6)
   4    3       INDEX (UNIQUE SCAN) OF 'SYS_C004615' (UNIQUE)



ops$tkyte%ORA9IR2> alter session set optimizer_dynamic_sampling = 0;

Session altered.

ops$tkyte%ORA9IR2> select * from t, mlog$_t where t.x = mlog$_t.x;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=82 Bytes=13202)
   1    0   HASH JOIN (Cost=5 Card=82 Bytes=13202)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=35 Bytes=210)
   3    1     TABLE ACCESS (FULL) OF 'MLOG$_T' (Cost=2 Card=82 Bytes=12710)



ops$tkyte%ORA9IR2> set autotrace off



see how the plan is with and without dynamic sampling at it's default level...



2) documentation is actually quite good for that

http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003993


.. estimate_percent
 

Percentage of rows to estimate (NULL means compute) The valid range ..
5 stars estimate_percent => null   June 24, 2008 - 6pm Central time zone
Reviewer: A reader 
Tom,

I just need to clarify:

estimate_percent => null (default value) is equivalent to 
estimate_percent=> 100  in dbms_stats.gather_table_stats ?

Thanks
 


Followup   June 24, 2008 - 7pm Central time zone:

what did the documentation say?


oh wait, we already covered that?!!?!?!

<quote from above>
2) documentation is actually quite good for that

http://download.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003993


.. estimate_percent
Percentage of rows to estimate (NULL means compute) The valid range .
</quote>


NULL means compute......
5 stars 100% table analyze   June 24, 2008 - 11pm Central time zone
Reviewer: A reader from NJ
Tom:

my goal is to analyze the table 100%.
Can you suggest , Should i use default "compute" or estimate_percent => 100 for any tables ?

Is there any difference between Estimate(100) and Compute in the dbms_stats.gather_table_stats 
analyze ? 

Under what circumstance, one should use "compute" over estimate_percent => 100.  


Followup   June 25, 2008 - 8am Central time zone:

they are the same, which pleases you more? null or 100?


5 stars Lock Schema or Table Stats   July 29, 2008 - 1pm Central time zone
Reviewer: A reader 
Which data dictionary view contains the info about whether a table or a schema's statistics is 
currently locked?


Followup   August 1, 2008 - 10am Central time zone:

ops$tkyte%ORA10GR2> select stattype_locked from user_tab_statistics where table_name = 'T';

STATT
-----


ops$tkyte%ORA10GR2> exec dbms_stats.lock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select stattype_locked from user_tab_statistics where table_name = 'T';

STATT
-----
ALL


5 stars confusing stats   September 4, 2008 - 4pm Central time zone
Reviewer: a reader 
Hey Tom:

  I created a table.

SQL> select count(*) from MYBITMAP;

COUNT(*)
----------
  1270064

SQL> analyze table mybitmap estimate statistics;

Table analyzed.

Elapsed: 00:00:01.44
SQL> select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from user_tables where table_name='MYBITMAP';

NUM_ROWS  BLOCKS EMPTY_BLOCKS  PCT_FREE  PCT_USED CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
  1279153    31156      204      10              0      172


SQL> exec dbms_stats.gather_table_stats(ownname => 'JOSH', tabname => 'MYBITMAP', estimate_percent => 10);

PL/SQL procedure successfully completed.


SQL> select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from user_tables where table_name='MYBITMAP';

NUM_ROWS  BLOCKS EMPTY_BLOCKS  PCT_FREE  PCT_USED CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
  1267080    31156      204      10              0      169

So how come the number of rows are different from analyze and the dbms_stat package, and also neither of them is the same with the real number of rows in the table.


Thank you very much!




Followup   September 5, 2008 - 7am Central time zone:

because you used estimate on the analyze (you saw that right? the word "estimate")

and you used a 10% estimate with dbms_stats (estimate... and estimate)


and estimate is - a guesstimate, an approximation, a guess. Consider:

ops$tkyte%ORA10GR2> create table t as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent=> 10 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     49600

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent=> 10 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     50730



just run dbms_stats and you'll get slightly different answers time after time. It is normal, expected, anticipated even.

Do not use analyze to gather statistics on segments, the optimizer is built to use information dbms_stats gathers and dbms_stats gathers things differently than analyze does. dbms_stats has been kept up to date over the years whereas analyze isn't as far as statistics gathering goes.
5 stars confusing stats   September 4, 2008 - 5pm Central time zone
Reviewer: A reader 
Sorry, I get it now, it is because they are only estimating the stats, after I do "compute" the 
number of rows are correct. but why the average row length from package is different from analyze 
though?

SQL> analyze table mybitmap compute statistics;

Table analyzed.

Elapsed: 00:02:28.66
SQL>  select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from 
user_tables where table_name='MYBITMAP';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS   PCT_FREE   PCT_USED  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
   1270064      31156          204         10                     0         172


Followup   September 5, 2008 - 8am Central time zone:

because analyze is legacy code, not maintained and you should only use dbms_stats to gather statistics.
4 stars DBMS_STATS in parallel   October 2, 2008 - 7am Central time zone
Reviewer: A reader 
Hi Tom,

If stats are being gathered for one partition, and there is no index on table. Can oracle use degree of parallelism? if yes, what is happening in parallel?

Thanks


Followup   October 2, 2008 - 7am Central time zone:

yes it can use parallel.

The queries that dbms_stats execute can use parallel execution. That is what would be done in parallel - the sampling queries dbms_stats runs
5 stars degree of parallellism   October 4, 2008 - 9pm Central time zone
Reviewer: A reader 
Hi Tom,

Thanks.

If table is not defined with parallel clause, and degree of parallellism is not defined in dbms_stats clause. Does oracle automatically uses degree based upon spfile parameters?

Regards,

Followup   October 6, 2008 - 2pm Central time zone:

you would see what the default is - which varies by version.

check out the documentation for your release of the database.
4 stars which case both Analyze and dbms_stats do the same job(optimizer perspective ?   October 6, 2008 - 1pm Central time zone
Reviewer: Mahesh from USA
Tom,

Will following do the same thing ?

ANALYZE TABLE <tab_name> COMPUTE STATISTICS;
and
DBMS_STATS.GATHER_TABLE_STATS (
 ownname => <owner>, 
   tabname =><tab_name>, 
estimate_percent => 100 );  --> 100%

I am concerned because one of our application requires full statistics on certain IOT table. But 
few IOT tables are getting errored out(ora-600) due to bug# 4763768

So I am thinking to user dbms_stats with estimate_percent =100.

Will that be the same case as ANALYZE with COMPUTE STATISTICS?


4 stars Query Perf Slow after Analyze...   October 28, 2008 - 5pm Central time zone
Reviewer: A reader 
Hi Tom,

In our database :

Ver : 9.2.0.6
Optmiser Mode: FIRST_ROWS


Day#1: dbms_stats were used to gather stats and  query "A" is taking 10 sec to get completed.
Day #2 : dbms_job failed . and no stats gathred ..query "A" is taking 10 sec to get completed.

....
....

day #50: and no stats gathred till date after day#1..query "A" is taking 10 sec to get completed.

day #51 : Then we used analyze command (analyze table .... compute statistics) to gather stats - 
compute stats.

day#52 : we noticed a query "A" finishing in 10 sec earlier now taking 1 hr.

day#52 : Added the hint /*+ Rule */ in quer "A" and now taking 12 sec.

Question:

a) If we gather stats today on day#52, using dbms_stats will the same query will get finished in 
~12 sec?
b) CBO could not use efficiently the stats gatherred using analyze table .... compute statistics..? 
Your comments Please.

 

 


Followup   October 28, 2008 - 7pm Central time zone:

try it, you should not be using analyze, the optimizer is written to use the values dbms_stats provides - which are different than analyze will produce.


4 stars Query Perf Slow after Analyze...   October 28, 2008 - 8pm Central time zone
Reviewer: A reader 
Hi Tom

We have gathered stats using the dbms_stats ( gather auto) and it has finished gathering the stats.

There are 1000 tables in our schema. All were analyzed using "Analyze" command on day#52 as 
mentioned above.

Now after running the dbms_stats.gather_schema_stats ( gather auto) we found  the last_analyzed on 
today  only 10 tables. 

Question : 

a) Whether all the necessary stats which optimiser looks for would have been gathered.?

If not,

b) whether we need to delete statistics and then run dbms_stat again to gather the schema stats.

c) Query "A" still take 1 hr to complete.




Followup   October 28, 2008 - 9pm Central time zone:

you only gathered on 10 things. So, you can probably answer (a) yourself.

where any of the objects query A accesses in that set of 10?
4 stars Query Perf Slow after Analyze...   October 28, 2008 - 9pm Central time zone
Reviewer: A reader 
continued.....
d)If I
exe  dbms_stats.delete_schema_stats('SCHEMA_OWNER')
and then run
exe dbms_stats.gather_schema_stats with gather auto on the above schema. 
Then what stats optimiser will take after delete and before create statistics?

e)  is it safe to do d) above in Live server?

Regards.




Followup   October 29, 2008 - 7am Central time zone:

d) the ones the documentation says would be there - in short, if you delete them all, gather will gather all of them.

and I'm not sure you want to use auto, just gather_schema_stats and let as many things default as possible

and do this IN A TEST INSTANCE, IF YOU DO THIS ON PRODUCTION WITHOUT TESTING - PLEASE DON'T COME BACK HERE. You never make major changes in a live system.
4 stars Query Perf Slow after Analyze...   October 28, 2008 - 10pm Central time zone
Reviewer: A user 
...you only gathered on 10 things. So, you can probably answer (a) yourself. 

Sorry Tom

Not making the things clear earlier.


I ran 

exec dbms_stats.gather_schema_stats(ownname=>'ZZ', degree=>4, cascade=>true, options=>'gather 
auto');

....where any of the objects query A accesses in that set of 10?

NO 


Followup   October 29, 2008 - 7am Central time zone:

you said above "and last analyzed was today for only 10 things" - so apparently, you gathered statistics for 10 things, out of 1000

please stop playing in production right now - I'll not follow up anymore until you tell me "we are doing this in test now"
2 stars Query Perf Slow after Analyze...   October 29, 2008 - 12am Central time zone
Reviewer: A reader 
HI Tom.

>you only gathered on 10 things. So, you can probably answer (a) yourself. 

You are right. I used 'gather' option to collect stats on all the tables/indexes.

Now the querry "A" has stats for all the objects it accesses using dbms_stat.

But still it takes long to execute. Does it mean that querry has some problem and stats are 
correct.?


 


4 stars Query Perf Slow after Analyze...   October 29, 2008 - 2pm Central time zone
Reviewer: A Reader 
Sorry Tom,

I understand the point which you made above.

Regards.


5 stars Analyze versus DBMS_STATS   December 17, 2008 - 8am Central time zone
Reviewer: Edson from Brazil
Hello my name is Edson and I'm from Brazil.
So I have been problems using DBMS_STATS because when I´m using Analyze for 4 tables, my query get 
better 20 seconds and when I´m using DBMS_STATS my query get better just 13 seconds.
I think that I´m using DBMS_STATS wrong.

The way that I´m using is  : 
EXECUTE 
SYS.DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'USER',TABNAME=>'TABLE',GRANULARITY=>'ALL',METHOD_OPT=>'F
OR ALL COLUMNS SIZE 1', ESTIMATE_PERCENT=>NULL,CASCADE=>TRUE, DEGREE=>6);

Please Help me.


Followup   December 29, 2008 - 12pm Central time zone:

do you actually get different plans?

show us the TKPROFS with wait events included.

The difference between 13 and 20 could be "not different" for all we know. Show us the row source operations from a tkprof for the two.
3 stars Query Performance in 10g   December 31, 2008 - 2pm Central time zone
Reviewer: Suzanne from Portland OR
We use Siebel as our CRM and I've just upgraded the Oracle database from 9.2.0.7 to 10.2.0.4.  The 
database performance has been uneven but generally slower under 10g than 9i.  One of the solutions 
that Siebel asked us to implement was the removal of all of the statistics on any empty table 
(there are many in our Siebel implementation but we don't use them that is why they are empty).  
This seems to has improved our performance significantly.  This runs contrary to all I know about 
Oracle and statistics.  Why did the removal of statistics on empty tables give us a significant 
performance boost?


Followup   January 5, 2009 - 10am Central time zone:

are they really empty?

or - are they empty usually - filled by the application - and then used.

I would say "probably, they are empty when you analyze and filled by the application at run time"


If you never query them - then removing the statistics would do nothing for you.

So, you must query them... And if you query them, they probably have data (else you would always get zero rows...).

In 9i, you had no statistics on these tables - and if they were not indexed, dynamic sampling would kick in. Dynamic sampling would have told the optimizer "there are 10,000 rows in here". We'd get the right plan.

In 10g, the auto job would find these unanalyzed tables and analyze them. Putting into the dictionary the "fact" that there are 0 rows. However, at run time, there would actually be 10,000 rows in there. When we have the wrong 'card=' values - we get the wrong plan. So, because we think 0 records - we get the wrong plan.


Remove the statistics and in 10g, we'd dynamically sample at hard parse time. We would discover "hey, there are 10,000 rows". We'd get the right plan.

http://www.oracle.com/technology/oramag/oracle/09-jan/o19asktom.html

3 stars RE:Analyze versus DBMS_STATS   January 5, 2009 - 7am Central time zone
Reviewer: Edson from Brazil
Hi Tom,
My argument about Analyze cost plan is better as DBMS_STATS is wrong ... I have seen with my DB and 
the DBMS_STATS get the same cost plan.
Thank you for your answer.

And I want to tell you that the topic is so much rich and help me so much for my research.

One more time I really grateful for your help.


5 stars Oramag Article   January 17, 2009 - 1pm Central time zone
Reviewer: S from Canada
Tom,

In the Oracle Magazine article On Dynamic Sampling, under the title When the Optimizer Guesses, you 
wrote:

"After gathering statistics, you ask the optimizer to estimate how many people were born in 
December, and it would almost certainly be able to come up with a very accurate estimate of 1/12 of 
the data (assuming a normal distribution of birth dates)."

It should probably read: (assuming the uniform distribution of birth dates).

Regards


Followup   January 17, 2009 - 1pm Central time zone:

hah, you are correct, I guess if it were "normal", it would have most people in the summer and very few in the winter.

Unless you are in Australia and then it would be the opposite...

thanks and I agree.
4 stars DBMS_STATS Problem !!!!   March 9, 2009 - 6am Central time zone
Reviewer: Indranil DAS from INDIA
Hi Tom,
We are not able to find out why the below error is occurred ::
SQL> BEGIN dbms_stats.GATHER_TABLE_STATS('MEA_REPORTS','RPT_OUTER_CARTON_T', estimate_percent=> 
DBMS_STATS.AUTO_SAMPLE_SIZE, block_sample=>TRUE, method_opt=>
 'for all indexed columns size AUTO', granularity=>'ALL',cascade=>TRUE, degree=>4); END;

*
ERROR at line 1:
ORA-00979: not a GROUP BY expression
ORA-06512: at "SYS.DBMS_STATS", line 10502
ORA-06512: at "SYS.DBMS_STATS", line 10516
ORA-06512: at line 1

Tom, what wents wrong here -plz suggest.

Thanks in advance for your help.
Indranil
Oracle DBA,IBM


Followup   March 9, 2009 - 12pm Central time zone:

please utilize support/metalink for that.


I remember something vaguely relating to a function based index in early releases causing this.
5 stars Extended Statistics   May 4, 2009 - 8pm Central time zone
Reviewer: Arvind Mishra 
Hello Tom,

a) Please can you tell me how can I effectively use extended statistics.
b) If i use expression statistics then function based index is still useful? When should I use 
function based index and when should I use function based index?
c) How can I see which multicolumn statistics are gathered on a table.
d) Is this a new feature in 11g?

Thanks,

ARVIND


Followup   May 9, 2009 - 10am Central time zone:

a) by gathering statistics on expressions or by gathering statistics on correlated columns... In the same manner you would for an INDIVIDUAL column.

b) sure, if you gathered statistics on EMPNO in EMP, but did not have a primary key on EMPNO in EMP and EMP had a lot of records - would

select * from emp where empno = :x;

run any faster? Of course not :) you would certainly create an index in order to retrieve that single record quickly.

Same with an expression!!!


ops$tkyte%ORA11GR1> create table emp
  2  as
  3  select object_id empno,
  4         trunc( dbms_random.value( 1000, 4000 ) ) sal,
  5         trunc( dbms_random.value( 5000, 10000 ) ) comm,
  6             decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
  7             decode( mod(rownum,2), 0, 'Y', 'N' ) flag2
  8    from stage
  9  /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create index flag_idx on emp(flag1,flag2);

Index created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns 
size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

Now, without any extended statistics:


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag1 = 'N' )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag2 = 'N' )
 11          loop
 12                  null;
 13          end loop;
 14          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
 15          loop
 16                  dbms_output.put_line( '.'||x.plan_table_output );
 17          end loop;
 18          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag1 = 'N' and 
flag2 = 'N' )
 19          loop
 20                  null;
 21          end loop;
 22          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
 23          loop
 24                  dbms_output.put_line( '.'||x.plan_table_output );
 25          end loop;
 26  end;
 27  /
.SQL_ID  6ukxu8bf6rrbc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG1 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  34232 |00:00:00.01 |     552 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  34232 |  34232 |00:00:00.01 |     552 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter("FLAG1"='N')
.
.SQL_ID  1hwwuw5p2v7s8, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG2 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  34232 |00:00:00.01 |     552 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  34232 |  34232 |00:00:00.01 |     552 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter("FLAG2"='N')
.
.SQL_ID  00s1uatbzv2j2, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG1 = 'N' AND
.FLAG2 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |      0 |00:00:00.01 |     211 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  17116 |      0 |00:00:00.01 |     211 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(("FLAG1"='N' AND "FLAG2"='N'))
.

PL/SQL procedure successfully completed.

it gets the right estimated cardinalities for flag1=N, for flag2=N, but totally messes up flag1=N 
and flag2=N.  Add that pair:


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> exec dbms_output.put_line( dbms_stats.create_extended_stats( user, 'EMP', 
'(FLAG1,FLAG2)' ) );
SYS_STUGVNB7PTIYWAVPJX#YT77WGD

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns 
size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  emp EMP2 where flag1 = 'N' 
and flag2 = 'N' )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  769uyry623wf9, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP EMP2 WHERE FLAG1 = 'N'
.AND FLAG2 = 'N'
.
.Plan hash value: 3400700027
.
.--------------------------------------------------------------------------------------------------
.| Id  | Operation                   | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.--------------------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT            |          |      1 |        |      0 |00:00:00.01 |       2 |
.|   1 |  TABLE ACCESS BY INDEX ROWID| EMP      |      1 |      1 |      0 |00:00:00.01 |       2 |
.|*  2 |   INDEX RANGE SCAN          | FLAG_IDX |      1 |      1 |      0 |00:00:00.01 |       2 |
.--------------------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   2 - access("FLAG1"='N' AND "FLAG2"='N')
.

PL/SQL procedure successfully completed.

 cardinality is right, plan changes - all good

Now, do an expression:


ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  EMP where ln(sal+comm) > 9 )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  f03rzhdyzy1gc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE LN(SAL+COMM) > 9
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  58463 |00:00:08.42 |     794 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |   3423 |  58463 |00:00:08.42 |     794 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(LN("SAL"+"COMM")>9)
.

PL/SQL procedure successfully completed.

well, we can see that is wrong (the estimated row count) 

ops$tkyte%ORA11GR1> alter table EMP add (comp as (ln(sal+comm)));

Table altered.

ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns 
size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  EMP where ln(sal+comm) > 9 )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 
'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  f03rzhdyzy1gc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE LN(SAL+COMM) > 9
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  58463 |00:00:05.61 |     794 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  58481 |  58463 |00:00:05.61 |     794 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(LN("SAL"+"COMM")>9)
.

PL/SQL procedure successfully completed.

card = is now correct...




c) query the dictionary:

ops$tkyte%ORA11GR1> select table_name, column_name from user_tab_col_statistics where table_name = 
'EMP';

TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------
EMP                            EMPNO
EMP                            SAL
EMP                            COMM
EMP                            FLAG1
EMP                            FLAG2
EMP                            SYS_STUGVNB7PTIYWAVPJX#YT77WGD
EMP                            COMP

7 rows selected.




that sys_fadfadafds thing was dbms_output'ed above - that is flag1,flag2 - our virtual column looks like a normal one - COMP.


c)

ops$tkyte%ORA11GR1> select table_name, column_name from user_tab_col_statistics where table_name = 
'EMP';

TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------
EMP                            EMPNO
EMP                            SAL
EMP                            COMM
EMP                            FLAG1
EMP                            FLAG2
EMP                            SYS_STUGVNB7PTIYWAVPJX#YT77WGD
EMP                            COMP

7 rows selected.

ops$tkyte%ORA11GR1> SELECT extension_name, extension
  2  FROM   dba_stat_extensions
  3  WHERE  table_name = 'EMP';

EXTENSION_NAME
------------------------------
EXTENSION
-------------------------------------------------------------------------------
SYS_STUGVNB7PTIYWAVPJX#YT77WGD
("FLAG1","FLAG2")

COMP
("SAL"+"COMM")




d) yes
4 stars CHAIN_CNT affecting query plan ... ?   August 19, 2009 - 11am Central time zone
Reviewer: Gary 
Tom, 

You said in an earlier post in this thread (December 16, 2005) that "dbms_stats won't gather things 
not used by the CBO such as chain_cnt/avg_space in particular." We ran into an odd situation 
recently where removing the chain_cnt statistic seemed to result in an improved plan.

The job stream was running ANALYZE followed by DBMS_STATS, since that was the only way the 
development team knew to get a count of chained rows.

We decided on a simplified test case, since the developer had been rebuilding the table with a 
higher pctfree to eliminate the chained rows; here is what I tried:

Use DBMS_STATS.EXPORT_TABLE_STATS to preserve existing stats on two tables used by the problem 
query.
ANALYZE TABLE <name> DELETE STATISTICS ; -- for both tables
Use IMPORT_TABLE_STATS to put the exported statistics back into place

This should have had the net effect of removing only the stats gathered by ANALYZE TABLE, including 
CHAIN_CNT and AVG_SPACE. The plan changed very much for the better after this procedure was done. 
The old plan used index access to hit the two tables; the new plan full scanned them then used them 
in hash joins. The tables were only 2 MB and 14 MB in size, so the hash join approach seems like a 
no brainer chained rows or no. 

The developer seemed pretty sure before this that CHAIN_CNT was affecting the query plan ... now 
I'm not sure what to tell him. (Maybe some other stat gathered by ANALYZE TABLE is to blame?)

Thanks, 

Gary


Followup   August 24, 2009 - 5pm Central time zone:

did you export column and index statistics as well?

else you deleted A LOT of stuff.
5 stars stats   August 27, 2009 - 12pm Central time zone
Reviewer: sam 
Tom:

I want to setup a nightly job to update stats in a database.

What is the standard statement you would do. is it

dbms_stats.gather_schema_stats('MYAPP');

I noticed this did not update the stats in USER_IDNEXES and COLUMNS.


Followup   August 28, 2009 - 4pm Central time zone:

Sam/SMK

you know my answer is going to be

it depends.



You noticed something incorrectly. It does.


ops$tkyte%ORA10GR2> create table t ( x int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2> insert into t select rownum from all_users;

42 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS
- ----------
t
i
c          0

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_schema_stats( user );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS
- ----------
t         42
i         42
c          2



how cool is that, 42 :) that was a pleasant surprise.
5 stars stats   August 29, 2009 - 1pm Central time zone
Reviewer: A reader 
Tom:

1.  You are looking at num_rows column.

I was looking at LAST_ANALYZED date stamp in USER INDEXES.

The LAST_ANALYZED in USER_TABLES gets update correctly to the date you run the statement but not 
USER_INDEXES.

2, on  a second note, i was reading a book that says collecting stats may not be helpful ever day 
depending on data.

The Oracle optimizer will generate new query plan every time you collect stats so it will not reuse 
old plans.

is this correct and do you adivese to schedule it on a weekly basis?


Followup   August 29, 2009 - 7pm Central time zone:

1) sam, perhaps you should have looked again?

ops$tkyte%ORA10GR2> create table t ( x int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2> insert into t select rownum from all_users;

42 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2> select 't', num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows, last_analyzed from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*), to_date( null) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS LAST_ANAL
- ---------- ---------
t
i
c          0

ops$tkyte%ORA10GR2> exec dbms_stats.gather_schema_stats( user );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows, last_analyzed from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*), to_date( null) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS LAST_ANAL
- ---------- ---------
t         42 29-AUG-09
i         42 29-AUG-09
c          2





2) .. The Oracle optimizer will generate new query plan every time you collect stats
so it will not reuse old plans.
..

sam - you do understand that the major goal of collecting statistics is....


to cause plans to change because the size and shape of the data has changed.


so, (this is dead serious, I'm not sarcastic here at all), so do you want your plans to change like that - or not.


Only you can answer the question "how often should I analyze"

you know your data
you know how your data changes
you know how your applications use your data
you know this, if you don't, stop - and get that information



5 stars index creation on table with LOCKED stats   November 4, 2009 - 1pm Central time zone
Reviewer: psha from Ahm IND
Hi Tom,
We are using Oracle 10g R2.We have one table with LOCKED stats. Now if I create indx on it, index 
is not analyzed. We are planining to migrate to 11g, is there any way in 11g to lock table stats 
such that whenever we create any index, index gets analyzed as soon as it is creatd?

Thanks
Reader


Followup   November 11, 2009 - 9am Central time zone:

no, when you lock, you lock.

you would unlock in order to gather, so just unlock, create index, lock

ops$tkyte%ORA11GR2> create table t
  2  as
  3  select * from all_objects;

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create index t_idx on t(object_name);

Index created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec dbms_stats.lock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select 't', table_name, num_rows, last_analyzed from user_tables where 
table_name = 'T'
  2  union all
  3  select 'i', index_name, num_rows, last_analyzed from user_indexes where table_name = 'T';

' TABLE_NAME                       NUM_ROWS LAST_ANALYZED
- ------------------------------ ---------- --------------------
t T                                   71588 11-nov-2009 10:30:13
i T_IDX                               71588 11-nov-2009 10:30:13

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.unlock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> create index t_idx2 on t(owner,object_type);

Index created.

ops$tkyte%ORA11GR2> exec dbms_stats.lock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select 't', table_name, num_rows, last_analyzed from user_tables where 
table_name = 'T'
  2  union all
  3  select 'i', index_name, num_rows, last_analyzed from user_indexes where table_name = 'T';

' TABLE_NAME                       NUM_ROWS LAST_ANALYZED
- ------------------------------ ---------- --------------------
t T                                   71588 11-nov-2009 10:30:13
i T_IDX2                              71588 11-nov-2009 10:30:14
i T_IDX                               71588 11-nov-2009 10:30:13

2 stars   January 24, 2010 - 3pm Central time zone
Reviewer: A reader 
After I issue "alter table move" or "index rebuild", do I need to analyze tables or use dbms_stats 
to gather statistics?
From your book I understand that column level histograms and avg_row_len statistics are lost??


Followup   January 26, 2010 - 1am Central time zone:

from *my* book? where did you read that - so I can fix it.

ops$tkyte%ORA11GR1> create table t as select * from all_users;

Table created.

ops$tkyte%ORA11GR1> create index t_idx on t(username);

Index created.

ops$tkyte%ORA11GR1> begin
  2  dbms_stats.gather_table_stats
  3  ( user, 'T', method_opt => 'for all indexed columns size 254' );
  4  end;
  5  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select avg_row_len from user_tables where table_name = 'T';

AVG_ROW_LEN
-----------
         20

ops$tkyte%ORA11GR1> select count(*) from user_tab_histograms where table_name = 'T';

  COUNT(*)
----------
        38

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> alter table t move;

Table altered.

ops$tkyte%ORA11GR1> alter index t_idx rebuild;

Index altered.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select avg_row_len from user_tables where table_name = 'T';

AVG_ROW_LEN
-----------
         20

ops$tkyte%ORA11GR1> select count(*) from user_tab_histograms where table_name = 'T';

  COUNT(*)
----------
        38




Now, that aside, you may well have to gather statistics after a reorganization since major metrics can and will change - like number of blocks for example (just for example, the others might be affected as well)

Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement