Question and Answer

Connor McDonald

Thanks for the question, Suresh.

Asked: June 23, 2002 - 3:43 pm UTC

Last updated: November 15, 2022 - 7:06 am UTC

Version: 8.1.7

Viewed 100K+ times

You Asked

Tom,
Could you please tell me if there are any other important differences, advantages with DBMS_STATS over ANALYZE other than the points listed below.

1. DBMS_STATS can be done in parallel
2. Monitoring can be done and stale statistics can be collected for
changed rows using DBMS_STATS.

Thanks,
Suresh



and Tom said...

you can import/export/set statistics directly with dbms_stats

it is easier to automate with dbms_stats (it is procedural, analyze is just a command)

dbms_stats is the stated, preferred method of collecting statistics.

dbms_stats can analyze external tables, analyze cannot.

DBMS_STATS gathers statistics only for cost-based optimization; it does not gather other statistics. For example, the table statistics gathered by DBMS_STATS include the number of rows, number of blocks currently containing data, and average row length but not the number of chained rows, average free space, or number of unused data blocks.

dbms_stats (in 9i) can gather system stats (new)

ANALYZE calculates global statistics for partitioned tables and indexes instead
of gathering them directly. This can lead to inaccuracies for some statistics, such as the number of distinct values. DBMS_STATS does not do that.

Most importantly, in the future, ANALYZE will not collect statistics needed by
the cost-based optimizer.
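
To illustrate the export/set capability mentioned above -- a sketch only, assuming a table T and a statistics holding table named STATTAB (both names are illustrative, not from the question):

exec dbms_stats.create_stat_table( user, 'STATTAB' );
-- copy T's current statistics out into the holding table:
exec dbms_stats.export_table_stats( user, 'T', stattab => 'STATTAB' );
-- or set statistics directly, without scanning anything:
exec dbms_stats.set_table_stats( user, 'T', numrows => 1000000, numblks => 10000 );

The rows in STATTAB can then be moved to another database with exp/imp and re-applied there with dbms_stats.import_table_stats -- something ANALYZE has no equivalent for.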


Comments

Incomplete statistics?

Jayaraman Ashok, June 23, 2002 - 8:44 pm UTC

Hi Tom
You have replied that dbms_stats does not gather information on chained rows, unused data blocks and average free space available in blocks. Does it mean that to get the above mentioned information we have to run the analyze command again? Also, you mention that dbms_stats gathers statistics required by the CBO. Are there any special statistics that are gathered apart from the number of rows and number of blocks occupied? If it is only the above two, then I believe the analyze command already accomplishes it. The only feature that is of use may be the set_stats procedure for tuning queries before using them in production.

Thanks & regards,
Ashok

Tom Kyte
June 23, 2002 - 9:32 pm UTC

dbms_stats only gathers things the CBO cares about. The CBO does not care about that information.

In the future, analyze will not gather CBO info, only these other things CBO doesn't care about. DBMS_STATS currently and will only gather CBO info.

I think you need to reread the answer. dbms_stats gathers everything that the analyze command does but.... can do it in parallel, can do it for stale tables, can export/import stats, can analyze things analyze cannot (the system for example) and so on. There are lots more stats than just num rows and num blocks -- and both commands collect them. There are things dbms_stats will do that analyze does not (especially on a partitioned table).

Recommendation: use dbms_stats.

Dave, June 24, 2002 - 9:43 am UTC

When I "CREATE INDEX my_index ... COMPUTE STATISTICS" is the analyze being performed by the DBMS_STATS method, or the ANALYZE INDEX method?

If by the latter method, am I likely to gain by running DBMS_STATS post-CREATE or post-REBUILD mostly for partitioned segments (ref. your comments on gathering global statistics), with "ANALYZE ..." still being a reasonable choice at the moment for non-partitioned segments?

Tom Kyte
June 24, 2002 - 9:48 am UTC

It is closer to the analyze method -- only because it is DDL (same results from analyze, dbms_stats, create index ultimately).

But since you only create an index once, it is good only for the initial build, after that -- you are back to dbms_stats.

It is a general recommendation to use dbms_stats, period. There will come a time when dbms_stats is the only way.

external table, index and dbms_stats

Suresh, June 24, 2002 - 10:39 am UTC

Tom,
You have mentioned that DBMS_STATS can analyze external tables. I tried to create an index on an external table and it came up with the error "not supported on external table".
My question is: if dbms_stats can analyze external tables, why not allow an index on an external table as well and analyze it with dbms_stats?


Tom Kyte
June 25, 2002 - 8:00 am UTC

you cannot index something that we have no control over.

Consider if we did. say the file started with:

1,a,hello
2,b,goodbye


and we indexed that (the first column). Now, you come along and replace that file with:

2,b,goodbye
1,a,hello


and issue: select * from external_table where column1 = 1;

what do you think would happen if

a) we used an index
b) we full scanned

I think we would get two answers. We cannot index a file that is changeable (and changeable by forces outside of our control)




Queries on Analyze

Sanjay Jha, June 24, 2002 - 4:24 pm UTC

Tom,
The ongoing discussion has generated a few more related queries for me. I am listing them below for your crisp and clear answers:
1. What are the different packages for running analyze (eg. dbms_utility, dbms_stats, db_maintenance etc.) and what are the advantages of each?
2. What is the best way to analyze one particular table from the CBO's perspective? (BTW, what are all the possible ways?)
3. If someone has to automate the analyze, what do you recommend and why -- scheduling through cron or dba_jobs? (For that matter, what is your opinion on scheduling any job for the database -- OS level or through Oracle jobs?)
4. This question is not related to analyze but to the Oracle supplied packages -- where in a database can one get info about these packages (like the v$ views etc.), or please give me a url for reference.

Thanks.



Tom Kyte
June 25, 2002 - 8:39 am UTC

1) dbms_stats is the one you should use. dbms_utility has some very very primitive functions left in there for backwards compatibility only.

2) for that, please read the ADMIN guide, it covers the methods and when to use what.

You know -- if there was a "best way", it would be the only way...

3) If you have my book -- you know my opinion on this. Do it in the database.

Cron -- can cron tell if the database is up and running? cron will try to analyze even when the database is not ready. DBMS_JOB is the right tool for this in my opinion (barring that, OEM would be the right place). I want all of my database-related stuff going on in the database -- one place to look for issues (my alert log or OEM logs) and so on.

4) how about the supplied packages guide. Instead of a direct link, I'll teach you to fish.

goto otn.oracle.com
click on documentation
pick what database version you have

go after the supplied packages guide (in 8i and before, this was with the "database" set of docs. In 9i -- there is just one big list)
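
The in-database scheduling from point 3 could be sketched like this -- the schema name and the 2am interval are just examples, not from the thread:

variable n number
begin
   dbms_job.submit( :n,
      'dbms_stats.gather_schema_stats( ''SCOTT'', options => ''GATHER STALE'' );',
      sysdate,
      'trunc(sysdate)+1+2/24' );   -- re-run at 2am each day
   commit;
end;
/

Because the job lives inside the database, it simply does not fire when the instance is down -- the problem cron cannot solve.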



DBMS_STATS and statistics

Arun Gupta, May 02, 2003 - 7:10 pm UTC

Tom
I did the following:
a) Using dbms_stats, analyzed all the tables in schema X of database A and using dbms_stats.export_schema_stats, exported the statistics into stattab.
b) Extracted the table/index creation scripts for schema X, excluding the stattab table.
c) Using Oracle export (exp), exported the stattab table.
d) Using the scripts in b), I created all the tables and indexes in schema Y of database B. These tables do not contain even a single row of data.
e) Using Oracle import(imp), imported the stattab into schema Y of database B.
f) Using dbms_stats.import_schema_stats, I imported all the statistics from stattab.

I cannot see the statistics like num_rows, avg_row_len etc. in user_tables. How does the CBO read these statistics?

Thanks

Tom Kyte
May 02, 2003 - 8:11 pm UTC

dbms_stats was designed to move the stats for user A from db1 to user A in db2.

It is the X to Y that is messing you up.  This demonstrates what you are seeing, but shows how to adapt it to work anyway:



ops$tkyte@ORA920LAP> drop user a cascade;
User dropped.

ops$tkyte@ORA920LAP> drop user b cascade;
User dropped.

ops$tkyte@ORA920LAP> grant dba to a identified by a;
Grant succeeded.

ops$tkyte@ORA920LAP> grant dba to b identified by b;
Grant succeeded.

ops$tkyte@ORA920LAP> @connect a/a

a@ORA920LAP> create table t as select * from all_objects;
Table created.

a@ORA920LAP> create index t_idx on t(object_id);
Index created.

a@ORA920LAP> analyze table t compute statistics
  2  for table for all indexes for all indexed columns;
Table analyzed.

a@ORA920LAP> exec dbms_stats.create_stat_table( user, 'st' );
PL/SQL procedure successfully completed.

a@ORA920LAP> exec dbms_stats.export_schema_stats( user, 'st' );
PL/SQL procedure successfully completed.

a@ORA920LAP> !exp userid=a/a tables=st

Export: Release 9.2.0.3.0 - Production on Fri May 2 20:08:56 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
Export done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
. . exporting table                             ST         78 rows exported
Export terminated successfully without warnings.


a@ORA920LAP> @connect b/b

b@ORA920LAP> create table t as select * from all_objects where 1=0;
Table created.

b@ORA920LAP> create index t_idx on t(object_id);
Index created.

b@ORA920LAP> !imp userid=b/b fromuser=a touser=b

Import: Release 9.2.0.3.0 - Production on Fri May 2 20:08:56 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.3.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path

Warning: the objects were exported by A, not by you

import done in WE8ISO8859P1 character set and AL16UTF16 NCHAR character set
. . importing table                           "ST"         78 rows imported
Import terminated successfully without warnings.

b@ORA920LAP> exec dbms_stats.import_schema_stats( user, 'st' );

PL/SQL procedure successfully completed.

b@ORA920LAP> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
ST
T

<b>that is what you currently see but:</b>

b@ORA920LAP> update st
  2  set c5 = USER
  3  /

78 rows updated.

b@ORA920LAP> exec dbms_stats.import_schema_stats( user, 'st' );

PL/SQL procedure successfully completed.

b@ORA920LAP> select table_name, num_rows from user_tables;

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
ST
T                                   30557

b@ORA920LAP>

<b>that'll "rename the user" in the stats tab so it'll be looking for B.T, not A.T
</b>
 

It works

Arun Gupta, May 02, 2003 - 9:16 pm UTC

After updating c5 in stattab to new schema name, it works.
Thanks...!!

What about execution statistics

Arun Gupta, May 06, 2003 - 9:56 am UTC

Tom
I am a bit unclear about the purpose of importing statistics from one database to another. My questions are:

a) If I set a table with 0 rows with the statistics of a 1 million row table, can I run a query against this 0 row table and actually see the same buffer gets as I would see against a 1 million row table?
b) When the query actually hits a 1 million row table, can Oracle change the plan?

Tom Kyte
May 06, 2003 - 10:05 am UTC

a) no
b) not sure what you mean. if you mean "will the plan for a table with 900,000 rows be different then a plan for 1,000,000" -- the answer is "yes, it could be". If you mean "if the statistics say there are N rows but there are really M rows -- will Oracle change the plan after finding there are not N rows?" -- the answer is "no, once formulated -- the plan is the plan"



One use of the export/import stats trick is

a) you have a logical standby or some reporting instance that is a close copy of production (using replication, whatever)

b) you run stats on the copy. move them into production.




Problems in analyze

SUBBARAO., May 07, 2003 - 1:20 am UTC

There is much useful information on ANALYZE in this article. It is helpful. My scenario is like this, looks strange but it is.

In my production system, we have some problems getting some reports from the application.
For one report, the query has 8 tables, of which 2 have a huge number of records; those 2 tables are not analyzed while all the rest are. This report is not launching.
We have now analyzed the entire schema and the report started working. The cost of the query has drastically come down after the analyze. Some tables will be getting different kinds of data, and some tables will be getting the same type of data. I am not sure how long this report will keep working, or when I should analyze the tables. I have read about the table monitoring feature, and the documentation says that Oracle will automatically analyze a table if it requires analysis.

My questions:
1) By enabling this feature, what will be the side effects on performance? I know that if we want something we have to sacrifice a few things, but I would like to highlight these side effects to my boss.

2) If a query has 5 tables, of which 3 are not analyzed and 2 are analyzed, will the cost calculation be accurate? I mean, will the optimizer give a better plan?
Is it required that all the tables in the query be analyzed, or none of them?


Tom Kyte
May 07, 2003 - 7:32 am UTC

1) monitoring is extremely low impact, not even measurable. YOU still have to analyze; you just use dbms_stats with the gather stale option -- it'll collect stats only on the tables that need it.

2) of course the costing won't be accurate. you should have either all or none of the objects referred to by a query analyzed.

Excellent and Timely discussion....

A reader, May 07, 2003 - 10:22 am UTC

Hi Tom:

As usual, your answers allow us (DBAs) to have a good night's sleep.

On the issue of monitoring...

What criteria does Oracle use to determine that the stats are stale?
Can these thresholds be changed/set by the user?
Are the stats in user_tab_modifications cumulative?
If I need to preserve the stats in the above view, can you please suggest a procedure/script for saving them? I am interested in preserving the amount of DML activity by segment.

Thanks
Mohsin

Tom Kyte
May 07, 2003 - 10:46 am UTC

o when about 10% of the table is changed

o no

o yes

o you would have to "insert into some_other_table select sysdate, a.* from the modifications table A" before gathering stats. Make that part of your 'process'.
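
That "save before gathering" step might look like this sketch -- the history table name is hypothetical:

-- one-time setup: an empty copy of the view plus a snapshot timestamp
create table mods_history
as
select sysdate snap_time, a.*
  from user_tab_modifications a
 where 1=0;

-- then, as part of your process, just before each stats gathering pass:
insert into mods_history
select sysdate, a.*
  from user_tab_modifications a;
commit;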

Excessive Recursive calls after stats

A reader, May 07, 2003 - 11:53 am UTC

Hi Tom:

I have observed that the statistics section of the SQL*Plus autotrace report shows an excessive recursive call count for queries against tables just after those tables have been analyzed.

What does this mean, and what are the side effects of stats gathering? Has the stats gathering invalidated the parsed SQL in the shared pool?

Would this be a good way of implementing my own thresholds:
I'd turn on monitoring for the segments, collect the DML activity from user_tab_modifications into my own table, and then have my procedure re-compute the stats after, say, x% of change to the segment, rather than using the gather stale option of dbms_stats.
What is the timestamp column for? If the values are cumulative, then how does it determine the incremental change since the last stale-stats run?

Thanks.

Tom Kyte
May 07, 2003 - 1:45 pm UTC

hows about an example ?

Here it is.

A reader, May 07, 2003 - 2:42 pm UTC

09:11:53 SQL> connect scott/tiger
Connected.
09:16:14 SQL> 
09:16:14 SQL> 
09:16:14 SQL> analyze table junk compute statistics for table for all indexed columns for all indexes;

Table analyzed.

09:17:06 SQL> show user
USER is "SCOTT"
14:36:01 SQL> set autotrace traceonly explain statistics;
14:36:21 SQL> 
14:36:23 SQL> select * from junk where owner='MMJ';

1 row selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=86)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'JUNK' (Cost=4 Card=1 Byt
          es=86)

   2    1     INDEX (RANGE SCAN) OF 'IDX_JUNK_OWNER' (NON-UNIQUE) (Cos
          t=3 Card=1)





Statistics
----------------------------------------------------------
        330  recursive calls
          0  db block gets
         73  consistent gets
          3  physical reads
          0  redo size
       1114  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

14:36:37 SQL> /

1 row selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=86)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'JUNK' (Cost=4 Card=1 Byt
          es=86)

   2    1     INDEX (RANGE SCAN) OF 'IDX_JUNK_OWNER' (NON-UNIQUE) (Cos
          t=3 Card=1)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1114  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

14:36:58 SQL>  

Tom Kyte
May 08, 2003 - 9:17 am UTC

analyzing the table will invalidate open cursors on that object in the shared pool.  the next parse against that object will be "hard".  hard does lots more recursive sql than "soft"

It is not the analyze so much as "you did something to cause hard parsing to have to take place"

here is an example using the rbo and EMP:


ops$tkyte@ORA817DEV> select * from emp;

14 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'EMP'




Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1979  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         14  rows processed

ops$tkyte@ORA817DEV> alter table emp add x number;

Table altered.

ops$tkyte@ORA817DEV> select * from emp;

14 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   TABLE ACCESS (FULL) OF 'EMP'




Statistics
----------------------------------------------------------
        148  recursive calls
         15  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
       2042  bytes sent via SQL*Net to client
        425  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         14  rows processed


Of course, SHUTDOWN/STARTUP would have the same effect.

normal, natural, expected and one time -- you hard parse once and soft parse after that. 

Some more info....

A reader, May 07, 2003 - 3:32 pm UTC

Hi Tom.

Here is some more info that might help you.

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production


The junk table is a mirror of dba_objects view.

And there was no activity in the d/b between the time 9:16 and 14:36 in my sql+ session results I posted earlier.


Please let me know if you need more info.

Thanks
Mohsin

effect of partial analyze

kumar, May 08, 2003 - 7:26 am UTC

Tom,

You have mentioned in one of your replies that we should not analyze only some of the objects used in a query and leave the rest un-analyzed. I have read similar statements in lots of places, but I could not really understand why. Can you please explain why it is so dangerous to do so?

Tom Kyte
May 08, 2003 - 9:57 am UTC

analogy time.

You are a manager now.

I give you a team of 4 people.

I tell you all about 2 of them -- their capabilities, experience, level of comprehension, lots of stuff.

I tell you NOTHING about the other two.

I give you a problem to solve using these 4 people. You need to make assignments based on your knowledge of them. You have good knowledge of 2, no knowledge of the other 2. You look at them and size them up. Maybe you don't like the way the 4th guy dressed -- he "looks dumb" to you.

You dole out responsibilities -- not giving much hard stuff to #4. Unfortunately for you he was the smartest in the group and if he had been given 99% of the work -- it would have been a success. As it was, due to inefficient resource allocation -- you never finish your project!


So it is with statistics. You give the optimizer only a little information, it has to "make up the rest" and can come to some very bad decisions as to the correct approach.

Hats off to you...

A reader, May 08, 2003 - 11:20 am UTC

Hi Tom:

Excellent analogy about the CBO....

I will confirm your point, but I am positive nothing was happening on my d/b to have caused invalidation of my cursors resulting in hard parse. I conducted the test on my laptop, so I know what was happening.

Could the Oracle version have made the difference in this case?

I was using Oracle 9i Rel 2, and from your SQL+ prompt it seems you were using 8i.

Thanks
Mohsin

Ahhh the language barrier strikes again...

A reader, May 08, 2003 - 11:27 am UTC

Sorry Tom:

I read and re-read your response. I get it.

Once in a while in between Swahili, Gujarati, Katchi, Urdu, Hindi and a sprinkling of Arabic I forget my English...

As they say in Swahili - Hakuna Matata (No Problem)

Kwaheri.

Thanks again.


Analogy worth more than 5 stars!!!!

A reader, May 08, 2003 - 11:53 am UTC


Both Analyze and DBMS_STATS?

Dan Jatnieks, May 16, 2003 - 7:41 pm UTC

In your original response you stated there are some statistics that DBMS_STATS does not collect:

"...but not the number of chained rows, average free
space, or number of unused data blocks"

I would like to switch to using DBMS_STATS for CBO statistics on my 9i database, but I also want to collect the statistics that DBMS_STATS does not; I have monitoring software that collects and uses chained rows, avg free space and unused data blocks.

Can I use Analyze first and then DBMS_STATS (in that order) to accomplish this?

The reasoning would be that first using Analyze will collect the data that DBMS_STATS does not; then DBMS_STATS will overwrite the CBO statistics that Analyze collected, but leave alone the other Analyze data.

Will that work?


Tom Kyte
May 17, 2003 - 10:02 am UTC

it would work, but it would be like using two cars to go to work -- it would be double the work in most cases (partitioned tables being the exception).

Can the monitoring software use "list chained rows" instead, and can it use dbms_space to get the other info?
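
A sketch of those two alternatives -- the table name T is illustrative, and LIST CHAINED ROWS assumes the CHAINED_ROWS table created by utlchain.sql already exists:

analyze table t list chained rows into chained_rows;

declare
    l_total_blocks   number;  l_total_bytes  number;
    l_unused_blocks  number;  l_unused_bytes number;
    l_fid number; l_bid number; l_last number;
begin
    -- free/unused space without touching the CBO statistics:
    dbms_space.unused_space( user, 'T', 'TABLE',
                             l_total_blocks, l_total_bytes,
                             l_unused_blocks, l_unused_bytes,
                             l_fid, l_bid, l_last );
    dbms_output.put_line( 'unused blocks = ' || l_unused_blocks );
end;
/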

Statistics for Partitioned tables

Nathan, February 05, 2004 - 10:08 am UTC

Dear Tom,
Your answers have been comprehensive and helpful all the time; just another quick one (hopefully) from me.

When we set the estimate_percent parameter to 0.00001 in DBMS_STATS.GATHER_TABLE_STATS, does it mean only that percent of the blocks will be scanned for analysis? Our gather stats with estimate takes hours to complete. My senior DBA is skeptical of the percentage option and reckons it could be a bug; he concluded this by watching V$SESSION_LONGOPS (the same number of blocks were scanned). I have not checked it yet, but apparently the percentage does work on non-partitioned tables. Is this true? Is there a bug in the Oracle 9i Release 1 software?

Thanks and Regards
Vish

Tom Kyte
February 05, 2004 - 7:18 pm UTC

hows about you share the details -- such as the entire, full command you are using plus the type of table you are doing this to and some facts about the table itself.

0.00001 would be a "not so good number" to use, pretty darn small it would be.

Analyze and DBMS_STATS

chandu singh, February 06, 2004 - 12:28 am UTC

Hi
I think it is best to have a minimum percentage of 10%. Is that right? If not, please explain.

Tom Kyte
February 07, 2004 - 12:10 pm UTC

why -- whats the reasoning (not that you are wrong, but -- why do you think that :)

it could be 1%
it could be 10%
it could be 100%

it just depends on "how evenly distributed the data is in general". If the data is large-ish AND pretty much evenly distributed, or you are not computing histograms -- 1% might be dandy.

Me, I personally like to compute whenever possible. Only when it gets really too large do I estimate -- and then estimate as high as you can (realizing, of course, that anything over 49% is really compute in disguise).


DBMS_STATS.AUTO_SAMPLE_SIZE Issues

RyuO, February 06, 2004 - 12:14 pm UTC

My experience was that DBMS_STATS is wonderful, but requires a lot of wrapping. Here is the wrapper proc I evolved:

CREATE OR REPLACE PROCEDURE analyze_schema(
    p_schema_name  IN VARCHAR2    := USER,
    p_Estimate_Pct IN PLS_INTEGER := NULL) IS
-- like DBMS_UTILITY.ANALYZE_SCHEMA, but using DBMS_STATS
    c_proc_name        CONSTANT VARCHAR2(128) := UPPER('analyze_schema');
    c_Def_Estimate_Pct CONSTANT PLS_INTEGER   := 35;
    c_Min_Degree       CONSTANT PLS_INTEGER   := 1;
    c_Def_Degree       CONSTANT PLS_INTEGER   := LEAST(c_Min_Degree, DBMS_STATS.DEFAULT_DEGREE);
    -- default degree is somehow 32767 at the moment
    c_Def_Options      CONSTANT VARCHAR2(30)  := 'GATHER AUTO';
    -- 'GATHER' means analyze all objects.
    -- 'GATHER AUTO' means only objects that have changed.
    v_Estimate_Pct     PLS_INTEGER;
    -- v_Estimate_Pct can also be set to:
    --   DBMS_STATS.AUTO_SAMPLE_SIZE (Oracle sets this, but it has been known to be 0...)
    --   NULL (which means, look at everything, e.g., 'COMPUTE')
    v_schema_name      ALL_TABLES.OWNER%TYPE := NVL(UPPER(p_schema_name), USER);
    v_Msg_Text         VARCHAR2(2000);
BEGIN
    -- a searched CASE is used here because "CASE p_Estimate_Pct WHEN NULL"
    -- never matches (NULL is never equal to NULL); NULL needs IS NULL
    CASE
        WHEN p_Estimate_Pct IS NULL OR p_Estimate_Pct = 100 THEN
            v_Estimate_Pct := 100;
        WHEN DBMS_STATS.AUTO_SAMPLE_SIZE = 0 THEN
            IF p_Estimate_Pct BETWEEN 1 AND 99 THEN
                v_Estimate_Pct := p_Estimate_Pct;
            ELSE
                v_Estimate_Pct := c_Def_Estimate_Pct;
            END IF;
        ELSE
            v_Estimate_Pct := DBMS_STATS.AUTO_SAMPLE_SIZE;
    END CASE;
    DBMS_STATS.GATHER_SCHEMA_STATS(
        ownname          => v_schema_name,
        estimate_percent => v_Estimate_Pct,
        cascade          => TRUE,     -- "cascade" means analyze indexes too
        degree           => c_Def_Degree,
        options          => c_Def_Options);
    v_Msg_Text := c_proc_name||'-> Schema['||v_schema_name||'], Option['||c_Def_Options
                  ||'], Sampling['||v_Estimate_Pct||'%]';
    DBMS_OUTPUT.PUT_LINE(v_Msg_Text);
EXCEPTION
    WHEN OTHERS THEN
        v_Msg_Text := c_proc_name||'-> EXCEPTION: '||SQLERRM;
        RAISE_APPLICATION_ERROR(-20199, v_Msg_Text);
END analyze_schema;

My questions are related to AUTO_SAMPLE_SIZE:
1. How come it is always 0? That's what I always see, generally in 9.2.0.3 and 9.2.0.4 on Solaris.
2. What should the default estimate percentage be, if DBMS_STATS can't tell me? Anecdotal evidence suggests that the ANALYZE/ESTIMATE percentage should be 35-40%, but then ANALYZE and DBMS_STATS are different...

Tom Kyte
February 07, 2004 - 1:34 pm UTC

requires....

OR "permits".

it requires NO WRAPPING.

It quite handily permits it easily.


you'll have to provide me a reproducible, small, simple, yet 100% complete test case to demonstrate what you mean by "0"

ops$tkyte@ORA9IR2>  exec dbms_stats.gather_table_stats( user,'BIG_TABLE',estimate_percent => dbms_stats.auto_sample_size );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select sample_size from user_tables where table_name = 'BIG_TABLE';
 
SAMPLE_SIZE
-----------
     100000


(which also shows "does not require", but would "permit" wrapping.... ) 

understanding sample_percent

A reader, February 25, 2004 - 3:55 pm UTC

Oracle 9.2.0.2
I observed that when estimate_percent is specified, gather_schema_stats still does a compute statistics: sample_size = num_rows. Am I incorrect in concluding that it is ignoring the 25% and estimating 100% of the rows? If so, why?
Or have I overlooked some setting?

Please look at the following Case :
------- # of rows from T --------
SQL> select count(*) from T
  COUNT(*)
  --------
    600980

---- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES     600980      600980 15-39-16

------ inserting data ----------
SQL> insert into T select * from T

600980 rows created.

SQL> commit;
Commit complete.

--------- row count from T ---------
SQL> select count(*) from T;
  COUNT(*)
----------
   1201960

-------- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES     600980      600980 15-39-16

---------- gather_schema_stats ---------
begin
  dbms_stats.gather_schema_stats(
       ownname          => 'TEST',
       estimate_percent => 25,
       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
       degree           => 2,
       cascade          => FALSE,
       options          => 'GATHER AUTO'
       );
  end;
/ 
PL/SQL procedure successfully completed.

-------- Querying the dba_tables ---------
SQL> select table_name, monitoring, num_rows, sample_size, to_char(last_analyzed, 'HH24-mi-ss') ANALYZED from dba_tables;

TABLE_NAME            MON   NUM_ROWS SAMPLE_SIZE ANALYZED
--------------------- --- ---------- ----------- --------
T                     YES    1201960     1201960 15-48-45

----------------------------------- 

Tom Kyte
February 25, 2004 - 7:13 pm UTC

straight from the docs here:


GATHER AUTO: Gathers all necessary statistics automatically. Oracle implicitly determines which objects need new statistics, and determines how to gather those statistics. When GATHER AUTO is specified, the only additional valid parameters are ownname, stattab, statid, objlist and statown; all other parameter settings are ignored. Returns a list of processed objects.







got it ... but

A reader, February 26, 2004 - 3:38 pm UTC

Tom, I understand it better now.
I took out the invalid parameters.
But my question still lingers :
Why is gather_schema_stats analyzing 100% of the rows? I changed only 10% (50K rows) out of 450K rows.
My dbms_job wakes up every few days to execute gather_schema_stats. Now, if I have 10-15 tables which have changed by more than 10%, then analyzing these tables 100% (compute) will take a lot of time.
I know Oracle automatically calculates what percentage to analyze. But why is it analyzing 100% of the rows every time? Why is it not estimating?

=========
begin
dbms_stats.gather_schema_stats(
ownname => 'TEST',
options => 'GATHER AUTO'
);
end;
/
==============



Tom Kyte
February 26, 2004 - 4:24 pm UTC

it does not take that long to analyze 1,000,000 rows.

gather stale does not re-analyze just "changed rows"
gather stale re-analyzes the entire object.


If you don't want AUTO, don't use AUTO, use gather stale and fill in the remaining blanks (but for such small tables, i would get it all)
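For instance, to re-gather only the stale objects with an estimate and in parallel, the call might look like this (a sketch -- the schema name, percentage and degree are placeholders to adjust):

```sql
begin
  dbms_stats.gather_schema_stats(
       ownname          => 'TEST',          -- placeholder schema
       options          => 'GATHER STALE',  -- only monitored objects marked stale
       estimate_percent => 25,              -- sample instead of computing on all rows
       degree           => 2,               -- parallel degree
       cascade          => true );          -- gather index statistics as well
end;
/
```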

hmmm

A reader, February 27, 2004 - 2:52 pm UTC

Hi Tom,
the thing I am fighting is that I like to use "stale" because I can specify the percentage to analyze and take advantage of the degree. AUTO is good because it automatically analyzes newly added tables.

Best of both worlds would be :
1) analyze newly added tables -> AUTO does this. STALE doesn't.

2) analyze only a percentage of rows, not 100% -> AUTO is analyzing 100%. STALE can do a percentage.

Is it possible ?

Thanks as always

Tom Kyte
February 27, 2004 - 3:07 pm UTC

sounds like you might want:

a) alter schema tab monitoring
b) gather empty
c) gather stale

for your degree of control.  consider:




ops$tkyte@ORA920PC> create table t1 as select * from all_users;
 
Table created.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec dbms_stats.alter_schema_tab_monitoring
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather empty' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45

<b>that picked it up, this</b>

 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45

<b>did nothing..</b>

 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> create table t2 as select * from all_users;
 
Table created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
45 rows created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
90 rows created.
 
ops$tkyte@ORA920PC> insert into t1 select * from t1;
 
180 rows created.
 
ops$tkyte@ORA920PC> commit;
 
Commit complete.
 
ops$tkyte@ORA920PC>
ops$tkyte@ORA920PC> exec dbms_stats.alter_schema_tab_monitoring
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45
T2
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather empty' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                     45
T2                                     45

<b>that got t2 and...</b>
 
ops$tkyte@ORA920PC> exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA920PC> select table_name, num_rows from user_tables;
 
TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                    360
T2                                     45
 
ops$tkyte@ORA920PC>

<b>that got t1...</b>

 

perfecto

A reader, February 27, 2004 - 10:56 pm UTC

pretty GOOD !
Thank you much.
Have a good weekend.

gather_schema_stats('SYS');

A reader, April 27, 2004 - 9:46 am UTC

Tom,

Please let us know, when we need to do the following

execute dbms_stats.gather_schema_stats('SYS');

at what point of time, do you recommend us executing the above sql


Thanks a lot

Tom Kyte
April 28, 2004 - 12:40 pm UTC

in your test system before going production, to ensure you are not going to mess up an existing system.


in 10g, it'll be done for you out of the box.

in 9i, it is "ok and safe in most cases to do" -- but TEST it, don't just DO IT

in 8i and before, I would not do it, not necessary and probably would lead only to troubles. but again, you would TEST it of course.

Confusing situation

Bruno Di Rei Araujo, May 14, 2004 - 2:21 pm UTC

Tom,

I've read your advice about using DBMS_STATS instead of the ANALYZE statement. But I had an experience that confused me: with ANALYZE, the cost was higher but the response time was the best; with DBMS_STATS, the costs were tiny but the response time was 12 times longer (comparing TKPROF results). Is there any reason for that situation?

(TKPROF of ANALYZEd query)
********************************************************************************

SELECT a.seq_mov, a.empresa, a.filial, a.LOCAL, a.produto, a.qtde_mov,
a.vlr_tot_mov, a.num_docto, a.cod_oper, a.dt_mov, a.historico,
a.lote_cont, a.lcto_direto, a.disp_trigger, a.dt_registro,
a.qtde_val,
b.entrada_saida
FROM ce_movest a, ce_operacoes b
WHERE (:b5 IS NULL OR a.empresa = :b5)
AND (:b4 IS NULL OR a.filial = :b4)
AND (:b3 IS NULL OR a.produto = :b3)
AND (:b2 IS NULL OR a.dt_mov >= :b2)
AND (:b1 IS NULL OR a.dt_mov <= (TO_DATE (:b1) + .99999))
AND b.empresa = a.empresa
AND b.cod_oper = a.cod_oper
order by a.empresa, a.filial, a.produto, a.dt_mov, b.ajuste_estoque,
b.entrada_saida /*decode(b.entrada_saida, 'E', decode(b.transferencia, 'S', 3, 1), 2)*/,
a.local, a.qtde_mov

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 2.27 5.99 55 19061 0 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 2.27 5.99 55 19061 0 4

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 47

Rows Row Source Operation
------- ---------------------------------------------------
4 SORT ORDER BY (cr=19061 r=55 w=0 time=5997116 us)
4 HASH JOIN (cr=19061 r=55 w=0 time=5996843 us)
103 INDEX FULL SCAN CE_OPERACOES_AK1 (cr=1 r=0 w=0 time=288 us)(object id 553481)
4 TABLE ACCESS FULL CE_MOVEST (cr=19060 r=55 w=0 time=5992324 us)

********************************************************************************




(TKPROF of DBMS_STATSed)
********************************************************************************

SELECT a.seq_mov, a.empresa, a.filial, a.LOCAL, a.produto, a.qtde_mov,
a.vlr_tot_mov, a.num_docto, a.cod_oper, a.dt_mov, a.historico,
a.lote_cont, a.lcto_direto, a.disp_trigger, a.dt_registro,
a.qtde_val,
b.entrada_saida
FROM ce_movest a, ce_operacoes b
WHERE (:b5 IS NULL OR a.empresa = :b5)
AND (:b4 IS NULL OR a.filial = :b4)
AND (:b3 IS NULL OR a.produto = :b3)
AND (:b2 IS NULL OR a.dt_mov >= :b2)
AND (:b1 IS NULL OR a.dt_mov <= (TO_DATE (:b1) + .99999))
AND b.empresa = a.empresa
AND b.cod_oper = a.cod_oper
order by a.empresa, a.filial, a.produto, a.dt_mov, b.ajuste_estoque,
b.entrada_saida /*decode(b.entrada_saida, 'E', decode(b.transferencia, 'S', 3, 1), 2)*/,
a.local, a.qtde_mov

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 13.03 74.99 0 104482 0 2
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 13.03 75.00 0 104482 0 2

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 47

Rows Row Source Operation
------- ---------------------------------------------------
2 SORT ORDER BY (cr=104482 r=0 w=0 time=74999502 us)
2 TABLE ACCESS BY INDEX ROWID CE_MOVEST (cr=104482 r=0 w=0 time=74999276 us)
1276536 NESTED LOOPS (cr=4901 r=0 w=0 time=29219344 us)
103 TABLE ACCESS FULL CE_OPERACOES (cr=3 r=0 w=0 time=728 us)
1276432 INDEX RANGE SCAN CE_MOVEST_FK3 (cr=4898 r=0 w=0 time=10600999 us)(object id 29012)

********************************************************************************

As we can see, the query took 75 seconds with the tables DBMS_STATSed; when ANALYZEd, it took about 6 seconds... Is there a reason for that?

Thanks,
Bruno

Tom Kyte
May 15, 2004 - 12:14 pm UTC

umm, exact commands used please?

Related questions for a data load(9.2.0.4.0)

Matt, May 16, 2004 - 9:39 pm UTC

I have a third party system that initially has no data. I need to load a lot of data into this system. The product provides a number of interface tables into which I can load my data. I then initiate a third party supplied procedure that will pick up the data in the interface tables, validate and "internalize" the data.

I want the load to run efficiently. I believe my best bet is to re-generate stats (on the product's internal tables) whilst the load is in progress (a 10-20 hr process). Here are my questions:

Will open cursors necessarily be reparsed when the stats have been generated?

I plan to also generate system stats (io, cpu etc). I am also wondering about generating stats on 'SYS' and 'SYSTEM'. Is this recommended on 9.2.0.4.0?

Many Thanks.




Tom Kyte
May 17, 2004 - 7:19 am UTC

"Will open cursors necessarily be reparsed when the stats have been generated?"

if not, why bother generating stats at all -- they will have their entire working set already cached in memory (in the shared pool). if we did not invalidate the cursors, the stats would "not be used". dbms_stats does have an invalidate flag you can use to control that --- but, I would ask the vendor for their best practice with their product (make sure they support the cbo first and foremost)

as long as you always do gather stats on sys and system, it is definitely a supported option in 9ir2 -- in 10g, it will be "the only way" in fact.
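The invalidation flag mentioned above is the no_invalidate parameter of the gather procedures; a sketch of deferring cursor invalidation during a load (the owner and table names are placeholders):

```sql
begin
  dbms_stats.gather_table_stats(
       ownname       => 'APP',        -- placeholder owner
       tabname       => 'BIG_TABLE',  -- placeholder table
       cascade       => true,
       no_invalidate => true );       -- do not invalidate dependent cursors now
end;
/
```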

What is the difference between the following options..

Matt, May 28, 2004 - 10:20 am UTC

to dbms_stats.gather_schema_stats:

method_opt => 'FOR ALL INDEXED COLUMNS'
and
method_opt => 'FOR ALL COLUMNS SIZE AUTO' ?

I just drastically improved the performance of a query by using the first. It appears to use different bucket sizes and generates more statistics. Can you please comment on what should be "best practice" as far as generating stats is concerned?

Many Thanks.

I've noted the following difference in stats:

Name Null? Type
----------------------------------------- -------- ---------------

VMETERID NOT NULL NUMBER(38)
CUI_TID NOT NULL NUMBER(38)
INTERVAL_ENDDT NOT NULL DATE
INTERVAL_STARTDT NOT NULL DATE
VALUE FLOAT(126)
READ_TYPE_FLAG VARCHAR2(5)
TRANS_TID NUMBER(38)
TRANS_UID NUMBER(38)
TRANS_STARTTS DATE

method_opt => 'FOR ALL INDEXED COLUMNS':
=======================================

COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS DENSITY LAST_ANALYZED
------------------------------ ------------ ---------- ----------- ---------------- -------------------
CUI_TID 1 0 1 1.0000000000 05/28/2004 23:03:14
INTERVAL_ENDDT 796 0 1 .0012562814 05/28/2004 23:03:14
INTERVAL_STARTDT 792 0 1 .0012626263 05/28/2004 23:03:14
READ_TYPE_FLAG 3 0 1 .3333333333 05/28/2004 23:03:14
TRANS_STARTTS 7894 0 1 .0001266785 05/28/2004 23:03:14
TRANS_TID 1 0 1 1.0000000000 05/28/2004 23:03:14
TRANS_UID 1 0 1 1.0000000000 05/28/2004 23:03:14
VALUE 57746 0 1 .0000173172 05/28/2004 23:03:14
VMETERID 487822 0 1 .0000020499 05/28/2004 23:03:14

method_opt => 'FOR ALL COLUMNS SIZE AUTO':
=========================================

Name Null? Type
------------------------------------------------------------------------ -------- -------------------------------------------------
VMETERID NOT NULL NUMBER(38)
CUI_TID NOT NULL NUMBER(38)
INTERVAL_ENDDT NOT NULL DATE
INTERVAL_STARTDT NOT NULL DATE
VALUE FLOAT(126)
READ_TYPE_FLAG VARCHAR2(5)
TRANS_TID NUMBER(38)
TRANS_UID NUMBER(38)
TRANS_STARTTS DATE


INDEXES FOR TABLE: VMC_DATA
================================================


INDEX_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ------------------------------ ---------------
PKVMC_DATA VMETERID 1
CUI_TID 2
INTERVAL_STARTDT 3


TABLE STATISTICS FOR : VMC_DATA
===================================================


NUM_ROWS BLOCKS AVG_ROW_LEN LAST_ANALYZED
---------- ---------- ----------- -------------------
4680488 35452 100 05/28/2004 22:27:10


COLUMN STATISTICS FOR : VMC_DATA
====================================================


COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS DENSITY LAST_ANALYZED
------------------------------ ------------ ---------- ----------- ---------------- -------------------
CUI_TID 1 0 1 1.0000000000 05/28/2004 22:27:10
INTERVAL_STARTDT 792 0 68 .0012626263 05/28/2004 22:27:10
VMETERID 487489 0 75 .0000031279 05/28/2004 22:27:10

Confusing Situation (Continued)

Bruno Di Rei Araujo, June 05, 2004 - 1:16 pm UTC

Tom,

As you asked for (from my Review above), the exact commands were:

DBMS Stats
----------
exec dbms_stats.delete_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.delete_table_stats('SAGE', 'CE_OPERACOES');

exec dbms_stats.gather_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.gather_table_stats('SAGE', 'CE_OPERACOES');


ANALYZE
-------
exec dbms_stats.delete_table_stats('SAGE', 'CE_MOVEST');
exec dbms_stats.delete_table_stats('SAGE', 'CE_OPERACOES');

analyze table ce_movest delete statistics;
analyze table ce_operacoes delete statistics;

analyze table ce_movest estimate statistics for table for all indexes for all indexed columns;
analyze table ce_operacoes estimate statistics for table for all indexes for all indexed columns;


Note that my approach was, first, remove all relevant statistics for the method used and then gather them brand new.

I just can't understand why it is so different... Can you see any reason for it?

Thanks again,
Bruno.

Tom Kyte
June 05, 2004 - 1:40 pm UTC

add

cascade => true

to the dbms_stats and retry (so you compare apples to apples). you have no index stats.
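That is, the DBMS_STATS calls from the test above would become:

```sql
exec dbms_stats.gather_table_stats('SAGE', 'CE_MOVEST',    cascade => true);
exec dbms_stats.gather_table_stats('SAGE', 'CE_OPERACOES', cascade => true);
```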

Statistics

Aru, September 21, 2004 - 2:34 am UTC

Hi Tom,

Please can you clarify in what circumstances and why would you use the 3 different(listed below ) methods to analyze tables,

1). a@ORA920LAP> analyze table t compute statistics
for table for all indexes for all indexed columns;
2). a@ORA920LAP> analyze table t compute statistics;
3). dbms_stats(....

I'm using Oracle 8i.

Q1) Why would I need to compute stats for
all indexed columns when the whole table is getting
analyzed?

Q2) Does dbms_stats analyze the table, its indexes and the indexed columns too, or what exactly does it do? If I specify a schema, does it gather all of the above statistics?

Thanks lots as always Tom,
Regards,
Aru.





Tom Kyte
September 21, 2004 - 7:43 am UTC

1) i would not (do not) -- i have old examples that do but promise never to do it in an example again

2) see #1

3) yes.

q1) because you wanted to have histograms for all indexed columns (not saying you do, but if you did "for all indexed columns" that is the outcome -- you have histograms for indexed columns)

q2) see the documentation -- it'll tell you that method_opt defaults to 'for all columns size 1' and cascade (to indexes) defaults to FALSE, so you would have to pass TRUE to get indexes done.
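Spelled out explicitly, those documented defaults plus index statistics would look something like:

```sql
-- same as the default method_opt, but with cascade turned on for the indexes
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 1', cascade => true );
```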

cascade=true -- > local indexes and global indexes?

Pravesh Karthik from India, January 02, 2005 - 7:09 am UTC

Tom,


For a partitioned table, if i say cascade=true. It will collect stats on all local indexes and global indexes?

If yes, I don't need to use GATHER_INDEX_STATS - right?

Please confirm.

Thanks a lot
Pravesh Karthik

Tom Kyte
January 02, 2005 - 11:05 am UTC

depends on your granularity -- but left to the default,

exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );

will get table, table partition, index, index partition statistics.
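If you would rather be explicit than rely on the DEFAULT granularity, a sketch (granularity values per the documentation):

```sql
begin
  dbms_stats.gather_table_stats(
       ownname     => user,
       tabname     => 'T',
       granularity => 'ALL',    -- global, partition and subpartition statistics
       cascade     => true );   -- local and global index statistics too
end;
/
```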


if yes -- then index statistics in which mode?

Pravesh Karthik from India, January 02, 2005 - 7:51 am UTC

...contd

and if yes for local and global indexes, in what mode does it gather statistics for the indexes?

"It's not recommended to compute index statistics in estimated mode, as the Cost Optimizer is almost based on index statistics, particularly on the CLUSTERING_FACTOR value. It needs the statistics would be as accurate as possible. "

above lines are from -- </code> http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_id=175258.1&p_database_id=NOT <code>

Thanks again ..

Tom Kyte
January 02, 2005 - 11:14 am UTC

that is a broad sweeping generalization that is not really accurate (or well stated -- "... to COMPUTE index statistics in ESTIMATED mode..." -- bummer. You either COMPUTE or you ESTIMATE)

and to say the CBO is almost based on index statistics -- well, I sent an email to the author of the note asking them to remove that paragraph.

awesome !!

Pravesh Karthik from India, January 02, 2005 - 11:37 am UTC


chong, January 03, 2005 - 2:49 am UTC

Hi Tom,
1)exec dbms_stats.gather_database_stats(method_opt=>'FOR ALL INDEXED COLUMNS',degree=>7,cascade=>true);

2)exec dbms_stats.gather_database_stats(method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO',degree=>7,cascade=>true);

Will the first command collect histograms? If yes, what is the difference between the first and second commands?

thanks

Tom Kyte
January 03, 2005 - 8:45 am UTC

see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5792247321358#7417227783861

they are very different.  size auto sort of needs a historical workload to peek at (queries) as well -- that is part of its algorithm.  

consider:

ops$tkyte@ORA9IR2> create table t as select * from all_objects;
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create index t_idx1 on t(object_id);
 
Index created.
 
ops$tkyte@ORA9IR2> create index t_idx2 on t(owner);
 
Index created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                    76
OWNER                        22
 
<b>histograms collected for both...</b>

ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size auto', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                     2
OWNER                         2

<b>no histograms -- just hi/lo values and such.  there were no queries that ever referenced either column so histograms were not collected</b>

 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size skewonly', cascade => true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
 
COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_ID                     2
OWNER                        22
 
<b>it looked at the columns and said "object id, unique -- no histograms needed there", owner -- not so unique and pretty skewed, lets get them</b>



But see that link, the meanings are defined there. 

chong, January 03, 2005 - 10:26 pm UTC

Hi Tom,
Based on the link, size auto means:
It'll collect histograms in memory only for those columns which are used by your applications (those columns appearing in a predicate involving an equality, range, or like operators). we know that a particular column was used by an application because at parse time, we'll store workload information in SGA. Then we'll store histograms in the data dictionary only if it has skewed data (and it worthy of a histogram)
What I understand from the above definition is (please correct me if I'm wrong): Oracle will only collect histograms on those columns that have been used since database startup and have skewed data; it will not collect histograms for columns that have never been used since startup, even if skewed data exists. If my understanding is true, how long will Oracle store that parse information in the SGA (so that dbms_stats knows which columns to analyze)? Does it mean this option is not appropriate for a freshly started database?
I see some Oracle notes highly recommend using this option to gather histogram information -- is that true?


Tom Kyte
January 03, 2005 - 10:45 pm UTC

traveling right now but this is worthy of followup later after some research - so, check back maybe later this week.

(see, I don't know everything off the top of my head :)



Update:

looks like sys.col_usage$ -- it is persistently stored in the dictionary. The names of the columns that represent "flags" in there say it all.
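A sketch of peeking at that stored workload information (run with access to the SYS-owned tables; the flag columns record how each column was used in predicates):

```sql
select o.name  table_name,
       c.name  column_name,
       u.equality_preds, u.range_preds, u.like_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where o.obj#    = u.obj#
   and c.obj#    = u.obj#
   and c.intcol# = u.intcol#;
```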




dbms_stats.gather_schema_stats - Oracle9i

Altieri, January 04, 2005 - 4:19 pm UTC

Tom,
I'm using dbms_stats.gather_schema_stats( ownname=>'XXX') to analyze a whole schema.
I don't want to analyze one particular table in this schema. What do I have to do?
I have checked all the parameters, but I didn't find any option for this yet.

Thanks,
Altieri


Tom Kyte
January 05, 2005 - 9:06 am UTC

there is no "anti gather" parameter.


you would have to use gather table stats on the tables you wanted stats on, or move the table to another schema.

seems strange to not want stats on a single table doesn't it?

How to determine if statistics were collected by ANALYZE?

Alexander, January 07, 2005 - 3:19 pm UTC

Tom, how to determine if the statistics were collected by ANALYZE command or by the package?



Tom Kyte
January 08, 2005 - 4:15 pm UTC

if you look at columns that are filled in by ANALYZE but not dbms_stats - avg_space is one - you might be able to tell if ANALYZE was used at some time in the past:

ops$tkyte@ORA9IR2> create table t as select * from all_Users;
 
Table created.
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
         0
 
ops$tkyte@ORA9IR2> analyze table t compute statistics;
 
Table analyzed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
      7488
 
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> select avg_space from user_tables where table_name = 'T';
 
 AVG_SPACE
----------
      7488


but dbms_stats won't overwrite the value -- so it is not conclusive proof.
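So a quick (and, per the above, non-conclusive) sweep for tables that ANALYZE may have touched could be:

```sql
select table_name, avg_space, last_analyzed
  from user_tables
 where avg_space > 0;   -- dbms_stats sets 0; a positive value suggests ANALYZE ran
```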
 

followup on 3 jan 2005

chong, January 11, 2005 - 2:19 am UTC

Hi Tom,
Did your research turn up anything on the 3rd Jan 2005 thread relating to dbms_stats with the size auto option?

thanks

dbms stats failure

Rob H, January 11, 2005 - 12:18 pm UTC

I have the database set up (9i) to gather statistics; however, I am running into a strange error:

BEGIN DBMS_STATS.GATHER_DATABASE_STATS(cascade=>TRUE); END;

*
ERROR at line 1:
ORA-00904: : invalid identifier
ORA-06512: at "SYS.DBMS_STATS", line 9375
ORA-06512: at "SYS.DBMS_STATS", line 9857
ORA-06512: at "SYS.DBMS_STATS", line 10041
ORA-06512: at "SYS.DBMS_STATS", line 10134
ORA-06512: at "SYS.DBMS_STATS", line 10114
ORA-06512: at line 1

I don't understand what
"ORA-00904: : invalid identifier" means with nothing between the two colons. Any ideas? I think I know the schema it's failing in.

Tom Kyte
January 11, 2005 - 1:48 pm UTC

trace it.

any 'bad' sql will be in the trace file. if you think you know the schema, just gather schema stats and see if it repros in there, tracefile would be smaller.

structure not sql

Rob H, January 11, 2005 - 1:56 pm UTC

Because I'm doing an analyze, isn't it an issue of object structure, not sql? Regardless, I am tracing it now.

Tom Kyte
January 11, 2005 - 2:03 pm UTC

dbms_stats runs sql

ora-904 is raised from sql.

interesting....

Rob H, January 12, 2005 - 12:21 pm UTC

after analyzing the tkprof output for the failure, I found the table that was failing. What was interesting was that the table being analyzed was failing on a specific index -- one that is incredibly old (note also that the table contains hourly transactions, in the realm of only 6 million rows). Dropping the index and running dbms_stats again worked. Adding the index back and running dbms_stats also worked.

TKPROF

********************************************************************************

select /*+ cursor_sharing_exact dynamic_sampling(0) no_monitoring no_expand
index(t,"INDX_TAB_CAMP") */ count(*) as nrw,count(distinct
sys_op_lbid(30519,'L',t.rowid)) as nlb,count(distinct "CAMP") as ndk,
sys_op_countchg(substrb(t.rowid,1,15),1) as clf
from
"KCAGENT"."TAB_ALL" sample block
(5.77406383513445589344056295551646142247) t where "CAMP" is not null


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.00 0.00 1 14 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.00 0.00 1 14 0 0

Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: 43 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
0 SORT GROUP BY (cr=0 r=0 w=0 time=0 us)
1 INDEX SAMPLE FAST FULL SCAN INDX_TAB_ALL_ID_DATE (cr=14 r=1 w=0 time=6851 us)(object id 49563)

********************************************************************************


But the index in the hint is different than the index in the explain.

Regardless, the immediate problem is solved, thank you.

will the statistics be updated automatically?

Chenna, January 30, 2005 - 12:18 pm UTC

I'm interested in knowing in the following

1.Lets say table t has 1000 rows

2.Generate statistics for table t and the associated indexes.

3.Now lets truncate table t.

-- will the statistics be updated automatically?

4.In the next scenario , lets delete all the rows without truncating.

-- will the statistics be updated automatically?

If they don't, can we do something to update the statistics immediately based on a certain operation?

Thanks








Tom Kyte
January 30, 2005 - 12:36 pm UTC

when you tried it, what happened? I mean, the scenario you scoped out here would be pretty *trivial* to test? That is what I'm trying to teach here -- you "spec"ed out the question very well, the test case you would need to see would be very easy.


statistics are gathered when you ask us to gather them, or some job you run gathers them.
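The trivial test alluded to might look like this in SQL*Plus:

```sql
create table t as select * from all_users;
exec dbms_stats.gather_table_stats( user, 'T' );
select num_rows from user_tables where table_name = 'T';  -- rows as of gather time

truncate table t;
select num_rows from user_tables where table_name = 'T';  -- unchanged: stats are not maintained for you

exec dbms_stats.gather_table_stats( user, 'T' );
select num_rows from user_tables where table_name = 'T';  -- 0, now that you re-gathered
```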

Very interesting discussion.

Bob, May 22, 2005 - 9:57 pm UTC

Thanks everyone.

My question - 10g's "black box" statistics collection mechanism, does it collect histograms and if so, through what parameters?

I've been reasonably satisfied with the execution plans we're getting and collecting system statistics in particular has been a boon. However there still are some zingers and they typically involve tables with what I know to be rather skewed data distribution on joined columns. I'd like to keep the process as "black box" as possible (our DBAs have bigger problems to worry about) so I've been wondering how I could verify and/or incorporate histogram collection into the black box, or whether I need to disable the auto collection and wrap up my own.

Thanks in advance - this forum is great!

Tom Kyte
May 23, 2005 - 8:21 am UTC

it uses "auto" and "auto" uses sys.col_usage$ to determine what columns have predicates applied to them and how the predicates use the column (equality, range scans, and so on).

so the set of columns that histograms are collected on can vary over time by simply introducing a new query into the system that "where's" or joins on columns that were never used that way before.

A reader, May 25, 2005 - 8:39 am UTC

Once a table is set up as Monitoring, how do I disable monitoring, if I need to?

Tom Kyte
May 25, 2005 - 11:43 am UTC

ops$tkyte@ORA9IR2> alter table t monitoring;
                                                                                
Table altered.
                                                                                
ops$tkyte@ORA9IR2> alter table t NOmonitoring;
                                                                                
Table altered.
 

Ravi, May 25, 2005 - 8:39 am UTC

How do I disable MONITORING on a table?

DBMS_STATS with low sample size produces wrong DISTINCT_KEYS on nonunique index

Marco Coletti, May 27, 2005 - 3:59 pm UTC

DBMS_STATS seems very inaccurate at estimating the number of distinct keys on nonunique indexes.

I think it could be a bug.

/*
Test case:
DBMS_STATS gives wrong DISTINCT_KEYS for nonunique index
*/

select * from V$VERSION;

/*
BANNER
------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production
*/


create table NUMBERS as
select mod(N,2000001) N from (select ROWNUM N from DUAL connect by ROWNUM <= 6000000)
;

create index IX_N on NUMBERS (N) compute statistics;

select count(distinct N) from NUMBERS;
/*
COUNT(DISTINCTN)
----------------
2000001
*/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
2000001 6000000 2005-05-25 16:25:12
*/


-- begin bug ------------------------------------
begin
DBMS_STATS.GATHER_INDEX_STATS (
ownname => user,
indname => 'IX_N',
estimate_percent => 20,
degree => 1);
end;
/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
414785 1188855 2005-05-25 16:27:39
*/
-- end bug ---------------------------------------


analyze index IX_N estimate statistics sample 20 percent;

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
2023171 1415367 2005-05-25 16:47:17
*/


begin
DBMS_STATS.GATHER_INDEX_STATS (
ownname => user,
indname => 'IX_N',
estimate_percent => NULL,
degree => 1);
end;
/

select i.DISTINCT_KEYS, i.SAMPLE_SIZE, i.LAST_ANALYZED
from USER_INDEXES i
where i.INDEX_NAME = 'IX_N'
;
/*
DISTINCT_KEYS SAMPLE_SIZE LAST_ANALYZED
------------- ----------- -------------------
2000001 6000000 2005-05-25 16:48:53
*/



Tom Kyte
May 27, 2005 - 5:32 pm UTC

it is being computed the way they intended (believe it or not).

do you have a compelling case whereby the number of distinct values results in a really wrong plan?

There is an enhancement request in to have a "re-look" at this, but if you have a compelling "bad plan", I'll be glad to take a look and see if we cannot get that into the system.

No compelling bad plan due to bad NDV for nonunique index

Marco Coletti, May 31, 2005 - 9:25 am UTC

We have lots of almost-unique indexes in our system.

Probably this is a symptom of bad design, but in development we were heavily constrained by the requirements, which are basically:
* a linear chain of four external systems that exchange messages bidirectionally about entities via the Tibco bus; the systems do not employ a homogeneous entity unique key, but only entity unique keys specific to each system (i.e. each system marks an entity with its own key format), and correlation keys from system to system
* collect entities and related events from these systems intercepting the messages in near real-time
* aggregate the events every 1 hour
* make a entity flat warehouse table to be queried by a GUI; refresh it every 1 hour
* delete "dead" entities each day
* the average number of live entities is about 3.5 million per system

I don't remember a specific query which performs bad due to this problem.


Tom Kyte
May 31, 2005 - 10:20 am UTC

that is the issue though.

dbms_stats uses block sampling to quickly guesstimate these statistics. The block sampling tends to read "sorted data" in the index, so the number of distinct values is hard to compute. Analyze, on the other hand, uses a totally different technique.

But unless you can show this is leading to "bad things happening" -- that the NDV being off materially affects something -- there isn't an issue. The sampling is happening the way they intended it to happen; the bug was opened and closed on this. They would need a compelling case in order to re-look at it.

Is NDV accurate? No, observably so.
Is that fact an issue? That is what they would need to see.

External tables

A reader, June 06, 2005 - 9:49 am UTC

We thought we'd create a DDL trigger to automatically turn on monitoring for new tables, but this DID NOT work for external tables.
Is there some way (considering it is an AFTER trigger) to know, for SURE, that the current table

1) is NOT an external table, and set it to monitoring,
or
2) IS an external table, and don't set monitoring for it?

CREATE OR REPLACE TRIGGER object_change AFTER CREATE ON SCHEMA
DECLARE
    alt_mont varchar2(1000);
BEGIN
    if ( ora_sysevent = 'CREATE' and ora_dict_obj_type = 'TABLE' )
    then
        alt_mont := 'ALTER TABLE ' || ora_dict_obj_name || ' MONITORING';
        execute immediate alt_mont;
    end if;
end;
/

Tom Kyte
June 06, 2005 - 10:50 am UTC

either query the dictionary or just "try" and catch the exception (expect it, ignore it)
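To make the "expect it, ignore it" approach concrete, a sketch of the trigger with the external-table error trapped. This assumes ORA-30657 ("operation not supported on external organized table") is what ALTER ... MONITORING raises against an external table; verify the exact error number on your version:

```sql
CREATE OR REPLACE TRIGGER object_change AFTER CREATE ON SCHEMA
DECLARE
    e_external EXCEPTION;
    PRAGMA EXCEPTION_INIT( e_external, -30657 );
BEGIN
    if ( ora_sysevent = 'CREATE' and ora_dict_obj_type = 'TABLE' )
    then
        execute immediate 'ALTER TABLE ' || ora_dict_obj_name || ' MONITORING';
    end if;
EXCEPTION
    WHEN e_external THEN NULL;  -- external table: nothing to monitor, ignore it
END;
/
```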

A reader, June 06, 2005 - 2:12 pm UTC

Thanks -- catching the exception is a brilliant solution, but I can't find anything in the Data Dictionary that suggests the table being created is an external one.
By looking at xxx_objects or xxx_tables, what is the DEFINITIVE way to categorise tables?

Tom Kyte
June 06, 2005 - 3:04 pm UTC

ops$tkyte@ORA9IR2> select * from user_external_tables;
 
no rows selected
 

replacing analyze with dbms_stats in 9.2.0.6

James K. Purtzer, July 18, 2005 - 1:09 pm UTC

Currently using:
DBMS_UTILITY.ANALYZE_SCHEMA('ESD', 'COMPUTE');
It takes about 70 minutes to run on about 30 Gb of Data warehouse weekly.

Would like to replace with:

dbms_stats.gather_schema_stats(ownname=>'ESD',
estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'AUTO', cascade=>TRUE);

1. I'm struggling with how to assess the impact on the production server. I can run the new
call on an RMAN restore of the data, without a representative connection load. Is that a fair test per your earlier recommendations?

2. Does the old analyze data need to be dropped before I run the new statement?

3. Looking at the DBMS_STATS statement above, as specified will it only gather stale statistics for objects the CBO is using?



Tom Kyte
July 18, 2005 - 1:41 pm UTC

1) I'd be looking to compare plans.  Biggest thing will be auto in the method_opt, in order for that to work, you need to have had some queries going over time.  auto (you have the syntax wrong, see below) will not work "good" until it knows the predicates you use:

ops$tkyte-ORA9IR2> create table ttt as select object_name, owner, object_type from all_objects;
 
Table created.
 
ops$tkyte-ORA9IR2> create index t_idx1 on ttt(owner);
 
Index created.
 
ops$tkyte-ORA9IR2> exec dbms_stats.gather_schema_stats( ownname=> user, estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for all columns size AUTO',cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> set autotrace traceonly explain
ops$tkyte-ORA9IR2> select * from ttt t1 where owner = 'SCOTT';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1271 Bytes=48298)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TTT' (Cost=12 Card=1271 Bytes=48298)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX1' (NON-UNIQUE) (Cost=3 Card=1271)
 
<b>see the card = 1271</b> 
 
ops$tkyte-ORA9IR2> set autotrace off
ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'TTT' group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
OBJECT_NAME                             2
OBJECT_TYPE                             2
OWNER                                   2
 
<b>only two buckets, no histograms really</b>

ops$tkyte-ORA9IR2> exec dbms_stats.gather_schema_stats( ownname=> user, estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for all columns size AUTO',cascade=>true );
 
PL/SQL procedure successfully completed.
 
ops$tkyte-ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'TTT' group by column_name;
 
COLUMN_NAME                      COUNT(*)
------------------------------ ----------
OBJECT_NAME                             2
OBJECT_TYPE                             2
OWNER                                  22
 
<b>but it saw the predicate now, so it got them</b>

ops$tkyte-ORA9IR2>
ops$tkyte-ORA9IR2> set autotrace traceonly explain
ops$tkyte-ORA9IR2> select * from ttt t2 where owner = 'SCOTT';
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=40 Bytes=1520)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TTT' (Cost=2 Card=40 Bytes=1520)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX1' (NON-UNIQUE) (Cost=1 Card=40)
 
 
 
ops$tkyte-ORA9IR2> set autotrace off
ops$tkyte-ORA9IR2> spool off

<b>and that card= is more accurate</b>



2) no

3) no, it'll gather them all.  you would need to use the alter table T monitoring command and then do a gather stale option. 
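The monitoring + gather-stale combination might look like the sketch below (schema name ESD taken from the question; SOME_TABLE is a placeholder):

```sql
-- turn on modification monitoring for the tables of interest
alter table esd.some_table monitoring;

-- then gather statistics only for objects whose data has changed
-- enough to be considered stale
begin
   dbms_stats.gather_schema_stats(
      ownname => 'ESD',
      options => 'GATHER STALE',
      cascade => true );
end;
/
```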

so method_opt updates as queries run on new stats?

James K. Purtzer, July 18, 2005 - 3:26 pm UTC

How does method_opt=>'for all columns size AUTO' update as the CBO parses queries? Your example showed the cardinality estimate becoming more accurate as more queries ran against the table, correct?

Tom Kyte
July 18, 2005 - 4:02 pm UTC

ctl-f for col_usage$ on this page.
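For reference, a sketch of the sort of query that "ctl-f for col_usage$" leads to. sys.col_usage$ is an undocumented dictionary table recording which columns have appeared in predicates, so its columns and availability can vary by version; run as a suitably privileged user:

```sql
-- which columns of TTT the optimizer has seen used in predicates
select o.name   table_name,
       c.name   column_name,
       u.equality_preds,
       u.range_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where o.obj#    = u.obj#
   and c.obj#    = u.obj#
   and c.intcol# = u.intcol#
   and o.name    = 'TTT';
```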

transport stats to another table

valent, July 18, 2005 - 4:35 pm UTC

Hello Tom,

let's say I want to transport statistics from one big table to another:

create table t2 as select * from big_table

The big table has up-to-date statistics and I want to transport them to t2.

import_table_stats will not work, because it uses the same table name.

thanks for your opinion

Tom Kyte
July 18, 2005 - 5:08 pm UTC

you can use SET_TABLE_STATS (you do know the stats will be different right -- t2 will be nicely compacted, no whitespace, the clustering factor on indexes will possibly be different.....)

the stats are not made to go from table to table.

but if you look at the stats table you export to, you'll see table names in there, it is not hard to figure out what you would have to update in the stats table after you export to it to touch up the table name.
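A sketch of that "export, touch up, import" sequence (BIG_TABLE/T2 from the question; the detail that the C1 column of the stats table holds the object name is an assumption to verify by inspecting the exported rows yourself, exactly as suggested):

```sql
begin
   -- create a stats table and export BIG_TABLE's statistics into it
   dbms_stats.create_stat_table( user, 'STATS_TAB' );
   dbms_stats.export_table_stats( user, 'BIG_TABLE', stattab => 'STATS_TAB' );
end;
/

-- touch up the table name in the exported rows
update stats_tab set c1 = 'T2' where c1 = 'BIG_TABLE';
commit;

-- import the edited statistics against T2
exec dbms_stats.import_table_stats( user, 'T2', stattab => 'STATS_TAB' )
```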

valent, July 19, 2005 - 4:22 am UTC

thank you Tom,

we have a large reporting table (no updates) with 15+ million rows per partition, created with CTAS. Statistics should be very similar for my new table copy.

Tom Kyte
July 19, 2005 - 7:34 am UTC

go for it then

suggestion..!

Reader, July 20, 2005 - 5:05 am UTC

Tom,

Every week we analyze the database schema using the DBMS_STATS package as a batch process in our production environment.
This job often fails with ORA-01555 "snapshot too old". We understand the cause: the analyze job and another job scheduled in the same batch run in parallel, clash at some point, and end with the ORA-01555 error. My question: if we re-run the DBMS_STATS job after it fails, it analyzes all the objects from the beginning, but we want it to analyze objects from the point where it failed previously. Is there any option in the DBMS_STATS package for this? Please advise. I hope you understand the scenario.

Tom Kyte
July 20, 2005 - 6:42 am UTC

why don't we fix the problem -- the 1555? make your UNDO permanently larger, sized right to fulfill the requirements of your system.

the other path will be "write code", as dbms_stats doesn't have a "remember where you were and start over" option -- we'd have to help it out that way.
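The "write code" path might look like this sketch: a driver loop that remembers each table as it finishes, so a failed run can be restarted where it left off (STATS_PROGRESS is a hypothetical bookkeeping table you would create first):

```sql
-- create table stats_progress ( table_name varchar2(30) );
begin
   for x in ( select table_name
                from user_tables
               where table_name not in ( select table_name
                                           from stats_progress ) )
   loop
      dbms_stats.gather_table_stats( user, x.table_name, cascade => true );
      insert into stats_progress values ( x.table_name );
      commit;  -- remember we finished this one
   end loop;
end;
/
-- on success: truncate stats_progress, ready for the next weekly run
```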

dbms_stats hanging,

sns, July 20, 2005 - 9:23 am UTC

Lately we are experiencing an issue with DBMS_STATS package when we try to analyze the table.

This is what we did:
BEGIN
   dbms_stats.gather_table_stats(
      ownname          => 'EURO_TSTAT',
      tabname          => 'CALL_SYSTEM_EURO',
      estimate_percent => 15,
      block_sample     => FALSE,
      method_opt       => 'FOR ALL INDEXED COLUMNS SIZE 1',
      degree           => 4,
      granularity      => 'ALL',
      cascade          => TRUE );
END;


What are the things we need to look at while dbms_stats is in a hung state?
Any steps to resolve this?

thanks,

Tom Kyte
July 20, 2005 - 12:49 pm UTC

you could trace it, you could query v$ views (eg: v$session_event) to see what it is waiting on



sql_trace with waits (10046 trace) could be used.
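A sketch of both suggestions (the SID 123 is a placeholder for the session running dbms_stats):

```sql
-- what has that session been waiting on?
select event, total_waits, time_waited
  from v$session_event
 where sid = 123
 order by time_waited desc;

-- or, in the session doing the gathering, enable a 10046 trace with waits:
alter session set events '10046 trace name context forever, level 8';
```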



need to analyze index separately..!

Reader, August 06, 2005 - 10:17 pm UTC

Tom,

I have noted that gather_schema_stats does not analyze indexes unless we specify cascade=>true --
is that true?
Here is the procedure I'm using; kindly suggest how to have it process the indexes as well:
exec dbms_stats.gather_schema_stats( -
ownname => 'OMSTNX', -
options => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
degree => DBMS_STATS.DEFAULT_DEGREE -
)



Tom Kyte
August 07, 2005 - 9:14 am UTC

that is the documented purpose of cascade, yes.

Trying to understand this behavior...

Mark, August 15, 2005 - 10:02 am UTC

Hi Tom,

We have this as a job run Weekly (Sunday, Early AM):
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4'
, estimate_percent => 40
, block_sample => FALSE
, method_opt => 'FOR ALL COLUMNS SIZE 1'
, degree => NULL
, granularity => 'DEFAULT'
, cascade => TRUE
);

Now, I came into work Monday and the performance was 'yukky', so I did this:
DECLARE
BEGIN
EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_INSURANCE ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS FOR ALL INDEXES';
EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_PROCEDURE ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS FOR ALL INDEXES';
EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_PROCEDURE_BALANCE ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS FOR ALL INDEXES';
EXECUTE IMMEDIATE 'ANALYZE TABLE PAT_FINANCIAL_TRANSACTION ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS FOR ALL INDEXES';
EXECUTE IMMEDIATE 'ANALYZE TABLE ALL_NAME ESTIMATE STATISTICS FOR TABLE FOR ALL INDEXED COLUMNS FOR ALL INDEXES';
EXCEPTION
WHEN OTHERS
THEN
NULL;
END;

... And all was good again. Any ideas what is incorrect perhaps on the DBMS_STATS.GATHER_SCHEMA_STATS parameter settings, and could you suggest a good method using this procedure?

Thanks!

Tom Kyte
August 15, 2005 - 4:00 pm UTC

you would need to tell us what happened between sunday early am and monday morning.

and why you have a when other then null????????
erase that and stop using it, you want to just silently ignore ERRORS!!!!!???

I can say that the dbms_stats -- you told it "no histograms", and the analyze you let it "get histograms" on the indexed columns -- that jumps right out.

Yes, ok.

Mark, August 17, 2005 - 2:36 pm UTC

Thanks Tom.

The Exception handler was quick adhoc code and I don't usually code that way.

Anyway, this is what I've done:

1) turned table monitoring on
2) Via JOB at 1AM:
/* Gather Stale Stats Nightly */
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4',
estimate_percent => 40, block_sample => FALSE, method_opt =>
'FOR ALL INDEXED COLUMNS', degree => NULL, granularity =>
'DEFAULT', options => 'GATHER STALE', cascade => TRUE);

3) Once per week (Sunday Early AM):
/* Gather All Stats Weekly */
DBMS_STATS.GATHER_SCHEMA_STATS(ownname =>'HT4',
estimate_percent => 40, block_sample => FALSE, method_opt =>
'FOR ALL INDEXED COLUMNS', degree => NULL, granularity =>
'DEFAULT', options => 'GATHER', cascade => TRUE);

This seems to give me a more consistent performance.

Thanks!

collecting initial stats using dbms_stats

JK Purtzer, September 02, 2005 - 1:14 pm UTC

running the package mentioned above in my previous review

begin
dbms_stats.gather_schema_stats(ownname=> 'ESD',estimate_percent=>dbms_stats.auto_sample_size,
method_opt=>'for all columns size
AUTO',cascade=>true );
commit;
end;

I can't seem to verify with
</code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/stats.htm#33117 <code>
SELECT TABLE_NAME, NUM_ROWS, BLOCKS, AVG_ROW_LEN,
TO_CHAR(LAST_ANALYZED, 'MM/DD/YYYY HH24:MI:SS')
FROM DBA_TABLES
WHERE TABLE_NAME IN ('SO_LINES_ALL','SO_HEADERS_ALL','SO_LAST_ALL');
I get 0 rows.
Thinking I'm missing something about initially gathering stats, I also looked at ch6 of Effective Oracle by Design.
The procedure only takes 1 sec to run; it can't be gathering stats quite that fast, or can it?

Tom Kyte
September 03, 2005 - 7:40 am UTC

well, that dbms_stats will gather stats for the ESD schema

could it take 1 second? depends on the size of the ESD schema.

That query will return rows for those exact table names, if nothing is coming back, you don't have any tables by those names.

Nice

Catherine, September 15, 2005 - 2:10 pm UTC

Hi Tom,
How is statistics gathering useful to the optimizer?
In what ways does it use the statistics?

Please provide a detailed answer.

Bye!!


Support of Analyze

Bill, September 28, 2005 - 9:26 am UTC

Hi Tom,

Both you and the Oracle documentation (9i) state the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS.

Do you know when this will be the case?

Also, is the DBMS_DDL.ANALYZE_OBJECT procedure equivalent to ANALYZE TABLE?

Thanks!

Tom Kyte
September 28, 2005 - 10:25 am UTC

The optimizer expects the statistics to be gathered by dbms_stats, the stats generated by analyze can and will be different than dbms_stats, but the optimizer is assuming "dbms_stats" did that.

There is no "end of life" on analyze for statistics gathering

JK Purtzer, October 03, 2005 - 5:14 pm UTC

I ran the sql below to gather stats:

begin
dbms_stats.gather_schema_stats(ownname=> 'schema',
estimate_percent=>dbms_stats.auto_sample_size,
cascade=>true);
commit;
end;

This caused several of the plans for procedures that are "unhinted" and were using indexes to stop using them and do full table scans instead. What would you suggest looking at to find out why?

Thanks again for the great book from the OTN lounge at OOW05

Tom Kyte
October 03, 2005 - 9:01 pm UTC

tell me you tested this in a test system first please????


you just turned on the CBO, entirely. Some things will go better - some will be unchanged and invariably, there will be others that do not go so well the first time (they are the ones you NOTICE, the others you do not).

They are something to be analyzed in a testing environment. Without version info and other stuff - no comment.

analyze and cursor invalidation

Al, October 04, 2005 - 11:05 am UTC

In a followup above you state:
----------------
analyzing the table will invalidate open cursors on that object in the shared pool. the next parse against that object will be "hard". hard does lots more recursive sql then "soft"
-----------------

I don't fully understand what you mean by "invalidate open cursors". An analyze of a table will not cause some other session's executing sql to fail, will it?

For complicated reasons I will need to run a bunch of dbms_stats.gather_table_stats after an application upgrade, and there will be real users on the application at that point. I am assuming that as the stats are gathered, any parse (soft or hard) of any sql on an analyzed object will cause complete re-optimization, but that in no case will an executing statement fail because of the analyze. Is this correct?

The goal here, obviously, is to get users on the application as quickly as possible after the upgrade, even if performance isn't what we'd like; then gather the stats as the system runs to bring it back to its 'normal' statistics state and 'normal' performance.

Tom Kyte
October 04, 2005 - 4:43 pm UTC

it'll invalidate the shared sql, the cursors in the shared pool so they next time they are executed, they'll hard parse.

it will not break "already running" stuff.
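If the concern is the burst of hard parses right after gathering, note that gather_table_stats also accepts a no_invalidate argument; a sketch (BIG_TABLE is a placeholder table name):

```sql
-- no_invalidate => true leaves existing cursors valid; plans pick up the
-- new statistics only as cursors naturally age out of the shared pool
begin
   dbms_stats.gather_table_stats(
      ownname       => user,
      tabname       => 'BIG_TABLE',
      cascade       => true,
      no_invalidate => true );
end;
/
```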

analyze table, open cursors and parsing ...

Catalin CIONTU, October 04, 2005 - 2:23 pm UTC

Tom, please correct me if I'm wrong, but by analyzing the table, its corresponding open cursors will be invalidated in the shared pool, and the only difference is that the cursors will have to do a hard parse instead of a soft parse. The process will complete, but will be less efficient this time. Regards, Catalin.

Tom Kyte
October 04, 2005 - 6:24 pm UTC

the process will complete, yes, as to it "effectiveness", one of three things can happen..... (slower, faster, the same)

dbms_stats quibble

JK Purtzer, October 04, 2005 - 6:40 pm UTC

9.2.0.6 DB
We did some testing (see the general question above -- sorry about that one, panicked). Here is a more specific one.

We are now using the DBMS_STATS package for collecting stats. We were previously using the old DBMS_UTILITY.ANALYZE_SCHEMA method.

The following is the new method we are using to collect stats:


dbms_stats.gather_schema_stats(
   ownname          => 'XXX',
   estimate_percent => dbms_stats.auto_sample_size,
   method_opt       => 'for all columns size AUTO',
   cascade          => true );


Since we changed over to the new statistics gathering, we have seen significant degradation in some of our existing queries. In analyzing one of the queries, we found some strange behavior by the CBO when evaluating a TO_DATE conversion in the where predicate:


When the date string includes a 4-digit year such as this:


WHERE OA.CREATE_DATE >= TO_DATE('7/28/2005', 'MM/DD/RR')


It uses the index on the CREATE_DATE column which is desired.



When the date string uses a 2-digit year such as this:



WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')



It does NOT use the index on the CREATE_DATE column even though this also converts the string to 7/28/2005.


The difference in execution times between these two execution plans is less than a second when the index is used to minutes when the index is not used.

This is quite confusing and was hoping you could shed some light on this.


Tom Kyte
October 04, 2005 - 9:01 pm UTC

WHERE OA.CREATE_DATE >= TO_DATE('7/28/2005', 'MM/DD/RR')

THAT is a constant....

WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')

THAT is an "unknown", could be 1905, 2005, 2105 - depends on when you execute it (I know, a human would be able to sort of make a good guess - but the date format is not constant)


Are you sure this is due to stats? can we see an autotrace traceonly explain?





date quibble with explain plans

JK Purtzer, October 04, 2005 - 9:47 pm UTC

Hope you can read this; my email cut-and-paste got rid of the formatting. I need a clarification on the
statement that the 2-digit year is an unknown. I am issuing the query today, and by Oracle's rules that would mean an 05 year with an RR date format element would return year 2005. That makes it a constant as of right now, doesn't it?


Using a 4-digit year in the to_date string:

Operation                        Object                           Cost    Card    Bytes
-------------------------------  ------------------------------  -----  ------  -------
SELECT STATEMENT (CHOOSE)                                         3102     207    40986
SORT ORDER BY                                                     3102     207    40986
HASH JOIN OUTER                                                   3094     207    40986
HASH JOIN OUTER                                                   2717     206    30488
NESTED LOOPS                                                      2714     206    29046
HASH JOIN OUTER                                                   2302     206    26162
NESTED LOOPS OUTER                                                2227     206    23484
NESTED LOOPS                                                      2227     206    21836
HASH JOIN                                                         2021     206    16686
TABLE ACCESS BY INDEX ROWID      WAREHOUSE.ORDER_AGGREGATE        1988     206    14214
INDEX RANGE SCAN                 WAREHOUSE.OA_CREATE_DATE_IDX       30    5041
TABLE ACCESS FULL                WAREHOUSE.PUBLISHER_PROFILE        32   11898   142776
TABLE ACCESS BY INDEX ROWID      ESD.PUBLISHERS                      1       1       25
INDEX UNIQUE SCAN                ESD.PUBLISHERS_PK                           1
INDEX UNIQUE SCAN                ESD.SALES_AGENT_ID_PK                       1        8
TABLE ACCESS FULL                WAREHOUSE.FULFILLMENTS             74   48036   624468
TABLE ACCESS BY INDEX ROWID      WAREHOUSE.ORDER_EXT_AGGREGATE       2       1       14
INDEX UNIQUE SCAN                WAREHOUSE.ORDERS_EXT_AGG_PK         1       1
TABLE ACCESS FULL                WAREHOUSE.CURRENCIES                2     169     1183
VIEW                             WAREHOUSE                         375   38293  1914650
SORT GROUP BY                                                     375   38293  1799771
HASH JOIN                                                          66   38293  1799771
TABLE ACCESS FULL                WAREHOUSE.RETURN_REASONS            3     696    20880
TABLE ACCESS FULL                WAREHOUSE.RETURNS                  62   38293   650981




Using a 2-digit year ('05', RR) in the to_date string:

Operation                        Object                           Cost    Card    Bytes
-------------------------------  ------------------------------  -----  ------  -------
SELECT STATEMENT (CHOOSE)                                        21420     234    46332
SORT ORDER BY                                                    21420     234    46332
FILTER
HASH JOIN OUTER                                                  21411     234    46332
HASH JOIN OUTER                                                  21034     233    34484
NESTED LOOPS                                                     21031     233    32853
HASH JOIN OUTER                                                  20565     233    29591
NESTED LOOPS OUTER                                               20490     233    26562
NESTED LOOPS                                                     20490     233    24698
HASH JOIN                                                        20257     233    18873
TABLE ACCESS FULL                WAREHOUSE.ORDER_AGGREGATE       20224     233    16077
TABLE ACCESS FULL                WAREHOUSE.PUBLISHER_PROFILE        32   11898   142776
TABLE ACCESS BY INDEX ROWID      ESD.PUBLISHERS                      1       1       25
INDEX UNIQUE SCAN                ESD.PUBLISHERS_PK                           1
INDEX UNIQUE SCAN                ESD.SALES_AGENT_ID_PK                       1        8
TABLE ACCESS FULL                WAREHOUSE.FULFILLMENTS             74   48036   624468
TABLE ACCESS BY INDEX ROWID      WAREHOUSE.ORDER_EXT_AGGREGATE       2       1       14
INDEX UNIQUE SCAN                WAREHOUSE.ORDERS_EXT_AGG_PK         1       1
TABLE ACCESS FULL                WAREHOUSE.CURRENCIES                2     169     1183
VIEW                             WAREHOUSE                         375   38293  1914650
SORT GROUP BY                                                     375   38293  1799771
HASH JOIN                                                          66   38293  1799771
TABLE ACCESS FULL                WAREHOUSE.RETURN_REASONS            3     696    20880
TABLE ACCESS FULL                WAREHOUSE.RETURNS                  62   38293   650981



Tom Kyte
October 05, 2005 - 6:59 am UTC

... That makes it a constant as of right now doesn't it? ...


no more than SYSDATE is a constant as of right now. 5 minutes from now - sysdate can be different (can be, doesn't have to be! you can freeze time)

WHERE OA.CREATE_DATE >= TO_DATE('7/28/05', 'MM/DD/RR')

Tell me, what date is that - no matter what you answer, I'll say "nope, I was thinking of another date"


would need to see the plan part that represented the estimated cardinality for this step.

(you can format plans skinny, I do it all of the time...)
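One way to see what Oracle makes of the 2-digit literal at this particular moment (purely illustrative; the point stands that RR resolves relative to the current date, so the expression is not a constant the way a 4-digit year is):

```sql
select to_char( to_date('7/28/05',   'MM/DD/RR'),   'YYYY' ) rr_year,
       to_char( to_date('7/28/2005', 'MM/DD/YYYY'), 'YYYY' ) yyyy_year
  from dual;
```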

How do you know if a table was analyzed with dbms_stats or analyze table?

Peter Tran, December 15, 2005 - 7:38 pm UTC

Hi Tom,

Someone asked me this today, but I haven't been able to figure out the answer.

Is there any way you can tell whether someone analyzed a table using "DBMS_STATS" vs "ANALYZE TABLE"?

My gut call is no.

Thanks,
-Peter

Tom Kyte
December 16, 2005 - 7:45 am UTC

it won't be "definitive", but you can take an educated guess.

I'll use dbms_stats, then analyze - compare, then dbms_stats and recompare..


ops$tkyte@ORA10GR2> create table t ( x int, y varchar2(4000), z varchar2(4000) );
Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> insert into t (x) select rownum from all_objects where rownum <= 10;
10 rows created.

ops$tkyte@ORA10GR2> commit;
Commit complete.

ops$tkyte@ORA10GR2> update t set y = rpad('*',4000,'*'), z = rpad('*',4000,'*');
10 rows updated.

ops$tkyte@ORA10GR2> commit;
Commit complete.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> create table tstats as
  2  select 'DBMS_STATS' name, a.* from user_tables a where table_name = 'T';

Table created.

ops$tkyte@ORA10GR2> analyze table t compute statistics;

Table analyzed.

ops$tkyte@ORA10GR2> insert into tstats
  2  select 'ANALYZE' name, a.* from user_tables a where table_name = 'T';

1 row created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> column cname format a20
ops$tkyte@ORA10GR2> column dbms_stats format a25
ops$tkyte@ORA10GR2> column analyze format a25
ops$tkyte@ORA10GR2> select a.cname, a.val dbms_stats, b.val analyze
  2    from table( cols_as_rows( q'|select *
  3                                   from tstats
  4                                  where name = 'DBMS_STATS' |' ) ) a,
  5         table( cols_as_rows( q'|select *
  6                                   from tstats
  7                                  where name = 'ANALYZE' |' ) ) b
  8       where a.cname = b.cname
  9             and decode( a.val, b.val, 0, 1 ) = 1
 10  /

CNAME                DBMS_STATS                ANALYZE
-------------------- ------------------------- -------------------------
NAME                 DBMS_STATS                ANALYZE
EMPTY_BLOCKS         0                         3
AVG_SPACE            0                         1878
CHAIN_CNT            0                         10
AVG_ROW_LEN          8005                      8022
GLOBAL_STATS         YES                       NO

6 rows selected.

<b>dbms_stats won't gather things not used by the CBO such as chain_cnt/avg_space in particular.  If they are 0 or null - it is unlikely that analyze has been used in the past - dbms_stats would be the one.  

But global_stats for a non-partitioned table could be a "tell", avg_space, etc is not "good enough" since:</b>

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> delete from tstats where name = 'DBMS_STATS';

1 row deleted.

ops$tkyte@ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte@ORA10GR2> insert into tstats
  2  select 'DBMS_STATS' name, a.* from user_tables a where table_name = 'T';

1 row created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select a.cname, a.val dbms_stats, b.val analyze
  2    from table( cols_as_rows( q'|select *
  3                                   from tstats
  4                                  where name = 'DBMS_STATS' |' ) ) a,
  5         table( cols_as_rows( q'|select *
  6                                   from tstats
  7                                  where name = 'ANALYZE' |' ) ) b
  8       where a.cname = b.cname
  9             and decode( a.val, b.val, 0, 1 ) = 1
 10  /

CNAME                DBMS_STATS                ANALYZE
-------------------- ------------------------- -------------------------
NAME                 DBMS_STATS                ANALYZE
AVG_ROW_LEN          8005                      8022
GLOBAL_STATS         YES                       NO

<b>dbms_stats won't 'reset' them, their being set indicates analyze was used at some time in the past - but since analyze and dbms_stats set and reset global_stats - that could be a "tell" for many normal tables...</b>

 

Wow...Awesome.

Peter Tran, December 16, 2005 - 12:10 pm UTC

Perfect...

Thanks again Tom.

CHAIN_CNT

A reader, January 05, 2006 - 10:05 am UTC

Hi

Since dbms_stats doesn't get CHAIN_CNT for us, if we want to find out which tables have chained rows, how can we achieve it? By analyzing all tables manually and overwriting the statistics gathered by dbms_stats?

Tom Kyte
January 05, 2006 - 11:04 am UTC

ops$tkyte@ORA10GR2> @?/rdbms/admin/utlchain

Table created.

ops$tkyte@ORA10GR2> analyze table t list chained rows;

Table analyzed.

ops$tkyte@ORA10GR2> desc chained_rows
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 OWNER_NAME                                        VARCHAR2(30)
 TABLE_NAME                                        VARCHAR2(30)
 CLUSTER_NAME                                      VARCHAR2(30)
 PARTITION_NAME                                    VARCHAR2(30)
 SUBPARTITION_NAME                                 VARCHAR2(30)
 HEAD_ROWID                                        ROWID
 ANALYZE_TIMESTAMP                                 DATE
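
Once the CHAINED_ROWS table is populated, using it might look like the sketch below (table T here is a placeholder):

```sql
-- how many chained rows per analyzed table
select table_name, count(*) chained_rows
  from chained_rows
 group by table_name;

-- fetch the chained rows themselves via the saved head rowids
select t.*
  from t, chained_rows c
 where c.table_name = 'T'
   and t.rowid = c.head_rowid;
```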

 

A reader, January 05, 2006 - 10:48 pm UTC

Tom,

What is the reason for dbms_stats not gathering info about chained rows whereas analyze command does?

Thanks.

Tom Kyte
January 06, 2006 - 1:37 pm UTC

dbms_stats gathers only that which is used by the optimizer - it gathers no other bits of information.

Ramesh, January 06, 2006 - 9:48 am UTC

Tom,
How do setting "mbrc" through dbms_stats and the parameter db_file_multiblock_read_count influence things differently?
Is it something like: "mbrc" is used during the cost evaluation phase and db_file_multiblock_read_count during the execution phase?
Thanks for your time.
Ramesh.

Tom Kyte
January 06, 2006 - 2:11 pm UTC

the system statistics mbrc is the "observed actual multi-block read count on the system, regardless of the db file multiblock read count init.ora parameter"

the optimizer will use the "observed value", the "retrieval" engine will use db_file_multiblock_read_count to do the actual IO however.

So, if you tell the system via dbms_stats "we only get 8 blocks on multi-block reads" but set the multiblock read count to the max (say 128 blocks) -- when the optimizer decides to do a full scan (multi-block read), we'll do as large an IO as possible, but the costing engine will consider "8" to be the number we would expect.
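For completeness, a sketch of setting that system statistic by hand (gathering it from a live workload with gather_system_stats is the normal route; the value 8 is just an example):

```sql
begin
   -- tell the costing engine the observed multiblock read count is 8,
   -- regardless of db_file_multiblock_read_count
   dbms_stats.set_system_stats( pname => 'MBRC', pvalue => 8 );
end;
/
```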



dbms_stats in 10gR2

Oraboy, February 21, 2006 - 5:21 pm UTC

Guess I am really tired and my eyes can't catch what I am missing here. Could anyone please help?

Issue: the index associated with the primary key constraint never gets updated with correct statistics, and hence my query is returning wrong results.

Commodity_DIM is a table with 700+ rows

<b>-- Version check </b>
SQL> select * from V$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

<b>-- Table details </b>
SQL> select table_name,num_rows,sample_size,last_analyzed from user_tables
  2  where table_name='COMMODITY_DIM'
  3  /

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- ----------- ---------
COMMODITY_DIM                         730         730 21-FEB-06

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
         0

<b>-- Index details </b>

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

<b>-- Index associates to Primary Key constraint in the table </b>

SQL> select constraint_name,constraint_type from user_constraints
  2  where table_name='COMMODITY_DIM' and index_name='SYS_C009071';

CONSTRAINT_NAME                C
------------------------------ -
SYS_C009071                    P

<b>-- and the constraint is Valid </b>

SQL> select constraint_name,constraint_type,status from user_constraints
  2  where table_name='COMMODITY_DIM' and index_name='SYS_C009071';

CONSTRAINT_NAME                C STATUS
------------------------------ - --------
SYS_C009071                    P ENABLED

<b>-- analyzing the index - no help </b>

SQL> analyze index SYS_C009071 compute statistics;

Index analyzed.

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

<b>-- dbms_stats ; perform compute  </b>

SQL> exec dbms_stats.gather_index_stats(ownname=>'GOLDDEMO',-
> indname=>'SYS_C009071',estimate_percent=>NULL);

PL/SQL procedure successfully completed.

SQL> select index_name,num_rows,DISTINCT_KEYS,status from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME                       NUM_ROWS DISTINCT_KEYS STATUS
------------------------------ ---------- ------------- --------
SYS_C009071                             0             0 VALID

SQL> desc commodity_dim
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COMM_DIM_PK                               NOT NULL NUMBER
 COMM_LOW_CATG_CODE                        NOT NULL VARCHAR2(10)
 COMM_LOW_CATG_DESC                                 VARCHAR2(50)
 COMM_MID_CATG_CODE                        NOT NULL VARCHAR2(50)
 COMM_MID_CATG_DESC                                 VARCHAR2(50)
 COMM_HI_CATG_CODE                         NOT NULL VARCHAR2(50)
 COMM_HI_CATG_DESC                                  VARCHAR2(50)
 COMM_ALL_CATG_CODE                                 VARCHAR2(10)
 COMM_ALL_CATG_DESC                                 VARCHAR2(50)

<b>-- Here is the problem </b>

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
         0

SQL> select count(distinct comm_dim_pk) from commodity_dim;

COUNT(DISTINCTCOMM_DIM_PK)
--------------------------
                         0

SQL> select /*+ FULL(c)*/
  2         count(distinct comm_dim_pk) from commodity_dim c;

COUNT(DISTINCTCOMM_DIM_PK)
--------------------------
                       730

<b>-- Verifying Optimizer plans  </b>

SQL> explain plan for select count(*) from commodity_dim;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 818636992

------------------------------------------------------------------------
| Id  | Operation        | Name        | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT |             |     1 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |             |     1 |            |          |
|   2 |   INDEX FULL SCAN| SYS_C009071 |   730 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------


SQL> explain plan for 
  2  select /*+ FULL(c)*/
  3         count(distinct comm_dim_pk) from commodity_dim c;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
Plan hash value: 3961897095

-------------------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Bytes | Cost (%CPU)| T
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |               |     1 |     4 |     8   (0)| 0
|   1 |  SORT GROUP BY     |               |     1 |     4 |            |  
|   2 |   TABLE ACCESS FULL| COMMODITY_DIM |   730 |  2920 |     8   (0)| 0
---------------------------------------------------------------------------

<b>-- parameter listing , if this helps  </b>

SQL> show parameter opt

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
filesystemio_options                 string
object_cache_optimal_size            integer     102400
optimizer_dynamic_sampling           integer     2
optimizer_features_enable            string      10.2.0.1
optimizer_index_caching              integer     0
optimizer_index_cost_adj             integer     100
optimizer_mode                       string      ALL_ROWS
optimizer_secure_view_merging        boolean     TRUE
plsql_optimize_level                 integer     2 

Tom Kyte
February 22, 2006 - 8:28 am UTC

analyze table commodity_dim validate structure cascade;

what does that return.

Ora-1499

oraboy, February 22, 2006 - 9:31 am UTC

Thanks Tom. Looks like there is a much bigger problem here.

SQL> analyze table commodity_dim validate structure;

Table analyzed.

SQL> analyze table commodity_dim validate structure cascade;
analyze table commodity_dim validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

I 'll follow-up this in metalink.

Thanks for your pointers, as always.
 

Weird but worked

Oraboy, February 22, 2006 - 9:39 am UTC

SQL> analyze table commodity_dim validate structure cascade;
analyze table commodity_dim validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file


SQL> alter table commodity_dim move;

Table altered.

SQL> select index_name from user_indexes
  2  where table_name='COMMODITY_DIM';

INDEX_NAME
------------------------------
SYS_C009071

SQL> alter index SYS_C009071 rebuild ;

Index altered.

SQL> analyze table commodity_dim validate structure cascade;

Table analyzed.

SQL> select count(*) from commodity_dim;

  COUNT(*)
----------
       730

SQL> explain plan for select count(*) from commodity_dim;

Explained.


SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
Plan hash value: 2809477265

-----------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |             |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_C009071 |   730 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

 

Comparable test, different results...

Ben, March 09, 2006 - 2:52 pm UTC

This is a great discussion; it has brought to light for me the concept of histograms and generating statistics overall. The argument over how bright that light is, is still up in the air ;)

After reading the portion, dated Jan 05, that gave examples of the differences between the method_opt choices, I decided to experiment a little for myself. I am using a test data warehouse composed of only 7 tables, each of which has a primary key index and two of which have an additional one-column non-unique index. I show those below.

I found my results to be unexpectedly different from yours, and I wondered if you could help me understand why.

The first test was using 'FOR ALL INDEXED COLUMNS'. The second test was with 'FOR ALL INDEXED COLUMNS SIZE AUTO'. Finally, the third test was using 'FOR ALL INDEXED COLUMNS SIZE SKEWONLY'. The third test gave the most confusing results to me. I will include my observation and question following the results from each test, within the spooled listing below. 

SQL> show user
USER is "DMR"

SQL> select USER || '.' || table_name, index_name, column_name
  2  from user_ind_columns;

USER||'.'||TABLE_NAME INDEX_NAME             COLUMN_NAME
--------------------- ---------------------- -------------------------
DMR.BRANCH_BO         BRANCH_BO_1            REPORT_DATE 
DMR.BRANCH_BO         BRANCH_BO_1            PRODUCT  
DMR.BRANCH_BO         BRANCH_BO_1            BRANCH_PLANT
DMR.BRANCH_BO         BRANCH_BO_1            TRX_CLASS 

DMR.INV_FACT          INV_FACT_1             REPORT_DATE  
DMR.INV_FACT          INV_FACT_1             PRODUCT 
DMR.INV_FACT          INV_FACT_1             BRANCH_PLANT 
DMR.INV_FACT          INV_FACT_1             CATEGORY      
DMR.INV_FACT          INV_FACT_1             OBSOLETE 

DMR.ITEM              ITEM_1                 PRODUCT 
DMR.ITEM              ITEM_2**               D_BUS_UNIT

DMR.ITEM_BRANCH       ITEM_BRANCH_1          PRODUCT 
DMR.ITEM_BRANCH       ITEM_BRANCH_1          BRANCH_PLANT

DMR.ITEM_FILL         ITEM_FILL_1            REPORT_DATE 
DMR.ITEM_FILL         ITEM_FILL_1            PRODUCT  
DMR.ITEM_FILL         ITEM_FILL_2**          PRODUCT 

DMR.OSB_FACT          OSB_FACT_1             REPORT_DATE 
DMR.OSB_FACT          OSB_FACT_1             PRODUCT
DMR.OSB_FACT          OSB_FACT_1             TRX_CLASS 

DMR.PROD_FACT         PROD_FACT_1            REPORT_DATE
DMR.PROD_FACT         PROD_FACT_1            PRODUCT 
DMR.PROD_FACT         PROD_FACT_1            BRANCH_PLANT

DMR.TIME              TIME_1                 CURDATE 

** signifies non unique indexes/columns

23 rows selected.

############## 
The following shows the skewness of d_bus_unit, curdate, and obsolete
##############


SQL> select count(distinct D_BUS_UNIT)
  2  from item;
 
COUNT(DISTINCTD_BUS_UNIT)
-------------------------
                       12
 
SQL> select count(*)
  2  from item;
 
  COUNT(*)
----------
     65119
 
SQL> select D_BUS_UNIT, count(*)
  2  from item
  3  group by D_BUS_UNIT;
 
D_BUS_UNIT                       COUNT(*)
------------------------------ ----------
                                      617
Acute Care                          29298
Corporate                              10
Drop Ship Fee                           1
Freight                                 2
Min Order Qty Fee                       1
NonStock Expense Item                   8
OEM                                  1920
Patient Care                        30547
REBATE CREDIT                           1
Raw Material Samples                    2
 
D_BUS_UNIT                       COUNT(*)
------------------------------ ----------
Wound Care                           2699
 
12 rows selected.
SQL> select count(distinct obsolete)
  2  from inv_fact;
 
COUNT(DISTINCTOBSOLETE)
-----------------------
                      2
 
SQL> select count(*)
  2  from inv_fact;
 
  COUNT(*)
----------
   2743569
 
SQL> select obsolete, count(*)
  2  from inv_fact
  3  group by obsolete;
 
O   COUNT(*)
- ----------
N    2649249
Y      94320

SQL> select count(*)
  2  from time;
 
  COUNT(*)
----------
       156
 
SQL> select count(distinct curdate)
  2  from time;
 
COUNT(DISTINCTCURDATE)
----------------------
                   156

#################################

SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname         =>'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS');

PL/SQL procedure successfully completed.

SQL> l
  1  select table_name, column_name, count(*)
  2  from user_tab_histograms
  3* group by table_name, column_name
SQL> /

TABLE_NAME            COLUMN_NAME     COUNT(*)             
--------------------- --------------  ----------       
ITEM                  PRODUCT         2             
ITEM                  D_BUS_UNIT      7             
TIME                  CURDATE         76             
INV_FACT              PRODUCT         76             
INV_FACT              CATEGORY        2             
INV_FACT              OBSOLETE        2             
INV_FACT              REPORT_DATE     74             
INV_FACT              BRANCH_PLANT    25             
OSB_FACT              PRODUCT         76             
OSB_FACT              TRX_CLASS       6             
OSB_FACT              REPORT_DATE     74             
BRANCH_BO             PRODUCT         76             
BRANCH_BO             TRX_CLASS       6             
BRANCH_BO             REPORT_DATE     62             
BRANCH_BO             BRANCH_PLANT    24             
ITEM_FILL             PRODUCT         76             
ITEM_FILL             REPORT_DATE     3             
PROD_FACT             PRODUCT         76             
PROD_FACT             REPORT_DATE     44             
PROD_FACT             BRANCH_PLANT    32             
ITEM_BRANCH           PRODUCT         76             
ITEM_BRANCH           BRANCH_PLANT    33             

22 rows selected.

######### OBSERVATIONS/QUESTIONS ########
When you did this test on table "T" that was built from 
create table t as select * from all_objects
you got a 76 bucket histogram on object_id, a unique column, and a 22 bucket histogram on owner, a non-unique column.

In my sample, item.product is the primary key; I would have expected more than a two-bucket histogram.
ITEM.D_BUS_UNIT is non-unique and very skewed, yet I only got a 7-bucket histogram.
What could be the cause of this? It seems like I got the opposite of what your example shows.
################# END of Q1


SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname        => 'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

TABLE_NAME                     COLUMN_NAME                 COUNT(*)             
------------------------------ ------------------------- ----------             
ITEM                           PRODUCT                            2             
ITEM                           D_BUS_UNIT                         2             
TIME                           CURDATE                            2             
INV_FACT                       PRODUCT                            2             
INV_FACT                       CATEGORY                           2             
INV_FACT                       OBSOLETE                           2             
INV_FACT                       REPORT_DATE                        2             
INV_FACT                       BRANCH_PLANT                       2             
OSB_FACT                       PRODUCT                            2             
OSB_FACT                       TRX_CLASS                          2             
OSB_FACT                       REPORT_DATE                        2             
BRANCH_BO                      PRODUCT                            2             
BRANCH_BO                      TRX_CLASS                          2             
BRANCH_BO                      REPORT_DATE                        2             
BRANCH_BO                      BRANCH_PLANT                       2             
ITEM_FILL                      PRODUCT                            2             
ITEM_FILL                      REPORT_DATE                        2             
PROD_FACT                      PRODUCT                            2             
PROD_FACT                      REPORT_DATE                        2             
PROD_FACT                      BRANCH_PLANT                       2             
ITEM_BRANCH                    PRODUCT                            2             
ITEM_BRANCH                    BRANCH_PLANT                       2             

22 rows selected.


##########################
No confusion here; I got the exact results as your sample. This is a test db and hasn't had any SQL run against it for some time.
###########################

SQL> exec dbms_stats.delete_schema_stats( 'DMR' );

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

no rows selected

SQL> exec dbms_stats.gather_schema_stats
    ( ownname        =>'DMR', 
      cascade         => TRUE, 
      estimate_percent     => 10, 
      method_opt         => 'FOR ALL INDEXED COLUMNS SIZE SKEWONLY');

PL/SQL procedure successfully completed.

SQL> select table_name, column_name, count(*)
  2  from user_tab_histograms
  3  group by table_name, column_name;

TABLE_NAME                     COLUMN_NAME                 COUNT(*)             
------------------------------ ------------------------- ----------             
ITEM                           PRODUCT                            2             
ITEM                           D_BUS_UNIT                         2             
TIME                           CURDATE                           76             
INV_FACT                       PRODUCT                          201             
INV_FACT                       CATEGORY                           2             
INV_FACT                       OBSOLETE                           2             
INV_FACT                       REPORT_DATE                       99             
INV_FACT                       BRANCH_PLANT                      48             
OSB_FACT                       PRODUCT                          201             
OSB_FACT                       TRX_CLASS                          6             
OSB_FACT                       REPORT_DATE                      100             
BRANCH_BO                      PRODUCT                          201             
BRANCH_BO                      TRX_CLASS                          6             
BRANCH_BO                      REPORT_DATE                       89             
BRANCH_BO                      BRANCH_PLANT                      24             
ITEM_FILL                      PRODUCT                          102             
ITEM_FILL                      REPORT_DATE                        3             
PROD_FACT                      PRODUCT                          201             
PROD_FACT                      REPORT_DATE                      108             
PROD_FACT                      BRANCH_PLANT                      31             
ITEM_BRANCH                    PRODUCT                          201             
ITEM_BRANCH                    BRANCH_PLANT                      57             

22 rows selected.
################################
Here's where I really get confused. I listed some statements at the top that show the skewness of
item.d_bus_unit, time.curdate, and inv_fact.obsolete. item.product is the primary key of item, so it's unique, and from your example I would expect the 2-bucket histogram; but d_bus_unit is non-unique and very skewed. Why no histograms there? time.curdate is unique,
so why does it have histograms? inv_fact.obsolete is non-unique and very skewed, but yet again, no histograms. What could be causing this?

Also, I might be a little confused with the terminology in referring to that count as buckets. I saw where you said that 2 meant no histograms, just hi/lo endpoints, and that >2 meant histograms.

Thank you for any explanations you can give me, and let me know if you need to see other settings (maybe init.ora) or anything else.


 

Tom Kyte
March 09, 2006 - 3:58 pm UTC

what version here - dbms_stats behavior is driven by that greatly.

Also, I likely won't address everything here - this is sort of a review followup area, this is "big" (and I cannot really reproduce, I do not have your data, that makes it "hard")


in addition to above...

Ben, March 09, 2006 - 3:51 pm UTC

Sorry, I forgot to include the banner information

SQL> select banner from v$version;
 
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
PL/SQL Release 9.2.0.5.0 - Production
CORE    9.2.0.6.0       Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.5.0 - Production
NLSRTL Version 9.2.0.5.0 - Production 

another addition; your response wasn't posted before my last one.

Ben, March 09, 2006 - 4:17 pm UTC

Sorry again for the lack of data. I did think of something else that might impact this:
in my init.ora the compatible parameter is set to 8.1.0, BUT optimizer_features_enable is set to 9.2.0.

I just recently inherited the database so I can't give you a reason why these are the way they are.

Tom Kyte
March 09, 2006 - 4:25 pm UTC

I'll take a look later when I get back from travel if I get a chance.


Meanwhile, can you try to generate a test case, using mod, rownum, decode, whatever, to generate dummy data that we can test with?

a closer comparison

Ben, March 10, 2006 - 1:28 pm UTC

I re-created your exact example using the all_objects table.
Those stats were almost exact. I think the difference with skewonly is probably due to my db having fewer schemas than the one you were using.

I'm still confused about some things with my specific data tables.

The TIME table has a record for each day, with curdate being the actual date and the primary key. The other fields are day, month, year, day of week, day of year, etc. This is a test schema, so there may not be a record for every day from beginning to end. Could this cause the need for histograms on that column? For example, the min date could be 1/1/03 and the max 1/1/06, but it could be missing three months in between, or even all of 2005.



SQL> show user
USER is "DMR"
SQL> create table t as select * from all_objects;

Table created.

SQL> create index t_idx1 on t(object_id);

Index created.

SQL> create index t_idx2 on t(owner);

Index created.

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns', cascade => true );

PL/SQL procedure successfully completed.

SQL> col column_name for a20
SQL> select column_name, count(*)
  2  from user_tab_histograms
  3  where table_name = 'T'
  4  group by column_name
SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                    76                                                 
OWNER                         4                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size auto', cascade => true );

PL/SQL procedure successfully completed.

SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                     2                                                 
OWNER                         2                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'T', method_opt=> 'for all indexed columns size skewonly', cascade => true );

PL/SQL procedure successfully completed.

SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
OBJECT_ID                     2                                                 
OWNER                         4                                                 

SQL> exec dbms_stats.gather_table_stats( user, 'TIME', method_opt=> 'for all indexed columns size skewonly', cascade => true );

PL/SQL procedure successfully completed.

SQL> select column_name, count(*)
  2  from user_tab_histograms
  3  where table_name = 'TIME'
  4  group by column_name
SQL> /

COLUMN_NAME            COUNT(*)                                                 
-------------------- ----------                                                 
CURDATE                      76 

Tom Kyte
March 10, 2006 - 8:29 pm UTC

we flipped from T to TIME? I didn't follow? What is TIME exactly?

explanation

Ben, March 14, 2006 - 8:42 am UTC

I tried to explain in the last paragraph at the top before the example. I went through and recreated your example with almost exact results. I then posted numbers from a TIME.CURDATE field that is a primary key that holds the date. In production there is a record for every day, but the test instance has some gaps. I was asking if that is why I would get histograms on that column.

If I am running dbms_stats.gather_table_stats( user, 'TIME', method_opt => 'for all
indexed columns size skewonly', cascade => true ),
why would a single-column primary key of varchar2 datatype get a histogram generated? That could explain a lot to me. I thought that if a column is unique it would only need 1 bucket. Is that not true of varchar2 columns?
If, on a smaller scale, my column looks like:
10-245
10-456
10-984
10-999
15-875
50-3000
50-450
50-451
50-780
90-800
That is a unique column but would it get histograms generated if I used "for all indexed columns size skewonly"?


On the other hand, I also have a char(1) column that is either Y or N. Most of the 100,000 records have an N, and yet I only get 1 bucket for this column. I wouldn't think that is sufficient. Example:
N
N
N
Y
N
N
N
N
N
Y
Y
N
N
I am only getting a 1-bucket histogram for a column like this, of course on a much larger scale.

If one of the two possible values makes up less than 5% or 10% of the rows, are histograms not created?



analyze, and gather_stats schemas

Mohammed Abdul Samad, April 24, 2006 - 4:59 am UTC

This is good and useful, but I need more detail, with an example. For instance, if we are gathering stats every week, how do we schedule that with dbms_job? Can you give a complete example? I hope you got the point.

Tom Kyte
April 24, 2006 - 5:45 am UTC

what issue do you have with dbms_job? It is pretty straightforward.

You have a procedure you write to gather stats (or just call dbms_stats)
You schedule it.
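For anyone looking for a concrete starting point, a weekly schedule with dbms_job might look like this. This is a sketch only: the ABCDOWNER schema name and the Sunday 2:00 am window are made-up placeholders, and on 10g and later dbms_scheduler is the preferred interface.

declare
    l_job number;
begin
    dbms_job.submit(
        job       => l_job,
        -- the job simply calls dbms_stats; ABCDOWNER is a placeholder schema name
        what      => 'dbms_stats.gather_schema_stats( ownname => ''ABCDOWNER'', cascade => true );',
        -- first run: next Sunday at 2:00 am
        next_date => trunc(next_day(sysdate, 'SUNDAY')) + 2/24,
        -- and every Sunday at 2:00 am thereafter
        interval  => 'trunc(next_day(sysdate, ''SUNDAY'')) + 2/24'
    );
    commit;  -- dbms_job.submit does not commit; the job is not visible until you do
end;
/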



Problem with table analyze.

Abdul Seerae, May 02, 2006 - 4:15 pm UTC

Dear Tom,

We have the following script running daily (AT 2.00 AM) to gather statistics on a specific schema.

[ dbms_stats.gather_schema_stats(ownname=>'ABCDOWNER',cascade=>true); ]

We are facing a problem with this script. The table analyze stops at a specific table (say PROBLEM_TBL) and will not proceed any further. It gives no error message and does not update any logs; it just hangs forever. Checking through OEM or SQL*Plus, we can see the 'last analyzed' date updated for all tables before PROBLEM_TBL in this schema.
PROBLEM_TBL is not very big (only 16,000 rows), and we have many bigger tables in this schema.

When we analyze this table individually, we get the same result: it hangs forever.
[ dbms_stats.gather_table_stats(ownname=> 'ABCDOWNER', tabname=> 'PROBLEM_TBL', partname=> NULL); ]

We can analyze all other tables without any problem. To rule out any table corruption, we exported and imported this table to our development server. The data seems OK.

Once we restart the database, this problem goes away for a few weeks before it reappears.

Any suggestions / thoughts in this respect will be highly appreciated.


O/S: Windows 2003.
Oracle : 9i (9.2.0.6.0)




Tom Kyte
May 03, 2006 - 1:33 am UTC

(this does seem like an OBVIOUS support issue, I always wonder why that path isn't taken...)

I would peek in the v$ views to see what this session is waiting for when it gets "stuck".

v$session_wait
v$session_event
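To make that concrete, here is a sketch of the kind of queries to run from a second session while the gather is stuck (the bind :stuck_sid is the SID of the hung session, found via v$session):

-- what is the stuck session waiting on right now?
select sid, event, state, wait_time, seconds_in_wait, p1, p2, p3
  from v$session_wait
 where sid = :stuck_sid;

-- where has it spent its time overall?
select event, total_waits, time_waited
  from v$session_event
 where sid = :stuck_sid
 order by time_waited desc;

If the top event turns out to be an enqueue wait, for example, another session may be blocking it.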

Locks

Ik, May 08, 2006 - 6:53 am UTC

Tom,

Two questions:

Does ANALYZE or DBMS_STATS against a table lock the table in any mode? I came across some old Oracle Press documentation that said so, but could not reproduce it (10g R2). Was it true in older versions of Oracle?


If the table is locked by another session, would analyze/dbms_stats wait for that lock to be released?

Thanks,


Tom Kyte
May 08, 2006 - 8:22 am UTC

certain variations of analyze can (to validate structures and such), but dbms_stats just runs queries basically.

Doubts

Ik, May 09, 2006 - 10:53 am UTC

Thanks a lot Tom.

I have some related questions.

1. Why do we have the force parameter for gather_table_stats? It is FALSE by default. Does this mean dbms_stats.gather_table_stats errors out (or does it wait?) if the table is locked?

2. no_invalidate clause -- The documentation says "Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE."
Does invalidating dependent cursors mean that all queries that have used this table and are present in the shared pool will be invalidated and hard parsed if executed again?

Was this the default behaviour with ANALYZE?

3. DBMS_STATS.GATHER_TABLE_STATS

BEGIN
DBMS_STATS.GATHER_TABLE_STATS
(
OWNNAME => 'TEST'
,TABNAME => 'RTEST'
,ESTIMATE_PERCENT => 1
,GRANULARITY => 'GLOBAL'
,CASCADE => FALSE
);
END;

This table has 140 partitions, 6018191841 rows and 80538932 blocks.

The above command takes 80 minutes.

If i change block_sample => TRUE, it completes in less than a minute.

I found this query (modified) to be the most time consuming one and waits on "db file scattered reads"

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring */ *
from rtest sample (1) t

When the block_sample is made TRUE, query changes to

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring */ *
from rtest sample block (1) t

and is much faster.

Tom, I found sample rows and sample block to return different numbers of rows. Which is more accurate when used within dbms_stats, and why?

Thanks a lot for your time.

Tom Kyte
May 09, 2006 - 11:08 am UTC

1) force overrides statistics that were "locked", you can "lock" statistics - make it so a normal "gather" won't overwrite them. Maybe they could have called it "freezing" statistics instead of locking them.

....
force When value of this argument is TRUE, deletes column statistics even if locked
.......


2) correct. analyze would invalidate cursors as well.

3) You need to understand the difference between block and row sampling.

One of them samples 1% of the BLOCKS and processes any rows it finds on them.

The other samples 1% of the rows (read the rows, but only look at 1 out of every 100 of them).

You would certainly expect them to return different statistics since they look at radically different data in very different ways.

Thanks

Ik, May 09, 2006 - 11:23 am UTC


Many thanks Tom for the very fast and precise response.

When row sampling (1%) is used, Oracle reads every row from the table but considers only 1 out of every hundred rows. In effect, the whole table is read.

The difference between this and COMPUTE statistics would then be that COMPUTE considers all the rows, whereas row sampling considers only 1% of them. But in both cases the entire table is read.

So, compared to COMPUTE, we save mostly on CPU by using row sampling? Am I correct in saying so?


Also, the Oracle documentation says "Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated." How is random block sampling more efficient?

I am sorry for asking too many questions, Tom, but these are doubts to which very few people have an answer.






Tom Kyte
May 09, 2006 - 12:12 pm UTC

you saw how random block sampling is more efficient - it only read 1% of the blocks, it ran a lot faster.

But it might get entirely the wrong answer by accident since it reads 1% of the blocks and could be subject to skew (load sorted data and it'll see lots of "the same values" on any given block which could be unnatural)
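That sorted-data effect can be sketched as a test case (hypothetical; the table name T is made up and the exact estimates will vary):

-- load a copy of all_objects physically clustered by OWNER
create table t as select * from all_objects order by owner;

-- row sampling: reads the whole table, considers ~1% of the rows
exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 1 );
select num_distinct from user_tab_columns
 where table_name = 'T' and column_name = 'OWNER';

-- block sampling: reads ~1% of the blocks; since each block tends to hold
-- a single owner, the distinct-value estimate may come out badly skewed
exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 1, block_sample => true );
select num_distinct from user_tab_columns
 where table_name = 'T' and column_name = 'OWNER';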



Alberto Dell'Era, May 09, 2006 - 11:30 am UTC

> Maybe they could have called it "freezing" statistics
> instead of locking them.

It's worth mentioning the different behaviour when trying to gather stats on a locked table versus the schema/db that contains it:

dellera@ORACLE10> exec dbms_stats.lock_table_stats (user, 't');

PL/SQL procedure successfully completed.

dellera@ORACLE10> exec dbms_stats.gather_table_stats (user, 't');
BEGIN dbms_stats.gather_table_stats (user, 't'); END;

*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "SYS.DBMS_STATS", line 13056
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1

dellera@ORACLE10> exec dbms_stats.gather_schema_stats (user);

PL/SQL procedure successfully completed.

The first is definitely a "lock" as in "you bang your head into the locked door" ;) - as a non-native speaker would express it ...

Tom Kyte
May 09, 2006 - 12:12 pm UTC

but not to be confused with a table lock or a row lock :)
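As a follow-on, the dictionary exposes which statistics are locked, and dbms_stats can unlock them. A sketch, assuming 10g or later and a table named T:

-- which of my tables have locked statistics?
select table_name, stattype_locked
  from user_tab_statistics
 where stattype_locked is not null;

-- release the lock so normal gathers work again
exec dbms_stats.unlock_table_stats( user, 'T' );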

David Aldridge, May 09, 2006 - 12:50 pm UTC

>> But it might get entirely the wrong answer by accident since it reads 1% of the
blocks and could be subject to skew (load sorted data and it'll see lots of "the
same values" on any given block which could be unnatural)

It's also sensitive to skewed numbers of rows per block, and the 1% sample is by no means guaranteed

</code> http://oraclesponge.blogspot.com/2005/06/statistics-by-block-sampling.html <code>

dbms_stats for IOTs

Ik, May 16, 2006 - 8:11 am UTC

Tom,

This is 10g R2.

I noticed that when gathering stats on 10g using dbms_stats (specifying partname, granularity => 'PARTITION' and estimate_percent => <some number>),
performance on IOTs is very poor when compared to tables of similar or much bigger size.

The SQL it is waiting for is

INSERT /*+ append*/ into sys.<temp table>
SELECT .... FROM <iot> sample (0.001) t
WHERE TBL$OR$IDX$PART$NUM(<iot>,0,4,0,ROWID) = :objn

It is waiting on 'db file sequential read' and, in the P1 column of v$session_wait, almost every datafile that this IOT spans shows up. This includes other partitions too. My point is that the granularity keyword seems to have no impact on IOTs.

Is this how it should be?

This command is taking more than an hour to run. We have around 100 partitions.


Tom, whenever a granularity of PARTITION is specified for a table, index or IOT, shouldn't it go against that partition alone?

In cases when the granularity => 'PARTITION', would Oracle attempt to gather global statistics also by simply rolling up from individual partitions each time?


Please let us know.

A reader, June 03, 2006 - 10:05 pm UTC

Hi Tom,
A basic question, but I am a bit confused about it. I create a schema with some tables in it. Do I need to gather stats for them before using them? Or will Oracle take care of it once they start getting populated? I am a bit confused about whether we need to gather stats on empty tables. I am asking because I was told to do so. Maybe I didn't understand it properly and I might have to go back and confirm.

Thx
Santosh.

Tom Kyte
June 04, 2006 - 8:16 am UTC

The only answers to all questions are:

a) why
b) it depends

The answer to this is "it depends"

are you using an older release and using the rule based optimizer (RBO) - then the answer is no, because you are using the RBO and the RBO doesn't want statistics

are you using current software and using the cost based optimizer (CBO)? Then we need to know how current - is it 10g current where dynamic sampling will kick in automatically or is it 8i/9i current where sampling does not kick in automatically.

In 8i/9i - you would be responsible for collecting the statistics. In 10g, it'll dynamically sample the table the first time it hard parses a query and if you are using the automatic job that installs with the database, it'll gather statistics for you on these objects in the future as the job runs over time.
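A quick way to see the 10g behaviour described above (a sketch - T_NEW is a made-up table name):

    create table t_new as select * from all_objects;

    select num_rows, last_analyzed from user_tables where table_name = 'T_NEW';
    -- no statistics yet: num_rows and last_analyzed are null

    select count(*) from t_new;
    -- at the first hard parse, 10g dynamically samples t_new; later, the
    -- automatic gather-stats job picks the table up and gathers real statistics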



Dbms_stats.gather_schema_stats

Vikas, June 04, 2006 - 9:49 am UTC

Hi Tom,

We are currently managing a database of 461G of data, hosted on a Linux server with 4 CPUs, 8 disk devices on 4 disk controllers, and 16G of RAM.

We have been running dbms_stats.gather_database_stats to collect the statistics that help the optimizer generate optimised execution plans, but recently, with the data growing by about 1G every day, gathering statistics has really slowed down and is taking 5 hrs to complete.

Create or replace procedure gather_db_stats AS
    Path                    VARCHAR2(100);
    Filename                VARCHAR2(100) := 'Dbmsstats.txt';
    Output_file             utl_file.file_type;
    l_Start                 NUMBER;
    l_tname                 VARCHAR2(30);
    l_partition_name        VARCHAR2(30);
    rebuild_partition_index VARCHAR2(500);
    err_num                 NUMBER;
    err_msg                 VARCHAR2(500);
BEGIN
    Select value into path from v$parameter where name = 'utl_file_dir';
    output_file := utl_file.fopen (path, filename, 'a');

    for x in (Select * from DWUSER.Rolling_tables where status = 'Y') loop
        l_tname := x.table_name;
        for table_specific_partitions in
            (Select all_ind_partitions.partition_name part_name
               from all_indexes, all_ind_partitions
              Where all_indexes.index_name = all_ind_partitions.index_name
                And all_indexes.table_name = l_tname
                And all_ind_partitions.Status = 'UNUSABLE') loop
            begin
                l_partition_name := table_specific_partitions.part_name;
                rebuild_partition_index := 'Alter table ' || l_tname ||
                    ' modify partition ' || l_partition_name ||
                    ' rebuild unusable local indexes';
                execute immediate rebuild_partition_index;
            EXCEPTION
                When OTHERS then
                    err_num := SQLCODE;
                    err_msg := Substr(SQLERRM,1,500);
                    utl_file.put_line (output_file, 'Date : ' || to_char(sysdate,'dd/mm/yyyy hh24:mi:ss') || ' Error generated while rebuilding the UNUSABLE local indexes with an error message ' || err_msg);
                    utl_file.fclose(output_file);
                    return;
            end;
        end loop;
    end loop;

    for x in (select Username from dba_users where username not in ('SYS','SYSTEM')) loop
        l_start := dbms_utility.get_time;
        dbms_stats.gather_schema_stats(ownname          => x.username,
                                       estimate_percent => dbms_stats.auto_sample_size,
                                       method_opt       => 'FOR ALL INDEXED COLUMNS SIZE SKEWONLY',
                                       degree           => 16,
                                       cascade          => true,
                                       options          => 'GATHER AUTO');
        utl_file.put_line (output_file, 'Date : ' || to_char(sysdate,'dd/mm/yyyy hh24:mi:ss') || ' The total time spent to gather stats for the schema owner : ' || x.username || ' is ' || round((dbms_utility.get_time - l_start)/100,2) || ' seconds');
    end loop;
END;
/

We just don't have the bandwidth for 5 hrs to get this done. We have 4 partitioned tables which hold most of the data (could be 350M rows).

Can we use gather_table_stats instead of gather_schema_stats for only those partitions that have LAST_ANALYZED as NULL, since our application doesn't perform any delete or insert activity once the End of Day completes? We have range partitioning based on Date, with daily partitions.

Will this help? Or let us know if we can configure some of the parameter values of gather_schema_stats to get the job done much faster. We don't need to build histograms for the columns.

Thanks in anticipation



Tom Kyte
June 04, 2006 - 4:16 pm UTC

The answer is as always one of two:

a) why
b) it depends

your answer is b), it depends.

Can you just do partition stats? do your queries always hit a SINGLE partition, and is that partition known at hard parse time?

If your query hits more than one partition OR your query doesn't let us figure out which partition it will hit - then GLOBAL TABLE stats are used to optimize (not local partition statistics).

And the global table stats could be slightly "wrong" for columns if you don't maintain them over time.

eg: if you have 2 partitions - and each partition has 500,000 rows - it is easy enough to know there are 1,000,000 rows - that works OK from the local partition statistics.

however, if in those 2 partitions there is a column X that has 10,000 distinct values in partition1 and 10,000 distinct values in partition2 - how many distinct values does it have "globally"? somewhere between 10,000 and 20,000 - which can be the difference between "night and day" to the optimizer.



There is no cut and dried answer here - you might have to "help" on some columns using dbms_stats.set_*** to set the column stats for us if you know them (and you are gathering histograms there - you said you don't need to, so why?)
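If you do know the global figures, the dbms_stats.set_*** route might look like this sketch (the table SALES, column X and the value 18000 are made up for illustration):

    begin
        dbms_stats.set_column_stats(
            ownname => user,
            tabname => 'SALES',   -- hypothetical table
            colname => 'X',
            distcnt => 18000 );   -- the global NDV you know to be correct
    end;
    /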

dbms_stats

geepee, July 05, 2006 - 4:56 am UTC

Dear Tom,

We have around 150 fast growing/changing transactional data tables in our production system.

Database is 10gR2.
Growth of these tables is from 0.1 million to 1 million in six months.
A few tables are partitioned tables.

Which way is good to analyze these fast growing tables? Dynamic sampling? or scheduling the script run using dbms_stats for these tables? Pl. suggest.

Regards.


Tom Kyte
July 08, 2006 - 7:51 am UTC

dynamic sampling will happen only at the first hard parse time (so if you never shutdown - you never get different plans - assuming you have used bind variables - which you must be since you said "we are transactional"). It would not be appropriate.

More likely would be to use dbms_stats to gather stats on STALE tables (tables are all set to monitoring in 10g you can use "gather stale" as the option, to avoid gathering stats on tables that haven't changed much).
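A sketch of that approach - with table monitoring in place, a scheduled job can be as simple as:

    begin
        dbms_stats.gather_schema_stats(
            ownname => user,
            options => 'GATHER STALE',   -- only tables flagged as stale
            cascade => true );           -- indexes too
    end;
    /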

Table statistics and 3113 end-of-file on comm channel

Jon Waterhouse, July 12, 2006 - 11:38 am UTC

I ran into the following problem on XE, which I find difficult to explain. I had the following query:

select count(*),urban
from case@caps s
inner join res_address@caps r ON r.res_address_id = s.res_address_id
inner join ref.good_postal_code p ON p.postal_code=r.postal_code
WHERE s.adults=1 and s.children > 0 -- is single parent
AND exists (select * from payment@caps p
WHERE s.case_id = p.case_id
AND p.payment_date between to_date('200605','yyyymm') and to_date('200606','yyyymm')-1)
GROUP BY urban

which worked fine. Note that there is one table on XE and the other two tables are remote. I then modified the query to take out the two items that restricted the query to single parents. I got a 3113 error running an explain plan. I found that adding in some restrictions (e.g. s.adults in (1,2,3)) would allow the query to run -- even though this in no way restricts the query - all records would satisfy this. I also found that running an analyze table on ref.good_postal_code allowed the query to run.

Since I wasn't sure if it was possible to delete stats, I then did a CTAS to create a new table exactly the same as good_postal_code but with no stats. Surprisingly the query ran with no problem.

I read somewhere that resource limits can generate 3113s. Is that what is happening here? i.e. the explain plan process decides that the query is going to be too big and returns the 3113?

Tom Kyte
July 12, 2006 - 5:04 pm UTC

doubtful - can you reproduce in "not XE"?

ZB, July 24, 2006 - 12:44 pm UTC

system@oip1> create or replace procedure ods_analyze
2 is
3 begin
4 execute immediate 'execute DBMS_STATS.GATHER_SCHEMA_STATS(''ODS'')';
5 end;
6 /

Procedure created.

system@oip1> exec ods_analyze;
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-00900: invalid SQL statement
ORA-06512: at "SYSTEM.ODS_ANALYZE", line 4
ORA-06512: at line 1

Tom, why did I get the error?


Tom Kyte
July 24, 2006 - 2:54 pm UTC

why would you try to dynamically execute plsql - plsql that never changes???????

...
is
begin
dbms_stats.gather_schema_stats( 'ODS' );
end;


you have plsql
you are writing plsql
just call it


execute is a "sqlplus-ism". you can see this from your own example:

system@oip1> exec ods_analyze;
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-00900: invalid SQL statement


see - it says "BEGIN ods_analyze; END;" - exec is just shorthand for sqlplus to glue begin and end on it. execute is NOT plsql - it is not part of the language at all.

ZB, July 24, 2006 - 3:07 pm UTC

Thanks Tom, now I have the procedure compiled.
But got a new error:
system@oip1> exec ods_analyze
BEGIN ods_analyze; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges to analyze an object in Schema
ORA-06512: at "SYS.DBMS_STATS", line 7649
ORA-06512: at "SYS.DBMS_STATS", line 7779
ORA-06512: at "SYS.DBMS_STATS", line 7830
ORA-06512: at "SYS.DBMS_STATS", line 7809
ORA-06512: at "SYSTEM.ODS_ANALYZE", line 4
ORA-06512: at line 1


system@oip1> exec DBMS_STATS.GATHER_SCHEMA_STATS('ODS');

PL/SQL procedure successfully completed.

But i could do it directly.

Tom Kyte
July 24, 2006 - 4:17 pm UTC

STOP USING SYSTEM

you have no reason to use system, it is "ours", stop using it.

create your own account

and read:

</code> http://asktom.oracle.com/Misc/RolesAndProcedures.html <code>


grant it "analyze any" directly (no roles)


or better yet....

just write this as the ODS user, schedule it as the ODS user.


but don't use system!

Can a table be ANALYZE during the DML operations?

Ravi, August 09, 2006 - 1:01 am UTC

Hi Tom,

1. Can a table be analyzed (to collect statistics) during DML (INSERT,
UPDATE, DELETE) operations?

2. If a column in the WHERE clause has an index on it, are statistics of
much concern for using that index during DMLs?

3. How does having statistics versus not having statistics affect the use of
the index during DMLs?

4. If we use (within the same database) ANALYZE for a few tables and the
DBMS_STATS package to gather statistics for some other tables, will there be
a performance issue from using the two together?


Thanks,
Ravi

Tom Kyte
August 09, 2006 - 10:42 am UTC

1) you may use dbms_stats to gather statistics while DML is happening, yes.

2) sure. It is all about estimated cardinality.

3) you need to have statistics to use the cost based optimizer (CBO) safely. If you do not, you will not be able to use the CBO safely.

4) do not use analyze, use dbms_stats

DBMS_STATS & ANALYZE statement Oracle10g R2

Suvendu, August 19, 2006 - 7:06 am UTC

Hi Tom,

Below is a test case which seems to show wrong information from DBMS_STATS in Oracle 10g R2; Oracle 9i R2 shows the correct information. As per Metalink, the CBO gets exact information when we go with COMPUTE STATISTICS, where the NUM_ROWS and SAMPLE_SIZE columns will show the same values. And I also verified this against query execution.

SQL> create table t2 as select * from t1;

Table created.

SQL> select count(1) from t2;

  COUNT(1)
----------
    735392

SQL>  exec DBMS_STATS.GATHER_TABLE_STATS(user, TABNAME => 'T2',GRANULARITY => 'DEFAULT' ,METHOD_OPT => 'FOR COLUMNS');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, sample_size from user_tables where table_name='T2';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T2                                 735392       51451

SQL> REM Here the sample size (51451) does not match the row count (735392).

SQL> analyze table t2 compute statistics;

Table analyzed.

SQL> select table_name, num_rows, sample_size from user_tables where table_name='T2';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T2                                 735392      735392

SQL> REM Here all values are accurate.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL>

======================
Oracle 9i R2
======================

SQL> select count(1) from t1;

  COUNT(1)
----------
  27525120

Elapsed: 00:01:07.05
SQL>  exec DBMS_STATS.GATHER_TABLE_STATS(user, TABNAME => 'T1',GRANULARITY => 'DEFAULT' ,METHOD_OPT => 'FOR COLUMNS', DEGREE=>8);

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.22
SQL> select table_name, num_rows, sample_size from user_tables where table_name='T1';

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE
------------------------------ ---------- -----------
T1                               27525120    27525120

Elapsed: 00:00:00.08
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE    9.2.0.6.0       Production
TNS for HPUX: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

Elapsed: 00:00:00.08
albprod@ALBBPM>


Could you please elaborate on this difference in DBMS_STATS behaviour in Oracle 10g?

Thanking you.

Regards,
Suvendu

 

Tom Kyte
August 19, 2006 - 9:15 am UTC

that is not "wrong"

That is "correct" actually.

If you check out the docs for dbms_stats, you'll find the specification of gather table stats in 10gr2 is:

http://docs.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#sthref8129

DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2, 
   tabname          VARCHAR2, 
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type 
                                                (get_param('ESTIMATE_PERCENT')), 
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT GET_PARAM('GRANULARITY'), 
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL, 
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT  to_no_invalidate_type (
                                     get_param('NO_INVALIDATE')),
   force            BOOLEAN DEFAULT FALSE);


and that:

ops$tkyte%ORA10GR2> select dbms_stats.get_param( 'estimate_percent' ) from dual;

DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT')
-------------------------------------------------------------------------------
DBMS_STATS.AUTO_SAMPLE_SIZE



you are using the auto_sample_size, therefore, it is free to sample.


It performed as documented - but things change (and note that the row count is pretty much dead on in 10gr2 anyway....) 
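If you want the 9i-style full compute back, you can override the default rather than rely on auto sampling - a sketch:

    exec dbms_stats.gather_table_stats( user, 'T2', estimate_percent => 100, method_opt => 'FOR COLUMNS' );

    select table_name, num_rows, sample_size from user_tables where table_name = 'T2';
    -- with a 100% estimate, num_rows and sample_size should match again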

Help needed

Anil P, August 23, 2006 - 5:09 am UTC

Hello,

This query is taking more than 12 hrs to execute. Can you let me know which areas I need to look at to tune it?

Thanks
Anil P

SQL> select z.IND_ID,
  2         z.BLK_OPEN_DT,
  3         z.ORG_SRC_CD,
  4         z.PRM_SRC_CD,
  5         c.calendar_month_name_abbr ACC_OPEN_MTH_ABR,
  6         c.calendar_month ACC_OPEN_MTH,
  7         c.calendar_year ACC_OPEN_YEAR
  8    from mrtlookup.calendar_date c,
  9         (select ind.individual_id IND_ID,
 10                 x.BLK_OPEN_DT,
 11                 ind.original_source_cd ORG_SRC_CD,
 12                 ind.primary_source_cd PRM_SRC_CD
 13            from mrtcustomer.individual ind,
 14                 (select prp.individual_id IND_ID, min(prp.open_dt) BLK_OPEN_DT
 15                    from mrtcustomer.proprietary_account prp,
 16                         mrtcustomer.individual_segment isg,
 17                         (select (c.calendar_month_end_dt + 1) BLK_OPEN_START_DT,
 18                                 (select max(c.calendar_month_end_dt)
 19                                    from mrtlookup.calendar_date c,
 20                                         (SELECT PERIOD_DATES.WEEK_END_DT computation_dt
 21                                            FROM MRTLOOKUP.PERIOD_DATES PERIOD_DATES) e
 22                                   where c.calendar_month_end_dt <=
 23                                         e.computation_dt) END_DT
 24                            from mrtlookup.calendar_date c
 25                           where c.calendar_date =
 26                                 (select min(ind.individual_create_dt)
 27                                    from mrtcustomer.individual ind
 28                                   where ind.original_source_cd = 'TEL')) y
 29                   where y.BLK_OPEN_START_DT <= prp.open_dt
 30                     and y.END_DT >= prp.open_dt
 31                     and prp.division_ind is not null
 32                     and isg.individual_id = prp.individual_id
 33                     and isg.marketing_segment_cd = 'MAILABLE'
 34                   group by prp.individual_id) x
 35           where ind.primary_source_cd = 'GEC'
 36             and ind.individual_id = x.IND_ID) z
 37   where c.calendar_date = z.BLK_OPEN_DT;

1143156 rows selected.



Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=204854 Card=37843
          Bytes=2194894)

   1    0   HASH JOIN (Cost=204854 Card=37843 Bytes=2194894)
   2    1     TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (TABLE) (Cost=68
          Card=14640 Bytes=278160)

   3    1     NESTED LOOPS (Cost=204785 Card=37843 Bytes=1475877)
   4    3       VIEW (Cost=166771 Card=37843 Bytes=832546)
   5    4         SORT (GROUP BY) (Cost=166771 Card=37843 Bytes=200567
          9)

   6    5           NESTED LOOPS (Cost=58475 Card=37843 Bytes=2005679)
   7    6             NESTED LOOPS (Cost=27419 Card=31019 Bytes=108566
          5)

   8    7               TABLE ACCESS (BY INDEX ROWID) OF 'CALENDAR_DAT
          E' (TABLE) (Cost=2 Card=1 Bytes=16)

   9    8                 INDEX (UNIQUE SCAN) OF 'PK_CD' (INDEX (UNIQU
          E)) (Cost=1 Card=1)

  10    9                   SORT (AGGREGATE)
  11   10                     PARTITION RANGE (ALL) (Cost=107721 Card=
          1302415 Bytes=15628980)

  12   11                       TABLE ACCESS (FULL) OF 'INDIVIDUAL' (T
          ABLE) (Cost=107721 Card=1302415 Bytes=15628980)

  13    7               TABLE ACCESS (FULL) OF 'PROPRIETARY_ACCOUNT' (
          TABLE) (Cost=27417 Card=31019 Bytes=589361)

  14   13                 SORT (AGGREGATE)
  15   14                   NESTED LOOPS (Cost=78 Card=732 Bytes=11712
          )

  16   15                     TABLE ACCESS (FULL) OF 'PERIOD_DATES' (T
          ABLE) (Cost=9 Card=1 Bytes=8)

  17   15                     TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (
          TABLE) (Cost=69 Card=732 Bytes=5856)

  18    6             PARTITION RANGE (ITERATOR) (Cost=1 Card=1 Bytes=
          18)

  19   18               INDEX (UNIQUE SCAN) OF 'PK_ISEG_A' (INDEX (UNI
          QUE)) (Cost=1 Card=1 Bytes=18)

  20    3       PARTITION RANGE (ITERATOR) (Cost=2 Card=1 Bytes=17)
  21   20         TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'INDIVIDUAL'
          (TABLE) (Cost=2 Card=1 Bytes=17)

  22   21           INDEX (UNIQUE SCAN) OF 'PK_IND' (INDEX (UNIQUE)) (
          Cost=1 Card=1)






Statistics
----------------------------------------------------------
        274  recursive calls
          4  db block gets
   14526033  consistent gets
    1299590  physical reads
        232  redo size
   38571751  bytes sent via SQL*Net to client
     838832  bytes received via SQL*Net from client
      76212  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
    1143156  rows processed
 

Tom Kyte
August 27, 2006 - 3:23 pm UTC

are the ESTIMATED cardinalities in the explain plan above "realistic" (I'd say "no" based on the returned rows and the estimated returned rows)

are the statistics "up to date"

Thanks

Anil P, August 29, 2006 - 9:20 am UTC

Hello Tom,
Yes, the statistics are up to date and the estimates looked realistic to me. It's a production db on Oracle 10g. Here's the latest explain plan output.

This query has been generated by a Cognos tool.

Can you tell me which areas need to be looked into?

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3257 Card=1 Bytes=
103)

1 0 FILTER
2 1 SORT (GROUP BY) (Cost=3257 Card=1 Bytes=103)
3 2 MERGE JOIN (CARTESIAN) (Cost=1344 Card=445 Bytes=45835
)

4 3 NESTED LOOPS (Cost=1281 Card=1 Bytes=77)
5 4 NESTED LOOPS (Cost=1280 Card=1 Bytes=59)
6 5 NESTED LOOPS (Cost=369 Card=455 Bytes=15925)
7 6 TABLE ACCESS (BY INDEX ROWID) OF 'CALENDAR_DAT
E' (TABLE) (Cost=2 Card=1 Bytes=16)

8 7 INDEX (UNIQUE SCAN) OF 'PK_CD' (INDEX (UNIQU
E)) (Cost=1 Card=1)

9 8 SORT (AGGREGATE)
10 9 TABLE ACCESS (FULL) OF 'INDIVIDUAL' (TAB
LE) (Cost=1847 Card=81250 Bytes=975000)

11 6 TABLE ACCESS (FULL) OF 'PROPRIETARY_ACCOUNT' (
TABLE) (Cost=367 Card=455 Bytes=8645)

12 11 SORT (AGGREGATE)
13 12 NESTED LOOPS (Cost=66 Card=740 Bytes=11840
)

14 13 TABLE ACCESS (FULL) OF 'PERIOD_DATES' (T
ABLE) (Cost=2 Card=1 Bytes=8)

15 13 TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (
TABLE) (Cost=64 Card=740 Bytes=5920)

16 5 TABLE ACCESS (BY INDEX ROWID) OF 'INDIVIDUAL' (T
ABLE) (Cost=2 Card=1 Bytes=24)

17 16 INDEX (UNIQUE SCAN) OF 'PK_IND' (INDEX (UNIQUE
)) (Cost=1 Card=1)

18 4 INDEX (UNIQUE SCAN) OF 'PK_ISEG' (INDEX (UNIQUE))
(Cost=1 Card=1 Bytes=18)

19 3 BUFFER (SORT) (Cost=1344 Card=14792 Bytes=384592)
20 19 TABLE ACCESS (FULL) OF 'CALENDAR_DATE' (TABLE) (Co
st=63 Card=14792 Bytes=384592)


Thanks
Anil

Tom Kyte
August 29, 2006 - 3:28 pm UTC

eh? this is totally different from above.

anyway - I will not be able to automagically fix this

Initial Load and statistics

A reader, September 08, 2006 - 8:51 am UTC

Hi Tom,

I am doing a massive initial load. During this initial load I am inserting into several tables and at the same time selecting from these tables. This is why I kept the indexes usable. In addition, when I am at 50% of my initial load I compute statistics using:

DBMS_STATS.GATHER_SCHEMA_STATS (shema_name, 30)

I wonder if you will advise me to do this or not?

I want to suppress this call to DBMS_STATS, but I want to be sure that this will improve the performance of my process.

Thanks for your advice

Tom Kyte
September 09, 2006 - 11:46 am UTC

depends on whether the queries you issue would benefit from this regather.

It is quite possible that you need not gather stats - but since YOU are doing the load (you know the stats already!!! you know the number of rows loaded, you know the average row width, you can guess at the number of blocks and so on) you could just call SET statistics and provide the correct values.
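Such a call might look like the following sketch (the table name and the numbers are placeholders for values you already know from your own load):

    begin
        dbms_stats.set_table_stats(
            ownname => user,
            tabname => 'T',        -- one of the loaded tables (hypothetical)
            numrows => 1000000,    -- rows you know you loaded
            numblks => 12000,      -- your estimate of blocks used
            avgrlen => 97 );       -- average row length in bytes
    end;
    /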

GATHER STALE

Neeraj Nagpal, September 15, 2006 - 2:49 pm UTC

Hi Tom,


I have a question for you regarding the GATHER STALE option of DBMS_STATS. Here is a quick little background about my database setup: I have 4 very large composite-partitioned tables -- with 50 partitions and 64 (hash) sub-partitions in each range partition. Each one of these 4 tables has on an average around 120 Million rows, with somewhere around 250 columns each.

Now, the data in these 50 partitions is pretty skewed in terms of the number of records in them, and each partition has its own peculiar update cycle -- some states are updated more (much more) than others, and some see more inserts than others. What I want to do, instead of indiscriminately issuing
"Exec dbms_stats.gather_schema_stats( user, options => 'gather stale' );", is get a listing of just the stale partitions within the tables and selectively analyze the partitions that need it most. Is there a system view that can tell me which particular partitions in a table are STALE?


Thanks so much for your help,
Neeraj



Tom Kyte
September 16, 2006 - 2:27 pm UTC

look at "LIST STALE" as the option

Please answer

Reader, September 21, 2006 - 3:50 am UTC

Sir,
Please clarify my doubt.

q1) Why is the CUSLED_KEY_1 index being used instead of the CUSLED_PK primary key index?
q2) Why is it filtering on ACCOUNT and AGE_PERIOD instead of using the PK?
q3) Or is whatever is happening correct?
But when I make the CUSLED_KEY_1 index unusable, it uses the CUSLED_PK index.


SQL> select index_name, substr(column_name,1,40) cname from all_ind_columns
  2  where table_name = 'CUSLED';

INDEX_NAME                     CNAME
------------------------------ ----------------------------------------
CUSLED_PK                      LEDGER
CUSLED_PK                      ACCOUNT
CUSLED_PK                      AGE_PERIOD
CUSLED_PK                      DOC_REF
CUSLED_KEY_1                   LEDGER
CUSLED_KEY_1                   DOC_REF
CUSLED_KEY_2                   LEDGER
CUSLED_KEY_2                   ACCOUNT_REF
CUSLED_KEY_3                   LEDGER
CUSLED_KEY_3                   CHEQUE_NO
CUSLED_ACCOUNT_FK              ACCOUNT

SQL> SELECT *
  2  FROM CUSLED
  3  WHERE CUSLED.LEDGER = 'UD'
  4  AND CUSLED.ACCOUNT = '0000070101'
  5  AND CUSLED.AGE_PERIOD = 'A510'
  6  AND CUSLED.DOC_REF = '30001013'
  7  AND CUSLED.REC_STA != 'X';

Execution Plan
----------------------------------------------------------
Plan hash value: 2871354732

--------------------------------------------------------------------------------
------------

| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)
| Time     |

--------------------------------------------------------------------------------
------------

|   0 | SELECT STATEMENT            |              |     1 |   405 |     2   (0)
| 00:00:01 |

|*  1 |  TABLE ACCESS BY INDEX ROWID| CUSLED       |     1 |   405 |     2   (0)
| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | CUSLED_KEY_1 |     3 |       |     1   (0)
| 00:00:01 |

--------------------------------------------------------------------------------
------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUSLED"."ACCOUNT"='0000070101' AND "CUSLED"."AGE_PERIOD"='A510' A
ND

              "CUSLED"."REC_STA"<>'X')
   2 - access("CUSLED"."LEDGER"='UD' AND "CUSLED"."DOC_REF"='30001013')



SQL> ALTER INDEX CUSLED_KEY_1 UNUSABLE;

Index altered.

SQL> SELECT *
  2  FROM CUSLED
  3  WHERE CUSLED.LEDGER = 'UD'
  4  AND CUSLED.ACCOUNT = '0000070101'
  5  AND CUSLED.AGE_PERIOD = 'A510'
  6  AND CUSLED.DOC_REF = '30001013'
  7  AND CUSLED.REC_STA != 'X';

Execution Plan
----------------------------------------------------------
Plan hash value: 4156143435

--------------------------------------------------------------------------------
---------

| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| T
ime     |

--------------------------------------------------------------------------------
---------

|   0 | SELECT STATEMENT            |           |     1 |   405 |     2   (0)| 0
0:00:01 |

|*  1 |  TABLE ACCESS BY INDEX ROWID| CUSLED    |     1 |   405 |     2   (0)| 0
0:00:01 |

|*  2 |   INDEX UNIQUE SCAN         | CUSLED_PK |     1 |       |     1   (0)| 0
0:00:01 |

--------------------------------------------------------------------------------
---------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CUSLED"."REC_STA"<>'X')
   2 - access("CUSLED"."LEDGER"='UD' AND "CUSLED"."ACCOUNT"='0000070101' AND
              "CUSLED"."AGE_PERIOD"='A510' AND "CUSLED"."DOC_REF"='30001013')


Thanks 

Tom Kyte
September 22, 2006 - 1:50 am UTC

the costs in this case were so close that they "tied" - either index was, in this case, equally appropriate.

Still awaiting your reply

Reader, September 22, 2006 - 12:29 am UTC

Still awaiting your reply

Tom Kyte
September 22, 2006 - 2:51 am UTC

gee, thanks for letting me know?

I was hungry last night and went out for food....

LIST STALE PARTITIONS

Neeraj Nagpal, September 26, 2006 - 2:35 pm UTC

Tom,

Thanks for your answer to my question titled "GATHER STALE", asked on September 15, 2006. Briefly searching through your website, I came across this procedure that lists the STALE tables -- changed since the last-analyzed date -- but I am more interested in finding out the STALE PARTITIONS rather than just the STALE tables. How can one display the names of the stale partitions using the following script?

declare
    l_objList dbms_stats.objectTab;
begin
    dbms_stats.gather_schema_stats
    ( ownname => USER,
      options => 'LIST STALE',
      objlist => l_objList );
    for i in 1 .. l_objList.count
    loop
        dbms_output.put_line( l_objList(i).objType );
        dbms_output.put_line( l_objList(i).objName );
    end loop;
end;
/





Appreciate Your Help,
Neeraj

Tom Kyte
September 26, 2006 - 5:03 pm UTC

Ok, my help:

when you glanced at the documentation to see what else might be available in the dbms_stats.ObjectTab type - what did you see?

</code> http://docs.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_stats.htm#sthref7889 <code>

I saw this:

TYPE ObjectElem IS RECORD (
ownname VARCHAR2(30), -- owner
objtype VARCHAR2(6), -- 'TABLE' or 'INDEX'
objname VARCHAR2(30), -- table/index
partname VARCHAR2(30), -- partition
subpartname VARCHAR2(30), -- subpartition
confidence NUMBER); -- not used
type ObjectTab is TABLE of ObjectElem;

which would indicate that other attributes of use exist there :)
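So the LIST STALE loop shown earlier only needs to print the extra fields - a sketch:

    declare
        l_objList dbms_stats.objectTab;
    begin
        dbms_stats.gather_schema_stats
        ( ownname => USER,
          options => 'LIST STALE',
          objlist => l_objList );
        for i in 1 .. l_objList.count
        loop
            dbms_output.put_line
            ( l_objList(i).objName || ' ' ||
              nvl( l_objList(i).partName, '(no partition)' ) || ' ' ||
              nvl( l_objList(i).subpartName, '-' ) );
        end loop;
    end;
    /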



Thanks Tom

Neeraj Nagpal, September 26, 2006 - 6:31 pm UTC

Thanks Tom, Yes ..it was my shortsighted search on the topic!

Neeraj Nagpal

analyze vs. dbms_stats

Rob, October 10, 2006 - 4:12 pm UTC

Tom,

Very useful - Thank You.

I have seen in your examples the use of:
analyze table t compute statistics for table for all indexes for all indexed columns

but then you'll say something like
In real life, however, one would want to use DBMS_STATS instead.

Could you please supply the equivalent statement using dbms_stats ?

Thanks

Tom Kyte
October 10, 2006 - 8:21 pm UTC

those are all old examples, many of which pre-date the existence of dbms_stats.

dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all indexed columns', cascade => true );

would be the "same" as that particular analyze.

DBMS_STATS - Locking

A reader, November 02, 2006 - 7:09 pm UTC

Tom,

Somewhere up there in this thread, you said “certain variations of analyze can lock (to validate structures and such), but dbms_stats just runs queries basically." 

However while running the following command, 

        DBMS_STATS.gather_table_stats (
            ownname          => 'ADBPROD',
            tabname          => 'OP_LAYER',
            estimate_percent => dbms_stats.auto_sample_size,
            partname         => 'CALIFORNIA',
            granularity      => 'PARTITION');

I noticed intermittently that the table 'OP_LAYER' was being locked by the Gather_Table_Stats command, and a few users, who wanted to acquire locks on this table, had to wait.

See the screen dump.

(Pardon the bad text formatting here …)

SQL> Select a.username, b.owner||'.'||b.name,b.type, b.mode_held,b.mode_requested 
         from v$session a, dba_ddl_locks b
         where a.sid=b.session_id
        and  b.mode_held = 'None' and b.mode_requested <> 'None'
11:29:19   5  /  

USERNAME                       B.OWNER||'.'||B.NAME
------------------------------ -------------------------------------------------------------
TYPE                                     MODE_HELD MODE_REQU
---------------------------------------- --------- ---------
ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share   (<- Share lock requested on the table)

ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share   (<- Share lock requested on the table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share   (<- Share lock requested on the table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share   (<- Share lock requested on the table)


ADBPROD                        ADBPROD.OPINION_LAYER
Table/Procedure/Type                     None      Share   (<- Share lock requested on the table)




11:37:24 ADBPROD@PROD SQL> Select a.username, b.owner||'.'||b.name,b.type, b.mode_held,b.mode_reques
ted
12:02:21   2  from v$session a, dba_ddl_locks b
12:02:21   3  where a.sid=b.session_id
12:02:21   4  and  b.mode_held = 'None' and b.mode_requested <> 'None'
12:02:21   5  /

no rows selected  <-NO LOCKS ON THE TABLE, AFTER A SHORT INTERVAL 

Could you please tell me: is this one of those "certain variations" of DBMS_STATS that can lock the table being analyzed?





 

Tom Kyte
November 02, 2006 - 9:07 pm UTC

would have been useful to see the blocker and their lock mode?

Blockers

A reader, November 03, 2006 - 1:58 pm UTC

Well I did check the blockers/waiters using the query below
, but got back -- no rows returned --

SELECT sid,
       DECODE( block, 0, 'NO', 'YES' ) blocker,
       DECODE( request, 0, 'NO', 'YES' ) waiter
  FROM v$lock
 WHERE request > 0 OR block > 0
 ORDER BY block DESC;

Although I didn't see any blockers OR waiters in the database at that time, users were still getting briefly blocked (for 2-3 minutes, then released) on the table OP_LAYER. Not sure why. And I know that the only process accessing the table 'OP_LAYER' was the DBMS_STATS job.




Tom Kyte
November 04, 2006 - 12:08 pm UTC

v$sesstat
v$session_event

would be useful if you see it again.

Can a commit after each table's analyze make a difference?

Maulesh Jani, November 21, 2006 - 12:08 pm UTC

HI TOM,
We are analyzing the schema in the following way:

for rec_table_name in (
    select table_name from user_tables
    minus
    select table_name from user_external_tables
) loop
    dbms_stats.gather_table_stats( 'XXX', rec_table_name.table_name,
                                   estimate_percent => 40, degree => 4, cascade => TRUE );
end loop;

Now, according to you, if I put a commit after each loop iteration, can it reduce the overhead on the UNDO/REDO/TEMP tablespaces, or is that unrelated?


Tom Kyte
November 22, 2006 - 3:47 pm UTC

according to me, it would be a waste of time, since dbms_stats already commits.

ops$tkyte%ORA10GR2> create table t1 ( x int );

Table created.

ops$tkyte%ORA10GR2> create table t2 ( x int );

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t1 values ( 1 );

1 row created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> rollback;

Rollback complete.

ops$tkyte%ORA10GR2> select * from t1;

         X
----------
         1



and there is no such thing as "overhead on redo/undo/temp" - I am not familiar with what you might be thinking.

Can you please help me on this ..

Maulesh Jani, November 22, 2006 - 12:19 am UTC

Hi Tom ,
I need your help on this. As I mentioned in my question above, we have had a schema statistics gathering process in our production system for 3 years. Now this process suddenly fails with the error below:
Error While Analyzing Schema tables :

declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 10 with name "_SYSSMU10$"
too small
ORA-06512: at line 12

Can you please tell me whether a large amount of redo is generated during stats gathering? I have not been able to find any convincing documentation on it. And the error is a typical snapshot-too-old error; how can it occur during statistics gathering?

No other process is executing in parallel with this one.


Regards
Maulesh

Tom Kyte
November 24, 2006 - 12:29 pm UTC

that has nothing to do with redo

everything to do with undo.


snapshot too old happens when you have a query (and that is what dbms_stats does, runs queries) that is in need of undo that was overwritten.

increase your undo retention and/or review v$undostat to ensure you are not expiring undo prematurely.
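A sketch of that review (columns exist in 9i and up): compare the longest query seen in each ten-minute interval against your undo_retention setting, and check whether unexpired undo is being stolen.

select begin_time,
       end_time,
       maxquerylen,   -- longest-running query (seconds) in the interval
       ssolderrcnt,   -- ORA-01555 errors raised in the interval
       unxpstealcnt   -- unexpired undo blocks stolen - a warning sign
  from v$undostat
 order by begin_time;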

Mayday

Apurva, November 30, 2006 - 1:46 pm UTC

We are handling massive volumes of data in our warehouse. Most of the fact tables will be appended with more than 200 million rows each month, and in the steady state the facts will have close to 2.5 billion rows.

The end-users of our tables are some analysts -- so to give these guys a low downtime (or, sleepless nights - speaking dysphemistically), we came up with the following strategy:

* We'll have two sets of tables
-> Temporary tables - in which we will load each month's data; these tables will not be partitioned
-> Final tables - each month we'll append the Temporary tables' content to these tables through partition-exchange; Final tables will be range-partitioned on Month_ID

* As indexing is a time-intensive process -- we will build all the indexes in the world on the set of Temporary tables and have 'including indexes' clause in the partition-exchange statement

Now, here's where I am stumped (assuming you have played cricket)
* As my tables are pretty big (though their sheer size makes them ugly), and they will be used by analysts -- it's almost mandatory to gather statistics for these tables. Unfortunately, there is no clause like 'including statistics' in the partition-exchange statement -- by which I could gather statistics for the Temporary table and carry the stats over to the Final table (it doesn't make much sense either). So, the only option I am left with is to dump data in the Final table and analyze the entire Final table for statistics -- but with 2.5 billion rows it'll be eons before it gets analyzed. Or is there a way of escaping this, which'll help me realize my sadistic objectives?

Thanks Tom.


* We load each month's data in a Temporary table
* We build all the indexes we want on the temporary table
* Partition-exchange Temporary table with the Final table

Tom Kyte
November 30, 2006 - 3:32 pm UTC

from above:

...
you can import/export/set statistics directly with dbms_stats
.....
                      ^^^


the partition stats will go, you'll just roll up the global stats you want to roll up - using your knowledge of the data



ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int not null,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /

Table created.

ops$tkyte%ORA10GR2> insert into t select to_date( '12-mar-2003','dd-mon-yyyy')+mod(rownum,3), user_id, username from all_users;

31 rows created.

ops$tkyte%ORA10GR2> create index t_idx on t(x) local;

Index created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
        31

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name =
  2  'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  10
PART2                                  11

ops$tkyte%ORA10GR2> select num_rows from user_indexes where table_name = 'T';

  NUM_ROWS
----------
        31

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_ind_partitions where index_name =
  2  'T_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  10
PART2                                  11

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table temp (dt date, x int not null, y varchar2(30) );

Table created.

ops$tkyte%ORA10GR2> insert into temp
  2  select to_date( '12-mar-2003'), user_id, username from all_users;

31 rows created.

ops$tkyte%ORA10GR2> create index temp_idx on temp(x);

Index created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'TEMP' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table t
  2  exchange partition part1
  3  with table temp
  4  including indexes
  5  without validation
  6  /

Table altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> begin
  2          for x in (select sum(num_rows) nrows from user_tab_partitions where table_name = 'T')
  3          loop
  4                  dbms_stats.set_table_stats( user, 'T', numrows => x.nrows );
  5                  dbms_stats.set_index_stats( user, 'T_IDX', numrows => x.nrows );
  6          end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
        52

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  31
PART2                                  11

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select num_rows from user_indexes where table_name = 'T';

  NUM_ROWS
----------
        52

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_ind_partitions where index_name =
  2  'T_IDX';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                   10
PART1                                  31
PART2                                  11
 

Thanks gazillion

Apurva, December 01, 2006 - 1:15 am UTC


Local vs. (Global + Local) statistics

Apurva, December 06, 2006 - 6:00 am UTC

Tom,

Consider this scenario:

My table has 12 partitions, with ~200 million rows in each partition. I have statistics gathered for each individual partition, but don't have the Global statistics for the table.

Can I expect to achieve a significant increase in query performance with Global indexes gathered?

Or, gathering Global stats will only be too consuming without much improvement in query response (as I already have stats for individual partitions)?

I understand this question is a bit abstract, and that I could test it myself -- but I don't have enough data or resources to test it.

Please share your invaluable views.
Thanks

Tom Kyte
December 07, 2006 - 8:33 am UTC

...
Can I expect to achieve a significant increase in query performance with Global 
indexes gathered? 

....

one of three things can be expected:

a) increase in performance
b) decrease in performance (unlikely, but possible)
c) no change at all in performance

in other words - IT DEPENDS.

if your queries all include the partition key in such a way that the optimizer knows "only one partition will be hit", local statistics will be used to optimize that query.

global statistics may never be used (meaning C is likely)

on the other hand, if you run queries that do not use partition elimination down to a single partition - then global statistics will be used...


Now, are you sure you don't have SOME global stats :)


ops$tkyte%ORA10GR2> CREATE TABLE t
  2  (
  3    dt  date,
  4    x   int,
  5    y   varchar2(30)
  6  )
  7  PARTITION BY RANGE (dt)
  8  (
  9    PARTITION part1 VALUES LESS THAN (to_date('13-mar-2003','dd-mon-yyyy')) ,
 10    PARTITION part2 VALUES LESS THAN (to_date('14-mar-2003','dd-mon-yyyy')) ,
 11    PARTITION junk VALUES LESS THAN (MAXVALUE)
 12  )
 13  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> insert into t
  2  select to_date( '12-mar-2003','dd-mon-yyyy')+mod(rownum,3), object_id, object_name
  3    from all_objects;

50122 rows created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'PART1', granularity => 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------


ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'PART2', granularity => 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------


ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', partname => 'JUNK', granularity => 'PARTITION' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     50122

ops$tkyte%ORA10GR2> select partition_name, num_rows from user_tab_partitions where table_name = 'T';

PARTITION_NAME                   NUM_ROWS
------------------------------ ----------
JUNK                                16707
PART1                               16707
PART2                               16708

 

Thanks Tom!

Apurva, December 07, 2006 - 9:33 am UTC


Off-the-topic

Apurva, December 07, 2006 - 4:15 pm UTC


"What gets us into trouble is not what we don't know;
it's what we know for sure that just ain't so."

-- Twain

sys_op_countchg - Usage

Arindam Mukherjee, January 04, 2007 - 4:27 am UTC

Sir,

Today I was reading a chapter "The Clustering Factor" written by Mr. Jonathan Lewis in his new book (Chapter - 5) and have found one function "sys_op_countchg ()". I have found this explanation rather opaque due to my poor knowledge on Oracle. Could you kindly explain its usage with your inimitable style for us?
Why and How to use this function in live or production database Oracle 9i and 10g?

Tom Kyte
January 05, 2007 - 9:03 am UTC

you do not, it is not documented, do not use it.

Last_analyzed shows the incorrect date

Maulesh, January 22, 2007 - 7:19 am UTC

Hi TOM,
Thanks for the support and for sharing your knowledge about Oracle. We have implemented a scheduled job for statistics gathering, for which we are using the ANALYZE statement.
But we have now noticed that LAST_ANALYZED is not getting updated daily; it just remains as it is. Can you please help me with this?

Thanks

system stats need updating?

steve, February 21, 2007 - 11:53 am UTC

Oracle 10G
Solaris 9


Regarding the "out of the box" gathering of system statistics, I'm wondering why this query is so slow. It takes well over 1 minute to run and there aren't that many connections/processes (< 500).

select p.username pu,
s.username su,
s.status stat,
s.sid ssid,
s.serial# sser,
lpad(p.spid,7) spid,
to_char(logon_time,'YYYY-MM-DD HH24:MI:SS') logon_time,
substr(sa.sql_text,1,540) txt
from v$process p, v$session s, v$sqlarea sa
where p.addr=s.paddr
and s.username is not null
and s.sql_address=sa.address(+)
and s.sql_hash_value=sa.hash_value(+)
order by 1,2,7



The explain plan is:
====================

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1091464685

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 738 | 100 (4)| 00:00:02 |
| 1 | SORT ORDER BY | | 1 | 738 | 100 (4)| 00:00:02 |
|* 2 | HASH JOIN OUTER | | 1 | 738 | 99 (4)| 00:00:02 |
| 3 | NESTED LOOPS | | 1 | 204 | 73 (2)| 00:00:01 |
|* 4 | HASH JOIN | | 1 | 191 | 49 (3)| 00:00:01 |
|* 5 | FIXED TABLE FULL | X$KSUSE | 1 | 142 | 24 (0)| 00:00:01 |
|* 6 | FIXED TABLE FULL | X$KSUPR | 1 | 49 | 24 (0)| 00:00:01 |
|* 7 | FIXED TABLE FIXED INDEX| X$KSLED (ind:2) | 1 | 13 | 24 (0)| 00:00:01 |
|* 8 | VIEW | GV$SQLAREA | 100 | 53400 | 25 (4)| 00:00:01 |
|* 9 | FILTER | | | | | |
| 10 | SORT GROUP BY | | 100 | 72300 | 25 (4)| 00:00:01 |
| 11 | FIXED TABLE FULL | X$KGLOB | 100 | 72300 | 24 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("S"."KSUSESQL"="ADDRESS"(+) AND "S"."KSUSESQH"="HASH_VALUE"(+))
4 - access("ADDR"="S"."KSUSEPRO")
5 - filter("S"."KSUUDLNA" IS NOT NULL AND "S"."INST_ID"=:B1 AND
BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0)
6 - filter("INST_ID"=:B1 AND BITAND("KSSPAFLG",1)<>0)
7 - filter("S"."KSUSEOPC"="E"."INDX")
8 - filter("INST_ID"(+)=:B1)
9 - filter(SUM(DECODE("KGLOBT09",0,"KGLOBT02",0))<>0)

30 rows selected.

SQL> select count(*) from X$KSUSE;

COUNT(*)
----------
1200

SQL> select count(*) from X$KSUPR;

COUNT(*)
----------
700

SQL> select count(*) from X$KSLED;

COUNT(*)
----------
812


The last count(*) took a long time!
====================================

SQL> set timing on
SQL> select count(*) from X$KGLOB;

COUNT(*)
----------
933900

Elapsed: 00:01:08.83

Tom Kyte
February 21, 2007 - 12:27 pm UTC

you might want up to date stats on the fixed tables, yes.
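For what it's worth, in 10g statistics on the X$ fixed tables can be gathered with a dedicated call (a sketch; run it once while a representative workload is active, since fixed-table content reflects the current instance state):

begin
    -- gathers stats on the dynamic performance (X$) tables
    dbms_stats.gather_fixed_objects_stats;
end;
/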

DBMS_STAT

AD, February 21, 2007 - 1:56 pm UTC

Hi Tom,

Some of my team members think that we should have a wrapper (package) that calls DBMS_STATS to gather statistics, instead of using DBMS_STATS directly. What is your opinion of this approach?


Regards

Tom Kyte
February 21, 2007 - 3:12 pm UTC

depends on the goal of the wrapper package

if it simply calls dbms_stats - call for call - it would not be "smart"

if it does something "intelligent" (and you have to define intelligent in your context), then perhaps - but it would not really be a wrapper package at that point, but a set of procedural logic that gathers statistics on your system in the manner you find best.

SEan, February 25, 2007 - 8:59 am UTC

Tom,
I used dbms_stats.gather_database_stats as:
begin sys.DBMS_STATS.GATHER_DATABASE_STATS (); end;

and found out some of the schemas were not gathered.

Does it gather all the schemas in the database, as it is supposed to?

TIA


Tom Kyte
February 26, 2007 - 1:24 pm UTC

umm,

versions?
what schemas?

got example?

Sean, February 27, 2007 - 10:21 am UTC

It's 9.2.0.7. The job completed via dbms_job, and I also checked that there are no errors in alert.log. The job has run a few times since Feb 25.

These schemas were not completed:
SQL> select owner, count(*) not_c from dba_tables where last_analyzed <'24-FEB-07' group by owner;

OWNER NOT_C
------------------------------ ---------
BIM 61
BIS 102
ENI 46
FII 76
GL 1
HRI 53
ISC 10
JTF 6
OPI 24
OSM 1
POA 36
PORTAL30_DEMO 1
SYS 316

13 rows selected.

The following t_count is the total number of tables owned by each schema.

SQL> select b.owner, count(*) t_count from dba_tables b where b.owner in (select a.owner from dba_ta
bles a
2 where a.last_analyzed <'24-FEB-07') group by b.owner;

OWNER T_COUNT
------------------------------ ---------
BIM 261
BIS 460
ENI 140
FII 278
GL 176
HRI 185
ISC 37
JTF 403
OPI 122
OSM 198
POA 138
PORTAL30_DEMO 9
SYS 350

13 rows selected.

SQL>

Tom Kyte
February 27, 2007 - 11:03 am UTC

so, lets see the JOB.

Sean, February 27, 2007 - 1:16 pm UTC

those are not LOBs. Not sure why dbms_stats.gather_database_stats does not pick those up.
analyze table does work for them...

SQL> select b.owner, b.object_name, b.object_type from
2 dba_objects b, dba_tables a
3 where a.owner=b.owner
4 and b.object_name=a.table_name
5 and a.last_analyzed <'24-FEB-07' and a.owner='SYS'
6 and rownum<3
7 /

OWNER OBJECT_NAME OBJECT_T
-------- ------------------------- --------
SYS FILE$ TABLE
SYS BOOTSTRAP$ TABLE

SQL> desc BOOTSTRAP$
ERROR:
ORA-04043: object BOOTSTRAP$ does not exist


SQL> desc sys.BOOTSTRAP$
Name Null? Type
----------------------------------------------------- -------- -----------------------
LINE# NOT NULL NUMBER
OBJ# NOT NULL NUMBER
SQL_TEXT NOT NULL VARCHAR2(4000)

SQL> select last_analyzed from dba_tables where table_name='BOOTSTRAP$';

LAST_ANAL
---------
07-OCT-06

SQL> analyze table sys.BOOTSTRAP$ compute statistics;

Table analyzed.

SQL> select last_analyzed from dba_tables where table_name='BOOTSTRAP$';

LAST_ANAL
---------
27-FEB-07

SQL>

SQL> select job
2 , log_user
3 , priv_user
4 , schema_user
5 , last_date
6 , last_sec
7 , this_date
8 , this_sec
9 , next_date
10 , next_sec
11 , total_time
12 , broken
13 , interval
14 , failures
15 , what
16 from sys.dba_jobs where log_user='SYS'
17 /

LOG PRIV SCHEMA
JOB USER USER USER LAST_DATE LAST_SEC THIS_DATE THIS_SEC
------- -------- -------- -------- --------- -------- --------- --------
TOTAL
NEXT_DATE NEXT_SEC TIME B INTERVAL FAIL WHAT
--------- -------- ------ - -------------------- ---- --------------------

40704 SYS SYS SYS 26-FEB-07 19:33:42
04-MAR-07 19:33:42 41484 N sysdate + 6 0 begin sys.DBMS_STATS
.GATHER_DATABASE_STA
TS (); end;

Tom Kyte
February 27, 2007 - 2:24 pm UTC

LOBS? I said "job", just wanted to see the parameters.

were those tables EVER analyzed?

Sean, February 27, 2007 - 2:50 pm UTC

use portal30_demo as example:
SQL> select table_name ,last_analyzed from dba_tables where owner='PORTAL30_DEMO';

TABLE_NAME LAST_ANAL
------------------------------ ---------
EMP 27-FEB-07
DEPT 27-FEB-07
MLOG$_EMP 14-NOV-04
RUPD$_EMP
EMP_SNAPSHOT 27-FEB-07
PEOPLE_INFO$ 27-FEB-07
TASK_CATEGORY$ 27-FEB-07
TASK_DEVELOPER$ 27-FEB-07
TASK$ 27-FEB-07

there are two tables were not picked by dbms_stats.gather_database_stats.
Tom Kyte
February 27, 2007 - 3:39 pm UTC

system generated things, so - got version?

older releases didn't really want statistics on some things.

it's 9.2.0.7

Sean, February 27, 2007 - 5:53 pm UTC


Tom Kyte
February 28, 2007 - 3:39 pm UTC

gather database stats skips those two particular segments.

dynamic sampling

Amit, March 01, 2007 - 12:38 pm UTC

Hi Tom,

Thanks for sharing your vast experience and knowledge with us.

From the discussions about dynamic sampling above, I gathered the following:
1) In 9i we have to collect the stats ourselves,
2) In 10g, Oracle calculates the stats for us (when we first fire the query, during its hard parse)

Assuming point 2 above is right, I tried a test in 10g R2.
But what I have noticed is that dynamic sampling does not kick in automatically, as you can see in Area 1 below,
and other issues in area 2 and area 3

10g Rel 2
=========================================================
AREA 0)
SQL> create table temp
2 (a number,
3 b number);

Table created.

SQL> declare
2 begin
3 for i in 1..100000
4 loop
5 insert into temp values(i,i+1);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.
============================================================
AREA 1)

SQL> explain plan for select a from temp;
Explained.

SQL> select * from table( dbms_xplan.display );
PLAN_TABLE_OUTPUT
-----------------------------------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| TEMP |
----------------------------------
Note
-----
- 'PLAN_TABLE' is old version
- rule based optimizer used (consider using cbo)
12 rows selected.
===========================================================
So above it is not using dynamic sampling automatically, because it is using the RBO and not the CBO.

But if I hint it to use dynamic sampling (as follows), then it shifts to the CBO and the row estimate is nearly right:
=======================================
AREA 2)
SQL> explain plan for select /*+ dynamic_sampling (t 10) */ a from temp;

Explained.

SQL> select * from table( dbms_xplan.display );

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------

----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 83932 | 1065K| 57 |
| 1 | TABLE ACCESS FULL| TEMP | 83932 | 1065K| 57 |
----------------------------------------------------------

Note
-----
- 'PLAN_TABLE' is old version

11 rows selected.
========================================================
Now dynamic sampling was done in the previous step, so the query should be hard parsed and its plan ready in the shared pool. This same plan should be used if I fire a syntactically identical query again, but it goes back to the RBO (as follows):
===============================================
AREA 3)
SQL> explain plan for select a from temp;

Explained.

SQL> select * from table( dbms_xplan.display );

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

----------------------------------
| Id | Operation | Name |
----------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| TEMP |
----------------------------------

Note
-----
- 'PLAN_TABLE' is old version
- rule based optimizer used (consider using cbo)

12 rows selected.
================================================

Tom Kyte
March 02, 2007 - 12:48 pm UTC

1) you have dynamic sampling in 9i as well - it is just defaulting to a "lower level" than 10g did.



the RBO doesn't sample because the rbo would never use the information. Do not use the RBO in 10g.

Tables with stale stats ignored by dbms_stats

A reader, March 01, 2007 - 4:35 pm UTC

Tom,
This is 10gr2. We use the following command to gather statistics:
DBMS_STATS.GATHER_SCHEMA_STATS(ownname=>'ABC',GRANULARITY=>'ALL',CASCADE
=>TRUE,METHOD_OPT=>'FOR ALL COLUMNS size 1',DEGREE=>DBMS_STATS.AUTO_DEGREE);

The tables are in monitoring mode. We were told by Oracle Support that the above method will ignore tables with stale statistics. Here is a quote from the analyst:
=============
It is because some of the objects become stale and are ignored while gathering schema statistics. It is better to use
options=>'GATHER STALE' in the procedure.
=============

I always thought that if I do not mention any option, it will gather statistics for all tables in the schema, stale or not.

Thanks
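For reference, the documented default for the options parameter of gather_schema_stats is 'GATHER', which processes all objects regardless of staleness; 'GATHER STALE' restricts the run to objects whose monitored change counts mark them stale. A sketch making the default explicit (parameter values copied from the post above):

begin
    dbms_stats.gather_schema_stats
    ( ownname     => 'ABC',
      options     => 'GATHER',     -- the default: all objects, stale or not
      granularity => 'ALL',
      cascade     => true,
      method_opt  => 'FOR ALL COLUMNS SIZE 1',
      degree      => dbms_stats.auto_degree );
end;
/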

Analyze process is taking very long

Apurva, March 05, 2007 - 11:16 am UTC

Tom,

I am using the following piece of PL-SQL code to analyze a 20-column table which has ~650 million rows.

<SQL>
begin
dbms_stats.gather_table_stats(ownname => 'ops\$datamart', tabname => 'FCT_SALES', estimate_percent => 15 ,
granularity => 'ALL', cascade => true, force => true);
end;
/
</SQL>

The problem is that the analyze process is taking close to 21 hours, although we have a very good architecture.

What would be your take on this?

Thanks in anticipation


Tom Kyte
March 05, 2007 - 2:25 pm UTC

... although we have a very good architecture. ...

well, a pretty building won't make anything run faster (sorry, I could not resist - although we have a very good architecture... don't even know what to make of that!! :)

if the table is dense, you could try block sampling (do 15% of the IO to read).

You could also introduce "parallel" into it.

even if you do not use parallel on the table itself, you could run more than one gather against the table + indexes, instead of cascade...

or if you are on 10g, you might not even need cascade if you are re-creating the indexes after a load since the indexes collect stats during a create now.
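Putting those suggestions together, a sketch might look like this (the owner/table names are from the poster's example; the degree value is illustrative, not a recommendation):

begin
    dbms_stats.gather_table_stats
    ( ownname          => 'OPS$DATAMART',
      tabname          => 'FCT_SALES',
      estimate_percent => 15,
      block_sample     => true,   -- sample blocks, not rows: ~15% of the IO
      degree           => 8,      -- parallelize the gather itself
      granularity      => 'ALL',
      cascade          => false ); -- skip indexes; in 10g CREATE INDEX
end;                               -- computes index stats automatically
/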


but - you really don't give us much to work with. 20 columns and 650 million rows and a "very good architecture" plus $1 will get you a cup of coffee (not at starbucks of course).

you don't mention why you gather statistics (what do you do that causes this, a truncate and reload, a load of a new partition, "time" - what)

you don't tell us about the structure of this table

you don't tell us very much........

Table hung when dbms_stats

A reader, March 08, 2007 - 2:38 pm UTC

Hi Tom,

We have one partitioned table with 3,986,259 rows, and a script scheduled (twice a week) to gather stats on this table. Gathering stats on this table usually takes at most 10 minutes without any issues. But yesterday the job did not finish at all; we had to kill the process after 14 hours. While it was running, any query on this table hung; even a "desc" from SQL*Plus hung. As soon as the process was killed, all was normal. Because of this, any application trying to access this table froze, which brought down the whole system.


DBMS_STATS.gather_table_stats ('AUSER',
'ATABLE',
estimate_percent => 5,
CASCADE => TRUE
);

Do you know what could be the problem?

Many Thanks



Tom Kyte
March 08, 2007 - 3:34 pm UTC

please utilize support for this one.

A reader, March 08, 2007 - 2:56 pm UTC

Sorry forgot to mention that this Oracle 9.2.0.6 RAC on Sun Solaris

Thanks

A reader, March 12, 2007 - 4:22 am UTC

Thanks

Thanks, but...

Apurva, March 16, 2007 - 4:19 pm UTC

Tom,

Thanks a lot for all your help, and sorry for following-up a bit late: the system as well the building it's housed in have a wonderful architecture :-)

* 8 CPUs
* 32 GB RAM
* Oracle 10g R2 - both client and server run on Linux
* 1071, Fifth Avenue at 89th Street

Here are the details of how we load out fact table:
* The table is partitioned by MONTH (date format)
* In every refresh we first truncate the entire table
* Then using SQL loader we populate historical as well as current month's data [and in a given month we load about 200 million rows]
* Then we build 12 local bitmap indexes on 12 non-fact columns
* Then we would gather the statistics of the table using the following procedure:

<SQL>
begin
dbms_stats.gather_table_stats(ownname => 'ops\$datamart', tabname => 'FCT_SALES', estimate_percent => 15 ,
granularity => 'ALL', cascade => true, force => true);
end;
/
</SQL>

...but it would take close to a million years (~21 hours)

I came to you and as per your suggestion I removed the 'force => true' option, but Oracle is misbehaving and is still gathering statistics for the indexes?! such that the overall time taken is still ~20 hours. Moreover, it's the index analysis that is eating most of the time - we found that out of the 20-odd hours the entire process takes, Oracle spends close to 18 gathering stats for the indexes... and quite obviously that's not a good idea, as - as you remarked - in 10g index stats are gathered automatically when the index is built.

Seems it's an Oracle bug? What are your thoughts?

Thanks in anticipation



Tom Kyte
March 17, 2007 - 4:02 pm UTC

cascade is what affects indexes, oracle is not misbehaving, it is simply doing what you told it to.

cascade=>false, not force.

did you try ANY of the other things I mentioned??!!??
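For the record, a sketch of the gather call with index gathering switched off; the schema and table names are the poster's own, everything else as posted:

```sql
-- Sketch: gather table statistics only; cascade=>false skips the
-- indexes, whose stats were already computed when they were built (10g).
begin
   dbms_stats.gather_table_stats(
      ownname          => 'ops$datamart',
      tabname          => 'FCT_SALES',
      estimate_percent => 15,
      granularity      => 'ALL',
      cascade          => false );  -- do NOT re-analyze the 12 bitmap indexes
end;
/
```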


Apurva, March 17, 2007 - 11:09 pm UTC

that's what happens when you write your mails in anticipation of St. Patrick's day...

I simply removed 'cascade=>true' (and not 'force=>true', as I miswrote) and did not explicitly write 'cascade=>false', in the fond hope that the default value for the cascade parameter is 'false'

I tried your 'block sampling' suggestion and found that it did not substantially affect the analyze time - analyzing the indexes is the killer

Tom,

Thanks *A LOT* again.


it's working!!!!!!

Apurva, March 17, 2007 - 11:22 pm UTC


wrapper for DBMS_STATS

Chris Gould, March 20, 2007 - 7:10 am UTC

I noticed recently that if you use DBMS_STATS.GATHER_TABLE_STATS with CASCADE=>TRUE then it gathers stats for the table and each index in turn. This can take a long time, especially if there are a lot of indexes on the table (e.g. in a warehouse). One way to speed this up is to use DBMS_SCHEDULER to create and run jobs for each index on the table simultaneously. The Scheduler and the Resource Manager should ensure the database is not swamped and the jobs all complete as efficiently as possible. The following code illustrates the idea:
begin
  for r in (select ui.index_name
                  ,uip.partition_name
                  ,dbms_scheduler.generate_job_name('IXAZ_') as jobname
              from user_indexes ui
            inner join user_ind_partitions uip
             on (uip.index_name = ui.index_name)
            where ui.table_name = 'ASSET'
              and uip.partition_name = 'P200703'
              order by ui.index_name,uip.partition_name )
  loop
  
        execute immediate
'BEGIN
   DBMS_SCHEDULER.CREATE_JOB (
   job_name             => :jobname, 
   job_type             => :jobtype,
   job_action           => :jobaction,
   start_date           => NULL,
   repeat_interval      => NULL, 
   end_date             => NULL,
   enabled              => TRUE,
   auto_drop            => TRUE,
   comments             => :jobcomments);
END;'
using 
        r.jobname
      ,'PLSQL_BLOCK'
      ,'begin dbms_stats.gather_index_stats(
         ownname=>'''||user||'''
         ,indname=>'''||r.index_name||'''
         ,partname=>'''||r.partition_name||''''||');end;'
      ,'user='||user||',index='||r.index_name||',ptn='||r.partition_name;
      
end loop;

end;


The code above only considers the indexes for one partition of the table, but the total time to analyze all the indexes will be approximately the time taken for the largest index analyzed individually.

wrapper for DBMS_STATS - additional info

Chris Gould, March 21, 2007 - 5:28 am UTC

One suggestion I'd make regarding the DBMS_SCHEDULER wrapper to DBMS_STATS is not to specify a degree of parallelism > 1 when analyzing. Using a large degree of parallelism seems to confuse the resource manager - I guess because it thinks it's executing one job but this then spawns multiple sessions for the additional degrees of parallelism - and you can quickly exhaust the supply of available sessions.

Need Guidance on Partitioned tables

Vk, September 20, 2007 - 5:45 am UTC

Hi Tom,

We have huge partitioned tables that are loaded continually on a daily basis using range partitioning, and we maintain 30 days' worth of retention. On any given day, the data for the past 29 days is static, and we don't want to re-compute the partition-level statistics for those partitions again, as they have not become stale.

However, we drop the old partition and add a new partition on a daily basis - what is termed a rolling window.

Now please provide your suggestions:

1. We are doing gather_table_stats in a loop for the complete schema using

dbms_stats.gather_table_stats(ownname=> l_username,
tabname=> l_table_name,
estimate_percent => 5,
method_opt => l_method_opt,
degree => 16,
cascade => true);

Where l_method_opt is :
'for columns ' || p_col_name || ' SIZE 1'; for the partitioned tables and p_col_name is the column on which the data is partitioned.

However if the table is non-partitioned then
l_method_opt is :
'for all indexed columns SIZE 1';

Now this complete gathering of stats takes around 3 hrs and for us it is quite significant.

Can you please provide us suggestions as to how we can capture global and partition level statistics without recomputing them all over again and again.

Your advice/help is appreciated.
Tom Kyte
September 24, 2007 - 7:15 am UTC

why would you gather stats on the partition key only? that seems a little strange, what is the thought process behind that?


If you gather local stats on only the single new partition - we'll do our best to "guess" the global stats. In many cases - we'll get it right (eg: number of total rows). In other cases, we might not - eg: number of distinct values - if you have 30 partitions and X has 10 distinct values in each partition - you either have a) 10 distinct values of X, b) 300 distinct values of X, or c) between 11 and 299 distinct values of X - the only way to truly know would be to analyze everything OR have you use dbms_stats.set_xxxxx routines to tell us.
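A sketch of the set_xxxxx approach Tom mentions - reading the current column statistics and writing back a global distinct-value count you know to be true. Table T, column X, and the count 300 are hypothetical:

```sql
-- Sketch: overwrite the global distinct-value count for one column
-- instead of re-gathering everything. T, X and 300 are made up.
declare
   l_distcnt number;  l_density number;  l_nullcnt number;
   l_srec    dbms_stats.statrec;         l_avgclen number;
begin
   -- read the current column statistics ...
   dbms_stats.get_column_stats( user, 'T', 'X',
      distcnt => l_distcnt, density => l_density,
      nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen );
   -- ... and write them back with the distinct count we know to be right
   dbms_stats.set_column_stats( user, 'T', 'X',
      distcnt => 300, density => 1/300,
      nullcnt => l_nullcnt, srec => l_srec, avgclen => l_avgclen );
end;
/
```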

Thought Process of only gather stats on Partitioned Column

Vk, September 24, 2007 - 1:53 pm UTC

Hi Tom,

Since every query that comes to us has a WHERE clause supported by the PARTITIONED COLUMN, we thought we didn't need to gather stats on any other column, since this will drive the rows flowing from this step.

I am pretty okay with gathering stats only for the PARTITIONED COLUMN and not for all the columns of the table, as Gather Stats does by default. One reason is that the time it takes to gather stats is very high on a nearly 1 TB database.

I am also thinking of gathering partition-level stats only for the latest partition and not capturing global-level stats. If Oracle's CBO can guess it 90% correct, I am happy with it.

Thanks,
Vikas khanna
Tom Kyte
September 26, 2007 - 8:30 pm UTC

umm, that is flawed

if you only gather column stats on the partition column, what would the estimated card= value be for something like:

select * from t where x in (select Y from partitioned_table where key=val and SOME_COL='abc')

if we know nothing about some_col, we cannot get the card= value.

so, please - go into more detail as to why you did this? what facts did you use in coming to this conclusion?

the "partition_key=value" predicate is generally used for partition elimination. after that, we still - well - need statistics.
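If gathering histograms on every column is what makes the 1 TB run too slow, a hedged middle ground (assuming 10g) is to keep basic statistics on all columns and let Oracle decide which ones deserve histograms; the table name here is hypothetical:

```sql
-- Sketch: basic stats on every column, histograms only where Oracle's
-- column-usage tracking thinks they help. Table name is made up.
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'PARTITIONED_TABLE',
      method_opt => 'for all columns size auto' );
end;
/
```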

Analyze Vs DBMS_STATS

Karthick, October 17, 2007 - 12:40 am UTC

For a particular table I am able to use the ANALYZE command but not DBMS_STATS.

See the test case below.

Enter user-name: sysadm/sysadm@inlabtst

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select count(1) from HX_SSN_GEN_STG_TBL;

  COUNT(1)
----------
    162439

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL');
BEGIN DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "SYSADM"."HX_SSN_GEN_STG_TBL", insufficient
privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13046
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1


SQL> ANALYZE TABLE HX_SSN_GEN_STG_TBL COMPUTE STATISTICS;

Table analyzed.

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_NAMES_STG');

PL/SQL procedure successfully completed.

Just for that one table I am not able to use DBMS_STATS; for other tables I can. But I can use the ANALYZE command on that table. Any idea what could be the problem?

Found the answer

Karthick, October 17, 2007 - 12:53 am UTC

Sorry for posting the previous one, Tom. After some searching on orafaq I got the answer.

SQL>  select username, privilege from user_sys_privs;

USERNAME                       PRIVILEGE
------------------------------ ----------------------------------------
SYSADM                         UNLIMITED TABLESPACE
SYSADM                         CREATE TABLE
SYSADM                         CREATE SESSION
SYSADM                         ANALYZE ANY

SQL> GRANT SELECT ANY TABLE TO SYSADM;

Grant succeeded.

SQL> select username, privilege from user_sys_privs;

USERNAME                       PRIVILEGE
------------------------------ ----------------------------------------
SYSADM                         UNLIMITED TABLESPACE
SYSADM                         SELECT ANY TABLE
SYSADM                         CREATE TABLE
SYSADM                         CREATE SESSION
SYSADM                         ANALYZE ANY

SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'SYSADM', tabname => 'HX_SSN_GEN_STG_TBL');

PL/SQL procedure successfully completed.

About the first entry

Alvin, February 21, 2008 - 7:39 pm UTC

On a 10g box.

Will running an analyze table overwrite the statistics gathered by dbms_stats earlier ?

Dbms_stats ran at 10am
analyze table ran at 11am

Do both (analyze table and DBMS_STATS) store statistics in the same location? (Hence the earlier question about overwriting.)
Tom Kyte
February 21, 2008 - 10:43 pm UTC

statistics are stored in one place. so, sure they would.

Killing DBMS_STATS in the middle of a large table

Ricardo Masashi, March 16, 2008 - 10:16 am UTC

Hi, Tom,

I was wondering what happens if, for some reason, we kill a running DBMS_STATS job in the middle of gathering statistics on a large/huge table.

Would that table then contain stale (last-analyzed) stats, or no stats?

My concern is whether dbms_stats:
- deletes stats before collecting, or
- collects before overwriting.

Thanks!
Tom Kyte
March 24, 2008 - 8:26 am UTC

well, statistics are gathered and updated and gathered and updated - and then finally committed at the end - as a transaction.

So, dbms_stats works like DDL in that sense, all or nothing.

different execution plans when using ANALYZE versus DBMS_STATS

A reader, May 16, 2008 - 10:24 am UTC

hi tom,

i analyzed one table, once with ANALYZE and once with DBMS_STATS.GATHER_TABLE_STATS (commands below). i got different execution plans for the same query: good with ANALYZE, bad with DBMS_STATS.GATHER_TABLE_STATS. when comparing all the statistics i found them to be the same, with the exception of the histograms having different bucket sizes (see below).


analyze table gfm.SUBCONFIG_CONFIGURABLE_OBJ_T estimate statistics
for table
for all indexes
for all indexed columns
sample 40 percent;


exec dbms_stats.gather_table_stats ( -
  'GFM', 'SUBCONFIG_CONFIGURABLE_OBJ_T', estimate_percent=>40 -
 ,method_opt=>'for all indexed columns', cascade=>true);


SQL> select * from (
  2  select 'ANALYZE' meth, a.* from system.t_analyze_hist a
  3  UNION ALL
  4  select 'DBMSSTATS', s.* from system.t_stats_hist s
  5  ) where column_name = 'SUBCONFIG_ID'
  6  order by 1;

METH       OWNER      TABLE_NAME                     COLUMN_NAME          ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_A
---------- ---------- ------------------------------ -------------------- --------------- ----------
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                     782              0
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    1662              1
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    2546          19371
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    3423          19390
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    4315          19411
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    5191          19471
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    6075          19490
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    6943          19510
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    7826          19531
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    8720          19571
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    9636          19591
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   10531          19611
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   11414          19631
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   12301          19651
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   13184          19671
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   14056          19673
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   14930          19691
ANALYZE    GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   15816          19711
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    1526              0
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    3033              1
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    4396          19371
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    5745          19390
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    7119          19411
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    8486          19471
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                    9894          19490
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   11234          19510
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   12627          19531
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   13996          19571
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   15363          19591
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   16761          19611
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   18149          19631
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   19540          19651
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   20926          19671
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   22320          19673
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   23722          19691
DBMSSTATS  GFM        SUBCONFIG_CONFIGURABLE_OBJ_T   SUBCONFIG_ID                   25042          19711

36 rows selected.

Q1) could the bucket-size difference of the histograms be the reason for the different plan?
Q2) what should i do in the future? Go back to ANALYZE?

Tom Kyte
May 19, 2008 - 3:10 pm UTC

q1) yes
q2) ummm, no. your output is hard to read - not sure how many buckets each one generated, but use something that generates the same buckets

and realize that since you have histograms, you'll get different plans for different "inputs" over time. sometimes (since you do not have a bucket PER VALUE) dbms_stats will get the better plan than analyze did - just change your "predicate" and you'll see - you have an imperfect picture of your data here.
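If the goal is simply comparable histograms between runs, dbms_stats lets you pin the bucket count explicitly via method_opt - a sketch based on the poster's own command (the bucket count 75 is arbitrary; the maximum is 254):

```sql
-- Sketch: ask for a fixed number of histogram buckets so successive
-- gathers are comparable. The bucket count 75 is arbitrary.
exec dbms_stats.gather_table_stats( -
  'GFM', 'SUBCONFIG_CONFIGURABLE_OBJ_T', estimate_percent => 40, -
  method_opt => 'for all indexed columns size 75', cascade => true );
```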

different execution plans when using ANALYZE versus DBMS_STATS

A reader, May 20, 2008 - 10:30 am UTC

hi tom,

thanks for your answer. while doing further investigation on ANALYZE versus DBMS_STATS, i found that i get the same histograms with COMPUTE STATISTICS, but different ones with ESTIMATE STATISTICS SAMPLE x PERCENT.

i would now assume that the internal sampling is done differently by ANALYZE than by DBMS_STATS. correct?
Tom Kyte
May 20, 2008 - 11:45 am UTC

if you ran analyze over and over - you would discover IT generates different statistics.

they are random samples, if you randomly sample something over and over - you'll get different results....


how oracle will use the statistics after rename a table

Yuna, May 30, 2008 - 9:24 am UTC

Hi Tom,

We are going to delete tons of rows from a table. We will do the following:

1) create a temp table;
2) copy the good data from old table (A) to the temp table;
3) create index and trigger in the temp table;
4) rename the old table A to A_archive;
5) rename the temp table to A;
6) truncate the A_archive table;

The question is: after the temp table is renamed to A (the same name), will Oracle use the old statistics from the old table? In that case, do we need to analyze the new table right away?


Tom Kyte
May 30, 2008 - 3:07 pm UTC

the statistics are tracked by object ids, not the 'name'.


ops$tkyte%ORA10GR2> create table t1 as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T1' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> create table t2 as select * from t1 where owner = 'SCOTT';

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T2' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name in ( 'T1', 'T2' );

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                  49904
T2                                     16

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> alter table T1 rename to A_archive;

Table altered.

ops$tkyte%ORA10GR2> alter table T2 rename to T1;

Table altered.

ops$tkyte%ORA10GR2> truncate table A_archive;

Table truncated.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select table_name, num_rows from user_tables where table_name in ( 'T1', 'T2', 'A_ARCHIVE' );

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
A_ARCHIVE                           49904
T1                                     16


about the statisitcs

yuna, June 10, 2008 - 2:18 pm UTC

Thank you very much Tom!

I have the following questions.

1) After creating a table T1 and doing work for a while.

a: Without analyzing it: select table_name, num_rows from user_tables where table_name = 'T1';

No rows returned. But we can see the statistics from the Oracle Enterprise Manager console.

b: Analyze it: select table_name, num_rows from user_tables where table_name = 'T1';

A row will be returned.

Does this mean that after creating a table and never analyzing it manually, Oracle is still gathering the related statistics for it?

2) If I want to calculate the table size using user_extents, do I need to analyze the table to get the correct number? Are the statistics from analyze or dbms_stats only for execution plans?

Thank you again for your wisdom.

Best regards, Yuna

Tom Kyte
June 10, 2008 - 2:59 pm UTC

a: Without analyzing it: select table_name, num_rows from user_tables where table_name = 'T1';

No rows returned. But we can see the statistics from the Oracle Enterprise Manager console.



sorry, but that is not true, that returns a row as soon as the table is 'there', you have made a mistake somewhere.



2) user extents shows you allocated space, it is always accurate about what is allocated for the table.
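For example, the allocated size can be read straight from the dictionary with no analyze at all (T is a hypothetical table name):

```sql
-- Allocated space for a table; accurate regardless of statistics.
select sum(bytes)/1024/1024 as allocated_mb
  from user_extents
 where segment_name = 'T';
```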

still the statistics after creating that table (2)

Yuna, June 11, 2008 - 9:48 am UTC

Hi Tom,

Please see the following scripts about creating table without statistics in user_tables before analyzing.


SQL S3 >create table temptest2 as select * from JCL_SCHEDULE;

SQL S3 >select count(*) from temptest2;

COUNT(*)
----------
10

SQL S3 >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

NUM_ROWS
----------


SQL S3 >analyze table temptest2 compute statistics for table for all indexed columns for all
  2  indexes;

Table analyzed.

SQL S3 >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

NUM_ROWS
----------
10

Thank you very much for your comments.

Best regards
Tom Kyte
June 11, 2008 - 9:55 am UTC

yes, so, this is how it is supposed to work and you got a record EACH TIME - both times you got a record.

so, what is the question?


(do not use analyze to gather statistics, use dbms_stats)

please read it again

A reader, June 11, 2008 - 10:51 am UTC

Please read the process again. There is no record the first time I run:

SQL S3 >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';

NUM_ROWS
----------


Thank you!

Best regards, Yuna

Tom Kyte
June 11, 2008 - 11:19 am UTC

sure there is.

select ROWNUM, num_rows from user_tables where table_name = 'TEMPTEST2';


you cannot "see" null.


ops$tkyte%ORA9IR2> select null from dual where 1=0;

no rows selected

ops$tkyte%ORA9IR2> select null from dual where 1=1;

N
-


ops$tkyte%ORA9IR2>


see the difference.

no record ( or null record) returned

A reader, June 11, 2008 - 1:55 pm UTC

Hi Tom,

Thank you for the explanation. Maybe my question can be stated this way: why, before analyzing the table (which has 10 rows in it), does running "select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2';" return NULL (no visible value) in SQL*Plus but show a result in the DB console, while after analyzing the table the same query returns the real num_rows to both SQL*Plus and the DB console?

SQL S3 >create table temptest2 as select * from JCL_SCHEDULE;

SQL S3 >select count(*) from temptest2; ( this is the real number)

COUNT(*)
----------
10

SQL S3 >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2'; ( NULL returned before analyzing)

NUM_ROWS
----------


SQL S3 >analyze table temptest2 compute statistics for table for all indexed columns for all
  2  indexes;

Table analyzed.

SQL S3 >select NUM_ROWS from user_tables where TABLE_NAME ='TEMPTEST2'; ( the real number returned).

NUM_ROWS
----------
10


Thank you!
Best regards, Yuna


Tom Kyte
June 11, 2008 - 5:50 pm UTC

(please stop using analyze)

tell me the precise steps you undertake in dbconsole from the home page in order to see what you say you see.

analize table

A reader, June 12, 2008 - 9:01 am UTC

Thank you very much Tom for the answer.

I am sorry that I gave you wrong info about the DB console. I could not see the stats there either, until I used dbms_stats to gather them (exec dbms_stats.gather_table_stats(user, 'TEMPTEST2')). There was another long-existing table before; I could get info on it from the DB console but not from user_tables, so I assumed the same would hold for the newly created table.

The steps are: Schema -> Tables -> the specific table.

So I could not get the stats from either user_tables or the DB console before analyzing the newly created table.

Best regards, Yuna

privileges

A reader, June 12, 2008 - 11:47 am UTC

Hi Tom,

I'd like to know what specific privilege is required to analyze a table.

Thank you!

Best regards, Yuna

MV logs need gather stats ?

Megala, June 22, 2008 - 6:54 pm UTC

Tom:

Do materialized view log tables need to be analyzed or have stats gathered (dbms_stats)? Thanks
Tom Kyte
June 22, 2008 - 9:52 pm UTC

they could, sure - they are a table. In 10g, if they are queried without statistics, we will dynamically sample them at parse time.

so you probably do not need to do anything special for them; we'll sample them at parse time and, given how simply they are used, that is probably more than sufficient.

MV log table analyze and estimate_percent => null question

Megala, June 22, 2008 - 11:04 pm UTC

Thanks Tom.

In Oracle 9i, does the MV log table always need to be analyzed with current stats, given that after the MVs are refreshed the MV log is emptied anyway?

Will there be performance problems if MV log tables are not analyzed?

Thanks for your insight.


2) In Oracle 9.2.0.6, when the estimate_percent parameter is not specified, will Oracle gather 100% table stats?

Example:

exec dbms_stats.gather_table_stats('CUR_DBA2', tabname => 'CTBILLINGTABLE', degree => dbms_stats.default_degree, cascade =>
true ) ;

Tom Kyte
June 23, 2008 - 7:42 am UTC


1) probably not. In 9i, dynamic sampling is set to 1

"Sample tables that have not been analyzed if there is more than one table in the query, the table in question has not been analyzed and it has no indexes, and the optimizer determines that the query plan would be affected based on the size of this object."


Hence, if you are using the CBO and we access this table using the CBO in a multi-table query, and this mlog$ table is 'large' (allocated space is big), it'll sample it. for example:


ops$tkyte%ORA9IR2> create table t ( x int primary key, y int );

Table created.

ops$tkyte%ORA9IR2> insert into t select rownum, rownum from all_users;

35 rows created.

ops$tkyte%ORA9IR2> create materialized view log on t;

Materialized view log created.

ops$tkyte%ORA9IR2> select object_name, object_type from user_objects order by object_name;

OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------
MLOG$_T                        TABLE
RUPD$_T                        TABLE
SYS_C004615                    INDEX
T                              TABLE

ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> insert into mlog$_t
  2  select rownum, sysdate, 'x', 'x', utl_raw.cast_to_raw( rpad('x',255,'x') ) from all_objects;

30662 rows created.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2> delete from mlog$_t;

30662 rows deleted.

ops$tkyte%ORA9IR2> commit;

Commit complete.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> show parameter optimizer_dynamic_sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     1
ops$tkyte%ORA9IR2> set autotrace traceonly explain
ops$tkyte%ORA9IR2> select * from t, mlog$_t where t.x = mlog$_t.x;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=161)
   1    0   NESTED LOOPS (Cost=3 Card=1 Bytes=161)
   2    1     TABLE ACCESS (FULL) OF 'MLOG$_T' (Cost=2 Card=1 Bytes=155)
   3    1     TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=1 Card=1 Bytes=6)
   4    3       INDEX (UNIQUE SCAN) OF 'SYS_C004615' (UNIQUE)



ops$tkyte%ORA9IR2> alter session set optimizer_dynamic_sampling = 0;

Session altered.

ops$tkyte%ORA9IR2> select * from t, mlog$_t where t.x = mlog$_t.x;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=82 Bytes=13202)
   1    0   HASH JOIN (Cost=5 Card=82 Bytes=13202)
   2    1     TABLE ACCESS (FULL) OF 'T' (Cost=2 Card=35 Bytes=210)
   3    1     TABLE ACCESS (FULL) OF 'MLOG$_T' (Cost=2 Card=82 Bytes=12710)



ops$tkyte%ORA9IR2> set autotrace off



see how the plan differs with and without dynamic sampling at its default level...



2) documentation is actually quite good for that

http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003993

.. estimate_percent


Percentage of rows to estimate (NULL means compute) The valid range ..

estimate_percent => null

A reader, June 24, 2008 - 6:36 pm UTC

Tom,

I just need to clarify:

estimate_percent => null (default value) is equivalent to
estimate_percent=> 100 in dbms_stats.gather_table_stats ?

Thanks

Tom Kyte
June 24, 2008 - 7:48 pm UTC

what did the documentation say?


oh wait, we already covered that?!!?!?!

<quote from above>
2) documentation is actually quite good for that

http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96612/d_stats2.htm#1003993

.. estimate_percent
Percentage of rows to estimate (NULL means compute) The valid range .
</quote>


NULL means compute......

100% table analyze

A reader, June 24, 2008 - 11:40 pm UTC

Tom:

my goal is to analyze the table 100%.
Can you suggest: should I use the default "compute" or estimate_percent => 100 for any table?

Is there any difference between estimate(100) and compute in dbms_stats.gather_table_stats?

Under what circumstances should one use "compute" over estimate_percent => 100?
Tom Kyte
June 25, 2008 - 8:29 am UTC

they are the same, which pleases you more? null or 100?
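That is, these two calls should gather the same complete statistics (T is a hypothetical table name):

```sql
-- Both read every row; NULL is simply the documented spelling of "compute".
exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => null );
exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => 100 );
```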


Lock Schema or Table Stats

A reader, July 29, 2008 - 1:48 pm UTC

Which data dictionary view shows whether a table's or a schema's statistics are currently locked?

Tom Kyte
August 01, 2008 - 10:22 am UTC

ops$tkyte%ORA10GR2> select stattype_locked from user_tab_statistics where table_name = 'T';

STATT
-----


ops$tkyte%ORA10GR2> exec dbms_stats.lock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select stattype_locked from user_tab_statistics where table_name = 'T';

STATT
-----
ALL
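And to list every locked object in the schema at once, something like:

```sql
-- All tables (and partitions) in the current schema with locked stats.
select table_name, object_type, stattype_locked
  from user_tab_statistics
 where stattype_locked is not null;
```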


confusing stats

a reader, September 04, 2008 - 4:49 pm UTC

Hey Tom:

I created a table.

SQL> select count(*) from MYBITMAP;

COUNT(*)
----------
1270064

SQL> analyze table mybitmap estimate statistics;

Table analyzed.

Elapsed: 00:00:01.44
SQL> select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from user_tables where table_name='MYBITMAP';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS   PCT_FREE   PCT_USED  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
   1279153      31156          204         10                     0         172


SQL> exec dbms_stats.gather_table_stats(ownname => 'JOSH', tabname => 'MYBITMAP', estimate_percent => 10);

PL/SQL procedure successfully completed.


SQL> select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from user_tables where table_name='MYBITMAP';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS   PCT_FREE   PCT_USED  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
   1267080      31156          204         10                     0         169

So how come the number of rows differs between ANALYZE and the dbms_stats package, and neither matches the real number of rows in the table?


Thank you very much!



Tom Kyte
September 05, 2008 - 7:40 am UTC

because you used estimate on the analyze (you saw that right? the word "estimate")

and you used a 10% estimate with dbms_stats (estimate... and estimate)


and estimate is - a guesstimate, an approximation, a guess. Consider:

ops$tkyte%ORA10GR2> create table t as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent=> 10 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     49600

ops$tkyte%ORA10GR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent=> 10 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';

  NUM_ROWS
----------
     50730



just run dbms_stats and you'll get slightly different answers time after time. It is normal, expected, anticipated even.

Do not use analyze to gather statistics on segments. The optimizer is built to use the information dbms_stats gathers, and dbms_stats gathers things differently than analyze does. dbms_stats has been kept up to date over the years, whereas analyze has not, as far as statistics gathering goes.

confusing stats

A reader, September 04, 2008 - 5:17 pm UTC

Sorry, I get it now: it is because they are only estimating the stats; after I do "compute" the number of rows is correct. But why is the average row length from the package different from analyze's?

SQL> analyze table mybitmap compute statistics;

Table analyzed.

Elapsed: 00:02:28.66
SQL>  select NUM_ROWS, blocks, empty_blocks,PCT_FREE,PCT_USED,CHAIN_CNT,AVG_ROW_LEN from user_tables where table_name='MYBITMAP';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS   PCT_FREE   PCT_USED  CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- ---------- -----------
   1270064      31156          204         10                     0         172

Tom Kyte
September 05, 2008 - 8:03 am UTC

because analyze is legacy code, not maintained and you should only use dbms_stats to gather statistics.

DBMS_STATS in parallel

A reader, October 02, 2008 - 7:29 am UTC

Hi Tom,

If stats are being gathered for one partition and there is no index on the table, can Oracle use a degree of parallelism? If yes, what happens in parallel?

Thanks

Tom Kyte
October 02, 2008 - 7:49 am UTC

yes it can use parallel.

The queries that dbms_stats execute can use parallel execution. That is what would be done in parallel - the sampling queries dbms_stats runs
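For example, a degree can be requested explicitly (a sketch; the partition name P1 is hypothetical, and whether the requested degree is honored depends on your parallel configuration):

```sql
begin
   dbms_stats.gather_table_stats
   ( ownname  => user,
     tabname  => 'T',
     partname => 'P1',   -- hypothetical partition name
     degree   => 4 );    -- sampling queries may run with parallel 4
end;
/
```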

degree of parallellism

A reader, October 04, 2008 - 9:45 pm UTC

Hi Tom,

Thanks.

If the table is not defined with a parallel clause, and the degree of parallelism is not specified in the dbms_stats call, does Oracle automatically use a degree based on spfile parameters?

Regards,
Tom Kyte
October 06, 2008 - 2:47 pm UTC

you would see what the default is - which varies by version.

check out the documentation for your release of the database.
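As a sketch, the default can also be inspected directly: in 10g with DBMS_STATS.GET_PARAM, which was superseded by GET_PREFS in 11g:

```sql
-- 10g: show the default DEGREE used when none is passed
select dbms_stats.get_param('DEGREE') from dual;

-- 11g onward: get_param is deprecated in favour of get_prefs
select dbms_stats.get_prefs('DEGREE') from dual;
```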

In which case do ANALYZE and dbms_stats do the same job (from the optimizer's perspective)?

Mahesh, October 06, 2008 - 1:34 pm UTC

Tom,

Will the following do the same thing?

ANALYZE TABLE <tab_name> COMPUTE STATISTICS;
and
DBMS_STATS.GATHER_TABLE_STATS (
ownname => <owner>,
tabname =><tab_name>,
estimate_percent => 100 ); --> 100%

I am concerned because one of our applications requires full statistics on certain IOT tables. But a few IOT tables error out (ORA-600) due to bug# 4763768.

So I am thinking of using dbms_stats with estimate_percent => 100.

Will that be the same as ANALYZE with COMPUTE STATISTICS?

Query Perf Slow after Analyze...

A reader, October 28, 2008 - 5:26 pm UTC

Hi Tom,

In our database :

Ver : 9.2.0.6
Optmiser Mode: FIRST_ROWS


Day #1: dbms_stats was used to gather stats, and query "A" takes 10 sec to complete.
Day #2: the dbms_job failed and no stats were gathered; query "A" takes 10 sec to complete.

....
....

Day #50: still no stats gathered since day #1; query "A" takes 10 sec to complete.

Day #51: then we used the analyze command (analyze table .... compute statistics) to gather stats - compute stats.

Day #52: we noticed that query "A", which finished in 10 sec earlier, now takes 1 hr.

Day #52: added the hint /*+ Rule */ to query "A" and it now takes 12 sec.

Question:

a) If we gather stats today, on day #52, using dbms_stats, will the same query finish in ~12 sec?
b) Could the CBO not efficiently use the stats gathered via analyze table .... compute statistics? Your comments please.




Tom Kyte
October 28, 2008 - 7:10 pm UTC

try it, you should not be using analyze, the optimizer is written to use the values dbms_stats provides - which are different than analyze will produce.


Query Perf Slow after Analyze...

A reader, October 28, 2008 - 8:49 pm UTC

Hi Tom

We have gathered stats using the dbms_stats ( gather auto) and it has finished gathering the stats.

There are 1000 tables in our schema. All were analyzed using "Analyze" command on day#52 as mentioned above.

Now, after running dbms_stats.gather_schema_stats (gather auto), we found LAST_ANALYZED set to today on only 10 tables.

Question :

a) Would all the necessary stats which the optimizer looks for have been gathered?

If not,

b) do we need to delete statistics and then run dbms_stats again to gather the schema stats?

c) Query "A" still takes 1 hr to complete.



Tom Kyte
October 28, 2008 - 9:47 pm UTC

you only gathered on 10 things. So, you can probably answer (a) yourself.

were any of the objects query A accesses in that set of 10?

Query Perf Slow after Analyze...

A reader, October 28, 2008 - 9:57 pm UTC

continued.....
d) If I run
exec dbms_stats.delete_schema_stats('SCHEMA_OWNER')
and then run
exec dbms_stats.gather_schema_stats with gather auto on the above schema,
then what stats will the optimizer use after the delete and before the gather?

e) Is it safe to do d) above on a live server?

Regards.



Tom Kyte
October 29, 2008 - 7:47 am UTC

d) the ones the documentation says would be there - in short, if you delete them all, gather will gather all of them.

and I'm not sure you want to use auto, just gather_schema_stats and let as many things default as possible

and do this IN A TEST INSTANCE, IF YOU DO THIS ON PRODUCTION WITHOUT TESTING - PLEASE DON'T COME BACK HERE. You never make major changes in a live system.
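A minimal sketch of "let as many things default as possible", using the ZZ schema name from the follow-up:

```sql
-- plain gather_schema_stats, taking the release defaults
-- for estimate percent, method_opt, degree, and so on
begin
   dbms_stats.gather_schema_stats( ownname => 'ZZ' );
end;
/
```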

Query Perf Slow after Analyze...

A user, October 28, 2008 - 10:02 pm UTC

...you only gathered on 10 things. So, you can probably answer (a) yourself.

Sorry Tom

I was not making things clear earlier.


I ran

exec dbms_stats.gather_schema_stats(ownname=>'ZZ', degree=>4, cascade=>true, options=>'gather auto');

....were any of the objects query A accesses in that set of 10?

NO

Tom Kyte
October 29, 2008 - 7:48 am UTC

you said above "and last analyzed was today for only 10 things" - so apparently, you gathered statistics for 10 things, out of 1000

please stop playing in production right now - I'll not follow up anymore until you tell me "we are doing this in test now"

Query Perf Slow after Analyze...

A reader, October 29, 2008 - 12:34 am UTC

HI Tom.

>you only gathered on 10 things. So, you can probably answer (a) yourself.

You are right. I used 'gather' option to collect stats on all the tables/indexes.

Now the query "A" has stats, gathered via dbms_stats, for all the objects it accesses.

But it still takes long to execute. Does that mean the query has some problem and the stats are correct?



Query Perf Slow after Analyze...

A Reader, October 29, 2008 - 2:56 pm UTC

Sorry Tom,

I understand the point which you made above.

Regards.

Analyze versus DBMS_STATS

Edson, December 17, 2008 - 8:18 am UTC

Hello my name is Edson and I'm from Brazil.
I have been having problems using DBMS_STATS: when I use ANALYZE on 4 tables my query improves by 20 seconds, but when I use DBMS_STATS it improves by only 13 seconds.
I think that I'm using DBMS_STATS wrong.

The way that I'm using it is:
EXECUTE SYS.DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'USER',TABNAME=>'TABLE',GRANULARITY=>'ALL',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1', ESTIMATE_PERCENT=>NULL,CASCADE=>TRUE, DEGREE=>6);

Please Help me.
Tom Kyte
December 29, 2008 - 12:28 pm UTC

do you actually get different plans?

show us the TKPROFS with wait events included.

The difference between 13 and 20 could be "not different" for all we know. Show us the row source operations from a tkprof for the two.

Query Performance in 10g

Suzanne, December 31, 2008 - 2:01 pm UTC

We use Siebel as our CRM and I've just upgraded the Oracle database from 9.2.0.7 to 10.2.0.4. The database performance has been uneven but generally slower under 10g than 9i. One of the solutions that Siebel asked us to implement was the removal of all of the statistics on any empty table (there are many in our Siebel implementation, but we don't use them; that is why they are empty). This seems to have improved our performance significantly. This runs contrary to all I know about Oracle and statistics. Why did the removal of statistics on empty tables give us a significant performance boost?
Tom Kyte
January 05, 2009 - 10:20 am UTC

are they really empty?

or - are they empty usually - filled by the application - and then used.

I would say "probably, they are empty when you analyze and filled by the application at run time"


If you never query them - then removing the statistics would do nothing for you.

So, you must query them... And if you query them, they probably have data (else you would always get zero rows...).

In 9i, you had no statistics on these tables - and if they were not indexed, dynamic sampling would kick in. Dynamic sampling would have told the optimizer "there are 10,000 rows in here". We'd get the right plan.

In 10g, the auto job would find these unanalyzed tables and analyze them. Putting into the dictionary the "fact" that there are 0 rows. However, at run time, there would actually be 10,000 rows in there. When we have the wrong 'card=' values - we get the wrong plan. So, because we think 0 records - we get the wrong plan.


Remove the statistics and in 10g, we'd dynamically sample at hard parse time. We would discover "hey, there are 10,000 rows". We'd get the right plan.

https://www.oracle.com/technetwork/issue-archive/2009/09-jan/o19asktom-086775.html
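One way to keep the 10g auto job from re-recording "0 rows" on such tables (a sketch only; SIEBEL and S_EMPTY_TAB are hypothetical names) is to delete their statistics and then lock them, so dynamic sampling is used at hard parse time instead:

```sql
begin
   -- remove the misleading "0 rows" statistics...
   dbms_stats.delete_table_stats( ownname => 'SIEBEL', tabname => 'S_EMPTY_TAB' );
   -- ...and lock the table so the auto job cannot gather them again
   dbms_stats.lock_table_stats  ( ownname => 'SIEBEL', tabname => 'S_EMPTY_TAB' );
end;
/
```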

RE:Analyze versus DBMS_STATS

Edson, January 05, 2009 - 7:56 am UTC

Hi Tom,
My argument that the plan from ANALYZE is better than the one from DBMS_STATS was wrong... I have checked my DB and DBMS_STATS produces the same plan.
Thank you for your answer.

I also want to tell you that this topic is very rich and has helped me a great deal with my research.

Once more, I am really grateful for your help.

Oramag Article

S, January 17, 2009 - 1:03 pm UTC

Tom,

In the Oracle Magazine article On Dynamic Sampling, under the title When the Optimizer Guesses, you wrote:

"After gathering statistics, you ask the optimizer to estimate how many people were born in December, and it would almost certainly be able to come up with a very accurate estimate of 1/12 of the data (assuming a normal distribution of birth dates)."

It should probably read: (assuming the uniform distribution of birth dates).

Regards
Tom Kyte
January 17, 2009 - 1:28 pm UTC

hah, you are correct, I guess if it were "normal", it would have most people in the summer and very few in the winter.

Unless you are in Australia and then it would be the opposite...

thanks and I agree.

DBMS_STATS Problem !!!!

Indranil DAS, March 09, 2009 - 6:10 am UTC

Hi Tom,
We are not able to find out why the error below occurred:
SQL> BEGIN dbms_stats.GATHER_TABLE_STATS('MEA_REPORTS','RPT_OUTER_CARTON_T', estimate_percent=> DBMS_STATS.AUTO_SAMPLE_SIZE, block_sample=>TRUE, method_opt=>
 'for all indexed columns size AUTO', granularity=>'ALL',cascade=>TRUE, degree=>4); END;

*
ERROR at line 1:
ORA-00979: not a GROUP BY expression
ORA-06512: at "SYS.DBMS_STATS", line 10502
ORA-06512: at "SYS.DBMS_STATS", line 10516
ORA-06512: at line 1

Tom, what went wrong here? Please suggest.

Thanks in advance for your help.
Indranil
Oracle DBA,IBM

Tom Kyte
March 09, 2009 - 12:59 pm UTC

please utilize support/metalink for that.


I remember something vaguely relating to a function based index in early releases causing this.

Extended Statistics

Arvind Mishra, May 04, 2009 - 8:21 pm UTC

Hello Tom,

a) Please can you tell me how I can effectively use extended statistics?
b) If I use expression statistics, is a function-based index still useful? When should I use expression statistics and when should I use a function-based index?
c) How can I see which multicolumn statistics are gathered on a table?
d) Is this a new feature in 11g?

Thanks,

ARVIND
Tom Kyte
May 09, 2009 - 10:32 am UTC

a) by gathering statistics on expressions or by gathering statistics on correlated columns... In the same manner you would for an INDIVIDUAL column.

b) sure, if you gathered statistics on EMPNO in EMP, but did not have a primary key on EMPNO in EMP and EMP had a lot of records - would

select * from emp where empno = :x;

run any faster? Of course not :) you would certainly create an index in order to retrieve that single record quickly.

Same with an expression!!!


ops$tkyte%ORA11GR1> create table emp
  2  as
  3  select object_id empno,
  4         trunc( dbms_random.value( 1000, 4000 ) ) sal,
  5         trunc( dbms_random.value( 5000, 10000 ) ) comm,
  6             decode( mod(rownum,2), 0, 'N', 'Y' ) flag1,
  7             decode( mod(rownum,2), 0, 'Y', 'N' ) flag2
  8    from stage
  9  /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create index flag_idx on emp(flag1,flag2);

Index created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

<b>Now, without any extended statistics:</b>


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag1 = 'N' )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag2 = 'N' )
 11          loop
 12                  null;
 13          end loop;
 14          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
 15          loop
 16                  dbms_output.put_line( '.'||x.plan_table_output );
 17          end loop;
 18          for x in ( select /*+ gather_plan_statistics */  * from  emp where flag1 = 'N' and flag2 = 'N' )
 19          loop
 20                  null;
 21          end loop;
 22          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
 23          loop
 24                  dbms_output.put_line( '.'||x.plan_table_output );
 25          end loop;
 26  end;
 27  /
.SQL_ID  6ukxu8bf6rrbc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG1 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  34232 |00:00:00.01 |     552 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  34232 |  34232 |00:00:00.01 |     552 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter("FLAG1"='N')
.
.SQL_ID  1hwwuw5p2v7s8, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG2 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  34232 |00:00:00.01 |     552 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  34232 |  34232 |00:00:00.01 |     552 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter("FLAG2"='N')
.
.SQL_ID  00s1uatbzv2j2, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE FLAG1 = 'N' AND
.FLAG2 = 'N'
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |      0 |00:00:00.01 |     211 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  17116 |      0 |00:00:00.01 |     211 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(("FLAG1"='N' AND "FLAG2"='N'))
.

PL/SQL procedure successfully completed.

<b>it gets the right estimated cardinalities for flag1=N, for flag2=N, but totally messes up flag1=N and flag2=N.  Add that pair:</b>


ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> exec dbms_output.put_line( dbms_stats.create_extended_stats( user, 'EMP', '(FLAG1,FLAG2)' ) );
SYS_STUGVNB7PTIYWAVPJX#YT77WGD

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  emp EMP2 where flag1 = 'N' and flag2 = 'N' )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  769uyry623wf9, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP EMP2 WHERE FLAG1 = 'N'
.AND FLAG2 = 'N'
.
.Plan hash value: 3400700027
.
.--------------------------------------------------------------------------------------------------
.| Id  | Operation                   | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.--------------------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT            |          |      1 |        |      0 |00:00:00.01 |       2 |
.|   1 |  TABLE ACCESS BY INDEX ROWID| EMP      |      1 |      1 |      0 |00:00:00.01 |       2 |
.|*  2 |   INDEX RANGE SCAN          | FLAG_IDX |      1 |      1 |      0 |00:00:00.01 |       2 |
.--------------------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   2 - access("FLAG1"='N' AND "FLAG2"='N')
.

PL/SQL procedure successfully completed.

<b> cardinality is right, plan changes - all good

Now, do an expression:</b>


ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  EMP where ln(sal+comm) > 9 )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  f03rzhdyzy1gc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE LN(SAL+COMM) > 9
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  58463 |00:00:08.42 |     794 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |   3423 |  58463 |00:00:08.42 |     794 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(LN("SAL"+"COMM")>9)
.

PL/SQL procedure successfully completed.

<b>well, we can see that is wrong (the estimated row count) </b>

ops$tkyte%ORA11GR1> alter table EMP add (comp as (ln(sal+comm)));

Table altered.

ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'EMP', method_opt => 'for all columns size 254', estimate_percent=>100);

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> begin
  2          for x in ( select /*+ gather_plan_statistics */  * from  EMP where ln(sal+comm) > 9 )
  3          loop
  4                  null;
  5          end loop;
  6          for x in ( select plan_table_output from table(dbms_xplan.display_cursor(null, null, 'allstats last')) )
  7          loop
  8                  dbms_output.put_line( '.'||x.plan_table_output );
  9          end loop;
 10  end;
 11  /
.SQL_ID  f03rzhdyzy1gc, child number 0
.-------------------------------------
.SELECT /*+ gather_plan_statistics */ * FROM EMP WHERE LN(SAL+COMM) > 9
.
.Plan hash value: 3956160932
.
.------------------------------------------------------------------------------------
.| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
.------------------------------------------------------------------------------------
.|   0 | SELECT STATEMENT  |      |      1 |        |  58463 |00:00:05.61 |     794 |
.|*  1 |  TABLE ACCESS FULL| EMP  |      1 |  58481 |  58463 |00:00:05.61 |     794 |
.------------------------------------------------------------------------------------
.
.Predicate Information (identified by operation id):
.---------------------------------------------------
.
.   1 - filter(LN("SAL"+"COMM")>9)
.

PL/SQL procedure successfully completed.

<b>card = is now correct...</b>




c) query the dictionary:

ops$tkyte%ORA11GR1> select table_name, column_name from user_tab_col_statistics where table_name = 'EMP';

TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------
EMP                            EMPNO
EMP                            SAL
EMP                            COMM
EMP                            FLAG1
EMP                            FLAG2
EMP                            SYS_STUGVNB7PTIYWAVPJX#YT77WGD
EMP                            COMP

7 rows selected.




that sys_fadfadafds thing was dbms_output'ed above - that is flag1,flag2 - our virtual column looks like a normal one - COMP.



ops$tkyte%ORA11GR1> SELECT extension_name, extension
  2  FROM   dba_stat_extensions
  3  WHERE  table_name = 'EMP';

EXTENSION_NAME
------------------------------
EXTENSION
-------------------------------------------------------------------------------
SYS_STUGVNB7PTIYWAVPJX#YT77WGD
("FLAG1","FLAG2")

COMP
("SAL"+"COMM")




d) yes
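As a side note (a sketch, not run here): in 11g an expression extension can also be created directly with create_extended_stats, without adding the virtual column yourself, and extensions can be dropped again when no longer wanted:

```sql
-- create statistics on the expression itself
select dbms_stats.create_extended_stats( user, 'EMP', '(LN(SAL+COMM))' ) from dual;

-- drop the column-group extension created earlier
begin
   dbms_stats.drop_extended_stats( user, 'EMP', '(FLAG1,FLAG2)' );
end;
/
```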

CHAIN_CNT affecting query plan ... ?

Gary, August 19, 2009 - 11:23 am UTC

Tom,

You said in an earlier post in this thread (December 16, 2005) that "dbms_stats won't gather things not used by the CBO such as chain_cnt/avg_space in particular." We ran into an odd situation recently where removing the chain_cnt statistic seemed to result in an improved plan.

The job stream was running ANALYZE followed by DBMS_STATS, since that was the only way the development team knew to get a count of chained rows.

We decided on a simplified test case, since the developer had been rebuilding the table with a higher pctfree to eliminate the chained rows; here is what I tried:

Use DBMS_STATS.EXPORT_TABLE_STATS to preserve existing stats on two tables used by the problem query.
ANALYZE TABLE <name> DELETE STATISTICS ; -- for both tables
Use IMPORT_TABLE_STATS to put the exported statistics back into place

This should have had the net effect of removing only the stats gathered by ANALYZE TABLE, including CHAIN_CNT and AVG_SPACE. The plan changed very much for the better after this procedure was done. The old plan used index access to hit the two tables; the new plan full scanned them then used them in hash joins. The tables were only 2 MB and 14 MB in size, so the hash join approach seems like a no brainer chained rows or no.

The developer seemed pretty sure before this that CHAIN_CNT was affecting the query plan ... now I'm not sure what to tell him. (Maybe some other stat gathered by ANALYZE TABLE is to blame?)

Thanks,

Gary
Tom Kyte
August 24, 2009 - 5:00 pm UTC

did you export column and index statistics as well?

else you deleted A LOT of stuff.

stats

sam, August 27, 2009 - 12:43 pm UTC

Tom:

I want to setup a nightly job to update stats in a database.

What is the standard statement you would do. is it

dbms_stats.gather_schema_stats('MYAPP');

I noticed this did not update the stats in USER_INDEXES and USER_TAB_COLUMNS.
Tom Kyte
August 28, 2009 - 4:57 pm UTC

Sam/SMK

you know my answer is going to be

it depends.



You noticed something incorrectly. It does.


ops$tkyte%ORA10GR2> create table t ( x int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2> insert into t select rownum from all_users;

42 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS
- ----------
t
i
c          0

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> exec dbms_stats.gather_schema_stats( user );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS
- ----------
t         42
i         42
c          2



how cool is that, 42 :) that was a pleasant surprise.

stats

A reader, August 29, 2009 - 1:51 pm UTC

Tom:

1. You are looking at the NUM_ROWS column.

I was looking at the LAST_ANALYZED date stamp in USER_INDEXES.

The LAST_ANALYZED in USER_TABLES gets updated correctly to the date you run the statement, but not in USER_INDEXES.

2. On a second note, I was reading a book that says collecting stats may not be helpful every day, depending on the data.

The Oracle optimizer will generate a new query plan every time you collect stats, so it will not reuse old plans.

Is this correct, and do you advise scheduling it on a weekly basis?
Tom Kyte
August 29, 2009 - 7:27 pm UTC

1) sam, perhaps you should have looked again?

ops$tkyte%ORA10GR2> create table t ( x int );

Table created.

ops$tkyte%ORA10GR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA10GR2> insert into t select rownum from all_users;

42 rows created.

ops$tkyte%ORA10GR2> commit;

Commit complete.

ops$tkyte%ORA10GR2> select 't', num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows, last_analyzed from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*), to_date( null) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS LAST_ANAL
- ---------- ---------
t
i
c          0

ops$tkyte%ORA10GR2> exec dbms_stats.gather_schema_stats( user );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select 't', num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', num_rows, last_analyzed from user_indexes where index_name = 'T_IDX'
  4  union all
  5  select 'c', count(*), to_date( null) from user_tab_histograms where table_name = 'T';

'   NUM_ROWS LAST_ANAL
- ---------- ---------
t         42 29-AUG-09
i         42 29-AUG-09
c          2





2) .. The Oracle optimizer will generate new query plan every time you collect stats
so it will not reuse old plans.
..

sam - you do understand that the major goal of collecting statistics is....


to cause plans to change because the size and shape of the data has changed.


so, (this is dead serious, I'm not sarcastic here at all), so do you want your plans to change like that - or not.


Only you can answer the question "how often should I analyze"

you know your data
you know how your data changes
you know how your applications use your data
you know this, if you don't, stop - and get that information
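One middle ground (a sketch, reusing the MYAPP schema from the earlier question): let table monitoring decide, gathering only on objects flagged stale (roughly 10% of rows changed, by default):

```sql
begin
   dbms_stats.gather_schema_stats
   ( ownname => 'MYAPP',
     options => 'GATHER STALE' );  -- skip objects with few changes
end;
/
```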



index creation on table with LOCKED stats

psha, November 04, 2009 - 1:35 pm UTC

Hi Tom,
We are using Oracle 10g R2. We have one table with LOCKED stats. Now if I create an index on it, the index is not analyzed. We are planning to migrate to 11g; is there any way in 11g to lock table stats such that whenever we create an index, the index gets analyzed as soon as it is created?

Thanks
Reader
Tom Kyte
November 11, 2009 - 9:31 am UTC

no, when you lock, you lock.

you would unlock in order to gather, so just unlock, create index, lock

ops$tkyte%ORA11GR2> create table t
  2  as
  3  select * from all_objects;

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> create index t_idx on t(object_name);

Index created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> exec dbms_stats.gather_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> exec dbms_stats.lock_table_stats( user, 'T' );

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select 't', table_name, num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', index_name, num_rows, last_analyzed from user_indexes where table_name = 'T';

' TABLE_NAME                       NUM_ROWS LAST_ANALYZED
- ------------------------------ ---------- --------------------
t T                                   71588 11-nov-2009 10:30:13
i T_IDX                               71588 11-nov-2009 10:30:13

ops$tkyte%ORA11GR2><b>
ops$tkyte%ORA11GR2> exec dbms_stats.unlock_table_stats( user, 'T' );
</b>
PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> create index t_idx2 on t(owner,object_type);

Index created.
<b>
ops$tkyte%ORA11GR2> exec dbms_stats.lock_table_stats( user, 'T' );
</b>
PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select 't', table_name, num_rows, last_analyzed from user_tables where table_name = 'T'
  2  union all
  3  select 'i', index_name, num_rows, last_analyzed from user_indexes where table_name = 'T';

' TABLE_NAME                       NUM_ROWS LAST_ANALYZED
- ------------------------------ ---------- --------------------
t T                                   71588 11-nov-2009 10:30:13
i T_IDX2                              71588 11-nov-2009 10:30:14
i T_IDX                               71588 11-nov-2009 10:30:13

A reader, January 24, 2010 - 3:43 pm UTC

After I issue "alter table move" or "index rebuild", do I need to analyze the tables or use dbms_stats to gather statistics?
From your book I understand that column-level histograms and avg_row_len statistics are lost?
Tom Kyte
January 26, 2010 - 1:48 am UTC

from *my* book? where did you read that - so I can fix it.

ops$tkyte%ORA11GR1> create table t as select * from all_users;

Table created.

ops$tkyte%ORA11GR1> create index t_idx on t(username);

Index created.

ops$tkyte%ORA11GR1> begin
  2  dbms_stats.gather_table_stats
  3  ( user, 'T', method_opt => 'for all indexed columns size 254' );
  4  end;
  5  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select avg_row_len from user_tables where table_name = 'T';

AVG_ROW_LEN
-----------
         20

ops$tkyte%ORA11GR1> select count(*) from user_tab_histograms where table_name = 'T';

  COUNT(*)
----------
        38

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> alter table t move;

Table altered.

ops$tkyte%ORA11GR1> alter index t_idx rebuild;

Index altered.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select avg_row_len from user_tables where table_name = 'T';

AVG_ROW_LEN
-----------
         20

ops$tkyte%ORA11GR1> select count(*) from user_tab_histograms where table_name = 'T';

  COUNT(*)
----------
        38




Now, that aside, you may well have to gather statistics after a reorganization since major metrics can and will change - like number of blocks for example (just for example, the others might be affected as well)
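A re-gather after such a reorganization might look like the following sketch (the table name is a placeholder):

```sql
-- after "alter table t move" / "alter index t_idx rebuild",
-- refresh the metrics that the reorganization can change (blocks, etc.)
begin
   dbms_stats.gather_table_stats
   ( ownname => user,
     tabname => 'T',
     cascade => true   -- re-gather the dependent index statistics as well
   );
end;
/
```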

A reader, February 20, 2010 - 11:31 pm UTC

Hi

Suppose the statistics of two databases are exactly the same, and the database version is also the same, but there is a big difference in the amount of data.

Will the response time of the SQL queries be exactly the same?


Tom Kyte
March 01, 2010 - 5:36 am UTC

... Will the response time of the SQL queries be exactly the same? ...

of course not. Use a little critical thinking and your knowledge of "physics" and you should be able to come up with an example. Here is a simple one:


select * from t;


In both databases, statistics say "there are 1,000 records". Everything is the same between the two - with the exception that in database 1 there are 1,000 records and in database 2 there are 1,000,000,000 records.

Will they perform the same?



Even with the same volumes of data - even with exactly the SAME NUMBER OF ROWS - even with precisely the same exact bits and bytes, you would not, should not, expect the performance to be the same.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8764517459743#737753700346729628

ops$tkyte%ORA10GR2> set echo on
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> /*
ops$tkyte%ORA10GR2> drop table t1;
ops$tkyte%ORA10GR2> drop table t2;
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create table t1 as select * from all_objects where 1=0 order by object_id;
ops$tkyte%ORA10GR2> create table t2 as select * from all_objects where 1=0 order by dbms_random.random;
ops$tkyte%ORA10GR2> create index t1_idx on t1(object_id);
ops$tkyte%ORA10GR2> create index t2_idx on t2(object_id);
ops$tkyte%ORA10GR2> insert into t1 select * from all_objects order by object_id;
ops$tkyte%ORA10GR2> insert into t2 select * from all_objects order by dbms_random.random;
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'T1', numrows => 70000, numblks => 1000 );
ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'T2', numrows => 70000, numblks => 1000 );
ops$tkyte%ORA10GR2> exec dbms_stats.set_index_stats( user, 'T1_IDX', numrows => 70000, numlblks => 100, numdist=>70000 );
ops$tkyte%ORA10GR2> exec dbms_stats.set_index_stats( user, 'T2_IDX', numrows => 70000, numlblks => 100, numdist=>70000 );
ops$tkyte%ORA10GR2> */
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> set linesize 1000
ops$tkyte%ORA10GR2> set autotrace traceonly
ops$tkyte%ORA10GR2> select * from t1 where object_id between 1 and 500;

456 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 546753835

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   175 | 17500 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |   175 | 17500 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |   315 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">=1 AND "OBJECT_ID"<=500)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         70  consistent gets
          0  physical reads
          0  redo size
      19275  bytes sent via SQL*Net to client
        730  bytes received via SQL*Net from client
         32  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        456  rows processed

ops$tkyte%ORA10GR2> select * from t2 where object_id between 1 and 500;

456 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4244861920

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   175 | 17500 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2     |   175 | 17500 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T2_IDX |   315 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID">=1 AND "OBJECT_ID"<=500)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        490  consistent gets
          0  physical reads
          0  redo size
      19275  bytes sent via SQL*Net to client
        730  bytes received via SQL*Net from client
         32  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        456  rows processed

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2>



see, same exact data, exactly same data, same stats, exactly the same, same plans, same everything

except the amount of work performed - since the organization of data on disk is different.

A reader, March 12, 2010 - 8:19 pm UTC

Mmmm.
Thanks, sir.
Suppose a UAT database is refreshed from production.

Is there any benefit to transferring statistics from production to UAT?
Tom Kyte
March 15, 2010 - 9:51 am UTC

depends on how you refresh.

if you refresh by dump and load - just remember - UAT and production are only as alike as your image in a funhouse mirror is like you. You should gather stats on UAT, as the order of rows will be different, the sizes of things will be different - everything will be different except for the VALUES of the data. Do expect different plans. You could likely get the same plans if you move the stats over - however - you have no idea if that plan is good, bad, or indifferent - since the data is organized differently.

if you refresh by restoring a backup, you are done, just use it.
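When moving statistics does make sense, the transfer goes through a regular "statistics table". A sketch of the typical flow (schema and staging-table names here are placeholders):

```sql
-- on production: stage the schema statistics into an ordinary table
exec dbms_stats.create_stat_table( ownname => 'SCOTT', stattab => 'STATS_STAGE' );
exec dbms_stats.export_schema_stats( ownname => 'SCOTT', stattab => 'STATS_STAGE' );

-- move SCOTT.STATS_STAGE to UAT (export/import, database link, etc.), then on UAT:
exec dbms_stats.import_schema_stats( ownname => 'SCOTT', stattab => 'STATS_STAGE' );
```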

analyze

A reader, July 08, 2010 - 2:28 am UTC

Greetings Thomas,

and thanks as always.

exec dbms_stats.gather_schema_stats('A');

exec dbms_stats.gather_table_stats('A','ACTIONS',estimate_percent => 100, cascade => true);

But the problem is that when Oracle accesses the table (ACTIONS),
it issues an ANALYZE (I traced the session and found "analyze table A.ACTIONS compute statistics" in the trace file).

table_name: A.ACTIONS
number of rows: 0
number of columns: 148
number of indexes: 113

I think it is related to the number of indexes, because when I dropped them it worked fine. But my question is: why is this happening?
Tom Kyte
July 08, 2010 - 12:22 pm UTC

you think what is related to what???


what version are you on.


analyze

A reader, July 15, 2010 - 4:57 pm UTC

Greetings Thomas,

and thanks as always.

regarding the above post:

-----you think what is related to what???

Lots of columns and lots of indexes in one table.

But that is a guess, and that is why I am asking you.


------what version are you on.
10.2.0.1
Tom Kyte
July 19, 2010 - 1:10 pm UTC

1,000 columns, 1,000 indexes, no analyze

I'd need a test case to reproduce with:
drop table t;

declare
        l_create long := 'create table t ( a int primary key';
begin
        for i in 1 .. 999
        loop
                l_create := l_create || ', c' || i || ' int';
        end loop;
        execute immediate l_create || ')';
        for i in 1 .. 999
        loop
                execute immediate 'create index t_idx' || i || ' on t(c' || i || ')';
        end loop;
end;
/
alter session set sql_trace=true;
exec dbms_stats.gather_table_stats( user, 'T', estimate_percent=> 100, cascade=> true );



I did not see an ANALYZE in the trace file anywhere.


_optimizer_invalidation_period

Ravi B, September 14, 2010 - 1:56 pm UTC

Hi Tom,

I read somewhere that the hidden parameter "_optimizer_invalidation_period" influences the NO_INVALIDATE setting in DBMS_STATS. By default it is 18000 (5 hrs). What does it mean? That the (in)validation doesn't occur for 5 hrs after it is set? I don't see any documentation on the Oracle support site either.

Thanks,
Ravi
Tom Kyte
September 15, 2010 - 7:36 am UTC

That is because it is undocumented - I don't really discuss things like that. You don't need to set them, except in response to Support telling you to do so. Their meaning can, will, and has changed from dot release to dot release - it would be unwise to play with them.
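The documented knob in this area is the NO_INVALIDATE parameter of the DBMS_STATS gather procedures, which the question mentions. A sketch of how it is used (table name is a placeholder):

```sql
-- no_invalidate => false: invalidate dependent cursors immediately after
-- the gather, instead of letting Oracle roll the invalidations out over
-- time (the default auto-invalidate behavior)
exec dbms_stats.gather_table_stats( user, 'T', no_invalidate => false );
```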

Oracle 11g : Enterprise vs Standard Editions

Cj, February 04, 2011 - 2:26 am UTC

Hello,

I have a few questions regarding Oracle 11g Standard Edition and Enterprise Edition.

1. Where could I find a detailed document on the differences between the Enterprise and Standard Editions of Oracle 11g?

2. In Standard Edition, is it possible to use the DBMS_STATS and DBMS_UTILITY packages?

Thanks in advance
Tom Kyte
February 04, 2011 - 9:32 am UTC

see
http://www.oracle.com/pls/db112/homepage

every release has a License guide that details what comes with each edition.

yes, dbms_stats and dbms_utility are available with SE

Importing Statistics

Shravani, February 15, 2011 - 1:22 pm UTC

Hi Tom,

In our application, we have two schemas (A and B) with exactly the same tables and data. During the data load process we connect users to the B copy. We gather stats on schema A using DBMS_STATS and then import the stats into schema B.

However, we are seeing totally different execution plans and timings between the A and B schemas. A query executed in the A copy returns results quickly compared to the same query in B.

Can you please let me know what would make this happen, and what areas to look at?

Regards
Shravani.
Tom Kyte
February 15, 2011 - 4:54 pm UTC

show us a tkprof and a 'plan' (with estimated cardinalities - like from autotrace) of both.

Also - show us the method you use to copy the statistics.

Import Statistics

Shravani, February 16, 2011 - 11:51 am UTC

Thanks for your reply, Tom.

Here is the query we are executing in the A and B copies:
SELECT DISTINCT product_matrix.patient_case_key
FROM date_time_dim init_rcvd_date_condition,
dm_load_dt dm_load_dt_condition,
product_matrix,
product_dim product_dim_suspect
WHERE ( product_matrix.suspect_product_key =
product_dim_suspect.product_key
AND product_dim_suspect.role_indic = 'SUSPECT')
AND (product_matrix.init_recv_dt_key =
init_rcvd_date_condition.year_month_day)
AND (( dm_load_dt_condition.increment_dt >=
product_matrix.effective_dt AND dm_load_dt_condition.increment_dt < product_matrix.end_dt))
AND ( (TO_DATE ('23-SEP-2010', 'DD-MON-YYYY') = dm_load_dt_condition.reporting_dt)
AND dm_load_dt_condition.active = 'Y'
)
AND ( ( product_dim_suspect.generic_nm IN
('CELECOXIB')
OR 'ALL' IN ('CELECOXIB')
)
AND (dm_load_dt_condition.reporting_dt >=
SYS_CONTEXT ('pfast', 'mbss_release')
)
AND ((init_rcvd_date_condition.calendar_dt) >=
TO_DATE ('01-JAN-2010', 'DD-MON-YYYY')
)
AND ((init_rcvd_date_condition.calendar_dt) <=
TO_DATE ('25-SEP-2010', 'DD-MON-YYYY')
)
);

And here is the TKPROF for A
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.05 0 23 0 0
Execute 1 0.00 0.02 0 128 0 0
Fetch 210 0.00 5.41 0 36 0 3127
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 212 0.01 5.49 0 187 0 3127

Misses in library cache during parse: 1
Optimizer goal: ALL_ROWS
Parsing user id: 73 (FSTADM)

Rows Row Source Operation
------- ---------------------------------------------------
3127 PX COORDINATOR (cr=164 pr=0 pw=0 time=1302 us)
0 PX SEND QC (RANDOM) :TQ10002 (cr=0 pr=0 pw=0 time=0 us cost=171 size=106 card=1)
0 HASH UNIQUE (cr=0 pr=0 pw=0 time=0 us cost=171 size=106 card=1)
0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us)
0 PX SEND HASH :TQ10001 (cr=0 pr=0 pw=0 time=0 us)
0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=170 size=106 card=1)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=170 size=276 card=3)
0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us)
0 PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
31 MERGE JOIN CARTESIAN (cr=36 pr=0 pw=0 time=2280 us cost=10 size=58 card=1)
31 TABLE ACCESS BY GLOBAL INDEX ROWID PRODUCT_DIM PARTITION: ROW LOCATION ROW LOCATION (cr=33 pr=0 pw=0 time=2100 us cost=8 size=39 card=1)
39 INDEX RANGE SCAN UX_PD_GNM_ROLE_PK (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=12)(object id 8689572)
31 BUFFER SORT (cr=3 pr=0 pw=0 time=0 us cost=2 size=19 card=1)
1 TABLE ACCESS BY INDEX ROWID DM_LOAD_DT (cr=3 pr=0 pw=0 time=0 us cost=2 size=19 card=1)
1 INDEX RANGE SCAN IX_RPT_DATE_DM_LOAD_DT (cr=2 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 8691296)
0 PX PARTITION HASH ALL PARTITION: 1 4 (cr=0 pr=0 pw=0 time=0 us cost=170 size=170 card=5)
0 TABLE ACCESS BY LOCAL INDEX ROWID PRODUCT_MATRIX PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us cost=170 size=170 card=5)
0 BITMAP CONVERSION TO ROWIDS (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP AND (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP INDEX SINGLE VALUE BMAK_MATRIX_SPPRODKEY PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us)(object id 8695173)
0 BITMAP MERGE (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP INDEX RANGE SCAN BMAK_MATRIX_EFFDT PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us)(object id 8694549)
0 INDEX UNIQUE SCAN PK_DATE_TIME_DIM (cr=0 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 8689529)
0 TABLE ACCESS BY INDEX ROWID DATE_TIME_DIM (cr=0 pr=0 pw=0 time=0 us cost=0 size=14 card=1)

unable to set optimizer goal
ORA-01986: OPTIMIZER_GOAL is obsolete

parse error offset: 33

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
3127 PX COORDINATOR
0 PX SEND (QC (RANDOM)) OF ':TQ10002' [:Q1002]
0 HASH (UNIQUE) [:Q1002]
0 PX RECEIVE [:Q1002]
0 PX SEND (HASH) OF ':TQ10001' [:Q1001]
0 FILTER [:Q1001]
0 NESTED LOOPS [:Q1001]
0 NESTED LOOPS [:Q1001]
0 NESTED LOOPS [:Q1001]
0 BUFFER (SORT) [:Q1001]
0 PX RECEIVE [:Q1001]
0 PX SEND (BROADCAST) OF ':TQ10000'
31 MERGE JOIN (CARTESIAN)
31 TABLE ACCESS GOAL: ANALYZED (BY
GLOBAL INDEX ROWID) OF 'PRODUCT_DIM' (TABLE)
PARTITION:ROW LOCATION
39 INDEX GOAL: ANALYZED (RANGE SCAN)
OF 'UX_PD_GNM_ROLE_PK' (INDEX (UNIQUE))
31 BUFFER (SORT)
1 TABLE ACCESS GOAL: ANALYZED (BY
INDEX ROWID) OF 'DM_LOAD_DT' (TABLE)
1 INDEX GOAL: ANALYZED (RANGE SCAN)
OF 'IX_RPT_DATE_DM_LOAD_DT' (INDEX)
0 PX PARTITION HASH (ALL) [:Q1001] PARTITION:
START=1 STOP=4
0 TABLE ACCESS GOAL: ANALYZED (BY LOCAL
INDEX ROWID) OF 'PRODUCT_MATRIX' (TABLE) [:Q1001]
PARTITION:KEY STOP=88
0 BITMAP CONVERSION (TO ROWIDS) [:Q1001]
0 BITMAP AND [:Q1001]
0 BITMAP INDEX (SINGLE VALUE) OF
'BMAK_MATRIX_SPPRODKEY' (INDEX (BITMAP)) [:Q1001]
PARTITION:KEY STOP=88
0 BITMAP MERGE [:Q1001]
0 BITMAP INDEX (RANGE SCAN) OF
'BMAK_MATRIX_EFFDT' (INDEX (BITMAP)) [:Q1001]
PARTITION:KEY STOP=88
0 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF
'PK_DATE_TIME_DIM' (INDEX (UNIQUE)) [:Q1001]
0 TABLE ACCESS GOAL: ANALYZED (BY INDEX ROWID) OF
'DATE_TIME_DIM' (TABLE) [:Q1001]

=========================================
Here is TKProf for B

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.04 0 26 0 0
Execute 1 0.00 0.02 0 100 0 0
Fetch 210 0.00 69.29 0 36 0 3127
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 212 0.02 69.36 0 162 0 3127

Misses in library cache during parse: 1
Optimizer goal: ALL_ROWS
Parsing user id: 76 (FSTBDM)

Rows Row Source Operation
------- ---------------------------------------------------
3127 PX COORDINATOR (cr=136 pr=0 pw=0 time=1693 us)
0 PX SEND QC (RANDOM) :TQ10002 (cr=0 pr=0 pw=0 time=0 us cost=517 size=114 card=1)
0 HASH UNIQUE (cr=0 pr=0 pw=0 time=0 us cost=517 size=114 card=1)
0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us)
0 PX SEND HASH :TQ10001 (cr=0 pr=0 pw=0 time=0 us)
0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=516 size=114 card=1)
0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=516 size=368 card=4)
0 BUFFER SORT (cr=0 pr=0 pw=0 time=0 us)
0 PX RECEIVE (cr=0 pr=0 pw=0 time=0 us)
0 PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
31 MERGE JOIN CARTESIAN (cr=36 pr=0 pw=0 time=780 us cost=99 size=58 card=1)
31 TABLE ACCESS BY GLOBAL INDEX ROWID PRODUCT_DIM PARTITION: ROW LOCATION ROW LOCATION (cr=33 pr=0 pw=0 time=600 us cost=97 size=39 card=1)
39 INDEX RANGE SCAN UX_PD_GNM_ROLE_PK (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=189)(object id 8743581)
31 BUFFER SORT (cr=3 pr=0 pw=0 time=0 us cost=2 size=19 card=1)
1 TABLE ACCESS BY INDEX ROWID DM_LOAD_DT (cr=3 pr=0 pw=0 time=0 us cost=2 size=19 card=1)
1 INDEX RANGE SCAN IX_RPT_DATE_DM_LOAD_DT (cr=2 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 8743601)
0 PX PARTITION HASH ALL PARTITION: 1 4 (cr=0 pr=0 pw=0 time=0 us cost=516 size=238 card=7)
0 TABLE ACCESS BY LOCAL INDEX ROWID PRODUCT_MATRIX PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us cost=516 size=238 card=7)
0 BITMAP CONVERSION TO ROWIDS (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP AND (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP INDEX SINGLE VALUE BMAK_MATRIX_SPPRODKEY PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us)(object id 8748879)
0 BITMAP MERGE (cr=0 pr=0 pw=0 time=0 us)
0 BITMAP INDEX RANGE SCAN BMAK_MATRIX_EFFDT PARTITION: KEY 88 (cr=0 pr=0 pw=0 time=0 us)(object id 8748990)
0 INDEX UNIQUE SCAN PK_DATE_TIME_DIM (cr=0 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 8743544)
0 TABLE ACCESS BY INDEX ROWID DATE_TIME_DIM (cr=0 pr=0 pw=0 time=0 us cost=0 size=22 card=1)

unable to set optimizer goal
ORA-01986: OPTIMIZER_GOAL is obsolete

parse error offset: 33

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: ALL_ROWS
3127 PX COORDINATOR
0 PX SEND (QC (RANDOM)) OF ':TQ10002' [:Q1002]
0 HASH (UNIQUE) [:Q1002]
0 PX RECEIVE [:Q1002]
0 PX SEND (HASH) OF ':TQ10001' [:Q1001]
0 FILTER [:Q1001]
0 NESTED LOOPS [:Q1001]
0 NESTED LOOPS [:Q1001]
0 NESTED LOOPS [:Q1001]
0 BUFFER (SORT) [:Q1001]
0 PX RECEIVE [:Q1001]
0 PX SEND (BROADCAST) OF ':TQ10000'
31 MERGE JOIN (CARTESIAN)
31 TABLE ACCESS GOAL: ANALYZED (BY
GLOBAL INDEX ROWID) OF 'PRODUCT_DIM' (TABLE)
PARTITION:ROW LOCATION
39 INDEX GOAL: ANALYZED (RANGE SCAN)
OF 'UX_PD_GNM_ROLE_PK' (INDEX (UNIQUE))
31 BUFFER (SORT)
1 TABLE ACCESS GOAL: ANALYZED (BY
INDEX ROWID) OF 'DM_LOAD_DT' (TABLE)
1 INDEX GOAL: ANALYZED (RANGE SCAN)
OF 'IX_RPT_DATE_DM_LOAD_DT' (INDEX)
0 PX PARTITION HASH (ALL) [:Q1001] PARTITION:
START=1 STOP=4
0 TABLE ACCESS GOAL: ANALYZED (BY LOCAL
INDEX ROWID) OF 'PRODUCT_MATRIX' (TABLE) [:Q1001]
PARTITION:KEY STOP=88
0 BITMAP CONVERSION (TO ROWIDS) [:Q1001]
0 BITMAP AND [:Q1001]
0 BITMAP INDEX (SINGLE VALUE) OF
'BMAK_MATRIX_SPPRODKEY' (INDEX (BITMAP)) [:Q1001]
PARTITION:KEY STOP=88
0 BITMAP MERGE [:Q1001]
0 BITMAP INDEX (RANGE SCAN) OF
'BMAK_MATRIX_EFFDT' (INDEX (BITMAP)) [:Q1001]
PARTITION:KEY STOP=88
0 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF
'PK_DATE_TIME_DIM' (INDEX (UNIQUE)) [:Q1001]
0 TABLE ACCESS (BY INDEX ROWID) OF 'DATE_TIME_DIM'
(TABLE) [:Q1001]


=========================================================

We are using "dbms_stats.export_schema_stats" to export the stats from the A copy, and then "dbms_stats.import_table_stats" to import the stats into the B copy.

Thanks Again

Regards
Shravani

Tom Kyte
February 16, 2011 - 2:11 pm UTC

same plans, same row source operations, same tim= values (similar enough to be the same)...

are you sure in real life the queries take this long - if you run them in sqlplus - do they run really fast in one schema and really slow (repeatedly) in the other?

This looks more like a strange tkprof issue - the TIM= information isn't matching the elapsed column at all.

Importing Stats

Shravani, February 16, 2011 - 3:35 pm UTC

Yes Tom, please believe me, the query execution is faster in the A copy, where we gather the statistics. Can you please let us know where to look to find the root cause of this discrepancy?
Tom Kyte
February 16, 2011 - 5:27 pm UTC

can you show me a sqlplus cut/paste with timing on (run with set autotrace traceonly statistics too - we don't need to see the data, just the query, the row counts and the timing from sqlplus)

are the data sets on the same devices?
is there other stuff going on on the machine?
do you have ash/awr access?
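Capturing what is asked for above could look like this in each schema (a sketch; "@query1" is a placeholder for the query under test):

```sql
-- run in both schemas and compare the Elapsed time and the "Statistics" section
set timing on
set autotrace traceonly statistics
@query1
set autotrace off
```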

Import Statistics

Shravani, February 17, 2011 - 10:15 am UTC

Thanks so much, Tom. Here is the required input.

Query :
SELECT 'hints', patient_case_dim.case_nbr, therapy_profile_dim.dose_range,
       meddra_dim.llt_cd_desc, meddra_dim.pt_cd_desc,
       meddra_dim.primary_soc_cd_desc, meddra_dim.hlt_cd_desc,
       meddra_dim.hlgt_cd_desc, therapy_profile_dim.multiple_route_admin,
       therapy_profile_dim.multiple_formulation,
       therapy_profile_dim.multiple_indication,
       product_event_fact.interacting_drug_indic_desc,
       patient_demographics_dim.gender_cd_desc,
       patient_demographics_dim.age_group,
       case_profile_dim.case_country_cd_desc,
       case_profile_dim.co_suspect_prod_presence, patient_case_dim.age,
       case_profile_dim.source_derived, case_profile_dim.conmed_prod_presence,
       dm_load_dt.meddra_version_nbr,
       product_event_profile_dim.latency_time_2, 'Default',
       case_profile_dim.patient_history_presence, product_dim.generic_nm,
       patient_case_dim.age_unit_cd_desc, case_profile_dim.case_completeness,
       product_therapy_fact.first_daily_dose_qty,
       CONCAT (RTRIM (patient_case_dim.case_nbr),
               RTRIM (patient_case_dim.source_system_nm)
              ),
       patient_case_dim.patient_case_key, product_dim.drl_cd,
       product_event_fact.latency_days_calculated,
       therapy_start_date.calendar_dt, therapy_end_date.calendar_dt,
       event_profile_dim.event_seriousness_rep_cd_desc,
       patient_demographics_dim.elderly_case_indic,
       onset_date_dim.calendar_dt,
       product_event_profile_dim.ib_labeling_indic,
       product_event_profile_dim.cds_labeling_indic,
       product_event_profile_dim.spc_labeling_indic,
       product_event_profile_dim.uspi_only_labeling_indic,
       meddra_dim.dme_indic, case_profile_dim.case_outcome_hierarchy,
       case_profile_dim.case_seriousness_co_cd_desc,
       CASE
          WHEN (meddra_dim.pt_cd_desc) IS NULL
             THEN (product_event_fact.verbatim_event_term) || '*'
          ELSE (meddra_dim.pt_cd_desc)
       END,
       CASE
          WHEN (meddra_dim.primary_soc_cd_desc) IS NULL
             THEN 'SOC Not Yet Coded'
          ELSE (meddra_dim.primary_soc_cd_desc)
       END,
       product_event_profile_dim.dechallenge_cd_desc,
       event_profile_dim.treatment_rcvd_desc,
       event_profile_dim.event_outcome_rep_cd_desc,
       product_event_profile_dim.change_in_dosage_cd_desc,
       patient_case_dim.age_in_years,
       patient_demographics_dim.age_group_cd_desc,
       case_xref_fact.case_reftype_desc, case_xref_fact.case_reference_id,
       patient_case_dim.first_avail_eff_dt, therapy_start_date.derived_dt,
       therapy_end_date.derived_dt,
       product_event_profile_dim.latency_unit_cd_desc,
       product_event_fact.latency, onset_date_dim.derived_dt
  FROM product_event_fact,
       product_therapy_fact,
       therapy_profile_dim,
       patient_case_dim,
       product_dim,
       case_profile_dim,
       meddra_dim,
       product_event_profile_dim,
       date_time_dim therapy_start_date,
       date_time_dim therapy_end_date,
       date_time_dim onset_date_dim,
       event_profile_dim,
       patient_demographics_dim,
       dm_load_dt,
       case_xref_fact
 WHERE (product_therapy_fact.therapy_profile_key =
                                       therapy_profile_dim.therapy_profile_key
       )
   AND (meddra_dim.meddra_key = product_event_fact.reaction_key)
   AND (event_profile_dim.event_profile_key =
                                          product_event_fact.event_profile_key
       )
   AND (product_event_profile_dim.product_event_profile_key =
                                  product_event_fact.product_event_profile_key
       )
   AND (patient_case_dim.patient_case_key =
                                           product_event_fact.patient_case_key
       )
   AND (patient_case_dim.case_profile_key = case_profile_dim.case_profile_key
       )
   AND (onset_date_dim.year_month_day(+) = product_event_fact.onset_dt_key)
   AND (product_therapy_fact.therapy_start_dt_key = therapy_start_date.year_month_day(+))
   AND (product_therapy_fact.therapy_end_dt_key = therapy_end_date.year_month_day(+))
   AND (product_event_fact.patient_demographics_key =
                             patient_demographics_dim.patient_demographics_key
       )
   AND (    product_event_fact.patient_case_key = product_therapy_fact.patient_case_key(+)
        AND product_event_fact.product_key = product_therapy_fact.product_key(+)
        AND product_event_fact.seq_product = product_therapy_fact.seq_product(+)
       )
   AND (    product_event_fact.product_key = product_dim.product_key
        AND product_dim.role_indic = 'SUSPECT'
       )
   AND ((    dm_load_dt.increment_dt >= patient_case_dim.effective_dt
         AND dm_load_dt.increment_dt < patient_case_dim.end_dt
        )
       )
   AND (case_xref_fact.patient_case_key(+) = patient_case_dim.patient_case_key)
   AND (    (TO_DATE ('23-SEP-2010', 'DD-MON-YYYY') = dm_load_dt.reporting_dt
            )
        AND dm_load_dt.active = 'Y'
       )
   AND (    (TO_DATE ('23-SEP-2010', 'DD-MON-YYYY') = dm_load_dt.reporting_dt
            )
        AND dm_load_dt.active = 'Y'
       )
   AND (    (    patient_case_dim.case_logical_del_indic = 0
             AND (   case_profile_dim.historical_case_indic IS NULL
                  OR case_profile_dim.historical_case_indic = 2
                 )
            )
        AND (product_dim.generic_nm IN ('CELECOXIB') OR 'ALL' IN
                                                                ('CELECOXIB')
            )
        AND (patient_case_dim.case_nbr NOT IN (
                SELECT clm_list_item.aer_no
                  FROM bo_sec.clm_list_item, bo_sec.clm_list
                 WHERE (clm_list.list_id = clm_list_item.list_id)
                   AND (clm_list.list_name IN ('NONE'))
                   AND (clm_list.list_type = 'BYPASS'))
            )
        AND patient_case_dim.patient_case_key IN (
               SELECT DISTINCT product_matrix.patient_case_key
                          FROM date_time_dim init_rcvd_date_condition,
                               dm_load_dt dm_load_dt_condition,
                               product_matrix,
                               product_dim product_dim_suspect
                         WHERE (    product_matrix.suspect_product_key =
                                               product_dim_suspect.product_key
                                AND product_dim_suspect.role_indic = 'SUSPECT'
                               )
                           AND (product_matrix.init_recv_dt_key =
                                       init_rcvd_date_condition.year_month_day
                               )
                           AND ((    dm_load_dt_condition.increment_dt >=
                                                   product_matrix.effective_dt
                                 AND dm_load_dt_condition.increment_dt <
                                                         product_matrix.end_dt
                                )
                               )
                           AND (    (TO_DATE ('23-SEP-2010', 'DD-MON-YYYY') =
                                             dm_load_dt_condition.reporting_dt
                                    )
                                AND dm_load_dt_condition.active = 'Y'
                               )
                           AND (    (   product_dim_suspect.generic_nm IN
                                                                ('CELECOXIB')
                                     OR 'ALL' IN ('CELECOXIB')
                                    )
                                AND (dm_load_dt_condition.reporting_dt >=
                                         SYS_CONTEXT ('pfast', 'mbss_release')
                                    )
                                AND ((init_rcvd_date_condition.calendar_dt) >=
                                        TO_DATE ('01-JAN-2010', 'DD-MON-YYYY')
                                    )
                                AND ((init_rcvd_date_condition.calendar_dt) <=
                                        TO_DATE ('25-SEP-2010', 'DD-MON-YYYY')
                                    )
                               ))
       )
=========================

Statistics from A Schema
=====================================================
21:06:53 SQL> @query1

12659 rows selected.

Elapsed: 00:10:58.07

Statistics
----------------------------------------------------------
       5690  recursive calls
          0  db block gets
   13040813  consistent gets
     454569  physical reads
       1876  redo size
    1883128  bytes sent via SQL*Net to client
      16153  bytes received via SQL*Net from client
        845  SQL*Net roundtrips to/from client
       1755  sorts (memory)
          0  sorts (disk)
      12659  rows processed

21:17:55 SQL> spool off;
=====================================================
Statistics from B schema

21:07:46 SQL> @query1

12659 rows selected.

Elapsed: 00:28:13.07

Statistics
----------------------------------------------------------
        785  recursive calls
          0  db block gets
   16474030  consistent gets
     935724  physical reads
     479484  redo size
    1999291  bytes sent via SQL*Net to client
      16153  bytes received via SQL*Net from client
        845  SQL*Net roundtrips to/from client
         19  sorts (memory)
          0  sorts (disk)
      12659  rows processed

21:36:06 SQL> spool off;
===================================================

The data in both schemas is exactly the same. The only difference is that in the A copy the data gets inserted via an ETL process, while in the B schema we truncate the current partition and copy the data over from schema A. Once the copy process is done, we export the statistics gathered on A using the DBMS_STATS package procedures. The execution plan for the query in the B copy is very different from the one that appears for A.

And sir, when we executed the queries in the A and B copies, there were no background jobs or other processes running.

Can you please let me know what should be done to remove the discrepancy?

Regards
Shravani

Tom Kyte
February 17, 2011 - 12:07 pm UTC

... The data in both the schema is exactly same ...

Nope, sorry, but it is not. Look at the consistent gets. The data VALUES might be the same - but the ordering of the rows on disk is undoubtedly different (compare the statistics in both schemas - look for index clustering factors that are very different - that'll prove that out)

Here is an example - SAME DATA - exactly the SAME DATA - but watch the IO differences:

ops$tkyte%ORA11GR2> create table organized
  2  as
  3  select x.*
  4    from (select * from stage order by object_name) x
  5  /

Table created.

ops$tkyte%ORA11GR2> create table disorganized
  2  as
  3  select x.*
  4    from (select * from stage order by dbms_random.random) x
  5  /

Table created.

ops$tkyte%ORA11GR2> create index organized_idx on organized(object_name);

Index created.

ops$tkyte%ORA11GR2> create index disorganized_idx on disorganized(object_name);

Index created.

ops$tkyte%ORA11GR2> begin
  2  dbms_stats.gather_table_stats
  3  ( user, 'ORGANIZED',
  4    estimate_percent => 100,
  5    method_opt=>'for all indexed columns size 254'
  6  );
  7  dbms_stats.gather_table_stats
  8  ( user, 'DISORGANIZED',
  9    estimate_percent => 100,
 10    method_opt=>'for all indexed columns size 254'
 11  );
 12  end;
 13  /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> select table_name, blocks, num_rows, 0.05*num_rows, 0.10*num_rows from user_tables
  2  where table_name like '%ORGANIZED' order by 1;

TABLE_NAME                         BLOCKS   NUM_ROWS 0.05*NUM_ROWS 0.10*NUM_ROWS
------------------------------ ---------- ---------- ------------- -------------
DISORGANIZED                         1064      72759       3637.95        7275.9
ORGANIZED                            1064      72759       3637.95        7275.9

ops$tkyte%ORA11GR2> select table_name, index_name, clustering_factor from user_indexes
  2  where table_name like '%ORGANIZED' order by 1;

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ------------------------------ -----------------
DISORGANIZED                   DISORGANIZED_IDX                           72581
ORGANIZED                      ORGANIZED_IDX                               1038

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> set autotrace on
ops$tkyte%ORA11GR2> select /*+ index( organized organized_idx) */
  2    count(subobject_name)
  3    from organized;

COUNT(SUBOBJECT_NAME)
---------------------
                 1106


Execution Plan
----------------------------------------------------------
Plan hash value: 2296712572

----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     1 |    17 |  1402   (1)| 00:00:17 |
|   1 |  SORT AGGREGATE              |               |     1 |    17 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| ORGANIZED     | 72759 |  1207K|  1402   (1)| 00:00:17 |
|   3 |    INDEX FULL SCAN           | ORGANIZED_IDX | 72759 |       |   363   (1)| 00:00:05 |
----------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1400  consistent gets
          0  physical reads
          0  redo size
        436  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> select /*+ index( disorganized disorganized_idx) */
  2    count(subobject_name)
  3    from disorganized;

COUNT(SUBOBJECT_NAME)
---------------------
                 1106


Execution Plan
----------------------------------------------------------
Plan hash value: 3014554493

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |     1 |    17 | 72966   (1)| 00:14:36 |
|   1 |  SORT AGGREGATE              |                  |     1 |    17 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| DISORGANIZED     | 72759 |  1207K| 72966   (1)| 00:14:36 |
|   3 |    INDEX FULL SCAN           | DISORGANIZED_IDX | 72759 |       |   363   (1)| 00:00:05 |
-------------------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      72943  consistent gets
          0  physical reads
          0  redo size
        436  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

ops$tkyte%ORA11GR2> set autotrace off



One of your schemas did more than twice the physical I/Os and 3 million more logical I/Os - the data is different.


Try gathering statistics on B directly and compare the statistics of the tables, indexes, and columns - you should see that they are "different".
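When comparing the two schemas, the index clustering factors are the first place to look. A minimal sketch of such a comparison (assuming access to DBA_INDEXES; the schema names A and B are placeholders - substitute your own):

```sql
-- Compare clustering factors for same-named indexes in the two schemas
-- (hypothetical owners A and B). A large difference for the same index
-- is evidence the rows are laid out differently on disk.
select a.index_name,
       a.clustering_factor as a_cf,
       b.clustering_factor as b_cf
  from dba_indexes a, dba_indexes b
 where a.owner = 'A'
   and b.owner = 'B'
   and a.index_name = b.index_name
 order by abs(a.clustering_factor - b.clustering_factor) desc;
```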

Shravani, February 17, 2011 - 1:57 pm UTC

Dear Tom,

Thanks so so much for this answer. Yes after gathering the stats, I can see index clustering factor is very different.

However, I am still not getting why the data in the B copy is disorganized. Do we need to make changes in the copy process?

The current logic is, if the table is partitioned we truncate the current partition in B copy and then select-insert the current partition data from A into B. For the non-partitioned tables we just truncate-insert.

One more thing: each day we drop all the indexes in B and recreate them. So to summarize, the A-to-B copy process works in the following way

1. Drop all indexes in B copy
2. Truncate the current partition or a complete table in B
3. Insert the data from A copy
4. Create all the indexes
5. Clone the statistics from A using DBMS_STATS.Export_Schema_Stats
6. Copy the statistics into b using DBMS_STATS.Import_Schema_Stats

Do you think one of the above steps has led to the disorganized data in the B copy? And can you please give us a hint on a probable resolution for this?

Thanks
Shravani
Tom Kyte
February 17, 2011 - 2:11 pm UTC

show me the code for step 3

tell me about the tablespaces - they are automatic segment space managed or manual segment space managed?

Importing Statistics

Shravani, February 17, 2011 - 3:30 pm UTC

Hi Tom,

The tablespaces are automatic segment space managed.

Here is the code

IF clonetype = 1 THEN
EXECUTE IMMEDIATE ('alter table ' || targetschema || '.' || in_table || ' truncate subpartition ' || in_subpart );
ELSIF clonetype = 2 THEN
EXECUTE IMMEDIATE ('alter table ' || targetschema || '.' || in_table || ' truncate partition ' || in_subpart );
ELSIF clonetype = 3 THEN
EXECUTE IMMEDIATE ('truncate table ' || targetschema || '.' || in_table);
END IF;
/* Now, insert the data in Direct Path mode... */
diag_section := 'APPEND DATA';
mysql := 'insert /*+ APPEND NOLOGGING */ into ' || targetschema || '.' || in_table;
IF clonetype = 1 THEN
mysql := mysql || ' subpartition (' || in_subpart || ') (select * from '|| sourceschema || '.';
mysql := mysql || in_table || ' subpartition (' || in_subpart || '))';
ELSIF clonetype = 2 THEN
mysql := mysql || ' partition (' || in_subpart || ') (select * from ' || sourceschema || '.';
mysql := mysql || in_table || ' partition (' || in_subpart || '))';
ELSIF clonetype = 3 THEN
mysql := mysql || ' (select * from ' || sourceschema || '.' || in_table || ')';
END IF;

Regards
Shravani
Tom Kyte
February 17, 2011 - 6:02 pm UTC

you know that NOLOGGING is meaningless there. NOLOGGING is not a hint; it is not usable in SQL selects/inserts - in that context it is only useful in an ALTER TABLE.

that should be inserting the data in MORE OR LESS the same sort of order - automatic segment space management spreads data out by design, though, so it won't be exact. Also - is parallel involved anywhere?

But if the clustering factors are very different - then, we know that it is laid out differently on disk.

You can use an order by on the insert as select to cause the rows to be organized in some fashion if you so desire.
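Following that suggestion, the insert-as-select in the copy procedure could carry an ORDER BY so the rows land in B in roughly index-key order. A sketch under assumed names (the table, partition, and column names here are illustrative, mirroring the copy code above):

```sql
-- Direct-path insert with an ORDER BY on the driving query; rows are
-- written to B in object_name order, which keeps the clustering factor
-- of an index on that column low. Names are hypothetical.
insert /*+ APPEND */ into b.my_table partition (p_current)
select *
  from a.my_table partition (p_current)
 order by object_name;

commit;
```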

Thanks so much Tom

shravani, February 21, 2011 - 7:31 am UTC

Thanks for the clarification Tom. We are now changing the INSERT method by adding the ORDER BY clause.


granularity with where condition

A reader, March 25, 2011 - 8:50 am UTC

Is it possible to gather table stats based on where condition

granularity=>"some_column=some_value"
Tom Kyte
March 25, 2011 - 11:21 am UTC

no, why??? what is your goal.

gather_stats

A reader, March 26, 2011 - 4:25 am UTC

I do not want to subpartition further. I would like to gather statistics on a subset of the data.
Tom Kyte
March 29, 2011 - 2:40 am UTC

We can only gather statistics on a segment. If you do not want to partition further, you have to gather statistics on what you have.

Analyze

A reader, July 15, 2011 - 3:49 pm UTC

Hi Tom - I thought multiple DML statements against a table should be followed by an analyze of the table and its indexes. I did as below but don't see any effect. Does multiple DML on a table keep the index intact, so that an analyze is needed?



-- create an empty copy of all_objects with an extra column to index
create table Test_Index as select a.*,a.OBJECT_ID indx_col from all_objects a where 1=2;

--create an index on the table
create index indx_indx_col on test_index(indx_col );

--analyze the table , nothing to analyze
analyze table Test_Index compute statistics
for table
for all indexes
for all indexed columns;

-- insert into the table
insert into test_index select a.*,a.OBJECT_ID indx_col from all_objects a;


-- see the sql plan

set autot traceonly
select * from test_index where indx_col=&1;

-- delete from the table
delete from test_index;

-- insert into the table
insert into test_index select a.*,a.OBJECT_ID indx_col from all_objects a;


-- insert into the table
insert into test_index select a.*,a.OBJECT_ID indx_col from all_objects a;

-- see the sql plan

set autot traceonly
select * from test_index where indx_col=&1;

--analyze the table , nothing to analyze
analyze table Test_Index compute statistics
for table
for all indexes
for all indexed columns;

-- see the sql plan
set autot traceonly
select * from test_index where indx_col=&1;

Please suggest.
Tom Kyte
July 18, 2011 - 10:14 am UTC

you should never use analyze to gather statistics, please ONLY use dbms_stats.


a gathering of statistics MAY affect the plan, it may not. It gives an opportunity for a plan change, it does not mean "the plan will change, the plan has to change". It simply means "plans might change, then again, plans might stay the same"


In general, you DO NOT want to gather statistics after multiple DML, it is not necessary, you do it when the data changes greatly.


when I ran the above (changing analyze to dbms_stats), it just always decides to use the index, nothing wrong or suspicious there.
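For reference, the dbms_stats equivalent of the ANALYZE ... COMPUTE STATISTICS commands in the test above would be along these lines (a sketch; method_opt and the other parameters can be tuned to taste):

```sql
-- Gather table, index, and indexed-column statistics via dbms_stats
-- instead of ANALYZE; cascade=>true takes care of the indexes.
begin
  dbms_stats.gather_table_stats
  ( ownname    => user,
    tabname    => 'TEST_INDEX',
    cascade    => true,
    method_opt => 'for all indexed columns'
  );
end;
/
```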

Any Example

A reader, July 19, 2011 - 6:19 pm UTC

Hi Tom - Are there any specific examples/links on your site where I can look up what you meant by "In general, you DO NOT want to gather statistics after multiple DML, it is not necessary, you do it when the data changes greatly."?

I am interested in gathering more information on "data changes greatly".

Please help.

Thanks as always

Another use for dbms_stats?

Raj, August 19, 2011 - 3:20 am UTC

Hi Tom,
I was trying to find another use for dbms_stats exports by trending a table's growth (row count) over time. If I take a daily export (before each new gather), I have trouble figuring out which columns in my export table are reliable. I realize this isn't totally accurate; it's more a problem of not understanding the mapping. E.g.

select count(*) from test.a;
COUNT(*)
----------
107929

exec dbms_stats.gather_table_stats(ownname=>'test',tabname=>'a',estimate_percent=>0.1,granularity=>'ALL',degree=>2,cascade=>true);

select count(*) from test.a;
COUNT(*)
----------
107929

exec dbms_stats.create_stat_table('test','a_statexp','test');
exec dbms_stats.export_table_stats(ownname=>'test', tabname=>'a', stattab=>'a_statexp', statown=>'test');

I find 3 columns with the value 107929:
select count(*) from test.a_statexp where N1=107929;
COUNT(*)
----------
5
select count(*) from test.a_statexp where N3=107929;
COUNT(*)
----------
1
select count(*) from test.a_statexp where N8=107929;
COUNT(*)
----------
5

When I look at the value of C1 in the a_statexp table, I see that it's the primary key rather than the table itself. It sort of makes sense, but doesn't seem very trustworthy.

Is there a better way?

Re: Another use for dbms_stats?

Raj, August 19, 2011 - 10:37 am UTC

I forgot to include the concluding statement I was wondering about...
select n1 from test.a_statexp where c1 = pk_index_of_tab_a;

That's the statement that seems a little flaky to me (rather than finding the column that represents the num_rows on the table itself... if it exists!).
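An alternative to decoding the undocumented stattab layout (the N1/N3/N8 columns) would be to snapshot the documented dictionary views before each new gather. A sketch, assuming access to DBA_TAB_STATISTICS and using a hypothetical trend table of your own:

```sql
-- Hypothetical history table; record the documented NUM_ROWS and
-- gather date rather than decoding the export table's columns.
create table stats_trend
( snap_date   date,
  owner       varchar2(30),
  table_name  varchar2(30),
  num_rows    number
);

insert into stats_trend
select sysdate, owner, table_name, num_rows
  from dba_tab_statistics
 where owner = 'TEST';

commit;
```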

analyze and dbms_stats

Sushil, August 23, 2011 - 1:36 pm UTC

Tom, I tried to analyze a couple of tables in an Oracle 11g database; some I could analyze and some I could not. Also, if I run the following gather table stats, it succeeds for some tables and raises an error for others.
BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'RPAPPL',TABNAME=>'ECHO_ZHOURLY',estimate_percent=>10,
method_opt=>'FOR ALL COLUMNS SIZE 1',Degree=>4, GRANULARITY=>'AUTO', CASCADE=>TRUE); END;
/
error:
BEGIN DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'RPAPPL',TABNAME=>'ECHO_ZHOURLY',estimate_percent=>10,
method_opt=>'FOR ALL COLUMNS SIZE 1',Degree=>4, GRANULARITY=>'AUTO', CASCADE=>TRUE); END;
Error at line 7
ORA-20001: Invalid or inconsistent input values
ORA-06512: at "SYS.DBMS_STATS", line 18408
ORA-06512: at "SYS.DBMS_STATS", line 18429
ORA-06512: at line 1
Script Terminated on line 7.

can you please help me to identify the issue?
Tom Kyte
August 30, 2011 - 3:25 pm UTC

I would highly recommend in 11g using the default for estimate_percent - do not sample anymore, let it figure out what it wants to do. Highly recommend that.


You would need to give me a complete example if you want me to comment. I need to be able to reproduce the issue.
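A sketch of the recommended 11g call, leaving estimate_percent at its default (DBMS_STATS.AUTO_SAMPLE_SIZE) so the database decides how to sample; the schema and table names are taken from the question above:

```sql
-- 11g: omit estimate_percent entirely so AUTO_SAMPLE_SIZE is used.
begin
  dbms_stats.gather_table_stats
  ( ownname => 'RPAPPL',
    tabname => 'ECHO_ZHOURLY',
    cascade => true
  );
end;
/
```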

will be of more use

venkata, October 10, 2011 - 5:14 am UTC

Tom,

I have observed that you use @connect user/password in all your replies. Can I know what you have written in that connect.sql? I hope this is not classified ;)
Tom Kyte
October 10, 2011 - 10:48 am UTC

it isn't - and I've published it before - but since 10g it isn't necessary anymore.

It was simply:


set termout off
connect &1
@login
set termout on


because earlier releases of sqlplus didn't re-run the login.sql file (I set my prompt in that). But sqlplus does that now so I don't use it at all anymore.

rbo query

sandeep, December 11, 2011 - 11:34 am UTC

Hi Tom,

We have a 9i database which runs in RBO mode. One query (shown below) that used to take 1 hour to complete took 32 hours to complete after running dbms_stats on those tables.

explain plan for
SELECT /*+ USE_NL(CFA) */
SUM( CFA.TRNSCTN_AMNT) NOB67AMNT
FROM
(SELECT C.CLM_ID,
TO_CHAR(RPTD_DATE, 'YYYY') RPT_YEAR,
C.ACCNT_PRGRM_PRD_ID
FROM CLM_R C,
PLCY_SRVC P,
LOSS_R L
WHERE C.CLM_ID > 0 AND
C.PLCY_SRVC_ID = P.PLCY_SRVC_ID AND
P.LOB_CODE LIKE '%DD' AND
C.LOSS_ID = L.LOSS_ID AND
TO_CHAR(L.RPTD_DATE, 'YYYY') > TO_CHAR(TO_NUMBER(TO_CHAR(TRUNC(SYSDATE),'YYYY'))-10) AND
C.LMTBL_PRDCT_ID =
(SELECT LMTBL_PRDCT_ID
FROM LMTBL_PRDCT
WHERE LMTBL_PRDCT_SHRT_NAME = 'WC')) C,
SVNGS_FCTR_HSTRY D,
NTR_OF_BNFT_R NOB,
CLM_FNNCL_ACTVTY_R CFA,
ENTRY_CODE_R E
WHERE CFA.CLM_FNNCL_ACTVTY_ID > 0 AND
E.ENTRY_CODE_NAME NOT LIKE 'NH%' AND
CFA.ENTRY_CODE_ID = E.ENTRY_CODE_ID AND
C.CLM_ID = CFA.CLM_ID AND
CFA.NTR_OF_BNFT_ID = NOB.NTR_OF_BNFT_ID AND
D.ACCNT_PRGRM_PRD_ID = C.ACCNT_PRGRM_PRD_ID AND
NOB.NTR_OF_BNFT_CODE = '67' AND
TO_CHAR(CFA.ROW_INSRTD_DATE, 'YYYY') >= C.RPT_YEAR AND
TO_CHAR(D.VLTN_MNTH_YEAR,'MM/YYYY')= TO_CHAR(ADD_MONTHS(TRUNC(SYSDATE), -1),'MM/YYYY') AND
TO_DATE(CFA.ROW_INSRTD_DATE, 'DD-MON-YYYY') <= TO_DATE(LAST_DAY(ADD_MONTHS(TRUNC(SYSDATE), -1)), 'DD-MON-YYYY');


---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 140 | 2656 |
| 1 | SORT AGGREGATE | | 1 | 140 | |
| 2 | NESTED LOOPS | | 1 | 140 | 2656 |
| 3 | NESTED LOOPS | | 11 | 1375 | 2655 |
| 4 | NESTED LOOPS | | 11 | 1254 | 2654 |
| 5 | NESTED LOOPS | | 920 | 68080 | 2470 |
| 6 | NESTED LOOPS | | 5784 | 338K| 1892 |
| 7 | NESTED LOOPS | | 422 | 10550 | 14 |
| 8 | TABLE ACCESS BY INDEX ROWID| NTR_OF_BNFT_R | 1 | 9 | 1 |
|* 9 | INDEX UNIQUE SCAN | NTR_OF_BNFT_CK01 | 1 | | |
|* 10 | INDEX FULL SCAN | SVNGS_FCTR_HSTRY_CK01 | 422 | 6752 | 246 |
|* 11 | TABLE ACCESS BY INDEX ROWID | CLM_R | 14 | 490 | 5 |
|* 12 | INDEX RANGE SCAN | CLM_R_PF01 | 192 | | 2 |
| 13 | TABLE ACCESS BY INDEX ROWID| LMTBL_PRDCT | 1 | 13 | 1 |
|* 14 | INDEX UNIQUE SCAN | LMTBL_PRDCT_CK02 | 1 | | |
|* 15 | TABLE ACCESS BY INDEX ROWID | PLCY_SRVC | 1 | 14 | 1 |
|* 16 | INDEX UNIQUE SCAN | PLCY_SRVC_PK | 1 | | 1 |
|* 17 | TABLE ACCESS BY INDEX ROWID | CLM_FNNCL_ACTVTY_R | 1 | 40 | 1 |
|* 18 | INDEX RANGE SCAN | CLM_FNNCL_ACTVTY_PF02 | 1 | | 3 |
|* 19 | TABLE ACCESS BY INDEX ROWID | ENTRY_CODE_R | 1 | 11 | 1 |
|* 20 | INDEX UNIQUE SCAN | SYS_C0010550 | 1 | | |
|* 21 | TABLE ACCESS BY INDEX ROWID | LOSS_R | 1 | 15 | 1 |
|* 22 | INDEX UNIQUE SCAN | LOSS_R_PK | 1 | | 1 |
---------------------------------------------------------------------------------------------

When I apply the RULE hint, it completes within 1 hour.


explain plan for
SELECT /*+ RULE */
SUM( CFA.TRNSCTN_AMNT) NOB67AMNT
FROM
(SELECT C.CLM_ID,
TO_CHAR(RPTD_DATE, 'YYYY') RPT_YEAR,
C.ACCNT_PRGRM_PRD_ID
FROM CLM_R C,
PLCY_SRVC P,
LOSS_R L
WHERE C.CLM_ID > 0 AND
C.PLCY_SRVC_ID = P.PLCY_SRVC_ID AND
P.LOB_CODE LIKE '%DD' AND
C.LOSS_ID = L.LOSS_ID AND
TO_CHAR(L.RPTD_DATE, 'YYYY') > TO_CHAR(TO_NUMBER(TO_CHAR(TRUNC(SYSDATE),'YYYY'))-10) AND
C.LMTBL_PRDCT_ID =
(SELECT LMTBL_PRDCT_ID
FROM LMTBL_PRDCT
WHERE LMTBL_PRDCT_SHRT_NAME = 'WC')) C,
SVNGS_FCTR_HSTRY D,
NTR_OF_BNFT_R NOB,
CLM_FNNCL_ACTVTY_R CFA,
ENTRY_CODE_R E
WHERE CFA.CLM_FNNCL_ACTVTY_ID > 0 AND
E.ENTRY_CODE_NAME NOT LIKE 'NH%' AND
CFA.ENTRY_CODE_ID = E.ENTRY_CODE_ID AND
C.CLM_ID = CFA.CLM_ID AND
CFA.NTR_OF_BNFT_ID = NOB.NTR_OF_BNFT_ID AND
D.ACCNT_PRGRM_PRD_ID = C.ACCNT_PRGRM_PRD_ID AND
NOB.NTR_OF_BNFT_CODE = '67' AND
TO_CHAR(CFA.ROW_INSRTD_DATE, 'YYYY') >= C.RPT_YEAR AND
TO_CHAR(D.VLTN_MNTH_YEAR,'MM/YYYY')= TO_CHAR(ADD_MONTHS(TRUNC(SYSDATE), -1),'MM/YYYY') AND
TO_DATE(CFA.ROW_INSRTD_DATE, 'DD-MON-YYYY') <= TO_DATE(LAST_DAY(ADD_MONTHS(TRUNC(SYSDATE), -1)), 'DD-MON-YYYY');

----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | SORT AGGREGATE | | | | |
|* 2 | FILTER | | | | |
| 3 | NESTED LOOPS | | | | |
| 4 | NESTED LOOPS | | | | |
| 5 | NESTED LOOPS | | | | |
| 6 | NESTED LOOPS | | | | |
| 7 | NESTED LOOPS | | | | |
| 8 | NESTED LOOPS | | | | |
| 9 | TABLE ACCESS BY INDEX ROWID| NTR_OF_BNFT_R | | | |
|* 10 | INDEX UNIQUE SCAN | NTR_OF_BNFT_CK01 | | | |
|* 11 | TABLE ACCESS BY INDEX ROWID| CLM_FNNCL_ACTVTY_R | | | |
|* 12 | INDEX RANGE SCAN | CLM_FNNCL_ACTVTY_FK01 | | | |
|* 13 | TABLE ACCESS BY INDEX ROWID | ENTRY_CODE_R | | | |
|* 14 | INDEX UNIQUE SCAN | SYS_C0010550 | | | |
| 15 | TABLE ACCESS BY INDEX ROWID | CLM_R | | | |
|* 16 | INDEX UNIQUE SCAN | CLM_PK | | | |
|* 17 | TABLE ACCESS BY INDEX ROWID | LOSS_R | | | |
|* 18 | INDEX UNIQUE SCAN | LOSS_R_PK | | | |
|* 19 | TABLE ACCESS BY INDEX ROWID | PLCY_SRVC | | | |
|* 20 | INDEX UNIQUE SCAN | PLCY_SRVC_PK | | | |
|* 21 | INDEX RANGE SCAN | SVNGS_FCTR_HSTRY_CK01 | | | |
| 22 | TABLE ACCESS BY INDEX ROWID | LMTBL_PRDCT | | | |
|* 23 | INDEX UNIQUE SCAN | LMTBL_PRDCT_CK02 | | | |
----------------------------------------------------------------------------------------------

We have a job to run the dbms_stats weekly as shown below.

exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','CLM_FNNCL_ACTVTY_R',estimate_percent =>25,degree =>4,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','CLM_R',estimate_percent =>100,degree =>4,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','LOSS_R',estimate_percent =>100,degree =>4,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','PLCY_SRVC',estimate_percent =>100,degree =>1,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','SVNGS_FCTR_HSTRY',estimate_percent =>100,degree =>1,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','LMTBL_PRDCT',estimate_percent =>100,degree =>1,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','ENTRY_CODE_R',estimate_percent =>100,degree =>1,cascade => TRUE);
exec sys.DBMS_STATS.GATHER_TABLE_STATS ('PSTPP','NTR_OF_BNFT_R',estimate_percent =>100,degree =>1,cascade => TRUE);

But last week I ran dbms_stats for clm_r in the way shown below, and after that the weekly analyze job also ran. I think this is the reason clm_r comes inside the nested loop before CLM_FNNCL_ACTVTY_R, which is causing so much delay.

dbms_stats.gather_table_stats (
ownname => user,
tabname => 'CLM_R',
estimate_percent => 100,
method_opt => 'for all columns size auto',
degree => 4,
cascade => TRUE
);
The size of the tables involved in the above query is given below .

OWNER SEGMENT_NAME SEGMENT_TYPE mb
------------------------------ ----------------------- ------------------ ----------
PSTPP PLCY_SRVC TABLE 120
PSTPP NTR_OF_BNFT_R TABLE .125
PSTPP CLM_FNNCL_ACTVTY_R TABLE 23552
PSTPP ENTRY_CODE_R TABLE .125
PSTPP LOSS_R TABLE 4736
PSTPP LMTBL_PRDCT TABLE .125
PSTPP SVNGS_FCTR_HSTRY TABLE 1.875
PSTPP CLM_R TABLE 5504

Can you please suggest how I should run dbms_stats to get a stable plan? The plan keeps changing, causing the query to finish in 30 hours instead of finishing within 1 hour.
Tom Kyte
December 11, 2011 - 2:56 pm UTC

why do you have a hint in it?

are the actual cardinalities (number of rows) flowing through the steps of the plan anywhere close to the estimated cardinalities?

it seems you had a method for getting good stats before and you changed your method and it didn't work out so well - is that correct?

rbo query

sandeep, December 14, 2011 - 10:14 pm UTC

Hi Tom,
Thanks for your reply. This query with the USE_NL hint has been used for a long time (maybe some years) in our database, so the developers are reluctant to change the code. Previously we collected stats using the "analyze table .. estimate statistics" method; I have since changed it to dbms_stats.
Now I have deleted the statistics of one table - clm_r - and gathered them the old way, with "analyze table .. estimate statistics", for that table only. The rest are gathered with dbms_stats. Now when I run the query the result comes within 45 minutes. I am a bit confused about which way to collect the stats. Can you please suggest which method I should follow?

Many thanks in advance.
sandeep
Tom Kyte
December 16, 2011 - 6:13 am UTC

the code is not working, why are they reluctant?

get rid of it if you want to discuss it with me.



dbms_stats and analyze result in different plan

A reader, February 26, 2012 - 9:23 pm UTC

Hi Tom,

Could you please elaborate on why 'dbms_stats' and 'analyze' result in different plans?

Below is a test script for your reference.


create table test as select rownum id, 'test' text from dual connect by level <= 10000;
create index test_i on test(id);
delete test where id <= 9990;
commit;
exec dbms_stats.gather_table_stats(ownname=>user, tabname=> 'TEST', cascade => true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
select * from test where id between 1 and 10000;
--IndexRangeScan
analyze index test_i compute statistics;
select * from test where id between 1 and 10000;
--FTS. Even though there are only 10 rows now, the HWM is still high, so a FTS is bad.
Tom Kyte
February 28, 2012 - 6:26 am UTC

because analyze has not been touched in many years as far as stats gathering goes; because analyze is deprecated for gathering stats; because dbms_stats is the only way to gather stats for the optimizer - the optimizer is coded expecting the numbers to have come from dbms_stats.


In short, they generate different statistics, different statistics often lead to different plans.


also, your two commands are entirely different - you gathered entirely different sets of stats.

Analyze any sys priv in 11gr2

A reader, March 19, 2012 - 10:51 am UTC

Tom,
I recently encountered some strange behavior. After upgrading from 10.2.0.3 to 11gr2, a package stopped working. This package, owned by OWNER1, uses dbms_stats to gather stats on tables owned by another user, OWNER2. It worked fine in 10g, but in 11gr2 I started getting:

dbms_stats.gather_table_stats(ownname=>'OWNER2',tabname=>'T_1',cascade=>true);

ORA-20000: Unable to analyze TABLE "OWNER2"."T_1", insufficient privileges or does not exist

When I grant "ANALYZE ANY" privilege to package owner OWNER1, then I do not get any error.

Question is: Is this a change from 10g to 11gr2? In 10g the same command worked without granting the "ANALYZE ANY" privilege.

Thanks...
Tom Kyte
March 19, 2012 - 9:01 pm UTC

how did you upgrade?

these are the table privs in 10g:

http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_9013.htm#sthref9340

analyze is not one of them.

you would have needed that in 10g

Analyze any sys priv in 11gr2

A reader, March 20, 2012 - 7:39 am UTC

The upgrade was done via a cold backup of 10g, restore on 11g, startup in upgrade mode, and running catupgrd. So no 10g privileges were modified. The 10g database is still intact and I compared all the user privileges between 10g and 11g - they are the same. I have checked all roles assigned to OWNER1 and they match between 10g and 11g. Somehow, in 10g OWNER1 can analyze OWNER2's objects without the "ANALYZE ANY" privilege.

I just wanted to know if you were aware of something which would cause this to happen in 10g, like any initialization parameter or something which I might have missed.

Based on your reply, I think my best option is to open an SR with support.

Thanks...
Tom Kyte
March 20, 2012 - 9:52 am UTC

show us the set of privileges granted to the user in 10g - reproduce this.

Do this before opening an SR.


show me how, in 10g, you were able to do this - what grants did you give the user in 10g that made this possible?

Analyze any sys priv in 11gr2

A reader, March 20, 2012 - 10:21 am UTC

Seems like a 10g bug...

C:\>sqlplus system

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Mar 20 10:58:35 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Enter password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> drop user USER1 cascade;
drop user USER1 cascade
          *
ERROR at line 1:
ORA-01918: user 'USER1' does not exist


SQL> drop user USER2 cascade;
drop user USER2 cascade
          *
ERROR at line 1:
ORA-01918: user 'USER2' does not exist


SQL> create user USER1 identified by USER1;

User created.

SQL> grant create session to USER1;

Grant succeeded.

SQL> create user USER2 identified by USER2;

User created.

SQL> grant create session, create table to USER2;

Grant succeeded.

SQL> alter user USER2 quota unlimited on users_data;

User altered.

SQL> conn USER2/USER2
Connected.
SQL> create table t1 as select * from all_objects;

Table created.

SQL> conn USER1/USER1
Connected.
SQL> exec dbms_stats. gather_table_stats (ownname=>'USER2',tabname=>'T1');

PL/SQL procedure successfully completed.

SQL>


-- 11g

C:\Users\oradba>sqlplus system

SQL*Plus: Release 11.2.0.2.0 Production on Tue Mar 20 11:03:20 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters and Automatic Storage Management options

SQL> drop user USER1 cascade;
drop user USER1 cascade
          *
ERROR at line 1:
ORA-01918: user 'USER1' does not exist


SQL> drop user USER2 cascade;
drop user USER2 cascade
          *
ERROR at line 1:
ORA-01918: user 'USER2' does not exist


SQL> create user USER1 identified by USER1;

User created.

SQL> grant create session to USER1;

Grant succeeded.

SQL> create user USER2 identified by USER2;

User created.

SQL> grant create session, create table to USER2;

Grant succeeded.


SQL> alter user USER2 quota unlimited on users;

User altered.

SQL> conn USER2/USER2
Connected.
SQL> create table t1 as select * from all_objects;

Table created.

SQL> conn USER1/USER1
Connected.
SQL> exec dbms_stats. gather_table_stats (ownname=>'USER2',tabname=>'T1');
BEGIN dbms_stats. gather_table_stats (ownname=>'USER2',tabname=>'T1'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "USER2"."T1", insufficient privileges or
does not exist
ORA-06512: at "SYS.DBMS_STATS", line 23427
ORA-06512: at "SYS.DBMS_STATS", line 23489
ORA-06512: at line 1


SQL>

Tom Kyte
March 20, 2012 - 3:19 pm UTC

It doesn't work like that by default in 10g

ps$tkyte%ORA10GR2> drop user USER1 cascade;

User dropped.

ops$tkyte%ORA10GR2> drop user USER2 cascade;

User dropped.

ops$tkyte%ORA10GR2> create user USER1 identified by USER1;

User created.

ops$tkyte%ORA10GR2> grant create session to USER1;

Grant succeeded.

ops$tkyte%ORA10GR2> create user USER2 identified by USER2 default tablespace users quota unlimited on users;

User created.

ops$tkyte%ORA10GR2> grant create session, create table to USER2;

Grant succeeded.

ops$tkyte%ORA10GR2> conn USER2/USER2
Connected.
user2%ORA10GR2> create table t1 as select * from all_objects where rownum = 1;

Table created.

user2%ORA10GR2> conn USER1/USER1
Connected.
user1%ORA10GR2> exec dbms_stats. gather_table_stats (ownname=>'USER2',tabname=>'T1');
BEGIN dbms_stats. gather_table_stats (ownname=>'USER2',tabname=>'T1'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "USER2"."T1", insufficient privileges or
does not exist
ORA-06512: at "SYS.DBMS_STATS", line 15017
ORA-06512: at "SYS.DBMS_STATS", line 15049
ORA-06512: at line 1




I believe (firmly) that your 10g database had ANALYZE ANY granted to PUBLIC. Please check what privileges PUBLIC has in your 10g database.

Nothing changed...

A reader, March 20, 2012 - 8:11 pm UTC

SQL> select * from dba_sys_privs where grantee='PUBLIC';

no rows selected

SQL>

I have tested in 3 different 10.2.0.3 databases and get the same behavior. I can run gather stats against a different user's schema even without any privilege.

I am sure it must be a 10.2.0.3 bug which was fixed in 10.2.0.4 or later. If it were a wrongly granted system privilege, it would have carried over into the 11g database, since the 11g database was created from a cold backup of 10g.

Tom Kyte
March 20, 2012 - 9:17 pm UTC

I don't have a 10.2.0.3 instance to test with me this week...


Can anyone else out there with 10.2.0.3 try this out? I don't see any bugs that would have let this happen.

A reader, March 20, 2012 - 9:41 pm UTC

I have a bug number but no details. It might not be a published bug.

Bug 12714396: USER IS ABLE TO EXECUTE DBMS_STATS WITHOUT ANY PRIVILEGE
Tom Kyte
March 20, 2012 - 10:16 pm UTC

that was a bug closed as NOT A BUG

the bug was opened because someone could execute dbms_stats to gather stats on a schema without the "analyze any" privilege.

but what was in fact happening was that NOTHING in that schema needed stats, so nothing was gathered. If something had needed stats - it would have failed, as it should.


that bug does not apply here at all.

Analyze any sys priv in 11gr2

A reader, March 21, 2012 - 8:15 am UTC

Thanks for looking at the bug. Since this is not very urgent, please see if you can find a 10.2.0.3 database to test as per your convenience.

I will also try to create a 10.2.0.3 database and check if it works out of the box. Then I will upgrade to 10.2.0.4 and test whether it is resolved. I will have to start with the OS install, so it will take a couple of days.

Thanks for following up on this one even with your extremely busy schedule.
Tom Kyte
March 21, 2012 - 10:32 am UTC

If you can, please do this in 10g and report back the entire set of apparent system privs the newly created user has:

PLEASE REVIEW CAREFULLY, it operates as sysdba to create a new "dictionary"-like view that shows all of the roles granted to someone


user1%ORA10GR2> connect / as sysdba
Connected.
sys%ORA10GR2>
sys%ORA10GR2> create or replace view user_role_hierarchy
  2  as
  3  select u2.name granted_role
  4    from ( select *
  5             from sys.sysauth$
  6          connect by prior privilege# = grantee#
  7            start with grantee# = uid or grantee# = 1) sa,
  8         sys.user$ u2
  9   where u2.user#=sa.privilege#
 10  union all select user from dual
 11  union all select 'PUBLIC' from dual
 12  /

View created.

sys%ORA10GR2> grant select on user_role_hierarchy to public;

Grant succeeded.

sys%ORA10GR2> create or replace public synonym user_role_hierarchy for user_role_hierarchy;

Synonym created.

sys%ORA10GR2>
sys%ORA10GR2> drop user user1 cascade;

User dropped.

sys%ORA10GR2> create user user1 identified by user1;

User created.

sys%ORA10GR2> grant create session to user1;

Grant succeeded.

sys%ORA10GR2> grant select on dba_sys_privs to user1;

Grant succeeded.

sys%ORA10GR2>
sys%ORA10GR2> connect user1/user1;
Connected.
user1%ORA10GR2>
user1%ORA10GR2> select * from user_role_hierarchy;

GRANTED_ROLE
------------------------------
USER1
PUBLIC

user1%ORA10GR2>
user1%ORA10GR2> select distinct PRIVILEGE
  2    from dba_sys_privs
  3   where grantee in ( select * from user_role_hierarchy )
  4  /

PRIVILEGE
----------------------------------------
CREATE SESSION

user1%ORA10GR2>

A reader, March 21, 2012 - 11:49 am UTC

C:\>sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Wed Mar 21 12:45:20 2012

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> create or replace view user_role_hierarchy
  2  as
  3  select u2.name granted_role
  4    from ( select *
  5           from sys.sysauth$
  6           connect by prior privilege# = grantee#
  7           start with grantee# = uid or grantee# = 1) sa,
  8           sys.user$ u2
  9               where u2.user#=sa.privilege#
 10   union all
 11          select user from dual
 12   union all
 13          select 'PUBLIC' from dual
 14  /

View created.

SQL> grant select on user_role_hierarchy to public;

Grant succeeded.

SQL> create or replace public synonym user_role_hierarchy for user_role_hierarchy;

Synonym created.

SQL> drop user user1 cascade;

User dropped.

SQL> create user user1 identified by user1;

User created.

SQL> grant create session to user1;

Grant succeeded.

SQL> grant select on dba_sys_privs to user1;

Grant succeeded.

SQL> connect user1/user1
Connected.
SQL> select * from user_role_hierarchy;

GRANTED_ROLE
------------------------------
USER1
PUBLIC

SQL> select distinct PRIVILEGE
  2      from dba_sys_privs
  3     where grantee in ( select * from user_role_hierarchy )
  4  /

PRIVILEGE
----------------------------------------
CREATE SESSION

SQL>

Tom Kyte
March 21, 2012 - 10:44 pm UTC

that is perplexing, I would contact support at this point - you should not be able to analyze ANYONE'S tables - period.


can anyone else with 10.2.0.3 available reproduce?

Analyze any sys priv in 11gr2

A reader, March 22, 2012 - 8:08 am UTC

Tom,
I would really like to express my sincere thanks to you for working on this issue. I have seen your busy schedule on the calendar. It is really amazing that you still find time to help Oracle users.

I did open a SR and the only thing I got out of support is:
"In 11g we tightened security on ANALYZE. It is a behavior change between versions."

Since your test on higher versions of 10g shows that dbms_stats fails without privileges, I have to disagree with that statement. Also, if it were a behavior change, there would be some documentation of it in the new features guide or the PL/SQL Packages and Types Reference.

Tomorrow I am going to create a VM and test with various versions of 10g to observe the behavior out of the box.

10.2.0.1.0 - 10.2.0.3.0 Results

Mark Williams, March 22, 2012 - 6:21 pm UTC

Here's some results from versions 10.2.0.1.0, 10.2.0.2.0, and 10.2.0.3.0 on a test VM (Windows Server 2003 32-bit).

Quick summary: I couldn't reproduce the issue.

Common tasks for each version:

create user USER1 identified by USER1
default tablespace users
temporary tablespace temp
quota unlimited on users;

grant create session to USER1;

create user USER2 identified by USER2
default tablespace users
temporary tablespace temp
quota unlimited on users;

grant create session, create table to USER2;

conn USER2/USER2

create table t1 as select * from all_objects;

conn USER1/USER1

exec dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1');


Here are the results for each version with the common tasks snipped out for space:

SQL> @get_version

PRODUCT                                          VERSION
------------------------------------------------ ----------
NLSRTL                                           10.2.0.1.0
Oracle Database 10g Enterprise Edition           10.2.0.1.0
PL/SQL                                           10.2.0.1.0
TNS for 32-bit Windows:                          10.2.0.1.0

4 rows selected.

SQL> exec dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1');
BEGIN dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "USER2"."T1", insufficient privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13046
ORA-06512: at "SYS.DBMS_STATS", line 13076
ORA-06512: at line 1


SQL> @get_version

PRODUCT                                          VERSION
------------------------------------------------ ----------
NLSRTL                                           10.2.0.2.0
Oracle Database 10g Enterprise Edition           10.2.0.2.0
PL/SQL                                           10.2.0.2.0
TNS for 32-bit Windows:                          10.2.0.2.0

4 rows selected.

SQL> exec dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1');
BEGIN dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "USER2"."T1", insufficient privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13149
ORA-06512: at "SYS.DBMS_STATS", line 13179
ORA-06512: at line 1


SQL> @get_version

PRODUCT                                          VERSION
------------------------------------------------ ----------
NLSRTL                                           10.2.0.3.0
Oracle Database 10g Enterprise Edition           10.2.0.3.0
PL/SQL                                           10.2.0.3.0
TNS for 32-bit Windows:                          10.2.0.3.0

4 rows selected.

SQL> exec dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1');
BEGIN dbms_stats.gather_table_stats (ownname=>'USER2',tabname=>'T1'); END;

*
ERROR at line 1:
ORA-20000: Unable to analyze TABLE "USER2"."T1", insufficient privileges or does not exist
ORA-06512: at "SYS.DBMS_STATS", line 13172
ORA-06512: at "SYS.DBMS_STATS", line 13202
ORA-06512: at line 1


Tom Kyte
March 22, 2012 - 7:04 pm UTC

Thanks Mark - I'm pretty sure there is something funky that I just cannot see with the original poster's database.

That would have been a pretty large hole had it existed.

A reader, July 23, 2012 - 1:27 pm UTC

Tom,
When I gather table stats using a procedure that's scheduled via dbms_job, I get the error below. If I gather stats on the table directly, I don't get that error. Any reason why? Your reply is greatly appreciated.

Unable to analyze TABLE , insufficient
privileges or does not exist

A reader, July 30, 2012 - 12:39 pm UTC

That doesn't tell me what to do if I run the procedure through the job scheduler. I submitted a TAR with Oracle Support and the analyst said I have to grant the "analyze any table" and "analyze any schema" privileges to the dbsnmp user. I can gather stats if I run it using my ID, which has the DBA privilege.
Tom Kyte
July 30, 2012 - 4:05 pm UTC

same thing, you have no roles.

you need the privileges directly granted to you. You'll be running from inside of a stored procedure environment.
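A quick way to see this for yourself (a sketch - the procedure name is illustrative): roles are disabled inside a definer's-rights stored procedure, so SESSION_ROLES is empty there and only directly granted privileges count.

```sql
-- Roles are not active inside a definer's-rights procedure.
-- SESSION_ROLES is empty there, so only direct grants apply.
create or replace procedure show_my_roles
as
begin
    for r in ( select role from session_roles )
    loop
        dbms_output.put_line( r.role );
    end loop;
    dbms_output.put_line( 'done' );
end;
/
set serveroutput on
exec show_my_roles
-- prints only "done": no roles are visible inside the procedure

select role from session_roles;
-- run directly in the session: shows DBA and any other enabled roles
```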

Analyze Command to be used?

Ramki, July 31, 2012 - 1:04 pm UTC

Tom
I saw your explanation of the differences between using the ANALYZE command and dbms_stats.gather_table_stats.
I recently joined a data warehouse team where the DBAs use the ANALYZE command as part of their periodic weekly stats-gathering process for important tables.
The Oracle version used is 11g (Release 1102000100).
I referred to the Oracle 11g release notes; they recommend using the dbms_stats.gather_table_stats package.

After seeing your recommendations, I suggested they change to dbms_stats.gather_table_stats.

They (the DBAs) responded by stating: "Actually Oracle internally runs the “analyze” command within the dbms_stats package. The only difference is the cardinality will be higher if you use dbms_stats. Also, running dbms_stats takes longer than the regular analyze statement."

Is the above DBA statement correct? Can you please share your thoughts on this so that I can follow up with the DBAs to correct them?
Tom Kyte
July 31, 2012 - 1:08 pm UTC

They(DBA's) responded by stating that "Actually oracle internally runs the
“analyze” command within the dbms_stats package. Only difference is the
cardinality will be more if you use the dbms_stats. Also running dbms_stats
takes longer time than the regular analyze statement."


that is false on every level

first, enable sql trace, run dbms_stats, and then tkprof the trace file, show me a single ANALYZE in the trace? you cannot. show me a ton of SQL run against the objects? you'll find it all over the place.

dbms_stats does not run analyze, it doesn't.


(and if there is a difference - as they say there will be - doesn't that sort of rule out dbms_stats 'just runs analyze' - if they return different numbers?????)

and they should benchmark dbms_stats versus analyze. I think they'll be surprised.
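The test Tom describes can be run along these lines (a sketch - the table name is illustrative, and the trace file name and directory depend on your instance settings):

```sql
-- Trace a dbms_stats run, then inspect it with tkprof.
alter session set tracefile_identifier = 'stats_test';
alter session set sql_trace = true;

exec dbms_stats.gather_table_stats( user, 'T1' );

alter session set sql_trace = false;

-- From the OS, format the trace file (path and name are instance-specific):
--   tkprof <udump_dir>/<sid>_ora_<pid>_stats_test.trc stats_test.prf
-- Search the output for "analyze" - you will find none; instead you will
-- see SELECT statements issued against the table by dbms_stats.
```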



dbms_stats gather taking ages.

A Reader, January 23, 2013 - 3:08 am UTC

Tom,
we are at 10.2.0.4

the job:

SQL> exec dbms_stats.gather_table_stats(user,'TABLE_1',cascade=>true);

it has been running for more than 24 hrs...


table TABLE_1 size     -  45 GB
index IDX_NAME_UI size -  28 GB
table sys.ORA_TEMP_1_DS_23485 has 0 rows in it.


IDX_NAME_UI is a function-based index on the NAME column,
defined as:
create index IDX_NAME_UI on TABLE_1 ( UPPER(name) );


SQL> desc TABLE_1;
 Name                                                                                                      Null?    Type
 ----------------------------------------------------------------------------------------------------------------- -------- 
 MANAGEDSERVICEID                                                                                          NOT NULL NUMBER
 NAME                                                                                                               VARCHAR2(650 CHAR)
 MS2MANAGEDSERVICETYPE                                                                                              NUMBER
 MS2MARKETTYPE                                                                                                      NUMBER
 OMSID                                                                                                              VARCHAR2(2000 CHAR)
 FULLNAME                                                                                                           VARCHAR2(50 CHAR)
 RELATIVENAME                                                                                                       VARCHAR2(50 CHAR)
 ALIAS1                                                                                                             VARCHAR2(50 CHAR)
 ALIAS2                                                                                                             VARCHAR2(50 CHAR)
 OBJECTID                                                                                                           VARCHAR2(64 CHAR)
 SUBTYPE                                                                                                            VARCHAR2(50 CHAR)
 SUBSTATUS                                                                                                          VARCHAR2(50 CHAR)
 DESCRIPTION                                                                                                        VARCHAR2(2000 CHAR)
 NOTES                                                                                                              VARCHAR2(2000 CHAR)
 CREATEDDATE                                                                                                        DATE
 LASTMODIFIEDDATE                                                                                                   DATE
 CREATEDBY2DIMUSER                                                                                                  NUMBER
 LASTMODIFIEDBY2DIMUSER                                                                                             NUMBER
 MS2PROVISIONSTATUS                                                                                                 NUMBER
 MS2FUNCTIONALSTATUS                                                                                                NUMBER
 MARKEDFORDELETE                                                                                                    NUMBER(1)
 ISVISIBLE                                                                                                          NUMBER
 LABEL                                                                                                              NUMBER
 X_CTCV_CONV_ID                                                                                                     VARCHAR2(50 CHAR)
 X_CONV_RUN_NO                                                                                                      NUMBER



 we can see it is currently executing the statements below.


 the sid for the above job is 1023
 
SQL> select event,state,p1,p2,p3 from v$session_wait where sid=1023;

EVENT                                                            STATE                             P1         P2         P3
---------------------------------------------------------------- ------------------- ---------------- ---------- ----------
latch: cache buffers chains                                      WAITED SHORT TIME     23568357320        122          1

SQL> select  segment_name,
  2              segment_type
  3  from     dba_extents
  4  where  file_id = 23568357320
  5       and  122  between
  6      block_id and block_id + blocks -1;

no rows selected

-- no file with file_id = 23568357320 !

-- current statement being executed ..



SQL> explain plan for
  2  SELECT COUNT (rep)
  3    FROM (  SELECT                   /*+ leading(v1 v2) use_nl_with_index(v2) */
  4                  COUNT (v1.val) rep
  5              FROM (SELECT rn, val
  6                      FROM (SELECT ROWNUM rn, val
  7                              FROM (  SELECT /*+ no_merge no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */
  8                                            SUBSTRB (
  9                                                SYS_DS_ALIAS_26,
 10                                                1,
 11                                                32)
 12                                                val
 13                                        FROM sys.ora_temp_1_ds_23485 t
 14                                       WHERE SUBSTRB (SYS_DS_ALIAS_26, 1, 32)
 15                                                IS NOT NULL
 16                                    GROUP BY SUBSTRB (SYS_DS_ALIAS_26, 1, 32)
 17                                      HAVING COUNT (
 18                                                SUBSTRB (SYS_DS_ALIAS_26, 1, 32)) =
 19                                                1))
 20                     WHERE ORA_HASH (rn) <= 4015864698) v1,
 21                   (SELECT                                      /*+ index(t2) */
 22                          UPPER ("NAME") val
 23                      FROM "CRAMER"."TABLE_1" t2
 24                     WHERE TBL$OR$IDX$PART$NUM ("CRAMER"."TABLE_1",
 25                                                0,
 26                                                4,
 27                                                0,
 28                                                "ROWID") = :objn) v2
 29             WHERE v2.val LIKE v1.val || '%'
 30          GROUP BY v1.val
 31            HAVING COUNT (v1.val) <= 2);

Explained.

SQL> set linesize 200
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------- 
Plan hash value: 3648757343

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    13 |   100K  (4)| 00:20:06 |    |          |
|   1 |  SORT AGGREGATE             |                     |     1 |    13 |            |          |    |          |
|   2 |   VIEW                      |                     |    30M|   374M|   100K  (4)| 00:20:06 |    |          |
|*  3 |    FILTER                   |                     |       |       |            |          |    |          |
|   4 |     HASH GROUP BY           |                     |    30M|  1871M|   100K  (4)| 00:20:06 |    |          |
|   5 |      NESTED LOOPS           |                     |    30M|  1871M| 97232   (1)| 00:19:27 |    |          |
|*  6 |       VIEW                  |                     |   408 | 12648 |    21  (10)| 00:00:01 |    |          |
|   7 |        COUNT                |                     |       |       |            |          |    |          |
|   8 |         VIEW                |                     |   408 |  7344 |    21  (10)| 00:00:01 |    |          |
|*  9 |          FILTER             |                     |       |       |            |          |    |          |
|  10 |           HASH GROUP BY     |                     |   408 |   130K|    21  (10)| 00:00:01 |    |          |
|* 11 |            TABLE ACCESS FULL| ORA_TEMP_1_DS_23485 |   408 |   130K|    20   (5)| 00:00:01 |    |          |
|  12 |       PARTITION RANGE SINGLE|                     | 74008 |  2457K|   238   (1)| 00:00:03 |   KEY |   KEY |
|* 13 |        INDEX RANGE SCAN     | IDX_NAME_UI          | 74008 |  2457K|   238   (1)| 00:00:03 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(COUNT("VAL")<=2)
   6 - filter(ORA_HASH("RN")<=4015864698)
   9 - filter(COUNT(SUBSTRB("SYS_DS_ALIAS_26",1,32))=1)
  11 - filter(SUBSTRB("SYS_DS_ALIAS_26",1,32) IS NOT NULL)
  13 - access(UPPER("NAME") LIKE "VAL"||'%')
       filter(UPPER("NAME") LIKE "VAL"||'%')

30 rows selected.

SQL>



questions
a) where is it waiting?
b) what is P1 = 23568357320 from v$session_wait referring to here?
c) would gathering system-level stats help?
d) any other points.. a bug?


Tom Kyte
January 30, 2013 - 12:28 pm UTC

-- no file with file_id = 23568357320 !


p1 = address of latch, not file

ops$tkyte%ORA11GR2> select parameter1 p1, parameter2 p2, parameter3 p3
  2  from v$event_name
  3  where name = 'latch: cache buffers chains'
  4  /

P1
----------------------------------------------------------------
P2
----------------------------------------------------------------
P3
----------------------------------------------------------------
address
number
tries





use ASH reports to see what it was doing.

do you have the resources (IO, CPU and memory) to run it in parallel?
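To decode the latch address and see what the session has been doing recently, something like the following can be used (a sketch - the ASH view requires the Diagnostics Pack license, and the address arithmetic assumes a 64-bit platform):

```sql
-- P1 of 'latch: cache buffers chains' is the latch address, not a file#.
-- Map the address to a child latch by comparing it numerically.
select name, gets, misses, sleeps
  from v$latch_children
 where to_number( rawtohex( addr ), 'XXXXXXXXXXXXXXXX' ) = 23568357320;

-- Recent activity of the session, sampled once per second by ASH.
select sample_time, sql_id, event, current_obj#
  from v$active_session_history
 where session_id = 1023
 order by sample_time desc;
```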

..dbms_stats job taking ages

A Reader, January 23, 2013 - 5:16 am UTC

Tom
further to add to the above

I did
alter session set events '10046 trace name context forever, level 12';

on the session running the stats gather:


excerpts from the trace file (the job had been running for many hrs..)

this obj#=202725 belongs to

|* 11 | TABLE ACCESS FULL| ORA_TEMP_1_DS_23485 | 408 | 130K| 20 (5)| 00:00:01 | | |


WAIT #20: nam='db file scattered read' ela= 41817 file#=1009 block#=6154 blocks=128 obj#=202725 tim=3368259988396
WAIT #20: nam='db file scattered read' ela= 3920 file#=1009 block#=6282 blocks=19 obj#=202725 tim=3368260153260



|* 13 | INDEX RANGE SCAN | IDX_NAME_UI | 74008 | 2457K| 238 (1)| 00:00:03 | KEY | KEY |

obj#=194946 belongs to IDX_NAME_UI

seen many events as below


WAIT #23: nam='db file sequential read' ela= 467 file#=260 block#=1432646 blocks=1 obj#=194946 tim=3369425036064
WAIT #23: nam='db file sequential read' ela= 533 file#=277 block#=215812 blocks=1 obj#=194946 tim=3369426851963
WAIT #23: nam='db file sequential read' ela= 442 file#=277 block#=215813 blocks=1 obj#=194946 tim=3369426852683
WAIT #23: nam='db file sequential read' ela= 403 file#=277 block#=215814 blocks=1 obj#=194946 tim=3369426853244


The trace file stopped updating:
below is the last line ... where it was waiting

*** 2013-01-23 21:35:21.609
WAIT #23: nam='db file sequential read' ela= 13780 file#=269 block#=242690 blocks=1 obj#=194946 tim=3369458683230


I am sure it is this operation.

| 5 | NESTED LOOPS | | 30M| 1871M| 97232 (1)| 00:19:27 | | |


So
is the nested loop the problem here?


..dbms_stats job taking forever

A Reader, January 25, 2013 - 1:53 am UTC

Tom,
sorry to keep adding info as it comes.
Here are more details on the above problem.

I ran the dbms_stats.gather_table_stats job again without parallel (no degree clause specified)
and traced the session. Below are excerpts from the trace file in udump.

...snip
... snip
WAIT #22: nam='db file sequential read' ela= 27129 file#=258 block#=1454180 blocks=1 obj#=194946 tim=3423691838220
WAIT #22: nam='db file sequential read' ela= 40410 file#=260 block#=1453019 blocks=1 obj#=194946 tim=3423691914858
WAIT #22: nam='db file sequential read' ela= 7393 file#=260 block#=1453060 blocks=1 obj#=194946 tim=3423691928244

<no updates to the trace file for the last 11 hrs... even though the job is still running>

checking what the session is waiting for:
SQL> select event,state,p1,p2,p3 from v$session_wait where sid=1045;

EVENT                                                            STATE                       P1      P2    P3
---------------------------------------------------------------- ------------------- ---------- ---------- ----------
db file sequential read                                          WAITED KNOWN TIME          260    1453060          1


SQL> select  segment_name,
  2                  segment_type
  3     from     dba_extents
  4       where  file_id = 260
  5           and 1453060   between
  6          block_id and block_id + blocks -1;

SEGMENT_NAME                   SEGMENT_TYPE
------------------------------ --------------------------------------------------
IDX_NAME_UI                    INDEX PARTITION
 

- obj#=194946 in the trace file refers to the index partition segment IDX_NAME_UI

when I looked at AWR for
the last 1 hr.. (4 CPUs, dual core)

 

Top 5 Timed Events

Event                          Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                                 38,824                       100
control file parallel write   36,658        187             5         0.5          System I/O
control file sequential read 198,347        168             1         0.4          System I/O
direct path read              62,568         34             1         0.1          User I/O
enq: CF - contention             120         31           259         0.1          Other


SQL ordered by Elapsed Time

Elapsed Time (s)  CPU Time (s)  Executions  % Total DB Time  SQL Id         SQL Module  SQL Text
38,686            38,683                 0            99.61  4wx2na79mm0jv  SQL*Plus    BEGIN DBMS_STATS.GATHER_TABLE_...
38,686            38,683                 0            99.61  6kd805142rv5w  SQL*Plus    select count(rep) from (select...




questions
a) is it because of sluggish IO?
b) we have seen that the subquery (run on behalf of the DBMS_STATS.GATHER_TABLE_... job) is CPU-intensive.
   the full query is:
   select count(rep) from (select /*+ leading(v1 v2) use_nl_with_index(v2) */ count(v1.val) rep from (select rn, val from (select rownum rn, val from (select /*+ no_merge no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */substrb("NAME", 1, 32) val from sys.ora_temp_1_ds_23528 t where substrb("NAME", 1, 32) is not null group by substrb("NAME", 1, 32) having count(substrb("NAME", 1, 32)) = 1)) where ora_hash(rn) <= 3909847333) v1, (select /*+ index(t2) */ "NAME" val from "user"."TABLE_1" t2 where TBL$OR$IDX$PART$NUM("user"."TABLE_1", 0, 4, 0, "ROWID") = :objn) v2 where v2.val like v1.val||'%' group by v1.val having count(v1.val) <= 2)
   
c) what is the way forward? do I need to raise a bug?



..dbms_stats job taking ages..

A Reader, January 26, 2013 - 3:48 pm UTC

Tom
I got the tkprof for it.


snip...here

********************************************************************************

select count(rep) 
from
 (select /*+ leading(v1 v2) use_nl_with_index(v2) */ count(v1.val) rep from 
  (select rn, val from (select rownum rn, val from (select /*+ no_merge 
  no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact 
  use_weak_name_resl dynamic_sampling(0) no_monitoring */substrb("NAME",1,32) 
  val from sys.ora_temp_1_ds_23528 t where substrb("NAME",1,32) is not null 
  group by substrb("NAME",1,32) having count(substrb("NAME",1,32)) = 1)) 
  where ora_hash(rn) <= 3909847333) v1, (select /*+ index(t2) */ "NAME" val 
  from "user"."TABLE_1" t2 where 
  TBL$OR$IDX$PART$NUM("CRAMER"."TABLE_1",0,4,0,"ROWID") = :objn) v2 
  where v2.val like v1.val||'%' group by v1.val having count(v1.val) <= 2)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0        297          0           0
Fetch        1  86487.88   84518.20       3001  662474171          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3  86487.89   84518.20       3001  662474468          0           1

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: CHOOSE
Parsing user id: 72     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=662474171 pr=3001 pw=0 time=2913827502 us)
   1123   VIEW  (cr=662474171 pr=3001 pw=0 time=2913846548 us)
   1123    FILTER  (cr=662474171 pr=3001 pw=0 time=2913845415 us)
   2005     HASH GROUP BY (cr=662474171 pr=3001 pw=0 time=2913821602 us)
 611873      NESTED LOOPS  (cr=662474171 pr=3001 pw=0 time=111974935129 us)
   2005       VIEW  (cr=116 pr=0 pw=0 time=152402 us)
   2197        COUNT  (cr=116 pr=0 pw=0 time=102234 us)
   2197         VIEW  (cr=116 pr=0 pw=0 time=91204 us)
   2197          FILTER  (cr=116 pr=0 pw=0 time=80192 us)
   2308           HASH GROUP BY (cr=116 pr=0 pw=0 time=78997 us)
   4308            TABLE ACCESS FULL ORA_TEMP_1_DS_23528 (cr=116 pr=0 pw=0 time=4440 us)
 611873       PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=662474055 pr=3001 pw=0 time=65289249313 us)
 611873        INDEX RANGE SCAN IDX_NAME_UI PARTITION: KEY KEY (cr=662474055 pr=3001 pw=0 time=65289129215 us)(object id 194945)

 
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                      3001        0.06         28.32  
  
*********************

here Oracle created  sys.ora_temp_1_ds_23528  using the TABLE_1 definition.


index clustering factor, table count, number of blocks---
  
SQL>  select  column_name, data_type , data_length  from user_TAB_COLUMNS where   column_name in (select  column_name   from dba_ind_columns where  index_name ='IDX_NAME_UI') and table_name ='TABLE_1';

COLUMN_NAME                    DATA_TYPE                      DATA_LENGTH
------------------------------ ------------------------------ -----------
NAME                           VARCHAR2                               650


SQL> select index_name, last_analyzed, clustering_factor, num_rows , distinct_keys from dba_indexes where table_name ='TABLE_1';

INDEX_NAME                     LAST_ANAL CLUSTERING_FACTOR   NUM_ROWS DISTINCT_KEYS
------------------------------ --------- ----------------- ---------- -------------
IDX_NAME_UI                     13-JAN-13         230094384  306871930     304700852



SQL> select  sum(blocks)  from dba_segments where  segment_name ='TABLE_1';

SUM(BLOCKS)
-----------
    5895680

observations/questions
a) each row from the operation --> 2005       VIEW  (cr=116 pr=0 pw=0 time=152402 us)
is looked up in index IDX_NAME_UI and the data is read from table TABLE_1; only 611873 matching rows are found.
due to the poor clustering factor, each block is read many, many times?

b) the optimizer has chosen the nested loop join correctly, but the issue is the poor clustering factor?
   the table in question has stale stats.. so to get the proper plan we need stats... a sort of chicken-and-egg situation

c) what is the way forward here - improve the clustering factor? we could do an alter table move or online redefinition... but would that put the table data in order for the NAME column key? the table has another column, CREATEDDATE, for which most key values would sit together in a block.. but not for the NAME column..


d) the nested loop operation has the stats below
    611873      NESTED LOOPS  (cr=662474171 pr=3001 pw=0 time=111974935129 us)
    here 111974 seconds! ~ 31 hrs
    by comparison, 611873 rows taking 31 hrs is huge.. I see the inner (index) lookup taking long here, adding to the total time of the NL operation?

 611873       PARTITION RANGE SINGLE PARTITION: KEY KEY (cr=662474055 pr=3001 pw=0 time=65289249313 us)
 611873        INDEX RANGE SCAN IDX_NAME_UI PARTITION: KEY KEY (cr=662474055 pr=3001 pw=0 time=65289129215 us)(object id 194945)
  the time taken there is around 18 hrs (65289249313 us).

e) how can I probe further whether the poor clustering factor or the nested loop is the issue? any other tests?
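On (c): a table can only be physically ordered on one key, so improving clustering on NAME trades against CREATEDDATE. One hedged approach (illustrative names; test on a copy first, and note it needs an outage or online redefinition, plus recreating indexes, grants, constraints and triggers) is to rebuild the table sorted on the expression the index uses:

```sql
-- Rebuild the table ordered by the indexed expression so rows with
-- similar UPPER(name) values land in the same blocks, which lowers
-- the clustering factor of IDX_NAME_UI.
create table table_1_sorted
as
select *
  from table_1
 order by upper( name );

-- then: swap the tables (drop/rename or dbms_redefinition),
-- recreate IDX_NAME_UI and the other indexes, and re-gather statistics.
```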

..dbms_stats taking ages

A Reader, January 27, 2013 - 5:48 pm UTC

Tom,

I see the above is closely related to

Bug 8651614 : GATHERING STATISTICS TAKE A LONG TIME WITH DBMS_STATS

will go for it.

Thanks

Execute DBMS_STATS.GATHER_TABLE_STATS inside a procedure

Raghu, April 03, 2013 - 7:20 pm UTC

Tom,

We have 2 tables that get about 5-6 million transactions each day, and the stats are gathered at the end of each day (all tables, not just these 2).

We have a couple of stored procedures that access these tables but are run before the stats are gathered.
We have quite often had to kill the session (since it sometimes takes more than 7 hrs, which it shouldn't), gather stats, and re-run the stored procs (which run really fast the second time around).

Is it a good idea to gather stats only for those 2 tables inside the stored proc before every run (once everyday)?

procedure abcd
is
begin
  dbms_stats.gather_table_stats(ownname => owner, tabname => table_name, cascade => true);

  -- our code here

end;

Thanks in advance
Tom Kyte
April 22, 2013 - 1:58 pm UTC

... Is it a good idea to gather stats only for those 2 tables inside the stored
proc before every run (once everyday)? ...


it would be a good idea to either have already gathered statistics, OR to have SET them (using dbms_stats), OR to have copied them from yesterday's data, OR to LOCK the stats on these tables and never gather again.

what you want to have in place are statistics that represent the data in the table - you could do that by gathering stats ONCE when the table is full, locking them and never doing it again.

you could do that by setting stats on the table - tell us "there are 6 million rows in here and they look sort of like this"

you could do that by gathering stats after the data has been loaded.


but doing what you are doing doesn't make sense...
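The "set and lock" approach Tom describes might look like the following sketch (assuming 10g or later; the table name and the representative row/block figures are placeholders to be replaced with values matching your own data):

```sql
-- Tell the optimizer roughly what the loaded table looks like...
begin
  dbms_stats.set_table_stats(
    ownname => user,
    tabname => 'MY_TABLE',
    numrows => 6000000,   -- "there are about 6 million rows in here"
    numblks => 80000,
    avgrlen => 100);

  -- ...then lock the stats so a nightly job never overwrites them.
  dbms_stats.lock_table_stats(ownname => user, tabname => 'MY_TABLE');
end;
/
```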

Execute DBMS_STATS.GATHER_TABLE_STATS 2 methods

Linda, June 28, 2013 - 4:47 pm UTC

How can I tell which of these two methods was used to gather the statistics of a table?

DBMS_STATS.GATHER_TABLE_STATS(
  ownname          => <table owner>
, tabname          => <table_name>
, estimate_percent => NULL
, block_sample     => FALSE
, degree           => DBMS_STATS.AUTO_DEGREE
, cascade          => TRUE
);

And:

DBMS_STATS.GATHER_TABLE_STATS(
  ownname          => <table owner>
, tabname          => <table_name>
, estimate_percent => NULL
, block_sample     => FALSE
, method_opt       => 'FOR ALL INDEXED COLUMNS'
, degree           => DBMS_STATS.AUTO_DEGREE
, cascade          => FALSE
);

Pawan Tyagi, October 23, 2013 - 12:37 pm UTC

I need to see the old DBMS_STATS data on a table in an Oracle database. I am able to see when stats last ran, and the history dates as well, but how can I see the old and new values for a table?
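Assuming 11g or later, one way to compare old and new statistics is DBMS_STATS.DIFF_TABLE_STATS_IN_HISTORY, which reports the differences between two points in the retained stats history (the table name below is a placeholder; adjust the timestamps to your case):

```sql
-- Show what changed between stats as of 7 days ago and current stats.
-- PCTTHRESHOLD filters out differences smaller than the given percentage.
set long 100000
select report, maxdiffpct
from   table(dbms_stats.diff_table_stats_in_history(
               ownname      => user,
               tabname      => 'MY_TABLE',
               time1        => systimestamp - 7,
               time2        => null,   -- null means the current statistics
               pctthreshold => 10));
```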

Analyzing Tables using DBMS STATS or Compute Statistics, which will be better ?

Malaya, October 31, 2013 - 2:47 pm UTC

Hi,

I am confused. Generally I use DBMS_STATS to optimize DB performance, but over the last 2 months I have encountered a severe performance issue. I then used COMPUTE STATISTICS and performance improved.
Please advise: what should I use in future? Please help me.

Malaya
Tom Kyte
November 01, 2013 - 9:27 pm UTC

only use dbms_stats.

analyze table for gathering statistics is a deprecated bit of functionality.


use dbms_stats and make sure you are gathering the same set of statistics you did with analyze (you didn't share your analyze or dbms_stats commands so I cannot comment further). What probably happened is that you gathered entirely DIFFERENT types of statistics with the two commands. Make sure they gather the same amount and types of statistics.

Degree level in DBMS_Stats

Mahes, November 18, 2013 - 11:10 am UTC

hi Tom

I am using the following call in a loading stored procedure, which runs every day and loads data into a target table.

Oracle Version: 11.2

DBMS_STATS.GATHER_TABLE_STATS(ownname => 'XYZ', tabname => v_tbl_name, cascade => TRUE, degree => 3);


DEGREE is given a number here. On what basis is it used, and what range of values can be used?

I saw AUTO_DEGREE mentioned on this page. Can you explain how it is allocated?

Also, please explain CASCADE.

Thanks
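A brief illustration of the two parameters: DEGREE is the number of parallel server processes used to gather the stats (an integer, or DBMS_STATS.AUTO_DEGREE to let Oracle pick a value based on the object's size and the parallelism-related init parameters), and CASCADE => TRUE gathers statistics on the table's indexes as well. The schema and table names below are placeholders:

```sql
begin
  -- Let Oracle choose the degree of parallelism; also gather index stats.
  dbms_stats.gather_table_stats(
    ownname => 'XYZ',            -- schema name (placeholder)
    tabname => 'TARGET_TABLE',   -- table name (placeholder)
    degree  => dbms_stats.auto_degree,
    cascade => true);
end;
/
```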

ENDPOINT_VALUE holding VARCHAR

Andre, January 10, 2014 - 1:54 pm UTC

Hi Tom,

A Very Happy New Year 2014 !!

+

Is there a way to decode the ENDPOINT_VALUE from histogram view where the value is VARCHAR2..?

I have tried to search for answers for days - got a link to Jonathan Lewis and others - but it seems that decoding the first 6 bytes is OK; beyond that there are problems.

And ... the field "ENDPOINT_ACTUAL_VALUE" - sometimes shows such values - but then they also somehow vanish...

Thanks in advance
Andre

ENDPOINT_VAL for VARCHAR2

Andre, January 12, 2014 - 11:03 am UTC

Hi Tom,

More searching through Jonathan Lewis' papers and I have located his paper on this subject.

It appears that Oracle ONLY encodes the first 5 characters + possibly the 6th (although this is not 100% correct) but beyond this the remaining values are garbage.

JL explained it pretty well - and it appears that the ACTUAL value column that is mostly NULL in the HISTOGRAM view stores the values if the first 6 are not unique. I have not tested such cases yet - and probably will when time allows.

But I am curious about Oracle's strategy, i.e. the first 5 bytes. Does it change in 12c?

Thanks
Best wishes
A
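For what it's worth, the usual trick (following Jonathan Lewis's write-ups) is to render the numeric ENDPOINT_VALUE as hex and cast the leading bytes back to characters; as noted above, only roughly the first 5-6 characters are reliable. A rough sketch, with the table and column names as placeholders:

```sql
-- Decode the leading bytes of ENDPOINT_VALUE back to characters.
-- Only about the first 5-6 characters are meaningful; the rest is noise.
select endpoint_number,
       utl_raw.cast_to_varchar2(
         hextoraw(substr(to_char(endpoint_value,
                    'fmxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'), 1, 12))
       ) as approx_value
from   user_tab_histograms
where  table_name  = 'MY_TABLE'
and    column_name = 'NAME';
```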

indexing a partitioned table

jason, January 22, 2014 - 10:17 pm UTC

Could you please let me know the best way to find out which columns of a table are used most frequently? I just partitioned a table by range (GL date), and I would like to create indexes on the column or columns that are used most in queries. Is there a way for me to get this information? I have used SYS.COL_USAGE$, but I'm not quite sure that's what is needed.
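Assuming 11.2 or later, rather than querying SYS.COL_USAGE$ directly, DBMS_STATS.REPORT_COL_USAGE returns that same predicate-usage information in readable form (the table name below is a placeholder):

```sql
-- Report which columns appeared in equality, range, LIKE, etc. predicates.
set long 100000
select dbms_stats.report_col_usage(ownname => user, tabname => 'MY_TABLE')
from   dual;
```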

performance impact on stats collection

Ravi B, April 24, 2014 - 9:40 pm UTC

Hi Tom,

We have several "big" tables in our data warehouse. After each ETL run we gather stats on the tables. Each table has partitions and sub-partitions. The complete process of gathering stats takes hours to finish.

currently, this is what we do for each table.

EXEC DBMS_STATS.GATHER_TABLE_STATS(<SCHEMA_NAME>,<TABLE_NAME>);


Do you recommend doing the following instead, as recommended at https://blogs.oracle.com/optimizer/entry/maintaining_statistics_on_large_partitioned_tables

EXEC DBMS_STATS.SET_TABLE_PREFS(<SCHEMA_NAME>,<TABLE_NAME>,'INCREMENTAL','TRUE');

EXEC DBMS_STATS.GATHER_TABLE_STATS(<SCHEMA_NAME>,<TABLE_NAME>);


Thanks!
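If you do enable incremental stats, you can verify that the preference took effect with DBMS_STATS.GET_PREFS (a sketch; the table name is a placeholder):

```sql
-- Should return 'TRUE' once the preference has been set on the table.
select dbms_stats.get_prefs(pname   => 'INCREMENTAL',
                            ownname => user,
                            tabname => 'MY_TABLE')
from   dual;
```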

a review

A reader, March 31, 2016 - 5:09 pm UTC

a review

Undo analyze table

Praveen Darla, May 27, 2019 - 10:05 am UTC

Hi Tom,
I analyzed a table without first saving its previous statistics. I want to undo or revert the
ANALYZE TABLE tablename COMPUTE STATISTICS. Please assist.

Thanks

Connor McDonald
May 29, 2019 - 6:23 am UTC

You can't undo an analyze (which is why we recommend using dbms_stats).

If you *do* have some historical stats, you can do:

exec dbms_stats.delete_table_stats(ownname=>'MYUSER', tabname=>'MYTABLE');
exec dbms_stats.restore_table_stats(ownname=>'MYUSER',tabname=>'MYTABLE',as_of_timestamp=>'12/11/19 12:12:12,123456 +08:00');



gabriel, November 14, 2022 - 5:39 pm UTC

Hello. I have a problem and I don't know how to figure it out. Can you help me, please? I have this requirement: count the total number of books and magazines rented, and the total number of members who
used the library in the selected month.

My code: select sum(transaction_id) from ledgers where month(issue_date) = 7;

The result: ORA-00904: "MONTH": invalid identifier

Thank you.
Connor McDonald
November 15, 2022 - 7:06 am UTC

month(issue_date) = 7;

is probably

extract(month from issue_date) = 7

or you might consider something to ensure issue_date is unaffected, eg

issue_date >= date '2022-07-01' and
issue_date < date '2022-08-01'

(assuming you want data just for the current year)
