Database Statistics
Tim, November 11, 2004 - 9:08 am UTC
I appreciate your time on this one. My curiosity is satisfied and I can stop scratching my head looking for what I missed.
Thanks.
Automatic Statistics Collection
Tim, March 31, 2005 - 4:21 pm UTC
In Oracle 10g, the GATHER_STATS_JOB job gathers optimizer statistics by calling the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. This job is automatically scheduled to run in the batch maintenance window.
The GATHER_DATABASE_STATS_JOB_PROC procedure is undocumented as it is an "internal" routine. Database documentation indicates that the GATHER_STATS_JOB "operates in a very similar fashion to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option."
The question I have relates to the amount of time the job requires to complete for a database with many large tables. The Oracle documentation for DBMS_STATS indicates that "the GATHER_STATS_JOB continues until it finishes, even if it exceeds the allocated time for the maintenance window."
Are the statistics collected by the GATHER_STATS_JOB based on "compute" rather than "estimate" or is there some way to specify how statistics are to be collected?
I recognize the importance of valid statistics to ensure query performance, but there are trade-offs that must be considered.
March 31, 2005 - 5:38 pm UTC
well, this is purely empirical, but...
ops$tkyte@ORA10G> l
1 select owner, table_name, last_analyzed, num_rows, sample_size from dba_tables
2 where last_analyzed is not null
3* order by abs(num_rows-sample_size)
ops$tkyte@ORA10G> /
BIG_TABLE BIG_TABLE 26-MAR-05 996009 57750
OPS$TKYTE DEPT 13-MAR-05 10000000 2000
1551 rows selected.
Now, I did not analyze those tables; the system did, and it sampled them.
another quick and dirty experiment:
ops$tkyte@ORA10G> create table t as select * from big_table.big_table;
Table created.
ops$tkyte@ORA10G> connect / as sysdba
Connected.
sys@ORA10G> exec dbms_stats.GATHER_DATABASE_STATS_JOB_PROC
PL/SQL procedure successfully completed.
sys@ORA10G> @connect /
ops$tkyte@ORA10G> select last_analyzed, num_rows, sample_size from user_tables where table_name = 'T';
LAST_ANAL NUM_ROWS SAMPLE_SIZE
--------- ---------- -----------
31-MAR-05 1000284 74173
seems to sample
But things that start with "A" (auto) are fairly black boxed these days.
clarification
amit poddar, April 01, 2005 - 7:34 am UTC
Hi,
If we gather system stats, is there any point in setting the optimizer_index_cost_adj parameter? And if we do set it after gathering system statistics, what effect does it have?
April 01, 2005 - 8:45 am UTC
do you have access to Effective Oracle by Design? If so - in there I describe and show the effects of setting these, and they would have the same effects with or without system stats.
But the overall outcome without system stats would most likely be DIFFERENT than with system stats. All the optimizer_* parameters do is change the cost assigned to various operations made by the optimizer.
clarification
amit poddar, April 01, 2005 - 8:48 am UTC
so you are saying that setting optimizer_index_cost_adj will have the same effect with or without system statistics ?
In that case would you recommend setting optimizer_index_cost_adj along with system statistics ?
I understand that we should test thoroughly before making any kind of decision. But my question is about determining a starting point for the testing.
April 01, 2005 - 9:10 am UTC
it will have the same EFFECT (it is affecting the math of the optimizer)
it will likely not have the same OUTCOME.
It does the same thing with or without them, but it is affecting *different* costs under each.
I like defaults, if I run better than good enough without modifying anything, I'm very happy.
Nice
Kumar, April 26, 2005 - 11:56 am UTC
Dear Sir,
How do you calculate the database size and
the data dictionary size?
How should one answer these questions when
asked in interviews?
April 26, 2005 - 1:38 pm UTC
query dba_data_files... sum up the sizes of files.
then sum up the sizes of the files allocated to the SYSTEM tablespace (and sysaux if 10g and you want to count that)
that skips redo logs, control files and temporary files, but it would be the answer most would expect, I suppose.
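For example, something along these lines (a sketch - the units and whether to count SYSAUX are up to you):

```sql
-- total database size: sum of all datafile sizes, in GB
select sum(bytes)/1024/1024/1024 as total_gb
  from dba_data_files;

-- "data dictionary" size: files allocated to SYSTEM (and SYSAUX, if 10g)
select tablespace_name, sum(bytes)/1024/1024/1024 as gb
  from dba_data_files
 where tablespace_name in ('SYSTEM','SYSAUX')
 group by tablespace_name;
```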
how to change the default window
Tapas Mishra, March 21, 2006 - 5:27 pm UTC
Hi Tom:
The 10gR2 Performance Tuning Guide (chapter 14) says "This job (GATHER_STATS_JOB) is created automatically at database creation time and is managed by the Scheduler. The Scheduler runs this job when the maintenance window is opened. By default, the maintenance window opens every night from 10 P.M. to 6 A.M. and all day on weekends."
And because all these are "internal" to Oracle, how can we change the window time to best fit our business needs, when we have less activity on the system? Can you shed some light on this?
Thanks and regards,
-tapas-
March 22, 2006 - 3:23 pm UTC
The name of the job is 'GATHER_STATS_JOB' and you can use the dbms_scheduler api to change its attributes
EXEC DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
for example, to turn it off - or dbms_scheduler.set_attribute to change its runtimes/schedule
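For example - assuming the default 10g window names (WEEKNIGHT_WINDOW and WEEKEND_WINDOW in the MAINTENANCE_WINDOW_GROUP) - a sketch like this would move the weeknight window to open at midnight instead of 10 P.M.:

```sql
-- turn the automatic job off entirely...
exec dbms_scheduler.disable( 'GATHER_STATS_JOB' );

-- ...or leave it on and change when the weeknight window opens
begin
   dbms_scheduler.set_attribute(
      name      => 'WEEKNIGHT_WINDOW',
      attribute => 'repeat_interval',
      value     => 'freq=daily; byday=MON,TUE,WED,THU,FRI; byhour=0; byminute=0; bysecond=0' );
end;
/
```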
how to change the default window
Tapas Mishra, March 24, 2006 - 12:03 pm UTC
Thanks Tom for getting back to me and for the advice. I was thinking of turning the job off like you said, but still using the procedure it calls (GATHER_DATABASE_STATS_JOB_PROC) for collecting statistics, and scheduling that proc myself using DBMS_SCHEDULER. Could that be a correct approach too?
Another question: in some other thread here I read that on Oracle9iR2 it is better to turn on TABLE MONITORING for objects and go for the STALE approach to gather stats. Will this be applicable to SYS objects? I do not want to touch SYS objects and make them MONITORING too. Your opinion please.
Best regards,
-tapas-
March 24, 2006 - 3:34 pm UTC
I'm not sure I entirely like the "auto" approach for anything above a smallish system. Someone types a new query into sqlplus and the next time we gather stats - we gather a different set of stats (we start getting histograms because sys.col_usage$ got updated saying we use a predicate on column X). I'd rather gather what I wanted to gather in most cases still. I might change my mind over time, but that is my mindset right now.
ops$tkyte@ORA10GR2> select monitoring, count(*) from dba_tables
2 where owner = 'SYS'
3 group by monitoring;
MON COUNT(*)
--- ----------
NO 40
YES 665
most will be in monitoring mode in 10g
How to change the window
Tapas Mishra, March 25, 2006 - 1:02 pm UTC
Thanks Tom for your insight. I know this is a very long question, but I treat you as my mentor and the only safe place where I can get true guidance! So please bear with me... (by the way, you can consider me an R&D DBA for an application vendor and not a production DBA, but I work very closely with good production DBAs to help them in a crisis in order to keep my application running).
Yes, I agree with you - I'm also a little unsure what to do with this entire "auto" approach in 10g. Well, you cleared up my base question above about "changing the window time". However, let me give you a little background on the challenges I'm facing. My software works great in RBO mode so far; some of my customers have gone to CBO mode on 9iR2 and are still running fine, although they only collect my schema stats. With that, at times we have some issues (some queries/searches are slow), and then when we gather stats again, we are fine. None of them use the SYSTEM stats or the I/O and CPU stats (per your advice I got Jonathan's CBO fundamentals book, and its 2nd chapter is taking me time to digest). At the same time I'm catching up with 10gR2 - wherein the RBO is dead. I want to do everything possible in 9iR2 along the lines of 10gR2 to make all of my customers go to CBO mode on 9iR2 first, since I know that with a little downtime I can bring them back to RBO mode (optimizer_mode=RULE). That's my fallback; first I want to do everything right to be successful in CBO mode - with no or few changes to my applications (working with developers on those challenges too :-).
With above in mind, I thought to gather SYS stats and do table monitoring for my application stats. But at this point not sure on the following:
1. If I do not want to touch the monitoring part of the SYS objects, how often should I update the SYS stats? I was thinking of using GATHER_DATABASE_STATS, which will collect all schemas' stats for STALEness. But if you advise, I can change it to TWO GATHER_SCHEMA_STATS calls, one for my apps and the other for the SYS schema. Please advise, since I will NEVER change those 40 tables to MONITORING in SYS by myself. Frequency for SYS is the major question, like you said regarding col_usage$.
2. How often should I collect CPU and I/O stats? I know that I can find a peak load and collect at that time, or at a very consistent and busy transaction time. But how often? Is it once, until something drastic changes like a memory addition, CPU addition, etc.?
- The other reason is the multi-user environment test, which is becoming expensive for me, since the only way to do it right is to get Load Runner; that's expensive and it will take me a while to get there.
Please advise what you would do in my place, in order to have a smooth transition from RBO to CBO in a multi-user environment, with the above constraints?
Best regards,
-tapas-
March 25, 2006 - 2:59 pm UTC
1) if your application doesn't really query the dictionary, the gather of stats on SYS objects might well be completely "not relevant" to you.
I don't like "gather database", prefer doing it at the schema level. That is a preference.
the sys.col_usage$ reference was the fact that if you do something like the following:
create table t ( x int, ....... );
create index t_idx on t(x);
load t
gather stats using AUTO right now - it will NOT get histograms for x
select * from t where x = :x
select * from t where x between :a and :b;
all of a sudden, the next time you gather (no modifications to T, just re-gather stats), dbms_stats using AUTO will query sys.col_usage$ and discover that you put predicates on X and change what it gathers!
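The sequence above would look something like this in SQL*Plus (a sketch - the table, data and bind value are made up for illustration):

```sql
create table t ( x int, y varchar2(30) );
create index t_idx on t(x);

insert into t select rownum, 'xxx' from all_objects;   -- "load t"
commit;

-- first gather: no predicate on X recorded yet, AUTO gets no histogram on X
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto' );

variable x number
exec :x := 42
select * from t where x = :x;

-- second gather: dbms_stats now sees in sys.col_usage$ that X is used in
-- predicates, and may decide to gather differently for that column
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size auto' );
```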
2) unless something changes - you don't need to keep doing this. Think of the system like a "table", if you never modified a table - would you constantly gather stats on it?
gather stats for CBO migration from RBO
Tapas Mishra, March 25, 2006 - 3:30 pm UTC
Thanks again Tom, it is very much appreciated that you take some time even on a Saturday to explain concepts to us.
1. The way I understand your first point above is, col_usage$ will be used only if the SYS schema stats are gathered, correct? My apps do not query the SYS schema, except sometimes during schema upgrades where we check for the existence of some columns; other than that we do not use SYS every day. What I understand is: do not gather stats on SYS if my apps are not using it, correct?
2. Yes, now I understand that if no variable changes on the entire system, the CPU, I/O etc. stats can be one-time, but when something changes, updating them is required in order to feed the optimizer the correct information for the future.
In essence, you are thinking that, for my kind of apps not going through SYS, it is appropriate to gather STALE stats for my schema objects only, and at the same time keep the optimizer informed about CPU and I/O stats. Is my understanding correct?
3. You probably missed my last point on migrating from RBO to CBO. Could you please give me some insight on this? Although most of the tests I do in single-user mode still hold good for us in multi-user mode, a change from RBO to CBO is something that, without Load Runner kind of tooling, is scary to try directly on a multi-user production environment. How do you advise me to solve such things given the constraints?
Thanks again, Tom.
-tapas-
March 25, 2006 - 8:03 pm UTC
1) not at all, that is not what I said at all.
If YOU
a) create a table and load it
b) gather stats with auto - dbms_stats will peek on col_usage$ and see you never "where" on anything. Decides what stats to gather based on that.
c) later run some queries (Oracle will UPDATE col_usage$ to say "this table has predicates on these columns)
d) gather stats with auto again (no data modifications needed!) dbms_stats will gather stats in an entirely DIFFERENT manner - because you ran some queries.
has nothing to do with the presence or lack thereof of statistics on sys tables.
2) yes.
3) test, test, test, test, test. That is all.
dbms_stats.lock_schema_stats
Alberto Dell'Era, March 25, 2006 - 5:31 pm UTC
What's your opinion about using dbms_stats.lock_schema_stats to "disable" the Automatic Statistics Gathering at the schema level, instead of turning it off for the whole database?
(with a caveat: dbms_stats.lock_schema_stats just locks the tables currently in the schema, not the schema itself - newly created tables in the schema are not locked by default).
Or - do you know of another way to disable the auto stats gathering at the schema level ?
I'd prefer to have at least the SYS schema stats automatically maintained the way the kernel developers designed them to be (and other "system" schemata as well that may appear in the future) - and canned applications will probably require it as they migrate from 9i to 10g.
tia
March 25, 2006 - 8:03 pm UTC
I'm not a big fan (yet) of AUTO in general.
You could create your OWN job to gather SYS schema stats if you like...
Alberto Dell'Era, March 26, 2006 - 6:37 am UTC
>You could create your OWN job to gather SYS schema stats if you like...
Sure, but since SYS is not "my schema", how could I know, e.g., whether gathering histograms is OK or not, and the same for auto_sample_size and the other parameters?
I'd rather gather stats on SYS the way the kernel developers intended - and as far as I know, that way is hidden inside the Automatic Statistics Gathering job.
Hence, the idea to let the job run, and lock "my schemas" instead, to have them skipped.
March 26, 2006 - 8:31 am UTC
I'm not a fan of the AUTO it uses - they "intended" the stats to change based on what queries you happen to accidentally run. I'm not a fan of that (yet).
Up to you - I don't have any good advice for this one.
two posts before - below got skipped - please see this
tapas mishra, March 26, 2006 - 3:57 pm UTC
March 26, 2006 - 5:19 pm UTC
I did not skip these, I responded in full above.
Thanks for everything
Tapas Mishra, March 26, 2006 - 6:20 pm UTC
Sorry Tom - yes, you answered all of them above. Could you please point me in the right direction:
1. When and how often should SYS stats be gathered for apps not querying the SYS schema? Any links would be helpful.
2. Moving from RBO to CBO - yes, I understand TEST is what it is all about. Well, do you mean single-user mode application tests or multi-user tests? My issue is that without an automation tool, it is difficult to simulate multi-user load. Since we are only worried about searches (queries) being slow from the database, is single-user mode appropriate for this? I believe the execution plan will not change regardless of whether it is single-user or multi-user mode. Correct?
Thanks again.
-tapas-
March 26, 2006 - 7:19 pm UTC
1) http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:60121137844769
2) *typically* much of this (compare the plans) can be done in isolation - however, not always.
when you compare the CBO to the RBO:
Much of the time, we are comparing "work done (IO's) and cpu used" - if it is "less" in both - you are probably OK as it scales up.
If it is "less IO, more CPU" - it might not scale if lots of people do it at the same time (insufficient cpu)
if it is "more IO, less CPU" - it might not scale if lots of people do it at the same time (latching related issues due to more IO)
if it is "more of both", it is not a good thing
I/O and CPU input is excellent idea
Tapas Mishra, March 26, 2006 - 8:10 pm UTC
Thanks Tom so much, I will follow the link you provided and I'm sure that will keep me busy and help to find out the SYS stat part.
I think comparing the I/O and CPU costs for TOP SQLs in CBO and RBO mode is an excellent idea. I was only thinking in terms of "DB Time", but now I feel this is more crucial for scaling up in a multi-user environment.
Best regards,
-tapas-
GATHER_STATS_JOB on non-system schemas?
David McWhinnie, April 28, 2006 - 12:35 pm UTC
Is the Gather_Stats_job gathering statistics on the non-system schemas too? Metalink note 341839.1 seems to indicate that only the internal tables are being analyzed, but we are seeing all schemas getting analyzed.
From the Metalink note: "The GATHER_STATS JOB will collect statistics only on Installed components which
are being utilized in your system."
April 28, 2006 - 1:03 pm UTC
ops$tkyte@ORA10GR2> create table t as select * from all_users;
Table created.
ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select table_name, last_analyzed from user_tables where table_name = 'T';
TABLE_NAME LAST_ANAL
------------------------------ ---------
T
ops$tkyte@ORA10GR2> connect / as sysdba
Connected.
sys@ORA10GR2> exec dbms_stats.gather_database_stats_job_proc;
PL/SQL procedure successfully completed.
sys@ORA10GR2> connect /
Connected.
ops$tkyte@ORA10GR2> select table_name, last_analyzed from user_tables where table_name = 'T';
TABLE_NAME LAST_ANAL
------------------------------ ---------
T 28-APR-06
that is the procedure the job runs....
Metalink Note
Jay, May 15, 2006 - 10:12 am UTC
I saw that Metalink note, 341839.1 (last revisioned on 5/10/2006), too, and it's very confusing. Here is the quote from the note:
--------------------- Start Quote ----------------------
4) Why does the GATHER_STATS_JOB procedure not collect statistics on all the objects of the database ?
The GATHER_STATS JOB will collect statistics only on Installed components which
are being utilized in your system.
The GATHER_STATS_JOB uses the following SQL to determine which objects to
collect statistics.
'ORACLE' actually restricts the list of schemas for which the automatic stats gathering job will gather statistics to a list of Oracle component system schemas generated by the following query:
select distinct u.name
from registry$ r, user$ u
where r.schema#=u.user#
and r.status in (1,3,5)
and r.namespace = 'SERVER';
The objects on which the statistics have not been collected have never been utilized, hence the JOB does not gather statistics on them.
--------------------- End Quote --------------------
I ran the above query; basically none of my application schemas were included in the result. However, I can see the tables we use being analyzed every night, while I don't have any other stats collection jobs.
May 15, 2006 - 10:57 am UTC
strangely, when I traced the job, it doesn't even query the registry$ table...
but the demonstration above is pretty compelling evidence that it does in fact gather statistics outside of what that note suggests
Re: Metalink Note
Jay, May 15, 2006 - 12:07 pm UTC
Thanks Tom!
Re: Metalink Note
Jay, May 22, 2006 - 1:15 pm UTC
Note 341839.1 was revised again on 5/17/2006 and it now reads like
----------------------- Start Quote --------------------
4) Why does the GATHER_STATS_JOB procedure not collect statistics on all the objects of the database ?
The GATHER_STATS JOB will collect statistics only on Installed components which
are being utilized in your system.
The GATHER_STATS_JOB determines if STALE stats are present.
If any object has changed more than 10% in size, that object is a candidate and will have statistics re-gathered.
The objects on which the statistics have not been collected have never been utilized.
Therefore, the GATHER_STATS_JOB does not gather statistics on them.
----------------------- End Quote -------------------------
The high exposure of AskTom must have persuaded them to correct this part of the note. Thanks!!
How to set and change %
atul, July 03, 2006 - 4:37 am UTC
Hi,
In 10g, the gather stats job is created at database creation time.
I want to check whether that job is running or not.
I also want to change the estimate percent value to 30 and change the start time of the job.
What I did is below.
SQL> select owner,job_name,job_subname,JOB_CREATOR,run_count,PROGRAM_NAME from DBA_SCHEDULER_JOBS where job_name='GATHER_STATS_JOB';
OWNER JOB_NAME
------------------------------ ------------------------------
JOB_SUBNAME JOB_CREATOR RUN_COUNT
------------------------------ ------------------------------ ----------
PROGRAM_NAME
--------------------------------------------------------------------------------
SYS GATHER_STATS_JOB
SYS 83
GATHER_STATS_PROG
SQL> select schedule_owner,schedule_name,schedule_type from DBA_SCHEDULER_JOBS where job_name='GATHER_STATS_JOB';
SYS
MAINTENANCE_WINDOW_GROUP
WINDOW_GROUP
SQL> select SCHEDULE_NAME,start_date from DBA_SCHEDULER_SCHEDULES;
DAILY_PURGE_SCHEDULE
So from this information, could you tell me if the job is running or not?
And how do I change the estimate percent and the run timing of the job?
Please help in detail.
Thanks,
Atul
July 07, 2006 - 3:20 pm UTC
the job has run 83 times, so it seems to be running. Look at the max last_analyzed time as well - you can see when stats were last automagically gathered.
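A sketch of queries that could answer both parts of the question (in 10g, dbms_stats.set_param changes the default estimate percent that dbms_stats - and hence the job - will use):

```sql
-- history of the automatic job's runs
select log_date, status
  from dba_scheduler_job_run_details
 where job_name = 'GATHER_STATS_JOB'
 order by log_date desc;

-- when stats were last gathered, per schema
select owner, max(last_analyzed)
  from dba_tables
 group by owner;

-- change the default estimate percent used by dbms_stats (and hence the job)
exec dbms_stats.set_param( 'ESTIMATE_PERCENT', '30' );
```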
gather_database_stats_job_proc
RD, April 05, 2007 - 11:53 pm UTC
Hi Tom,
Please can you tell me if 'gather_database_stats_job_proc' gathers statistics on the dictionary objects (sys) also ?
Thanking you,
Regards,
RD
April 06, 2007 - 11:39 am UTC
yeah, it does. Select last_analyzed out and you should see some "current" values if the dictionary has changed enough
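for example, a quick sketch:

```sql
-- see when dictionary objects were last analyzed
select table_name, last_analyzed
  from dba_tables
 where owner = 'SYS'
   and last_analyzed is not null
 order by last_analyzed desc;
```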
Wonderful..
A reader, April 16, 2007 - 3:24 pm UTC
Tom..
thanks for your Great Explanation..
We have a Prod data warehouse where 50% of the data is wiped out at 5 PM, and new data starts coming in from 6 PM. We start our load process using ETL tools into Oracle tables every night, starting at 6 PM and going on till 1 PM.
We have recently migrated from Oracle9iR2 to 10gR2. Before moving to 10g, we used to analyze the tables after the data load, using
analyze table lookup_description estimate statistics sample 25 percent;
So you are saying that we don't need to analyze anymore, and we can let Oracle 10gR2's GATHER_STATS_JOB do the work for us?
But I want to analyze right after the data load, and I am not sure how the Oracle automatic job can get the timing right for us to capture the stats.
Any help is appreciated
Ram K
Plan Change after gathering Stats
VLS, May 03, 2007 - 11:14 am UTC
Hi Tom,
I have noticed that whenever stats are generated, the execution plan of some of the queries changes, and sometimes these plan changes are bad and affect the performance of the overall database. Hence, I have some queries:
1. If I want the same plans throughout the life cycle based on the current stats, why do I generate stats at all?
2. We have seen on highly volatile tables that the plan changes dynamically and suddenly a query changes its index access pattern. Why is that?
In this case, the query returns to normal once we gather stats.
Regards
May 03, 2007 - 10:41 pm UTC
on the other hand:
I have noticed that whenever stats are generated, the execution plan of some of the queries changes, and most of the time these plan changes are exceptionally good.
1) because you gather statistics to do two things:
a) have plans change
b) have plans stay the same.
think about the statistic that tells the optimizer what the "max value for column X is". If X is populated via a sequence or a date - that'll get "out of date" rather fast... and the estimated card= values will not be accurate and the wrong plan will be generated (because the optimizer starts thinking "zero rows" over time)
2) need more details. could be bind peeking, but insufficient detail to say. Or it could be 1.b from above...
col_usage$ purge/reset ?
Nicosa, October 24, 2008 - 10:32 am UTC
I can't figure out if col_usage$ is ever reset.
The information in col_usage$ seems to persist through database bounces, statistics gathering, table alteration, and table truncation (at least on 10gR2).
Isn't this information ever reset?
(I guess that's where the timestamp column comes into play...)
October 24, 2008 - 1:48 pm UTC
it persists. It is in a real table, not a v$ table....
Stats on Partitioned Tables
A reader, January 14, 2009 - 6:02 am UTC
Hi Tom,
1. I see that GATHER_STATS_JOB, even for global granularity, takes a long time collecting stats on partitioned tables.
Can you suggest how to gather statistics optimally for partitioned tables when only global granularity is needed for an application?
January 16, 2009 - 4:24 pm UTC
well, you have at your disposal parallel and estimate percents.
Correction for above-Stats on partitioned tables
A reader, January 14, 2009 - 6:05 am UTC
One correction:
I am not clear on how gather_stats_job collects stats for partitioned tables, or why the default sometimes takes too long.
My question remains the same as above, though.
January 16, 2009 - 4:25 pm UTC
if you want to control this, you won't be using the default gather stats job, you'll be submitting dbms_stats.gather_table_stats yourself (and then the gather stats job will skip it since it won't be "stale")
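A sketch of what that manual call might look like (the table name and degree here are made up - pick values appropriate for your system):

```sql
begin
   dbms_stats.gather_table_stats(
      ownname          => user,
      tabname          => 'MY_PART_TABLE',               -- hypothetical table
      granularity      => 'GLOBAL',                      -- global stats only
      estimate_percent => dbms_stats.auto_sample_size,
      degree           => 4 );                           -- gather in parallel
end;
/
```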
A reader, February 11, 2009 - 3:16 pm UTC
Hi - Currently we have a job that computes full schema statistics every day, and right now it takes up to 5 hours to complete. So we are re-evaluating and trying to find out if we really need to do the full statistics every day. We also want to evaluate the tables based on the application that populates them, the rate of change of the data, etc., so the method of computing stats can be different for each table.
1. How would you go about analyzing how to compute statistics for every table
2. What other criteria should we base our evaluation on ?
February 12, 2009 - 10:44 am UTC
I wouldn't (compute statistics for every table every day) - do you need to, does your data change in such a manner that you need a compute every day on all of it? do you really need a compute?
Why not something like making sure all tables are in monitoring mode (10g they are, 9i you alter them to be so) and then gather stale using a 5% estimate?
A caveat with that approach would be statistics on some columns that are monotonically increasing - like sequences or columns populated by sysdate.
In those cases, say you have a 1,000,000-row table that you insert into. It will not be considered stale until you insert another 100,000 records. So say you insert 25,000 records per month. It will be 4 months before we gather statistics on this table again - by that time the high value for a sequence/date-populated column will be 4 months out of sync. You might want to do that table on a schedule OR when it is stale (but every day would almost certainly be overkill - maybe once a week)
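A sketch of the "gather stale with a 5% estimate" approach:

```sql
begin
   dbms_stats.gather_schema_stats(
      ownname          => user,
      options          => 'GATHER STALE',   -- only tables with ~10%+ modifications
      estimate_percent => 5 );
end;
/
```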
A reader, February 12, 2009 - 1:39 pm UTC
We have data in different groups of tables - like lookup tables, tables with a high rate of change of data that get changed throughout the day and some that only get changed by a nightly batch process. Will the gather stale option pertain to all kinds of tables ? What do we do about the nearly static lookup tables ?
stats
A reader, February 12, 2009 - 10:15 pm UTC
A reader, February 17, 2009 - 1:29 pm UTC
I read the article in the last link posted above:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i41884
While reading about stale statistics, there was a mention of checking the user_tab_modifications view. I checked and found that that view only returns details for 6 tables in my production schema, while I know that there are a lot more tables that get modified. What is the reason for that? How do I find the details for the other tables?
February 17, 2009 - 2:48 pm UTC
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/statviews_4465.htm#sthref2375 <quote>
This view is populated only for tables with the MONITORING attribute. <<== check that
It is intended for statistics collection
over a long period of time. <<== keep that in mind
For performance reasons, the Oracle Database
does not populate this view immediately when the actual modifications occur. <<== and that is probably the 'root' cause
</quote>
A reader, February 17, 2009 - 3:01 pm UTC
"Monitoring attribute" => how do I check if tables have this turned on or not.
"Over a long period of time" => This has been in production for about 8 months now, is that good enough ?
"Views do not populate immediately" => I am looking for tables that should have changed last night.
February 17, 2009 - 4:25 pm UTC
B YES
T2 YES
BIG_TABLE YES
ERR$_T YES
SALES_SUMMARY YES
T YES
T1 YES
as for the 8 months, it doesn't matter how long the TABLE has been there; it is a function of how long ago statistics were run, and how long we decide to cache the information. We flush it if you gather stats, but you could flush it yourself (but don't, you really don't need to)
ops$tkyte%ORA10GR2> select * from user_tab_modifications;
no rows selected
ops$tkyte%ORA10GR2> exec dbms_stats.FLUSH_DATABASE_MONITORING_INFO
PL/SQL procedure successfully completed.
ops$tkyte%ORA10GR2> select * from user_tab_modifications;
TABLE_NAME PARTITION_NAME SUBPARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
------------------------------ ------------------------------ ------------------------------ ---------- ---------- ---------- --------- --- -------------
TKTSTATUS 6 0 0 17-FEB-09 NO 0
T1 2 0 0 17-FEB-09 NO 0
T2 2 0 0 17-FEB-09 NO 0
S 3 3 0 17-FEB-09 NO 0
X 3 0 0 17-FEB-09 NO 0
Y 2 0 0 17-FEB-09 NO 0
SALES_SUMMARY 10 0 0 17-FEB-09 NO 0
ERR$_T 12 0 0 17-FEB-09 NO 0
QUES_SET 10 0 0 17-FEB-09 NO 0
BIG_TABLE 100000 0 0 17-FEB-09 NO 0
T 200000 0 0 17-FEB-09 NO 0
11 rows selected.
"Views do not populate immediately" => I am looking for tables that should have
changed last night.right, the views do not populate immediately, or even quickly, we *cache* stuff, it is information WE use - we need it, we don't need to publish it rapidly.
A reader, February 17, 2009 - 4:33 pm UTC
B YES
T2 YES
BIG_TABLE YES
ERR$_T YES
SALES_SUMMARY YES
T YES
T1 YES
Is there a query that returned this info ?
February 17, 2009 - 4:53 pm UTC
whoops, yeah, a simple one against user_tables
ops$tkyte%ORA10GR2> select table_name, monitoring from user_tables;
TABLE_NAME MON
------------------------------ ---
C YES
KEY_LIST YES
TTTT YES
A reader, February 17, 2009 - 5:21 pm UTC
Hi - This is the query I ran in my production schema and looks like almost all of them have monitoring turned on especially the ones that I am interested in.
1 select monitoring, count(*) from user_tables
2* group by monitoring
SQL> /
MON COUNT(*)
--- ----------
YES 208
NO 2
Don't know why I cannot find the modification info then... Should I execute the flush proc?
February 17, 2009 - 8:22 pm UTC
I told you "no, you should not"
I told you "it takes a long time"
Do not worry about this, it is working as expected. WE need the information, WE have the information, WE use the information - you do not really need to see it.
If you are curious, go ahead - ON TEST. Don't do spurious things in a production system.
A reader, February 17, 2009 - 8:36 pm UTC
The thing is, it would be helpful to get more information on the other tables that have monitoring turned on. I only see information in user_tab_modifications for 6 tables out of 208. What is the reason for that?
February 17, 2009 - 9:25 pm UTC
Ok, trying one more time:
we cache this information
for a long time
there is no reason for us not to
it would be bad to flush it often
if you gather statistics 'frequently' - you might *never* see anything in there.
what is your goal, why do you think this would be "helpful", if we know that, we might be able to say "well, you could do this.... or look at that...."
A reader, February 18, 2009 - 10:22 am UTC
Right now, we gather full schema stats every morning. Our goal is to stop doing this and instead understand how the tables are changing, so we can determine whether we can change the stats-gathering process and do a better job of it. So we thought we could get the information about data change from user_tab_modifications. Now that that might not be possible, do you have any other suggestions?
February 18, 2009 - 3:59 pm UTC
but why not just gather stale then?
I mean - think about it, either you read this table and figure out "hey, that table changed, lets gather"
OR
we read that data and we figure out "hey, that table changed, lets gather"
It is possible, I told you how to flush them, I just wouldn't bother.
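As a sketch of the "gather stale" approach (schema name is just the current user here; adjust to taste), the whole decision of "which tables changed enough to matter" can be left to DBMS_STATS itself:

begin
   -- gather statistics only for objects whose monitoring info
   -- indicates roughly 10% or more of the rows have changed
   dbms_stats.gather_schema_stats
   ( ownname => user,
     options => 'GATHER STALE' );
end;
/

Tables that have not changed materially are skipped, which is exactly the outcome the reader was trying to engineer by hand from user_tab_modifications.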
updated statistics
Yoav, March 02, 2009 - 2:04 pm UTC
We are moving from 8i rule-based optimization to 10gR2.
Reading one of your recent articles in Oracle Magazine about dynamic sampling shows that even if an object has no statistics at all, with dynamic sampling set to 2 at the instance level (the default value), in most cases Oracle still manages to produce a good plan.
I also understand that if the CBO does not have accurate, up-to-date statistics it may act as if it were "blind".
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm "Because the objects in a database can be constantly changing, statistics must be regularly updated
so that they accurately describe these database objects"
....
" Because the automatic statistics gathering runs during an overnight batch window,
the statistics on tables which are significantly modified during the day may become stale.
There are typically two types of such objects:
Volatile tables that are being deleted or truncated and rebuilt during the course of the day.
Objects which are the target of large bulk loads which add 10% or more to the object's total size.
...."
"...This is more effective than the GATHER_STATS_JOB, because any statistics generated on the table
during the overnight batch window may not be the most appropriate statistics for the daytime workload."
"....For tables which are being bulk-loaded, the statistics-gathering procedures should be run on those tables immediately following the load process, preferably as part of the same script or job that is running the bulk load."
I count 100% on the Oracle documentation, but the above is theory, and they did not show any example to support it.
As a DBA I would like to demonstrate to my developers, who are not familiar with the CBO at all,
that it is bad practice not to follow this recommendation.
1. Could you please show a case where the CBO chooses a wrong plan after a bulk load when statistics gathering was not executed immediately after the load process?
2. The same for truncating a table and not gathering statistics immediately after that?
Best Regards
March 03, 2009 - 3:44 pm UTC
1) think about a table that has 100 rows, statistics are gathered, then you bulk load 1,000,000 more rows. What might happen?
2) think about a table that has thousands of rows; you truncate it (which does not reset statistics) and it is now very small
ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';
NUM_ROWS
----------
49998
ops$tkyte%ORA10GR2> truncate table t;
Table truncated.
ops$tkyte%ORA10GR2> select num_rows from user_tables where table_name = 'T';
NUM_ROWS
----------
49998
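For case 1, the mirror image of the truncate example above can be sketched like this (T2 is a hypothetical 100-row table with freshly gathered statistics; no output is shown because the point is what the dictionary does NOT do):

-- stats currently say ~100 rows; the optimizer will plan accordingly
insert /*+ append */ into t2
select * from big_table.big_table where rownum <= 1000000;
commit;

-- num_rows in user_tables still reflects the pre-load 100 rows,
-- so full scans, index choices and join orders are costed against
-- a tiny table until you run:
-- exec dbms_stats.gather_table_stats( user, 'T2' )

Until that gather runs, a query against the now million-row table is optimized as if it were a hundred-row table, which is precisely the "wrong plan after a bulk load" scenario the reader asked about.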
Statistics running long suddenly
A reader, October 02, 2009 - 12:18 pm UTC
We have four tables in our database for which we gather statistics after a specific process happens. These tables are partitioned and we run the following commands to calculate the statistics on the current partition.
BEGIN
SYS.DBMS_STATS.UNLOCK_TABLE_STATS('owner name','table name');
END;
ALTER TABLE owner.table MOVE PARTITION partition COMPRESS UPDATE GLOBAL INDEXES PARALLEL;
ALTER TABLE owner.table MODIFY PARTITION partition REBUILD UNUSABLE LOCAL INDEXES;
BEGIN
SYS.DBMS_STATS.gather_table_stats('owner name','table name','partition name');
END;
BEGIN
SYS.DBMS_STATS.LOCK_TABLE_STATS('owner name','table name');
END;
The statistics gathering has been completing in 10 minutes or so; over the past week it has been running for an hour. There has been no significant change in the amount of data. I am stumped as to where I should start looking for the probable cause. Can you throw some light on how this might happen?
October 07, 2009 - 3:43 pm UTC
not much to go on here, like not even *a version*.
so I'll guess.
10g, using defaults, someone queried the table and we remembered the where clause conditions and are now generating histograms:
https://www.oracle.com/technetwork/issue-archive/2006/06-may/o36asktom-099107.html ("Why Does My Plan Change?")
or, we decided to do bigger samples (because auto is allowed to change its mind whenever it wants to)
unless you have statistic information from 'when it was fast' to compare to - there are just too many reasons to even list.
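One way to test the histogram theory (assuming you can query the dictionary for that owner; OWNER_NAME and TABLE_NAME are placeholders) is to look at the column-level statistics after a slow run and see whether histograms have appeared:

select column_name, histogram, num_buckets, last_analyzed
  from dba_tab_col_statistics
 where owner = 'OWNER_NAME'
   and table_name = 'TABLE_NAME'
 order by column_name;

If columns that used to show HISTOGRAM = NONE now show FREQUENCY or HEIGHT BALANCED with many buckets, the extra gathering time is going into histogram computation triggered by the auto method_opt default.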
Check the statistics
A reader, October 08, 2009 - 10:17 am UTC
Hi - In our database we have partitioned tables, IOTs, and plain (non-partitioned, non-IOT) tables. We want to write some scripts that will quickly give us the status of the statistics on each of these table categories (e.g. last_analyzed, whether compression is enabled on partitions). Do you have any scripts that can be used for this purpose?
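A starting point for such a script, sketched against the 10g dictionary views (not a polished tool; adapt the predicates and owners to your environment):

-- statistics freshness per table / IOT, oldest first;
-- stale_stats needs monitoring info flushed to be current
select table_name, object_type, last_analyzed, stale_stats
  from user_tab_statistics
 where partition_name is null
 order by last_analyzed nulls first;

-- per-partition picture: compression setting plus stats date
select p.table_name, p.partition_name, p.compression, s.last_analyzed
  from user_tab_partitions p
  left join user_tab_statistics s
    on s.table_name = p.table_name
   and s.partition_name = p.partition_name
 order by p.table_name, p.partition_name;

user_tab_statistics covers tables, IOTs and partitions in one view, which is why it is a more convenient base for this than stitching together user_tables and user_indexes.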