Nicely explained
Gururaj, May 30, 2001 - 10:36 am UTC
Thanks Tom, explanation is very good. Looks even better with examples
Still Confused...
Vineet, November 16, 2001 - 3:52 pm UTC
Tom,
I am still confused. I read this document before asking you. My question is about the plain analyze table TABLE_NAME compute statistics command, not about:
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
I mean analyze table without any option like for table, for all indexes, or for all indexed columns...
Don't mean to beat a dead horse...
Steve McKee, February 14, 2002 - 4:11 pm UTC
Hi Tom,
I just want to make sure I understand the documentation. From the documentation (8.1.7):
"Specify a table to be analyzed. When you collect statistics for a table, Oracle also automatically collects the statistics for each of the table's indexes and domain indexes, provided that no for_clauses are used."
Therefore, is "analyze table <table> compute statistics"
the same as:
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
One of our main designers and I both want to make sure.
I appreciate it Tom.
Steve
February 22, 2002 - 9:09 am UTC
No, it is not. Consider:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int primary key, y int );
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t values ( 1, 1 );
1 row created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
2 union all
3 select 'INDEX', index_name, to_char(num_rows) from user_indexes
4 union all
5 select 'HIST', table_name, column_name from user_tab_histograms;
'TABL TABLE_NAME NUM_ROWS
----- ------------------------------ --------------------
TABLE T
INDEX SYS_C005115
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
2 union all
3 select 'INDEX', index_name, to_char(num_rows) from user_indexes
4 union all
5 select 'HIST', table_name, column_name from user_tab_histograms;
'TABL TABLE_NAME NUM_ROWS
----- ------------------------------ --------------------
TABLE T 1
INDEX SYS_C005115 1
HIST T X
HIST T Y
HIST T X
HIST T Y
6 rows selected.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics
2 for table
3 for all indexes
4 for all indexed columns;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select 'TABLE', table_name, to_char(num_rows) num_rows from user_tables
2 union all
3 select 'INDEX', index_name, to_char(num_rows) from user_indexes
4 union all
5 select 'HIST', table_name, column_name from user_tab_histograms;
'TABL TABLE_NAME NUM_ROWS
----- ------------------------------ --------------------
TABLE T 1
INDEX SYS_C005115 1
HIST T X
HIST T X
ops$tkyte@ORA817DEV.US.ORACLE.COM>
The plain analyze table t gets histograms on all of the columns; for all indexed columns does not.
Subrat
Subrat, March 23, 2002 - 8:02 pm UTC
Hi tom,
I appreciate your explanations a lot.
They are too good for gaining knowledge.
Hats off to you.
I just want to know: would you advise analyzing the primary key columns too?
When you issue 'analyze table compute statistics for table for all indexes for all indexed columns', you are also analyzing the primary key, and Oracle strongly suggests that only skewed data should be analyzed.
The other part is that when we analyze the primary key columns we generate a lot of histogram records for the primary key; as a result, during the parse phase the optimizer will read those records and consume space from the db block buffer and dictionary cache. Don't you think it will adversely affect performance in the case of a heavily used, big database in an OLTP environment?
Above all, it adds to the size of the SYS tablespace.
Please correct me if I am wrong.
I also have a doubt: when the optimizer looks for the histograms, does it bring the full set of histograms into the SGA, or only the data required for the selected tables during the parse process, into the db block buffer and dictionary cache?
March 23, 2002 - 8:24 pm UTC
I use for all indexed columns in my examples frequently -- it's a shortcut for me. It is easier than running N analyzes to get what I want for what is a very short, quick demo.
I don't see it adversely affecting performance in most normal systems no.
the histograms are retrieved as needed, not all at once.
Analyze Table
R.V.Reddy, March 24, 2002 - 12:34 am UTC
Hi tom
This is a beautiful explanation regarding analyze table. I have one doubt. So far we have developed some applications using around 200 tables, but I never ANALYZED the tables. If I am not analyzing the tables, is there any impact on performance or any other issues? If I do analyze them, what can I achieve in my application?
Thanks
Reddy
March 24, 2002 - 3:35 pm UTC
By not analyzing the tables you are missing out on many features such as:
descending indexes,
function based indexes,
bitmap indexes,
partitioning,
parallel query,
.......
and the list grows longer and longer each release.
Analyze table compute statistics
femi, March 24, 2002 - 2:06 pm UTC
Tom,
Correct me if I am wrong, but when I issue the command "analyze table compute statistics", it computes statistics for the indexes as well.
March 24, 2002 - 3:48 pm UTC
read the answer, see the way to tell and try it!
(or read the comment a few entries up, "Don't mean to beat a dead horse...")
for all tables
Ak, March 24, 2002 - 8:51 pm UTC
Hi tom,
1) Assume I have 500 tables in a schema. How do I analyze all the tables/indexes at a time? Is there any way, or do I have to do them one by one?
2) How frequently should we analyze the tables, assuming it is an OLTP system?
3) Is there any performance problem when we analyze during peak time, or is it advisable to do it during non-peak hours?
Thanks,
Ak
March 24, 2002 - 8:59 pm UTC
well, parallel 500 would be a bit extreme; I haven't seen a machine that could do 500 analyzes at the same time efficiently yet.
1) use dbms_stats, use alter table monitoring, gather stats only on stale tables, and if you have the resources -- do so in parallel.
2) use alter table monitoring, gather stats on stale tables when spare capacity is at its maximum. It depends on the system: some systems -- once a day; others, once a week; others yet -- once a month.
3) during peak time, if you do the analyze, you'll be incurring massive IO overhead. It is something you want to try to do during "off" times if at all possible.
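The monitoring/stale approach above might look like the following sketch (the schema choice, cascade, and degree here are assumptions for illustration, not from the thread):

```sql
-- Enable modification monitoring on the table(s) of interest.
alter table t monitoring;

-- Later, gather statistics only for the monitored tables whose data
-- has changed enough for their stats to be considered "stale".
begin
   dbms_stats.gather_schema_stats(
      ownname => user,
      options => 'GATHER STALE',
      cascade => true,    -- index statistics as well
      degree  => 4 );     -- in parallel, if you have the resources
end;
/
```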
analyze
Ak, March 24, 2002 - 10:59 pm UTC
Hi tom,
Thanks for your reply!!! According to your response, it looks like I have to say:
analyze table A compute statistics
analyze table B compute statistics
analyze table C compute statistics
analyze table D compute statistics
and likewise for all 500 tables, and then for the indexes as well?? Please correct me if I am wrong.
This is always something that has bugged me
Chris Slattery, March 25, 2002 - 5:46 am UTC
I always thought a stats gather (default compute) did the whole lot. The manual bears this out. I quote from the tahiti.oracle.com SQL 9.0 reference:
"TABLE table
Specify a table to be analyzed. When you collect statistics for a table, Oracle also automatically collects the statistics for each of the table's indexes and domain indexes, as long as
no for_clauses are used.
When you analyze a table, Oracle collects statistics about expressions occurring in any function-based indexes as well. Therefore, be sure to create function-based indexes on the table before analyzing the table"
and so on.
All I am looking for is a clear statement of the difference. There seems to be a discrepancy between the manual and what Mr. Kyte writes (and I am quite happy to follow his example).
as an interesting note, I see from the same SQL reference that Oracle now wishes you to use DBMS_STATS rather than ANALYZE, and in fact gives an ominous warning that DBMS_STATS will be the only method for generating stats that will actually be used (by the CBO) in some future release.
Any idea as to what release this will actually be ?
March 25, 2002 - 8:12 am UTC
What discrepancy do you see? I see NONE.
where is there a discrepancy??
I demonstrated (see the comment "Don't mean to beat a dead horse...") that analyze table compute stats does more than
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
I also demonstrated that when you use for, you get only what you ask for.
the original QUESTION was:
In your question answers, you prefer:
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
does "analyze table t compute statistics
for table"
cover all the rest?
and my answer was (and still is) no, no it does not.
For AK from India
Pawan, March 25, 2002 - 7:30 am UTC
AK,
No, you don't have to analyze all the tables in the schema individually. You can analyze the schema with one command.
This is what I use. I have the following script saved as analyze_schema.sql; at the SQL prompt I just run it and input the schema name when prompted.
------
accept user char format 'A40' -
prompt 'Enter user name: '
begin
dbms_utility.analyze_schema(upper('&user'),'COMPUTE');
end;
/
-------
Tom,
Correct me if I am wrong.
Proper Analyze
Randy Richardson, March 25, 2002 - 9:17 am UTC
Tom, thanks for sharing the knowledge. I just bought your latest book. Love it!! Do you have one covering Oracle9i in the works?
As for the above question: I too use
1. dbms_utility.analyze_schema('USER_NAME','COMPUTE');
Does this differ from
2. analyze table t compute statistics
for table
for all indexes
for all indexed columns;
and if it does, why would you want to do 1 rather than 2?
And if 2 is better, why not change 1 to do what's best?
Thanks again!!
March 25, 2002 - 12:22 pm UTC
Here is the lowdown on analyzing objects.
in the real world, outside of the examples you see here, you will use DBMS_STATS to compute statistics. It is more flexible and has more features (eg: a parallel analyze, which analyze cannot do). In the examples here, I analyze only what I need since I tend to CREATE something, use it, and drop it. For me, analyzing my entire schema would be a waste of time.
Never use dbms_utility, it is a legacy from 7.3. Use dbms_stats instead.
(I'm in the process of outlining a performance book that would cover up to 9iR2)
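As a sketch, the dbms_stats call roughly corresponding to the for-clause analyze used in the demos on this page would look like this (the table name 'T' is a placeholder):

```sql
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'T',
      method_opt => 'FOR ALL INDEXED COLUMNS',  -- column stats on indexed columns only
      cascade    => true );                     -- gather index statistics too
end;
/
```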
Post-laceration !
Chris Slattery, March 25, 2002 - 9:42 am UTC
Yes Tom ...
You (as always) are right. It stemmed from misreading the syntax diagrams a long time ago and being sucked into the myth (which I need to debunk) that a qualifier (such as for index or for table) must be added for ANALYZE to work.
Deleting Statistics
Rob, June 06, 2003 - 5:16 pm UTC
Is there anything to watch out for when
deleting stats in the fashion below ?
analyze table
ARCH_CANCEL_NOTICES
delete statistics ;
Thanks
June 06, 2003 - 8:02 pm UTC
well, if you join or reference arch_cancel_notices with other analyzed tables, we'll use default stats for arch_cancel_notices -- with disastrous results (gartner says 99.99% probability ;)
DBMS_UTILITY.ANALYZE_SCHEMA is not very smart
A reader, July 31, 2003 - 11:39 pm UTC
Hi Tom,
the package DBMS_UTILITY.ANALYZE_SCHEMA is not very smart: it picks up all the tables in a schema to analyze them and fails if the schema has an external table. I think the package should omit external tables, since it knows that analyze does not work on external tables.
August 01, 2003 - 7:44 am UTC
well, you shouldn't be using dbms_utility -- dbms_stats is the only way to go. And then you CAN, in fact, analyze them:
ops$tkyte@ORA920LAP.LOCALHOST> create table external_table
2 (EMPNO NUMBER(4) ,
3 ENAME VARCHAR2(10),
4 JOB VARCHAR2(9),
5 MGR NUMBER(4),
6 HIREDATE DATE,
7 SAL NUMBER(7, 2),
8 COMM NUMBER(7, 2),
9 DEPTNO NUMBER(2)
10 )
11 ORGANIZATION EXTERNAL
12 ( type oracle_loader
13 default directory data_dir
14 access parameters
15 ( fields terminated by ',' )
16 location ('emp.dat')
17 )
18 /
Table created.
ops$tkyte@ORA920LAP.LOCALHOST> host flat scott/tiger emp > /tmp/emp.dat
ops$tkyte@ORA920LAP.LOCALHOST>
ops$tkyte@ORA920LAP.LOCALHOST> exec dbms_stats.gather_table_stats( user, 'EXTERNAL_TABLE' )
PL/SQL procedure successfully completed.
ops$tkyte@ORA920LAP.LOCALHOST> select table_name, num_rows, last_analyzed from
2 user_tables where table_name = 'EXTERNAL_TABLE';
TABLE_NAME NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
EXTERNAL_TABLE 14 01-AUG-03
<b>plus have a whole lot more control</b>
what's the point of FOR ALL COLUMNS
A reader, September 01, 2003 - 1:31 pm UTC
Hi
Why don't we use FOR ALL INDEXED COLUMNS instead of FOR ALL COLUMNS? I don't see the point of analyzing columns which are not indexed. Any suggestions?
September 01, 2003 - 2:29 pm UTC
think of a join....
select *
from t1, t2
where t1.indexed_column = t2.column
and t2.unindexed_column = 'X';
tell me, should we do a nested loops join to t1 after full scanning t2, or should we hash join both?
Whatever you say is best, I'll put up a counterexample that shows the other was really best.
I don't see how statistics can help
A reader, September 01, 2003 - 3:52 pm UTC
Hi
I don't see how statistics on a column help in a join. If, let's say, a hash join is best, shouldn't that decision be based on the number of rows in both tables...? If a nested loop is used, shouldn't the statistics on the indexed column be enough?
September 01, 2003 - 5:58 pm UTC
select *
from t1, t2
where t1.indexed_column = t2.column
and t2.unindexed_column = 'X';
say unindex_column = 'X' is known (via histograms) to return 1 row and t1.indexed_column is known to be unique:
full scan t2, nested loops using the index to t1
say unindex_column = 'X' is known (via histograms) to return 500,000 rows:
full scan t2, filter, hash join to t1
the decision to hash join IS based to some degree on the number of rows in both tables AFTER FILTERING -- after "unindex_column = 'X'" is evaluated. If the optimizer has no clue about the distribution of 'X' -- well, there you go, it won't know -- it'll guess and could well be *wrong*
consider:
ops$tkyte@ORA920> create table t1
2 as
3 select *
4 from all_objects;
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> alter table t1 add constraint t1_pk primary key(object_id);
Table altered.
ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t1
2 compute statistics
3 for table
4 for all indexes
5 for all indexed columns;
Table analyzed.
ops$tkyte@ORA920>
ops$tkyte@ORA920> create table t2
2 as
3 select decode( mod(rownum,100), 0, to_char(rownum), 'x' ) unindex_column, all_objects.*
4 from all_objects;
Table created.
ops$tkyte@ORA920>
ops$tkyte@ORA920> analyze table t2
2 compute statistics
3 for table;
Table analyzed.
ops$tkyte@ORA920>
ops$tkyte@ORA920>
ops$tkyte@ORA920> set autotrace traceonly explain
ops$tkyte@ORA920> select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = '100';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
1 0 HASH JOIN (Cost=100 Card=317 Bytes=61498)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
ops$tkyte@ORA920> select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = 'x';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
1 0 HASH JOIN (Cost=100 Card=317 Bytes=61498)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
<b>
ops$tkyte@ORA920> analyze table t2 compute statistics for columns unindex_column size 100;
Table analyzed.
</b>
ops$tkyte@ORA920> Select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = '100';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=1 Bytes=194)
1 0 NESTED LOOPS (Cost=49 Card=1 Bytes=194)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
4 3 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1)
ops$tkyte@ORA920> Select * from t1, t2 where t1.object_id = t2.object_id and t2.unindex_column = 'x';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=214 Card=31710 Bytes=6151740)
1 0 HASH JOIN (Cost=214 Card=31710 Bytes=6151740)
2 1 TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
3 1 TABLE ACCESS (FULL) OF 'T2' (Cost=52 Card=31710 Bytes=3107580)
ops$tkyte@ORA920> set autotrace off
Too good explanation!!!!
A reader, September 02, 2003 - 5:47 am UTC
dbms_stats
Michael, October 10, 2003 - 7:06 pm UTC
Hi Tom,
is dbms_stats.gather_table_stats(ownname => 'OWNER',
tabname => 'TABLE',
cascade => true);
the same as
analyze table t compute statistics
for table
for all indexes
for all indexed columns;
If not, where are the differences, and is there a way to let dbms_stats do the same?
Your answer is, as always, appreciated.
Analyze
Yogesh Hingmire, January 11, 2004 - 12:55 pm UTC
January 11, 2004 - 3:16 pm UTC
it is correct --
analyze table t compute statistics;
does all three -- if you use NO for clauses.
A reader, March 18, 2004 - 4:47 pm UTC
Tom,
Is there any other way to figure out if my tables have been analyzed, other than looking at *_tables?
Thanks.
March 18, 2004 - 5:07 pm UTC
I guess you could use dbms_stats and catch the exception, but SQL seems easier:
ops$tkyte@ORA9IR2> create table t as select * from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> variable numrows number
ops$tkyte@ORA9IR2> variable numblks number
ops$tkyte@ORA9IR2> variable avgrlen number
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autoprint on
ops$tkyte@ORA9IR2> exec dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen );
BEGIN dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen ); END;
*
ERROR at line 1:
ORA-20000: Unable to get values for table T
ORA-06512: at "SYS.DBMS_STATS", line 2582
ORA-06512: at line 1
AVGRLEN
----------
NUMBLKS
----------
NUMROWS
----------
ops$tkyte@ORA9IR2> analyze table t compute statistics for table;
Table analyzed.
ops$tkyte@ORA9IR2> exec dbms_stats.get_table_stats( user, 'T', null, null, null, :numrows, :numblks, :avgrlen );
PL/SQL procedure successfully completed.
AVGRLEN
----------
96
NUMBLKS
----------
435
NUMROWS
----------
30628
ops$tkyte@ORA9IR2>
A reader, March 18, 2004 - 5:09 pm UTC
Duh! Totally forgot about that.
Thanks a lot.
Analyze tables created from TYPE
surjam, April 27, 2004 - 4:27 pm UTC
Hi Tom,
We have tables created from an object TYPE. When we analyze the tables, we don't see the table in user_tables. Is there any problem, or is there any other way of analyzing tables created from a TYPE?
create type my_typ as object
(a number);
/
create table my_tbl of my_typ;
/
insert into my_tbl values(100);
/
select table_name,num_rows from user_tables where table_name='MY_TBL';
no records found.
Thanks
April 28, 2004 - 2:04 pm UTC
ops$tkyte@ORA9IR2> create or replace type myType as object ( a number );
2 /
Type created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t of myType
2 /
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> insert into t values ( 100 );
1 row created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics
2 /
Table analyzed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select num_rows from user<b>_OBJECT_</b>tables where table_name = 'T';
NUM_ROWS
----------
1
dbms_ddl vs dbms_stats
surjam, April 28, 2004 - 2:36 pm UTC
Tom,
I was going through all your explanations regarding dbms_stats, and you recommend this package. Am I right in saying that dbms_ddl.analyze_object is the same as the analyze table command? And, if I use your method of analyzing for table, all columns, all indexed columns, will it give the same benefits as dbms_stats?
Also not very sure of how to set up DBMS_STATS.
You are the best Tom. Thanks again.
April 28, 2004 - 7:34 pm UTC
dbms_stats is FULLY documented
have you read about it?
surjam, April 29, 2004 - 10:49 am UTC
Tom,
I have read your chapter in the Effective Oracle by Design book and I now understand the way it is set up. Still, if you can let me know the difference between dbms_ddl.analyze_object and dbms_stats, that would help my understanding. My thought is that dbms_ddl.analyze_object is the same as
analyze table tname compute statistics for table for indexes
for all indexed columns
Is this right? Thanks
April 29, 2004 - 11:27 am UTC
dbms_ddl.analyze_object should be ignored, forgotten. it just does
"analyze table t compute statistics"
(if method is compute) -- all columns, all indexes.
dbms_stats should be the only game in town.
For OLTP
Sanjaya Balasuriya, May 12, 2004 - 7:14 am UTC
Hi Tom,
Is it a bad idea to create histograms for an OLTP system?
This system is active only between 08:00 and 19:00. Even if I analyze the table columns after 19:00, isn't that OK?
Or should I just run dbms_stats.gather_schema_stats('schema_name')?
Thanks in advance.
May 12, 2004 - 7:55 am UTC
it is not "bad" as in evil.
it is generally "not so useful" as you sort of expect short, fast, quick, index reads in true oltp.
last time I saw true oltp however was sometime in 1994 i think. reports and other things creep into it.
it might be that you are spending extra resources to compute them (gather schema stats defaults to getting histograms). but if it is something you are doing -- keep doing it (don't make big changes without testing), unless and until you test it without them and see how it performs
dbms_stats taking more time
nn, May 12, 2004 - 9:05 am UTC
I am working on an implementation of DBMS_STATS. Everybody knows the comparison of dbms_stats and analyze table, but the main issue is the elapsed time. My client DBAs are not convinced to implement it, as it takes more time than ANALYZE. Following are the stats which I am planning to put in front of them.
a. SELECT COUNT(*) FROM aa
26710372
analyze table aa estimate statistics sample 20 percent;
SAMPLE_SIZE NUM_ROWS Time Elapsed
4783565 26710372 04:42:26.42
execute DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'GCA_USER', TABNAME => 'GCA_BOOKING_DETAIL', ESTIMATE_PERCENT => 20, degree => 3);
SAMPLE_SIZE NUM_ROWS Time Elapsed
5342192 26710960 3:06:13 (30% improvement)
b. SELECT COUNT(*) FROM ab
57207324
analyze table ab estimate statistics sample 20 percent;
SAMPLE_SIZE NUM_ROWS Time Elapsed
9051093 57207324 002:53:01
execute DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'GCA_USER', TABNAME => 'aB', ESTIMATE_PERCENT => 20, degree => 3);
SAMPLE_SIZE NUM_ROWS Time Elapsed
5722651 57226510 009:45:30
It is really hard to convince them here; the elapsed time is too high.
c. SELECT COUNT(*) FROM GL;
24046022
analyze table GL estimate statistics sample 20 percent;
SAMPLE_SIZE NUM_ROWS Time Elapsed
4183321 24046022 1:57:15
execute DBMS_STATS.GATHER_TABLE_STATS (OWNNAME => 'GCA_USER', TABNAME => 'GL', ESTIMATE_PERCENT => 20, degree => 3);
SAMPLE_SIZE NUM_ROWS Time Elapsed
25881265 258812650 5:23:18
So Tom, please advise on this?
May 12, 2004 - 6:45 pm UTC
run the dbms_stats with a 10046 level 12 trace and see what it is doing that takes the most time...
Reader
A reader, May 13, 2004 - 7:28 am UTC
Is there a way to verify that the statistics are deleted when "analyze table delete statistics" is issued?
Will the statistics gathered previously still remain?
I was checking the "density" before and after the "analyze ... delete"; it seems it still remained. How does the optimizer know that the statistics are deleted?
May 13, 2004 - 10:31 am UTC
select num_rows from user_tables where table_name = 't';
select count(*) from user_tab_histograms where table_name = 't';
select num_rows, index_name from user_indexes where table_name = 't';
that'll show you if table, column or index stats exist (modify as needed to check partitions or subpartitions)
num_rows is null -> no stats
count(*) = 0 -> no stats
Analyze
arc, August 17, 2004 - 10:15 am UTC
Hi Tom,
Is there any way to identify a table to be analyzed?
I am using Oracle 8i.
Thanks
ARC
August 17, 2004 - 10:19 am UTC
um? huh?
if you are using the cbo, all of your tables should be analyzed.
if you mean "which tables need to be reanalyzed or are candidates for being re-analyzed", then you
alter table t monitoring;
and use the gather stale/list stale option of dbms_stats.
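A sketch of the list stale side of this, using the documented objlist out parameter of dbms_stats.gather_schema_stats (the dbms_output loop is just for illustration):

```sql
-- List (without gathering) the monitored tables whose statistics
-- are considered stale.
declare
   l_objs dbms_stats.objecttab;
begin
   dbms_stats.gather_schema_stats(
      ownname => user,
      options => 'LIST STALE',
      objlist => l_objs );

   for i in 1 .. l_objs.count loop
      dbms_output.put_line( l_objs(i).objname );
   end loop;
end;
/
```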
Ravi Kumar, August 29, 2004 - 6:41 am UTC
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:718620715884#12100114489478 <code>
Tom in the example where you have compared Hash Join with Nested loop for table T2 with Unindexed column, I have a question.
1) Why does the cardinality of the full table scan become
TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)
from
TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
I mean, a FTS means it has to visit ALL the blocks, and I'd expect the cardinality to remain 31708 after creating the histograms.
2) Why am I wrong? How do the histograms WORK to reduce the cardinality?
3) Is the cardinality really reduced in a FTS w.r.t. logical reads due to the presence of a histogram?
4) If logical reads do get reduced, aren't histograms working like indexes to point to the data rows?
August 29, 2004 - 11:55 am UTC
look at the predicate -- the estimated row counts are the rows resulting from the operation, to be fed into the next step. they are not the estimated rows that would be "inspected", they are the estimated rows that would be produced:
ops$tkyte@ORA9IR2> drop table t;
Table dropped.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create table t
2 as
3 select 1 x, object_id y
4 from all_objects;
Table created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T', method_opt=>'for all columns size 254' );
4 end;
5 /
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set autotrace traceonly explain
ops$tkyte@ORA9IR2> select * from t where x = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=29509 Bytes=206563)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=29509 Bytes=206563)
ops$tkyte@ORA9IR2> select * from t where y = 1;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=7)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=10 Card=1 Bytes=7)
The cost of the full scan - the same.
The estimated number of rows produced -- different.
So, the cost for finding 1 or 30,000 rows -- 10.
If this were a more complex query, say joining T to something else, the optimizer would take the 30k rows from one query and say "you know, using an index 30k times is stupid for joining, so lets hash". It would likewise take the 1 row and say "wouldn't an index be a wonderful thing to join with since we have to do it only once"
Why does FTS sees lesser cardinality?
Ravi Kumar, August 29, 2004 - 6:49 am UTC
Sorry Tom, ignore my previous review as I made a crucial mistake!
Here's the corrected one:
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:718620715884#12100114489478 <code>
Tom in the example where you have compared Hash Join with Nested loop for table
T2 with Unindexed column, I have a question.
1) Why does the cardinality of the full table scan become
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=1 Bytes=194)
1 0 NESTED LOOPS (Cost=49 Card=1 Bytes=194)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=1 Bytes=98)
3 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=96)
4 3 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1)
To me it appears the Nested Loop gets just ONE row from the FTS of T2 to loop through T1
FROM
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=100 Card=317 Bytes=61498)
1 0 HASH JOIN (Cost=100 Card=317 Bytes=61498)
2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=48 Card=317 Bytes=31066)
3 1 TABLE ACCESS (FULL) OF 'T1' (Cost=50 Card=31708 Bytes=3043968)
I mean, a FTS means it has to visit ALL the blocks, and I'd expect the cardinality to remain 317 after creating the histograms.
2) Why am I wrong? How do the histograms WORK to reduce the cardinality?
3) Is the cardinality really reduced in a FTS w.r.t. logical reads due to the presence of a histogram?
4) If logical reads do get reduced, aren't histograms working like indexes to point to the data rows?
Note: I have run the above queries locally to verify the results, and they were as you had given, with different mileage as expected!
August 29, 2004 - 11:59 am UTC
see above.
Analyze Table / Index / Indexed Columns
Vivek Sharma, November 29, 2004 - 3:47 am UTC
Dear Tom,
Please explain this behaviour to me.
Create table t as select * from all_objects;
select count(*) from t;
COUNT(*)
----------
29098
SQL> select count(*) from t where data_object_Id is null;
COUNT(*)
----------
50
create bitmap index t_inx on t(data_object_id);
The Scenario for query
select * from t where data_object_id is null
is :
When I do
1. Analyze table T compute statistics, the index is used.
2. Delete the Statistics and Analyze table t compute statistics for table, the index is not used.
3. With Step 2 and Analyze index t_inx compute statistics, again the index is not used,
4. With Step 2 and Analyze index t_inx compute statistics for all indexed columns, again the index is not used.
5. Analyze table t compute statistics for table for all indexes for all indexed columns, the index is used.
Scenario I
-----------
SQL> analyze table t compute statistics;
Table analyzed.
Elapsed: 00:00:02.00
SQL> select * from t where data_object_id is null;
50 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=50 Bytes=440
0)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=12 Card=50 Byte
s=4400)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'T_INX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
5314 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50 rows processed
Scenario II
-----------
SQL> analyze table t delete statistics;
Table analyzed.
Elapsed: 00:00:00.00
SQL> analyze table t compute statistics for table;
Table analyzed.
Elapsed: 00:00:00.02
SQL> select * from t where data_object_id is null;
50 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
44045)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
)
Statistics
----------------------------------------------------------
262 recursive calls
0 db block gets
431 consistent gets
0 physical reads
0 redo size
5314 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50 rows processed
Scenario III
-------------
SQL> analyze index t_inx compute statistics;
Index analyzed.
Elapsed: 00:00:00.02
SQL> select * from t where data_object_id is null;
50 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
44045)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
405 consistent gets
0 physical reads
0 redo size
5314 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50 rows processed
Scenario IV
-----------
SQL> analyze index t_inx compute statistics for all indexed columns;
Index analyzed.
Elapsed: 00:00:00.00
SQL> select * from t where data_object_id is null;
50 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1455 Bytes=1
44045)
1 0 TABLE ACCESS (FULL) OF 'T' (Cost=41 Card=1455 Bytes=144045
)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
405 consistent gets
0 physical reads
0 redo size
5314 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50 rows processed
Scenario V
----------
SQL> analyze table t compute statistics for table for all indexes for all indexed columns;
Table analyzed.
Elapsed: 00:00:00.04
SQL> select * from t where data_object_id is null;
50 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=50 Bytes=495
0)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=12 Card=50 Byte
s=4950)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'T_INX'
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
5314 bytes sent via SQL*Net to client
532 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50 rows processed
Why is my index not getting used when I compute statistics for the indexes separately?
Thanks
Vivek
November 29, 2004 - 8:20 am UTC
analyze table t compute statistics for table;
^^^^^^^^^
for table by itself means "for the table, NOT for the columns NOT for the indexes"
hence, Oracle doesn't have a clue that there are only 50 rows null anymore. you took that information away from it.
In Continuation to my previous Query
Vivek Sharma, November 29, 2004 - 8:53 am UTC
Hi Tom,
Thanks for your prompt reply.
I know that since I analyzed only the table, the index is not getting used. But when I analyzed the index using ANALYZE INDEX, the index was still not used. Why this behaviour?
When I analyzed the table and index using a single command, the query used the index.
Regards
Vivek
November 29, 2004 - 2:54 pm UTC
because you had no COLUMN statistics.
the index doesn't tell us COLUMN statistics. for all columns does stuff only when you scan the table.
ops$tkyte@ORA9IR2> create table t as select * from all_objects;
Table created.
ops$tkyte@ORA9IR2> create index t_idx on t(data_object_id);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA9IR2> create or replace view v
2 as
3 select 'T' what, num_rows rows_cname_cnt, null cnt from user_tables where table_name = 'T'
4 union all
5 select 'I', num_rows, null from user_indexes where table_name = 'T'
6 union all
7 select 'C', count(distinct column_name), count(*) from user_tab_histograms where table_name = 'T';
View created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T 27938
I 1389
C 13 26
ops$tkyte@ORA9IR2> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T
I
C 0 0
ops$tkyte@ORA9IR2> analyze table t compute statistics for table;
Table analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T 27938
I
C 0 0
ops$tkyte@ORA9IR2> analyze index t_idx compute statistics;
Index analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T 27938
I 1389
C 0 0
ops$tkyte@ORA9IR2> analyze index t_idx compute statistics for all indexed columns;
Index analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T 27938
I 1389
C 0 0
ops$tkyte@ORA9IR2> analyze table t compute statistics for table for all indexes for all indexed columns;
Table analyzed.
ops$tkyte@ORA9IR2> select * from v;
WHAT ROWS_CNAME_CNT CNT
------------------------------ -------------- ----------
T 27938
I 1389
C 1 74
ops$tkyte@ORA9IR2>
I think I understand, but why gather histogram data on non_indexed columns?
Ken, March 02, 2005 - 11:36 am UTC
Tom,
I'm a contract DBA for a big company. They had a well-respected consultant in the past who advised that they use "FOR ALL COLUMNS SIZE 1" when gathering stats (I think it's the default in 9.2). I understand that humans might be interested in the histogram data for an unindexed column once in a while (it provides insight on whether to index it or not, if the number of buckets is greater than 1), but why should we use this option for everything, all the time? Can the optimizer come up with a better plan by looking at histogram data for an unindexed column? It will still have to do a FTS if there is no index on the predicated column, right?
I plan to test by gathering stats with "FOR INDEXED COLUMNS" but would greatly appreciate your input first.
March 02, 2005 - 11:41 am UTC
select *
from big_table_1 A, big_table_2 B
where A.non_indexed_column = 42
and A.column = B.indexed_column;
Would you like to know that of the 1,000,000,000 rows in big_table_1, there are say 100 values of non_indexed_column, and only 2 rows with the value of 42?
so, we get a full scan of big_table_1 and a nested loops join to big_table_2
instead of a full scan of big_table_1, hash it, and a full scan of big_table_2 and hash join?
that is one reason to gather histograms on non-indexed columns.
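As a sketch of that reasoning (BIG_TABLE_1 and NON_INDEXED_COLUMN are hypothetical names from the example above, not a real schema), a histogram on just the skewed, non-indexed column could be gathered like this:

```sql
-- Illustrative only: table and column names are placeholders.
-- A histogram on the skewed, non-indexed column lets the optimizer
-- see that a value like 42 is rare and cost the join accordingly.
begin
   dbms_stats.gather_table_stats(
      ownname    => user,
      tabname    => 'BIG_TABLE_1',
      method_opt => 'for columns non_indexed_column size 254',
      cascade    => true );
end;
/
```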
where did you see to use something "for everything, all the time" -- I need to erase that if true.
Great Information!
Ken, March 02, 2005 - 12:30 pm UTC
Thanks Tom, you are the best!
By using "SIZE 1" the optimizer would put everything into 1 bucket and would therefore only know the low value and high value, correct? So you would have to use a more appropriate value for size to get the benefit you describe, right?
I don't know if the consultant told my client to "always" gather stats this way, but told them to do it that way and it is the only way they do it.
Thanks again. I should be able to come up with some good tests, keeping your comments in mind.
March 02, 2005 - 12:39 pm UTC
to let the optimizer understand the skew of the data, yes you would have to have more than one bucket.
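A small sketch of the difference (the table name T is a placeholder): with SIZE 1 each column gets a single bucket, so only the low and high values are recorded; a larger SIZE lets the optimizer see skew. The bucket counts are visible in USER_TAB_COLUMNS:

```sql
-- One bucket per column: only low/high values are recorded.
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 1' );
select column_name, num_buckets from user_tab_columns where table_name = 'T';

-- Up to 254 buckets per column: data skew becomes visible to the optimizer.
exec dbms_stats.gather_table_stats( user, 'T', method_opt => 'for all columns size 254' );
select column_name, num_buckets from user_tab_columns where table_name = 'T';
```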
Histograms
Kumar, April 30, 2005 - 4:24 pm UTC
Hello Tom,
We are using 9i R2. We were analyzing the tables every weekend using the following command,
Analyze Table <T> compute statistics;
Our DBA changed this analyze job to use DBMS_STATS two weeks back; this resulted in poor performance for all of our SQL.
The DBA had changed the above Analyze command to the following DBMS_STATS syntax,
exec dbms_stats.gather_table_stats(
<user>, 'T', estimate_percent => null, degree => null, cascade => true);
I think the difference is the histograms (the missing method_opt parameter); am I correct?
Can you please advise an equivalent value for method_opt parameter for the analyze command 'Analyze Table <T> compute statistics;'
Thanks.
April 30, 2005 - 4:40 pm UTC
method_opt defaults to 'for all columns size 1', so it is the same. Look somewhere else for the root cause of your problem for now, I think:
ops$tkyte@ORA9IR2> create table t
2 as
3 select * from all_objects;
Table created.
ops$tkyte@ORA9IR2> alter table t add constraint t_pk primary key(object_id);
Table altered.
ops$tkyte@ORA9IR2> create index t_idx on t(object_name);
Index created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> analyze table t compute statistics;
Table analyzed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 27812
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
CREATED 2
DATA_OBJECT_ID 2
GENERATED 2
LAST_DDL_TIME 2
OBJECT_ID 2
OBJECT_NAME 2
OBJECT_TYPE 2
OWNER 2
SECONDARY 2
STATUS 2
SUBOBJECT_NAME 2
TEMPORARY 2
TIMESTAMP 2
13 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name = 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 27812
T_PK 27812
ops$tkyte@ORA9IR2> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T', estimate_percent => null, degree => null, cascade => true );
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> select table_name, num_rows from user_tables where table_name = 'T';
TABLE_NAME NUM_ROWS
------------------------------ ----------
T 27812
ops$tkyte@ORA9IR2> select column_name, count(*) from user_tab_histograms where table_name = 'T' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
CREATED 2
DATA_OBJECT_ID 2
GENERATED 2
LAST_DDL_TIME 2
OBJECT_ID 2
OBJECT_NAME 2
OBJECT_TYPE 2
OWNER 2
SECONDARY 2
STATUS 2
SUBOBJECT_NAME 2
TEMPORARY 2
TIMESTAMP 2
13 rows selected.
ops$tkyte@ORA9IR2> select index_name, num_rows from user_indexes where table_name = 'T';
INDEX_NAME NUM_ROWS
------------------------------ ----------
T_IDX 27812
T_PK 27812
Histograms
Kumar, May 02, 2005 - 2:53 pm UTC
Thanks Tom. Your explanation was very useful. I see one other difference, in SAMPLE_SIZE in USER_TAB_COLUMNS.
ANALYZE sets it to the total number of rows in the table, but DBMS_STATS sets it to the actual number of not-null values in the column.
This should not be a problem, right?
Also, our OPTIMIZER_MODE is set to CHOOSE in the init.ora file.
Since the tables are analyzed this will use RBO, but can you please let me know which one it will use as a default: FIRST_ROWS, FIRST_ROWS_n or ALL_ROWS?
Would this make any difference? Our application is not a true OLTP; it has lots of reports and batch jobs as well.
Do you advise me to change this from CHOOSE?
Thanks for your help.
Here is the output that shows the SAMPLE_SIZE difference.
SQL*Plus: Release 9.2.0.1.0 - Production on Mon May 2 10:20:26 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.3.0 - Production
SQL> set pagesize 100
SQL> column COLUMN_NAME format a20
SQL> analyze table account compute statistics;
Table analyzed.
SQL> select table_name, num_rows,GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, LAST_ANALYZED
2 from user_tables where table_name='ACCOUNT';
TABLE_NAME NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT 4565299 NO NO 4565299 02-MAY-05
SQL>select column_name, count(*) from user_tab_histograms where
2 table_name = 'ACCOUNT' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
ACCT_NBR 2
ACCT_NM 2
ACCT_OPEN_DT 2
ACCT_STS_EFF_DT 2
ASC_CD 2
ASC_OID 2
CHGD_BY 2
CHGD_DT 2
EFFECTIVE_DT 2
LOB_ID 2
LOCAL_ORG_CODE 2
OID 2
PRD_OID 2
VERFD_BY 2
VERFD_DT 2
VERIFY_STS 2
16 rows selected.
SQL> select index_name, num_rows, GLOBAL_STATS, USER_STATS,SAMPLE_SIZE,
2 LAST_ANALYZED from user_indexes where table_name = 'ACCOUNT';
INDEX_NAME NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT_IDX1 4565299 NO NO 4565299 02-MAY-05
ACCOUNT_IDX2 2151656 NO NO 2151656 02-MAY-05
ACCOUNT_IDX3 4565299 NO NO 4565299 02-MAY-05
ACCOUNT_IDX4 2197868 NO NO 2197868 02-MAY-05
ACCOUNT_IDX5 1952886 NO NO 1952886 02-MAY-05
ACCOUNT_PK 4565299 NO NO 4565299 02-MAY-05
6 rows selected.
SQL> select column_name,NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
2 USER_STATS from user_tab_columns where table_name='ACCOUNT';
COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE USE
-------------------- ------------ ---------- ----------- --------- ----------- ---
OID 4565299 0 1 02-MAY-05 4565299 NO
PRD_OID 605402 2413643 1 02-MAY-05 4565299 NO
ASC_OID 489 0 1 02-MAY-05 4565299 NO
ACCT_NBR 4502947 0 1 02-MAY-05 4565299 NO
ACCT_NM 477065 2367431 1 02-MAY-05 4565299 NO
ACCT_OPEN_DT 27944 3510061 1 02-MAY-05 4565299 NO
ASC_CD 39 129563 1 02-MAY-05 4565299 NO
ACCT_STS_EFF_DT 211095 3539583 1 02-MAY-05 4565299 NO
LOB_ID 17 2610012 1 02-MAY-05 4565299 NO
LOCAL_ORG_CODE 5749 2612413 1 02-MAY-05 4565299 NO
EFFECTIVE_DT 80661 2114806 1 02-MAY-05 4565299 NO
EXPIRY_DT 0 4565299 1 02-MAY-05 NO
VERIFY_STS 1 2174754 1 02-MAY-05 4565299 NO
VERFD_BY 59812 2183161 1 02-MAY-05 4565299 NO
VERFD_DT 3639 2183455 1 02-MAY-05 4565299 NO
CHGD_BY 6972 0 1 02-MAY-05 4565299 NO
CHGD_DT 353852 0 1 02-MAY-05 4565299 NO
17 rows selected.
SQL> exec dbms_stats.gather_table_stats( user, 'ACCOUNT',estimate_percent => null, degree => null,
cascade => true );
PL/SQL procedure successfully completed.
SQL> select table_name, num_rows,GLOBAL_STATS, USER_STATS,SAMPLE_SIZE, LAST_ANALYZED
2 from user_tables where table_name='ACCOUNT';
TABLE_NAME NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT 4565299 YES NO 4565299 02-MAY-05
SQL> select column_name, count(*) from user_tab_histograms where
2 table_name = 'ACCOUNT' group by column_name;
COLUMN_NAME COUNT(*)
-------------------- ----------
ACCT_NBR 2
ACCT_NM 2
ACCT_OPEN_DT 2
ACCT_STS_EFF_DT 2
ASC_CD 2
ASC_OID 2
CHGD_BY 2
CHGD_DT 2
EFFECTIVE_DT 2
EXPIRY_DT 2
LOB_ID 2
LOCAL_ORG_CODE 2
OID 2
PRD_OID 2
VERFD_BY 2
VERFD_DT 2
VERIFY_STS 2
17 rows selected.
SQL> select index_name, num_rows, GLOBAL_STATS, USER_STATS,SAMPLE_SIZE,
2 LAST_ANALYZED from user_indexes where table_name = 'ACCOUNT';
INDEX_NAME NUM_ROWS GLO USE SAMPLE_SIZE LAST_ANAL
------------------------------ ---------- --- --- ----------- ---------
ACCOUNT_IDX1 4565299 YES NO 4565299 02-MAY-05
ACCOUNT_IDX2 2151656 YES NO 2151656 02-MAY-05
ACCOUNT_IDX3 4565299 YES NO 4565299 02-MAY-05
ACCOUNT_IDX4 2197868 YES NO 2197868 02-MAY-05
ACCOUNT_IDX5 1952886 YES NO 1952886 02-MAY-05
ACCOUNT_PK 4565299 YES NO 4565299 02-MAY-05
6 rows selected.
SQL> select column_name,NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, LAST_ANALYZED, SAMPLE_SIZE,
2 USER_STATS from user_tab_columns where table_name='ACCOUNT'
3 /
COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE USE
-------------------- ------------ ---------- ----------- --------- ----------- ---
OID 4565299 0 1 02-MAY-05 4565299 NO
PRD_OID 605402 2413643 1 02-MAY-05 2151656 NO
ASC_OID 489 0 1 02-MAY-05 4565299 NO
ACCT_NBR 4502947 0 1 02-MAY-05 4565299 NO
ACCT_NM 484325 2367431 1 02-MAY-05 2197868 NO
ACCT_OPEN_DT 27944 3510061 1 02-MAY-05 1055238 NO
ASC_CD 39 129563 1 02-MAY-05 4435736 NO
ACCT_STS_EFF_DT 211095 3539583 1 02-MAY-05 1025716 NO
LOB_ID 17 2610012 1 02-MAY-05 1955287 NO
LOCAL_ORG_CODE 5749 2612413 1 02-MAY-05 1952886 NO
EFFECTIVE_DT 80661 2114806 1 02-MAY-05 2450493 NO
EXPIRY_DT 0 4565299 1 02-MAY-05 NO
VERIFY_STS 1 2174754 1 02-MAY-05 2390545 NO
VERFD_BY 59812 2183161 1 02-MAY-05 2382138 NO
VERFD_DT 3639 2183455 1 02-MAY-05 2381844 NO
CHGD_BY 6972 0 1 02-MAY-05 4565299 NO
CHGD_DT 353852 0 1 02-MAY-05 4565299 NO
17 rows selected.
May 03, 2005 - 7:19 am UTC
the optimizer is written expecting dbms_stats was used to collect stats. dbms_stats should be the baseline.
I think you meant
Since the tables are analyzed this will use CBO,
not
Since the tables are analyzed this will use RBO,
and with choose, you are basically "all_rows", i would leave at choose.
Kumar, May 03, 2005 - 12:59 pm UTC
Thanks Tom. You are correct, I meant CBO in the above question. Sorry for the confusion.
>>>with choose, you are basically "all_rows", i would leave at choose.
Our application has online screens, batch jobs and reports; will leaving the parameter at "choose" (and therefore effectively "all_rows") create issues for the online screens?
Should I make this "first_rows_n" and ask the developers to change it to "all_rows" in the batch/report connections?
Also, I came across one of the Oracle white papers that says "first_rows" is obsolete and developers should use first_rows_n.
Can you please give your advise on this.
May 03, 2005 - 2:43 pm UTC
let things that need it use first_rows at the session level.
the less you change globally, the better.
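As a sketch of "at the session level" (the particular first_rows_n variant is up to the session that needs it), a report or screen wanting fast initial response could issue:

```sql
-- Set first-rows optimization only for this session, leaving the
-- instance-wide OPTIMIZER_MODE (CHOOSE) untouched.
alter session set optimizer_mode = first_rows_10;
```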
Estimate time for analyze
JJ, March 02, 2006 - 6:41 am UTC
Hi Tom,
Is there any way to estimate the total time required for analyzing a table without actually analyzing it?
March 02, 2006 - 12:27 pm UTC
you mean for dbms_stats to run of course ;)
Not really. You could do a really small estimate and a slightly larger one and attempt to extrapolate, given that it won't necessarily scale linearly.
Index creation and analyze table
Neeraj R Rath, March 10, 2006 - 12:15 am UTC
Hi TOM,
This is Neeraj R Rath from India.
I am working as a performance engineer in a large consultancy.
I was checking one of the applications and found that none of the production tables are analyzed.
I then contacted the concerned application team; they told me there is a clear instruction from the vendor not to analyze the tables, as it may degrade the performance of the application. There are many indexes on the production tables.
Please suggest: if I am creating indexes on a particular table to get better performance for a query, and am not analyzing the table, what are the chances that the query response will not be optimal?
Please also explain the importance of analyzing the production tables.
OS: Sun Solaris
DB: Oracle 8.1.7.4.0
March 10, 2006 - 12:11 pm UTC
please contact the application vendor - they are the ones who tell you what you are allowed to do with their application.
Excellent
Saraswathi, July 25, 2006 - 3:20 am UTC
Tom,your explanations are excellent.
Analyze Frequency?
Steve, August 17, 2007 - 10:45 am UTC
Tom,
We have an equity trading system that populates a "clean" database on a daily basis. Basically, the data in our core transactional tables is cleared out at the end of each day, and we start with empty tables at the beginning of the next trading day.
We currently analyze (using DBMS_STATS) at 3am,7am,10:30am,12:30pm,2pm,4pm. (EST)
These times represent a 1/2 hour to an hour after our typical spikes in volume.
There is generally a 20-30% increase in rowcount per hour, with certain hourly spikes around 200% or so.
At the end of the day, the number of rows in the all of the tables ranges from a few thousand to several hundred thousand rows.
Do you have any recommendations for the frequency of gathering statistics in this scenario?
Thanks.
August 22, 2007 - 9:20 am UTC
maybe you do it once at 2pm (average sized tables) and never again (retain plans as they are)
actually - what you should do is ask yourself (you can query v$ tables to answer this) - DO MY PLANS ACTUALLY CHANGE MATERIALLY as the day progresses.
If the plans do not really change much - perhaps you just want one plan. Gathering stats every period of time is the logical equivalent of "flush my shared pool"
so, before you do anything, analyze what plans change - if any - and whether those changes are relevant (eg: if the last plan is good enough for all day, then just gather stats one more time - near the end of the day when the tables are almost full and run with those)
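One way to check (a sketch only; the comment-marker predicate is a hypothetical tagging convention - adapt it to identify your own key statements) is to watch whether PLAN_HASH_VALUE changes for the same statement across the day:

```sql
-- If the same statement keeps the same plan_hash_value all day,
-- the repeated stats gathering is probably buying you nothing.
select sql_id, plan_hash_value, child_number, executions
  from v$sql
 where sql_text like 'SELECT /* my_key_query */%'
 order by sql_id, child_number;
```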
RE: "Analyze Frequency?" above
Steve, August 20, 2007 - 1:43 pm UTC
Oracle Version 10.1.0.5
Whose job is this?
karthick, September 06, 2007 - 8:50 am UTC
Who is responsible for analyzing the tables, the DBA or a developer?
For example, I have a process which uses a table. At the start the table is empty. The process populates it with lots of data (really lots of data!!). It then calls some more processes which work with that table. And once everything is done, the table is truncated.
So is it a good thing to analyze that table on the fly when the data is populated, so that the other processes using the table can use it in an optimal way?
As a developer, can I put code in my process to analyze the table on the fly?
Regards,
Karthick.
September 11, 2007 - 8:08 am UTC
typically a DBA would be responsible for actually scheduling and monitoring the dbms_stats
however, a developer may have very relevant information needed by the DBA to properly do this.
so, both.
as a developer, you may call dbms_stats in your code, you might not even have to gather stats on the table, you might just SET THEM
actually, you might gather stats ONCE to get pretty good representative stats and just leave them there - never having the database think 'table is entirely empty' or 'table is entirely full'
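A sketch of "just SET THEM" (the table name comes from the question below; the numbers are purely illustrative): seed the table once with representative statistics and leave them in place.

```sql
-- Tell the optimizer the table "typically" holds about 500,000 rows
-- in 10,000 blocks, regardless of how full it happens to be right now.
begin
   dbms_stats.set_table_stats(
      ownname => user,
      tabname => 'MY_STG_TBL',
      numrows => 500000,
      numblks => 10000 );
end;
/
```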
ok or very bad
karthick, September 11, 2007 - 9:56 am UTC
As I said, the table is populated only within the process, and hence the following is coded in the process:
EXECUTE IMMEDIATE 'ANALYZE TABLE MY_STG_TBL ESTIMATE STATISTICS';
Is this OK, or very bad, considering the overall performance?
September 15, 2007 - 3:26 pm UTC
as I said - just put representative stats in place and keep them.
and as I said, use dbms_stats, do NOT use your execute immediate.
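A sketch of the dbms_stats call that could replace the EXECUTE IMMEDIATE above (the sampling choice is an assumption; pick what fits the data):

```sql
-- Same intent as ANALYZE ... ESTIMATE STATISTICS,
-- but via the supported dbms_stats API.
begin
   dbms_stats.gather_table_stats(
      ownname          => user,
      tabname          => 'MY_STG_TBL',
      estimate_percent => dbms_stats.auto_sample_size,
      cascade          => true );
end;
/
```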