When analyzing a table to setup histograms, how does Oracle determine the number of buckets? I notice different numbers in user_histograms for a column than the number of buckets I specify in the command. For example, I do some analysis and determine that 68 buckets are needed for a reference table with about 23,000 rows. Then I execute this command:
analyze table geo_xref_tb
compute statistics
for columns map_ref size 68
and then looking at the data dictionary, I see this
select table_name,column_name,count(*)
from user_histograms
where table_name = 'GEO_XREF_TB'
group by table_name,column_name
having count(*) > 2
TABLE_NAME COLUMN_NAME COUNT(*)
------------------------------ --------------- ----------
GEO_XREF_TB MAP_REF 64
It seems Oracle has decided that it doesn't need all the 68 that I specified. I have not come across the situation where Oracle has increased the number of buckets beyond what I specify. Most often Oracle and I either have the same number or are +/- 1. Am I counting the number of buckets wrong or chalk it up to "the database knows better than I do"?
Since it knows what its doing and its numbers are all that count, is it better to set the size to the maximum 254 all the time and let Oracle figure out what it needs?
Thank-you for your help,
Bruce
It is a normal, expected outcome -- it did what you asked. What you are seeing is the different outcome you get when you specify a number of buckets that is LESS then the number of distinct values.
Consider:
ops$tkyte@ORA817DEV.US.ORACLE.COM> create table t ( x int );
Table created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select rownum from all_users where rownum <= 10;
10 rows created.
ops$tkyte@ORA817DEV.US.ORACLE.COM> insert into t select 5 from all_objects;
22819 rows created.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics for columns x size 10;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select column_name, endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name = 'T';
COL ENDPOINT_NUMBER ENDPOINT_VALUE
--- --------------- --------------
X 1 1
X 2 2
X 3 3
X 4 4
X 22824 5
X 22825 6
X 22826 7
X 22827 8
X 22828 9
X 22829 10
10 rows selected.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t delete statistics;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM> analyze table t compute statistics for columns x size 8;
Table analyzed.
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM>
ops$tkyte@ORA817DEV.US.ORACLE.COM> select column_name, endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name = 'T';
COL ENDPOINT_NUMBER ENDPOINT_VALUE
--- --------------- --------------
X 7 5
X 8 10
X 0 1
ops$tkyte@ORA817DEV.US.ORACLE.COM>
So, did it do the right thing? Yes, it did. I asked for 8 and appear to have gotten 3 but I really got 8 (like I asked for). It is just that redundant ones (that could be inferred) were left out.
There is a support note, available on metalink as well, explains it in detail:
</code>
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:3126073805757 <code>
read through that...