The moment you have more than 255 distinct values (in 11g), then by definition, you are not guaranteed to have accurate estimate for a given distinct value.
You will possibly have a *better* estimate than if you didnt have histograms at all, but you obviously cant record accurate measurements on 9000 things with 255 buckets.
For example, here's a table with values 1 .. 5000, such that each value occurs that number of times. So there is 1 row with value 1, 2 rows with value 2, ..., 5000 rows with value 5000.
So there is skew in there - I grab a histogram, and look at what results
SQL> exec dbms_stats.gather_table_stats('','T',method_opt=>'for all columns size 254');
PL/SQL procedure successfully completed.
SQL> select endpoint_number, endpoint_value
2 from user_histograms
3 where table_name = 'T'
4 and column_name = 'R'
5 order by 1;
ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
0 86
1 354
2 461
3 556
4 621
5 701
6 766
7 824
8 885
9 931
10 973
11 1013
12 1050
13 1084
14 1148
15 1197
16 1230
17 1288
18 1332
19 1368
20 1398
21 1432
22 1459
23 1495
24 1528
25 1550
26 1580
27 1611
28 1647
29 1667
30 1694
31 1724
32 1748
33 1777
34 1809
35 1835
36 1863
37 1888
38 1905
39 1931
40 1968
41 1992
42 2012
43 2038
44 2068
45 2093
46 2114
47 2131
48 2161
49 2180
50 2198
51 2223
52 2237
53 2259
54 2274
55 2304
56 2323
57 2344
58 2363
59 2388
60 2408
61 2425
62 2447
63 2464
64 2480
65 2510
66 2524
67 2549
68 2569
69 2593
70 2615
71 2635
72 2653
73 2676
74 2697
75 2713
76 2732
77 2756
78 2772
79 2787
80 2808
81 2835
82 2852
83 2871
84 2892
85 2912
86 2926
87 2945
88 2959
89 2977
90 2992
91 3013
92 3038
93 3057
94 3071
95 3096
96 3111
97 3127
98 3143
99 3159
100 3175
101 3191
102 3207
103 3219
104 3236
105 3251
106 3263
107 3279
108 3291
109 3303
110 3314
111 3329
112 3341
113 3362
114 3375
115 3395
116 3412
117 3429
118 3450
119 3464
120 3479
121 3494
122 3513
123 3525
124 3538
125 3553
126 3565
127 3577
128 3588
129 3603
130 3617
131 3631
132 3643
133 3655
134 3665
135 3674
136 3685
137 3698
138 3712
139 3725
140 3736
141 3749
142 3769
143 3779
144 3793
145 3805
146 3814
147 3827
148 3836
149 3853
150 3863
151 3883
152 3894
153 3904
154 3918
155 3930
156 3940
157 3955
158 3965
159 3978
160 3990
161 4008
162 4023
163 4038
164 4051
165 4063
166 4075
167 4088
168 4098
169 4112
170 4123
171 4133
172 4142
173 4155
174 4172
175 4188
176 4197
177 4211
178 4222
179 4237
180 4248
181 4258
182 4269
183 4280
184 4287
185 4297
186 4309
187 4324
188 4331
189 4347
190 4360
191 4372
192 4382
193 4391
194 4403
195 4410
196 4423
197 4429
198 4436
199 4448
200 4461
201 4471
202 4483
203 4492
204 4504
205 4513
206 4524
207 4533
208 4544
209 4550
210 4561
211 4570
212 4580
213 4591
214 4601
215 4611
216 4620
217 4631
218 4641
219 4651
220 4662
221 4672
222 4683
223 4695
224 4703
225 4712
226 4721
227 4732
228 4745
229 4756
230 4768
231 4779
232 4787
233 4797
234 4802
235 4811
236 4819
237 4829
238 4843
239 4853
240 4861
241 4872
242 4885
243 4893
244 4906
245 4919
246 4927
247 4938
248 4948
249 4958
250 4967
251 4975
252 4980
253 4991
254 5000
255 rows selected.
Notice that there is no repeated value in the histogram. So when I do
select * from t where r = ...
the best we can do is know that it falls into one of the buckets, which isnt much different to having no buckets.
That doesn't mean the histogram is useless... for example, for a range scan, it improves our estimates a lot:
SQL> select count(*) from t where r > 4000;
Explained.
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 5280 (2)| 00:01:04 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | TABLE ACCESS FULL| T | 4599K| 17M| 5280 (2)| 00:01:04 |
---------------------------------------------------------------------------
SQL> select count(*) from t where r > 4000;
COUNT(*)
----------
4500500