Tom, consider this:
SQL> column table_name format a11
SQL> column column_name format a11
SQL> column column_name format a11
SQL> column endpoint_number format 999,999
SQL> column endpoint_value format 9.999999EEEE
SQL> column endpoint_actual_value format a22
SQL> column owner format A8
SQL>
SQL> create table test as select owner from dba_objects where owner < 'C'
2 /
Table created.
SQL> select owner, count(*) from test group by owner
2 /
OWNER COUNT(*)
-------- ----------
ABM 464
AHL 518
AHM 167
AK 188
ALR 128
AMF 106
AMS 1557
AMV 189
AMW 244
AP 795
APPLSYS 2766
APPLSYSPUB 17
APPS 140786
AR 2211
ASF 10
ASG 154
ASL 53
ASN 4
ASO 285
ASP 4
AST 38
AX 141
AZ 74
BEN 2783
BIC 70
BIL 226
BIM 838
BIS 1497
BIV 39
BIX 193
BNE 150
BOM 787
BSC 300
33 rows selected.
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname=> 'TEST', method_opt=>'for columns owner size 33');
PL/SQL procedure successfully completed.
SQL> select * from user_histograms where table_name = 'TEST' and column_name = 'OWNER'
2 /
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
----------- ----------- --------------- -------------- ----------------------
TEST OWNER 0 3.388440E+35
TEST OWNER 1 3.391283E+35
TEST OWNER 30 3.391283E+35
TEST OWNER 31 3.391625E+35
TEST OWNER 32 3.440973E+35
TEST OWNER 33 3.443803E+35
6 rows selected.
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname=> 'TEST', method_opt=>'for columns owner size 40');
PL/SQL procedure successfully completed.
SQL> select * from user_histograms where table_name = 'TEST' and column_name = 'OWNER'
2 /
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
----------- ----------- --------------- -------------- ----------------------
TEST OWNER 0 3.388440E+35
TEST OWNER 1 3.391219E+35
TEST OWNER 37 3.391283E+35
TEST OWNER 38 3.391625E+35
TEST OWNER 39 3.441782E+35
TEST OWNER 40 3.443803E+35
6 rows selected.
SQL> exec dbms_stats.gather_table_stats(ownname => user, tabname=> 'TEST', method_opt=>'for columns owner size 50');
PL/SQL procedure successfully completed.
SQL> select * from user_histograms where table_name = 'TEST' and column_name = 'OWNER'
2 /
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
----------- ----------- --------------- -------------- ----------------------
TEST OWNER 464 3.388440E+35 ABM
TEST OWNER 982 3.389657E+35 AHL
TEST OWNER 1,149 3.389657E+35 AHM
TEST OWNER 1,337 3.390205E+35 AK
TEST OWNER 1,465 3.390473E+35 ALR
TEST OWNER 1,571 3.390666E+35 AMF
TEST OWNER 3,128 3.390676E+35 AMS
TEST OWNER 3,317 3.390679E+35 AMV
TEST OWNER 3,561 3.390679E+35 AMW
TEST OWNER 4,356 3.391219E+35 AP
TEST OWNER 7,122 3.391283E+35 APPLSYS
TEST OWNER 7,139 3.391283E+35 APPLSYSPUB
TEST OWNER 147,925 3.391283E+35 APPS
TEST OWNER 150,136 3.391625E+35 AR
TEST OWNER 150,146 3.391883E+35 ASF
TEST OWNER 150,300 3.391884E+35 ASG
TEST OWNER 150,353 3.391888E+35 ASL
TEST OWNER 150,357 3.391889E+35 ASN
TEST OWNER 150,642 3.391890E+35 ASO
TEST OWNER 150,646 3.391891E+35 ASP
TEST OWNER 150,684 3.391894E+35 AST
TEST OWNER 150,825 3.392841E+35 AX
TEST OWNER 150,899 3.393247E+35 AZ
TEST OWNER 153,682 3.440973E+35 BEN
TEST OWNER 153,752 3.441775E+35 BIC
TEST OWNER 153,978 3.441782E+35 BIL
TEST OWNER 154,816 3.441783E+35 BIM
TEST OWNER 156,313 3.441788E+35 BIS
TEST OWNER 156,352 3.441790E+35 BIV
TEST OWNER 156,545 3.441792E+35 BIX
TEST OWNER 156,695 3.442791E+35 BNE
TEST OWNER 157,482 3.443000E+35 BOM
TEST OWNER 157,782 3.443803E+35 BSC
33 rows selected.
SQL> drop table test
2 /
Table dropped.
SQL> select * from v$version
2 /
BANNER
--------------------------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
PL/SQL Release 9.2.0.7.0 - Production
CORE 9.2.0.7.0 Production
TNS for IBM/AIX RISC System/6000: Version 9.2.0.7.0 - Production
NLSRTL Version 9.2.0.7.0 - Production
5 rows selected.
There are 33 distinct values in OWNER column. After gathering statistics for this column with number of buckets equal and even more than number of distinct values (N between 33 and 44), I see in ENDPOINT_NUMBER column ordinal number of bucket and nothing in ENDPOINT_ACTUAL_VALUE. But when N in 'for columns owner size N' is sufficient greater than the number of distinct values (it seems the rule is "N >= 1.33333*number_of_distinct_values+1") then in ENDPOINT_NUMBER column appears the cumulative numbers of rows (according metalink Note:72539.1) and ENDPOINT_ACTUAL_VALUE is not empty. Here are my questions:
1. Why is it so? In above example I am expecting to see same results after
exec dbms_stats.gather_table_stats(ownname => user, tabname=> 'TEST', method_opt=>'for columns owner size 33');
and after
exec dbms_stats.gather_table_stats(ownname => user, tabname=> 'TEST', method_opt=>'for columns owner size 50');
2. Why there are only 6 rows (buckets) in USER_HISTOGRAMS when I calling dbms_stats with 'for columns owner size 33' (for example)? I am expecting to see 33 rows (buckets) after this call...
3. Is it better to have in ENDPOINT_NUMBER column cumulative numbers of rows than just bucket numbers? Can optimizer make its decisions more accurately in this case?
This happens when you have some values that utterly dominate the other values - as you do - that one really high value can be used to infer the other buckets.
consider the differences between t1 and t2 below:
ops$tkyte%ORA9IR2> create table t1
2 as
3 select 'user' || mod(rownum,33) owner, object_name
4 from all_objects;
Table created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> create table t2
2 as
3 select owner, object_name
4 from all_objects;
Table created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select 't1', owner, count(*) from t1 group by owner union all
2 select 't2', owner, count(*) from t2 group by owner
3 order by 1, 2;
'T OWNER COUNT(*)
-- -------------------------------------------- ----------
t1 user0 929
t1 user1 930
t1 user10 930
t1 user11 930
t1 user12 930
t1 user13 930
t1 user14 930
t1 user15 930
t1 user16 930
t1 user17 930
t1 user18 929
t1 user19 929
t1 user2 930
t1 user20 929
t1 user21 929
t1 user22 929
t1 user23 929
t1 user24 929
t1 user25 929
t1 user26 929
t1 user27 929
t1 user28 929
t1 user29 929
t1 user3 930
t1 user30 929
t1 user31 929
t1 user32 929
t1 user4 930
t1 user5 930
t1 user6 930
t1 user7 930
t1 user8 930
t1 user9 930
t2 A 1
t2 BIG_TABLE 2
t2 CTXSYS 261
t2 DEMO 1
t2 HR 34
t2 MDSYS 234
t2 ODM 439
t2 ODM_MTR 12
t2 OE 86
t2 OLAPSYS 662
t2 OPS$TKYTE 16
t2 ORDPLUGINS 29
t2 ORDSYS 969
t2 OUTLN 7
t2 PERFSTAT 86
t2 PM 9
t2 PUBLIC 12285
t2 QS 41
t2 QS_ADM 7
t2 QS_CBADM 24
t2 QS_CS 23
t2 QS_ES 39
t2 QS_OS 39
t2 QS_WS 39
t2 SCOTT 6
t2 SH 173
t2 SYS 14073
t2 SYSTEM 382
t2 USER1 9
t2 USER2 9
t2 WKSYS 279
t2 WMSYS 129
t2 XDB 270
66 rows selected.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T1', method_opt => 'for columns owner size 33' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA9IR2> select * from user_tab_col_statistics where table_name = 'T1';
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE AVG_COL_LEN
--------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- --------- ----------- --- --- -----------
T1 OWNER 33 7573657230 7573657239 .030302545 0 32 06-AUG-08 30674 YES NO 7
ops$tkyte%ORA9IR2> select * from user_histograms where table_name = 'T1' and column_name = 'OWNER';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL
--------------- --------------- --------------- -------------- ---------------
T1 OWNER 0 6.0984E+35
T1 OWNER 1 6.0984E+35
T1 OWNER 2 6.0984E+35
T1 OWNER 3 6.0984E+35
T1 OWNER 4 6.0984E+35
T1 OWNER 5 6.0984E+35
T1 OWNER 6 6.0984E+35
T1 OWNER 7 6.0984E+35
T1 OWNER 8 6.0984E+35
T1 OWNER 9 6.0984E+35
T1 OWNER 10 6.0984E+35
T1 OWNER 11 6.0984E+35
T1 OWNER 12 6.0984E+35
T1 OWNER 13 6.0984E+35
T1 OWNER 14 6.0984E+35
T1 OWNER 15 6.0984E+35
T1 OWNER 16 6.0984E+35
T1 OWNER 17 6.0984E+35
T1 OWNER 18 6.0984E+35
T1 OWNER 19 6.0984E+35
T1 OWNER 20 6.0984E+35
T1 OWNER 21 6.0984E+35
T1 OWNER 22 6.0984E+35
T1 OWNER 23 6.0984E+35
T1 OWNER 24 6.0984E+35
T1 OWNER 25 6.0984E+35
T1 OWNER 26 6.0984E+35
T1 OWNER 27 6.0984E+35
T1 OWNER 28 6.0984E+35
T1 OWNER 29 6.0984E+35
T1 OWNER 30 6.0984E+35
T1 OWNER 31 6.0984E+35
T1 OWNER 33 6.0984E+35
33 rows selected.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> exec dbms_stats.gather_table_stats( user, 'T2', method_opt => 'for columns owner size 33' );
PL/SQL procedure successfully completed.
ops$tkyte%ORA9IR2> select * from user_tab_col_statistics where table_name = 'T2';
TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS LAST_ANAL SAMPLE_SIZE GLO USE AVG_COL_LEN
--------------- --------------- ------------ ---------- ---------- ---------- ---------- ----------- --------- ----------- --- --- -----------
T2 OWNER 33 41 584442 .015563337 0 7 06-AUG-08 30675 YES NO 6
ops$tkyte%ORA9IR2> select * from user_histograms where table_name = 'T2' and column_name = 'OWNER';
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL
--------------- --------------- --------------- -------------- ---------------
T2 OWNER 0 3.3750E+35
T2 OWNER 1 4.1158E+35
T2 OWNER 2 4.1186E+35
T2 OWNER 3 4.1679E+35
T2 OWNER 16 4.1711E+35
T2 OWNER 31 4.3277E+35
T2 OWNER 32 4.3277E+35
T2 OWNER 33 4.5831E+35
8 rows selected.
sys and public so dominated the others in the second case that they allowed us to collapse the histogram.
If you use dbms_xplan to view the results, you can see that we just used a low value for the "low values" and the dominating values appear:
ops$tkyte%ORA9IR2> explain plan for
2 select null from t2 where owner = 'A' union all
3 select null from t2 where owner = 'PUBLIC' union all
4 select null from t2 where owner = 'SYS' union all
5 select null from t2 where owner = 'SYSTEM' union all
6 select null from t2 where owner = 'XDB';
Explained.
ops$tkyte%ORA9IR2> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 27458 | 160K| 85 |
| 1 | UNION-ALL | | | | |
|* 2 | TABLE ACCESS FULL | T2 | 477 | 2862 | 17 |
|* 3 | TABLE ACCESS FULL | T2 | 12084 | 72504 | 17 |
|* 4 | TABLE ACCESS FULL | T2 | 13943 | 83658 | 17 |
|* 5 | TABLE ACCESS FULL | T2 | 477 | 2862 | 17 |
|* 6 | TABLE ACCESS FULL | T2 | 477 | 2862 | 17 |
--------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T2"."OWNER"='A')
3 - filter("T2"."OWNER"='PUBLIC')
4 - filter("T2"."OWNER"='SYS')
5 - filter("T2"."OWNER"='SYSTEM')
6 - filter("T2"."OWNER"='XDB')
Note: cpu costing is off
it is better than good enough - if you up the number of buckets really high - then it will go to the trouble to save them all for you - but not at much gain in that case.
Basically, your table has two types of values
a) low cardinality
b) really high cardinality