Ok, starting with a) Run the queries from the inside out:
ops$tkyte@ORA9IR2> create table t as select * from all_objects;
Table created.
ops$tkyte@ORA9IR2> define TNAME=&1
ops$tkyte@ORA9IR2> define CNAME=&2
ops$tkyte@ORA9IR2> define BUCKETS=&3
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from (
2 select &cname, dense_rank() over (order by &cname) r
3 from &tname
4 ) where rownum <= 10
5 /
OWNER R
------------------------------ ----------
B 1
BIG_TABLE 2
BIG_TABLE 2
BIG_TABLE 2
CTXSYS 3
CTXSYS 3
CTXSYS 3
CTXSYS 3
CTXSYS 3
CTXSYS 3
10 rows selected.
<b>assign a ranking (dense, 1, 2, 3... ) to the strings. width_buckets for creating histograms -- they need a number. mapping of strings to numbers in sorted order</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select * from (
2 select &cname,
3 width_bucket( r, 0, (select count(distinct &cname) from &tname)+1, &buckets) wb
4 from (
5 select &cname, dense_rank() over (order by &cname) r
6 from &tname
7 )
8 ) where rownum <= 10
9 /
OWNER WB
------------------------------ ---
B 1
BIG_TABLE 1
BIG_TABLE 1
BIG_TABLE 1
CTXSYS 2
CTXSYS 2
CTXSYS 2
CTXSYS 2
CTXSYS 2
CTXSYS 2
10 rows selected.
<b>Now, put them into "buckets". we've told Oracle "here is a range of numbers - r, now, please put r into a histogram of &BUCKETS buckets, low range of the histogram is 0 and the high non-inclusive bound is count(distinct &cname)+1 -- since the upper bound is non-inclusive, we added 1"</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select wb,
2 count(*) cnt,
3 min(&cname) lo,
4 max(&cname) hi
5 from (
6 select &cname, width_bucket( r, 0, (select count(distinct &cname) from &tname)+1, &buckets) wb
7 from (
8 select &cname, dense_rank() over (order by &cname) r
9 from &tname
10 )
11 )
12 group by wb
13 /
WB CNT LO HI
--- ------ ------------------------------ ------------------------------
1 4 B BIG_TABLE
2 263 CTXSYS DW_XPS
3 333 MDSYS OPS$TKYTE
4 998 ORDPLUGINS ORDSYS
5 93 OUTLN PERFSTAT
6 11743 PUBLIC SCOTT
7 14037 SYS SYSTEM
8 2 U1 U2
9 131 USER1 WMSYS
10 274 WSMGMT XDB
10 rows selected.
<b>once we have them in buckets -- grouping them and counting how many fall into the bucket it easy. we can even get the HI and LO values in the buck as demostrated, the remaing level in the query is just for formatting</b>
<b>so, onto number 2 here:</b>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> set echo off
@wb t object_id 10
old 9: min(&cname) lo,
new 9: min(object_id) lo,
old 10: max(&cname) hi
new 10: max(object_id) hi
old 12: select &cname, width_bucket( r, 0, (select count(distinct &cname) from &tname)+1, &buckets) wb
new 12: select object_id, width_bucket( r, 0, (select count(distinct object_id) from t)+1, 10) wb
old 14: select &cname, dense_rank() over (order by &cname) r
new 14: select object_id, dense_rank() over (order by object_id) r
old 15: from &tname
new 15: from t
WB DATA CNT RAT HIST
--- ------------------ ------ ------- ----------------------------------------
1 3/2849 2787 99.96 ***************************************
2 2850/5722 2788 100.00 ****************************************
3 5723/8540 2788 100.00 ****************************************
4 8541/11328 2788 100.00 ****************************************
5 11329/14116 2788 100.00 ****************************************
6 14117/16904 2788 100.00 ****************************************
7 16905/19692 2788 100.00 ****************************************
8 19693/22480 2788 100.00 ****************************************
9 22481/25285 2788 100.00 ****************************************
10 25286/38285 2787 99.96 ***************************************
10 rows selected.
@wb t owner 10
old 9: min(&cname) lo,
new 9: min(owner) lo,
old 10: max(&cname) hi
new 10: max(owner) hi
old 12: select &cname, width_bucket( r, 0, (select count(distinct &cname) from &tname)+1, &buckets) wb
new 12: select owner, width_bucket( r, 0, (select count(distinct owner) from t)+1, 10) wb
old 14: select &cname, dense_rank() over (order by &cname) r
new 14: select owner, dense_rank() over (order by owner) r
old 15: from &tname
new 15: from t
WB DATA CNT RAT HIST
--- ------------------ ------ ------- ----------------------------------------
1 B/BIG_TABLE 4 .03
2 CTXSYS/DW_XPS 263 1.87
3 MDSYS/OPS$TKYTE 333 2.37
4 ORDPLUGINS/ORDSYS 998 7.11 **
5 OUTLN/PERFSTAT 93 .66
6 PUBLIC/SCOTT 11743 83.66 *********************************
7 SYS/SYSTEM 14037 100.00 ****************************************
8 U1/U2 2 .01
9 USER1/WMSYS 131 .93
10 WSMGMT/XDB 274 1.95
10 rows selected.
<b>you are not using this to determine the optimal nubmer, but rather to visualize of the data is skewed...
object id, appears to not be (use a &3 that fills your screen, lets you see any patterns).
owner appears to definitely be. and in fact, if you used 254 for the buckets in my case, it would completely fill out since I only have 20 distinct schemas with objects. You would see that owner is a candidate for histograms by the pattern.
The number of buckets -- 254 seems good. If there are less distinct values in the table, you get less. If there are more, you get as many as you can. It would be rare to have to play with that number</b>
c) done.