Skip to Main Content
  • Questions
  • Generating histogram output against a column

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Peter.

Asked: April 11, 2004 - 12:52 pm UTC

Last updated: July 31, 2017 - 3:01 am UTC

Version: 9.2.0.5

Viewed 1000+ times

You Asked

Hi Tom,

I know I read this somewhere, but I forgot where. I was hoping you can help me or point me to the correct link.

Anyway, I have an attribute column storing data ranging from 0..100.

I would like an output to tell my how many records are within each range using a histogram output. E.g.

0 - 10 *****
11 - 20 **
21 - 30 ****
31 - 40 *********
41 - 50 *
51 - 60 **************
61 - 70 ***
71 - 80 ******
81 - 90 ********
91 - 100 *

What would be the most efficient way to do this? I could do this in JAVA pretty fast, if I had the data in memory. In Java I would only need to make one pass through the dataset, but I think there's got to be a way to do this just as efficiently on the database.

I tried using an inline subquery, for each range above, but I don't think that's the best approach. It seems that each range would make a pass through the data which means if I have 10 range buckets, then I'm making 10 passes through the table.

Thanks,
-Peter

and Tom said...

ops$tkyte@ORA9IR2> create table t
2 as
3 select rownum-1 data
4 from all_objects
5 where rownum <= 101
6 /

Table created.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select data, width_bucket( data, 0, 101, 10 )
2 from t
3 /

DATA WIDTH_BUCKET(DATA,0,101,10)
---------- ---------------------------
0 1
1 1
2 1
3 1
.....
96 10
97 10
98 10
99 10
100 10

101 rows selected.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select width_bucket(data,0,101,10), count(*)
2 from t
3 group by width_bucket(data,0,101,10)
4 /

WIDTH_BUCKET(DATA,0,101,10) COUNT(*)
--------------------------- ----------
1 11
2 10
3 10
4 10
5 10
6 10
7 10
8 10
9 10
10 10

10 rows selected.


(heck for this one simple DIVISION would do it for you!

ops$tkyte@ORA9IR2> select trunc((data-0.000001)/10)+1, count(*)
2 from t
3 group by trunc((data-0.000001)/10)+1
4 /

TRUNC((DATA-0.000001)/10)+1 COUNT(*)
--------------------------- ----------
1 11
2 10
3 10
4 10
5 10
6 10
7 10
8 10
9 10
10 10

10 rows selected.

..... I'll take your java code on speedwise -- both in time to code/debug and runtime performance anyday for that one :)


Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Excellent

Peter Tran, April 11, 2004 - 5:45 pm UTC

Touché!

Thank you!

-Peter

A reader, April 12, 2004 - 6:38 am UTC


width_bucket understanding wrt performance

a, June 04, 2004 - 7:56 am UTC

TOM

I AM GETTING scare as still need lot of understanding :(

Please help me to understand concept of width_bucket

Tom Kyte
June 04, 2004 - 10:43 am UTC

it is just a histogram function?

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:707586567563#15793066195196 <code>

is another example -- it is documented. Just breaks a range of values you pass in, into N "buckets" and assigns a bucket number to each row.

Finishing touches

Jack, June 07, 2004 - 12:19 pm UTC

Here's a way to translate your counts into bars:

  1  select rownum, lpad('*',rownum,'*') hist
  2  from all_objects
  3* where rownum <= 10
SQL> /

    ROWNUM HIST
---------- --------------------
         1 *
         2 **
         3 ***
         4 ****
         5 *****
         6 ******
         7 *******
         8 ********
         9 *********
        10 **********

10 rows selected. 

Empty buckets, how to display them?

José, July 28, 2017 - 8:04 am UTC

Hi Tom,
thanks for your explanation. I like this feature!

I was just wondering if there is a (simple) way to make empty "buckets" visible. That would be of great help to display a histogram without distorting the x axis.

One example: I have an application where case numbers are created. However, as data is imported from different locations, not every number range is used: we have some "holes".

CREATE TABLE SAMPLE_TABLE (CASENUMBER number);

The table is populated from various sources.
SELECT     WIDTH_BUCKET (CASENUMBER, 1, 5000000000, 500) * 10000000 - 10000000 "Bereich (von)",
           WIDTH_BUCKET (CASENUMBER, 1, 5000000000, 500) * 10000000 - 1        "Bereich (bis)",
           COUNT(*)   Anzahl
    FROM   SAMPLE_TABLE
GROUP BY   WIDTH_BUCKET (CASENUMBER, 1, 5000000000, 500)
ORDER BY 1 ASC;


My example shows:
Bereich (von)   Bereich (bis)   Anzahl
3050000000      3059999999      16903
3060000000      3069999999      38977
3070000000      3079999999      39091
3090000000      3099999999      1336

As we can see, there is a "hole" for the range between 3080000000 and 3089999999. I would like to display that as follows:
Bereich (von)   Bereich (bis)   Anzahl
3050000000      3059999999      16903
3060000000      3069999999      38977
3070000000      3079999999      39091
3080000000      3089999999      0
3090000000      3099999999      1336


Is there any way to do that?

Thanks!


Connor McDonald
July 31, 2017 - 3:01 am UTC

You just need to enumerate the possible buckets, and then outer join

SQL> with all_buckets as
  2    ( select rownum-1 bk from dual
  3      connect by level <= 12 ),
  4  original_data as
  5    (
  6    select width_bucket(object_id,1,1000,10) w, count(*) hist
  7    from t
  8    group by width_bucket(object_id,1,1000,10)
  9    )
 10  select bk, nvl(hist,0) hist
 11  from original_data, all_buckets
 12  where bk = w(+)
 13  order by 1;

        BK       HIST
---------- ----------
         0          0
         1         99
         2        100
         3         99
         4         99
         5          0
         6          0
         7        100
         8        100
         9        100
        10         99
        11          0

12 rows selected.