Question and Answer

Tom Kyte

Thanks for the question, Gareth.

Asked: September 15, 2005 - 10:42 am UTC

Last updated: March 29, 2013 - 5:59 pm UTC

Version: 8.1.7

You Asked

Tom,

I would like to partition a range of values into balanced sets. Initially I figured that one of the analytic functions might be useful for this and decided to look into them and learn more. The question in mind was: for an ordered list of values, how can we 'chop' them into ranges and then list the first and last value for each range? i.e. something like:

Range  Start  End
    0      1  1000
    1   1001  2000
    2   2001  3000
    .      .     .
    .      .     .

Having read around and gained an initial (poor) understanding of analytic functions, the first attempt was something like:

SELECT my_bucket,
       MIN(rn) OVER (PARTITION BY my_bucket ORDER BY rn) min_row,
       MAX(rn) OVER (PARTITION BY my_bucket ORDER BY rn) max_row
  FROM (SELECT rn,
               CASE
                 WHEN rn BETWEEN    0 AND  1000 THEN  1
                 WHEN rn BETWEEN 1001 AND  2000 THEN  2
                 WHEN rn BETWEEN 2001 AND  3000 THEN  3
                 WHEN rn BETWEEN 3001 AND  4000 THEN  4
                 WHEN rn BETWEEN 4001 AND  5000 THEN  5
                 WHEN rn BETWEEN 5001 AND  6000 THEN  6
                 WHEN rn BETWEEN 6001 AND  7000 THEN  7
                 WHEN rn BETWEEN 7001 AND  8000 THEN  8
                 WHEN rn BETWEEN 8001 AND  9000 THEN  9
                 WHEN rn BETWEEN 9001 AND 10000 THEN 10
               END my_bucket,
               object_name
          FROM (SELECT ROWNUM rn, object_name
                  FROM all_objects
                 WHERE ROWNUM < 10001));

The output this produced was not as expected (hoped for, perhaps, more accurately): for each bucket the MIN_ROW is correct, but MAX_ROW simply increases by one on each row. I now understand that this is because, with the default window, the functions only consider rows up to the current row of the partition.

At this point I thought I could use a ROWS window to limit the result, but again I hadn't fully grasped how the analytic functions work with a window range. After some more head scratching it all made sense and worked as expected, i.e. as per the documentation.

Finally we arrived at the following to achieve what we were after:

SELECT DISTINCT bucket,
       FIRST_VALUE(rn) OVER (PARTITION BY bucket ORDER BY rn) min_row,
       FIRST_VALUE(rn) OVER (PARTITION BY bucket ORDER BY rn DESC) max_row
  FROM (SELECT rn,
               CASE
                 WHEN rn BETWEEN    0 AND  1000 THEN  1
                 WHEN rn BETWEEN 1001 AND  2000 THEN  2
                 WHEN rn BETWEEN 2001 AND  3000 THEN  3
                 WHEN rn BETWEEN 3001 AND  4000 THEN  4
                 WHEN rn BETWEEN 4001 AND  5000 THEN  5
                 WHEN rn BETWEEN 5001 AND  6000 THEN  6
                 WHEN rn BETWEEN 6001 AND  7000 THEN  7
                 WHEN rn BETWEEN 7001 AND  8000 THEN  8
                 WHEN rn BETWEEN 8001 AND  9000 THEN  9
                 WHEN rn BETWEEN 9001 AND 10000 THEN 10
               END bucket,
               object_name
          FROM (SELECT ROWNUM rn, object_name
                  FROM all_objects
                 WHERE ROWNUM < 10001));

After going through this exercise, which was worthwhile if for no other reason than improving our understanding of analytic functions, we wondered if we'd missed a trick and there was an easier, more straightforward way of doing this. The crux of the question is:

Is there a way, in pure SQL, to take n rows, ascertain the first and last value in that range, then skip to row n+1 and repeat?

Thanks in advance,

Gareth.

and Tom said...

ops$tkyte@ORA10G> select min(object_id), max(object_id), count(*), nt
  2    from ( select object_id, ntile(8) over (order by object_id) nt
  3             from all_objects
  4         )
  5   group by nt;

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)         NT
-------------- -------------- ---------- ----------
             2           6900       6088          1
          6901          13285       6088          2
         13286          19373       6088          3
         19374          25460       6087          4
         25461          31547       6087          5
         31548          37640       6087          6
         37641          45252       6087          7
         45253          80371       6087          8

8 rows selected.


analytics rock, they roll.
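For reference, NTILE(n) deals an ordered set into n buckets whose sizes differ by at most one; the earlier buckets absorb the remainder, which is why the first three buckets above hold 6088 rows and the rest 6087. A small Python emulation of that semantics (a sketch of the bucketing rule, not Oracle's implementation):

```python
def ntile(values, n):
    """Assign each value (in sorted order) to a bucket 1..n; bucket sizes
    differ by at most one, with the earlier buckets taking the extras."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n)  # first `extra` buckets get base+1 rows
    buckets = {}
    idx = 0
    for b in range(1, n + 1):
        size = base + (1 if b <= extra else 0)
        buckets[b] = ordered[idx:idx + size]
        idx += size
    return buckets

# min/max/count per bucket, mirroring the GROUP BY nt query
groups = ntile(range(1, 101), 8)
summary = {b: (min(vs), max(vs), len(vs)) for b, vs in groups.items()}
```

With 100 values and 8 buckets, the first four buckets get 13 values each and the remaining four get 12, the same "differ by at most one" pattern as the all_objects run above.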


Comments

great!

oraboy, September 15, 2005 - 5:53 pm UTC

That's amazing!!

"analytics rock, they roll."
- couldn't agree more.. maybe it's time you added to the usual phrase:
"analytics rock, they roll
- all the way to glory"
:)

5 stars with two thumbs-up!



Big deal

Mikito Harakiri, September 15, 2005 - 6:35 pm UTC

with a as (
  select count(1) total from all_objects
)
select min(object_id), max(object_id), count(*), nt
  from (
    select object_id, floor(rownum*8/total) nt
      from all_objects, a
     order by object_id
  )
 group by nt;

Admittedly, 3 times slower
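For what it's worth, the floor(rownum*N/total) formula also has an off-by-one: the very last row lands alone in bucket N, producing N+1 groups rather than N (which is why a follow-up below reports 9 rows for 8 buckets). A Python check of the arithmetic, with a hypothetical row count:

```python
def rownum_bucket(total, n):
    """floor(rownum * n / total) for rownum = 1..total, as in the query above."""
    return [(i * n) // total for i in range(1, total + 1)]

buckets = rownum_bucket(1000, 8)
distinct = sorted(set(buckets))
# distinct holds 0..7 plus a singleton bucket 8 containing only the last row
```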

Tom Kyte
September 15, 2005 - 6:54 pm UTC

so, not nearly as good, I agree - it would be a bad approach.

NTILE rocks and rolls ;) Thanks for confirming that analytics are the coolest thing to happen to SQL since the keyword SELECT!

What a PITA

mdinh, September 15, 2005 - 6:44 pm UTC

Mikito Harakiri, why do you have to be such a PITA?

SYS@TIGGER> set autotrace traceonly
SYS@TIGGER> SELECT MIN (object_id), MAX (object_id), COUNT (*), nt
  2    FROM (SELECT object_id, NTILE (8) OVER (ORDER BY object_id) nt
  3          FROM all_objects)
  4   GROUP BY nt;

8 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 VIEW
3 2 WINDOW (SORT)
4 3 FILTER
5 4 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
6 5 NESTED LOOPS
7 6 TABLE ACCESS (FULL) OF 'USER$'
8 6 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
9 4 TABLE ACCESS (BY INDEX ROWID) OF 'IND$'
10 9 INDEX (UNIQUE SCAN) OF 'I_IND1' (UNIQUE)
11 4 TABLE ACCESS (BY INDEX ROWID) OF 'OBJAUTH$'
12 11 NESTED LOOPS
13 12 FIXED TABLE (FULL) OF 'X$KZSRO'
14 12 INDEX (RANGE SCAN) OF 'I_OBJAUTH2' (NON-UNIQUE
)

15 4 FIXED TABLE (FULL) OF 'X$KZSPR'
16 4 FIXED TABLE (FULL) OF 'X$KZSPR'
17 4 FIXED TABLE (FULL) OF 'X$KZSPR'
18 4 FIXED TABLE (FULL) OF 'X$KZSPR'
19 4 FIXED TABLE (FULL) OF 'X$KZSPR'
20 4 FIXED TABLE (FULL) OF 'X$KZSPR'
21 4 FIXED TABLE (FULL) OF 'X$KZSPR'
22 4 FIXED TABLE (FULL) OF 'X$KZSPR'
23 4 FIXED TABLE (FULL) OF 'X$KZSPR'
24 4 FIXED TABLE (FULL) OF 'X$KZSPR'
25 4 FIXED TABLE (FULL) OF 'X$KZSPR'
26 4 FIXED TABLE (FULL) OF 'X$KZSPR'
27 4 FIXED TABLE (FULL) OF 'X$KZSPR'
28 4 FIXED TABLE (FULL) OF 'X$KZSPR'
29 4 FIXED TABLE (FULL) OF 'X$KZSPR'
30 4 FIXED TABLE (FULL) OF 'X$KZSPR'
31 4 FIXED TABLE (FULL) OF 'X$KZSPR'




Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
47893 consistent gets
0 physical reads
0 redo size
873 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
8 rows processed

SYS@TIGGER> WITH a AS
2 (SELECT COUNT (1) total
3 FROM all_objects)
4 SELECT MIN (object_id), MAX (object_id), COUNT (*), nt
5 FROM (SELECT object_id, FLOOR (ROWNUM * 8 / total) nt
6 FROM all_objects, a
7 ORDER BY object_id)
8 GROUP BY nt;

9 rows selected.


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 SORT (GROUP BY)
2 1 VIEW
3 2 SORT (ORDER BY)
4 3 COUNT
5 4 FILTER
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJ$'
7 6 NESTED LOOPS
8 7 NESTED LOOPS
9 8 VIEW
10 9 SORT (AGGREGATE)
11 10 FILTER
12 11 TABLE ACCESS (BY INDEX ROWID) OF 'OB
J$'

13 12 NESTED LOOPS
14 13 TABLE ACCESS (FULL) OF 'USER$'
15 13 INDEX (RANGE SCAN) OF 'I_OBJ2' (
UNIQUE)

16 11 TABLE ACCESS (BY INDEX ROWID) OF 'IN
D$'

17 16 INDEX (UNIQUE SCAN) OF 'I_IND1' (U
NIQUE)

18 11 TABLE ACCESS (BY INDEX ROWID) OF 'OB
JAUTH$'

19 18 NESTED LOOPS
20 19 FIXED TABLE (FULL) OF 'X$KZSRO'
21 19 INDEX (RANGE SCAN) OF 'I_OBJAUTH
2' (NON-UNIQUE)

22 11 FIXED TABLE (FULL) OF 'X$KZSPR'
23 11 FIXED TABLE (FULL) OF 'X$KZSPR'
24 11 FIXED TABLE (FULL) OF 'X$KZSPR'
25 11 FIXED TABLE (FULL) OF 'X$KZSPR'
26 11 FIXED TABLE (FULL) OF 'X$KZSPR'
27 11 FIXED TABLE (FULL) OF 'X$KZSPR'
28 11 FIXED TABLE (FULL) OF 'X$KZSPR'
29 11 FIXED TABLE (FULL) OF 'X$KZSPR'
30 11 FIXED TABLE (FULL) OF 'X$KZSPR'
31 11 FIXED TABLE (FULL) OF 'X$KZSPR'
32 11 FIXED TABLE (FULL) OF 'X$KZSPR'
33 11 FIXED TABLE (FULL) OF 'X$KZSPR'
34 11 FIXED TABLE (FULL) OF 'X$KZSPR'
35 11 FIXED TABLE (FULL) OF 'X$KZSPR'
36 11 FIXED TABLE (FULL) OF 'X$KZSPR'
37 11 FIXED TABLE (FULL) OF 'X$KZSPR'
38 11 FIXED TABLE (FULL) OF 'X$KZSPR'
39 8 TABLE ACCESS (FULL) OF 'USER$'
40 7 INDEX (RANGE SCAN) OF 'I_OBJ2' (UNIQUE)
41 5 TABLE ACCESS (BY INDEX ROWID) OF 'IND$'
42 41 INDEX (UNIQUE SCAN) OF 'I_IND1' (UNIQUE)
43 5 TABLE ACCESS (BY INDEX ROWID) OF 'OBJAUTH$'
44 43 NESTED LOOPS
45 44 FIXED TABLE (FULL) OF 'X$KZSRO'
46 44 INDEX (RANGE SCAN) OF 'I_OBJAUTH2' (NON-UNIQ
UE)

47 5 FIXED TABLE (FULL) OF 'X$KZSPR'
48 5 FIXED TABLE (FULL) OF 'X$KZSPR'
49 5 FIXED TABLE (FULL) OF 'X$KZSPR'
50 5 FIXED TABLE (FULL) OF 'X$KZSPR'
51 5 FIXED TABLE (FULL) OF 'X$KZSPR'
52 5 FIXED TABLE (FULL) OF 'X$KZSPR'
53 5 FIXED TABLE (FULL) OF 'X$KZSPR'
54 5 FIXED TABLE (FULL) OF 'X$KZSPR'
55 5 FIXED TABLE (FULL) OF 'X$KZSPR'
56 5 FIXED TABLE (FULL) OF 'X$KZSPR'
57 5 FIXED TABLE (FULL) OF 'X$KZSPR'
58 5 FIXED TABLE (FULL) OF 'X$KZSPR'
59 5 FIXED TABLE (FULL) OF 'X$KZSPR'
60 5 FIXED TABLE (FULL) OF 'X$KZSPR'
61 5 FIXED TABLE (FULL) OF 'X$KZSPR'
62 5 FIXED TABLE (FULL) OF 'X$KZSPR'
63 5 FIXED TABLE (FULL) OF 'X$KZSPR'




Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
95786 consistent gets
0 physical reads
0 redo size
909 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
9 rows processed

SYS@TIGGER>

Your suggestion generates 2X LIO.

To mdinh ...

Gabe, September 15, 2005 - 10:46 pm UTC

Why PITA? Seeing other points of view is good ... no? Could learn from them too!

If one is to compare/measure the two methods, maybe using a custom table rather than a complex dictionary view like all_objects would be better. Making sure the two solutions are equivalent also helps … see 8 vs. 9 rows processed.

Besides, that "... pure SQL" from the original question may have referred to a non-analytical query. The NTILE solution would indeed be the "easier ... more straightforward", but maybe there is room for alternatives in this case.


I knew something (obvious!) was being missed

Gareth, September 16, 2005 - 4:41 am UTC

Tom,

Many thanks for that, it is exactly what I felt we were missing (if that makes sense)! The solution seems so obvious now and I have no idea why I missed the ntile function, it does EXACTLY what we were asking - d'oh!

ntile (and analytics in general) do indeed appear to rock and roll.

Cheers,

Gareth.

Variation on original query....but not NTILE

Robert, September 19, 2005 - 1:41 pm UTC

Tom,

What if, in the original question, we wanted to group the results by the actual column VALUES and not just their COUNT?

For example...

drop table t;
create table t (c number);
insert into t values (1);
insert into t values (2);
insert into t values (3);
insert into t values (7);
insert into t values (8);
insert into t values (15);
insert into t values (17);
insert into t values (20);

select c, ntile(2) over (order by c) nt from t;


        C        NT
--------- ---------
        1         1
        2         1
        3         1
        7         1
        8         2
       15         2
       17         2
       20         2

What I want instead... is to divide the VALUES in the 'C' column into 2 groups... like this...

MAX(C)/<# of groups>

        C        NT
--------- ---------
        1         1
        2         1
        3         1
        7         1
        8         1   <-- NOW 8 is in the 1st group (between 1 and 10)
       15         2
       17         2
       20         2

Is there an analytic function to accomplish this?

Thank you,

Robert.

Tom Kyte
September 19, 2005 - 4:46 pm UTC

ops$tkyte@ORA10GR2> select min(user_id), max(user_id), count(*), grp
  2    from (
  3  select user_id, trunc((user_id-0.1)/ceil(max(user_id) over ()/2)) grp
  4    from all_users
  5         )
  6   group by grp
  7   order by grp
  8  /

MIN(USER_ID) MAX(USER_ID)   COUNT(*)        GRP
------------ ------------ ---------- ----------
           0           25          7          0
          34           61         21          1


something like that? (assumes integers - the 0.1 trick will get 1..10 to be in group one and 11..20 to be in group two and so on)
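The 0.1 shift is what keeps the boundaries right: subtracting it before dividing pushes exact multiples of the bucket width (10, 20, ...) down into the lower group. A quick Python check of that arithmetic:

```python
import math

def grp(value, width):
    """trunc((value - 0.1) / width): the 0.1 shift keeps exact multiples
    of `width` in the lower group, so 1..10 -> 0, 11..20 -> 1, and so on."""
    return math.trunc((value - 0.1) / width)

groups = {v: grp(v, 10) for v in range(1, 21)}
```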

 

That seems very close

Robert, September 19, 2005 - 5:12 pm UTC

Tom,

I changed it to 10 groups and subtracted max-min to see the span of each range... it appears very even, except for grp=2, where there are far fewer rows in the group... I checked the data, and sure enough it is very sparse around that area.

So is this the best I can expect for grouping data in this fashion? (Or am I asking for something illogical/impossible in wanting more even groups from a potentially unknown distribution of the data? :)

Why aren't the DIFFs even among the groups (e.g. GRP=2)?

In any event, thanks for your help!

  1  select min(user_id)
  2       , max(user_id)
  3       , max(user_id) - min(user_id) diff
  4       , count(*), grp
  5      from (
  6    select user_id, trunc((user_id-0.1)/ceil(max(user_id) over ()/10)) grp
  7      from all_users
  8           )
  9     group by grp
 10*    order by grp
15:38:52 SQL> /

MIN(USER_ID) MAX(USER_ID)      DIFF  COUNT(*)       GRP
------------ ------------ --------- --------- ---------
           0          148       148        90         0
         149          291       142       104         1
         377          444        67        67         2
         445          592       147       123         3
         593          740       147       128         4
         741          888       147       123         5
         889         1036       147       111         6
        1037         1184       147       123         7
        1185         1332       147       139         8
        1333         1476       143       136         9

10 rows selected. 

Tom Kyte
September 19, 2005 - 11:35 pm UTC

if you want an even distribution -- ntile()

if you want to take the (high-min)/cnt and slot them out -- with non-evenly distributed data - you won't get that by definition.








something like this?

Gabe, September 19, 2005 - 9:57 pm UTC

SQL> select c, ntile(2) over (order by c) equiheight_bucket from t;

         C EQUIHEIGHT_BUCKET
---------- -----------------
         1                 1
         2                 1
         3                 1
         7                 1
         8                 2
        15                 2
        17                 2
        20                 2

8 rows selected.

SQL> with x as (select min(c) min_c, max(c) max_c from t)
  2  SELECT c, width_bucket(c,min_c,max_c+1,2) equiwidth_bucket
  3  from  t, x;

         C EQUIWIDTH_BUCKET
---------- ----------------
         1                1
         2                1
         3                1
         7                1
         8                1
        15                2
        17                2
        20                2

8 rows selected.
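WIDTH_BUCKET(x, low, high, n) splits the half-open interval [low, high) into n equal-width buckets and returns the 1-based bucket containing x (0 below low, n+1 at or above high). A Python emulation of that contract, checked against the data above (a sketch, not Oracle's implementation):

```python
def width_bucket(x, low, high, n):
    """Emulates Oracle WIDTH_BUCKET: n equal-width buckets over [low, high),
    returning 0 for x < low and n+1 for x >= high."""
    if x < low:
        return 0
    if x >= high:
        return n + 1
    return int((x - low) / ((high - low) / n)) + 1

data = [1, 2, 3, 7, 8, 15, 17, 20]
# min_c = 1, max_c = 20 -> width_bucket(c, 1, 21, 2): the boundary falls at 11
buckets = [width_bucket(c, 1, 21, 2) for c in data]
```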
 

Thanks Gabe

Oraboy, September 20, 2005 - 5:13 pm UTC

Thanks Gabe.. I learnt one more cool thing (width_bucket) from your post.

I wasn't even aware of such a function; this is what I came up with. The table 't' is simply a dump of all object_ids less than 10000, after which I deleted lots of rows between 6000-7000 and 1500-2000 just to check.


SQL>l
  1  With DivRange as (select (max(object_id)+1)/(&N) divrange from t)
  2  select min(object_id),max(object_id), max(object_id)-min(object_id) diff , count(*),grp+1 as grp
  3  from (select object_id,trunc(object_id/d.Divrange) grp
  4           from t, divrange d)
  5  group by grp
  6* order by grp
SQL>/
Enter value for n: 10

MIN(OBJECT_ID) MAX(OBJECT_ID)     DIFF   COUNT(*)        GRP
-------------- -------------- -------- ---------- ----------
             3            865      862        803          1
           866           1499      633        634          2
          2001           2597      596        595          3
          2598           3463      865        853          4
          3464           4329      865        822          5
          4330           5195      865        389          6
          5196           6061      865        846          7
          6062           6927      865        158          8
          6928           7793      865        703          9
          7794           8658      864        749         10

10 rows selected.

Elapsed: 00:00:00.00

SQL>REM Gabes sql
SQL>REM 
SQL>With DivRange as (select min(object_id) minid, max(object_id) maxid from t)
  2  select min(object_id), max(object_id),max(object_id) - min(object_id) diff, count(*), nt
  3  from (
  4         select object_id,width_bucket(object_id,minid,maxid+1,&n) nt from t ,DivRange
  5       )
  6  group by nt order by nt
  7  /
Enter value for n: 10

MIN(OBJECT_ID) MAX(OBJECT_ID)     DIFF   COUNT(*)         NT
-------------- -------------- -------- ---------- ----------
             3            868      865        806          1
           869           1499      630        631          2
          2001           2599      598        597          3
          2600           3465      865        853          4
          3466           4330      864        821          5
          4331           5196      865        389          6
          5197           6062      865        846          7
          6063           6927      864        157          8
          6928           7793      865        703          9
          7794           8658      864        749         10

10 rows selected.

Elapsed: 00:00:00.00

autotrace show Full Table scan for both approaches (with LIOs ~ 30 blocks in both)

 

A little variation

Asim Naveed, October 02, 2005 - 10:13 am UTC

I have a slightly different problem.
I have a table with only one column.

CREATE TABLE RN (MYNUM NUMBER(10));

INSERT INTO RN (MYNUM) VALUES(1);
INSERT INTO RN (MYNUM) VALUES(2);
INSERT INTO RN (MYNUM) VALUES(3);
INSERT INTO RN (MYNUM) VALUES(4);
INSERT INTO RN (MYNUM) VALUES(5);
INSERT INTO RN (MYNUM) VALUES(7);
INSERT INTO RN (MYNUM) VALUES(8);
INSERT INTO RN (MYNUM) VALUES(9);
INSERT INTO RN (MYNUM) VALUES(15);
INSERT INTO RN (MYNUM) VALUES(16);
INSERT INTO RN (MYNUM) VALUES(17);

What I want is a query that will build ranges from the
above data on the basis of the missing numbers.

I want a query that will return me

RF  RT
 1   5
 7   9
15  17

Thanks

Tom Kyte
October 02, 2005 - 11:01 am UTC

ops$tkyte@ORA9IR2> select min(mynum), max(mynum)
  2    from (
  3  select mynum,
  4         max(grp) over (order by mynum) maxgrp
  5    from (
  6  select mynum,
  7         case when lag(mynum) over (order by mynum) <> mynum-1
  8              then row_number() over (order by mynum)
  9          end grp
 10    from rn
 11         )
 12         )
 13   group by maxgrp
 14   order by 1
 15  /
 
MIN(MYNUM) MAX(MYNUM)
---------- ----------
         1          5
         7          9
        15         17
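The query above is the classic gaps-and-islands pattern: tag a row whenever its predecessor is not exactly one less (LAG), carry that tag forward with a running MAX, then group on it. The same idea in a few lines of Python:

```python
def ranges_from(nums):
    """Collapse a sorted list of integers into (min, max) ranges,
    broken wherever a number is missing -- mirrors the LAG/MAX query."""
    nums = sorted(nums)
    out = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n != prev + 1:          # gap found: close the current island
            out.append((start, prev))
            start = n
        prev = n
    out.append((start, prev))
    return out
```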
 

Very Good

PRS, October 02, 2005 - 5:33 pm UTC

Can you please explain the query.

Tom Kyte
October 02, 2005 - 5:43 pm UTC

see
https://www.oracle.com/technetwork/issue-archive/2014/14-mar/o24asktom-2147206.html

and read "Analytics to the Rescue" - it uses the same technique.

Another thing to do -- run the innermost query, then the next layer, then the next layer and so on -- to see what they do.

Space for the "buckets" to grow...

MarkB..., December 04, 2005 - 11:36 am UTC

How about if the last "bucket" needs to have space to grow?

My requirement is to balance a set of SKUs into a number of ranges, and ensure that the last range has x% growth, since the surrogate key on the SKU is generated from a sequence (therefore new additions will fall into the last bucket range, which would quickly become "un-balanced").

Can you help?

Tom Kyte
December 04, 2005 - 11:54 am UTC

I don't get it?

what does it mean for a range to have "x% growth"???


ntile breaks a set of M rows (regardless of key values) into N buckets each having about M/N rows in it.


M = 1000
N = 10

each bucket would have 100 rows in it.


ops$tkyte@ORA10GR2> create table t as select dbms_random.random id from all_objects where rownum <= 1000;

Table created.

ops$tkyte@ORA10GR2>
ops$tkyte@ORA10GR2> select nt, min(id), max(id), max(id)-min(id) gap, count(*)
  2    from (select id, ntile(10) over (order by id) nt from t )
  3   group by nt
  4   order by nt
  5  /

        NT    MIN(ID)    MAX(ID)        GAP   COUNT(*)
---------- ---------- ---------- ---------- ----------
         1 -2.145E+09 -1.727E+09  417541517        100
         2 -1.725E+09 -1.252E+09  472724860        100
         3 -1.245E+09 -869326065  375722857        100
         4 -864818826 -418490867  446327959        100
         5 -411805973   39727487  451533460        100
         6   41056961  492515025  451458064        100
         7  493298437  852795730  359497293        100
         8  853341595 1249840427  396498832        100
         9 1252590379 1648824944  396234565        100
        10 1649062217 2140724647  491662430        100

10 rows selected.


 

Shachi, December 05, 2005 - 8:17 am UTC

Hi Tom,

How do we divide rows into buckets of different sizes? E.g. say 3 buckets with 70%, 20%, and 10% of the rows respectively.
How can we achieve it?

Tom Kyte
December 06, 2005 - 4:33 am UTC

ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), grp
  2    from (
  3  select object_id,
  4             case when rn/cnt <= .7 then 70
  5                  when rn/cnt <= .9 then 20
  6                  else 10
  7                  end grp
  8    from (
  9  select object_id, row_number() over (order by object_id) rn, 
                  count(*) over () cnt
 10    from all_objects
 11         )
 12         )
 13   group by grp
 14  /

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)        GRP
-------------- -------------- ---------- ----------
         29496          33977       3092         10
         22314          29495       6184         20
             3          22313      21644         70
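The rn/cnt expression is the cumulative fraction of rows seen so far in object_id order, so cutting it at .7 and .9 yields roughly 70/20/10 splits. The same cumulative-fraction idea in Python, with made-up data:

```python
def pct_buckets(values, cuts=(0.7, 0.9)):
    """Label each value (in sorted order) 70/20/10 by its cumulative
    fraction, like CASE WHEN rn/cnt <= .7 ... in the SQL above."""
    ordered = sorted(values)
    cnt = len(ordered)
    labels = []
    for rn, v in enumerate(ordered, start=1):
        frac = rn / cnt                    # cumulative fraction of rows
        if frac <= cuts[0]:
            labels.append((v, 70))
        elif frac <= cuts[1]:
            labels.append((v, 20))
        else:
            labels.append((v, 10))
    return labels

counts = {}
for _, g in pct_buckets(range(100)):
    counts[g] = counts.get(g, 0) + 1
```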
 

Different sized buckets

Bob B, December 05, 2005 - 9:56 am UTC

Find the common divisor and find out how many buckets you need for that percent and decode.

10%, 20%, 70% - common divisor is 10% = 10 buckets

DECODE( ntile(10) over ( order by id ),
        1, 1,
        2, 2,
        3, 2,
        3 ) Bucket

Bucket 1 has ~10% of the rows
Bucket 2 has ~20% of the rows
Bucket 3 has ~70% of the rows
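The remap works because each of the ten tiles holds ~10% of the rows; merging tiles 2-3 and 4-10 produces the 20% and 70% shares. A self-contained Python sketch of the ntile-then-DECODE idea:

```python
def decode_buckets(values, n=10, remap=None):
    """ntile(n) followed by a DECODE-style remap: tile 1 -> bucket 1 (~10%),
    tiles 2-3 -> bucket 2 (~20%), everything else -> bucket 3 (~70%)."""
    remap = remap or {1: 1, 2: 2, 3: 2}
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n)
    out, idx = [], 0
    for tile in range(1, n + 1):
        size = base + (1 if tile <= extra else 0)
        bucket = remap.get(tile, 3)        # DECODE default -> bucket 3
        out += [(v, bucket) for v in ordered[idx:idx + size]]
        idx += size
    return out

shares = {}
for _, b in decode_buckets(range(1000)):
    shares[b] = shares.get(b, 0) + 1
```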

Tom Kyte
December 06, 2005 - 5:01 am UTC

indeed, that is a good way. easier than what I first thought of.

Jean-Pierre, December 06, 2005 - 9:20 am UTC

Hello Tom

Say I have these data

C1 C2
-----
A   1
A   1
B   3
B   4
C   5
D   6
......
......
......

How can I get the percentage of A, B, C and so on? Like this:

C1  count(*)  %
A   2         n%
B   3         n%
C   1         n%
......

I don't know how many distinct C1 values there are in my table, but of course I can count how many C2 rows there are for each distinct C1.

Thanks a lot for reply

Tom Kyte
December 06, 2005 - 9:50 am UTC

I'd show you how to use ratio_to_report, but I don't have your table or data ;(

Jean-Pierre, December 06, 2005 - 10:38 am UTC

Here is my test

create table T (C1 varchar2(1),C2 varchar2(1));

insert into T values ('A','1');
insert into T values ('A','2');
insert into T values ('A','3');
insert into T values ('A','4');
insert into T values ('B','5');
insert into T values ('B','6');
insert into T values ('B','7');
insert into T values ('D','8');
insert into T values ('D','9');
...


The problem is that I don't know how many distinct C1 values there are in table T.

The result I'd like is :

Col1 Col2 Col3
---- ---- -------
A N rows x% of total rows
B N rows x% of total rows
C N rows x% of total rows
etc...

Thanks a lot


Tom Kyte
December 06, 2005 - 2:41 pm UTC

see below :) someone got my ratio to report hint right above.......

Analytics Rock

Bob B, December 06, 2005 - 11:10 am UTC

SELECT
  COL1,
  COUNT(*) COL2,
  COUNT(*) / SUM(COUNT(*)) OVER () COL3
FROM T
GROUP BY COL1


My question

Steve, December 06, 2005 - 12:13 pm UTC

Hi Tom,


I have a table T,

CREATE TABLE T ( dept_num, item_id ) AS
SELECT MOD(TRUNC(dbms_RANDOM.VALUE*100),12),
MOD(TRUNC(dbms_RANDOM.VALUE*1000),100)
FROM all_objects;


I'd like to partition table T into k (say 4) subsets, where items in the same dept must all be in the same subset.

How can I make each subset as balanced as possible?

Thanks!

Tom Kyte
December 06, 2005 - 3:20 pm UTC

it is tough in SQL as it is sort of a bin-fitting problem (maybe someone else wants to write the iterative solution-seeking routine using the model clause :)


you can use something close to what I use to split a table up into N chunks by extents:

ops$tkyte@ORA9IR2> select dept_num, cnt,sum(cnt) over (order by dept_num) sum_cnt,
  2         ceil(sum(cnt) over ()/4)sz
  3    from (
  4  select dept_num, count(*) cnt
  5    from t
  6   group by dept_num
  7         )
  8  /

  DEPT_NUM        CNT    SUM_CNT         SZ
---------- ---------- ---------- ----------
         0       2761       2761       7731
         1       2896       5657       7731
         2       2788       8445       7731
         3       2683      11128       7731
         4       2487      13615       7731
         5       2433      16048       7731
         6       2454      18502       7731
         7       2538      21040       7731
         8       2390      23430       7731
         9       2529      25959       7731
        10       2557      28516       7731
        11       2406      30922       7731

12 rows selected.

<b>our perfect set size is 7731, we'll never get it, but we'll do the best we can "by deptno" (sorted by deptno)</b>


ops$tkyte@ORA9IR2> select grp, min(dept_num), max(dept_num), sum(cnt)
  2    from (
  3  select dept_num, cnt,
  4         floor( (sum(cnt) over (order by dept_num)) / ceil(sum(cnt) over ()/4) ) grp
  5    from (
  6  select dept_num, count(*) cnt
  7    from t
  8   group by dept_num
  9         )
 10         )
 11   group by grp
 12  /

       GRP MIN(DEPT_NUM) MAX(DEPT_NUM)   SUM(CNT)
---------- ------------- ------------- ----------
         0             0             1       5657
         1             2             4       7958
         2             5             7       7425
         3             8            11       9882

ops$tkyte@ORA9IR2>


You could play with some fudge factors on the group size - but you'll get errors either way depending on the data input into the table... 
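The trick above is: compute a running total of per-dept counts, divide by the target chunk size ceil(total/k), and floor. Each dept stays whole, so group sums can only approximate the target. A Python sketch of the same arithmetic, using the dept counts from the output above:

```python
import math

def split_by_running_total(counts, k):
    """counts: list of (key, cnt) sorted by key. Assign each key the group
    floor(running_total / ceil(grand_total / k)), keeping keys whole."""
    total = sum(c for _, c in counts)
    target = math.ceil(total / k)          # the "perfect set size"
    groups, running = {}, 0
    for key, cnt in counts:
        running += cnt
        groups.setdefault(running // target, []).append((key, cnt))
    return groups

# the 12 dept counts from the all_objects example above
depts = [(0, 2761), (1, 2896), (2, 2788), (3, 2683), (4, 2487), (5, 2433),
         (6, 2454), (7, 2538), (8, 2390), (9, 2529), (10, 2557), (11, 2406)]
groups = split_by_running_total(depts, 4)
```

This reproduces Tom's split: depts 0-1 in group 0, 2-4 in group 1, 5-7 in group 2, 8-11 in group 3.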

to: Jean-Pierre from FRANCE

Marcio Portes, December 06, 2005 - 1:48 pm UTC

Is it what you're looking for?

SQL> select c1, count(*),
  2         (ratio_to_report(count(*)) over ())*100 ratio_n
  3    from t
  4   group by c1
  5  /

C      COUNT(*)       RATIO_N
- ------------- -------------
A             4 44,4444444444
B             3 33,3333333333
D             2 22,2222222222

3 rows selected.
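RATIO_TO_REPORT(expr) OVER () is simply expr divided by the sum of expr across the whole report. The arithmetic, in Python:

```python
def ratio_to_report(rows):
    """rows: dict of key -> count. Returns key -> percent of total,
    the same arithmetic as ratio_to_report(count(*)) over () * 100."""
    total = sum(rows.values())
    return {k: 100 * v / total for k, v in rows.items()}

# the C1 counts from the test table above: A=4, B=3, D=2
pct = ratio_to_report({"A": 4, "B": 3, "D": 2})
```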

 

Always calling me "someone"

Marcio Portes, December 06, 2005 - 9:15 pm UTC

<quote>
Followup:

see below :) someone got my ratio to report hint right above.......
</quote>

and

</code> http://tkyte.blogspot.com/2005/10/someone-asked-me.html <code>

That's becoming a habit :) LoL

Cheers,
Marcio "someone" Portes

How to do such grouping ?

Parag J Patankar, January 31, 2006 - 7:54 am UTC

Hi Tom,

I have following data

create table emp1 as select * from scott.emp;
insert into emp1 (empno, ename) values (8000, 'PARAG');
insert into emp1 (empno, ename) values (8001, 'TOM');

commit;

so my emp1 table now has 15 records. I want this data grouped into 4 groups, such that three groups have 4 employees and the fourth group has 3. How can I do this?

I am selecting empno, ename

select empno, ename from emp1 order by dbms_random.value;

So my desired output

EMPNO ENAME GROUP
---------- -------- -----
7934 MILLER 1
7782 CLARK 1
7566 JONES 1
7521 WARD 1
7654 MARTIN 2
7876 ADAMS 2
7698 BLAKE 2
7777 FORD 2
7369 SMITH 3
7499 ALLEN 3
8001 TOM 3
7844 TURNER 3
8000 PARAG 4
7788 SCOTT 4
7839 KING 4

15 rows selected.

How can I do this ?

regards & thanks
pjp


Tom Kyte
January 31, 2006 - 3:17 pm UTC

ops$tkyte@ORA10GR2> create table t as select * from scott.emp;

Table created.

ops$tkyte@ORA10GR2> insert into t
  2  select * from scott.emp where rownum = 1;

1 row created.

ops$tkyte@ORA10GR2> update t set ename = lower(ename), empno = -empno where rownum = 1;

1 row updated.

ops$tkyte@ORA10GR2> commit;

Commit complete.

ops$tkyte@ORA10GR2> select empno, ename, ntile(4) over (order by dbms_random.value) grp
  2    from t;

     EMPNO ENAME             GRP
---------- ---------- ----------
      7839 KING                1
      7698 BLAKE               1
      7521 WARD                1
      7934 MILLER              1
      7566 JONES               2
      7876 ADAMS               2
      7844 TURNER              2
     -7369 smith               2
      7788 SCOTT               3
      7654 MARTIN              3
      7499 ALLEN               3
      7902 FORD                3
      7900 JAMES               4
      7782 CLARK               4
      7369 SMITH               4

15 rows selected.
 

But, what about ...

Mark, March 14, 2006 - 12:00 pm UTC

How about an 8i Standard solution to this problem?

Regards,
Mark

Tom Kyte
March 15, 2006 - 9:03 am UTC

second query...

ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), nt
  2  from ( select object_id, ntile(8) over (order by object_id) nt
  3  from all_objects
  4  )
  5  group by nt;

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)         NT
-------------- -------------- ---------- ----------
             3           3996       3867          1
          3997           8401       3867          2
          8402          12268       3867          3
         12269          16137       3867          4
         16138          20004       3867          5
         20005          23875       3867          6
         23876          28659       3867          7
         28660          33936       3867          8

8 rows selected.

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select min(object_id), max(object_id), count(*), nt
  2    from
  3  (
  4  select trunc((rownum-0.1)/(select trunc(count(*)/8) from all_objects)) nt, object_id
  5    from
  6  (
  7  select object_id
  8    from all_objects
  9   order by object_id
 10  )
 11  )
 12  group by nt;

MIN(OBJECT_ID) MAX(OBJECT_ID)   COUNT(*)         NT
-------------- -------------- ---------- ----------
             3           3996       3867          0
          3997           8401       3867          1
          8402          12268       3867          2
         12269          16137       3867          3
         16138          20004       3867          4
         20005          23875       3867          5
         23876          28659       3867          6
         28660          33936       3867          7

8 rows selected.
 

Close ...

Greg, March 22, 2007 - 1:50 pm UTC

Was looking for some help with a problem I have, and this thread came sooooo close ... :)
(Of course, learned about NTILE and WIDTH_BUCKET ... how cool is that?)

Ok ... so far we've been talking about dividing a group into a number of buckets and keeping it even "by count" ... how to do it "by sum" ?

Here's a sample script and setup to help demonstrate what I'm looking for:

drop table junk;

create table junk
( name varchar2(5),
bytes number )
/

insert into junk values ( 'A', 2837456 );
insert into junk values ( 'B', 984563 );
insert into junk values ( 'C', 92346512 );
insert into junk values ( 'D', 37465 );
insert into junk values ( 'E', 10293243 );
insert into junk values ( 'F', 282374 );
insert into junk values ( 'G', 27364 );
insert into junk values ( 'H', 883745 );
insert into junk values ( 'I', 227364 );
insert into junk values ( 'J', 827345 );
insert into junk values ( 'K', 2833 );
insert into junk values ( 'L', 123 );

commit;

set pagesize 50

break on grp skip 1
compute sum of bytes on grp

select name, grp, bytes, sum(bytes) over ( partition by grp )
from ( select name, bytes,
ntile(3) over (order by bytes desc) grp from junk )
/

Results in:
NAME GRP BYTES SUM(BYTES)OVER(PARTITIONBYGRP)
------------------------------ ---------- ---------- ------------------------------
C 1 92346512 106461774
E 10293243 106461774
A 2837456 106461774
B 984563 106461774
********** ----------
sum 106461774

H 2 883745 2220828
J 827345 2220828
I 227364 2220828
F 282374 2220828
********** ----------
sum 2220828

D 3 37465 67785
K 2833 67785
G 27364 67785
L 123 67785
********** ----------
sum 67785


12 rows selected.

However, I would prefer to see those sums closer in total, even if that means one group has only one record in it ...

SQL > select sum(bytes) from junk;

SUM(BYTES)
----------------
108,750,387

1 row selected.

SQL > select sum(bytes)/3 from junk;

SUM(BYTES)/3
------------
36250129

1 row selected.

Based on those queries, I'm expecting to see all 3 "buckets" with a total close to 36250129
(I know we won't hit perfect, don't want perfect, just want close).

I've done this previously with PL/SQL ... but would like to know if we can pull this off in SQL .. ;)

Tom Kyte
March 22, 2007 - 2:59 pm UTC

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10498431232211

see the "split" query I have there that splits a table up into near equi sized 'chunks' by using a running total of the extent sizes...
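The running-total idea behind that "split" query can be sketched outside SQL. Below is a minimal Python illustration of the same arithmetic (the function name is invented; the sample sizes are a few of the byte counts from the junk table above): each row's chunk number is `ceil(running_total / (total / n_chunks))`.

```python
import math

def split_by_running_total(sizes, n_chunks):
    """Assign chunk 1..n_chunks to each row via its running-total share:
    ceil(running_total / (total / n_chunks)), mirroring the split query."""
    total = sum(sizes)
    running, chunks = 0, []
    for s in sizes:
        running += s
        chunks.append(math.ceil(running / (total / n_chunks)))
    return chunks

# a few of the byte counts from the junk table above
print(split_by_running_total([2837456, 984563, 92346512, 37465, 10293243, 282374], 3))
```

Note that one very large row can swallow more than its share of the total, so a chunk number may end up with no rows at all (here chunk 2 is skipped); the chunks come out near equi-sized by sum, not by row count.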

what if we don't have a key column

dav, April 24, 2007 - 2:13 pm UTC

How do we use ntile() to split the range even if we don't have a key column?

I have a million-record table with account numbers repeating, along with some other columns. The table has no key column. I want to split the table into 10 ranges by the account number field and send the min(account) and max(account) to a PL/SQL procedure so that I can execute the procedure in parallel with different account number ranges.

Please let me know. Thank you
Tom Kyte
April 24, 2007 - 3:02 pm UTC

well, we have to assume that account_number repeats, so, in the following, pretend ID is account_number.

start by rolling up to account_number (id)

  9  select id, count(*) cnt
 10    from t
 11   group by id



add to that a running total of record cnts and the total of all records:

  5  select id, cnt,
  6         sum(cnt) over (order by id) running_total,
  7         sum(cnt) over () total
  8    from (
  9  select id, count(*) cnt
 10    from t
 11   group by id
 12         )


Next, we can put everyone into a "group" by dividing:

  3  select id, cnt, ceil( running_total / ( total/4 )) grp
  4    from (
  5  select id, cnt,
  6         sum(cnt) over (order by id) running_total,
  7         sum(cnt) over () total
  8    from (
  9  select id, count(*) cnt
 10    from t
 11   group by id
 12         )
 13         )

that used "4", you can adjust...

and now we just aggregate:

ops$tkyte%ORA9IR2> /*
DOC>drop table t;
DOC>
DOC>create table t
DOC>as
DOC>select mod(object_id,200) id
DOC>  from all_objects;
DOC>*/
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select count(distinct id), count(*) from t;

COUNT(DISTINCTID)   COUNT(*)
----------------- ----------
              200      30398

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select min(id), max(id), sum(cnt), grp
  2    from (
  3  select id, cnt, ceil( running_total / ( total/4 )) grp
  4    from (
  5  select id, cnt,
  6         sum(cnt) over (order by id) running_total,
  7         sum(cnt) over () total
  8    from (
  9  select id, count(*) cnt
 10    from t
 11   group by id
 12         )
 13         )
 14         )
 15   group by grp
 16   order by grp
 17  /

   MIN(ID)    MAX(ID)   SUM(CNT)        GRP
---------- ---------- ---------- ----------
         0         48       7539          1
        49         98       7587          2
        99        148       7569          3
       149        199       7703          4


ops$tkyte%ORA9IR2> select min(id), max(id), sum(cnt), grp
  2    from (
  3  select id, cnt, ceil( running_total / ( total/6 )) grp
  4    from (
  5  select id, cnt,
  6         sum(cnt) over (order by id) running_total,
  7         sum(cnt) over () total
  8    from (
  9  select id, count(*) cnt
 10    from t
 11   group by id
 12         )
 13         )
 14         )
 15   group by grp
 16   order by grp
 17  /

   MIN(ID)    MAX(ID)   SUM(CNT)        GRP
---------- ---------- ---------- ----------
         0         31       4933          1
        32         65       5186          2
        66         98       5007          3
        99        131       4996          4
       132        165       5127          5
       166        199       5149          6

6 rows selected.
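The grouping math in the demo above can be sketched in Python as well (an illustrative mirror of the SQL, not a replacement for it; the function name is invented and the data loosely mimics mod(object_id, 200)): roll up to per-key counts, then bucket each key by its running share of the grand total.

```python
import math
from collections import Counter

def group_keys(values, n_groups):
    """Map each distinct key to a group 1..n_groups using the running total
    of per-key counts: ceil(running_total / (total / n_groups))."""
    counts = Counter(values)
    total = sum(counts.values())
    running, groups = 0, {}
    for key in sorted(counts):
        running += counts[key]
        groups[key] = math.ceil(running / (total / n_groups))
    return groups

# 30398 rows over 200 distinct ids, loosely mimicking mod(object_id, 200)
data = [i % 200 for i in range(30398)]
g = group_keys(data, 4)
print(min(g.values()), max(g.values()))
```

Because every row with a given key gets the same group, groups can differ slightly in size, but no key is ever split across two groups.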



Works Great.

A reader, April 24, 2007 - 3:08 pm UTC

Power of Analytic functions. Thanks.

Use NTILE to split into ranges

Kailash Subbaraman, August 08, 2007 - 1:10 pm UTC

Tom,

I have been a very big fan of yours. I have solved a lot of problems with analytic functions (truly, analytics rock and analytics roll). But I have the following situation where I need your help.

I have a table with 2 columns :
Create Table A (F1 VARCHAR2(2), F2 VARCHAR2(3));

INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '1');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '2');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '3');
INSERT INTO A ( F1, F2 ) VALUES ( 'A1', '4');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '2');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '1');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '3');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '4');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '6');
INSERT INTO A ( F1, F2 ) VALUES ( 'A2', '7');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '6');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '7');
INSERT INTO A ( F1, F2 ) VALUES ( 'A3', '8');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '5');
INSERT INTO A ( F1, F2 ) VALUES ( 'A4', '7');

I need to split these rows into say 3 groups (using function such as NTILE) in such a way that all records with the same value for the column F1 should be in the same group.

e.g

A1 1 1
A1 2 1
A1 3 1
A1 4 1

A2 2 2
A2 3 2
A2 4 2
A2 5 2
A2 6 2
A2 7 2

A3 1 3
A3 6 3
A3 7 3
A3 8 3
A4 5 3
A4 7 3

Thanks
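(A hedged sketch, not an answer from the thread.) One common way to keep all rows sharing an F1 value in the same group is to NTILE the distinct F1 values, e.g. `ntile(3) over (order by f1)` on `select distinct f1 from a`, and join the group number back to the rows. The same idea in Python (helper names are invented; this reimplements NTILE's "sizes differ by at most 1" rule over the distinct keys):

```python
def ntile(n, items):
    """Assign bucket 1..n to ordered items; bucket sizes differ by at most 1,
    with the earlier buckets taking the extra items, as Oracle NTILE does."""
    size, extra = divmod(len(items), n)
    buckets, start = {}, 0
    for b in range(1, n + 1):
        end = start + size + (1 if b <= extra else 0)
        for item in items[start:end]:
            buckets[item] = b
        start = end
    return buckets

rows = [('A1', '1'), ('A1', '2'), ('A2', '2'), ('A3', '1'), ('A4', '5')]
f1_group = ntile(3, sorted({f1 for f1, _ in rows}))
print([(f1, f2, f1_group[f1]) for f1, f2 in rows])
```

Since the tiling is done over distinct F1 values rather than rows, groups will not be balanced by row count when the F1 values repeat unevenly.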

franc, December 21, 2009 - 10:21 am UTC

so how do i get the whole (with the max value of one column) row from one group (ntile function used)??

Any ideas??


with regards
Fr.
Tom Kyte
December 21, 2009 - 3:13 pm UTC

I'm not sure what you mean here.

Unbalanced Sets ?

bc, December 23, 2009 - 11:11 am UTC

How can I split the below data into different buckets? The split should occur when there is a gap in time between the end time of the current row and the start time of the next row.

My data looks like this, and the intent is to assign rows 1 through 5 to bucket 1, rows 6 through 10 to bucket 2, and so forth. The number of buckets can vary depending on the number of gaps in the employee's time.


ROWNUM EMPLOYEE_ID WORK_DAY    START_TIME              END_TIME                HOURS_WORKED
------ ----------- ----------- ----------------------- ----------------------- ------------
     1    99121331 22-Dec-2009 22-Dec-2009 12:00:00 AM 22-Dec-2009 04:00:00 AM          4.0
     2    99121331 22-Dec-2009 22-Dec-2009 04:00:00 AM 22-Dec-2009 04:30:00 AM          0.5
     3    99121331 22-Dec-2009 22-Dec-2009 04:30:00 AM 22-Dec-2009 08:30:00 AM          4.0
     4    99121331 22-Dec-2009 22-Dec-2009 08:30:00 AM 22-Dec-2009 09:00:00 AM          0.5
     5    99121331 22-Dec-2009 22-Dec-2009 09:00:00 AM 22-Dec-2009 12:00:00 PM          3.0
     6    99121331 22-Dec-2009 22-Dec-2009 01:00:00 PM 22-Dec-2009 05:00:00 PM          4.0
     7    99121331 22-Dec-2009 22-Dec-2009 05:00:00 PM 22-Dec-2009 05:30:00 PM          0.5
     8    99121331 22-Dec-2009 22-Dec-2009 05:30:00 PM 22-Dec-2009 09:30:00 PM          4.0
     9    99121331 22-Dec-2009 22-Dec-2009 09:30:00 PM 22-Dec-2009 10:00:00 PM          0.5
    10    99121331 22-Dec-2009 22-Dec-2009 10:00:00 PM 23-Dec-2009 12:00:00 AM          2.0

script to create test data,

create table employee_time
(
employee_id number(10),
work_day date,
start_time date,
end_time date,
hours_worked number,
description varchar2(50)
);

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009', 'dd-mm-yyyy'), to_date('22-12-2009 04:00:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 04:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 04:30:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 04:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 08:30:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 08:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 09:00:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 09:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 12:00:00', 'dd-mm-yyyy hh24:mi:ss'), 3, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 13:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 17:00:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 17:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 17:30:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 17:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 21:30:00', 'dd-mm-yyyy hh24:mi:ss'), 4, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 21:30:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('22-12-2009 22:00:00', 'dd-mm-yyyy hh24:mi:ss'), 0.5, 'break', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

insert into employee_time (start_time, end_time, hours_worked, description, employee_id, work_day)
values (to_date('22-12-2009 22:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('23-12-2009', 'dd-mm-yyyy'), 2, 'work', 99121331, to_date('22-12-2009', 'dd-mm-yyyy'));

I used lag and lead to get the previous and next rows.

select a.*,
       case
          when ( next_start_time - end_time ) * 24 * 60 > 15 then 'Y'
          else 'N'
       end split
  from ( select rownum,
                employee_id,
                to_char(work_day, 'dd-Mon-yyyy') work_day,
                start_time,
                end_time,
                hours_worked,
                lag( start_time, 1 ) over( partition by employee_id, work_day
                                           order by start_time ) prev_start_time,
                lag( end_time, 1 ) over( partition by employee_id, work_day
                                         order by end_time ) prev_end_time,
                lead( start_time, 1 ) over( partition by employee_id, work_day
                                            order by start_time ) next_start_time,
                lead( end_time, 1 ) over( partition by employee_id, work_day
                                          order by end_time ) next_end_time
           from employee_time ) a



Thanks Tom, I appreciate your help.




Tom Kyte
December 31, 2009 - 12:10 pm UTC

I don't know why you partition by work_day - not sure why work_day even exists - but if you think you do - so be it.

it would be interesting to have test data that would test your partitions - don't you think? Since boundary conditions are what kill us in real life...


ops$tkyte%ORA10GR2> select employee_id, start_time, end_time, last_value(flag ignore nulls) over (partition by employee_id, work_day order by start_time) grp
  2    from (
  3  select employee_id, start_time, end_time, work_day,
  4         case when nvl(lag(end_time) over (partition by employee_id, work_day order by start_time),start_time-1) <> start_time
  5                  then row_number() over (partition by employee_id, work_day order by start_time)
  6                  end flag
  7    from employee_time
  8         )
  9  /

EMPLOYEE_ID START_TIME           END_TIME                    GRP
----------- -------------------- -------------------- ----------
   99121331 22-dec-2009 00:00:00 22-dec-2009 04:00:00          1
   99121331 22-dec-2009 04:00:00 22-dec-2009 04:30:00          1
   99121331 22-dec-2009 04:30:00 22-dec-2009 08:30:00          1
   99121331 22-dec-2009 08:30:00 22-dec-2009 09:00:00          1
   99121331 22-dec-2009 09:00:00 22-dec-2009 12:00:00          1
   99121331 22-dec-2009 13:00:00 22-dec-2009 17:00:00          6
   99121331 22-dec-2009 17:00:00 22-dec-2009 17:30:00          6
   99121331 22-dec-2009 17:30:00 22-dec-2009 21:30:00          6
   99121331 22-dec-2009 21:30:00 22-dec-2009 22:00:00          6
   99121331 22-dec-2009 22:00:00 23-dec-2009 00:00:00          6

10 rows selected.
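The flag/last_value technique above amounts to: start a new group whenever the previous row's end time does not meet the current row's start time, then carry the group id forward. A Python sketch of that logic (minutes-from-midnight encoding and the function name are just for the demo; the sample intervals mirror the 22-Dec data, with the gap at noon-to-1pm):

```python
def bucket_by_gaps(intervals):
    """intervals: list of (start, end), sorted by start. Returns one bucket id
    per interval; a new bucket opens at the first row and after every gap."""
    buckets, current = [], 0
    prev_end = None
    for start, end in intervals:
        if prev_end is None or start != prev_end:
            current += 1  # a gap (or the first row) opens a new bucket
        buckets.append(current)
        prev_end = end
    return buckets

# times as minutes from midnight, mirroring the 22-Dec data: gap at 12:00 -> 13:00
ivs = [(0, 240), (240, 270), (270, 510), (510, 540), (540, 720),
       (780, 1020), (1020, 1050), (1050, 1290), (1290, 1320), (1320, 1440)]
print(bucket_by_gaps(ivs))
```

Unlike the SQL above (where the group ids are row numbers, 1 and 6), this sketch numbers buckets consecutively; either way the rows partition identically.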

Sunil, December 31, 2009 - 7:26 am UTC

Try following query

SELECT   employee_id,
         work_day,
         start_time,
         end_time,
         hours_worked,
         description,
         DENSE_RANK () OVER (ORDER BY rn) bucket
  FROM   (SELECT   tt.*, MAX (rnum) OVER (ORDER BY start_time) rn
            FROM   (SELECT   dd.*,
                             CASE
                                WHEN ABS(start_time
                                         - lag (end_time) OVER (ORDER BY start_time)) > 0
                                  THEN ROW_NUMBER () OVER (ORDER BY start_time)
                                WHEN ROW_NUMBER () OVER (ORDER BY start_time) = 1 THEN 1
                                ELSE
                                   NULL
                             END rnum
                      FROM   employee_time dd) tt) order by start_time

Awesome ...

BC, January 05, 2010 - 9:18 am UTC

Tom,

Once again, thank you so much.

As for work date, it already exists in the table ( been there for 6+ years ) and is used all over the place. Seems like this system was designed by an RDBMS antagonist ...

BC
Tom Kyte
January 05, 2010 - 10:46 am UTC

... RDBMS
antagonist ...

I liked that, I've heard of "database agnostic", but database antagonistic is even better :)

how to convert time into buckets (minute into 5 minutes)

Peter Lopera, September 10, 2012 - 6:29 am UTC

Hi Tom

I am trying to convert Oracle sysmetric data from one-minute samples into 5-minute samples.

Each minute I sample v$sysmetric and save data into a table, so when I query the table I have data such as

SAMPLE_TIMESTAMP     METRIC_NAME                         VALUE
-------------------- ------------------------------ ----------
sep-04 1210          Database Time Per Sec          629.758744
sep-04 1211          Database Time Per Sec          405.815103
sep-04 1212          Database Time Per Sec          418.245743
sep-04 1213          Database Time Per Sec          393.123067
sep-04 1214          Database Time Per Sec          284.260073
sep-04 1215          Database Time Per Sec          257.588038
sep-04 1216          Database Time Per Sec          232.273854
sep-04 1217          Database Time Per Sec          297.157191
sep-04 1218          Database Time Per Sec          267.150952
sep-04 1219          Database Time Per Sec          349.309651


I would like to know how to query this data in intervals of 5 minutes, such as the following output:

SAMPLE_TIMESTAMP     METRIC_NAME                         VALUE
-------------------- ------------------------------ ----------
sep-04 1210 - 1214   Database Time Per Sec          426.240546
sep-04 1215 - 1219   Database Time Per Sec          280.695937

What it does is take the average value from minute 1210 to 1214 and from 1215 to 1219.

Is this possible with SQL?

Thank you



Tom Kyte
September 14, 2012 - 2:51 pm UTC

sure and I could show you how to do it if I had been given

create table
insert into table

so I could actually write the sql...


all you need to do is get an expression that works on whatever your timestamp data is (strange format there, don't know if that is what you really have in a string or if that is a timestamp or a date) to floor it down to the five minute interval.

ops$tkyte%ORA11GR2> create table t
  2  as
  3  select to_date( 'sep-04 ' || to_char(1209+rownum,'fm0000'), 'mon-yy hh24mi') x,
  4         'blah blah blah' metric,
  5             dbms_random.value value
  6    from dual
  7  connect by level <= 15;

Table created.

ops$tkyte%ORA11GR2> select to_char(x,'mon-yy hh24mi'),
  2         to_char(
  3         to_date( to_char(x,'mon-yy hh24' ) || to_char( to_number(to_char(x,'mi')) - mod(to_number(to_char(x,'mi')), 5 ), 'fm00' ), 'mon-yy hh24mi')
  4             , 'mon-yy hh24mi' ),
  5             metric,
  6             value
  7  from t;

TO_CHAR(X,' TO_CHAR(TO_ METRIC              VALUE
----------- ----------- -------------- ----------
sep-04 1210 sep-04 1210 blah blah blah .206367439
sep-04 1211 sep-04 1210 blah blah blah .821680086
sep-04 1212 sep-04 1210 blah blah blah .043872448
sep-04 1213 sep-04 1210 blah blah blah .621996029
sep-04 1214 sep-04 1210 blah blah blah .256090763
sep-04 1215 sep-04 1215 blah blah blah  .64936993
sep-04 1216 sep-04 1215 blah blah blah .925411805
sep-04 1217 sep-04 1215 blah blah blah  .99099649
sep-04 1218 sep-04 1215 blah blah blah .688817703
sep-04 1219 sep-04 1215 blah blah blah .493901283
sep-04 1220 sep-04 1220 blah blah blah .834203449
sep-04 1221 sep-04 1220 blah blah blah .136844052
sep-04 1222 sep-04 1220 blah blah blah .904672883
sep-04 1223 sep-04 1220 blah blah blah .596391313
sep-04 1224 sep-04 1220 blah blah blah .459076258

15 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select to_char(newtime,'mon-yy hh24mi' ), metric, avg(value)
  2    from (
  3  select to_date( to_char(x,'mon-yy hh24' ) || to_char( to_number(to_char(x,'mi')) - mod(to_number(to_char(x,'mi')), 5 ), 'fm00' ), 'mon-yy hh24mi') newtime,
  4             metric,
  5             value
  6   from t
  7         )
  8   group by newtime, metric
  9   order by newtime;

TO_CHAR(NEW METRIC         AVG(VALUE)
----------- -------------- ----------
sep-04 1210 blah blah blah .390001353
sep-04 1215 blah blah blah .749699442
sep-04 1220 blah blah blah .586237591
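The floor-to-five-minutes expression can be sketched in plain Python as well: truncate each timestamp down to its 5-minute boundary (the `mi - mod(mi, 5)` arithmetic from the SQL), then average per bucket. Sample data and helper names below are invented for the demo.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def floor_5min(ts):
    """Truncate a timestamp down to its 5-minute boundary: minute - minute % 5."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second, microseconds=ts.microsecond)

# ten one-minute samples starting at 12:10, like the sep-04 data
samples = [(datetime(2012, 9, 4, 12, 10 + i), 100.0 + i) for i in range(10)]
by_bucket = defaultdict(list)
for ts, v in samples:
    by_bucket[floor_5min(ts)].append(v)
averages = {ts: sum(vs) / len(vs) for ts, vs in sorted(by_bucket.items())}
print(averages)
```

This is the same two-step shape as the SQL: compute the bucket expression in an inner query, then group by it and take avg(value).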


how to convert time into buckets (minute into 5 minutes)

A reader, September 21, 2012 - 2:07 am UTC

Hi

Sorry, I didn't attach the data set; including it below.

The solution is good; however, it works if the data set starts in minute 1. If it starts in minute 3, the first bucket would include only 3 minutes of data.

I thought of using analytics and partitioning the data into 5-minute intervals, but could not find a way!

create table sysmetric
(
sample_date date,
metric_name varchar2(40),
value number(10, 2)
);
insert into sysmetric values (to_date('20120918 105300', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 629.75);
insert into sysmetric values (to_date('20120918 105400', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 405.81);
insert into sysmetric values (to_date('20120918 105500', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 418.24);
insert into sysmetric values (to_date('20120918 105600', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 393.12);
insert into sysmetric values (to_date('20120918 105700', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 284.26);
insert into sysmetric values (to_date('20120918 105800', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 257.58);
insert into sysmetric values (to_date('20120918 105900', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 232.27);
insert into sysmetric values (to_date('20120918 110000', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 297.15);
insert into sysmetric values (to_date('20120918 110100', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 267.15);
insert into sysmetric values (to_date('20120918 110200', 'yyyymmdd hh24miss'), 'Database Time Per Sec', 349.30);

Output:

SAMPLE_TIMESTAMP     METRIC_NAME                         VALUE
-------------------- ------------------------------ ----------
sep-18 1053 - 1057   Database Time Per Sec          426.240546
sep-18 1058 - 1102   Database Time Per Sec          280.695937

Tom Kyte
September 26, 2012 - 12:21 pm UTC

... The solution is good however it works if the data set start in minute 1, if it
starts in minute 3 the first bucket would include only 3 minutes data. ..


just stop - stop right now.


write down your requirements, exactly, precisely, specifically, in great detail, at such a level of detail your mother would be able to write pseudo code for it.

I answered exactly the question - exactly the question - you asked. 100%.

so, apparently you have a totally different question. please specify it clearly, precisely. If you want 5 minute intervals based on a start time of the minimum value - say so, clearly. I think that is what you are saying

(and if it is, you should be able to figure it out - what sort of math would you do if you had the minute of the first sample? once you have that - getting the first sample is pretty easy with a scalar subquery...)
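That hint can be sketched like this: subtract the minimum sample time, integer-divide the offset by 5 minutes, and add the floored offset back, so buckets start at the first sample rather than at wall-clock boundaries (a Python illustration, not from the thread; the function name and minute-of-day encoding are invented):

```python
def bucket_from_start(minutes):
    """minutes: minute-of-day values. Bucket each one into the 5-minute
    interval measured from the minimum sample, not from the clock."""
    start = min(minutes)
    return [start + 5 * ((m - start) // 5) for m in minutes]

# data starting at 10:53 (minute 653 of the day), as in the follow-up
mins = list(range(653, 663))  # 10:53 .. 11:02
print(bucket_from_start(mins))
```

In SQL terms, `min(minutes)` would come from the scalar subquery Tom mentions, and the `(m - start) // 5` step is a trunc of the offset divided by the interval width.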

Ntile returns overlapping values

karma, March 29, 2013 - 4:59 pm UTC

Hi Tom:

I am trying to create buckets with non-overlapping values from a large table using ntile, but I get overlapping values. I thought ntile would give non-overlapping values. Is there a way to avoid this? I would like the next bucket's min value to start after the max value of the previous bucket.

SELECT nt
,MIN (experiment_ver_id)
,MAX (experiment_ver_id)
,COUNT (*)
FROM (SELECT experiment_ver_id, NTILE (4) OVER (ORDER BY experiment_ver_id) nt
FROM experiment_core )
GROUP BY nt
ORDER BY nt;

Results:
 NT MIN(EXPERIMENT_VER_ID) MAX(EXPERIMENT_VER_ID)  COUNT(*)
--- ---------------------- ---------------------- ---------
  1                   1002                  53826 371902005
  2                  53826                  64228 371902004
  3                  64228                 635067 371902004
  4                 635067                8369496 371902004
Tom Kyte
March 29, 2013 - 5:59 pm UTC

http://docs.oracle.com/cd/E14072_01/server.112/e10592/functions113.htm


The number of rows in the buckets can differ by at most 1.


ntile breaks the data set into equi-sized groups.


What if you had a table with 100 rows with x=1, 200 rows with x=2, 300 rows with x=3

and then said "ntile that 4" and it didn't overlap?

In fact, what would you like to see come from such a grouping?


I guess you could just take the max() value - so get nt, max(experiment_ver_id) max_evid, and then use

select nt, nvl(lag(max_evid) over (order by nt)+1, 0) min_evid, max_evid
from (that_result)



that would give you non-overlapping ranges you can be between (the endpoints might not exist in your set, but they can be used for lower and upper bounds). Your buckets could be very uneven in counts after this (your count(*) won't count anymore ;) )

I obviously cannot give you a syntactically correct query since I don't have your data ;)
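The lag(max)+1 idea can be sketched in Python: each tile's lower bound is the previous tile's max plus one (zero for the first tile), which yields non-overlapping bounds even when the same value spans two tiles. The function name is invented; the input is the per-tile max values from a result like the one above.

```python
def nonoverlapping_ranges(tile_maxes):
    """tile_maxes: the max value per tile, in tile order. Returns a list of
    (lower, upper) bounds where lower = previous tile's max + 1 (0 first)."""
    ranges, prev = [], None
    for mx in tile_maxes:
        lower = 0 if prev is None else prev + 1
        ranges.append((lower, mx))
        prev = mx
    return ranges

print(nonoverlapping_ranges([53826, 64228, 635067, 8369496]))
```

As Tom notes, the lower endpoints need not exist in the data; they are only bounds to be BETWEEN, and the per-range row counts can become uneven.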


excellent..!!

Shah, June 14, 2015 - 10:10 am UTC

ntile was exactly what I was looking for .. rocking and rolling !!

Thanks Tom
