Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Giri.

Asked: November 21, 2011 - 8:39 pm UTC

Last updated: May 13, 2013 - 1:28 pm UTC

Version: 9.2

Viewed 10K+ times! This question is

You Asked

Hi Tom
I enjoy reading your site. I learned lot of tricks from your site. I need to write a SQL to group the data in ranges.
Below is the data

Site_no tot_rec
1001 10000
1002 20000
1003 30500
1004 50000
1005 25000
1006 36000
1007 28000
1008 21000

For every running total greater than 65000 records I need to get starting site, ending site number and sum of records for those sites as below

Start_site Ending_site running_total
1001 1003 60500
1004 1004 50000
1005 1006 61000
1007 1008 49000



CREATE table T (STUDY_SITE number, CNT NUMBER)
/

INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1001,3407)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1002,4323)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1004,1623)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1008,1991)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1011,885)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1012,11597)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1014,1989)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1015,5282)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1017,2841)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1018,5183)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1020,6176)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1022,2784)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1023,25865)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1024,3734)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1026,137)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1028,6005)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1029,76)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1031,4599)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1032,1989)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1034,3427)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1036,879)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1038,6485)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1039,3)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1040,1105)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1041,6460)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1042,968)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1044,471)
/
INSERT INTO t
("STUDY_SITE","CNT")
VALUES
(1045,3360)
/

I want for every running total greater than or equal of 65000 records need to know starting study_site and ending study_site number

if you run below SQL

SELECT study_site, cnt,
SUM (cnt) OVER (ORDER BY study_site)
run_tot
FROM t
ORDER BY study_site;
at study_site 1023 running total is 73946 which is greater than 65000

I want result as

start_site end_site Total_records
1001 1022 48081

next running total should start from study_site 1023 and continue until it reachs running total greater than 65000 and give the starting site as 1023 and whatever the ending site and corresponding total count as above.
I apprecite your help for Oracle world.

Regards,
Giri

and we said...

I'm going to cheat a little here. There is likely a way to do this using the MODEL clause (search this site for the words:

bin fitting

to see examples), but I'm going to take a different tact. A little bit of plsql:

ops$tkyte%ORA11GR2> create or replace type myScalarType as object
  2  ( study_site number, cnt number, the_group number )
  3  /

Type created.

ops$tkyte%ORA11GR2> create or replace type myTableType as table of myScalarType
  2  /

Type created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create or replace function foo( p_cursor in sys_refcursor, p_threshold in number ) return myTableType
  2  pipelined
  3  as
  4      type array is table of t%rowtype index by binary_integer;
  5  
  6      l_data          array;
  7      l_running_total number := 0;
  8      l_group         number := 1;
  9      n               number := 100;
 10  begin
 11      loop
 12          fetch p_cursor bulk collect into l_data limit N;
 13          for i in 1 .. l_data.count
 14          loop
 15              l_running_total := l_running_total + l_data(i).cnt;
 16              if ( l_running_total > 65000 )
 17              then
 18                  l_group := l_group + 1;
 19                  l_running_total := l_data(i).cnt;
 20              end if;
 21              pipe row( myScalarType( l_data(i).study_site, l_data(i).cnt, l_group ) );
 22          end loop;
 23          exit when p_cursor%notfound;
 24      end loop;
 25      close p_cursor;
 26      return;
 27  end;
 28  /

Function created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select *
  2    from TABLE( foo( cursor(select study_site, cnt from t order by study_site), 65000 ) );

STUDY_SITE        CNT  THE_GROUP
---------- ---------- ----------
      1001       3407          1
      1002       4323          1
      1004       1623          1
      1008       1991          1
      1011        885          1
      1012      11597          1
      1014       1989          1
      1015       5282          1
      1017       2841          1
      1018       5183          1
      1020       6176          1
      1022       2784          1
      1023      25865          2
      1024       3734          2
      1026        137          2
      1028       6005          2
      1029         76          2
      1031       4599          2
      1032       1989          2
      1034       3427          2
      1036        879          2
      1038       6485          2
      1039          3          2
      1040       1105          2
      1041       6460          2
      1042        968          2
      1044        471          2
      1045       3360          3

28 rows selected.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select min(study_site), max(study_site), sum(cnt)
  2    from (
  3  select *
  4    from TABLE( foo( cursor(select study_site, cnt from t order by study_site), 65000 ) )
  5         )
  6   group by the_group
  7   order by the_group
  8  /

MIN(STUDY_SITE) MAX(STUDY_SITE)   SUM(CNT)
--------------- --------------- ----------
           1001            1022      48081
           1023            1044      62203
           1045            1045       3360


Rating

  (27 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

How about...

Venkat, November 23, 2011 - 11:11 am UTC

with threshold as
(select 60500 lim from dual) -- or 65000
----
select min(study_site) as site_from, max(study_site) as site_to, sum(cnt) as tot
from (select study_site, cnt, cumtot,
trunc(cumtot/lim) + sign(mod(cumtot,lim)) - 1 as grp_num
from (select study_site, cnt, sum(cnt) over
(order by study_site rows unbounded preceding) as cumtot
from t
),
threshold
)
group by grp_num
order by grp_num
----

Tom Kyte
November 23, 2011 - 11:16 am UTC

ops$tkyte%ORA11GR2> with threshold as
  2  (select 60500 lim from dual) -- or 65000
  3  ----
  4  select min(study_site) as site_from, max(study_site) as site_to, sum(cnt) as
  5  tot
  6    from (select study_site, cnt, cumtot,
  7                 trunc(cumtot/lim) + sign(mod(cumtot,lim)) - 1 as grp_num
  8            from (select study_site, cnt, sum(cnt) over
  9                         (order by study_site rows unbounded preceding) as cumtot
 10                    from t
 11                  ),
 12                  threshold
 13          )
 14   group by grp_num
 15   order by grp_num
 16  /

 SITE_FROM    SITE_TO        TOT
---------- ---------- ----------
      1001       1022      48081
      1023       1045      65563

ops$tkyte%ORA11GR2> 



where is the last row?

it got rolled into the range 1023-1044 - because you didn't reset the running total...

because the bins are uneven, you cannot use simple integer division to bucket them.

A little bug

Alvaro Varela De Marco, November 23, 2011 - 12:44 pm UTC

Tom,
In the function foo, you define a parameter named "p_threshold", but in the body, you never use it. Instead of this parameter, you use 65000 (hardcoded).
In the sample the parameter has the value 65000, so it's correct, but for other values, it will be an error.

With the wrong version:
select min(study_site), max(study_site), sum(cnt)
  from (
select *
  from TABLE( foo( cursor(select study_site, cnt from t order by study_site), 30000 ) )
       )
 group by the_group
 order by the_group
==>
MIN(STUDY_SITE) MAX(STUDY_SITE) SUM(CNT)                               
--------------- --------------- ------------------
           1001            1022              48081 
           1023            1044              62203 
           1045            1045               3360 



Changing 65000 to p_treshold in the code:

create or replace function foo( p_cursor in sys_refcursor, 
p_threshold in number ) return myTableType
 pipelined
 as
     type array is table of t%rowtype index by binary_integer;
 
     l_data          array;
     l_running_total number := 0;
     l_group         number := 1;
     n               number := 100;
 begin
     loop
         fetch p_cursor bulk collect into l_data limit N;
         for i in 1 .. l_data.count
         loop
             l_running_total := l_running_total + l_data(i).cnt;
             if ( l_running_total > p_threshold )
             then
                 l_group := l_group + 1;
                 l_running_total := l_data(i).cnt;
             end if;
             pipe row( myScalarType( l_data(i).study_site, l_data(i).cnt, l_group ) );

         end loop;
         exit when p_cursor%notfound;
     end loop;
     close p_cursor;
     return;
 end;
 /
 


And now, the result is:
select min(study_site), max(study_site), sum(cnt)
  from (
select *
  from TABLE( foo( cursor(select study_site, cnt from t order by study_site), 30000 ) )
       )
 group by the_group
 order by the_group
==>
MIN(STUDY_SITE)      MAX(STUDY_SITE) SUM(CNT)                               
-------------------- --------------- ----------------- 
                1001            1014             25815 
                1015            1022             22266 
                1023            1026             29736 
                1028            1040             24568 
                1041            1045             11259 

Tom Kyte
November 28, 2011 - 8:40 am UTC

you are correct, sorry about that - I forgot I was trying to make it "more generic"...

A reader, November 23, 2011 - 4:52 pm UTC

if you are on 11Gr2.... using recursive


{code}
with data as
(
select 1001 as site_no, 10000 tot_rec from dual union all
select 1002, 20000 from dual union all
select 1003, 30500 from dual union all
select 1004, 50000 from dual union all
select 1005, 25000 from dual union all
select 1006, 36000 from dual union all
select 1007, 28000 from dual union all
select 1008, 21000 from dual
)

,
data1 as
(
select row_number()over(order by site_no) as rno, Site_no, tot_rec from data
)
,
rec (rno, Site_no, tot_rec, Total, flg) as
(
select rno, Site_no, tot_rec, tot_rec, 0
from data1 where rno=1

union all

select d.rno, d.Site_no, d.tot_rec, case when r.total + d.tot_rec > 65000 Then d.tot_rec ELse r.total + d.tot_rec END, case when r.total + d.tot_rec > 65000 Then r.flg+1 ELse r.flg END
from data1 d, rec r
where d.rno=r.rno+1


)
select
min(site_no), max(site_no), sum(tot_rec)
from rec
group by flg

{code}

Tom Kyte
November 28, 2011 - 8:49 am UTC

neat approach.

however... with just 28 rows - a tkprof looked like this:

with data1 as
(
  select row_number()over(order by study_site) as rno, study_site Site_no, cnt tot_rec from t
)
,
rec (rno, Site_no, tot_rec, Total, flg) as
(
  select rno, Site_no, tot_rec, tot_rec, 0
  from data1 where rno=1
  union all
  select d.rno, d.Site_no, d.tot_rec,
         case when r.total + d.tot_rec > 65000 Then d.tot_rec ELse r.total + d.tot_rec END,
         case when r.total + d.tot_rec > 65000 Then r.flg+1 ELse r.flg END
  from data1 d, rec r
  where d.rno=r.rno+1
)
select
min(site_no), max(site_no), sum(tot_rec)
from rec
group by flg
order by flg

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         3          3          3  SORT GROUP BY (cr=203 pr=0 pw=0 time=9178 us cost=
        28         28         28   VIEW  (cr=203 pr=0 pw=0 time=25632 us cost=13 siz
        28         28         28    UNION ALL (RECURSIVE WITH) BREADTH FIRST (cr=203 
         1          1          1     VIEW  (cr=7 pr=0 pw=0 time=169 us cost=4 size=1
         2          2          2      WINDOW SORT PUSHED RANK (cr=7 pr=0 pw=0 time=1
        28         28         28       TABLE ACCESS FULL T (cr=7 pr=0 pw=0 time=68 u
        27         27         27     HASH JOIN  (cr=196 pr=0 pw=0 time=8428 us cost=
       784        784        784      VIEW  (cr=196 pr=0 pw=0 time=3109 us cost=4 si
       784        784        784       WINDOW SORT (cr=196 pr=0 pw=0 time=2229 us co
       784        784        784        TABLE ACCESS FULL T (cr=196 pr=0 pw=0 time=8
        28         28         28      RECURSIVE WITH PUMP  (cr=0 pr=0 pw=0 time=2224 



that would be an expensive approach in general.

But - very nice.

You answered my question

Giri, November 23, 2011 - 6:03 pm UTC

Hi Tom
Thanks for your time in answering my question. You have answered my question and I was able incorporate in my code.
Have a happy Thanksgiving.

Regards
Giri

MODEL clause does the trick

Jichao Li, November 25, 2011 - 3:00 am UTC

MODEL clause will do the trick. Here the threshold is turned down to 20000 to show more groups.

 SELECT s, MAX(e), MAX(sm) FROM (
    SELECT s, e, cnt, sm FROM t 
     MODEL DIMENSION BY(row_number() over(order by study_site) rn)
           MEASURES(study_site s, study_site e, cnt, cnt sm) 
           RULES(sm[rn > 1] = 
                   CASE WHEN (sm[cv() - 1] + cnt[cv()]) > 20000 OR cnt[cv()] > 20000
                        THEN cnt[cv()]
                        ELSE sm[cv() - 1] + cnt[cv()]
                    END, 
                 s[rn > 1] =
                   CASE WHEN(sm[cv() - 1] + cnt[cv()]) > 20000 OR cnt[cv()] > 20000
                        THEN s[cv()]
                        ELSE s[cv() - 1]
                    END))
  GROUP BY s ORDER BY s; 

S    MAX(E) MAX(SM)
---- ------ -------
1001   1011   12229 
1012   1015   18868 
1017   1022   16984 
1023   1023   25865 
1024   1034   19967 
1036   1045   19731 


Tom Kyte
November 28, 2011 - 9:47 am UTC

I still have to make room on my calendar to master the MODEL clause some day - I'm jealous when I see things like this :)


 SELECT s, MAX(e), MAX(sm) FROM (
    SELECT s, e, cnt, sm FROM t
     MODEL DIMENSION BY(row_number() over(order by study_site) rn)
           MEASURES(study_site s, study_site e, cnt, cnt sm)
           RULES(sm[rn > 1] =
                   CASE WHEN (sm[cv() - 1] + cnt[cv()]) > 20000 OR cnt[cv()] >
20000
                        THEN cnt[cv()]
                        ELSE sm[cv() - 1] + cnt[cv()]
                    END,
                 s[rn > 1] =
                   CASE WHEN(sm[cv() - 1] + cnt[cv()]) > 20000 OR cnt[cv()] >
20000
                        THEN s[cv()]
                        ELSE s[cv() - 1]
                    END))
  GROUP BY s ORDER BY s
  
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          1          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          7          0           6
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          8          0           6

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 374  
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         6          6          6  SORT GROUP BY (cr=7 pr=0 pw=0 time=1739 us cost=5 
        28         28         28   VIEW  (cr=7 pr=0 pw=0 time=1680 us cost=4 size=10
        28         28         28    SQL MODEL ORDERED (cr=7 pr=0 pw=0 time=1644 us c
        28         28         28     WINDOW SORT (cr=7 pr=0 pw=0 time=249 us cost=4 
        28         28         28      TABLE ACCESS FULL T (cr=7 pr=0 pw=0 time=124 u



very efficient as well.

Recursive approach

Jichao Li, November 25, 2011 - 4:01 am UTC

The 11gR2 recursive approach, similar to a previous post.
WITH nt AS (
  SELECT study_site, 
         LAG(study_site) over (ORDER BY study_site) prev_site, 
         cnt
    FROM t),
running_sum(s, e, sm) AS (
  SELECT study_site,
         study_site,
         cnt
    FROM nt
   WHERE prev_site IS NULL
   UNION ALL
  SELECT CASE WHEN rs.sm + nt.cnt > 20000 THEN nt.study_site
              ELSE rs.s
         END,
         nt.study_site,
         CASE WHEN rs.sm + nt.cnt > 20000 THEN nt.cnt
              ELSE rs.sm + nt.cnt
         END
    FROM running_sum rs,
         nt
   WHERE nt.prev_site = rs.e)  
SELECT s, MAX(e), MAX(sm)
  FROM running_sum
 GROUP BY s ORDER BY s;

Tom Kyte
November 28, 2011 - 9:49 am UTC

but the tkprof will be almost identical to the prior recursive one - a recursive query built on an analytic used in THIS fashion (referencing the analytic in the join) is not going to scale very well.

More good options

Charlie 木匠, November 25, 2011 - 4:30 pm UTC

JiChao,

Recursive Subquery approaches is inspiring and illuminating.

Thanks,
Charlie 木匠

With MODEL clause, don't forget ORDER BY

Stew Ashton, November 27, 2011 - 4:04 pm UTC

Without the ORDER BY in the RULE, results are non-deterministic. http://docs.oracle.com/cd/E11882_01/server.112/e16579/sqlmodel.htm#i1012511

ORDER BY in RULE defaults to order of columns in DIMENSION BY

Jichao Li, November 27, 2011 - 8:45 pm UTC

Stew,

According to the Oracle's doc below:

---------quote----------
If the ORDER BY clause is not specified, the order defaults to the order of the columns as specified in the DIMENSION BY clause.
--------unquote---------

http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm#i2168616
So ORDER BY in RULE only needs to be specified when the sequence of calculation is different from what it is in DIMENSION BY clause.

@Jichao Li : thank you!

Stew Ashton, November 28, 2011 - 8:58 am UTC


I missed that when reading the documentation. Your correction is most appreciated.

blows my mind

Roadling, November 28, 2011 - 10:39 pm UTC

The model solution from Jichao Li just blows my mind. I too am quite jealous.

slight change

Venkat, November 30, 2011 - 10:59 am UTC

Can we simplify the second part of Jichao Li's solution using model clause to the following?

s[rn > 1] = case when sm[cv()] = cnt[cv()] then s[cv()]
else s[cv()-1]
end

(since the rule for sm[rn > 1] has already been evaluated)...

Also "dimension by (rownum rn)" seems to work, instead of using row_number...

Here's the complete sql-

SELECT s, MAX(e), MAX(sm) FROM (
select s, e, cnt, sm from t
model dimension by (rownum rn)
measures (study_site s, study_site e, cnt, cnt sm)
RULES (sm[rn > 1] =
CASE WHEN (sm[cv() - 1] + cnt[cv()]) > 20000 OR cnt[cv()] > 20000
THEN cnt[cv()]
ELSE sm[cv() - 1] + cnt[cv()]
end,
s[rn > 1] = case when sm[cv()] = cnt[cv()] then s[cv()]
else s[cv()-1] end)

) group by s order by s;


I see the problem with not using row_number()

Venkat, November 30, 2011 - 11:04 am UTC

I see that row_number() is used to process data in the order of study_site.

Oracle Magazine

Anthony, February 21, 2012 - 5:25 am UTC

I've just read your latest Oracle Magazine article on grouping ranges and although its purpose may have been to showcase alternative techniques, what's wrong with simply doing this:

drop table tmp purge;
create table tmp(
study_site number
, tot_rec number
);

insert into tmp values(1001, 10000);
insert into tmp values(1002, 20000);
insert into tmp values(1003, 30500);
insert into tmp values(1004, 50000);
insert into tmp values(1005, 25000);
insert into tmp values(1006, 36000);
insert into tmp values(1007, 28000);
insert into tmp values(1008, 21000);

commit;

with grp as(
select
study_site
, tot_rec
, floor(sum(tot_rec) over(order by study_site) / 65000) group_id
from
tmp
)
select
min(study_site)
, max(study_site)
, sum(tot_rec)
from
grp
group by
group_id;
Tom Kyte
February 21, 2012 - 9:34 pm UTC

normally - nothing, but that simple approach would not work for the question that was asked ;)

We need to "carry forward" - that was what made it interesting.

http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html

The question also involves a bit of a bin-fitting problem, where the goal is to group rows until some threshold is met and then start a new group. This is quite challenging to do in “pure SQL,” so I cheated a little with my approach, which uses a small bit of PL/SQL.

a simple division would not work for the problem as stated.

Oracle Magazine

Anthiny, February 21, 2012 - 7:39 am UTC

Slight amendment to my previous post to allow for the case when the running total is exactly divisible by 65000.

drop table tmp purge;
create table tmp(
study_site number
, tot_rec number
);

insert into tmp values(1001, 10000);
insert into tmp values(1002, 20000);
insert into tmp values(1003, 35000); -- amended
insert into tmp values(1004, 50000);
insert into tmp values(1005, 25000);
insert into tmp values(1006, 36000);
insert into tmp values(1007, 28000);
insert into tmp values(1008, 21000);

commit;

with grp as(
select
study_site
, tot_rec
, floor(
case
when (sum(tot_rec) over(order by study_site) / 65000) = floor(sum(tot_rec) over(order by study_site) / 65000) then
(sum(tot_rec) over(order by study_site) / 65000) - 0.1
else
(sum(tot_rec) over(order by study_site) / 65000)
end
) group_id
from
tmp
)
select
min(study_site)
, max(study_site)
, sum(tot_rec)
from
grp
group by
group_id;
Tom Kyte
February 21, 2012 - 9:51 pm UTC

it is a bin fitting problem - a simple divide doesn't do it in general ;)

take this set of data:

create table t ( study_site number, tot_rec number );

insert into t values ( 1001, 10000 );
insert into t values ( 1002, 20000 );
insert into t values ( 1003, 30500 );
insert into t values ( 1004, 50000 );
insert into t values ( 1005, 25000 );
insert into t values ( 1006, 36000 );
insert into t values ( 1007, 28000 );
insert into t values ( 1008, 21000 );



and set the threshold to 50,000 instead of 65,000 and you'll see you get the wrong answer.


ops$tkyte%ORA11GR2> with data1 as
  2  (
  3    select row_number()over(order by study_site) as rno,
  4           study_site Site_no, tot_rec from t
  5  )
  6  ,
  7  rec (rno, Site_no, tot_rec, Total, flg) as
  8  (
  9    select rno, Site_no, tot_rec, tot_rec, 0
 10    from data1 where rno=1
 11    union all
 12    select d.rno, d.Site_no, d.tot_rec,
 13           case when r.total + d.tot_rec > 50000
 14                Then d.tot_rec
 15                Else r.total + d.tot_rec END,
 16           case when r.total + d.tot_rec > 50000
 17                Then r.flg+1
 18                Else r.flg END
 19    from data1 d, rec r
 20    where d.rno=r.rno+1
 21  )
 22  select min(site_no), max(site_no), sum(tot_rec)
 23    from rec
 24   group by flg
 25   order by flg
 26  /

MIN(SITE_NO) MAX(SITE_NO) SUM(TOT_REC)
------------ ------------ ------------
        1001         1002        30000
        1003         1003        30500
        1004         1004        50000
        1005         1005        25000
        1006         1006        36000
        1007         1008        49000

6 rows selected.

ops$tkyte%ORA11GR2> SELECT s, MAX(e), MAX(sm) FROM (
  2      SELECT s, e, cnt, sm FROM t
  3       MODEL DIMENSION BY(row_number()
  4                              over(order by study_site) rn)
  5             MEASURES(study_site s, study_site e, tot_rec cnt, tot_rec sm)
  6             RULES(sm[rn > 1] =
  7                     CASE WHEN (sm[cv() - 1] + cnt[cv()]) > 50000
  8                                OR cnt[cv()] > 50000
  9                          THEN cnt[cv()]
 10                          ELSE sm[cv() - 1] + cnt[cv()]
 11                      END,
 12                   s[rn > 1] =
 13                     CASE WHEN(sm[cv() - 1] + cnt[cv()]) > 50000
 14                               OR cnt[cv()] > 50000
 15                          THEN s[cv()]
 16                          ELSE s[cv() - 1]
 17                      END))
 18    GROUP BY s ORDER BY s;

         S     MAX(E)    MAX(SM)
---------- ---------- ----------
      1001       1002      30000
      1003       1003      30500
      1004       1004      50000
      1005       1005      25000
      1006       1006      36000
      1007       1008      49000

6 rows selected.

ops$tkyte%ORA11GR2> with grp as(
  2      select
  3          study_site
  4      ,   tot_rec
  5      ,   floor(
  6              case
  7                  when (sum(tot_rec) over(order by study_site) / 50000) = floor(sum(tot_rec)
  8  over(order by study_site) / 50000) then
  9                      (sum(tot_rec) over(order by study_site) / 50000) - 0.1
 10              else
 11                  (sum(tot_rec) over(order by study_site) / 50000)
 12              end
 13          ) group_id
 14      from
 15          t
 16  )
 17  select
 18      min(study_site)
 19  ,   max(study_site)
 20  ,   sum(tot_rec)
 21  from
 22      grp
 23  group by
 24      group_id order by 1;

MIN(STUDY_SITE) MAX(STUDY_SITE) SUM(TOT_REC)
--------------- --------------- ------------
           1001            1002        30000
           1003            1003        30500
           1004            1005        75000
           1006            1007        64000
           1008            1008        21000

ops$tkyte%ORA11GR2> 

A reader, February 22, 2012 - 9:34 am UTC

Hi tom,

I see the plan for subquery factoring is not very efficint is there will be improvement going forward in future rel of oracle?

Thanks
Tom Kyte
February 23, 2012 - 7:28 pm UTC

?????

It can be extremely efficient - I have *not a clue* what you mean by your statement. None.

Is the one approach taken on this page by one poster the right way to do this? no - it would not be. Is it a possible approach? Sure.




A reader, February 24, 2012 - 11:32 am UTC

Sorry I did not reference it properly I paste it below


with data1 as
(
  select row_number()over(order by study_site) as rno, study_site Site_no, cnt tot_rec from t
)
,
rec (rno, Site_no, tot_rec, Total, flg) as
(
  select rno, Site_no, tot_rec, tot_rec, 0
  from data1 where rno=1
  union all
  select d.rno, d.Site_no, d.tot_rec,
         case when r.total + d.tot_rec > 65000 Then d.tot_rec ELse r.total + d.tot_rec END,
         case when r.total + d.tot_rec > 65000 Then r.flg+1 ELse r.flg END
  from data1 d, rec r
  where d.rno=r.rno+1
)
select
min(site_no), max(site_no), sum(tot_rec)
from rec
group by flg
order by flg

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         3          3          3  SORT GROUP BY (cr=203 pr=0 pw=0 time=9178 us cost=
        28         28         28   VIEW  (cr=203 pr=0 pw=0 time=25632 us cost=13 siz
        28         28         28    UNION ALL (RECURSIVE WITH) BREADTH FIRST (cr=203 
         1          1          1     VIEW  (cr=7 pr=0 pw=0 time=169 us cost=4 size=1
         2          2          2      WINDOW SORT PUSHED RANK (cr=7 pr=0 pw=0 time=1
        28         28         28       TABLE ACCESS FULL T (cr=7 pr=0 pw=0 time=68 u
        27         27         27     HASH JOIN  (cr=196 pr=0 pw=0 time=8428 us cost=
       784        784        784      VIEW  (cr=196 pr=0 pw=0 time=3109 us cost=4 si
       784        784        784       WINDOW SORT (cr=196 pr=0 pw=0 time=2229 us co
       784        784        784        TABLE ACCESS FULL T (cr=196 pr=0 pw=0 time=8
        28         28         28      RECURSIVE WITH PUMP  (cr=0 pr=0 pw=0 time=2224 



will any possiblity of improvement in the plan or optimization in future rel?

Thanks

Tom Kyte
February 25, 2012 - 5:15 am UTC

probably not - look at the things you asked it to do.

Not until we get the sentient version of Oracle (version 42 I believe) will we be able to perform rewrites of that magnitude.

given the query it was given, that was about the best it could do - I would have done the same if you asked me to perform that work.

SQL and analytic approach

Timo Raitalaakso, March 07, 2012 - 2:13 pm UTC

Not pretty but possible with plain SQL and analytic functions.

create table t ( study_site number, tot_rec number );

insert into t values ( 1001, 10000 );
insert into t values ( 1002, 20000 );
insert into t values ( 1003, 30500 );
insert into t values ( 1004, 50000 );
insert into t values ( 1005, 25000 );
insert into t values ( 1006, 36000 );
insert into t values ( 1007, 28000 );
insert into t values ( 1008, 21000 );

with v as (
select t1.study_site ss1
     , t1.tot_rec tr1
     , t2.study_site ss2
     , t2.tot_rec tr2
     , sum(t2.tot_rec)over(partition by t1.study_site order by t2.study_site) sm
  from t t1, t t2
 where t1.study_site <= t2.study_site
), w as (
select ss1,max(ss2) ssmx, max(sm) mxsm
  from v
 where v.sm <= 50000
 group by ss1
), z as (
select min(ss1) ssmi,ssmx,max(mxsm) smmx
     , max(ssmx)over(order by ssmx rows between unbounded preceding and 1 preceding) pr
     , min(max(ssmx))over(order by ssmx rows between 1 following and unbounded following) fo
from w
group by ssmx
)
select ssmi,ssmx,smmx
  from z 
 where (pr not between ssmi and ssmx
   and fo not between ssmi and ssmx)
   or pr is null or fo is null
order by ssmi
;

one more way using model clause

Stephen Dedalus, March 08, 2012 - 1:51 pm UTC

Hi,
Thanks everyone for showing the power of model clause. I am pasting another variation

WITH temp1 AS
(SELECT *
FROM t
model
dimension BY(study_site)
measures(cnt tot,0 AS cum_total,0 AS flag) ignore nav
rules sequential ORDER
( cum_total[ANY] =
CASE
WHEN tot[cv()] + cum_total[cv()-1] < 65000
THEN tot[cv()] + cum_total[cv()-1]
ELSE tot[cv()]
END,
flag[ANY] =
CASE
WHEN tot[cv()] + cum_total[cv()-1] < 65000
THEN flag[cv() -1]
ELSE flag[cv() -1]+1
END )
)
SELECT MIN(study_site),
MAX(study_site),
MAX(cum_total)
FROM temp1
GROUP BY flag
ORDER BY MIN(study_site)

Last 2 Queries provide wrong results

cb, March 09, 2012 - 6:58 am UTC

The queries submitted by Timo Raitalaakso and by Stephen Dedalus do not provide the correct results. With the original data the total sum of the total per study site must obviously be 113644.
 
The query by Timo Raitalaakso returns:
SQL> with v as (
  2  select t1.study_site ss1
  3       , t1.cnt tr1
  4       , t2.study_site ss2
  5       , t2.cnt tr2
  6       , sum(t2.cnt)over(partition by t1.study_site order by t2.study_site) sm
  7    from t t1, t t2
  8   where t1.study_site <= t2.study_site
  9  ), w as (
 10  select ss1,max(ss2) ssmx, max(sm) mxsm
 11    from v
 12   where v.sm <= 65000
 13   group by ss1
 14  ), z as (
 15  select min(ss1) ssmi,ssmx,max(mxsm) smmx
 16       , max(ssmx)over(order by ssmx rows between unbounded preceding and 1 preceding) pr
 17       , min(max(ssmx))over(order by ssmx rows between 1 following and unbounded following) fo
 18  from w
 19  group by ssmx
 20  )
 21  select ssmi,ssmx,smmx
 22    from z
 23   where (pr not between ssmi and ssmx
 24     and fo not between ssmi and ssmx)
 25     or pr is null or fo is null
 26  order by ssmi
 27  ;
 
      SSMI       SSMX       SMMX
---------- ---------- ----------
      1001       1022      48081
      1024       1045      39698
 
The query by Stephen Dedalus returns: 
SQL> WITH temp1 AS
  2    (SELECT *
  3    FROM t
  4    model
  5    dimension BY(study_site)
  6    measures(cnt tot,0 AS cum_total,0 AS flag) ignore nav
  7    rules sequential ORDER
  8    ( cum_total[ANY] =
  9      CASE
 10        WHEN tot[cv()] + cum_total[cv()-1] < 65000
 11        THEN tot[cv()] + cum_total[cv()-1]
 12        ELSE tot[cv()]
 13      END,
 14      flag[ANY] =
 15      CASE
 16        WHEN tot[cv()] + cum_total[cv()-1] < 65000
 17        THEN flag[cv() -1]
 18        ELSE flag[cv() -1]+1
 19      END )
 20    )
 21  SELECT MIN(study_site),
 22    MAX(study_site),
 23    MAX(cum_total)
 24  FROM temp1
 25  GROUP BY flag
 26  ORDER BY MIN(study_site)
 27  ;
 
MIN(STUDY_SITE) MAX(STUDY_SITE) MAX(CUM_TOTAL)
--------------- --------------- --------------
           1001            1045          32383

Both results must obviously be wrong, at least concerning the initial question.

Duck taping

Timo Raitalaakso, March 10, 2012 - 1:57 am UTC

Thank you for pointing out my mistake. This is an example when trying to use wrong technique to solve a problem. There is a hierarchy and analytic way is not the correct approach in this case. Plugging the hierarchy query there. Or should the refactoring start from empty table. Start learning the model clause to the toolbox. Not an easy task as the another incorrect rehearsal showed.

with v as (
select t1.study_site ss1
     , t1.cnt tr1
     , t2.study_site ss2
     , t2.cnt tr2
     , sum(t2.cnt)over(partition by t1.study_site order by t2.study_site) sm
  from t t1, t t2
 where t1.study_site <= t2.study_site
), w as (
select ss1,max(ss2) ssmx, max(sm) mxsm,min(ss1)over() mis
  from v
 where v.sm <= 65000
 group by ss1
), z (ss1,ssmx,mxsm) as (
select ss1,ssmx,mxsm
from w
where ss1=mis
union all
select w.ss1,w.ssmx,w.mxsm
from w, z
where z.ssmx + 1 = w.ss1)
select * 
  from z
;

Rows     Row Source Operation
-------  ---------------------------------------------------
      3  TEMP TABLE TRANSFORMATION  (cr=28 pr=1 pw=1 time=2328 us)
      0   LOAD AS SELECT  (cr=14 pr=0 pw=1 time=0 us)
     28    WINDOW BUFFER (cr=14 pr=0 pw=0 time=162 us cost=9 size=1092 card=28)
     28     SORT GROUP BY NOSORT (cr=14 pr=0 pw=0 time=3780 us cost=9 size=1092 card=28)
    274      VIEW  (cr=14 pr=0 pw=0 time=2457 us cost=9 size=1521 card=39)
    406       WINDOW SORT (cr=14 pr=0 pw=0 time=2632 us cost=9 size=1521 card=39)
    406        MERGE JOIN  (cr=14 pr=0 pw=0 time=2328 us cost=8 size=1521 card=39)
     28         SORT JOIN (cr=7 pr=0 pw=0 time=81 us cost=4 size=364 card=28)
     28          TABLE ACCESS FULL T (cr=7 pr=0 pw=0 time=54 us cost=3 size=364 card=28)
    406         SORT JOIN (cr=7 pr=0 pw=0 time=907 us cost=4 size=728 card=28)
     28          TABLE ACCESS FULL T (cr=7 pr=0 pw=0 time=54 us cost=3 size=728 card=28)
      3   VIEW  (cr=14 pr=1 pw=0 time=2320 us cost=7 size=2184 card=56)
      3    UNION ALL (RECURSIVE WITH) BREADTH FIRST (cr=14 pr=1 pw=0 time=2312 us)
      1     VIEW  (cr=5 pr=1 pw=0 time=0 us cost=2 size=1456 card=28)
     28      TABLE ACCESS FULL SYS_TEMP_0FD9D6627_812D8B (cr=5 pr=1 pw=0 time=81 us cost=2 size=1092 card=28)
      2     HASH JOIN  (cr=9 pr=0 pw=0 time=0 us cost=5 size=1456 card=28)
     84      VIEW  (cr=9 pr=0 pw=0 time=486 us cost=2 size=1092 card=28)
     84       TABLE ACCESS FULL SYS_TEMP_0FD9D6627_812D8B (cr=9 pr=0 pw=0 time=162 us cost=2 size=1092 card=28)
      3      RECURSIVE WITH PUMP  (cr=0 pr=0 pw=0 time=0 us)


Thanks for pointing out.......

Stephen Dedalus, March 10, 2012 - 3:16 am UTC

Thanks for pointing out the error......

Modified SQL:

SELECT MIN(study_site),
MAX(study_site),
MAX(cum_total) from (WITH temp1 AS
(SELECT study_site,row_number() over(order by study_site) as rn,cnt
FROM t ) SELECT * from temp1
model
dimension BY(rn)
measures(cnt tot,0 AS cum_total,0 AS flag,study_site) ignore nav
rules sequential ORDER
( cum_total[ANY] =
CASE
WHEN tot[cv()] + cum_total[cv()-1] < 65000
THEN tot[cv()] + cum_total[cv()-1]
ELSE tot[cv()]
END,
flag[ANY] =
CASE
WHEN tot[cv()] + cum_total[cv()-1] < 65000
THEN flag[cv() -1]
ELSE flag[cv() -1]+1
END )
)
GROUP BY flag
ORDER BY MIN(study_site)

output:

MIN(STUDY_SITE) MAX(STUDY_SITE) MAX(CUM_TOTAL)
---------------------- ---------------------- ----------------------
1001 1022 48081
1023 1044 62203
1045 1045 3360

3 rows selected


This is what the first question in the thread asked i suppose

RE: Jichao Li's MODEL solution

Duke Ganote, March 19, 2012 - 12:35 pm UTC

The bin-packing problem is interesting. As Wolfram's site notes "An alternative strategy first orders the items from largest to smallest...D. Johnson showed that this strategy is never suboptimal by more than 22%, and furthermore that no efficient bin-packing algorithm can be guaranteed to do better than 22%"
http://mathworld.wolfram.com/Bin-PackingProblem.html

One of the fun twists to the MODEL (aka SPREADSHEET) clause is getting to mix in analytic and grouping functions:

WITH t AS (
SELECT study_site, cnt
FROM dual
SPREADSHEET DIMENSION BY ('1001' AS study_site )
MEASURES ( 3407 AS cnt )
RULES
( cnt['1001'] = 3407 , cnt['1002'] = 4323
, cnt['1004'] = 1623 , cnt['1008'] = 1991
, cnt['1011'] = 885 , cnt['1012'] = 11597
, cnt['1014'] = 1989 , cnt['1015'] = 5282
, cnt['1017'] = 2841 , cnt['1018'] = 5183
, cnt['1020'] = 6176 , cnt['1022'] = 2784
, cnt['1023'] = 25865 , cnt['1024'] = 3734
, cnt['1026'] = 137 , cnt['1028'] = 6005
, cnt['1029'] = 76 , cnt['1031'] = 4599
, cnt['1032'] = 1989 , cnt['1034'] = 3427
, cnt['1036'] = 879 , cnt['1038'] = 6485
, cnt['1039'] = 3 , cnt['1040'] = 1105
, cnt['1041'] = 6460 , cnt['1042'] = 968
, cnt['1044'] = 471 , cnt['1045'] = 3360
) )
SELECT
UNIQUE min_site, max_site, max_sum
FROM t
SPREADSHEET
DIMENSION BY(ROW_NUMBER()
OVER(ORDER BY study_site ASC)
-- "first fit" approach
AS rn)
MEASURES( study_site
, cnt
, 0 as bin
, cnt AS sm -- cumulative sum in bin
, study_site AS min_site -- of bin
, study_site AS max_site -- of bin
, cnt AS max_sum -- of bin
)
RULES
( sm[rn > 1] = CASE WHEN sm[cv() - 1] + cnt[cv()] > 20000
OR cnt[cv()] > 20000
THEN cnt[cv()]
ELSE sm[cv() - 1] + cnt[cv()]
END
, bin[rn > 1] = bin[cv()-1]
+ CASE sm[cv()] WHEN cnt[cv()]
THEN 1 ELSE 0 END
, min_site[any] = MIN(study_site) OVER (PARTITION BY bin)
, max_site[any] = MAX(study_site) OVER (PARTITION BY bin)
, max_sum[any] = MAX(sm) OVER (PARTITION BY bin)
)
ORDER BY min_site asc
/
MIN_ MAX_ MAX_SUM
---- ---- ----------
1001 1011 12229
1012 1015 18868
1017 1022 16984
1023 1023 25865
1024 1034 19967
1036 1045 19731

Steve, May 13, 2013 - 11:17 am UTC

I'm trying to come to terms with the model clause and think it can help with my query - but the conections aren't being made.
I have a table like this:

Process ID Start_time End_time
2196 09/05/2013 07:15:00 09/05/2013 18:03:00
1957 09/05/2013 07:33:00 09/05/2013 17:09:00
2129 09/05/2013 07:35:00 09/05/2013 08:45:00
1559 09/05/2013 07:54:00 09/05/2013 09:15:00
2078 09/05/2013 07:58:00 09/05/2013 17:12:00
2210 09/05/2013 08:03:00 09/05/2013 17:13:00
2178 09/05/2013 08:05:00 09/05/2013 08:45:00
2176 09/05/2013 08:05:00 09/05/2013 17:22:00
2177 09/05/2013 08:05:00 09/05/2013 08:45:00
2130 09/05/2013 08:12:00 09/05/2013 08:45:00
2131 09/05/2013 08:12:00 09/05/2013 18:00:00
2200 09/05/2013 08:53:00 09/05/2013 17:38:00
1806 09/05/2013 09:04:00 09/05/2013 16:45:00
1639 09/05/2013 09:08:00 09/05/2013 15:58:00
2133 09/05/2013 11:17:00 09/05/2013 14:19:00
1559 09/05/2013 14:15:00 09/05/2013 17:27:00
2178 09/05/2013 15:15:00 09/05/2013 17:22:00
2130 09/05/2013 15:15:00 09/05/2013 18:00:00
1865 09/05/2013 15:15:00 09/05/2013 17:05:00
2177 09/05/2013 15:15:00 09/05/2013 17:22:00

The output I need is something along the lines of

Time Running Processes Processes
7:15 1 2196
7:30 1 2196
7:45 3 2196,1957,2129
8:00 5 2196,1957,2129,1559,2078
8:15 11 2196,1957,2129,1559,2078 etc etc

I'm sure you get the picture!

Any ideas - would model help here?

Thank you

Tom Kyte
May 13, 2013 - 1:28 pm UTC

this doesn't call for the model clause. this calls for

a) something to generate the set of times - a with clause, or connect by on dual
b) a join
c) an aggregate involving listagg.

ops$tkyte%ORA11GR2> with times
  2  as
  3  (select trunc(sysdate,'mm')+ level * 15/24/60 tm
  4     from dual
  5  connect by level <= 24*4-1)
  6  select tm, listagg(id,',') within group (order by id) ids
  7    from (
  8  select *
  9    from times, t
 10   where times.tm between to_date(to_char(t.sdate,'hh24miss'), 'hh24miss') and to_date( to_char( t.edate, 'hh24miss'), 'hh24miss' )
 11         )
 12   group by tm
 13   order by tm
 14  /

TM                  IDS
------------------- ---------------------------------------------------------------------------
05/01/2013 07:15:00 2196
05/01/2013 07:30:00 2196
05/01/2013 07:45:00 1957,2129,2196
05/01/2013 08:00:00 1559,1957,2078,2129,2196
05/01/2013 08:15:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
05/01/2013 08:30:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
05/01/2013 08:45:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
05/01/2013 09:00:00 1559,1957,2078,2131,2176,2196,2200,2210
05/01/2013 09:15:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 09:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 09:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 10:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 10:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 10:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 10:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 11:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 11:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 11:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 11:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 12:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 12:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 12:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 12:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 13:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 13:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 13:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 13:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 14:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 14:15:00 1559,1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
05/01/2013 14:30:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 14:45:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 15:00:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
05/01/2013 15:15:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 15:30:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 15:45:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 16:00:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 16:15:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 16:30:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 16:45:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 17:00:00 1559,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
05/01/2013 17:15:00 1559,2130,2131,2176,2177,2178,2196,2200
05/01/2013 17:30:00 2130,2131,2196,2200
05/01/2013 17:45:00 2130,2131,2196
05/01/2013 18:00:00 2130,2131,2196

44 rows selected.




I had to make an assumption that the date just didn't matter. we could simplify this if the date is going to be passed in.

ops$tkyte%ORA11GR2> variable input varchar2(30)
ops$tkyte%ORA11GR2> exec :input := '09/05/2013';

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> with times
  2  as
  3  (select to_date(:input,'mm/dd/yyyy')+ level * 15/24/60 tm
  4     from dual
  5  connect by level <= 24*4-1)
  6  select tm, listagg(id,',') within group (order by id) ids
  7    from (
  8  select *
  9    from times, t
 10   where times.tm between t.sdate and t.edate
 11     and t.sdate <= to_date(:input,'mm/dd/yyyy')+1
 12     and t.edate >= to_date(:input,'mm/dd/yyyy')
 13         )
 14   group by tm
 15   order by tm
 16  /

TM                  IDS
------------------- ---------------------------------------------------------------------------
09/05/2013 07:15:00 2196
09/05/2013 07:30:00 2196
09/05/2013 07:45:00 1957,2129,2196
09/05/2013 08:00:00 1559,1957,2078,2129,2196
09/05/2013 08:15:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
09/05/2013 08:30:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
09/05/2013 08:45:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
09/05/2013 09:00:00 1559,1957,2078,2131,2176,2196,2200,2210
09/05/2013 09:15:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 09:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 09:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 10:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 10:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 10:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 10:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 11:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 11:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 11:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 11:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 12:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 12:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 12:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 12:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 13:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 13:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 13:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 13:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 14:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 14:15:00 1559,1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
09/05/2013 14:30:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 14:45:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 15:00:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09/05/2013 15:15:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 15:30:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 15:45:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 16:00:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 16:15:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 16:30:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 16:45:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 17:00:00 1559,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
09/05/2013 17:15:00 1559,2130,2131,2176,2177,2178,2196,2200
09/05/2013 17:30:00 2130,2131,2196,2200
09/05/2013 17:45:00 2130,2131,2196
09/05/2013 18:00:00 2130,2131,2196

44 rows selected.

Couldn't resist, despite it's the wrong 'tool'

Hoek, May 15, 2013 - 5:20 pm UTC

Since I've become quite obsessed about the MODEL clause lately, I just gave it a try anyway:

SQL> select ts
  2  ,      processes
  3  from ( select *
  4         from ( with times as ( select trunc(sysdate,'mm')+ level * 15/24/60 tm
  5                                from   dual
  6                                connect by level <= 24*4-1
  7                              )
  8                select tm
  9                ,      process_id 
 10                from   t
 11                ,      times 
 12                where  times.tm between to_date(to_char(t.starttime, 'hh24miss'), 'hh24miss')
 13                                    and to_date(to_char(t.endtime, 'hh24miss'), 'hh24miss' )
 14                )
 15         model
 16         partition by ( tm )
 17         dimension by ( row_number() over ( partition by tm order by process_id) rn )
 18         measures     ( to_char(tm, 'hh24:mi:ss') ts
 19                      , to_char(process_id) process_id
 20                      , cast(null as varchar2(4000)) processes
 21                      )
 22         rules        ( processes[any] order by process_id desc = case 
 23                                                                    when (ts[cv()] <= ts[cv()+1])
 24                                                                    then process_id[cv()]||','||processes[cv()+1]
 25                                                                    else process_id[cv()]
 26                                                                  end
 27                      )
 28       )
 29  where rn=1
 30  order by ts;

TS       PROCESSES
-------- ----------------------------------------------------------------------------------------------------
07:15:00 2196
07:30:00 2196
07:45:00 1957,2129,2196
08:00:00 1559,1957,2078,2129,2196
08:15:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
08:30:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
08:45:00 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
09:00:00 1559,1957,2078,2131,2176,2196,2200,2210
09:15:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
09:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:30:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:45:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:00:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:15:00 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
11:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:15:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:30:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:45:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:00:00 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:15:00 1559,1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:30:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
14:45:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
15:00:00 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
15:15:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
15:30:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
15:45:00 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:00:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:15:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:30:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:45:00 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
17:00:00 1559,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
17:15:00 1559,2130,2131,2176,2177,2178,2196,2200
17:30:00 2130,2131,2196,2200
17:45:00 2130,2131,2196
18:00:00 2130,2131,2196

44 rows selected.


I learned the 'trick' is the DESC order by in the RULES clause.

*disclaimer*
No clue how this performs irl.

Thanks to Rob's series on the MODEL clause, of which amongst others:
http://rwijk.blogspot.nl/2008/05/string-aggregation-with-model-clause.html
and Joe's tutorials:
http://www.sqlsnippets.com/en/topic-11663.html

And to add...

Hoek, May 15, 2013 - 6:54 pm UTC

Forgot column 'running processes':
SQL> select ts
  2  ,      regexp_count(processes, ',')+1 running_processes
  3  ,      processes
  4  from ( select *
  5         from ( with times as ( select trunc(sysdate,'mm')+ level * 15/24/60 tm
  6                                from   dual
  7                                connect by level <= 24*4-1
  8                              )
  9                select tm
 10                ,      process_id 
 11                from   t
 12                ,      times 
 13                where  times.tm between to_date(to_char(t.starttime, 'hh24miss'), 'hh24miss')
 14                                    and to_date(to_char(t.endtime, 'hh24miss'), 'hh24miss' )
 15              )
 16         model
 17         partition by ( tm )
 18         dimension by ( row_number() over ( partition by tm order by process_id) rn )
 19         measures     ( to_char(tm, 'hh24:mi:ss') ts
 20                      , to_char(process_id) process_id
 21                      , cast(null as varchar2(4000)) processes
 22                      )
 23         rules ( processes[any] order by process_id desc = case 
 24                                                             when (ts[cv()] <= ts[cv()+1])
 25                                                             then process_id[cv()]||','||processe
s[cv()+1]
 26                                                             else process_id[cv()]
 27                                                           end
 28               )
 29       )
 30  where rn=1
 31  order by ts;

TS       RUNNING_PROCESSES PROCESSES
-------- ----------------- -------------------------------------------------------------------------
07:15:00                 1 2196
07:30:00                 1 2196
07:45:00                 3 1957,2129,2196
08:00:00                 5 1559,1957,2078,2129,2196
08:15:00                11 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
08:30:00                11 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
08:45:00                11 1559,1957,2078,2129,2130,2131,2176,2177,2178,2196,2210
09:00:00                 8 1559,1957,2078,2131,2176,2196,2200,2210
09:15:00                10 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
09:30:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
09:45:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:00:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:15:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:30:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
10:45:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:00:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:15:00                 9 1639,1806,1957,2078,2131,2176,2196,2200,2210
11:30:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
11:45:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:00:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:15:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:30:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
12:45:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:00:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:15:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:30:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
13:45:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:00:00                10 1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:15:00                11 1559,1639,1806,1957,2078,2131,2133,2176,2196,2200,2210
14:30:00                10 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
14:45:00                10 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
15:00:00                10 1559,1639,1806,1957,2078,2131,2176,2196,2200,2210
15:15:00                14 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
15:30:00                14 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
15:45:00                14 1559,1639,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:00:00                13 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:15:00                13 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:30:00                13 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
16:45:00                13 1559,1806,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
17:00:00                12 1559,1865,1957,2078,2130,2131,2176,2177,2178,2196,2200,2210
17:15:00                 8 1559,2130,2131,2176,2177,2178,2196,2200
17:30:00                 4 2130,2131,2196,2200
17:45:00                 3 2130,2131,2196
18:00:00                 3 2130,2131,2196

44 rows selected.

Answer to original question using 12c MATCH_RECOGNIZE

Stew Ashton, September 27, 2013 - 10:25 am UTC

SELECT * FROM t
MATCH_RECOGNIZE (
  ORDER BY STUDY_SITE
  MEASURES
    A.STUDY_SITE first_site,
    nvl(LAST(b.study_site), A.study_site) last_site,
    sum(cnt) sum_cnt
  PATTERN (A b*)
  DEFINE b AS sum(cnt) <= 20000
);

FIRST_SITE  LAST_SITE    SUM_CNT
---------- ---------- ----------
      1001       1011      12229 
      1012       1015      18868 
      1017       1022      16984 
      1023       1023      25865 
      1024       1034      19967 
      1036       1045      19731
Tom, if you dive into MATCH_RECOGNIZE in Oracle Magazine you might want to refer to http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html
By the way, if you want a laugh go to http://www.oracle.com/technetwork/oramag/magazine/tech-articles/index.html and click on your name. You will go to a list of articles by: Steven Feuerstein ;)

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.