Skip to Main Content
  • Questions
  • How to split rows into balanced sets based on a running total limited to 2000

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Martin.

Asked: June 02, 2017 - 9:58 am UTC

Last updated: September 21, 2017 - 1:30 pm UTC

Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

Viewed 10K+ times! This question is

You Asked

Hi,

my question fits in to your "Balanced sets in SQL" collection of questions.

I have a table as follows (in reality 26 million rows):

CREATE TABLE T (
 "PI" VARCHAR2(120), 
 "S" NUMBER, 
 "L" NUMBER
);

Insert into T (PI,S,L) values ('000740A0','1','54');
Insert into T (PI,S,L) values ('000740A0','2','103');
Insert into T (PI,S,L) values ('000740A0','3','102');
Insert into T (PI,S,L) values ('000740A0','4','102');
Insert into T (PI,S,L) values ('000740A0','5','173');
Insert into T (PI,S,L) values ('000740A0','6','174');
Insert into T (PI,S,L) values ('000740A0','7','120');
Insert into T (PI,S,L) values ('000740A0','8','156');
Insert into T (PI,S,L) values ('000740A0','9','102');
Insert into T (PI,S,L) values ('000740A0','10','119');
Insert into T (PI,S,L) values ('000740A0','11','102');
Insert into T (PI,S,L) values ('000740A0','12','99');
Insert into T (PI,S,L) values ('000740A0','13','119');
Insert into T (PI,S,L) values ('000740A0','14','148');
Insert into T (PI,S,L) values ('000740A0','15','102');
Insert into T (PI,S,L) values ('000740A0','16','100');
Insert into T (PI,S,L) values ('000740A0','17','119');
Insert into T (PI,S,L) values ('000740A0','18','100');
Insert into T (PI,S,L) values ('000740A0','19','97');
Insert into T (PI,S,L) values ('000740A0','20','119');
Insert into T (PI,S,L) values ('000740A0','21','100');
Insert into T (PI,S,L) values ('000740A0','22','118');
Insert into T (PI,S,L) values ('000740A0','23','146');
Insert into T (PI,S,L) values ('000740A0','24','120');
Insert into T (PI,S,L) values ('000740A0','25','127');
Insert into T (PI,S,L) values ('000740A0','26','124');
Insert into T (PI,S,L) values ('000740A0','27','134');
Insert into T (PI,S,L) values ('000844A0','1','54');
Insert into T (PI,S,L) values ('000844A0','2','103');
Insert into T (PI,S,L) values ('000844A0','3','102');
Insert into T (PI,S,L) values ('000844A0','4','95');
Insert into T (PI,S,L) values ('000844A0','5','102');
Insert into T (PI,S,L) values ('000844A0','6','99');
Insert into T (PI,S,L) values ('000844A0','7','95');
Insert into T (PI,S,L) values ('000844A0','8','102');
Insert into T (PI,S,L) values ('000844A0','9','133');
Insert into T (PI,S,L) values ('000844A0','10','173');
Insert into T (PI,S,L) values ('000844A0','11','120');
Insert into T (PI,S,L) values ('000844A0','12','143');
Insert into T (PI,S,L) values ('000844A0','13','156');
Insert into T (PI,S,L) values ('000844A0','14','144');
Insert into T (PI,S,L) values ('000844A0','15','139');
Insert into T (PI,S,L) values ('000844A0','16','147');
Insert into T (PI,S,L) values ('000844A0','17','131');
Insert into T (PI,S,L) values ('000844A0','18','132');
Insert into T (PI,S,L) values ('000844A0','19','150');
Insert into T (PI,S,L) values ('000844A0','20','128');
Insert into T (PI,S,L) values ('000844A0','21','120');
Insert into T (PI,S,L) values ('000844A0','22','104');
Insert into T (PI,S,L) values ('000844A0','23','123');
Insert into T (PI,S,L) values ('000844A0','24','104');
Insert into T (PI,S,L) values ('000844A0','25','101');
Insert into T (PI,S,L) values ('000844A0','26','123');
Insert into T (PI,S,L) values ('000844A0','27','104');
Insert into T (PI,S,L) values ('000844A0','28','147');
Insert into T (PI,S,L) values ('000844A0','29','100');
Insert into T (PI,S,L) values ('000844A0','30','150');
Insert into T (PI,S,L) values ('000844A0','31','117');
Insert into T (PI,S,L) values ('000844A0','32','129');
Insert into T (PI,S,L) values ('000844A0','33','114');
Insert into T (PI,S,L) values ('000844A0','34','147');
Insert into T (PI,S,L) values ('000844A0','35','138');
Insert into T (PI,S,L) values ('000844A0','36','147');
Insert into T (PI,S,L) values ('000844A0','37','138');
Insert into T (PI,S,L) values ('000844A0','38','147');
Insert into T (PI,S,L) values ('000844A0','39','138');
Insert into T (PI,S,L) values ('000844A0','40','147');
Insert into T (PI,S,L) values ('000844A0','41','138');
Insert into T (PI,S,L) values ('000844A0','42','147');
Insert into T (PI,S,L) values ('000844A0','43','138');
Insert into T (PI,S,L) values ('000844A0','44','147');
Insert into T (PI,S,L) values ('000844A0','45','138');
Insert into T (PI,S,L) values ('000844A0','46','147');
Insert into T (PI,S,L) values ('000844A0','47','138');
Insert into T (PI,S,L) values ('000844A0','48','147');
Insert into T (PI,S,L) values ('000844A0','49','138');
Insert into T (PI,S,L) values ('000844A0','50','147');
Insert into T (PI,S,L) values ('000844A0','51','138');
Insert into T (PI,S,L) values ('000844A0','52','147');


I would like to create buckets by partitioning by PI and the running sum of L+2 (sorting by S) with the sum being limited to a maximum of 2000. So the query I search for would produce following buckets (in an additional column I), column SU shows the running sum of L+2 (for information only):

PI                    S          L         SU          I
------------ ---------- ---------- ---------- ----------
000740A0              1         54         56          0
000740A0              2        103        161          0
000740A0              3        102        265          0
000740A0              4        102        369          0
000740A0              5        173        544          0
000740A0              6        174        720          0
000740A0              7        120        842          0
000740A0              8        156       1000          0
000740A0              9        102       1104          0
000740A0             10        119       1225          0
000740A0             11        102       1329          0
000740A0             12         99       1430          0
000740A0             13        119       1551          0
000740A0             14        148       1701          0
000740A0             15        102       1805          0
000740A0             16        100       1907          0
000740A0             17        119        119          1
000740A0             18        100        221          1
000740A0             19         97        320          1
000740A0             20        119        441          1
000740A0             21        100        543          1
000740A0             22        118        663          1
000740A0             23        146        811          1
000740A0             24        120        933          1
000740A0             25        127       1062          1
000740A0             26        124       1188          1
000740A0             27        134       1324          1
000844A0              1         54         56          0
000844A0              2        103        161          0
000844A0              3        102        265          0
000844A0              4         95        362          0
000844A0              5        102        466          0
000844A0              6         99        567          0
000844A0              7         95        664          0
000844A0              8        102        768          0
000844A0              9        133        903          0
000844A0             10        173       1078          0
000844A0             11        120       1200          0
000844A0             12        143       1345          0
000844A0             13        156       1503          0
000844A0             14        144       1649          0
000844A0             15        139       1790          0
000844A0             16        147       1939          0
000844A0             17        131        131          1
000844A0             18        132        265          1
000844A0             19        150        417          1
000844A0             20        128        547          1
000844A0             21        120        669          1
000844A0             22        104        775          1
000844A0             23        123        900          1
000844A0             24        104       1006          1
000844A0             25        101       1109          1
000844A0             26        123       1234          1
000844A0             27        104       1340          1
000844A0             28        147       1489          1
000844A0             29        100       1591          1
000844A0             30        150       1743          1
000844A0             31        117       1862          1
000844A0             32        129       1993          1
000844A0             33        114        114          2
000844A0             34        147        263          2
000844A0             35        138        403          2
000844A0             36        147        552          2
000844A0             37        138        692          2
000844A0             38        147        841          2
000844A0             39        138        981          2
000844A0             40        147       1130          2
000844A0             41        138       1270          2
000844A0             42        147       1419          2
000844A0             43        138       1559          2
000844A0             44        147       1708          2
000844A0             45        138       1848          2
000844A0             46        147       1997          2
000844A0             47        138        138          3
000844A0             48        147        287          3
000844A0             49        138        427          3
000844A0             50        147        576          3
000844A0             51        138        716          3
000844A0             52        147        865          3


Do you see any possibility to solve this without PL/SQL?

The problem is that using an analytic running sum and then using a division and a trunc or a width_bucket works for I=0 but not for the following buckets, because of the remainder. So the following buckets are > 2000...

So far I do it with following PL/SQL:

declare
 al integer;
 gi integer;
 lpi varchar2(200) := null;
begin
 for hl in ( 
  select
   pi, l, su, i, rowid rid
  from t
  order by pi, s
  for update of su, i
 )
 loop
  if lpi is null or hl.pi<>lpi then
   lpi := hl.pi;
   gi := 0;
   al := 0;
  end if;
  al := al + hl.l+2;
  if al > 2000 then
   gi := gi + 1;
   al := hl.l+2;
  end if;
  update t set su=al, i=gi where rowid=hl.rid;  
 end loop;
end;


But that is slow (2475s).

Best Regards

Martin

and Chris said...

So you want to split the rows into buckets with a running sum <= 2000?

With match_recognize it's a breeze:

Just define a variable that checks the sum ( L + 2 ) <= 2000. And in the measures clause call match_number to state which bucket the values go in:

select * from t
match_recognize (
  partition by pi
  order by s
  measures match_number() mn, sum (l + 2) tot
  all rows per match
  pattern ( twok+ ) 
  define
    twok as sum (l + 2) <= 2000
);

PI        S   MN  TOT   L    
000740A0  1   1   56    54   
000740A0  2   1   161   103  
000740A0  3   1   265   102  
000740A0  4   1   369   102  
000740A0  5   1   544   173  
000740A0  6   1   720   174  
000740A0  7   1   842   120  
000740A0  8   1   1000  156  
000740A0  9   1   1104  102  
000740A0  10  1   1225  119  
000740A0  11  1   1329  102  
000740A0  12  1   1430  99   
000740A0  13  1   1551  119  
000740A0  14  1   1701  148  
000740A0  15  1   1805  102  
000740A0  16  1   1907  100  
000740A0  17  2   121   119  
000740A0  18  2   223   100  
000740A0  19  2   322   97   
000740A0  20  2   443   119  
000740A0  21  2   545   100  
000740A0  22  2   665   118  
000740A0  23  2   813   146  
000740A0  24  2   935   120  
000740A0  25  2   1064  127  
000740A0  26  2   1190  124  
000740A0  27  2   1326  134  
000844A0  1   1   56    54   
000844A0  2   1   161   103  
000844A0  3   1   265   102  
000844A0  4   1   362   95   
000844A0  5   1   466   102  
000844A0  6   1   567   99   
000844A0  7   1   664   95   
000844A0  8   1   768   102  
000844A0  9   1   903   133  
000844A0  10  1   1078  173  
000844A0  11  1   1200  120  
000844A0  12  1   1345  143  
000844A0  13  1   1503  156  
000844A0  14  1   1649  144  
000844A0  15  1   1790  139  
000844A0  16  1   1939  147  
000844A0  17  2   133   131  
000844A0  18  2   267   132  
000844A0  19  2   419   150  
000844A0  20  2   549   128  
000844A0  21  2   671   120  
000844A0  22  2   777   104  
000844A0  23  2   902   123  
000844A0  24  2   1008  104  
000844A0  25  2   1111  101  
000844A0  26  2   1236  123  
000844A0  27  2   1342  104  
000844A0  28  2   1491  147  
000844A0  29  2   1593  100  
000844A0  30  2   1745  150  
000844A0  31  2   1864  117  
000844A0  32  2   1995  129  
000844A0  33  3   116   114  
000844A0  34  3   265   147  
000844A0  35  3   405   138  
000844A0  36  3   554   147  
000844A0  37  3   694   138  
000844A0  38  3   843   147  
000844A0  39  3   983   138  
000844A0  40  3   1132  147  
000844A0  41  3   1272  138  
000844A0  42  3   1421  147  
000844A0  43  3   1561  138  
000844A0  44  3   1710  147  
000844A0  45  3   1850  138  
000844A0  46  3   1999  147  
000844A0  47  4   140   138  
000844A0  48  4   289   147  
000844A0  49  4   429   138  
000844A0  50  4   578   147  
000844A0  51  4   718   138  
000844A0  52  4   867   147  

Rating

  (12 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

does this work?

Chuck Jolley, June 05, 2017 - 7:46 pm UTC

with t2
as (select pi,
           s,
           l,
           l + 2 l2, 
           trunc((sum(l + 2 ) over (partition by pi order by s))/ 2000) bucket 
      from t)
select pi,
       s,
       l,
       sum(l2) over (partition by pi, bucket order by s) - (2 * least(bucket, 1)) su,
       bucket
  from t2
order by pi, s  

PI S L SU BUCKET
000740A0 1 54 56 0
000740A0 2 103 161 0
000740A0 3 102 265 0
000740A0 4 102 369 0
000740A0 5 173 544 0
000740A0 6 174 720 0
000740A0 7 120 842 0
000740A0 8 156 1000 0
000740A0 9 102 1104 0
000740A0 10 119 1225 0
000740A0 11 102 1329 0
000740A0 12 99 1430 0
000740A0 13 119 1551 0
000740A0 14 148 1701 0
000740A0 15 102 1805 0
000740A0 16 100 1907 0
000740A0 17 119 119 1
000740A0 18 100 221 1
000740A0 19 97 320 1
000740A0 20 119 441 1
000740A0 21 100 543 1
000740A0 22 118 663 1
000740A0 23 146 811 1
000740A0 24 120 933 1
000740A0 25 127 1062 1
000740A0 26 124 1188 1
000740A0 27 134 1324 1
000844A0 1 54 56 0
000844A0 2 103 161 0
000844A0 3 102 265 0
000844A0 4 95 362 0
000844A0 5 102 466 0
000844A0 6 99 567 0
000844A0 7 95 664 0
000844A0 8 102 768 0
000844A0 9 133 903 0
000844A0 10 173 1078 0
000844A0 11 120 1200 0
000844A0 12 143 1345 0
000844A0 13 156 1503 0
000844A0 14 144 1649 0
000844A0 15 139 1790 0
000844A0 16 147 1939 0
000844A0 17 131 131 1
000844A0 18 132 265 1
000844A0 19 150 417 1
000844A0 20 128 547 1
000844A0 21 120 669 1
000844A0 22 104 775 1
000844A0 23 123 900 1
000844A0 24 104 1006 1
000844A0 25 101 1109 1
000844A0 26 123 1234 1
000844A0 27 104 1340 1
000844A0 28 147 1489 1
000844A0 29 100 1591 1
000844A0 30 150 1743 1
000844A0 31 117 1862 1
000844A0 32 129 1993 1
000844A0 33 114 114 2
000844A0 34 147 263 2
000844A0 35 138 403 2
000844A0 36 147 552 2
000844A0 37 138 692 2
000844A0 38 147 841 2
000844A0 39 138 981 2
000844A0 40 147 1130 2
000844A0 41 138 1270 2
000844A0 42 147 1419 2
000844A0 43 138 1559 2
000844A0 44 147 1708 2
000844A0 45 138 1848 2
000844A0 46 147 1997 2
000844A0 47 138 138 3
000844A0 48 147 287 3
000844A0 49 138 427 3
000844A0 50 147 576 3
000844A0 51 138 716 3
000844A0 52 147 865 3


Thanks a lot for your superb answer

Martin Glaser, September 13, 2017 - 12:20 pm UTC

Hi Chris,

this is really a wonderful solution. Lean and efficient. Performance is much better (707s).

Best Regards

Martin

P. S. sorry for my late answer, I had a lot of trouble to get into AskTOM and to be allowed to give an answer...
Chris Saxon
September 15, 2017 - 2:43 pm UTC

Thanks. FYI you can normally add reviews even when we're closed for new questions.

Chuck's solution does not work in general

Martin Glaser, September 13, 2017 - 12:51 pm UTC

Hi Chuck,

your solution works for the example given but not in general.

If the example is changed with...
update t set l=120 where pi='000844A0' and s=33;

...then you can see the problem in BUCKET 2 of PI=000844A0:
PI               S          L         SU     BUCKET
------------- ---- ---------- ---------- ----------
...           
000844A0        29        100       1591          1
000844A0        30        150       1743          1
000844A0        31        117       1862          1
000844A0        32        129       1993          1
000844A0        33        120        120          2
000844A0        34        147        269          2
000844A0        35        138        409          2
000844A0        36        147        558          2
000844A0        37        138        698          2
000844A0        38        147        847          2
000844A0        39        138        987          2
000844A0        40        147       1136          2
000844A0        41        138       1276          2
000844A0        42        147       1425          2
000844A0        43        138       1565          2
000844A0        44        147       1714          2
000844A0        45        138       1854          2
000844A0        46        147       2003          2
000844A0        47        138        138          3
000844A0        48        147        287          3
000844A0        49        138        427          3
000844A0        50        147        576          3
...

Best Regards

Martin

Chris Saxon
September 15, 2017 - 3:32 pm UTC

Yep, that's because the running total needs to restart at zero once you go over the 2k limit. But sum() over (order by ...) is the running total from the start.

I don't know of a general way to solve this using analytics.

using Analytics

Rajeshwaran, Jeyabal, September 17, 2017 - 2:33 am UTC

....
I don't know of a general way to solve this using analytics.
....


Chris - Here is an approach using Analytics.

Model could do the same thing, but not sure how many of you "like" model clause. so skipping it now.

demo@ORA12C> column pi format a10
demo@ORA12C> select pi,s,l,
  2    sum(l+2) over( partition by pi,rn order by s ) su_final,
  3    row_number() over( partition by pi,rn order by s ) rn
  4  from (
  5  select t.*, sum(l+2) over( partition by pi
  6                  order by s ) su,
  7            ceil( sum(l+2) over( partition by pi
  8                  order by s ) / 2000)  rn
  9  from t
 10      )
 11  /

PI                  S          L   SU_FINAL         RN
---------- ---------- ---------- ---------- ----------
000740A0            1         54         56          1
000740A0            2        103        161          2
000740A0            3        102        265          3
000740A0            4        102        369          4
000740A0            5        173        544          5
000740A0            6        174        720          6
000740A0            7        120        842          7
000740A0            8        156       1000          8
000740A0            9        102       1104          9
000740A0           10        119       1225         10
000740A0           11        102       1329         11
000740A0           12         99       1430         12
000740A0           13        119       1551         13
000740A0           14        148       1701         14
000740A0           15        102       1805         15
000740A0           16        100       1907         16
000740A0           17        119        121          1
000740A0           18        100        223          2
000740A0           19         97        322          3
000740A0           20        119        443          4
000740A0           21        100        545          5
000740A0           22        118        665          6
000740A0           23        146        813          7
000740A0           24        120        935          8
000740A0           25        127       1064          9
000740A0           26        124       1190         10
000740A0           27        134       1326         11
000844A0            1         54         56          1
000844A0            2        103        161          2
000844A0            3        102        265          3
000844A0            4         95        362          4
000844A0            5        102        466          5
000844A0            6         99        567          6
000844A0            7         95        664          7
000844A0            8        102        768          8
000844A0            9        133        903          9
000844A0           10        173       1078         10
000844A0           11        120       1200         11
000844A0           12        143       1345         12
000844A0           13        156       1503         13
000844A0           14        144       1649         14
000844A0           15        139       1790         15
000844A0           16        147       1939         16
000844A0           17        131        133          1
000844A0           18        132        267          2
000844A0           19        150        419          3
000844A0           20        128        549          4
000844A0           21        120        671          5
000844A0           22        104        777          6
000844A0           23        123        902          7
000844A0           24        104       1008          8
000844A0           25        101       1111          9
000844A0           26        123       1236         10
000844A0           27        104       1342         11
000844A0           28        147       1491         12
000844A0           29        100       1593         13
000844A0           30        150       1745         14
000844A0           31        117       1864         15
000844A0           32        129       1995         16
000844A0           33        114        116          1
000844A0           34        147        265          2
000844A0           35        138        405          3
000844A0           36        147        554          4
000844A0           37        138        694          5
000844A0           38        147        843          6
000844A0           39        138        983          7
000844A0           40        147       1132          8
000844A0           41        138       1272          9
000844A0           42        147       1421         10
000844A0           43        138       1561         11
000844A0           44        147       1710         12
000844A0           45        138       1850         13
000844A0           46        147       1999         14
000844A0           47        138        140          1
000844A0           48        147        289          2
000844A0           49        138        429          3
000844A0           50        147        578          4
000844A0           51        138        718          5
000844A0           52        147        867          6
demo@ORA12C>

Chris Saxon
September 18, 2017 - 12:47 pm UTC

Still doesn't work. If you run the update from Martin Glaser you get:

update t set l=120 where pi='000844A0' and s=33;

select pi,s,l,
    sum(l+2) over( partition by pi,rn order by s ) su_final,
    row_number() over( partition by pi,rn order by s ) rn
from (
  select t.*, sum(l+2) over( partition by pi
                  order by s ) su,
            ceil( sum(l+2) over( partition by pi
                  order by s ) / 2000)  rn
  from t
  where  pi = '000844A0'
);

PI        S   L    SU_FINAL  RN  
000844A0  1   54   56        1   
000844A0  2   103  161       2   
000844A0  3   102  265       3   
000844A0  4   95   362       4   
000844A0  5   102  466       5   
000844A0  6   99   567       6   
000844A0  7   95   664       7   
000844A0  8   102  768       8   
000844A0  9   133  903       9   
000844A0  10  173  1078      10  
000844A0  11  120  1200      11  
000844A0  12  143  1345      12  
000844A0  13  156  1503      13  
000844A0  14  144  1649      14  
000844A0  15  139  1790      15  
000844A0  16  147  1939      16  
000844A0  17  131  133       1   
000844A0  18  132  267       2   
000844A0  19  150  419       3   
000844A0  20  128  549       4   
000844A0  21  120  671       5   
000844A0  22  104  777       6   
000844A0  23  123  902       7   
000844A0  24  104  1008      8   
000844A0  25  101  1111      9   
000844A0  26  123  1236      10  
000844A0  27  104  1342      11  
000844A0  28  147  1491      12  
000844A0  29  100  1593      13  
000844A0  30  150  1745      14  
000844A0  31  117  1864      15  
000844A0  32  129  1995      16  
000844A0  33  120  122       1   
000844A0  34  147  271       2   
000844A0  35  138  411       3   
000844A0  36  147  560       4   
000844A0  37  138  700       5   
000844A0  38  147  849       6   
000844A0  39  138  989       7   
000844A0  40  147  1138      8   
000844A0  41  138  1278      9   
000844A0  42  147  1427      10  
000844A0  43  138  1567      11  
000844A0  44  147  1716      12  
000844A0  45  138  1856      13  
000844A0  46  147  2005      14  <====== greater than 2000
000844A0  47  138  140       1   
000844A0  48  147  289       2   
000844A0  49  138  429       3   
000844A0  50  147  578       4   
000844A0  51  138  718       5   
000844A0  52  147  867       6 

Tried it using analytics.? Will this work

GJ, September 18, 2017 - 6:51 am UTC

I attempted as follows

First i create a data block and obtain a running_total for l+2 partitioned by the pi column

Then i create row_generator block using dual and connect by to build the upper and lower limit in chunks of 2000.
Eg:
select level as lvl
,(level-1)*2000 +1 as lo_lvl
,level*2000 as hi_lvl
from dual
connect by level<=4(replace this with max possible value of l+2)

LVL LO_LVL HI_LVL
1 1 2000
2 2001 4000
3 4001 6000
4 6001 8000


After that categorize the running_total between this upper and lower values.

Then i aggregate the sum(l_plus_2) partitioned by the row_sourcegenerators lvl and pi

with data
  as (select sum(a.l+2) over(partition by a.pi order by a.s) as running_total
             ,a.l+2 as l_plus_2
             ,a.*
       from t a
      )
    ,row_generator as
     (select level as lvl
               ,(level-1)*2000 +1 as lo_lvl
              ,level*2000 as hi_lvl
          from dual
        connect by level<=(select max(data.running_total)/2000
                             from data
                           )     
     )
select a1.pi
      ,a1.s
      ,b1.lvl as mn
      ,sum(a1.l_plus_2) over(partition by a1.pi,b1.lvl order by a1.s) as running_tot
      ,a1.l as original_l
  from data a1
  join row_generator b1
     on a1.running_total>=b1.lo_lvl
    and a1.running_total<=b1.hi_lvl
order by 1,2  

Re.

GJ, September 18, 2017 - 8:14 am UTC

small correction the code for connect by should be as follows

connect by level<=(select ceil(max(data.running_total)/2000)

I like Rajesh's solution that has less parsing of data and less lines of code than what i made.

with data
  as (select sum(a.l+2) over(partition by a.pi order by a.s) as running_total
             ,a.l+2 as l_plus_2
             ,a.*
       from t a
      )
    ,row_generator as
     (select level as lvl
               ,(level-1)*2000 +1 as lo_lvl
              ,level*2000 as hi_lvl
          from dual
        connect by level<=(select ceil(max(data.running_total)/2000)
                             from data
                           )     
     )
select a1.pi
      ,a1.s
      ,b1.lvl as mn
      ,sum(a1.l_plus_2) over(partition by a1.pi,b1.lvl order by a1.s) as running_tot
      ,a1.l as original_l
  from data a1
  join row_generator b1
     on a1.running_total>=b1.lo_lvl
    and a1.running_total<=b1.hi_lvl
order by 1,2  

Chris Saxon
September 18, 2017 - 12:47 pm UTC

Still doesn't work. If you run the update from Martin Glaser you get:

update t set l=120 where pi='000844A0' and s=33;

with data
  as (select sum(a.l+2) over(partition by a.pi order by a.s) as running_total
             ,a.l+2 as l_plus_2
             ,a.*
       from t a
       where  pi = '000844A0'
      )
    ,row_generator as
     (select level as lvl
               ,(level-1)*2000 +1 as lo_lvl
              ,level*2000 as hi_lvl
          from dual
        connect by level<=(select ceil(max(data.running_total)/2000)
                             from data
                           )     
     )
select a1.pi
      ,a1.s
      ,b1.lvl as mn
      ,sum(a1.l_plus_2) over(partition by a1.pi,b1.lvl order by a1.s) as running_tot
      ,a1.l as original_l
  from data a1
  join row_generator b1
     on a1.running_total>=b1.lo_lvl
    and a1.running_total<=b1.hi_lvl
order by 1,2 ;

PI        S   MN  RUNNING_TOT  ORIGINAL_L  
000844A0  1   1   56           54          
000844A0  2   1   161          103         
000844A0  3   1   265          102         
000844A0  4   1   362          95          
000844A0  5   1   466          102         
000844A0  6   1   567          99          
000844A0  7   1   664          95          
000844A0  8   1   768          102         
000844A0  9   1   903          133         
000844A0  10  1   1078         173         
000844A0  11  1   1200         120         
000844A0  12  1   1345         143         
000844A0  13  1   1503         156         
000844A0  14  1   1649         144         
000844A0  15  1   1790         139         
000844A0  16  1   1939         147         
000844A0  17  2   133          131         
000844A0  18  2   267          132         
000844A0  19  2   419          150         
000844A0  20  2   549          128         
000844A0  21  2   671          120         
000844A0  22  2   777          104         
000844A0  23  2   902          123         
000844A0  24  2   1008         104         
000844A0  25  2   1111         101         
000844A0  26  2   1236         123         
000844A0  27  2   1342         104         
000844A0  28  2   1491         147         
000844A0  29  2   1593         100         
000844A0  30  2   1745         150         
000844A0  31  2   1864         117         
000844A0  32  2   1995         129         
000844A0  33  3   122          120         
000844A0  34  3   271          147         
000844A0  35  3   411          138         
000844A0  36  3   560          147         
000844A0  37  3   700          138         
000844A0  38  3   849          147         
000844A0  39  3   989          138         
000844A0  40  3   1138         147         
000844A0  41  3   1278         138         
000844A0  42  3   1427         147         
000844A0  43  3   1567         138         
000844A0  44  3   1716         147         
000844A0  45  3   1856         138         
000844A0  46  3   2005         147     <===== greater than 2000    
000844A0  47  4   140          138         
000844A0  48  4   289          147         
000844A0  49  4   429          138         
000844A0  50  4   578          147         
000844A0  51  4   718          138         
000844A0  52  4   867          147  

MODEL

Rajeshwaran, Jeyabal, September 18, 2017 - 2:51 pm UTC

With model clause, it works perfectly.

demo@ORA12C> select *
  2  from t
  3  where  pi = '000844A0'
  4  model
  5    partition by (pi)
  6    dimension by (s-1 s)
  7    measures( l, 0 tot)
  8    rules iterate(1000) until (l[iteration_number+1] is null)
  9    (
 10      tot[iteration_number] = case when iteration_number =0 then l[cv()]+2
 11                                   when tot[ cv()-1 ] + l[cv()]+2 <= 2000 then
 12                                        tot[ cv()-1 ] + l[cv()]+2
 13                                   else l[cv()]+2 end  )
 14  /

PI                  S          L        TOT
---------- ---------- ---------- ----------
000844A0            0         54         56
000844A0            1        103        161
000844A0            2        102        265
000844A0            3         95        362
000844A0            4        102        466
000844A0            5         99        567
000844A0            6         95        664
000844A0            7        102        768
000844A0            8        133        903
000844A0            9        173       1078
000844A0           10        120       1200
000844A0           11        143       1345
000844A0           12        156       1503
000844A0           13        144       1649
000844A0           14        139       1790
000844A0           15        147       1939
000844A0           16        131        133
000844A0           17        132        267
000844A0           18        150        419
000844A0           19        128        549
000844A0           20        120        671
000844A0           21        104        777
000844A0           22        123        902
000844A0           23        104       1008
000844A0           24        101       1111
000844A0           25        123       1236
000844A0           26        104       1342
000844A0           27        147       1491
000844A0           28        100       1593
000844A0           29        150       1745
000844A0           30        117       1864
000844A0           31        129       1995
000844A0           32        120        122
000844A0           33        147        271
000844A0           34        138        411
000844A0           35        147        560
000844A0           36        138        700
000844A0           37        147        849
000844A0           38        138        989
000844A0           39        147       1138
000844A0           40        138       1278
000844A0           41        147       1427
000844A0           42        138       1567
000844A0           43        147       1716
000844A0           44        138       1856
000844A0           45        147        149  <====== no, more greater than 2000, got reset prefectly.
000844A0           46        138        289
000844A0           47        147        438
000844A0           48        138        578
000844A0           49        147        727
000844A0           50        138        867
000844A0           51        147       1016

52 rows selected.

demo@ORA12C>


and the MODEL clause output exactly matches with the pattern matching output, no discrepancy.

demo@ORA12C> select pi,s,l,tot
  2  from t
  3  match_recognize(
  4    partition by pi
  5    order by s
  6    measures
  7      running sum(l+2) as tot
  8    all rows per match
  9    pattern (b1 b2+)
 10    define
 11      b2 as sum(l+2) <= 2000 )
 12  minus
 13  select pi,s+1,l,tot
 14  from t
 15  model
 16    partition by (pi)
 17    dimension by (s-1 s)
 18    measures( l, 0 tot)
 19    rules iterate(1000) until (l[iteration_number+1] is null)
 20    (
 21      tot[iteration_number] = case when iteration_number =0 then l[cv()]+2
 22                                   when tot[ cv()-1 ] + l[cv()]+2 <= 2000 then
 23                                        tot[ cv()-1 ] + l[cv()]+2
 24                                   else l[cv()]+2 end  )
 25  /

no rows selected

demo@ORA12C> 


but still, something is i am missing with Analytics, will look around.
Chris Saxon
September 18, 2017 - 4:27 pm UTC

Yep, I think you need to use model if you want to do something like this pre-12c (from 10g on ;)

Imperative and Declarative

Duke Ganote, September 18, 2017 - 5:29 pm UTC

Prior to MODEL (10gR2), only imperative solutions are available. Afterward, it's
10gR2 => MODEL
11gR2 => recursive SQL
12c => MATCH_RECOGNIZE

WITH part_pi AS (
SELECT ROW_NUMBER()OVER
        (PARTITION BY pi
             ORDER BY s ) r#
     , t.*
  FROM t
),
rCTE ( r#, pi, s, l, summer ) AS (
SELECT pp.r#, pp.pi, pp.s, pp.l
     , pp.l+2
  FROM part_pi pp
 WHERE r# = 1
UNION ALL
SELECT pp.r#, pp.pi, pp.s, pp.l
     , CASE WHEN pp.l+rCTE.summer+2 > 2000
            THEN pp.l+2
            ELSE pp.l+rCTE.summer+2
        END
  FROM part_pi pp
  JOIN rCTE
    ON rCTE.r# + 1 = pp.r#
   AND rCTE.pi     = pp.pi
)
SELECT *
  FROM rCTE
 ORDER BY pi, r#;

R# PI                  S          L     SUMMER
-- ---------- ---------- ---------- ----------
 1 000740A0            1         54         56
 2 000740A0            2        103        161
 3 000740A0            3        102        265
 4 000740A0            4        102        369
 5 000740A0            5        173        544
 6 000740A0            6        174        720
 7 000740A0            7        120        842
 8 000740A0            8        156       1000
 9 000740A0            9        102       1104
10 000740A0           10        119       1225
11 000740A0           11        102       1329
12 000740A0           12         99       1430
13 000740A0           13        119       1551
14 000740A0           14        148       1701
15 000740A0           15        102       1805
16 000740A0           16        100       1907
17 000740A0           17        119        121
18 000740A0           18        100        223
19 000740A0           19         97        322
20 000740A0           20        119        443
21 000740A0           21        100        545
22 000740A0           22        118        665
23 000740A0           23        146        813
24 000740A0           24        120        935
25 000740A0           25        127       1064
26 000740A0           26        124       1190
27 000740A0           27        134       1326
 1 000844A0            1         54         56
 2 000844A0            2        103        161
 3 000844A0            3        102        265
 4 000844A0            4         95        362
 5 000844A0            5        102        466
 6 000844A0            6         99        567
 7 000844A0            7         95        664
 8 000844A0            8        102        768
 9 000844A0            9        133        903
10 000844A0           10        173       1078
11 000844A0           11        120       1200
12 000844A0           12        143       1345
13 000844A0           13        156       1503
14 000844A0           14        144       1649
15 000844A0           15        139       1790
16 000844A0           16        147       1939
17 000844A0           17        131        133
18 000844A0           18        132        267
19 000844A0           19        150        419
20 000844A0           20        128        549
21 000844A0           21        120        671
22 000844A0           22        104        777
23 000844A0           23        123        902
24 000844A0           24        104       1008
25 000844A0           25        101       1111
26 000844A0           26        123       1236
27 000844A0           27        104       1342
28 000844A0           28        147       1491
29 000844A0           29        100       1593
30 000844A0           30        150       1745
31 000844A0           31        117       1864
32 000844A0           32        129       1995
33 000844A0           33        114        116
34 000844A0           34        147        265
35 000844A0           35        138        405
36 000844A0           36        147        554
37 000844A0           37        138        694
38 000844A0           38        147        843
39 000844A0           39        138        983
40 000844A0           40        147       1132
41 000844A0           41        138       1272
42 000844A0           42        147       1421
43 000844A0           43        138       1561
44 000844A0           44        147       1710
45 000844A0           45        138       1850
46 000844A0           46        147       1999
47 000844A0           47        138        140
48 000844A0           48        147        289
49 000844A0           49        138        429
50 000844A0           50        147        578
51 000844A0           51        138        718
52 000844A0           52        147        867


Connor McDonald
September 19, 2017 - 2:04 am UTC

nice stuff

MODEL: avoid ITERATE

Stew Ashton, September 19, 2017 - 9:04 am UTC

ITERATE can be a big drag on performance.
select pi, s, l, sum_l2, bucket
from t
model
  partition by (pi)
  dimension by (row_number() over(partition by pi order by s) rn)
  measures (s, l, l+2 sum_l2, 0 bucket)
  rules(
    sum_l2[rn>1] = l[cv()]+2 +
      case when sum_l2[cv()-1] + l[cv()] <= 2000
           then sum_l2[cv()-1] else 0 end,
    bucket[rn>1] = bucket[cv()-1] +
      case when sum_l2[cv()-1] + l[cv()] <= 2000
           then 0 else 1 end
  )

Chris Saxon
September 19, 2017 - 10:50 am UTC

Thanks for sharing Stew.

@Duke Ganote: prior to 10...

Stew Ashton, September 19, 2017 - 9:36 am UTC

Well, you could do it prior to 10 with a hierarchical query, though a PL/SQL solution might scale much better.
with data as (
  select pi, s, l,
    row_number() over(partition by pi order by s) rn,
    sum(l+2) over(partition by pi order by s) current_sum
  from t
)
, data_joined as (
  select pi, a.rn dad,
    a.current_sum - a.l - 2 prev_sum,
    max(b.rn) son
  from data a
  join data b using(pi)
  where a.rn < b.rn
  and b.current_sum - a.current_sum + a.l + 2 <= 2000
  group by pi, a.rn, a.l, a.current_sum
)
, hier_data as (
  select pi, dad, son, prev_sum, level-1 bucket
  from data_joined
  start with dad = 1
  connect by pi = prior pi and dad = prior son+1
)
select a.pi, a.s, a.l, 
  a.current_sum-b.prev_sum summer, bucket
from data a
join hier_data b on a.pi = b.pi and rn between dad and son;
Best regards, Stew
Chris Saxon
September 19, 2017 - 10:52 am UTC

Nicely done.

@Stew Ashton

Duke Ganote, September 19, 2017 - 5:23 pm UTC

Poof! Mind blown!

pre-MODEL -> Analytics & Cartesian

Duke Ganote, September 21, 2017 - 1:24 pm UTC

I always have the nagging suspicion that analytic functions will solve this problem, but I think my intuition is wrong. The RANGE clause only applies to the ORDER BY value (in our case, "s" rather than the "L" we want to be summed). If only we could look back-and-forth from each record for the range of SUM(L)...

The closest I've got using analytics is a modest modification of Stew Ashton's query. I'm using PRECEDING and FOLLOWING in order to obtain the leading and lagging sums for each record (summation starting from each partition beginning):

WITH-------------------------------------------------------
-- basic_calc: ordering + sums (lag, current, lead)
-----------------------------------------------------------
basic_calc as (
SELECT pi, s, L
     , 2000 AS bucketsize
     --------------------------------------
     -- partition ordering, top and bottom
     --------------------------------------
     , ROW_NUMBER() OVER
             (PARTITION BY pi 
                  ORDER BY s) AS r#
     , COUNT(*) OVER
             (PARTITION BY pi) AS max_r#
     --------------------------------------
     -- partition cumulative sums (lag, current, lead)
     --------------------------------------
     , COALESCE(SUM(L+2) OVER
             (PARTITION BY pi 
                  ORDER BY s
            ROWS BETWEEN UNBOUNDED PRECEDING
                    AND 1 PRECEDING),0) AS LagSum
     , SUM(L+2) OVER
             (PARTITION BY pi 
                  ORDER BY s) AS MySum
     , SUM(L+2) OVER
             (PARTITION BY pi 
                  ORDER BY s
            ROWS BETWEEN UNBOUNDED PRECEDING
                           AND 1 FOLLOWING) AS LeadSum
  FROM t
),---------------------------------------------------------
-- feasibleSET: all the sequential-sums <= 2000 in value
-----------------------------------------------------------
feasibleSET AS (
SELECT dad.pi
     , dad.r# AS dad#
     , dad.LagSum AS prev_sum
     , son.r# AS son#
     , son.Mysum - dad.LagSum AS delta_inclusive
     , son.bucketsize
  FROM basic_calc dad
  JOIN basic_calc son
    ON dad.pi = son.pi -- basically a within-partition Cartesian
 WHERE dad.r# < son.r#
   AND son.MySum   - dad.LagSum <= dad.bucketsize
   AND CASE WHEN son.r# < son.max_r# -- end r# is special case
            THEN CASE WHEN son.LeadSum - dad.LagSum
                             > dad.bucketsize
                      THEN 'True'
                      ELSE 'False'
                 END
            ELSE 'True'
        END = 'True' -- last record before sum exceeds 2000
),---------------------------------------------------------
-- Solution path through the feasible set
-----------------------------------------------------------
hier_data as (
SELECT pi, dad#, son#, prev_sum
     , level-1 AS bucket#
  FROM feasibleSET
 START WITH dad# = 1
 CONNECT BY pi = PRIOR pi 
        AND dad# = PRIOR son#+1
)---------------------------------------------------------
-- results
-----------------------------------------------------------
SELECT a.pi, a.s, a.L
     , a.mysum-b.prev_sum AS summer
--     , bucketsize
  FROM basic_calc a
  JOIN hier_data b 
    ON a.pi = b.pi 
   AND a.r# BETWEEN b.dad# AND b.son#
 ORDER BY 1, 2, 3;

Chris Saxon
September 21, 2017 - 1:30 pm UTC

The problem is you don't know how many rows you need to make 2000 in each bucket. And it depends on how many you allocated to the previous bucket. Which is why you need recursion (if not using model or match_recognize).

Or a cartesian product to calculate every possible grouping. Which ain't gonna scale well...

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.