Skip to Main Content
  • Questions
  • Smart Running Sum - Reset when total > 100

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Roy.

Asked: September 05, 2016 - 1:20 pm UTC

Last updated: August 25, 2022 - 1:21 pm UTC

Version: 3.2.10.09

Viewed 10K+ times! This question is

You Asked

Hi,
I m trying to add a culomn which will calculate a running sum of the date diffrents between each consecutive flights (In Minutes) Per Destination but with one tweak:
every time that the running sum reach spesific amount (Prompt Value = 100 Min.) it will restart the calculation from the current row.
To make it easiest to understand i paste before and after table:

Before:
Flight_Date Destination DateDiff-Min
08/06/2016 16:30 Berlin 0
08/06/2016 17:30 Berlin 60
08/06/2016 18:00 Berlin 30
08/06/2016 18:10 Berlin 10
09/06/2016 02:30 Berlin 500
09/06/2016 02:50 Berlin 20
09/06/2016 03:50 Berlin 60
09/06/2016 04:50 Berlin 60
09/06/2016 04:55 Berlin 5
09/06/2016 05:50 Berlin 55
13/06/2016 07:20 Rome 0
13/06/2016 08:50 Rome 90
13/06/2016 09:20 Rome 30
14/06/2016 07:20 Rome 1320

After:

Flight_Date Destination DateDiff-Min Running Sum
08/06/2016 16:30 Berlin 0 0
08/06/2016 17:30 Berlin 60 60
08/06/2016 18:00 Berlin 30 90
08/06/2016 18:10 Berlin 10 100
09/06/2016 02:30 Berlin 500 0
09/06/2016 02:50 Berlin 20 20
09/06/2016 03:50 Berlin 60 80
09/06/2016 04:50 Berlin 60 0
09/06/2016 04:55 Berlin 5 5
09/06/2016 05:50 Berlin 55 60
13/06/2016 07:20 Rome 0 0
13/06/2016 08:50 Rome 90 90
13/06/2016 09:20 Rome 30 0
14/06/2016 07:20 Rome 1320 1320

Note that Every time the running sum pass 100 min. it restart and calculate the running sum from the restart point.
Thank you for your time and patience and hope that my description was clear enought.
Roy

and Chris said...

So the rule is:

- Find the number of minutes between flights from the same destination
- When the running total of these is greater than 100, start the total from zero again?

If so, I'm not following why the final Rome flight has a value of 1320. Surely this should be zero, because you've gone over the 100 limit?

Anyway, here's one way to do it:

- Assign a row number to each destination, ordered by date
- Start with row = 1 for each destination
- Recursively work down the results. Each time increment the running total, provided it's <= 100. Otherwise start again from zero

I've done this with recursive subquery factoring (CTEs):

create table t (Flight_Date date, Destination varchar2(6), DateDiffMin int);
alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';
insert into t values ('2016-06-08 16:30:00', 'Berlin', 0);
insert into t values ('2016-06-08 17:30:00', 'Berlin', 60);
insert into t values ('2016-06-08 18:00:00', 'Berlin', 30);
insert into t values ('2016-06-08 18:10:00', 'Berlin', 10);
insert into t values ('2016-06-09 02:30:00', 'Berlin', 500);
insert into t values ('2016-06-09 02:50:00', 'Berlin', 20);
insert into t values ('2016-06-09 03:50:00', 'Berlin', 60);
insert into t values ('2016-06-09 04:50:00', 'Berlin', 60);
insert into t values ('2016-06-09 04:55:00', 'Berlin', 5);
insert into t values ('2016-06-09 05:50:00', 'Berlin', 55);
insert into t values ('2016-06-13 07:20:00', 'Rome', 0);
insert into t values ('2016-06-13 08:50:00', 'Rome', 90);
insert into t values ('2016-06-13 09:20:00', 'Rome', 30);
insert into t values ('2016-06-14 07:20:00', 'Rome', 1320);

with rws as (
  select t.*, 
         row_number() over (partition by destination order by flight_date) rn
  from   t
), tots (rn, flight_date, dest, DateDiffMin, tot) as (
  select rn, flight_date, destination, DateDiffMin, 0 from rws
  where  rn = 1
  union all
  select r.rn, r.flight_date, r.destination, r.DateDiffMin, 
         case when t.tot + r.DateDiffMin > 100 then
           0 else t.tot + r.DateDiffMin
         end
  from   tots t
  join   rws r
  on     r.rn = t.rn + 1
  and    t.dest = r.destination
)
  select * from tots
  order  by flight_date, dest;

RN  FLIGHT_DATE          DEST    DATEDIFFMIN  TOT  
1   2016-06-08 16:30:00  Berlin  0            0    
2   2016-06-08 17:30:00  Berlin  60           60   
3   2016-06-08 18:00:00  Berlin  30           90   
4   2016-06-08 18:10:00  Berlin  10           100  
5   2016-06-09 02:30:00  Berlin  500          0    
6   2016-06-09 02:50:00  Berlin  20           20   
7   2016-06-09 03:50:00  Berlin  60           80   
8   2016-06-09 04:50:00  Berlin  60           0    
9   2016-06-09 04:55:00  Berlin  5            5    
10  2016-06-09 05:50:00  Berlin  55           60   
1   2016-06-13 07:20:00  Rome    0            0    
2   2016-06-13 08:50:00  Rome    90           90   
3   2016-06-13 09:20:00  Rome    30           0    
4   2016-06-14 07:20:00  Rome    1,320        0 


If you want to reset at a different duration, e.g. 60 minutes, just change the condition in the case.

Note you don't need to store the time difference! You could calculate it in the first with clause alongside the rownum using lag:

select t.*, 
       row_number() over (partition by destination order by flight_date) rn,
       round((flight_date - lag(flight_date, 1, flight_date) over (
         partition by destination order by flight_date)
       ) * 1440) lg
from   t;

FLIGHT_DATE          DESTINATION  DATEDIFFMIN  RN  LG     
2016-06-08 16:30:00  Berlin       0            1   0      
2016-06-08 17:30:00  Berlin       60           2   60     
2016-06-08 18:00:00  Berlin       30           3   30     
2016-06-08 18:10:00  Berlin       10           4   10     
2016-06-09 02:30:00  Berlin       500          5   500    
2016-06-09 02:50:00  Berlin       20           6   20     
2016-06-09 03:50:00  Berlin       60           7   60     
2016-06-09 04:50:00  Berlin       60           8   60     
2016-06-09 04:55:00  Berlin       5            9   5      
2016-06-09 05:50:00  Berlin       55           10  55     
2016-06-13 07:20:00  Rome         0            1   0      
2016-06-13 08:50:00  Rome         90           2   90     
2016-06-13 09:20:00  Rome         30           3   30     
2016-06-14 07:20:00  Rome         1,320        4   1,320 

Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Using Pattern matching

Rajeshwaran Jeyabal, September 06, 2016 - 2:43 pm UTC

Since database version is not mentioned, i believe they will be on 12c - if so using pattern matching.
demo@ORA12C> select * from t
  2  match_recognize(
  3    partition by destination
  4    order by flight_date
  5    measures
  6      match_number() as mno,
  7      classifier() as cls,
  8      nvl(sum(down.DateDiffMin),0) as tot
  9    all rows per match
 10    pattern( strt down* )
 11    define
 12      down as sum(down.DateDiffMin) <= 100 )
 13  /

DESTIN FLIGHT_DATE                MNO CLS               TOT DATEDIFFMIN
------ ------------------- ---------- ---------- ---------- -----------
Berlin 2016-06-08 16:30:00          1 STRT                0           0
Berlin 2016-06-08 17:30:00          1 DOWN               60          60
Berlin 2016-06-08 18:00:00          1 DOWN               90          30
Berlin 2016-06-08 18:10:00          1 DOWN              100          10
Berlin 2016-06-09 02:30:00          2 STRT                0         500
Berlin 2016-06-09 02:50:00          2 DOWN               20          20
Berlin 2016-06-09 03:50:00          2 DOWN               80          60
Berlin 2016-06-09 04:50:00          3 STRT                0          60
Berlin 2016-06-09 04:55:00          3 DOWN                5           5
Berlin 2016-06-09 05:50:00          3 DOWN               60          55
Rome   2016-06-13 07:20:00          1 STRT                0           0
Rome   2016-06-13 08:50:00          1 DOWN               90          90
Rome   2016-06-13 09:20:00          2 STRT                0          30
Rome   2016-06-14 07:20:00          3 STRT                0        1320

14 rows selected.

demo@ORA12C>

Chris Saxon
September 06, 2016 - 3:50 pm UTC

i believe they will be on 12c

Much as I'd love this to be true, the majority are still on 11g and earlier.

The pattern matching solution is neat though.

or MODEL it

Duke Ganote, September 06, 2016 - 4:15 pm UTC

Or, back in the wayback machine, there's MODEL (although I'm using 12c so I get the easy column-naming for the initial query S):

WITH s ( Flight_Date,         Destination, DateDiffMin ) AS (               SELECT
         '2016-06-08 16:30:00', 'Berlin',    0          FROM DUAL UNION ALL SELECT
         '2016-06-08 17:30:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-08 18:00:00', 'Berlin',   30          FROM DUAL UNION ALL SELECT
         '2016-06-08 18:10:00', 'Berlin',   10          FROM DUAL UNION ALL SELECT
         '2016-06-09 02:30:00', 'Berlin',  500          FROM DUAL UNION ALL SELECT
         '2016-06-09 02:50:00', 'Berlin',   20          FROM DUAL UNION ALL SELECT
         '2016-06-09 03:50:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-09 04:50:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-09 04:55:00', 'Berlin',    5          FROM DUAL UNION ALL SELECT
         '2016-06-09 05:50:00', 'Berlin',   55          FROM DUAL UNION ALL SELECT
         '2016-06-13 07:20:00', 'Rome'  ,    0          FROM DUAL UNION ALL SELECT
         '2016-06-13 08:50:00', 'Rome'  ,   90          FROM DUAL UNION ALL SELECT
         '2016-06-13 09:20:00', 'Rome'  ,   30          FROM DUAL UNION ALL SELECT
         '2016-06-14 07:20:00', 'Rome'  , 1320          FROM DUAL
), t AS (
SELECT TO_DATE(Flight_Date,'YYYY-MM-DD HH24:MI:SS') flight_date
     , destination, dateDiffMin
  FROM s
), rws as (
  select t.*,
         row_number() over (partition by destination order by flight_date) rn
  from   t
)
SELECT rn,Flight_date, destination, dateDiffMin, grp, inc_sum, inc_grp
  FROM rws
 MODEL
PARTITION BY (destination)
DIMENSION BY (rn)
MEASURES
( Flight_Date,  DateDiffMin--, destination
, 0 as tot, 0 as grp, 0 as inc_sum, 0 as inc_grp )
RULES AUTOMATIC ORDER
(tot[1]   = dateDiffMin[CV()]
,tot[rn>1] = CASE WHEN DateDiffMin[CV()]+tot[CV()-1] <= 100
                  THEN DateDiffMin[CV()]+tot[CV()-1]
                  ELSE DateDiffMin[CV()]
             END
,grp[1]       = 1
,grp[rn>1] = CASE WHEN DateDiffMin[CV()]+tot[CV()-1] <= 100
                  THEN grp[CV()-1]
                  ELSE grp[CV()-1]+1
             END
,inc_sum[1]   = DateDiffMin[CV()]
,inc_sum[rn>1] = DateDiffMin[CV()]+inc_sum[CV()-1]
,inc_grp[any] = trunc(inc_sum[cv()]/100)+1
)
order by destination, rn;

   RN FLIGHT_DATE         DESTIN DATEDIFFMIN        GRP    INC_SUM    INC_GRP
----- ------------------- ------ ----------- ---------- ---------- ----------
    1 2016-06-08 16:30:00 Berlin           0          1          0          1
    2 2016-06-08 17:30:00 Berlin          60          1         60          1
    3 2016-06-08 18:00:00 Berlin          30          1         90          1
    4 2016-06-08 18:10:00 Berlin          10          1        100          2
    5 2016-06-09 02:30:00 Berlin         500          2        600          7
    6 2016-06-09 02:50:00 Berlin          20          3        620          7
    7 2016-06-09 03:50:00 Berlin          60          3        680          7
    8 2016-06-09 04:50:00 Berlin          60          4        740          8
    9 2016-06-09 04:55:00 Berlin           5          4        745          8
   10 2016-06-09 05:50:00 Berlin          55          5        800          9
    1 2016-06-13 07:20:00 Rome             0          1          0          1
    2 2016-06-13 08:50:00 Rome            90          1         90          1
    3 2016-06-13 09:20:00 Rome            30          2        120          2
    4 2016-06-14 07:20:00 Rome          1320          3       1440         15

Chris Saxon
September 07, 2016 - 2:09 am UTC

nice touch

Excellent !!

A reader, September 06, 2016 - 7:52 pm UTC

Hey Chris,
First let me thank you for your quick and professional solution, Its work great !
As for your remark about the last row in Rome (1320) you are right, its supposed to be 0 as you wrote.
But (and i hope i m not exaggerate. .)
I plan to use this code for derive table in bo 3.1 universe and i would like to know if there is any way to get the same result without using recursive subquery?
Thank you again !
Chris Saxon
September 07, 2016 - 2:32 am UTC

glad we could help

why no recursive subquery ?

Rajeshwaran Jeyabal, September 07, 2016 - 2:55 am UTC

i would like to know if there is any way to get the same result without using recursive subquery?

Why don't you like recursive subquery?

The above MODEL clause should work from 10g and above.

but in case of huge dataset's stay away from model, they dont scale.

https://community.oracle.com/ideas/13061
Connor McDonald
September 07, 2016 - 9:24 am UTC

As they said, Business Objects doesn't like the with clause. I believe you can get around this using derived tables placing the whole thing in an inline view, e.g.:

select * from (
  with rws as (
    select t.*, 
           row_number() over (partition by destination order by flight_date) rn
    from   t
  ), tots (rn, flight_date, dest, DateDiffMin, tot) as (
    select rn, flight_date, destination, DateDiffMin, 0 from rws
    where  rn = 1
    union all
    select r.rn, r.flight_date, r.destination, r.DateDiffMin, 
           case when t.tot + r.DateDiffMin > 100 then
             0 else t.tot + r.DateDiffMin
           end
    from   tots t
    join   rws r
    on     r.rn = t.rn + 1
    and    t.dest = r.destination
  )
    select * from tots
)


But my BOBJ knowledge is rusty, so don't hold me to this!

Unable to execute it

niraj sharma, November 07, 2018 - 10:15 pm UTC

I am using Redshift and have similar requirement to be addressed.
after running your query I am getting an error
"ERROR: 42P01: relation "tots" does not exist"

can you please advise
Connor McDonald
November 08, 2018 - 12:36 am UTC

You'd need to visit: asktom.redshift.com for that :-)

How to create a bucket based on a specific limit

Ankit B, August 24, 2022 - 11:17 am UTC

I have data something like below: (DD-MM-YY)

Dt1 Dt2 FIscalMonth DaysDiff (Dt2- Dt1)
22-10-21 29-10-21 Oct21 8
30-10-21 26-11-21 Nov21 28
27-11-21 24-12-21 Dec21 28
25-12-21 28-01-22 Jan22 35
29-01-22 25-02-22 Feb22 28
26-02-22 25-03-22 Mar22 28
26-03-22 29-04-22 Apr22 35
30-04-22 27-05-22 May22 28
28-05-22 24-06-22 Jun22 28
25-06-22 29-07-22 Jul22 35
30-07-22 13-08-22 Aug22 15


Required output:
(MMYY)
Bucket 1021 1121 1221 0122 0222 0322 0422 0522 0622 0722 822
L90 8 28 28 26
L91-180 9 28 28 25
L180+ 10 28 28 35 15


**
I need to have 3bucket 90, 90-180, 180above.
Month start to fill in these bucket, the moment sum becomes 90 in a row.
The remaining days in that month move to the other bucket and the process follow.(Carefully observe jan22 and april 22)

Chris Saxon
August 24, 2022 - 12:46 pm UTC

I'm struggling to interpret the data - please post it in the form of create table + insert into statements.

That said, you probably want to use pattern matching for this. This can group rows up to some total, e.g.:

select * from ... match_recognize (
  order by ...
  measures ...
  pattern ( ninety+ )
  define
    ninety as sum ( ... ) <= 90
)

Split days of month into different bucket once limit is reached.

Ankit B, August 24, 2022 - 11:43 am UTC

Data:
Date1 date2 fis_month day_diff
22-10-21 29-10-21 01-10-21 8
30-10-21 26-11-21 01-11-21 28
27-11-21 24-12-21 01-12-21 28
25-12-21 28-01-22 01-01-22 35
29-01-22 25-02-22 01-02-22 28
26-02-22 25-03-22 01-03-22 28
26-03-22 29-04-22 01-04-22 35
30-04-22 27-05-22 01-05-22 28
28-05-22 24-06-22 01-06-22 28
25-06-22 29-07-22 01-07-22 35
30-07-22 13-08-22 01-08-22 15

Output required:

Months Bkt90 bkt90-180 Bkt180+
Oct21 8
Nov21 28
Dec21 28
Jan22 26 9
Feb22 28
Mar22 28
Apr22 25 10
May22 28
Jun22 28
jul22 35
aug22 15


In a coulmn, once the sum of days becomes 90, it moves to other column. oct21 = 8days, nov21 = 28 days (oct+nov 36days), dec21 = 28 days (oct+nov+dec (8+28+28=64)days), when jan22 comes (35days), sum becomes (8+28+28+35=99), so we need to adjust jan22 data, we keep 26 in bkt90 and move 9(35-26) into bucket 90-180. and the process continue..The last bucket keeps everying no limit check of 90


Ankit B, August 24, 2022 - 3:08 pm UTC

Sample Data:
|    date1  |    date2 | fiscal mon | days diff |
|-----------|----------|-----------|----------|
| 22-10-21  |29-10-21  |  Oct21    |   8      |
| 30-10-21  |26-11-21  |  Nov21    |   28     |
| 27-11-21  |24-12-21  |  Dec21    |   28     |
| 25-12-21  |28-01-22  |  Jan22    |   35     |
| 29-01-22  |25-02-22  |  Feb22    |   28     |
| 26-02-22  |25-03-22  |  Mar22    |   28     |
| 26-03-22  |29-04-22  |  Apr22    |   35     |
| 30-04-22  |27-05-22  |  May22    |   28     |
| 28-05-22  |24-06-22  |  Jun22    |   28     |
| 25-06-22  |29-07-22  |  Jul22    |   35     |
| 30-07-22  |13-08-22  |  Aug22    |   15     |


Required Output:

| Month   |Oct21|Nov21|Dec21|Jan22|Feb22|Mar22|Apr22|May22|Jun22|Jul22|Aug22|
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|Bkt90    |  8  | 28  | 28  |  26 |     |     |     |     |     |     |     |
|Bkt90-180|     |     |     |   9 | 28  |  28 | 25  |     |     |     |     |
|Bkt180+  |     |     |     |     |     |     | 10  | 28  | 28  |  35 |  15 |


I am creating 3 buckets 90, 90-180, 180+
Based on fiscal mon, days diff data must flow to these buckets. Once sum of days in a month in a bucket reaches 90, the remaining days in a month moves to other bucket.



In a row, once the sum of days becomes 90, it moves to other row.
Eg oct21 = 8days
   nov21 = 28 days (oct and November 36days)
   dec21 = 28 days (oct , nov and dec (8+28+28=64)days
 when jan22 comes (35days), sum becomes (8+28+28+35=99), so we need to adjust jan22 data, we keep 26 in bkt90 (so that that row sum is 90)
and move 9(35-26) into bucket 90-180. and the process continue..(pay attention to jan22 and april

The last bucket keeps everything no limit check of 90.

Chris Saxon
August 25, 2022 - 1:21 pm UTC

One way to do this is to adapt the recursive with solution at the top of this thread - check the running total and reset as needed.

You'll need to duplicate rows when the total is >= 90 too.

Then pivot the result of this.

Ankit B, August 24, 2022 - 3:23 pm UTC

Hi Chris,

Thank you for your response

Not sure if there is a way to add image here.
But you can find the question with a proper format here:
https://stackoverflow.com/questions/73475658/split-month-days-onto-multiple-row-once-that-row-meet-a-limit-say-each-bucket-l
Chris Saxon
August 25, 2022 - 1:00 pm UTC

The "proper format" is

- create table statement
- insert into statement

NOT pictures or a table of the output.

Please provide data in that format.

to Ankit B

not Smart running sum, August 26, 2022 - 8:26 am UTC

set numwidth 5
with t(m, x) as (
  select rownum, column_value from table(sys.odcinumberlist(8,28,28,35,28,28,35,28,28,35,15))
),
u(z1, z2) as (
  select (level-1)*90, level*90+nullif(connect_by_isleaf,1) from dual connect by level<=3
)
select *
from (
  select z1 mm, m,
    least(x,x-(xx-nvl(z2,xx)),xx-z1) y
  from (
    select t.*,
       sum(x) over(order by m) xx
    from t
  ) left join u on xx > z1 and xx <= nvl(z2+x,xx)
) pivot (max(y) for m in (1,2,3,4,5,6,7,8,9,10,11,12))
order by 1;

   MM     1     2     3     4     5     6     7     8     9    10    11    12
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    0     8    28    28    26                                                
   90                       9    28    28    25                              
  180                                        10    28    28    35    15      

Probably wrong resut

Asim, September 11, 2022 - 1:44 pm UTC

Chris, up in this discussion, on,September 07, 2016 - 2:09 am UTC, you replied "nice touch" to dukes MODEL solution.

But arent the results wrong? At the value of berlin, datediff of 500, the sum should be zero not 600?

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.