Skip to Main Content
  • Questions
  • Smart Running Sum - Reset when total > 100

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Roy.

Asked: September 05, 2016 - 1:20 pm UTC

Last updated: November 08, 2018 - 12:36 am UTC

Version: 3.2.10.09

Viewed 1000+ times

You Asked

Hi,
I m trying to add a culomn which will calculate a running sum of the date diffrents between each consecutive flights (In Minutes) Per Destination but with one tweak:
every time that the running sum reach spesific amount (Prompt Value = 100 Min.) it will restart the calculation from the current row.
To make it easiest to understand i paste before and after table:

Before:
Flight_Date Destination DateDiff-Min
08/06/2016 16:30 Berlin 0
08/06/2016 17:30 Berlin 60
08/06/2016 18:00 Berlin 30
08/06/2016 18:10 Berlin 10
09/06/2016 02:30 Berlin 500
09/06/2016 02:50 Berlin 20
09/06/2016 03:50 Berlin 60
09/06/2016 04:50 Berlin 60
09/06/2016 04:55 Berlin 5
09/06/2016 05:50 Berlin 55
13/06/2016 07:20 Rome 0
13/06/2016 08:50 Rome 90
13/06/2016 09:20 Rome 30
14/06/2016 07:20 Rome 1320

After:

Flight_Date Destination DateDiff-Min Running Sum
08/06/2016 16:30 Berlin 0 0
08/06/2016 17:30 Berlin 60 60
08/06/2016 18:00 Berlin 30 90
08/06/2016 18:10 Berlin 10 100
09/06/2016 02:30 Berlin 500 0
09/06/2016 02:50 Berlin 20 20
09/06/2016 03:50 Berlin 60 80
09/06/2016 04:50 Berlin 60 0
09/06/2016 04:55 Berlin 5 5
09/06/2016 05:50 Berlin 55 60
13/06/2016 07:20 Rome 0 0
13/06/2016 08:50 Rome 90 90
13/06/2016 09:20 Rome 30 0
14/06/2016 07:20 Rome 1320 1320

Note that Every time the running sum pass 100 min. it restart and calculate the running sum from the restart point.
Thank you for your time and patience and hope that my description was clear enought.
Roy

and Connor said...

So the rule is:

- Find the number of minutes between flights from the same destination
- When the running total of these is greater than 100, start the total from zero again?

If so, I'm not following why the final Rome flight has a value of 1320. Surely this should be zero, because you've gone over the 100 limit?

Anyway, here's one way to do it:

- Assign a row number to each destination, ordered by date
- Start with row = 1 for each destination
- Recursively work down the results. Each time increment the running total, provided it's <= 100. Otherwise start again from zero

I've done this with recursive subquery factoring (CTEs):

create table t (Flight_Date date, Destination varchar2(6), DateDiffMin int);
alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';
insert into t values ('2016-06-08 16:30:00', 'Berlin', 0);
insert into t values ('2016-06-08 17:30:00', 'Berlin', 60);
insert into t values ('2016-06-08 18:00:00', 'Berlin', 30);
insert into t values ('2016-06-08 18:10:00', 'Berlin', 10);
insert into t values ('2016-06-09 02:30:00', 'Berlin', 500);
insert into t values ('2016-06-09 02:50:00', 'Berlin', 20);
insert into t values ('2016-06-09 03:50:00', 'Berlin', 60);
insert into t values ('2016-06-09 04:50:00', 'Berlin', 60);
insert into t values ('2016-06-09 04:55:00', 'Berlin', 5);
insert into t values ('2016-06-09 05:50:00', 'Berlin', 55);
insert into t values ('2016-06-13 07:20:00', 'Rome', 0);
insert into t values ('2016-06-13 08:50:00', 'Rome', 90);
insert into t values ('2016-06-13 09:20:00', 'Rome', 30);
insert into t values ('2016-06-14 07:20:00', 'Rome', 1320);

with rws as (
  select t.*, 
         row_number() over (partition by destination order by flight_date) rn
  from   t
), tots (rn, flight_date, dest, DateDiffMin, tot) as (
  select rn, flight_date, destination, DateDiffMin, 0 from rws
  where  rn = 1
  union all
  select r.rn, r.flight_date, r.destination, r.DateDiffMin, 
         case when t.tot + r.DateDiffMin > 100 then
           0 else t.tot + r.DateDiffMin
         end
  from   tots t
  join   rws r
  on     r.rn = t.rn + 1
  and    t.dest = r.destination
)
  select * from tots
  order  by flight_date, dest;

RN  FLIGHT_DATE          DEST    DATEDIFFMIN  TOT  
1   2016-06-08 16:30:00  Berlin  0            0    
2   2016-06-08 17:30:00  Berlin  60           60   
3   2016-06-08 18:00:00  Berlin  30           90   
4   2016-06-08 18:10:00  Berlin  10           100  
5   2016-06-09 02:30:00  Berlin  500          0    
6   2016-06-09 02:50:00  Berlin  20           20   
7   2016-06-09 03:50:00  Berlin  60           80   
8   2016-06-09 04:50:00  Berlin  60           0    
9   2016-06-09 04:55:00  Berlin  5            5    
10  2016-06-09 05:50:00  Berlin  55           60   
1   2016-06-13 07:20:00  Rome    0            0    
2   2016-06-13 08:50:00  Rome    90           90   
3   2016-06-13 09:20:00  Rome    30           0    
4   2016-06-14 07:20:00  Rome    1,320        0 


If you want to reset at a different duration, e.g. 60 minutes, just change the condition in the case.

Note you don't need to store the time difference! You could calculate it in the first with clause alongside the rownum using lag:

select t.*, 
       row_number() over (partition by destination order by flight_date) rn,
       round((flight_date - lag(flight_date, 1, flight_date) over (
         partition by destination order by flight_date)
       ) * 1440) lg
from   t;

FLIGHT_DATE          DESTINATION  DATEDIFFMIN  RN  LG     
2016-06-08 16:30:00  Berlin       0            1   0      
2016-06-08 17:30:00  Berlin       60           2   60     
2016-06-08 18:00:00  Berlin       30           3   30     
2016-06-08 18:10:00  Berlin       10           4   10     
2016-06-09 02:30:00  Berlin       500          5   500    
2016-06-09 02:50:00  Berlin       20           6   20     
2016-06-09 03:50:00  Berlin       60           7   60     
2016-06-09 04:50:00  Berlin       60           8   60     
2016-06-09 04:55:00  Berlin       5            9   5      
2016-06-09 05:50:00  Berlin       55           10  55     
2016-06-13 07:20:00  Rome         0            1   0      
2016-06-13 08:50:00  Rome         90           2   90     
2016-06-13 09:20:00  Rome         30           3   30     
2016-06-14 07:20:00  Rome         1,320        4   1,320 

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Using Pattern matching

Rajeshwaran Jeyabal, September 06, 2016 - 2:43 pm UTC

Since database version is not mentioned, i believe they will be on 12c - if so using pattern matching.
demo@ORA12C> select * from t
  2  match_recognize(
  3    partition by destination
  4    order by flight_date
  5    measures
  6      match_number() as mno,
  7      classifier() as cls,
  8      nvl(sum(down.DateDiffMin),0) as tot
  9    all rows per match
 10    pattern( strt down* )
 11    define
 12      down as sum(down.DateDiffMin) <= 100 )
 13  /

DESTIN FLIGHT_DATE                MNO CLS               TOT DATEDIFFMIN
------ ------------------- ---------- ---------- ---------- -----------
Berlin 2016-06-08 16:30:00          1 STRT                0           0
Berlin 2016-06-08 17:30:00          1 DOWN               60          60
Berlin 2016-06-08 18:00:00          1 DOWN               90          30
Berlin 2016-06-08 18:10:00          1 DOWN              100          10
Berlin 2016-06-09 02:30:00          2 STRT                0         500
Berlin 2016-06-09 02:50:00          2 DOWN               20          20
Berlin 2016-06-09 03:50:00          2 DOWN               80          60
Berlin 2016-06-09 04:50:00          3 STRT                0          60
Berlin 2016-06-09 04:55:00          3 DOWN                5           5
Berlin 2016-06-09 05:50:00          3 DOWN               60          55
Rome   2016-06-13 07:20:00          1 STRT                0           0
Rome   2016-06-13 08:50:00          1 DOWN               90          90
Rome   2016-06-13 09:20:00          2 STRT                0          30
Rome   2016-06-14 07:20:00          3 STRT                0        1320

14 rows selected.

demo@ORA12C>

Chris Saxon
September 06, 2016 - 3:50 pm UTC

i believe they will be on 12c

Much as I'd love this to be true, the majority are still on 11g and earlier.

The pattern matching solution is neat though.

or MODEL it

Duke Ganote, September 06, 2016 - 4:15 pm UTC

Or, back in the wayback machine, there's MODEL (although I'm using 12c so I get the easy column-naming for the initial query S):

WITH s ( Flight_Date,         Destination, DateDiffMin ) AS (               SELECT
         '2016-06-08 16:30:00', 'Berlin',    0          FROM DUAL UNION ALL SELECT
         '2016-06-08 17:30:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-08 18:00:00', 'Berlin',   30          FROM DUAL UNION ALL SELECT
         '2016-06-08 18:10:00', 'Berlin',   10          FROM DUAL UNION ALL SELECT
         '2016-06-09 02:30:00', 'Berlin',  500          FROM DUAL UNION ALL SELECT
         '2016-06-09 02:50:00', 'Berlin',   20          FROM DUAL UNION ALL SELECT
         '2016-06-09 03:50:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-09 04:50:00', 'Berlin',   60          FROM DUAL UNION ALL SELECT
         '2016-06-09 04:55:00', 'Berlin',    5          FROM DUAL UNION ALL SELECT
         '2016-06-09 05:50:00', 'Berlin',   55          FROM DUAL UNION ALL SELECT
         '2016-06-13 07:20:00', 'Rome'  ,    0          FROM DUAL UNION ALL SELECT
         '2016-06-13 08:50:00', 'Rome'  ,   90          FROM DUAL UNION ALL SELECT
         '2016-06-13 09:20:00', 'Rome'  ,   30          FROM DUAL UNION ALL SELECT
         '2016-06-14 07:20:00', 'Rome'  , 1320          FROM DUAL
), t AS (
SELECT TO_DATE(Flight_Date,'YYYY-MM-DD HH24:MI:SS') flight_date
     , destination, dateDiffMin
  FROM s
), rws as (
  select t.*,
         row_number() over (partition by destination order by flight_date) rn
  from   t
)
SELECT rn,Flight_date, destination, dateDiffMin, grp, inc_sum, inc_grp
  FROM rws
 MODEL
PARTITION BY (destination)
DIMENSION BY (rn)
MEASURES
( Flight_Date,  DateDiffMin--, destination
, 0 as tot, 0 as grp, 0 as inc_sum, 0 as inc_grp )
RULES AUTOMATIC ORDER
(tot[1]   = dateDiffMin[CV()]
,tot[rn>1] = CASE WHEN DateDiffMin[CV()]+tot[CV()-1] <= 100
                  THEN DateDiffMin[CV()]+tot[CV()-1]
                  ELSE DateDiffMin[CV()]
             END
,grp[1]       = 1
,grp[rn>1] = CASE WHEN DateDiffMin[CV()]+tot[CV()-1] <= 100
                  THEN grp[CV()-1]
                  ELSE grp[CV()-1]+1
             END
,inc_sum[1]   = DateDiffMin[CV()]
,inc_sum[rn>1] = DateDiffMin[CV()]+inc_sum[CV()-1]
,inc_grp[any] = trunc(inc_sum[cv()]/100)+1
)
order by destination, rn;

   RN FLIGHT_DATE         DESTIN DATEDIFFMIN        GRP    INC_SUM    INC_GRP
----- ------------------- ------ ----------- ---------- ---------- ----------
    1 2016-06-08 16:30:00 Berlin           0          1          0          1
    2 2016-06-08 17:30:00 Berlin          60          1         60          1
    3 2016-06-08 18:00:00 Berlin          30          1         90          1
    4 2016-06-08 18:10:00 Berlin          10          1        100          2
    5 2016-06-09 02:30:00 Berlin         500          2        600          7
    6 2016-06-09 02:50:00 Berlin          20          3        620          7
    7 2016-06-09 03:50:00 Berlin          60          3        680          7
    8 2016-06-09 04:50:00 Berlin          60          4        740          8
    9 2016-06-09 04:55:00 Berlin           5          4        745          8
   10 2016-06-09 05:50:00 Berlin          55          5        800          9
    1 2016-06-13 07:20:00 Rome             0          1          0          1
    2 2016-06-13 08:50:00 Rome            90          1         90          1
    3 2016-06-13 09:20:00 Rome            30          2        120          2
    4 2016-06-14 07:20:00 Rome          1320          3       1440         15

Chris Saxon
September 07, 2016 - 2:09 am UTC

nice touch

Excellent !!

A reader, September 06, 2016 - 7:52 pm UTC

Hey Chris,
First let me thank you for your quick and professional solution, Its work great !
As for your remark about the last row in Rome (1320) you are right, its supposed to be 0 as you wrote.
But (and i hope i m not exaggerate. .)
I plan to use this code for derive table in bo 3.1 universe and i would like to know if there is any way to get the same result without using recursive subquery?
Thank you again !
Chris Saxon
September 07, 2016 - 2:32 am UTC

glad we could help

why no recursive subquery ?

Rajeshwaran Jeyabal, September 07, 2016 - 2:55 am UTC

i would like to know if there is any way to get the same result without using recursive subquery?

Why don't you like recursive subquery?

The above MODEL clause should work from 10g and above.

but in case of huge dataset's stay away from model, they dont scale.

https://community.oracle.com/ideas/13061
Connor McDonald
September 07, 2016 - 9:24 am UTC

As they said, Business Objects doesn't like the with clause. I believe you can get around this using derived tables placing the whole thing in an inline view, e.g.:

select * from (
  with rws as (
    select t.*, 
           row_number() over (partition by destination order by flight_date) rn
    from   t
  ), tots (rn, flight_date, dest, DateDiffMin, tot) as (
    select rn, flight_date, destination, DateDiffMin, 0 from rws
    where  rn = 1
    union all
    select r.rn, r.flight_date, r.destination, r.DateDiffMin, 
           case when t.tot + r.DateDiffMin > 100 then
             0 else t.tot + r.DateDiffMin
           end
    from   tots t
    join   rws r
    on     r.rn = t.rn + 1
    and    t.dest = r.destination
  )
    select * from tots
)


But my BOBJ knowledge is rusty, so don't hold me to this!

Unable to execute it

niraj sharma, November 07, 2018 - 10:15 pm UTC

I am using Redshift and have similar requirement to be addressed.
after running your query I am getting an error
"ERROR: 42P01: relation "tots" does not exist"

can you please advise
Connor McDonald
November 08, 2018 - 12:36 am UTC

You'd need to visit: asktom.redshift.com for that :-)

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.