Skip to Main Content
  • Questions
  • Using analytical functions for time period grouping

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Peter.

Asked: June 10, 2021 - 7:52 am UTC

Last updated: June 11, 2021 - 10:25 am UTC

Version: 19c

Viewed 1000+ times

You Asked

Hi Tom,

I have a table like below:

GRP,SUBGRP,START_Y,END_Y
122,...
123,A,2010,2011
123,A,2011,2012
123,A,2012,2013
123,A,2013,2014
123,B,2014,2015
123,B,2015,2016
123,B,2016,2017
123,A,2017,2018
123,A,2018,2019
123,A,2019,2020
124,...


I would like to find start and end of all intervals in this table like so:

GRP,SUBGRP,MIN,MAX
122,...
123,A,2010,2014
123,B,2014,2017
123,A,2017,2020
124,...


A simple group by would show the results over the complete timeperiod but not over the different intervals:

GRP,SUBGRP,MIN,MAX
122,...
123,A,2010,2020
123,B,2014,2017
124,...


I think it should be possible with analytic functions but I don't get it.

and Chris said...

You need to split this up into groups of consecutive rows first, then group by these groups.

Or you can use pattern matching :)

With match_recognize, you're searching for

- Any row
- Followed by any number of rows where the current start = previous end

Which gives these pattern and define clauses:

    pattern ( init consecutive* )
    define
      consecutive as st = prev ( en )


Giving:

with rws ( grp, subgrp, st, en ) as (
  select 123,'A',2010,2011 from dual union all
  select 123,'A',2011,2012 from dual union all
  select 123,'A',2012,2013 from dual union all
  select 123,'A',2013,2014 from dual union all
  select 123,'B',2014,2015 from dual union all
  select 123,'B',2015,2016 from dual union all
  select 123,'B',2016,2017 from dual union all
  select 123,'A',2017,2018 from dual union all
  select 123,'A',2018,2019 from dual union all
  select 123,'A',2019,2020 from dual 
)
  select * from rws match_recognize (
    partition by grp, subgrp
    order  by st, en
    measures 
      first ( st ) mn,
      last ( en ) mx
    pattern ( init consecutive* )
    define
      consecutive as st = prev ( en )
  )
  order by mn;
  
GRP    SUBGRP     MN      MX     
   123 A        2010    2014 
   123 B        2014    2017 
   123 A        2017    2020 


To use plain analytic functions:

- Calculate a row number for each group & subgroup, sorted by start/end
- Subtract this row# from the start or end value giving your a group

This is known as the Tabibitosan method.

Group by the group calculated by this to get the split you want:

with rws ( grp, subgrp, st, en ) as (
  select 123,'A',2010,2011 from dual union all
  select 123,'A',2011,2012 from dual union all
  select 123,'A',2012,2013 from dual union all
  select 123,'A',2013,2014 from dual union all
  select 123,'B',2014,2015 from dual union all
  select 123,'B',2015,2016 from dual union all
  select 123,'B',2016,2017 from dual union all
  select 123,'A',2017,2018 from dual union all
  select 123,'A',2018,2019 from dual union all
  select 123,'A',2019,2020 from dual 
), grps as (
  select r.*,
         st - row_number () over (
           partition by grp, subgrp
           order  by st, en
         ) g
  from   rws r
)
  select grp, subgrp, 
         min ( st ), max ( en )
  from   grps
  group  by grp, subgrp, g
  order  by 3;
  
GRP    SUBGRP   MIN(ST)   MAX(EN)   
   123 A           2010      2014 
   123 B           2014      2017 
   123 A           2017      2020 

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Another solution using analytic functions only

Raj, June 11, 2021 - 12:09 am UTC

See, the order of records in your original table is never guaranteed unless you have a column guaranteed to sort it like that - for instance a LOAD_DT column on which you can apply ORDER BY.

For you, that ordering column might as well be the START_Y if you think it guarantees ordering.

Create:

CREATE TABLE TEST (
GRP INTEGER,
SUBGRP CHAR(1),
START_Y INTEGER,
END_Y INTEGER,
LOAD_DT DATE
);


Insert:

INSERT INTO TEST VALUES(122,'Z',2009,2010, SYSDATE);
INSERT INTO TEST VALUES(123,'A',2010,2011,SYSDATE);
INSERT INTO TEST VALUES(123,'A',2011,2012,SYSDATE);
INSERT INTO TEST VALUES(123,'A',2012,2013,SYSDATE);
INSERT INTO TEST VALUES(123,'A',2013,2014,SYSDATE);

INSERT INTO TEST VALUES(123,'B',2014,2015,SYSDATE);
INSERT INTO TEST VALUES(123,'B',2015,2016,SYSDATE);
INSERT INTO TEST VALUES(123,'B',2016,2017,SYSDATE);

INSERT INTO TEST VALUES(123,'A',2017,2018,SYSDATE);
INSERT INTO TEST VALUES(123,'A',2018,2019,SYSDATE);
INSERT INTO TEST VALUES(123,'A',2019,2020,SYSDATE);

INSERT INTO TEST VALUES(124,'C',2021,2022,SYSDATE);

COMMIT;


Select:
SELECT * FROM TEST ORDER BY LOAD_DT;


GRP SUBGRP START_Y END_Y LOAD_DT
122 Z 2009 2010 21-06-10
123 A 2010 2011 21-06-10
123 A 2011 2012 21-06-10
123 A 2012 2013 21-06-10
123 A 2013 2014 21-06-10
123 B 2014 2015 21-06-10
123 B 2015 2016 21-06-10
123 B 2016 2017 21-06-10
123 A 2017 2018 21-06-10
123 A 2018 2019 21-06-10
123 A 2019 2020 21-06-10
124 C 2021 2022 21-06-10


SELECT GRP, SUBGRP, FINAL_GRP_START_Y, FINAL_GRP_END_Y FROM (
SELECT GRP, SUBGRP,
--we can't group by GRP, SUBGRP so this funky logic to aggregate using lag/lead again in the outer query
CASE WHEN GRP_START_Y = 0 THEN LAG(GRP_START_Y) OVER (ORDER BY LOAD_DT) ELSE GRP_START_Y END AS FINAL_GRP_START_Y,
CASE WHEN GRP_END_Y = 0 THEN LEAD(GRP_END_Y) OVER (ORDER BY LOAD_DT) ELSE GRP_END_Y END AS FINAL_GRP_END_Y,
LOAD_DT
FROM (
SELECT
GRP,SUBGRP,
<i>--find out START_Y and END_Y at the point where GRP and SUBGRP switches values using lag/lead</i>
CASE 
    WHEN SUBGRP <> nvl(LAG(SUBGRP) OVER (ORDER BY LOAD_DT),'a') 
    OR GRP <> nvl(LEAD(GRP) OVER (ORDER BY LOAD_DT), 0)
    THEN START_Y 
    ELSE 0 
    END AS GRP_START_Y,
CASE 
    WHEN SUBGRP <> nvl(LEAD(SUBGRP) OVER (ORDER BY LOAD_DT),'a') 
    OR GRP <> nvl(LEAD(GRP) OVER (ORDER BY LOAD_DT), 0)
    THEN END_Y 
    ELSE 0 
    END AS GRP_END_Y,
<i>--this is just to see output of lag,lead. Not needed for calculation</i>
LAG(SUBGRP) OVER (ORDER BY LOAD_DT) LAG_OUTPUT,
LEAD(SUBGRP) OVER (ORDER BY LOAD_DT) LEAD_OUTPUT,
START_Y,END_Y,LOAD_DT
FROM TEST
) T
WHERE GRP_START_Y <> GRP_END_Y
)
GROUP BY GRP, SUBGRP, FINAL_GRP_START_Y, FINAL_GRP_END_Y;


LOAD_DT is not extracted into the outermost query to order it by, that's why A's in SUBGRP are in consecutive records but it's doable to order it in the way you like.

122 Z 2009 2010
123 B 2014 2017
123 A 2010 2014
123 A 2017 2020
124 C 2021 2022
123 A 2019 2020

Question Answered

Peter, June 11, 2021 - 9:57 am UTC

Hi Chris,

thanks for your answer.
Exactly what I was searching for :-)
Chris Saxon
June 11, 2021 - 10:25 am UTC

You're welcome :)

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.