Hi Tom,
I have a table like below:
GRP,SUBGRP,START_Y,END_Y
122,...
123,A,2010,2011
123,A,2011,2012
123,A,2012,2013
123,A,2013,2014
123,B,2014,2015
123,B,2015,2016
123,B,2016,2017
123,A,2017,2018
123,A,2018,2019
123,A,2019,2020
124,...
I would like to find start and end of all intervals in this table like so:
GRP,SUBGRP,MIN,MAX
122,...
123,A,2010,2014
123,B,2014,2017
123,A,2017,2020
124,...
A simple group by would show the results over the complete timeperiod but not over the different intervals:
GRP,SUBGRP,MIN,MAX
122,...
123,A,2010,2020
123,B,2014,2017
124,...
I think it should be possible with analytic functions but I don't get it.
You need to split this up into groups of consecutive rows first, then group by these groups.
Or you can use pattern matching :)
With match_recognize, you're searching for
- Any row
- Followed by any number of rows where the current start = previous end
Which gives these pattern and define clauses:
pattern ( init consecutive* )
define
consecutive as st = prev ( en )
Giving:
with rws ( grp, subgrp, st, en ) as (
select 123,'A',2010,2011 from dual union all
select 123,'A',2011,2012 from dual union all
select 123,'A',2012,2013 from dual union all
select 123,'A',2013,2014 from dual union all
select 123,'B',2014,2015 from dual union all
select 123,'B',2015,2016 from dual union all
select 123,'B',2016,2017 from dual union all
select 123,'A',2017,2018 from dual union all
select 123,'A',2018,2019 from dual union all
select 123,'A',2019,2020 from dual
)
select * from rws match_recognize (
partition by grp, subgrp
order by st, en
measures
first ( st ) mn,
last ( en ) mx
pattern ( init consecutive* )
define
consecutive as st = prev ( en )
)
order by mn;
GRP SUBGRP MN MX
123 A 2010 2014
123 B 2014 2017
123 A 2017 2020
To use plain analytic functions:
- Calculate a row number for each group & subgroup, sorted by start/end
- Subtract this row# from the start or end value giving your a group
This is known as the Tabibitosan method.
Group by the group calculated by this to get the split you want:
with rws ( grp, subgrp, st, en ) as (
select 123,'A',2010,2011 from dual union all
select 123,'A',2011,2012 from dual union all
select 123,'A',2012,2013 from dual union all
select 123,'A',2013,2014 from dual union all
select 123,'B',2014,2015 from dual union all
select 123,'B',2015,2016 from dual union all
select 123,'B',2016,2017 from dual union all
select 123,'A',2017,2018 from dual union all
select 123,'A',2018,2019 from dual union all
select 123,'A',2019,2020 from dual
), grps as (
select r.*,
st - row_number () over (
partition by grp, subgrp
order by st, en
) g
from rws r
)
select grp, subgrp,
min ( st ), max ( en )
from grps
group by grp, subgrp, g
order by 3;
GRP SUBGRP MIN(ST) MAX(EN)
123 A 2010 2014
123 B 2014 2017
123 A 2017 2020