## Question and Answer

## You Asked

Hi Tom,

I have a table like below:

I would like to find start and end of all intervals in this table like so:

A simple group by would show the results over the complete timeperiod but not over the different intervals:

I think it should be possible with analytic functions but I don't get it.

I have a table like below:

GRP,SUBGRP,START_Y,END_Y 122,... 123,A,2010,2011 123,A,2011,2012 123,A,2012,2013 123,A,2013,2014 123,B,2014,2015 123,B,2015,2016 123,B,2016,2017 123,A,2017,2018 123,A,2018,2019 123,A,2019,2020 124,...

I would like to find start and end of all intervals in this table like so:

GRP,SUBGRP,MIN,MAX 122,... 123,A,2010,2014 123,B,2014,2017 123,A,2017,2020 124,...

A simple group by would show the results over the complete timeperiod but not over the different intervals:

GRP,SUBGRP,MIN,MAX 122,... 123,A,2010,2020 123,B,2014,2017 124,...

I think it should be possible with analytic functions but I don't get it.

## and we said...

You need to split this up into groups of consecutive rows first, then group by these groups.

Or you can use pattern matching :)

With match_recognize, you're searching for

- Any row

- Followed by any number of rows where the current start = previous end

Which gives these pattern and define clauses:

Giving:

To use plain analytic functions:

- Calculate a row number for each group & subgroup, sorted by start/end

- Subtract this row# from the start or end value giving your a group

This is known as the Tabibitosan method.

Group by the group calculated by this to get the split you want:

Or you can use pattern matching :)

With match_recognize, you're searching for

- Any row

- Followed by any number of rows where the current start = previous end

Which gives these pattern and define clauses:

pattern ( init consecutive* ) define consecutive as st = prev ( en )

Giving:

with rws ( grp, subgrp, st, en ) as ( select 123,'A',2010,2011 from dual union all select 123,'A',2011,2012 from dual union all select 123,'A',2012,2013 from dual union all select 123,'A',2013,2014 from dual union all select 123,'B',2014,2015 from dual union all select 123,'B',2015,2016 from dual union all select 123,'B',2016,2017 from dual union all select 123,'A',2017,2018 from dual union all select 123,'A',2018,2019 from dual union all select 123,'A',2019,2020 from dual ) select * from rws match_recognize ( partition by grp, subgrp order by st, en measures first ( st ) mn, last ( en ) mx pattern ( init consecutive* ) define consecutive as st = prev ( en ) ) order by mn; GRP SUBGRP MN MX 123 A 2010 2014 123 B 2014 2017 123 A 2017 2020

To use plain analytic functions:

- Calculate a row number for each group & subgroup, sorted by start/end

- Subtract this row# from the start or end value giving your a group

This is known as the Tabibitosan method.

Group by the group calculated by this to get the split you want:

with rws ( grp, subgrp, st, en ) as ( select 123,'A',2010,2011 from dual union all select 123,'A',2011,2012 from dual union all select 123,'A',2012,2013 from dual union all select 123,'A',2013,2014 from dual union all select 123,'B',2014,2015 from dual union all select 123,'B',2015,2016 from dual union all select 123,'B',2016,2017 from dual union all select 123,'A',2017,2018 from dual union all select 123,'A',2018,2019 from dual union all select 123,'A',2019,2020 from dual ), grps as ( select r.*, st - row_number () over ( partition by grp, subgrp order by st, en ) g from rws r ) select grp, subgrp, min ( st ), max ( en ) from grps group by grp, subgrp, g order by 3; GRP SUBGRP MIN(ST) MAX(EN) 123 A 2010 2014 123 B 2014 2017 123 A 2017 2020

## Rating

# Comments

See, the order of records in your original table is never guaranteed unless you have a column guaranteed to sort it like that - for instance a LOAD_DT column on which you can apply ORDER BY.

For you, that ordering column might as well be the START_Y if you think it guarantees ordering.

LOAD_DT is not extracted into the outermost query to order it by, that's why A's in SUBGRP are in consecutive records but it's doable to order it in the way you like.

For you, that ordering column might as well be the START_Y if you think it guarantees ordering.

**Create**:CREATE TABLE TEST ( GRP INTEGER, SUBGRP CHAR(1), START_Y INTEGER, END_Y INTEGER, LOAD_DT DATE );

**Insert**:INSERT INTO TEST VALUES(122,'Z',2009,2010, SYSDATE); INSERT INTO TEST VALUES(123,'A',2010,2011,SYSDATE); INSERT INTO TEST VALUES(123,'A',2011,2012,SYSDATE); INSERT INTO TEST VALUES(123,'A',2012,2013,SYSDATE); INSERT INTO TEST VALUES(123,'A',2013,2014,SYSDATE); INSERT INTO TEST VALUES(123,'B',2014,2015,SYSDATE); INSERT INTO TEST VALUES(123,'B',2015,2016,SYSDATE); INSERT INTO TEST VALUES(123,'B',2016,2017,SYSDATE); INSERT INTO TEST VALUES(123,'A',2017,2018,SYSDATE); INSERT INTO TEST VALUES(123,'A',2018,2019,SYSDATE); INSERT INTO TEST VALUES(123,'A',2019,2020,SYSDATE); INSERT INTO TEST VALUES(124,'C',2021,2022,SYSDATE); COMMIT;

**Select**:SELECT * FROM TEST ORDER BY LOAD_DT;

GRP SUBGRP START_Y END_Y LOAD_DT 122 Z 2009 2010 21-06-10 123 A 2010 2011 21-06-10 123 A 2011 2012 21-06-10 123 A 2012 2013 21-06-10 123 A 2013 2014 21-06-10 123 B 2014 2015 21-06-10 123 B 2015 2016 21-06-10 123 B 2016 2017 21-06-10 123 A 2017 2018 21-06-10 123 A 2018 2019 21-06-10 123 A 2019 2020 21-06-10 124 C 2021 2022 21-06-10

SELECT GRP, SUBGRP, FINAL_GRP_START_Y, FINAL_GRP_END_Y FROM ( SELECT GRP, SUBGRP, --we can't group by GRP, SUBGRP so this funky logic to aggregate using lag/lead again in the outer query CASE WHEN GRP_START_Y = 0 THEN LAG(GRP_START_Y) OVER (ORDER BY LOAD_DT) ELSE GRP_START_Y END AS FINAL_GRP_START_Y, CASE WHEN GRP_END_Y = 0 THEN LEAD(GRP_END_Y) OVER (ORDER BY LOAD_DT) ELSE GRP_END_Y END AS FINAL_GRP_END_Y, LOAD_DT FROM ( SELECT GRP,SUBGRP, <i>--find out START_Y and END_Y at the point where GRP and SUBGRP switches values using lag/lead</i> CASE WHEN SUBGRP <> nvl(LAG(SUBGRP) OVER (ORDER BY LOAD_DT),'a') OR GRP <> nvl(LEAD(GRP) OVER (ORDER BY LOAD_DT), 0) THEN START_Y ELSE 0 END AS GRP_START_Y, CASE WHEN SUBGRP <> nvl(LEAD(SUBGRP) OVER (ORDER BY LOAD_DT),'a') OR GRP <> nvl(LEAD(GRP) OVER (ORDER BY LOAD_DT), 0) THEN END_Y ELSE 0 END AS GRP_END_Y, <i>--this is just to see output of lag,lead. Not needed for calculation</i> LAG(SUBGRP) OVER (ORDER BY LOAD_DT) LAG_OUTPUT, LEAD(SUBGRP) OVER (ORDER BY LOAD_DT) LEAD_OUTPUT, START_Y,END_Y,LOAD_DT FROM TEST ) T WHERE GRP_START_Y <> GRP_END_Y ) GROUP BY GRP, SUBGRP, FINAL_GRP_START_Y, FINAL_GRP_END_Y;

LOAD_DT is not extracted into the outermost query to order it by, that's why A's in SUBGRP are in consecutive records but it's doable to order it in the way you like.

122 Z 2009 2010 123 B 2014 2017 123 A 2010 2014 123 A 2017 2020 124 C 2021 2022 123 A 2019 2020

Hi Chris,

thanks for your answer.

Exactly what I was searching for :-)

thanks for your answer.

Exactly what I was searching for :-)

You're welcome :)