I have a simple table with following data: (PK, StartDate, EndDate) for which I want to know what are the periods of operation and what are the non-operational periods. For example:
StartDate EndDate
15-Apr-16 29-Apr-16
29-Apr-16 30-Apr-16
09-May-16 12-May-16
25-Jul-16 19-Sep-16 A
11-Aug-16 17-Aug-16 included in A
21-Aug-16 05-Sep-16 included in A
28-Aug-16 17-Sep-16 included in A
06-Sep-16 10-Oct-16 Partial to A, so now the period is 25-Jul-16 until 10-Oct-16
---------------------------------------
So based on the above the operational periods are:
15-Apr-16 until 30-Apr-16
09-May-16 until 12-May-16
25-Jul-16 until 10-Oct-16
and non-operational periods are:
01-May-16 until 08-May-16
13-May-16 until 25-Jul-16
Appreciate your support.
Here's one way to do it. The concept is to first merge the groups, then find the missing groups.
To merge the groups, the basic idea is:
1. Split the ranges so you have every possible sub-range. e.g. 25 Jul - 19 Sep becomes 25 Jul - 11 Aug, 11 Aug - 17 Aug, 17 Aug - 21 Aug, etc.
2. Group together the consecutive ranges using the Tabibitosan method
You can do step 1 by:
- Unpivoting the original dataset so you have a single column of dates. Each of these is the start of a range
- The end of each range is the next row. So you can calculate this with lead
- Note you'll need to squash out duplicate start dates at this point! You can do this by calculating the dense_rank of start dates on the original set. Then group by this value on the unpivoted rows.
You now have a set of ranges.
alter session set nls_date_format = 'dd-Mon-yy';
create table t (
stdt date,
endt date
);
insert into t values ('15-Apr-16', '29-Apr-16');
insert into t values ('29-Apr-16', '30-Apr-16');
insert into t values ('09-May-16', '12-May-16');
insert into t values ('10-May-16', '12-May-16');
insert into t values ('25-Jul-16', '19-Sep-16');
insert into t values ('11-Aug-16', '17-Aug-16');
insert into t values ('21-Aug-16', '05-Sep-16');
insert into t values ('28-Aug-16', '17-Sep-16');
insert into t values ('06-Sep-16', '10-Oct-16');
commit;
with ranges as (
select c1, dt, lead(dt) over (order by dt) ld,
dense_rank() over (order by dt) dr
from t
unpivot (dt for c1 in (stdt, endt))
), grps as (
select min(dt) st, max(ld) en, dr
from t, ranges r
where t.stdt <= r.dt and r.dt < t.endt
group by dr
)
select * from grps
order by dr;
ST EN DR
15-Apr-16 29-Apr-16 1
29-Apr-16 30-Apr-16 2
09-May-16 10-May-16 4
10-May-16 12-May-16 5
25-Jul-16 11-Aug-16 7
11-Aug-16 17-Aug-16 8
17-Aug-16 21-Aug-16 9
21-Aug-16 28-Aug-16 10
28-Aug-16 05-Sep-16 11
05-Sep-16 06-Sep-16 12
06-Sep-16 17-Sep-16 13
17-Sep-16 19-Sep-16 14
19-Sep-16 10-Oct-16 15
To do step 2
- Increment a counter for each row, ordered by start date. (row_number)
- Then find the difference between this at the dense_rank value
- Group by this calculation and get the min start and max end to get the consecutive groups:
with ranges as (
select c1, dt, lead(dt) over (order by dt) ld,
dense_rank() over (order by dt) dr
from t
unpivot (dt for c1 in (stdt, endt))
), grps as (
select min(dt) st, max(ld) en, dr
from t, ranges r
where t.stdt <= r.dt and r.dt < t.endt
group by dr
), tab as (
select st, en,
dr - row_number() over (order by st) grp
from grps
), oper as (
select 'Operational' status, min(st) st, max(en) en from tab
group by grp
)
select * from oper
order by 1;
STATUS ST EN
Operational 09-May-16 12-May-16
Operational 25-Jul-16 10-Oct-16
Operational 15-Apr-16 30-Apr-16
You now have the operational ranges.
The non-operational ranges:
- Start at the end of each operational range
- End at the start of the next operational range
So you can find this with lead again:
select 'Non-operational' status,
en + 1 st, lead(st) over (order by st) - 1 en
from oper
To get everything in a single query, union (all) the results together and exclude the final non-operational group (that has a null end):
with ranges as (
select c1, dt, lead(dt) over (order by dt) ld,
dense_rank() over (order by dt) dr
from t
unpivot (dt for c1 in (stdt, endt))
), grps as (
select min(dt) st, max(ld) en, dr
from t, ranges r
where t.stdt <= r.dt and r.dt < t.endt
group by dr
), tab as (
select st, en,
dr - row_number() over (order by st) grp
from grps
), oper as (
select 'Operational' status, min(st) st, max(en) en from tab
group by grp
), non_oper as (
select 'Non-operational' status,
en + 1 st, lead(st) over (order by st) - 1 en
from oper
)
select * from oper
union all
select * from non_oper
where en is not null
order by 2, 3;
STATUS ST EN
Operational 15-Apr-16 30-Apr-16
Non-operational 01-May-16 08-May-16
Operational 09-May-16 12-May-16
Non-operational 13-May-16 24-Jul-16
Operational 25-Jul-16 10-Oct-16
If you want to know more about dealing with gaps and overlaps in dates, I recommend reading Stew Ashton's blog posts:
https://stewashton.wordpress.com/2014/03/11/sql-for-date-ranges-gaps-and-overlaps/