Hi Tom,
I have a requirement to split the dates when one date range can contain other date range as shown below.
Need help writing sql query for this.
These are the different examples.
Input:
IK IK1 START_DATE END_DATE
417532654 887825619 5/1/2017 4/30/2018
417532654 901076156 5/1/2017 12/31/2017
Output:
IK IK1 START_DATE END_DATE
417532654 887825619 5/1/2017 12/31/2017
417532654 901076156 1/1/2018 4/30/2018
417532654 901076156 5/1/2017 12/31/2017
Input:
IK IK1 START_DATE END_DATE
417517804 919990044 10/1/2020 10/31/2020
417517804 915844176 10/1/2020 9/30/2021
417517804 918267561 11/1/2020 12/31/2020
Output:
IK IK1 START_DATE END_DATE
417517804 919990044 10/1/2020 10/31/2020
417517804 915844176 10/1/2020 10/31/2020
417517804 918267561 11/1/2020 12/31/2020
417517804 915844176 11/1/2020 12/31/2020
417517804 915844176 1/1/2021 9/30/2021
Input:
IK IK1 START_DATE END_DATE
417517805 919990044 8/1/2020 8/1/2020
417517805 915844176 8/1/2020 8/31/2021
417517805 918267561 8/1/2020 8/1/2020
417517805 915844177 8/1/2020 8/31/2021
Output:
IK IK1 START_DATE END_DATE
417517805 919990044 8/1/2020 8/1/2020
417517805 915844176 8/1/2020 8/1/2021
417517805 915844176 8/2/2020 8/31/2021
417517805 918267561 8/1/2020 8/1/2020
417517805 915844177 8/1/2020 8/1/2021
417517805 915844177 8/2/2020 8/31/2021
Here is the script:
create table test_dates
( ik number,
ik1 number,
start_date date,
end_date date);
REM INSERTING into TEST_DATES
SET DEFINE OFF;
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517804,919990044,to_date('10/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('10/31/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517804,915844176,to_date('10/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('09/30/2021 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517804,918267561,to_date('11/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('12/31/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517805,915844177,to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('08/31/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517805,919990044,to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517805,915844176,to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('08/31/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417517805,918267561,to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('08/01/2020 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417532654,887825619,to_date('05/01/2017 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('04/30/2018 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
Insert into TEST_DATES (IK,IK1,START_DATE,END_DATE) values (417532654,901076156,to_date('05/01/2017 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'),to_date('12/31/2017 12.00.00 AM','MM/DD/YYYY HH.MI.SS AM'));
If you're looking to find all the ranges which fall within another, here's one way to approach this:
- Join the table to itself, getting all those rows where the start and end dates are both between the start & dates in the self-join
dups.start_date between base.start_date and base.end_date
and dups.end_date between base.start_date and base.end_date
- The start date is the previous end date + 1 or the current start date for the first row - you can do this with lag():
lag (
dups.end_date + 1, 1, base.start_date
) over (
partition by base.ik, base.ik1
order by dups.end_date
)
- The end date is the max end date so far - you can do this with max() over ...
max ( dups.end_date ) over (
partition by base.ik, base.ik1
order by dups.end_date
)
Altogether this is:
alter session set nls_date_format = ' DD-MON-YYYY ';
with rws as (
select base.ik, base.ik1,
lag (
dups.end_date + 1, 1, base.start_date
) over (
partition by base.ik, base.ik1
order by dups.end_date
) start_date,
max ( dups.end_date ) over (
partition by base.ik, base.ik1
order by dups.end_date
) end_date
from test_dates base
join test_dates dups
on dups.start_date between base.start_date and base.end_date
and dups.end_date between base.start_date and base.end_date
)
select * from rws
where start_date <= end_date
order by start_date, end_date;
IK IK1 START_DATE END_DATE
417532654 887825619 01-MAY-2017 31-DEC-2017
417532654 901076156 01-MAY-2017 31-DEC-2017
417532654 887825619 01-JAN-2018 30-APR-2018
417517805 919990044 01-AUG-2020 01-AUG-2020
417517805 918267561 01-AUG-2020 01-AUG-2020
417517805 915844177 01-AUG-2020 01-AUG-2020
417517805 915844176 01-AUG-2020 01-AUG-2020
417517805 915844176 02-AUG-2020 31-AUG-2020
417517805 915844177 02-AUG-2020 31-AUG-2020
417517804 915844176 01-OCT-2020 31-OCT-2020
417517804 919990044 01-OCT-2020 31-OCT-2020
417517804 918267561 01-NOV-2020 31-DEC-2020
417517804 915844176 01-NOV-2020 31-DEC-2020
417517804 915844176 01-JAN-2021 30-SEP-2021This assumes that all sub-periods are contained within another with no overlaps.
e.g. there are no periods like this:
1 Jan 2021 - 1 May 2021
1 Feb 2021 - 1 Jul 2021
That you need to split into 1 Jan - 1 Feb - 1 May - 1 Jul. It also assumes no gaps in the subranges. If there are, you can't (always) take the previous end date.
If these assumptions are wrong, you'll need to update the solution. I recommend reading Stew Ashton's blog posts on merging/splitting overlapping date ranges
https://stewashton.wordpress.com/2014/03/11/sql-for-date-ranges-gaps-and-overlaps/