Tom,
I've been tasked with finding the 'most relevant' contiguous date range among records that store date ranges, and I'm stuck.
Most relevant is defined as
1. If sysdate falls within a date range, it's most relevant. If not then,
2. The next future date range is most relevant. If there is no future date range then,
3. the most recent previous date range is most relevant.
CREATE table TJ_TEST (
TJ_TEST_KEY NUMBER(10) primary key,
PERSON_KEY NUMBER(10) NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
TERMINATION_DATE DATE NOT NULL)
;
insert into TJ_TEST values(1, 1234, to_date('2009-01-01', 'YYYY-MM-DD'), to_date('2009-03-31', 'YYYY-MM-DD'));
insert into TJ_TEST values(2, 1234, to_date('2009-04-01', 'YYYY-MM-DD'), to_date('2009-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(3, 1234, to_date('2009-07-01', 'YYYY-MM-DD'), to_date('2009-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(4, 1234, to_date('2009-12-01', 'YYYY-MM-DD'), to_date('2009-12-15', 'YYYY-MM-DD'));
insert into TJ_TEST values(5, 1234, to_date('2010-01-01', 'YYYY-MM-DD'), to_date('2010-03-31', 'YYYY-MM-DD'));
insert into TJ_TEST values(6, 1234, to_date('2010-04-01', 'YYYY-MM-DD'), to_date('2010-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(7, 1234, to_date('2010-07-01', 'YYYY-MM-DD'), to_date('2010-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(8, 1234, to_date('2011-01-01', 'YYYY-MM-DD'), to_date('2011-03-01', 'YYYY-MM-DD'));
insert into TJ_TEST values(9, 1234, to_date('2011-04-01', 'YYYY-MM-DD'), to_date('2011-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(10, 1234, to_date('2011-07-01', 'YYYY-MM-DD'), to_date('2011-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(11, 1234, to_date('2011-10-01', 'YYYY-MM-DD'), to_date('2011-12-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(12, 1234, to_date('2012-01-01', 'YYYY-MM-DD'), to_date('2012-11-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(13, 1234, to_date('2012-12-01', 'YYYY-MM-DD'), to_date('2012-12-30', 'YYYY-MM-DD'));
Using lead & lag, I can see the beginnings and ends of contiguous date ranges:
select abs(nvl(TERMINATION_DATE - lag(EFFECTIVE_DATE) over (order by EFFECTIVE_DATE desc),2)) as END_DATE_GAP,
nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
TJ_TEST_KEY,
PERSON_KEY,
to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
from TJ_TEST
where PERSON_KEY = 1234;
END_DATE_GAP START_DATE_GAP TJ_TEST_KEY PERSON_KEY EFFECTIVE_ TERMINATIO
------------ -------------- ----------- ---------- ---------- ----------
2 1 13 1234 2012-12-01 2012-12-30
1 2 12 1234 2012-01-01 2012-11-30
2 1 11 1234 2011-10-01 2011-12-30
1 1 10 1234 2011-07-01 2011-09-30
1 31 9 1234 2011-04-01 2011-06-30
31 93 8 1234 2011-01-01 2011-03-01
93 1 7 1234 2010-07-01 2010-09-30
1 1 6 1234 2010-04-01 2010-06-30
1 17 5 1234 2010-01-01 2010-03-31
17 62 4 1234 2009-12-01 2009-12-15
62 1 3 1234 2009-07-01 2009-09-30
1 1 2 1234 2009-04-01 2009-06-30
1 2 1 1234 2009-01-01 2009-03-31
13 rows selected.
When a gap of more than one day exists, then the ranges are not contiguous.
The above needs to become a set of date ranges (in ascending order for readability),
EFFECTIVE_ TERMINATIO
---------- ----------
2009-01-01 2009-09-30
2009-12-01 2009-12-15
2010-01-01 2010-09-30
2011-01-01 2011-03-01
2011-04-01 2011-12-30
2012-01-01 2012-12-30
From here, the most relevant date range is 2011-04-01 - 2011-12-30.
The GAP values > 1 are endpoints of the date ranges I want to evaluate, but when I use this in a cursor and step through the data, I get a seemingly unrelated SQL error:
DECLARE
CURR_EFF_DATE date;
CURR_TERM_DATE date;
FUT_EFF_DATE date;
FUT_TERM_DATE date;
INSP_EFF_DATE date;
INSP_TERM_DATE date;
cursor C1 is
select abs(nvl(TERMINATION_DATE - lag(EFFECTIVE_DATE) over (order by EFFECTIVE_DATE desc),2)) as END_DATE_GAP,
nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
TJ_TEST_KEY,
PERSON_KEY,
to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
from TJ_TEST
where PERSON_KEY = 1234;
BEGIN
FOR item IN C1 LOOP
IF INSP_EFF_DATE is null or INSP_TERM_DATE is null THEN
IF item.START_DATE_GAP <> 1 THEN
INSP_EFF_DATE := item.EFFECTIVE_DATE;
END IF;
IF item.END_DATE_GAP <> 1 THEN
INSP_TERM_DATE := item.TERMINATION_DATE;
END IF;
END IF;
IF sysdate between INSP_EFF_DATE and INSP_TERM_DATE then
CURR_EFF_DATE := INSP_EFF_DATE;
CURR_TERM_DATE := INSP_TERM_DATE;
ELSIF sysdate > INSP_TERM_DATE then
CURR_EFF_DATE := INSP_EFF_DATE;
CURR_TERM_DATE := INSP_TERM_DATE;
ELSIF sysdate < INSP_EFF_DATE then
FUT_EFF_DATE := INSP_EFF_DATE;
FUT_TERM_DATE := INSP_TERM_DATE;
INSP_EFF_DATE := NULL;
INSP_TERM_DATE := NULL;
END IF;
exit when CURR_EFF_DATE is not null;
END LOOP;
dbms_output.put_line('CURR_EFF_DATE: ' || to_char(CURR_EFF_DATE, 'YYYY-MM-DD'));
dbms_output.put_line('CURR_TERM_DATE: ' || to_char(CURR_TERM_DATE, 'YYYY-MM-DD'));
dbms_output.put_line('FUT_EFF_DATE: ' || to_char(FUT_EFF_DATE, 'YYYY-MM-DD'));
dbms_output.put_line('FUT_TERM_DATE: ' || to_char(FUT_TERM_DATE, 'YYYY-MM-DD'));
dbms_output.put_line('INSP_EFF_DATE: ' || to_char(INSP_EFF_DATE, 'YYYY-MM-DD'));
dbms_output.put_line('INSP_TERM_DATE: ' || to_char(INSP_TERM_DATE, 'YYYY-MM-DD'));
END;
/
nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
*
ERROR at line 10:
ORA-01861: literal does not match format string
ORA-06512: at line 24
In spite of the error message, I can't help but think that there's a problem with stepping through a cursor that contains a windowed function.
Can you either help me push through this error or identify a simpler way to accomplish this (short of not doing it at all)?
Thanks,
T. J.
Here we go, I added some data to be "multi-person"
ops$tkyte%ORA11GR2> insert into TJ_TEST values(14, 456, sysdate-20, sysdate+10 );
1 row created.
ops$tkyte%ORA11GR2> insert into TJ_TEST values(15, 456, sysdate-200, sysdate-150 );
1 row created.
ops$tkyte%ORA11GR2> insert into TJ_TEST values(16, 456, sysdate+12, sysdate+25 );
1 row created.
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> select *
2 from (
3 select person_key, edate, tdate,
4 row_number() over (partition by person_key order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
5 from (
6 select person_key, min(effective_date) edate, max(termination_date) tdate,
7 case when sysdate between min(effective_date) and max(termination_date)
8 then 0
9 when sysdate < min(effective_date)
10 then 1
11 when sysdate > max(termination_date)
12 then 2
13 end oc
14 from (
15 select person_key, effective_date, termination_date,
16 last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
17 from (
18 select t.*, lag(termination_date) over (partition by person_key order by effective_date),
19 case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
20 then row_number() over (partition by person_key order by effective_date)
21 when lag(termination_date) over (partition by person_key order by effective_date) is null
22 then row_number() over (partition by person_key order by effective_date)
23 end grp
24 from tj_test t
25 ) t
26 ) t
27 group by person_key, the_grp
28 )
29 )
30 where rn = 1
31 /
PERSON_KEY EDATE TDATE RN
---------- --------- --------- ----------
456 25-FEB-11 27-MAR-11 1
1234 01-APR-11 30-DEC-11 1
If you don't need that - eg, you'll do this for a single person key, then:
ops$tkyte%ORA11GR2> select *
2 from (
3 select :x, edate, tdate,
4 row_number() over (order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
5 from (
6 select min(effective_date) edate, max(termination_date) tdate,
7 case when sysdate between min(effective_date) and max(termination_date)
8 then 0
9 when sysdate < min(effective_date)
10 then 1
11 when sysdate > max(termination_date)
12 then 2
13 end oc
14 from (
15 select effective_date, termination_date,
16 last_value( grp IGNORE NULLS ) over (order by effective_date) the_grp
17 from (
18 select t.*, lag(termination_date) over (order by effective_date),
19 case when effective_date - lag(termination_date) over (order by effective_date) > 1
20 then row_number() over (order by effective_date)
21 when lag(termination_date) over (order by effective_date) is null
22 then row_number() over (order by effective_date)
23 end grp
24 from tj_test t
25 ) t
26 ) t
27 group by the_grp
28 )
29 )
30 where rn = 1
31 /
:X EDATE TDATE RN
---------- --------- --------- ----------
1234 01-JAN-11 27-MAR-11 1
will do it. Let's look at the query bit by bit:
ops$tkyte%ORA11GR2> select t.*, lag(termination_date) over (partition by person_key order by effective_date),
2 case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
3 then row_number() over (partition by person_key order by effective_date)
4 when lag(termination_date) over (partition by person_key order by effective_date) is null
5 then row_number() over (partition by person_key order by effective_date)
6 end grp
7 from tj_test t
8 /
TJ_TEST_KEY PERSON_KEY EFFECTIVE TERMINATI LAG(TERMI GRP
----------- ---------- --------- --------- --------- ----------
15 456 29-AUG-10 18-OCT-10 1
14 456 25-FEB-11 27-MAR-11 18-OCT-10 2
16 456 29-MAR-11 11-APR-11 27-MAR-11 3
1 1234 01-JAN-09 31-MAR-09 1
2 1234 01-APR-09 30-JUN-09 31-MAR-09
3 1234 01-JUL-09 30-SEP-09 30-JUN-09
4 1234 01-DEC-09 15-DEC-09 30-SEP-09 4
5 1234 01-JAN-10 31-MAR-10 15-DEC-09 5
6 1234 01-APR-10 30-JUN-10 31-MAR-10
7 1234 01-JUL-10 30-SEP-10 30-JUN-10
8 1234 01-JAN-11 01-MAR-11 30-SEP-10 8
9 1234 01-APR-11 30-JUN-11 01-MAR-11 9
10 1234 01-JUL-11 30-SEP-11 30-JUN-11
11 1234 01-OCT-11 30-DEC-11 30-SEP-11
12 1234 01-JAN-12 30-NOV-12 30-DEC-11 12
13 1234 01-DEC-12 30-DEC-12 30-NOV-12
16 rows selected.
this'll help us build our contiguous groups, we mark each record where the last termination date is more than one away from our effective date.
Then we carry down that value:
ops$tkyte%ORA11GR2> select person_key, effective_date, termination_date,
2 last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
3 from (
4 select t.*, lag(termination_date) over (partition by person_key order by effective_date),
5 case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
6 then row_number() over (partition by person_key order by effective_date)
7 when lag(termination_date) over (partition by person_key order by effective_date) is null
8 then row_number() over (partition by person_key order by effective_date)
9 end grp
10 from tj_test t
11 ) t
12 /
PERSON_KEY EFFECTIVE TERMINATI THE_GRP
---------- --------- --------- ----------
456 29-AUG-10 18-OCT-10 1
456 25-FEB-11 27-MAR-11 2
456 29-MAR-11 11-APR-11 3
1234 01-JAN-09 31-MAR-09 1
1234 01-APR-09 30-JUN-09 1
1234 01-JUL-09 30-SEP-09 1
1234 01-DEC-09 15-DEC-09 4
1234 01-JAN-10 31-MAR-10 5
1234 01-APR-10 30-JUN-10 5
1234 01-JUL-10 30-SEP-10 5
1234 01-JAN-11 01-MAR-11 8
1234 01-APR-11 30-JUN-11 9
1234 01-JUL-11 30-SEP-11 9
1234 01-OCT-11 30-DEC-11 9
1234 01-JAN-12 30-NOV-12 12
1234 01-DEC-12 30-DEC-12 12
16 rows selected.
Now we min/max to get our ranges:
ops$tkyte%ORA11GR2> select person_key, min(effective_date) edate, max(termination_date) tdate,
2 case when sysdate between min(effective_date) and max(termination_date)
3 then 0
4 when sysdate < min(effective_date)
5 then 1
6 when sysdate > max(termination_date)
7 then 2
8 end oc
9 from (
10 select person_key, effective_date, termination_date,
11 last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
12 from (
13 select t.*, lag(termination_date) over (partition by person_key order by effective_date),
14 case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
15 then row_number() over (partition by person_key order by effective_date)
16 when lag(termination_date) over (partition by person_key order by effective_date) is null
17 then row_number() over (partition by person_key order by effective_date)
18 end grp
19 from tj_test t
20 ) t
21 ) t
22 group by person_key, the_grp
23 /
PERSON_KEY EDATE TDATE OC
---------- --------- --------- ----------
1234 01-JAN-09 30-SEP-09 2
1234 01-DEC-09 15-DEC-09 2
1234 01-JAN-10 30-SEP-10 2
1234 01-JAN-11 01-MAR-11 2
1234 01-JAN-12 30-DEC-12 1
456 29-MAR-11 11-APR-11 1
456 29-AUG-10 18-OCT-10 2
1234 01-APR-11 30-DEC-11 1
456 25-FEB-11 27-MAR-11 0
9 rows selected.
I also set up an OC - order column - to help us with your rules of selection. If sysdate is in range - then zero (first, best match). If sysdate is less than the effective date - the range is in the future, we assign one - that is second best. If sysdate is newer than the termination date - in the past - we assign two.
Then:
ops$tkyte%ORA11GR2> select person_key, edate, tdate,
2 row_number() over (partition by person_key order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
3 from (
4 select person_key, min(effective_date) edate, max(termination_date) tdate,
5 case when sysdate between min(effective_date) and max(termination_date)
6 then 0
7 when sysdate < min(effective_date)
8 then 1
9 when sysdate > max(termination_date)
10 then 2
11 end oc
12 from (
13 select person_key, effective_date, termination_date,
14 last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
15 from (
16 select t.*, lag(termination_date) over (partition by person_key order by effective_date),
17 case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
18 then row_number() over (partition by person_key order by effective_date)
19 when lag(termination_date) over (partition by person_key order by effective_date) is null
20 then row_number() over (partition by person_key order by effective_date)
21 end grp
22 from tj_test t
23 ) t
24 ) t
25 group by person_key, the_grp
26 )
27 /
PERSON_KEY EDATE TDATE RN
---------- --------- --------- ----------
456 25-FEB-11 27-MAR-11 1
456 29-MAR-11 11-APR-11 2
456 29-AUG-10 18-OCT-10 3
1234 01-APR-11 30-DEC-11 1
1234 01-JAN-12 30-DEC-12 2
1234 01-JAN-11 01-MAR-11 3
1234 01-JAN-10 30-SEP-10 4
1234 01-DEC-09 15-DEC-09 5
1234 01-JAN-09 30-SEP-09 6
9 rows selected.
Using row_number and ordering by our OC, that puts a range with sysdate in it first, - then if OC = 1 we'll order by edate from small to big (to put the nearest date range first in that group) and then if OC = 2 by tdate from big to small (putting again the nearest date range first in that group)
and then simply keep the first row for each person key....
NO PROCEDURAL CODE!!!!
As for the error you were getting, that is an interesting one. The error message is looking buggy, the real error is coming into play here:
INSP_EFF_DATE := item.EFFECTIVE_DATE;
item.effective_date is a STRING in the format YYYY-MM-DD
insp_eff_date is a DATE
you are doing an implicit conversion from a string to a date
and your NLS_DATE_FORMAT must not have been YYYY-MM-DD
It is yet ANOTHER CASE of the dreaded implicit conversion.
The error message is buggy because it didn't show you line 24! Line 24 is one of your assignments....
ops$tkyte%ORA11GR2> DECLARE
2 INSP_EFF_DATE date;
3 cursor C1 is
4 select to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
5 to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
6 from TJ_TEST
7 where PERSON_KEY = 1234;
8 BEGIN
9 FOR item IN C1
10 LOOP
11 INSP_EFF_DATE := item.EFFECTIVE_DATE;
12 end loop;
13 end;
14 /
select to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
*
ERROR at line 4:
ORA-01861: literal does not match format string
ORA-06512: at line 11
See, the error stack tells us line 11 - but the top of the stack goes back to the originating SQL for some reason and that is what got printed out in the error message.
Look at line 11 though, that is the culprit.