Skip to Main Content
  • Questions
  • Find the most relevant contiguous date range

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, TJ.

Asked: March 16, 2011 - 10:50 am UTC

Last updated: January 05, 2012 - 2:34 pm UTC

Version: 10.2.0.3

Viewed 1000+ times

You Asked

Tom,

I've been tasked with finding the 'most relevant' contiguous date range among records that store date ranges, and I'm stuck.

Most relevant is defined as
1. If sysdate falls within a date range, it's most relevant. If not then,
2. The next future date range is most relevant. If there is no future date range then,
3. the most recent previous date range is most relevant.

CREATE table TJ_TEST (
  TJ_TEST_KEY NUMBER(10) primary key,
  PERSON_KEY NUMBER(10) NOT NULL,
  EFFECTIVE_DATE DATE NOT NULL,
  TERMINATION_DATE DATE NOT NULL)
;

insert into TJ_TEST values(1, 1234, to_date('2009-01-01', 'YYYY-MM-DD'), to_date('2009-03-31', 'YYYY-MM-DD'));
insert into TJ_TEST values(2, 1234, to_date('2009-04-01', 'YYYY-MM-DD'), to_date('2009-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(3, 1234, to_date('2009-07-01', 'YYYY-MM-DD'), to_date('2009-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(4, 1234, to_date('2009-12-01', 'YYYY-MM-DD'), to_date('2009-12-15', 'YYYY-MM-DD'));
insert into TJ_TEST values(5, 1234, to_date('2010-01-01', 'YYYY-MM-DD'), to_date('2010-03-31', 'YYYY-MM-DD'));
insert into TJ_TEST values(6, 1234, to_date('2010-04-01', 'YYYY-MM-DD'), to_date('2010-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(7, 1234, to_date('2010-07-01', 'YYYY-MM-DD'), to_date('2010-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(8, 1234, to_date('2011-01-01', 'YYYY-MM-DD'), to_date('2011-03-01', 'YYYY-MM-DD'));
insert into TJ_TEST values(9, 1234, to_date('2011-04-01', 'YYYY-MM-DD'), to_date('2011-06-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(10, 1234, to_date('2011-07-01', 'YYYY-MM-DD'), to_date('2011-09-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(11, 1234, to_date('2011-10-01', 'YYYY-MM-DD'), to_date('2011-12-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(12, 1234, to_date('2012-01-01', 'YYYY-MM-DD'), to_date('2012-11-30', 'YYYY-MM-DD'));
insert into TJ_TEST values(13, 1234, to_date('2012-12-01', 'YYYY-MM-DD'), to_date('2012-12-30', 'YYYY-MM-DD'));


Using lead & lag, I can see the beginnings and ends of contiguous date ranges:
    select abs(nvl(TERMINATION_DATE - lag(EFFECTIVE_DATE) over (order by EFFECTIVE_DATE desc),2)) as END_DATE_GAP,
           nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
           TJ_TEST_KEY,
           PERSON_KEY,
           to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
           to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
      from TJ_TEST
     where PERSON_KEY = 1234;

END_DATE_GAP START_DATE_GAP TJ_TEST_KEY PERSON_KEY EFFECTIVE_ TERMINATIO
------------ -------------- ----------- ---------- ---------- ----------
           2              1          13       1234 2012-12-01 2012-12-30
           1              2          12       1234 2012-01-01 2012-11-30
           2              1          11       1234 2011-10-01 2011-12-30
           1              1          10       1234 2011-07-01 2011-09-30
           1             31           9       1234 2011-04-01 2011-06-30
          31             93           8       1234 2011-01-01 2011-03-01
          93              1           7       1234 2010-07-01 2010-09-30
           1              1           6       1234 2010-04-01 2010-06-30
           1             17           5       1234 2010-01-01 2010-03-31
          17             62           4       1234 2009-12-01 2009-12-15
          62              1           3       1234 2009-07-01 2009-09-30
           1              1           2       1234 2009-04-01 2009-06-30
           1              2           1       1234 2009-01-01 2009-03-31

13 rows selected.


When a gap of more than one day exists, then the ranges are not contiguous.
The above needs to become a set of date ranges (in ascending order for readability),
EFFECTIVE_ TERMINATIO
---------- ----------
2009-01-01 2009-09-30
2009-12-01 2009-12-15
2010-01-01 2010-09-30
2011-01-01 2011-03-01
2011-04-01 2011-12-30
2012-01-01 2012-12-30

From here, the most relevant date range is 2011-04-01 - 2011-12-30.

The GAP values > 1 are endpoints of the date ranges I want to evaluate, but when I use this in a cursor and step through the data, I get a seemingly unrelated SQL error:

DECLARE 
  CURR_EFF_DATE date;
  CURR_TERM_DATE date;
  FUT_EFF_DATE date;
  FUT_TERM_DATE date;
  INSP_EFF_DATE date;
  INSP_TERM_DATE date;
  cursor C1 is
    select abs(nvl(TERMINATION_DATE - lag(EFFECTIVE_DATE) over (order by EFFECTIVE_DATE desc),2)) as END_DATE_GAP,
           nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
           TJ_TEST_KEY,
           PERSON_KEY,
           to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
           to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
      from TJ_TEST
     where PERSON_KEY = 1234;
BEGIN
  FOR item IN C1 LOOP
    IF INSP_EFF_DATE is null or INSP_TERM_DATE is null THEN
      IF item.START_DATE_GAP <> 1 THEN
        INSP_EFF_DATE := item.EFFECTIVE_DATE;
      END IF;
      IF item.END_DATE_GAP <> 1 THEN
        INSP_TERM_DATE := item.TERMINATION_DATE;
      END IF;
    END IF;
    
    IF sysdate between INSP_EFF_DATE and INSP_TERM_DATE then 
      CURR_EFF_DATE := INSP_EFF_DATE;
      CURR_TERM_DATE := INSP_TERM_DATE;
    
    ELSIF sysdate > INSP_TERM_DATE then
      CURR_EFF_DATE := INSP_EFF_DATE;
      CURR_TERM_DATE := INSP_TERM_DATE;
    
    ELSIF sysdate < INSP_EFF_DATE then
      FUT_EFF_DATE := INSP_EFF_DATE;
      FUT_TERM_DATE := INSP_TERM_DATE;
      INSP_EFF_DATE := NULL;
      INSP_TERM_DATE := NULL;
    END IF;
    
    exit when CURR_EFF_DATE is not null;
  END LOOP;
  
  dbms_output.put_line('CURR_EFF_DATE: ' || to_char(CURR_EFF_DATE, 'YYYY-MM-DD'));
  dbms_output.put_line('CURR_TERM_DATE: ' || to_char(CURR_TERM_DATE, 'YYYY-MM-DD'));
  dbms_output.put_line('FUT_EFF_DATE: ' || to_char(FUT_EFF_DATE, 'YYYY-MM-DD'));
  dbms_output.put_line('FUT_TERM_DATE: ' || to_char(FUT_TERM_DATE, 'YYYY-MM-DD'));
  dbms_output.put_line('INSP_EFF_DATE: ' || to_char(INSP_EFF_DATE, 'YYYY-MM-DD'));
  dbms_output.put_line('INSP_TERM_DATE: ' || to_char(INSP_TERM_DATE, 'YYYY-MM-DD'));
      
END;
/
           nvl(EFFECTIVE_DATE - lead(TERMINATION_DATE) over (order by EFFECTIVE_DATE desc), 2) as START_DATE_GAP,
                     *
ERROR at line 10:
ORA-01861: literal does not match format string
ORA-06512: at line 24


In spite of the error message, I can't help but think that there's a problem with stepping through a cursor that contains a windowed function.

Can you either help me push through this error or identify a simpler way to accomplish this (short of not doing it at all)?

Thanks,
T. J.

and Tom said...

Here we go, I added some data to be "multi-person"




ops$tkyte%ORA11GR2> insert into TJ_TEST values(14, 456, sysdate-20, sysdate+10 );

1 row created.

ops$tkyte%ORA11GR2> insert into TJ_TEST values(15, 456, sysdate-200, sysdate-150 );

1 row created.

ops$tkyte%ORA11GR2> insert into TJ_TEST values(16, 456, sysdate+12, sysdate+25 );

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select *
  2    from (
  3  select person_key, edate, tdate,
  4         row_number() over (partition by person_key order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
  5    from (
  6  select person_key, min(effective_date) edate, max(termination_date) tdate,
  7          case when sysdate between min(effective_date) and max(termination_date)
  8              then 0
  9              when sysdate < min(effective_date)
 10              then 1
 11              when sysdate > max(termination_date)
 12              then 2
 13          end oc
 14    from (
 15  select person_key, effective_date, termination_date,
 16         last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
 17    from (
 18  select t.*, lag(termination_date) over (partition by person_key order by effective_date),
 19         case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
 20              then row_number() over (partition by person_key order by effective_date)
 21              when lag(termination_date) over (partition by person_key order by effective_date) is null
 22              then row_number() over (partition by person_key order by effective_date)
 23          end grp
 24    from tj_test t
 25         ) t
 26         ) t
 27   group by person_key, the_grp
 28         )
 29         )
 30   where rn = 1
 31  /

PERSON_KEY EDATE     TDATE             RN
---------- --------- --------- ----------
       456 25-FEB-11 27-MAR-11          1
      1234 01-APR-11 30-DEC-11          1




If you don't need that - eg, you'll do this for a single person key, then:

ops$tkyte%ORA11GR2> select *
  2    from (
  3  select :x, edate, tdate,
  4         row_number() over (order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
  5    from (
  6  select min(effective_date) edate, max(termination_date) tdate,
  7          case when sysdate between min(effective_date) and max(termination_date)
  8              then 0
  9              when sysdate < min(effective_date)
 10              then 1
 11              when sysdate > max(termination_date)
 12              then 2
 13          end oc
 14    from (
 15  select effective_date, termination_date,
 16         last_value( grp IGNORE NULLS ) over (order by effective_date) the_grp
 17    from (
 18  select t.*, lag(termination_date) over (order by effective_date),
 19         case when effective_date - lag(termination_date) over (order by effective_date) > 1
 20              then row_number() over (order by effective_date)
 21              when lag(termination_date) over (order by effective_date) is null
 22              then row_number() over (order by effective_date)
 23          end grp
 24    from tj_test t
 25         ) t
 26         ) t
 27   group by the_grp
 28         )
 29         )
 30   where rn = 1
 31  /

        :X EDATE     TDATE             RN
---------- --------- --------- ----------
      1234 01-JAN-11 27-MAR-11          1



will do it. Let's look at the query bit by bit:

ops$tkyte%ORA11GR2> select t.*, lag(termination_date) over (partition by person_key order by effective_date),
  2         case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
  3              then row_number() over (partition by person_key order by effective_date)
  4              when lag(termination_date) over (partition by person_key order by effective_date) is null
  5              then row_number() over (partition by person_key order by effective_date)
  6          end grp
  7    from tj_test t
  8  /

TJ_TEST_KEY PERSON_KEY EFFECTIVE TERMINATI LAG(TERMI        GRP
----------- ---------- --------- --------- --------- ----------
         15        456 29-AUG-10 18-OCT-10                    1
         14        456 25-FEB-11 27-MAR-11 18-OCT-10          2
         16        456 29-MAR-11 11-APR-11 27-MAR-11          3
          1       1234 01-JAN-09 31-MAR-09                    1
          2       1234 01-APR-09 30-JUN-09 31-MAR-09
          3       1234 01-JUL-09 30-SEP-09 30-JUN-09
          4       1234 01-DEC-09 15-DEC-09 30-SEP-09          4
          5       1234 01-JAN-10 31-MAR-10 15-DEC-09          5
          6       1234 01-APR-10 30-JUN-10 31-MAR-10
          7       1234 01-JUL-10 30-SEP-10 30-JUN-10
          8       1234 01-JAN-11 01-MAR-11 30-SEP-10          8
          9       1234 01-APR-11 30-JUN-11 01-MAR-11          9
         10       1234 01-JUL-11 30-SEP-11 30-JUN-11
         11       1234 01-OCT-11 30-DEC-11 30-SEP-11
         12       1234 01-JAN-12 30-NOV-12 30-DEC-11         12
         13       1234 01-DEC-12 30-DEC-12 30-NOV-12

16 rows selected.


this'll help us build our contiguous groups, we mark each record where the last termination date is more than one away from our effective date.

Then we carry down that value:

ops$tkyte%ORA11GR2> select person_key, effective_date, termination_date,
  2         last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
  3    from (
  4  select t.*, lag(termination_date) over (partition by person_key order by effective_date),
  5         case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
  6              then row_number() over (partition by person_key order by effective_date)
  7              when lag(termination_date) over (partition by person_key order by effective_date) is null
  8              then row_number() over (partition by person_key order by effective_date)
  9          end grp
 10    from tj_test t
 11         ) t
 12  /

PERSON_KEY EFFECTIVE TERMINATI    THE_GRP
---------- --------- --------- ----------
       456 29-AUG-10 18-OCT-10          1
       456 25-FEB-11 27-MAR-11          2
       456 29-MAR-11 11-APR-11          3
      1234 01-JAN-09 31-MAR-09          1
      1234 01-APR-09 30-JUN-09          1
      1234 01-JUL-09 30-SEP-09          1
      1234 01-DEC-09 15-DEC-09          4
      1234 01-JAN-10 31-MAR-10          5
      1234 01-APR-10 30-JUN-10          5
      1234 01-JUL-10 30-SEP-10          5
      1234 01-JAN-11 01-MAR-11          8
      1234 01-APR-11 30-JUN-11          9
      1234 01-JUL-11 30-SEP-11          9
      1234 01-OCT-11 30-DEC-11          9
      1234 01-JAN-12 30-NOV-12         12
      1234 01-DEC-12 30-DEC-12         12

16 rows selected.


Now we min/max to get our ranges:

ops$tkyte%ORA11GR2> select person_key, min(effective_date) edate, max(termination_date) tdate,
  2          case when sysdate between min(effective_date) and max(termination_date)
  3              then 0
  4              when sysdate < min(effective_date)
  5              then 1
  6              when sysdate > max(termination_date)
  7              then 2
  8          end oc
  9    from (
 10  select person_key, effective_date, termination_date,
 11         last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
 12    from (
 13  select t.*, lag(termination_date) over (partition by person_key order by effective_date),
 14         case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
 15              then row_number() over (partition by person_key order by effective_date)
 16              when lag(termination_date) over (partition by person_key order by effective_date) is null
 17              then row_number() over (partition by person_key order by effective_date)
 18          end grp
 19    from tj_test t
 20         ) t
 21         ) t
 22   group by person_key, the_grp
 23  /

PERSON_KEY EDATE     TDATE             OC
---------- --------- --------- ----------
      1234 01-JAN-09 30-SEP-09          2
      1234 01-DEC-09 15-DEC-09          2
      1234 01-JAN-10 30-SEP-10          2
      1234 01-JAN-11 01-MAR-11          2
      1234 01-JAN-12 30-DEC-12          1
       456 29-MAR-11 11-APR-11          1
       456 29-AUG-10 18-OCT-10          2
      1234 01-APR-11 30-DEC-11          1
       456 25-FEB-11 27-MAR-11          0

9 rows selected.



I also set up an OC - order column - to help us with your rules of selection. If sysdate is in range - then zero (first, best match). If sysdate is less than the effective date - the range is in the future, we assign one - that is second best. If sysdate is newer than the termination date - in the past - we assign two.


Then:

ops$tkyte%ORA11GR2> select person_key, edate, tdate,
  2         row_number() over (partition by person_key order by oc, case when oc = 1 then edate end ASC, case when oc = 2 then tdate end DESC ) rn
  3    from (
  4  select person_key, min(effective_date) edate, max(termination_date) tdate,
  5          case when sysdate between min(effective_date) and max(termination_date)
  6              then 0
  7              when sysdate < min(effective_date)
  8              then 1
  9              when sysdate > max(termination_date)
 10              then 2
 11          end oc
 12    from (
 13  select person_key, effective_date, termination_date,
 14         last_value( grp IGNORE NULLS ) over (partition by person_key order by effective_date) the_grp
 15    from (
 16  select t.*, lag(termination_date) over (partition by person_key order by effective_date),
 17         case when effective_date - lag(termination_date) over (partition by person_key order by effective_date) > 1
 18              then row_number() over (partition by person_key order by effective_date)
 19              when lag(termination_date) over (partition by person_key order by effective_date) is null
 20              then row_number() over (partition by person_key order by effective_date)
 21          end grp
 22    from tj_test t
 23         ) t
 24         ) t
 25   group by person_key, the_grp
 26         )
 27  /

PERSON_KEY EDATE     TDATE             RN
---------- --------- --------- ----------
       456 25-FEB-11 27-MAR-11          1
       456 29-MAR-11 11-APR-11          2
       456 29-AUG-10 18-OCT-10          3
      1234 01-APR-11 30-DEC-11          1
      1234 01-JAN-12 30-DEC-12          2
      1234 01-JAN-11 01-MAR-11          3
      1234 01-JAN-10 30-SEP-10          4
      1234 01-DEC-09 15-DEC-09          5
      1234 01-JAN-09 30-SEP-09          6

9 rows selected.



Using row_number and ordering by our OC, that puts a range with sysdate in it first, - then if OC = 1 we'll order by edate from small to big (to put the nearest date range first in that group) and then if OC = 2 by tdate from big to small (putting again the nearest date range first in that group)

and then simply keep the first row for each person key....


NO PROCEDURAL CODE!!!!



As for the error you were getting, that is an interesting one. The error message is looking buggy, the real error is coming into play here:

INSP_EFF_DATE := item.EFFECTIVE_DATE;


item.effective_date is a STRING in the format YYYY-MM-DD
insp_eff_date is a DATE

you are doing an implicit conversion from a string to a date

and your NLS_DATE_FORMAT must not have been YYYY-MM-DD

It is yet ANOTHER CASE of the dreaded implicit conversion.

The error message is buggy because it didn't show you line 24! Line 24 is one of your assignments....




ops$tkyte%ORA11GR2> DECLARE
  2    INSP_EFF_DATE date;
  3    cursor C1 is
  4      select to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
  5             to_char(TERMINATION_DATE, 'YYYY-MM-DD') TERMINATION_DATE
  6        from TJ_TEST
  7       where PERSON_KEY = 1234;
  8  BEGIN
  9    FOR item IN C1
 10    LOOP
 11          INSP_EFF_DATE := item.EFFECTIVE_DATE;
 12    end loop;
 13  end;
 14  /
    select to_char(EFFECTIVE_DATE, 'YYYY-MM-DD') EFFECTIVE_DATE,
                *
ERROR at line 4:
ORA-01861: literal does not match format string
ORA-06512: at line 11




See, the error stack tells us line 11 - but the top of the stack goes back to the originating SQL for some reason and that is what got printed out in the error message.

Look at line 11 though, that is the culprit.

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thank you!

TJ Kiernan, March 17, 2011 - 3:03 pm UTC

Absolutely brilliant. Thank you, Tom. Breaking down your answer into its smaller parts is most helpful.

I've been beating the 'implicit conversion is evil' drum quite loudly lately, so I'm a bit embarrassed about that error. Thank you for pointing that out in spite of the fact that the procedural portion of my attempt was rendered irrelevant.

-T. J.
Tom Kyte
March 17, 2011 - 3:36 pm UTC

I only did the breaking down bit because you share my initials :) TJK

TJK

Kim Berg Hansen, March 18, 2011 - 2:21 am UTC

So that's the secret... Gotta change my name to Thibadault James Köenig !

(Sorry, couldn't resist :-)

A reader, January 05, 2012 - 2:15 pm UTC

So close, my initials are TJF. 8(

I have a question about designing for date ranges. Take a scenario where you have a set of parameters that you want to store values for, but you want to be able to set what the parameter will be at a future point in time, or turn a parameter off for a bit. Most of the systems I have worked on in the past that have this requirement use BEGIN_DATE and END_DATE and manage these dates when you insert a new record. So, if I wanted to change the value of a parameter effective today, I would first have to update the current record to set the END_DATE to yesterday, then insert a new record with today’s date. I would also have to check to see if there were any records set to become active in the future, and either sent the END_DATE of my new record to the day before the next record, or remove the future record(s). That becomes very difficult to program and foresee all of the different combinations, not to mention I am not sure how I could possibly enforce those rules through some constraint to make sure even people updating the table manually followed the rules.

So, I am wondering if it makes sense to basically ignore END_DATE unless it is ending a record prematurely. Take this example:

CREATE TABLE test_date_range (
id VARCHAR2(100),
value VARCHAR2(100),
begin_date DATE,
end_date DATE,
CONSTRAINT testdaterange_pk PRIMARY KEY (id, begin_date)
);

INSERT INTO test_date_range VALUES ('test1', 'bob', TO_DATE('01122011','MMDDYYYY'), NULL);
INSERT INTO test_date_range VALUES ('test1', 'fred', TO_DATE('06222011','MMDDYYYY'), NULL);
INSERT INTO test_date_range VALUES ('test1', 'bill', TO_DATE('01122012','MMDDYYYY'), NULL);


If we ignore END_DATE, then this is perfectly fine and we don’t need to add any logic to “end” each of these records. We can use the following query to get what the value is on any given date replacing SYSDATE in the query with the point in time we want to check for:

SELECT id, value, begin_date, end_date
FROM
(SELECT id, value, begin_date, end_date,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY begin_date DESC) AS rowno
FROM test_date_range
WHERE id = 'test1'
AND SYSDATE >= begin_date)
WHERE rowno = 1
AND SYSDATE <= NVL(end_date,SYSDATE)


If we decide we need to cancel the value, we only need to update that single row and it is fine:

UPDATE test_date_range
SET end_date = TRUNC(SYSDATE-1)
WHERE id = 'test1'
AND begin_date = TO_DATE('06222011','MMDDYYYY');


If we want to put a new row into effect, we don’t have to worry about adjusting any end dates, we just add it:

INSERT INTO test_date_range VALUES ('test1', 'frank', TRUNC(SYSDATE-5), NULL);


I am wondering if this approach makes sense. If you were to take over a system that had this implemented would you go, “what was that crazy developer thinking?” Are there issues in this approach that I have not considered yet? Is it too strange that there are a bunch of records that “begin” but never “end”? Basically, is there a better way for me to build this?
Tom Kyte
January 05, 2012 - 2:34 pm UTC

sounds a lot like this database feature:

http://docs.oracle.com/cd/E11882_01/appdev.112/e11826/long_vt.htm#i1010463

which for a table of parameters should work pretty good (always test)...

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.