Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Chris Saxon

Thanks for the question, Mike.

Asked: January 16, 2018 - 8:43 pm UTC

Last updated: January 19, 2018 - 4:52 pm UTC

Version: 11.2.0

Viewed 1000+ times

You Asked

Hi,

I have a following table:
create table t ( user_id varchar2(10), clinic varchar2(50), visit_dt date );

INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '08-APR-2008') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '21-APR-2008') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '08-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '26-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '29-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '24-FEB-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '17-MAR-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '30-MAR-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '04-MAY-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '19-OCT-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '18-NOV-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '25-NOV-2009') );


We are looking to select users who visited a clinic two or more times with visits at least 3 months apart. Is there a way to only select those visits that are >=3 months apart? The first qualifying visit is the first visit to the clinic and next qualifying visit must be >=3 months later. The might be multiple visits under 3 months, but we only interested in those that are 3+. The clock re-starts after each qualifying visit. User ID can be linked to more than one clinic, but qualifying visits need to be to the same clinic.

From the table above I need the following records:

user_id  clinic  visit_dt
---------------------------------
12 oncology 08-APR-2008
12 oncology 08-JAN-2009
12 oncology 04-MAY-2009
12 oncology 19-OCT-2009      


Any help is much appreciated! Thank you.

and Chris said...

It's a shame you're not on 12c. Pattern matching makes this easy :)

Just define an always-true initial variable. Then another which matches all those within 3 months of this:

alter session set nls_date_format = ' DD Mon YY ';

select * from t
match_recognize (
  order by visit_dt
  measures
    match_number() visit,
    first(st.visit_dt) stdt,
    coalesce(last(visit.visit_dt), st.visit_dt) endt,
    user_id usr,
    clinic  clin
  one row per match
  pattern ( st visit* )
  define
    visit as months_between ( visit_dt, st.visit_dt ) < 3
);

VISIT   STDT          ENDT          USR   CLIN       
      1  08 Apr 08     21 Apr 08    12    oncology   
      2  08 Jan 09     30 Mar 09    12    oncology   
      3  04 May 09     04 May 09    12    oncology   
      4  19 Oct 09     25 Nov 09    12    oncology   


On 11.2 you can get what you want using recursive with. Which is a little clunky!

In the base condition, select the first row for the patient. Then, in the recursive part, check if the date of the next visit is within three months of the initial. If it is, return the previous date. If not, return the current date.

Then take the distinct of this to get the visits:

with vals as (
  select t.*, 
         row_number() over (order by visit_dt) rn,
         min(visit_dt) over () mn
  from   t
), visits (
  user_id, clinic, visit_strt, rn
) as (
  select user_id, clinic, visit_dt, rn
  from   vals
  where  visit_dt = mn
  union all
  select s.user_id, s.clinic,
         case
           when months_between(s.visit_dt, v.visit_strt) < 3 then
             v.visit_strt
           else
             s.visit_dt
         end, s.rn
  from   visits v
  join   vals s
  on     v.rn + 1 = s.rn
)
  select distinct user_id, clinic, visit_strt
  from   visits
  order  by visit_strt;

USER_ID   CLINIC     VISIT_STRT    
12        oncology    08 Apr 08    
12        oncology    08 Jan 09    
12        oncology    04 May 09    
12        oncology    19 Oct 09

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Mike Finnegan, January 18, 2018 - 5:16 pm UTC

Very much appreciate the response. From the looks of the suggested query it is exactly what I need. However, I am still wondering if it can be broken down into smaller queries? I have ran suggested code and it kept running for hours (<12 hr) on 70,000 records table. Not sure what the issue is. Will have to figure out how to split that code. Thanks again for taking your time to help out!
Chris Saxon
January 18, 2018 - 5:57 pm UTC

Break up the recursive query? No, not really. Generally running a single query is faster than many.

If you need help with performance issues, we need to see the execution plan for it. You can find out how to get these at:

https://blogs.oracle.com/sql/how-to-create-an-execution-plan

12 hours on 70k rows sounds excessive though...

Mike Finnegan, January 18, 2018 - 8:07 pm UTC

I have tried the query on a smaller subset of data and it rung quick enough, however there results are not what I expect. For example, if I add a few rows to above table:



create table t ( user_id varchar2(10), clinic varchar2(50), visit_dt date );

INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '08-APR-2008') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '21-APR-2008') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '08-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '26-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '29-JAN-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '24-FEB-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '17-MAR-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '30-MAR-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '04-MAY-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '19-OCT-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '18-NOV-2009') );
INSERT INTO T VALUES ( 012, 'oncology', TO_DATE( '25-NOV-2009') );
INSERT INTO T VALUES ( 015, 'cardiology', TO_DATE( '20-MAY-2008') );
INSERT INTO T VALUES ( 015, 'cardiology', TO_DATE( '18-NOV-2008') );
INSERT INTO T VALUES ( 015, 'cardiology', TO_DATE( '18-NOV-2008') );
INSERT INTO T VALUES ( 015, 'cardiology', TO_DATE( '02-DEC-2008') );
INSERT INTO T VALUES ( 015, 'cardiology', TO_DATE( '02-DEC-2008') );

and then run the query I get incorrect results:
user_id clinic visit_strt
15 cardiology 08-APR-2008
12 oncology 08-APR-2008
15 cardiology 18-NOV-2008
12 oncology 18-NOV-2008
12 oncology 24-FEB-2009
12 oncology 19-OCT-2009

The user 015 at the cardiology clinic has visit date on 08-APR-2008 that is from a different user id.
I think the issue might be in the row number partitioning (?)
Chris Saxon
January 19, 2018 - 10:59 am UTC

Yes, you need to partition row_number() and min by user_id. And join on this in the recursive call (the 2nd branch of the union all)

What am I missing?

Duke Ganote, January 19, 2018 - 1:26 pm UTC

There's some subtlety I'm missing, I'm sure. If we're just looking for user-clinic visits more than 3 months since the prior, then:

set linesize 130
column clinic format a10
column months_from_last format 999.99
select s.* 
     , case when COALESCE(months_from_last,9999) >=3 
            then 'Y' else 'N' 
        end as qtr_year_apart
  from (
select t.*
      , lag(visit_dt)over
            (partition by user_id, clinic order by visit_dt) AS prior_visit_dt
      , months_between(visit_dt
           , lag(visit_dt)over
               (partition by user_id, clinic order by visit_dt)) AS months_from_last
   from t
) s
 order by user_id, clinic, visit_dt asc;


USER_ID    CLINIC     VISIT_DT  PRIOR_VISIT_DT MONTHS_FROM_LAST QTR_YEAR_APART
---------- ---------- --------- -------------- ---------------- --------------
12         oncology   08-APR-08                                 Y             
12         oncology   21-APR-08 08-APR-08                   .42 N             
12         oncology   08-JAN-09 21-APR-08                  8.58 Y             
12         oncology   26-JAN-09 08-JAN-09                   .58 N             
12         oncology   29-JAN-09 26-JAN-09                   .10 N             
12         oncology   24-FEB-09 29-JAN-09                   .84 N             
12         oncology   17-MAR-09 24-FEB-09                   .77 N             
12         oncology   30-MAR-09 17-MAR-09                   .42 N             
12         oncology   04-MAY-09 30-MAR-09                  1.16 N             
12         oncology   19-OCT-09 04-MAY-09                  5.48 Y             
12         oncology   18-NOV-09 19-OCT-09                   .97 N             
12         oncology   25-NOV-09 18-NOV-09                   .23 N             
15         cardiology 20-MAY-08                                 Y             
15         cardiology 18-NOV-08 20-MAY-08                  5.94 Y             
15         cardiology 18-NOV-08 18-NOV-08                  0.00 N             
15         cardiology 02-DEC-08 18-NOV-08                   .48 N             
15         cardiology 02-DEC-08 02-DEC-08                  0.00 N             

17 rows selected.

Chris Saxon
January 19, 2018 - 3:26 pm UTC

My understanding is they're counting three months after the first visit, not the prior.

Otherwise the attendances on 30-MAR-09 and 04-MAY-09 fall under the same visit surely?

Mike Finnegan, January 19, 2018 - 4:32 pm UTC

Thank you! Correct partitioning solved the issue. I got all the visits three or more months apart. I have been working on this issue for a week. Your help is much appreciated.
Chris Saxon
January 19, 2018 - 4:52 pm UTC

Happy to help

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.