Skip to Main Content

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Rehan.

Asked: February 24, 2018 - 10:28 am UTC

Last updated: March 01, 2018 - 1:29 am UTC

Version: 11g

Viewed 1000+ times

You Asked

I have following initial dataset
F_ID L_CAT CHG_DT
F1  VHL  01-FEB-2016
F1  VHL  10-FEB-2016
F1  VHL  15-FEB-2016
F1  MHL  20-FEB-2016
F1  VHL  25-FEB-2016
F1  VHL  28-FEB-2016
F1  MHL  05-MAR-2016
F1  MHL  10-MAR-2016
F2  VHL  01-FEB-2016
F2  VHL  10-FEB-2016
F2  MHL  18-FEB-2016
F2  MHL  21-FEB-2016
F2  VHL  25-FEB-2016

and I want to generate following output from SQL Query:
F_ID L_CAT FROM_DT  TO_DT
F1  VHL  01-FEB-2016  20-FEB-2016
F1  MHL  20-FEB-2016  25-FEB-2016
F1  VHL  25-FEB-2016  05-MAR-2016
F1  MHL  05-MAR-2016  10-MAR-2016
F2  VHL  01-FEB-2016  18-FEB-2016
F2  MHL  18-FEB-2016  25-FEB-2016
F2  VHL  25-FEB-2016  25-FEB-2016


i.e., I want to calculate time span during which each F_ID remain in specific L_CAT

code to produce scenario
create table my_test
(
f_id varchar2(30),
l_cat varchar2(30),
chg_dt date
);

insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','01-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','10-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','15-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','20-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','25-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','28-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','05-MAR-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','10-MAR-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','01-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','10-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL','18-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL','21-FEB-2016');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','25-FEB-2016');
COMMIT;

Thanks

with LiveSQL Test Case:

and Connor said...

There are more concise ways of doing this but I'll give you the longer version because it will let you build up the concepts here

SQL> @drop my_Test

Y1                                                                                           Y2
-------------------------------------------------------------------------------------------- -------------------------
TABLE                                                                                        cascade constraints purge

1 row selected.


Table dropped.

SQL>
SQL> create table my_test
  2  (
  3  f_id varchar2(30),
  4  l_cat varchar2(30),
  5  chg_dt date
  6  );

Table created.

SQL>
SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','01-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','10-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','15-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','20-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','25-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL','28-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','05-MAR-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL','10-MAR-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','01-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','10-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL','18-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL','21-FEB-2016');

1 row created.

SQL> insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL','25-FEB-2016');

1 row created.

SQL> COMMIT;

Commit complete.

--
-- extract some information about previous and next row from the current row
--

SQL>
SQL>
SQL> select f_id,
  2         l_cat,
  3         chg_dt,
  4         lead(chg_dt) over ( partition by f_id order by chg_dt ) nxt_dt,
  5         lag(chg_dt) over ( partition by f_id order by chg_dt ) prev_dt,
  6         lead(l_cat) over ( partition by f_id order by chg_dt ) nxt_cat,
  7         lag(l_cat) over ( partition by f_id order by chg_dt ) prev_cat
  8  from my_Test
  9  order by 1,3;

F_ID                           L_CAT                          CHG_DT    NXT_DT    PREV_DT   NXT_CAT                        PREV_CAT
------------------------------ ------------------------------ --------- --------- --------- ------------------------------ ------------------------------
F1                             VHL                            01-FEB-16 10-FEB-16           VHL
F1                             VHL                            10-FEB-16 15-FEB-16 01-FEB-16 VHL                            VHL
F1                             VHL                            15-FEB-16 20-FEB-16 10-FEB-16 MHL                            VHL
F1                             MHL                            20-FEB-16 25-FEB-16 15-FEB-16 VHL                            VHL
F1                             VHL                            25-FEB-16 28-FEB-16 20-FEB-16 VHL                            MHL
F1                             VHL                            28-FEB-16 05-MAR-16 25-FEB-16 MHL                            VHL
F1                             MHL                            05-MAR-16 10-MAR-16 28-FEB-16 MHL                            VHL
F1                             MHL                            10-MAR-16           05-MAR-16                                MHL
F2                             VHL                            01-FEB-16 10-FEB-16           VHL
F2                             VHL                            10-FEB-16 18-FEB-16 01-FEB-16 MHL                            VHL
F2                             MHL                            18-FEB-16 21-FEB-16 10-FEB-16 MHL                            VHL
F2                             MHL                            21-FEB-16 25-FEB-16 18-FEB-16 VHL                            MHL
F2                             VHL                            25-FEB-16           21-FEB-16                                MHL

13 rows selected.

-- DATE_FROM
-- when the prev is null (start of rows), we are at the start of a new range
-- or when this category is not the same as previous, we are at the start of a new range
--
-- DATE_TO
-- when the next is null (end of rows), we are at the end of current date range
-- or when next category is not same as this one, we are at the end of current date range


SQL> with t as
  2  (
  3  select f_id,
  4         l_cat,
  5         chg_dt,
  6         lead(chg_dt) over ( partition by f_id order by chg_dt ) nxt_dt,
  7         lag(chg_dt) over ( partition by f_id order by chg_dt ) prev_dt,
  8         lead(l_cat) over ( partition by f_id order by chg_dt ) nxt_cat,
  9         lag(l_cat) over ( partition by f_id order by chg_dt ) prev_cat
 10  from my_Test
 11  )
 12  select
 13    f_id,
 14    l_cat,
 15    case
 16      when prev_dt is null then chg_dt
 17      when l_cat != prev_cat then chg_dt
 18    end date_from,
 19    case
 20      when nxt_dt is null then chg_dt
 21      when l_cat != nxt_cat then nxt_dt
 22    end date_to
 23  from t;

F_ID                           L_CAT                          DATE_FROM DATE_TO
------------------------------ ------------------------------ --------- ---------
F1                             VHL                            01-FEB-16
F1                             VHL
F1                             VHL                                      20-FEB-16
F1                             MHL                            20-FEB-16 25-FEB-16
F1                             VHL                            25-FEB-16
F1                             VHL                                      05-MAR-16
F1                             MHL                            05-MAR-16
F1                             MHL                                      10-MAR-16
F2                             VHL                            01-FEB-16
F2                             VHL                                      18-FEB-16
F2                             MHL                            18-FEB-16
F2                             MHL                                      25-FEB-16
F2                             VHL                            25-FEB-16 25-FEB-16

13 rows selected.

--
-- now I want to carry the DATE_FROM down through every row
--

SQL> with t as
  2  (
  3  select f_id,
  4         l_cat,
  5         chg_dt,
  6         lead(chg_dt) over ( partition by f_id order by chg_dt ) nxt_dt,
  7         lag(chg_dt) over ( partition by f_id order by chg_dt ) prev_dt,
  8         lead(l_cat) over ( partition by f_id order by chg_dt ) nxt_cat,
  9         lag(l_cat) over ( partition by f_id order by chg_dt ) prev_cat
 10  from my_Test
 11  )
 12  select
 13    f_id,
 14    l_cat,
 15    chg_dt,
 16    last_value(case
 17      when prev_dt is null then chg_dt
 18      when l_cat != prev_cat then chg_dt
 19    end ignore nulls) over ( partition by f_id order by chg_dt) date_from,
 20    case
 21      when nxt_dt is null then chg_dt
 22      when l_cat != nxt_cat then nxt_dt
 23    end date_to
 24  from t
 25  /

F_ID                           L_CAT                          CHG_DT    DATE_FROM DATE_TO
------------------------------ ------------------------------ --------- --------- ---------
F1                             VHL                            01-FEB-16 01-FEB-16
F1                             VHL                            10-FEB-16 01-FEB-16
F1                             VHL                            15-FEB-16 01-FEB-16 20-FEB-16
F1                             MHL                            20-FEB-16 20-FEB-16 25-FEB-16
F1                             VHL                            25-FEB-16 25-FEB-16
F1                             VHL                            28-FEB-16 25-FEB-16 05-MAR-16
F1                             MHL                            05-MAR-16 05-MAR-16
F1                             MHL                            10-MAR-16 05-MAR-16 10-MAR-16
F2                             VHL                            01-FEB-16 01-FEB-16
F2                             VHL                            10-FEB-16 01-FEB-16 18-FEB-16
F2                             MHL                            18-FEB-16 18-FEB-16
F2                             MHL                            21-FEB-16 18-FEB-16 25-FEB-16
F2                             VHL                            25-FEB-16 25-FEB-16 25-FEB-16

13 rows selected.

--
-- and with that, a simple GROUP BY / MAX will get the DATE_TO I need
--


SQL> with t as
  2  (
  3  select f_id,
  4         l_cat,
  5         chg_dt,
  6         lead(chg_dt) over ( partition by f_id order by chg_dt ) nxt_dt,
  7         lag(chg_dt) over ( partition by f_id order by chg_dt ) prev_dt,
  8         lead(l_cat) over ( partition by f_id order by chg_dt ) nxt_cat,
  9         lag(l_cat) over ( partition by f_id order by chg_dt ) prev_cat
 10  from my_Test
 11  ),
 12  t2 as
 13  (
 14  select
 15    f_id,
 16    l_cat,
 17    chg_dt,
 18    last_value(case
 19      when prev_dt is null then chg_dt
 20      when l_cat != prev_cat then chg_dt
 21    end ignore nulls) over ( partition by f_id order by chg_dt) date_from,
 22    case
 23      when nxt_dt is null then chg_dt
 24      when l_cat != nxt_cat then nxt_dt
 25    end date_to
 26  from t
 27  )
 28  select
 29    f_id,
 30    l_cat,
 31    date_from,
 32    max(date_to)
 33  from t2
 34  group by f_id, l_cat, date_from
 35  order by 1,3;

F_ID                           L_CAT                          DATE_FROM MAX(DATE_
------------------------------ ------------------------------ --------- ---------
F1                             VHL                            01-FEB-16 20-FEB-16
F1                             MHL                            20-FEB-16 25-FEB-16
F1                             VHL                            25-FEB-16 05-MAR-16
F1                             MHL                            05-MAR-16 10-MAR-16
F2                             VHL                            01-FEB-16 18-FEB-16
F2                             MHL                            18-FEB-16 25-FEB-16
F2                             VHL                            25-FEB-16 25-FEB-16

7 rows selected.

SQL>


Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Rehan Ali, February 25, 2018 - 9:26 am UTC

WOW! Superb. Very well explained. Thank you very much...
Connor McDonald
February 27, 2018 - 1:54 am UTC

glad we could help

using sql pattern matching

Rajeshwaran, Jeyabal, February 27, 2018 - 1:36 am UTC

Team,

was looking at to solve this using pattern matching and was able to get this.

demo@ORA12C> select * from t;

F_I L_CAT CHG_DT
--- ----- -----------
F1  VHL   01-FEB-2016
F1  VHL   10-FEB-2016
F1  VHL   15-FEB-2016
F1  MHL   20-FEB-2016
F1  VHL   25-FEB-2016
F1  VHL   28-FEB-2016
F1  MHL   05-MAR-2016
F1  MHL   10-MAR-2016
F2  VHL   01-FEB-2016
F2  VHL   10-FEB-2016
F2  MHL   18-FEB-2016
F2  MHL   21-FEB-2016
F2  VHL   25-FEB-2016

13 rows selected.

demo@ORA12C> select * from t
  2  match_recognize(
  3    partition by f_id
  4    order by chg_dt
  5    measures
  6      match_number() mno,
  7      classifier() cls
  8    all rows per match with unmatched rows
  9    after match skip to last b3
 10    pattern (b1 b2* b3)
 11    define
 12      b2 as l_cat = prev(l_cat) ,
 13      b3 as l_cat <> prev(l_cat) or
 14            next(l_cat) is null )
 15  /

F_I CHG_DT             MNO CLS   L_CAT
--- ----------- ---------- ----- -----
F1  01-FEB-2016          1 B1    VHL
F1  10-FEB-2016          1 B2    VHL
F1  15-FEB-2016          1 B2    VHL
F1  20-FEB-2016          1 B3    MHL
F1  20-FEB-2016          2 B1    MHL
F1  25-FEB-2016          2 B3    VHL
F1  25-FEB-2016          3 B1    VHL
F1  28-FEB-2016          3 B2    VHL
F1  05-MAR-2016          3 B3    MHL
F1  05-MAR-2016          4 B1    MHL
F1  10-MAR-2016          4 B3    MHL
F2  01-FEB-2016          1 B1    VHL
F2  10-FEB-2016          1 B2    VHL
F2  18-FEB-2016          1 B3    MHL
F2  18-FEB-2016          2 B1    MHL
F2  21-FEB-2016          2 B2    MHL
F2  25-FEB-2016          2 B3    VHL

17 rows selected.


but unable to get the last row in the partition f_id=F2 (please compare this output with the above required output), please see below and kindly advice.

demo@ORA12C> select *
  2  from t
  3  match_recognize(
  4    partition by f_id
  5    order by chg_dt
  6    measures
  7      match_number() mno,
  8      b1.l_cat as cat ,
  9      first(chg_dt) as date_from,
 10      last(chg_dt) as date_to
 11    one row per match
 12    after match skip to b3
 13    pattern (b1 b2* b3)
 14    define
 15      b2 as prev(l_cat) = l_cat ,
 16      b3 as prev(l_cat) <> l_cat or
 17            next(l_cat) is null )
 18  /

F_I        MNO CAT   DATE_FROM   DATE_TO
--- ---------- ----- ----------- -----------
F1           1 VHL   01-FEB-2016 20-FEB-2016
F1           2 MHL   20-FEB-2016 25-FEB-2016
F1           3 VHL   25-FEB-2016 05-MAR-2016
F1           4 MHL   05-MAR-2016 10-MAR-2016
F2           1 VHL   01-FEB-2016 18-FEB-2016
F2           2 MHL   18-FEB-2016 25-FEB-2016

6 rows selected.

demo@ORA12C>

Connor McDonald
March 01, 2018 - 1:29 am UTC

Best we could get to is this (Thanks to Keith Laker)

select f_id, 
  l_cat, 
  first_date, 
  final_last_date,
  greatest(final_last_date - first_date, 1) as duration
from (
select mn, 
       cf, 
       f_id, 
       nvl(l_cat,b_l_cat) as l_cat, 
       nvl(first_date, b_first_date) as first_date,
       nvl(last_date, b_last_date) as check_last_date,
       nvl(lead(nvl(first_date, b_first_date), 1) over (partition by f_id ORDER BY  nvl(last_date, b_last_date)),  nvl(last_date, b_last_date)) as final_last_date
FROM my_test MATCH_RECOGNIZE
(partition by f_id order by chg_dt
measures first(strt.chg_dt) as first_date,
         last(a.chg_dt) as last_date,
         strt.l_cat as l_cat,
         first(b.chg_dt) as b_first_date,
         last(b.chg_dt) as b_last_date,
         b.l_cat as b_l_cat,
         match_number() as Mn,
         classifier() as cf
pattern ((strt a+) | b)
  define a AS l_cat = prev(l_cat),
         b as l_cat <> prev(l_cat) and l_cat <> nvl(next(l_cat),prev(l_cat))));



the missing rows

Racer I., February 27, 2018 - 12:20 pm UTC

Hi,

regarding the previous attempt with match_recognize :
I couldn't get it to work without adding fake rows :
with 
expand as (
  select f_id, l_cat, chg_dt
  from my_test
  union all
  select f_id, MAX(l_cat) KEEP (DENSE_RANK LAST ORDER BY chg_dt) l_cat, MAX(chg_dt) chg_dt
  from my_test
  group by f_id)
select *
from expand  
    match_recognize(
      partition by f_id
      order by chg_dt
      measures
        match_number() mno,
        classifier() cls,
        b2.l_cat as l_cat ,
        first(chg_dt) as from_chg_dt,
        last(chg_dt) as to_chg_dt
      one row per match
      after match skip to last b2
     pattern ( b1 i* b2 )
     define
       b2 as l_cat <> prev(l_cat) or next(l_cat) is null,
       i as l_cat = prev(l_cat))


Apparently mr doesn't like single row matches at the end of a partition.

regards,

re-reading again !

Rajeshwaran, Jeyabal, April 26, 2018 - 5:23 am UTC

Team,

Started re-reading this page again today and got a new idea with Match recognize.
and able to solve this sql using MR feature than involving MR and Analytics like above.

demo@ORA12C> select * from t;

F_ID  L_CAT CHG_DT
----- ----- -----------
F1    VHL   01-FEB-2016
F1    VHL   10-FEB-2016
F1    VHL   15-FEB-2016
F1    MHL   20-FEB-2016
F1    VHL   25-FEB-2016
F1    VHL   28-FEB-2016
F1    MHL   05-MAR-2016
F1    MHL   10-MAR-2016
F2    VHL   01-FEB-2016
F2    VHL   10-FEB-2016
F2    MHL   18-FEB-2016
F2    MHL   21-FEB-2016
F2    VHL   25-FEB-2016

13 rows selected.

demo@ORA12C> select f_id,mno,start_dt,
  2      nvl( nvl( x1,end_dt) , start_dt) as new_end_dt
  3  from t
  4  match_recognize(
  5    partition by f_id
  6    order by chg_dt
  7    measures
  8      match_number() mno,
  9      classifier() cls,
 10      b1.chg_dt as start_dt,
 11      last(b2.chg_dt) as end_dt ,
 12      next(chg_dt) as x1
 13    one row per match
 14    pattern( b1 b2*)
 15    define
 16      b2 as prev(l_cat) = l_cat )
 17  order by f_id,mno ;

F_ID         MNO START_DT    NEW_END_DT
----- ---------- ----------- -----------
F1             1 01-FEB-2016 20-FEB-2016
F1             2 20-FEB-2016 25-FEB-2016
F1             3 25-FEB-2016 05-MAR-2016
F1             4 05-MAR-2016 10-MAR-2016
F2             1 01-FEB-2016 18-FEB-2016
F2             2 18-FEB-2016 25-FEB-2016
F2             3 25-FEB-2016 25-FEB-2016

7 rows selected.

demo@ORA12C>

another approach using analytics

Rajeshwaran, Jeyabal, May 17, 2018 - 12:43 pm UTC

demo@ORA12C> select * from t;

F_ID  L_CAT CHG_DT
----- ----- -----------
F1    VHL   01-FEB-2016
F1    VHL   10-FEB-2016
F1    VHL   15-FEB-2016
F1    MHL   20-FEB-2016
F1    VHL   25-FEB-2016
F1    VHL   28-FEB-2016
F1    MHL   05-MAR-2016
F1    MHL   10-MAR-2016
F2    VHL   01-FEB-2016
F2    VHL   10-FEB-2016
F2    MHL   18-FEB-2016
F2    MHL   21-FEB-2016
F2    VHL   25-FEB-2016

13 rows selected.

demo@ORA12C> select f_id, l_cat, x1,x2
  2  from (
  3  select f_id, l_cat, chg_dt,
  4      last_value( end_dt ignore nulls) over( partition by f_id order by chg_dt ) as x1,
  5      case when grp ='x' then grp2 end as x2
  6  from (
  7  select t.*  ,
  8    case when lag(l_cat) over(partition by f_id order by chg_dt) is null or
  9          l_cat <> lag(l_cat) over(partition by f_id order by chg_dt) then
 10          chg_dt end as end_dt ,
 11    case when lead(l_cat) over(partition by f_id order by chg_dt) is null or
 12          l_cat <> lead(l_cat) over(partition by f_id order by chg_dt) then
 13          'x' end as grp ,
 14    case when lead(l_cat) over(partition by f_id order by chg_dt) is null or
 15          l_cat <> lead(l_cat) over(partition by f_id order by chg_dt) then
 16          lead(chg_dt,1,chg_dt) over(partition by f_id order by chg_dt) end as grp2
 17  from t
 18       )
 19       )
 20  where x2 is not null
 21  /

F_ID  L_CAT X1          X2
----- ----- ----------- -----------
F1    VHL   01-FEB-2016 20-FEB-2016
F1    MHL   20-FEB-2016 25-FEB-2016
F1    VHL   25-FEB-2016 05-MAR-2016
F1    MHL   05-MAR-2016 10-MAR-2016
F2    VHL   01-FEB-2016 18-FEB-2016
F2    MHL   18-FEB-2016 25-FEB-2016
F2    VHL   25-FEB-2016 25-FEB-2016

7 rows selected.

demo@ORA12C>

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.