Thanks for the question, NC.

Asked: November 09, 2017 - 4:09 pm UTC

Last updated: November 15, 2017 - 10:57 am UTC

Version: 11.2.0.4

Viewed 1000+ times

You Asked

I have data that looks like this:

create table t (a varchar2(30), b date);

insert into t values (NULL,TO_DATE('2003/05/03 16:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 17:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 18:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 18:30:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 19:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 19:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 20:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 20:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 21:02:44', 'yyyy/mm/dd hh24:mi:ss'));

I need to assign a group number for each repeating pattern of the TEXT field. The pattern I am looking for is "AAAA - null - AAAA". I want to be able to ignore any other 'nulls' outside of this pattern. In essence, looking to assign a group number like this:

TEXT    DATE/TIME     GROUP
(null) 1/1/2017 0:00 IGNORE FROM ASSIGNING GROUP #
(null) 1/1/2017 1:00 IGNORE FROM ASSIGNING GROUP #
AAAA 1/1/2017 2:00   1
(null) 1/1/2017 2:30   1 
AAAA 1/1/2017 3:00   1
(null) 1/1/2017 4:00 IGNORE FROM ASSIGNING GROUP #
AAAA 1/1/2017 4:30   2
(null) 1/1/2017 5:00   2
AAAA 1/1/2017 5:30   2

I would like the data to look like this:

TEXT    DATE/TIME     GROUP
AAAA 1/1/2017 2:00 1
(null) 1/1/2017 2:30 1
AAAA 1/1/2017 3:00 1
AAAA 1/1/2017 4:30 2
(null) 1/1/2017 5:00 2
AAAA 1/1/2017 5:30 2

I am on Oracle 11g (Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit) & PL/SQL Release 11.2.0.4.0.

I am attempting to identify patterns using SQL Lag/Lead functions & assign group numbers using Dense_Rank. But the issue I am running into is it's including the rows that I wish to IGNORE (as noted above).

Thank you.

and Chris said...

It's a shame you're not on 12c. This is easy-peasy there with match_recognize:

create table t (a varchar2(30), b date);

insert into t values (NULL,TO_DATE('2003/05/03 16:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 17:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 18:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 18:30:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 19:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 19:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 20:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 20:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 21:02:44', 'yyyy/mm/dd hh24:mi:ss'));

select * from t
match_recognize (
  order by b
  measures match_number() grp
  all rows per match
  pattern (aa nl aa) 
  define 
    aa as a = 'AAAA',
    nl as a is null
);

B                      GRP   A      
03-MAY-2003 18:02:44       1 AAAA   
03-MAY-2003 18:30:44       1        
03-MAY-2003 19:02:44       1 AAAA   
03-MAY-2003 20:02:44       2 AAAA   
03-MAY-2003 20:32:44       2        
03-MAY-2003 21:02:44       2 AAAA

But as you're not...

The problem with analytics is it's hard to stop yourself double counting groups. e.g. you have :

AAAA - null - AAAA - null - AAAA - null - AAAA

Which you consider two groups. So you need to avoid counting the middle A - null - A combination as a group in its own right.

One way to avoid this is with recursion.

Using recursive with you append the current A value, mapping nulls to some "out-of-bounds" value such as '##'

If this equals the final pattern (AAAA##AAAA) or part of it (AAAA##), keep appending.

Otherwise, reset from the beginning again.

Then return the max value of this calculation within the following two rows.

This gives:

with rws as (
  select t.*, row_number() over (order by b) rn
  from   t
), grps (a, b, rn, grp) as (
  select a, b, rn, to_char(a) grp from rws
  where  rn = 1
  union all
  select r.a, r.b, r.rn, 
         case 
           when g.grp || nvl(r.a, '##') in ('AAAA##', 'AAAA##AAAA') then 
             g.grp || nvl(r.a, '##')
           else r.a
         end 
  from   grps g 
  join   rws r
  on     g.rn + 1 = r.rn 
)
  select g.*, 
         case 
           when max(grp) over (
             order by b rows between current row and 2 following
           ) = 'AAAA##AAAA' then rownum 
         end g
  from   grps g;

A      B                      RN   GRP          G   
       03-MAY-2003 16:02:44      1                  
       03-MAY-2003 17:02:44      2                  
AAAA   03-MAY-2003 18:02:44      3 AAAA           3 
       03-MAY-2003 18:30:44      4 AAAA##         4 
AAAA   03-MAY-2003 19:02:44      5 AAAA##AAAA     5 
       03-MAY-2003 19:32:44      6                  
AAAA   03-MAY-2003 20:02:44      7 AAAA           7 
       03-MAY-2003 20:32:44      8 AAAA##         8 
AAAA   03-MAY-2003 21:02:44      9 AAAA##AAAA     9

All you need to do now is number the groups properly ;)

Rating

(3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Model Clause

Rajeshwaran Jeyabal, November 11, 2017 - 3:06 am UTC

Using SQL Model clause

demo@ORA12C> select a,b,rnk
  2  from (
  3  select *
  4  from t
  5  model
  6    dimension by ( row_number() over(order by b) r)
  7    measures( a,b, cast(null as varchar2(40)) grp, 0 rnk,
  8            cast(null as varchar2(40)) grp2 )
  9    rules(
 10      grp[any] order by r =
 11        case when grp[cv()-1] || nvl( a[cv()],'##') in ('AAAA##','AAAA##AAAA') then
 12              grp[cv()-1] || nvl( a[cv()],'##') else a[cv()] end ,
 13      grp2[any] = max(grp) over(order by b rows between current row and 2 following) ,
 14      rnk[any] order by r = case when grp2[cv()] ='AAAA##AAAA' and grp2[cv()-1] <> 'AAAA##AAAA'
 15                      then nvl(rnk[cv()-1],0)+1 else rnk[cv()-1] end )
 16        )
 17  where grp2 ='AAAA##AAAA';

A          B                              RNK
---------- ----------------------- ----------
AAAA       03-MAY-2003 06:02:44 pm          1
           03-MAY-2003 06:30:44 pm          1
AAAA       03-MAY-2003 07:02:44 pm          1
AAAA       03-MAY-2003 08:02:44 pm          2
           03-MAY-2003 08:32:44 pm          2
AAAA       03-MAY-2003 09:02:44 pm          2

6 rows selected.

demo@ORA12C>

OK I give up!

AndyP, November 14, 2017 - 3:04 pm UTC

Please put me out of my misery. Just how do you assign consecutive group numbers, starting with 1, to each of those sets of 3 row-numbered rows? (From the analytic answer I mean, rather than the model or match_recognize ones)

November 14, 2017 - 4:39 pm UTC

One way to do this is with the Tabibitosan method.

Basically, exclude the null rows from the case/max expression. Then assign them a row number.

Subtract this from the original row number and voila, you have your groups!

with rws as (
  select t.*, row_number() over (order by b) rn
  from   t
), grps (a, b, rn, grp) as (
  select a, b, rn, to_char(a) grp from rws
  where  rn = 1
  union all
  select r.a, r.b, r.rn, 
         case 
           when g.grp || nvl(r.a, '##') in ('AAAA##', 'AAAA##AAAA') then 
             g.grp || nvl(r.a, '##')
           else r.a
         end 
  from   grps g 
  join   rws r
  on     g.rn + 1 = r.rn 
), rns as (
  select g.*, 
         case 
           when max(grp) over (
             order by b rows between current row and 2 following
           ) = 'AAAA##AAAA' then rownum 
         end g
  from   grps g
)
  select r.*, 
         g - row_number() over (order by g) 
  from   rns r
  where  g is not null;

Of course

AndyP, November 15, 2017 - 9:03 am UTC

Thanks Chris, that Tabibitosan approach is so neat, but so hard to spell

To answer my own question, of how to assign consecutive group numbers, starting with 1, to each of those sets of 3 row-numbered rows, DENSE_RANK seems to suit

,grpids as
(
select a,b,rn,grp,g,g - row_number() over (order by g) groupid
  from grpdata
 where g is not null
)
select a,b,rn,grp,g,groupid
      ,dense_rank() over (order by groupid) normalised
  from grpids
 order by rn

A      B                            RN GRP                           G    GROUPID NORMALISED
------ -------------------- ---------- -------------------- ---------- ---------- ----------
AAAA   03-May-2003 18:02:44          3 AAAA                          3          2          1
       03-May-2003 18:30:44          4 AAAA##                        4          2          1
AAAA   03-May-2003 19:02:44          5 AAAA##AAAA                    5          2          1
AAAA   03-May-2003 20:02:44          7 AAAA                          7          3          2
       03-May-2003 20:32:44          8 AAAA##                        8          3          2
AAAA   03-May-2003 21:02:44          9 AAAA##AAAA                    9          3          2

November 15, 2017 - 10:57 am UTC

It's pronouncing Tabibitosan I find most difficult ;)

But yep, dense_ranking the result of this gives what you want.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Model Clause

OK I give up!

Of course

More to Explore

Analytics

Identify patterns and create groups

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

Model Clause

OK I give up!

Of course

More to Explore

Analytics