I have data that looks like this:
create table t (a varchar2(30), b date);
insert into t values (NULL,TO_DATE('2003/05/03 16:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 17:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 18:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 18:30:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 19:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 19:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 20:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 20:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 21:02:44', 'yyyy/mm/dd hh24:mi:ss'));
I need to assign a group number for each repeating pattern of the TEXT field. The pattern I am looking for is "AAAA - null - AAAA". I want to be able to ignore any other 'nulls' outside of this pattern. In essence, looking to assign a group number like this:
TEXT DATE/TIME GROUP
(null) 1/1/2017 0:00 IGNORE FROM ASSIGNING GROUP #
(null) 1/1/2017 1:00 IGNORE FROM ASSIGNING GROUP #
AAAA 1/1/2017 2:00 1
(null) 1/1/2017 2:30 1
AAAA 1/1/2017 3:00 1
(null) 1/1/2017 4:00 IGNORE FROM ASSIGNING GROUP #
AAAA 1/1/2017 4:30 2
(null) 1/1/2017 5:00 2
AAAA 1/1/2017 5:30 2
I would like the data to look like this:
TEXT DATE/TIME GROUP
AAAA 1/1/2017 2:00 1
(null) 1/1/2017 2:30 1
AAAA 1/1/2017 3:00 1
AAAA 1/1/2017 4:30 2
(null) 1/1/2017 5:00 2
AAAA 1/1/2017 5:30 2
I am on Oracle 11g (Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit) & PL/SQL Release 11.2.0.4.0.
I am attempting to identify patterns using SQL Lag/Lead functions & assign group numbers using Dense_Rank. But the issue I am running into is it's including the rows that I wish to IGNORE (as noted above).
Thank you.
It's a shame you're not on 12c. This is easy-peasy there with match_recognize:
create table t (a varchar2(30), b date);
insert into t values (NULL,TO_DATE('2003/05/03 16:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 17:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 18:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 18:30:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 19:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 19:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 20:02:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values (NULL,TO_DATE('2003/05/03 20:32:44', 'yyyy/mm/dd hh24:mi:ss'));
insert into t values ('AAAA',TO_DATE('2003/05/03 21:02:44', 'yyyy/mm/dd hh24:mi:ss'));
select * from t
match_recognize (
order by b
measures match_number() grp
all rows per match
pattern (aa nl aa)
define
aa as a = 'AAAA',
nl as a is null
);
B GRP A
03-MAY-2003 18:02:44 1 AAAA
03-MAY-2003 18:30:44 1
03-MAY-2003 19:02:44 1 AAAA
03-MAY-2003 20:02:44 2 AAAA
03-MAY-2003 20:32:44 2
03-MAY-2003 21:02:44 2 AAAA
But as you're not...
The problem with analytics is it's hard to stop yourself double counting groups. e.g. you have :
AAAA - null - AAAA - null - AAAA - null - AAAA
Which you consider two groups. So you need to avoid counting the middle A - null - A combination as a group in its own right.
One way to avoid this is with recursion.
Using recursive with you append the current A value, mapping nulls to some "out-of-bounds" value such as '##'
If this equals the final pattern (AAAA##AAAA) or part of it (AAAA##), keep appending.
Otherwise, reset from the beginning again.
Then return the max value of this calculation within the following two rows.
This gives:
with rws as (
select t.*, row_number() over (order by b) rn
from t
), grps (a, b, rn, grp) as (
select a, b, rn, to_char(a) grp from rws
where rn = 1
union all
select r.a, r.b, r.rn,
case
when g.grp || nvl(r.a, '##') in ('AAAA##', 'AAAA##AAAA') then
g.grp || nvl(r.a, '##')
else r.a
end
from grps g
join rws r
on g.rn + 1 = r.rn
)
select g.*,
case
when max(grp) over (
order by b rows between current row and 2 following
) = 'AAAA##AAAA' then rownum
end g
from grps g;
A B RN GRP G
03-MAY-2003 16:02:44 1
03-MAY-2003 17:02:44 2
AAAA 03-MAY-2003 18:02:44 3 AAAA 3
03-MAY-2003 18:30:44 4 AAAA## 4
AAAA 03-MAY-2003 19:02:44 5 AAAA##AAAA 5
03-MAY-2003 19:32:44 6
AAAA 03-MAY-2003 20:02:44 7 AAAA 7
03-MAY-2003 20:32:44 8 AAAA## 8
AAAA 03-MAY-2003 21:02:44 9 AAAA##AAAA 9
All you need to do now is number the groups properly ;)