
Chris Saxon

Thanks for the question, Rajendra.

Asked: November 21, 2016 - 1:18 pm UTC

Last updated: November 28, 2016 - 5:55 pm UTC

Version: 11g

Viewed 1000+ times

You Asked

Hi Tom,

I am a newbie to Oracle and I have a requirement. I even got stuck creating the table, as I want to store only the time in one of the columns, 'login'.


input 1
userid login
1 9:00
2 9:15
3 9:20
1 9:25
4 10:30

Below is the desired output if the login interval for the user is less than 30 minutes (here userid 1 logged in at 9:00 and then again at 9:25, so the interval is < 30 minutes; in that case the seqid should not change).



userid login seqid
1 9:00 1
2 9:15 2
3 9:20 3
1 9:25 1
4 10:30 4



input 2
userid login
1 9:00
2 9:15
3 9:20
1 9:35
4 10:30

Desired output if the login interval for the user is greater than 30 minutes (here userid 1 logged in at 9:00 and then again at 9:35, so the interval is > 30 minutes; in that case the seqid should change. It should be the next sequence in the list):

userid login seqid
1 9:00 1
2 9:15 2
3 9:20 3
1 9:35 4
4 10:30 5

Please help me with this.

Thanks
Raj

and we said...

Surely you need to store the date and the time? Otherwise how do you tell a login today at 1am apart from a login yesterday at the same time?

And if a user logs in at 9am, 9:25am and 9:35am is that two sessions (because 9:35 > 30 mins after 9) or one (because 9:35 is only 10 mins after the login at 9:25)?

I'll discuss both interpretations. Before we begin, here's a table with sample data:

create table t (
  user_id    int,
  login_date date
);

insert into t values (1, trunc(sysdate)+interval '9' hour);
insert into t values (1, trunc(sysdate)+interval '9:05' hour to minute);
insert into t values (2, trunc(sysdate)+interval '9:15' hour to minute);
insert into t values (3, trunc(sysdate)+interval '9:20' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (4, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:35' hour to minute);
insert into t values (2, trunc(sysdate)+interval '9:45' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:00' hour to minute);
insert into t values (4, trunc(sysdate)+interval '10:15' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:35' hour to minute);
insert into t values (4, trunc(sysdate)+interval '10:35' hour to minute);
insert into t values (5, trunc(sysdate)+interval '10:35' hour to minute);

commit;


If it's the second case (group all logins that were within 30 mins of the previous), you can do the following:

1. Find the time difference between the current and previous row
2. If this is > 30 mins, return the login date otherwise null
3. "Fill down" the blank times with the first
4. Rank the rows according to this calculated value.
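Before writing this in SQL, steps 1-3 can be sketched in plain Python for a single user's logins. This is an illustration of the logic only, not Oracle code; the function name is made up:

```python
from datetime import datetime, timedelta

def session_starts(logins, gap=timedelta(minutes=30)):
    """Steps 1-3 for one user: mark a login as a session start when the
    gap from the previous login exceeds `gap`, then fill that start time
    down over the following rows (mirrors lag + last_value ignore nulls)."""
    starts = []
    prev = None
    current_start = None
    for t in sorted(logins):
        # Steps 1-2: a login opens a new session if there is no previous
        # row or the gap from the previous row is more than 30 minutes.
        if prev is None or t - prev > gap:
            current_start = t
        # Step 3: "fill down" the most recent session start.
        starts.append(current_start)
        prev = t
    return starts

d = lambda hm: datetime(2016, 11, 22, *map(int, hm.split(":")))
# user_id = 1 from the sample data
print(session_starts([d("9:00"), d("9:05"), d("9:25"),
                      d("9:35"), d("10:00"), d("10:35")]))
```

Step 4 then just numbers the distinct start times across all users, which is what the dense_rank at the end of the SQL solution does.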

To do step 1, use lag to find the previous value. You want to split this up by user, so partition by user_id and order by date:

login_date - lag(login_date, 1, date'1900-01-01') 
  over (partition by user_id order by login_date)


The third argument to lag (date'1900-01-01') is the default returned when there's no previous row, i.e. for the first login of each user. Set this to any suitable out-of-bounds date.

Stick this in a case expression to return the login_date if there are more than 30 minutes between the two:

select user_id, login_date,
       case 
         when ( 
           login_date - 
           lag(login_date, 1, date'1900-01-01') 
             over (partition by user_id order by login_date) 
         ) > 30/1440 then
           login_date
       end dt
from   t;

USER_ID  LOGIN_DATE            DT                    
1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:05:00                        
1        22-NOV-2016 09:25:00                        
1        22-NOV-2016 09:35:00                        
1        22-NOV-2016 10:00:00                        
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00  
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00  
2        22-NOV-2016 09:45:00                        
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00  
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00  
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00  
4        22-NOV-2016 10:35:00                        
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00

You can then populate the blanks by returning the last_value of this calculation, ignoring nulls:

with rws as (
select user_id, login_date,
       case 
         when ( 
           login_date - 
           lag(login_date, 1, date'1900-01-01') 
             over (partition by user_id order by login_date) 
         ) > 30/1440 then
           login_date
       end dt
from   t
)
  select user_id, login_date, 
         last_value(dt) ignore nulls 
           over (partition by user_id order by login_date) lv
  from rws;

USER_ID  LOGIN_DATE            LV                    
1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:05:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:25:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:35:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 10:00:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00  
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00  
2        22-NOV-2016 09:45:00  22-NOV-2016 09:15:00  
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00  
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00  
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00  
4        22-NOV-2016 10:35:00  22-NOV-2016 10:15:00  
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00


You can now use this calculated LV value to apply the sequence. The analytic dense_rank is what you want. This gives rows with the same value the same rank with no gaps:

with rws as (
select user_id, login_date,
       case 
         when ( 
           login_date - 
           lag(login_date, 1, date'1900-01-01') 
             over (partition by user_id order by login_date) 
         ) > 30/1440 then
           login_date
       end dt
from   t
), grps as (
  select user_id, login_date, 
         last_value(dt) ignore nulls 
           over (partition by user_id order by login_date) lv
  from rws
)
  select user_id, login_date, 
         dense_rank() over (order by lv, user_id) seq
  from   grps
  order  by login_date, user_id;

USER_ID  LOGIN_DATE            SEQ  
1        22-NOV-2016 09:00:00  1    
1        22-NOV-2016 09:05:00  1    
2        22-NOV-2016 09:15:00  2    
3        22-NOV-2016 09:20:00  3    
1        22-NOV-2016 09:25:00  1    
4        22-NOV-2016 09:25:00  4    
1        22-NOV-2016 09:35:00  1    
2        22-NOV-2016 09:45:00  2    
1        22-NOV-2016 10:00:00  1    
4        22-NOV-2016 10:15:00  5    
1        22-NOV-2016 10:35:00  6    
4        22-NOV-2016 10:35:00  5    
5        22-NOV-2016 10:35:00  7


The above makes heavy use of analytic functions. If you want to know more about them, check out Connor's KISS video series:

https://www.youtube.com/playlist?list=PLJMaoEWvHwFJDyhMLCkNSSUQWw9waFkIj

If you need the first case (split into 30-minute groups), this is easiest with pattern matching.

Again, you want to partition by user and order by date. Once you've done this you want patterns where the current date is within 30 minutes of the first date in the match. So your pattern variable looks like:

thirty as login_date - first(login_date) <= 30/1440
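To make the semantics of this definition concrete: a row keeps extending the current match while it is within 30 minutes of the match's first row; otherwise a new match starts. A small Python sketch of that rule (an illustration only; the helper name is made up):

```python
from datetime import datetime, timedelta

def split_sessions(logins, gap=timedelta(minutes=30)):
    """Mirror the thirty* pattern: a row joins the current match while it
    is within `gap` of the FIRST row of that match; otherwise it starts
    a new match."""
    groups = []
    for t in sorted(logins):
        if groups and t - groups[-1][0] <= gap:
            groups[-1].append(t)   # within 30 min of first(login_date)
        else:
            groups.append([t])     # start a new match
    return groups

d = lambda hm: datetime(2016, 11, 22, *map(int, hm.split(":")))
# user_id = 1 from the sample data
print(split_sessions([d("9:00"), d("9:05"), d("9:25"),
                      d("9:35"), d("10:00"), d("10:35")]))
```

Note how this differs from the lag-based solution above: 9:35 starts a new group here because it is more than 30 minutes after 9:00, the first login of its group, even though it is only 10 minutes after 9:25.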


You then want the first date in the pattern. You can return this in the measures clause with

first(login_date) as fd


All together this gives:

select * from t
match_recognize (
  partition by user_id
  order  by login_date
  measures 
     first(login_date) as fd
  all rows per match
  pattern (thirty*)
  define
    thirty as login_date - first(login_date) <= 30/1440
);

USER_ID  LOGIN_DATE            FD                    
1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:05:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:25:00  22-NOV-2016 09:00:00  
1        22-NOV-2016 09:35:00  22-NOV-2016 09:35:00  
1        22-NOV-2016 10:00:00  22-NOV-2016 09:35:00  
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00  
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00  
2        22-NOV-2016 09:45:00  22-NOV-2016 09:15:00  
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00  
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00  
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00  
4        22-NOV-2016 10:35:00  22-NOV-2016 10:15:00  
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00


Finish by calculating the dense_rank, ordered by FD as in the other solution.

But this requires 12c. If you need an 11g solution let us know.

If you want to know more about pattern matching check out the following blog posts:

http://oracle-big-data.blogspot.co.uk/search/label/Pattern%20Matching



Comments

11G Solutions

Rajeshwaran, Jeyabal, November 23, 2016 - 8:09 am UTC

....
If you need an 11g solution let us know.
....


I was able to get it using the MODEL clause; any possibility using analytics?

demo@ORA12C> select *
  2  from t
  3  model
  4    partition by (user_id)
  5    dimension by ( row_number() over(partition by user_id order by login_date) r )
  6    measures ( login_date , cast(null as date) x, 0 y )
  7    rules (
  8      x[any] order by r = case when cv(r) = 1 then login_date[cv()]
  9                               when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
 10                               else login_date[cv()] end ,
 11      y[any] order by r = case when cv(r)=1 then 1
 12                               when x[cv()] = x[cv()-1] then y[cv()-1]
 13                               else y[cv()-1]+1 end
 14          )
 15  order by user_id,login_date
 16  /

   USER_ID          R LOGIN_DATE           X                             Y
---------- ---------- -------------------- -------------------- ----------
         1          1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1          2 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
         1          3 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
         1          4 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
         1          5 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
         1          6 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
         2          1 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
         2          2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
         3          1 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
         4          1 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
         4          2 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
         4          3 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
         5          1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>

Chris Saxon
November 24, 2016 - 1:11 pm UTC

If you're happy to split the logins into 30 min intervals, you can do something like:

create table t (
  user_id    int,
  login_date date
);

insert into t values (1, trunc(sysdate)+interval '9' hour);
insert into t values (1, trunc(sysdate)+interval '9:05' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:35' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:55' hour to minute);
insert into t values (1, trunc(sysdate)+interval '11:05' hour to minute);

commit;

select login_date, floor((login_date-trunc(login_date)) * 1440 / 30) grp
from   t;

LOGIN_DATE            GRP  
24-NOV-2016 09:00:00  18   
24-NOV-2016 09:05:00  18   
24-NOV-2016 09:25:00  18   
24-NOV-2016 09:35:00  19   
24-NOV-2016 10:55:00  21   
24-NOV-2016 11:05:00  22
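The GRP expression is just the number of minutes since midnight, integer-divided by 30. A quick Python check of the bucket numbers above (illustration only; the helper name is made up):

```python
# grp = floor(minutes since midnight / 30), matching
# floor((login_date - trunc(login_date)) * 1440 / 30) in the query
def bucket(hh, mm):
    return (hh * 60 + mm) // 30

print([bucket(*t) for t in
       [(9, 0), (9, 5), (9, 25), (9, 35), (10, 55), (11, 5)]])
# → [18, 18, 18, 19, 21, 22]
```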


But if it has to be 30 mins since the last login... it gets tricky. You could use recursive CTEs:

with rws (user_id, login_date, min_date, grp) as (
  select user_id, login_date, login_date min_date, 1 grp from t t1
  where  login_date in (
    select min(login_date) from t t2
    where  t2.user_id = t2.user_id
  )
  union all
  select t.user_id, t.login_date,
         case 
           when t.login_date - rws.min_date > 30/1440 then 
             t.login_date 
           else 
             rws.min_date
         end min_date,
         case 
           when t.login_date - rws.min_date > 30/1440 then 
             grp + 1
           else 
             grp
         end grp 
  from   rws 
  join   t
  on     t.user_id = rws.user_id
  and    t.login_date > rws.login_date
  and    t.login_date = (
    select min(login_date) from t t2
    where  t2.user_id = t2.user_id
    and    t2.login_date > rws.login_date
  )
)
 select * from rws;

USER_ID  LOGIN_DATE            MIN_DATE              GRP  
1        24-NOV-2016 09:00:00  24-NOV-2016 09:00:00  1    
1        24-NOV-2016 09:05:00  24-NOV-2016 09:00:00  1    
1        24-NOV-2016 09:25:00  24-NOV-2016 09:00:00  1    
1        24-NOV-2016 09:35:00  24-NOV-2016 09:35:00  2    
1        24-NOV-2016 10:55:00  24-NOV-2016 10:55:00  3    
1        24-NOV-2016 11:05:00  24-NOV-2016 10:55:00  3

11G Solutions

Rajeshwaran, Jeyabal, November 25, 2016 - 5:51 am UTC

Let's start with this data set in place:
truncate table t;
insert into t values(1,to_date('22-NOV-2016 09:00:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:05:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:25:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 10:00:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(2,to_date('22-NOV-2016 09:15:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(2,to_date('22-NOV-2016 09:45:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(3,to_date('22-NOV-2016 09:20:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 09:25:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 10:15:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(5,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
commit;



Given this
....
If you need the first case (split into 30 minute groups) this is easiest with pattern matching.
Again, you want to partition by user and order by date. Once you've done this you want patterns
where the current date is within 30 minutes of the first date in the match.
So your pattern variable looks like:

thirty as login_date - first(login_date) <= 30/1440
....


With pattern matching (works in 12c and above), I was able to get this.

demo@ORA12C> select user_id,login_date,fd,
  2        dense_rank() over(partition by user_id
  3            order by fd) rnk
  4  from (
  5  select *
  6  from t
  7  match_recognize(
  8    partition by user_id
  9    order by login_date
 10    measures
 11      match_number() mno,
 12      classifier() cls,
 13      strt.login_date as fd
 14    all rows per match
 15    pattern( strt down*)
 16    define
 17      down as login_date - strt.login_date <= (30/24/60) )
 18        )
 19  /

   USER_ID LOGIN_DATE           FD                          RNK
---------- -------------------- -------------------- ----------
         1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
         2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
         2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
         3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
         4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
         4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
         4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
         5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.


Now, thinking about how to handle this in an 11g database, with the MODEL clause I got this.

demo@ORA12C> select user_id,login_date,x as fd,y as rnk
  2  from t
  3  model
  4    partition by (user_id)
  5    dimension by ( row_number() over(partition by user_id order by login_date) r )
  6    measures ( login_date , cast(null as date) x, 0 y )
  7    rules (
  8      x[any] order by r = case when cv(r) = 1 then login_date[cv()]
  9                               when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
 10                               else login_date[cv()] end ,
 11      y[any] order by r = case when cv(r)=1 then 1
 12                               when x[cv()] = x[cv()-1] then y[cv()-1]
 13                               else y[cv()-1]+1 end
 14          )
 15  order by user_id,login_date;

   USER_ID LOGIN_DATE           FD                          RNK
---------- -------------------- -------------------- ----------
         1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
         2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
         2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
         3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
         4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
         4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
         4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
         5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>


With the above recursive with clause, I don't get the correct result set.

demo@ORA12C> with rws (user_id, login_date, min_date, grp) as (
  2    select user_id, login_date, login_date min_date, 1 grp from t t1
  3    where  login_date in (
  4      select min(login_date) from t t2
  5      where  t2.user_id = t2.user_id
  6    )
  7    union all
  8    select t.user_id, t.login_date,
  9           case
 10             when t.login_date - rws.min_date > 30/1440 then
 11               t.login_date
 12             else
 13               rws.min_date
 14           end min_date,
 15           case
 16             when t.login_date - rws.min_date > 30/1440 then
 17               grp + 1
 18             else
 19               grp
 20           end grp
 21    from   rws
 22    join   t
 23    on     t.user_id = rws.user_id
 24    and    t.login_date > rws.login_date
 25    and    t.login_date = (
 26      select min(login_date) from t t2
 27      where  t2.user_id = t2.user_id
 28      and    t2.login_date > rws.login_date
 29    )
 30  )
 31   select * from rws;

   USER_ID LOGIN_DATE           MIN_DATE                    GRP
---------- -------------------- -------------------- ----------
         1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1

2 rows selected.

demo@ORA12C>


It looks like a few joins were incorrect in the recursive with clause.
....
4 select min(login_date) from t t2
5 where t2.user_id = t2.user_id

27 where t2.user_id = t2.user_id
28 and t2.login_date > rws.login_date
....


t2.user_id is joined back to t2.user_id itself (always true); hence the problem.

The algorithm for the recursive with clause goes like this:
a) for each user, start with the least login_date as grp = 1
b) for each user, get the next login_date greater than that least login date and find the distance between the current login_date and the least login date
c) if the distance is more than 30 mins, increment grp by one; otherwise keep it as is
d) then proceed the same way for the other rows


The updated recursive with clause goes like this:

demo@ORA12C> with rws ( user_id, login_date, min_dt, grp ) as (
  2  select t1.user_id, t1.login_date , t1.login_date min_dt, 1 grp
  3  from t t1
  4  where login_date = ( select min(t2.login_date)
  5        from t t2
  6        where t2.user_id = t1.user_id )
  7  union all
  8  select t3.user_id, t3.login_date,
  9            case when t3.login_date - r1.min_dt > 30/24/60 then
 10                    t3.login_date else r1.min_dt end ,
 11            case when t3.login_date - r1.min_dt > 30/24/60 then
 12                    r1.grp +1 else r1.grp end
 13  from t t3, rws r1
 14  where t3.user_id = r1.user_id and
 15  t3.login_date > r1.login_date and
 16  t3.login_date = ( select min(t4.login_date)
 17        from t t4
 18        where t4.user_id = r1.user_id and
 19        t4.login_date > r1.login_date )
 20  )
 21  select user_id, login_date, min_dt fd, grp rnk
 22  from rws
 23  order by user_id, login_date;

   USER_ID LOGIN_DATE           FD                          RNK
---------- -------------------- -------------------- ----------
         1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
         2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
         2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
         3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
         4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
         4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
         4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
         5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>


But do we have any way to get this using analytics?

I tried using analytics, but got only this (which is incorrect!):

demo@ORA12C> select user_id,login_date, grp fd,
  2            dense_rank() over(partition by user_id
  3                order by grp ) rnk
  4  from (
  5  select user_id,login_date,
  6            min(login_date) over(partition by user_id) min_dt ,
  7            case when login_date - min(login_date) over(partition by user_id) > 30/24/60
  8                      then login_date
  9                      else min(login_date) over(partition by user_id)
 10              end grp
 11  from t
 12       )
 13  order by user_id,login_date
 14  /

   USER_ID LOGIN_DATE           FD                          RNK
---------- -------------------- -------------------- ----------
         1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
         1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
         1 22-NOV-2016 10:00:00 22-NOV-2016 10:00:00          3
         1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          4
         2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
         2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
         3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
         4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
         4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
         4 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
         5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>


In the above example, user_id = 4 has three dates (22-NOV-2016 09:25:00, 22-NOV-2016 10:15:00 and 22-NOV-2016 10:35:00), ordered by login_date ascending:

a) take the first date (09:25) and compare it with the next date (10:15)
b) if the difference is less than 30 min, it's the same group; otherwise it's a new group, and the comparison proceeds using (10:15) as the base
c) next, the distance between 10:15 and 10:35 is less than 30 min, so it lies in the same group

So user_id = 4 has two groups in total: group 1 has one element (22-NOV-2016 09:25:00) and group 2 has two elements (22-NOV-2016 10:15:00 and 22-NOV-2016 10:35:00).

These kinds of inter-row calculations are easier to handle with the MODEL clause than with analytics.

Kindly let us know if you have any options to get this using analytics.
Chris Saxon
November 25, 2016 - 4:56 pm UTC

Good catch - thanks.

Though the recursive solution still isn't quite right. The first login for each user gets rank = 1. Only the very first login should have this value. The sequence needs to go across all users.

Is it possible with "pure" analytics? Maybe. I can't find a correct solution yet...

Adding to MATCH_RECOGNIZE solution

Stew Ashton, November 25, 2016 - 9:39 am UTC

If I read the original answer right, MATCH_RECOGNIZE is supposed to do most of the work but DENSE_RANK is used to rank the groups.

It seems to me that the MATCH_NUMBER() function provides that rank directly.
select * from t
match_recognize (
  partition by user_id
  order  by login_date
  measures 
     first(login_date) as fd,
     match_number() rnk
  all rows per match
  pattern (thirty*)
  define
    thirty as login_date - first(login_date) <= 30/1440
);

   USER_ID LOGIN_DATE       FD                      RNK
---------- ---------------- ---------------- ----------
         1 2016-11-25 09:00 2016-11-25 09:00          1
         1 2016-11-25 09:05 2016-11-25 09:00          1
         1 2016-11-25 09:25 2016-11-25 09:00          1
         1 2016-11-25 09:35 2016-11-25 09:35          2
         1 2016-11-25 10:00 2016-11-25 09:35          2
         1 2016-11-25 10:35 2016-11-25 10:35          3
         2 2016-11-25 09:15 2016-11-25 09:15          1
         2 2016-11-25 09:45 2016-11-25 09:15          1
         3 2016-11-25 09:20 2016-11-25 09:20          1
         4 2016-11-25 09:25 2016-11-25 09:25          1
         4 2016-11-25 10:15 2016-11-25 10:15          2
         4 2016-11-25 10:35 2016-11-25 10:15          2
         5 2016-11-25 10:35 2016-11-25 10:35          1


Unless I am missing something...

Best regards, Stew
Chris Saxon
November 25, 2016 - 10:03 am UTC

The rank has to cross user ids, i.e. there should be only one set of rows with rank = 1. match_number() resets to 1 for each new user_id, giving several.

Oops, thanks

Stew Ashton, November 25, 2016 - 2:16 pm UTC

Ah, in the dense_rank() the ordering is by "first login date", then user_id. So I was missing something...

Thanks for setting me straight, and excellent answer!

Best regards, Stew
Chris Saxon
November 25, 2016 - 4:51 pm UTC

Cheers. I thought match_number() was good enough originally too...


Performance issues...

Brendan, November 25, 2016 - 4:00 pm UTC

I recently wrote a blog post on this kind of problem, solved with features from different Oracle versions. May I comment on the performance aspects, based on what I learned there?

Here is the execution plan for the recursive subquery solution (as in Rajeshwaran's corrected version):

Plan hash value: 4272184493

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |         |      1 |        |     13 |00:00:00.01 |     112 |       |       |          |
|   1 |  SORT ORDER BY                             |         |      1 |      4 |     13 |00:00:00.01 |     112 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                     |         |      1 |      4 |     13 |00:00:00.01 |     112 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|         |      1 |        |     13 |00:00:00.01 |     112 |       |       |          |
|*  4 |     HASH JOIN                              |         |      1 |      3 |      5 |00:00:00.01 |      14 |  1156K|  1156K|  664K (0)|
|   5 |      VIEW                                  | VW_SQ_1 |      1 |     13 |      5 |00:00:00.01 |       7 |       |       |          |
|   6 |       HASH GROUP BY                        |         |      1 |     13 |      5 |00:00:00.01 |       7 |  1115K|  1115K|  852K (0)|
|   7 |        TABLE ACCESS FULL                   | T       |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|   8 |      TABLE ACCESS FULL                     | T       |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|*  9 |     FILTER                                 |         |      6 |        |      8 |00:00:00.01 |      98 |       |       |          |
|* 10 |      HASH JOIN                             |         |      6 |      1 |     19 |00:00:00.01 |      42 |   927K|   927K|  787K (0)|
|  11 |       RECURSIVE WITH PUMP                  |         |      6 |        |     13 |00:00:00.01 |       0 |       |       |          |
|  12 |       TABLE ACCESS FULL                    | T       |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
|  13 |      SORT AGGREGATE                        |         |      8 |      1 |      8 |00:00:00.01 |      56 |       |       |          |
|* 14 |       TABLE ACCESS FULL                    | T       |      8 |      1 |     19 |00:00:00.01 |      56 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("LOGIN_DATE"="MIN(LOGIN_DATE)" AND "ITEM_0"="T1"."USER_ID")
   9 - filter("T"."LOGIN_DATE"=)
  10 - access("T"."USER_ID"="RWS"."USER_ID")
       filter("T"."LOGIN_DATE">"RWS"."LOGIN_DATE")
  14 - filter(("T2"."USER_ID"=:B1 AND "T2"."LOGIN_DATE">:B2))

Note
-----
   - dynamic sampling used for this statement (level=2)


There are 4 scans of the table. I would instead order the records up front in a separate subquery, which is simpler as well as faster than using a correlated subquery for the minimum date. Here is the revised query with the resulting plan:

Version avoiding the correlated subquery for min date
SQL> with rows_ordered AS (
  2   select user_id, login_date, Row_Number() over (partition by user_id order by login_date) rn from t
  3  ), rws (user_id, login_date, min_date, rn, grp) as (
  4    select user_id, login_date, login_date, 1, 1
  5    from rows_ordered t1
  6    where  t1.rn = 1
  7    union all
  8    select t.user_id, t.login_date,
  9           case
 10             when t.login_date - rws.min_date > 30/1440 then
 11               t.login_date
 12             else
 13               rws.min_date
 14           end,
 15           t.rn,
 16           case
 17             when t.login_date - rws.min_date > 30/1440 then
 18               grp + 1
 19             else
 20               grp
 21           end
 22    from   rws
 23    join   rows_ordered t
 24    on     t.user_id = rws.user_id
 25    and    t.rn = rws.rn + 1
 26  )
 27   select  /*+ LOGIN_BPF gather_plan_statistics */ * from rws
 28  order by user_id, login_date
 29  /

Plan hash value: 1188792140

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |     13 |00:00:00.01 |      49 |       |       |          |
|   1 |  SORT ORDER BY                             |      |      1 |     15 |     13 |00:00:00.01 |      49 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                     |      |      1 |     15 |     13 |00:00:00.01 |      49 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |      1 |        |     13 |00:00:00.01 |      49 |       |       |          |
|*  4 |     VIEW                                   |      |      1 |     13 |      5 |00:00:00.01 |       7 |       |       |          |
|*  5 |      WINDOW SORT PUSHED RANK               |      |      1 |     13 |      8 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS FULL                    | T    |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|*  7 |     HASH JOIN                              |      |      6 |      2 |      8 |00:00:00.01 |      42 |   933K|   933K|  867K (0)|
|   8 |      RECURSIVE WITH PUMP                   |      |      6 |        |     13 |00:00:00.01 |       0 |       |       |          |
|   9 |      VIEW                                  |      |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
|  10 |       WINDOW SORT                          |      |      6 |     13 |     78 |00:00:00.01 |      42 |  2048 |  2048 | 2048  (0)|
|  11 |        TABLE ACCESS FULL                   | T    |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("T1"."RN"=1)
   5 - filter(ROW_NUMBER() OVER ( PARTITION BY "USER_ID" ORDER BY "LOGIN_DATE")<=1)
   7 - access("T"."USER_ID"="RWS"."USER_ID" AND "T"."RN"="RWS"."RN"+1)

Note
-----
   - dynamic sampling used for this statement (level=2)


However, this will still be very slow for large datasets, as execution time grows quadratically with the number of records per user. It can be made linear by writing the initial subquery into a temporary table with an index, and reading from that in the main query. In my tests the Model clause solution was also linear, and quicker too, but Match_Recognize was much faster than either, if you have 12c.
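For reference, here is a Match_Recognize sketch of the same fixed-start grouping (a group continues while a row is within 30 minutes of the row that started the group). Treat it as an outline rather than a tested drop-in solution:

select user_id, login_date, mn as grp
from   t
match_recognize (
  partition by user_id
  order by login_date
  measures match_number() as mn
  all rows per match
  pattern ( strt same* )
  define
    -- a row stays in the current group while it is within
    -- 30 minutes of the row that started the group
    same as same.login_date - strt.login_date <= 30/1440
);

Because the pattern restarts at the first row that falls outside the window, match_number() plays the same role as the grp counter in the recursive query above.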

Chris Saxon
November 25, 2016 - 4:52 pm UTC

Yep, multiple references to the same table are a killer...

Performance | MODEL clause

Rajeshwaran, Jeyabal, November 26, 2016 - 7:26 am UTC

With the MODEL clause, the plan looks like this. It scans the table "T" just once.

But I am wary of the SQL MODEL clause because of this: https://community.oracle.com/ideas/13061

demo@ORA12C> set serveroutput off
demo@ORA12C> select /*+ gather_plan_statistics */ user_id,login_date,x as fd,y as rnk
  2     from t
  3     model
  4       partition by (user_id)
  5       dimension by ( row_number() over(partition by user_id order by login_date) r )
  6       measures ( login_date , cast(null as date) x, 0 y )
  7       rules (
  8         x[any] order by r = case when cv(r) = 1 then login_date[cv()]
  9                                  when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
 10                                  else login_date[cv()] end ,
 11         y[any] order by r = case when cv(r)=1 then 1
 12                                  when x[cv()] = x[cv()-1] then y[cv()-1]
 13                                  else y[cv()-1]+1 end
 14             )
 15     order by user_id,login_date;

   USER_ID LOGIN_DATE  FD                 RNK
---------- ----------- ----------- ----------
         1 22-NOV-2016 22-NOV-2016          1
         1 22-NOV-2016 22-NOV-2016          1
         1 22-NOV-2016 22-NOV-2016          1
         1 22-NOV-2016 22-NOV-2016          2
         1 22-NOV-2016 22-NOV-2016          2
         1 22-NOV-2016 22-NOV-2016          3
         2 22-NOV-2016 22-NOV-2016          1
         2 22-NOV-2016 22-NOV-2016          1
         3 22-NOV-2016 22-NOV-2016          1
         4 22-NOV-2016 22-NOV-2016          1
         4 22-NOV-2016 22-NOV-2016          2
         4 22-NOV-2016 22-NOV-2016          2
         5 22-NOV-2016 22-NOV-2016          1

13 rows selected.

demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
SQL_ID  2hqau85tyz4wm, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ user_id,login_date,x as fd,y as
rnk    from t    model      partition by (user_id)      dimension by (
row_number() over(partition by user_id order by login_date) r )
measures ( login_date , cast(null as date) x, 0 y )      rules (
x[any] order by r = case when cv(r) = 1 then login_date[cv()]
                      when login_date[cv()] - x[cv()-1] <= 30/24/60
then x[cv()-1]                                 else login_date[cv()]
end ,        y[any] order by r = case when cv(r)=1 then 1
                  when x[cv()] = x[cv()-1] then y[cv()-1]
                  else y[cv()-1]+1 end            )    order by
user_id,login_date

Plan hash value: 1494216132

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |     13 |00:00:00.01 |       7 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |     13 |     13 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   2 |   SQL MODEL ORDERED  |      |      1 |     13 |     13 |00:00:00.01 |       7 |   965K|   965K|  842K (0)|
|   3 |    WINDOW SORT       |      |      1 |     13 |     13 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS FULL| T    |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
------------------------------------------------------------------------------------------------------------------


26 rows selected.

demo@ORA12C>

Chris Saxon
November 26, 2016 - 2:25 pm UTC

Why are you afraid? Test on your data to see whether performance is "good enough"!

Fear of Model Clause

Brendan, November 26, 2016 - 1:38 pm UTC

Hi Rajesh,

Be not afraid of Model Clause! But do be careful.

In addition to the blog post of a couple of weeks ago, I posted another article this morning on a similar, but slightly more complicated, problem posed recently on OTN. My first Model solution used automatic ordering to avoid ORA-32637, and to my surprise, this caused the execution times to be very slow, varying quadratically with number of records per partition. I managed to fix it by ordering the second rule by dimension descending, and reverting to sequential overall ordering of rules. Then, as in the first article, Model was second best only to Match_Recognize.

Model clause implements multiple algorithms under the covers and is sometimes slow, sometimes fast. These bursting problems seem to suit it very well, with sequential ordering.

I think benchmarking across multiple dataset sizes is the best way to check performance characteristics generally for this kind of non-OLTP query.

Regarding possible analytic function solutions, I remember looking for them at the time of the original AskTom question (referenced in my first blog post), several years ago, and not finding any - in fact, I think that is what prompted me to consider Model. I have not seen any analytic function solutions since then, and believe they are not possible - but if anyone disagrees...
Chris Saxon
November 26, 2016 - 2:48 pm UTC

Thanks Brendan - can you share a link to your findings?

I feel like it is possible with just analytic functions, but I can't see how at the moment...

Link to My Results

Brendan, November 26, 2016 - 3:08 pm UTC

Sure http://aprogrammerwrites.eu/?p=1836 is the first one

Chris Saxon
November 28, 2016 - 10:50 am UTC

Thanks for sharing

a question on xplan

A reader, November 28, 2016 - 4:04 pm UTC

after SELECT /*+ gather_plan_statistics */ ....

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

gives a plan for something other than my query (unless I specify the sql_id). Why? And how do I overcome this, please?

tkx



---------------------------------------------------------------------------------------------------
| Id  | Operation                | Name                      | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                           |        |       |       |          |
|   1 |  SORT ORDER BY           |                           |      1 |  2048 |  2048 | 2048  (0)|
|*  2 |   FIXED TABLE FIXED INDEX| X$KGLCURSOR_CHILD (ind:2) |      1 |       |       |          |
---------------------------------------------------------------------------------------------------
Chris Saxon
November 28, 2016 - 5:55 pm UTC

If you don't pass the SQL_ID, it returns the plan for the last statement your session executed.

What exactly is your script?
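One common cause, assuming you are in SQL*Plus: with serveroutput on, the client silently runs dbms_output.get_lines after every statement, so "last executed" is no longer your query. A sketch of the usual workarounds (the sql_text filter below is only illustrative):

-- stop the implicit dbms_output.get_lines call
set serveroutput off

select /*+ gather_plan_statistics */ count(*) from t;

select * from table(dbms_xplan.display_cursor(format => 'allstats last'));

-- or find the cursor explicitly and pass its sql_id:
select sql_id, child_number
from   v$sql
where  sql_text like 'select /*+ gather_plan_statistics */%';

select * from table(dbms_xplan.display_cursor(sql_id => '&sql_id', format => 'allstats last'));

Passing the sql_id explicitly works regardless of what ran in between.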

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.