Thanks for the question, Rajendra.

Asked: November 21, 2016 - 1:18 pm UTC

Last updated: November 28, 2016 - 5:55 pm UTC

Version: 11g

Viewed 1000+ times

Hi Tom,

I am a newbie to oracle and i have a requirement. I even got struck at creating the table as i want to store only time in one of the column 'login'

input 1
1 9:00
2 9:15
3 9:20
1 9:25
4 10:30

Below is the desired output if the login interval of the user is less than 30 minutes (here userid 1 logged in at 9:00 and then again loggedin at 9:25 , so the interval is < 30 minutes, in that case the seqid should not change ).

1 9:00 1
2 9:15 2
3 9:20 3
1 9:25 1
4 10:30 4

input 2
1 9:00
2 9:15
3 9:20
1 9:35
4 10:30

desired output if the login interval of the user is greater than 30 minutes (here userid 1 logged in at 9:00 and then again logged in at 9:35 , so the interval is > 30 minutes, in that case the seqid should change. It should be the next sequence in the list)

1 9:00 1
2 9:15 2
3 9:20 3
1 9:35 4
4 10:30 5

Thanks
Raj

and Chris said...

Surely you need to store the date and the time? Otherwise how do you tell between a login today at 1am and a login yesterday at the same time?

And if a user logs in at 9am, 9:25am and 9:35am is that two sessions (because 9:35 > 30 mins after 9) or one (because 9:35 is only 10 mins after the login at 9:25)?

I'll discuss both solutions. So before we begin proper, here's a table with sample data:

```create table t (
user_id    int,
);

insert into t values (1, trunc(sysdate)+interval '9' hour);
insert into t values (1, trunc(sysdate)+interval '9:05' hour to minute);
insert into t values (2, trunc(sysdate)+interval '9:15' hour to minute);
insert into t values (3, trunc(sysdate)+interval '9:20' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (4, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:35' hour to minute);
insert into t values (2, trunc(sysdate)+interval '9:45' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:00' hour to minute);
insert into t values (4, trunc(sysdate)+interval '10:15' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:35' hour to minute);
insert into t values (4, trunc(sysdate)+interval '10:35' hour to minute);
insert into t values (5, trunc(sysdate)+interval '10:35' hour to minute);

commit;
```

If it's the second case (group all logins that were within 30 mins of the previous), you can do the following:

1. Find the time difference between the current and previous row
2. If this is > 30 mins, return the login date otherwise null
3. "Fill down" the blank times with the first
4. Rank the rows according to this calculated values.

To do step 1, use lag to find the previous value. You want to split this up by user, so partition by user_id and order by date:

```login_date - lag(login_date, 1, date'1900-01-01')
over (partition by user_id order by login_date)```

The third argument to lag (date'1900-01-01') is the default value if there's no previous row. i.e. the first login for each user. Set this to any suitable out-of-bounds date.

Stick this in a case expression to return the login_date if there's > 30 minutes between the two

```select user_id, login_date,
case
when (
over (partition by user_id order by login_date)
) > 30/1440 then
end dt
from   t;

1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:05:00
1        22-NOV-2016 09:25:00
1        22-NOV-2016 09:35:00
1        22-NOV-2016 10:00:00
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00
2        22-NOV-2016 09:45:00
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00
4        22-NOV-2016 10:35:00
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
</code

You can then populate the blanks by returning the last_value of this calculation, ignoring nulls:

<code>
with rws as (
case
when (
over (partition by user_id order by login_date)
) > 30/1440 then
end dt
from   t
)
last_value(dt) ignore nulls
over (partition by user_id order by login_date) lv
from rws;

1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:05:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:25:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:35:00  22-NOV-2016 09:00:00
1        22-NOV-2016 10:00:00  22-NOV-2016 09:00:00
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00
2        22-NOV-2016 09:45:00  22-NOV-2016 09:15:00
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00
4        22-NOV-2016 10:35:00  22-NOV-2016 10:15:00
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
```

You can now use this calculated LV value to apply the sequence. The analytic dense_rank is what you want. This gives rows with the same value the same rank with no gaps:

```with rws as (
case
when (
over (partition by user_id order by login_date)
) > 30/1440 then
end dt
from   t
), grps as (
last_value(dt) ignore nulls
over (partition by user_id order by login_date) lv
from rws
)
dense_rank() over (order by lv, user_id) seq
from   grps

1        22-NOV-2016 09:00:00  1
1        22-NOV-2016 09:05:00  1
2        22-NOV-2016 09:15:00  2
3        22-NOV-2016 09:20:00  3
1        22-NOV-2016 09:25:00  1
4        22-NOV-2016 09:25:00  4
1        22-NOV-2016 09:35:00  1
2        22-NOV-2016 09:45:00  2
1        22-NOV-2016 10:00:00  1
4        22-NOV-2016 10:15:00  5
1        22-NOV-2016 10:35:00  6
4        22-NOV-2016 10:35:00  5
5        22-NOV-2016 10:35:00  7
```

The above makes heavy use of analytic functions. If you want to know more about them, check out Connor's KISS video series:

If you need the first case (split into 30 minute groups) this is easiest with pattern matching.

Again, you want to partition by user and order by date. Once you've done this you want patterns where the current date is within 30 minutes of the first date in the match. So your pattern variable looks like:

`thirty as login_date - first(login_date) <= 30/1440`

You then want the first date in the pattern. You can return this in the measures clause with

```first(login_date) as fd
```

All together this gives:

```select * from t
match_recognize (
partition by user_id
measures
all rows per match
pattern (thirty*)
define
);

1        22-NOV-2016 09:00:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:05:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:25:00  22-NOV-2016 09:00:00
1        22-NOV-2016 09:35:00  22-NOV-2016 09:35:00
1        22-NOV-2016 10:00:00  22-NOV-2016 09:35:00
1        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
2        22-NOV-2016 09:15:00  22-NOV-2016 09:15:00
2        22-NOV-2016 09:45:00  22-NOV-2016 09:15:00
3        22-NOV-2016 09:20:00  22-NOV-2016 09:20:00
4        22-NOV-2016 09:25:00  22-NOV-2016 09:25:00
4        22-NOV-2016 10:15:00  22-NOV-2016 10:15:00
4        22-NOV-2016 10:35:00  22-NOV-2016 10:15:00
5        22-NOV-2016 10:35:00  22-NOV-2016 10:35:00
```

Finish by calculating the dense_rank, ordered by FD as in the other solution.

But this requires 12c. If you need an 11g solution let us know.

If you want to know more about pattern matching check out the following blog posts:

http://oracle-big-data.blogspot.co.uk/search/label/Pattern%20Matching

Rating

(9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

11G Solutions

Rajeshwaran, Jeyabal, November 23, 2016 - 8:09 am UTC

....
If you need an 11g solution let us know.
....

I was able to get it using MODEL clause, any possibility using Analytics?

```demo@ORA12C> select *
2  from t
3  model
4    partition by (user_id)
5    dimension by ( row_number() over(partition by user_id order by login_date) r )
6    measures ( login_date , cast(null as date) x, 0 y )
7    rules (
8      x[any] order by r = case when cv(r) = 1 then login_date[cv()]
9                               when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
11      y[any] order by r = case when cv(r)=1 then 1
12                               when x[cv()] = x[cv()-1] then y[cv()-1]
13                               else y[cv()-1]+1 end
14          )
16  /

---------- ---------- -------------------- -------------------- ----------
1          1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1          2 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
1          3 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
1          4 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
1          5 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
1          6 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
2          1 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
2          2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
3          1 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
4          1 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
4          2 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
4          3 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
5          1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>```

November 24, 2016 - 1:11 pm UTC

If you're happy to split the logins into 30 min intervals, you can do something like:

```create table t (
user_id    int,
);

insert into t values (1, trunc(sysdate)+interval '9' hour);
insert into t values (1, trunc(sysdate)+interval '9:05' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:25' hour to minute);
insert into t values (1, trunc(sysdate)+interval '9:35' hour to minute);
insert into t values (1, trunc(sysdate)+interval '10:55' hour to minute);
insert into t values (1, trunc(sysdate)+interval '11:05' hour to minute);

commit;

from   t;

24-NOV-2016 09:00:00  18
24-NOV-2016 09:05:00  18
24-NOV-2016 09:25:00  18
24-NOV-2016 09:35:00  19
24-NOV-2016 10:55:00  21
24-NOV-2016 11:05:00  22
```

But if it has to be 30 mins since the last login... it gets tricky. You could use recursive CTEs:

```with rws (user_id, login_date, min_date, grp) as (
where  t2.user_id = t2.user_id
)
union all
case
when t.login_date - rws.min_date > 30/1440 then
else
rws.min_date
end min_date,
case
when t.login_date - rws.min_date > 30/1440 then
grp + 1
else
grp
end grp
from   rws
join   t
on     t.user_id = rws.user_id
where  t2.user_id = t2.user_id
)
)
select * from rws;

1        24-NOV-2016 09:00:00  24-NOV-2016 09:00:00  1
1        24-NOV-2016 09:05:00  24-NOV-2016 09:00:00  1
1        24-NOV-2016 09:25:00  24-NOV-2016 09:00:00  1
1        24-NOV-2016 09:35:00  24-NOV-2016 09:35:00  2
1        24-NOV-2016 10:55:00  24-NOV-2016 10:55:00  3
1        24-NOV-2016 11:05:00  24-NOV-2016 10:55:00  3
```

11G Solutions

Rajeshwaran, Jeyabal, November 25, 2016 - 5:51 am UTC

```truncate table t;
insert into t values(1,to_date('22-NOV-2016 09:00:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:05:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:25:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 09:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 10:00:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(1,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(2,to_date('22-NOV-2016 09:15:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(2,to_date('22-NOV-2016 09:45:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(3,to_date('22-NOV-2016 09:20:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 09:25:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 10:15:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(4,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
insert into t values(5,to_date('22-NOV-2016 10:35:00','dd-mon-yyyy hh24:mi:ss'));
commit;```

Given this
....
If you need the first case (split into 30 minute groups) this is easiest with pattern matching.
Again, you want to partition by user and order by date. Once you've done this you want patterns
where the current date is within 30 minutes of the first date in the match.
So your pattern variable looks like:

....

with pattern matching (works from 12c and above), was able to get this.

```demo@ORA12C> select user_id,login_date,fd,
2        dense_rank() over(partition by user_id
3            order by fd) rnk
4  from (
5  select *
6  from t
7  match_recognize(
8    partition by user_id
10    measures
11      match_number() mno,
12      classifier() cls,
14    all rows per match
15    pattern( strt down*)
16    define
18        )
19  /

---------- -------------------- -------------------- ----------
1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.```

Now thinking about to handle this in 11g database, so with MODEL clause got this.

```demo@ORA12C> select user_id,login_date,x as fd,y as rnk
2  from t
3  model
4    partition by (user_id)
5    dimension by ( row_number() over(partition by user_id order by login_date) r )
6    measures ( login_date , cast(null as date) x, 0 y )
7    rules (
8      x[any] order by r = case when cv(r) = 1 then login_date[cv()]
9                               when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
11      y[any] order by r = case when cv(r)=1 then 1
12                               when x[cv()] = x[cv()-1] then y[cv()-1]
13                               else y[cv()-1]+1 end
14          )

---------- -------------------- -------------------- ----------
1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>```

with the above recursive with clause, don't get the correct result set.

```demo@ORA12C> with rws (user_id, login_date, min_date, grp) as (
4      select min(login_date) from t t2
5      where  t2.user_id = t2.user_id
6    )
7    union all
9           case
10             when t.login_date - rws.min_date > 30/1440 then
12             else
13               rws.min_date
14           end min_date,
15           case
16             when t.login_date - rws.min_date > 30/1440 then
17               grp + 1
18             else
19               grp
20           end grp
21    from   rws
22    join   t
23    on     t.user_id = rws.user_id
26      select min(login_date) from t t2
27      where  t2.user_id = t2.user_id
29    )
30  )
31   select * from rws;

---------- -------------------- -------------------- ----------
1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1

2 rows selected.

demo@ORA12C>```

looks like few join were in-correct in the Recursive with clauses.
....
4 select min(login_date) from t t2
5 where t2.user_id = t2.user_id

27 where t2.user_id = t2.user_id
....

t2.user_id joins again with t2.user_id , hence the problem is

```The algorithm for the recursive with clause goes like this.
b) for each user, get the next login_date greater than the above least login date and find the distance between the current login_date and the least login date
c) if the distance is more than 30 mins, then increment the grp by one else have it as such.
d) then proceed the same way for others rows ```

Now the updated recursive with clause goes like this.

```demo@ORA12C> with rws ( user_id, login_date, min_dt, grp ) as (
3  from t t1
5        from t t2
6        where t2.user_id = t1.user_id )
7  union all
9            case when t3.login_date - r1.min_dt > 30/24/60 then
10                    t3.login_date else r1.min_dt end ,
11            case when t3.login_date - r1.min_dt > 30/24/60 then
12                    r1.grp +1 else r1.grp end
13  from t t3, rws r1
14  where t3.user_id = r1.user_id and
17        from t t4
18        where t4.user_id = r1.user_id and
20  )
21  select user_id, login_date, min_dt fd, grp rnk
22  from rws

---------- -------------------- -------------------- ----------
1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:00:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
4 22-NOV-2016 10:35:00 22-NOV-2016 10:15:00          2
5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>```

But do we have any way to get this using Analytics ?

Tried using Analytics, but got only this. (which is incorrect!)

```demo@ORA12C> select user_id,login_date, grp fd,
2            dense_rank() over(partition by user_id
3                order by grp ) rnk
4  from (
6            min(login_date) over(partition by user_id) min_dt ,
9                      else min(login_date) over(partition by user_id)
10              end grp
11  from t
12       )
14  /

---------- -------------------- -------------------- ----------
1 22-NOV-2016 09:00:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:05:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:25:00 22-NOV-2016 09:00:00          1
1 22-NOV-2016 09:35:00 22-NOV-2016 09:35:00          2
1 22-NOV-2016 10:00:00 22-NOV-2016 10:00:00          3
1 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          4
2 22-NOV-2016 09:15:00 22-NOV-2016 09:15:00          1
2 22-NOV-2016 09:45:00 22-NOV-2016 09:15:00          1
3 22-NOV-2016 09:20:00 22-NOV-2016 09:20:00          1
4 22-NOV-2016 09:25:00 22-NOV-2016 09:25:00          1
4 22-NOV-2016 10:15:00 22-NOV-2016 10:15:00          2
4 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          3
5 22-NOV-2016 10:35:00 22-NOV-2016 10:35:00          1

13 rows selected.

demo@ORA12C>```

in this above example, the user_id=4 has three dates ( 22-NOV-2016 09:25:00, 22-NOV-2016 10:15:00 and 22-NOV-2016 10:35:00 ) - order by login_date asc

a) take the first date(09:25) and compare that with the next date (10:15)
b) if the difference is less than 30 min, then same group else new group, then proceed the comparison using (10:15) as a base
c) next the distance between 10:15 and 10:35 is less the 30 min, so lies in the same group

so user_id=4 has totally two group - group=1 has only one element (22-NOV-2016 09:25:00) and group=2 has two elements ( 22-NOV-2016 10:15:00 and 22-NOV-2016 10:35:00 )

These kind of inter-row calculations/results are easy to handle using MODEL clause then Analytics.

kindly let us know if you have any options to get this using Analytics.
November 25, 2016 - 4:56 pm UTC

Good catch - thanks.

Though the recursive solution still isn't quite right. The first login for each user gets rank = 1. Only the very first login should have this value. The sequence needs to go across all users.

Is it possible with "pure" analytics? Maybe. I can't find a correct solution yet...

Stew Ashton, November 25, 2016 - 9:39 am UTC

If I read the original answer right, MATCH_RECOGNIZE is supposed to do most of the work but DENSE_RANK is used to rank the groups.

It seems to me that the MATCH_NUMBER() function provides that rank directly.
```select * from t
match_recognize (
partition by user_id
measures
match_number() rnk
all rows per match
pattern (thirty*)
define
);

---------- ---------------- ---------------- ----------
1 2016-11-25 09:00 2016-11-25 09:00          1
1 2016-11-25 09:05 2016-11-25 09:00          1
1 2016-11-25 09:25 2016-11-25 09:00          1
1 2016-11-25 09:35 2016-11-25 09:35          2
1 2016-11-25 10:00 2016-11-25 09:35          2
1 2016-11-25 10:35 2016-11-25 10:35          3
2 2016-11-25 09:15 2016-11-25 09:15          1
2 2016-11-25 09:45 2016-11-25 09:15          1
3 2016-11-25 09:20 2016-11-25 09:20          1
4 2016-11-25 09:25 2016-11-25 09:25          1
4 2016-11-25 10:15 2016-11-25 10:15          2
4 2016-11-25 10:35 2016-11-25 10:15          2
5 2016-11-25 10:35 2016-11-25 10:35          1```

Unless I am missing something...

Best regards, Stew
November 25, 2016 - 10:03 am UTC

The rank has to cross user ids. i.e. there should only be one set of rows with rank = 1. Using match_number() resets to 1 for new user_id, giving multiple.

Oops, thanks

Stew Ashton, November 25, 2016 - 2:16 pm UTC

Ah, in the dense_rank() the ordering is by "first login date", then user_id. So I was missing something...

Thanks for setting me straight, and excellent answer!

Best regards, Stew
November 25, 2016 - 4:51 pm UTC

Cheers. I though match_number() was good enough originally too...

Performance issues...

Brendan, November 25, 2016 - 4:00 pm UTC

I wrote a blog post just lately on this kind of problem solved with features from different Oracle versions. May I comment on the performance aspects, based on what I learned there?

Here is the execution plan for the recursive subquery solution (as in Rajeshwaran's corrected version):

```Plan hash value: 4272184493

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |         |      1 |        |     13 |00:00:00.01 |     112 |       |       |          |
|   1 |  SORT ORDER BY                             |         |      1 |      4 |     13 |00:00:00.01 |     112 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                     |         |      1 |      4 |     13 |00:00:00.01 |     112 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|         |      1 |        |     13 |00:00:00.01 |     112 |       |       |          |
|*  4 |     HASH JOIN                              |         |      1 |      3 |      5 |00:00:00.01 |      14 |  1156K|  1156K|  664K (0)|
|   5 |      VIEW                                  | VW_SQ_1 |      1 |     13 |      5 |00:00:00.01 |       7 |       |       |          |
|   6 |       HASH GROUP BY                        |         |      1 |     13 |      5 |00:00:00.01 |       7 |  1115K|  1115K|  852K (0)|
|   7 |        TABLE ACCESS FULL                   | T       |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|   8 |      TABLE ACCESS FULL                     | T       |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|*  9 |     FILTER                                 |         |      6 |        |      8 |00:00:00.01 |      98 |       |       |          |
|* 10 |      HASH JOIN                             |         |      6 |      1 |     19 |00:00:00.01 |      42 |   927K|   927K|  787K (0)|
|  11 |       RECURSIVE WITH PUMP                  |         |      6 |        |     13 |00:00:00.01 |       0 |       |       |          |
|  12 |       TABLE ACCESS FULL                    | T       |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
|  13 |      SORT AGGREGATE                        |         |      8 |      1 |      8 |00:00:00.01 |      56 |       |       |          |
|* 14 |       TABLE ACCESS FULL                    | T       |      8 |      1 |     19 |00:00:00.01 |      56 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

10 - access("T"."USER_ID"="RWS"."USER_ID")

Note
-----
- dynamic sampling used for this statement (level=2)```

There are 4 scans of the table. I would first order the records up front in a separate subquery, which is simpler as well as faster than using a correlated subquery for minimum date. Here is the revised query with resulting plan:

```Version avoiding the correlated subquery for min date
SQL> with rows_ordered AS (
2   select user_id, login_date, Row_Number() over (partition by user_id order by login_date) rn from t
3  ), rws (user_id, login_date, min_date, rn, grp) as (
5    from rows_ordered t1
6    where  t1.rn = 1
7    union all
9           case
10             when t.login_date - rws.min_date > 30/1440 then
12             else
13               rws.min_date
14           end,
15           t.rn,
16           case
17             when t.login_date - rws.min_date > 30/1440 then
18               grp + 1
19             else
20               grp
21           end
22    from   rws
23    join   rows_ordered t
24    on     t.user_id = rws.user_id
25    and    t.rn = rws.rn + 1
26  )
27   select  /*+ LOGIN_BPF gather_plan_statistics */ * from rws
29  /

Plan hash value: 1188792140

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |      |      1 |        |     13 |00:00:00.01 |      49 |       |       |          |
|   1 |  SORT ORDER BY                             |      |      1 |     15 |     13 |00:00:00.01 |      49 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW                                     |      |      1 |     15 |     13 |00:00:00.01 |      49 |       |       |          |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|      |      1 |        |     13 |00:00:00.01 |      49 |       |       |          |
|*  4 |     VIEW                                   |      |      1 |     13 |      5 |00:00:00.01 |       7 |       |       |          |
|*  5 |      WINDOW SORT PUSHED RANK               |      |      1 |     13 |      8 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS FULL                    | T    |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
|*  7 |     HASH JOIN                              |      |      6 |      2 |      8 |00:00:00.01 |      42 |   933K|   933K|  867K (0)|
|   8 |      RECURSIVE WITH PUMP                   |      |      6 |        |     13 |00:00:00.01 |       0 |       |       |          |
|   9 |      VIEW                                  |      |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
|  10 |       WINDOW SORT                          |      |      6 |     13 |     78 |00:00:00.01 |      42 |  2048 |  2048 | 2048  (0)|
|  11 |        TABLE ACCESS FULL                   | T    |      6 |     13 |     78 |00:00:00.01 |      42 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("T1"."RN"=1)
5 - filter(ROW_NUMBER() OVER ( PARTITION BY "USER_ID" ORDER BY "LOGIN_DATE")<=1)
7 - access("T"."USER_ID"="RWS"."USER_ID" AND "T"."RN"="RWS"."RN"+1)

Note
-----
- dynamic sampling used for this statement (level=2)
```

However, this will still be very slow for large datasets, as execution time grows quadratically with the number of records per user. This can be made linear by writing the initial subquery into a temporary table with an index, and reading from that in the main query. In my work I found that solution by Model clause was also linear, and quicker too, but Match_Recognize was much faster than either, if you have 12c.

November 25, 2016 - 4:52 pm UTC

Yep, multiple references to the same table are a killer...

Performance | MODEL clause

Rajeshwaran, Jeyabal, November 26, 2016 - 7:26 am UTC

With MODEL clause, the plan look like this.

Just scan the table "T" once.

But, I am afraid of SQL Model clause due to this. https://community.oracle.com/ideas/13061

```demo@ORA12C> set serveroutput off
demo@ORA12C> select /*+ gather_plan_statistics */ user_id,login_date,x as fd,y as rnk
2     from t
3     model
4       partition by (user_id)
5       dimension by ( row_number() over(partition by user_id order by login_date) r )
6       measures ( login_date , cast(null as date) x, 0 y )
7       rules (
8         x[any] order by r = case when cv(r) = 1 then login_date[cv()]
9                                  when login_date[cv()] - x[cv()-1] <= 30/24/60 then x[cv()-1]
11         y[any] order by r = case when cv(r)=1 then 1
12                                  when x[cv()] = x[cv()-1] then y[cv()-1]
13                                  else y[cv()-1]+1 end
14             )

---------- ----------- ----------- ----------
1 22-NOV-2016 22-NOV-2016          1
1 22-NOV-2016 22-NOV-2016          1
1 22-NOV-2016 22-NOV-2016          1
1 22-NOV-2016 22-NOV-2016          2
1 22-NOV-2016 22-NOV-2016          2
1 22-NOV-2016 22-NOV-2016          3
2 22-NOV-2016 22-NOV-2016          1
2 22-NOV-2016 22-NOV-2016          1
3 22-NOV-2016 22-NOV-2016          1
4 22-NOV-2016 22-NOV-2016          1
4 22-NOV-2016 22-NOV-2016          2
4 22-NOV-2016 22-NOV-2016          2
5 22-NOV-2016 22-NOV-2016          1

13 rows selected.

demo@ORA12C>
demo@ORA12C> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
SQL_ID  2hqau85tyz4wm, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ user_id,login_date,x as fd,y as
rnk    from t    model      partition by (user_id)      dimension by (
row_number() over(partition by user_id order by login_date) r )
measures ( login_date , cast(null as date) x, 0 y )      rules (
x[any] order by r = case when cv(r) = 1 then login_date[cv()]
when login_date[cv()] - x[cv()-1] <= 30/24/60
end ,        y[any] order by r = case when cv(r)=1 then 1
when x[cv()] = x[cv()-1] then y[cv()-1]
else y[cv()-1]+1 end            )    order by

Plan hash value: 1494216132

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |     13 |00:00:00.01 |       7 |       |       |          |
|   1 |  SORT ORDER BY       |      |      1 |     13 |     13 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   2 |   SQL MODEL ORDERED  |      |      1 |     13 |     13 |00:00:00.01 |       7 |   965K|   965K|  842K (0)|
|   3 |    WINDOW SORT       |      |      1 |     13 |     13 |00:00:00.01 |       7 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS FULL| T    |      1 |     13 |     13 |00:00:00.01 |       7 |       |       |          |
------------------------------------------------------------------------------------------------------------------

26 rows selected.

demo@ORA12C>```

November 26, 2016 - 2:25 pm UTC

Why are you afraid? Test on your data to see whether performance is "good enough"!

Fear of Model Clause

Brendan, November 26, 2016 - 1:38 pm UTC

Hi Rajesh,

Be not afraid of Model Clause! But do be careful.

In addition to the blog post of a couple of weeks ago, I posted another article this morning on a similar, but slightly more complicated, problem posed recently on OTN. My first Model solution used automatic ordering to avoid ORA-32637, and to my surprise, this caused the execution times to be very slow, varying quadratically with number of records per partition. I managed to fix it by ordering the second rule by dimension descending, and reverting to sequential overall ordering of rules. Then, as in the first article, Model was second best only to Match_Recognize.

Model clause implements multiple algorithms under the covers and is sometimes slow, sometimes fast. These bursting problems seem to suit it very well, with sequential ordering.

I think benchmarking across multiple dataset sizes is the best way to check performance characteristics generally for this kind of non-OLTP query.

Regarding possible analytic function solutions, I remember looking for them at the time of the original AskTom question (referenced in my first blog post), several years ago, and not finding any - in fact, I think that is what prompted me to consider Model. I have not seen any analytic function solutions since then, and believe they are not possible - but if anyone disagrees...
November 26, 2016 - 2:48 pm UTC

I feel like it is possible with just analytic functions, but I can't see how at the moment...

Brendan, November 26, 2016 - 3:08 pm UTC

Sure http://aprogrammerwrites.eu/?p=1836 is the first one

November 28, 2016 - 10:50 am UTC

Thanks for sharing

a question on xplan

A reader, November 28, 2016 - 4:04 pm UTC

after SELECT /*+ gather_plan_statistics */ ....

select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

gives something out of my query :
(unless specifiying the sql_id ; WHY ??? and how to overcome pls?

tkx

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
| 1 | SORT ORDER BY | | 1 | 2048 | 2048 | 2048 (0)|
|* 2 | FIXED TABLE FIXED INDEX| X\$KGLCURSOR_CHILD (ind:2) | 1 | | | |
--------------------------------------------------------------------------------------------------
November 28, 2016 - 5:55 pm UTC

If you don't pass the SQL_ID it returns the plan for the last executed statement.