Query not working with my test data
Rajneesh Pandey, August 02, 2016 - 10:28 am UTC
Hi Chris,
I ran the same query on the below data set but not getting the correct result. Can you please let me know if I am doing anything wrong.
with t as (
select 'A' name, 10 id from dual
union
select 'A' name, 11 id from dual
union
select 'A' name, 12 id from dual
union
select 'A' name, 13 id from dual
union
select 'A' name, 41 id from dual
union
select 'A' name, 42 id from dual
union
select 'B' name, 10 id from dual
union
select 'B' name, 11 id from dual
union
select 'B' name, 12 id from dual
)
select name, count(c) from (
select name, grp, count(*) c from (select name, id - row_number() over (partition by name order by id) grp from t)
group by name, grp)
group by name
Regards,
Rajneesh
August 02, 2016 - 10:56 am UTC
What do you get and what are you expecting?
Based on what I can see there's two groups for A (10-13 & 41-42) and one for B (10-12). That's the output I get:
SQL> with t as (
2 select 'A' name, 10 id from dual
3 union
4 select 'A' name, 11 id from dual
5 union
6 select 'A' name, 12 id from dual
7 union
8 select 'A' name, 13 id from dual
9 union
10 select 'A' name, 41 id from dual
11 union
12 select 'A' name, 42 id from dual
13 union
14 select 'B' name, 10 id from dual
15 union
16 select 'B' name, 11 id from dual
17 union
18 select 'B' name, 12 id from dual
19 )
20 select name, count(c) from (
21 select name, grp, count(*) c
22 from (
23 select name, id - row_number() over (partition by name order by id) grp from t
24 )
25 group by name, grp)
26 group by name ;
N COUNT(C)
- ----------
A 2
B 1
And it's what you asked for, no?
Rajneesh Pandey, August 02, 2016 - 11:28 am UTC
Hi Chris,
I think I had misunderstood the requirement. I thought that the expected result should give a count of pairs for each group. like in my example for A, it should be 4 (10,11), (11,12), (12,13), (41,42), and for B it should be (10,11), (11,12), i.e. 2.
If I modify your query like below, it is giving me the above expected result.
with t as (
select 'A' name, 10 id from dual
union
select 'A' name, 11 id from dual
union
select 'A' name, 12 id from dual
union
select 'A' name, 13 id from dual
union
select 'A' name, 41 id from dual
union
select 'A' name, 42 id from dual
union
select 'B' name, 10 id from dual
union
select 'B' name, 11 id from dual
union
select 'B' name, 12 id from dual
)
select name, sum(c) from (
select name, grp, (count(*) -1) c from (
select name, id - row_number() over (partition by name order by id) grp from t)
group by name, grp)
group by name
Regards,
Rajneesh
August 02, 2016 - 12:43 pm UTC
Yes, the number of consecutive pairs will be the number of rows in the group - 1.
So simple, so obvious...
Francisco, August 02, 2016 - 1:06 pm UTC
I run this on my 4.8M records table (about 300K distinct names where only 10K belong to more than a group ranging from 2 to 5K groups) and performance was pretty good - no need to tweak anything. I liked the KISS YouTube video. My dts_id reflects the date_time_stamp and I have a table that links the actual date and time to the id. I will see how performance behaves if I follow the YouTube video to find these data clusters based on the date and time - if I can get it to perform well, then there is one less table to keep in my schema! Thanks a lot for the help