Skip to Main Content
  • Questions
  • returning top 10 records based on previous records counts

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: July 28, 2018 - 3:30 pm UTC

Last updated: July 31, 2018 - 1:44 pm UTC

Version: 11

Viewed 1000+ times

You Asked

Hi,

I have an ordered table of products and customers as follows:

rn product customer 
1 859274 A 
2 859267 A 
3 859250 A 
4 863592 B 
5 862250 B 
6 862700 B 
7 862694 B 
8 862120 B 
9 863592 C 
10 862250 C 
11 862120 D 
12 862694 C 
13 863592 E 
14 862113 D 
15 863592 F 
16 862250 F 
17 862700 E 
18 862694 E 
19 863592 G 



I need an SQL sentence that will return the top 10 product/customer pairs HOWEVER both product and customer cannot repeat itself more than 2 times.

to make it clearer row 3 is out because customer A was included twice before.
row 13 is out (although first time this customer appears) because product 863592 was already included 2 times in vetted records.

the answer for the above example should be:

1 859274 A 
2 859267 A 
4 863592 B 
5 862250 B 
9 863592 C 
10 862250 C 
11 862120 D 
14 862113 D 
17 862700 E 
18 862694 E 


I tried several options like analytic functions including cumulative counts, lags, etc... was not able to solve this one :-)

Thanks

and Chris said...

You can use row_number to assign an incrementing counter for each customer or product. Just pass these in the partition by clause. e.g.:

row_number () over ( partition by product order by rn )


You can then filter out those with a rn > 2

CREATE TABLE t (
  rn int, product int, customer varchar2(1)
)   ;

INSERT INTO t (rn, product, customer) VALUES (1, 859274, 'A');
INSERT INTO t (rn, product, customer) VALUES (2, 859267, 'A');
INSERT INTO t (rn, product, customer) VALUES (3, 859250, 'A');
INSERT INTO t (rn, product, customer) VALUES (4, 863592, 'B');
INSERT INTO t (rn, product, customer) VALUES (5, 862250, 'B');
INSERT INTO t (rn, product, customer) VALUES (6, 862700, 'B');
INSERT INTO t (rn, product, customer) VALUES (7, 862694, 'B');
INSERT INTO t (rn, product, customer) VALUES (8, 862120, 'B');
INSERT INTO t (rn, product, customer) VALUES (9, 863592, 'C');
INSERT INTO t (rn, product, customer) VALUES (10, 862250, 'C');
INSERT INTO t (rn, product, customer) VALUES (11, 862120, 'D');
INSERT INTO t (rn, product, customer) VALUES (12, 862694, 'C');
INSERT INTO t (rn, product, customer) VALUES (13, 863592, 'E');
INSERT INTO t (rn, product, customer) VALUES (14, 862113, 'D');
INSERT INTO t (rn, product, customer) VALUES (15, 863592, 'F');
INSERT INTO t (rn, product, customer) VALUES (16, 862250, 'F');
INSERT INTO t (rn, product, customer) VALUES (17, 862700, 'E');
INSERT INTO t (rn, product, customer) VALUES (18, 862694, 'E');
INSERT INTO t (rn, product, customer) VALUES (19, 863592, 'G');

with ranks as (
  select t.*,
         row_number () over ( partition by product order by rn ) prod_rn ,
         row_number () over ( partition by customer order by rn ) cust_rn
  from   t
)
  select * from ranks
  where  prod_rn <= 2
  and    cust_rn <= 2
  order  by rn;

RN   PRODUCT   CUSTOMER   PROD_RN   CUST_RN   
   1    859274 A                  1         1 
   2    859267 A                  1         2 
   4    863592 B                  1         1 
   5    862250 B                  1         2 
   9    863592 C                  2         1 
  10    862250 C                  2         2 
  11    862120 D                  2         1 
  14    862113 D                  1         2 
  17    862700 E                  2         2

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

RN=18 should also be returned

A reader, July 31, 2018 - 11:09 am UTC

Thanks a lot.
Yes, this is similar to what I also tried.
the problem start with RN=18 where the prod_rn=3 and cust_rn=3 however it still needs to be in (because not all the instances before were qualified so they should not be counted - should be kind of a recursive logic).
I have this SQL with MODEL clause which resolved this:


SELECT * FROM(
SELECT *
FROM TBL_TEST
MODEL
DIMENSION BY(
RN,
PRODUCT,
CUSTOMER
)
MEASURES(1 FLAG)
RULES(
FLAG[RN,ANY,ANY] ORDER BY RN = CASE
WHEN COUNT(FLAG)[RN <= CV(RN),ANY,CUSTOMER = CV(CUSTOMER)] > 2 THEN NULL
WHEN COUNT(FLAG)[RN <= CV(RN),PRODUCT = CV(PRODUCT),ANY] > 2 THEN NULL
ELSE COUNT(FLAG)[RN <= CV(RN),ANY,ANY]
END
)
)
where FLAG<=10
Chris Saxon
July 31, 2018 - 11:33 am UTC

Why does it need to be in? It breaks the rule you've given!

What precisely is the rule?

A reader, July 31, 2018 - 11:50 am UTC


logic

A reader, July 31, 2018 - 12:14 pm UTC

The logic should be a recursive one:
Let's say you are on record [rn,cust,prod] = [18,x,y]:
count x where rn between 1 and 17 and rn was selected in previous iterations.
count y where rn between 1 and 17 and rn were selected in previous iterations.
if both<2 then select current rn

hope its clarifying
Chris Saxon
July 31, 2018 - 1:44 pm UTC

No... that doesn't help...

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.