Skip to Main Content
  • Questions
  • selecting fewer rows based on the max of one

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Robert.

Asked: October 11, 2011 - 5:48 pm UTC

Last updated: October 13, 2011 - 10:52 am UTC

Version: 10.2 express edition

Viewed 1000+ times

You Asked

Consider this table with this data:


create table responses_v5 ( user_id varchar2(20), hw_id number(38), pb_id number(38), tries number(38), grade number(3,2), response varchar2(20), primary key(user_id, hw_id, pb_id, tries) );


insert into responses_v5 values ( 'bob', 4540, 1, 1, 0.0, 'response1' );
insert into responses_v5 values ( 'bob', 4540, 1, 2, 0.0, 'response2' );
insert into responses_v5 values ( 'bob', 4540, 1, 3, 1.0, 'response3' );
insert into responses_v5 values ( 'refoyl', 4540, 1, 1, 0.0, 'response4' );
insert into responses_v5 values ( 'refoyl', 4540, 1, 2, 0.0, 'response5' );
insert into responses_v5 values ( 'refoyl', 4540, 2, 1, 0.0, 'response6' );
insert into responses_v5 values ( 'refoyl', 4540, 2, 2, 0.0, 'response7' );
insert into responses_v5 values ( 'refoyl', 4540, 2, 3, 1.0, 'response8' );
insert into responses_v5 values ( 'refoyl', 4541, 1, 1, 0.0, 'response9' );
insert into responses_v5 values ( 'refoyl', 4541, 1, 2, 0.0, 'response0' );


For each unique combination of user_id, hw_id, and pb_id, I want only the record that has the maximum number of tries. Such a query would return only these records from the table above:

'bob', 4540, 1, 3, 1.0, 'response3'
'refoyl', 4540, 1, 2, 0.0, 'response5'
'refoyl', 4540, 2, 3, 1.0, 'response8'
'refoyl', 4541, 1, 2, 0.0, 'response0'


Currently I am acquiring all the records, ordering them by user_id, hw_id, pb_id, and -tries, and then using Java to save the first record for each unique combination of user_id, hw_id, and pb_id and discarding the remaining ones. But clearly it would be better if I did not acquire those extra records in the first place.

Any ideas?

and Tom said...

see also

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:122801500346829407

ops$tkyte%ORA11GR2> select user_id, hw_id, pb_id,
  2         max(tries) keep (dense_rank first order by tries DESC) max_tries,
  3             max(grade) keep (dense_rank first order by tries DESC) max_grade,
  4             max(response) keep (dense_rank first order by tries DESC) max_response
  5    from responses_v5
  6   group by user_id, hw_id, pb_id
  7   order by user_id, hw_id, pb_id
  8  /

USER_ID                   HW_ID      PB_ID  MAX_TRIES  MAX_GRADE MAX_RESPONSE
-------------------- ---------- ---------- ---------- ---------- --------------------
bob                        4540          1          3          1 response3
refoyl                     4540          1          2          0 response5
refoyl                     4540          2          3          1 response8
refoyl                     4541          1          2          0 response0



Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

impressive!

Robert Grossman, October 12, 2011 - 11:28 am UTC

Thanks! I never would have figured that out on my own.

For those reading this who are as ignorant as I am, Google tells me that DESC stands for "descending", not "describe"; its opposite is ASC.

follow-up

Robert Grossman, October 12, 2011 - 12:43 pm UTC

The problem I set to Tom was simplified from what I had, in that in my real table, the response field was a CLOB, not a varchar2, and it turns out that max() doesn't work on a CLOB.  But I figured out how to make the selection work with CLOBS:

<code>select r1.user_id, r1.hw_id, r1.pb_id, r1.tries, r2.grade, r2.response from (select user_id, hw_id, pb_id, max(tries) keep (dense_rank first order by tries DESC) tries from test_responses_v5 group by user_id, hw_id, pb_id order by user_id, hw_id, pb_id) r1 join test_responses_v5 r2 on r2.user_id = r1.user_id and r2.hw_id = r1.hw_id and r2.pb_id = r1.pb_id and r2.tries = r1.tries;


USER_ID     HW_ID      PB_ID TRIES    GRADE RESPONSE
-------------------- ---------- ---------- ---------- ---------- ---------------
bob      4540   1     3        1 response3
refoyl      4540   1     2       .8 response5
refoyl      4540   2     3        1 response8
refoyl      4541   1     2        0 response0
</code>
Tom Kyte
October 12, 2011 - 3:37 pm UTC

instead of:

ops$tkyte%ORA11GR2> select user_id, hw_id, pb_id,
  2         max(tries) keep (dense_rank first order by tries DESC) max_tries,
  3             max(grade) keep (dense_rank first order by tries DESC) max_grade,
  4             max(response) keep (dense_rank first order by tries DESC) max_response
  5    from responses_v5
  6   group by user_id, hw_id, pb_id
  7   order by user_id, hw_id, pb_id
  8  /
           max(response) keep (dense_rank first order by tries DESC) max_response
               *
ERROR at line 4:
ORA-00932: inconsistent datatypes: expected - got CLOB


<b>use this</b>


ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> select *
  2    from (select responses_v5.*,
  3                 row_number() over (partition by user_id, hw_id, pb_id order by tries DESC) rn
  4            from responses_v5)
  5   where rn = 1
  6  /

USER_ID         HW_ID      PB_ID      TRIES      GRADE RESPONSE           RN
---------- ---------- ---------- ---------- ---------- ---------- ----------
bob              4540          1          3          1 response3           1
refoyl           4540          1          2          0 response5           1
refoyl           4540          2          3          1 response8           1
refoyl           4541          1          2          0 response0           1


Index-friendly variation

Stew Ashton, October 13, 2011 - 6:31 am UTC


If you expect to get back a small subset of the table, you might want to take advantage of your primary index:
SELECT * FROM responses_v5 WHERE ROWID IN (
  SELECT MAX(ROWID) KEEP (DENSE_RANK FIRST ORDER BY tries DESC)
  FROM responses_v5
  GROUP BY user_id, hw_id, pb_id
);

Tom Kyte
October 13, 2011 - 7:44 am UTC

but in this case, the primary key is probably as big as the table.

primary key = user_id, hw_id, pb_id, tries
table =       user_id, hw_id, pb_id, tries, grade, response


primary key index = primary key + rowid

if the response is larger than 4k or is defined as "out of line", the index and the table would probably be about the same size - or not that much different.

but yes, your approach is sound.

thanks again

Robert Grossman, October 13, 2011 - 9:20 am UTC

Thanks to both Tom and Stew. Tom's solution to the CLOB problem is very elegant and readable.

The power of SQL

Enrique Aviles, October 13, 2011 - 10:20 am UTC

Currently I am acquiring all the records, ordering them by user_id, hw_id, pb_id, and -tries, and then using Java to save the first record for each unique combination of user_id, hw_id, and pb_id and discarding the remaining ones.

I'd say this type of processing happens more often than not due to a lack of knowledge of SQL. It's an excellent example of using SQL to get what you need instead of processing the data in the app to get what you need. There are obvious performance implications with either approach. SQL is the way to go.
Tom Kyte
October 13, 2011 - 10:52 am UTC

agreed.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.