Skip to Main Content
  • Questions
  • Max() analytic with order by results

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Steve.

Asked: September 08, 2020 - 3:13 pm UTC

Last updated: June 28, 2023 - 12:57 pm UTC

Version: 12c

Viewed 10K+ times! This question is

You Asked

I don't understand why when using max or min analytic functions that using order by would change the result - I would expect that all the max calls would have the same result and all the min calls would have the same result.
The documentation doesn't give me any reason to believe they would be different either.

with
td as (select 'X1' as grp1, to_date('09/03/2020 09:00:00','MM/DD/YYYY HH24:MI:SS') as stime, to_date('09/03/2020 10:59:59','MM/DD/YYYY HH24:MI:SS') as etime from dual
         union all
         select 'X1', to_date('09/03/2020 07:30:00','MM/DD/YYYY HH24:MI:SS'), to_date('09/03/2020 09:32:00','MM/DD/YYYY HH24:MI:SS') from dual
         union all
         select 'X1', to_date('09/03/2020 11:00:00','MM/DD/YYYY HH24:MI:SS'), to_date('09/03/2020 18:45:00','MM/DD/YYYY HH24:MI:SS') from dual)
select  td.*,
        max(td.etime) over (partition by td.grp1) as max_etime,
        max(td.etime) over (partition by td.grp1 order by td.stime asc) as max_etime_asc,
        max(td.etime) over (partition by td.grp1 order by td.stime desc) as max_etime_desc,
        min(td.etime) over (partition by td.grp1) as min_etime,
        min(td.etime) over (partition by td.grp1 order by td.stime asc) as min_etime_asc,
        min(td.etime) over (partition by td.grp1 order by td.stime desc) as min_etime_desc
from    td


GR STIME               ETIME               MAX_ETIME           MAX_ETIME_ASC       MAX_ETIME_DESC      MIN_ETIME           MIN_ETIME_ASC       MIN_ETIME_DESC     
-- ------------------- ------------------- ------------------- ------------------- ------------------- ------------------- ------------------- -------------------
X1 09/03/2020 07:30:00 09/03/2020 09:32:00 09/03/2020 18:45:00 09/03/2020 09:32:00 09/03/2020 18:45:00 09/03/2020 09:32:00 09/03/2020 09:32:00 09/03/2020 09:32:00
X1 09/03/2020 09:00:00 09/03/2020 10:59:59 09/03/2020 18:45:00 09/03/2020 10:59:59 09/03/2020 18:45:00 09/03/2020 09:32:00 09/03/2020 09:32:00 09/03/2020 10:59:59
X1 09/03/2020 11:00:00 09/03/2020 18:45:00 09/03/2020 18:45:00 09/03/2020 18:45:00 09/03/2020 18:45:00 09/03/2020 09:32:00 09/03/2020 09:32:00 09/03/2020 18:45:00


and Chris said...

When you have an order by in the over clause, there is a default windowing clause of:
range between unbounded preceding and current row


Which means:

Calculate the function over all the rows from the start of the data set up to all rows with the same value as the current row. So any rows with values after the current are excluded from the calculation.

So it's expected that the value of min/max can change as you work through the data set.

If you want to get the min/max over the whole partition, remove the order by.

Or you could add this window after the sort:

rows between unbounded preceding and unbounded following 


I can't think of a good reason to do this for min/max though. It's a lot of extra code for the same effect as partition by alone.

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks.

Steve, September 08, 2020 - 9:04 pm UTC

I hadn't considered the default windowing clause...
I don't include the order by myself because it never made sense to me.
However, I have seen it in code and just ignored it not realizing it made a difference until recently when I was troubleshooting an issue.
Now, I'll need to keep an eye on these situations for potential unexpected behavior.
Thanks.
Connor McDonald
September 09, 2020 - 2:46 am UTC

More examples here


I do see a reason for the order by

Jeff Seger, June 22, 2023 - 9:06 pm UTC

I was looking for a way to get the id of the last entry when sorted by date when the IDs are not guaranteed to be in the same order as the dates. so I needed

MAX(action_id) OVER(
PARTITION BY account
ORDER BY
action_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as last_id_within_acct_by_date

Thanks for the answer!!
Chris Saxon
June 23, 2023 - 9:22 am UTC

Thanks for sharing - I'm not sure that does what you're looking for though! It returns the highest ACTION_ID per account.

If you want the (highest) ACTION_ID for the last ACTION_DATE, you need:

max ( action_id ) keep (
  dense_rank last order by action_date
) over (
  partition by account
)


For example, compare the difference:

select employee_id, hire_date,
  max ( employee_id ) over (
    order by hire_date
    rows between unbounded preceding and unbounded following
  ) mx_id,
  max ( employee_id ) keep (
    dense_rank last order by hire_date
  ) over () mx_id_for_lastest_hired
from   hr.employees
where  department_id = 60
order by hire_date;

EMPLOYEE_ID HIRE_DATE                 MX_ID MX_ID_FOR_LASTEST_HIRED
----------- -------------------- ---------- -----------------------
        105 25-JUN-2015 00:00:00        107                     104
        103 03-JAN-2016 00:00:00        107                     104
        106 05-FEB-2016 00:00:00        107                     104
        107 07-FEB-2017 00:00:00        107                     104
        104 21-MAY-2017 00:00:00        107                     104

Default ORDER BY clause

mathguy, June 25, 2023 - 3:36 pm UTC

When considering a feature (like an analytic function, let's say analytic MAX(...) to be specific) with various options, I find it helpful to imagine that there's only one way that feature works, and to think of the options as special cases of a single global definition.

Some analytic functions require an ORDER BY clause, even when we aren't particularly interested in an ordering. For example ROW_NUMBER() does. If we don't care about the ordering, we can always say ORDER BY NULL, which means all rows (in a partition) are "equal". (Why a function like ROW_NUMBER() requires the ORDER BY to be present, instead of allowing the obvious default to be used when ORDER BY is missing, I never understood. We can - and do - write that in ourselves anyway, whenever needed.)

Analytic MAX does not require an ORDER BY clause, and it appears to behave differently when it only has an (optional) PARTITION BY clause without ORDER BY, vs. when it includes an ORDER BY clause.

If we don't like "different behavior depending on <whatever>" (I personally don't like that!), an alternative way to think about it is to imagine that analytic MAX always has an ORDER BY clause. If an explicit one is not present, the default is ORDER BY NULL. Then, a windowing clause always applies as well; as Chris explained, that is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Since the ordering is by NULL, all rows are "equal"; and since the windowing clause is a RANGE condition, all rows are included in the window - no matter what row we are currently on.

At the abstract level, the "default" ordering (ORDER BY NULL) coupled with the default windowing clause means "all rows in the partition". This way we don't have to consider the behavior in the absence of ORDER BY as a special, different case.
Chris Saxon
June 26, 2023 - 12:46 pm UTC

Thanks for your thoughts

last id by date

Greg, June 27, 2023 - 7:56 pm UTC

@Jeff Seger,

if I understand what you're asking, you probably don't/can't use MAX efficiently, instead, use FIRST_VALUE:

with td as
( select 123 id, to_date('09/03/2020 09:00:00','MM/DD/YYYY HH24:MI:SS') dt from dual union all
select 127 id, to_date('09/03/2020 19:00:00','MM/DD/YYYY HH24:MI:SS') dt from dual union all
select 126 id, to_date('09/03/2021 09:00:00','MM/DD/YYYY HH24:MI:SS') dt from dual union all
select 125 id, to_date('09/03/2022 09:00:00','MM/DD/YYYY HH24:MI:SS') dt from dual union all
select 124 id, to_date('09/03/2023 09:00:00','MM/DD/YYYY HH24:MI:SS') dt from dual
)
select id, dt,
first_value(id) over (order by dt desc) latest_id
from td;

ID DT LATEST_ID
---------- -------------------- ----------
124 03-sep-2023 09:00:00 124
125 03-sep-2022 09:00:00 124
126 03-sep-2021 09:00:00 124
127 03-sep-2020 19:00:00 124
123 03-sep-2020 09:00:00 124

5 rows selected.

based on the date order, id 124 is the "last one" .. and that picks it out for each ...

Chris Saxon
June 28, 2023 - 12:57 pm UTC

Yep - though this is non-deterministic if two rows have the same last DT

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.