Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Getachew.

Asked: October 29, 2006 - 1:43 pm UTC

Last updated: April 14, 2009 - 9:43 am UTC

Version: 10.2.02

Viewed 1000+ times

You Asked

In 10.2 when a query is run with GROUP BY it does not order the output by the grouped columns. In earlier versions, the output was always sorted by the grouped columns even though there was no
ORDER BY clause present.
The GROUP BY clause does not guarantee that the output would be sorted by the grouped columns.
My question is : In order to get an ordered output do we now have to include the ORDER BY clause also to the query? Is there some parameter that controls this?

Thank you.


and Tom said...

NO, NO, NO, NO, NOT TRUE

never been true
never was true
never will be true.

There is ONLY one way to get sorted output "gauranteed" and that is.......


TO USE ORDER BY


This HAS ALWAYS AND FOREVER been the truth, period, plain, simple, honestly!!!!!!


</code> http://asktom.oracle.com/Misc/order-in-court.html <code>

group by

o HAS NEVER HAD TO SORT
o ALWAYS DID A BINARY SORT (meaning for most of the world, the sort
would have been WRONG even if it happened by accident)


not having an order by and presuming the data is sorted is one thing:

A BUG IN YOUR CODE


sorry for shouting, this just gets frustrating after a while. I don't say this as often as "bind", but really - unless and until there is an ORDER BY, the data - it ain't sorted.

Rating

  (13 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

non-sorted data in 10Gr2

Original poster, October 29, 2006 - 9:03 pm UTC

Wow! I understand the frustration but, I truely can't find any explanation when I see two different results (the way the data was being displayed) when I run the exact same query in 8i, 10GR1 and 10GR2.
8i and 10GR1 display one way "sorted", where as 10GR2 displays it onother way "not sorted".

Thank you Tom!




Tom Kyte
October 30, 2006 - 8:44 am UTC

Wow, I don't get why you think the results are different if you don't have an order by!!!


ops$tkyte%ORA9IR2> create table t ( x int, y int );

Table created.

ops$tkyte%ORA9IR2> insert into t
  2  select 100-rownum, rownum
  3    from all_users
  4   where rownum <= 5;

5 rows created.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select * from t where x > 0;

         X          Y
---------- ----------
        99          1
        98          2
        97          3
        96          4
        95          5

ops$tkyte%ORA9IR2> create index t_idx on t(x);

Index created.

ops$tkyte%ORA9IR2> select * from t where x > 0;

         X          Y
---------- ----------
        95          5
        96          4
        97          3
        98          2
        99          1


Different access paths, different "sorted orders", always has been, always will be, always is TRUE.  Tell me, is the above query "broken" - all I did was add an index, access path changed.

Unless and until there is an order by - you cannot have any expectation over the sorted order of the data.


It was never ever true that group by had to sort.
It was never ever true that group by did sort always.
It was never ever true that you could rely on group by sorting.
One of the links I pointed you to shows that going way way way back in time.



 

Group By Behaviour

A reader, October 30, 2006 - 8:36 am UTC

If I understand you correctly, when we *appeared* to get sorted resullts in a group by query, this was as a consequence of the binary sort?

Tom Kyte
October 30, 2006 - 9:27 am UTC

correct, it just had to sort bytes, it didn't need to use a collating sequence.

Order by

A reader, October 30, 2006 - 10:06 am UTC

Tom,
"order by" seems to be an English like statement which conveys the impression that records will appear "ordered". I do not understand that without using "order by", how can there be an expectation of ordering of records, be it for whatever reason? If I do not give a proper directive to Oracle, how can I expect a certain behavior?

I witnessed a similar situation where developers had coded wihout using "order by" and relying on group by statement to do ordering. When code moved from earlier release of Oracle to 10g, it broke down. Everyone started screaming that it was a "bug in Oracle". Lot of time and energy was expanded, including a SR to somehow convince Oracle support that it was a bug. It is clearly written in 9i and 10g documentation that group by doesn't guarantee record ordering. Finally the developers had to add the "order by". It still upsets me when they refer to this as a quirky behavior of 10g.

Thanks

Tom Kyte
October 30, 2006 - 12:38 pm UTC

It upsets me no end.

The only thing about "group by sorting" that can be said is:

It has never been true. (see link above, I show this in 7.x days!)
It is not currently true. (obviously)
It will never be true.... (I can say that with great confidence)

What I really can't understand is why this question keeps coming up

John, October 30, 2006 - 10:38 am UTC

I think Tom has answered this question over a dozen times! Search the site first people!
Tom,
My hat is off to your patience.

Tom Kyte
October 30, 2006 - 12:45 pm UTC

this will also be printed in the jan/feb oracle magazine :) about this "group by sorting" nonsense.

Alberto Dell'Era, October 30, 2006 - 2:29 pm UTC

Could be that some other RDBMS orders group by by default - hence this persistent misunderstanding ?

Tom Kyte
October 30, 2006 - 3:16 pm UTC

Oracle tended to do a binary sort in many cases...

This is akin to someone observing that when they flip a coin..... it comes up heads - therefore.....


the funny thing is - if the optimizer could skip the order by (the binary sort done by group by was "good enough") it would... Meaning, adding the order by to a query with group by didn't negatively affect performance (and if it DID cause the query to run slower, well, then the group by wasn't SORTING the data in the first place!!!)

GROUP BY HASH

A reader, October 30, 2006 - 3:56 pm UTC

With 10gR2 (or even earlier) Oracle may choose between grouping by applying a hash function or by binary sort. A "hash group by" operation as referenced in other AskTom threads. So even less reason to skip an "ORDER BY" clause if ordering is required in the output.

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:43802711317947 <code>

But maybe some folks hope they can shave development costs by having their developers type as little as possible? :-)

To "A reader"

Mark, October 30, 2006 - 6:58 pm UTC

"'order by' seems to be an English like statement which conveys the impression that records will appear 'ordered'. I do not understand that without using 'order by', how can there be an expectation of ordering of records, be it for whatever reason? If I do not give a proper directive to Oracle, how can I expect a certain behavior?"

Sometimes a developer (or anyone) infers an imagined behavior of a program, based on some kind of apparently consistent pattern in its output. They're looking at a side effect, and not making the effort to understand what's truly going on (i.e. not RTFM). This phenomenon isn't exclusive to order by... I can't think of a specific example, but I know I've seen this kind of thing before.

Almost makes me wish Oracle randomly jumbled the results when there is no order by clause, but obviously that would be a silly waste of computing power.

Oft discussed, but the answer is unchanging

Duke Ganote, December 21, 2006 - 2:06 pm UTC

</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:74320098178823#74401237604050 <code>
One of the few topics that now makes me feel "out of sorts" :)

Thanks & question

Sergei, December 27, 2006 - 10:20 am UTC

Thank you. I did not found such clear explanation and advice in Oracle 10g documentation.
Wht about GROUP BY ROLLUP function? It looks like sorting right implicitly like it was in Oracle 9i. Do we need also to add ORDER BY to right assure sorting order?

So, why the following behaviour then:

A reader, October 27, 2008 - 8:50 pm UTC

The following is based on Orade Demo Schema "SH":

SQL> alter session 
set optimizer_features_enable = '10.2.0.3.1';

Session altered.

SQL> select country_id, count(*)
  2  from customers
  3  group by country_id;

COUNTRY_ID   COUNT(*)
---------- ----------
     52779       3833
     52777        383
     52770       7780
     52782        624
     52776       8173
     52790      18520
     52773        403
     52772       2010
     52791         88
     52785        244
     52775        832
     52789       7557
     52787         75
     52778       2039
     52771        712
     52769        597
     52786        708
     52774        831
     52788         91

19 rows selected.

SQL> alter session 
set optimizer_features_enable = '9.2.0';

Session altered.

SQL> select country_id, count(*)
  2  from customers
  3  group by country_id;

COUNTRY_ID   COUNT(*)
---------- ----------
     52769        597
     52770       7780
     52771        712
     52772       2010
     52773        403
     52774        831
     52775        832
     52776       8173
     52777        383
     52778       2039
     52779       3833
     52782        624
     52785        244
     52786        708
     52787         75
     52788         91
     52789       7557
     52790      18520
     52791         88

19 rows selected.

Oracle University books are wrong and they state a very misleading statement that by product of a Group By is a sort.  

Tom Kyte
October 27, 2008 - 8:57 pm UTC

so, are you of the school that if you flip a coin N times and N times it shows up heads - it'll be heads the N+1 time too?

come on

re-read my original article above.

And if you can get me an explicit reference to Oracle literature that says "group by sorts", I'll have it fixed.


group by has NEVER had to sort - never.

did it sometimes sort?

Yes, sure.

Did it always sort?

No, not at all.

And did it sort correctly when it did sort?

Sometimes, but not always. It always did a BINARY SORT - not a character set sort. So the data would be sorted incorrectly if you use anything but very simple ASCII strings.

SORT GROUP BY

bigo, April 13, 2009 - 5:55 am UTC

Tom,

My question is related to the above thread, but a bit different(trust me, i know group by does not sort as intended).

I have a table like:

Table X
Columns (col1,col2,col3,col4)
where col1 and col2 form a "logical" primary key for the table.

It is partitioned by DAY and about 500 million rows / day.

I write a query like:

SELECT COL1,COL2,MAX(COL4)
FROM X
WHERE DAY = TRUNC(SYSDATE -1)
GROUP BY COL1,COL2

In the explain plan, it shows me a SORT GROUP BY.

I would expect (from a SD perspective) that a MAX should not require sorting, is there any way I can tell the oracle optimizer not to sort (O(nlogn)) my results and just perform a MAX() (O(n)). ?

Why does oracle do this? Please help me understand.

Thanks!
Tom Kyte
April 13, 2009 - 5:26 pm UTC

from a SD perspective - no clue what "SD" is.


You are throwing out o(nlogn) like it "means" something, it doesn't. sort group by is just a step name, it does smart stuff under the covers.

grouping requires some sort of organization, that is what the "sort" bit is about. We can in current releases use hashing instead of ordering data, but the sort it to basically be able to "find keys again"

You say "col1 and col2 are like a 'logical' primary key" (which is not meaningful, I have no idea what a "logical primary key" is... or means)...

if they are a unique key, then obviously you did not mean to use group by, it is not necessary.

bigo, April 13, 2009 - 5:52 pm UTC

Sorry about the vague review...

SD = software developer

I guess I wanted to understand what is the smart stuff that Oracle is doing under the covers when it shows a SORT GROUP BY in the explain plan.

Anyways, here's a better example of the scenario im facing:


My_table: (lets call it TBL)

Col1 | Col2 | Col3 | version_number


My source for this data is a real-time stream which publishes a full-update (the whole row) to me everytime any of the columns changes.

This leads to a lot of updates per day for the same col1,col2,col3 values...I only want to keep the latest one (by version_number).

The size is about 1 BN / day for total rows recieved.

I am trying to figure out the most efficient way in SQL to de-dupe the data. I could either use the row_number() analytic function and order the set by version_number...picking only one row or...create a skinny table by selecting all columns and max(version_number) group by col1,col2,col3.

When I tried the max()..i saw the SORT GROUP BY in the explain plan and I wasnt sure why that was there.

I need help in understanding what that SORT GROUP BY means.
I read that Oracle 10.2 onwards has something called HASH GROUP BY which can be set using a parameter?

Can you explain the difference between the two?

Let me know if I'm not clear...and thanks for the quick reply!



Tom Kyte
April 14, 2009 - 9:43 am UTC

select c1, c2, max(c3) .....

the sort group by will create a sorted list of c1, c2 as we scan the table - having c1, c2 being "unique" - and maintaining the max(c3).

So, if there are 100,000 unique c1, c2's - and there are 100,000,000 rows in the table - then there will be 100,000 c1/c2's in 'temp' whilst we find the max(c3) for each one.


I don't understand why you would compare using row_number() on the fly to creating a "skinny table" using group by???


Why wouldn't you compare using row_number to using group by in the fly?

And then compare "doing it on the fly (using either technique) versus using a materialized view"



The sort group by is there because - you asked for an AGGREGATE and that is how they are built.


hash group by does the same thing conceptually as a sort group by - but uses hashing to find the c1/c2 values instead of a sorted list of values.



Think of it this way, you have data in your table like this:

c1       c2       c3
-----    -----    ------
1        1        2
1        2        3 
2        1        4 
1        1        42
......



we start scanning the data, we hit row 1, in temp we'll have:


1,1 -> 2

we hit row 2, now we have

1,1 -> 2
1,2 -> 3

we hit row 3, now we have

1,1 -> 2
1,2 -> 3
2,1 -> 4

we hit row 4, now we have:

1,1 -> 42
1,2 -> 3
2,1 -> 4

now return the data, you get the data as it appears right above, accidentally sorted in binary sort order (sort group by)

we have a sorted list - so we can lookup c1/c2 and perform the aggregation.


If you used hash group by it would be conceptually like the following:

hit row 1, hash(1,1) into a hash table slot in temp. Suppose that returns 1032 as the 'slot' in the hash table. We put (1,1->2) into slot 1032 in temp.

hit row 2, hash(1,2) into that hash table. Suppose that returns 55 as the slot. We put (1,2->3) into slot 55.

hit row 3, hash(2,1) into that hash table. Suppose that returns 201 as the slot. We put (2,1->4) into slot 201.

hit row 4, hash(1,1) - you get 1032 again. Goto slot 1032 - see data is there, modify it (do the aggregate) and store (1,1->42) in slot 1032.

When done with all rows in table, return the data from the hash table, you get

1,2->3
2,1->4
1,1->42

as the output (not sorted in a binary fashion, hashing spreads data out) - hash group by.



You can see that the amount of "temp space" needed is a function of the rows returned (the distinct cardinality of c1,c2 - NOT the number of rows in the table) via the following:

ops$tkyte%ORA9IR2> /*
DOC>drop table t1 ;
DOC>drop table t2 ;
DOC>
DOC>create table t1
DOC>as
DOC>select mod(rownum,2) c1, mod(rownum,2)+1 c2, object_id c3
DOC>  from all_objects
DOC>/
DOC>create table t2
DOC>as
DOC>select mod(rownum,500) c1, mod(rownum,200)+1 c2, object_id c3
DOC>  from all_objects
DOC>/
DOC>*/
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> alter session set workarea_size_policy = manual;

Session altered.

ops$tkyte%ORA9IR2> alter session set sort_area_size = 65536;

Session altered.

ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> set autotrace traceonly
ops$tkyte%ORA9IR2> select c1, c2, max(c3) from t1 group by c1, c2;


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (GROUP BY)
   2    1     TABLE ACCESS (FULL) OF 'T1'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         62  consistent gets
          0  physical reads
          0  redo size
        555  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          2  rows processed

ops$tkyte%ORA9IR2> select c1, c2, max(c3) from t2 group by c1, c2;

1000 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   SORT (GROUP BY)
   2    1     TABLE ACCESS (FULL) OF 'T2'




Statistics
----------------------------------------------------------
          0  recursive calls
         57  db block gets
         69  consistent gets
        225  physical reads
          0  redo size
      23891  bytes sent via SQL*Net to client
       1225  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
       1000  rows processed

ops$tkyte%ORA9IR2> set autotrace off



notice that when c1,c2 have few distinct values (2 in my example) - the "sort" (which is not really a sort, just a sorted set of distinct values of c1,c2) is done in memory (used a 64k sort area size).

when c1, c2 have many distinct values (1000 in my example) the "sort" (which is not a sort really) is swapped to disk - 64k could not hold 1,000 intermediate results.

Streaming of GROUP BY

Lauri Pietarinen, May 15, 2014 - 5:05 pm UTC

Hi Tom,

I don't know if this exactly fits the thread but could not find a better one.

Returning to your table T:

c1 c2 c3
----- ----- ------
1 1 2
1 2 3
2 1 4
1 1 42

If we had an index TX(c1,c2,c3) on T can Oracle "stream out" the answers without sorting or hashing in the following example:

select c1, c2
from T
group by c1, c2
having count(*) > 1;

I would presume, that using index TX Oracle could just scan through the index and output the qualifying rows as it meets them without having to save any temporary stuff anywhere. Especially if this is a large table we are traversing, it would speed up things to be able to execute the query with one pass on the data.

Is this an option for Oracle?