div.b-mobile {display:none;}

Tuesday, August 09, 2005

Order in the court!

As I was proofing my chapters yesterday, I couldn't help but notice that in three chapters I said the same thing, stronger and stronger each time. It was about using (or not using more specifically) ORDER BY. I see it time and time and time again.Someone assumes that because they observe the data coming out sorted from a query or that a query is using an index - they can and should leave off the ORDER BY. That is a career limiting move to borrow a phrase from Connor McDonald (he used it in reference to reorganizing your database but it applies totally here). It is a time bomb ticking away in your code.

Did you know simple optimizer change - no more sort.  Data could appear to be sorted today, statistics gathered and not sorted tomorrow.

Did you know that reading data out via an index - does not imply data is sorted by the index key!

That a group by does not sort (necessarily, and even if it does, it is a binary sort – which is different than a character sort for those of us outside of US7ASCII land).  Unfortunately, part of that thread is missing for some reason (I remember it well, I loved the comment: “I have frequently thought that such 'preaching' is simply an indication that the person answering isn't willing to admit that they don't know how to do it” said right before they tried to show someone how to sort using GROUP BY.  Pot, kettle, and black all popped into my head at that).

Anyway, the bottom line – you want data sorted?  You better use ORDER BY because you cannot have any expectations on the sorted nature of the data otherwise!

In chapter 10 on tables, I wrote:

You should think of a heap organized table as a big unordered collection of rows. These rows will come out in a seemingly random order, and depending on other options being used (parallel query, different optimizer modes and so on), they may come out in a different order with the same query. Do not ever count on the order of rows from a query unless you have an ORDER BY statement on your query!

In chapter 11 on indexes, I wrote:

Note Do not be tempted to ever leave an ORDER BY off a query. Just because your query plan includes an index does not mean the data will be returned in "some order." The only way to retrieve data from the database in some sorted order is to include an ORDER BY on your query. There is no substitute for ORDER BY.

And finally, in the chapter on partitioning, I did it again, a tad stronger.

USING ORDER BY


This example brought to mind an unrelated but very important fact. When looking at hash partitioned indexes, we are faced with another case where the use of a index to retrieve data would not automatically retrieve the data sorted. Many people assume that if the query plan shows an index is used to retrieve the data, the data will be retrieved sorted. This has never been true. The only way we can retrieve data in any sort of sorted order is to use ORDER BY on the query. If your query does not contain an ORDER BY statement, you cannot make any assumptions about the sorted order of the data.

A quick example demonstrates this. We create a small table as a copy of ALL_USERS and create a hash partitioned index with four partitions on the USER_ID column:

SQL> create table t
  2  as
  3  select *
  4    from all_users
  5  /
Table created.
 
SQL> create index t_idx
  2  on t(user_id)
  3  global
  4  partition by hash(user_id)
  5  partitions 4
  6  /
Index created.

Now, we will query that table and use a hint to have Oracle use the index. Notice the ordering (actually, the lack of ordering) of the data:

SQL> set autotrace on explain
SQL> select /*+ index( t t_idx ) */ user_id
  2    from t
  3   where user_id > 0
  4  /
 
   USER_ID
----------
        11
        34

        81
       157
        19
        22

       139
       161
         5
        23

       163
       167
        35
        37

        75
       160
38 rows selected.
 
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=38 Bytes=494)
   1    0   PARTITION HASH (ALL) (Cost=4 Card=38 Bytes=494)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (INDEX) (Cost=4 Card=38 Bytes=494)
 
SQL> set autotrace off

So, even though Oracle used the index in a range scan, the data is obviously not sorted. In fact, you might observe a pattern in this data. There are "four sorted" results here; the  … replaces values that were increasing in value; and between the rows with USER_ID = 34 and 81, the values were increasing in the output. Then the row with USER_ID = 19 appeared. What we are observing is Oracle returning "sorted data" from each of the four hash partitions, one after the other.

This is just a warning that unless your query has an ORDER BY, you have no reason to anticipate the data being returned to you in any kind of sorted order whatsoever. (And no, GROUP BY doesn't have to sort either! There is no substitute for ORDER BY.)

Caveat Emptor

POST A COMMENT

65 Comments:

Blogger scubajim said....

I find using order by and not using order by in looking at query performance is valuable. I often run a query with and without the order by (several times to get a an average) to see what the timing difference is. This method allows me to attribute the delta in times to the time it takes to sort the results. Often the sort area is too small or the sort results are too large (3 million row table being sorted)

Tue Aug 09, 02:10:00 PM EDT  

Blogger Thomas Kyte said....

I find using order by and not using order by in looking at query performance is valuable.

That is fine, to find the "cost" of the sort, however at the end of the day there is only one of two things that are true:

a) you need the data to be predicably sorted. An order by is thus a mandatory part of your SQL statement

b) you do not need the data to be sorted. If you have an order by - ok, you are just maybe wasting cpu cycles, but no big deal.

Tue Aug 09, 02:18:00 PM EDT  

Blogger Alberto Dell'Era said....

I find using order by and not using order by in looking at query performance is valuable.

I've also heard that in 10gR2, they have changed the sorting algorithm, thus spectacularly increasing performance ;)

Tue Aug 09, 02:35:00 PM EDT  

Blogger Niall said....

Alberto wrote

I've also heard that in 10gR2, they have changed the sorting algorithm, thus spectacularly increasing performance ;)

I note the smiley, but just for the sake of completeness. Tom's 1998 post notes two different mechanisms for group by existing from 1994 (back when I thought Accountancy was great). How many group by and sort mechanisms might there be in 2005? How would we know?. How does automatic PGA 'tuning' affect this on a system short of memory? and so on.

ORDER BY does what it says - and no more or less
GROUP BY does what it says - and no more or less.

that works for me.

Tue Aug 09, 03:32:00 PM EDT  

Blogger Thomas Kyte said....

but just for the sake of completeness.

I think Alberto was trying to drive home the fact that the new HASH BASED operations in 10gR2 will replace sorts in more and more operations. Check this
example out. "HASH GROUP BY"

Tue Aug 09, 03:42:00 PM EDT  

Blogger scubajim said....

Yes, I only use it to determine the "cost" of the sort. If you need it sorted one must indicate that explicitly (order by...). It does help diagnose what is going on.

Sometimes they only want "recent rows" and they can use the sort and put a condition to allow the database to sort a much smaller set Thus instead of sorting a million call center entries to see what was entered in the last hour; such as get me all the rows that have been updated or created in the last 4 hours and sort them from most recent to least recent.

Tue Aug 09, 04:16:00 PM EDT  

Blogger Joel Garry said....

a career limiting move

Funny, it never seems to stop managers with brains stuck in dbase-land.

Tue Aug 09, 04:22:00 PM EDT  

Blogger David Aldridge said....

Funny, it never seems to stop managers with brains stuck in dbase-land.
That's because they'd be the ones performing the limiting, not the one suffering from it.

Tue Aug 09, 04:34:00 PM EDT  

Blogger LC said....

Catchy title, but still a whole blog entry on order by does order by.

good style, but unexpected material;

I suppose the new and "cool" features are only relevant if people have the basics.

And sometimes the simplest things can be the most thought provoking.

Tue Aug 09, 05:21:00 PM EDT  

Blogger Alberto Dell'Era said....

Thanks, HASH GROUP BY! How cool and performant ... and definitely not returning the rows sorted :)

Actually, I was referring to the Internet Seminar "overview of Oracle Database 10g Release 2" available here
http://www.oracle.com/database/index.html
Near the end, they say to have changed the sort algorithm, for much better performance.

It seemed to me a very interesting thing to know - especially since it will benefit all existing code transparently (such as HASH GROUP BY , and, in 10gR1, the improved PL/SQL performance, the new outer hash join ecc - all in all, it seems that the 10g releases have transparent performance as one of the main goals).

Tue Aug 09, 06:21:00 PM EDT  

Anonymous Eddie Awad said....

If we omit the "order by" clause, the rows may come out in a random order. However, sometimes, there is an intentional need to return rows in a random order. In that case, just "order by dbms_random.random". It sounds a little bit like an oxymoron, "order" and "random", but it works.

Tue Aug 09, 06:31:00 PM EDT  

Blogger Robert Vollman said....

Tom,

I noticed you included an example to illustrate your point. Good!

Now, is there any situations you're aware of where you can guarantee your results will come in sorted order WITHOUT an ORDER BY clause? I think you're saying "no", but just want to be sure

Tue Aug 09, 07:09:00 PM EDT  

Blogger Thomas Kyte said....

any situations you're aware of where you can guarantee your results will come in sorted order

not without making lots of assumptions, that by the way could change with the whim of the optimizer

which means

NO

unless you have order by, any order the rows come out in is CORRECT.

The old saying about

ASSUME

holds true here. without order by, there is no order.

Tue Aug 09, 07:14:00 PM EDT  

Blogger Noons said....

"without order by, there is no order"

and even more pertinent in these days of ASSM: virtually guaranteed that the order you thought your rows went in won't be the same when they come out. ORDER BY is definitely a mandatory item now.

Mind you: ASSM is a problem for those who rely on pre-sorting rows to cluster them physically on a given key combination that may not necessarily match the PK.

Tue Aug 09, 08:27:00 PM EDT  

Blogger David Aldridge said....

"ASSM is a problem for those who rely on pre-sorting rows to cluster them physically on a given key combination that may not necessarily match the PK"
Maybe there's room for another article as a sort of inverse of the ORDER BY one then ... just as you need ORDER BY to guarantee order in a query result, you need a Cluster to guarantee physical storage order in the presence of ASSM. Another assumption bites the dust.

Tue Aug 09, 10:00:00 PM EDT  

Blogger Connor McDonald said....

any situations you're aware of where you can guarantee your results will come in sorted order ?

One row table

:-)

Its a bummer I can't the very first discussion of this on Google, mid 90's I think, which is very similar to the one you (TK) posted. Some guy recommended using group-by in views, followed by TK showing the counter-example. The guy responded with "I've a senior expert at Boeing and I've been done their databases for 'n' years blah blah blah"...

to which I posted

create view SAFEST_RUNWAY as
select safety_rating, max(runway)
group by safety_rating

Hmmmm....

Tue Aug 09, 10:29:00 PM EDT  

Blogger Noons said....

Does UNION do a sort or is it the same as GROUP BY?

Wed Aug 10, 12:14:00 AM EDT  

Anonymous Simo said....

Since some people seem to think data can be retrieved in order without using the order by, is there database which actually does that or anyone know where this assumption comes from? Or is it just common lazyness not to include it :)

Wed Aug 10, 01:51:00 AM EDT  

Blogger Thomas Kyte said....

create view SAFEST_RUNWAY

Connor -- that is the same thread (keystoke), bits of it are definitely missing :(

That is why I remember that thread so well, I remembered your posting to it. Half of my posts are missing in that thread too (I was the company line)

Does UNION do a sort or is it the same as GROUP BY?

Union is a distinct operation and may or may not sort in order to distinct. In any case, the sort it does would be a binary sort, not a character set sort (eg: does not sort the same as order by would necessarily). Throw in parallel operations, partitioning and all bets are off.

Or is it just common lazyness

It is not laziness so much as "thinking they are optimizing something". Only thing is, the optimizer will SKIP the sort when it knows it can (because it will use an index appropriately to read the data "sorted") so they really are not saving anything.

The other part is "misconception", similar to the guy in the internet posting. He observed empirically that group by seems to sort. He personally never observed it (or maybe he wasn't always looking) NOT sort. Therefore (in his mind) group by must sort. Very similar to flipping a coin and observing heads over and over and coming to the conclusion that coins, when tossed, end up heads up.

Wed Aug 10, 07:51:00 AM EDT  

Blogger Thomas Kyte said....

It is funny, there is something in the text of this blog that is just drawing spambots. Zapped three of them in the last hour.

Wonder what keywords we used to draw them on this one...

Wed Aug 10, 09:17:00 AM EDT  

Blogger Connor McDonald said....

Now that we've moved onto UNION, what is the freakin' obstacle with UNION ALL

Every code review I do and point out "UNION ALL", the developer looks at it as if its some brand spanking new feature that must have just come out...

Wed Aug 10, 09:33:00 AM EDT  

Anonymous Rob H said....

If you like gizmodo, try engadget, slashdot.org, and of course the ever popular fark.

Wed Aug 10, 09:55:00 AM EDT  

Anonymous Anonymous said....

I think its just because its counterintuitive.

SELECT, SELECT DISTINCT
UNION ALL, UNION

All of the following are more intuitive:

SELECT ALL, SELECT
UNION ALL, UNION

SELECT, SELECT DISTINCT
UNION, UNION DISTINCT

SELECT ALL, SELECT DISTINCT
UNION ALL, UNION DISTINCT

The latter one is my personal preference as it is fully states what its going to do. It's almost completely implemented in oracle (UNION DISTINCT is the only one that doesn't work/exist). Then "select" and "union" would just be shortcuts for "select all" and "union distinct".

Wed Aug 10, 10:50:00 AM EDT  

Blogger Robert Vollman said....

any situations you're aware of where you can guarantee your results will come in sorted order ?

One row table


Smart ass!

Its a bummer I can't the very first discussion of this on Google, mid 90's I think, which is very similar to the one you (TK) posted. Some guy recommended using group-by in views, followed by TK showing the counter-example. The guy responded with "I've a senior expert at Boeing and I've been done their databases for 'n' years blah blah blah"...

to which I posted

create view SAFEST_RUNWAY as
select safety_rating, max(runway)
group by safety_rating


SENIOR smart-ass. :)

Wed Aug 10, 11:08:00 AM EDT  

Anonymous Anonymous said....

Since some people seem to think data can be retrieved in order without using the order by, is there database which actually does that ... ?

Actually, yes. I cut my teeth on Ingres 6.4 and its successor, OpenIngres 1.2, back in the mid-nineties. In Ingres, I could always count on "group by" returning results in order. Thus, I routinely left "order by" off in queries doing grouping. I like to think of it as "lexical optimization" -- the less typing, the better. It's a throwback to my programming classes, at a time when the state-of-the art MS-DOS PC, running at 8 MHz, had 20-MB (!) hard drives and 640 K of RAM. My teachers stressed writing very efficient code and leaving out anything that wasn't absolutely necessary.

Needless to say, it definitely required an adjustment on my part when I got to Oracle four years ago. Between Ingres and Oracle I worked briefly with SQL Server 7 and 2000. My memory is a bit fuzzy, but it seems to me that I could get away with omitting "order by" there as well. Or maybe I just didn't work with it long enough for the practice to bite me.

Regards,

Bob Shepard

Wed Aug 10, 11:21:00 AM EDT  

Blogger Thomas Kyte said....

Actually, yes.

Actually, no, you might have observed ingress frequently returning data ordered as a result of group by -- but no where in the ingress documenation will you find "group by is synonymous with order by"

In fact, the index example I used on the newsgroup thread way back when might be sufficient on any relatively smart database to avoid the sort in a group by.

In any case, that is appears to sort in 6.4 does not mean it would in 7.x!!!

In SQL (anybodies SQL), unless you use order by, you cannot have any expectations as to the order of the rows returned!!!

Wed Aug 10, 11:37:00 AM EDT  

Blogger Joel Garry said....

Connor -- that is the same thread (keystoke), bits of it are definitely missing :(

So I went looking for the thread, and eventually found it, but after a "Server is busy try later" message. I found 17 messages (IIRC, it was early this morning), it didn't seem like anything was missing, but it's hard to tell.

Then I googled about, wondering what I was doing during that thread, and found some contemporaneous posts of mine. Stuff I'm pretty sure I haven't looked at all these years.

Many hours later, different computer/ISP, I did a google groups search on my name (not logged in), as I often do to figure out what threads I'm involved in. But I neglected to sort by date - and the first results were sorted by reverse order of what I had looked at earlier!

I interpret all this to mean google has some sort of amazingly large ageable cache of what has been accessed. And further, I speculate they haven't figured out the basic concurrency and retrieval issues us db folk take for granted (group by sorting notwithstanding :-).

On the other hand, they may track the few ip's I post from and relate them...?

Wed Aug 10, 02:49:00 PM EDT  

Anonymous Anonymous said....

This is the thread which Tom and Connor were talking about
http://groups-beta.google.com/group/comp.databases.oracle.misc/browse_frm/thread/956a28e99b576975/3eb87dbeee4a2cb3

Thu Aug 11, 01:41:00 AM EDT  

Anonymous Anonymous said....

Oops:(

http://groups-beta.google.com/
group/comp.databases.oracle.misc/
browse_frm/thread/
956a28e99b576975/3eb87dbeee4a2cb3

Thu Aug 11, 01:43:00 AM EDT  

Anonymous Jeni Russell said....

Ah, how I wish I could hunt down the programmer a sparred with a few years ago on this very subject. Your point is well-taken, Tom -- we set out to tune up a group of poorly-performing (vendor-designed) OLTP databases (48 twins sharing the same 8-CPU Sun 15K domain, making scalability an obvious goal). We realized the table/index/column stats were incomplete at best, and corrected that. It provided drastic and immediate relief! Upon which the app programming team opened a priority 1 ticket and insisted we change it back -- because one piece of code which had "always" produced sorted output (because it used a particularly-inappropriate index) was now returning unsorted results. And the programmer responsible for this code vehemently defended it, insisting that no "order by" was necessary because his supremely sophisticated code was designed to use that index to produce ordered results.
I wish I could find him now and ask him to post his argument to your blog!

Thu Aug 11, 01:56:00 AM EDT  

Anonymous Anonymous said....

Correct me If I'm wrong, but isn't Order by allowed in views in 9I (not sure since what version). Did you guys bow for mr SENIOR smart-ass? ;)

Can you tell why 'order by in views' was implemended after all? From the usenet discussion I understand it was not back in 1998. Also you were pretty 'passionate' about it not being implemented.

Although possible I never use order by in views though... I don't see the benefit of putting it in the view but I do see dis-advantages, like you already pointed out in the thread.

Thu Aug 11, 05:32:00 AM EDT  

Blogger Niall said....

Yes order by is supported in views (IIRC it came in in 8i in fact)

<pre>
SQL>SELECT BANNER FROM V$VERSION;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Prod
PL/SQL Release 10.1.0.4.0 - Production
CORE 10.1.0.4.0 Production
TNS for 32-bit Windows: Version 10.1.0.4.0 - Production
NLSRTL Version 10.1.0.4.0 - Production

5 rows selected.

SQL>CREATE VIEW ORDERED_TEST
2 AS SELECT OWNER,OBJECT_NAME
3 FROM ALL_OBJECTS
4 ORDER BY OWNER;

View created.

SQL>EXPLAIN PLAN
2 FOR
3 SELECT * FROM ORDERED_TEST
4 ORDER BY OBJECT_NAME;

Explained.

SQL>@SHOW_PLAN_9I

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------

Plan hash value: 93631463

-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 23372 | 776K| | 1543 (2)| 00:00:19 |
| 1 | SORT ORDER BY | | 23372 | 776K| 2040K| 1543 (2)| 00:00:19 |
| 2 | VIEW | ORDERED_TEST | 23372 | 776K| | 1325 (2)| 00:00:16 |
| 3 | SORT ORDER BY | | 23372 | 2145K| 11M| 1325 (2)| 00:00:16 |
|* 4 | FILTER | | | | | | |
|* 5 | HASH JOIN | | 53633 | 4923K| | 159 (10)| 00:00:02 |
| 6 | TABLE ACCESS FULL | USER$ | 75 | 1125 | | 3 (0)| 00:00:01 |
|* 7 | TABLE ACCESS FULL | OBJ$ | 53546 | 4130K| | 155 (10)| 00:00:02 |
|* 8 | TABLE ACCESS BY INDEX ROWID | IND$ | 1 | 8 | | 2 (0)| 00:00:01 |
|* 9 | INDEX UNIQUE SCAN | I_IND1 | 1 | | | 1 (0)| 00:00:01 |
| 10 | NESTED LOOPS | | 3 | 72 | | 26 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | I_OBJAUTH1 | 1 | 11 | | 2 (0)| 00:00:01 |
|* 12 | FIXED TABLE FULL | X$KZSRO | 3 | 39 | | 24 (0)| 00:00:01 |
|* 13 | FIXED TABLE FULL | X$KZSPR | 1 | 26 | | 24 (0)| 00:00:01 |

continues.....</pre>

Thu Aug 11, 07:51:00 AM EDT  

Blogger Thomas Kyte said....

This is the thread which Tom and Connor

Thanks -- I thought it was the same discussion, but apparently, it was a different discussion, same person.

Jeri said ... I wish I could find him now and ask him to post his argument to your blog!

He had no argument whatsoever :) Indexes may or may not be used (the physical implementation isn't relevant to the result set, you need to say what you mean)


Correct me If I'm wrong, but isn't Order by allowed in views in 9I

Yes, it was added in 8i, 8.1.5. I've never used it in a view, although I have used it often in INLINE VIEWS with top-n queries. There it is very useful. In view, it only seems that it could make for some really poor performance and nothing else.

Thu Aug 11, 08:14:00 AM EDT  

Anonymous Anonymous said....

hi Tom:

if there is already an index in place for the field of "order by", will that improve query performance.

i have a query that return over 1 Million record with an "order by", i did not seems to see Oracle using that index from the execution plan (table analyzed),
no major performance difference btw a query run with / without index

1. Do i need to force oracle using that index by putting on hint?

2. hugh number of blocks (20K blocks per query) taken by check the v$sort_usage view, but seems disk sort / in-memory sort is perfect

16774789 (sort memory)
4977 (sort disk)
6003132084 (sorts rows)

I am just a bit confused...any hint? thanks

Thu Aug 11, 03:23:00 PM EDT  

Blogger Thomas Kyte said....

if there is already an index in place for the field of "order by", will that improve query performance

the law of "3" comes into play

the index might

a) make it faster
b) make it much slower
c) have no measurable effect.


If you are reading 1 million out of 1 million rows -- do you really want to read them slow by slow (block by block) via the index.

Tell you what, hint it, and measure it.

Full scans are *not* evil.

Reading data via the index is not necessarily a good thing.

Thu Aug 11, 04:21:00 PM EDT  

Anonymous Anonymous said....

thanks Tom, haaa..my first question ansered by guru, :>), read asktom regularly.

this is query with 2 tables join a small drive table with another table with over 100M record, query result return 1M record, it runs pretty quick when take the 'order by' out, but slow down dramatically when it come with 'order by'.

i will try 'hint' tonight and see if there is any effect. spent the whole yesterday night to tune this query, seems got to burn another night...life for DBA :<(

Thu Aug 11, 04:33:00 PM EDT  

Blogger Laurent Schneider said....

well, I guess there is one exception : a sorted single table hash cluster. There you do NOT need an order by.

out of the doc:
Given the sort order of the data, the following query would return the call records for a specified hash key by oldest record first.

SELECT * WHERE telephone_number = 6505551212;

Fri Aug 12, 10:47:00 AM EDT  

Anonymous Gabe said....

Laurent,

I think the point made [over and over] is that one should treat SQL as an interface, an API ... and hence, as a software layer hiding/abstracting the physical details. The only guaranteed way to get data sorted through that layer, now and in the future [as long as the specifications don't change], is to use ORDER BY ... and that should really be the end of it.

Assuming one codes without an ORDER BY in this situation ...
1. Are you sure NLS settings won't matter here?
2. What happens if the table is re-implemented outside of the hash cluster ... after all, that is one reason to have this SQL interface ... to separate the applications from the physical storage details. It is OK to change the table implementation in order to affect the performance characteristics … getting a different result set as a consequence of changing the physical implementation is not OK.

I’m not suggesting, a developer should ignore the physical implementation of a table (heap, iot, cluster, external, etc) … I’m suggesting one should not use that awareness to cut corners in how the SQL language itself is being used … SQL has some specifications, let’s use the language accordingly.

The other point made [over and over] is … when sorted data is required, always use the ORDER BY in the app and leave the optimizer to decide if a sort is actually necessary or not. One is 100% safe (solid programming) and the optimization (whenever possible) will happen transparently. Sounds like a win-win situation to me.

Fri Aug 12, 01:08:00 PM EDT  

Blogger Thomas Kyte said....

Laurent Schneider said...

well, I guess there is one exception


Can you point me to the exact book.

I'll have the documentation bug fixed as soon as possible.

They are counting on a SPECIFIC PLAN happening there, and if the plan doesn't happen - well, data isn't sorted.

They NEED to have an order by on that query.

Fri Aug 12, 05:23:00 PM EDT  

Blogger Alberto Dell'Era said....

I think Laurent's book is:

http://tinyurl.com/9953o

Good to have a few counterexamples,
on my nls_langiage=italian,
charset=AL32UTF8 10.1.0.3 instance:

create cluster t_cluster (
h int,
x varchar2(3) SORT
)
hashkeys 20
hash is h;

create table t (h int, x varchar2(3) sort)
cluster t_cluster (h, x);

insert into t (h,x)
values (1, 'b');
insert into t (h,x)
values (1, 'c');
insert into t (h,x)
values (1, unistr ('\00E0')) -- accented "a"
/
commit;

create index t_idx on t (h, x desc);

italy> select x from t where h = 1
2 ORDER BY x;

X
---------
à
b
c

italy>
italy> select x from t where h = 1;

X
---------
b
c
à

italy>
italy> select /*+ index (t t_idx) */ x
2 from t where h = 1;

X
---------
à
c
b

italy>
italy> select x, dump(x,16) x_hex from t;

X X_HEX
--------- --------------------
b Typ=1 Len=1: 62
c Typ=1 Len=1: 63
à Typ=1 Len=2: c3,a0

Added the NLS one after reading Gabe's comment.

Sat Aug 13, 05:27:00 AM EDT  

Blogger Laurent Schneider said....

yes, my book is the Database Administrator's Guide 10gR2

http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14231/hash.htm#ADMIN01901

well, I am not totally convinced about the counterexample, it just shows that nls sort is not used in the cluster. however, if the documentation is going to be corrected, than I will no longer be able to assume the data will come in the "right" order...

similar to REWRITE_OR_ERROR, I could imagine a hint SORT_OR_ERROR to exits if not able to use the sort provided by the cluster

Mon Aug 15, 06:38:00 AM EDT  

Blogger Thomas Kyte said....

Laurent Schneider said...

yes, my book



Laurent, when the doc is fixed it'll say something like:

....
The sorted hash cluster stores the data in a fashion that in many cases allows the execution plan to skip an explicit sort -- you need to use order by of course, but you might not SEE an "ORDER BY" step in the execution plan since the data is physically stored sorted for you.
.....

You get neat plans like this:

ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> variable x number
ops$tkyte@ORA10G> select cust_id, order_dt, order_number
2 from cust_orders
3 where cust_id = :x
4 order by order_dt;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=0 Card=4 Bytes=76)
1 0 TABLE ACCESS (HASH) OF 'CUST_ORDERS' (CLUSTER (HASH))


NO sort/order by step in the execution plan

That is the magic of the sorted hash cluster, not that you can safely (because you cannot) leave off the order by.

Mon Aug 15, 06:58:00 AM EDT  

Blogger Laurent Schneider said....

very well explained... if I need to sort, I will use order by, even in that case !

Tue Aug 16, 04:45:00 AM EDT  

Anonymous Michael Garfield Sørensen said....

Not using ORDER BY is what gave us "The oldest bug in Oracle" - Jonathan explains :)

Wed Aug 17, 09:47:00 AM EDT  

Blogger Laurent Schneider said....

I thought it could be interresting, a query with distinct can use hash unique access plan operation in 10gR2 :

SQL> select distinct tzname from v$timezone_names where tzname like 'Etc/GMT+1_';
TZNAME
----------
Etc/GMT+10
Etc/GMT+12
Etc/GMT+11

providing an unsorted output! thanks again mr tom kyte for having persuade me to use order by in any situation!

Tue Sep 06, 08:31:00 AM EDT  

Blogger giantpanda said....

tried Oracle 10g's new feature sorted hash cluster, for a key with relatively small number of record, return is pretty fast in presorted order, while for key which huge number of record under it, the return speed is equally slow, it still need to do some kind of sorting (as i see the sort segment kept on growing) before return result in ordered format, even it is already presorted.

eventually, i still use materialized view, instead of sorted hash cluster

Wed Oct 05, 11:39:00 AM EDT  

Anonymous Giuseppe D'Ambrosio said....

someone said before,

>> Thanks, HASH GROUP BY! How cool and performant ... and definitely not returning the rows sorted :)

..but returning not even the right results;;)

what about _gby_hash_aggregation_enabled?
I'm curious about hidden parameters in general.. why to release this cool hash group by, with its friendly parameter, if there's a bug? I mean, _gby__etc__ has been thinked like an emergency exit, or like "in the meantime noone will notice, they'll think: cool!"?;;

Fri Aug 04, 09:13:00 AM EDT  

Blogger Thomas Kyte said....

Pretty much every new optimizer feature has a hidden switch for it to enable/disable. Nothing strange or evil/suspicious about it.

Fri Aug 04, 09:18:00 AM EDT  

Anonymous Giuseppe D'Ambrosio said....

ok I was joking;)
so I suppose my 10.2.0.2.0 release hadn't found yet that bug.. and from some 10.x on, the parameter is set to false by default (or the bug fixed)..right?

Fri Aug 04, 09:32:00 AM EDT  

Anonymous Anonymous said....

Did Oracle recommend to use "Group By" command without "Order By" to sort data in Oracle 9i.

Mon Jan 08, 06:09:00 PM EST  

Blogger Thomas Kyte said....

group by has never had to sort. Not in 9ir2, 9ir2, 8ir3, 8ir2, 8ir1, 8.0, 7.3, 7.2, 7.1, 7.0, 6, ............

No is the answer to the prior comment.

Mon Jan 08, 06:18:00 PM EST  

Anonymous Laurent Schneider said....

SELECT * WHERE telephone_number = 6505551212;

I just notice that in this query not only the order by clause is missing, but also the from clause ;-)

it is not fixed in 11g doc

Thu Aug 16, 05:19:00 AM EDT  

Anonymous Anonymous said....

Hello Tom

I have an SQL with two analytical functions in it - lead and lag - both with the same "over (order by...) " columns.
The rows seem to come out ordered as I would like, but now I've read this post I am having second thoughts.

Since order by is already included, do I need to add another order by clause at the end of SQL ?

Regards, Matjaz

Mon Apr 07, 11:39:00 AM EDT  

Blogger Thomas Kyte said....

@anonymous

Since order by is already included, do I need to add another order by clause at the end of SQL ?


you only need that last order by IF YOU WANT YOUR DATA ORDERED.

Yes, that was the entire, sole, single purpose of this article.

Unless and until you have an order by clause on your statement, you do not have any reason to expect the data to be ordered.


An order by clause CAN BE SKIPPED by the optimizer if it already did it - so it will not slow down anything if the data is already sorted accidentally - but it will be the only way to ensure that the data is sorted.

Mon Apr 07, 11:42:00 AM EDT  

Anonymous Anonymous said....

Not sure how right this question is - can the following query guarantee output ordered by x?

select * from (select x,y,z from t order by a)

Wed Jun 11, 06:16:00 PM EDT  

Anonymous Anonymous said....

corrected : can the following query guarantee output ordered by x?

select * from (select x,y,z from t order by x)

Wed Jun 11, 06:18:00 PM EDT  

Blogger Thomas Kyte said....

@anonymous

in a word:

no

the optimizer is free to recognize that the order by isn't necessary and can skip it.

Wed Jun 11, 08:24:00 PM EDT  

Anonymous Anonymous said....

Hi there,

This is great stuff, thanks for all the info.

I have a question. Let's say there is students table with columns (id, name, type) and I need the list of students sorted by type so I include an 'order by type' clause in my query.

Can any assumptions be made about the order of the students within each type group?

If not (which by the way is my guess), would it be fair to say that the only way to enforce the sort order of a query is not only using an 'order by' clause but always appending to the list of (semantically) relevant fields for each query, another field that would always ensure the same order in the results (e.g., in my example, 'order by type, id')?

Thanks a lot in advance,
Ernesto

Thu Oct 02, 10:58:00 AM EDT  

Blogger Thomas Kyte said....

Can any assumptions be made about the order of the students within each type group?

No...

but always appending to the list of (semantically) relevant fields for each query, another field that would always ensure the same order in the results

Yes you have to order by all of the attributes you want the data ordered by...

If you sort by a list of attributes that is not unique, the rows can and will come out in different orders at differing times - they will be sorted by your sort key, but the records with the same key can and will be randomly sorted each time you retrieve them..

Mon Oct 06, 06:45:00 AM EDT  

Blogger Vishal said....

Hi Tom,

I have a question regarding ordering.

Consider a table T with the following data in it.


COL1 COL2 COL3

1 1 A
2 1 B
3 1 C
4 1 D
5 2 E
6 2 F

Say Col1 is the primary key column. If I issue a query 'select * from t where col2 = 1 order by col2'

I get this output:

COL1 COL2 COL3

1 1 A
2 1 B
3 1 C
4 1 D

which is correct. Note that I am ordering on COL2 which has duplicate values. If I reexecute this query can Oracle guarantee that it will return me the output as above or will the order of output change as
I have not ordered on primary key but on a column that has repeating values.

Thanks,
Vishal

Sat Oct 30, 09:01:00 AM EDT  

Blogger Thomas Kyte said....

@Vishal

you can only assume that the data is ordered by the column(s) you specify in the order by.

If you order by something non-unique, the order of the rows is non-deterministic.

The rows could be returned in a different order from execution to execution - it would be totally permissible for that to happen.

You can only rely on the fact (in your example) that the results will be sorted by COL2 and not anything else.

Mon Nov 01, 04:34:00 AM EDT  

Anonymous Anonymous said....

select * from (select x,y,z from t order by x)

You said before that the order is not guaranteed because the optimizer may recognize that the order is not necessary.

That structure is used to generate TOP N queries:

select *
from (select * from t order by x)
WHERE ROWNUM < 10

Are you saying that the above query would not necessarily return the 10 rows with the 10 lowest x values (assuming x is unique)?

Thanks,
Letizia

Mon Dec 06, 07:08:00 AM EST  

Blogger Thomas Kyte said....

@Letizia

the optimizer recognizes that the former query is such that the order by is not necessary. However, the latter query is such that IT IS necessary.

The top-n query works because the optimizer recognizes it for what it is.

Mon Dec 06, 07:24:00 AM EST  

Anonymous Anonymous said....

Ok, thanks Tom. I thought so, after a few tries on 9i, 10g and 11g versions; I saw that the subquery is not necessary any longer in the newer versions.
Unfortunately my Oracle "teacher" on 9i told me that the order by when there's a group by/distinct is a "double work", so I never coded them together and I even removed a few of them :(
Some more work for this new year!
Letizia

Wed Jan 05, 08:17:00 AM EST  

Blogger Thomas Kyte said....

@Letizia

If there was "double work" - if the group by actually did sort the data as it sometimes would, sometimes could - the OPTIMIZER would remove the double work.

If you "group by x order by x" and the group by used a SORT GROUP BY and the SORT GROUP BY (which is a binary sort) sorted the data correctly given your NLS settings - then the optimizer would SKIP the SORT ORDER BY step.

But if the group by x did NOT sort the data - as it never never had to in any release of Oracle - then the optimizer would not skip the sort order by.


In short, there was never double work with "group by x order by x", we only sorted for the order by when we *needed* to.

so, there was never any penalty for having it there - and if your application required sorted data - it was MANDATORY that it should appear there.

Wed Jan 05, 08:23:00 AM EST  

POST A COMMENT

<< Home