Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sokrates.

Asked: September 04, 2008 - 12:00 pm UTC

Last updated: July 26, 2009 - 6:56 am UTC

Version: 10.0.2

Viewed 1000+ times

You Asked

when I have a query of the form

select rownum rn, x.*
from
(select ...
 from ...
 order by ...
) x


is it guaranteed that the output is ordered by rn
(without explicitly telling Oracle to order it by rn) ?



and Tom said...

with the top-n query processing - when you have rownum wrapped around that order by in the inline view, the optimizer thinks "top-n query" - it returns the data from the inline view sorted and then assigns rownum.

Perhaps better and more "semantically telling" would be:

select row_number() over (order by ....) rn, ....
from ...
order by .....


that is absolutely 100% to be "sorted by row_number" assuming the order by in row_number is the same as on the query itself.

Rating

  (21 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

mistake in my question

Sokrates, September 05, 2008 - 3:17 am UTC

thank you

actually, I had a mistake in my question

what I really wanted to ask was

when I have a query of the form

select rownum rn, x.*
from
(select ...
 from ...
 order by a,b,c
) x

is it guaranteed that the output is ordered by a, b, c
(without explicitly telling Oracle to order it by a, b, c) ?



In the past, I took this as a matter of course without thinking about it.

But I think, you already answered also this question
("..it returns the data from the inline view sorted and then assigns rownum")
But afterwards, it is - at least theoretically free - to return the data in another order than a, b, c ?

Shouldn't I (depart from the row_number() solution which I would also prefer here)
write

select * from
(
select rownum rn, x.*
from
(select ...
 from ...
 order by a,b,c
) x
)
order by a,b,c


to be sure that the data is sorted by a, b, c

?
Tom Kyte
September 05, 2008 - 9:04 am UTC

...
But afterwards, it is - at least theoretically free - to return the data in another order than a, b, c ?
....


yes, and yes.

ops$tkyte%ORA10GR2> select *
  2    from (select rownum rn, x.*
  3            from (select * from t order by object_name, object_type) x
  4             )
  5   order by object_name, object_type
  6  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3902787780

--------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 50730 |  6985K|       |  1329   (3)| 00:00:07 |
|   1 |  VIEW                 |      | 50730 |  6985K|       |  1329   (3)| 00:00:07 |
|   2 |   COUNT               |      |       |       |       |            |          |
|   3 |    VIEW               |      | 50730 |  6341K|       |  1329   (3)| 00:00:07 |
|   4 |     SORT ORDER BY     |      | 50730 |  4607K|    12M|  1329   (3)| 00:00:07 |
|   5 |      TABLE ACCESS FULL| T    | 50730 |  4607K|       |   226   (4)| 00:00:02 |
--------------------------------------------------------------------------------------



and it knows the second order by is redundant and makes it go away - but it forces it to sort it at least once for sure and ensures the selected output is sorted.

Michael Virnstein, September 05, 2008 - 5:28 am UTC

As far as i have observed, sorting again is only necessary when you join something (e.g. additional information) on the top-n query:
<code>
select t1.*, t2.z 
  from (select rownum rn, x.*
          from (select ...
                  from ...
                 order by a,b,c) x
       ) t1,
       t2
 where t1.a = t2.a
 order by rn
</code>
Tom Kyte
September 05, 2008 - 9:23 am UTC

just to be devils advocate here...

in the past people "observed" that group by always seemed to sort (it didn't, never did, but it did sort frequently)....

so, they left the order by off

and got very much "hosed"


so, while I cannot imagine

select rownum, x.* from (select ... order by ... ) x

not being sorted by rownum (by the order by), it is permitted to not be sorted.

Better safe than sorry

Brett, September 05, 2008 - 7:21 am UTC

If you really need the rows returned in a certain order, then the best thing to do is perform the order by on the outer select. I always suggest this to developers, that way, there is no chance of it being "wrong"...even if Oracle decides to change how things work internally...like they did with implicit sorting on aggregations in 10g
Tom Kyte
September 05, 2008 - 9:26 am UTC

group by never had to sort in any release of Oracle.

A simple addition of an index could have "unsorted" their data in any release of Oracle.

A simple gathering of statistics could have "unsorted" their data

An introduction of partitioning could have...

Use of parallel query could have...


There were dozens of ways to have group by "not sort" in all releases - forever.

thx

Sokrates, September 05, 2008 - 9:09 am UTC


there is nothing ... "after"

A reader, September 05, 2008 - 2:21 pm UTC

so, while I cannot imagine

select rownum, x.* from (select ... order by ... ) x

not being sorted by rownum (by the order by), it is permitted to not be sorted.



I'm not convinced. Why would it be permitted not to be sorted?

If anything, your first example shows the optimizer knows an over-lapping order by, similar to the one in the outer-most in-line view, is in fact redundant.

The rownum sequencing will always be consistent with that order by ... and the rows will come back sequentially by rownum ... and hence, by the order by.

If the rows were to not come back sequentially by rownum then we might have cases where something like:

"select ... from ... where rownum > 1"

would return something ... and the documentation clearly says that will never happen.

So, there is nothing - afterwards - to interfere with the order.

Tom Kyte
September 05, 2008 - 4:55 pm UTC

... Why would it be permitted not to be sorted? ...

because there is no order by on the outer query. that is why.

I've always wished the database would flip flop rows - to make them purposely NOT SORTED.

It is PERMITTED to return them unordered. PERMITTED. Permitted is different from "it does return them unordered", "it will return them unordered"

permitted means - there is nothing explicitly stated anywhere (that I am aware of) that says they will be.

You presume SQL to be procedural - "aftwards" - having some explicit order of operation. It doesn't have to be that way.

Like I said, in current releases - I cannot think of a reason for the data to come out unordered in that query.

Used to be a time that if you used an index range scan, I could say the same thing (data would be sorted by index key) - but nowadays - that is not true either. Back in the early days (v7), I would have said "if you use an index, the data will come back sorted", that is no longer true - because.... things change. I can show you index range scans that return data "not sorted" (hint: think partitioning)

Unless and until you have an order by.....

Can I accidentally get the right answer?

Chuck, September 05, 2008 - 5:59 pm UTC

Why do people seem to mind putting order by on their queries so much?
Is 'order by' hard to type?
It amounts to "Can I accidentally get the right answer?"
Well sure, so could a monkey.
Why not just tell the database how result should be sorted if it needs to be sorted sorted?
I'm really asking: why does this crop up so often.


PS This is not necessarily directed at the current questioner. It seems to be a common question.


Tom Kyte
September 08, 2008 - 8:50 am UTC

The reasoning goes like this (from what I've heard, I've asked this same question over and over...)

the theory:

sorting takes work, cpu, resources

if I can get my query to return the data sorted without using order by, I have therefore reduced the work, cpu, resources needed.

why it is a fallacy:

if the optimizer sees something like:

select * from t group by x order by x;

it will likely choose to do a "sort group by" to group - and SKIP THE ORDER by - it 'knows' the data is already sorted!!!

ops$tkyte%ORA10GR2> create table t ( x varchar2(20) );

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 100000, numblks => 10000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select x, count(*) from t group by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 47235625

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  1171K|  2213   (2)| 00:00:2
|   1 |  HASH GROUP BY     |      |   100K|  1171K|  2213   (2)| 00:00:2
|   2 |   TABLE ACCESS FULL| T    |   100K|  1171K|  2201   (1)| 00:00:2
------------------------------------------------------------------------

ops$tkyte%ORA10GR2> select x, count(*) from t group by x order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 1476560607

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  1171K|  2213   (2)| 00:00:2
|   1 |  SORT GROUP BY     |      |   100K|  1171K|  2213   (2)| 00:00:2
|   2 |   TABLE ACCESS FULL| T    |   100K|  1171K|  2201   (1)| 00:00:2
------------------------------------------------------------------------

ops$tkyte%ORA10GR2> set autotrace off


see how there is no order by on the second plan - it switched from hash group by (definitely not sorted) to sort group by.


Now, if the developer actually runs "group by x" and observes what appears to be sorted data and then run "group by x order by x" and observes what appears to be sorted data that TOOK LONGER TO RETRIEVE - then we have the basis for this myth - and we have a developer that hasn't really studied what the returned data was all about (eg: they have a bug, their data from the group by x WAS NOT SORTED)

For example, consider this (I've disabled hash group by in 10g, so it does a sort group by for BOTH group by's this time)

ops$tkyte%ORA10GR2> create table t ( x varchar2(20) );

Table created.

ops$tkyte%ORA10GR2> exec dbms_stats.set_table_stats( user, 'T', numrows => 100000, numblks => 10000 );

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> alter session set "_gby_hash_aggregation_enabled" = false;

Session altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> @at
ops$tkyte%ORA10GR2> column PLAN_TABLE_OUTPUT format a72 truncate
ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select x, count(*) from t group by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 1476560607

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  1171K|  2213   (2)| 00:00:2
|   1 |  SORT GROUP BY     |      |   100K|  1171K|  2213   (2)| 00:00:2
|   2 |   TABLE ACCESS FULL| T    |   100K|  1171K|  2201   (1)| 00:00:2
------------------------------------------------------------------------

ops$tkyte%ORA10GR2> select x, count(*) from t group by x order by x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3210385847

------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time
------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   100K|  1171K|  2224   (2)| 00:00:
|   1 |  SORT ORDER BY      |      |   100K|  1171K|  2224   (2)| 00:00:
|   2 |   SORT GROUP BY     |      |   100K|  1171K|  2224   (2)| 00:00:
|   3 |    TABLE ACCESS FULL| T    |   100K|  1171K|  2201   (1)| 00:00:
------------------------------------------------------------------------

ops$tkyte%ORA10GR2> set autotrace off
ops$tkyte%ORA10GR2> select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET';

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               WE8ISO8859P1

ops$tkyte%ORA10GR2> !echo $NLS_LANG
FRENCH_CANADA.WE8MSWIN1252



see how the second query has an "extra" sort step? Well, it is NOT EXTRA, it is mandatory. Sort group by is a binary sort, ORDER BY is a character set set. The sort group by does NOT (has not, never did) sort the data by the character set - so by leaving the order by off - the developer says to themselves "see, i just made the query run faster and we get the right answer because my test data appears sorted to me"

Until they go production and real data is there and it is NOT sorted correctly...



The order by was necessary here to GET THE RIGHT ANSWER - always has been (all releases) always will be....



Chuck, September 08, 2008 - 10:16 am UTC

Thanks for the explanation.
For some reason, maybe because it's Monday, I picture hundreds of programmers saying "Hey Fred! Kyte says if we stick to US English we don't have to use order by!" today.


proof for this not the other ...

A reader, September 08, 2008 - 12:36 pm UTC


Tom,

This is not in the same category as the group by, I never used any of them reasoning that you mention. It is not laziness to type, or what I call premature optimization, nor belief in pixie dust. Sorry to disappoint you Chuck ... but, if you want proof of how pervasive the issue actually is, take a look at this answer here:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1137577300346084930#1141584300346481219
Isn't an order by missing there before one could declare those two queries equivalent? And that is Sep 5, 2008.

I would hypothesize that the majority of requirements for top-n or pagination queries actually need sorted results ... and yet almost every answer on this forum, or in the manuals for that matter, fails to mention or caveat for it. Arguing that it should be obvious is hardly persuasive.

Here is another, older, example:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:32812348052#2026966612220
and look for the first run_stats comparison:
  9          for x in ( select *
 10                        from ( select * from t order by object_name )
 11                       where rownum < 10 )
 12          loop
 13              null;
 14          end loop;

versus
 23          for x in ( select a.*, rownum r
 24                        from ( select * from t order by object_name ) a )
 25          loop
 26              exit when ( x.r >= 10 );
 27          end loop;

Given enough rows in t, the first pl/sql loop iterates 9 times, guranteed, nobody would dispute that! The second query is not sorted and hence the pl/sql loop based on it can iterate anywhere between 0 and 9 times ... at least, in theory ;) ... presumption of the SQL being procedural?

But people do have the right to revise their understanding, Tom Kyte or I included.

Believe or not ... I want to understand this and not just save keystrokes.

So, here is my theory:

"Unless and until you have an order by ..."

"To be guaranteed to get to Rome you need to take a road to Rome"

"But, if all roads lead to Rome, then take a road and you'll get to Rome ... guaranteed"

By "all roads lead to Rome" I do not mean what I have always observed with ROWNUM or the past, current or future internal implementations of ROWNUM (which I have no way of knowing or predicting) ... I mean its very definition:

For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

select rownum, x.* from (select ... order by ... ) x

My query is sorted ... and, as the rows are coming out, the ROWNUM labels are applied ... sequentially.

I do have a guarantee and it is this: ROWNUM will not interfere with the pre-sorted result set.

To work through a simple example: if the inner query returns 3 rows sorted by the values A, B and C then:

Rownum 1 is associated with row A
Rownum 2 is associated with row B
Rownum 3 is associated with row C

I hope everyone agrees so far. I said it in my first review: "The rownum sequencing will always be consistent with that order by".

Your contention is that the final result set may be returned like this:

(2, B)
(3, C)
(1, A)

not ordered anymore. That is, maybe this is false: "... and the rows will come back sequentially by rownum ..."

And my contention is that, by its own definition, ROWNUM will not interfere with the result set, sorted or not sorted.

And I did register the critique:

You presume SQL to be procedural - "aftwards" - having some explicit order of operation. It doesn't have to be that way.

Here procedural is used as an antonym for set-based, set-oriented. Most of the time I've seen you use this argument, that SQL is not procedural, it is appropriate. But just because you've used it, appropriately, most of the time it does not mean it is universally true.

The moment you throw ROWNUM or CONNECT BY in a query you are out of the realm of sets ... there is no order in a set ... the moment you have features operating on sorted sets the set-based argument fades away. Put those things in a query and it isn't I who presume the SQL to be procedural ... the SQL is procedural in those instances.

The very reason a top-n or pagination query or something like "select ... from ... where rownum > 1" works the way they do is because, conceptually, the SQL is operating sequentially for a while ... it has briefly shifted to procedural mode.

ROWNUM operates on sets, sorted or not ... ROWNUM does not sort ... ROWNUM does not interfere with the set it is given ... but it operates procedurally by definition. ROWNUM 1 will always come before ROWNUM 2 and so on ... it just is.

So, I'm still at:

The rownum sequencing will always be consistent with that order by ... and the rows will come back sequentially by rownum ... and hence, by the order by.

I think will both remain convincingly unconvinced with each other's argument. Fine ... better than lazy or naïve, I suppose!

Tom Kyte
September 08, 2008 - 4:23 pm UTC

look - someone asked:

... But afterwards, it is - at least theoretically free - to return the data in another order than a, b, c ? ...


and you know what the answer is?

it made me think.

there is only one answer possible to that question:

yes, yes it is

why - because there is not an order by there. and we should probably have one. Point taken, I will be doing that in the future.


Let me ask you - does the data HAVE to come back sorted? does it?

Chuck, September 08, 2008 - 3:49 pm UTC

You disappoint me not the least.
It's all very interesting.
Under the covers Oracle is a procedural computer program and it behaves that way. It's fun to experiment and find the boundaries and teach it tricks.
But a sql statement in a production environment that requires a sorted final result, and has no order by clause against the final result set, is an incorrect sql statement.
Leaving it off seems to have real world risks with little if any real world advantage.


since you asked ...

A reader, September 09, 2008 - 12:28 am UTC


Tom,

It made me think too ... but I came to a different conclusion.

select rownum, x.* from (select ... order by ... ) x

Rownum is a pseudo column ... it is not part of the set, it is something about the set. Where you see a set in need of ordering I see an already ordered set with a descriptor.

In order to get a sorted set one needs to have an order by operation.

Absolutely ... if by set one means relational set. There is no concept of order in a relational set and, in that light, it should go almost without saying ... in order to get a sorted set one needs to, ahem, sort it!

SQL works on relational sets, the input ... it implements relational operators ... the fallacy is assuming the SQL output is always a relational set. Well, not always, because SQL has non-relational extensions. It is mostly problematic when the output of a query block is the input to another ... conceptually, a materialization of the query block has to happen before we can safely continue to apply relational operators (and rely on that theoretical foundation) ... we need a relational set as our next input. A sub-query with rownum is not mergeable ... similar thing ... we cannot apply relational operators across the sub-query boundaries because we've used this non-relational feature of SQL!

... But afterwards, it is - at least theoretically free - to return the data in another order than a, b, c ? ...

If that were a relational set, yes ... there would be a theory, the relational theory, telling us not to rely on the existence of any particular order. That's not a relational set though ... that is the result of a SQL query.

I'm afraid, at least in my books, it boils down to the old argument that SQL is not a true relational implementation.
On the other hand, I'm OK to accept that pragmatic answer ... it works the way it works ... in as much as it relates to the definition of ROWNUM.

does the data HAVE to come back sorted? does it?

Based on my understanding of what ROWNUM is, yes ... the result of that SQL query will be returned sorted.

How Oracle SQL defines ROWNUM has nothing to do with the relational theory.

So rather than retrofitting a great number of top-n and pagination queries out there I would suggest maybe a documentation clarification of the ROWNUM definition. From my perspective, it is clear that ROWNUM 2 will never ever be returned prior to ROWNUM 1. But maybe I'm misinterpreting the documentation.

Thanks for your patience.
Tom Kyte
September 09, 2008 - 7:47 am UTC

do you understand that things change. When someone asks "hypothetically", the answer is of course "yes"

so what that rownum is not a "set thing", so what. Does that mean they are not allowed to scramble up the order of the data?

I've often wished they would - that if a query didn't have an order by, they would do something like return 'row 2' and then 'row 1' and then 'row 4' and then 'row 3' and so on - just to drive home the point that "this stuff isn't sorted by anything, even if by accident it looked like they did'


I know rownum is assigned sequentially to rows as they flow from one step to the next. But does that mean that they have to flow out of the rownum step in the same order?

No - theoretically - no.

In practice do they? Yes, today they do.

Do they *have* to - I do not think so.


This is nothing to do with rownum really, I'm not talking relational theory, I'm talking reality.


Think about the existence of a new feature in the future (like partitioning was when version 8.0 came out). This new feature causes a change that we cannot even anticipate today. That is what I'm thinking about.


Used to be that if you used an index range scan to read data, data would come out in the sorted order of the index keys. It isn't true anymore, it was until 10g, but it isn't true anymore.

Used to be that if you had an index on (a,b,c) and did a where on "b=? and c=?" - the index would not be used to retrieve the data, until 9i that is, then it might be - not would be, but could be.



ORDER BY TEST CASE

Mihail Bratu, September 09, 2008 - 6:45 am UTC

Hello Tom,

Let's analyze the example below:

set autotrace traceonly explain
select ename, row_number() over (order by ename) rn
from emp;

Execution Plan
----------------------------------------------------------
Plan hash value: 3145491563

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 84 | 4 (25)| 00:00:01 |
| 1 | WINDOW SORT | | 14 | 84 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------

Now, we add an order by at the end:

select ename, row_number() over (order by ename) rn
from emp
order by 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3145491563

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 84 | 3 (0)| 00:00:01 |
| 1 | WINDOW SORT | | 14 | 84 | 3 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------

In this case, when having only one analytic function, it seems to be no need of the last order by clause.

The next example shows that the optimizer is really removing the last order by clause in the previous example:

select ename, row_number() over (order by ename) rn
from emp
order by 1, 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3946901873

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 84 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 14 | 84 | 5 (40)| 00:00:01 |
| 2 | WINDOW SORT | | 14 | 84 | 5 (40)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------


Tom Kyte
September 10, 2008 - 9:08 am UTC

I'm not sure I see your point here.

You are showing that the optimizer will - when it can - "not sort redundantly"

that does not show that "it will therefore be sorted without the order by"


You cannot empirically prove that the results will always be sorted in this case - what if I flip a coin and 50 times in a row it comes up heads, does that prove that that coin will always come up heads?

A change in plan can change the order of the data.

Let me put it this way:

The optimizer is free to remove a sort, WE are not allowed to leave out an order by


(see above for my group by example - for example, a new release - a new plan "hash group by" and everything changes.

If you want data sorted, I would highly recommend - asking for it sorted.

to have confidence in optimizer

Mihail Bratu, September 09, 2008 - 9:32 am UTC

hello again,

select rownum rn, ename
from (
select ename
from emp
order by 1
);

Execution Plan
----------------------------------------------------------
Plan hash value: 949368061

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 98 | 4 (25)| 00:00:01 |
| 1 | COUNT | | | | | |
| 2 | VIEW | | 14 | 98 | 4 (25)| 00:00:01 |
| 3 | SORT ORDER BY | | 14 | 84 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------


select rownum rn, ename
from (
select ename
from emp
order by 1
)
order by 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 949368061

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 98 | 4 (25)| 00:00:01 |
| 1 | COUNT | | | | | |
| 2 | VIEW | | 14 | 98 | 4 (25)| 00:00:01 |
| 3 | SORT ORDER BY | | 14 | 84 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------

select rownum rn, ename
from (
select ename
from emp
order by 1
)
order by 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1334697481

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 98 | 5 (40)| 00:00:01 |
| 1 | SORT ORDER BY | | 14 | 98 | 5 (40)| 00:00:01 |
| 2 | COUNT | | | | | |
| 3 | VIEW | | 14 | 98 | 4 (25)| 00:00:01 |
| 4 | SORT ORDER BY | | 14 | 84 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| EMP | 14 | 84 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------

ROWNUM voodoo?

Bill, November 12, 2008 - 6:40 pm UTC

I'm hoping you'll find this question interesting enough even though it is only somewhat related to the original topic. I'm experiencing a situation where the use of ROWNUM is causing Oracle to pass NULL to a function and ultimately causing no records to return (version 10.2.0.2). I'm truly hoping it's my own stupidity and not really Oracle voodoo. Any wisdom you can share would be much appreciated.

Here are the sample tables and the function:

create table t1 (
t1pk number,
t1char varchar2(10)
)

insert into t1 values (1,'T1ROW1');
insert into t1 values (2,'T1ROW2');
insert into t1 values (3,'T1ROW3');

create table t2 (
t2pk number,
t2fk number,
t2num number,
t2date date
)

insert into t2 values (10,1,111,to_date('1-nov-2008'));
insert into t2 values (20,2,222,to_date('2-nov-2008'));
insert into t2 values (30,3,333,to_date('3-nov-2008'));

CREATE OR REPLACE
FUNCTION is_param_passed(p_param NUMBER)
RETURN integer
AS
BEGIN
IF p_param IS NULL
THEN
RETURN 0;
ELSE
RETURN 1;
END IF;
END is_param_passed;

Here is the query:

SELECT
ROWNUM AS rowcounter,
recs.*
FROM (
SELECT *
FROM t1
WHERE t1pk IN (
SELECT DISTINCT t2fk
FROM t2
WHERE t2num IN (333)
AND is_param_passed(t2.t2pk) = 1
)
ORDER BY t1char
) recs

Run this query and you get no results. However, if you take out the "ROWNUM as rowcounter" from the outer select statement, you get results. What I discovered is that with ROWNUM in the select list, the paramter being passed to the function was passed as NULL. More weirdness: If you change "t2num IN (333)" to "t2num IN (222,333)" you get results. Alternatively if you add this phrase "AND rownum >= 0" to the end of the innermost query you get results. Also, if you join t1 and t2 instead of doing the IN clause, you get results. In this case, for reasons to long to explain I need to use the IN clause as shown. I'm baffled! Again, any insight into this would be most appreciated.

Tom Kyte
November 14, 2008 - 9:47 am UTC

I tested in 10.2.0.2 and - yes - it is a bug.

I patched to 10.2.0.4 (after testing in 11.1.0.6 and seeing success) and it was fixed in between 10.2.0.2 and 10.2.0.4. Definitely a bug, fixed in current release.

Followup on rownum / Voodoo

Bill, November 13, 2008 - 1:20 pm UTC

I have discovered that the situation I described yesterday (ROWNUM voodoo? November 12, 2008) may actually be an Oracle bug. Specifically, metalink bug 5708897. In that metalink report they suggest a workaround:

alter session set "_optimizer_filter_pred_pullup"=false;

That workaround did seem to fix the problem. Apparently this bug was fixed in 10.2.0.4. I'm waiting to get a 10.2.0.4 instance up and running so I can test/confirm there. I will continue to post any additional info to avoid having any of you waste time on my submission.

Tom Kyte
November 14, 2008 - 4:47 pm UTC

I did just that - it is fixed in 10.2.0.4

Meaning of ROWNUM

Rahul, November 14, 2008 - 3:02 pm UTC

Tom,

I asked this question in forums,

http://forums.oracle.com/forums/thread.jspa?messageID=2385105�


but, I wanted to post it here to get your opinion, because, I didn't get a satisfactory enough answer in the thread.

Here is the original question:

I am curious about this example.

select * from emp where rownum <=5 and salary > 3000;

What does this query mean?

1) Gives you any 5 employees with a salary > 3000.

2) It's picking any 5 employees and checking if their salary > 3000.

Another way to ask my question is:

If I have to guarantee I get ANY (that eliminates order by) five employees with salary > 3000, then, should I write the query mentioned before or this one:

select * from
(select * from emp where salary > 3000 )
where rownum <=5.

I checked the official definition of rownum and it doesnt' say anything of that sort here:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/pseudocolumns009.htm#sthref832

Thank you,
Rahul.

Tom Kyte
November 18, 2008 - 6:17 pm UTC

....
select * from emp where rownum <=5 and salary > 3000;

What does this query mean?

...


it means, find up to five employees making more than 3,000. The five returned might be different depending on who runs the query (eg: against the same data, the SAME data - two users could get different answers, the order of rows on disk will affect the answer as well)


this query:

select * from
(select * from emp where salary > 3000 )
where rownum <=5.


is logically identical, the where rownum <= 5 can be merged into the inline view. they are the same

How about this one?

A reader, November 18, 2008 - 6:45 pm UTC

Assuming the data in the EMP table is static, will this query return the same output every time?

select sal
from (select row_number() over (order by sal) rn, sal from emp) x
where x.rn between 1 and 10

I think the output will be the same every time but reading this thread has made me think twice.



Tom Kyte
November 21, 2008 - 3:48 pm UTC

It does not have to be.

If you get a different sort workarea size (which you definitely can), you'll get different orderings.


pga automatic memory management will (can) change the sort workarea size from run to run of a query, different sort area sizes -> different sort outputs with duplicates....




I don't agree

Sokrates, November 21, 2008 - 5:44 pm UTC

I don't agree with your last answer, I don't get it.
It might be true that we get different orderings but please note that the query isn't

select rowid
from (select row_number() over (order by sal) rn, sal from emp) x
where x.rn between 1 and 10


but

select sal
from (select row_number() over (order by sal) rn, sal from emp) x
where x.rn between 1 and 10


which is not dependent on the ordering, but deterministic ("Assuming the data in the EMP table is static")

Do I miss something or misunderstand something ?
Do you have an example ?



Tom Kyte
November 24, 2008 - 4:30 pm UTC

The question was

"Assuming the data in the EMP table is static, will this query return the same output every time?"

My answer is "you might get different rows". Sure the numbers will all look the same - but they will be different rows. To me, that means "the output was DIFFERENT"

it might look the same - all the same number 5 for example - but they might be different number 5's (so the rowid would be different, if you selected it). To me, a different set of number 5's is a different output.

One more thing

A reader, November 22, 2008 - 12:00 am UTC

Not only the data in the EMP table is static, let's also assume every salary is unique. This is to avoid arbitrary rownum assignments due to duplicate salaries.

I think the query will return the same data every time but you said it may not. I'm confused!
Tom Kyte
November 24, 2008 - 4:34 pm UTC

if you add unique, it suddenly changes everything.

Yes, then it becomes deterministic.

And in theory, it'll be sorted the same - but it doesn't have to be since there is no order by.

Final answer

A reader, November 24, 2008 - 4:42 pm UTC

So, in order to have the same output all the time (assuming static table and unique salaries) we need:

select sal
from (select row_number() over (order by sal) rn, sal from emp) x
where x.rn between 1 and 10
order by x.rn

Correct?!


Tom Kyte
November 24, 2008 - 7:52 pm UTC

technically - yes.

I've never seen it NOT be sorted, but that doesn't mean that at some point in the future it won't be....

if you did order by sal instead of x.rn - it (the optimizer) would know to skip the extra bit of work (unless it needs to at some point in the future)

agree now

Sokrates, November 24, 2008 - 4:44 pm UTC

now I understand what you meant

OK

Kumar, July 23, 2009 - 4:09 am UTC

Hi Tom,
In a main query why rownum <= n works
and not rownum =n ? Any theory behind this?

Thanks for your time.
Tom Kyte
July 26, 2009 - 6:56 am UTC

no theory, all fact.

rownum is incremented AFTER the where clause evaluates to true.

select * from t where rownum = 2

can never return a row, since rownum is NEVER incremented. There cannot be a rownum = 2 unless and until there was a rownum = 1

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.