Skip to Main Content
  • Questions
  • Can rownum return invalid data when referenced 2x in a query

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Jan.

Asked: April 19, 2007 - 12:15 am UTC

Last updated: August 20, 2007 - 10:21 pm UTC

Version: 10.1.0

Viewed 1000+ times

You Asked

I am creating a simple table with 3 columns (first name, last name and id columns). I need to return each distinct id in the table, along with a single name associated with the id (it doesn't matter which name I get back, as long as I get back a valid name). The example below has two distinct names for each id. I am using functions to get the first name and last name out of the table (each has its own function). Each function uses the rownum pseudocolumn.

Is it possible to retrieve invalid names? For instance, in the example below, when retrieving a name for id 1, is there anyway to retrieve Tom for the first name and Howard for the last name? Or, because this is all contained within a single query / transaction, could I only retrieve Tom Jones or Shirley Howard?

Jan

create table test (fnm varchar2(30), lnm varchar2(30), id number);

insert into test values ('Tom','Jones',1);
insert into test values ('Shirley','Howard',1);
insert into test values ('Sue','Snue',2);
insert into test values ('Jay','Buckley',2);
insert into test values ('Robert','Smith',3);
insert into test values ('Russel','Weiss',3);

create function getfnm (passedinid in number)
return varchar2
is
first varchar2(30);
begin
select test.fnm into first
from test
where id=passedinid and rownum=1;
return first;
end;
/

create function getlnm (passedinid in number)
return varchar2
is
last varchar2(30);
begin
select test.lnm into last
from test
where id=passedinid and rownum=1;
return last;
end;
/

select distinct(id), getfnm(id) first, getlnm(id) last from test;

ID FIRST LAST
---------- --------------- ---------------
1 Tom Jones
2 Sue Snue
3 Robert Smith

and Tom said...

that would be the wrongest way in the world to do this.

That would be a bad use of rownum - it is highly unlikely it would cause the issue you mention in this particular case, but ROWNUM is a psuedo column assigned to rows as they flow out of a query, a result set. It is not something assigned permanently to a row.

but the big problem in your example is the use of DISTINCT. Your query is the same as:

select distinct *
  from (select id, getfnm(id), getlnm(id) from test )
/


distinct (or UNIQUE) works on the "set", it is NOT a function.... that could cause some real problems (performance pops into my head immediately)

There is also a huge issue with read consistency to consider.

But first, we should NOT be using plsql here at all!!! there are much better approaches, for example:

ops$tkyte%ORA10GR2> select id, fnm, lnm
  2    from (select test.*, row_number() over (partition by id order by lnm) rn
  3            from test
  4         )
  5   where rn = 1;

        ID FNM                            LNM
---------- ------------------------------ ------------------------------
         1 Shirley                        Howard
         2 Jay                            Buckley
         3 Robert                         Smith

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select id, max( fnm || ' ' || lnm ) name
  2    from test
  3   group by id;

        ID NAME
---------- ------------------------------
         1 Tom Jones
         2 Sue Snue
         3 Russel Weiss

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select id,
  2         min(fnm) KEEP (dense_rank first order by rowid) fnm,
  3         min(lnm) KEEP (dense_rank first order by rowid) lnm
  4    from test
  5   group by id;

        ID FNM                            LNM
---------- ------------------------------ ------------------------------
         1 Tom                            Jones
         2 Sue                            Snue
         3 Robert                         Smith



Any of those would be preferable (well, more than preferable, they return correct consistent results)

let me address that read consistency comment. Each of the SQL statements you execute will be read consistent by default as of the time they are executed. In a single user system (what is that?) you would be OK probably (safe enough) with your query, but as soon as we introduce more than one user and modifications - your approach is suspect.

Into your functions, I'll inject some "sql" and an autonomous transaction. This autonomous transaction just represents other users in your system doing "stuff" and committing as your query is executing:

ops$tkyte%ORA10GR2> create or replace function getfnm (passedinid in number) return varchar2
  2  is
  3      pragma autonomous_transaction;
  4      first varchar2(30);
  5  begin
  6      select test.fnm into first
  7        from test
  8       where id=passedinid
  9         and rownum=1;
 10      delete from test where id = passedinid;
 11      commit;
 12      return first;
 13  end;
 14  /

Function created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> create or replace function getlnm (passedinid in number) return varchar2
  2  is
  3      pragma autonomous_transaction;
  4      last varchar2(30);
  5  begin
  6      select test.lnm into last
  7        from test
  8       where id=passedinid
  9         and rownum=1;
 10      delete from test where id = passedinid;
 11      commit;
 12      return last;
 13  end;
 14  /

Function created.



So, as the function is called, we'll delete some data.... And commit. the outcome - interesting:

ops$tkyte%ORA10GR2> select distinct id, getfnm(id) first, getlnm(id) last
  2    from test
  3   order by id;

        ID FIRST      LAST
---------- ---------- ----------
         1 Tom
         1
         2 Sue
         2
         3 Robert
         3

6 rows selected.


hmm....


Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thank you

A reader, April 19, 2007 - 5:38 pm UTC

Thank you for your response. My example used was a very simplified version of a proposal made by a contractor to my office. They were trying to use the same functionality multiple times within a single view (for instance, getting a sender's first name and a recipient's first name), and therefore, they didn't want the code to look too complex!

The use of distinct was my own to retrieve only a single row for each id. I removed the distinct as well as the autonomous transaction and ran a different test.

create or replace function getfnm (passedinid in number) 
return varchar2 
is 
x number:=1;
first varchar2(30); 
begin 
while x<10000000 loop
x:=x+1;
end loop;
select test.fnm into first 
from test 
where id=passedinid and rownum=1; 
return first; 
end; 
/ 

create or replace function getlnm (passedinid in number) 
return varchar2 
is 
x number :=1;
last varchar2(30); 
begin 
select test.lnm into last 
from test 
where id=passedinid and rownum=1; 
while x<10000000 loop
x:=x+1;
end loop;
return last; 
end; 
/ 


As soon as I kicked off the select statement, I went to a different session and executed:

SQL> delete test where id=2;

2 rows deleted.

SQL> commit;

Commit complete.


Then I went back to the first session for the results:
SQL> select id, getfnm(id) first, getlnm(id) last from test order by id;

        ID FIRST      LAST
---------- ---------- ----------
         1 Tom        Jones
         1 Tom        Jones
         2
         2
         3 Robert     Smith
         3 Robert     Smith

6 rows selected.


I am not sure I understand what is going on with the read consistency here. Isn't the "master select" statement supposed to be read consistent? Why does it seem that each of the functions is an independent statement that maintains its consistency, but not the driving select statement? I would've thought that each of the functions being executed were part of a single transaction of the "master select". Is that incorrect? Perhaps I'm having issues seeing the ordering of what is executing when.
Tom Kyte
April 20, 2007 - 6:37 am UTC

the master select is, as is each recursive sql, but since they are done at different times, they are done with different "as of" point in times..


They are part of the SAME TRANSACTION
They are each read consistent
They are however very much independent of eachother.

This is a horrible way to approach the problem, you need a new contractor for sure. They are either

a) not really good with SQL
b) a hardware vendor in disguise, they'll leave this application and then "fix it" by selling you more hardware


tell the contractor - "hey, this database, it can do these things we call "joins" and "aggregation" - you don't need to write procedural code"

and in fact, you have just proven their concept won't fly in real life.

Analytics and Order By

Yogesh Purabiya, April 20, 2007 - 12:50 am UTC

There are 2 observations.

(1) From your first book (Wrox edition), somehow I picked up an idea that while using Analytics we have to use ORDER BY clause in the same query / sub-query. But, here I see that output comes OK even when you have not used the "ORDER BY id, lnm" clause in the sub-query. Is it due to Version change, or I took it wrongly ?

(2) There is one review above waiting for your reply. Previously, as I remember, such pages were blocked until & unless you have attended them. Now, after seeing my review, there are chances that you would miss the previous one (which may be pending). Sorry, if my observation is wrong.


Tom Kyte
April 20, 2007 - 6:56 am UTC

1) I am still so surprised how often this comes up...

UNLESS AND UNTIL YOUR SQL STATEMENT HAS AN ORDER BY ON IT - THE DATA IS NOT SORTED IN ANY FASHION YOU CAN RELY ON.

In the above, it is purely "by accident" if you are observing any sort of sorted order.


use order by, or please just expect the data to come out in any order we want to send it to you


Thank you again. I understand.

Jan, April 20, 2007 - 9:10 am UTC

Tom,
Thank you again for your most recent reply. After reading your comment yesterday and thinking about it a few times, I was still confused. But, as I was walking into the office this morning, everything cleared up. Then I saw your reply and it validated my thinking of this morning. Sorry for the extra work on your part to get through my sleepyhead yesterday.
As far as your comments about our contractor -- I've been saying the same thing for about 4 years now (not really good with SQL), but it isn't always possible to get the management to agree and do something about it! The original purpose of my note was because I wasn't comfortable with the functions, but couldn't clearly poke a hole in it myself. Thank you for the help! :) And thank you for a better understaning of what is going on with functions called from select statements.
Finally, to the person who didn't understand why the data is appearing in the same order, here's a simple example where the data is the same, but it comes out differently.
SQL> create table test6 (col1 number);

Table created.

SQL> insert into test6 values (1);

1 row created.

SQL> insert into test6 values (2);

1 row created.

SQL> insert into test6 values (3);

1 row created.

SQL> select * from test6;

      COL1
----------
         1
         2
         3

SQL> select * from test6 order by col1;

      COL1
----------
         1
         2
         3

SQL> delete test6 where col1=1;

1 row deleted.

SQL> insert into test6 values (1);

1 row created.

SQL> select * from test6;

      COL1
----------
         2
         3
         1

SQL> select * from test6 order by col1;

      COL1
----------
         1
         2
         3


Question on KEEP keyword

Robert Shepard, April 20, 2007 - 10:36 am UTC

I am very puzzled by the query you used above:

select id,
   min(fnm) KEEP (dense_rank first order by rowid) fnm,
   min(lnm) KEEP (dense_rank first order by rowid) lnm
from test
group by id;


I see that it works on Oracle 9.2.0.7, but am unable to find any documentation on the "keep" syntax. I assume it's a type of analytic function, but the "SQL Reference" manual has nothing on it.

Is it an undocumented feature? Or am I just looking in the wrong place?

Order By in Analytics

Yogesh Purabiya, April 21, 2007 - 12:27 am UTC

My query was regarding

ops$tkyte%ORA10GR2> select id, fnm, lnm
2 from (select test.*, row_number() over (partition by id order by lnm) rn
3 from test
4 )
5 where rn = 1;

versus

ops$tkyte%ORA10GR2> select id, fnm, lnm
2 from (select test.*, row_number() over (partition by id order by lnm) rn
3 from test
3.1 order by id, lnm
4 )
5 where rn = 1;

I thought that 3.1 order by id, lnm is required. But, in Oracle 8.1.7 also it worked without that.

Thanks for the support.
Tom Kyte
April 21, 2007 - 9:05 am UTC

it worked BY ACCIDENT.

I cannot stress this enough

WITHOUT AN ORDER BY ON THE QUERY THE ROWS ARE FREE TO COME OUT HOWEVER WE FEEL LIKE THEM COMING OUT

without order by, you cannot complain, nor even just point out that the rows don't come out sorted - they do not have to.


Tell me - if you flip a coin 500 times and observe it coming up heads - have you just shown that coins come up heads? Of course not.

In the same sense, you have not shown that this query does not need order by in order to return data sorted by id.

You have the order by in the WRONG PLACE as well - the order by belongs on line 6 - you want to sort the result set.

Restrictions on calling stored functions from SQL

Craig, April 21, 2007 - 2:40 pm UTC

I was surprised to see this example, I did not realise you could execute DML in a function called from query.

From the Application Developers Guide;
'The following restrictions are enforced on subprograms:
* A subprogram called from a query (SELECT) statement or from a parallelized DML statement may NOT execute a DML statement OR OTHERWISE MODIFY the database.
* ...'

But sure enough the next paragraph reads;
'You can avoid these restrictions if the execution of the new SQL statement is not logically embedded in the context of the already running statement. PL/SQL's autonomous transactions provide one escape (see "Autonomous Transactions" )...'

So I should be very careful to run ad-hoc queries on any system that I am not 100 per cent familiar with!

Glad I discovered this today.

second thoughts ...

Craig, April 21, 2007 - 3:16 pm UTC

but of course I should be very familiar with any function that I call from SQL before using it.

Order By and Analytics

Yogesh Purabiya, April 22, 2007 - 11:32 pm UTC

Sorry, Tom ! I have failed to tell you what exactly I want to.
I am not referring to the order of the output to the user.
I am refering to order of the input to the Analytical Function.
Unless we supply sorted input to the Analytics, how does it partiton; that is the point.

(1)
select test.*,
row_number() over (partition by id order by lnm) rn
from test

(2)
select test.*,
row_number() over (partition by id order by lnm) rn
from test
order by id, lnm

(1) is from your review to original query
(2) is what I thought of

I tired both in 8.1.7, and got the same results.
Again, I am not refering to the sorted output, I am refering to the correct effect of partition by.
Tom Kyte
April 23, 2007 - 4:27 pm UTC

partitioning is entirely totally different from sorting.

#1 and #2 above return exactly the same data. The only difference is - #2 will definitely return the rows from the cursor sorted by id, lnm - #1 - does not have to.

but with or without the order by on the statement itself, these queries of course return the same rows! just maybe in different orders.

Thanks !

Yogesh Purabiya, April 23, 2007 - 11:10 pm UTC

Thanks, as usual.
This is clear now.

Confused

Yang, August 17, 2007 - 1:06 am UTC

As far as I am concerned, the old data will be saved into a undo buffer(Sorry I forgot the term) when function getfnm or getlnm touched it. Then, in order to make sure the data accessed by main "select" statement is always the same as that at start point of statement, database will feed the old data in undo buffer to "select" statement. So, the result of main "select" statement should always be consistent, no matter how many records have been deleted or updated by other transactions concurrently. However, your experiment in follow-up just totally overthrown my "belief" above. Could you please help me to figure it out?

Tom Kyte
August 20, 2007 - 10:21 pm UTC

the simple answer is simply:

there are MANY sql statements being executed.
each runs "as of the time they were parsed and opened"

they are all parsed and opened at different points in time.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.