Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sokrates.

Asked: March 14, 2011 - 10:11 am UTC

Answered by: Tom Kyte - Last updated: November 06, 2012 - 2:44 pm UTC

Category: Database - Version: 11.2.0.2

Viewed 10K+ times! This question is

You Asked

is this intended behaviour:
sokrates@11.2.0.2 > select * from dual where dbms_random.value = dbms_random.value;

DUM
---
X

1 row selected.

sokrates@11.2.0.2 > select * from dual where not ( dbms_random.value = dbms_random.value );

DUM
---
X

1 row selected.

sokrates@11.2.0.2 > select * from dual where dbms_random.value = dbms_random.value and not ( dbms_random.value = dbms_random.value );

DUM
---
X

1 row selected.


?

and we said...

perfect.

I've written thousands of times that you cannot rely on how many times or when or if SQL will call your function.

Your query above can be written in a semantically equivalent fashion as:


select * from (select dbms_random.random R from dual) where r = r and not (r=r);

but that will (currently) return zero rows in 11g. But funny enough, in 10g - if we see you query using the inline view, we turn it into

select * from dual where dbms_random.value =
dbms_random.value and not ( dbms_random.value = dbms_random.value );


and in 10g, that returns

ops$tkyte%ORA10GR2> select * from (select dbms_random.random r from dual) where r=r and not(r=1);

         R
----------
-1.212E+09




I see no bug here, just an optimization that has always been allowed to take place.

If you query:


select * from t where UNIQUE_INDEXED_COLUMN = plsql_function();


How many times should your function be invoked?

o once per row in the table?
o once - just to get the value for the unique index?
o something else?

The answer is "it depends".

It depends on the plan - if we decided to full scan for whatever reason - we might call your function once per row, or maybe just ONCE and reuse the value)

If we decided to index range scan - it will probably be twice in older releases - once to get the start key for the index lookup and once to get the stop key for the index lookup. In current releases - maybe just once to get them both.


"it depends"


When calling functions from SQL, you had better not rely on how often the functions get called, in what order, or whatever. In short, assume nothing. And remember - SQL rewrites kick in and we rewrite your SQL all of the time. Don't rely on side effects

and you rated our response

  (52 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

non deterministic

March 15, 2011 - 3:32 am UTC

Reviewer: Laurent Schneider from Switzerland

As the function DBMS_RANDOM.VALUE (or the deprecated DBMS_RANDOM.RANDOM) is specified twice and is not deterministic, assuming that f=f is true when f is not null is a dubious rewrite.

And this used to work correctly in 10g...


Tom Kyte

Followup  

March 15, 2011 - 8:33 am UTC

I would say 10g worked incorrectly - and we'd be as accurate. It would be like saying "9i used to return this set correctly:

select a,b,count(*) from t group by a,b;"

because it returned the data sorted in a binary fashion by a,b...



SYSDATE is not deterministic either. Yet we evaluate it once per query. We are free to do the same with a function f()


You have no control over the order or number of times a function is invoked - ever. Never had, never will.

The fact is:

select ...
  from t
 where f() = f()


is identical to

select ...
  from (select f() F from t )
 where F=F


and in fact, they could optimize that to be simply:

with data as (select f() F from dual)
select ...
  from t, data
 where F=F


Or even:

select ...
  from t
 where (select f() from dual) = (select f() from dual)



and scalar subquery caching would kick in. Or maybe it wouldn't - sometimes. That gets tricky too, the subquery caching is heavily dependent on the order in which the rows get processed - add a row to a table, and the number of times your function gets invoked might go way way up, way way down, or stay the same. You've NEVER been able to rely on the number of times, nor the order of invocation of your function.



We never needed "deterministic" to do that - it has always happened.

Anyone RELYING on an observed side effect, as a side effect of the optimizer plan, such as this does - is setting themselves up for a problem at some point.


A lot like people that coded:
select a, b, count(*) from t group by a,b;

and assumed the result set would always be ordered by a,b because they empirically observed it being ordered.

number of execution

March 15, 2011 - 9:45 am UTC

Reviewer: Laurent Schneider from Switzerland

How often SYSDATE is executed does not matter, it refers to the starttime at the execution of the query. It always return the same value, being executed once or multiple times.

With

select * from dual where sysdate=sysdate

In 10g it was

1 - filter(SYSDATE@!=SYSDATE@!)

And in 11g it is

1 - filter(SYSDATE@! IS NOT NULL)

Both are always true within one SQL statement.

The rewrite there is legal as the SYSDATE there returns always the same value.

But considering DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE to be true when DBMS_RANDOM.VALUE is not null is confusing enough to be considered as a bug.


Tom Kyte

Followup  

March 15, 2011 - 10:35 am UTC

I have a feeling if you file it - it'll be closed as "not a bug"


It has always been true that you have never been able to count on the number of times, the ordering of function calls in SQL - never.


And given that using dbms_random in this fashion is "strange" to put it best. Is there is real world use of this, something "practical"?


March 15, 2011 - 10:50 am UTC

Reviewer: Laurent Schneider from Switzerland

> Is there is real world use of this, something "practical"?

Probably not, and therefore I will not open a bug
Tom Kyte

Followup  

March 15, 2011 - 10:58 am UTC

And people do need to watch out for constructs like:


select * from t where dbms_random.value(0,1) <= 0.10;


it won't return 10% of the table (sample does that). It might return no rows or every row :) You'd have to force feed in something "unique" to get the function evaluated at least once per row:



select * from t where dbms_random.value(0,decode(rowid,null,0,1)) <= 0.10;


for example.

nice example

March 16, 2011 - 3:04 am UTC

Reviewer: Laurent Schneider from Switzerland

> select * from t where dbms_random.value(0,1) <= 0.10; 

I like this !

Could then 
select dbms_crypto.randomnumber, ename from emp
return only one distinct value? Scary...


Tom Kyte

Followup  

March 16, 2011 - 8:35 am UTC

yes it could. No reason it could not.



order by?

March 16, 2011 - 9:14 am UTC

Reviewer: Chuck Jolley from OKC, OK USA

Is "order by dbms_random.value" a special case?

Back when we were on 10g we had a process that needed to randomize lists of about 10 to 100 names. Sort of like a lottery. Randomize the list and then pick a random slot in the list as the "winner"

During testing "order by dbms_random.value" did a great job of randomizing the lists. We ran it across hundreds of thousands of test lists and analyzed the results.

The business need for the process went away, but the above discussion made me curious.

Tom Kyte

Followup  

March 16, 2011 - 10:37 am UTC

During testing "order by dbms_random.value" did a great job of randomizing the
lists. We ran it across hundreds of thousands of test lists and analyzed the
results.


How about this - for all of my life, "group by a,b,c" did a great job of order the data by a,b,c. We ran it across hundreds of thousands of test lists and analyzed the results.

Until 10g - when the typical (but not always present - remember group by NEVER had to sort in ANY release) "SORT GROUP BY" was replaced almost always with HASH GROUP BY...


;)

You'll have to watch out for plan changes on that one - one day the optimizer will recognize that "order by f" can be replace with "<this space left intentionally blank>".


Re: order by

March 16, 2011 - 9:39 am UTC

Reviewer: Laurent Schneider from Switzerland

@Chuck : Definitely worth opening bug (if this would occur, which I doubt due to the number of application that will break).

Even if closed as a NOT-A-BUG, you will get a workaround.

Something like
<code>
select /*+optimizer_features_enable('10.1.0')*/ * from dual where dbms_random.value=dbms_random.value;
</code>
Tom Kyte

Followup  

March 16, 2011 - 10:41 am UTC

It would be closed as not a bug if and when it happens - that is what I believe.

March 16, 2011 - 10:47 am UTC

Reviewer: Chuck Jolley from OKC, OK USA

Well, that process and it's code is "in the attic" so to speak.
If the legislature brings it back I will re write the function.

Like Tom, I am not a fan of getting correct results by accident.

Tom, what would be a correct way of randomizing a list, preferably directly in sql?
Tom Kyte

Followup  

March 16, 2011 - 12:15 pm UTC

to be safe, I would wrap a function of my own around it that took rowid (or something else unique) so you can:


select rand(rowid), t.*
from t
order by 1;

since rowid will be unique for each and every row - it cannot cache, it cannot optimize away - it would be forced to call the function. If it didn't sort by the random value in column 1 - it would be a bug.


"order by dbms_random.value" not working as expected [anymore]

March 16, 2011 - 12:59 pm UTC

Reviewer: Sokrates

so, the very commonly used order-by-clause
"order by dbms_random.value"
to randomize the sort order of a result set always worked more or less coincidentally ?

I just checked my two favourite google-custom-Oracle-searches
http://www.miracleoy.fi/search.html
and
http://sn.im/orclsearch
entering
"order by dbms_random.value"
and got tons of Oracle-Experts using this clause

for example Thomas Kyte:
http://www.oracle.com/technetwork/issue-archive/o14asktom-084433.html
"Generating Some Random Data"


seems to me that plenty of articles need to be rewritten

Tom Kyte

Followup  

March 16, 2011 - 3:59 pm UTC

I didn't say it had changed - only that it can. And we probably shouldn't rely on it's current behavior - because it is obvious now that a plan change could change things.

I myself have learned this the hard way over the intervening more than seven years...

March 17, 2011 - 4:24 am UTC

Reviewer: Oleksandr Alesinskyy

From my point of view, such treatment of non-deterministic and "not-builtin" functions is a major, if not a critical bug from Oracle side. All optimizations shall result in the semantically equivalent query - and if the function is neither deterministic nor returning the single value per query (and latter the optimizer can know only for builtins) any optimizations that change the number of calls break this equivalence, so must be prohibited - as they trade the results validity for the performance, which is absolutely unacceptable.


Tom Kyte

Followup  

March 17, 2011 - 8:28 am UTC

Umm, not so, it cannot be so.

Consider:

create table t ( x int, y int );
select * from t where x = f();


Suppose T has 100 rows in it.

How many times should f() be evaluated????

Now, add
create index t_idx on t(x);



Now how many times should f() be evaluated????


Now, insert into t select 1, rownum from dual connect by level <= 1000000;

Now - how many times would or should f() be evaluated???

If you have a non-deterministic function - your result set is BY DEFINITION already non-deterministic (think about that for a second). Use a non-deterministic function in SQL (remember, sql ain't a procedural language) with some trepidation - HEALTHY trepidation - I'm not suggesting "do not use", I'm suggesting "do not use with some expectation of how many times and where/when it will be called".



What about this:

select * from t where x = f() and y = 5;


without an index? How many times would or should f() be invoked? You cannot rely on the side effects of a function being called from SQL, you have NO CLUE when or how it will be called.



And the fact is that

select * from t where x = f() 


is semantically equivalent to

select * from t where x = (select f() from dual)


and if you know about scalar subquery caching, you know that the second one will in fact call f() once per query - not once per row per query.


That a function is deterministic or not is not relevant in the execution of a single SQL statement. You have no ideas how many times it will be invoked, when it will be invoked (short circuit evaluation? use of a index to access the data? Merge join cartesian blows up the result set really big and function is called a lot more than you think it should be? and so on)

Stop thinking procedurally here.

March 17, 2011 - 3:25 pm UTC

Reviewer: Oleksandr Alesinskyy

That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
All of them are (or may be) executed after the row is included into the result set. So we may safely require that the function shall be called once and only once for each row.

Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter.
Tom Kyte

Followup  

March 17, 2011 - 3:40 pm UTC

... That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
...

why not?


... So we may safely require that the function shall be called once and only once for each row. ...

how so? with view merging and other rewrite techniques - you quite simply cannot say that.


... Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter. ...

that is not even close to being reasonable. You would never be able to use an index as an access path - ever - if that were the case.



There is no reason whatsoever that a function with no inputs cannot be considered a constant - none. Show me where there is - other than your opinion?


March 17, 2011 - 5:42 pm UTC

Reviewer: Oleksandr Alesinskyy

. That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
...

why not?
Because we may firstly select the row (include it into the result set) and only then calculate the function in its select list, group by, order by (and optionally remember the result with the row for a later reuse).

how so? with view merging and other rewrite techniques - you quite simply cannot say that.

Why? Moreover, even if it is so (quite unlikely!) it only means that such techniques are invalid for this type of queries. Semantics cannot be traded for performance! No way.

... Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter. ...

that is not even close to being reasonable. You would never be able to use an index as an access path - ever - if that were the case.

Again, why? I did not mean that each row in a table shall be considered as a "candidate", I meant "each row that was really assessed for inclusion" - namely, the conditions from where clause were evaluated against it.
Sure, depending on an execution plan number of calls to the function may change, it is OK. It is only required that function was called once for each row included in the result set (and likely many more times on the "thrown-out" rows).


There is no reason whatsoever that a function with no inputs cannot be considered a constant - none. Show me where there is - other than your opinion?
If it may be considered as a constant it means that it is deterministic - by the very definition of the deterministic function.
DETERMINISTIC Clause
Specify DETERMINISTIC to indicate that the function returns the same result value
whenever it is called with the same values for its arguments.


Tom Kyte

Followup  

March 17, 2011 - 6:01 pm UTC

... Because we may firstly select the row (include it into the result set) and only then calculate the function in its select list, group by, order by (and optionally remember the result with the row for a later reuse). ...

there you go again, ascribing PROCEDURAL processing to a set based language.


Semantics cannot be traded for performance! No way.

the semantics haven't changed. You've never ever had any control over the number of times or when or how things are processed in a SQL query.


Again, why? I did not mean that each row in a table shall be considered as a "candidate", I meant "each row that was really assessed for inclusion" - namely, the conditions from where clause were evaluated against it.

think about how an index access works. The steps are

a) find the start key
b) find the stop key
c) output all keys in between the start and stop key

It would be utterly WRONG to re-evaluate the values in between - we are ranging on the index from (a) to (b), all keys in the range (a) to (b) are KNOWN apriori to satisfy the predicate - by definition.


It is only required that function was called once for each row included in the result set (and likely many more times on the "thrown-out" rows).

It is not - there you go again, ascribing PROCEDURAL PROCESSING - for loops in effect - to an inherently NON-PROCEDURAL language.


If it may be considered as a constant it means that it is deterministic - by the very definition of the deterministic function.


No, that is not the definition of deterministic, deterministic is "same inputs - same outputs - forever".

DETERMINISTIC Clause
Specify DETERMINISTIC to indicate that the function returns the same result value
whenever it is called with the same values for its arguments.


but wait - you knew that.

We are using it as a constant during the execution of a single SQL statement, not OVER sql statements. We are NOT considering it to be deterministic. That is something we are, have been and will be allowed to do.

Again - please reference a specification for the language that says otherwise.


Stop thinking procedurally, stop thinking "row by row" - those concepts do not really exist in SQL as it is processed - they just don't.



Just like:


select *
from (select * from t order by x)
/

is 'dangerous', you might be lead to believe that it is the same as

select * from t order by x

but it isn't, it is the same as

select * from t


March 17, 2011 - 5:57 pm UTC

Reviewer: Oleksandr Alesinskyy

That said I always longed for a "statement-deterministic" clause that mark the function as deterministic inside statement (like SYSDATE).
Tom Kyte

Followup  

March 17, 2011 - 6:22 pm UTC

We've had it for a while.

scalar subqueries achieve that to a large degree.


select *
from t where x = (select f() from dual)
/

will almost certainly cause F() to be evaluated exactly once per query execution - regardless of plan


select * from t where = (select f(t.y) from dual)

will cause f(t.y) to be evaluated as few times as possible given the caching that takes place.


And select * from t where x = (select f() from dual) is semantically equivalent to select * from t where x = F() - but might return a different result set if f() returns different values.


And select f() from t is semantically equivalent to select ( select f() from dual) from t - but again might return different results, they are the same - but because your function is "funky" - different.

caveat emptor.

not specified

March 18, 2011 - 3:04 am UTC

Reviewer: Sokrates

"...please reference a specification for the language that says otherwise. ..."

ok, I see, that's a point
I didn't find any specification in Oracle documentation concerning the question "how often a PL/SQL-function used in sql is evaluated during execution of this sql"
(I "know" - probably it is documented - SYSDATE in one sql is always the SYSDATE when the execution started, I "know" - probably it is documented - about scalar subquery caching, but I don't know much else about it).

So, that's a bit easy for a software vendor: don't specify how things work, and nobody can complain in the future when they will work another way like expected.

This has a number of (ugly) consequences, for example:

- there is nothing like "code stability" in Oracle when I am using PL/SQL-functions within SQL. My code can (and so, most likely, will) break in the future. Good news for developers who look for not being redundant in the future, bad news for developers who want to deliver clean and robust code.

- I *have* to code more "procedural". I can't rely on smart SQL-constructs like for example "order by dbms_random.value" anymore, instead next time I need a randomly ordered result, I will have to code it procedurally ( just one example ). Sigh, I always was proud of being more clever than these procedural guys when it came to "thinking sets", now I see some of the limits

Thanks for answering my question anyway !
Tom Kyte

Followup  

March 18, 2011 - 8:00 am UTC

Or in the SQL language specification itself - that is what I was saying. Look to the *language* that is SQL.

Subquery caching is not documented. It is an internal, legitimate optimization.


there is nothing like "code stability" in Oracle when I am using PL/SQL-functions within SQL.

There is nothing like "code stability" when relying on side effects. For that I present to you:

select a,b,max(c) from t group by a,b;

A query that many thought "should order by a,b because I've seen it do it". Relying on a side effect.

This is nothing like "code stability" when relying on side effects. For that I present to you:

create table t ( x varchar2(20), y int );
select * from t where to_number(x) > 10 and y > 5;


Now, what if - whenever Y>5, X is known to be a number. Does that query work execute successfully or fail if there is some data in there where X is not convertable to a number?

"it depends"


Anytime you ascribe procedural like processing to a SQL statement - you are in danger.


... instead next time I need a randomly ordered result, I will have to code it procedurally ( just one example ). ...


please do go back and re-read, we covered that one, no you don't. If you make a function have unique inputs.... Then it cannot be constant.


scalar subquery caching

March 18, 2011 - 3:53 am UTC

Reviewer: Laurent Schneider from Switzerland

This is probably an illegal optimization that was introduced in Oracle8, and now it is a bit too late to expect to have

SELECT * FROM EMP WHERE DBMS_RANDOM.VALUE<.5

returning 50% of the rows.

Tom, it is GOOD TO KNOW how things works and how things breaks. It helps a lot

I won't change my mind that it is a misbehavior to cache something that is not cachable.

And estimating (DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE) to true will really look stupid for any programmer that will ever start working with Oracle

Your milage may vary
Tom Kyte

Followup  

March 18, 2011 - 8:02 am UTC

And estimating (DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE) to true will really look
stupid for any programmer that will ever start working with Oracle


not as stupid as it looks all by itself, before it is executed :)

where random = random;

I would look at that and say "why?"

order by in a subquery

March 18, 2011 - 4:19 am UTC

Reviewer: Laurent Schneider from Switzerland

select *
from (select * from t order by x)
/

is 'dangerous', you might be lead to believe that it is the same as

select * from t order by x

but it isn't, it is the same as

select * from t


but this construct must work

SELECT * FROM
(SELECT empno FROM emp ORDER BY empno)
WHERE ROWNUM < 11;

because it is documented
Tom Kyte

Followup  

March 18, 2011 - 8:03 am UTC

correct - and I purposely did not post it with rownum.

I've seen far too many bits of code using inline views with order by - and people expecting the order by to be "obeyed". It doesn't have to be unless you use rownum - which converts the query into a special "top-n query".

@Laurent

March 18, 2011 - 8:11 am UTC

Reviewer: Sokrates


top-N reporting works with

SELECT * FROM
   (SELECT empno FROM emp ORDER BY empno)
   WHERE ROWNUM < 11;


but you must not assume the result to be ordered by empno !

We've had it for a while.

March 18, 2011 - 8:15 am UTC

Reviewer: Oleksandr Alesinskyy

scalar subqueries achieve that to a large degree.
That's true. But with 2 caveats:
1. Readability is reduced.
2. For reasons unknown to me a usage of such subquieries often caused a drastic performance degradation, at least in 8i/9i.

will cause f(t.y) to be evaluated as few times as possible given the caching that takes place.

And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.

Tom Kyte

Followup  

March 18, 2011 - 8:23 am UTC

can you provide a working example for point #2, my experience has been opposite


... And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.
...

again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language.

rand

March 18, 2011 - 8:54 am UTC

Reviewer: Sokrates


"...please do go back and re-read, we covered that one, no you don't. If you make a function have unique inputs.... Then it cannot be constant. ..."

that may be true for 11.2
So, I should code (example)
create function rand(r in rowid) return number as begin return dbms_random.value; end rand;

and then
select * from (select * from <table> order by rand(rowid)) where rownum<=10
?

Probably, in V12 or 13 or 17, the optimizer discovers that rand does not depend on its input parameter and decides "hey, we optimize that and evaluate rand only once in this query".
Who tells me today it won't ?


Non-Deterministic functions in sql?

March 18, 2011 - 9:30 am UTC

Reviewer: APH from Miami, FL, USA

I'm not really clear on how a non-deterministic function is particularly useful in sql. I'll refer to this example:
create table t ( x int, y int );
select * from t where x = f();
How could someone possibly determine whether f() gets called once for the entire query, or once for each row? Seems ambiguous, meaning the behavior is undefined (don't use it). Same in this case:
select x, f(), from t;
Again, once or once per row? Seems it could go either way.

I suppose you could "cheat" the optimizer by "lying" to it. Say we made f() deterministic, even if it is not (bad idea already, I know). Then, we give it a dummy parameter which has no affect on the output. Wouldn't the optimizer be forced to evaluate the function for every unique argument value? Something like this:
select x, f(dummy=>sys_guid()) from t;
Forgive my ignorance, but I'm having trouble wrapping my head around the whole idea of how non-deterministic functions make sense in a query.

Tom Kyte

Followup  

March 18, 2011 - 10:53 am UTC

I'm not really clear on how a non-deterministic function is particularly useful in sql


this is a good point - I'd change the word useful to "safe".

in a procedural language, you have complete control over when and how things are called. In SQL you do not.

If you have a query that *depends* on a function getting called once per row because of its very nature (say it increments a counter and you depend on that counter), you are doing something very very unsafe.

Wouldn't the optimizer be forced to evaluate the function for every unique argument value?

Today it does, but as Sokrates pointed out above (something I hadn't really thought about yet...) - perhaps in the future, the optimizer will query metadata (that doesn't exist in version 11) about the function and see the function doesn't really depend on the input (it is dummy input, never referenced) so it has actually been optimized away - it doesn't really "happen", the function call is known to be "f()"....

In which case, you'd have to reference it in some fashion ;)

Forgive my ignorance, but I'm having trouble wrapping my head around the whole idea of how non-deterministic functions make sense in a query.



This is a very very good point. Deterministic at least for the LIFE OF A QUERY. They need not be deterministic "forever". For example:

select * from t where x = sys_context('myctx', 'x');

sys_context is DEFINITELY not deterministic, it would be useless if it were. But it is considered deterministic for the life of a query (it works much like a bind does).

If the value returned by sys_context changed in mid-stream - the results would be rather "meaningless" and unpredicable (it would be driven by the query plan, how the data is accessed and that could change from execution to execution)


So, the bottom line is - if you care

o how many times
o when
o if
o in what order

your function with a given set of inputs is invoked in SQL - you are probably doing something wrong.

Who is this "Sokrates"?

March 18, 2011 - 12:01 pm UTC

Reviewer: Parthiban Nagarajan from Coimbatore, India

I'm really eager to know who is this "Sokrates"? In Jonathan Lewis's blog, he is using a shortened URL which then redirects to a Wikipedia's page about the original, famous Socrates.
Anyway, good work Sokrates. This post has grown rapidly in size. Within 4 days, a lot of followups/comments. Keep it up Sokrates :-)

March 18, 2011 - 12:44 pm UTC

Reviewer: Oleksandr Alesinskyy

can you provide a working example for point #2, my experience has been opposite

Unfortunately, not now - I have met it long time ago (2003?) and do not remember all circumstances to model them.
Should add that I was very surprised because my previous experience was more in line with yours. If I would be able to reproduce this behavior in 11g (the only version to which I have an access right now) I would post here.

again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language.
How? I want declaratively describe the contract of my function - and such declarative description is perfectly in line with SQL philosophy.

As for a usability of non-deterministic functions in SQL - the same DBMS_RANDOM.value provides a perfect example of potentially usable non-deterministic function.
Tom Kyte

Followup  

March 18, 2011 - 12:51 pm UTC

... And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.
...

again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language.


but it doesn't have anything to do with caching in general, it is all about the execution plan. The execution plan also dictates what is called, when and how frequently. It still goes back to "procedural processing".

there are 100 records such that x=10
there are 10 records such that y > 10 and x = 10
select *
  from (select f(), y from t where x = 10)
 where y > 10;

Semantically, the same as
select f(), y from t where x = 10 and y > 10;

How many times should f() be invoked.

I vote for "one", regardless of the "deterministic" property of f(). If f() returns arbitrary data for each call - then anything f() can return is acceptable and fine.

Give me a real world use case where by this is not true. let's make this real for a minute. Give me an example of a function f() that returns a different value each time it is called and has real world meaning.

March 18, 2011 - 1:19 pm UTC

Reviewer: A reader

Hi Tom,


there is no input parameter to f() i.e. f(a) it may be executed once but if there is column pass to f() i.e f(col1) than it will executed for every row right?

so in the query below it will call 100 times

select *
from (select f(), y from t where x = 10)
where y > 10;


and 10 times if above query is transform as below

select f(), y from t where x = 10 and y > 10;

Thanks..
Tom Kyte

Followup  

March 18, 2011 - 1:27 pm UTC

there is no input parameter to f() i.e. f(a) it may be executed once but if
there is column pass to f() i.e f(col1) than it will executed for every row
right?


not necessarily, there is no reason a cache of prior f(col1) calls done during the execution of that query could not be maintained. The function would be invoked for the number of DISTINCT values of col1 at least.


so in the query below it will call 100 times

select *
from (select f(), y from t where x = 10)
where y > 10;


and 10 times if above query is transform as below

select f(), y from t where x = 10 and y > 10;

Thanks..


or once - but the point is, it might be 1, it might be 10, it might be 100, it might be some value other than 1, 10, or 100.

Sokrates ...

March 20, 2011 - 5:57 am UTC

Reviewer: Sokrates aka Matthias Rogel from Kaiserslautern, Germany

( to Parthiban Nagarajan from Coimbatore, India )

... is just an Oracle enthusiast
(since the first day I worked with this software in the spring of 1992), who likes to ask questions

Because I have such a long name in real life, I use "Sokrates" when asking questions
in order to save keystrokes :-)

I like questions starting with "how"
but my favourite questions start with "why"

( like the Sokrates on http://tinyurl.com/2xft2k )



March 20, 2011 - 4:13 pm UTC

Reviewer: A reader

there are 100 records such that x=10
there are 10 records such that y > 10 and x = 10

select *
from (select f(), y from t where x = 10)
where y > 10;


Semantically, the same as

select f(), y from t where x = 10 and y > 10;


How many times should f() be invoked.


At least ten - once for each row in the result.
And, strictly speaking, in the presence of non-deterministic function these 2 queries are not equivalent.

If f() returns arbitrary data for each call - then anything f() can return is acceptable and fine.

Definitely, not. E.g. I want to generate data with some distribution.

Give me an example of a function f() that returns a different value each time it is called and has real world meaning.
As said above DBMS_RANDOM.VALUE, ROWNUM(). DENSE_RANK(),...
Tom Kyte

Followup  

March 20, 2011 - 4:31 pm UTC

why not 100? I was expecting that to be a stance - since

from (select f(), y from t where x = 10)

would find 100 rows.


They are equivalent - if the function is non-deterministic (eg: like random), the output of any of them are equivalent - repeated runs of the query (even with the same plan) would not return the same result set. The entire THING is non-deterministic.

ROWNUM, ROW_NUMBER, DENSE_RANK do not count - for they are (assuming the order by is ordering by something UNIQUE) - deterministic. Remember - part of their inputs is the order by clause.

So, beyond "random" (which we've discussed how to "fix") - give a real world example. rownum and dense rank/rank etc do not count for they are in fact deterministic (and have to be used with real care if you do not order by something unique)

March 20, 2011 - 5:03 pm UTC

Reviewer: Oleksandr Alesinskyy

why not 100? I was expecting that to be a stance - since
from (select f(), y from t where x = 10)
would find 100 rows.

100 would be nice - I said at least 10.


They are equivalent - if the function is non-deterministic (eg: like random), the output of any of them are equivalent - repeated runs of the query (even with the same plan) would not return the same result set. The entire THING is non-deterministic.

Probably yes, probably, no. Non-deterministic is not very good word. The real thing is it cannot be determined solely from parameters - but it may be very deterministic if an initial state is taken into account.


ROWNUM, ROW_NUMBER, DENSE_RANK do not count - for they are (assuming the order by is ordering by something UNIQUE) - deterministic. Remember - part of their inputs is the order by clause.

ORDER BY clause is not present in their parameter list :) So, strictly speaking, they are non-deterministic -in a sense of the DETERMINISTIC clause. They depend on the state.



Tom Kyte

Followup  

March 21, 2011 - 9:44 am UTC

As I said, I was expecting 100 to be your stance - not an option - based on your prior statements.


Non-deterministic is the only way I can describe the output from a query using dbms_random in any way. Repeated runs against the same inputs result in different outputs. Even if we start with the same seed. All it takes is a plan change.


Order by is part of their definition. It is part of their *inputs*. Deterministic is not so narrowly defined as to be dependent on formal named parameters. They do not depend on a state, they depend on their inputs. Try using row_number without order by - it is a necessarily component, it is an input to the function. Just as the partitioning and windowing clauses are. The definition of row_number even mentions this:

For consistent results, the query must ensure a deterministic sort order.

sys_context

March 21, 2011 - 9:01 am UTC

Reviewer: James Su from Toronto

Hi Tom,
A trick that we use to pass parameter to a dynamic SQL or VIEW is sys_context. So it is possible this function will be called for each row? Then it doesn't perform as good as bind variables.
Tom Kyte

Followup  

March 21, 2011 - 10:13 am UTC

it is designed to be called once per query - working just like a bind.

however, if you wrap it in other functions, it can be invoked more than once.

"it depends" and we have no control over it...

March 22, 2011 - 6:59 am UTC

Reviewer: Oleksandr Alesinskyy

Followup March 21, 2011 - 9am Central time zone:
As I said, I was expecting 100 to be your stance - not an option - based on your prior statements.

Yes, it would be even better - but it is mainly the matter of definition, that means how we specify semantics for such usages. From my point of view current Oracle semantics is unintuitive and unreasonable. It would be much more honest to prohibit to use any non-deterministic function in SQL.

Non-deterministic is the only way I can describe the output from a query using dbms_random in any way.

Sure, but my comments were not tied to DBMS_RANDOM. They addressed any function with results depending not only on its arguments.


Order by is part of their definition. It is part of their *inputs*. Deterministic is not so narrowly defined as to be dependent on formal named parameters.
Really? Then I have to quote a definition of the DETERMINISTIC clause again:
Specify DETERMINISTIC to indicate that the function returns the same result value whenever it is called with the same values for its arguments.

They do not depend on a state, they depend on their inputs.

They depend oh the state of the query processing, on the pre-history. You may consider it as an input of the state maintained outside - but anyhow it is not an argument.


Tom Kyte

Followup  

March 22, 2011 - 8:46 am UTC

From my point of view current Oracle semantics is unintuitive and unreasonable.

Let me rewrite that for you

From my point of view current SQL semantics is unintuitive and unreasonable.


You keep trying to force procedural constructs and rules on a language that proclaims as loud as it can "I am not procedural"


Specify DETERMINISTIC to indicate that the function returns the same result value whenever it is called with the same values for its arguments.


I am talking about the word DETERMINISTIC in general, in life, in the dictionary. I am not talking about the attribute deterministic as applied to a PLSQL function.

And I will maintain that conceptually speaking - order by is an INPUT to an analytic function. It is an input.


but anyhow it is not an argument.

and it doesn't have to be.

http://dictionary.reference.com/browse/deterministic

See the computing specific definition. The definition of the row_number function even references the word in respect to the order by.


March 23, 2011 - 5:03 pm UTC

Reviewer: Oleksandr Alesinskyy

You keep trying to force procedural constructs and rules on a language that proclaims as loud as it can "I am not procedural"
Which procedural constructs I want to force into SQL? Where you have noticed them? I am speaking exclusively about declarative constructs. Constructs that defines semantics of the outcome but not a way how this semantics will be achieved. Yes, I explained in which way an outcome is attainable - but it is just an example.


I am talking about the word DETERMINISTIC in general, in life, in the dictionary. I am not talking about the attribute deterministic as applied to a PLSQL function.
And I am talking exactly about this DETERMINISTIC clause - because its very existence proves that functions tagged with it should be treated differently then functions without such tagging. Otherwise this clause becomes senseless.

Tom Kyte

Followup  

March 23, 2011 - 5:46 pm UTC

... Otherwise this clause becomes senseless. ...

No it doesn't. Not in the least. Not at all.

In order to create an index on a function - the function must be 100% deterministic. It is a requirement to be indexed.


In order for this function to be called once or one hundred times or five times or N times in fact:

select * from one_hundred_row_table where x = f();

F() does not need to be deterministic - it is not a requirement, it never has been, it never will be.

to_date( '2005', 'yyyy' ) is not deterministic, we only have to evaluate it once if we want to - and in fact - we do, we have and will...

We might call F() once (current releases for start/stop key for an index).
We might call F() twice (olders releases - for start stop key)
We might call F() 100 times (no index)
We might call F() once (versions to come)
We might call F() N times - because that is the way it works, you have no control over it. Never had, never will.


And if we called F() 100 times - once per row - the results of that query would not be repeatable, not deterministic itself. At that point - given you have no control over the order of rows being processed (and never will), that you have no control over the number of times the function will be invoked (never will) - pretty much any answer is acceptable, you have said as much by using a non-deterministic function that you have no control over.

Pipelined functions?

March 23, 2011 - 7:27 pm UTC

Reviewer: Tom from Charlottesville, VA

This has been an interesting discussion thread, but if someone needs that level of procedural control over a result set, why aren't they using pipelined functions?

March 24, 2011 - 5:39 am UTC

Reviewer: Oleksandr Alesinskyy


In order for this function to be called once or one hundred times or five times or N times in fact:

select * from one_hundred_row_table where x = f();

F() does not need to be deterministic - it is not a requirement, it never has been, it never will be.

I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.


to_date( '2005', 'yyyy' ) is not deterministic, we only have to evaluate it once if we want to - and in fact - we do, we have and will...

It is what I call "statement-deterministic". But there is no way to mark f() this way.

works, you have no control over it. Never had, never will.

Yes, indeed - their is very hard to specify reasonable semantics for non-deterministic function in the where clause. Anyhow, even in this case it might bepossible to guarantee at least one call per row in the final result set - in my opinion it would be less efficient but much more intuitive.

But for select list, group by, order by quite reasonable semantics may be specified and implemented - and without a major performance penalty.

Anyhow, we already are going in circles so I do not see any point to continue this discussion any further.

Tom Kyte

Followup  

July 30, 2012 - 11:44 am UTC

I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.


and we'll have to agree to DISAGREE. It would mean for example that:

select * from t where dt = to_date( '2005', 'yyyy' )

could *never* use an index on DT. Once again, you are ascribing procedural flow of control to something that says "I AM NOT PROCEDURAL"

It is what I call "statement-deterministic". But there is no way to mark f() this way.


Uh-uh - no way. If you run that shortly before midnight on the last day of the month and it runs for a minute or two - THE VALUE CHANGES. You cannot pick and choose what is "statement deterministic" or not. Basically, anything that is non-deterministic EVER would have to fall into your rule. There cannot be exceptions like that just because you (and you personally) feel it would be OK.


But for select list, group by, order by quite reasonable semantics may be specified and implemented - and without a major performance penalty.


I cannot believe you wrote that really - the "without a major performance penalty".

March 24, 2011 - 5:30 pm UTC

Reviewer: Oleksandr Alesinskyy

I start to feel that you deliberately misinterpret my words.

select * from t where dt = to_date( '2005', 'yyyy' )

could *never* use an index on DT.

And what? How this statement is related to our discussion?

Once again, you are ascribing procedural flow of control to something that says "I AM NOT PROCEDURAL"

It is no more procedural as ROWNUM pseudocolumn behavior. Or LEVEL or - name it yourself.


It is what I call "statement-deterministic". But there is no way to mark f() this way.


Uh-uh - no way. If you run that shortly before midnight on the last day of the month and it runs for a minute or two - THE VALUE CHANGES.
That's true, I completely forgot about this peculiarity (another rather unintuitive behavior - why default month is current but default day is 1?).


You cannot pick and choose what is "statement deterministic" or not. Basically, anything that is non-deterministic EVER would have to fall into your rule.
??? It even does not matter if the function is really statement-deterministic or is completely deterministic. It rather matter of the possibility to specify semantics.

There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?


I cannot believe you wrote that really - the "without a major performance penalty".

Exactly - without a major performance penalty (unless the calculation of the function itself is very expensive). If it is not used in the where clause at some level of query it execution may be delayed to the last moment. So it does not affect your optimizations.



Tom Kyte

Followup  

March 25, 2011 - 10:50 am UTC

And what? How this statement is related to our discussion?


because you say that a non-deterministic function could and should be evaluated for each row. I quoted you:

<quote>I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.
</quote>


If that were the case, if what you said was to be true - then the where clause I wrote using to_date('2005','yyyy') could never be evaluated just once for the query, it would have to be evaluated for each row and an index on dt could not be used to evaluate the result set - we'd have to full scan. For to_date(,'yyyy') is NOT deterministic, not at the statement level - not at any level.


There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?


either non-deterministic functions are evaluated once per row per use in the row - or they are not. You cannot go both ways.

(unless the calculation of the function itself is very expensive

simply the OVERHEAD of invoking the function - even if the function does nothing - is HUGE and don't forget:

select * from one_billion_row_table where indexed_column_with_unique_values = f();

would have to full scan - no indexes. I would call that *expensive*.

ORDER BY in inline view

March 25, 2011 - 12:12 am UTC

Reviewer: Parthiban Nagarajan from Coimbatore, India

Hi Tom

Please review the following code. It seems what you've said in a follow up earlier is not true. Could you explain?
select * from (select * from t order by x) behaves same as select * from t order by x
SQL> select * from v$version;

BANNER                                                                          
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production    
PL/SQL Release 11.2.0.2.0 - Production                                          
CORE 11.2.0.2.0 Production                                                      
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production               
NLSRTL Version 11.2.0.2.0 - Production                                          

SQL> drop table t7 purge;
drop table t7 purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist 


SQL> create table t7 as select * from all_objects where rownum < 11 order by created desc;

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'t7');

PL/SQL procedure successfully completed.

SQL> set autot traceonly explain
SQL> select * from t7;

Execution Plan
----------------------------------------------------------                      
                                                                                
----------------------------------------------------------                      
| Id  | Operation         | Name | Rows  | Bytes | Cost  |                      
----------------------------------------------------------                      
|   0 | SELECT STATEMENT  |      |    10 |   760 |     3 |                      
|   1 |  TABLE ACCESS FULL| T7   |    10 |   760 |     3 |                      
----------------------------------------------------------                      
                                                                                
Note                                                                            
-----                                                                           
   - 'PLAN_TABLE' is old version                                                

SQL> select * from t7 order by created;

Execution Plan
----------------------------------------------------------                      
                                                                                
-----------------------------------------------------------                     
| Id  | Operation          | Name | Rows  | Bytes | Cost  |                     
-----------------------------------------------------------                     
|   0 | SELECT STATEMENT   |      |    10 |   760 |     4 |                     
|   1 |  SORT ORDER BY     |      |    10 |   760 |     4 |                     
|   2 |   TABLE ACCESS FULL| T7   |    10 |   760 |     3 |                     
-----------------------------------------------------------                     
                                                                                
Note                                                                            
-----                                                                           
   - 'PLAN_TABLE' is old version                                                

SQL> select * from (select * from t7 order by created);

Execution Plan
----------------------------------------------------------                      
                                                                                
-----------------------------------------------------------                     
| Id  | Operation          | Name | Rows  | Bytes | Cost  |                     
-----------------------------------------------------------                     
|   0 | SELECT STATEMENT   |      |    10 |   760 |     4 |                     
|   1 |  SORT ORDER BY     |      |    10 |   760 |     4 |                     
|   2 |   TABLE ACCESS FULL| T7   |    10 |   760 |     3 |                     
-----------------------------------------------------------                     
                                                                                
Note                                                                            
-----                                                                           
   - 'PLAN_TABLE' is old version                                                

SQL> drop table t7 purge;

Table dropped.

SQL> spool off

Tom Kyte

Followup  

March 25, 2011 - 11:05 am UTC

select * from (select * from t order by x) behaves same as select * from t order by x


Let me ask you something:

if you flip a coin 500 times
and the coin lands heads up 500 times
did you just prove that coins, when flipped, always land heads up?


I wrote, very specifically:

Just like:


select *
from (select * from t order by x)
/

is 'dangerous', you might be lead to believe that it is the same as

select * from t order by x

but it isn't, it is the same as

select * from t



The optimizer is perfectly free to remove your order by if it so desires (without rownum being involved in any case).

It might, it might not, it is not promised that the data will be returned ordered. It is a dangerous construct.



Just as in the past people have "observed" that

select x, max(y) from t group by x;

returned data ordered by x, they 'counted on it', they got burnt badly when group by stopped using sorts most of the time instead of using sorts most of the time.


Put the order by on the final result to get data ordered by something.

March 31, 2011 - 11:12 am UTC

Reviewer: Oleksandr Alesinskyy

There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?

Sysdate is even less deterministic - but it always called once and only once. Oracle effectively declares it to be a "statement-deterministic".

either non-deterministic functions are evaluated once per row per use in the row - or they are not. You cannot go both ways.

Definitely, I can. It is not about the nature of the function as such as about declared semantic. If I declare a function to be a "statement-deterministic" (as Oracle do with sysdate, may propse another name if you do not like "statement-deterministic") then Oracle must to call it once only.


simply the OVERHEAD of invoking the function - even if the function does nothing - is HUGE and don't forget:

Only due to the current Oracle implementation, it is not an inherent property of such calls (yes, there were good reasons to implement it so).

would have to full scan - no indexes. I would call that *expensive*.
No amount of the performance improvement justifies a badly broken semantics. Never, ever!
Otherwise all programs would consist from the single Assembler instruction
ret

Tom Kyte

Followup  

April 12, 2011 - 10:19 am UTC

show me where the semantics are defined that require it to be called N times in a predicable fashion?

SQL is not like that, it doesn't work that way. The use of non-statement deterministic functions is always going to be somewhat unpredicable.

April 03, 2011 - 10:50 am UTC

Reviewer: Bruno Vroman from Belgium

Very interesting discussion. I better understand what I had misunderstood, see "My Oracle Support" forum:
http://tinyurl.com/filetering-with-dbms-random
Thank you Tom for the detailed explanations, and tanks a lot for the "workaround": yes, we can make sure that the function is called once for each row ;-)
Best regards!

What about more than once per row?

July 19, 2012 - 7:44 am UTC

Reviewer: Gus from NZ

Hi Tom,

You said way back up at the top of the thread that we can't be sure whether a function will be called once per query, once per row, or something else. Does that mean it could be called more than once, or even zero times per row?

WITH tbl AS (SELECT 1 AS id, 'X' AS data FROM dual UNION ALL
SELECT 2, 'Y' FROM dual UNION ALL
SELECT 3, 'Z' FROM dual
)
SELECT data
FROM tbl
WHERE id = trunc(dbms_random.value(1, 4))
;

This query returns some number of rows between zero and three for me in 11gr1 and 10gr2XE.

Testing further with

DECLARE
TYPE result_tbl IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;

l_results result_tbl;
l_result NUMBER;

FUNCTION f RETURN NUMBER
AS
l_count NUMBER;
BEGIN
WITH tbl AS (SELECT 1 AS id, 'A' AS data FROM dual UNION ALL
SELECT 2, 'B' FROM dual UNION ALL
SELECT 3, 'C' FROM dual UNION ALL
SELECT 4, 'D' FROM dual UNION ALL
SELECT 5, 'E' FROM dual UNION ALL
SELECT 6, 'F' FROM dual UNION ALL
SELECT 7, 'G' FROM dual UNION ALL
SELECT 8, 'H' FROM dual UNION ALL
SELECT 9, 'I' FROM dual UNION ALL
SELECT 10, 'J' FROM dual
)
SELECT count(data)
INTO l_count
FROM tbl
WHERE id = trunc(dbms_random.value(1, 11))
;
RETURN l_count;
END;
BEGIN
FOR i IN 1..100000 LOOP
l_result := f();
BEGIN
l_results(l_result) := l_results(l_result) + 1;
EXCEPTION WHEN no_data_found THEN
l_results(l_result) := 1;
END;
END LOOP;

FOR i IN l_results.FIRST..l_results.LAST LOOP
dbms_output.put_line('i: ' || i || ', count: ' || l_results(i));
END LOOP;
END;
/

gives me results like

i: 0, count: 34641
i: 1, count: 39201
i: 2, count: 19140
i: 3, count: 5726
i: 4, count: 1113
i: 5, count: 153
i: 6, count: 23
i: 7, count: 3

I get the same results regardless of whether I select from a table or an inline view.

I can understand getting multiple results (and the distribution of rows returned that is greater than 1) per id since I can't guarantee how many times dbms_random.value will be called, but surely I can expect it to be called at least once?

But if I can't then why would the query above ever return no rows?
Tom Kyte

Followup  

July 19, 2012 - 12:27 pm UTC

... but surely I can expect it to be called at
least once?
...

yes it would be called at least once.

but what if it were called more than once. as it likely is. then each time

where id = trunc(dbms_random.value(1,11))

is evaluated, a new random number is returned. so can you see why it would frequently return zero rows. A given row would have only a 1 in 10 chance of being returned.

Rate conversion

July 24, 2012 - 2:50 am UTC

Reviewer: Ashim Basak

HI,
I have a source Rate table like :

From Currency To Currency Conversion Rate
A B 2
B C 2.5
D B 3
B K 1.4
E F 3.1
R C 2.2
A L 1.9


I want to build the target rate table look like:

From Currency To Currency Conversion Rate
A B 2
B C 2.5
A C 5
D B 3
B K 1.4
A K 2.8
E F 3.1
R C 2.2
A L 1.9


Means there is no direct conversion between A to C but has indirect path like A to B and B to C.

A C 5

Also A to B and B to K so get A to K :
A K 2.8

Please suggets the way in a single sql query to solev this problem...
Tom Kyte

Followup  

July 30, 2012 - 9:07 am UTC

no creates
no inserts
no look

Thanks

July 27, 2012 - 11:02 am UTC

Reviewer: Gus from Nz

Thanks Tom, for some reason I didn't even consider the possibility that it would be called once per comparison.

I really appreciate all the work you put in helping people out, I'm constantly impressed by how well you mentor anonymous people. Reading Ask Tom has taught me probably most of what I know about both Oracle and SQL in general.

@Ashim Basak

July 27, 2012 - 12:44 pm UTC

Reviewer: STEVE H

What does your question have to do with random numbers. Are you using random numbers for your currenecy conversions.....?

interesting difference in behaviour

November 03, 2012 - 4:38 pm UTC

Reviewer: Sokrates

sokrates@11.2 > create function rand return number is begin dbms_output.put_line('rand called'); return dbms_random.value; end rand;
  2  /

Function created.

sokrates@11.2 > set autotr on explain
sokrates@11.2 > with x as (select dbms_random.value r from dual) select * from x where r=r and r!=r;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 885813501

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  VIEW            |      |     1 |    13 |     2   (0)| 00:00:01 |
|   2 |   FAST DUAL      |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("R" IS NOT NULL AND "R"<>"R")


how often dbms_random.value was called here ?
I would guess "once".

However, different behaviour with user-defined function rand:
sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;

         R
----------
.215245124

rand called
rand called
rand called
rand called

Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273

-----------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|*  1 |  FILTER          |      |       |            |          |
|   2 |   FAST DUAL      |      |     1 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)

Tom Kyte

Followup  

November 05, 2012 - 8:59 am UTC

and further:

ops$tkyte%ORA11GR2> with x as (select /*+ materialize */ rand r from dual) select * from x where r=r and r!=r;

no rows selected

rand called

Execution Plan
----------------------------------------------------------
Plan hash value: 3040811146

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     1 |    13 |     4   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6644_5FD48AF |       |       |            |          |
|   3 |    FAST DUAL               |                            |     1 |       |     2   (0)| 00:00:01 |
|*  4 |   VIEW                     |                            |     1 |    13 |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6644_5FD48AF |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("R" IS NOT NULL AND "R"<>"R")



calling functions that are not statement level deterministic is dicey from SQL.

November 05, 2012 - 1:04 am UTC

Reviewer: Al Ricafort from Hong Kong

This topic has totally altered my perception about function in a SQL.

Mind Reading

November 05, 2012 - 12:39 pm UTC

Reviewer: Chuck Jolley from OKC, OK USA

Wow, as I got close to catching up to date on this question I started to ponder the effect of WITH and materialize on it.

Bingo, next scroll down there is Socrates and Tom discussing exactly those things.

November 05, 2012 - 1:49 pm UTC

Reviewer: A reader

Hello Sir,

why optimizer put - filter("R" IS NOT NULL AND "R"<>"R")

when its obivious that r=r and r!=r from the query that it will never be true and no rows will be produce? can you please explain.


Thank you

Tom Kyte

Followup  

November 05, 2012 - 2:00 pm UTC

that would just be a micro-optimization for a case that would never really occur in real life (who would be coding r=r and r<>r??) that they didn't make. in other words, this isn't real world, so why have an optimization branch for it.

November 05, 2012 - 2:39 pm UTC

Reviewer: A reader

Thanks Sir for quick response to clarify my understanding further more...


sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;

         R
----------
.215245124

rand called
rand called
rand called
rand called

Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273

-----------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|*  1 |  FILTER          |      |       |            |          |
|   2 |   FAST DUAL      |      |     1 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)





is it above query get un-nested as below


select rand() from dual where rand()=rand() and rand()!=rand()


so rand() will be call first time in select, then second time during evaluting condition rand()=rand() and during last condition its call 2 times
and thats how its print the message 4 times?

Please correct me if i am wrong.


Thank you


Tom Kyte

Followup  

November 05, 2012 - 2:42 pm UTC

I believe so, once for the projection and three times for the rewritten predicate.

November 05, 2012 - 8:26 pm UTC

Reviewer: Chuck Jolley from OKC, OK USA

That one bothers me.
If the function were called in place of the r references the way it winds up working seems intuitive, accessing it through the WITH clause not so much.
How is the WITH clause defined?
Tom Kyte

Followup  

November 06, 2012 - 7:43 am UTC

I'm not sure what you mean?

to "A Reader"

November 06, 2012 - 2:07 am UTC

Reviewer: Sokrates


is it above query get un-nested as below


select rand() from dual where rand()=rand() and rand()!=rand()


so rand() will be call first time in select, then second time during evaluting condition rand()=rand() and during last condition its call 2 times
and thats how its print the message 4 times?


Actually the order of calls seem to differ slightly, the result of the select seems to be the last call, not the first call, at first the conditions are evaluated:

sokrates@11.2 > create or replace
  2  function rand return number
  3  is
  4  r number := dbms_random.value;
  5  begin
  6  dbms_output.put_line('rand called - will return ' || r);
  7  return r;
  8  end rand;
  9  /

Function created.

sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;

         R
----------
.556546596

rand called - will return .3848935116626815464233585003749046243
rand called - will return .97113274370680783878482004983917496957
rand called - will return .36883776673117203532610477335486686666
rand called - will return .55654659631275442000486278461821038313

Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273

-----------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|*  1 |  FILTER          |      |       |            |          |
|   2 |   FAST DUAL      |      |     1 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)


The select delivers the result of the last call.

To "A reader"

November 06, 2012 - 2:27 am UTC

Reviewer: Sokrates

why optimizer put - filter("R" IS NOT NULL AND "R"<>"R")

when its obivious that r=r and r!=r from the query that it will never be true and no rows will be
produce?


That's a very interesting question in my optinion.

Assume, that the optimizer would see the "obvious contradiction" and from that would deliver "no rows selected". In other words, you would expect that the optimizer would "see" any "obvious" contradiction in a SQL, wouldn't you ?

But this wouldn't be a "micro-optimization", quite the opposite: the implementation of such an optimization would have to solve the boolean satisfiability problem, which is known to be NP-hard, see
http://en.wikipedia.org/wiki/Boolean_satisfiability_problem

November 06, 2012 - 10:19 am UTC

Reviewer: Chuck Jolley from OKC, OK USA

Well, I can understand why rand() could be called 4 times in the case of Socrates query written without the WITH clause for the reasons you've given.
But for the life of me I can't understand why the optimizer would simply rewrite the WITH clause out of the query as given just above.
At least that's the only explanation I can think of for the 4 calls to rand()
I thought the WITH clause was kind of like a directive to "do this first and reuse the results"
Tom Kyte

Followup  

November 06, 2012 - 10:47 am UTC

... I can't understand why the optimizer would simply
rewrite the WITH clause out of the query as given just above. ...


it is called view merging and it happens all of the time. SQL is not a procedural language - query rewrites are a huge part of optimization.

http://www.oracle.com/technetwork/issue-archive/2005/05-mar/o25asktom-089621.html
"Views and Merging/Pushing"

November 06, 2012 - 1:40 pm UTC

Reviewer: Chuck Jolley from OKC, OK USA

Interesting.
So WITH can be thought of as an alternate inline view syntax?

Our database even rewrites it when a materialize hint is used:

chajol_dba@taxtest>ed
Wrote file afiedt.buf

1 with x
2 as (/*+ materialize */
3 select rand r from dual)
4 select *
5 from x
6* where r=r and r!=r
chajol_dba@taxtest>/

R
----------
.140225199

1 row selected.

rand called - will return .76853973916567692167596730551150623272
rand called - will return .5396196175405806460117863850708532041
rand called - will return .37414826008658638489073832782549908657
rand called - will return .14022519860129962410244274383917663374
Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273

-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | FILTER | | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)

chajol_dba@taxtest>

I'm lucky in that I sit a mere tens of feet from my users and can fix things, if a plan goes bad, in a matter of minutes.

Tom Kyte

Followup  

November 06, 2012 - 2:44 pm UTC

So WITH can be thought of as an alternate inline view syntax?


in a way, yes. It can either be materialized into a temp table (it is sort of a hint that that might be good) or can be merged into the query directly - just like any other view or inline view.

you put the materlize hint in the wrong place, it is just a comment there.

Look up a couple of reviews, I did the materialize example already.


November 06, 2012 - 1:48 pm UTC

Reviewer: Chuck Jolley from OKC, OK USA

On the other hand:

chajol_dba@taxtest>ed
Wrote file afiedt.buf

1 with x
2 as (select rand() r from dual connect by level < 2)
3 select *
4 from x
5* where r=r and r!=r
chajol_dba@taxtest>/

no rows selected

rand called - will return .48506043344076145464260047750362969475
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2403765415

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
|* 2 | CONNECT BY WITHOUT FILTERING| | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("R" IS NOT NULL AND "R"<>"R")
2 - filter(LEVEL<2)

chajol_dba@taxtest>

November 06, 2012 - 5:49 pm UTC

Reviewer: Chuck Jolley from OKC, OK USA

OK, thanks

November 07, 2012 - 8:55 am UTC

Reviewer: Chuck Jolley from OKC, OK USA

Yeah, it was a typo on my part. The materialize hint does change the plan. I put the corrected code here just for completeness.
These edge cases are interesting because they are like the C code you can find out there that is designed to trick compilers into making mistakes.

chajol_dba@taxtest>ed
Wrote file afiedt.buf

1 with x
2 as
3 (select /*+ materialize */
4 rand r
5 from dual)
6 select *
7 from x
8* where r=r and r!=r
chajol_dba@taxtest>/

no rows selected

rand called - will return .97600706740531863178218579740983563455
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 791862530

----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D683A_5017A14F | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 4 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D683A_5017A14F | 1 | 13 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - filter("R" IS NOT NULL AND "R"<>"R")

chajol_dba@taxtest>

Distributing a given number between multiple columns

July 30, 2014 - 7:19 am UTC

Reviewer: Thiru from Phoenix

I have a requirement wherein a value in a given column needs to be distributed amongst two or three columns equally or otherwise.

For eg:

CREATE TABLE TT(c1 number,c2 number, c3 number,c4 number)

insert into tt values(.33,.33,.34,3)
insert into tt values(.33,.33,.34,5)
insert into tt values(.5,.5,null,5)

the column c4 needs to be distributed amongst c1,c2,c3 provided the values in c1,c2,c3 are not 0 or 1.

So in the above case, I would like to have the output as:

1,1,1,3
1,2,2,5 this could also be 2,1,2,5 or any such distrubtion
2,3,null,5

Trying to use DBMS_RANDOM but looks like I am not able to achieve
Thanks a lot
Thiru

More to Explore

DBMS_RANDOM

More on PL/SQL routine DBMS_RANDOM here