non deterministic
Laurent Schneider, March 15, 2011 - 3:32 am UTC
As the function DBMS_RANDOM.VALUE (or the deprecated DBMS_RANDOM.RANDOM) is specified twice and is not deterministic, assuming that f=f is true when f is not null is a dubious rewrite.
And this used to work correctly in 10g...
March 15, 2011 - 8:33 am UTC
I would say 10g worked incorrectly - and we'd be as accurate. It would be like saying "9i used to return this set correctly:
select a,b,count(*) from t group by a,b;"
because it returned the data sorted in a binary fashion by a,b...
SYSDATE is not deterministic either. Yet we evaluate it once per query. We are free to do the same with a function f()
You have no control over the order or number of times a function is invoked - ever. Never had, never will.
The fact is:
select ...
from t
where f() = f()
is identical to
select ...
from (select f() F from t )
where F=F
and in fact, they could optimize that to be simply:
with data as (select f() F from dual)
select ...
from t, data
where F=F
Or even:
select ...
from t
where (select f() from dual) = (select f() from dual)
and scalar subquery caching would kick in. Or maybe it wouldn't - sometimes. That gets tricky too, the subquery caching is heavily dependent on the order in which the rows get processed - add a row to a table, and the number of times your function gets invoked might go way way up, way way down, or stay the same. You've NEVER been able to rely on the number of times, nor the order of invocation of your function.
We never needed "deterministic" to do that - it has always happened.
Anyone RELYING on an observed side effect, as a side effect of the optimizer plan, such as this does - is setting themselves up for a problem at some point.
A lot like people that coded:
select a, b, count(*) from t group by a,b;
and assumed the result set would always be ordered by a,b because they empirically observed it being ordered.
number of execution
Laurent Schneider, March 15, 2011 - 9:45 am UTC
How often SYSDATE is executed does not matter, it refers to the starttime at the execution of the query. It always return the same value, being executed once or multiple times.
With
select * from dual where sysdate=sysdate
In 10g it was
1 - filter(SYSDATE@!=SYSDATE@!)
And in 11g it is
1 - filter(SYSDATE@! IS NOT NULL)
Both are always true within one SQL statement.
The rewrite there is legal as the SYSDATE there returns always the same value.
But considering DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE to be true when DBMS_RANDOM.VALUE is not null is confusing enough to be considered as a bug.
March 15, 2011 - 10:35 am UTC
I have a feeling if you file it - it'll be closed as "not a bug"
It has always been true that you have never been able to count on the number of times, the ordering of function calls in SQL - never.
And given that using dbms_random in this fashion is "strange" to put it best. Is there is real world use of this, something "practical"?
Laurent Schneider, March 15, 2011 - 10:50 am UTC
> Is there is real world use of this, something "practical"?
Probably not, and therefore I will not open a bug
March 15, 2011 - 10:58 am UTC
And people do need to watch out for constructs like:
select * from t where dbms_random.value(0,1) <= 0.10;
it won't return 10% of the table (sample does that). It might return no rows or every row :) You'd have to force feed in something "unique" to get the function evaluated at least once per row:
select * from t where dbms_random.value(0,decode(rowid,null,0,1)) <= 0.10;
for example.
nice example
Laurent Schneider, March 16, 2011 - 3:04 am UTC
> select * from t where dbms_random.value(0,1) <= 0.10;
I like this !
Could then
select dbms_crypto.randomnumber, ename from emp
return only one distinct value? Scary...
March 16, 2011 - 8:35 am UTC
yes it could. No reason it could not.
order by?
Chuck Jolley, March 16, 2011 - 9:14 am UTC
Is "order by dbms_random.value" a special case?
Back when we were on 10g we had a process that needed to randomize lists of about 10 to 100 names. Sort of like a lottery. Randomize the list and then pick a random slot in the list as the "winner"
During testing "order by dbms_random.value" did a great job of randomizing the lists. We ran it across hundreds of thousands of test lists and analyzed the results.
The business need for the process went away, but the above discussion made me curious.
March 16, 2011 - 10:37 am UTC
During testing "order by dbms_random.value" did a great job of randomizing the
lists. We ran it across hundreds of thousands of test lists and analyzed the
results.
How about this - for all of my life, "group by a,b,c" did a great job of order the data by a,b,c. We ran it across hundreds of thousands of test lists and analyzed the results.
Until 10g - when the typical (but not always present - remember group by NEVER had to sort in ANY release) "SORT GROUP BY" was replaced almost always with HASH GROUP BY...
;)
You'll have to watch out for plan changes on that one - one day the optimizer will recognize that "order by f" can be replace with "<this space left intentionally blank>".
Re: order by
Laurent Schneider, March 16, 2011 - 9:39 am UTC
@Chuck : Definitely worth opening bug (if this would occur, which I doubt due to the number of application that will break).
Even if closed as a NOT-A-BUG, you will get a workaround.
Something like
<code>
select /*+optimizer_features_enable('10.1.0')*/ * from dual where dbms_random.value=dbms_random.value;
</code>
March 16, 2011 - 10:41 am UTC
It would be closed as not a bug if and when it happens - that is what I believe.
Chuck Jolley, March 16, 2011 - 10:47 am UTC
Well, that process and it's code is "in the attic" so to speak.
If the legislature brings it back I will re write the function.
Like Tom, I am not a fan of getting correct results by accident.
Tom, what would be a correct way of randomizing a list, preferably directly in sql?
March 16, 2011 - 12:15 pm UTC
to be safe, I would wrap a function of my own around it that took rowid (or something else unique) so you can:
select rand(rowid), t.*
from t
order by 1;
since rowid will be unique for each and every row - it cannot cache, it cannot optimize away - it would be forced to call the function. If it didn't sort by the random value in column 1 - it would be a bug.
"order by dbms_random.value" not working as expected [anymore]
Sokrates, March 16, 2011 - 12:59 pm UTC
March 16, 2011 - 3:59 pm UTC
I didn't say it had changed - only that it can. And we probably shouldn't rely on it's current behavior - because it is obvious now that a plan change could change things.
I myself have learned this the hard way over the intervening more than seven years...
Oleksandr Alesinskyy, March 17, 2011 - 4:24 am UTC
From my point of view, such treatment of non-deterministic and "not-builtin" functions is a major, if not a critical bug from Oracle side. All optimizations shall result in the semantically equivalent query - and if the function is neither deterministic nor returning the single value per query (and latter the optimizer can know only for builtins) any optimizations that change the number of calls break this equivalence, so must be prohibited - as they trade the results validity for the performance, which is absolutely unacceptable.
March 17, 2011 - 8:28 am UTC
Umm, not so, it cannot be so.
Consider:
create table t ( x int, y int );
select * from t where x = f();
Suppose T has 100 rows in it.
How many times should f() be evaluated????
Now, add
create index t_idx on t(x);
Now how many times should f() be evaluated????
Now, insert into t select 1, rownum from dual connect by level <= 1000000;
Now - how many times would or should f() be evaluated???
If you have a non-deterministic function - your result set is BY DEFINITION already non-deterministic (think about that for a second). Use a non-deterministic function in SQL (remember, sql ain't a procedural language) with some trepidation - HEALTHY trepidation - I'm not suggesting "do not use", I'm suggesting "do not use with some expectation of how many times and where/when it will be called".
What about this:
select * from t where x = f() and y = 5;
without an index? How many times would or should f() be invoked? You cannot rely on the side effects of a function being called from SQL, you have NO CLUE when or how it will be called.
And the fact is that
select * from t where x = f()
is semantically equivalent to
select * from t where x = (select f() from dual)
and if you know about scalar subquery caching, you know that the second one will in fact call f() once per query - not once per row per query.
That a function is deterministic or not is not relevant in the execution of a single SQL statement. You have no ideas how many times it will be invoked, when it will be invoked (short circuit evaluation? use of a index to access the data? Merge join cartesian blows up the result set really big and function is called a lot more than you think it should be? and so on)
Stop thinking procedurally here.
Oleksandr Alesinskyy, March 17, 2011 - 3:25 pm UTC
That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
All of them are (or may be) executed after the row is included into the result set. So we may safely require that the function shall be called once and only once for each row.
Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter.
March 17, 2011 - 3:40 pm UTC
... That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
...
why not?
... So we may safely require that the function shall be called once and only once for each row. ...
how so? with view merging and other rewrite techniques - you quite simply cannot say that.
... Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter. ...
that is not even close to being reasonable. You would never be able to use an index as an access path - ever - if that were the case.
There is no reason whatsoever that a function with no inputs cannot be considered a constant - none. Show me where there is - other than your opinion?
Oleksandr Alesinskyy, March 17, 2011 - 5:42 pm UTC
. That's true for the where clause - no doubt, but it is not true for select list, group by and order by clauses.
...
why not?
Because we may firstly select the row (include it into the result set) and only then calculate the function in its select list, group by, order by (and optionally remember the result with the row for a later reuse).
how so? with view merging and other rewrite techniques - you quite simply cannot say that.
Why? Moreover, even if it is so (quite unlikely!) it only means that such techniques are invalid for this type of queries. Semantics cannot be traded for performance! No way.
... Moreover, even for the where clause it may be reasonably required that the function shall be called at least once per each row considered as a "candidate" - including all rows found their way into the result set.
Just as if the function accept a rowid parameter. ...
that is not even close to being reasonable. You would never be able to use an index as an access path - ever - if that were the case.
Again, why? I did not mean that each row in a table shall be considered as a "candidate", I meant "each row that was really assessed for inclusion" - namely, the conditions from where clause were evaluated against it.
Sure, depending on an execution plan number of calls to the function may change, it is OK. It is only required that function was called once for each row included in the result set (and likely many more times on the "thrown-out" rows).
There is no reason whatsoever that a function with no inputs cannot be considered a constant - none. Show me where there is - other than your opinion?
If it may be considered as a constant it means that it is deterministic - by the very definition of the deterministic function.
DETERMINISTIC Clause
Specify DETERMINISTIC to indicate that the function returns the same result value
whenever it is called with the same values for its arguments.
March 17, 2011 - 6:01 pm UTC
... Because we may firstly select the row (include it into the result set) and only then calculate the function in its select list, group by, order by (and optionally remember the result with the row for a later reuse). ...
there you go again, ascribing PROCEDURAL processing to a set based language.
Semantics cannot be traded for performance! No way.
the semantics haven't changed. You've never ever had any control over the number of times or when or how things are processed in a SQL query.
Again, why? I did not mean that each row in a table shall be considered as a "candidate", I meant "each row that was really assessed for inclusion" - namely, the conditions from where clause were evaluated against it.
think about how an index access works. The steps are
a) find the start key
b) find the stop key
c) output all keys in between the start and stop key
It would be utterly WRONG to re-evaluate the values in between - we are ranging on the index from (a) to (b), all keys in the range (a) to (b) are KNOWN apriori to satisfy the predicate - by definition.
It is only required that function was called once for each row included in the result set (and likely many more times on the "thrown-out" rows).
It is not - there you go again, ascribing PROCEDURAL PROCESSING - for loops in effect - to an inherently NON-PROCEDURAL language.
If it may be considered as a constant it means that it is deterministic - by the very definition of the deterministic function.
No, that is not the definition of deterministic, deterministic is "same inputs - same outputs - forever".
DETERMINISTIC Clause
Specify DETERMINISTIC to indicate that the function returns the same result value
whenever it is called with the same values for its arguments.
but wait - you knew that.
We are using it as a constant during the execution of a single SQL statement, not OVER sql statements. We are NOT considering it to be deterministic. That is something we are, have been and will be allowed to do.
Again - please reference a specification for the language that says otherwise.
Stop thinking procedurally, stop thinking "row by row" - those concepts do not really exist in SQL as it is processed - they just don't.
Just like:
select *
from (select * from t order by x)
/
is 'dangerous', you might be lead to believe that it is the same as
select * from t order by x
but it isn't, it is the same as
select * from t
Oleksandr Alesinskyy, March 17, 2011 - 5:57 pm UTC
That said I always longed for a "statement-deterministic" clause that mark the function as deterministic inside statement (like SYSDATE).
March 17, 2011 - 6:22 pm UTC
We've had it for a while.
scalar subqueries achieve that to a large degree.
select *
from t where x = (select f() from dual)
/
will almost certainly cause F() to be evaluated exactly once per query execution - regardless of plan
select * from t where = (select f(t.y) from dual)
will cause f(t.y) to be evaluated as few times as possible given the caching that takes place.
And select * from t where x = (select f() from dual) is semantically equivalent to select * from t where x = F() - but might return a different result set if f() returns different values.
And select f() from t is semantically equivalent to select ( select f() from dual) from t - but again might return different results, they are the same - but because your function is "funky" - different.
caveat emptor.
not specified
Sokrates, March 18, 2011 - 3:04 am UTC
"...please reference a specification for the language that says otherwise. ..."
ok, I see, that's a point
I didn't find any specification in Oracle documentation concerning the question "how often a PL/SQL-function used in sql is evaluated during execution of this sql"
(I "know" - probably it is documented - SYSDATE in one sql is always the SYSDATE when the execution started, I "know" - probably it is documented - about scalar subquery caching, but I don't know much else about it).
So, that's a bit easy for a software vendor: don't specify how things work, and nobody can complain in the future when they will work another way like expected.
This has a number of (ugly) consequences, for example:
- there is nothing like "code stability" in Oracle when I am using PL/SQL-functions within SQL. My code can (and so, most likely, will) break in the future. Good news for developers who look for not being redundant in the future, bad news for developers who want to deliver clean and robust code.
- I *have* to code more "procedural". I can't rely on smart SQL-constructs like for example "order by dbms_random.value" anymore, instead next time I need a randomly ordered result, I will have to code it procedurally ( just one example ). Sigh, I always was proud of being more clever than these procedural guys when it came to "thinking sets", now I see some of the limits
Thanks for answering my question anyway !
March 18, 2011 - 8:00 am UTC
Or in the SQL language specification itself - that is what I was saying. Look to the *language* that is SQL.
Subquery caching is not documented. It is an internal, legitimate optimization.
there is nothing like "code stability" in Oracle when I am using PL/SQL-functions within SQL.
There is nothing like "code stability" when relying on side effects. For that I present to you:
select a,b,max(c) from t group by a,b;
A query that many thought "should order by a,b because I've seen it do it". Relying on a side effect.
This is nothing like "code stability" when relying on side effects. For that I present to you:
create table t ( x varchar2(20), y int );
select * from t where to_number(x) > 10 and y > 5;
Now, what if - whenever Y>5, X is known to be a number. Does that query work execute successfully or fail if there is some data in there where X is not convertable to a number?
"it depends"
Anytime you ascribe procedural like processing to a SQL statement - you are in danger.
... instead next time I need a randomly ordered result, I will have to code it procedurally ( just one example ). ...
please do go back and re-read, we covered that one, no you don't. If you make a function have unique inputs.... Then it cannot be constant.
scalar subquery caching
Laurent Schneider, March 18, 2011 - 3:53 am UTC
This is probably an illegal optimization that was introduced in Oracle8, and now it is a bit too late to expect to have
SELECT * FROM EMP WHERE DBMS_RANDOM.VALUE<.5
returning 50% of the rows.
Tom, it is GOOD TO KNOW how things works and how things breaks. It helps a lot
I won't change my mind that it is a misbehavior to cache something that is not cachable.
And estimating (DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE) to true will really look stupid for any programmer that will ever start working with Oracle
Your milage may vary
March 18, 2011 - 8:02 am UTC
And estimating (DBMS_RANDOM.VALUE=DBMS_RANDOM.VALUE) to true will really look
stupid for any programmer that will ever start working with Oracle
not as stupid as it looks all by itself, before it is executed :)
where random = random;
I would look at that and say "why?"
order by in a subquery
Laurent Schneider, March 18, 2011 - 4:19 am UTC
select *
from (select * from t order by x)
/
is 'dangerous', you might be lead to believe that it is the same as
select * from t order by x
but it isn't, it is the same as
select * from t
but this construct must work
SELECT * FROM
(SELECT empno FROM emp ORDER BY empno)
WHERE ROWNUM < 11;
because it is documented
March 18, 2011 - 8:03 am UTC
correct - and I purposely did not post it with rownum.
I've seen far too many bits of code using inline views with order by - and people expecting the order by to be "obeyed". It doesn't have to be unless you use rownum - which converts the query into a special "top-n query".
@Laurent
Sokrates, March 18, 2011 - 8:11 am UTC
top-N reporting works with
SELECT * FROM
(SELECT empno FROM emp ORDER BY empno)
WHERE ROWNUM < 11;
but you must not assume the result to be ordered by empno !
We've had it for a while.
Oleksandr Alesinskyy, March 18, 2011 - 8:15 am UTC
scalar subqueries achieve that to a large degree.
That's true. But with 2 caveats:
1. Readability is reduced.
2. For reasons unknown to me a usage of such subquieries often caused a drastic performance degradation, at least in 8i/9i.
will cause f(t.y) to be evaluated as few times as possible given the caching that takes place.
And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.
March 18, 2011 - 8:23 am UTC
can you provide a working example for point #2, my experience has been opposite
... And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.
...
again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language.
rand
Sokrates, March 18, 2011 - 8:54 am UTC
"
...please do go back and re-read, we covered that one, no you don't. If you make a function have unique inputs.... Then it cannot be constant. ..."
that may be true for 11.2
So, I should code (example)
create function rand(r in rowid) return number as begin return dbms_random.value; end rand;
and then
select * from (select * from <table> order by rand(rowid)) where rownum<=10
?
Probably, in V12 or 13 or 17, the optimizer discovers that rand does not depend on its input parameter and decides "hey, we optimize that and evaluate rand only once in this query".
Who tells me today it won't ?
Non-Deterministic functions in sql?
APH, March 18, 2011 - 9:30 am UTC
I'm not really clear on how a non-deterministic function is particularly useful in sql. I'll refer to this example:
create table t ( x int, y int );
select * from t where x = f();
How could someone possibly determine whether f() gets called once for the entire query, or once for each row? Seems ambiguous, meaning the behavior is undefined (don't use it). Same in this case:
select x, f(), from t;
Again, once or once per row? Seems it could go either way.
I suppose you could "cheat" the optimizer by "lying" to it. Say we made f() deterministic, even if it is not (bad idea already, I know). Then, we give it a dummy parameter which has no affect on the output. Wouldn't the optimizer be forced to evaluate the function for every unique argument value? Something like this:
select x, f(dummy=>sys_guid()) from t;
Forgive my ignorance, but I'm having trouble wrapping my head around the whole idea of how non-deterministic functions make sense in a query.
March 18, 2011 - 10:53 am UTC
I'm not really clear on how a non-deterministic function is particularly useful in sql
this is a good point - I'd change the word useful to "safe".
in a procedural language, you have complete control over when and how things are called. In SQL you do not.
If you have a query that *depends* on a function getting called once per row because of its very nature (say it increments a counter and you depend on that counter), you are doing something very very unsafe.
Wouldn't the optimizer be forced to evaluate the function for every unique argument value?
Today it does, but as Sokrates pointed out above (something I hadn't really thought about yet...) - perhaps in the future, the optimizer will query metadata (that doesn't exist in version 11) about the function and see the function doesn't really depend on the input (it is dummy input, never referenced) so it has actually been optimized away - it doesn't really "happen", the function call is known to be "f()"....
In which case, you'd have to reference it in some fashion ;)
Forgive my ignorance, but I'm having trouble wrapping my head around the whole idea of how non-deterministic functions make sense in a query.
This is a very very good point. Deterministic at least for the LIFE OF A QUERY. They need not be deterministic "forever". For example:
select * from t where x = sys_context('myctx', 'x');
sys_context is DEFINITELY not deterministic, it would be useless if it were. But it is considered deterministic for the life of a query (it works much like a bind does).
If the value returned by sys_context changed in mid-stream - the results would be rather "meaningless" and unpredicable (it would be driven by the query plan, how the data is accessed and that could change from execution to execution)
So, the bottom line is - if you care
o how many times
o when
o if
o in what order
your function with a given set of inputs is invoked in SQL - you are probably doing something wrong.
Who is this "Sokrates"?
Parthiban Nagarajan, March 18, 2011 - 12:01 pm UTC
I'm really eager to know who is this "Sokrates"? In Jonathan Lewis's blog, he is using a shortened URL which then redirects to a Wikipedia's page about the original, famous Socrates.
Anyway, good work Sokrates. This post has grown rapidly in size. Within 4 days, a lot of followups/comments. Keep it up Sokrates :-)
Oleksandr Alesinskyy, March 18, 2011 - 12:44 pm UTC
can you provide a working example for point #2, my experience has been opposite
Unfortunately, not now - I have met it long time ago (2003?) and do not remember all circumstances to model them.
Should add that I was very surprised because my previous experience was more in line with yours. If I would be able to reproduce this behavior in 11g (the only version to which I have an access right now) I would post here.
again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language.
How? I want declaratively describe the contract of my function - and such declarative description is perfectly in line with SQL philosophy.
As for a usability of non-deterministic functions in SQL - the same DBMS_RANDOM.value provides a perfect example of potentially usable non-deterministic function.
March 18, 2011 - 12:51 pm UTC
... And who says that these results are suitable to be cached? This is purpose of the (proposed) clause to define their suitability.
...
again we are right back to square one with someone ascribing to a non-procedural language some aspects of a procedural language. but it doesn't have anything to do with caching in general, it is all about the execution plan. The execution plan also dictates what is called, when and how frequently. It still goes back to "procedural processing".
there are 100 records such that x=10
there are 10 records such that y > 10 and x = 10
select *
from (select f(), y from t where x = 10)
where y > 10;
Semantically, the same as
select f(), y from t where x = 10 and y > 10;
How many times should f() be invoked.
I vote for "one", regardless of the "deterministic" property of f(). If f() returns arbitrary data for each call - then anything f() can return is acceptable and fine.
Give me a real world use case where by this is not true. let's make this real for a minute. Give me an example of a function f() that returns a different value each time it is called and has real world meaning.
A reader, March 18, 2011 - 1:19 pm UTC
Hi Tom,
there is no input parameter to f() i.e. f(a) it may be executed once but if there is column pass to f() i.e f(col1) than it will executed for every row right?
so in the query below it will call 100 times
select *
from (select f(), y from t where x = 10)
where y > 10;
and 10 times if above query is transform as below
select f(), y from t where x = 10 and y > 10;
Thanks..
March 18, 2011 - 1:27 pm UTC
there is no input parameter to f() i.e. f(a) it may be executed once but if
there is column pass to f() i.e f(col1) than it will executed for every row
right?
not necessarily, there is no reason a cache of prior f(col1) calls done during the execution of that query could not be maintained. The function would be invoked for the number of DISTINCT values of col1 at least.
so in the query below it will call 100 times
select *
from (select f(), y from t where x = 10)
where y > 10;
and 10 times if above query is transform as below
select f(), y from t where x = 10 and y > 10;
Thanks..
or once - but the point is, it might be 1, it might be 10, it might be 100, it might be some value other than 1, 10, or 100.
Sokrates ...
Sokrates aka Matthias Rogel, March 20, 2011 - 5:57 am UTC
( to Parthiban Nagarajan from Coimbatore, India )
... is just an Oracle enthusiast
(since the first day I worked with this software in the spring of 1992), who likes to ask questions
Because I have such a long name in real life, I use "Sokrates" when asking questions
in order to save keystrokes :-)
I like questions starting with "how"
but my favourite questions start with "why"
( like the Sokrates on
http://tinyurl.com/2xft2k )
A reader, March 20, 2011 - 4:13 pm UTC
there are 100 records such that x=10
there are 10 records such that y > 10 and x = 10
select *
from (select f(), y from t where x = 10)
where y > 10;
Semantically, the same as
select f(), y from t where x = 10 and y > 10;
How many times should f() be invoked.
At least ten - once for each row in the result.
And, strictly speaking, in the presence of non-deterministic function these 2 queries are not equivalent.
If f() returns arbitrary data for each call - then anything f() can return is acceptable and fine.
Definitely, not. E.g. I want to generate data with some distribution.
Give me an example of a function f() that returns a different value each time it is called and has real world meaning.
As said above DBMS_RANDOM.VALUE, ROWNUM(). DENSE_RANK(),...
March 20, 2011 - 4:31 pm UTC
why not 100? I was expecting that to be a stance - since
from (select f(), y from t where x = 10)
would find 100 rows.
They are equivalent - if the function is non-deterministic (eg: like random), the output of any of them are equivalent - repeated runs of the query (even with the same plan) would not return the same result set. The entire THING is non-deterministic.
ROWNUM, ROW_NUMBER, DENSE_RANK do not count - for they are (assuming the order by is ordering by something UNIQUE) - deterministic. Remember - part of their inputs is the order by clause.
So, beyond "random" (which we've discussed how to "fix") - give a real world example. rownum and dense rank/rank etc do not count for they are in fact deterministic (and have to be used with real care if you do not order by something unique)
Oleksandr Alesinskyy, March 20, 2011 - 5:03 pm UTC
why not 100? I was expecting that to be a stance - since
from (select f(), y from t where x = 10)
would find 100 rows.
100 would be nice - I said at least 10.
They are equivalent - if the function is non-deterministic (eg: like random), the output of any of them are equivalent - repeated runs of the query (even with the same plan) would not return the same result set. The entire THING is non-deterministic.
Probably yes, probably, no. Non-deterministic is not very good word. The real thing is it cannot be determined solely from parameters - but it may be very deterministic if an initial state is taken into account.
ROWNUM, ROW_NUMBER, DENSE_RANK do not count - for they are (assuming the order by is ordering by something UNIQUE) - deterministic. Remember - part of their inputs is the order by clause.
ORDER BY clause is not present in their parameter list :) So, strictly speaking, they are non-deterministic -in a sense of the DETERMINISTIC clause. They depend on the state.
March 21, 2011 - 9:44 am UTC
As I said, I was expecting 100 to be your stance - not an option - based on your prior statements.
Non-deterministic is the only way I can describe the output from a query using dbms_random in any way. Repeated runs against the same inputs result in different outputs. Even if we start with the same seed. All it takes is a plan change.
Order by is part of their definition. It is part of their *inputs*. Deterministic is not so narrowly defined as to be dependent on formal named parameters. They do not depend on a state, they depend on their inputs. Try using row_number without order by - it is a necessarily component, it is an input to the function. Just as the partitioning and windowing clauses are. The definition of row_number even mentions this:
For consistent results, the query must ensure a deterministic sort order.
sys_context
James Su, March 21, 2011 - 9:01 am UTC
Hi Tom,
A trick that we use to pass parameter to a dynamic SQL or VIEW is sys_context. So it is possible this function will be called for each row? Then it doesn't perform as good as bind variables.
March 21, 2011 - 10:13 am UTC
it is designed to be called once per query - working just like a bind.
however, if you wrap it in other functions, it can be invoked more than once.
"it depends" and we have no control over it...
Oleksandr Alesinskyy, March 22, 2011 - 6:59 am UTC
Followup March 21, 2011 - 9am Central time zone:
As I said, I was expecting 100 to be your stance - not an option - based on your prior statements.
Yes, it would be even better - but it is mainly the matter of definition, that means how we specify semantics for such usages. From my point of view current Oracle semantics is unintuitive and unreasonable. It would be much more honest to prohibit to use any non-deterministic function in SQL.
Non-deterministic is the only way I can describe the output from a query using dbms_random in any way.
Sure, but my comments were not tied to DBMS_RANDOM. They addressed any function with results depending not only on its arguments.
Order by is part of their definition. It is part of their *inputs*. Deterministic is not so narrowly defined as to be dependent on formal named parameters.
Really? Then I have to quote a definition of the DETERMINISTIC clause again:
Specify DETERMINISTIC to indicate that the function returns the same result value whenever it is called with the same values for its arguments.
They do not depend on a state, they depend on their inputs.
They depend oh the state of the query processing, on the pre-history. You may consider it as an input of the state maintained outside - but anyhow it is not an argument.
March 22, 2011 - 8:46 am UTC
From my point of view current Oracle semantics is unintuitive and unreasonable. Let me rewrite that for you
From my point of view current SQL semantics is unintuitive and unreasonable.
You keep trying to force procedural constructs and rules on a language that proclaims as loud as it can "I am not procedural"
Specify DETERMINISTIC to indicate that the function returns the same result value whenever it is called with the same values for its arguments.
I am talking about the word DETERMINISTIC in general, in life, in the dictionary. I am not talking about the attribute deterministic as applied to a PLSQL function.
And I will maintain that conceptually speaking - order by is an INPUT to an analytic function. It is an input.
but anyhow it is not an argument.and it doesn't have to be.
http://dictionary.reference.com/browse/deterministic See the computing specific definition. The definition of the row_number function even references the word in respect to the order by.
Oleksandr Alesinskyy, March 23, 2011 - 5:03 pm UTC
You keep trying to force procedural constructs and rules on a language that proclaims as loud as it can "I am not procedural"
Which procedural constructs I want to force into SQL? Where you have noticed them? I am speaking exclusively about declarative constructs. Constructs that defines semantics of the outcome but not a way how this semantics will be achieved. Yes, I explained in which way an outcome is attainable - but it is just an example.
I am talking about the word DETERMINISTIC in general, in life, in the dictionary. I am not talking about the attribute deterministic as applied to a PLSQL function.
And I am talking exactly about this DETERMINISTIC clause - because its very existence proves that functions tagged with it should be treated differently then functions without such tagging. Otherwise this clause becomes senseless.
March 23, 2011 - 5:46 pm UTC
... Otherwise this clause becomes senseless. ...
No it doesn't. Not in the least. Not at all.
In order to create an index on a function - the function must be 100% deterministic. It is a requirement to be indexed.
In order for this function to be called once or one hundred times or five times or N times in fact:
select * from one_hundred_row_table where x = f();
F() does not need to be deterministic - it is not a requirement, it never has been, it never will be.
to_date( '2005', 'yyyy' ) is not deterministic, we only have to evaluate it once if we want to - and in fact - we do, we have and will...
We might call F() once (current releases for start/stop key for an index).
We might call F() twice (olders releases - for start stop key)
We might call F() 100 times (no index)
We might call F() once (versions to come)
We might call F() N times - because that is the way it works, you have no control over it. Never had, never will.
And if we called F() 100 times - once per row - the results of that query would not be repeatable, not deterministic itself. At that point - given you have no control over the order of rows being processed (and never will), that you have no control over the number of times the function will be invoked (never will) - pretty much any answer is acceptable, you have said as much by using a non-deterministic function that you have no control over.
Pipelined functions?
Tom, March 23, 2011 - 7:27 pm UTC
This has been an interesting discussion thread, but if someone needs that level of procedural control over a result set, why aren't they using pipelined functions?
Oleksandr Alesinskyy, March 24, 2011 - 5:39 am UTC
In order for this function to be called once or one hundred times or five times or N times in fact:
select * from one_hundred_row_table where x = f();
F() does not need to be deterministic - it is not a requirement, it never has been, it never will be.
I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.
to_date( '2005', 'yyyy' ) is not deterministic, we only have to evaluate it once if we want to - and in fact - we do, we have and will...
It is what I call "statement-deterministic". But there is no way to mark f() this way.
works, you have no control over it. Never had, never will.
Yes, indeed - their is very hard to specify reasonable semantics for non-deterministic function in the where clause. Anyhow, even in this case it might bepossible to guarantee at least one call per row in the final result set - in my opinion it would be less efficient but much more intuitive.
But for select list, group by, order by quite reasonable semantics may be specified and implemented - and without a major performance penalty.
Anyhow, we already are going in circles so I do not see any point to continue this discussion any further.
July 30, 2012 - 11:44 am UTC
I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.
and we'll have to agree to DISAGREE. It would mean for example that:
select * from t where dt = to_date( '2005', 'yyyy' )
could *never* use an index on DT. Once again, you are ascribing procedural flow of control to something that says "I AM NOT PROCEDURAL"
It is what I call "statement-deterministic". But there is no way to mark f() this way.
Uh-uh - no way. If you run that shortly before midnight on the last day of the month and it runs for a minute or two - THE VALUE CHANGES. You cannot pick and choose what is "statement deterministic" or not. Basically, anything that is non-deterministic EVER would have to fall into your rule. There cannot be exceptions like that just because you (and you personally) feel it would be OK.
But for select list, group by, order by quite reasonable semantics may be specified and implemented - and without a major performance penalty.
I cannot believe you wrote that really - the "without a major performance penalty".
Oleksandr Alesinskyy, March 24, 2011 - 5:30 pm UTC
I start to feel that you deliberately misinterpret my words.
select * from t where dt = to_date( '2005', 'yyyy' )
could *never* use an index on DT.
And what? How this statement is related to our discussion?
Once again, you are ascribing procedural flow of control to something that says "I AM NOT PROCEDURAL"
It is no more procedural as ROWNUM pseudocolumn behavior. Or LEVEL or - name it yourself.
It is what I call "statement-deterministic". But there is no way to mark f() this way.
Uh-uh - no way. If you run that shortly before midnight on the last day of the month and it runs for a minute or two - THE VALUE CHANGES.
That's true, I completely forgot about this peculiarity (another rather unintuitive behavior - why default month is current but default day is 1?).
You cannot pick and choose what is "statement deterministic" or not. Basically, anything that is non-deterministic EVER would have to fall into your rule.
??? It even does not matter if the function is really statement-deterministic or is completely deterministic. It rather matter of the possibility to specify semantics.
There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?
I cannot believe you wrote that really - the "without a major performance penalty".
Exactly - without a major performance penalty (unless the calculation of the function itself is very expensive). If it is not used in the where clause at some level of query it execution may be delayed to the last moment. So it does not affect your optimizations.
March 25, 2011 - 10:50 am UTC
And what? How this statement is related to our discussion?
because you say that a non-deterministic function could and should be evaluated for each row. I quoted you:
<quote>I argue that if f() is not deterministic (or "statement-deterministic") Oracle has no right to call its once only.
</quote>
If that were the case, if what you said was to be true - then the where clause I wrote using to_date('2005','yyyy') could never be evaluated just once for the query, it would have to be evaluated for each row and an index on dt could not be used to evaluate the result set - we'd have to full scan. For to_date(,'yyyy') is NOT deterministic, not at the statement level - not at any level.
There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?
either non-deterministic functions are evaluated once per row per use in the row - or they are not. You cannot go both ways.
(unless the calculation of the function itself is very expensive
simply the OVERHEAD of invoking the function - even if the function does nothing - is HUGE and don't forget:
select * from one_billion_row_table where indexed_column_with_unique_values = f();
would have to full scan - no indexes. I would call that *expensive*.
ORDER BY in inline view
Parthiban Nagarajan, March 25, 2011 - 12:12 am UTC
Hi Tom
Please review the following code. It seems what you've said in a follow up earlier is not true. Could you explain?
select * from (select * from t order by x) behaves same as
select * from t order by xSQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for IBM/AIX RISC System/6000: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> drop table t7 purge;
drop table t7 purge
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL> create table t7 as select * from all_objects where rownum < 11 order by created desc;
Table created.
SQL> exec dbms_stats.gather_table_stats(user,'t7');
PL/SQL procedure successfully completed.
SQL> set autot traceonly explain
SQL> select * from t7;
Execution Plan
----------------------------------------------------------
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 760 | 3 |
| 1 | TABLE ACCESS FULL| T7 | 10 | 760 | 3 |
----------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
SQL> select * from t7 order by created;
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 760 | 4 |
| 1 | SORT ORDER BY | | 10 | 760 | 4 |
| 2 | TABLE ACCESS FULL| T7 | 10 | 760 | 3 |
-----------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
SQL> select * from (select * from t7 order by created);
Execution Plan
----------------------------------------------------------
-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 760 | 4 |
| 1 | SORT ORDER BY | | 10 | 760 | 4 |
| 2 | TABLE ACCESS FULL| T7 | 10 | 760 | 3 |
-----------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
SQL> drop table t7 purge;
Table dropped.
SQL> spool off
March 25, 2011 - 11:05 am UTC
select * from (select * from t order by x) behaves same as select * from t order by x
Let me ask you something:
if you flip a coin 500 times
and the coin lands heads up 500 times
did you just prove that coins, when flipped, always land heads up?
I wrote, very specifically:
Just like:
select *
from (select * from t order by x)
/
is 'dangerous', you might be lead to believe that it is the same as
select * from t order by x
but it isn't, it is the same as
select * from t
The optimizer is perfectly free to remove your order by if it so desires (without rownum being involved in any case).
It might, it might not, it is not promised that the data will be returned ordered. It is a dangerous construct.
Just as in the past people have "observed" that
select x, max(y) from t group by x;
returned data ordered by x, they 'counted on it', they got burnt badly when group by stopped using sorts most of the time instead of using sorts most of the time.
Put the order by on the final result to get data ordered by something.
Oleksandr Alesinskyy, March 31, 2011 - 11:12 am UTC
There cannot be exceptions like that just because you (and you personally) feel it would be OK.
Exceptions from what?
Sysdate is even less deterministic - but it always called once and only once. Oracle effectively declares it to be a "statement-deterministic".
either non-deterministic functions are evaluated once per row per use in the row - or they are not. You cannot go both ways.
Definitely, I can. It is not about the nature of the function as such as about declared semantic. If I declare a function to be a "statement-deterministic" (as Oracle do with sysdate, may propse another name if you do not like "statement-deterministic") then Oracle
must to call it once only.
simply the OVERHEAD of invoking the function - even if the function does nothing - is HUGE and don't forget:
Only due to the current Oracle implementation, it is not an inherent property of such calls (yes, there were good reasons to implement it so).
would have to full scan - no indexes. I would call that *expensive*. No amount of the performance improvement justifies a badly broken semantics. Never, ever!Otherwise all programs would consist from the single Assembler instruction
ret
April 12, 2011 - 10:19 am UTC
show me where the semantics are defined that require it to be called N times in a predicable fashion?
SQL is not like that, it doesn't work that way. The use of non-statement deterministic functions is always going to be somewhat unpredicable.
Bruno Vroman, April 03, 2011 - 10:50 am UTC
Very interesting discussion. I better understand what I had misunderstood, see "My Oracle Support" forum:
http://tinyurl.com/filetering-with-dbms-random Thank you Tom for the detailed explanations, and tanks a lot for the "workaround": yes, we can make sure that the function is called once for each row ;-)
Best regards!
What about more than once per row?
Gus, July 19, 2012 - 7:44 am UTC
Hi Tom,
You said way back up at the top of the thread that we can't be sure whether a function will be called once per query, once per row, or something else. Does that mean it could be called more than once, or even zero times per row?
WITH tbl AS (SELECT 1 AS id, 'X' AS data FROM dual UNION ALL
SELECT 2, 'Y' FROM dual UNION ALL
SELECT 3, 'Z' FROM dual
)
SELECT data
FROM tbl
WHERE id = trunc(dbms_random.value(1, 4))
;
This query returns some number of rows between zero and three for me in 11gr1 and 10gr2XE.
Testing further with
DECLARE
TYPE result_tbl IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
l_results result_tbl;
l_result NUMBER;
FUNCTION f RETURN NUMBER
AS
l_count NUMBER;
BEGIN
WITH tbl AS (SELECT 1 AS id, 'A' AS data FROM dual UNION ALL
SELECT 2, 'B' FROM dual UNION ALL
SELECT 3, 'C' FROM dual UNION ALL
SELECT 4, 'D' FROM dual UNION ALL
SELECT 5, 'E' FROM dual UNION ALL
SELECT 6, 'F' FROM dual UNION ALL
SELECT 7, 'G' FROM dual UNION ALL
SELECT 8, 'H' FROM dual UNION ALL
SELECT 9, 'I' FROM dual UNION ALL
SELECT 10, 'J' FROM dual
)
SELECT count(data)
INTO l_count
FROM tbl
WHERE id = trunc(dbms_random.value(1, 11))
;
RETURN l_count;
END;
BEGIN
FOR i IN 1..100000 LOOP
l_result := f();
BEGIN
l_results(l_result) := l_results(l_result) + 1;
EXCEPTION WHEN no_data_found THEN
l_results(l_result) := 1;
END;
END LOOP;
FOR i IN l_results.FIRST..l_results.LAST LOOP
dbms_output.put_line('i: ' || i || ', count: ' || l_results(i));
END LOOP;
END;
/
gives me results like
i: 0, count: 34641
i: 1, count: 39201
i: 2, count: 19140
i: 3, count: 5726
i: 4, count: 1113
i: 5, count: 153
i: 6, count: 23
i: 7, count: 3
I get the same results regardless of whether I select from a table or an inline view.
I can understand getting multiple results (and the distribution of rows returned that is greater than 1) per id since I can't guarantee how many times dbms_random.value will be called, but surely I can expect it to be called at least once?
But if I can't then why would the query above ever return no rows?
July 19, 2012 - 12:27 pm UTC
... but surely I can expect it to be called at
least once?
...
yes it would be called at least once.
but what if it were called more than once. as it likely is. then each time
where id = trunc(dbms_random.value(1,11))
is evaluated, a new random number is returned. so can you see why it would frequently return zero rows. A given row would have only a 1 in 10 chance of being returned.
Rate conversion
Ashim Basak, July 24, 2012 - 2:50 am UTC
HI,
I have a source Rate table like :
From Currency To Currency Conversion Rate
A B 2
B C 2.5
D B 3
B K 1.4
E F 3.1
R C 2.2
A L 1.9
I want to build the target rate table look like:
From Currency To Currency Conversion Rate
A B 2
B C 2.5
A C 5
D B 3
B K 1.4
A K 2.8
E F 3.1
R C 2.2
A L 1.9
Means there is no direct conversion between A to C but has indirect path like A to B and B to C.
A C 5
Also A to B and B to K so get A to K :
A K 2.8
Please suggets the way in a single sql query to solev this problem...
July 30, 2012 - 9:07 am UTC
no creates
no inserts
no look
Thanks
Gus, July 27, 2012 - 11:02 am UTC
Thanks Tom, for some reason I didn't even consider the possibility that it would be called once per comparison.
I really appreciate all the work you put in helping people out, I'm constantly impressed by how well you mentor anonymous people. Reading Ask Tom has taught me probably most of what I know about both Oracle and SQL in general.
@Ashim Basak
STEVE H, July 27, 2012 - 12:44 pm UTC
What does your question have to do with random numbers. Are you using random numbers for your currenecy conversions.....?
interesting difference in behaviour
Sokrates, November 03, 2012 - 4:38 pm UTC
sokrates@11.2 > create function rand return number is begin dbms_output.put_line('rand called'); return dbms_random.value; end rand;
2 /
Function created.
sokrates@11.2 > set autotr on explain
sokrates@11.2 > with x as (select dbms_random.value r from dual) select * from x where r=r and r!=r;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 885813501
-------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 2 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("R" IS NOT NULL AND "R"<>"R")
how often dbms_random.value was called here ?
I would guess "once".
However, different behaviour with user-defined function rand:
sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;
R
----------
.215245124
rand called
rand called
rand called
rand called
Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | FILTER | | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)
November 05, 2012 - 8:59 am UTC
and further:
ops$tkyte%ORA11GR2> with x as (select /*+ materialize */ rand r from dual) select * from x where r=r and r!=r;
no rows selected
rand called
Execution Plan
----------------------------------------------------------
Plan hash value: 3040811146
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6644_5FD48AF | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 4 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6644_5FD48AF | 1 | 13 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("R" IS NOT NULL AND "R"<>"R")
calling functions that are not statement level deterministic is dicey from SQL.
Al Ricafort, November 05, 2012 - 1:04 am UTC
This topic has totally altered my perception about function in a SQL.
Mind Reading
Chuck Jolley, November 05, 2012 - 12:39 pm UTC
Wow, as I got close to catching up to date on this question I started to ponder the effect of WITH and materialize on it.
Bingo, next scroll down there is Socrates and Tom discussing exactly those things.
A reader, November 05, 2012 - 1:49 pm UTC
Hello Sir,
why optimizer put - filter("R" IS NOT NULL AND "R"<>"R")
when its obivious that r=r and r!=r from the query that it will never be true and no rows will be produce? can you please explain.
Thank you
November 05, 2012 - 2:00 pm UTC
that would just be a micro-optimization for a case that would never really occur in real life (who would be coding r=r and r<>r??) that they didn't make. in other words, this isn't real world, so why have an optimization branch for it.
A reader, November 05, 2012 - 2:39 pm UTC
Thanks Sir for quick response to clarify my understanding further more...
sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;
R
----------
.215245124
rand called
rand called
rand called
rand called
Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | FILTER | | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)
is it above query get un-nested as below
select rand() from dual where rand()=rand() and rand()!=rand()
so rand() will be call first time in select, then second time during evaluting condition rand()=rand() and during last condition its call 2 times
and thats how its print the message 4 times?
Please correct me if i am wrong.
Thank you
November 05, 2012 - 2:42 pm UTC
I believe so, once for the projection and three times for the rewritten predicate.
Chuck Jolley, November 05, 2012 - 8:26 pm UTC
That one bothers me.
If the function were called in place of the r references the way it winds up working seems intuitive, accessing it through the WITH clause not so much.
How is the WITH clause defined?
November 06, 2012 - 7:43 am UTC
I'm not sure what you mean?
to "A Reader"
Sokrates, November 06, 2012 - 2:07 am UTC
is it above query get un-nested as below
select rand() from dual where rand()=rand() and rand()!=rand()
so rand() will be call first time in select, then second time during evaluting condition rand()=rand() and during last condition its call 2 times
and thats how its print the message 4 times? Actually the order of calls seem to differ slightly, the result of the select seems to be the last call, not the first call, at first the conditions are evaluated:
sokrates@11.2 > create or replace
2 function rand return number
3 is
4 r number := dbms_random.value;
5 begin
6 dbms_output.put_line('rand called - will return ' || r);
7 return r;
8 end rand;
9 /
Function created.
sokrates@11.2 > with x as (select rand r from dual) select * from x where r=r and r!=r;
R
----------
.556546596
rand called - will return .3848935116626815464233585003749046243
rand called - will return .97113274370680783878482004983917496957
rand called - will return .36883776673117203532610477335486686666
rand called - will return .55654659631275442000486278461821038313
Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | FILTER | | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)
The select delivers the result of the last call.
To "A reader"
Sokrates, November 06, 2012 - 2:27 am UTC
why optimizer put - filter("R" IS NOT NULL AND "R"<>"R")
when its obivious that r=r and r!=r from the query that it will never be true and no rows will be
produce? That's a very interesting question in my optinion.
Assume, that the optimizer would see the "obvious contradiction" and from that would deliver "no rows selected". In other words, you would expect that the optimizer would "see"
any "obvious" contradiction in a SQL, wouldn't you ?
But this wouldn't be a "micro-optimization", quite the opposite: the implementation of such an optimization would have to solve the boolean satisfiability problem, which is known to be NP-hard, see
http://en.wikipedia.org/wiki/Boolean_satisfiability_problem
Chuck Jolley, November 06, 2012 - 10:19 am UTC
Well, I can understand why rand() could be called 4 times in the case of Socrates query written without the WITH clause for the reasons you've given.
But for the life of me I can't understand why the optimizer would simply rewrite the WITH clause out of the query as given just above.
At least that's the only explanation I can think of for the 4 calls to rand()
I thought the WITH clause was kind of like a directive to "do this first and reuse the results"
Chuck Jolley, November 06, 2012 - 1:40 pm UTC
Interesting.
So WITH can be thought of as an alternate inline view syntax?
Our database even rewrites it when a materialize hint is used:
chajol_dba@taxtest>ed
Wrote file afiedt.buf
1 with x
2 as (/*+ materialize */
3 select rand r from dual)
4 select *
5 from x
6* where r=r and r!=r
chajol_dba@taxtest>/
R
----------
.140225199
1 row selected.
rand called - will return .76853973916567692167596730551150623272
rand called - will return .5396196175405806460117863850708532041
rand called - will return .37414826008658638489073832782549908657
rand called - will return .14022519860129962410244274383917663374
Elapsed: 00:00:00.02
Execution Plan
----------------------------------------------------------
Plan hash value: 4034615273
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | FILTER | | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RAND"()<>"RAND"() AND "RAND"() IS NOT NULL)
chajol_dba@taxtest>
I'm lucky in that I sit a mere tens of feet from my users and can fix things, if a plan goes bad, in a matter of minutes.
November 06, 2012 - 2:44 pm UTC
So WITH can be thought of as an alternate inline view syntax?
in a way, yes. It can either be materialized into a temp table (it is sort of a hint that that might be good) or can be merged into the query directly - just like any other view or inline view.
you put the materlize hint in the wrong place, it is just a comment there.
Look up a couple of reviews, I did the materialize example already.
Chuck Jolley, November 06, 2012 - 1:48 pm UTC
On the other hand:
chajol_dba@taxtest>ed
Wrote file afiedt.buf
1 with x
2 as (select rand() r from dual connect by level < 2)
3 select *
4 from x
5* where r=r and r!=r
chajol_dba@taxtest>/
no rows selected
rand called - will return .48506043344076145464260047750362969475
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2403765415
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:01 |
|* 1 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
|* 2 | CONNECT BY WITHOUT FILTERING| | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("R" IS NOT NULL AND "R"<>"R")
2 - filter(LEVEL<2)
chajol_dba@taxtest>
Chuck Jolley, November 06, 2012 - 5:49 pm UTC
OK, thanks
Chuck Jolley, November 07, 2012 - 8:55 am UTC
Yeah, it was a typo on my part. The materialize hint does change the plan. I put the corrected code here just for completeness.
These edge cases are interesting because they are like the C code you can find out there that is designed to trick compilers into making mistakes.
chajol_dba@taxtest>ed
Wrote file afiedt.buf
1 with x
2 as
3 (select /*+ materialize */
4 rand r
5 from dual)
6 select *
7 from x
8* where r=r and r!=r
chajol_dba@taxtest>/
no rows selected
rand called - will return .97600706740531863178218579740983563455
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 791862530
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 4 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D683A_5017A14F | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 4 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 5 | TABLE ACCESS FULL | SYS_TEMP_0FD9D683A_5017A14F | 1 | 13 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("R" IS NOT NULL AND "R"<>"R")
chajol_dba@taxtest>
Distributing a given number between multiple columns
Thiru, July 30, 2014 - 7:19 am UTC
I have a requirement wherein a value in a given column needs to be distributed amongst two or three columns equally or otherwise.
For eg:
CREATE TABLE TT(c1 number,c2 number, c3 number,c4 number)
insert into tt values(.33,.33,.34,3)
insert into tt values(.33,.33,.34,5)
insert into tt values(.5,.5,null,5)
the column c4 needs to be distributed amongst c1,c2,c3 provided the values in c1,c2,c3 are not 0 or 1.
So in the above case, I would like to have the output as:
1,1,1,3
1,2,2,5 this could also be 2,1,2,5 or any such distrubtion
2,3,null,5
Trying to use DBMS_RANDOM but looks like I am not able to achieve
Thanks a lot
Thiru