Hello All,
Today, while commenting on a blog post, I found a weird behavior of the NTH_VALUE function,
and I wonder whether it is not a bug.
It's true, I performed my test on LiveSQL only, but I am not sure that this could be the cause.
It can be reproduced as follows:
create table tmp as
with x as (
select level r#
, to_char( to_date(level,'J'),'Jsp') as nm
from dual connect by level < 10
),
y as (
select
r#, nm
, ROW_NUMBER() OVER(ORDER BY DBMS_RANDOM.VALUE()) AS rn
from x
)
select * from y
/
Table created.
select r#, nm, rn
, NTH_VALUE(nm, rn) OVER(order by r#) shuffle
from tmp
/
R# NM RN SHUFFLE
-------------------------------
1 One 2 -
2 Two 1 One
3 Three 8 One
4 Four 3 One
5 Five 9 One
6 Six 4 One
7 Seven 5 One
8 Eight 7 One
9 Nine 6 One
9 rows selected.
This query uses the default window of "range between unbounded preceding and current row",
which is not the one I intended, but, anyway, the result is buggy !
With more explicit windows, we get the following:
select r#, nm, rn
, NTH_VALUE(nm, rn) from first
OVER(order by r#
range between unbounded preceding and current row) shuffle1
, NTH_VALUE(nm, rn) from first
OVER(order by r#
rows between unbounded preceding and current row) shuffle2
, NTH_VALUE(nm, rn) from last
OVER(order by r#
range between unbounded preceding and current row) shuffle3
, NTH_VALUE(nm, rn) from last
OVER(order by r#
rows between unbounded preceding and current row) shuffle4
from tmp
/
R# NM RN SHUFFLE1 SHUFFLE2 SHUFFLE3 SHUFFLE4
--------------------------------------------------------------------------------
1 One 2 - - - -
2 Two 1 One One Two Two
3 Three 8 One One - -
4 Four 3 One One Two Two
5 Five 9 One One - -
6 Six 4 One One Three Three
7 Seven 5 One One Three Three
8 Eight 7 One One Two Two
9 Nine 6 One One Four Four
9 rows selected.
Here we see that the result columns SHUFFLE1 and SHUFFLE2 which use the (default) FROM FIRST clause
are wrong, while the result columns SHUFFLE3 and SHUFFLE4 which use FROM LAST are correct.
If we use a fix offset in the function, then we still see similar wrong results:
select r#, nm, rn
, NTH_VALUE(nm, 5) from first
OVER(order by r#
range between unbounded preceding and current row) shuffle1
, NTH_VALUE(nm, 5) from first
OVER(order by r#
rows between unbounded preceding and current row) shuffle2
, NTH_VALUE(nm, 5) from last
OVER(order by r#
range between unbounded preceding and current row) shuffle3
, NTH_VALUE(nm, 5) from last
OVER(order by r#
rows between unbounded preceding and current row) shuffle4
from tmp
/
R# NM RN SHUFFLE1 SHUFFLE2 SHUFFLE3 SHUFFLE4
--------------------------------------------------------------------------------
1 One 2 - - - -
2 Two 1 - - - -
3 Three 8 - - - -
4 Four 3 - - - -
5 Five 9 Five Five One One
6 Six 4 Five Five Two Two
7 Seven 5 Five Five Three Three
8 Eight 7 Five Five Four Four
9 Nine 6 Five Five Five Five
9 rows selected.
Finally, the intended result was in fact one that specifies a window containing ALL rows,
and then ALL the results are correct:
select r#, nm, rn
, NTH_VALUE(nm, rn) from first
OVER(order by r#
range between unbounded preceding and unbounded following) shuffle1
, NTH_VALUE(nm, rn) from first
OVER(order by r#
rows between unbounded preceding and unbounded following) shuffle2
, NTH_VALUE(nm, rn) from last
OVER(order by r#
range between unbounded preceding and unbounded following) shuffle3
, NTH_VALUE(nm, rn) from last
OVER(order by r#
rows between unbounded preceding and unbounded following) shuffle4
from tmp
/
R# NM RN SHUFFLE1 SHUFFLE2 SHUFFLE3 SHUFFLE4
--------------------------------------------------------------------------------
1 One 2 Two Two Eight Eight
2 Two 1 One One Nine Nine
3 Three 8 Eight Eight Two Two
4 Four 3 Three Three Seven Seven
5 Five 9 Nine Nine One One
6 Six 4 Four Four Six Six
7 Seven 5 Five Five Five Five
8 Eight 7 Seven Seven Three Three
9 Nine 6 Six Six Four Four
9 rows selected.
Thanks a lot in advance if you can look into this issue, in more versions/environments,
because it looks to me like a bug that maybe should be reported to Oracle.
Best Regards,
Iudith Mentzel
I agree with you, because the docs state for the 'n' (ie, second column) in nth_value that:
"n can be a column, constant, bind variable, or an expression involving them."
I did the following to reproduce your output.
SQL> select r#, nm, rn
2 , NTH_VALUE(nm,rn) OVER(order by r# range between unbounded preceding and current row) shuffle
3 from tmp
4 /
R# NM RN SHUFFLE
---------- ---------- ---------- ----------
1 One 1 One
2 Two 7 One
3 Three 4 One
4 Four 3 One
5 Five 5 One
6 Six 2 One
7 Seven 6 One
8 Eight 9 One
9 Nine 8 One
9 rows selected.
And then to see what would happen if those nth_value expressions has been calculated using a literal value instead of 'rn', I wrapped it within a function
SQL> create or replace
2 function f(p_rn int,p_r# int) return varchar2 is
3 x varchar2(10);
4 begin
5 execute immediate
6 'select s
7 from (
8 select r#, NTH_VALUE(nm,'||p_rn||') OVER(order by r# range between unbounded preceding and current row) s
9 from tmp
10 )
11 where r# = '||p_r# into x;
12
13 return x;
14 end;
15 /
Function created.
SQL> col dynsql format a10
SQL> select r#, nm, rn
2 , NTH_VALUE(nm,rn) OVER(order by r# range between unbounded preceding and current row) shuffle
3 , f(rn,r#) dynsql
4 from tmp
5 /
R# NM RN SHUFFLE DYNSQL
---------- ---------- ---------- ---------- ----------
1 One 1 One One
2 Two 7 One
3 Three 4 One
4 Four 3 One Three
5 Five 5 One Five
6 Six 2 One Two
7 Seven 6 One Six
8 Eight 9 One
9 Nine 8 One Eight
9 rows selected.
SQL>
SQL>
and we can see something that looks more like we would have expected.
I'll log a bug for this - I see the same on 12.1 and 11.2, and a 10053 trace doesnt show any optimizer transformations going on, so possibly been there since nth_value came along.
Quick unrelated tip - if you're using dbms_random to build data, before you use it in your session, do something like:
exec dbms_random.seed(0)
Then your test case becomes repeatable in a consisten fashion for yourself, and also means we can replicate the data here.
Nice discovery !