Thanks for the question, Ralph.

Asked: March 28, 2022 - 9:29 am UTC

Last updated: March 31, 2022 - 5:21 pm UTC

Version: 12.2-19

Viewed 10K+ times! This question is

You Asked

Given a nested loop, where the inner (not driving) table has an analytic function, whose result is ambiguous (multiple rows could be the first row of the order by, only first row is taken), would it be feasible that said analytic function can return different results for different outer loops?

... iterate NL over this ... company has many addresses, all same prio

          SELECT *
            FROM (
                 SELECT company_id, street,
                        row_number() over ( partition by company_id order BY prio asc ) as row_num                       
                   FROM rau_address                     
                 )
           WHERE row_num = 1

If yes, how can I reproduce this behaviour?
If a NL iterates over DBMS_RANDOM, I do get different results each iteration, but I was not able to write a NL that would get different results out of "get first result of ambigious order by". In my test I used 100, 1000, 10000 rows with the same integer value in column x and order by x, which obviousy could return any one of these rows, but it was always the same row, at least within the one sql.

We are in the process of migrating from 12.2 to 19, thus both versions are interesting for me. Thx for your time.

with LiveSQL Test Case:

https://livesql.oracle.com/apex/livesql/s/m9lxl8otep99ama1ggavsimzn

and Chris said...

It doesn't matter if this is part of a join or not. If an order by is non-deterministic, the output is non-deterministic whether it's a join or querying a single table.

Here's a quick demo that uses attribute clustering to change the physical order of a table.

The sort is on a constant value, so has no effect. After adding the clustering to define an order and moving the table for this to take effect, the result changes:

create table t (
  c1 int, c2 int, c3 int
);

insert into t 
with rws as (
  select level rn, 1 c from dual
  connect by level <= 100
)
  select rn, -rn, c from rws
  order  by rn;

commit;  

select * from t
order  by c3
fetch  first 5 rows only;

        C1         C2         C3
---------- ---------- ----------
         1         -1          1
         2         -2          1
         3         -3          1
         4         -4          1
         5         -5          1

alter table t 
  add clustering by 
  linear order ( c2 );
alter table t 
  move;

select * from t
order  by c3
fetch  first 5 rows only;

        C1         C2         C3
---------- ---------- ----------
       100       -100          1
        99        -99          1
        98        -98          1
        97        -97          1
        96        -96          1

We can change the output back again by adding a covering index, so the database reads that instead of the table:

create index i 
  on t ( c1, c2, c3 );
alter table t 
  modify ( c1 not null, c2 not null, c3 not null );
  
select * from t
order  by c3
fetch  first 5 rows only;

        C1         C2         C3
---------- ---------- ----------
         1         -1          1
         2         -2          1
         3         -3          1
         4         -4          1
         5         -5          1

You're likely to notice different results after making changes to the physical model like this. Some SQL operations (e.g. hash join, hash group by) are more likely to give this effect too.

If you need deterministic results, use a deterministic order by!

Rating

(3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Ralph, March 29, 2022 - 3:36 pm UTC

Thanks for your answer, it brought me closer to my goal. We don‘t need the order by to be deterministic, but consistent within the query. Thus the interesting part for me is not between different queries run separately, but within the very same query.

Lets take three tables A,B,C and look at one row A1, which happens to match two rows B1 and B2 by some join condition. C shall be the table that matches different rows C1-Cn to A1, but we take one of them by choosing the first row from this non-deterministic order by.
Ignore other possible join orders, just look at two: If the optimizer chooses to join A->C->B, it will match A1 with any single Cx, can’t say which, but only one. Next step join with B and you get the results: A1-Cx-B1 and A1-Cx-B2.
But if the optimizer chooses to join A->B->C, it will get A1-B1 and A1-B2. If it then uses Hash-Join for C it will read any Cy once and get a similar result as above.
But if it opts for the NL, then there will be two reads of C, one for A1-B1 and one for A1-B2. And here comes my question: Do we possibly end up with a result set like A1-B1-Cx and A1-B2-Cy, where Cx and Cy is different? And if yes, how can I produce this result?

Our real life query is a single sql - much more complex though - and in production does very seldomly and not reproducibly deliver such Cx and Cy results. But until I can reproduce this behavior, it could be some other part of the complex query, that causes the problem.

https://livesql.oracle.com/apex/livesql/s/m9lxl8otep99ama1ggavsimzn
In my test script A would be rau_company, B would be a bit more complex with rau_owner & rau_companyowner and have more matching rows and C would be rau_address.

Your idea with adding clustering worked well to change the order by on my table, but only once. Even after I dropped the table and recreated it without clustering my local 12.2 kept the new order. I tried LiveSql and there the clustering order did not stay after drop/create. But with LiveSql I do not know how to get two different sessions, where I could run my 30-sec query and “alter table” in parallel.

Your idea with the index worked better. If you create it with asc or desc on street the first row of the order by will flip consistently (Statement 1 & 4 of below liveSQL). Thus I was positive that my query (Statement 29 in above liveSQL) which would use NL to iterate over the order by and that runs about 30 seconds would reflect the flip, if I recreated the index in the other direction while the query is running. But that was not the case, at least not in my local 12.2. The query returned only one row reflecting the order that was present when the query was started. Thus there must be some feature that keeps the result consistent, even though the order by in the parallel session already returns the other result, or have I made some other error?
https://livesql.oracle.com/apex/livesql/s/m9s6g03i3d0lp4g3aawhynjyj (liveSQL with drop/create of the index)

March 29, 2022 - 6:41 pm UTC

We don‘t need the order by to be deterministic, but consistent within the query.

If you have a non-deterministic sort, all bets are off. While you often get lucky and still get the same output, you might not.

Thus there must be some feature that keeps the result consistent

The result of a query is fixed at the time it starts.

Think about this:

You query one table which does a full table scan with no sorting. While this is running, you re-org the table in a way that changes the physical order of the data. If the query picked up this change immediately, potentially you'd read some rows twice and skip over others.

This would be bad.

So Oracle Database uses undo to reconstruct the table state at the time the query started to ensure you get consistent results.

Ralph, March 29, 2022 - 9:55 pm UTC

Ok, thus if this full table scan from your example is inside a NL once the first iteration reads the table, all subsequent iterations of the NL will use the identical physical rows from UNDO in case that there is a reorg or something else changes the table.

But then when you go back to my non-deterministic order by, within the same query all order bys will read the identical physical rows, too. How would there still be a chance to see two different results within the same query? Could there be some smart plan adapting feature at work, that detects an inefficiency after some order bys and changes the plan for the remaining order bys?

Luckily it is easy for us to make the order by deterministic in our original query by adding the id as secondary column to order by. Thanks for your clarifications.

March 30, 2022 - 1:30 pm UTC

How would there still be a chance to see two different results within the same query?

The optimizer will often transform your query when coming up with the plan. Particularly for complex SQL, there's a reasonable chance the final query is very different to the one you wrote.

It could be that in the process of doing this there are intermediate steps that return the same rows in different orders. Parallel operations could also affect the output.

Ultimately it doesn't really matter how it happens - the key point is if you want deterministic results, you must have a deterministic order by.

Ralph, March 31, 2022 - 3:51 pm UTC

The idea with paralellism worked like a charm, just execute this

ALTER TABLE rau_address PARALLEL 8;

before Statement 29 from my script https://livesql.oracle.com/apex/livesql/s/m9lxl8otep99ama1ggavsimzn and it will give you multiple different streets that did come up as first row of the order by within the very same query.

Thx for your valuable insights and have a nice day!

March 31, 2022 - 5:21 pm UTC

You're welcome

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

More to Explore

Analytics

NL with ambiguous order by

Breadcrumb

Question and Answer

You Asked

and Chris said...

Rating

Comments

More to Explore

Analytics