You Asked

Dear Team,

We have a bit complex SQL including an outer join where we see an issue in preserving row order during multiple fetches.

Structure of the SQL is as follows.

SELECT <column list>
FROM <this includes sub selects & couple of inner joins>
OUTER JOIN <sub select>
ORDER BY <column> ASC

ORDER BY is on a non-key column. We have duplicate values for this non-key column.

When we run the statement to get rows from rownum 10 to 25, set of duplicate values are sorted as expected.
OFFSET 0 ROWS FETCH FIRST 25 ROWS ONLY

When you run the statement you get first 16 rows; works as expected.
OFFSET 0 ROWS FETCH FIRST 16 ROWS ONLY

But when you run the same statement to fetch the next 50 rows, some of the rows already listed in previous query is resulted which is not the expected outcome.

OFFSET 17 ROWS FETCH FIRST 50 ROWS ONLY

Note that this happens only when the sorting column contains few duplicate values & the previous fetch stops at the middle of the duplicate values.

However when we remove the outer join, this works as expected. i.e. 50 rows from rownum 17 does not include items populated in previous step 1-16. Row order is preserved in that case.

Could this be a bug when it comes to situation with the OUTER JOIN present?
Expert advice is welcome here.

Thanks & Best Regards,
Navinth

and Connor said...

Under the covers, we remap the "fetch/offset" clauses to simple ordering functions, eg

SQL> select * from scott.emp
  2  order by ename
  3  offset 0 rows fetch first 10 rows only;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7876 ADAMS      CLERK           7788 12-JAN-83       1100                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7902 FORD       ANALYST         7566 03-DEC-81       3000                    20
      7900 JAMES      CLERK           7698 03-DEC-81        950                    30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7934 MILLER     CLERK           7782 23-JAN-82       1300                    10

10 rows selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  5ujdsh6ztybn2, child number 0
-------------------------------------
select * from scott.emp order by ename offset 0 rows fetch first 10
rows only

Plan hash value: 3291446077

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |     4 (100)|          |
|*  1 |  VIEW                    |      |    10 |  1070 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   518 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   518 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("from$_subquery$_002"."rowlimit_$$_rownumber"<=10 AND
              "from$_subquery$_002"."rowlimit_$$_rownumber">0))
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."ENAME")<=10)

so the query is equivalent to something like

select *
from (
  select e.*, row_number() over ( order by "emp"."ename" ) x
  from scott.emp e
)
where x between 0 and 10

Whenever you have duplicates, then the assignment of "row_number" to those duplicates is arbitrary, for example if I had enames of:

BROWN
JONES
JONES
JONES
SMITH

then depending on the execution plan, I could get *any* the following sequencing:

BROWN 1
JONES 2
JONES 3
JONES 4
SMITH 5

or

BROWN 1
JONES 4
JONES 3
JONES 2
SMITH 5

or

BROWN 1
JONES 2
JONES 4
JONES 3
SMITH 5

etc etc...This is the same with *any* order by clause by the way. The only way to ensure deterministic results is to order by something that does not have duplicates. The easiest way is to use something unique as a tie-breaker, eg

SQL> select * from scott.emp
  2  order by ename, empno     <=== primary key added as tie-breaker
  3  offset 0 rows fetch first 10 rows only;

Rating

(3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Navinth Bakmeedeniya, June 12, 2019 - 9:04 am UTC

Thanks for your explanation Connor.

But then why we have different behaviors with & without outerjoin? Without outerjoin, order is always preserved.

June 14, 2019 - 2:40 pm UTC

That's pure coincidence; if you have a non-deterministic sort, there's no guarantee rows will come back in the same order!

Navinth Bakmeedeniya, June 15, 2019 - 12:09 pm UTC

But how come it is coincidence if works every time with absence of an outer join.

Since it works every time, I believe there should be some logic written for this purpose.

June 17, 2019 - 9:28 am UTC

If it's just a straight join between two tables, then the database gets the rows they physically exist on disk.

It's rare for rows to move. So if you inserted rows in the order you want them in your queries, you "got lucky".

If someone reorgs the table for some reason, you'll likely find the sort is now wrong.

In the example below:

- Insert rows in t2 in a random order
- The uses attribute clustering to force the database to store the rows in a specific way (after the move)

create table t1 (
  c1 int 
);
create table t2 (
  c1 int, c2 int
);

insert into t1
  with rws as (
    select level x from dual
    connect by level <= 10
  )
    select * from rws;
    
insert into t2
  with rws as (
    select level x from dual
    connect by level <= 10
  )
    select t1.c1, x 
    from   t1 cross join rws
    order  by dbms_random.value;
    
commit;

select t1.c1, t2.c2 
from   t1
join   t2
on     t1.c1 = t2.c1
order  by t1.c1
fetch first 10 rows only;

        C1         C2
---------- ----------
         1         10
         1          7
         1          2
         1          8
         1          3
         1          9
         1          4
         1          6
         1          1
         1          5
    
alter table t2 
  add clustering 
  by linear order ( c1, c2 );
alter table t2 move;

select t1.c1, t2.c2 
from   t1
join   t2
on     t1.c1 = t2.c1
order  by t1.c1
fetch first 10 rows only;

        C1         C2
---------- ----------
         1          1
         1          2
         1          3
         1          4
         1          5
         1          6
         1          7
         1          8
         1          9
         1         10

alter table t2 
  drop clustering ;
alter table t2 
  add clustering by 
  interleaved order ( c1, c2 );
alter table t2 move;

select t1.c1, t2.c2 
from   t1
join   t2
on     t1.c1 = t2.c1
order  by t1.c1
fetch first 10 rows only;

        C1         C2
---------- ----------
         1          1
         1         10
         1          9
         1          8
         1          7
         1          6
         1          5
         1          4
         1          3
         1          2

Note how the order of the values for C2 changes every time!

Navinth Bakmeedeniya, June 18, 2019 - 3:50 am UTC

But In your example, you are doing a DDL on the underlying table before the next execution. There fore I believe there is every possibility to have a different execution as it is also possible to have even a hard parse changing the execution plan.

But in our case, its just only a re-execution of the SQL, without having a even a DML on the table in between two SELECT executions.

June 18, 2019 - 2:12 pm UTC

The DDL is just to demonstrate the effect easily:

With a non-deterministic order by, the database can return rows in a different order.

There are many things that could cause this: moving the table, the optimizer choosing a different join method, etc.

The ONLY way to be certain you get a result set in a particular order is to add a deterministic order by.

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

Analytics

Already fetched rows returned when previous fetch stops middle of duplicate values

Breadcrumb

Question and Answer

You Asked

and Connor said...

Rating

Comments

More to Explore

Analytics