You Asked

Hi Tom!

We have a large load every day from one table to another and I wanted to split it up into chunks and run parallel jobs to speed it up.
I found that doing the split on rowid doesn't work on partitioned table. First I did it on partition level since this is how we load (partition by partition via dynamic SQL) but I should have realized that wouldn't work.

But I got really surprised that the split doesn't work even if I do it on whole table without partition. And I have tested at a few tables we have and it works, as long as its not a partitioned table.

I am quite confused with this, is there an explanation?

So this sql:

SELECT MIN(r) start_id
      ,MAX(r) end_id
FROM   (SELECT ntile(20) over (ORDER BY rownum) grp
              ,ROWID r
        FROM   my_big_table)
GROUP  BY grp ;

Gets it into 20 chunks but when testing the first chunk via this sql:

SELECT COUNT(9)
FROM my_big_source_table
WHERE ROWID BETWEEN 'AACGW1AELAAN72EAAB' AND 'AACGW1AH7AADmn/AAY';

it simply fetches the whole table. If I do the same on any other table that isn't partitioned it works fine as with the example in FreeSQL.

with LiveSQL Test Case:

https://freesql.com/?compressed_code=H4sIAAAAAAAACoWPy27CMBBF95b8D3cXm0fFhlVXSQgtEsHF5blCQKYQCRw0MeT3USJQ2XGXR5pz75R0or3Hvrg6r1paij8uzjjyR0M4p%252FJTCil%252Bk3ESz5COJoo1Sr9lv8kzKdCkk4armpPLGjq0JgWgHlfO5ydSfY3iRgxl7CCxiNbgonLXs8aBL0%252FTMx1rlqMB%252BJ8%252FnK%252FLtBRf1sx%252FUMsOfGmWlm8fkqI6ElNdn2fYka%252BIHIJWFIZhtEjj6WLcDrB12SvrfbeDbjcx8R0SQ6XRMQEAAA%253D%253D&code_language=PL_SQL&db_version=19&code_format=false

and Chris said...

The NTILE sorts by ROWNUM. There's no guarantee that ROWNUMs will align with ROWIDs. This is particularly true if the query runs in parallel. This applies whether the table is partitioned or not.

Here's an example with a non-partitioned table on 19.28 (the macro is to simplify counting the rows between rowid ranges):

create table t ( c1 int, c2 date, c3 varchar2(500) );

insert into t 
  select level, sysdate, rpad ( 'stuff', 500, level )
  from dual
  connect by level <= 100000;

commit;

create or replace function rowid_ranges ( start_rid rowid, end_rid rowid ) 
return clob sql_macro
as
begin 
  return 'select count(*) c
          from   t
          where  rowid between start_rid and end_rid';
end;
/

with rowids as (
  select min(r) start_id
  , max(r) end_id
  , count(*) grp#
  from (
    select ntile(5) over (order by rownum) grp
    , rowid r
    , rownum rn
    from t)
  group by grp 
)
select /*+ parallel */*
from   rowids, 
       lateral ( select * from rowid_ranges ( start_id, end_id ) );

START_ID           END_ID                   GRP#          C
------------------ ------------------ ---------- ----------
AAAYaBAAMAAB/EyAAK AAAYaKAAMAAHx3QAAE      20000      89436
AAAYaCAAMAAB/QCAAA AAAYaKAAMAAHyQVAAM      20000      89984
AAAYaBAAMAAB/ECAAA AAAYaKAAMAAHyQXAAC      20000     100000
AAAYaBAAMAAB/KIAAA AAAYaIAAMAAIOz/AAM      20000      75008
AAAYaBAAMAAC3/jAAA AAAYaKAAMAAHyKPAAM      20000      86526

Notice that there's lots of variation between the number of rows in each rowid range. And all are close to the number of rows in the table.

Changing NTILE to sort by ROWID gives the expect results: 20,000 rows per range:

with rowids as (
  select min(r) start_id
  , max(r) end_id
  , count(*) grp#
  from (
    select ntile(5) over (order by ROWID) grp -- sort by ROWID
    , rowid r
    , rownum rn
    from t)
  group by grp 
)
select /*+ parallel */*
from   rowids, 
       lateral ( select * from rowid_ranges ( start_id, end_id ) );

START_ID           END_ID                   GRP#          C
------------------ ------------------ ---------- ----------
AAAYaDAAMAADksDAAA AAAYaEAAMAAIGjfAAM      20000      20000
AAAYaJAAMAAHxpLAAA AAAYaKAAMAAHyQXAAC      20000      20000
AAAYaBAAMAAB/ECAAA AAAYaCAAMAAF0CXAAC      20000      20000
AAAYaHAAMAAHxcCAAA AAAYaIAAMAAIOz/AAM      20000      20000
AAAYaFAAMAAHxECAAA AAAYaGAAMAAIOt/AAM      20000      20000

More to Explore

Analytics

Analytic SQL got you confused? Check out Connor McDonald's complete video course.

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Analytics

Splitting table into chunks for load doesn't work on partitioned table

Breadcrumb

Question and Answer

You Asked

and Chris said...

More to Explore

Analytics