Hi Tom!
We have a large load every day from one table to another and I wanted to split it up into chunks and run parallel jobs to speed it up.
I found that doing the split on rowid doesn't work on partitioned table. First I did it on partition level since this is how we load (partition by partition via dynamic SQL) but I should have realized that wouldn't work.
But I got really surprised that the split doesn't work even if I do it on whole table without partition. And I have tested at a few tables we have and it works, as long as its not a partitioned table.
I am quite confused with this, is there an explanation?
So this sql:
SELECT MIN(r) start_id
,MAX(r) end_id
FROM (SELECT ntile(20) over (ORDER BY rownum) grp
,ROWID r
FROM my_big_table)
GROUP BY grp ;Gets it into 20 chunks but when testing the first chunk via this sql:
SELECT COUNT(9)
FROM my_big_source_table
WHERE ROWID BETWEEN 'AACGW1AELAAN72EAAB' AND 'AACGW1AH7AADmn/AAY';
it simply fetches the whole table. If I do the same on any other table that isn't partitioned it works fine as with the example in FreeSQL.
https://freesql.com/?compressed_code=H4sIAAAAAAAACoWPy27CMBBF95b8D3cXm0fFhlVXSQgtEsHF5blCQKYQCRw0MeT3USJQ2XGXR5pz75R0or3Hvrg6r1paij8uzjjyR0M4p%252FJTCil%252Bk3ESz5COJoo1Sr9lv8kzKdCkk4armpPLGjq0JgWgHlfO5ydSfY3iRgxl7CCxiNbgonLXs8aBL0%252FTMx1rlqMB%252BJ8%252FnK%252FLtBRf1sx%252FUMsOfGmWlm8fkqI6ElNdn2fYka%252BIHIJWFIZhtEjj6WLcDrB12SvrfbeDbjcx8R0SQ6XRMQEAAA%253D%253D&code_language=PL_SQL&db_version=19&code_format=false
The NTILE sorts by ROWNUM. There's no guarantee that ROWNUMs will align with ROWIDs. This is particularly true if the query runs in parallel. This applies whether the table is partitioned or not.
Here's an example with a non-partitioned table on 19.28 (the macro is to simplify counting the rows between rowid ranges):
create table t ( c1 int, c2 date, c3 varchar2(500) );
insert into t
select level, sysdate, rpad ( 'stuff', 500, level )
from dual
connect by level <= 100000;
commit;
create or replace function rowid_ranges ( start_rid rowid, end_rid rowid )
return clob sql_macro
as
begin
return 'select count(*) c
from t
where rowid between start_rid and end_rid';
end;
/
with rowids as (
select min(r) start_id
, max(r) end_id
, count(*) grp#
from (
select ntile(5) over (order by rownum) grp
, rowid r
, rownum rn
from t)
group by grp
)
select /*+ parallel */*
from rowids,
lateral ( select * from rowid_ranges ( start_id, end_id ) );
START_ID END_ID GRP# C
------------------ ------------------ ---------- ----------
AAAYaBAAMAAB/EyAAK AAAYaKAAMAAHx3QAAE 20000 89436
AAAYaCAAMAAB/QCAAA AAAYaKAAMAAHyQVAAM 20000 89984
AAAYaBAAMAAB/ECAAA AAAYaKAAMAAHyQXAAC 20000 100000
AAAYaBAAMAAB/KIAAA AAAYaIAAMAAIOz/AAM 20000 75008
AAAYaBAAMAAC3/jAAA AAAYaKAAMAAHyKPAAM 20000 86526Notice that there's lots of variation between the number of rows in each rowid range. And all are close to the number of rows in the table.
Changing NTILE to sort by ROWID gives the expect results: 20,000 rows per range:
with rowids as (
select min(r) start_id
, max(r) end_id
, count(*) grp#
from (
select ntile(5) over (order by ROWID) grp -- sort by ROWID
, rowid r
, rownum rn
from t)
group by grp
)
select /*+ parallel */*
from rowids,
lateral ( select * from rowid_ranges ( start_id, end_id ) );
START_ID END_ID GRP# C
------------------ ------------------ ---------- ----------
AAAYaDAAMAADksDAAA AAAYaEAAMAAIGjfAAM 20000 20000
AAAYaJAAMAAHxpLAAA AAAYaKAAMAAHyQXAAC 20000 20000
AAAYaBAAMAAB/ECAAA AAAYaCAAMAAF0CXAAC 20000 20000
AAAYaHAAMAAHxcCAAA AAAYaIAAMAAIOz/AAM 20000 20000
AAAYaFAAMAAHxECAAA AAAYaGAAMAAIOt/AAM 20000 20000