How to use DBMS_PARALLEL_EXECUTE to chunk over DB link

Chris Saxon

Thanks for the question.

Asked: July 14, 2017 - 10:21 am UTC

Last updated: September 06, 2018 - 2:26 pm UTC

Version: 11g

Viewed 10K+ times

You Asked

Hi Team,

Can you please have a look at the below?

I tried copying table data from one DB to another over a DB link using DBMS_PARALLEL_EXECUTE,
and it works as expected.

The challenge I am facing is:
I have the table TB_TXN created on both Oracle DBs.
Now I want to copy a specific set of data from one DB to the other.
E.g. I need to insert over the DB link only the records matching this SQL: SELECT * FROM TB_TXN WHERE TXN_DATE <= TO_DATE('01/01/2015','MM/DD/YYYY');

Can you please help me make this work using chunked execution?

declare
  l_sql_stmt constant varchar2(1000)
      := 'INSERT INTO TB_TXN@CONNECT_ARCHIVE
      SELECT * FROM TB_TXN WHERE ROWID BETWEEN :start_id AND :end_id';

  l_task_name   constant varchar2(20) := 'TXN_TEST';
begin
  begin
    dbms_parallel_execute.drop_task(l_task_name);
  exception
    when others then null;
  end;

  dbms_parallel_execute.create_task(l_task_name);

  dbms_parallel_execute.create_chunks_by_rowid(task_name   => l_task_name,
                                               table_owner => 'SCOTT',
                                               table_name  => 'TB_TXN',
                                               by_row      => TRUE,
                                               chunk_size  => 10);

  dbms_parallel_execute.run_task(task_name      => l_task_name,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
  DBMS_OUTPUT.PUT_LINE('DONE..');
exception when others then
  DBMS_OUTPUT.PUT_LINE('ERROR:'||SQLERRM);
end;
/

and we said...

So you want to chunk the results of your query up and send those over the DB link? If so you need to use the create_chunks_by_sql function.

First, here's a sample table we're going to transfer half of:

create table t as
  select rownum x, sysdate dt from dual connect by level <= 1000;

select count(*) from t ;

  COUNT(*)
----------
      1000

select count(*) from t where x <= 500 ;

  COUNT(*)
----------
       500


And a similar (empty) one over the db link:

select count(*) from t@db11204 ;

  COUNT(*)
----------
         0


Next you need to decide how many chunks you want. Then write a query to split your data into that many rows with start and end rowids.

There are various ways you can do this. For simplicity I'll use ntile to divide the rows into N buckets. Then group by the result of this to find the start and end rowids in each group:

select min(r) start_id, max(r) end_id from (
SELECT ntile(5) over (order by x) grp, rowid r
FROM   t 
WHERE  x <= 500
)
group  by grp;

START_ID           END_ID            
------------------ ------------------
AAAveZAAdAAAo4sAAA AAAveZAAdAAAo4sABj
AAAveZAAdAAAo4sABk AAAveZAAdAAAo4sADH
AAAveZAAdAAAo4sAEs AAAveZAAdAAAo4sAGP
AAAveZAAdAAAo4sAGQ AAAveZAAdAAAo4tABE
AAAveZAAdAAAo4sADI AAAveZAAdAAAo4sAEr
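As an aside, if your filter can be expressed on a numeric column, the package's built-in create_chunks_by_number_col avoids writing the rowid query altogether. A sketch (not used in this example; the task SQL would then bind number ranges, i.e. WHERE x BETWEEN :start_id AND :end_id, instead of rowids):

```sql
-- Sketch: chunk on the numeric column x instead of rowids.
-- Each chunk covers a range of x values of width chunk_size.
begin
  dbms_parallel_execute.create_chunks_by_number_col(
    task_name    => 'TXN_TEST',
    table_owner  => user,
    table_name   => 'T',
    table_column => 'X',
    chunk_size   => 100);
end;
/
```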


You can find in-depth discussion of other chunking techniques on the PL/SQL & EBR blog:

https://blogs.oracle.com/plsql-and-ebr/transforming-one-table-to-another:-sql-or-plsql

Once you have this query, pass it to dbms_parallel_execute.create_chunks_by_sql setting by_rowid to true. All you have to do then is run your task, passing the SQL statement:

declare
  l_sql_stmt    constant varchar2 ( 1000 ) := 'INSERT INTO t@db11204
SELECT * FROM t WHERE ROWID BETWEEN :start_id AND :end_id';

  l_chunk_sql   constant varchar2 ( 1000 ) := 'select min(r) start_id, max(r) end_id from (
SELECT ntile(5) over (order by x) grp, rowid r
FROM   t 
WHERE  x <= 500
)
group  by grp';
  l_try integer;
  l_status integer;
  l_task_name   constant varchar2( 20 ) := 'TXN_TEST';
begin
  begin
    dbms_parallel_execute.drop_task( l_task_name );
  exception
    when others then
      null;
  end;
  dbms_parallel_execute.create_task( l_task_name );
  dbms_parallel_execute.create_chunks_by_sql(l_task_name, l_chunk_sql, true);

  dbms_parallel_execute.run_task(
    task_name        => l_task_name,
    sql_stmt         => l_sql_stmt,
    language_flag    => dbms_sql.native,
    parallel_level   => 5
  );
  
  dbms_output.put_line( 'DONE..' || dbms_parallel_execute.task_status(l_task_name));
  
end;
/

DONE..7

select count(*) from t@db11204 ;

  COUNT(*)
----------
       500



Comments

Rerun failed chunk.

A reader, July 18, 2017 - 11:20 am UTC

Thanks a lot Chris!
The above approach works exactly as I expected.

Say out of 10 chunks, 2 were not processed.
Can you please help: how do I rerun those 2 failed chunks?
Chris Saxon
July 18, 2017 - 2:11 pm UTC

You can manually process chunks with the GET_ROWID_CHUNK or GET_NUMBER_COL_CHUNK procedures. So you could adapt these to reprocess the failed chunks.

For example, if I try processing using an invalid DB link name they'll all fail. But I can still pick them up after and try processing manually with execute immediate:

select * from t@db11;

SQL Error: ORA-02019: connection description for remote database not found

declare
  l_sql_stmt    constant varchar2 ( 1000 ) := 'INSERT INTO t@db11
SELECT * FROM t WHERE ROWID BETWEEN :start_id AND :end_id';

  l_chunk_sql   constant varchar2 ( 1000 ) := 'select min(r) start_id, max(r) end_id from (
SELECT ntile(5) over (order by x) grp, rowid r
FROM   t 
WHERE  x <= 500
)
group  by grp';
  l_try integer;
  l_status integer;
  l_task_name   constant varchar2( 20 ) := 'TXN_TEST';
begin
  begin
    dbms_parallel_execute.drop_task( l_task_name );
  exception
    when others then
      null;
  end;
  dbms_parallel_execute.create_task( l_task_name );
  dbms_parallel_execute.create_chunks_by_sql(l_task_name, l_chunk_sql, true);

  dbms_parallel_execute.run_task(
    task_name        => l_task_name,
    sql_stmt         => l_sql_stmt,
    language_flag    => dbms_sql.native,
    parallel_level   => 5
  );
  
  dbms_output.put_line( 'DONE..' || dbms_parallel_execute.task_status(l_task_name));
  
end;
/

DONE..5

select chunk_id, status from user_parallel_execute_chunks;

  CHUNK_ID STATUS              
---------- --------------------
      2601 UNASSIGNED          
      2602 UNASSIGNED          
      2603 UNASSIGNED          
      2604 UNASSIGNED          
      2605 UNASSIGNED 

declare
  l_sql_stmt    constant varchar2 ( 1000 ) := 'INSERT INTO t@db11
SELECT * FROM t WHERE ROWID BETWEEN :start_id AND :end_id';
  l_task_name   constant varchar2( 20 ) := 'TXN_TEST';
  l_chunk_id number;
  l_start_rowid rowid;
  l_end_rowid rowid;
  l_any_rows boolean;
begin

  loop
    dbms_parallel_execute.get_rowid_chunk(l_task_name,
                                          l_chunk_id,
                                          l_start_rowid,
                                          l_end_rowid,
                                          l_any_rows);
    exit when not l_any_rows ;
    
    begin
      execute immediate l_sql_stmt using l_start_rowid, l_end_rowid;
      dbms_parallel_execute.set_chunk_status(l_task_name, l_chunk_id,
        dbms_parallel_execute.processed);
    exception when others then
      -- log error
      dbms_output.put_line('CHUNK:' || l_chunk_id || ' ERR: ' || sqlerrm);
    end;
    
  end loop;
end;
/

CHUNK:2601 ERR: ORA-02019: connection description for remote database not found
CHUNK:2602 ERR: ORA-02019: connection description for remote database not found
CHUNK:2603 ERR: ORA-02019: connection description for remote database not found
CHUNK:2604 ERR: ORA-02019: connection description for remote database not found
CHUNK:2605 ERR: ORA-02019: connection description for remote database not found

select chunk_id, status from user_parallel_execute_chunks;

  CHUNK_ID STATUS              
---------- --------------------
      2601 ASSIGNED            
      2602 ASSIGNED            
      2603 ASSIGNED            
      2604 ASSIGNED            
      2605 ASSIGNED 
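Alternatively, once you have fixed the underlying cause (here, the broken db link), DBMS_PARALLEL_EXECUTE.RESUME_TASK retries the chunks that never reached PROCESSED status, so you don't have to loop manually. A sketch using the task above (the task must be in a resumable state, e.g. FINISHED_WITH_ERROR or CRASHED):

```sql
-- Sketch: re-run the unprocessed/failed chunks of an existing task.
begin
  dbms_parallel_execute.resume_task(
    task_name      => 'TXN_TEST',
    sql_stmt       => 'INSERT INTO t@db11204
                       SELECT * FROM t WHERE ROWID BETWEEN :start_id AND :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 5);
end;
/
```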

Invalid Data Insert due to BETWEEN :start_id AND :end_id

A reader, July 18, 2017 - 2:03 pm UTC

Hi Chris,
Somehow the chunk-by-SQL below is not giving the expected output:
If I create chunks with the SQL below based on ROWIDs, the destination table gets rows for txn_date = '18-07-17' along with some random data having txn_date = '16-07-17' and '10-07-16'.

select min(r) start_id, max(r) end_id from (
SELECT ntile(3) over (order by rowid) grp, rowid r
FROM tb_transaction
where txn_date = '18-07-17'
)
group by grp;

It seems ROWID BETWEEN ? AND ? won't work here, because it can fetch any ROWID falling between ? and ?.

Can you please help here.
Chris Saxon
July 18, 2017 - 2:15 pm UTC

Ah. Good point. You need to include your where clause in your insert as select too.

Though this may lead to some highly uneven chunks...
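Sketching the fix against the statement above (the db link name here is a placeholder, and a date literal replaces the implicit string-to-date conversion): repeat the chunking query's filter inside the insert, so stray rowids in the range are excluded:

```sql
-- Sketch: apply the same logical filter as the chunking query,
-- so rows in the rowid range with other txn_dates are skipped.
INSERT INTO tb_transaction@your_db_link
SELECT *
FROM   tb_transaction
WHERE  txn_date = DATE '2017-07-18'  -- same filter as the chunk SQL
AND    ROWID BETWEEN :start_id AND :end_id;
```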

Alternative for creating chunks

A reader, July 18, 2017 - 2:19 pm UTC

Hi Chris
You said there are multiple options for creating chunks other than ntile. Can you please tell me what the other options are, so I can try them?

No of Thread to be executed each time

A reader, October 09, 2017 - 9:23 am UTC

I have created 500 chunks in SYS.USER_PARALLEL_EXECUTE_CHUNKS.
While monitoring the execution status, I can see only 3 chunk IDs get picked up/ASSIGNED/PROCESSED at a time.
Is there any way to increase the number of chunks PROCESSED at a time?
Chris Saxon
October 09, 2017 - 11:29 am UTC

You can use the parallel_level parameter of run_task to set how many chunks the database processes at once. If you don't set this, it defaults to:

The default number of job slaves is computed as the product of Oracle parameters cpu_count and parallel_threads_per_cpu. On a Real Application Clusters installation, the number of job slaves is the sum of individual settings on each node in the cluster. This procedure returns only when all the chunks are processed. In parallel cases, this procedure returns only when all the job slaves finished.

http://docs.oracle.com/database/122/ARPLS/DBMS_PARALLEL_EXECUTE.htm#ARPLS67375

Note that chunks are processed using scheduler jobs. So you may also be limited by whatever you've set JOB_QUEUE_PROCESSES to.
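To check how many job slaves your instance currently allows (a sketch; viewing and changing the parameter requires appropriate privileges):

```sql
-- Sketch: inspect the job slave limit, and raise it if needed
-- (the second statement requires the ALTER SYSTEM privilege).
select value
from   v$parameter
where  name = 'job_queue_processes';

alter system set job_queue_processes = 20;
```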

Arun, October 09, 2017 - 4:20 pm UTC

Very nice explanation. But when I used the dbms_parallel_execute package, the number of processes/threads did not stay constant at the parallel_level value (10) I gave. For around 15 minutes I saw about 8 processes running, but afterwards it went down to 1 process and thereafter it just processed chunk by chunk. How can I keep 8 processes busy until all chunks are processed?

Sorry, I am asking a question via Submit Review; I do not see an option to reply or ask a question.
Chris Saxon
October 09, 2017 - 4:39 pm UTC

And what is your system parameter job_queue_processes set to? Do you have any other jobs running at the same time?

ORA-27486: insufficient privileges#ORA-06512: at "SYS.DBMS_ISCHED", line 135

A reader, October 11, 2017 - 6:38 am UTC

When I use dbms_parallel_execute.run_task in an anonymous declare/begin block it works as expected, e.g.
declare
  piv_date    varchar2(20) := '02/05/2017';
  l_sql_stmt  varchar2(1000);
  l_chunk_sql varchar2(1000);
  l_try       integer;
  l_status    integer;
  l_task_name constant varchar2(20) := 'TB_TRANSACTION';
begin
  begin
    dbms_parallel_execute.drop_task( l_task_name );
    dbms_output.put_line( 'drop_task..');
  exception
    when others then
      null;
  end;
  l_chunk_sql := select ....;
  l_sql_stmt := 'BEGIN p_process_statement(:start_id, :end_id, '||''''||piv_date||''''||'); END;';
  dbms_parallel_execute.create_task( l_task_name );
  dbms_parallel_execute.create_chunks_by_sql(l_task_name, l_chunk_sql, true);

  dbms_parallel_execute.run_task(
    task_name      => l_task_name,
    sql_stmt       => l_sql_stmt,
    language_flag  => dbms_sql.native,
    parallel_level => 10
  );
end;

But when I tried executing dbms_parallel_execute.run_task from my package, it gives me this error:
pov_errmsg:ORA-27486: insufficient privileges#ORA-06512: at "SYS.DBMS_ISCHED", line 135
ORA-06512: at "SYS.DBMS_SCHEDULER", line 271
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 358
ORA-06512: at "SYS.DBMS_PARALLEL_EXECUTE", line 418

Could you please help.
Chris Saxon
October 11, 2017 - 9:38 am UTC

What exactly are you doing in your package?
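One common cause worth checking (an assumption; the poster hasn't shown the package): run_task submits scheduler jobs, and inside a definer's-rights stored procedure privileges granted via roles do not apply. So the package owner may need CREATE JOB granted directly:

```sql
-- Sketch: grant the scheduler privilege directly to the package owner,
-- not through a role (SCOTT here is a placeholder for the actual owner).
grant create job to scott;
```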

Multiple SID's for SELECT

A reader, October 24, 2017 - 6:16 am UTC

When I executed COUNT SQL:
SELECT COUNT(1)
FROM TB_CHILD
WHERE TB_CHILD.CHILD_ID IN (
SELECT TB_PARENT_1.CHILD_ID
FROM TB_PARENT_1
, TB_PARENT_2
WHERE TB_PARENT_2.POSTING_DT <= TO_DATE('11/30/2008', 'MM/DD/YYYY')
AND TB_PARENT_1.PARENT_ID = TB_PARENT_2.PARENT_ID
);


I am not using any PARALLEL hint here, but if I query v$session I get many records for my logged-in user (via SQL Developer) with the same SQL id:

SID SERIAL# AUDSID PADDR USER# USERNAME COMMAND OWNERID STATUS SERVER SCHEMA# SCHEMANAME OSUSER PROCESS PORT PROGRAM TYPE SQL_ADDRESS SQL_HASH_VALUE SQL_ID SQL_EXEC_ID
1 1643 1628 0700010EE1037790 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 3997956 54728 oracle@orcl (P002) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
166 1595 1628 0700010E910604A8 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 56950794 54728 oracle@orcl (P003) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
332 1353 1628 0700010E8902E080 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 262622 54728 oracle@orcl (P004) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
497 1241 1628 0700010E810270C0 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 39911642 54728 oracle@orcl (P005) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
662 1307 1628 0700010E79034598 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 45088888 54728 oracle@orcl (P006) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
827 1281 1628 0700010EA1022B00 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 53805062 54728 oracle@orcl (P007) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
992 1467 1628 0700010E990C9890 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 65929256 54728 oracle@orcl (P008) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1158 1451 1628 0700010EE1038838 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 37552316 54728 oracle@orcl (P009) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1322 1449 1628 0700010E91061550 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 18481176 54728 oracle@orcl (P010) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1487 1449 1628 0700010E8902F128 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 7209400 54728 oracle@orcl (P011) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1652 1449 1628 0700010E81028168 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 5439840 54728 oracle@orcl (P012) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1816 1095 1628 0700010E79035640 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 19464244 54728 oracle@orcl (P013) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
1982 1343 1628 0700010EA1023BA8 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 8061042 54728 oracle@orcl (P014) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
2147 609 1628 0700010E990CA938 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 44892408 54728 oracle@orcl (P015) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
4621 2225 1628 0700010E81026018 283 SCOTT 3 2147483644 ACTIVE DEDICATED 283 SCOTT tiger 7708 54728 SQL Developer USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
4786 2475 1628 0700010E790334F0 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 37748880 54728 oracle@orcl (P000) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216
5116 1745 1628 0700010E990C87E8 283 SCOTT 3 70157 ACTIVE DEDICATED 283 SCOTT tiger 56164402 54728 oracle@orcl (P001) USER 0700010ECF5495B8 1503343684 4yhpytxctqd24 16777216

Could you please help me understand this?
Ideally there should be only one row for my logged-in user, right?


Chris Saxon
October 24, 2017 - 7:14 am UTC

You might not think the query is running in parallel, but it is!

All the PXXX entries in the program column, such as oracle@orcl (P002), are the parallel processes.
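You can also map those slave sessions back to their query coordinator, e.g. with a query like this (a sketch):

```sql
-- Sketch: list parallel slaves grouped under their coordinator session (QCSID).
select qcsid, sid, serial#, server_group, server_set, degree
from   v$px_session
order  by qcsid, server_group, server_set, sid;
```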

A reader, October 24, 2017 - 7:21 am UTC

Hi Chris,
Can you please help me understand the below?
I am a bit confused here.
Oracle uses a degree of parallelism only if it is explicitly mentioned in the SQL, i.e. PARALLEL 4, 6, etc., right?
I am not putting any PARALLEL degree in the SQL, so why is it going parallel?
Are there rules Oracle follows? Only for some queries do I see many SIDs for a single SQL_ID; for other SQLs it shows only one SID.
Chris Saxon
October 24, 2017 - 7:49 am UTC

Oracle uses a degree of parallelism only if it is explicitly mentioned in the SQL

Nope!

You can set the parallelism at the table level with:

alter table tab parallel N;


Or you can force it at the session level with:

alter session force parallel query parallel N;


Or you could have auto parallelism kick in. The rules for this are a bit complex, so read up on this at:

https://blogs.oracle.com/datawarehousing/what-is-auto-dop
https://docs.oracle.com/database/122/VLDBG/degree-parallel.htm#VLDBG1392

Get ChunkId & Job Name

A reader, October 25, 2017 - 6:47 am UTC

Hi Chris,

I am executing chunks using the procedure p_process_chunk below.
Can you help: is there any way I can get the chunk ID (e.g. 56, 57) and job name (e.g. TASK$_371_1, TASK$_371_2) inside p_process_chunk? I need this for logging.
l_sql_stmt := 'BEGIN p_process_chunk(:start_id, :end_id); END;';
dbms_parallel_execute.run_task(task_name      => l_task_name,
                               sql_stmt       => l_sql_stmt,
                               language_flag  => dbms_sql.native,
                               parallel_level => 5);


Chris Saxon
October 30, 2017 - 2:33 pm UTC

Run_task only returns after processing all the chunks. So you can't track its progress within this block. You can query *_parallel_execute_chunks in another session to see what chunks there are and how they're progressing.

You can run the chunks yourself, as Tim Hall shows:

https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2#user_defined_framework
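For logging purposes you can also read the chunk and job details back from the data dictionary, during the run (from another session) or afterwards; a sketch:

```sql
-- Sketch: chunk ids, scheduler job names and any errors for the task.
select chunk_id, status, job_name, start_ts, end_ts, error_message
from   user_parallel_execute_chunks
where  task_name = 'TXN_TEST'
order  by chunk_id;
```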

a question about rowid and row

A reader, September 04, 2018 - 12:39 pm UTC

I think this example assumes the rowids and 'x' are in the same order. When I delete some records and insert them again, the remote table gets the wrong records. So is "INSERT INTO t1
SELECT * FROM t WHERE x <= 500 AND ROWID BETWEEN :start_id AND :end_id" right?
Chris Saxon
September 04, 2018 - 4:51 pm UTC

I'm not sure what you're doing - could you clarify please?

shifting sand

Racer I., September 05, 2018 - 12:19 pm UTC

Hi,

That is an instance of what Chris said above:
> You need to include your where clause in your insert as select.
The rowid chunks cover the whole table / all rows; you always need to apply your logical filters as well.
Also: if the table changes between getting the chunks and using them, you should adjust the chunks to be contiguous.
The original example had gaps (maybe; it wasn't sorted):
AAAveZAAdAAAo4sABk AAAveZAAdAAAo4sADH
AAAveZAAdAAAo4sAEs AAAveZAAdAAAo4sAGP
so instead of
ROWID BETWEEN :start AND :end
you should use
ROWID >= :start AND ROWID < :next_start
I think that works for ROWIDs; hopefully also for UROWIDs.
Give the first range only an upper bound and the last only a lower bound, and you won't miss any late inserts outside the ranges.

> Though this may lead to some highly uneven chunks...

You could try to chunk WITH your filter, as a reader from India did.

regards,

Chris Saxon
September 05, 2018 - 12:36 pm UTC

There are no rowid gaps in the original; remember, group by doesn't sort! Rowid AAAveZAAdAAAo4sADI is last in the results...

So you can use between :start and :end.
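If you do want contiguous ranges to guard against later inserts, you could carry each group's lower bound forward with lead(); a sketch (to be paired with a predicate like ROWID >= :start_id AND (ROWID < :next_start OR :next_start IS NULL)):

```sql
-- Sketch: contiguous, non-overlapping ranges. next_start is the next
-- group's first rowid (NULL for the last group, i.e. open-ended).
select grp,
       min(r)                           start_id,
       lead(min(r)) over (order by grp) next_start
from  (select ntile(5) over (order by rowid) grp, rowid r
       from   t)
group by grp;
```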

Ga(s)p

Racer I., September 06, 2018 - 8:37 am UTC

Hi,

> There are no rowid gaps in the original,

Ok. But might there be? If you chunk over existing data and use min/max, couldn't there be gaps where no row exists now but one is later filled in by an insert?

Same for the first/last ranges, since there is no known minimum/maximum (U)ROWID (or can you use AAAAA... to ZZZZZ...)?

regards,
Chris Saxon
September 06, 2018 - 2:21 pm UTC

Well someone could always move the table between you generating the chunks and running the parallel task. Which could mean all the rows get new rowids. So the parallel processes find no rows!

So you always need to consider concurrency issues and defend against them appropriately.

(over)lapdog

Racer I., September 06, 2018 - 8:51 am UTC

Hi,

I just checked the first chunking statement again and found it uses the table column for the order by. The second one uses the ROWID. I think the latter is generally the better choice, since it generates non-overlapping rowid ranges, so the parallel processes don't have to scan the same blocks.
Both can still generate gaps though:

create table fstest (ID NUMBER(4));

insert into fstest (ID) values (1);
insert into fstest (ID) values (2);
insert into fstest (ID) values (3);
insert into fstest (ID) values (4);
insert into fstest (ID) values (5);

commit;

delete from fstest where ID = 3;

commit;

select min(r) start_id, max(r) end_id from (
SELECT ntile(2) over (order by ROWID) grp, rowid r
FROM fstest)
group by grp;

START_ID END_ID
AAUQ4lAAAAAAEZ0AAA AAUQ4lAAAAAAEZ0AAB
AAUQ4lAAAAAAEZ0AAD AAUQ4lAAAAAAEZ0AAE

insert into fstest (ID) values (3);

commit;

select '' || ROWID, ID from fstest;

Using start/end would miss a new insert at position *C.
In this case the 3 was re-inserted at *F, which would also be missed.

regards,
Chris Saxon
September 06, 2018 - 2:26 pm UTC

Well, it depends on what your parallel process does. But yes, you need to consider changes other users may make while the process runs.
