PL/SQL Parallel function performance
Question and Answer

Connor McDonald

Thanks for the question, Bibin.

Asked: December 18, 2018 - 5:21 pm UTC

Last updated: December 20, 2018 - 7:55 am UTC

Version: 11.2.0

Viewed 1000+ times

You Asked

Hi,

I was trying to optimize one of the batch update queries (given below) in my application

UPDATE schema1.test_records PARTITION (partdef) gr
   SET gr.purge_status_cd = 'PURGE_PENDING',
       gr.purge_date      = SYSDATE
 WHERE EXISTS (SELECT 1
                 FROM schema2.test_actions PARTITION (part1) ca
                WHERE gr.emcp = ca.action_id);


The problems with the current approach are:

1. It takes more than 5 hours to complete, as it updates several million records.
2. It runs as a single transaction.

I went through several tutorials and finally decided to go with parallel PL/SQL as described in this link: http://www.orafaq.com/node/2450. I tested both approaches with 1M records and I do not see a big difference in performance.
CREATE OR REPLACE TYPE array_t AS TABLE OF NUMBER(38);
/
CREATE OR REPLACE FUNCTION parallel_purge (in_cursor IN SYS_REFCURSOR)
RETURN array_t
PARALLEL_ENABLE (PARTITION in_cursor BY ANY)
PIPELINED
IS
  PRAGMA AUTONOMOUS_TRANSACTION;

  TYPE num_type IS TABLE OF NUMBER(38);
  action_id num_type;
  cnt       INTEGER := 0;
BEGIN
  LOOP
    FETCH in_cursor BULK COLLECT INTO action_id LIMIT 50000;
    EXIT WHEN action_id.COUNT = 0;

    FORALL i IN action_id.FIRST .. action_id.LAST
      UPDATE schema1.test_records PARTITION (partdef)
         SET purge_status_cd = 'PURGE_PENDING',
             purge_date      = SYSDATE
       WHERE emcp = action_id(i);

    COMMIT;
    cnt := cnt + action_id.COUNT;
  END LOOP;

  CLOSE in_cursor;
  COMMIT;
  PIPE ROW (cnt);
  RETURN;
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK;
    CLOSE in_cursor;
    PIPE ROW (cnt);
    RAISE;
END;
/

I call my new function like this

SELECT SUM(column_value)
FROM   TABLE(parallel_purge(CURSOR(
         SELECT ca.action_id
         FROM   schema2.test_actions PARTITION (part1) ca
                INNER JOIN schema1.test_records PARTITION (partdef) grxml
                        ON ca.action_id = grxml.emcp)));

I have a few questions:

1. Do you suggest any other approach that would give better performance?
2. Currently I am using PARTITION IN_CURSOR BY ANY. I have seen that there is an option to use RANGE/HASH in place of ANY, but since I am passing a weak REF CURSOR, I get a compilation error.
3. Will I get any performance improvement if I use HASH/RANGE? What is the recommended way to pass a strong REF CURSOR?

Appreciate your help

and Connor said...

Yes. Parallel-enabled functions are designed so that the function can be used in a parallel query, not really for doing parallel operations *within* the function.

What you probably want here is DBMS_PARALLEL_EXECUTE. The basic premise is:

- you have a workload that can be carved up in a logical way into smaller chunks
- you define a routine that will work on one of those chunks
- it uses the job scheduler to submit multiple jobs, each working on a chunk, and they run in parallel

In your example, you might carve up the workload into logical chunks of action_id's.
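A minimal sketch of that shape, using the table and column names from the question (the task name, chunk_size, and parallel_level are illustrative, and the caller needs the CREATE JOB privilege):

```sql
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'purge_task');

  -- carve the workload into ranges of ACTION_ID, roughly 50,000 per chunk
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_NUMBER_COL(
    task_name    => 'purge_task',
    table_owner  => 'SCHEMA2',
    table_name   => 'TEST_ACTIONS',
    table_column => 'ACTION_ID',
    chunk_size   => 50000);

  -- each scheduler job runs this statement for one :start_id/:end_id range
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
    task_name      => 'purge_task',
    sql_stmt       => q'[UPDATE schema1.test_records PARTITION (partdef) gr
                            SET gr.purge_status_cd = 'PURGE_PENDING',
                                gr.purge_date      = SYSDATE
                          WHERE gr.emcp BETWEEN :start_id AND :end_id
                            AND EXISTS (SELECT 1
                                          FROM schema2.test_actions PARTITION (part1) ca
                                         WHERE ca.action_id = gr.emcp)]',
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 8);

  DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'purge_task');
END;
/
```

Each chunk commits independently, which also addresses the single-transaction concern from the question.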

Some examples here

https://asktom.oracle.com/pls/apex/asktom.search?tag=how-to-use-dbms-parallel-execute-to-chunk-over-db-link



Comments

observations on the pipelined approach

Racer I., December 19, 2018 - 8:12 am UTC

Hi,

DBMS_PARALLEL_EXECUTE is fine, but we have also used the pipelined approach successfully in a big migration project. I assume you have some way to actually get it to execute in parallel. We used a hint in the inner select:

... CURSOR( SELECT /*+ PARALLEL(16) */ ca.action_id ...

But you can also use ALTER SESSION, give the base tables a parallel degree, or use one of the automatic parallel modes.
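For reference, those alternatives might look like the following (degree 16 is illustrative; pick one mechanism, not all three):

```sql
-- force parallel query for the session
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 16;

-- or give the driving table a default parallel degree
ALTER TABLE schema2.test_actions PARALLEL 16;

-- or let the optimizer decide (automatic degree of parallelism)
ALTER SESSION SET parallel_degree_policy = AUTO;
```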

My takeaway on ANY was that you should go with ANY whenever possible, as it gave us the most even distribution. The others (PARTITION BY HASH/RANGE, ORDER, CLUSTER) led to more skew, so some parallel workers took much longer than others, which is of course very inefficient (and you should check whether this happens).

Also (in case you need this): we used views for the inner selects, with placeholder comments for the parallel degree. The start code read their statements from USER_VIEWS, replaced the parallel degree, and executed them with EXECUTE IMMEDIATE. That way we got the flexibility to adjust the parallel hint (maybe you need to vary the partition name?) while the statements were still checked for validity by being views. They could also be changed without recompiling the start code, which didn't even become invalid, because it didn't depend on the views directly.

A question about the update: is there a chance it will make the rows longer?
If so, you may introduce row chaining. To avoid it, look at adding a state table that just references the base table.
This gets you inserts instead of updates, which can even be faster, and can avoid ORA-01555 if the criteria you change are also used to identify the rows to change: if your update fails in the middle, a rerun would only need to update the rows that aren't already updated.
You can also use a state table to keep a state history (but then finding the current state requires window functions).
Alternatively, use a state_id (plus a reference table) including a NULL state, so the field can be NOT NULL.
Hopefully you don't have multiple states, each with its own state_date (like the purge_date), that start out NULL.
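As a sketch of the state-table idea (all names here are hypothetical, and it assumes EMCP identifies a row of TEST_RECORDS):

```sql
-- hypothetical state table: one insert per purge instead of an update in place
CREATE TABLE test_record_state (
  emcp       NUMBER(38) NOT NULL,          -- references schema1.test_records(emcp)
  state_id   NUMBER     NOT NULL,          -- FK to a small reference table of states
  state_date DATE       DEFAULT SYSDATE NOT NULL,
  CONSTRAINT trs_pk PRIMARY KEY (emcp, state_id)
);

-- a rerun only touches rows not already marked (1 = PURGE_PENDING, hypothetically)
INSERT INTO test_record_state (emcp, state_id)
SELECT ca.action_id, 1
FROM   schema2.test_actions PARTITION (part1) ca
WHERE  NOT EXISTS (SELECT 1
                   FROM   test_record_state s
                   WHERE  s.emcp = ca.action_id
                   AND    s.state_id = 1);
```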

We actually never closed the in_cursor, so that may be optional.

A (theoretical?) problem with DBMS_PARALLEL_EXECUTE arises if you can't use the chunks (action_id between low and high) directly, but instead need to run the inner select with that filter because only some of the action_ids in that range need to be updated. You can't pass a list of already-filtered action_ids. This means the inner select is executed X times (once per chunk), which can be (depending on the plan) more demanding than the pipelined approach, which executes the inner select just once.

Rerunning (all failed/unprocessed chunks) is easy with DBMS_PARALLEL_EXECUTE. We used an approach combining SAVE EXCEPTIONS with FORALL and a state table for failed rows (with the exception as the reason).
Small discovery: if multiple rows failed with different exceptions, SAVE EXCEPTIONS reported the same (first?) exception for all of them, which was confusing. So we retried each failed row by itself to get the actual exception for it.
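The FORALL ... SAVE EXCEPTIONS pattern might be sketched like this (purge_errors is a hypothetical logging table; the collection would come from the bulk fetch in the question's function):

```sql
DECLARE
  TYPE num_type IS TABLE OF NUMBER(38);
  action_id   num_type := num_type(1001, 1002, 1003);  -- placeholder values
  bulk_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(bulk_errors, -24381);  -- ORA-24381: error(s) in array DML
BEGIN
  FORALL i IN action_id.FIRST .. action_id.LAST SAVE EXCEPTIONS
    UPDATE schema1.test_records PARTITION (partdef)
       SET purge_status_cd = 'PURGE_PENDING',
           purge_date      = SYSDATE
     WHERE emcp = action_id(i);
EXCEPTION
  WHEN bulk_errors THEN
    -- successful rows are kept; failed rows are logged with their error code
    FOR j IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
      INSERT INTO purge_errors (emcp, err_code)
      VALUES (action_id(SQL%BULK_EXCEPTIONS(j).ERROR_INDEX),
              SQL%BULK_EXCEPTIONS(j).ERROR_CODE);
    END LOOP;
    COMMIT;
END;
/
```

Note that SQL%BULK_EXCEPTIONS reports the error code, not the full error stack, which is consistent with the confusion described above and the per-row retry workaround.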

regards,
Connor McDonald
December 20, 2018 - 7:55 am UTC

great input.

Something I've not tried with dbms_parallel_execute, but I imagine is possible, is for some activities to have the result_cache hint so that as one "slave" does the hard work, the others will eventually pick up the benefit.

It may even be the case that you'd want to "pre-run" such a query before any tasks commenced.
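The idea above might look like this, with a hypothetical expensive lookup shared by all chunks (untried, as noted):

```sql
-- warm the result cache once before submitting the task; each chunk's
-- statement then issues the same hinted query and hits the cached result
SELECT /*+ RESULT_CACHE */ lookup_key, lookup_value
FROM   some_expensive_lookup;
```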
