Database, SQL and PL/SQL

On Working in Parallel

 

Finish faster with DBMS_PARALLEL_EXECUTE.

By Steven Feuerstein Oracle ACE Director

May/June 2010

 

I have to perform DML [data manipulation language] operations on millions of rows of data. Writing the SQL statement to perform the changes isn’t hard, but managing rollback segments and getting the process to finish in an acceptable amount of time are. Does Oracle Database 11g Release 2 offer any new features that can help?

Each new version of Oracle Database brings an ever-widening array of built-in or supplied packages that extend the capabilities of Oracle Database. Oracle Database 11g Release 2 is no exception: it contains more than 50 new packages (all documented in Oracle Database PL/SQL Packages and Types Reference ), including DBMS_PARALLEL_EXECUTE.

DBMS_PARALLEL_EXECUTE now provides the ability to break up a large table according to a variety of criteria, from ROWID ranges to key values and user-defined methods. You can then run a SQL statement or a PL/SQL block against these different “chunks” of the table in parallel, using the database scheduler to manage the processes running in the background. Error logging, automatic retries, and commits are integrated into the processing of these chunks.

To use DBMS_PARALLEL_EXECUTE to run tasks in parallel, your schema will need the CREATE JOB system privilege. You can then use the following subprograms of the built-in package (these are the most commonly used of the package’s routines) to achieve your goal:

• CREATE_TASK creates a named task to be managed by DBMS_PARALLEL_EXECUTE.
• CREATE_CHUNKS_BY_ROWID defines by ROWID the various chunks of the total set of rows to be modified by the SQL statement.
• CREATE_CHUNKS_BY_SQL defines, by a user-specified SQL statement, the chunking of data.
• CREATE_CHUNKS_BY_NUMBER_COL defines, by a numeric column, the chunking of data.
• RUN_TASK runs the named task after chunking has been defined.
• TASK_STATUS obtains the status of the task.
• STOP_TASK stops the task.
• RESUME_TASK resumes the task.
• DROP_TASK removes the task when it has been completed.

All these subprograms and any others in DBMS_PARALLEL_EXECUTE—except for TASK_STATUS—perform a commit.

Let’s look at a few examples of how to use this package. We will start with the simplest approach: chunking by ROWID. Suppose I need to apply a raise in salary (specified by a percentage) to all the employees in our company. In case my parallelized task fails for some reason, I want to be able to retry a specified number of times to complete it.

Listing 1 displays the code for implementing a parallelized apply_raise procedure.

Code Listing 1: Chunking by ROWID in the apply_raise procedure

SQL> PROCEDURE apply_raise (
 2     pct_in     IN NUMBER
 3   , retries_in IN PLS_INTEGER DEFAULT 2
 4  )
 5  IS
 6     c_update_statement CONSTANT VARCHAR2 (1000)
 7           := 'UPDATE /*+ ROWID (dda) */ EMPLOYEES emp
 8        SET emp.salary = emp.salary * (1.0 + pct_in/100)
 9        WHERE ROWID BETWEEN :starting_rowid AND :ending_rowid';
10     c_task_name   CONSTANT VARCHAR2 (20) := 'Give Raise';
11     l_attempts    PLS_INTEGER := 1;
12  BEGIN
13     DBMS_PARALLEL_EXECUTE.CREATE_TASK (c_task_name);
14
15     DBMS_PARALLEL_EXECUTE.
16     CREATE_CHUNKS_BY_ROWID (task_name => c_task_name
17                           , table_owner => USER
18                           , table_name => 'EMPLOYEES'
19                           , by_row => TRUE
20                           , chunk_size => 1000
21                            );
22
23    DBMS_PARALLEL_EXECUTE.
24        RUN_TASK (task_name => c_task_name
25                , sql_stmt => c_update_statement
26                , language_flag => DBMS_SQL.native
27                , parallel_level => 10
28                 );
29
30     LOOP
31        EXIT WHEN DBMS_PARALLEL_EXECUTE.TASK_STATUS (c_task_name) =
32                     DBMS_PARALLEL_EXECUTE.FINISHED
33                  OR l_attempts > retries_in;
34        l_attempts := l_attempts + 1;
35        DBMS_PARALLEL_EXECUTE.RESUME_TASK (c_task_name);
36     END LOOP;
37
38     DBMS_PARALLEL_EXECUTE.DROP_TASK (c_task_name);
39* END apply_raise;

Table 1 explains the use of DBMS_PARALLEL_EXECUTE subprograms in the apply_raise procedure in Listing 1. The steps in lines 6 through 28 create the task, specify chunking, and run the task. If you are sure the task will complete or if you do not want to recover from any failures, you can simply proceed to line 38 and drop the task.

Line(s) Significance
6 The SQL statement that performs the required update. The hint explicitly chooses a table scan by rowid for the employees table. I also include two placeholders (:starting_rowid and :ending_rowid). This statement will be executed with DBMS_SQL, and the placeholders will be replaced with specific ROWID values as determined by chunk size (see line 20).
10 Defining the task name in a constant to avoid using the hard-coded literal throughout the program.
13 Creating the new task.
15-21 Specifying chunking by ROWID for this task and the specified table. The by_row parameter is set to TRUE so that the chunk size (next argument) refers to the number of rows, not the number of blocks (by_row = FALSE).
23-28 Running this task for the specified UPDATE statement with 10 simultaneous jobs.
30-33 Starting up a simple loop. Exit the loop if the task status returns “finished” (specified through a package constant) or if the number of attempts exceeds the retry parameter.
35 Calling the RESUME_TASK procedure to resume the task, finishing any incomplete tasks.
38 Dropping the task when it is completed or retries are exhausted.
Table 1: Use of DBMS_PARALLEL_EXECUTE subprograms in the apply_raise procedure
 

Given that any number of errors can occur during execution of DML statements, however, you may want to build into your parallel task execution the ability to check the status of the task and resume it if a failure has occurred. Lines 30 through 35 in the apply_raise procedure address that challenge.

As you can see, DBMS_PARALLEL_EXECUTE offers an elegant, high-level API for specifying the parallel execution of a DML statement.

Chunking by User SQL Statement

As an alternative to chunking by ROWID, you can provide your own query to specify how you want your data chunked and modified in parallel. The query must contain two columns—start_id and end_id—both of which must be ROWIDs or numbers. Each row retrieved from this query against the “chunking table” must specify the start and end values of the chunk.

 

Suppose I want to execute my update against employees by ranges of department IDs. I can create a chunking table (see Listing 2) and then modify my apply_raise procedure. To do this, first I change the WHERE clause of the update statement to specify ranges of department IDs:

c_update_statement
CONSTANT VARCHAR2 (1000)
      := 'UPDATE EMPLOYEES emp
            SET emp.salary =
emp.salary * (1.0 + pct_in/100)
           WHERE department_id
         BETWEEN :starting_deptid
AND :ending_deptid' ;

Then I define the query used to specify the chunking rows:

c_chunk_statement
CONSTANT VARCHAR2 (1000)
      := 'SELECT start_id, end_id
FROM department_chunks';

Finally, I create chunks by this SQL statement, specifying that ROWIDs are not used:

DBMS_PARALLEL_EXECUTE.
CREATE_CHUNKS_BY_SQL (
    task_name => c_task_name
   , sql_stmt        => c_chunk_statement
   , by_rowid       => FALSE
                      );

The remainder of the apply_raise procedure remains unchanged.

Code Listing 2: Creating a chunking table

CREATE TABLE department_chunks (start_id INTEGER, end_id INTEGER)
/
BEGIN
   INSERT INTO department_chunks
   VALUES (1, 500);
   INSERT INTO department_chunks
   VALUES (501, 1000);
   INSERT INTO department_chunks
   VALUES (1001, 1500);
   COMMIT;
END;
/

Chunking by Numeric Column

If the table I am changing contains a numeric column whose values can be used to define the ranges of rows to be updated in parallel, I can specify chunking with this procedure:

DBMS_PARALLEL_EXECUTE.CREATE_
CHUNKS_BY_NUMBER_COL (
   task_name       IN  VARCHAR2,
   table_owner     IN  VARCHAR2,
   table_name      IN  VARCHAR2,
   table_column    IN  VARCHAR2,
   chunk_size      IN  NUMBER);

In other words, I simply need to provide the name of the numeric column. Oracle Database 11g Release 2 then computes the MIN and MAX values of the specified column and divides the ranges evenly, as specified by the chunk size. To implement this approach, I would change the original apply_raise procedure.

First, I change the WHERE clause of the update statement to specify ranges of department IDs:

c_update_statement
CONSTANT VARCHAR2 (1000)
      := 'UPDATE EMPLOYEES emp
             SET emp.salary =
emp.salary * (1.0 + pct_in/100)
           WHERE department_id
         BETWEEN :starting_deptid
AND :ending_deptid' ;

Next, I create chunks by the department_id column:

DBMS_PARALLEL_EXECUTE.
CREATE_CHUNKS_BY_NUMBER_COL (
    task_name    => c_task_name
  , table_owner  => USER
  , table_name   => 'EMPLOYEES'
  , table_column => 'DEPARTMENT_ID'
  , chunk_size   => 1000
  );

The remainder of the apply_raise procedure remains unchanged.

DBMS_PARALLEL_EXECUTE provides many more subprograms and supports many more features than can be covered in this column. For example, rather than simply asking to run a task with RUN_TASK, you can control chunk execution by getting a specific chunk with GET_ROWID_CHUNK and then executing it with EXECUTE IMMEDIATE. You can then immediately resolve any errors and decide if you want to commit the changes.

This introduction to DBMS_PARALLEL_EXECUTE should make it clear that with Oracle Database 11g Release 2, it is much easier to parallelize execution of DML operations.

Next Steps

READ more Best Practice PL/SQL
otn.oracle.com/oramag/oracle/plsql
 otn.oracle.com/pub/columns/plsql
 Oracle Database PL/SQL Packages and Types Reference

 DISCUSS PL/SQL

 DOWNLOAD Oracle Database 11g Release 2



 

DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.