May/June 2010
I have to perform DML [data manipulation language] operations on millions of rows of data. Writing the SQL statement to perform the changes isn’t hard, but managing rollback segments and getting the process to finish in an acceptable amount of time are. Does Oracle Database 11g Release 2 offer any new features that can help?
Each new version of Oracle Database brings an ever-widening array of built-in or supplied packages that extend the capabilities of Oracle Database. Oracle Database 11g Release 2 is no exception: it contains more than 50 new packages (all documented in Oracle Database PL/SQL Packages and Types Reference ), including DBMS_PARALLEL_EXECUTE.
DBMS_PARALLEL_EXECUTE now provides the ability to break up a large table according to a variety of criteria, from ROWID ranges to key values and user-defined methods. You can then run a SQL statement or a PL/SQL block against these different “chunks” of the table in parallel, using the database scheduler to manage the processes running in the background. Error logging, automatic retries, and commits are integrated into the processing of these chunks.
To use DBMS_PARALLEL_EXECUTE to run tasks in parallel, your schema will need the CREATE JOB system privilege. You can then use the following subprograms of the built-in package (these are the most commonly used of the package’s routines) to achieve your goal:
• CREATE_TASK creates a named task to be managed by DBMS_PARALLEL_EXECUTE.
• CREATE_CHUNKS_BY_ROWID defines by ROWID the various chunks of the total set of rows to be modified by the SQL statement.
• CREATE_CHUNKS_BY_SQL defines, by a user-specified SQL statement, the chunking of data.
• CREATE_CHUNKS_BY_NUMBER_COL defines, by a numeric column, the chunking of data.
• RUN_TASK runs the named task after chunking has been defined.
• TASK_STATUS obtains the status of the task.
• STOP_TASK stops the task.
• RESUME_TASK resumes the task.
• DROP_TASK removes the task when it has been completed.
All these subprograms and any others in DBMS_PARALLEL_EXECUTE—except for TASK_STATUS—perform a commit.
Let’s look at a few examples of how to use this package. We will start with the simplest approach: chunking by ROWID. Suppose I need to apply a raise in salary (specified by a percentage) to all the employees in our company. In case my parallelized task fails for some reason, I want to be able to retry a specified number of times to complete it.
Listing 1 displays the code for implementing a parallelized apply_raise procedure.
Code Listing 1: Chunking by ROWID in the apply_raise procedure
SQL> PROCEDURE apply_raise ( 2 pct_in IN NUMBER 3 , retries_in IN PLS_INTEGER DEFAULT 2 4 ) 5 IS 6 c_update_statement CONSTANT VARCHAR2 (1000) 7 := 'UPDATE /*+ ROWID (dda) */ EMPLOYEES emp 8 SET emp.salary = emp.salary * (1.0 + pct_in/100) 9 WHERE ROWID BETWEEN :starting_rowid AND :ending_rowid'; 10 c_task_name CONSTANT VARCHAR2 (20) := 'Give Raise'; 11 l_attempts PLS_INTEGER := 1; 12 BEGIN 13 DBMS_PARALLEL_EXECUTE.CREATE_TASK (c_task_name); 14 15 DBMS_PARALLEL_EXECUTE. 16 CREATE_CHUNKS_BY_ROWID (task_name => c_task_name 17 , table_owner => USER 18 , table_name => 'EMPLOYEES' 19 , by_row => TRUE 20 , chunk_size => 1000 21 ); 22 23 DBMS_PARALLEL_EXECUTE. 24 RUN_TASK (task_name => c_task_name 25 , sql_stmt => c_update_statement 26 , language_flag => DBMS_SQL.native 27 , parallel_level => 10 28 ); 29 30 LOOP 31 EXIT WHEN DBMS_PARALLEL_EXECUTE.TASK_STATUS (c_task_name) = 32 DBMS_PARALLEL_EXECUTE.FINISHED 33 OR l_attempts > retries_in; 34 l_attempts := l_attempts + 1; 35 DBMS_PARALLEL_EXECUTE.RESUME_TASK (c_task_name); 36 END LOOP; 37 38 DBMS_PARALLEL_EXECUTE.DROP_TASK (c_task_name); 39* END apply_raise;
Table 1 explains the use of DBMS_PARALLEL_EXECUTE subprograms in the apply_raise procedure in Listing 1. The steps in lines 6 through 28 create the task, specify chunking, and run the task. If you are sure the task will complete or if you do not want to recover from any failures, you can simply proceed to line 38 and drop the task.
Line(s) | Significance |
6 | The SQL statement that performs the required update. The hint explicitly chooses a table scan by rowid for the employees table. I also include two placeholders (:starting_rowid and :ending_rowid). This statement will be executed with DBMS_SQL, and the placeholders will be replaced with specific ROWID values as determined by chunk size (see line 20). |
10 | Defining the task name in a constant to avoid using the hard-coded literal throughout the program. |
13 | Creating the new task. |
15-21 | Specifying chunking by ROWID for this task and the specified table. The by_row parameter is set to TRUE so that the chunk size (next argument) refers to the number of rows, not the number of blocks (by_row = FALSE). |
23-28 | Running this task for the specified UPDATE statement with 10 simultaneous jobs. |
30-33 | Starting up a simple loop. Exit the loop if the task status returns “finished” (specified through a package constant) or if the number of attempts exceeds the retry parameter. |
35 | Calling the RESUME_TASK procedure to resume the task, finishing any incomplete tasks. |
38 | Dropping the task when it is completed or retries are exhausted. |
Given that any number of errors can occur during execution of DML statements, however, you may want to build into your parallel task execution the ability to check the status of the task and resume it if a failure has occurred. Lines 30 through 35 in the apply_raise procedure address that challenge.
As you can see, DBMS_PARALLEL_EXECUTE offers an elegant, high-level API for specifying the parallel execution of a DML statement.
Chunking by User SQL StatementAs an alternative to chunking by ROWID, you can provide your own query to specify how you want your data chunked and modified in parallel. The query must contain two columns—start_id and end_id—both of which must be ROWIDs or numbers. Each row retrieved from this query against the “chunking table” must specify the start and end values of the chunk.
Suppose I want to execute my update against employees by ranges of department IDs. I can create a chunking table (see Listing 2) and then modify my apply_raise procedure. To do this, first I change the WHERE clause of the update statement to specify ranges of department IDs:
c_update_statement CONSTANT VARCHAR2 (1000) := 'UPDATE EMPLOYEES emp SET emp.salary = emp.salary * (1.0 + pct_in/100) WHERE department_id BETWEEN :starting_deptid AND :ending_deptid' ;
Then I define the query used to specify the chunking rows:
c_chunk_statement CONSTANT VARCHAR2 (1000) := 'SELECT start_id, end_id FROM department_chunks';
Finally, I create chunks by this SQL statement, specifying that ROWIDs are not used:
DBMS_PARALLEL_EXECUTE. CREATE_CHUNKS_BY_SQL ( task_name => c_task_name , sql_stmt => c_chunk_statement , by_rowid => FALSE );
The remainder of the apply_raise procedure remains unchanged.
Code Listing 2: Creating a chunking table
CREATE TABLE department_chunks (start_id INTEGER, end_id INTEGER) / BEGIN INSERT INTO department_chunks VALUES (1, 500); INSERT INTO department_chunks VALUES (501, 1000); INSERT INTO department_chunks VALUES (1001, 1500); COMMIT; END; /
If the table I am changing contains a numeric column whose values can be used to define the ranges of rows to be updated in parallel, I can specify chunking with this procedure:
DBMS_PARALLEL_EXECUTE.CREATE_ CHUNKS_BY_NUMBER_COL ( task_name IN VARCHAR2, table_owner IN VARCHAR2, table_name IN VARCHAR2, table_column IN VARCHAR2, chunk_size IN NUMBER);
In other words, I simply need to provide the name of the numeric column. Oracle Database 11g Release 2 then computes the MIN and MAX values of the specified column and divides the ranges evenly, as specified by the chunk size. To implement this approach, I would change the original apply_raise procedure.
First, I change the WHERE clause of the update statement to specify ranges of department IDs:
c_update_statement CONSTANT VARCHAR2 (1000) := 'UPDATE EMPLOYEES emp SET emp.salary = emp.salary * (1.0 + pct_in/100) WHERE department_id BETWEEN :starting_deptid AND :ending_deptid' ;
Next, I create chunks by the department_id column:
DBMS_PARALLEL_EXECUTE. CREATE_CHUNKS_BY_NUMBER_COL ( task_name => c_task_name , table_owner => USER , table_name => 'EMPLOYEES' , table_column => 'DEPARTMENT_ID' , chunk_size => 1000 );
The remainder of the apply_raise procedure remains unchanged.
DBMS_PARALLEL_EXECUTE provides many more subprograms and supports many more features than can be covered in this column. For example, rather than simply asking to run a task with RUN_TASK, you can control chunk execution by getting a specific chunk with GET_ROWID_CHUNK and then executing it with EXECUTE IMMEDIATE. You can then immediately resolve any errors and decide if you want to commit the changes.
This introduction to DBMS_PARALLEL_EXECUTE should make it clear that with Oracle Database 11g Release 2, it is much easier to parallelize execution of DML operations.
READ more Best Practice PL/SQL
|
DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.