Database, SQL and PL/SQL

The Joy of Low-Hanging Fruit

Are you using BULK COLLECT and FORALL for bulk processing yet?

By Steven Feuerstein Oracle ACE Director

July/August 2014

The following is based on true events. The names have been changed to protect people and intellectual property.


Answer to Last Issue’s Challenge
The PL/SQL Challenge quiz in last issue’s “Writing SQL in Oracle Application Express” article offered different ways to implement the query for an Oracle Application Express interactive report so that it could return data from two different queries. Choices (b) and (c) are correct: you can use a table function or, for simpler scenarios, a union of queries that are mutually exclusive. You cannot, however, use a dynamic query with an interactive report, so choice (a) is incorrect.

I recently spent a few days with a team of developers at extremememe.info, an up-and-coming Web 3.0 paradigm-shifter that analyzes internet memes, tracks them to their source, and—best of all—predicts new and future memes.

The developers at extremememe.info are very knowledgeable about Oracle Database and PL/SQL, but as is the case with many other developers, they have little time to explore new technology features. They struggle to keep up with the demands of their users, and even when they know about some great new feature, it can be a challenge to convince management to commit the resources to apply those features to stable production code.

We interspersed training on PL/SQL techniques with reviews of their code and, in the process, came across a program that runs in a daily batch process and updates the status of all the memes. Unfortunately, as extremememe.info’s data volumes grew, the process was taking longer and longer, approaching 23 hours. Given how uncomfortably close that was to a full day, we decided to take a closer look.


On Closer Inspection

The meme update process uses the following three tables: em_memes, em_mentions, and em_incoming. (You can create these tables by using the em_memes_setup.sql script in the download for this article.) Note that the columns of these tables have been simplified for the purposes of this article.

The developers at extremememe.info use the em_memes table for all the memes of which they are aware and whose status they track. (Note that setting the default value of the primary key to the next sequential value is an Oracle Database 12c feature; in Oracle Database 11g Release 2 and earlier, you would use a trigger to achieve the same result. The em_memes_setup.sql script includes this code.)

CREATE TABLE em_memes
(
   meme_id         INTEGER
                      DEFAULT em_memes_seq.NEXTVAL
                      PRIMARY KEY,
   meme_name       VARCHAR2 (1000) UNIQUE,
   discovered_on   DATE,
   meme_status     VARCHAR2 (100)
)
/
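
The note above says that in Oracle Database 11g Release 2 and earlier you would use a trigger instead of the DEFAULT ... NEXTVAL clause. The em_memes_setup.sql script contains the actual code; what follows is only a rough sketch of that approach, assuming the meme_id column is declared without the DEFAULT clause (the trigger name is mine, and in releases before 11g you would SELECT the sequence value INTO :NEW.meme_id instead of assigning it directly):

-- The sequence is needed in either case (12c DEFAULT clause or trigger).
CREATE SEQUENCE em_memes_seq
/
-- Hypothetical pre-12c alternative: populate the primary key from the
-- sequence in a BEFORE INSERT trigger rather than via a DEFAULT clause.
CREATE OR REPLACE TRIGGER em_memes_bir
   BEFORE INSERT ON em_memes
   FOR EACH ROW
BEGIN
   IF :NEW.meme_id IS NULL
   THEN
      :NEW.meme_id := em_memes_seq.NEXTVAL;
   END IF;
END;
/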

The meme_status value can be VIRAL, DORMANT, HOAX, and so on. extremememe.info has a table for all the mentions of, or references to, a meme:

CREATE SEQUENCE em_mentions_seq
/
CREATE TABLE em_mentions
(
   mention_id       INTEGER
                       DEFAULT em_mentions_seq.NEXTVAL
                       PRIMARY KEY,
   meme_id          INTEGER
                       REFERENCES em_memes (meme_id),
   source_name      VARCHAR2 (100),
   source_details   CLOB,
   occurred_on      DATE
)
/

Meme mentions are loaded from many different sources, and extremememe.info relies on a staging table to collect data from all sources:

CREATE TABLE em_incoming
(
   meme_name         VARCHAR2 (1000),
   source_name       VARCHAR2 (100),
   occurred_on       DATE,
   mention_details   CLOB
)
/

The developers at extremememe.info also use the following em_memes_pkg package to log errors and to perform some critical proprietary computations, based on the content of those tables:

CREATE OR REPLACE PACKAGE em_memes_pkg
IS
   TYPE incoming_t IS TABLE OF em_incoming%ROWTYPE;
   TYPE mentions_t IS TABLE OF em_mentions%ROWTYPE;
   TYPE meme_ids_t IS TABLE OF em_memes.meme_id%TYPE;

   PROCEDURE log_error (
      error_code_in   IN INTEGER DEFAULT SQLCODE,
      error_msg_in    IN VARCHAR2
                         DEFAULT DBMS_UTILITY.format_error_stack);

   FUNCTION unpacked_incoming (
      incoming_in IN em_incoming%ROWTYPE)
      RETURN em_mentions%ROWTYPE;

   PROCEDURE reset_meme_status (
      meme_id_in       IN     em_memes.meme_id%TYPE,
      new_status_out      OUT em_memes.meme_status%TYPE);
END;
/

For space reasons, I will not provide the package body code for em_memes_pkg, but Table 1 lists what the elements in the specification provide.


em_memes_pkg.incoming_t This is a nested table type that contains rows of data from the em_incoming table.
em_memes_pkg.mentions_t This is a nested table type that contains rows of data from the em_mentions table.
em_memes_pkg.meme_ids_t This is a nested table type that contains primary keys from the em_memes table.
em_memes_pkg.log_error This is a generic (and typical) error-logging mechanism. Defined as an autonomous transaction, it writes out the basic error information available from the DBMS_UTILITY functions (and more) to a log table and then commits just that insert.
em_memes_pkg.unpacked_incoming This is a function that converts a row of incoming data (whose mention_details column is an XML document that holds the different formats of source information) into a record that can be inserted into the em_mentions table.
em_memes_pkg.reset_meme_status The heart and soul of the extremememe.info proprietary process, this procedure analyzes the contents of the mentions table and ascertains the status of the meme.

Table 1: em_memes_pkg elements and descriptions
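
The em_memes_pkg body is not shown, but to make the description of log_error concrete, here is a hedged sketch of what a typical autonomous-transaction error logger looks like. The em_error_log table and its columns are assumptions for illustration only; they are not the actual extremememe.info code.

-- Sketch only: a typical autonomous-transaction error logger.
-- The em_error_log table is assumed to exist for this illustration.
PROCEDURE log_error (
   error_code_in   IN INTEGER DEFAULT SQLCODE,
   error_msg_in    IN VARCHAR2 DEFAULT DBMS_UTILITY.format_error_stack)
IS
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   INSERT INTO em_error_log (logged_on,
                             error_code,
                             error_message,
                             call_stack)
        VALUES (SYSTIMESTAMP,
                error_code_in,
                error_msg_in,
                DBMS_UTILITY.format_call_stack);

   -- Commit only this insert; the caller's transaction is not affected.
   COMMIT;
END;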

Listing 1 shows the code for the status update process in the em_update_status procedure. (Listing 1 is available as em_update_status_old.sql in the download for this article.) Descriptions of the key parts of the em_update_status procedure appear at the end of Listing 1.

Code Listing 1: Meme status updater, version 1

  1  CREATE OR REPLACE PROCEDURE em_update_status
  2  IS
  3     CURSOR incoming_cur
  4     IS
  5        SELECT * FROM em_incoming;
  6
  7     l_mention   em_mentions%ROWTYPE;
  8     l_status    em_memes.meme_status%TYPE;
  9  BEGIN
 10     FOR incoming_r IN incoming_cur
 11     LOOP
 12        BEGIN
 13           SAVEPOINT new_transaction;
 14
 15           l_mention := em_memes_pkg.unpacked_incoming (incoming_r);
 16
 17           INSERT INTO em_mentions (meme_id,
 18                                    source_name,
 19                                    source_details,
 20                                    occurred_on)
 21                VALUES (l_mention.meme_id,
 22                        l_mention.source_name,
 23                        l_mention.source_details,
 24                        l_mention.occurred_on);
 25
 26           em_memes_pkg.reset_meme_status (l_mention.meme_id,
 27                                           l_status);
 28
 29           IF l_status IS NOT NULL
 30           THEN
 31              UPDATE em_memes
 32                 SET meme_status = l_status
 33               WHERE meme_id = l_mention.meme_id;
 34           END IF;
 35        EXCEPTION
 36           WHEN OTHERS
 37           THEN
 38              em_memes_pkg.log_error;
 39              ROLLBACK TO new_transaction;
 40        END;
 41     END LOOP;
 42  END;

Line(s) Description
10 For each row in the staging table (em_incoming)…
12, 40 Put the entire body of the loop inside its own nested block so that any exception can be trapped and logged.
13 Set a savepoint for this new transaction.
15 Convert the incoming row to a record that can be inserted into the em_mentions table.
17–24 Insert a single row into the em_mentions table.
26–27 Compute the new status for the meme, based on the new mention.
29–34 If the status isn’t NULL, update the em_memes table with that status.
35–39 If anything goes wrong, log the error and then erase the effects of the insert and/or the update by rolling back to the savepoint.

I was dismayed when I saw extremememe.info's implementation of em_update_status, because I found in that procedure a cursor FOR loop that contained two nonquery data manipulation language (DML) statements. That is a classic antipattern (a general pattern of coding that should be avoided), because the inserts and updates change the tables on a row-by-row basis, which maximizes the number of context switches between the SQL and PL/SQL engines and consequently greatly slows the performance of the code. Fortunately, this classic antipattern has a classic, well-defined solution: use BULK COLLECT and FORALL to switch from row-by-row processing to bulk processing.

The presence of this antipattern was also a source of delight for me. The developers needed to improve the performance of the procedure significantly. That would have been very hard to do if they had already taken full advantage of SQL and PL/SQL performance optimization features. Because they had not done so, they were looking at a procedure full of low-hanging fruit. That is, they were looking at the possibility of solving their problem with a relatively straightforward change to their existing code.


Code Review Identifies a Logic Flaw

I shared my dismay and delight with the extremememe.info developer team. They were excited about the potential improvements and were eager to get started. Then I noticed something odd about the algorithm they’d presented. A brief Q&A session soon clarified matters:

“Can the table of incoming data contain more than one reference to the same meme?”

Heads nodded vigorously. They told me that the more viral the behavior of a meme, the more mentions there would be. In the case of some very popular memes, the daily incoming could contain millions of references to the same meme. That led directly to my second question:

“Then why are you updating the meme status after each insert of a mention?”

Now eyebrows shot up and heads were turned. “Wow,” said one developer. “Oh. My. Gosh.” said another.

The team lead turned to me. “We never thought of that before. We’ve been doing an enormous amount of unnecessary processing. That’s kind of embarrassing. It sure helps to have a fresh pair of eyes looking at our code.”

Yes, it does. Never underestimate the power of your brain to hide the obvious from you. Once you’ve spent a lot of time writing or working with a program, you don’t really see the code anymore. You see what you think the code should be. When someone else looks at it, they come at it without any preconceived notions and perceive very different aspects of the program.

If you’re the only one who’s ever looked at your code, you can be certain it contains bugs and perhaps even substantial algorithm errors.

Members of the extremememe.info developer team quickly agreed that the statuses of the memes should be updated only after all mentions were inserted. Furthermore, only those memes with new mentions should be updated.

In the remainder of this article, I show you the end result of the rewrite of extremememe.info's em_update_status procedure into a program that runs significantly faster and makes it possible for data volumes to grow substantially while still allowing the batch process to finish within a single day. My stevenfeuersteinonplsql.blogspot.com blog expands on this article to take you through those changes in a step-by-step process.


On Switching to Bulk Processing

Although row-by-row processing can be unacceptably slow for large volumes of data, it offers a significant advantage for execution of nonquery DML: it is easy to define and manage a transaction.

In the em_update_status procedure, the transaction consists of two DML statements:
  1. Insert a row into the mentions table.
  2. Update the memes table.

A quick glance at the original procedure code in Listing 1 shows that these two processes go together: if the insert fails, the update won’t execute. If the update fails, the insert will be rolled back. And in either case, the error will be logged and processing will continue.

With the switch to bulk processing, the procedure moves from a row-by-row approach to a phased approach. For em_update_status, this means that the procedure will perform all the inserts first and then execute all the updates.

Generally, you can think of bulk processing as having three distinct phases:

  • Phase 1: Populate the collections that will drive the FORALL processing. This is usually done with a BULK COLLECT query.
  • Phase 2: Modify the contents of the collections as needed to prepare them for Phase 3.
  • Phase 3: Execute the FORALL statement to push the data into the table(s).

Figure 1 shows these three phases.


Figure 1: Bulk processing phases

In the real world, life is complicated and there are always exceptions. Sometimes you will not need Phase 1, because the collections are already filled and being passed to your program, and sometimes you can skip Phase 2, because the collection is ready to go, to be used immediately in the FORALL statement.
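
Put together, a generic skeleton of the three phases looks like the following sketch. The cursor name, the target table em_incoming_clean, and the Phase 2 transformation are placeholders for illustration, not part of the extremememe.info code.

-- Generic skeleton of the three bulk processing phases (names are placeholders).
DECLARE
   CURSOR source_cur IS SELECT * FROM em_incoming;

   TYPE source_t IS TABLE OF em_incoming%ROWTYPE;

   l_rows   source_t;
BEGIN
   OPEN source_cur;

   LOOP
      /* Phase 1: populate the collection with BULK COLLECT */
      FETCH source_cur BULK COLLECT INTO l_rows LIMIT 100;
      EXIT WHEN l_rows.COUNT = 0;

      /* Phase 2: prepare the collection contents for the FORALL step */
      FOR indx IN 1 .. l_rows.COUNT
      LOOP
         l_rows (indx).meme_name := TRIM (l_rows (indx).meme_name);
      END LOOP;

      /* Phase 3: push the data into the target table with FORALL */
      FORALL indx IN 1 .. l_rows.COUNT
         INSERT INTO em_incoming_clean VALUES l_rows (indx);
   END LOOP;

   CLOSE source_cur;
END;
/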


The Bulk Processing Implementation

The code in Listing 2 is the rewrite of the too-slow em_update_status procedure. This new version takes advantage of nested procedures (procedures declared within another procedure or function). The code for these nested procedures (get_next_set_of_incoming, unpack_and_record_mentions, and reset_meme_statuses) is presented and analyzed later in this article. Descriptions of the key parts of the revised em_update_status procedure appear at the end of Listing 2.

Code Listing 2: Meme status updater, version 2

  1  CREATE OR REPLACE PROCEDURE em_update_status (
  2     bulk_collect_limit_in IN PLS_INTEGER DEFAULT 100)
  3  IS
  4     bulk_errors           EXCEPTION;
  5     PRAGMA EXCEPTION_INIT (bulk_errors, -24381);
  6
  7     CURSOR incoming_cur
  8     IS
  9        SELECT * FROM em_incoming;
 10
 11     l_incoming            em_memes_pkg.incoming_t;
 12
 13     l_affected_meme_ids   em_memes_pkg.meme_ids_t
 14        := em_memes_pkg.meme_ids_t ();
 15
 16     PROCEDURE get_next_set_of_incoming ....
 17     PROCEDURE unpack_and_record_mentions ....
 18     PROCEDURE reset_meme_statuses ....
 19
 20  BEGIN
 21     OPEN incoming_cur;
 22
 23     LOOP
 24        get_next_set_of_incoming (l_incoming);
 25        EXIT WHEN l_incoming.COUNT = 0;
 26
 27        unpack_and_record_mentions (l_incoming, l_affected_meme_ids);
 28     END LOOP;
 29
 30     CLOSE incoming_cur;
 31
 32     reset_meme_statuses (l_affected_meme_ids);
 33  END;

Line(s) Description
2 BULK COLLECT returns multiple rows with each fetch. This parameter controls the number of rows fetched.
4–5 Because the original procedure traps exceptions, logs the errors, and continues processing, use SAVE EXCEPTIONS in the FORALL statements and include an exception handler for ORA-24381.
11–14 Declare nested tables based on the packaged types to be used by the nested procedures.
16–18 The nested procedures.
21–32 The now small and very readable executable section. Here's the narrative version of the code:
  • Open the cursor that identifies all the incoming rows to be processed.
  • Get the next N rows of incoming data. Stop when the collection is empty.
  • Unpack each incoming row, and insert it into the mentions table. Add the IDs of memes that were mentioned to the collection of affected memes.
  • After inserting all mentions, reset the statuses of only those memes that were affected.

Now let’s take a closer look at each of the nested procedures, correlating each of these with the phases I described earlier.

Phase 1: Get incoming rows. When you’re switching to BULK COLLECT to fetch data, there are a few key guidelines to keep in mind:

  • You can use BULK COLLECT with static SELECT statements (SELECT BULK COLLECT INTO and FETCH BULK COLLECT INTO) and dynamic queries (EXECUTE IMMEDIATE BULK COLLECT INTO); a sketch of the dynamic form appears after this list.
  • If your original code contains a cursor FOR loop that fetches many rows (thousands and higher) but the loop body does not have any nonquery DML (inserts, updates, deletes), you probably do not need to switch to BULK COLLECT. The PL/SQL compiler automatically optimizes cursor FOR loops to return 100 rows with each fetch.
  • BULK COLLECT and FORALL work with PL/SQL collections—arraylike structures. These collections consume process global area (PGA) memory, which is allocated per session. It is possible for sessions to run out of memory and return the “ORA-04030: out of process memory” error. The best way to manage memory in bulk processing is to use the LIMIT clause with BULK COLLECT.
  • Any collections populated by BULK COLLECT are either empty or filled sequentially from index value 1. This means that every index value between that returned by the FIRST method (lowest defined index value) and the LAST method (highest defined index value) is defined.
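
For instance, the dynamic-query form mentioned in the first guideline might look like the following sketch. The WHERE clause and bind value are invented for illustration and are not part of the extremememe.info code.

-- Sketch: BULK COLLECT with a dynamic query via EXECUTE IMMEDIATE.
-- Note that the LIMIT clause is available only with FETCH ... BULK COLLECT,
-- not with EXECUTE IMMEDIATE.
DECLARE
   l_incoming   em_memes_pkg.incoming_t;
BEGIN
   EXECUTE IMMEDIATE
      'SELECT * FROM em_incoming WHERE occurred_on >= :start_date'
      BULK COLLECT INTO l_incoming
      USING TRUNC (SYSDATE);

   DBMS_OUTPUT.put_line ('Fetched ' || l_incoming.COUNT || ' rows');
END;
/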

With all that in mind, let’s now implement the get_next_set_of_incoming procedure (called in the new em_update_status procedure). The get_next_set_of_incoming procedure passes back a collection of incoming records; BULK COLLECT will improve the performance, and the LIMIT parameter will help control PGA consumption:

PROCEDURE get_next_set_of_incoming (
   incoming_out OUT NOCOPY em_memes_pkg.incoming_t)
IS
BEGIN
   FETCH incoming_cur
      BULK COLLECT INTO incoming_out
      LIMIT bulk_collect_limit_in;
END;

First, note that I use the NOCOPY hint in my parameter definition. By using NOCOPY, I’m asking Oracle Database to not make a local copy of my collection inside the procedure. The result will be lower PGA consumption and improved performance.

Next, because I already opened the cursor on line 21 of the new em_update_status procedure, all I need to do in this get_next_set_of_incoming procedure is fetch. And I want to retrieve multiple rows with each fetch, so I put the keywords BULK COLLECT in front of INTO and then provide a collection after INTO.

I then add the LIMIT clause, which tells Oracle Database to retrieve up to that number of rows with each FETCH. I could have provided a literal instead of a parameter, as in

LIMIT 100;

In fact, 100 is a good default value. You get a nice boost in performance, and you don’t consume enormous amounts of PGA.

Yet I hesitate to hard-code the value. If over time this procedure needs to work with more and more data, I may want to experiment with different limit values to see if I can coax more performance out of the process without breaking the PGA bank.

By passing the limit value as a parameter, I can change the value whenever I want without having to recompile the code.

OK, I’ve got the next N rows of incoming data. Now it’s time to move them to the em_mentions table.

Phases 2 and 3: Unpack and record mentions. With phased processing, you do not unpack a single row and then insert it into a table. Instead, you unpack all the rows fetched and then push them all into a table by using FORALL. In this case, all the rows fetched by get_next_set_of_incoming will be unpacked and pushed into the em_mentions table with FORALL.

Listing 3 shows the implementation of the unpack_and_record_mentions procedure. As you can see, it is much more involved than the code required to fetch data. That’s because it contains an INSERT, which means that you have to pay a lot of attention to error processing.

Code Listing 3: Unpack and record mentions

  1  PROCEDURE unpack_and_record_mentions (
  2     incoming_in             IN     em_memes_pkg.incoming_t,
  3     affected_meme_ids_out   IN OUT em_memes_pkg.meme_ids_t)
  4  IS
  5     l_affected_this_time   em_memes_pkg.meme_ids_t;
  6     l_mentions             em_memes_pkg.mentions_t := em_memes_pkg.mentions_t ();
  7     l_status               em_memes.meme_status%TYPE;
  8
  9     PROCEDURE add_to_affected_memes
 10     IS
 11     BEGIN
 12        affected_meme_ids_out :=
 13              affected_meme_ids_out
 14              MULTISET UNION DISTINCT l_affected_this_time;
 15     END;
 16
 17     PROCEDURE record_errors
 18     IS
 19        l_bad_index   PLS_INTEGER;
 20     BEGIN
 21        FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
 22        LOOP
 23           l_bad_index := SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX;
 24
 25           em_memes_pkg.log_error (
 26              error_code_in   => SQL%BULK_EXCEPTIONS (indx).ERROR_CODE,
 27              error_msg_in    => 'Unable to insert mention for incoming mention '
 28                                 || incoming_in (l_bad_index).meme_name
 29                                 || '-'
 30                                 || incoming_in (l_bad_index).source_name
 31                                 || '-'
 32                                 || TO_CHAR (
 33                                       incoming_in (l_bad_index).occurred_on,
 34                                       'YYYY-MM-DD HH24:MI:SS'));
 35        END LOOP;
 36     END;
 37  BEGIN
 38     l_mentions.EXTEND (incoming_in.COUNT);
 39
 40     FOR indx IN 1 .. incoming_in.COUNT
 41     LOOP
 42        l_mentions (indx) :=
 43           em_memes_pkg.unpacked_incoming (incoming_in (indx));
 44     END LOOP;
 45
 46     FORALL indx IN 1 .. l_mentions.COUNT SAVE EXCEPTIONS
 47        INSERT INTO em_mentions (meme_id,
 48                                 source_name,
 49                                 source_details,
 50                                 occurred_on)
 51             VALUES (l_mentions (indx).meme_id,
 52                     l_mentions (indx).source_name,
 53                     l_mentions (indx).source_details,
 54                     SYSDATE)
 55          RETURNING meme_id
 56          BULK COLLECT INTO l_affected_this_time;
 57
 58     add_to_affected_memes;
 59  EXCEPTION
 60     WHEN bulk_errors
 61     THEN
 62        add_to_affected_memes;
 63
 64        record_errors;
 65  END;

Table 2 includes descriptions of the significant lines of code in Listing 3. These descriptions are out of order (not simply listed from the first to last lines), because I am again using nested subprograms to improve readability. And that usually means that you read from the bottom up: start with the main executable section and then jump to subprograms as needed.


Line(s) Description
2 and 3 The next set of incoming rows is passed to the procedure, which passes back an updated list of all those meme IDs that have received at least one mention. This collection will drive the final “reset meme status” step.
38 Start of the executable section: make room in the local nested table of mentions to hold all the incoming data.
40–44 A classic Phase 2: for each of the incoming rows, convert it to the mentions format and load it into the array. Notice the use of the same index value for the l_mentions collection as in the incoming_in collection. Because incoming_in was populated with BULK COLLECT, it is either empty or filled sequentially from 1.
46–56 Time to push all that nicely prepared data into the em_mentions table with FORALL. Because the collection is sequentially filled from 1, use the simplest form for the FORALL header:

FORALL indx IN 1 .. l_mentions.COUNT

If I used this form (very similar to a numeric FOR loop) and the collection had gaps (index values between FIRST and LAST that are not defined), the procedure would fail with the “ORA-22160: element at index [N] does not exist” error. To avoid this problem, you can use INDICES OF or VALUES OF; the former is discussed in the “Phases 2 and 3: Reset meme statuses” section. Next I use the RETURNING clause to capture the IDs of all mentioned memes. I need to use BULK COLLECT for my RETURNING clause, because I am likely to insert more than one row and therefore will return more than one ID.

58 Call the add_to_affected_memes procedure. This is a procedure declared within the parent procedure. Its implementation is very simple:

 9     PROCEDURE add_to_affected_memes
10     IS
11     BEGIN
12        affected_meme_ids_out :=
13              affected_meme_ids_out
14              MULTISET UNION DISTINCT l_affected_this_time;
15     END;

It uses the MULTISET UNION operation to add the IDs of the newly affected memes to the existing list of previously affected memes. By adding the DISTINCT keyword, I ensure that there will not be duplicates in the resulting nested table. (A small standalone demonstration of MULTISET UNION DISTINCT appears after Table 2.)

59–64 Time to handle any errors during insertion into the em_mentions table. The WHEN clause references the bulk_errors exception, which is declared at the top of the procedure. Within the handler, I call the add_to_affected_memes procedure.
17–36 The record_errors procedure iterates through the contents of SQL%BULK_EXCEPTIONS, a pseudocollection of records that contains the index in the collection for which an error was returned by the SQL engine and the error code. It is called a pseudocollection because it is only a partial implementation of an associative array; specifically, only the COUNT method is defined for this collection.

Table 2: unpack_and_record_mentions procedure line descriptions
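
To see what MULTISET UNION DISTINCT does, here is a small standalone demonstration, separate from the extremememe.info code; the meme ID values are invented for the example.

-- Demonstration: MULTISET UNION DISTINCT merges two nested tables and
-- removes duplicates. The ID values are made up.
DECLARE
   l_so_far      em_memes_pkg.meme_ids_t := em_memes_pkg.meme_ids_t (1, 2, 3);
   l_this_time   em_memes_pkg.meme_ids_t := em_memes_pkg.meme_ids_t (3, 4);
BEGIN
   l_so_far := l_so_far MULTISET UNION DISTINCT l_this_time;

   -- l_so_far now holds 1, 2, 3, 4: the duplicate 3 appears only once.
   DBMS_OUTPUT.put_line (l_so_far.COUNT);
END;
/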

Phases 2 and 3: Reset meme statuses. At this point, all meme mentions have been inserted that can be inserted. In the process, the unpack_and_record_mentions procedure filled up the l_affected_meme_ids collection, so now it’s time to use that collection to drive the second FORALL statement, performing the updates against the em_memes table.

In the original program, for each mention, the new meme status was computed. If it was NULL, no update was performed. The new program will need to incorporate these steps. The code for reset_meme_statuses is shown in Listing 4.

Code Listing 4: Reset meme statuses

  1  PROCEDURE reset_meme_statuses (
  2     affected_meme_ids_in IN em_memes_pkg.meme_ids_t)
  3  IS
  4     l_affected_meme_ids   em_memes_pkg.meme_ids_t
  5        := affected_meme_ids_in;
  6
  7     TYPE meme_statuses_t IS TABLE OF em_memes.meme_status%TYPE
  8        INDEX BY PLS_INTEGER;
  9
 10     l_statuses   meme_statuses_t;
 11  BEGIN
 12     FOR indx IN 1 .. affected_meme_ids_in.COUNT
 13     LOOP
 14        em_memes_pkg.reset_meme_status (
 15           meme_id_in       => affected_meme_ids_in (indx),
 16           new_status_out   => l_statuses (indx));
 17
 18        IF l_statuses (indx) IS NULL
 19        THEN
 20           /* No update */
 21           l_affected_meme_ids.delete (indx);
 22        END IF;
 23     END LOOP;
 24
 25     FORALL indx IN INDICES OF l_affected_meme_ids
 26        SAVE EXCEPTIONS
 27        UPDATE em_memes
 28           SET meme_status = l_statuses (indx)
 29         WHERE meme_id = l_affected_meme_ids (indx);
 30  EXCEPTION
 31     WHEN bulk_errors
 32     THEN
 33        DECLARE
 34           l_index   PLS_INTEGER;
 35
 36           FUNCTION bind_array_index_for (
 37              bind_array_in    IN em_memes_pkg.meme_ids_t,
 38              error_index_in   IN PLS_INTEGER)
 39              RETURN PLS_INTEGER
 40           IS
 41              l_index   PLS_INTEGER := bind_array_in.FIRST;
 42           BEGIN
 43              FOR indx IN 1 .. error_index_in - 1
 44              LOOP
 45                 l_index := bind_array_in.NEXT (l_index);
 46              END LOOP;
 47
 48              RETURN l_index;
 49           END;
 50        BEGIN
 51           FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
 52           LOOP
 53              l_index :=
 54                 bind_array_index_for (
 55                    l_affected_meme_ids,
 56                    SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
 57
 58              em_memes_pkg.log_error (
 59                 error_code_in   => SQL%BULK_EXCEPTIONS (indx).ERROR_CODE,
 60                 error_msg_in    => 'Unable to reset meme status '
 61                                    || l_affected_meme_ids (
 62                                          l_index));
 63
 64              /* Mimic SAVEPOINT behavior, removing mentions for this meme */
 65              DELETE FROM em_mentions
 66                    WHERE meme_id =
 67                             l_affected_meme_ids (l_index)
 68                      AND occurred_on >= TRUNC (SYSDATE);
 69           END LOOP;
 70        END;
 71  END;

Line(s) Description
4 Copy the collection passed into the procedure into a local variable.
7–10 Declare a collection type and a collection to hold the new statuses calculated in Phase 2.
12–23 Phase 2 preparation of collection: for each affected meme ID, calculate the status and then either put that status into the l_statuses collection or remove the element from the affected meme IDs list.
25–29 Update the rows in the em_memes table by using a FORALL statement. But now the header of this statement looks a bit different: FORALL indx IN INDICES OF l_affected_meme_ids

INDICES OF offers an alternative to the more common “IN low .. high” syntax and is used when the collection referenced inside the DML statement might be sparse (at least one index value between FIRST and LAST is undefined). Using INDICES OF, I tell the PL/SQL engine to use only index values that are defined in l_affected_meme_ids, thereby skipping over undefined values.

31–70 Handle the bulk_errors exception, and loop through the contents of the SQL%BULK_EXCEPTIONS collection.

Table 3: Reset_meme_statuses procedure line descriptions
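
Listing 4 demonstrates INDICES OF. The VALUES OF alternative mentioned earlier is not used in this article, so here is a minimal hedged sketch of it; the meme IDs, positions, and status value are invented for the example.

-- Sketch: FORALL ... VALUES OF drives the DML with the element *values* of a
-- collection, using them as index values into the bind array. IDs are made up.
DECLARE
   TYPE positions_t IS TABLE OF PLS_INTEGER INDEX BY PLS_INTEGER;

   l_ids         em_memes_pkg.meme_ids_t
                    := em_memes_pkg.meme_ids_t (101, 102, 103, 104);
   l_positions   positions_t;
BEGIN
   -- Update only the memes at positions 2 and 4 of l_ids.
   l_positions (1) := 2;
   l_positions (2) := 4;

   FORALL indx IN VALUES OF l_positions
      UPDATE em_memes
         SET meme_status = 'DORMANT'
       WHERE meme_id = l_ids (indx);
END;
/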

(The code in Listing 4 is also in the em_update_status_new.sql file in the download for this article.) Table 3, also available in the Oracle-hosted online article, presents descriptions of the significant lines of code in Listing 4.


So, Is It Faster?

It might be lots of fun to completely reorganize one's program in hopes of improving performance, but we can't just assume that the reorganized code actually runs faster.

The results the extremememe.info team and I found when we put together and ran a test script (em_test.sql in the download for this article) were unambiguous: the new em_update_status runs between two and three times as fast as the original procedure.

Not too shabby.
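
The em_test.sql script itself is in the download; purely as an illustration of the approach, a minimal timing comparison might be structured like the following sketch. It assumes the original version has been installed as em_update_status_old alongside the new em_update_status and that em_incoming is already loaded with test data; that name and setup are assumptions, not part of this article's code.

-- Rough timing sketch (not the actual em_test.sql). Assumes em_update_status_old
-- (the original version) and em_update_status (the rewrite) are both installed
-- and that em_incoming has been populated with test data.
DECLARE
   l_start   PLS_INTEGER;
BEGIN
   l_start := DBMS_UTILITY.get_time;
   em_update_status_old;
   DBMS_OUTPUT.put_line (
      'Old version: ' || (DBMS_UTILITY.get_time - l_start) / 100 || ' seconds');

   ROLLBACK;   -- discard the changes so both runs see the same data

   l_start := DBMS_UTILITY.get_time;
   em_update_status (bulk_collect_limit_in => 100);
   DBMS_OUTPUT.put_line (
      'New version: ' || (DBMS_UTILITY.get_time - l_start) / 100 || ' seconds');

   ROLLBACK;
END;
/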


Faster but More (and More Complex) Code

After completing the rewrite, and verifying the performance, of the new em_update_status procedure, the developer team at extremememe.info and I sat down to review what we’d learned.

The team lead made the first point:

“No matter how much better the performance is, the most important outcome of this process was the identification of the flaw in our original algorithm. I still can’t believe we didn’t realize how unnecessary it was to recalculate the meme status with each new mention.”

Because it’s important to take these learning moments and extend them into the future, I suggested they start up a regular monthly code review meeting. “Pick another program, and walk-and-talk your way through it. In the process of explaining the code, you are sure to identify lots of possibilities for improvement—and maybe another bug or two.” Then we turned our attention to the impact of the bulk processing features on the code.

One big change stood out to everyone: the length of the procedure had increased from 42 to 157 lines. If it was any sort of reassurance, I told the team that this is what everyone experiences. And, as one of the junior programmers on the team pointed out, the new version is not only longer but also much more complicated. The switch from integrated, row-level processing to phased, bulk processing has ripple effects in terms of transaction management and error handling.

Everyone agreed, though, that as long as nested subprograms were used to hide that complexity and improve readability, the performance benefits more than justified the increased cost of maintenance of that code.

A stevenfeuersteinonplsql.blogspot.com blog post expands on this article to take you through the extremememe.info code changes in a step-by-step process.


Take the Challenge

Each PL/SQL article offers a quiz to test your knowledge of the information provided in it. The quiz appears below and also at PL/SQL Challenge (plsqlchallenge.com), a website that offers online quizzes on the PL/SQL language as well as SQL, Oracle Application Express, database design, and deductive logic.

Here is your quiz for this article:

Which of the following statements about BULK COLLECT and FORALL are correct?

a. System global area (SGA) is the only type of memory affected by a switch from row-by-row processing to bulk processing.

b. You can use BULK COLLECT only with nondynamic queries.

c. You can have only one nonquery DML statement inside a FORALL statement.

d. If you want to make sure the SQL engine at least attempts to execute every DML statement generated by FORALL, add the SAVE EXCEPTIONS clause.


Next Steps

DOWNLOAD
 Oracle Database 12c
 scripts for this article

 TEST your PL/SQL knowledge

READ more Feuerstein
 Oracle Magazine PL/SQL Columns
 stevenfeuersteinonplsql.blogspot.com

 READ more about Oracle Database 12c

BULK COLLECT and FORALL

 Bulk Processing with BULK COLLECT and FORALL
 Oracle 10g Adds More to FORALL


