Database, SQL and PL/SQL

On the PGA and Indexing Collections

Best practices for knowing your PGA impact and indexing collections

By Steven Feuerstein

May/June 2008

My DBA wants me to reduce the amount of PGA (program global area) memory I use in my collection-based programs. Isn't it the DBA's job to manage memory, and if it isn't, how am I supposed to know how much PGA memory I am using?

Sure, it is ultimately the responsibility of the database administrator to analyze memory usage and to either make the necessary adjustments in the database configuration or recommend to the developers what should be changed in their code.

Having said that, I strongly encourage all developers to become as aware of and as involved as possible in all aspects of deployment of your code, to ensure the best-possible results for your users.

It is certainly possible, for example, for you to measure the amount of PGA memory your program uses. To do so, however, you will need to go back to your DBA and ask that person to grant SELECT privileges to your schema on the following views:

SYS.V_$SESSION
SYS.V_$SESSTAT
SYS.V_$STATNAME

Once you can query from these views, compile the show_pga_memory procedure, shown in Listing 1 (and included in the show_pga_memory.sp file in the sample code download for this column), in your schema. You can then call show_pga_memory after running your application code to see how large the PGA has become.

Code Listing 1: show_pga_memory procedure

CREATE OR REPLACE PROCEDURE show_pga_memory (
   context_in   IN   VARCHAR2 DEFAULT NULL
)
IS
   l_memory   NUMBER;
BEGIN
    SELECT st.VALUE
       INTO l_memory
      FROM SYS.v_$session se, SYS.v_$sesstat st, SYS.v_$statname nm
    WHERE se.audsid = USERENV ('SESSIONID')
        AND st.statistic# = nm.statistic#
        AND se.SID = st.SID
        AND nm.NAME = 'session pga memory';
   DBMS_OUTPUT.put_line (   CASE
                                            WHEN context_in IS NULL
                                                  THEN NULL
                                            ELSE context_in || ' - '
                                        END
                                    || 'PGA memory used in session = '
                                    || TO_CHAR (l_memory)
                                   );
END show_pga_memory;

When I used this procedure to test BULK COLLECT with different LIMIT clause values, I found that the size of my PGA ranged from 4,613,072 bytes to 46,883,792 bytes.

Indexing Collections

I want to use associative arrays to quickly look up an office product name for a given product number and a product number for a given product name. Product names are unique, and product numbers are integers. I see how I can use the product number as the index value in my collection of names, but I also need to reverse the process and find a product number for a given product name. Can I create another index on a collection's contents?

A collection has only one index, which is defined explicitly in associative array types with the INDEX BY clause (as integer or string) and implicitly in nested table and varray types (as integer only).

So if you need to find the index value (location) at which a given element resides, you will have to do one of the following:

Scan the contents of the collection until you find a match.
Construct a second collection that will emulate (act like) an index on the first or primary collection.

I demonstrate both of these techniques below, based on a set of office product data that is initialized within the package body. Scan until you find it. First, I create a package whose specification includes a collection of product names, as shown in Listing 2.

Code Listing 2: Initial office_products package specification and body

CREATE OR REPLACE PACKAGE office_products
IS
   TYPE names_list_aat IS TABLE OF VARCHAR2 (100)
       INDEX BY PLS_INTEGER;
   list_of_names   names_list_aat;
END office_products;
CREATE OR REPLACE PACKAGE BODY office_products
IS
BEGIN
   list_of_names (1567) := 'Stapler';
   list_of_names (75009) := 'Mousepad';
   list_of_names (101) := 'Pencil sharpener';
END office_products;

Suppose I have the name of the product and want to find its index value so that I can remove that element. Listing 3 contains the product_lookup1 function, which scans the collection until it finds a match on the name.

Code Listing 3: Function for finding name of product

CREATE OR REPLACE FUNCTION product_lookup1 (NAME_IN IN VARCHAR2)
   RETURN PLS_INTEGER
IS
   c_count    PLS_INTEGER := office_products.list_of_names.COUNT;
   l_index     PLS_INTEGER := office_products.list_of_names.FIRST;
   l_return    PLS_INTEGER;
BEGIN
   WHILE (l_index IS NOT NULL)
   LOOP
      IF office_products.list_of_names (l_index) = NAME_IN
      THEN
          l_return := l_index;
          l_index := NULL;
      ELSE
          l_index := office_products.list_of_names.NEXT (l_index);
      END IF;
   END LOOP;
   RETURN l_return;
END product_lookup1;

This is a very simple program, easy to understand and maintain. Yet it may perform inefficiently for large collectionsespecially if I try to locate the index value for a name that is not in the list. To do so, I would have to scan the full contents of the collection. Clearly, I'd rather not have to look through the collection element by element.

Emulating a table column index. A much better approach is to follow the example of Oracle Database. If you want to quickly find a row in a table based on a given column value, you create an index on that column. For this question, if I were working with a relational table, I would create an index on the product name.

I can do something similar for a collection, but to do so, I will have to define a second collection to emulate the index on a table column. For the remainder of this column, I will use the term index collection when referring to the collection I create to emulate an index on a table column.

A collection can have only one index, and my list_of_names collection is already indexed by product number. So I will have to create a second collection and index this one by product name. Listing 4 shows the modified office_products package specification.

Code Listing 4: Modified office_products package specification

CREATE OR REPLACE PACKAGE office_products
IS
   SUBTYPE name_t IS VARCHAR2(100);
   SUBTYPE index_t IS PLS_INTEGER;
   TYPE names_list_aat IS TABLE OF name_t
      INDEX BY index_t;
   TYPE index_list_aat IS TABLE OF index_t
      INDEX BY name_t;
   list_of_names     names_list_aat;
   index_by_name   index_list_aat;
END office_products;

Note, first of all, that I no longer use hard-coded declarations for the types referenced in my collection types. Instead, I have switched to using subtypes.

This approach is completely optional but highly recommended. In other words, I could define my collection types as follows:

TYPE names_list_aat IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
TYPE index_list_aat IS TABLE OF
PLS_INTEGER INDEX BY VARCHAR2(100);

Both approaches will execute in exactly the same way. So why bother with subtypes? Because the resulting code explains itself much more clearly, making it easier to understand, debug, and maintain.

Because I'm using subtypes, my package specification tells mevery explicitly and without the need for commentsthat I have these two collections to work with:

office_products.list_of_names, a list of the product names indexed by the index value (product number)
office_products.index_by_name, a list of the product numbers indexed by the product name

Of course, I still need to populate these collections, so let's look at the package body in Listing 5.

Code Listing 5: Modified office_products package body

CREATE OR REPLACE PACKAGE BODY office_products
IS
   PROCEDURE initialize
   IS
      PROCEDURE add_product (index_in IN index_t, NAME_IN IN name_t)
      IS
      BEGIN
          list_of_names (index_in) := NAME_IN;
          index_by_name (NAME_IN) := index_in;
      END add_product;
   BEGIN
       add_product (1567, 'Stapler');
       add_product (75009, 'Mousepad');
       add_product (101, 'Pencil sharpener');
   END initialize;
BEGIN
   initialize;
END office_products;

I have, first of all, changed the structure of the body, having moved the lines of code used to populate the names collection into its own initialize procedure. I then call that procedure in the initialization section of the package. I find it easier to maintain such code in its subprogram than to have a long, complex block of code at the end of my package. Also, I could now, if necessary, expose this subprogram in the package specification so that it can be reinitialized as needed during a session.

The initialize procedure itself contains a local procedure, add_product, that takes a name and does two things:

Adds that name to the list of names
Updates the index collection to associate that product name (the index value of this string-indexed collection) with the index value of the original list of names

I then call add_product for each row in the products table.

With this emulated index in place, my lookup function now becomes very simple, as shown in Listing 6. This function returns the index value for the specific name, if a row exists for that name. If not, the function simply returns NULL. Because NULL is never allowed as an index value, it is a good indicator for "Sorry, no product with that name."

Code Listing 6: Modified product lookup (product_lookup2) function

CREATE OR REPLACE FUNCTION product_lookup2 (NAME_IN IN office_products.name_t)
   RETURN PLS_INTEGER
IS
   l_index   PLS_INTEGER;
BEGIN
   IF office_products.index_by_name.EXISTS (NAME_IN)
   THEN
       l_index := office_products.index_by_name (NAME_IN);
   END IF;
   RETURN l_index;
END product_lookup2;

This second approach, taking advantage of the emulated index, results in a very simple lookup program, but, of course, the setup code is more complicated. Increased code complexity is justified if I can get a substantial boost in performance. Let's see if that is the case.

Comparing the performance of lookup programs. I want to compare performance for these scenarios:

Scan collection for a name in the collection
Scan collection for a name not in the collection
Index find for a name in the collection
Index find for a name not in the collection

First I change the initialize procedure so that I can load a nontrivial volume of data into the collections (I will no longer populate the collections in the initialization section), as shown in Listing 7.

Code Listing 7: Modified initialize procedure

CREATE OR REPLACE PACKAGE BODY office_products
IS
   PROCEDURE initialize (count_in IN PLS_INTEGER)
   IS
      PROCEDURE add_product (index_in IN index_t, NAME_IN IN name_t)
      IS
      BEGIN
          list_of_names (index_in) := NAME_IN;
          index_by_name (NAME_IN) := index_in;
      END add_product;
   BEGIN
      list_of_names.DELETE;
      index_by_name.DELETE;
      add_product (1567, 'Stapler');
      add_product (75009, 'Mousepad');
      add_product (101, 'Pencil sharpener');
      FOR indx IN 1 .. count_in
      LOOP
          add_product (indx * 1000, 'Product ' || indx);
      END LOOP;
   END initialize;
END office_products;

I then construct a compare_lookups procedure that relies on DBMS_UTILITY.GET_CPU_TIME to help me calculate elapsed CPU time down to the hundredth of a second. Listing 8 includes the code for timing the performance of the collection scan implementation.

Code Listing 8: compare_lookups procedure

CREATE OR REPLACE PROCEDURE compare_lookups (
   coll_count_in   IN   PLS_INTEGER
 , iterations_in   IN   PLS_INTEGER
)
IS
   l_time_before   PLS_INTEGER;
   l_index       PLS_INTEGER;
BEGIN
   office_products.initialize (coll_count_in);
  .
  .
  .
   l_time_before := DBMS_UTILITY.get_cpu_time;
   FOR indx IN 1 .. iterations_in
   LOOP
      l_index := product_lookup1 (
         'Product ' || TO_CHAR (coll_count_in / 2));
   END LOOP;
   DBMS_OUTPUT.put_line (
      '   Scan Find - name exists = '
    || TO_CHAR (DBMS_UTILITY.get_cpu_time - l_time_before));
     .
     .
     .
END compare_lookups;

I then call this compare_lookups procedure with variations in the number of rows in the collection and the number of calls to the lookup programs:

BEGIN
   compare_lookups (100,      1000);
   compare_lookups (1000,    1000);
   compare_lookups (100000, 1000);
   compare_lookups (100000, 10000);
END;

Listing 9 includes the results I obtained (in hundredths of a second) in an Oracle Database 10g Release 2 instance on a laptop with 4GB of memory.

Code Listing 9: Results of compare_lookups procedure

COUNT in collections = 100
Number of times program lookup called = 1000
   Scan Find - name exists = 0
   Scan Find - name does not exist = 2
   Index Find - name exists = 0
   Index Find - name does not exist = 0
COUNT in collections = 1000
Number of times program lookup called = 1000
   Scan Find - name exists = 12
   Scan Find - name does not exist = 11
   Index Find - name exists = 2
   Index Find - name does not exist = 0
COUNT in collections = 100000
Number of times program lookup called = 1000
   Scan Find - name exists = 681
   Scan Find - name does not exist = 1391
   Index Find - name exists = 0
   Index Find - name does not exist = 0
COUNT in collections = 100000
Number of times program lookup called = 10000
   Scan Find - name exists = 6578
   Scan Find - name does not exist = 14043
   Index Find - name exists = 1
   Index Find - name does not exist = 0

From this data, I draw the following conclusions:

For collections with small amounts of data, you can take either approachthe scan find or the index findto finding the index containing a certain element, and performance will be fine.
For collections with hundreds and certainly thousands of elements, you should build an index collection and use that to find indexes for elements. Performance is stable regardless of collection counts.

Finally, as with any performance analysis and conclusions, you should run the script compare_lookups.sql (available in the sample code download for this column) yourself on your version of Oracle Database and confirm the results. Don't forget. . . . Collections offer a very efficient mechanism for looking up data. Retrieving data from a collection cached in the program global area is much faster than doing so from a persistent table's data cached in the system global area.

You must always remember the following, however, when using package-level collections in this way:

The data you are caching in the collection is static while the user is asking for the information. If the source of this data changes (if, for example, it is queried from a table), your cache will be out-of-date.
Your database server must have enough memory (in addition to the SGA allocation) to handle all the PGAs for each session connected to Oracle Database that uses this package-based cache. Each PGA will have its own copy of the cache.

If you are developing your software for Oracle Database 11g, you should take advantage of the function result cache, which I wrote about in the September/October issue of Oracle Magazine ("On the PL/SQL Function Result Cache,"). It offers performance similar to that of package-based caches but is shared across all sessions in the database instance and is automatically invalidated if any changes are made to dependent tables.

Regardless of the approach you take, remember that collections have only a single index but that you can always add your own emulated indexes by populating other collections with the cross-referencing information.

Next Steps

Share

On the PGA and Indexing Collections

Best practices for knowing your PGA impact and indexing collections