Database, SQL and PL/SQL

Taking Up Collections

 

Part 1 in a series that looks at enrichments to PL/SQL in Oracle Database 10g.

By Steven Feuerstein Oracle ACE Director

September/October 2003

 

During its lifetime, Oracle PL/SQL has become faster, easier to use, and richer in its feature set. Oracle Database 10g continues this PL/SQL tradition of speed, ease, and feature growth by offering a useful array of enhancements, including:

  • Dramatically faster excution speed, thanks to transparent performance boosts including a new optimizing compiler, better integrated native compilation, and new datatypes that help out with number-crunching applications.

  • The FORALL statement, made even more flexible and useful. For example, FORALL now supports nonconsecutive indexes.

  • Regular expressions are available in PL/SQL in the form of three new functions ( REGEXP_INSTR, REGEXP_REPLACE , and REGEXP_SUBSTR ) and the REGEXP_LIKE operator for comparisons.

  • Collections, improved to include such things as collection comparison for equality and support for set operations on nested tables.

With Oracle Database 10g, PL/SQL maintains its position as the most efficient, most productive programming language for Oracle Database. Its dramatic uptick in performance and support for IEEE arithmetic and regular expressions now open up entirely new areas of functionality for which PL/SQL will be the language of choice.

This article, the first in a series on PL/SQL in Oracle Database 10g, will explore some of the collection-related enhancements of Oracle Database 10g.


Comparing Collections

When I first started training developers on PL/SQL in the late 1990s, few people were using (or were even aware of) packages, which many (myself included) consider to be the cornerstone of any well-designed PL/SQL application. Today, packages are widely used. Now, the new frontier of critical but underutilized PL/SQL features seems to be the use of collections.

Collections are Oracle's version of arrays; collections are single-dimensioned lists. Back in Oracle7, collections (then known as "PL/SQL tables") were handy but limited in functionality and performance. Each subsequent release of Oracle, however, brought improvement to collections. With Oracle Database 10g, these data structures are powerful, fast, and a must-use for virtually any complex PL/SQL application project.

One key enhancement to collections in Oracle Database 10g is the ability to compare the contents of two collections for equality (and inequality as well). Prior to Oracle Database 10g, you could compare two collections, but you had to write a function to do it. There are several complicated factors to consider when writing such a program, including:

  • You must write a separate program for each type of collection you are using. Even if both collections hold the same type of data, unless they are defined based on the exact same TYPE , you will need different functions for the compare.

  • A row-by-row comparison of the contents of the table is required, which means that you must do a "full collection scan." The code to do this is not very complicated. But it is tedious and error-prone, especially if you are comparing a collection of complex datatypes, such as records, objects, or XMLTypes.

  • You have to decide what to do about NULLs. Are two rows equal if they both contain NULLs? According to Oracle: "They are neither equal nor unequal," but you might decide differently, and you have to write the code to handle that.

The consequence of all this complexity is that you don't often write such a program, and you might even work around the need for it in your application.

In the Oracle Database 10g distribution, the collcompare.sql file contains an example of such a program for a collection of records based on the employee table.

Suppose that I installed the emp_ coll_pkg.equal function in the collcompare.sql script. I could use it as follows:

DECLARE
  dbas emp_coll_pkg.employee_tt;
  developers emp_coll_pkg.employee_tt;
BEGIN
  populate_lists (dbas, developers);
  IF emp_coll_pkg.equal (dbas, developers)
  THEN
    DBMS_OUTPUT.PUT_LINE (
       'Likely a very small IT organization!');
  END IF;
END;

That's readable, concise code. (If only you didn't have to go through the pain of writing the equal function!) To make your lives easier, however, Oracle Database 10g now allows you to perform "native" compares of two nested tables. In other words, you can avoid writing any collection-specific comparison logic and get right to the point, as shown below:

DECLARE
  dbas emp_coll_pkg.employee_tt;
  developers emp_coll_pkg.employee_tt;
BEGIN
  populate_lists (dbas, developers);
  IF dbas = developers
  THEN
     DBMS_OUTPUT.PUT_LINE (
        'Likely a very small IT organization!');
END IF;
END;

At this time, collection compare is available only for nested tables. In other words, you cannot directly compare the contents of two associative arrays (formerly known as "index by tables") or varying arrays. Let's hope that in the next release of Oracle Database, this compare functionality is extended to associative arrays and varying arrays.


Set Theory and Multiset Union

The SQL language has long offered the ability to apply set operations ( UNION, INTERSECT , and MINUS ) to the result sets of queries. In Oracle Database 10g, you can now use those same high-level, very powerful operators against nested tables (and only nested tables) in your PL/SQL programs and on nested tables declared as columns inside relational tables.

Let's take a look at some of the syntax needed to do this, starting with UNION .

First, I create a schema-level nested table type:

CREATE OR REPLACE TYPE strings_nt
     IS TABLE OF VARCHAR2(100);
/

Then I define a package and within it create and populate two nested tables of this type, each containing some of my father's and my favorite things:

CREATE OR REPLACE PACKAGE favorites_pkg
  IS
    my_favorites    strings_nt
   := strings_nt ('CHOCOLATE'
               , 'BRUSSEL SPROUTS'
               , 'SPIDER ROLL'
                 );
 dad_favorites   strings_nt
   := strings_nt ('PICKLED HERRING
                 , 'POTATOES'
               , 'PASTRAMI'
               , 'CHOCOLATE'
                 );
 PROCEDURE show_favorites (
    title_in   IN   VARCHAR2
  , favs_in    IN   strings_nt
   );
END;
/

In this package, I also include a procedure to show the contents of a strings_nt nested table. This will come in very handy shortly.

By defining these collections in a package, outside any program, they persist (they maintain their state and values) for the duration of my session or until I change or delete them. This means that I can now write programs outside the package to manipulate the contents of those collections.

Note that this package has been simplified for the purposes of presenting collection functionality. In a production application, you should always take care to "hide" your package data, as with these collections, in the package body, and then provide procedures and functions to manage the data.

Suppose, for example, that I would like to combine these two collections into a single collection of our favorites. Prior to Oracle Database 10g, I would have to write a loop that transfers the contents of one collection to another. Now, I can rely on the MULTISET UNION operator, as shown below:

DECLARE
  our_favorites
     strings_nt := strings_nt ();
BEGIN
  our_favorites :=
     favorites_pkg.my_favorites
       MULTISET UNION
     favorites_pkg.dad_favorites;
   favorites_pkg.show_favorites (
     'ME then DAD', our_favorites);
END;
/

The output from this script is:

ME then DAD
1 = CHOCOLATE
2 = BRUSSEL SPROUTS
3 = SPIDER ROLL
4 = PICKLED HERRING
5 = POTATOES
6 = PASTRAMI
7 = CHOCOLATE

As you can see, the values from both nested tables have been combined. And you can see right away that the MULTISET UNION operator is different from the SQL UNION operator (and, in fact, identical to SQL's UNION ALL operator). When you UNION two SELECT result sets, the SQL engine automatically generates a unique set of ordered results. In other words, if my two nested tables were queries, the UNION would produce this result set:

BRUSSEL SPROUTS
CHOCOLATE
PASTRAMI
PICKLED HERRING
POTATOES
SPIDER ROLL

The data is in alphabetical order, and CHOCOLATE appears just once. Why the different results? Because a nested table is a multiset, defined at mathworld.wolfram.com/Multiset.html as follows:

"A set-like object in which order is ignored, but multiplicity is explicitly significant; therefore, multisets {1, 2, 3} and {2, 1, 3} are equivalent, but {1, 1, 2, 3} and {1, 2, 3} differ."

Oracle documentation states that the difference between nested tables and varying arrays is that the data stored in a nested table column does not preserve its order, while that order is preserved in a varying array. Prior to Oracle Database 10g, this distinction did not mean much in the PL/SQL world. Now, with the set operators, the special characteristics of a multiset, or nested table, show themselves quite clearly.

To better see that nested table data is not ordered and that the MULTISET UNION does not apply an order to the resulting nested table, consider this block:

DECLARE
  our_favorites
     strings_nt := strings_nt ();
BEGIN
  our_favorites :=
     favorites_pkg.dad_favorites
       MULTISET UNION
     favorites_pkg.my_favorites;
 favorites_pkg.show_favorites (
    'DAD then ME', our_favorites);
END;
/

The only change from the previous block is that I have switched the order of the nested tables in the MULTISET UNION operation. The results then change to:

DAD then ME
1 = PICKLED HERRING
2 = POTATOES
3 = PASTRAMI
4 = CHOCOLATE
5 = CHOCOLATE
6 = BRUSSEL SPROUTS
7 = SPIDER ROLL

Suppose you don't want repeats in your unioned nested table. In that case, you will use the MULTISET UNION DISTINCT flavor to modify the UNION operation from the previous example, as follows:

favorites_pkg.dad_favorites
  MULTISET UNION DISTINCT
favorites_pkg.my_favorites;

This would change the output to:

DAD then ME
1 = PICKLED HERRING
2 = POTATOES
3 = PASTRAMI
4 = CHOCOLATE
5 = BRUSSEL SPROUTS
6 = SPIDER ROLL

No more repetition of chocolate. (Grievous loss!)

From these various tests, you can draw some conclusions about UNION :

  • You must use the keyword MULTISET before UNION when performing this operation with nested tables.

  • Unlike the SQL UNION set operator, the collection MULTISET UNION operator does not eliminate duplicates, unless you specify the DISTINCT clause.

  • Unlike the SQL UNION set operator, the collection MULTISET UNION operator does not reorder the elements in the resulting collection. MULTISET UNION preserves the order in each collection and simply appends the contents of the second after the first.

But collections support in Oracle Database 10g offers more than just a UNION operator.


Finding Common Ground

Suppose you want to identify all the common elements in two collections. In SQL, you would use the INTERSECT operator on two relational tables. For PL/SQL in Oracle Database 10g with nested tables, you use the MULTISET INTERSECT operator against collections. Of course, if you have defined columns in relational tables that are nested tables, you can apply MULTISET INTERSECT to those collections as well.

I can easily, for example, determine all the favorite things I share with my dad with the following lines of code:

DECLARE
  our_favorites
     strings_nt := strings_nt ();
BEGIN
   our_favorites :=
      favorites_pkg.my_favorites
         MULTISET INTERSECT
      favorites_pkg.dad_favorites;
favorites_pkg.show_favorites (
  'IN COMMON:', our_favorites);
END;
/

With this output as the result:

IN COMMON:
1 = CHOCOLATE

What's Mine Is Mine

If you want to determine the opposite of commonality, there is MULTISET EXCEPT (which is analogous to the SQL MINUS operator). With MULTISET EXCEPT , you can quickly identify those rows that are in one nested table but not found in another.

Suppose my father wants to determine which of his favorites are unique to him (favorites that I do not share). He can do this as follows:

DECLARE
  our_favorites
     strings_nt := strings_nt ();
BEGIN
  our_favorites :=
     favorites_pkg.dad_favorites
         MULTISET EXCEPT
     favorites_pkg.my_favorites;
  favorites_pkg.show_favorites (
     'ONLY DAD'S:', our_favorites);
END;
/

With this output as the result:

ONLY DAD'S:
1 = HERRING
2 = POTATOES
3 = PASTRAMI

When Uniqueness Counts

The final MULTISET function to introduce is SET . SET (without a MULTISET as prefix) offers the ability to remove duplicate entries from your nested table. In this way, it is similar to the SQL DISTINCT aggregate function.

Suppose I liked chocolate so much that I entered it multiple times in my nested table:

CREATE OR REPLACE PACKAGE favorites_pkg
IS
  my_favorites    strings_nt
    := strings_nt ('CHOCOLATE'
                , 'BRUSSEL SPROUTS'
                , 'SPIDER ROLL'
                , 'CHOCOLATE'
                 );
END favorites_pkg;

If I want to see only the unique values, I simply call the SET function:

DECLARE
  keep_it_simple
    strings_nt := strings_nt ();
BEGIN
  keep_it_simple :=
    SET (favorites_pkg.my_favorites);
  favorites_pkg.show_favorites (
    'DISTINCT SET', keep_it_simple);
END;

With this result:

DISTINCT SET
1 = CHOCOLATE
2 = BRUSSEL SPROUTS
3 = SPIDER ROLL

I can also invoke the SET function in line wherever the nested table itself can be referenced, as shown here:

favorites_pkg.show_favorites (
   'DISTINCT SET',
   SET (favorites_pkg.my_favorites));    

Interesting Applications of New Features

The ease of programming and efficiency of collection manipulation in PL/SQL provides an opportunity to think creatively about how to solve problems.

Consider the issue of how to determine whether two relational tables are equal. Assuming they have the same structure, how can you determine whether they contain the same information? As usual, it is not all that difficult to come up with a logical solution in SQL. The following use of MINUS and UNION accomplishes much of what is needed:

SELECT COUNT (*)
  FROM ((SELECT *
FROM table1
         MINUS
         SELECT *
FROM table2)
        UNION
        (SELECT *
        FROM table2
         MINUS
         SELECT *
          FROM table1));

When this query returns 0 rows, the tables must have the same contents. It hardly looks efficient, however. Now consider the alternative with nested tables in Oracle Database 10g.

First, declare a nested table TYPE in PL/SQL that mimics the structure of the table:

DECLARE
   TYPE table1_tt IS TABLE OF table1%ROWTYPE;
   nt_copy1 table1_tt;
   nt_copy2 table1_tt;

Then use BULK COLLECT to load up these collections with all the data from the table in a most efficient manner:

   PROCEDURE load1 (e IN OUT table1_tt)
   IS
   BEGIN
      SELECT *
      BULK COLLECT INTO e
        FROM table1;
   END;

And now check for table equality simply with this code:

BEGIN
   load1 (nt_copy1);
   load2 (nt_copy2);
   IF nt_copy1 = nt_copy2 THEN ...

That's certainly more straightforward, but what about performance? I used DBMS_UTILITY.GET_TIME in the Oracle Database 10g 10i_tab_compare_with_coll.sql file to calculate elapsed time for these two methods. Executing each approach 1,000 times, I got these results:

  • Nested table compare: .05 seconds

  • MINUS-UNION compare: 5.14 seconds

That's two orders of magnitude improvement in performance! Now, to be sure, there are trade-offs. By relying on nested tables, you are using up memory in the program global area of each session running the code. If you compare the contents of two large tables, you could be consuming a very large amount of memory (in addition to that already consumed by the system global area).
Rethinking the Use of Nested Tables

Oracle first introduced nested tables and varying arrays in Oracle8i, as part of the object-relational model. By that time, I was already comfortable with index-by tables and saw no reason to switch to these new types. I certainly found little occasion to de-normalize my data by defining a column in a relational table as a nested table or VARRAY . And there was no need to explicitly initialize an index-by table or EXTEND before adding a row.

With the fascinating new collection features in Oracle Database 10g, however, it is time to rethink my preferences. You can use a nested table much as you would an associative array (the new name, as of Oracle9i, for index-by tables) with a numeric index. Initializing with a constructor and explicitly extending, while involving slightly more effort, follows accepted standards in the object-oriented world.

The collection-level compare, all by itself, is a convincing argument to use nested tables. The set operators are icing on the cake and could save you many lines of coding (and many hours of testing), depending on application requirements.

Next Steps

READ
more on Oracle Database 10g
oracle.com/database

more Feuerstein articles
oracle.com/technetwork
 oreillynet.com

Feuerstein books
 oreilly.oracle.com

 

DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.