Database, SQL & PLSQL

Streaming Table Functions

Transform your data on the way to your data warehouse.

September/October 2018

A table function is a function you can use like a table in the FROM clause of a SELECT statement. A common usage of table functions is to stream data directly from one process or transformation to the next process without intermediate staging, and a table function used in this way is called a streaming table function. As you might be able to tell from the reference to transformation above, this technique is most often used in data warehouses as part of extract, transform, and load (ETL) operations.

My previous article, “When Is a Function Like a Table? When It’s a Table Function!,” includes an overview of table functions. In this article, I will show you the basic steps needed to create a streaming table function, but before diving into the details, let’s look at a streaming table function example:

INSERT INTO tickers
   SELECT *
     FROM TABLE (doubled (CURSOR (SELECT * FROM stocks)))

What’s going on here? Let’s take it step by step from the inside out:

Code	Description
`SELECT * FROM stocks`	Gets all the rows from the stocks table
`CURSOR ()`	Creates a cursor variable with the CURSOR expression that points to the result set
`()`	Passes that cursor variable to the doubled table function
`doubled ()`	Performs its transformation and returns a nested table of object type instances
`SELECT * FROM TABLE(...)`	Converts the collection into a relational set of rows and columns
`INSERT INTO tickers`	Inserts those rows into the tickers table

Sometimes (often?) you will need to perform more than one transformation as part of the streaming process. No problem. You can certainly string together multiple invocations of table functions:

INSERT INTO tickers
SELECT *
  FROM TABLE (transform2 (
                CURSOR (SELECT *
                          FROM TABLE (transform1 (
                                        CURSOR (SELECT * FROM stocks
       ))))))

Setting Up Tables for Transformation

To transform data from one table to another, you need tables and the data in those tables. In this article, I will start with the stocks table, each row of which contains the opening and closing trading prices for each stock ticker symbol:

CREATE TABLE stocks (
ticker VARCHAR2 (20),
trade_date DATE,
opening_price NUMBER,
closing_price NUMBER)
/

My transformation is simple: for each row in the stocks table, generate two rows for the tickers table (one row each for the opening and closing prices):

CREATE TABLE tickers
(
   ticker      VARCHAR2 (20),
   pricedate   DATE,
   pricetype   VARCHAR2 (1),
   price       NUMBER
)
/

Before continuing, I feel obligated to point out that for this particular transformation (one row in stocks to two rows in tickers), you don’t need a table function to get the job done. For example, you can use INSERT ALL to insert into tickers twice:

INSERT ALL
  INTO tickers (ticker,
                pricedate,
                pricetype,
                price)
VALUES (ticker,
        trade_date,
        ‹O›,
        opening_price)
  INTO tickers (ticker,
                pricedate,
                pricetype,
                price)
VALUES (ticker,
        trade_date,
        ‹C›,
        closing_price)
   SELECT * FROM stocks
/

You could also use UNPIVOT (thanks, Chris Saxon @chrisrsaxon, for making me aware of this technique!):

INSERT INTO tickers (ticker,
                     pricedate,
                     pricetype,
                     price)
   SELECT *
     FROM stocks UNPIVOT (price
                 FOR price_type
                 IN (opening_price AS 'O', closing_price AS 'C'))
/

SQL is an extraordinarily powerful language. It is quite likely that the transformation you would like to perform is doable in pure SQL. And if you can avoid the use of a table function, implementing your requirement in SQL instead, then you should by all means do so. To strengthen your SQL skills, take advantage of workouts, classes, and quizzes at the Oracle Dev Gym, especially the SQL analytics class by Connor McDonald.

If, however, the transformation requires the use of procedural logic (hence, PL/SQL) or if you can’t sort through the SQL syntax, table functions offer a powerful, straightforward way to get the job done.

For the purposes of this article, assume that the transformation is much more complex and requires use of a table function.

Types and Package for Table Functions

As shown in my previous article on table functions, when you need a table function to return more than one piece of data in each collection element (for example, more than just a list of strings or numbers), you need to create an object type and a collection of those object types.

In this article’s example, I want to move stock data from the stocks table to the tickers table, so I need an object type that “looks like” the tickers table.

Ideally, I would create a collection type like this:

CREATE TYPE tickers_nt AS TABLE OF tickers%ROWTYPE;
/

But %ROWTYPE is a PL/SQL declaration attribute and is not known to the SQL engine, so this statement fails with

PLS-00329: schema-level type has illegal reference to TICKERS

Instead, I create an object type that mimics the structure of my table, as follows:

CREATE TYPE ticker_ot
   AUTHID DEFINER IS OBJECT
(
   ticker VARCHAR2 (20),
   pricedate DATE,
   pricetype VARCHAR2 (1),
   price NUMBER
);
/

I then create a nested table of those object types:

CREATE TYPE tickers_nt AS TABLE OF ticker_ot;
/

I plan to use this table function in a streaming process. This means that I will be passing in a set of data (rows and columns) from SQL. To do this, I will also need to define a strong REF CURSOR type that will be used as the data type of the parameter accepting the dataset inside the SQL statement. In the package specification below, I create two REF CURSOR types, one for rows from the stocks table and another for the tickers table.

CREATE OR REPLACE PACKAGE stock_mgr AUTHID DEFINER
IS
   TYPE stocks_rc IS REF CURSOR
      RETURN stocks%ROWTYPE;

   TYPE tickers_rc IS REF CURSOR
      RETURN tickers%ROWTYPE;
END stock_mgr;
/

The variable you declare based on a REF CURSOR type is a cursor variable. Within PL/SQL, you might write code like this:

DECLARE
   l_cursor   stock_mgr.stocks_rc;
   l_stock    stocks%ROWTYPE;
BEGIN
   /* With a static SQL statement */
   OPEN l_cursor FOR SELECT * FROM stocks;

   LOOP
      FETCH l_cursor INTO l_stock;

      EXIT WHEN l_cursor%NOTFOUND;
   END LOOP;

   CLOSE l_cursor;

   /* Or with a dynamic SQL statement */
   OPEN l_cursor FOR 'select * from stocks';

   LOOP
      FETCH l_cursor INTO l_stock;

      EXIT WHEN l_cursor%NOTFOUND;
   END LOOP;

   CLOSE l_cursor;
END;
/

Note that you can use all the usual cursor attributes and operations on cursor variables: FETCH, %FOUND, CLOSE, and so on.

As you will see later, the way you use this REF CURSOR type will be a bit different for streaming table functions in the SQL context.

Define the Table Function

The main distinction between streaming table functions and “normal” table functions such as those addressed in the previous article is that at least one parameter to that streaming table function is a cursor variable. The table function could have more than one cursor variable input and other parameters of other types such as a string or date. In this article, I will stick with the minimum: a single cursor variable parameter.

Generally, the flow within a streaming table function is

Fetch a row from the cursor variable.
Apply the transformation to each row.
Put the transformed data into the collection.
Return the collection when done.

Now let’s see how this pattern unfolds in my doubled function—one stocks row doubled to two ticker rows—and the line descriptions that follow.

 CREATE OR REPLACE FUNCTION doubled (rows_in stock_mgr.stocks_rc)
   RETURN tickers_nt
   AUTHID DEFINER
IS
   TYPE stocks_aat IS TABLE OF stocks%ROWTYPE INDEX BY PLS_INTEGER;
   l_stocks    stocks_aat;

   l_doubled   tickers_nt := tickers_nt ();
BEGIN
   LOOP
      FETCH rows_in BULK COLLECT INTO l_stocks LIMIT 100;
      EXIT WHEN l_stocks.COUNT = 0;

      FOR l_row IN 1 .. l_stocks.COUNT
      LOOP
         l_doubled.EXTEND;
         l_doubled (l_doubled.LAST) :=
            ticker_ot (l_stocks (l_row).ticker,
                       l_stocks (l_row).trade_date,
                       ‹O›,
                       l_stocks (l_row).opening_price);

         l_doubled.EXTEND;
         l_doubled (l_doubled.LAST) :=
            ticker_ot (l_stocks (l_row).ticker,
                       l_stocks (l_row).trade_date,
                       ‹C›,
                       l_stocks (l_row).closing_price);
      END LOOP;
   END LOOP;
   CLOSE rows_in;

   RETURN l_doubled;
END;

Line(s)	Description
1	Use the REF CURSOR type defined in the package for the rows passed in. Because I am selecting from the stocks table, I use the stocks_rc type.
2	Return an array, each of whose elements looks just like a row in the tickers table.
5, 6	Declare an associative array to hold rows fetched from the rows_in cursor variable.
8	Declare the local variable to be filled and then returned back to the SELECT statement.
10	Start up a simple loop to fetch rows from the cursor variable. It’s already open—the CURSOR expression takes care of that.
11, 12	Use the BULK COLLECT feature to retrieve as many as 100 rows with each fetch. I do this to avoid row-by-row processing, which is not efficient enough. Exit the loop when the associative array is empty.
14	For each element in the array (row from the cursor variable)...
16–21	Use EXTEND to add another element at the end of the nested table, then call the object type constructor to create the first (“opening”) of the two rows for tickers table, and put the element into the new last index value of the collection.
23–28	Do the same for the second (“closing”) row of the tickers table.
31	Close the cursor variable, now that all the rows have been fetched. Note: this step is optional. When you use a CURSOR expression to pass in the result set, the cursor will be closed automatically when the function terminates.
33	Send the nested table back to the SELECT statement for streaming.

Regarding FETCH-BULK COLLECT-LIMIT

I used a value of 100 for the LIMIT clause. That’s a decent default value—it’s the number of rows retrieved by cursor FOR loops with each fetch. But if you are processing an extremely large number of rows and want to squeeze better performance out of your function, you might try a larger LIMIT value. Note, however, that this will consume more Process Global Area (PGA) memory and that, at some point, your code will slow down due to excessive memory consumption.

You should also pass the LIMIT value as a parameter to give you the ability to modify the performance profile without recompiling your function, as in

CREATE OR REPLACE FUNCTION doubled (
   rows_in stock_mgr.stocks_rc, limit_in IN INTEGER DEFAULT 100)
...
BEGIN
   LOOP
      FETCH rows_in BULK COLLECT INTO l_stocks LIMIT limit_in;

OK, let’s do some streaming. Assume that I previously loaded the stocks table with 1,000 rows.

SELECT COUNT (*) c FROM tickers
/

C
————
0

INSERT INTO tickers
   SELECT * FROM TABLE (doubled (CURSOR (SELECT * FROM stocks)))
/

2000 row(s) inserted.

SELECT COUNT (*) FROM tickers
/

C
————
2000

     SELECT *
       FROM tickers
FETCH FIRST 10 ROWS ONLY
/

TICKER  PRICEDATE  PRICETYPE  PRICE
STK214  06-JUL-18  C          229
STK215  06-JUL-18  O          215
STK215  06-JUL-18  C          230
STK216  06-JUL-18  O          216
STK216  06-JUL-18  C          231
STK217  06-JUL-18  O          217
STK217  06-JUL-18  C          232
STK218  06-JUL-18  O          218
STK218  06-JUL-18  C          233
STK219  06-JUL-18  O          219

A Two-Step Transformation

In my two-step transformation, I will create a function that returns a nested table of elements that matches the stocks table. So I will need an object type and a nested table type.

CREATE OR REPLACE TYPE stock_ot
   AUTHID DEFINER IS OBJECT
(
   ticker VARCHAR2 (20),
   trade_date DATE,
   opening_price NUMBER,
   closing_price NUMBER
)
/

CREATE OR REPLACE TYPE stocks_nt AS TABLE OF stock_ot;
/

Now I create a table function that accepts a cursor variable, each of whose rows contains ticker data and returns a nested table, each of whose elements looks like a row in the stocks table. Because it follows precisely the same pattern as the doubled function, I will not describe the individual lines.

CREATE OR REPLACE FUNCTION singled (
tickers_in IN stock_mgr.tickers_rc)
   RETURN stocks_nt
   AUTHID DEFINER
IS
   TYPE tickers_aat IS TABLE OF tickers%ROWTYPE
      INDEX BY PLS_INTEGER;

   l_tickers   tickers_aat;
   l_singles   stocks_nt := stocks_nt ();
BEGIN
   LOOP
      FETCH tickers_in BULK COLLECT INTO l_tickers LIMIT 100;

      EXIT WHEN l_tickers.COUNT = 0;

      FOR indx IN 1 .. l_tickers.COUNT
      LOOP
         l_singles.EXTEND;
         l_singles (l_singles.LAST) :=
            stock_ot (l_tickers (indx).ticker,
                      l_tickers (indx).pricedate,
                      l_tickers (indx).price,
                      l_tickers (indx).price * .5);
      END LOOP;
   END LOOP;

   RETURN l_singles;
END;
/

Now let’s do a two-step transformation: stocks -> tickers -> more_stocks!

CREATE TABLE more_stocks

AS
  SELECT *
   FROM TABLE (
         singled (
           CURSOR (
             SELECT *
               FROM TABLE (doubled (CURSOR (SELECT * FROM stocks))))))
/

SELECT COUNT (*) FROM more_stocks
/

Summary

Streaming table functions play a crucial role in data warehouse ETL operations. Oracle Database makes building such functions easy, through its implementation of PL/SQL cursor variables and the CURSOR expression.

Remember that the collection constructed and returned by a streaming table function will consume PGA memory, so very large datasets passed into the function via the cursor variable may result in errors.

What can you do about that? Make that streaming table function a pipelined streaming table function, which I’ll cover in the next article in this series.

Next Steps

Share

Streaming Table Functions

Transform your data on the way to your data warehouse.