By Chris Saxon
August 23, 2019
You slip into your Monday morning team meeting just in time to hear your boss’s next “brilliant” idea: You are going to build a generic comma-separated values (CSV) parser! It must be able to take any CSV file as input and split out the results using SQL. Each record and field in the file will become a new row and column in the output. And you need to build the parser by this Friday!
Your mind starts racing. You’ll have to parse the file to read the header, see how many fields it returns, and then construct a SQL query with the same number of columns in the SELECT statement.
Sounds like you’ll need some dynamic SQL. This could be a very long week.
Luckily you’re using Oracle Database 18c, so there's another option available. It’s one of the coolest new SQL extensions: polymorphic table functions (PTFs).
Oracle Database has included table functions for a long time. You can use them to generate rows in your result set.
But what if you want to change the columns at runtime based on input parameters? You have to fall back on some messy dynamic SQL or arcane custom aggregate functions.
PTFs offer a better solution. With PTFs you can, at runtime, add or remove columns from a table’s result set, based on values you supply.
Let’s find out how.
A PTF includes two key components:
A package, which defines the columns that will (or won’t) appear in the output and row values for any new columns
The function you’ll call to run the PTF, which can be a standalone function or a function nested within the package
The package contains the PTF’s implementation. It also includes two key parts:
A describe function tells the database which columns to include in the output.
A fetch_rows procedure assigns values to each row for these new columns.
Let’s start with a simple example. I’ll pass a list of columns to remove from the result set and another list of columns to add to the output.
If you want to follow along, you can find the code to build and run this polymorphic table function on Oracle Live SQL.
The following code creates a standalone PTF:
create or replace function hide_existing_add_new_cols (
  tab       table,
  add_cols  columns,
  hide_cols columns
) return table
pipelined row polymorphic
using hide_and_add_cols_pkg;
/
This code includes some new syntax:
tab table specifies which table the PTF operates on. Every PTF must have exactly one table parameter.
add_cols columns and hide_cols columns list the column identifiers to add and remove, respectively, from the result set. These parameters use the new pseudo-operator, columns. It allows you to pass one or more column identifiers to the PTF.
row polymorphic means the PTF processes rows independently. You can also define table semantics PTFs, which allow you to operate on a set of rows. As you’ll see later, table semantics also enable powerful extensions when you’re executing your functions.
using hide_and_add_cols_pkg binds the PTF to the hide_and_add_cols_pkg package.
As you can see in the last line of the function’s code, to implement this PTF, I need to create the hide_and_add_cols_pkg package.
The following is the header for this package:
create or replace package hide_and_add_cols_pkg as

  function describe (
    tab       in out dbms_tf.table_t,
    add_cols  dbms_tf.columns_t,
    hide_cols dbms_tf.columns_t
  ) return dbms_tf.describe_t;

  procedure fetch_rows;

end hide_and_add_cols_pkg;
/
The describe function must have the same parameters as the PTF itself, with one subtle difference. You must replace the TABLE argument with the DBMS_TF.table_t type and any COLUMNS arguments with DBMS_TF.columns_t. The describe function must also have a return type of DBMS_TF.describe_t.
The fetch_rows procedure must also have the same parameters as the PTF, minus any TABLE or COLUMNS arguments. In this example, the PTF includes only arguments of type TABLE or COLUMNS, so fetch_rows has no parameters.
So what do you do inside these program units?
The describe function defines the shape of the result set. That is, it returns the names and data types of the columns to display in the query results. To get hide_existing_add_new_cols to do its job, there are two things you need to do:
Hide any existing columns listed on the columns-to-suppress list
Add the list of new columns
You can do both with the following describe function in the body of the hide_and_add_cols_pkg package:
function describe (
  tab       in out dbms_tf.table_t,
  add_cols  dbms_tf.columns_t,
  hide_cols dbms_tf.columns_t
) return dbms_tf.describe_t as
  new_cols dbms_tf.columns_new_t;
  col_list dbms_tf.columns_t := add_cols;
begin

  for i in 1 .. tab.column.count loop
    if tab.column(i).description.name member of hide_cols then
      tab.column(i).for_read     := false;
      tab.column(i).pass_through := false;
    end if;
  end loop;

  for i in 1 .. col_list.count loop
    new_cols(i) := dbms_tf.column_metadata_t (
      name => col_list(i),
      type => dbms_tf.type_number
    );
  end loop;

  return dbms_tf.describe_t (
    new_columns => new_cols
  );

end describe;
The first loop in the describe function removes any of the table’s columns that appear in the hide_cols list. It does this by setting the PASS_THROUGH and FOR_READ properties of those columns.
The properties work as follows:
PASS_THROUGH determines whether a column appears in the output.
FOR_READ determines whether the fetch_rows procedure can reference the column’s values.
You can set these values independently. So, for example, you can set a column’s PASS_THROUGH property to TRUE and its FOR_READ property to FALSE, and vice versa. This PTF removes columns from the results, and it doesn’t need those columns’ values to define new columns, so it sets both properties to FALSE.
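As an example of setting the properties independently, this describe-function fragment hides a column from the output while keeping its values readable inside fetch_rows. (The column name SALARY is hypothetical; note that DBMS_TF stores column names as quoted identifiers.)

```sql
-- Sketch (inside a describe function): hide a column from the
-- query results but keep its values readable in fetch_rows.
for i in 1 .. tab.column.count loop
  if tab.column(i).description.name = '"SALARY"' then  -- hypothetical column
    tab.column(i).pass_through := false;  -- do not show in the output
    tab.column(i).for_read     := true;   -- values still readable in fetch_rows
  end if;
end loop;
```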
Note that the table parameter includes an array of its columns, so you can loop through these, as the first loop shows. Inside the loop, the code checks whether the name of each existing column matches a column in the list to exclude. If a name matches, the code sets that column’s PASS_THROUGH and FOR_READ properties to FALSE.
Next, the second loop defines the new columns. The code loops through col_list (a copy of add_cols), and for each value, it uses DBMS_TF.column_metadata_t to define a new column of type NUMBER. Finally, the return statement hands the new column array back to the database. This adds the new columns to the query’s output.
Now that I have defined the new columns, it’s time to set their values.
To keep it simple, the fetch_rows procedure will assign a value to each new column according to the following formula:
ROW NUMBER * NEW COLUMN POSITION
So for the first column, the values for rows 1, 2, and 3 will be 1, 2, and 3, respectively. The second column will have the values 2, 4, and 6; the third column will have the values 3, 6, and 9; and so on.
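As a sanity check, you can generate the same target value matrix for three rows and three new columns with plain SQL; each cell is row number * new-column position:

```sql
-- Plain SQL producing the value matrix the PTF will generate:
-- c1 = row * 1, c2 = row * 2, c3 = row * 3
select level     row_num,
       level * 1 c1,
       level * 2 c2,
       level * 3 c3
from   dual
connect by level <= 3;

-- ROW_NUM   C1   C2   C3
--       1    1    2    3
--       2    2    4    6
--       3    3    6    9
```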
To set these values, I need to loop through the rows and new columns in the result set. I can access these by fetching the current environment, which returns a record that includes two column arrays: a GET_COLUMNS list and a PUT_COLUMNS list.
The columns in the GET_COLUMNS list are the existing columns in the table for which the FOR_READ property is set to TRUE. The columns in the PUT_COLUMNS list are the new columns I defined in the describe function.
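This example never reads existing column values, but a fetch_rows implementation that did could fetch a GET column’s values for the current batch with DBMS_TF.get_col. A minimal sketch, assuming the first FOR_READ column is a NUMBER column:

```sql
-- Sketch: read the current batch's values for the first GET column.
procedure fetch_rows as
  env  dbms_tf.env_t;
  vals dbms_tf.tab_number_t;
begin
  env := dbms_tf.get_env();

  if env.get_columns.count > 0 then
    dbms_tf.get_col ( 1, vals );  -- position in the GET_COLUMNS list
  end if;

  -- ... use vals to compute the new column values ...
end fetch_rows;
```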
The following code sets the new column values:
procedure fetch_rows as
  env      dbms_tf.env_t;
  col      dbms_tf.tab_number_t;
  last_row pls_integer := 0;
begin

  env := dbms_tf.get_env();

  for cols in 1 .. env.put_columns.count loop

    dbms_tf.xstore_get ( 'col' || cols, last_row );

    for rws in 1 .. env.row_count loop
      col ( rws ) := ( rws + last_row ) * cols;
    end loop;

    dbms_tf.put_col ( cols, col );

    dbms_tf.xstore_set ( 'col' || cols, last_row + env.row_count );

  end loop;

end fetch_rows;
The first statement gets the current execution state of the PTF by calling DBMS_TF.get_env(). The env variable holds the current environment.
I can now use this to loop through the PUT columns, those added to the result set, and define a value for each row. The outer loop iterates through these columns. The inner loop walks through the current rows, building up an array of values. To return these to the database, the code calls DBMS_TF.put_col. This assigns the values in the array to the new column in position N. The column positions match the order in which the columns were added in the describe function.
Now you might be wondering: what’s the purpose of the DBMS_TF.xstore* calls?
PTFs process rows in batches, and each new batch runs fetch_rows again. So one call to a PTF may execute fetch_rows many times.
Each call to DBMS_TF.get_env returns the number of rows in the current batch, not the total number of rows processed so far. If the row loop simply set the column value to ROW NUMBER * COLUMN NUMBER, a PTF processing many rows would reset the counter partway through your results. It’s unlikely you want this to happen.
To avoid this, you can save state in the execution store. This is a series of key-value pairs that you access with the DBMS_TF.xstore* procedures.
Before assigning the row values, I get the number of the last row processed by calling DBMS_TF.xstore_get. This looks for the key passed as the first argument. The second argument is an in-out parameter. If the key doesn’t exist, DBMS_TF.xstore_get returns the value passed in for this second parameter unchanged.
So DBMS_TF.xstore_get('col' || cols, last_row) searches for the keys col1, col2, and so on. For the first batch of rows, these keys don’t exist, so the call returns the initialized value for last_row, which is zero. Otherwise, it returns the position of the last row processed.
Then, after processing all the rows in the current batch, I update the execution store to hold the position of the last row processed, by calling DBMS_TF.xstore_set. This overwrites the current value for any keys that exist.
So that builds the PTF. Now how do you use it?
To use the PTF, simply call it in the FROM clause of your query, passing the name of the table you’re operating on and the lists of columns to add and remove.
To replace the DUMMY column in the DUAL table with one named C1, I run the following:
select *
from   hide_existing_add_new_cols (
         dual, columns ( c1 ), columns ( dummy )
       );

C1
 1
I can also pass a subquery to the PTF by defining it in a common table expression (CTE):
with rws as (
  select dummy from dual
  connect by level <= 4
)
select *
from   hide_existing_add_new_cols (
         rws,
         columns ( c1, c2, c3, c4 ),
         columns ( not_here )
       );

DUMMY   C1   C2   C3   C4
X        1    2    3    4
X        2    4    6    8
X        3    6    9   12
X        4    8   12   16
This demonstrates a key feature of SQL: Everything is a table. You can pass views or named CTEs as table inputs to a PTF.
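For example, a view works as the table input too (the view name here is made up for illustration):

```sql
-- A view as the PTF's table input (four_rows_v is a hypothetical name):
create or replace view four_rows_v as
  select dummy from dual connect by level <= 4;

select *
from   hide_existing_add_new_cols (
         four_rows_v,
         columns ( c1 ),
         columns ( dummy )
       );
```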
But you can’t pass a subquery directly. The following code will fail:
select *
from   hide_existing_add_new_cols (
         ( select dummy from dual ),
         columns ( c1 ),
         columns ( dummy )
       );

SQL Error: ORA-00904: : invalid identifier
To implement this PTF, you need to know only the position of the row in the result set—not the values from other rows. While row semantics are good enough for this PTF implementation, if you’re writing a PTF that calculates results across rows—say, aggregations—you need to use table semantics.
Switching to table semantics PTFs also enables you to define the order in which you process the dataset. So let’s re-create the PTF with table semantics:
create or replace function hide_existing_add_new_cols (
  tab       table,
  add_cols  columns,
  hide_cols columns
) return table
pipelined table polymorphic
using hide_and_add_cols_pkg;
/
This enables you to use the PARTITION BY and ORDER BY clauses when you call the PTF.
To demonstrate this, let’s create a table storing the integers 1 to 6, identifying whether they’re even and including their negative values:
create table integers as
  select level number_value,
         case mod ( level, 2 )
           when 0 then 'Y'
           when 1 then 'N'
         end is_even,
         -level negated_value
  from   dual
  connect by level <= 6;
Now I can define the order in which the PTF processes the rows. To do this, I add the ORDER BY clause after the table:
select *
from   hide_existing_add_new_cols (
         integers order by negated_value,
         columns ( c1, c2 ),
         columns ( is_even )
       )
order by number_value;

NUMBER_VALUE   NEGATED_VALUE   C1   C2
           1              -1    6   12
           2              -2    5   10
           3              -3    4    8
           4              -4    3    6
           5              -5    2    4
           6              -6    1    2
This sorts the input to the PTF by the negated values, so you have descending counters relative to the positive values. Also note that the final ORDER BY for the query is different from the input for the PTF.
As with analytic functions, the PARTITION BY clause splits the input dataset into groups, one for each value of the partitioning columns. So you can have separate counters for the odd and even values by using PARTITION BY is_even:
select *
from   hide_existing_add_new_cols (
         integers partition by is_even order by number_value,
         columns ( c1, c2 ),
         columns ( negated_value )
       )
order by is_even, number_value;

NUMBER_VALUE   IS_EVEN   C1   C2
           1   N          1    2
           3   N          2    4
           5   N          3    6
           2   Y          1    2
           4   Y          2    4
           6   Y          3    6
Partitioning and sorting your dataset this way uses standard functionality. All you need to do is define the PTF with table semantics.
Clearly PTFs are a powerful feature. With them you can change the shape of a result set at runtime.
But you may be wondering: Why bother building one to add row counters? This is much easier to do with the analytic function ROW_NUMBER.
True, my example is a little contrived, but it demonstrates the key features of PTFs. Namely, it demonstrates how to
Remove existing columns from a result set in the describe function
Add new columns in the describe function
Assign values to the new columns in the fetch_rows procedure
Use table semantics PTFs to split and sort the input rows
Manage execution state using the DBMS_TF.xstore* procedures
But what are the real-world use cases for PTFs?
Let’s return to your original requirement: to read a CSV file that has an unknown number of fields and split the values out into separate columns. Doing this without PTFs requires complex dynamic SQL.
But with PTFs in place, you could read the first row of the CSV file and pass it as an argument to the function, use it to define the columns of your results in the describe function, and split the input rows into those columns in the fetch_rows procedure. Here’s an example of a dynamic CSV-to-column splitter PTF.
Other cases where PTFs may be appropriate include the following:
Dynamic PIVOT and UNPIVOT operations.
Top-N Plus queries. (Use PTFs in place of a traditional Top-N query, but with an extra summary row, giving totals for all the remaining rows in the dataset.)
Executing R scripts against a table, returning the results of the R script.
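To make the first use case above concrete, a dynamic-pivot PTF might accept the grouping and measure columns as COLUMNS arguments. This is only a sketch of a possible call; the function and table names are entirely hypothetical:

```sql
-- Hypothetical call to a dynamic-pivot PTF (all names invented):
select *
from   dynamic_pivot (
         sales,
         columns ( product ),  -- distinct values become new columns
         columns ( amount )    -- measure to aggregate per column
       );
```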
I suspect there will be relatively few general-purpose PTF use cases. But I’m also sure many developers will find more weird and wacky ways of using them!
What do you think of PTFs: awesome or awful? Can you come up with other general-purpose use cases? If you have any comments or questions about PTFs, reach out to Chris Saxon on Twitter.
READ an overview of polymorphic table functions.
WORK with the code for this article on Oracle Live SQL.
WORK with a CSV-to-columns converter on Oracle Live SQL.
Illustration by Wes Rowell
DISCLAIMER: We've captured these popular historical magazine articles for your reference. Links etc contained within these article possibly won't work. These articles are provided as-is with no guarantee that the information within them is the best advice for current versions of the Oracle Database.