Database, SQL and PL/SQL

On Promotion, Restriction, and Data Loading

Our technologist provides missing links, custom sizes, and external table tips.

By Tom Kyte

January/February 2013

Steven Feuerstein posted a URL that contains all of his articles that have been published in Oracle Magazine: oracle.com/technetwork/issue-archive/index-087690.html.

I was wondering if you had a similar link and could post it here.

Yes. Oracle Magazine maintains an archive of past issues and—for recurring columns such as Ask Tom—pages that point to them all. My page of links is oracle.com/technetwork/issue-archive/index-093676.html. You can also go to the Oracle Magazine home page at oracle.com/technetwork/oramag/magazine/home for links to the complete content of back issues as well as links to these list pages—near the bottom of the home page under FEATURED COLUMNS.

Undo Size Restrictions

I have some coworkers who believe that it is a good idea to limit the size of the undo tablespace—in this case, to 1 GB per node in two- to five-node Oracle Real Application Clusters systems. The reasoning is that it controls the size of undo and limits large transactions. (They also do this with TEMP tablespaces). I am not a fan of this practice. What is your opinion?

It seems to be a strange way to limit a resource—and rather arbitrary. To meet the requirement “limit the size of a transaction,” this approach would not even appear on my list of solutions. It doesn’t limit the size of individual transactions—it limits only the cumulative size of all concurrently executing transactions to 1 GB. You could still have a single transaction that is 1 GB in size.

To control the size of transactions, the correct method is to use Oracle Database's Resource Manager feature. Almost all systems should be using Resource Manager already, so if yours are not, you should read up on it at http://docs.oracle.com/cd/E11882_01/server.112/e25494/dbrm.htm#ADMIN027. With Resource Manager you can, among other things, establish an undo quota for each resource consumer group. The undo quota limits the amount of undo a consumer group can use at any given point in time, so you can limit the amount of undo an application, a group of users, or an individual user can generate.
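For illustration, here is a minimal sketch of setting up such a quota with DBMS_RESOURCE_MANAGER. The plan and consumer group names are invented for this example; UNDO_POOL is specified in kilobytes, so 1048576 KB is roughly 1 GB.

begin
    dbms_resource_manager.create_pending_area();

    dbms_resource_manager.create_consumer_group(
        consumer_group => 'BATCH_GROUP',
        comment        => 'batch jobs whose undo we want to cap' );

    dbms_resource_manager.create_plan(
        plan    => 'LIMIT_UNDO_PLAN',
        comment => 'plan that caps undo per consumer group' );

    -- UNDO_POOL is expressed in kilobytes; 1048576 KB is about 1 GB
    dbms_resource_manager.create_plan_directive(
        plan             => 'LIMIT_UNDO_PLAN',
        group_or_subplan => 'BATCH_GROUP',
        comment          => 'cap undo generated by batch jobs',
        undo_pool        => 1048576 );

    -- every plan must also include a directive for OTHER_GROUPS
    dbms_resource_manager.create_plan_directive(
        plan             => 'LIMIT_UNDO_PLAN',
        group_or_subplan => 'OTHER_GROUPS',
        comment          => 'everyone else, no undo cap' );

    dbms_resource_manager.validate_pending_area();
    dbms_resource_manager.submit_pending_area();
end;
/

You would then map the relevant users or services to BATCH_GROUP and enable the plan by setting the RESOURCE_MANAGER_PLAN parameter.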

But back to this 1 GB arbitrary limit: Some systems will need 1 GB of undo, others will need 500 MB, others will need 10 GB, and others will need other sizes. The amount of undo needed is based on

  • The number of concurrent transactions

  • The relative size of these transactions

  • The length of your longest-running queries (because you don’t want to experience ORA-1555, “Snapshot too old,” errors due to insufficient undo retention)

In the olden days—before automatic undo management in Oracle9i Database—a DBA had to size undo manually. Nowadays this is how I would size my undo:

  1. Determine the undo retention I need to support. This is a function of my longest-running query (undo retention must be longer than that, or else I will be subject to the dreaded ORA-1555 error) and my desire to be able to do flashback table or flashback query operations to recover from a “whoops”-type error without having to go to backups. I’d like my undo retention to be at least three hours in most typical cases—for the “whoops” type of stuff—and even longer if I can get it, because it is so much easier to do flashback operations than to go to backups. Flashback features have saved me more often than I can count since Oracle9i Database.

  2. Allow my undo tablespace to autoextend—within reasonable limits—to satisfy the undo retention I need to support. You don’t have to let it grow without bound, just to some reasonable size.

If you allow the undo tablespace to autoextend, the database will retain undo for at least as long as you requested in the UNDO_RETENTION setting, and it may retain undo even longer to permit long-running queries to complete.
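As a rough sketch of how I might do that (the retention value comes from the three-hour discussion above, and the datafile name and size limits are purely illustrative):

-- longest-running query, in seconds, observed in the recent undo history
select max(maxquerylen) longest_query_seconds
  from v$undostat;

-- ask for at least three hours of undo retention (10,800 seconds),
-- or more if the longest query demands it
alter system set undo_retention = 10800;

-- let the undo tablespace's datafile autoextend, within a sane upper bound
alter database datafile '/u01/oradata/ORCL/undotbs01.dbf'
      autoextend on next 100m maxsize 30g;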

Temporary tablespace sizing follows the same sort of rules. Some systems use virtually no temporary space—an online transaction processing (OLTP) system, for example, works with tens of rows at a time, so it has little need for it. Other systems, such as reporting systems and data warehouses, may need many gigabytes or even terabytes of temporary space, depending on the nature of the queries and the number of concurrently executing queries. The one-size-fits-all approach doesn’t work in real life.
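If you want to see where temporary space is actually going before settling on a size, a quick check against V$TEMPSEG_USAGE such as this can help (the MB arithmetic assumes an 8 K block size):

-- current temporary space consumers (8192 assumes an 8 K block size)
select username, tablespace, segtype,
       round( sum(blocks) * 8192 / 1024 / 1024 ) mb_used
  from v$tempseg_usage
 group by username, tablespace, segtype
 order by mb_used desc;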

Tricky Data Loading

I am using SQL*Loader to load data into tables from a flat file. Some sample data in the file might be:

12,smith,1234556@@1234567@@876556612,1200

14,John,1234@@4567@@56789@@12345@@45778@@34566@@23456,1345

The field values are empno, empname, phone numbers, and salary. The table structures are

create table emp ( empno number(5) primary key, ename varchar2(10), sal number(10,2) );
create table emp_contact ( empno references emp, phone_no number(10) );

I want to insert the data into the emp and emp_contact tables. I don’t know how many values for phone_no the file contains (the number of values for phone_no is not fixed). How do I insert the data into the tables?

This is tricky, because you have to turn a single row of data into many rows in the EMP_CONTACT table.

With SQL*Loader (the twentieth-century legacy data loading tool), this is not directly possible. You could create a view on DUAL and place an INSTEAD OF INSERT trigger on that view to procedurally process the data: the trigger would take each row inserted, parse it, and insert the resulting row(s) into EMP and EMP_CONTACT. I’m not going to suggest that approach, though, because it is very code-intensive and does not scale well. The trigger approach processes the data slow by slow (row by row), so it doesn’t perform well for data loading.

I am instead going to suggest using the data loading tool of the twenty-first century: the external table. To test this, I start by creating the external table, as shown in Listing 1, so I can query the data as if it were in a regular database table.

Code Listing 1: Creating the external table

SQL> create or replace directory my_dir as '/home/tkyte'
  2  /
Directory created.
SQL> CREATE TABLE et
  2  ( "EMPNO" VARCHAR2(10),
  3    "ENAME" VARCHAR2(20),
  4    "TELNOS" VARCHAR2(1000),
  5    "SAL" VARCHAR2(10) )
  6  ORGANIZATION external
  7  (
  8    TYPE oracle_loader
  9    DEFAULT DIRECTORY MY_DIR
 10    ACCESS PARAMETERS
 11    (
 12      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
 13      BADFILE 'MY_DIR':'t.bad'
 14      LOGFILE 't.log_xt'
 15      READSIZE 1048576
 16      FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
 17      MISSING FIELD VALUES ARE NULL
 18      REJECT ROWS WITH ALL NULL FIELDS
 19      (
 20        "EMPNO" CHAR(255)
 21          TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
 22        "ENAME" CHAR(255)
 23          TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
 24        "TELNOS" CHAR(1000)
 25          TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
 26        "SAL" CHAR(255)
 27          TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
 28      )
 29    )
 30    location
 31    (
 32      't.dat'
 33    )
 34  )
 35  /
Table created.
SQL> select * from et;
EMPNO  ENAME   TELNOS                                          SAL
—————  —————   —————————————————————————————————————————————   ————
12     smith   1234556@@1234567@@876556612                     1200
14     John    1234@@4567@@56789@@12345@@45778@@34566@@23456   1345

So now I can access the data to be loaded with SQL, and once I have that ability, I can do anything I want to it. What I need to do is take each of the rows in Listing 1, parse the TELNOS string around the string ‘@@’, and output a row for each telephone number. Using SQL, I can do that easily, as shown in Listing 2.

Code Listing 2: Querying the external table to output a row for each telephone number

SQL> select empno, ename, sal, i,
  2         substr( tnos,
  3                 instr( tnos, '@@', 1, i )+2,
  4                 instr( tnos, '@@', 1, i+1 )
  5                   -instr( tnos, '@@', 1, i) - 2 ) tno
  6    from (
  7  select to_number(et.empno) empno,
  8         et.ename,
  9         to_number(et.sal) sal,
 10         column_value i,
 11         '@@'||et.telnos||'@@' tnos
 12    from et,
 13         table( cast( multiset(
 14            select level
 15              from dual
 16           connect by level <=
 17              (length(et.telnos)
 18                 -length(replace(et.telnos,'@@','')))/2+1 )
 19               as sys.odciNumberList ) )
 20         )
 21  /
EMPNO  ENAME    SAL    I  TNO
—————  —————   ————   ——  ———————
   12  smith   1200    1  1234556
   12  smith   1200    2  1234567
   12  smith   1200    3  876556612
   14  John    1345    1  1234
   14  John    1345    2  4567
   14  John    1345    3  56789
   14  John    1345    4  12345
   14  John    1345    5  45778
   14  John    1345    6  34566
   14  John    1345    7  23456
10 rows selected.

I’ll explain the query in Listing 2 before I proceed to the data load. I know I need to turn each row in the external table into multiple rows and that the number of rows will be equal to the number of ‘@@’ strings I find in the ET.TELNOS column, plus one—I compute that number on lines 17 and 18. Then I use a CONNECT BY to generate that many rows.
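To sanity-check that expression, here it is applied by hand to the first input record (this query is just an illustration and is not part of the load):

select ( length('1234556@@1234567@@876556612')
        -length(replace('1234556@@1234567@@876556612','@@','')) )/2+1 cnt
  from dual;

       CNT
——————————
         3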

On lines 13–19, I generate the set of rows I need for each input row: three rows for the first input record and seven rows for the second. Then I join this set of data to the external table itself—row by row. No join condition is necessary here when I use the construct

SELECT *
FROM some_table, TABLE( some_set )

The TABLE() clause executes for each row in SOME_TABLE, and those rows are joined to the row that caused them to be generated. As a result, lines 12–20 have the effect of returning the first row from the ET external table three times and the second row seven times—exactly what I need. On lines 7–11, I select and format the columns I need from the output. I am converting the EMPNO and SAL columns into their proper datatypes, preserving the ENAME field as is, and adding ‘@@’ to the front and back of the TELNOS string, which will make it easier to parse in a moment. Also, I take the output of my table function (the column is implicitly named COLUMN_VALUE) and name it I. It represents the row number within each input record—as you can see, it runs from 1 to 3 for the first record and from 1 to 7 for the second.

Last, on lines 1–5, I select the fields I need and parse the nth telephone number from the TNOS string. I use INSTR on line 3 to find the nth occurrence of ‘@@’ in the string and then use INSTR again on line 4 to discover where the (n+1)th occurrence is. This enables me to compute the starting character and length of the nth telephone entry in the string, and I get the desired output.
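For example, applying that arithmetic by hand to the first record with I = 1 (again, just a standalone sanity check): the first ‘@@’ is at position 1 and the second is at position 10, so the substring starts at character 3 and is 7 characters long.

select substr( '@@1234556@@1234567@@876556612@@',
               instr( '@@1234556@@1234567@@876556612@@', '@@', 1, 1 )+2,
               instr( '@@1234556@@1234567@@876556612@@', '@@', 1, 2 )
                 -instr( '@@1234556@@1234567@@876556612@@', '@@', 1, 1 ) - 2 ) tno
  from dual;

TNO
———————
1234556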

Now in order to load the data, I need to take the first row (where I = 1) for each group in Listing 2, insert it into the EMP table, and take all the rows in each group and insert them into the EMP_CONTACT table. I could do that in two passes through the data—that is, just load the ET table into EMP and then load the output of my Listing 2 query into EMP_CONTACT. Rather than process the data twice, however, I’ll use a multitable insert to load both tables in one statement, as shown in Listing 3.

Code Listing 3: Multitable INSERT to load rows into both the emp and emp_contact tables

SQL> create table emp
  2  ( empno number primary key,
  3    ename varchar2(10),
  4    sal   number
  5  );
Table created.
SQL> create table emp_contact
  2  ( empno    number references emp,
  3    phone_no number
  4  );
Table created.
SQL> insert all
  2  when (i = 1) then into emp (empno,ename,sal) values (empno,ename,sal)
  3  when (i > 0) then into emp_contact(empno,phone_no) values (empno,tno)
  4  select empno, ename, sal, i,
  5         substr( tnos,
  6                 instr( tnos, '@@', 1, i )+2,
  7                 instr( tnos, '@@', 1, i+1 )
  8                   -instr( tnos, '@@', 1, i) - 2 ) tno
  9    from (
 10  select to_number(et.empno) empno,
 11         et.ename,
 12         to_number(et.sal) sal,
 13         column_value i,
 14         '@@'||et.telnos||'@@' tnos
 15    from et,
 16         table( cast( multiset(
 17            select level
 18              from dual
 19           connect by level <=
 20              (length(et.telnos)
 21                 -length(replace(et.telnos,'@@','')))/2+1 )
 22               as sys.odciNumberList ) )
 23         )
 24  /
12 rows created.
SQL> select * from emp;
EMPNO  ENAME    SAL
—————  ——————  —————
   12  smith   1200
   14  John    1345
SQL> select * from emp_contact;
EMPNO    PHONE_NO
———————  —————————
   12     1234556
   12     1234567
   12   876556612
   14        1234
   14        4567
   14       56789
   14       12345
   14       45778
   14       34566
   14       23456
10 rows selected.

As you can see, I simultaneously loaded the EMP and EMP_CONTACT tables, using a single SQL statement against the ET external table. Using external tables as your data loading tool opens up a world of possibilities that simply isn’t available with SQL*Loader.

Cardinalities

I have a question about joining with collections and cardinality estimation. The optimizer is always estimating that 8,168 rows are coming back from my collection, and because of that, it is using inefficient plans. The estimate of 8,168 is more than two orders of magnitude more than the real cardinality. How can I solve this problem?

This is a long-standing issue with pipelined functions and collections during optimization. In general, the optimizer doesn’t have any information about the cardinality (the number of rows) being returned by a collection. It has to guess—and that guess is based on the block size (the default statistics are block-size-driven). So, for a database with an 8 K block size, the guess is about 8,000 rows. And because it is unlikely that your collection has about 8,000 elements (8 or 80 is more likely in most cases), you can easily end up with a suboptimal plan.

So the question is how do you get the right estimated cardinality value to the optimizer? I’m going to demonstrate four ways here—and I’ll provide a pointer to a fifth way that uses the extensible optimizer. The four methods I’ll demonstrate are

  • The cardinality hint (undocumented)

  • The OPT_ESTIMATE hint (undocumented)

  • Dynamic sampling (Oracle Database 11g Release 1 and later)

  • Oracle Database’s Cardinality Feedback feature (Oracle Database 11g Release 2 and later)

You can read about the fifth method in an excellent write-up by Oracle ACE Adrian Billington at bit.ly/WRhizp.

To demonstrate how to get and work with the right estimated cardinality value, I’m going to use a little pipelined function I’ve used many times over the years. This function takes a delimited string and turns it into a set, as shown in Listing 4.

Code Listing 4: Pipelined function turning delimited string into set

SQL> create or replace type str2tblType as table of varchar2(30)
  2  /
Type created.
SQL> create or replace
  2  function str2tbl( p_str in varchar2, p_delim in varchar2 default ',' )
  3  return str2tblType
  4  PIPELINED
  5  as
  6      l_str      long default p_str || p_delim;
  7      l_n        number;
  8  begin
  9      loop
 10          l_n := instr( l_str, p_delim );
 11          exit when (nvl(l_n,0) = 0);
 12          pipe row( ltrim(rtrim(substr(l_str,1,l_n-1))) );
 13          l_str := substr( l_str, l_n+1 );
 14      end loop;
 15      return;
 16  end;
 17  /
Function created.

Once I have that function installed, I can try it out by executing a query like this:

SQL> variable x varchar2(15)
SQL> exec :x := '1,2,3,a,b,c'
PL/SQL procedure successfully completed.
SQL> select * from table(str2tbl(:x));
COLUMN_VALUE
——————————————————————————————————————
1
2
3
a
b
c
6 rows selected.
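For instance, one common use for such a function is binding an "IN list." Here is a minimal sketch, where table T and its column X are hypothetical and :bind_var would hold a comma-delimited list of values:

select *
  from t
 where x in ( select column_value
                from table( str2tbl( :bind_var ) ) );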

You can see the potential uses for a function like this—binding an IN list, as sketched above, is a common one (see bit.ly/S43Jqq for a fuller demonstration). But a problem with this function is the estimated cardinality the optimizer comes up with, as shown in Listing 5.

Code Listing 5: Optimizer making 8 K cardinality estimate

SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————
SQL_ID  ddk1tv9s5pzq5, child number 0
————————————————————————————————————————————————————————
select * from table(str2tbl(:x))
Plan hash value: 2407808827
———————————————————————————————————————————————————————————————————————————
|Id|Operation                      |Name   |Rows|Bytes|Cost (%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT               |       |    |     |  29  (100)|        |
| 1| COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336|  29    (0)|00:00:01|

As you can see, there is that magic 8,168 number in the ROWS column. The optimizer assumes that the collection returned by this function is going to have more than 8,000 entries—you can imagine how that might affect the choices made by the optimizer regarding whether to use an index when processing a query. The first solution to this might be to use the cardinality hint, as shown in Listing 6.

Code Listing 6: Using the CARDINALITY hint

SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
————————————————————————————————————————————————————————
SQL_ID  bd2f8rh30z3ww, child number 0
————————————————————————————————————————————————————————
select /*+ cardinality(sq 10) */ * from table(str2tbl(:x)) sq
Plan hash value: 2407808827
———————————————————————————————————————————————————————————————————————————
|Id|Operation                      |Name   |Rows|Bytes|Cost (%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT               |       |    |     |  29  (100)|        |
| 1| COLLECTION ITERATOR PICKLER...|STR2TBL|  10|   20|  29    (0)|00:00:01|

As you can see, when I use CARDINALITY(SQ 10) as a hint in the query, the optimizer adjusts its estimated cardinality to be the number I input—not 8,168. Note that I did not use the number 6, even though the cardinality really was 6 in my example. You do not have to be exact with the estimate; you want a number that is representative of the data, but it doesn’t have to be perfect. Typically it should be within an order of magnitude of the actual value (8,168 is a full three orders of magnitude off!). This cardinality hint will work in Oracle9i Database and later.

The next approach I can take in Oracle Database 10g and later is the same approach a SQL profile would take: use the OPT_ESTIMATE hint. The OPT_ESTIMATE hint in this case will take three inputs: the type of thing to apply the hint to (a table in this case), the correlation name of that table (SQ in this case), and a scaling factor to apply to the optimizer’s estimate. This scaling factor is a number by which the optimizer will multiply its estimated cardinality to arrive at the desired cardinality. In this case, the scaling factor will be

SQL> select 10/8168 from dual;
    10/8168
————————————————
  .00122429

I want 10 instead of 8,168, so I need to use a scaling factor of 0.00122429. When I do that, I get the result in Listing 7.

Code Listing 7: Using the OPT_ESTIMATE hint

select /*+ opt_estimate(table, sq, scale_rows=0.00122429) */ *
  from table(str2tbl(:x)) sq
Plan hash value: 2407808827
———————————————————————————————————————————————————————————————————————————
|Id|Operation                      |Name   |Rows|Bytes|Cost (%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT               |       |    |     |  29  (100)|        |
| 1| COLLECTION ITERATOR PICKLER...|STR2TBL|  10|   20|  29    (0)|00:00:01|

I achieve the desired result again: a cardinality estimate of 10.

The third approach I can use is dynamic sampling in Oracle Database 11g Release 1 and later. In order for a collection to be sampled, I must add a dynamic sampling hint to the query. Having the dynamic sampling level set in the session or at the system level is insufficient. Listing 8 demonstrates the query with the dynamic sampling hint.

Code Listing 8: Using the DYNAMIC_SAMPLING hint

select /*+ dynamic_sampling( sq, 2 ) */ * from table( str2tbl(:x,',') ) sq
Plan hash value: 2407808827
———————————————————————————————————————————————————————————————————————————
|Id|Operation                      |Name   |Rows|Bytes|Cost (%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT               |       |    |     |  11  (100)|        |
| 1| COLLECTION ITERATOR PICKLER...|STR2TBL|   6|   12|  11    (0)|00:00:01|
———————————————————————————————————————————————————————————————————————————
Note
———————
dynamic sampling used for this statement (level=2)

As you can see, the estimated cardinality is 6 in this case, because when I executed the query and the dynamic sample was taken, there were six elements in the collection. The dynamic sampling approach will use the cardinality it discovers during the hard parse of the SQL statement for all subsequent executions of that same cursor.

Last, Oracle Database 11g Release 2 and later include the Cardinality Feedback feature. I quickly demonstrated this once before (bit.ly/Y7WSjM). Since I wrote that article, a new requirement for Cardinality Feedback to work with collections and bind variables has arisen, and I’ll address that here.

Cardinality Feedback works by having the optimizer change its cardinality estimates after executing a query for the first time and observing that the actual cardinalities were very far off from the estimated cardinalities. That is, the optimizer starts to learn from its mistakes. If it executes a query and discovers that the real row counts are far off from the estimated counts, it will reoptimize the query, using the newly discovered values.

For this to work with collections and bind variables, you have to modify the query slightly. I’ll be using a WITH factored subquery and the materialize hint to have the database physically materialize the collection data into a temporary table. (This is required for Cardinality Feedback to work with bind variables in this case.) Listing 9 demonstrates Cardinality Feedback using the WITH factored subquery and the materialize hint.

Code Listing 9: Using Cardinality Feedback (and the WITH factored subquery and materialize hint)

with sq
as (
select /*+ materialize */ *
  from table( str2tbl( :x ) )
)
select *
  from sq
Plan hash value: 630596523
—————————————————————————————————————————————————————————————————————————————
|Id|Operation                        |Name   |Rows|Bytes|Cost (%CPU)|Time    |
—————————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT                 |       |    |     |  32  (100)|        |
| 1| TEMP TABLE TRANSFORMATION       |       |    |     |           |        |
| 2|  LOAD AS SELECT                 |       |    |     |           |        |
| 3|   COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336|   29   (0)|00:00:01|
| 4|  VIEW                           |       |8168| 135K|    3   (0)|00:00:01|
| 5|   TABLE ACCESS FULL             |SYS_...|8168|16336|    3   (0)|00:00:01|
—————————————————————————————————————————————————————————————————————————————
18 rows selected.

With Cardinality Feedback, I would just use SQ wherever I would have used the actual query. Note that the first time I execute this query, the cardinality estimate is way off—it is that magic number 8,168 (again). However, the optimizer learns from its mistake, and when I execute the query again, I get the result in Listing 10.

Code Listing 10: Getting the corrected cardinality with Cardinality Feedback

with sq as (select /*+ materialize */ *    from table( str2tbl( :x ) )
) select * from sq
Plan hash value: 630596523
—————————————————————————————————————————————————————————————————————————————
|Id|Operation                        |Name   |Rows|Bytes|Cost (%CPU)|Time    |
—————————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT                 |       |    |     |  32  (100)|        |
| 1| TEMP TABLE TRANSFORMATION       |       |    |     |           |        |
| 2|  LOAD AS SELECT                 |       |    |     |           |        |
| 3|   COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336|   29   (0)|00:00:01|
| 4|  VIEW                           |       |   6|  102|    3   (0)|00:00:01|
| 5|   TABLE ACCESS FULL             |SYS_...|   6|   12|    3   (0)|00:00:01|
—————————————————————————————————————————————————————————————————————————————
Note
———————
   - cardinality feedback used for this statement
22 rows selected.

With Cardinality Feedback, I get the correct estimated cardinality. I have to make the mistake once, but the optimizer self-corrects the second time around. So, those are four methods for getting correct estimated cardinalities for collections: the cardinality hint, the OPT_ESTIMATE hint, dynamic sampling, and Cardinality Feedback.

Next Steps

ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

FOLLOW Tom on Twitter

READ more Tom
 Oracle Database Concepts 11g Release 2 (11.2)
 Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition

DOWNLOAD Oracle Database 11g Release 2

LEARN more about
 Oracle Database’s Resource Manager feature
 the extensible optimizer
 pipelined functions
 Oracle Database’s Cardinality Feedback feature

FOLLOW Oracle Database
 on Twitter
 on Facebook



 
