Skip to Main Content
  • Questions
  • SQL*Loader with multiple WHENs is rejecting all rows

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Basil.

Asked: March 14, 2011 - 11:19 am UTC

Last updated: March 14, 2011 - 6:25 pm UTC

Version: 10.2.0

Viewed 10K+ times! This question is

You Asked

I'm working to improve the efficiency of an existing process which starts with staging an input file (generated by a client program) into the database. In some situations, the input file can have 20,000,000 lines or more, but with 1/3 or so of those input lines being of no use on the input side.

At present, ALL of the lines are transferred (as rows in a staging table), with unneeded lines pruned at the beginning of processing, using a DELETE. The delete obviously generates a lot of redo and undo, and takes some time to accomplish. Further, we're transferring lines that we don't even need to bother with in the first place.

I'm trying to modify the SQL*Loader control file to discard these unneeded records in order to save both the transfer time and the pruning time. However, I'm running up against SQL*Loader's lack of an "OR" in its condition processing. I've tried structuring the control file to perform multiple tests, so that each successful test would result in sending the row to the database. My results show that nearly ALL rows are being rejected.

My questions:
1. What's wrong with my control file (see below)? The docs suggest this is one way to solve the problem. In the input, I'd expect lines 2, 4, 6, 7, 8, 9, 10, and 11 to load.
2. Is there a better way to do this?

The staging table:
 drop table staging;
 create table staging (
   rectype char(1) not null, 
 id number not null, 
 name varchar2(64) not null, 
 val varchar2(256))
 /


My input file, inputdata.csv:
4,1,"rn",""
4,2,"rn","nonnull"
4,3,"rd",""
4,4,"rd","nonnull"
4,5,"ac","NO"
4,6,"ac","YES"
4,7,"ie","nonnull"
4,8,"at","nonnull"
4,9,"ms","1.23"
4,10,"mb","1.56"
4,11,"ms",".99"





My control file, upload_inputdata.ctl:
LOAD DATA 
REPLACE
INTO TABLE staging
WHEN NAME = 'rn' 
     AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME = 'rd'
     AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME = 'ac'
     AND VAL <> 'NO'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME <> ''
    AND NAME <> 'ie'
    AND NAME <> 'at'
    AND NAME <> 'rn'
    AND NAME <> 'rd'
    AND NAME <> 'ac'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
  ID,
  NAME,
  VAL
)


The command to launch SQL*Loader:
sqlldr user/pass@db DATA='inputdata.csv' CONTROL=upload_inputdata.ctl LOG=inputdata.log  SILENT=HEADER ROWS=10000 DIRECT=true DISCARD=inputdata_discards.csv


My discards file, with comments:
SQL*Loader: Release 10.2.0.4.0 - Production on Mon Mar 14 11:56:25 2011

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Control File:   upload_inputdata.ctl
Data File:      inputdata.csv
  Bad File:     inputdata.bad
  Discard File: inputdata_discards.csv
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      Direct

Table STAGING, loaded when NAME = 0X726e(character 'rn')
                  and VAL != BLANKS
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE                             FIRST     *   ,  O(") CHARACTER            
ID                                   NEXT     *   ,  O(") CHARACTER            
NAME                                 NEXT     *   ,  O(") CHARACTER            
VAL                                  NEXT     *   ,  O(") CHARACTER            

Table STAGING, loaded when NAME = 0X7264(character 'rd')
                  and VAL != BLANKS
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE                              NEXT     *   ,  O(") CHARACTER            
ID                                   NEXT     *   ,  O(") CHARACTER            
NAME                                 NEXT     *   ,  O(") CHARACTER            
VAL                                  NEXT     *   ,  O(") CHARACTER            

Table STAGING, loaded when NAME = 0X6163(character 'ac')
                  and VAL != 0X4e4f(character 'NO')
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE                              NEXT     *   ,  O(") CHARACTER            
ID                                   NEXT     *   ,  O(") CHARACTER            
NAME                                 NEXT     *   ,  O(") CHARACTER            
VAL                                  NEXT     *   ,  O(") CHARACTER            

Table STAGING, loaded when NAME != BLANKS
                  and NAME != 0X6965(character 'ie')
                  and NAME != 0X6174(character 'at')
                  and NAME != 0X726e(character 'rn')
                  and NAME != 0X7264(character 'rd')
                  and NAME != 0X6163(character 'ac')
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE                              NEXT     *   ,  O(") CHARACTER            
ID                                   NEXT     *   ,  O(") CHARACTER            
NAME                                 NEXT     *   ,  O(") CHARACTER            
VAL                                  NEXT     *   ,  O(") CHARACTER            



Record 1: Discarded - failed all WHEN clauses. -- OK
Record 3: Discarded - failed all WHEN clauses. -- OK
Record 4: Discarded - failed all WHEN clauses. -- NOT OK; Second WHEN should apply
Record 5: Discarded - failed all WHEN clauses. -- OK
Record 6: Discarded - failed all WHEN clauses. -- NOT OK; Third WHEN should apply
Record 7: Discarded - failed all WHEN clauses. -- OK
Record 8: Discarded - failed all WHEN clauses. -- OK
Record 9: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 10: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 11: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 12: Discarded - failed all WHEN clauses. -- OK; blank line in input
Record 13: Discarded - failed all WHEN clauses. -- OK; blank line in input


Table STAGING:
  1 Row successfully loaded.
  0 Rows not loaded due to data errors.
  12 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Table STAGING:
  0 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  13 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Table STAGING:
  0 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  13 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Table STAGING:
  0 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  13 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

Bind array size not used in direct path.
Column array  rows :    5000
Stream buffer bytes:  256000
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:            13
Total logical records rejected:         0
Total logical records discarded:       12
Total stream buffers loaded by SQL*Loader main thread:        1
Total stream buffers loaded by SQL*Loader load thread:        0

Run began on Mon Mar 14 11:56:25 2011
Run ended on Mon Mar 14 11:56:26 2011

Elapsed time was:     00:00:01.00
CPU time was:         00:00:00.22



and Tom said...

http://docs.oracle.com/cd/E11882_01/server.112/e16536/ldr_control_file.htm#sthref699

<quote>
The important point in this example is that the second empno field is found immediately after the first ename, although it is in a separate INTO TABLE clause. Field scanning does not start over from the beginning of the record for a new INTO TABLE clause. Instead, scanning continues where it left off.
</quote>

In your case, only the FIRST into clause would see any data- the rest see all nulls.


drop table staging;
    create table staging (
   rectype char(1) not null, 
    id number not null, 
    name varchar2(64) not null, 
    val varchar2(256));

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> !cat test.ctl
LOAD DATA 
REPLACE
INTO TABLE staging
WHEN NAME = 'rn' 
     AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME = 'rd'
     AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME = 'ac'
     AND VAL <> 'NO'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
  ID,
  NAME,
  VAL
)
INTO TABLE staging
WHEN NAME <> ''
    AND NAME <> 'ie'
    AND NAME <> 'at'
    AND NAME <> 'rn'
    AND NAME <> 'rd'
    AND NAME <> 'ac'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
  ID,
  NAME,
  VAL
)

ops$tkyte%ORA11GR2> !sqlldr / test data=test.dat

SQL*Loader: Release 11.2.0.2.0 - Production on Mon Mar 14 13:54:07 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Commit point reached - logical record count 11

ops$tkyte%ORA11GR2> select * from staging;

R         ID NAME                           VAL
- ---------- ------------------------------ --------------------
4          2 rn                             nonnull
4          4 rd                             nonnull
4          6 ac                             YES
4          9 ms                             1.23
4         10 mb                             1.56
4         11 ms                             .99

6 rows selected.


/



Records 7 and 8 should not load according to you - you said "when it is not equal to ie and at"

Rating

  (8 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks

Basil, March 14, 2011 - 1:08 pm UTC

Very helpful. I had tried POSITION(1) to no effect, but, in all likelihood, combined it incorrectly with something else.

You're also right in noting that 7 and 8 should be rejected. I have that correct in my analysis of the log, but not in my problem summary at the top.

Another Approach

APH, March 14, 2011 - 4:49 pm UTC

Any reason for not using an external table? I would imagine one could do all the filtering and transforming necessary with regular SQL, which people might be more familiar using.
Tom Kyte
March 14, 2011 - 5:02 pm UTC

it would be so much easier with an external table - yes, I should have mentioned that...

Error handling in external table.

Mahesh, March 14, 2011 - 6:13 pm UTC

After loading data using sql*loader,I check the return code and errors in sql loader log file or number records in bad file. How this check can be done while using the external table? I know it does creates a bad file/log file but it get created after first selection. Is this means first select * from external table and check for log/bad file?

Tom Kyte
March 14, 2011 - 6:25 pm UTC

just have an external table defined on the BAD file itself, you can query up the bad records (just use a few varchar2(4000) fields to map it and set reject limit 0 on the bad file - that way you know you are seeing the bad records)

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1615330671789#56205746904275

Why not an external table

Basil, March 14, 2011 - 7:50 pm UTC

Any reason for not using an external table? I would imagine one could do all the filtering and transforming necessary with regular SQL, which people might be more familiar using.

An external table isn't an option unless the file being uploaded resides on the database server. In this case, a client-side application does an analysis that generates these large files (for historical reasons, mainly because the client-side program existed for several years before a database came into the picture). If I were to move the file over, we'd have to use something like ftp or look to implement something that would transfer the file as a CLOB.

On that last point, I'd love to do something like transfer the entire file as a CLOB into the database, using SQL*Loader, and then have the database use the CLOB as the source for an external table. I know how to do the latter, but the SQL*Loader docs just left me scratching my head as to how I would do the former.

In my dream world, we'd drop the entire file-generation thing and have the application populate the database directly. However, you don't even want to know how well that idea was received. *sigh*

To Basil: Re: SqlLoader a blob

Kim Berg Hansen, March 15, 2011 - 2:23 am UTC

Hi, Basil

You can load a blob/clob with SqlLoader.

We have a table with columns Id, FileName, BlobData.
The client app generates a set of files and at the same time it generates the control file for SqlLoader like this:

load data
infile *
append
into table tmpblobdata
fields terminated by ';'
(
   Id integer external,
   FileName   char,
   BlobData   lobfile (tmpname) terminated by eof
)
begindata
1;/tmp/firstfile.doc
2;/tmp/secondfile.doc
3;/tmp/thirdfile.doc


In your data for SqlLoader you have the filename, and SqlLoader can load that file as a lob using the lobfile syntax.

If you do not wish the filename as a column in the table you insert to, you can probably use the "filler" syntax.

Perhaps an idea :-)

Bug in my previous review, sorry

Kim Berg Hansen, March 15, 2011 - 2:27 am UTC

Aaargh... typo in my previous review.
Sorry, Basil - here comes the right version:

load data
infile *
append
into table tmpblobdata
fields terminated by ';'
(
   Id integer external,
   FileName   char,
   BlobData   lobfile (FileName) terminated by eof
)
begindata
1;/tmp/firstfile.doc
2;/tmp/secondfile.doc
3;/tmp/thirdfile.doc


(Was a bit too quick with the cut and paste and edit and missed the typo in Preview Review :-)

Thanks!

Basil, March 15, 2011 - 10:37 am UTC

The sample is very helpful. I now have a CLOB in my database.

I had thought you could specify a CLOB as input to an external table definition (rather than a file in the filesystem), but I sure don't see that now. I'll try specifying the CLOB definition in the table as using the filesystem then load from that.

At this point, I want to compare the time required between the SQL*Loader-only approach, and the SQL*Loader with external table. The SQL*Loader-only approach can remove fully 1/3 of the rows in the source file on the client, which is a significant savings.

cursor

A reader, September 02, 2012 - 1:39 pm UTC