I'm working to improve the efficiency of an existing process which starts with staging an input file (generated by a client program) into the database. In some situations, the input file can have 20,000,000 lines or more, but with 1/3 or so of those input lines being of no use on the input side.
At present, ALL of the lines are transferred (as rows in a staging table), with unneeded lines pruned at the beginning of processing, using a DELETE. The delete obviously generates a lot of redo and undo, and takes some time to accomplish. Further, we're transferring lines that we don't even need to bother with in the first place.
I'm trying to modify the SQL*Loader control file to discard these unneeded records in order to save both the transfer time and the pruning time. However, I'm running up against SQL*Loader's lack of an "OR" in its condition processing. I've tried structuring the control file to perform multiple tests, so that each successful test would result in sending the row to the database. My results show that nearly ALL rows are being rejected.
My questions:
1. What's wrong with my control file (see below)? The docs suggest this is one way to solve the problem.
In the input, I'd expect lines 2, 4, 6, 7, 8, 9, 10, and 11 to load.2. Is there a better way to do this?
The staging table: drop table staging;
create table staging (
rectype char(1) not null,
id number not null,
name varchar2(64) not null,
val varchar2(256))
/
My input file, inputdata.csv:4,1,"rn",""
4,2,"rn","nonnull"
4,3,"rd",""
4,4,"rd","nonnull"
4,5,"ac","NO"
4,6,"ac","YES"
4,7,"ie","nonnull"
4,8,"at","nonnull"
4,9,"ms","1.23"
4,10,"mb","1.56"
4,11,"ms",".99"
My control file, upload_inputdata.ctl:LOAD DATA
REPLACE
INTO TABLE staging
WHEN NAME = 'rn'
AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME = 'rd'
AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME = 'ac'
AND VAL <> 'NO'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME <> ''
AND NAME <> 'ie'
AND NAME <> 'at'
AND NAME <> 'rn'
AND NAME <> 'rd'
AND NAME <> 'ac'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
ID,
NAME,
VAL
)
The command to launch SQL*Loader:sqlldr user/pass@db DATA='inputdata.csv' CONTROL=upload_inputdata.ctl LOG=inputdata.log SILENT=HEADER ROWS=10000 DIRECT=true DISCARD=inputdata_discards.csv
My discards file, with comments:SQL*Loader: Release 10.2.0.4.0 - Production on Mon Mar 14 11:56:25 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Control File: upload_inputdata.ctl
Data File: inputdata.csv
Bad File: inputdata.bad
Discard File: inputdata_discards.csv
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation: none specified
Path used: Direct
Table STAGING, loaded when NAME = 0X726e(character 'rn')
and VAL != BLANKS
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE FIRST * , O(") CHARACTER
ID NEXT * , O(") CHARACTER
NAME NEXT * , O(") CHARACTER
VAL NEXT * , O(") CHARACTER
Table STAGING, loaded when NAME = 0X7264(character 'rd')
and VAL != BLANKS
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE NEXT * , O(") CHARACTER
ID NEXT * , O(") CHARACTER
NAME NEXT * , O(") CHARACTER
VAL NEXT * , O(") CHARACTER
Table STAGING, loaded when NAME = 0X6163(character 'ac')
and VAL != 0X4e4f(character 'NO')
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE NEXT * , O(") CHARACTER
ID NEXT * , O(") CHARACTER
NAME NEXT * , O(") CHARACTER
VAL NEXT * , O(") CHARACTER
Table STAGING, loaded when NAME != BLANKS
and NAME != 0X6965(character 'ie')
and NAME != 0X6174(character 'at')
and NAME != 0X726e(character 'rn')
and NAME != 0X7264(character 'rd')
and NAME != 0X6163(character 'ac')
Insert option in effect for this table: REPLACE
TRAILING NULLCOLS option in effect
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
RECTYPE NEXT * , O(") CHARACTER
ID NEXT * , O(") CHARACTER
NAME NEXT * , O(") CHARACTER
VAL NEXT * , O(") CHARACTER
Record 1: Discarded - failed all WHEN clauses. -- OK
Record 3: Discarded - failed all WHEN clauses. -- OK
Record 4: Discarded - failed all WHEN clauses. -- NOT OK; Second WHEN should apply
Record 5: Discarded - failed all WHEN clauses. -- OK
Record 6: Discarded - failed all WHEN clauses. -- NOT OK; Third WHEN should apply
Record 7: Discarded - failed all WHEN clauses. -- OK
Record 8: Discarded - failed all WHEN clauses. -- OK
Record 9: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 10: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 11: Discarded - failed all WHEN clauses. -- NOT OK; Last WHEN should apply
Record 12: Discarded - failed all WHEN clauses. -- OK; blank line in input
Record 13: Discarded - failed all WHEN clauses. -- OK; blank line in input
Table STAGING:
1 Row successfully loaded.
0 Rows not loaded due to data errors.
12 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table STAGING:
0 Rows successfully loaded.
0 Rows not loaded due to data errors.
13 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table STAGING:
0 Rows successfully loaded.
0 Rows not loaded due to data errors.
13 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Table STAGING:
0 Rows successfully loaded.
0 Rows not loaded due to data errors.
13 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576
Total logical records skipped: 0
Total logical records read: 13
Total logical records rejected: 0
Total logical records discarded: 12
Total stream buffers loaded by SQL*Loader main thread: 1
Total stream buffers loaded by SQL*Loader load thread: 0
Run began on Mon Mar 14 11:56:25 2011
Run ended on Mon Mar 14 11:56:26 2011
Elapsed time was: 00:00:01.00
CPU time was: 00:00:00.22
http://docs.oracle.com/cd/E11882_01/server.112/e16536/ldr_control_file.htm#sthref699 <quote>
The important point in this example is that the second empno field is found immediately after the first ename, although it is in a separate INTO TABLE clause. Field scanning does not start over from the beginning of the record for a new INTO TABLE clause. Instead, scanning continues where it left off.</quote>
In your case, only the FIRST into clause would see any data- the rest see all nulls.
drop table staging;
create table staging (
rectype char(1) not null,
id number not null,
name varchar2(64) not null,
val varchar2(256));
ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> !cat test.ctl
LOAD DATA
REPLACE
INTO TABLE staging
WHEN NAME = 'rn'
AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE,
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME = 'rd'
AND VAL <> ''
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME = 'ac'
AND VAL <> 'NO'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
ID,
NAME,
VAL
)
INTO TABLE staging
WHEN NAME <> ''
AND NAME <> 'ie'
AND NAME <> 'at'
AND NAME <> 'rn'
AND NAME <> 'rd'
AND NAME <> 'ac'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( RECTYPE position(1),
ID,
NAME,
VAL
)
ops$tkyte%ORA11GR2> !sqlldr / test data=test.dat
SQL*Loader: Release 11.2.0.2.0 - Production on Mon Mar 14 13:54:07 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Commit point reached - logical record count 11
ops$tkyte%ORA11GR2> select * from staging;
R ID NAME VAL
- ---------- ------------------------------ --------------------
4 2 rn nonnull
4 4 rd nonnull
4 6 ac YES
4 9 ms 1.23
4 10 mb 1.56
4 11 ms .99
6 rows selected.
/
Records 7 and 8 should not load according to you - you said "when it is not equal to ie and at"