Home>Question Details



Jan -- Thanks for the question regarding "SQL*Loader data from first record must be inserted in all rows", version 11.1.0

Submitted on 1-Sep-2009 9:12 Central time zone
Last updated 14-Sep-2009 14:05

You Asked

Hi.
Some data in the first record of my text file (a date), has to be inserted in all rows of the table.
How do I create a CTL file for this?
The first row of the text-file goes e.g. HEAD20091001. The date (2009-10-01) relates to all the rest of the records in the text-file, and must be inserted in all records of the table.Example: First record: HEAD20090901 Second Record: Name1 Adress1 Third Record Name2 Address3. Table: addrdata(asofdate date, name varchar2(200), addr varchar2(200)) The date from first record of text-file must be inserted into record1 and record 2 og table. An external table could maybe be a good idea?

and we said...

two external tables, one for record one and the other for records 2 on.


ops$tkyte%ORA10GR2> create or replace directory upload_directory as '/tmp'
  2  /

Directory created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> CREATE TABLE dept_et
  2  (
  3    "DEPTNO" NUMBER(2),
  4    "DNAME" VARCHAR2(14),
  5    "LOC" VARCHAR2(13)
  6  )
  7  ORGANIZATION external
  8  (
  9    TYPE oracle_loader
 10    DEFAULT DIRECTORY UPLOAD_DIRECTORY
 11    ACCESS PARAMETERS
 12    (
 13      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
 14      READSIZE 1048576
 15      SKIP 1
 16      FIELDS TERMINATED BY "|" LDRTRIM
 17      REJECT ROWS WITH ALL NULL FIELDS
 18      (
 19        "DEPTNO" CHAR(255)
 20          TERMINATED BY "|",
 21        "DNAME" CHAR(255)
 22          TERMINATED BY "|",
 23        "LOC" CHAR(255)
 24          TERMINATED BY "|"
 25      )
 26    )
 27    location
 28    (
 29      'dept.dat'
 30    )
 31  )REJECT LIMIT UNLIMITED
 32  /

Table created.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> CREATE TABLE DT_ET
  2  (
  3    "X" varchar2(8)
  4  )
  5  ORGANIZATION external
  6  (
  7    TYPE oracle_loader
  8    DEFAULT DIRECTORY UPLOAD_DIRECTORY
  9    ACCESS PARAMETERS
 10    (
 11      RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
 12      FIELDS LDRTRIM
 13      REJECT ROWS WITH ALL NULL FIELDS
 14      (
 15        "X" (5:12) CHAR(8)
 16      )
 17    )
 18    location
 19    (
 20      'dept.dat'
 21    )
 22  )REJECT LIMIT UNLIMITED
 23  /

Table created.

ops$tkyte%ORA10GR2> !cat /tmp/dept.dat
HEAD20090901
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON

ops$tkyte%ORA10GR2> select * from dept_et;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

ops$tkyte%ORA10GR2> select * from dt_et where rownum = 1;

X
--------
20090901

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> with dt as (select * from dt_et where rownum=1)
  2  select * from dept_et, dt;

    DEPTNO DNAME          LOC           X
---------- -------------- ------------- --------
        10 ACCOUNTING     NEW YORK      20090901
        20 RESEARCH       DALLAS        20090901
        30 SALES          CHICAGO       20090901
        40 OPERATIONS     BOSTON        20090901


Reviews    
4 stars Review   September 2, 2009 - 3pm Central time zone
Reviewer: Jan Ahlberg from Denmark
I was hoping for a more "elegant" solution, but I trust (now), that this is the best solution that 
I can find. Thank you very much.


Followup   September 3, 2009 - 7am Central time zone:

it is funny how elegance is in the eye of the beholder since I thought that anything that involves external tables is immediately and definitely more elegant than sqlldr.

Are there tricks to do this with sqlldr? Sure, but it is a rather dumb loading tool - we could trick it out with various options and using a trigger (ugh) - it would be less than elegant :)
5 stars Eye of the beholder   September 3, 2009 - 12pm Central time zone
Reviewer: Tom Barry from San Francisco, CA
Well, I thought it was elegant.


4 stars elegant?   September 4, 2009 - 12pm Central time zone
Reviewer: Jiri from San Francisco, CA
In general rejecting records is really bad approach. In my opinion if you use external tables, you 
should always try to use REJECT LIMIT 0 and use LOAD WHEN clause instead (of course this is not 
always possible, but it should be your first pick).


in general (maybe not applicable here due different structure of header/body), you can use INSERT 
INTO ALL clause and insert the record to multiple tables.


Followup   September 4, 2009 - 3pm Central time zone:

... In general rejecting records is really bad approach. In my opinion if you use ...


it depends - it depends on the use case. dml error logging, sqlldr and external tables all offer the option - so, it depends.
4 stars I disagree   September 4, 2009 - 7pm Central time zone
Reviewer: Jiri from San Francisco, CA
I really disagree with you, your example was external table (lets stay with that) with REJECT 
UNLIMITED, it is very bad practice to use it. Not only you can miss real errors, but also it can be 
much slower than using LOAD WHEN

yes, there are situations where you cannot use LOAD WHEN, but you should always try to find better 
solution than REJECT UNLIMITED 


Followup   September 6, 2009 - 9am Central time zone:

why? give a reason - it is not like you cannot figure out "some records were rejected" (a common approach is yet another external table to read the bad file - one with a series of varchar2(4000) fields that can expose the entire record - just field1, field2, fielde3)

Just like sqlldr, rejects are "the default"

so, if you have a reason other than "I don't like it", let's hear it?

It is very common, the norm even, during a load to permit failure - dml error logging for example was introduced in 10g years ago to allow just such a thing.

and I think people reading an example - an approach - can make semi-intelligent uses of supplied demo's. If errors are a "no no" for them - they would, well, modify the code.
3 stars Not really   September 9, 2009 - 3am Central time zone
Reviewer: Jiri from San Francisco ca
Rejecting possibly millions of records and creating possibly huge bad file is not what like or 
don't like. It just because you don't want to load the records is bad practice I general. Error 
logging should be used in my opinion only for real errors and not half baked code.

1. Two external tables with LOAD WHEN clause work better

or

2. You can parse the first row in unix and create tiny one row file (this can be part of the file 
move)

or (and don't like this approach)

3. You can keep the same ddl for first record and separated it into 3 records using substr in SQL 
(still faster than generate possibly large bad file)




Followup   September 9, 2009 - 6am Central time zone:

... Rejecting possibly millions of records and creating possibly huge bad file is
not what like or don't like. ...

that is your opinion, that is fine, I'm ok with you thinking that.


1) load when will *not* filter bad records out, you are still back where you started, worse even - because if you use the load when to try to filter out bad records - you don't know they were filtered until you read and parse the log file.



I don't know where you are going with this. In my example, i use rownum to get just the first record and skip in the second one to skip the first record. I don't need (or want) a load when, using load when would be totally erroneous and not useful here.
5 stars Do I understand correctly   September 9, 2009 - 9am Central time zone
Reviewer: A reader 
Tom, Can you please confirm if my understanding is correct here. You use the rownum = 1, which actually only on select from the external table tries to load the data. i.e The data is only loaded into an external table when you attempt to select from the table itself? Therefore the comment above, about how a bad file will be generated is not true. If you were to select everything from the external table, then yes, there would be some errors, but because of the rownum clause, it will only attempt to load the first record from dept.dat?
Many thanks in advance, Pete

Followup   September 14, 2009 - 10am Central time zone:

... The data is only loaded into an external table when you attempt to select from the table itself? ...

that does not make sense, an external table already exists, it is not loaded "into", it is selected from

I am using rownum=1 to get the first record, correct.

2 stars shell script   September 11, 2009 - 5am Central time zone
Reviewer: Pasko from Hamburg, Germany
Hi,

i just tried the shell script solution suggested by Jiri.
So, just parse the Data and generate a new file which includes the first record in all rows and 
then load the generated file.

Shell-Script.
-------------
#:cat parse_test.sh

#!/bin/sh
read asofdate
while read line
do
 echo "${asofdate}|${line}"
done

Data File:
-----------
#:cat test_data.txt

HEAD20090901
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON

Command to parse the Data:
--------------------------
#: ./parse_test.sh  < test_data.txt > test_out.txt


Processed Data:
---------------
#: cat test_out.txt

HEAD20090901|10|ACCOUNTING|NEW YORK
HEAD20090901|20|RESEARCH|DALLAS
HEAD20090901|30|SALES|CHICAGO
HEAD20090901|40|OPERATIONS|BOSTON


I hope this helps ;)



4 stars Looks elegant to me   September 11, 2009 - 1pm Central time zone
Reviewer: Jared from Beaverton, OR USA
To the poster that thought the use of external tables inelegant:  Have you written any complicated 
sqlldr control files?

In a previous job I spent a great deal of time with sqlldr.

There is nothing elegant about sqlldr control files.

The use of external tables was excellent IMO.


3 stars Agreed   September 12, 2009 - 1pm Central time zone
Reviewer: Jiri from Sf
Totally agree that external tables are easier to use tha SQL loader. My issue is using reject as 
generic method for filtering.

LOAD WHEN is filter and simple skips records which do not meet criteria  (if for example load when 
depends of first field, other fields in the row are ignored). Using reject unlimited is not good, 
if I ignore bad file which you can suppress or cache of few thousand rows, you can face real 
issues. What if first row is corrupted? Because you use reject unlimited, external table will not 
return error.

If you can use LOAD WHEN (for example first field is field type and defines header/body/tail) then 
it's better solution.

I showed two other options (unix parsing - can be done using plsql as well) or reading first row as 
varchar2 (which I am not big fan of).

We all probably agree that we should never use plsql clause "when others then null" and rather 
catch each error directly, I really don't get why very generic "reject unlimited" is ok.



3 stars bad file   September 13, 2009 - 4am Central time zone
Reviewer: Jiri from Sf
"Tom, Can you please confirm if my understanding is correct here. You use the rownum = 1, which 
actually only on select from the external table tries to load the data. i.e The data is only loaded 
into an external table when you attempt to select from the table itself? Therefore the comment 
above, about how a bad file will be generated is not true."


To answer this, yes it will generate BAD file. That was my issue in the first place (both external 
tables in the example have REJECT UNLIMITED which in my opinion is bad practice). You can avoid bad 
file (if you want to read rownum=1) by lowering the cache for external table (you won't really 
avoid it but make it read less records) and by eliminating creating BAD file (please note you also 
need to deal with LOG file as it contains info about rejected rows - not content but info what rows 
were rejected and why). 


but again, the question in the post said that the first row says HEAD20091001 (which I assume HEAD 
defines header row), if that is true using LOAD WHEN with combination of ROWNUM=1 will not generate 
any bad records and will be very clean)


3 stars sample   September 13, 2009 - 5am Central time zone
Reviewer: Jiri from San Francisco, CA
This is how external table with LOAD WHEN can look like, it does not generate any bad records and 
should be even faster than using reject unlimited. I also changed the varchar to date (assuming it 
is always a date in static format).


hope this helps
jiri



CREATE TABLE DT_ET
(
  ROW_TYPE  VARCHAR2(4 BYTE),
  X         DATE
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY UPLOAD_DIRECTORY
     ACCESS PARAMETERS 
       ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
         BADFILE BAD_DIRECTORY:'dept.bad'
         LOGFILE LOG_DIRECTORY:'dept.log'
         NODISCARDFILE
         SKIP 0
         LOAD WHEN ( ROW_TYPE = 'HEAD')
         FIELDS LDRTRIM
         REJECT ROWS WITH ALL NULL FIELDS
         (
           ROW_TYPE(1:4)   CHAR(4),
           X       (5:12)  DATE 'YYYYMMDD'
         )
       )
     LOCATION (UPLOAD_DIRECTORY:'dept.dat')
  )
REJECT LIMIT 0;



3 stars just to clarify   September 13, 2009 - 5am Central time zone
Reviewer: Jiri from San Francisco, CA
just to clarify sample above, you would read it with ROWNUM=1 to guarantee maximum speed and 
possible eliminate rare cases where body of the file includes word HEAD at the beginning of the row 
(not sure if it is possible in this case). REJECT LIMIT 0 and DATE conversion also stop the load in 
case of date format issue in the header (corrupted header, ...)

select * from dt_et where rownum = 1;


3 stars Whats wrong with rejecting records?   September 13, 2009 - 9pm Central time zone
Reviewer: Galen Boyer from Boston
> We all probably agree that we should never use plsql clause "when
> others then null" and rather catch each error directly, I really don't
> get why very generic "reject unlimited" is ok.

What is wrong with rejecting records?  What is wrong is letting bad
records in, which is what "when others null" does.  Rejecting all bad
records is the same as not having any exception handling, and just
letting your designed constraints fire.


Followup   September 14, 2009 - 1pm Central time zone:

we invented things like dml error logging to mimic the sqlldr "bad" file which is the same as the external table bad file. It is pretty "standard" and is not anywhere near "when others then null"

Use it or not, it is up to you
3 stars Agreed   September 14, 2009 - 11am Central time zone
Reviewer: Jiri from San francisco ca
Galen, I thing we both say the same thing. I am strictly against blind reject unlimited that's why 
I responded to very initial answer and offered better solution with reject limit 0 using load when 
filter. My subject "why is Rejcting all reconds bad?" was a response to moderator of this question 
who created the solution with reject limit unlimited and did not get why I was so against it. 


3 stars reject unlimited is not bad   September 14, 2009 - 1pm Central time zone
Reviewer: Galen Boyer from Boston
Jiri,

I'm with Tom in this. I don't see how unlimited rejection of records
is bad.  Take a small UI case.  A user tries to something invalid,
REJECTION. The user tries something invalid, REJECTION. The user tries
something invalid, REJECTION.  The user tries something valid, ALLOW.
The user tries something invalid, REJECTION ... and so on...

Why is unlimited rejection not a bad thing in the UI case but a bad
thing in a batch case?  

Ask it another way, as long as the database does not allow invalid
records into the database, what is the issue?


Followup   September 14, 2009 - 2pm Central time zone:

the bottom line is

it is neither inherently evil, not inherently awesome.

It is a capability that (much like when others then null - triggers - autonomous_transactions) when used appropriately is the greatest thing ever, when used inappropriately a disaster waiting to happen.
3 stars I really dont agree   September 15, 2009 - 6pm Central time zone
Reviewer: jiri from San Francisco, CA
just to close the topic ... REJECT UNLIMITED is useful in some cases. This questing was not generic 
question but the person asked how to read HEADER row, blidndly rejecting body rows and creating 
extra files on unix is overhead you dont need, I clearly showed that LOAD WHEN filter does the same 
and will not need to create rejects. It not only does not create any extra files, it also makes the 
load more tight in case of file corruptions in first row.






Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement