Review
September 2, 2009 - 3pm Central time zone
Reviewer: Jan Ahlberg from Denmark
I was hoping for a more "elegant" solution, but I trust (now), that this is the best solution that
I can find. Thank you very much.
Followup September 3, 2009 - 7am Central time zone:
it is funny how elegance is in the eye of the beholder since I thought that anything that involves external tables is immediately and definitely more elegant than sqlldr.
Are there tricks to do this with sqlldr? Sure, but it is a rather dumb loading tool - we could trick it out with various options and using a trigger (ugh) - it would be less than elegant :)
Eye of the beholder
September 3, 2009 - 12pm Central time zone
Reviewer: Tom Barry from San Francisco, CA
Well, I thought it was elegant.
elegant?
September 4, 2009 - 12pm Central time zone
Reviewer: Jiri from San Francisco, CA
In general rejecting records is really bad approach. In my opinion if you use external tables, you
should always try to use REJECT LIMIT 0 and use LOAD WHEN clause instead (of course this is not
always possible, but it should be your first pick).
in general (maybe not applicable here due different structure of header/body), you can use INSERT
INTO ALL clause and insert the record to multiple tables.
Followup September 4, 2009 - 3pm Central time zone:
... In general rejecting records is really bad approach. In my opinion if you use ...
it depends - it depends on the use case. dml error logging, sqlldr and external tables all offer the option - so, it depends.
I disagree
September 4, 2009 - 7pm Central time zone
Reviewer: Jiri from San Francisco, CA
I really disagree with you, your example was external table (lets stay with that) with REJECT
UNLIMITED, it is very bad practice to use it. Not only you can miss real errors, but also it can be
much slower than using LOAD WHEN
yes, there are situations where you cannot use LOAD WHEN, but you should always try to find better
solution than REJECT UNLIMITED
Followup September 6, 2009 - 9am Central time zone:
why? give a reason - it is not like you cannot figure out "some records were rejected" (a common approach is yet another external table to read the bad file - one with a series of varchar2(4000) fields that can expose the entire record - just field1, field2, fielde3)
Just like sqlldr, rejects are "the default"
so, if you have a reason other than "I don't like it", let's hear it?
It is very common, the norm even, during a load to permit failure - dml error logging for example was introduced in 10g years ago to allow just such a thing.
and I think people reading an example - an approach - can make semi-intelligent uses of supplied demo's. If errors are a "no no" for them - they would, well, modify the code.
Not really
September 9, 2009 - 3am Central time zone
Reviewer: Jiri from San Francisco ca
Rejecting possibly millions of records and creating possibly huge bad file is not what like or
don't like. It just because you don't want to load the records is bad practice I general. Error
logging should be used in my opinion only for real errors and not half baked code.
1. Two external tables with LOAD WHEN clause work better
or
2. You can parse the first row in unix and create tiny one row file (this can be part of the file
move)
or (and don't like this approach)
3. You can keep the same ddl for first record and separated it into 3 records using substr in SQL
(still faster than generate possibly large bad file)
Followup September 9, 2009 - 6am Central time zone:
... Rejecting possibly millions of records and creating possibly huge bad file is
not what like or don't like. ...
that is your opinion, that is fine, I'm ok with you thinking that.
1) load when will *not* filter bad records out, you are still back where you started, worse even - because if you use the load when to try to filter out bad records - you don't know they were filtered until you read and parse the log file.
I don't know where you are going with this. In my example, i use rownum to get just the first record and skip in the second one to skip the first record. I don't need (or want) a load when, using load when would be totally erroneous and not useful here.
Do I understand correctly
September 9, 2009 - 9am Central time zone
Reviewer: A reader
Tom, Can you please confirm if my understanding is correct here. You use the rownum = 1, which actually only on select from the external table tries to load the data. i.e The data is only loaded into an external table when you attempt to select from the table itself? Therefore the comment above, about how a bad file will be generated is not true. If you were to select everything from the external table, then yes, there would be some errors, but because of the rownum clause, it will only attempt to load the first record from dept.dat?
Many thanks in advance, Pete
Followup September 14, 2009 - 10am Central time zone:
... The data is only loaded into an external table when you attempt to select from the table itself? ...
that does not make sense, an external table already exists, it is not loaded "into", it is selected from
I am using rownum=1 to get the first record, correct.
shell script
September 11, 2009 - 5am Central time zone
Reviewer: Pasko from Hamburg, Germany
Hi,
i just tried the shell script solution suggested by Jiri.
So, just parse the Data and generate a new file which includes the first record in all rows and
then load the generated file.
Shell-Script.
-------------
#:cat parse_test.sh
#!/bin/sh
read asofdate
while read line
do
echo "${asofdate}|${line}"
done
Data File:
-----------
#:cat test_data.txt
HEAD20090901
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON
Command to parse the Data:
--------------------------
#: ./parse_test.sh < test_data.txt > test_out.txt
Processed Data:
---------------
#: cat test_out.txt
HEAD20090901|10|ACCOUNTING|NEW YORK
HEAD20090901|20|RESEARCH|DALLAS
HEAD20090901|30|SALES|CHICAGO
HEAD20090901|40|OPERATIONS|BOSTON
I hope this helps ;)
Looks elegant to me
September 11, 2009 - 1pm Central time zone
Reviewer: Jared from Beaverton, OR USA
To the poster that thought the use of external tables inelegant: Have you written any complicated
sqlldr control files?
In a previous job I spent a great deal of time with sqlldr.
There is nothing elegant about sqlldr control files.
The use of external tables was excellent IMO.
Agreed
September 12, 2009 - 1pm Central time zone
Reviewer: Jiri from Sf
Totally agree that external tables are easier to use tha SQL loader. My issue is using reject as
generic method for filtering.
LOAD WHEN is filter and simple skips records which do not meet criteria (if for example load when
depends of first field, other fields in the row are ignored). Using reject unlimited is not good,
if I ignore bad file which you can suppress or cache of few thousand rows, you can face real
issues. What if first row is corrupted? Because you use reject unlimited, external table will not
return error.
If you can use LOAD WHEN (for example first field is field type and defines header/body/tail) then
it's better solution.
I showed two other options (unix parsing - can be done using plsql as well) or reading first row as
varchar2 (which I am not big fan of).
We all probably agree that we should never use plsql clause "when others then null" and rather
catch each error directly, I really don't get why very generic "reject unlimited" is ok.
bad file
September 13, 2009 - 4am Central time zone
Reviewer: Jiri from Sf
"Tom, Can you please confirm if my understanding is correct here. You use the rownum = 1, which
actually only on select from the external table tries to load the data. i.e The data is only loaded
into an external table when you attempt to select from the table itself? Therefore the comment
above, about how a bad file will be generated is not true."
To answer this, yes it will generate BAD file. That was my issue in the first place (both external
tables in the example have REJECT UNLIMITED which in my opinion is bad practice). You can avoid bad
file (if you want to read rownum=1) by lowering the cache for external table (you won't really
avoid it but make it read less records) and by eliminating creating BAD file (please note you also
need to deal with LOG file as it contains info about rejected rows - not content but info what rows
were rejected and why).
but again, the question in the post said that the first row says HEAD20091001 (which I assume HEAD
defines header row), if that is true using LOAD WHEN with combination of ROWNUM=1 will not generate
any bad records and will be very clean)
sample
September 13, 2009 - 5am Central time zone
Reviewer: Jiri from San Francisco, CA
This is how external table with LOAD WHEN can look like, it does not generate any bad records and
should be even faster than using reject unlimited. I also changed the varchar to date (assuming it
is always a date in static format).
hope this helps
jiri
CREATE TABLE DT_ET
(
ROW_TYPE VARCHAR2(4 BYTE),
X DATE
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY UPLOAD_DIRECTORY
ACCESS PARAMETERS
( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE BAD_DIRECTORY:'dept.bad'
LOGFILE LOG_DIRECTORY:'dept.log'
NODISCARDFILE
SKIP 0
LOAD WHEN ( ROW_TYPE = 'HEAD')
FIELDS LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
ROW_TYPE(1:4) CHAR(4),
X (5:12) DATE 'YYYYMMDD'
)
)
LOCATION (UPLOAD_DIRECTORY:'dept.dat')
)
REJECT LIMIT 0;
just to clarify
September 13, 2009 - 5am Central time zone
Reviewer: Jiri from San Francisco, CA
just to clarify sample above, you would read it with ROWNUM=1 to guarantee maximum speed and
possible eliminate rare cases where body of the file includes word HEAD at the beginning of the row
(not sure if it is possible in this case). REJECT LIMIT 0 and DATE conversion also stop the load in
case of date format issue in the header (corrupted header, ...)
select * from dt_et where rownum = 1;
Whats wrong with rejecting records?
September 13, 2009 - 9pm Central time zone
Reviewer: Galen Boyer from Boston
> We all probably agree that we should never use plsql clause "when
> others then null" and rather catch each error directly, I really don't
> get why very generic "reject unlimited" is ok.
What is wrong with rejecting records? What is wrong is letting bad
records in, which is what "when others null" does. Rejecting all bad
records is the same as not having any exception handling, and just
letting your designed constraints fire.
Followup September 14, 2009 - 1pm Central time zone:
we invented things like dml error logging to mimic the sqlldr "bad" file which is the same as the external table bad file. It is pretty "standard" and is not anywhere near "when others then null"
Use it or not, it is up to you
Agreed
September 14, 2009 - 11am Central time zone
Reviewer: Jiri from San francisco ca
Galen, I thing we both say the same thing. I am strictly against blind reject unlimited that's why
I responded to very initial answer and offered better solution with reject limit 0 using load when
filter. My subject "why is Rejcting all reconds bad?" was a response to moderator of this question
who created the solution with reject limit unlimited and did not get why I was so against it.
reject unlimited is not bad
September 14, 2009 - 1pm Central time zone
Reviewer: Galen Boyer from Boston
Jiri,
I'm with Tom in this. I don't see how unlimited rejection of records
is bad. Take a small UI case. A user tries to something invalid,
REJECTION. The user tries something invalid, REJECTION. The user tries
something invalid, REJECTION. The user tries something valid, ALLOW.
The user tries something invalid, REJECTION ... and so on...
Why is unlimited rejection not a bad thing in the UI case but a bad
thing in a batch case?
Ask it another way, as long as the database does not allow invalid
records into the database, what is the issue?
Followup September 14, 2009 - 2pm Central time zone:
the bottom line is
it is neither inherently evil, not inherently awesome.
It is a capability that (much like when others then null - triggers - autonomous_transactions) when used appropriately is the greatest thing ever, when used inappropriately a disaster waiting to happen.
I really dont agree
September 15, 2009 - 6pm Central time zone
Reviewer: jiri from San Francisco, CA
just to close the topic ... REJECT UNLIMITED is useful in some cases. This questing was not generic
question but the person asked how to read HEADER row, blidndly rejecting body rows and creating
extra files on unix is overhead you dont need, I clearly showed that LOAD WHEN filter does the same
and will not need to create rejects. It not only does not create any extra files, it also makes the
load more tight in case of file corruptions in first row.
|