Your file format must change - or you will not be using sqlldr to do this. We have no way to know when your line ends.
This is from "Expert Oracle Database Architecture" on loading with sqlldr:
<quote>
Load Data with Embedded NewlinesThis is something that has been problematic for SQLLDR historically ¿ how to load free form data that may include a newline in it. The newline character is the default `end of line¿ character to SQLLDR, and the ways around this did not offer much flexibility in the past. Fortunately, in Oracle 8.1.6 and later versions we have some new options. The options for loading data with embedded newlines are now as follows:
o Load the data with some other character in the data that represents a newline (for example, put the string \n in the text where a newline should appear) and use a SQL function to replace that text with a CHR(10) during load time.
o Use the FIX attribute on the INFILE directive, and load a fixed length flat file.
o Use the VAR attribute on the INFILE directive, and load a varying width file that uses a format such that the first few bytes of each line is the length of the line to follow.
o Use the STR attribute on the INFILE directive to load a varying width file with some sequence of characters that represent the end of line, as opposed to just the newline character representing this.
We will demonstrate each in turn.
Use a Character Other than a Newline
This is an easy method if you have control over how the input data is produced. If it is easy enough to convert the data when creating the data file, this will work fine. The idea is to apply a SQL function to the data on the way into the database, replacing some string of characters with a newline. Lets add another column to our DEPT table:
ops$tkyte@ORA10G> alter table dept add comments varchar2(4000);
Table altered.
We¿ll use this column to load text into. An example control file with inline data could be:
LOAD DATA
INFILE *
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS "replace(:comments,'\\n',chr(10))"
)
BEGINDATA
10,Sales,Virginia,This is the Sales\nOffice in Virginia
20,Accounting,Virginia,This is the Accounting\nOffice in Virginia
30,Consulting,Virginia,This is the Consulting\nOffice in Virginia
40,Finance,Virginia,This is the Finance\nOffice in Virginia
Notice how in the call to replace we had to use \\n not just \n. This is because \n is recognized by SQLLDR as a newline, and it would have converted it into a newline, not a two character string. When we execute SQLLDR with the above control file, the table DEPT is loaded with:
ops$tkyte@ORA10G> select deptno, dname, comments from dept;
DEPTNO DNAME COMMENTS
---------- -------------- -------------------------
10 SALES This is the Sales
Office in Virginia
20 ACCOUNTING This is the Accounting
Office in Virginia
30 CONSULTING This is the Consulting
Office in Virginia
40 FINANCE This is the Finance
Office in Virginia
Use the FIX Attribute
The FIX attribute is another method available to us. If you use this, the input data must appear in fixed length records. Each record will be exactly the same number of bytes as any other record in the input data set. When using positional data, this is especially valid. These files are typically fixed length input files to begin with. When using `free form¿ delimited data, it is less likely that you will have a fixed length file as these files are generally of varying length (this is the entire point of delimited files ¿ to make each line only as big as it needs to be).
When using the FIX attribute, we must use an INFILE clause, as this is an option to INFILE. Additionally, the data must be stored externally, not in the control file itself using this option. So, assuming we have fixed length input records, we can use a control file such as this:
LOAD DATA
INFILE demo.dat "fix 80"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)
This specifies an input data file that will have records that are 80 bytes each. This includes the trailing newline that may or may not be there. In this case, the newline is nothing special in the input data file. It is just another character to be loaded or not. This is the thing to understand ¿ the newline at the end of the record (if present) will become part of the record. In order to fully understand this, we need a utility to dump the contents of a file on screen so we can see what is really in there. Using Unix (or any Linux variant), this is pretty easy to do with od, a program to dump files to the screen in octal and other formats. We¿ll use the following demo.dat file:
[tkyte@desktop tkyte]$ od -c -w10 -v demo.dat
0000000 1 0 , S a l e s , V
0000012 i r g i n i a , T h
0000024 i s i s t h e
0000036 S a l e s \n O f f i
0000050 c e i n V i r g
0000062 i n i a
0000074
0000106
0000120 2 0 , A c c o u n t
0000132 i n g , V i r g i n
0000144 i a , T h i s i s
0000156 t h e A c c o u
0000170 n t i n g \n O f f i
0000202 c e i n V i r g
0000214 i n i a
0000226
0000240 3 0 , C o n s u l t
0000252 i n g , V i r g i n
0000264 i a , T h i s i s
0000276 t h e C o n s u
0000310 l t i n g \n O f f i
0000322 c e i n V i r g
0000334 i n i a
0000346
0000360 4 0 , F i n a n c e
0000372 , V i r g i n i a ,
0000404 T h i s i s t h
0000416 e F i n a n c e \n
0000430 O f f i c e i n
0000442 V i r g i n i a
0000454
0000466
0000500
[tkyte@desktop tkyte]$
Notice that in this input file, the newlines (\n) are not there to indicate where the end of the record for SQLLDR is ¿ rather they are just data to be loaded in this case. SQLLDR is using the fix width of 80 bytes to figure out how much data to read. In fact ¿ if you look at the input data, the records for SQLLDR are not even terminated by \n in this input file ¿ the character right before department 20¿s record is a space ¿ not a newline.
Now that we know each and every record is 80 bytes long, we are ready to load it using the control file we listed above with the FIX 80 clause. When we do so, we can see:
ops$tkyte@ORA10G> select '"' || comments || '"' comments from dept;
COMMENTS
-------------------------------------------------------------------------------
"This is the Sales
Office in Virginia "
"This is the Accounting
Office in Virginia "
"This is the Consulting
Office in Virginia "
"This is the Finance
Office in Virginia "
You might need to ¿trim¿ this data ¿ since the trailing whitespace is preserved. You can do that right in the control file using the TRIM built-in SQL function.
A word of caution for those of you lucky enough to work on both Windows and UNIX. The end of line marker is different on these platforms. On UNIX it is simply \n. On Windows NT is it \r\n. In general, if you use the FIX approach, make sure to create and load the file on a homogenous platform (UNIX and UNIX, or Windows and Windows).
Use the VAR Attribute
Another method of loading data with embedded newline characters is to use the VAR attribute. When using this format, each record will begin with some fixed number of bytes that represent the total length of the incoming record. Using this format, I can load varying length records that contain embedded newlines, but only if I have a record length field at the beginning of each and every record. So, if I use a control file such as:
LOAD DATA
INFILE demo.dat "var 3"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)
then the var 3 says that the first three bytes of each input record will be the length of that input record. If I take a data file such as:
[tkyte@desktop tkyte]$ cat demo.dat
05510,Sales,Virginia,This is the Sales
Office in Virginia
06520,Accounting,Virginia,This is the Accounting
Office in Virginia
06530,Consulting,Virginia,This is the Consulting
Office in Virginia
05940,Finance,Virginia,This is the Finance
Office in Virginia
[tkyte@desktop tkyte]$
I can load it using that control file. In my input data file, I have four rows of data. The first row starts with 055, meaning that the next 55 bytes represent the first input record. This 55 bytes includes the terminating newline after the word Virginia. The next row starts with 065. It has 65 bytes of text, and so on. Using this format data file, we can easily load our data with embedded newlines.
Again, if you are using UNIX and Windows (the above example was UNIX where a newline is one character long), you would have to adjust the length field for each record. On Windows the above .DAT file would have to have 56, 66, 66, and 60 as the length fields in this particular example.
Use the STR Attribute
This is perhaps the most flexible method of loading data with embedded newlines. Using the STR attribute, I can specify a new end of line character (or sequence of characters). This allows you to create an input data file that has some special character at the end of each line ¿ the newline is no longer `special¿.
I prefer to use a sequence of characters, typically some special `marker¿, and then a newline. This makes it easy to see the end of line character when viewing the input data in a text editor or some utility, as each record still has a newline at the end of it. The STR attribute is specified in hexadecimal, and perhaps the easiest way to get the exact hexadecimal string you need is to use SQL and UTL_RAW to produce the hexadecimal string for us. For example, assuming you are on UNIX where the end of line marker is CHR(10) (line feed) and your special marker character is a pipe |, we can write this:
ops$tkyte@ORA10G> select utl_raw.cast_to_raw( '|'||chr(10) ) from dual;
UTL_RAW.CAST_TO_RAW('|'||CHR(10))
-------------------------------------------------------------------------------
7C0A
which shows us that the STR we need to use on UNIX would be X'7C0A'.
NOTE: On Windows, you would use UTL_RAW.CAST_TO_RAW( `|¿||chr(13)||chr(10) )
To use this, we might have a control file like this:
LOAD DATA
INFILE demo.dat "str X'7C0A'"
INTO TABLE DEPT
REPLACE
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
(DEPTNO,
DNAME "upper(:dname)",
LOC "upper(:loc)",
COMMENTS
)
So, if your input data looks like this:
[tkyte@desktop tkyte]$ cat demo.dat
10,Sales,Virginia,This is the Sales
Office in Virginia|
20,Accounting,Virginia,This is the Accounting
Office in Virginia|
30,Consulting,Virginia,This is the Consulting
Office in Virginia|
40,Finance,Virginia,This is the Finance
Office in Virginia|
[tkyte@desktop tkyte]$
where each record in the data file ended with a |\n, the above control file will load it correctly.
Embedded Newlines Wrap-Up
So, we have explored at least four ways to load data with embedded newlines. In the very next section, we will use one of these, the STR attribute, in a generic unload utility to avoid issues with regards to newlines in text.
Additionally, one thing to be very aware of, and I¿ve mentioned it above a couple of times, is that on Windows (all flavors), text files may end in \r\n (ASCII 13 + ASCII 10, carriage return/line feed). Your control file will have to accommodate this ¿ that \r is part of the record. The byte counts in the FIX and VAR, and the string used with STR must accommodate this. For example, if you took any of the above .DAT files that currently contain just \n in them and ftp¿ed them to Windows using an ASCII transfer (the default), every \n would turn into \r\n. The same control file that just worked in UNIX would not be able to load the data anymore. This is something you must be aware of, and take into consideration when setting up the control file.
</quote>