Skip to Main Content
  • Questions
  • Loading data which contains Long datatype by using SQLLoad

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sharon.

Asked: January 17, 2001 - 2:55 pm UTC

Last updated: January 18, 2010 - 6:24 am UTC

Version: Oracle 8.1.5

Viewed 10K+ times! This question is

You Asked

I have an exported data file from Lotus Notes, one column contains comments information which user can enter as many as they want, when I look at export file, the comments column data runs mutiple lines since it contains Enter Key(user press enter key), so, how do I give correct control file in order to loading data by using SQLLoad. The table in Oracle contains one column as Long datatype to store comments. This column is last column.

If you need more info, please let me know!

Thank you very much in advance!

Sharon

and Tom said...

Loading data with embedded Newlines -- that is something that has been problematic for SQLLDR historically – how to load free form data that may include a newline in it. The newline character is the default “end of line” character to SQLLDR and the ways around this did not offer much flexibility in the past. Fortunately, in Oracle8i 8.1.6 and up, we have some new options. I'll list them out with some examples and tell you which work in which versions. I see you have version 8.1.5 so the "easiest" option (the STR option below) is not available.

The options for loading data with embedded newlines are now as follows:

o Load the data with some other character in the data that represents a newline (e.g. put the string \n in the text where a newline should appear) and use a SQL function to replace that text with a chr(10) during load time. This works for string data of 4000 bytes or less only! This works in all releases (although in 7.x, the limit is 2000 bytes)

o Use the FIX attribute on the infile directive and load a fixed length flat filed. This works in all releases.

o Use the VAR attribute on the infile directive and load a varying width file that uses a format such that the first few bytes of each line are the LENGTH of the line to follow. This also works in all releases.

o Use the STR attribute on the infile directive to load a varying width file with some sequence of characters that represent the end of line – as opposed to just the newline character representing that. This is new in Oracle8i release 8.1.6 (it appears to work in 8.1.5 but it is not documented nor supported and you get funny results in the log file about the number of records actually loaded and processed. In 8.1.6 -- it works as expected and is documented/supported)

We will demonstrate each in turn.

Use some character other then a newline:

This is an easy method if you have control over how the input data is produced. If it is easy enough to convert the data when creating the data file – this will work fine. The idea is to apply a SQL function to the data on the way into the database, replacing some string of characters with a newline. An example control file with inline data could be:

load data
infile *
into table T
TRUNCATE
fields terminated by ',' optionally enclosed by '"'
(
TEXT "replace(:text,'\\n',chr(10))"
)
begindata
How Now\nBrown Cow

Notice how in the call to replace we had to use “\\n” not just “\n”. This is because \n is recognized by SQLLDR as a newline and it would have converted it into a newline – not a 2 character string. In SQLLDR -- \\n is how to get the character string constant \n into the control file. When we execute SQLLDR with the above control file, the table T is loaded with:

ops$tkyte@DEV816> select * from t;

TEXT
----------
How Now
Brown Cow

Use the FIX attribute:

The FIX attribute is another method available to us. If you use this – the input data must appear in fixed length records. Each record will be exactly the same number of bytes as any other record in the input data set. When using POSITIONAL data – this is especially valid, those files are typically fixed length input files to begin with. When using “free form” data, this is less flexible as those files are generally varying length files to begin with.

When using the FIX attribute, you will code a control file similar to this:

load data
infile fix.dat "fix 25"
into table T
TRUNCATE
fields terminated by ',' optionally enclosed by '"'
(
TEXT
)

That specifies an input data file that will have records that are 25 bytes each. This includes the trailing newline that may or may not be there – in this case, the newline is nothing special in the input datafile. It is just another character to be loaded or not. So, if you take a datafile such as:

$ cat fix.dat
123456789012345678901234
how now
brown cow
this
is
another
line

Where each and every line is exactly 25 bytes long and load it using the above control file, you will get:

ops$tkyte@DEV816> select '"' || text || '"' text, dump(text,16) dumptext from t
2 /

TEXT DUMPTEXT
------------------------------ ----------------------------------------
"123456789012345678901234 Typ=1 Len=25:
" 31,32,33,34,35,36,37,38,39,30,31,32,33,3
4,35,36,37,38,39,30,31,32,33,34,a

"how now Typ=1 Len=25:
brown cow 68,6f,77,20,6e,6f,77,a,62,72,6f,77,6e,20
" ,63,6f,77,20,20,20,20,20,20,20,a

"this Typ=1 Len=25:
is 74,68,69,73,a,69,73,a,61,6e,6f,74,68,65,
another 72,a,6c,69,6e,65,20,20,20,20,a
line
“

Notice how each row loaded ends with a newline – that is because my input data always had a newline in the 25’th byte. If this is not desired, you should enclose your data in quotes (and account for that additional space on each line in the FIX attribute). In that fashion, the trailing newline will not be loaded as part of the input data.

Using the VAR attribute:

Another method of loading data with embedded newline characters is to use the VAR attribute. When using this format, each record will begin with some fixed number of bytes that represent the total length of the incoming record. Using this format – I can load varying length records that contain embedded newlines – but only if I have a record length field at the beginning of each and every record. So, if I use a control file such as:

load data
infile var.dat "var 6"
into table T
TRUNCATE
fields terminated by ',' optionally enclosed by '"'
(
TEXT
)

The “VAR 6” says that the first 6 bytes of each input record will be the length of that input record. If I take a datafile such as:

$ cat var.dat
000028"1234567890123456789012345"
000020"How now
brown cow"
000023"this
is
another
line"

I can load it using that control file. In my input datafile – I have 3 lines of data. The first line starts with 000028 – meaning that the next 28 bytes represent the first input record. As you can see I have 25 characters of text (the number 123…345 plus 2 characters for the quotes plus 1 character for the newline). The next line starts with 000020 – it has 20 bytes of text and so on. Using this format datafile, we can easily load our data with embedded newlines.

Using the STR attribute:

This is perhaps the most flexible method of loading data with embedded newlines. Using the STR attribute – I can specify a new end of line character (or sequence of characters). This allows you to create an input datafile that has some special character at the end of each line – the newline is no longer “special”.

I prefer to use a sequence of characters – typically some special “marker” and then a newline. This makes it easy to see the end of line character when viewing the input data in a text editor or some utility as each record still has a newline at the end of it. To use this, we might have a control file like:

load data
infile str.dat "str X'7c0a'"
into table T
TRUNCATE
fields terminated by ',' optionally enclosed by '"'
(
TEXT
)

The above usings the “STR X’7c0a’” to specify that each input record will end with a | (pipe character) and a newline. To construct that string – made of the hex characters X’7c0a’ – I used SQLPlus. For example to see that a PIPE is 7c, and a newline 0a, I simply:

ops$tkyte@DEV816> select to_char( ascii( '|' ), 'xx' ) from dual
2 /

TO_
---
7c

ops$tkyte@DEV816> select to_char( ascii( '
2 ' ), 'xx' ) from dual;

TO_
---
a

So, if your input data looks like:

$ cat str.dat
123456789012345678901234|
how now
brown cow|
this
is
another
line|

The above control file will load it correctly.


Rating

  (11 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

how to use STR infile directive with command-line 'DATA=' parameter ?

Paul Ryan, April 30, 2002 - 8:34 am UTC

Thanks for this very useful summary of the options available for using SQL*Loader when there are newlines in the data to be loaded.

Of course using the STR attribute on the infile directive is the best way - but, how do you use this technique if you are specifying the datafile name on the command line (and so there is no INFILE directive in the control file) ?

Tom Kyte
April 30, 2002 - 11:35 am UTC

data= on the command line will always override the infile= in the ctl file, so just

load data
infile to_be_named.later "str X'7c0a'"

into table T
TRUNCATE

and use data= on the command line

Thank you Tom

mohammad, February 08, 2005 - 6:03 pm UTC

I have a quick question: when using external table instead of sql loader ,where is the right place to put
"STR X’7c0a’" to load a long string containing \n?

Tom Kyte
February 09, 2005 - 2:40 am UTC

when I'm faced with "what is the equivalent" in an external table -- and I have the ctl file, i just:


[tkyte@localhost tkyte]$ sqlldr / t external_table=generate_only;

SQL*Loader: Release 10.1.0.3.0 - Production on Wed Feb 9 02:38:04 2005

Copyright (c) 1982, 2004, Oracle. All rights reserved.

and it tells me:


CREATE TABLE "SYS_SQLLDR_X_EXT_T"
(
"TEXT" VARCHAR2(2000)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY DATA_PUMP_DIR
ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0x'7C0A' CHARACTERSET WE8ISO8859P1
BADFILE 'DATA_PUMP_DIR':'str.bad'
LOGFILE 't.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"TEXT" CHAR(255)
TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
)
)
location
(
'str.dat'
)
)REJECT LIMIT UNLIMITED


Thanks Tom

Mohammad, February 17, 2005 - 6:55 pm UTC

Your answer was very helpful. Thanks again Tom

how to use STR infile directive with command-line 'DATA=' parameter ?

Floyd T. Wright (Tom), March 03, 2005 - 3:57 pm UTC

Took me some time to find this little nugget of info burrowed in this question under, 'how to use STR infile directive with command-line 'DATA=' parameter ?', but it provided the precise solution I was looking for; Thanks much! Now have script that calls sqlldr and my ctl files don't have to have path/filename hardcoded within the ctl itself yet I can use infile parameter "str X'7c0a'" needed to correctly load data from file. Worked fine with 9iR1 (Oracle9i Enterprise Edition Release 9.0.1.1.0 - 64bit Production).

MS Access file export

Backus, February 17, 2006 - 4:25 pm UTC

It took me a few days, but with this article and some putzing around, it finaly dawned on me that the fixed byte count that Access reports is 2 bytes short - the record terminating crlf. Thought I'd pass it on in case someone else was struggling with this.

Excellent!

Admirer, April 03, 2006 - 3:51 pm UTC

Tom,
that was very useful. I was searching this one for weeks now and you have solved and made it very easy.Thanks for that one.

I have one doubt though. What would control file look like, if the table contains 5 columns with embedded Newlines instead of one column as you had shown in the example?

Thanks once again for the great insight.


Still have a problem after trying your ideas

Irene Westervelt, May 12, 2006 - 12:47 pm UTC

Tom
Maybe I am overlooking something, I have tried everything still cannot get the data loaded:
desc psusys.PS_W3EB_TEXT
Name Null? Type
----------------------------------------- -------- ----------------------------
TEXT_ID NOT NULL VARCHAR2(20)
EFFDT NOT NULL DATE
BENEFIT_PROGRAM NOT NULL VARCHAR2(3)
PLAN_TYPE NOT NULL VARCHAR2(2)
EVENT_CLASS NOT NULL VARCHAR2(3)
W3EB_TEXT LONG

this is the flat file:
|CONFIRMDESCR |;19000101;| |;| |;|FSC|;|ices have been successfully submitted to the Benefits Department. <br>You will receive a confirmation statement within one week to confirm your family status change enrollment. <br>To return to the Benefits Enrollment page, click OK.|

this is the controlfile:
load data
INFILE irene1.del "str X'7c0a'"
into table psusys.PS_W3EB_TEXT
replace
fields terminated by ";" optionally enclosed by "|"
TRAILING NULLCOLS
(TEXT_ID ,EFFDT ,BENEFIT_PROGRAM ,PLAN_TYPE ,EVENT_CLASS ,W3EB_TEXT char)

I am not very familiar with SQLLoader, I tried your solutions, but maybe I need to do something more.

john dent, October 02, 2006 - 7:08 pm UTC

Tom,
Is there anything in current (10.2) or planned releases which makes this issue any easier to deal with - especially when dealing with a) variable-length records in a file where b) you can't use the STR method because you can't control or modify the file contents?

A beefed-up 'enclosed by' clause or something?

I'm using Apex, which has often been promoted by Oracle as the way to replace Microsoft (Access) apps - though it's massively more powerful than that - but if we can't easily eat MS-generated data (e.g. Outlook exports) there's a problem with this marketing message...

Many thanks





Tom Kyte
October 02, 2006 - 9:16 pm UTC

well, since apex has their "own loader", this would be a GREAT message to post in the apex forums!

they not only read that forum (the guys that write apex), they act on the input.

Feel free to start the message with "Tom Kyte told me to post this here...." and give your case - you might be surprised that they respond.

Thanks

john dent, October 03, 2006 - 12:04 am UTC

Thankyou Tom,

I've often been helped by the excellent support the Apex guys provide through the Apex forum, but posted my question here here because:

- the "own loader" in Apex to which I think you may be referring is part of the Builder app, and deals with files (e.g. application exports) which because of their structure do not raise the problem of CSVs with embedded newlines;

- the HTML standard File Browse component of Apex uploads a file into a blob, but the issue is what to do with it next: how to transform the blob into table data, which is where (in my provisional solution) the external table mechanism comes in...

- the issue with external tables / sqlldr seems to me to be Oracle-generic rather than anything to do with Apex (though I acknowledge that's where I pointed my guns).

Still, I take your point, and I'll post in the Apex forum with a cross-reference to this thread.

I'd welcome any further insights you may have on the issue.

Thanks again




Ayaz, January 11, 2010 - 3:30 am UTC

I want to Skip the Trailer(last line) from the record. How can I do that.
Also Can we use CONSTANT key word with POSITION.
Tom Kyte
January 18, 2010 - 6:24 am UTC

how do you skip the "last line" from a "single record"

not sure what you are trying to ask here.

STR Attribute

Mukesh, March 18, 2015 - 6:26 pm UTC

Tom,

You suggested using STR attribute for loading the data file which has a new line character in a field itself. However, it seems to me that you are suggesting to modify data file in such a way as to mark genuine new line character with a sequence of characters and leave rest of new line as is. How can I achieve that if that's true?

We get data file from external sources and it is not possible to ask for any changes. Example is below. How can I load this file using STR attribute?

Example data file
id|product|description|contact
1|pc|windows 7
Solid state drive
Long power cord|abc@hotmail.com
1|pc|windows 8
SATA drive
short power cord|xyz@hotmail.com

Appreciate your help!