  • Reading fields with embedded newline characters with external tables

Question and Answer

Connor McDonald

Thanks for the question, Maureen.

Asked: January 04, 2011 - 3:35 pm UTC

Last updated: August 19, 2019 - 6:56 pm UTC

Version: Oracle 11G

Viewed 10K+ times

You Asked

Hi Tom,
I have read through your threads regarding SQL*Loader, external tables, and reading fields containing newline characters where the newline is also the record delimiter. I have a csv file that contains variable-length fields delimited by "," and optionally enclosed by '"', and I have no control over the contents of this file.

If I read correctly, one of your responses includes this solution ...


organization external
(
  type ORACLE_LOADER
  default directory DIR_ALUMNI
  access parameters
  (
    records delimited by NEWLINE
    fields terminated by ',' optionally enclosed by '"' LDRTRIM
    (
      ACTC_CODE_SA_NEW       char(500) TERMINATED BY ','  ENCLOSED BY '"',
      ACTC_CODE_VA_NEW       char(500) TERMINATED BY ','  ENCLOSED BY '"'
    )
  )
)


However, this does not work (the record is rejected because the loader finds a newline character before the closing '"'). And the Oracle documentation states that SQL*Loader will reject the record if it encounters a newline before the closing enclosure character. Did I miss something in your response? Is there a way to have an external table read a field with an embedded newline?

Thank you for your help and Happy New Year!

and Tom said...

In order to load with embedded newlines - you will *need* to have control over the input file to a degree.

If you do not, you will not be using external tables nor sqlldr - they will not work.


There are four ways to do it:


o Load the data with some other character in the data that represents a newline (e.g., put the string \n in the text wherever a newline should appear) and use a SQL function to replace that text with a CHR(10) at load time (see the sketch after this list).

o Use the FIX attribute on the INFILE directive, and load a fixed-length flat file. In this case there is no record terminator; rather, the fact that each record is exactly as long as every other record is what determines where records begin and end.

o Use the VAR attribute on the INFILE directive, and load a variable-width file that uses a format such that the first few bytes of each line specify the length in bytes of the line to follow.

o Use the STR attribute on the INFILE directive to load a variable-width file with some sequence of characters that represents the end of line, as opposed to just the newline character representing this.
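
For the first approach, a minimal sketch (all object names here are hypothetical): the staging external table is loaded so that the COMMENTS field arrives containing the literal two-character text \n wherever a newline belongs, and the placeholder is converted back to a real newline when the data is copied into the target table. Note that Oracle SQL does not treat the backslash as an escape character, so '\n' in the REPLACE call really is just a backslash followed by the letter n.

-- hypothetical names: EXT_STAGE is the staging external table,
-- DEPT_TARGET is the real table being loaded
insert into dept_target (deptno, dname, loc, comments)
select deptno,
       dname,
       loc,
       replace(comments, '\n', chr(10))  -- turn the \n placeholder back into a newline
from   ext_stage;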



To demonstrate the STR one (by far the easiest to apply - you just need an end-of-record terminator; I generally use "|\n").

The create table would resemble:

CREATE TABLE "SYS_SQLLDR_X_EXT_DEPT"
(
  "DEPTNO"   NUMBER(2),
  "DNAME"    VARCHAR2(4000),
  "LOC"      VARCHAR2(4000),
  "COMMENTS" VARCHAR2(4000)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY 0x'7C0A' CHARACTERSET US7ASCII
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'demo.bad'
    LOGFILE 't.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "," LDRTRIM
    MISSING FIELD VALUES ARE NULL
    REJECT ROWS WITH ALL NULL FIELDS
    (
      "DEPTNO"   CHAR(255) TERMINATED BY ",",
      "DNAME"    CHAR(255) TERMINATED BY ",",
      "LOC"      CHAR(255) TERMINATED BY ",",
      "COMMENTS" CHAR(255) TERMINATED BY ","
    )
  )
  location
  (
    'demo.dat'
  )
) REJECT LIMIT UNLIMITED


and the data could look like this (0x'7C0A' in the access parameters above is simply the hex for '|' followed by a linefeed, i.e. the "|\n" terminator):

10,xxx,here,"this is a
long field with
newlines"|
20,yyy,there,"this is a
long field with
newlines"|




Comments

Similar issue for SQL*Loader and resolution for it

Prathamesh Joshi, August 16, 2019 - 2:23 pm UTC

https://docs.oracle.com/cd/B28359_01/server.111/b28319/ldr_concepts.htm#i1005800
https://gerardnico.com/sql_loader/carriage_return_in_text_field
These links helped me handle huge, dirty data with newline characters in a delimited export from Sybase ASE. Just adding "str '^^^^\n'" after the INFILE location let me load the streamed data smoothly (^^^^ is the row delimiter for my BCP out file).
It avoids any preprocessing of the file to quote fields, which can take a huge amount of time for files that are GBs in size.
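
For reference, a minimal sketch of the control-file fragment being described (table, file, and column names here are hypothetical; the key part is the "str" clause after the INFILE name):

LOAD DATA
INFILE 'sybase_export.dat' "str '^^^^\n'"
INTO TABLE staging_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( col1
, col2
, comments CHAR(4000)
)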
Connor McDonald
August 19, 2019 - 6:56 pm UTC

Nice stuff. Thanks