Skip to Main Content
  • Questions
  • SQL Loader- Multibyte character error

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Pratima.

Asked: October 05, 2016 - 11:13 am UTC

Last updated: May 23, 2022 - 3:39 am UTC

Version: Oracle Database 11.2.0.1.0

Viewed 10K+ times! This question is

You Asked

Hi Team,

When I am trying to load an UTF8 characterset datafile using sqlldr, it fails with Multibyte character error.

Steps to reproduce the issue:

1. Create table.

create table LoaderTest(
rec_no number,
rec_id_no nvarchar2(30),
column1 varchar2(2000 char),
column2 varchar2(255),
column3 varchar2(2000));

2. Control file(TEST.ctl)

LOAD DATA
CHARACTERSET UTF8
REPLACE
PRESERVE BLANKS
-- Record Type 61: Statement Line
INTO TABLE LOADERTEST
WHEN rec_id_no = ':61'
TRAILING NULLCOLS
(rec_no RECNUM,
rec_id_no POSITION(1:3) CHAR
TERMINATED BY WHITESPACE "ltrim(:rec_id_no, ':')",
column1 POSITION(5:10) CHAR,
column3 POSITION(11:110) CHAR)

-- Record Type 61: Supplimentary Detail (set rec_id_no to 9)
INTO TABLE ce_stmt_int_tmp
WHEN rec_id_no <> ':'
TRAILING NULLCOLS
(rec_no RECNUM,
rec_id_no POSITION(1:1) ,
column1 POSITION(1:2000) CHAR)

-- Record Type 86: Information to Account Owner
INTO TABLE ce_stmt_int_tmp
WHEN rec_id_no = ':86'
TRAILING NULLCOLS
(rec_no RECNUM,
rec_id_no POSITION(1:3) CHAR
TERMINATED BY WHITESPACE "ltrim(:rec_id_no, ':')",
column1 POSITION(5:2000) CHAR)

3. Datafile has below lines.

:86:/PT/FT/PY/Оплата по счету №731 от 26.08.2016 за аренду в сент
ябре 2016 гВКЛЮЧАЯ НДС 18% 15761.59/BN/ОАО Русские самоцветы 407
02810432000005783/AB/044030790 ПАО БАНК САНКТ-ПЕТЕРБУРГ/AB3/Г. СА
НКТ-ПЕТЕРБУРГ/AB4/40702810432000005783/BTC/12/ИНН - 7806007100

4. Execute below command

sqlldr userid=apps/apps@fb4yd213 control = TEST.ctl, DATA=datafile.dat

5. Log file shows:

value used for ROWS parameter changed from 64 to 61
Record 2: Rejected - Error on table LOADERTEST, column REC_ID_NO.
Multibyte character error.
Record 4: Rejected - Error on table LOADERTEST, column REC_ID_NO.
Multibyte character error.

I would appreciate if you could provide any suggestions on this.

Thanks in advance for your help.

Best Regards,
Pratima


and Connor said...

From MOS Note 435128.1

========================
SQL*Loader Example When Loading Data Into A Unicode Character Set (UTF8/AL32UTF8) Database (Doc ID 435128.1) To BottomTo Bottom

In this Document
Goal
Solution
APPLIES TO:

Oracle Database - Enterprise Edition - Version 8.1.7.0 and later
Information in this document applies to any platform.
***Checked for relevance on 30-Jul-2013***
GOAL

You are attempting to load data into a Unicode character set based database such as UTF8 or AL32UTF8.

You are now finding that a previously used delimiter is no longer working as expected.

SOLUTION

There is more than one potential source for this problem:

a. The character encoding of the flat/text file that is being used to load data into the Unicode database :

b. The CHARACTER SET specified (or left to default) in SQL*Loader control file does not match the character set encoding used for the flat/text file

c. The CHARACTER SET defined for the NLS_LANG is incorrect or left to a default (US7ASCII) that does not recognize the character codepoint used

d. the LOCALE (UNIX/LINUX) of CHCP (MSWIN) does not handle the character codepoint used

e. the Character codepoint used as the DELIMITER is *not unique* as some of the current data in the flat/text file matches the delimiter

For our example, we will state that the cedilla (ç) was used as a field delimiter and the database character set had recently been changed to UTF8.

Problem scenario

a. unset NLS_LANG. This defaults to AMERICAN_AMERICA.US7ASCII and there is no definition for the cedilla, or, set NLS_LANG=AMERICAN_AMERICA.UTF8

Comment: LANGUAGE_TERRITORY is not relevant for this issue, it is the character set definition that impacts the results seen.

Here is a sample of the SQL*Loader control file:

LOAD DATA
INFILE *
APPEND
INTO TABLE EMP1
FIELDS TERMINATED BY "ç" OPTIONALLY ENCLOSED BY '"'
(
empno,
ename, job, mgr,
hiredate DATE(20) "DD-Month-YYYY",
sal,
comm,
deptno
)
BEGINDATA
7782ç "Clark"ç "Manager"ç 7839ç 09-June-1981ç 2572.50çç 10ç
7839ç "King"ç "President"ç ç 17-November-1981ç 5500.00çç 10ç
7658ç "Chan"ç "Analyst"ç 7566ç 03-May-1982ç 3450çç 20ç

Here is the test:

With 9i

D:\oracle92\ora92\BIN> SET NLS_LANG=AMERICAN_AMERICA.UTF8
D:\oracle92\ora92\BIN> sqlldr control=ulcaseç.ctl

Username:scott/tiger

SQL*Loader: Release 9.2.0.1.0 - Production on Wed Jun 6 11:45:08 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

SQL*Loader-625: Multibyte character error in control file.

With 10gR2

D:\oracle\product\10.2.0\db_10gR2\BIN> set NLS_LANG=AMERICAN_AMERICA.UTF8
D:\oracle\product\10.2.0\db_10gR2\BIN> sqlldr control=ulcaseç.ctl
Username:scott
Password:

SQL*Loader: Release 10.2.0.1.0 - Production on Wed Jun 6 11:37:17 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.

SQL*Loader-625: Multibyte character error in control file.

Changing the NLS_LANG to match the encoding of the input file and allows us to "see" and use the cedilla which was originally stored using single-byte character set rules

Correct setting

For Windows

set NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

For Linux/Unix

export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

D:\oracle\product\10.2.0\db_10gR2\BIN> sqlldr control=ulcaseç.ctl
Username:scott/tiger

SQL*Loader: Release 10.2.0.1.0 - Production on Wed Jun 6 11:13:03 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.
Commit point reached - logical record count 3

SQL> select * from emp1;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
-------- ------- --------- ------- --------- ------- ---- ------
7782 Clark Manager 7839 09-JUN-81 2572.5 10
7839 King President 17-NOV-81 5500 10
7658 Chan Analyst 7566 03-MAY-82 3450 20

Discussion

The NLS_LANG, character set for the SQL*Loader control file and encoding for the flat/text file do not have to match the database character set. This means you can use several different character set sources to load data into a UNICODE database. For example, you could load Greek data using a Greek based single-byte character set such as EL8ISO8859P7 (ISO 8859-7), Cyrillic CL8MSWIN1251 (ANSI 1251) and AR8ISO8859P6 (ISO 8859-6) to load data into a UTF8 / AL32UTF8 database. The requirement is that you use the proper file encoding, set the OS and NLS_LANG to proper source rules.

The encoding of the SOURCE data is not always known when the DBA received a FLAT / text file: however, this *critical* to be able to load data using SQL*Loader if the data is not US7ASCII based

The operating system requires the ability to recognize the Rules and must be configured to handle the data encoding used in the flat/text file. This means the LOCALE, CHCP at the OS should be set properly and the NLS_LANG variable should also use a character set that understand all the data being loaded.

You do *not* have to use the EXACT character set rules used for the string data in the flat/text file as the rules can be for a SUPERSET.

e.g. WE8MSWIN1252 is strict physical superset of WE8ISO8859p1
========================

Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, October 06, 2016 - 4:49 am UTC

Hi Connor,

First of all thanks a lot for your response.

I checked the mentioned metalink note and your comments.

My database has NLS_CHARACTERSET = AL32UTF8 and NLS_LANGUAGE = AMERICA.

Another thing to notice is that datafile has Cyrillic text and line is getting rejected only when first character of the line is special(encoded) character(check line 2 and 4 of datafile). If I place same line after ":86:" or add some alphanumeric character before special character, lines get loaded successfully.

Is there any proper explanation as to why sqlldr fails only when first character is special character and how to fix this?

Thanks & Regards,
Pratima


how to load even "Multibyte character error" occured?

bai, May 18, 2022 - 11:37 am UTC

But there is another scenario:
The flat text file contains multibyte chars ( like Chinese), but due to source system problem, a Chinese filed is truncated and the last multibyte char is splited . For example multibyte char '中',utf8: X'E4B8AD', was truncated to X'E4B8'.

When I try to load this file, error occured:
Multibyte character error.

But the record is very import and I need to load this file, the target filed is varchar2, how to handle this?
Connor McDonald
May 23, 2022 - 3:39 am UTC

If the data is truncated, there is not much we can do - we can't "invent" data. This problem needs to be fixed at source.