It works
Gaetano Galletta, July 08, 2002 - 10:54 am UTC
Thanx a lot, I've tried and now I succed in loading data correctly.
I've got a Win XP environment so I simply "substitute" your unix-style sintax with windows style.
Thanx again
Gaetano
Loading Mutiple Languages from the same datafile using SQL*Loader
Sami, January 02, 2004 - 3:19 pm UTC
Dear Tom,
Happy New Year!
I have to load both French and Japanese characters from datafile into tables using SQL*loader.
DB character set is UTF8 (version 8.1.7.3)
Client : Unix (Sun Solaris)
If I set NLS_LANG=AMERICAN_AMERICA.UTF8 then French characters are NOT getting loaded properly(garbled-> non-readable format) but Japanese characters are getting loaded without any problem.
If I set NLS_LANG=French then French characters are getting loaded properly( no garbled issue) but Japanese characters are NOT getting loaded(garbled).
French Data
===========
Laforêt
François
modèle
démineur
Japanase Data
===============
çµãè¾¼ã¿å±¥æ´
çµãè¾¼ã¿ã®è¨å®
I copied sample FRENCH & JAPANESE characters above.
Even here in BROWSER,
(A)if the BROWSER Encoding (please select View->Encoding) is set to Unicode(UTF-8)then Japanese Data (characters) are in READABLE format.
(B)if the BROWSER Encoding (please select View->Encoding) is set to Wester European(ISO),then French Data (characters) are in READABLE format.
Questions:
==========
1)Why french characters are not getting loaded properly when I set NLS_LANG=AMERICAN_AMEICA.UTF8
2)How do I load mutiple langauges content from the same datafile into db table.
Kindly through some light on this?
Thanks for your time.
January 02, 2004 - 4:24 pm UTC
1) are the input files in fact UTF8 encoded when the data is french?
2) make sure they are all encoded the same way?
Diff between "Unicode" and "UTF-8" encoding
Sami, January 03, 2004 - 9:41 am UTC
Dear Tom,
<asktom>
"make sure they are all encoded the same way"
</asktom>
You are correct. They were not encoded in same way, that was the problem. Thanks a lot. It is working now.
Sorry for asking bit off-topic question.If possible please...
We are getting input data (text) file from user( they use NOTEPAD/Textpad). In NOTEPAD there are two encoding stuff (UTF-8 and Unicode).
1)What is the difference between "Unicode" and "UTF-8" encoding?
2)In Unix Vi editor, how to identify the encoding type?
For example, I want to store abc.txt in UTF-8 encoding using VI. How do I do that?
January 03, 2004 - 10:09 am UTC
1) beyond the scope of "ask the database guy" :)
google it, there is lots of information out there. They are just different encoding schemes, like "ascii" and "ebcidic"
2) the files are "flagged", i don't know all of the specifics (something to research on the web ;)
for example:
</code>
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96652/ch06.htm#1010644 <code>
shows there may be byte order markers in there (BOM's) and tools that are "utf/unicode" aware respect that.
Sorry -- sort of out of my domain knowledge area, this is when I would go start asking questions myself :)
Loading French & Diff between "Unicode" and "UTF-8" encoding
Sergiusz, January 06, 2004 - 6:43 am UTC
As this is my area of expertise, let me comment :-)
To load French encoded in WE8ISO8859P1 (most probably) and Japanese encoded in UTF8 (most probably), put them into separate files and use the control files's CHARACTERSET clause to specify the character set for each file separately. Oracle8i/9i Utilities Guide has more info on the clause.
If the text in mulitple encodings is mixed in one file, then you cannot load it with SQL*Loader in one run.
Unicode is a character set, i.e. a set of characters that are numbered from 0 to 2**31-1 (not all numbers are already assigned, actually most are not assigned but reserved). A Unicode encoding scheme is a way to store these numbers as bytes in computer memory and on disk (serialization). Unicode has a few encoding schemes: UTF-7, UTF-8, UTF-16, UTF-16LE, UTF-16BE, UTF-32, UTF-32LE, UTF-32LE, CESU-8. More details on www.unicode.org and otn.oracle.com (under Globalization).
Best regards,
Sergiusz
Unicode vs. UTF-8
Sergiusz, January 06, 2004 - 6:58 am UTC
And specifically for standard Windows Notepad:
"Unicode" means UTF-16LE + Byte Order Mark, i.e. character codes are two bytes each, little-endian (Intel), plus (0xFF,0xFE) in the two first bytes of the file. These two bytes denote the little-endian byte ordering. The reverse sequence (0xFE, 0xFF) means the big-endian order.
SQL*Loader 9i can recognize the Byte Order Mark of an UTF-16 files and interpret the file accordingly if the CHARACTERSET clause in the control file is set to UTF16.
"UTF-8" means UTF-8, i.e. character codes are one, two, three or four bytes each depending on the range of their original unicode code point (i.e. number in the character set).
In vi you cannot set the encoding. If your vi version (Unix) is properly globalized, it will use the encoding from the LANG environment variable (Unix locale). Otherwise it will work in ASCII.
Best regards,
Sergiusz
Thanks Sergiusz
Sami, January 07, 2004 - 10:28 pm UTC
I read your comments in Metalink as well. Really helpful.
A reader, August 24, 2017 - 2:34 pm UTC
I have a UTF-16(BOM) file to be imported into Oracle 12c ( Windows 64 bit) .
I tried setting NLS_LANG as below . The file did not import successfully .
set NLS_LANG=AMERICAN_AMERICA.UTF8
I tried setting it to ( NLS_LANG=AMERICAN_AMERICA.UTF16) . It looks like it is a invalid value .
SQL*Loader-128: unable to begin a session
ORA-12705: Cannot access NLS data files or invalid environment specified
What should be the NLS_LANG setting ?
Thanks