Question and Answer

Tom Kyte

Thanks for the question, Pushparaj.

Asked: June 06, 2003 - 11:39 am UTC

Last updated: August 17, 2012 - 12:22 pm UTC

Version: 9.2.0

You Asked

Tom,

We have a new requirement to store Chinese characters
in the database, display them, and print them. This affects only one table
in the database; only that one table will store the Chinese characters.

Currently our NLS_CHARACTERSET is set to WE8ISO8859P1.
Chinese is a multibyte character set and only UTF8 can handle this.

My question is:
Is it possible to keep NLS_CHARACTERSET=WE8ISO8859P1 as it is now,
set only the NLS_NCHAR_CHARACTERSET to handle the multibyte data,
and store the Chinese characters in NCLOB or NVARCHAR2 columns? Currently we do not have any other NCHAR datatype columns in our database.
Or
do we need to create a separate instance with a UTF8 character set for
this purpose?

Following is current setup in our environment.

SQL> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                yyyy-mm-dd.hh24.mi.ss
NLS_DATE_LANGUAGE              AMERICAN
NLS_CHARACTERSET               WE8ISO8859P1
NLS_SORT                       BINARY
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_RDBMS_VERSION              9.2.0.3.0
NLS_SAVED_NCHAR_CS             WE8ISO8859P1


Please give us your suggestion.

Thanks
Pushparaj

and Tom said...

that's what the nchar stuff is for, yes. so you can store two in one.

have you read through:

http://docs.oracle.com/cd/B10501_01/nav/docindex.htm#index-GLO

it'll be useful to you.
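
For illustration, here is a minimal sketch of that "two in one" approach -- the database character set stays WE8ISO8859P1 while one table holds the Chinese data in an NVARCHAR2 column, stored in the national character set (the table and column names are made up; \4E2D\6587 are the Unicode codepoints of the word "中文"):

create table chinese_demo
( id       number primary key
, name_en  varchar2(80)     -- database character set (WE8ISO8859P1)
, name_cn  nvarchar2(80)    -- national character set (AL16UTF16)
);

insert into chinese_demo
values ( 1, 'example', unistr('\4E2D\6587') );

select id, asciistr(name_cn) from chinese_demo;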


Comments

can do it

Reader, June 06, 2003 - 1:12 pm UTC

You already have NLS_NCHAR_CHARACTERSET set to AL16UTF16, as can be seen from the parameters you displayed. You can now create a table with NCHAR or NVARCHAR2 columns in the same instance and will be able to store foreign characters, provided your client terminal is configured to enter Chinese characters.

Chinese character experience

Ya-Ping, June 06, 2003 - 2:48 pm UTC

When the database is created using the UTF8 character set, Chinese characters can be stored in and extracted from VARCHAR2 columns.

Can't load Chinese characters

Sanjay Jha, September 27, 2003 - 2:36 am UTC

Tom, 
I need your urgent help. 

SQL>  SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE '%CHARACTERSET';

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               UTF8
NLS_NCHAR_CHARACTERSET         UTF8

SQL> drop table test
  2  ;

Table dropped.

SQL> create table test (a varchar2(20), b varchar2(1000), c varchar2(10));

Table created.

SQL> select * from test;

no rows selected

I have an Excel spreadsheet with a few columns having Chinese characters mixed with English. I saved the file as a Unicode text file. When I load it, it loads garbage into the database.

I have taken a sample from that and tried to load it into this table.
Here is what it looks like in my data file to be loaded using SQL*Loader (when viewed in Word or an Internet browser):
Sanjay,CBV - 特许业务估值师,tch

(Wow! I am surprised here! When I type this, it looks like Chinese characters, but when I preview, it looks like what you are seeing there... what is that all about?)

I load them using the SQL*Loader:
LOAD DATA 
INFILE 'C:\test.txt' 
BADFILE 'C:\test.bad'
DISCARDFILE 'C:\test.dsc'

INTO TABLE "TEST"
INSERT

FIELDS TERMINATED BY ','

  (A
, 
   B
, 
   C
)
SQL*Loader: Release 8.1.7.0.0 - Production on Sat Sep 27 01:41:45 2003

(c) Copyright 2000 Oracle Corporation.  All rights reserved.

Control File:   C:\test.ctl
Data File:      C:\test.txt
  Bad File:     C:\test.bad
  Discard File: C:\test.dsc 
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 65536 bytes
Continuation:    none specified
Path used:      Conventional

Table "TEST", loaded from every logical record.
Insert option in effect for this table: INSERT

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
A                                   FIRST     *   ,       CHARACTER            
B                                    NEXT     *   ,       CHARACTER            
C                                    NEXT     *   ,       CHARACTER            

Record 2: Rejected - Error on table "TEST", column B.
Column not found before end of logical record (use TRAILING NULLCOLS)

Table "TEST":
  1 Row successfully loaded.
  1 Row not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                  49536 bytes(64 rows)
Space allocated for memory besides bind array:        0 bytes

Total logical records skipped:          0
Total logical records read:             2
Total logical records rejected:         1
Total logical records discarded:        0

Run began on Sat Sep 27 01:41:45 2003
Run ended on Sat Sep 27 01:41:48 2003

Elapsed time was:     00:00:02.36
CPU time was:         00:00:00.03    

Question: Why does it say that it rejected one row, when there was only one row in the file? Also, why is there some garbage before the first column value "Sanjay"?

SQL> col a format a15
SQL> col b format a30
SQL> select * from test;

A               B                              C
--------------- ------------------------------ ----------
ÿþS a n j a y    C B V   -   yr¸¿N¡R0O<P^     t c h 


SQL> 

Then I looked using the non-graphical SQL*Plus; I set NLS_LANG to .ZHT16BIG5:
C:\>SET NLS_LANG=.ZHT16BIG5

C:\>SQLPLUS CI_USER/CI_USER@CICA

SQL*Plus: Release 8.1.7.0.0 - Production on Sat Sep 27 01:48:16 2003

(c) Copyright 2000 Oracle Corporation.  All rights reserved.


Connected to:
Personal Oracle8i Release 8.1.7.0.0 - Production
With the Partitioning option
JServer Release 8.1.7.0.0 - Production

SQL> SELECT * FROM TEST;

A
----------------------------------------
B
--------------------------------------------------------------------------------

C
--------------------
íHíHS a n j a y
 C B V   -   yríHíH&#8594;NíHR0O<^
 t c h


SQL>


I have referred to almost all the documents and help available on the internet, Metalink, and your site as well, but no help.
The closest document for this on Metalink is 143783.1, but it did not help.
I must mention that using a Java program through the browser we can insert and retrieve the Chinese characters correctly in this database, but when I viewed the inserted data above, it shows garbage.
Where is the problem? Please help.

Regards
 

Tom Kyte
September 27, 2003 - 11:15 am UTC

it is all about fonts and what charactersets the clients have. that's what happened with the chinese characters here -- my browser would not have been able to render chinese at all. my database cannot store it (i use an 8 bit characterset).


your client probably had the wrong characterset -- you can specify it in the environment (registry if you use windows) or even in the sqlldr control file if you like (there is a section on loading unicode data in the server utilities guide).

java by default sets the characterset of the client to a unicode one.

so, check the NLS_LANG of your client.
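
For instance, the control-file route looks something like this (a sketch, not a drop-in fix: the CHARACTERSET value must name the actual encoding of the data file, and the UTF8 here assumes the file has first been converted to UTF-8 -- the "Unicode text" files Excel writes are UCS-2, which 8i sqlldr cannot read directly, as the OTN note quoted in the next review explains):

LOAD DATA
CHARACTERSET UTF8
INFILE 'c:\test.txt'
INTO TABLE "TEST"
FIELDS TERMINATED BY ','
TRAILING NULLCOLS
( a, b, c )

The CHARACTERSET clause describes the data file itself, independently of the NLS_LANG client setting.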

Tom, what is this OTN doc saying?

A reader, September 28, 2003 - 8:49 pm UTC

Tom,
While I am still struggling with the problem, I forgot to mention in my question that I am using Windows 2000 Professional. So I came across the OTN doc at:
http://otn.oracle.com/products/oracle8i/htdocs/faq_combined.htm#characterset10
************************************************************
How do I load multilingual data into an Oracle UTF-8 database 8.1.6, from Windows 2000 client?
Unicode has two forms: the fixed-width UCS-2 form (or its superset UTF-16) and the variable-width UTF-8 form. Windows 2000 supports UCS-2. Oracle supports UTF-8 in all components and UCS-2 in OCI, Pro*C, OLE/DB and ODBC.
You have three options: 

1. Prepare your data in UCS-2 under Windows 2000 and use a custom or third party converter to convert UCS-2 to UTF-8. Then you may use SQL*Plus or SQL*Loader to load the data, provided you set NLS_LANG in Registry to AMERICAN_AMERICA.UTF8. 

2. Prepare your data in UCS-2 under Windows 2000 and write your own loader using OCI, Pro*C or ODBC (>=8.1.5.5). See documentation. 

3. Use Oracle Forms, if you want to enter data manually. If you set NLS_LANG to UTF8, Oracle Forms will automatically convert UCS-2 (coming from Windows) to UTF-8 (sent to RDBMS). 

***********************************************************
My questions in this context are:
1. Is it true that on the W2K platform I will not be able to load Chinese characters in a straightforward manner using sqlldr?
2. Can you tell me of any UCS-2 to UTF-8 conversion tool that you know?
3. Could you help me in writing a loader program using OCI, Pro*C or ODBC (>=8.1.5.5)?
4. Also, I was reading
http://www.unicode.org/iuc/iuc18/papers/b1.ppt
where they mention new Unicode features in Oracle9i,
including the following:

Character semantics support for text column
Reliable Unicode datatype as NCHAR
UTF-16 support for Oracle Call Interface(OCI)
Complete Unicode support for ODBC/OLEDB /JDBC
Unicode and ISO14651 based multilingual sort
Unicode enabled Oracle utilities such as SQL*Loader
Unicode based locale builder for locale customization

My question is: is Oracle 8i SQL*Loader not Unicode enabled?

Thanks for your help, much appreciated.

Regards

Tom Kyte
September 29, 2003 - 7:15 am UTC

I never dealt with either Windows or Chinese characters myself.

I do know that sqlldr (written in OCI) is capable of loading chinese. It is all about how you set your client character set.

How to write the nls parameter correctly?

Teymur, August 25, 2004 - 8:28 am UTC

Dear Tom.

Where can I get a list of all NLS parameters? I mean, for example: american_america...

If I need another nls (for example: Azerbaijani), how can I learn its correct form?

Thanks in advance.

9iR2 and W2K with Japanese, Greek, Russian, etc.

Duke Ganote, November 05, 2004 - 1:17 pm UTC

Tom-- I so truly empathize with your Sep 28, 2003 reader struggling with W2K and Chinese characters and all that documentation on Unicode and encodings and charactersets -- arg! I was reading intermittently for weeks!

Here's what worked for me:  I've got Oracle 9iR2 (on W2K) which I set up as AL32UTF8 characterset (NCHAR is AL16UTF16).  9iR2 is wonderful because of the ASCIISTR and UNISTR functions, and because of external tables!  

I also have an Excel 2000 (9.0.7616 SP3) spreadsheet with multi-lingual phrases, including Greek, Russian, and Japanese, each in a separate column.  

I copied each language/column into a separate Excel workbook, then "saved as" Unicode text (I think this saved it with the Unicode UCS-2 encoding, which is almost AL16UTF16). Separate workbooks spared me having to deal with tab-delimited columns.

I opened each file with Notepad (version 5) and did a "save as" text encoded as UTF-8 (which is, I think, part of the AL32UTF8 characterset).

Then in Oracle, I created an external table that accessed the file, e.g. for my Greek phrases:

SQL> ed
Wrote file afiedt.buf

  1  create table gre ( phrs_txt varchar2(2000) )
  2  organization external ( type oracle_loader
  3  default directory language_terms
  4  access parameters ( fields terminated by '\t'
  5    optionally enclosed by '"' )
  6  location ( 'phrs_gre.txt' ) )
  7* reject limit unlimited
SQL> /

Table created.

SQL> select asciistr(phrs_txt) from gre where rownum = 1;

ASCIISTR(PHRS_TXT)
----------------------------------------------------------------------------------------------------------------------------------
\03A6\03A1\0395\0396\0391 \03A4\03A1\03A5\03A0\0391\039D\0399\039F\03A5 \03A4\03A1\039F\039A\0391\03A1 3 \0391\03A5\039B\03A9\039D
 \03A5\03A0\0395\03A1\0391\03A5\03A4\039F\039C\0391\03A4\0397

Similarly for my Japanese phrases:

SQL> select asciistr(phrs_txt) from jpn where rownum = 1

ASCIISTR(PHRS_TXT)
---------------------------------------------------------------------------------------------------------------------------
3\30D5\30EB\30FC\30C8\30FB\30C8\30ED\30AB\30FC\30EB\30FB\30C9\30EA\30EB\30D3\30C3\30C8\30FB\30AF\30A4\30C3\30AF\9023\7D50


Without ASCIISTR, the phrases look like garbage in SQL*Plus! With ASCIISTR, it's an easy hop to the Unicode website. There I can verify that 03A6 is -- hmm, well, my Greek's a wee might rusty -- a capital psi or phi, which matches the first character I see in the spreadsheet cell (see
http://www.unicode.org/charts/PDF/U0370.pdf
).

Likewise, it's easy to verify that 30D5 is the same first Katakana symbol I found in the Excel cell (see 
http://www.unicode.org/charts/PDF/U30A0.pdf
).  Regards!

 

Tom Kyte
November 05, 2004 - 5:39 pm UTC

thanks -- i really appreciate when others follow up with their experiences and solutions -- especially for things I cannot work with (like chinese characters!)

Russian character-set problem..

atul, November 29, 2004 - 2:04 am UTC

Hi TOM,

Our database has some values in Russian characters.
Now if I select those values I get output like this:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++
SQL> select id,line from vd.address where id = 15;

ID
----------
LINE
--------------------------------------------------------------------------------
     15
посква 121099
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Which is not correct.

I am using a 9i database with the following NLS values:

++++++++++++++++++++++++++++++

NLS_LANGUAGE  AMERICAN  
NLS_TERRITORY  AMERICA  
NLS_CURRENCY  $  
NLS_ISO_CURRENCY  AMERICA  
NLS_NUMERIC_CHARACTERS  .,  
NLS_CALENDAR  GREGORIAN  
NLS_DATE_FORMAT  DD-MON-RR  
NLS_DATE_LANGUAGE  AMERICAN  
NLS_CHARACTERSET  UTF8  
NLS_SORT  BINARY  

+++++++++++++++++++++++++++++++++++++++++++++

I also tried using session level settings:

++++++++++++++++++++++++++++++++++++++++++++++++
SQL > alter session set NLS_LANGUAGE= RUSSIAN; 
SQL > alter session set NLS_TERRITORY=CIS; 

+++++++++++++++++++++++++++++++++++++++++++++++++
But that doesn't work...

Could you help me out on this.

Thanks,
Atul

 

Tom Kyte
November 29, 2004 - 8:12 am UTC

set your NLS_LANG in your environment or registry to tell Oracle what you would like to see -- and make sure whatever display you are using is capable of doing that.

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/toc.htm

Display Setting

atul, December 06, 2004 - 5:44 am UTC

Hi,

What do you mean by

1)set your NLS_LANG in your environment or registry to tell Oracle what you would ..

Do you mean EXPORT NLS_LANG for that session in unix?

2)and make sure whatever display you are using is capable of doing that.

How do I check that? And if not, what do I install for Russian
characters?

Thanks,
Atul






Tom Kyte
December 06, 2004 - 12:06 pm UTC

1) yes.

2) i have no idea.
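
(For the unix case in point 1, a minimal sketch -- the character set value is only an example; it must match what your terminal can actually display:)

$ export NLS_LANG=AMERICAN_AMERICA.UTF8    # sh/ksh/bash
$ sqlplus user/password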

How to display other charsets

Franco., December 07, 2004 - 7:14 am UTC

"2)and make sure whatever display you are using is capable of doing that."

Win2000/XP (small differences): Control Panel --> Regional and Language Options --> (tab) Languages.

On XP, mark the two "Install files..." checkboxes related to other languages.

On Win2000, choose to install Russian fonts.

Hope this helps you.

Tom Kyte
December 07, 2004 - 10:28 am UTC

thanks!

Thanks

atul, December 08, 2004 - 7:35 am UTC

Hi Tom,

Could you tell me: to see Russian characters in Windows XP
and in Unix, do I need to install certain fonts?

If yes, what are they?

Thanks,
Atul



Tom Kyte
December 08, 2004 - 10:46 am UTC

see #2 above.

Sanjay Jha, December 08, 2004 - 11:15 am UTC

Not sure whether you want to see the Russian data in your SQL*Plus or want to validate it by seeing it on your screen. There were a couple of ways I did that.
The easiest way was to look at the data through OEM (Oracle 9i) and see the contents of the table; it will show you the data in its real font.
The other way was to use command-line SQL*Plus and set the appropriate NLS_LANG (I presume it must be the UTF8 character set, so set it as UTF8). Then spool the file and open it in the IE browser and you should be fine; if required you might have to set the encoding to Unicode (UTF8) in your IE View option.
One thing I must point out is that the Windows Unicode encoding scheme is not "true" Unicode but UCS-2 (when one saves a file in Excel to Unicode form etc.), and therefore all the font mapping gets scrambled. In order to convert the file into true Unicode from the "Windows unicode form" (which is UCS-2), you need to use some kind of third-party tool (I used recode, freely available at http://recode.progiciels-bpi.ca/ ).
Hope it helps you.

Thanks,

atul, December 09, 2004 - 12:31 am UTC

Hi,

Thanks,

But my requirement is as follows:

++++++++++++++++++++++++++++++++++++++++++++++++++
We need an implementation that can support all character sets at the same time. I cannot be expected to set the character set/language and do queries by country. I need to be able to run a query that returns a result set that has multiple character set/language combinations and can display them correctly.

++++++++++++++++++++++++++++++++


Thanks,
Atul

Tom Kyte
December 09, 2004 - 1:10 pm UTC

you are going to need to read the above referenced documentation and understand how it works, not how you would like to see it work.

Your client application most certainly can be expected to work in that fashion -- and it will. You cannot have multiple character set/language combinations -- think about it.

Validation of multi-lingual queries

Duke Ganote, December 09, 2004 - 10:48 am UTC

Atul-- I also cannot query using SQL*PLUS and get Russian characters. I also am uninterested in changing my NLS parameters unless there's a generic UTF parameter for display.

DPG92\dganote> select * from rus where rownum = 1;

PHRS_TXT
------------------------------------------
¿¿¿¿¿¿¿¿ ¿¿¿¿¿¿ ¿ 3 ¿¿¿¿¿¿¿¿¿ ¿¿¿¿ ¿¿¿¿¿¿¿

When I had to write up validation tests for my multi-lingual load (see my prior review), I documented two approaches:
1. Sampling-based, character-by-character (glyph-by-glyph) validation.
2. Excel-based validation (Excel in, Excel out).

For the sampling approach, I used ASCIISTR:

DPG92\dganote> select asciistr(phrs_txt) from rus where rownum = 1;

ASCIISTR(PHRS_TXT)
----------------------------------------------
\0422\0420\041E\0410\041A\0410\0420\041D \0421\0412\0415\0420\041B\041E \0421 3 \041A\0410\041D\0410\0412\041A\0410\041C\0418 \041B\0415\0413\041A \041D\0410\0421\0410\0414\041A\0410

Then I can go to the Unicode website and verify it. For example, the third-from-last glyph has the codepoint 0414.
This shows it (it matches my source Excel spreadsheet's glyph in the 3rd-from-last position):
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=0414
Clearly, you'd only want to sample the data; it'd be too painful for every character.

For the Excel-based approach, I used Oracle 9.2 ODBC drivers and pulled the data back into an Excel spreadsheet. Excel selected the appropriate font (which was always Arial, except for Japanese characters which were MS Mincho). It matches my original source Excel spreadsheet exactly (except for occasional enclosing double-quotes or beginning/trailing spaces in the original).

BTW, I didn't use a third-party conversion tool for the UCS-2 (subset of UTF-16) to UTF-8 transformation when converting from the original Excel source (Sanjay Jha suggested a free tool for this); I used MS WordPad's "SAVE AS" feature.

Multilingual Support

Subodh Deshpande, March 24, 2005 - 9:07 am UTC

Hi Tom,

The scenario is that we are developing an application on 10g, 10gAS, and Oracle Forms and Reports using 10gDS.

The default character set is AL16UTF16.

By multilingual, our client means that an end user should be able to enter data in a selected language (say Italian);
a French person will enter the data in French.

The data entered by a French person should be visible to an Italian person in Italian and vice versa.

First of all, is this possible in 10g, or is there any such feature in 10g?

thanx in advance..take care..subodh

Tom Kyte
March 24, 2005 - 3:41 pm UTC

have you read the globalization guide that talks about NLS and such?

as long as the nls character set is set properly for each end user, yes.

1401 when writing to utf-8 db

John, May 06, 2005 - 1:38 pm UTC

Hi Tom,

Perhaps off topic a bit, but I get an ORA-01401 when inserting data in Pro*C.

If I create a table like
create table abcd
(
id NUMBER(10)
,data VARCHAR2(10 CHAR)
)

and try to insert a UTF-8 string of <= 10 octets, it works excellently. And this works in TOAD:

insert into abcd
(
id
,data
) VALUES (
1
,unistr('\0422\0420\041E\0410\041A\0410\0420\041D')
)

1)
However, when I have a UTF-8 string with 10 characters and >10 octets, I get an ORA-01401.

I precompile with multibyte on in Pro*C, and at the beginning of the program, before anything else, I have a
putenv("NLS_LANG=AMERICAN_AMERICA.UTF8");
2) Is this stupid? Is it important to set it outside the process?

3)
In the "Pro*C/C++ Precompiler Programmer's Guide" a uvarchar datatype and a ulong_varchar datatype are mentioned. Why is it this way? Is it because wchar_t is ushort on some platforms and long on others (Solaris?)?

4)
Is this uvarchar struct the preferred way to do this? Do I just wcscpy the value to it and live happily ever after? If so, len = wcslen(...)?

5)
Do I have to convert UTF-8 to wchar_t and put the result in the correct (ifdef'ed as to whether it should be long or not) struct?

Thanks for any suggestion/hint of how to do this properly.

Best regards,
John

Oracle: 9.2.0.1.0
Db charset: utf-8


Tom Kyte
May 06, 2005 - 2:14 pm UTC

got code? really tiny snippet -- as small, yet complete, as possible.

My/Pro*C's fault.

A reader, May 06, 2005 - 5:26 pm UTC

I had a struggle with something in Pro*C that I don't really agree with. I turn unicode builds on and off with a flag and have to ifdef lengths of a character, etc.

I had
VARCHAR a_data [LOCAL_SQL_ARRAY_SIZE][(199 + 1) * MAX_UTF8_LEN];

...and Pro*C substitutes MAX_UTF8_LEN with 1, it seems, so the arrays were too short and got corrupted. This also happened if it was defined in a header included with EXEC SQL INCLUDE.

Now it works like a charm!

Have a nice weekend!
John


Chinese in WE8ISO8859P1 table

SDiZ, May 07, 2005 - 7:30 am UTC

I know lots of people here in Hong Kong just use a WE8ISO8859P1 database to store Chinese characters. Because no charset conversion is done, their applications can handle Chinese well... except that some strange results come out when they do a partial match or full-text search... until, someday, they try to upgrade their database......

I really wish their applications had never worked at all.....

Tom Kyte
May 07, 2005 - 8:13 am UTC

and when they use substr and other builtins things go whacky -- that would be the worst idea on the planet, to store multi-byte data in a single byte database. They have a mess on their hands.

displaying output of report in hindi

vinay krishna ravi, March 18, 2006 - 11:33 am UTC

hi tom
i want to display output of a report in hindi.
how can i do it?



Tom Kyte
March 18, 2006 - 4:39 pm UTC

first, you would need hindi data in the database.

Then you would set your NLS_LANG appropriately on the client.

And print.

about hindi in database

vinay krishna ravi, March 20, 2006 - 1:31 am UTC

how can we store hindi data in the database
through Forms 6i as well as SQL commands?

Tom Kyte
March 20, 2006 - 7:18 am UTC

exactly in the fashion described above. you have a database with a characterset capable of holding that. you have a client capable of typing it in. the client has the appropriate character set specified in their NLS_LANG.

It is not different from storing "english" or "french" or whatever in the database.

You might want to read through this document:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm

RE: about hindi in database

Duke Ganote, March 20, 2006 - 12:00 pm UTC

When configured as a Unicode database, Oracle can store any Unicode codepoint. See the general discussion of Unicode in Hindi:
http://www.unicode.org/standard/translations/hindi.html
Here's a list of Devanagari codepoints:
http://www.unicode.org/charts/collation/chart_Devanagari.html
any of which can be stored in Oracle using, for example, the ASCIISTR and UNISTR functions, as the example below shows.
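
For instance, a quick illustration in a Unicode database -- the codepoints below spell the word "Hindi" in Devanagari:

SQL> select unistr('\0939\093F\0928\094D\0926\0940') as hindi from dual;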

length semantics -- help required

Sridhar.S, August 25, 2006 - 10:37 am UTC


Hi Tom,

First, I have to appreciate your great efforts in explaining things in a clear manner. These Q&As/reviews are really helpful in resolving different issues. Thanks a lot.

OK, let me explain the scenario:

we have Oracle 9i and have
NLS_LENGTH_SEMANTICS = BYTE,
NLS_CHARACTERSET = UTF8.

All the tables/PLSQL packages are defined with the default byte semantics, e.g. Customer_Name VARCHAR2(80). Now the issue is with the new incoming Chinese characters.
We are not sure what changes need to be made to accommodate the new Chinese characters and to avoid "value too large" errors (even though the values are fewer than 80 characters in the case of Customer_Name).

To my knowledge I have the following ways to do this, but I am not sure how valid they are:

Method 1: Change the length of all the table columns which are supposed to hold such multibyte chars. This method requires more effort from us, as we have many tables in our system. Almost all the tables are supposed to hold such multibyte chars.

Method 2: Change the COLUMN definition from VARCHAR2(80) to VARCHAR2(80 CHAR). I don't think this method will be a profitable one. And if that is the case we have to change all the tables in the system, and in the future if we create a new table then this has to be taken care of.

From the Oracle docs, I came to know that a change in NLS_LENGTH_SEMANTICS affects only newly created columns. Existing columns remain unaffected by the change.

Please guide me to an efficient method to resolve this issue as early as possible. Thanks for your extensive support.

Best Regards,
Sridhar.S

Tom Kyte
August 27, 2006 - 8:43 pm UTC

you have allocated 80 bytes.

that might be able to hold many fewer than 80 CHARACTERS by definition.

method 1 and method 2 are not significantly different - effort wise??? why is method one "of great effort" and method 2 is not?


You will have to basically do either method 1 or method 2, method 2 being a bit more CLEAR and maintains your data integrity (so when your application is used with single byte data - only 80 CHARACTERS are inserted, not up to 80 * 6 !!)
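
A small sketch of the difference in a UTF8 database (the table names are made up; U+4E2D is a single Chinese character that takes three bytes in UTF8):

create table t_byte ( x varchar2(3 byte) );
create table t_char ( x varchar2(3 char) );

insert into t_byte values ( unistr('\4E2D\4E2D') );  -- fails: 2 characters = 6 bytes > 3 bytes
insert into t_char values ( unistr('\4E2D\4E2D') );  -- succeeds: 2 characters <= 3 characters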



length semantics -- help required

Sridhar.S, August 29, 2006 - 4:11 am UTC

Thanks!!!
Is there any way to resolve this by changing some parameter at the database level (like setting NLS_LENGTH_SEMANTICS)?
Comparatively, which is the more effective/longer-lasting solution: the above two methods, or changing the DB parameter? Because the above two methods lead us to alter the entire system, and care must also be taken with upcoming tables.

Best Regards,
Sridhar.S

Tom Kyte
August 29, 2006 - 7:14 am UTC

you have to rebuild the entire database; it won't automagically retrofit itself to already existing tables.

you can set this as an init.ora, but I'm not going to recommend that.

Bite the bullet here, you have to fix what you have to fix.

length semantics -- help required

Sridhar.S, August 29, 2006 - 7:36 am UTC

Thanks Tom!!!

Do things the easy way!

Michael Friedman, August 30, 2006 - 6:49 am UTC

First, change your DB to Character Semantics.

Then write a script to change the semantics of every VARCHAR column (a sketch of such a script follows this review).

There is no need to create a new DB.

Let me also raise a few other points:

1. Never create a DB with Character Semantics. There are known bugs if you do so. Create with Byte and then change to Character before adding any data.

2. If you are using 10g and you care about Chinese you should seriously consider using AL32UTF8. AL32UTF8 contains a large number of additional Chinese characters. The most important ones are a slew of Hong Kong specific characters that are used most frequently in names. All HK systems for HK government are required to support these characters.

But if you do this, be aware that Dev 6i does not support AL32UTF8.
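
A sketch of the kind of script Michael describes -- it generates the ALTER statements for every byte-semantics VARCHAR2 column in the current schema (CHAR_USED = 'B' marks byte semantics in the dictionary; review the output before running it):

select 'alter table ' || table_name ||
       ' modify ( '   || column_name ||
       ' varchar2('   || char_length || ' char) );' as ddl
  from user_tab_columns
 where data_type = 'VARCHAR2'
   and char_used = 'B';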

Sridhar.S, August 31, 2006 - 5:17 am UTC

Hi Mike,

What do you actually mean by "Then write a script to change the semantics of every VARCHAR column"? How can this be done? Do you mean creating a script to alter the tables, e.g. if an existing table contains a column like "customer_name varchar2(80)", then change it to "customer_name varchar2(80 CHAR)" using an ALTER script? In that case, I think we do not need to change the DB to CHAR semantics... as this is supported under BYTE semantics itself.
I'm using Oracle 9i and the character set is UTF8.


Tom Kyte
August 31, 2006 - 9:24 am UTC

correct, there is no need to change the database to char semantics, that is what he said (and what I said - don't do that, don't make the init.ora change, that would be a bad idea)

NLS_SAVED_NCHAR_CS

Jay, November 17, 2006 - 11:53 am UTC

Tom,

We recently converted the character set of one of our 9.2 databases from US7ASCII to WE8ISO8859P1 with the ALTER DATABASE CHARACTER SET command. After that, we checked nls_database_parameters
and found

...
NLS_SAVED_NCHAR_CS US7ASCII
...
NLS_CHARACTERSET WE8ISO8859P1
...

My question is, what is NLS_SAVED_NCHAR_CS for? Do we need to care about it? If yes, how can we change it to be the same as our NLS_CHARACTERSET?

Thanks!

Tom Kyte
November 20, 2006 - 2:03 am UTC

It seems odd that you want to change the setting of a parameter that you don't know anything about?

read support note: Note:316889.1

that is your 'old' nchar characterset, for reference.

Re: NLS_SAVED_NCHAR_CS

Jay, November 20, 2006 - 11:13 am UTC

I read that note. It looks like this parameter just records the historical NCHAR info and shouldn't have much impact on our system. We're not using NCHAR, NVARCHAR2, etc. anyway.

Thanks.

Confused - please check my understanding

Neil, December 08, 2006 - 12:48 am UTC

Hi Tom

I have gone through the globalization document,
but it just makes my head hurt !

If I want to store and retrieve (read) say Japanese symbols
I need

1) the database character set set to a character set capable of storing the characters.

2) UTF8 or AL32UTF8 would be capable of storing them; WE8ISO8859P1 wouldn't.

3) NLS_LANG set to a character set capable of displaying and writing the Japanese symbols.

4) JAPANESE_JAPAN.UTF8 would be capable, but JAPANESE_JAPAN.WE8MSWIN1252 wouldn't.

Many thanks

Tom Kyte
December 08, 2006 - 7:40 am UTC

1) correct
2) correct
3) correct
4) correct


you need a character set capable of storing your multi-byte data, and single byte character sets (we8iso for example) cannot do that.

A reader, December 08, 2006 - 3:19 pm UTC

Tom,

Thanks a lot.
One question: if I had an option to choose between WE8ISO8859P1 and US7ASCII, which one is more ideal, as I think both are single-byte character sets?

Tom Kyte
December 09, 2006 - 2:38 pm UTC

which one can hold your data?

do you have 7 bit US Ascii data
or
do you have 8 bit Western European data

Which characterset

Michael Friedman, December 08, 2006 - 6:59 pm UTC

Depends on your requirements. US7ASCII only supports the basic ASCII character set. WE8ISO8859P1 supports a variety of Western European characters such as accented letters.

Will you have any names in your system with accented letters? Want to store words like "fiancé" correctly with the accent included? Use WE8ISO8859P1.

Personally, my opinion is that any new system being built for anything more important than a school project should be UTF-8.

I had a conference call with the senior management of a public company last week. They built their in-house ERP system 5 years ago using WE8ISO8859P1. Now they've just opened a China subsidiary, and they are trying to figure out whether migrating their entire corporate ERP system to UTF8 is too risky, such that they would need two separate ERP installations -- something their IT department is recommending but which horrifies all the business management.

As the IT department correctly points out, 5 years ago when they were asked to build this they were told that the company had no plans to move into Asia.

Well, they should have done UTF-8 anyway.


A reader, December 09, 2006 - 3:44 pm UTC


unicode character handling

sridhar.s, January 09, 2007 - 12:52 pm UTC

Hi Tom,

This is regarding unicode character handling.

At present, our system (tables, functions, procedures etc.) processes characters as single-byte. Our database is the UTF8 character set.

Now, we have a requirement to handle mixed (both multibyte unicode and single-byte) characters too. The tables were not modified to fit these unicode characters, as the column sizing was planned with unicode characters in mind. But in certain cases we have to truncate some of the incoming data to fit the column size. For example, the "Name" column in the "Employee" table is fixed as Varchar2(100), and if the incoming data exceeds this size, we just truncate the data to fit the column.
In general we are using SUBSTRB(Name,1,100) to do this, but this fails where a valid unicode (multibyte) character is split and becomes a junk character.

My question: Is there any way/built-in function to find the last valid unicode (or single-byte) character in the
"SUBSTRB(Name,1,100)" result and truncate to that character? i.e.
STEP 1: Cut the incoming "Name" to fit the column size.
STEP 2: Cut the result of STEP 1 back to the last valid unicode (or single-byte) character.

Thanks for your time.

Tom Kyte
January 11, 2007 - 9:37 am UTC

... our system(tables, functions, procedures etc) does processing of character as single-byte. Our database is UTF8 character set. ...

that conflicts with itself. If you are UTF8 - you are multibyte.


you would have to write a function that looked at the lengthb return and took off characters until it fit.
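
A sketch of such a function (the name and the 32767 limit are my own choices; since SUBSTR counts characters, the loop only ever removes whole characters and can never split a multibyte one):

create or replace function trunc_to_bytes
( p_str       in varchar2
, p_max_bytes in number
) return varchar2
is
    -- at most p_max_bytes CHARACTERS, which may still be too many BYTES
    l_str varchar2(32767) := substr( p_str, 1, p_max_bytes );
begin
    while lengthb( l_str ) > p_max_bytes
    loop
        l_str := substr( l_str, 1, length( l_str ) - 1 );  -- drop one whole character
    end loop;
    return l_str;
end;
/

e.g. trunc_to_bytes( name, 100 ) in place of SUBSTRB( name, 1, 100 ).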

Store chinese chars into Oracle table

kamal, January 30, 2007 - 12:51 am UTC

Can anybody help me with how to upload Excel records with Chinese chars into an Oracle table?

I tried the following things, but it doesn't show the exact Chinese chars in the Oracle9i table.
I saved the Excel file as Test.xls with file type Unicode (*.txt); only then was I able to see the right Chinese chars displayed in the Excel sheet.

My test.ctl file looks like below:

LOAD DATA
CHARACTERSET AL16UTF16
TRUNCATE
INTO TABLE test
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(name,date1)

When i run the following command
sqlldr user/pass@sb data='Book1.xls' control='test.ctl' log='test.log' bad='test.bad'

I could export the Chinese chars into the Oracle table by saving the file as Test.csv with file type Unicode (*.txt).
The problem arises when I export the Test.csv file into the Oracle table through sqlldr: it inserts the concatenated values (Name and Date1) under the one column Name.

I am only getting the merged field values together in the same field (Name)!
What exactly should the control file contain to insert the field values separately into each column?


Tom Kyte
January 31, 2007 - 1:07 pm UTC

you need to make sure that

a) your database character set supports this data
b) your client character set (NLS_LANG) is set to support this data

RE: Store Chinese characters

Duke Ganote, January 31, 2007 - 1:30 pm UTC

Kamal wrote "I saved the Excel file as Test.xls with File type as Unicode(*.txt)" but also "I could [save] the file as Test.csv and file type as Unicode(* .txt)."

File Type Unicode ".txt" is tab-separated, not comma-separated. Try using
fields terminated by '\t'
instead of
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10250420313043#27317016562512

Unable to Insert special Characters like Ð,Æ

A reader, April 20, 2007 - 8:25 am UTC

Hi Tom,

NLS_LANG in my database is AMERICAN_AMERICA.UTF8. In Unix and Windows I set the NLS_LANG as AMERICAN_AMERICA.UTF8. When I load data using sqlldr, a few characters like Ð,Æ are not inserted into tables.
When I tried using an insert statement I got the below error:
SQL> insert into test values('Æ');
ERROR:
ORA-01756: quoted string not properly terminated
In sqlldr the error is like "Multibyte character error".

1. Please help me with how to insert the data.
2. Which character set should I use for such special characters?
Tom Kyte
April 20, 2007 - 10:01 am UTC

maybe your text file is not stored in unicode

does it have the magic bytes at the beginning that denote "this is unicode"

How to Insert special Characters like Ð,Æ

Duke Ganote, April 20, 2007 - 9:59 am UTC

how to load using sqlldr

A reader, April 23, 2007 - 1:57 am UTC

Hi, thanks a lot... it was working with insert statements, but please tell me:
1. How do I load with sqlldr from a csv file?
2. How do I save a .xls as Unicode (.txt)?
Tom Kyte
April 23, 2007 - 4:28 pm UTC

1) cd $ORACLE_HOME/rdbms/demo
ls ulcase*.ctl

lots of examples....

2) you need to ask Microsoft?

unicode

A reader, April 23, 2007 - 4:27 am UTC

Hi, thanks, I am able to save it as a unicode file, but SQL*Loader is not accepting '\t' as the delimiter. Below is the script:

LOAD DATA
INFILE 'test.txt'
INTO TABLE test
FIELDS TERMINATED BY '\t'
( value
)

the test.txt is
129 
130 ¿
131 ¿
132 ¿
133 ¿
134 ¿

I saved the csv file via "save as" Unicode (*.txt). What went wrong?
Tom Kyte
April 23, 2007 - 4:30 pm UTC

that is because to sqlldr \t is a two character string, like XX or AB...

x'09'
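
That is, the reader's control file with only the delimiter clause changed:

LOAD DATA
INFILE 'test.txt'
INTO TABLE test
FIELDS TERMINATED BY X'09'
( value
)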


Tab delimited data with special characters

BC, August 18, 2008 - 11:44 am UTC

Tom,

We frequently encounter data issues while using external tables to load data files containing special characters.
The columns in the source file are tab delimited. It appears as if the tab after the special character is ignored
and the data values are shifted.

Your help is highly appreciated.

create table t_ext 
(
  full_name          varchar2(100), 
  date_time_in       varchar2(20),
  date_time_out      varchar2(20)  
) 
organization external 
( 
   type oracle_loader
   default directory "DATA"
   access parameters
   ( 
      records delimited by newline
      fields terminated by  0x'09' 
      missing field values are null
      (
         full_name,
         date_time_in,
         date_time_out
      )
   )
   location
   ( 
      "DATA":'t1.txt'
   )
)
reject limit unlimited;

DEV > host cat t1.txt
Gamboa, Ms. Ana Lía     13-Aug-2008 6:17        13-Aug-2008 11:32
Valverde, Ms. Carmen Elena      13-Aug-2008 6:35        13-Aug-2008 11:32

DEV > set null <<null>>
DEV > select * from t_ext;

FULL_NAME                                 DATE_TIME_IN         DATE_TIME_OUT
----------------------------------------- -------------------- --------------------
Gamboa, Ms. Ana Lía     13-Aug-2008 6:17  13-Aug-2008 11:32    <<null>>
Valverde, Ms. Carmen Elena                13-Aug-2008 6:35     13-Aug-2008 11:32

Tom Kyte
August 20, 2008 - 10:03 am UTC

i have no clue what a special character is.

what is your database character set
what is a "special character"

do an od -c on the file, not a 'cat', I cannot see anything.

Thanks

bc, August 20, 2008 - 10:32 am UTC

Tom,

Our database character set is UTF-8. The "í" character seems to be causing the issue.

Below is the od -c output of the file,

DEV@hq-12:/home/custom/data>od -c t1.txt
0000000 G a m b o a , M s . A n a
0000020 L í a \t 1 3 - A u g - 2 0 0 8
0000040 6 : 1 7 \t 1 3 - A u g - 2 0 0 8
0000060 1 1 : 3 2 \n V a l v e r d e ,
0000100 M s . C a r m e n E l e n
0000120 a \t 1 3 - A u g - 2 0 0 8 6 :
0000140 3 5 \t 1 3 - A u g - 2 0 0 8 1
0000160 1 : 3 2 \n
0000165

If I replace the "í" with an "i", the output of the select is quite different

DEV >host od -c t1.txt
0000000 G a m b o a , M s . A n a
0000020 L i a \t 1 3 - A u g - 2 0 0 8
0000040 6 : 1 7 \t 1 3 - A u g - 2 0 0 8
0000060 1 1 : 3 2 \n V a l v e r d e ,
0000100 M s . C a r m e n E l e n
0000120 a \t 1 3 - A u g - 2 0 0 8 6 :
0000140 3 5 \t 1 3 - A u g - 2 0 0 8 1
0000160 1 : 3 2 \n
0000165

DEV >select * from t_ext;

FULL_NAME DATE_TIME_IN DATE_TIME_OUT
---------------------------------------- -------------------- --------------------
Gamboa, Ms. Ana Lia 13-Aug-2008 6:17 13-Aug-2008 11:32
Valverde, Ms. Carmen Elena 13-Aug-2008 6:35 13-Aug-2008 11:32

Your help is highly appreciated.

Thanks

BC
Tom Kyte
August 20, 2008 - 10:45 am UTC

do you have a utf8 formatted file... if not, character set conversion will take place.

Thanks

bc, August 20, 2008 - 10:47 am UTC

How do I tell if the file is UTF-8 formatted?

Can I re-format a file to be UTF-8?

Thanks

BC


Tom Kyte
August 20, 2008 - 12:39 pm UTC

it would have a 2 byte field right at the beginning

go into windows notepad and save a file in that format and "od -c" it.
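
(For reference -- general Unicode knowledge, not specific to this file -- a file saved as UTF-8 by Notepad starts with the three-byte marker EF BB BF, while a "Unicode" (UTF-16 little-endian) save starts with the two bytes FF FE, the "ÿþ" visible in the SQL*Plus output much earlier in this thread. A hypothetical UTF-8 file containing just "hi" would dump as:)

$ od -c bom_test.txt
0000000 357 273 277   h   i  \n
0000006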

Thanks .....

bc, August 20, 2008 - 12:12 pm UTC

Tom,

Thanks a million for pointing me in the right direction; I did figure out how to resolve the issue.

Source File from vendor
DEV@hq-12:/home/custom/data>od -c t1.txt
0000000 G a m b o a , M s . A n a
0000020 L í a \t 1 3 - A u g - 2 0 0 8
0000040 6 : 1 7 \t 1 3 - A u g - 2 0 0 8
0000060 1 1 : 3 2 \n V a l v e r d e ,
0000100 M s . C a r m e n E l e n
0000120 a \t 1 3 - A u g - 2 0 0 8 6 :
0000140 3 5 \t 1 3 - A u g - 2 0 0 8 1
0000160 1 : 3 2 \n
0000165
Convert File to UTF-8
DEV@hq-12:/home/custom/data>iconv -t utf-8 t1.txt > t11.txt
Converted File
DEV@hq-12:/home/custom/data>od -c t11.txt
0000000 G a m b o a , M s . A n a
0000020 L Ã 255 a \t 1 3 - A u g - 2 0 0 8
0000040 6 : 1 7 \t 1 3 - A u g - 2 0 0
0000060 8 1 1 : 3 2 \n V a l v e r d e
0000100 , M s . C a r m e n E l e
0000120 n a \t 1 3 - A u g - 2 0 0 8 6
0000140 : 3 5 \t 1 3 - A u g - 2 0 0 8
0000160 1 1 : 3 2 \n
0000166

SQL*Plus: Release 10.2.0.3.0 - Production on Wed Aug 20 11:42:58 2008
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options

DEV > set null <<null>>
DEV > alter table t_ext location( "DATA":'t1.txt');

Table altered.

Select From Source File

DEV > select * from t_ext;

FULL_NAME DATE_TIME_IN DATE_TIME_OUT
--------------------------------------------- -------------------- --------------------
Gamboa, Ms. Ana L¿13-Aug-2008 6:17 13-Aug-2008 11:32 <<null>>
Valverde, Ms. Carmen Elena 13-Aug-2008 6:35 13-Aug-2008 11:32

DEV > alter table t_ext location( "DATA":'t11.txt');

Table altered.

Select From UTF-8 Converted File

DEV > select * from t_ext;

FULL_NAME DATE_TIME_IN DATE_TIME_OUT
--------------------------------------------- -------------------- --------------------
Gamboa, Ms. Ana Lía 13-Aug-2008 6:17 13-Aug-2008 11:32
Valverde, Ms. Carmen Elena 13-Aug-2008 6:35 13-Aug-2008 11:32

Thanks

BC

fixed length

blury, March 31, 2009 - 10:53 pm UTC

Hi Tom,

I have no problem uploading Chinese characters and viewing them on Oracle Portal. However, we are now trying to have a more dynamic ctl file for SQL*Loader, so we plan to load all data into 1 column in the DB, yes, 1 column...
Then we'll use a VIEW to cut the line of text into columns (pre-defined fixed lengths)... The problem is, for example, that this Chinese string:
¸£½¨±¦ÃöÌåÓýÓÃÆ·ÓÐÏÞ¹«Ë¾ (in the ANSI file, 24 bytes) will become --> ¿¿¿¿¿¿¿¿¿¿¿¿ (12 bytes) after it is uploaded...
FYI, I SET NLS_LANG=SIMPLIFIED CHINESE_CHINA.ZHS16GBK
So at the end of the day, the whole line is shrunk... and my method doesn't work, as it is no longer the same as the predefined lengths...
Please advise: is there any method by which I can upload the Chinese chars with the same byte length as in the source file?
Or any method I can use to add a space after each Chinese word so that it is the same length as the predefined length (other than counting ASCII codes one char by one char...)?

Tom Kyte
April 01, 2009 - 9:01 am UTC

I'm not following you at all here. I don't know what you mean.

Portal seems a red-herring here, eg: it is not relevant to the problem at hand, so leave it out and better describe what you mean.


I don't know what a "more dynamic ctl file" is or means.

I cannot imagine any good would come of loading all data into a single column and parsing it in a view. Be sort of horrible for maintenance, data integrity, performance, security - you name it, it is bad for it. Nothing good will come of this.

the difference between the two?

jianhuang zheng, April 01, 2009 - 11:21 am UTC

hi tom
can you briefly describe the difference between nchar and char, and nvarchar and varchar?
under which circumstances should each be used?

thanks!
Tom Kyte
April 01, 2009 - 4:29 pm UTC

<quote src = expert Oracle database architecture>

The 'N' Variant

So, of what use are the NVARCHAR2 and NCHAR (for completeness)? They are used in systems where the need to manage and store multiple character sets arises. This typically happens in a database where the predominant character set is a single-byte fixed-width one (such as WE8ISO8859P1), but the need arises to maintain and store some multibyte data. There are many systems that have legacy data but need to support multibyte data for some new applications, or systems that want the efficiency of a single-byte character set for most operations (string operations on a fixed-width string are more efficient than on a string where each character may store a different number of bytes), but need the flexibility of multibyte data at some points.

The NVARCHAR2 and NCHAR datatypes support this need. They are generally the same as their VARCHAR2 and CHAR counterparts, with the following exceptions:

* Their text is stored and managed in the database's national character set, not the default character set.

* Their lengths are always provided in characters, whereas a CHAR/VARCHAR2 may specify either bytes or characters.

In Oracle9i and above, the database's national character set may take one of two values: UTF8 or AL16UTF16 (UTF-16 in 9i; AL16UTF16 in 10g). This makes the NCHAR and NVARCHAR types suitable for storing only multibyte data, which is a change from earlier releases of the database (Oracle8i and earlier allowed you to choose any character set for the national character set).

</quote>

Crystal clear as always! Thanks!

jianhuang zheng, April 01, 2009 - 10:17 pm UTC


fixed length

blury, April 02, 2009 - 2:07 am UTC

Hi Tom,

Let's keep it simple...
From the source file:
MGWS093.MG01 ¹ù±õÉú [8 spaces] ¸£½¨±¦ÃöÌåÓýÓÃÆ·ÓÐÏÞ¹«Ë¾ [2 spaces] ABC

then after it is uploaded into the DB, what I want is:
MGWS093.MG01 ¿¿¿ [8 + 3 spaces] ¿¿¿¿¿¿¿¿¿¿¿¿ [2 + 12 spaces] ABC

but right now, it is:
MGWS093.MG01 ¿¿¿ [8 spaces] ¿¿¿¿¿¿¿¿¿¿¿¿ [2 spaces] ABC

in which the Chinese string is converted from ¹ù±õÉú (6 bytes) to ¿¿¿ (3 bytes), and the space behind it is shrunk... so there is no way I can cut the line of text in the view, as the predefined lengths are useless...

The reason why I want to use this is that our standard is to store all the information in a staging table first, then transfer and transform (map) it to another table for storage. Since there are many different incoming files, if we had to create a separate CONTROL file for every incoming file, it would be time-consuming and difficult to maintain.
So, to solve this, we store whatever comes from the incoming file in the same table, same column, then use a VIEW to cut it, instead of having so many CONTROL files.
Tom Kyte
April 02, 2009 - 9:47 am UTC

You do know I cannot actually read any of this.

You will want to read:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm

and get an understanding of character sets and then ensure you are using the right ones when loading using sqlldr - and the right ones in the database itself.

fixed length

blury, April 02, 2009 - 11:15 pm UTC

Hi Tom...
The most important thing is not whether the Chinese chars are readable or not... it's the LENGTH.
It's different before and after the upload... the character setting is correct, as the data is readable if opened in HTML...
If I can solve this problem, then I can use the dynamic CTL file... do you have any idea?
Tom Kyte
April 03, 2009 - 7:25 am UTC

and that is because someones NLS_LANG characterset is different from the database characterset.

I urge you to read that link I sent you (it will pay dividends to you for years and years - and it'll explain to you what is happening)

nls_lang

jianhuang zheng, April 07, 2009 - 3:17 am UTC

Hi Tom
suppose my database character set is us7ascii, and nls_nchar_characterset is AL16UTF16. I created a table nvv with an nchar(20) column. Now I want to store and display Chinese characters in nvv.
I set nls_lang=american_america.zhs16gbk, and I can store and display Chinese correctly from the nvv table. I was wondering: AL16UTF16 is different from zhs16gbk, so why is there no character set conversion problem? If I want to store and display Chinese characters in the nvv table, I don't have to consider the database character set us7ascii, is that right?
by the way, if I set nls_lang=american_america.utf8, it doesn't display Chinese characters correctly. why?

thanks!

"A Unicode Database that supports Chinese"

A reader, November 20, 2009 - 2:30 pm UTC

Tom - if you received a spec for building a database that said
"WE8ISO8859P1 but Unicode would be better, should be able to support Chinese" - what exactly would you do for the database character set, national character set, and default language?
Is it a slam dunk that they want AL16UTF16? Aren't there a lot of options here? I sent it back for clarification on database character set, national character set, and default language.
What do you think?
Tom Kyte
November 23, 2009 - 4:09 pm UTC

A Unicode database that supports Chinese

A reader, November 20, 2009 - 2:45 pm UTC

I forgot to mention.. Oracle 10gR2 10.2.0.4

A reader, September 09, 2010 - 12:59 pm UTC

Hi Tom,

Below is a small test I did; I am trying to insert the value Cédric into a column.
*************

SQL> select * from v$version;

BANNER
------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for HPUX: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select property_value from database_properties where property_name='NLS_CHARACTERSET';

PROPERTY_VALUE
-----------------------------------------------------------

US7ASCII

SQL> select property_value from database_properties where property_name='NLS_NCHAR_CHARACTERSET';

PROPERTY_VALUE
-----------------------------------------------------------

AL16UTF16

SQL> CREATE TABLE TEST1(NAME NVARCHAR2(50));

Table created.

SQL> INSERT INTO TEST1 VALUES ('Cédric');

1 row created.

SQL> SELECT * FROM TEST1;

NAME
-----------------------------------------------------------

C ? d r i c

SQL>
***************

Does the above mean that the parameter NLS_NCHAR_CHARACTERSET is being ignored, or am I missing setting up some other parameter before inserting the data?

Thanks for your help!
Tom Kyte
September 09, 2010 - 9:01 pm UTC

n'Cedric'

use an nvarchar string - which starts with N

and make sure your TERMINAL is set to display those characters (use dump to verify the bytes that are stored)
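
Putting both suggestions together against the reader's table, a quick sketch:

SQL> insert into test1 values ( n'Cédric' );  -- N'...' is a national-character-set literal

SQL> select dump( name, 1016 ) from test1;    -- 1016 = dump the bytes in hex, with the character set name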

Cédric the Entertainer?

Duke Ganote, September 10, 2010 - 8:13 pm UTC

00E9, that's the codepoint for é ( e-acute ), as the Unicode standard shows:

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=00E9

Next the UNISTR and ASCIISTR functions come in handy.

SQL> insert into test values ( UNISTR('C\00E9dric') );

1 row created.

SQL> select NAME from test;

NAME
--------------------------------------------------
CΘdric

SQL> select asciistr(name) from test;

ASCIISTR(NAME)
--------------------------------------------------
C\00E9dric




Not able to insert GREEK character using SQLPLUS

Nikhilesh, February 16, 2012 - 6:45 am UTC

Dear Tom,

 INSERT INTO test1(col1,col2,col3,col4,col5,col6) VALUES (1324,1,'1','Îάν Î­Î½Î±Ï Î±ÏοÏÏÎ¿Î»Î­Î±Ï ÎµÎ¯Î½Î±Î¹ δηλÏÎ¼Î­Î½Î¿Ï Î³Î¹Î± Ïλα Ïα είδη ÏÏÏε ÏÏέÏει να δηλÏθεί Ïε εÏίÏεδο βαÏικÏν ÏÏοιÏείÏν ειδάλÏÏ Ïε εÏίÏεδο είδοÏÏ ÏÏÏÎ','1','');

On the client machine this statement works fine with SQL Developer, but when I execute it using Pro*C or dynamic SQL in SQL*Plus it gives the following error.

  1  declare
  2  str varchar2(4000) := q'[INSERT INTO test1(col1,col2,col3,col4,col5,col6) VALUES (1324,1,'1','Îάν Î­Î½Î±Ï Î±ÏοÏÏÎ¿Î»Î­Î±Ï ÎµÎ¯Î½Î±Î¹ δηλÏÎ¼Î­Î½Î¿Ï Î³Î¹Î± Ïλα Ïα είδη ÏÏÏε ÏÏέÏει να δηλÏθεί Ïε εÏίÏεδο βαÏικÏν ÏÏοιÏείÏν ειδάλÏÏ Ïε εÏίÏεδο είδοÏÏ ÏÏÏÎ','1','')]';
  4  begin
  5  execute immediate str;
  6* end;
SQL> /
declare
*
ERROR at line 1:
ORA-00917: missing comma
ORA-06512: at line 5

Client character set is AMERICAN_AMERICA.WE8MSWIN1252
and database character set is AMERICAN_AMERICA.AL16UTF16.

I have not worked on GLOBALIZATION before. Can you pls help me out?


Thanks in advance.

Why LONG can't accomodate Chinese characters

A reader, August 06, 2012 - 12:28 pm UTC

Tom,

I am in the process of making changes to my DB procedures to accommodate Chinese characters. In many places LONG is being used to temporarily store data in procedures.

I have seen that LONG variables can't store Chinese characters.
1) Why is that so?
2) Is there an equivalent to LONG that can store Chinese?
3) Or will I have to convert my LONGs to NVARCHAR2? These LONG variables are used to store SELECT queries which are being built dynamically.
4) Would you know of any impact if I change LONG to the NVARCHAR2 datatype? What should I be careful about?

Please help


Tom Kyte
August 17, 2012 - 12:22 pm UTC

long is simply a varchar2 in plsql - type long is varchar2(32765);

it can store chinese, assuming your character set supports it.

if your character set doesn't but your national character set does, yes, you'd want to use an nvarchar2(32765)
