Question and Answer

Tom Kyte

Thanks for the question, Venkatesh.

Asked: October 07, 2002 - 8:38 am UTC

Last updated: July 19, 2012 - 3:54 pm UTC

Version: 8.1.6

You Asked

Greetings Tom !!
I am always here when I need to get that 100% right answer, and so here is my question.
The existing Database (WE8ISO8859P1 Char Set) needs to cater to English, Chinese, Japanese and Korean as well.
Before I opt to have a new UTF-8 instance, I need to clarify a few points.

1. Can the data being pushed from the Browser (let's say Korean data) be encoded in UTF-8 using the encoding mechanisms available in Java, and stored in the WE8ISO8859P1 Database? If so, does it mean I have UTF-8 format data inside?

Maybe it sounds absolutely inane, but it just occurred to a few of my colleagues (Java Guys) that this was a possibility.

2. The Java guys also claim to have seen a system using a WE8ISO8859P1 database to store Chinese data; the system would perfectly store, retrieve and display data in Chinese.
Does this prove a point, or am I missing out on something very significant, like my basics on NLS support?

Although this issue has totally confused me, I still feel very strongly that having a UTF-8 database is the perfect, long-term solution.

Also, if I were to export WE8ISO8859P1 db data to a UTF8 db, then do I have to set NLS_LANG=UTF8 prior to exporting the database (as opposed to the original db charset), so as to avoid the IMP-00016 error during import? Is it the right way?

Please let me know your thoughts on the same .

Thanx and Regards
Venkatesh





and Tom said...

1) Doesn't sound right, think about it.

WE8ISO is a single byte character set. It has 255 characters.

Korean data requires a multi-byte character set -- each character could be 1, 2, 3 or more bytes. It is a variable length encoding scheme. It has more than, way more than, 255 characters. I don't see it fitting into we8iso unless they use RAW, in which case it is just bytes, not characters at all.


2) tell them to show you this. They should be able to set up a test. However, just to mess with them -- make sure to test it where the client has a different character set than the database.


I can say this: "There is a chance that if you use we8iso as the client character set and we8iso as the database characterset -- it could possibly work since NO character set translation would happen. However, with Java -- I seriously doubt it, as Java is already NOT using we8iso, it has a different character set"


UTF8 would be the correct and proper approach.

Trying to fit the square peg into a round hole (abusing we8iso character set in this case) would lead to long term "pain" and "suffering" -- it'll bite you in the butt someday for sure.
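
A quick way to verify what both sides are actually doing (a minimal sketch; the table and column names below are placeholders):

-- what the database itself uses:
select * from nls_database_parameters
 where parameter like '%CHARACTERSET';

-- what bytes are really stored in a suspect column, tagged with the charset:
select dump(some_column, 1016) from some_table;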


Comments

Multilingual data

Mark, October 07, 2002 - 9:13 pm UTC

Hi Tom,
Does that mean I would still be able to store Chinese characters in a WE8ISO db, provided the Client and Db Char sets are the same? Through the java program, if I managed to encode the data in the WE8ISO scheme then, according to you, it should work? Please correct me if I am wrong.


Tom Kyte
October 08, 2002 - 7:24 am UTC

Not at all reliably -- no. The data would at some point appear to corrupt itself via some maintenance process due to character set conversion.


Chinese = more than one byte/character
We8iso = one byte/character

Impedance mismatch, you lose -- data corrupt.


don't do it unless you want lots of heartburn down the road.

Sasa, December 18, 2002 - 7:06 am UTC

Hi Tom,

We migrated our DB from 8.1.7 to 9.2. In 8.1.7 we used the UTF8 character set. It remains the same in 9.2.
We know that Oracle 9.2 doesn't have UTF8 but AL32UTF8.
Can we keep this UTF8, or do we have to change to AL32UTF8?
If we need to change, may we do it by:
alter database character set AL32UTF8
or
must we use the exp/imp utility?

Regards

Tom Kyte
December 18, 2002 - 11:10 am UTC

what do you mean -- utf8 is still a valid character set?

Migrating to 9.2

Sasa, December 18, 2002 - 11:54 am UTC

After migration to 9.2, when we run this query:
select * from nls_database_parameters where parameter like '%CHARACTERSET'
we got:

Parameter                    Value
------------------------     ------
NLS_CHARACTERSET             UTF8
NLS_NCHAR_CHARACTERSET       UTF8

We are still confused by these values.
Any explanation?
Any explanation?





Tom Kyte
December 18, 2002 - 12:42 pm UTC

why are you confused? what are you confused about?

UTF8 in Oracle 9.2

Sasa, December 19, 2002 - 4:13 am UTC

Hi tom,

We are confused because during creation of a new db under Oracle 9.2 we had no opportunity to choose UTF8 as the character set, only AL32UTF8 and AL16UTF16.

Thanks

Tom Kyte
December 19, 2002 - 7:43 am UTC

it is in my drop down list when I use DBCA.

I just went to the "character sets" tab, selected "choose from the list" and it is there.

Maybe you were just looking at the "use unicode" radio selection -- look down one more item.

Multilingual Database with 8i client and 9i server

Marie, February 22, 2003 - 9:50 pm UTC

Hi Tom,

Currently I have to work with an oracle9.2.0.2.1 server as the database server, which uses the UTF8 characterset.
At the same time I have to use the oracle8.1.7 client, since some perl modules only work with the oracle8 client.
But I found that I could not use the oracle8.1.7 client to insert values with diacritics; the data
is changed after inserting into the database. If I use the oracle9i client, I can insert and select and get the same value back.

For example: 
==============================================================
Using oracle9i client,

 SQL> create table t (tm varchar2(100));
 
 Table created.
 
 SQL> insert into t (tm) values ('êë');
 
 SQL> select tm from t;
 
 TM
 ------------------------------------------------------------
 êë
 
 1 row selected.
 
Elapsed: 00:00:00.02
=============================================================
If I use oracle8i client, I get the following:

 SQL> insert into t (tm) values ('êë');
 
 SQL> select tm from t;
 
 TM
 ------------------------------------------------------------
 ++
 
 ======================================================
 
 So is there a way I can get diacritic values to work CORRECTLY in the environment of
 an Oracle8i client and Oracle9i server?
 
 Thanks, 

Tom Kyte
February 23, 2003 - 8:47 am UTC

just set your NLS_LANG correctly -- you will find that your NLS_LANG is set differently in 8i than in 9i. This CLIENT setting dictates to the database what character set it should expect from the client, and the characters get converted appropriately.
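
For instance, on a Unix client it might look like this (a sketch; the exact value must describe what the client really sends, and on Windows NLS_LANG usually lives in the registry):

$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1   # describe the client to the server
$ sqlplus scott/tiger@db

Note the character set portion of NLS_LANG is purely client-side; nls_session_parameters will show you only the language/territory pieces.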

UTF-8

A reader, February 24, 2003 - 6:37 am UTC

Please tell us which one has to be used -- either utf-8 or
some other -- explicitly please...

Tom Kyte
February 24, 2003 - 9:13 am UTC

impossible to answer.

This is like asking "what font must be used" -- well, the one that works for you must be used, and that could well be different than the one I use.

I use we8iso, why? I do not store multi-byte data. would that work for you? maybe, maybe not.

analyze your needs, read the globalization guide and decide WHICH is best for YOU given YOUR circumstances.

Char sets can be brutal

A reader, February 24, 2003 - 1:29 pm UTC

If the client and db char sets are the same- you can stuff anything in, and if the tool expects a certain char set, it'll come out "looking fine". While it may seem dandy at first- the long term prospects are horrendous.

I worked on a migration project that turned out to be "un-migratable" because of it. Tool one created accounts in batch, tool two via a nifty little Java program for the CSRs. As long as you looked at the Java tool accounts with the Java tool and the others with the older tool, all was well. These clowns built a high-end OLTP system that was stuffing data in both Latin-1 and UTF-8.

Once you mix char sets, you can't separate things out later. Take a look at what Oracle thinks is in the table using the SQL Dump function and you'll see that as soon as you tell Oracle that data is Latin-1 and you want it to be UTF-8, there will be characters that Oracle won't be able to understand. You then get to decide which accounts aren't really important. But actually, it's even better- there is of course overlap in the char sets, so Oracle looks at some of the data and says, yup I can convert that, but it's actually from the other char set and results in gibberish.

If you go this path, take the fast track and make sure the Java guys don't use bind variables, you'll be able to redesign the system in less time that way.
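
For reference, the dump inspection the reviewer describes is just this (a sketch; t and x are placeholder names):

select x, dump(x, 1016) from t;
-- shows every byte plus the charset tag, so a Latin-1 byte sequence
-- sitting in a column that is supposed to be UTF-8 becomes visible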







There is a character set scanner tool shipped with the database

A reader, April 23, 2003 - 8:13 am UTC

Hi Tom,

We are migrating from oracle 8.1.6 to oracle 9 R2. We have about 14 oracle instances. All instances have the WE8ISO8859P1
character set. Our company is expanding globally, so we are thinking of using a unicode character set with oracle 9.
I have a few questions on this issue.

1) What is the difference between UTF-8 and UTF-16?
Are AL32UTF8 and UTF-8 the same character set, or are they different?
Are UTF-16 and AL16UTF16 the same character set, or different?

2) Which character set is a superset of all character sets?
If there is one, does oracle support that character set?

3) Do we have to change our pl/sql procedures if we move to a unicode database? The reason for this question is that our developers are using ascii characters for carriage return and line feed, like chr(10) and chr(13), and some other ascii characters.

4) What is the impact on CLOBs?

5) What will be the size of the database? Our production DB size is currently 50GB. What would it be in unicode?

Thanks

Tom Kyte
April 23, 2003 - 6:57 pm UTC

you want to read

http://docs.oracle.com/docs/cd/B10501_01/nav/docindex.htm#index-GLO

from start to finish.

see
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/glossary.htm#432012
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/glossary.htm#433170
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/glossary.htm#432169

basically utf8 is unicode 3.0 support, utf16 is unicode 3.1


there is no super super "top" set.


Your plsql routines may well have to change -- your data model may well have to change.

You'll find that in utf, European characters (except ascii -- 7bit data) all take 2 bytes. That varchar2(80) you have in your database? It might only hold 40 characters of European data (or even less of other kinds of data). It is 80 bytes (you can use the new 9i syntax varchar2( N char ) -- it'll allocate in characters, not bytes).

So, you could find your 80 character description field cannot hold 80 characters.

You might find that x := a || b; fails with "string too long" in your plsql code due to the increased size.

You might find that your string intensive routines run slower (substr(x,1,80) is no longer byte 1 .. byte 80 -- Oracle has to look through the string to find where characters start and stop -- it is more complex)
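
A minimal sketch of the byte vs character semantics just described (9i syntax; t is a placeholder name):

create table t ( a varchar2(80),        -- 80 BYTES (the default semantics) -- perhaps
                                        -- only 40 two-byte European characters in UTF8
                 b varchar2(80 char) ); -- 80 CHARACTERS, however many bytes each takes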


chr(10) and chr(13) should work fine, they are simple ASCII.


On clob -- same impact as on varchar2, same issues.


Your database could balloon to 200gb, but it will be somewhere between 50 and 200. As unicode is a VARYING WIDTH encoding scheme, it is impossible to be precise -- it is not a fixed width scheme, so we don't know how big your strings will get to be.
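
To gauge the growth for your own data, you can compare character counts to byte counts after conversion (a sketch; t and x are placeholders, and LENGTHB counts bytes where LENGTH counts characters):

select sum(length(x))  chars,
       sum(lengthb(x)) bytes   -- in utf8, this is the number that grows
  from t;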

japanese character how to view it on sqlplus

reader, April 23, 2003 - 10:07 am UTC

tom,
i have created a database with utf8 character set.

UTF8WL>select * from nls_database_parameters where parameter like '%CHARACTERSET';

PARAMETER VALUE
============================== =========================================================
NLS_CHARACTERSET UTF8
NLS_NCHAR_CHARACTERSET UTF8

i created a table temp. i want to insert a japanese character and view it in sqlplus -- please let me know how to do it.

i can take a japanese character from my application, but when i copy it into sqlplus to insert, it goes in as ????. don't know why!

please throw some light on this


Thanks

Tom Kyte
April 23, 2003 - 7:12 pm UTC

I don't know -- I cannot type japanese on my keyboard, I don't have support for it fontwise, my telnet window doesn't do japanese.

you probably have the same problem -- your terminal emulator, set of windows fonts, NLS_LANG -- whatever is wrong.

Globalization

A Reader, April 28, 2003 - 10:18 pm UTC

I read the subject oracle doc online. However, I would like to clarify some points with you.

(1) If my database char set is UTF8, does it mean that I can store chars longer than a byte (e.g. chinese or japanese chars) in CHAR or VARCHAR2 datatypes, even though I have not specified any National Char set when I created my database?

(2) Only if my client system or application is set to a character set (for example UTF8) can the clients input foreign chars such as Chinese to store in the database.

(3) Oracle does character set conversion, NOT data conversion. My understanding is that using globalization we can store chars in many languages in the database,
if the database character set supports them.

(4) Do I have to explicitly set the National char set while creating the database to use NCHAR or NVARCHAR datatypes in my tables?

I would greatly appreciate if you could please clarify these points. Thanks so much for your time. You are the best.

Tom Kyte
April 29, 2003 - 7:54 am UTC

1) yes -- the nchar stuff just lets you have two character sets at the same time.

UTF8 is a character set that permits you to store multi-byte data.

2) statement of fact, not a question. the client needs to use a character set setting that is appropriate for the data they are inserting. They need not be using utf8 -- we will convert using the rules ISO tells us to use. They (the chinese) would not use US7ASCII obviously -- but they could use Big5 or some other relevant character set.

3) yes.

4) no.

Thanks

A reader, April 29, 2003 - 10:06 pm UTC


UTF8 vs UTF-8

A reader, July 01, 2003 - 6:02 am UTC

I thought that I was beginning to get a handle on this multi-byte stuff but now I'm confused. (Sometimes the more you read, the more you realize how little you know!) The 9i Global. support guide states:

UTF8
The UTF8 Oracle character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms. The UTF8 character set supports Unicode 3.0. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. Supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes.

UTF-8
The 8-bit encoding of Unicode. It is a variable-width encoding. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes


UTF-8 *and* UTF8?? Even though the differences are listed above I'm not sure that I exactly understand the distinction. Which UTF-eight characterset has been the one discussed in this thread up until now -- UTF8 or UTF-8?

Thanks,

Tom Kyte
July 01, 2003 - 8:14 am UTC

utf8 is a character set

utf-8 is an encoding scheme (like an algorithm)


(hey, we don't make up these names, we just use them -- they are made up by others)


see
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/ch5.htm#1004607

utf8 is a characterset implemented using the utf-8 ENCODING scheme


nls issues

K.Venkat, October 01, 2003 - 9:31 am UTC

Dear Tom,

We have inserted japanese data, using a japanese keyboard, through our java application into a database with character set UTF8.
When we retrieve the data, either through the application or at the backend, it is displayed as '?????'.

Our NLS_LANG setting is AMERICAN_AMERICA.UTF8.

The output using the dump function on the column wherein we have inserted japanese data is given below:
SQL> select dump(pt_desc,1016) from w_type;

DUMP(PT_DESC,1016)
--------------------------------------------------------------------------------
Typ=1 Len=31 CharacterSet=UTF8: e2,80,9a,c3,8d,e2,80,9a,c2,b5,e2,80,9a,c3,86,20,
e2,80,9a,c3,8d,e2,80,9a,c2,b5,e2,80,9a,c3,86

Typ=1 Len=5 CharacterSet=UTF8: 3f,3f,3f,3f,3f
Typ=1 Len=5 CharacterSet=UTF8: 4d,55,54,48,55
Typ=1 Len=16 CharacterSet=UTF8: 4d,55,54,48,55,20,4d,41,52,54,48,41,4e,44,41,4d
Typ=1 Len=14 CharacterSet=UTF8: 3f,3f,3f,3f,3f,3f,3f,3f,3f,3f,3f,3f,3f,3f
Typ=1 Len=7 CharacterSet=UTF8: 3f,3f,3f,3f,3f,3f,3f
Typ=1 Len=7 CharacterSet=UTF8: 3f,3f,3f,3f,3f,3f,3f
Typ=1 Len=5 CharacterSet=UTF8: 3f,3f,3f,3f,3f
Typ=1 Len=5 CharacterSet=UTF8: 3f,3f,3f,3f,3f


What could be the possible reason for non-display of the japanese data? Is there any other setting we will have to look into and change?

Thanks and Regards,
Venkat

 

Tom Kyte
October 01, 2003 - 9:42 am UTC

fonts -- the client is not displaying it correctly. the data looks right, doesn't it (via dump)? it is a client display issue. when your java app retrieves it and shows it, what does it look like then?

nls issues

Venkat, October 01, 2003 - 9:49 am UTC

Dear Tom,

Thanks for your immediate response.
Is there any way to query and see the data in japanese at the back end, so that we can confirm that it is certainly not a data storage problem?

Thanks and Regards,
Venkat

Tom Kyte
October 01, 2003 - 10:18 am UTC

look at the dump output -- is it what you expected? that is what is stored.

nls issues

Venkat, October 01, 2003 - 10:09 am UTC

Dear Tom,

In the application, the display is '?????'.
Forgot to add this point.

Thanks and Regards,
Venkat

nls issues

Venkat, October 01, 2003 - 10:25 am UTC

Dear Tom,

The application team wants to see the japanese characters when they query this data at the back end through sqlplus to confirm that the storage and display of data is fine at the backend. What is the way to achieve this? We open a telnet session with the server and get connected to the database.

Thanks and Regards,
Venkat



Tom Kyte
October 01, 2003 - 1:42 pm UTC

find a client that can show you japanese? does your telnet client support it?

nls issues

Venkat, October 03, 2003 - 1:54 am UTC

Tom,

In the dump output, the character set of the database is displayed. How do we interpret it and confirm that the record is stored in utf8? A sample output of dump showing the particular record containing japanese data is given below:

DUMP(PT_DESC)
--------------------------------------------------------------------------------
Typ=1 Len=31: 226,128,154,195,141,226,128,154,194,181,226,128,154,195,134,32,226
,128,154,195,141,226,128,154,194,181,226,128,154,195,134

Typ=1 Len=5: 63,63,63,63,63
Typ=1 Len=5: 77,85,84,72,85
Typ=1 Len=26: 239,190,157,239,190,133,239,189,182,239,190,157,239,190,133,32,239
,189,182,239,189,189,239,189,189,32

Typ=1 Len=26: 239,190,157,239,190,133,239,189,182,239,190,157,239,190,133,32,239
,189,182,239,189,189,239,189,189,32

This is just to confirm to the application team that the data storage is correct in the database.

Thanks and Regards,
Venkat

Tom Kyte
October 03, 2003 - 8:19 am UTC

you'd have to look up the utf8 encoding scheme and interpret the output

Look at DOC Id 69518.1 in Metalink for a detailed answer to this query

Sanjay Jha, October 03, 2003 - 10:56 am UTC

UNICODE ENCODINGS

The Unicode Character Set has a few encodings, i.e. ways in which characters are represented by binary codes. The main one is UCS-2, in which exactly 2 bytes are used to store each character. Unicode codes of characters are conventionally specified in this encoding and denoted as U+nnnn, where nnnn is the hex UCS-2 code of the character in the range 0000-FFFF.

The second popular encoding is UTF-8 -- a multibyte encoding scheme where characters are stored in either 1, 2 or 3 bytes. Oracle RDBMS supports only the UTF-8 encoding. The corresponding database character set is named UTF8.
MAPPING BETWEEN UCS-2 AND UTF-8

For characters with UCS-2 codes in the range 0x0000-0x007f (0-127), the character code has no more than 7 significant bits, and the UCS-2 and UTF-8 code points (bit patterns) are the same, except that the UCS-2 code has 2 bytes and UTF-8 only one. If you want to use the CHR function to insert a Unicode character in this range into a UTF8 database, you simply use CHR(n) (where n is 0-127). UTF-8 codes in this range are identical to US7ASCII.

For Unicode characters with codes from 0x0080 (128) onwards, UTF-8 uses multibyte encoding, so you have to convert the UCS-2 code to the multibyte code point.

The mapping between UCS-2 and UTF-8 is as follows:

Encoding bits   Unicode (UCS-2)   UTF-8 bit pattern            UTF-8 byte size
-------------   ---------------   --------------------------   ---------------
7               0x0000 - 0x007f   0xxxxxxx                     1
11              0x0080 - 0x07ff   110xxxxx 10xxxxxx            2
16              0x0800 - 0xffff   1110xxxx 10xxxxxx 10xxxxxx   3
As an example, character U+00C2 is "A" with a circumflex above. The UCS-2 code is obviously 0x00c2. To find its UTF-8 code point:

1. 0x00c2 lies within the 11 bit range (i.e. there are from 8 to 11 significant bits in the code), so the UTF-8 bit pattern to be used is

   110xxxxx 10xxxxxx
      |___|   |____|
        |        |
        A        B

2. Convert 0x00c2 to its bit representation: 0000 0000 1100 0010

3. Convert 1100 0010 to the UTF-8 bit pattern:
   Take the rightmost 6 bits and slot them into position B: 1000 0010
   Take the remaining 5 bits (11-6) and slot them into position A: 1100 0011
   Any bit positions not used are set to 0.

4. The UTF-8 bit pattern is: 1100 0011 1000 0010

5. Convert this to decimal: 50050

The UTF-8 code point for UCS-2 character U+00C2 is 50050. You can use CHR(50050) to insert "A" with circumflex above into a UTF8 database.

A similar example can be given for the Euro currency symbol, codepoint U+20AC. 20AC = 10000010101100. We have to map this to the following 3 byte pattern:

   1110xxxx 10xxxxxx 10xxxxxx
       |__|   |____|   |____|
        |        |        |
        A        B        C

A = 0010, B = 000010, C = 101100 (note that we prefixed A with '00')

So we get: 111000101000001010101100 binary, that is 14844588 decimal.
A more handy place to find a UTF8 codepoint:
http://www.macchiato.com/unicode/charts.html
(the UCS2 hex codepoint corresponds with the UTF16 hex)

create table test ( testcol VARCHAR2(10) );
insert into test values (chr(50050));    -- U+00C2 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
insert into test values (chr(14844588)); -- U+20AC : EURO SIGN

If you want to insert into an Nvarchar column you need to use:

create table test ( testcol NVARCHAR2(10) );
insert into test values (chr(50050 USING NCHAR_CS));
insert into test values (chr(14844588 USING NCHAR_CS));

From V9 onwards you can also use this syntax:

* the Nchar:

create table test ( testcol NVARCHAR2(10) );
insert into test values (nchr(50050));
insert into test values (nchr(14844588));

* the UNISTR function, to insert unicode directly:

create table test ( testcol NVARCHAR2(10) );
insert into test values (UNISTR('\00c2'));
insert into test values (UNISTR('\20AC'));

The value is the UCS2 code for the symbol; you can append codes like this:

insert into test values (UNISTR('\20AC\00c2'));

Oracle will do no normalisation on this type of operation, so an insert of

insert into test values (UNISTR('\0041\0308'));

will insert 2 symbols, the A and the ¨, and not Ä ...
With the "dump" command you can see what is actually stored in the database, but you get the representation for each byte back. Hence, if you want to check the data in a UTF8 database, you need to convert the UCS-2 to UTF8 and then to the decimal for each byte.

For "A" with circumflex this gives 195,130
For the Euro symbol this gives 226,130,172

SQL> select dump(testcol) from test;
DUMP(TESTCOL)
--------------------------------------
Typ=1 Len=2: 195,130
Typ=1 Len=3: 226,130,172

The "Len" gives the number of BYTES the row physically uses in the database; if you store only one character, then you have of course the length of the character (3 bytes for the euro in utf8, as you can see).

Other options with "dump":

SQL> select dump(testcol,16) from test;
DUMP(TESTCOL,16)
----------------------------------------
Typ=1 Len=2: c3,82
Typ=1 Len=3: e2,82,ac

50050 decimal is C382 hex and 14844588 decimal is E282AC hex; adding a '16' to the dump command shows the data in hexadecimal format.

Side remark: how can you tell the number of bytes for a character encoded in UTF-8 (not UCS2!)? You look at the first hex digit and follow this rule:

If it has X     then the character is n bytes long
-----------     ----------------------------------
0 through 7     1
C or D          2
E               3
F               4

SQL> select dump(testcol,1016) from test;
DUMP(TESTCOL,1016)
-------------------------------------------------
Typ=1 Len=2 CharacterSet=AL32UTF8: c3,82
Typ=1 Len=3 CharacterSet=AL32UTF8: e2,82,ac

Adding a '1016' to the dump command shows the characterset used for the data and the data in hexadecimal format.

SQL> select dump(testcol,1010) from test;
DUMP(TESTCOL,1010)
---------------------------------------------------
Typ=1 Len=2 CharacterSet=AL32UTF8: 195,130
Typ=1 Len=3 CharacterSet=AL32UTF8: 226,130,172

Adding a '1010' to the dump command shows the characterset used for the data and the data in decimal format.

You can also use the convert function if you know the codepoint of a character in another characterset (for example, if you know the codepoint of the euro is 128 in the WE8MSWIN1252 characterset):

SQL> select dump(convert(chr(128),'UTF8','WE8MSWIN1252'),16) from dual;
DUMP(CONVERT(CHR(128)
---------------------
Typ=1 Len=3: e2,82,ac

The syntax is:

select dump(convert(chr(<Decimal value>),'<target_char_set>','<source_char_set>'),16) from dual;
 

multilingual database

Venkat, November 18, 2003 - 8:08 am UTC

Dear Tom,

The DB character set is UTF8. I'd like to know whether NLS_LANG needs to be set suitably during every insertion/querying of different language data.

For example, during insertion of japanese data, setting the value of NLS_LANG to AMERICAN_AMERICA.JA16SJIS, and setting it to AMERICAN_AMERICA.UTF8 for insertion of Hindi (one of the indian languages). Is this the way to go?

Regards,
Venkat




Tom Kyte
November 21, 2003 - 7:46 am UTC

not every statement

but for every SESSION -- yes, every SESSION would want to say what NLS_LANG they are using.

multilingual database

Venkat, November 25, 2003 - 4:40 am UTC

Dear Tom,

We would like to store japanese, korean and english data in our database with character set UTF8. My colleague says that UTF8 may not support all japanese characters and suggests going for AL32UTF8 as the character set and AL16UTF16 as the national character set. Is this correct? If so, kindly explain the rationale behind this. If not, what should be the value for the character set and national character set, and why?

Regards,
Venkat

Tom Kyte
November 25, 2003 - 7:52 am UTC

check out the globalization guide.

https://docs.oracle.com#index-GLO

I don't work with enough different charactersets personally (in particular, never worked with the languages you want to support) to answer.

US7ASCII TO UTF8

Alvin, December 01, 2003 - 2:20 am UTC

We have an existing db using the US7ASCII charset. If we made it UTF8, will there be performance issues, since characters will be represented by several bytes, ergo less data fitting in the block buffer cache?



Tom Kyte
December 01, 2003 - 7:10 am UTC

all of the us7ascii data will be a byte in utf8.

the performance "issues" could come from two places

a) string functions can no longer assume a byte per character. they may be more cpu intensive

b) increased storage


I truly hesitate to call them "issues" however, they are simply the price of playing the game here. If you have a multi-byte requirement, they are the price of admission

the buffer cache issue will be, in my opinion, "less than noise" performance wise.
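
You can see the "ascii stays single-byte" point with dump (a sketch; this is the expected output when run in a UTF8 database):

SQL> select dump('abc', 1016) from dual;

DUMP('ABC',1016)
------------------------------------------
Typ=96 Len=3 CharacterSet=UTF8: 61,62,63    -- still one byte per character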

support for multi language

Sanjiv S, January 07, 2004 - 6:37 am UTC

Dear Tom,
I have installed Oracle 9i with the UTF-8 characterset and nls_language as AMERICAN. I want to use this Database for storing ENGLISH and FRENCH Data. I want to know the following:
how do I log onto sqlplus with my language as FRENCH / ENGLISH? what are the parameters that I have to set? if I have logged in as French, then I want all the ora messages to be in french. when I alter the session and set the date format to dd/mm/yyyy it works, but when I alter the session and set the nls_language to french, it executes, and on giving an erroneous select statement it returns the ora message in English and not french. Pls. help

Tom Kyte
January 07, 2004 - 8:16 am UTC

it is all ready to store both. utf8 is perfect for that (overkill even).

the nls language just controls the language errors are shown in

do you have "french" installed? (i don't)

A reader, January 07, 2004 - 3:38 pm UTC

Suggest that you read the support Note:152260.1 and 179133.1


Tom Kyte
January 08, 2004 - 11:13 am UTC

Hmm, an article on getting the windows code page to show a Euro symbol

And an article on getting windows fonts to show correctly...

Interesting, but of what relevance to this thread exactly........

Strange problem...

Piotr Jarmuz, January 22, 2004 - 10:18 am UTC

Hello Tom,

I have the following:


----------------------------------------------------------------------------------
create table clob_table(pk number, text clob, primary key(pk));

create or replace java source named test2 as
import java.lang.*;
import java.sql.*;
import oracle.sql.*;
import oracle.jdbc.driver.*;
import java.io.*;

public class Test2 {
    // copy the bytes of file fn into the CLOB via its ascii output stream
    public static void save(String fn, CLOB clob) throws Exception {
        OutputStream out = clob.getAsciiOutputStream();
        InputStream z = new FileInputStream(fn);
        byte[] buffer = new byte[1024];

        int cnt;
        while ((cnt = z.read(buffer)) != -1) {
            out.write(buffer, 0, cnt);
        }
        z.close();
        out.close();
    }

    // standalone client test: insert a row with an empty CLOB, then fill it from the file
    public static void main(String s[]) throws Exception {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        Connection con = DriverManager.getConnection("jdbc:oracle:thin:@bill:1521:devdb", "USER", "PASSWORD");
        con.setAutoCommit(false);
        CallableStatement cs = con.prepareCall("begin insert into clob_table (pk,text) values (?,empty_clob()) returning text into ?; end;");
        cs.registerOutParameter(2, OracleTypes.CLOB);
        cs.setInt(1, 10);
        cs.execute();
        CLOB clob = ((OracleCallableStatement) cs).getCLOB(2);
        save("/var/tmp/plain.txt", clob);
        con.commit();
    }
}
/

create or replace procedure save(p_fname varchar2, p_clob clob)
as language java
name 'Test2.save(java.lang.String, oracle.sql.CLOB)';
/

declare
    l_clob clob;
begin
    insert into clob_table (pk,text) values (10,empty_clob()) returning text into l_clob;
    save('/var/tmp/plain.txt',l_clob);
end;
/

$> cat /var/tmp/plain.txt
über

----------------------------------------------------------------------------------


Then if I run the above Java program as a client, from windoze or unix, it stores the word `über' as it should:

Offset  Byte
------  ----
0        252
1         98
2        101
3        114

but if I deploy the Java program directly on the server and call the above anonymous block, it stores `?ber':

Offset  Byte
------  ----
0        191
1         98
2        101
3        114

My server is:

Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE 9.2.0.1.0 Production
TNS for Linux: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production

Both client and server use WE8ISO8859P1

Can you explain this?

Thank you in advance


Tom Kyte
January 22, 2004 - 6:42 pm UTC

before we go down this path -- why would you do this rather than:

dbms_lob.loadFromFile

?? why would you even dream of writing code?

Extra processing needed

Piotr Jarmuz, January 23, 2004 - 3:20 am UTC

Because I need to process the file before it is saved in the database. This processing is something pl/sql is not capable of doing (currently -- I know 10g could do that).



Tom Kyte
January 23, 2004 - 7:55 am UTC

hmm, processing plsql is not capable of doing.

such as???


but anyway -- it is most likely the fact that java is "ucs2" or whatever character set it demands and you are seeing the effects of character set conversion.

suggestion:

a) use dbms_lob.loadfromfile -- get into clob
b) have java process clob, not raw input file

see what happens then (but it'd be interesting to see what java can do that you could not do in plsql)

I found a solution

Piotr Jarmuz, January 23, 2004 - 3:58 am UTC

I use a Writer from clob.getCharacterOutputStream and explicit byte[] -> char[] encoding. And that works, albeit a little bit slower.

But still if you could explain the original problem. I'd be grateful. Understanding "why" is sometimes more important than finding an alternate solution.

Thank you very much.

Tom Kyte
January 23, 2004 - 7:55 am UTC

that would be a character set conversion issue -- by using "byte" (equivalent to raw) you are bypassing that. java is a different character set and you are seeing the defined character set conversion going on.

Thanks!

Piotr Jarmuz, January 23, 2004 - 8:33 am UTC

This "mysterious" ;) processing is zip compression. I omitted this in the original question for brevity. 9i can't do that natively, but 10g can. So I opted for java (I could do it in C as well).

We had similar issue already some time ago:
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:12980645053487,

But so far I have not had to store "strange" characters so the problem didn't manifest itself.

Well, that it is a character set conversion issue was clear. But why does it work on the client (exactly the same Java code) and fail on the server???

Maybe it is different Java versions?
Client 1.4.2_01-b06
Server 1.3.1

And maybe it is because of different "drivers". On the server it is the native "already there" connection (supposedly native code), while on the client it is a "normal" remote connection (java class). Whether it is a bug I can't tell.

Thank you very much,

Regards
Piotr


Issue is OCI versus thin driver

Piotr Jarmuz, January 26, 2004 - 3:05 am UTC

Hi again,

Just to make a final note (sorry for being persistent on this issue, but I could not sleep until I found an explanation ;))

So the little java client program above, when run (on the client's machine) with the java thin driver, does not do character set conversion -- correctly, IMO, since client character set = server character set -- but with the OCI driver it does do character set conversion and stores junk in the DB.

So now I am sure it is a bug, because the drivers should at least work consistently. Changing the driver should not change the semantics of a program.

Best regards,
Piotr


Solution to last question

Ajay, March 26, 2004 - 9:47 am UTC

Please see metalink Note 60942.1
Subject: JDBC Drivers and NLS
It answers your query.

Clarification on varchar2 and NVarchar2

Alvin, April 30, 2004 - 3:25 am UTC

Please help me clarify some language issues.

As I understand it:

1. If my db characterset is JA16SJIS, I can store japanese characters in column types char, varchar and varchar2.

2. Using the same charset as above (JA16SJIS), if I want to store Chinese chars in my db, they should be placed in columns defined as Nchar, Nvarchar and Nvarchar2, and I set my nls_language = chinese charset.

3. If I want to store both chinese and japanese in the same table, using the same columns defined as char, varchar, varchar2, my db characterset should be UTF8?

4. 'Fortunately' for us, our current db charset is US7ASCII, and I've read that if using US7ASCII it only takes the 'alter database character set xxxxxx' command to change the existing character set.


select * from nls_database_parameters;

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET US7ASCII
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_NCHAR_CHARACTERSET US7ASCII
NLS_RDBMS_VERSION 8.1.7.0.0



Tom Kyte
April 30, 2004 - 7:34 am UTC

2) well, your NCHAR characterset in the database should be something that supports your other characterset -- not your NLS_LANG (that would just cause chinese characters to be converted into japanese characters, probably destroying the data entirely).

UTF8 or some other generic characterset might be more appropriate for you (all around)

have you read the globalization guide?


3) most likely.....

4) see the NLS guide (globalization guide), chapter 3.

correction on the above post

Alvin, April 30, 2004 - 3:30 am UTC

#2 should be nls_nchar_characterset instead of nls_language

Globalization guide

Alvin, May 03, 2004 - 4:05 am UTC

I've 8i and 9i documentations available.

Since our db's are in 8i, I'm reading the 8i docs. There seems to be no globalization guide for 8i, just the Oracle8i National Language Support Guide Release 2 (8.1.6).

1. Where can I find documents or guides laying out which db charactersets are strict supersets of another characterset?

2. If my target characterset is a strict superset of my current characterset, do I need to do a full export/import? or just the suggested

---8<---
SQL> SHUTDOWN IMMEDIATE;   -- or NORMAL
    <do a full backup>

SQL> STARTUP MOUNT;
SQL> ALTER SYSTEM ENABLE RESTRICTED SESSION;
SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER DATABASE OPEN;
SQL> ALTER DATABASE CHARACTER SET <new_character_set_name>;
SQL> SHUTDOWN IMMEDIATE;   -- or NORMAL
SQL> STARTUP;

---8<---

3. How does the issuance of 'alter database character set xxxx' affect standby databases?


 

Tom Kyte
May 03, 2004 - 7:28 am UTC

the NLS guide is the globalization guide.

1) contact support, they might have one -- they are "standard", we just follow the standards as far as that goes.

2) if you can use the alter database -- that would be the preferred method. you would use the character set scanner to see if there will be any issues (a sketch of its use follows below).

3) good question -- for a physical standby, it should work, as it is just binary redo being generated and applied. for a logical one -- I'm not sure. that would be a question for support to research. I do not see anything written up about it.
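
The scanner mentioned in point 2 is the csscan utility. A hedged sketch of its use (exact parameters vary by release -- check the Globalization guide for your version, and the csminst.sql setup script must have been run first):

$ csscan system/manager FULL=Y TOCHAR=UTF8 LOG=scan
-- the generated report files list any data that would be lossy or
-- truncated under the new character set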

A reader, May 04, 2004 - 2:20 pm UTC

Tom,

I have 2 databases: db1 (9204) with the AL32UTF8 characterset and db2 (8174) with the WE8ISO8859P1 characterset.

I am trying to execute a procedure (passing some parameters) on 8174 from 9204 using db links. It throws an error: "database character set conversion not supported".

If I convert those parameters to raw and pass them, it works, but I don't want to do that as it involves lots of changes.

Can you please suggest some solution.

Thanks.


Tom Kyte
May 04, 2004 - 3:25 pm UTC

exact ora-nnnn error number is?

have you looked at "convert"?

A reader, May 04, 2004 - 9:08 pm UTC

Tom,

This is the error message.

Ora-12703 "This character set conversion is not supported"

Sorry Tom, I didn't get you on the convert one. Can you please clarify?

Thanks.






Tom Kyte
May 05, 2004 - 7:17 am UTC

show us the output from this, I'm assuming you had typos on the other characterset name?

tkyte@ORA9IR2W> select convert( 'hello world', 'al32utf8', 'we8iso8859p1' ) from
dual
2 /

CONVERT('HE
-----------
hello world

tkyte@ORA9IR2W>
tkyte@ORA9IR2W> select convert( 'hello world', 'we8iso8859p1', 'al32utf8' ) from
dual
2 /

CONVERT('HE
-----------
hello world

A reader, May 05, 2004 - 10:14 am UTC

Thanks for your feedback. Sorry, I typed it wrong; corrected.
As I mentioned before, it will be difficult to change the code.
Can we implement a trigger so that, for the user who executes these procedures, the parameter that is passed will be converted to we8iso8859p1 and vice versa?


Tom Kyte
May 05, 2004 - 2:22 pm UTC

don't know what you mean by "trigger" here?

A reader, May 05, 2004 - 6:39 pm UTC

What I meant is: can we enforce some plsql code so that whenever the procedure is executed, the parameter that is passed to the procedure is automatically converted to the we8iso8859p1 characterset?

Thanks.

Tom Kyte
May 05, 2004 - 8:22 pm UTC

it *should be*

if the test case I supplied worked -- there is no reason it would not be. meaning it would be "a bug" perhaps.

so, what was the outcome?

A reader, May 06, 2004 - 10:20 am UTC

Tom,

Here is the output,

Error on line 0
select convert( 'hello world', 'al32utf8', 'we8iso8859p1' ) from dual

ORA-01482: unsupported character set

Error on line 0
select convert( 'hello world', 'we8iso8859p1', 'al32utf8' ) from dual

ORA-01482: unsupported character set


Thanks.

Tom Kyte
May 06, 2004 - 10:59 am UTC

seems that the necessary language support files were not installed -- you'll need to have your DBA fix that. Until that script runs, you won't have any joy.

about nvarchar2

lizhuohua, May 11, 2004 - 9:32 am UTC

Dear Tom,
  I have this:
SQL> show user
User is "lizh"
SQL> drop table emp;
Table dropped
SQL> drop table dept;
Table dropped
SQL> create table emp as select * from scott.emp where 1=0;
Table created
SQL> create table dept as select * from scott.dept where 1=0;
Table created
SQL> alter table DEPT modify DEPTNO nvarchar2(20);
Table altered
SQL> alter table EMP modify DEPTNO nvarchar2(20);
Table altered
SQL> insert into dept select * from scott.dept;
4 rows inserted
SQL> insert into emp select * from scott.emp;
14 rows inserted
SQL> commit;
Commit complete
SQL> alter table DEPT
 2    add constraint pk_dept primary key (DEPTNO);
Table altered
SQL> alter table EMP
  2    add constraint pk_emp primary key (EMPNO);
Table altered
SQL> alter table EMP
  2    add constraint fk_dept foreign key (DEPTNO)
  3    references dept (DEPTNO);
Table altered
SQL> create or replace function gencommastr(pc in sys_refcursor)
  2  return varchar2
  3  is
  4    temp   varchar2(100);
  5    Result varchar2(1000);
  6  begin
  7    loop
  8      fetch pc
  9        into temp;
 10      exit when pc%notfound;
 11      Result := Result || ',' || temp;
 12    end loop;
 13    return nvl(substr(Result, 2),'null');
 14  end;
 15  /
Function created
SQL> select gencommastr(
  2         cursor (select ENAME
  3                 from emp
  4                 where DEPTNO = d.deptno)) result,
  5         deptno
  6    from dept d;
RESULT                               DEPTNO
------------------------------------ ----------------------------------------
null                                 10
null                                 20
null                                 30
null                                 40
SQL> ALTER SESSION SET NLS_NCHAR_CONV_EXCP=TRUE;
Session altered
SQL> select gencommastr(
  2         cursor (select ENAME
  3                 from emp
  4                 where DEPTNO = d.deptno)) result,
  5         deptno
  6    from dept d;
RESULT                               DEPTNO
------------------------------------ ----------------------------------------
null                                 10
null                                 20
null                                 30
null                                 40
SQL> select gencommastr(
  2         cursor (select ENAME
  3                  from emp
  4                 where DEPTNO = d.deptno)) result,
  5         deptno
  6    from (select to_char(deptno) deptno,dname from dept) d;
RESULT                               DEPTNO
------------------------------------ ------------------------------------
CLARK,KING,MILLER                    10
SMITH,JONES,SCOTT,ADAMS,FORD         20
ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES 30
null                                 40


-------------------------------------------------------------------------
   Why is the result "null" when I use Nvarchar2?

(SQL> select * from v$nls_parameters;
PARAMETER                        VALUE
-------------------------------- --------------------------------
NLS_LANGUAGE                     SIMPLIFIED CHINESE
NLS_TERRITORY                    CHINA
NLS_CURRENCY                     RMB
NLS_ISO_CURRENCY                 CHINA
NLS_NUMERIC_CHARACTERS           .,
NLS_CALENDAR                     GREGORIAN
NLS_DATE_FORMAT                  DD-MON-RR
NLS_DATE_LANGUAGE                SIMPLIFIED CHINESE
NLS_CHARACTERSET                 ZHS16GBK
NLS_SORT                         BINARY
NLS_TIME_FORMAT                  HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT             DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT               HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT          DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY                RMB
NLS_NCHAR_CHARACTERSET           AL16UTF16
NLS_COMP                         BINARY
NLS_LENGTH_SEMANTICS             BYTE
NLS_NCHAR_CONV_EXCP              TRUE
19 rows selected
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0    Production
TNS for 32-bit Windows: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

sql:
drop table emp;
drop table dept;
create table emp as select * from scott.emp where 1=0;
create table dept as select * from scott.dept where 1=0;
alter table DEPT modify DEPTNO nvarchar2(20);
alter table EMP modify DEPTNO nvarchar2(20);
insert into dept select * from scott.dept;
insert into emp select * from scott.emp;
commit;
alter table DEPT
     add constraint pk_dept primary key (DEPTNO);
alter table EMP
  add constraint pk_emp primary key (EMPNO);
alter table EMP
     add constraint fk_dept foreign key (DEPTNO)
     references dept (DEPTNO);
create or replace function gencommastr(pc in sys_refcursor)
    return varchar2
    is
      temp   varchar2(100);
      Result varchar2(1000);
    begin
      loop
        fetch pc
          into temp;
       exit when pc%notfound;
       Result := Result || ',' || temp;
     end loop;
     return nvl(substr(Result, 2),'null');
   end;
   /
select gencommastr(
           cursor (select ENAME
                   from emp
                   where DEPTNO = d.deptno)) result,
           deptno
      from dept d;
ALTER SESSION SET NLS_NCHAR_CONV_EXCP=TRUE;
select gencommastr(
           cursor (select ENAME
                   from emp
                   where DEPTNO = d.deptno)) result,
           deptno
      from dept d;
select gencommastr(
           cursor (select ENAME
                    from emp
                   where DEPTNO = d.deptno)) result,
           deptno
      from (select to_char(deptno) deptno,dname from dept) d;
) 

Tom Kyte
May 11, 2004 - 10:40 am UTC

you passed a nvarchar to a varchar -- converting it. using the explicit to_char, you change the behaviour (converted again).

use the same type everywhere

always beware IMPLICIT conversions.

about nvarchar2

lizhuohua, May 11, 2004 - 9:04 pm UTC

hi Tom,
  Thank you for your review.

>SQL> alter table DEPT modify DEPTNO nvarchar2(20);
>Table altered
>SQL> alter table EMP modify DEPTNO nvarchar2(20);
>Table altered

emp.deptno and dept.deptno are both nvarchar2(20).
Why are there IMPLICIT conversions?
And
  >ALTER SESSION SET NLS_NCHAR_CONV_EXCP=TRUE;
Why are there no errors reported if data is lost during the conversion?
(sorry for my poor english ;) )
Thank you in advance.

Tom Kyte
May 12, 2004 - 7:30 am UTC

in plsql there sure was

temp varchar2(100);


data CONVERSION is not an error, but rather the designed behaviour.

Just like when you


string := date;


you might "lose" the seconds, or the day, or whatever depending on your NLS_DATE_FORMAT. with character set conversions -- the rules are well defined "map this character to that character" -- that is what we do.

about nvarchar2

lizhuohua, May 12, 2004 - 11:40 pm UTC

hi tom,
 I have changed the datatype in my table and function:
SQL>DESC DEPT
    Name                        Null?    Type
    --------------------------- -------- ----------------
    DEPTNO                      NOT NULL NVARCHAR2(20)
    DNAME                                NVARCHAR2(20)
    LOC                                  NVARCHAR2(20)
SQL>DESC EMP
    Name                        Null?    Type
    --------------------------- -------- ----------------
    EMPNO                       NOT NULL NUMBER(4)
    ENAME                                NVARCHAR2(20)
    JOB                                  NVARCHAR2(20)
    MGR                                  NUMBER(4)
    HIREDATE                             DATE
    SAL                                  NUMBER(7,2)
    COMM                                 NUMBER(7,2)
    DEPTNO                               NVARCHAR2(20)

create or replace function gencommastr(pc in sys_refcursor)
    return nvarchar2
    is
      temp   nvarchar2(100);
      Result nvarchar2(1000);
    begin
      loop
        fetch pc
          into temp;
       exit when pc%notfound;
       Result := Result || ',' || temp;
     end loop;
     return Result;
   end;
   /


I get the same result!
I was confused by this.
Can you please point out where I am wrong?

Thank you.
 

Tom Kyte
May 13, 2004 - 10:12 am UTC

confirmed -- looks like something is wrong, i just copied deptno=10:

ops$tkyte@ORA9IR2> select emp.ename, dept.deptno
  2    from emp, dept
  3   where emp.deptno = dept.deptno;
 
ENAME      DEPTNO
---------- --------------------
CLARK      10
KING       10
MILLER     10
 
<b>it joins right, but</b>

ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select cursor (select ENAME
  2                     from emp
  3                     where DEPTNO = d.deptno ) result,
  4             deptno
  5        from dept d;
 
RESULT               DEPTNO
-------------------- --------------------
CURSOR STATEMENT : 1 10
 
CURSOR STATEMENT : 1
 
no rows selected
 

<b>no joy with the cursor variable...  But the value is getting passed in, but it looks strange on the way out:</b>

 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select d.deptno, cursor( select d.deptno from dual )
  2    from dept d;
 
DEPTNO               CURSOR(SELECTD.DEPTN
-------------------- --------------------
10                   CURSOR STATEMENT : 2
 
CURSOR STATEMENT : 2
 
DEPTNO
--------------------------------
 1 0
 
 
<b>here is a workaround for you if you like -- might be more efficient as well:</b>


ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> select dept.deptno, stragg(emp.ename)
  2    from emp, dept
  3   where emp.deptno = dept.deptno
  4   group by dept.deptno
  5  /
 
DEPTNO
--------------------
STRAGG(EMP.ENAME)
------------------------------------------------------------------------
10
CLARK,KING,MILLER
 

(search this site for stragg to pick up that user defined aggregate package)


Interestingly, in 8i and before, you would get:

ops$tkyte@ORA817DEV> select cursor (select ENAME
  2                     from emp
  3                     where DEPTNO = d.deptno ) result,
  4             deptno
  5        from dept d;
      from dept d
           *
ERROR at line 5:
ORA-00604: error occurred at recursive SQL level 1
ORA-12704: character set mismatch


if you would like -- i would pursue this via support with this very small test case:

drop table emp cascade constraints;
create table emp ( deptno nvarchar2(10), ename varchar2(10) );
drop table dept cascade constraints;
create table dept(deptno nvarchar2(10) );
insert into emp values( n'10', 'king' );
insert into dept values( n'10' );
commit;
                                                                                                                       
select emp.ename, dept.deptno
  from emp, dept
 where emp.deptno = dept.deptno;
                                                                                                                       
select cursor (select ENAME
                   from emp
                   where DEPTNO = d.deptno ) result,
           deptno
      from dept d;
                                                                                                                       
select d.deptno, cursor( select d.deptno from dual )
  from dept d;

 

lizhuohua, May 13, 2004 - 2:10 am UTC

hi tom,
In my database I need to store Chinese and English.
When the input mixes Chinese and English I use Nvarchar2 (to verify data length more easily); when the input is only English characters I use varchar2.
What do you think about it?
How can I avoid the problem that I asked you about previously?
Please give me some advice.
Thank you.


about nvarchar2

lizhuohua, May 16, 2004 - 1:42 am UTC

hi tom,
Thanks for the review.
I have done some test use your test case,this is the result:

lizh@lizhdb>alter session set nls_language=english;

Session altered.
(lizh: to display english).

lizh@lizhdb>drop table emp cascade constraints;

Table dropped.

lizh@lizhdb>create table emp ( deptno nvarchar2(10), ename varchar2(10) );

Table created.

lizh@lizhdb>drop table dept cascade constraints;

Table dropped.

lizh@lizhdb>create table dept(deptno nvarchar2(10) );

Table created.

lizh@lizhdb>insert into emp values( n'10', 'king' );

1 row created.

lizh@lizhdb>insert into dept values( n'10' );

1 row created.

lizh@lizhdb>commit;

Commit complete.

lizh@lizhdb>select emp.ename, dept.deptno
2 from emp, dept
3 where emp.deptno = dept.deptno;

ENAME DEPTNO
---------- --------------------
king 10

lizh@lizhdb>
lizh@lizhdb>
lizh@lizhdb>select cursor (select ENAME
2 from emp
3 where DEPTNO = d.deptno ) result,
4 deptno
5 from dept d;

RESULT DEPTNO
-------------------- --------------------
CURSOR STATEMENT : 1 10

CURSOR STATEMENT : 1

no rows selected


lizh@lizhdb>
lizh@lizhdb>
lizh@lizhdb>select d.deptno, cursor( select d.deptno from dual )
2 from dept d;

DEPTNO CURSOR(SELECTD.DEPTN
-------------------- --------------------
10 CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

DEPTNO
--------------------------------
1 0

and test another sql:
lizh@lizhdb>column deptno format a30
lizh@lizhdb>select d.deptno,
2 cursor (select deptno
3 from dept)
4 from (select to_char(deptno) deptno from dept) d;

DEPTNO CURSOR(SELECTDEPTNOF
------------------------------ --------------------
10 CURSOR STATEMENT : 2

CURSOR STATEMENT : 2

DEPTNO
------------------------------
10

A reader, July 07, 2004 - 9:20 pm UTC

Hi Tom,

Are AL32UTF8 and UTF-8 the same character set, or are they different?
Are UTF-16 and AL16UTF16 the same character set, or different?

Thanks

Tom Kyte
July 08, 2004 - 8:05 am UTC

AL32UTF8 is the Unicode 3.1 UTF-8 Universal character set.

AL16UTF16 is the Unicode 3.1 UTF-16 Universal character set.

Can we convert UTF-8 back to WE8ISO8859P1

A Reader, July 18, 2004 - 8:45 am UTC

Hi Tom,

If, for any reason, we want to convert from UTF-8 back to WE8ISO8859P1, can we do that, and what is the easiest way to do so?

Thank you for your help.

Tom Kyte
July 18, 2004 - 12:27 pm UTC

export/import

utf-8 is a super, not sub, set of we8iso
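
A hedged sketch of that export/import flow (connect strings are placeholders; anything stored that has no WE8ISO8859P1 representation will be lost or mangled on the way down):

$ export NLS_LANG=AMERICAN_AMERICA.UTF8                    # match the source db for export
$ exp userid=system/manager@utf8db   full=y file=full.dmp
$ imp userid=system/manager@we8isodb full=y file=full.dmp  # import converts into the we8iso target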

Why do we get this error ?

Nags, August 19, 2004 - 10:52 am UTC

When we run the following query why do we get this error ?

SELECT COMPANY_NAME
FROM COMPANY
UNION
SELECT '' AS COMPANY_NAME
FROM COMPANY

SELECT COMPANY_NAME
*
ERROR at line 1:
ORA-12704: character set mismatch

The column COMPANY_NAME is defined as NVARCHAR2(200).

We have all UTF8 data. Although it is UTF8, there is only English data right now. Why is it mandatory to put an N before the ''?

This query works :

SELECT COMPANY_NAME
FROM COMPANY
UNION
SELECT N'' AS COMPANY_NAME
FROM COMPANY

Why can't Oracle automatically convert it? Or is it supposed to work like that? This works in SQL Server.

Tom Kyte
August 19, 2004 - 10:54 am UTC

because nvarchar2 <> varchar2

and '' is a varchar2

and company_name is a nvarchar2

so -- they "mismatch".


it would be like

ops$tkyte@ORA10G> select 1 from dual
  2  union all
  3  select '1' from dual;
select 1 from dual
       *
ERROR at line 1:
ORA-01790: expression must have same datatype as corresponding expression

you have two incompatible datatypes.
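
Besides the N'' literal shown above, an explicit conversion also fixes the branch (a sketch; TO_NCHAR is the 9i conversion function):

SELECT COMPANY_NAME FROM COMPANY
UNION
SELECT TO_NCHAR('') AS COMPANY_NAME FROM COMPANY;
-- both branches are now NVARCHAR2, so no ORA-12704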
 

Why do we get this error ?

Nags, August 19, 2004 - 2:16 pm UTC

The error we are getting is

SELECT COMPANY_NAME
*
ERROR at line 1:
ORA-12704: character set mismatch

Doesn't this indicate that the character set is different?

The parameters we have are

NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16

Both are compatible, right ?

Tom Kyte
August 19, 2004 - 7:49 pm UTC

er? how about the entire statement, please.

How to Insert and Query Swedish Characters

Pawan, August 19, 2004 - 3:39 pm UTC

Here are some fields including Swedish characters that we want to load 
Vendor Name: Björneborgs Lådfabrik AB
Address Line 1: Köttgränd 4
Postal Code: 740 22
City: Bälinge
Contact First Name: Åke
Contact Last Name: Möhré
Bank Name: Föreningssparbanken
Bank Branch Address Street: Smålandsvägen 32
Bank Branch Address City: Västervik

    Here are some questions that I have:

- How will supplier names with special characters impact the load of data into the data warehouse? Will the data be rejected or accepted? Will the data display in Business Objects reports with the special characters or without?
- How will supplier names with special characters impact the load of data into a Java based application? Will the data be rejected or accepted? Will the data display online with the special characters or without?
- Can queries be run in a Java based application if the suppliers have special characters? Is there a workaround?
- Can queries be run in Oracle if the suppliers have special characters, since the keyboards in the US, Mexico and Hungary do not have these characters? Is there a workaround, like using ASCII?


    SQL> select * from V$NLS_PARAMETERS
    
    PARAMETER                         VALUE
    ------------------------- -----------------------------------
    NLS_LANGUAGE                          AMERICAN
    NLS_TERRITORY                         AMERICA
    NLS_CURRENCY                      $
    NLS_ISO_CURRENCY                  AMERICA
    NLS_NUMERIC_CHARACTERS            .,
    NLS_CALENDAR                          GREGORIAN
    NLS_DATE_FORMAT                   DD-MON-YY
    NLS_DATE_LANGUAGE                 AMERICAN
    NLS_CHARACTERSET                  WE8ISO8859P1
    NLS_SORT                              BINARY
    NLS_TIME_FORMAT                   HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT            DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT                HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT           DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY                 $
    NLS_NCHAR_CHARACTERSET            UTF8
    NLS_COMP                              BINARY
    NLS_LENGTH_SEMANTICS              BYTE
    NLS_NCHAR_CONV_EXCP               FALSE
    
    19 rows selected.
    
    SQL> select * from NLS_DATABASE_PARAMETERS
      2* where parameter ='NLS_CHARACTERSET'
    SQL> /
    
    
    PARAMETER                     VALUE
    -------------------------     -----------------------------------
    NLS_CHARACTERSET          WE8ISO8859P1
    

Thanks for the wonderful service 

Tom Kyte
August 19, 2004 - 7:59 pm UTC

given your character set, i don't see any special characters?

if you used US7ASCII, sure, but given that you are using WE8ISO8859P1, as long as your clients do too (NLS_LANG set in the environment/registry), they are "not special"

How to replace special characters

Pawan, September 02, 2004 - 4:14 pm UTC

Thanks Tom. But suppose I just wanted to replace a special character like å, ä etc. with, say, "a" and "A". How do I do it?



Tom Kyte
September 02, 2004 - 4:30 pm UTC

replace( string, '<special character tom cannot type>', 'a' )

or

translate( string, 'abc', 'xyz' )
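
If the goal is to fold every accented character to its base letter in one shot, converting through a 7 bit characterset is a commonly used trick - but the exact replacement characters depend on Oracle's conversion tables, so treat this as a sketch and verify it against your own data:

select convert('Björneborgs Lådfabrik AB', 'US7ASCII') from dual;

-- would typically return 'Bjorneborgs Ladfabrik AB'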



Why this?

pawan, September 04, 2004 - 1:17 pm UTC

Tom,
  I have a table tilde_temp2 - someone else created it so I don't know how they inserted the record. When I try to insert a record using the ASCII equivalent 148 of, say, an o (lowercase letter o) with two dots on top, this is what I get:

SQL>  desc tilde_temp2;
 Name                            Null?    Type
 ------------------------------- -------- ----
 VENDOR_NAME                              VARCHAR2(240)

SQL> select * from tilde_temp2;

VENDOR_NAME
----------------------------------------------------------------------------
--------
Lövånger Elektronik AB

SQL> col value format a30
SQL> select * from NLS_DATABASE_PARAMETERS where parameter like
'%CHARACTERSET';

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET               UTF8
NLS_NCHAR_CHARACTERSET         UTF8

SQL> select dump(vendor_name) from tilde_temp2;

DUMP(VENDOR_NAME)
----------------------------------------------------------------------------
------------------------
Typ=1 Len=24:
76,195,182,118,195,165,110,103,101,114,32,69,108,101,107,116,114,111,110,105
,107,32,65
,66

SQL> insert into tilde_temp2 values ('A'||chr(195)||chr(182)||'BCDE');

1 row created.

SQL> select * from tilde_temp2;

VENDOR_NAME
----------------------------------------------------------------------------
Lövånger Elektronik AB
A¿BCDE

SQL> Delete from tilde_temp2 where
vendor_name like 'A%'

1 row deleted.

SQL> insert into tilde_temp2 values ('A'||chr(148)||'BCD');

1 row created.

SQL> select * from tilde_temp2;

VENDOR_NAME
----------------------------------------------------------------------------
-
Lövånger Elektronik AB
A¿BCD

SQL> 

Why am I not getting an o with double dots on top? 

Thanks
 

Tom Kyte
September 04, 2004 - 1:45 pm UTC

what is YOUR characterset, your CLIENT'S characterset?

and dump the substr( vendor_name,2,1)

Characterset

pawan, September 04, 2004 - 1:53 pm UTC

Tom,
   Thanks for the reply. I am using SQL Plus to connect to the database. How do I find the client's character set. Here is the output you requested.

SQL> select dump (substr(vendor_name,2,1)) from tilde_temp2
/

DUMP(SUBSTR(VENDOR_NAME,2,1))
--------------------------------------------------
Typ=1 Len=2: 195,182
 

Tom Kyte
September 04, 2004 - 2:35 pm UTC

echo $NLS_LANG in unix

fire up your registry editor and start hunting away in that "easier" OS, Windows.

NLS_LANG

A reader, September 04, 2004 - 5:08 pm UTC

Thanks Tom,
I knew in Unix. Yes I have Windows and the registry setting is

NLS_LANG = AMERICAN_AMERICA.WE8ISO8859P1

Can I change it to UTF8?



Tom Kyte
September 04, 2004 - 5:28 pm UTC

yes, you can change it to utf8.

Time Zone

reader, January 08, 2005 - 4:41 pm UTC

How does Oracle handle day light savings times? Does it automatically change it for example, if I set my database time zone to US/PST? Thanks.

Tom Kyte
January 08, 2005 - 5:06 pm UTC

just like your OS does. we record the current time using your timezone stuff. meaning there will be two 2am's some days -- but they differ in the descriptive timezone text that comes with it.


day light savings time

reader, January 08, 2005 - 10:58 pm UTC

<quote>they differ in the descriptive
timezone text that comes with it.</quote>

Could you elaborate on this? Thanks.

so, if I set the database time zone to US/PST, will it fall back an hour and spring forward an hour automatically?


Tom Kyte
January 09, 2005 - 11:40 am UTC

EST vs EDT

on some day during the year, you have 2am EST and 2am EDT -- that tz text changes.

Just like on your computers. It works just the same.
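
The ambiguous hour can be pinned down by supplying both the region (TZR) and the abbreviation (TZD) - a sketch using the pre-2007 US rule, under which clocks fell back on October 31, 2004:

select to_timestamp_tz( '31-OCT-2004 01:30 US/Eastern EDT',
                        'DD-MON-YYYY HH24:MI TZR TZD' ) first_0130,
       to_timestamp_tz( '31-OCT-2004 01:30 US/Eastern EST',
                        'DD-MON-YYYY HH24:MI TZR TZD' ) second_0130
  from dual;

-- the same wall-clock 1:30am, one real hour apart - only the TZD text differs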

day light savings

reader, January 09, 2005 - 12:34 pm UTC

Thanks.

(1) should I set my time zone for the database to US/EDT to take care of this. Is there US/PDT for pacific?

(2) If I don't use the tz as EDT, then oracle would not take care of day light savings?

Thanks.

Tom Kyte
January 09, 2005 - 1:22 pm UTC

set it to the timezone you would have told your OS to set itself to.


whatever timezone you set, we use.

db timezone

rob, January 09, 2005 - 8:09 pm UTC

so, can the db timezone be different from the server (OS) time zone? When you say whatever we set at the OS level is used, then why do we have to set a time zone for the database? Maybe I am missing a point here. Thanks.

Tom Kyte
January 10, 2005 - 8:17 am UTC

you can set it to whatever you like, but setting it to be the same as the OS would make the most sense as they would both physically always be in the same TZ.
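
For reference, a minimal sketch of setting it - a named region such as US/Pacific tracks daylight saving automatically, which a fixed offset like '-08:00' would not:

alter database set time_zone = 'US/Pacific';  -- takes effect at the next restart

select dbtimezone from dual;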

8i national character set

reader, January 10, 2005 - 2:13 pm UTC

My understanding is that in 8i any character set can be used as the national character set. I was wondering, what is the point of using US7ASCII as the national character set, as it cannot store characters that need more than a byte? Thanks.

Tom Kyte
January 10, 2005 - 2:19 pm UTC

if it doesn't make sense to you, don't do it?

I had no reason to have it be anything else, no other value made any more "sense" as I don't use nvarchars

UTF8

A reader, January 18, 2005 - 5:46 pm UTC

My database is currently us7ascii, everyone is happy, no globalization needs yet. There is this one reporting app that we are considering that needs some database to park its metadata, needs minimal storage (<100MB). For Oracle, it requires the characterset to be UTF8, the installation aborts if it is not.

Does it make sense to convert the entire database to AL32UTF8 just for the sake of this app?

The other alternative is to create another 100MB database containing just this one schema, but then we have another little production database to manage, with its associated admin overhead.

Comments?

Thanks

Tom Kyte
January 19, 2005 - 10:12 am UTC

<quote>
Does it make sense to convert the entire database to AL32UTF8 just for the sake
of this app?
</quote>

there is just about 1 person in the world that can answer that and it is not me, it is YOU.

I don't like the dblink idea, it'll be problematic, especially with the character set conversions that will be constantly taking place back and forth between the two.

From UFT8 to WE8ISO8859P1

A Reader, February 02, 2005 - 11:19 am UTC

Hi Tom,

We have two databases, a new one (9iR2) uses UTF8 and the old one (8.1.4) uses WE8ISO8859P1. In both databases, there is a table that has NVARCHAR2 column (the tables are identical in both database). Currently, tables only have western characters. We want to transfer some new data into the old table. We can do the following in the new UTF8 database:

insert into t_we8@db_link_to_old select * from t_utf8;

The query is executed without any errors. However, when we select from the t_we8 table, there is no data showing in the NVARCHAR2 column.

Please help us with that. What is the correct way to transfer UTF8 data to WE8ISO8859P1 with NVARCHAR2 column?

Thank you in advance.

Tom Kyte
February 03, 2005 - 12:50 am UTC

have you tried good old export/import.

have you filed your tar with support about this issue?

Want to see day of sysdate in arabic !!!

Parag Jayant Patankar, May 03, 2005 - 9:09 am UTC

Hi Tom,

We are on Oracle 9.2.0.6, on which I have activated isqlplus. I am just trying to see sysdate in Arabic; for this reason, in isqlplus I am trying the following:

alter session set nls_language='ARABIC';

select to_char(sysdate, 'DAY') from dual;

I am also using IE 6, in which I am setting View -> Encoding -> Arabic (tried DOS/Windows etc.) and tried Unicode etc.

But it still shows question marks or some junk characters. Please guide me on where I have made a mistake and how to correct it.

regards & thanks
pjp

Tom Kyte
May 03, 2005 - 1:59 pm UTC

sorry - outside of my expertise, i simply don't have the facilities to play with it.

i don't use isqlplus at all.

A reader, May 03, 2005 - 3:06 pm UTC

Check the database characterset.

If I remember correctly it should be one
of AR8ISO8859P1/P6

UTF-8 should work as well. I have not tried this one for Arabic.


A reader, May 03, 2005 - 3:10 pm UTC

I forgot to include one more point in addition to database characterset.

Your NLS_LANG should be set appropriately as well.

Please refer to Oracle manual - NLS_LANG session to see a valid list of NLS_LANG settings and choose the one for Arabic.

Example : NLS_LANG=ARABIC_UNITED ARAB EMIRATES.AR8MSWIN1256




Tom Kyte
May 03, 2005 - 7:08 pm UTC

yes, but it needs to be setup for isqlplus
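
Independent of the client setup, the day name can also be requested per call via TO_CHAR's third argument, which sidesteps the session language entirely (whether the result then displays correctly still depends on the client characterset and fonts):

select to_char( sysdate, 'DAY', 'NLS_DATE_LANGUAGE=ARABIC' ) from dual;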

MULTILINGUAL DATABASE !!!

Parag Jayant Patankar, May 04, 2005 - 8:11 am UTC

Hi Tom,

I am planning to set up a multilingual database, especially for Asian languages, for an application on Oracle 9.2.0.6. As a first step I am trying to see the day of the month in a native language, e.g. in Arabic or Hindi or Tamil etc. My understanding is that the frontend should be able to display these special characters. For this reason I am trying to see the special characters through isqlplus, setting up IE6 for language support, which has failed.

Can you or anybody tell me how to do this for an Asian language, with an example? (isqlplus is not a mandatory requirement for me)

regards & thanks
pjp



Tom Kyte
May 04, 2005 - 9:47 am UTC

set your nls_lang on your client (can be an environment variable)

connect to database using client tool like SQLPLUS

query. data will be converted to be in the characterset of your client

nls_lang

Parag Jayant Patankar, May 05, 2005 - 9:05 am UTC

Hi Tom,

thanks for your valuable help to all oracle users/dbas, giving more insight into oracle. As per your suggestion I have tried to set nls_lang ( set nls_lang=HINDI_INDI.UTF8 ) and invoke sqlplus after setting up the language in Windows XP SR3, but it still shows junk characters. Will you please show me, with an example for an Asian language:

1. setting up nls_lang
2. executing sysday in native language
3. storing sysday in a table
4. extracting a data from a table

regards & thanks
pjp

Tom Kyte
May 05, 2005 - 9:31 am UTC

I don't have anything in an Asian language available to me, nor would I have the fonts necessary to display it. sorry.

CEL8ISO8859P14 vs UTF8

Jeeva, May 19, 2005 - 7:26 am UTC

Greetings Tom,
In one of our systems upgrade programmes we are migrating a 7.3 database to 10g. The characterset was US7ASCII. As part of the upgrade we also need to cater for Celtic language support, and Oracle provides the CEL8ISO8859P14 characterset for supporting Celtic languages. I opine that we should use this characterset.

However, the data architects believe the database characterset should be UTF8, since it would take care of the Celtic languages too. As for the issue of BYTE or CHAR length semantics, they recommend setting the session NLS_LENGTH_SEMANTICS parameter, so that we can reuse all the sql scripts, stored procedures etc. They opine that storage is not an issue, so we should go for UTF8 even if the database balloons to a bigger size.

I read in one of your answers above that string manipulation is CPU intensive and may get slower if the database characterset is set to UTF8. On checking the code and business logic, I found there isn't much string manipulation such as substr or instr to make life worse.
I feel UTF8 is overkill.
Is there any other issue I might have overlooked that may help me in convincing the data architects to consider CEL8ISO8859P14 over UTF8?
Regards
Jeeva



Tom Kyte
May 19, 2005 - 8:16 am UTC

for future flexibility, they are correct -- what if you need to accept other things that don't fit in the celtic character set?

At the end of the day, this probably isn't worth arguing about.

Thanks

Jeeva, May 19, 2005 - 8:21 am UTC

Thanks Tom,
Your view has struck a chord in me. There isn't anything to argue about on this issue.


WE8ISO8859P1 or UTF8

DB, June 03, 2005 - 2:20 pm UTC

Hi Tom,

I'm in the process of creating an Oracle 10g database. No specific user request with respect to the character set. Given this situation, which do you recommend WE8ISO8859P1 or UTF8?

Thank you

Tom Kyte
June 03, 2005 - 5:13 pm UTC

I'd recommend getting a specific user request.

*I* use we8iso on asktom, but only because *I* have no intentation of ever needing multi-byte data in my particular database.

If I did, it would be different (maybe, there is always nvarchar2)

Use AL32UTF8 or UTF8

Mike Friedman, June 04, 2005 - 8:05 am UTC

Face it... just because you have no intention of using other character sets does not mean that everyone else will cooperate with you.

Look here: 中文

I just typed some Chinese. Doesn't matter what you planned - I didn't cooperate.

What will you do when you integrate with a vendor system and you find out that some product descriptions have random foreign characters? Or double length english like this: Ｄｏｕｂｌｅ　Ｌｅｎｇｔｈ？ Or Asian style punctuation like this？ 。。。。，（）

These days Unicode is the default choice。

Tom Kyte
June 04, 2005 - 8:38 am UTC

But I don't care what you typed; I don't want it, my application doesn't want it, I cannot handle it. Not because my database is we8iso -- but because I don't want it.

And if I wanted it, I have the nvarchar2, nclob to deal with it.

You see, if there was only one choice that should be made in all circumstances -- well, we wouldn't waste the effort implementing them all would we?

So, I'll stick with what I've said.

NLS_LANG or TRANSFORM

Ashley, July 01, 2005 - 7:28 am UTC

Tom I have imported several tables that contain arabic data which I would like to convert to english. Whats the best way to do this?

Tom Kyte
July 01, 2005 - 10:20 am UTC

you'll need to find some software that converts Arabic into English, and hire a human being who can validate that the conversion was even remotely accurate....



arabic into english

Ashley, July 01, 2005 - 10:57 am UTC

Ok so there is no way we can do this with NLS_LANG, TRANSLATE or CONVERT functions? If not do you recommend any 3rd party software that can do this?

Tom Kyte
July 01, 2005 - 12:02 pm UTC

I've never been faced with the problem of language translation, no.

Maybe you need to restate your requirement

Michael Friedman, July 01, 2005 - 11:50 am UTC

Are you really asking if the Oracle DB can translate Arabic text into English? Machine translation?

I'll admit it's not totally unreasonable - the company that created the Oracle Context technology (now part of Intermedia) was working on machine translation before Oracle bought them - but it's not on the horizon for any of the major DB vendors.

There are third party tools that can do a semi-decent job providing a good starting point for human translators (e.g. Systran), but I suspect that you are looking for something different, like a Roman character transliteration.

Suggest you clarify your requirement.

internationalization

thirumaran, July 12, 2005 - 8:06 am UTC

Hi Tom,

Please explain the pros and cons of character sets like UTF8 vs. western alphabet sets - the performance tradeoffs.

An example would be very helpful.

Tom Kyte
July 12, 2005 - 5:10 pm UTC

pro of utf8, it can hold any data, it is multibyte
con of utf8, it can hold any data, it is multibyte

Maybe this beta version of what'll be in my next book will help...

<quote>

NLS Overview
NLS stands for National Language Support and it is a very powerful feature of the database, but one that is often not as well understood as it should be.  NLS controls many things about our data – how it is sorted, whether we see commas and a single period in a number – eg: 1,000,000.01 – or many periods and a single comma – 1.000.000,01.  But most importantly, it controls the encoding of the textual data as stored persistently on disk and it transparently converts data from character set to character set.  It is this transparent part that confuses people the most – it is so transparent you cannot even really see it happening.  Let’s take a small example.

Suppose you are storing 8 bit data in a WE8ISO8859P1 character set in your database, but have some clients that connect using a 7 bit character set such as US7ASCII.  These clients are not expecting 8 bit data and need to have the data from the database converted into something they can use.  While this sounds wonderful, if you are not aware of it taking place, you might well find that your data appears to “lose” characters over time, as the characters that are not available in US7ASCII are translated into some character that is.  This is due to the character set translation taking place.  In short, if you retrieve data from the database in character set 1, convert it to character set 2, and then insert it back (reversing the process), there is a very good chance that you have materially modified the data.  Character set conversion is typically a process that will change the data, and we are typically mapping a large set of characters (in this example the set of 8 bit characters) into a smaller set (that of the 7 bit characters).  This is a lossy conversion – we quite simply cannot represent every character.  But it must take place – we have to do this conversion.  If the database is storing data in a single byte character set but the client (say a Java application; that language uses Unicode) expects it in a multi-byte representation, it must be converted, simply so the client application can work with it.

You can see character set conversion very easily.  I have a database whose character set is set to WE8ISO8859P1, a typical Western European characterset:

ops$tkyte@ORA10G> select *
  2  from nls_database_parameters
  3  where parameter = 'NLS_CHARACTERSET';
 
PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_CHARACTERSET               WE8ISO8859P1

Now, if I ensure my NLS_LANG is set the same as my database characterset (Windows users would change/verify this setting in their registry):

ops$tkyte@ORA10G> host echo $NLS_LANG
AMERICAN_AMERICA.WE8ISO8859P1

We can create a table and put in some “8 bit” data – data that will not be usable by a 7 bit client, one that is expecting only 7 bit ASCII data:

ops$tkyte@ORA10G> create table t ( data varchar2(1) );
Table created.
ops$tkyte@ORA10G> insert into t values ( chr(224) );
1 row created.
 
ops$tkyte@ORA10G> insert into t values ( chr(225) );
1 row created.
 
ops$tkyte@ORA10G> insert into t values ( chr(226) );
1 row created.
 
ops$tkyte@ORA10G> select data, dump(data) dump
2    from t;
D DUMP
- --------------------
à Typ=1 Len=1: 224
á Typ=1 Len=1: 225
â Typ=1 Len=1: 226 
ops$tkyte@ORA10G> commit;

Now, if we go to another window and specify we are “a 7 bit ASCII” client, we’ll see quite different results:

[tkyte@desktop tkyte]$ export NLS_LANG=AMERICAN_AMERICA.US7ASCII
[tkyte@desktop tkyte]$ sqlplus /
SQL*Plus: Release 10.1.0.4.0 - Production on Mon May 30 15:58:46 2005
Copyright © 1982, 2005, Oracle.  All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production
With the Partitioning, OLAP and Data Mining options
ops$tkyte@ORA10G> select data, dump(data) dump
2   from t;
D DUMP
- --------------------
a Typ=1 Len=1: 224
a Typ=1 Len=1: 225
a Typ=1 Len=1: 226

Notice how in that session we received the letter ‘a’ three times, no diacritical marks.  However, the DUMP function is showing us that in the database – there are in fact three separate distinct characters, not just the letter ‘a’.  Our data in the database hasn’t changed, just the values this client received.  And in fact, if this client were to retrieve that data into host variables:

ops$tkyte@ORA10G> variable d varchar2(1)
ops$tkyte@ORA10G> variable r varchar2(20)
ops$tkyte@ORA10G> begin
2  select data, rowid into :d, :r from t where rownum = 1;
  3  end;
  4  /
PL/SQL procedure successfully completed.

And do nothing whatsoever with it, just send it back to the database:

ops$tkyte@ORA10G> update t set data = :d where rowid = chartorowid(:r);
1 row updated.
ops$tkyte@ORA10G> commit;
Commit complete.

We would observe in the original session that we have lost one of the original characters; it has been replaced with the lowly 7 bit ‘a’, not the fancy ‘a’ we had previously:

ops$tkyte@ORA10G> select data, dump(data) dump
  2  from t;
 
D DUMP
- --------------------
a Typ=1 Len=1: 97
á Typ=1 Len=1: 225
â Typ=1 Len=1: 226

That shows the immediate impact of an environment with a heterogeneous character set, where the clients and database use different settings.  It is something to be aware of, if you were not expecting it.  It comes into play in many circumstances – for example, if the DBA uses the EXP tool to extract information, they may note:

[tkyte@desktop tkyte]$ exp userid=/ tables=t
Export: Release 10.1.0.4.0 - Production on Mon May 30 16:19:09 2005
Copyright © 1982, 2004, Oracle.  All rights reserved.
 
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses WE8ISO8859P1 character set (possible charset conversion)
…
That warning should be regarded very seriously.  If you were exporting this table with the goal of dropping the table and using IMP to recreate it – you would find that all of your data in that table was now lowly 7 bit data!  Beware the unintentional character set conversion.

But also be aware that, in general, these conversions are necessary: if the client is expecting data in a specific character set, it would be disastrous to send it information in another character set.

NOTE: I highly encourage everyone to read through the Oracle “Globalization Support Guide”.  It covers NLS related issues to a depth we will not cover here.  Anyone creating applications that will be used around the globe, or even across international boundaries – and who isn’t today – needs to master the information contained in that document.

Now that we have a cursory understanding of character sets and the impact they will have on us, let’s take a look at the character string types provided by Oracle.
</quote> 

"ORA-12704: character set mismatch" for NVARCHAR2 datatype

Rajashekar Wadeyar, July 25, 2005 - 12:46 am UTC

Hi Tom,

We have an Entity Bean which has a CMP field of type String, mapped to the NVARCHAR2 datatype in Oracle 9.2.
We are using WebLogic 9.0 Beta as the application server. While inserting a row into the database by creating an entity bean, we get the following error: "ORA-12704: character set mismatch" for the NVARCHAR2 datatype.

The above thing is working fine for WEBLOGIC9.0 beta/MSSQL, JBOSS4.0.0/MSSQL and JBOSS 4.0.0/ORACLE 9.2 combinations.

I have tried the following ways for fixing it:
1) Configured the connection pool with the different ORACLE drivers
2) Replaced nls_charset12.jar with orai18n.jar
3) set NLS_LANG=AMERICAN_AMERICA.UTF8 in the server startup script StartWeblogic.cmd
4) Created Database with characterset UTF8 and National CharacterSet UTF8

Still the problem is not fixed. I tried Google to find a fix for this problem but did not get any solutions. Please give me a solution from your end to fix this problem.

Thanks for your
Raj

Tom Kyte
July 25, 2005 - 7:28 am UTC

sorry, I've never used weblogic. I would recommend you try support

Internationalization

thirumaran, August 16, 2005 - 1:08 pm UTC

Hi tom ,

I am working on internationalization in an Oracle 10g DB.
I need to migrate data from Oracle 10g R1 (western alphabet sets) to an Oracle 10g R1 "normalized DB" (the proposed version), and from Oracle 10g R1 to SQL Server 2000. Both Oracle 10g and SQL Server reside on Windows 2003 Server (Enterprise), a .NET platform.

i have table with the following structure

Source DB

LOCALE_TABLE
=============
PK_LOCALE_ID NUMBER PRIMARY KEY
LOCALE_LANGUAGE VARCHAR2(50)

CONTACTS_TABLE
=================
PK_CONTACTS_ID NUMBER PRIMARY KEY
FK_LOCALE_ID NUMBER
CONTACT_NAME VARCHAR2(50)
DESCRIPTION VARCHAR2(100)

Target DB
=========
CONTACTS_TABLE
=================
PK_CONTACTS_ID NUMBER PRIMARY KEY
FK_LOCALE_ID NUMBER
CONTACT_NAME VARCHAR2(50)
DESCRIPTION NVARCHAR(100)


The source data is in western alphabet sets. After migration, when a user selects a locale such as Japanese/Tamil/Telugu from the UI (user interface), he should be able to see the description in the Japanese/Tamil/Telugu language - or any language used in this world.

1) What character set do you recommend using in the destination/target DB, 10g R1 or R2 (the source is in western alphabet sets)?

2) Can I go ahead using western alphabet sets as the db character set for the target, and set the national character set (not the db character set) to UTF-8, AL32UTF8 or AL16UTF16?
Is this a right or wrong approach, and which Unicode character set should I use for the DB and the national character set?

3) When using Windows 2003 Server, is it recommended to use UTF-16?
</code> http://msdn.microsoft.com/library/default.asp?URL=/library/techart/IntlFeaturesInSQLServer2000.htm <code>

4)say "Hello" is stored in a db table column, should i stored the equavelent language unicode value in a seperate table so that i can view them.

Thanks
Thirumaran





Tom Kyte
August 17, 2005 - 11:59 am UTC

1) you left out the most important detail -- what languages you need to support.

Have you read the globalization guide?
</code> http://otn.oracle.com/pls/db10g/portal.portal_demo3?selected=3#index-GLO <code>

Probably a UTF characterset.

2) you'll want to read that guide and based on your needs, come to your conclusions. You could do it with the nvarchar vs varchar types, but if you want unicode, why not go unicode.

3) fine

4) that didn't make sense to me.

Translate Japanese data to English

Maxim, Singapore, September 06, 2005 - 2:54 am UTC

Hi Tom,

I have Japanese data in a multibyte database. Is it possible to translate a few fields in a table into English?

Please let us know. Going through your replies, I was not able to find an answer to this.

Tom Kyte
September 06, 2005 - 8:16 am UTC

Oracle does not have any translation software, no.

data in two different character sets,

sns, September 22, 2005 - 9:21 am UTC

We have a situation where the data could be in Chinese Traditional characters or Chinese Simplified. However, we are not able to distinguish which one is which.

The people from the Asia Pacific region need to convert the data to UTF8 format and send it to us in the form of a text file. They are using the CONVERT function, but the data is in two different formats and one of them is coming out with junk characters. The character set they are using is WESTERN EUROPEAN (WE8ISO8859P1). Probably they are setting NLS_LANG at the host level to UTF8.

Is there a way to get the data converted to UTF8 irrespective of the source character set?

What I am thinking is: set NLS_LANG to Chinese Traditional at the host level, convert the column to Chinese Simplified (if the data is already Chinese Simplified, it would be returned as is), and later convert it again to UTF8.

Assuming my column name is ADDRESS, at the unix level I would do:

export NLS_LANG='CHINESE TRADITIONAL'

I thought of using CONVERT(ADDRESS,'CHINESE SIMPLIFIED'). This would convert Chinese Traditional to Chinese Simplified, and Chinese Simplified data would be returned as is. Is this assumption correct?

Later, I would use the CONVERT function again on top of that to convert to UTF8. So my final CONVERT statement looks like:

CONVERT( CONVERT(ADDRESS,'CHINESE SIMPLIFIED'), 'UTF8')

Does this work for me?

Thanks,


Tom Kyte
September 22, 2005 - 5:42 pm UTC

sorry, beyond what I know here -- if you don't know what the character set is, and we don't actually store the character set with each string... I don't know what to say.

I'm not familiar enough with these character sets to talk authoritatively about them.

Display of ??????????????????

Vikas Khanna, September 30, 2005 - 3:17 am UTC

Hi Tom,

One of our databases has been configured with the NLS parameters mentioned below. We receive the logs, SQL*LOADER pumps the data into the database tables, and the jobs run on the server itself; there are no clients involved. One of our findings reveals that we have certain columns with data showing as ??????????????????????????????. On further investigation (a dump of that column), it was seen that the string has chr(239), chr(191) and chr(189) representation values.

------------------------------------------------------------------------------------------------------------------------------------
Typ=1 Len=144: 239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,1
89,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,1
89,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,1
89,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,1
89,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189

sys@DWADM> Select chr(239) from dual;
C
-


sys@DWADM> Select chr(191) from dual;
C
-
?

sys@DWADM> Select chr(189) from dual;
C
-
?

sys@DWADM>

sys@DWADM> Select * from V$nls_parameters;

PARAMETER VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-YYYY
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

19 rows selected.

What should we change so that the representation of these characters doesn't show as ??????????.

Regards,
Vikas

Tom Kyte
September 30, 2005 - 9:30 am UTC

remember, in utf8 - a character does not take a byte, it can take many bytes -- those characters could be many bytes long.

Your NLS_LANG on the client must be right for the data.
Your terminal must be capable of displaying that data.
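
One more observation on this particular dump: the repeating byte triplet 239,191,189 is the UTF-8 encoding of U+FFFD, the Unicode "replacement character" - which usually means the characters were already replaced during loading, not merely displayed wrongly. Easy to verify:

select dump( to_char( unistr('\fffd') ) ) from dual;

-- in a database whose NLS_CHARACTERSET is UTF8: Typ=1 Len=3: 239,191,189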



Converting to UTF8

A reader, October 19, 2005 - 7:32 pm UTC

Tom,
We are running a J2EE application against an Oracle DB. We are in the process of changing the character set of the DB from WE8ISO8859P1 to UTF8. We have executed a character scan and are increasing the width of the columns reported by the character scan tool.

We are looking at changing the registry and environment variable settings for NLS_LANG on the application server (a client to the DB).

My questions
1) What should nls_lang be set to on the client to the DB (the Oracle app server machine) if the DB server has UTF8?
2) What changes should be made on the client to the app server (the end user's machine)?
3) Are there any other changes, other than the character scan on the DB and NLS_LANG on the client machines, that need to be looked into?

Thank you


Tom Kyte
October 19, 2005 - 8:01 pm UTC

1) the same. utf8

2) the nls_lang

3) full functionality test, test to scale, performance test.

You are going from a fixed width character set to a variable width one. substr( column, 1, 5 ) was EASY in we8iso - not so easy in utf8, test it.
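
A quick sketch of what changes - an accented character is one character but two bytes in UTF8, so the character and byte views diverge:

select length('Müller')  len_chars,   -- 6 characters
       lengthb('Müller') len_bytes    -- 7 bytes in a UTF8 database
  from dual;

select substr('Müller', 1, 2)  by_chars,   -- counts characters: 'Mü'
       substrb('Müller', 1, 2) by_bytes    -- counts bytes; may drop or mangle a
  from dual;                               -- character straddling the boundary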

Testing UTF8

A reader, October 24, 2005 - 7:15 pm UTC

I would like to get some guidance/examples for testing UTF8 conversion.

The following has been done so far
We did a character scan, fixed items reported by character scan,
Converted the db character set to UTF 8
Changed nls_lang on the db server, app server to UTF8

The test I have in mind is
Create a temp table with a varchar2 column.

Inserting data
1) Get the dump code of a few special characters(japaneses characters)
2) Log into SQLPlus on the db server.
3) Insert these special characters into the temp table, logging from SQLPlus on the db server.
4) Check the dump code of the data inserted.
5) If dump code values in step #1 and step #4 are the same, then test is successful.

Other test
1) Log into SQLPlus from the application server and do the same test.
2) Log into the application and insert data into the temp table or one of the different tables.

Compare the before and after dump code values, if all of them are the same then data is being inserted properly.


What I woud like to know is
1) Are the test cases I have correct?
2) Can you provide an example which shows the before and after dump values?

As always, appreciate all your help


Tom Kyte
October 25, 2005 - 1:30 am UTC

you need to have your nls_lang set correctly everywhere as well.

The tests are valid, if that is what you want to test.


Not sure what you are trying to test however. the sql function "dump" will "dump" the data.

RE: Testing UTF-8

Duke Ganote, October 25, 2005 - 11:19 am UTC

Might also try some the methods discussed on related threads, such as </code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10250420313043#29873762616953 <code>

creating a UTF8 spool file with sqlplus

A reader, December 06, 2005 - 12:33 pm UTC

hi tom,

i want to create a comma-separated *.csv file by using a simple sqlplus script and spooling the results.

the data is stored in a 9.2 database having UTF8 as database characterset.

my question is whether it is enough to set the client characterset to UTF8 too for getting the spool file in UTF8 characterset?

i thought about the font. is it necessary to have an appropriate font even if i just want to store but not display the data?


Tom Kyte
December 07, 2005 - 4:59 am UTC

SQLPlus spool files are not UTF-8 files, they are plain "text files" (they do not have the 3 byte BOM in front of them for example)

UTF8 spool files

A reader, December 07, 2005 - 8:37 am UTC

hi tom,

thanks for your answer. let me refine.

the problem i have is to unload data from a UTF8 database, create a *.csv file and load the data into another database - probably also UTF8 - using this *.csv file afterwards.

i'm not sure what i have to consider regarding charactersets in that case!? will it be enough to change the client characterset to UTF8, unload and reload the data? or will i get conversion errors in that case?


Tom Kyte
December 08, 2005 - 12:58 am UTC

why not export it, that'll solve all characterset issues.

RE: UTF8 spool files

Duke Ganote, December 07, 2005 - 5:31 pm UTC

Why not ASCIISTR any string columns when spooling the contents into a CSV, then UNISTR when reading the results?
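
A minimal sketch of that round trip (é is U+00E9):

select asciistr('Péter') from dual;
-- P\00E9ter       -- pure ASCII, safe in any spool file

select unistr('P\00E9ter') from dual;
-- Péter           -- decoded back on the target side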

Tom Kyte
December 08, 2005 - 1:49 am UTC

why not export/import ? less work, deals with any issues...

different between CHAR & NCHAR

Ed, December 08, 2005 - 3:35 am UTC

hi tom,

What is the main difference between CHAR & NCHAR? As we know, both datatypes can hold data up to 2000 bytes in length. NCHAR is a Unicode-only datatype, and CHAR is not.

I have created a TEST table to store both CHAR & NCHAR columns, and started to insert ASCII characters and unicode characters into both. Both were inserted successfully. I can hardly differentiate between them; a CHAR value can store Unicode characters as well.

When is the right time to use NCHAR, and when it is not? Can you show us some example?

TQ







Tom Kyte
December 08, 2005 - 7:11 am UTC

Depends on your database setup if these are even "different".  Depends on your client characterset as well.

Here is a short snippet from my recent book on them, chapter on datatypes:


<quote>
The “N” Variant

So, of what use are the NVARCHAR2 and NCHAR (for completeness)? They are used in systems where the need to manage and store multiple character sets arises. This typically happens in a database where the predominant character set is a single-byte fixed-width one (such as WE8ISO8859P1), but the need arises to maintain and store some multibyte data. There are many systems that have legacy data but need to support multibyte data for some new applications, or systems that want the efficiency of a single-byte character set for most operations (string operations on a fixed-width string are more efficient than on a string where each character may store a different number of bytes), but need the flexibility of multibyte data at some points. 

The NVARCHAR2 and NCHAR datatypes support this need. They are generally the same as their VARCHAR2 and CHAR counterparts, with the following exceptions:

    *    Their text is stored and managed in the database’s national character set, not the default character set. 

    *    Their lengths are always provided in characters, whereas a CHAR/VARCHAR2 may specify either bytes or characters.

In Oracle9i and above, the database’s national character set may take one of two values: UTF8 or AL16UTF16 (UTF-16 in 9i; AL16UTF16 in 10g). This makes the NCHAR and NVARCHAR types suitable for storing only multibyte data, which is a change from earlier releases of the database (Oracle8i and earlier allowed you to choose any character set for the national character set).
</quote>

ops$tkyte@ORA9IR2.US.ORACLE.COM> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_NCHAR_CHARACTERSET         AL16UTF16
...
NLS_CHARACTERSET               WE8ISO8859P1
...
20 rows selected.


Here is a "quick" difference:

ops$tkyte@ORA9IR2.US.ORACLE.COM> create table t ( x varchar2(1), y nvarchar2(1) );

Table created.

ops$tkyte@ORA9IR2.US.ORACLE.COM> insert into t values ( unistr( '\00d6' ), unistr('\00d6' ) );

1 row created.

ops$tkyte@ORA9IR2.US.ORACLE.COM> select dump(x), dump(y) from t;

DUMP(X)
---------------------------------------------------------------------------------------------------------
DUMP(Y)
---------------------------------------------------------------------------------------------------------
Typ=1 Len=1: 214
Typ=1 Len=2: 0,214


<b>the unicode was converted to we8iso for X, stored "naturally" for Y - different number of bytes...</b>
 

UTF8 spool files

A reader, December 12, 2005 - 8:05 am UTC

hi tom, me again,

maybe we can and will do the data transfer using exp/imp, but we have to do it via a *.csv file for now! besides, it's also a very interesting problem!

so again, how can we transfer UTF-8 encoded data via a spool file? using this asciistr / unistr approach another user stated? or are there any better ways?

Tom Kyte
December 12, 2005 - 8:15 am UTC

(always amazed when someone says "yes, that would work (exp/imp), but we still have to do it this other way even though it is much harder, less reliable, and we are not really sure how to do it in the first place")..... oh well.

write a program in a language that gives you the ability to create a unicode file.

or you can try the uni/ascii-str approach.


Me, I would consider nothing but exp/imp (or expdp/impdp), probably cause I'm lazy.

Insert Romanian value

Kshitij, December 14, 2005 - 1:38 pm UTC

Tom

I want to insert Romanian values into table:

O/S - WinXP

set NLS_LANG=ROMANIAN_ROMANIA.EE8MSWIN1250


SQL> create table try (name varchar2(100));

Table created.


SQL> insert into try values('Rãmâneþi conectat în toatã');

1 row created.

SQL> select * from try;

NAME
----------------------------------------------
Ramâne_i conectat în toata

But when I perform the select, certain characters are not displayed as they should be; for instance, the second character is different from what we inserted.

Kindly advise where I am wrong.

Tom Kyte
December 14, 2005 - 2:18 pm UTC

windoze uses the registry - not sure what environment you were using. did you set this in the registry? and further, what is the character set that the DATABASE supports?

if the database characterset differs from client - then client data is converted to databases character set.

Is there any issue if UTF8 used in JAVA and ZHS16GBK used in Oracle?

A reader, December 17, 2005 - 5:01 am UTC

Hi Tom,

JAVA <------> JDBC <-------> Oracle
===================================
JAVA:UTF8(JDBC 2.0)
Oracle:ZHS16GBK (9.2.0.5)
===============
Unicode(UTF8) is used in JAVA.
ZHS16GBK Character set is used in Oracle Database.
The language is Chinese.
-------------------------------
I worry that the difference in character sets between Java and Oracle will cause a problem with character conversion.
It is a superset-to-subset conversion when writing data from Java to Oracle. Will the conversion cause any issue? Can the JDBC driver convert it correctly? Could you give me a detailed and exact explanation?

Thanks

Tom Kyte
December 17, 2005 - 11:15 am UTC

there will be character set conversion taking place, there has to be.

will it convert it correctly - yes, according to the rules.

will it cause you any "problems", cannot say.

WE8ISO8859P1 - list of characters supported

candba, December 20, 2005 - 3:05 pm UTC

HI Tom,
Is there a link where we can see a list of all the characters supported by a specific character set? I tried looking on metalink and in the globalization doc but could not find it. We are mainly interested in the WE8ISO8859P1 character set and whether we could get a list of all the French characters it supports.

Thanks for your help as always .

Tom Kyte
December 20, 2005 - 3:32 pm UTC

You'll want to look up the standard on that

</code> http://www.htmlhelp.com/reference/charset/ <code>

seems to have it.

candba, December 20, 2005 - 9:27 pm UTC

As usual you are right on target......

Thanks a lot.

Romanian Raptor !!

Duke Ganote, January 04, 2006 - 8:54 am UTC

Well, I'm practically giddy. I tried Raptor with the Romanian test posted by Kshitij back up this thread at:
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5783936214008#53305665471667 <code>
Using the script:
create table try (name varchar2(100));
insert into try values('Rãmâneþi conectat în toatã');
select * from try;
these results were displayed:

NAME ------------------------------------------------------------
Rãmâneþi conectat în toatã

1 rows selected

Never could do that in SQL*PLUS !!

Not yet totally enRAPTORed...

Duke Ganote, January 04, 2006 - 4:24 pm UTC

another doubt with Multilingual Database Application

Archana, March 09, 2006 - 6:52 am UTC

Hi Tom,

We are using a multilingual database for our application. I have a doubt about PL/SQL coding.

Consider following code,

create or replace procedure test_proc(v_flag char)
is
begin
if v_flag = 'Y' then
dbms_output.put_line('display True');
else
dbms_output.put_line('display False');
end if;
end test_proc;
/


Here I am checking whether the parameter passed is True or False. The parameter value is given by the user from the frontend (Java).

Now, in the case of Thai and Japanese charactersets (just for example; there can be other languages as well), these characters will differ.

I want to know how I can handle this kind of situation for all languages at once, so that I need not make any code changes for the various languages.

Can you help in this case?

Regards,
Archana.

Tom Kyte
March 09, 2006 - 2:27 pm UTC

what character do you think they will be passing?

The database has a characterset, the client has one, and the client's characterset is converted to the database character set - what issue do you foresee?

Also, that test_proc is technically "wrong"

if v_flag = 'Y' then
true
elsif v_flag <> 'Y' then
false
else
unknown :)
end if;



UTF8 and AZ8ISO8859P9E

Totu, March 09, 2006 - 7:28 am UTC

Dear Tom.
Both client/server OS is Win 2000.
My 10G DB Characterset is UTF8. And NLS_LANG set to AZ8ISO8859P9E at client side.

Using sqlplusworksheet:

insert into valyuta (valyutaname) values('ü');
insert into valyuta (valyutaname) values('ö');
insert into valyuta (valyutaname) values('ğ');
insert into valyuta (valyutaname) values('ı');
insert into valyuta (valyutaname) values('ə');

select * from valyuta;

Result:
VALYUTAID VALYUTANAME
---------- --------------------
46 ü
47 ö
48 ?
49 ?
50 ?

5 rows selected.

Sure, I checked the AZ8ISO8859P9E code page in the Oracle Locale Builder list. All the above characters are inside it.

What can I do?

Thanks in advance.




NCHAR vs Unicode characterset

Jelena, July 18, 2006 - 12:20 pm UTC

Hi Tom,
at the moment we have:
PARAMETER VALUE
------------------------------
NLS_CHARACTERSET WE8ISO8859P1
NLS_NCHAR_CHARACTERSET AL16UTF16

Now, there is a need to store customer data (in Russian, Chinese) in some of the fields (name, description, owner etc).

What would you do? convert 200 columns to NVARCHAR or go for reinstalling db with real unicode char set?

For me, using NVARCHAR has the advantage that all 'db internal' strings are still 'small' and probably faster (like table names), and only the specific columns which need to store multibyte characters are multibyte. Any disadvantages you can see?

In that case, what would the migration be - simply:
insert into <new_table>
select convert(lastname,'we8..','al16'), ... from <old_table> ?
Thank you for all the useful info!



Tom Kyte
July 19, 2006 - 8:28 am UTC

given the amount of testing you are going to have to do, it might just be time to bite the bullet and go multi-byte - across the board. It'll support your existing customers and new ones without change.

(you might want to use varchar2(20 CHAR) syntax when you do this instead of just varchar2(20)...)
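
That is (table and column names hypothetical), with CHAR semantics the column keeps holding 20 characters no matter how many bytes each one needs:

create table customers
( lastname varchar2(20 char)   -- 20 characters, not 20 bytes
);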

NCHAR vs Unicode characterset

Jelena, July 19, 2006 - 1:01 pm UTC

Thanks, yes you are right - if we test it all then do a complete solution.
But one more interesting thing i read in the guide: "Your SQL statements and PL/SQL code contain Unicode data: You must use a Unicode database ... A common place to use Unicode data in a SQL statement is in a string literal."
So what exactly is meant by sql statement (containing string literal), really raw sql statement without values for bind variables? Or in other words - in a "non unicode" db, if you use a bind variable, can you insert a multibyte value in a NVARCHAR column as a 'simple' string?

Tom Kyte
July 20, 2006 - 7:55 am UTC

n'string'

is the syntax for a simple string that is of nvarchar type.
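
For example (table and data hypothetical):

create table t_n ( name nvarchar2(20) );

insert into t_n values ( n'Müller' );   -- N-quoted literal: national characterset
insert into t_n values ( 'Müller' );    -- plain literal: passes through the database
                                        -- characterset first, which is exactly the
                                        -- loss the guide is warning about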

utf8 and clob

A reader, August 10, 2006 - 11:25 pm UTC

Hi Tom
My database's characterset is UTF8. I have a table with a CLOB column. My client is the Windows XP Simplified Chinese version, and my NLS_LANG is .ZHS16GBK. I saved a Chinese text file and used dbms_lob.loadfromfile to load the Chinese text into that CLOB table, but selecting from the table displays garbage rather than the correct Chinese characters. Is this related to the database's UTF8 characterset?

Thanks!

Tom Kyte
August 10, 2006 - 11:52 pm UTC

depends on the character set of the client loading the data, the characterset of the data in the file, and the character set of the client reading the data.

If they are all "correct" for the data as it was being touched, it should be correct.

However, I have absolutely zero access to any chinese data - I will not be able to test a single thing for you.

(note that clob data is stored encoded differently depending on the database version)

Alberto Dell'Era, August 11, 2006 - 4:57 am UTC

I'd take a look in the docs at dbms_lob.loadCLOBfromfile, the charset-aware version of dbms_lob.loadfromfile ...
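
A sketch of what that might look like - the directory, file name, table and source characterset here are all hypothetical; the point is the bfile_csid parameter, which tells Oracle what the file is encoded in so it can convert to the database characterset during the load:

declare
  l_clob     clob;
  l_bfile    bfile   := bfilename('MY_DIR', 'chinese.txt');
  l_dest_off integer := 1;
  l_src_off  integer := 1;
  l_lang     integer := dbms_lob.default_lang_ctx;
  l_warn     integer;
begin
  insert into t ( x ) values ( empty_clob() ) returning x into l_clob;
  dbms_lob.fileopen( l_bfile, dbms_lob.file_readonly );
  dbms_lob.loadclobfromfile(
    dest_lob     => l_clob,
    src_bfile    => l_bfile,
    amount       => dbms_lob.lobmaxsize,
    dest_offset  => l_dest_off,
    src_offset   => l_src_off,
    bfile_csid   => nls_charset_id('ZHS16GBK'),  -- characterset of the FILE
    lang_context => l_lang,
    warning      => l_warn );
  dbms_lob.fileclose( l_bfile );
  commit;
end;
/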

UTF8 and AL32UTF8

ST, September 15, 2006 - 2:25 pm UTC

As per suggestion from Oracle, "recommended characterset for 10g OCS database is AL32UTF8". But my database NLS_CHARACTERSET is UTF8. In application tier, OPMN.XML, we have
<variable id="NLS_LANG" value="AMERICAN_AMERICA.AL32UTF8" append="false"/>

Do we need to change the value from AL32UTF8 to UTF8 in the OPMN.XML file or need to change NLS_CHARACTERSET in 10g OCS database to AL32UTF8 from UTF8?

We were getting ORA-01461: can bind a LONG value only for insert into a LONG column ORA-06512: at "ES_MAIL.ES_MESSAGE_API", line 2496

For the above error it was suggested that we change the value from AL32UTF8 to UTF8 in the OPMN.XML file. Please advise.

Tom Kyte
September 15, 2006 - 2:27 pm UTC

this would best be handled via support - OCS (Collaboration Suite) is an "application" and you need advice from the people that manage that particular application.

UTF8

sam, October 25, 2006 - 6:27 pm UTC

Tom:

Based on what you said below, does that mean that if I set my client (SQL*Plus) session to UTF8, it will convert my data coming out of the RDBMS in WE8ISO to UTF8 on the client?
<quote>
just set your NLS_LANG correctly -- you will find that your NLS_LANG is set
differently in 8i than in 9i. This CLIENT setting dictates to the database
what character set it should expect from the client and the characters get
converted appropriately.
</quote>


Tom Kyte
October 25, 2006 - 9:44 pm UTC

that is what character set translation is sort of all about - yes.

suggestion (made on other pages to you as well): read globalization guide.

A reader, November 06, 2006 - 5:58 pm UTC

Hi Tom,
We are in the process of developing our application to support Chinese, Japanese, Korean and Western European languages. Therefore, we agreed to use AL32UTF8 as character set and AL16UTF16 as national character set.

Our consultant is recommending to change every VARCHAR2 to NVARCHAR2 and CHAR to NCHAR.
Following is their explanation why we need to change VARCHAR2 to NVARCHAR2 and CHAR to NCHAR.
Could you advise if there is a performance concern if the type is NVARCHAR2 vs. VARCHAR2 because of conversion, even after the database is Unicode? My understanding is that if the client is using a different character set, character conversion occurs irrespective of NCHAR or CHAR. Could you please advise.
Thank you,
V
Consultant’s explanation:
The server database character set specifies the character set to be used for varchar data. It is also the character set that is used when converting between the client and server. To store multi-lingual text in a varchar2 field, you’ll want to be sure to specify the server database character set as UTF-8. This will limit any loss of data transferred between client and server. The client character set is usually specified in NLS_LANG. If this is not set to a character set which is a subset of the server database character set then you’ll likely lose data (on insert, select, etc.).

The national character set is used for nvarchar data. This is always Unicode.

Runtime itself is based on Unicode and uses Oracle’s OCI wide char API. This means that there is no character set conversion required when manipulating nvarchar data. There is, however, conversion required when manipulating varchar data – NLS database charset <-> server database charset. Therefore, performance should be greater if you use nvarchar data types for your string data.

When comparing storage requirements, UTF-8 requires more storage for CJK (Chinese, Japanese, and Korean) languages – a minimum 3 bytes per kanji character, maximum of 4 bytes per character. UTF-8 requires less storage non-CJK languages – usually only 1 byte per char with a maximum of 2 bytes per character. UTF-16 always requires 2 bytes (and 4 bytes for very rare CJK characters). So overall, you may find you’ll need up to twice as much storage for switching to nvarchar data types.


Tom Kyte
November 07, 2006 - 4:30 pm UTC

ask them what that AL32UTF8 thing is - it is, well, the very same thing (Unicode) they get with the N stuff.

if the client and server have the same character set, then varchar2 would require no conversion.

I'm not sure why they are recommending Nvarchar2.

A reader, November 07, 2006 - 6:13 pm UTC

Hi Tom,
Could you please advise if the following statement is correct:

Internally Oracle processes most characters in AL16UTF16 encoded strings as fixed-width characters, so Oracle can usually process UTF-16 encoded strings faster than UTF-8 encoded strings (which are variable width)
Thank you,
V

Tom Kyte
November 08, 2006 - 8:18 am UTC

if some characters are not - then any character might not be - hence they would be about the same.

If someone is trying to say "this is faster than that" ask them "got benchmark so we can evaluate the overall difference?"

A reader, November 15, 2006 - 11:00 pm UTC

Hi Tom,
We are in the process of developing an application to support CJK and Western European languages.

We use AL32UTF8 as the character set and AL16UTF16 as the national character set.

We are developing the application using UTF-16 Unicode (the OCI wide API) because of its faster string processing capability.

1. Is it a good approach to change every CHAR and VARCHAR2 column in the database to NCHAR and NVARCHAR2, to avoid character set conversion from UTF-8 to UTF-16?

2. Is the conversion from UTF-8 to UTF-16 considerable overhead?

3. We have mostly Western European characters (85%) and few Asian characters. I am guessing that if we convert every VARCHAR2 to NVARCHAR2 we may almost double the database size, and require double the SGA size for the same performance as before the conversion. Could you advise if this is a reasonable guess?

4. Our application team uses UTF-16 for faster string processing (over variable-width UTF-8), but the database uses UTF-8, not UTF-16, for the CHAR types. Is it a good approach to develop the application to use UTF-8 instead of UTF-16?
Thank you,
V


Require to display data in Spanish Langauge

Hemal, March 14, 2007 - 8:32 am UTC

Hello Tom,
I am using a 10g Database.
I have a table containing data in the English language.
There is a front end Java screen which fetches the data from this table and displays it.
This Java screen must display the data in the Spanish language.
Can you please advise me on the steps to be performed to display the data in Spanish?
Thanks and Best Regards
-Hemal
Tom Kyte
March 14, 2007 - 1:58 pm UTC

step 1: find language translation software
step 2: install it
step 3: use it

we do not translate from language to language - that would be something you have to do.

Actually, Oracle can translate... kind of

Mike Friedman, March 14, 2007 - 10:11 pm UTC

If you're feeling mischievous, just implement a PL/SQL function using UTL_HTTP to call the Google translation service (translate.google.com/translate_t).

It's quite amusing to see people's eyes bug when you show them a form that can translate arbitrary Chinese into English and vice versa and you tell them you implemented it in PL/SQL.

A reader, August 23, 2007 - 10:47 am UTC

Hi Tom,

We are generating XML using SQL/XML (SQLX) functions and sending the data to our sister company. When we send the data in XML, some of the special characters are not handled properly.

For example, we have an account like "Péter LLC". Here is the XML that we are sending...
SQL> SELECT XMLELEMENT("ACCOUNT", 'Péter LLC') FROM DUAL;

XMLELEMENT("ACCOUNT",'PÉTERLLC')
--------------------------------------------------------------------------------
<ACCOUNT>Péter LLC</ACCOUNT>

Our sister company wants the data in the following format:

<ACCOUNT>P'é'ter LLC</ACCOUNT>

They want 'é' instead of é. (Just putting quotes around it so that you can see which character we need.)

Can you please help?

Thanks


Tom Kyte
August 23, 2007 - 1:14 pm UTC

the special characters are not special, they are just characters.

You'll need to massage your data using replace to add these quotes around things you think need special handling.

A reader, August 23, 2007 - 7:34 pm UTC

Tom,

SQL> SELECT XMLELEMENT("ACCOUNT", 'Péter LLC') FROM DUAL;

XMLELEMENT("ACCOUNT",'PÉTERLLC')
----------------------------------------------------------------------------------------------------
<ACCOUNT>Péter LLC</ACCOUNT>


we want the character é as &#233; in the output (I hope you can understand what I want)


Thanks

Tom Kyte
August 24, 2007 - 2:47 pm UTC

Why? While that would be conforming XML, what you already have is conforming XML as well.

Your partner should accept standard, documented XML - that is the entire purpose of the standard. If they cannot, they are not accepting XML, they are accepting MUXML (made up xml) - and you'll be hard pressed to use any existing tools with them (since existing tools generate standard XML)

I believe you'll have to "replace" those characters with their entity encoding.

I understand what you say you think you want, it is just a strange request - one that doesn't really make a ton of sense (to me).
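
That said, if you do decide to hand them entity-encoded data anyway, here is a minimal sketch of the replace approach (SET DEFINE OFF keeps SQL*Plus from treating the ampersand as a substitution variable; only the characters you explicitly name get encoded):

SQL> set define off
SQL> select replace( xmlelement("ACCOUNT", 'Péter LLC').getstringval(),
  2                  'é', '&#233;' ) as encoded
  3    from dual;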

Why this Error

A reader, January 14, 2008 - 6:53 pm UTC

DB :- 9208
NLS_CHARACTERSET   :  UTF8
NLS_NCHAR_CHARACTERSET     : UTF8.

CREATE TABLE TEMP ( N NCLOB) ;

SQL> insert into temp values (1 ) ;
insert into temp values (1 )
                          *
ERROR at line 1:
ORA-12704: character set mismatch ;

SQL>  insert into temp values ('1') ;

1 row created.

Is this error related to Oracle not doing implicit conversion for NCLOB, or is it a genuine character set issue?

I ran this from a Windows client with NLS_LANG set to UTF8. I got the same error.

I ran this from the Unix DB server.
Got the same error.

Why this error? How do I solve it?

Thanx

Tom Kyte
January 15, 2008 - 7:10 am UTC

let me ask you -- why are you trying to stuff a number into a string?

you need a double conversion here:

a) number to string
b) string to nvarchar2

we'll do it once, but not twice, you'll get the same thing with dates...

ops$tkyte%ORA10GR2> insert into t values ( sysdate ) ;
insert into t values ( sysdate )
                       *
ERROR at line 1:
ORA-12704: character set mismatch


ops$tkyte%ORA10GR2> insert into t values ( to_char(sysdate) ) ;

1 row created.



the implicit conversion is from number to string, you need number to nvarchar2 - that is not implicitly available.
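
A minimal sketch of doing that first conversion explicitly yourself (reusing the TEMP table from above; TO_NCHAR has number and date overloads, so only a single implicit conversion remains):

SQL> insert into temp values ( to_nchar(1) );
SQL> insert into temp values ( to_nchar(sysdate) );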

save special characters in a column

Dibyendu, March 26, 2008 - 3:05 am UTC

In our application, when we copy-paste some text into a text area and try to save it in the database, the same text is not saved. What we were trying to save is: ¿jjjjjjjjjjj¿¿*******!!`sachin (copied and pasted from Microsoft Word), and what gets saved is: ¿jjjjjjjjjjj¿¿*******!!`sachin. When I tried to save the same text by executing a procedure from SQL*Plus, the text was saved properly. But from the application it is not the same.

Please suggest a solution. Do we need to change any character set?
I am using Oracle9i Enterprise Edition Release 9.2.0.1.0 on Windows XP. The character set used is WE8ISO8859P1.
Tom Kyte
March 26, 2008 - 8:57 am UTC

what is the character set of the client
and what is the character set of the database

MS Word, by the way, does NOT use WE8ISO8859P1; pasting from Word typically introduces lots of garbage that is not WE8ISO8859P1 ("smart quotes" for example!)

save special characters copied from ms word?

Dibyendu, March 26, 2008 - 4:57 am UTC

Actually, in the previous post I wrote '"' and ''' in Microsoft Word and pasted them, but they show up like '¿'. Please give some suggestion - do we need to change the character set, or is there some other solution?



Tom Kyte
March 26, 2008 - 9:01 am UTC

they are 'smart quotes'

they are not characters in that characterset, they do not exist, they are special to MS and their character set.

they would need to be converted into normal quotes
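
A minimal sketch of one way to clean up rows that already contain them, assuming the smart quotes arrived as the Windows-1252 code points 145-148 and assuming a hypothetical table DOCS with column BODY:

SQL> update docs
  2     set body = translate( body,
  3                           chr(145)||chr(146)||chr(147)||chr(148),
  4                           '''''""' );  -- curly singles -> ', curly doubles -> "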

save special characters copied from ms word?

Dibyendu, March 27, 2008 - 1:25 am UTC

Hi Tom,
Thanks for your response.
The character set for both database and client are: WE8ISO8859P1.
Could you please tell me if there is any other way to fix this issue? Will changing the character set help us? We need to support the French, Spanish, English and Portuguese languages in the application.
If we change the character set to WE8MSWIN1252, will the problem be solved?
Please suggest.
Tom Kyte
March 27, 2008 - 10:42 am UTC

... French, Spanish, English
and Portuguese languages in the application.
...

and none of those are in the super special MS character set they invented for themselves on Windows.

You would have to research the WE8MSWIN1252 character set, see if it is what Word/Windows uses, and whether it supports all of your other requirements as well. They are the standards - we just implemented them.

Hindi data in Linux command line

Orauser, April 09, 2008 - 3:35 am UTC

Hi Tom,
10.2
RHEL 4
Ours is a client/server terminal-based Forms 6i application.
The reports are generated through Pro*C.
I need to store Hindi data in some tables, and these tables are further processed in Pro*C code.
I have created a test database with the AL32UTF8 database character set and the AL16UTF16 national character set.
As per this link
http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm#_Toc110410565
I have set LC_ALL to UTF-8 and NLS_LANG=HINDI_INDIA.UTF8.
Now when I query the table in GUI SQL*Plus I can see the Hindi data, but in command line mode the query shows empty blocks.
Questions:
1) How do I see Hindi data in command line mode on Linux?
2) How do I generate reports containing Hindi data via Pro*C?

Many Thanks

Regards

Going "backwards" AL16UTF16 => UTF8 ???

Kim, April 11, 2008 - 8:21 am UTC

Hi Tom

We would like to consolidate our databases, and in doing this we would like to downgrade the surviving database's national character set from AL16UTF16 to UTF8.

It is NOT used in any application, and csscan reports 100% success (although it could not analyze some tables successfully).

Are we in for many issues or could it work?

Br
Kim
Tom Kyte
April 11, 2008 - 8:47 am UTC

why couldn't it review some of the tables?

Multilingual Database and UTF-8

Vijay, April 11, 2008 - 2:51 pm UTC

Hi Tom,
I found this site very useful with lot of details.

We have a database with database character set WE8ISO8859P1 (NLS_CHARACTERSET) and national character set (NLS_NCHAR_CHARACTERSET)AL16UTF16.

I am able to store names with extended ASCII characters (é, Ñ, etc). But when retrieving the same data (using a Perl script on Linux), the extended characters come back stripped (e, N, etc, or ?).
I have set my Windows and Unix (Linux) NLS_LANG to AMERICAN_AMERICA.WE8ISO8859P1. I even tried using AMERICAN_AMERICA.UTF8. Still no luck.

I tried using dump on the names. The code associated with each letter looks correct. It gives code 233 equivalent to é.

Am I missing something, please clarify.

Thanks in advance.
Tom Kyte
April 11, 2008 - 3:45 pm UTC

can you "prove this" using cut and paste?

when I use sqlplus - and ensure the NLS_LANG is set - no translation takes place (eg: it works)

now, I do not use perl, do not have perl... so, prove that perl isn't mucking up the environment.

Characterset information

Candy Peck, April 11, 2008 - 5:54 pm UTC

If what you are doing is putting data into a file, you will need to make sure whatever tool you have will render the characters correctly. We thought we had a problem at one place I worked, and it turned out it was working okay - the default font of the tool we were viewing the data with made it look like it was being messed up.

Also, the UTF8 character set is not the character set of choice any more; AL32UTF8 is what should be used. The UTF8 character set will not be expanded. AL32UTF8 will store the data in 1-4 bytes depending on the character. This keeps the database no bigger than necessary while still handling multibyte characters. Do not move from Western European to UTF8 - you would be wasting your time; it will not deal with all the character sets you are going to work with. I know because I did a conversion and the UTF8 character set did not work. We converted Korean, Chinese, Japanese and Cyrillic.
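
A quick way to check what a given database is actually using (both settings live in the nls_database_parameters view):

SQL> select parameter, value
  2    from nls_database_parameters
  3   where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');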

Is there any way out?

A reader, April 15, 2008 - 3:03 am UTC

Hi Tom,

I posted a question regarding Hindi data display in a Linux line-mode terminal. It is the fourth-from-last comment in this topic.

Please guide me regarding it.


Thanks


Tom Kyte
April 16, 2008 - 2:35 pm UTC

this is really a question for whomever wrote your terminal emulator. Usually - you can adjust the character set of your emulator via a menu or command line options when starting it
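
For the Linux command line part, a minimal sketch of the session setup, assuming a UTF-8 capable terminal emulator and that the Hindi locale is installed on the box (the credentials are placeholders):

$ export LANG=hi_IN.UTF-8
$ export NLS_LANG=HINDI_INDIA.AL32UTF8
$ sqlplus scott/tiger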

Gzip issue

vaibhav, June 23, 2008 - 1:47 am UTC

Hi Tom,

We have a test environment and a development environment.

I have a CLOB column where the stored data is compressed using gzip and then uncompressed back to its normal form during retrieval.

A simple ref cursor is used to return records to the front end, but this cursor fails to produce correct results in the DEV environment, while it works perfectly well in the TEST environment.

When I try to unzip the file in DEV, it says the file format is invalid, but it unzips fine in the TEST environment.

We have the same locales installed on both boxes. The only difference is that the development box is in India and the test box is in the UK.

I checked init.ora and all the settings are the same on both boxes.

The only difference I could see is that in DEV, when the file is zipped before inserting, the CRC is some d4... but after retrieving, this CRC changes to some b4b4.

In the TEST environment the CRC remains the same. I think this is the cause of the problem.


Tom Kyte
June 23, 2008 - 8:01 am UTC

I'll bet you are using a clob in the database.

and gzip data is RAW, binary, NOT TEXT


so you used the wrong type, you must, you need to, you have to, you can only use a blob for this data. There is no other type you can use - none.


What is happening - in some cases the client is using an NLS_LANG different than you are in the database (and you cannot control that, clients pick their NLS_LANG, you do not).

So, when they store the RAW DATA in a clob, we pretend that the data is character data (we have to, you told us it was) and we convert it from their character set to the databases character set.

And on the way back out - we do the same thing, we convert from the database character set to the clients.

So, say the client that inserts has character set CI, the database has character set D and the retrieving client has character set CS. We'll have taken your raw bits and bytes and converted

(pretend this raw data is CI data) convert that into D, take the D data and convert that into CS


and you get scrambled bits and bytes.


You have to replace your datatype and use a blob, that is the only way this will work. You do not have a choice there - you are in development, you are adding a new feature, you can make this change - you have to make this change.
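
A minimal sketch of the datatype to use (hypothetical table name); RAW and BLOB data passes through untouched, no matter what NLS_LANG any client uses:

SQL> create table gzip_docs
  2  ( id      number primary key,
  3    payload blob          -- the compressed bytes, never character-set converted
  4  );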

Thanks You

vaibhav, June 24, 2008 - 1:00 am UTC

Hi Tom,

thanks for that quick reply

the explanation was really great...no wonder it wasn't working.

I will change the datatype to blob and try again. I am sure it will work.

thanks again

Microsoft cut/paste

A reader, August 18, 2008 - 6:18 pm UTC

Here are our db settings:
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET AL32UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE


Would we be able to store "smart quotes" from Microsoft word cut/pasting in our db, or would we need something beefier - UCS-2 or UTF-16 characters?

Currently, customers cut/paste from word docs to their browser(conencting to a) JBOSS appserver to this database. The characters we see in the db are converted (ie - bullets and smart quotes are ?? or blocks).

Would we need something more then AL32UTF8?

UTF8 vs. UTF-8

Dave, October 27, 2008 - 10:58 am UTC

Tom,

There was a question about the difference between the UTF8 (Character set) and UTF-8 (Encoding scheme). Your answer was very clear but one question that I have is why Oracle is using UCS-2 Encoding scheme for such functions as UNISTR. Wouldn't it be handier to just use UTF-8 to avoid confusion or does it have a reason?


character sets

A reader, January 21, 2009 - 11:53 pm UTC

Tom:

We are planning to upgrade several Oracle 9iR2 databases (WE8ISO8859P1) to Oracle 11g.

Can you give me your opinion on the following:

1. The databases are mostly used to store book information in English. Some book titles are Spanish and require Spanish diacritics, but I believe this is currently supported by ISO-8859-1.

Some have mentioned creating the 11g database as Unicode to be able to generate XML files from a few columns in Unicode format.

Would you consider UTF-8 or UTF-16 for the new database for these reasons? Isn't UTF-8 the same as ISO-8859-1, since they both use 8-bit encoding?

2. If we decided to go Unicode, would it be a lot of extra effort to test the character set conversion, or would it be part of (included in) testing the 9i->11g upgrade?

3. Some may argue, "what do we have to lose if we go to a Unicode character set"? Can there be any data loss or performance impact or anything else?

thanks,
Tom Kyte
January 22, 2009 - 8:44 am UTC

1) You are upgrading, why talk of a "new database", you have an existing database, just upgrade it. A rebuild isn't suggested.

If your existing character set supports your needs and you don't reasonably expect to have to change - I'd leave it be.

2) It would be the same amount of testing - functionality test, load test.

3) Some might say "we are just upgrading, it'll be fast and easy, we don't want to unload and reload"

character set

A reader, January 22, 2009 - 8:32 pm UTC

Tom:

Can you elaborate on "unload and reload"?

Do you mean that if you change the character set you have to unload the data and reload it? I thought you did that when you upgrade too.

2. If the main reason is to get Unicode out of the database, do you suggest using NVARCHAR2 or the CONVERT function?

I am not sure why people get hung up on Unicode. Most if not all XML editors should read ISO-8859-1 with no issues.
Tom Kyte
January 23, 2009 - 7:47 am UTC

it depends on whether your character set is a pure subset of your new character set. If it is, you can alter the database, if it isn't, you'll need to re-encode your data (modify, change it, make the bits and bytes different than what they are now)
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#sthref157


when you upgrade, you just upgrade, it is done "in place", you do not unload/reload.

2) you don't need unicode, as stated. Why would main reason be that?

All xml editors have to, else they are not an xml editor.

Besides - who edits xml? You might look at it during development, but after that, it would be like editing a database by using a binary editor directly on database blocks. xml is a data interchange format.

nls_characterset

anupam pandey, February 06, 2009 - 12:03 am UTC

I am inserting

r/hpo]\xe2\x9d\xc9nu\xeb)\xae\x87^^\xb2\xcaa

this line into a DB table, but when I did the select and spooled the record I got the following output

r/hpo]\xe2\xbf\xc9nu\xeb)\xae\xbf^^\xb2\xcaa

The selected value is different from the one inserted.



DB NLS_CHARACTERSET ==AL32UTF8

Tom Kyte
February 06, 2009 - 3:28 pm UTC

and what is the client side NLS_LANG

nls_characterset

A reader, February 11, 2009 - 12:09 am UTC

client side NLS_LANG is set as follows :-
NLS_LANG = "AMERICAN_AMERICA.WE8ISO8859P1";

thanks And Regards,
Anupam Pandey
Tom Kyte
February 11, 2009 - 10:35 am UTC

your client is presumed to be transmitting single byte characters. It is quite simply not capable of inserting multibyte data.


As way of an analogy, you are speaking English, the database is speaking Japanese and there is a translator in the middle doing the best they can to make your english into equivalent japanese. But unfortunately, you think you can speak japanese and are trying to do so - causing the translator to completely garble the message.

Your client can only insert single byte data as it is configured.

nls_characterset

A reader, February 11, 2009 - 11:52 pm UTC

Whats the solution you suggest Tom.

Thanks And Regards,
Anupam Pandey
Tom Kyte
February 12, 2009 - 12:09 pm UTC

well, I fully recommend reading:

http://www.oracle.com/pls/db102/portal.portal_db?selected=3#index-GLO
the globalization guide...


from front to back so you understand what you are dealing with - and then - perhaps have the client use a character set that supports your needs

UTF8

Kiran, April 17, 2009 - 2:02 am UTC

Hi Tom,

Greetings!

Below are my NLS Char setting in my DB

---------------------------------
NLS_NCHAR_CHARACTERSET UTF8
NLS_SAVED_NCHAR_CS UTF8
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN


We are using local languages (Japanese, Korean, etc.) to store the vendors in my DB. Now I want to extract the vendor names in English during a SQL extract. I tried the CONVERT function in SQL Developer, but I am still not able to see the values converted from the local language to English. Please suggest how to retrieve the local language values in English.

Below is the extract from my DB

select vendor_name,convert(vendor_name,'UTF8') FROM PO_VENDORS ;


Vendor_name Convert(vendor_name, 'UTF8')
------------------------------------------------------
(¿)¿¿¿¿-¿¿¿ (¿)¿¿¿¿-¿¿¿
(¿)¿¿¿¿ (¿)¿¿¿¿
(¿)¿-¿¿¿¿¿¿¿¿ (¿)¿-¿¿¿¿¿¿¿¿
(¿)¿¿¿¿¿¿¿¿¿-¿¿ (¿)¿¿¿¿¿¿¿¿¿-¿¿
(¿)¿-¿¿¿¿¿ (¿)¿-¿¿¿¿¿
(¿)¿¿¿¿¿¿¿¿ (¿)¿¿¿¿¿¿¿¿
(¿)¿¿-¿¿-¿¿-¿¿¿¿¿¿¿ (¿)¿¿-¿¿-¿¿-¿¿¿¿¿¿¿
(¿)¿¿-¿¿¿¿¿ (¿)¿¿-¿¿¿¿¿

I need the vendor names to be converted into English in the extract; please advise.

Regards
Kiran
Tom Kyte
April 17, 2009 - 9:45 am UTC

well, NLS is not a language translator - it does not convert Japanese into English.

It allows you to store data in various charactersets.

If you need to translate that data, you'll have to do that

UTF

Kiran, April 17, 2009 - 5:17 pm UTC

Hi Tom,

Thanx for your Response.

Is there any possibility of getting the records in the local language while retrieving via SQL, instead of garbled values? If this is possible using any standard functions, please advise.

Regards
Kiran
Tom Kyte
April 20, 2009 - 10:52 am UTC

are you asking to get things TRANSLATED - for example, someone in France inserts "maison" and you retrieve "house"?

or are you asking more of a "display" thing - an "I want to retrieve the original French (8 bit) data, but display it on my US7ASCII (7 bit only) terminal" sort of question?

Storing UTF8 data in a US7ASCII DB with ISO8859 !!!

Seemit, October 23, 2009 - 4:28 pm UTC

Hi Tom,
DB : Oracle 10g on AIX. We use Putty as Terminal Emulator.



$ echo $NLS_LANG
AMERICAN_AMERICA.WE8ISO8859P1

SQL> select * from v$NLS_PARAMETERS where parameter like '%CHARACTER%';

NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET US7ASCII
NLS_NCHAR_CHARACTERSET AL16UTF16

We got a file from our source, who uses SQL Server on Windows. The export was in UTF8, which I tried to import as UTF8 with the above NLS_LANG. Here is a sample record value in question (look at the apostrophe-like char after E):

Patriarkhu Grigoriu E’ 11

After removing the CTRL+M characters, if I edit (vi) the file on Unix, it shows ^Y for that apostrophe. SQLLDR loads the ^Y as well (I saw it in a dump, ASCII 25).
I cannot even paste that apostrophe into a Unix window (it comes through as a dot), even after trying to change Putty's Translation setting to UTF-8 (just in case).

I tried SQLLDR with all permutations/combinations of CHARACTERSET(s) (and without any, too).
I looked at the ASCII value of the actual apostrophe: it is 146. Same result if I change the column datatype to/from CHAR/VARCHAR2.

Where am I off? Or am I entirely off? My guess is that the NLS_CHARACTERSET is US7ASCII and 146 doesn't fit into 7 bits.


Tom Kyte
October 26, 2009 - 12:46 pm UTC

DB : Oracle 10g on AIX. We use Putty as Terminal Emulator.

that is like saying "car: prius. i use a hand pump for my bike"

Not sure what the two have to do with each other right away....




You cannot edit that utf-8 file and make changes like that. Not at all. You messed the file up by editing it.

but basically, that utf8 data when loaded into a us7ascii database will have all kinds of changed data. TONS OF IT.

Not sure what you want to do here - but don't edit the file like that (you can tell sqlldr that the end of record is \r\n instead of just \n).


but taking utf8 data - that contains characters that are not in the ascii range of 0..127 - and loading it into a us7ascii database will absolutely lead to the data changing - perhaps a lot.
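
For the record-terminator part, here is a minimal SQL*Loader control file sketch (hypothetical file, table and column names; the CHARACTERSET clause tells sqlldr how the datafile is encoded, and the "str" clause makes \r\n the end-of-record marker so the file never needs editing):

LOAD DATA
CHARACTERSET UTF8
INFILE 'addresses.dat' "str '\r\n'"
INTO TABLE addresses
FIELDS TERMINATED BY '|'
( street CHAR(4000) )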

utf8

A reader, March 09, 2010 - 3:19 pm UTC

Tom:

Can you advise the best solution for this.

I have a web page that is rendered from a 9i database using ISO-8859-1.

I need it in UTF-8 format.

What would you do:

1) Create a new NCLOB column and store the XML file there. That would save it in UTF-16, which is the national character set. Then change the national character set from UTF16 to UTF8.

2) Use CONVERT(column1, destination charset, source charset) for each column in the cursor and then print the data. I would also need to use CONVERT('any string on the web page', destination, source).

3) Store the whole web page as a string and do CONVERT(big_string, destination, source).
Now you have a new web page in UTF-8.

4) None of the above - do this instead (tell me what it is)



WEBPAGE 1 - ISO88591

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE package PUBLIC "+//ISBN 008-1-9//DTD Package//EN" "test.dtd">
<metadata>
<Title>The moving seas </Title>
<Creator>Tom, Jones </Creator>
...other database fields.
<x-metadata>
<meta name="Publisher" content="SEA"/>
.....other
</x-metadata>
</metadata>

WEBPAGE 2 - UTF8

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE package PUBLIC "+//ISBN 008-1-9//DTD Package//EN" "test.dtd">
<metadata>
<Title>The moving seas </Title>
<Creator>Tom, Jones </Creator>
...other database fields.
<x-metadata>
<meta name="Publisher" content="SEA"/>
.....other
</x-metadata>
</metadata>

Tom Kyte
March 09, 2010 - 4:22 pm UTC

I would not do #1 - your choice of national character sets is limited in current releases. Not a good long term solution.


What generates your web pages? It could be as easy as "the thing that generates it sets its character set to UTF8".

utf8

A reader, March 09, 2010 - 4:47 pm UTC

Tom:

Thank you for the prompt response.

The page is generated using a PL/SQL stored procedure that has 5-6 cursors and then an HTML formatting section using HTP.PRINT(..).

Some HTP.P lines print the XML declaration lines and headers,
like

htp.p('<xml .......');

and some print the data retrieved from the cursor:

htp.p(v_title);

The page is also retrieved using MOD_PLSQL which, to my understanding, can upload/download files without character set conversion. I think mod_plsql is set to ISO-8859 too.

If using mod_plsql will not work, then I would have to store the file on a server and have it retrieved via a link (bypassing mod_plsql).

Please advise the best way to get this.
Tom Kyte
March 10, 2010 - 8:23 am UTC

... The page is also retrieved using MOD_PLSQL which my understanding can
upload/download files without character set conversion. ...

right, but you are not uploading/downloading a file stored in a blob, you are generating the page.

Setting the character set of the 'middle tier' client should do this - have you tried that? If you set the character set of the Apache mod_plsql environment to UTF8, then the database will expect that everything sent to it is UTF8 and everything it sends back is UTF8 (to and from that client with that character set).
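
For reference, that middle tier setting is a one-line DAD parameter - roughly like this sketch of a dads.conf entry (the location name is hypothetical):

<Location /pls/mydad>
  ...
  PlsqlNLSLanguage  AMERICAN_AMERICA.AL32UTF8
</Location>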

utf8

A reader, March 10, 2010 - 10:08 am UTC

Tom:

I can't change the character set for mod_plsql that easily; a lot of paperwork is needed for that.

But wouldn't that screw up things in other places? I have hundreds of pages that use ISO-8859-1. It would be better to have the client and database using the same character set rather than doing conversions all the time.

I would rather do this programmatically.

Here is what I came up with. Would this work?

1. Replace each HTP.P() call with a PRINT call.
2. Save the whole page to a CLOB variable.
3. CONVERT the CLOB from ISO-8859-1 to UTF8.
4. Print the new UTF8 variable.

The code is below. Do you think mod_plsql will still convert it to ISO-8859? I could also save the page to a GTT or a regular table and download it, since mod_plsql does not convert the character set that way.

PROCEDURE generate_utf8 (p_magno IN VARCHAR2 DEFAULT NULL)
IS
   l_my_var   CLOB;
   l_utf8_var CLOB;

   CURSOR my_cursor (p_id IN NUMBER) IS
      SELECT ........ result        -- query elided; "result" is an XMLType column
        FROM table1, table2
       WHERE ........ ;

   -- local helper: append to the CLOB instead of printing immediately
   PROCEDURE print (p_string IN VARCHAR2)
   IS
   BEGIN
      l_my_var := l_my_var || p_string;
   END;

BEGIN
   OWA_UTIL.mime_header('text/xml', TRUE);
   print('<?xml version="1.0" encoding="UTF-8"?>');
   print('<ROWSET>');

   /* open the cursor and accumulate the data */
   FOR x IN my_cursor(100)
   LOOP
      print(x.result.getclobval());
   END LOOP;

   print('</ROWSET>');

   /* after building the whole ISO-8859-1 page in l_my_var, convert it to UTF-8 */
   l_utf8_var := CONVERT(l_my_var, 'AL32UTF8', 'WE8ISO8859P1');

   HTP.p(l_utf8_var);
END;

please advise.


Tom Kyte
March 10, 2010 - 10:26 am UTC

well, the mod_plsql client is going to sort of dictate the characterset sent back from the database. You are going to have strange things going on if the client using character set X sends/receives from a database with characterset Y using characterset Z.

I cannot advise you - too many unknowns. If you have a requirement to return character set X, it needs to be to a client willing to accept character set X.

utf8

A reader, March 10, 2010 - 11:18 am UTC

Tom:

OK, then

1) Can't I write the above l_utf8_var to a file on the server (UTL_FILE) and have a URL that opens it directly? That way the browser will access the file directly (no mod_plsql).

2) I do not understand why I can't create a new "NCLOB" or "NVARCHAR2" or "NBLOB" column that stores UTF8 data.
Then mod_plsql can retrieve the file without any character set conversion.

2.6 File Upload and Download
mod_plsql allows you to:

Upload and download files as raw byte streams without any character set conversions. The files are uploaded into the document table. A primary key is passed to the PL/SQL upload handler routine so that it can retrieve the appropriate table row.

Tom Kyte
March 11, 2010 - 7:54 am UTC

Sam, I often feel like I'm going around and around in circles with you over and over and over, again and again...


1) don't forget to study and research and learn what it means to be a unicode file, don't forget to learn what the format of such a file is, and then learn how to write it.


2) you can, problem is YOUR CLIENT IS NOT A UTF8 client. Do you understand what character set conversion is and does? It converts charactersets. If you purposely return utf8 data to a client that says "I speak X, not UTF8" we will convert utf8 into X for you - it is what characterset conversion does.

I know perfectly well what file upload and download does, problem is you are neither uploading nor downloading a file.






utf8

A reader, March 11, 2010 - 8:34 am UTC

Tom:

I understand client character set conversion - mod_plsql or any client will convert the data to the language it talks.

But I do file downloads using mod_plsql and a BLOB field frequently. Is this not called a file download? Please explain what a file download is, if this is not.

SELECT blob_content,
mime_type
INTO l_blob_content,
l_mime_type
FROM documents
WHERE name = file;

OWA_UTIL.mime_header(l_mime_type, FALSE);
HTP.p('Content-Length: ' || DBMS_LOB.getlength(l_blob_content));
OWA_UTIL.http_header_close;
WPG_DOCLOAD.download_file(l_blob_content);


My other option was to skip mod_plsql altogether. Would the Apache character set dictate the character set then?
But based on what you're saying, the user's browser is probably using win-1252, so it would convert the UTF-8 again.

You seem to say it is almost impossible to do unless you change the character set in mod_plsql or the web server.
Tom Kyte
March 11, 2010 - 8:45 am UTC

yes, that is a file download. but what does it have to do with the price of cheese? You don't have a file. That isn't what you've been talking about doing.

... You seem to say it is almost impossible to do unless you change the character
set in mod_plsql or web server. ...

I seem to be saying repeatedly that unless the client is willing, ready and able to accept the character set you propose, the character set you get will be somewhat unreliable - won't it. You'll have character set conversions (which can be lossy) taking place in multiple places. Think about this - you just added an entirely new bad dimension by mentioning that the browser will do whatever it wants to.

You have a database with character set A
You have a client middle tier with a character set probably A (I think)
You have a client browser with a character set of B

You want to use X


replace A with Russian
replace B with Japanese
replace X with Spanish

Imagine what would happen to a paragraph of text as it is converted over and over...



UTF8

A reader, March 11, 2010 - 10:26 am UTC

<<yes, that is a file download. but what does it have to do with the price of cheese? You don't have a file. That isn't what you've been talking about doing.>>

Let me clarify something.

1. I said I will change the code so the whole web page is assigned to a CLOB variable. Then I will save that to an NCLOB column in the database. Doesn't this mean I have the file in Unicode format now??

2. After I do #1, the user will *download* the file using mod_plsql, which will *NOT* do a character set conversion, according to the docs.

So mod_plsql will deliver the file to the browser as a Unicode file.



ORA-01482: Unsupported Character Set

prakash, March 15, 2010 - 3:41 am UTC

Tom,

I am getting this error ORA-01482: Unsupported Character Set, please help.

Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 
Connected as system
 
SQL> 
SQL> SELECT CONVERT(P.TITLE,'US7ASCII') CONVERTED_TITLE, P.TITLE
  2    FROM PRODUCTS                   p,
  3         DISCIPLINE_INFO           DI,
  4         PRODUCT_INTERNET_INDEXES   I,
  5         DISCIP_NUMBER2DISCIP_CODE DC
  6   WHERE P.PROD_ID                                 = 839118
  7     AND I.PROD_ID                                 = P.PROD_ID
  8     AND I.DISCIPLINE_CODE                         = DC.DISCIPLINE_CODE
  9     AND DC.DISCIPLINE_NUMBER                      = DI.DISCIPLINE_NUMBER
 10     AND I.CATALOG_CODE                            = 'INET'
 11  /
 
CONVERTED_TITLE             TITLE                          
Guide d'ecriture            Guide d'écriture

SQL> SELECT CONVERT(P.TITLE,'US7ASCII') CONVERTED_TITLE, P.TITLE
  2    FROM PRODUCTS                   p,
  3         DISCIPLINE_INFO                           DI
  4   WHERE P.PROD_ID                                 = 839118
  5     AND EXISTS ( SELECT 1
  6                    FROM PRODUCT_INTERNET_INDEXES   I,
  7                         DISCIP_NUMBER2DISCIP_CODE  DC
  8                   WHERE I.PROD_ID                                 = P.PROD_ID
  9                     AND I.DISCIPLINE_CODE                         = DC.DISCIPLINE_CODE
 10                     AND DC.DISCIPLINE_NUMBER                      = DI.DISCIPLINE_NUMBER
 11                     AND I.CATALOG_CODE                            = 'INET'
 12                     )
 13  /
 
SELECT CONVERT(P.TITLE,'US7ASCII') CONVERTED_TITLE, P.TITLE
  FROM PRODUCTS                   p,
       DISCIPLINE_INFO                           DI
 WHERE P.PROD_ID                                 = 839118
   AND EXISTS ( SELECT 1
                  FROM PRODUCT_INTERNET_INDEXES   I,
                       DISCIP_NUMBER2DISCIP_CODE  DC
                 WHERE I.PROD_ID                                 = P.PROD_ID
                   AND I.DISCIPLINE_CODE                         = DC.DISCIPLINE_CODE
                   AND DC.DISCIPLINE_NUMBER                      = DI.DISCIPLINE_NUMBER
                   AND I.CATALOG_CODE                            = 'INET'
                   )
 
ORA-01482: unsupported character set
 
SQL> 


please help.

thanks,
Prakash

Tom Kyte
March 15, 2010 - 11:10 am UTC

no creates, no inserts, no way to reproduce, no look.

there is far too little 'information' here to say anything

utf8

sam, March 20, 2010 - 1:20 pm UTC

Tom:

Hope you're enjoying the nice weather!

Can you tell me if my plan of action is correct or not.

If I store an l_clob variable (an XML file) into a column with the NCLOB data type, then Oracle stores the data in UTF8 format, assuming the national character set is set to UTF8.

Then if I download that file via mod_plsql, the file is sent in UTF8 format.


BTW, I am not sure the browser does character set conversion. From what I understand, you tell the browser the character set of the file and it will display it based on that character set. I think it is more for display purposes.
Tom Kyte
March 22, 2010 - 8:26 am UTC

it is -3C where I am, snow on the ground, wet. Nice weather? Not yet.

only if your character sets are identical all of the way through - then it will work entirely by accident. All it will take is the client (the app server, web server whatever you want to call it) making a change and no, it won't be that character set anymore. You already knew that Sam, we've had that discussion.



utf8

A reader, March 22, 2010 - 10:14 am UTC

Tom:

I cannot control the user's browser setting.
My requirement is to deliver to the user an XML file in UTF8 format... and I am done.

Mod_plsql will not change the character set if it is a file download, so we are only concerned with the database.

I did some testing with an NCLOB column and tried to change the Oracle client (SQL*Plus on Windows) to UTF8, but it did not seem to work. I do not know the default setting.

I believe the output below shows that the data was delivered to the client in UTF16 format. All I need to do is change the national character set to UTF8. Do you still see issues?

SELECT * from NLS_SESSION_PARAMETERS

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE



c:> set NLS_LANG=AMERICAN_AMERICA.AL32UTF8


SQL> create table x (doc clob,ndoc nclob);

Table created.

SQL> desc x
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
DOC                                                CLOB
NDOC                                               NCLOB

SQL> insert into x values (chr(128),chr(128));

1 row created.

SQL> commit;

Commit complete.

SQL> select dump(dbms_lob.substr(doc, 4000, 1),1016) from x;

DUMP(DBMS_LOB.SUBSTR(DOC,4000,1),1016)
--------------------------------------------------------------------------------
Typ=1 Len=1 CharacterSet=WE8ISO8859P1: 80

1 row selected.


  1* select dump(dbms_lob.substr(ndoc, 4000, 1),1016) from x
SQL> /

DUMP(DBMS_LOB.SUBSTR(NDOC,4000,1),1016)
--------------------------------------------------------------------------------
Typ=1 Len=2 CharacterSet=AL16UTF16: ff,fd

1 row selected.

8.5 year thread

Ryan, March 22, 2010 - 10:57 am UTC

Wow... an 8.5 year thread. I wonder if that is a record.

I had some character set questions because we need to go from just Western characters to Greek. I got most of my questions answered with one Oracle Support ticket - and yes, I read the docs first, but this type of change requires a lot of work so I needed to be certain.

This is the type of question that can easily be answered through Oracle Support.

Just a gripe... it is rather annoying that the default Oracle character set does not support Greek. A lot of scientific databases (including mine) need to support it, while the same character set supports Icelandic, which very few people need.
Tom Kyte
March 22, 2010 - 11:36 am UTC

well, the current "default" varies by platform and is unicode based - so, well, it would support greek.

But, when installing a database, yes, one would typically make sure the character set they choose (you pick it, you tell us what it is) should be one that supports the set of characters you need to support.

character set

A reader, March 22, 2010 - 11:34 am UTC

Oracle does not write character sets; they only implement them.

If ISO-8859-1 does not support Greek, that has nothing to do with Oracle.

Your issue seems to be related to converting a whole database, and there is more work involved there. I am only trying to generate one XML web page in UTF-8 format.

characterset choice

Ryan, March 22, 2010 - 2:42 pm UTC

Don't you work in the Oracle Government division? You know how government contracting works - they constantly change their minds... so an "oh yeah, we need Greek characters" is not something to be surprised about.

In 10g this one appears to be the default choice: WE8ISO8859P1.

This one probably should include Greek, since it includes many languages that are used far less often.

I am not putting in an enhancement request here. Just blowing smoke.

charcter set

A reader, March 22, 2010 - 3:34 pm UTC

With Oracle 11g, the Oracle documentation recommends using Unicode (UTF8 or UTF16) for new deployments, as it can support any language on the planet.

That way you do not have to worry about anything.

I think the asktom site is UTF-8, and it does not store any foreign language.

It's all Greek to me...

Duke Ganote, March 22, 2010 - 8:21 pm UTC

In this askTom thread I demonstrated how I stored Greek and other languages in a 9iR2 database:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10250420313043#27317016562512
including examples with unicode Greek symbols, such as those found here:
http://www.unicode.org/charts/PDF/U0370.pdf

The browser can fool you if you don't pay attention.

Galen Boyer, March 23, 2010 - 9:20 am UTC

Hi Mike Friedman from Shanghai,

> Face it... just because you have no intention of using other character
> sets that does not mean that everyone else will cooperate with you.
>
> Look here: &#20013;&#25991;
>
> I just typed some Chinese. Doesn't matter what you planned - I didn't
> cooperate.

Incorrect. You didn't store Chinese characters in the database. You typed Chinese into your browser and your browser turned it into ASCII. What gets stored in the database is &#20013;&#25991; which is exactly 16 ASCII characters. But then, just to really fool you, what happens on the way out of the database is that those characters are seen by the browser and, voila, you see Chinese characters. You type in Chinese characters and you get out Chinese characters. Must have persisted Chinese characters, eh? But you were duped by the browser into thinking that what was stored was Chinese characters.

Store the following as a tstchinese.html file and open it up.

<html>
<body>
&#20013;&#25991;
</body>
</html>

What you see is Chinese characters, but the actual data being read is just 16 characters of plain ASCII.

what gets stored in the database

Duke Ganote, March 23, 2010 - 3:53 pm UTC

Correct, it's "just codepoints" in the database, as can be verified with Unicode's "Unihan" database.
http://www.unicode.org/charts/unihan.html
Although intended for Chinese characters, the Unihan lookup seems to contain the full gamut of Unicode characters. For example, if you look up the 03A6 codepoint, you'll see a Greek character.
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=03A6

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode...

Duke Ganote, March 24, 2010 - 8:43 am UTC

UTF8 string in a US7ASCII db?

A reader, April 07, 2010 - 7:36 pm UTC

Looks like we have ISO8859P1 characters stored in an ASCII db. The client NLS_LANG is UTF8.

Here's an example -

select name, dump(name, 1016) from users where id = 931402;
t�s�sstt� Typ=1 Len=9 CharacterSet=US7ASCII: 74,e9,73,e9,73,73,74,74,e9

Is there any way, given the weird settings, to write a SQL query that fetches the string so that it displays in the correct format, maybe using functions like CONVERT etc.? The actual string is 'tésésstté'.
Tom Kyte
April 13, 2010 - 7:45 am UTC

the actual string cannot be that since the client that inserted it promised to only give us 7 bit ascii.

likely your best bet would be to cleanse the data, connect as us7ascii, read the data out (it will not be converted), save it - connect as the character set of choice and put it back.

You can try using the convert function - but beware, it will assume that the source data is VALID 7bit data and then convert that to 8 bit data.
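
A minimal sketch of that convert approach, assuming the stored bytes really are ISO-8859-1 that was mislabeled as US7ASCII (the second argument is the destination character set, the third is what the bytes actually are - and again, beware, CONVERT trusts you on that third argument):

SQL> select convert(name, 'AL32UTF8', 'WE8ISO8859P1') from users;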

NCAR

Aru, April 14, 2010 - 7:29 pm UTC

Hi Tom,
We have a 10gR2 database that is:-
LANGUAGE AMERICAN
TERRITORY AMERICA
NLS_CHARACTERSET WE8ISO8859P1
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_NCHAR_CHARACTERSET AL16UTF16

We want to now store chinese characters in the database and display chinese characters on our portal.

We have changed the content column from CLOB to NCLOB, by creating a new NCLOB column, updating it with the CLOB content, and renaming the new column to the old CLOB column's name after dropping the CLOB column.
The display still shows 'traditional Chinese: ¿¿¿'.

We are really stumped here. Is there any way this can be done? We cannot change the database character set, as the database has more than 20 schemas on it and this functionality is needed by only one.
Please can you advise if it is possible, or whether there is any other way it can be done?
Regards,
Aru.


Tom Kyte
April 15, 2010 - 8:07 am UTC

did you actually, you know, insert anything with chinese characters?

is the client connecting actually specifying chinese as their characterset?

monolingual sort and multilingual sort

A reader, October 27, 2010 - 11:07 pm UTC

Hi Tom,

1) I read the 9i Globalization Support guide but couldn't understand the part: "Oracle compares character strings in two steps for monolingual sorts. The first step compares the major value of the entire string from a table of major values. Usually, letters with the same appearance have the same major value. The second step compares the minor value from a table of minor values."

Where does it get the minor and major tables? Where are they defined? I can understand that A and a can only be sorted in ascending order on the basis of their minor values. Is that true?

2) Also in the same document I found that a multilingual sort has 3 levels: i) sorting to evaluate base characters, ii) sorting among base characters (diacritic A and A), iii) after the second level it sorts by case.

Is my understanding right?
Now, why would we need multilingual sorting? Any enhanced effect?

Tom Kyte
October 28, 2010 - 11:58 am UTC

1) it is the very definition of the character set itself. It is a standard dictated to people that implement the character sets.

2) why would you need multilingual sorting? to get the data in the right order of course.
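
A minimal sketch of asking for a multilingual sort explicitly (hypothetical table and column; GENERIC_M is one of the built-in multilingual sort definitions):

SQL> select name from t order by nlssort(name, 'NLS_SORT=GENERIC_M');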

ora-12716

Ari, November 18, 2010 - 9:49 pm UTC

Hi Tom,

I was trying to convert from US7ASCII to UTF8. My database has tables with CLOB columns in the SYS and SYSTEM schemas. It is not allowing me to change the character set; I got

ORA-12716: Cannot ALTER DATABASE CHARACTER SET when CLOB data exists

1) I heard that the "alter database character set" statement is possible only when the migration is between single-byte encoded character sets or between multibyte encoded character sets. I have probably violated that rule, haven't I?
2) Now how can I migrate the CLOB data?
Should I use export-drop-import steps to convert from single byte to multibyte? But would a SYS table export/import work this way?

nls chars

Mani, September 09, 2011 - 11:21 am UTC

create table um2(x varchar2(1000));

insert into um2
values ('110830,Própria'); -- using sql developer

commit;

insert into um2
values ('110830,Própria'); -- using sqlplus session.

commit;

When querying in SQL*Plus or SQL Developer it is shown as

110830,Própria
110830,Pr�pria

While the first one is what it should be, the second one is incorrect. Both SQL Developer and SQL*Plus are invoked from the same machine.

A dump of the values looks like:

Typ=1 Len=15: 49,49,48,56,51,48,44,80,114,195,179,112,114,105,97
Typ=1 Len=16: 49,49,48,56,51,48,44,80,114,239,191,189,112,114,105,97

What could be the reason for this?

Tom Kyte
September 10, 2011 - 2:52 pm UTC

check out the characterset of the TERMINAL you are using. The second one would be incorrect if your terminal is using a different characterset than you are telling the database you are using.

@Mani on nls chars

Stew Ashton, September 11, 2011 - 5:08 am UTC


In addition to what Tom said, SQL Developer uses the JDBC Driver which does not operate exactly the way SQL*Plus does. I suggest you ask in the SQL Developer forum at https://forums.oracle.com/forums/main.jspa?categoryID=84 specifying your database character set, your SQL Developer version and NLS settings.

With SQL Developer V3 and a database in AL32UTF8, I cannot reproduce your problem.
Tom Kyte
September 11, 2011 - 9:08 am UTC

they are not having the problem with sql developer - because that is not running in a terminal. They are having an issue with sqlplus - and that is likely because the terminal they are running sqlplus in is using the wrong characterset encoding.

unsure of character set used by terminal

Mani, September 13, 2011 - 5:12 am UTC

Thank you Tom for your review comments.


The terminal I use is an MS Windows XP Professional 2002 one, where the language option just says English (US) - no clarity as to which character set it uses. Any idea how to find that out?

One thing to note: when typing the insert command in the SQL*Plus window it displays as-is; only upon retrieval from the DB does it show as junk.

Any help that you can offer?

thanks
Mani
Tom Kyte
September 13, 2011 - 6:24 am UTC

I don't do windows, sorry.

It is your terminal's character set or your client's character set (the NLS setting in the registry).

Console character set

Martin Vajsar, September 13, 2011 - 7:22 am UTC

The encoding for Windows console can be set using Regional Settings control panel (renamed to Region and Language in more recent versions). Details depend on the version of Windows you're using, but in general, there should be an option called "Language for non-Unicode programs" somewhere. Here you choose the encoding for all consoles.

However, I'd suggest the following: save the commands you want to run into a SQL script (in whatever encoding you choose).

Use the NLS_LANG environment variable to make the sqlplus client use the same encoding - see http://download.oracle.com/docs/cd/B28359_01/server.111/b31189/ch11.htm#SQPUG021 for details. You can set it system-wide as mentioned in the documentation or just before running the script, eg. in a batch file.

Then run the script from sqlplus. If you get the encodings right, it should work.
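
A minimal sketch of that on Windows, assuming the script was saved as Windows-1252 (cp1252) text and a hypothetical script name:

C:\> set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
C:\> sqlplus scott/tiger @insert_um2.sql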

In Oracle 11g and Sql Developer

A reader, October 05, 2011 - 9:19 am UTC

Hi Tom - here is a related problem; I hope your views will help to clear all doubts.

Database Version - Oracle Database 11g Enterprise Edition Release 11.1.0.6.0

Created a table :
09:16:47 SQL> desc ecs
 Name                                      Null?    Type
 ----------------------------------------- -------- -------------------------

 A                                                  VARCHAR2(1000)
 B                                                  BLOB
 C                                                  CLOB

Settings in DB:
NLS_CHARACTERSET WE8MSWIN1252

I inserted the following text into all the columns:
Forges et Aciéries de la Marine et d'Homécourt

1. When the data is viewed through SQL Developer, it is as it was inserted.

2. When viewed through SQL*Plus on the Windows server box, it appears as

Forges et AciΘries de la Marine et d'HomΘcourt

What, according to you, could be the issue, and what is the fix?

CLOB created from Chinese resultset has issues

Malhotra, July 19, 2012 - 12:51 pm UTC

I have a table which stores Chinese data correctly. My database supports Chinese data.

Now I use dbms_xmlquery to generate XML from the data mentioned above.

dbms_xmlquery.setencodingtag(l_ctx_schema) uses the default encoding, which is ISO-8859-1. And the XML generated has corrupted Chinese data.

Should I set UTF-8 as the encoding? Would you know if ISO-8859-1 supports Chinese?

I am not sure if asktom is the right forum for this query, but I had no one else to ask. Please help.....

PARAMETER VALUE
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_RDBMS_VERSION 11.2.0.2.0

Tom Kyte
July 19, 2012 - 3:54 pm UTC

I have never had any chance to work with that character set. I would try the XML forums on otn.oracle.com

To Malhotra ...

José Laurindo Chiappa, July 19, 2012 - 5:33 pm UTC

Malhotra, I have used ISO-8859 extensively (in particular ISO 8859-15 for Portuguese): by definition it is a compilation of Latin alphabets (same basis, with some special characters in some derived parts, such as the Euro sign, absent in 8859-1 and present in 8859-15)... To the best of my knowledge, the answer will be NO for 8859: ISO-8859 is an 8-bit (one byte) encoding, and as mentioned many times in this thread, non-Latin languages need a multibyte encoding (at least two bytes), because far more than 255 characters exist in these languages, even in the most simplified alphabets....
http://www.khngai.com/chinese/charmap/ gives some history of the multiple character codes developed to encode Chinese characters, but Unicode is voted the most flexible, capable of expressing both Simplified and Traditional symbols, so I would say go for it...
Just a point: when we talk about Unicode, you must choose between UTF-8 and UTF-16. In my humble opinion, only if you know for a fact that your application will need to deal with uncommon/old symbols OR that all of your strings will be in Chinese should you choose UTF-16 - if not, my vote is for UTF-8. With UTF-8 you do not need to deal with any endianness; UTF-8 uses 1 to 4 bytes depending on the character, so that an ASCII character is still represented as 1 byte, while UTF-16 uses 16 bits for most characters (and 32 bits for supplementary ones)....

Hope it helps,

J. Laurindo Chiappa

Sourcing data into a UTF8 DB

Oracle, May 16, 2014 - 3:00 am UTC

BACKGROUND

We are building a data warehouse on Oracle

Version : Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
Characterset :
"NLS_CHARACTERSET","AL32UTF8"
"NLS_NCHAR_CHARACTERSET","AL16UTF16"

We are required to store multilingual data in our data warehouse. From a storage perspective, since we have a UTF8 character set, we should be good.

The question I have is related to getting data from our data sources into the warehouse. Some facts about our environment:

We will be receiving data from 100 or so systems using files.

These systems have varying DB/OS/character set environments:
- some are Windows, others Unix
- some are SQL Server, others Oracle
- even the ones on Oracle may have different character sets

The data from the source systems needs to be provided using files that are sftp'ed to our servers.

OBJECTIVE

These source systems from which we are sourcing data have varying levels of support for multilingual data.

Since our data warehouse is UTF-8, which covers all possible characters, we are trying to find a solution where we can get data from the various sources into the data warehouse with the foreign characters coming through correctly (no garbled foreign characters - upside-down ?'s, etc).


QUESTION

How do I approach this problem?

Can systems using different DB/OS/character set environments export data in a standard Unicode format that can be loaded into our data warehouse? What is this file format called? Unicode? UTF-8?

Or will the files exported from different source systems be encoded as per the DB/OS/character set of their environment? If so, how can I use them to load data into our data warehouse so that we do not garble the foreign characters (upside-down ?'s, etc)?


Thanks in advance!