Question and Answer

Tom Kyte

Thanks for the question, Hamid.

Asked: March 12, 2001 - 4:45 am UTC

Last updated: April 25, 2011 - 1:23 pm UTC

Version: 8.1.6

Viewed 1000+ times

You Asked

Hi Tom,

Thank you for your previous answer. My next question is:

We are planning to set up a database which can be accessed by people throughout the world in many different languages. I want to know if Oracle supports Unicode and what is the best source to go about this as far as the database is concerned.

Many thanks.

Hamid

and Tom said...

Yes, we support UNICODE. You would probably be creating a UTF8 database though. See

http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76966/toc.htm

for details in this area.


Comments

Unicode

Hamid, March 15, 2001 - 12:54 pm UTC

I want to have a database in Unicode; within this database I want to create, say, a table with data (text) stored in it, also in Unicode. Now, can people throughout the world see the text in their own languages and make sense of it? If not, how can something like this be implemented?

Many thanks Tom.

Unicode

Donald Holliday, March 11, 2002 - 2:24 pm UTC

Sure would be nice if the reply included a link to a white paper with compilable, working code rather than the typical link to an Oracle manual table of contents.

Tom Kyte
March 11, 2002 - 3:40 pm UTC

Warning <rant = on>

well donald, if more people read the docs, I wouldn't get so many questions. Then I wouldn't have to post links to the docs.

If you browse around on this site, I think you'll find I try to answer things that aren't spelt out clearly in the docs with liberal examples. On the whole, I post links to docs when the docs clearly and properly answer the question.

Everybody races to condemn the docs -- wonder how many people have actually READ them? The concepts guide for example -- totally excellent, wonderful book. People pay $40/$60 USD for books by others that aren't 1/2 as good yet never bother to read the concepts guide (a document I do tend to link to frequently here).

The NLS stuff -- most people don't even bother to read the list of documents we ship -- don't bother to find we have a document that covers this NLS stuff front to back. I'm not going to reproduce the wheel here, I'll be glad to point out where the wheel exists when proper.

<rant = off>



Docs good

Jim Kennedy, March 11, 2002 - 4:59 pm UTC

Love it that you can carry the docs around on a CD. I have found that if you take some time getting used to what is in what "book" and get familiar with the TOC (table of contents) then you can find information quite easily. In fact, often better than having a search engine look for you. Carrying a CD around or having a copy on your hard disk is so much easier than lugging the number of books around!

Using Unicode-Supported Font - Bangla

Nazmul Hoque, August 04, 2004 - 4:53 am UTC

Hi Dear,

I have developed an application where I can feed data using a local font (a Unicode-supported Bangla font, the same one used to type Bangla in MS Word). I can feed data and get reports in that font as well, using database 8.1.7 and Developer 6i.
However, I can't type the report headings and column names in the Bangla font. Please help me: how can I use a Unicode-supported Bangla font in Form Builder to type a prompt, and in Report Builder to type headings and column names?

Thanks in Advance for Co-operation
Nazmul Hoque



Tom Kyte
August 04, 2004 - 10:06 am UTC

sorry -- i don't work with unicode myself, nor have i used the forms builder in many years.

please try otn.oracle.com -> discussion forums.

inserting multi-byte (japanese) character in a US7ASCII characterset database

Abraham, April 27, 2005 - 10:49 pm UTC

Tom,
I have a 9.2.0.4 database that has database characterset as US7ASCII. The National Character set is AL16UTF16.
I have not been successful in loading a single Japanese character into test table in this database. (My other database is UTF8, no problem there). Before I give the specifics of my experiment, just wanted to know if it is at all possible to store multi-byte characters in a US7ASCII characterset database? I used nvarchar2 datatype for my test table.

Tom Kyte
April 28, 2005 - 7:40 am UTC

define "not successful"

Here are the details:

Abraham, May 02, 2005 - 2:15 pm UTC

In my utf8 database:
SQL> select * from v$nls_parameters where parameter= 'NLS_CHARACTERSET'; 
                                                                         
PARAMETER                                                                
----------------------------------------------------------------         
VALUE                                                                    
----------------------------------------------------------------         
NLS_CHARACTERSET                                                         
UTF8                                                                     
create table utf_test (a varchar2(240));
              
Table created.
              
load a single row using this control file:
LOAD DATA
CHARACTERSET UTF8
INFILE 'utf_test.txt'
TRUNCATE
INTO TABLE utf_test
(a char (720))

using isqlplus -
select * from utf_test; -- I get one row
**) FireWall ÊøãÊ (é·íþÏ°à÷)¡¡2005/1êÅÓø Ñ¢Êࣺ12/1/2004-11/30/2005

SQL> select dump(a,1016) from UTF_TEST;                                            
                                                                                
DUMP(A,1016)                                                                    
--------------------------------------------------------------------------------
Typ=1 Len=79 CharacterSet=UTF8: 2a,2a,29,20,46,69,72,65,57,61,6c,6c,20,e7,9b,a3,
e8,a6,96,20,28,e5,86,97,e9,95,b7,e6,a7,8b,e6,88,90,29,e3,80,80,32,30,30,35,2f,31
,e6,9c,88,e5,ba,a6,20,e6,9c,9f,e9,96,93,ef,bc,9a,31,32,2f,31,2f,32,30,30,34,2d,3
1,31,2f,33,30,2f,32,30,30,35                                               
SQL>                                                               

In my us7ascii database:
select * from v$nls_parameters where parameter= 'NLS_CHARACTERSET'  
SQL> /                                                                   
                                                                         
PARAMETER                                                                
----------------------------------------------------------------         
VALUE                                                                    
----------------------------------------------------------------         
NLS_CHARACTERSET                                                         
US7ASCII                                                          
1 row selected.                                                              
SQL>                                                                     
  select * from v$nls_parameters where parameter = 'NLS_NCHAR_CHARACTERSET'
SQL> /                                                                        
                                                                              
PARAMETER                                                                     
----------------------------------------------------------------              
VALUE                                                                         
----------------------------------------------------------------              
NLS_NCHAR_CHARACTERSET                                                        
AL16UTF16

SQL> create table utf_test (a nvarchar2(240));
              
Table created.

load the same single row using this control file:
LOAD DATA
CHARACTERSET UTF8
INFILE 'utf_test.txt'
TRUNCATE
INTO TABLE utf_test
(a char (240))

Using isqlplus -
SELECT * FROM UTF_TEST;
**) FireWall ¢¯¢¯ (¢¯¢¯¢¯¢¯) 2005/1¢¯¢¯ ¢¯¢¯:12/1/2004-11/30/2005 

 -- No more Japanese characters. -- 

SQL> select dump(a,1016) from UTF_TEST;

DUMP(A,1016)
-------------------------------------------------------------------------------
Typ=1 Len=110 CharacterSet=AL16UTF16: 0,2a,0,2a,0,29,0,20,0,46,0,69,0,72,0,65,0
,57,0,61,0,6c,0,6c,0,20,76,e3,89,96,0,20,0,28,51,97,95,77,69,cb,62,10,0,29,30,0
,0,32,0,30,0,30,0,35,0,2f,0,31,67,8,5e,a6,0,20,67,1f,95,93,ff,1a,0,31,0,32,0,2f
,0,31,0,2f,0,32,0,30,0,30,0,34,0,2d,0,31,0,31,0,2f,0,33,0,30,0,2f,0,32,0,30,0,3
0,0,35


1 row selected.

Looking at the dump, I would think that the database has correctly loaded the values in the nvarchar2 column, but using isqlplus I am not able to display them. Initially I thought the loader was not loading the data correctly, which is why I asked my question.
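(Outside Oracle entirely, the difference between the two DUMP() outputs can be illustrated: they show the same characters, once as UTF-8 bytes and once as AL16UTF16 (UTF-16BE) code units. This Python sketch uses the Japanese word 監視, whose bytes are visible in both dumps above; it is purely illustrative, not how Oracle stores the data internally.)

```python
# Illustration of why the two DUMP() outputs differ: the same characters
# are stored as different byte sequences in UTF-8 vs AL16UTF16 (UTF-16BE).
text = "監視"  # a word visible in both dumps above

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")

print(utf8.hex(","))   # e7,9b,a3,e8,a6,96  -- matches the UTF8 dump
print(utf16.hex(","))  # 76,e3,89,96        -- matches the AL16UTF16 dump
```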

I looked at the source of my web browser page:
This is from my connection to the utf8 database:
...
...
<!-- Copyright (c) Oracle Corporation 2000, 2003. All rights reserved. -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
...
...
**) FireWall ÊøãÊ (é·íþÏ°à÷)¡¡2005/1êÅÓø Ñ¢Êࣺ12/1/2004-11/30/2005
...
...
This is from my connection to the us7ascii database:
...
...
<meta http-equiv="Content-Type" content="text/html; charset=WINDOWS-1252">
...
**) FireWall ¢¯¢¯ (¢¯¢¯¢¯¢¯) 2005/1¢¯¢¯ ¢¯¢¯:12/1/2004-11/30/2005
...
...

My questions:
Why does US7ASCII database send the characters in ascii to the web browser even though it is coming off an nvarchar2 column? Is there any setting on the database or the session that will help me do that? 

How do I set my isqlplus so that it senses that the data is from nvarchar2 column so that it does not default to charset=WINDOWS-1252?

Thanks for your help.
 

Tom Kyte
May 02, 2005 - 7:43 pm UTC

i'll ask you to contact support for setting up isqlplus's character set; it'll be a change on the server, and I've not configured that product myself -- i just use sqlplus.

We contacted support, but..

Abraham, May 16, 2005 - 3:06 pm UTC

Dear Tom,

We contacted support and they said that we need to convert our database character set to Japanese or UTF-8. I doubt I really have to do that; otherwise, what good is the nvarchar2 datatype?

So my basic question is:
Is it possible to store and view multi-byte characters in a 9.2.0.4 US7ASCII database using nvarchar2 datatype? Configuration of the client may be a different issue, but I am trying to understand if this is at all possible.

Thank you for your help.

Tom Kyte
May 16, 2005 - 5:00 pm UTC

you should only need to configure isqlplus's character set (the CLIENT characterset).

unicode charset

reader, June 21, 2005 - 9:53 am UTC

Is there a performance overhead in choosing unicode utf8 database characterset instead of us7ascii even though I may not store any multibyte characters? Thanks.

Tom Kyte
June 21, 2005 - 5:07 pm UTC

yes, there can be -- if you use tons of string operations. But if you do it from day one, it becomes a sunk cost in a way (always there) so you are prepared for the day when you need it.

Just because you didn't put any multi-byte data in there, doesn't mean there cannot be any, so the assumption is the data could contain multi-byte stuff.

us7ascii would be a tad extreme, most people need the extended charactersets.

unicode performance

David, June 22, 2005 - 11:11 am UTC

Tom, you say for the above question, <quote>yes, there can be -- if you use tons of string operations</quote>

How could this be measured? What type of string operations would slow a query if I used utf8 as the database characterset? Thanks.

Tom Kyte
June 22, 2005 - 4:47 pm UTC

substr, instr, anything that needs to get to the n'th character -- the 5th character is not 5 bytes into a string -- it is 5 characters and that could be 5, 6, 7, 8, 9, 10 or more bytes. So it takes more work.
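(Tom's point holds for any variable-width encoding and can be demonstrated outside Oracle; this Python sketch is illustrative only.)

```python
# Illustration (not Oracle itself): in UTF-8, character position != byte
# position, so substr-style operations must walk the string from the start
# rather than jump straight to a byte offset.
s = "Grüße"               # 5 characters, but more than 5 bytes in UTF-8
b = s.encode("utf-8")

print(len(s))              # 5 characters
print(len(b))              # 7 bytes: ü and ß take 2 bytes each

# The 5th character is NOT at byte offset 4:
print(s[4])                # 'e'
print(b[4:5])              # b'\xc3' -- the start of ß's two-byte sequence
```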

Unicode

Kamal, July 07, 2005 - 7:05 am UTC

hi Tom

I need to generate the Unicode character "less than or equal to" (a < symbol with an underscore below it). The Unicode value for that character is 2264.

I tried this

select unistr('\2264') val from dual

it is printing only "="

how to get that symbol

Thanks
kamal

Tom Kyte
July 07, 2005 - 9:37 am UTC

[tkyte@desktop tkyte]$ export NLS_LANG=AMERICAN_AMERICA.UTF8
[tkyte@desktop tkyte]$ sqlplus /
 
SQL*Plus: Release 9.2.0.4.0 - Production on Thu Jul 7 09:33:13 2005
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 
 
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
 
ops$tkyte@ORA9IR2> select unistr( '\2264' ) from dual;
 
UNI
---
\x{2264}
 
ops$tkyte@ORA9IR2>

worked for me (after setting my terminal to display utf8, that is; normally it displays western (iso-8859-15) only, and when set to that I see an a with a caret on top and some squiggly thing)
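(For reference, U+2264 is a three-byte character in UTF-8; if the client character set is a single-byte one, the conversion has to substitute something, which is one plausible reason a non-Unicode client shows "=". A Python sketch, illustrative only:)

```python
# U+2264 is the "less-than or equal to" sign that unistr('\2264') produces.
ch = "\u2264"
print(ch)                          # ≤ (if the terminal uses a Unicode encoding)
print(ch.encode("utf-8").hex())    # e289a4 -- three bytes in UTF-8

# Converting to a single-byte client charset cannot represent it, so a
# replacement character is substituted:
print(ch.encode("ascii", errors="replace"))  # b'?'
```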

 

Characterset

Kamal, July 07, 2005 - 10:48 am UTC

Hi Tom

the following are my NLS parameters

NLS_LANGUAGE AMERICAN
NLS_CHARACTERSET WE8MSWIN1252
NLS_DATE_LANGUAGE AMERICAN

I was not able to change my database character set to UTF8; it says we need to specify a superset of the old character set.

Can you please tell me how to set the terminal display character set? Do you mean the font?

I am not sure why I am getting the "=" symbol when I execute the same query.

Thanks
kamal

Tom Kyte
July 07, 2005 - 1:07 pm UTC

you didn't -- mine wasn't. you have to have YOUR CLIENT'S character set set.

what is your CLIENT'S nls_lang??? That is what the data gets translated to.

You have a function returning UTF8 data -- your client characterset is "something" and the unistr return value is converted into that.

I use linux, gnome; for me it is terminal -> character coding in my terminal menu. That won't help unless you are using linux.....

but set your CLIENT characterset, the NLS_LANG, so the database can return UTF8 data to you. But remember, all data will now be utf8 coming and going from you!

Character Set

kamal, July 07, 2005 - 2:02 pm UTC

Hi Tom

I am using Oracle 9.2 in Windows 2000.

I have set the NLS_LANG value in the registry to AMERICAN_AMERICA.UTF8

Now I am getting what you described before: an a with a caret on the top and some other characters.

Now how do I set the terminal character set for SQL*Plus or the command prompt so that I can see that symbol?

Regards
kamal

Tom Kyte
July 07, 2005 - 2:32 pm UTC

askbill@microsoft.com :)

not sure, I don't do windows much. You are getting the right data now, the client display is just mucking it up.

multi-lingual database

sns, January 17, 2006 - 11:58 am UTC

I have a situation where people from the US who log into our database should see output in English, while people from Germany who log into the same database should see output in German.

For example: the word "please" should be displayed as "please" if the language is English and as "bitte" if the language is set to German.

Can you please elaborate the steps on how to make this happen?

Thanks,

Tom Kyte
January 17, 2006 - 4:14 pm UTC

this is not so much a multi-lingual database issue as an application issue: you need to make your application able to show multiple languages depending on the end user's preference/location.

the database is already multi-lingual for things we control

ops$tkyte@ORA10GR2> select to_char(sysdate,'Day Month' ) from dual;

TO_CHAR(SYSDATE,'DA
-------------------
Tuesday   January

ops$tkyte@ORA10GR2> alter session set nls_language = 'French';

Session altered.

ops$tkyte@ORA10GR2> select to_char(sysdate,'Day Month' ) from dual;

TO_CHAR(SYSDATE,'D
------------------
Mardi    Janvier

ops$tkyte@ORA10GR2>

You'll be interested in the globalization guides:

http://www.oracle.com/pls/db102/portal.portal_db?selected=3#index-GLO

If you are using java, you might be interested in:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch8gdk.htm#i1006182

a toolkit to help you implement an application that displays text in multiple languages.  It goes into localization of content like this:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch8gdk.htm#i1007731

multi-lingual database design

Duke Ganote, January 18, 2006 - 10:06 am UTC

RE: sns' requirement:
> people from US ...should see ...ENGLISH.
> people from Germany ...should see ...German

In my experience, there are two basic design options. Either:
1. Pick a 'driving' language, like English, then all other languages are just options.
2. Use surrogates and be language-neutral.

I used the 2nd case recently because the application needed to
1. accept text input from users in their language,
2. be immediately usable for building a product document,
3. be sent out for translation for later display in the foreign language portion of the document.

If the transactions are fixed, and all the developers just work in one language, then the first option is the easiest.

The translation table in the first case is pretty simple in form (this is rough, not the actual table):

CREATE TABLE translation_text ( english_text VARCHAR2(2000),
other_language_code VARCHAR2(2), other_language_text NVARCHAR2(2000) );

In the second case,
CREATE TABLE neutral_text ( surrogate_key NUMBER,
language_code VARCHAR2(2), language_text NVARCHAR2(2000) );

Note I assume that all language texts are consolidated into a single table. I've also seen applications where the texts are split into multiple tables: one per application area, e.g. comments, headers, etc.

Caution: For very short phrases, multiple translations in the same foreign language may be needed, because one short phrase may have multiple implications in another language, depending on context. Think how many alternative definitions there are in the dictionary for a single word. Fortunately for me, the user-input phrases in the application were lengthy, scientifically oriented text, so I didn't need to add a context field to disambiguate them.
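(Duke's second, language-neutral design can be sketched outside SQL. The table and column names below follow his neutral_text example; the fall-back-to-a-default-language rule is an assumption of mine, not part of his description.)

```python
# A minimal sketch of the language-neutral design: each piece of text gets a
# surrogate key, and translations are rows keyed by (surrogate, language code).
# The default-language fallback below is an assumed convention.
neutral_text = {
    (1, "en"): "please",
    (1, "de"): "bitte",
    (2, "en"): "thank you",
}

def localized(surrogate_key, language_code, default_lang="en"):
    """Return the text in the requested language, falling back to the default."""
    return neutral_text.get((surrogate_key, language_code),
                            neutral_text.get((surrogate_key, default_lang)))

print(localized(1, "de"))  # bitte
print(localized(2, "de"))  # thank you  (no German row yet -> English fallback)
```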

Question regarding Unicode datatypes in a single byte characterset db

Bill S., June 23, 2006 - 9:25 am UTC

Platform: Oracle 10gR1 on SuSe Linux 9

We have a database (which is a data warehouse) already established, vendor is getting ready to hand it over to us. About a month ago, they decided that an issue they were having could be resolved by converting some VARCHAR2 fields to NVARCHAR2, so they did this on development (of course, without telling us what they were doing). This resulted in wholesale changes to their application code as well.

Here we are now, almost a month later, and they want to push this change through on production as well. My concern is that the database was built with characterset WE8ISO8859P1, and the NVARCHAR2 datatype is Unicode. Is this going to cause problems ahead? I read the 10g Concept guide related to datatypes, and the use of this one kind of makes me wonder.
Note that for our purposes we do not require multilingual support, nor do we ever expect to (the nature of our business does not require it).

The reasoning behind using the NVARCHAR2 datatype was :

1) They want to be able to handle foreign languages in future implementations.

2) Their Microsoft .NET framework uses wide strings.

3) A byproduct of using NVARCHAR2 was that indexing went quicker.

The way the model is structured, there are about 3 tiers of tables that data gets passed through before going out to the final warehouse. The intermediate staging tables are all VARCHAR2, but the final warehouse tables are mostly NVARCHAR2.

My question now: Is there anything we should be looking for in terms of performance issues, backup/recovery issues, or day-to-day db maintenance issues? I'm not familiar at all with using Unicode datatypes in a single-byte database, so my boss wants to find out as much as possible before we let the vendor proceed.


Tom Kyte
June 23, 2006 - 10:24 am UTC

define "problems ahead".

it will work as expected, but you might not expect how it works :) that could be a problem.


backup/recovery - no, not affected.
day to day maintenance? no...

however, things like substr, string functions in general - have to be looked at. people that touch this data need to be informed about what character set is proper and all. things that assume "a character is one byte" need to be aware (30 characters might need 90 bytes!)

Mostly things in the CLIENT.
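(Tom's "30 characters might need 90 bytes" is exactly what happens with CJK text in UTF-8, where each such character takes 3 bytes; in AL16UTF16 it would be 2 bytes each. A Python sketch, illustrative only:)

```python
# "30 characters might need 90 bytes": CJK characters take 3 bytes each in
# UTF-8, and 2 bytes each in a UTF-16 encoding such as AL16UTF16.
text = "日" * 30                      # 30 characters
print(len(text))                      # 30 characters
print(len(text.encode("utf-8")))      # 90 bytes in UTF-8
print(len(text.encode("utf-16-be")))  # 60 bytes in UTF-16BE
```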

Thank you!

Bill S., June 23, 2006 - 10:43 am UTC

I have been reading through the globalization support guide as well to learn whatever I can about this stuff, but the boss' main concern was whether it affects how we manage our database and how we grow it. Obviously using multibyte datatypes will constrain our field lengths, and cause some bloat. As you mention, it will also affect any future applications that get designed against it.
Thanks much for the quick answer - very much appreciated.

One last question if I may

Bill S., July 06, 2006 - 8:21 am UTC

I am curious as to whether or not unicode datatypes in a single-byte characterset database may potentially use more redo/undo? My thought was since the datatype actually uses more space than a VARCHAR2 to store data, it would stand to reason that more undo/redo space would be required as well.

Is this a reasonable assumption to make?

Thanks very much for your previous reply, it was very helpful.

Tom Kyte
July 08, 2006 - 9:49 am UTC

yes it would - probably not "materially more" (a few bytes here and there)

What about the effect on indexes?

Bill S., July 10, 2006 - 9:05 am UTC

Sorry, I lied - I have another question ;-D.
Thanks for your answer on redo/undo, was just curious. I have been asked to look at how using Unicode datatypes might affect indexes, since there was a claim that switching to NVARCHAR2 made the indexes go "faster". Can you think of any reason why this might be so, or possibly are we looking at one of those false causality situations? I don't have any numbers to provide, just a statement that they indeed were faster after converting.

Thanks very much!

Tom Kyte
July 10, 2006 - 9:21 am UTC

seems not likely - unless it was able to use an index that it could not use before.



which client characterset when administrating unicode databases

A reader, October 06, 2006 - 8:54 am UTC

hi tom,

i wonder if i have to set my client characterset to unicode (eg. NLS_LANG=AMERICAN_AMERICA.UTF8) in my working session when ADMINISTRATING unicode databases?

eg. do i have to set this in my client environment when CREATING a new unicode database? do i have to set this in my client environment when BACKING UP a unicode database with RMAN? ...

could there be any administration issues using a non-unicode client characterset, eg. WE8ISO8859P1. could you please discuss on this?

Tom Kyte
October 06, 2006 - 9:08 am UTC

depends, would you like character set translation to occur..... or not.

if you use differing charactersets, you can end up "changing data" just by fetching it out and putting it back.

eg: exp using differing charactersets, import using any charactersets, you have likely CHANGED the data as it was converted from character set A to B on the way out.
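(A rough analogy in Python -- not exp itself -- of how a round trip through a smaller character set silently changes data: once the conversion substitutes replacement characters on the way out, the original bytes cannot be recovered on the way back in.)

```python
# Sketch of Tom's warning: converting data to a client character set that
# cannot represent it changes the data, and re-importing cannot undo that.
original = "Grüße ≤ 100%"

# "Export" through a 7-bit client charset: unrepresentable characters are
# replaced, silently from the data's point of view.
exported = original.encode("ascii", errors="replace").decode("ascii")
print(exported)                  # Gr??e ? 100%
print(exported == original)      # False -- the data was changed
```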

which client characterset when administrating unicode databases

A reader, October 09, 2006 - 7:16 am UTC

hi tom,

exp/imp issues on character set conversion are well known and even told to us by the utilities.

are there any other issues using a non-unicode client characterset when administrating a unicode database? eg. CREATE a unicode database, using RMAN to backup/restore a unicode database,...?

Tom Kyte
October 09, 2006 - 8:52 am UTC

rman is binary as far as the blocks go, but rman can also run queries, and characterset conversion will happen with queries. is that what you want? maybe yes, maybe no -- it depends on your use of rman, for example.

again, you choose, it is your choice. I would say in general, if you can avoid character set conversion - go for it, less processing.

which client characterset when administrating unicode databases

A reader, October 09, 2006 - 9:48 am UTC

the discussion of characterset conversion is always about "normal" data processing. i never saw an in-depth discussion of database administration and characterset conversion.

i just wanted to know if there are any pitfalls you know of when administrating unicode databases using a working session with a non-unicode client characterset, eg. DON'T ... unicode databases with non-unicode sessions as you will ...

Tom Kyte
October 09, 2006 - 10:09 am UTC

not really. I don't understand the orthogonal distinction you seem to draw between "normal data processing" and "administration".

backup - binary.

what falls into "administration" for you? exp/imp -- data unload and load tools -- is normal "data processing" for me.



which client characterset when administrating unicode databases

A reader, October 12, 2006 - 7:11 am UTC

normal data processing: select, insert, update, delete,... on any user schema

administration: create a unicode database, run some $ORACLE_HOME/rdbms/admin scripts as SYS, install/set up some additional database features or options, create new tablespaces, move some tables, add some constraints, rebuild indexes,...

Tom Kyte
October 12, 2006 - 8:29 am UTC

to create a unicode database -- it should be obvious what the character set is going to be, shouldn't it?

think about it: use the character set that makes sense, that is all. (in general, it'll simply be the database's character set of course.)

running scripts is just "normal data processing"

which client characterset when administrating unicode databases

A reader, October 13, 2006 - 8:28 am UTC

tom, thanks for your time but i'm afraid i can't follow you.

do you say the client characterset doesn't matter when running oracle scripts against unicode databases?

let's assume i create a database (CREATE DATABASE ... CHARACTERSET AL32UTF8...) followed by the "catalog.sql" script. could there be any trouble if i do this with a client characterset of WE8MSWIN1252?

Tom Kyte
October 13, 2006 - 2:27 pm UTC

I'm saying "you choose", but in general "use the same characterset the database was created with -- that is sort of NORMAL", that is all.

Ricardo, April 20, 2011 - 4:11 am UTC

Hi Tom

My current 10g database has WE8ISO8859P1 character set. I want to install a fresh 11g database and use exp/imp utility to move the data.
I notice that WE8ISO8859P1 is not available in 11g hence I have to use WE8MSWIN1252.
We have some chinese characters in the database therefore I am not sure whether I should use WE8MSWIN1252 or unicode (AL32UTF8)

What is the disadvantage of using WE8ISO8859P1 as the database character set and using AL16UTF16 as the National character set ?
In this case we will need to use Nchar datatype to store chinese characters

Thanks in advance





Tom Kyte
April 20, 2011 - 8:07 am UTC

you want to use expDP and impDP - not exp/imp!!!

.. I notice that WE8ISO8859P1 ...

???? huh? sure it is.

... We have some chinese characters in the database therefore I am not sure whether
I should use WE8MSWIN1252 or unicode (AL32UTF8) ...

Now you've totally lost me as chinese characters are multibyte and we8 character sets are single byte. I don't know how you could have chinese in your existing database - unless you are storing them in nvarchar types - not just varchar types???


Ricardo, April 20, 2011 - 2:22 pm UTC

Yes sir,
they are currently in nvarchar2 datatypes.

What is the disadvantage of using WE8MSWIN1252 as the database character set and using AL16UTF16 as the National character set to store chinese characters ?
or
What is the advantage of not using AL32UTF8 as a database character set ?



Tom Kyte
April 25, 2011 - 7:36 am UTC

there is not any "disadvantage" in general, it depends on your needs.

If you need to almost always store we8mswin1252 data and rarely store some al16utf16 data - what you have is fine.

If you store lots of al16utf16 data - then probably that is the character set you meant to use in the first place.

You have to look at what you are doing. If you are going to 11g 'fresh', and are going to change character sets anyway, you might consider going multi-byte.

character set conversion

Sam, April 20, 2011 - 4:28 pm UTC

We went from 9i WE8ISO8859P1 to 11g WE8MSWIN1252 without any issues,

but we don't have chinese data either.

WE8MSWIN1252 is a superset of WE8ISO8859P1, so you should be OK.

You need to run the database characterset scanner on your database and read the reports generated to see if you lose any data with the conversion.

your chinese characters would of course convert to the national character set of the target database, which should be a multibyte set (i.e. UTF8 or UTF16).
Tom Kyte
April 25, 2011 - 7:49 am UTC

I'm sorry sam, but think about this -- how can one 8bit characterset be a superset of another 8bit characterset when they both use all of the allotted characters and they support different characters?

think about this please.

character set

sam, April 25, 2011 - 9:35 am UTC

Tom:

They are mostly the same except MSWIN1252 supports a few more characters.

Also read what you said on May 2006

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1224836384599


Followup May 8, 2006 - 8am Central time zone:

I would prefer to not put it on windows - truth be told - as I would be at a loss managing it :)

since you have a superset characterset, everything should go OK this way - but when you decide to
move back - you'll be going to a subset characterset.


Also, read this



http://en.wikipedia.org/wiki/Windows-1252

The encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 0x80 to 0x9F range.
Tom Kyte
April 25, 2011 - 1:23 pm UTC

you are correct, sorry about that.
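(The 0x80-0x9F difference Sam quotes is easy to verify outside the database: byte 0x80 decodes to the euro sign in Windows-1252 but to an invisible C1 control character in strict ISO-8859-1. A Python sketch, illustrative only:)

```python
# Verifying the quoted Windows-1252 vs ISO-8859-1 difference in the
# 0x80-0x9F range: same byte, different character.
b = b"\x80"
print(b.decode("cp1252"))             # € (U+20AC, the euro sign)
print(repr(b.decode("iso-8859-1")))   # '\x80' -- a C1 control character
```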