Question and Answer

Tom Kyte

Thanks for the question.

Asked: April 27, 2002 - 9:59 pm UTC

Last updated: April 14, 2011 - 10:11 am UTC

Version: 901

Viewed 10K+ times

You Asked

Hi Tom
Somehow there is very little practical and implementation specific knowledge on this topic available. I was wondering if you have touched upon this subject in your expert book.

Please help me in unraveling some doubts on the following.
Varchar2 vs Nvarchar

1) When I create a table with an nvarchar2(20) column, this creates an NVARCHAR2(60) datatype.
Why does Oracle do this? Does this mean that Oracle specifies this column to use fixed-width 3-byte characters?

2)In oracle 9i we can specify the following
create table test(name varchar2( 20 char));

This means that now we can add 20 characters, whether single-byte or multibyte (2, 3, or 4 bytes).
So does this mean that we no longer need the NCHAR or NVARCHAR2 datatypes? What is the essential difference between varchar2 and nvarchar2?

3) Can I store Unicode characters in a varchar2 column? Suppose I have a varchar2(60) column. I am sure this can store Spanish characters as well as English, so the only possible loss would be character truncation. But if I design this column to take the maximum size the Spanish characters could need {eg varchar2(200)}, then I don't have to use nvarchar2? Is this correct?

Thanks for the help


and we said...

No, my book doesn't really cover it as it is something I rarely touch.

1) Well, it doesn't -- you would have to tell me your character set in order for me to comment.

ops$tkyte@ORA9I.WORLD> create table t ( x varchar2(20 char),
2 y varchar2(20),
3 z nvarchar2(20)
4 )
5 /

Table created.

ops$tkyte@ORA9I.WORLD>
ops$tkyte@ORA9I.WORLD> desc t
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 X                                      VARCHAR2(20 CHAR)
 Y                                      VARCHAR2(20)
 Z                                      NVARCHAR2(20)


2) Nvarchar types in 9i are purely Unicode datatypes (changed from 8i).

See
http://docs.oracle.com/cd/A91202_01/901_doc/server.901/a90120/ch4_dep.htm#9789

that also has a pointer to the Oracle9i Globalization Support Guide, which is the book you are looking for.

varchar2(20 char) means you can store 20 characters -- whereas varchar2(20) means you can store 20 bytes. the varchar2(20 char) might take 20, 40, 60, 80 or more bytes to hold a string. The maximum length of a varchar2 is still 4000 bytes so varchar2(4000 char) is "misleading" in that the field will max out at 4000 bytes -- not 4000 characters.

This is meaningful in multi-byte encoding schemes.
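A quick way to see the byte vs character distinction outside the database is this short Python sketch (illustrative only, not Oracle code): character count and UTF-8 byte count diverge as soon as multi-byte characters appear.

```python
# Character length vs UTF-8 byte length: why VARCHAR2(20 CHAR)
# may need far more than 20 bytes of storage in a UTF-8 database.
samples = ["hello", "café", "こんにちは"]

for s in samples:
    chars = len(s)                   # what "(20 CHAR)" limits
    octets = len(s.encode("utf-8"))  # what "(20)", i.e. BYTE semantics, limits
    print(f"{s!r}: {chars} characters, {octets} bytes")
```

The five Japanese characters occupy 15 bytes in UTF-8, which is exactly the 3x growth the answer describes.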

3) Yes, if your database uses such a characterset (eg: utf8)

You would use varchar2(60 char), not varchar2(200) -- that would be the better practice.



Comments

clarification on varchar2 and nvarchar2

YenYang, May 16, 2004 - 10:51 am UTC

hi,
sorry, my question wud seem very silly but none of the books nor manual talk abt these things with an example. incase if its repeated question my apologies for that. even if u know some link i wud be very happy rather than, u wasting ur time in explaining.
even if someone wud send me the link it wud be fine with me :)

is it something like this nvarchar2 refers to national character set and varchar2 refers to database charecter set. whatever character set i choose in national character set it wud refer to nvarchar2. suppose if i've a national character different from database character then i wud use nvarchar2.
(a) like suppose my data would contain some in chinese and some in english but the source code (sql / plsql statements) wud be in english. to store chinese i'll use nvarchar and for english i wud use varchar2 but the database char set wud be 7bit ascii

(b) suppose i've source code in chinese and data also in chinese then multibyte character set must be choosen for both database character set and national character set. which one we'll be using nvarchar or varchar2? i think we'll be using nvarchar2 throughtout our database. inshort varchar2 datatype is not used at all ? am i correct ?

(c) i didnt understand the concept behind why we cant use a fixed-width multibyte character set as database character set ?

(d) also i cud not understand the sorting as per major and minor in monolingal sorting ?

expecting to get clear answer.

Tom Kyte
May 16, 2004 - 5:03 pm UTC

translation table used for processing input was:

wud = would
u = you
ur = your
abt = about

(i think your keyboard might be on the fritz? vowels are disappearing left and right)

a) yes, that would be one use.

b) you would pick a better characterset, you use nvarchar for exceptional data, if all of your data is "exceptional", you picked the wrong character set!

c) don't understand the question

d) you'll want to read that globalization guide referenced above! it goes into great detail.

SR, May 17, 2004 - 9:13 am UTC

Bad luck YenYang ! I was sure that this would happen. The "chatting language" is the one thing that makes Tom furious. He had already expressed his discomfort on this many times before. Tom, here is a known Achilles Heel to your otherwise prodigious patience :-)

Tom Kyte
May 17, 2004 - 3:07 pm UTC

if I was furious, I wouldn't address the question. I'm trying to be tongue in cheek about it :)

What is the difference?

A reader, May 17, 2004 - 3:31 pm UTC

Hi Tom,
Well, I read about the varchar2 and char, and then the nchar and nvarchar2 data types. To be frank, I am a bit confused by the documentation, as it does not explain them in depth.
Could you please explain these two datatypes and how they differ from the conventional ones? And also, under what circumstances should they be used?
Thanks as always

Tom Kyte
May 17, 2004 - 4:18 pm UTC

varchar2 -- stores data in the databases base characterset

nvarchar2 -- stores data in the national characterset

so, you could have a column with data stored in 7 bit ascii (us7ascii) right next to a column with data stored in utf8 -- in the same database.


useful because single-byte data is simple, fixed width, smaller, faster -- but sometimes you need to store multi-byte data in the same database. rather than make everything multi-byte, you can just store the small amount of multi-byte data "specially"
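The "fixed width is simple" point can be sketched outside the database (Python, purely illustrative; not Oracle internals): with single-byte data the byte offset of the Nth character is just N, while with variable-width UTF-8 you must encode (or scan) the prefix to find it.

```python
def byte_offset_utf8(s: str, n: int) -> int:
    """Byte offset where character n starts in the UTF-8 encoding of s."""
    return len(s[:n].encode("utf-8"))

# Single-byte characters: character position == byte position.
print(byte_offset_utf8("abcdef", 3))    # offset 3

# 3-byte kanji: character 3 starts 9 bytes in.
print(byte_offset_utf8("日本語def", 3))  # offset 9
```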

Globalization

THIRUMARAN, September 28, 2005 - 5:18 am UTC

Hi Tom,

i am working on ORACLE 10g R2, and i need to create a db table which supports multiple languages (Internationalization).
As mentioned above in this article

A) when i need Globalization support, my table should support/accept data in ENGLISH, JAPANESE, FRENCH, etc.
should i have 2 columns: 1) to store English data and 2) to store data other than English, if my DB character set (WESTERN) and National character set (AL32UTF8) are different?

B) I have a column "DESCRIPTION" and i entered the value
"WHAT IS YOUR NAME" in this column in English; but if a person using another language needs to view this, should i store the equivalent data in other languages, or will Oracle convert it?
for example: if i need to see the same data in French, will Oracle convert it into "qui est nome ?" or do i need to design the table to store the equivalent data?

i read the latest "Going Global" article:
http://www.oracle.com/technology/oramag/oracle/05-sep/o55global.html
Oracle9i Database introduced character semantics, which changed the way multibyte characters were treated. Instead of doubling or tripling column or variable precision, setting NLS_LENGTH_SEMANTICS = CHAR causes Oracle Database to treat storage of the string 'Today' the same as the Japanese string 'こんにちは'.
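The quoted claim about character semantics is easy to verify outside the database too (a Python sketch, not Oracle code): both strings are five characters long, even though their byte lengths differ.

```python
english, japanese = "Today", "こんにちは"

# Character semantics: both strings count as 5 "characters".
print(len(english), len(japanese))

# Byte semantics: the Japanese string needs 3x the storage in UTF-8.
print(len(english.encode("utf-8")), len(japanese.encode("utf-8")))
```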

Tom Kyte
September 28, 2005 - 9:58 am UTC

You'll want to read this document:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm

we do not have *translation* software in the database -- that, you have to supply. we'll store the data, but we don't translate it.

You probably want a single characterset capable of holding any of your languages, not a character set and National characterset.

Internationalization

Thirumaran, September 28, 2005 - 10:29 am UTC

Hi Tom,

Thanks for your reply and the documentation link.
As mentioned by you
"You probably want a single characterset capable of holding any of your languages, not a character set and National characterset. "

What is the probable single character set (Oracle 10g R2) i should use for multi-language support (english, spanish, french, japanese)?
just a suggestion from you!

I am reading the oracle10g globalization support document currently.

thanks
thirumaran


Tom Kyte
September 28, 2005 - 10:42 am UTC

AL32UTF8 is your likely candidate - the "universal" one.

A reader, November 08, 2006 - 1:12 am UTC

Hi Tom,
We are in the process of developing application to support CJK and Western European languages.

We use AL32UTF8 as character set and AL16UTF16 for National character set.

We are developing application and using UTF-16 Unicode( OCI wide API) because of faster string processing capability.

1. Is it a good approach to change every CHAR and VARCHAR2 column in the database to NCHAR and NVARCHAR2, to avoid character set conversion from UTF-8 to UTF-16?

2. Is conversion from UTF-8 to UTF-16 considerable overhead?

3. We have mostly Western European characters (85%) and few Asian characters; I am guessing that if we convert every VARCHAR2 to NVARCHAR2 we may almost double the database size, and will need to double the SGA size to get the same performance as before the conversion. Could you advise if this is a reasonable guess?
4. Our application team uses UTF-16 for faster string processing (over UTF-8's variable width), but the database uses UTF-8, not UTF-16, for the CHAR types. Is it a good approach to develop the application to use UTF-8 instead of UTF-16?
Thank you,
V
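As a rough, non-authoritative sketch of the storage side of question 3 (Python, illustrative only; real Oracle storage adds its own overheads): UTF-16 roughly doubles mostly-Latin text, while for CJK text UTF-8 is actually the larger encoding (3 bytes per character vs 2).

```python
# Compare raw encoded sizes for mostly-Western vs CJK sample text.
western = "Straße, München and café price lists"
cjk = "データベースの設計"

for label, text in [("western", western), ("cjk", cjk)]:
    u8 = len(text.encode("utf-8"))
    u16 = len(text.encode("utf-16-le"))  # -le: no 2-byte BOM included
    print(f"{label}: utf-8={u8} bytes, utf-16={u16} bytes")
```

So for 85% Western data, converting everything to an AL16UTF16 national character set would indeed come close to doubling the character-data footprint.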


A reader, November 12, 2006 - 7:49 pm UTC

Hi Tom,
Could you please reply to my questions above.
Thanks a lot.
V

Internationalization -

THRJG, November 24, 2006 - 2:56 am UTC

I feel that there are some good questions still pending on this subject, ones I myself have been wondering about; maybe they have crept into this thread (and I believe they are well worth considering, Mr. Tom). If any of you know where to find more detailed information on the performance implications of using a Unicode database versus the nvarchar approach, I would appreciate it.

I personally believe that NLS_NCHAR is from a previous era, prior to the Unicode approach; so if you are building new applications you should use Unicode, and if you have something old/legacy :) stick to NLS_NCHAR.


A reader, January 10, 2007 - 10:37 pm UTC

Tom
As always, your input is most valuable to the Oracle user community. Could you please take some time to answer my question above?
Thank you,
V

Client character set must be subset of server

Dieter Oberkofler, February 23, 2007 - 1:42 am UTC

Hi Tom,
After having read this discussion and several Metalink TARs without a straight answer from Oracle, I would like to get your opinion on this:
We are implementing Oracle Database in several Western and Eastern European countries and are (depending on the necessity of mixed clients) using either a Unicode (UTF8) character set or one of the appropriate ISO character sets (WE8ISO8859P15, EE8ISO8859P2, etc.) on the server.
In a typical environment, we use clients with the same character set as the database or, if the database uses UTF8, a subset of the database character set.
So far so good, but what if someone running against a non-Unicode server changes the client character set so that it is no longer a subset? If, for example, client and server both use WE8ISO8859P15, Oracle simply stores and retrieves the client's characters without any translation. But what if a client with character set EE8ISO8859P2 uses the same server? It is my understanding that in this case Oracle simply stores the characters that are not part of the server character set in "some free" location within the server character set, and by doing so mixes characters on the server that can no longer be correctly retrieved by other clients.
I do understand that there isn't a lot Oracle can do in such a configuration, but why does it not simply reject this combination of client and server, or at least warn about this kind of misconfiguration between client and server character sets? For us this is the number one shortcoming in Oracle's I18N support, and I was never able to understand it.

Tom Kyte
February 23, 2007 - 7:47 am UTC

We do what you ask us to do.

A client says "this is my character set"

The database was set up with "that character set"

character set conversion is precisely what has to happen there - anything the client provides must be converted from their character set to the databases character set and vice versa.

There is nothing to "warn about" - applications can "warn" if they see fit (eg: export and import sure do - precisely because of this)

If your applications feel they should warn, your applications should "warn"

There is actually no way for the database to "warn" the client - the client is the only thing that can do IO with the end users.

So use Unicode

Michael Friedman, February 23, 2007 - 10:12 am UTC

The problem you describe is one of the (many) reasons I think it's irresponsible in this day and age to build a system in anything but Unicode.

Why do you want to make your life painful (and expensive!) by supporting systems in multiple character sets? KISS - Keep It Simple Stupid!

That said, it would be nice if Oracle had some validation routines that could be used to determine which characters, if any, in a Unicode string are not valid within a given national language...

"might be on the fritz"

A Reader, February 23, 2007 - 11:51 am UTC

what does
"might be on the fritz" mean ?
I'm not a native English speaker and I didn't hear that before.
I know "Fritz" only as a German Christian name and also
"Deep Fritz" as the world's best chess program.

meaning of fritz

A reader, February 23, 2007 - 1:58 pm UTC

http://m-w.com/dictionary/fritz

Main Entry: fritz
Pronunciation: 'frits
Function: noun
Etymology: origin unknown
: a state of disorder or disrepair -- used in the phrase on the fritz

Client character set must be subset of server

dieter, February 24, 2007 - 3:53 pm UTC

Hi Tom,

I agree with you that there is nothing the database can do if the client character set does not match the server character set, but I cannot follow why the database should not "warn" if we are going to lose data.

Clearly the warning must be "shown" by the application, but Oracle could inform it of a mismatch. The Oracle client could recognize a mismatch that would cause characters to be lost and, for example, return a status (not an error) like ORA-24345 or the other ones that signal warnings.
The minimum would be an OCI-level function that can be used to ask whether the character sets match.

Cheers

Tom Kyte
February 26, 2007 - 1:12 pm UTC

you are not losing data

you have explicitly said to us - YOU HAVE SAID TO US - this is what I speak, this is the way I talk, deal with it.


the function to see if the character sets match - it is called SQL, you know how to query the database, the information is right there.

i18n Multi-Language Data

A Reader, March 01, 2007 - 8:57 am UTC

Hi, firstly thanks for you attention.

Please, I need your advice. We have to create a structure to keep data translations; for instance, in my product table we have to keep the English product description and the French product description. The problem is, some language terms differ by region; for example, Americans say subway, but English people say tube. We are planning to create one table for each language, and one more for recording exceptions (differing terms). This is our select:

SELECT t.translation, t.field_id
  FROM tint_data_content_en t
 WHERE t.table_id = 'PRODUCT_TABLE'
   AND t.field_id IN ('DESCRIPTION','SHORT_NAME')  /* fields to get translations for */
   AND t.table_index = 1                           /* the record's PK in Product_Table */
   AND NOT EXISTS ( SELECT 1
                      FROM tint_data_content_en_uk e
                     WHERE t.table_id = e.table_id
                       AND t.field_id = e.field_id
                       AND t.table_index = e.table_index )
UNION
SELECT x.translation, x.field_id
  FROM tint_data_content_en_uk x
 WHERE x.table_id = 'PRODUCT_TABLE'
   AND x.field_id IN ('DESCRIPTION','SHORT_NAME')
   AND x.table_index = 1

I don't know how to optimize this SQL SELECT. As you can see, we are using the same table in some way.

By the way, in the beginning we were thinking of creating the multi-language fields as XMLTYPE or CLOB, that is, inserting language tags for each language into the same field, something like:

<en>subway</en>
<en_uk>tube</en_uk>
<es>metro</es>

My position is that this solution would be more efficient than the current one. What do you think?

Thanks in advance, and I'm sorry because I'm not a native English speaker.


-- Create table
create table TINT_DATA_CONTENT_EN
(
  TABLE_ID     VARCHAR2(32)   not null,
  FIELD_ID     VARCHAR2(128)  not null,
  TABLE_INDEX  VARCHAR2(128)  not null,
  TRANSLATION  VARCHAR2(4000)
)
/
/

alter table TINT_DATA_CONTENT_EN
add constraint TINT_DATA_CONTENT_EN_PK primary key (TABLE_ID,FIELD_ID,TABLE_INDEX)
/

create table TINT_DATA_CONTENT_EN_UK
(
  TABLE_ID     VARCHAR2(32)   not null,
  FIELD_ID     VARCHAR2(128)  not null,
  TABLE_INDEX  VARCHAR2(128)  not null,
  TRANSLATION  VARCHAR2(4000)
)
/

alter table TINT_DATA_CONTENT_EN_UK
add constraint TINT_DATA_CONTENT_EN_UK_PK primary key (TABLE_ID,FIELD_ID,TABLE_INDEX)
/

Tom Kyte
March 02, 2007 - 11:38 am UTC

why would you have a table per language? one table


and there is "UK english" and "US english" - I'm not a fan of the exception system - why not treat them as two different languages if you want to have two different messages. You'll find that the UK english and its lack of use of the letter Z will make the exception the rule!

multiByte characters insertion/selection from oracle database

bilel khedira, September 04, 2009 - 10:15 am UTC

Hi Tom,
in my database, I have a table in which some attributes hold multi-byte character data. I have done some research and these are the results found:
1- Oracle provides the NCHAR and NVARCHAR2 types for attributes to be persisted as multibyte characters,
2- we must set the NLS_LANG environment variable, used to define language, territory and character set properties: export NLS_LANG=Japanese_Japan.JA16SJIS for the Japanese character set, for example (for Oracle we also have ORACLE_HOME and ORACLE_SID).

To test insertion/retrieval of multibyte characters, I have:
1- created a new table called 'TEST' in which some column types are NVARCHAR2
2- export NLS_LANG=Japanese_Japan.JA16SJIS
3- restarted Oracle
4- inserted into TEST a new row in which some attributes are in the Japanese character set, via a Java program
5- if we select the multibyte character attributes from the TEST table, we retrieve a set of inverted question marks: ¿¿¿¿¿¿
I think we need some additional configuration. What do you think?

--Best Regards,
Bilel
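A run of ¿ characters is the classic symptom of a character set conversion problem: characters with no representation in the target character set get replaced by a substitution character. A rough analogy in Python (not what Oracle does internally, just the same phenomenon):

```python
text = "こんにちは"

# Encoding into a character set that cannot represent the characters
# substitutes a replacement character for each one -- analogous to the
# '¿'/'?' Oracle clients show when NLS_LANG does not match the data.
print(text.encode("ascii", errors="replace"))  # b'?????'
```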

what is new in 10g

A reader, June 29, 2010 - 5:30 am UTC

Hi Tom,

I have read about utl_i18n somewhere, but it is not mentioned in the Globalization guide. What is this package?
How can it help in multilingual database applications?

I am still going through the globalization guide - the answer I am searching for is -

I have a table with a column which stores data in English as well as Chinese.

when I query on that table using

select * from t where col1 = 'ABCD' it works ok.

when I query

select * from t where col1 = 'chinese value' then it does not return any rows..

WHY ?

I am using Oracle SQL developer.
If you could just point to some document, that would be really a help.

Thanks
Tom Kyte
July 06, 2010 - 12:48 pm UTC

utl_i18n is documented, just use the documentation search for it.

but to your question, insufficient data to answer.

what is the character set of the database
what is the version of the database
what is the datatype of the column you are comparing to
what is the national character set of the database
what is the characterset of the client???

English, Chinese, etc

Duke Ganote, July 06, 2010 - 1:28 pm UTC

VARCHAR2(xx CHAR)

Eugene Desyatnik, April 14, 2011 - 7:01 am UTC

Dear Tom,

The second part of the original question, marked as (2), has still one interesting nuance which I am not sure was fully addressed.

When the post said "So does this mean that now we dont need to use Nchar or nvarchar2 datatype", I believe the implication is still an interesting one.

The Oracle default for VARCHAR2(20) is VARCHAR2(20 BYTE). However, by having the VARCHAR2 datatype also support VARCHAR2(20 CHAR) -- meaning, accommodate literally 20 characters, even if, due to their character set, that will exceed 20 physical bytes of storage -- it makes one think that VARCHAR2(20 CHAR) is already somehow supporting international e.g. Chinese characters.

If so, I am left with the same question -- why is NVARCHAR2() additionally needed?

Conversely, if -- unlike the NVARCHAR2() datatype -- the VARCHAR2() does not support international or special characters, why then would there be variability between the length that a VARCHAR2(20 BYTE) vs VARCHAR2(20 CHAR) can accommodate?

Is this simply due to the existence of characters that are already multi-byte i.e. will not fit in VARCHAR2(1 BYTE), but not yet truly international i.e. not requiring NVARCHAR2?

Perhaps my lexicon can benefit from a more precise way to refer to these two seemingly different classes of characters.

Thanks!
Eugene.
Tom Kyte
April 14, 2011 - 10:11 am UTC

the N datatypes are used to store data in a different characterset from the base database character set. I don't see the connection to using 20 byte or 20 character directly?

varchar2 = database character set (could be single or multibyte)
nvarchar2 = national (extra) character set ( will be multibyte)