Question and Answer

Tom Kyte

Thanks for the question, Eduardo.

Asked: December 03, 2004 - 2:36 pm UTC

Last updated: April 02, 2014 - 5:40 am UTC

Version: 9.0.1

Viewed 50K+ times.

You Asked

Hi Tom,

I have a new vended application that requires setting NLS_LENGTH_SEMANTICS=CHAR. I am planning to put it on an existing instance that uses the default value of BYTE. Can you please clarify the following:

1. Does changing NLS_LENGTH_SEMANTICS in the init.ora affect existing char columns?
2. Is it recommended to change NLS_LENGTH_SEMANTICS database-wide or per session?
3. What does "variable-width multibyte character set" mean in the documentation ( http://download-west.oracle.com/docs/cd/A91202_01/901_doc/server.901/a90190/ch1120.htm#REFRN10658 )?


Thank you in advance.

and Tom said...

1) no

2) it would be nicer to change it at the session level. anything that changes the database-wide setting affects *everything*

3) it means that a single character might take 1 byte, or 2 bytes or 3 bytes or N bytes -- each character could take a different number of bytes to store, the width of the character in bytes is variable.
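To make "variable width" concrete, here is a small Python illustration (added for this writeup, not part of the original answer) of how many bytes individual characters occupy in UTF-8:

```python
# UTF-8 is variable-width: a character occupies 1 to 4 bytes.
for ch in ["A", "é", "€", "𝄞"]:
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# A takes 1 byte, é takes 2, € takes 3, and 𝄞 (a supplementary
# character) takes 4.
```

Note that this is standard UTF-8 (Oracle's AL32UTF8); Oracle's older "UTF8" character set stores supplementary characters differently, which comes up later in this thread.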


Comments

Max width occupied by a single char

A reader, December 02, 2005 - 11:19 am UTC

Tom,
What is the maximum number of bytes a single character might occupy, assuming the database has a char set of UTF8.

Thank you



Tom Kyte
December 02, 2005 - 11:41 am UTC

http://www.fileformat.info/info/unicode/utf8.htm

It looks like "3"

Alberto Dell'Era, December 02, 2005 - 3:50 pm UTC

> It looks like "3"

It is definitely "3" for UTF8:

http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14225/appunicode.htm#i634604

But if you use the "new and suggested by Oracle" AL32UTF8 charset, it is "4", thanks to supplementary characters (click on "table B-2" on the link to get a nice breakdown by character type).

It's worth noting that the Euro Symbol needs 3 bytes.

--

For Italian, almost all chars need 1 byte, except the "accented chars" (à, è, ò, etc.), which take 2 bytes, and the Euro symbol, which takes 3 bytes. So basically a very modest "overhead", since the frequency of accented chars is around 2-3% in an average string.

For Latin-derived languages such as French, Spanish, and German, it's almost the same - just increase the frequency of "funny-looking" (to an English native speaker ;) chars, such as the accented ones, probably to 10%.
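Alberto's overhead estimate is easy to check with a quick Python sketch (an illustration added here; the sample sentence is made up):

```python
# Italian-style text: accented chars take 2 bytes in UTF-8, the rest 1.
text = "La qualità del caffè è più importante della quantità"
chars = len(text)
nbytes = len(text.encode("utf-8"))
accented = sum(1 for c in text if ord(c) > 127)
print(chars, nbytes, accented)
print(f"overhead: {100 * (nbytes - chars) / chars:.1f}%")
```

Each accented character adds exactly one extra byte, so the byte length is just the character count plus the number of accented characters.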

--

In passing, let me share an outstanding reference on charsets:

http://czyborra.com/

I use it so frequently that I've committed the URL to memory :)

Tom Kyte
December 03, 2005 - 10:14 am UTC

thanks!

NLS_LENGTH_SEMANTICS

Mike, April 26, 2006 - 2:24 pm UTC

In the Oracle® Database SQL Reference
10g Release 2 (10.2) Manual

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#i54330

the following is stated for both the CHAR and VARCHAR2 datatypes:
For performance reasons, Oracle recommends that you use the NLS_LENGTH_SEMANTICS parameter to set length semantics and that you use the BYTE and CHAR qualifiers only when necessary to override the parameter.

We're currently trying to upgrade from 9.2 to 10 (.2, I think), and the DBA has run into a problem in our DEVL environment with some non-ASCII data that was sent to us by an offsite system. It's in a VARCHAR2(30) field, for example, and the underlying definition of the datatype is, of course, the default of BYTE. In the past, we've just ignored the '?' in the data, but I'm wondering if it really is some Unicode character that we currently don't support.

I'm trying to decide if we should just change the default to CHAR. Are there any gotchas we need to be aware of? We are supposed to try the migration in TEST later this week, with an implementation in PROD sometime next month.

Tom Kyte
April 27, 2006 - 8:10 am UTC

the nls length semantics though would not have any bearing on HOW the data is interpreted - the character set would.

I'm not sure what link you see between the two?

Misinterpretation of the documentation

Mike, April 27, 2006 - 8:48 am UTC

Sorry, apparently I was misinterpreting what I was reading. I thought the NLS_LENGTH_SEMANTICS governed how the multi-byte characters were handled.

Our DBA informed us that there’re some control characters in the data which caused problems in the import from the old database to the new. The control characters got expanded from single-byte to multiple-byte upon import, and therefore were too long for the table columns, which were defined with a BYTE length of 10. If they’d been defined with a CHARacter length of 10, it would’ve worked fine because even if a character grew from 1 to 3 bytes, it’s still only one character. My development lead wants me to handle this by going through all of the tables in the database and finding all of the char and varchar2 fields and do this (for example):

ALTER TABLE some_table MODIFY (field1 VARCHAR2(30 CHAR));

Is that the correct approach?

Tom Kyte
April 27, 2006 - 3:19 pm UTC

umm, no, what you say is not correct either. this would NOT have happened:

... The control
characters got expanded from single-byte to multiple-byte upon import, and
therefore were too long for the table columns, which were defined with a BYTE
length of 10. If they’d been defined with a CHARacter length of 10, it would’ve
worked fine because even if a character grew from 1 to 3 bytes, it’s still only
one character. ...

that is NOT the way multi-byte data works at all. control characters will not be automagically subsumed into something else and transmorphed into yet something else. That would be scary wouldn't it.

Your development lead is "not being terribly technically accurate" here.


please tell me:

a) what is your character set
b) what is the character set of the input data
c) how do you define "control character" - what exactly does that term mean to you.

Restrictions for use

Mathias, December 08, 2006 - 8:54 am UTC

Are there any restrictions on its use?

I'm told that it is not supported to have length_semantics set to char when creating "oracle structures".

The idea is that having this set when creating an AQ queue_table would lead to a queue_table that is not supported and could experience extreme performance issues as a result. Is there any truth to this?

Tom Kyte
December 09, 2006 - 12:17 pm UTC

who told you that? and what are "oracle structures" specifically. they must have referenced a note or something?

Anecdotal evidence

Mathias, December 10, 2006 - 9:25 am UTC

I have not been able to get any documentation for this, other than the reference to anecdotal evidence that it helps to change it back to BYTE. No recreation is said to be needed, but that parameter should somehow improve performance of SQL statements.

Oracle support has said this in a TAR with no supporting documentation and the anecdotal evidence turns out to be postings here (other people, not you).

As we are about to do this in production, I just wanted to see if you saw more of a benefit than I can see. Support is apparently not interested in looking elsewhere (AWR, system state dumps, traces...) until this has been tested in our production system.

Do you want me to post the TAR # here? I will post again if I get some form of documentation for this.

Tom Kyte
December 10, 2006 - 7:39 pm UTC

but how would this affect performance? It is a table create concept.



Anecdotal followup

Mathias, December 10, 2006 - 9:02 pm UTC

Thank you for validating my feeling that this theory is not obvious.

The postings they have found report solving performance issues. They do not indicate how bad the performance was or what kind of performance problem was experienced.

One reason I've heard for why this would help would be that the translation from character to byte takes some time. But then the object would need to be recreated, and that is said to not be needed.

The problem we have is that the dequeue operation experiences massive contention. It ends up taking 70% of the database server activity and maxing out memory. All contention is on library cache and it is for 50% of the database activity.

I find it very surprising that this parameter could cause this by how it is set, even when no objects are created. The contention is all on an "unknown:" SQL. The SQL ID maps to a LOB retriever (the messages are XMLTYPE payloads) and the table4_ reference for them maps to our AQ table for XML messages.

I have not been able to find a way to get what happens from Oracle instrumentation as this is a background service for the dequeue call, but it results in over 800 million latches per hour for library cache and library cache lock. The wait time per hour is 27 hours spread over many connections.

The fact that there is a library cache issue is indisputable; the problem is how to prove what it is and how we can solve or work around it. Would you have any pointers on how to go about identifying the actual problem this LOB retriever service is encountering?

I'll try to use your answers to get the people involved, both Oracle support and the on site production support DBA's, to help with identifying more documentation about this.

The concern we have is that at the end of the month another system will need to be installed on this server, and that will degrade our performance further, as we already have severe performance problems and the CPU is maxed out. That is just background to why I'm so concerned about testing a parameter that will probably do nothing in production. I'm afraid we'll just lose a week or so, as the procedure for getting it done in production takes several days.

Tom Kyte
December 10, 2006 - 9:18 pm UTC

lobs are not affected by the nls character length stuff...


LOB and char length semantics

Mathias, December 10, 2006 - 9:54 pm UTC

I understand that it isn't impacted that way. The idea is that there is somehow a bug hiding that has not been identified, and it somehow causes cursors not to be shared, which in turn causes this extreme performance issue.

I guess a bug can have an impact in many odd ways, but this sounds like wishful thinking, and I'm afraid we'll be running late on fixing this once it has been tested.

Thank you so much for your answers and help in verifying my fear with this. I hope this thread will help in getting some attention to the need to produce documentation for why this is good and also increased understanding that this change has a very small chance of solving this problem.

Setting NLS_LENGTH_SEMANTICS

Sonali, April 15, 2009 - 3:47 pm UTC

Hi,
I have set this parameter to CHAR at the session level. How do I check, by querying the database, that it worked - that I now have CHAR instead of BYTE?
Tom Kyte
April 15, 2009 - 4:43 pm UTC

I do not know what you mean.

If you set at the session level, why would you want to look at the "database" level?

the session that set it can query v$parameter:

ops$tkyte%ORA10GR2> show parameter semantics

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_length_semantics                 string      BYTE
ops$tkyte%ORA10GR2> alter session set nls_length_semantics = 'CHAR';

Session altered.

ops$tkyte%ORA10GR2> show parameter semantics

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
nls_length_semantics                 string      CHAR
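Another way to check (a suggestion beyond Tom's reply) is the nls_session_parameters dictionary view, which reflects the session's effective NLS settings:

```sql
-- effective value for the current session
select value
  from nls_session_parameters
 where parameter = 'NLS_LENGTH_SEMANTICS';
```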

asktom fan, June 06, 2013 - 4:31 pm UTC

Hi Tom,

When nls_length_semantics is set at the system level, it appears that it does not affect the varchar2 column semantics: the column will default to BYTE even though this parameter is set to CHAR. Only when nls_length_semantics is set at the session level does it have an effect. Is this a bug?

Please see the following test case:

 SQL> select * from v$version;

 BANNER
 --------------------------------------------------------------------------------
 Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
 PL/SQL Release 11.2.0.1.0 - Production
 CORE    11.2.0.1.0      Production
 TNS for Linux: Version 11.2.0.1.0 - Production
 NLSRTL Version 11.2.0.1.0 - Production

 
 SQL> alter system set nls_length_semantics = 'CHAR' scope = memory;

 System altered.

 SQL> show parameter nls_length_semantics

 NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 nls_length_semantics                 string      CHAR

 SQL> create table t3(c1 varchar2(5));

 Table created.

 SQL> select table_name, data_type, char_used from dba_tab_columns where table_name in ('T3');

 TABLE_NAME                     DATA_TYPE            C
 ------------------------------ -------------------- -
 T3                             VARCHAR2             B


 SQL> alter SESSION set nls_length_semantics = 'CHAR' ;

 Session altered.

 SQL> show parameter nls_length_semantics

 NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 nls_length_semantics                 string      char
 
 SQL> drop table t3;

 Table dropped.

 SQL> create table t3(c1 varchar2(5));

 Table created.

 SQL> select table_name, data_type, char_used from dba_tab_columns where table_name in ('T3');

 TABLE_NAME                     DATA_TYPE            C
 ------------------------------ -------------------- -
 T3                             VARCHAR2             C


Tom Kyte
June 06, 2013 - 5:54 pm UTC

it looks like nls_length_semantics should be "deferred", not "immediate" (see issys_modifiable here:
http://docs.oracle.com/cd/B28359_01/server.111/b28320/dynviews_2085.htm
for what that means)

if you were to have reconnected after changing it - it would have taken effect. Looks like a bug in the v$parameter description of it - it shows as immediate, but should show as deferred.


if you have support please file an SR or tell me you don't have support and I'll open a bug, thanks!

asktom fan, June 06, 2013 - 6:31 pm UTC

Hi Tom, 

Thanks for your quick response:

The behavior is the same even after reconnect:

 $ sqlplus a/a

 SQL*Plus: Release 11.2.0.1.0 Production on Thu Jun 6 13:24:56 2013

 Copyright (c) 1982, 2009, Oracle.  All rights reserved.


 Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
 With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
 Data Mining and Real Application Testing options

 SQL> show parameter nls_length_semantics

 NAME                                 TYPE        VALUE
 ------------------------------------ ----------- ------------------------------
 nls_length_semantics                 string      CHAR

 SQL> create table t4 (c1 varchar2(5));

 Table created.


 SQL> select table_name, data_type, char_used from dba_tab_columns where table_name in ('T4');

 TABLE_NAME                     DATA_TYPE            C
 ------------------------------ -------------------- -
 T4                             VARCHAR2             B


Can you please file the bug?

Thanks!

Tom Kyte
June 06, 2013 - 6:54 pm UTC

I found an open bug on this:

Bug 1488174 : ALTER SYSTEM SET NLS_LENGTH_SEMANTICS/NLS_NCHAR_CONV_EXCP HAS NO EFFECT

nls_length_semantics

asktom fan, June 07, 2013 - 6:48 pm UTC

Thanks Tom.

After I restarted the database, the system-level setting for nls_length_semantics takes effect.
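In other words, the sequence that finally worked looks roughly like this (a sketch; the key point is that the system-level change only takes hold for sessions started after the bounce):

```sql
-- persist the setting, then restart the instance
alter system set nls_length_semantics = 'CHAR' scope = spfile;
shutdown immediate
startup
-- tables created by new sessions now default to CHAR semantics
```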

nls_length_semantics=char

Amit, July 24, 2013 - 3:12 pm UTC

We are setting nls_length_semantics=char at the session level. After that we create a table with create table ... as select * from ...

Still, we see that the new table that got created has its column length semantics as BYTE instead of CHAR. Can you please advise how to achieve the following?
Source table --
create table test1
(name varchar2(20 byte));

Target I am expecting to be as
create table test1
(name varchar2(20 char));



Tom Kyte
July 31, 2013 - 7:06 pm UTC

create table as select borrows the datatypes from what is selected. you are not telling us what types to use, the existing types are used.


either cast or create the table and then direct path insert into it.

ops$tkyte%ORA11GR2> create table t
  2  as
  3  select cast( dummy as char(20 char)) xxx,  dummy yyyy from dual;

Table created.

ops$tkyte%ORA11GR2>
ops$tkyte%ORA11GR2> desc t
 Name                                     Null?    Type
 ---------------------------------------- -------- ----------------------------
 XXX                                               CHAR(20 CHAR)
 YYYY                                              VARCHAR2(1)


What are the implications of changing to CHAR semantics?

Khalid Rahim, October 26, 2013 - 9:55 pm UTC

I am using DMU to convert an old US7ASCII database to AL32UTF8. I know how to do the conversion, and it works very well; I am concerned about only one field that is defined with BYTE semantics, because in order for the conversion to succeed I need to convert that field to CHAR semantics.

I have a field "some_code CHAR(4)" with BYTE semantics in the original US7ASCII database. If I change that field to CHAR semantics, the conversion is successful in my test database. My concern is: will the existing application code break? In other words, will the SQL, PL/SQL, embedded SQL, etc. that currently exist still work?

Elsewhere on asktom I have read you saying that SUBSTR, INSTR, and all those use CHAR semantics anyway, which is encouraging.

Thanks,
Khalid

Tom Kyte
November 01, 2013 - 9:01 pm UTC

I am not aware of any issues you'll have.

In fact, if you do not use char semantics for pretty much all of the fields - then I can think of issues you'll have.


for example, if you have a varchar2(20) - and that is 20 bytes - your application, which could have put 20 characters in there before, will not be able to anymore.

So in general, you might consider looking into using character semantics rather than byte for more than just this attribute.
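If you do decide to flip many columns over, a generator query along these lines (an illustrative sketch, not from Tom's answer; review the output before running it) can build the ALTER statements from the data dictionary:

```sql
-- char_used = 'B' marks columns defined with BYTE semantics
select 'alter table ' || table_name
       || ' modify (' || column_name || ' '
       || data_type || '(' || char_length || ' char));' as ddl
  from user_tab_columns
 where data_type in ('CHAR', 'VARCHAR2')
   and char_used = 'B';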

Can I convert on IMPort ?

Tailor K., April 01, 2014 - 8:07 pm UTC

I have been doing some tests, but I guess it is not possible to export a schema from a database with a 'standard' charset (and default BYTE semantics) to a Unicode DB - it maintains the original BYTE setting. Is there any way to circumvent this?

Or should I use Oracle DMU? I am not sure how to 'instruct' it to change varchar2(nn byte) to varchar2(nn char) - on some columns it did weird things, like changing varchar2(1000) to varchar2(4000 byte), which is not what I want...
Tom Kyte
April 02, 2014 - 5:40 am UTC

I don't know what a "standard" character set is? do you mean "single byte"?

did you want to have the tables created with CHAR instead of BYTE semantics? is that what you are trying to ask?


that is, you'd like varchar2(1000) to be interpreted as varchar2(1000 CHAR) during a create table?

and when you say imp/exp you really mean impdp/expdp right? data pump is what we are talking about - since exp is not supported against current releases of the database...

More info

Tailor K., April 02, 2014 - 12:40 pm UTC

I apologize for not being specific... I think that knowing what we do and intend to do will help.
We use an internal database for our application schema structures, used to deploy to new clients. This database uses a single-byte character set, WE8MSWIN1252. We have been upgrading this database over the years, since Oracle 8i... So all the DDL used varchar2(nn) without specifying BYTE or CHAR, relying on the implicit database default of BYTE in that case. Then we do an expdp of this initial database to deploy to new clients (so we do not have to run all the SQL DDL every time).

One small client asked to use Oracle XE; since it has a Unicode charset, we are not able to import into it successfully, because of special characters that use more than one byte. We do not even need Unicode, since we work only with Portuguese, Spanish, and English.

Is it possible to do an automated conversion of the implicit varchar2(nn byte) into varchar2(nn char)? The application itself can deal with Unicode, so it will be fine with it.

Maybe I have to convert my origin database to Unicode - but in that case, will its expdp data still work fine against databases that remain on WE8MSWIN1252?

And how could I do that conversion, if it is required? The Oracle Database Migration Assistant for Unicode (DMU) did some weird things...

To import single-byte data into Unicode...

J. Laurindo Chiappa, April 07, 2014 - 10:09 pm UTC

Tailor, let's see the answer from Tom, but I personally really do not recommend changing the 11g XE character set (I had too many issues in the past after doing that, such as sessions disconnecting without warning and instability in general) or setting NLS_LENGTH_SEMANTICS (in my experience it is simply IGNORED by the XE import tools, be it imp or impdp).
So the only viable option is the one I use with my customers running XE: do an import with metadata only, run a script that changes BYTE to CHAR in the VARCHAR2 columns, and later do an import with data only - http://manchev.org/2011/08/migrating-single-byte-encoding-data-to-al32utf8/ has an example.

Best regards,

J. Laurindo Chiappa
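The sequence Chiappa describes can be sketched with Data Pump (an outline only - the directory, dump file, and credentials are placeholders):

```shell
# 1) write the DDL to a script instead of executing it
impdp system DIRECTORY=dp_dir DUMPFILE=app.dmp SQLFILE=app_ddl.sql

# 2) edit app_ddl.sql, changing VARCHAR2(n BYTE) to VARCHAR2(n CHAR),
#    and run it against the target database to pre-create the objects

# 3) load only the rows into the pre-created tables
impdp system DIRECTORY=dp_dir DUMPFILE=app.dmp \
      CONTENT=DATA_ONLY TABLE_EXISTS_ACTION=APPEND
```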

varchar2(10 char) can only hold 5 special characters.

A reader, April 24, 2014 - 5:40 pm UTC

Hi Tom

My database is configured with character set AL32UTF8.

I have verified that nls_length_semantics is set to CHAR at both the instance and session level.

When I create a column of varchar2(10 char), I would expect it to be able to contain 10 characters no matter how many bytes they take, but I am only allowed to insert 5.

Can you please explain why?

SQL> select * from nls_database_parameters where PARAMETER  = 'NLS_CHARACTERSET' ;

PARAMETER
------------------------------
VALUE
--------------------------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8


SQL> create table test (col1 varchar2(10 char) ) ;

Table created.

SQL> desc test ;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL1                                               VARCHAR2(10 CHAR)

SQL> insert into test values ('éééééééééé') ;
insert into test values ('éééééééééé')
                         *
ERROR at line 1:
ORA-12899: value too large for column "SYS"."TEST"."COL1" (actual: 20, maximum:
10)


SQL> insert into test values ('éééééé') ;
insert into test values ('éééééé')
                         *
ERROR at line 1:
ORA-12899: value too large for column "SYS"."TEST"."COL1" (actual: 12, maximum:
10)


SQL> insert into test values ('ééééé') ;

1 row created.

SQL>

Thank you in advance,

Merk
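One plausible explanation, not confirmed in the thread: "actual: 20" for a 10-character string is the classic signature of a client-side NLS_LANG mismatch - the terminal sends UTF-8 bytes while the client claims a single-byte character set, so each 2-byte 'é' arrives as two separate characters. A Python sketch of that effect (an illustration of the hypothesis only):

```python
# 'é' is 2 bytes in UTF-8; a client that mislabels those bytes as a
# single-byte charset (e.g. ISO-8859-1) sees 2 characters per 'é'.
typed = "é" * 10                      # what the user typed: 10 characters
wire = typed.encode("utf-8")          # what the terminal sends: 20 bytes
misread = wire.decode("latin-1")      # what a mislabeled client sees
print(len(typed), len(wire), len(misread))   # 10 20 20
```

If that is the cause, fixing the client's NLS_LANG (not the column definition) would let all 10 characters in.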