UTF8
sanjai, August 18, 2002 - 10:58 pm UTC
Thanks for your prompt response. Just one more clarification.
In Oracle9i, will the performance of varchar2(20 CHAR) be the same as varchar2(20) in Oracle8i?
August 19, 2002 - 7:23 am UTC
Trick question.
you would have to compare varchar2(20 CHAR) with a varchar2(100) or varchar2(120).
It will perform as those would. Capacity is the real difference: a varchar2(20) might only be able to hold 3 or 4 multibyte characters, whereas a varchar2(20 CHAR) will be able to hold 20.
It is a syntactic "niceness"; we do the byte multiplication that your DBAs are telling you to do by hand today.
UTF8
sanjai, August 19, 2002 - 11:14 am UTC
Thanks a lot, Tom.
UTF8
Zahir, September 26, 2006 - 1:32 am UTC
Hi Tom,
We need to export some French characters from our Oracle database into a file, and then load that file into another system. Currently the file loads correctly only if we open it and manually do a Save As in UNIX UTF8 format.
We need an automated process for this. So my question is: is there a way to automatically save the file in UNIX UTF8 format?
Zahir
September 26, 2006 - 2:38 am UTC
by using a client program capable of creating a UTF8 file.
why wouldn't you just use export and import??
Informative
Kulkarni, September 26, 2006 - 3:33 am UTC
Hi Tom,
I am a frequent visitor to your site and have always gotten the clarification I needed.
Now I have a project where we are migrating from HP-UX to Sun and also upgrading from 9i to 10g. Our application team wants to change the character set to UTF8. We are planning to create a blank database in 10g and then import the dump from 9i.
Since I need to change, for example, varchar2(20) to varchar2(20 CHAR), I plan to generate an index file using export with the CREATE TABLE syntax and then edit it to change each varchar2(20) to varchar2(20 CHAR). This is a very tedious job. Can you suggest an alternative?
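A minimal sketch of one possible alternative, assuming the tables are pre-created before the data import (the script and dump file names are hypothetical): set NLS_LENGTH_SEMANTICS at the session level before running the generated DDL, so every plain varchar2(N) is created with character semantics without editing the file.
alter session set nls_length_semantics = char;
-- run the unedited CREATE TABLE statements from the export index file
-- in this session; varchar2(20) is then created as varchar2(20 CHAR)
@create_tables_from_indexfile.sql
-- then import the data into the pre-created tables
-- imp userid=scott/tiger file=expdat.dmp full=y ignore=y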
sachin, September 26, 2006 - 5:49 am UTC
Hello Tom ,
OK, please let me know one more thing: are any further changes required in Oracle Forms after changing the NLS_LANG parameter for Forms and making the database changes?
Please help us.
Can you give me a brief idea of how we can do this?
Thanks in advance.
September 26, 2006 - 3:40 pm UTC
plz? German postal codes again - why do they keep coming up? I've never understood the relationship between them and technical things about Oracle.
for the rest of the question, only one word:
eh?
there are no changes required. You set NLS_LANG on the client in order to support the requirements of the client. If the NLS_LANG on the client is different from the server, be prepared and expect data conversions to happen, that is all.
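For illustration, a typical client-side setting (the exact value is an assumption; it must describe the language, territory, and character set your client actually uses):
$ export NLS_LANG=AMERICAN_AMERICA.UTF8    # e.g. on a UNIX client working in UTF8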
UTF8
Zahir, September 27, 2006 - 10:23 pm UTC
Hi Tom,
Thanks a lot for the reply.
My requirement is simply to create a file and save it in UNIX UTF8 format through PL/SQL code.
Could you please help?
Regards,
Zahir
September 28, 2006 - 2:55 am UTC
first, what is your database character set?
Character Set Performance
Mark, September 28, 2006 - 12:54 am UTC
Can different character sets have an impact on performance?
September 28, 2006 - 3:03 am UTC
well, a multi-byte character set can add a bit of extra processing since characters are not single bytes anymore, so things like "substr" have to think a little harder.
Also, instead of a binary sort (as with ASCII), you might have to use a collating sequence to sort.
In general, however, you pick a character set based on what you need to store and nothing else.
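For illustration, a minimal sketch of the sorting point (this assumes the standard SCOTT.EMP demo table is available):
-- binary sort vs. a linguistic collating sequence
select ename from scott.emp order by nlssort( ename, 'NLS_SORT=BINARY' );
select ename from scott.emp order by nlssort( ename, 'NLS_SORT=FRENCH' );
-- or set it for the whole session
alter session set nls_sort = french;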
UTF8
Zahir, September 28, 2006 - 3:37 am UTC
Hi Tom,
Our Database characterset is UTF8.
Regards,
Zahir
September 28, 2006 - 4:22 am UTC
check out support Note 227531.1; you need to write out the byte order mark (the "I am a Unicode file" byte marker) at the start of the file - that is what opening and re-saving the file is doing for you.
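For illustration only (this is not the content of the support note): a minimal PL/SQL sketch that writes the UTF-8 byte order mark followed by the data, using UTL_FILE in raw mode. The directory object, file name, and table are hypothetical.
declare
    l_file  utl_file.file_type;
begin
    l_file := utl_file.fopen( 'EXPORT_DIR', 'data_utf8.txt', 'wb', 32767 );
    -- the UTF-8 byte order mark: EF BB BF
    utl_file.put_raw( l_file, hextoraw( 'EFBBBF' ) );
    for x in ( select company_name from companies )
    loop
        -- in a UTF8 database, cast_to_raw returns the UTF8 bytes of the string
        utl_file.put_raw( l_file, utl_raw.cast_to_raw( x.company_name || chr(10) ) );
    end loop;
    utl_file.fclose( l_file );
end;
/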
UTF8
Zahir, September 28, 2006 - 4:41 am UTC
Hi Tom,
Thanks for the reply.
But where can I find support Note 227531.1?
Regards,
Zahir
September 28, 2006 - 6:54 am UTC
metalink.oracle.com
UTF8
Zahir, September 28, 2006 - 10:45 pm UTC
Hi Tom,
Thanks for the link, but I am not able to find it.
Actually, I am very busy with my rollout and I need to solve this issue urgently. Could you paste the content for me?
Thanks in advance.
Regards,
Zahir
September 29, 2006 - 8:02 am UTC
no, it really isn't an excellent idea for me to reproduce their content (they update it; I wouldn't update it).
all you had to do was plug the number into the very, very first simple search screen and it pops right up - it wasn't hidden or anything.
UTF8
Zahir, September 29, 2006 - 3:07 am UTC
Hi Tom,
I do not have access to metalink.oracle.com.
Can you please help me with the content?
Regards,
Zahir
UTF8
Zahir, October 02, 2006 - 8:04 am UTC
Hi Tom,
Could you please tell me which search you are talking about and where it is on the site?
Regards,
Zahir
October 02, 2006 - 8:20 am UTC
umm, well I emailed it to you last week, look in your inbox.
Or, just type the note number *into the search field* and hit enter.
nls_length_semantics did not help
Bheemasen M Kulkarni, October 31, 2006 - 11:04 pm UTC
Hi Tom,
Greetings
I set nls_length_semantics=char and then imported the export file, but I am still getting errors:
Column 9 987309
Column 10 1
Column 11 03-MAR-2004:14:42:17
Column 12 03-MAR-2004:14:42:17
Column 13 956
Column 14 956
Column 15 Y
IMP-00019: row rejected due to ORACLE error 12899
IMP-00003: ORACLE error 12899 encountered
ORA-12899: value too large for column "PLD"."ATTENDEE_CREDIT_CARD"."ENCRYPTED_CREDIT_CARD_NBR" (actual: 25, maximum: 24)
Column 1 502174001.
Now what do I need to do? Should I run csscan to work out which table columns need a change and go through that tedious process?
November 01, 2006 - 8:40 am UTC
oh, that column name scares me - encrypted credit card nbr.
is it encrypted? are you using RAW for that? If not, that is a huge problem that you will need to fix!!
RAW columns
Kullkarni, November 06, 2006 - 1:50 am UTC
Hi Tom,
Yes, we use the RAW data type for that column. Moreover, the import is taking a lot of time: although the export completed within 3.5 hours, the import is taking more than 2 days!!! How can I improve it?
If I use nls_length_semantics=char, is there no need to run csscan to identify the columns whose width needs to be changed?
November 06, 2006 - 9:53 am UTC
raw has nothing at all to do with nls_length_semantics
raw is raw, it is just BYTES, nothing more.
you have allocated 24 bytes - is that sufficient for the data you extracted?
setting nls_length_semantics
Kulkarni, November 08, 2006 - 7:59 am UTC
Hi Tom ,
Greetings
"you have allocated 24 bytes, is that sufficient for the data you extracted. "
I did not follow this. Setting nls_length_semantics=char will allow 24 bytes?
Will you please explain this.
Could you suggest how I can speed up import.Import is taking days!!. I have around 8GB export file(multiple files). Export finished within 3 hrs!!!
November 08, 2006 - 8:34 am UTC
you have a raw
raw is raw
raw specifies bytes.
raw will never look at nls_length_semantics
because raw is raw
so you have a raw(24) field - is that what you should have? did the source system have a raw(24) field? (Unlikely, since RAW IS RAW AND IS JUST BYTES, not characters; no character set conversion happens here, so data that fit in the source would still fit in a raw(24) on import.)
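For illustration, a minimal sketch of the point that RAW ignores length semantics (the table name is hypothetical; the second insert is expected to fail with ORA-12899 just as the import did):
alter session set nls_length_semantics = char;
create table raw_demo ( encrypted_nbr raw(24) );
insert into raw_demo values ( hextoraw( rpad( 'AB', 48, 'AB' ) ) );  -- 24 bytes: fits
insert into raw_demo values ( hextoraw( rpad( 'AB', 50, 'AB' ) ) );  -- 25 bytes: too large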
example
Leonardo H, November 08, 2006 - 11:17 am UTC
Hi Tom, in 9i I create this table:
create table t
( x varchar2( 20 CHAR ),
...
)
/
You say that "x varchar2(20) sets aside 20 BYTES, not 20
characters"... Please give me a practical example of the difference between them.
Thanks!
November 08, 2006 - 7:59 pm UTC
<quote src="Expert Oracle Database Architecture">
Bytes or Characters
The VARCHAR2 and CHAR types support two methods of specifying lengths:
* In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set.
* In characters: VARCHAR2(10 char). This will support up to 10 characters of data, which could be as much as 40 bytes of information.
When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition; that is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data. You may also use the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level; rather, use it as part of an ALTER SESSION setting in your database schema installation scripts. Any application that requires a database to have a specific set of NLS settings makes for an unfriendly application. Such applications generally cannot be installed into a database with other applications that do not desire these settings, but rely on the defaults to be in place.
One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4,000. However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4,000 characters into that field. In fact, you may be able to fit as few as 1,000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set!
The following small example demonstrates the differences between BYTE and CHAR and how the upper bounds come into play. We'll create a table with three columns, the first two of which will be 1 byte and one character, respectively, with the last column being 4,000 characters. Notice that we're performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a variable-length fashion using from 1 to 4 bytes for each character:
ops$tkyte@O10GUTF> select *
2 from nls_database_parameters
3 where parameter = 'NLS_CHARACTERSET';
PARAMETER VALUE
------------------------------ --------------------
NLS_CHARACTERSET AL32UTF8
ops$tkyte@O10GUTF> create table t
2 ( a varchar2(1),
3 b varchar2(1 char),
4 c varchar2(4000 char)
5 )
6 /
Table created.
Now, if we try to insert into our table a single character that is 2 bytes long in UTF, we observe the following:
ops$tkyte@O10GUTF> insert into t (a) values (unistr('\00d6'));
insert into t (a) values (unistr('\00d6'))
*
ERROR at line 1:
ORA-12899: value too large for column "OPS$TKYTE"."T"."A"
(actual: 2, maximum: 1)
This example demonstrates two things:
* VARCHAR2(1) is in bytes, not characters. We have a single Unicode character, but it won't fit into a single byte.
* As you migrate an application from a single-byte fixed-width character set to a multibyte character set, you might find that the text that used to fit into your fields no longer does.
The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit in a VARCHAR2(20). However, a 20-character field could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider modifying your DDL to be VARCHAR2(20 CHAR) or using the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running your DDL to create your tables.
If we insert that single character into a field set up to hold a single character, we will observe the following:
ops$tkyte@O10GUTF> insert into t (b) values (unistr('\00d6'));
1 row created.
ops$tkyte@O10GUTF> select length(b), lengthb(b), dump(b) dump from t;
LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
1 2 Typ=1 Len=2: 195,150
That INSERT succeeded, and we can see that the LENGTH of the inserted data is one character; all of the character string functions work character-wise. So the length of the field is one character, but the LENGTHB (length in bytes) function shows it takes 2 bytes of storage, and the DUMP function shows us exactly what those bytes are. So, that example demonstrates one very common issue people encounter when using multibyte character sets, namely that a VARCHAR2(N) doesn't necessarily hold N characters, but rather N bytes.
The next issue people confront frequently is that the maximum length in bytes of a VARCHAR2 is 4,000, and in a CHAR it is 2,000:
ops$tkyte@O10GUTF> declare
2 l_data varchar2(4000 char);
3 l_ch varchar2(1 char) := unistr( '\00d6' );
4 begin
5 l_data := rpad( l_ch, 4000, l_ch );
6 insert into t ( c ) values ( l_data );
7 end;
8 /
declare
*
ERROR at line 1:
ORA-01461: can bind a LONG value only for insert into a LONG column
ORA-06512: at line 6
That shows that a 4,000-character string that is really 8,000 bytes long cannot be stored permanently in a VARCHAR2(4000 CHAR) field. It fits in the PL/SQL variable because in PL/SQL a VARCHAR2 is allowed to be up to 32KB in size. However, when it is stored in a table, the hard limit is 4,000 bytes. We can store 2,000 of these characters successfully:
ops$tkyte@O10GUTF> declare
2 l_data varchar2(4000 char);
3 l_ch varchar2(1 char) := unistr( '\00d6' );
4 begin
5 l_data := rpad( l_ch, 2000, l_ch );
6 insert into t ( c ) values ( l_data );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte@O10GUTF> select length( c ), lengthb( c )
2 from t
3 where c is not null;
LENGTH(C) LENGTHB(C)
---------- ----------
2000 4000
And as you can see, they consume 4,000 bytes of storage.
</quote>
CHAR(3) vs VARCHAR2(3)
Alex, November 08, 2006 - 11:56 am UTC
Hi Tom
How do CHAR(2) and VARCHAR2(2) compare in terms of
1 storage efficiency
2 performance
What about CHAR(1) and VARCHAR2(1) ?
I am thinking that with variable length you would need to reserve bytes to store the 'used' data length.
Thanks
November 08, 2006 - 8:02 pm UTC
why, why do you want to use a char(2)?
a char(2) is simply a varchar2(2) that is always blank padded (when not null) to the full two characters.
it'll cause problems when people use varchar2 binds - varchar2's are not blank padded and you'll "miss" data.
a char(n) is EXACTLY the same as varchar2(n) but blank padded - they BOTH use the same length byte.
char(2) v varchar2(2)
Alex, November 08, 2006 - 8:38 pm UTC
Tom
Sorry, let me clarify: I am speaking strictly of data where there are no blanks.
For example:
a 2-character US state code
or a 3-character airport code
With data like this, which is more efficient, CHAR or VARCHAR2?
November 08, 2006 - 8:50 pm UTC
neither, they would be *THE SAME*
as stated above:
...
a char(n) is EXACTLY the same as varchar2(n) but blank padded - they BOTH use
the same length byte.
.....
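For illustration, a minimal sketch showing that both types carry a length byte and that the only difference is the blank padding (the table name is hypothetical):
create table pad_demo ( c char(2), v varchar2(2) );
insert into pad_demo values ( 'A', 'A' );
select dump( c ) c_dump, dump( v ) v_dump from pad_demo;
-- c is blank padded to 'A ' and stored as 2 bytes; v stays 'A' and is stored as 1 byte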
ora-12899
Gautham, July 29, 2010 - 4:44 am UTC
Hi Tom,
We have been experiencing this situation for many days while importing data from Oracle 9i (Solaris OS) to 10g (AIX OS).
The source database has the following nls parameters:
PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 9.2.0.4.0
Target database has the following nls parameters:
PARAMETER VALUE
------------------------------ -----------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET AL32UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
*******************************************************
While importing the data we are getting the error
ORA-12899: value too large for column "CHN"."BFMASTER"."PMID"
We realised that the character sets in the source and target databases are different. We couldn't change the character set of either database at the database level, so we changed it at the session level, i.e.:
$ export NLS_CHARACTERSET=AL32UTF8
and exported a sample table and checked it by importing it into the target database.
But still we got the same error.
Later we also set NLS_LENGTH_SEMANTICS=CHAR at the session level in the target database, and we are still getting the same error.
We have tried every possible solution and are not able to rectify it. You are our last hope, Tom. Please help us.
July 29, 2010 - 9:47 am UTC
... NLS_LENGTH_SEMANTICS=CHAR ...
that only counts during table creation, not during insert.
the problem is you have a bit of text in your single byte database (the source) that is N characters long. In the single byte database it always takes N bytes to store N characters.
In the multi-byte database, it could take up to 4*N bytes to store that same data. If you used byte semantics when you created the table, your varchar2(M) column is M bytes - not characters. If you try to store a string in there that takes more than M bytes - you will fail.
That is what you are seeing here, you have some data that takes N bytes in the single byte database but N+M bytes in the multi-byte database and the column is too small.
You will have to recreate your tables using CHAR semantics OR live with the fact that some of your old data just won't fit.
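For illustration, a minimal sketch of the "recreate with CHAR semantics" option (the table, column, and length are hypothetical): either alter the affected columns in the target before importing the data, or pre-create all the tables in a session using CHAR semantics and import with ignore=y.
alter table t modify ( x varchar2(20 char) );
-- or, when pre-creating every table from the DDL:
alter session set nls_length_semantics = char;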
value too large for column
Gautham, July 29, 2010 - 9:59 am UTC
Hi Tom,
Continuation of above question
We have also tried setting NLS_CHARACTERSET to WE8ISO8859P1 (the source character set) in the target database at the session level before importing.
As usual, it didn't work.
July 29, 2010 - 12:06 pm UTC
the DATABASE has a characterset, all data in that database will be stored in that characterset.
You have a string that in UTF takes more bytes to store than you have allocated.
You have a string say "abc" that takes 3 bytes in a single byte character set and hence fits in a varchar2(3), but takes maybe 4 bytes in UTF8 (which is what it will be stored in - the DATABASE character set) and it won't fit into a varchar2(3) in that database.
UTF8
A reader, July 29, 2010 - 3:28 pm UTC
They need to run CSSCAN and read the report generated.
It tells you exactly which columns will not fit in the new database character set and need to be expanded.
August 02, 2010 - 7:47 am UTC
good point, I should have mentioned that in addition to telling them "why" it is happening, thanks!
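For illustration, a sketch of a typical CSSCAN run against the source database (the connect string and parameter values are illustrative; the CSMIG schema must be created first by running csminst.sql, and the exact syntax is in the CSSCAN documentation):
$ csscan system/manager@source FULL=Y TOCHAR=AL32UTF8 LOG=utf8chk CAPTURE=Y ARRAY=100000 PROCESS=2
The generated utf8chk.txt and utf8chk.err files list the columns whose data will not fit in the new character set.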
NLS_LENGTH_SEMANTICS performance
Ramesh, January 02, 2013 - 8:44 am UTC
Hi Tom,
Our current NLS_LENGTH_SEMANTICS is BYTE with SCOPE='BOTH'. Our product has to support the UTF8 character set. We have about 1,300 VARCHAR2 columns ranging from 64 to 4000 bytes.
1. If we want to support the UTF8 character set, do we really need to set NLS_LENGTH_SEMANTICS to CHAR?
2. Will this change have an impact on performance other than with functions like substr, or is it just a matter of the database growing in size?
Thank you very much.
January 04, 2013 - 2:54 pm UTC
1) no, of course not.
but you do need to understand that your varchar2(N) definitions specify a number of BYTES. So, you might not get 64 characters into a varchar2(64) - you'll get 64 bytes which might be far fewer than 64 characters.
So, when you specify varchar2(64) - are you OK with it being 64 bytes - or do you want to have it be 64 char?
If 64 char - you can either change your definition (my preference) to include the word CHAR - varchar2(64 char) - or you can set this parameter.
2) this parameter has no effect on performance - but using a multibyte characterset could make string operations take a little longer (we cannot assume the 5th character is the 5th byte - we have to process the data differently) and consume more storage.
remember also that 4000 bytes is the limit on varchar2 length; there will be 4000-character strings that you simply cannot store in 4000 bytes, no matter what.
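For illustration, a quick way to see which of those ~1,300 columns were created with BYTE vs. CHAR semantics, using the data dictionary:
select table_name, column_name, data_length, char_length, char_used
  from user_tab_columns
 where data_type = 'VARCHAR2'
 order by table_name, column_name;
-- CHAR_USED = 'B' means byte semantics, 'C' means character semantics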
Restricting Characters in a UTF8 Database
A Reader, December 26, 2013 - 3:36 am UTC
Version Information
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
We are building an application that will store data about companies. It will get feeds from 100+ sources. The database character set is AL32UTF8.
The user requirement is that the database should allow storing any 'Latin' and 'Arabic' characters. Looking at the Unicode specification (
http://www.unicode.org/charts/ ), the Latin and Arabic characters in Unicode fall in the following ranges:
- Basic Latin (ASCII) [0000-007F]
- Latin-1 Supplement [0080-00FF]
- Latin Extended-A [0100-017F]
- Latin Extended-B [0180-024F]
- Latin Extended-C [2C60-2C7F]
- Latin Extended-D [A720-A7FF]
- Arabic [0600-06FF]
- Arabic Supplement [0750-077F]
- Arabic Extended [08A0-08FF]
Questions I have
1. Once we get data from a source, it is first loaded into a temporary staging table. Is there an easy way to query the staging table to find out whether a specific column (e.g., Company Name) has any data that is not covered by the acceptable character ranges above (so that it can be rejected and not loaded into the master tables)?
2. Since we will be getting large volumes of such data, the check should ideally run in a reasonable amount of time.
3. We need to create a specification document for our data providers. I am wondering what we need to specify in it: will it suffice to say that the files should be encoded in UTF8 and that the characters should be within the code ranges our application accepts (specified above)?
Thanks in advance for your help.
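A minimal sketch of one possible check for question 1 (the table, column, and function names are hypothetical; the code points correspond to the BMP ranges listed above, and a per-character loop like this is simple but not the fastest option for very large volumes):
create or replace function is_latin_or_arabic( p_str in varchar2 )
return varchar2
is
    l_raw  raw(8);
    l_cp   pls_integer;
begin
    for i in 1 .. nvl( length( p_str ), 0 )
    loop
        -- AL16UTF16 encodes a BMP character in 2 bytes equal to its code point
        l_raw := utl_i18n.string_to_raw( substr( p_str, i, 1 ), 'AL16UTF16' );
        if utl_raw.length( l_raw ) > 2 then
            return 'N';   -- outside the BMP, so outside every allowed range
        end if;
        l_cp := to_number( rawtohex( l_raw ), 'XXXX' );
        if not (    l_cp between     0 and   591    -- Basic Latin .. Latin Extended-B (0000-024F)
                 or l_cp between 11360 and 11391    -- Latin Extended-C (2C60-2C7F)
                 or l_cp between 42784 and 43007    -- Latin Extended-D (A720-A7FF)
                 or l_cp between  1536 and  1791    -- Arabic (0600-06FF)
                 or l_cp between  1872 and  1919    -- Arabic Supplement (0750-077F)
                 or l_cp between  2208 and  2303 )  -- Arabic Extended (08A0-08FF)
        then
            return 'N';
        end if;
    end loop;
    return 'Y';
end;
/
-- flag staging rows to reject (staging table and column names are hypothetical)
select * from staging_companies where is_latin_or_arabic( company_name ) = 'N';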