
Question and Answer

Connor McDonald

Thanks for the question, robert.

Asked: December 28, 2000 - 11:29 pm UTC

Last updated: January 28, 2021 - 9:44 am UTC

Version: 8.1.6 and 7.3.4

Viewed 100K+ times!

You Asked

Hi TOM,

Some question on Character set and NLS.

1) What is the difference between the US7ASCII and WE8ISO8859P1 character sets, and what is NLS?

2) Is there any problem for DBs with two different character sets to interact, via
- triggers
- dblinks
- export and import across the DBs?

3) In your opinion, which is more popular or "flexible", US7ASCII or WE8ISO8859P1?

4) I have dbs with US7ASCII and WE8ISO8859P1, do I need to worry ?


Thanks in advance for your response.
robert


and Tom said...

1) Well, NLS is National Language Support and encompasses lots of things, like how to display currency, whether we use a comma or a dot to separate numbers, how the name of a day is spelled, and so on.

character sets are how we store data. US7ASCII, for example, is a 7-bit character set; the high bit is "stripped". WE8ISO8859P1 is the standard Western European 8-bit character set. It can store twice as many characters as the 7-bit one can (special characters with umlauts and such).
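
To see this concretely -- a small sketch, assuming a WE8ISO8859P1 database and a client that can enter the "ü"; the exact replacement character depends on the conversion rules in your release:

select 'münchen' as original,
       convert( 'münchen', 'US7ASCII', 'WE8ISO8859P1' ) as forced_to_7bit
  from dual;

The 8-bit "ü" has no US7ASCII equivalent, so CONVERT has to substitute something for it (typically "u" or "?") -- the same kind of mapping an 8-bit-to-7-bit export/import performs.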

2) there are no "problems", just considerations. If I export from an 8-bit database and import into a 7-bit one -- I'll undergo a character set conversion. My 7-bit database cannot store any 8-bit characters; they will be mapped into the 7-bit character set (eg: the data will be DIFFERENT. I'll lose the characters that have the high bit set; they'll become some other character based on the rules for converting 8-bit to 7-bit).

Going the other way, from 7-bit to 8-bit, won't encounter any issues, since the 7-bit US7ASCII is a subset of the 8-bit WE8ISO8859P1.

You must consider this when using multiple charactersets. The strings will be converted from one to the other and you may "lose" some characters.

3) neither is more popular or flexible. They both do exactly what they do. The 8-bit set can hold more types of characters, so unless you are doing a US-only application, 8-bit might be more appropriate. I've taken to creating all of my databases with 8-bit now, as I must support many countries and they get upset when we lose their special characters.

4) cannot answer that for you. It depends on what you put in there. You can convert US7ASCII into WE8ISO8859P1 with later releases of Oracle (not 7.3; I do not believe the functionality was implemented then). See the NLS guide, chapter 3, for details:

http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76966/ch3.htm#47136

actually, you may want to read through the entire document, as it answers your questions directly in greater detail than I can here:

http://docs.oracle.com/cd/A81042_01/DOC/server.816/a76966/toc.htm
...


Comments

US7ASCII

Brigitte, October 29, 2002 - 8:35 am UTC

Hallo Tom,
We use Oracle 8.1.7.2 and an AIX server.
We have a database with the character set US7ASCII. I know it can represent ASCII codes from 0 to 127. I wonder why the client (which runs on the AIX server) can display characters with an ASCII code > 127. I read that characters with an ASCII code > 127 were converted to '?'. So we tested it with SVRMGR.

spk15se:astadm 13> echo $NLS_LANG
AMERICAN_AMERICA.US7ASCII

SVRMGR> select * from v$NLS_PARAMETERS;
PARAMETER VALUE
----------------------------------------- ------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-YY
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET US7ASCII
NLS_SORT BINARY
NLS_NCHAR_CHARACTERSET US7ASCII

SVRMGR> select * from nls_database_parameters where parameter ='NLS_CHARACTERSET';
PARAMETER VALUE
------------------------------ ------------------------------
NLS_CHARACTERSET US7ASCII


SVRMGR> update equi set typbz = 'Ö ä € Ü' where equnr like '%400059';

SVRMGR> select typbz from equi where equnr like '%400059';
TYPBZ
--------------------
Ö ä Ü

SVRMGR> select dump (TYPBZ) from equi where equnr like '%400059';
DUMP(TYPBZ)
----------------------------------------------------
Typ=1 Len=7: 214,32,228,32,128,32,220

How is it possible to store 8-bit characters in 7 bits?


Tom Kyte
October 29, 2002 - 1:43 pm UTC

Because when the client character set = server character set, the data is treated "as if it were binary" and no conversion is done. The client is trusted to be inserting the data it says it will (we have to assume the data is in the US7ASCII character set -- if not, what character set IS it in and how could we translate it? Since we cannot -- we do not).

Good Reminder

Anthony, Cheuk Tung, LAI, October 29, 2002 - 8:33 pm UTC

Actually, this is a gray area for me, and I will pay more attention to the character set the databases are using when an import/export needs to be conducted.



Euro symbol

Michel Cadot, October 30, 2002 - 3:09 am UTC

In addition, the Euro symbol is in neither the US7ASCII nor the WE8ISO8859P1 character set.
You have to use WE8ISO8859P15, WE8MSWIN1252, UTF8...
As I noticed in your example, your code for the Euro symbol is 128, so you have to choose WE8MSWIN1252. In WE8ISO8859P15 the code is 164, and in UTF8 it is 226,130,172 (a multibyte character set).
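
To check this yourself -- a small sketch, assuming the database character set can represent the Euro at all -- CONVERT plus DUMP shows the code point in each set:

select dump( convert( '€', 'WE8MSWIN1252' ),  16 ) as win1252,
       dump( convert( '€', 'WE8ISO8859P15' ), 16 ) as p15,
       dump( convert( '€', 'UTF8' ),          16 ) as utf8
  from dual;

The expected hex codes are 80 (128), a4 (164) and e2,82,ac (226,130,172) respectively.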

Hope this helps


Stored bits

Brigitte, October 30, 2002 - 3:11 am UTC

Hallo Tom,
your answer is very interesting; I thought the 8th bit was removed in US7ASCII.

Where can I find information on how many bits are really stored for one character, depending on the character set, if no conversion is done?

Thanks
Brigitte

Tom Kyte
October 31, 2002 - 9:27 am UTC

It is ALWAYS bytes -- never bits. We store BYTES per character, not bits. We just don't edit the bytes if the client character set = server character set.
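
For example -- a quick sketch against the table from the example above: LENGTH counts characters, LENGTHB counts bytes, so comparing the two shows how many bytes each character really took:

select typbz,
       length( typbz )  as char_count,
       lengthb( typbz ) as byte_count
  from equi
 where equnr like '%400059';

In a single-byte character set the two counts are always equal; in a multibyte set such as UTF8 they diverge.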

Conversion at Client?

Sikandar Hayat Awan, October 30, 2002 - 11:30 am UTC

Hi TOM,
On the server there is the US7ASCII character set, but I need codes 128, 129, 130 and 131 at the client. Is there any way, without converting the character set on the server from US7ASCII to WE8ISO8859P1, that I can get those results at the client?



Tom Kyte
October 31, 2002 - 8:16 pm UTC

You have a very square peg and a totally round hole. All a hammer will do here is splinter the square peg and splinters hurt lots.

No joy for you here -- the database character set just won't support it -- and even if you use a US7ASCII client -- you'll just be heading for certain disaster, as someone will eventually use WE8ISO8859P1 or something else -- stripping the data (that, and Windows clients default the character set to something OTHER than US7ASCII)...

You need a database that supports the character set you wish to use.

Question regarding Nls_date_format

Ash, October 31, 2002 - 6:39 pm UTC

Hi Tom,
I am not sure whether I should ask this question here or should raise it separately. If so, please feel free to ignore it.

How come NLS_DATE_FORMAT appears to be different in Unix from that in Windows?
I did a simple test: log in through sqlplus in both environments & "select sysdate from dual;"
The Unix output is in DD-MON-YYYY format while in Windows it is in
DD-MON-RR format.
Also, the values for NLS_DATE_FORMAT in "V$NLS_PARAMETERS" are different.
Shouldn't they be the same at the database level unless changed using "alter session.."?
Thanks for your wonderful, informative site.
Regards,
Ash

Tom Kyte
October 31, 2002 - 9:35 pm UTC

see
http://asktom.oracle.com/Misc/NLSDateFormat.html

Here is my guess:

a) the init.ora has dd-mon-yyyy or dd-mon-rrrr as the nls_date_format
b) the unix client has a US7ASCII default character set (by default)
c) the database is US7ASCII
d) the windows client is NOT us7ascii (by default they are NOT)
e) the windows client is using the default default NLS_DATE_FORMAT of dd-mon-rr

that's why....
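
For example -- a quick check you can run from both the Unix and the Windows client to confirm this:

select value
  from nls_session_parameters
 where parameter = 'NLS_DATE_FORMAT';

If the Unix session reports DD-MON-YYYY and the Windows session reports DD-MON-RR, the Windows client is indeed falling back to the "default default".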

Character sets are same

Ash, October 31, 2002 - 9:48 pm UTC

Thanks, Tom, for the immediate response.
Below is the output from v$NLS_PARAMETERS from both environments.

UNIX
---
PARAMETER VALUE
------------------------------ -----------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-YYYY
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET US7ASCII
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET US7ASCII
NLS_COMP BINARY

17 rows selected.


Windows NT
----------
PARAMETER VALUE
------------------------------ --------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET US7ASCII
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET US7ASCII
NLS_COMP BINARY

17 rows selected.

Also, I have checked this at another organisation through a friend. They see the same behaviour.
This seems to show that if we use the default settings, we get this differing behaviour.
Regards,
Ash

Tom Kyte
November 01, 2002 - 6:31 am UTC

You are looking in the wrong place.  Here is what I believe to be true:

a) your INIT.ORA on the server has DD-MON-YYYY as the nls_date_format <b>this is what you need to check</b>

b) your UNIX clients have the same character set as the database - hence they take the database default which was over-ridden by your init.ora

c) your WINDOWS clients do NOT have the same character set -- they are setting the NLS_LANG in the registry to something other then US7ASCII.  Hence -- they are NOT getting the default NLS_LANG from the database -- they are using the "default default" NLS_LANG (as described in that link i sent you before) 


All you've shown is that -- yes, the database's character set is US7ASCII:

<b>
[tkyte@dhcp-reston-gen-3-east-123-45 tkyte]$ export NLS_LANG=AMERICAN_AMERICA.US7ASCII
[tkyte@dhcp-reston-gen-3-east-123-45 tkyte]$ echo $NLS_LANG
AMERICAN_AMERICA.US7ASCII
[tkyte@dhcp-reston-gen-3-east-123-45 tkyte]$ plus
</b>

SQL*Plus: Release 9.2.0.1.0 - Production on Fri Nov 1 06:36:12 2002

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production

ops$tkyte@ORA920.US.ORACLE.COM> select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,<b>
NLS_CHARACTERSET               WE8ISO8859P1</b>
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
......


ops$tkyte@ORA920.US.ORACLE.COM> select * from v$nls_parameters;

PARAMETER                                                        VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_LANGUAGE                                                     AMERICAN
NLS_TERRITORY                                                    AMERICA
NLS_CURRENCY                                                     $
NLS_ISO_CURRENCY                                                 AMERICA
NLS_NUMERIC_CHARACTERS                                           .,
NLS_CALENDAR                                                     GREGORIAN
NLS_DATE_FORMAT                                                  DD-MON-RR
NLS_DATE_LANGUAGE                                                AMERICAN<b>
NLS_CHARACTERSET                                                 WE8ISO8859P1</b>
.....

that is NOT the client's NLS_LANG in there, that's the database's.


It is as I described previously. 

Help with Character Set

Abhijit, December 11, 2002 - 1:14 am UTC

Hi Tom,
I was earlier able to run a function, but now it gives me an error stating

select patindex('a',loginid) from logintable
*
ERROR at line 1:
ORA-06553: PLS-561: character set mismatch on value for parameter 'S2'

However when I change the function to use a convert like this

select patindex('a',convert(loginid,'US7ASCII')) from logintable

PATINDEX('A',CONVERT(LOGINID,'US7ASCII'))
-----------------------------------------
4
4
0
0
4
1
0



It returns me the correct output.

I was wondering how I can reset this situation.
Is it possible to alter the character set of a database?
I am no expert at Oracle databases, and I was wondering what I should do to correct
this situation. I don't want to change the patindex function call,
because that would mean changes to a number of stored procedures in the database.

These are my current parameters.

SELECT * FROM V$NLS_PARAMETERS;

PARAMETER VALUE
------------------------------ -------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET US7ASCII
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------------ -------------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM
TZH:TZM

NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET JA16SJISFIXED
NLS_COMP BINARY


What should I do to correct the above situation?

Cheers!
Abhijit


Tom Kyte
December 11, 2002 - 6:47 am UTC

patindex? that's a Sybase function -- I'm not familiar with it in an Oracle context.

maybe if you tell me:

o what patindex is
o what the nls_lang of the client is
o what the table is, how it was defined

I might be able to look at this. You should work on figuring out what changed in your environment to break it.

More info

Abhijit, December 12, 2002 - 5:56 am UTC

Hi Tom,
it's me again. Patindex is a function which we wrote to simulate the Sybase function. Basically, it returns the location of a character inside a string. It has two input parameters.
For example:
1) Select PATINDEX('Y','THOMAS KYTE') From DUAL will return 9.

I am sure there are Oracle functions which would return the same, but we prefer to use PATINDEX as a matter of legacy.
Here is the code for patindex.


CREATE OR REPLACE FUNCTION PATINDEX(
    S1 VARCHAR2,
    S2 VARCHAR2 )
RETURN INTEGER
AS
    PPOS     INTEGER;
    OPOS     INTEGER := 0;
    TYPE STABTYP IS TABLE OF VARCHAR2(100) INDEX BY BINARY_INTEGER;
    STAB     STABTYP;
    LOOPC    INTEGER;
    STARTPOS INTEGER;
    CURPOS   INTEGER;
    SSTRING  VARCHAR2(100);
BEGIN
    -- split S1 on '%' and collect the non-empty segments
    PPOS := INSTR(S1,'%');
    WHILE PPOS != 0 LOOP
        SSTRING := SUBSTR(S1,OPOS+1,PPOS-OPOS-1);
        IF SSTRING IS NOT NULL THEN
            STAB(STAB.COUNT+1) := SSTRING;
        END IF;
        OPOS := PPOS;
        PPOS := INSTR(S1,'%',OPOS+1);
    END LOOP;
    SSTRING := SUBSTR(S1,OPOS+1);
    IF SSTRING IS NOT NULL THEN
        STAB(STAB.COUNT+1) := SSTRING;
    END IF;
    -- locate the first segment, then verify the rest follow in order
    STARTPOS := INSTR(S2,STAB(1));
    IF STARTPOS > 0 THEN
        CURPOS := STARTPOS + 1;
        FOR LOOPC IN 2..STAB.COUNT LOOP
            CURPOS := INSTR(S2,STAB(LOOPC),CURPOS);
            IF CURPOS = 0 THEN
                STARTPOS := 0;
                EXIT;
            END IF;
            CURPOS := CURPOS + 1;
        END LOOP;
    END IF;
    RETURN STARTPOS;
END PATINDEX;

Looking at this patindex function, I personally think that an INSTR would achieve the same result. What is your opinion?

My problem is actually with the NLS settings. I am using Oracle 8i on Windows NT Server.

Funnily enough, when I restarted the server, the settings went back to normal.

PARAMETER VALUE
---------------------------------------------------------------- ----------------------------------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET WE8ISO8859P1
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET WE8ISO8859P1
NLS_COMP BINARY

I want to know how they got altered in the first place, and how do I set them back?

Cheers!
Abhijit




Tom Kyte
December 12, 2002 - 9:55 am UTC

that is a truly horrid idea.  What you are basically saying is:

"hey, we bought Oracle but are so hung up on this sybase stuff we'll just make Oracle run really slow"


the function you are looking for is instr.  


ops$tkyte@ORA920> select instr( 'thomas kyte', 'y' ) from dual
  2  /

INSTR('THOMASKYTE','Y')
-----------------------
                      9


That is my opinion - that you use native database functionality and create plsql routines ONLY WHEN THERE IS NO FUNCTION built already.



Use it. 

Session level NLS settings

KU, December 12, 2002 - 1:16 pm UTC

I would like to know if it is possible to check the client side NLS_LANG settings in any of v$ or nls_ views once connected to the database.

I would also like to know if there is any way to force a client to use a different NLS_LANG session-level setting through an application in Java/COBOL/PLSQL.
(I think in C getenv and setenv might help.)

(e.g: if database NLS_LANGUAGE is WE8ISO8859P1 and client has it set to WE8DEC, I want to force use of WE8ISO8859P1 to avoid characterset conversion and loss of data.)

Thanks for taking time to answer this question.

KU


Tom Kyte
December 12, 2002 - 1:37 pm UTC

the client application would have to check this. Java is dicey, as Java is all about UCS and is going to override you anyway in the end.

It is not "loss of data", it is converted data.

there is no way to "force" a client that I know of -- it is the client that picks this based on what they want. only the client knows their character set as well -- it is not available "in the database"

nls_Date_format

mo, December 12, 2002 - 2:09 pm UTC

Tom:

what is the difference between nls_database_parameters and
V$nls_parameters?

How do you change any parameter in a client session?

Thanks,

Tom Kyte
December 12, 2002 - 2:26 pm UTC

fortunately, we document these:

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96536/ch3119.htm#1095949
http://docs.oracle.com/docs/cd/B10501_01/server.920/a96536/ch2424.htm#1175134

one is the database's "permanent" settings, the other is the current settings.

alter session.....
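
for example, a minimal sketch:

alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';

select sysdate from dual;   -- now displayed in the new format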



Do session settings like this apply to work done by a database job ?

A Reader, February 11, 2003 - 4:58 am UTC

Hi Tom,

I need to change the date format behaviour of a PL/SQL packaged procedure which gets executed by a job. Can you tell me, if I do the 'ALTER SESSION SET NLS_' etc. for date formats, will it then affect all work done by the server process (the SNP process) responsible for executing the job after that? Or is each job executed in a new session? I tried to find info on how sessions are handled for dbms_job but failed miserably.

I am thinking the alternative would be to select the current settings from v$nls_parameter and store them, re-setting them at the end of the job or if the job fails.

Hopefully you can tell me that it is just for the life of the job, but even then if the job executes multiple packages I'd rather ensure that this particular package doesn't affect the others.

Regards,

Paul

Tom Kyte
February 11, 2003 - 8:26 am UTC

Actually -- the NLS info is picked up by default when you submit the job -- if you query user/dba_jobs you'll see stuff like:

0 clbeck.status_mail( 'TKYTE' );
NLS_LANGUAGE='AMERICAN' NLS_TERRITORY='AMERICA' NLS_CURRENCY='$' NLS_ISO_CURRENCY='AMERICA' NLS_NUMERIC_CHARACTERS='.,' NLS_DATE_FO
RMAT='DD-MON-RR'
NLS_DATE_LANGUAGE='AMERICAN' NLS_SORT='BINARY'
0102000200000000


so you could actually do this at job submission time -- but the job queues will set the proper NLS settings for each job.
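
For example -- a sketch, where my_refresh_proc is a made-up procedure name -- the submitting session's NLS settings are captured automatically and can be inspected afterwards:

alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';

declare
    l_job binary_integer;
begin
    dbms_job.submit( l_job, 'my_refresh_proc;' );
    commit;
end;
/

select job, nls_env from user_jobs;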

Excellent.

A reader, February 11, 2003 - 8:36 am UTC

So I have choices....I can configure it in the job, OR from what you are saying it sounds like I can alter it within the procedure, and it won't affect other jobs/processes kicked-off by the same server process...i.e. it sounds like (in effect) a new session is created for each job ?

Tom Kyte
February 11, 2003 - 4:44 pm UTC

a pristine environment is set up for each job, yes.

need more info

Sandra Chen, March 06, 2003 - 12:03 pm UTC

Hi Tom:
We use HEAT as a HELPDESK application.
We set the character set to 'TH8TISASCII' in the database (Oracle 8.1.7) based on the HEAT application's requirements.
This time they need to upgrade the HEAT system and have requested that we change the character set to 'WE8ISO8859P1'.

But WE8ISO8859P1 is not a superset of TH8TISASCII,
so we got an Oracle error when we tried to change it.

As we go through the documentation, it seems the only way is to recreate the database or use import/export.

1. For using import/export... does that mean the whole procedure is: do a system-level export, drop the system user, change the character set, then do a system-level import?

If this is right, either recreating the database or import/export would be a pain for us.

2. Is there any possibility to work around this in Oracle 9i later on, if we still need to do things like this for the front-end application?

Thanks
Sandra

Tom Kyte
March 06, 2003 - 3:41 pm UTC

1) I would just export/import the SCHEMAs I needed. I don't like to do full database exports.

Just create a new database.
Export the schemas you want.

you'll need to move public synonyms and users "by hand", but that is a GREAT way to clean up the clutter.

2) if the app says "i need this character set" and you cannot change to that characterset, no -- they are forcing your hand here.



How about using CSSCAN (Character Set Scanner Utility)

Logan Palanisamy, March 06, 2003 - 7:02 pm UTC

Tom,

How about using the Character Set Scanner utility (CSSCAN), identifying the problem data, fixing it, and then issuing the 'alter database character set ....' command?

Chapter 10 of the Globalization Support Guide talks about this utility.

http://download-west.oracle.com/docs/cd/A91202_01/901_doc/server.901/a90236.pdf

Store special characters in 8.1.7 Database

S.M. Sundaram, June 12, 2003 - 10:48 am UTC

How do we store and retrieve characters like α (alpha), Γ (gamma), š (S with reverse cap)?

The init.ora parameters are
NLS_CHARACTERSET,WE8ISO8859P15
NLS_NCHAR_CHARACTERSET,WE8ISO8859P15

For alpha, when I retrieve the data it is displayed as "?".

Tom Kyte
June 12, 2003 - 11:19 am UTC

but what are the CLIENT's NLS settings -- that is just as relevant.

My guess - client was different and character set conversion took place, changing the data.
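
A useful diagnostic here -- a sketch, where your_table/your_column are placeholder names -- is DUMP with format 1016, which shows the stored bytes in hex along with the character set name, independent of any client-side conversion:

select dump( your_column, 1016 )
  from your_table;

If the stored bytes are already 63 (a "?"), the data was changed on the way in; if the real code points are there, the problem is only in how the client displays them.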

still not getting α ß Γ

umesh, June 18, 2003 - 5:16 am UTC


α, ß, Γ, π, Σ, σ, µ in database and applications

I have an application in HTML which has all HTML components, i.e. text boxes (input type = text). I store data in the database which has alpha (α), beta (ß) and other such special characters; when I retrieve it, it appears as "a" with 2 dots on it.
My nls_character_set is american_america.WE8ISO8859P1;
the nls_lang parameter of the client machine on Windows 2000 is also american_america.WE8ISO8859P1.

Additionally, when I display those records in line-mode sqlplus it works fine, but if I try displaying the same in sqlplusw it does not give me a proper display of the special characters, even after setting the client settings with nls_lang in the registry.
Please advise.


About NLS Sort

David, June 20, 2003 - 10:03 am UTC

Hi Tom,

I'm working on a project for a school in Montreal, Quebec. Their character set is WE8ISO8859P1, but most of the data is French. Saving the French data is OK. But how can I sort the data in French character order? For example, é goes between d and f. I've tried to use

select City from City order by nlssort(city,'NLS_SORT=FRENCH')

But I got the following result:

...
Saint-Dunstan-du-Lac-Beauport
Sainte-Adèle
...
Sainte-Croix
Saint-Edmond
Sainte-Dominique
Saint-Édouard-de-Fabre
Sainte-Edwidge-de-Clifton
...

It seems like when Oracle does a French NLSSORT, special characters such as -/_... are ignored.

Can you help me on this? Any help will be appreciated.

Thanks.

David


Tom Kyte
June 20, 2003 - 5:26 pm UTC

sorry, that is the way nls sorting is designed to work, those "-"'s just don't "exist"

I do not know of any method to alter this behaviour.

Followup NLS Sort

David, June 24, 2003 - 10:44 am UTC

Hi Tom,

Thank you for your response. Do you think changing the character set and using a normal sort will solve this problem?

David

Tom Kyte
June 25, 2003 - 9:07 am UTC

not sure, I don't work with different characterset myself

What's wrong ?

Ashiq Shamsudeen, June 25, 2003 - 9:57 am UTC

Hi Tom,

I have 2 databases: one is Oracle 9.2.0 on a Sun box (Solaris 8) and the other is Oracle 8.1.7 on a Windows 2000 server.
nls_characterset and nls_nchar_characterset are set to UTF8 for Oracle9i and WE8ISO8859P1 for Oracle8i.

I did an export from Oracle9i and imported it into Oracle8i.

I exported the scott schema from a client machine (client version 8.1.7 on W2K Professional); before the export I changed the nls_lang parameter to UTF8. As expected, the export went through without any error.

see the log file...

Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
Export done in UTF8 character set and UTF8 NCHAR character set
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user SCOTT
. exporting object type definitions for user SCOTT
About to export SCOTT's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export SCOTT's tables via Conventional Path ...
. . exporting table BONUS 0 rows exported
. . exporting table DEPT 4 rows exported
. . exporting table EMP 14 rows exported
. . exporting table SALGRADE 5 rows exported
.
.
.


But when I did the import from the client, I saw messages like the ones below...

import done in UTF8 character set and WE8ISO8859P1 NCHAR character set
import server uses WE8ISO8859P1 character set (possible charset conversion)
export server uses UTF8 NCHAR character set (possible ncharset conversion)
.
.
.

How do I get rid of these messages?

Tom Kyte
June 25, 2003 - 7:06 pm UTC

short of converting the we8iso database to utf8?

You can "change" the message, but you won't get rid of it unless both source and target are the same.

What it says is "true", there will be a character set conversion -- some data may change. That is a fact of life when using multiple charactersets -- you can put stuff in utf8 that just has NO corresponding character in we8iso...

A reader, July 18, 2003 - 9:45 am UTC

Hi Tom,

If I have set NLS_SORT in the database as BINARY and the Oracle client machine has no NLS_SORT entry in the registry (HKEY_LOCAL_MACHINE\Software\Oracle),

what kind of sort order would we get on the Oracle client machine when I run an application? (i.e. would it be BINARY or something else?)

Thanks

Tom Kyte
July 19, 2003 - 10:59 am UTC

the default is based on a function of the NLS_LANG -- generally binary

Database characterset

Naveen, September 19, 2003 - 7:45 am UTC

Hi Tom,
Our database character set is WE8ISO8859P1 and the national character set is AL16UTF16. For the application we are developing, I need to store some Japanese and Chinese characters. Could you please throw some light on how I should be doing this?

Thank you
Nav.

Tom Kyte
September 20, 2003 - 5:26 pm UTC

you would put them into an NVARCHAR field, as UTF can hold them whereas we8iso8859p1 cannot.

have you read through the globalization guide?
http://docs.oracle.com/docs/cd/B10501_01/nav/docindex.htm#index-GLO
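
A minimal sketch -- the table name is made up, and it assumes the client's NLS_LANG lets it send the Japanese characters intact:

create table product_names
( id         number,
  local_name nvarchar2(100)   -- stored in the national character set, AL16UTF16
);

insert into product_names values ( 1, n'こんにちは' );

The N'...' literal marks the string as national-character-set data.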



Database characterset

Naveen, September 19, 2003 - 7:47 am UTC

Hi Tom,
I forgot to include the database version. It is 9.2.0.1.0.

Thank you
Nav


Database Characterset

Naveen, September 22, 2003 - 12:58 am UTC

Hi Tom,

Thanks for the reply.

Nav

Loading data from a flat file in a different characterset using the UTL_FILE package

Bhavesh, November 10, 2003 - 7:02 am UTC

Hi Tom,
I am short of words to express my gratitude for the service you are providing to the vast Oracle user community. There is actually one issue regarding the database characterset which has been bugging me for a while. Let me try
to explain the scenario.
Every month I receive a flat file containing some Japanese, Korean and English characters which I need to upload into my database. Currently everything is working fine. I am using the UTL_FILE package, whereby I
i) read one line of the flat file
ii) based on fixed length break up the columns
iii) upload them in my database.

Following is an excerpt from the program :
procedure read_data_from_file(str in varchar2) is
begin
r.cod_category := substr(str, 1, 2);
r.nam__desc := substr(str, 3, 30);
r.txt_option := substr(str, 33, 15);
r.cod_type := substr(str, 48, 8);
r.cod_model := rtrim(substr(str, 56, 20));
r.pri_list := to_number(nvl(substr(str, 76, 11),0));
r.pri_purchase := to_number(nvl(substr(str, 87, 11),0));
r.num_weight := to_number(nvl(substr(str, 98, 6),0));
........
and so forth.

Later I insert these values into the database. Now, the problem is that when I use the flat file with a database having character set WE8ISO8859P1, the program works fine.
Next I created another database with the UTF8 character set and increased the column lengths as desired. Now if I try to import the same file, it starts giving me errors. Surprisingly, the values it now reads from the file are quite different from the ones above, i.e. substr(str, 48, 8) yields different values when the database is in WE8ISO8859P1 versus UTF8.
I was under the impression that when reading from a flat file with fixed-length fields, the database should be consistent in its reading.
Can you please explain what is happening here and how I can overcome this issue?
Thanking you in advance.
Regards


Bhavesh


Tom Kyte
November 10, 2003 - 8:03 am UTC

it is expecting the file to be utf encoded and it must not be.

how is the file itself encoded in the first place??? if it has those multibyte characters in it?????

Further Update

Bhavesh, December 17, 2003 - 8:57 pm UTC

Hi Tom,
I have made a variation in my flat file. Instead of a fixed-width file I now have a tab-delimited file.
As you said, "it is expecting the file to be utf encoded and it must not be".
Actually, the file was created by a legacy system on a Japanese OS with the characterset JA16SJIS. When I open the file on the Japanese OS everything looks fine. Even when I open the txt file in IE and set the Encoding to Auto-Select, it automatically selects the JA16SJIS character set and shows the data to me.

"how is the file itself encoded in the first place??? if it has those multibyte characters in it????? "
The file is a flat file created in the JA16SJIS character set, and I do not think it is encoded. My database is UTF8 and the server on which we are doing the processing runs Windows 2000 SP4 English. My goal is to store both the Japanese and English characters from this file properly in the UTF8 database.
I have gone through a note (Note:227531.1) on Metalink
which suggests converting the file using the following command and then inserting into the database.
......
line_conv := CONVERT(line_in, 'UTF8', 'JA16SJIS');
insert into utl_conv_example values(line_conv);
......
I tried to replicate the same. When I read one line of the file and insert it into the database as a single column, everything looks fine. But the file actually has various columns which are delimited by tabs. I want to read each column, store it in a local variable (within the package) and then insert all columns of one record into the database.
When I try to do this, in some records the file skips one tab, thereby making two columns into one. What I now want to know is:
A) Should I read one line as a whole, convert it into UTF and then use the instr, substr commands to find the columns? OR
B) Should I read each column delimited by tab (using instr, substr) and then use the convert function to convert the data to UTF8?
Can you please throw some light on how to accomplish this. Thanking you in advance.


Tom Kyte
December 18, 2003 - 10:49 am UTC

open it with a binary editor and you'll probably find it is in fact encoded. The fact that IE can auto-detect says "this is the case" (that it is encoded).

you can specify the file's character set right in sqlldr.


http://docs.oracle.com/docs/cd/B10501_01/server.920/a96652/ch01.htm#1011906
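
For example, a control-file sketch -- the file, table and column names here are made up -- letting sqlldr perform the JA16SJIS to UTF8 conversion during the load:

LOAD DATA
CHARACTERSET JA16SJIS
INFILE 'monthly_feed.dat'
APPEND
INTO TABLE monthly_feed
FIELDS TERMINATED BY X'09' TRAILING NULLCOLS
( col1, col2, col3 )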

Thanks a million

Bhavesh, December 21, 2003 - 8:02 pm UTC

Thanks Tom. After going through your response things are clearer, and now I think I should be able to get the desired results. Thanks once again.

Converting charactersets

Benoit Hébert, January 27, 2004 - 5:52 pm UTC

Tom,

In your initial response, you stated that it would be possible to convert a database that was created as US7ASCII to the WE8ISO8859P1. When I created one of my databases, I made the mistake of creating it as US7ASCII, instead of the other one, so the database cannot store the French accented characters that I need. I seem to remember from one of the Oracle University courses that I attended that it isn't possible to change the database nls_characterset setting once the database has been created, but your response seems to indicate otherwise. Am I mistaken? Am I going to have to re-create the database?

Thanks!

Character Set Convert Issue

Ivan, February 25, 2004 - 7:28 pm UTC

Hi Tom,

I am having issues with accents when generating search names and I was hoping for your assistance.

our database setting: NLS_CHARACTERSET WE8ISO8859P1

Passed in name: TV Globo de São Paulo Ltda

Current results: TV Globo de S?o Paulo Ltda

CORRECT results TV Globo de Sao Paulo Ltda


Have tried the following:


select name, CONVERT(name,'US7ASCII', 'WE8ISO8859P1')
from companies
where company_id = 42430690

TV Globo de São Paulo Ltda TV Globo de S?o Paulo Ltda


select name, CONVERT(name,'UTF8', 'WE8ISO8859P1')
from companies
where company_id = 42430690

TV Globo de São Paulo Ltda TV Globo de São Paulo Ltda


select name, CONVERT(name,'AL32UTF8', 'WE8ISO8859P1')
from companies
where company_id = 42430690

TV Globo de São Paulo Ltda TV Globo de São Paulo Ltda

Thanks,
Ivan




Tom Kyte
February 26, 2004 - 7:42 am UTC

why do you say "correct" results? The conversions are mandated to us by standards bodies; we follow those rules.

it is actually "correct", just not what you wanted. You'll want to use TRANSLATE or REPLACE with a custom code table of your liking to achieve your goal -- your interpretation of how they should be mapped.
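
For example -- a sketch of such a custom mapping, using the table from your post; extend the two character lists to cover whatever accented characters occur in your data. TRANSLATE works character by character, which is fine in a single-byte character set like WE8ISO8859P1:

select name,
       translate( name,
                  'ãâáàäéèêëíìîïõôóòöúùûüç',
                  'aaaaaeeeeiiiiooooouuuuc' ) as ascii_name
  from companies
 where company_id = 42430690;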

export NLS_LANG=AMERICAN_AMERICA.UTF8 is not working for 'é'

Sami, March 03, 2004 - 9:23 am UTC

Dear Tom,

We are dealing with many different languages, like Japanese, Korean, Simplified Chinese, Traditional Chinese, Spanish and French.

Everything works fine with the UTF8 characterset and NLS_LANG=AMERICAN_AMERICA.UTF8 ***EXCEPT*** 'é'.

DB Version: Oracle 8.1.7.3
OS: Sun Solaris

Below example I tried on DB server itself.

SQL> drop table fr_char_test;
 
Table dropped.
 
SQL> create table fr_char_test(c1 varchar2(100));
insert into fr_char_test values ('é');
insert into fr_char_test values ('éé');
insert into fr_char_test values ('ééé');
Table created.
 
SQL> 
1 row created.
 
SQL> 
1 row created.
 
SQL> 
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> select dump(c1) from fr_char_test;
 
DUMP(C1)
--------------------------------------------------------------------------------
Typ=1 Len=1: 105
Typ=1 Len=2: 105,105
Typ=1 Len=3: 105,105,105
 
SQL> select * from fr_char_test;
 
C1
--------------------------------------------------------------------------------
i
ii
iii
 
SQL> !
 
njd05dsmdbc01 myuser mydb /opt/home/usssd429
$ echo $NLS_LANG
 
 
njd05dsmdbc01 myuser mydb /opt/home/usssd429
$ exit
 
SQL> select * from v$nls_parameters where PARAMETER like '%CHARACTERSET%';
 
PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
NLS_CHARACTERSET
UTF8
 
NLS_NCHAR_CHARACTERSET
UTF8
 
 
SQL> 
 

Tom Kyte
March 03, 2004 - 10:15 am UTC

looks like the nls_lang isn't even set? it would default to us7ascii

From the SQL prompt, the "!" (host) command is not displaying the UNIX ENV variable

Sami, March 03, 2004 - 11:11 am UTC

Dear Tom,

It looks like when I shell out from the SQL prompt using "!", the UNIX ENV variable is not displayed, although it is set in .profile.

Look at this

SQL> !
 
njd05dsmdbc01 myuser mydb /opt/home/myuser
$ echo $NLS_LANG
 
 
njd05dsmdbc01 myuser mydb /opt/home/myuser
$ exit
 
SQL> exit
Disconnected from Oracle8i Enterprise Edition Release 8.1.7.3.0 - Production
With the Partitioning and Parallel Server options
JServer Release 8.1.7.3.0 - Production
 
njd05dsmdbc01 myuser mydb /opt/home/myuser
$ echo $NLS_LANG
AMERICAN_AMERICA.UTF8
 

Tom Kyte
March 03, 2004 - 3:22 pm UTC

not sure, I don't work with characters like that -- I'm not sure if the behaviour is correct -- or not. I'll have to suggest you contact support for this particular issue (unless someone out there who works with this sort of data knows...)

A reader, March 16, 2004 - 2:15 am UTC

Tom,

I got this from one of my friends as a way to change the character set of a db. I have seen some lengthy documents on Metalink and even on your site.

But I was surprised to see this one. Please let me know if this is a valid method of converting a db character set from the default characterset to UTF8. Version: 9.2.0


Run the following queries to set the Character Set value to 'UTF-8':


Update sys.props$ set value$='UTF8' where name='NLS_NCHAR_CHARACTERSET';
Update sys.props$ set value$='UTF8' where name='NLS_CHARACTERSET';
commit;

Thanks a lot


Tom Kyte
March 16, 2004 - 7:47 am UTC

bzzt -- your friend is no friend, not if they tell you to do this.


do NOT do this.

if support tells you to update a table owned by sys -- great, fine, go ahead.

If anyone else tells you to update a table owned by sys -- look at them funny and walk away.

How about rename database?

A reader, March 16, 2004 - 8:00 am UTC

I find that when I change the database name, I need to change one value in sys.props$ so that database links work for the renamed database. Is that OK?

David

Tom Kyte
March 16, 2004 - 9:31 am UTC

Unless support told you to do that -- you did absolutely the wrong thing.


you probably updated the global_db_name, right? well, the *right*, proper, supported, supportable way to do that would be the alter database command.

Thanks Tom

A reader, March 16, 2004 - 8:12 am UTC


I was surprised and so was the question ... Thanks again

alter database rename database...

A reader, March 16, 2004 - 10:17 am UTC

Tom,

Yes, I updated GLOBAL_DB_NAME to change the database name for use with the database link. Before doing that, I first re-created the control file using "CREATE CONTROLFILE SET DATABASE ...", and it seemed that the database was renamed without problem, except that the database link could not be used. After changing GLOBAL_DB_NAME, everything seems OK.

David

Tom Kyte
March 16, 2004 - 10:19 am UTC

the command you were looking for was not UPDATE, but alter database again.  the correct and proper way to rename a global name -- is to rename the global name!


ops$tkyte@ORA9IR2> alter database rename <b>GLOBAL_NAME</b> to ora9ir2.us.oracle.com;
Database altered.

it is that easy (and safe to boot!)
 

Work in 8i?

A reader, March 16, 2004 - 10:41 am UTC

Tom,

Does it work in 8i (by issuing alter database rename instead of updating sys.props$)?

Thanks,
David

Tom Kyte
March 16, 2004 - 1:31 pm UTC

Connected to:
Oracle8i Enterprise Edition Release 8.1.7.4.1 - Production
With the Partitioning option
JServer Release 8.1.7.4.1 - Production

tkyte@ORA8I> alter database rename global_name to foobar;

Database altered.

tkyte@ORA8I>


Yes.  In fact, here is the oldest release I have lying about:

Connected to:
Oracle7 Server Release 7.1.6.2.0 - Production Release
With the distributed, replication and parallel query options
PL/SQL Release 2.1.6.2.0 - Production
 
tkyte@ORA716> alter database rename global_name to foobar;
 
Database altered.


<b>it has never been the right way to update sys.props$ -- never</b>
 

10g and NLS

Ajay, March 19, 2004 - 4:10 am UTC

Hi Tom,
The article is very useful. We have recently installed the J2EE and Web Cache components of 10g and deployed a J2EE application on it, which will be inserting French data into the database. Our db character set is WE8ISO8859P15.
My questions are:
1. Where should I set the NLS_LANG on the application server (.bashrc or some configuration file)?
2. Do I need to set the LANG variable at the OS level?
Our operating system is SuSE Linux.
Thanks in advance

Tom Kyte
March 19, 2004 - 9:00 am UTC

1) it is an environment variable
2) NLS_LANG, yes, else it'll default to us7ascii

Solution to "About NLS Sort"

A reader, April 08, 2004 - 5:27 am UTC

To David from Toronto, ON, Canada

David, this might be helpful:

select City
from City
order by replace(city, '-', chr(1))
/

If you omit the chr(1) you might get the wrong results:
order by replace(city, '-')

SaintDunstanduLacBeauport
SaintEdmond
SainteAdèle
SainteCroix
SainteDominique
SainteEdwidgedeClifton
SaintÉdouarddeFabre -- note the E accent!

instead of
order by replace(city, '-', chr(1))

SaintDunstanduLacBeauport
SaintEdmond
SaintÉdouarddeFabre -- where it should be
SainteAdèle
SainteCroix
SainteDominique
SainteEdwidgedeClifton



doubt about nls_language

Fernando Sanchez, April 08, 2004 - 8:43 am UTC

I don't understand the following situation (I don't understand very well how the nls_lang parameters work):

I have an 8.1.7 release database on a Linux platform; these are the language parameters of both the db and the only instance:

CSR>select * from nls_database_parameters;

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS ,.
NLS_CHARACTERSET WE8ISO8859P15
...............
...............


CSR>select * from nls_instance_parameters;

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
...............
...............


And my problem is the following: I have changed my session parameters in order to be able to write and read Spanish characters (I've been told that NLS_CHARACTERSET = WE8ISO8859P15 supports them).
After doing that, things are different in my client's Windows sqlplus than in my Linux server's sqlplus; if I do the following, this is what I get:


CLIENT'S WINDOWS SQLPLUS:

CSR>select * from nls_session_parameters;

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE SPANISH
NLS_TERRITORY SPAIN

CSR>update ir_ct_sms set ds_sms_v='marketing de campaña navideña' where co_sms_n=44;

1 fila actualizada.

CSR>commit;

Validación terminada.

CSR>select ds_sms_v from ir_ct_sms where co_sms_n=44;

DS_SMS_V
--------------------------------------------------------------------------------
marketing de campaña navideña



SERVER'S LINUX SQLPLUS:

CSR>select * from nls_session_parameters;

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE SPANISH
NLS_TERRITORY SPAIN

CSR>update ir_ct_sms set ds_sms_v='marketing de campaña navideña' where co_sms_n=44;

1 fila actualizada.

CSR>commit;

Validacion terminada.

CSR>select ds_sms_v from ir_ct_sms where co_sms_n=44;

DS_SMS_V
--------------------------------------------------------------------------------
marketing de campa?a navide?a


I don't understand why I can write the character 'ñ' but I can't read it on the server.

Can you tell me what the problem is?


Tom Kyte
April 08, 2004 - 10:49 am UTC

You are mixing "language settings", which affect messages and the use of ,. in numbers, with character sets.

what is your NLS_LANG set to on the CLIENT machine? (in the registry on Windows)

WE8ISO8859P1 to UTF8

A reader, April 26, 2004 - 3:08 pm UTC

Tom,
We currently have the WE8ISO8859P1 character set on the DB. We are considering changing the character set to either UTF8 or AL32UTF8. The data stored in the current DB (char set WE8ISO8859P1) has a lot of special characters that don't make sense (dark square boxes...). We are not sure how these characters got into the DB; input into the DB is made through an EJB application, with data entered and viewed through the browser worldwide.
db version we have: 9.2.0.4

The questions I have are, when we change the char set from WE8ISO8859P1 to UTF8:
1) Can we recover the special characters that don't have any meaning currently?
2) Will the currency symbol, the format of the decimal characters, etc. change?
3) What components need attention when this char set change is made?

Thank you



Tom Kyte
April 27, 2004 - 4:11 am UTC

you will want to read:

http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/toc.htm

and run the character set scanner utility.

those dark boxes are probably going to stay "garbage".

"testing" this is a 100% *must do thing*. don't do this to your production instance, do this in test. and you should really investigate "why" you have dark boxes before doing anything.

Excellent

May, June 08, 2004 - 6:36 pm UTC

Another question:
How do I make a date display on the web in a format like 'yyyy/mm/dd hh24:mi:ss'? When I did: htp.Header(3, 'Current Timestamp : '||sysdate,'center'); it shows
Current Timestamp : 08-JUN-04

Is there any configuration I need to change to make it display as 'yyyy/mm/dd hh24:mi:ss'?

The purpose is that I want the date value to show the hour, minute, and second while I am using owa_util.tablePrint, i.e.
ignore := owa_util.tablePrint('issue_today',
cattributes=>'text="#000000" border ="1" bgcolor="#FFFFCC" width="100%" align="center"',
ccolumns=>'host, sid, dba, issue_type, issue, entry_date',
ccol_aliases=>'HostName, DatabaseName, DBA/Netbackup, Issue Type, Issue, Entry Date');

Thanks

May


Tom Kyte
June 08, 2004 - 7:08 pm UTC

to_char( sysdate, 'yyyy/mm/dd hh24:mi:ss' )


or you can issue:

alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';




execute immediate 'alter session set nls_date_format = ''yyyy/mm...'' ';

character conversion

Aslı Urguplu, June 17, 2004 - 5:00 am UTC

Hi,
We are using an Oracle 9 database. We have some primary keys written in the Turkish alphabet and want to convert them to the English alphabet, for example converting "ş" to "s".

How can we do this? Can you help us write a trigger or something useful?

thanks

Tom Kyte
June 17, 2004 - 10:07 am UTC


update using translate to translate whatever characters you want to whatever you want.

update t set x = translate( x, 'abc', 'ABC' )
where translate(x,'abc','ABC') <> x;

would turn abc into ABC for anything that had abc in it.

Mixed language in alert.log

Sergey Isenko, June 24, 2004 - 9:08 am UTC

Hello dear Tom!

I have begun to manage a database for a new customer, which is already in use.
Version 9.2.0.1, SLES 8.0.
All NLS params were German...
So I changed the environment variables like:

> env|grep LANG

LANG=en_US
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15

and restarted the instance.

Everything looked OK, all messages were in English.
But within days I ran into a very interesting situation, where most of the lines
in alert.log are in English, but some lines are in German:

Tue Jun 22 14:13:31 2004
ARC1: Evaluating archive   log 2 thread 1 sequence 1379
ARC1: Beginning to archive log 2 thread 1 sequence 1379
Creating archive destination LOG_ARCHIVE_DEST_1: '/opt/oracle/arch/1_1379.arc'
ARC1: Completed archiving  log 2 thread 1 sequence 1379
Tue Jun 22 16:21:06 2004
Errors in file /opt/oracle/OraHome1/admin/ihk/udump/ihk_ora_27656.trc:
ORA-00600: Interner Fehlercode, Argumente: [ttcgcshnd-1], [0], [],   <----- !!!
Tue Jun 22 16:22:22 2004
Errors in file /opt/oracle/OraHome1/admin/ihk/udump/ihk_ora_27681.trc:
ORA-00600: Interner Fehlercode, Argumente: [ttcgcshnd-1], [0], [], 
Tue Jun 22 16:25:29 2004
Thread 1 advanced to log sequence 1381
  Current log# 4 seq# 1381 mem# 0: /opt/oracle/OraHome1/oradata/ihk/redo04.log
  Current log# 4 seq# 1381 mem# 1: /opt/oracle/OraHome1/oradata/ihk/redo14.log

I guess it is due to database parameters:

SQL> select * from nls_database_parameters;

PARAMETER                      VALUE                                            
------------------------------ ----------------------------------------         
NLS_LANGUAGE                   GERMAN    <-------
NLS_TERRITORY                  GERMANY                                          
NLS_CURRENCY                   ?                                                
NLS_ISO_CURRENCY               GERMANY                                          
NLS_NUMERIC_CHARACTERS         ,.                                               
NLS_CHARACTERSET               WE8ISO8859P15                                    
NLS_CALENDAR                   GREGORIAN                                        
NLS_DATE_FORMAT                DD.MM.RR                                         
NLS_DATE_LANGUAGE              GERMAN                                           
NLS_SORT                       GERMAN                                           
NLS_TIME_FORMAT                HH24:MI:SSXFF                                    
NLS_TIMESTAMP_FORMAT           DD.MM.RR HH24:MI:SSXFF                           
NLS_TIME_TZ_FORMAT             HH24:MI:SSXFF TZR                                
NLS_TIMESTAMP_TZ_FORMAT        DD.MM.RR HH24:MI:SSXFF TZR                       
NLS_DUAL_CURRENCY              ?                                                
NLS_COMP                       BINARY                                           
NLS_LENGTH_SEMANTICS           BYTE                                             
NLS_NCHAR_CONV_EXCP            FALSE                                            
NLS_NCHAR_CHARACTERSET         AL16UTF16                                        
NLS_RDBMS_VERSION              9.2.0.1.0                                        

SQL> select * from nls_instance_parameters;

PARAMETER                      VALUE                                            
------------------------------ ----------------------------------------         
NLS_LANGUAGE                   AMERICAN                                         
NLS_TERRITORY                  AMERICA                                          
NLS_SORT                                                                        
NLS_DATE_LANGUAGE                                                               
NLS_DATE_FORMAT                                                                 
NLS_CURRENCY                                                                    
NLS_NUMERIC_CHARACTERS                                                          
NLS_ISO_CURRENCY                                                                
NLS_CALENDAR                                                                    
NLS_TIME_FORMAT                                                                 
NLS_TIMESTAMP_FORMAT                                                            
NLS_TIME_TZ_FORMAT                                                              
NLS_TIMESTAMP_TZ_FORMAT                                                         
NLS_DUAL_CURRENCY                                                               
NLS_COMP                                                                        
NLS_LENGTH_SEMANTICS           BYTE                                             
NLS_NCHAR_CONV_EXCP            FALSE                                            

How is it possible at all to mix languages,
and how can I correct it?

Thanks in advance. 

Tom Kyte
June 24, 2004 - 10:28 am UTC

individual applications can specify their own language.

can you determine the application that was running when this trace was created, I believe you'll find that it is setting its NLS_LANG.



Mixed language in alert.log

Sergey Isenko, June 24, 2004 - 11:07 am UTC

Thank you for such a quick response!

Yes, I know it; it was a JDBC client -- Bug 1725012 points to it.

Do I understand correctly --
can a CLIENT change the language settings for messages in the alert.log file?

Thanks in Advance.

Tom Kyte
June 24, 2004 - 2:49 pm UTC

if the clients dedicated server writes to the alert log, yes.

the dedicated server will open the alert log and write to it to record the message.


If you only have the *us.msb files, however (in $ORACLE_HOME/rdbms/mesg), things can only be in English.


Mixed language in alert.log

Sergey Isenko, June 25, 2004 - 3:24 am UTC

Many thanks; that's the solution for me.

NLS_CHARACTERSET

A reader, June 25, 2004 - 10:54 am UTC

Tom

Is it possible to change the NLS_CHARACTERSET value at the client level?

Also, if I execute the following sql statement

create table test as select * from test@dblink

where the source database has NLS_CHARACTERSET = WE8ISO8859P15
and the destination database has NLS_CHARACTERSET = UTF8, all the VARCHAR2 column sizes for table test in the destination database (NLS_CHARACTERSET = UTF8) are three times their equivalent size in the source database (NLS_CHARACTERSET = WE8ISO8859P15). Is this because of the different character sets?

Thanks

Tom Kyte
June 25, 2004 - 3:51 pm UTC

the character set describes what is stored in the database.

the client makes its desired character set known to the database via the NLS_LANG setting.

UTF8 is a multi-byte encoding scheme; a single character may take many bytes of storage, yes.
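
A related note -- a sketch, for 9i and up: if you want columns sized in characters rather than bytes after such a conversion, character-length semantics make that explicit:

create table t_bytes ( c varchar2(30) );       -- 30 bytes: as few as 10 characters in UTF8
create table t_chars ( c varchar2(10 char) );  -- room for 10 characters, whatever their byte length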

Character set SQLLDR error

Anirudh, September 06, 2004 - 11:07 am UTC

Hi Tom,
While trying to load a flat file into my database with SQLLDR I encountered the following errors:
"SQL*Loader-266: Unable to locate character set handle for WE8ISO8859P1.
SQL*Loader-704: Internal error: ulconnect: OCIInitialize [1804]"

My control file looks like
LOAD DATA
CHARACTERSET WE8ISO8859P1
truncate
INTO TABLE GLOBAL_HEPM_OWNER.TITLE
FIELDS TERMINATED BY '^_' TRAILING NULLCOLS
(MODE_TYPE ,
REGION ,
ORIGINATOR,
FULL_ISBN ,
HEPM_ID FILLER,
ACCOUNTING_CODE ,
IMPRINT_ID ,
ASSOCIATION_CLASS_CD,
TITLE_DESCRIPTION ,
EDITION_NBR ,
COPYRIGHT_YEAR ,
ACTIVE_EDITION ,
PREV_ISBN ,
NEXT_ISBN ,
PIE_INDICATOR ,
PAGES ,
TRIM_SIZE ,
PROD_FORMAT,
UPC_CODE ,
EAN ,
INCLUDE_ECATALOG TERMINATED BY '^^' ,
ISBN )


I am using direct path load for this. The SQLLDR version is 9.2.0 and the database is 9.2.0. This was working well with sqlldr version 8.1.7 but gives this error since the change to 9.2.0. The character set of my DB is WE8ISO8859P1.

Please advise.

Thanks,
Anirudh

Tom Kyte
September 06, 2004 - 11:49 am UTC

should work -- check your environment, ensure you are using the sqlldr 9.2 code, ensure your ORACLE_HOME is set properly.

there is nothing "obviously" wrong here.

CHANGE character set?

A reader, October 13, 2004 - 8:54 am UTC

Tom,

Can we change the character set once the database is up & running?

cheers!

Tom Kyte
October 13, 2004 - 9:11 am UTC

after you have been using it? yes

while it is running with users? no

see the globalization/NLS guide. oh wait, that was in the original answer :)

the link to the directions is above; if you are using a later release, read that release's document of course instead.


A reader, October 13, 2004 - 9:26 am UTC

Sorry for my sheer ignorance.
And thanks very much for your time.


Characterset conversion

PM, November 05, 2004 - 8:04 pm UTC

we have a application in V7 on AIX. our management want to change the character set because euro symoble is nt displaying. so

What are the possible methods to change the character set.

or

do u suggest that from 'alter database command'

I wud appreciate ur help.
Thanks

Tom Kyte
November 06, 2004 - 10:16 am UTC

I suggest getting onto a supported release of the database since you are making a major huge change.

Using the way back machine (reading the v7 documentation) that alter database command did not exist way back in 1994.

Way back then, to change the characterset, you would rebuild the database -- export the old data and import it.

Since that is a fairly huge change (rebuild database, change characterset) -- include an upgrade to 10g whilst you are at it.


(your keyboard is in danger of complete failure -- it is skipping vowels every now and then -- these letters are extremely important in order to convey written thoughts properly and make your text readable by others. You might need to have that looked at)

Globalization

Chris Soza, November 22, 2004 - 8:21 am UTC

Hi Tom,

I have read this forum and the Oracle Globalisation Guide as per the link you provided. However, I still don't get it and we are hitting numerous problems. We have a data warehouse database created in UTF8. We buy data from 3rd parties from various countries. Some of these datafeeds refer to the same products; however, they have different fact data columns that we are interested in. The problem we have is that these datafeeds are encoded in different character sets (UTF8, ISO8859P1, etc.)

The problem we have is that when we load the data, we can't compare two string fields. Although it's the same string. We expect the data, when loaded, to be converted into UTF8, which is the character set of our database. In our sqlldr files we specify the character set of the source data. To view the data correctly, we need to set NLS_LANG to the appropriate character set, otherwise we get ? etc. So, if the data is converted and stored in UTF8, why can't we do these comparisons?

How can we get this to work? Have you any ideas? Because we buy data from third parties, we can't specify a fixed character set for the datafeeds.

Any help, ideas would be appreciated

Regards, Chris

Tom Kyte
November 22, 2004 - 8:44 am UTC

<quote>
Although it's the same string.
</quote>

but you see - it is not. there are standard mappings of "these bytes in this characterset shall be translated to those bytes in that characterset"


Apparently, when you take character_set_one and convert it into utf8, and then take character_set_two and convert it into utf8 -- you get *different results* (by the very design of the charactersets).

the strings are converted into utf8
the strings are in utf8
the strings are in fact different.


not sure what to tell you other than you may well have to set up your own translations during the load. when dealing with input charactersets that are varied and many -- this is going to be the way it is, it is the way they work (we didn't invent all of these encoding schemes, we just implement them)


(seems funny but when I buy stuff, I normally have a say in what it is exactly I'm purchasing....)


Thanks for your input

Chris Soza, November 22, 2004 - 9:59 am UTC

Hi Tom,

Thanks for your input. Are you suggesting that we standardise on something like UTF8 for the source encoding? The problem with this is that when we use non-web-based tools, it is not possible to display some of these characters properly. Also, it is non-standard to set NLS_LANG to UTF8 or similar on PCs, which don't default to the locale of most of our users.

Global databases are a real dilemma! Unless of course reporting is split based on countries. It is virtually impossible to have a global reporting database unless all source data is encoded the same and reporting is done via the web.

From most of the documents and articles I've read, it is advised that the source data should remain encoded based on its default locale. However, what you are suggesting is the opposite.

Regards, Chris

Tom Kyte
November 22, 2004 - 2:49 pm UTC

if you want to compare the data - join the data, keeping it in its own encoding would sort of defeat your stated goal. if everything is encoded differently, that would be a problem -- you have the same "we've got to get everything into one or the other characterset" problem.

A reader

A reader, December 03, 2004 - 5:28 am UTC

Hi,
Is it possible to set a different character set at the SCHEMA level? We have a situation where we have different vendor databases installed in ONE box. Every vendor wants a different character set. I was thinking of a solution like: as soon as they get into their schema, change the character set for that schema. Is it possible, and if yes, how can we do this?

Cheers


Tom Kyte
December 03, 2004 - 8:06 am UTC

nope, the characterset is a database thing and the client picks their character set.

database has a characterset
clients pick their own characterset
conversion takes place.

Doubt in Date formats

Sujit, December 28, 2004 - 1:21 am UTC

Hi Tom, 
   I had a doubt regarding the Date formats we use. I am taking an example, 

SQL*Plus: Release 9.2.0.1.0 - Production on Tue Dec 28 11:45:03 2004

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

SQLPLUS scott/tiger@orcl.world
-------------------------------------------------------------
Connected to:
Oracle8i Enterprise Edition Release 8.1.7.4.0 - 64bit Production
With the Partitioning option
JServer Release 8.1.7.4.0 - 64bit Production

SQL> select sysdate from dual;

SYSDATE
---------
28-DEC-04

SQL> select date '2004-12-11' day from dual;

DAY
---------
11-DEC-04

SQL> select date '11-12-2004' day from dual;
select date '11-12-2004' day from dual
            *
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month


SQL> 
-------------------------------------------------------------

       Can you tell me why "select date '2004-12-11' day from dual" gives output, whereas "select date '11-12-2004' day from dual" gives an error.

 

Tom Kyte
December 28, 2004 - 10:19 am UTC

use to_date(), that functionality you are trying to use is neither documented nor supported, especially in 8i.

use to_date( 'string', 'format' );
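For what it is worth, the ANSI date literal that happened to work above is fixed at the YYYY-MM-DD format, which is why the DD-MM-YYYY form was rejected. With to_date() you state the format explicitly, for example:

SQL> select to_date( '11-12-2004', 'DD-MM-YYYY' ) d from dual;

D
---------
11-DEC-04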



PLSQL block in DBMS_JOB submission not interpreting current_timestamp and sysdate correctly

Sujit, February 11, 2005 - 7:15 am UTC

I am on the East coast of the US, so our timezone is GMT-5. SELECT sysdate FROM dual returns a date; call it 02/09/2005 1 PM. If I write an anonymous block which inserts sysdate into a table in the database, and submit that block as the "what" of a dbms_job, the actual value inserted appears to be sysdate - 5/24, or 02/09/2005 8 AM (the 5 hour difference from GMT). It's as if PL/SQL run within dbms_job does not take into account what my system time is supposed to be, and instead goes by absolute time (GMT). Additionally, and I'm sure it is related, if I submit a job with an interval of TRUNC(SYSDATE+1) + 1/24 and assume it will actually run at 1 AM, I am unhappy to discover it actually runs at 6 AM in my timezone (again the 5 hour difference). Any assistance would be appreciated. By the way, the Oracle version is 9.2.0.3.0.


What I am saying is that I can run a pl/sql block in sqlplus and it will insert a timestamp with one time; seconds later I can run it with dbms_job and the timestamp inserted into the table appears to be 5 hours later (in fact it appears to be a future timestamp). I can look at both with the same NLS_DATE format in the same session and clearly see that they are 5 hours apart.

Tom Kyte
February 11, 2005 - 8:29 pm UTC

ok, time to peek at environments here....


dbms_job is run by a background process.
the background process has an environment -- inherited from whatever started the database.

sounds like the TZ environment variable is set to "X" when the database is started


Now, you connect via sqlplus -- one of two things is happening

a) you use a listener
b) you direct connect on the server (sqlplus forks/execs the dedicated server)


if A) the TZ for your dedicated server comes from the listener.

if B) the TZ for your dedicated server comes from you.


So, you could actually see "three" different times -- one from the jobs background (with its TZ setting), one when connecting over the network (from the listeners environment) and yet another from your environment.



what os are you running --

$ /usr/ucb/ps -auxwwe | grep ora

on solaris would show you the environment for the backgrounds

ps -auxwwe | grep ora on linux


but -- this will be because the environments are set differently.


RE: PLSQL block in DBMS_JOB submission not interpreting current_timestamp and sysdate correctly

Sujit, February 11, 2005 - 10:17 pm UTC

SELECT Parameter, Value FROM NLS_SESSION_PARAMETERS where parameter like '%DATE%' or parameter like '%TIME%'
 ORDER BY 1;
Results
========================================

NLS_DATE_FORMAT     MM/DD/YYYY HH24:MI:SS
NLS_DATE_LANGUAGE     AMERICAN
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT     DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT     HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT     HH.MI.SSXFF AM TZR

SELECT Parameter, Value FROM NLS_DATABASE_PARAMETERS where parameter like '%DATE%' or parameter like '%TIME%'
 ORDER BY 1;
========================================
NLS_DATE_FORMAT     DD-MON-RR
NLS_DATE_LANGUAGE     AMERICAN
NLS_TIMESTAMP_FORMAT     DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT     DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT     HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT     HH.MI.SSXFF AM TZR


Here is the example:
SQL> create table test_table(test_id NUMBER,
  2                          created_timestamp timestamp not null,
  3                          local_timestamp timestamp not null,
  4                          default_timestamp timestamp DEFAULT current_timestamp not null
  5                         );

Table created.




SQL> set serveroutput on;
SQL> BEGIN
  2    DECLARE
  3      JobName VARCHAR2(200);
  4      myJobId BINARY_INTEGER;
  5  BEGIN
  6    JobName := ' BEGIN
  7   INSERT into test_Table(test_id,created_timestamp,local_timestamp)
  8    VALUES(1,current_timestamp,localtimestamp);
  9   COMMIT;
 10   END;';
 11  dbms_job.submit(
 12     job => myJobId,
 13     what => JobName
 14    );
 15    dbms_output.put_line('Job Number: ' || myJobId || ': ' || JobName);
 16    COMMIT;
 17  END;
 18  END;
 19  /
Job Number: 56:  BEGIN
 INSERT into                                             
test_Table(test_id,created_timestamp,local_timestamp)
                          
VALUES(1,current_timestamp,localtimestamp);
 COMMIT;
 END;                      

PL/SQL procedure successfully completed.

SQL> select * from test_table;

   TEST_ID                                                                      
----------                                                                      
CREATED_TIMESTAMP                                                               
---------------------------------------------------------------------------     
LOCAL_TIMESTAMP                                                                 
---------------------------------------------------------------------------     
DEFAULT_TIMESTAMP                                                               
---------------------------------------------------------------------------     
         1                                                                      
11-FEB-05 04.05.13.529332 PM                                                    
11-FEB-05 04.05.13.529332 PM                                                    
11-FEB-05 04.05.13.529332 PM                                                    
                                                                                

SQL> select current_timestamp from dual;

CURRENT_TIMESTAMP                                                               
---------------------------------------------------------------------------     
11-FEB-05 11.05.40.773723 AM -05:00                                             

SQL> spool off;


What should I do?

Tom Kyte
February 12, 2005 - 12:34 pm UTC

I'm talking about the UNIX ENVIRONMENT.

so, for example:

[tkyte@localhost tkyte]$ ps -auxwwe | grep pmon_ora9ir2

ora9ir2  25642  0.0  0.3 336400 7848 ?       S    00:56   0:05 ora_pmon_ora9ir2
tkyte    26979  0.0  0.0  3676  644 pts/3    S    17:20   0:00 grep pmon_ora9ir2 VNCDESKTOP=localhost.localdomain:1 (tkyte) HOSTNAME=localhost.localdomain DESKTOP_STARTUP_ID= SHELL=/bin/bash TERM=xterm HISTSIZE=1000 NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1 GTK_RC_FILES=/etc/gtk/gtkrc:/home/tkyte/.gtkrc-1.2-gnome2 WINDOWID=26434373 USER=tkyte LD_LIBRARY_PATH=/home/ora9ir2/lib ORACLE_SID=ora9ir2 SESSION_MANAGER=local/localhost.localdomain:/tmp/.ICE-unix/3404 MAIL=/var/spool/mail/tkyte PATH=/home/tkyte/bin:/home/ora9ir2/bin:/home/tkyte/bin:/home/ora9ir2/bin:/home/tkyte/bin:/home/ora10g/bin:/home/tkyte/bin:/home/ora9ir2/bin:/home/tkyte/bin:/home/ora9ir2/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/home/tkyte/bin:/home/tkyte/binpriv:/sbin:/usr/java/j2sdk1.4.2_04/bin:/home/tkyte/bin:/home/tkyte/bin:/home/tkyte/binpriv:/sbin:/usr/java/j2sdk1.4.2_04/bin:/home/tkyte/bin:/home/tkyte/binpriv:/sbin:/usr/java/j2sdk1.4.2_04/bin INPUTRC=/home/tkyte/.inputrc PWD=/home/tkyte LANG=en_US.UTF-8 TZ=UTC SQLPATH=/home/tkyte/Desktop/sqlstuff SHLVL=7 HOME=/home/tkyte LD_ASSUME_KERNEL=2.4.1 GNOME_DESKTOP_SESSION_ID=Default LOGNAME=tkyte ORACLE_PROC_MAKEFILE=/home/ora9ir2/precomp/demo/proc/demo_proc.mk CLASSPATH=.:/home/ora9ir2/jdbc/lib/classes12.zip DISPLAY=:1.0 NO_TIME_PROXY=YES ORACLE_HOME=/home/ora9ir2 COLORTERM=gnome-terminal _=/bin/grep OLDPWD=/home/tkyte/Desktop/sqlstuff


so, my database pmon process has a TZ of UTC (and jobs, and so on).

ops$tkyte@ORA9IR2> create table t ( where_from varchar2(10), x date );
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t values ( 'plus', sysdate );
 
1 row created.
 
ops$tkyte@ORA9IR2> declare
  2          l_n number;
  3  begin
  4          dbms_job.submit( l_n, 'insert into t values ( ''job'', sysdate );' );
  5          commit;
  6          loop
  7                  select count(*) into l_n from user_jobs;
  8                  exit when l_n = 0;
  9                  dbms_lock.sleep(1);
 10          end loop;
 11  end;
 12  /
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> !echo $TZ
EST
 
ops$tkyte@ORA9IR2> @bigdate
ops$tkyte@ORA9IR2> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
 
Session altered.
 
ops$tkyte@ORA9IR2> select * from t;
 
WHERE_FROM X
---------- --------------------
plus       12-feb-2005 12:28:09
job        12-feb-2005 17:28:14
 
the database tz is different from my dedicated server tz; someone somewhere has the WRONG tz environment variable setup


this could be from the listener (who spawns your dedicated server)

it could be your environment (a direct connect to the database, no net involved).

it could be the environment of the admin who started the database.


We just pick it up from there -- the process that is getting "sysdate" asks the OS what time it is, and the OS tells us.  If you have the environment set wrong -- you get wrong stuff.
 

if it might be of some help to Sujit ....

pasko, February 12, 2005 - 3:44 am UTC

Hi Tom,

@comments to User Sujit :

I had the same problems with the current_timestamp function, which depends heavily on the client/session time zone settings.

Try another function: systimestamp. It works pretty much like sysdate, with added fractional-seconds precision and time zone info, and in my experience it does not depend on the client's time zone settings.
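A quick way to see pasko's point in a single session: current_timestamp and localtimestamp follow the session time zone, while systimestamp follows the server's. A sketch (the values you see depend on your environment):

SQL> alter session set time_zone = '-05:00';
SQL> select systimestamp, current_timestamp from dual;
SQL> alter session set time_zone = '+03:00';
SQL> select systimestamp, current_timestamp from dual;

current_timestamp shifts with each alter session; systimestamp keeps reporting the server's time and offset.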





Tom Kyte
February 12, 2005 - 12:41 pm UTC

sysdate and all depend on the dedicated server process's environment.


this is due to the server having one TZ setting and the dedicated server used to connect having a different one (someone's got messed-up stuff in their environment)

VIPR, February 24, 2005 - 4:29 am UTC

Hi Tom,

I want to change NLS_NCHAR_CHARACTERSET to WE8ISO8859P1 from AL16UTF16.

So I have done :

alter database backup controlfile;
shutdown immediate;
startup nomount;

CREATE CONTROLFILE REUSE DATABASE "mydbname" RESETLOGS ARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
MAXLOGFILES 50
MAXLOGMEMBERS 5
MAXDATAFILES 100
MAXINSTANCES 1
MAXLOGHISTORY 113
LOGFILE
GROUP 1 'D:\ORACLE\ORADATA\mydbname\REDO01.LOG' SIZE 100M,
GROUP 2 'D:\ORACLE\ORADATA\mydbname\REDO02.LOG' SIZE 100M,
GROUP 3 (
'D:\ORACLE\ORADATA\mydbname\REDO03.LOG',
'D:\ORACLE\ORADATA\mydbname\REDO03_1.LOG'
) SIZE 100M
-- STANDBY LOGFILE
DATAFILE
'D:\ORACLE\ORADATA\mydbname\SYSTEM01.DBF',
'D:\ORACLE\ORADATA\mydbname\UNDOTBS01.DBF',
'D:\ORACLE\ORADATA\mydbname\DRSYS01.DBF',
'D:\ORACLE\ORADATA\mydbname\EXAMPLE01.DBF',
'D:\ORACLE\ORADATA\mydbname\INDX01.DBF',
'D:\ORACLE\ORADATA\mydbname\ODM01.DBF',
'D:\ORACLE\ORADATA\mydbname\TOOLS01.DBF',
'D:\ORACLE\ORADATA\mydbname\USERS01.DBF',
'D:\ORACLE\ORADATA\mydbname\XDB01.DBF',
'D:\ORACLE\ORADATA\mydbname\OEM_REPOSITORY.DBF',
'D:\ORACLE\ORADATA\mydbname\EXAMPLE02.DBF',
'D:\ORACLE\ORADATA\mydbname\MYTEST.DBF',
'D:\ORACLE\ORADATA\mydbname\RCVCAT01.DBF'
CHARACTER SET WE8MSWIN1252
NATIONAL CHARACTER SET WE8ISO8859P1
;
NATIONAL CHARACTER SET WE8ISO8859P1
*
ERROR at line 31:
ORA-01967: invalid option for CREATE CONTROLFILE

What have I done wrong?

Tom Kyte
February 24, 2005 - 5:47 am UTC

what version.

A reader, February 24, 2005 - 6:24 am UTC

my oracle version is 9.2.0.1.0

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
PL/SQL Release 9.2.0.1.0 - Production
CORE    9.2.0.1.0       Production
TNS for 32-bit Windows: Version 9.2.0.1.0 - Production
NLSRTL Version 9.2.0.1.0 - Production
 

Tom Kyte
February 24, 2005 - 6:38 am UTC

never mind, didn't actually need the version, but national character set is not part of the create controlfile syntax and in 9i and above, the NCHAR character set range of values has changed.


http://docs.oracle.com/docs/cd/B10501_01/server.920/a96529/ch2.htm#100837

a single byte characterset isn't appropriate.
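To confirm what the national character set currently is before attempting a change, you can query the dictionary:

SQL> select value
  2    from nls_database_parameters
  3   where parameter = 'NLS_NCHAR_CHARACTERSET';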

Insert XML from multiple character sets

Jeremy, April 15, 2005 - 3:58 pm UTC

Hi Tom,

Thanks for your useful site.  I have a project where we are inserting XML documents into an XML DB (9.2.0.5).  Most of the documents are American, with numbers represented as "99.99" (like the final database), but some are from Europe and use a different convention, where numbers are represented as "99,99".

Is there an internal way to let the database know, when I insert, that some XML documents use a different convention?  I am not too concerned about losing any text information that might change from one character set to the other, just the numeric information.  (When I try to insert an XML document with numeric values like "99,99" I get an "ORA-20101: ERRORA-01722: invalid number")

Thanks

SQL> select * from nls_database_parameters;


PARAMETER                      VALUE
----------------------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8ISO8859P1
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-YYYY
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_RDBMS_VERSION              9.2.0.5.0

  

Tom Kyte
April 15, 2005 - 4:09 pm UTC

you can use an alter session when processing those, to set the nls_numeric_characters to ',.' instead of '.,'


scott@ORA10G> select to_number( '9,9' ) from dual;
select to_number( '9,9' ) from dual
*
ERROR at line 1:
ORA-01722: invalid number


scott@ORA10G> alter session set nls_numeric_characters = ',.';

Session altered.

scott@ORA10G> select to_number( '9,9' ) from dual;

TO_NUMBER('9,9')
----------------
9,9
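Alternatively, if altering the whole session is too broad, to_number() accepts the NLS setting on a per-call basis:

scott@ORA10G> select to_number( '99,99', '99D99',
  2                    'nls_numeric_characters='',.''' )
  3    from dual;

Here the third argument makes the comma the decimal separator for just this one conversion, regardless of the session default.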



Be future safe

Mike Friedman, April 17, 2005 - 2:40 am UTC

One thing to realize is that in today's world the only future safe character set for your database is AL32UTF8. If you are on 9i then UTF8 is the best you can do, but you should be aware that you will not be able to support some CJKV characters.

In today's world where input comes as XML files and people use Java clients to access DB systems I don't know any way to be really confident that people aren't putting almost any character set they want into your database.

For example, your purchasing system may be for a US only company, but are you sure that your Japanese engineer isn't including a few Japanese language notes for your Japanese supplier to make sure he understands some complicated requirement? Or your supplier in China may fully understand that your company requires XML Advance Ship Notices (ASNs) to be in English only but he may not even realize that the double width letters he is using for emphasis are used in Chinese to match English and Chinese character widths and do not exist in any Western character set.

If you want to have a system that doesn't break in complicated and unexpected ways that corrupt your data use Unicode throughout your technology stack. It's the only way to go.

RE: Insert XML from multiple character sets

Jeremy, April 18, 2005 - 5:13 pm UTC

Thanks Tom!

I was a little skeptical that it would work with XML inserts, but I built a test case and it performs perfectly

Thanks Again.

Is there any way to see NLS settings of a session from within a different session?

A reader, May 08, 2005 - 8:23 am UTC

... through a dictionary view -- without tracing (or auditing?) such a session?

Tom Kyte
May 08, 2005 - 8:28 am UTC

no, the session settings are in 'private' memory and not globally visible.

Request removal of question

Kirsten Ioos, June 06, 2005 - 7:45 am UTC

Can you remove my question (or at least most of the information I attached) for "Special char., language setting, and 8i to 9i client conversion April 30, 2004"? It is from Kirsten Ioos, in Raleigh. I had inadvertently included too much information in the question and that has been brought to my attention.

Sorry to put this in here - I had hoped to ask it as a new question, but I can't find any way to do so since you've had too much traffic for new questions in the last few weeks.

If this is not possible, can you tell me what I should do? Sorry!

Thanks ...

Tom Kyte
June 06, 2005 - 7:50 am UTC

it is gone.

SQL-NET CLOB charset conversion?

Harris, July 26, 2005 - 5:06 am UTC

Tom, am I correct to assume that SQL*Net will perform any required character set conversion if the client and database character sets are different? If so, does this also apply to CLOBs? The reason I ask is that I am writing an application that will return a CLOB (of XML) from a function. I really don't want to be forced into returning an XMLTYPE datatype - I need the function to simply return the CLOB. The problem I have, however, is (assuming my application is assembling that XML CLOB): what should be placed into the XML header "encoding" and/or "charset" attributes?

Would I place the character set of the database (NLS_DATABASE_PARAMETERS.PARAMETER = 'NLS_CHARACTERSET') or of the client (V$NLS_PARAMETERS.PARAMETER = 'NLS_CHARACTERSET')?

Eg, Assume the db charset is equivalent to IANA ISO-8859-1 and the client is UTF-8. What header should I place in the XML CLOB I generate ....

<?xml version="1.0" encoding="ISO-8859-1"?> or,
<?xml version="1.0" encoding="UTF-8"?>

Otherwise, is this a situation where I either have no choice but to return an XMLTYPE (rather than the CLOB itself) or should I be looking at using NCLOB's here? Or, is there something obvious that I've overlooked here?

Regards,

Harris.


Tom Kyte
July 26, 2005 - 8:13 am UTC

Yes.

clobs undergo characterset conversions, that is what sets them apart from BLOBs.


the encoding would be that of the client and the client would have to tell you what THAT is.

the encoding attribute is not mandatory, by the way.
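One helper that may be useful here: given an Oracle character set name, utl_i18n.map_charset (10g and up) returns the corresponding IANA name, which is the form the XML encoding attribute expects -- a sketch, assuming the default direction of the mapping is Oracle-to-IANA:

SQL> select utl_i18n.map_charset( 'WE8ISO8859P1' ) from dual;

UTL_I18N.MAP_CHARSET('WE8ISO8859P1')
------------------------------------
ISO-8859-1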

NLS Parameter on Unix session not equal

modolo, August 04, 2005 - 7:27 am UTC

I have a 10.1.2.0 client on my XP PC. When I query this example:
SQL> select 'Stück' from dual;

'STÜC
-----
Stück
That's correct, but on the Unix machine it is different:

SQL> select 'Stück' from dual;
Show current NLS database settings
'ST?C
-----
St?ck
What's wrong?

On the client
NAME                         VALUE$                     
---------------------------- -------------------------- 
DICT.BASE                    2                          
DEFAULT_TEMP_TABLESPACE      TEMP                       
DEFAULT_PERMANENT_TABLESPACE USERS                      
DBTIMEZONE                   +02:00                     
DEFAULT_TBS_TYPE             SMALLFILE                  
NLS_NCHAR_CHARACTERSET       AL16UTF16                  
GLOBAL_DB_NAME               BASISP                     
EXPORT_VIEWS_VERSION         8                          
NLS_LANGUAGE                 GERMAN                     
NLS_TERRITORY                SWITZERLAND                
NLS_CURRENCY                 SFr.                       
NLS_ISO_CURRENCY             SWITZERLAND                
NLS_NUMERIC_CHARACTERS       .'                         
NLS_CHARACTERSET             WE8ISO8859P1               
NLS_CALENDAR                 GREGORIAN                  
NLS_DATE_FORMAT              DD.MM.RR                   
NLS_DATE_LANGUAGE            GERMAN                     
NLS_SORT                     GERMAN                     
NLS_TIME_FORMAT              HH24:MI:SSXFF              
NLS_TIMESTAMP_FORMAT         DD.MM.RR HH24:MI:SSXFF     
NLS_TIME_TZ_FORMAT           HH24:MI:SSXFF TZR          
NLS_TIMESTAMP_TZ_FORMAT      DD.MM.RR HH24:MI:SSXFF TZR 
NLS_DUAL_CURRENCY            SF                         
NLS_COMP                     BINARY                     
NLS_LENGTH_SEMANTICS         BYTE                       
NLS_NCHAR_CONV_EXCP          FALSE                      
NLS_RDBMS_VERSION            10.1.0.4.0 

on the Unix Server
NAME                      VALUE$
------------------------- -----------------------------------
DICT.BASE                 2
DEFAULT_TEMP_TABLESPACE   TEMP
DEFAULT_PERMANENT_TABLESP USERS
ACE

DBTIMEZONE                +02:00
DEFAULT_TBS_TYPE          SMALLFILE
NLS_NCHAR_CHARACTERSET    AL16UTF16
GLOBAL_DB_NAME            BASISP
EXPORT_VIEWS_VERSION      8
NLS_LANGUAGE              GERMAN
NLS_TERRITORY             SWITZERLAND
NLS_CURRENCY              SFr.
NLS_ISO_CURRENCY          SWITZERLAND
NLS_NUMERIC_CHARACTERS    .'
NLS_CHARACTERSET          WE8ISO8859P1
NLS_CALENDAR              GREGORIAN
NLS_DATE_FORMAT           DD.MM.RR
NLS_DATE_LANGUAGE         GERMAN
NLS_SORT                  GERMAN
NLS_TIME_FORMAT           HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT      DD.MM.RR HH24:MI:SSXFF
NLS_TIME_TZ_FORMAT        HH24:MI:SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT   DD.MM.RR HH24:MI:SSXFF TZR
NLS_DUAL_CURRENCY         SF
NLS_COMP                  BINARY
NLS_LENGTH_SEMANTICS      BYTE
NLS_NCHAR_CONV_EXCP       FALSE
NLS_RDBMS_VERSION         10.1.0.4.0

can you help me? 

Tom Kyte
August 04, 2005 - 9:41 am UTC

what is the CLIENTS nls_lang setting?

Windows -> registry
Unix -> environment (defaults to us7ascii)


[tkyte@xtkyte-pc tkyte]$ echo $NLS_LANG
AMERICAN_AMERICA.WE8ISO8859P1


NLS Problems

Modolo, August 04, 2005 - 9:43 am UTC

Hi, I solved my problem. I set the NLS parameter in the UNIX shell's .profile:
export NLS_LANG=german_germany.we8iso8859p1

Now the problem is solved.





utl_i18n.escape_reference

Harris, August 05, 2005 - 1:41 am UTC

Tom, thanks for your previous answer regarding CLOBs. I have another question. Does utl_i18n.escape_reference work correctly for characters above 100 hex? For example ...

Why does this work ...
select unistr('\0424') from dual;

But this doesn't work? ...
select utl_i18n.unescape_reference('&#x424;') from dual;

The database charset is WE8ISO8859P1 and the client NLS_LANG is AMERICAN_AMERICA.UTF8. However, the problem still seems to occur if I use NLS datatypes and different NLS_LANG settings on the client. I need to make use of the utl_i18n.escape_reference function but can't seem to get it to work for characters above about 100 hex.

Regards,

Harris.


Tom Kyte
August 05, 2005 - 11:01 am UTC

sorry, you have just exceeded my knowledge of utf/unicode/and all :)



utl_i18n.unescape_reference followup

Harris, August 08, 2005 - 1:34 am UTC

For info, I built a new DB using AL32UTF8 and utl_i18n.unescape_reference seems to work fine.


Here's a tricky one

Stefan, October 14, 2005 - 8:14 am UTC

Hello Tom,

i just ran into a situation i cannot possibly explain...
please take a look at that:

oradba@SYRPROD@sr-ac99-0181 #unset NLS_LANG
oradba@SYRPROD@sr-ac99-0181 #dba

SQL*Plus: Release 9.2.0.6.0 - Production on Fri Oct 14 14:05:22 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production

sys@SYRPROD> drop table t;

Table dropped.

sys@SYRPROD> create table t(x varchar2(100));

Table created.

sys@SYRPROD> insert into t values ('ä');

1 row created.

sys@SYRPROD> insert into t values ('ää');

1 row created.

sys@SYRPROD> insert into t values ('äää');

1 row created.

sys@SYRPROD> quit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
oradba@SYRPROD@sr-ac99-0181 #export NLS_LANG=American_America.UTF8
oradba@SYRPROD@sr-ac99-0181 #dba

SQL*Plus: Release 9.2.0.6.0 - Production on Fri Oct 14 14:05:54 2005

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production

sys@SYRPROD> drop table t;

Table dropped.

sys@SYRPROD> create table t (x varchar2(100));

Table created.

sys@SYRPROD> insert into t values ('ä');
ERROR:
ORA-01756: quoted string not properly terminated


sys@SYRPROD> insert into t values ('ää');
ERROR:
ORA-01756: quoted string not properly terminated


sys@SYRPROD> insert into t values ('äää');

1 row created.

sys@SYRPROD> quit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production

what really puzzles me is why the 'äää' works just fine, yet 'ä' and 'ää' don't.
i've experienced this situation as well trying to insert a value consisting of a longer name, but containing 'ä' as well.

even odder, other umlauts, like 'ü' and 'ö' work just fine.

maybe you can shed some light on this ? :)

Regards Stefan


Tom Kyte
October 14, 2005 - 9:06 am UTC

sorry, I cannot, I don't really even have the ability to put in the a with dots..


when you unset the nls_lang, your default on unix would be us7ascii - so that would have been converted to "a", probably. Not sure what the UTF8 is doing - it could be your terminal's character set (I see different characters for the same keystrokes depending on what encoding scheme my linux terminal is using)

8bit data stored in US7ASCII characterset

Khalid, December 07, 2005 - 8:39 am UTC

Tom,
We are facing some issues while migrating our database from 8i to 9i. The 8i database uses the US7ASCII character set, but the new 9i database has WE8ISO8859P1 as its character set. Now we are facing issues with special characters like ® (the registered trademark symbol). What I fail to understand is that ® does not fall in the 7-bit encoding range (in fact its extended-ASCII value is 174), so how was it possible to store this character using a 7-bit character set in the first place? Also, while migrating from 8i to 9i it seems that the character ® is replaced by . (dot), given the fact that WE8ISO8859P1 is a superset of US7ASCII.

Can you please help us on this?

Thanks,
Khalid

Tom Kyte
December 08, 2005 - 1:00 am UTC

if the client character set was US7ASCII and the database was US7ASCII, character set translation is in effect "not happening". The database says "hey, we share the same character set, I can just take whatever the client gives me and I'll store it as is"

Since the smallest thing that can be stored is really a byte, that is what is stored.

And if the client which said "I will give you us7ascii data" really gave it 8bit data - there is nothing we can do about that - we trust the client to have given us us7ascii data.

IMP-00017

Olaf, December 12, 2005 - 6:03 am UTC

Tom,

I want to export/import the database from one Windows server (OS language = German) into another Windows server with OS language = American. The nls_characterset for both servers is UTF8.
So I did the export using set NLS_LANG=german_germany.utf8
on the "German" server, and then tried to import with the same setting (NLS_LANG=german_germany.utf8) into the "American" server. I got a lot of the following errors while importing:

IMP-00017: following statement failed with ORACLE error 6550:
"DECLARE SREC DBMS_STATS.STATREC; BEGIN SREC.MINVAL := '30312D30363835382D3"
"038'; SREC.MAXVAL := '38312D39303031342D3030'; SREC.EAVS := 0; SREC.CHVALS "
":= NULL; SREC.NOVALS := DBMS_STATS.NUMARRAY(2502276674683920000000000000000"
"00000,291766045114744000000000000000000000); SREC.BKVALS := DBMS_STATS.NUMA"
"RRAY(0,1); SREC.EPC := 2; DBMS_STATS.SET_COLUMN_STATS(NULL,'"ART_NKW"','"AR"
"T_NR"', NULL ,NULL,NULL,1504,,000664893617021277,0,srec,12,2); END;"
IMP-00003: ORACLE error 6550 encountered
ORA-06550: line 1, column 405:
PLS-00103: Encountered the symbol "," when expecting one of the following:

( - + mod not null others <an identifier>
<a double-quoted delimited-identifier> <a bind variable> avg
count current exists max min prior sql stddev sum variance
execute forall time timestamp interval date
<a string literal with character set specification>
<a number> <a single-quoted SQL string>
The symbol "null" was substituted for "," to continue.

Have you any idea what is the problem?

Tom Kyte
December 12, 2005 - 8:04 am UTC

you don't say, but is this "old stuff", like old 8i?

the problem is right there:

... ,1504,,00066489361702 ....


the ',,' - it is missing the word NULL. Now, since this is just statistics, it is "OK": you'll want to gather statistics after importing. The data was imported; only the act of SETTING the statistics failed.

Import

Olaf, December 12, 2005 - 8:23 am UTC

Thanks Tom.

Yes, it was 8.1.7. And yes, the errors were only regarding statistics. But if I do the same import without explicitly setting NLS_LANG to german_germany.utf8, I get no such errors (that is why I thought the errors were due to NLS settings).

Thanks again

Olaf

Tom Kyte
December 12, 2005 - 8:33 am UTC

it could be - I vaguely recall something (that was why the reference to old software). If the


NULL ,NULL,NULL,1504,,000664893617021277,0,srec,12,2); END;
^^^

is really for the decimal place (nls number settings), instead of a missing NULL, sure.

Hmm ... any other thoughts on ...

Greg, December 12, 2005 - 8:44 am UTC

You said:

>> And if the client which said "I will give you us7ascii
>> data", really gave it 8bit data - there is nothing we do
>> about that - we trust the client to have given us
>> us7ascii data

Agreed ... however, is there anything you could/would recommend to anyone in this situation? We've got a similar situation, and are beginning to think about upgrading to WE8ISO8859P1 (or something similar - since we obviously *need* it) ...

I'm starting to think and ponder the possibility of writing our own "conversion" routine .. something that "maps" characters higher than 128 to some combination of characters (heck: even to something like \174) then map them back once we upgrade the character set ...

Obviously, the columns would have to hold those extra character lengths .. (strings would grow, rows would chain ... an ice age will come) ...

Is this the only way to do it?? We've tried some tests and experiments with Import/Export .. but obviously, that's not doing it ... is this the only way to avoid losing data during conversion??


Tom Kyte
December 12, 2005 - 8:50 am UTC

define "losing data" - you are going to take a 256-character "set" and map them yourself? that'll "lose" as well.


You could always "unload the data" using your own program/unloader to a flat file, reload it.

Numeric Characters

Anirudh, February 07, 2006 - 1:00 pm UTC

Hi Tom,

We want to load a data file into our database. The file has numeric equivalents of some characters, like single quote (with a numeric equivalent of e.g. \221). While loading into our database tables, these numeric values are auto-converted to some character and stored (like \221 is converted to some funky character). I have tried using utf8 and we8iso as the client character set, but to no avail.
The problem is that while rendering this value, HTML does not recognize the auto-converted characters and shows a square box, whereas if I put numeric equivalents like \221 or \345 it displays the desired character.
Is there a way by which I can stop this auto-conversion while loading the database table (through sqlldr), so that the numeric values stay intact?

Database :- 912
nls_database_parameters:-
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1

NLS_lang client - WE8ISO8859P1

Thanks,
Anirudh



Tom Kyte
February 08, 2006 - 1:38 am UTC

does not compute.

I don't know what you mean by

"The file has numeric equivalants
of some characters like single quote (with a numeric equivalant for E.g \221). "



Changing character set from WE8MSWIN1252 to WE8ISO8859P1

Vande Matram, February 14, 2006 - 11:25 am UTC

Hi,

We developed an application with a database that is WE8ISO8859P1. When we transferred the metadata files to our client, who is using WE8MSWIN1252, they started seeing "¿" for line feeds/carriage returns. Can someone please suggest the best way to get this fixed? The client has to change the setting on their database, as we as a software company cannot change our standard (our other clients use WE8ISO8859P1).
Can someone please provide a list of steps that we can tell our client to perform in order to get this problem fixed?
This is very urgent. Thank you in advance.

Tom Kyte
February 14, 2006 - 1:20 pm UTC

well, there isn't anything to "fix", you converted character sets.

I'm not sure why the upside down question mark is appearing - chr(13)||chr(10) should not have been affected.

have you used the DUMP() function to see what the data really is in your database?
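For example, a sketch against a hypothetical table and column -- the first query shows what a genuine carriage return/line feed dumps as on an ASCII-based character set, the second shows what is actually stored:

SQL> select dump( 'A' || chr(13) || chr(10) || 'B' ) from dual;

DUMP('A'||CHR(13)||CHR(10)||'B')
--------------------------------
Typ=96 Len=4: 65,13,10,66

SQL> select dump( your_column ) from your_table where rownum <= 5;

If the stored data shows byte 191 (the code for ¿ in WE8ISO8859P1/WE8MSWIN1252) where 13,10 should be, the line terminators were mangled before or during the transfer, not by the database conversion.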

windows like sort

Jelena, March 22, 2006 - 7:48 am UTC

Hi Tom,
We have users "spoiled" by Windows sorting, and I cannot find a sort in Oracle that is "Windows like".
I do not use Windows, I use Linux, and do not agree that Windows is the standard, but 99% of our users do, and they want the underscore sorted before the numbers. BINARY_AI for nls_sort is almost perfect, except for the order of "_" (and we would like to keep a binary sort because of performance).
Any solution you can recommend?
Thanks in advance,
Jelena

Tom Kyte
March 22, 2006 - 3:56 pm UTC

does sorting by replace(col,'_','/') help?
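For illustration, a sketch of that idea against made-up values -- '/' sorts before the digits in a binary sort, so the underscore rows float to the top:

select s
  from ( select '_foo' s from dual union all
         select '1foo'   from dual union all
         select 'afoo'   from dual )
 order by replace( s, '_', '/' );

S
----
_foo
1foo
afoo

The obvious caveat: if the data can contain a real '/', the two will collate together, so pick a substitute character that cannot occur in your data.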

Jelena, March 23, 2006 - 12:20 pm UTC

> does sorting by replace(col,'_','/') help?

Hm, doesn't it slow down the sort?
Besides, the issue is that it is not only _ but some other special characters too; so yes, it might be the solution, but since we use Java and object mapping, it's not possible to put it everywhere...

What is strange is that I don't think we are the only ones facing this problem, so does Oracle have plans to provide something like alter session set NLS_SORT='WINDOWS_AI'? :)



Tom Kyte
March 23, 2006 - 1:49 pm UTC

Windows_ai? doubtful - not that I know of.



Export done in US7ASCII character set and AL16UTF16 NCHAR character set

Dawar Naqvi, May 03, 2006 - 10:54 am UTC

Tom,

Recently I applied the 10g Release 1 (10.1.0.5) patch set for AIX 64 -- Patch # 4505133.

My earlier database version was 10.1.0.3.0.

I got message patch successfully installed.
I can login to the database.

One of our weekly routines is to run our internal shell script interface programs.

The very first step of this program is to create a .dmp (exp) file. We have been doing this for years.

Now, after applying this patch set,
I am getting an error in this program, as below:

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user WITS
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user WITS
About to export WITS's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
EXP-00056: ORACLE error 932 encountered
ORA-00932: inconsistent datatypes: expected BLOB, CLOB got CHAR
EXP-00000: Export terminated unsuccessfully


Regards,
Dawar



Tom Kyte
May 03, 2006 - 1:12 pm UTC

seems definitely like something you would want to use support for - doesn't it?

Export done in US7ASCII character set and AL16UTF16 NCHAR character set

Dawar, May 03, 2006 - 3:46 pm UTC

Tom,

This is the Linux box, not the AIX one.

According to Doc ID: Note:339938.1,
I ran catmetx.sql.

But I am still getting errors on export.

After that I ran utlrp.sql and catproc.sql, respectively.

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.5.0 - Production
With the Partitioning, OLAP and Data Mining options
Export done in US7ASCII character set and AL16UTF16 NCHAR character set
server uses AL32UTF8 character set (possible charset conversion)

About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user abc
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user abc
About to export WITS's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
EXP-00056: ORACLE error 932 encountered
ORA-00932: inconsistent datatypes: expected BLOB, CLOB got CHAR
EXP-00000: Export terminated unsuccessfully

Dawar

Tom Kyte
May 03, 2006 - 4:34 pm UTC

have you followed the instructions for the upgrade? You should not have to do this (and utlrp before catproc doesn't entirely make sense).

Export done in US7ASCII character set and AL16UTF16 NCHAR character set

Dawar, May 04, 2006 - 12:19 am UTC

Tom,

It was a typo (above);
I ran catproc.sql & utlrp.sql, respectively.

It is our weekly exp of one of our main schemas,
not the upgrade.

History of this DB:

More than a year ago, I exported/imported the database from 7.3 to 10.0.1.0.

After that I applied patches twice,
i.e. 10.1.0.3.0 & now 10.1.0.5.0.

After 10.1.0.5.0, I am getting this issue.
Please see above feedback.

Dawar

Tom Kyte
May 04, 2006 - 2:13 am UTC

my feedback is going to be very consistent:

seems definitely like something you would want to use support for - doesn't it?


You did an upgrade - you just told me you did. "I have applied patches twice".

What is the difference between WE8MSWIN1252 and WE8ISO8859P15 character set?

A reader, May 05, 2006 - 10:59 am UTC

Tom,

What is the difference between the WE8MSWIN1252 and WE8ISO8859P15 character sets? We are going to move our old Linux Oracle server to Windows, and my question is:

If I export a database schema from Linux (WE8ISO8859P15) and import it into an existing Oracle instance on Windows (WE8MSWIN1252), is there anything I should worry about (settings for NLS_LANG during export/import, changing NLS_LANG on the application server after switching to the new database, etc.)?



Tom Kyte
May 05, 2006 - 2:53 pm UTC

WE8ISO8859P15 (the ISO standard) is "smaller than" WE8MSWIN1252, the Microsoft-created default character set on Windows.

The other way around: WE8MSWIN1252 encodes everything that is encoded in WE8ISO8859P15,

so your we8iso data should fit OK into we8mswin.


you'll want to use the we8iso nls_lang for the export
you'll want to use the we8mswin for the import

and your app server should likely (you decide) use the characterset of the database it connects to.


and remember: test before you do it ;)
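A minimal sketch of that sequence (user names and file names hypothetical):

On the Linux source:

$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P15
$ exp userid=scott/tiger owner=scott file=scott.dmp

On the Windows destination:

C:\> set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
C:\> imp userid=scott/tiger fromuser=scott touser=scott file=scott.dmp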

Export done in US7ASCII character set and AL16UTF16 NCHAR character set

Dawar, May 05, 2006 - 3:03 pm UTC

Tom,

Thanks to Oracle Support.

Here is the solution to my question (2nd above):


SQL> connect / as sysdba
SQL> @?/rdbms/admin/catmetx.sql
SQL> @?/rdbms/admin/utlrp.sql

And it works.

Regards,
Dawar 

What is the difference between WE8MSWIN1252 and WE8ISO8859P15 character set

A reader, May 08, 2006 - 3:57 am UTC

Tom,

Thanks for your quick response to my question regarding the difference between the WE8MSWIN1252 and WE8ISO8859P15 character sets. Actually, I am about to create a new database on Windows for importing our old Linux database (WE8ISO8859P15).
What would you prefer to do in this case: create the new database with the same character set as the old one (WE8ISO8859P15) and avoid character set conversion, or create the new database with the Windows default character set WE8MSWIN1252?

What are the potential drawbacks of having a "non-default" character set for daily tasks like exporting/importing objects, RMAN backups etc., and for emergencies like restores etc.? I can set NLS_LANG explicitly in all my batch scripts and sessions, but I see this as a POTENTIAL source of error...

Tom Kyte
May 08, 2006 - 8:16 am UTC

I would prefer to not put it on windows - truth be told - as I would be at a loss managing it :)

since you have a superset characterset, everything should go OK this way - but when you decide to move back - you'll be going to a subset characterset.

your choice is a long term one - perhaps it is time to even consider a multi-byte characterset - but this is a choice you have to make.


Whatever characterset you choose will become "your default", you'll just want to make sure clients use the same characterset if you don't want characterset conversion.

RMAN will not really be affected, as backups are binary copies of data.

It is only a potential error if you "misconfigure" things - you already likely have windows clients (with the windows character set) going against your linux databases right?

Identifying Non-English characters

R.S.Karthik, June 22, 2006 - 5:12 pm UTC

Hi Tom

Is it possible to identify non-English characters in a column?
To overcome a limitation of a target system, the requirement is to not send any non-English characters in a specific field.

One way is to use the "dump" function and check for any character having a code > 127, but somehow that did not sound like a clean way.

Thanks in Advance for any help on this.



Tom Kyte
June 23, 2006 - 9:53 am UTC

define for me "english" first :)

How about this - set up a string that consists of all characters you consider 'bad', say anything above 127:


declare
    l_str varchar2(200);
begin
    for i in 128 .. 255
    loop
        l_str := l_str || chr(i);
    end loop;
end;


then you can use translate:

translate( data, l_str, rpad( ' ', length(l_str) ) )
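Putting the two pieces together, a sketch of a reusable check (the function name is made up; this assumes a single-byte database character set):

create or replace function has_non_ascii( p_data in varchar2 )
return varchar2
as
    l_str varchar2(200);
begin
    -- build the list of "bad" characters, anything above 127
    for i in 128 .. 255
    loop
        l_str := l_str || chr(i);
    end loop;

    -- translating the bad characters to spaces changes the string
    -- if and only if it contained at least one of them
    if translate( p_data, l_str, rpad( ' ', length(l_str) ) ) <> p_data
    then
        return 'Y';
    end if;
    return 'N';
end;
/

and then, to flag the offending rows, something like:

select * from t where has_non_ascii( the_column ) = 'Y';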




Nice.... !

Karthik, June 27, 2006 - 6:56 pm UTC

Thanks Tom.... Good one.... !


Kevin, July 11, 2006 - 2:55 pm UTC

Hi TOM,

We're running Oracle 10.1.0.4.2 with nls_characterset AL32UTF8.
I inserted data from a file located on the server into a table. After that, I tried to retrieve the data with the "select * from table_name" command, and the output just displays question marks, like below:

select output from dlog_log;

OUTPUT
--------------------------------------------------------------------------------
????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????


Here are the steps I did:

create table dlog_log (output clob);

create or replace procedure load_data
as
    l_clob  clob;
    l_bfile bfile;
begin
    insert into dlog_log (output) values
    (empty_clob())
    returning output into l_clob;

    l_bfile := BFILENAME ('TEMP_DIR', 'export.log');
    DBMS_LOB.FILEOPEN(l_bfile);
    DBMS_LOB.LOADFROMFILE(l_clob, l_bfile, dbms_lob.getlength(l_bfile));
    DBMS_LOB.FILECLOSE(l_bfile);
end;

exec load_data;


select * from v$nls_parameters;

PARAMETER VALUE
------------------------- --------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET AL32UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM

PARAMETER VALUE
------------------------- --------------------
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXF
F AM

NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXF
F AM TZR

NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE

PARAMETER VALUE
------------------------- --------------------
NLS_NCHAR_CONV_EXCP FALSE

19 rows selected.


How can I get rid of this problem?

Thank you so much

Tom Kyte
July 12, 2006 - 3:10 pm UTC

what exactly is the client's character set?

and what is the "client" displaying with (eg: on linux, unless I set my TERMINAL settings to handle multi-byte data, my terminals don't).



Alberto Dell'Era, July 12, 2006 - 3:44 pm UTC

Could the real issue be that the data in the BFILE is not AL32UTF8 in the first place? According to

http://download-uk.oracle.com/docs/cd/B14117_01/appdev.101/b10802/d_lob.htm#998779

dbms_lob.loadfromfile does not perform character set conversions:

"No character set conversions are performed implicitly when binary BFILE data is loaded into a CLOB. The BFILE data must already be in the same character set as the CLOB in the database."

dbms_lob.loadclobfromfile does perform character set conversion instead (parameter bfile_csid).
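For completeness, a sketch of Kevin's load rewritten with dbms_lob.loadclobfromfile, assuming the file really is WE8ISO8859P1 (the directory, file name and table come from the original example; the bfile_csid value is the assumption you must verify):

declare
    l_clob   clob;
    l_bfile  bfile := bfilename( 'TEMP_DIR', 'export.log' );
    l_dest   integer := 1;
    l_src    integer := 1;
    l_lang   integer := dbms_lob.default_lang_ctx;
    l_warn   integer;
begin
    insert into dlog_log ( output ) values ( empty_clob() )
    returning output into l_clob;

    dbms_lob.fileopen( l_bfile, dbms_lob.file_readonly );
    dbms_lob.loadclobfromfile(
        dest_lob     => l_clob,
        src_bfile    => l_bfile,
        amount       => dbms_lob.getlength( l_bfile ),
        dest_offset  => l_dest,
        src_offset   => l_src,
        bfile_csid   => nls_charset_id( 'WE8ISO8859P1' ),  -- the file's actual character set
        lang_context => l_lang,
        warning      => l_warn );
    dbms_lob.fileclose( l_bfile );
    commit;
end;
/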

Mubeen, August 07, 2006 - 2:35 pm UTC

Hello Tom,

One of our military units has Oracle 8.1.5.0 / Novell 5 (implemented by a third company; we are not aware of the details), using a PowerBuilder application as the front end. Unfortunately, last week there was a problem; since then they are getting Arabic data as question marks. v$nls_parameters shows nls_char as AR8MSAWIN and nls_nchar as AR8MSWIN1256.

There is no CSSCAN utility for this version on Novell. I tried to change nls_char using ALTER DATABASE to AR8MSWIN1256, but once I restart the instance, v$nls_parameters shows nls_char again as AR8MSAWIN.

So after altering the character set, before shutting down, I took an export for that character set (AR8MSWIN1256) and tried to import it on another machine, but it did not work.

When I inspected one of the clients, I found nls_lang set to AMERICAN_AMERICA.WE8ISO8859P1?

Is all the Arabic data lost? Could you please guide me towards a solution for this?

Thanks a lot

Tom Kyte
August 07, 2006 - 9:46 pm UTC

if the single byte client read the data out and then modified the data (wrote it back), then you have only that which the single byte client can send - which doesn't include all of the arabic characters.

I have no idea why you would try what you did (the alter database).

Please utilize support - so they can get an entire history of what is going on and why - but likely, it is "gone" if they did the update (except from your backups, but given the description, it is not likely lots of those exist..)

NLS_LANG

Si, August 08, 2006 - 5:52 am UTC

Assuming that your data is safe and sound, i.e. you haven't written it back to the DB using an incorrect character set, I think you should try modifying your NLS_LANG setting on your client.

You need to use one of the single-byte sets listed here:

http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96529/appa.htm#968651

On windows I've successfully used AR8MSWIN1256, e.g. NLS_LANG = AMERICAN_AMERICA.AR8MSWIN1256

os locale and nls_lang

Gerardo, August 12, 2006 - 12:58 pm UTC

Hi Tom,

We need to know the relationship between the Solaris OS locale and the database NLS_LANG.
The database character set is WE8MSWIN1252. What NLS_LANGUAGE, NLS_TERRITORY and LC_ALL (locale) values could we set up?

Thanks in advance.

Tom Kyte
August 12, 2006 - 8:03 pm UTC

not sure what you mean?

os locale and nls_lang

A reader, September 08, 2006 - 9:47 am UTC

Gerardo, you might have a look at Metalink note 264157.1, "The correct NLS_LANG setting in Unix Environments".

data is not same shown by oracle sql developer and sql *plus

Vijay, October 05, 2006 - 10:01 am UTC

Dear Tom,
Good day to you. I am facing a weird problem. We have Oracle Database 10.2.0.2.0 on an AIX box; the character set parameters for the database are as follows:

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET AR8ISO8859P6
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_RDBMS_VERSION 10.2.0.2.0

This database is to support Arabic data. When I query the data from Oracle SQL Developer I see the data in Arabic, but if I query the data from SQL*Plus it's junk characters. My client is on Windows XP with Arabic support, and the nls_lang setting is AMERICAN_AMERICA.AR8ISO8859P6.

I am not able to figure out how Oracle SQL Developer is able to show the data properly while SQL*Plus is not able to convert it to the proper format. Any pointers to resolve this would be a great help.

Kind Regards,
Vijay.


Tom Kyte
October 05, 2006 - 1:12 pm UTC

do you have the proper Windows fonts installed here? are you POSITIVE that the character set is that? (sql developer is java, java is unicode... it will be using a different character set)

Storing the Czech character 'š' in the DB

Marko, October 19, 2006 - 7:08 am UTC

I have a problem storing the character 'š' or '&#283;' in Oracle 10g XE. The default NLS_CHARACTERSET in the database settings after installation is WE8MSWIN1252. How can I resolve this problem?

character set

A reader, October 24, 2006 - 3:54 pm UTC

Tom:

1. When would you consider changing the character set for a database from WE8 to UTF8? Is it when the database is required to support foreign languages too?

2. Is there any benefit to using utf8 over WE8, or are they just two different representations of the same data with no effect?



Tom Kyte
October 25, 2006 - 9:21 am UTC

1) when I had a need to store multi-byte data.

2) utf8 allows you to store things we8 does not - that is the "advantage"

utf8

A reader, October 25, 2006 - 11:41 am UTC

Tom:

1. What are the things that "utf8" allows you to store that the WE8ISO does not? Is it foreign characters only?

2. Is it true that UTF8 allows you to generate UTF data that can be included in XML files? Someone says that there are benefits of utf8 for the storage of xml files in the database, but I do not believe that. XML is English and should make no difference. WE8 allows you to create and store XML files, or generate xml formats that can be incorporated into other systems. Any comments?

3. Can you change the character set for an existing database, or do you have to create a new utf8 one, export/import, and then retest all functions?

Tom Kyte
October 25, 2006 - 2:01 pm UTC

multibyte characters - please define "foreign" for me :) to the person using that language, well, they are not foreign

see the globalization guide, it covers these topics including the changing of a database characterset.

utf8

A reader, October 25, 2006 - 12:09 pm UTC

Tom:

1. Can you clarify this issue a little? XML requires UTF8 encoding. Can an oracle WE8 character set database create that format, or do you need to have it set to UTF-8?

<Google's Sitemap service recently caused a minor stir in the XML community by requiring that all sitemaps be published exclusively in the UTF-8 encoding of Unicode. Google doesn't even allow alternate encodings of Unicode such as UTF-16, much less non-Unicode encodings like ISO-8859-1. Technically, this means Google is using a nonconforming XML parser, because the XML Recommendation specifically requires that "All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode 3.1." However, is this really such a big problem?

Thanks


UTF8

A reader, October 25, 2006 - 2:25 pm UTC

Tom:

Foreign = any non-English character.

Where is the globalization guide?

Do you have any comment on the XML requirement for UTF8 encoding, and whether oracle WE8 supports that (or whether you do not need to)?

Tom Kyte
October 25, 2006 - 4:22 pm UTC

that is a strange definition of foreign, but.....

if you need MULTI-BYTE DATA, then you need a character set that can store multi-byte data. It is that simple..........


I'm not into xml myself :)

UTF8

A reader, October 25, 2006 - 4:34 pm UTC

Tom:

1. Let us say you have a database A that is using WE8ISO and you want to convert it to UTF-8. Do you create a new database and export/import, or do you change the setting in the existing one?

2. When you do this conversion, do you revise the sizes of each column, since you mentioned some columns that are varchar2(40) may need to become varchar2(160)?

3. Can a WE8ISO database create an XML file encoded in UTF8 format? Does Oracle have any packages that do the conversion?

Thanks,

Tom Kyte
October 25, 2006 - 4:42 pm UTC

1) I read the globalization guide, which I pointed you to before, to find out my options.

</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#sthref157 <code>

2) only answer that applies is "it depends", do you want to? do you need to? what are YOUR requirements, what do YOU want to support?

3) western european 8 bit data is a subset of utf8, so sure.
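
a small sketch of that point - CONVERT can show the UTF8 byte sequence a given WE8 character maps to (this assumes the literal 'é' reaches the server as a single WE8ISO8859P1 byte):

select dump( convert( 'é', 'UTF8', 'WE8ISO8859P1' ), 16 ) from dual;

-- roughly: Typ=1 Len=2: c3,a9   -- one 8-bit byte becomes two UTF-8 bytes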

UTF8

A reader, October 25, 2006 - 4:54 pm UTC

Tom:

thanks for the link. i will read it.

as a followup to #3 above, how would you generate an xml file of the EMP table in UTF8 format (from oracle database in WE8ISO). Would just print the header statement
<?xml version="1.0" encoding="UTF-8"?>

make it UTF8 format. I am missing something here.

Tom Kyte
October 25, 2006 - 5:04 pm UTC

see my comment above regarding my expertise with XML.

eg: none.

utf8

A reader, March 28, 2007 - 4:49 pm UTC

Tom:

1. If you have an existing varchar2(4000) column in a WE8 database and you want that column to now store data in UTF8, and you change the data type to nvarchar2(4000), would you lose data, since the 4000 characters were saved at one byte per character and will now take two bytes per character?

2. Would it be better to save the UTF8 data in this case in an NCLOB?

Thanks
Tom Kyte
March 30, 2007 - 11:41 am UTC

1) that is a varying length encoding scheme whereby characters take one OR MORE bytes - so, would you lose characters? maybe - depends on what was in there in the first place. 4000 letter 'A' - no, you would not lose any, that takes 1 byte in WE8 and UTF. Some other character - yes, you could because it takes two bytes.

2) only you can answer this. Is 4000 bytes sufficient to store your data, or not. You tell us. If you say "no", then a 4000 byte field would not be appropriate for you!
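
to put a number on the expansion described in (1), compare LENGTH (characters) with LENGTHB (bytes) - a sketch assuming a UTF8 or AL32UTF8 database:

select length('à') chars, lengthb('à') bytes from dual;

     CHARS      BYTES
---------- ----------
         1          2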

utf8

A reader, March 30, 2007 - 4:32 pm UTC

TOm:

How would you then solve the problem?

1. If you were asked to take an existing varchar2(4000) column that stores XML data in WE8 and make it store XML data in UTF8, what is the best approach that ensures no data loss? i.e., would you modify the existing column, or create a new column?

2. How can you confirm whether data was lost after the conversion? Do you compare the columns to each other?

3. Based on what you are saying, it sounds like all english characters and numbers will convert fine, since they consume only 1 byte in both WE8 and UTF8. It seems the problem may lie in special characters and foreign characters. Is this accurate?
Tom Kyte
March 30, 2007 - 4:46 pm UTC

you, you have to answer this.

will your data exceed 4000 bytes. if you answer "yes, it can" - then you will not be using a varchar2(4000), it won't FIT.

2) tell me how you are going to "convert", it would FAIL upon insert for example.

3) I don't personally know the utf8 encoding scheme, not sure if all western european 8 bit characters are represented in a single byte or not.

if you ask me - we8 already HAS foreign characters - I only need 7 bits - us7ascii :)

Alberto Dell'Era, March 30, 2007 - 6:19 pm UTC

> I don't personally know the utf8 encoding scheme, not sure if all western european 8 bit characters are represented in a single byte or not.

Unfortunately (for me) I know way too well that important characters in Italian such as accented vowels (à, è, ì, ò, ù), which take one byte in iso8859-1 (so in WE8ISO8859P1), take 2 bytes in utf8 (so in UTF8 and AL32UTF8). Same for many characters in German, French, etc etc.

But since these special characters are less than 2-3% of the total in normal phrases, this is not a big issue: length in bytes is normally just a tad greater than length in characters.

Unless you try to store an accented char in a VARCHAR2(1 BYTE) ...
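
A quick sketch of that last point, assuming a UTF8 database (demo_sem is a made-up table):

create table demo_sem ( b varchar2(1 byte), c varchar2(1 char) );
insert into demo_sem (c) values ('à');  -- fits: one character, two bytes
insert into demo_sem (b) values ('à');  -- ORA-12899: only one byte of room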

utf8

A reader, March 30, 2007 - 11:19 pm UTC

Tom:

I think you are misunderstanding me.

<will your data exceed 4000 bytes. if you answer "yes, it can" - then you will not be using a varchar2(4000), it won't FIT.>

yes, the varchar2(4000) is currently fine and fits the XML text data that needs to be saved. However, Oracle internally is saving it in WE8 format. Now I want to change that column to store data in UTF8 format. Can I just change the existing varchar2(4000) column to nvarchar2(4000) and the data would then be saved in UTF8 format? Or do I need to do something else?


Tom Kyte
March 31, 2007 - 12:54 pm UTC

arg - i don't know how else to say this:

will YOUR data exceed 4000 bytes in UTF

I don't know, only YOU know. Only you can answer that.



utf

A reader, March 31, 2007 - 10:09 pm UTC

Tom:

How do I know?

Let us say I have a varchar2(12) in a WE8 database to store names. The longest name is "John Killeman".

If I understand correctly, the database assigns one byte per character in WE8.

Now, I need to store this in UTF8.

Do I need to create a varchar2(24) because UTF8 may store 2 bytes per character, so that I do not lose any data? Or do I just keep it at varchar2(12) and the database will save the UTF8 internally fine?
Tom Kyte
April 01, 2007 - 8:02 pm UTC

You will need to do.... that research.

As I said, I am not the resident expert that has each character set burned into memory and can tell you whether YOUR data (of which I know not) will fit.

As Alberto above gave an example for - the answer could well be NO. Do you store that stuff he was talking about??? Do you store stuff that falls into this category that he did not point out???

I don't know, but you - you have a chance of knowing since you have the source data.
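
one way to do that research - a rough sketch in PL/SQL (so the converted value is not capped at SQL's 4000 bytes); t and x stand in for your own table and column:

declare
  l_over pls_integer := 0;
begin
  for r in ( select x from t ) loop
    if lengthb( convert( r.x, 'UTF8' ) ) > 4000 then
      l_over := l_over + 1;
    end if;
  end loop;
  dbms_output.put_line( l_over || ' rows would not fit in 4000 bytes as UTF8' );
end;
/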

utf8

A reader, April 02, 2007 - 4:39 pm UTC

Tom:

I think I finally understand what you are trying to explain to me. But let me confirm:

If I have an existing varchar2(4000) column and I modify it to nvarchar2(4000) (or create a new column, update the data and then delete the old column), Oracle will do the conversion automatically for me, and if it could not fit the data in UTF8 format it will give me an error, exactly like trying to save 25 characters into a varchar2(20) field.

The same thing will happen when inserting data into the nvarchar2(4000): it will give me an error.

Now, if this happens, the SOLUTION is to go to the NCLOB data type. This data type can store up to 4 gigabytes of data in UTF8 format.

Is this correct?

Tom Kyte
April 03, 2007 - 9:13 pm UTC

sort of - the alter will complain if the THEORETICAL maximum is too big - not the real. Only you know if the real maximum is too big...

UTF8

Chris Brown, April 03, 2007 - 4:22 am UTC

> Now if this happens, the SOLUTION for this is to go to NCLOB data type. This data type can store up to 4 gigabytes of data in UTF8 format.

Alternatively you could use CHAR length semantics in the database. NLS_LENGTH_SEMANTICS = CHAR.
Then your varchar2(4000) column can be defined to store 4000 characters rather than 4000 bytes. This has many advantages if converting an existing application.
Tom Kyte
April 03, 2007 - 11:12 pm UTC

ahh, but the catch is....

4000 bytes is the ultimate maximum. you can say 4000 characters, but it'll always be limited to 4000 bytes.
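
a sketch of that catch, assuming a UTF8 database (t4k is a made-up table):

create table t4k ( x varchar2(4000 char) );

declare
  l_txt varchar2(8000) := rpad( 'à', 4000, 'à' );  -- 4000 characters = 8000 bytes in UTF8
begin
  insert into t4k values ( l_txt );  -- fails: a varchar2 column tops out at 4000 BYTES
end;
/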

Clarification

Chris, April 03, 2007 - 4:40 am UTC

Sorry - I was a bit hasty in the last post.
CHAR length semantics would be good for a varchar2(100) field, for example. However, if you really have over 4000 bytes of data in a column converted to UTF8, then defining the column as varchar2(4000 CHAR) won't help. The 4000-byte limit remains.
Tom Kyte
April 03, 2007 - 11:13 pm UTC

yes, correct...

UTF8

A reader, April 04, 2007 - 1:46 pm UTC

Tom:

<sort of - the alter will complain if the THEORETICAL maximum is too big - not the real. Only you know if the real maximum is too big... >

1. What is the difference between the theoretical max and the real max? Can you provide a real example?

2. When you convert a column from varchar2 to NCLOB, can you lose some data? Is there a way to verify that no data was lost in the conversion?

Thank you,

Tom Kyte
April 04, 2007 - 3:46 pm UTC

1)

ops$tkyte%ORA10GR2> create table t ( x varchar2(4000), y varchar2(2000) );

Table created.

ops$tkyte%ORA10GR2> insert into t values ( 'x', 'y' );

1 row created.

ops$tkyte%ORA10GR2> alter table t modify x nvarchar2(4000);
alter table t modify x nvarchar2(4000)
                                     *
ERROR at line 1:
ORA-00910: specified length too long for its datatype


ops$tkyte%ORA10GR2> alter table t modify y nvarchar2(2000);

Table altered.


varchar2(4000) -> nvarchar2(4000) that won't fly because in theory it might not fit.

2) the clob is large enough to hold any varchar2(4000) being converted to the national character set.

however, think long and hard about this - it is a rather big step to go from a varchar2 to a clob - you want to make sure you have to and want to do that.

utf8

A reader, April 04, 2007 - 5:06 pm UTC

Tom:

PLease see example below.

1. I could not create an nvarchar2(4000); the maximum allowed was nvarchar2(2000). Is this correct?

2. When I converted from VARCHAR2(4000) to NCLOB, data seems to have been lost, unless there is a different way to select from an NCLOB field.

SQL> desc test15;
Name Null? Type
----------------------------------------------------- -------- ------------------------------------
A1 VARCHAR2(4000)
A2 NVARCHAR2(2000)
A3 NCLOB


1 insert into test15(a1) values ('<?xml version="1.0"?>
2 <o:program>
3 <o:type name="HelloWorld">
4 <o:function name="hello">
5 <o:do>
6 <o:return select="Hello World"/>
7 </o:do>
8 </o:function>
9 </o:type>
10 <o:set instance="HelloWorld()"/>
11 <o:eval select="instance.hello()"/>
12 </o:program>
13* ')
SQL> /

1 row created.

SQL> commit;

Commit complete.

SQL> update test15 set a2=a1;

1 row updated.

SQL> update test15 set a3=a1;

1 row updated.

SQL> commit;

Commit complete.

1* select a1 from test15
SQL> /

A1
--------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<o:program>
<o:type name="HelloWorld">
<o:function name="hello">
<o:do>
<o:return select="Hello World"/>
</o:do>
</o:function>
</o:type>
<o:set instance="HelloWorld()"/>
<o:eval select="instance.hello()"/>
</o:program>


1 row selected.

SQL> select a2 from test15;

A2
--------------------------------------------------------------------------------------------------
<?xml version="1.0"?>
<o:program>
<o:type name="HelloWorld">
<o:function name="hello">
<o:do>
<o:return select="Hello World"/>
</o:do>
</o:function>
</o:type>
<o:set instance="HelloWorld()"/>
<o:eval select="instance.hello()"/>
</o:program>


1 row selected.

SQL> select a3 from test15;

A3
--------------------------------------------------------------------------------
<?xml version="1.0"?>
<o:program>
<o:type name="HelloWorld">
<o:function n


1 row selected.


Tom Kyte
April 04, 2007 - 6:07 pm UTC

set long 500000

otherwise lobs/longs are not printed in their entirety by sqlplus - it is just a formatting thing.

utf8

A reader, April 05, 2007 - 1:04 pm UTC

TOm:

you are right it is there.

SQL> select a3 from test15;

A3
--------------------------------------------------------------------------------
<?xml version="1.0"?>
<o:program>
<o:type name="HelloWorld">
<o:function name="hello">
<o:do>
<o:return select="Hello World"/>
</o:do>
</o:function>
</o:type>
<o:set instance="HelloWorld()"/>
<o:eval select="instance.hello()"/>
</o:program>


1. Does this mean that I now have my XML stored in UTF8 format (if the national character set is set to UTF8) in a2 and a3?

2. Would this select statement prove that no data loss occurred, or do you have to write a program to compare the fields, or something else?

3. Why would nvarchar2(4000) not work?

Tom Kyte
April 05, 2007 - 1:47 pm UTC

1) as long as the proper nls settings are in place, the data will be rendered in utf - yes (lobs use special charactersets in general - different from a nvarchar2)

2) for this row.... sure

3) you are specifying the maximum number of CHARACTERS

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch7progrunicode.htm#sthref807


utf8

A reader, April 05, 2007 - 5:18 pm UTC

Tom:

THanks, it is clear now.

But I do not understand: what is the benefit of having UTF-8 format if you are not storing foreign languages anyway?

Is it because XML parsers require UTF8, or the XML specification requires it, or what?
Tom Kyte
April 05, 2007 - 7:42 pm UTC

If you have no need to store multibyte data, you have no need to store multibyte data.

It is your choice.

NLS_SORT & NLS_COMP

Piyush Chechani, May 02, 2007 - 1:54 am UTC

Hi Tom...
I have read your follow ups on this topic.

I am trying to execute case-insensitive queries using SQL*Plus.
For this I am trying to set these two things:
execute immediate 'ALTER SESSION SET NLS_SORT=BINARY_CI';
execute immediate 'ALTER SESSION SET NLS_COMP=LINGUISTIC';
But I am not getting it to work correctly.

Can you please tell me how to change the environment settings to obtain a case-insensitive query execution environment?

Regards.

Tom Kyte
May 02, 2007 - 8:16 am UTC

umm, example.


ops$tkyte%ORA10GR2> variable x varchar2(25)
ops$tkyte%ORA10GR2> exec :x := 'hello';

PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select * from t where data = :x;

no rows selected
ops$tkyte%ORA10GR2> alter session set nls_comp=ansi;

Session altered.

ops$tkyte%ORA10GR2> alter session set nls_sort=binary_ci;

Session altered.

ops$tkyte%ORA10GR2> select * from t where data = :x;

DATA
--------------------
Hello
HeLlO
HELLO
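
for reference, a minimal setup this transcript appears to assume (table and column names are guesses):

create table t ( data varchar2(25) );
insert into t values ('Hello');
insert into t values ('HeLlO');
insert into t values ('HELLO');
commit;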

Problem in example...

Piyush Chechani, May 03, 2007 - 12:25 am UTC

I tried

SQL> alter session set nls_comp=ansi;

Session altered.

SQL> alter session set nls_sort=binary_ci;
ERROR:
ORA-12705: invalid or unknown NLS parameter value specified


SQL> select * from employee where ename like 'shefali';

no rows selected

What is the problem???
Tom Kyte
May 03, 2007 - 5:29 pm UTC

perhaps you have 9i, and this is a new 10g feature.

Case-insensitive query

Piyush Chechani, May 04, 2007 - 12:51 am UTC

Yes. I have 9i.

Then please tell me how I can obtain a case-insensitive query execution environment in Oracle 9i?
Tom Kyte
May 04, 2007 - 12:54 pm UTC

create index i on t(upper(column));

where upper(column) = :bv



or use a TEXT index and the contains clause - that is a case insensitive index as well.
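
using the employee/ename example from above, a sketch of the 9i approach (the index name is illustrative; depending on release and settings you may also need query rewrite enabled for the index to be used):

create index employee_up_ename_idx on employee( upper(ename) );

select * from employee where upper(ename) = upper(:bv);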



how to appear kurdish language in form& report

muthanna, May 29, 2007 - 5:38 am UTC

hi tom,
i have difficulties seeing my forms & reports in the kurdish language. the kurdish language is actually the same as arabic,
but with 5 different letters..
i work with oracle 8i, developer 2000.
how can i see my forms & reports with a kurdish font?

regards, muthanna


A reader, June 26, 2007 - 4:40 am UTC

Tom,

At our site we share our Oracle database (currently 9i) between 5 different applications. We are going to move the database to new hardware with the 10g software installed, by exp/imp of the data. The character set of our database is currently WE8ISO8859P15. I know that in a short time we will host one new application that needs UTF-8 or AL32UTF8.

My questions:

1. As I want to have one database instance for all 6 applications - can I simply create a new 10g database with AL32UTF8? Is there any "penalty" regarding performance etc. that my 5 "old" applications would pay for the new AL32UTF8 character set?

2. Should I perform any testing from the application/client side before moving the database to a new superset character set? We have a pretty complex infrastructure with a lot of application servers and clients that access the database.
Can I as a DBA say that it is fully "transparent" to the client what "new" character set the database has, as long as it is a superset of the old one? For me it is really a lot of work to get approval from all application owners that their applications work the same way with AL32UTF8 as they did with WE8ISO8859P15 (not to mention getting any professional response from them :-) ).

3. Is it good practice simply to create all new databases using the AL32UTF8 character set and not worry about what new applications we will host in the future, or are there other considerations here?


Tom Kyte
July 02, 2007 - 9:12 am UTC

1) benchmark, you will likely observe no penalty performance wise

2) OF COURSE. that goes without saying. Just moving from 9i to 10g or changing hardware without changing database versions would necessitate this. Every change should be tested.

3) if you know you have a need for this (as most of us do/will nowadays) yes, it would make sense.

nls parameters in the database_properties view

A reader, July 05, 2007 - 2:16 pm UTC

Tom,

After a HW failure I had to recreate one of our test databases by installing the software, creating a database and exp/imp-ing the data from the prod system. Unfortunately, I created the database with nls_language and nls_territory set to german_germany.

SQL> select property_name, property_value from database_properties;

NLS_LANGUAGE GERMAN
NLS_TERRITORY GERMANY
NLS_CURRENCY ?
NLS_ISO_CURRENCY GERMANY
NLS_NUMERIC_CHARACTERS ,.
NLS_CHARACTERSET WE8ISO8859P15
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD.MM.RR
NLS_DATE_LANGUAGE GERMAN
NLS_SORT GERMAN
NLS_TIME_FORMAT HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT DD.MM.RR HH24:MI:SSXFF
NLS_TIME_TZ_FORMAT HH24:MI:SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT DD.MM.RR HH24:MI:SSXFF TZR
NLS_DUAL_CURRENCY ?
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 10.2.0.1.0

After "going live" with new test database users started to get a ORA-01722 "invalid number" error on their client. As there were no changes made on client side, I thought that is probably because of diffenet value for NLS_NUMERIC_CHARACTERS of the database (they swear nothing was changed on the client side). So I changed NLS_LANGUAGE in the Windows Registry on the database server and then changed values for nls_language and nls_territory in the spfile and restarted the database.

Now I have:

SQL> show parameter nls

nls_calendar string
nls_comp string
nls_currency string
nls_date_format string
nls_date_language string
nls_dual_currency string
nls_iso_currency string
nls_language string AMERICAN
nls_length_semantics string BYTE
nls_nchar_conv_excp string FALSE
nls_numeric_characters string
nls_sort string
nls_territory string AMERICA
nls_time_format string
nls_time_tz_format string
nls_timestamp_format string
nls_timestamp_tz_format string

How can my database have different values for nls_territory and nls_language at the same time, depending on whether I query v$nls_parameters (or do show parameter) or query database_properties? My understanding is that both are database properties and both should have the same values.
What does the Oracle documentation mean when it says that database_properties describes "permanent" database properties?

My second question is:

Does it make any difference to my windows clients what default settings for nls_language and nls_territory the database has? I thought that, as the clients have their own setting in the Windows registry for NLS_LANG, the server settings should make no difference to them.

and to the end:

can I get an ORA-01722 "invalid number" on the client side by changing nls_language/territory on the database server?

Thanks in advance
Tom Kyte
July 05, 2007 - 3:06 pm UTC

you made the first mistake by thinking export DMP files were proper backups.

suggestion: start over, use the right nls settings, get back to where you were at least, and then sit down and read, understand and IMPLEMENT the backup and recovery guide so you never make all of these mistakes again.


yes, the nls_numerics - if set wrong - will cause that. If the client uses a "." to separate decimals but the database expects "," - or vice versa - that'll happen.

ops$tkyte%ORA10GR2> select to_number( '1,2' ) from dual;
select to_number( '1,2' ) from dual
                  *
ERROR at line 1:
ORA-01722: invalid number


ops$tkyte%ORA10GR2> select to_number( '1.2' ) from dual;

TO_NUMBER('1.2')
----------------
             1.2

A reader, July 05, 2007 - 4:02 pm UTC

Tom,

as it was a test system, we did not have any backup at all, meaning that in case of HW failure we recreate our test system from prod. We do this very often by dropping and recreating a test SCHEMA. This time, yes, it was my failure that I created the database choosing German as the default language. But my understanding was that it does not matter what default language the database has (this understanding is from reading your site), as long as the client has the right settings.

I am afraid I don't really understand your words:

".. If the client uses a "." to separate decimals but the database expects "," .....", what does "database expects" mean?
I have client settings for nls_language/territory and the application code resides on the client... so the client "parses" the input I type in the sql using its own nls settings, not the server's.
Tom Kyte
July 05, 2007 - 4:29 pm UTC

german nls_numerics use ',' for decimals and '.' for thousands.

other countries DO NOT, they use '.' for decimals and ',' for thousands.


so, did your client set the nls_numerics?
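
if not, a sketch of the session-level fix - pin the separators explicitly, independent of territory:

alter session set nls_numeric_characters = '.,';

select to_number( '1.2' ) from dual;   -- '.' is now the decimal separator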

ansi and utf8

Didier, August 08, 2007 - 8:33 pm UTC

Hi Tom,

i'm working in a 9.2.0.6 db and our nls parameters are:

PARAMETER VALUE
------------------------- ------------------------------
NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_CHARACTERSET UTF8
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

We store english as well as french data, we are running Oracle Applications 11.5.9, our db server os is linux.

When I'm using the UTL_FILE package to extract data from a regular sqlplus session, whether I am on my laptop or on the db server, I'm not getting consistent results as far as the character set of the files I'm getting.
It looks like if I'm extracting only english characters, the files will be in ANSI, but if I'm extracting english and french characters, then the files are in UTF8.
I have to admit that I never worked in a multi-language environment and I'm not sure if this behavior is normal. Personally, I was expecting all files to be in UTF8, since my db is set up this way.
Could you, without making fun of my ignorance, shed some light on my problem?

Thank you very much in advance,
Didier
Tom Kyte
August 14, 2007 - 10:36 am UTC

huh?
...
Could you, without making fun of my ignorance,
......

why would you say something like that?

what are your NLS settings on the client when you do this, are they different.

A reader, August 23, 2007 - 5:19 pm UTC

Tom,

:-) I'm saying this because just by reading my question I realize that I only understand half of what I am talking about. Nothing against you.

Anyway, on my laptop, my registry shows:
NLS_LANG: AMERICAN_AMERICA.WE8MSWIN1252

On the linux server, the env parameters show:
NLS_SORT=binary
NLS_LANG=American_America.UTF8
CFLAGS=$(INCLUDE_FLAGS) -Dlinux -DLINUX -DNLS_ASIA -D_GNU_SOURCE
NLS_DATE_LANGUAGE=
NLS_DATE_FORMAT=DD-MON-RR
NLS_NUMERIC_CHARACTERS=.,

Whether I try to write a file from my pc or from the server, I have the same issue with characterset of the files created.

Thanks
Didier
Tom Kyte
August 24, 2007 - 1:57 pm UTC

I think your character sets are different at different times, I'd need a way to reproduce - can you from a single session create a file that is not utf8 and then one that is by accessing just different data?

A reader, August 28, 2007 - 10:05 am UTC

Tom,

Yes, basically that's what's happening.

I have a sql script with a query that retrieves some data and writes it to a file. I run it twice from the same session, with a different where clause to retrieve data from a different language, and the file containing english is ANSI, while the file containing french is UTF8.
What I describe might be completely normal, but what I don't understand is what triggers the change of character set.

Thanks
Didier
Tom Kyte
September 04, 2007 - 1:44 pm UTC

this would not be normal - the UTF marker would be written to the file BEFORE any data is written there (it is a byte-order mark at the beginning of the file), it doesn't change mid course


I'd need a way to reproduce this.

Alberto Dell'Era, September 04, 2007 - 4:57 pm UTC

But Metalink note 227531.1 says that "UTL_FILE does not take any character set conversion into account when writing or reading data" (and Didier is using UTL_FILE if I'm not mistaken).

Maybe the English data is stored in VARCHAR2 columns (so in UTF8 given the NLS settings), and the French data in NVARCHAR2 (hence in AL16UTF16) - maybe AL16UTF16 "looks like" the "ANSI" charset (which I'm not really sure what it is) ?

Tom Kyte
September 05, 2007 - 1:55 pm UTC

this is why I need a way to reproduce - to avoid the "maybes"


A reader, September 13, 2007 - 4:24 pm UTC

Hello,
sorry i didn't answer before, I was on vacation for a week and I just came back.
When I execute the program twice, changing the where clause, I create a new file each time, I don't append to the existing one.
However, I tried something different, and the issue seems to go beyond the db.
I logged in to our linux server and vi-ed a file. I entered some english characters and saved: the file is ANSI. Reopened the file and pasted french characters: the file switched to UTF8... so it looks like the linux os triggers the switch. I'm going to try to work with OOD people to understand what's happening on the OS, but this is really weird.

Thanks
Didier

Struggling with exporting data out of a Unicode database

Anthony, November 20, 2007 - 1:39 pm UTC

-----
Background information
-----

Server: Sun Solaris 5.10; 10g
Client: Windows 2000; 10g, TOAD, Oracle ODBC 10.2.0.1

select * from v$NLS_PARAMETERS
NLS_LANGUAGE                  AMERICAN                      
NLS_TERRITORY                 AMERICA                       
NLS_CURRENCY                  $                             
NLS_ISO_CURRENCY              AMERICA                       
NLS_NUMERIC_CHARACTERS        .,                            
NLS_CALENDAR                  GREGORIAN                     
NLS_DATE_FORMAT               DD-MON-RR                     
NLS_DATE_LANGUAGE             AMERICAN                      
NLS_CHARACTERSET              <b>AL32UTF8</b>                      
NLS_SORT                      BINARY                        
NLS_TIME_FORMAT               HH.MI.SSXFF AM                
NLS_TIMESTAMP_FORMAT          DD-MON-RR HH.MI.SSXFF AM      
NLS_TIME_TZ_FORMAT            HH.MI.SSXFF AM TZR            
NLS_TIMESTAMP_TZ_FORMAT       DD-MON-RR HH.MI.SSXFF AM TZR  
NLS_DUAL_CURRENCY             $                             
NLS_NCHAR_CHARACTERSET        AL16UTF16                     
NLS_COMP                      BINARY                        
NLS_LENGTH_SEMANTICS          BYTE                          
NLS_NCHAR_CONV_EXCP           FALSE

We import SAS data (Windows Latin character set) into Oracle, use OWB for ETL, export the results to SAS. Per regulatory requirements, character columns cannot exceed 200 in length.

-----
Problem scenario
-----

Data that cause the trouble (200 characters, with a degree sign at the 77th position):

XXX PT PRIOR TO MSFC NOT TO USE WALKER, PT STATED SHE NEEDED IT LAST VISIT 2° BAD HEADACHE DECREASED BALANCE, WHICH WAS LATER FOUND TO BE SINUS INFECTION. ASKED PT NOT TO USE WALKER THIS TIME, PT SAID

The degree sign is U+00B0 in Unicode, or 0xB0 (176) in Latin-1 extended ASCII. Though, I found that select ascii('°') from dual returns 49840 (that is, the two UTF-8 bytes 0xC2 0xB0).

In order to accommodate the import, Source.COMMENTX is VARCHAR2(201). Using OWB, we are mapping this to Target.COVAL which is VARCHAR2(200).

To get around ORA-12899: value too large for column, we use the expression convert(Source.COMMENTX, 'WE8ISO8859P1', 'AL32UTF8').

Although viewing Target.COVAL shows a ¿ (true in TOAD, SQL*Plus), dump(COVAL) confirms the 77th character is 176:
DUMP(COVAL)                                                                  
---------------------------------------
Typ=1 Len=200: [...],32,50,176,32,[...]

-----
Desirable outcome
-----

Store and display the text in a VARCHAR2(200) column without compromising the high-bit ASCII characters, e.g., degree sign, micro sign (i.e., Greek character mu), copyright sign, etc.

-----
Questions
-----

1. Is it a wrong assumption that AL32UTF8 supports the high-bit ASCII characters (i.e., characters between 128 and 255)? If not, why do the clients display the inverted question mark instead of the degree sign when executing select chr(176) from dual?

2. The aforementioned DUMP statement seems to confirm that 0xB0 (i.e., not 0xC2 0xB0, or 0xBF) is being stored in the database at the 77th position. Why do my applications via ODBC interpret and replace it as 0xBF, which is the inverted question mark?

-----
Avenues attempted without the desirable outcome
-----

1. Changing Target.COVAL from VARCHAR2(200) to NVARCHAR2(200) or VARCHAR2(200 CHAR) would make SAS (data access through ODBC) think the length is 400 or 800, respectively [Note: The vendor claims it is ODBC 3.0 compliant]

2. Through Microsoft's ODBC Test software, this is the output for describe column all against select COLVAL from Target:
icol, szColName, *pcbColName, *pfSqlType, *pcbColDef, *pibScale, *pfNullable 
1, COMMENTX, 8, SQL_WVARCHAR=-9, 200, 0, SQL_NULLABLE=1

Tom Kyte
November 21, 2007 - 11:11 am UTC

1) al32utf8 is an ISO standard, I don't know the intimate details of precisely how each character is stored - however I definitely know that in a multi-byte encoding scheme NOT ALL of the high bit ascii characters can be stored in a single byte - because there has to be some "bits" that one could look at and say "this is a multi-byte character" (eg: if all 255 fit into a single byte, it would be hard to tell what started a multi-byte string)

You have to assume that ANY character could take 1 OR MORE bytes in a multi-byte scheme - period.

2) it is all about character sets and what character set the client has set and what character set the data was stored in. Mixing character sets is nasty business for this reason - a fetch of data and an update of the data (without touching it on the client - just retrieve it and send it back) can and will cause the data to change. These are not isomorphic functions - especially when you go multi-byte to single byte.
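
one way to store such characters unambiguously is by Unicode code point rather than raw byte - a sketch; the output shown assumes an AL32UTF8 database:

select dump( to_char( unistr('\00B0') ), 1016 ) from dual;

-- roughly: Typ=1 Len=2 CharacterSet=AL32UTF8: c2,b0   -- the degree sign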

Why ?

Emma, November 28, 2007 - 11:33 am UTC

Hello Tom,
On Oracle 10g, changing from WE8MSWIN1252 to UTF8 gives the following error.

Isn't UTF8 a superset of WE8MSWIN1252?

SQL> ALTER DATABASE CHARACTER SET utf8;
ALTER DATABASE CHARACTER SET utf8
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set

EXP-00056: ORACLE error 12705 encountered

Krishna, April 13, 2008 - 3:43 am UTC

Hi Tom,

while doing an export of tables, the below-mentioned errors arose.

EXP-00056: ORACLE error 12705 encountered

ORA-12705: invalid or unknown NLS parameter value specified

Username:

EXP-00030: Unexpected End-Of-File encountered while reading input

EXP-00000: Export terminated unsuccessfully


please suggest the proper steps to overcome these errors.

Regards
Krishna

Tom Kyte
April 13, 2008 - 8:48 am UTC

$ oerr ora 12705
12705, 00000, "Cannot access NLS data files or invalid environment specified"
// *Cause: Either an attempt was made to issue an ALTER SESSION command
//         with an invalid NLS parameter or value; <b>or the environment
//         variable(s) NLS_LANG, ORA_NLSxx, or ORACLE_HOME was incorrectly
//         specified, therefore the NLS data files cannot be located.</b>
// *Action: Check the syntax of the ALTER SESSION command and the NLS
//          parameter, correct the syntax and retry the statement, or
//          specify the correct directory path/values in the environment
//          variables.
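
so the first things to check are the environment variables named above - a sketch (the NLS_LANG value is just an example, substitute your own language_territory.charset):

$ echo $NLS_LANG
$ echo $ORACLE_HOME
$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
$ exp scott/tiger file=emp.dmp tables=emp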

A reader, April 13, 2008 - 9:35 am UTC

Hi Tom,

i appreciate your quick response...

please explain in brief what needs to be done....

Thanks
Krishna
Tom Kyte
April 16, 2008 - 1:43 pm UTC

see the above bit in bold, check it out. It tells you what is wrong.

NLS_LANG

Krishna, April 15, 2008 - 9:19 am UTC

Hi Tom

i am facing a problem while exporting.
the first time i run my export utility, it fails with the below-mentioned error, and when i rerun the export it runs successfully.

EXP-00094: Could not convert to server national character set's handle
EXP-00000: Export terminated unsuccessfully

please suggest why it fails at the first attempt... it is very important for me... i need to implement it in production....

Regards
Krishna
Tom Kyte
April 16, 2008 - 2:59 pm UTC

please utilize support, I cannot imagine a circumstance whereby running exp twice would cause it to work.

unless you were running exp via a script that just happened to do something like:

...
exp ..... <<=== fails

export NLS_LANG=....
export ORACLE_HOME=.....


and the second time around it was running in an OK environment.

Asheesh Dhupper, April 17, 2008 - 2:44 am UTC

Tom:

what is the difference between nls_database_parameters and
V$nls_parameters?

How do you change any parameter in a client session?

Thanks,

Tom I am not clear with the links u sent. Please explain.
Our DBA says we need to create a new Database if we want to change the characterset. Please help.
Tom Kyte
April 17, 2008 - 7:38 am UTC

documentation says:

"NLS_DATABASE_PARAMETERS lists permanent NLS parameters of the database."

"V$NLS_PARAMETERS This view contains current values of NLS parameters."

so, for example, my database is usually america_american.WE8ISO8859P1, if I change my environments NLS_LANG:

ops$tkyte%ORA10GR2> !echo $NLS_LANG
American_Australia

ops$tkyte%ORA10GR2> set echo on
ops$tkyte%ORA10GR2> @test
ops$tkyte%ORA10GR2> select coalesce( a.parameter, b.parameter ) parm,
  2         a.value perm_value,
  3         b.value curr_value
  4    from nls_database_parameters a full outer join v$nls_parameters b on (a.parameter=b.parameter)
  5   where decode(a.value,b.value,0,1) = 1
  6  /

PARM                      PERM_VALUE                     CURR_VALUE
------------------------- ------------------------------ ------------------------------
NLS_TERRITORY             AMERICA                        AUSTRALIA
NLS_ISO_CURRENCY          AMERICA                        AUSTRALIA
NLS_DATE_FORMAT           DD-MON-RR                      DD/MON/RR
NLS_TIME_FORMAT           HH.MI.SSXFF AM                 HH12:MI:SSXFF AM
NLS_TIMESTAMP_FORMAT      DD-MON-RR HH.MI.SSXFF AM       DD/MON/RR HH12:MI:SSXFF AM
NLS_TIME_TZ_FORMAT        HH.MI.SSXFF AM TZR             HH12:MI:SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT   DD-MON-RR HH.MI.SSXFF AM TZR   DD/MON/RR HH12:MI:SSXFF AM TZR
NLS_RDBMS_VERSION         10.2.0.2.0

8 rows selected.


you can see what happens - nls_database_parameters are the "permanent defaults", v$ is current - what is in place for me right now


How do you change a setting?

Well, via the environment/registry - as demonstrated above is one way.

Via alter session is another way.
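
for example (a sketch - any session-settable NLS parameter works the same way):

alter session set nls_date_format = 'YYYY-MM-DD HH24:MI:SS';
alter session set nls_territory = 'AUSTRALIA';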

"U" sent no links, I've banned "U" from logging onto the system ever. They are not allowed here.


The DBA should read the globalization guide (all are available on the site otn.oracle.com) for the release of the database you are using, there the DBA can learn how to CHANGE the character set of the current database if possible. It is all documented.

A reader, April 29, 2008 - 4:01 pm UTC

create table sym
(id number
,val varchar2(20)
);

insert into sym
values
(1,'ABC/PR');

insert into sym
values
(2,'DEF/PR');

insert into sym
values
(3,'PQR-W');

insert into sym
values
(4,'MN-W');

insert into sym
values
(5,'AB/U');

insert into sym
values
(6,'DEF/U');

commit;

select * from sym;

ID VAL
-- --------
1 ABC/PR
2 DEF/PR
3 PQR-W
4 MN-W
5 AB/U
6 DEF/U

I am creating a view using this table. In the view, I have to encode the following rules:

If there is /PR in the value field, I need to replace it with .PR
If there is -W in the value field, I need to replace it with #
If there is /U in the value field, I need to replace it with =


The result of the view should be
ID VAL
-- --------
1 ABC.PR
2 DEF.PR
3 PQR#
4 MN#
5 AB=
6 DEF=

Can you please let me know how to make this view?
Tom Kyte
April 30, 2008 - 9:06 am UTC

ops$tkyte%ORA10GR2> select id, val, replace( replace( replace( val, '/PR', '.PR'), '-W', '#' ), '/U', '=') newval from sym;

        ID VAL                  NEWVAL
---------- -------------------- ----------
         1 ABC/PR               ABC.PR
         2 DEF/PR               DEF.PR
         3 PQR-W                PQR#
         4 MN-W                 MN#
         5 AB/U                 AB=
         6 DEF/U                DEF=

6 rows selected.
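
wrapped in a view, that would look something like this (the view name is illustrative):

create or replace view sym_v
as
select id,
       replace( replace( replace( val, '/PR', '.PR'), '-W', '#' ), '/U', '=') val
  from sym;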

character sets chaos

Michal Pravda, May 14, 2008 - 10:23 am UTC

Hello Tom,

I've got a 10gR2 db with utf-8. I have a client on win2000 with the home0 NLS_LANG set to CZECH_CZECH REPUBLIC.EE8MSWIN1250.

I've got an insert statement in a script file (a lot of statements actually, but solve one, solve all) which inserts text containing non-us7ascii characters. This file has been stored in utf8.

I need to run that file, that is, insert the data into the database. As is, it stores some of the characters wrongly in the database when run via @@script_file from sqlplus (due to implicit character set conversion at the client/server boundary, I believe).

I can overcome the problem by setting my client to utf8 or converting the file to win1250 (EE8MSWIN1250 in Oracle terms).

However, that is not the solution I wish for. I can do that only in my development environment. I cannot do it on test or production because I have no write privileges there and no knowledge of their settings. I can of course write in the installation metafile something like 'set your client to this charset...'. But that is very error-prone, I think. I would like to be able to set that information in my script_file itself - something like
alter session set char_set=EE8MSWIN1250. But I have found neither a way to do it (either in the doc or on Google) nor an explicit statement that it is impossible.

Could you point out one way or the other, please?
Tom Kyte
May 14, 2008 - 3:41 pm UTC

not sure what to say here - if the client characterset is different from the database - characterset conversion MUST take place.

if the file is a utf encoded file - the client should be using - well, that characterset?

clarification

Michal Pravda, May 16, 2008 - 2:12 am UTC

Thank you for your time.

I wanted to give the entire context in the previous post. Maybe it was misleading. So, a simpler and hopefully clearer question:
Is it possible to change the client's character set from within a script running on that client (for one session)? Or must I do it BEFORE I connect to the database, by setting the NLS_LANG OS variable?
Tom Kyte
May 19, 2008 - 2:35 pm UTC

correct, the nls character set is specified beforehand. the client uses that character set; it should be set before launching the program.

getting a client's charset

Michal Pravda, October 02, 2008 - 5:53 am UTC

Good morning Tom,

DBA mistakes have cost me about 4 days of work in a few months. Every time the scenario has been the same: I wrote a script in win1250, told them to switch their nls_lang to the appropriate charset prior to installation, and they failed to do it. And I had to clean up the mess. I don't want to do it anymore. So:

The goal: stop processing a sqlplus installation script when the one who runs it has the wrong charset set. "Wrong" defined as anything other than the charset passed as a parameter (e.g. I put into the script something like "check_client_char_set('EE8MSWIN1250');" and it won't continue unless that charset is in effect).

The problem: I can't find a way to determine (within a SQL*Plus script) which charset the client is using.

Thank you for your help
Tom Kyte
October 02, 2008 - 7:59 am UTC

in 11g, this is in v$session_connect_info - but not before.


prior to that - since you know at least one thing that causes a mess (one conversion that you cannot deal with) you could - using a one row, one column table - test the data and verify that it is 'correct'.

for example: insert something, use dump() to verify that the bits and bytes are correct, if not - fail and exit.
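
a sketch of that guard for the top of an install script - the expected dump string assumes an AL32UTF8/UTF8 database, where 'š' should arrive as the two bytes 197,161 (adjust the expected value for your database characterset):

whenever sqlerror exit failure
declare
  l_dump varchar2(100);
begin
  select dump('š') into l_dump from dual;
  if l_dump <> 'Typ=96 Len=2: 197,161' then
    raise_application_error( -20001,
      'client NLS_LANG characterset looks wrong - fix it and rerun' );
  end if;
end;
/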

validating character set

Michal Pravda, October 03, 2008 - 3:35 am UTC

Thanks for the answer. I gave up trying to find the information and came up with a similar solution after a few hours of testing yesterday. I have created a package with a constant that contains all chars with diacritics. In the following install scripts I'll call a procedure which compares its parameter with the stored constant. If some unexpected conversion takes place, the comparison fails and the procedure throws an exception.

...

Michal Pravda, October 03, 2008 - 3:40 am UTC

I forgot to add that I would also have to check that the validation package itself has been installed correctly. But that's easy.

Need help

Ashish Mrig, November 12, 2008 - 11:32 am UTC

Hi Tom,
It's been most helpful reading your columns. However, we are facing a small issue about which I need your urgent help:

On Oracle 10g, changing from WE8ISO8859P1 to UTF8 gives the following error:

SQL> ALTER DATABASE CHARACTER SET utf8;
ALTER DATABASE CHARACTER SET utf8
*
ERROR at line 1:
ORA-12712: new character set must be a superset of old character set

I thought UTF8 was a superset of WE8ISO8859P1, but apparently not. So:
a. Do you know which UTF8-compliant character set we can use?
b. What is the difference between UTF8, AL16UTF16 and AL32UTF8?

Thanks in advance for your help.

Tom Kyte
November 13, 2008 - 4:17 pm UTC

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm

you'll want to become familiar with that document...

NLS_SORT

Suvendu, December 30, 2008 - 5:23 am UTC

Hi Tom,

I got stuck at a sort operation. I tried my options but could not find the solution. Could you please look into this?

Data before sorting:

d1
d1.1
d1.1.1
d1.1.2
d1.2
d1.10
d1.1.9
d1.1.10
d10


Data need to be display after sorting :
---------------------------------------
d1
d1.1
d1.1.1
d1.1.2
d1.1.9
d1.1.10
d1.2
d1.10
d10


It's in Oracle 10.2.

Thanks,
Suvendu
Tom Kyte
January 05, 2009 - 9:05 am UTC

ops$tkyte%ORA10GR2> select x,
  2         substr( data, instr( data, '.', 1, 1 )+1, instr( data, '.',1, 2) - instr( data, '.', 1, 1 ) -1 ) a,
  3         to_number(substr( data, instr( data, '.', 1, 2 )+1, instr( data, '.',1, 3) - instr( data, '.', 1, 2 ) -1 )) b,
  4         to_number(substr( data, instr( data, '.', 1, 3 )+1, instr( data, '.',1, 4) - instr( data, '.', 1, 3 ) -1 )) c,
  5         to_number(substr( data, instr( data, '.', 1, 4 )+1, instr( data, '.',1, 5) - instr( data, '.', 1, 4 ) -1 )) d
  6    from (
  7  select x, '.'||x||'.' data
  8    from t
  9         )
 10   order by
 11         substr( data, instr( data, '.', 1, 1 )+1, instr( data, '.',1, 2) - instr( data, '.', 1, 1 ) -1 ),
 12         to_number(substr( data, instr( data, '.', 1, 2 )+1, instr( data, '.',1, 3) - instr( data, '.', 1, 2 ) -1 )) nulls first,
 13         to_number(substr( data, instr( data, '.', 1, 3 )+1, instr( data, '.',1, 4) - instr( data, '.', 1, 3 ) -1 )) nulls first,
 14         to_number(substr( data, instr( data, '.', 1, 4 )+1, instr( data, '.',1, 5) - instr( data, '.', 1, 4 ) -1 )) nulls first
 15  /

X               A                          B          C          D
--------------- ----------------- ---------- ---------- ----------
d1              d1
d1.1            d1                         1
d1.1.1          d1                         1          1
d1.1.2          d1                         1          2
d1.1.9          d1                         1          9
d1.1.10         d1                         1         10
d1.2            d1                         2
d1.10           d1                        10
d10             d10

9 rows selected.


of course, you don't need to select the columns - just order by them. This example assumes four levels max, you can see how to add more if you can go "deeper"

NLS_LANG and encoding question

Eduard, April 30, 2009 - 7:33 am UTC

Let me explain an abstract scenario. I'm trying to figure out where NLS_LANG is used and how it works.

In my local machine I have an APPLICATIONA (a winform, a web application or plsql developer, for example).

APPLICATIONA uses the oracle client ORACLIENTB (the Net8 client installed on my local machine) in order to connect to the database server via TCP.

ORACLIENTB creates a TCP channel that connects THREADC (running on my local machine) and THREADD (running on the DB server). (THREADD represents the server thread that consumes PGA memory in a dedicated server.)

(Maybe it is not the best way to express what is going on, but I find it difficult to describe what we mean when we talk about client programs or server processes. In some sense my WinForm application is an oracle client too.)

Between THREADC and THREADD we are passing the sql sentence in plain text. At this point THREADC and THREADD should be "encoding synchronized" in order to properly communicate "text".

Before that text exchange, the text sentence has been created in my APPLICATIONA. This text could be stored in memory in UTF16-BE. Then APPLICATIONA sends the sentence to ORACLIENTB.

The first question: Is ORACLIENTB able to receive text in UTF16-BE directly or it is only able to receive ANSI encoded strings?

The second question: Is the parameter NLS_LANG in my local machine used to convert the encoded string received by THREADC to the one that THREADD could properly understand?

Hope it makes sense, and sorry for my English.

Thanks in advance,
Eduard.
Tom Kyte
April 30, 2009 - 10:41 am UTC

you should read this book from cover to cover then:

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/toc.htm



Your client has an NLS_LANG setting, this specifies the character set of the client

Your database has a character set, this specifies how data is to be stored and processed in the server.


When the client connects to the database, our networking code ascertains the character set of the database. There are two cases to consider

a) client character set = database character set

In this case, data is simply transferred from the client to the server - the client is expected to send well formed data in the character set they profess to be using. We simply store the bits and bytes they send us. For example, suppose the database is US7ASCII and the client professes to be US7ASCII as well. The client is expected to transmit character data using the ANSI/ISO ASCII standard encoding scheme (7 bit data only). No checks are performed to verify the client is transmitting 7 bit data only, this is assumed - since the client has stated "I am a 7 bit client".


b) client character set <> database character set

In this case, the client knows both character sets and the client knows the conversion rules from character set A to character set B. The data will have these conversion rules applied to it and will then be stored in the database. For example, suppose the database is US7ASCII and the client is WE8ISO8859P1 (Western European 8 bit data using an extended ASCII character set). The client data will be converted to 7 bit ASCII data before being stored in the database.


http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#sthref133

summarizes this as well.

NLS_LANG and encoding question

Eduard, April 30, 2009 - 11:41 am UTC

Thanks for the quick response. I started reading the globalization documentation but I got a fundamental doubt.

When the documentation says "character set on the client operating system", what does that mean on the Windows platform?

I mean, it's not clear to me that such a "global character set" concept exists in Windows.

Internally, "text data in memory" is usually stored in UTF16LE (at least under the .Net platform), and any "text file" in the filesystem could be stored in any encoding (unicode based or ansi-codepage based). Additionally, if a windows application sends "text data", it specifies in which encoding the data will be transmitted. But it is not an OS parameter. Each application chooses which encoding to use (unicode based or ansi-codepage based).

I read a long time ago (I don't remember where) that NLS_LANG should be aligned with the value stored under "Language for non-Unicode programs".

Is "Language for non-Unicode programs" what the documentation means by "character set on the client operating system"?

If the latter is true, it means that NLS_LANG on a windows client machine would never be set to UTF8 or UTF16, because "Language for non-Unicode programs" (roughly speaking) means the ANSI codepage encoding the OS will use to transform "unicode text in the OS" into "ansi text in the program" when the old standard APIs are used. In fact this setting is there for compatibility reasons, for ANSI-based programs from the Windows 95 era, where the OS didn't support unicode internally.

To be honest, I'm not sure at all if what I said is true. It is my current mental view of the situation. If something I have said is totally wrong, please correct me so I can change my mind.

Thanks in advance,
Eduard


Tom Kyte
April 30, 2009 - 12:00 pm UTC

Windows is just an OS, not special - not any more so than unix, linux or OS/390.

Your client application uses a character set - it can be anything you want. It is valid to say "I am us7ascii" on windows, or western european or whatever you want.

... But it is not an OS parameter. Each application chooses which
encoding to use (unicode based or ansi-codepage based). ...

...


correct - when the documentation says "character set on the client operating system" that means the character set being used - by the application.


And the client has control over what character set they use, they support.

NLS_LANG and encoding question

Eduard, April 30, 2009 - 12:25 pm UTC

Thank you very much!

For windows developers probably the next extracts from the globalization documentation are useful.

"ADO is a high-level API to access database with the OLE DB and ODBC drivers. Most
database application developers use the ADO interface on Windows because it is
easily accessible from Visual Basic, the primary scripting language for Active Server
Pages (ASP) for the Internet Information Server (IIS). To OLE DB and ODBC drivers,
ADO is simply an OLE DB consumer or ODBC application. ADO assumes that OLE
DB and ODBC drivers are Unicode-aware components; hence, it always attempts to
manipulate Unicode data."

"OCI Unicode binding and defining features are used by the ODBC and OLE DB
drivers to handle Unicode data. OCI Unicode data binding and defining features are
independent from NLS_LANG. This means Unicode data is handled properly,
irrespective of the NLS_LANG setting on the platform."

So, when creating a windows client using ADO.NET, it doesn't matter which value you set for NLS_LANG?

Thanks again!
Tom Kyte
April 30, 2009 - 1:21 pm UTC

well, no, not really.

If you have a us7ascii database and feed something other than us7ascii in there...

You still have characterset considerations.

zhs16gbk and utf8

jian huang zheng, April 30, 2009 - 1:44 pm UTC

Hi Tom

suppose the chinese windows client uses a client characterset of zhs16gbk and the database characterset is utf8-like.
According to you, there is a conversion since the two charactersets don't match, right? Could there be any lossy conversions between them?
Since I can't set a utf client characterset, ...
Thanks!
Tom Kyte
April 30, 2009 - 2:06 pm UTC

but you do control the client characterset?

if you are client server, you are installing code and stuff on their machine - you control it.

if you are web based, the client is the application server, and you own it - you control it again.



And utf-8 is a unicode character set
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch6unicode.htm#g1014017

(UTF-8 wouldn't be the best choice, read through that chapter). The goal of unicode is:

... The Unicode character set includes characters of most written languages around the world, ...

it has a representation for all of them, so it can store all of them without data conversion/loss.


About Oracle Globalization Support Guide

Eduard, May 01, 2009 - 5:16 am UTC

I found quite unfortunate the next paragraph from the Oracle10 Globalization guide:

"The NLS_LANG character set should reflect the setting of the operating system character set of the client. For example, if the database character set is AL32UTF8 and the client is running on a Windows operating system, then you should not set AL32UTF8 as the client character set in the NLS_LANG parameter because there are no UTF-8 WIN32 clients. Instead, the NLS_LANG setting should reflect the code page of the client. For example, on an English Windows client, the code page is 1252. An appropriate setting for NLS_LANG is AMERICAN_AMERICA.WE8MSWIN1252."

Comments:

(1) It is quite confusing to talk about the "operating system character set". Instead it should just say "character set".

(2) "there are no UTF-8 WIN32 clients"??? What does that mean? Almost all WIN32 APIs that use "characters" have two flavors: A-functions and W-functions - W for Wide (UTF16 encoding) and A for ANSI (8-bit ansi codepage encoding). So it is the WIN32 client (or server, BTW) that chooses which functions it uses (the A ones or the W ones). One could argue that a WIN32 client that uses W functions is in reality a UTF-16 client... but again, very unfortunate.

(3) "On an English Windows client, the code page is 1252"??? By default on an English windows client the setting "Language for non-Unicode programs" (which only applies to ANSI-based programs) is set to ANSI codepage 1252. But this setting can be changed. And a windows client could use the W APIs, bypassing this setting entirely.

Reading the whole paragraph again, it seems that when it talks about "windows clients" it only refers to "ANSI-based windows clients".

Currently, under the windows platform, almost all new programs are "Unicode based". The support windows provides to ANSI-based programs exists only for compatibility reasons.

I think it is a bad idea for current documentation to imply windows clients are ANSI based.

Later on, in chapter 7, "Programming with Unicode", the documentation is much better, in my personal opinion.

Maybe I'm overreacting to some unfortunate sentences in a 406-page document, but all this encoding stuff can become very confusing if the main concepts are not well understood. One needs to know how the OS works and how the programs are written (clients and servers).

Thanks,
Eduard

Globalisation

Abhijeet, June 24, 2009 - 11:08 am UTC

Greetings Tom,
Need some expert advice, hence pinging you.
Scenario:
We have completed almost 60% of the development of an application required by a specific client. There is no multilingual requirement, but it can be deployed in one language at a time, i.e., the entire application should support French, and for the next client's separate installation the entire application should support German.

We have extensively used NVARCHAR2 datatypes everywhere in the application. But considering the above scenario, we could easily manage with varchar2 data types just by choosing the appropriate DB character set at the time of installation.

Please advise: should we change the datatypes to simple varchar2 in our application DB, or just live with the nvarchar2 data type? Please ignore the impact of the changes on the application and advise from the technical front.

Many Thanks
Abhijeet
Tom Kyte
June 26, 2009 - 9:56 am UTC

can you describe the thought process behind your decision to "extensively use" this datatype? what was your motivation / logic behind that?


If you only need to support a single language and get to pick the characterset, it would never enter my mind to use it.


Abhijeet R, July 14, 2009 - 1:33 am UTC

The understanding at the start of the project was that we would develop a globalised application, but as things were clarified we understood it will be a single foreign language at any point in time. So what's your take on this? Please advise considering the scenario posted above.

Many Thanks
Abhijeet

Tom Kyte
July 15, 2009 - 11:04 am UTC

it would be up to you entirely. Do you want to go back to using varchar2? would it make your life better? easier?

Your nvarchar2 data will be stored in multibyte fields - even if you use only single byte data - the processing overhead is nominal (single byte data can be processed marginally faster) - but if you have a large sunk cost already, unless you think there is something that will be incredibly 'good' resulting from a mass conversion (your existing databases will need a massive reorganization to do this)....
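To see the storage difference concretely - a minimal sketch, assuming the default AL16UTF16 national character set (the table and column names are invented for the demo):

create table nv_demo ( v varchar2(10), nv nvarchar2(10) );

insert into nv_demo values ( 'abc', 'abc' );

-- byte cost of the same single-byte data in each column: with an
-- AL16UTF16 national character set, LENGTHB(v) is 3 while
-- LENGTHB(nv) is 6 - two bytes per character, even for plain ASCII
select lengthb(v), lengthb(nv) from nv_demo;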

Abhijeet, August 11, 2009 - 3:22 am UTC

Thanks Tom. We have implemented this change and switched to VARCHAR2.

unicode

A reader, December 11, 2009 - 9:44 pm UTC

Tom:

Are most oracle databases installed with unicode settings now?

We are migrating from 9i to 11g in the next few weeks. We are installing 11g, then exporting the data from 9i and importing it into 11g.

Would it be better to make the 11g database Unicode instead of WE8ISO8859P1? We do not store multibyte data, but we do a lot of XML generation from relational data.

Would this require much more testing too?

please advise.
Tom Kyte
December 14, 2009 - 8:09 am UTC

it depends on the person/site/company/needs of people installing/requirements.


character set

A reader, December 14, 2009 - 10:26 pm UTC

Tom:

I told you the needs:

1) no need to store foreign characters
2) need to output XML data preferably in UTF8 format.
3) need to store special symbols and characters sometimes.

So would you go Unicode with 11g? Is there a lot of extra effort for data validation, or is it minimal with the character set scanner?
Tom Kyte
December 15, 2009 - 8:41 am UTC

does anyone else see how #1 and #3 could conflict with each other?

As does #2 and #1?

Sam/SMK - what are special symbols? Are they perhaps part of a different characterset, one that is not supported in your current character set? That need - the need to support the storage and retrieval of data in the set (.....) will drive your character set.

Sam/SMK - I've referred you to documentation before (many many times). Have you read the globalization guide to get a feel for

a) whether you want/need/desire this
b) how to get there
c) what'll happen when you do?

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10729/toc.htm

character set

A reader, December 17, 2009 - 3:44 pm UTC

Tom:

geez, this guide will take 1 month to read and digest and a lot of grey hair after that...

i will scan the chapters.

Just curious: is asktom running UTF8 or ISO8859P1?

There is nothing to lose going Unicode.

If users have a special character, bring it on - we will store it.
If they do not have one, well, we do not care - we won't store it.
Tom Kyte
December 17, 2009 - 4:51 pm UTC

Sam, SMK - I shall stop answering your questions for the next month, that shall give you plenty of time to read the documentation and save me a few grey hairs myself.


If users have a special character, bring it on - we will store it.


read the documentation and you might find out how naive that statement is.


You know how to get to understand something? You actually learn about it.

See you at the end of January.

character set

A reader, December 17, 2009 - 7:21 pm UTC

sorry tom

you know i was not serious about reading the guide

I hope you are not serious about end of january too.
that is severe punishment. make it couple of days!

Have a happy holiday!

Complaints to Oracle Globalization documentation

Sergiusz Wolicki, January 11, 2010 - 10:14 am UTC

@Eduard from Spain

(Sorry for my shameless self-promotion on Tom's unmatched source of Oracle-related knowledge.)

Please, post complaints and suggestions regarding Oracle Database globalization documentation on the OTN Database Globalization forum ( http://forums.oracle.com/forums/forum.jspa?forumID=50 ). The route from that forum to documentation authors is much shorter.

General globalization questions are also welcome on that forum.

Tom Kyte
January 18, 2010 - 12:13 pm UTC

No worries, you work for the globalization team - I've no problem with comments from you guys :)

ORA-12899 occurs with SQL COPY command even when space exists!

Peter, March 16, 2010 - 3:59 pm UTC

Hi all,

I wonder why I can not use SQL COPY command due ORA-12899.
Alternative INSERT or exp/imp method works fine with same data.

Example is below:

D:\>set NLS_LANG=.AL32UTF8

D:\>sqlplus SIEBEL/pass@db1

SQL*Plus: Release 10.2.0.1.0 - Production on Tue Mar 16 21:33:32 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select value from nls_session_parameters where parameter in ('NLS_LENGTH_SEMANTICS');

VALUE
--------------------------------------------------------------------------------
CHAR

SQL> select value from nls_instance_parameters where parameter in ('NLS_LENGTH_SEMANTICS');

VALUE
--------------------------------------------------------------------------------
CHAR

SQL> select value from nls_database_parameters where parameter in ('NLS_LENGTH_SEMANTICS','NLS_CHARACTERSET');

VALUE
--------------------------------------------------------------------------------
AL32UTF8
BYTE

SQL> create table tmp1 (c char(1 char));

Table created.

SQL> create table tmp2 (c char(1 char));

Table created.

SQL>
SQL> insert into tmp1 values (unistr('\0161'));

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL>
SQL> copy from SIEBEL/pass@db1 to SIEBEL/pass@db1 insert tmp2 using select * from tmp1;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)

ERROR:
ORA-12899: value too large for column "SIEBEL"."TMP2"."C" (actual: 3, maximum:
1)


SQL> insert into tmp2 select * from tmp1;

1 row created.

SQL>


Any ideas are welcome..

Tom Kyte
March 16, 2010 - 4:15 pm UTC

I would presume something is broken with the deprecated sqlplus copy command with regards to multibyte data and the way it binds.

This would be a good case to use just "sql" insert into tmp2 select * from tmp1; - that is all you need here.

Or, if a LONG is involved, use data pump - just use the API via a couple of sqlplus plsql invocations - a full (and more) replacement for the deprecated sqlplus copy command.
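For reference, a minimal sketch of the network-mode Data Pump import Tom is referring to - no dump files, the rows stream over a database link (the link name SRC_LINK and the schema name are placeholders; it does, of course, require a database link):

declare
  h     number;
  state varchar2(30);
begin
  -- open a network-mode import: data flows over the database link,
  -- no dump file is ever written on either server
  h := dbms_datapump.open( operation   => 'IMPORT',
                           job_mode    => 'SCHEMA',
                           remote_link => 'SRC_LINK' );
  -- restrict the job to one schema (placeholder name)
  dbms_datapump.metadata_filter( h, 'SCHEMA_EXPR', 'IN (''SIEBEL'')' );
  dbms_datapump.start_job( h );
  dbms_datapump.wait_for_job( h, state );
end;
/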

RE: ORA-12899 occurs with SQL COPY command even when space exists

Peter, March 16, 2010 - 4:42 pm UTC

Thanks for additional methods, unfortunately I can not use

- datapump, because I am not allowed to access the UNIX DB server in any way, nobody will do that for me, and there is no directory access

- INSERT is out of scope - I originally need to COPY data between 2 databases - and again, no DB LINKS are allowed!

So I was preferring SQL COPY over exp/imp so far... but now I have just one option left after the SQL COPY failure.
Still, it is strange that it works nicely for VARCHAR2(1 CHAR) and does not for CHAR(1 CHAR)...


Peter.
Tom Kyte
March 16, 2010 - 4:46 pm UTC

you do not need unix database server access - datapump goes database to database, no files. It would need a dblink, though - and I see your site prefers to keep things in the dark ages and do things in a non-secure fashion (funny, they probably outlawed dblinks for security reasons - only they got it backwards)

(sqlplus copy has not been touched since 8.0 days - or even longer, it's been deprecated for a long time unfortunately)

Peter, March 16, 2010 - 5:03 pm UTC

Unfortunately site is restricted by security rules and I must respect them.. and that is actually not related to sql COPY problem.

However, you are right that COPY has been obsolete since late 8.x, through 9.x, 10.x, (11.x?)...

But interestingly, documentation

http://download.oracle.com/docs/cd/B19306_01/server.102/b14357/apb.htm#i641598

says that it will not support "new" types from now on, etc.

In my case, I think a table with one CHAR(1 CHAR) field does not involve the unsupported "new" types - and multibyte has been here for years...

Anyway, many thanks. I am 'really' happy to find out the problem is not me :) this time... so I can stop wasting time.

PS: Maybe it is finally time to drop the COPY binaries from the CDs - or to make it work properly for scenarios where no datapump or dblink access is allowed.


11g

A reader, April 27, 2010 - 10:14 am UTC

Tom:

Quick Question.

We are in the process of migrating a database from 9.2.0.0 to 11g.

I noticed DBA created the new 11g database with WE8MSWIN1252
character set instead of WE8ISO8859P1.

Would this be OK, or might there be some data loss, meaning we would need to run the scanner and check the data after importing? It seems both are very similar character sets.
If this can create problems then I would have to request that the database be recreated with WE8ISO8859P1.



***9i*****

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET WE8ISO8859P1
NLS_RDBMS_VERSION 9.2.0.2.0

****11g******

PARAMETER VALUE
------------------------------ ----------------------------------------
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET WE8MSWIN1252
NLS_RDBMS_VERSION 11.1.0.7.0




Tom Kyte
April 27, 2010 - 11:50 am UTC

why would a dba create a new 11g database in order to migrate? That is so wrong.


Why didn't they just upgrade it???? Using export and import to upgrade isn't upgrading - just upgrade the thing...

Every export/import "upgrade" I run into has issues - done wrong, didn't know you were missing something for a while, character set conversions - whatever. And it is rather easy to avoid - you just have to save yourself a couple of hours or days to achieve it too (yes, *save* not use more). Just upgrade.

11g

A reader, April 27, 2010 - 3:30 pm UTC

Tom:

I gave up arguing with them on this.
it is not worth getting some grey hair for this.

One reason, I assume, is that they are changing the *MACHINE* hardware. The new machine has the 11g server running on it.

In theory there should *not* be any problem with WE8MSWIN1252 since it is a superset of WE8ISO8859P1, according to this:


http://arjudba.blogspot.com/2009/03/difference-between-we8iso8859p1-and.html

I thought I would confirm with you though. Would there be anything to worry about? Do we need to run the character set scanner? The Oracle HTTP server is also running WE8MSWIN1252, so it might be better.
Tom Kyte
April 28, 2010 - 7:31 am UTC

I'm not a character set expert (and I doubt they are either). This will require heavy duty testing.


Even if they were changing hardware - if it were the same OS, I would upgrade.

If it were different OS - but same endian systems, I would upgrade and full database transport.


If it were different OS - but different endian systems, I would install NEW with same character set and cross platform transport the data.


export/import - not. a. chance.

11g

A reader, April 28, 2010 - 9:41 am UTC

Tom:

What tools you would use for heavy duty data testing (oracle data scanner?).

Yes, the new machine with 11g has a different O/S. Not sure what you mean by endian system.

So you would do the migration using transportable tablespaces instead of exp/imp. For some reason exp/imp seems to be a very popular tool among DBAs. What are your concerns about it?


Tom Kyte
April 28, 2010 - 12:42 pm UTC

Sam

You know how to use google? Endian is something I would think every software developer would know - and most DBA's. It is sort of a basic in our profession - search for it and research it.


If I could have everyone raise their hands if they've ever had a dmp file that for whatever reason -wouldn't import. Or have spent days watching import say "I'm working.. I'm still working... I'm still still working....". Or have lost/corrupted data because they didn't see that "warning" about character set conversion. Or have lost an index/grant or whatever because they didn't review the log. Or.. Or... Or..... it is just about the hardest way to do it.


You'd want to run csscan at least. And do a functionality test on all aspects of your application - make sure things sort as you think they should, data that should be enterable is enterable (and unchanged), that the clients all have the right character set set in their environment, etc.
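For reference, csscan is a standalone command-line tool (its repository has to be installed once via $ORACLE_HOME/rdbms/admin/csminst.sql); an invocation along these lines - the connect string and target character set here are just examples - scans the whole database and reports lossy or truncated data:

csscan system/manager FULL=Y TOCHAR=WE8MSWIN1252 LOG=scan CAPTURE=Y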

11g

A reader, April 28, 2010 - 6:17 pm UTC

Tom:

Thanks for the reply. I know what little endian and big endian mean now.

I do not know why many DBAs I talked to seem to prefer a fresh installation of the 11g database server followed by export/import of the 9i schema, instead of upgrading the database software (9i to 11g). They seem to think the upgrade is *buggy* and a new install is better.

But as I was reviewing this thread again, I noticed you recommended (March 2003) similar steps to what we are doing.
*********************************************************
1) i would just export/import the SCHEMA's I needed. I don't like to do full database exports.

Just create a new database.
Export the schemas you want.
*********************************************************

we are only doing one schema/user exp/imp (not a whole database), and also, yes, we will review the import log for any warnings/errors, etc.

But you know better.
Tom Kyte
September 15, 2010 - 8:13 am UTC

umm, did you happen to read the context surrounding it?

It was 7 years ago.
version 8.1.7.
cross platform transports - did not exist.
changing character sets.

does that happen to describe you SAM? are you on 8i and WANTING to change charactersets? And doing an upgrade?

Or, did you tell me "we are in the process of migrating a database from 9i to 11g" - oh wait, yes, that is what you said.


You started by saying:

We are in the process of migrating a database from 9.2.0.0 to 11g.

And now you are saying

we are only doing one schema/user exp/imp

If I had a nickel for every time that happened - that you started by saying "X" and then said "but since I'm doing 'Y'".... sigh.......


Sam, you may do whatever you like - go for it.

WE8MSWIN1252 to AL32UTF8

A reader, August 30, 2010 - 9:37 am UTC

My source is in Linux RHEL 5.4 - Oracle 10.2.0.4
character set: WE8MSWIN1252

My target is in Linux RHEL 5.4 - Oracle 11.1.0.7
character set: "AL32UTF8"

Q1. We will be using Informatica PowerExchange CDC capture to pull data from source to Target. Will there be any issues with data since the character set is different?

Q2. If we use DBLinks and pull data from source to target, will there be any issues since the character set is different.

The target has not yet been set up. Is it advisable to set the target DW character set to WE8MSWIN1252? What is the difference between these two?
Tom Kyte
September 09, 2010 - 9:30 am UTC

q1) could be - you'll need to research if al32utf8 is a strict superset of we8mswin1252.


q2) see q1...

Some answers to last questions

Sergiusz Wolicki, September 15, 2010 - 1:33 am UTC

Some extra answers to the last two topics in this thread:

## Would this be OK, or might there be some data loss, meaning we would need to run the scanner and check the data after importing?

It depends. If the original database keeps really only WE8ISO8859P1 characters, then there will be no issues. But many WE8ISO8859P1 databases actually contain Windows-specific character codes, i.e. codes defined in WE8MSWIN1252 but not defined in WE8ISO8859P1 (ok, actually defined but as control codes). They include "smart" quotes, Euro currency symbol, bullet symbol, TM symbol, en-dash, em-dash, French ligature oe, etc. Such characters would be lost during the import.

In the world in which Windows workstations dominate, WE8MSWIN1252 is actually preferred over WE8ISO8859P1. In 11g, WE8ISO8859P1 is not on the list of recommended character sets available in OUI. But I fully agree with Tom, that it is faster and safer to upgrade the database to 11g using the old character set and only then change the character set using appropriate character set migration procedures.

## Q2. If we use DBLinks and pull data from source to target, will there be any issues since the character set is different.

A common issue with such setup is related to length expansion. When non-ASCII characters are converted from WE8MSWIN1252 to AL32UTF8, they expand in byte size. All characters in WE8MSWIN1252 have one-byte codes. But only the first 128 codes of WE8MSWIN1252 still have the same one-byte codes in AL32UTF8. All other characters become two-byte codes, with the exception of the Euro symbol and some punctuation characters, which become three-byte codes. Therefore, if you pull a 4000-byte (or slightly shorter) value through a database link, it may become longer than 4000 bytes. It will not fit into VARCHAR2 any longer and you may need to use CLOBs to store the values.
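A quick way to see that expansion, assuming an AL32UTF8 database (TO_CHAR converts the UNISTR result from the national character set into the database character set):

select lengthb('abc')                    ascii_bytes,    -- 3: ASCII stays 1 byte/char
       lengthb(to_char(unistr('\00e9'))) e_acute_bytes,  -- 2: e-acute takes 2 bytes in AL32UTF8
       lengthb(to_char(unistr('\20ac'))) euro_bytes      -- 3: the Euro sign takes 3
  from dual;

On a WE8MSWIN1252 database each of those characters would occupy a single byte.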

## The target has not yet been set up. Is it advisable to set the target DW character set to WE8MSWIN1252?

If the DW is going to hold data only from WE8MSWIN1252 sources, then it makes no sense to create it in AL32UTF8. But if it is expected that text data in non-Western European languages will be added, sooner or later, to the DW, it is advisable to create the database in AL32UTF8 to avoid future costly migration.

## What is the difference between these two ?

Huge. WE8MSWIN1252 is a Western European character set. It can store data only from Western European languages. It cannot even store all languages of the European Union (it misses Greek, Polish, Czech, Hungarian, Bulgarian, etc.). AL32UTF8 (=Unicode UTF-8), on the other hand, supports all characters that can be stored in an Oracle database -- all other character sets are defined as mappings to Unicode. But AL32UTF8 is a multibyte character set. It is therefore somewhat slower to process and puts additional requirements on applications.


Sergiusz


PS. Anybody wanting to dig deeper into character set issues is invited to post questions to the Database/Globalization Support forum on OTN.

Tom, feel free to refer all guys asking globalization questions that you do not want to answer yourself to this forum.

How to find out the client character set of a session?

Michael, September 20, 2010 - 6:30 am UTC

Hi Tom,

do you know of a way (i.e. a data dictionary query etc.) to find out the client character set (well, what the client told the database) of a session?
Tom Kyte
September 20, 2010 - 2:56 pm UTC

there is not one as far as I know (if anyone knows better, I'd be interested - it would have to be a very very new feature - but I'm not aware of any way to get it unless the client application pushed it into the database for us)

Characterset

ABhisek, September 22, 2010 - 3:13 pm UTC

Hi Tom,

Can we change the NLS_CHARACTERSET at the session level through ALTER SESSION? If yes, please let me know the syntax.
Tom Kyte
September 23, 2010 - 10:44 am UTC

no, you may not, it is always specified by the client, the client has to make that decision

How to find out the client character set of a session?

Sergiusz Wolicki, September 25, 2010 - 7:16 am UTC


Actually, it is possible now (>=11.1). See:

http://download.oracle.com/docs/cd/E11882_01/server.112/e17110/dynviews_3018.htm#i1415448

V$SESSION_CONNECT_INFO.CLIENT_CHARSET

This information is pushed by OCI to the server at login time.
Tom Kyte
September 27, 2010 - 11:26 am UTC

excellent, I shall file that away in my head and hope it stays indexed for the next time :)

Character set of client

Michel Cadot, September 27, 2010 - 11:52 am UTC


I'm not convinced this column is currently actually filled:
SQL> select distinct CLIENT_CHARSET from V$SESSION_CONNECT_INFO 
  2  where sid=sys_context('userenv','sid');

CLIENT_CHARSET
----------------------------------------
Unknown

SQL> @v

Version Oracle : 11.2.0.1.0

Regards
Michel
Tom Kyte
September 27, 2010 - 12:38 pm UTC

ops$tkyte%ORA11GR2> select distinct CLIENT_CHARSET from V$SESSION_CONNECT_INFO where sid=sys_context('userenv','sid');

CLIENT_CHARSET
----------------------------------------
US7ASCII


....
[tkyte@localhost ~]$ export NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
[tkyte@localhost ~]$ plus

SQL*Plus: Release 11.2.0.1.0 Production on Mon Sep 27 13:37:51 2010

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

ops$tkyte%ORA11GR2> select distinct CLIENT_CHARSET from V$SESSION_CONNECT_INFO where sid=sys_context('userenv','sid');

CLIENT_CHARSET
----------------------------------------
WE8ISO8859P1






you have to be using an 11.1 or better OCI client (no jdbc thin)

are you?

Character set of client

Michel Cadot, September 27, 2010 - 11:54 am UTC


Actually, I got this with an 11.2 SQL*Plus on an 11.2 instance (the previous post was from a 10.2 SQL*Plus):
SQL> select distinct CLIENT_CHARSET from V$SESSION_CONNECT_INFO
  2  where sid=sys_context('userenv','sid');
CLIENT_CHARSET
----------------------------------------
WE8MSWIN1252

Regards
Michel

character set

A reader, October 18, 2010 - 10:09 pm UTC

Hi Tom,

I would like to ask some simple questions.

Why do we need Globalization support? Why do we need character set conversion? Is this something to do with the OS on which Oracle is installed?

Tom Kyte
October 25, 2010 - 10:08 am UTC

... Why do we need Globalization support? ...

because there are many languages/ways of doing things around the world.

I use a ',' to separate thousands from hundreds in numbers and a '.' to separate decimals. You might not.

I need 7bits to store my entire alphabet. You might need a lot more than 7bits.



... Why do we need character set conversion? ...

because I might be using US7ASCII on my workstation (7 bit data only please) and you might be using WE8ISO8859P1. When I select your data, I must have it converted to MY character set (else you might make my screen beep!). A character in your character set might well use a DIFFERENT binary code than I use to represent the same character. It is almost like speaking different languages!



It is not the OS, it is the client application (it has NLS settings) and the database (it has NLS settings).

Character set

A reader, October 25, 2010 - 10:50 am UTC

Hi Tom,

Thanks for that.
1) Couldn't everyone in this world use one general character set so that no conversion is required?
2) Also, does the character set encoding scheme decide how much space each character occupies? I mean, can I say that for a single-byte encoding, char(20) will require 20 bytes, and for a multibyte encoding, char(20) will require 40 bytes of space?
3) Can the code values of a given character (say 'A') differ between character sets?
Tom Kyte
October 25, 2010 - 6:12 pm UTC

1) a variation on "why can't we all just get along"

There is the universal character set - UTF - but that is just a character set capable of storing any character - clients will STILL have their character set that they support.

I don't use terminals capable of displaying certain characters - I need the translation.

Until we all use the same characters - translation of character sets will be necessary.

2) you can say that in a single byte character set a varchar2(20) will require up to 20 bytes for the characters and can store 20 characters at least (and at most) (I refuse to use char, bad idea, I won't go there). In a multibyte character set - the characters might take from one to six bytes depending on the implementation. So a varchar2(20) in a multibyte character set might be able to store only three characters (6 bytes * 3 characters = 18 bytes) or it might be able to store 20 characters - if they all take 1 byte. A varchar2(20 CHAR) can store 20 characters (but might need 120 bytes to do so!)

3) yes, it is not beyond the realm of belief that that could be true. In general, a simple character like A will be pretty uniform - but something like the euro symbol would and does change from character set to character set.
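A minimal sketch of the byte-vs-char semantics difference, assuming a multibyte (e.g. AL32UTF8) database character set; the table name is made up:

create table sem_demo ( b varchar2(5 byte), c varchar2(5 char) );

-- five e-acute characters = 10 bytes in AL32UTF8:
insert into sem_demo (c)
values ( to_char(unistr('\00e9\00e9\00e9\00e9\00e9')) );  -- succeeds: 5 characters fit

insert into sem_demo (b)
values ( to_char(unistr('\00e9\00e9\00e9\00e9\00e9')) );  -- fails: ORA-12899, 10 bytes > 5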

character set

A reader, October 25, 2010 - 9:31 pm UTC

Hi Tom,

1) From explanation #2 above, I get the feeling that varchar2(20) can accommodate a maximum of 20 bytes. If a multibyte encoding scheme is used, then it can store from 3 characters (each with 6 bytes) to 20 characters (each with 1 byte). Isn't that so?
2) I couldn't understand the difference between fixed width and variable width multibyte encoding schemes. Can you please provide some examples of that?
3) Do 7-bit and 8-bit single-byte encodings both occupy the same 1 byte of space, i.e. 7 bits = 1 byte for the former and 8 bits = 1 byte for the latter?
Tom Kyte
October 26, 2010 - 7:49 pm UTC

1) you would be correct, varchar2(20) is by default varchar2(20 BYTES)

and as I wrote, that might only hold 3 characters in some character sets, yes.


2) I don't know what you mean? Some character sets use a single byte for ANY character. Some other character sets use 1 byte for some characters, 2 for others, 3 for yet others, 4 for some others and so on....

3) 7 bits need one byte, and 8 bits need one byte. Different character sets can still encode the same characters DIFFERENTLY, however - two sets are only guaranteed to agree on their codes where one is a subset of the other.
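DUMP is the easiest way to see exactly which byte codes a character is stored as; format 1016 prints the bytes in hex together with the character set name (the exact output depends on your database character set):

select dump('A', 1016)                      a_codes,
       dump(to_char(unistr('\20ac')), 1016) euro_codes
  from dual;

-- on an AL32UTF8 database, 'A' dumps as the single byte 41 while the
-- Euro sign dumps as the three bytes e2,82,ac; on WE8MSWIN1252 the
-- Euro would be the single byte 80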

Shannon Severance, October 25, 2010 - 11:46 pm UTC

"3) yes, it is not beyond the realm of belief that that could be true. In general, a simple character like A will be pretty uniform - but something like the euro symbol would and does change from character set to character set."

Depends how far afield you get: EBCDIC will have a significantly different representation for simple characters than ASCII and all the character sets that build on ASCII.

"1) From explanation #2 above I have a feeling like, varchar2(20) has maximum accommodation capability of 20 bytes. If multibyte encoding scheme is used then it can store 3 char(each with 6 bytes) to 20 char(each with 1 byte). Isn't it?"

No, you can define the size of a varchar2 with either byte semantics or char semantics. If you define it with char semantics you are saying how many characters it is to hold, regardless of byte size. So VARCHAR2(20 CHAR) will hold up to twenty characters, no matter how many bytes that takes. Note that the 4000-byte limit on the VARCHAR2 datatype is still 4000 bytes. So VARCHAR2(4000 CHAR) may not hold 4000 characters. (At least that is true as of 10gR1.)

"2) I couldn't understand the difference between fixed width and variable width multibyte encoding schemes. Can you please provide some examples of that?" Read up on UTF-8. The standard 7 bit ASCII characters are one byte with the high bit set to zero. If the high bit of the byte is set to one, the byte is part of a multi-byte sequence that represents a single character.

http://en.wikipedia.org/wiki/UTF-8

The biggest reason for different character sets is history. Everyone working in their own little corner. Software used to be local, computers weren't generally connected all to each other across the globe.

Unicode is probably the future, a future with just one character set, but not everything works that way yet. And even with unicode there are different encodings, UTF-8, UTF-16, and UTF-32, with UTF-16 and UTF-32 coming in both big-endian and little-endian format. Check out the FAQs at http://www.unicode.org/
Tom Kyte
October 26, 2010 - 7:51 pm UTC

3) correct, that is why I said "in general" - you would have to go far and wide to find the exception.



1) by DEFAULT varchar2(20) is 20 bytes, you would have to make a major change for it to be otherwise.


varchar2(20 char) is NOT varchar2(20) and varchar2(20) by default is varchar2(20 bytes) - but could be changed to be varchar2(20 char). But IN GENERAL, by DEFAULT varchar2(20) will be 20 bytes




Character sets

Sergiusz Wolicki, October 29, 2010 - 6:11 am UTC

## 1) by DEFAULT varchar2(20) is 20 bytes,
## you would have to make a major change for it
## to be otherwise.

Not really a major change. It is enough to say ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR for VARCHAR2(20) to become VARCHAR2(20 CHAR) in all CREATE TABLE statements that follow in the same session. If you set NLS_LENGTH_SEMANTICS=CHAR in the spfile, which you should never do (!), this default changes for all subsequent sessions (unless a session individually overrides it back to BYTE).

Therefore, the lack of semantics specification in character data type declarations is legacy -- unfortunately very strong legacy. My recommendation is to always explicitly say VARCHAR2(20 BYTE), if this is what you want for your application.
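A small demonstration of that session-level switch (table names invented); the CHAR_USED column of USER_TAB_COLUMNS shows which semantics each column ended up with:

alter session set nls_length_semantics = char;
create table t_char ( x varchar2(20) );   -- becomes VARCHAR2(20 CHAR)

alter session set nls_length_semantics = byte;
create table t_byte ( x varchar2(20) );   -- becomes VARCHAR2(20 BYTE)

select table_name, column_name, char_used   -- 'C' or 'B'
  from user_tab_columns
 where table_name in ('T_CHAR', 'T_BYTE');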

Regarding fixed-width vs. variable-width character sets. A fixed-width character set (actually, character encoding) defines all character codes as having the same number of bytes. A fixed-width character set may be single-byte, with each code having only 1 byte, or multibyte, with each code having more than 1 byte.

A variable-width character set must be multibyte. Variable width means that different character codes may have different numbers of bytes. The minimum number is 1 and the maximum is 4 (at least in all Oracle-supported character sets). UTF8 (not to be confused with UTF-8) is sometimes considered to use 6-byte codes but I prefer to regard it as using 3-byte codes from which some codes need to go in pairs (this is another subject).

Oracle supports many single-byte character sets for the database character set but no multibyte fixed-width character sets. The fixed-width multibyte (precisely, 2-byte) character set AL16UTF16 is allowed only as the national character set (for NCHAR/NVARCHAR2). AL16UTF16 is also used internally to store CLOBs if the database character set is multibyte variable width.

Many variable-width multibyte character sets are supported for the database character set. Except for Unicode encodings, UTF8 and AL32UTF8, there are various legacy character sets for storing Chinese, or Japanese, or Korean.

storing foreign language

A reader, November 16, 2010 - 9:46 pm UTC

Tom:

I have a 9i database in WE8ISO8859P1
A new requirement came in to store some data, like book titles, in Russian, Arabic, and other foreign languages.
What would be the best thing to do?
a) change the whole database to UTF-8 - that would mean creating a new DB, migrating the data, and testing everything - a lot of work, I expect
b) create new columns of type NVARCHAR2(4000) or NCLOB - this would require many code changes in the client code
c) change the existing columns that need to store foreign data from VARCHAR2 to NVARCHAR2

Would c) work? Isn't this the best and easiest solution? Would there be any consequences for the data stored in the DB, the PL/SQL stored procedures (SQL), or the PowerBuilder client accessing the data?
Tom Kyte
November 17, 2010 - 7:40 am UTC

... What would be the best thing to do? ...

Sam - after the many years of asking questions on asktom - you should be able to guess what my answer is going to be...

IT DEPENDS.

a) might be best
b) or b might work better and easier
c) or c might be the smartest approach for you

in your circumstances, under your conditions, using your developed and purchased software.


c) would also likely require code changes in the client code.


a) would be great, you could use this as a good excuse to get on current software as well - 11gr2. Since you have to upgrade anyway - why not bite the bullet and get it done.


will there be consequences? sure, all three entail consequences. You'll be going from a single byte character set (where 1 character = 1 byte, so a varchar2(N) holds N characters) to a multibyte character set whereby 1 character = some number of bytes (so a varchar2(N) holds less than or equal to N characters).

If you show Russian data in a client that only supports single byte data, you'll likely see lots of upside down question marks or the like.

If there isn't any sensible mapping of the WE8ISO data to your set of Russian characters, that client might see funny things too.

Someone with a WE8ISO client that updates Russian data (or vice versa) may well create something you didn't intend.

This will be a somewhat non-trivial exercise for you, yes. It will require design, testing, planning - time in short.

unicode

A reader, November 17, 2010 - 10:33 am UTC

Tom:

Yes, I understand the client will convert the data to whatever character set it is set to.

But let us say the client supports unicode. and i currently have this table in WEISO8859P1

MY_TABLE
-----------
TITLE VARCHAR2(4000)
NAME VARCHAR2(100)

and i changed it to

MY_TABLE
-------------------
TITLE NVARCHAR2(4000)
NAME VARCHAR2(100)

If the client or a pl/sql stored procedure has

SELECT TITLE, NAME from MY_TABLE;

would it result in any unnoticed issues, or will my data be retrieved/displayed in Unicode?



Tom Kyte
November 17, 2010 - 11:36 am UTC

the data would be returned to the client in the characterset of the client, whatever that is - if different from the character set of the database, or the national characterset of the database, appropriate conversions would take place.

the data would be manipulated in plsql using the national character set for the nvarchar2 data, since you are dealing with NVARCHAR instead of varchar for that one column.

Will there be issues you have to deal with? Sam, just read the above stuff over again. One obvious issue will be that title is not 4000 characters anymore; it might be significantly less.

character sets

A reader, November 17, 2010 - 11:49 am UTC

Tom:

Yes, you are right. The VARCHAR2(4000) may need to become an NCLOB, since you can't have VARCHAR2(8000) - the double-byte encoding may increase the size. I was assuming the data would fit in the same column.

But that seems to be only part of the problem. I would have to change all the PL/SQL variable types from VARCHAR2 to NVARCHAR2 (if not dynamic) to match the table column types.

Then, if the web page is declared as ISO8859P1, we need to change that to UTF-8 for Unicode support. The browser may have issues converting ISO8859P1 to Unicode.

I think you mean to say that the *client* does the data conversion - it is not sent from the database already in the client character set. Correct?

It seems mixing character sets in different fields in the database creates all kinds of issues and it is better to convert the whole database to UTF-8.
Tom Kyte
November 18, 2010 - 3:06 am UTC

... I would have to change all the PL/SQL variable types from VARCHAR2 to NVARCHAR2 (if not dynamic) to match the table column types.
...

might be an excellent time to sweep all of your code and see if you can replace explicit datatypes with table.column%type instead.

In between the database and the client - the conversion happens - that is the easiest way to think about it. pretend it happens on the wire between the two. It doesn't really matter if it physically happens "in the database before it leaves", "outside of the database after it leaves" - it only matters that it happens between the client and the database (even if the client is yet another database itself)
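A minimal sketch of that %TYPE approach, using the MY_TABLE example from above - the variable silently follows the column if it later changes from VARCHAR2 to NVARCHAR2:

declare
  -- anchored to the column: no edit needed here when the column's
  -- datatype or length changes
  l_title  my_table.title%type;
begin
  select title into l_title from my_table where rownum = 1;
end;
/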

Character set

Ari, November 18, 2010 - 2:15 am UTC

Hi Tom,

1) I couldn't understand why you said above: "One obvious issue will be that title is not 4000 characters anymore, it might be significantly less."
Can you please explain it?
Can you please explain it?

2) Also, if the client uses a Japanese character set, then I think with a UTF8 national character set the title column in the table MY_TABLE (title nvarchar2(4000), name varchar2(100)) will have no issues storing Japanese characters. Isn't that so? But we could have data loss for the name column. Isn't that correct?
Tom Kyte
November 18, 2010 - 3:37 am UTC

1) character set used for title right now: single byte WE8ISO... Therefore every character takes ONE BYTE - hence a varchar2(4000) holds at least 4000 characters in that characterset.

Now, change the character set used for title to: MULTI-BYTE, UTF. Now every character might take MORE than one byte. Therefore, an NVARCHAR2(4000) - which holds at most 4000 BYTES of data - holds less than or equal to 4000 characters (if you use a character that takes two bytes - you have one less byte to store some other character in)

2) it depends - it depends on what character set the clients use and what those clients do to the data.

Character Set

A reader, November 18, 2010 - 6:17 am UTC

Hi Tom,

I thought UTF8 was a superset of WE8ISO8859P1. If so, shouldn't each character have the same code value and the same encoding bytes in both character sets? I mean, the code value of 'A' in the UTF8 and WE8ISO8859P1 character sets should be the same. Shouldn't it?
Tom Kyte
November 19, 2010 - 7:19 am UTC

what about things above ascii 127?

character set

A reader, November 18, 2010 - 7:01 pm UTC

Tom:

Who does the conversion on the wire? You must have software or a protocol that does the conversion. I thought the client has that installed.

2) is this correct statement?

there are only 2 ways to store foreign language multibyte characters in oracle:

a) create a unicode database
b) if database is not unicode, create a column type using national character set (i.e NVARCHAR2).

The Oracle CONVERT function can convert data from WE8ISO8859P1 to UTF-8, but if a foreign character is not supported by WE8ISO8859P1 it cannot have been stored in the first place, and conversion will not work for that character.

Tom Kyte
November 19, 2010 - 9:24 am UTC

Sam, re-read what I wrote:

In between the database and the client - the conversion happens - that is the easiest way to think about it. pretend it happens on the wire between the two. It doesn't really matter if it physically happens "in the database before it leaves", "outside of the database after it leaves" - it only matters that it happens between the client and the database (even if the client is yet another database itself)

I wasn't addressing where it happened, just trying to tell you to conceptually understand it as "happening on the wire". It physically happens on the client in general, but can happen in the database (there are functions to do it).

You only need to understand "it happens in between" - just understand that and you are good to go.


2) there are other multibyte charactersets - other than unicode ones. so no, your statement is not correct.
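One of the in-database functions Tom alludes to is CONVERT; a quick sketch of what "losing" a character looks like, assuming a database whose character set contains the Euro sign (e.g. AL32UTF8 or WE8MSWIN1252):

-- 'abc' survives: ASCII maps 1:1 into US7ASCII
select convert('abc', 'US7ASCII') from dual;

-- the Euro sign has no code in WE8ISO8859P1, so it comes back as a
-- replacement character (typically an inverted question mark)
select convert(to_char(unistr('\20ac')), 'WE8ISO8859P1') from dual;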


Character sets (use of NVARCHAR2)

Sergiusz Wolicki, December 14, 2010 - 1:47 pm UTC


The use of NCHAR data type columns as a way to introduce multilingual functionality to an application is generally not recommended. There are a few reasons for this:

* Access to NCHAR columns requires application changes to set the "form of use" for input (bind) and output (define) variables. If the form of use is not set to NCHAR, the data will always be converted to the database character set on its way to/from clients, defeating the purpose of using NCHAR.

* NCHAR data types are not supported by certain database features. Oracle Text and Oracle XDB are the most prominent ones.

* First you have to add multilingual capabilities to a single column, like a title. But later, the next requirement comes in and you have to add the same to customer names and addresses, author names, employee names, etc. You keep adding NCHAR columns and you realize that you should actually migrate your database to AL32UTF8. But the database is already much larger and it may be much more expensive and difficult to migrate it in whatever downtime window you have available.

Therefore, migrate to AL32UTF8. Though, if you have a legacy application that nobody develops any longer and which does not support multibyte character sets, you may be forced to keep your current database character set. Adding an NVARCHAR2 column to satisfy some special requirement for some new additional module makes sense in such case.
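A minimal sketch of that last case (names invented) - an NVARCHAR2 column added to a single-byte legacy database to satisfy one new module:

-- the rest of the database stays WE8ISO8859P1; only this column uses
-- the national character set (AL16UTF16 by default)
create table legacy_notes ( id number, note nvarchar2(200) );

-- UNISTR builds the value directly in the national character set, so
-- the s-with-caron survives even though WE8ISO8859P1 cannot store it
insert into legacy_notes values ( 1, unistr('\0161koda') );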

NLS_Sort (Generic_m) is not working in oracle 11g though the same working in 10g

Vinoth S, May 03, 2011 - 2:14 am UTC

Hi Tom,

Select * From v$version
1 Oracle Database 11g Release 11.1.0.6.0 - Production
2 PL/SQL Release 11.1.0.6.0 - Production
3 CORE 11.1.0.6.0 Production
4 TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
5 NLSRTL Version 11.1.0.6.0 - Production

Select * From v$NLS_PARAMETERS;

1 NLS_LANGUAGE AMERICAN
2 NLS_TERRITORY AMERICA
3 NLS_CURRENCY $
4 NLS_ISO_CURRENCY AMERICA
5 NLS_NUMERIC_CHARACTERS .,
6 NLS_CALENDAR GREGORIAN
7 NLS_DATE_FORMAT DD-MON-RR
8 NLS_DATE_LANGUAGE AMERICAN
9 NLS_CHARACTERSET WE8MSWIN1252
10 NLS_SORT BINARY
11 NLS_TIME_FORMAT HH.MI.SSXFF AM
12 NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
13 NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
14 NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
15 NLS_DUAL_CURRENCY $
16 NLS_NCHAR_CHARACTERSET AL16UTF16
17 NLS_COMP BINARY
18 NLS_LENGTH_SEMANTICS BYTE
19 NLS_NCHAR_CONV_EXCP FALSE

I have one table 'Test' with a column TestDesc varchar2(4000). If I retrieve the records with a SELECT statement, it executes successfully. But if I do the same after executing statement 1 below, it throws ORA-00910: specified length too long for its datatype.

1. ALTER SESSION SET NLS_SORT=GENERIC_m;

2. Select * From Test order by TestDesc; (It is throwing ORA-00910 exception)

But the same query executes successfully in Oracle 10g.

a. Why does it throw this exception specifically in 11g?
b. Do I need to change the character set to UTF-8 - would that solve this problem?

Please advise
Tom Kyte
May 04, 2011 - 12:20 pm UTC

what are the block sizes of the two databases.
what are the character sets of the two databases.


Re: NLS_Sort (Generic_m) is not working in oracle 11g though the same working in 10g

Sergiusz Wolicki, June 20, 2011 - 3:41 pm UTC

This is bug #8889930, fixed in 10.2.0.5 and 11.1.0.7, support note 8889930.8; related to the behavior described here:

http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/functions113.htm#i78399

in the paragraph starting with "The string returned, also known as the collation key..."

ORA-910 was introduced in the fix for bug #5208177 to avoid a memory corruption issue. Therefore, make sure you run 10.2.0.5 to avoid both the memory corruption and ORA-910.
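You can see the collation-key expansion behind that error directly - NLSSORT returns a RAW key that is typically longer than the source string (exact sizes vary by version and sort name):

select lengthb('resume')                                       src_bytes,
       utl_raw.length(nlssort('resume', 'NLS_SORT=GENERIC_M')) key_bytes
  from dual;

-- near the 4000-byte VARCHAR2 limit, that expansion is what the
-- ORA-00910 check (and the earlier memory corruption) was about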

SQL loader using CHARACTERSET JA16SJIS

Ranjith, July 27, 2011 - 2:02 am UTC

Tom,
I received fixed-length files from the HOST. I am able to see the Japanese characters in IE using Japanese Shift-JIS encoding.

I am using the below parameters in the control file.

LOAD DATA
CHARACTERSET JA16SJIS
REPLACE
INTO TABLE JD_NAFAPPS_DATA

In SQL*Loader, I am getting the errors below.

Record 1: Rejected - Error on table JD_NAFAPPS_DATA, column NAPP_SRCE_DESC.
Multibyte character error.
Record 2: Rejected - Error on table JD_NAFAPPS_DATA, column NAPP_SRCE_DESC.
Multibyte character error.
Record 3: Rejected - Error on table JD_NAFAPPS_DATA, column NAPP_SRCE_DESC.
Multibyte character error.
Record 4: Rejected - Error on table JD_NAFAPPS_DATA, column NAPP_SRCE_DESC.
Multibyte character error.

Please help by providing any alternative for this.


case insensitive and different character sets,

A reader, August 25, 2011 - 5:14 pm UTC

Hello,

We are going to have a huge table with information stored in different character sets, including English. In English, the values can be in any case, and the search condition needs to retrieve all the records irrespective of the case of the string. Does the combination of nls_sort=binary_ci and nls_comp=linguistic work?

If users from Germany (for example) want to search by entering a German string, I think NLS_SORT needs to be set to GERMAN, and likewise for other languages.

Question: How to enable all the searches irrespective of the language to return the information they are looking for?

If I create an index for each language (on one particular column), will it have an impact when 100s and 1000s of records are inserted every day?

Thanks,

Tom Kyte
August 30, 2011 - 4:29 pm UTC

If you want to search in each language using an index - you would need an index per language with the right NLS_SORT settings.

If you have many indexes - it will definitely affect insert performance. Now if you only insert 100's or 1000's of records per day (tiny) - it might well not really matter. But if you start trying to insert millions or billions - you'll feel it.


Otherwise - if you don't need an index - you can have each session set their NLS settings to that which makes sense for them and it'll work.

sqlldr and characterset

Lal, August 31, 2011 - 5:19 am UTC

Tom, 
I have the following question about loading special characters with SQL*Loader.
Test case

sqlplus scott/tiger@orcl27
create table try (
  a VARCHAR2(5 CHAR),
  b VARCHAR2(5 CHAR)
);

data file try2.dat with the following line
FRANÇGBGB

control file try.ctl with the following code
load data
infile 'try2.dat'
insert
into table try
(a position(1:5),
b position(6:10))

on using sqlloader
D:\sqlloader\sqlldr scott/tiger@orcl27 control=try.ctl

SQL*Loader: Release 9.2.0.1.0 - Production on Wed Aug 31 14:54:18 2011

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Commit point reached - logical record count 1


select * from try in SQL Developer showed
FRANÃ and ¿GBGB for columns a and b.

I tried the characterset option using the following control file:
load data
characterset utf8
infile 'try2.dat'
insert
into table try
(a position(1:5),
b position(6:10))

on execution, the following error was shown in the log file:

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
A                                     1:5     5           CHARACTER            
B                                    6:10     5           CHARACTER            

Record 1: Rejected - Error on table TRY, column A.
Multibyte character error.

but when I modified the code as shown below:
alter table try modify a varchar2(10 char)
load data
characterset utf8
infile 'try2.dat'
insert
into table try
(a position(1:10),
b position(6:10))

got the following data
FRANÇGBGB ¿GBGB

In this case I merged the two values for a and b into column a by changing the position from 1:5 to 1:10 for the first column.

The character in the fifth position is a special character.

Does that mean that if the character at the end position of the POSITION clause is a special character, SQL*Loader throws an error?

Do I need to use any client settings? I am using a Windows machine to execute sqlldr, and the database server is on Linux with the following NLS settings:

NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET WE8ISO8859P1
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_RDBMS_VERSION 10.2.0.1.0


Tom Kyte
August 31, 2011 - 1:53 pm UTC

Does that mean that if the character at the end position of the POSITION clause is a special character, SQL*Loader throws an error?


yes, you are trying to load FIXED WIDTH data - that is always measured in bytes.

So, you would have to actually use position(1:20) for the first five characters and have your input file set up accordingly since a utf8 character takes between 1 and 4 bytes each.

If you want to use fixed width files - you'll need to make sure each of your fields in the file is big enough to hold the entire string - 5 bytes doesn't cut it.
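In other words, if the vendor padded each field to 20 bytes, the control file would look something like this sketch (the positions are hypothetical - they must match the actual byte layout of the file):

load data
characterset utf8
infile 'try2.dat'
insert
into table try
(a position(1:20),   -- 5 characters x up to 4 bytes each in UTF-8
 b position(21:40))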

case insensitive and different character sets- follow up,

A reader, August 31, 2011 - 10:57 am UTC

Thanks Tom. I have information stored in different character sets (Cyrillic, Hebrew, Traditional Chinese, Kanji, etc.). These are actually scripts, not languages. If I want to create an index for those character sets, what would be my specification for NLS_SORT in the index create statement? I don't think I can say NLS_SORT=CYRILLIC.

What is the generic linguistic sort GENERIC_M?

Thanks,

sql loader and special characters

A reader, September 01, 2011 - 5:54 am UTC

Thanks, Tom, for the very fast response to my query on SQL*Loader with a fixed-width file containing special (multibyte) characters.

The source file is provided by a different vendor and it's not possible to change the structure of the source file format.

In that case, I think the option is to load the entire record into a table, scan for records having multibyte characters, do a temporary transformation to make the special characters single byte, and after loading to the target columns, update the special-character columns correctly.

Can you suggest any better options, considering this limitation?

Also, I assume the same limitation applies to external tables as well for fixed-width files.

Thanks in advance for your precious time.

Tom Kyte
September 01, 2011 - 8:21 am UTC

The source file is provided by a different vendor and it's not possible to change the structure of the source file format.


Then they have given you something that is virtually useless - you'll have to write a custom program to read that file character by character.


garbage in, garbage out.

case insensitive and different character sets- follow up,

A reader, September 01, 2011 - 9:00 am UTC

Thanks again. I'd like to know more about NLS settings. The index, for example:
CREATE INDEX nls_index ON my_table (NLSSORT(name, 'NLS_SORT = German'));

The above index does two jobs: 1) better performance and 2) sorts the data in German.

Taking performance out of the equation, how can I use NLS settings to sort data in German (or Korean, or Japanese) without using the index? How can I dynamically set NLS parameters for each session to suit its respective language? Finally, what are the NLS parameters that I have to set?

Thanks,

Tom Kyte
September 01, 2011 - 9:02 am UTC

Taking performance out of the equation, how can I use NLS settings to sort data
in German


simple, you log in with your NLS settings correct and you say "order by column", it will do the right thing.


The client sets their NLS parameters by default from their registry or environment. If you want to set up different defaults - you could use a logon trigger.

The NLS parameters you want to set are up to you - my suggestion would be to read the entire globalization guide to get a good understanding of how all of this stuff works. It is not a huge document...
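A minimal sketch of the logon-trigger approach Tom mentions (the trigger name and the particular settings are just examples - you would derive the values per user or application):

create or replace trigger set_nls_on_logon
after logon on database
begin
  -- these run once per session, before the application issues any SQL
  execute immediate q'[alter session set nls_sort = 'GERMAN']';
  execute immediate q'[alter session set nls_comp = 'LINGUISTIC']';
end;
/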

follow up,

A reader, September 02, 2011 - 10:08 am UTC

About linguistic indexes and nls_sort. According to the Oracle doc:
Set NLS_SORT Appropriately

The NLS_SORT parameter should indicate the linguistic definition you want to use for the linguistic sort. If you want a French linguistic sort order, then NLS_SORT should be set to FRENCH. If you want a German linguistic sort order, then NLS_SORT should be set to GERMAN.

If I have a linguistic index, why would I still need to set NLS_SORT to a particular language? I thought the index had already sorted the data in the respective language.


Further, on linguistic indexes: the Oracle documentation provides an option to create a single linguistic index for all languages. The following example builds a single linguistic index for multiple languages. With this index, the rows with the same values for NLS_LANGUAGE are sorted together.

CREATE INDEX i ON t (NLSSORT(col, 'NLS_SORT=' || LANG_COL));

I tried it and got a syntax error. Is LANG_COL a variable?

Thanks,


Tom Kyte
September 02, 2011 - 11:15 am UTC

If I have a linguistic index, why would I still need to set NLS_SORT to a particular language? I thought the index had already sorted the data in the respective language.


you need to tell us to use it. You would either have to code the nlssort call in your order by OR set your nls_sort and we'll automagically do it for you.




http://docs.oracle.com/docs/cd/B19306_01/server.102/b14225/ch5lingsort.htm#sthref651

lang_col is, as per the documentation, another column in that table that tells us what LANGUAGE that row is.

Build a single linguistic index for all languages. This requires a language column (LANG_COL in "Example: Setting Up a French Linguistic Index") to be used as a parameter of the NLSSORT function. The language column contains NLS_LANGUAGE values for the data in the column on which the index is built. The following example builds a single linguistic index for multiple languages. With this index, the rows with the same values for NLS_LANGUAGE are sorted together.

still confused :-(

A reader, September 02, 2011 - 11:35 am UTC

About linguistic indexes: I am thinking the 2nd option in the document link you sent creates one index that is applicable to any language. Am I correct in this assumption? The first option is clear - one index for each language.

However, if I look at the syntax for option 1 and option 2, they are both the same:

CREATE INDEX test_idx ON test4(NLSSORT(name, 'NLS_SORT=FRENCH'));  --2nd option


CREATE INDEX french_index ON employees (NLSSORT(employee_id, 'NLS_SORT=FRENCH')); --first option

How is option 2 different from option 1?

I also tried the 2nd option on my test database and got errors:

SQL> desc diff_char3
 Name                 Null?    Type
 -------------------- -------- --------------
 ADDR                          NUMBER
 NAME                          VARCHAR2(100)
 LANG                          VARCHAR2(20)

SQL> create index idx1 on diff_char3 (NLSSORT(col,'NLS_SORT='||name));
create index idx1 on diff_char3 (NLSSORT(col,'NLS_SORT='||name))
                                         *
ERROR at line 1:
ORA-00904: "COL": invalid identifier


SQL> create index idx1 on diff_char3 (NLSSORT(name,'NLS_SORT='||name));
create index idx1 on diff_char3 (NLSSORT(name,'NLS_SORT='||name))
                                                           *
ERROR at line 1:
ORA-12702: invalid NLS parameter string used in SQL function

But if put FRENCH, the it creates (but this is like option 1).
SQL> create index idx1 on diff_char3 (NLSSORT(name,'NLS_SORT=FRENCH'));

Index created.


Tom Kyte
September 02, 2011 - 12:17 pm UTC

this is the concept they were trying to demonstrate. Their table had a column COL and a column LANG_COL. Since your table has NAME and LANG, you will have to change that, of course:

ops$tkyte%ORA11GR2> create table t ( name varchar2(20), lang varchar2(20) );

Table created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> insert into t values ( 'abc', 'FRENCH' );

1 row created.

ops$tkyte%ORA11GR2> insert into t values ( 'def', 'GENERIC_M' );

1 row created.

ops$tkyte%ORA11GR2> 
ops$tkyte%ORA11GR2> create index t_idx on t(nlssort(name,'nls_sort='||lang));

Index created.


confusion gone :-)

A reader, September 02, 2011 - 1:02 pm UTC

Oh.. my interpretation was completely different. Thanks for clarifying it.


NLS_SORT Confused Why It Does Not Take

Jim Cox, October 06, 2011 - 1:57 pm UTC

Hi Tom


I am unsure what I am doing incorrectly regarding NLS_SORT.

I set the NLS_SORT='Generic_M' in my PFile
I restarted the database
The Alert Log shows (excerpt):

  timed_statistics         = TRUE
  streams_pool_size        = 64M
  nls_sort                 = "Generic_M"
  _first_spare_parameter   = 1
  memory_target            = 2G
  memory_max_target        = 2G

When I check the database:
SQL> show parameter nls_sort

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------
nls_sort                             string      BINARY

If I do

ALTER SESSION SET NLS_SORT='GENERIC_M';

it looks fine

SQL> show parameter nls_sort

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------
nls_sort                             string      GENERIC_M

Tom Kyte
October 06, 2011 - 2:06 pm UTC

same reason as this:

http://asktom.oracle.com/pls/asktom/ASKTOM.download_file?p_file=6551282707163926958

probably.... client has some NLS settings set - and if they have any set - then none of the database defaults are used.

NLS_SORT Confused Why It Does Not Take-Follow-up

Jim Cox, October 06, 2011 - 2:24 pm UTC

Hi Tom

I have no clients connected
Strictly on the server myself, no other connects

Thanks
Jim
Tom Kyte
October 06, 2011 - 2:51 pm UTC

*you* are the client, you have things set in your registry.

in order to run a query there must have been a client - and that was *you*!

NLS_SORT Confused Why It Does Not Take-Follow-up

Jim Cox, October 06, 2011 - 2:45 pm UTC

Hi Tom

Well, it looks like it has to be entered into the Windows registry to take effect. Placing it into the PFile does nothing as far as I can see.

Thanks
Jim
Tom Kyte
October 06, 2011 - 3:01 pm UTC

again - it is because the registry is the client's registry... You had some NLS settings in the registry (the windows install always puts something in there) - hence your database default nls settings won't be used.

NLS_SORT Confused Why It Does Not Take-Follow-up

Jim Cox, October 06, 2011 - 4:39 pm UTC

Hi Tom


Okay, then is this a correct assumption? If the only "NLS" type entry I have in the oracle home registry on my server is NLS_LANG, then if I remove that entry and add the NLS_SORT entry into the PFile and restart the database, are you saying that it (NLS_SORT) will now show correctly ?

Sorry for being so dense about this one

Thanks
Jim
Tom Kyte
October 06, 2011 - 10:22 pm UTC

If you have any NLS settings set in your registry - then none of the database defaults will "take"

So, if you have NLS_LANG in your client setup, none of the database defaults will take place.

See that link above - what I said about nls_date_format applies 100% to this.

NLS_SORT Confused Why It Does Not Take-Confused Again

Jim Cox, October 07, 2011 - 12:09 pm UTC

HI Tom

I created a simple test case and I still do not understand why the NLS_SORT does not take effect until after I do an ALTER SESSION. Can you explain to me what I am missing?

Thanks for your Patience
Jim

create table sorting
(
col_string varchar2(40)
);

insert into sorting values ('A');
insert into sorting values ('B');
insert into sorting values ('C');

insert into sorting values ('a');
insert into sorting values ('b');
insert into sorting values ('c');

commit;



10g Database

show parameter nls_sort;

NAME TYPE VALUE
------------------------------------ ----------- ----------
nls_sort string

select col_string from sorting order by col_string;

COL_STRING
----------------
A
B
C
a
b
c


shutdown immediate
modify Pfile (add NLS_SORT='Generic_M')

startup

show parameter nls_sort;

NAME TYPE VALUE
------------------------------------ ----------- ----------
nls_sort string Generic_M


select col_string from sorting order by col_string;

COL_STRING
----------------------------------------
A
B
C
a
b
c

alter session set nls_sort='Generic_M';


select col_string from sorting order by col_string;

COL_STRING
---------------------------------------
a
A
b
B
c
C
Tom Kyte
October 07, 2011 - 2:15 pm UTC

please read this:

http://asktom.oracle.com/pls/asktom/ASKTOM.download_file?p_file=6551282707163926958


if you have the NLS_LANG set in your registry, then the client will use the *defaults for that nls_lang and your settings in the database just don't count anymore*


I GIve Up

Jim Cox, October 07, 2011 - 4:16 pm UTC

Hi Tom

I have read that document over and over and over and I don't get it. The only thing I get is that any NLS setting on the client overrides any server/database setting. I am not on a client, I have no client connections. I am on the server. Just me and the server. Only me as a connection on the server. I will have to re-visit this next week as I have now been on this for two days and still do not understand why the NLS_SORT parameter in the PFILE will not work correctly on the server.

Thanks for all your time :-(
Tom Kyte
October 07, 2011 - 4:23 pm UTC

Jim

YOU are the client.

when you go into sqlplus - YOU are the client. You are the client connection.

Without being a client - you would not realize there was an issue - you have to be a client to see anything in the database.

*you* are also a client - if you are in sqlplus, if you are logged in, you are by definition *a client*

A client doesn't have to reside far away - it can be right there next to the database, on the same machine.

Azhar, October 24, 2011 - 9:42 am UTC

Hello Tom,

Our database uses the character set WE8ISO8859P15 and the client uses the character set WE8ISO8859P1, so my questions are: 1) Is there any impact of using different character sets on the client and the database side? 2) What is the difference between the character sets WE8ISO8859P1 and WE8ISO8859P15?

Regards,
Azhar
Tom Kyte
October 24, 2011 - 10:37 am UTC

1) yes, character set translation will take place. If you have a character stored in the database that is represented differently in the client character set - we'll convert it.

2) you'll have to research that - we don't make up the charactersets - we just implement them. I'm not intimately familiar with the bits and bytes differences between them all myself - google them up...

Loading spanish characters into WE8ISO8859P1

Sita, October 24, 2011 - 10:50 am UTC

Hi Tom:

Oracle Version: 11.2.0.1

Our DB character set is WE8ISO8859P1.
The application team has received Spanish characters encoded in UTF-16BE. These need to be loaded into a couple of tables only.
This is the first time they are receiving such data from the SAP system. We would like to handle this at the table level rather than the database level.

Can you please advise what's the best way to load these Spanish characters:

1. Do we need to change the DB character set ?
2. Can we change the column datatype to Unicode?
   If yes, how do we change it?


Note: Since only a couple of tables are expected to receive the Spanish character load, we ideally do not wish to change the entire DB character set.

Thanks!


SQL> select * from nls_database_parameters ;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8ISO8859P1
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_RDBMS_VERSION              11.2.0.1.0

20 rows selected.


Tom Kyte
October 24, 2011 - 11:07 am UTC

... We would like to
handle this at the table level rather than database level. ...

I don't know what you mean by that.

Tables do not have a characterset, only databases do.


You should be able to load that into your database - the client would just say "Hey, I'm this characterset" and the database will convert the utf data into 8 bit western european data.

WE8ISO8859P1 can hold spanish characters.

Loading spanish characters into WE8ISO8859P1

Sita, October 24, 2011 - 12:37 pm UTC

Thanks Tom.

<< WE8ISO8859P1 can hold spanish characters

Glad that I do not have to change the character set.


<< You should be able to load that into your database - the client would just say "Hey, I'm this characterset" and the database will convert the utf data into 8 bit western european data.

Only one particular field (the TRANSACTION_DESCRIPTION column) in the feed has Spanish characters embedded.

I believe you mean the NLS_LANG setting on the client side?
How do we specify export NLS_LANG=? at the UNIX shell prompt for Spanish?

Can you please show us the NLS_LANG setting.



Tom Kyte
October 24, 2011 - 12:55 pm UTC

the client would have to set an appropriate UTF character set since you said the data you were to load was encoded in that format.

export NLS_LANG=American_America.AL32UTF8

replace the AL32UTF8 with the character set that matches the encoding of your data.

Loading spanish characters into WE8ISO8859P1

Sita, October 24, 2011 - 1:26 pm UTC

Thanks Tom.

In my case, the Spanish characters are encoded in UTF-16BE.

So I would think corresponding NLS_LANG on client side would be:

export NLS_LANG=American_America.AL16UTF16


Please correct me if this setting is wrong.
Tom Kyte
October 25, 2011 - 5:57 am UTC

Read about BE/LE

http://en.wikipedia.org/wiki/UTF-16

and decide if you need a BOM marker in order to deal with windows. Otherwise, sure.

character set and storage space

Amir Riaz, October 25, 2011 - 8:05 am UTC

Hi Tom,

My objective is to pack as many rows into a 32K block as I can and then compress the table.

After much study, we have found that AL32UTF8 takes 1 to 4 bytes per character (your book, page 499). We are planning to change the character set to WE8ISO8859P1, since our data is English and for the America region.

We changed our database block size from 4K to 32K and compressed it; performance is now satisfactory, and our objective is to compress further with the proper character set. Since WE8ISO8859P1 takes one byte and we are not using any language other than English, why waste storage with AL32UTF8?

The problem is, I have a 25TB data warehouse on a two-node RAC, 32-core machine with insufficient SAN IOs.

Will a change in character set help? Please suggest. Any suggestion to make the data compression better is welcome. Our clients cannot purchase Exadata, so we have to do it on the given hardware. It would be easier for us if storage indexes and EHCC were present in the Oracle database, because bitmap indexes are totally useless at this point and we are living on packing data into big blocks and partitioned tables. It's miserable.
Tom Kyte
October 25, 2011 - 9:21 am UTC

you are doing it backwards, you want to compress the data during load - and pack as many rows into that 32k block as you can. You don't want to pack rows into a block - and then have to reorganize the entire table just to compress it.

AL32UTF8 is probably not wasting any storage for you - if you are only using basic characters - almost all of your data is already single byte. AL32UTF8 can use from 1 to 4 - but for most of us - it just uses 1.
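
For example, a quick check (a minimal sketch - it assumes an AL32UTF8 database) shows plain ASCII still costing one byte per character:

select vsize('abc') ascii_bytes,    -- 3 bytes: one per character
       vsize('abç') accented_bytes  -- 4 bytes: the ç takes two
  from dual;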


Indexes, anybody's indexes, are good for:


As the means to access rows in a table: You will read the index to get to a row in the table. Here you want to access a very small percentage of the rows in the table.

As the means to answer a query: The index contains enough information to answer the entire query—we will not have to go to the table at all. The index will be used as a “thinner” version of the table.

As the means to optimize for initial response time: You want to retrieve all of the rows in a table, including columns that are not in the index itself – in some sorted order – the index will possibly allow for immediate response (but slower overall throughput).



You do not use anyone's indexes to retrieve millions of rows from a table - indexes do not do that efficiently (nice full scans do that)

You do use indexes instead of the table many times (so creating a set of single column bitmap indexes that can be combined to be used INSTEAD of the table makes a great deal of sense in many cases).

You do use indexes to retrieve a few rows from a table (when you start talking millions, forget about it - that is true of *anyone's* index).

You do use indexes to perform "initial response time" optimization (first_rows(n)).



You have a probable IO problem - insufficient disk spindles to serve up data, coupled with insufficient bandwidth to move 25tb of data back and forth between the disk and the server, coupled with a shared bit of hardware (guessing you are using a SAN that is used by others - see it all of the time). Forget indexing - look at your hardware - do you have the real capability to move the data from disk to the server over and over.

Hardware - insufficient hardware - will completely limit you with any database, every time.




You can sometimes squeeze out a bit more compression by sorting your data before/as you load it. If you order the data by things that tend to repeat a lot and are "big" (wide), you can get better compression ratios since the compression you are using is based on removing redundant fields. Also - make sure you've normalized things out so you don't have artificially induced repetition (use dimension tables)
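
For instance, a minimal sketch of compressing on load (the table and column names are made up for illustration):

create table t_packed compress
as
select *
  from t_source
 order by wide_repetitive_column;  -- cluster the repeating wide values together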

Read a Chinese character from Database

Su, November 05, 2011 - 1:22 am UTC

Hi Tom,

I tried to store a Chinese character in my database. It worked using the NLS character set, but it is stored as ?? (this should be fine). When I try to retrieve the data it shows only ??, but I want to see it as a Chinese character. Is this possible using a SQL query or in some other way? Please help me with this.

Chinese character i have stored - 杏仁

create table unicode(chin varchar(100 char));

insert into unicode (chin) values (N'??' );

SELECT *
FROM unicode;

chin
--------
??

Thank you,
Su
Tom Kyte
November 07, 2011 - 11:37 am UTC

you need to be using a terminal with settings set to enable the display of chinese characters

you need to have your nls settings set to chinese in that terminal


so either your terminal is not set to display the character you are trying to see or your nls_lang is not set appropriately to fetch chinese data, or the session you used to insert the data in the first place had this issue as well and you didn't actually insert chinese data.

use the DUMP function to see what is physically stored in the database to rule out the second issue (that of the data maybe not being inserted correctly in the first place)
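
Using the table from your example, something like this would show the stored bytes (DUMP with 1016 prints the character set and hex codes):

select chin, dump(chin, 1016) from unicode;

If you see ef,bf,bd repeated (the UTF-8 bytes of the Unicode replacement character) instead of real Chinese code points, the data was already corrupted on the way in.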


Saving Russian in NVARCHAR2 column

Rima, December 15, 2011 - 5:23 pm UTC

Hi Tom,
I'm trying to save russian characters in a column of type NVARCHAR2 but I'm getting question marks. The national character set is AL16UTF16. The client terminal has UTF-8 character set in the settings. Is it possible that the conversion from UTF-8 to AL16UTF16 is corrupting the data?

U@STAGEDB> create table test (n nvarchar2(50), c varchar2(50));

Table created.

U@STAGEDB> insert into test values ('хелло', 'хелло');

1 row created.

U@STAGEDB> select N, dump(N, 1016) N_DUMP, C, dump(C,1016) C_DUMP from test;

N               N_DUMP                                                                                               C               C_DUMP
--------------- ---------------------------------------------------------------------------------------------------- --------------- ----------------------------------------------------------------------------------------------------
??????????      Typ=1 Len=20 CharacterSet=AL16UTF16: 0,bf,0,bf,0,bf,0,bf,0,bf,0,bf,0,bf,0,bf,0,bf,0,bf               ??????????      Typ=1 Len=10 CharacterSet=WE8ISO8859P1: bf,bf,bf,bf,bf,bf,bf,bf,bf,bf


I have the following nls parametes set on the database and the nls_language on the session is also AMERICAN:

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8ISO8859P1
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              10.2.0.4.0

Thanks!

Tom Kyte
December 16, 2011 - 5:20 pm UTC

you need to use

n'blah blah blah'


the n before the literal is important - when using literals. You'll be binding a nvarchar2 in real life though - right :) ???
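
A minimal sketch against your test table - the first insert shows the problem, the second uses the N prefix (assuming the client can pass the Cyrillic through correctly):

insert into test values (  'хелло', 'хелло' );  -- literal passes through the database character set, Cyrillic lost
insert into test values ( n'хелло', 'хелло' );  -- N-literal treated as national character set data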

Saving Russian in NVARCHAR2 column

Rima, December 27, 2011 - 2:48 pm UTC

Hi Tom,
Thanks for the answer. Of course I am not using literals in the code, I was just using this to make sure the DB allows unicode.
So after many days of searching and reading, I found the answer in the globalization guide (one paragraph in Chapter 7): it turns out that I have to add the property setting -Doracle.jdbc.defaultNChar=true on the application server so that the DB doesn't convert all the incoming Unicode data to the main character set.
I found it very weird that this important piece of information is hidden and not communicated more clearly. So I am doubting that this is the right solution. I would expect that the database would know to expect unicode data for columns defined as NVARCHAR2, and wouldn't need such a property setting.
So can you please confirm that this is the right way to deal with NVARCHAR2 columns?
Thanks again!
Rima


Tom Kyte
December 27, 2011 - 3:16 pm UTC

you have to bind properly:

http://docs.oracle.com/cd/B28359_01/java.111/b31224/global.htm#CHDHHJDB

describes this.


<quote>
The default value of defaultNChar is false. If the value of defaultNChar is false, then you must call the setFormOfUse(<column_Index>, OraclePreparedStatement.FORM_NCHAR) method for those columns that specifically need national-language characters. For example:

PreparedStatement pstmt =
conn.prepareStatement("insert into TEST values(?,?,?)");
pstmt.setFormOfUse(1, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(1, myUnicodeString1); // NCHAR column
pstmt.setFormOfUse(2, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(2, myUnicodeString2); // NVARCHAR2 column
If you want to set the value of defaultNChar to true, then specify the following at the command-line:

java -Doracle.jdbc.defaultNChar=true myApplication
If you prefer, then you can also specify defaultNChar as a connection property and access NCHAR, NVARCHAR2, or NCLOB data.

If the value of defaultNChar is true, then you should call the setFormOfUse(<column_Index>, OraclePreparedStatement.FORM_CHAR) for columns that do not need national-language characters. For example:

PreparedStatement pstmt =
conn.prepareStatement("insert into TEST values(?,?,?)");
pstmt.setFormOfUse(3, OraclePreparedStatement.FORM_CHAR);
pstmt.setString(3, myString); // CHAR column

</quote>

Unicode DB and NLS_LENGTH_SEMANTICS

A Reader, January 04, 2012 - 11:51 pm UTC

Sergiusz Wolicki of Oracle recommends that we should not set nls_length_semantics=char at the instance level. One of the problems he mentioned is that columns may be defined too big. Now, if I have a new database with the characterset set to AL32UTF8, what is the problem with nls_length_semantics set to CHAR at the instance (rather than session) level? One of the advantages is that we will not have column storage being defined too small (because Unicode may use more storage). Application vendors cannot guarantee setting nls_length_semantics at the session level in all scripts or specifying, for example, VARCHAR(20 CHAR) instead of VARCHAR(20) in all DDLs.
Tom Kyte
January 05, 2012 - 9:32 am UTC

I don't recommend changing many of the defaults - agreed, it makes you "special", "different" and subject to strange situations.

Application vendors cannot guarantee setting
nls_length_semantics at the session level in all scripts or specifying, for
example, VARCHAR(20 CHAR) instead of VARCHAR(20) in all DDLs.


In particular - you should not do something like this for 3rd party applications.

They haven't tested it.
They probably don't support it.
Strange things could happen (because you are non-default)

You want to default as much as possible.


When installing the 3rd party application - you can only use char semantics IF the vendor explicitly supports it.

Unicode DB and NLS_LENGTH_SEMANTICS

A Reader, January 05, 2012 - 3:01 pm UTC

Thanks Tom.
The application is new and supporting multi-byte characters is part of the requirements (as agreed with the vendor). Hence, we actually want the application to fail if there are column semantics issues. Hopefully, we find out all the length semantics issues during testing.

We are running Oracle 11.2 and this is a new db, no legacy applications or applications not supporting multi-byte characters will be running on it.

Some of the Oracle 10.2 documents say that nls_length_semantics (at instance level) is not applicable to SYS and SYSTEM sessions. 11.2 documents say it is not applicable to SYS sessions.

Is/Will this setting applicable to SYSTEM sessions in 11.2?

We can test this (SYSTEM sessions) out in 11.2 but we cannot tell what will be coming in the future.

Thanks again.

Tom Kyte
January 05, 2012 - 3:52 pm UTC

The application is new and supporting multi-byte characters is part of the
requirements (as agreed with the vendor). Hence, we actually want the
application to fail if there are column semantics issues. Hopefully, we find
out all the length semantics issues during testing.


why doesn't the vendor use the CHAR semantics in their create table or explicitly issue the alter session to have it in effect when they need it?


Is/Will this setting applicable to SYSTEM sessions in 11.2?


If it is, that would be bad for you and yet another reason to never ever set it at the instance level.

It just isn't necessary. If the vendor

a) supports it
b) wants it

it should be part of their installation - seriously.






Unicode and NLS_LENGTH_SEMANTICS

A Reader, January 05, 2012 - 9:19 pm UTC

Hi Tom,
Thanks for your quick response.
The application is new to us but pretty old for the vendors.

We also have 10.2 single byte and multi-byte databases running for years. In the single byte db, all the character string columns in SYSTEM schema are defined as BYTE semantics. In the multi-byte db with NLS_LENGTH_SEMANTICS set to CHAR at the instance level, some columns are defined as BYTE and some as CHAR semantics. This appears to indicate that the 10.2 documentation is not accurate and that SYSTEM scripts cater for differences in settings. The NLS_LENGTH_SEMANTICS appears to be applicable to SYSTEM sessions.
Tom Kyte
January 10, 2012 - 9:18 pm UTC

what was the nls_length_semantics when the database was very very very first created.

sys will ignore it (nls_length_semantics) in the init.ora and as an alter session, but if sys explicitly issues "create table t ( x varchar2(2 char) )" it will be respected.
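
For example (a minimal sketch):

create table t ( x varchar2(2 char) );  -- explicit CHAR in the DDL is always honored, even for SYS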

As for the vendor, if they support it, they would support it, their scripts would explicitly use CHAR in the datatype definition or they would alter session have it there.


give an explicit pointer into the documentation you are referring to on otn.oracle.com please.


I will not however change my stance that this nls setting should be left at the default in the init.ora and you would change it when you WANTED to explicitly. Changing something like this globally is a bad idea.

nls_characterset issue?

AndyP, January 06, 2012 - 7:23 am UTC

Tom,

I am getting the following errors, which seem to me to be related in some way to the characterset (although the 2 errors may be unrelated of course). However, to my knowledge this (the nls_characterset) has not changed, and this used to work for me. Any suggestions as to what to check?
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

sql > select * from nls_database_parameters;

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CHARACTERSET               WE8MSWIN1252
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_RDBMS_VERSION              11.2.0.1.0

20 rows selected.

sql > c.database.session
  1* select * from nls_session_parameters
sql > /

PARAMETER                      VALUE
------------------------------ ----------------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_SORT                       BINARY
NLS_TIME_FORMAT                HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT             HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY              $
NLS_COMP                       BINARY
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CONV_EXCP            FALSE

17 rows selected.

sql > set autotrace traceonly explain
sql > select * from dual;

Execution Plan
----------------------------------------------------------
An uncaught error happened in fetching the records : ORA-00600: internal error code, arguments: [kokle_lob2lob13:input mismatch], [1
], [], [], [], [], [], [], [], [], [], []

ORA-00600: internal error code, arguments: [kokle_lob2lob13:input mismatch], [1], [], [], [], [], [], [], [], [], [], []

sql > set autotrace off

sql > select count(*) from all_objects;
select count(*) from all_objects
                               *
ERROR at line 1:
ORA-06552: PL/SQL: Compilation unit analysis terminated
ORA-06553: PLS-553: character set name is not recognized

sql > c.all.user
  1* select count(*) from user_objects
sql > /

  COUNT(*)
----------
       167

sql > c.user.dba
  1* select count(*) from dba_objects
quiz > /

  COUNT(*)
----------
     75528



Thanks in advance
Tom Kyte
January 10, 2012 - 9:34 pm UTC

ora-600 always means one thing: please contact support, something that was not supposed to happen just happened.

Convert US7ASCII to UTF8

A reader, January 10, 2012 - 8:07 pm UTC

Hello,

We have a database which is on version 11.1 but the nls_characterset is set to US7ASCII (this database has been around for years and was probably upgraded from 8i). The application team now wants their database to store Russian (Cyrillic) characters. What options do I have? I think US7ASCII won't work. If so, how can I convert my database to AL32UTF8 without losing or corrupting any data?

Thanks,


Tom Kyte
January 10, 2012 - 10:31 pm UTC

follow up,

A reader, January 13, 2012 - 1:56 pm UTC

Thanks for the URL. It looks like I have two options to change the database character set - 1. using full exp/imp and 2. using csalter

With regards to storage, if my US7ASCII database is 100 GB (for example) and if I convert the database character set to UTF8, is there a chance that the size of the database increases after conversion?

Thanks,

Tom Kyte
January 17, 2012 - 11:01 am UTC

is there a chance that the size
of the database increases after conversion?


not until you start putting stuff in there after the conversion that takes more than one byte. us7ascii data is still one byte in utf

Encrypt Unicode characters using DBMS_CRYPTO

Rima, January 19, 2012 - 5:48 pm UTC

Hi Tom,
We have some columns in our database that contain sensitive information, so we use the DBMS_CRYPTO package to encrypt the data before saving it.
The database NLS_CHARACTERSET is WE8ISO8859P1, and NLS_NCHAR_CHARACTERSET is AL16UTF16. The data we want to encrypt is multi-lingual; we are dealing with encrypting Russian characters. The columns are defined as NVARCHAR2 (but I guess that's not necessary).

We are able to save correct russian data now into other NVARCHAR2 columns (not encrypted), but when we call DBMS_CRYPTO.ENCRYPT, we are losing the non latin characters.

In the DBMS_CRYPTO documentation, it is mentioned under Restrictions:

"The VARCHAR2 datatype is not directly supported by DBMS_CRYPTO. Before you can perform cryptographic operations on data of the type VARCHAR2, you must convert it to the uniform database character set AL32UTF8, and then convert it to the RAW datatype. After performing these conversions, you can then encrypt it with the DBMS_CRYPTO package."

Nothing is mentioned regarding NVARCHAR2, and there is no conversion function (equivalent to UTL_I18N.STRING_TO_RAW ) that transforms NVARCHAR2 to RAW. Since data in NVARCHAR2 is already Unicode, I tried to use UTL_RAW.CAST_TO_RAW but that didn't work. Looking in UTL_I18N, there is a RAW_TO_NCHAR function, but only a STRING_TO_RAW function, and that takes a VARCHAR2 as input. There are no functions that take NVARCHAR2 input variables.

Can you please help with this? Is there another way to encrypt Unicode characters when the NLS_CHARACTERSET is not UTF?

Thanks!

Tom Kyte
January 20, 2012 - 9:39 am UTC

I don't care what your character set is.

encrypted data can only be stored in a RAW column, we don't use a characterset for that.

You need to change the underlying datatype of the column to RAW to store this encrypted data, there is no way around that. Well, I guess you could DOUBLE the length of the existing fields (assuming all are well under 2000 bytes) and store a hex string that represents the bytes of the encrypted binary data.


If you are storing encrypted data in a varchar2/nvarchar2 - you have done it wrong, you need to start over and create a new raw field that will be a multiple of 16 bytes and is larger than your existing field. For example, if you have a varchar2(30), you need to use a RAW(32)

No if, and, or buts about it.
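
To illustrate, a minimal sketch of the round trip (key management is glossed over - the key, variable names, and sizes are illustrative). Note that UTL_I18N.STRING_TO_RAW accepts NVARCHAR2 as well as VARCHAR2:

declare
  l_key       raw(32) := dbms_crypto.randombytes( 32 );  -- throwaway AES-256 key, demo only
  l_src       nvarchar2(30) := n'хелло';
  l_encrypted raw(64);                                   -- multiple of 16 bytes, larger than the source
  l_decrypted nvarchar2(30);
begin
  l_encrypted := dbms_crypto.encrypt(
                   src => utl_i18n.string_to_raw( l_src, 'AL16UTF16' ),
                   typ => dbms_crypto.encrypt_aes256 + dbms_crypto.chain_cbc + dbms_crypto.pad_pkcs5,
                   key => l_key );

  l_decrypted := utl_i18n.raw_to_nchar(
                   dbms_crypto.decrypt(
                     src => l_encrypted,
                     typ => dbms_crypto.encrypt_aes256 + dbms_crypto.chain_cbc + dbms_crypto.pad_pkcs5,
                     key => l_key ),
                   'AL16UTF16' );
end;
/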

UTL_I18N.ESCAPE_REFERENCE encoding problem

rahul, November 22, 2012 - 4:02 am UTC

Hi Tom,

I am facing a problem with the UTL_I18N.ESCAPE_REFERENCE function. UTL_I18N.ESCAPE_REFERENCE and UTL_I18N.UNESCAPE_REFERENCE don't appear to work in sync.
E.g., below is the problem.

When I use the UNESCAPE_REFERENCE function it decodes the data to the correct format:

SQL> select UTL_I18N.UNESCAPE_REFERENCE('&#x3c;&#x3e;') x from dual;
X
------

<>


But when I use ESCAPE_REFERENCE it does not encode it back to the original encoded string.
SQL>
SQL>
SQL> select UTL_I18N.ESCAPE_REFERENCE('<>') x from dual;

X
------------
&lt;&gt;

To summarise the problem:

                    Input    Output
UNESCAPE_REFERENCE  &#x3c;   <
ESCAPE_REFERENCE    <        &lt;

As you see escape_reference does not generate the expected encoding.

The only reference I have found in this regard is the below link:
http://www.htmlhelp.com/reference/html40/entities/special.html

Any help in this regard is appreciated.

Thanks 
Rahul. 



Tom Kyte
November 29, 2012 - 6:32 am UTC

ops$tkyte%ORA11GR2> select utl_i18n.unescape_reference( UTL_I18N.ESCAPE_REFERENCE( '<>' )) from dual;

UTL_I18N.UNESCAPE_REFERENCE(UTL_I18N.ESCAPE_REFERENCE('<>'))
-------------------------------------------------------------------------------
<>




a) it works

b) it did not generate an unexpected encoding, rather - you thought it worked differently. Your expectations were incorrect.



whenever there are multiple ways to encode something - someone has to pick one and say "that is it"

it is 100% legitimate (and 'more normal') to encode '<' as '&lt;'. That is the commonly accepted encoding for that particular character.


just because there is more than one way to encode it does not mean the one we chose is "wrong"

there are three ways to encode <, we chose the most commonly accepted one.

rahul, November 30, 2012 - 4:51 am UTC

Thanks Tom for the explanation.

I agree with what you say.

"whenever there are multiple ways to encode something - someone has to pick one and say "that is it"

it is 100% legitimate (and 'more normal') to encode '<' as '&lt'. That is the commonly accepted encoding for that particular character. "

But when Oracle decodes '&lt' and '&#x3c;' both as '<' (understands both encoding and decodes to the right character), is there no legitimate way to select the type of encoding?

How do i decode '<' as '&#x3c;' likewise?




Tom Kyte
November 30, 2012 - 6:11 am UTC

... is there no legitimate way to
select the type of encoding? ...


yes there is, and we chose it - we selected it.


Tell me this - why does it matter to you?

Since you can encode a string - and anything that accepts an encoded string using a valid encoding scheme will be able to decipher it properly.

And since we can take that encoded string and decipher it back into the unencoded string...

what matter is it to you which encoding value we use?


(no, you cannot change the encoding scheme - but - why does it matter????? )

A reader, December 04, 2012 - 2:16 am UTC

Hi Tom,

I have a requirement wherein I receive data in the encoded format and I have to decode it and store it in the database. This is directly achieved using the UTL_I18N.UNESCAPE_REFERENCE() function.

Also, when the application retrieves the data it should be able to retrieve the original string, meaning the string should be encoded back in the format in which it was received. But when I use UTL_I18N.ESCAPE_REFERENCE() it returns the string in the other encoding.

If you refer to my example
SQL> select UTL_I18N.UNESCAPE_REFERENCE('&#x3c;&#x3e;') x from dual;
X
------

<>


But when I use ESCAPE_REFERENCE it does not encode it back to the original encoded string.
SQL>
SQL>
SQL> select UTL_I18N.ESCAPE_REFERENCE('<>') x from dual;

X
------------
&lt;&gt;


So I did not get the exact string I had received.

How do I achieve this?

Regards,
Rahul.

Tom Kyte
December 07, 2012 - 4:35 am UTC

Also while the application retrives the data it should be able to retrieve the
original string, meaning again encode the string back in the format it was
received.


let me ask you this - what if they start sending you the other encoding - or both encodings?

You have a polymorphic function here - the encoding is one to many - a single character can be encoded in many ways. therefore you can never rely on getting back what you put in after a decode/encode routine. You get back logically consistent data - but it might not be the same bits and bytes.

think about this - either you have to

a) develop a standard with them that says "X will be represented as Y and Y only". Do this for *all* special characters. Then write your own encode/decode routines.

b) store both the original string and the decoded string

c) store the original string only, decode it on demand when you retrieve it

d) change your requirement, most times requirements are not actually requirements - just requests, requests that can easily be put aside


(c) is by far the easiest and probably the most technically correct as well. You can choose.
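
A minimal sketch of (c), with an illustrative table name - store exactly what arrived, decode only on the way out:

create table messages ( escaped_text varchar2(4000) );

select utl_i18n.unescape_reference( escaped_text ) decoded
  from messages;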

Link to same question

Michel Cadot, December 07, 2012 - 6:13 am UTC


For the record, same question posted at:
http://www.orafaq.com/forum/mv/msg/185038/571225/102589/#msg_571225
Regards
Michel

Rahul, December 10, 2012 - 12:58 am UTC

Thanks for your advice.

The string received will always be in the mentioned encoding. But since we don't have a standard function available in Oracle, as suggested, I will have to write my own.

Thanks for your help.

@Michel: Yes, I did post the same question on 2 sites at the same time so that it reaches a wider audience and I get the query answered faster.



Tom Kyte
December 14, 2012 - 2:23 pm UTC

you do have a standard function.

the problem is - you need a special, specific implementation of it - very customized to you and your environment.


@Michel: Yes, I did post the same question on 2 sites at the same time so that
it reaches a wider audience and I get the query answered faster.


that is called the machine gun approach and it is very annoying to people that take the time to answer your questions since they are typically doing redundant work. In other words - you wasted other peoples time, maybe mine, maybe theirs.

A reader, December 13, 2012 - 4:44 am UTC

how to change database chracter-set use following link
http://aqeelabbas.blog.com/how-change-oracle-database-chracter-set/
Tom Kyte
December 17, 2012 - 3:50 pm UTC

I have to say, that is the least readable background I've seen anyone use in a long long time.

here is the official documentation on this
http://docs.oracle.com/cd/E11882_01/server.112/e10729/ch2charset.htm#i1007203


testin data

sam, December 16, 2012 - 7:28 pm UTC

Tom:

If I am migrating a 9i WE8ISO8859P1 database to a new server with an 11gR2 Unicode/UTF8 database.

First, I want to run CSSCAN and review the reports to ensure no data loss.

But is it a valid test to create another WE8ISO8859P1 database on the 11g machine (in addition to the Unicode one) and then do:

select * from T1@UTF8_DB
MINUS
select * from T1@WE8ISO8859P1_DB

SELECT * from T1@WE8ISO8859P1_DB
MINUS
select * from T1@UTF8_DB

If I get no difference, that confirms the data is the same in both databases for this table.

Is this a valid test to verify data in the unicode database?
Tom Kyte
December 18, 2012 - 12:27 pm UTC

as soon as you fetch from a database with character set A into a database with character set B - there is characterset conversion taking place. So, when you fetch the UTF data into the WE8ISO database - the UTF data is converted.

just use CSSCAN on the original database to see what data it reports as not being convertible.


and beware - some of your data might not fit in your new database since some of your characters might take 2 bytes or more in UTF.

data

A reader, December 18, 2012 - 12:38 pm UTC

Tom:

Yes, I forgot about the charset conversion when you do SELECT.

Do you think the CSSCAN report is sufficient for testing purposes - meaning if CSSCAN says everything is OK (no data loss or truncation), would you not do any more data checks?

I was thinking of dumping table data to a text file from each database and then running a Unix compare or diff to see if there are any differences reported.

I would like to test using the DB but like you said the conversion will always take place.

test data with conversion

sam, February 11, 2013 - 10:07 am UTC

Tom:

Do you know if Oracle 9i and Oracle 11g have a built-in function that can compute an MD5 hash or some other hash, so I can compare whether the row data is identical in the TAB1.ROW1@WE8ISO88591 9i database and the TAB1.ROW1@AL32UTF8 11g database?

I understand there might be some differences if some characters use multibyte storage instead of single byte, but most characters are single byte and should have the same code point in both databases.

If I calculate this hash and store the hash value as a column in each row and compare it between the source and target tables, I can tell if the data is identical.


Tom Kyte
February 11, 2013 - 10:14 am UTC

dbms_crypto package does that.

but, that was in 10g, not 9i.

You might be able to do something in a java stored procedure if computing an MD5 hash was part of the Java standard API back then (I'm not sure if it was or not..)
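
In 10g and later, a minimal sketch (table and column names are illustrative) - converting to a common character set before hashing, so a single-byte and a multibyte database produce comparable hashes for the same characters:

select dbms_crypto.hash(
         utl_i18n.string_to_raw( col1 || '|' || col2, 'AL32UTF8' ),
         dbms_crypto.hash_md5 ) row_hash
  from tab1;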

CS change

A reader, February 12, 2013 - 8:11 am UTC

Tom:

Do you agree with my idea of using row hashes to compare data in each database, or are there better methods?

I can also run a SQL query in each database, dump the results to a text file, and then do a DIFF on the files to see if they match or not.

Another option is SQL and MINUS in the source database, but the target DB data will in this case be converted back to the source CS.

What method do you think is the best?
Tom Kyte
February 12, 2013 - 8:17 am UTC

Sam -

and if it is not converted - I wouldn't expect them to match. The entire purpose of different character sets is to store the same data differently.

CS

A reader, February 12, 2013 - 8:30 am UTC

Tom:

I did not follow you. Am I wasting my time doing the above work...

Internally some characters will be stored differently.

Most standard ASCII characters' internal storage should be the same for both character sets since the code points are similar.

But I am comparing character display here. An "A" or ":" or "," should display the same in both databases when you view the data in a SQL*Plus report even though one database stores the character in a single byte and the other in multiple bytes.


Am I missing something?


Tom Kyte
February 12, 2013 - 8:34 am UTC

Sam,

if you only have ASCII, true it won't matter. But you are comparing different character sets so a normal assumption by a normal person would be "you have data that needs that character set" - and therefore won't be simple ASCII. Otherwise, ummm, what are you doing?

please think about this, we've (you and I - you Sam and me) have talked about this many many times before.

the entire goal, point, implementation, reason for being - of different character sets is <you put the answer here, I wrote it above>.


and you do know that the terminal settings for sqlplus can change the way the characters are displayed as well right?

And if that is the point, the goal, and you have already said "if we do it over a database link, character set conversion would happen" - you should be able to answer this yourself.

CS

A reader, February 12, 2013 - 8:44 am UTC

Tom:

okay, so in conclusion when you do a DB CS migration, would you just rely on the CSSCAN reports (before migration) and the export/import tool logs (after migration) to verify that all data was exported and imported/converted without any issues?

There is no need to inspect sample data? There are millions of records and it is impossible to check every row.
Tom Kyte
February 12, 2013 - 9:30 am UTC

Sam,

I wouldn't use export import. insert as select, create table as select, data pump, something efficient.

If you use SQL, you know the data has come over - along with any character set conversions (you are expecting that the data might not be exactly the same at the string level). If you use a tool like data pump, you'd have to verify the outcome of that tool - yes.

you can use the dbms_rectifier_diff package if you wanted to actually compare them in detail.

NLS_CHARACTER_SET & NLS_LENGTH_SEMANTICS

Chris, May 16, 2013 - 2:01 pm UTC

Hi Tom,

We get data from a schema in the SOURCE database to an identical schema in the TARGET database all day long via streams. Both databases are version 10.2.0.3 and both have NLS_LENGTH_SEMANTICS = BYTE.

SOURCE uses NLS_CHARACTER_SET = WE8MSWIN1252.
TARGET uses NLS_CHARACTER_SET = AL32UTF8.

But the British Pound sign, '£', takes 1 byte on SOURCE and 2 bytes on TARGET. One particular column in SOURCE, a varchar2(60), had a 60 character string, including the '£'. When it came over to TARGET as part of a streams LCR, it became 61, thus causing an error. Much pain and suffering ensued.

Both SOURCE and TARGET will be upgraded to Oracle 11g this year, so I think it would be a good time to get these databases synced up. In your opinion, what's the better method?

1. Make SOURCE use NLS_CHARACTER_SET = AL32UTF8. I thought this was recommended by Oracle.
2. On TARGET, use NLS_LENGTH_SEMANTICS = CHAR, so 60 characters will take up 60 characters. Why is that not the default?**
3. Both #1 and #2.

I was leaning towards #1. Your thoughts?

** Seriously, why is that not the default? If I'm thinking I want a column to hold a string that is N characters long, I think in characters, not in bytes. With the default of NLS_LENGTH_SEMANTICS = BYTE (and NLS_CHARACTER_SET = AL32UTF8), this kind of thing happens:

create table drop_me
  (
  currency_symbol varchar2(1)
  );
 
Table created
insert into drop_me(currency_symbol) values ('$');
 
1 row inserted
insert into drop_me(currency_symbol) values ('£');
 
insert into drop_me(currency_symbol) values ('£')
 
ORA-12899: value too large for column "GFPPROC"."DROP_ME"."CURRENCY_SYMBOL" (actual: 2, maximum: 1)


Thanks!
Chris
Tom Kyte
May 18, 2013 - 6:56 am UTC

I would use AL32UTF8 on both if at all possible, eliminating the chance of a truncation like that.

the datatypes use byte semantics for legacy reasons (that is the way it always was - in the olden days, 1 byte always was 1 character). That and a varchar2(4000 char) might not be able to hold 4000 characters - it is sort of lying to you - it can hold AT MOST 4000 characters, but might only be able to hold significantly less.

a varchar2(4000) on the other hand holds 4000 bytes - regardless.
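
To see the two semantics side by side, a minimal sketch (assuming an AL32UTF8 database):

create table semantics_demo
( b varchar2(1 byte),
  c varchar2(1 char) );

insert into semantics_demo ( c ) values ( '£' );  -- succeeds: one character, two bytes
insert into semantics_demo ( b ) values ( '£' );  -- fails with ORA-12899: two bytes into a one-byte column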


Be aware of char side effects

Dana, May 20, 2013 - 3:27 pm UTC

I did a byte to char nls_length_semantics conversion many years ago. I was not happy with the results.
Char semantics will cause errors on indexes if you use long multi-column natural keys and small data block sizes. Memory consumption is higher for procedures/packages, efficiency is lower.
Oracle determines maximum possible memory consumption/space usage. Easy to use, but inefficient in practice.
Better to know and design for proper byte semantics in my opinion. If you need 300 bytes to do 100 UTF8 characters, call it 300 bytes rather than 100 characters.

Reverse Function and Multibyte Characters

A reader, May 26, 2013 - 4:16 pm UTC

How does the REVERSE function work in a string with multibyte characters (e.g. Asian languages)? Would it work at all?
Tom Kyte
May 29, 2013 - 5:12 pm UTC

it does a characterwise reverse. but remember, it is not a documented function

A reader, May 30, 2013 - 3:30 am UTC

So you're saying that it should work, but, if anything goes wrong, too bad? Is there a supported function that does the same thing?
Tom Kyte
May 30, 2013 - 2:37 pm UTC

you could write your own.

I'm just warning you that reverse, a function we use in the data dictionary views - so it is likely very safe - is undocumented currently. If anything goes wrong, it could be "too bad"


you can write your own reverse in plsql if you like.

To "A reader:

Sokrates, May 30, 2013 - 4:41 pm UTC

What about

set pages 50
select 
   utl_raw.cast_to_varchar2(
      utl_raw.reverse(
         utl_raw.cast_to_raw(q'|
? "
;laud morf
)      
)         
)'try this'(war_ot_tsac.war_ltu         
(esrever.war_ltu      
(2rahcrav_ot_tsac.war_ltu   
tceles
" tuoba tahw
:"redaer A" oT
|'
   )
))
from dual;


?

sokrates@11.2 > set pages 50
sokrates@11.2 > select
  2     utl_raw.cast_to_varchar2(
  3        utl_raw.reverse(
  4           utl_raw.cast_to_raw(q'|
  5  ? "
  6  ;laud morf
  7  )
  8  )
  9  )'try this'(war_ot_tsac.war_ltu
 10  (esrever.war_ltu
 11  (2rahcrav_ot_tsac.war_ltu
 12  tceles
 13  " tuoba tahw
 14  :"redaer A" oT
 15  |'
 16     )
 17  ))
 18  from dual;

UTL_RAW.CAST_TO_VARCHAR2(UTL_RAW.REVERSE(UTL_RAW.CAST_TO_RAW(Q'|?";LAUDMORF)))'T
--------------------------------------------------------------------------------

To "A reader":
what about "
select
utl_raw.cast_to_varchar2(
utl_raw.reverse(
utl_raw.cast_to_raw('siht yrt')
)
)
from dual;
" ?

REVERSE Function

A reader, May 30, 2013 - 9:40 pm UTC

"it [reverse] does a characterwise reverse."

One 3-byte character. It seems to be doing a reverse at the byte level.


SQL> SELECT LENGTH(str) len, VSIZE(str) vsize
  2  FROM   x;

       LEN      VSIZE
---------- ----------
         1          3

SQL>
SQL> SELECT DUMP(str) dp, DUMP(REVERSE(str)) dpr FROM x;

DP                                            DPR
--------------------------------------------- ------------------------
Typ=1 Len=3: 227,129,167                      Typ=1 Len=3: 167,129,227


Tom Kyte
May 30, 2013 - 9:44 pm UTC

I stand corrected, I was going off of memory - I should never do that :( Every time I don't actually demonstrate something - something about it is wrong.

appreciate it, thanks!

Reverse Function

A reader, May 30, 2013 - 10:09 pm UTC

How would you write a reverse function that works on multibyte characters and performs well?
Tom Kyte
May 31, 2013 - 1:59 pm UTC

a loop.
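
A minimal sketch of such a loop - SUBSTR and LENGTH use character semantics, so multibyte characters survive intact:

create or replace function my_reverse( p_str in varchar2 )
return varchar2
is
  l_result varchar2(32767);
begin
  for i in reverse 1 .. length( p_str )
  loop
    l_result := l_result || substr( p_str, i, 1 );
  end loop;
  return l_result;
end;
/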


A reader, May 31, 2013 - 3:42 pm UTC

If I have a custom "reverse" function with the same name, is there any way to override the Oracle reverse function without qualifying the custom function with the user/schema name?

SELECT reverse(str)...

instead of

SELECT <user>.reverse(str)...


Tom Kyte
May 31, 2013 - 4:10 pm UTC

no, SQL will resolve that to be the internal builtin.

SQLPLUS displaying garbage for all UTF8 characters

Lisa, June 15, 2013 - 2:04 pm UTC

Hi TOM,
I did the exercise given in one of your blogs; my database LANG settings and the client settings are all UTF8, but the characters are still displaying wrongly. Is there something I am missing? Please help.

create table fr_char_test(c1 varchar2(100));
insert into fr_char_test values ('é');
insert into fr_char_test values ('éé');
insert into fr_char_test values ('ééé');
Table created.
 
SQL> 
1 row created.
 
SQL> 
1 row created.
 
SQL> 
 
1 row created.
 
SQL> commit;
 
Commit complete.



SQL> select dump(c1) from fr_char_test;
 
DUMP(C1)
------------------------------------------------------------
Typ=1 Len=3: 239,191,189
Typ=1 Len=6: 239,191,189,239,191,189
Typ=1 Len=9: 239,191,189,239,191,189,239,191,189
SQL> select * from fr_char_test;

C1
----------------------------------------------------------------------------------------------------
�
��
���
The client & database settings are:
-sh-3.2$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
-sh-3.2$ echo $NLS_LANG
American_America.AL32UTF8
-sh-3.2$
The nls_lang in Windows is
AMERICAN_AMERICA.WE8MSWIN1252
but SQL*Plus is still not displaying the characters I have inserted. Please help - where could the issue be?

Tom Kyte
June 18, 2013 - 3:28 pm UTC

check the settings of your terminal. Your terminal has a characterset too. this is not a sqlplus setting.

nls and forms 6i

antonio, October 11, 2013 - 3:33 pm UTC

Hi Tom, I want to say thanks for your help.

I am trying to migrate Forms 6i to JSP (Java web).
My Database 11g has Spanish information and Forms 6i can display the accents very well; I don't see any problems inserting data. But I have not been able to see that data correctly elsewhere - using SQL*Plus, for example. I have tried a lot of things but nothing is working.

DATABASE NLS_CALENDAR GREGORIAN
DATABASE NLS_CHARACTERSET WE8MSWIN1252
DATABASE NLS_COMP BINARY
DATABASE NLS_CURRENCY $
DATABASE NLS_DATE_FORMAT DD-MON-RR
DATABASE NLS_DATE_LANGUAGE AMERICAN
DATABASE NLS_DUAL_CURRENCY $
DATABASE NLS_ISO_CURRENCY AMERICA
DATABASE NLS_LANGUAGE AMERICAN
DATABASE NLS_LENGTH_SEMANTICS BYTE
DATABASE NLS_NCHAR_CHARACTERSET AL16UTF16
DATABASE NLS_NCHAR_CONV_EXCP FALSE
DATABASE NLS_NUMERIC_CHARACTERS .,
DATABASE NLS_RDBMS_VERSION 11.2.0.3.0
DATABASE NLS_SORT BINARY
DATABASE NLS_TERRITORY AMERICA
DATABASE NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
DATABASE NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
DATABASE NLS_TIME_FORMAT HH.MI.SSXFF AM
DATABASE NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR

SQL> select FORRENOMBR from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';

FORRENOMBR
--------------------------------------------------------------------------------
Atenci?n Suscriptores P?rdidas

but the correct value is "Atención Suscriptores Pérdidas"


In the Forms registry I have the value NLS_LANG = LATIN AMERICAN SPANISH_AMERICA.WE8ISO8859P1


But if I do:

export NLS_LANG="LATIN AMERICAN SPANISH_AMERICA.WE8ISO8859P1"

then:

select FORRENOMBR from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';

FORRENOMBR
--------------------------------------------------------------------------------
Atenci▒n Suscriptores P▒rdidas

the character changes, but I don't see the Ó or É.


And this:

SQL>
select convert(FORRENOMBR,'UTF8','WE8MSWIN1252') from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';
SQL>
select convert(FORRENOMBR,'US7ASCII','WE8MSWIN1252') from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';

select convert(FORRENOMBR,'AL32UTF8','WE8MSWIN1252') from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';

select convert(FORRENOMBR,'WE8ISO8859P1','WE8MSWIN1252') from formarepor where FORRECODIG='ATENCION_SUSCRIPTORES_PERDIDAS';

CONVERT(FORRENOMBR,'UTF8','WE8MSWIN1252')
--------------------------------------------------------------------------------
Atenci¿n Suscriptores P¿rdidas

SQL> SQL>
CONVERT(FORRENOMBR,'US7ASCII','WE8MSWIN1252')
--------------------------------------------------------------------------------
Atenci?n Suscriptores P?rdidas

SQL> SQL>
CONVERT(FORRENOMBR,'AL32UTF8','WE8MSWIN1252')
--------------------------------------------------------------------------------
Atenci¿n Suscriptores P¿rdidas

SQL> SQL>
CONVERT(FORRENOMBR,'WE8ISO8859P1','WE8MSWIN1252')
--------------------------------------------------------------------------------
Atenci▒n Suscriptores P▒rdidas


I don't know what more I can do.

Thanks for your help, and sorry for my English.


Tom Kyte
October 11, 2013 - 4:28 pm UTC

check your nls_lang environment/registry settings. make sure your sqlplus session is using the same as your forms session (which would be set on the server for forms, on the client for sqlplus)


check your terminal program language settings (could just be a display issue with your terminal)

A reader, November 08, 2013 - 4:34 am UTC

Tom:

What would be the correct value to use for NLS_LANG when you exp/imp an 11g database in AL32UTF8 format on RHEL?

My understanding is that it should be set to the same character set as the OS character set (Linux).

$locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=


-- Options

$ export NLS_LANG=en_us.UTF-8
$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
$ export NLS_LANG=AMERICAN_AMERICA.UTF8
Tom Kyte
November 11, 2013 - 9:31 am UTC

Sam,

think about it. what would be the correct value *FOR YOU*.

do you want the data converted to some other character set? if yes, then use that character set.

do you want the data exported "as is", no conversions? if yes, then use the databases character set.


where did you get the idea that the OS character set has anything to do with your decision?

how will I store

A reader, December 10, 2013 - 10:11 pm UTC

How will I store pictures in the database using oracle forms?

conversion

A reader, February 11, 2014 - 2:36 pm UTC

Hello Tom,

I have a WE8MSWIN1252 source database and an AL32UTF8 target DB. I just want to export a few schemas from source to target. The source is 11.2.0.1 and the target is 11.2.0.3.
I export the schemas using expdp from the source, import only the metadata to the target, then alter all columns from BYTE to CHAR, and then import using impdp. All rows are inserted successfully.
There is some Bengali data in a table. When I set the correct NLS_LANG for the target, export NLS_LANG=American_america.AL32UTF8, I find data like this: ||a??||a??||
But if I set the source NLS_LANG, export NLS_LANG=American_america.WE8MSWIN1252, I can view the Bangla data from my shell. My question is: why are the code points of the data in my AL32UTF8 DB not changed, even though I used expdp/impdp? Is it that expdp/impdp can't convert the data?

Re: conversion

Sergiusz Wolicki, July 14, 2014 - 1:36 am UTC

@A reader

As your source database is WE8MSWIN1252, the Bangla (Bengali) text you store there is not recognized by the database as Bangla. It is recognized as some senseless Western European text consisting of accented Latin characters that happen to have the same binary codes as the Bangla characters in whatever encoding those Bangla characters really are. Data Pump does convert those characters to AL32UTF8, however not to Bangla characters in AL32UTF8 but to Latin characters.

When you set NLS_LANG=.WE8MSWIN1252 and select the Bangla text to a terminal, then the codes are interpreted by the terminal and its fonts. If the terminal settings are compatible with the real encoding of the Bangla text, you see the text correctly.

There is no easy way to move the Bangla text to the target database. Two main options are:
(1) Spool the text to a text file in SQL*Plus (after setting NLS_LANG=.WE8MSWIN1252) and then load it with SQL*Loader to the target database after setting the CHARACTERSET clause in the control file to the real encoding of the text (possibly AL32UTF8).
(2) Copy the text to an NVARCHAR2 column using
UTL_I18N.RAW_TO_NCHAR(UTL_I18N.STRING_TO_RAW(src_column,'WE8MSWIN1252'),'<real encoding of the Bangla text>'). You have to identify the real encoding first. Then, the NVARCHAR2 column can be exported and imported. Once imported to the AL32UTF8 target, the NVARCHAR2 column can be copied into the original VARCHAR2 column without any special functions. This method will work for text up to 2000 bytes per column value, unless Oracle12c Long VARCHAR feature is enabled (max_string_size=extended) allowing 32767 bytes per value.
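
A minimal sketch of option (2), with illustrative table and column names, assuming the real encoding turns out to be AL32UTF8:

update src_table
   set n_col = utl_i18n.raw_to_nchar(
                 utl_i18n.string_to_raw( src_column, 'WE8MSWIN1252' ),
                 'AL32UTF8' );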

Thanks,
Sergiusz

Oracle_winx64_12102_database

Jobi Varghese, October 11, 2017 - 7:40 am UTC

Hi Tom,

I would like to ask for some help. I need to create a database with the character set AL16UTF16 but I did not find it in the dropdown.

My Oracle version is Oracle_winx64_12102_database.

Regards

Jobi varghese
Sergiusz Wolicki
October 13, 2017 - 3:58 am UTC

Jobi,

AL16UTF16, i.e. the UTF-16 Unicode encoding form, is supported only as the national (aka NCHAR) character set, which is used by the NVARCHAR2, NCHAR, and NCLOB data types. The database character set (VARCHAR2, CHAR, CLOB, LONG) supports Unicode via AL32UTF8, i.e. the UTF-8 Unicode encoding form.

By the way, AL16UTF16 is the default national character set.

-- Sergiusz

EE8MSWIN1250

Pavel, January 27, 2021 - 10:21 am UTC

Hi Tom,
is there any difference between Oracle EE8MSWIN1250 and standard windows-1250 charset?
We have Oracle 19c enterprise edition with NLS_CHARACTERSET = EE8MSWIN1250.
The Application server communicates with the database using oracle JDBC thin driver version 19.
Some VARCHAR2 database columns occasionally contain strange characters, typically 0x81.
0x81 is not a valid character in standard windows-1250.
Why can such a character appear in a VARCHAR2 column when NLS_CHARACTERSET is set to EE8MSWIN1250?

Thank you. Pavel.

Connor McDonald
January 28, 2021 - 9:44 am UTC

I would say you have a characterset mismatch somewhere in your stack.

In EE8MSWIN1250, I think 0x81 is an umlaut, which is common in Europe, but in UTF8 it is a "High Octet Preset"

The other thing to check for is that your client program can actually handle the characters you are producing, eg

https://www.thatjeffsmith.com/archive/2016/08/unicode-and-oracle-sqlcl-on-windows/

