ODBC changes national characters to latin equivalent

Question and Answer

Chris Saxon

Thanks for the question, Gabor.

Asked: September 29, 2016 - 6:08 am UTC

Last updated: October 11, 2016 - 12:44 am UTC

Version: 11.2.0.3.0 64bit


You Asked

Hi all,

I ran into a bit of a problem using Oracle database through ODBC from C++. The situation is as follows:

- There is an Oracle database installed on a Linux OS
- There is a C++ component which is communicating with that database through ODBC
- Inside the database there are some values, which contain Hungarian national characters (e.g. é, ü, ű)

When running a query against this database from the C++ component, the national characters appear as their latin equivalent versions (e.g. é -> e, ü,ű -> u). When querying the same data from JDBC everything's fine. Checking the data in SQL Developer also works.

The component is running on a different Linux machine. I tried setting the NLS_LANG environment variable there with various values, but none seems to work. The closest I can get is with the "official" Hungarian character set, "HUNGARIAN_HUNGARY.EE8ISO8859P2", but this gives me an "û" instead of "ű" (all other characters are fine). I get the same result using "HUNGARIAN_HUNGARY.EE8MSWIN1250".

I wrote a small anonymous block in SQL Developer to convert the national characters to every available character set. I only get good results with three character sets: "AL24UTFFSS", "UTF8" and "AL32UTF8". But the strange thing is that when I set the NLS_LANG environment variable to "HUNGARIAN_HUNGARY.UTF8" the component's logging shows "ű" instead of "ű", and other similar values.
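[Editor's note: the anonymous block itself is not shown. A minimal Python sketch of the same experiment, standing in for the PL/SQL CONVERT loop, checks which client-side encodings can represent the Hungarian test characters:]

```python
# Stand-in for the character-set experiment described above:
# check which common client encodings can represent é, ü and ű.
s = 'éüű'
for enc in ['iso8859-1', 'iso8859-2', 'cp1250', 'utf-8']:
    try:
        s.encode(enc)
        print(enc, 'can represent', s)
    except UnicodeEncodeError:
        print(enc, 'cannot represent all of', s)
```

As in the database test, only the Central European and Unicode encodings survive: ISO-8859-1 (Western European) has no code for "ű".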

The component and its logging is fine, because it receives similar national characters from other sources as well (on a socket from a Java component), and those values appear in the logs as they should, national characters and all.

Before some of you would ask, removing the national characters from the database is not an option, as these values are required in this format for one of our clients' backend system - don't ask me why.

The database version is:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
"CORE 11.2.0.3.0 Production"
TNS for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

The database charset data is:

NLS_LANGUAGE AMERICAN
NLS_TERRITORY AMERICA
NLS_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_NUMERIC_CHARACTERS .,
NLS_CHARACTERSET AL32UTF8
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_SORT BINARY
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY $
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE
NLS_NCHAR_CHARACTERSET UTF8
NLS_RDBMS_VERSION 11.2.0.3.0

The session charset data is:

NLS_LANGUAGE HUNGARIAN
NLS_TERRITORY HUNGARY
NLS_CURRENCY Ft
NLS_ISO_CURRENCY HUNGARY
NLS_NUMERIC_CHARACTERS ,
NLS_CALENDAR GREGORIAN
NLS_DATE_FORMAT RR-MON-DD
NLS_DATE_LANGUAGE HUNGARIAN
NLS_SORT HUNGARIAN
NLS_TIME_FORMAT HH24.MI.SSXFF
NLS_TIMESTAMP_FORMAT RR-MON-DD HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT HH24.MI.SSXFF TZR
NLS_TIMESTAMP_TZ_FORMAT RR-MON-DD HH24.MI.SSXFF TZR
NLS_DUAL_CURRENCY Ft
NLS_COMP BINARY
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CONV_EXCP FALSE

I'm basically lost here; I don't really see what I should try next. That's why I turned to you guys - I hope you can help.

Regards,
Gabor

and Chris said...

I've spoken with Sergiusz Wolicki, our in-house NLS expert, about this. He had this to say:

Based on the available data, my theory is as follows:
1. You look at the correctness of data selected via JDBC or SQL Developer on a different terminal than the one you use to look at C++/ODBC results.
2. The terminal you use to view the C++/ODBC results is configured for the Western European character set (iso8859-1). However, Hungarian uses the Central European character set (iso8859-2). Therefore, when you set NLS_LANG to .EE8ISO8859P2, you extract correct codes from the database. However, you display these codes in a Western European font, and that is why "ű" becomes "û". The other characters (e.g. é, ü) happen to exist in both character sets with the same binary codes, so they display correctly.
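[Editor's note: the byte-level effect described here can be reproduced in a few lines of Python - a sketch, not part of the original answer. The Latin-2 code for "ű" is rendered as "û" by a Latin-1 terminal, while "é" and "ü" share the same codes in both sets:]

```python
# 'ű' occupies code 0xFB in ISO-8859-2 (Central European);
# a Western European (ISO-8859-1) terminal renders that same byte as 'û'.
code = 'ű'.encode('iso8859-2')                       # b'\xfb'
print(code.decode('iso8859-1'))                      # û
# 'é' and 'ü' have identical codes in both character sets,
# so they display correctly either way.
print('éü'.encode('iso8859-2').decode('iso8859-1'))  # éü
```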

Therefore, you need to configure your terminal for either iso8859-2 or utf-8 and then set NLS_LANG to .EE8ISO8859P2 or .AL32UTF8 correspondingly.


If this doesn't help, you could try asking on the globalization forum:

https://community.oracle.com/community/database/oracle-database-options/globalization_support

Sergiusz hangs out there.


Comments

Solved on my own

Gabor Major, October 09, 2016 - 2:59 pm UTC

Hi Guys,

So, first of all, sorry for the lowest rating, nothing personal, but the problem was entirely different from your suggestion. I actually found the correct NLS_LANG value about an hour after submitting my question. It was "AMERICAN_AMERICA.UTF8". On some level it is logical, because this is the encoding of the database, but it is still strange.

BTW the terminal couldn't have been wrong, because, as I stated, the component correctly logs (and the terminal correctly shows) national characters which the component receives from a Java component through a socket.

Thanks for the answer anyway, keep up the good work guys, I've found lots of help in this forum earlier, and I'm certain I will find lots of help in the future as well.

Thanks,
Gabor
Connor McDonald
October 10, 2016 - 5:32 am UTC

We appreciate all feedback ... good and bad :-)

UTF8

Sergiusz Wolicki, October 10, 2016 - 7:22 pm UTC

You wrote:

"But the strange thing is that when I set the NLS_LANG environment variable to "HUNGARIAN_HUNGARY.UTF8" the component's logging shows "ű" instead of "ű", and other similar values."

Display of characters does not depend on the language (except perhaps when the language is bi-directional, like Arabic). Hence, it should not be theoretically possible to get incorrect characters with "HUNGARIAN_HUNGARY.UTF8" and correct ones with "AMERICAN_AMERICA.UTF8", provided the setting was syntactically correct and exported properly on both occasions. I would be very cautious. It is possible to start having illegal codes in the database when NLS_LANG and the database character set are the same, because the database does no validation at insert/query time. The screen may show proper codes, but they may be illegal in the database. Use the DUMP SQL function to verify that the codes in the database are correct.
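[Editor's note: the hazard described here can be illustrated outside Oracle. A Python sketch, assuming a client that sends ISO-8859-2 bytes while NLS_LANG claims the database's UTF-8 charset: the raw Latin-2 byte for "ű" is not a valid UTF-8 sequence, so if stored unvalidated it becomes an illegal code in the database.]

```python
# When NLS_LANG matches the database charset, the client's bytes pass
# through without conversion or validation. A Latin-2 byte for 'ű'...
raw = 'ű'.encode('iso8859-2')    # b'\xfb' - what a Latin-2 client sends
try:
    raw.decode('utf-8')          # what an AL32UTF8 database would require
except UnicodeDecodeError as e:
    print('illegal UTF-8 byte sequence:', e.reason)
```

In Oracle, `DUMP(col, 1016)` shows the stored bytes in hex along with the character set name, which is the verification Sergiusz suggests.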

Diagnosis of NLS issues is not trivial. Do not underestimate the complexity. Seemingly correct results are not necessarily correct.

Thanks,
Sergiusz

Chris Saxon
October 11, 2016 - 12:44 am UTC

Nice input.