Skip to Main Content
  • Questions
  • Storing non English characters in table column

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question.

Asked: April 20, 2016 - 11:01 am UTC

Last updated: April 21, 2016 - 3:31 pm UTC

Version: 11g R2

Viewed 1000+ times

You Asked

Hi Tom,

We are in Oracle 11g r2 dB.

We are facing this issue in dB. Table has a column with the length 500bytes. We get non English characters as input and we will store that information in this column using substr.

Like substr(I_var,1,500)

But non English characters takes more than 1byte in Oracle, causing value too large oracle error.

Can you please advise how can we handle this scenario. Don't want to increase column size.


and Chris said...

You can use substrb to return a number of bytes from a string:

select substrb('éééééééééé', 1, 8) eight_bytes,
       substr('éééééééééé', 1, 8) eight_chars
from dual;

EIGHT_BY EIGHT_CHARS    
-------- ----------------
éééé     éééééééé        

Rating

  (3 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

A reader, April 20, 2016 - 1:29 pm UTC

Is there any disadvantage of using substr. My DBA is not agreeing to use substrb.

Can you advise any other methods to process data with out getting exception.


Chris Saxon
April 20, 2016 - 2:45 pm UTC

What exactly is their objection? Press them for a specific reason.

The disadvantage of substr is it limits the number of characters, not the number of bytes!

As in the example shown, the string is 10 characters long. But it's 20 bytes. So if your limit is 8 bytes substr returns four characters too many.

If you want to limit a string to X bytes, substrb is the way to go!

Multi-byte character sets

Heath, April 21, 2016 - 1:55 pm UTC

A potential issue with SUBSTRB and multi-byte character sets is if it cuts off the string in the middle of one of those multi-byte characters. What would SUBSTRB(1,3) mean for the string in the previous example?

String data makes sense in the context of characters, not bytes. If you are storing multi-byte characters in a varchar2 column and you need to be able to hold 500 characters, then define the column using character length semantics. Some strings may take more than 500 bytes to store, but I cannot think of a reason why your DBA would object to that.
Chris Saxon
April 21, 2016 - 3:31 pm UTC

It seems that if substrb finishes part way through a character, it just returns a space:

select substrb('éééééééééé', 1, 3) three_bytes,
       dump(substrb('éééééééééé', 1, 3)) three_byte_dump,
       ascii(' ') space_chr
from dual;

THR THREE_BYTE_DUMP          SPACE_CHR
--- ----------------------- ----------
é   Typ=1 Len=3: 195,169,32         32


I agree with your conclusion though: the correct solution here is to use character semantics for the column.

nls

A reader, May 02, 2016 - 6:57 pm UTC

You can set the database NLS_CHARACTERSET at database level by alter database statement. Your client should support the correct NLS characterset as well. I had it in Oman where we were storing Arabic characters in database where we had Arabic windows as client