multibyte character sets do not use a fixed number of bytes to represent each and every character.
They use between 0 and N bytes - the word testing - likely just uses 7 one byte characters.
You would need to use "fancier" characters from your character set.
Here is an excerpt from my last book on this topic:
<quote src=Expert Oracle Database Architecture>
Bytes or Characters
The VARCHAR2 and CHAR types support two methods of specifying lengths:
* In bytes: VARCHAR2(10 byte). This will support up to 10 bytes of data, which could be as few as two characters in a multibyte character set.
* In characters: VARCHAR2(10 char). This will support to up 10 characters of data, which could be as much as 40 bytes of information.
When using a multibyte character set such as UTF8, you would be well advised to use the CHAR modifier in the VARCHAR2/CHAR definition¿that is, use VARCHAR2(80 CHAR), not VARCHAR2(80), since your intention is likely to define a column that can in fact store 80 characters of data. You may also use the session or system parameter NLS_LENGTH_SEMANTICS to change the default behavior from BYTE to CHAR. I do not recommend changing this setting at the system level; rather, use it as part of an ALTER SESSION setting in your database schema installation scripts. Any application that requires a database to have a specific set of NLS settings makes for an ¿unfriendly¿ application. Such applications generally cannot be installed into a database with other applications that do not desire these settings, but rely on the defaults to be in place.
One other important thing to remember is that the upper bound of the number of bytes stored in a VARCHAR2 is 4,000. However, even if you specify VARCHAR2(4000 CHAR), you may not be able to fit 4,000 characters into that field. In fact, you may be able to fit as few as 1,000 characters in that field if all of the characters take 4 bytes to be represented in your chosen character set!
The following small example demonstrates the differences between BYTE and CHAR and how the upper bounds come into play. We¿ll create a table with three columns, the first two of which will be 1 byte and one character, respectively, with the last column being 4,000 characters. Notice that we¿re performing this test on a multibyte character set database using the character set AL32UTF8, which supports the latest version of the Unicode standard and encodes characters in a varyiable length fashion using from 1 to 4 bytes for each character:
ops$tkyte@O10GUTF> select *
2 from nls_database_parameters
3 where parameter = 'NLS_CHARACTERSET';
PARAMETER VALUE
------------------------------ --------------------
NLS_CHARACTERSET AL32UTF8
ops$tkyte@O10GUTF> create table t
2 ( a varchar2(1),
3 b varchar2(1 char),
4 c varchar2(4000 char)
5 )
6 /
Table created.
Now, if we try to insert into our table a single character that is 2 bytes long in UTF, we observe the following:
ops$tkyte@O10GUTF> insert into t (a) values (unistr('\00d6'));
insert into t (a) values (unistr('\00d6'))
*
ERROR at line 1:
ORA-12899: value too large for column "OPS$TKYTE"."T"."A"
(actual: 2, maximum: 1)
This example demonstrates two things:
* VARCHAR2(1) is in bytes, not characters. We have single Unicode character, but it won¿t fit into a single byte.
* As you migrate an application from a single-byte fixed-width character set to a multibyte character set, you might find that the text that used to fit into your fields no longer does.
The reason for the second point is that a 20-character string in a single-byte character set is 20 bytes long and will absolutely fit in a VARCHAR2(20). However a 20-character field could be as long as 80 bytes in a multibyte character set, and 20 Unicode characters may well not fit in 20 bytes. You might consider modifying your DDL to be VARCHAR2(20 CHAR) or using the NLS_LENGTH_SEMANTICS session parameter mentioned previously when running your DDL to create your tables.
If we insert that single character into a field set up to hold a single character, we will observe the following:
ops$tkyte@O10GUTF> insert into t (b) values (unistr('\00d6'));
1 row created.
ops$tkyte@O10GUTF> select length(b), lengthb(b), dump(b) dump from t;
LENGTH(B) LENGTHB(B) DUMP
---------- ---------- --------------------
1 2 Typ=1 Len=2: 195,150
That INSERT succeeded, and we can see that the LENGTH of the inserted data is one character¿all of the character string functions work ¿character-wise.¿ So the length of the field is one character, but the LENGTHB (length in bytes) function shows it takes 2 bytes of storage, and the DUMP function shows us exactly what those bytes are. So, that example demonstrates one very common issue people encounter when using multibyte character sets, namely that a VARCHAR2(N) doesn¿t necessarily hold N characters, but rather N bytes.
The next issue people confront frequently is that the maximum length in bytes of a VARCHAR2 is 4,000, and in a CHAR it is 2,000:
ops$tkyte@O10GUTF> declare
2 l_data varchar2(4000 char);
3 l_ch varchar2(1 char) := unistr( '\00d6' );
4 begin
5 l_data := rpad( l_ch, 4000, l_ch );
6 insert into t ( c ) © values ( l_data );
7 end;
8 /
declare
*
ERROR at line 1:
ORA-01461: can bind a LONG value only for insert into a LONG column
ORA-06512: at line 6
That shows that a 4,000-character string that is really 8,000 bytes long cannot be stored permanently in a VARCHAR2(4000 CHAR) field. It fits in the PL/SQL variable because in PL/SQL a VARCHAR2 is allowed to be up to 32KB in size. However, when it is stored in a table, the hard limit is 4,000 bytes. We can store 2,000 of these characters successfully:
ops$tkyte@O10GUTF> declare
2 l_data varchar2(4000 char);
3 l_ch varchar2(1 char) := unistr( '\00d6' );
4 begin
5 l_data := rpad( l_ch, 2000, l_ch );
6 insert into t ( c ) values ( l_data );
7 end;
8 /
PL/SQL procedure successfully completed.
ops$tkyte@O10GUTF> select length( c ), lengthb( c )
2 from t
3 where c is not null;
LENGTH(C) LENGTHB(C)
---------- ----------
2000 4000
And as you can see, they consume 4,000 bytes of storage.
</quote>