Thanks for the question, Anton.
Asked: December 02, 2021 - 8:02 am UTC
Last updated: December 08, 2021 - 2:26 am UTC
Version: 12.2
Viewed 1000+ times
You Asked
According to
https://docs.oracle.com/database/121/SQLRF/functions100.htm#SQLRF00658 the length function should work on a clob and on a varchar2, returning the length of a string in characters, as defined by the input character set.
Could you explain why the length of the same string differs for a clob and a varchar2? Isn't the clob in the same character set?
select length( c ), length( to_clob( c ) ), dbms_lob.getlength( to_clob( c ) ), nls_charset_name( nls_charset_id( 'CHAR_CS' ) )
from ( select chr(to_number('f09f9090','xxxxxxxx')) c, banner from dual )
LENGTH(C) LENGTH(TO_CLOB(C)) DBMS_LOB.GETLENGTH(TO_CLOB(C)) NLS_CHARSET_NAME(NLS_CHARSET_ID('CHAR_CS'))
1 2 2 AL32UTF8
and we said...
Indeed, CLOBs are stored in AL16UTF16 in any database that has a multibyte character set, such as AL32UTF8.
The UTF-8 code {f0, 9f, 90, 90} represents the supplementary character U+1F410 (GOAT, eighth of the signs of the Asian zodiac, used in Vietnam). Its UTF-16 encoding is a surrogate pair 0xD83D 0xDC10. The function LENGTH counts the pair as two characters.
This storage form is the compromise necessary to make CLOB character semantics for offsets and lengths performant with the potentially terabyte long values.
Is this answer out of date? If it is, please let us know via a Comment