Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Sergiusz Wolicki

Thanks for the question, Anton.

Asked: December 02, 2021 - 8:02 am UTC

Last updated: December 08, 2021 - 2:26 am UTC

Version: 12.2

Viewed 1000+ times

You Asked

According to https://docs.oracle.com/database/121/SQLRF/functions100.htm#SQLRF00658 the length function should work on a clob and on a varchar2, returning the length of a string in characters, as defined by the input character set.
Could you explain why the length of the same string differs for a clob and a varchar2? Isn't the clob in the same character set?

select length( c ), length( to_clob( c ) ), dbms_lob.getlength( to_clob( c ) ), nls_charset_name( nls_charset_id( 'CHAR_CS' ) )
from ( select chr(to_number('f09f9090','xxxxxxxx')) c, banner from dual )

LENGTH(C) LENGTH(TO_CLOB(C)) DBMS_LOB.GETLENGTH(TO_CLOB(C)) NLS_CHARSET_NAME(NLS_CHARSET_ID('CHAR_CS'))
1 2 2 AL32UTF8


and we said...

Indeed, CLOBs are stored in AL16UTF16 in any database that has a multibyte character set, such as AL32UTF8.

The UTF-8 code {f0, 9f, 90, 90} represents the supplementary character U+1F410 (GOAT, eighth of the signs of the Asian zodiac, used in Vietnam). Its UTF-16 encoding is a surrogate pair 0xD83D 0xDC10. The function LENGTH counts the pair as two characters.

This storage form is the compromise necessary to make CLOB character semantics for offsets and lengths performant with the potentially terabyte long values.

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database