Skip to Main Content
  • Questions
  • Get the length of CLOB columns in bytes

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Livio.

Asked: February 15, 2016 - 4:15 pm UTC

Last updated: January 11, 2022 - 3:54 am UTC

Version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Viewed 50K+ times! This question is

You Asked

Hello Tom,

I have a table with a CLOB column:


create table plch_clob
(i int primary key, x clob);

begin
   for indx in 1 .. 1000 loop
      insert into plch_clob( i, x) values (indx, 'CLOB Row: ' || indx);
   end loop;
   
   commit;
end;



By calling the function dbms_lob.getlength, I get the total number of characters for each CLOB:

select i, DBMS_LOB.GETLENGTH(x) len
from plch_clob
order by i desc


But I need to find out, for each row, the length of the CLOB column in byte, as well as the total and average length of the CLOBs, always in bytes.
What is the best way to do that?

Thanks in advance

Livio

and Chris said...

As stated in MOS note 790886.1, there isn't a function available to do this. You need to write your own.

dbms_lob.getlength gives the length in bytes if the input is a blob. So they solution they provide is to convert the clob to a blob. Then measure the length of the blob:

create or replace function cloblengthb(p_clob in clob ) return number
as
  v_temp_blob BLOB;
  v_dest_offset NUMBER := 1;
  v_src_offset NUMBER := 1;
  v_amount INTEGER := dbms_lob.lobmaxsize;
  v_blob_csid NUMBER := dbms_lob.default_csid;
  v_lang_ctx INTEGER := dbms_lob.default_lang_ctx;
  v_warning INTEGER;
  v_total_size number := 0; -- Return total clob length in bytes
BEGIN
  IF p_clob is not null THEN
    DBMS_LOB.CREATETEMPORARY(lob_loc=>v_temp_blob, cache=>TRUE);
    DBMS_LOB.CONVERTTOBLOB(v_temp_blob, p_clob,v_amount,v_dest_offset,v_src_offset,v_blob_csid,v_lang_ctx,v_warning);
    v_total_size := DBMS_LOB.GETLENGTH(v_temp_blob);
    DBMS_LOB.FREETEMPORARY(v_temp_blob);
  ELSE
    v_total_size := NULL;
  END IF;
  return v_total_size;
end cloblengthb;
/

create table plch_clob
(i int primary key, x clob);

begin
   for indx in 1 .. 1000 loop
      insert into plch_clob( i, x) values (indx, 'éé CLOB Row: ' || indx);
   end loop;
   
   commit;
end;
/

select cloblengthb(x), dbms_lob.getlength(x) from plch_clob
where  rownum = 1;

                         CLOBLENGTHB(X)                   DBMS_LOB.GETLENGTH(X)
--------------------------------------- ---------------------------------------
                                     16                                      14


Update: 2021

In most modern versions of the database, the defaults are multibyte charactersets (UTF8 etc), and thus we stored characters as 2byte pairs. Some extended characters need multiple 2-bytes pairs, but for most cases, a reasonable approximate of space used will be

dbms_log.getlength() *2

Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Outdated

Wat Hughes, May 16, 2018 - 8:11 pm UTC

dbms_lob.getlength works for me on CLOB values.
Connor McDonald
May 18, 2018 - 2:16 am UTC

Can you elucidate on what you mean by "works"?

getlength always "works" but the question was about getting a byte length rather than character length.

A reader, July 04, 2018 - 9:50 pm UTC


Exactly right

P Jordan, November 04, 2019 - 4:55 pm UTC

DBMS_LOB.GETLENGTH returns character length, not bytes.

cloblengthb is nice function, but wrong

Pavol Babel, December 15, 2021 - 8:23 am UTC

Oracle store CLOB data type in UCS-2, rather than UTF-8 even in case when charatcer set is AL32UTF8. That means fixed 2-byte representation of every character.
So cloblenthb is obviously wrong, at least it does not tell anything of your CLOB size in bytes.
Chris Saxon
December 16, 2021 - 6:54 pm UTC

I reached out to Sergiusz Wolicki, our NLS/character set expert about this:

They are stored in UTF-16 but only if the database character set is multibyte. In single byte databases, CLOBs are stored in the database character set, so each character is one byte. As the national character set is always multibyte, NCLOBs are always in UTF-16.

UTF-16 means that the supplementary characters in an AL32UTF8 database (code points > U+0FFFF) are stored as surrogate pairs. Each AL32UTF8 character is then mapped to two UTF-16 codes. So, each original character may occupy either one or two 2-byte codes, kind of 2 or 4 bytes, depending on whether its Unicode code point is below or above U+0FFFF.

To byte or not to byte

Name, December 17, 2021 - 2:54 pm UTC

So cloblenthb is obviously wrong, at least it does not tell anything of your CLOB size in bytes.


This is discussion statement.
If you need the length in bytes for external exchange (for example http content-length), you need to calculate the bytes in charset of that exchange protocol.
If you want to roughly estimate the space occupied by CLOB, for fixed-width database use the dbms_lob.getlength(). For variable-width database use the dbms_lob.getlength()*2. Do not use length(CLOB) in database AL32UTF8, since it counts supplementary characters as 1 char.

The support note about CLOBs states that UCS2 for CLOB was used for 9x database. Since 10g is used the endian independent AL16UTF16 charset. Globalization support guide up to 12c mentioned UCS2.
Connor McDonald
January 11, 2022 - 3:54 am UTC

Agreed.

Unless your clobs are littered with emojis etc :-), then getlength*2 is going to be close enough to get a good estimate

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database