convert CLOB data over 4K character to character

Connor McDonald

Thanks for the question, Prabhu.

Asked: February 22, 2017 - 9:09 am UTC

Last updated: February 25, 2017 - 12:30 am UTC

Version: 11g

Viewed 10K+ times

You Asked

Hi,

I am creating a materialized view which joins multiple tables, and one of them (just one table) gives me duplicate records in the view. When I used the 'distinct' keyword it threw an error, since you can't use 'distinct' when the SELECT list has CLOB columns. So I converted the CLOB to char using the to_char method, and that was fine until yesterday. Now it's throwing an error again, because for some records the CLOB column has over 4K characters, so to_char() complains that the buffer size is too small. Please help me fix this problem.

Note: I have to use that particular table which is giving me the duplicate records, and I can't fix it because the table was designed that way. So my only option is converting the CLOB to a character string so that I can use the 'distinct' keyword.
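For reference, a minimal reproduction of the to_char() error (assuming a table t with a clob column x holding more than 4,000 characters); this typically raises ORA-22835:

select distinct to_char(x) from t;

SQL Error: ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion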

and Chris said...

You can use dbms_lob.substr to extract the first 4,000 characters. This returns a varchar2, so you can do a distinct on it in SQL:

create table t (
  x clob
);

declare
  tmp clob;
begin
  dbms_lob.createtemporary(tmp, true);
  
  -- ten appends of 4,000 characters each => a 40,000-character clob
  for i in 0 .. 9 loop
    dbms_lob.writeappend(tmp, 4000, lpad('x', 4000, 'x'));
  end loop;

  insert into t values (tmp);
end;
/

select dbms_lob.getlength(x) from t;

DBMS_LOB.GETLENGTH(X)  
40,000  

select count(distinct x) from t;

SQL Error: ORA-00932: inconsistent datatypes: expected - got CLOB

select count(distinct dbms_lob.substr(x, 4000, 1)) from t;

COUNT(DISTINCTDBMS_LOB.SUBSTR(X,4000,1))  
1 
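Applied to the original materialized view scenario, that might look like the following sketch (the table and column names here are hypothetical, since the original query wasn't posted). Note that rows whose clobs differ only beyond the first 4,000 characters would collapse into one:

create materialized view mv_dedup as
select distinct a.c1, a.c2,
       dbms_lob.substr(b.clob_col, 4000, 1) clob_prefix
from   t1 a
join   t2 b on b.id = a.id;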


Comments

Prabhu Baskaran, February 22, 2017 - 4:54 pm UTC

Thanks for your quick reply. The dbms_lob.substr() method returns up to 4k chars only, but I want to return all 40k chars.
Chris Saxon
February 22, 2017 - 4:56 pm UTC

If you want all 40k characters, then it has to be a clob! Which means you can't use distinct...

Or you could make multiple calls to dbms_lob.substr as separate columns in your output, getting the first 4k, then the next 4k, and so on.
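For example, with the table t from above (dbms_lob.substr takes the amount first, then the offset):

select distinct
       dbms_lob.substr(x, 4000, 1)    chunk1,
       dbms_lob.substr(x, 4000, 4001) chunk2,
       dbms_lob.substr(x, 4000, 8001) chunk3
from   t;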

Suggest

A reader, February 22, 2017 - 6:17 pm UTC

Could one use a group by? Or analytics like dense_rank or row_number?
Connor McDonald
February 23, 2017 - 2:51 am UTC

Does your clob data also *define* what you mean by distinct?

If not, then change

select distinct c1, c2, c3, clobcol from t

to

select c1, c2, c3, clobcol
from
  ( select c1, c2, c3, clobcol,
           row_number() over ( partition by c1, c2, c3 order by rowid ) r
    from  t
  )
where r = 1



If the clob *does* matter, then you could do

select c1, c2, c3, clobcol
from
  ( select c1, c2, c3, clobcol,
           row_number() over ( partition by c1, c2, c3, ora_hash(clobcol) order by rowid ) r
    from  t
  )
where r = 1


Follow up

A reader, February 23, 2017 - 6:16 am UTC

That's what I expected as an answer. So ora_hash works fine for clobs? But are there performance concerns?
Connor McDonald
February 25, 2017 - 12:30 am UTC

If you need the clob content to be included in the determination of distinct, then you have to read the clob data no matter what you do. That's more likely to be the significant cost, rather than applying a hash function.

The real problem to be solved here, of course, is stopping duplicate data from getting *in*.
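For example, if the non-clob columns define a row's identity (reusing the hypothetical c1, c2, c3 from the queries above), a unique constraint rejects duplicates at insert time:

alter table t add constraint t_uk unique (c1, c2, c3);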
