There are a few options:
Substr the lobUse dbms_lob.substr to return the first 4k characters and take the distinct of that:
create table t (
c1 clob
);
insert into t values ( 'xxx' );
insert into t values ( 'xxx' );
insert into t values ( 'yyy' );
commit;
select distinct dbms_lob.substr ( c1 , 4000, 1 )
from t;
DBMS_LOB.SUBSTR(C1,4000,1)
yyy
xxx
Obviously you need to be reasonably sure these first N characters will be close enough to the distinct values.
Hash the lobGenerate hashes of the clobs using dbms_crypto. Use this as the partition by clause for row_number to find others with the same hash. Then filter to those where the row_number = 1:
with rws as (
select row_number () over (
partition by sys.dbms_crypto.hash ( c1, 1 ) order by rowid
) rn,
t.*
from t
)
select c1 from rws
where rn = 1;
C1
xxx
yyy
Theoretically has a chance of false positives - i.e. hashing two different clobs to the same value. Unlikely to happen in practice.
DBMS_lob.compareAdd a unique row number to every row. Use dbms_lob.compare to find matching values. Discard those with a higher row number using not exists:
with rws as (
select row_number () over (
order by rowid
) rn,
t.*
from t
)
select * from rws r1
where not exists (
select * from rws r2
where dbms_lob.compare ( r1.c1, r2.c1 ) = 0
and r1.rn > r2.rn
);
RN C1
1 xxx
3 yyy