Home>Question Details



Mikito -- Thanks for the question regarding "equality predicate on clob", version 9i

Submitted on 26-Sep-2001 20:34 Central time zone
Last updated 23-Jun-2005 13:51

You Asked

I've looked into 9i clob manual (wow, whole manual for one built-in datatype!) and still 
have a question: What index technique can be used to make effective clob comparison in a 
query like this:

select * from my_table where clob_column = another_clob;

I noticed that whenever I create a table with clob a unique index is created as well. Is 
it a "clob" identity index? 

and we said...

You cannot index a CLOB like that.  A clob is potentionally HUGE -- indexes don't work on 
HUGE things.

We can index the CONTENTS of the clob using Oracle text (aka interMedia).

One method you can use would be to create an index on the checksum of the first 32k of 
the clob.  this checksum is "small".  We can use that to effectively reduce the NUMBER of 
clobs we need to compare to.  We can index the checksum of the CLOB and at runtime use a 
predicate like:

 where checksum(clob_column) = :checksum_of_thing_to_look_for
   and clob_column = another_clob


For example:

ops$tkyte@ORA9I.WORLD> create table t ( x int primary key, y clob );

Table created.

ops$tkyte@ORA9I.WORLD> create index t_idx on t(checksum(y));

Index created.

ops$tkyte@ORA9I.WORLD> 
ops$tkyte@ORA9I.WORLD> insert into t values (1,'hello world' );

1 row created.

ops$tkyte@ORA9I.WORLD> insert into t values (2,'goodbye' );

1 row created.

ops$tkyte@ORA9I.WORLD> insert into t values (3,'this is just a test...' );

1 row created.

ops$tkyte@ORA9I.WORLD> 
ops$tkyte@ORA9I.WORLD> set autotrace on
ops$tkyte@ORA9I.WORLD> select t.*
  2    from t
  3   where checksum(y) = checksum('hello world')
  4     and to_char(y) = 'hello world'
  5  /

         X Y
---------- 
--------------------------------------------------------------------------------
         1 hello world


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=FIRST_ROWS (Cost=1 Card=1 Bytes=2015)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=1 Card=1 Bytes=2015)
   2    1     INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE) (Cost=1 Card=1)


Where I used:

ops$tkyte@ORA9I.WORLD> create or replace function checksum_str( p_str in varchar2 ) 
return number
  2  as
  3     l_checksum varchar2(32);
  4     l_raw      raw(16);
  5  begin
  6  
  7     l_checksum :=
  8            dbms_obfuscation_toolkit.md5( input_string=> p_str );
  9  
 10     l_checksum := utl_raw.cast_to_raw( l_checksum );
 11  
 12     return to_number( substr(l_checksum,1,16), lpad('x',33,'x') )+
 13            to_number( substr(l_checksum,17), lpad('x',33,'x') );
 14  
 15  end;
 16  /

Function created.

ops$tkyte@ORA9I.WORLD> create or replace function checksum( p_lob in clob ) return number
  2  DETERMINISTIC
  3  as
  4     l_checksum varchar2(32);
  5     l_raw      raw(16);
  6  begin
  7  
  8     l_checksum :=
  9            dbms_obfuscation_toolkit.md5( input_string=> 
dbms_lob.substr(p_lob,32000,1) );
 10  
 11     l_checksum := utl_raw.cast_to_raw( l_checksum );
 12  
 13     return to_number( substr(l_checksum,1,16), lpad('x',33,'x') )+
 14            to_number( substr(l_checksum,17), lpad('x',33,'x') );
 15  
 16  end;
 17  /

Function created.

as simple checksum functions. 

Reviews    
5 stars cool!   September 27, 2001 - 1pm Central time zone
Reviewer: Mikito Harakiri 
BTW, I was trying a similar thing

CREATE INDEX my_idx ON my_table (SUBSTR(clob_column,1,100));

and it didn't work
 


4 stars explicit hash function in the equality predicate   September 27, 2001 - 1pm Central time zone
Reviewer: Mikito Harakiri 
On pessimistic note, the ideal solution would avoid explicitly mentioning hash function in the 
equality predicate -- it is physical access implementation detail after all. I guess this is too 
much to ask from the optimizer today... 


3 stars no equality predicate on clobs   September 27, 2001 - 9pm Central time zone
Reviewer: Mikito Harakiri 
Hmm... It took me a while to notice to_char function in your predicate. 

There seems no equality predicate on clobs to begin with. CLOB manual says:

Non-Supported SQL Functionality for LOBs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following SQL functionality is not supported for LOBs because the functions are either 
infrequently used or have easy workarounds. 
...
Operators: >, <, =, != 

 


Followup   September 28, 2001 - 6am Central time zone:

well, you seem to be getting less and less "happy" with this answer.

The fact is CLOBS are big, huge, not really things that you want to be doing equals on -- you have 
to force it in some cases.

I would question why you need to do this -- compare clobs -- it will be slow, the tips here will 
help you make it faster, more realistic.... 

3 stars no equality predicate on clobs   September 28, 2001 - 1pm Central time zone
Reviewer: Mikito Harakiri 
I'm collecting sql statements from huge ERP application. Most of the statements are reasonably 
short, while the longest one is 30K bytes long. I need to be able to identify identical statements, 
as I also would like to peek into V$SQL. Therefore, I'm not comparing huge clobs at all.

Even if I would, knowing that todays database limits are somewhere under 1B records, we can think 
of hash function that can make this comparison efficient. This hash heed to have a range > 1B*1B in 
order to eleminate chances of collision.

On the other hand, you are probably correct, naive literal comparison of clobs woldn't get me too 
far. In my example, ignoring differences in the names of bind variables would probably be better 
comparison criteria. Looks like for big objects only user-defined equality makes sence. 

 

 


3 stars Another way of computing a checksum for anysize clob   March 17, 2003 - 5pm Central time zone
Reviewer: Yves Azie 
create or replace and compile
java source named "GetChecksum"
as
import java.io.*;
import java.lang.*;
import java.security.*;

/** 
 * This class takes in a clob, and computes 
 * a unique hash value based on the md5 hash algorythm
 */
public class GetChecksum extends Object
{
   public static float get(oracle.sql.CLOB clobToCompute) throws Exception
   {
      BufferedReader clobReader = new BufferedReader(clobToCompute.getCharacterStream());
      String s = null;
      StringBuffer sb = new StringBuffer();
      while ((s = clobReader.readLine()) != null) {
        sb.append(s);
      }
      clobReader.close();

      byte[] clobBytes = sb.toString().getBytes();
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] clobResult = md5.digest(clobBytes);

      ByteArrayInputStream bytestream1 = new ByteArrayInputStream (clobResult);
      DataInputStream bytestream2 = new DataInputStream( bytestream1 );

      float clobFloat = bytestream2.readFloat();
      return clobFloat;
   }
}
/
show err

create or replace function checksum ( p_lob in clob ) 
return number
  DETERMINISTIC
  as
  language java
  name 'GetChecksum.get(oracle.sql.CLOB) return float';
/
show err


You may then use this function in SQL as follows:
select checksum(clobfield) from clobTable; 


4 stars Comparing CLOBs   June 22, 2005 - 5pm Central time zone
Reviewer: Anirudh from NJ, USA
Hi Tom,

We have this requirement of comparing two CLOBS or rather do a stuff which is similar to "Track 
Changes" in Microsoft Word. We would like to have a program which would compare two CLOBs and 
return the differences (line no., word no. or whatever). 
I figure that there is no simple and easy way to go about it, can you please suggest an approach 
which would help us in doing this job. 
Again, our requirement is just to compare two CLOBS and point out the differences to the users.

Please feel free to ignore this if you feel this question is  irrelevant to the context of the 
parent topic.

Thanks as always.
Anirudh

 


Followup   June 23, 2005 - 1pm Central time zone:

there is no simple way to do this, you'd have to implement it or find an implementation of it... 

4 stars MD5 checksum on a clob   August 11, 2005 - 9am Central time zone
Reviewer: Carsten RĂ¼thel from BY Germany
Hi,

we have modified Yves Azie's code to get "real" MD5 in a 32 digit hex code.

create or replace and compile
java source named "GetChecksum"
as
import java.io.*;
import java.lang.*;
import java.security.*;

/**
 * This class takes in a clob, and computes
 * a unique hash value based on the md5 hash algorythm
 */
public class GetChecksum extends Object
{

    public static String byteToHex(byte b){
      int i = b & 0xFF;
      String res = Integer.toHexString(i);
      if ( res.length() == 0 ) return ( "00" );
      if ( res.length() == 1 ) return ( "0"+res );
      return ( res );
    }

    public static String byteArr2Hex( byte[] b ) {
      String res = new String();
      for ( int ix=0; ix < b.length; ++ix ) {
        res += byteToHex(b[ix]);
      }
      return ( res );
    }

   public static String get(oracle.sql.CLOB clobToCompute) throws Exception
   {
      BufferedReader clobReader = new BufferedReader(clobToCompute.getCharacterStream());
      String s = null;
      StringBuffer sb = new StringBuffer();
      while ((s = clobReader.readLine()) != null) {
        sb.append(s);
      }
      clobReader.close();

      byte[] clobBytes = sb.toString().getBytes();
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] clobResult = md5.digest(clobBytes);

      return ( byteArr2Hex ( clobResult ));
   }
}
 



Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement