cool!
Mikito Harakiri, September 27, 2001 - 1:01 pm UTC
BTW, I was trying a similar thing
CREATE INDEX my_idx ON my_table (SUBSTR(clob_column,1,100));
and it didn't work
explicit hash function in the equality predicate
Mikito Harakiri, September 27, 2001 - 1:07 pm UTC
On pessimistic note, the ideal solution would avoid explicitly mentioning hash function in the equality predicate -- it is physical access implementation detail after all. I guess this is too much to ask from the optimizer today...
no equality predicate on clobs
Mikito Harakiri, September 27, 2001 - 9:50 pm UTC
Hmm... It took me a while to notice to_char function in your predicate.
There seems no equality predicate on clobs to begin with. CLOB manual says:
Non-Supported SQL Functionality for LOBs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following SQL functionality is not supported for LOBs because the functions are either infrequently used or have easy workarounds.
...
Operators: >, <, =, !=
September 28, 2001 - 6:58 am UTC
well, you seem to be getting less and less "happy" with this answer.
The fact is CLOBS are big, huge, not really things that you want to be doing equals on -- you have to force it in some cases.
I would question why you need to do this -- compare clobs -- it will be slow, the tips here will help you make it faster, more realistic....
no equality predicate on clobs
Mikito Harakiri, September 28, 2001 - 1:37 pm UTC
I'm collecting sql statements from huge ERP application. Most of the statements are reasonably short, while the longest one is 30K bytes long. I need to be able to identify identical statements, as I also would like to peek into V$SQL. Therefore, I'm not comparing huge clobs at all.
Even if I would, knowing that todays database limits are somewhere under 1B records, we can think of hash function that can make this comparison efficient. This hash heed to have a range > 1B*1B in order to eleminate chances of collision.
On the other hand, you are probably correct, naive literal comparison of clobs woldn't get me too far. In my example, ignoring differences in the names of bind variables would probably be better comparison criteria. Looks like for big objects only user-defined equality makes sence.
Another way of computing a checksum for anysize clob
Yves Azie, March 17, 2003 - 5:11 pm UTC
create or replace and compile
java source named "GetChecksum"
as
import java.io.*;
import java.lang.*;
import java.security.*;
/**
* This class takes in a clob, and computes
* a unique hash value based on the md5 hash algorythm
*/
public class GetChecksum extends Object
{
public static float get(oracle.sql.CLOB clobToCompute) throws Exception
{
BufferedReader clobReader = new BufferedReader(clobToCompute.getCharacterStream());
String s = null;
StringBuffer sb = new StringBuffer();
while ((s = clobReader.readLine()) != null) {
sb.append(s);
}
clobReader.close();
byte[] clobBytes = sb.toString().getBytes();
MessageDigest md5 = MessageDigest.getInstance("MD5");
byte[] clobResult = md5.digest(clobBytes);
ByteArrayInputStream bytestream1 = new ByteArrayInputStream (clobResult);
DataInputStream bytestream2 = new DataInputStream( bytestream1 );
float clobFloat = bytestream2.readFloat();
return clobFloat;
}
}
/
show err
create or replace function checksum ( p_lob in clob )
return number
DETERMINISTIC
as
language java
name 'GetChecksum.get(oracle.sql.CLOB) return float';
/
show err
You may then use this function in SQL as follows:
select checksum(clobfield) from clobTable;
Comparing CLOBs
Anirudh, June 22, 2005 - 5:18 pm UTC
Hi Tom,
We have this requirement of comparing two CLOBS or rather do a stuff which is similar to "Track Changes" in Microsoft Word. We would like to have a program which would compare two CLOBs and return the differences (line no., word no. or whatever).
I figure that there is no simple and easy way to go about it, can you please suggest an approach which would help us in doing this job.
Again, our requirement is just to compare two CLOBS and point out the differences to the users.
Please feel free to ignore this if you feel this question is irrelevant to the context of the parent topic.
Thanks as always.
Anirudh
June 23, 2005 - 1:51 pm UTC
there is no simple way to do this, you'd have to implement it or find an implementation of it...
MD5 checksum on a clob
Carsten RĂ¼thel, August 11, 2005 - 9:41 am UTC
Hi,
we have modified Yves Azie's code to get "real" MD5 in a 32 digit hex code.
create or replace and compile
java source named "GetChecksum"
as
import java.io.*;
import java.lang.*;
import java.security.*;
/**
* This class takes in a clob, and computes
* a unique hash value based on the md5 hash algorythm
*/
public class GetChecksum extends Object
{
public static String byteToHex(byte b){
int i = b & 0xFF;
String res = Integer.toHexString(i);
if ( res.length() == 0 ) return ( "00" );
if ( res.length() == 1 ) return ( "0"+res );
return ( res );
}
public static String byteArr2Hex( byte[] b ) {
String res = new String();
for ( int ix=0; ix < b.length; ++ix ) {
res += byteToHex(b[ix]);
}
return ( res );
}
public static String get(oracle.sql.CLOB clobToCompute) throws Exception
{
BufferedReader clobReader = new BufferedReader(clobToCompute.getCharacterStream());
String s = null;
StringBuffer sb = new StringBuffer();
while ((s = clobReader.readLine()) != null) {
sb.append(s);
}
clobReader.close();
byte[] clobBytes = sb.toString().getBytes();
MessageDigest md5 = MessageDigest.getInstance("MD5");
byte[] clobResult = md5.digest(clobBytes);
return ( byteArr2Hex ( clobResult ));
}
}
Is there a mistake in the checksum function ?
Neil Smith, March 24, 2020 - 6:39 am UTC
Is there a mistake ?
You've declared a variable l_raw which isn't used, and then there's this line:
l_checksum := utl_raw.cast_to_raw( l_checksum );
that will cast it to raw and then implicitly cast it back to varchar2 as it's being assigned back to the l_checksum variable. i.e. do nothing.
should that be
l_raw := utl_raw.cast_to_raw( l_checksum );
and then use l_raw in place of l_checksum afterwards, or was l_raw and the cast_to_raw not required at all, or am I missing some subtle effect of the cst_to_raw?
March 25, 2020 - 1:51 am UTC
Not really a mistake just a redundant variable and a touch of laziness.
l_checksum := utl_raw.cast_to_raw( l_checksum );
will
- convert the checksum to a raw
- then due to the datatype mismatch, convert that raw to a char to store in l_checksum
So the net result is the 32byte char represetnation of the 16byte raw being stored.
"l_raw" can be dispensed with
But nice catch!