I'm sorry, my last follow up question was too brief. I should have provided more details into my issues.
In my example, I based my approach on converting the row data to VARCHAR2 as a basis:
ALTER TABLE my_emp
ADD hashkey VARCHAR2(4000)
GENERATED ALWAYS AS(
CAST(MY_SECURITY_PKG.hash_value(TO_CHAR(empno) || '|' || ename || '|' || job || '|' || TO_CHAR(mgr) || '|' || TO_CHAR(hiredate, 'mm/dd/yyyy') || '|' || TO_CHAR(sal) || '|' || TO_CHAR(comm) || '|' || TO_CHAR(deptno))
AS VARCHAR2(512))
);
/
and my function receives and return VARCHAR2:
CREATE OR REPLACE PACKAGE BODY my_security_pkg
AS
FUNCTION hash_value(
p_input_string_in IN VARCHAR2
)
RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
RETURN DBMS_CRYPTO.HASH(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512);
END hash_value;
END my_security_pkg;
/
So now I introduce a column that is either a CLOB or a BLOB. What is the best approach to hashing my record? Should I hash the CLOB/BLOB by itself and simply include the hash in my concatenation and then hash the entire record? Also, looking at my choice of VARCHAR2 for the parameter into my hash_value FUNCTION I'm thinking I should switch to CLOB or maybe at least overload it to handle if my concatenation ever gets beyond 4000 characters.
Finally, one last question... DBMS_CRYPTO.HASH returns RAW. After writing the original code in the question I noticed that I did not convert raw to varchar2 in my function but it had been working. I assume Oracle has been implicitly converting for me. I added in an explicit conversion to avoid problems:
RETURN UTL_I18N.raw_to_char(DBMS_CRYPTO.hash(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512));
Before I would get readable alphanumeric characters returned, but afterwards would get unreadable characters. Can you explain what is happening? Here is basically that change written as a SELECT:
SELECT
UTL_I18N.raw_to_char(DBMS_CRYPTO.hash(UTL_I18N.string_to_raw('Some string'||'Another string just to add some length'||TO_CHAR(SYSDATE, 'mm/dd/yyyy hh24:mi:ss')||TO_CHAR(53)),6))
FROM dual
Thanks for taking the time to review and answer my follow up. I get so much valuable insight from your site and I cannot imagine doing my job without having it as a resource.
September 03, 2020 - 2:59 am UTC
DBMS_CRYPTO.hash can take as inputs
raw
clob
blob
so for the latter 2 there isn't really a need to do any casting to raw. But since you are going for a combination of data types, you'd really need to take the "superset" which would be blob.
The risk there is the overhead of building a new blob from your existing blob + other columns. I would recommend a simple benchmark which would test the differences between:
- concatenate all, single hash execution
- concatenate all simple data types, hash them + hash blobs/clobs separately
- hash every datatype
and whatever other permutations might suit your needs.