Skip to Main Content
  • Questions
  • hashkey implemented as a virtual column

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Rodney.

Asked: March 06, 2020 - 12:12 pm UTC

Last updated: September 03, 2020 - 2:59 am UTC

Version: Oracle 18c Standard Edition

Viewed 1000+ times

You Asked

First of all, I want to say that ASK TOM is my single greatest resource for questions about the Oracle database. I have learned so many valuable things from this site over the years and I just want to thank you for your efforts.

My question is in regards to implementing a hash value to facilitate avoiding lost updates. All of the examples I have ever seen create this as a "real" column that is maintained when inserting/updating a row. I am wondering if there are any drawbacks to implementing this as a virtual column such as in my script below. I am particularly wondering about performance issues with virtual columns. If I do not select the virtual column in my query, do I incur any overhead from it?

CREATE OR REPLACE PACKAGE my_security_pkg
AS
   FUNCTION hash_value(
      p_input_string_in IN VARCHAR2
      )
   RETURN VARCHAR2 DETERMINISTIC;
END my_security_pkg;
/

CREATE OR REPLACE PACKAGE BODY my_security_pkg
AS
   FUNCTION hash_value(
      p_input_string_in IN VARCHAR2
      )
   RETURN VARCHAR2 DETERMINISTIC
   IS
   BEGIN
      RETURN DBMS_CRYPTO.HASH(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512);
   END hash_value;
END my_security_pkg;
/

DROP TABLE my_emp;
/

CREATE TABLE my_emp
AS( SELECT * FROM SCOTT.emp )
/

ALTER TABLE my_emp
ADD hashkey      VARCHAR2(4000)
GENERATED ALWAYS AS(
   CAST(MY_SECURITY_PKG.hash_value(TO_CHAR(empno) || '|' || ename || '|' || job || '|' || TO_CHAR(mgr) || '|' || TO_CHAR(hiredate, 'mm/dd/yyyy') || '|' || TO_CHAR(sal) || '|' || TO_CHAR(comm) || '|' || TO_CHAR(deptno))
   AS VARCHAR2(512))
   );
/



with LiveSQL Test Case:

and Chris said...

Thanks!

If I do not select the virtual column in my query, do I incur any overhead from it?


No. The database only evaluates it when you access it. So if you don't select it, it's not executed.

You can see this by adding dbms_output (or similar) calls in your function:

CREATE OR REPLACE PACKAGE BODY my_security_pkg
AS
   FUNCTION hash_value(
      p_input_string_in IN VARCHAR2
      )
   RETURN VARCHAR2 DETERMINISTIC
   IS
   BEGIN
      dbms_output.put_line ( 'CALLED!' );
      RETURN DBMS_CRYPTO.HASH(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512);
   END hash_value;
END my_security_pkg;
/

select * from my_emp
where  empno = 7369;

EMPNO   ENAME   JOB     MGR    HIREDATE               SAL   COMM     DEPTNO   HASHKEY                                                                                                                            
    7369 SMITH    CLERK       7902 17-DEC-1980 00:00:00       800    <null>        20 FB376528D96E948CC31460B5508200AB2E86EB3EC98F355A4967F8E034F3206356A044B80960A0ED680A336E613BF2AA55C1ACC190DB084094344E28333082A4    

CALLED!

select ename from my_emp
where  empno = 7369;

ENAME   
SMITH   


So it's efficient in this respect.

The downside is you have to evaluate it every time you select it. Usually you query data more often than you write it. For expensive functions you call repeatedly, this can add up.

A big upside of the virtual column is it can never be out-of-sync with the other columns. So there no need for triggers/constraints/etc. to ensure people always update it when updating other columns.

You'll only need to select the hash value in forms where people can change the data. Often a small subset of the queries you'll run against the table.

So personally I'd start with a virtual column. And switch to a real one if the performance overheads are too great.

Rating

  (4 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Thanks!

Rodney, March 06, 2020 - 4:23 pm UTC

Thanks for your insight. This early in design I just hate to build in such a fundamental flaw that has to be dealt with later. The virtual column seemed like the way to go and I've tested it with my forms for single-row maintenance and there are no issues at all as far as performance (which I expected). I just wasn't sure if there were unexpected consequences waiting for me down the road. Your insight is a confidence boost. Thanks again!
Chris Saxon
March 09, 2020 - 10:20 am UTC

Happy to help :)

Warning and suggestion

Stew Ashton, March 10, 2020 - 4:16 pm UTC

Warning: you are converting numbers and dates to strings without explicitly including NLS parameters. If your session-level NLS parameters change, the hash value will change. You do not want this!

Suggestion: use STANDARD_HASH instead of DBMS_CRYPTO.

Second suggestion: don't convert to strings in the first place. Hash each column directly with STANDARD_HASH, then concatenate the hashes using UTL_RAW.CONCAT, then hash the hashes.

Best regards,
Stew
Chris Saxon
March 10, 2020 - 5:58 pm UTC

Great point on NLS settings, definitely something to be explicit about.

What about LOB columns?

Rodney, September 01, 2020 - 12:35 pm UTC

Following up my original question... if my table contains a BLOB or CLOB column how would you revise handling the hash and would that change my choice of hash function to use (STANDARD_HASH, DBMS_CRYPTO, etc.)? I am tangled up in the choice of when and where to convert data and what to convert it to in order to include a BLOB/CLOB.
Connor McDonald
September 02, 2020 - 12:17 am UTC

Easy answer there. STANDARD_HASH is out

SQL> create table t ( x clob );

Table created.

SQL> select standard_hash(x) from t;
select standard_hash(x) from t
                     *
ERROR at line 1:
ORA-00902: invalid datatype


Sorry, I wasn't clear

Rodney Bailey, September 02, 2020 - 11:34 am UTC

I'm sorry, my last follow up question was too brief. I should have provided more details into my issues.
In my example, I based my approach on converting the row data to VARCHAR2 as a basis:

ALTER TABLE my_emp
ADD hashkey      VARCHAR2(4000)
GENERATED ALWAYS AS(
   CAST(MY_SECURITY_PKG.hash_value(TO_CHAR(empno) || '|' || ename || '|' || job || '|' || TO_CHAR(mgr) || '|' || TO_CHAR(hiredate, 'mm/dd/yyyy') || '|' || TO_CHAR(sal) || '|' || TO_CHAR(comm) || '|' || TO_CHAR(deptno))
   AS VARCHAR2(512))
   );
/


and my function receives and return VARCHAR2:

CREATE OR REPLACE PACKAGE BODY my_security_pkg
AS
   FUNCTION hash_value(
      p_input_string_in IN VARCHAR2
      )
   RETURN VARCHAR2 DETERMINISTIC
   IS
   BEGIN
      RETURN DBMS_CRYPTO.HASH(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512);
   END hash_value;
END my_security_pkg;
/


So now I introduce a column that is either a CLOB or a BLOB. What is the best approach to hashing my record? Should I hash the CLOB/BLOB by itself and simply include the hash in my concatenation and then hash the entire record? Also, looking at my choice of VARCHAR2 for the parameter into my hash_value FUNCTION I'm thinking I should switch to CLOB or maybe at least overload it to handle if my concatenation ever gets beyond 4000 characters.
Finally, one last question... DBMS_CRYPTO.HASH returns RAW. After writing the original code in the question I noticed that I did not convert raw to varchar2 in my function but it had been working. I assume Oracle has been implicitly converting for me. I added in an explicit conversion to avoid problems:

RETURN UTL_I18N.raw_to_char(DBMS_CRYPTO.hash(UTL_I18N.string_to_raw(p_input_string_in),DBMS_CRYPTO.hash_sh512));


Before I would get readable alphanumeric characters returned, but afterwards would get unreadable characters. Can you explain what is happening? Here is basically that change written as a SELECT:

SELECT
UTL_I18N.raw_to_char(DBMS_CRYPTO.hash(UTL_I18N.string_to_raw('Some string'||'Another string just to add some length'||TO_CHAR(SYSDATE, 'mm/dd/yyyy hh24:mi:ss')||TO_CHAR(53)),6))
FROM dual


Thanks for taking the time to review and answer my follow up. I get so much valuable insight from your site and I cannot imagine doing my job without having it as a resource.
Connor McDonald
September 03, 2020 - 2:59 am UTC

DBMS_CRYPTO.hash can take as inputs

raw
clob
blob

so for the latter 2 there isn't really a need to do any casting to raw. But since you are going for a combination of data types, you'd really need to take the "superset" which would be blob.

The risk there is the overhead of building a new blob from your existing blob + other columns. I would recommend a simple benchmark which would test the differences between:

- concatenate all, single hash execution
- concatenate all simple data types, hash them + hash blobs/clobs separately
- hash every datatype

and whatever other permutations might suit your needs.


More to Explore

SQL

The Oracle documentation contains a complete SQL reference.