Skip to Main Content

Breadcrumb

Question and Answer

Chris Saxon

Thanks for the question, Igen.

Asked: September 15, 2016 - 6:17 pm UTC

Last updated: January 06, 2021 - 5:01 pm UTC

Version: Oracle 11.2

Viewed 10K+ times! This question is

You Asked

Hi Tom,
I am attempting to assign unique value to a an expression value which is distinct and derived from the concatenation of multiple fields.
Here's the usage:

Query1:

create table Table2 parallel 16
as
select /*+ parallel(a,16)*/ distinct col1, col2,col3,col4,col5 from Table 1 a;

Query2:
create table Table3 parallel 16
select /*+ parallel(a,16) */a.*, ora_hash(col1||col2||col3||col4||col5,1,4294967295,0) as key
from Table2;


Query3:
select key, count(1) from Table3 group by key having count(1) > 1;

1. When working with table1 having less number of records ( ~20k rows) the Query3 doesnt return any rows which implies that ORA_HASH is generating a unique value for the concatenated value(col1||col2|col3||col4||col5)

2. When working with a higher volume ( ~ 700 million rows) the Query3 returns rows which impiles that ORA_HASH is assigning the same value to different cancatenated field values. Hence there are value collisions using ORA_HASH.

can ORA_HASH be used in a way to make it work? Any other function we can use to achieve this? We need the key value to be numeric.
Have tried DBMS_UTILITY.get_hash_value (col1||col2||col3||col4||col5, 1, power(2,31)-1) as well but facing the same issue.

Any input is appreciated!

Thanks






and Connor said...

Can you try this first - rather than

ora_hash(col1||col2||...)

try

ora_hash(col1||'-'||col2||'-'||...)

to make sure that it is not a simple cross-column value overlap issue.

If that doesn't work, perhaps explore the hash functions in dbms_crypto

Hope this helps.


Rating

  (2 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Generation of UNIQUE hash value

Johny, April 12, 2017 - 9:10 am UTC

Hi Tom,

I have tried to create a table with unique hash values, but somehow it creates duplicate values in my table.

Could you help to create a unique hash value ( You can use any dirty tricks ) column, so that our datawarehouse environment will not fall over with “UNIQUE KEY VIOLATION”.

See the test script I have created.
( I am using 3 ways to insert the row.
Thanks in advace


--- Table Creation Script.


create table xxx_test_tab(tno number )

--- Pl/Sql block to populate the rows.

begin
for i in 1..1000000
loop
insert into xxx_test_tab values (ORA_HASH ('Steve' || '~' || 'Johny' || '~' || 'Alex'||'~'|| to_char (to_char ( SYSTIMESTAMP, 'SSSSSFF9')/77777 )||'~'||dbms_random.random ) );
-- insert into xxx_test_tab values (ORA_HASH('Steve' || '~' || 'Johny' || '~' || 'Alex'||'~'|| TO_CHAR(SYSTIMESTAMP, 'SSFF9' ) ));
-- insert into xxx_test_tab values ( DBMS_UTILITY.GET_HASH_VALUE ('Steve' || '~' || 'Johny' || '~' || 'Alex'||'~'|| to_char (to_char ( SYSTIMESTAMP, 'SSSSSFF9')/77777 )||'~'||dbms_random.random ,1 ,2000000000 ) );
end loop;
commit;
end;

--- Checking duplicates

select tno, count(*) from xxx_test_tab group by tno
having count(*) > 1

--- Truncate the table after usage

truncate table xxx_test_tab


--- Drop the table after usage

Drop table xxx_test_tab purge


Connor McDonald
April 13, 2017 - 1:55 am UTC

You dont know if those are duplicates are not - try this

create table xxx_test_tab(
  val varchar2(100), tno number );
  
  
insert /*+ APPEND */ into xxx_test_tab
with vals as 
 ( select 'Steve' || '~' || 'Johny' || '~' || 'Alex'||'~'|| to_char (to_char ( SYSTIMESTAMP, 'SSSSSFF9')/77777 )||'~'||dbms_random.random  v
   from dual connect by level <= 1000000
 )
select 
  v, ora_hash(v)
from vals;

commit;

select *
from 
  ( select val, count(*) over ( partition by tno ) as hash_dup
    from xxx_test_tab
  )
where hash_dup > 1;



Having said that - dont forget that hash values are exactly that ... hash values. They are *nearly* unique, not *guaranteed* unique.

But the demo above - see how you go.

Sr. Oracle Specialist

Kevin, January 06, 2021 - 4:10 pm UTC

your problem is simple. You somehow expect a HASH to produce a unique value. That is not what HASH does. Every hash algorithm has the potential to produce duplicate hits for different values. That is the way hashing works. So your design of trying to use HASH for uniqueness is flawed at the start.

You will never get a unique value from a hash algorithm so stop trying and go back to the design of your system and fix the elemental flaw you started with.

I am sorry if this sounds harsh but it is the simple truth. What you want to do will never work.

I see people misusing hashing all the time and every time I explain the flaw they don't want to hear it cause it means there is a big bug in their system. Do yourself a favor and rethink what your are doing.

Good luck. Kevin
Chris Saxon
January 06, 2021 - 5:01 pm UTC

Well it depends on the hashing algorithm!

Collisions are always possible, but SHA-256 is effectively unique. But ora_hash tops out at 4294967295 values which is nowhere near enough to avoid duplicates on non-trivial data sets.

More to Explore

Hints

Need more information on hints? Check out the complete hint descriptions