Home>Question Details



Udi -- Thanks for the question regarding "Creating a unique HASH value for the contents of a table", version 11.1.0.7

Submitted on 21-Oct-2009 13:51 Central time zone
Last updated 9-Nov-2009 11:57

You Asked

Hi Tom,

We run a large data warehouse with giant tables that have to be copied to multiple destination databases. We copy them via db-link / export-import, etc.

We need a dependable mechanism that would QA and ensure that the copies are precisely identical to the source table.

Ideally - we would like some kind of a HASH function that gets as parameter a table name and returns a unique number that's a product of the data in that table. It needs to do that consistently, of course, and then - by comparing the HASH values of a source table and one of its copies - you will be able to ascertain that they are equal.

Even a function that returns a consistent HASH value for one row is useful - we can then add the values for all the rows in the table - and identical sums would indicate identical tables on both sides with a high degree of confidence.

Ideas ? On Teradata there's an efficient builtin HASH function that gets as parameter a list of columns (typically the keys, or all the columns of a table) and returns this value for every row.

Any Oracle equivalent? Or any other slick mechanism in Oracle to compare 2 tables to see if they are precisely identical?

We are talking tables with billions of rows, so efficiency of the mechanism is of the essence.

Thanks,

Udi

and we said...

well, this sort of thing always surprises me

- if you use dblinks, you know the two are the same because you just did SQL - you either a) succeed, b) get an error message

- if you use datapump or even exp - you are hopefully using transportable tablespaces

there would be no need to compare a thing - especially in the case of transporting. Since you MUST be using the same charactersets on source and target system (else the data would not compare!) - you can absolutely use transport.

You would load on system A (and index if you want)
and then transport (copy datafiles) to system B

done - no compare needed.

You have as manual tools if you want (but you DO NOT WANT - really - just transport, you cannot lose a single row during a transport)

o dbms_rectifier_diff
http://download.oracle.com/docs/cd/B28359_01/server.111/b28327/rarrectdifpac.htm#i94795


o ora_hash( c1||'/'||c2||'/'|| ... , .... )

http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions112.htm#sthref1780


o dbms_crypto.hash

http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_crypto.htm#i1002022




but again, the fastest way to do something is TO NOT DO IT and if you transport or use dblinks - I fail to see why you would even consider doing it

heck, even with export/import - as long as you review the logs - you are done there as well.


for billions of rows (I assume that is pretty big, billions of rows is sort of MEANINGLESS when you think about it. How big is a billion rows? Sort of like asking "how long is a piece of string" - until you measure it, you don't know) I would not be looking at verifying after the fact - I would be looking at using techniques that make it so I don't have to (transporting using data pump!)
Reviews    
5 stars The fastest way...   October 21, 2009 - 5pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
"We run a large data warehouse with giant tables that have to be copied to multiple destination 
databases."
I really like the answer: "the fastest way to do something is TO NOT DO IT"
Why would you really need multiple large data warehouse tables copied in multiple databases? 


5 stars Thanks - it's not so simple   October 21, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Tom - thanks for the HASH links - sounds like that's the way to go. However - the criticism over 
our need to do this is misplaced. I was oversimplifying. Sorry to report that Oracle's Data 
Warehouse toolbox needs some beefing up .

(1) DB-Links - are the most convenient mechanism. You get the full power of SQL - you can 
add/remove columns, etc. But DB-Links are serial, do NOT parallelize (why ???), and are very CPU 
intensive (why ???). To move a Terabyte table between 2 multi-CPU servers you need to kick off as 
many single-threaded DB-Link jobs as there are CPUs, maybe one partition per job, "manually 
parallelizing" in order to max the pipe between the servers. Many jobs ==> very easy to miss or 
double transfer.

(2) Datapump - at the cost of making an intermediate flat file copy - this would seem like a good 
option (though not as flexible as DB-Link SQL). A single parallel job that does it all. But guess 
what - parallel datapump export/import of a multi-terabyte entire schema/tablespace simply doesn't 
work. Try to export 20 Terabytes in parallel - and you will encounter some very unpleasant 
ORA-0600s.

(3) Transportable tablespaces - you're right - it's quite "safe" - but again - you get overwhelmed 
by the numbers. Tens of Terabytes mean hundreds of Oracle datafiles to be moved. And reconnected. 
Metadata extracted. And reapplied. Quite the busy scene.

With careful scripting - you can make it happen. But executives are funny - between HIPAA, and SOX 
- they demand double and triple audits to ensure that the copy on server "B" is EXACTLY identical 
to that on server "A". And we just work here and need to deliver.

So - a magical function that will hash an entire table, or at least an entire partition - to serve 
as a "proxy" for the contents of the table/partition is crucially useful. Absent that - we will try 
and construct something row-based per your suggestions.

BTW - when I said "billions of rows" I was implying that process-intensive solutions like 
"MINUS-ing" 2 copies of a 
2-billion row 1 Terabyte table over a DB-Link looking for a "null set" result - is not a practical 
option.

And yes, I know, if our entire universe resided in a single RAC database - there will be no need to 
move data between DBs. 

But those same executives are reluctant to put all the eggs in one RAC basket and demand multiple 
machines/DBs. Hence the need to make copies.

Thanks for the kick start as always.

Udi


Followup   October 23, 2009 - 1pm Central time zone:

(1) so what? Not sure what you are saying there - because I didn't say "don't use dblinks"

I said "hey, if you are using a dblink and you say "insert this data over there", it will do it and you have procedural access to the results and you know what happened.

(2) but I would never even consider doing that.

(3) so? I don't know why you would move 10's of terabytes (sounds like you should be doing your tuning at your OPERATIONAL LEVEL, not in the database, why move that which need not be moved)

but even so, so what - who cares how many files there are. If you have 10's of terabytes and you unload it - I'll bet you have........

a lot of files - don't you.




Use transportable tablespaces, nuff said. You can use as large a file as you want, you could use bcv splits or whatever to make instance copies, you could use your backups (you can backup a tablespace on one system and restore to another)

think about it.
5 stars It's really not so simple   October 21, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Dana - you raise a good point - but we've learned this the hard way.

When you have a REALLY LARGE data warehouse you start hitting the limits of physics and 
scalability. Last year our warehouse required 500 CPUs and a ton of IO. This year we run on 1000 
CPUs and next year 2000 CPUs, etc., with IO doubling correspondingly.

They simply don't sell multi-thousand CPU servers in your friendly vendor's catalog. And even if 
you manage to put one together - you can't just order an expansion box to double its capacity every 
year. At some point - you simply run into the law of seriously diminishing returns.

So - the only option (unless you have billion dollar budgets to order something completely custom) 
is to buy the largest commercially available servers - typically with no more than 64 or 128 CPUs - 
and just get a number of them, with multiple copies of your DB. 

You can try to pursue a single DB RAC - but it's really hard to build an IO backbone powerful 
enough to support a RAC of 1000 CPUs.

At some point - you're forced to cut it up into a few identical copies that must be maintained... 
perfectly identical. 

It's a completely different paradigm than datamarts.


Followup   October 23, 2009 - 1pm Central time zone:

you sound like a serious candidate for exadata - push the database down to the storage level itself.

Have you heard of it?
Have you looked at it?


5 stars Duplication of data warehouse data   October 22, 2009 - 2pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
Udi, 

I don't disagree with your premise. And your solutions show hard work and creativity.
I do wonder if the premise is workable.
In my experience, business abhors unused capacity, like nature abhors a vacuum. 
 
It would be very interesting to see the ROI statement requiring that much capital outlay, developer 
and administrative support. Does the ROI double each year like the resource requirement? 

The proposed solutions are intriguing, and useful. I just don't know I can say the same about the 
requirements. 

Re-write of SQL "could" reduce requirements so the original outlay of resources works. I believe 
SQL techniques advance faster than hardware. That's because I've been reading AskTom for so long. :)


5 stars Tom - you're making my point...   October 23, 2009 - 4pm Central time zone
Reviewer: Udi Karni from LA
... any solution means lots of detail. "bcv" is probably the safest because it's SAN hardware 
based, but SAN is a poor solution for data warehouses because it derives its performance from its 
cache which is typically too small and insignificant for large data warehouses.

This is why the smart people that built Exadata (which we are looking at) - who know a thing or two 
about data warehouses - are basically using "direct attached JBOD" in their storage cells.

As for why we're making all these copies - as I explained - we have too much processing for a 
single commercial machine to handle, so we have to split the load. We also don't have a Los Alamos 
budget to custom order a 10,000 CPU supercomputer, nor are there very many multi-thousand CPU 
Oracle RAC installations out there for us to learn from their mistakes (we checked).

The point is - making the copies involves a lot of details - and the executives aren't interested 
in technicals. Patients' lives are at stake and they want to feel confident that the LAB table on 
server A is precisely identical to the LAB table on server B.

They are business people - and in their world 2 tables are identical if their counts are the same, 
or the sum of their numeric columns is equivalent, or table A minus table B = table B minus table A 
= (null set). Stuff like that.

So - while they trust that we have taken every precaution - they also expect us to run these types 
of audits when all the loading is done - and generate a clean bill of health.

So - an Oracle builtin function that will return some kind of "number" that uniquely and 
consistently represents the data contents of a table or partition - would be very handy for us - 
because we would just run this function on the various copies and make sure the "number" for a 
given table on all the copies is the same.

Absent such a function - we'll have to continue to count, and sum, and build something ourselves - 
maybe using ORA_HASH - but it will definitely be much less efficient than an Oracle supplied 
"ORA_HASH on steroids" function.

Before we go all out - I wanted to pick your brain in the hope we can spare ourselves all this 
development.

Makes sense?

Udi


Followup   October 26, 2009 - 1pm Central time zone:

... we have too much
processing for a single commercial machine to handle, ...

probably not with exadata - which is commercial.


And if you have a hard time handling the load NOW, why would you want to do this hashing (read that as "extremely cpu intensive thing) for all of the data (read that as excessive, unnecessary IO).

If you have the money to finance ALL of the redundant copies
And their hardware
And your time
And your design efforts

than you probably have twice as much money as you need to right size a machine, with technology (such as exadata) that can scale it out.

... Patients' lives are at stake and they want to
feel confident that the LAB table on server A is precisely identical to the LAB
table on server B.
...

and now I'm really afraid, for you have introduced huge latencies into the dispersal of really important data and you have synchronization issues.


... They are business people - and in their world 2 tables are identical ..

and (i've written this before) it is our job as TECHNOLOGY PEOPLE to tell the BUSINESS PEOPLE "we don't tell you how to accomplish your task, you don't tell us how to accomplish ours - we provide a service, you tell us what needs be in place and we do it". The cost what of they propose is the Los Alamos budget you were talking about.


.. when all the loading is done ... and if you do it the way I suggest (transport), there is no loading, hence - you are done, at 1% of the cost (no load, no unload, no reindex, no compare)

.... an Oracle builtin function that will return some kind of "number" that
uniquely ...

as long as you are aware that when you hash an infinite number of inputs into a finite space - the word UNIQUE cannot be used - right - you know that unique doesn't come into play here.

... I wanted to pick your brain in the hope we can spare
ourselves all this development.

...


and I'll keep telling you "TRANSPORT", then you are done


It was making sense until you keep going that you need to do this, because you do not.


5 stars I'd like to invite you to the trenches.....   October 30, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Tom - this is all good advice - if life were only so simple. In corporate environments you can't 
always do what's technically "correct".

(1) Exadata is great - but if it's an "XYZ" shop and there's reluctance to buy HP/Sun hardware - 
there won't be any Exadata.

(2) Transportable TBS are great - but they involve moving datafiles across machines - for which you 
need system permissions, that are not always easy to obtain. With DB-Links you create a "secure 
tunnel" between DBs that is under your control and allows you to get the work done without 
mountains of red tape - even if DB-links have their challenges.

(3) Plus, if you have many tables that get updated throughout the day, and you want to propagate 
them as soon as they are ready - putting each one in a tiny tablespace and shipping it across - is 
using an 18-wheeler to transport an evelope. Transportable tablespaces were meant to transport an 
entire system or schema, not one table. Yes you can transport one table or one partition - but the 
overhead of the process, I hope you will agree, is way greater than just saying "insert into 
destination@db-link select from source".

(4) And I particularly like your advice about telling the Business Bosses "you don't tell me how to 
accomplish my job...". In the world of HIPAA and SOX - when execs are facing 25 years in the 
slammer for data inequities - they WILL tell us which audit procedures they are comfortable with - 
or WE will spend 25 years DBA'ing in the north pole outpost. Come on.

Life isn't perfect, but we still manage to run a 100 Terabyte data warehouse, and since we can't 
get the one machine to do it - we make copies across a bunch of machines, spread the load across 
them, and make it work that way. Yes, it adds up to 1 Petabyte of spinning Oracle copies. Yes, 
there's the propagation overhead. Yes, there are QA issues. But overall - not too many other 
realistic options for us. Trust me - we've considered many.

Our size doubles every year. It's not easy to get a new giant machine every year that's 2x its 
predecessor. It is, however, easy to get new smaller machines and upgrade the copies one-by-one, or 
get a few extra ones to accomodate a surge in growth.

So - I very much appreciate your thoughts and suggestions. We could, however, use a function that 
"represents" the contents of a table or partition. If you think about it - a terabyte table is 
really an 8-trillion-bit string - so there's your uniqe representation. I was hoping, though, that 
the genious developers in Oracle could find a way to compress the 8 trillion bits into something 
more compact, just as WinZIP compresses large files to very small ones without losing contents. 
Then we could compare source and destination strings and verify that they are equal. Will it use up 
cycles? no doubt. But better expend some cycles than spend some long winters in our north pole 
facility. I like California better.

Thanks in any event for your time and thoughts.


Followup   November 9, 2009 - 11am Central time zone:

I live in the trenches? I work in a large company. And I violently disagree with your opening statement.

violently. hugely. 100%


It is your job to actually educate the business. If you sort of told them the cost benefit - you and your team, you know, the people that actually know the technology - I have a feeling it would be different.

and you know what, if you are forced to work in a place where you are doing things you know to be utterly stupid, wasteful, the wrong way - how does that make you feel at the end of the day? Are you happy with your current place?


(1) so what, but these same people would consider a netezza or a teredata proprietary solution.

and exadata is not just hp, it is a san you can connect to things. sun is a huge player as well.

(2) so what, so what, so what. period - all I say to you is "so what". Do you have any technical points to make? I made quite a few.


as for your secure tunnel. Hmm, I hope you are using ASO or some other encryption technology. And it is as secure as moving files would be. If you do things in a professional manner.


dblinks not only have their challenges - but for data moves ( which frankly, as stated, SHOULD NOT BE DONE IN THE FIRST PLACE - the first "not smart" thing we are doing here is copying a ton of data - not smart) of this size are about the slowest way to do it. Think about that, think about that cost. 'Business' is about cost and getting things fast - you are doing it the most costly slow way possible.


(3) you didn't even say you were doing that, if you change the problem, then I will change the answer.

we've been talking about using things like BCV's to move tons and tons of data - not a trickle feed.

Fine, you want to trickle feed just changes - use streams. Done - but that doesn't make your need to move tons of data go away and we are back to either

(a) DO NOT DO IT (that is best)
(b) DO IT RIGHT using transports.


(4) you again need to educate, tell them how insecure what you are doing is - if what you say is true, copying the data all over the place is the LAST THING THEY WANT. give me a *break*.




... but we still manage to run a 100 Terabyte data warehouse,
...

that is a medium, almost small, database these days, you are doing what hundreds of thousands of people do.


And you could get a machine to do a lot more.


... But overall - not too many other realistic
options for us. Trust me - we've considered many.
...

I disagree. You know, as an Oracle shareholder, I should be promoting this - big time. How much money are you burning to copy this to so many machines.

But I'm not, I'll persist in pushing the right agenda, not giving a stamp of approval to a really bad idea.


... I was hoping, though, that the genious
developers in Oracle could find a way to compress the 8 trillion bits into
something more compact, ...

you know what, we can, you just have to whip through 8 tillion bits first. Your analogy with winzip causes me huge confusion - I see not a single parallel??

And you would hope the genius developers where you are - could do this smarter?



5 stars Beware of hash cracking   November 2, 2009 - 5am Central time zone
Reviewer: Another reader from Germany
As Tom Kyte already pointed out, a hash mathematically is a mapping of a huge set of possible 
values to a smaller set. It cannot be unique because if it were it would need the same amount of 
space to represent the same cardinality.

So your bosses who are afraid of going "to the slammer" bet on statistics: they hope you will not 
hit a collision of your hashes in your lifetime. This sounds too optimistic considering 100 TB of 
data. Let us assume you have 256-bit hashes which are accepted as "safe". 256 bit = 32 bytes. Let 
us further assume you map one 1-TB-table's contents to one hash value. This means 1,099,511,627,776 
bytes map into a 32 byte value. So you will have guaranteed collisions

power (1099511627776/32, 2) = 2^34,359,738,368 (around 10^10,343,312,041)

times! And your bosses rely on hash values? Hmm. Maybe mathematics may convince them if sound 
advice cannot.


5 stars Beware of hash cracking   November 2, 2009 - 5am Central time zone
Reviewer: Another reader from Germany
As Tom Kyte already pointed out, a hash mathematically is a mapping of a huge set of possible 
values to a smaller set. It cannot be unique because if it were it would need the same amount of 
space to represent the same cardinality.

So your bosses who are afraid of going "to the slammer" bet on statistics: they hope you will not 
hit a collision of your hashes in your lifetime. This sounds too optimistic considering 100 TB of 
data. Let us assume you have 256-bit hashes which are accepted as "safe". 256 bit = 32 bytes. Let 
us further assume you map one 1-TB-table's contents to one hash value. This means 1,099,511,627,776 
bytes map into a 32 byte value. So you will have guaranteed collisions

power (1099511627776/32, 2) = 2^34,359,738,368 (around 10^10,343,312,041)

times! And your bosses rely on hash values? Hmm. Maybe mathematics may convince them if sound 
advice cannot.


5 stars That's a little extreme.....   November 2, 2009 - 1pm Central time zone
Reviewer: Udi karni from LA
Reader from Germany - you raise a very good point, but I think we have a little lattitude here.

Maybe take uptime as a guide - executives equate a system with 5 "9"s to a system that "never goes 
down" and make such promises to the board, stockholders, and public.

A 5 "9"s system - if I did my math right - is a system that doesn't go down for more than 30 
seconds out of the 30 million seconds in a year - so 1 out of a million is considered an acceptable 
error.

A hash function that can reduce the 8 trillion bits of a 1 trillion byte (1 Terabyte) table by a 
factor of 1 million - down to an 8 Mbyte string - might just be an acceptable error. You compare 
two 8 Mbyte hash strings and if they are equal - you assume the entire 1 Terabyte is equal.

We do the same when comparing DNA, when taking a thumb print or a retina scan and allowing access - 
I doubt the computer checks every pixel - there must be some "formula" derived from the picture and 
compared.

So - there has got to be some reasonable way to do this. Maybe a function with a throttle parameter 
- the more accuracy you want - the bigger the string will be, and then the users can make an 
educated determination of how far they want to extend it.

I am not an expert on this, but in my humble opinion - in the world of data warehouse - you often 
end up with large objects that are copies of each others - and it would be very useful to have a 
function that could quickly tell you whether they are identical - without comparing every bit.


5 stars Comparison Exactness   November 2, 2009 - 2pm Central time zone
Reviewer: Another Reader from Germany
Hi Udi,

I just wanted to show the numbers to give you a helping hand when arguing with management ;-)

When Oracle compares two tables and finds they *are* the same they *are* the same with regards to 
their use by the appropriate application layer. Bit by bit, however, not necessarily on the 
physical layer (one disk may have a different representation of "1" and "0" than another) but with 
respect to "select * from tab;".

And here comes the point: when you use transportable tablespaces you have them read-only. So you 
can mount them read-only from several sides *and* you can ensure they *are* the same, not just the 
same with regards to the equivalence class (mathspeak) of the same hashvalue.

So by transportable tablespaces you might provide an even "better equality" than by comparing 
hashvalues. What about that?


5 stars You are right...   November 6, 2009 - 5pm Central time zone
Reviewer: Udi Karni from LA
Another reader from Germany - you are right, and I should clarify 2 things -
(1) I completely agree that transportable tablespaces are a great way to both move the data and 
guarantee equality
(2) when discussing the 8 trillion bits I was speaking metaphorically - of course the hash string 
needs to represent the data from a user's perspective - put the 2 tables "side-by-side" and every 
column of every row must have the same value.

As I stated before -

The problem with transportable tablespaces is just that it's really meant for shipping large chunks 
- whole tablespaces, entire schemas. You could of course put a tiny table in a tablespace and ship 
it - but it's not exactly how it was meant to be used. In our environment - parts of the tables get 
updated at different times of the day (east coast earlier than west coast)and need to be propagated 
immediately - so a DB-link transfer in a SQL script is a very easy way to ship one chunk of a table 
that has just been updated, while the transportable tablespace is a little more cumbersome.

Also - transportable tablespaces require jumping between an Oracle process that loads the data into 
the tablespace and a system process that extracts the metadata, moves the datafiles, loads, 
connects on the other side, etc. In a highly secure shop - this requires much more setup and 
permissions, and is a lot less flexible than a db-link. So - the ability to get things done quickly 
without involving other groups (network, system) gets a few points as well.

Oracle obviously has a ton of features, and every customer uses the ones that work for them. 
Sometimes I am just amazed at how resistant Oracle is to adding features even if they are only 
useful to a small segmment of the user base. 

Examples that come to mind - 
(1) the ability to rename a tablespace - a seemingly "nothing-to-it" feature that for many long 
years wasn't available.
(2) the lack for all these years of an "IsNumeric" builtin function (Googling for it still brings 
up Tom Kyte's you-code-it-yourself workaround). All kinds of stuff that exists in SQL Server - that 
brings joy to no end to the Microsoft crowd - even though theirs, ultimately, is an inferior 
product. 

In any event - thanks for your feedback !


5 stars Simply agree...   November 18, 2009 - 4pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
This very much sounds like a presentation given several years ago by an esteemed member of the db 
community. 
He owned a consulting company where he solemnly agreed with everything the customer wanted.
Need 2X the CPU... Great!
Need 10X the disk... Even better!!!
Need 100X the memory...  That'll do it!!!

Sadly for the premise, it was completely with tongue firmly in cheek. But talk about funny!



Write a Review
 


All information and materials provided here are provided "as-is"; Oracle disclaims all express and implied warranties, including, the implied warranties of merchantability or fitness for a particular use. Oracle shall not be liable for any damages, including, direct, indirect, incidental, special or consequential damages for loss of profits, revenue, data or data use, incurred by you or any third party in connection with the use of this information or these materials.

About Oracle | Legal Notices and Terms of Use | Privacy Statement