The fastest way...
October 21, 2009 - 5pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
"We run a large data warehouse with giant tables that have to be copied to multiple destination
databases."
I really like the answer: "the fastest way to do something is TO NOT DO IT"
Why would you really need multiple large data warehouse tables copied in multiple databases?
Thanks - it's not so simple
October 21, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Tom - thanks for the HASH links - sounds like that's the way to go. However - the criticism over
our need to do this is misplaced. I was oversimplifying. Sorry to report that Oracle's Data
Warehouse toolbox needs some beefing up .
(1) DB-Links - are the most convenient mechanism. You get the full power of SQL - you can
add/remove columns, etc. But DB-Links are serial, do NOT parallelize (why ???), and are very CPU
intensive (why ???). To move a Terabyte table between 2 multi-CPU servers you need to kick off as
many single-threaded DB-Link jobs as there are CPUs, maybe one partition per job, "manually
parallelizing" in order to max the pipe between the servers. Many jobs ==> very easy to miss or
double transfer.
(2) Datapump - at the cost of making an intermediate flat file copy - this would seem like a good
option (though not as flexible as DB-Link SQL). A single parallel job that does it all. But guess
what - parallel datapump export/import of a multi-terabyte entire schema/tablespace simply doesn't
work. Try to export 20 Terabytes in parallel - and you will encounter some very unpleasant
ORA-0600s.
(3) Transportable tablespaces - you're right - it's quite "safe" - but again - you get overwhelmed
by the numbers. Tens of Terabytes mean hundreds of Oracle datafiles to be moved. And reconnected.
Metadata extracted. And reapplied. Quite the busy scene.
With careful scripting - you can make it happen. But executives are funny - between HIPAA, and SOX
- they demand double and triple audits to ensure that the copy on server "B" is EXACTLY identical
to that on server "A". And we just work here and need to deliver.
So - a magical function that will hash an entire table, or at least an entire partition - to serve
as a "proxy" for the contents of the table/partition is crucially useful. Absent that - we will try
and construct something row-based per your suggestions.
BTW - when I said "billions of rows" I was implying that process-intensive solutions like
"MINUS-ing" 2 copies of a
2-billion row 1 Terabyte table over a DB-Link looking for a "null set" result - is not a practical
option.
And yes, I know, if our entire universe resided in a single RAC database - there will be no need to
move data between DBs.
But those same executives are reluctant to put all the eggs in one RAC basket and demand multiple
machines/DBs. Hence the need to make copies.
Thanks for the kick start as always.
Udi
Followup October 23, 2009 - 1pm Central time zone:
(1) so what? Not sure what you are saying there - because I didn't say "don't use dblinks"
I said "hey, if you are using a dblink and you say "insert this data over there", it will do it and you have procedural access to the results and you know what happened.
(2) but I would never even consider doing that.
(3) so? I don't know why you would move 10's of terabytes (sounds like you should be doing your tuning at your OPERATIONAL LEVEL, not in the database, why move that which need not be moved)
but even so, so what - who cares how many files there are. If you have 10's of terabytes and you unload it - I'll bet you have........
a lot of files - don't you.
Use transportable tablespaces, nuff said. You can use as large a file as you want, you could use bcv splits or whatever to make instance copies, you could use your backups (you can backup a tablespace on one system and restore to another)
think about it.
It's really not so simple
October 21, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Dana - you raise a good point - but we've learned this the hard way.
When you have a REALLY LARGE data warehouse you start hitting the limits of physics and
scalability. Last year our warehouse required 500 CPUs and a ton of IO. This year we run on 1000
CPUs and next year 2000 CPUs, etc., with IO doubling correspondingly.
They simply don't sell multi-thousand CPU servers in your friendly vendor's catalog. And even if
you manage to put one together - you can't just order an expansion box to double its capacity every
year. At some point - you simply run into the law of seriously diminishing returns.
So - the only option (unless you have billion dollar budgets to order something completely custom)
is to buy the largest commercially available servers - typically with no more than 64 or 128 CPUs -
and just get a number of them, with multiple copies of your DB.
You can try to pursue a single DB RAC - but it's really hard to build an IO backbone powerful
enough to support a RAC of 1000 CPUs.
At some point - you're forced to cut it up into a few identical copies that must be maintained...
perfectly identical.
It's a completely different paradigm than datamarts.
Followup October 23, 2009 - 1pm Central time zone:
you sound like a serious candidate for exadata - push the database down to the storage level itself.
Have you heard of it?
Have you looked at it?
Duplication of data warehouse data
October 22, 2009 - 2pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
Udi,
I don't disagree with your premise. And your solutions show hard work and creativity.
I do wonder if the premise is workable.
In my experience, business abhors unused capacity, like nature abhors a vacuum.
It would be very interesting to see the ROI statement requiring that much capital outlay, developer
and administrative support. Does the ROI double each year like the resource requirement?
The proposed solutions are intriguing, and useful. I just don't know I can say the same about the
requirements.
Re-write of SQL "could" reduce requirements so the original outlay of resources works. I believe
SQL techniques advance faster than hardware. That's because I've been reading AskTom for so long. :)
Tom - you're making my point...
October 23, 2009 - 4pm Central time zone
Reviewer: Udi Karni from LA
... any solution means lots of detail. "bcv" is probably the safest because it's SAN hardware
based, but SAN is a poor solution for data warehouses because it derives its performance from its
cache which is typically too small and insignificant for large data warehouses.
This is why the smart people that built Exadata (which we are looking at) - who know a thing or two
about data warehouses - are basically using "direct attached JBOD" in their storage cells.
As for why we're making all these copies - as I explained - we have too much processing for a
single commercial machine to handle, so we have to split the load. We also don't have a Los Alamos
budget to custom order a 10,000 CPU supercomputer, nor are there very many multi-thousand CPU
Oracle RAC installations out there for us to learn from their mistakes (we checked).
The point is - making the copies involves a lot of details - and the executives aren't interested
in technicals. Patients' lives are at stake and they want to feel confident that the LAB table on
server A is precisely identical to the LAB table on server B.
They are business people - and in their world 2 tables are identical if their counts are the same,
or the sum of their numeric columns is equivalent, or table A minus table B = table B minus table A
= (null set). Stuff like that.
So - while they trust that we have taken every precaution - they also expect us to run these types
of audits when all the loading is done - and generate a clean bill of health.
So - an Oracle builtin function that will return some kind of "number" that uniquely and
consistently represents the data contents of a table or partition - would be very handy for us -
because we would just run this function on the various copies and make sure the "number" for a
given table on all the copies is the same.
Absent such a function - we'll have to continue to count, and sum, and build something ourselves -
maybe using ORA_HASH - but it will definitely be much less efficient than an Oracle supplied
"ORA_HASH on steroids" function.
Before we go all out - I wanted to pick your brain in the hope we can spare ourselves all this
development.
Makes sense?
Udi
Followup October 26, 2009 - 1pm Central time zone:
... we have too much
processing for a single commercial machine to handle, ...
probably not with exadata - which is commercial.
And if you have a hard time handling the load NOW, why would you want to do this hashing (read that as "extremely cpu intensive thing) for all of the data (read that as excessive, unnecessary IO).
If you have the money to finance ALL of the redundant copies
And their hardware
And your time
And your design efforts
than you probably have twice as much money as you need to right size a machine, with technology (such as exadata) that can scale it out.
... Patients' lives are at stake and they want to
feel confident that the LAB table on server A is precisely identical to the LAB
table on server B.
...
and now I'm really afraid, for you have introduced huge latencies into the dispersal of really important data and you have synchronization issues.
... They are business people - and in their world 2 tables are identical ..
and (i've written this before) it is our job as TECHNOLOGY PEOPLE to tell the BUSINESS PEOPLE "we don't tell you how to accomplish your task, you don't tell us how to accomplish ours - we provide a service, you tell us what needs be in place and we do it". The cost what of they propose is the Los Alamos budget you were talking about.
.. when all the loading is done ... and if you do it the way I suggest (transport), there is no loading, hence - you are done, at 1% of the cost (no load, no unload, no reindex, no compare)
.... an Oracle builtin function that will return some kind of "number" that
uniquely ...
as long as you are aware that when you hash an infinite number of inputs into a finite space - the word UNIQUE cannot be used - right - you know that unique doesn't come into play here.
... I wanted to pick your brain in the hope we can spare
ourselves all this development.
...
and I'll keep telling you "TRANSPORT", then you are done
It was making sense until you keep going that you need to do this, because you do not.
I'd like to invite you to the trenches.....
October 30, 2009 - 6pm Central time zone
Reviewer: Udi Karni from LA
Tom - this is all good advice - if life were only so simple. In corporate environments you can't
always do what's technically "correct".
(1) Exadata is great - but if it's an "XYZ" shop and there's reluctance to buy HP/Sun hardware -
there won't be any Exadata.
(2) Transportable TBS are great - but they involve moving datafiles across machines - for which you
need system permissions, that are not always easy to obtain. With DB-Links you create a "secure
tunnel" between DBs that is under your control and allows you to get the work done without
mountains of red tape - even if DB-links have their challenges.
(3) Plus, if you have many tables that get updated throughout the day, and you want to propagate
them as soon as they are ready - putting each one in a tiny tablespace and shipping it across - is
using an 18-wheeler to transport an evelope. Transportable tablespaces were meant to transport an
entire system or schema, not one table. Yes you can transport one table or one partition - but the
overhead of the process, I hope you will agree, is way greater than just saying "insert into
destination@db-link select from source".
(4) And I particularly like your advice about telling the Business Bosses "you don't tell me how to
accomplish my job...". In the world of HIPAA and SOX - when execs are facing 25 years in the
slammer for data inequities - they WILL tell us which audit procedures they are comfortable with -
or WE will spend 25 years DBA'ing in the north pole outpost. Come on.
Life isn't perfect, but we still manage to run a 100 Terabyte data warehouse, and since we can't
get the one machine to do it - we make copies across a bunch of machines, spread the load across
them, and make it work that way. Yes, it adds up to 1 Petabyte of spinning Oracle copies. Yes,
there's the propagation overhead. Yes, there are QA issues. But overall - not too many other
realistic options for us. Trust me - we've considered many.
Our size doubles every year. It's not easy to get a new giant machine every year that's 2x its
predecessor. It is, however, easy to get new smaller machines and upgrade the copies one-by-one, or
get a few extra ones to accomodate a surge in growth.
So - I very much appreciate your thoughts and suggestions. We could, however, use a function that
"represents" the contents of a table or partition. If you think about it - a terabyte table is
really an 8-trillion-bit string - so there's your uniqe representation. I was hoping, though, that
the genious developers in Oracle could find a way to compress the 8 trillion bits into something
more compact, just as WinZIP compresses large files to very small ones without losing contents.
Then we could compare source and destination strings and verify that they are equal. Will it use up
cycles? no doubt. But better expend some cycles than spend some long winters in our north pole
facility. I like California better.
Thanks in any event for your time and thoughts.
Followup November 9, 2009 - 11am Central time zone:
I live in the trenches? I work in a large company. And I violently disagree with your opening statement.
violently. hugely. 100%
It is your job to actually educate the business. If you sort of told them the cost benefit - you and your team, you know, the people that actually know the technology - I have a feeling it would be different.
and you know what, if you are forced to work in a place where you are doing things you know to be utterly stupid, wasteful, the wrong way - how does that make you feel at the end of the day? Are you happy with your current place?
(1) so what, but these same people would consider a netezza or a teredata proprietary solution.
and exadata is not just hp, it is a san you can connect to things. sun is a huge player as well.
(2) so what, so what, so what. period - all I say to you is "so what". Do you have any technical points to make? I made quite a few.
as for your secure tunnel. Hmm, I hope you are using ASO or some other encryption technology. And it is as secure as moving files would be. If you do things in a professional manner.
dblinks not only have their challenges - but for data moves ( which frankly, as stated, SHOULD NOT BE DONE IN THE FIRST PLACE - the first "not smart" thing we are doing here is copying a ton of data - not smart) of this size are about the slowest way to do it. Think about that, think about that cost. 'Business' is about cost and getting things fast - you are doing it the most costly slow way possible.
(3) you didn't even say you were doing that, if you change the problem, then I will change the answer.
we've been talking about using things like BCV's to move tons and tons of data - not a trickle feed.
Fine, you want to trickle feed just changes - use streams. Done - but that doesn't make your need to move tons of data go away and we are back to either
(a) DO NOT DO IT (that is best)
(b) DO IT RIGHT using transports.
(4) you again need to educate, tell them how insecure what you are doing is - if what you say is true, copying the data all over the place is the LAST THING THEY WANT. give me a *break*.
... but we still manage to run a 100 Terabyte data warehouse,
...
that is a medium, almost small, database these days, you are doing what hundreds of thousands of people do.
And you could get a machine to do a lot more.
... But overall - not too many other realistic
options for us. Trust me - we've considered many.
...
I disagree. You know, as an Oracle shareholder, I should be promoting this - big time. How much money are you burning to copy this to so many machines.
But I'm not, I'll persist in pushing the right agenda, not giving a stamp of approval to a really bad idea.
... I was hoping, though, that the genious
developers in Oracle could find a way to compress the 8 trillion bits into
something more compact, ...
you know what, we can, you just have to whip through 8 tillion bits first. Your analogy with winzip causes me huge confusion - I see not a single parallel??
And you would hope the genius developers where you are - could do this smarter?
Beware of hash cracking
November 2, 2009 - 5am Central time zone
Reviewer: Another reader from Germany
As Tom Kyte already pointed out, a hash mathematically is a mapping of a huge set of possible
values to a smaller set. It cannot be unique because if it were it would need the same amount of
space to represent the same cardinality.
So your bosses who are afraid of going "to the slammer" bet on statistics: they hope you will not
hit a collision of your hashes in your lifetime. This sounds too optimistic considering 100 TB of
data. Let us assume you have 256-bit hashes which are accepted as "safe". 256 bit = 32 bytes. Let
us further assume you map one 1-TB-table's contents to one hash value. This means 1,099,511,627,776
bytes map into a 32 byte value. So you will have guaranteed collisions
power (1099511627776/32, 2) = 2^34,359,738,368 (around 10^10,343,312,041)
times! And your bosses rely on hash values? Hmm. Maybe mathematics may convince them if sound
advice cannot.
Beware of hash cracking
November 2, 2009 - 5am Central time zone
Reviewer: Another reader from Germany
As Tom Kyte already pointed out, a hash mathematically is a mapping of a huge set of possible
values to a smaller set. It cannot be unique because if it were it would need the same amount of
space to represent the same cardinality.
So your bosses who are afraid of going "to the slammer" bet on statistics: they hope you will not
hit a collision of your hashes in your lifetime. This sounds too optimistic considering 100 TB of
data. Let us assume you have 256-bit hashes which are accepted as "safe". 256 bit = 32 bytes. Let
us further assume you map one 1-TB-table's contents to one hash value. This means 1,099,511,627,776
bytes map into a 32 byte value. So you will have guaranteed collisions
power (1099511627776/32, 2) = 2^34,359,738,368 (around 10^10,343,312,041)
times! And your bosses rely on hash values? Hmm. Maybe mathematics may convince them if sound
advice cannot.
That's a little extreme.....
November 2, 2009 - 1pm Central time zone
Reviewer: Udi karni from LA
Reader from Germany - you raise a very good point, but I think we have a little lattitude here.
Maybe take uptime as a guide - executives equate a system with 5 "9"s to a system that "never goes
down" and make such promises to the board, stockholders, and public.
A 5 "9"s system - if I did my math right - is a system that doesn't go down for more than 30
seconds out of the 30 million seconds in a year - so 1 out of a million is considered an acceptable
error.
A hash function that can reduce the 8 trillion bits of a 1 trillion byte (1 Terabyte) table by a
factor of 1 million - down to an 8 Mbyte string - might just be an acceptable error. You compare
two 8 Mbyte hash strings and if they are equal - you assume the entire 1 Terabyte is equal.
We do the same when comparing DNA, when taking a thumb print or a retina scan and allowing access -
I doubt the computer checks every pixel - there must be some "formula" derived from the picture and
compared.
So - there has got to be some reasonable way to do this. Maybe a function with a throttle parameter
- the more accuracy you want - the bigger the string will be, and then the users can make an
educated determination of how far they want to extend it.
I am not an expert on this, but in my humble opinion - in the world of data warehouse - you often
end up with large objects that are copies of each others - and it would be very useful to have a
function that could quickly tell you whether they are identical - without comparing every bit.
Comparison Exactness
November 2, 2009 - 2pm Central time zone
Reviewer: Another Reader from Germany
Hi Udi,
I just wanted to show the numbers to give you a helping hand when arguing with management ;-)
When Oracle compares two tables and finds they *are* the same they *are* the same with regards to
their use by the appropriate application layer. Bit by bit, however, not necessarily on the
physical layer (one disk may have a different representation of "1" and "0" than another) but with
respect to "select * from tab;".
And here comes the point: when you use transportable tablespaces you have them read-only. So you
can mount them read-only from several sides *and* you can ensure they *are* the same, not just the
same with regards to the equivalence class (mathspeak) of the same hashvalue.
So by transportable tablespaces you might provide an even "better equality" than by comparing
hashvalues. What about that?
You are right...
November 6, 2009 - 5pm Central time zone
Reviewer: Udi Karni from LA
Another reader from Germany - you are right, and I should clarify 2 things -
(1) I completely agree that transportable tablespaces are a great way to both move the data and
guarantee equality
(2) when discussing the 8 trillion bits I was speaking metaphorically - of course the hash string
needs to represent the data from a user's perspective - put the 2 tables "side-by-side" and every
column of every row must have the same value.
As I stated before -
The problem with transportable tablespaces is just that it's really meant for shipping large chunks
- whole tablespaces, entire schemas. You could of course put a tiny table in a tablespace and ship
it - but it's not exactly how it was meant to be used. In our environment - parts of the tables get
updated at different times of the day (east coast earlier than west coast)and need to be propagated
immediately - so a DB-link transfer in a SQL script is a very easy way to ship one chunk of a table
that has just been updated, while the transportable tablespace is a little more cumbersome.
Also - transportable tablespaces require jumping between an Oracle process that loads the data into
the tablespace and a system process that extracts the metadata, moves the datafiles, loads,
connects on the other side, etc. In a highly secure shop - this requires much more setup and
permissions, and is a lot less flexible than a db-link. So - the ability to get things done quickly
without involving other groups (network, system) gets a few points as well.
Oracle obviously has a ton of features, and every customer uses the ones that work for them.
Sometimes I am just amazed at how resistant Oracle is to adding features even if they are only
useful to a small segmment of the user base.
Examples that come to mind -
(1) the ability to rename a tablespace - a seemingly "nothing-to-it" feature that for many long
years wasn't available.
(2) the lack for all these years of an "IsNumeric" builtin function (Googling for it still brings
up Tom Kyte's you-code-it-yourself workaround). All kinds of stuff that exists in SQL Server - that
brings joy to no end to the Microsoft crowd - even though theirs, ultimately, is an inferior
product.
In any event - thanks for your feedback !
Simply agree...
November 18, 2009 - 4pm Central time zone
Reviewer: Dana from Phoenix, AZ USA
This very much sounds like a presentation given several years ago by an esteemed member of the db
community.
He owned a consulting company where he solemnly agreed with everything the customer wanted.
Need 2X the CPU... Great!
Need 10X the disk... Even better!!!
Need 100X the memory... That'll do it!!!
Sadly for the premise, it was completely with tongue firmly in cheek. But talk about funny!
|