Thanks for the question, Vinothkumar.
Asked: February 23, 2009 - 1:46 am UTC
Last updated: February 23, 2009 - 3:15 am UTC
Version: 10.2.0
Viewed 1000+ times
You Asked
Hi Tom,
First of all, I would like to give many thanks for this website.
My question:
We have our own ETL component (designed around the Oracle database and multithreaded Java), which is now running on our production server. It consumes 100% CPU during an ETL run (full CPU utilization).
Although the server is dedicated to our application, is there any harm in this usage?
In addition to that,
if the data volume increases (the amount of data to be processed by the ETL) in the near future, will there be any harm?
Will the run at 100% CPU utilization simply take longer, or do we need to add CPUs?
Thanks.
and Tom said...
Well, the goal is to fully utilize the hardware - however, if you are at 100% one of two things is likely true:
a) you are hitting the magic point, you are maximizing the resources fully, you are just at the peak of the machine - you cannot add any more workload, but you are charging ahead full steam - you are not waiting for the CPU, you are just using all of it.
b) you have already exceeded the capacity of the machine with your slow-by-slow Java ETL procedural process (I'm guessing; when I see "multithreaded java concepts", all I can think is that we have the most procedural slow-by-slow approach ever designed at play - happens all of the time). Your application is waiting for CPU - to get 1 CPU second it might have to wait 2, 3, 4 or more elapsed seconds - you have swamped the machine and are actually running much slower than you would be if you backed off on the number of threads competing for CPU.
How do you tell? Well, through instrumentation, typically. I can tell you from the database side where we are spending our time and what we are waiting on (via tracing), but unless you instrumented your code likewise, it is unlikely you'll be able to really measure this (this is a question you should have been asking before you wrote a line of code - how are we going to instrument our code so we can see where we spend our time and what we wait on...).
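One way to instrument the Java side along those lines, sketched below (this is not from the original application; the class name, loop, and thresholds are made up for illustration): `java.lang.management.ThreadMXBean` reports per-thread CPU time, so comparing it against elapsed wall-clock time for a CPU-bound unit of work shows whether the thread is actually running or queuing for a processor. If CPU time is much smaller than elapsed time for work that should be pure computation, the machine is oversubscribed - case (b) above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuWaitProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart  = mx.getCurrentThreadCpuTime();

        // Stand-in for one unit of ETL work: a purely CPU-bound loop.
        long sink = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sink += i;
        }

        long cpuNs  = mx.getCurrentThreadCpuTime() - cpuStart;
        long wallNs = System.nanoTime() - wallStart;

        // Near 1.0 means the thread had a CPU whenever it wanted one;
        // well below 1.0 means it sat in the run queue waiting for CPU.
        System.out.printf("cpu/elapsed ratio: %.2f (sink=%d)%n",
                (double) cpuNs / wallNs, sink);
    }
}
```

Wrapping each worker thread's unit of work with a probe like this, and logging the ratio, would show directly whether adding threads is adding throughput or just adding queuing.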
I would guess that in this case, with a monolithic multithreaded Java application, most of the time (most of the CPU time) is spent OUTSIDE of the database, in your code, and we'll not be able to measure that nicely. It is highly likely you have already exceeded the machine's capacity - you should back off on the number of threads until you start to see at least a tiny bit of CPU free for some duration of time - that is the maximum degree of 'threads' you can execute.
And you should consider not doing things procedurally in the future, most of your ETL should just be T - transform, that is, USE SQL to transform data in the database, do not extract it, do not reload it - just transform it.
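As a hypothetical illustration of "just transform it" (all table and column names here are invented, not from the original system): instead of a Java thread reading rows, transforming them in memory, and writing them back, a single set-based statement does the whole T step inside the database.

```sql
-- Sketch with made-up names: load a fact table from a staging table
-- in one set-based statement instead of row-by-row Java processing.
insert /*+ append */ into sales_fact (sale_id, cust_id, sale_amt, sale_dt)
select s.sale_id,
       c.cust_id,
       s.amount * nvl(s.fx_rate, 1),   -- the "transform", done in SQL
       trunc(s.sale_ts)
  from stage_sales s
  join customers   c on c.src_key = s.cust_src_key;
```

One statement like this lets the database parallelize and optimize the work, with no data leaving and re-entering the server.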