Question and Answer

Tom Kyte

Thanks for the question, Vinothkumar.

Asked: February 23, 2009 - 1:46 am UTC

Last updated: February 23, 2009 - 3:15 am UTC

Version: 10.2.0

Viewed 1000+ times

You Asked

Hi Tom,

First of all, I would like to thank you for this website.

My question:
We have our own ETL component (built on Oracle Database and multithreaded Java), which is now running on our production server. It consumes 100% CPU (full CPU utilization) during the ETL run.

Though it is a dedicated server for our application, will there be any harm from this level of usage?

In addition to that:

If the data volume (the amount of data to be processed by the ETL) increases in the near future, will there be any harm?
Will 100% CPU utilization simply mean the run takes longer, or do we need to add CPU?

Thanks.




and Tom said...

Well, the goal is to fully utilize the hardware - however, if you are at 100%, one of two things is likely true:

a) you are hitting the magic point - you are maximizing the resources fully, you are just at the peak of the machine. You cannot add any more workload, but you are charging ahead full steam - you are not waiting for the cpu, you are just using all of it.

b) you have already exceeded the capacity of the machine with your slow by slow java ETL procedural process (I'm guessing - when I see "multithreaded java concepts", all I can think is that we have the most procedural slow by slow approach ever designed at play - happens all of the time). Your application is waiting for CPU - to get 1 cpu second it might have to wait 2, 3, 4 or more elapsed seconds. You have swamped the machine and are actually running much slower than you would be if you backed off on the number of threads competing for cpu.


How do you tell? Well, through instrumentation, typically. I can tell you from the database side where we are spending our time and what we are waiting on (via tracing), but unless you instrumented your code likewise, it is unlikely you'll be able to really measure this (this is a question you should have been asking before you wrote a line of code - how are we going to instrument our code so we can see where we spend our time and what we wait on...)
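From the database side, one way to get that picture is extended SQL trace. A minimal sketch (the 'ETL' client identifier is just an assumption for illustration - it would be set from the Java side with dbms_session.set_identifier, and the resulting trace files summarized with tkprof):

begin
    -- enable extended SQL trace, with wait events, for every session tagged 'ETL'
    dbms_monitor.client_id_trace_enable( client_id => 'ETL',
                                         waits     => TRUE,
                                         binds     => FALSE );
end;
/

-- ... run the ETL, then turn tracing back off ...
begin
    dbms_monitor.client_id_trace_disable( client_id => 'ETL' );
end;
/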


I would guess that in this case, with a monolithic multithreaded java application, most of the time (most of the cpu time) is spent OUTSIDE of the database, in your code, and we'll not be able to measure that nicely. It is highly likely you have already exceeded the machine's capacity - you should back off on the number of threads until you start to see at least a tiny bit of CPU free for some duration of time - that is the maximum degree of 'threads' you can execute.

And you should consider not doing things procedurally in the future - most of your ETL should just be T - transform. That is, USE SQL to transform data in the database; do not extract it, do not reload it - just transform it.
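For example - just a sketch with made-up STAGE_ORDERS, FX_RATES and TARGET_FACT tables (the column names and transformation rules are assumptions) - an entire transformation step can often be a single statement:

-- one set-based transform: read the staged rows once, apply the
-- transformation rules in the SELECT, load the result directly
insert /*+ append */ into target_fact ( cust_id, order_dt, amount_usd )
select s.cust_id,
       trunc( s.order_ts ),                 -- derive the business date
       s.amount * nvl( r.usd_rate, 1 )      -- currency conversion in SQL
  from stage_orders s
  left join fx_rates r
    on r.ccy = s.ccy
   and r.rate_dt = trunc( s.order_ts );

commit;

No rows leave the database, no rows come back in - the transform is the statement.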


Comments

Vinothkumar Srinivasan, February 23, 2009 - 2:42 am UTC

Hi Tom,

Thanks for your valuable response.

"And you should consider not doing things procedurally in the future, most of your ETL should just be T - transform, that is, USE SQL to transform data in the database, do not extract it, do not reload it - just transform it"

- We are using SQL queries for the (T)ransformation (90%).

- The remaining 10% is Java code, for logic that was not possible to write in a single SQL statement, plus writing SQL result sets (through the JDBC layer) to flat files for further processing.

We run the above two tasks in parallel Java threads (max 20); during this time the CPU reaches its peak (100%).

NOTE: We are not I/O bound here, since we write into a buffer and then to disk (instead of direct I/O to disk).

Any further comments on this?




Tom Kyte
February 23, 2009 - 3:15 am UTC

... - We are using SQL queries for the (T)ransformation (90%). ...

ah, but you have a plural there :) Do you have tons of tiny queries working on small bits of data, or "big queries" that do a lot of work in a single statement?


...
- The remaining 10% is Java code, for logic that was not possible to write in a single SQL statement, plus writing SQL result sets (through the JDBC layer) to flat files for further processing.

...

that - well, that is just wrong. Flat files????? oh boy, oh boy. Probably there is nothing you couldn't have done in a single SQL statement - and even if you could not, there is definitely nothing you could not have done more efficiently in plsql, in a TABLE.
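For instance (a sketch only - the stage_extract scratch table and the staged columns are assumptions), the intermediate result set could be parked in a global temporary table and processed with more SQL, instead of round-tripping through JDBC and flat files:

-- one-time setup: a scratch table whose rows are private to the loading transaction
create global temporary table stage_extract
( cust_id  number,
  order_dt date,
  amount   number )
on commit delete rows;

-- instead of fetching the result set over JDBC and writing a flat file,
-- keep the rows in the database...
insert into stage_extract ( cust_id, order_dt, amount )
select cust_id, trunc( order_ts ), amount
  from stage_orders
 where status = 'NEW';

-- ...and the "further processing" becomes more SQL against stage_extract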


...We run the above two tasks in parallel Java threads (max 20); during this time the CPU reaches its peak (100%).
.....

so, probably, this 10% of the code is using 100% of the cpu.... consider that.



.... NOTE: We are not I/O bound here, since we write into a buffer and then to disk (instead of direct I/O to disk).
.....


hmmm, really? What buffer? Just because you buffer does not mean IO is not a choke point - buffers overflow. And you probably mean the database cache, but then you are generating redo (gobs of it, row by row from that flat file) and that is IO too - and undo, that is IO too.

You might not be IO bound because you are so cpu bound.... but a buffer does not imply or assure "no IO boundness".
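As a rough check (a sketch - run from the loading session itself), the session statistics show how much redo and undo that row-by-row load is really generating:

-- how much redo / undo has this session generated so far?
select n.name, m.value
  from v$mystat m
  join v$statname n on n.statistic# = m.statistic#
 where n.name in ( 'redo size', 'undo change vector size' );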