Question and Answer

Connor McDonald

Thanks for the question.

Asked: February 13, 2012 - 11:13 am UTC

Last updated: April 30, 2024 - 4:54 am UTC

Version: 11.2.0.3

You Asked

Hi Tom,

We run a multi-user OLTP system on an Exadata Quarter Rack (Linux version). System response time is consistent and the system is performing well, yet we observe run queues with CPU utilization at 70% on both nodes. What could be the reason?
My understanding has always been that run queues form only if system utilization exceeds 100%. But in this case the CPU on both nodes is 65% utilized and 30% is free.
But maybe my understanding is flawed.
Could you please explain the concept of CPU utilization and run queues vis-à-vis CPU count, especially in an OLTP workload?

and Connor said...

CPU utilizations are a measure of a cpu being either

a) 100% busy
b) 0% busy

over some period of time. When you see a cpu utilization of, say, 50%, that means 50% of the time the CPU was 100% busy and 50% of the time it was idle.

A cpu can never really be "50%" busy. It is just a measure, a fuzzy one at that. (and it gets really fuzzy if you start throwing cpu-threads into the mix!)


In short, if a system says it was:

50% utilized, that means a process wanting to be on the CPU will have a 50/50 chance of getting scheduled right away (and hence a 50/50 chance of being made to wait!). You have a 1 in 2 chance of getting on the CPU, a 1 in 2 chance of being made to wait for the CPU.


66% utilized, you have a 1 in 3 chance of running right away, you have a 2 in 3 chance of being made to WAIT for the cpu

80% busy, you have a 1 in 5 chance of running right away, you have a 4 in 5 chance of waiting.

90% busy, 1 in 10 to run straight off, 9 in 10 you'll wait.
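The 1-in-2, 2-in-3 arithmetic above holds for a single CPU when requests arrive at random. A minimal Python sketch, assuming Poisson arrivals and exponentially distributed CPU demand on one CPU (a textbook M/M/1 queue, not a model of any real Oracle workload), counts how many arrivals find the CPU already busy. The function name fraction_that_wait and its parameters are hypothetical, chosen only for illustration.

```python
import random


def fraction_that_wait(utilization: float, arrivals: int = 200_000, seed: int = 42) -> float:
    """Estimate the share of requests that find a single CPU already busy.

    Assumptions (illustrative only): Poisson arrivals, exponentially distributed
    CPU demand with a mean of 1 time unit, first-come-first-served scheduling.
    With a mean service time of 1, the arrival rate equals the utilization.
    """
    rng = random.Random(seed)
    wait = 0.0   # queueing delay faced by the current request
    waited = 0   # how many requests had to wait at all
    for _ in range(arrivals):
        if wait > 0.0:
            waited += 1
        service = rng.expovariate(1.0)          # CPU time this request needs
        gap = rng.expovariate(utilization)      # time until the next request arrives
        # Lindley recursion: the next request waits for whatever work is left over
        wait = max(0.0, wait + service - gap)
    return waited / arrivals


if __name__ == "__main__":
    for u in (0.50, 0.66, 0.80, 0.90):
        print(f"{u:.0%} busy -> {fraction_that_wait(u):.2f} of requests had to wait")
```

Each printed fraction should come out close to the utilization passed in. That match depends on the arrivals being random; with perfectly regular arrivals, fewer requests would have to wait.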


60-65% is what you typically might want to aim for in order to get steady, reliable OLTP performance (close to 90-100% for warehousing - where you control the workload very carefully).

and don't forget - when the OS says "we were 50% busy over the last second", that could mean "we were 100% utilized with run queues populated for 1/2 of a second and then idle with no run queues"

everything is averaged out...
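To make the averaging point concrete, here is a small Python sketch over a hypothetical one-second window sampled every 10 ms on a single CPU: saturated with three runnable processes for the first half, idle for the second. The sample data and the summarize helper are made up for illustration; real OS tools sample and report differently.

```python
def summarize(samples):
    """Average busy fraction and worst-case run queue for a list of samples.

    Each sample is (cpu_busy, runnable) for a single CPU: one runnable process
    can be on the CPU, any others are waiting in the run queue.
    """
    utilization = sum(1 for busy, _ in samples if busy) / len(samples)
    peak_waiting = max(max(0, runnable - 1) for _, runnable in samples)
    return utilization, peak_waiting


# Hypothetical one-second window sampled every 10 ms:
# saturated with 3 runnable processes for the first half, idle for the second.
SAMPLES = [(True, 3)] * 50 + [(False, 0)] * 50

util, peak = summarize(SAMPLES)
print(f"reported utilization: {util:.0%}")               # 50%
print(f"processes waiting at the worst moment: {peak}")  # 2
```

The one-second average reports a comfortable 50%, even though for half of that second two processes were queued behind the one on the CPU.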


In addition - many times the OS would prefer you to run on the last cpu you used as much of your working set might already be loaded up on the L1/L2 caches. It is expensive to get context switched on and off a single cpu. It is hugely expensive to have to move from cpu to cpu as well.


Update 2024: In this respect, "CPU" in the answer above can be used (relatively) interchangeably with a single core. Each core is either 0% or 100% busy.


Comments

CPU capacity planning

A User, October 16, 2012 - 6:32 am UTC

Dear Tom,

Our aim is to forecast CPU utilization.

Today we are at 80% CPU utilization (average).
As volume grows, we want to estimate how high CPU utilization would go.

I know you would say: benchmark it.
But without that, would it be possible to relate the current utilization to the current load on the system
and then extrapolate to reach some figures, such as:
x load ... has 80% CPU utilized
y load (increased load) ... would have xx% CPU utilized (where xx > 80)

What would be the best starting point (AWR, OS utilities, etc.)?

regards


Tom Kyte
October 16, 2012 - 9:59 am UTC

if you are at 80% utilization on an OLTP system, you are already past maximum CPU usage (it should be in the 60's to very low 70's at best)


you cannot add more load to this system, you need more memory, cpu and disk IO (you typically cannot add more of one resource without adding more of all resources unless you oversized something already)

you are right on the knee of the hockey stick of response times


R|                   x 
e|                  x
s|                 x
p|                x 
 |               x
t|              x
i|             x
m|xxxxxxxxxxxx
e|------------------------------
 10 20.. 70 80  90  100
    cpu utilization
            ^
            ^
you are right here



increase your workload and bam, things will fall apart very quickly.
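The shape of that curve can be approximated with the textbook single-server (M/M/1) result, where response time is the service time divided by (1 - U). The Python sketch below is a rough model under that assumption, not a sizing tool: it prints the multiplier at several utilizations and applies the naive linear projection asked about in the capacity planning question (CPU cost per unit of work assumed constant, so utilization scales with load). Both helper names are hypothetical.

```python
def projected_utilization(current: float, load_growth: float) -> float:
    """Naive linear projection: assumes CPU cost per unit of work stays constant."""
    return current * load_growth


def response_time_multiplier(utilization: float) -> float:
    """M/M/1 approximation: response time divided by service time = 1 / (1 - U)."""
    if utilization >= 1.0:
        return float("inf")  # demand exceeds capacity, so the queue grows without bound
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    for u in (0.10, 0.50, 0.65, 0.70, 0.80, 0.90, 0.95):
        print(f"{u:.0%} busy -> response time ~{response_time_multiplier(u):.1f}x service time")

    # Hypothetical example: 20% more work on a box that averages 80% today.
    u_next = projected_utilization(0.80, 1.20)
    print(f"projected {u_next:.0%} busy -> ~{response_time_multiplier(u_next):.1f}x service time")
```

In this model, going from 50% to 80% busy takes response time from 2x to 5x the service time, and a projected 96% pushes it to about 25x, which is the hockey-stick shape in the sketch above. Treat the numbers as illustrative only.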

A Reader, October 17, 2012 - 7:27 am UTC

Dear Tom,
thanks for your reply.


I have seen you say "use it or lose it"... Do you mean that for non-OLTP systems we can run utilization closer to full?

regarding
...if you are at 80% utilization on an OLTP system, you are already past maximum CPU usage (it should be in the 60's to very low 70's at best)

What is the rationale behind these figures, the 60s and 70s?

regards
Tom Kyte
October 17, 2012 - 3:20 pm UTC

in a warehouse, where you control the load on the system exactly and precisely (like when you are doing a load, you are in total control) you can get higher - near 90/100%.

In OLTP where things arrive totally randomly, chaotically - you have to leave some breathing room or the machine will appear to sporadically freeze.


The rationale for the 60s is the original answer (what are your chances of getting scheduled) and the fact that in OLTP you want your request satisfied right away, without waiting. Once you start waiting, response times become erratic, the system starts spending more time queueing processes than running them, and queueing theory dictates that things will break down.

It is just like the lines at a toll booth: when people arrive in clumps, traffic jams result. If those people had arrived in a nice, even pattern, they'd have gotten through the toll faster.
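The toll-booth effect can be sketched in Python by feeding the same average load into one booth twice: once with evenly spaced arrivals and once with random (Poisson) arrivals, which naturally come in clumps. Every car is assumed to take exactly one time unit at the booth; the function name and parameters are hypothetical. At 80% utilization the evenly spaced cars never wait, while the random ones should average close to 2 service times of waiting, in line with the textbook M/D/1 formula U / (2(1 - U)).

```python
import random


def mean_wait(utilization: float, clumped: bool, arrivals: int = 200_000, seed: int = 7) -> float:
    """Average time a car queues at one toll booth, in units of service time.

    Assumptions (illustrative only): every car needs exactly 1 time unit at the
    booth; arrivals are either perfectly evenly spaced or random (Poisson).
    """
    rng = random.Random(seed)
    service = 1.0
    mean_gap = service / utilization  # same average arrival rate in both cases
    wait = 0.0
    total_wait = 0.0
    for _ in range(arrivals):
        total_wait += wait
        gap = rng.expovariate(1.0 / mean_gap) if clumped else mean_gap
        wait = max(0.0, wait + service - gap)  # Lindley recursion
    return total_wait / arrivals


if __name__ == "__main__":
    print(f"evenly spaced:  {mean_wait(0.8, clumped=False):.2f}")
    print(f"random/clumped: {mean_wait(0.8, clumped=True):.2f}")
```

Same booth, same average traffic: only the arrival pattern differs, and that alone turns zero queueing into a persistent line.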

Does the 60% utilization theory still hold true for multi-core CPUs?

Scott, April 24, 2024 - 6:36 am UTC

Say a CPU has 10 cores. If 9 out of 10 cores are occupied, the CPU utilization is 90%. Can another process still use the last idle core?
Connor McDonald
April 30, 2024 - 4:54 am UTC

Regarding Tom's original answer, you can reasonably use CPU and *single* core interchangeably.

Hence a core is either running at 0 or 100 percent. If half your cores are running flat out, you'd likely see your "CPU" reporting 50% utilisation.
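For a multi-core box, one way to reason about the 10-core question is the textbook Erlang C formula for an M/M/c queue. It assumes random arrivals, exponential CPU demand, and that any waiting process can immediately use any idle core, which ignores the cache-affinity effects mentioned earlier. Under those assumptions the chance of having to wait at a given average utilization is lower than the single-CPU figures in the original answer (about 0.67 for 10 cores at 90%, versus 0.9 for one CPU), but it still climbs toward 1 as utilization approaches 100%. A minimal Python sketch; the function name is hypothetical.

```python
def erlang_c(cores: int, utilization: float) -> float:
    """Probability that an arriving request must queue in an M/M/c system.

    cores: number of identical servers (assumed here: one per core).
    utilization: average per-core busy fraction, 0 <= utilization < 1.
    """
    if cores < 1 or not 0.0 <= utilization < 1.0:
        raise ValueError("need cores >= 1 and 0 <= utilization < 1")
    offered = cores * utilization  # offered load A, in Erlangs
    term = 1.0          # A**k / k!, starting at k = 0
    partial_sum = 0.0   # sum of A**k / k! for k = 0 .. cores - 1
    for k in range(cores):
        partial_sum += term
        term *= offered / (k + 1)
    # term is now A**cores / cores!; multiplying by c / (c - A) equals dividing by (1 - utilization)
    tail = term / (1.0 - utilization)
    return tail / (partial_sum + tail)


if __name__ == "__main__":
    for cores in (1, 2, 10):
        print(f"{cores:>2} cores at 90% -> P(wait) = {erlang_c(cores, 0.9):.2f}")
```

With one core the formula reduces to P(wait) = utilization, which is exactly the 1-in-2 / 9-in-10 reasoning in the original answer.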

For me, I'm generally more concerned with "waiting for CPU" as being the true measure of whether a machine is under stress. If that's non-zero, then no matter what your CPU utilisation is reporting, there is work that could *not* make it onto the CPU.
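On Linux, one rough way to look at "waiting for CPU" from the OS side is to compare the number of runnable tasks with the number of CPUs. The sketch below assumes the procs_running field of /proc/stat (tasks running or runnable at that instant, including the reader itself) and os.cpu_count(), which counts logical CPUs, so hardware threads inflate it. A single snapshot is noisy; vmstat prints a similar runnable count in its r column at each sampling interval. The helper names are hypothetical, and this is not an Oracle-specific metric.

```python
import os


def runnable_from_proc_stat(text: str) -> int:
    """Extract 'procs_running' (running + runnable tasks) from /proc/stat content."""
    for line in text.splitlines():
        if line.startswith("procs_running"):
            return int(line.split()[1])
    raise ValueError("procs_running not found")


def waiting_for_cpu(runnable: int, cpus: int) -> int:
    """Tasks that could not be on a CPU at the sample instant (rough estimate)."""
    return max(0, runnable - cpus)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        runnable = runnable_from_proc_stat(f.read())
    cpus = os.cpu_count() or 1
    print(f"runnable={runnable} cpus={cpus} waiting~{waiting_for_cpu(runnable, cpus)}")
```

Sampled repeatedly, any sustained non-zero value from waiting_for_cpu is the situation described above: work that could not get onto a CPU, whatever the utilization figure says.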

(Don't even get me started on modern machines which now have performance cores versus efficiency cores and the like).