Skip to Main Content

Breadcrumb

Dev Live Dev Intro

We are celebrating Developers at AskTOM. We welcome Developers of all levels of experience to join us at our FREE Developer Live events coming in August and September. Just click on the left to register today! If you are brand new to Database Technology, then we also have got you covered. Just click on the right for your comprehensive FREE training program to kick start your Oracle Database Development journey!

Question and Answer

Tom Kyte

Thanks for the question, fentu.

Asked: October 23, 2002 - 9:08 am UTC

Answered by: Tom Kyte - Last updated: August 08, 2019 - 2:27 am UTC

Category: Database - Version: 8.1.5

Viewed 100K+ times! This question is

You Asked

How can I reduce CPU utilization of Oracle database?

and we said...

shutdown abort

works well ;)

Are you at 100% utilization? If not, you haven't accomplished your job yet. You cannot put CPU in the bank and save it for later. So, if you are running with idle cycles you should be looking for more ways to use it.

You can do things like:

o Tune queries, use alternate approaches
o reduce the workload
o using MTS (aka shared server) which is just a way to reduce the workload
o identifying WHY you are using so much CPU (this is probably the most important
and workable way!!!) and then changing the processes that cause this

But then again, if you are at 99.99% utilization -- pat yourself on the back, you've done well.




and you rated our response

  (153 ratings)

Is this answer out of date? If it is, please let us know via a Review

Reviews

Excellent.......

October 24, 2002 - 12:40 am UTC

Reviewer: Hitesh from UK

Thanks a lot.....

February 04, 2003 - 9:34 pm UTC

Reviewer: Mike from TX

"
Are you at 100% utilization? If not, you haven't accomplished your job yet.
"

If i had Oracle application running and the CPU utilization was under 20%, what should i look into to raise the CPU utilization? And what could be for the low utilization.

Tom Kyte

Followup  

February 05, 2003 - 7:37 am UTC

Just might mean you bought 5x the machine you needed.

If you are doing a "batch operation" at that point in time, you may have neglected to take advantage of parallelism. Eg: instead of using a single process to load a million records, use 10 that each load 100,000 records. Instead of running a plsql procedure that loops over and processes a million records, run 10 copies of it -- each of which process some non-overlapping subset of the data...

If you are a simple transactional system -- where parallel operations don't make sense -- you just have 5x more machine then you need.

February 05, 2003 - 9:24 am UTC

Reviewer: Mike from TX

Tom, Thanks. I think I understood how the parallelism works.
I was looking for the diagnose methods for causes of "Low CPU Utilization". For example, use which v$_ or DBA_ views to find out the problems. Does it most likely waiting for the I/O causing the low CPU utilization?


Tom Kyte

Followup  

February 05, 2003 - 11:46 am UTC

it would be anything that causes your application to block and wait

IO is a biggy
enqueue waits (waiting on a lock) would be another

look at the big waits -- those are generally the things you are doing when you are not using the CPU

cpu under utilization case

February 05, 2003 - 1:01 pm UTC

Reviewer: Mike from TX

Thanks Tom.
I am doing a little test of the cpu utilization case on my laptop. I know the follow query will generate alot i/o:
select count(*) from t1, t2;
t1 is about 1/5 million and t2 is about 150000.

the cpu utilization is 60%. To find out what's the waits:

mburk@jnas> l
1 select event, sid, seq#,
2 wait_time,
3 seconds_in_wait,
4 /* state,
5 p1text, p1, p1raw,
6 p2text, p2, p2raw,
7 p3text, p3, p3raw
8 p1text || ' = ' || p1 parm1,
9 p2text || ' = ' || p2 parm2,
10 p3text || ' = ' || p3 parm3
11 */
12 decode( p1text, null, null,
13 p1text || ' = ' || p1 ) ||
14 decode( p2text, null, null,
15 ', ' || p2text || ' = ' || p2 ) ||
16 decode( p3text, null, null,
17 ', ' || p3text || ' = ' || p3 )
18 parameters
19 from v$session_wait
20 where event not in ( 'pmon timer', 'rdbms ipc message', 'smon timer',
21 'WMON goes to sleep',
22 'SQL*Net message from client' )
23* order by event, p1, p2
mburk@jnas>

mburk@jnas> @vsessw

Wed Feb 05
Session Waits

WAIT SECONDS
EVENT SID SEQ TIME IN WAIT PARAMETERS
----------------------------- ----- ------ ---- ------- -------------------------
null event 9 23689 1
11 83 -1
wakeup time manager 8 2719 20

***** the sid 9 is doing the query.
***** What's the "null event"?

Check the session stats:
mburk@jnas> l
1 select s.sid, s.value, n.name
2 from v$sesstat s, v$statname n
3 where n.statistic# = s.statistic#
4 and s.value <> 0
5 and s.sid like '&sid'
6 and n.name like '&name'
7 and s.statistic# like '&statistic'
8* order by sid, n.name
mburk@jnas>
mburk@jnas> @vsesstat
Enter value for sid: 9
old 5: and s.sid like '&sid'
new 5: and s.sid like '9'
Enter value for name: %
old 6: and n.name like '&name'
new 6: and n.name like '%'
Enter value for statistic: %
old 7: and s.statistic# like '&statistic'
new 7: and s.statistic# like '%'

Wed Feb 05 page 1
Session System Statistics - By Sid

SID VALUE NAME
---- ---------------- ----------------------------------------------------------
9 3 CPU used by this session
3 CPU used when call started
6,053 Cached Commit SCN referenced
4 Commit SCN cached
49 SQL*Net roundtrips to/from client
6,762 buffer is not pinned count
18 buffer is pinned count
3,990 bytes received via SQL*Net from client
3,813 bytes sent via SQL*Net to client
111 calls to get snapshot scn: kcmgss
1 change write time
4 cleanout - number of ktugct calls
4 cleanouts only - consistent read gets
35 cluster key scan block gets
35 cluster key scans
4 commit txn count during cleanout
6,938 consistent gets
146 consistent gets - examination
10 cursor authentications
4 db block changes
4 db block gets
4 dirty buffers inspected
8 enqueue releases
8 enqueue requests
89 execute count
4 free buffer inspected
6,651 free buffer requested
64 hot buffers moved to head of LRU
4 immediate (CR) block cleanout applications
52 index fetch by key
53 index scans kdiixs1
1 logons cumulative
1 logons current
6,705 no work - consistent read gets
39 opened cursors cumulative
1 opened cursors current
3 parse count (hard)
39 parse count (total)
5 parse time cpu
29 parse time elapsed
1,500,624 physical reads
1,493,974 physical reads direct
236 physical writes
236 physical writes direct
236 physical writes non checkpoint
6,395 prefetched blocks
1,044,465,908 process last non-idle time
1,537 recursive calls
3 recursive cpu usage

Wed Feb 05 page 2
Session System Statistics - By Sid

SID VALUE NAME
---- ---------------- ----------------------------------------------------------
9 4 redo entries
240 redo size
13 rows fetched via callback
1,044,465,908 session connect time
6,942 session logical reads
1,662,788 session pga memory
2,711,364 session pga memory max
1,225,904 session uga memory
2,040,880 session uga memory max
53 shared hash latch upgrades - no wait
1 sorts (disk)
45 sorts (memory)
479,387 sorts (rows)
41 table fetch by rowid
6,627 table scan blocks gotten
485,574 table scan rows gotten
2 table scans (long tables)
5 table scans (short tables)
47 user calls
38 workarea executions - optimal
1,105 workarea memory allocated

70 rows selected.

the results show session 9 using 3/100 seconds of CPU time.
The LIO=6,938 and PIO=1,500,624.

Is this supposed method to find what the casue of low CPU utilization?
Thanks


Tom Kyte

Followup  

February 05, 2003 - 1:32 pm UTC

whats the version


in order to find out waits for "something" -- eg "for that query" you need to mimick statspack.


start the process
o snapshot (copy) v$sesswait in another sessoin
o wait a bit, remember how long you waited.
o snapshot it again

then join the snaps together and compare the diffs. You are seeing the stats for the session from the begining of time for the session.

CPU time and other things will be meaningless averaged over that amount of time. but, if you snapshot, wait, snapshot and then find out you have 1.1 cpu seconds and the wait was say 2 seconds long, you know you were 50% on cpu for example....




CPU hungry

February 05, 2003 - 2:08 pm UTC

Reviewer: A reader

> But then again, if you are at 99.99% utilization -- pat
> yourself on the back,
> you've done well.

I wouldn't be patting anyone on the back. On the contrary, I'd be scratching my poor old bald head then.

If I'm at 99,99%, I'm on the edge. There has to be some "comfort zone".

That talk of running at 99.99% is purely theoretical stuff -- very cool not wasting your cycles.

In my experience though, I like running at 60-80%. We just can't always predict something is going to jump in and eat up our time slices.

Once -- I remember -- we had a memory leak in a snmp process and it began consuming cpu and memory. We had time to spot and fix the problem because we were not short of cpu. Had we been, probably we'd have had a hard time with our end-users.

I also like using cpu quotas like with profiles and resource manager. They work fine.

Tom Kyte

Followup  

February 05, 2003 - 2:43 pm UTC

comfort zone = "wasted cycles"

you cannot, repeat, cannot put CPU in the bank.

If you are constantly running at 90% -- what use -- what use -- is that other 10%

Can you save it?
Can you use it else where?

If you go for the "don't know what I need so I'll just double it" thats great (hardware vendors count on that -- almost as much as they count on no one really sizing systems -- just buy as much as we have money for).

If an snmp process was running away -- it has the ability to more then consume 100% of a CPU's resources on the machine. Having "spare" cpu buys you nothing. Doesn't even make sense. A runaway is a runaway -- it only takes one to completely overpower a CPU. So, if you want to buy 40% more CPUs (say a 10 cpu machine instead of 6) just in case you get 4 runaway processes -- more power to you.

Sort of like buying a stock race car for street driving. Lots of horsepower, never going to use it.


That last paragraph -- now that makes sense. The rest of it -- don't agree even a little. Waste of money -- that comfort zone does one thing (well, besides costing money) and that is "makes you feel better". If that is worth 40% extra cost -- great. Do you do the same for RAM? For disk? hardware vendors must *love* you. Sorry -- just really disagree here. I see people running around crazy cause the machine is "running at 90% plus". So what? It is *supposed* to. I want to load up a machine till it is running at capacity -- then, move onto next machine. (this is what grid computing for example is all about)

CPU Hungry -- the mission

February 05, 2003 - 8:16 pm UTC

Reviewer: Same guy as above

Sorry Tom, but I don't like the idea of a squeezed cpu. What if I need more power all of a sudden, and I can't scale ?

Ok, I can't save my cpu cycles in a bank. But I can keep my cyles from being used in case I need them. Not too many, not too little -- just enough.

I think you are exaggerating. I may be wrong, but it sounds like common sense to me.

I think it's a matter of choices and opinions, and above all -- experience. I'd never risk to be at 99%. My choice, my opinion, my experience. You have yours, okay.

BTW, this discussion is more suited for a sys admin forum than here.

BTW 2 -- nice discussing with you. I liked it here ;-)

Tom Kyte

Followup  

February 06, 2003 - 7:40 am UTC

why if I need more power all of a sudden..... then you didn't size, didn't think about what you needed to do

If you need more power all of a sudden -- you want to look at blade servers or some similar technology that lets you re-assign power.

Sure, you can buy boxes 50% bigger then you need "just in case", or you can move stuff around and get away with 1/3 the computing power.


Other "common sense" things I hear all of the time:

o rebuild indexes on a schedule. everone knows indexes need to be rebuilt
o you must separate tables from indexes by tablespace, it is only common sense
o segments should be in as few extents as possible -- one extent is best by far
o a cold backup is "better" then hot
o you should shutdown your database frequently
o if your query is not using your indexes, the optimizer did the wrong thing
o adding more CPU to your system will make it go faster, especially if it is 100% utilized <<<=== that one is really funny actually


Oh this list of conventional wisdom goes on and on and on and on.



Too each their own. Me, I will never ever sweat my production boxes running at or near capacity. Means I've done my job and my requisition for new hardware will be taken seriously as people know I use what I order.

Your approach is like ordering two entree's at a restaurant and eating just a little from each -- wasting the rest "just in case your eyes where smaller then your stomache" ;)

CPU hungry

February 05, 2003 - 8:58 pm UTC

Reviewer: another reader

If your application uses the data base constantly you can let your server run at the edge. But, this is not the reality. Processes don't run constantly. They run in bursts.

Let's say that here is an application that uses 100% of the cpu for 10min every day. This means that your server won't be able to respond to any other request in the mean time.
For web applications, 10 min is an inifite time (you get a timeout msg). So, this is not a good solution.


Tom Kyte

Followup  

February 06, 2003 - 7:45 am UTC

who runs in bursts? and it's all about peak usage.

So, you are going to double your costs for 10 minutes? or maybe fix that so it uses 50% of the cpu for 20 minutes? your choice.

It could depend on the type of workload

February 06, 2003 - 12:55 am UTC

Reviewer: Ramakrishna from India

Dear Tom,
You have said:
"Are you at 100% utilization? If not, you haven't accomplished your job yet."
I was going through the paper on 'Performance Management - Myths and Facts' by Cary Millsap where he goes on to explain that, if it is a batch-only environment, then 100% CPU is what one should aim for. However, if it is an interactive environment or a mixed environment, running at 100% could mean that users are actually experiencing excessive delays due to queueing. Is it right to say that, for an interactive or mixed system, one should start considering the purchase of more CPU when the existing system is constantly at, say 70 to 80% CPU utilization AND when the run queue is constantly going over 2? If we wait for CPU to be close to 100% in such a system, then perhaps response times will have started degrading rapidly by then. This means that it is better to always have some spare headroom on such systems, doesn't it?

regards
Ramakrishna

Tom Kyte

Followup  

February 06, 2003 - 7:57 am UTC

ok, 99.9999%

point is -- no need to be concerned because you are actually using your CPU -- it is something you actually want to do.

You cannot put it in the bank.

Use it or lose it.

but in the end -- your choice. some people will just not be comfortable at or near capacity I guess.

cpu cycles

February 06, 2003 - 7:56 am UTC

Reviewer: A reader

From Ahmed Alomari's Oracle8 & UNIX Performance Tuning published by Prentice Hall: </code> http://www.amazon.com/exec/obidos/tg/detail/-/0130187062/qid=1044534932/sr=8-1/ref=sr_8_1/103-3789535-0772632?v=glance&s=books&n=507846 <code>(this one is usually very well rated)

"User time percentages should range anywhere from 60% for applications that issue a large number of system calls to 95% for an application with a minimal number of system calls." (page 73)

That's what you'll find in virtually every book of this kind.

Tom Kyte

Followup  

February 06, 2003 - 8:53 am UTC

great -- go for it. I stand humbly corrected.

No wait -- that is talking about the ratio of user to system calls.

Doesn't seem to be talking about CPU utilization at all.

Back on my soapbox: USE IT OR LOSE IT.

There Ahmed is talking about User calls vs System calls. You know, like the output from top:


CPU states: 97.4% user, 2.3% system, 0.0% nice, 0.1% idle




CPU hungry

February 06, 2003 - 8:27 am UTC

Reviewer: Same guy

>who runs in bursts? and it's all about peak usage.

aha... now, we're speaking the same language. That didn't seem to be the point you were making ;)

The guys above and me (and literature) are talking of *mean time*, obviously.

You're talking of *peak* usage -- that's hardly measurable and very infrequent -- not good metrics. You say you keep your *peak* at 99.99% ?

comfort zone = *peak* minus *mean* <> wasted cycles.

I hope this settles the case.

CPU Usage

February 06, 2003 - 8:52 am UTC

Reviewer: Tony Andrews from London, UK

"Sort of like buying a stock race car for street driving. Lots of horsepower, never going to use it."

This is really not an area where I have any expertise ("shut up then!"), but I wonder whether an analogy with supermarket checkouts might be more apt?

If you run a supermarket with 10 manned checkouts, and find that on average only 6 are serving cusomers at any time (i.e. 60% utilisation), then you are wasting money - you only need 6 checkout staff.

On the other hand, if on average all 10 staff are busy all the time (100% utilisation) then either you have just the right number, or maybe you haven't got enough. It depends how long the queues are (and whether you end up losing customers as a result).

Now if you can monitor the waiting times of the customers, you can determine whether or not you need more checkouts. But if all you can monitor is the checkout utilisation (you can't see the queues), then at 100% utilisation you have no idea whether you have just enough checkouts, or whether you need twice as many.

Does this analogy work? Are the pessimists saying that all they can monitor is the utilisation, so they need the spare CPU to be sure? And is Tom saying that you should be monitoring the waits/queues - if the throughput is acceptable, you don't need more CPU, you have just the right amount?


Tom Kyte

Followup  

February 06, 2003 - 9:15 am UTC

Does this analogy work?

I liked it...

Are the pessimists saying that all they can monitor is
the utilisation, so they need the spare CPU to be sure? And is Tom saying that
you should be monitoring the waits/queues - if the throughput is acceptable, you
don't need more CPU, you have just the right amount?

That is what I'm saying. Thanks for saying is "more clearly"

It is like the cache hit ratio thingy.

someone: "Hey, my cache hit is 99.99%, pretty good eh?"
me: "oh, thats pretty bad -- that could indicate serious performance issues,
then again, maybe not -- but in any case, by itself it is a meaningless
number"




Thanks Tony, the analogy is great

February 06, 2003 - 11:20 pm UTC

Reviewer: A reader


February 07, 2003 - 9:54 pm UTC

Reviewer: Mike from TX

The quote from "Essential System Administration" by Aeleen Frisch, 2nd Edition, Published by O'Reily
Page 288
"High level of CPU usage are not a bad thing in themselves (quite the contrary, in fact, when they mean that system is accomplishing a lot of useful work). The mere presence of high CPU usage numbers means nothing in and itself. However, if you are tracking down a system performance problem, and you see such levels of CPU use consistently over a significant period of time, then you will have determined that a shortage of CPU cycle is one factor contributing to that problem (it may not be the total problem, however, we shall see)."

I agree with you on the CPU utilization. However if there is true on the CPU bottleneck, as Aeleen Frisch said "levels of CPU use consistently over a significant period of time". How can we verify in Oracle database perspective it is CPU resource lacking, ask the management for faster CPU or add more CPUs?


Tom Kyte

Followup  

February 08, 2003 - 9:04 am UTC

transaction throughput. hit counts. amount of work accomplished.

If the system uses 100% of the CPU to do 1tps and it uses 100% of the CPU to do 10tps and it uses 100% of the CPU to do little to no database work -- you have a problem.

CPU utilization is alot like cache hit ratios (except of course people want low cpu like golf scores and high cache hit like basketball scors).

"hey, I've got 99.9% cache hit -- pretty good eh".

Much like Aeleen Frisch pointed out for CPU, 99.9% cache hit could be an indication of serious performance problems, or it might mean nothing or it could mean things are going great. you cannot tell from that one number.

Same with CPU -- as others have pointed out here -- you need to look at things like your run queue, the service times, the backlog. was that 100% really trying to be 200% or was that 100% really 100%?

it is just a number that has to be taken into consideration along with every other number and by itself it not very meaningful.



what's the TPS?

February 08, 2003 - 10:07 am UTC

Reviewer: A reader


Tom Kyte

Followup  

February 08, 2003 - 10:33 am UTC

transactions per second.

tps = total calls / total elapsed in the tkprof?

February 09, 2003 - 11:25 am UTC

Reviewer: A reader


Tom Kyte

Followup  

February 09, 2003 - 3:25 pm UTC


No, TPS = transactions per second as reported by statspack for example. Number of commits/rollbacks in a period of time divided by period of time.

February 09, 2003 - 4:16 pm UTC

Reviewer: Mike from TX

So, for a report server we can not use tps to find out the over loaded CPU. What's should we look for?
Thanks alot

Tom Kyte

Followup  

February 09, 2003 - 6:45 pm UTC

ok, executes / second then. the load profile portion of a statspack report would be useful -- right from the top....




CPU review

February 11, 2003 - 4:39 am UTC

Reviewer: Zoran Martic from Dublin, Ireland

If you have CPU utilization at peak time that is >90% in user/kernel space
If you have more or equal active processes then the number of CPU's
YOU NEED TO BE PROUD OF THAT only if:
you tuned your SQL to use as lees as possible CPU (for example to read 3 blocks instead of 10 blocks from the buffer cache)

The second myth is to have 20% backup CPU for who know what:
It is OK if the application is real-time - that means you have bound latency (like I have 10ms for every business transaction for charging mobile phones in real-time).

But usually we are talking about standard OLTP applications.
If you have 20 active processes at peak time and only 10 CPU's you will have to say 100% larger latency.
That means if you spend 10ms on your SELECT why do you think that the big problem is to spend 20ms and have 100% CPU utilization.

You need to be very very happy if you tuned all your SQL and your instance and you are CPU bound.

I DO NOT UNDERSTAND THIS STATEMENT
"I want to leave 20% CPU for something that could happened"
Does this mean that you want to limit the number of processes at peak time to be less or equal then the number of CPU's. Or you want to have the botleneck on I/O or something else and to leave 20% idle and you have more processes then CPU's.
THIS IS NOT TOO HARD TO ACHIEVE. THERE ARE TWO METHODS:
1. Use more CPU's then you have processes (great idea if you have enough money, I think IBM, Sun or Intel will be very helpful to achieve this)
2. Just build stupid application with stupid non-optimized SQL's and you will be bounded on I/O or enqueue or something else (that means you can achieve 50% idle with only 8 processes on 8 CPU machine). This will be great because many good consultants and tuning experts will thank you for this, it will be more work for smart people. Who wants everything to be optimized.

Tom, I supported you in idea that 100% CPU utilization at peak time and that is not that much in wait I/O is great (because I/O operations are more time expensive then CPU operations you in any case need to perform after reading/writing to disk).
This is OK for I think most (95%) applications and not for real-time maxtime latency bounded or fixed-time latency bounded (like video streaming) applications, but this is another story.

Regards,
Zoran





cpu vs processes

May 27, 2003 - 8:36 pm UTC

Reviewer: Raja from Chennai, India

Tom, is there any connection between number of concurrent users and number of cpus? The reason i ask is that we are planning for new servers and we want to decide how many cpus the servers should have? Following are the specifications:

(1) it is a oltp and we anticipate about 300 users to start with and may be 100 to 200 concurrent sessions.
(2) The server will have one instance supporting one database.
(3) We are thinking about RAM of 1G. Initial SGA size will be around 400MB. SGA_MAX_SIZE will be 500MB.
(4) It will be dedicated server configuration. In future we may go for MTS if number of users increase but not now as we think shared server is not necessary now.
(5) Oracle version 9.2.0 on Sun solaris 5.8.
(6) We are not going to run any parallel dml/query on this instance.
(7) the database will be in ARCHIVELOG as we don't run any of our production databases in NOARCHIVELOG.

Our second server will support a reporting database again with Oracle 9.2.0 on Sun solaris. We will have read only snapshots on this server that will be refreshed every evening from our prod oltp database. We are planning to use parallel query feature on this server. There will be a few reports generated every evening. How many cpus and RAM we should go for on this reporting server. We may have concurrent reports run on this server and at present we don't know how many of them.

I hope I have provided sufficient info to you to make your valuable suggestion. Thanks so much for your time. I have a week to make a decision on this. If you are too busy now, I would appreciate if you please try to answer later. Thanks again.

Tom Kyte

Followup  

May 28, 2003 - 6:36 am UTC

1) insufficient data. 100 to 200 concurrently ACTIVE sessions (hitting the enter key simultaneously). Improbably with only 300 users. Probably more like 10-20 maximum active sessions. that is key -- ACTIVE.

2) perfect

3) not relevant, no relationship between number of users and SGA size can be drawn.

4) perfect

5) great

6) that derives from the "we are oltp"

7) perfect

You should use data guard instead of replication for the reporting instance. You'll get a reporting site, failover site, a switchover site, a site you can offload backups too -- all with the least impact on the production site and the admins.


You'll need to benchmark in order to size your system. If you ask 1,000 people for the definition of a "transaction", you'll probably get about 5,000 different responses. Are your reports cpu intensives (lots of OLAP/analytics/sorting) or disk instensive. Do your oltp users do lots of cpu intensive things or not.


You need to benchmark, that will be the only accurate way to size. Every day mhz goes up, disk speed change, rules of thumb that where inaccurate a year ago don't even apply today.


I can say I run over 1,000 users doing "oltpish" applications on a 7 year old 4 cpu sparc. It is called "asktom". but the characteristics of that application are just totally different from anything you would build.

High Throughput

May 28, 2003 - 5:00 pm UTC

Reviewer: Sean Bu from Atlanta

Tom,
I have been brought to give recommendations on the OLTP of a web application, which is a 3-tier configuration, 9iAS R2 with the repository and a 9i R2 customer database. They want me to find the problems as well as to give the recommendations for the 100X more active users. The system has been in the production for more than a half-year. The following is the list items I think I shall zoom in, which are in the order of the importance:

1) Bind variables
2) Tune bad queries
3) Size PGA_SIZE_TARGET
4) Configure JDBC Connection Pooling
5) Use Raid 1+0
6) DB_WRITER_PROCESSES > 1


Do I miss some?

Thanks

Sean

Ps. Someone else are looking into other 9ias components, such as webcahce and oc4j clusters, so I did not put them in the list though they might be equally important.


Tom Kyte

Followup  

May 28, 2003 - 8:23 pm UTC

do have have problems? I mean, you cannot have a plan of attack without knowing what you are attacking.....

you need to have a set of goal posts, so you'll know when you cross them.

Thanks!

May 28, 2003 - 9:50 pm UTC

Reviewer: Raja from Chennai, India

Will consider dataguard for reporting server. Thanks for your explanation. It makes a lot of sense to me.

May 28, 2003 - 10:07 pm UTC

Reviewer: A reader

......
do have have problems? I mean, you cannot have a plan of attack without knowing
what you are attacking.....

you need to have a set of goal posts, so you'll know when you cross them.
......

What I tried to say was the list of "checking" of the health of a typical busy OLTP. Look at the checking list,Do I miss any important. Of cuase I will look into the report of statspack.

Thanks.

Sean

Tom Kyte

Followup  

May 29, 2003 - 7:36 am UTC

there is no such thing as a "typical system" but anyway, assuming they are not screaming "it is slow", this is somewhat subjective.

get a 15 minute stats pack during "peak" time.

look at the soft parse % and make sure it is above 99%.
look at the waits and see if they are "reasonable"
look at the top sql and see if they look "reasonable"

don't see where raid 10 comes into play necessarily.

May 28, 2003 - 11:13 pm UTC

Reviewer: A reader

When CPU is > 90%, generally, things "seem" sluggish. Even a simple 'ls -l' on a UNIX shell prompt takes a long time. Opening a telnet session takes long. Everything takes long.

So, how is this a good thing? I realize that CPU util, by itself, is a meaningless number, but all of us agree that it is, most likely, the precursor or symptom of some deeper lurking problem. How to find out what this problem is?

Tom Kyte

Followup  

May 29, 2003 - 7:43 am UTC

I don't see sluggish response from ls and such at 90% on my systems, but anyway...

if you are running at 90% continously that is not good.
if you peak at 90% from time to time that is ok.

how to find out what the "problem" is -- you analyze your system. find out what is taking 90% of the cpu -- is it 1,000 little things or one big thing. is it something you need to do or can you skip it all together (be surprised how often that is true)

May 29, 2003 - 10:17 am UTC

Reviewer: Sean from Atlanta

"
.....
don't see where raid 10 comes into play necessarily.
.....

they are current on raid 5, and i want raid 1+0 for better writing throughput for 100X current active sessions later.

Tom Kyte

Followup  

May 29, 2003 - 10:32 am UTC



and N times the disk storage needs. raid 5 is typically bought on the cheap. raid 10 is at the high end of the cost scale.

See
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:359617936136 <code>

but again -- you are taking the wrong approach.

Sure raid 10 is "faster" then raid 5 -- however -- do they have an identifiable problem that needs FIXING. If so, identify it and then come up with solutions.

You are trying to solve things before knowing what needs (if anything!) fixing.

minimum resource usage

October 08, 2003 - 11:32 am UTC

Reviewer: Pushparaj Arulappan from NJ, USA

Tom,

In a given environment, what is the minimum resource
consumption by an oracle session.

Assume my environment is HP-UX 11i and Oracle 9.2.0.3.
Given the database configuration, if I connect to a SQLPLUS
session and doing nothing, what will be the minimum resource (memory and etc) consumed by this SQLPLUS session.

For example,
In my database the SORT_AREA_SIZE parameter is set to 1MB
and the HASH_AREA_SIZE is set to 1 MB.

Does this mean that, the idle SQLPLUS session will
pre-occupy this 2 MB memory ?

Thanks
Pushparaj

Tom Kyte

Followup  

October 08, 2003 - 11:52 am UTC

no -- sort_area/hash_area sizes are the max dynamic memory that might be allocate IF you need it and then will be released (eg: you don't sort and hash at the same time so they would never get 2meg, just 1mg max)

it is a couple hundred k, eg:

ops$tkyte@ORA920LAP> select a.name, b.value
  2  from v$statname a, v$mystat b
  3  where a.statistic# = b.statistic#
  4  and lower(a.name) like '%' || lower('&1')||'%'
  5  /
old   4: and lower(a.name) like '%' || lower('&1')||'%'
new   4: and lower(a.name) like '%' || lower('pga')||'%'

NAME                                VALUE
------------------------------ ----------
session pga memory                 220508
session pga memory max             220508


that is 220k right after login 

Quite an Enlightening discussion

October 20, 2003 - 6:26 am UTC

Reviewer: H.S.Anand from New Delhi, India

Hi Tom,
The discussion has --"as always"-- really informative and has cleared quite a few misunderstandings I had about CPU utilizations!

Kudos to you.

thanks
Anand

Querying Perfstat Schema

October 22, 2003 - 7:37 pm UTC

Reviewer: Matt from Australia

I am planning to generate some SQL to query the perfstat schema to generate a summary report for a period, say 24 hours. I snapshot every 10 minutes. I was thinking I would generate at least the following:

* CPU used by this session for each 10 minute interval in the 24 hour range.
* Top 5 Wait Events for each 10 minute interval in the 24 hour range.

The idea is, to identify trends in resource usage over the period of monitoring.

Before I go any further with this, has anyone done anything like this already?

Tom, do you think that this can be done efficiently in a single query?

Best Regards



Tom Kyte

Followup  

October 22, 2003 - 7:46 pm UTC

I haven't looked at the schema close enough to tell -- but you will LOVE 10g, 100% assured of that (we've given the database a long term memory ;)

CPU and Server Types

October 22, 2003 - 8:56 pm UTC

Reviewer: Meyer from Baltimore, MD

Tom looking at page 270 of Effective Oracle by Design...and your servers.

Aria-Dev: Sun UltrEnterprise 450 server, 2UltraSparc-2 CPU's 400mhz, 1536MG of physical memory

Bosshog: Dell Poweredge 2650 server, 2 Xeon CPUs 2.8Ghz, 512k Cache, 6GB of physical Memory

I know they are not the same CPU's or platforms...but can you explain how Bosshog has 7 times the CPU speed and four times the ram of Aria-dev but transactions per second are 2,000 vs. 1,600 (not as far). I know these are not linear and it involves much more (and not sure of your specifics)...but this surprised me.

Thanks,
Meyer


Tom Kyte

Followup  

October 23, 2003 - 7:29 am UTC

you cannot really compare a 400mhz SPARC to a 2.8ghz CISC chip in the first place (apples and oranges with no real conversion functions).

they were just "2 machines". I wasn't trying to compare the two machines really -- but rather the difference between "don't bind" vs "bind" on a multi-cpu machine with multiple concurrent users.

the tests did insert/commit 50,000 times. There was lots of other waiting going on (log file syncs, etc).

also, even if you assume that the one machine had 7 times the CPU -- that does not mean anything in a purely insert intensive operation. there the bottleneck would not really be CPU.

CPU utilization

October 22, 2003 - 11:32 pm UTC

Reviewer: praveen from India

Excellent discussion...Hats off to Tom and thanks to Fentu whose question fired an excellent discussion!

"Long-Term Memory"

October 23, 2003 - 11:21 am UTC

Reviewer: Doug Wingate from New Orleans, LA, USA

You remark that Oracle Corp. has given Oracle Database 10g a long-term memory. Is this a coy reference to developments in connection to the X$Trace fixed view? After having learned of this new facility in Milsap & Holt's book, I'm very interested in whatever information you can provide about plans for X$Trace. You can speak openly with us; we'll consider it all off the record and just make attribution to "a senior administration official."

Tom Kyte

Followup  

October 23, 2003 - 1:18 pm UTC

it is this thing called ADDM, search for that on otn.

Method to know the current load on Oracle server.

January 29, 2004 - 4:48 pm UTC

Reviewer: Pat from US

Hi Tom,
In my reporting application users will create alot of SQL's on real time basis, which I store in Database table ( these sqls can return huge data of 100,000 + ) .Now based on the Oracle Server Load ( cpu utilization ) I would like to execute these SQL's. I do not want to execute these SQL's when already the Oracle server is under heavy load due to some other operations. e.g How do I get a callback mentioning that the server load ( cpu utilization ) is less that say less than 50% giving me a green signal to automatically execute some SQL's and stop when the load increases.

Tom Kyte

Followup  

January 30, 2004 - 8:04 am UTC

rather then you write tons of code to implement a resource manager, why don't you just use the builtin one?

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96521/part5.htm#435958 <code>



CPU costing

February 23, 2004 - 10:52 am UTC

Reviewer: Christo Kutrovsky from Ottawa, ON Canada

Since we are talking about CPU usage I have a small question.

Since 9i there's the concept of CPU costing. Could one expect to see somehow lower CPU usage due to more efficient use of CPU, or more CPU usage (and less IO for example) due to the system been aware of the spare CPU time available.

Also, what kind of queries would change plans if we had CPU costing enabled ?



Tom Kyte

Followup  

February 23, 2004 - 4:26 pm UTC

One could expect to see "both" depending on the circumstances

All kinds (every in short) of queries could change. You are changing the mathematical model that spits out the cost of a given plan. In effect every query plan is affected.

Follow up

February 23, 2004 - 4:51 pm UTC

Reviewer: Christo Kutrovsky from Ottawa, ON Canada

I was hoping that you would provide 1 or 2 examples of a query plan that flips due to enabling CPU costing.

Or if not examples, theritcally specific cases in which it will flip.



What is a blade server?

March 28, 2004 - 7:16 pm UTC

Reviewer: reader

I have been hearing about blade server but I don't know what it is and how is it different frpm a SMP? Thanks.

Tom Kyte

Followup  

March 28, 2004 - 7:24 pm UTC

google "blade server"
</code> http://search390.techtarget.com/sDefinition/0,,sid10_gci770169,00.html <code>


the difference is - a blade typically means SMP. SMP does not mean "blade". blades are interchangeable "pizza boxes" that fit into a rack. cheap enough to throw away when then break (like razor blades). easy to expand, easy to reconfigure, easy to manage -- all in a big cabinet.

Conventional wisdom? Who me?

April 13, 2004 - 12:12 pm UTC

Reviewer: Ross from London, United Kingdom

Tom,
I've thoroughly enjoyed the discussion above, but as a relatively inexperienced DBA the following comments you made on conventional Oracle wisdom kinda blew me out of the water with regard to pretty much everything I've learned so far:

o rebuild indexes on a schedule. everyone knows indexes need to be rebuilt - (Uh Oh!)
o you must separate tables from indexes by tablespace, it is only common sense - (Oh No!)
o you should shutdown your database frequently - (Yup, that's me!)
o if your query is not using your indexes, the optimizer did the wrong thing - (It isn't?)

Back to the drawing board, I guess ;)

Thanks,
Ross


Cpu time

May 03, 2004 - 1:18 pm UTC

Reviewer: atul from India

Hi,

When i see cpu utilzation of my-database,
Everytime i find same process which is taking high cpu.
Then i found out SQL behind that using:

++++++++++++++++++++++++++++++++++++++++++++++++++

select a.sid, a.last_call_et ,b.sql_text
from v$session a
,v$sqltext b
where a.username is not null
and a.status = 'ACTIVE'
and a.sql_address = b.address
order by a.last_call_et,a.sid,b.piece;

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

I got following sql

select JOBNAME ,STATUS ,PARMFILE ,GROUPNAME ,APPLICATION ,AUTHORITY_GROUP ,JOBDESC ,REQDATE ,SEQNO ,ROWID from BRFRUNQ Q
where Q.STATUS='REQUESTED' order by REQDATE

I don't think this sql should take 37% of server cpu.
So if i fired sql-finding query again and again
its showing same sql..

Does that mean its a job running and has opened a cursor
which appends the CPU after each run??

Table contains only 3,000,00 rows and is indexes properly

Database is 7.3.4 and using CBO.
I saw dba_jobs which shows "no rows selected".

Plz suggest


Tom Kyte

Followup  

May 03, 2004 - 7:35 pm UTC

that query is an application generated query -- nothing "internal" to oracle or anything.

It means it is an application you are running that is running that sql.

May 03, 2004 - 7:53 pm UTC

Reviewer: Gary from Sydney, Australia

I've encountered the BRFRUNQ table before.
It's part of a batch queueing mechanism (QMaster).

You've simply got a job that is continually polling the table to see if there have been any batch jobs requested.



Confused...again!!

May 04, 2004 - 11:21 am UTC

Reviewer: Vijay from UK

Dear Tom,
I was going through this discussion and was very happy at the beginning and confused at the end of it..
I am copying part of your replies in the discussion..
1.
"you cannot, repeat, cannot put CPU in the bank.

If you are constantly running at 90% -- what use -- what use -- is that other
10%

Can you save it?
Can you use it else where?"

2.
"I don't see sluggish response from ls and such at 90% on my systems, but
anyway...

if you are running at 90% continously that is not good.
if you peak at 90% from time to time that is ok.
"


Now ..is the second reply not contradicting the first one? Please clarify.

Thanks

Tom Kyte

Followup  

May 04, 2004 - 1:41 pm UTC

No, the question/answer just got refined over time.


it is that there is a difference between

a) peaking at 90/100% of cpu
b) running at capacity all of the time.

that's all. If I hit 90% during peak, excellent.

if I'm at 90-100% all of the time, not so excellent (no growth room, probably over capacity already).

CPU Question

May 18, 2004 - 10:20 pm UTC

Reviewer: Matt from Australia

The statspack snippet below shows top 5 waits as CPU and index reads. The logical reads per segment are all on index objects.

There is one statement using 86/306 seconds of CPU each.

What approaches are available in general to reduce the amount of CPU used by a statement (In this case an insert into as select)?

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 7 19-May-04 11:24:41 10 8.5
End Snap: 8 19-May-04 11:33:25 10 8.5
Elapsed: 8.73 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 24M Std Block Size: 8K
Shared Pool Size: 152M Log Buffer: 512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 3,948,094.40 2,068,801,464.00
Logical reads: 45,685.46 23,939,183.00
Block changes: 28,424.99 14,894,694.00
Physical reads: 138.25 72,441.00
Physical writes: 289.28 151,581.00
User calls: 0.05 27.00
Parses: 1.65 866.00
Hard parses: 0.02 13.00
Sorts: 0.21 109.00
Logons: 0.00 0.00
Executes: 686.49 359,721.00
Transactions: 0.00

% Blocks changed per Read: 62.22 Recursive Call %: 99.99
Rollback per transaction %: 0.00 Rows per Sort: 3460.27

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.72 In-memory Sort %: 97.25
Library Hit %: 99.98 Soft Parse %: 98.50
Execute to Parse %: 99.76 Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: 5.61 % Non-Parse CPU: 99.98

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 35.76 35.90
% SQL with executions>1: 89.29 88.69
% Memory for SQL w/exec>1: 69.24 68.56

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 306 59.60
db file sequential read 19,365 70 13.65
log file parallel write 18,691 54 10.43
db file scattered read 6,438 23 4.47
db file parallel write 768 16 3.21
-------------------------------------------------------------

...

Top 5 Logical Reads per Segment for DB: TSTDM Instance: tstdm Snaps: 7 -8
-> End Segment Logical Reads Threshold: 10000

Subobject Obj. Logical
Owner Tablespace Object Name Name Type Reads %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
DPRDATA DPR_MED_IN PK_DELIVERY_POINT INDEX 8,577,488 36.38
DPRDATA DPR_MED_IN PK_DP_USER INDEX 1,038,912 4.41
DPRDATA DPR_MED_IN PK_DPU_SU_CUSTOMER INDEX 1,029,776 4.37
DPRDATA DPR_MED_IN PK_DPU_GAS_ZONE INDEX 1,029,024 4.36
DPRDATA DPR_MED_IN NDX2_DPU_GAS_ZONE INDEX 1,019,280 4.32
-------------------------------------------------------------

...


CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
4,674,645 1 4,674,645.0 19.5 86.33 112.43 744374707
Module: SQL*Plus
insert /*+ append */ into dprdata.dpu_gas_zone ( delivery_point
_id ,effective_from_date ,network_id ,license_area_id ,s
ub_network_id ,heating_value_zone_id ,created_by_irid ,eff
ective_to_date ) select mirn delivery_point_id ,mirn_commi
ssioned_date effective_from_date ,substr(gas_zone_code,1,1




Tom Kyte

Followup  

May 19, 2004 - 9:41 am UTC

well, CPU time is a timed event -- not a wait. It is just trying to tell you "hey, you are using lots of CPU" -- you 523.8 cpu seconds of time PER cpu you have in your snap (8.73*60)...


the insert as select is doing lots and lots of buffer gets -- that'll take a bit of CPU to do (lios = CPU)

you would be back to "query tuning" here - reducing the amount of work the defining query does (probably by doing things "bulky", hash joins instead of nested loops whatever -- hard to say, cannot see the query cannot see the plans)

I must say, i've never seen this before :)

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 3,948,094.40 2,068,801,464.00
^^^^^^^^^^^^^^^^

wow!

Additional info..

May 19, 2004 - 7:53 pm UTC

Reviewer: Matt from Australia

For the above question. I was really just curious about the what took up th etime in my process. I am doing a bulk transfer of data from one schema to another. It takes about ten minutes which is fast enough, I just wondered where the time was going.

Regarding the REDO.

I carry out 20 or so of insert /*+ append*/ into as selects. Each which generates about 500,000 rows. What mechanisms do I have for reducing the redo?

I have the queries, I don't have the plans. If I get time I will generate these and if I catch you accepting questions I will post these. at this point I am interested purely for acedemic reasons (fast enough is good enough for me).

Many Thanks.

Tom Kyte

Followup  

May 20, 2004 - 10:14 am UTC

CPU time 306 59.60
db file sequential read 19,365 70 13.65
log file parallel write 18,691 54 10.43
db file scattered read 6,438 23 4.47
db file parallel write 768 16 3.21

looks like 5.1 minutes on the CPU (assuming you were "it", the thing running)
1.2 minutes on sequential reads (index reads)
0.4 minutes doing scattered reads (full scans)

so, about 7 minutes or thereabouts are accounted for there. a TKPROF of the application itself (not statspack) would nail it down even better.


as for reducing redo

o insert append into unindexed tables and build indexes after the fact (parallel, nologging)

o if in archivelog mode, and it is OK with your DBA, put the table into nologging mode and no redo for the table insert will be generated. DBA needs to know so they can backup that unrecoverable operation.




Mutliple CPUs

June 29, 2004 - 2:37 pm UTC

Reviewer: A reader

Hi Tom,

What if there are 2 CPUs and the CPU utilization is constantly 50% when I run a large query.

Should I look at parallel execution?

Tom Kyte

Followup  

June 29, 2004 - 4:29 pm UTC

you could. make sure you are using the CBO currently.

how to know which query taking 100%

July 06, 2004 - 5:40 am UTC

Reviewer: Bhaskar from India

Hello tom,

I want to know which query is taking lot of resources like cpu utilization, memory etc.

Tom Kyte

Followup  

July 06, 2004 - 7:43 am UTC

read about statspack and start using it.

July 06, 2004 - 11:04 am UTC

Reviewer: A reader

Tom,
What is the larget number of concurrent users that can be supported by Oracle, given a Sun SPARC / or SunFire hardware

Could You pls compare it with NT Servers / Linux too.

Some sort of benchmark

Thanks!


Tom Kyte

Followup  

July 06, 2004 - 11:32 am UTC

</code> http://www.tpc.org/ <code>

not that anyone elses benchmarks can be applied to your problem -- if you have different types of transactions than a TCP-C or H or whatever (and you do, of that I am sure) -- your mileage will vary.

in a word

"lots"

on any of those platforms.

Thank you again

July 06, 2004 - 1:07 pm UTC

Reviewer: GJ from UK

I was looking at our systems today and starting to worry that 98% CPU usage was bad, but everything was working well and no one was complaining, after reading this thred, I am starting to think more clearly. Look at the bigger picture. If your running at 99% and if everything both OLTP and batch, are getting done and no one is complaining, why worry. I guess the point is if your running OK then you have got the system resources fully utilised, you only need to be concerned if the R&D mob come down and say they want to add another 1000 users and 25 apps, you can then start to look at the load stats and recourse usage for each component and size up what additional resources you need to add to cover the additional.

Sounds to me like you need that 20% buffer zone if you are not comfortable with your system or worse don't understand what it is doing at various stages of it's day. (That last one has probably condemned me to a stoning!)

Tom Kyte

Followup  

July 06, 2004 - 2:26 pm UTC

well -- running at 99% 100% of the time isn't so good, no problem hitting it, or peaking at it, but if you are always there -- you do need to look into either "why" or "how can i grow over time"

Need Guidance!!!!!!!!!!!!!

August 11, 2004 - 12:54 pm UTC

Reviewer: A reader

Hi Tom,
Hope you are not too tired by answering all our queries ;)
Well our system was using 100% CPU since 2-3 days. I did a statspack report in the morning when the problem was there.
Here is an excerpt:


Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 221 11-Aug-04 09:45:37 26 3.7

End Snap: 222 11-Aug-04 10:08:02 22 3.1

Elapsed: 22.42 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 824M Std Block Size: 8K
Shared Pool Size: 400M Log Buffer: 512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 178,175.70 148,479.75
Logical reads: 4,096.68 3,413.90
Block changes: 1,215.82 1,013.18
Physical reads: 17.81 14.84
Physical writes: 35.19 29.32
User calls: 17.80 14.84
Parses: 22.93 19.11
Hard parses: 0.07 0.06
Sorts: 7.88 6.57
Logons: 0.96 0.80
Executes: 248.61 207.17
Transactions: 1.20

% Blocks changed per Read: 29.68 Recursive Call %: 95.59
Rollback per transaction %: 31.60 Rows per Sort: 192.85

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.74 In-memory Sort %: 99.99
Library Hit %: 99.92 Soft Parse %: 99.69
Execute to Parse %: 90.78 Latch Hit %: 99.99
Parse CPU to Parse Elapsed %: 23.11 % Non-Parse CPU: 99.02

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 34.26 34.30
% SQL with executions>1: 53.36 53.94
% Memory for SQL w/exec>1: 61.93 61.89

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
enqueue 5,254 15,660 85.82
PX qref latch 766 795 4.36
PX Deq: Execute Reply 427 584 3.20
PX Deq Credit: send blkd 26,622 355 1.95
db file sequential read 4,821 245 1.34
-------------------------------------------------------------


As you can see a lot of Redo is being generated. Is that a bad thing? Also the Top 5 show me enqueue. I understood that, but what are the waits starting with PX.
Can you explain them a bit.
Also the Buffer Gets and Physical Reads seem normal to me. There is a huge one but b'coz it lacked an index which I have created.
Also any guides as to what should be my approach in identifying the bottleneck and reducing the CPU utilization to acceptable limits.
Thanks in advance as always.

Tom Kyte

Followup  

August 11, 2004 - 1:54 pm UTC

question is (i use that redo example in my talks, excellent) -- what was the redo generation when things were "good"?

you cannot tell if that is "bad" unless you know what "good" for you is.

ignore the PX stuff -- look at those enqueues, massive blocking issue here don't you think?

i would tend to zero in on that really big design problem.

(question - in a system where enqueue waits are so large, I'm thinking "transactional system" -- but then why are you running PQ????)


Thanks Tom

August 11, 2004 - 2:27 pm UTC

Reviewer: A reader

Hi again Tom,
I know I must focus on Enqueues more than anything else.
But where do I start from. Any area in the statspack report deserves special attention.
Also these blocks started occurring from last night.
So that put me in a bit of a worry.
Any advise is welcome as always.
Thanks

Tom Kyte

Followup  

August 12, 2004 - 7:38 am UTC

statspack is not very useful for tuning an application, enqueues are an application problem.

I'd look to a level 12 10046 trace on my most important, criticial application(s) and work from there.

If these locks "just started happening" and never happened before -- it should be simple to correct. Back out whatever you added to your system last night.


If the enqueues are epidemic, you of course can use v$lock to view blockers and blockees.

get cpu usage info of the session.

November 05, 2004 - 10:49 pm UTC

Reviewer: Sean from NJ, USA

Hi Tom,

We allow analysts to query production db and give some developers more privilege such as "create table as select " to migrate data ( a few millions rows). Analysts are supposed to use tuned sql statement and developer are supposed to run "create table as " as a batch job during off peak hour (night).

Sometimes, end users complained that application is slower than usual. I found this method is very useful to find culprit:

Use prstat to find which process use >30% CPU for more than 30 min, then find the session and sql using V$sqltext, v$session and v$process.

The problems are always like this:
(1) Some analysts query million rows table without index.
(2) Developer run job such as "create table as select " for millions rows during the daytime.

My question is how I can get the process info which use 30% CPU for more than 30 min in the database, so I can automate the process and send email to me whenever this happens ( Try to be more proactive).


9204 on Solairs 5.9.

Thanks so much for your help.


Tom Kyte

Followup  

November 06, 2004 - 10:34 am UTC

cpu stats are not added into the v$ tables until the CALL COMPLETES (we don't update that stat until it is "done" with whatever call it is doing)


Why not go by LIO's?

Using LIO to find the session which uses a lot of cpu .

November 06, 2004 - 8:14 pm UTC

Reviewer: Sean from NJ, USA

Hi Tom,

How can I quickly find the session which currently uses high percent of cpu for a long time by using LIO?

It seems that statspack report are not good and fast enough to do this job?


Thanks so much for your help.


Tom Kyte

Followup  

November 06, 2004 - 9:07 pm UTC

you don't care "about the session"

you care about "the sql"

v$sql has the info.

script to find currently running sql which uses most of resource

November 07, 2004 - 12:13 am UTC

Reviewer: Sean from NJ, USA

Hi Tom,

I want to find sql which is CURRENTLY using most of cpu. The v$sql generates historical sql (I found sql which I used yesterday).

select * from (select sql_text, buffer_gets, cpu_time
from v$sql order by 2 desc )
where rownum < 6;

You showsql script is very useful, but didn't help me in this case.

Please help me with sql script which will find the CURRENTLY running sql which uses most of cpu or resource.

Thanks so much for your help.


Tom Kyte

Followup  

November 07, 2004 - 2:40 pm UTC

you cannot -- i said that CPU stats are not dumped into the V$ tables until the end of the call. if you have a query plan like:


hash join
table access full
table access full

that first row is going to "do all of the work" -- but until you actually get the first row -- it will not have dumped any cpu time into the v$ tables.

cpu is a post mortem statistic, like sqlnet message from client, it is not known until after it happens.


That shows you HISTORICALLY

current running sql which use a lot of resources

November 07, 2004 - 5:02 pm UTC

Reviewer: Sean from NJ, USA

Hi Tom,

I am sorry to bother you again. I should have dropped cpu term in my question. But it must be a way to find current running sql which uses a lot of resource, or which slows down the whole application.

Here is scenario:
The server is slow, let us find which sql are culprit. Here is the result of Tom’script which clearly shows these are the suspects. You must have something like this.

Thanks so much for your help.







Tom Kyte

Followup  

November 07, 2004 - 5:16 pm UTC

I myself use statspack.

top sql report.

top sql by whatever you want to look at it. it is already done for us.

Take a snap, get a cup of coffee, take another snap and generate the report.

Performance extrapolation isn't linear to 100% CPU usage

November 08, 2004 - 1:25 pm UTC

Reviewer: andrew from L.A., CA USA

These are interesting links in the discussion of running at high CPU percentages. They indicate that performance becomes unpredictable and non-linear up to 100% CPU:
</code> http://www.rittman.net/archives/001119.html http://www.ixora.com.au/q+a/0104/02142638.htm <code>

100% or less

November 12, 2004 - 9:59 am UTC

Reviewer: Zoran Martic from Dublin, Ireland

Andrew,

You want your system using as much resources as possible.
That means as more you are 100% (at least at peak time) and everybody is happy with the response times of the apps they are running while not decreasing the critical transactions over the limit and serving more, then you are OK.
Also your investment in the server is great, because you did not buy concord and fly 300 mph/h.

Also bear in mind that for most OLTP applications the server side processing is not the biggest chunk of processing time, you have networking, client processing, crazy waits, ....
In many cases you do not care then some fast SQL is executed 50% slower because of the queueing theory because you spent more on the sending SQL from for example Forms to Oracle.

For batch processes you want to be 100% :)

Probably my rule is that 90-100% is great while everybody are happy.
Of course this is not true if you parallelize some work in 2x more processes and you processed less, just to keep CPU's near 100% :)

Of course that is what all of these people are saying (Millsap, Adams) I suppose.

I had a few situations when 100% CPU was great:
1. 1billion array inserts with C-OCI on 24 CPU machine
2. added new batch while 8 CPU machine was 97% used to get additional 100k transactions done

There are a planty when it was not because of some unscalable or untuned shared structure inside Oracle, UNIX, apps, ....

Sometimes you do not care about M/M/n theory while considering the overall response time for the critical business units.

You want to burn that beast of hardware to the ground -joking of course :)

Regards,
Zoran

cpu usage from 10046

November 19, 2004 - 8:56 pm UTC

Reviewer: reader from PA

tom,

If you just have a 10046, can you tell if a process is waiting for cpu time? If elapsed = cpu, could it be that cpu took longer than it should have because cpu was being contended for or overly taxed? or would cpu contention manifest as some form of wait? If there are 0 waits of significance (cpu~=elapsed) in a tkrpof report, does that likely mean you still have some CPU time left or would you see the same thing on a box that was out of power, but the cpu time you see would just be greater than a lower load level?

So simply, how does cpu starvation manifest itself in a 10046 trace, or does it?

Tom Kyte

Followup  

November 20, 2004 - 8:11 am UTC

when (elapsed-cpu) > 0 and the sum(waits during that time) < (elapsed-cpu), the time spent waiting for cpu is generally:

(elapsed-cpu)-sum(waits during that time)

it would in most cases manifest itself as "missing time" -- elapsed-cpu is time to account for, if your waits don't cover that, then the most probable cause is "waiting for cpu" (although it could be uninstrumented oracle code -- but there is very little of that left)

Are enqueues application or database specific

November 22, 2004 - 1:22 pm UTC

Reviewer: Prashant from San Jose, CA

Hi Tom,
I am facing a problem, in our application for a batch job we are seeing a lot of enqueue waits when I turn on the trace level 10046. The process in itself is broken down into 8 threads using "business logic", not by the rowid logic you would recommend. These 8 threads are all started at the same time but by the tkprof it appears to me that each of threads is queued and is waiting till the previous one completes, so the overall time is actually a summation of all 8 threads if run serially not parallely as was desired. The process is a actually a single insert into .... select from table. I ran the selects separately and all the 8 threads complete in 9 minutes. But if all the 8 threads with inserts are run, then it takes more than 60 minutes.

A sample tkprof output of the wait events is below, this is for 2nd thread, I have noticed the subsequent threads have more enqueue waits.

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 335.68 807.78 274860 185952 112 3878034
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 335.68 807.78 274860 185952 112 3878034

Misses in library cache during parse: 0
Optimizer goal: ALL_ROWS
Parsing user id: 26 (recursive depth: 1)



Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
enqueue 133 3.01 399.89
PX Deq: Join ACK 4 0.00 0.00
PX Deq: Execute Reply 23 2.00 21.98
PX Deq Credit: send blkd 744 0.01 0.32
PX Deq: Parse Reply 6 0.02 0.06
db file scattered read 5857 0.22 56.67
db file sequential read 3 0.01 0.02
direct path write 201 0.20 3.13
latch free 1 0.00 0.00
direct path read 177 0.12 2.84
PX Deq Credit: need buffer 295 0.01 0.21
PX Deq: Signal ACK 2 0.10 0.10
--------------------------------------------------------------------------------

By my quick calculation, I have noticed that the sum of waits are more than (elapsed-cpu), so I guess it wasn't waiting on CPU, also I noticed that CPU was under utilized on our SOLARIS box. And at the same time I was observing the waits and the locks, and observed there were a lot of enqueue waits and the exclusive row level locks were been held for the table to which we were inserting data.

The inserted table is partitioned and I have increased the freelists and initrans value to 15 and 8.

The only difference in the two environments i.e development (it works fine here) and production is the table creation script, in production it is PARALLEL ( DEGREE 5 INSTANCES 1 ); while in dev it is NOPARALLEL, we have 24 cpu's, is that the cause?

Tom Kyte

Followup  

November 22, 2004 - 4:00 pm UTC

enqueue waits are "locks" -- you waited on a row lock (you were trying to lock a row but someone already had it)

can you be more specific what the inserts/tables look like? it would be unusual for this to happen on "normal old inserts"

More info....Insert Scripts...Query

November 23, 2004 - 7:20 pm UTC

Reviewer: Prashant from San Jose, CA

Hi Tom,

Below are the scripts, one insert table, 3 select tables their count(*) and oracle version. (also I wrongly mentioned as SOLARIS it's HP-UX). We are running 8 of this inserts parallely using different data range (we use a splitting logic based on fiscal month key). Unfortunately all of them wait and only 1 runs at a time, and once it completes the next one is spawned. However our DBA provided a solution he told us not to use PARALLEL (DEGREE 5 INSTANCES DEFAULT) on the insert table definition instead use NOPARALLEL and this allowed the job to complete in 10 minutes and all the workers were spawned parallely. Could you let me know what was the reason for the behaviour seen earlier?

"INSERT Table Script"
CREATE TABLE SRS_CO_BKG_L1_FD_A
(
  ORDER_NUMBER             NUMBER(22),
  CUSTOMER_KEY             NUMBER(22),
  SALES_TERRITORY_KEY      NUMBER(22),
  VERTICAL_MARKET_KEY      NUMBER(22),
  COMPANY_TARGET_KEY       NUMBER(22),
  TMS_KEY                  NUMBER(22),
  SALESREP_KEY             NUMBER(22),
  PRODUCT_FAMILY_KEY       NUMBER(22),
  SUPER_SALES_CHANNEL_KEY  NUMBER(22),
  FISCAL_MONTH_KEY         NUMBER(22),
  FISCAL_DATE_KEY          NUMBER(22),
  SERVICE_TYPE             VARCHAR2(2 BYTE),
  ADJUSTMENT_CODE          VARCHAR2(30 BYTE),
  ADJUSTMENT_TYPE          VARCHAR2(30 BYTE),
  BOOK_LIST_PRICE          NUMBER(22,7),
  BOOK_NET_PRICE           NUMBER(22,7),
  BOOK_COST_PRICE          NUMBER(22,7),
  BOOK_SUPPORT_LIST        NUMBER(22,7),
  BOOK_SUPPORT_NET         NUMBER(22,7),
  BOOK_SUPPORT_COST        NUMBER(22,7),
  HOLD_LIST_PRICE          NUMBER(22,7),
  HOLD_NET_PRICE           NUMBER(22,7),
  HOLD_COST_PRICE          NUMBER(22,7),
  HOLD_SUPPORT_LIST        NUMBER(22,7),
  HOLD_SUPPORT_NET         NUMBER(22,7),
  HOLD_SUPPORT_COST        NUMBER(22,7),
  PRODUCT_US_LIST          NUMBER(22,7),
  PRODUCT_US_NET           NUMBER(22,7),
  PRODUCT_US_COST          NUMBER(22,7),
  SUPPORT_US_LIST          NUMBER(22,7),
  SUPPORT_US_NET           NUMBER(22,7),
  SUPPORT_US_COST          NUMBER(22,7),
  TIMEZONE                 VARCHAR2(10 BYTE)
)
TABLESPACE P1D
PCTUSED    40
PCTFREE    1
INITRANS   15
MAXTRANS   255
STORAGE    (
            FREELISTS        20
           )
NOLOGGING
PARTITION BY RANGE (FISCAL_MONTH_KEY) 
(  
  PARTITION SRS_CO_BKG_L1_FD_A_2003_1 VALUES LESS THAN (158)
    NOLOGGING
    NOCOMPRESS
    TABLESPACE P1D
    PCTUSED    40
    PCTFREE    1
    INITRANS   15
    MAXTRANS   255
    STORAGE    (
                INITIAL          64K
                MINEXTENTS       1
                MAXEXTENTS       2147483645
                FREELISTS        20
                FREELIST GROUPS  1
                BUFFER_POOL      DEFAULT
               ),  
so on....
NOCACHE
PARALLEL ( DEGREE 5 INSTANCES 1 );

"SELECT Tables Script"
CREATE TABLE SRS_BOOK_END_CUST_F
(
  EDW_BKG_TRX_ROLE_KEY   NUMBER,
  DATA_SOURCE            VARCHAR2(30 BYTE),
  HEADER_ID              NUMBER,
  LINE_ID                NUMBER,
  SERVICE_TYPE           VARCHAR2(2 BYTE),
  END_CUSTOMER_KEY       NUMBER,
  PO_CUSTOMER_KEY        NUMBER,
  PRODUCT_KEY            NUMBER,
  SALES_TERRITORY_KEY    NUMBER,
  SALESREP_KEY           NUMBER,
  FISCAL_MONTH_KEY       NUMBER,
  TRANSACTION_DATE_KEY   NUMBER,
  CORP_PROCESS_DATE_KEY  NUMBER,
  EMEA_PROCESS_DATE_KEY  NUMBER,
  ASIA_PROCESS_DATE_KEY  NUMBER,
  CORP_VM_ID             NUMBER,
  ORDER_NUMBER           NUMBER,
  PURCHASE_ORDER_NUMBER  VARCHAR2(150 BYTE),
  DEAL_ID                VARCHAR2(150 BYTE),
  CANCEL_CODE            VARCHAR2(30 BYTE),
  BOOK_LIST_PRICE        NUMBER,
  BOOK_NET_PRICE         NUMBER,
  BOOK_COST_PRICE        NUMBER,
  BOOK_SUPPORT_LIST      NUMBER,
  BOOK_SUPPORT_NET       NUMBER,
  BOOK_SUPPORT_COST      NUMBER,
  HOLD_COST_PRICE        NUMBER,
  HOLD_LIST_PRICE        NUMBER,
  HOLD_NET_PRICE         NUMBER,
  HOLD_SUPPORT_COST      NUMBER,
  HOLD_SUPPORT_LIST      NUMBER,
  HOLD_SUPPORT_NET       NUMBER,
  ACQUISITION_FLAG       VARCHAR2(1 BYTE),
  ADJUSTMENT_TYPE        VARCHAR2(30 BYTE),
  ADJUSTMENT_CODE        VARCHAR2(30 BYTE),
  DESCRIPTION            VARCHAR2(1000 BYTE),
  ADJUSTMENT_SOURCE      VARCHAR2(30 BYTE),
  PRODUCT_US_LIST        NUMBER,
  PRODUCT_US_COST        NUMBER,
  PRODUCT_US_NET         NUMBER,
  SUPPORT_US_LIST        NUMBER,
  SUPPORT_US_COST        NUMBER,
  SUPPORT_US_NET         NUMBER,
  ADJUSTMENT_DATE_KEY    NUMBER,
  TRANSACTION_ID         NUMBER,
  TRANSACTION_TYPE       VARCHAR2(20 BYTE),
  REC_TYPE               VARCHAR2(15 BYTE),
  SOURCE_CREATE_DATE     DATE,
  SOURCE_UPDATE_DATE     DATE,
  CREATE_DATE            DATE,
  UPDATE_DATE            DATE,
  STATUS                 VARCHAR2(1 BYTE),
  HISTORY_FLAG           VARCHAR2(1 BYTE)
)
TABLESPACE STD
PCTUSED    80
PCTFREE    20
INITRANS   1
MAXTRANS   255
PARTITION BY RANGE (FISCAL_MONTH_KEY) 
(  
  PARTITION SRS_BK_EU_F_2002_1 VALUES LESS THAN (146)
    LOGGING
    NOCOMPRESS
    TABLESPACE STD
    PCTUSED    80
    PCTFREE    20
    INITRANS   1
    MAXTRANS   255
    STORAGE    (
                INITIAL          64K
                MINEXTENTS       1
                MAXEXTENTS       2147483645
                FREELISTS        1
                FREELIST GROUPS  1
                BUFFER_POOL      DEFAULT
               ),

so on....
NOCACHE
NOPARALLEL;



CREATE TABLE SRS_CO_SALES_HIER_DIM
(
  VERSION_ID                   NUMBER(10),
  ORG_NODE_CODE                VARCHAR2(1 BYTE),
  ORG_NODE_NAME                VARCHAR2(40 BYTE),
  UO_TERRITORY_ID              NUMBER(22),
  BO_TERRITORY_ID              NUMBER(22),
  ORG_NODE_LEVEL_NUMBER        NUMBER(22),
  ORG_LEVEL_2_NAME             VARCHAR2(40 BYTE),
  NODE_LEVEL_1                 VARCHAR2(40 BYTE),
  NODE_LEVEL_2                 VARCHAR2(40 BYTE),
  NODE_LEVEL_3                 VARCHAR2(40 BYTE),
  NODE_LEVEL_4                 VARCHAR2(40 BYTE),
  NODE_LEVEL_5                 VARCHAR2(40 BYTE),
  NODE_LEVEL_6                 VARCHAR2(40 BYTE),
  NODE_LEVEL_1_KEY             NUMBER(22),
  NODE_LEVEL_2_KEY             NUMBER(22),
  NODE_LEVEL_3_KEY             NUMBER(22),
  NODE_LEVEL_4_KEY             NUMBER(22),
  NODE_LEVEL_5_KEY             NUMBER(22),
  NODE_LEVEL_6_KEY             NUMBER(22),
  SALES_TERRITORY_KEY          NUMBER(22),
  SALES_TERRITORY_CODE         VARCHAR2(30 BYTE),
  SALES_TERRITORY_DESCRIPTION  VARCHAR2(240 BYTE),
  NODE_LEVEL_1_DESCRIPTION     VARCHAR2(240 BYTE),
  NODE_LEVEL_2_DESCRIPTION     VARCHAR2(240 BYTE),
  NODE_LEVEL_3_DESCRIPTION     VARCHAR2(240 BYTE),
  NODE_LEVEL_4_DESCRIPTION     VARCHAR2(240 BYTE),
  NODE_LEVEL_5_DESCRIPTION     VARCHAR2(240 BYTE),
  NODE_LEVEL_6_DESCRIPTION     VARCHAR2(240 BYTE),
  FISCAL_MONTH_KEY             NUMBER(22),
  SOURCE_CREATE_DATE           DATE,
  SOURCE_UPDATE_DATE           DATE,
  SRS_CREATE_DATE              DATE,
  SRS_UPDATE_DATE              DATE,
  STATUS                       VARCHAR2(50 BYTE),
  TIMEZONE                     VARCHAR2(10 BYTE)
)
TABLESPACE P1D
PCTUSED    40
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            INITIAL          64K
            MINEXTENTS       1
            MAXEXTENTS       2147483645
            PCTINCREASE      0
            FREELISTS        1
            FREELIST GROUPS  1
            BUFFER_POOL      DEFAULT
           )
LOGGING 
CACHE
NOPARALLEL;

CREATE TABLE SRS_CO_TMS_PROD_DIM
(
  TMS_KEY                       NUMBER(22),
  PRODUCT_FAMILY_KEY            NUMBER(22),
  DOLLAR_ALLOCATION_PERCENTAGE  NUMBER(22,7),
  PRODUCT_KEY                   NUMBER(22)
)
TABLESPACE P1D
PCTUSED    40
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            INITIAL          64K
            MINEXTENTS       1
            MAXEXTENTS       2147483645
            PCTINCREASE      0
            FREELISTS        1
            FREELIST GROUPS  1
            BUFFER_POOL      DEFAULT
           )
LOGGING 
CACHE
NOPARALLEL;


"Count(*) of different tables involved and oracle version"
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
PL/SQL Release 9.2.0.5.0 - Production
CORE    9.2.0.6.0       Production
TNS for HPUX: Version 9.2.0.5.0 - Production
NLSRTL Version 9.2.0.5.0 - Production

SQL> column name format a40
SQL> column type format 0
SQL> column value format a30
SQL> select name, type, value from v$parameter where name like 'parallel%';

NAME                                     TYPE VALUE
---------------------------------------- ---- ------------------------------
parallel_server                             1 FALSE
parallel_server_instances                   3 1
parallel_min_percent                        3 0
parallel_min_servers                        3 0
parallel_max_servers                        3 30
parallel_instance_group                     2
parallel_execution_message_size             3 2152
parallel_adaptive_multi_user                1 FALSE
parallel_threads_per_cpu                    3 2
parallel_automatic_tuning                   1 FALSE

10 rows selected.

SQL> select /* small select table */ count(*) from srs_co_sales_hier_dim;

  COUNT(*)
----------
     18778

SQL> select /* medium select table */ count(*) from srs_co_tms_prod_dim;

  COUNT(*)
----------
    127257

SQL> select /* large select table, partitioned on fiscal_month_key */ count(*) from srs_book_end_cust_f;

  COUNT(*)
----------
  33156904

SQL> select /* large select table, partitioned on fiscal_month_key, approx count on each partition */ count(*) from srs_book_end_cust_f where fiscal_month_key=182;

  COUNT(*)
----------
   1904797

SQL> select /* insert table, partitioned on fiscal_month_key */ count(*) from srs_co_bkg_l1_fd_a;

  COUNT(*)
----------
  29904190

SQL> 


"Query"
INSERT INTO srs_CO_bkg_l1_fd_a
  (
   order_number,
   customer_key,
   sales_territory_key, 
   vertical_market_key,
   company_target_key,
   tms_key,
   salesrep_key,
   product_family_key,
   super_sales_channel_key,
   fiscal_month_key,
   fiscal_date_key,
   timezone,
   service_type,
   adjustment_code,
   adjustment_type,
   book_list_price,
   book_net_price,
   book_cost_price,
   book_support_list,
   book_support_net,
   book_support_cost,
   hold_list_price,
   hold_net_price,
   hold_cost_price,
   hold_support_list,
   hold_support_net,
   hold_support_cost,
   product_us_list,
   product_us_net,
   product_us_cost,
   support_us_list,
   support_us_net,
   support_us_cost
  )
 SELECT /*+ ALL_ROWS  FULL(sbecf) FULL(sshd) CACHE(sshd) CACHE(stpd) */
          sbecf.order_number                       AS order_number,
         sbecf.po_customer_key                                                    AS customer_key,
         sbecf.sales_territory_key                                                AS sales_territory_key, 
         stpd.tms_key                                                             AS tms_key,
         sbecf.salesrep_key                                                       AS salesrep_key,
         stpd.product_family_key                                                  AS product_family_key,
         sbecf.fiscal_month_key                                                   AS fiscal_month_key,
         sbecf.corp_process_date_key AS fiscal_date_key, decode(book_support_net,:"SYS_B_03",NULL,decode(service_Type,:"SYS_B_04",:"SYS_B_05",:"SYS_B_06",:"SYS_B_07",:"SYS_B_08",:"SYS_B_09",:"SYS_B_10",:"SYS_B_11",:"SYS_B_12")) AS service_type,
         adjustment_code,
         adjustment_type,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_13") * NVL(book_list_price,:"SYS_B_14"))   AS book_list_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_15") * NVL(book_net_price,:"SYS_B_16"))    AS book_net_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_17") * NVL(book_cost_price,:"SYS_B_18"))   AS book_cost_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_19") * NVL(book_support_list,:"SYS_B_20")) AS book_support_list,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_21") * NVL(book_support_net,:"SYS_B_22"))  AS book_support_net,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_23") * NVL(book_support_cost,:"SYS_B_24")) AS book_support_cost,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_25") * NVL(hold_list_price,:"SYS_B_26"))   AS hold_list_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_27") * NVL(hold_net_price,:"SYS_B_28"))    AS hold_net_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_29") * NVL(hold_cost_price,:"SYS_B_30"))   AS hold_cost_price,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_31") * NVL(hold_support_list,:"SYS_B_32")) AS hold_support_list,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_33") * NVL(hold_support_net,:"SYS_B_34"))  AS hold_support_net,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_35") * NVL(hold_support_cost,:"SYS_B_36")) AS hold_support_cost,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_37") * NVL(product_us_list,:"SYS_B_38"))   AS product_us_list,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_39") * NVL(product_us_net,:"SYS_B_40"))    AS product_us_net,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_41") * NVL(product_us_cost,:"SYS_B_42"))   AS product_us_cost,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_43") * NVL(support_us_list,:"SYS_B_44"))   AS support_us_list,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_45") * NVL(support_us_net,:"SYS_B_46"))    AS support_us_net,
         SUM(NVL(stpd.dollar_allocation_percentage,:"SYS_B_47") * NVL(support_us_cost,:"SYS_B_48"))   AS support_us_cost
    FROM srs_book_end_cust_f                               sbecf,
         srs_CO_sales_hier_dim   sshd,  
     srs_CO_tms_prod_dim       stpd
   WHERE sbecf.product_key           = stpd.product_key
     AND sbecf.sales_territory_key   = sshd.sales_territory_key  
     --AND sbecf.end_customer_key      = scecd.end_customer_key
     AND sbecf.corp_process_date_key IS NOT NULL AND sbecf.fiscal_month_key IN (:"SYS_B_49",:"SYS_B_50",-:"SYS_B_51")  AND sshd.fiscal_month_key = :"SYS_B_52"   
GROUP BY sbecf.order_number,
         sbecf.po_customer_key,
         sbecf.sales_territory_key,  
         stpd.tms_key,
         sbecf.salesrep_key,
         stpd.product_family_key,
         sbecf.fiscal_month_key,
         sbecf.corp_process_date_key,         decode(book_support_net,:"SYS_B_55",NULL,decode(service_Type,:"SYS_B_56",:"SYS_B_57",:"SYS_B_58",:"SYS_B_59",:"SYS_B_60",:"SYS_B_61",:"SYS_B_62",:"SYS_B_63",:"SYS_B_64")),
         adjustment_code,
         adjustment_type 

Tom Kyte

Followup  

November 23, 2004 - 8:00 pm UTC

just needed the insert logic really -- so, do these guys being inserted into have any unique constraints --etc.

it is really the table being INSERTED INTO that is relevant here, it'll be the cause of the enqueues.

(shame you have to use extra cpu for cursor_sharing, not all statements need it -- only the ones from really bad programmers)

and the cache hint?

November 23, 2004 - 10:04 pm UTC

Reviewer: Prashant from San Jose, CA

1) The insert table doesn't have any constraints, no indexes. Also the logic for insert is fairly straightforward, divide the data that is been fetched into 8 unique sets by passing a number for the spawned thread and generating either one or max 3 months data which will not overlap with other threads, and each of the threads works on their own data for insert into the target table which is only one. One of the select tables which is quite huge i.e partitioned on every month is the driving table.

2) I didn't understand the cursor_sharing part?

3) The cache hint was added for the smaller tables, since we were "beating round the bush" to perceive the reason for this behaviour.

Let me know if this information is not sufficient.


Tom Kyte

Followup  

November 24, 2004 - 6:55 am UTC

1) I see you are using parallel query -- so this insert is a parallel direct path insert by default and only one of those is going at a time.

So tell me -- do you run 8 of these inserts concurrently, each of them attempting to do concurrent parallel direct path loads?


2) the :sys_b_ stuff -- you have cursor_sharing=force

decode(book_support_net,:"SYS_B_55",NULL,decode(service_Type,:"SYS_B_56",:"SYS_B_
57",:"SYS_B_58",:"SYS_B_59",:"SYS_B_60",:"SYS_B_61",:"SYS_B_62",:"SYS_B_63",:"SYS
_B_64"))

all literals removed, whether they should be -- no not. indication programmers didn't do their job.


3) thats not good. guessing, not good. see #1.

Concurent Inserts...

November 24, 2004 - 11:13 am UTC

Reviewer: Prashant from San Jose, CA

Hi Tom,

Thanks for the observations.

1) We are doing concurrent inserts on the table...that's correct...and when you mention its parallel direct path load inserts....is it because the table is defined such i.e PARALLEL DEGREE 5 DEFAULT INSTANCES, so in this event how many parallel process will be spawned when 8 workers are inserting concurrently into this table, and is direct path insert the reason for the other workers to wait?

2) We are using CURSOR_SHARING=similar, and since ours is a DW you suggest we should move to CURSOR_SHARING=EXACT?

3) Ok...I understand there is no good guessing, but Tom I would like to know, would CACHE anytime be helpful?

Thanks
Prashant

Tom Kyte

Followup  

November 24, 2004 - 12:23 pm UTC

1) check out if you are doing a direct path insert -- run your query and try to select from the table without committing.

ops$tkyte@ORA9IR2> create table t ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2> insert into t values ( 1 );
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
 
         X
----------
         1
 
ops$tkyte@ORA9IR2> insert /*+ append */ into t select 1 from dual;
 
1 row created.
 
ops$tkyte@ORA9IR2> select * from t;
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


If you get the 12838, you cannot do that from "multiple sessions" concurrently.

Use the v$ views to monitor how many parallel processes you have going -- insufficient data to answer (depends on other parameters)

2) cursor_sharing=exact is right for data warehouses, you are running a few queries (relative to a OLTP system) and the query run times are measured in seconds -- not milli-seconds.  You want the best plan possible for each and every single query in your system.

You are running queries that take seconds.
You are not running many queries per second.

so the hard parsing for your DW queries isn't going to be the issue, the runtime performance of these ad-hoc, "big" queries is.

3) I've never found a use for it. 

Getting closer.....direct path inserts

November 24, 2004 - 3:16 pm UTC

Reviewer: Prashant from San Jose, CA

Hi Tom,

1) We aren't doing an insert /* append */ into this table, just a INSERT INTO ....SELECT FROM, however I believe you have answered my question. The table was created with a parallel clause default degree. And before we run this query we alter the session.

ALTER SESSION ENABLE PARALLEL DML;

So the combination of both these causes a direct path insert, is that correct?

2) I had one more question, how do we determine the number of parallel process spawned for a single insert into a table with parallel clause default degree defined. Which parameter is it?

NAME TYPE VALUE
------------------------------ ---- ------------------------------
log_parallelism 3 1
parallel_server 1 FALSE
parallel_server_instances 3 1
recovery_parallelism 3 0
fast_start_parallel_rollback 2 LOW
parallel_min_percent 3 0
parallel_min_servers 3 0
parallel_max_servers 3 30
parallel_instance_group 2
parallel_execution_message_siz 3 2152
e

NAME TYPE VALUE
------------------------------ ---- ------------------------------

parallel_adaptive_multi_user 1 FALSE
parallel_threads_per_cpu 3 2
parallel_automatic_tuning 1 FALSE

3) Also I would like to know for selecting parallely do we need to have alter enable parallel dml done?

Thanks
Prashant

Tom Kyte

Followup  

November 24, 2004 - 3:40 pm UTC

and in PDML -- a parallel direct path insert is the default.

yes, you are locking yourself. one at a time, one after the other...


2) check out the concepts and the data warehouse guide -- you want to get a holostic view of PQ going. depending on the number of sorts/steps you could have "lots"

Thanks a lot....Tom...n/t

November 24, 2004 - 4:39 pm UTC

Reviewer: Prashant from San Jose, CA


how to use the table default value specified by the DEGREE clause after dbms_stats completion ...

January 03, 2005 - 9:41 am UTC

Reviewer: A reader

Tom,

while running gather_table_stats,I set

..
..
granularity => 'ALL',-
degree => dbms_stats.default_degree -
);

after his i ran my load process and it waited on
..
..
PX Deq: Execute Reply 4916 2.01 7990.55
PX qref latch 1057 1.00 9.55
PX Deq: Table Q Normal 901508 1.01 1132.91

I do not want degree to be set now .. this has taken more time. i want to quickly reset it to NULL - use the table default value specified by the DEGREE clause . Can you please help me on this..

Should i have to redo the gather table stats? ..if so, what would be the quickest way .. because, it took 4hrs for me to gather stats with granularity => 'ALL',-

by any way can i improve that faster, to set the degree => NULL.

Thanks for your time and consideration



Tom Kyte

Followup  

January 03, 2005 - 9:54 am UTC

drop degree from the call?

January 03, 2005 - 10:00 am UTC

Reviewer: A reader

Sorry ..I didnt get you ..

I dont want the PQ process .. just use the default degree option set in the table. what i feel, since i said, degree => dbms_stats.default_degree - .. it has read from initialization parameter and started using it ..


Thanks ..

Tom Kyte

Followup  

January 03, 2005 - 10:25 am UTC

if you don't want PQ, degree=>1

How about increasing LOG_BUFFER for instance?

January 04, 2005 - 5:35 am UTC

Reviewer: Shimon Tourgeman

Did a lot for me in a system which faced a heavy wait events on redo activity.

The problem was that my system was at 100% CPU load for hours!

I used OEM Performance Manager chart called "Wait or CPU Bound?" to find that my server problem is a lot of waits for free log buffers. Additional drill down into redo log activity had proved that the problems was too much buffers scaned to find a free redo log buffer to use, on avg.

Increasing LOG_BUFFER from 512 KB (shame on me for that!) to 8MB solved the problem - my system CPUs show now very nice and balanced loads.


CPU used by this session

January 19, 2005 - 1:20 pm UTC

Reviewer: steve from Montreal

Just a little clarification from above. Suppose we have:

o a 4 CPU machine
o snapshot period is 1000 seconds
o CPU used by this session = 5000

Does that mean that 50 seconds per CPU were used?

Thanks

Steve



Tom Kyte

Followup  

January 20, 2005 - 9:51 am UTC

4 cpu's * 1000 seconds = 4,000 cpu seconds available

it would mean you used 80 cpu seconds per minute per cpu :)

if you used 5,000 cpu seconds in that period, it means you have long running processes that "dumped" cpu into your snapshot window and you cannot say ANYTHING about cpu use during the period of observation......


CPU is contributed to the v$ views after a call completes. A call is for example "a stored procedure invoked by a client".

So, suppose at NOON, I submit the following stored procedure for execution:


begin
loop
exit when( sysdate > trunc(sysdate)+14/24+5/24/60 ) ;
end loop;
end;
/


that burns CPU for 2 hours and 5 minutes. About 7,500 cpu seconds (assuming it got all of the cpu it wanted)....

Now, at 2pm, I take a snapshot. At 2:15 pm I take another snapshot....

When I generate the report, I will show at least 7,500 cpu seconds used - that stored procedure will have "dumped" it's cpu time into my window of observation.


the cpu statistics are only truly useful on transactional systems or systems where individual calls are measured in seconds - not minutes.

Thanks man!

January 20, 2005 - 10:31 am UTC

Reviewer: steve from Montreal

Thanks Tom.

By cpu used by this session = 5000, I meant
5000 centiseconds (i.e. 50 seconds). But I
loved your explanation of how we can get more
cpu used seconds than snapshot period!! That's cool.


Thanks man.

How to decide which is the problem

February 03, 2005 - 1:35 am UTC

Reviewer: Sanjaya Balasuriya from Colombo, Sri Lanka

Hi Tom,

Say an application is running slow time to time. When it's slow, CPU usage is over 95% (in top).

Now, how can we decide that the application is written badly or the server needs another CPU added ?

At which point we can say that, OK.. application has nothing to do now. You have to add another CPU to the server ?


Thanks in advance.

Tom Kyte

Followup  

February 03, 2005 - 1:53 am UTC

have you traced the application, tuned up the sql.

have you looked at its algorithms, redesigned the slow by slow stuff to be set oriented

have you looked at the way it does stuff and asking "does it really need to do that"

have you verified "yup, we cannot think of a single way to make this thing go faster"

and remember, throwing cpu at it might make it go slower :)

management statistics

February 23, 2005 - 7:34 pm UTC

Reviewer: daniel

Maybe this doesn't apply to this thread....I wanted to get an idea what are the usefull stats that could be provided to upper management to show them that the DBA's are kicking butt. They requested some stats from us, but they really don't represent anything as far as we're concerned. Maybe some other DBA's already provide numbers to their upper execs.


Tom Kyte

Followup  

February 24, 2005 - 5:34 am UTC

I think 42 is the best number of all.


In all seriousness -- report "hours of uninterrupted uptime". Period. If you suffer hardware failure, the time to repair (look we lost 50 disks in that water pipe bursting fiasco, you were down for 5 hours -- this is what we did and if we hadn't practiced it 500 times -- it would have taken days).

Upper management has no idea what a soft parse ratio is, a library cache hit, how optimal you've set memory.

They care about "was the stuff up and available"


The APPLICATION team should be presenting their metrics (average response times, number of transactions, etc). But they probably cannot because they probably didn't build in any sort of audit trails (sigh....).

Those are numbers you cannot report, for you have only the database -- not the end to end transaction information. It matters not that you processed 5 billion sql statements in a day, it only matters that the transaction response time (including network and all) was more than acceptable.

Statspack wait time for CPU

March 01, 2005 - 10:14 am UTC

Reviewer: A reader

Over a 10 minute period, my statspack report shows

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 936 32.66
db file scattered read 97,378 350 12.22
latch free 7,265 232 8.09
SQL*Net message from dblink 434 227 7.91
direct path read 23,416 178 6.21

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 99.97
Buffer Hit %: 89.05 In-memory Sort %: 100.00
Library Hit %: 97.06 Soft Parse %: 93.86
Execute to Parse %: 71.79 Latch Hit %: 99.84
Parse CPU to Parse Elapsd %: 27.98 % Non-Parse CPU: 57.82

"top" shows
CPU states: 4.3% idle, 79.2% user, 14.4% kernel, 2.1% iowait, 0.0% swap
Memory: 8.0G real, 3.3G free, 3.9G swap in use, 3.4G swap free

That "936" above cant possibly be in seconds. There are only 600 seconds in 10 minutes!

Is this good, bad or indifferent? Users are complaining about slow reponse time

Thanks

Tom Kyte

Followup  

March 01, 2005 - 11:17 am UTC

how many cpu's do you have?

do you run batch processes during this time?

and you have most definitely a bind variable problem -- almost 1/2 the CPU on your system is being used to parse, not actually do work for you!

the high cpu would come from massive hard parsing and the latch frees are probably library cache latches.


But -- how many CPU's do you have
Are you running "batch" or long running processes during this period.

March 01, 2005 - 11:25 am UTC

Reviewer: VA from New Jersey, USA

I have 4 CPUs on the box.

System Configuration: Sun Microsystems sun4u Sun Fire 880
System clock frequency: 150 MHz
Memory size: 8192 Megabytes

Batch processes are supposed to be run overnight, but today they "spilled over" into the daytime.

"and you have most definitely a bind variable problem -- almost 1/2 the CPU on your system is being used to parse, not actually do work for you"

How did you conclude this?

"the high cpu would come from massive hard parsing and the latch frees are probably library cache latches"

How can I identify the culprits and slap them on the wrist? We have a lot of DW-type queries using BI-tools that cannot use bind variables, so not much I can do about that.

Thanks

Tom Kyte

Followup  

March 01, 2005 - 11:33 am UTC

so, you have 10*60*4 cpu seconds or 2,400 cpu seconds available.


see the non-parse cpu time? time not spent parsing...


DB and BI -- ok, but are they really parsing 8/9 queries PER SECOND (for DB/BI queries take seconds to run. lets say they take 2 seconds on average to run. You would need 18 of the busiest data miners ever to sustain that rate (no pauses -- just query query query)


So, you need to look to the transactional apps, are they present on this system?

March 01, 2005 - 11:52 am UTC

Reviewer: A reader

Where did you get the "8/9 queries per second" from? Most of these BI things are long-running queries, not fast and rapid little queries.

Transactional apps -- just a handful.

Thanks

Tom Kyte

Followup  

March 01, 2005 - 12:26 pm UTC

opps -- i was looking at two statspack's more or less at the same time and they must have "crossed over" in my mind.

cpu time

March 09, 2005 - 1:22 pm UTC

Reviewer: APL

Why in some reports, cpu time is showing lager than elapsed time? Can you specify the reasons?

Tom Kyte

Followup  

March 09, 2005 - 1:57 pm UTC

<quote from effective Oracle by Design>

A question that frequently comes up with regards to TKPROF and the report is this – how could output such as the following be produced:

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ----- -------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 14755 12.77 12.60 4 29511 856828 14755
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ----- -------- ---------- ----------
total 14756 12.77 12.60 4 29511 856828 14755

How can CPU time be larger then elapsed time? This discrepancy is due to the way timings are collected – and attempts to time very fast operations or lots of operations. For example – suppose you were using a stopwatch that measured only down to the second. You timed 50 events. Each event seemed to have taken 2 seconds according to the stop watch. That means you had 100 seconds of time elapsed during these events right? Well, probably not. Suppose each event really took 2.99 seconds – you really had almost 150 seconds of time there.
Taking it a step further – suppose the stopwatch was continuously running. So, when the event started, you would look at the stopwatch and then when the event finished, you would look again and subtract the two numbers. This is closer to what happens with timing on a computer – you look at the system “watch”, do something and look again – the delta represents the timing of the event. Now, we’ll perform the same timing test as above. Again, each event appears to have taken 2 seconds – but they may have taken just 1.01! How so? Well, when the event started, the clock was really at time 2.99 – but you only saw “2” (the granularity of the stopwatch). When the event completed, the stopwatch reported 4 (and the real time was 4.00). The delta as you see it – 2, the “real” delta – 1.01.


Now, over time these discrepancies can accumulate in the aggregate total. The rule of averages would have them effectively cancel each other out more or less but over time, a small error can creep in. That is the cause of the discrepancy above – where the elapsed time is less then the CPU time. At the lowest level, Oracle is gathering statistics for timing at either the millisecond or microsecond level. And, further, it may time some events using one clock and other events using a different one – this is unavoidable as the timing information is gathered from the operating system, using its API’s. In this example, we executed a statement 14,755 times – meaning the average CPU time to execute that statement was 0.00865469 seconds. If we were to run this test over and over – we would find that the timings for CPU and elapsed are more or less “the same”. In general, this error is not so large as to send you looking down the wrong path, but it can be confusing the first time you see it.
</quote>

CPU utilization in real world

March 30, 2005 - 2:31 am UTC

Reviewer: shahid

Hi Tom,

I was reading this article and the discussion you had with some one about how much cpu one should have in spare.
You were having the concept that 99% is good and he was saying that although he agrees with you but he is better of having a few %cpu always free to fill up those unwanted loads.

I agree with both of you, however we have to look at the problem in a practical way and not in theoretical way.

You were right when yo talk about having capacity planning etc, but tell you the truth, in an anvironment where i have been working, its a banking on and the parameters are so dynamic here that if we tune to the idealistic situation and only acquire cpu of what we think we would need and be happy relax seeing oour data center unix machines running oracel upto 99%, thats is a bit wonderland feeling.
In reality we always have enough extra cpu available and then within we constantly updating our capacity doc to have more cpus if required, but the bottom line is we consider our systems ok when we have on the avg 60% of cpu usage.
More than that is warning threshold for us.

I am saying this becuase for those site that run complex env like they have sql servers, sybase, mainframe and then the architecture is open systems where you got middlewares like tuxedo, hrms, payroll, financials and then the company has actually done server consolidation using ibm regatta boxes.

With our experience what we have seen is that in oprn systems architecture we always have had unfortunate incidents.

For example, our internet application is using oracle 9ias servers and one day the cpu consumption went high, becuase there was a change in the java code and there was a bug in the oracle application thay we find out after so much of arguments between oracle and microsoft and no to mention the ibm aix, every one seems to point at each other until we havbe a consultant who found out and fix the problem.
Yes the problem was application code but the bottom line is we had enough cpu available to survive that day with less users connected.

I think the problem lies in the complexity we are witnessing in it today, lots of different parameters .

As systems manager I have done the best in acquiring a machine say with 8 cpus based on the understanding and figures we have from our development for the laod wof say 1 year. But suddenly the demand arises and the users increases and somehow the other party did not inform us until we find out from our esm tool like hp openview that we are going to hit 100% cpu soon. In this case if we had already started with our database in prod being 99% or 100% utilised, we would end up in a disaster.

So in reality IT systems should always have a value which it feels is secure and happy with , for us we find this value of cpu usage to be 60% on the avg.

Thanks




fast cpu slow er, slow cpu faster

April 12, 2005 - 8:44 am UTC

Reviewer: Hans-Peter Sloot from The Netherlands

I hope this is inline for this thread.

How can a 4*750Mhz cpu be slower (10 times) than a 4*150Mhz system.
This happens for me with updates and deletes when I compare a 8.1.7. instance on the 150Mhz with a 9.2.0 instance on 750Mhz.
Tables are the same, consistent gets, db_block_gets etc are comparable.

The striking thing is that the 9.2.0 instance needs 195,000 times the library cache latch whereas the 8.1.7 instance needs only 1,700 times the library cache.
More than 90% of the elapsed time is cpu.

This happens with simple deletes, simple updates or updates using joins. CTAS seem to be equally fast like normal selects on tables.



Tom Kyte

Followup  

April 12, 2005 - 8:58 pm UTC

can you give me a for example...

(also, what 9204 is this and does it happen during sql trace, but not during regular stuff - suggest you get runstats from </code> http://asktom.oracle.com/~tkyte <code>- a simple test harness and use that to measure instead of sql trace once.)

April 13, 2005 - 4:13 am UTC

Reviewer: Hans-Peter Sloot from The Netherlands

Example:
CREATE TABLE T3523_BASIS_NW_HP_ORG
(
CT_SLEUTEL NUMBER(10) NOT NULL,
CT_MARKT NUMBER(10)
)
Insert 100000 rows where ct_sleutel is unique and ct_market has about 15 distinct values (infact that does not matter)
Delete 1/15-th using ct_markt.

Infact I used runstats (you showed it at the seminar in Utrecht). That showed me te large number of library cache latches needed .

LATCH.library cache 194,132 1,773
STAT...CPU used by this sessio 23,149 2,720
STAT...CPU used when call star 23,149 2,720

Column 2 is the slow system and 3 is the fast system.

event total_waits time waited average wait
CPU used by this session -1 23202 -1
db file sequential read 487 7 0
log file sync 3 1 0
SQL*Net message from client 36 1 0
latch free 1 0 0
SQL*Net message to client 36 0 0


Tom Kyte

Followup  

April 13, 2005 - 9:11 am UTC

we need to backup a minute. Since you cannot runstat across versions, I don't know what I'm looking at here.

Can we see the entire test case (you can remove unneeded runstats lines) but let me see what you see -- your test.

also, explain the install, anything going on like streams, cdc, anything in the background?

cpu utilization

April 14, 2005 - 2:30 am UTC

Reviewer: Hans-Peter Sloot from The Netherlands

Infact what I do is
1.exec runStats_pkg.rs_start
2.delete or update statement
3.exec runStats_pkg.rs_middle
4.select * from dual
5.exec runStats_pkg.rs_stop


I only take the stats from run1 and repeat this on the other instance. Further I collected wait event from v$system_event and CPU.. from v$sysstat like statspack and displayed them.

This happens without any other background activities.
When I start this it consumes 1 entire cpu of 4.

The database has 32k block_size tablespaces. I discovered that in a test database the same update takes 28000hsecs in a 32k tablespace but only 6500hsecs in an 8k tablespace created in the same database. I did this a couple of times and this is repeatable.
I have ouputs available.

Tom Kyte

Followup  

April 14, 2005 - 7:43 am UTC

do i need a 32k blocksize database to reproduce? and are you comparing 32k in 8i to 32k in 9i?

everything else is "the same"?

(i don't have any testing systems with a 32k blocksize right now)

cpu usage

April 14, 2005 - 7:55 am UTC

Reviewer: Hans-Peter Sloot from The Netherlands

At first I compared a 8.1.7 instance with 9.2.0 instance.
The same job was 5 times faster at 4 times slower cpu.
I found out in a test database that if you run the same update in 9.2.0 and the table is in a 8k block size tablespace cpu consumption is 5 times lower compared with
the table in a 32k tablespace.

So yes you need a 32k tablespace.


Tom Kyte

Followup  

April 14, 2005 - 8:37 am UTC

sorry, for that I'll have to ask you to work with support, I don't have a 32k capable system right now.

(does it reproduce with 16k? )

cpu usage

April 14, 2005 - 9:51 am UTC

Reviewer: Hans-Peter from The Netherlands

It seems to reproduce on 16k.
The results are somewere between the 8k and the 32k tests.
One specific test showed
8k : 1 minute 28s
16k: 3 minutes 35s
32k: 7 minutes 20s

I have already contacted Oracle Support.

Tom Kyte

Followup  

April 14, 2005 - 10:00 am UTC

I'll see if I cannot get a test set up.

cpu usage

April 15, 2005 - 2:27 am UTC

Reviewer: Hans-Peter Sloot from The Netherlands

Hi Tom,

It seems not to be reproducible in 9.2.0.6.


cpu usage

April 15, 2005 - 5:32 am UTC

Reviewer: Hans-Peter from The Netherlands

Hello Tom,

I also asked Harald van Breederode to look at it.
He noticed that db_block_checking was TRUE.
I put it on false and the problem was gone.

Tom Kyte

Followup  

April 15, 2005 - 9:12 am UTC

That makes total sense, bigger block, infinitely more time consuming to have block checking done.

thanks so much for the followup

is there a way to slow down a database?

June 07, 2005 - 7:58 am UTC

Reviewer: Reader from Kuwait

Dear Tom,

Im not sure if im posting this in the right section, but I hope to see your responce.

I have never done this and tried to search asktom.oracle.com for something similar but nothing came up.

Is there a way to test the hardware? What is the method used by oracle to come up with number i.e. XXX transaction per second?

Or let me put it in the other way, can I run some sort of SQL query or procedure to test my hardware to get numbers (Transaction per second)? My manager wants to know on what level it will distress the hardware and wants to do this by dummy transactions. Till it degrade performance for our database, and I'm given the task to get the capability of my server running on Tru64 4 CPU X 500 Mhz with 4 GB ram. I really don’t have any idea on what to start and from where. I'm not able to find anything in this regards coz all of the posts says how to improve performance , but there should be a way to test Oracle database and Transaction per second till the database goes really slow in performance so that we can have the exact number on what is the capability of our servers.

I will really be thankful for your kind response and comments.


Tom Kyte

Followup  

June 07, 2005 - 8:28 am UTC

this is called "a benchmark"

In order to do this, you need to simulate your system (my transactions per second on my system using my workload would mean NOTHING to you for I doubt you do the same thing I do).

You need to set up a simulation. You need to set up a benchmark.

Regarding benchmark

June 08, 2005 - 12:24 am UTC

Reviewer: Reader from Kuwait

Can I use some kinda code to do this? could you provide me a link or something from which I could benifit? I was tryin to search for it but no luck.

Hope you could help me....

Tom Kyte

Followup  

June 08, 2005 - 8:29 am UTC

You sort of are the only person in the world that knows what transactions *you* need to execute to test *your* system.

There are many load generators out there, from "I'll buy one" from Mercury Interactive, to freebies on sourceforge.net and other places. Depends on whether you are 3 tier, what language, what protocols, etc etc etc etc.

Doing a benchmark is a "big deal", it takes planning and time to do right.. And if you've never done one, you might want to engage someone that has (so they can teach you)

A script to slow down the database...

June 08, 2005 - 8:17 am UTC

Reviewer: Reader from Kuwait

I really forgot to mention that I don't want to change my database parameters but do want to have a script which can really slow down the database and I could check the stats and compare. I know you are genius and could give me something like that and it will be a great favor for me.... Im really sorry I know im askin alot :(

Total System Global Area 2735708800 bytes
Fixed Size                   735872 bytes
Variable Size            1090519040 bytes
Database Buffers         1543503872 bytes
Redo Buffers              100950016 bytes
SQL> 

Also if you don't mind can you tell me what's behind the "@trace" the one you use? 

I really appreciate the time you take to answer the question over here. 

 

Tom Kyte

Followup  

June 08, 2005 - 9:11 am UTC

ops$tkyte@ORA9IR2> @trace
ops$tkyte@ORA9IR2> alter session set events '10046 trace name context forever, level 12';
 
Session altered.
 



If you just want to slow down your database, 


select count(*) from all_objects,all_objects,all_objects,all_objects,all_objects,all_objects,all_objects;

ought to do it.  

To benchmark your database, you'll need a more measured approach. 

wait event CPU time

January 23, 2006 - 5:04 am UTC

Reviewer: A reader

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 11 23-Jan-06 08:24:52 24 33,753.0
End Snap: 14 23-Jan-06 09:52:28 33 24,558.1
Elapsed: 87.60 (mins)


Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 6,986 98.33

...how the waited time can be bigger than total elapsed time of the snapshot?

cpu utilization was about 80% during that time.



Tom Kyte

Followup  

January 23, 2006 - 10:25 am UTC

it is not a wait event, it is a timed event.

Anyway, you had:

ops$tkyte@ORA9IR2> select 87.6 * 60 from dual;

   87.6*60
----------
      5256


5,256 seconds by the wall clock (elapsed time).  If you had a machine with N cpus:


ops$tkyte@ORA9IR2> select rownum cpu_count, rownum * 5256*rownum cpu_seconds_avail
  2  from all_users
  3  where rownum <= 6;

 CPU_COUNT CPU_SECONDS_AVAIL
---------- -----------------
         1              5256
         2             21024
         3             47304
         4             84096
         5            131400
         6            189216

6 rows selected.


You could have significantly more CPU seconds in 5,256 wall clock seconds.  So that is the first "most plausible" reason.


The second is the fact that CPU time is contributed to the V$ tables *at the end of a call.


Create a procedure like this for example:

ops$tkyte@ORA9IR2> create or replace procedure spin_lots
  2  as
  3          l_date  date := sysdate;
  4  begin
  5          loop
  6                  exit when sysdate-l_date >= 1/24/60 *25;
  7          end loop;
  8  end;
  9  /

Procedure created.


That runs in a loop for about 25 minutes, burning 100% cpu.  Now, do the following:


8:55am     start procedure
9:00am     statspack.snap  snap1
9:10am     statspack.snap  snap2
9:21am     statspack.snap  snap3 (right after procedure exits)....


do nothing else on this system during this time.  Assume you have a single cpu machine, the statspack for:

snap1 .. snap2   -> ZERO cpu seconds (call had not completed)
snap2 .. snap3   -> 1,500 cpu seconds (call had completed)


but snap2 .. snap3 only had 600 cpu seconds, we just experienced cpu dumping, that long running procedure dumped all of its cpu into that snapshot window, none into the first.


If you have long running calls into the database, you must be aware of that.


I rather suspect in your case, the first thing I showed, you have more than one cpu and have forgotten to multiply.



Do the same with wait events.  Say you take a 5 minute snapshot.  Would it be possible for you to experience 4,000 MINUTES of IO wait during that table?  

Sure, 1,000 people waiting 4 minutes each for IO will be 4,000 minutes of IO wait.

Don't forget to "multiply" 

CPU Seconds

January 23, 2006 - 2:11 pm UTC

Reviewer: A reader

Shouldn't your query be:

  1  select rownum cpu_count,  5256*rownum
  2  cpu_seconds_avail
  3      from all_users
  4*     where rownum <= 6
SQL> /

 CPU_COUNT CPU_SECONDS_AVAIL
---------- -----------------
         1              5256
         2             10512
         3             15768
         4             21024
         5             26280
         6             31536

6 rows selected.

 

Tom Kyte

Followup  

January 23, 2006 - 11:03 pm UTC

darn, how did that extra *rownum slip in there (cut and pasting error I'd guess)

absolutely, I was just trying to multiply the number of seconds by the number of cpus

tracking CPU usage

April 03, 2006 - 8:46 pm UTC

Reviewer: J from Ca

I have different view on tracking CPU usage. We bill customer based on CPU seconds by looking at the oracle instance, how many CPU consumed during the month. User is eager to know how those CPU time spent on what/who. I took data from v$sesstat when user disconnect from database. From there, I am looking into statistic# 12 for CPU values during the session. However, adding all those number up is significantly less the billed minutes, for sure. How can I interprete the CPU time by user and system? Is there any guideline, say, the fixed % CPU is used by keeping system running? Is there any other way to track the user's CPU time? Thanks!

Tom Kyte

Followup  

April 04, 2006 - 9:52 am UTC

you can set up a resource manager plan and it'll record this information for you.


(don't look at statistic #12, look at the statistic NAME - the numbers change from release to release and so on).


the cpu used by individual sessions will be much less than the cpu used on the machine itself. there is a lot that goes on outside of user sessions.

BAARF !!!

April 06, 2006 - 9:48 am UTC

Reviewer: Jens from Norway

<Snip>
May 29, 2003
Reviewer: Sean from Atlanta

"
.....
don't see where raid 10 comes into play necessarily.
.....

they are current on raid 5, and i want raid 1+0 for better writing throughput
for 100X current active sessions later.


Followup:


and N times the disk storage needs. raid 5 is typically bought on the cheap.
raid 10 is at the high end of the cost scale.
</snip>

My 2cents:
</code> http://www.baarf.com

We simply do not! discuss this anymore!
We do not use RAID-F for Oracle. (Check the link and you will understand...)

Though throwing out the RAID5 would probably not be the first thing to do, it should definetely be on the short-to-mid-term list.
If not for anything else, then because we want to sleep well at night.
I really urge anyone to read that link, maybe particularly " Is RAID 5 Really a Bargain? Cary Millsap Hotsos" 
http://www.miracleas.com/BAARF/1.Millsap2000.01.03-RAID5.pdf <code>

Best regards,
Jens

BTW:
I liked the analogy using the shop with 10 payment points.


Tom Kyte

Followup  

April 07, 2006 - 3:33 pm UTC

raid 5 can be used very well for many types of systems.

not sure I'd want my redo logs or temp on it, but datafiles in a transactional system can live there quite well under moderate to light use. DBWR writes to them in the background, end users don't have to wait for the writes. In a data warehouse - it can be painful if you do lots of direct path operations.

I am not proposing the use of raid-5, just saying that it is not entirely useless.

October 30, 2006 - 11:02 am UTC

Reviewer: A reader

Hi Tom

We have a performance problem. Our database server has 4 CPUs. But oracle use only 1 CPU . Every time we see in peak times CPU % = 25%.

Our oracle version is :
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for 32-bit Windows: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

Why and how do we find why Oracle can't use all the available CPU on the system? Is it some thing with misconfiguration on server side?

Please help

Tom Kyte

Followup  

October 30, 2006 - 12:47 pm UTC

Oracle uses all of the cpus - smon, pmon, lgwr, dbwr, etc - they'll all use the cpus the OPERATING SYSTEM SCHEDULES THEM TO RUN ON.


Now, your single threaded session, YOUR dedicated (or shared server) process. That'll use a single cpu, so if you run a stored procedure - it'll run on A CPU. If you run 4 copies of your stored procedure - each one will potentially run on a separate cpu.

There is nothing wrong here, everything is working as expected. YOU are not doing parallel operations (like a parallel query), so YOU will use a single cpu at a time, but if YOU start 4 things at once, YOU will use all four cpus

My 2 cents....

October 31, 2006 - 6:40 am UTC

Reviewer: Paul from Wollongong, Australia

Some people have sweared that only <60% CPU utilisation is OK. However it must be remembered that the CPU always services the higher priority processes before the lower priority processes.

This means that 100% CPU utilisation can be completely acceptable when your friendly system administrator has assigned time critical processes the highest execution priority, and the non-time critical processes a lower execution priority.

Response time degradation NEVER occurs for the processes with the highest priority - doesn't matter if the CPU is at 10% or 100%.

December 13, 2006 - 4:41 pm UTC

Reviewer: A reader from Ca, USA

I'm trying troubleshoot database performance issue. Here's a portion of the statspack report taken at a 15 minute interval. We generated statspack at a 15 minute interval for a few days and every report looks more or less the same in terms of the numbers.

I couldn't really find anything glaring here. It looks like the memory usage, soft parse counts, and other vital stats are ok.


STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
XXXXXX YYYYYYYYYY XXXXXX 1 9.2.0.7.0 NO driver

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 69 11-Dec-06 16:45:03 30 6.0

End Snap: 70 11-Dec-06 17:00:04 30 6.2

Elapsed: 15.02 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 112M Std Block Size:
8K
Shared Pool Size: 112M Log Buffer:
2,048K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 168,361.53 79,922.94
Logical reads: 7,689.93 3,650.49
Block changes: 235.87 111.97
Physical reads: 608.98 289.09
Physical writes: 22.82 10.83
User calls: 146.38 69.49
Parses: 50.93 24.18
Hard parses: 0.00 0.00
Sorts: 7.04 3.34
Logons: 6.71 3.18
Executes: 83.31 39.55
Transactions: 2.11

% Blocks changed per Read: 3.07 Recursive Call %: 49.20
Rollback per transaction %: 6.32 Rows per Sort: 3.21

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.89 Redo NoWait %: 100.00
Buffer Hit %: 92.44 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 99.99
Execute to Parse %: 38.87 Latch Hit %: 99.97
Parse CPU to Parse Elapsd %: 86.97 % Non-Parse CPU: 95.87

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 47.11 47.15
% SQL with executions>1: 70.83 70.63
% Memory for SQL w/exec>1: 68.40 68.25

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 115 54.21
db file sequential read 9,503 31 14.68
SQL*Net more data to client 102,069 21 9.81
log file parallel write 2,371 18 8.32
log file sync 2,109 10 4.70
-------------------------------------------------------------
Wait Events for DB: PRTPRD Instance: PRTPRD Snaps: begin_snap -end_snap
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)




Instance Activity Stats for DB: XXXXXX Instance: XXXXXX Snaps: begin_snap -end

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 11,484 12.8 6.1
CPU used when call started 11,482 12.7 6.1
CR blocks created 7,754 8.6 4.1
Cached Commit SCN referenced 0 0.0 0.0
Commit SCN cached 0 0.0 0.0
DBWR buffers scanned 185,500 205.9 97.7
DBWR checkpoint buffers written 713 0.8 0.4
DBWR checkpoints 1 0.0 0.0
DBWR free buffers found 182,923 203.0 96.4
DBWR lru scans 54 0.1 0.0
DBWR make free requests 54 0.1 0.0
DBWR revisited being-written buff 0 0.0 0.0
DBWR summed scan depth 185,500 205.9 97.7
DBWR transaction table writes 12 0.0 0.0
DBWR undo block writes 754 0.8 0.4
DFO trees parallelized 0 0.0 0.0
PX local messages recv'd 0 0.0 0.0
PX local messages sent 0 0.0 0.0
Parallel operations downgraded 75 0 0.0 0.0
SQL*Net roundtrips to/from client 105,520 117.1 55.6
active txn count during cleanout 7,621 8.5 4.0
background checkpoints completed 1 0.0 0.0
background checkpoints started 1 0.0 0.0
background timeouts 1,079 1.2 0.6
branch node splits 0 0.0 0.0
buffer is not pinned count 2,084,161 2,313.2 1,098.1
buffer is pinned count 33,839 37.6 17.8
bytes received via SQL*Net from c 148,070,758 164,340.5 78,014.1
bytes sent via SQL*Net to client 223,399,326 247,946.0 117,702.5
calls to get snapshot scn: kcmgss 134,096 148.8 70.7
calls to kcmgas 18,238 20.2 9.6
calls to kcmgcs 182 0.2 0.1
change write time 422 0.5 0.2
cleanout - number of ktugct calls 6,755 7.5 3.6
cleanouts and rollbacks - consist 6,607 7.3 3.5
cleanouts only - consistent read 40 0.0 0.0
cluster key scan block gets 22 0.0 0.0
cluster key scans 4 0.0 0.0
commit cleanout failures: block l 0 0.0 0.0
commit cleanout failures: buffer 0 0.0 0.0
commit cleanout failures: callbac 0 0.0 0.0
commit cleanout failures: cannot 0 0.0 0.0
commit cleanouts 12,365 13.7 6.5
commit cleanouts successfully com 12,365 13.7 6.5
commit txn count during cleanout 317 0.4 0.2
consistent changes 411,537 456.8 216.8
consistent gets 6,519,445 7,235.8 3,434.9
consistent gets - examination 482,520 535.5 254.2
current blocks converted for CR 0 0.0 0.0
cursor authentications 0 0.0 0.0
data blocks consistent reads - un 328,065 364.1 172.9
db block changes 212,523 235.9 112.0
db block gets 434,247 482.0 228.8
deferred (CURRENT) block cleanout 1,622 1.8 0.9
dirty buffers inspected 590 0.7 0.3
enqueue conversions 166 0.2 0.1
Instance Activity Stats for DB: XXXXXX Instance: XXXXXX Snaps: begin_snap -end

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
enqueue releases 21,099 23.4 11.1
enqueue requests 21,116 23.4 11.1
enqueue timeouts 17 0.0 0.0
enqueue waits 10 0.0 0.0
execute count 75,063 83.3 39.6
free buffer inspected 922 1.0 0.5
free buffer requested 548,948 609.3 289.2
hot buffers moved to head of LRU 6,458 7.2 3.4
immediate (CR) block cleanout app 6,647 7.4 3.5
immediate (CURRENT) block cleanou 4,333 4.8 2.3
index fast full scans (full) 0 0.0 0.0
index fetch by key 50,218 55.7 26.5
index scans kdiixs1 43,494 48.3 22.9
leaf node 90-10 splits 6 0.0 0.0
leaf node splits 48 0.1 0.0
logons cumulative 6,042 6.7 3.2
messages received 3,406 3.8 1.8
messages sent 3,406 3.8 1.8
no buffer to keep pinned count 0 0.0 0.0
no work - consistent read gets 5,921,588 6,572.2 3,119.9
opened cursors cumulative 45,887 50.9 24.2
parse count (failures) 0 0.0 0.0
parse count (hard) 3 0.0 0.0
parse count (total) 45,887 50.9 24.2
parse time cpu 474 0.5 0.3
parse time elapsed 545 0.6 0.3
physical reads 548,687 609.0 289.1
physical reads direct 42 0.1 0.0
physical reads direct (lob) 25,067 27.8 13.2
physical writes 20,559 22.8 10.8
physical writes direct 42 0.1 0.0
physical writes direct (lob) 16,639 18.5 8.8
physical writes non checkpoint 20,237 22.5 10.7
pinned buffers inspected 4 0.0 0.0
prefetched blocks 448,982 498.3 236.6
prefetched blocks aged out before 0 0.0 0.0
process last non-idle time 901 1.0 0.5
queries parallelized 0 0.0 0.0
recursive calls 127,739 141.8 67.3
recursive cpu usage 1,384 1.5 0.7
redo blocks written 306,922 340.7 161.7
redo buffer allocation retries 44 0.1 0.0
redo entries 82,260 91.3 43.3
redo log space requests 2 0.0 0.0
redo log space wait time 2 0.0 0.0
redo size 151,693,740 168,361.5 79,922.9
redo synch time 1,036 1.2 0.6
redo synch writes 2,104 2.3 1.1
redo wastage 739,248 820.5 389.5
redo write time 1,803 2.0 1.0
redo writer latching time 0 0.0 0.0
redo writes 2,371 2.6 1.3
rollback changes - undo records a 4 0.0 0.0
rollbacks only - consistent read 1,205 1.3 0.6
rows fetched via callback 26,902 29.9 14.2
session connect time 0 0.0 0.0
Instance Activity Stats for DB: XXXXXX Instance: XXXXXX Snaps: begin_snap -end

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session logical reads 6,928,625 7,689.9 3,650.5
session pga memory 17,024 18.9 9.0
session pga memory max 17,024 18.9 9.0
session uga memory 75,549,168 83,850.4 39,804.6
session uga memory max 471,444,232 523,245.5 248,390.0
shared hash latch upgrades - no w 37,948 42.1 20.0
shared hash latch upgrades - wait 0 0.0 0.0
sorts (memory) 6,341 7.0 3.3
sorts (rows) 20,328 22.6 10.7
summed dirty queue length 3,559 4.0 1.9
switch current to new buffer 116 0.1 0.1
table fetch by rowid 49,146 54.6 25.9
table fetch continued row 0 0.0 0.0
table scan blocks gotten 2,010,754 2,231.7 1,059.4
table scan rows gotten 290,496,974 322,416.2 153,054.3
table scans (cache partitions) 3,899 4.3 2.1
table scans (long tables) 0 0.0 0.0
table scans (rowid ranges) 0 0.0 0.0
table scans (short tables) 4,811 5.3 2.5
transaction rollbacks 1 0.0 0.0
transaction tables consistent rea 0 0.0 0.0
transaction tables consistent rea 0 0.0 0.0
user calls 131,884 146.4 69.5
user commits 1,778 2.0 0.9
user rollbacks 120 0.1 0.1
workarea executions - optimal 155 0.2 0.1
write clones created in backgroun 0 0.0 0.0
write clones created in foregroun 13 0.0 0.0
-------------------------------------------------------------



Tom Kyte

Followup  

December 15, 2006 - 7:56 am UTC

this is a statspack report for a system that is basically "not doing anything"

if it is "slow", you should look to your SQL and your ALGORITHMS to tune them. Look at your "table scan rows gotten".... that looks large doesn't it.

are you running processes that take longer than 15 minutes to complete by the way?

December 13, 2006 - 4:57 pm UTC

Reviewer: A reader

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 115 54.21
db file sequential read 9,503 31 14.68
SQL*Net more data to client 102,069 21 9.81
log file parallel write 2,371 18 8.32
log file sync 2,109 10 4.70


In the above statspack report, CPU time is the top event. In a 15.02 minute interval (902 seconds), the system uses 115 seconds of CPU. What does "% Total Ela Time" represent? Does that mean that, out of 115 seconds, the system spends 54.21% of the time waiting for something. Since 54% seems to be a pretty high number, I have to look into what the system is waiting for. The second top event shows that the system spends 14% of its time waiting for "db file sequential read". Is this the correct way of interpreting Top 5 Timed Events?

Tom Kyte

Followup  

December 15, 2006 - 7:59 am UTC

in 15 minutes, you have 15*60 = 900 cpu seconds PER CPU (so multiply 900 by cpu_count)

Now, you have used by 115 seconds of that 900. eg: you are not really pushing this machine

OR

you have processes that take longer than 15 minutes - and their cpu time is not visible in this report.

eg:

if you start a stored procedure at 8:55am
and you take a snapshot at 9:00am
and you take a snapshot at 9:15am
and your procedure finishes at 9:25am
and you take a snapshot at 9:30am


then, the report from 9:00-9:15 will show ZERO cpu time
and the report from 9:15-9:30 will show 30 MINUTES of cpu time



CPU intensive query

December 17, 2006 - 5:57 am UTC

Reviewer: kuldeep from India

prstat -a

NPROC USERNAME SIZE RSS MEMORY TIME CPU
184 oracle 342G 337G 99% 372:58.43 90%
114 root 671M 283M 0.1% 278:07.34 5.6%
45 webman 850M 299M 0.1% 0:26.03 0.1%
2 irams 3136K 2696K 0.0% 0:00.00 0.1%
7 cdadmin 26M 12M 0.0% 1:39.59 0.0%
2 cdtirms 1985M 1955M 0.6% 0:01.02 0.0%
6 spmadmin 115M 43M 0.0% 0:00.12 0.0%
2 ucps 2992K 1880K 0.0% 0:00.21 0.0%
4 nobody 6736K 3904K 0.0% 0:00.00 0.0%
1 daemon 2608K 960K 0.0% 0:00.00 0.0%

Almost this was the state of our machine where oracle was consuming 90% of cpu and 99%memory.

Our DBAs pointed out a query and said that this query is most CPU intensive and when I checked there were 24 sessions executing the same query. I waited for an hour and still all those session were running the same query. When I killed those sessions, cpu usage by oracle had gone down.

My question is how DBA located this query which was eating up most CPU time?

Thanks and regards,

Tom Kyte

Followup  

December 17, 2006 - 10:05 am UTC

the rss includes the SGA


DBA took process id and asked Oracle via the v$ tables what that process was doing perhaps.

or they looked at v$session for long running active sessions and asked "what are those sessions doing"

but - since you work with the DBA's, why not ask them?

Gathering load stats

February 21, 2007 - 2:27 pm UTC

Reviewer: harri from Finland

Hi!

Any ideas how i can get some database load statistics(just like in AWM/statspack report) from 10g databases using sql?

The goal is to gather some key statistics constantly to the report database using db-links. Any hints how to get for example "transactions per second" calculated?

Tom Kyte

Followup  

February 21, 2007 - 3:32 pm UTC

the v$ tables are full of stuff.

but - one wonders why one would reinvent the proverbial wheel?

I found the wheel

February 21, 2007 - 6:20 pm UTC

Reviewer: harri from Finland

dba_hist_% are the views i looked for...

select avg(average) from dba_hist_sysmetric_summary where metric_id = 2075;

Tom Kyte

Followup  

February 22, 2007 - 8:31 am UTC

then, one asks "why not use the tools since - you must have them licensed to use that stuff"

eg: the wheel is the tool you say you didn't want to use.

...

February 22, 2007 - 2:57 pm UTC

Reviewer: hk from Finland

I didn't say that I don't want to use any tools. I only asked for ideas to get load statistics out from db using sql-query over dblink. I only pointed out that AWR/statspack reports(files) contains that kind of data I am looking for.

I didn't know how to get statistics like "transactions per second" or "CPU Usage Per Sec" without taking out AWR or statspack reports or using EM. I know now - after some googling. Anyhow my solution still uses AWR-views so of course I am still looking for if there is relatively easy way to do this by some other way.(without additional costs)

In fact I didn't remember using AWR is option that costs separately. This sucks especially because AWR is there wanted or not - you don't install it separately or make any selection to install it. I bet there is lot of dba's who has started to use AWR without knowing it costs separately. Of course this is just what Oracle is looking for here.

Anyways, our mission is to add these load/performance stats to our collection procedure: "transactions per second", "CPU Usage Per Sec" and memory usage of user sessions. That's it. Everything else is already collected using PL/SQL - hope it's still included in Oracle Server license :)


Server Hardware Planning

July 03, 2007 - 9:33 am UTC

Reviewer: Deepak from India

Hi Tom,

Want to have expert comments on Server Hardware and OS Planning.

If you are asked to recommend a server (Hardware/OS) configuration for a database server. Then what factors you will examine and how will you zero in on the desired server.

Please explain the steps with example.. This will really be very helpful.

It will equally be good if you can point me to any document.

CPU utilization By each session

July 19, 2007 - 5:19 pm UTC

Reviewer: A reader

Hi Tom,

It was really a good thread and i learned a lot from this one.
I am a novice in this field. I want to monitor how much CPU resources are being used by each session. If i dont want to use Statspack to see cpu utilization for expensive sql.What is the query to find out cpu resources used by each session.

Thanks
Tom Kyte

Followup  

July 19, 2007 - 7:32 pm UTC

depends on release - but there is v$sesstat and in 10g, if you have access to it, there are the ASH (active session history) views.

If you want to find expensive sql however, you would not look at a session, you would look at - well - SQL

that would be v$sql

CPU utilization for session

July 20, 2007 - 12:43 pm UTC

Reviewer: A reader

Hi Tom,

Thanks a lot for your response.

Database version is oracle8i.

I am using this particular query to find what the session is doing and how much CPU they have consumed. Can you please confirm whether my approach is right or not?

select ss.sid,se.command,ss.value CPU ,se.username,se.program
from v$sesstat ss, v$session se
where ss.statistic# in
(select statistic#
from v$statname
where name = 'CPU used by this session')
and se.sid=ss.sid
and ss.sid>6
order by ss.sid




Tom Kyte

Followup  

July 20, 2007 - 6:44 pm UTC

quick glance looks OK.

Just remember that if someone issues:

begin loop null; end loop; end;


that'll consume tons of CPU - but won't show up in the v$ tables until that statement ends - since CPU is not added to v$ tables until after the call completes.

Same as above

July 23, 2007 - 9:59 am UTC

Reviewer: A reader

Hey Tom,

Thank you for your response.

As you said that using v$tables we can not see CPU consumes by a statement like "begin loop null; end loop; end;" until call completes.

Is there any other way to find CPU consumption for this call while its in progress?

Thanks


Tom Kyte

Followup  

July 24, 2007 - 8:49 am UTC

the cpu time is not reported until the call completes, it is not computed until the call completes. short of monitoring the OS process, no - not really.

thanks for your help

July 24, 2007 - 9:54 am UTC

Reviewer: A reader


Question about enqueue & CPU utilization

July 31, 2007 - 2:38 am UTC

Reviewer: Suraj Sharma from Bangalore, India

Tom,

It is really a nice information about CPU utilization. I have a query:

We are facing CPU utilization on one of our database server. We are receiving CPU utilization alert almost everyday for this server. I tried to find out the statspack report for a month when there was CPU spike and I got something like this:


STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
XXXXX 353663326 XXXXX 1 9.2.0.7.0 NO xxxxxxxx07

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 16057 25-Jun-07 00:00:04 119 56.8
End Snap: 16763 23-Jul-07 23:00:04 152 310.6
Elapsed: 41,700.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 608M Std Block Size: 8K
Shared Pool Size: 912M Log Buffer: 10,240K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 30,314.07 12,184.00
Logical reads: 42,967.09 17,269.57
Block changes: 244.60 98.31
Physical reads: 985.73 396.19
Physical writes: 27.46 11.04
User calls: 392.00 157.55
Parses: 49.92 20.06
Hard parses: 0.33 0.13
Sorts: 57.97 23.30
Logons: 0.16 0.06
Executes: 226.85 91.18
Transactions: 2.49

% Blocks changed per Read: 0.57 Recursive Call %: 56.41
Rollback per transaction %: 2.54 Rows per Sort: 79.14

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 97.78 In-memory Sort %: 100.00
Library Hit %: 99.93 Soft Parse %: 99.33
Execute to Parse %: 78.00 Latch Hit %: 96.91
Parse CPU to Parse Elapsd %: 50.23 % Non-Parse CPU: 96.31

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 93.40 93.28
% SQL with executions>1: 84.44 82.81
% Memory for SQL w/exec>1: 80.44 80.13

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
enqueue 4,419,529 12,759,003 50.83
CPU time 6,271,012 24.98
latch free ############ 5,766,902 22.97

As you can see the major contributor to the wait event is "enqueue" by taking this into consideration I found further information about enqueue wait event in the statspack report, which is as under:

Enqueue activity for DB: XXXXX Instance: XXXXX Snaps: 16057 -16763
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by Wait Time desc, Waits desc

Avg Wt Wait
Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)
-- ------------ ------------ ----------- ----------- ------------- ------------
TX 22,136,214 20,622,865 1,513,339 22,172 589,418.42 13,068,585
UL 2,265,670 1,813,868 451,802 10,562 44.28 468

While looking at some SQls (ordered by execution) with insert, update and delete I found few


Executions Rows Processed Rows per Exec Exec (s) Exec (s) Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
26,293,596 26,293,601 1.0 0.00 0.00 1528430012
INSERT INTO CUIT_TMP ( CUIT , CUIT_ID , CUIT_NAME , CUIT_T
YPE , CUIT_TYPE_DSP , SITE_ID , SITE , PAYMENT_SCHEDULE_ID , AMO
UNT , ACCTD_AMOUNT , AMOUNT_DUE_REMAINING , ACCTD_AMOUNT_DUE_REM
AINING , TRX_DATE , TRX_TYPE_NAME , TRX_NUMBER , AGREEMENT , DUE
_DATE , TRX_ID , TRX_TYPE , SOURCE , CURRENCY_CODE , CURRENCY_CO

6,166,323 6,166,323 1.0 0.00 0.00 3371479671
select t.name, (select owner_instance from sys.aq$_queue_table_
affinities where table_objno = t.objno) from system.aq$_queue
_tables t where t.name = :1 and t.schema = :2 for update skip lo
cked

860,336 860,336 1.0 0.01 0.01 3716207873
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,
order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:
1

As you can see out of these three two of them is related to system tables. Not sure how to approach into this further to let this not happen again with the database or application. Please let me know the best approach in this kind of situation.
Tom Kyte

Followup  

August 02, 2007 - 9:15 am UTC

umm, that is 29 DAYS LONG???? that is useless.

you cannot really do anything with this statspack. You have lots of enqueue contention (locks, blocking locks) but other than that, you cannot really say anything conclusive.

Further information about my question

July 31, 2007 - 2:39 am UTC

Reviewer: A reader

For further information about my question. I am using Oracle 9i (9.2.0.7.0)

Further to: Question about enqueue & CPU utilization

August 03, 2007 - 1:05 am UTC

Reviewer: Suraj Sharma from Bangalore, India

Tom,

I tried with statspack for the period where the CPU spike was in the month, but again I found different reasons at different time. I had CPU spike (which was for long) for about 6 times. Out of these three times the statspack showed in top 5 wait event as Latch Free and other three times it showed enqueue. I was not sure what to do in this case so I took the statspack report for a month and tried to figure out the issue and for the month's statspack I got enqueue as major contributor for wait event.

If the wait event was CPU time or db file sequential read/
db file scattered read my approach would have been to look into the SQL statements or the cache size, but for enqueue I am not sure the best approach. Please let me know about the same.


Tom Kyte

Followup  

August 05, 2007 - 1:12 pm UTC

... but
again I found different reasons at different time.....


precisely why a monthly report is - useless.


think about it - you have different problems at different times, they require different solutions.

enqueue = locks = application issue almost certainly.

Further to: Question about enqueue & CPU utilization

August 03, 2007 - 1:05 am UTC

Reviewer: Suraj Sharma from Bangalore, India

Tom,

I tried with statspack for the period where the CPU spike was in the month, but again I found different reasons at different time. I had CPU spike (which was for long) for about 6 times. Out of these three times the statspack showed in top 5 wait event as Latch Free and other three times it showed enqueue. I was not sure what to do in this case so I took the statspack report for a month and tried to figure out the issue and for the month's statspack I got enqueue as major contributor for wait event.

If the wait event was CPU time or db file sequential read/
db file scattered read my approach would have been to look into the SQL statements or the cache size, but for enqueue I am not sure the best approach. Please let me know about the same.


Further to: Question about enqueue & CPU utilization

August 08, 2007 - 6:03 am UTC

Reviewer: Suraj Sharma from Bangalore, India

Tom,

My apology for not being so specific. Actually, we take data of a month say like CPU spikes during the month, physical I/O during the month etc. and do our analysis and for this database we had issue with I/O and CPU spike both at different times during the whole month so that was the approach I took of taking statspack for a whole month and analyze it like which are SQLs contributed during the month in terms of physical and logical I/O, what wait event contributed the most etc.
I hope you understand why I did analysis for a month long statspack.

Thanks,

Tom Kyte

Followup  

August 14, 2007 - 10:02 am UTC

there is nothing you can get from a month long statpack - I'm hoping to make you understand that.

You have no ideas what sql's are contributing to your spikes, none at all.

Unless and until you zero in on those times - take 15 minute observations, then study one from a "spike" time.

September 13, 2007 - 2:12 pm UTC

Reviewer: kulbhushan from India

Tom,
1. Can you list some database activities that causes cpu utiliization & which are most expensive ones?
2. Also how cpu time for any query or other database operation is measured or calculated?
Tom Kyte

Followup  

September 15, 2007 - 7:43 pm UTC

1) anything you do, period. there are none that cause "the most" or are "more expensive".

2) it is not calculated, it is reported. Queries use "so much cpu", the OS tells us this.

September 16, 2007 - 5:26 pm UTC

Reviewer: kulbhushan

1.Wanted to know like for eg sorting,database writing,database reading or latching(which involves cpu pooling) etc will cause more cpu utilization (%cpu).
for eg you say if parsing of sql is high it causes additional cpu utilization and latching .

2. Ok OS reports but how it decides the percentage of cpu for any operation.
for eg at os level some for oracle server process it shows 40% cpu consumption how it reports that.

3. Additional question here like oracle also reports in v$*stat views about cpu usage how its measured /reported/calculated ? (i guess they are some units


Thanks for above answers

Tom Kyte

Followup  

September 18, 2007 - 1:32 pm UTC

1) well, there is a big difference between me saying "stop parsing so much, it consumes a lot of resources" (something you DO NOT NEED TO DO AS MUCH AS YOU TYPICALLY DO IT) and trying to stack rank those things that are generally things you must do (and hence cannot really avoid).

sorts take more than writes, no wait - writes take more than sorts - sometimes - it depends.

I say "stop parsing so much, it takes latches, it takes time, it takes CPU - and you do not need to do it". I did not ever put it in a rank of any sort....

2) it looks at the resource (cpu) and just reports "the facts" - the utilization. not sure what words you are looking for here.

3) we get it from the OS.

September 20, 2007 - 12:37 pm UTC

Reviewer: A reader

Tom,
One final question
After reading above answers i get a impression that whenever there is high cpu utilization on a machine ,i should be looking at database activities which are consuming most cpu and depending upon that i take action.

kulbhushan

CPU question

January 08, 2008 - 8:49 pm UTC

Reviewer: A reader

Hi Tom, how are you? I have the following question regarding with CPU utilization. I have a 10gR2 database with 10 CPU's with 70 GB of memory. We are not sure whether the server is configured optimally to use all the CPU's. Is there any tool or method to identify it? Could you help me on this?

Thanks a lot!
Tom Kyte

Followup  

January 09, 2008 - 7:11 am UTC

the server is absolutely configured to optimally used all of the cpu's

oracle is SMP aware
your OS is SMP aware

Now, the question is whether you are taking advantage of it correctly....

are you characterized by lots of concurrent users - then serial query execution is correct. Probably this is you - you are done.

are you characterized by 1 or 2 concurrent users doing REALLY BIG queries? Then maybe - just maybe - parallel query is something to consider. The data warehousing guide has details for using that particular feature:
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14223/toc.htm

but short of that - you are likely "done" (well, assuming you've designed your application to be efficient and use only the cpu it really needs...)

Increasing CPU Utilisation

January 21, 2008 - 5:26 am UTC

Reviewer: fuz from United Kingdom

Hello Tom,

You mention that Oracle and the o/s are SMP aware. Does this mean that a large single query running on Oracle will automatically load balance across all available cpu ?.

The reason I ask is that I have noticed some "unexpected" behaviour between Oracle running on Windows and Linux (Xeon dual cpu, dual core cpu's) and Sun SPARC Solaris 10 (T2000 Single core with 32 threads). When testing a large and long running query with a single session on 10gR2 (consisting of several million inserts followed by selects) I notice that on the Xeon cpu's that the query workload appears to be balanced across all the cores of all the cpu's on the box at approx 25%+ activity per cpu. Whereas on the Solaris box (where threads are represented as cpu's on a local zone) I notice a single cpu appears to become saturated by the workload but the others cpu's are almost idle. The only way I am able to force the other cpu's into action is to manually break the query into seperate concurrently running scripts or by using Oracle's parallelism.

I would have expected that as Oracle and the o/s are SMP aware that the workload on SPARC would be automatically load balanced across available CPU's. Would I be correct in this assumption ?.

Regards,
fuz
Tom Kyte

Followup  

January 21, 2008 - 8:18 am UTC

... Does this mean that a large single query running on Oracle will automatically load balance across all available cpu ?. ...

If you have enabled and desire parallel query - for you see, back in the day (7.1.6 of Oracle when parallel query was introduced), it was not uncommon for a data 'warehouse' to have a single user or two users - and breaking a query into 16 bits for a single user to completely use all 8 cpus was desirable.

Today, with the exception of administration, parallel query is not as desirable because today instead of one or two users, we have one hundred or two hundred users all trying to use those 32 cpus and if they use lots of them at once, it'll never work.

Sounds like you ran a serial process - and on one OS it unfortunately context switched it over many CPU's and on the other more efficient one, it kept it on the same CPU (meaning, what you think you wanted was not what you wanted). If you have a serial process - a bunch of inserts followed by some NON-PARALLEL executed selects - I'd want one CPU to remain pegged - not all of them evenly so - remember a cpu does not get tired, but to switch from cpu to cpu takes a lot of effort.

script to find high CPU usage process

May 01, 2008 - 12:21 am UTC

Reviewer: Manian from Ohio, USA

Tom,

Is there a script or ways to find the oracle process and its corresponding SQL consuming high CPU usage when the CPU on the Sever is 0% idle ?

Thanks


Tom Kyte

Followup  

May 01, 2008 - 9:32 am UTC

look for active sessions that have a large "last_call_et", that'll show long running calls - they would *typically* be what you are looking for.

How to interpret cpu_time in v$SQL

May 01, 2008 - 11:24 am UTC

Reviewer: Manian


Thanks Tom.

<< look for active sessions that have a large "last_call_et", that'll show long running calls - they would *typically* be what you are looking for.

We have CPU_TIME and ELAPSED_TIME columns in v$SQL .
Can i use CP_TIME column to find out the actual CPU usage in seconds if its possible.

How to interpret this value to CPU time in seconds ?
I am on 9.2.0.6.

Appreicate your help.

Thanks

Tom Kyte

Followup  

May 01, 2008 - 11:52 am UTC

v$sql is aggregated over all sessions, it is not session specific. you said you were looking for "a session"

so, what are you looking for then - if you are looking for 'sessions that are currently active and been running for a while consuming resources' - you have it.

May 01, 2008 - 12:13 pm UTC

Reviewer: Manian

Thanks Tom.
Is it possible to find the cpu time also used by the session consuming more cpu with actual SQL being run (i can find the SQL from v$SQL or V$sqltext).
Tom Kyte

Followup  

May 01, 2008 - 2:44 pm UTC

cpu is contributed to v$ views at the END of a call, so, if you have a session that has been active for an hour (last_call_et tells you how long), you will not see their sql.

the ASH tables can provide some insight into what is going on in the session...

suggestion: get off of the focus of "CPU CPU CPU". Look at who is doing work and has been doing work for a while.

June 17, 2008 - 9:39 am UTC

Reviewer: Sam Mousleh from France

The statspack report shows the following in the "SQL ordered by Reads for DB:" section:
CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     19,868,163          921       21,572.4   83.9  1700.65   1684.50  444294199


And the following is the explain plan using the hash value:

STATSPACK SQL report for Hash Value: 444294199 Module: JDBC Thin Client

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SABO           310423053 SABO                1 9.2.0.6.0   NO      su1162

 Start Id     Start Time         End Id      End Time       Duration(mins)
--------- ------------------- --------- ------------------- --------------
    28176 05-Jun-08 13:00:03      28178 05-Jun-08 14:00:03           60.00

SQL Statistics
~~~~~~~~~~~~~~
-> CPU and Elapsed Time are in seconds (s) for Statement Total and in
   milliseconds (ms) for Per Execute
                                                       % Snap
                     Statement Total      Per Execute   Total
                     ---------------  ---------------  ------
        Buffer Gets:      39,008,039         42,354.0    8.57
         Disk Reads:      19,868,163         21,572.4   83.90
     Rows processed:              38              0.0
     CPU Time(s/ms):           1,701          1,846.5
 Elapsed Time(s/ms):           1,685          1,829.0
              Sorts:               0               .0
        Parse Calls:               0               .0
      Invalidations:               0
      Version count:               1
    Sharable Mem(K):              25
         Executions:             921

SQL Text
~~~~~~~~
SELECT /*+ INDEX(SAOPTDEM SAOPTDEM_I2)*/ S.NDEMNUM
, S.NCNTNUM
, S.CCNTSCS
, S.CDEMPROD
, DECODE (S.CDEMTYPACT,'S', 'A', 'R', 'S')
, G.GEPR_ECHEANCE
, S.CDEMMED
, S.CDEMEMET
FROM SAOPTDEM S, GEPARAM_PROD G
WHERE ROWNUM = 1
AND S.CDEMSTAT = :B1
AND S.CDEMTYPACT IN ('S', 'R')
AND G.GEPR_MODE = 'TP'
AND S.CDEMPROD = G.GEPR_CODE_PROD
FOR UPDATE OF CDEMSTAT

Known Optimizer Plan(s) for this Hash Value
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows all known Optimizer Plans for this database instance, and the Snap Id's
they were first found in the shared pool. A Plan Hash Value will appear
multiple times if the cost has changed
-> ordered by Snap Id

First First Plan
Snap Id Snap Time Hash Value Cost
-------- --------------- ------------ ----------
25343 02 Apr 08 05:00 1677235172 4247
25367 03 Apr 08 05:00 1677235172 4251

Plans in shared pool between Begin and End Snap Ids
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows the Execution Plans found in the shared pool between the begin and end
snapshots specified. The values for Rows, Bytes and Cost shown below are those
which existed at the time the first-ever snapshot captured this plan - these
values often change over time, and so may not be indicative of current values
-> Rows indicates Cardinality, PHV is Plan Hash Value
-> ordered by Plan Hash Value

--------------------------------------------------------------------------------
| Operation                      | PHV/Object Name     |  Rows | Bytes|   Cost |
--------------------------------------------------------------------------------
|SELECT STATEMENT                |----- 1677235172 ----|       |      |   4251 |
|FOR UPDATE                      |                     |       |      |        |
| COUNT STOPKEY                  |                     |       |      |        |
|  HASH JOIN                     |                     |   101K|    3M|   4251 |
|   TABLE ACCESS FULL            |GEPARAM_PROD         |   128 |    1K|      7 |
|   TABLE ACCESS FULL            |SAOPTDEM             |   144K|    3M|   4243 |
--------------------------------------------------------------------------------

End of Report

So, "Reads per Exec" is 21,572.4 and the "%Total Time" is 83% and it is executed 921 times with a total of 19,868,163
physical reads. also it is show in the "SQL Statistics" section:
 Buffer Gets:      39,008,039         42,354.0    8.57
         Disk Reads:      19,868,163         21,572.4   83.90
     Rows processed:              38              0.0
     CPU Time(s/ms):           1,701          1,846.5
 Elapsed Time(s/ms):           1,685          1,829.0


I can see they are doing full table scan on both tables, even thought there is 3 indexes on SAOPTDEM table. But to get the data needed with the conditions in the query the index is not used because it represent different columns.
Is the best way to add new index on this table according to the conditions on the query? to minimize cpu utilisation and disk read.

cpu_count

June 27, 2008 - 6:40 am UTC

Reviewer: Deepak from India

Hi Tom,

I have Sun T5240 server having 128 cpu threads.

My oracle cpu_count parameter is getting set to 128 by default. And parallel_threads_per_cpu=2 (default), parallel_automatic_tuning=false(default).

My DB is an OLTP database.

I am observing a lots of "PX qref latch" waits.

Please advise me whether I need to reduce the cpu_count parameter to 4 or 8 to get rid of the above wait event.

Please help.
Tom Kyte

Followup  

June 27, 2008 - 9:21 am UTC

if you are oltp, you should not be using parallel at all. Why did you parallel enable your segments?

parallel

June 27, 2008 - 1:37 pm UTC

Reviewer: DEEPAK from India

Hi Tom,

Thanks for your response. No we have not set parallel on our segments. But during statistics collection using dbms_stats I could see few queries with parallel hint. Not sure whether Oracle uses them by default.

Also does the query against data dictionary uses parallel hint/few sys owned segments have been marked as parallel?

Tom, Do I need to alter my cpu_count value if I do not need this degree of parallelism?
Tom Kyte

Followup  

June 27, 2008 - 1:45 pm UTC

why are you using parallel hints then, if you are OLTP, you would have nothing needing parallel - they should run SLOWER in parallel

Yes, if you tell us to use parallel we will.


Why are you using parallel in an OLTP system?

memory usage

July 05, 2008 - 4:09 am UTC

Reviewer: pradikan from India

Find one of our DB details below

Data base ¿oracle 8i(enterprise)
size=2 terabyte
UNIX 11
RAM SIZE=28GB

My question is oracle instance is using only 11 GB at any given point of time. How can we improve the percent utilization of memory

thanks in Advance
Tom Kyte

Followup  

July 07, 2008 - 11:43 am UTC



There is no answer for this. You could increase your SGA, you could increase your PGA. If you increase your SGA, it might be in the large pool, it might be in the buffer cache, it might be it might be it might be. If you increase your pga (really old stuff here) it might be a really big sort area size, or maybe your retained size needs adjusting, or maybe you need to use parallel query and get many sessions going or maybe it is the hash area and so on.

"it depends", it depends on what could benefit (if anything, nothing says more memory will reduce runtimes or make anything better), you find that out and then we can talk...

More Cpu

July 21, 2008 - 5:21 am UTC

Reviewer: Subbu from India

In my database(oracle9i) one delete statment taking more cpu.can you guide me how to over come this?
no of cpu is 4.


Tom Kyte

Followup  

July 22, 2008 - 10:26 am UTC

no one could guide you with this level of information.


In my garage, one of my cars won't start. can you guide me how to over come this?



more cpu than what?
what is "more cpu"

why does it matter if a delete takes more cpu than some other delete - I would not expect all sql to use the same resources, it would not even be remotely possible.

cpu time versus percentage of cpu used,

October 08, 2008 - 12:47 pm UTC

Reviewer: A reader

I have a schedule job that runs every 30 minutes and collect information from v$sql and puts it in a user defined table.

At the end of the day, if I sum the CPU_TIME column, I see a big number (400,000 seconds -- I divided the original number by 1000000 to get the value in seconds).

Now, using this 400k as a value to make an inference, how can I say 400k cpu seconds is too high or average or too low? I need to correlate it to the physical cpus on the machine.

If I can say 400k seconds of cpu time had consumed x% of cpu, then I can kind of make inference.

Appreciate your help on this.

Thanks,

Tom Kyte

Followup  

October 08, 2008 - 10:20 pm UTC

... Now, using this 400k as a value to make an inference, how can I say 400k cpu
seconds is too high or average or too low? I need to correlate it to the
physical cpus on the machine.

...

you cannot.

what if you have a single cpu (just to make the math easy)

so that means you have 3,600 cpu seconds per hour.

Say you have observed from 12 midnight to 6am - 0 cpu seconds (idle)
from 6am-3pm you have 32,400 cpu seconds (busy)
from 3pm till midnight you have 0 again.


so, for 24 hours you have 86,400 cpu seconds, you used 32,400 - that means on average you were 37.5% utilized right?


No, for 6 hours you were idle.
For 9 hours you really needed more cpu - you were 100% pegged - undoubtedly you need more.
For 9 hours you were idle again.

Averages - not useful over a long period where the workload changes.

No matter what numbers you give me - I can develop a story whereby the numbers are

a) awesome, the best ever
b) neither good nor bad
c) the worst ever

For the same numbers...

So, what is YOUR goal, what are you actually trying to measure.

It is easy to measure.
It is really hard to understand what you are trying to measure or why.

CPU as an attribute,

October 09, 2008 - 11:20 am UTC

Reviewer: A reader

Thanks a lot for a beautiful explanation.

I am trying to measure the amount of CPU used by our application in order to determine whether we need to upgrade our single instance to RAC database. This is one of the measure we are using to figure out for the upgrade.

We are anticipating new application on the existing database with more users and more usage of database.

Your explanation gave me insight now to think in the right direction.

Lot of times we may not get an exact answer from you, but your thoughts will help us to focus our attention in the right direction.

Thanks,

Tom Kyte

Followup  

October 09, 2008 - 11:23 am UTC

You would want to trend out CPU utilization over time - the smaller the interval the better. It might well be that capturing this at the OS level will be easier and less intrusive (over querying of the v$ tables can itself lead to a performance issue)

CPU info,

October 09, 2008 - 12:12 pm UTC

Reviewer: A reader

I am extracting CPU info from v$ tables ever half an hour and I did that for 3 weeks (I have stopped it now).

I have found that the application had used up to a maximum of 75% CPU in a given hour. However, in a given day the average CPU utilization was under 20%.

You are right about capturing them from OS. In Unix flavor OS, I remember using vmstat. Also, recently I have used NMON utility which runs ons LINUX and AIX. This is a great tool. Here we have Windows and I am not sure how to capture them in a regular interval.

Thanks a lot.




CPU utilization: diagnostic tools

October 09, 2008 - 12:58 pm UTC

Reviewer: Rajendra from Anchorage, Alaska

In response to the last reviewer...
Let me suggest these diagnostic tools:

OS Watcher: Please see metalink Note:301137.1
LTOM System Profiler: Please see metalink Note:461052.1

CPU utilization: diagnostic tools

October 09, 2008 - 3:01 pm UTC

Reviewer: Rajendra from Anchorage, Alaska USA

Oops! I missed including:

OS Watcher For Windows (OSWFW) : Please see metalink Note:433472.1

CPU time,

October 09, 2008 - 3:48 pm UTC

Reviewer: A reader

Thanks Rajendra for metalink information.

With regard to CPU time, I have another question:

If total CPU time for one hour is 3600 seconds (in a one CPU box), can oracle process consume more than 3600 seconds in one hour?

I am seeing in one situation where total cpu taken by Oracle is 16k seconds in one hour window but the total available for one hour in that box is 4*3600=14400 seconds (4 cpu box).

How is this possible?

Thanks,



how to tune this query

February 22, 2009 - 9:30 am UTC

Reviewer: A reader

Tom,
We have the query running over 10 hrs and not finish. We added the RULE hint and it completed in less 3.5 hours.
It returned about 24 million records. I did the trace on the query with the wait events.
I understood that RULE to not the way to go. Could you please give advices on how making the improvement under cost based optimizer?
Also it was shown by tkprof for this query, the cpu time was 30 mins by the elapsed was 3.5 hrs. SHould " db file sequential read" be included in CPU time? I do not understand why was elapsed so hugh.

10.2.0.4 RDBMS
the tables involved:
NUM_ROWS TABLE_NAME
--------- ------------------------------
5 PA_RESOURCE_LISTS_ALL_BG
6285 PA_RESOURCE_LIST_MEMBERS
7227 HR_ALL_ORGANIZATION_UNITS
17062 PA_RESOURCES
95140 PA_BUDGET_VERSIONS
784450 P2PA_WORK_ITEM_INTERFACE_STG
2107920 PA_TASKS
2773940 PA_RESOURCE_ASSIGNMENTS
68909250 PA_BUDGET_LINES
54280 PROJECTS


select count(*) from
(
SELECT /*+ rule */
P.PROJECT_NUMBER,
P.FOA,
P.ORG_ID,
DECODE (R.TRACK_AS_LABOR_FLAG, 'N', H.NAME, R.NAME) ORG,
T.TASK_ID,
T.PM_TASK_REFERENCE,
T.TASK_NUMBER,
T.DESCRIPTION,
WIS.WORK_ITEM_CODE,
DECODE (R.TRACK_AS_LABOR_FLAG, 'Y', 'LABOR',R.NAME) RESOURCE_CODE, NVL (SUM (L.BURDENED_COST), 0) BUDGET,
L.PERIOD_NAME PERIOD,
DECODE (TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'MM'),
1, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
2, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
3, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
4, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
5, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
6, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
7, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
8, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
9, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
TO_CHAR (ADD_MONTHS (TO_DATE ('01-' || PERIOD_NAME), 12),
'YYYY'
)
) FY,
M.ENABLED_FLAG,
P.REFRESH_DATE
FROM
APPS.PA_BUDGET_LINES L,
APPS.PA_BUDGET_VERSIONS B,
APPS.PA_RESOURCE_ASSIGNMENTS A,
APPS.PA_RESOURCE_LIST_MEMBERS M,
APPS.PA_RESOURCE_LISTS_ALL_BG BG,
APPS.PA_RESOURCES R,
APPS.PA_TASKS T,
PROJECTS P,
APPS.HR_ALL_ORGANIZATION_UNITS H,
P2PA.P2PA_WORK_ITEM_INTERFACE_STG WIS
WHERE
LENGTH (L.PERIOD_NAME) >= 5
AND A.RESOURCE_ASSIGNMENT_ID = L.RESOURCE_ASSIGNMENT_ID
AND A.BUDGET_VERSION_ID = B.BUDGET_VERSION_ID
AND A.PROJECT_ID = P.PROJECT_ID
AND A.TASK_ID = T.TASK_ID
AND A.RESOURCE_LIST_MEMBER_ID = M.RESOURCE_LIST_MEMBER_ID
AND T.PROJECT_ID = P.PROJECT_ID
AND T.CARRYING_OUT_ORGANIZATION_ID = H.ORGANIZATION_ID
AND R.RESOURCE_ID = M.RESOURCE_ID
AND M.RESOURCE_LIST_ID = BG.RESOURCE_LIST_ID
AND B.RESOURCE_LIST_ID = BG.RESOURCE_LIST_ID
AND B.PROJECT_ID = P.PROJECT_ID
AND B.BUDGET_STATUS_CODE = 'B'
AND UPPER (B.BUDGET_TYPE_CODE) = 'CURRENT'
AND B.CURRENT_FLAG = 'Y'
AND B.BUDGET_ENTRY_METHOD_CODE = 'OP3 COST BY GL PERIOD'
AND BG.RESOURCE_LIST_ID = '1001'
AND A.PROJECT_ID = B.PROJECT_ID
AND T.PROJECT_ID = B.PROJECT_ID
AND M.RESOURCE_LIST_ID = B.RESOURCE_LIST_ID
AND WIS.TASK_ID(+) = T.TASK_ID
AND WIS.PROJECT_FOA_CODE IS NULL
GROUP BY
P.PROJECT_NUMBER,
P.FOA,
P.ORG_ID,
DECODE (R.TRACK_AS_LABOR_FLAG, 'N', H.NAME, R.NAME),
T.TASK_ID,
T.PM_TASK_REFERENCE,
T.TASK_NUMBER,
T.DESCRIPTION,
WIS.WORK_ITEM_CODE,
DECODE (R.TRACK_AS_LABOR_FLAG, 'Y', 'LABOR',R.NAME),
L.PERIOD_NAME,
DECODE (TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'MM'),
1, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
2, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
3, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
4, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
5, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
6, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
7, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
8, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
9, TO_CHAR (TO_DATE ('01-' || PERIOD_NAME), 'YYYY'),
TO_CHAR (ADD_MONTHS (TO_DATE ('01-' || PERIOD_NAME), 12),
'YYYY'
)
),
M.ENABLED_FLAG,
P.REFRESH_DATE
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.03 0 0 0 0
Execute 1 0.01 0.00 0 0 0 0
Fetch 2 1933.21 12363.11 2101958 27765424 128 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1933.24 12363.15 2101958 27765424 128 1

Misses in library cache during parse: 1
Optimizer mode: RULE
Parsing user id: 239 (P2RP)

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=27765424 pr=2101958 pw=481470 time=1675264037 us)
23929794 VIEW (cr=27765424 pr=2101958 pw=481470 time=1684011509 us)
23929794 SORT GROUP BY (cr=27765424 pr=2101958 pw=481470 time=1660081710 us)
23931662 TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES (cr=27765424 pr=1620488 pw=0 time=11319812547 us)
25220293 NESTED LOOPS (cr=20266707 pr=596799 pw=0 time=4766771743 us)
1288630 FILTER (cr=16257150 pr=408721 pw=0 time=3546428704 us)
1468625 NESTED LOOPS OUTER (cr=16257150 pr=408721 pw=0 time=3454337579 us)
1288674 NESTED LOOPS (cr=12477143 pr=313841 pw=0 time=2734685743 us)
1288674 NESTED LOOPS (cr=9899793 pr=313804 pw=0 time=2705046216 us)
1288674 NESTED LOOPS (cr=7244183 pr=313803 pw=0 time=2671539798 us)
1288674 NESTED LOOPS (cr=4666833 pr=313770 pw=0 time=2640611590 us)
1288674 NESTED LOOPS (cr=601618 pr=145418 pw=0 time=1266858923 us)
46841 NESTED LOOPS (cr=159779 pr=6438 pw=0 time=38572011 us)
54285 NESTED LOOPS (cr=4049 pr=2446 pw=0 time=1896186 us)
1 INDEX UNIQUE SCAN PA_RESOURCE_LISTS_U1 (cr=1 pr=1 pw=0 time=50306 us)(object id 41704)
54285 MAT_VIEW ACCESS FULL PROJECTS (cr=4048 pr=2445 pw=0 time=1737309 us)
46841 TABLE ACCESS BY INDEX ROWID PA_BUDGET_VERSIONS (cr=155730 pr=3992 pw=0 time=39027296 us)
46845 INDEX RANGE SCAN PA_BUDGET_VERSIONS_N1 (cr=108850 pr=12 pw=0 time=2796211 us)(object id 327017)
1288674 TABLE ACCESS BY INDEX ROWID PA_RESOURCE_ASSIGNMENTS (cr=441839 pr=138980 pw=0 time=1230899054 us)
1288674 INDEX RANGE SCAN PA_RESOURCE_ASSIGNMENTS_U2 (cr=178516 pr=84826 pw=0 time=518461774 us)(object id 327016)
1288674 TABLE ACCESS BY INDEX ROWID PA_TASKS (cr=4065215 pr=168352 pw=0 time=1327726906 us)
1288674 INDEX UNIQUE SCAN PA_TASKS_U1 (cr=2577350 pr=5038 pw=0 time=67397087 us)(object id 41591)
1288674 TABLE ACCESS BY INDEX ROWID HR_ALL_ORGANIZATION_UNITS (cr=2577350 pr=33 pw=0 time=28278972 us)
1288674 INDEX UNIQUE SCAN HR_ORGANIZATION_UNITS_PK (cr=1288676 pr=0 pw=0 time=14428339 us)(object id 43498)
1288674 TABLE ACCESS BY INDEX ROWID PA_RESOURCE_LIST_MEMBERS (cr=2655610 pr=1 pw=0 time=29911262 us)
1288674 INDEX UNIQUE SCAN PA_RESOURCE_LIST_MEMBERS_U1 (cr=1288676 pr=1 pw=0 time=13704106 us)(object id 41759)
1288674 TABLE ACCESS BY INDEX ROWID PA_RESOURCES (cr=2577350 pr=37 pw=0 time=27182217 us)
1288674 INDEX UNIQUE SCAN PA_RESOURCES_U1 (cr=1288676 pr=0 pw=0 time=14001179 us)(object id 41618)
1303944 TABLE ACCESS BY INDEX ROWID P2PA_WORK_ITEM_INTERFACE_STG (cr=3780007 pr=94880 pw=0 time=778266760 us)
1303944 INDEX RANGE SCAN P2PA_WORK_ITEM_INT_STG_N4 (cr=2583578 pr=586 pw=0 time=41760067 us)(object id 330211)
23931662 INDEX RANGE SCAN PA_BUDGET_LINES_U1 (cr=4009557 pr=188078 pw=0 time=1154833857 us)(object id 517881)

error during retrieval of EXECUTE PLAN output
ORA-30929: ORDER SIBLINGS BY clause not allowed here

parse error offset: 264

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
db file sequential read 1618068 0.70 10471.71
direct path write temp 12380 0.18 154.53
latch: cache buffers chains 2 0.00 0.00
db file scattered read 197 0.05 1.75
direct path read temp 28644 0.66 176.59
SQL*Net message from client 2 0.00 0.00
********************************************************************************

February 23, 2009 - 8:58 am UTC

Reviewer: A reader

Tom, do you need some more info relavent to the query in order to make comments?
Thanks.
Tom Kyte

Followup  

February 23, 2009 - 5:28 pm UTC

it is not a comment on existing content on this page

it is a query and plan that causes me to page up and down and down and down to see it all(eg: it is big)

I have nothing really to say about it, it would take a lot more detail (which is not approach here in the comment/review section) to even start looking at it. Sorry, but I'm skipping almost every "here is my big query, make it run faster" review these days - insufficient time to extract bit by bit the necessary detail to understand your data and say anything.

Try full table scans

February 24, 2009 - 1:48 pm UTC

Reviewer: A reader from FL

In reply to "a reader" with a query that returns 24 million rows, I see index range scans and index by row id accesses all over the execution plan. If the query is returning that many rows, have you tried NOT using the indexes?

This line is pretty revealing:

23931662 TABLE ACCESS BY INDEX ROWID PA_BUDGET_LINES (cr=27765424 pr=1620488 pw=0
time=11319812547 us)

You said PA_BUDGET_LINES contains 68.9 million rows and the query returns 24 million or 35% the total number of rows so a full scan MIGHT work better here.

Also, the sequential read event is related to index access, if you do full scans you'll see more scattered reads (multiblock reads) which could help improve performance.

As I've seen 'someone' say before: Full scans are not always bad! :)



CPU utilization on 10g

March 03, 2009 - 11:39 am UTC

Reviewer: DUR from PL

Hi Tom,
Could you please advice on approach:
I have cpu utilization issue. There is 100% CPU used by oracle on server. I know it from nmon. I took some AWR reports for different application activities
and most often Top 5 wait events are:
1. CPU time
2. latch: library cache
3. latch: shared pool
4. db file sequential read
5. latch: cache buffers chains

I'm assuming, that it's not parsing issue, because of % Non-Parse CPU is always over 99% and there is only a few hard parses.

Starting from the top I'm going to look at section SQL ordered by CPU Time to examine particular sql and look for ... exactly... to look for what - unnecessary LIO I think.

Or maybe CPU is utilized because of latches? So is looking at top-cpu sql the right way to investigate what in database is using so much CPU? What other checks should I perform?

Thanks in advance.



Tom Kyte

Followup  

March 03, 2009 - 9:03 pm UTC

CPU time is NOT a wait event, it is a timed event, it is impossible for CPU time to be a "wait" event. If you accrue CPU time - by definition - you are NOT waiting.

... I'm assuming, that it's not parsing issue, because of % Non-Parse CPU is always over 99% and there is only a few hard parses. ...


and yet, your biggest waits are for............... Parsing related activities. Interesting conclusion (but probably wrong)

Parsing -> cpu
Latching -> cpu
Parsing -> lots of latching

If your biggest waits are actually going after the shared pool and library caches, you have a parsing problem. Look at your latch report and see how often you SPIN on those latches, that'll be a HUGE consumer of cpu.

March 04, 2009 - 5:45 am UTC

Reviewer: DUR from PL

Tom, thank you for answer.

Indeed, there are latches with high spin gets. Top latches are:
Latch Name Get Requests Misses Sleeps Spin Gets Sleep1 Sleep2 Sleep3
cache buffers chains  226,915,586  288,423  1,500  287,657  0  0  0
transaction allocation  18,567,161  50,709  51  50,663  0  0  0
library cache  2,035,532  6,268  1,395  5,000  0  0  0

these values are from single aplication proces, from 13 minutes AWR report, but order of magnitude is similar in other AWR reports.
So, it leads me to identifying hot blocks.
Thank you for a clue.
Tom Kyte

Followup  

March 04, 2009 - 1:26 pm UTC

unfortunately, my brain cannot realign that data....


each of the misses represents "spinning", burning of CPU


and you know, if library cache is ..... (it is not a hot block thing, it is a parse thing for that one)...

April 17, 2009 - 7:25 am UTC

Reviewer: Scofield

Respected Mr Tom;
I am using Oracle 9i on Unix server.
The CPU is sometimes overloaded.
I can see oracle users who are using high cpu from the "top" command of unix.
However I couldnt manage to find which sql statement and user uses the most of the CPU.
I have been struglling with this for the couple of weeks and decided to ask you as a final solution.


I tried the following queries but cpu_usage always displayed "0" for all of the returned rows.
Do you have any particular script that can help me about this issue?

Sorry for any inconvinience..

************************************************
select nvl(ss.USERNAME,'ORACLE PROC') username,
se.SID,
VALUE cpu_usage
from v$session ss,
v$sesstat se,
v$statname sn
where se.STATISTIC# = sn.STATISTIC#
and NAME like '%CPU used by this session%'
and se.SID = ss.SID
order by VALUE desc

***************************************************
select * from
(select sql_text,
cpu_time/1000000000 cpu_time,
elapsed_time/1000000000 elapsed_time,
disk_reads,
buffer_gets,
rows_processed
from v$sqlarea
order by cpu_time desc, disk_reads desc
)
where rownum < 21
********************************************

Tom Kyte

Followup  

April 17, 2009 - 10:23 am UTC

why are you dividing by one billion?

anyway, use statspack or AWR - v$sqlarea is cumulative since instance startup time. You want to see the cpu used by a given set of queries in a span of time (say the last hour).


Take regular snapshots (if you have AWR, that is already happening, just use the reporting tool in enterprise manager to see what is going on). If you don't have AWR read up on statspack, install it, and schedule it to capture a snapshot on the top of the hour (and then set up a job you'll run to purge data over time as well..)

April 17, 2009 - 9:40 pm UTC

Reviewer: A reader

Respected Mr Tom:
I have local oracle database in my personal computer, when I run this:

select cpu_time from
(select sql_text,
cpu_time,
elapsed_time,
disk_reads,
buffer_gets,
rows_processed
from v$sqlarea
order by cpu_time desc, disk_reads desc
)
where rownum < 21

I can see sql texts with their cpu usage.
However, when I run the same script in the company, all the cpu usage is shown "0".
I dont undertand why..
Tom Kyte

Followup  

April 20, 2009 - 10:56 am UTC

what init.ora parameters are set in "company", sounds like they disabled the computation of these values (bad idea)


select name || ' = ' || value from v$parameter where isdefault = 'FALSE';



April 21, 2009 - 4:44 pm UTC

Reviewer: A reader

SQL> select name || ' = ' || value from v$parameter where isdefault = 'FALSE';

NAME||'='||VALUE
--------------------------------------------------------------------------------
_trace_files_public = TRUE
processes = 1400
sessions = 1545
timed_statistics = FALSE
resource_limit = TRUE
shared_pool_size = 419430400
sga_max_size = 2148530944
java_pool_size = 0
enqueue_resources = 900
nls_date_format = YYYYMMDDHH24MISS
control_files = /opt/oracle/admin/control01.ctl, /opt/oracle/admin/

NAME||'='||VALUE
--------------------------------------------------------------------------------
control02.ctl, /opt/oracle/admin/control03.ctl

db_block_size = 4096
db_32k_cache_size = 67108864
db_cache_size = 318767104
compatible = 9.2.0
log_archive_start = TRUE
log_archive_dest = /ora_archive/
log_archive_format = _%S_%t.arc
log_buffer = 163840
log_checkpoint_interval = 500000000

NAME||'='||VALUE
--------------------------------------------------------------------------------
db_files = 750
db_file_multiblock_read_count = 32
dml_locks = 900
rollback_segments = r01, r02, r03, r04, r31, r32, r33, r34, r35, r36
_enable_block_level_transaction_recovery = FALSE
remote_os_authent = TRUE
remote_os_roles = TRUE
db_domain = uxakprd1
global_names = FALSE
utl_file_dir = /axis/work/data1/debug
job_queue_processes = 3 
NAME||'='||VALUE
--------------------------------------------------------------------------------
hash_join_enabled = FALSE
background_dump_dest = /opt/oracle/admin/bdump
user_dump_dest = /opt/oracle/admin/udump
max_dump_file_size = 51200
core_dump_dest = /opt/oracle/admin/cdump
audit_trail = FALSE
sort_area_size = 10000000
sort_area_retained_size = 4000000
db_name = nzuw1p1
open_cursors = 3000
ifile = /opt/oracle/admin/pfile/config.ora

NAME||'='||VALUE
--------------------------------------------------------------------------------
optimizer_mode = CHOOSE 43 rows selected.


Tom Kyte

Followup  

April 21, 2009 - 5:24 pm UTC

and you just showed that what I said was true (the goal was for you to look at them, they are all documented)

timed_statistics = FALSE



that is a scary scary list of parameter settings - run away from that database.

hash_join_enabled = FALSE -- just ouch, ouch ouch ouch....

remote_os_authent = TRUE -- holy cow.

_enable_block_level_transaction_recovery = FALSE -- why? just to make it take longer to startup?

global_names = FALSE -- ugh

db_32k_cache_size = 67108864 -- using multiple block size tablespaces, probably thinking "we are good tuners"... ugh.


control_files = /opt/oracle/admin/control01.ctl, /opt/oracle/admin/control02.ctl, /opt/oracle/admin/control03.ctl -- funny, but scary at the same time.



timed statistics should never be set false, never.


April 22, 2009 - 3:29 am UTC

Reviewer: A reader

Mr tom
You are again and as always very helpfull.
Could you tell me what is the disadvantage of setting the below parameters as I did?
I just need your enlightining ideas instead of reading the explanations in the internet.



hash_join_enabled = FALSE -- just ouch, ouch ouch ouch....

remote_os_authent = TRUE -- holy cow.

_enable_block_level_transaction_recovery = FALSE -- why? just to make it take longer to startup?

db_32k_cache_size = 67108864 -- using multiple block size tablespaces, probably thinking "we are good tuners"... ugh.



Tom Kyte

Followup  

April 23, 2009 - 12:12 pm UTC

hash join enabled - hash joins are HUGELY EFFICIENT, best thing ever for large joins. If you disable them, you are disabling a feature added in version 7.3 some FOURTEEN YEARS AGO. You tell me why you think this is a good thing - you justify it to me.


remote_os_authent = security hole the size of a space shuttle. I'll just drop my laptop into your network, I'll create a user account on it, and you'll trust that I'm who I say I am.

I already commented on the underscore parameter? It is not obvious what I would say again???

And for the multiple caches - again, YOU justify to me why YOU think "this is good".

Here is a quote from expert oracle database architecture:

<quote>
So, now I have another buffer cache set up: one to cache any blocks that are 16KB in size. The default pool, controlled by the db_cache_size parameter, is 768MB in size and the 16KB cache, controlled by the db_16k_cache_size parameter, is 256MB in size. These two caches are mutually exclusive; if one ¿fills up,¿ it cannot use space in the other. This gives the DBA a very fine degree of control over memory use, but it comes at a price. A price of complexity and management. These multiple block sizes were not intended as a performance or tuning feature, but rather came about in support of transportable tablespaces¿the ability to take formatted data files from one database and transport or attach them to another database. They were implemented in order to take datafiles from a transactional system that was using an 8KB blocksize and transport that information to a data warehouse using a 16KB or 32KB blocksize.
The multiple blocksizes do serve a good purpose, however, in testing theories. If you want to see how your database would operate with a different block size¿how much space, for example, a certain table would consume if you used a 4KB block instead of an 8KB block¿you can now test that easily without having to create an entirely new database instance.
You may also be able to use multiple blocksizes as a very finely focused tuning tool for a specific set of segments, by giving them their own private buffer pools. Or, in a hybrid system with transactional users, you could use one set of data and reporting/warehouse users could query a separate set of data. The transactional data would benefit from the smaller blocksizes due to less contention on the blocks (less data/rows per block means less people in general would go after the same block at the same time) as well as better buffer cache utilization (users read into the cache only the data they are interested in¿the single row or small set of rows). The reporting/warehouse data, which might be based on the transactional data, would benefit from the larger blocksizes due in part to less block overhead (it takes less storage overall), and larger logical I/O sizes perhaps. And since reporting/warehouse data does not have the same update contention issues, the fact that there are more rows per block is not a concern, but a benefit. Additionally, the transactional users get their own buffer cache in effect; they do not have to worry about the reporting queries overrunning their cache.

But in general, the default, keep, and recycle pools should be sufficient for fine-tuning the block buffer cache, and multiple blocksizes would be used primarily for transporting data from database to database and perhaps for a hybrid reporting/transactional system.
</quote>

CPU used on nearly exact moment

November 10, 2010 - 8:57 am UTC

Reviewer: Juan from SPAIN

Hi tom,

I am using normally the next query to find top use of cpu :

SELECT /*+ ordered */ p.spid, s.sid, s.serial#, s.username,s.machine,
TO_CHAR(s.logon_time, 'mm-dd-yyyy hh24:mi') logon_time, s.last_call_et, st.value,
s.sql_hash_value, s.sql_address, sq.sql_text
FROM v$statname sn, v$sesstat st, v$process p, v$session s, v$sql sq
WHERE s.paddr=p.addr
AND s.sql_hash_value = sq.hash_value and s.sql_Address = sq.address
AND s.sid = st.sid
AND st.STATISTIC# = sn.statistic#
AND sn.NAME = 'CPU used by this session'
AND s.status = 'ACTIVE'
ORDER BY st.value desc ;

Until some time i thought the query was great until i found out one day comparing with enterprise manager the results where not exactly the same.

Top 1 of E.M. on last 5 minutes of cpu usage was my top 6 of use of cpu.

I imagine that my query depends of how long time a session has been working using cpu.
I mean that a session may have been two days running and my V$SESSTAT shows the cpu high but is not who is using cpu on the last 5 minutes (at least not to get worried about).

So how could i found out who is using on the last couple of minutes the cpu. Like Enterprise Manager does i wonder which query is being used by EM.(AWR , ASH ??)

Could you please give me a suggestion how to find out a correct query.

Thanks.
Tom Kyte

Followup  

November 11, 2010 - 1:50 pm UTC

... I imagine that my query depends of how long time a session has been working
using cpu ...

exactly.


EM would be using ASH for that.

You need to have a snapshot from one point in time, and another from another point in time.

And then you subtract them from each other, that shows how much of something was used in that period of time.

HIstory of CPU used by a user,

February 07, 2012 - 6:00 pm UTC

Reviewer: A reader

Hello,

I am looking for information that gives the CPU Used by a set of users in the past. I couldn't use v$sesstat. Is there a way to find out how much cpu seconds a user has consumed?

Thanks,

Tom Kyte

Followup  

February 08, 2012 - 1:31 am UTC

you can get a VAGUE picture of it using the ASH (active session history) repository that is part of AWR. See the DBA_HIST_* views, assuming you have access to AWR.

follow up,

February 08, 2012 - 8:11 am UTC

Reviewer: A reader

I prefer writing my own queries using DBA_HIST* views. I have extracted lot of interesting queries from your website and have documented them.

I have looked at dba_hist_sysstat, dba_hist_sqlstat and others but I couldn't get the VALUE like we get in v$sesstat. Which history view should I use here? Is the VALUE for "CPU used" is in centiseconds?

Thanks,



Tom Kyte

Followup  

February 08, 2012 - 12:12 pm UTC

you use this:

http://docs.oracle.com/cd/E11882_01/server.112/e25513/dynviews_1007.htm#I1030299

and count the number of times it was observed to be on the CPU (session_status) and 'assume' 1 second of time for that.

It will give you a vague, imprecise - but totally usable for a good estimate - of the amount of cpu the session used.

we do not track the summary, perfect, totally complete statistics session by session, we take snapshots that give you a good gist of what the session was doing.

follow up,

February 08, 2012 - 1:37 pm UTC

Reviewer: A reader

Thanks.  If I have 8 CPU on a server, then I would be having 3600 seconds * 24 hours * 8 = 691,200 CPU seconds in a day right?

Here is the following output I got querying dba_hist_active_sess_history;


SQL> select count(*),trunc(sample_time) from dba_hist_active_sess_history where session_state='ON CPU'
  2  group by trunc(sample_time);

   COUNT(*) TRUNC(SAM
---------- ---------
      8208 30-JAN-12
      2165 07-FEB-12
      2019 01-FEB-12
     23926 03-FEB-12
     11670 02-FEB-12
      1340 05-FEB-12
      2381 31-JAN-12
      4326 04-FEB-12
      2440 06-FEB-12
      2757 08-FEB-12

10 rows selected.

Does that mean, on 3rd Feb, the CPU used was 23926 seconds (approximately) which is less than 4% (the box has 8 cpus)?  

Thanks,

Tom Kyte

Followup  

February 08, 2012 - 2:33 pm UTC

use cores when counting - not "cpu count". Make sure you have 8 cores.

You cannot look at things over a day - everything is averaged out over too long of a span. You might have been 100% utilized for a burst of time, and then 0% for some other duration of time.

the dba_hist_active_sess_history view is a 10% sample of a 1 second sample.

The V$ table is 1 second samples of interesting events.

The dba_hist table is 10% of that - not every 10th snapshot, but about 1 out of 10 interesting snapshots.

Multiply by 10.

follow up

February 08, 2012 - 5:06 pm UTC

Reviewer: A reader

Thanks. That's a new piece of info (multiply by 10) that I can add to my document.
Tom Kyte

Followup  

February 08, 2012 - 9:01 pm UTC

remember - it is "vague" - you have a 10% sample of a 1 second sample of continuous information. It is purely a sample - an estimate.

cpu time

May 11, 2012 - 7:51 am UTC

Reviewer: A Reader

Tom,
Suppose

1 SQL has CPU time =5.5 sec
and total elapsed time = 5.6 sec

a) Does this mean there is nothing to be tuned as this query is not waiting for anything ( alomost)?

b)The only way would be to re-write it so that it uses less cpu , so that I can bring down CPU time ( which is 5.5 sec currently)?

regards








Tom Kyte

Followup  

May 11, 2012 - 8:21 am UTC

1) nope, it does not mean that.

are the 5.5 cpu seconds necessary - could it be done in 0.1 cpu seconds? Is it using an index inefficiently. Would adding one attribute to the existing index make it even better? Are the 5.5 cpu seconds spent latch spinning (busy work) or is it really needing 5.5 cpu seconds to do real work?

2) rewrite - maybe. Re-evaluate your schema (indexing for example), maybe. Re-evaluate your physical storage (partitioning, clustering, index organized tables, etc) - maybe

Historical CPU and other information

June 21, 2012 - 7:58 am UTC

Reviewer: lalu

Hello Tom,

I am using the below query to track the historical db CPU utilization.

select inst_id,min(CPU_UTIL),max(CPU_UTIL) from (
select inst_id,trunc(to_char(sample_time,'SSSSS')/2) "hourly_sample", trunc(count(*)/
(min(p.value)*2)*100,2) "CPU_UTIL",
count(*) "time consumed per sessions [s]"
from gV$ACTIVE_SESSION_HISTORY,
(select value from v$parameter where name = 'cpu_count') p
where 1=1
and sample_time >= to_date('18-06-2012 11:56:00','dd-mm-yyyy hh24:mi:ss')
and sample_time <= to_date('18-06-2012 12:33:00','dd-mm-yyyy hh24:mi:ss')
and session_state = 'ON CPU'
group by inst_id,trunc(to_char(sample_time,'SSSSS')/2)
)
group by inst_id
order by 1;

Does it provide me the actual CPU utilization, like I can see in SAR output in unix?
Or using DBA_HIST_ACTIVE_SESS_HISTORY makes more sense rather than the v$ACTIVE_SESSION_HISTORY?

Similar way I want to track down the historical memory usage etc.

Thanking you.

Tom Kyte

Followup  

June 22, 2012 - 6:59 am UTC

it is an estimate, it will not be precise. We sample every second or so and see what each session is doing. If they are on the cpu - we say "on cpu". Then you have to sort of assume they were on the cpu for an entire second - even if it might have been 1/1000th of a second.

so, it is a gross estimate, better than good enough - but not perfect.

if you use the dba table, the granularity goes down even more, that is a 10% sample of the 1 second samples. So, stick with the v$ view.

CPU utilization..

July 19, 2012 - 3:32 am UTC

Reviewer: Ashok from India

HI Tom,
This is Ashok.
Greetings of the day.
I am using oracle server,in oracle server C.P.U range is very high,present range is 89.8%us.
I want to decrease the C.P.U range like below 10%..
How can i solve this problem..
Thank you.

Kind regards,

Ashok.
Tom Kyte

Followup  

July 19, 2012 - 12:18 pm UTC

no you don't, 10% is never your goal.


Ok, if you are a reporting/warehouse system where you are controlling the number of active sessions (using resource manager and parallel execution servers) - you are doing great, you want that machine humming along at near full utilization (what I said above initial)

If you are a transacational system - you want to hum along at about 60% utilization to know you've got the machine fully utilized (but not more)

And the way to do that - manage resources. Watch this video:
http://www.youtube.com/watch?v=xNDnVOCdvQ0

cut down the size of your connection pool in the middle tier

or if you cannot convince the people running that bit of software simple math actually works (that happens, frequently, people do not believe *simple math*), then yous the resource manager to limit the number of active sessions - less efficient than a proper connection pool setting but it can do the trick. If you cannot use the resource manager - configure shared server and limit the number of shared servers.




CPU range

July 19, 2012 - 3:33 am UTC

Reviewer: Ashok from India

HI Tom,
This is Ashok.
Greetings of the day.
I am using oracle server,in oracle server C.P.U range is very high,present range is 89.8%us.
I want to decrease the C.P.U range like below 10%..
How can i solve this problem..
Thank you.

Kind regards,

Ashok.

High CPU

July 27, 2012 - 5:55 am UTC

Reviewer: Prashant from Mumbai, India

Hi Tom

I'm trying to test a very strange concurrent scenario. I know it wouldn't occur very often but I wanted to understand something from you.

I have a query that I submit at the exact same instant (almost via shellscript) in different sessions. I tested for 1,2,3,4,5,10 concurrent sessions.

# of users Response Time (sec)
1 user 150.92
2 users 159.91
3 users 201.26
4 users 257.12
5 users 313.39
10 users 642



All the sessions typically complete around the same time (within few seconds here and there).

I don't have anything else running.

I have 64 CPUs and I see that my overall CPU utilization is hardly 15% when I run the worst case (10 users / sessions).

I see that uses 1 CPU for each of those sessions and in all the tests it maxes out the CPU (sometimes >100%).

Coming to the query - This query is an expensive one with lots of IOs, around 156,000 (block size 32K).

I am sharing one of the session trace files (this is for 5 user run).

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       26    312.01     313.39     156423     158320          0         372
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       28    312.01     313.39     156423     158320          0         372

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      26        0.00          0.00
  asynch descriptor resize                       35        0.00          0.00
  Disk file operations I/O                        5        0.00          0.00
  gc cr grant 2-way                               1        0.00          0.00
  db file sequential read                         1        0.00          0.00
  reliable message                               12        0.00          0.02
  enq: KO - fast object checkpoint               24        0.00          0.00
  direct path read                               75        0.07          1.24
  SQL*Net message from client                    26        0.00          0.20
********************************************************************************



I like to know why does the response time increase as I keep increasing the users when I have so much processing power still available? I see that this query is completely on CPU. Can you please help?
Tom Kyte

Followup  

July 30, 2012 - 9:58 am UTC

because you are accessing shared resources - the buffer cache, shared memory, apparently the OS file system cache outside of Oracle (that is the only reason your db file sequential reads are so 'fast', they are not really coming from disk)


For example, you needed 158,320 blocks from the cache. For each block we have to do something like the following:

a) take the dba (data block address- file.block) from the rowid and hash it. This tells us what linked list in the buffer cache we have to go to.

b) try to get the cbc latch (cache buffers chains latch) on that list. If someone else has it - SPIN on the cpu (burn cpu) so as to wait for the latch (or mutex or whatever the serialization device is).

c) walk the list after getting it and find our block (or not - it appears NOT in most cases). If we found it - unlatch and we are done.

d) ask the OS to perform the physical IO - but it isn't really a physical IO since you undersized your SGA apparently and we are going to what I call the secondary SGA - the OS file system cache. So, we have concurrency issues here as well - the OS needs to perform much of the same type of work we just did in the buffer cache.... The more people you have - the more time they'll spend in spin waits too - the more cpu you need.

e) once you got the physical IO done - re-latch that list (spinning again!!) so you can put the block in there.


that is just the block processing, what about the temp you are doing (direct path read implied possible temp access - or maybe a lob, I don't know, I cannot see much here).

what about the shared pool you are accessing and so on.


so, you are waiting on the other sessions for some shared resource.

Thanks

July 30, 2012 - 10:32 pm UTC

Reviewer: Prashant from Mumbai, India

Hi Tom

Thank you so much for your valuable inputs.

Yeah SGA is undersized. We have been discussing about the need to increase this. We have around 160 GB free on our Linux host and our SGA is currently sized 12 G and PGA 8 G. We have not used AMM since we have hugepagesize enabled.

This one query itself consumes 4 GB.

We are a warehouse. I can reduce the LIOs by aggregating it to a higher grain like a quarter or year. This is a yearly report and it hits the month level fact currently.

Apart from the above are there any other techniques that you can recommend to improve the response time? Also to determine which of the CPU activities it spends more time which tools can I use?

Thanks and Regards
Prashant
Tom Kyte

Followup  

July 31, 2012 - 11:59 am UTC

are you employing compression? index compression and table compression?


Thanks

July 31, 2012 - 10:36 pm UTC

Reviewer: Prashant from Mumbai, India

Tom, we don't.
Tom Kyte

Followup  

August 01, 2012 - 6:43 am UTC

you should

Compression and ETL

August 02, 2012 - 11:31 pm UTC

Reviewer: Prashant Shenava from Mumbai, India

Thank you Tom. We also use goldengate for real time etl and our reporting tables are undergoing change every 5 mins.

By using compression we might be exposing ourselves to a risk of slower ETL, am I right? I will do some tests.
Tom Kyte

Followup  

August 15, 2012 - 4:44 pm UTC

You would be using OLTP compression, part of advanced compression - it should not effect ETL performance significantly - but then again you might not experience the best compression either.

I wouldn't care frankly if the compression made my ETL go slower, you ETL the data ONCE

You query it over and over. Think of the entire lifecycle - not a tiny bit of it. If it does go slower, so be it, it will be worth it.

interpret dba_hist_sysmetric_summary

August 06, 2012 - 10:08 am UTC

Reviewer: A reader

For different cpu related metrics, I got the information from dba_hist view.  However, I am not sure how to interpret the numbers.  

For example 77892.9981 is the average value for Host CPU Utilization (%).  The metric unit for this metric is  % Busy/(Idle+Busy).  I didn't understand how to read the value 77892.9981.  Your help is appreciated.
 

 SQL> select sum(minval),sum(maxval),sum(average),metric_name from dba_hist_sysmetric_summary where lower(metric_name) like '%cpu%' group by metric_name;

SUM(MINVAL) SUM(MAXVAL) SUM(AVERAGE) METRIC_NAME
----------- ----------- ------------ -----------------------------
          0  87873.8274   77892.9981 Host CPU Utilization (%)
          0  694250.769   618329.532 Host CPU Usage Per Sec
          0  77435.4813   22440.2342 CPU Usage Per Txn
          0  59631.5906   19362.9328 Background CPU Usage Per Sec
          0  59510.3195    35691.877 Database CPU Time Ratio
          0  226881.624   127560.008 CPU Usage Per Sec


Michel

August 29, 2012 - 12:40 pm UTC

Reviewer: A reader from France

I have this:

SQL> select sid, serial#, username, status, sql_id from v$session
  2 where status = 'ACTIVE'
  3    and username is not null;

       SID    SERIAL# USERNAME                       STATUS   SQL_ID
---------- ---------- ------------------------------ -------- -------------
         1          8 SYSTEM                         ACTIVE   as911snfq01py
        23          1 SYSMAN                         ACTIVE   2b064ybzkwf1y
        35         18 DBSNMP                         ACTIVE

For me, it looks very strange that a session could be active with sql_id equals NULL!
Could it be java?

...Twilight Zone...

January 24, 2013 - 9:41 am UTC

Reviewer: Don Cavin from Columbus, OH USA

Tom:

Been following you for decades, and I think this is the first time I'm scratching my head. I understand from a theoretical standpoint that you want to use all your CPU...but, my end users are screaming. A server with a saturated CPU translates to end users as an almost unusable database. Alert log looks good...I've just got to many users running too much crap at the same time...I think for newbies, a more holistic discussion of how the internals and server processes configuration and relation to interacting with the I/O subsystem would be a better approach to this question. For example, although my users need a fix now, as an experienced DBA, I'm going to take time and go back to making sure I setup files for best practice, index separated from tables, etc. Will let you know what I fix, and where my predecessor may have misconfigured things...I will post accordingly...other that that, cheers, sorry I missed you in Dayton!!
Tom Kyte

Followup  

January 30, 2013 - 1:01 pm UTC

well - I was thinking "batch operation" more than anything - there - going full throttle (since you control the resources very carefully yourself - you don't go over 100%, you take it to near 100%).

In OLTP, yes, being at 99.9% would be very bad - you want it in the mid 60's at best.

I should have stressed that a bit more.

July 04, 2013 - 11:06 am UTC

Reviewer: kunwar from indi

Hi Tom,
As far i have understood the cpu time is a component in the response time formula.
So if the CPU reaches 100% wont we get response time degradation?
Tom Kyte

Followup  

July 16, 2013 - 1:17 pm UTC

Not necessarily, it depends on what you are doing.


If you are doing a batch process - running reports - where you control the load, getting at or near 100% will improve throughput.

If you are an OLTP system - you likely want to keep cpu at or below 65% utiliziation.


Think about it, think about the different situations. Here we are, on a warehouse. Machine is idle right now. I have a big query. I could run it on one cpu - and it might take hours. Or I could run it on all 32 cores, using 100% of the machine and get it in minutes or seconds.

Of course, that approach doesn't work in OLTP, in OLTP we want to limit the number of tiny small transactions we perform concurrently - in order to avoid building a big run queue and backing the machine up.

See

http://tinyurl.com/RWP-OLTP-CONNECTIONS
http://tinyurl.com/RWP-DW-PART4

for a quick video on the two...

Database CPU metrics

August 26, 2013 - 9:31 am UTC

Reviewer: Arvind from Pune, India

Hello Tom,

I found following CPU related statistics in database:

select METRIC_ID, METRIC_NAME from dba_hist_sysmetric_summary
where upper(metric_name) like '%CPU%';

NETRIC_ID METRIC_NAME
-----------------------------------------------
2153 Background CPU Usage Per Sec
2155 Host CPU Usage Per Sec
2057 Host CPU Utilization (%)
2075 CPU Usage Per Sec
2076 CPU Usage Per Txn
2108 Database CPU Time Ratio

Does these statistics(specially host CPU utilization%) are only of Oracle's CPU utilization or it includes OS CPU utlization also?

Is there any query using which I can see OS CPU utilization and Oracle CPU utilization side by side?

Regards,

Arvind

%Total CPU and %Busy CPU

March 28, 2018 - 6:06 pm UTC

Reviewer: Rajeshwaran, Jeyabal

Team,

Was looking into an Application database AWR that has this information.

%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager)
66.1   98.6   0.0 


Could you help us to understand
a) what does %Total CPU denotes?
b) what does %Busy CPU denotes?
Connor McDonald

Followup  

March 30, 2018 - 4:39 am UTC

3% of an AWR report doesn't really help us.

Please log a new question, and we'll get you to mail us the full AWR report

Need help with cpu utilization

August 01, 2019 - 8:36 pm UTC

Reviewer: Diwakar Krishnakumar from United States

Dear Sir,

We have a scenario where:

1) There are several databases running on a specific server
2) The server is clustered - RAC

At any point of time in the past we would like to know the

1) CPU utilization on that server and
2) Usage being contributed by the specific database.

I see that there are a few ways to do this:

1) Analyze the AWR report of every database - might take a little longer to analyze all the reports - manual work - need to collect all the awr's and look into them manually....

2) Go to the OEM repository database and gather the details from there - Mostly works, but if for some reason there are issues with OEM at that point of time, we may need to have alternatives.

Given the above scenario, is there a specific query that you would advise us to execute, for gathering the usage at a specific point of time - may be we can run it in a loop over each of the db instance on that server.

We get to troubleshoot the specifics on databases very quickly and it would always be handy to have one such query - might be useful for others as well - can you suggest one here if you have it.

Please share your thoughts over the same, if any other approach is advisable.....

Thank you in advance......

Regards,
K.Diwakar

Need help with cpu utilization

August 01, 2019 - 8:52 pm UTC

Reviewer: Diwakar Krishnakumar from United States

If you think it is a higher ask for a readymade query, please let me know the relevant columns and the table/view that we need to use to extract the information - the aim is to get:
1) Total cpu utilization
2) Specific db's contribution to that and
3) This should be for a specific time in the past.

Thank you,

Regards,
K.Diwakar

Need help with cpu utilization

August 02, 2019 - 10:41 pm UTC

Reviewer: Diwakar Krishnakumar from United States

Would this query give the necessary details:

select *
from DBA_HIST_SYSMETRIC_SUMMARY
where begin_time between to_date('31-JUL-19 09.55.00 AM','dd-MON-yy hh:mi:ss PM') and
to_date('31-JUL-19 11.05.00 AM','dd-MON-yy hh:mi:ss PM')
and metric_name like '%CPU%'
order by snap_id,metric_name;

#########################

% cpu/db_time --> the db's share of cpu and
% busy/ (idle+busy) --> total cpu usage

ofourse we have min/max/average columns to give the specifics.


Connor McDonald

Followup  

August 05, 2019 - 10:00 am UTC

That looks like a sound approach. A couple of things

1) Remember that this very much depends on the frequency of snapshots. A 30min snapshot means I could

- a mean of 50%
- a max of 100%

and never truly know whether it was: 100% for 15mins, 0% for 15mins or 100% for a couple of seconds, and 50% for the rest. The standard deviation helps, but still...its important not to read *too much* into the data

2) For the total host I'd use:

Host CPU Utilization (%)

but for the contribution of each db, I'd use:

( CPU Usage Per Sec + Background CPU Usage Per Sec ) / Host CPU Usage Per Sec


Need help with cpu utilization

August 07, 2019 - 6:16 pm UTC

Reviewer: Diwakar Krishnakumar from Redwood shores parkway, United States

Awesome, thank you as always :)
Connor McDonald

Followup  

August 08, 2019 - 2:27 am UTC

Glad we could help