Investigate Wait Events & Poor Response Times Using a Statspack Report

Question and Answer

Tom Kyte

Thanks for the question, Ram.

Asked: February 11, 2006 - 9:10 pm UTC

Last updated: October 08, 2008 - 7:51 pm UTC

Version: 9.2.0

Viewed 1000+ times

You Asked

Hi Tom,

Here is a Statspack report that was obtained when the end users complained about poor response times in the system.
Please advise on the following:
(1) The top wait event, i.e. latch free.
(2) Latch activity.

During this period, mpstat on the OS side shows that
wt and idl times are 0 and user processes are taking more than 90% of the CPU (4 CPUs).


SunOS 5.9$ mpstat 5 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
20 134 14 2669 463 139 1318 235 38 23 3 2343 80 20 0 0
21 128 12 2816 328 101 1290 238 38 24 3 2279 89 11 0 0
22 144 13 3073 1134 1030 1117 225 39 26 3 2342 91 9 0 0
23 133 12 3380 290 57 1284 240 41 25 3 2243 92 8 0 0


This has happened continuously for the past 2 days.

(3) Is there anything else to investigate with respect to the poor response times and the system hanging on the end users?

Thanks Tom in advance.
Ram.


STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
XXXXX 3013888371 XXXXX 1 9.2.0.6.0 NO prodb

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 185 08-Feb-06 11:10:58 159 39.2
End Snap: 186 08-Feb-06 11:37:02 146 43.3
Elapsed: 26.07 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 448M Std Block Size: 16K
Shared Pool Size: 400M Log Buffer: 512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 74,923.36 4,903.55
Logical reads: 21,238.92 1,390.03
Block changes: 360.10 23.57
Physical reads: 1,293.47 84.65
Physical writes: 103.05 6.74
User calls: 1,181.01 77.29
Parses: 252.17 16.50
Hard parses: 22.01 1.44
Sorts: 22.95 1.50
Logons: 0.40 0.03
Executes: 349.55 22.88
Transactions: 15.28

% Blocks changed per Read: 1.70 Recursive Call %: 46.94
Rollback per transaction %: 4.54 Rows per Sort: 989.63

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 99.97
Buffer Hit %: 94.39 In-memory Sort %: 99.70
Library Hit %: 95.54 Soft Parse %: 91.27
Execute to Parse %: 27.86 Latch Hit %: 99.65
Parse CPU to Parse Elapsd %: 4.27 % Non-Parse CPU: 88.41

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 93.30 92.03
% SQL with executions>1: 61.13 57.51
% Memory for SQL w/exec>1: 51.23 49.54

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 231,787 18,490 49.24
direct path write 35,679 7,706 20.52
CPU time 3,496 9.31
db file scattered read 461,152 2,755 7.34
direct path read 38,931 1,672 4.45
-------------------------------------------------------------
Wait Events for DB: XXXXX Instance: XXXXX Snaps: 185 -186
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
latch free 231,787 144,912 18,490 80 9.7
direct path write 35,679 0 7,706 216 1.5
db file scattered read 461,152 0 2,755 6 19.3
direct path read 38,931 0 1,672 43 1.6
log file sync 8,576 602 1,299 151 0.4
db file sequential read 293,444 0 723 2 12.3
log file parallel write 31,914 0 342 11 1.3
SQL*Net more data to client 511,438 0 338 1 21.4
log buffer space 547 47 233 426 0.0
buffer busy waits 2,184 10 206 94 0.1
db file parallel write 398 0 154 387 0.0
LGWR wait for redo copy 4,498 2,472 47 11 0.2
log file switch completion 84 6 34 400 0.0
control file parallel write 607 0 28 46 0.0
library cache pin 65 0 9 131 0.0
async disk IO 198 0 5 27 0.0
row cache lock 102 0 5 45 0.0
log file sequential read 156 0 4 26 0.0
wait list latch free 152 0 4 23 0.0
enqueue 2 0 2 920 0.0
ARCH wait on ATTACH 19 0 1 77 0.0
SQL*Net break/reset to clien 784 0 1 2 0.0
log file single write 24 0 0 13 0.0
db file parallel read 12 0 0 20 0.0
control file sequential read 565 0 0 0 0.0
ARCH wait on SENDREQ 19 0 0 6 0.0
ARCH wait on DETACH 19 0 0 1 0.0
buffer deadlock 1 1 0 0 0.0
SQL*Net message from client 1,848,570 0 96,546 52 77.4
jobq slave wait 1,868 944 4,610 2468 0.1
SQL*Net message to client 1,848,567 0 26 0 77.4
SQL*Net more data from clien 3,059 0 0 0 0.1
-------------------------------------------------------------
Background Wait Events for DB: XXXXX Instance: XXXXX Snaps: 185 -186
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file parallel write 31,924 0 343 11 1.3
db file parallel write 398 0 154 387 0.0
LGWR wait for redo copy 4,498 2,472 47 11 0.2
latch free 447 200 36 80 0.0
control file parallel write 607 0 28 46 0.0
log buffer space 70 3 19 278 0.0
async disk IO 198 0 5 27 0.0
buffer busy waits 20 0 5 239 0.0
log file sequential read 156 0 4 26 0.0
ARCH wait on ATTACH 19 0 1 77 0.0
db file scattered read 128 0 1 11 0.0
direct path write 78 0 1 18 0.0
db file sequential read 125 0 1 5 0.0
log file single write 24 0 0 13 0.0
control file sequential read 523 0 0 0 0.0
wait list latch free 4 0 0 37 0.0
ARCH wait on SENDREQ 19 0 0 6 0.0
direct path read 78 0 0 1 0.0
ARCH wait on DETACH 19 0 0 1 0.0
rdbms ipc message 13,358 1,340 8,259 618 0.6
smon timer 232 0 1,530 6593 0.0
pmon timer 1,312 466 1,493 1138 0.1
-------------------------------------------------------------
SQL ordered by Gets for DB: XXXXX Instance: XXXXX Snaps: 185 -186
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
1,857,990 1 1,857,990.0 5.6 11.11 181.40 3248088983
Module: dllhost.exe
select uRef || ' (#' || to_char(hMy - 1200000000) || ')' from tr
ans where iType = 12 and bOpen = -1 and (hmy in (select t.hParen
t2 from trans t, detail d where t.hParent2 = trans.hMy and t.hMy
= d.hInvOrRec and (d.hProp IN (2712, 2713, 2714, 2715, 2716, 36
77, 2717, 2718, 2721, 2724, 2859, 4054, 3466, 609, 985, 934, 922

676,302 108,327 6.2 2.0 27.00 430.48 3273774910
SELECT MAX(HMY) FROM H8SUMM WHERE HTENANT = :B2 AND NVL(BMASTERR
ECD, 0) = 0 AND INSTR(:B1 , SACTIONTYPE2A) <> 0 AND TO_CHAR(DTEF
FECTIVE2B, 'YYYYMMDD') || TO_CHAR(NVL(ICORRECTIONNUM,0)) = ( SEL
ECT MAX(TO_CHAR(DTEFFECTIVE2B, 'YYYYMMDD') || TO_CHAR(NVL(ICORRE
CTIONNUM,0))) FROM H8SUMM WHERE HTENANT = :B2 AND NVL(BMASTERREC


-------------------------------------------------------------
Instance Activity Stats for DB: XXXXX Instance: XXXXX Snaps: 185 -186

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 349,570 223.5 14.6
CPU used when call started 338,828 216.6 14.2
CR blocks created 3,438 2.2 0.1
Cached Commit SCN referenced 0 0.0 0.0
Commit SCN cached 0 0.0 0.0
DBWR buffers scanned 81,453 52.1 3.4
DBWR checkpoint buffers written 8,958 5.7 0.4
DBWR checkpoints 6 0.0 0.0
DBWR free buffers found 72,997 46.7 3.1
DBWR lru scans 78 0.1 0.0
DBWR make free requests 79 0.1 0.0
DBWR revisited being-written buff 11 0.0 0.0
DBWR summed scan depth 81,453 52.1 3.4
DBWR transaction table writes 75 0.1 0.0
DBWR undo block writes 3,686 2.4 0.2
DFO trees parallelized 0 0.0 0.0
PX local messages recv'd 0 0.0 0.0
PX local messages sent 0 0.0 0.0
Parallel operations downgraded 25 0 0.0 0.0
SQL*Net roundtrips to/from client 1,845,799 1,180.2 77.2
SQL*Net roundtrips to/from dblink 0 0.0 0.0
active txn count during cleanout 4,323 2.8 0.2
background checkpoints completed 6 0.0 0.0
background checkpoints started 6 0.0 0.0
background timeouts 1,802 1.2 0.1
branch node splits 0 0.0 0.0
buffer is not pinned count 21,749,100 13,906.1 910.1
buffer is pinned count 13,628,581 8,713.9 570.3
bytes received via SQL*Net from c 88,199,894 56,393.8 3,690.8
bytes received via SQL*Net from d 0 0.0 0.0
bytes sent via SQL*Net to client 1,408,176,981 900,368.9 58,926.9
bytes sent via SQL*Net to dblink 0 0.0 0.0
calls to get snapshot scn: kcmgss 1,345,109 860.0 56.3
calls to kcmgas 47,854 30.6 2.0
calls to kcmgcs 2,039 1.3 0.1
change write time 34,940 22.3 1.5
cleanout - number of ktugct calls 4,577 2.9 0.2
cleanouts and rollbacks - consist 1,778 1.1 0.1
cleanouts only - consistent read 307 0.2 0.0
cluster key scan block gets 35,092 22.4 1.5
cluster key scans 19,142 12.2 0.8
commit cleanout failures: block l 747 0.5 0.0
commit cleanout failures: buffer 5 0.0 0.0
commit cleanout failures: callbac 5 0.0 0.0
commit cleanout failures: cannot 37 0.0 0.0
commit cleanout failures: hot bac 0 0.0 0.0
commit cleanouts 87,684 56.1 3.7
commit cleanouts successfully com 86,890 55.6 3.6
commit txn count during cleanout 4,942 3.2 0.2
consistent changes 195,457 125.0 8.2
consistent gets 32,790,808 20,966.0 1,372.2
consistent gets - examination 8,443,878 5,398.9 353.3
current blocks converted for CR 60 0.0 0.0
cursor authentications 18,690 12.0 0.8
data blocks consistent reads - un 184,284 117.8 7.7
db block changes 563,198 360.1 23.6
Instance Activity Stats for DB: XXXXX Instance: XXXXX Snaps: 185 -186

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
db block gets 426,840 272.9 17.9
deferred (CURRENT) block cleanout 37,148 23.8 1.6
dirty buffers inspected 78 0.1 0.0
enqueue conversions 640 0.4 0.0
enqueue deadlocks 0 0.0 0.0
enqueue releases 130,800 83.6 5.5
enqueue requests 130,810 83.6 5.5
enqueue timeouts 4 0.0 0.0
enqueue waits 2 0.0 0.0
exchange deadlocks 1 0.0 0.0
execute count 546,702 349.6 22.9
free buffer inspected 12,419 7.9 0.5
free buffer requested 1,872,044 1,197.0 78.3
hot buffers moved to head of LRU 39,683 25.4 1.7
immediate (CR) block cleanout app 2,085 1.3 0.1
immediate (CURRENT) block cleanou 1,906 1.2 0.1
index fast full scans (full) 1,513 1.0 0.1
index fast full scans (rowid rang 0 0.0 0.0
index fetch by key 6,013,423 3,844.9 251.6
index scans kdiixs1 5,187,054 3,316.5 217.1
leaf node 90-10 splits 56 0.0 0.0
leaf node splits 119 0.1 0.0
logons cumulative 618 0.4 0.0
messages received 16,479 10.5 0.7
messages sent 16,479 10.5 0.7
no buffer to keep pinned count 0 0.0 0.0
no work - consistent read gets 18,451,850 11,797.9 772.1
opened cursors cumulative 375,051 239.8 15.7
parse count (failures) 910 0.6 0.0
parse count (hard) 34,425 22.0 1.4
parse count (total) 394,387 252.2 16.5
parse time cpu 40,504 25.9 1.7
parse time elapsed 949,246 606.9 39.7
physical reads 2,022,993 1,293.5 84.7
physical reads direct 159,729 102.1 6.7
physical reads direct (lob) 0 0.0 0.0
physical writes 161,170 103.1 6.7
physical writes direct 143,872 92.0 6.0
physical writes non checkpoint 158,108 101.1 6.6
pinned buffers inspected 1,696 1.1 0.1
prefetch clients - default 2 0.0 0.0
prefetched blocks 1,108,825 709.0 46.4
prefetched blocks aged out before 0 0.0 0.0
process last non-idle time 1,562 1.0 0.1
queries parallelized 0 0.0 0.0
recovery array read time 0 0.0 0.0
recovery array reads 0 0.0 0.0
recovery blocks read 0 0.0 0.0
recursive calls 1,633,893 1,044.7 68.4
recursive cpu usage 37,457 24.0 1.6
redo blocks written 243,227 155.5 10.2
redo buffer allocation retries 585 0.4 0.0
redo entries 282,608 180.7 11.8
redo log space requests 86 0.1 0.0
redo log space wait time 3,445 2.2 0.1
redo ordering marks 0 0.0 0.0
Latch Activity for DB: XXXXX Instance: XXXXX Snaps: 185 -186
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
->"Pct Misses" for both should be very close to 0.0

Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
Consistent RBA 16,006 0.0 0 0
FAL request queue 12 0.0 0 0
FIB s.o chain latch 84 0.0 0 0
FOB s.o list latch 4,874 0.0 1.0 0 0
SQL memory manager latch 1 0.0 0 511 0.0
SQL memory manager worka 199,851 0.0 0.8 5 0
active checkpoint queue 943 0.0 0 0
alert log latch 32 0.0 0 0
archive control 70 0.0 0 0
archive process latch 67 4.5 0.0 0 0
cache buffer handles 28,611 0.0 0 0
cache buffers chains 61,536,340 0.1 3.4 0 3,249,271 0.4
cache buffers lru chain 28,340 2.9 0.5 99 4,101,010 2.2
channel handle pool latc 992 0.0 0 0
channel operations paren 3,838 0.0 0.0 0 0
checkpoint queue latch 119,324 0.0 0.3 0 17,964 0.0
child cursor hash table 245,426 0.1 0.3 7 0
commit callback allocati 1 0.0 0 0
constraint object alloca 1 0.0 0 0
dml lock allocation 85,042 0.0 0.2 2 0
dummy allocation 1,250 0.0 0 0
enqueue hash chains 262,645 0.0 0.5 10 0
enqueues 378,411 0.2 0.6 67 0
event group latch 309 0.0 0 0
hash table column usage 437 0.9 4.0 0 2,548,839 0.0
job workq parent latch 1 100.0 0.0 0 620 1.5
job_queue_processes para 335 0.0 0 0
ktm global data 232 0.0 0 0
lgwr LWN SCN 16,017 0.0 0.1 0 0
library cache 5,084,839 2.2 0.5 6432 100,934 11.2
library cache load lock 4,888 0.0 0 0
library cache pin 2,853,744 0.1 0.4 283 0
library cache pin alloca 1,614,590 0.1 0.5 216 0
list of block allocation 988 0.0 0 0
loader state object free 794 0.0 0 0
longop free list parent 2,575 0.0 0 187 0.0
message pool operations 637 0.0 0 0
messages 51,139 0.0 0.2 0 0
mostly latch-free SCN 16,054 0.1 0.1 0 0
multiblock read objects 1,278,671 0.2 0.3 173 0
ncodef allocation latch 428 0.0 0 0
object stats modificatio 102 0.0 0 0
post/wait queue 14,689 0.0 0.0 0 9,209 0.0
process allocation 309 0.3 33.0 1 309 0.0
process group creation 631 0.0 0 0
redo allocation 315,286 0.1 0.3 17 0
redo copy 0 0 283,343 1.7
redo writing 50,122 0.0 0.5 0 0
row cache enqueue latch 6,335,779 0.4 0.0 52 0
row cache objects 6,767,248 0.4 0.2 1386 1,584 0.2
Latch Sleep breakdown for DB: XXXXX Instance: XXXXX Snaps: 185 -186
-> ordered by misses desc

Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
shared pool 4,600,581 127,578 24,417 104263/22273
/988/54/0
library cache 5,084,839 111,775 52,506 62069/47161/
2326/219/0
cache buffers chains 61,536,340 42,486 142,926 0/0/0/0/0
row cache objects 6,767,248 29,574 4,696 24914/4624/3
6/0/0
row cache enqueue latch 6,335,779 22,846 497 22354/487/5/
0/0
library cache pin 2,853,744 3,540 1,364 2178/1360/2/
0/0
multiblock read objects 1,278,671 3,023 775 2249/773/1/0
/0
session idle bit 3,975,458 2,393 1,794 0/0/0/0/0
library cache pin allocati 1,614,590 2,359 1,108 1251/1108/0/
0/0
session allocation 808,491 2,067 803 1264/803/0/0
/0
cache buffers lru chain 28,340 819 424 397/420/2/0/
0
enqueues 378,411 617 354 263/354/0/0/
0
redo allocation 315,286 331 94 239/90/2/0/0
simulator lru latch 47,662 242 208 34/208/0/0/0
child cursor hash table 245,426 160 52 108/52/0/0/0
enqueue hash chains 262,645 108 52 56/52/0/0/0
dml lock allocation 85,042 36 8 28/8/0/0/0
SQL memory manager workare 199,851 24 20 4/20/0/0/0
undo global data 136,739 14 7 7/7/0/0/0
redo writing 50,122 11 6 5/6/0/0/0
messages 51,139 10 2 8/2/0/0/0
mostly latch-free SCN 16,054 9 1 8/1/0/0/0
lgwr LWN SCN 16,017 8 1 7/1/0/0/0
simulator hash latch 1,820,154 7 3 4/3/0/0/0
hash table column usage la 437 4 16 0/0/0/0/0
checkpoint queue latch 119,324 3 1 2/1/0/0/0
sequence cache 34,660 2 2 0/2/0/0/0
user lock 4,199 2 1 1/1/0/0/0
process allocation 309 1 33 0/0/0/1/0
FOB s.o list latch 4,874 1 1 0/1/0/0/0
-------------------------------------------------------------


and Tom said...

If you are a frequent reader of this site or any of my writings... this is a "no brainer".

Hard parses: 22.01


so, you are telling me that in a 26 minute period, there were 34,428 new SQL statements that never ever existed before coming into your system.


This is not possible, not unless someone of course has forgotten to use - well, yes: bind variables...


Here, this looks like a culprit (one of many I am sure)

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
1,857,990 1 1,857,990.0 5.6 11.11 181.40 3248088983
Module: dllhost.exe
select uRef || ' (#' || to_char(hMy - 1200000000) || ')' from tr
ans where iType = 12 and bOpen = -1 and (hmy in (select t.hParen
t2 from trans t, detail d where t.hParent2 = trans.hMy and t.hMy
= d.hInvOrRec and (d.hProp IN (2712, 2713, 2714, 2715, 2716, 36
77, 2717, 2718, 2721, 2724, 2859, 4054, 3466, 609, 985, 934, 922

It does not use binds - and it does "quite a bit of work".


It is really hard to "tune a system" using aggregate information like this; however, I feel very confident in stating the following:

a) the developers have chosen to not use bind variables. This system therefore cannot and will not scale.

b) there is some obvious low hanging fruit in the form of really expensive to execute SQL statements.
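
To make (a) concrete, here is a minimal sketch of the difference (the table and column names are hypothetical, not taken from this system):

-- Literal SQL: every distinct value produces brand new SQL text, so every
-- execution is a hard parse (shared pool / library cache latches each time):
select * from orders where order_id = 1234;
select * from orders where order_id = 5678;

-- Bind variable SQL: one shared statement, parsed once, executed many times:
variable b1 number
exec :b1 := 1234
select * from orders where order_id = :b1;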


Comments

Ram, February 12, 2006 - 5:40 pm UTC

Tom,

I'm a frequent reader of your site and have read all your books.

I'm aware of the hard parses and the consequences of not using bind variables, but my concern is that we have had this issue (hard parses) earlier, yet the system was never so poor in response times.

So is there anything else we have to look at as far as the poor response times are concerned (apart from the hard parses)?
This is third-party software and I want to make sure that the memory structures and the CPUs are sufficient to handle the load (600 concurrent users).

Thanks in advance Tom.

Tom Kyte
February 13, 2006 - 8:01 am UTC

As I said

you have two things that popped out of that report and smacked me right in the forehead:

a) the developers have chosen to not use bind variables. This system therefore
cannot and will not scale.

b) there is some obvious low hanging fruit in the form of really expensive to
execute SQL statements.


There aren't enough CPUs on the PLANET to handle 600 concurrent users with a hard parse problem!!!

You would have to have 601 CPUs before you start seeing "spare CPU" in such a system - the latching that is going on is a killer.



A reader, February 12, 2006 - 8:19 pm UTC

>> This is third-party software and I want to make sure that the memory structures and the CPUs are sufficient to handle the load (600 concurrent users)

Tom has also mentioned numerous times that you cannot make up for an inefficient application by throwing hardware at it.


Tom Kyte
February 13, 2006 - 8:07 am UTC

(and the only thing you can 'prove' with a hard parse problem is that you always need at least one more cpu ;)


Ram, February 12, 2006 - 8:29 pm UTC

Tom,

<quote>
so, you are telling me that in a 26 minute period, there were 34,428 new SQL statements that never ever existed before coming into your system.
</quote>

How did you get 34,428?

Thanks Tom.

Tom Kyte
February 13, 2006 - 8:07 am UTC

I see someone else followed that up...

Doing the math

Adam Musch, February 12, 2006 - 11:27 pm UTC

22.01 hard parses     60 sec
-----------------  *  ------  *  26.07 min  =  34,428
       sec              min
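
The same figure can be reproduced straight from SQL*Plus:

-- 22.01 hard parses/sec * 60 sec/min * 26.07 min of snapshot elapsed time
select round(22.01 * 60 * 26.07) as hard_parses from dual;
-- => 34428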

No performance workaround for hard parsing

Dhar, February 13, 2006 - 12:16 am UTC

In the Top 5 events:
1) latch free - hard parsing
2) db file scattered read - waits related to full table scans; could be a dropped or missing index, so check your indexes
3) db file sequential read - was any code change made in the past two days?

I had/have a similar problem (high latch free). I learnt that hard-coded literals MUST be changed, else we pay for it. Fortunately it is not third-party software, so our developers are working on it (to use bind variables).

Though you have mentioned that you are aware of hard parsing and its consequences, please refer to Tom's explanation on hard parsing (search for 'Another question about bind variables' and read till the end of the thread). This might help you convince the third party to change the code.

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2320123769177


There are lots of misses on the 'library cache' and 'shared pool' latches. The library cache latch is very expensive (one of the main culprits: not using bind variables).


Latch Name                       Requests      Misses      Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
shared pool                     4,600,581     127,578      24,417 104263/22273
                                                                  /988/54/0
library cache                   5,084,839     111,775      52,506 62069/47161/

Please refer to Tom's response to one of the questions on high CPU utilization (CPU idle time reaches 0 on an 8-CPU system with 8 GB RAM, with only 150 concurrent users - all because of HARD PARSING).

http://asktom.oracle.com/pls/ask/f?p=4950:61:13483060899627307688::::P61_ID:16470445503580#56782670186170
Search for 'High CPU and Log file sync, February 05, 2006'.


Thanks,
Dhar

Latch Free in Top 5 Timed Events

Manoj, October 02, 2008 - 11:56 pm UTC

Hi Tom,
Thanks for your time. Your answers always enrich me manifold.

Our Database Version : 9.2.0.6
Number of CPUs : 16

Can you please explain the following?
a) We have 70,685 CPU seconds available, and out of that the latch free wait event consumed 61,841 seconds? Is that too much? Is latch free making other processes starve for CPU?
b) What does "timeout" mean in the case of a latch event?
c) Is the very high latch free event due to excessive parsing (parses per second = 62.69)? Is that too bad? We have an OLTP system.
d) The "SQL ordered by Parse Calls" section suggests that some SQL is parsed as many times as it is executed. So if we reduce the excessive parsing, will the latch free waits be reduced?

I have included excerpts from the Statspack report below.


Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 2848 03-Oct-08 03:00:07 ####### .0
End Snap: 2850 03-Oct-08 03:15:01 ####### .0
Elapsed: 14.90 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 1,248M Std Block Size: 8K
Shared Pool Size: 864M Log Buffer: 2,048K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 46,176.68 14,479.81
Logical reads: 76,382.53 23,951.59
Block changes: 141.04 44.23
Physical reads: 693.28 217.39
Physical writes: 13.49 4.23
User calls: 204.35 64.08
Parses: 62.69 19.66
Hard parses: 0.02 0.01
Sorts: 4.26 1.33
Logons: 0.08 0.02
Executes: 63.16 19.80
Transactions: 3.19

% Blocks changed per Read: 0.18 Recursive Call %: 19.16
Rollback per transaction %: 0.00 Rows per Sort: 68.57

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: 99.11 In-memory Sort %: 99.95
Library Hit %: 99.99 Soft Parse %: 99.97
Execute to Parse %: 0.74 Latch Hit %: 76.97
Parse CPU to Parse Elapsd %: 18.31 % Non-Parse CPU: 99.98

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 100.00 100.00
% SQL with executions>1: 98.61 98.61
% Memory for SQL w/exec>1: 91.57 91.51
Wait Events for DB: XXXX Instance: XXXX Snaps: 2848 -2850
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
latch free 2,245,529 2,242,341 61,841 28 787.6
db file scattered read 113,629 0 254 2 39.9
db file sequential read 43,775 0 36 1 15.4
buffer busy waits 4,516 3 29 7 1.6

SQL ordered by Gets for DB: NB04C Instance: nb04c02 Snaps: 2848 -2850
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
14,934,338 283 52,771.5 21.9 2034.75 20346.01 4193765259
Module: java@a02 (TNS V1-V3)
SELECT T103.x_cpd,T103.objid FROM table_site_part T100,table_ord
er_line T101,table_order_action T102,table_contract T103 WHERE(
T100.objid = :P1)AND(T101.affected2site_part=T100.objid) AND (T
101.order_line2order_action=T102.objid) AND (T102.order_action2c
ontract=T103.objid)

10,365,979 149 69,570.3 15.2 1458.29 14893.43 2536153886
Module: java@a02 (TNS V1-V3)
SELECT T103.x_version,T103.objid FROM table_site_part T100,table
_order_line T101,table_order_action T102,table_contract T103 WHE
RE( T100.objid = :P1)AND(T101.affected2site_part=T100.objid) AN
D (T101.order_line2order_action=T102.objid) AND (T102.order_acti
on2contract=T103.objid)

SQL ordered by Parse Calls for DB: NB04C Instance: nb04c02 Snaps: 2848 -2850
-> End Parse Calls Threshold: 1000

% Total
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
6,461 6,461 11.53 982153273
Module: java@a02 (TNS V1-V3)
SELECT T101.attribute_value,T101.attribute_datatype,T101.objid F
ROM table_site_part T100,table_fa_site_part T101 WHERE ( T101.at
tribute_name = :P1) AND ( T100.objid = :P2)AND (T100.objid=T10
1.fa_site_part2site_part)

4,073 4,073 7.27 2303038669
Module: java@a01 (TNS V1-V3)
SELECT T1.objid,T1.name,T1.type,T1.label,T1.description,T1.defau
lt_value,T1.is_required,T1.expr_param2expr FROM table_expr_param
T1 WHERE T1.expr_param2expr=:P16



Cheers.







Tom Kyte
October 03, 2008 - 7:12 am UTC

You would have to go to the latch section in the report to see what the latch free waits were for.

Basically, the rudimentary algorithm for getting a latch is:
Loop
 for I in 1 .. 2000
 loop
  try to get latch
  if got latch, return
  if I = 1 then misses=misses+1
 end loop
 INCREMENT WAIT COUNT
 sleep
 Add WAIT TIME
End loop;



Note that you only 'wait' after spinning 2000 times (this is called a spin lock, a common implementation).

So,
                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
latch free                      2,245,529  2,242,341     61,841     28    787.6


Means that 2.2 million times - you spun 2000 times and then waited. That is not really good.
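
On a running instance you can see the same miss/sleep picture live in V$LATCH - a quick sketch (ordering by sleeps is just a convenient way to surface the painful ones):

-- which latches are actually being slept (waited) on?
select name, gets, misses, sleeps, spin_gets
  from v$latch
 order by sleeps desc;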

I doubt in this case it is parse related - I'm thinking more likely cache buffers chains related. I say that because of things like this:

                                                     CPU      Elapsd
  Buffer Gets   Executions  Gets per Exec %Total  Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     14,934,338          283       52,771.5   21.9  2034.75  20346.01 4193765259
Module: java@a02 (TNS V1-V3)
SELECT T103.x_cpd,T103.objid FROM table_site_part T100,table_ord
er_line T101,table_order_action T102,table_contract T103 WHERE(
T100.objid =  :P1)AND(T101.affected2site_part=T100.objid) AND (T
101.order_line2order_action=T102.objid) AND (T102.order_action2c
ontract=T103.objid)


that is a TON of logical IO (53k per execution!) for what appears to be a rather simple query

And a query that just doesn't look right - why a four-table join when all you need comes from one table? Not sure - but you might rip apart the high load SQL, validate it for 'correctness' and look to tune it.

But go to the latch report and see what you are actually waiting on. And bear in mind - latching = locking = serialization. The more latching you do, the less you scale. You are not executing lots of statements (60-some a second is pretty small in the grand scheme of things), but these statements seem to be doing a ton of work.
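
As a concrete starting point, something along these lines against V$LATCH_CHILDREN (a sketch; the ROWNUM cutoff is arbitrary) shows which cache buffers chains child latches are hottest:

-- Hottest "cache buffers chains" child latches: heavy sleeps here usually
-- point at a small set of very hot blocks visited by the top-gets SQL.
select *
  from (select addr, child#, gets, misses, sleeps
          from v$latch_children
         where name = 'cache buffers chains'
         order by sleeps desc)
 where rownum <= 10;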

...Latch Free - in Top 5 timed Event

Manoj, October 03, 2008 - 1:44 pm UTC

Hi Tom,
With your answers, Oracle's complex architecture looks very easy. Kudos!!!

<quote>
....You would have to go to the latch section in the report to see what the latch free waits were for.
...</quote>
Here are the excerpts from Statspack:
-> ordered by misses desc

Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains 143,395,453 35,700,824 2,235,280 0/0/0/0/0


Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------

cache buffers chains 143,395,453 24.9 0.1 0 1,769,497 0.2

So the statement made by you:
"...I doubt in this case it is parse related - I'm thinking more likely cache buffers chains related..."
is very correct.

I want to give some background on the Statspack report I sent earlier.

- We did a schema refresh (using export/import) of the test environment from live.
- After the import we did not gather the schema stats. optimizer_mode=first_rows. (So it was breakfast without coffee!)
- We ran the batch job (it started at 0300 hrs and completed at 0600 hrs).
- The Statspack report above, covering 03:00:07 to 03:15:01, is the one I sent you this morning.
- We found the "latch free" event as the top event.
- I monitored the CPU usage during the above period and it was touching 90% throughout.

After the above test we gathered the stats using dbms_stats.gather_schema_stats and ran the same batch job again.
It finished in 30 mins.
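
For reference, the gather ran along these lines (the schema name and options here are illustrative, not the exact command):

begin
  dbms_stats.gather_schema_stats(
    ownname => 'APP_OWNER',   -- hypothetical schema name
    cascade => true);         -- gather index statistics as well
end;
/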
Statspack for the same is given below.


Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 2933 03-Oct-08 09:20:05 ####### .0
End Snap: 2936 03-Oct-08 09:50:06 ####### .0
Elapsed: 30.02 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 1,248M Std Block Size: 8K
Shared Pool Size: 864M Log Buffer: 2,048K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 485,058.83 20,063.64
Logical reads: 118,133.41 4,886.39
Block changes: 1,322.53 54.70
Physical reads: 4,316.64 178.55
Physical writes: 59.61 2.47
User calls: 1,635.73 67.66
Parses: 471.05 19.48
Hard parses: 0.39 0.02
Sorts: 22.03 0.91
Logons: 0.03 0.00
Executes: 471.56 19.51
Transactions: 24.18

% Blocks changed per Read: 1.12 Recursive Call %: 21.18
Rollback per transaction %: 0.00 Rows per Sort: 14.92
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.60 Redo NoWait %: 100.00
Buffer Hit %: 96.36 In-memory Sort %: 99.99
Library Hit %: 99.97 Soft Parse %: 99.92
Execute to Parse %: 0.11 Latch Hit %: 99.74
Parse CPU to Parse Elapsd %: 30.84 % Non-Parse CPU: 99.14

"sp_2933_2936_0920_to_0950_after_stats_gather.lst" [Read only] 2319 lines, 140944 characters
Parse CPU to Parse Elapsd %: 30.84 % Non-Parse CPU: 99.14

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 100.00 100.00
% SQL with executions>1: 99.07 98.98
% Memory for SQL w/exec>1: 93.39 93.23

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file scattered read 1,728,892 52,257 61.00
db file sequential read 1,598,270 14,477 16.90
buffer busy waits 773,454 11,723 13.68
CPU time 3,503 4.09
latch free 246,787 1,851 2.16
-------------------------------------------------------------
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file scattered read 1,728,892 0 52,257 30 39.7
db file sequential read 1,598,270 0 14,477 9 36.7
buffer busy waits 773,454 67 11,723 15 17.8
latch free 246,787 143,862 1,851 8 5.7
....
SQL ordered by Gets for DB: NB04C Instance: nb04c02 Snaps: 2933 -2936
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
1,528,649,874 126,607 12,074.0 718.5 ######## ######### 4071151282
Module: java@cbh75a01 (TNS V1-V3)
SELECT T1.objid,T1.dev,T1.x_e2edata,T1.x_create_timestamp,T1.x_u
pdate_timestamp,T1.x_e2edata2proc_inst FROM table_x_e2e_data T1
INNER JOIN table_proc_inst T2 ON T1.x_e2edata2proc_inst = T2.ob
jid WHERE ( T2.objid = :P16 )

136,749,594 2,300 59,456.3 64.3 451.09 626.63 2999358581
Module: java@cbh75a02 (TNS V1-V3)
SELECT T102.objid,T102.objid FROM table_site_part T100,table_ord
er_action T101,table_order_line T102 WHERE( T100.objid = :P1)AN
D(T101.order_action_svc2site_part=T100.objid) AND (T102.order_li
ne2order_action=T101.objid)

...
SQL ordered by Parse Calls for DB: NB04C Instance: nb04c02 Snaps: 2933 -2936
-> End Parse Calls Threshold: 1000

% Total
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
2,357,411 2,357,411 277.88 2445008564
Module: java@cbh75a01 (TNS V1-V3)
SELECT T103.attribute_value,T103.attribute_datatype,T103.objid F
ROM table_order_action T100,table_order_line T101,table_site_par
t T102,table_fa_site_part T103 WHERE ( T103.attribute_name = :P
1) AND ( T100.objid = :P2)AND (T102.objid=T103.fa_site_part2sit
e_part) AND (T101.order_line2order_action=T100.objid) AND (T102.

691,544 691,543 81.52 2400981116
....
Latch Activity for DB: XXX
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
cache buffers chains 264,223,293 0.2 0.3 0 22,377,226 0.2


My questions:
a) Gathering stats reduced the batch processing time from 3 hrs to 30 mins. How has it caused the "latch free" event etc. to improve?
b) CPU usage touched 50% during the period the batch job ran (30 mins). Is that because of the low latch free waits?

Then, to improve it further, I started looking into the SQL as pointed out by you:
"...And a query that just doesn't look right - why a four-table join when all you need comes from one table? Not sure - but you might rip apart the high load SQL, validate it for 'correctness' and look to tune it..."

I will take this up with my development team.

Some table indexes were missing and I created them.
- We ran the same batch job again and this time it completed in 7 mins.
Sorry, Tom, for making my question longer and longer.
Here are some excerpts from the Statspack report taken during the period we ran the batch job after creating the indexes.
Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 3026 03-Oct-08 15:30:06 ####### .0
End Snap: 3027 03-Oct-08 15:40:06 ####### .0
Elapsed: 10.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 1,248M Std Block Size: 8K
Shared Pool Size: 864M Log Buffer: 2,048K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,339,099.09 20,839.31
Logical reads: 21,361.90 332.44
Block changes: 3,659.97 56.96
Physical reads: 3,776.71 58.77
Physical writes: 147.81 2.30
User calls: 4,401.12 68.49
Parses: 1,242.59 19.34
Hard parses: 0.48 0.01
Sorts: 59.70 0.93
Logons: 0.03 0.00
Executes: 1,243.06 19.34
Transactions: 64.26

% Blocks changed per Read: 17.13 Recursive Call %: 21.55
Rollback per transaction %: 0.00 Rows per Sort: 8.23

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.87 Redo NoWait %: 100.00
Buffer Hit %: 82.50 In-memory Sort %: 100.00
Library Hit %: 99.99 Soft Parse %: 99.96
Execute to Parse %: 0.04 Latch Hit %: 99.84
Parse CPU to Parse Elapsd %: 59.21 % Non-Parse CPU: 96.39

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 100.00 100.00
% SQL with executions>1: 98.96 98.70
% Memory for SQL w/exec>1: 93.03 92.01

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file scattered read 402,921 877 44.62
CPU time 675 34.32
db file sequential read 232,035 154 7.82
log file sync 38,452 84 4.28
latch free 9,510 48 2.44
-------------------------------------------------------------

Wait Events for DB:
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file scattered read 402,921 0 877 2 10.5
db file sequential read 232,035 0 154 1 6.0
log file sync 38,452 0 84 2 1.0
latch free 9,510 3,312 48 5 0.2
....
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
816,486,213 1,095,202 745.5 ###### ######## 60340.28 3629242874
Module: java@cbh75a02 (TNS V1-V3)
SELECT T101.name,T101.objid FROM table_site_part T100,table_orde
r_action T101 WHERE( T100.objid = :P1)AND(T101.order_action_svc
2site_part=T100.objid)

190,058,671 41,116 4,622.5 ###### ######## ######### 4193765259
Module: java@cbh75a02 (TNS V1-V3)
SELECT T103.x_cpd,T103.objid FROM table_site_part T100,table_ord
er_line T101,table_order_action T102,table_contract T103 WHERE(
T100.objid = :P1)AND(T101.affected2site_part=T100.objid) AND (T
101.order_line2order_action=T102.objid) AND (T102.order_action2c
ontract=T103.objid)

Now all top 5 got improved a lot!

Cheers.

Tom Kyte
October 06, 2008 - 2:34 pm UTC

My questions:
a) Gathering stats reduced the batch processing time from 3 hrs to 30 mins. How has it caused the "latch free" event etc. to improve?
b) CPU usage touched 50% during the period the batch job ran (30 mins). Is that because of the low latch free waits?



a) better plans as a result of having statistics resulted in fewer logical IO's, since logical IO's require the cache buffers chain latch - you reduced your need for that latch..... When you require a latch less often, you reduce contention for that latch....

b) I'm not sure where you are going there or why you are asking? Do you think this is bad - it sounds bad to me, I'd prefer it to be much closer to 100%, wouldn't you?
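
Coming back to (a): a quick way to watch the logical I/O a single statement does before and after gathering statistics is SQL*Plus autotrace (the query below is only a placeholder):

set autotrace traceonly statistics

-- run the statement of interest; "consistent gets" in the statistics
-- section is the logical I/O, and each of those gets takes the cache
-- buffers chains latch
select count(*) from all_objects;

set autotrace off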




Manoj, October 07, 2008 - 2:59 am UTC

Hi Tom,
......b) I'm not sure where you are going there or why you are asking? Do you think this is bad - it sounds bad to me, I'd prefer it to be much closer to 100%, wouldn't you?

Questions:
i) Does it mean that we should keep CPU utilisation closer to 100%?
ii) If CPU utilisation is 50%, does it mean that we have "spare" CPUs?
iii) What is your suggestion if the CPUs are underutilised?

Many Thanks.

Tom Kyte
October 08, 2008 - 7:51 pm UTC

i) in a batch process, sure, I'd want the machine close to fully utilized. You cannot put CPU in the bank and use it later - use it, or lose it.


ii) you have cpu cycles that are not being used, sure.

iii) use more of them - perhaps increase the degree of parallelism - assuming your batch process is parallel capable
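
Purely as an illustration (whether it applies depends on whether your batch SQL is parallel capable), asking for a higher degree of parallelism on a query can look like this - the table name is hypothetical:

-- hypothetical: request degree 8 for one of the batch queries
select /*+ parallel(t, 8) */ count(*)
  from big_table t;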
