You Asked

Hi Team,

Currently we have seen huge spike for our database (non-RAC ) . This spike was due to log file sync that was around 900 session , blocker of this sessions was log file parallel write .

whereas their was no blocker for log file parallel write. From top queries we were able to find some DMl Statement was happening .
in one of top insert statement their was also enq:tx index contention we can see .

From our analysis hidden parameter _adaptive_log file sync was false , from awr background process details , time spent in log file sync w.r.t log file parallel write is low .

From ASH report this is top wait event :-

Event % Event P1 Value, P2 Value, P3 Value % Activity Parameter 1 Parameter 2 Parameter 3
log file sync 87.87 "4294967295","901430137","0" 0.14 buffer# sync scn NOT DEFINED
buffer busy waits 4.26 "2946","765779","1" 0.81 file# block# class#
enq: TX - index contention 2.21 "1415053316","5832725","8818981" 0.27 name|mode usn<<16 | slot sequence
direct path read 1.94 "898","301442","126" 0.01 file number first dba block cnt
db file sequential read 1.28 "71","180047","1" 0.01 file# block# blocks

we are not getting any conclusive details regarding the cause of log file sync .

From storage end we have checked and found some of device avg.service time is beyond 1000+ .but storage team is denied about any issue related to storage or server end .

Request you to show some light on this to carry out my further analysis for RCA.

and Connor said...

'log file sync' is sessions waiting for the log writer to respond. So from a session's point of view, it is waiting for a commit to complete, but it is *not* a reflection on the actual writing of redo to the redo log. For example, I could easily have 1000+ sessions all waiting on LGWR if LGWR was stalled in some way.

For that, we look at events for LGWR itself and see what *it* was waiting on. If you are seeing 'log file parallel write' at anything over a couple of milliseconds on modern hardware, then it suggests something is wrong with storage. Now that is not *necessarily* a SAN issue, but it is definitely a storage issue from the perspective of the database server, so it can be anything in the path between it and storage - OS, file system, network, fibre, switches, SAN CPU, SAN cache etc...

"From storage end we have checked and found some of device avg.service time is beyond 1000+" is cause for concern *if* this is related to where the LGWR is trying to write.

If log file parallel write is low *for the lgwr*, and when I say low, I mean check v$event_histogram, not just the average, the something is possibly causing the log file sync. Common causes:

- overloaded server
- excessive commit frequency

Is this answer out of date? If it is, please let us know via a Comment

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Administration

performance tunning

Breadcrumb

Question and Answer

You Asked

and Connor said...

More to Explore

Administration