Question and Answer

Tom Kyte

Thanks for the question, Sharon.

Asked: July 18, 2007 - 1:11 pm UTC

Last updated: July 13, 2011 - 1:54 pm UTC

Version: 9.2.0

Viewed 10K+ times

You Asked

Hi Tom.

I have a question regarding log file sync waits. From the statspack reports, it is always in the Top 5 Timed Events.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                        2,446    75.27
log file sync                                      12,883         212     6.52
latch free                                         17,869         182     5.60
log file parallel write                            35,538         174     5.34
SQL*Net break/reset to client                       5,544          80     2.46
          -------------------------------------------------------------
Wait Events for DB: PROD01  Instance: PROD01  Snaps: 222 -223
                                                                  Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                      12,883         59        212     16      0.5
In a previous question ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:16470446503580 ) you stated that there might not be a performance benefit from tuning these waits.

I submitted my stats report to Anjo Kolk's website, OraPerf.com. The analysis said to reduce the number of processes in the init.ora to reduce some of the waits. "Another option is to decrease the init.ora parameter "processes". The LGWR needs to scan all processes to find each process that is waiting for the commit to be written. The current value for processes is 2400. Try setting it close to the number of process that you really need." I have the processes set at 2400 but we usually max out at around 900.

My question is: why would reducing the number of processes in the init.ora reduce the log file sync waits? Are 2400 processes created at db startup? Are there really more processes than users logged in?

We are running 9.2.0.7 with the E-Business Suite.

Thanks for your insight,
Sharon

and Tom said...

well, the average wait is 0.016 seconds (212 seconds / 12,883 waits) - one or two one-hundredths of a second. It would seem that is the time to wait for the IO to complete, much more than anything else.

I just did a quick and dirty example with processes = 100 / 2000 (2000 is the max on my little test machine)

Did 100,000 insert/commits.

connect / as sysdba
alter system set processes=100 scope=spfile;
startup force;

connect /
drop table t;
create table t ( x int );
alter session set events '10046 trace name context forever, level 12';
begin
    for i in 1 .. 100000
    loop
        insert into t values (i);
        commit work write immediate wait;
    end loop;
end;
/
disconnect
!tkprof `ls -t $ORACLE_HOME/admin/$ORACLE_SID/udump/*ora_*.trc | head -1` ./100.prf


connect / as sysdba
alter system set processes=2000 scope=spfile;
startup force;

connect /
drop table t;
create table t ( x int );
alter session set events '10046 trace name context forever, level 12';
begin
    for i in 1 .. 100000
    loop
        insert into t values (i);
        commit work write immediate wait;
    end loop;
end;
/
disconnect
!tkprof `ls -t $ORACLE_HOME/admin/$ORACLE_SID/udump/*ora_*.trc | head -1` ./2000.prf

!grep 'log file sync' 100.prf 2000.prf


first time I ran it:

100.prf: log file sync 99934 0.05 32.77
100.prf: log file sync 99934 0.05 32.77
2000.prf: log file sync 99933 0.04 36.74
2000.prf: log file sync 99933 0.04 36.74

so, a small difference, but that was only one run (statistically not too meaningful)

100.prf: log file sync 99952 0.06 35.00
100.prf: log file sync 99952 0.06 35.00
2000.prf: log file sync 99928 0.37 36.85
2000.prf: log file sync 99928 0.37 36.85


and that was the second time. I'm skeptical that it would give you any huge boost, anything "meaningful"

the best ways to reduce log file sync waits are:

a) ensure lgwr has free and clear access to the devices the logs are on (make IO faster)

b) commit less frequently (you wait for a log file sync when you commit)
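
For (b), a minimal sketch using the same test table as the example above: moving the commit outside the loop means the session waits on one log file sync instead of 100,000 of them (only an option if the application can treat the whole batch as a single transaction).

begin
    for i in 1 .. 100000
    loop
        insert into t values (i);
    end loop;
    commit;   -- one commit = one log file sync wait, instead of one per row
end;
/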


Comments

log file syncs

Laurie Murray, August 13, 2007 - 2:56 pm UTC

Hi, we have a problem that happens sporadically...not too often, but has a high impact when it does happen. The system waits on log file syncs and during this time users cannot even login! Our redo logs are on RAID 10 devices and are also multiplexed at the Oracle level. Since they are multiplexed, perhaps I should have our sysadmin change the devices to RAID0? Do you think that would be faster? Perhaps mirroring redo logs at the Oracle level and also at the RAID level is too redundant and hurts performance? Here is a statspack report from when the problem last happened....massive log file sync waits!

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
xx XX xx 1 9.2.0.7.0 NO XX

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 142 09-Aug-07 13:21:58 125 2.0
End Snap: 143 09-Aug-07 13:40:02 123 2.3
Elapsed: 18.07 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 9,824M Std Block Size: 8K
Shared Pool Size: 2,048M Log Buffer: 151,229K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 12,015.17 18,579.80
Logical reads: 30,746.07 47,544.57
Block changes: 28.13 43.49
Physical reads: 105.26 162.77
Physical writes: 6.16 9.52
User calls: 185.50 286.85
Parses: 49.08 75.89
Hard parses: 0.56 0.86
Sorts: 5.34 8.25
Logons: 0.11 0.18
Executes: 58.04 89.75
Transactions: 0.65

% Blocks changed per Read: 0.09 Recursive Call %: 54.19
Rollback per transaction %: 0.00 Rows per Sort: 504.76

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.71 In-memory Sort %: 100.00
Library Hit %: 99.01 Soft Parse %: 98.87
Execute to Parse %: 15.44 Latch Hit %: 99.99
Parse CPU to Parse Elapsd %: 77.17 % Non-Parse CPU: 94.50

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 84.27 84.27
% SQL with executions>1: 61.47 61.98
% Memory for SQL w/exec>1: 48.13 48.60

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 37,991 36,634 86.92
log file parallel write 822 1,519 3.60
db file parallel write 115 1,003 2.38
enqueue 323 920 2.18
db file sequential read 9,535 863 2.05
-------------------------------------------------------------
Wait Events for DB: F84P Instance: f84p Snaps: 142 -143
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 37,991 37,235 36,634 964 54.2
log file parallel write 822 0 1,519 1848 1.2
db file parallel write 115 0 1,003 8725 0.2
enqueue 323 312 920 2847 0.5
db file sequential read 9,535 0 863 90 13.6
db file scattered read 5,704 0 378 66 8.1
control file parallel write 314 0 131 417 0.4
direct path read 1,072 0 74 69 1.5
buffer busy waits 142 0 13 94 0.2
control file sequential read 475 0 1 3 0.7
latch free 173 11 0 3 0.2
SQL*Net more data to client 3,930 0 0 0 5.6
ARCH wait on SENDREQ 18 0 0 2 0.0
SQL*Net break/reset to clien 116 0 0 0 0.2
SQL*Net message from client 202,502 0 25,029 124 288.9
SQL*Net message to client 202,594 0 0 0 289.0
SQL*Net more data from clien 444 0 0 0 0.6
-------------------------------------------------------------

thanks,
Laurie
Tom Kyte
August 20, 2007 - 10:54 am UTC

that looks fundamentally wrong

you have an average wait of one second for a write to disk - RAID 10 is great for writing - I'd suspect something "wrong", really wrong, if you are waiting an average of one second...

I doubt it is RAID 10 as such - more likely something is really wrong in the configuration of the device itself

log file syncs

Laurie Murray, August 21, 2007 - 4:24 pm UTC

I will have our Unix admin do a health check on the disks/RAID/controller of this Solaris container. I'll post back with results.
Thank you, Tom.

log file syncs

Laurie Murray, September 07, 2007 - 5:26 am UTC

o/s administrator thinks the problem is that we have redo logs on the same RAID(1+0) stripe set as the datafiles. We only have two RAID stripe sets, one for the database and one for recovery (RMAN, archive logs). I don't want to put part of the database on the recovery stripe set. Would it be worthwhile to look into adding a 3rd stripe set and putting redo on it, or would the performance benefit be negligible? I have asked the o/s admin again to look for "something wrong" with the I/O so am hoping he will find something. We have a physical standby db in play here, too, so I'm considering shutting that off for a while to see if removing the work of log transport will help our db.
Tom Kyte
September 12, 2007 - 9:57 am UTC

If there is contention for the devices, they should be able to measure that in increased IO requests and increased IO wait times at the OS level. If they see that, then separating things out so that LGWR 'owns' some devices and DBWR owns the others - without running into each other - would be something to consider.

log file syncs

Laurie Murray, September 17, 2007 - 5:45 pm UTC

Thank you!

log file sync

Shivani, December 14, 2007 - 12:09 am UTC

Hi Tom,

What do you suggest based on the statspack report below? We are seeing log file sync as a huge problem. It is causing the load average to increase day by day and the run queue length to increase. We see a high number of commits, but these small transaction commits are unavoidable.


STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
DNSMGRP 3809414512 dnsmgrp 1 9.2.0.6.0 NO dotmacdb005

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 27820 13-Dec-07 14:32:01 1,617 19.6
End Snap: 27821 13-Dec-07 14:47:01 1,617 19.5
Elapsed: 15.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 10,000M Std Block Size: 8K
Shared Pool Size: 1,008M Log Buffer: 20,480K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,833,266.71 2,976.80
Logical reads: 62,408.58 101.34
Block changes: 10,765.64 17.48
Physical reads: 20.26 0.03
Physical writes: 784.45 1.27
User calls: 23,153.53 37.60
Parses: 3,444.31 5.59
Hard parses: 0.00 0.00
Sorts: 0.49 0.00
Logons: 0.02 0.00
Executes: 3,444.91 5.59
Transactions: 615.85

% Blocks changed per Read: 17.25 Recursive Call %: 4.26
Rollback per transaction %: 0.08 Rows per Sort: 357.05

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.72 Redo NoWait %: 99.99
Buffer Hit %: 99.97 In-memory Sort %: 100.00
Library Hit %: 99.98 Soft Parse %: 100.00
Execute to Parse %: 0.02 Latch Hit %: 99.58
Parse CPU to Parse Elapsd %: 51.03 % Non-Parse CPU: 97.11

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 38.19 38.19
% SQL with executions>1: 82.07 82.07
% Memory for SQL w/exec>1: 62.70 62.70

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 576,742 63,030 79.60
CPU time 5,618 7.09
enqueue 89,383 3,498 4.42
db file parallel write 5,279 1,595 2.01
ARCH wait on SENDREQ 4,715 1,534 1.94
-------------------------------------------------------------

Wait Events for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 576,742 11,980 63,030 109 1.0
enqueue 89,383 8 3,498 39 0.2
db file parallel write 5,279 0 1,595 302 0.0
ARCH wait on SENDREQ 4,715 0 1,534 325 0.0
buffer busy waits 158,149 0 1,061 7 0.3
db file sequential read 18,367 0 844 46 0.0
LGWR wait on SENDREQ 42,890 0 799 19 0.1
latch free 47,410 22,630 746 16 0.1
SQL*Net more data to client 2,213,260 0 334 0 4.0
log file switch completion 702 0 96 137 0.0
log file sequential read 1,783 0 20 11 0.0
log file parallel write 43,567 0 6 0 0.1
control file parallel write 547 0 3 5 0.0
direct path write 115 0 2 17 0.0
async disk IO 6,370 0 1 0 0.0
LGWR wait for redo copy 4,295 0 1 0 0.0
control file sequential read 6,350 0 1 0 0.0
direct path read 115 0 0 3 0.0
wait list latch free 18 0 0 15 0.0
log file single write 20 0 0 2 0.0
buffer deadlock 10,779 10,332 0 0 0.0
SQL*Net break/reset to clien 10 0 0 1 0.0
SQL*Net message from client 20,914,408 0 768,551 37 37.7
SQL*Net message to client 20,914,237 0 61 0 37.7
SQL*Net more data from clien 22 0 0 15 0.0
-------------------------------------------------------------

Background Wait Events for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write 5,263 0 1,594 303 0.0
ARCH wait on SENDREQ 4,720 0 1,535 325 0.0
LGWR wait on SENDREQ 42,970 0 799 19 0.1
log file sequential read 1,787 0 20 11 0.0
log file parallel write 43,644 0 6 0 0.1
control file parallel write 547 0 3 5 0.0
direct path write 115 0 2 17 0.0
async disk IO 6,383 0 1 0 0.0
LGWR wait for redo copy 4,302 0 1 0 0.0
control file sequential read 6,244 0 1 0 0.0
direct path read 115 0 0 3 0.0
latch free 15 5 0 13 0.0
log file single write 20 0 0 2 0.0
enqueue 8 0 0 5 0.0
db file sequential read 1 0 0 16 0.0
buffer busy waits 1 0 0 0 0.0
rdbms ipc message 4,824 901 5,067 1050 0.0
smon timer 5 2 878 ###### 0.0
pmon timer 305 297 871 2856 0.0
-------------------------------------------------------------

SQL ordered by Gets for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
11,774,756 285,589 41.2 21.0 1102.17 1187.99 1506460494
Module: JDBC Thin Client
select this_.DNS_ZONE_ID as DNS1_0_2_, this_.VERSION as VERSION0
_2_, this_.MODIFY_DATE as MODIFY3_0_2_, this_.ZONE_NAME as ZONE4
_0_2_, this_.OWNER_NAME as OWNER5_0_2_, this_.VERIFIED as VERIFI
ED0_2_, records2_.DNS_ZONE_ID as DNS9_4_, records2_.DNS_RECORD_I
D as DNS1_4_, records2_.DNS_RECORD_ID as DNS1_2_0_, records2_.VE

11,208,263 294,971 38.0 20.0 1038.03 1553.19 965395428
Module: JDBC Thin Client
select this_.DNS_ZONE_ID as DNS1_0_2_, this_.VERSION as VERSION0
_2_, this_.MODIFY_DATE as MODIFY3_0_2_, this_.ZONE_NAME as ZONE4
_0_2_, this_.OWNER_NAME as OWNER5_0_2_, this_.VERIFIED as VERIFI
ED0_2_, records2_.DNS_ZONE_ID as DNS9_4_, records2_.DNS_RECORD_I
D as DNS1_4_, records2_.DNS_RECORD_ID as DNS1_2_0_, records2_.VE

8,231,342 266,003 30.9 14.7 571.92 650.95 2695346271
Module: JDBC Thin Client
select records0_.DNS_ZONE_ID as DNS9_1_, records0_.DNS_RECORD_ID
as DNS1_1_, records0_.DNS_RECORD_ID as DNS1_2_0_, records0_.VER
SION as VERSION2_0_, records0_.MODIFY_DATE as MODIFY3_2_0_, reco
rds0_.RECORD_NAME as RECORD4_2_0_, records0_.RDATA as RDATA2_0_,
records0_.TTL as TTL2_0_, records0_.RECORD_TYPE as RECORD7_2_0_

7,178,082 285,660 25.1 12.8 393.01 5106.07 4256402077
Module: JDBC Thin Client
insert into LH_DNS_RECORD (VERSION, MODIFY_DATE, RECORD_NAME, RD
ATA, TTL, RECORD_TYPE, EXPIRE, DNS_ZONE_ID, DNS_RECORD_ID) value
s (:1, :2, :3, :4, :5, :6, :7, :8, :9)

5,368,344 266,209 20.2 9.6 248.54 423.10 753678696
Module: JDBC Thin Client
delete from LH_DNS_RECORD where DNS_RECORD_ID=:1 and VERSION=:2

4,946,895 549,641 9.0 8.8 338.00 582.90 1940981072
Module: JDBC Thin Client
update LH_DNS_RECORD set VERSION=:1, MODIFY_DATE=:2, RECORD_NAME
=:3, RDATA=:4, TTL=:5, RECORD_TYPE=:6, EXPIRE=:7, DNS_ZONE_ID=:8
where DNS_RECORD_ID=:9 and VERSION=:10

3,038,276 551,039 5.5 5.4 256.80 437.65 3428910029
Module: JDBC Thin Client
update LH_DNS_ZONE set VERSION=:1, MODIFY_DATE=:2, ZONE_NAME=:3,
OWNER_NAME=:4, VERIFIED=:5 where DNS_ZONE_ID=:6 and VERSION=:7

2,451,468 267,106 9.2 4.4 122.22 337.29 1221287356
Module: JDBC Thin Client
select record0_.DNS_RECORD_ID as DNS1_2_0_, zone1_.DNS_ZONE_ID a
s DNS1_0_1_, record0_.VERSION as VERSION2_0_, record0_.MODIFY_DA
TE as MODIFY3_2_0_, record0_.RECORD_NAME as RECORD4_2_0_, record

SQL ordered by Gets for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
0_.RDATA as RDATA2_0_, record0_.TTL as TTL2_0_, record0_.RECORD_
TYPE as RECORD7_2_0_, record0_.EXPIRE as EXPIRE2_0_, record0_.DN

873,961 287,528 3.0 1.6 94.31 241.02 1094993204
Module: JDBC Thin Client
select LH_DNS_RECORD_SEQ.nextval from dual

209,573 1 209,573.0 0.4 11.56 12.13 261811750
Module: JDBC Thin Client
delete from LH_DNS_RECORD where EXPIRE<:1

48,883 12,785 3.8 0.1 3.03 3.11 1273402200
Module: JDBC Thin Client
select sharedsecr0_.DNS_SHARED_SECRET_ID as DNS1_4_, sharedsecr0
_.VERSION as VERSION4_, sharedsecr0_.MODIFY_DATE as MODIFY3_4_,
sharedsecr0_.SHARED_SECRET_NAME as SHARED4_4_, sharedsecr0_.SECR
ET as SECRET4_, sharedsecr0_.OWNER_NAME as OWNER6_4_ from LH_DNS
_SHARED_SECRET sharedsecr0_ where sharedsecr0_.SHARED_SECRET_NAM

11,686 4 2,921.5 0.0 1.89 2.29 3685100867
Module: sqlplus@dotmacdb005 (TNS V1-V3)
select 'TransSessions :' || count(*) from v$session s, v$transa
ction t where t.SES_ADDR = s.SADDR and s.taddr = t.addr and to_d
ate (t.START_TIME , 'MM/DD/YY HH24:MI:SS' ) < sysdate - 5/60/24
and s.username not in ('MIGRATION_OWNER')

8,775 3 2,925.0 0.0 4.12 4.07 459106769
Module: emagent@dotmacdb005 (TNS V1-V3)
/* OracleOEM */ SELECT blocking_sid, SUM(num_blocked) num
_blocked FROM (SELECT id1, id2, MAX
(DECODE(block, 1, sid, 0)) blocking_sid, SU
M(DECODE(request, 0, 0, 1)) num_blocked FROM v$l
ock WHERE block = 1 OR request

8,758 2,879 3.0 0.0 1.74 3.07 3716207873
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#=:5,
order$=:6,cache=:7,highwater=:8,audit$=:9,flags=:10 where obj#=:
1

5,574 1 5,574.0 0.0 24.56 37.36 2522684317
Module: sqlplus@dotmacdb005 (TNS V1-V3)
BEGIN statspack.snap; END;

1,400 1 1,400.0 0.0 0.18 0.62 173271802
Module: sqlplus@dotmacdb005 (TNS V1-V3)
INSERT INTO STATS$ROLLSTAT ( SNAP_ID , DBID , INSTANCE_NUMBER ,
USN , EXTENTS , RSSIZE , WRITES , XACTS , GETS , WAITS , OPTSIZE
, HWMSIZE , SHRINKS , WRAPS , EXTENDS , AVESHRINK , AVEACTIVE )
SELECT :B3 , :B2 , :B1 , USN , EXTENTS , RSSIZE , WRITES , XACT
S , GETS , WAITS , OPTSIZE , HWMSIZE , SHRINKS , WRAPS , EXTENDS

SQL ordered by Gets for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------

1,277 78 16.4 0.0 0.10 2.70 3346241830
Module: JDBC Thin Client
insert into LH_DNS_ZONE (VERSION, MODIFY_DATE, ZONE_NAME, OWNER_
NAME, VERIFIED, DNS_ZONE_ID) values (:1, :2, :3, :4, :5, :6)

1,174 1 1,174.0 0.0 0.26 8.53 977245137
Module: sqlplus@dotmacdb005 (TNS V1-V3)
INSERT INTO STATS$SQL_PLAN_USAGE ( SNAP_ID , DBID , INSTANCE_NUM
BER , HASH_VALUE , TEXT_SUBSET , PLAN_HASH_VALUE , COST , ADDRES
S , OPTIMIZER ) SELECT /*+ ordered use_nl(sq) index(sq) */ :B3 ,
:B2 , :B1 , SS.HASH_VALUE , SS.TEXT_SUBSET , SQ.PLAN_HASH_VALUE
, NVL(SQ.OPTIMIZER_COST,-9) , MAX(SS.ADDRESS) , MAX(SQ.OPTIMIZE

1,096 78 14.1 0.0 0.06 0.06 366854126
Module: JDBC Thin Client
insert into LH_DNS_ZONE_SHARED_SECRET (DNS_SHARED_SECRET_ID, DNS
_ZONE_ID) values (:1, :2)

1,057 80 13.2 0.0 0.11 4.25 3751560917
Module: JDBC Thin Client
insert into LH_DNS_SHARED_SECRET (VERSION, MODIFY_DATE, SHARED_S

-------------------------------------------------------------


Instance Activity Stats for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 561,808 624.2 1.0
CPU used when call started 561,812 624.2 1.0
CR blocks created 66,270 73.6 0.1
DBWR buffers scanned 359 0.4 0.0
DBWR checkpoint buffers written 705,877 784.3 1.3
DBWR checkpoints 4 0.0 0.0
DBWR free buffers found 359 0.4 0.0
DBWR lru scans 13 0.0 0.0
DBWR make free requests 13 0.0 0.0
DBWR revisited being-written buff 0 0.0 0.0
DBWR summed scan depth 359 0.4 0.0
DBWR transaction table writes 12,098 13.4 0.0
DBWR undo block writes 89,367 99.3 0.2
DFO trees parallelized 0 0.0 0.0
PX local messages recv'd 0 0.0 0.0
PX local messages sent 0 0.0 0.0
Parallel operations not downgrade 0 0.0 0.0
SQL*Net roundtrips to/from client 20,805,064 23,116.7 37.5
active txn count during cleanout 1,115,649 1,239.6 2.0
background checkpoints completed 5 0.0 0.0
background checkpoints started 4 0.0 0.0
background timeouts 1,657 1.8 0.0
branch node splits 22 0.0 0.0
buffer is not pinned count 32,189,029 35,765.6 58.1
buffer is pinned count 43,867,001 48,741.1 79.1
bytes received via SQL*Net from c 3,094,168,498 3,437,965.0 5,582.5
bytes sent via SQL*Net to client 4,718,364,694 5,242,627.4 8,512.8
calls to get snapshot scn: kcmgss 4,827,128 5,363.5 8.7
calls to kcmgas 652,202 724.7 1.2
calls to kcmgcs 100,928 112.1 0.2
change write time 17,040 18.9 0.0
cleanout - number of ktugct calls 222,889 247.7 0.4
cleanouts and rollbacks - consist 38,603 42.9 0.1
cleanouts only - consistent read 1,854 2.1 0.0
cluster key scan block gets 24 0.0 0.0
cluster key scans 8 0.0 0.0
commit cleanout failures: block l 2 0.0 0.0
commit cleanout failures: buffer 588 0.7 0.0
commit cleanout failures: callbac 1,985 2.2 0.0
commit cleanout failures: cannot 25,753 28.6 0.1
commit cleanouts 3,856,095 4,284.6 7.0
commit cleanouts successfully com 3,827,768 4,253.1 6.9
commit txn count during cleanout 80,795 89.8 0.2
consistent changes 197,530 219.5 0.4
consistent gets 42,500,141 47,222.4 76.7
consistent gets - examination 17,646,522 19,607.3 31.8
current blocks converted for CR 0 0.0 0.0
cursor authentications 0 0.0 0.0
data blocks consistent reads - un 196,751 218.6 0.4
db block changes 9,689,076 10,765.6 17.5
db block gets 13,671,718 15,190.8 24.7
deferred (CURRENT) block cleanout 1,040,175 1,155.8 1.9
dirty buffers inspected 14 0.0 0.0
enqueue conversions 286,825 318.7 0.5
enqueue deadlocks 0 0.0 0.0
enqueue releases 1,774,543 1,971.7 3.2

Instance Activity Stats for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
enqueue requests 1,774,478 1,971.6 3.2
enqueue timeouts 5 0.0 0.0
enqueue waits 83,756 93.1 0.2
exchange deadlocks 10,723 11.9 0.0
execute count 3,100,419 3,444.9 5.6
free buffer inspected 115 0.1 0.0
free buffer requested 110,619 122.9 0.2
hot buffers moved to head of LRU 144,531 160.6 0.3
immediate (CR) block cleanout app 40,457 45.0 0.1
immediate (CURRENT) block cleanou 449,794 499.8 0.8
index fast full scans (full) 1 0.0 0.0
index fast full scans (rowid rang 0 0.0 0.0
index fetch by key 3,082,584 3,425.1 5.6
index scans kdiixs1 1,422,660 1,580.7 2.6
leaf node 90-10 splits 989 1.1 0.0
leaf node splits 3,897 4.3 0.0
logons cumulative 15 0.0 0.0
messages received 49,256 54.7 0.1
messages sent 49,254 54.7 0.1
no buffer to keep pinned count 0 0.0 0.0
no work - consistent read gets 22,458,026 24,953.4 40.5
opened cursors cumulative 3,104,940 3,449.9 5.6
parse count (failures) 3 0.0 0.0
parse count (hard) 3 0.0 0.0
parse count (total) 3,099,877 3,444.3 5.6
parse time cpu 16,261 18.1 0.0
parse time elapsed 31,868 35.4 0.1
physical reads 18,233 20.3 0.0
physical reads direct 115 0.1 0.0
physical reads direct (lob) 0 0.0 0.0
physical writes 706,006 784.5 1.3
physical writes direct 115 0.1 0.0
physical writes non checkpoint 19,929 22.1 0.0
pinned buffers inspected 0 0.0 0.0
prefetched blocks 0 0.0 0.0
prefetched blocks aged out before 0 0.0 0.0
process last non-idle time 0 0.0 0.0
queries parallelized 0 0.0 0.0
recursive calls 928,075 1,031.2 1.7
recursive cpu usage 14,017 15.6 0.0
redo blocks written 3,349,241 3,721.4 6.0
redo buffer allocation retries 515 0.6 0.0
redo entries 5,079,219 5,643.6 9.2
redo log space requests 620 0.7 0.0
redo log space wait time 8,958 10.0 0.0
redo ordering marks 0 0.0 0.0
redo size 1,649,940,036 1,833,266.7 2,976.8
redo synch time 6,239,358 6,932.6 11.3
redo synch writes 559,277 621.4 1.0
redo wastage 10,816,784 12,018.7 19.5
redo write time 83,325 92.6 0.2
redo writer latching time 78 0.1 0.0
redo writes 43,818 48.7 0.1
rollback changes - undo records a 23,017 25.6 0.0
rollbacks only - consistent read 27,686 30.8 0.1
rows fetched via callback 3,071,580 3,412.9 5.5

Instance Activity Stats for DB: DNSMGRP Instance: dnsmgrp Snaps: 27820 -27821

Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
session connect time 0 0.0 0.0
session cursor cache count 171 0.2 0.0
session cursor cache hits 3,096,958 3,441.1 5.6
session logical reads 56,167,718 62,408.6 101.3
session pga memory 3,563,416 3,959.4 6.4
session pga memory max 4,154,552 4,616.2 7.5
session uga memory 2,312,536 2,569.5 4.2
session uga memory max 2,566,504 2,851.7 4.6
shared hash latch upgrades - no w 1,443,565 1,604.0 2.6
shared hash latch upgrades - wait 378 0.4 0.0
sorts (disk) 0 0.0 0.0
sorts (memory) 444 0.5 0.0
sorts (rows) 158,532 176.2 0.3
summed dirty queue length 29 0.0 0.0
switch current to new buffer 731 0.8 0.0
table fetch by rowid 36,125,141 40,139.1 65.2
table fetch continued row 0 0.0 0.0
table scan blocks gotten 288,368 320.4 0.5
table scan rows gotten 289,425 321.6 0.5
table scans (long tables) 0 0.0 0.0
table scans (short tables) 288,328 320.4 0.5
transaction rollbacks 2,546 2.8 0.0
transaction tables consistent rea 0 0.0 0.0
transaction tables consistent rea 0 0.0 0.0
user calls 20,838,175 23,153.5 37.6
user commits 553,838 615.4 1.0
user rollbacks 429 0.5 0.0
workarea executions - onepass 0 0.0 0.0
workarea executions - optimal 405 0.5 0.0
write clones created in backgroun 7 0.0 0.0
write clones created in foregroun 17,938 19.9 0.0
-------------------------------------------------------------



Tom Kyte
December 14, 2007 - 1:08 pm UTC

is there something relevant you are not telling us? like something about your configuration - maybe standby database is also involved?

log file sync issue

Shivani, December 15, 2007 - 12:34 am UTC

Sorry for not mentioning that we have a maximum availability dataguard configuration. I was not able to paste the complete statspack report due to the character restriction and missed that part while editing.

Our storage is X-RAID on a Solaris box. log file sync as the top wait event is causing us a high load average, so any suggestions will be really helpful.

Thanks...Shivani
Tom Kyte
December 17, 2007 - 10:55 am UTC

I would suggest it is caused by data guard -

ARCH wait on SENDREQ 4,720 0 1,535 325 0.0
LGWR wait on SENDREQ 42,970 0 799 19 0.1

looks like lgwr is waiting for data guard and you have many people simultaneously waiting for lgwr. For example, 100 people go to commit, but before any of them can - lgwr has to wait for the log to be sent to the remote site (buffer full or you are in max data protection mode). If each of them waits 0.5 seconds - you'll have recorded 50 seconds of log file sync wait, and 0.5 seconds of wait by lgwr on the send request to the remote site.

suggest you look at your data guard configuration and ensure you have it set up optimally.

Shivani, December 19, 2007 - 2:34 am UTC

Thanks for the input. We have a maximum availability dataguard configuration synced up by lgwr shipping. The nature of the database is frequent commits of small transactions. Can you please elaborate on "(buffer full or you are in max data protection mode)" and also on "suggest you look at your data guard configuration and ensure you have it set up optimally" - which areas in dataguard should we look at? Our organization's requirement is high availability since we have an online store.

Thanks...Shivani


Tom Kyte
December 19, 2007 - 10:07 am UTC

if you have maximum availability - you are basically saying "unless and until we know the transaction is committed on the remote site - you cannot be committed"


http://docs.oracle.com/docs/cd/B19306_01/server.102/b14239/log_transport.htm#sthref554

You are in a mode I've always questioned (I question why someone would want to be in this mode)

Basically this mode says:

a) when standby is available you will wait for redo to be written locally AND for redo to be written remotely and to get confirmation (that is a large latency thing there)

b) when standby is NOT AVAILABLE - you will not wait anymore, we'll just queue locally.


basically saying by (a) that the DATA IS SO IMPORTANT WE CANNOT LOSE ANY OF IT (that is the maximum protection mode)

and then saying by (b) that BUT IF WE CANNOT DO IT, CANNOT PROTECT THE DATA, OH WELL - NO BIG DEAL, WE'LL JUST STOP PROTECTING IT.


Now, without knowing you and your needs, it sounds a bit like maximum performance
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14239/log_transport.htm#sthref559
mode might be more your cup of tea. You (and only you and your management and your business) can answer that however.
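
For reference, a rough sketch of how those modes map onto the log transport settings (the service name 'standby' is just a placeholder, and the exact steps should be checked against the documentation linked above): maximum availability ships redo with LGWR SYNC AFFIRM, so each commit waits on the network round trip to the standby; maximum performance uses ASYNC NOAFFIRM and does not.

-- maximum availability style transport: commit waits for the standby acknowledgement
alter system set log_archive_dest_2='SERVICE=standby LGWR SYNC AFFIRM';
alter database set standby database to maximize availability;

-- maximum performance style transport: commit does not wait for the standby
alter system set log_archive_dest_2='SERVICE=standby LGWR ASYNC NOAFFIRM';
alter database set standby database to maximize performance;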

sync cost ratio

A reader, March 14, 2008 - 11:33 am UTC

What is the sync cost ratio with regard to log file sync and log file parallel write?

How do I interpret the value of the sync cost ratio for analysis purposes?

Thanks,

Tom Kyte
March 15, 2008 - 9:48 am UTC

typically, that "ratio" is defined as 'log file sync' time divided by the 'log file parallel write' time.

if it is "large" (bigger than one), then log write IO might be an issue.

Many people can wait for log file sync at the same time - so if 50 people issue commit and they wait 1 unit of time, and lgwr takes 1 unit of time to do its write, you will have 50 units of log file sync wait time and 1 unit of lgwr IO time.

basically, you would a) identify that log file sync is a problem for you - lots of people waiting on it. then you would b) find out why log file sync is large, a problem.

it could be because log file parallel write is very high (lgwr cannot do IO fast enough)

it could be because you commit like mad and it is entirely unavoidable.
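
If it helps, a minimal sketch of computing that ratio since instance startup from v$system_event (instance-wide cumulative numbers, so interpret with the same caution as any statspack average):

-- 'log file sync' time divided by 'log file parallel write' time
select s.time_waited / nullif(w.time_waited, 0) as sync_cost_ratio
  from v$system_event s, v$system_event w
 where s.event = 'log file sync'
   and w.event = 'log file parallel write';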

for example

A reader, April 11, 2008 - 2:06 am UTC

Hi Tom,

We have two standby databases. Primary Database sends logs to Standby A using LGWR ASYNC and to Standby B using ARCH.

We see 'LNS wait on SENDREQ' and 'ARCH wait on SENDREQ' as the top 2 wait events in the AWR report.

The question is: what is the impact of the above two wait events on the performance of the database, and if these wait events can slow down the database, what measures can be taken to reduce the waits?

Regards
Prashanth
Tom Kyte
April 11, 2008 - 8:01 am UTC

you'll want to read Note 233491.1 in metalink, it describes these..

Very large log file sync waits

Ivan Rivolta, April 16, 2008 - 11:40 am UTC

Hi Tom
I have very high log file sync waits. Our application is almost real time and it has many commits in very short transactions. This is the statspack report for one hour. We have the database on a Sun 6900 (24 CPUs) and EMC storage (RAID 0+1) with BCV. I appreciate your answer.
STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SMS 612928394 SMS 1 9.2.0.6.0 NO telpago5

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 31169 16-Apr-08 10:00:06 704 219.9
End Snap: 31170 16-Apr-08 11:00:07 707 219.2
Elapsed: 60.02 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 24,576M Std Block Size: 4K
Shared Pool Size: 512M Log Buffer: 4,096K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 4,918,351.48 2,303.80
Logical reads: 73,159.53 34.27
Block changes: 30,336.51 14.21
Physical reads: 303.92 0.14
Physical writes: 2,525.10 1.18
User calls: 14,604.77 6.84
Parses: 13.74 0.01
Hard parses: 0.20 0.00
Sorts: 23.94 0.01
Logons: 0.20 0.00
Executes: 12,457.98 5.84
Transactions: 2,134.89

% Blocks changed per Read: 41.47 Recursive Call %: 13.46
Rollback per transaction %: 4.94 Rows per Sort: 9.08

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.98 Redo NoWait %: 99.99
Buffer Hit %: 99.89 In-memory Sort %: 100.00
Library Hit %: 100.01 Soft Parse %: 98.52
Execute to Parse %: 99.89 Latch Hit %: 99.37
Parse CPU to Parse Elapsd %: 96.66 % Non-Parse CPU: 99.94

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 98.21 98.23
% SQL with executions>1: 32.85 32.75
% Memory for SQL w/exec>1: 31.32 31.00

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 7,327,704 120,889 84.75
CPU time 11,759 8.24
db file sequential read 295,536 3,759 2.64
log file parallel write 2,249,335 3,065 2.15
db file parallel write 35,902 1,136 .80
-------------------------------------------------------------
Wait Events for DB: SMS Instance: SMS Snaps: 31169 -31170
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 7,327,704 0 120,889 16 1.0
db file sequential read 295,536 0 3,759 13 0.0
log file parallel write 2,249,335 0 3,065 1 0.3
db file parallel write 35,902 0 1,136 32 0.0
log file switch completion 4,221 0 773 183 0.0
latch free 76,140 17,595 376 5 0.0
log file sequential read 17,686 0 360 20 0.0
direct path read 67,601 0 187 3 0.0
buffer busy waits 56,140 81 129 2 0.0
PX Deq: Execute Reply 78 11 55 701 0.0
enqueue 12,107 1 52 4 0.0
control file parallel write 2,430 0 33 14 0.0
control file sequential read 11,304 0 26 2 0.0
db file scattered read 1,321 0 12 9 0.0
async disk IO 35,254 0 11 0 0.0
SQL*Net more data to client 419,955 0 10 0 0.1
wait list latch free 371 0 6 16 0.0
process startup 42 0 5 118 0.0
LGWR wait for redo copy 52,426 0 1 0 0.0
direct path write 5,305 0 1 0 0.0
db file parallel read 1 0 0 236 0.0
log file single write 18 0 0 6 0.0
PX Deq: Signal ACK 5 2 0 20 0.0
index block split 2 2 0 16 0.0
SQL*Net break/reset to clien 54 0 0 0 0.0
L1 validation 1 1 0 13 0.0
db file single write 18 0 0 1 0.0
PX Deq: Join ACK 3 0 0 0 0.0
PX Deq: Parse Reply 3 0 0 0 0.0
buffer deadlock 943 932 0 0 0.0
undo segment extension 238 234 0 0 0.0
SQL*Net message from client 52,589,625 0 1,815,840 35 6.8
PX Idle Wait 2,214 2,210 4,326 1954 0.0
wakeup time manager 116 116 3,378 29124 0.0
jobq slave wait 830 806 2,409 2903 0.0
SQL*Net message to client 52,589,630 0 52 0 6.8
PX Deq: Execution Msg 86 3 9 105 0.0
SQL*Net more data from clien 45,983 0 1 0 0.0
-------------------------------------------------------------
Background Wait Events for DB: SMS Instance: SMS Snaps: 31169 -31170
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file parallel write 2,249,289 0 3,065 1 0.3
db file parallel write 35,902 0 1,136 32 0.0
log file sequential read 17,683 0 360 20 0.0
control file parallel write 2,430 0 33 14 0.0
control file sequential read 3,162 0 19 6 0.0
async disk IO 35,249 0 11 0 0.0
direct path read 19,305 0 4 0 0.0
LGWR wait for redo copy 52,426 0 1 0 0.0
latch free 164 7 1 7 0.0
direct path write 5,301 0 1 0 0.0
db file sequential read 47 0 0 7 0.0
log file single write 18 0 0 6 0.0
db file scattered read 4 0 0 25 0.0
rdbms ipc reply 70 0 0 0 0.0
db file single write 18 0 0 1 0.0
buffer busy waits 31 0 0 0 0.0
rdbms ipc message 304,968 10,270 37,316 122 0.0
smon timer 74 3 3,560 48110 0.0
pmon timer 1,200 1,200 3,501 2918 0.0
-------------------------------------------------------------

Tom Kyte
April 16, 2008 - 4:13 pm UTC

... I appreciate your answer. ...

I'd appreciate a question? I guess it is "how do we lower logfile sync"

the only ways to lower log file sync waits would typically be

a) commit less frequently - only if the application can safely permit it.
b) make log devices faster - eg: ensure the lgwr and arch have uncontended access to these devices and that these devices are as good as they get. They already look pretty good though:

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file parallel write         2,249,335          0      3,065      1      0.3


another (not so far fetched) idea would be to use clustering - to give you more than one lgwr in effect for your database.

Very high log file sync waits

Ivan Rivolta, April 16, 2008 - 6:31 pm UTC

Thanks for the answer Tom. Only two questions: can you explain a bit more your idea about clustering and "that give more than one lgwr in effect in Database"? The other question: we have the parameter log_parallelism = 2 in our database. Do you think this value is good, is it better to change it, or does this parameter not help with the problem? Thanks for your time
Regards Ivan Rivolta
Tom Kyte
April 16, 2008 - 9:44 pm UTC

I'm talking about Real Application Clusters - RAC.



that init.ora parameter (it goes away in current releases) is only relevant when, as the documentation puts it, ...

<quote>
and you are experiencing very high contention on the redo allocation latch
</quote>

you are, well, waiting on log file sync - changing that parameter will do NOTHING for the IO waits.

scott duncan, June 18, 2008 - 2:55 pm UTC

Hi Tom,

regarding log file sync waits..

We currently have a clients system that is due to go live on 10g 10.2.0.3 very soon having been running very well on 9.2.0.8 for several years.

It was noted during testing however that one very key application within our product that uses OCI is processing much slower on 10g. After investigation it was determined that the slowness is related to the application reading a msg record from our own msg_queue table and then processing appropriately and then updating a column search_flag = NULL as follows :

UPDATE MSG_QUEUE SET search_flag = :b1
WHERE
msg_id = :b2 AND search_flag = :b3

search flag is designed to allow rapid access to msgs that are not yet processed via having an index on that column. Msgs that are processed have search_flag set back to NULL and are therefore not in the index.

The update here seems to work very efficiently as expected. However, the app very inefficiently commits every record. It has always done this, however, and has always worked better on the 9i system. Having traced with 10046 level 12 several times now, the trace files seem to indicate that the additional time in 10g is being spent waiting on log file sync events. The overall time difference between 9i and 10g in our application can become very large as it is doing thousands of these updates to this table. We isolated the problem to this update and wrote a very small test program to perform an update and commit within a loop several thousand times. It was clear that the difference in time again seems to be spent doing log file sync. The difference is of the order of e.g. 10 secs versus 200 secs. Strangely, while it happens via the OCI app, it does not seem to happen when we run the same updates and commits ( each time around the loop ) via sqlplus or PL/SQL Developer.
I know you would say that committing each time around the loop is very bad. Is there any reason for it to be even worse in 10g?

Below is an excerpt from the end of a trace from the actual OCI application :

------------------------------------------------------------------------------------------------------
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 20503 0.31 0.45 0 0 0 0
Execute 20505 8.85 1000.29 92 79240 188384 15313
Fetch 5253 2.73 3.39 0 446312 0 17537
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 46261 11.90 1004.14 92 525552 188384 32850

Misses in library cache during parse: 3
Misses in library cache during execute: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
pipe get 328 5.00 798.45
SQL*Net message to client 38104 0.00 0.06
log file sync 15445 0.33 1791.52
SQL*Net message from client 38104 4.75 17.41
buffer busy waits 78 0.89 0.89


What is most confusing here is that :

1) the elapsed time does not seem to include the stated Total Waited on log file sync as the latter is much larger ?
2) this large log file sync wait time only shows up in the overall totals and I can't see it accounted for in any wait events above that?
3) the Total Waited does not state any units but it seems to be in seconds, is that correct?
4) If it is log file sync, how can it possibly happen via OCI apps and not via sqlplus or PL/SQL developer? That doesn't seem to make sense.
5) from above can you say whether the scale of log file sync seems to be the problem ? Is this enough information ?

If you can possibly clarify these points at all it would really help a lot.
If you can confirm it looks to be that Log file sync is a problem that would also.

One concern I had was that perhaps I may be reading the units wrongly and perhaps log file sync is fine but it certainly looks to be the problem.

------------------------------------------------------------------------------------------------------

and below is from the simple OCI test program.

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3004 1.28 1.33 0 0 0 0
Execute 3004 1.65 1.98 0 14000 6000 2001
Fetch 1002 0.06 0.08 0 0 0 1003
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7010 3.00 3.40 0 14000 6000 3004

Misses in library cache during parse: 1003
Misses in library cache during execute: 2

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 7009 0.00 0.01
SQL*Net message from client 7009 0.01 1.52
SQL*Net break/reset to client 2 0.00 0.00
log file sync 2000 0.21 72.15
latch: session allocation 1 0.00 0.00

------------------------------------------------------------------------------------------------------

Both the real OCI app and Test program appear to highlight 'log file sync' waits. We initially thought that the slowdown from 9i to 10g may be that the redo logs on the 10g system were on SAN rather than local disk but we got someone to move the redo logs and re-test but it was still just as bad.

This is about to throw some major plans for a 10g production upgrade right out the window so we would be very grateful for any clarification on log file sync, help or advice that you can give.

I hope that the information provided is not too sketchy. You may well require a lot more detailed trace information etc..

I shall look forward to hearing any feedback you may be able to provide.

Best Regards and thanks,

Scott Duncan

Tom Kyte
June 18, 2008 - 3:35 pm UTC

can you run 9i on the SAME DEVICES as 10g was running on

or can you run 10g on the SAME DEVICES as 9i was running on


both of these would exhibit this log file sync wait - both of them. There is something else at work here.


Also, if your sqlplus/plsql developer tests were done with plsql, then you are seeing a plsql optimization


begin
  for x in 1 .. 1000000
  loop
    update t set foo = bar where id = x;
    commit;
  end loop;
end;
/


there, plsql knows that you don't know how far it got, so it uses an async commit - each commit doesn't wait for a log file sync, it just continues. plsql would only wait for the LAST commit to finish - not all of the intervening ones

(so one thought, one very smart thought, move code into plsql :) it is the efficient way to process data and we can hide lots of bad programming constructs from you and protect you from them)
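
As an aside, a minimal sketch of what that optimization amounts to: in 10gR2 you can ask for the same asynchronous behaviour explicitly with the COMMIT WRITE syntax (the trade-off is that a "committed" transaction could be lost if the instance fails before lgwr gets it to disk, which is exactly why a client-issued commit waits by default). This is illustrative only, not a recommendation for your queue table:

begin
  for x in 1 .. 1000000
  loop
    update t set foo = bar where id = x;
    if x < 1000000 then
      commit work write batch nowait;    -- do not wait for lgwr on each commit
    else
      commit work write immediate wait;  -- wait for the final commit only
    end if;
  end loop;
end;
/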


Scott Duncan, June 19, 2008 - 9:52 am UTC

Thanks for your very helpful feedback. It certainly clears up why our small test was fast in PL/SQL developer and sqlplus.

Unfortunately the logic in real C++ / OCI application is complex and cannot be easily transferred into PLSQL, otherwise that would have been a great idea for sure.

Regrettably we are also unable to set up 9i and 10g on the same devices, as this setup has been configured on a remote client site and it is unlikely we can re-install versions of Oracle at this late stage in the upgrade process. I appreciate it would help enormously if we could make all things other than the Oracle version equal.

One thing I should perhaps mention is that when we tried to replicate the problem on our own hardware with the same software 10g was fine. It makes me think that we are missing something in the redo config. Our DBAs did clarify that the redo logs on 9i were simpler with a redo log on each of several drives whereas for 10g they created a config that we have apparently been using ( without issue elsewhere ) that has I think 3 groups of redo logs multiplexed onto several drives ? He wondered if this may be the cause of the problem as it is a more complex configuration. Let me know if you need me to be clearer on this redo set up as my description is not the best.

I explained that if it is redo config a statspack report may help to get more detailed event info and to me it seems to indicate that the problem is related to writing the redo logs ? :-

Below are some excerpts from statspack that may help, I suppose.


Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
log file sync 4,498 367 82 27.1
CPU time 326 24.1
PL/SQL lock timer 223 222 995 16.4
log file parallel write 4,204 211 50 15.6
db file scattered read 3,283 56 17 4.1
-------------------------------------------------------------

Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
--------------------------------- ------------ ------ ---------- ------ --------
log file sync 4,498 0 367 82 1.0
PL/SQL lock timer 223 100 222 995 0.0
log file parallel write 4,204 0 211 50 0.9
db file scattered read 3,283 0 56 17 0.7
control file parallel write 547 0 39 71 0.1
db file parallel write 1,089 0 36 33 0.2

Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
LGWR wait for redo copy 10 100.0
PL/SQL lock timer 223 100.0
SQL*Net break/reset to cli 12 91.7 8.3
SQL*Net more data to clien 46K 99.9 .1 .0 .0 .0 .0
buffer busy waits 8 100.0
control file parallel writ 547 4.6 17.2 6.6 5.1 3.8 .7 62.0
control file sequential re 1628 89.6 4.8 .9 1.7 .8 .3 2.0
db file parallel write 1089 15.2 3.4 7.1 11.6 19.9 21.9 20.9
db file scattered read 3283 9.4 4.7 5.6 28.3 23.9 13.0 15.1
db file sequential read 4594 10.1 3.0 12.0 42.6 25.9 5.7 .7
db file single write 108 .9 .9 .9 97.2
direct path read 1509 56.7 5.3 13.0 18.8 5.0 1.0 .3
direct path write 113 92.9 2.7 .9 .9 2.7
enq: CF - contention 6 16.7 16.7 50.0 16.7
enq: TX - index contention 5 40.0 60.0
latch: checkpoint queue la 1 100.0
log file parallel write 4204 42.1 4.9 2.1 .7 .4 .3 49.5
log file sync 4498 19.3 2.9 1.3 .4 .4 1.0 74.8
os thread startup 2 50.0 50.0
rdbms ipc reply 70 78.6 7.1 14.3
SQL*Net message from clien 27K 60.1 2.2 1.6 1.5 7.7 18.9 7.3 .6
SQL*Net message to client 27K 100.0
SQL*Net more data from cli 74 100.0
Streams AQ: qmn coordinato 91 49.5 50.5
Streams AQ: qmn slave idle 45 100.0
class slave wait 2 100.0
pipe get 123 6.5 93.5
pmon timer 449 5.3 94.7
rdbms ipc message 8073 29.1 1.6 .3 3.7 1.2 6.0 19.1 39.1
smon timer 7 100.0
-------------------------------------------------------------


Does 'log file parallel write' waits imply that we are having I/O problems writing to the redo logs and log file sync is waiting on that ?
We did get a DBA to move the REDO logs from SAN to local drives and they said the test timing was still poor, so they put them back on SAN.
They also believe that the SAN access is efficient and is not the problem.

Please let me know if the above helps to clarify what is wrong and/or whether you need further info, e.g. from statspack.

Any further advice would be much appreciated.

Thanks again for the help you have already given so far.

Scott
Tom Kyte
June 19, 2008 - 12:58 pm UTC

... It makes me
think that we are missing something in the redo config. Our DBAs did clarify
that the redo logs on 9i were simpler with a redo log on each of several drives
whereas for 10g they created a config that we have apparently been using (
without issue elsewhere ) that has I think 3 groups of redo logs multiplexed
onto several drives ? He wondered if this may be the cause of the problem as it
is a more complex configuration. Let me know if you need me to be clearer on
this redo set up as my description is not the best.
...

I think that has everything to do with it - you changed basically one big thing - how redo is setup, configured. Different disks, different strategy, different everything - devices and all.

And your major wait? The big difference? IO waits for log writes.

log file sync has to wait for the log file parallel write to complete.


I'm not really sure what I'm looking at above - interesting things to see would be a comparison between 9i and 10g for the SAME WORKLOAD.

check alert.log

A reader, June 19, 2008 - 1:18 pm UTC

You may want to check alert.log to see if the lgwr is waiting for next online log to be available.
Tom Kyte
June 19, 2008 - 1:58 pm UTC

that wait would appear as well right here.

Scott Duncan, June 19, 2008 - 2:40 pm UTC

Thanks once again for all your input which has been most helpful. I completely agree with you that the redo config was completely changed. Thanks for confirming that Log File Sync is indeed waiting on Log File Parallel Writes due to poor I/O to the redo logs.

I mentioned above that we got a DBA to relocate the redo to local drives as we suspected that the problem was poor I/O to the SAN, but when he tested he observed that performance was still just as poor and put the redo logs back on the SAN. Unfortunately, after we clarified that the problem being highlighted in statspack is definitely that I/O to the redo logs ( log file parallel write waits ) is slow, the DBA realised that the redo logs had mistakenly not been fully relocated to local drives. Apparently, having tested again with the redo logs on the only local drive - C: - and with multiplexing off ( same as it was on 9i ), the performance is just as good as 9i. It appears that it was a simple error in relocating the redo, as the D: drive was thought to be local but wasn't.

As I understand it the DBAs are now deciding what to do with redo multiplexing and they thought that they need the redo logs on SAN to support Net App DR but they are having another look at that.

We now understand the problem and can progress from here so thanks once again for all your help.

Best Regards,

Scott


log file sync

Tito, September 01, 2008 - 4:25 am UTC

Hello Tom,

We've been experiencing log file sync and log file parallel write waits for some months now ...

The situation is the following:

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
CUBA 2632031150 cuba 1 9.2.0.8.0 NO cuba.uoc.es

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 15182 01-Sep-08 07:00:01 611 1.5
End Snap: 15183 01-Sep-08 07:15:01 616 1.4
Elapsed: 15.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 7,024M Std Block Size: 8K
Shared Pool Size: 1,008M Log Buffer: 4,096K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 14,793.58 1,536.91
Logical reads: 51,441.30 5,344.24
Block changes: 91.31 9.49
Physical reads: 1,424.26 147.97
Physical writes: 4.82 0.50
User calls: 388.60 40.37
Parses: 66.32 6.89
Hard parses: 0.01 0.00
Sorts: 18.96 1.97
Logons: 0.24 0.02
Executes: 65.90 6.85
Transactions: 9.63

% Blocks changed per Read: 0.18 Recursive Call %: 12.86
Rollback per transaction %: 13.29 Rows per Sort: 1096.00

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 97.23 In-memory Sort %: 100.00
Library Hit %: 99.99 Soft Parse %: 99.98
Execute to Parse %: -0.64 Latch Hit %: 99.92
Parse CPU to Parse Elapsd %: 87.62 % Non-Parse CPU: 98.80

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 94.46 94.54
% SQL with executions>1: 59.91 59.91
% Memory for SQL w/exec>1: 51.34 51.36

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 9,335 3,183 69.84
log file parallel write 4,930 669 14.68
control file parallel write 538 263 5.78
CPU time 224 4.91
db file sequential read 86,799 164 3.61
-------------------------------------------------------------

As you can see, the top 2 events numbers are just waaaay out from our usuals. I say this because we see them increasing every night from midnight till dawn.

We have no I/O problems on the redo log file systems; in fact, the behaviour is completely ok during work hours. So the theory is that there exists some process which commits a lot. Like crazy.

This is like an X-file. We have a couple of dataguard databases but we have them configured as MAXIMUM PERFORMANCE. Nevertheless, we tried suspending the archiving synchronization with no positive results. We have suspended jobs, crons ..

The question is: is there any way in which I could find the SQL, or whatever the process might be, in order to detect the commit maker? ..

Developers, as usual, don't have any idea ...

Thanks for your big help
Tom Kyte
September 01, 2008 - 1:06 pm UTC

... As you can see, the top 2 events numbers are just waaaay out from our usuals ...

I cannot "see" that, I just see that you committed about 9,000 times in that 15 minute period - about 10 times per second. And each time you waited about 1/3 of a second for the log write to complete. I don't see "just waaaay out" - I just see one simple observation that doesn't get me excessively excited or panicky


I don't see how you can say "we have no I/O problems on redo log file system". It sure looks like you might - 4930 waits for a log file parallel write, 669 seconds - that is 0.13 seconds on average which seems very high for a write to the redo log perhaps.


Let me ask you this - how is your application response time? Is anyone complaining? If you are hunting so hard - could it be that it is something you do not really need to be concerned with? (You do not know which application would be affected - so maybe the application isn't materially affected by this.)

log file sync

Tito, September 02, 2008 - 6:58 am UTC

Hello Tom .. thanks for your answer.

The fact is that the report was taken early in the morning, when there are very few people connected to our systems. That's why 10 times per second seems a little high (obviously there is some process doing that - the commit maker, maybe).

On the other hand, our redo log files are on direct I/O storage over Fibre Channel, and moreover the log file sync appears in the top 10 only from midnight till dawn. The rest of the day, which happens to be the rush hours, log file sync is less than 1% of the whole system waits. Less than 4 seconds.

About the application, yes ... it is hanging. Some nights, our 24*7 team has had to restart some services in order to try to re-establish them. But the log file sync waits continue ...

Doing research --- we have "broken" our database jobs, stopped the Data Guard synchronisation, stopped RMAN backups, stopped some night processes ... no results, same behaviour every night. Can we catch that commit maker? We can't see it in the statspack reports, top sessions ...

Thanks for your help !
Tom Kyte
September 02, 2008 - 11:55 am UTC

sounds like a backup window doesn't it.... could it be that you in fact do have an IO problem - and the sluggish IO response times are coming from the fact that you are backing up this impressive sounding IO system?


can you not look at the sql in the statspack report and sort of figure out what applications were running?

You could write a monitor job that queries v$sesstat to see what sessions are issuing lots of commits - you *might* find something - however, then again, it might just be that you have lots of sessions and no single session is committing a lot.
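For illustration, a rough sketch of such a check (not part of the original answer - run it a couple of times, a few minutes apart, and compare the deltas in the 'user commits' values):

select s.sid, s.serial#, s.username, s.program, st.value user_commits
  from v$session s, v$sesstat st, v$statname n
 where st.sid        = s.sid
   and st.statistic# = n.statistic#
   and n.name        = 'user commits'
 order by st.value desc;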


log file sync

Nitin, December 03, 2008 - 11:41 am UTC

I found this to be a very helpful document, as I'm facing the same problem of very high 'log file sync' waits. It is the top wait event shown by the AWR reports.

I would like to describe the scenario here a bit. The database is set up on Amazon EC2 and has a very high number of unavoidable commits. Currently, all the redo log files are on the same EBS volume as the datafiles.

Very simply, do you think creating a couple of small EBS volumes and spreading the log files across them will help?

Thanks a lot in advance. BTW, I totally love this forum!
Tom Kyte
December 09, 2008 - 10:22 am UTC

"it depends"

there will always be a set of wait events, no matter how much you 'tune', there will always be a top five - always.


You give NO indication really here that log file sync waits are a LARGE wait, just the most prevalent. If your application that you are trying to tune isn't waiting a measurable amount of time for log file sync - then fixing this will fix nothing.

For example, let us say

You have 100,000 transactions in some period of observation.
The response time is 0.1 seconds.
Of that, 0.01 seconds is spent waiting on log file sync.


So, in this period of observation, you have 100,000 log file sync waits, the total time is 1,000 seconds of wait for that (simple math).


If you 'fix' this wait event, your transaction response time will decrease from 0.1 seconds to 0.09 seconds (maybe, it could go up too - it might not change at all).


What I'd be looking for would be something that could really speed up the transaction time - that would likely involve looking at what the transaction itself does and asking myself "is there a better way?"


so, in summary:

if log file sync is what is causing a serious bottleneck in your transaction processing (that is, it is a LARGE percent of your runtime, waiting for it is), then making that wait 'smaller' could be beneficial. And tuning IO would be one way to accomplish that.

log file sync

Nitin, December 10, 2008 - 2:01 am UTC

Thanks for the update on this. And yes, I missed out on the details as I thought of it as more of a logical question. My mistake! Here it is:

The response time was reaching somewhere between 1 sec and 2.5-odd secs, and the % of this wait event was showing at about 58% in the AWR report, which I believe is a considerable time.

Also, in trace files I could see that the LGWR had to wait for anywhere between 800 - 2000 ms before it was able to write the data to the logfile.

BTW, as of now, I have configured another instance which writes its logs to a different EBS volume than the datafiles, as well as giving it more CPU power, and it seems to be working fine.

Also, you said it correctly: there are things which can be improved in the application as well, because each transaction consists of a search on multiple tables before it inserts into 1/2 table(s), and the search query wasn't very efficient once the amount of data reached millions of records in those tables.

This probably is a different question, but I'm still trying to figure out the point of balance between fast reads and writes when it comes to using more indexes.

Thanks for your help again and I would be looking forward to more of your expert comments on this. :-)
Tom Kyte
December 10, 2008 - 9:49 am UTC

... which I believe is
a considerable time.
...

not given what you've said. 58% *of what*...


but given a 2 second wait for log file sync, that would be bad - 0.8 to 2.0 seconds is glacial timing.


the top wait events are not nearly as interesting or as useful as "in trace files I could see that the LGWR had to wait for anywhere between 800 - 2000 ms before it was able to write the data to the logfile.". That shows "yes, you had a problem there"


log file sync value equal log file parallel write

yoav ben moha, April 06, 2009 - 6:40 am UTC

Hi Tom ,

From above :

"
the only way to lower log file sync waits would typically be
a) commit less frequently - only if the application can safely permit it.
b) make log devices faster - eg: ensure the lgwr and arch have uncontended access to
these devices and that these devices are as good as they get.
"

We have a 3rd party application that consistently shows at the top 5 events:
log file sync
log file parallel write.

Also I notice that their values are almost identical.
For example :

Top 5 Timed Events                                                   Avg %Total
~~~~~~~~~~~~~~~~~~                                                  wait   Call
Event                                 Waits    Time (s)    (ms)     Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                       176,871       1,950      11    74.1 Commit
log file parallel write             175,274       1,411       8    53.7 System I/O
CPU time                                             752                28.6
db file scattered read               10,586         229      22     8.7 User I/O
log file sequential read              2,140         120      56     4.5 System I/O
          -------------------------------------------------------------

The vendor said that the tool is committing every 100 bulk-inserted records.
We changed this bulk commit parameter to commit every 10,000 records - in order to reduce the amount of committing.

As you can see from the AWR below, after making the application commit less frequently, the amount
of wait time is currently 1/3 of the time before the change (for a 1-hour report covering the same time of day).

Although there is an improvement, a one hour AWR report still shows log file sync and log file parallel write
at the top of the report, and their values are still almost identical.
1. What causes their values to be almost identical? Is it an indication of something I should be aware of?
2. The DB Time is 23 min and the log file sync total wait is about 10 min.
Yet, an average time of 12 ms does not seem like a high value - which makes me think it is not related to
"make log devices faster" as you mentioned.
Would you consider increasing the bulk insert commit interval to an even higher value?

Regards


Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 3951 06-Apr-09 07:00:23 65 3.5
End Snap: 3952 06-Apr-09 08:00:27 73 3.5
Elapsed: 60.07 (mins)
DB Time: 23.00 (mins)

Cache Sizes
~~~~~~~~~~~ Begin End
---------- ----------
Buffer Cache: 548M 564M Std Block Size: 8K
Shared Pool Size: 232M 216M Log Buffer: 6,152K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 470,800.03 32,384.88
Logical reads: 3,306.16 227.42
Block changes: 3,099.88 213.23
Physical reads: 70.74 4.87
Physical writes: 38.17 2.63
User calls: 40.09 2.76
Parses: 20.35 1.40
Hard parses: 0.26 0.02
Sorts: 4.82 0.33
Logons: 0.06 0.00
Executes: 24.21 1.67
Transactions: 14.54

% Blocks changed per Read: 93.76 Recursive Call %: 57.16
Rollback per transaction %: 0.00 Rows per Sort: 26.75

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.83 Redo NoWait %: 100.00
Buffer Hit %: 97.86 In-memory Sort %: 100.00
Library Hit %: 98.34 Soft Parse %: 98.74
Execute to Parse %: 15.94 Latch Hit %: 99.83
Parse CPU to Parse Elapsd %: 38.77 % Non-Parse CPU: 98.78

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 88.95 82.51
% SQL with executions>1: 77.24 64.67
% Memory for SQL w/exec>1: 74.36 67.94

Top 5 Timed Events                                                   Avg %Total
~~~~~~~~~~~~~~~~~~                                                  wait   Call
Event                                 Waits    Time (s)    (ms)     Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync                        52,387         626      12    45.4 Commit
log file parallel write              52,200         599      11    43.4 System I/O
CPU time                                             305                22.1
db file scattered read                9,949         211      21    15.3 User I/O
read by other session                18,809         184      10    13.4 User I/O
          -------------------------------------------------------------
^LTime Model Statistics DB/Inst: Snaps: 3951-3952
-> Total time in database user-calls (DB Time): 1380.2s

Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 748.3 54.2
DB CPU 304.9 22.1
parse time elapsed 10.9 .8
hard parse elapsed time 9.9 .7
PL/SQL execution elapsed time 2.1 .2
PL/SQL compilation elapsed time 0.5 .0
connection management call elapsed time 0.4 .0
hard parse (sharing criteria) elapsed time 0.2 .0
failed parse elapsed time 0.0 .0
hard parse (bind mismatch) elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0
DB time 1,380.2 N/A
background elapsed time 949.0 N/A
background cpu time 15.4 N/A
-------------------------------------------------------------

Avg
%Time Total Wait wait Waits
Wait Class Waits -outs Time (s) (ms) /txn
-------------------- ---------------- ------ ---------------- ------- ---------
System I/O 74,286 .0 940 13 1.4
Commit 52,387 .0 626 12 1.0
User I/O 45,079 .0 543 12 0.9
Configuration 462 5.4 69 148 0.0
Other 337 .6 4 13 0.0
Concurrency 2,205 .1 2 1 0.0
Network 146,589 .0 0 0 2.8
Application 20 .0 0 3 0.0
-------------------------------------------------------------

Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
---------------------------- -------------- ------ ----------- ------- ---------
log file sync 52,387 .0 626 12 1.0
log file parallel write 52,200 .0 599 11 1.0
db file scattered read 9,949 .0 211 21 0.2
read by other session 18,809 .0 184 10 0.4
log file sequential read 1,843 .0 150 81 0.0
db file sequential read 15,992 .0 145 9 0.3
db file parallel write 9,789 .0 100 10 0.2
control file sequential read 6,772 .0 52 8 0.1
log buffer space 354 .0 46 131 0.0
control file parallel write 1,885 .0 37 20 0.0
log file switch completion 80 3.8 22 279 0.0
block change tracking buffer 10 .0 2 204 0.0
buffer busy waits 1,401 .1 2 1 0.0
enq: CF - contention 9 .0 1 118 0.0
direct path write 167 .0 1 5 0.0
Log archive I/O 1,705 .0 1 1 0.0
direct path read 162 .0 1 4 0.0
change tracking file synchro 98 .0 1 5 0.0
change tracking file synchro 78 .0 1 7 0.0
log file single write 72 .0 0 6 0.0
SQL*Net message to client 142,367 .0 0 0 2.7
os thread startup 9 .0 0 15 0.0
latch free 4 .0 0 18 0.0
rdbms ipc reply 6 .0 0 12 0.0
enq: RO - fast object reuse 12 .0 0 6 0.0
kksfbc child completion 1 100.0 0 50 0.0
SQL*Net more data from clien 1,551 .0 0 0 0.0
ksfd: async disk IO 20 .0 0 2 0.0
SQL*Net more data to client 2,671 .0 0 0 0.1
latch: shared pool 33 .0 0 0 0.0
latch: cache buffers chains 751 .0 0 0 0.0
LGWR wait for redo copy 117 .0 0 0 0.0
cursor: pin S wait on X 1 100.0 0 11 0.0
enq: TX - allocate ITL entry 4 .0 0 1 0.0
reliable message 9 .0 0 0 0.0
SQL*Net break/reset to clien 8 .0 0 0 0.0
row cache lock 3 .0 0 0 0.0
latch: row cache objects 4 .0 0 0 0.0
latch: redo allocation 2 .0 0 0 0.0
undo segment extension 24 91.7 0 0 0.0
latch: cache buffers lru cha 2 .0 0 0 0.0
latch: In memory undo latch 2 .0 0 0 0.0
cursor: mutex S 1 .0 0 0 0.0
buffer deadlock 1 100.0 0 0 0.0
SQL*Net message from client 142,360 .0 36,753 258 2.7
class slave wait 295 6.1 5,087 17245 0.0
Streams AQ: waiting for time 37 75.7 3,714 100366 0.0
Streams AQ: waiting for mess 724 99.4 3,518 4859 0.0
ASM background timer 1,266 .0 3,516 2777 0.0
Streams AQ: qmn slave idle w 129 .0 3,513 27234 0.0
Streams AQ: qmn coordinator 262 50.8 3,513 13409 0.0
virtual circuit status 120 100.0 3,487 29060 0.0
PL/SQL lock timer 66 100.0 3,450 52275 0.0
jobq slave wait 41 97.6 118 2867 0.0
KSV master wait 296 .7 6 21 0.0

Tom Kyte
April 13, 2009 - 10:09 am UTC

you have more than one session going right?

You seem to use almost no cpu - what is this program doing? It isn't doing much in the database at all - I think you would want to focus on what the APPLICATION is doing here.

If you have 20 sessions all committing and you have 10 minutes of "log file sync" time, each of the 20 sessions waited an average of 30 seconds apiece. In an hour, making something 30 seconds faster - won't do much.


Can you tell me more about this application - why is the cpu so LOW? What is the degree of concurrency here? of the 60 or so sessions, which are relevant? Have you traced the application you are trying to tune (eg: NOT a database level report, but an application level trace)

Trying to solve "log file sync"

Lasse Jenssen, June 07, 2010 - 7:43 am UTC

Hi Tom

Once in a while it seems like we're having a "log file sync" issue. The users complain about long wait times, and I see the following in my statspack report (a 30 minute snap with an average physc of about 1.15):

Statspack:
DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
FD            1334016727 XX                  1 9.2.0.8.0   NO      xxxxxxxxxxxxx

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     19069 07-Jun-10 09:45:02      239       6.4
  End Snap:     19070 07-Jun-10 10:15:01      259       7.1
   Elapsed:               29.98 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     2,560M      Std Block Size:          8K
           Shared Pool Size:       512M          Log Buffer:      1,024K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:            314,054.34              3,322.26
              Logical reads:            209,294.54              2,214.05
              Block changes:              2,058.82                 21.78
             Physical reads:                933.52                  9.88
            Physical writes:                140.07                  1.48
                 User calls:              2,259.04                 23.90
                     Parses:                498.78                  5.28
                Hard parses:                 15.78                  0.17
                      Sorts:                279.83                  2.96
                     Logons:                  2.53                  0.03
                   Executes:              1,365.62                 14.45
               Transactions:                 94.53



Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync                                     167,778       3,641    40.75
db file sequential read                           633,943       3,525    39.46
CPU time                                                        1,234    13.81
db file scattered read                             88,242         150     1.68
db file parallel write                             33,490         143     1.61


From sar i see:
00:00:00    %usr    %sys    %wio   %idle   physc   %entc
09:40:00      79      18       1       2    1.43   143.1
09:45:00      81      16       1       1    1.68   167.6
09:50:00      75      23       1       1    1.20   119.6
09:55:00      84      14       1       1    0.95    94.8
10:00:00      84      14       1       1    0.99    99.1
10:05:00      82      16       1       1    1.10   109.9
10:10:00      82      16       1       1    1.03   102.6
10:15:00      81      17       1       1    2.17   217.5
10:20:00      81      17       1       1    2.53   253.4
10:25:00      80      17       1       1    2.42   242.4 


I've been reading several of your comments covering "log file sync" and you very often state that there are two ways to solve this issue: 1) reduce commits 2) faster disks.

Yes - we do store redo logs on SAN (probably using RAID 5), and worse ... I know our DBAs are backing up the archive logs every 15 minutes and removing the archive files from the archive log destination. (The archive files are stored on a separate mount point from the redo logs, but that's no guarantee they aren't sharing the same spindles.)

But ... if IO was a problem, I would expect "log file parallel write" to show up as an issue too.

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync                     167,778      1,782      3,641     22      1.0
db file sequential read           633,943          0      3,525      6      3.7
db file scattered read             88,242          0        150      2      0.5
db file parallel write             33,490          0        143      4      0.2
buffer busy waits                   9,011          3         70      8      0.1
sbtwrite2                           6,060          0         64     11      0.0
latch free                          7,546      1,135         21      3      0.0
control file parallel write           979          0         17     18      0.0
async disk IO                       4,058          0         14      4      0.0
log file parallel write           159,139          0         14      0      0.9
SQL*Net break/reset to clien       11,682          0         14      1      0.1
enqueue                               782          0         11     14      0.0
log file sequential read              535          0          7     14      0.0
direct path read                    1,850          0          2      1      0.0
control file sequential read       18,847          0          2      0      0.1
log file switch completion             35          0          1     18      0.0
sbtclose2                               4          0          1    130      0.0
SQL*Net more data to client         8,795          0          0      0      0.1
sbtbackup                               4          0          0     69      0.0
...


When I look at my commit rate, we do about 90 commits every second. This sounds about right. And doing 2,250 user calls every second (not just SELECTs), this does not sound like a commit issue. I know we are doing 2 commits for each "transaction" (one for the real data and one for the log data. I've discussed this with the developers, and they are rethinking this design).

I've read that CPU starvation can increase the "log file sync" issue. But the OS resource figures claim that the virtual node has available CPU:

Node Name                                  : xxxx
Partition Name                             : xxxx
Partition Number                           : 11
Type                                       : Shared-SMT
Mode                                       : Uncapped
Entitled Capacity                          : 1.00
Partition Group-ID                         : 32779
Shared Pool ID                             : 0
Online Virtual CPUs                        : 4
Maximum Virtual CPUs                       : 5
Minimum Virtual CPUs                       : 1
Online Memory                              : 16384 MB
Maximum Memory                             : 24576 MB
Minimum Memory                             : 4096 MB
Variable Capacity Weight                   : 144
Minimum Capacity                           : 0.10
Maximum Capacity                           : 5.00
Capacity Increment                         : 0.01
Maximum Physical CPUs in system            : 8
Active Physical CPUs in system             : 8
Active CPUs in Pool                        : 8
Shared Physical CPUs in system             : 8
Maximum Capacity of Pool                   : 800
Entitled Capacity of Pool                  : 520
Unallocated Capacity                       : 0.00
Physical CPU Percentage                    : 25.00%
Unallocated Weight                         : 0


So ... could there be other issues than IO or excessive commits? Or are my options still to increase IO speed or reduce commit rates?

Please let me know if you need other information about our system.
Tom Kyte
June 09, 2010 - 8:33 am UTC

but another way to look at that is:

ops$tkyte%ORA10GR2> select 3641/167778 from dual;

3641/167778
-----------
 .021701296


So, if a user hits a button to run a transaction, approximately 0.02 seconds of their time would be spent waiting for the commit.


So, unless your application makes the user wait for hundreds of commits per button press - this might not be your "big problem". If the application is written correctly - your users wait ONCE per 'button press'. Only if the developers commit after each statement (which in general would be classified as a bug in their code since a transaction is generally larger than a single statement) would log file sync be the thing that is making your users wait.

Looking further down, we see lots of IO waits - users can wait for hundreds of these and they would add up.



I should add a third thing to that list

a) reduce commits
b) make disk faster
c) ensure cpu is not the bottleneck.


A log file sync is the time from application issuing commit to application getting "commit done". In between those two events there is a lot going on - physical IO is generally the big part of that, but if you are cpu constrained that would be a problem.

remember please, you are looking at a 30 minute continuous window. What if 90% of the work was done in the middle ten minutes - you would have been way over 100% cpu committed at that point. Remember when looking at two points in time - everything you see is an "average", you lose the peaks and valleys. Perhaps look at ASH (active session history) data to get a more detailed view of what was going on in this continuum

and remember also that poor performance does not mean "we were waiting". If your queries take 5 seconds to run and wait for IO a total of 0.5 seconds - fixing the "wait" event won't do much for you - it would just drop to 4.5 seconds.

RE: Trying to solve "log file sync"

Lasse Jenssen, June 09, 2010 - 4:43 pm UTC

Thanks Tom.

I don't think this is a commit issue. I've talked to the developers and, as I said earlier, they confirm that there are 2 commits for each "transaction": one commit for the customer data, and another commit for the log data (we don't want to lose the log data if the changes are rolled back).

This is a 9.2 database, and I've read some articles written by Jonathan Lewis about wrong wait times being reported for the "log file parallel write" wait. I'll follow up on this bug. This might explain the difference I see between the waits for "log file sync" and "log file parallel write". And writing logs to RAID 5, possibly sharing disks with archive logs (which are backed up every 15 minutes and removed from disk) - yes, this might be an issue. Is there any performance view where I'll find the average write time to the logs?

Another thing you mentioned is the CPU issue. I do suspect that we might have a CPU issue. We are running on a shared-SMT LPAR partition. I do not like the fact that it is shared. I feel I'm losing control over my available resources.

As you say, I'm aware of the averaging issues that can arise in a 30-minute statspack snap, especially when the customers only report performance problems for about 5-10 minutes. Then everything starts to spin around again. So I'll reduce the intervals down to 10 minutes.

The application has some good logging features, and these logs pretty much suggest that the wait time is towards the database. Of course this could also be a network issue, or issues with the WebSphere app server and the database drivers, but the reports from customers correspond very closely with the wait time seen in my statspack reports. So I have a feeling that this is where the wait time is spent. We also have a locking (read: blocking) issue that arises from time to time. We're re-writing some code using pessimistic locking with FOR UPDATE NOWAIT to handle these.

Thank you for your time! I really appreciate it.
By the way - when are you coming to Norway?
Sure there must be dozens of Oracle resources waiting for a seminar :-)
Tom Kyte
June 10, 2010 - 12:22 pm UTC

even if the log file parallel write timing is wrong, it still doesn't explain how an average wait of 0.02 * 2 (twice in a transaction) could be the cause of a performance issue as observed by the end users....


I've been to Norway twice already :) No plans on coming back for a bit.

RE: Trying to solve "log file sync"

Lasse Jenssen, June 10, 2010 - 12:40 pm UTC

Is it possible that, because of the 30 minute snapshot, the avg wait time (0.022 sec) is dead wrong?

What do the 1700 timeouts at log file sync mean? What happens when a timeout occurs?
Tom Kyte
June 10, 2010 - 2:54 pm UTC

1700 out of 167,000 is pretty small

yes, it could be (due to averaging), which is why I would say you have to drill down a bit more.

... Perhaps look at ASH (active session history) data to get a more detailed view of what was going on in this continuum ...


RE: Trying to solve "log file sync"

Lasse Jenssen, June 10, 2010 - 3:37 pm UTC

I'm still not 100% sure I understand what the numbers in the "Timeouts" column mean. Thankful if you can explain. What does a timeout mean? Is it the session that times out waiting? When and how does a timeout occur?

Regarding ASH ... i'm on 9i.
Tom Kyte
June 11, 2010 - 7:07 am UTC

it waited for the logfile sync, it got tired of waiting - timed out - and went to sleep, woke up and waited again.

so at least a handful of your commits did wait a while - but a small (about one percent) number of them.


Use sql_trace and tkprof on a real session then during the peak workload to see what it is waiting on.
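For example (a sketch only, not part of the original answer - on 9i you could set the 10046 event in another session with dbms_system, given execute privilege on it; the sid/serial# below are placeholders you would take from v$session):

-- identify the session to trace
select sid, serial#, username, program from v$session;

-- level 12 = bind values + wait events; the second call switches the trace off
exec sys.dbms_system.set_ev( 123, 4567, 10046, 12, '' );
exec sys.dbms_system.set_ev( 123, 4567, 10046, 0, '' );

-- then run tkprof against the trace file written to user_dump_dest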

Trying to solve "log file sync"

Rajeev, November 02, 2010 - 12:46 pm UTC

Hi Tom,

I have a question regarding log file sync waits. From the AWR reports, it is in the Top 5 Timed events. I really don't know whether this could create some issue

We do not commit often during our batch run. What are the other areas I can look into? And please help me read this report.


Cache Sizes 
-------------         Begin End  
Buffer Cache:  3,136M 3,248M Std Block Size: 8K
Shared Pool Size: 1,344M 1,232M Log Buffer: 14,340K
    
Load Profile 
--------------   
    
         Per Second Per Transaction  
Redo size: 297,810.30 6,358.44  
Logical reads: 13,415.18 286.42  
Block changes: 1,084.18 23.15  
Physical reads: 516.73  11.03  
Physical writes:44.9  0.96  
User calls: 165.12  3.53  
Parses:  92.2  1.97  
Hard parses: 6.93  0.15  
Sorts:  38.71  0.83  
Logons:  0.15  0  
Executes:       767.95  16.4  
Transactions: 46.84   
  
Instance Efficiency Percentages (Target 100%) 

Buffer Nowait    %: 99.99 Redo NoWait %: 100.00 
Buffer Hit   %: 96.15 In-memory Sort %: 100.00 
Library Hit     %: 99.08 Soft Parse %: 92.49 
Execute to Parse %: 87.99 Latch Hit %: 99.85 
Parse CPU to Parse Elapsd %: 50.41 % Non-Parse CPU: 99.24 

Shared Pool Statistics 

       Begin End 
Memory Usage %:      46.97 49.31 
% SQL with executions>1:   82.76 95.03 
% Memory for SQL w/exec>1: 79.19 91.26 


Top 5 Timed Events

Event                          Waits     Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
log file sync                114,088       2,387            21               42.6  Commit
CPU time                                   1,125                             20.1
db file scattered read        84,593       1,045            12               18.7  User I/O
db file sequential read      102,599         993            10               17.7  User I/O
LNS wait on SENDREQ              610         755         1,237               13.5  Network

Wait Class

Wait Class            Waits  %Time -outs  Total Wait Time (s)  Avg wait (ms)  Waits /txn
Commit              114,088         3.33                2,387             21        0.68
User I/O            189,690            0                2,068             11        1.12
System I/O           87,782            0                1,260             14        0.52
Network             436,443            0                  760              2        2.58
Concurrency           7,276            0                   32              4        0.04
Configuration           960            0                   14             14        0.01
Other               114,925         0.05                    2              0        0.68
Application              11            0                    1             84        0



Tom Kyte
November 03, 2010 - 1:11 am UTC

... We do not commit often during our batch run. ...

you apparently do - else you would not have log file sync waits. YOU might not be committing explicitly in your code, but you are likely committing implicitly (like 114,000 times).

I have a theory - you use lots of sequences. Every twenty times you call nextval (with the default cache of 20) - you commit. How about you alter your sequences to be cache 100000 - and try again. It could make a large difference as to the number of times you commit and how long you wait for it. Imagine - every 20 records processed you are doing a commit - turn that into every 100,000 records and it will be a world of difference.

(note: an alternate theory is "this is space management recursive sql doing the commits" - if you don't use sequences)
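If sequences do turn out to be the culprit, a minimal sketch of the change (the owner and sequence names below are placeholders - the point is that with a bigger cache the recursive data dictionary update, and the commit that goes with it, happens far less often):

select sequence_name, cache_size
  from dba_sequences
 where sequence_owner = 'APP_OWNER';

alter sequence app_owner.some_seq cache 100000;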

Log File Sync

Jim, November 02, 2010 - 2:12 pm UTC

Hi Tom

following up on your statement above regarding fixing log file sync issues:

a) reduce commits
b) make disk faster
c) ensure cpu is not the bottleneck

does that mean that adding redo log file groups or increasing log file sizes will have NO effect on fixing log file syncs ?

Thanks
Jim
Tom Kyte
November 03, 2010 - 1:24 am UTC

log file sync is the wait you experience while lgwr writes to disk. Unless you are getting "checkpoint not complete - cannot allocate new log" in your alert log and are seeing "logfile switch" waits - adding more files/space won't make 'writing to disk' any faster.

adding more redo - possible solution for checkpoint not complete

adding more redo - won't make writing redo to disk faster
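As a side note (not part of the original answer), a rough way to see whether "checkpoint not complete" pressure is even a possibility is to look at the log switch rate over time, along the lines of:

select trunc(first_time,'HH24') hour, count(*) log_switches
  from v$log_history
 where first_time > sysdate - 1
 group by trunc(first_time,'HH24')
 order by 1;

(and check the alert log for "checkpoint not complete" messages over the same period)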

RE: Trying to solve "log file sync"

Rajeev, November 03, 2010 - 8:59 am UTC

Thanks for replying.

Tom,

Sorry, I missed a few things in my previous write-up.

1. We have a Data Guard setup (Max Performance) on this environment. I don't know much about the setup.
2. We are using the log4plsql framework, wherein for each procedure/function we log the input and output parameters. As I understand it, that may be doing an autocommit, so I'm wondering whether that could be the issue.

As we are in production, without any solid reason and evidence nobody will allow me to increase the sequence cache. Please help me work out how to isolate this issue.


Rajeev




Tom Kyte
November 03, 2010 - 9:59 am UTC

1) that could have a small impact on log file sync waits, but it should not be significant

2) if you are in plsql, user commits are asynchronous. that would not be it in all probability.

test it - set up a controlled test whereby you call your log4plsql stuff and see how many log file syncs you see.

test the sequences as well - set up a controlled test.
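Such a controlled test could be as simple as this sketch (my_logged_proc is a placeholder for the log4plsql call - or the sequence-using code - you want to measure; take the two readings before and after the call and look at the deltas):

-- my session's commit count
select n.name, m.value
  from v$mystat m, v$statname n
 where m.statistic# = n.statistic#
   and n.name = 'user commits';

-- my session's log file sync waits
select event, total_waits, time_waited
  from v$session_event
 where sid = (select sid from v$mystat where rownum = 1)
   and event = 'log file sync';

exec my_logged_proc;   -- placeholder: the code under test

-- re-run the two queries above and compare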


Is your batch job written in plsql or is it written outside the database in a 3gl that calls log4plsql?

RE: Trying to solve "log file sync"

Rajeev, November 03, 2010 - 11:09 am UTC

....Is your batch job written in plsql or is it written outside the database in a 3gl that calls log4plsql? ..

It's normal PL/SQL code.
Tom Kyte
November 04, 2010 - 3:04 am UTC

are you *SURE* those log file sync waits are yours then? Are you the only thing running on the machine?


plsql code shouldn't really be affected by log file sync (unless you are doing distributed stuff). plsql uses asynchronous commits.

RE: Trying to solve "log file sync"

A reader, November 09, 2010 - 6:25 pm UTC

Top User Events
---------------
Event Event Class % Activity Avg Active Sessions
db file sequential read  User I/O  25.15  1.59
log file sync  Commit  20.31  1.28
CPU + Wait for CPU  CPU  17.74  1.12
log file sequential read  System I/O  3.52  0.22
db file scattered read  User I/O  2.42  0.15

Top Background Events
-----------------------

Event Event Class % Activity Avg Active Sessions
LNS wait on SENDREQ  Network  9.90  0.63
log file parallel write  System I/O  9.53  0.60
log file sequential read  System I/O  3.15  0.20
db file parallel write  System I/O  1.61  0.10

Top Event P1/P2/P3 Values
-------------------------

Event % Event P1 Value, P2 Value, P3 Value % Activity Parameter 1 Parameter 2 Parameter 3
db file sequential read  25.22  "1","27914","1"  0.07  file#  block#  blocks
log file sync  20.31  "172","3972441364","0"  0.07  buffer#  NOT DEFINED  NOT DEFINED
log file parallel write  9.53  "1","3","1"  2.20  files  blocks  requests
      "1","2","1"  1.61         
log file sequential read  6.67  "0","1","1"  0.15  log#  block#  blocks
db file scattered read  2.42  "8","2570125","32"  0.07  file#  block#  blocks



Sorry for the formatting. It's the only application running on the system currently. One more thing to add: supplemental logging is also on for a few tables (if this helps).

I have increased the sequence cache, as mentioned, but it did not help.

Is there some way we can identify who is doing so many commits - any trace I can turn on?





Tom Kyte
November 11, 2010 - 10:19 am UTC

... Sorry for the formatting. ...

well, it is unreadable.

do you have access to ASH data - active session history?

RE: Trying to solve "log file sync"

A reader, November 12, 2010 - 5:28 pm UTC

I have it but it is in html format. How should I paste it?
Tom Kyte
November 15, 2010 - 8:56 am UTC

you can query it yourself. Use the ASH data to find sessions that are committing frequently - see if they are yours, you should be able to trace it down from the ASH data itself. It gives you a picture of what the session is doing.

No need to paste it, just read it yourself.
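As a starting point, something along these lines against the ASH view (a sketch - the one-hour window is arbitrary, and this assumes you are licensed to query it) would show which sessions and SQL the log file sync samples belong to:

select session_id, session_serial#, sql_id, count(*) samples
  from v$active_session_history
 where event = 'log file sync'
   and sample_time > sysdate - 1/24
 group by session_id, session_serial#, sql_id
 order by samples desc;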

log file sync on RAC

Periklis, January 29, 2011 - 2:58 am UTC

Hello Tom,
We have the 'log file sync' problem on an 11.2.0.1 3-node RAC with a fourth, standby database, and we are considering 2 solutions:
1. In order to accelerate I/O, can we move the redo logs out of the clustered storage onto the local disk of each node? Since the redo logs are specific to each node (thread), I assume that this can be done.

2. Since we cannot reduce the number of commits, we can reduce the waits by setting commit_write='IMMEDIATE,NOWAIT' to enable asynchronous writes for LGWR instead of the default 'IMMEDIATE,WAIT'. However, you said that a plsql commit is asynchronous by default. Is that true?

What do you think?

Best regards,
Periklis

Tom Kyte
February 01, 2011 - 3:48 pm UTC

1) the redo needs to be universally available. If node-1 fails, node-2 would need to be able to access the online redo logs to perform crash recovery for node-1.

2) plsql does - you wouldn't see log file sync waits in your plsql IF your plsql is doing the commits like this:

create procedure p
as
begin
   for x in ( ... )
   loop
      ...
      commit; <<<= async
   end loop;
   <<<= we wait here for the last commit to actually commit...
end;


However, if your code is like this:



create procedure p
as
begin
   do something...
   commit; <<<= async
   <<<= we wait here for the last commit to actually commit...
end;


and you call that procedure over and over - you will wait.



I'm not a big fan of the nowait commit - it does mean that you will have some committed transactions - that the CLIENT APPLICATION outside of the database has been told "these are committed" that can become uncommitted in the event of a crash.

Caveat Emptor big time.
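For completeness, a hedged illustration of what was being proposed (commit_write is the 10gR2-era parameter; it is deprecated in 11g in favour of commit_logging/commit_wait but still accepted) - and it carries exactly the caveat above:

-- per session
alter session set commit_write = 'IMMEDIATE,NOWAIT';

-- or per statement
commit write immediate nowait;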

log file sync

Rajeshwaran, Jeyabal, May 12, 2011 - 2:09 pm UTC

rajesh@ORA10GR2> drop table t purge;

Table dropped.

Elapsed: 00:00:03.56
rajesh@ORA10GR2> create table t(x number,y char(2000));

Table created.

Elapsed: 00:00:00.00
rajesh@ORA10GR2> create index t_ind on t(x);

Index created.

Elapsed: 00:00:00.00
rajesh@ORA10GR2>
rajesh@ORA10GR2> alter session set timed_statistics=true;

Session altered.

Elapsed: 00:00:00.00
rajesh@ORA10GR2> alter session set events '10046 trace name context forever, level 12';

Session altered.

Elapsed: 00:00:00.09
rajesh@ORA10GR2>
rajesh@ORA10GR2>
rajesh@ORA10GR2> begin
  2     for i in 1..100000
  3     loop
  4             insert into t(x,y) values(i,'*');
  5             commit work write immediate wait;  --- This is default COMMIT;
  6     end loop;
  7
  8     for i in 1..100000
  9     loop
 10             insert into t(x,y) values(i,'*');
 11             commit work write batch nowait;
 12     end loop;
 13  end;
 14  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:53.50
rajesh@ORA10GR2>



and Tkprof shows me this.


--------------------------------------------------------------------------------

COMMIT WORK WRITE IMMEDIATE WAIT


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      8.71      43.74          0          0     100000           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001      8.71      43.74          0          0     100000           0

Misses in library cache during parse: 0
Parsing user id: 61     (recursive depth: 1)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log file sync                              100000        0.17         33.98
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------

COMMIT WORK WRITE BATCH NOWAIT


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute 100000      4.40       4.11          0          0     100000           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   100001      4.40       4.11          0          0     100000           0

Misses in library cache during parse: 0
Parsing user id: 61     (recursive depth: 1)
--------------------------------------------------------------------------------


Tom:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_4010.htm#i2060233

<quote> The BATCH parameter causes the redo to be buffered in Redo log </quote>

Questions:

1) What does "Redo log" represent there? Is that the "redo log buffer" or the "online redo logs"?

2) What will happen if the redo is buffered (not written to disk, assuming that "Redo log" means the "redo log buffer" and not the "online redo log") and the instance crashes/fails? Will the modified data blocks be saved or rolled back?


Tom Kyte
May 12, 2011 - 3:22 pm UTC


rajesh@ORA10GR2> begin
2 for i in 1..100000
3 loop
4 insert into t(x,y) values(i,'*');
5 commit work write immediate wait; --- This is default COMMIT;
6 end loop;


Actually, that is NOT the default commit in plsql - the default commit is and has been NOWAIT in plsql (except for the last one before returning to the client)

http://asktom.oracle.com/pls/asktom/asktom.search?p_string=%22commit+time+optimization%22


1) the complete quote is:

The BATCH parameter causes the redo to be buffered to the redo log. No I/O is initiated.


It is buffering in memory - no IO is done right then.

2) if the system crashes whilst the data is in memory and not on disk in the online redo log, upon recovery - it will be as if that transaction never existed.

Hence, the draw back of "nowait" - your application might get a success from a commit - but find out later that the transaction never really happened.

Caveat Emptor - use this with extreme care.

log file sync

Rajeshwaran, Jeyabal, May 13, 2011 - 12:11 am UTC

Tom:

the default commit is and has been NOWAIT in plsql

How do you say this? I was not able to find that in the link you provided. Is this (the default commit for plsql) mentioned in the product documentation?

Tom Kyte
May 13, 2011 - 12:51 pm UTC

<quote src = Expert Oracle Database Architecture>


COMMITS in a Non-Distributed PL/SQL Block


Since PL/SQL was first introduced in version 6 of Oracle, it has been transparently using an asynchronous commit. That approach has worked because all PL/SQL is like a batch program in a way—the end user does not know the outcome of the procedure until it is completely finished. That’s also why this asynchronous commit is used only in non-distributed PL/SQL blocks of code; if we involve more than one database, then there are two things—two databases—relying on the commit being durable. When two databases are relying on the commit being durable, we have to utilize synchronous protocols or a change might be committed in one database but not the other.

Note: Of course, pipelined PL/SQL functions deviate from “normal” PL/SQL functions. In normal PL/SQL functions, the outcome is not known until the end of the stored procedure call. Pipelined functions in general are able to return data to a client long before they complete (they return “chunks” of data to the client, a bit at a time). But since pipelined functions are called from SELECT statements and would not be committing anyway, they do not come into play in this discussion.


Therefore, PL/SQL was developed to utilize an asynchronous commit, allowing the COMMIT statement in PL/SQL to not have to wait for the physical I/O to complete (avoiding the “log file sync” wait). That does not mean that you can’t rely on a PL/SQL routine that commits and returns control to your application to not be durable with respect to its changes—PL/SQL will wait for the redo it generated to be written to disk before returning to the client application—but it will only wait once, right before it returns.


Note The following example demonstrates a bad practice—one that I call “slow-by-slow processing” or “row-by-row processing,” as row-by-row is synonymous with slow-by-slow in a relational database. It is meant just to illustrate how PL/SQL processes a commit statement.


Consider this PL/SQL procedure:
ops$tkyte%ORA11GR2> create table t
  2  as
  3  select *
  4    from all_objects
  5   where 1=0
  6  /
Table created.

ops$tkyte%ORA11GR2> create or replace procedure p
  2  as
  3  begin
  4      for x in ( select * from all_objects )
  5      loop
  6          insert into t values X;
  7          commit;
  8      end loop;
  9  end;
 10  /

Procedure created.


That PL/SQL code reads a record at a time from ALL_OBJECTS, inserts the record into table T and commits each record as it is inserted. Logically, that code is the same as:

ops$tkyte%ORA11GR2> create or replace procedure p
  2  as
  3  begin
  4      for x in ( select * from all_objects )
  5      loop
  6          insert into t values X;
  7          commit write NOWAIT;
  8      end loop;
  9
 10      -- make internal call here to ensure
 11      -- redo was written by LGWR
 12  end;
 13  /

Procedure created.


So, the commits performed in the routine are done with WRITE NOWAIT and before the PL/SQL block of code returns to the client application, PL/SQL makes sure that the last bit of redo it generated was safely recorded to disk—making the PL/SQL block of code and its changes durable.

In Chapter 11, “Indexes,” we’ll see the salient effects of this feature of PL/SQL when measuring the performance of reverse key indexes. If you’d like to see how PL/SQL performs in the manner described above, skip there for a moment to review the reverse key index benchmark.
</quote>

cpu starvation

Pedro, May 15, 2011 - 1:53 pm UTC

Hi Tom

In one of your previous posts you said this:

"A log file sync is the time from application issuing commit to application getting "commit done". In between those two events there is a lot going on - physical IO is generally the big part of that, but if you are cpu constrained that would be a problem."

Does this apply to db file sequential read as well?

Thanks a lot

Tom Kyte
May 18, 2011 - 3:03 am UTC

it applies to pretty much everything - to do anything, we need CPU at some point

Log file sync wait for SELECT statement

Peter, July 12, 2011 - 9:35 am UTC

Tom,

One of the queries waits for more than 60 minutes on the 'log file sync' wait event. I'm surprised that the log file sync wait appears for a SELECT statement. The Oracle version is 10.2.0.4.0.

Please clarify.

Thanks
Peter


Tom Kyte
July 13, 2011 - 1:54 pm UTC

show me how you figured this out.
show me the sql as well

it can definitely happen due to recursive sql - like if you select s.nextval from big_table (although 60 minutes is large)

but show us something please??
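For reference, a tiny sketch of the recursive-sql case meant above (artificial - big_table stands for any sufficiently large table; with a small sequence cache the query itself forces periodic dictionary updates, each with its own recursive commit):

create sequence s cache 20;

-- every time the 20-value cache is exhausted, Oracle updates SEQ$ in a
-- recursive transaction and commits it - so even this plain SELECT
-- generates redo and some log file sync activity
select s.nextval from big_table;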

cause of log file sync

john, March 16, 2014 - 10:06 pm UTC

Hello Tom,

In our production database, we observe very high log file sync wait events (Avg wt (ms): 1500 for a 30 min interval). User commits are normal during this time (compared to previous AWR snapshots). Also, log file parallel write is very low (Avg wt (ms): 2 for the 30 min interval). The top 5 timed foreground wait events are "log file sync (60%) / cpu (15%) / db file seq read (1%) / direct path read (5%) / log buffer space (1%)". We do have a standby database. In the background wait events section, ARCH wait on SENDREQ is the top wait event, but the avg wait time (ms) is just 17 (%bg time is 7). Not sure what could be causing this issue. Please throw some light on this. Appreciate your help.