Skip to Main Content
  • Questions
  • Average wait time for Log File Sync and Log File Parallel Write wait events increasing daily

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Ab.

Asked: September 07, 2018 - 2:48 pm UTC

Last updated: September 13, 2018 - 12:45 am UTC

Version: 12.1.0.2

Viewed 1000+ times

You Asked

Hi Team,

First of all, thanks for all the awesome work you are doing!

We are facing a issue with Log File Sync wait events. Pasted below is a detailed write-up. It would be great if you can help us by sharing your views.

Thanks again for all your help!

--------------------------------------------------------------------------------------------------------------------------------

[Issue]

Average wait time for “Log File Sync” and “Log File Parallel Write” wait events is increasing daily on our DB instance. We experience application slowness with increase in the wait times. Wait time for these events is ~20ms just after DB start-up (which itself is high) and within 7 – 8 days it climbs up to 250+ms. Wait time again returns to ~20 ms after DB server re-boot.

Our understanding is that we are facing a redo file I/O issue as ratio of (Log File Sync average wait time)/ (Log File Parallel Write average wait time) is in rage from 1.1 to 1.4

DB is hosted on a Windows server.

[Things Checked]

1. We analyzed following factors and they are almost similar between any two days:

A. Count of User Commits (~43000/hour), Rollbacks(~4000/hour), User Calls(~750000/hour), Transactions(13-14/Second), Waits on Log file sync event (9000/hour).
B. Size of Redo (~50,000 bytes/second), Logical reads (8000 blocks/second), Physical reads(~7 blocks/second), Physical writes(~10-15 blocks/second)
C. Number of log file switches: ~ 3 /hour.

2. DB is not CPU bound and system is not under load: DB time is ~22 mins for 60 minutes of elapsed time on a machine having 30+ CPUs with 20+ cores. We are using flash drives to store redo log files.

3. Top foreground events shows log file sync (~30% DB time, 8,669 waits), DB CPU (~25-40% DB time), control file sequential read (~2-3 % DB time, 2,887 waits), buffer busy waits (~1-2 % DB time, 24,169 waits).

We have already explored and implemented avenues to reduce Commits/Rollbacks.

[Questions]

1. Anything else that we can check before concluding that we have a redo file I/O issue? Do we need to consider any other metric/data? Any other obvious thing that we are missing?
2. Any pointers on what could be the reason behind the constant increase in average wait time would be very helpful.
3. Would it be a good idea to monitor I/O on disk hosting redo files? Do we have any utility that can help us to co-relate disk I/O with processes running on windows server?
4. Can you elaborate on how can we co-relate average wait time of the above mentioned events with I/O capacity of our disks?

and Connor said...

1)

Man.. 13-15 transactions per second, and 50K redo per second is *nothing*. Here's a demo from my PC on a stock standard consumer hard drive.

SQL> create table t ( x char(100));

Table created.

SQL> insert into t values (1);

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> conn scott/tiger   -- fresh session
Connected.

SQL> select
  2    s.name, st.value
  3  from v$statname s, v$mystat st
  4  where st.statistic# = s.statistic#
  5  and s.name = 'redo size';

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                                 668

SQL>
SQL> set timing on
SQL> begin
  2  for i in 1 .. 100000 loop
  3    update t set x = i;
  4    commit write immediate wait;
  5  end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:15.10
SQL> set timing off
SQL>
SQL> select
  2    s.name, st.value
  3  from v$statname s, v$mystat st
  4  where st.statistic# = s.statistic#
  5  and s.name = 'redo size';

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                            85986460


So I can do 6MB per second of redo when doing lots of small transactions (~6000/sec)

So something is *definitely* not right there with your storage setup.

2)

Not really, but my guess would be perhaps some OS level issue, or perhaps even hardware. There's a lot of technology in flash drives nowadays to do things like wear levelling, and garbage collection - it *could* be related to that, but I'm just hypothesizing here. The obvious course of action would be to try the redo logs on a different disk (even temporarily)

3) Yes.

Orion is a simple tool to emulate Oracle I/O. Here's a nice write up

https://oracle-base.com/articles/misc/measuring-storage-performance-for-oracle-systems

4) Capacity and throughput are different things, although (most) disk technologies typically run slower as they get more full.


So in a nutshell, plan of attack:

- try a different disk if possible, observe differences
- try Orion, see if it can be replicated there, which would mean its not the database per se.

Rating

  (1 rating)

Is this answer out of date? If it is, please let us know via a Comment

Comments

voodoo

Racer I., September 10, 2018 - 7:08 am UTC

Hi,

Absolutely no real knowledge here but when browsing on that topic one occasionally finds recommendations to "nice" the logwriter process i.e. increase the prio. Don't know what the windows equivalent would be but maybe its something you can try. However rarely is this mentioned with increasing delays.

Reason seems to be that the scheduling can interrupt the logwriter too often so it can't get enough done even if the system as a whole is not cpu bound.

regards,
Connor McDonald
September 13, 2018 - 12:45 am UTC

Agree, but whilst I've also done this in the past (or used a database parameter to achieve the same), I always recommend getting the nod of approval from Support before heading down this path

More to Explore

Administration

Need more information on Administration? Check out the Administrators guide for the Oracle Database