re:"Database alert log switches", version 10.2.0.4
Joe, December 10, 2008 - 12:48 pm UTC
that is not based on any science I know of. can you refer me to the note(s)? I'd like to see their reasoning.
Reply:
I apologize for my wording; it may have been off base. The following came from an Oracle analyst on an SR I opened about redo logs.
"The best way to set the redo log size is by trial and error, meaning depending on your alert log showing the interval of
the switch time taken between switching, the idle is to have 20 minutes time bet
ween every time the redo log is switched as this is proven to give the optimal performance as more than that the size of the redo logs will be so high and the switch operation may take too long, and less than that definitely this will affect our performance and the checkpoint will keep on coming faster and faster.
As to fix this matter and size the redo logs correctly you will need to use the tri
al and error type of practice as to resize all redologs to be at the same size a
nd to have two redo log groups and see how long does it take to switch and adjus
t the size of the redo to try to reach the optimal 20 minutes.
We will wait for your feedback."
The Oracle note I read was Note 147468.1, the Checkpoint Tuning and Troubleshooting Guide; I am quoting the following from it.
"4. Redo logs and Checkpoint
A good rule of thumb is to switch logs at most every twenty minutes.
Having your log files too small can increase checkpoint activity and reduce performance.
Oracle recommends the user to set all online log files to be the same size,
and have at least two log groups per thread. The alert log is a valuable tool for
monitoring the rate that log switches occur, and subsequently, checkpoints
occur.
The following is an example of quick log switches
from the alert log:
Fri May 16 17:15:43 1997
Thread 1 advanced to log sequence 1272
Current log# 3 seq# 1272 mem# 0: /prod1/oradata/logs/redologs03.log
Thread 1 advanced to log sequence 1273
Current log# 1 seq# 1273 mem# 0: /prod1/oradata/logs/redologs01.log
Fri May 16 17:17:25 1997
Thread 1 advanced to log sequence 1274
Current log# 2 seq# 1274 mem# 0: /prod1/oradata/logs/redologs02.log
Thread 1 advanced to log sequence 1275
Current log# 3 seq# 1275 mem# 0: /prod1/oradata/logs/redologs03.log
Fri May 16 17:20:51 1997
Thread 1 advanced to log sequence 1276
Current log# 1 seq# 1276 mem# 0: /prod1/oradata/logs/redologs01.log
If redo logs switch every 3 minutes, you will see performance degradation.
This indicates the redo logs are not sized large enough to efficiently handle
the transaction load; consider increasing the size of the redolog files. "
"The switch itself won't 'degrade' performance"
Doesn't the above contradict your statement? Or am I misreading what the note says compared to your statement?
"You either allocate more logs of the same size you have (so we have longer to complete that checkpoint) or you can make your log files larger."
Is there a preference? For instance, would having 18 log files at 25 MB be worse than just having 6 at 75 MB apiece?
December 10, 2008 - 2:04 pm UTC
well, you do not need trial and error - dbconsole will guide you.
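(for reference, what dbconsole surfaces there is the redo logfile size advisor, which you can also query directly - a minimal sketch, assuming 10g or later with fast_start_mttr_target set, since optimal_logfile_size is reported in megabytes and stays null otherwise:

-- the smallest online redo log size, in MB, that the advisor thinks
-- will not drive checkpointing beyond your fast_start_mttr_target
select optimal_logfile_size
  from v$instance_recovery;
)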
... If redo logs switch every 3 minutes, you will see performance degradation.
....
I'd like to see them back that up with something - anything.... Yes it would contradict me, no it is not true.
It used to be, in days gone by - a log switch was the main cause of checkpoint firing. In days gone by, having more small log files was the way to reduce your mean time to repair (MTTR) from an instance crash. If you used small files and switched frequently, we checkpointed frequently, and the number of dirty blocks requiring recovery in the cache was small - hence instance crash recovery would be fast. If you had really large log files - then the number of dirty blocks in the cache could be huge and the MTTR of a failed instance could be large.
Now, in the 21st century we control that with fast_start_mttr_target, and the log switch isn't the thing that mostly causes checkpointing (it is more or less continuous really)
So, if you are using fast_start_mttr_target - size your logs the way you want to size them pretty much, we'll use the fast_start_mttr_target to keep the cache "clean" instead of using log switches to do the same.
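a minimal sketch of that approach, assuming you are running with an spfile (60 seconds is just an example target):

-- ask for roughly 60 seconds of crash recovery time; incremental
-- checkpointing then keeps the cache clean continuously, regardless
-- of when log switches happen
alter system set fast_start_mttr_target = 60 scope=both;

-- compare what you asked for with what the instance estimates
select target_mttr, estimated_mttr
  from v$instance_recovery;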
There are other considerations - if you are using data guard in the mode that uses archives, if you are using streams with downstream capture - then you might want your logs to switch based on TIME (archive_lag_target controls that) and again, size wouldn't count - you'd just want a log switch to happen AT LEAST every N minutes in this case.
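for example (a sketch - 1200 seconds here is just the twenty minute figure from above, use whatever lag you can tolerate, and again this assumes an spfile):

-- force a log switch at least every 20 minutes regardless of load,
-- so an archive-based standby never falls further behind than that
alter system set archive_lag_target = 1200 scope=both;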
I learn something new every time...
A reader, December 10, 2008 - 3:13 pm UTC
Tom,
Till today, I also believed that log switches should take place every 20 minutes or so.
Please correct me if I am wrong:
a) I am not seeing "checkpoint not complete" in alert log AND
b) I am not using Dataguard AND
c) I am not using Streams or any other special features
then it really does not matter how fast the log switches are happening or what the size of the redo log file is. Correct?
Thanks...
December 10, 2008 - 4:11 pm UTC
well - it will affect how clean the buffer cache is - and that could have an impact (it could make things faster, slower, or stay the same)
Think about what happens during a log switch:
a) arch is invoked to archive the log
b) a checkpoint of the dirty blocks protected by that logfile is initiated.
Now (b) might be offset by the "fast start" stuff - meaning, the buffer cache might well already be "very very clean" and the checkpoint initiated isn't a big deal (this is mostly true these days...)
However, think about (a). If you have a busy system... and you wait 5 hours to fill a 20gb log file.... and then arch just kicks in and a burst of IO reading and writing it happens.... so you have "spikey" performance
versus
If you have a busy system... And you wait 15 minutes to fill a 1gb log file... and then arch is sort of constantly doing its thing over time... Then you have 'flat' performance - arch is always sort of doing its thing
Now, which is better?
Answer: it depends. What if you have a batch job that runs for 4.8 hours and then is done and the system becomes more or less idle for a long time (or just does queries - you loaded a warehouse). Maybe 5 hours is better, as the batch might take longer than 4.8 hours if arch was busy doing its thing.
What if you have a 24x7 continuous stream of transactions non-stop. Maybe 15 minutes is better as it causes no 'spikes' really.
Visualize what would happen in each case - if you know what arch does, what lgwr does, what a checkpoint would mean, what causes dirty blocks to get cleaned out - you can visualize what the probable ramifications of doing "X" are.
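a quick way to visualize your own pattern - a sketch counting switches per hour out of v$log_history, assuming your control file keeps at least a day of history:

-- log switches per hour over the last 24 hours; a few busy hours
-- followed by idle ones is the "spikey" profile described above
select trunc(first_time, 'HH24') hr,
       count(*) switches
  from v$log_history
 where first_time > sysdate - 1
 group by trunc(first_time, 'HH24')
 order by hr;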
Sometimes having constant log switches every few minutes is absolutely good and correct.
Sometimes having them switch very infrequently is absolutely good and correct.
"it depends"