Question and Answer

Connor McDonald

Thanks for the question, Koh.

Asked: February 07, 2019 - 6:58 am UTC

Last updated: February 14, 2019 - 4:43 am UTC

Version: 11.2.0.4

Viewed 1000+ times

You Asked

Hi all,

I am using Oracle 11g Active Data Guard in maximum performance mode. I notice two general patterns of log recovery in the database alert log.

=====================

Pattern1
RFS[7]: Selected log 11 for thread 2 sequence xxxxx
Archived Log entry xxxxx added for thread 2 sequence xxxxx
Media Recovery Log /frapool/STBY/archivelog/2019_02_07/o1_mf_2_xxxxx_g5pv57mj_.arc


Pattern2
RFS[3]: Selected log 12 for thread 2 sequence yyyyy
Media Recovery Waiting for thread 2 sequence yyyyy (in transit)
Recovery of Online Redo Log: Thread 2 Group 12 Seq yyyyy Reading mem 0
Archived Log entry yyyyy added for thread 2 sequence yyyyy


======================

What is the difference between the above two patterns?

Pattern2 seems to recover from the standby redo log (SRL), while Pattern1 seems to perform recovery from the archived log.

However, if Pattern1 is recovering from the archived log, why does RFS still allocate a standby redo log for it?
Shouldn't RFS just copy the archived log from the primary directly as an archived log on the standby, to be applied by MRP later on?

What circumstances or scenarios would lead to Pattern1?

Thank you!

and Connor said...

If there are standby redo logs (SRLs), then my understanding is that we will always "funnel" redo through them, even if we ultimately do not use them for real time apply. That flow can be seen from this image from the docs:

DATAGUARD_APPLY

RFS => SRL => ARCn

The difference in the patterns reflects whether we used real time apply or fell back to using the archived redo. The most common reason for falling back is the standby getting "hammered" so that it cannot keep up with real time apply.
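
To tell the two patterns apart when reading an alert log, one option is to scan for the two messages quoted in the question. The following is a minimal Python sketch, not an Oracle tool: the function name `classify_apply` and the regular expressions are assumptions based only on the sample lines above, and real alert logs may word these messages differently across versions.

```python
import re

# Patterns taken from the sample alert log lines in the question.
# Assumption: these two messages are enough to tell the apply modes apart.
REAL_TIME = re.compile(r"Recovery of Online Redo Log: Thread (\d+) Group (\d+) Seq (\S+)")
FROM_ARCHIVE = re.compile(r"Media Recovery Log (\S+\.arc)")


def classify_apply(lines):
    """Return a list of (mode, detail) tuples, one per recognised line."""
    events = []
    for line in lines:
        m = REAL_TIME.search(line)
        if m:
            # Redo is being applied straight from a standby redo log group.
            events.append(("real-time apply", f"thread {m.group(1)} SRL group {m.group(2)}"))
            continue
        m = FROM_ARCHIVE.search(line)
        if m:
            # Redo is being applied from an archived log file.
            events.append(("archived log apply", m.group(1)))
    return events


# The placeholder sequence numbers (xxxxx, yyyyy) are kept exactly as posted.
sample = [
    "RFS[7]: Selected log 11 for thread 2 sequence xxxxx",
    "Archived Log entry xxxxx added for thread 2 sequence xxxxx",
    "Media Recovery Log /frapool/STBY/archivelog/2019_02_07/o1_mf_2_xxxxx_g5pv57mj_.arc",
    "RFS[3]: Selected log 12 for thread 2 sequence yyyyy",
    "Media Recovery Waiting for thread 2 sequence yyyyy (in transit)",
    "Recovery of Online Redo Log: Thread 2 Group 12 Seq yyyyy Reading mem 0",
]

for mode, detail in classify_apply(sample):
    print(mode, "->", detail)
```

Run against the sample, Pattern1 is reported as archived log apply and Pattern2 as real-time apply. In both cases the RFS line still shows an SRL being selected, which matches the RFS => SRL => ARCn flow.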

Comments

Alan, February 11, 2019 - 1:04 pm UTC

Hi Connor,

Thanks for your reply.

So can I confirm that my observations are correct, as in
- Pattern1 is recovery done using the archived log
- Pattern2 is recovery done using SRL real time apply

I realize that every time I do a log switch on the primary, the same is done on the standby.
If there is a surge in prod redo log switches, what will happen if the redo changes are still being sent over to the RFS/standby redo log and a log switch on the current log happens?

e.g.
T1) Current ORL#1 redo entries are being sent over to SRL#1.

T2) ORL#1 is full and a log switch happens. ORL#2 is the current redo log now.
But redo entries belonging to ORL#1 are still being sent to SRL#1 over the network.


T3) If SRL#1 does a log switch as well at this time, it will not have all the redo entries from ORL#1. What will happen then?

Is that what you mean by the standby being "hammered"?

Connor McDonald
February 12, 2019 - 1:19 am UTC

Is that what you mean by Standby being "hammered" ?


No. I meant that the standby tries to apply redo in the same timeframe as the primary. If your primary is generating (say) 20MB of redo per second, then the standby will attempt to apply at the same rate (real time apply). But often a standby node is not spec'd as highly as the primary node. If 20MB per second is too much for the standby to cope with, it will *not* do real time apply, but will fall back to archive log apply. It might seem that we're still applying the same amount of redo, but real time apply can be thought of as "commit by commit", so there's a fair bit of synchronization required. Applying from the archive log is more of a "bulk operation". Even so, if you are generating so much redo that even archive log application falls behind, then you get a "lag" in your standby.

I realize that everytime i do a logswitch in Primary, the same is being done in the standby.


Yes, but that does *not* mean we are 1-to-1 with redo and standby redo logs. Let's say the db is running quietly. Redo #1 might be transmitted to SRL #1, but when we switch to Redo #2, we can *still* write to SRL #1. Typically we'll only head into SRL #2 or #3 etc. as load requires it.

A reader, February 12, 2019 - 5:11 pm UTC

Hi Connor,

Thanks for your reply.
My apologies if I am not understanding it correctly.

What I meant is that

a) redo logs must be applied sequentially (we can't skip sequence#123 and apply sequence#124 first)
b) real time apply and archive log apply are both still redo apply

What is stopping the standby from doing real time apply from the SRL even if there is high redo generation on the primary?
I mean, if production generates 20MB of redo per second while the SRL is only receiving/applying 10MB per second, what is wrong with that?
(Unless what you mean is that the current SRL that is receiving/applying redo has to be switched out?)

====================

Also, are you able to shed some light on what will happen if there is a surge in prod redo log switches?
What will happen if the current redo entries are still being sent over to the RFS/standby redo log and a log switch happens on the primary?
e.g.
ORL#1 redo entries ---> in the midst of being sent over to SRL#1
ORL#1 log switch: what will happen to SRL#1 if the entries have not been completely sent over? Will SRL#1 still continue to receive the redo entries if ORL#1 has not been overwritten yet?

Connor McDonald
February 14, 2019 - 4:43 am UTC

I mean if there is a 20MB redo/second at the production, while the SRL is only receiving/applying 10MB per second, what wrong with that ?

It means that eventually you could exhaust the available SRLs. By falling back to sourcing the redo from archived redo, we're never going to "run out", because the archived redo logs never get reused.
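
The arithmetic behind "running out" can be sketched with a toy model. This is a simplification under stated assumptions, not Oracle's actual algorithm: it assumes an SRL group cannot be reused until all of its redo has been applied, and the SRL group count and size used below are made-up illustrative values. Only the 20MB/s and 10MB/s rates come from the discussion.

```python
def seconds_until_srl_exhausted(srl_groups, srl_size_mb, receive_mb_s, apply_mb_s):
    """Toy model: time until unapplied redo fills every standby redo log.

    Assumption (not taken from Oracle documentation): an SRL can be reused
    only once its contents have been fully applied, so the backlog of
    unapplied redo must fit within the total SRL capacity.
    Returns None when apply keeps up and the backlog never grows.
    """
    backlog_growth = receive_mb_s - apply_mb_s  # MB of unapplied redo added per second
    if backlog_growth <= 0:
        return None
    total_capacity_mb = srl_groups * srl_size_mb
    return total_capacity_mb / backlog_growth


# Rates from the discussion: 20 MB/s received, 10 MB/s applied.
# Hypothetical configuration: 4 SRL groups of 512 MB each.
print(seconds_until_srl_exhausted(4, 512, 20, 10))  # 204.8
print(seconds_until_srl_exhausted(4, 512, 10, 10))  # None
```

Under this model a sustained 10MB/s shortfall fills 2GB of SRL in a few minutes, whereas archived logs impose no such ceiling because they are not reused.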

What will happen if the current redo entries are still being send over to the RFS/Standby RL and a logswitch happen on the primary ?


In max protection mode, this won't happen, because you can't commit on the primary unless you've 'committed' on the standby.

In max performance/avail mode, my understanding is that the moment we log switch, we'll change to a new SRL and archive the old one. Incoming redo will just be directed into the new SRL.

A reader, February 16, 2019 - 2:01 pm UTC

Hi Connor,

Thank you for your continuous help and replies.

To avoid ambiguity, the protection mode in use is maximum performance.

It means that eventually you could exhaust available SRL


- When does Oracle Data Guard decide it can reuse an SRL?
- When does it decide that the MRP can no longer catch up with the redo generated and switch to archive log apply?

=========================

In max performance/avail mode, my understanding is that the moment we logswitch, we'll change to a new SRL, and archive the old one. Incoming redo will just be directed into the new SRL.


- Do you mean redo entries belonging to a particular sequence# can be split across two different SRLs and eventually different archived logs? I have never seen a sequence# split across two archived logs on the standby.

What I mean is: primary redo log #1 with seq#1234 has (for example) 200 redo entries, and the network between the primary and standby is extremely slow. These redo entries are being received into SRL#1.

While redo entry #124 (out of 200) is being sent, there is a log switch on primary redo log #1. What will happen to redo entry #125, and to the current SRL?

If the current SRL is switched out as well, then for the same sequence#1234, are redo entries #1 to #124 in SRL#1 and redo entries #125 to #200 in SRL#2?

=====================================

Just in case you are wondering why I am trying to understand the internal mechanisms:

The reason for asking is that I am encountering a phenomenon whereby

i) after a surge in log switches
ii) followed immediately by low database activity,
iii) even though all required archived logs have already been applied, the standby still reflects a high A-LAG and 0 T-LAG.

The A-LAG is the duration (first_time to next_time) of the current ORL in use, and Data Guard does not seem to be doing real time apply until the current ORL switches.

This is reflected as Pattern1 above.

So if the database has very low activity and the current ORL's first_time to next_time span is 3 hours, my A-LAG is reflected as 3 hours.

(Support is telling me the issue is caused by the surge in log switches, but my stand is that the standby has already applied all logs and should be doing real time apply for the current redo log.)
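
The lag described here can be reproduced with simple arithmetic. The sketch below is a toy illustration of the reader's observation, not Oracle's lag calculation: it assumes that without real time apply nothing from the current log is applied until that log is archived, so the newest applied change is no later than the current log's first_time. The function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def apply_lag_without_real_time_apply(current_log_first_time, now):
    """Toy estimate of apply lag when redo is applied only from archived logs.

    Assumption: everything up to the start of the current log has been
    applied, and nothing from the current log has, so the lag equals the
    age of the current log.
    """
    return now - current_log_first_time


# Hypothetical timestamps: the current ORL has been open for 3 hours
# because the database is nearly idle and no log switch has occurred.
first_time = datetime(2019, 2, 16, 9, 0, 0)
now = first_time + timedelta(hours=3)

print(apply_lag_without_real_time_apply(first_time, now))  # 3:00:00
```

With real time apply active, the same quiet period would show a lag close to zero, because redo would be applied from the SRL as it arrives rather than waiting for the log switch.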
