Excellent explanation
Andre, January 12, 2011 - 3:57 pm UTC
Yep. I had not realized the implications of recovery in that situation. A very thoughtful answer, indeed. Thanks.
=)
continued
Andre, January 12, 2011 - 4:34 pm UTC
Can I apply the same reasoning you used for DB_LOST_WRITE_PROTECT also to DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM and DB_ULTRA_SAFE? (The latter is a combination of the previous three in 11g.)
All of these parameters will -- in specific ways -- proactively monitor for block write errors. So, as far as I gather, media recovery will be able to correct the blocks in the same fashion you described, will it not?
January 13, 2011 - 7:11 am UTC
Yes, that is what they are about: proactive monitoring, looking for corruption at the point of introduction - rather than down the road, later on.
As long as you have a backup and your archives that pre-date the corruption - we can recover from it. That is the rub, however: if you don't notice the corruption for six months (no one happened to hit that block until year end, for example, running the "close the books" process) - you might not be able to recover, as you might not have a six-month-old backup and all of the archives. The sooner you detect corruption, the higher the probability you have everything you need.
Andre, January 13, 2011 - 7:56 am UTC
Great.
Do you suggest a configuration for these parameters?
My guess is that setting DB_ULTRA_SAFE to DATA_ONLY would be OK.
January 13, 2011 - 10:58 am UTC
It depends on your needs - else there would just be "the one setting".
You need to understand what they do for you - and what you need as far as data protection, and how much extra resource you are willing to give to accomplish that.
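For reference, Oracle's 11g documentation describes DB_ULTRA_SAFE as a shorthand that drives the other three protection parameters. A minimal sketch of that mapping (the values below are from the 11g reference docs as I recall them, and the sketch ignores the documented precedence rules for explicitly set parameters - verify against your own release):

```python
# Sketch of the protection parameters implied by each DB_ULTRA_SAFE value,
# per the Oracle 11g reference documentation (verify against your release).
ULTRA_SAFE_MAP = {
    "OFF": {},  # each parameter keeps its individually set value
    "DATA_ONLY": {
        "DB_BLOCK_CHECKING": "MEDIUM",
        "DB_LOST_WRITE_PROTECT": "TYPICAL",
        "DB_BLOCK_CHECKSUM": "FULL",
    },
    "DATA_AND_INDEX": {
        "DB_BLOCK_CHECKING": "FULL",
        "DB_LOST_WRITE_PROTECT": "TYPICAL",
        "DB_BLOCK_CHECKSUM": "FULL",
    },
}

def derived_settings(db_ultra_safe):
    """Look up the protection parameters implied by a DB_ULTRA_SAFE value."""
    return ULTRA_SAFE_MAP[db_ultra_safe]

print(derived_settings("DATA_ONLY"))
```

The point of the table is the trade-off Tom describes: DATA_AND_INDEX buys stricter block checking than DATA_ONLY at extra CPU cost.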
Any of these useful in a non-Data Guard environment?
A reader, January 13, 2011 - 12:34 pm UTC
Any of these useful in a non-Data Guard environment?
January 13, 2011 - 2:34 pm UTC
db lost write protect is for data guard - the rest are for standalone database configurations.
very good
A reader, January 15, 2011 - 4:48 am UTC
Thank you Tom, I will study Oracle harder.
Impact on redo log?
Kamal, August 24, 2011 - 11:01 am UTC
August 30, 2011 - 3:50 pm UTC
Define "huge". It is all relative. You need what you need. If your goal is to ensure you detect corruption at the point of introduction - this is a penalty you'll gladly pay.
like you do for redo generation in the first place.
and archiving.
and redo shipping.
You size for it.
true impact on redo log
Andrea Monti, March 01, 2012 - 5:06 am UTC
Hi Tom
What does "the instance logs buffer cache reads for read-write tablespaces in the redo log" mean?
When the primary instance does a single physical read, what is going to happen:
1) the full block (header + data) is written in the redolog?
2) the full header of the block is written in the redolog?
3) only the block's SCN is written in the redolog?
What about multiblock reads?
If option 3) is the right one, could we estimate the impact on the redo log with a formula like
extra_amount_of_redolog_per_day =
( size_of_block_id + size_of_SCN ) * physical_reads_per_day
Thanks,
Andrea
March 01, 2012 - 2:09 pm UTC
I asked Larry Carpenter - author of Oracle Data Guard 11g Handbook (Oracle Press)
http://amzn.to/zE82fh and he said:
<quote>
... it averages about 50 bytes per read block. And those are packed into the redo blocks where there is room, they are not 512 redo blocks all by themselves. So rule of thumb to me would be a 1,000,000 blocks reads would generate around 47MB more redo with it turned on rather than off. ...
</quote>
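Using Larry's ~50-bytes-per-read-block rule of thumb, the back-of-envelope estimate is just a multiplication; a minimal sketch (the 50-byte figure is a quoted average, not a guarantee, so treat the result as an order-of-magnitude estimate):

```python
def extra_redo_mb(physical_block_reads, bytes_per_read=50):
    """Rough extra redo volume from DB_LOST_WRITE_PROTECT read logging,
    using the ~50-bytes-per-read-block rule of thumb quoted above."""
    return physical_block_reads * bytes_per_read / (1024 * 1024)

print(round(extra_redo_mb(1_000_000), 1))  # ~47.7 MB for a million block reads
```

This reproduces the "1,000,000 block reads => around 47MB more redo" figure from the quote.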
DB_LOST_WRITE_PROTECT & Standby DB ?
Ram Dittakavi, September 08, 2017 - 12:07 pm UTC
Hi TOM,
I'm a bit confused about the whole of this.
Couple of questions:
1) What significance does this parameter value have in a Data Guard environment? I don't understand the role of the standby database here.
2) "When the parameter is set to TYPICAL on the primary database, the instance logs buffer cache reads for read-write tablespaces in the redo log, which is necessary for detection of lost writes."
What does "logs buffer cache reads" mean here, and how and where will it help with detection of lost writes?
September 19, 2017 - 2:37 am UTC
MOS note 1302539.1 has a good write up, but to explain it briefly:
A- I put "row 1" into a block on the primary, it goes to the standby (via redo) "scn1"
B- I put "row 2" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn2"
C- the *write* of row 2 to disk on the primary is lost, so the block still says "scn1"
D- I put "row 3" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn3"
The standby at that point gets very confused, because "change scn1=>scn3" does not align with its view of the block (which is at scn2). It will throw an ORA-600 because the primary is basically corrupt.
But notice that at "C", the primary is still working just fine. Its *data* is corrupted (we lost a write) but you can query it with no error - you are just getting the wrong results.
But once you hit "D", at that point you are pretty much dead in the water. You have a corrupt primary. And notice when the standby detected it - when we *next* tried to update the block. That could be hours, weeks or months after the lost write occurred. You would probably have to failover to the standby and accept the loss of data.
With "DB_LOST_WRITE_PROTECT" enabled, what we do is that when we *read* data on the primary (not just change it), we add some information to the redo logs reflecting that fact. This can be transmitted to the standby, and so we can detect the lost write potentially much earlier.
You still have to then resolve the issue - MOS note 1265884.1 is the starting place for that.
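The A-D sequence above can be made concrete with a toy model: treat each redo record as "change from_scn => to_scn", and have the standby flag any record whose from_scn does not match its copy of the block. This only illustrates the mismatch logic - it is not Oracle's actual redo format:

```python
class StandbyBlock:
    """Toy model: a standby block that applies redo records of the
    form (from_scn, to_scn) and flags a lost write on mismatch."""
    def __init__(self, scn):
        self.scn = scn

    def apply(self, from_scn, to_scn):
        if from_scn != self.scn:
            # Stands in for the ORA-600 the standby raises: the primary
            # generated this change against a stale (lost-write) block image.
            raise RuntimeError(
                f"lost write detected: redo says block was at scn {from_scn}, "
                f"standby has scn {self.scn}")
        self.scn = to_scn

block = StandbyBlock(scn=1)  # A: row 1 written, block at scn1
block.apply(1, 2)            # B: row 2, standby advances scn1 => scn2
# C: the scn1=>scn2 write is lost on the *primary* disk only; the standby
#    already applied it, so its copy stays at scn2.
try:
    block.apply(1, 3)        # D: primary, reading its stale scn1 image,
                             #    generates "change scn1 => scn3"
except RuntimeError as e:
    print(e)
```

DB_LOST_WRITE_PROTECT effectively moves this check earlier: a logged *read* of the stale scn1 block would already disagree with the standby's scn2 copy, before step D ever happens.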
Ram Dittakavi, September 13, 2017 - 5:32 am UTC
Can I get a reply please?
September 13, 2017 - 7:16 am UTC
Well... you *were* going to get a reply.
Ram, October 04, 2017 - 7:49 am UTC
Thanks a lot, Connor.
Thank you
Rakesh, April 04, 2020 - 5:25 pm UTC
Thank you Connor for the detailed explanation, but one question I keep having with db_lost_write_protect: even if we set db_lost_write_protect, until we read the corrupted object block again, are we still at risk?
Suppose we haven't queried the primary block after "C" for a few days - would we face data loss even then? Since other DB objects could have been modified in the meantime, can we then restore the primary block from the standby without having to fail over?
A- I put "row 1" into a block on the primary, it goes to the standby (via redo) "scn1"
B- I put "row 2" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn2"
C- the *write* of row 2 to disk on the primary is lost, so the block still says "scn1"
D- I put "row 3" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn3"
Regards
Rakesh
April 06, 2020 - 12:31 am UTC
That's a bit like the "if a tree falls in the forest and no one is there... does it make a sound?" metaphor.
If you've changed a block, then I would assume it is *likely* you will read it reasonably soon (for example, the next backup). However, if for some reason you *never* read that block again until (say) months later, then yes, *that* would be the time you would detect the block corruption...and you'd be in the scenario covered by the MOS note, ie, looking at various options to recover that data.
In the worst case, you'd be looking at Support's assistance to manually fix the blocks... never a pleasant exercise.