

Question and Answer

Connor McDonald

Thanks for the question, Andre.

Asked: January 12, 2011 - 3:11 pm UTC

Last updated: April 06, 2020 - 12:31 am UTC

Version: 10.2.0

Viewed 10K+ times

You Asked

Tom,

Starting with 10g, there is a new initialization parameter: DB_LOST_WRITE_PROTECT (which can be set to NONE, TYPICAL, or FULL).

Reading a few articles on it, I understand it protects against possible asynchronous WRITE errors.

From the 11g documentation:

DB_LOST_WRITE_PROTECT enables or disables lost write detection. A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage.


I was taken aback, since Oracle so strongly guarantees against losing data (especially committed data, via direct redo log writes).

Could you please explain how that checking works?

Thank you.

and Tom said...

Well, first and foremost, the lost write would be a bug in the OS level code - something outside of our control. Basically - the OS would have told us "write complete, you are good to go" but in reality the write did not complete. It is a situation that should not occur - but if it does - media recovery would be able to repair it once it was detected.

This parameter, DB_LOST_WRITE_PROTECT, is used to proactively detect this situation in a data guard configuration and give you the option of immediately correcting the issue using the standby (or primary if the standby suffered the lost write).

So, you are protected without it - via media recovery (reason 1241245 why ARCHIVELOG mode is imperative unless you want to lose data - if you are not in archivelog mode, you will lose data someday; it is only a matter of time). DB_LOST_WRITE_PROTECT would proactively detect it earlier than you probably would without it, and correct it from the standby (or primary, as the case may be).
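As a rough sketch of what enabling it looks like (assuming an SPFILE and a physical standby; the parameter is dynamic, but check the syntax and values against your release):

-- On the primary (and, for detection during redo apply, also on the standby)
ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE = BOTH;

-- Confirm the current value
SHOW PARAMETER db_lost_write_protect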


Comments

Excellent explanation

Andre, January 12, 2011 - 3:57 pm UTC

Yep. I had not realized the implications of recovery in that situation. A very thoughtful answer, indeed. Thanks.
=)

continued

Andre, January 12, 2011 - 4:34 pm UTC

Can I apply the same reasoning you used for DB_LOST_WRITE_PROTECT to DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM and DB_ULTRA_SAFE as well? (The latter is a combination of the previous three in 11g.)

All of these parameters will, in their own specific ways, proactively monitor for block write errors. So, as far as I gather, media recovery will be able to correct the blocks in the same fashion you described, will it not?
Tom Kyte
January 13, 2011 - 7:11 am UTC

Yes, that is what they are about: proactive monitoring, looking for corruption at the point of introduction rather than down the road, later on.

As long as you have a backup and archives that pre-date the corruption, we can recover from it. That is the rub, however: if you don't notice the corruption for six months (no one happened to hit that block until year end, for example, running the "close the books" process), you might not be able to recover, as you might not have a six-month-old backup and all of the archives. The sooner you detect corruption, the higher the probability you have everything you need.
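A quick way to see where a database currently stands on these, as a sketch (DB_ULTRA_SAFE only exists from 11g onwards):

-- Current settings of the corruption-detection parameters
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checking',
                'db_block_checksum',
                'db_lost_write_protect',
                'db_ultra_safe');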

Andre, January 13, 2011 - 7:56 am UTC

Great.

Do you suggest any configuration you like for these parameters ?

My guess is setting DB_ULTRA_SAFE to DATA_ONLY would be ok.
Tom Kyte
January 13, 2011 - 10:58 am UTC

It depends on your needs - otherwise there would just be "the one setting".

You need to understand what they do for you, what you need as far as data protection, and how much extra resource you are willing to give to accomplish that.
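If you did settle on Andre's suggestion, a sketch might look like this (DB_ULTRA_SAFE is a static parameter, so it needs an SPFILE change and a restart; the derived settings in the comments are as documented for 11g and worth re-checking for your release):

-- DATA_ONLY raises the defaults of the underlying parameters, roughly:
--   db_block_checking     => MEDIUM
--   db_block_checksum     => FULL
--   db_lost_write_protect => TYPICAL
ALTER SYSTEM SET db_ultra_safe = DATA_ONLY SCOPE = SPFILE;
-- then restart the instance for it to take effect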

Any of these useful in a non-Data Guard environment?

A reader, January 13, 2011 - 12:34 pm UTC

Any of these useful in a non-Data Guard environment?
Tom Kyte
January 13, 2011 - 2:34 pm UTC

DB_LOST_WRITE_PROTECT is specifically for Data Guard - the rest apply to standalone database configurations as well.

very good

A reader, January 15, 2011 - 4:48 am UTC

Thank you Tom, I will study Oracle harder.

Impact on redo log?

Kamal, August 24, 2011 - 11:01 am UTC

Hi Tom,
the doc ( http://download.oracle.com/docs/cd/E11882_01/server.112/e17110/initparams061.htm ) says:

"When the parameter is set to TYPICAL on the primary database, the instance logs buffer cache reads for read-write tablespaces in the redo log, which is necessary for detection of lost writes."

Wouldn't this generate a huge amount of redo in an instance with a lot of reads?

Thanks!

--
Kamal
Tom Kyte
August 30, 2011 - 3:50 pm UTC

Define "huge". It is all relative. You need what you need. If your goal is to ensure you detect corruption at the point of introduction, this is a penalty you'll gladly pay.

like you do for redo generation in the first place.
and archiving.
and redo shipping.

You size for it.
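One way to size it for your own workload, as a sketch: capture the redo and read statistics over a comparable period with the parameter off and then set to TYPICAL, and compare the deltas:

-- Cumulative figures since instance startup; sample at the start and end
-- of a representative window and take the differences
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'physical reads');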

true impact on redo log

Andrea Monti, March 01, 2012 - 5:06 am UTC

Hi Tom

what "the instance logs buffer cache reads for read-write tablespaces in the redo log" means?

When the primary instance does a single physical read, what is going to happen:
1) the full block (header + data) is written to the redo log?
2) the full header of the block is written to the redo log?
3) only the block's SCN is written to the redo log?

What about multiblock reads?

If option 3) is the right one, could we estimate the impact on the redo log with a formula like

extra_amount_of_redolog_per_day =
( size_of_block_id + size_of_SCN ) * physical_reads_per_day


Thanks,

Andrea

Tom Kyte
March 01, 2012 - 2:09 pm UTC

I asked Larry Carpenter - author of Oracle Data Guard 11g Handbook (Oracle Press) http://amzn.to/zE82fh

and he said:

<quote>

... it averages about 50 bytes per read block. And those are packed into the redo blocks where there is room, they are not 512 redo blocks all by themselves. So rule of thumb to me would be a 1,000,000 blocks reads would generate around 47MB more redo with it turned on rather than off. ...

</quote>
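That rule of thumb is just the arithmetic: 1,000,000 reads x ~50 bytes ≈ 47.7 MB. As a sketch, you can plug your own read volume into it (the 'physical reads' figure below is cumulative since instance startup, so scale it to a per-day number):

-- Estimated extra redo at ~50 bytes per block read
SELECT ROUND(value * 50 / 1024 / 1024) AS est_extra_redo_mb
FROM   v$sysstat
WHERE  name = 'physical reads';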

DB_LOST_WRITE_PROTECT & Standby DB?

Ram Dittakavi, September 08, 2017 - 12:07 pm UTC

Hi TOM,

I'm a bit confused about the whole of this.

Couple of questions:

1) What significance does this parameter have in a Data Guard environment? I don't understand the role of the standby database here.

2) "When the parameter is set to TYPICAL on the primary database, the instance logs buffer cache reads for read-write tablespaces in the redo log, which is necessary for detection of lost writes."

What does "logs buffer cache reads" mean here and how and where will it help detection of lost writes?
Connor McDonald
September 19, 2017 - 2:37 am UTC

MOS note 1302539.1 has a good write up, but to explain it briefly:

A- I put "row 1" into a block on the primary, it goes to the standby (via redo) "scn1"
B- I put "row 2" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn2"
C- the *write* of row 2 to disk on the primary is lost, so the block still says "scn1"
D- I put "row 3" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn3"

The standby at that point gets very confused, because "change scn1=>scn3" does not align with its view of the block (which is at scn2). It will throw an ORA-600 because the primary is basically corrupt.

But notice that "C", the primary is still working just fine. It's *data* is corrupted (we lost a write) but you can query it with no error - you are just getting the wrong results.

But once you hit "D", at that point you are pretty much dead in the water. You have a corrupt primary. And notice when the standby detected it - when we *next* tried to update the block. That could be hours, weeks or months after the lost write occurred. You would probably have to fail over to the standby and accept the loss of data.

With "DB_LOST_WRITE_PROTECT" enabled, what we do is that when we *read* data on the primary (not just change it), we add some information to the redo logs reflecting that fact. This can be transmitted to the standby, and so we can detect the lost write potentially much earlier.

You still have to then resolve the issue - MOS note 1265884.1 is the starting place for that.
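For the detection to happen at apply time, the parameter needs to be in effect on the standby as well as the primary; a sketch of the sanity check:

-- Run on both primary and standby
SHOW PARAMETER db_lost_write_protect

-- On the standby, a detected primary lost write typically surfaces in the
-- alert log during redo apply (ORA-752); the MOS notes above cover what to do next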




Ram Dittakavi, September 13, 2017 - 5:32 am UTC

Can I get a reply please?
Connor McDonald
September 13, 2017 - 7:16 am UTC

Well... you *were* going to get a reply.

Ram, October 04, 2017 - 7:49 am UTC

Thanks a lot, Connor.

Thank you

Rakesh, April 04, 2020 - 5:25 pm UTC

Thank you Connor for the detailed explanation, but one question I keep coming back to with db_lost_write_protect is: even if we set db_lost_write_protect, are we still at risk until we read the corrupted block again?

Suppose we haven't queried the primary block after "C" for a few days - would we be facing data loss even then, since other DB objects could have been modified in the meantime? And to recover the primary block, can we restore it from the standby without having to fail over?

A- I put "row 1" into a block on the primary, it goes to the standby (via redo) "scn1"
B- I put "row 2" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn2"
C- the *write* of row 2 to disk on the primary is lost, so the block still says "scn1"

D- I put "row 3" into a block on the primary, it goes to the standby (via redo) "change scn1=>scn3"


Regards
Rakesh
Connor McDonald
April 06, 2020 - 12:31 am UTC

That's a bit like the "if a tree falls in the forest and no one is there... does it make a sound?" metaphor.

If you've changed a block, then I would assume it is *likely* you will read it reasonably soon (for example, at the next backup). However, if for some reason you don't read that block again until (say) months later, then yes, *that* would be the time you would detect the block corruption... and you'd be in the scenario covered by the MOS note, i.e., looking at various options to recover that data.

In the worst case, you'd be looking at Support's assistance to manually fix the blocks... never a pleasant exercise.
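One way to avoid leaving blocks unread for months is to validate them proactively, for example with an RMAN run such as BACKUP VALIDATE CHECK LOGICAL DATABASE; as a sketch, anything it flags is then visible from SQL:

-- Populated by RMAN validation; rows here mean corrupt blocks were found
SELECT file#, block#, blocks, corruption_type
FROM   v$database_block_corruption;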