Challenging Block Corruption Questions

Question and Answer

Connor McDonald

Thanks for the question, John.

Asked: February 01, 2017 - 2:41 am UTC

Last updated: February 03, 2017 - 7:29 pm UTC

Version: 11.2

Viewed 1000+ times

You Asked

Good Day,

I have yet another area where I ask for your help in supplementing the Oracle documentation. This case is in regards to block corruptions. This has always been one of those items that I have been hesitant to implement. My guess is that < 5% of DBAs ever set a non-default value for more than one of these parameters. For one, we all hate to be the ones that push the database overhead to the point of introducing a performance problem. Another reason is that we may not really understand enough about it, since it doesn't seem to prevent block corruptions, just reduce them. Here are my questions:

- I've read that Oracle has four parameters to help prevent logical and physical corruptions. It seems, based on what I read, that these parameters cannot fully prevent and/or detect every block corruption (an illustrative example of setting them appears after this list):
  - DB_BLOCK_CHECKING = prevents memory and data corruption (1% to 10% overhead)
  - DB_BLOCK_CHECKSUM = detects I/O, storage and disk corruption (1% to 2% overhead)
  - DB_LOST_WRITE_PROTECT = detects lost (non-persistent) writes, using a physical standby
  - DB_ULTRA_SAFE = changes only the default values of the above parameters
- Please help me justify setting these parameters to management. I read that by setting these parameters, CPU utilization can increase by up to 15%. Also, some of these parameters only help 'detect', not prevent. Most of the block corruptions that I have experienced have resulted from the database server crashing. If I were to set DB_ULTRA_SAFE = DATA_AND_INDEX, would this significantly prevent block corruptions during a database server crash? What other situations would be protected? What situations still go unprotected? Obviously, it can't protect against someone overwriting the storage disk outside of the database, or against a disk failure, or a controller cache set to write-back crashing.
- Out of those parameters, it seems like DB_BLOCK_CHECKING is the best one to set to prevent block corruptions since it will 'prevent' corruption. Is my interpretation correct?
- I also read about the 'validate database' command for finding block corruptions. Instead of setting the above parameters, is it a good idea to simply run a daily job that executes 'validate database' during some low-activity period?
- I really don't know what 'validate database' does behind the scenes. I assume that it simply performs the same checks as DB_BLOCK_CHECKSUM on all database blocks, right? Can executing 'validate database' risk corrupting the database?
- Regarding logical corruptions, the documentation simply states that logical corruptions occur in memory. I found a document that listed these situations as causes of logical errors; however, it didn't state how they can occur:
  - Corruption examples are:
    - row locked by a non-existent transaction - ORA-600 [4512], etc.
    - the amount of space used is not equal to the block size
    - avsp bad
  - Can you provide a few examples of how these can occur?
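
For reference, a minimal sketch of how these four parameters might be set. The value lists in the comments are the documented options; the specific settings chosen here are illustrative assumptions, not recommendations for any particular system:

-- block contents are checked in memory before they are written; OFF/FALSE, LOW, MEDIUM, FULL/TRUE
alter system set db_block_checking = 'MEDIUM' scope=both;

-- a checksum is stored in each block header and verified on read; OFF, TYPICAL (default), FULL
alter system set db_block_checksum = 'TYPICAL' scope=both;

-- block read SCNs are recorded in the redo so a physical standby can detect lost writes; NONE, TYPICAL, FULL
alter system set db_lost_write_protect = 'TYPICAL' scope=both;

-- umbrella parameter that changes the defaults of the three above; OFF, DATA_ONLY, DATA_AND_INDEX
-- it is a static parameter, so it goes into the spfile and takes effect after a restart
alter system set db_ultra_safe = 'DATA_AND_INDEX' scope=spfile;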

I know that I am listing eight questions here, but since they are all related I didn't want to split them across six or seven threads. Thank you once again for your help!

John

and Connor said...

A block corruption is rare, but can be caused by

a) software bug on our part

b) some sort of failure *between* the database software and the OS layer, eg, we say "please write this 8k block in an atomic fashion" and the OS says "OK, I'll break that up into 4 x 2k writes", one of which fails, but the database is told all was good.

c) a hardware failure. I remember working at a customer site years ago where the NVRAM cache on the storage array corrupted hundreds of databases due to a firmware problem.

All of the block checking parameters simply introduce various "intensities" of checking, at the cost of increased overhead in doing so. If your server has headroom, then you may as well be more protected (after all, CPU doesn't last longer if you don't use it :-))

If your server is running close to its limit, then you probably wouldn't.

And like the stock market commercials say: "Past performance is not a reliable indicator of future activity". Every block might be fine on disk, but if the storage layer returns gibberish to us on a read, then you'll still run the risk of getting block errors reported to you.
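
On the 'validate database' part of the question, a minimal sketch of the RMAN approach (assuming RMAN is connected as target to this 11.2 database). VALIDATE only reads blocks and records what it finds, so it does not modify the database; by default it performs physical checks, and CHECK LOGICAL adds the logical checks on top:

RMAN> validate check logical database;

SQL> select file#, block#, blocks, corruption_type
     from   v$database_block_corruption;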

Comments

John Cantu, February 02, 2017 - 9:31 pm UTC

If we turn on block checking, will it add significant time to the total response time for DML?
Connor McDonald
February 03, 2017 - 7:29 pm UTC

It may, it may not :-)

Like anything - this is about cost/benefit. If my server is max'd out, then I probably don't want to add anything that would add more load. If my server is running at 20%, then I'd probably look at doing it, by starting with LOW and working my way up, monitoring as I go.
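
A minimal sketch of that stepwise approach (DB_BLOCK_CHECKING can be changed online; the starting point and the decision to step up are assumptions about your system, not measurements):

show parameter db_block_checking

-- start low, watch CPU and DML response times
alter system set db_block_checking = 'LOW' scope=both;

-- later, if the overhead is acceptable, step up
alter system set db_block_checking = 'MEDIUM' scope=both;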

Test It?

Chuck Jolley, February 02, 2017 - 10:05 pm UTC

I'm not sure how anyone but you could even have a guess.
Why not test it?

Would it make you feel better if I told you we have had ultra safe turned on for years, using servers with very minimal processors because of licensing costs, and have never had any issues?