Skip to Main Content

Breadcrumb

May 4th

Question and Answer

Tom Kyte

Thanks for the question, Andre.

Asked: November 27, 2000 - 10:03 pm UTC

Last updated: May 03, 2012 - 8:49 am UTC

Version: 8.1.6

Viewed 10K+ times! This question is

You Asked

Hi again !

I have asked this questions before, but maybe I was nor clear enough. Would you please try to explain this to me, which is not clear in the reference material ?

DB_BLOCK_CHECKSUM verifies only when block are written and read from disk (physycal i/o). It seems to me DB_BLOCK_CHECKING checks any modification to the block (in memory only, since blocks can only be altered in the buffer cache). Am I right ?

Thanks again !

and Tom said...

first for the definitions:

DB_BLOCK_CHECKSUM determines whether DBW n and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read only if this parameter is TRUE and the last write of the block stored a checksum. In addition, Oracle will give every log block a checksum before writing it to the current log.


DB_BLOCK_CHECKING controls whether Oracle performs block checking for data blocks. When this parameter is set to TRUE, Oracle performs block checking for all data blocks.
When it is set to FALSE, Oracle does not perform block checking for blocks in the user tablespaces. However, block checking for the SYSTEM tablespace is always turned on. Oracle checks a block by walking through the data on the block, making sure it is self-consistent. Block checking can often prevent memory and data corruption. Block checking typically causes 1% to 10% overhead, depending on workload. The more updates or inserts in a workload, the more expensive it is to turn on block checking. Oracle recommends that you set DB_BLOCK_CHECKING to TRUE if the performance overhead is acceptable.



So, the checksum will ensure that what was WRITTEN OUT is what gets read back in. It is useful to detect corruptions introduced by IO operations (bad disk, bad cables, bad hardware). It will not detect that a block in memory is already corrupt (bad memory chips, bad software, etc). The checksum is useful to find faulty disk operations.

The block checking is useful to verify that a block is physically "correct" -- that it most likely is not corrupted. If a varchar2 field on a block has a length > 4000 bytes -- we know it is corrupt. If the head of the block doesn't match the tail, we know it is corrupt. And so on. Block checking validates the integrity of the block.

The two parameters do different checks (checksum doesn't validate only makes sure what we wrote is what we read, checking doesn't ensure what we read is what we wrote but rather that what we read was a VALID block) at different times.

Rating

  (10 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

these two are quite confusing

A reader, January 09, 2003 - 5:42 pm UTC

Hi

Working with Oracle 8.1.7.4 and Suse Enterprise Server 7.

I have been doing some research about these two parameters forced by some block corruptions we have faced (5 corruptions in last month and we dont have a clue wheather it's physical or logical/software corruption). I visited Metalink, read Oracle 8i backup & Recovery book by Velpuri and lastly visiting your site.

To start with when you said DIRECT Loader I guess you mean direct writes bypassing the buffer cache am I correct?

Then quoting a setence from your reply:
"calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk.
Checksums are verified when a block is read only if this parameter is TRUE and the last write of the block stored a checksum."

I am confused by this sentence, do you mean when the block is read into buffer cache and it's becomes dirty (modified) when it written back to disk a checksum is written to the corresponding data block as well as the modified data? And next time when this block is read again by some other server process, this server process computes a checksum for the block and compared with the one stored in the block?

Regarding DB_BLOCK_CHECKING I understand it checks for data block's integrity but you said (quoting again)

"The block checking is useful to verify that a block is physically "correct""

However note.32969.1 states DB_BLOCK_CHECKING is a logical block checking feature. Also in same note it states that when a corrupt block is found Oracle tries to perform cache recovery and my question is what is cache recovery, if a data block is badly written how can you make it "good" again? By reading the redo log or what?
Then later in the note it states following: as of Oracle 8.1.6 onwards Oracle reports ORA-00607 if block recovery fails and mark the block soft corrupted however in note.77587.1 case 2 it says to perform sanity check in database set events

event = "10210 trace name context forever, level 10"
event = "10211 trace name context forever, level 10"

which consequently equals DB_BLOCK_CHECKING in Oracle 8i and when block corruption is found in cache prior to
writing the block to disk, it will signal an ORA-00600[3398] and will crash the database instance. My question is which note is correct, crashing the instance or marking the block soft corrupt if we set DB_BLOCK_CHECKING? (Or instance crash only happens when set events)

Finally I have to quote again

"The two parameters do different checks (checksum doesn't validate only makes sure what we wrote is what we read, checking doesn't ensure what we read is what we wrote but rather that what we read was a VALID block) at different times."

It seems to me

checksum only makes sure what we read is what we wrote, I mean we cant write non-dirty blocks to disk so how can checksum ensure what we read is what we wrote? Isnt it the other way round? We must store some checksum first, imagine I have never enable checksum but I decided now to enable it, consider I read block X into memory if that block X is not modified will Oracle still perform a checksum and make it dirty and writes to disk?

and block checking according to note.32969.1 this check is performed whenever Oracle MODIFIES a data block, in other words performed when there is a write but you suggest this feature ensures what we READ (not write) valid blocks. Isnt it the other way round too?

As you can see I am quite confused by several explanations with these two parameters and unfortunately I cant find much useful information on Metalink and the documentations. What I want to do is determine wheather my corruptions are

media or memory or software

and I think enabling these parameters it can help me to discard the causes since I am pretty sure we will have more corruptions.

My guess checksum will help me to discard media corruption and block checking memory or software (we have ECC memory so i dont think it's memory anyway though)

Thanks in advance


Tom Kyte
January 09, 2003 - 7:20 pm UTC

direct -- yes, you got that right.

on the checksum -- if db_block_checksum is on -- then yes, DBWR when it flushed the dirty block out will create a checksum.


ignore the term physically if that causes confusion. The surrounding text describes the process. It checks that the data on the block "makes sense"


cache recovery would be in the cache, it would not use the redo or anything like that.

that note seems to say the the _ parameter does the crashing -- not the events. I would avoid the use of events and _ parameters as the two "normal" routes will catch your issue.

as for the "read what we wrote" -- we:

a) compute checksum for dirty block
b) put checksum on block
c) write block to disk
d) later re-read block
e) recompute checksum and compare

That ensures what we READ is what we WROTE. If the checksums don't match -- we don't have what we wrote when we read it.

No, oracle will not do it to non-dirty blocks, we didn't WRITE them yet so the phrase "ensures what we read is what we wrote" is accurate (to me anyway).


These will not help you determine "media" or "software". They will just help you proactively catch it when it happens.

To see if it is media, get some destructive read/write testing software on there and beat the disks to death.



To me -- I read it totally backwards from you -- checksum failure would lead me to blame hardware, block checking -- software (but not much more then hardware really)

so let me resume

A reader, January 10, 2003 - 4:09 am UTC

Hi

Thanks for your reply.

So I try to make a small conclusion

Checksum makes sure we read what we wrote, helps to detect media corruption

Block checking makes sure we read and write consistent blocks from and to disk, helps to detect software or memory corruption

Correct :-?


Tom Kyte
January 10, 2003 - 7:28 am UTC

checksum makes sure we read what we wrote. If the checksum is different upon re-reading the block later, it points to an issue between Oracle and the filesystem. This could be hardware related (bad disk, bad cables, bad card) OR it could be software related (bad interaction between Oracle and file system drivers, logical volume manager, raid software, whatever)

block checking does a logical sanity check of the block. it would catch logical errors introduce by memory overwrites and such BUT it would also catch errors introduced by media corruption. It could be the disk is flipping a bit in a length field causing logical corruption of the block (corruption in the data structure of the block itself).

Neither points the finger clearly at one or the other.

I'll re-iterate. If you want to get to the bottom of this quickly -- move the datafiles from a disk that has experienced this corruption. Get your hardware vendor to supply some destructive low level read write testing software. Beat the disk to death and see if it passes or fails.

DB_BLOCK_CHECKSUM

Winston, October 21, 2004 - 8:47 pm UTC

Hi Tom,

Can you please take a look at </code> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96521/dfiles.htm#7647 <code>

It says: The default value of DB_BLOCK_CHECKSUM is FALSE

Caution:

Setting DB_BLOCK_CHECKSUM to TRUE can cause performance overhead. Set this parameter to TRUE only under the advice of Oracle Support personnel to diagnose data corruption problems.

HOwever I noticed DB_BLOCK_CHECKSUM of Oracle 9.2.0.1.0 on SunOS is default to TRUE.

Shall we change it to false?

Tom Kyte
October 22, 2004 - 3:59 pm UTC

I have and will continue to let it default. The decision was made to enable to be more proactive about catching corruption's introduced by any means sooner rather than later.

how about fixing error if block is bad

Ronald Chan, February 07, 2006 - 12:02 pm UTC

I received this error "An inconsistency was discoverd in the data dictionary when obtaining the necessary extent information, segment not found:file id:79,Block no:9562. Contact Oracle Customer Support"

this was a dictionary managed tablespace database version is 9206

I am very curious how can I fix this inconsistency.

Tom Kyte
February 08, 2006 - 1:33 am UTC

I would take the last sentence of the message into consideration?

how about fixing block checking failure

Ronald Chan, February 08, 2006 - 1:57 am UTC

Mr Kyte, your humor is very nice, actually I was very anxious to know how to fix that.

Tom Kyte
February 08, 2006 - 7:58 am UTC

I wasn't being humorous at all.

"Contact Oracle Customer Support"

You don't have a corrupt/bad block, you have an inconsistency in the data dictionary.

I was being dead serious when I said to take the last part of that error message into consideration.

Re: how about fixing block checking failure

Ronald Chan, February 08, 2006 - 10:46 pm UTC

Thank you very much for your reply, always.
I did that at the first place, but spent quite some time without any result, so I was thinking if you would perhaps point me to somewhere to start with.

Tom Kyte
February 10, 2006 - 10:43 am UTC

if you have this problem and you have - this is a serious problem, sort of a p2 at the very least, if not higher. This has to be resolved via support, there is quite simply no other path to take. Do you know how to escalate with support?

You can email me your tar number and I'll look at it to see if I believe it is being handled correctly.

db_block_checking for insert append ...

Laxman, July 20, 2009 - 8:29 am UTC

Hi Tom,

if we set db_block_checking = true, does the block checking happen for direct path inserts and nologging operaions, like alter table <t> move <ts> parallel?
How to make sure if block checking is happening?

Really appreciate your answer on this.

Thanks a lot !!
Tom Kyte
July 24, 2009 - 11:38 am UTC

block checking is to make sure things in memory are not corrupt, upon a subsequent modification of the block, it would be validated.

the direct path formats the block and writes it, block checking is more of an SGA thing

Different DB_BLOCK_CHECKING's settings

Sandro, August 17, 2009 - 12:46 pm UTC

Can you explain me different between 4 different values of DB_BLOCK_CHECKING?
* OFF(=FALSE)
* LOW
* MEDIUM
* FULL(=TRUE)
These settings are available from which version?

Thanks in advance.
Tom Kyte
August 24, 2009 - 8:53 am UTC

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/initparams039.htm#REFRN10029

all fully documented - also, you have access to the documentation for old releases available on OTN, you can see what what supported under which version by plugging db_block_checking into the search against each library.

Block cache protection

Bakunian, December 28, 2009 - 1:18 pm UTC

Tom,

How Oracle guarantees against in-memory block corruption, will SMON crash instance particularly in 11gR1 and 11gR2 if memory block considered corrupt? And is there a parameter that enforces that behavior besides of course db_block_checksum and db_block_cheking?

Is there place in the docs that describes this event I did not find it.

Thanks,




Tom Kyte
January 04, 2010 - 8:07 am UTC

http://www.oracle.com/pls/db112/search?word=db_block_checking&partno=


there is the newer db_ultra_safe parameter that encompasses a few of the checking parameters (one place shopping sort of)

http://docs.oracle.com/docs/cd/E11882_01/server.112/e10820/initparams066.htm#REFRN10295


If it is survivable, you'll just get an ora-600 (no instance crash) with the corrupt block - with db_block_checking, that happens in the SGA - so the block can probably be fixed with simple media/instance recovery.

Wait events

reader, May 03, 2012 - 12:34 am UTC

Tom,

Reference documentation states that overhead of these parameters on the system can be anything 1%-10%. What are the wait events I can possibly encounter in 10g-11g that associated with db_block_checksum, db_block_checking?

Perhaps specific latches that would be placed on the buffer while examining its consistency or calculating checksum. What would those be?

I wasn't being able to find any details on this subject.

Thank you.

Tom Kyte
May 03, 2012 - 8:49 am UTC

there is basically just increased cpu - you might see increased buffer busy waits as a side effect.

the way to find out - set up a benchmark, a simulation. run it exactly the same (use flashback database to put the database back in between test runs) way - once with and once without these settings.

get a comparison AWR report to see what, if anything, was different between the two. If you don't have access to AWR use statspack and compare the reports manually.