these two are quite confusing
A reader, January 09, 2003 - 5:42 pm UTC
Hi
Working with Oracle 8.1.7.4 and Suse Enterprise Server 7.
I have been doing some research about these two parameters forced by some block corruptions we have faced (5 corruptions in last month and we dont have a clue wheather it's physical or logical/software corruption). I visited Metalink, read Oracle 8i backup & Recovery book by Velpuri and lastly visiting your site.
To start with when you said DIRECT Loader I guess you mean direct writes bypassing the buffer cache am I correct?
Then quoting a setence from your reply:
"calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk.
Checksums are verified when a block is read only if this parameter is TRUE and the last write of the block stored a checksum."
I am confused by this sentence, do you mean when the block is read into buffer cache and it's becomes dirty (modified) when it written back to disk a checksum is written to the corresponding data block as well as the modified data? And next time when this block is read again by some other server process, this server process computes a checksum for the block and compared with the one stored in the block?
Regarding DB_BLOCK_CHECKING I understand it checks for data block's integrity but you said (quoting again)
"The block checking is useful to verify that a block is physically "correct""
However note.32969.1 states DB_BLOCK_CHECKING is a logical block checking feature. Also in same note it states that when a corrupt block is found Oracle tries to perform cache recovery and my question is what is cache recovery, if a data block is badly written how can you make it "good" again? By reading the redo log or what?
Then later in the note it states following: as of Oracle 8.1.6 onwards Oracle reports ORA-00607 if block recovery fails and mark the block soft corrupted however in note.77587.1 case 2 it says to perform sanity check in database set events
event = "10210 trace name context forever, level 10"
event = "10211 trace name context forever, level 10"
which consequently equals DB_BLOCK_CHECKING in Oracle 8i and when block corruption is found in cache prior to
writing the block to disk, it will signal an ORA-00600[3398] and will crash the database instance. My question is which note is correct, crashing the instance or marking the block soft corrupt if we set DB_BLOCK_CHECKING? (Or instance crash only happens when set events)
Finally I have to quote again
"The two parameters do different checks (checksum doesn't validate only makes sure what we wrote is what we read, checking doesn't ensure what we read is what we wrote but rather that what we read was a VALID block) at different times."
It seems to me
checksum only makes sure what we read is what we wrote, I mean we cant write non-dirty blocks to disk so how can checksum ensure what we read is what we wrote? Isnt it the other way round? We must store some checksum first, imagine I have never enable checksum but I decided now to enable it, consider I read block X into memory if that block X is not modified will Oracle still perform a checksum and make it dirty and writes to disk?
and block checking according to note.32969.1 this check is performed whenever Oracle MODIFIES a data block, in other words performed when there is a write but you suggest this feature ensures what we READ (not write) valid blocks. Isnt it the other way round too?
As you can see I am quite confused by several explanations with these two parameters and unfortunately I cant find much useful information on Metalink and the documentations. What I want to do is determine wheather my corruptions are
media or memory or software
and I think enabling these parameters it can help me to discard the causes since I am pretty sure we will have more corruptions.
My guess checksum will help me to discard media corruption and block checking memory or software (we have ECC memory so i dont think it's memory anyway though)
Thanks in advance
January 09, 2003 - 7:20 pm UTC
direct -- yes, you got that right.
on the checksum -- if db_block_checksum is on -- then yes, DBWR when it flushed the dirty block out will create a checksum.
ignore the term physically if that causes confusion. The surrounding text describes the process. It checks that the data on the block "makes sense"
cache recovery would be in the cache, it would not use the redo or anything like that.
that note seems to say the the _ parameter does the crashing -- not the events. I would avoid the use of events and _ parameters as the two "normal" routes will catch your issue.
as for the "read what we wrote" -- we:
a) compute checksum for dirty block
b) put checksum on block
c) write block to disk
d) later re-read block
e) recompute checksum and compare
That ensures what we READ is what we WROTE. If the checksums don't match -- we don't have what we wrote when we read it.
No, oracle will not do it to non-dirty blocks, we didn't WRITE them yet so the phrase "ensures what we read is what we wrote" is accurate (to me anyway).
These will not help you determine "media" or "software". They will just help you proactively catch it when it happens.
To see if it is media, get some destructive read/write testing software on there and beat the disks to death.
To me -- I read it totally backwards from you -- checksum failure would lead me to blame hardware, block checking -- software (but not much more then hardware really)
so let me resume
A reader, January 10, 2003 - 4:09 am UTC
Hi
Thanks for your reply.
So I try to make a small conclusion
Checksum makes sure we read what we wrote, helps to detect media corruption
Block checking makes sure we read and write consistent blocks from and to disk, helps to detect software or memory corruption
Correct :-?
January 10, 2003 - 7:28 am UTC
checksum makes sure we read what we wrote. If the checksum is different upon re-reading the block later, it points to an issue between Oracle and the filesystem. This could be hardware related (bad disk, bad cables, bad card) OR it could be software related (bad interaction between Oracle and file system drivers, logical volume manager, raid software, whatever)
block checking does a logical sanity check of the block. it would catch logical errors introduce by memory overwrites and such BUT it would also catch errors introduced by media corruption. It could be the disk is flipping a bit in a length field causing logical corruption of the block (corruption in the data structure of the block itself).
Neither points the finger clearly at one or the other.
I'll re-iterate. If you want to get to the bottom of this quickly -- move the datafiles from a disk that has experienced this corruption. Get your hardware vendor to supply some destructive low level read write testing software. Beat the disk to death and see if it passes or fails.
DB_BLOCK_CHECKSUM
Winston, October 21, 2004 - 8:47 pm UTC
Hi Tom,
Can you please take a look at </code>
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96521/dfiles.htm#7647 <code>
It says: The default value of DB_BLOCK_CHECKSUM is FALSE
Caution:
Setting DB_BLOCK_CHECKSUM to TRUE can cause performance overhead. Set this parameter to TRUE only under the advice of Oracle Support personnel to diagnose data corruption problems.
HOwever I noticed DB_BLOCK_CHECKSUM of Oracle 9.2.0.1.0 on SunOS is default to TRUE.
Shall we change it to false?
October 22, 2004 - 3:59 pm UTC
I have and will continue to let it default. The decision was made to enable to be more proactive about catching corruption's introduced by any means sooner rather than later.
how about fixing error if block is bad
Ronald Chan, February 07, 2006 - 12:02 pm UTC
I received this error "An inconsistency was discoverd in the data dictionary when obtaining the necessary extent information, segment not found:file id:79,Block no:9562. Contact Oracle Customer Support"
this was a dictionary managed tablespace database version is 9206
I am very curious how can I fix this inconsistency.
February 08, 2006 - 1:33 am UTC
I would take the last sentence of the message into consideration?
how about fixing block checking failure
Ronald Chan, February 08, 2006 - 1:57 am UTC
Mr Kyte, your humor is very nice, actually I was very anxious to know how to fix that.
February 08, 2006 - 7:58 am UTC
I wasn't being humorous at all.
"Contact Oracle Customer Support"
You don't have a corrupt/bad block, you have an inconsistency in the data dictionary.
I was being dead serious when I said to take the last part of that error message into consideration.
Re: how about fixing block checking failure
Ronald Chan, February 08, 2006 - 10:46 pm UTC
Thank you very much for your reply, always.
I did that at the first place, but spent quite some time without any result, so I was thinking if you would perhaps point me to somewhere to start with.
February 10, 2006 - 10:43 am UTC
if you have this problem and you have - this is a serious problem, sort of a p2 at the very least, if not higher. This has to be resolved via support, there is quite simply no other path to take. Do you know how to escalate with support?
You can email me your tar number and I'll look at it to see if I believe it is being handled correctly.
db_block_checking for insert append ...
Laxman, July 20, 2009 - 8:29 am UTC
Hi Tom,
if we set db_block_checking = true, does the block checking happen for direct path inserts and nologging operaions, like alter table <t> move <ts> parallel?
How to make sure if block checking is happening?
Really appreciate your answer on this.
Thanks a lot !!
July 24, 2009 - 11:38 am UTC
block checking is to make sure things in memory are not corrupt, upon a subsequent modification of the block, it would be validated.
the direct path formats the block and writes it, block checking is more of an SGA thing
Different DB_BLOCK_CHECKING's settings
Sandro, August 17, 2009 - 12:46 pm UTC
Can you explain me different between 4 different values of DB_BLOCK_CHECKING?
* OFF(=FALSE)
* LOW
* MEDIUM
* FULL(=TRUE)
These settings are available from which version?
Thanks in advance.
Block cache protection
Bakunian, December 28, 2009 - 1:18 pm UTC
Tom,
How Oracle guarantees against in-memory block corruption, will SMON crash instance particularly in 11gR1 and 11gR2 if memory block considered corrupt? And is there a parameter that enforces that behavior besides of course db_block_checksum and db_block_cheking?
Is there place in the docs that describes this event I did not find it.
Thanks,
Wait events
reader, May 03, 2012 - 12:34 am UTC
Tom,
Reference documentation states that overhead of these parameters on the system can be anything 1%-10%. What are the wait events I can possibly encounter in 10g-11g that associated with db_block_checksum, db_block_checking?
Perhaps specific latches that would be placed on the buffer while examining its consistency or calculating checksum. What would those be?
I wasn't being able to find any details on this subject.
Thank you.
May 03, 2012 - 8:49 am UTC
there is basically just increased cpu - you might see increased buffer busy waits as a side effect.
the way to find out - set up a benchmark, a simulation. run it exactly the same (use flashback database to put the database back in between test runs) way - once with and once without these settings.
get a comparison AWR report to see what, if anything, was different between the two. If you don't have access to AWR use statspack and compare the reports manually.