Prevention and Detection of database corruption

Question and Answer

Tom Kyte

Thanks for the question, Stephen.

Asked: August 08, 2000 - 3:48 pm UTC

Last updated: June 11, 2009 - 8:14 am UTC

Version: 8.1.5

Viewed 10K+ times

You Asked

Tom,

I run a 24x7 OLTP database that just cannot afford ANY data corruption. I am trying to proactively detect any possible data integrity/corruption issues before they hit our clients. Besides DBVerify and ANALYZE ... VALIDATE STRUCTURE for data block checks, what else can I do, and why? ANYTHING you can possibly think of to help me do early detection of problems would be a tremendous help.

and Tom said...

a full database export, followed by an import with show=y, is another method. it is fairly low impact and exercises:

o the logical integrity of the system tablespace (the dictionary)
o full scans of all tables -- it reads every table in the database

it's what I do (exp/imp). (i don't use analyze or dbv)
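for example, something along these lines (system/manager and the file names here are placeholders -- adjust for your system; show=y makes imp read the entire dump and list what it would do, without actually importing anything):

$ exp system/manager full=y file=full.dmp log=exp.log
$ imp system/manager full=y show=y file=full.dmp log=imp_show.log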


also, look at the init.ora parameters

db_block_checking
db_block_checksum

they do impact runtime performance so you'll need to evaluate them first.
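for example, in the init.ora (TRUE/FALSE are the values in this 8i timeframe; later releases accept values like TYPICAL/FULL and allow ALTER SYSTEM -- check your version's documentation):

db_block_checksum = true
db_block_checking = true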



Comments

Prevention and Detection of database corruption

Mark, January 29, 2003 - 6:08 pm UTC

Metalink Note 214369.1 explains in detail what EXP does and does not check for in terms of database corruption. Metalink Note 32969.1 explains the second option, block checksumming.


block corruption

vinodhps, June 14, 2003 - 10:07 am UTC

Hi Tom,

We are running a 24x7 OLTP system, and within the last month I have faced two block corruption problems in the same tablespace.
We are running in noarchivelog mode due to some technical problems. Last time I lost one day of crucial production data, though somehow I managed to recover it. Now I am having the same problem again, but so far there is no problem writing data to or reading data from that block; I only face the problem when I export the tables from that tablespace -- the export skips all the tables in it. I am using cron to execute the export in direct path mode, and I saw the

"error message in export.err file."

ORA-01578: ORACLE data block corrupted (file # 29, block # 28264)
ORA-01110: data file 29: '/od/pms/pmsd03.dbf'
EXP-00008: ORACLE error 1578 encountered
EXP-00067: Direct path can not export T_OIIS_ATP_TEST_CASE_DTLS which contains object or lob data.
EXP-00067: Direct path can not export T_OIIS_DOCUMENT_DTLS which contains object or lob data.
EXP-00067: Direct path can not export T_OIIS_DOCUMENT_HIST_DTLS which contains object or lob data.

----------------
"error messages in alertsid.log"
Corrupt block relative dba: 0x07406e68 file=29. blocknum=28264.
Fractured block found during user buffer read
Data in bad block - type:6. format:2. rdba:0x07406e68
last change scn:0x0000.06465372 seq:0x1 flg:0x00
consistency value in tail 0x00000000
check value in block header: 0x0, check value not calculated
spare1:0x0, spare2:0x0, spare3:0x0
Sat Jun 14 08:33:02 2003
Thread 1 advanced to log sequence 225615


--------------


That particular datafile is the most important one in our application; it contains all of the production data.

Currently there is no DBA here -- I am just new to the DBA job.

I am not sure what exactly I should do. Some of my friends suggested:
1) Copy pmsd03.dbf to some other place and change the path in the control file. But I don't think this will work.
2) Copy pmsd03.dbf to some other place, drop the existing file, then rename the copy and move it back into place. I suppose this doesn't work either.

3) What I think is to export the tables in that tablespace via conventional path, then:
a) drop the tablespace
b) re-create the tablespace
c) import the dump file back into the tablespace.

Of the three options above, which is the correct one? Please help, and if there is any other method to solve the problem, please share it.

Thanks in advance.
I should thank you -- you have helped me several times and your site has helped a lot. I am sure you will help solve this problem.

Tom Kyte
June 14, 2003 - 1:44 pm UTC

you just said two things, both of which cannot possibly be true at the same time:

thing 1) we are 24x7 OLTP
thing 2) we are in noarchivelog mode.

there are no technical problems with running archivelog mode. You cannot be 24x7 in noarchivelog. You can be in development, but you cannot be production.

thing 3) to back up the fact that thing 1 is the farthest thing from the truth is "there is no DBA"

I think you need to get in touch with support and develop a list of things to try. You are between a rock and an incredibly hard place. I am not going to give you steps to try, as you could easily do tons more harm than good right now.



Block corruption

vinodhps, June 16, 2003 - 4:23 am UTC

Hi Tom,
Yes, it's true...

It's not really 24x7, as our server hardware cannot support archiving, so we take a full backup every Sunday during a 3-hour downtime window, and every day we take 3 full exports. We have a new server (HP rp7410, HP-UX) and are just waiting to migrate; in 1 or 2 weeks we will move to Oracle 9i, and at that time we will use archivelog mode.


The above problem was with one table; it was identified, dropped, and imported from my old export. Fortunately that table is not an important one and its data flow is very low.

My worry is: suppose this same problem comes back again before we migrate to the new server, but with a couple of tables -- it would be dangerous to lose that data.

I can't do a full export and import due to the downtime it requires.


1) Why does this block corruption happen? Is there any way to avoid it?
2) Is this an OS block corruption or an Oracle data block corruption?

Can you kindly explain this, Tom?
Thanks

Tom Kyte
June 16, 2003 - 8:09 am UTC

exports ain't backups.

this looks like a hardware related issue if you ask me. you can turn on block checking (performance impact) to proactively find these issues faster. A fractured block is when the head of the block is newer or older than the tail. Like we asked the OS "please write this 4k of data" and the OS only wrote 2k of it.

check your system logs, see if you have spurious IO errors reported.

Block corruption

vinodhps, June 16, 2003 - 8:35 am UTC

Hi Tom ,
Thanks for your immediate response. Well, it's not a hardware problem -- I checked with the hardware engineer. He checked for bad blocks and for spurious IO errors, and there is nothing like that. So what could the actual cause of this block corruption be?

Can anyone corrupt a block deliberately, something like that? Is it possible that some kind of transaction could corrupt the block?

What would be the main reason for the block corruption?

Thank you Tom,

Tom Kyte
June 16, 2003 - 9:00 am UTC

I still believe it to be "hardware related" but you can enable block checking to catch it at the point of failure.

did they do destructive low level read write tests? if not, you have not yet ruled out anything.




Block corruption

vinodhps, June 17, 2003 - 12:11 am UTC

Hi Tom,
I once again checked with the hardware in-charge about that problem; he said the server does not have any problem.

Can you tell me how to enable block checking, Tom? Is there any reference in your book (Expert One-on-One)?

Thank you,

Tom Kyte
June 17, 2003 - 7:24 am UTC

search for

db_block_checksum

on this site.

A reader, January 23, 2004 - 12:48 am UTC

Tom,

I am in the process of testing backup and recovery using RMAN. I vaguely remember you mentioning that it's difficult to intentionally corrupt a data block.

Can you please give an update on that, as I want to test block-level recovery.

Thanks.

Tom Kyte
January 23, 2004 - 7:27 am UTC

you can write a program to corrupt blocks. you would just open an oracle datafile, seek somewhere in the file, write junk and exit.

I'm not posting such a program -- but it is only a couple of lines of code.

On Windows platforms simply use Notepad to corrupt database files

Lars Bo Vanting, January 26, 2004 - 6:40 am UTC

Hi!

On Windows platforms I simply use NOTEPAD.EXE to open e.g. a tablespace datafile, modify the contents and save it back. This corrupts your db file (careful)!

Tom, I saw you at the Miracle Master Class in Copenhagen - simply fabulous how you share your knowledge with others!

Cheers
Lars Bo

Prevention/Detection of database corruption

Bernice, September 23, 2004 - 12:30 pm UTC

Hi Tom,

In your original answer to this thread, you said you only use exp/imp as the means for detection of db corruption. I have read a bunch of articles from Metalink that discuss using exp/imp, dbverify, and analyze table ... validate structure.

My question is ... why do you only use exp/imp as a test? Doesn't that only touch blocks below the HWM? What if a corruption is present in a block above the HWM? Would oracle barf when it comes time to use that block? Or will the formatting clean up that corruption?

thank you!


Tom Kyte
September 24, 2004 - 9:05 am UTC

if I use exp/imp -- it is only to exercise the database's dictionary - and with the advent of RMAN in 9i -- and all of the cool things it can do -- we don't really even need to do that anymore (given the logical block checking it does).

(who cares about corruption above the HWM -- all blocks above the HWM are by definition "corrupt", we haven't touched them yet)....




How do I corrupt a block

sachin, November 01, 2004 - 9:55 am UTC

Hi tom,
How do I corrupt a block? I want to learn more -- I want to practice recovering a corrupt block on my test server.
One day I got the chance to recover from block corruption on a production database, but the business asked me to restore using the last export backup. I'm managing 14 databases on Solaris, Windows, Linux, and Novell NetWare. Before facing block corruption on a production database again, I want to learn how to recover from it.
I know the BBED utility can corrupt a block, but I came to know that I need a password to use this utility.
So, any other ideas for corrupting a block?

Thanks in advance.

Tom Kyte
November 01, 2004 - 1:45 pm UTC

$ man dd

if you are on unix, else find some binary editor on windows.....
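for example, something like this overwrites one 8k block of a datafile with zeroes (the path and block offset are made up -- ONLY ever do this to a disposable test database, with the database shut down):

$ dd if=/dev/zero of=/u01/oradata/test/users01.dbf bs=8192 seek=100 count=1 conv=notrunc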

Oracle does it by itself (how to get a corrupted block)

Alex, November 02, 2004 - 10:05 am UTC

Hello Tom and Sachin

Oracle does it by itself, with your help

Windows
rdbms 8.1.7.4
1. Create tiny actDB (archivelog mode)
2. Create standby database  : 
 -- on actDB
 alter database create standby controlfile as '...';
 --copy datafiles to stbServer,mount stbDB ...
3. Populate actDB & stbDB

CREATE TABLE partitioned_table ( timestamp date,  id        int )
PARTITION BY RANGE (timestamp)
(PARTITION part_2002 VALUES LESS THAN ( to_date('01-jan-2003','dd-mon-yyyy') ) ,
PARTITION part_2003 VALUES LESS THAN  ( to_date('01-jan-2004','dd-mon-yyyy') ) ,
PARTITION the_rest VALUES LESS THAN ( maxvalue ));

insert into partitioned_table partition(part_2002)
select to_date('31-dec-2002')-mod(rownum,360), object_id
from all_objects;
insert  into partitioned_table partition(part_2003)
select to_date('31-dec-2003')-mod(rownum,360), object_id
from all_objects;
insert  into partitioned_table partition(the_rest)
select to_date('31-dec-2004')-mod(rownum,360), object_id
from all_objects;
commit;

create global temporary table INVALID_ROWS (
  owner_name         varchar2(30),
  table_name         varchar2(30),
  partition_name     varchar2(30),
  subpartition_name  varchar2(30),
  head_rowid         rowid,
  analyze_timestamp  date
) on commit preserve rows;

create index local_prefixed on partitioned_table (timestamp) local logging;

analyze table PARTITIONED_TABLE validate structure cascade into invalid_rows;

select table_name,partition_name,num_rows,last_analyzed,logging from user_tab_partitions
where table_name='PARTITIONED_TABLE';

select index_name,partition_name,num_rows,last_analyzed,logging from user_ind_partitions
where index_name='LOCAL_PREFIXED';

SELECT d.name, d.unrecoverable_change#,t.name  tablespace FROM v$datafile d, v$tablespace t
where d.unrecoverable_change# > 0
 and d.ts#=t.ts#;

alter system switch logfile;
  -- -- on the other hand (stbDB): recover before this point
  -- recover standby database;
  --auto
  --alter database open read only;
  --analyze table PARTITIONED_TABLE validate structure cascade into invalid_rows;
  -- --SQL> Table analyzed !!!
  -- shutdown immediate;
  -- startup nomount
  -- alter database mount standby database;
  -- -- we have correct 2 DB - act & stb in this point 

--next on actDB: enter a DBA working with the defaults
-- he knows that all tables & indexes are in LOGGING mode

-- error: DEFAULT rebuild = corrupted block on stbDB
alter index local_prefixed rebuild partition the_rest;

alter system switch logfile;
  -- -- on the other hand (stbDB): recover before this point
  -- recover standby database;
  --auto
  --alter database open read only;
  --analyze table PARTITIONED_TABLE validate structure cascade into invalid_rows;
  -- --SQL> ......corrupted block

-- error: ONLINE rebuild = the same corrupted block on stbDB
--alter index local_prefixed rebuild partition the_rest online;

4. if your 24x7 actDB is dead and you have to do an emergency changeover to stbDB, where is the
    guarantee that on stbDB (after activation) you can get a response to:
    select * from partitioned_table where timestamp > sysdate - 1;

-- OK!
--alter index local_prefixed rebuild partition the_rest logging;
--drop table partitioned_table;
--drop table invalid_rows;
 

concept

reader, December 25, 2004 - 8:24 pm UTC

Merry Christmas Tom.

Conceptually, do dbverify, analyze, export, and db_block_checking=TRUE all do the same kind of block check? Is there any difference in when one should use each method to check for block corruption? Thanks.

Tom Kyte
December 26, 2004 - 12:24 pm UTC

no, not even close.

export just reads blocks, exercises them -- doesn't "check the blocks". in particular, export would never pick up on the fact that there is a row in a table that is not pointed to by an index, or that there are index entries that point to non-existent rows.

But Analyze validate structure cascade -- it would..

But DBV would not...

and none of the above would detect perhaps a bit flip that a checksum would...
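the analyze mentioned above is simply this (t is a generic table name; cascade makes it cross-check the table against every one of its indexes):

analyze table t validate structure cascade;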

About detecting file corruption....

Michael, January 04, 2005 - 11:40 am UTC

Regarding the above discussion: Long ago, I had a V 7.3 database that had a corrupt datafile, and it stayed up running for almost a week without the corruption being detected. This was not an important database, and it was only being backed up with a weekly full export and a weekend cold backup. I detected the file corruption while doing the export. People asked at that time how the database could stay up for days (or even longer) with a corrupt datafile, and I said something to the effect that as long as nobody accessed the particular blocks affected by the corruption in that file, Oracle would not throw an error. In fact, we could access certain tables that were in that file and not others (the tables were in a specific tablespace supported only by the corrupt datafile).

Just today, I was asked a similar question about an Oracle 8.0.x database with Filenet running against it. (I am not responsible for that database but am trying to help.) They apparently had a corrupt datafile for some days and were totally unaware of it. Then they lost their disk array, which they rebuilt and then tried to recover the database from a December cold backup, which is when they hit the error about the bad file. They were wondering why Oracle did not detect the corruption during that period. I gave an answer like the one above. I cannot recall when Oracle's corruption detection parameters were first rolled out, but if it was in 8.0.x (I think it was later) they were not using them. I pointed them to a few metalink bulletins (198640.1, 94114.1, 230829.1) to give them a better feel for what they need to do. They do not have a real DBA and I cannot access their system. Do you know of any good bulletins or papers that deal with trying to recover a datafile other than the ones I referenced above? You seem to have a knack for giving the best references to documents that always are on target for the issue at hand.

Tom Kyte
January 05, 2005 - 8:39 am UTC

If you use RMAN to backup, you'll detect corruption right then and there -- as you backup.

If you use "cp", "tar", "xcopy", "ocopy", "copy", "whatever" -- you won't detect corruption until you actually touch the data which could be months AFTER it was introduced.


RMAN, knowing that it is reading an Oracle datafile, looks at each block and says "yup, this one is ok" or "hey -- this is bad, we need to correct this". Since you have to have AT LEAST -- AT A VERY VERY VERY SMALL MINIMUM, two backups ready to restore with -- you can fix that bad block with the good block in your other backup...

That aside, if you are hitting corruption right now and trying to recover -- your absolute BEST bet is "file a tar" and have support walk you through all of the options. It depends on how wide spread the problem is, the segment it is in, what other backups you might have and so on.
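the block-level fix mentioned above looks something like this in 9i RMAN (the file and block numbers come from the error message or from v$database_block_corruption; the ones here are just taken from the example earlier in this thread):

RMAN> blockrecover datafile 29 block 28264;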

"destructive low level read write tests" and propagation to standby

naresh, August 30, 2005 - 3:32 am UTC

Hello Tom,

1)
You mentioned above whether the HW team performed "destructive low level read write tests" - what would these be (conceptually)?

2)
My other question is whether block corruption propagates through the archivelog (or maybe the redo log in 10g) to a standby database?



Tom Kyte
August 30, 2005 - 5:03 am UTC

1) it would be a program that writes a known set of bytes to the disk, then reads them back and verifies they are the same.

It is "destructive" in that it overwrites data that already exists there, it is a last ditch test.

2) depends on the nature of the corruption, but in general "no".
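a conceptual sketch of the destructive test described in 1), in shell (the raw device name is made up; this DESTROYS whatever is on that device):

# write a known 64mb pattern, read it back, compare -- destroys existing data!
$ dd if=/dev/urandom of=/tmp/pattern.bin bs=1024k count=64
$ dd if=/tmp/pattern.bin of=/dev/rdsk/c0t1d0s0 bs=1024k
$ dd if=/dev/rdsk/c0t1d0s0 bs=1024k count=64 | cmp - /tmp/pattern.bin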


What does RMAN check?

naresh, August 30, 2005 - 3:35 am UTC

Hello Tom,

In one of your responses:

****

export just reads blocks, exercises them -- doesn't "check the blocks". in
particular, export would never pick up on the fact that there is a row in a
table that is not pointed to by an index, or that there are index entries that
point to non-existent rows.

But Analyze validate structure cascade -- it would..

But DBV would not...

and none of the above would detect perhaps a bit flip that a checksum would...

*****

Does RMAN detect all of the above?
-index<>table inconsistency
-checksum for bit flip

Thanks,
Naresh

Tom Kyte
August 30, 2005 - 7:07 am UTC

index/table - no
checksum, that the block is OK, yes.

Block Corruption

Deepak, September 09, 2005 - 12:29 am UTC

Hi Tom,

While running a bulk update in my 9.0.1 database (Win XP Pro), my instance terminated automatically. When I looked at the alert.log I found that there was a block corruption in one of the data files. After that I checked for the block corruption using the DBV (DB Verify) utility, but the statistics showed no block corruption errors.

Please help me in troubleshooting the problem. Also please do let me know the detailed steps. I neither have backup of the data files nor the export dump. How to tackle the situation?

Tom Kyte
September 09, 2005 - 7:03 am UTC

support is the first place to call in a situation like this, they'll gather the required information and help you figure out your options.

hopefully this is a play/test system, it must be - no backups?

follow-up : RMAN and checksum

Bernice, October 15, 2005 - 3:15 pm UTC

Hi Tom,

We just moved our database files over to new hardware and I am checking to ensure no corruptions have been introduced.

I am interested in a follow-up question to Naresh's question (above) where this was asked --
"Does RMAN detect all of the above?
-index<>table inconsistency
-checksum for bit flip"

and you replied "checksum, that the block is OK, yes."

Since these are 8i databases, I decided not to use RMAN to back them up (have been using regular OS backups).

I know I can use this command:
backup validate check logical database;
to perform the check in RMAN.

But in order for the checksum to work, wouldn't I have had to set "db_block_checksum=true" in the init.ora so that oracle would start putting the checksum value in the block header? Without a previous checksum value, what would rman use to compare to see if a bit did flip?

thanks,
Bernice

Tom Kyte
October 15, 2005 - 3:45 pm UTC

it wouldn't - it would do the "logical check" to ensure the blocks made "sense" but it would not be able to verify a non-existent checksum was still ok.


at the os level, you could always just "sum" the files to verify they match the source system you copied from.

$ man sum
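for example, run it on both hosts and compare the checksums (the paths are made up; the files must not be changing while you sum them, so do it with the database closed):

old$ sum /u01/oradata/prod/system01.dbf
new$ sum /u02/oradata/prod/system01.dbf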

very helpful!

Bernice, October 16, 2005 - 12:47 pm UTC

Hi Tom,

Thanks! "sum" worked perfectly.

B


RMAN check logical....

Ram, October 23, 2005 - 1:29 am UTC

(1) Logical corruption: I thought RMAN would verify whether an index rowid points to the correct row or not. Is this right? If not, what does a logical check mean with RMAN?

(2) When RMAN checks for physical corruption, what does it do?

Thanks.

Tom Kyte
October 23, 2005 - 4:49 am UTC

1) rman does not perform cross segment verification. It does not need access to a table's datafile in order to back up the index's datafile.

the logical check means it looks at the block (the database block) and verifies the data on it is not scrambled up.

2) it tries to read the data - it either succeeds or fails doing so.

logical check

Reader, October 23, 2005 - 2:27 pm UTC

When you say RMAN does a logical check, does that mean it compares the header and footer of the Oracle database block? Thanks.

Does a check for physical corruption mean it reads the data inside the block? I am trying to understand the difference between logical and physical block corruption and how RMAN helps us find them. Thanks.

Tom Kyte
October 23, 2005 - 2:46 pm UTC

header - footer - that is one of the checks, yes (it is looking for fractured blocks and when it hits one, it'll re-read it, that is why there is no need for "alter tablespace begin backup" with rman)

physical corruption is "we cannot read the block from disk, something is physically preventing us from doing so".

ORA-01499

Bernhard, October 28, 2005 - 4:11 am UTC

Yesterday I had an issue with ORA-01499.

I noticed that 2 queries against a table with an index
returned inconsistent results, so I did

SQL> analyze table audit_stage validate structure cascade online;
analyze table audit_stage validate structure cascade online
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file 

so I re-created this table with all indexes, shut down the DB, and
ran dbv against every datafile.
Also, I turned on db_block_checking=true

Question:
I would like Oracle to raise an ORA-01499 not only when I do a "validate structure" but also when it notices the problem during a query.
Is there any parameter for that?


 

Tom Kyte
October 28, 2005 - 12:57 pm UTC

well, the problem usually is that there would be rows in the table the index does not reference.

If we full scan the table, we'd never know that (we don't use the index)

and if we use the index, well, we'd never know that either since we sort of use the index to get to the table (hence we don't see that other row).


I'd rather try to figure out why this happened and make sure I was patched against it happening ever again.

Using both db_block_checksum and db_block_checking

James Valenti, May 15, 2007 - 6:09 pm UTC

Tom,

Can you provide further guidance on the use of DB_BLOCK_CHECKSUM and DB_BLOCK_CHECKING?
1. If I have DB_BLOCK_CHECKSUM = FULL is it overkill to also have DB_BLOCK_CHECKING = FULL?
2. If I have DB_BLOCK_CHECKSUM = FULL would DB_BLOCK_CHECKING run faster?
3. If I use just one of these checks which one give the most protection against corruption?
4. Do you recommend using one or both of these parameters?
Tom Kyte
May 16, 2007 - 10:22 am UTC

checksum detects IO errors
block checking detects/prevents memory errors

they serve two similar but different purposes.

if a block is 'memory corrupted', we'll gladly compute a checksum for it and make sure the corrupt block is read in corrupt.


use both if you

a) want to
b) can

Bhaskar BV, April 23, 2008 - 4:20 am UTC

Hi Tom,

Will corruption at the primary site be replicated to the standby site? What options are
available to protect the standby from such a disaster?

Thanks & Regards
Bhaskar
Tom Kyte
April 28, 2008 - 9:14 am UTC

logical corruption, like you are introducing into this web site by asking the SAME EXACT QUESTION OVER AND OVER - will be copied.

look at the other place you asked this to get the formal answer.

Force Logging

JATIN, May 13, 2009 - 5:54 am UTC

Hi Tom

We had put our LIVE database in force logging mode after we faced soft block corruption on our QA database (which was copied from LIVE with recovery applied). However, this time we again faced soft block corruption after applying one week of archivelog files...

Now that we have been able to recover from the situation by using some exports from LIVE, I am wondering what other factors could have introduced this corruption. Is there a way to bypass force logging too?

Thanks for shedding some light on this.
Tom Kyte
May 13, 2009 - 2:01 pm UTC

define "soft corruption" for me in the context you use it.

Force Logging

JATIN, May 13, 2009 - 8:41 am UTC

In continuation of the above: can db file sequential read or control file sequential read be the culprits that muck up the blocks? In my case I found these wait events to be abnormally high, so I am thinking along those lines.

Please advise.
Tom Kyte
May 13, 2009 - 2:16 pm UTC

no, they cannot.

you are thinking along the wrong lines. A read - a *read* - would not 'muck up' anything.

you need to define your term, what is a "soft corruption" to you. 99.9999% of the times there is corruption - it is hardware related, the disks done did it to you.

Soft Corruption

JATIN, May 14, 2009 - 2:29 am UTC

Well, in cases where the database was running without force logging and we did some nologging operation, followed by a recovery using the archivelog files, we faced soft corruption (a term coined for me by Metalink).

However, in this case the DB was in force logging mode and still (on copying the files and applying archivelog files) I am facing corruption (which looks similar to the previous case - that's why I called it soft corruption). An RMAN "backup validate check logical database" gives the following in v$database_block_corruption:


SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         1     140090          8         8.8956E+12 LOGICAL
         1     151098          8         8.8956E+12 LOGICAL
         1     151290          8         8.8954E+12 LOGICAL
         3        844         63         8.8956E+12 LOGICAL
         3       7152         63         8.8956E+12 LOGICAL

.. and so on (around 2000 lines)

Can you please suggest where I should look for the reason for this LOGICAL corruption (or soft corruption?). Could this be related to disk (wouldn't that be physical corruption?)? I do not want it to reappear, so I want to nail it down here.

Please advise.

Thanks in anticipation.

Tom Kyte
May 14, 2009 - 10:38 am UTC

... Well, in cases where the database was running without force logging and we did some nologging operation, followed by a recovery using the archivelog files, we faced soft corruption (a term coined for me by Metalink). ...

doh, you did yourself a very bad thing there.

You did nologging operations - which destroys your ability to do media recovery on that segment.

... am facing corruption (which looks similar to the previous case - that's why I called it soft corruption) ...


it is not similar at all - the first one was SELF INFLICTED. You did a direct path operation on a block and did not generate redo for it (nothing in the archives). Later, before you backed up this file with the non-logged operation - you restored a backup of it - and this backup was taken BEFORE the direct path operation. We rolled forward through the archives and when we were done we said "oh no, this block that was direct pathed without redo - we don't know what belongs there - we don't have a backup of it after the non-logged change and we have no redo - that segment is corrupt, those blocks we are MARKING SOFT CORRUPT, that data is lost"




Your current report doesn't mean that the blocks are newly corrupt - unless you FIXED the original problem, they'll stay that way. The first row for example, does it actually map to a segment??? (why are you doing direct path operations in SYSTEM anyway???)

In other words, v$database_block_corruption will not 'reset' until you perform block media recovery (which you could not have), restore datafiles (that is the original problem, they had soft corrupt blocks in them), recover via incremental backups (same problem), or block newing occurs

Of all of them - block newing is the one that would/could apply to you - when those blocks are reused by some segment - and formatted - they'll disappear.

Until then, they are the blocks you messed up the first time around.

JATIN, May 18, 2009 - 12:44 am UTC

Tom,

I understood the mistake in the first scenario. However, for the second case, I am still not sure what inflicted the corruption.

To elaborate: the second scenario is not at all generated from the first one. We completely destroyed the QA env (which had the soft corruption earlier) and replaced all files with LIVE data (which never had corruption) - thereafter applied recovery using archivelogs (from LIVE). During all this, LIVE was ALWAYS in force logging mode; however, I am not sure whether any nologging operation was performed or not (as it is the application developers' job to load/unload data here).

I am more concerned about what could have inflicted corruption in QA when I was running my LIVE system in force logging mode, meaning we were forcing all changes into the archivelogs this time.

As for your comment "v$database_block_corruption will not 'reset' until you perform block media recovery" - this view must have reset, as I copied all dbf's and archivelogs from LIVE (where this view contains no corruption).

Also, I am not clear on "block newing is the one that would/could apply to you - when those blocks are reused by some segment - and formatted - they'll disappear" - could you elaborate, please? Currently no one is able to view any data in the tables with corruption, so what do you mean by newing -- do you mean updates/inserts on the corrupted blocks?

Thanks a lot.

Tom Kyte
May 23, 2009 - 11:07 am UTC

did you see that I said "it is not corrupt"


... Your current report doesn't mean that the blocks are corrupt, unless you FIXED the original problem, they'll stay that way....



until you reuse that space or get it out of your database, they will be that way. That is what I said.

block "newing" is reusing a block, we format it again.


did you consider answering my question?

The first row for example, does it actually map to a segment??? (why are you doing direct path operations in SYSTEM anyway???)

Please Clarify..

JATIN, May 26, 2009 - 2:14 am UTC

Hi Tom,

Here's the mapping:

select owner, segment_name, segment_type
from dba_extents  
where file_id=1
and 140090 between block_id and block_id + blocks - 1;  

OWNER SEGMENT_NAME              SEGMENT_TYPE                            
----- ----------------------  ------------
SYS   DBA_FREE_SPACE_DFSPACE  TABLE

  1  select owner, segment_name, segment_type
  2  from dba_extents
  3  where file_id=1
  4* and 151098 between block_id and block_id + blocks - 1
SQL> /

no rows selected


  1  select owner, segment_name, segment_type
  2  from dba_extents
  3  where file_id=1
  4* and 151290 between block_id and block_id + blocks - 1
SQL> /

no rows selected

Not sure if I am getting you right -- I have unmounted the volume group hosting the db file system and refreshed the data (and binaries) from LIVE. Doesn't that remove the v$database_block_corruption entries and the supposed corruption (if it was there)?

I still have a few unanswered questions here:

1. What is the reason for the corruption I am facing in my TEST when my LIVE database was running in "force logging" mode? Is it to do with the disk (I don't think so)?

2. How will block formatting happen? Will it happen when we insert some data into these blocks?

3. Why do you say it is not corruption?

Thanks for your patience.

Tom Kyte
May 26, 2009 - 9:58 am UTC

what have you done? what is this object DBA_FREE_SPACE_DFSPACE, why is it owned by sys, why is it in system? WHAT ARE YOU DOING????


the last two queries show those two blocks are NOT used by any segment - they are marked soft corrupt because they HAD been marked that way and HAVE NOT been reused yet.

And if you were to dump the other block - undoubtedly you would find it is above the high water mark for that table (that table that should not exist, does not belong, is not part of the sys schema) and hence hasn't been reused yet. It is in that segment but NOT USED - it hasn't been "newed" yet, it hasn't been reused.


your questions have all been answered many times over.

1) you do not have corruption
2) when we reuse the block, until we reuse it - it is the way it is.
3) because they are not corrupt.

Data file corruption

Ankit, May 26, 2009 - 6:35 am UTC

Hi Tom,

Our dbv script (which runs daily) identified corruption in a datafile of SYSAUX and in 3 other datafiles of a user-created tablespace.

My colleague restored the database to an earlier point in time, as it was a dev box.

But how should one tackle this without data loss (as in production)?

I know that using RMAN we can do block-level recovery, but we use SQL-BackTrack to do the backups; the DB runs on 10.2.0.4.


Tom Kyte
May 26, 2009 - 10:00 am UTC

sigh....

do you run dbv against a closed, shutdown normal database?

you are doing lots of unnecessary stuff here.


You almost certainly had NO CORRUPTION at all, you probably ran dbv against an open database or a shutdown abort database or a hot backup.

Ankit, May 28, 2009 - 6:16 am UTC

Actually there was a filesystem issue due to which the box went down (looks like a shutdown abort), after which the dbv script (which runs in the morning) reported corruption in 4 datafiles, one of which belonged to the SYSAUX tablespace.

What do you suggest we should have done about this?


Tom Kyte
May 28, 2009 - 7:34 am UTC

ignore the dbv output - if you run dbv against influx files, of course it'll say "something looks amiss"

they were not corrupt, as suspected. You would have fixed them by typing "startup" - instance recovery would have 'fixed' them.

Ankit, May 28, 2009 - 6:17 am UTC

Also, dbv runs against an open database in the morning; the backup is done at night.

DBV against hot backup and open database

pinson, May 29, 2009 - 3:02 am UTC

Hi Tom,

Previously you wrote :

>> sigh....

>> do you run dbv against a closed, shutdown normal database?

>>you are doing lots of unnecessary stuff here.


>>You almost certainly had NO CORRUPTION at all, you >>probably ran dbv against an open database or a shutdown >>abort database or a hot backup.

I have to recommend a strategy for detecting database corruption on our numerous worldwide sites, which are running 24x7 databases and are doing hot backups. So I opened an SR with Oracle Support to ask them about DBV running on hot backups or an opened database. Their answer was:
Hi,

DBV can be run in both cases.
Please see note 35512.1 for the restrictions.

Thomas

So who should I believe? You or Oracle Support?

Oracle version we use is 10g (10.2.0.2.0 and 10.2.0.4.0)

Thanks a lot for the response.


Tom Kyte
May 29, 2009 - 8:34 am UTC

RMAN would accomplish that for you already.

You can use DBV against an open database in 10gr2 - but as I've tried explaining over and over here - you didn't have any corruption, you need to "new" those blocks (use them again) and until you do - you'll keep on getting them reported.


I would suggest you just use rman, it does block integrity checking when it backs up and that way you can just skip that entire dbv process. Validate your backups.
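for example (10g rman syntax -- the first command checks the live datafiles, the second reads back your existing backups to prove they are restorable):

RMAN> backup validate check logical database;
RMAN> restore database validate;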


you should believe neither of us carte blanche

RMAN, yes, but...

pinson, May 29, 2009 - 10:34 am UTC

Thanks Tom.

I am very embarrassed, as I restored on our machine a backup received from one of our sites which had corrupted datafiles (we worked together with Oracle Support on that case; finally we couldn't save the database and had to restore from a backup), and RMAN did not detect any corruption, while dbv did - when run against the open database, the normally closed database, and the backup files (and it was a true corruption, as Oracle Support finally gave up and asked us to restore from backup).

Thank you anyway for answering so fast. As you suggest using RMAN, I will not insist about dbv anymore; still, I would have liked to know whether running it against a hot backup is reliable, which I think it is not, because fractured blocks are reconstructed on restore from the archived redo logs created during the hot backup. Your export show=y method seems to be the best way to do this.

I always learn so much from your site, and it helps a lot in my work. As you always say, one can learn something every day, even after working with Oracle for many years. Even after 22 years of Oracle practice, I still learn new things every day.

For the carte blanche, thanks, I will keep it in mind...
Tom Kyte
June 01, 2009 - 1:54 pm UTC

and RMAN did not detect any corruption, while dbv did, ....

that would mean that the corruption happened AFTER the backup and before the DBV - else your restore would have failed. So, the next time you backed up - you would have been told "corrupt" and you would have corrected it then.



You never have fractured blocks in an RMAN backup, RMAN doesn't require files to be in backup mode (which would permit fractured blocks). The use of RMAN actually prevents fractured blocks from appearing in the backup set!


Understood, but...

pinson, June 02, 2009 - 4:51 am UTC

The history was :

1. Take a HOT OS backup
2. Restore it (it was successful: the corruption appears only when the applications touch the faulty block)
3. Run dbv against the successful restore --> corruption detected in a datafile
4. Run RMAN "backup validate" on the successful restore --> no corruption detected
5. Take an RMAN backup
6. Verify the backup with RMAN --> no corruption detected

When I was talking about fractured blocks, I was talking about classic hot OS backups using "alter tablespace ... begin backup" statements. I know that with RMAN there are no fractured blocks, as you confirmed.

Thank you Tom for your answers; they were most useful for us.
Tom Kyte
June 02, 2009 - 7:40 am UTC


All over again

Ankit, June 04, 2009 - 8:58 am UTC

Tom,

I will try to present the situation I faced again.

We had an OS issue due to which the box went down (at night).

Then the SA got the system back up.

The database came up fine.

The scheduled dbv script, which validates all datafiles, showed corruption in a SYSAUX datafile and in 3 other datafiles of a user-created tablespace.


Following is the dbv output for SYSAUX Datafile >>

DBVERIFY - Verification starting : FILE = /bto/sys/data/ora01/sysaux01.dbf
Page 1967 is marked corrupt
Corrupt block relative dba: 0x040007af (file 16, block 1967)
Bad header found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000020
last change scn: 0x0044.28decb0a seq: 0x7 flg: 0x9d
spare1: 0xa2 spare2: 0xe spare3: 0x10
consistency value in tail: 0x786d031f
check value in block header: 0x0
computed block checksum: 0x0

Page 1972 is marked corrupt
Corrupt block relative dba: 0x040007b4 (file 16, block 1972)
Bad header found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000030
last change scn: 0x0088.28decb0a seq: 0xa1 flg: 0xe3
spare1: 0xa2 spare2: 0xe spare3: 0x3a12
consistency value in tail: 0x00800cbe
check value in block header: 0x715
block checksum disabled

Page 1973 is marked corrupt
Corrupt block relative dba: 0x040007b5 (file 16, block 1973)
Bad header found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000040
last change scn: 0x0074.28decb0a seq: 0xdf flg: 0xb0
spare1: 0xa2 spare2: 0xe spare3: 0x2600
consistency value in tail: 0x00641a72
check value in block header: 0xa961
block checksum disabled

Page 4135 is marked corrupt
Corrupt block relative dba: 0x04001027 (file 16, block 4135)
Bad header found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000010
last change scn: 0x0000.28dee122 seq: 0x5 flg: 0xd0
spare1: 0xa2 spare2: 0x15 spare3: 0x0
consistency value in tail: 0x00000001
check value in block header: 0x979
block checksum disabled

Page 63590 is marked corrupt
Corrupt block relative dba: 0x0400f866 (file 16, block 63590)
Bad header found during dbv:
Data in bad block:
type: 0 format: 0 rdba: 0x00000000
last change scn: 0x0000.00000000 seq: 0x0 flg: 0x00
spare1: 0xd4 spare2: 0xa8 spare3: 0x0
consistency value in tail: 0x00000000
check value in block header: 0x0
block checksum disabled



DBVERIFY - Verification complete

Total Pages Examined : 64000
Total Pages Processed (Data) : 16529
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 21676
Total Pages Failing (Index): 0
Total Pages Processed (Other): 7108
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 18682
Total Pages Marked Corrupt : 5
Total Pages Influx : 0
Highest block SCN : 3166775182 (2425.3166775182)


We use SQL-BackTrack for backups (mostly hot) and not RMAN (I have checked the BackTrack script log; it backs up by putting the tablespace in backup mode).

I read your previous reply to my post in which you said that if you run dbv against an influx file, of course it'll say something looks amiss.


Also, you said we did a lot of wasteful work and that issuing a simple startup would have fixed the blocks.


So, I read up on dbv, which says that if dbv encounters pages/blocks that are currently being used by the DBWn process, they are reported under "Total Pages Influx". This means it has a provision for noticing a page currently in use.

Could you please tell me whether using dbv is the right process in our scenario, where we don't use RMAN? And if it is not right to use dbv, what is dbv for? (Why does Oracle provide it if we cannot rely on its output?)
Tom Kyte
June 08, 2009 - 11:27 am UTC

what files are you using - from a backup, from a closed (shutdown normal) database, from an open 'hot' database?

and did you verify that those blocks actually *map* to some segment
and did you try to query that data up (if you have a file and block, you can use dbms_rowid to create a rowid range that'll get all rows on that block - just use 0 for the starting row number and 1000 for the ending)
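a sketch of that rowid-range query, using file 16 / block 1967 from the dbv output above (some_table is a placeholder, and 51234 is a made-up data object id -- look yours up in dba_objects.data_object_id):

select *
  from some_table
 where rowid between dbms_rowid.rowid_create( 1, 51234, 16, 1967, 0 )
                 and dbms_rowid.rowid_create( 1, 51234, 16, 1967, 1000 );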

Ankit, June 10, 2009 - 7:03 am UTC

The dbv script runs against an open (hot) database; also, this dbv script runs in the morning at 9:00 while the hot backup runs at 20:00 using SQL-BackTrack (no overlapping).

did you verify that those blocks actually *map* to some segment >> No, I did not do this.


This issue happened probably a month back on a dev box; I am trying to learn from it so that, in case this happens in production, we are in a position to handle it properly.

Thanks
Tom Kyte
June 10, 2009 - 12:57 pm UTC

verify those blocks were actually used, else I think we are right back to the original issue - non-logged changes.

Since it happened about a month ago, there is probably NOTHING you can do now, you won't have what you need to look at anymore.


block corruption reported -> first step = figure out what segment, if no segment, then so what.

A reader, June 10, 2009 - 10:27 pm UTC

Tom,
I have a production and a standby database. Force logging is not enabled and some tables are in nologging mode in production. When I scan the datafiles of the standby db with dbverify, I notice that more than 10 datafiles are logically corrupted. How can I recover these datafiles?

Tom Kyte
June 11, 2009 - 8:14 am UTC

look into production and see if you actually did anything non-logged (v$datafile). If you did, you'll need to follow the (documented) procedures (see documentation) for synchronizing the two systems

and then enable force logging.
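the check itself can be as simple as this (unrecoverable_change# / unrecoverable_time in v$datafile record the last non-logged operation against each file):

select file#, name, unrecoverable_change#, unrecoverable_time
  from v$datafile
 where unrecoverable_change# > 0;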

Restore archivelog validate

Tom, April 03, 2011 - 3:41 am UTC

Hi Tom,

As part of our backup strategy we run "restore backup validate" and "restore archivelog all validate" on a regular basis to ensure our backups are ok.

Recently, we have been receiving an odd error whereby using rman to list archivelogs shows nothing, but trying "restore archivelog all validate" gives errors about targets being missing. See below


RMAN> list archivelog all;

using target database controlfile instead of recovery catalog
specification does not match any archive log in the recovery catalog

RMAN> list backup by file;


List of Datafile Backups
========================

File Key  TY LV S Ckp SCN   Ckp Time  #Pieces #Copies Compressed Tag
---- ---- -- -- - --------- --------- ------- ------- ---------- ---
1    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
2    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
3    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
4    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
5    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
6    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
7    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
8    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
9    6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
10   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
11   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
12   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
13   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
14   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
15   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
16   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
17   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
18   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
19   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
20   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
21   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
22   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438
23   6114 B  F  A 397048355 03-APR-11 19      1       NO         TAG20110403T082438

List of Archived Log Backups
============================

Thrd Seq   Low SCN   Low Time  BS Key S #Pieces #Copies Compressed Tag
---- ----- --------- --------- ------ - ------- ------- ---------- ---
1    38744 397048351 03-APR-11 6115   A 1       1       NO         TAG20110403T084504

List of Controlfile Backups
===========================

CF Ckp SCN Ckp Time  BS Key S #Pieces #Copies Compressed Tag
---------- --------- ------ - ------- ------- ---------- ---
397049676  03-APR-11 6116   A 1       1       NO         TAG20110403T084506
List of SPFILE Backups
======================

Modification Time BS Key S #Pieces #Copies Compressed Tag
----------------- ------ - ------- ------- ---------- ---
03-APR-11         6116   A 1       1       NO         TAG20110403T084506

RMAN> restore archivelog all validate;

Starting restore at 03-APR-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=108 devtype=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/03/2011 09:38:06
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of log thread 1 seq 38743 lowscn 397047712 found to restore
RMAN-06025: no backup of log thread 1 seq 38742 lowscn 397047710 found to restore
RMAN-06025: no backup of log thread 1 seq 38741 lowscn 397047708 found to restore
RMAN-06025: no backup of log thread 1 seq 38740 lowscn 397047705 found to restore
RMAN-06025: no backup of log thread 1 seq 38739 lowscn 397047703 found to restore
RMAN-06025: no backup of log thread 1 seq 38738 lowscn 397047701 found to restore
RMAN-06025: no backup of log thread 1 seq 38737 lowscn 397047699 found to restore
RMAN-06025: no backup of log thread 1 seq 38736 lowscn 397047048 found to restore
RMAN-06025: no backup of log thread 1 seq 38735 lowscn 397046565 found to restore
RMAN-06025: no backup of log thread 1 seq 38734 lowscn 397045261 found to restore
RMAN-06025: no backup of log thread 1 seq 38733 lowscn 397038320 found to restore
RMAN-06025: no backup of log thread 1 seq 38732 lowscn 397016846 found to restore
RMAN-06025: no backup of log thread 1 seq 38731 lowscn 396996455 found to restore
RMAN-06025: no backup of log thread 1 seq 38730 lowscn 396972634 found to restore
RMAN-06025: no backup of log thread 1 seq 38729 lowscn 396968015 found to restore
RMAN-06025: no backup of log thread 1 seq 38728 lowscn 396952259 found to restore
RMAN-06025: no backup of log thread 1 seq 38727 lowscn 396928803 found to restore
RMAN-06025: no backup of log thread 1 seq 38726 lowscn 396914213 found to restore
RMAN-06025: no backup of log thread 1 seq 38725 lowscn 396892621 found to restore
RMAN-06025: no backup of log thread 1 seq 38724 lowscn 396886314 found to restore
RMAN-06025: no backup of log thread 1 seq 38723 lowscn 396868896 found to restore
RMAN-06025: no backup of log thread 1 seq 38722 lowscn 396852963 found to restore
RMAN-06025: no backup of log thread 1 seq 38721 lowscn 396828455 found to restore
RMAN-06025: no backup of log thread 1 seq 38720 lowscn 396813059 found to restore
RMAN-06025: no backup of log thread 1 seq 38719 lowscn 396798755 found to restore
RMAN-06025: no backup of log thread 1 seq 38718 lowscn 396774411 found to restore
RMAN-06025: no backup of log thread 1 seq 38717 lowscn 396767477 found to restore
RMAN-06025: no backup of log thread 1 seq 38716 lowscn 396763179 found to restore
RMAN-06025: no backup of log thread 1 seq 38715 lowscn 396761452 found to restore
RMAN-06025: no backup of log thread 1 seq 38714 lowscn 396752262 found to restore
RMAN-06025: no backup of log thread 1 seq 38713 lowscn 396730861 found to restore
RMAN-06025: no backup of log thread 1 seq 38712 lowscn 396709408 found to restore
RMAN-06025: no backup of log thread 1 seq 38711 lowscn 396694252 found to restore
RMAN-06025: no backup of log thread 1 seq 38710 lowscn 396685723 found to restore
RMAN-06025: no backup of log thread 1 seq 38709 lowscn 396674904 found to restore
RMAN-06025: no backup of log thread 1 seq 38708 lowscn 396658544 found to restore
RMAN-06025: no backup of log thread 1 seq 38707 lowscn 396644453 found to restore
RMAN-06025: no backup of log thread 1 seq 38706 lowscn 396632565 found to restore
RMAN-06025: no backup of log thread 1 seq 38705 lowscn 396617215 found to restore
RMAN-06025: no backup of log thread 1 seq 38704 lowscn 396602604 found to restore
RMAN-06025: no backup of log thread 1 seq 38703 lowscn 396590438 found to restore
RMAN-06025: no backup of log thread 1 seq 38702 lowscn 396572899 found to restore
RMAN-06025: no backup of log thread 1 seq 38701 lowscn 396556590 found to restore
RMAN-06025: no backup of log thread 1 seq 38700 lowscn 396534816 found to restore
RMAN-06025: no backup of log thread 1 seq 38699 lowscn 396514261 found to restore
RMAN-06025: no backup of log thread 1 seq 38698 lowscn 396494671 found to restore
RMAN-06025: no backup of log thread 1 seq 38697 lowscn 396486144 found to restore
RMAN-06025: no backup of log thread 1 seq 38696 lowscn 396462145 found to restore
RMAN-06025: no backup of log thread 1 seq 38695 lowscn 396440971 found to restore
RMAN-06025: no backup of log thread 1 seq 38694 lowscn 396423702 found to restore
RMAN-06025: no backup of log thread 1 seq 38693 lowscn 396413324 found to restore
RMAN-06025: no backup of log thread 1 seq 38692 lowscn 396401814 found to restore
RMAN-06025: no backup of log thread 1 seq 38691 lowscn 396387880 found to restore
RMAN-06025: no backup of log thread 1 seq 38690 lowscn 396364151 found to restore
RMAN-06025: no backup of log thread 1 seq 38689 lowscn 396341393 found to restore
RMAN-06025: no backup of log thread 1 seq 38688 lowscn 396328078 found to restore
RMAN-06025: no backup of log thread 1 seq 38687 lowscn 396315912 found to restore
RMAN>


As you can see, RMAN is trying to validate archivelogs which exist well before the latest backup despite claiming no knowledge of them!

Any idea what we might have done to cause this (i.e. is it self-inflicted or is it likely to be a bug)?

Also, am I right that the best way to clear this error would be to backup controlfile to trace and then recreate the controlfiles (so as to clear the rman history)?