Skip to Main Content
  • Questions
  • Indexes require recreation post IBM Global Mirroring

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Sanji.

Asked: June 10, 2008 - 5:44 pm UTC

Last updated: October 17, 2008 - 9:13 pm UTC

Version: 9.2.0.7

Viewed 1000+ times

You Asked

Tom,

Env. Oracle 9.2.0.7, HP-UX 11i, SAN IBM-DS8100.

We implemented IBM's Global Mirroring for our DR process.

I'm not sure if this information is relevant, but never the less i thought of posting it.

Global mirroring is basically
1. Creation of consistent point across replicated environment
2. Transmission of required updates to the secondary location
3. Saving of consistent data to ensure a consistent image of the data is always available.

Using the data freeze concept, consistency is obtained by temporarily inhibiting write IO to the devices in an environment and then performing the actions required to create consistency. Once all devices have performed the
required actions the write IO is allowed to resume. This might be suspending devices in a Metro Mirror environment or performing a FlashCopy when using
consistent FlashCopy. With Global Mirror, this action is the creation of the bitmaps for the consistency group.

Now, during global mirroring, the tablespaces are not required to be put in backup mode (as claimed by IBM).

When we recover a database on the DR server, the recovery is pretty much at instance level. This is because the entire disk set (at OS block level) is copied to the DR site, including the redo logs.

We haven't had a problem with instance recovery, but once the database comes up, we start getting
ORA-00600: internal error code, arguments: [12700], [11173], [168333041], [5], [247521727], [25], [],
errors, upon issuing DMLs on some of the application's tables.

I had opened a service request for the same, but didn't really get a convincing reply. The concept of block level copy through Global Mirroring was kind of difficult to get across.

The error essentially means that "A mismatch was found between the index rowid and the data block it is pointing to. The rowid points to a non-existent row in the data block.The corruption can be in data and/or index blocks."

Everytime we encounter the corruption, we analyze the tables and indexes and the corruption has always been reported on indexes. The only workaround available is to drop the indexes and recreate them. Things work fine thereafter.

When we update a table, for instance, insert a record in the table, the information is logged in the redo along with the updates to the table
s index structure as well. So essentially all this information is with the redo logs. We are missing index entries on the DR location post Global mirroring.

If Global mirroring claims that it transfers only consistent blocks to the secondary location, then what "might" be prompting the index corruption.

May i request for your thought's on this.

Regards
Sanji

and Tom said...

I would suspect the entire infrastructure - it is not doing a consistent copy.

What you might try doing - before mirroring - is alter database suspend IO, have ORACLE suspend writing. If it always works that way - then.... probably....

IBM is copying things at an OS block level (say 512bytes).

Oracle is writing things at a database block level (say 8k - 16 times the OS block size).

They suspend IO - but they suspend IO at a low level - the OS block level. We had a chance to write out the first 7 or 8 OS blocks - have 8 or 9 more to go to have a complete database block - and they suspended us. Now you are dead in the water. You have a broken, fractured block - that cannot be repaired from REDO (when you put a tablespace into backup mode, we log a full block image the first time a block in that tablespace is modified so we can fix fractured blocks - we normally do not do that)

putting tablespaces into backup mode would accomplish the same 'feat'


Actually, I always wonder why someone would use this disk stuff for disaster recovery of a DATABASE when the DATABASE provides this feature (data guard) and provides a useful DR database every day - you can use it to offload backups, you could use it read only for reports and the transmission of data to sync up the two is typically 1/7th or LESS for data guard....


Rating

  (9 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Just Because

A reader, June 11, 2008 - 2:02 pm UTC

Actually, I always wonder why someone would use this disk stuff for disaster recovery of a DATABASE when the DATABASE provides this feature (data guard) and provides a useful DR database every day - you can use it to offload backups, you could use it read only for reports and the transmission of data to sync up the two is typically 1/7th or LESS for data guard....

In my case, mirroring and o/s images are used because the business says that is the way we have to do it. The box admins do not understand how database backups work and don't trust them because they have always done it "their" way. I just nod and agree and then continue to do database backup and recovery the way I was taught...

Tom Kyte
June 11, 2008 - 8:34 pm UTC

... because the business says that is the way we have to do it. ...

part of our responsibility as IT professionals is to educate "the business".

Fractured blocks

Sanji, June 11, 2008 - 2:41 pm UTC

Tom, does it mean that instance recovery "can" succeed even if there are fractured blocks on disk, since the DR copy does come up when we start the database ?
It's only on issuing DMLs that we start encountering the errors and that too only indexes.
What's surprising is that only the indexes get corrupted.

Regarding the disk level copy against dataguard, I had implemented dataguard for the same purpose but the management decided against it because bandwidth between Shelton & Mesa (our DR site) wasn't enough for log transport services.
With ssh tunnel, it was transferring 1G (size of our archives) in around 22 mins. I had configured 16 redo log files and 4 archivers. All of the archivers were using the same tunnel. So multiple archivers too wouldn't have solved the problem.

Global mirroring had a "different" approach to data transfer, so it was prefered.

Regards
Sanji
Tom Kyte
June 11, 2008 - 8:36 pm UTC

... Tom, does it mean that instance recovery "can" succeed even if there are fractured blocks on disk, since the DR copy does come up when we start the database ? ...

sure, it just means you have corruption and until you actually HIT that block, we'll not know.

I'll betcha a table or two is affected, you just haven't hit it.


What approach did global mirroring have that allows it to transfer 7 times the data in less time?

Very interesting discussion

Tom G., June 11, 2008 - 4:28 pm UTC

I'm embarrassed to admit that I'm not intimately familiar with
the finer details of Oracle Data Guard and the various
mirroring options that surround the Oracle database. Our
backup and recovery and disaster recovery approaches are
pretty basic.

Tom, I was surprised to see the reference to "alter database
suspend". I'm quite certain that you meant "alter system
suspend" instead.

Thanks for the good work, though!


Tom Kyte
June 11, 2008 - 9:00 pm UTC

yah, alter system, thanks!

EMC SRDF for protection

A reader, June 12, 2008 - 4:53 pm UTC

Tom,
In our environment, there seems to be a big push from our storage management folks to use EMC's SRDF to protect our data and for data recovery. One of the strong arguments for this is that it can be used to protect both the Oracle database and changes on the application servers.
I'll be honest, I don't know much about EMC's SRDF. I've done a bit of research using Google, and seem to have found that it can be used to protect Oracle databases. However, I'm not sure if this is the best approach or not. At this time, I don't think there is any desire to have the secondary database up and running for any purpose (such as reporting or testing), except to be ready for data recovery purposes.
Are there reasons why Data Guard is really the better approach and SRDF should really be "pushed aside" as a proper solution?
If we did failover, would OEM Grid Control be able to immediately start monitoring the secondary instance, or are there some requirements to change Database IDs or something else?
We are running Oracle 10.2.0.2 databases in a non-RAC, non-ASM environment.
Thanks for your wonderful insight.
Tom Kyte
June 12, 2008 - 6:48 pm UTC

You already own data guard.

Therefore, your can reduce the cost of SRDF by the amount of data you store in Oracle which is probably the single largest consumer of space.

And use 1/7th or less the network bandwidth.

You would use SRDF to replicate "files".
You should use the database to DR a database.

There is no reason not to use both.

... Are there reasons why Data Guard is really the better approach and SRDF
should really be "pushed aside" as a proper solution?
...

no one said to push aside SRDF, all I'm saying here is:

o we'll reduce hugely the amount of SRDF you need
o you get a benefit from a standby that you won't with SRDF copying it
o you'll do it with tons less network traffic
o you can do it over hundreds of miles (not so with SRDF)

if you use data guard (DG), yes, all of the tools are DG aware.

BEGIN BACKUP ... END BACKUP are still required

Hemant K Chitale, June 30, 2008 - 3:17 am UTC

I would recommend Sanji read MetaLink Note#300225.1


SRDF Vs. DG

Mac, October 06, 2008 - 1:34 pm UTC

Thank you Tom for the helpful responses.

I am curious why 2 posts prior (June 12, 2008 at 6pm US/Eastern) when you listed benefits to DG:

"Tom Kyte"
o we'll reduce hugely the amount of SRDF you need
o you get a benefit from a standby that you won't with SRDF copying it
o you'll do it with tons less network traffic
o you can do it over hundreds of miles (not so with SRDF)

Why didn't you mention the possibility of fractured blocks as you did when answering the first post of the thread? It seems like if that is a possibility of using SRDF (hardware global mirror between SANs), isn't that the #1 reason not to rely on SRDF/mirroring and rely first and foremost on DG?

Thanks again.
-Mac
Tom Kyte
October 06, 2008 - 3:09 pm UTC

well, fractured blocks are a possibility without SRDF - we are expecting that.

think of an OS crash.

SRDF Vs. DG

Mac, October 06, 2008 - 3:58 pm UTC

Is it fair to say that it is more likely to suffer a full database crash (both primary & secondary) from encountering a fractured block when using SRDF rather than DG? (in theory)

If I were using SRDF, would a fractured block in the database be mirrored to the standby site if the primary were to crash? If I were using DG, what would happen to the same fractured block?

Thanks again.
Tom Kyte
October 07, 2008 - 1:04 pm UTC

forget fractured blocks - SRDF would propagate immediately any and all logical corruption and many (most) physical corruptions.


fractured blocks

Lou, October 16, 2008 - 1:54 pm UTC

Tom,
Wouldn't data guard (and possibly rman if done either on the primary site, or on the secondary if they are using active dg) identify fractured blocks? If this is the case, as you said, SRDF would simply propogate the fractured blocks along to the secondary site.

Does dataguard help identify fractured blocks so the DBA can fix them, while SRDF simply propogates them?

Thank you.
Tom Kyte
October 17, 2008 - 9:13 pm UTC

fractured blocks are a BACKUP THING - if you use an OS tool to read an open Oracle datafile - you can get fractured blocks in the backup (read the head of a block at time t1, at time t2 the OS rewrites the block, at time t3 read the tail of the same block - now head != tail)

it is a backup thing (and RMAN, not being an OS copy - says "hey, head != tail, re-read that block - so it fixes it)

I'm not sure where you are going here with this.....



fractured blocks

Lou, October 20, 2008 - 12:29 pm UTC

I guess I was under the impression that if my SAN administrators were attempting to use storage-mirroring (snap clones) to maintain the disaster-recovery site's copy of the database, then they'd be at risk for propogating fractured blocks to the disaster-recovery site undetected.

It sounds as if this is not possible.