It's Awesome
A reader, July 12, 2001 - 5:44 pm UTC
Pichaimani, July 15, 2001 - 11:17 pm UTC
Thanks for explaining and rolling out all the options.
A reader, May 02, 2002 - 5:30 pm UTC
Hello Tom,
I was reading up on the Data Guard (standby) feature of 9i. Using Data Guard, the standby database can be configured to have no lag (there are other options too) with the primary by applying the ONLINE REDO to the standby database. But it has some very big downsides.
1. The synchronous transmission and affirmation of the redo apply will surely have a big performance impact on the primary database.
2. If the redo transmission is suspended due to any (network or other) problem, Oracle will shut down the primary database! (just like when there is no space for redo writes). That is outrageous!! What is high availability then?
I wonder if the standby database is that good an option for high availability. If you go for 100% sync with the primary database, you have to take the above risks. On the other hand, if you adopt the 8i standby architecture, you will always lag behind. Oracle's RAC, Replication, or OS-level mirroring all look like much better options to me. Could you comment on that?
Also, does Oracle support anything like below:
The primary database files are on a triple-mirrored RAID array. The array can be mounted on another box easily. That box has a similarly configured instance (with the same name) running -- but no database is mounted on it. When the primary becomes unavailable, mount the array on the second box and mount the database. The database should do instance recovery upon opening. Will it not have 100% of transactions available?
Is this possible? Or am I day dreaming? Waiting for your comments.
Thanks
May 03, 2002 - 10:15 am UTC
1) you get what you pay for -- this much is true, yes. If you CANNOT stand to lose a single committed transaction ever (say you are a BANK), then you will
a) design for this
b) size for this
c) pay for this (in network bandwidth and low latency)
Also bear in mind that dataguard supports 4 levels:
GUARANTEED:
o lgwr does a synchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY STOPS
INSTANT:
o lgwr does a synchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
RAPID
o lgwr does an asynchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
DELAYED
o arch does the work, transmitting archive redo logs as the 8i (and
before) standby did
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
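The four levels above map roughly onto LOG_ARCHIVE_DEST_n attributes -- a sketch only, using a hypothetical standby service name (stby) and 9i-era syntax:

```sql
-- Hypothetical standby service "stby"; 9i-era attribute syntax.
-- GUARANTEED/INSTANT style: lgwr ships synchronously, waits for the
-- standby's disk write (AFFIRM) before the commit completes:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR SYNC AFFIRM';

-- RAPID style: lgwr ships asynchronously, no wait on the primary:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR ASYNC NOAFFIRM';

-- DELAYED style: arch ships completed archive logs, as 8i did:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby ARCH';
```

Whether the primary stops when the standby is unreachable (the GUARANTEED vs INSTANT distinction) is controlled separately by the protection mode you set on the database, not by the destination string alone.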
So -- you tell us -- what level do you want? what level do you need? Me, I see GUARANTEED for people who cannot afford to lose a BYTE of data EVER (banks, financial institutions pop into my head -- my HR database, no, my 401k -- YES)
I don't see instant as being commonly used.
RAPID for most all other implementations
DELAYED for people who can afford to lose relatively large numbers of transactions when the building housing the data center blows up, burns down, whatever.
2) Outrageous? Not if you are a financial institution whose design spec states "a transaction is not to be considered committed until the transaction is safely stored in at least two geographically dispersed locations" (and most of them do).
Guaranteed mode is like a really really really fast two phase commit. In the past, people used a synchronous replication technology (both sites MUST be available). Guaranteed mode is a no data loss mode; it is NOT necessary for HA -- it is for continuity of operations. Once the primary or standby is taken out (remember 9/11), the other shuts down UNTIL you start up a standby for it. This is used by people who cannot afford to ever lose a transaction (what do you mean that million dollars you transferred disappeared??)
You need to look at RAPID protection I think. Actually, you need to look at all four, understand when and where you would use them and then apply the appropriate technology in the right cases!
Here is how one would design an HA system that can afford to lose a very small amount of transactional data in the event of a truly catastrophic failure:
a) RAC in a room, because you NEVER want to failover unless the room goes away.
b) data guard in rapid mode, because you don't want to lag behind too much and you want to impact the system as little as possible. You will work to fix standby issues ASAP when they occur.
If you cannot afford to lose a single transaction -- change rapid to guaranteed and pay for the redundant networks to ensure they are not the issue with the standby. You will design for and pay for this extra level of HA'ness.
Yes, you can do that last paragraph and in fact there are many OS features that facilitate it. It is much like "RAC in a ROOM" and doesn't give you any data guard like features.
A reader, May 03, 2002 - 11:23 am UTC
You can also use storage replication like EMC's SRDF. It replicates at the block level. You can configure how far out of phase the two sites may be, or keep them fully in sync. But you will pay big time for it -- it's not cheap.
http://www.emc.com/products/networking/srdf.jsp
May 03, 2002 - 11:51 am UTC
And IT IS NOT the best way to do this either.
Consider how a block is changed. We will:
o generate a change to the block
o generate a change to the undo block
o generate two copies of redo at least
o and probably two archives
you have 6 IO's eventually in all probability. These will all be transmitted across the network.
With data guard, you'll send 1 redo log change, period. That is a significantly reduced amount of network traffic -- significant.
Additionally, with SRDF, that other copy is useless. You cannot really use it for anything EXCEPT in the event of failure. With data guard you can use it to offload backups or as a reporting instance (even in guaranteed mode, you can use the standby as a reporting instance)
A reader, May 03, 2002 - 11:30 am UTC
Exactly. I agree one has to pay for it. But I am not convinced the standby database (Data Guard) has any advantage over an SRDF kind of replication. That is what I wanted to find out. If we can mount the replicated filesystem on another instance and open the database with 100% of transactions, that looks like a better option to me than data guard with 2 phase commit, unless I am missing some other feature of the standby database.
Thanks anyway Tom for your time.
May 03, 2002 - 1:46 pm UTC
Data guard is INSTEAD OF 2PC. It is not 2PC.
See the comment right above as to why data guard would be better than SRDF (less costly, more functionality)
SRDF has its place, just not in mirroring datafiles.
A reader, May 03, 2002 - 12:27 pm UTC
There you go! Thanks Tom for pointing out the advantages. That's exactly what I wanted to know.
You may be incurring more network traffic with SRDF. But with a standby you also incur time for applying the redo in the standby instance, then confirming that back to the primary. And your transactions are waiting on the primary all this time. Also, if the redo transmission is interrupted you have a business disruption!
You are correct. Oracle has given a number of options and you pick the most suitable for you. Thank you very much for helping us to make that choice.
May 03, 2002 - 1:54 pm UTC
The standby DOES NOT APPLY THE REDO.
A data guard standby receives the redo, writes it to disk and returns. It does NOT apply it.
The redo on the standby is NOT APPLIED until after the redo log file fills up.
You only have a business disruption IF YOU CANNOT MIRROR THE REDO (not apply) AND IF YOU CONFIGURE IT TO STOP (as I would hope SRDF would give you).
So, in short, LGWR transmits the redo, redo is saved on standby and we return (guaranteed and instant modes). You DO NOT APPLY IT at remote.
LGWR transmits the redo (rapid mode) and you do not wait at all. The standby receives it in the background and writes it to standby redo. You do NOT APPLY IT at remote.
The redo is not applied until a full redo log has been sent over.
Good Overview of the options
A reader, August 02, 2002 - 1:18 pm UTC
Reader
A reader, November 03, 2002 - 9:29 pm UTC
"
(even in guaranteed mode, you can use the
standby as a reporting instance)
"
Will the recovery of the standby take place in the background even while the standby is being used for reporting purposes?
"
The redo on the standby is NOT APPLIED until after the redo log file fills up.
"
What happens if someone did "alter system switch logfile" at the PRIMARY database?
November 03, 2002 - 9:37 pm UTC
same thing as would happen when the logfiles switch by themselves.
In guaranteed (instant and rapid) mode -- the redo is transmitted by lgwr in real time. The redo was already at the standby site when you switched the logs. The redo is accumulated there. If you are using a physical standby the database is either
o in recovery mode, applying the redo
o opened for read only access - redo is just getting queued up, not applied but safely tucked away on the standby.
In a logical standby (new with 9iR2) the redo can be applied even while the database is being used for reporting (or it can be applied in a DELAYED mode -- so that the standby runs an hour or two or whatever behind production -- to protect from that "accidentally" dropped table)
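That delayed apply can be sketched as follows (the service name stby is hypothetical; DELAY is expressed in minutes):

```sql
-- Hypothetical destination: redo is shipped immediately but held on
-- the standby for 120 minutes before being applied, so an
-- "accidentally" dropped table does not propagate right away.
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR ASYNC DELAY=120';
```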
SEVERAL SMALL vs ONE BIG
Abu Ahmadh, February 10, 2003 - 9:50 pm UTC
One of our tables will grow up to 2TB in size. We wanted to split it into several smaller databases, as you mentioned in Architectural Solutions, for better manageability and performance -- with partitioning, of course. But the application vendor does not certify running on multiple small databases.
Now, our solution turned to be RAC. That means, we will have one single BIG database with multiple instances on different nodes. My questions are:
(a) What will be the impact on performance with our proposed approach?
(b) If batch jobs are running against the big table, how does it affect on-line performance?
(c) How do we size for sorting? Will RAC utilize temporary tablespaces of all running instances for a single process?
(d) Does the vendor need to certify if we deploy RAC when Oracle claims that Requirement to code change is NIL?
(e) Can we manually control RAC's distributed processing so that, say, batch jobs will always run against a single instance?
There could be some stupid questions, but I need your expert advice to all my queries.
Thanks Tom
February 11, 2003 - 8:12 am UTC
a) it'll be positive in general. At the time of writing this particular article (jan 2000), RAC did not exist. It is not OPS, it is quite different.
b) like any other thing would. increased contention for IO resources -- but with RAC you can split out the workload by instance so the cpu's don't contend. But in general -- just like any other system with mixed workload
c) pretty specific here -- given you are at the information gathering level. suggest you give the RAC concepts guide a read through
d) the VENDOR must be willing to support your configuration - yes, 100%, absolutely. If they do not, what happens when you call their support line?
e) yes you can.
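One common client-side way to do (e) -- a sketch, with made-up host and SID names -- is to give the batch jobs a TNS entry that points at exactly one instance instead of a load-balanced address list:

```
# hypothetical tnsnames.ora entry: batch connects only to node 1
BATCH =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(Host = racnode1)(Port = 1521))
    (CONNECT_DATA = (SID = prod1))
  )
```

Online users would keep a separate entry listing all of the nodes, so the batch workload stays isolated from them.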
VLDB impact to startup & shutdown
A reader, February 12, 2003 - 11:19 pm UTC
Will there be any impact on database startup/shutdown due to multiple instances? Let us say there are 10 nodes in the cluster. What will be the impact in terms of time?
February 13, 2003 - 9:22 am UTC
each instance is independent of the others, there really isn't an impact.
First-Class article about choosing the best Disaster-Recovery Plan
Sami, April 28, 2003 - 12:19 am UTC
John, June 04, 2003 - 9:15 pm UTC
Hi Tom,
We have a 9.0.1 enterprise edition database on a Windows 2000 server. We may want to purchase another Windows 2000 server as a standby server. Probably we will implement Data Guard on them. However I am not sure, when the master goes down, how can we let the clients know the standby server has taken over? Is it possible to let the standby server use the master's IP address through a failover procedure? Or you may have other better options. Our database client is an application server using the Java thin JDBC driver.
Thanks in advance.
June 05, 2003 - 7:35 am UTC
normally this is done via TNS.
you can specify a list of hosts to try in order when connecting in tnsnames.ora files.
you can use ldap or OracleNames as a nameserver to change the address centrally.
you can use IP failover as well -- but it wouldn't generally be necessary.
If you are big enough to run a failover site, you are probably big enough to use a centralized nameserver instead of little tnsnames.ora files all over the place. You would control it centrally and a simple update would repoint the tns entry.
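A sketch of the tnsnames.ora approach (host and service names here are made up): an entry that tries the primary first and falls back to the standby if the primary cannot be reached:

```
# hypothetical entry: try primary-host first, then standby-host
PROD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(Host = primary-host)(Port = 1521))
      (ADDRESS = (PROTOCOL = TCP)(Host = standby-host)(Port = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = prod))
  )
```

As noted above, the fallback connection attempt takes longer because the client must first time out against the dead primary.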
lgwr and arch behavior
Martin Guillen, September 02, 2003 - 12:03 pm UTC
Hi Tom:
Let me know if I understand this:
If I'm using lgwr to send redo to the standby, it is NOT applied until a log switch occurs on the primary, right?
I was thinking that the changes were applied immediately, but it seems I was wrong (seeing your response).
So what's the advantage of using lgwr instead of arch?
Also I want to know what the purpose of standby redo logs is.
Thank you,
Martin.
September 02, 2003 - 3:00 pm UTC
the standby redo logs get filled AND THEN applied at the standby.
The advantage of using lgwr? ok, you are on production. production has 50meg log files. You have just filled up 49.9meg of log. production is blown up.
that 49.9 meg of log is probably gone, forever. will never become an archive file, will never get over to standby.
so, you use standby -- same situation but lgwr writes to the online redo log and sends to the standby. production is blown up. no worries, the 49.9 meg of committed transaction data is on the standby -- apply it, open the database and you are back in business.
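As for the standby redo logs -- they are what receives that lgwr-shipped redo on the standby side. A sketch of creating one there (the path, group number and size are made up; they should match the primary's online redo log size):

```sql
-- Hypothetical group/path/size; standby redo logs should be the same
-- size as the primary's online redo logs, with at least one extra group.
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
  ('/u01/oradata/stby/srl04.log') SIZE 50M;
```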
A reader, September 16, 2003 - 10:50 am UTC
Hi Tom,
I was reading the below paragraph in one of the followups and you answered that it is possible. Can I get some additional info on this please?
<QUOTE>
The primary database files are on a triple-mirrored RAID array. The array can be mounted on another box easily. That box has a similarly configured instance (with the same name) running -- but no database is mounted on it. When the primary becomes unavailable, mount the array on the second box and mount the database. The database should do instance recovery upon opening. Will it not have
100% of transactions available?
<END QUOTE>
Is it possible to do a recovery like that if the tablespaces were not in backup mode? Isn't it just like a hot backup where you can have split blocks in the backup copy and you need full block copies in the logfile to fix them?
September 16, 2003 - 12:49 pm UTC
the disks would just appear to have a "crashed oracle database" on them. It is no different than if you DIDN'T mount them on another machine, but just "started up oracle" again.
All they did was move the disks from computer 1 to computer 2. It is not a media recovery situation, just an instance recovery.
A reader, September 16, 2003 - 5:19 pm UTC
Hi Tom,
Please see
http://asktom.oracle.com/pls/ask/f?p=4950:8:4496575172726829804
There you say that splitting the mirrors requires putting the tablespaces in backup mode.
Here is my thinking:
When you suspend the I/O, the database queues up info in the buffer and does not write to the files. Now you split the mirrors. The datafiles are fuzzy, and for recovering you need full images of the blocks in your redo stream. Can you please explain how this is different from the above referenced thread.
thanks.
September 16, 2003 - 6:33 pm UTC
it is different because that thing (split mirrors + backup mode) was to BACKUP the database.
this, this is just failing the hardware over to another computer. it is not about backing up at all. it is about failing over.
A reader, September 16, 2003 - 5:23 pm UTC
Test Disaster recovery using RMAN ....
READER, September 18, 2003 - 8:46 pm UTC
when I backed up my database using RMAN, it did back up the read-only datafile along with the others. However, when I practiced disaster recovery after removing all of the files, RMAN did not restore the read-only datafile. Why? THANKS.
RMAN> backup database; <------- IT DID BACKUP READONLY DATAFILE
Starting backup at 18-SEP-03
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=12 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/home2/user18/ORADATA/u01/system01.dbf
input datafile fno=00005 name=/home2/user18/ORADATA/u02/sample01.dbf
input datafile fno=00002 name=/home2/user18/ORADATA/u02/undotbs01.dbf
input datafile fno=00003 name=/home2/user18/ORADATA/u03/users01.dbf
input datafile fno=00004 name=/home2/user18/ORADATA/u03/indx01.dbf
input datafile fno=00006 name=/home2/user18/ORADATA/u01/querydata01.dbf <--- READONLY
channel ORA_DISK_1: starting piece 1 at 18-SEP-03
channel ORA_DISK_1: finished piece 1 at 18-SEP-03
piece handle=/home2/user18/oracle/dbs/01f1kuho_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:26
Finished backup at 18-SEP-03
Starting Control File and SPFILE Autobackup at 18-SEP-03
piece handle=/home2/user18/oracle/dbs/c-411229649-20030918-00 comment=NONE
Finished Control File and SPFILE Autobackup at 18-SEP-03
PLEASE SEE BELOW THE STEPS THAT I FOLLOWED TO PERFORM DISASTER RECOVERY.
RMAN> connect target;
connected to target database (not started)
RMAN> set dbid 411229649
executing command: SET DBID
RMAN> STARTUP FORCE NOMOUNT;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/home2/user18/oracle/dbs/initU18.ora'
trying to start the Oracle instance without parameter files ...
Oracle instance started
Total System Global Area 161187360 bytes
Fixed Size 730656 bytes
Variable Size 109051904 bytes
Database Buffers 50331648 bytes
Redo Buffers 1073152 bytes
RMAN> RUN
2> {
3> ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
4> RESTORE SPFILE FROM AUTOBACKUP;
5> }
using target database controlfile instead of recovery catalog
allocated channel: C1
channel C1: sid=9 devtype=DISK
Starting restore at 18-SEP-03
channel C1: looking for autobackup on day: 20030918
channel C1: autobackup found: c-411229649-20030918-01
channel C1: SPFILE restore from autobackup complete
Finished restore at 18-SEP-03
released channel: C1
RMAN> STARTUP FORCE NOMOUNT;
Oracle instance started
Total System Global Area 26968864 bytes
Fixed Size 729888 bytes
Variable Size 20971520 bytes
Database Buffers 4194304 bytes
Redo Buffers 1073152 bytes
RMAN>
RMAN> RUN
2> {
3> ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
4> RESTORE CONTROLFILE FROM AUTOBACKUP;
5> ALTER DATABASE MOUNT;
6> RESTORE DATABASE;
7> RECOVER DATABASE;
8> }
allocated channel: C2
channel C2: sid=11 devtype=DISK
Starting restore at 18-SEP-03
channel C2: looking for autobackup on day: 20030918
channel C2: autobackup found: c-411229649-20030918-01
channel C2: controlfile restore from autobackup complete
replicating controlfile
input filename=/home2/user18/ORADATA/u01/ctrl01.ctl
Finished restore at 18-SEP-03
database mounted
Starting restore at 18-SEP-03
datafile 6 not processed because file is read-only
channel C2: starting datafile backupset restore
channel C2: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /home2/user18/ORADATA/u01/system01.dbf
restoring datafile 00002 to /home2/user18/ORADATA/u02/undotbs01.dbf
restoring datafile 00003 to /home2/user18/ORADATA/u03/users01.dbf
restoring datafile 00004 to /home2/user18/ORADATA/u03/indx01.dbf
restoring datafile 00005 to /home2/user18/ORADATA/u02/sample01.dbf <--- NOTE, IT DID NOT RESTORE READONLY DATAFILE
channel C2: restored backup piece 1
piece handle=/home2/user18/oracle/dbs/01f1kuho_1_1 tag=TAG20030918T120824 paramL
channel C2: restore complete
Finished restore at 18-SEP-03
Starting recover at 18-SEP-03
datafile 6 not processed because file is read-only
starting media recovery
archive log thread 1 sequence 3 is already on disk as file /home2/user18/ORADATc
archive log filename=/home2/user18/ORADATA/ARCHIVE1/arch_3.arc thread=1 sequenc3
unable to find archive log
archive log thread=1 sequence=4
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/home2/user18/ORADATA/u01/querydata01.dbf'
September 20, 2003 - 4:47 pm UTC
Failover
Douglas, November 27, 2003 - 8:19 am UTC
Tom, the docs are rather reticent on this... Maybe you can give me some practical
advice. It's quite simple, I think. Though, I can't find much info on it.
I have a typical data guard configuration with two machines/instances.
After carrying out the procedures to failover from machine A to B, how should I
"re-point" the clients to machine B ?
Either:
1) Use the FAILOVER clause in the address descriptor in tnsnames.ora
or
2) Swap A's IP address (and/or hostname) with B's. That is -- fully change the
machine roles.
?
November 27, 2003 - 10:39 am UTC
what advice are you looking for?
the docs are saying you can either:
a) configure the tnsnames entry so that it tries production and if it cannot connect to production, it connects to standby (this will take longer)
b) change the IP addresses so the ip address of the failed prod machine becomes the standby IP address and vice versa
Other options are:
c) use Names and update the Names server (LDAP in current releases of Oracle, use LDAP to resolve service names)
d) run a listener on a 3rd machine and have the servers register with this listener as a "service". clients connect to listener, listener forwards to a database that serves that service.
Disaster Recovery Plan
Mehmood -Ur- Rehman, May 05, 2004 - 3:57 am UTC
The discussion is really very helpful. It gives many ideas for designing the DRP.
Tom you are a great resource for the Oracle DBA community.
Architecture advice
Prakash, May 18, 2004 - 11:03 am UTC
Thanks for laying out the cards on the table. This really helps the thought process.
I have System "A" on 8.1.7.4 on solaris boxes. We have a new system "B" coming up on 8.1.7.4 on a new solaris box. We want high availability for "B". There is a set of common admin tables to be shared by the two systems.
Clustering is not an option for us at this point because of budget constraints. So here is what we propose to do:
Create a standby database for "B" - preferably on a separate box
Do two-way multimaster replication on the admin tables between "A" and "B"
If this is good path then in the event there is a failure on the Primary"B" and Standby"B" is brought up, does replication continue seamlessly? Do we have to define replication groups both on the primary"B" and standby"B"?
Thanks
May 18, 2004 - 5:09 pm UTC
in 8174, the standby will always be behind unless you use some sort of disk-based mirroring for the online redo logs, since data guard did not have a no-data-loss mode in 8174.
So no, you cannot fail over and have the replication kick in as the standby may not have everything the production did. they are not at the same point in time.
High Availability comes at a price. You need to be willing to pay for that sort of availability. If the availability overall is "too expensive" then the cost of being down must be pretty cheap? I mean, at the end of the day -- you have to balance the two things out.
I would be looking at a more current release of the software -- 10g or at least 9iR2 -- for a no-loss data guard (or a lot less lossy data guard) for this new system. I would look at having the admin tables "owned" by only one instance and doing one-way replication -- if I did *any* at all (not a big fan of replication)
guarding bandwidth
Gabriel, September 22, 2004 - 10:23 am UTC
Hello Tom,
We have two 100Mbit cards on the data guard server, listening on two different ports. Our data guard second destination service is configured on one of these ports. How can we have a service configured for both cards, or use two services for the same destination? We are trying to use the full 200Mbit of bandwidth that we have at our disposal.
Thank you,
September 22, 2004 - 10:34 am UTC
why, are you generating more than 100 megabits/second of redo? (about 12meg/second...)
if so, you have a pretty fast system, gigabit ethernet would be what you were looking for (my laptop has that today!)
(this would really be a "ask the networking dudes" sort of question -- we use connected tcp/ip sockets to transfer data -- can you multi-plex those two cards into a single virtual card sort of thing....)
would this work?
Gabriel, September 23, 2004 - 2:45 pm UTC
Hello Tom,
I found this somewhere else. Do you think it will work?
Step 1:
On standby where you have 2 interface cards
Start the listener on card 1 IP.
Step 2:
On standby where you have 2 interface cards
Start the listener on card 2 IP.
Step 3: on Primary
tnsnames could be used with load_balance=on
example
<SERVICE NAME> =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (ADDRESS = (PROTOCOL = TCP)(Host = systemc-nb)(Port = 1524)) --- nb backup network
    (ADDRESS = (PROTOCOL = TCP)(Host = systemc)(Port = 1524)) --- Primary network
    (CONNECT_DATA = (SID = der))
  )
Step 4: In the Primary database
create a log_archive_dest with the service set to the <SERVICE NAME> from tnsnames. This way the archive logs will be shipped alternately through the listeners.
September 24, 2004 - 9:27 am UTC
so? think about it, what have you achieved.....
ship log 1 on pipe 1
and then, later
ship log 2 on pipe 2
and then, later
ship log 3 on pipe 1
and then, later................
is that what you wanted.....
(are you solving a problem you in fact actually have is my question still...)
I really need your help- I have to use hot failover for standby db
Mariana, November 09, 2004 - 10:02 am UTC
Hi Tom,
well my problem is the following:
I have to create a script or a job that automatically reconnects the users connected to the primary database to the standby database, and the users shouldn't have to know that the primary database is down. Please help me!!!!
Death of the primary database
Louie Warren, November 24, 2004 - 3:45 pm UTC
I have successfully set up Dataguard via OEM. I now want to simulate the death (power, connectivity) of the primary machine and make the standby take over. Probably a bad idea, but I deleted the configuration on the primary and am left with a standby that I can't make primary... at least I can't figure it out wading through the manuals... any help?
reader
A reader, January 04, 2005 - 12:21 pm UTC
Could you tell us what the difference is, in terms of recovering a database, between utilizing
1. the snapshot controlfile
vs.
2. the autobackup controlfile
Could we not use the snapshot controlfile just as well, by copying it to the controlfile location and performing the restore using that controlfile instead of the catalog?
January 05, 2005 - 8:49 am UTC
but the snapshot controlfile is a temporary file used by RMAN and is not registered in the catalog, so -- I am not sure what its relevance is.
reader
A reader, January 05, 2005 - 10:29 pm UTC
I have read that the snapshot controlfile is created by the same mechanism as "alter database backup controlfile to 'filename.ctl'". So we do have a controlfile that can be used in place of the lost controlfile: mount the database, start up RMAN, and restore the backup sets. Is this a possible DR scenario?
January 06, 2005 - 10:27 am UTC
if you have a snapshot controlfile, you might well be able to use it in the event "all else has failed"
but -- it is not the proper method. You BACK UP stuff; it would be best to do it right.
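Doing it right can be sketched as follows (the backup path is made up):

```sql
-- In RMAN, have the controlfile backed up automatically with
-- every backup:
--   CONFIGURE CONTROLFILE AUTOBACKUP ON;

-- Or in SQL*Plus, take an explicit controlfile backup to a
-- hypothetical known path:
ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control_prod.bkp' REUSE;
```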
Minimum requirements to prevent a disaster (data loss)!
Robert, February 23, 2005 - 3:24 pm UTC
Tom,
(8.1.7 and 9.2)
Given that we currently have all our database/hardware components locally, I am putting together minimum requirements for avoiding loss of data.
Could you please comment...
Parameter File - Backup copy along with the Database Backups.
Control File - Multiplexed on at least 2 (or 3) different physical devices.
Online Redo Logs - Multiplexed on at least 2 different physical devices
Archived Redo Logs - Multiplexed on at least 2 different physical devices
Data/Undo Files - Mirrored (at least?).
Database Backups - Multiplexed on at least 2 different physical devices and stored offsite (e.g. on tape).
Thanks as always,
Robert.
February 24, 2005 - 5:12 am UTC
technically, the mirroring of data files is not necessary for "avoiding loss of data" but it does increase availability for sure.
how about: OS backup stored with database backups so we can restore the version/patches/etc of the database software we are using....
there are many parameter files (tnsnames.ora, sqlnet.ora, etc) to consider as well.
you seem to be missing a "monthly or weekly test restore to see that everything actually can be reconstituted from what we have -- that the tapes are good, that we know how, that we did not miss any pieces"
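Part of such a test can be sketched in RMAN -- this checks that the backup pieces needed for a restore exist and are readable, without actually restoring anything:

```
RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE CONTROLFILE VALIDATE;
```

Note it only reads the backup pieces; it is not a substitute for periodically restoring to a scratch machine end to end.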
RMAN example
Nikunj Thaker, February 24, 2005 - 8:02 am UTC
Dear Tom,
Can you please give me some practical examples of RMAN.
I am a beginner with RMAN and I would like to test how RMAN works and how it is useful for recovering things.
I have Oracle 9iEE on which I am doing these practice runs; without practical examples my testing is not effective.
So I need some practical scenarios and their workarounds (which I will try to figure out first myself).
e.g. what I did: I removed one data file from my server, tried to recover it with RMAN, and succeeded.
So I need scenarios of that sort -- what to try next, like removing one data file in the above example.
thanks & regards
February 24, 2005 - 8:29 am UTC
umm, just work through various recovery scenarios.
R. Freeman has a good rman book (Oracle Press) if you are interested in more examples and uses.
Thanks, Tom, as always!
Robert, February 24, 2005 - 9:29 am UTC
pick a failure
scott, April 16, 2005 - 4:01 am UTC
Hi Nikunj,
Think through the consequences of an environmental failure in your server room. For example, complete power loss. Hopefully, this is an unlikely situation. However, if you can recover from this you can recover from most types of failure.
At least it will get you thinking....
SRDF
NOTNA, April 20, 2005 - 6:50 am UTC
Hi Tom,
Is the argument regarding SRDF still applicable today???
Regards,
NOTNA
April 20, 2005 - 7:07 am UTC
absolutely. If you want to DR a database, all we need to do is transmit the redo -- of a single member -- over the net.
If you do the database via disk mirroring you will be transmitting
o the block (remember a row is a fraction of a block but Oracle writes BLOCKS)
o all of the blocks (an insert into a table with 3 indexes modifies AT LEAST 3 blocks and maybe more)
o all of the undo blocks written to disk
o the multiple members of redo and
o their corresponding archives
o not to mention the update every three seconds to the control files and so on...
database - data guard
other files - disk mirroring.
Committed and Uncommitted Transactions
NOTNA, April 21, 2005 - 2:42 am UTC
Hi Tom,
From your statement:
<quote>
so, you use standby -- same situation but lgwr writes to the online redo log and
sends to the standby. production is blown up. no worries, the 49.9 meg of
committed transaction data is on the standby -- apply it, open the database and
you are back in business.
</quote>
I got a bit confused. Is it only committed transactions that are transferred to the standby??? How about the non-committed ones, will they be transferred across as well???
Regards,
NOTNA
April 21, 2005 - 6:21 am UTC
all redo is transmitted, any redo that was in the buffer (which could be for either committed or not yet committed transactions) is sent.
Failback in 10G - and earlier
naresh, June 14, 2005 - 12:08 pm UTC
Tom,
Below if from 10G article on oracle site:
However, in Data Guard 10g, after the primary server fault is repaired, the primary database may simply be brought up in mounted mode, "flashed back" (using flashback database) to the SCN at which the failover occurred, and then brought back as a standby database in the Data Guard configuration. No reinstantiation is required.
Is something similar possible with a resetlogs in earlier versions? Or does resetlogs completely abandon the current redo log?
Thanks in advance,
Naresh.
June 14, 2005 - 4:07 pm UTC
flashback database is totally new with 10g and is the equivalent of doing a "point in time recovery" (which is what you would do in 9ir2)
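The 10g reinstatement the article describes looks roughly like the following on the repaired former primary. This is a sketch only -- the SCN is a placeholder you would obtain from the standby (the SCN at which the failover occurred):

```sql
-- On the repaired former primary, after the fault is fixed:
STARTUP MOUNT;
FLASHBACK DATABASE TO SCN 1234567;          -- placeholder: failover SCN
ALTER DATABASE CONVERT TO PHYSICAL STANDBY; -- rejoin as a standby
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
-- let managed recovery catch the new standby up from the new primary
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
```

Contrast this with 9iR2, where the old primary would have to be completely re-instantiated from a backup of the new primary.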
The old bottle needs Fresh Wine now
Ashwin Jaiswal, July 13, 2005 - 5:26 am UTC
Dear Tom, the basic inputs provided by you in 2002 were absolutely pertinent at the time. And as usual, you gave extremely in-depth and invaluable detail on the options. But over the last 3 years things have really changed. The competition came in with new products for disaster recovery, besides a lot of changes in Oracle Data Guard too.
Now, in the current scenario, in the face of lots of competing products with additional features from Veritas, EMC, HP etc. vis-a-vis Data Guard (with new features), if you were to respond to the same question, how would you respond now? I think it will help many more people on this subject, since DR itself has become more critical and more of an "in-thing" for many more organisations ever since!
July 13, 2005 - 12:44 pm UTC
this was actually *the first* article in 2000....
the major update here would be:
There are at least 5 options to consider:
o RAC (Real application clusters)
o Streams
o Standby database via data guard
o OS oriented solutions
o Architectural Solutions
still -- RAC for high availability in a "room"
dataguard for failover
are the primary answers from my perspective.
DG Modes
Sandeep, August 11, 2005 - 3:16 am UTC
Hi Tom,
Was reading up DataGuard (DG) modes and had some queries.
1. I have come across two "classifications" for DG.
a) Guaranteed, Instant, Rapid & Delayed.
b) Maximum Protection, Maximum Availability, Maximum Performance.
How do the above two tie in with each other?
2. I'm trying to "visualise" the difference between the "Network Transmission Mode" [NTM] and "Disk Write Operation" [DWO] parameters. The NTM can be set to SYNC or ASYNC and the DWO can be set to AFFIRM or NOAFFIRM. Could you explain a bit more about how it actually makes a difference, e.g. between SYNC-AFFIRM vs SYNC-NOAFFIRM?
Regards,
Sandeep
August 11, 2005 - 9:51 am UTC
1) old names vs new names. (b) are the new names.
2) AFFIRM/NOAFFIRM have to do with I/O to disk on the standby; SYNC/ASYNC with the network transmission.
basically sync+affirm and async+noaffirm are the valid, sensible pairings.
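The two pairings map onto the redo transport attributes of LOG_ARCHIVE_DEST_n. A minimal sketch of the two sensible combinations, assuming a net service name standby_tns for the standby (a hypothetical name):

```sql
-- SYNC + AFFIRM: commit on the primary waits until the redo has been
-- sent over the network AND written to disk on the standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns LGWR SYNC AFFIRM';

-- ASYNC + NOAFFIRM: redo is shipped in the background; commits on the
-- primary wait for neither the network nor the standby disk write.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns LGWR ASYNC NOAFFIRM';
```

SYNC+AFFIRM is what Maximum Protection/Availability require; ASYNC+NOAFFIRM is the Maximum Performance configuration.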
DataGuard and RMAN
Sanji, December 01, 2006 - 5:34 pm UTC
Tom,
The environment is Oracle 9i R2, HP-UX 11i.
Production DB size 570 Gb.
As of today, the database is subjected to hot backups every 2 hours, and then the complete datafile set along with the archives is ported/applied to a database hosted at a different geographical location. The idea is to efficiently set up a Disaster Recovery site.
I'm in the process of configuring RMAN for all production databases.
Is it feasible to implement Data Guard and RMAN for this database, and if yes, how would it work? I am kind of confused as to how RMAN would be a value addition in this scenario.
I do understand that a standby database for Data Guard can be created through RMAN, but I wanted your input as to how exactly this would work. I did go through the documentation, but am still not very clear regarding the RMAN-Data Guard setup.
If this is not feasible, could you suggest an alternative.
Thanks
Sanji
December 02, 2006 - 12:43 pm UTC
rman would be used to set up the standby database, then dataguard would keep it up to date.
What you are doing now is - well - not even close to a good practice.
rman used to copy initial stuff over
dataguard takes it from there.
rman is not mandatory, just makes it easier.
and data guard is the right approach to a disaster recovery site.
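The RMAN route Tom mentions is the DUPLICATE ... FOR STANDBY path. A minimal sketch, assuming net service names primary_tns and standby_tns and a standby instance already started NOMOUNT with a prepared parameter file (all of these names and preparations are assumptions):

```text
$ rman target sys@primary_tns auxiliary sys@standby_tns

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY DORECOVER;
```

DORECOVER applies available archived redo during the duplication; after that, Data Guard redo transport and apply keep the standby current -- no more shipping full datafile sets every 2 hours.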
Standby DB Backup with RMAN
Sanji, January 17, 2007 - 12:02 pm UTC
Tom,
Considering a case where the Physical Standby Database is created "without" using RMAN, through DataGuard.
Question..
Would the standby database have a different DBID than the production database if it's not created through RMAN?
Because if the DBID is the same, the standby can then be configured to use RMAN for backups and, if required, those backups can be restored to the primary database if the primary has a failure of some kind.
Thanks
Sanji
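For what it's worth: a physical standby is a block-for-block copy of the primary, so it carries the primary's DBID regardless of whether RMAN or a manual file copy was used to instantiate it. You can verify this yourself:

```sql
-- Run on both the primary and the standby; the DBID should match.
SELECT dbid, name, database_role FROM v$database;
```

That shared DBID is exactly what makes "back up on the standby, restore to the primary" possible.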
Disaster Recovery Plan
naz, February 22, 2007 - 10:47 am UTC
I am planning this for my DR site. Please correct/comment.
I am running a database in ARCHIVELOG mode on the production site.
At the D.R. site I installed the same version of Oracle 10g Release 2 -- only the software, no database.
I take a full closed physical backup of production and copy those files (control, online redo, datafiles) to my D.R. server. Is it necessary that the files be in exactly the same path on D.R. as on production?
From now on I just copy the archived logs from production to the D.R. site and apply them to the D.R. database; note that I will only apply the archived logs, I will not open the database at D.R.
I will not open the database at the D.R. site until there is a disaster.
When there is a disaster, I just copy the last archived logs (if still available) to the D.R. site, apply them, and open the database at the D.R. site.
In the case of a disaster, should I also apply the online redo logs if available?
February 22, 2007 - 11:29 am UTC
manual dataguard
naz, February 22, 2007 - 12:21 pm UTC
Yes, I know about Data Guard, but I just want to confirm whether I can do this sort of manual Data Guard.
February 22, 2007 - 7:42 pm UTC
why? it would not be smart. Please don't - use the tools that do the job.
manual dataguard
naz, February 23, 2007 - 5:42 am UTC
Because we don't have any network connectivity between our production and D.R. sites, we will have to carry the archived logs on tape manually to the D.R. site.
That's why I was asking if it is possible.
February 26, 2007 - 10:16 am UTC
if there is no network connectivity, I honestly - 100% - thoroughly - cannot see how you can call this a DR site.
It isn't. This is not even making a tiny bit of sense.
Ok, so production blows up. What do you do with this machine that is not on the network???????????????????????????
manual d.r.
naz, February 26, 2007 - 10:55 am UTC
We will physically bring that machine from the D.R. site to our primary site and connect it to the network.
I understand, but for some reason we cannot afford network connectivity.
We have only prepared our D.R. site for recovery when the primary system blows up, not when the primary network blows up; in that case we will have to suffer downtime.
February 26, 2007 - 3:27 pm UTC
just use backups then - I cannot imagine how driving someplace to turn off a machine, crate it up and so on is part of a professional "disaster recovery" plan.
this is not a good idea that I'm reading here, if you do this, please don't call it "disaster recovery".
yes you can take your backup to your 'offsite piece of hardware' somewhere.
you can restore it
you can take your archived redo logs and apply them to this backup
but please, do not call this DR - that gives DR a bad name ;)
procedure
naz, March 03, 2007 - 11:58 am UTC
Ok, I am following this procedure, please correct:
1- Take a closed backup of the .ctl, online redo log, and .dbf files
2- Restore all the above files to the D.R. machine.
3- STARTUP MOUNT at the D.R. machine
From now on I will do the following every morning:
a) copy all archived log files from primary to D.R.
b) RECOVER DATABASE USING BACKUP CONTROLFILE; -- at D.R.
now it will prompt for files, which I will accept until the last archived log file is applied and it returns to the SQL prompt. I will leave it at the prompt as it is.
c) repeat steps a and b until a disaster occurs at the primary site.
d) In case of disaster, I will copy the remaining archived log files to the D.R. site.
e) also copy the online redo log files to the D.R. site.
f) RECOVER DATABASE USING BACKUP CONTROLFILE -- at D.R. this will apply all archive files, including the online redo logs.
g) ALTER DATABASE OPEN RESETLOGS -- at the D.R. site.
I am assuming that by following the above procedure I will not lose any committed transactions.
Please verify the above plan, as I have to implement it in our organization.
Thanks
March 04, 2007 - 6:53 pm UTC
please stop calling this D.R. I'm dead serious about that. This is not DR, this is, well - I'll leave it at that.
test it.
You will lose committed transactions - obviously. In case of disaster your archives don't exist. If they did, it would not be a disaster.
You will lose data. This is not DR.
naz, March 05, 2007 - 5:21 am UTC
Ok, I will not call it D.R. anymore.
Data guard's Maximum Performance mode:-
This mode offers slightly less data protection on the primary database, but higher performance than maximum availability mode. In this mode, as the primary database processes transactions, redo data is asynchronously shipped to the standby database. The commit operation of the primary database does not wait for the standby database to acknowledge receipt of redo data before completing write operations on the primary database. If any standby destination becomes unavailable, processing continues on the primary database and there is little effect on primary database performance.
So even here, we might/will lose committed transactions in the case of a disaster in which all redo (archived and online), control files and datafiles are lost at the primary site.
March 05, 2007 - 1:02 pm UTC
yes, but you won't lose, like, an entire day's worth.
And you have the option to not lose ANY with data guard.
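The "not lose ANY" option is Maximum Protection mode. A sketch of enabling it, assuming a standby destination with SYNC and AFFIRM has already been configured (the mode change itself requires the database to be mounted, not open):

```sql
-- Prerequisite (assumed already done): at least one standby destination
-- configured with SYNC AFFIRM redo transport.
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
ALTER DATABASE OPEN;
```

The trade-off discussed earlier in this thread still applies: in this mode the primary will shut down rather than run unprotected if it cannot ship redo synchronously to at least one standby.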
SRDF (EMC) of Redo Logs and Archived Logs
A reader, March 15, 2007 - 3:25 pm UTC
Hi Tom,
We have a production database of size 100 GB. We take a backup of this database through data mirroring using EMC (SRDF) once every day.
If we mirror both the redo log and archived log files through synchronous SRDF (EMC):
1) If the production database goes down (a disaster happened at the site where the production database resides), can we bring the DB up to the point where it was lost by using the
> previous day's backup data
> redo logs (mirrored data)
> archived logs (mirrored data)
2) Is there any performance impact on the production database because of synchronous data mirroring using SRDF (EMC)?
Regards,
Sushanta
March 16, 2007 - 2:58 pm UTC
1) you should be able to, yes
2) ask EMC? yes, there will be a hit, you have to do two IO's typically over a WAN with all of the distributed commit protocols in place.
Autobackup
Aru, April 26, 2007 - 6:04 pm UTC
Hi Tom,
Suppose I wanted to test disaster recovery as in the document you sent me a bit earlier in the thread; it says:
'To perform a disaster recovery, the minimum required set of backups is backups of some datafiles, some archived redo logs generated after the time of the backup, and at least one autobackup of the control file'.
What if someone has no autobackup but has RMAN backups of control files which are all validated?
Please help,
Regards,
ARU.
April 27, 2007 - 10:26 am UTC
get correct and proper backups in place, this is why you are testing.
Autobackup
Aru, April 29, 2007 - 5:50 pm UTC
Thanks Tom,
So is having an autobackup of the controlfile mandatory for a disaster recovery scenario when not using a catalog during recovery?
Regards,
Aru
April 29, 2007 - 7:37 pm UTC
Having proper, thought-out backups that just work out of the box is mandatory for disaster recovery situations - period.
Unless you want to be pulling your hair out, working with support, having people stand behind you asking "how much longer till you are done?"
Recovery is the only thing you cannot get wrong in your job - in order to not muck it up, you do the backups correctly.
Rman RAC backup
Marc, August 07, 2007 - 1:11 am UTC
Tom,
I'm quite new to the RAC environment. We have built a test system that contains seven cluster databases. We are in the process of designing our backup strategy and I'm somewhat confused as to where to connect to perform these backups! Do I connect to each cluster database to perform the backup, or am I missing something?
August 07, 2007 - 11:40 am UTC
you just need to connect to one of them to back up a database (there is only one database to be backed up; all instances have full access to it)
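In practice that means pointing RMAN at any one instance of the cluster database; a minimal sketch, assuming a TNS alias rac1 for one instance (a hypothetical name) and archived logs on storage visible to that node:

```text
$ rman target sys@rac1

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
```

The backup covers the whole database -- there is no need to repeat it per instance, only per database.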
A reader, August 07, 2007 - 6:05 pm UTC
Thanks Tom..
standby database
A reader, January 26, 2010 - 2:11 am UTC
Hi Tom,
We have a 10g database on a unix server. We may want to purchase another Windows 2003 server as a standby server. How can I do that?
January 29, 2010 - 2:22 pm UTC
you cannot. You can replicate to another platform, but you cannot use it for disaster recovery.
And if you sat down and thought it through, you would understand why it doesn't even begin to make sense to have a heterogeneous disaster recovery set of platforms. When the stuff hits the fan - the last thing you want to consider doing is changing a lot of stuff - like, oh, say, your operating system.
Get a smaller version of your unix box and use that.
A reader, February 01, 2010 - 6:05 am UTC
Can I do that if I have 11g?
February 01, 2010 - 10:36 am UTC
answer stands.
A reader, November 22, 2010 - 3:02 am UTC
how can i replicate whole database
November 23, 2010 - 12:48 pm UTC
insufficient set of inputs here to answer you.
You give no reasoning - no context.
I could just say "restore it from backup, done" - but that probably isn't the right answer for you. Otherwise you would have done that.
I don't know if you mean to have update anywhere replication (i'll talk you out of that)
I don't know if you mean for disaster recovery
Or for a reporting system
Or on the same type of machine
Or whatever - There are MANY ways to do "replicate whole database" but only a few will be right for you - but we won't know what they are until you tell us a bit about yourself.
Alexander, November 23, 2010 - 1:19 pm UTC
That has to be in my top 5 favorite questions of all time.
Testing a DR plan
A reader, April 09, 2012 - 3:30 am UTC
Hi Tom,
Is it normal practice to simulate unplanned outage of primary database as a part of DR testing plan? If so, is there any recommended best practice?
Database 11.2.0.2 with physical standby database.
Tested switch over.
Flashback database is on.
Thanks.
April 09, 2012 - 11:42 am UTC
It is normal to switchover to your disaster recovery site multiple times every single year - year in, year out.
for these reasons:
1) you use it to avoid planned downtime - for OS patches, hardware fixes, etc.
2) you do it to make sure you have everything you need over there (you don't want to discover when you fail over you are missing something do you...)
3) you do it to make sure you know HOW to do it, that you CAN in fact do it.
You could make it "unplanned" by just springing it on the DBA team, but I don't think that is necessary. Let them prepare for it, study for what they need to do, practice it in test.
Otherwise you might actually have a total unplanned outage :)
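With the Data Guard broker configured, the planned switchover exercise Tom describes is a short scripted operation. A sketch, where primary_db and standby_db are the DB_UNIQUE_NAMEs of the two databases (hypothetical names):

```text
DGMGRL> CONNECT sys@primary_db
DGMGRL> SWITCHOVER TO 'standby_db';
-- ...patch/maintain the old primary while it runs as a standby...
DGMGRL> SWITCHOVER TO 'primary_db';   -- switch back when done
```

Running this a few times a year, as suggested above, doubles as both planned-maintenance avoidance and a live test of the DR procedure.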
Thanks
A reader, April 10, 2012 - 12:43 am UTC