It's Awesome
A reader, July 12, 2001 - 5:44 pm UTC
Pichaimani, July 15, 2001 - 11:17 pm UTC
Thanks for explaining and rolling out all the options.
A reader, May 02, 2002 - 5:30 pm UTC
Hello Tom,
I was reading up on the Data Guard (standby) feature of 9i. Using Data Guard, the standby database can be configured to have no lag (there are other options too) with the primary by applying the ONLINE REDO to the standby database. But it has some very big downsides.
1. The synchronous transmission and affirmation of the redo apply will surely have a big performance impact on the primary database.
2. If the redo transmission is suspended due to any (network or other) problem, Oracle will shut down the primary database! (just like when there is no space for redo writes). That is outrageous!! What is high availability then?
I wonder if the standby database is that good an option for high availability. If you go for 100% sync with the primary database, you have to take the above risks. On the other hand, if you adopt the 8i standby architecture, you will always lag behind. Oracle's RAC, Replication, or OS-level mirroring all look like much better options to me. Could you comment on that?
Also, does Oracle support anything like below:
The primary database files are on a triple-mirrored RAID array. The array can be mounted on another box easily. That box has a similarly configured instance (with the same name) running -- but no database is mounted on it. When the primary becomes unavailable, mount the array on the second box and mount the database. The database should do instance recovery upon opening. Will it not have 100% of transactions available?
Is this possible? Or am I day dreaming? Waiting for your comments.
Thanks
May 03, 2002 - 10:15 am UTC
1) you get what you pay for -- this much is true, yes. If you CANNOT stand to lose a single committed transaction ever (say you are a BANK), then you will
a) design for this
b) size for this
c) pay for this (in network bandwidth and low latency)
Also bear in mind that dataguard supports 4 levels:
GUARANTEED:
o lgwr does a synchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY STOPS
INSTANT:
o lgwr does a synchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
RAPID
o lgwr does an asynchronous write over the network to the standby redo
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
DELAYED
o arch does the work, transmitting archive redo logs as the 8i (and
before) standby did
o if the standby fails (network/whatever) PRIMARY CONTINUES and data
guard falls back to the "send the archive redo logs later" mode
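The four levels above map roughly onto LOG_ARCHIVE_DEST_n attributes -- a sketch only, using a hypothetical standby service name (stby) and 9i-era syntax:

```sql
-- Hypothetical standby service "stby"; 9i-era attribute syntax.
-- GUARANTEED/INSTANT style: lgwr ships synchronously, waits for the
-- standby's disk write (AFFIRM) before the commit completes:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR SYNC AFFIRM';

-- RAPID style: lgwr ships asynchronously, no wait on the primary:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR ASYNC NOAFFIRM';

-- DELAYED style: arch ships completed archive logs, as 8i did:
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby ARCH';
```

Whether the primary stops when the standby is unreachable (the GUARANTEED vs INSTANT distinction) is controlled separately by the protection mode you set on the database, not by the destination string alone.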
So -- you tell us -- what level do you want? what level do you need? Me, I see GUARANTEED for people who cannot afford to lose a BYTE of data EVER (banks, financial institutions pop into my head -- my HR database, no, my 401k -- YES)
I don't see instant as being commonly used.
RAPID for most all other implementations
DELAYED for people who can afford to lose relatively large numbers of transactions when the building housing the data center blows up, burns down, whatever.
2) Outrageous? Not if you are a financial institution whose design spec states "a transaction is not to be considered committed until the transaction is safely stored in at least two geographically dispersed locations" (and most of them do).
Guaranteed mode is like a really really really fast two phase commit. In the past, people used a synchronous replication technology (both sites MUST be available). Guaranteed mode is a no data loss mode; it is NOT necessary for HA -- it is for continuity of operations. Once the primary or standby is taken out (remember 9/11), the other shuts down UNTIL you start up a standby for it. This is used by people who cannot afford to ever lose a transaction (what do you mean that million dollars you transferred disappeared??)
You need to look at RAPID protection I think. Actually, you need to look at all four, understand when and where you would use them and then apply the appropriate technology in the right cases!
Here is how one would design an HA system that can afford to lose a very small amount of transactional data in the event of a truly catastrophic failure:
a) RAC in a room, because you NEVER want to failover unless the room goes away.
b) data guard in rapid mode, because you don't want to lag behind too much and you want to impact the system as little as possible. You will work to fix standby issues ASAP when they occur.
If you cannot afford to lose a single transaction -- change rapid to guaranteed and pay for the redundant networks to ensure they are not the issue with the standby. You will design for and pay for this extra level of HA'ness.
Yes, you can do that last paragraph and in fact there are many OS features that facilitate it. It is much like "RAC in a ROOM" and doesn't give you any data guard like features.
A reader, May 03, 2002 - 11:23 am UTC
You can also use storage replication like EMC's SRDF. It replicates at the block level. You can configure how far out of phase the two sites may be, or keep them fully in sync. But you will pay big time for it -- it's not cheap.
http://www.emc.com/products/networking/srdf.jsp
May 03, 2002 - 11:51 am UTC
And IT IS NOT the best way to do this either.
Consider how a block is changed. We will:
o generate a change to the block
o generate a change to the undo block
o generate two copies of redo at least
o and probably two archives
you have 6 IO's eventually in all probability. These will all be transmitted across the network.
With data guard, you'll send 1 redo log change, period. That is a significantly reduced amount of network traffic -- significant.
Additionally, with SRDF, that other copy is useless. You cannot really use it for anything EXCEPT in the event of failure. With data guard you can use it to offload backups or as a reporting instance (even in guaranteed mode, you can use the standby as a reporting instance)
A reader, May 03, 2002 - 11:30 am UTC
Exactly. I agree one has to pay for it. But I am not convinced the standby database (Data Guard) has any advantage over an SRDF kind of replication. That is what I wanted to find out. If we can mount the replicated filesystem on another instance and open the database with 100% of transactions, that looks like a better option to me than data guard with 2 phase commit, unless I am missing some other feature of the standby database.
Thanks anyway Tom for your time.
May 03, 2002 - 1:46 pm UTC
Data guard is INSTEAD OF 2PC. It is not 2PC.
See the comment right above as to why data guard would be better than SRDF (less costly, more functionality)
SRDF has its place, just not in mirroring datafiles.
A reader, May 03, 2002 - 12:27 pm UTC
There you go! Thanks Tom for pointing out the advantages. That's exactly what I wanted to know.
You may be incurring more network traffic with SRDF. But with a standby you also incur time for applying the redo in the standby instance, then confirming that back to the primary. And your transactions are waiting on the primary all this time. Also, if the redo transmission is interrupted you have a business disruption!
You are correct. Oracle has given a number of options and you pick the most suitable for you. Thank you very much for helping us to make that choice.
May 03, 2002 - 1:54 pm UTC
The standby DOES NOT APPLY THE REDO.
A data guard standby receives the redo, writes it to disk and returns. It does NOT apply it.
The redo on the standby is NOT APPLIED until after the redo log file fills up.
You only have a business disruption IF YOU CANNOT MIRROR THE REDO (not apply) AND IF YOU CONFIGURE IT TO STOP (as I would hope SRDF would give you).
So, in short, LGWR transmits the redo, redo is saved on standby and we return (guaranteed and instant modes). You DO NOT APPLY IT at remote.
LGWR transmits the redo (rapid mode) and you do not wait at all. The standby receives it in the background and writes it to standby redo. You do NOT APPLY IT at remote.
The redo is not applied until a full redo log has been sent over.
Good Overview of the options
A reader, August 02, 2002 - 1:18 pm UTC
Reader
A reader, November 03, 2002 - 9:29 pm UTC
"
(even in guaranteed mode, you can use the
standby as a reporting instance)
"
Will the recovery of the standby take place in the background even while the standby is being used for reporting purposes?
"
The redo on the standby is NOT APPLIED until after the redo log file fills up.
"
What happens if someone did "alter system switch logfile" at the PRIMARY database?
November 03, 2002 - 9:37 pm UTC
same thing as would happen when the logfiles switch by themselves.
In guaranteed (instant and rapid) mode -- the redo is transmitted by lgwr in real time. The redo was already at the standby site when you switched the logs. The redo is accumulated there. If you are using a physical standby the database is either
o in recovery mode, applying the redo
o opened for read only access - redo is just getting queued up, not applied but safely tucked away on the standby.
In a logical standby (new with 9iR2) the redo can be applied even while the database is being used for reporting (or it can be applied in a DELAYED mode -- so that the standby runs an hour or two or whatever behind production -- to protect from that "accidentally" dropped table)
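That delayed apply can be sketched as follows (the service name stby is hypothetical; DELAY is expressed in minutes):

```sql
-- Hypothetical destination: redo is shipped immediately but held on
-- the standby for 120 minutes before being applied, so an
-- "accidentally" dropped table does not propagate right away.
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR ASYNC DELAY=120';
```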
SEVERAL SMALL vs ONE BIG
Abu Ahmadh, February 10, 2003 - 9:50 pm UTC
One of our tables will grow up to 2TB in size. We wanted to split it into several smaller databases, as you mentioned in Architectural Solutions, for better manageability and performance -- with partitioning, of course. But the application vendor does not certify running on multiple small databases.
Now, our solution turned to be RAC. That means, we will have one single BIG database with multiple instances on different nodes. My questions are:
(a) What will be the impact on performance with our proposed approach?
(b) If batch jobs are running against the big table, how does it affect on-line performance?
(c) How do we size for sorting? Will RAC utilize temporary tablespaces of all running instances for a single process?
(d) Does the vendor need to certify if we deploy RAC when Oracle claims that Requirement to code change is NIL?
(e) Can we manually control RAC's distributed processing so that, say, batch jobs will always run against a single instance?
There could be some stupid questions, but I need your expert advice to all my queries.
Thanks Tom
February 11, 2003 - 8:12 am UTC
a) it'll be positive in general. At the time of writing this particular article (jan 2000), RAC did not exist. It is not OPS, it is quite different.
b) like any other thing would. increased contention for IO resources -- but with RAC you can split out the workload by instance so the cpu's don't contend. But in general -- just like any other system with mixed workload
c) pretty specific here -- given you are at the information gathering level. suggest you give the RAC concepts guide a read through
d) the VENDOR must be willing to support your configuration - yes, 100%, absolutely. If they do not, what happens when you call their support line?
e) yes you can.
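One common client-side way to do (e) -- a sketch, with made-up host and SID names -- is to give the batch jobs a TNS entry that points at exactly one instance instead of a load-balanced address list:

```
# hypothetical tnsnames.ora entry: batch connects only to node 1
BATCH =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(Host = racnode1)(Port = 1521))
    (CONNECT_DATA = (SID = prod1))
  )
```

Online users would keep a separate entry listing all of the nodes, so the batch workload stays isolated from them.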
VLDB impact to startup & shutdown
A reader, February 12, 2003 - 11:19 pm UTC
Will there be any impact on database startup/shutdown due to multiple instances? Let us say there are 10 nodes in the cluster. What will be the impact in terms of time?
February 13, 2003 - 9:22 am UTC
each instance is independent of the others, there really isn't an impact.
First-Class article about choosing the best Disaster-Recovery Plan
Sami, April 28, 2003 - 12:19 am UTC
John, June 04, 2003 - 9:15 pm UTC
Hi Tom,
We have a 9.0.1 enterprise edition database on a Windows 2000 server. We may want to purchase another Windows 2000 server as a standby server. Probably we will implement Data Guard on them. However I am not sure, when the master goes down, how can we let the clients know the standby server has taken over? Is it possible to let the standby server use the master's IP address through a failover procedure? Or you may have other better options. Our database client is an application server using the Java thin JDBC driver.
Thanks in advance.
June 05, 2003 - 7:35 am UTC
normally this is done via TNS.
you can specify a list of hosts to try in order when connecting in tnsnames.ora files.
you can use ldap or OracleNames as a nameserver to change the address centrally.
you can use IP failover as well -- but it wouldn't generally be necessary.
If you are big enough to run a failover site, you are probably big enough to use a centralized nameserver instead of little tnsnames.ora files all over the place. You would control it centrally and a simple update would repoint the tns entry.
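A sketch of the tnsnames.ora approach (host and service names here are made up): an entry that tries the primary first and falls back to the standby if the primary cannot be reached:

```
# hypothetical entry: try primary-host first, then standby-host
PROD =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(Host = primary-host)(Port = 1521))
      (ADDRESS = (PROTOCOL = TCP)(Host = standby-host)(Port = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = prod))
  )
```

As noted above, the fallback connection attempt takes longer because the client must first time out against the dead primary.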
lgwr and arch behavior
Martin Guillen, September 02, 2003 - 12:03 pm UTC
Hi Tom:
Let me know if I understand this:
If I'm using lgwr to send redo to the standby, it is NOT applied until a log switch occurs on the primary, right?
I was thinking that the changes were applied immediately, but it seems I was wrong (seeing your response).
So what's the advantage of using lgwr instead of arch?
Also I want to know what the purpose of standby redo logs is.
Thank you,
Martin.
September 02, 2003 - 3:00 pm UTC
the standby redo logs get filled AND THEN applied at the standby.
The advantage of using lgwr? ok, you are on production. production has 50meg log files. You have just filled up 49.9meg of log. production is blown up.
that 49.9 meg of log is probably gone, forever. will never become an archive file, will never get over to standby.
so, you use standby -- same situation but lgwr writes to the online redo log and sends to the standby. production is blown up. no worries, the 49.9 meg of committed transaction data is on the standby -- apply it, open the database and you are back in business.
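As for the standby redo logs -- they are what receives that lgwr-shipped redo on the standby side. A sketch of creating one there (the path, group number and size are made up; they should match the primary's online redo log size):

```sql
-- Hypothetical group/path/size; standby redo logs should be the same
-- size as the primary's online redo logs, with at least one extra group.
ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
  ('/u01/oradata/stby/srl04.log') SIZE 50M;
```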
A reader, September 16, 2003 - 10:50 am UTC
Hi Tom,
I was reading the below paragraph in one of the followups and you answered that it is possible. Can I get some additional info on this please?
<QUOTE>
The primary database files are on a triple-mirrored RAID array. The array can be mounted on another box easily. That box has a similarly configured instance (with the same name) running -- but no database is mounted on it. When the primary becomes unavailable, mount the array on the second box and mount the database. The database should do instance recovery upon opening. Will it not have
100% of transactions available?
<END QUOTE>
Is it possible to do a recovery like that if the tablespaces were not in backup mode? Isn't it just like a hot backup where you can have split blocks in the backup copy and you need full block copies in the logfile to fix them?
September 16, 2003 - 12:49 pm UTC
the disks would just appear to have a "crashed oracle database" on them. It is no different than if you DIDN'T mount them on another machine, but just "started up oracle" again.
All they did was move the disks from computer 1 to computer 2. It is not a media recovery situation, just an instance recovery.
A reader, September 16, 2003 - 5:19 pm UTC
Hi Tom,
Please see
http://asktom.oracle.com/pls/ask/f?p=4950:8:4496575172726829804
There you say that splitting the mirrors requires putting the tablespaces in backup mode.
Here is my thinking:
When you suspend the I/O, the database queues up info in the buffer and does not write to the files. Now you split the mirrors. The datafiles are fuzzy, and for recovering you need full images of the blocks in your redo stream. Can you please explain how this is different from the above referenced thread.
thanks.
September 16, 2003 - 6:33 pm UTC
it is different because that thing (split mirrors + backup mode) was to BACKUP the database.
this, this is just failing the hardware over to another computer. it is not about backing up at all. it is about failing over.
A reader, September 16, 2003 - 5:23 pm UTC
Test Disaster recovery using RMAN ....
READER, September 18, 2003 - 8:46 pm UTC
when I backed up my database using RMAN, it did back up the read-only datafile along with the others. However, when I practiced disaster recovery after removing all of the files, RMAN did not restore the read-only datafile. Why? THANKS.
RMAN> backup database; <------- IT DID BACKUP READONLY DATAFILE
Starting backup at 18-SEP-03
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=12 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/home2/user18/ORADATA/u01/system01.dbf
input datafile fno=00005 name=/home2/user18/ORADATA/u02/sample01.dbf
input datafile fno=00002 name=/home2/user18/ORADATA/u02/undotbs01.dbf
input datafile fno=00003 name=/home2/user18/ORADATA/u03/users01.dbf
input datafile fno=00004 name=/home2/user18/ORADATA/u03/indx01.dbf
input datafile fno=00006 name=/home2/user18/ORADATA/u01/querydata01.dbf <--- READONLY
channel ORA_DISK_1: starting piece 1 at 18-SEP-03
channel ORA_DISK_1: finished piece 1 at 18-SEP-03
piece handle=/home2/user18/oracle/dbs/01f1kuho_1_1 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:26
Finished backup at 18-SEP-03
Starting Control File and SPFILE Autobackup at 18-SEP-03
piece handle=/home2/user18/oracle/dbs/c-411229649-20030918-00 comment=NONE
Finished Control File and SPFILE Autobackup at 18-SEP-03
PLEASE SEE BELOW THE STEPS THAT I FOLLOWED TO PERFORM DISASTER RECOVERY.
RMAN> connect target;
connected to target database (not started)
RMAN> set dbid 411229649
executing command: SET DBID
RMAN> STARTUP FORCE NOMOUNT;
startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/home2/user18/oracle/dbs/initU18.ora'
trying to start the Oracle instance without parameter files ...
Oracle instance started
Total System Global Area 161187360 bytes
Fixed Size 730656 bytes
Variable Size 109051904 bytes
Database Buffers 50331648 bytes
Redo Buffers 1073152 bytes
RMAN> RUN
2> {
3> ALLOCATE CHANNEL C1 DEVICE TYPE DISK;
4> RESTORE SPFILE FROM AUTOBACKUP;
5> }
using target database controlfile instead of recovery catalog
allocated channel: C1
channel C1: sid=9 devtype=DISK
Starting restore at 18-SEP-03
channel C1: looking for autobackup on day: 20030918
channel C1: autobackup found: c-411229649-20030918-01
channel C1: SPFILE restore from autobackup complete
Finished restore at 18-SEP-03
released channel: C1
RMAN> STARTUP FORCE NOMOUNT;
Oracle instance started
Total System Global Area 26968864 bytes
Fixed Size 729888 bytes
Variable Size 20971520 bytes
Database Buffers 4194304 bytes
Redo Buffers 1073152 bytes
RMAN>
RMAN> RUN
2> {
3> ALLOCATE CHANNEL C2 DEVICE TYPE DISK;
4> RESTORE CONTROLFILE FROM AUTOBACKUP;
5> ALTER DATABASE MOUNT;
6> RESTORE DATABASE;
7> RECOVER DATABASE;
8> }
allocated channel: C2
channel C2: sid=11 devtype=DISK
Starting restore at 18-SEP-03
channel C2: looking for autobackup on day: 20030918
channel C2: autobackup found: c-411229649-20030918-01
channel C2: controlfile restore from autobackup complete
replicating controlfile
input filename=/home2/user18/ORADATA/u01/ctrl01.ctl
Finished restore at 18-SEP-03
database mounted
Starting restore at 18-SEP-03
datafile 6 not processed because file is read-only
channel C2: starting datafile backupset restore
channel C2: specifying datafile(s) to restore from backup set
restoring datafile 00001 to /home2/user18/ORADATA/u01/system01.dbf
restoring datafile 00002 to /home2/user18/ORADATA/u02/undotbs01.dbf
restoring datafile 00003 to /home2/user18/ORADATA/u03/users01.dbf
restoring datafile 00004 to /home2/user18/ORADATA/u03/indx01.dbf
restoring datafile 00005 to /home2/user18/ORADATA/u02/sample01.dbf <--- NOTE, IT DID NOT RESTORE READONLY DATAFILE
channel C2: restored backup piece 1
piece handle=/home2/user18/oracle/dbs/01f1kuho_1_1 tag=TAG20030918T120824 paramL
channel C2: restore complete
Finished restore at 18-SEP-03
Starting recover at 18-SEP-03
datafile 6 not processed because file is read-only
starting media recovery
archive log thread 1 sequence 3 is already on disk as file /home2/user18/ORADATc
archive log filename=/home2/user18/ORADATA/ARCHIVE1/arch_3.arc thread=1 sequenc3
unable to find archive log
archive log thread=1 sequence=4
Oracle Error:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01157: cannot identify/lock data file 6 - see DBWR trace file
ORA-01110: data file 6: '/home2/user18/ORADATA/u01/querydata01.dbf'
September 20, 2003 - 4:47 pm UTC
Failover
Douglas, November 27, 2003 - 8:19 am UTC
Tom, the docs are rather reticent on this... Maybe you can give me some practical
advice. It's quite simple, I think. Though, I can't find much info on it.
I have a typical data guard configuration with two machines/instances.
After carrying out the procedures to failover from machine A to B, how should I
"re-point" the clients to machine B ?
Either:
1) Use the FAILOVER clause in the address descriptor in tnsnames.ora
or
2) Swap A's IP address (and/or hostname) with B's. That is -- fully change the
machine roles.
?
November 27, 2003 - 10:39 am UTC
what advice are you looking for?
the docs are saying you can either:
a) configure the tnsnames entry so that it tries production and if it cannot connect to production, it connects to standby (this will take longer)
b) change the IP addresses so the ip address of the failed prod machine becomes the standby IP address and vice versa
Other options are:
c) use Names and update the Names server (LDAP in current releases of Oracle, use LDAP to resolve service names)
d) run a listener on a 3rd machine and have the servers register with this listener as a "service". clients connect to listener, listener forwards to a database that serves that service.
Disaster Recovery Plan
Mehmood -Ur- Rehman, May 05, 2004 - 3:57 am UTC
The discussion is really very helpful. It gives many ideas for designing the DRP.
Tom you are a great resource for the Oracle DBA community.
Architecture advice
Prakash, May 18, 2004 - 11:03 am UTC
Thanks for laying out the cards on the table. This really helps the thought process.
I have System "A" on 8.1.7.4 on solaris boxes. We have a new system "B" coming up on 8.1.7.4 on a new solaris box. We want high availability for "B". There is a set of common admin tables to be shared by the two systems.
Clustering is not an option for us at this point because of budget constraints. So here is what we propose to do:
Create a standby database for "B" - preferably on a separate box
Do two-way multimaster replication on the admin tables between "A" and "B"
If this is good path then in the event there is a failure on the Primary"B" and Standby"B" is brought up, does replication continue seamlessly? Do we have to define replication groups both on the primary"B" and standby"B"?
Thanks
May 18, 2004 - 5:09 pm UTC
in 8174, the standby will always be behind unless you use some sort of disk-based mirroring for the online redo logs, since data guard did not have a no-data-loss mode in 8174.
So no, you cannot fail over and have the replication kick in as the standby may not have everything the production did. they are not at the same point in time.
High Availability comes at a price. You need to be willing to pay for that sort of availability. If the availability overall is "too expensive" then the cost of being down must be pretty cheap? I mean, at the end of the day -- you have to balance the two things out.
I would be looking at a more current release of the software -- 10g or at least 9iR2 -- for a no-loss data guard (or a lot less lossy data guard) for this new system. I would look at having the admin tables "owned" by only one instance and doing one-way replication -- if I did *any* at all (not a big fan of replication)
guarding bandwidth
Gabriel, September 22, 2004 - 10:23 am UTC
Hello Tom,
We have two 100Mbit cards on the data guard server, listening on two different ports. Our data guard second destination service is configured on one of these ports. How can we have a service configured for both cards, or use two services for the same destination? We are trying to use the full 200Mbit of bandwidth that we have at our disposal.
Thank you,
September 22, 2004 - 10:34 am UTC
why, are you generating more than 100 megabits/second of redo? (about 12meg/second...)
if so, you have a pretty fast system, gigabit ethernet would be what you were looking for (my laptop has that today!)
(this would really be a "ask the networking dudes" sort of question -- we use connected tcp/ip sockets to transfer data -- can you multi-plex those two cards into a single virtual card sort of thing....)
would this work?
Gabriel, September 23, 2004 - 2:45 pm UTC
Hello Tom,
I found this somewhere else. Do you think it will work?
Step 1:
On standby where you have 2 interface cards
Start the listener on card 1 IP.
Step 2:
On standby where you have 2 interface cards
Start the listener on card 2 IP.
Step 3: on Primary
tnsnames could be used with load_balance=on
example
<SERVICE NAME> =
  (DESCRIPTION =
    (LOAD_BALANCE = ON)
    (ADDRESS = (PROTOCOL = TCP)(Host = systemc-nb)(Port = 1524)) --- nb backup network
    (ADDRESS = (PROTOCOL = TCP)(Host = systemc)(Port = 1524)) --- Primary network
    (CONNECT_DATA = (SID = der))
  )
Step 4: In the Primary database
create a log_archive_dest with the service set to the <SERVICE NAME> from tnsnames. This way the archive logs will be shipped alternately through the listeners.
September 24, 2004 - 9:27 am UTC
so? think about it, what have you achieved.....
ship log 1 on pipe 1
and then, later
ship log 2 on pipe 2
and then, later
ship log 3 on pipe 1
and then, later................
is that what you wanted.....
(are you solving a problem you in fact actually have is my question still...)
I really need your help- I have to use hot failover for standby db
Mariana, November 09, 2004 - 10:02 am UTC
Hi Tom,
well my problem is the following:
I have to create a script or a job that automatically reconnects the users connected to the primary database to the standby database, and the users shouldn't have to know that the primary database is down. Please help me!!!!
Death of the primary database
Louie Warren, November 24, 2004 - 3:45 pm UTC
I have successfully set up Dataguard via OEM. I now want to simulate the death (power, connectivity) of the primary machine and make the standby take over. Probably a bad idea, but I deleted the configuration on the primary and am left with a standby that I can't make primary... at least I can't figure it out wading through the manuals... any help?
reader
A reader, January 04, 2005 - 12:21 pm UTC
Could you tell us what the difference is, in terms of recovering a database, between utilizing
1. the snapshot controlfile
vs.
2. the autobackup controlfile
Could we not use the snapshot controlfile just as well, by copying it to the controlfile location and performing the restore using that controlfile instead of the catalog?
January 05, 2005 - 8:49 am UTC
but the snapshot controlfile is a temporary file used by RMAN and is not registered in the catalog, so -- I am not sure what its relevance is.
reader
A reader, January 05, 2005 - 10:29 pm UTC
I have read that the snapshot controlfile is created by the same mechanism as "alter database backup controlfile to 'filename.ctl'". So we do have a controlfile that can be used in place of the lost controlfile: mount the database, start up RMAN, and restore the backup sets. Is this a possible DR scenario?
January 06, 2005 - 10:27 am UTC
if you have a snapshot controlfile, you might well be able to use it in the event "all else has failed"
but -- it is not the proper method. You BACK UP stuff; it would be best to do it right.
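Doing it right can be sketched as follows (the backup path is made up):

```sql
-- In RMAN, have the controlfile backed up automatically with
-- every backup:
--   CONFIGURE CONTROLFILE AUTOBACKUP ON;

-- Or in SQL*Plus, take an explicit controlfile backup to a
-- hypothetical known path:
ALTER DATABASE BACKUP CONTROLFILE TO '/backup/control_prod.bkp' REUSE;
```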
Minimum requirements to prevent a disaster (data loss)!
Robert, February 23, 2005 - 3:24 pm UTC
Tom,
(8.1.7 and 9.2)
Given that we currently have all our database/hardware components locally, I am putting together minimum requirements for avoiding loss of data.
Could you please comment...
Parameter File - Backup copy along with the Database Backups.
Control File - Multiplexed on at least 2 (or 3) different physical devices.
Online Redo Logs - Multiplexed on at least 2 different physical devices
Archived Redo Logs - Multiplexed on at least 2 different physical devices
Data/Undo Files - Mirrored (at least?).
Database Backups - Multiplexed on at least 2 different physical devices and stored offsite (e.g. on tape).
Thanks as always,
Robert.
February 24, 2005 - 5:12 am UTC
technically, the mirroring of data files is not necessary for "avoiding loss of data" but it does increase availability for sure.
how about: OS backup stored with database backups so we can restore the version/patches/etc of the database software we are using....
there are many parameter files (tnsnames.ora, sqlnet.ora, etc) to consider as well.
you seem to be missing a "monthly or weekly test restore to see that everything actually can be reconstituted from what we have -- that the tapes are good, that we know how, that we did not miss any pieces"
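Part of such a test can be sketched in RMAN -- this checks that the backup pieces needed for a restore exist and are readable, without actually restoring anything:

```
RMAN> RESTORE DATABASE VALIDATE;
RMAN> RESTORE CONTROLFILE VALIDATE;
```

Note it only reads the backup pieces; it is not a substitute for periodically restoring to a scratch machine end to end.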
RMAN example
Nikunj Thaker, February 24, 2005 - 8:02 am UTC
Dear Tom,
Can you please give me some practical examples of RMAN.
I am a beginner with RMAN and I would like to test how RMAN works and how it is useful for recovering things.
I have Oracle 9iEE on which I am doing these practice runs; without practical examples my testing is not effective.
So I need some practical scenarios and their workarounds (which I will try to figure out first myself).
e.g. what I did: I removed one data file from my server, tried to recover it with RMAN, and succeeded.
So I need scenarios of that sort -- what to try next, like removing one data file in the above example.
thanks & regards
February 24, 2005 - 8:29 am UTC
umm, just work through various recovery scenarios.
R. Freeman has a good rman book (Oracle Press) if you are interested in more examples and uses.
Thanks, Tom, as always!
Robert, February 24, 2005 - 9:29 am UTC
pick a failure
scott, April 16, 2005 - 4:01 am UTC
Hi Nikunj,
Think through the consequences of an environmental failure in your server room. For example, complete power loss. Hopefully, this is an unlikely situation. However, if you can recover from this you can recover from most types of failure.
At least it will get you thinking....
SRDF
NOTNA, April 20, 2005 - 6:50 am UTC
Hi Tom,
Is the argument regarding SRDF still applicable today???
Regards,
NOTNA
April 20, 2005 - 7:07 am UTC
absolutely. If you want to DR a database, all we need to do is transmit the redo -- of a single member -- over the net.
If you do the database via disk mirroring you will be transmitting
o the block (remember a row is a fraction of a block but Oracle writes BLOCKS)
o all of the blocks (an insert into a table with 3 indexes modifies AT LEAST 3 blocks and maybe more)
o all of the undo blocks written to disk
o the multiple members of redo and
o their corresponding archives
o not to mention the update every three seconds to the control files and so on...
database - data guard
other files - disk mirroring.
Committed and Uncommitted Transactions
NOTNA, April 21, 2005 - 2:42 am UTC
Hi Tom,
From your statement:
<quote>
so, you use standby -- same situation but lgwr writes to the online redo log and
sends to the standby. production is blown up. no worries, the 49.9 meg of
committed transaction data is on the standby -- apply it, open the database and
you are back in business.
</quote>
I got a bit confused. Is it only committed transactions that are transferred to the standby??? How about the non-committed ones, will they be transferred across as well???
Regards,
NOTNA
April 21, 2005 - 6:21 am UTC
all redo is transmitted, any redo that was in the buffer (which could be for either committed or not yet committed transactions) is sent.
Failback in 10G - and earlier
naresh, June 14, 2005 - 12:08 pm UTC
Tom,
Below if from 10G article on oracle site:
However, in Data Guard 10g, after the primary server fault is repaired, the primary database may simply be brought up in mounted mode, "flashed back" (using flashback database) to the SCN at which the failover occurred, and then brought back as a standby database in the Data Guard configuration. No reinstantiation is required.
Is something similar possible with a resetlogs in earlier versions? Or does resetlogs completely abandon the current redo log?
Thanks in advance,
Naresh.
June 14, 2005 - 4:07 pm UTC
flashback database is totally new with 10g and is the equivalent of doing a "point in time recovery" (which is what you would do in 9ir2)
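The 10g reinstatement the article describes looks roughly like the following on the repaired former primary. This is a sketch only -- the SCN is a placeholder you would obtain from the standby (the SCN at which the failover occurred):

```sql
-- On the repaired former primary, after the fault is fixed:
STARTUP MOUNT;
FLASHBACK DATABASE TO SCN 1234567;          -- placeholder: failover SCN
ALTER DATABASE CONVERT TO PHYSICAL STANDBY; -- rejoin as a standby
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
-- let managed recovery catch the new standby up from the new primary
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
```

Contrast this with 9iR2, where the old primary would have to be completely re-instantiated from a backup of the new primary.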
The old bottle needs Fresh Wine now
Ashwin Jaiswal, July 13, 2005 - 5:26 am UTC
Dear Tom, the basic inputs provided by you in 2002 were absolutely pertinent at the time. And as usual, you gave extremely in-depth and invaluable detail on the options. But over the last 3 years things have really changed. The competition came in with new products for disaster recovery, besides a lot of changes in Oracle Data Guard too.
Now, in the current scenario, in the face of lots of competing products with additional features from Veritas, EMC, HP etc. vis-a-vis Data Guard (with new features), if you were to respond to the same question, how would you respond now? I think it will help many more people on this subject, since DR itself has become more critical and more of an "in-thing" for many more organisations ever since!
July 13, 2005 - 12:44 pm UTC
this was actually *the first* article in 2000....
the major update here would be:
There are at least 5 options to consider:
o RAC (Real application clusters)
o Streams
o Standby database via data guard
o OS oriented solutions
o Architectural Solutions
still -- RAC for high availability in a "room"
dataguard for failover
are the primary answers from my perspective.
DG Modes
Sandeep, August 11, 2005 - 3:16 am UTC
Hi Tom,
Was reading up DataGuard (DG) modes and had some queries.
1. I have come across two "classifications" for DG.
a) Guaranteed, Instant, Rapid & Delayed.
b) Maximum Protection, Maximum Availability, Maximum Performance.
How do the above two tie in with each other?
2. I'm trying to "visualise" the difference between the "Network Transmission Mode" [NTM] and "Disk Write Operation" [DWO] parameters. The NTM can be set to SYNC or ASYNC and the DWO can be set to AFFIRM or NOAFFIRM. Could you explain a bit more about how it actually makes a difference, e.g. between SYNC-AFFIRM vs SYNC-NOAFFIRM?
Regards,
Sandeep
August 11, 2005 - 9:51 am UTC
1) old names vs new names. (b) are the new names.
2) AFFIRM/NOAFFIRM have to do with I/O to disk on the standby; SYNC/ASYNC with the network transmission.
basically sync+affirm and async+noaffirm are the valid, sensible pairings.
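The two pairings map onto the redo transport attributes of LOG_ARCHIVE_DEST_n. A minimal sketch of the two sensible combinations, assuming a net service name standby_tns for the standby (a hypothetical name):

```sql
-- SYNC + AFFIRM: commit on the primary waits until the redo has been
-- sent over the network AND written to disk on the standby.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns LGWR SYNC AFFIRM';

-- ASYNC + NOAFFIRM: redo is shipped in the background; commits on the
-- primary wait for neither the network nor the standby disk write.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=standby_tns LGWR ASYNC NOAFFIRM';
```

SYNC+AFFIRM is what Maximum Protection/Availability require; ASYNC+NOAFFIRM is the Maximum Performance configuration.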
DataGuard and RMAN
Sanji, December 01, 2006 - 5:34 pm UTC
Tom,
The environment is Oracle 9i R2, HP-UX 11i.
Production DB size 570 Gb.
As of today, the database is subjected to hot backups every 2 hours, and then the complete datafile set along with the archives is ported/applied to a database hosted at a different geographical location. The idea is to efficiently set up a Disaster Recovery site.
I'm in the process of configuring RMAN for all production databases.
Is it feasible to implement Data Guard and RMAN for this database, and if yes, how would it work? I am kind of confused as to how RMAN would be a value addition in this scenario.
I do understand that a standby database for Data Guard can be created through RMAN, but I wanted your input as to how exactly this would work. I did go through the documentation, but am still not very clear regarding the RMAN-Data Guard setup.
If this is not feasible, could you suggest an alternative.
Thanks
Sanji
December 02, 2006 - 12:43 pm UTC
rman would be used to set up the standby database, then dataguard would keep it up to date.
What you are doing now is - well - not even close to a good practice.
rman used to copy initial stuff over
dataguard takes it from there.
rman is not mandatory, just makes it easier.
and data guard is the right approach to a disaster recovery site.
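The RMAN route Tom mentions is the DUPLICATE ... FOR STANDBY path. A minimal sketch, assuming net service names primary_tns and standby_tns and a standby instance already started NOMOUNT with a prepared parameter file (all of these names and preparations are assumptions):

```text
$ rman target sys@primary_tns auxiliary sys@standby_tns

RMAN> DUPLICATE TARGET DATABASE FOR STANDBY DORECOVER;
```

DORECOVER applies available archived redo during the duplication; after that, Data Guard redo transport and apply keep the standby current -- no more shipping full datafile sets every 2 hours.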
Standby DB Backup with RMAN
Sanji, January 17, 2007 - 12:02 pm UTC
Tom,
Considering a case where the Physical Standby Database is created "without" using RMAN, through DataGuard.
Question..
Would the standby database have a different DBID than the production database if it's not created through RMAN?
Because if the DBID is the same, the standby can then be configured to use RMAN for backups and, if required, those backups can be restored to the primary database if the primary has a failure of some kind.
Thanks
Sanji
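For what it's worth: a physical standby is a block-for-block copy of the primary, so it carries the primary's DBID regardless of whether RMAN or a manual file copy was used to instantiate it. You can verify this yourself:

```sql
-- Run on both the primary and the standby; the DBID should match.
SELECT dbid, name, database_role FROM v$database;
```

That shared DBID is exactly what makes "back up on the standby, restore to the primary" possible.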
Disaster Recovery Plan
naz, February 22, 2007 - 10:47 am UTC
I am planning this for my DR site. Please correct/comment.
I am running a database in ARCHIVELOG mode on the production site.
At the D.R. site I installed the same version of Oracle 10g Release 2 -- only the software, no database.
I take a full closed physical backup of production and copy those files (control, online redo, datafiles) to my D.R. server. Is it necessary that the files be in exactly the same path on D.R. as on production?
From now on I just copy the archived logs from production to the D.R. site and apply them to the D.R. database; note that I will only apply the archived logs, I will not open the database at D.R.
I will not open the database at the D.R. site until there is a disaster.
When there is a disaster, I just copy the last archived logs (if still available) to the D.R. site, apply them, and open the database at the D.R. site.
In the case of a disaster, should I also apply the online redo logs if available?
February 22, 2007 - 11:29 am UTC
manual dataguard
naz, February 22, 2007 - 12:21 pm UTC
Yes, I know about Data Guard, but I just want to confirm whether I can do this sort of manual Data Guard.
February 22, 2007 - 7:42 pm UTC
why? it would not be smart. Please don't - use the tools that do the job.
manual dataguard
naz, February 23, 2007 - 5:42 am UTC
Because we don't have any network connectivity between our production and D.R. sites, we will have to carry the archived logs on tape manually to the D.R. site.
That's why I was asking if it is possible.
February 26, 2007 - 10:16 am UTC
if there is no network connectivity, I honestly - 100% - thoroughly - cannot see how you can call this a DR site.
It isn't. This is not even making a tiny bit of sense.
Ok, so production blows up. What do you do with this machine that is not on the network???????????????????????????
manual d.r.
naz, February 26, 2007 - 10:55 am UTC
We will physically bring that machine from the D.R. site to our primary site and connect it to the network.
I understand, but for some reason we cannot afford network connectivity.
We have only prepared our D.R. site for recovery when the primary system blows up, not when the primary network blows up; in that case we will have to suffer downtime.
February 26, 2007 - 3:27 pm UTC
just use backups then - I cannot imagine how driving someplace to turn off a machine, crate it up and so on is part of a professional "disaster recovery" plan.
this is not a good idea that I'm reading here, if you do this, please don't call it "disaster recovery".
yes you can take your backup to your 'offsite piece of hardware' somewhere.
you can restore it
you can take your archived redo logs and apply them to this backup
but please, do not call this DR - that gives DR a bad name ;)
procedure
naz, March 03, 2007 - 11:58 am UTC
Ok, I am following this procedure, please correct:
1- Take a closed backup of the .ctl, online redo log, and .dbf files
2- Restore all the above files to the D.R. machine.
3- STARTUP MOUNT at the D.R. machine
From now on I will do the following every morning:
a) copy all archived log files from primary to D.R.
b) RECOVER DATABASE USING BACKUP CONTROLFILE; -- at D.R.
now it will prompt for files, which I will accept until the last archived log file is applied and it returns to the SQL prompt. I will leave it at the prompt as it is.
c) repeat steps a and b until a disaster occurs at the primary site.
d) In case of disaster, I will copy the remaining archived log files to the D.R. site.
e) also copy the online redo log files to the D.R. site.
f) RECOVER DATABASE USING BACKUP CONTROLFILE -- at D.R. this will apply all archive files, including the online redo logs.
g) ALTER DATABASE OPEN RESETLOGS -- at the D.R. site.
I am assuming that by following the above procedure I will not lose any committed transactions.
Please verify the above plan, as I have to implement it in our organization.
Thanks
March 04, 2007 - 6:53 pm UTC
please stop calling this D.R. I'm dead serious about that. This is not DR, this is, well - I'll leave it at that.
test it.
You will lose committed transactions - obviously. In case of disaster your archives don't exist. If they did, it would not be a disaster.
You will lose data. This is not DR.
naz, March 05, 2007 - 5:21 am UTC
Ok, I will not call it D.R. anymore.
Data guard's Maximum Performance mode:-
This mode offers slightly less data protection on the primary database, but higher performance than maximum availability mode. In this mode, as the primary database processes transactions, redo data is asynchronously shipped to the standby database. The commit operation of the primary database does not wait for the standby database to acknowledge receipt of redo data before completing write operations on the primary database. If any standby destination becomes unavailable, processing continues on the primary database and there is little effect on primary database performance.
So even here, we might/will lose committed transactions in the case of a disaster in which all redo (archived and online), control files and datafiles are lost at the primary site.
March 05, 2007 - 1:02 pm UTC
yes, but you won't lose, like, an entire day's worth.
And you have the option to not lose ANY with data guard.
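The "not lose ANY" option is Maximum Protection mode. A sketch of enabling it, assuming a standby destination with SYNC and AFFIRM has already been configured (the mode change itself requires the database to be mounted, not open):

```sql
-- Prerequisite (assumed already done): at least one standby destination
-- configured with SYNC AFFIRM redo transport.
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
ALTER DATABASE OPEN;
```

The trade-off discussed earlier in this thread still applies: in this mode the primary will shut down rather than run unprotected if it cannot ship redo synchronously to at least one standby.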
SRDF (EMC) of Redo Logs and Archived Logs
A reader, March 15, 2007 - 3:25 pm UTC
Hi Tom,
We have a production database of size 100 GB. We take a backup of this database through data mirroring using EMC (SRDF) once every day.
If we mirror both the redo log and archived log files through synchronous SRDF (EMC):
1) If the production database goes down (a disaster happened at the site where the production database resides), can we bring the DB up to the point where it was lost by using the
> previous day's backup data
> redo logs (mirrored data)
> archived logs (mirrored data)
2) Is there any performance impact on the production database because of synchronous data mirroring using SRDF (EMC)?
Regards,
Sushanta
March 16, 2007 - 2:58 pm UTC
1) you should be able to, yes
2) ask EMC? yes, there will be a hit, you have to do two IO's typically over a WAN with all of the distributed commit protocols in place.
Autobackup
Aru, April 26, 2007 - 6:04 pm UTC
Hi Tom,
Suppose I wanted to test disaster recovery as in the document you sent me a bit earlier in the thread; it says:
'To perform a disaster recovery, the minimum required set of backups is backups of some datafiles, some archived redo logs generated after the time of the backup, and at least one autobackup of the control file'.
What if someone has no autobackup but has RMAN backups of control files which are all validated?
Please help,
Regards,
ARU.
April 27, 2007 - 10:26 am UTC
get correct and proper backups in place, this is why you are testing.
Autobackup
Aru, April 29, 2007 - 5:50 pm UTC
Thanks Tom,
So is having an autobackup of the controlfile mandatory for a disaster recovery scenario when not using a catalog during recovery?
Regards,
Aru
April 29, 2007 - 7:37 pm UTC
Having proper, thought-out backups that just work out of the box is mandatory for disaster recovery situations - period.
Unless you want to be pulling your hair out, working with support, having people stand behind you asking "how much longer till you are done?"
Recovery is the only thing you cannot get wrong in your job - in order to not muck it up, you do the backups correctly.
Rman RAC backup
Marc, August 07, 2007 - 1:11 am UTC
Tom,
I'm quite new to the RAC environment. We have built a test system that contains seven cluster databases. We are in the process of designing our backup strategy and I'm somewhat confused as to where to connect to perform these backups! Do I connect to each cluster database to perform the backup, or am I missing something?
August 07, 2007 - 11:40 am UTC
you just need to connect to one of them to back up a database (there is only one database to be backed up; all instances have full access to it)
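In practice that means pointing RMAN at any one instance of the cluster database; a minimal sketch, assuming a TNS alias rac1 for one instance (a hypothetical name) and archived logs on storage visible to that node:

```text
$ rman target sys@rac1

RMAN> BACKUP DATABASE PLUS ARCHIVELOG;
```

The backup covers the whole database -- there is no need to repeat it per instance, only per database.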
A reader, August 07, 2007 - 6:05 pm UTC
Thanks Tom..
standby database
A reader, January 26, 2010 - 2:11 am UTC
Hi Tom,
We have a 10g database on a unix server. We may want to purchase another Windows 2003 server as a standby server. How can I do that?
January 29, 2010 - 2:22 pm UTC
you cannot. You can replicate to another platform, but you cannot use it for disaster recovery.
And if you sat down and thought it through, you would understand why it doesn't even begin to make sense to have a heterogeneous disaster recovery set of platforms. When the stuff hits the fan - the last thing you want to consider doing is changing a lot of stuff - like, oh, say, your operating system.
Get a smaller version of your unix box and use that.
A reader, February 01, 2010 - 6:05 am UTC
Can I do that if I have 11g?
February 01, 2010 - 10:36 am UTC
answer stands.
A reader, November 22, 2010 - 3:02 am UTC
how can i replicate whole database
November 23, 2010 - 12:48 pm UTC
insufficient set of inputs here to answer you.
You give no reasoning - no context.
I could just say "restore it from backup, done" - but that probably isn't the right answer for you. Otherwise you would have done that.
I don't know if you mean to have update anywhere replication (i'll talk you out of that)
I don't know if you mean for disaster recovery
Or for a reporting system
Or on the same type of machine
Or whatever - There are MANY ways to do "replicate whole database" but only a few will be right for you - but we won't know what they are until you tell us a bit about yourself.
Alexander, November 23, 2010 - 1:19 pm UTC
That has to be in my top 5 favorite questions of all time.
Testing a DR plan
A reader, April 09, 2012 - 3:30 am UTC
Hi Tom,
Is it normal practice to simulate unplanned outage of primary database as a part of DR testing plan? If so, is there any recommended best practice?
Database 11.2.0.2 with physical standby database.
Tested switch over.
Flashback database is on.
Thanks.
April 09, 2012 - 11:42 am UTC
It is normal to switchover to your disaster recovery site multiple times every single year - year in, year out.
for these reasons:
1) you use it to avoid planned downtime - for OS patches, hardware fixes, etc.
2) you do it to make sure you have everything you need over there (you don't want to discover when you fail over you are missing something do you...)
3) you do it to make sure you know HOW to do it, that you CAN in fact do it.
You could make it "unplanned" by just springing it on the DBA team, but I don't think that is necessary. Let them prepare for it, study for what they need to do, practice it in test.
Otherwise you might actually have a total unplanned outage :)
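With the Data Guard broker configured, the planned switchover exercise Tom describes is a short scripted operation. A sketch, where primary_db and standby_db are the DB_UNIQUE_NAMEs of the two databases (hypothetical names):

```text
DGMGRL> CONNECT sys@primary_db
DGMGRL> SWITCHOVER TO 'standby_db';
-- ...patch/maintain the old primary while it runs as a standby...
DGMGRL> SWITCHOVER TO 'primary_db';   -- switch back when done
```

Running this a few times a year, as suggested above, doubles as both planned-maintenance avoidance and a live test of the DR procedure.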
Thanks
A reader, April 10, 2012 - 12:43 am UTC