
Saturday, September 03, 2005

Archive Log Mode

If I've said it once, I must have said it a billion times. Running a database in noarchivelog mode guarantees one thing and one thing only:

You will someday lose all of your data since your last full cold backup. Not might lose, not could lose, not likely will lose, not probably will lose - it means you WILL lose all of your data since your last full cold backup someday (assuming, of course, you haven't messed up your backups, in which case you just lose your data, period).

It happened again (sort of like clockwork) - I get an email like this at least once or twice a week. I feel sorry for them, but there isn’t too much one can do. I just got back from the Czech Republic last night, forced myself to stay up till 10pm east coast time (4am Czech time), but still got up at 5am and couldn’t get back to sleep. So, at 5:30am this morning, I read this (paraphrased):

We have one database in NOARCHIVE under version of xxxx. During night process heavy process is going on which is running since last 3 hrs. suddenly power was gone, UPS also not working. After receiving power when we start the database is is not started and giving following mess. We haven't any backup also. Backup was taken before 11 days back. If it is not recovered then, 11 days data entry will be made once again.

I feel really sorry for them, but I mostly feel sorry for the people that rely on this data – not so much for the DBA and technical team that

  • made the conscious decision to run a production system in noarchivelog mode
  • chose to back up so infrequently

They made these choices, purposely. 

The only thing I can tell them now is “open a tar with support and see how much data you can scrape out of this instance” (I said that in this case because, based on other information in the email, I know the “fix” – we can get the instance open, but only long enough to scrape out possibly inconsistent bits of data). I personally do not give recovery advice via email and tend not to do it on asktom either. You have to rely too much on the knowledge level of the person on the other end of the internet connection – far too easy for them to do something you never anticipated them doing and to REALLY mess up something big time.

Archivelog mode - it is the only thing that gives you the ability to recover your data in the event of a failure (well, that and practice, practice, practice - and a test or two or three hundred).
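For reference, checking for (and fixing) this takes about a minute - a minimal sketch, and the exact steps vary a bit by version (on 9i and earlier you would also set log_archive_start=true so the archiver runs automatically; 10g starts it for you):

SQL> select log_mode from v$database;

LOG_MODE
------------
NOARCHIVELOG

SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
SQL> alter database open;

And take a full backup immediately afterwards - the mode switch itself protects nothing until there is a backup to roll forward from.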


38 Comments:

Blogger Robert said....

That was sent via AskTom?
It just sucks getting these kinds of messages regularly.

Somehow I think in most of those cases nobody will lose their jobs... maybe "respect". They still have the data in one form or another... the company will just get the data-entry folks in and start re-keying it straight through. heh

Have a great weekend Tom !

Sat Sep 03, 11:47:00 AM EDT  

Anonymous adewri said....

I believe that if you learn something the hard way you will never forget it, but this is something one should not wait to learn the hard way. Learn it the right way from the first day...

Keep everything archived; if not then know the consequences...

Sat Sep 03, 12:11:00 PM EDT  

Blogger Thomas Kyte said....

That was sent via AskTom?

No, these are what I get in my inbox (along with statspacks and other requests.. to which I have a standard canned reply about sending me email directly...)

Sat Sep 03, 12:13:00 PM EDT  

Anonymous Anonymous said....

Let's hope that one of the first things they would do before attempting any recovery is to take a backup of the current inconsistent database (just in case).

I shouldn't gloat, as my current organization is in the same situation, running in NOARCHIVELOG mode, and I've been slowly getting them to switch over to ARCHIVELOG. Worst case, we lose an entire day of data, which has been verified as something we can live with until we complete the process of switching everything over to ARCHIVELOG mode.

Sat Sep 03, 01:10:00 PM EDT  

Blogger Tim... said....

Tom:
I read a thread on the Dizwell forum a while ago where some of the guys mentioned running data warehouses without archiving, relying on cold backups and reloads of missing data in the event of failure.

This doesn't sound like a terrible idea to me. Archiving slows down the weekly data loads and the data is easily recoverable.

Do you have any opinion on this, assuming the data loads are big and weekly?

Of course, in an OLTP situation it's a no-brainer.

Cheers

Tim...

Sat Sep 03, 02:36:00 PM EDT  

Anonymous Anonymous said....

I work with a system where they take it out of archivelog mode for 4 days a month while they do their "load". The reason is that there isn't enough space for the archived logs as they pile up, and the archiver has gotten stuck. Rather than supply some more precious disk for the archived logs, their choice is to prevent it from cutting logs at all. Whenever I say we should just boost up the archive system, they always wonder how much, and that's where the discussion ends, because I don't know how much it would need to keep up with compressing and dumping to tape. Inevitably, by the time the archiver is unstuck, the sysadmin comments that the archive mount point is only 1% used - so why should we add more? On and on... really frustrating.

Sat Sep 03, 03:06:00 PM EDT  

Blogger Thomas Kyte said....

and reloads of missing data in the event of failure.

They have therefore recognized what I said (you WILL lose all data...) and are prepared to deal with it. That is fine. You just have to

a) recognize that it will happen
b) be willing to live with that fact and do your own sort of recovery.


for 4 days a month

I'd ask them "how much does that 4 days cost? because someday we will be doing 8 days instead of 4, when it crashes on day 3.95 - oh, and add in the 4 days of downtime that are normally uptime"

But a 4 day load? wow, you must be loading many petabytes of data?


You can easily tell how much -- just read your alert log, every switch is in there, just add them up.
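Or, if you would rather query than grep, every log switch is recorded in v$log_history as well - a rough sketch (multiply the switch count by your online redo log size, v$log.bytes, to estimate the volume):

select trunc(first_time) day, count(*) switches
  from v$log_history
 group by trunc(first_time)
 order by 1;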

Sat Sep 03, 03:44:00 PM EDT  

Blogger Pete_S said....

Tim
I read a thread on the Dizwell forum a while ago where some of the guys mentioned running data warehouses without archiving, relying on cold backups and reloads of missing data in the event of failure
If I said it (and I can't remember!) then this is what we do: we back up the whole DW to disk EVERY day (and from there to tape). We have a single data load each day (it takes 2-3 hours) and we are prepared to take this hit to catch up after a restore. And yes, we do expect things to break, so we practice recoveries to make sure our process works in real life.

Sat Sep 03, 04:48:00 PM EDT  

Anonymous Anonymous said....

Actually it's a combination of loading and processing. It is nowhere near petabytes of data, probably gigabytes. I can find out exactly how much disk all the logs would take, but they want me to figure out how little I can get away with. In other words, a process runs and compresses files, puts them to tape and deletes them. Unfortunately, this process at times can't keep up with the new logs and it runs out of room (temporarily). Kind of a queueing problem: what's the maximum arrival rate of logs compared to how fast I can get them to tape? However, this post has encouraged me to push this whole concept again. I will mine the alert log for a worst-case starting point and work it down from there. We could always temporarily allocate a bunch of disk, monitor the high water mark, and then lower it for next month. Another argument they made was that they needed all the CPU for processing this job fast. I really don't think copying off files, compressing them and putting them to tape uses a lot of power compared to losing the whole processing cycle.
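To make the sizing concrete (made-up numbers, just a sketch of the arithmetic):

peak redo generation : 20 GB/hr
compress+tape drain  : 12 GB/hr
backlog growth       :  8 GB/hr
10 hour peak         : ~80 GB of archive headroom needed, plus a safety margin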

Sat Sep 03, 05:15:00 PM EDT  

Blogger Thomas Kyte said....

sorry, but 4 days sounds like 3.9 or 3.8 days too many for gigabytes :)

"all the cpu for processing this job fast" - hmm... I'd be looking at the algorithms that needed 4 days to pre-process gigabytes of data, they might benefit from some "rework".

Just seems far, far too long (and if they did bulk operations - which will typically be faster - they can even opt to bypass redo generation if they like and back up those affected files after the operation instead).
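For example, something along these lines (a sketch - the table names are made up; and remember, the affected files must be backed up right after, since the redo to rebuild them was never generated):

alter table big_fact_table nologging;

insert /*+ APPEND */ into big_fact_table
select * from staging_table;

commit;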

Sat Sep 03, 05:37:00 PM EDT  

Blogger Rachel said....

From way too sad experience....

sometimes the DBA knows what they are doing (at least I hope I did) and loses the fight with upper management, who insist that they do not need archivelog mode, that in fact they don't need to do a cold backup - a daily (6 hour) export will do fine. And no, it was not in "consistent mode". After all, they needed the one tape drive on the system for the backup of the program files.

Yes, *I* knew the risks. They chose to ignore them. Then when the database fell down, went boom, and could not get back up, they said "this is unacceptable" (which part? the one where I tell you this is going to happen and you ignore me, or the one where it happens?)

In any case, the call was made to Support, and we were fortunate enough to find out about DUL. And get someone in to save most of the data. Neither easy nor cheap and does not invalidate your original point.

Do I need to mention that within 2 weeks of this incident, we had archivelog mode running, a dedicated tape drive and online backups?


Ah, memories... and people wonder why I chose to get out of this business? :)

Sat Sep 03, 07:25:00 PM EDT  

Blogger Thomas Kyte said....

Rachel --

11 weeks since last backup.....

11 weeks...

no 6 hour export. 11 weeks...

eleven weeks....

Sat Sep 03, 10:04:00 PM EDT  

Blogger scubajim said....

A couple of jobs ago I sent the tech writer documentation on backing up. I included the words that if you don't run in archivelog mode or don't test your backups, you WILL lose your data. The technical writer didn't want to sound "harsh", so they changed the wording (to "might lose data") and didn't tell me. Of course, I didn't know about the "helpful" change until the documentation was printed, bound, and in customers' hands. I was not pleased.

Sat Sep 03, 10:23:00 PM EDT  

Blogger Joel Garry said....

What does 11 weeks refer to?

Sun Sep 04, 12:17:00 AM EDT  

Blogger David Aldridge said....

ref Data Warehouses.

Was this the thread Tim?
http://www.phpbbserver.com/phpbb/viewtopic.php?t=141&start=0&mforum=dizwellforum

The key to being able to run in noarchivelog mode is to know what modifications were made since the last cold backup. In an OLTP system where you have people placing orders, customers ringing with problems, invoices being generated etc., you have absolutely no idea, but with a data warehouse you generally do have a set of logs from your ETL system that will tell you exactly what was done, in what order, and with what data set. If you can re-run all of that, then while the DBAs are restoring the last cold backup you can be setting up your ETL system to reload.

Don't forget the universal disclaimer: "applies to data warehouses only!"

Sun Sep 04, 01:40:00 AM EDT  

Blogger Rachel said....

Tom,

As I said, my story does not invalidate your original post. There is no way to mine the data files via DUL? I can understand, perhaps, if the backups were done and the tapes were bad. But not doing a backup at all? A friend in Support once told me that on calls like that one, what he wanted to say was "update your resume and don't send it to me". What he did is what Oracle does.... attempt to recover the data.

You know I'm the biggest believer in the world in backups, testing recovery, figuring out all the craziest possible scenarios and testing them as well. Who would have ever dreamed up 9/11 or Katrina?

Rachel

Sun Sep 04, 07:13:00 AM EDT  

Blogger Thomas Kyte said....

11 weeks

sorry -- meant 11 days, referencing the original post...

11 DAYS (still horrible, weeks would be worse but 11 days since their last full cold backup of their noarchivelog mode database)


Rachel --

don't get me wrong, I didn't have an issue with your approach, I was making a comment on the 11 days of exposure. I would not like the daily export - but hey, it was daily, you were trying to limit the amount of exposure.

this was 11 days.

(coming from a guy that syncs his laptop with 3 other computers in 2 different locations as often as possible, at least once a day... And the databases - bzacked up even more vigorously).

Sun Sep 04, 09:19:00 AM EDT  

Blogger Connor McDonald said....

as they say in the classics...

ability to backup ain't important, but ability to recover is.

Mon Sep 05, 12:04:00 AM EDT  

Anonymous doug c said....

Would you mind repeating how you synch your laptop to two other computers once a day? Was it that open source tool you mentioned a few weeks ago?

Mon Sep 05, 01:25:00 AM EDT  

Blogger Justis Durkee said....

... bzacked up even more vigorously).

You listening to Snoop Dogg these days?
Let me know when you want to hit the club.

Mon Sep 05, 02:45:00 AM EDT  

Anonymous MartinW said....

I was reading through, wondering why it is that organisations/people set up these complex, key systems and then cut the corner at the last hurdle and don't set up the redo archival. After all, disk is so cheap.
But then I realised I'm pouring about 1TB of data a week into one of our systems and, though I have archivelog mode on and we back up all the logs... I am doing half the work with nologging. Maybe I'm being not-so-smart. It's all recoverable, honest, but how long would it take me to unpick everything...
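At least I can see my exposure - the database tracks unrecoverable (nologging) changes per datafile. A sketch, column names from memory:

select file#, unrecoverable_time
  from v$datafile
 where unrecoverable_change# > 0;

Any file that shows up there has had nologging work done against it and needs a fresh backup to be fully recoverable.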

Mon Sep 05, 01:10:00 PM EDT  

Anonymous Andrew said....

I tell my users (business decision makers) something like this. . .

Timeline:
- - - - - - - - - - - - * - - - - - ->
                        /\
                        |
              end of your system's
                 business life

Hardware fails. You will have a hardware failure. If it occurs to the right of this point on the line, you will be fine. If it occurs to the left of this point, you will have big problems, because you will have lost data (or, in the case of RAID-5 disks and losing two of them, everything).

That generally gets their attention.

Also, in the wake of Katrina. . .
Do you know where your offsite tapes are?
It seems that some companies in New Orleans use an offsite firm with a name that sounds like a place I would never want to store magnetic media, and they cannot get their tapes -- even if they are usable, because the offsite storage site is also in the New Orleans flood area and no one can get into the facility.

Tue Sep 06, 09:38:00 AM EDT  

Blogger Daniel Fink said....

11 days...11 days...11 days...

In terms of backups, an eternity. In terms of some business thinking, nothing. For some reason, some business organizations simply do not grasp the importance of backing up and testing said backups. Communication is a two-party process, so the IT team needs to find the right words to communicate the urgency of the issue.

Several years ago I found that a client's backups were corrupted and could not be used for recovery. After explaining the issue to the team and technical management, the business owners decided that they could not afford the downtime to fix the bug (a simple parameter change), as this would require their application servers to be rebooted... which was a 6 hour process. And they simply could not afford the downtime. A few weeks later, a rollback segment was corrupted, the corruption was not caught for several days (over a weekend), and the database came crashing down. At least they had an export from a week before, so they only had to rebuild 6 days' worth of transactions.

When faced with these issues, the technical staff deserves some blame. However, the business owners may deserve some as well.

Tue Sep 06, 10:58:00 AM EDT  

Blogger Jeff Hunter said....

You're preaching to the choir here. I've gotten reamed a couple of times by people who say "in situation X archivelog mode doesn't make sense". Sometimes I agree, sometimes I don't. For me, archivelog mode is a requirement on all my stuff (dw included).

Tue Sep 06, 12:21:00 PM EDT  

Anonymous Jason McIntosh said....

I have to ask a few questions on this:

First, what about daily consistent dumps? Is this an acceptable method of backing up a database, IF you understand and accept that you'll lose all transactions from that dump forward? Thus, a nightly dump for a system that does daily transactions.

Second, the trick I've always seen with Archivelog Mode - HOW do you know when to remove the archive files? Is there a SIMPLE method of doing so?

Third, having a full cold backup of the database and files is good. However, the question I have here regards Windows systems: what happens if the Windows OS goes belly up? I've seen SO much stored in the registry on Windows that, most of the time, it's easier to rebuild a Windows installation than to try and recover from tape. At that point, it seems a ROYAL pain to try and recover a database from archivelog mode, instead of just importing a dump file. Particularly if your dump file is only a couple of gig in size.

Thoughts on any of this?

Wed Sep 07, 09:24:00 AM EDT  

Blogger Thomas Kyte said....

what about daily consistent dumps

well, no. not really. first, how many times do you test the integrity of the dmp file (yes, I've seen a dmp file not import - then what?). Every day you would have to run a full import on it just to be sure you could. Backing up properly doesn't 100% remove this issue, just 99.999% removes it (that is why we keep MULTIPLE backups, 3 or 5 at a minimum. You never know when that backup isn't going to restore because of a solar flare or something)
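(At a bare minimum, something like imp's show=y will at least prove the file is readable without loading anything - a sketch, with made-up credentials and file names, and no substitute for a real test import:

imp userid=system/manager file=full.dmp full=y show=y log=verify.log

but that only proves imp can read the file, not that every row will actually go back in.)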

HOW do you know when to remove the archive files?

that is easy: when you don't have a backup to apply them to anymore. If your oldest backup is 5 days old (you have a full backup for each of the last 5 days) - you don't need the archives from 6 days ago. You need all of the archives created as of the time you started your oldest backup.

Tools like RMAN are great at being a book-keeper for you and telling you when it is OK to get rid of them based on your retention policy.
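For example - a sketch, 9i/10g era RMAN syntax:

RMAN> configure retention policy to recovery window of 5 days;
RMAN> backup database plus archivelog;
RMAN> delete obsolete;

delete obsolete then removes the backups and archived logs that are no longer needed to satisfy the 5 day window.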

At that point, it seems a ROYAL pain to try and recover a database from archivelog mode

HOW SO? You have to go through these processes:

a) reinstall windows
b) install oracle

and then you either

c) create new database and import
or
c) use oradim to put into the registry what you need (register the instance) and restore - a sketch below

I would trust dmp files to copy data from database A to database B.

But as a backup device? Not a chance.
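(The oradim step looks roughly like this - a sketch with made-up paths, check the docs for your release:

C:\> oradim -new -sid ORCL -startmode auto -pfile C:\oracle\admin\ORCL\pfile\init.ora

it just recreates the Windows service and registry entries; the datafiles, control files and config files still come from your restore.)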

Wed Sep 07, 09:39:00 AM EDT  

Anonymous Jason McIntosh said....


[snip]HOW do you know when to remove the archive files?
that is easy: when you don't have a backup to apply them to anymore. If your oldest backup is 5 days old (you have a full backup for each of the last 5 days) - you don't need the archives from 6 days ago. [/snip]


First, a note - a lot of this is NOT my views, but the views of administration/employers. With that:

You say "oldest backup" - I'm assuming oldest backup of the dbf, and control files, with the dbf (database) files put into backup mode? SO, the last date of the archivelog files you need to keep is the date of the first dbf file you backed up in backup mode, correct? I'm trying to think of an easy way to bash script this, until I can get RMAN working/installed.

Second, daily consistent dumps - I thought the consistent parameter on exp was transaction-level consistent, rather than making sure the dump file itself is consistent. That is, it ignores in-doubt transactions and makes sure the data in the dump file is consistent in the face of commits during the dump.

On the reinstall Windows/Oracle issues, I wasn't aware of oradim. I'm a unix guy by trade, and I'd not seen an easy way to get Oracle on Windows backed up efficiently. Will oradim recreate all the registry settings for an instance? What about net configuration files, such as listener.ora, and the spfiles? What exact files would you recommend backing up on an Oracle Windows install for recovery?

On a last note, full database dumps - I've ALWAYS had problems with these. More precisely, with importing these dumps back into a new database. I get all kinds of creation errors, including issues with the data dictionary creation. Is there any set guide to actually getting a full dump imported back in successfully?

Wed Sep 07, 09:58:00 AM EDT  

Blogger Thomas Kyte said....

I thought the consistent parameter on exp was transaction-level consistent

it is, my point was - have you ever had a dmp file that for whatever reason wouldn't import? The file is "broke", the integrity of the file "is broke". Do you want to be sitting there with a dmp file that doesn't import for whatever reason?

You need to back up all of the config files, yes (rman will get things like the spfile for you). Maybe start by searching for *.ora on your system and see what ones you have (not everything has a sqlnet.ora, for example, and so on...)
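(On Windows that is just something like

C:\> dir /s /b C:\oracle\*.ora

or find $ORACLE_HOME -name '*.ora' on unix - adjust the paths to wherever your software actually lives.)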


TEST TEST TEST - test a restore and see what you missed!


I hate full database exports - nasty sometimes. You need a very minimal database created.

User level exports - cool.

But now you see why export to me is not a backup tool, it is a data copying tool.

Wed Sep 07, 10:08:00 AM EDT  

Anonymous Jason McIntosh said....

For dump files that won't import, usually what I've seen is that the dump file is being imported into a different version of the database, and thus the data dictionaries are very different, or it's from an earlier version of exp and the file format isn't recognized. Those are the only issues I've seen thus far.

Anyways, will look at switching entirely to RMAN - thanks for all the info!

Wed Sep 07, 11:09:00 AM EDT  

Blogger Jeff Hunter said....

Second, daily consistent dumps - I thought the consistent parameter on exp was transaction-level consistent, rather than making sure the dump file itself is consistent.
The second you lay your finger on the ENTER key, your backup is obsolete. Especially if you think you have to use CONSISTENT=Y.

For dump files that won't import, usually what I've seen is that the dump file is being imported into a different version of the database, and thus the data dictionaries are very different, or it's from an earlier version of exp and the file format isn't recognized. Those are the only issues I've seen thus far.
Then you haven't worked with imp as your recovery method enough.

Also, have you ever tried to import an 80G dump file? Block off about 5 days on Solaris and 40 days on Windows.

Thu Sep 08, 12:11:00 PM EDT  

Anonymous chickswv said....

Tom -
Not sure if you remember me or not. I first met you in Charleston, WV sometime in 2001, when you were there to give a presentation to state employees. Anyway, I was telling you about the Treasurer’s Office not letting me put the e-comm db in archive log mode because the ITS director stated "it was too expensive in regard to disk space". Well, needless to say, as a DBA who was not allowed to administer, I am no longer there. I was told that after I left they lost everything since the last cold backup, costing them big $$. That is funny.

What you told me then is still fresh in my mind; "It does not make any logical sense to not put a database in Archive log mode."

I am still amazed that companies do not listen to their technical staff in regard to backup and recovery. I think that most of these upper mgmt folks who say no to archive log should be fired immediately when the server goes down. I guess it is wishful thinking though. Office politics will kill the Oracle db if allowed to.

Wed Sep 28, 01:38:00 PM EDT  

Blogger Thomas Kyte said....

Not sure if you remember me or not.

I sure remember the trip (winter, to WV - we were stuck in Dulles for a long time, I had just flown back from CA)...

I remember the conversation too - it is one I've had over and over and over.

Just another proof point that it is a matter of WHEN, not IF you will lose your data.

It's not funny really, it is sort of sad. I understand what you mean though.

Wed Sep 28, 01:52:00 PM EDT  

Anonymous Anonymous said....

people are going to hate me for saying this, but...

i have a major production system running in noarchivelog mode. the database is about 4TB and we load millions of rows a day doing bulk inserts which are not logged anyway. there are very few moments when we are not loading, the processing takes a long time and it goes on pretty much around the clock.

we do backups every 2-3 weeks. we use a mix of transportable tablespaces and rman -- the rman piece takes about 3-4 hours of downtime and the transportable tablespace copies take about 27 hours, so there's no processing going on for 30+ hours.

we clone the database using rman and importing the transportable tablespaces, so we immediately (well after 7-9 hours) know if the backup was good and then we shut it down.

the feeling is that if we lose our primary DB then we will just have to live with re-running all the batch jobs that came in since the last backup -- thank goodness there's no user update. of course, reruns will take forever. we were 10 hours late coming out of read only mode and restarting the jobs, and we are still not caught up. maybe by tomorrow...

we are hoping that once we move out of this shared environment and get the better disks that the performance will improve and that jobs will stop running so long. we are getting lousy response times from the disks and the CPUs are always pegged.

maybe then we will be able to again address the archive logging, but i sort of doubt it. i think the best i can hope for is to have enough spare time to do the backups more often.

what a mess.

Thu Oct 05, 02:50:00 PM EDT  

Anonymous Sofia said....

This is a very useful piece of advice, I have to say. I am the one who suffered from such a data loss; now I take the importance of backups twice as seriously.

Wed Mar 28, 11:25:00 AM EDT  

Anonymous look said....

Worst case, we lose an entire day of data, which has been verified as something we can live with until we complete the process of switching everything over to ARCHIVELOG mode.

Thu Nov 15, 01:39:00 AM EST  

Blogger Thomas Kyte said....

@Anonymous,

that is a pretty bad worst case, especially since it is also the best case in the event of a failure - when that failure eventually happens, as it will.

Thu Nov 15, 09:20:00 AM EST  

Blogger Toutou Le malin said....

Hi,
What is the impact of archivelog mode on database performance compared with noarchivelog mode?

Thu Jan 30, 11:19:00 PM EST  

Blogger Thomas Kyte said....

@Toutou


if you value your data, then archivelog mode imposes approximately a NEGATIVE 120304% performance hit (meaning, it is a rather necessary bit of work).

ensure you have the disk bandwidth/latencies in place - and archivelog mode will not affect performance in a measurable sense.

Fri Jan 31, 10:53:00 AM EST  
