
Wednesday, April 27, 2005

Messed up big time

Ok, so I messed up.

On asktom, you have the ability to be notified by email when an update to your question is done. That is, when someone posts a “review” or I do a “followup”.

Well, over 5 years, lots of emails have changed – so when a question from 2 years ago is touched, the odds are very high that the email will bounce. Guess who it bounces to. That would be…. Me.

Ok, so I’m writing the second edition yesterday morning (and hating most every minute of it so far… The second edition is less fun than the first, harder to update than write from scratch). So, any diversion is acceptable. The diversion that morning – one too many bounced emails. I must do something about them.

I decide “I know enough, heck, I wrote it”, so I proceed to just update the repository of questions and answers – setting the notify flag to false for any question over 6 months old (yes, that would be most of them).

Before I write the following paragraph, I want to make something perfectly clear: this is about me, it is not about anyone else. It contains no hidden meanings. It is something I will say a dozen times a week in writing, not as often as I’ll mention BIND VARIABLES perhaps, but close. So repeat: this is not about anyone but me. None of these articles have been about anything but what I wanted to say – on my own time, after hours, from my home, not on Oracle time. Anyone who thinks this is some sort of dig at them is charging windmills like Don Quixote.

What I did was the functional equivalent of updating the data dictionary. That is a big huge mistake, and so was this. I knew it wasn't 'smart' – but it seemed “so easy”. I bypassed my own APIs and just did it. Huge mistake.

How many people noticed that I actually updated over 8,000 articles yesterday at 10am? All of them – same timestamp….

Problem was, I did not notice. In fact, it wasn’t until later in the afternoon that I received this email:

To: thomas.kyte@oracle.com
Subject: A very, very busy day ;o)

... all "Ask Tom" entries got updated on 26 Apr 2005 10am ...

(yes, that would be the Bozo the clown emoticon ;o) Max -- thanks for that ;)

Well, if you look for silver linings – this was, yes, yet another diversion from writing! However, it was quite the bummer since I got the email about 3pm – some 5 hours after the deed, and my DBA had the undo retention set a bit low (it is now much higher, thank you very much). So, we restored to another machine, rolled forward to 10am, created a scratch table as a select of the primary key and timestamp from this table, exported that, ftp’ed it over to the real server, imported it, did the update and – hey, we are back in business.

Only it was a stupid thing. Because I knew too much and forgot even more. Forgot about that trigger (but remembered to disable it during the “fix it” update). I hate triggers -- I'm on a campaign to eliminate them where possible.
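
To make the sequence concrete, here is a minimal sketch of the incident and the repair. It is not the asktom schema or code: it uses Python's sqlite3 module as a stand-in for Oracle, and the table `questions`, its columns, the trigger `questions_touch` and the `scratch` table are all hypothetical names. SQLite cannot disable a trigger, so the sketch drops and recreates it where the real fix disabled it.

```python
import sqlite3

# Hypothetical trigger: restamp a row whenever it is updated.
# A fixed timestamp keeps the sketch deterministic.
TOUCH_TRIGGER = """
CREATE TRIGGER questions_touch AFTER UPDATE ON questions
BEGIN
    UPDATE questions SET last_updated = '2005-04-26 10:00' WHERE id = NEW.id;
END
"""
ROWS = "SELECT id, notify, last_updated FROM questions ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE questions (
        id           INTEGER PRIMARY KEY,
        notify       INTEGER NOT NULL DEFAULT 1,
        last_updated TEXT    NOT NULL
    )
""")
conn.execute(TOUCH_TRIGGER)
conn.executemany(
    "INSERT INTO questions (id, last_updated) VALUES (?, ?)",
    [(1, "2003-02-11 09:30"), (2, "2004-07-19 14:05"), (3, "2005-04-20 08:15")],
)

# Stand-in for the restored copy, rolled forward to just before the update:
# a scratch table holding only the primary key and the good timestamp.
conn.execute("CREATE TABLE scratch AS SELECT id, last_updated FROM questions")

# The shortcut: flip the flag directly on every old question.
# The forgotten trigger restamps each row that is touched.
conn.execute("UPDATE questions SET notify = 0 WHERE last_updated < '2004-10-26'")
after_shortcut = conn.execute(ROWS).fetchall()

# The repair: take the trigger out of the way, copy the timestamps back
# from the scratch table, then put the trigger back.
conn.execute("DROP TRIGGER questions_touch")
conn.execute("""
    UPDATE questions
       SET last_updated = (SELECT s.last_updated
                             FROM scratch s
                            WHERE s.id = questions.id)
     WHERE id IN (SELECT id FROM scratch)
""")
conn.execute(TOUCH_TRIGGER)
after_repair = conn.execute(ROWS).fetchall()

print(after_shortcut)
print(after_repair)
```

In the sketch the two old rows come back with their original timestamps and keep the new notify flag, which is the outcome the restore, export and import were after.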

I’ll use the API next time. Actually, I’m going to modify the code soon so as to not send updates to people after six months have gone by. It’ll really reduce the amount of bounces I get (I get so many, I don’t even look at them anymore – which means the improperly addressed email from me gets tossed too, which is bad).
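
A rough sketch of what such a check could look like if it lived in the notification code instead of in the data. Everything here is an assumption for illustration: the function name `should_notify`, its parameters, measuring age from when the question was asked, and treating "six months" as 183 days.

```python
from datetime import datetime, timedelta

# Assumption: "six months" approximated as 183 days.
NOTIFY_WINDOW = timedelta(days=183)


def should_notify(asked_at, now, wants_email):
    """Return True only if the asker opted in and the question is recent.

    The stored preference is left alone; old questions are simply
    skipped at send time, so no bulk update of the table is needed.
    """
    return wants_email and (now - asked_at) <= NOTIFY_WINDOW


now = datetime(2005, 4, 27)
recent = should_notify(datetime(2005, 4, 1), now, True)
stale = should_notify(datetime(2003, 4, 1), now, True)
opted_out = should_notify(datetime(2005, 4, 1), now, False)
print(recent, stale, opted_out)
```

Deciding at send time means nothing in the question table has to change, so no trigger gets a chance to fire.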

Moral of the story: updating tables directly -- bad. Really bad. Even when you think you know what you are doing.

33 Comments:

Anonymous Michael Norbert said....

Human error happens all the time. It's good to see it happens to everyone. I try to never get mad at people at my work when they do something like this. It's the recovery that is important. I once had the joy of rm -rf $ORACLE_HOME

Wed Apr 27, 08:17:00 AM EDT  

Anonymous Dave said....

Next Question :-)

From starting at Oracle in 1993, how did you get to be the VP you are now?

What did you start off as, and how did your career in Oracle progress?

Wed Apr 27, 08:18:00 AM EDT  

Anonymous Anonymous said....

'Morale' of the story
should be 'Moral' (:

Wed Apr 27, 08:26:00 AM EDT  

Anonymous Anonymous said....

http://www.imdb.com/title/tt0049470/

;O)

Martin

Wed Apr 27, 08:27:00 AM EDT  

Blogger Thomas Kyte said....

Human error happens

You should have seen the look on the guy's face when he typed in

$ cd /
$ rm /tmp * [enter]

I report to him now, so I cannot really make too much fun.... But I wish I had a camera.

Next question

filed away as an idea...


Morale

Indeed, and it now is :) thanks...

the man who knew too much, cool.

Wed Apr 27, 08:35:00 AM EDT  

Blogger Bill S. said....

You can be extremely knowledgeable and still make mistakes ;-D. I once had a situation where a customer reported their AS/400 system died due to a tape unit failure. Of course, everybody KNOWS non-essential peripherals can't take a system down, right? To prove my point, I hit the power off switch on the back of the tape unit on our PRODUCTION system. But those systems had RACK-MOUNTED devices - the power switch for the tape unit itself is in the front, the rack switch is on the back of the rack (guess which one I threw?). Turns out the customer's tape unit didn't fail, the RACK it was in did, and the system decided a power failure was imminent so it shut down. So I had to explain to my boss why we were going to spend the next 3 hours recovering a production system. He had a good laugh over it after it was done, but I never did that on a production system again!

Live and learn.

Wed Apr 27, 09:16:00 AM EDT  

Blogger DaPi said....

Putting toothpaste back in the tube is an essential DBA skill.

Wed Apr 27, 09:46:00 AM EDT  

Anonymous Anonymous said....

This story illustrates the importance of designing and testing ANY change before implementation in production. I have a simple philosophy: If a change to production has not been fully tested, assume something bad will happen.

“I hate triggers -- I'm on a campaign to eliminate them where possible.”

Now that is an interesting statement. I’ll have to think about that for a while. I guess the key is “where possible.”

Wed Apr 27, 09:53:00 AM EDT  

Anonymous Anonymous said....

Hi,

how do you relax ? Do you do Yoga or other practises ? :o)

Wed Apr 27, 11:07:00 AM EDT  

Blogger David Aldridge said....

I think that self-criticism is a normally healthy thing, but that was just a horrible self-flagellation that we could have done without reading. And then anonymous critiques your misuse of the word "morale" ... must be ready for Friday.

I wish I could make you feel better by pointing out a mistake that I've made, but alas ...

Wed Apr 27, 12:31:00 PM EDT  

Blogger David Aldridge said....

Hey did that come across as a bit snippy?

Didn't mean it to -- just making the point that you're being v. hard on yourself.

ps. Please don't block my ip address.

Wed Apr 27, 12:56:00 PM EDT  

Anonymous denni50 said....

Tom....

don't sweat the small stuff...life
would be boring without mistakes.

:~)

Wed Apr 27, 02:08:00 PM EDT  

Anonymous Anonymous said....

"Ok, so I’m writing the second edition yesterday morning (and hating most every minute of it so far… The second edition is less fun than the first, harder to update than write from scratch). "


Tom, how's the writing coming along? I hope you are not thinking of extending the publication date.

Wed Apr 27, 02:16:00 PM EDT  

Blogger Thomas Kyte said....

Tom, how's the writing coming along

good news, bad news day...

just sent chapter 8 off. Sounds good right.

Well, I missed one in my outline, just figured it out this morning. That was a chapter 8 that was really a chapter 7.5. So while I did chapter 8 -- it was a chapter 8 that until this morning did not exist :)

Oh well... such is life.

Now to get to chapter 9 which yesterday was chapter 8 but now is not.

Writing is not like cooking. Writing is more like a software project, it's over when it's over and not before... Trust me, I want this to be over more than you do!

(it's not that bad, just very time consuming, so many little nit picky details have changed and I know more than I did 4 or 5 years ago so I have more to say about some things....)

Wed Apr 27, 03:35:00 PM EDT  

Anonymous Doug C said....

People who don't make mistakes are either strange, or lying, which are both good reasons to not want to work with them. I noticed under [Your Questions] I still see a handful of April 26, 2005 10am. Sent you an email about it.

Wed Apr 27, 03:57:00 PM EDT  

Blogger Alberto Dell'Era said....

So you are at 8 / (23+1) = 33% of the road ... and you have yet to read the Reviewers' comments, and the Web Reviewers' comments (you got about 20 web reviews for the two semi-technical chapters - i bet they will increase for the next ones) ...

Next blogs should be on "how to manage panic", then "how to survive sleepless months" ;)

Escape: me, i would not care if the deadline slips; i prefer to wait and get a masterpiece (I have very high expectations for the book, i'm 100% sure it will become The Oracle Bible for Developers), and especially, get back a still-alive Tom Kyte :)

Wed Apr 27, 04:39:00 PM EDT  

Blogger David Aldridge said....

What software do you do the book writing on, Tom?

Wed Apr 27, 04:44:00 PM EDT  

Anonymous Anonymous said....

Hmmmm. For the casting of the movie treatment of "Expert 1-on-1", I had been thinking Will Smith as Tom: the blend of intelligence, looks, and physique should capture the essence of the character.

But after this incident I am thinking it might be better to have Brad Pitt reprise his role as Achilles in _Troy_: the demigod with the tragic human flaws. Physique still pretty good too!

sPh

Wed Apr 27, 05:31:00 PM EDT  

Anonymous Anonymous said....

I’m a bit shocked by some of the comments that seem to minimize the nature of Tom’s post. Sure, we all make mistakes, but we should also try to minimize the number of mistakes we make. Tom gave one suggestion - use the API. Another is to test the changes in a non-production environment first. For me, that is the lesson I learned from Tom’s post.

Wed Apr 27, 06:06:00 PM EDT  

Blogger Thomas Kyte said....

I'm a bit shocked by some of the comments

Trust me -- I won't be touching that like that again. Part of it is that perhaps I don't truly consider it a mission critical production system, more of a hobby if you will. If I hadn't written it initially, and if I wasn't the person most affected by it (well, pretty much "the" person), no way.

I did the bad thing of updating application data without going through the application -- overconfidence perhaps. Won't be doing that again.

Wed Apr 27, 06:14:00 PM EDT  

Blogger Thomas Kyte said....

What software do you do the book writing

word :(

i know where the save button is for sure.

advice: never go over about 100 pages in a single document. backup a lot.

and in a pinch, strings -a on unix can get a lot of it back.

Wed Apr 27, 06:15:00 PM EDT  

Blogger Bill S. said....

Tom,
Just an aside - how about a small blurb in your blog each day : "Here is the new thing about Oracle I learned today"? Just as a human interest piece - I would be very curious to see what you find (assuming of course, that it isn't dangerous to let the public at large know). As for the book - just keep on plugging, I'm sure you're doing a bang-up job as usual.

Wed Apr 27, 07:11:00 PM EDT  

Blogger Thomas Kyte said....

"Here is the new thing about Oracle I learned today"?

not so sure about that... The thing I learned today was that page 172 of Expert One on One Oracle only applies to Oracle 9i R2 and before, in 10g it is all different (and takes about 2 pages to explain what those differences are :)

See, now everyone that doesn't have Expert One on One will be left wondering "what is he talking about..."

Wed Apr 27, 08:30:00 PM EDT  

Blogger Peter K said....

Whooops! Tough but a lesson driven home. Life happens. The thing is to be honest and focus on the recovery. Covering up is where they get you (e.g. Nixon, Martha Stewart, etc).

At least, you now know your recovery process and procedures work.

Thu Apr 28, 02:42:00 AM EDT  

Anonymous Matthias Rogel said....

6 stars, great story !!

I usually check asktom around this time
(9 o'clock in Germany) for fresh meat
reading the Subjects and picking out what seems interesting to me.
using the "next" link after each 10th Subject.

Yesterday I knew something was wrong as soon as I came to page 4 or 5 or 6 -
I think all Subjects were around 6 months old and updated 26 Apr 2005,
this went on for more than 10 pages.

I also was quite sure that you would fix it soon and was quite happy to read the story behind that today.

However, I personally do not agree with
your opinion about triggers.

I love triggers.
I use them as much as I can.

When doing a hack like that, I first check which triggers will fire, and put in a 1st line
if mypackage.donotfire then return; end if;
and before updating/deleting/inserting
exec mypackage.donotfire := true

But of course you are right, not
using APIs is a mistake.
However, never met anyone who denied
doing a hack like that when it seems to save you some hours !

Thu Apr 28, 03:20:00 AM EDT  
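
The guard-flag idea in the comment above can be sketched in runnable form. This is an illustration under assumptions, not the commenter's code: it uses Python's sqlite3 module instead of Oracle, and the names `trigger_guard`, `do_not_fire`, `touched` and `set_guard` are hypothetical. One real difference to keep in mind: a PL/SQL package variable is private to the session that sets it, while the one-row table used here is visible to every connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (
    id      INTEGER PRIMARY KEY,
    notify  INTEGER NOT NULL DEFAULT 1,
    touched INTEGER NOT NULL DEFAULT 0   -- counts trigger firings per row
);

-- Hypothetical one-row control table standing in for mypackage.donotfire.
CREATE TABLE trigger_guard (do_not_fire INTEGER NOT NULL);
INSERT INTO trigger_guard VALUES (0);

-- The WHEN clause plays the role of the "if ... then return" first line.
CREATE TRIGGER questions_touch
AFTER UPDATE OF notify ON questions
WHEN (SELECT do_not_fire FROM trigger_guard) = 0
BEGIN
    UPDATE questions SET touched = touched + 1 WHERE id = NEW.id;
END;

INSERT INTO questions (id) VALUES (1), (2);
""")


def set_guard(connection, on):
    """Switch the guard on (trigger stays silent) or off (trigger fires)."""
    connection.execute(
        "UPDATE trigger_guard SET do_not_fire = ?", (1 if on else 0,)
    )


# Normal update: the trigger fires.
conn.execute("UPDATE questions SET notify = 0 WHERE id = 1")

# Maintenance update: guard on, trigger silent, guard off again.
set_guard(conn, True)
conn.execute("UPDATE questions SET notify = 0 WHERE id = 2")
set_guard(conn, False)

touched = conn.execute("SELECT id, touched FROM questions ORDER BY id").fetchall()
print(touched)
```

The catch, as the post shows, is that the guard only helps if you remember the trigger exists before you run the update.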

Anonymous Anonymous said....

> I think that self-criticism is a
> normally healthy thing, but that
> was just a horrible
> self-flaggelation that we could have
> done without reading.

I disagree by the way. I printed out this post and talked it over with my Operations group (such as it is) yesterday. It was a good example and led to a good discussion. There is a reason why safety-critical systems require multiple reviews and sign-offs, often including one by an otherwise uninvolved party. At the same time - not every system in the world is a safety-of-life or safety-of-flight system.

sPh

Thu Apr 28, 09:46:00 AM EDT  

Anonymous Dan Kefford said....

Tom...

Have you ever considered using LaTeX with or without LyX for your books? Or do your publishers require manuscripts to be in a more popular format?

Thu Apr 28, 03:05:00 PM EDT  

Blogger Thomas Kyte said....

using LaTeX

pretty much forced to use word -- the comment facility....

Thu Apr 28, 03:10:00 PM EDT  

Blogger Shivaswamy said....

Tom,

Not that, I want it to be fixed.. But just to let you know that we still have '26-Apr-2005 10am (first asked)' for "posted by you Answered, Do not publish " records.

Shivaswamy

Sun May 01, 02:04:00 PM EDT  

Blogger Thomas Kyte said....

Not that, I want it to be fixed

I've got the fix, I've got the data, I've got the test tested...

Now, I just need to find the 15 minutes to do it -- soon, very soon. My main goal was to get the published ones fixed first.

but soon

Sun May 01, 02:11:00 PM EDT  

Anonymous robertc said....

rofl haha you got owned by a lowly trigger

Mon May 02, 01:35:00 AM EDT  

Anonymous Anonymous said....

Had exactly the same thing happen with a "populate new fields on existing records" script as part of a release. I was actually having my dev database updated as my project was going live after theirs - but because my project used that timestamp to consider which records to process, I spotted that suddenly they were all within 10 minutes of each other...

Tue Jun 14, 05:31:00 AM EDT  

Anonymous Anonymous said....

I've just come from asktom.oracle.com

So How do you Ask Tom?

I could see how to search for answered questions but no place to submit a question.

Wed Jan 11, 12:00:00 PM EST  
