Autorestart a singleton database on Grid Infrastructure?



Question and Answer

Connor McDonald

Thanks for the question, John.

Asked: June 24, 2017 - 8:32 pm UTC

Last updated: June 28, 2017 - 1:52 am UTC

Version: 11.2


You Asked

Good Afternoon,

We have a Grid Infrastructure clustered environment that holds only singleton databases, so each instance is active on only one node, although it can, of course, be started on the other nodes. We have set up failover rules; however, we decided not to let them automatically restart a database instance that has crashed on its primary node if that node is still up. My concern is that database instances mainly crash when there is some corruption at the database layer. If we automatically fail the instance over to another node, the corruption will still exist, and it could make the situation worse. If we instead only notify the DBA when the instance is in that state, the DBA may be able to gather diagnostic information, or may find that the database is simply in an odd state such as a full archive destination.

Am I being too paranoid? Should I set GI to fail over automatically when it can't communicate with the instance? What is the best practice? We do already have a rule to fail over automatically when the node the instance resides on is down.

Thanks,

John

and Connor said...

I can't really answer that for you :-) Ultimately it's a risk/benefit business decision.

For example, if my system availability is critical only during business hours, then I might opt to *not* auto-restart, because if it crashes during the night I can call someone out, and if it crashes during the day, I know I have staff on hand.

But if system availability is critical 24x7, then maybe I'll opt for auto-restart, because the time it takes to call someone out to investigate is too long to meet my business SLAs.
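The trade-off above can be sketched as a tiny decision function. This is purely illustrative: the name `choose_restart_policy` and its parameters (`critical_24x7`, `callout_minutes`, `sla_minutes`) are hypothetical, and the function only encodes the two scenarios described here, not any Grid Infrastructure setting.

```python
def choose_restart_policy(critical_24x7: bool, callout_minutes: int, sla_minutes: int) -> str:
    """Hypothetical helper: pick 'auto-restart' or 'notify' for a crashed instance."""
    if not critical_24x7:
        # Availability only matters in business hours: at night someone can be
        # called out, and during the day staff are on hand, so notify a human.
        return "notify"
    if callout_minutes > sla_minutes:
        # 24x7 system where waiting for a call-out would breach the SLA.
        return "auto-restart"
    # 24x7, but a human can still respond within the SLA.
    return "notify"


if __name__ == "__main__":
    print(choose_restart_policy(critical_24x7=False, callout_minutes=120, sla_minutes=30))
    print(choose_restart_policy(critical_24x7=True, callout_minutes=120, sla_minutes=30))
```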

If it's any help, I've never seen a system get *more* damaged by failing over. In my experience, the issues are either transient (memory corruption, etc.), so a failover solves the issue, or catastrophic (i.e., you can fail over until the cows come home and that database ain't gonna work no matter what).
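One way to act on that observation, assuming you do enable automatic restarts, is to cap the number of attempts so a transient fault clears on its own while a catastrophic one still ends up with a human. The sketch below is a generic bounded-retry pattern, not how Grid Infrastructure implements restarts; `restart_with_limit` and the `try_start` callback are hypothetical stand-ins for whatever actually starts the instance.

```python
from typing import Callable


def restart_with_limit(try_start: Callable[[], bool], max_attempts: int = 3) -> str:
    """Hypothetical: try to start the instance up to max_attempts times, then escalate."""
    for attempt in range(1, max_attempts + 1):
        if try_start():
            # Transient fault: a restart (or failover) cleared it.
            return f"started on attempt {attempt}"
    # Catastrophic fault: more attempts won't help, so hand it to a DBA.
    return "escalate to DBA"


if __name__ == "__main__":
    outcomes = iter([False, True])  # simulated transient failure
    print(restart_with_limit(lambda: next(outcomes)))  # started on attempt 2
    print(restart_with_limit(lambda: False))           # escalate to DBA
```

Capping attempts keeps the benefit of auto-restart for transient faults while ensuring a persistent problem still triggers the DBA notification the question asks about.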

But that's just *my* experience.
