  • Oracle RAC not starting on one out of two nodes after server restart


Question and Answer

Connor McDonald

Thanks for the question, Salman.

Asked: September 29, 2023 - 8:45 am UTC

Last updated: October 16, 2023 - 1:46 am UTC

Version: Oracle RAC Database 19.3.0.0


You Asked

1. We have a two-node Oracle RAC 19c (19.3.0.0) database installed on Oracle Linux 7.6 x86-64 (UEK). We gracefully shut down the cluster on each node and restarted the nodes one at a time. The secondary node started normally after its server restart and rejoined the cluster. However, after its restart the primary node does not join the cluster and gives the following errors when starting CRS:
PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
CRS-4000: Command Start failed or completed with errors.

2. Running ocrcheck on this node produces the following output:
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]

3. On the secondary (working) node, all the required files are present under /u01/app/oracle/crsdata/mydb11/. However, the equivalent location on the primary node contains only two directories: "crf" and "cvu".

4. On both nodes, the ocr.loc and olr.loc files are present under /etc/oracle/ and point to their respective locations.

5. The cluster logs on the faulty (primary) node show the following errors:
OCROSD: utopen:6m': failed in stat OCR file/disk /u01/app/oracle/crsdata/mydb10/olr/mydb10.olr, errno=2, os err string=No such file or directory
OCROSD: utopen:7: failed to open any OCR file/disk, errno=2, os err string=No such file or directory
OCRRAW: proprinit: Could not open raw device

6. Since RAC is running normally on the secondary node and all the required files are present there, can I copy the "ocr" and "olr" directories from that node to the primary node and then start the cluster? Will this work?
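Points 4 and 5 together describe a specific inconsistency: olr.loc still names an OLR path, but the file at that path no longer exists. A quick way to confirm this on a node is sketched below. It is a hypothetical helper, not an Oracle-supplied tool, and it assumes olr.loc contains a line of the form olrconfig_loc=<path> (as in a default Grid Infrastructure install). It only reads files and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle-supplied tool): report whether the OLR
# file referenced by olr.loc exists on this node. Read-only.
# Assumption: olr.loc contains a line of the form "olrconfig_loc=<path>".
# Return status: 0 = OLR present, 1 = OLR missing, 2 = olr.loc unreadable/invalid.
check_olr() {
    loc_file=${1:-/etc/oracle/olr.loc}

    if [ ! -r "$loc_file" ]; then
        echo "ERROR: cannot read $loc_file" >&2
        return 2
    fi

    # Strip the key and keep the first matching value (paths may contain '=').
    olr_path=$(sed -n 's/^olrconfig_loc=//p' "$loc_file" | head -n 1)
    if [ -z "$olr_path" ]; then
        echo "ERROR: no olrconfig_loc entry in $loc_file" >&2
        return 2
    fi

    if [ -f "$olr_path" ]; then
        echo "OK: OLR present at $olr_path"
        return 0
    fi

    # A missing file here matches the errno=2 (ENOENT) stat failure in the OCROSD log.
    echo "MISSING: $olr_path"
    return 1
}
```

After sourcing the file, run `check_olr` (or `check_olr /path/to/olr.loc`). On a node in the state described above, it would be expected to print a MISSING line for the OLR path shown in the OCROSD log.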

and Connor said...

I spoke to RAC PM Anil Nair. A potential cause is a timing issue: the required file systems may not be available when the daemons start. For anything like this, though, you really need to log an SR so that all of the required log files etc. can be assessed.

Please keep us updated with any resolution that Support provides.


Comments

A reader, October 12, 2023 - 7:05 am UTC

Thank you for the response. I have already logged an SR for this issue. I have also analysed the faulty node extensively by reading all the logs, and I have concluded that the OLR file, which resides on the node and is needed for the cluster to start there, is missing. Surprisingly, the backups of that OLR (both automatic backups and any manual ones) are also missing. If the OLR is missing on the node and there is no backup to restore it from, the node cannot start and join the cluster. However, I have found the following workaround for this issue:

1. Deconfigure the problematic node:
   # <$GRID_HOME>/crs/install/rootcrs.pl -deconfig -force
2. Run root.sh again to reconfigure it:
   # <$GRID_HOME>/root.sh
3. This should create a new OLR, and CRS should then start.

I read about this method on the Oracle Forum, at the following link:

https://forums.oracle.com/ords/apexds/post/olr-restoring-without-any-backup-5704

I am not sure whether to try this solution, since it's a production database and only one RAC node is working at the moment. Thanks.

Regards
Connor McDonald
October 16, 2023 - 1:46 am UTC

Get the go-ahead from Support before doing this.

As the NASA saying goes:

"There is no problem so bad on a spacecraft that cannot be made worse with the wrong action"
