distributed GRD
marcos cubric, June 21, 2005 - 1:02 pm UTC
hi,
thanks.
this is consistent with oracle's support.
what i could never get from oracle support is the following:
since the GRD is distributed (not replicated), what happens when one instance crashes? how can the portion of the GRD that was managed by the crashed instance be recovered?
thanks
June 21, 2005 - 5:17 pm UTC
The resources are re-hashed (the hash changes once a node is gone) and reallocated entirely across the surviving nodes.
Anything locked by the failed instance is recovered and rolled back (unlocked), and every surviving instance already knows what it holds.
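To make the re-hash concrete, here is a toy sketch. It assumes a plain modulo hash over the sorted node list, which is only an illustration and not Oracle's actual hash function; the names master_for and mastership are made up for this example.

```python
def master_for(resource_id, nodes):
    """Pick the mastering node for a resource with a toy modulo hash.

    The modulo hash is an assumption for illustration only.
    """
    ordered = sorted(nodes)
    return ordered[resource_id % len(ordered)]


def mastership(resource_ids, nodes):
    """Map every resource id to the node that masters it."""
    return {rid: master_for(rid, nodes) for rid in resource_ids}


resource_ids = list(range(8))

# Four-node cluster, then the same resources after node A has crashed.
before = mastership(resource_ids, ["A", "B", "C", "D"])
after = mastership(resource_ids, ["B", "C", "D"])

# Resources whose master changed because the hash now covers three nodes.
moved = [rid for rid in resource_ids if before[rid] != after[rid]]

print("before:", before)
print("after: ", after)
print("moved: ", moved)
```

With this toy hash, every resource changes master once the node count drops from four to three, not just the ones that node A mastered. That is the sense in which the directory is reallocated "entirely".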
RAC GRD recovery
Jonathan Lewis, June 21, 2005 - 6:54 pm UTC
To expand on the re-hashing.
Take a four-node RAC.
Node A has 1/4 of the GRD in its memory, including
the status of resources 1 and 5.
Assume Resource 1 is currently held exclusive by node A
itself, Resource 5 is currently held exclusive by Node B.
This means node B holds a local copy of some of the GRD
information about resource 5.
Node A crashes.
Assume node D becomes the recovery master; it interrogates
nodes B and C for any information about resources they are
holding that were node A's portion of the GRD.
Node B tells node D about resource 5.
Nothing can tell node D about resource 1.
So node D can acquire and sort out the problems
of any resources that were in A's GRD, but in
use by other nodes. So the GRD can be shared out
safely (re-hashed).
Any resources in A's GRD that were only in use
by A don't really matter, because if they were
held exclusive for blocks to be updated, then
A also had (now dead) transactions on those blocks -
and the blocks can be recovered by normal "instance
recovery" operated on behalf of node A by node D.
Of course, as the instance recovery proceeds, the
resources that A had been holding exclusive will
be acquired exclusive by node D as A's redo is
rolled forward, and its uncommitted transactions
are rolled back, and released as recovery completes.
There are more details, and the timing isn't quite
the way I've put it - but I think this should give
you the idea of how it all hangs together.
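A rough model of the example above, as a sketch only: each surviving node keeps a partial ("shadow") record of the resources it is using, including who the master was. The Shadow class and rebuild_lost_entries function are hypothetical names for illustration and do not correspond to Oracle internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shadow:
    """Partial GRD entry held by a node that is using a resource."""
    resource: int
    master: str   # node that mastered the resource
    holder: str   # node holding this shadow
    mode: str     # e.g. "X" for exclusive


def rebuild_lost_entries(failed, survivor_shadows):
    """Collect what the survivors know about resources mastered by `failed`."""
    rebuilt = {}
    for shadows in survivor_shadows.values():
        for s in shadows:
            if s.master == failed:
                rebuilt.setdefault(s.resource, []).append((s.holder, s.mode))
    return rebuilt


# Node A mastered resources 1 and 5. Resource 1 was held only by A itself,
# so no survivor has a shadow for it. Resource 5 was held exclusive by B.
survivor_shadows = {
    "B": [Shadow(resource=5, master="A", holder="B", mode="X")],
    "C": [],
    "D": [],
}

rebuilt = rebuild_lost_entries("A", survivor_shadows)
print(rebuilt)
```

Resource 5 is reconstructed from node B's shadow. Resource 1 is absent from the result, which matches the point that only instance recovery of A's redo can sort it out.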
June 21, 2005 - 9:11 pm UTC
Jonathan, thanks much.
Reconfiguration
whizkid, June 22, 2005 - 6:51 am UTC
hi,
I've noticed that if node A crashes, reconfiguration starts on the recovery master node.
1. How does Oracle choose which node acts as the recovery master when there are more than 2 nodes?
2. During recovery, how does that node get hold of node A's redo to roll forward if the server itself has crashed? In the above example, suppose node D applies A's redo and we then start node A. Will it do instance crash recovery again, given that the redo has already been applied by node D?
3. Reconfiguration started
List of nodes: 0,1,
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 18169
125019 GCS shadows traversed, 0 cancelled, 4975 closed
120044 GCS resources traversed, 0 cancelled
73330 GCS resources on freelist, 132570 on array, 132570 allocated
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
125019 GCS shadows traversed, 60804 replayed, 4975 unopened
Submitted all GCS remote-cache requests
0 write requests issued in 59240 GCS resources
0 PIs marked suspect, 0 flush PI msgs
Wed Jun 22 13:04:14 2005
Reconfiguration complete
Post SMON to start 1st pass IR
Wed Jun 22 13:04:14 2005
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
The above reconfiguration is from a node doing the crash recovery of another node. What do the numbers against GCS shadows and GCS resources tell us?
thanks.
p.s. why are hash algorithms so widely used in Oracle?
June 22, 2005 - 7:11 am UTC
All of the instances are "in touch".
All of the instances have access to everyone else's online redo logs (it is a shared disk system).
Hashing algorithms are used in almost every piece of software out there. They are an efficient method of organizing and storing data.
Resources and Shadows
Jonathan Lewis, June 22, 2005 - 2:03 pm UTC
GCS resources would be the global resources relating to the buffer cache. I believe that the number reported under resources is the number of full GRD entries held by the local instance and the number reported under shadows is the number of partial entries where the local instance is not the master, but is using a resource.
Note - if A is the master for a global cache resource, but B, C, and D have all had some reason to use the resource (i.e. the block covered by that resource), there will be partial copies of the resource information at each of B, C, and D. But I believe each partial copy says only "I am a user in mode {whatever}, and the state is {whatever}, A is the master".
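The distinction can be sketched with a small counting model. This follows the interpretation stated above (full entries counted as resources, partial entries as shadows), which is itself hedged as a belief. The usage table and the count_entries function are invented for illustration.

```python
def count_entries(node, usage):
    """Return (resources, shadows) for one node.

    usage maps resource id -> (master node, set of nodes using the resource).
    resources: full entries, where this node is the master.
    shadows:   partial entries, where this node uses a resource it does not master.
    """
    resources = sum(1 for master, _ in usage.values() if master == node)
    shadows = sum(
        1 for master, users in usage.values()
        if master != node and node in users
    )
    return resources, shadows


# Hypothetical cluster state: A masters resources 1 and 5, B masters 7.
usage = {
    1: ("A", {"A"}),
    5: ("A", {"B", "C", "D"}),
    7: ("B", {"A", "B"}),
}

for node in ["A", "B", "C", "D"]:
    print(node, count_entries(node, usage))
```

In this model node A reports two resources and one shadow (for resource 7, mastered by B), while C and D each report no resources and one shadow for resource 5.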
RAC master node,
A reader, May 16, 2006 - 2:27 pm UTC
How can I determine which node is the master node in a RAC database?
Suppose I have a bunch of scur (state=2) buffers in x$bh on instance 1 and none on the other instances; what should I infer?
The reason for my second question is that I keep seeing the gc cr multi block request wait event with the maximum wait time. When I dug deeper, I found that one of the objects in my query has a lot of scur buffers in x$bh (I joined dba_objects and x$bh to find that out).
Is that object causing the slowness in the query?
Is that the reason I am seeing gc cr multi block request in gv$session_event?
Thanks,
Session in RAC environment
sandro, June 04, 2007 - 8:42 am UTC
Why if I execute this statement...
select s.inst_id, s.sid, s.serial#, s.audsid, s.process
  from sys.gv_$process p
     , sys.gv_$session s
 where p.addr = s.paddr
   and s.inst_id = p.inst_id
   and s.audsid = sys_context('USERENV', 'SESSIONID')
   and s.inst_id = sys_context('USERENV', 'INSTANCE');
... I have two rows...
   INST_ID        SID    SERIAL#     AUDSID PROCESS
---------- ---------- ---------- ---------- ------------
         2        309      10543     803291 3236:4064
         2        313       9567     803291 1324630
...and NOT only one?
Only SID=309 is actually my session. SID 313 is another session, belonging to another user. Why?