Question and Answer

Tom Kyte

Thanks for the question, Basharat.

Asked: July 16, 2002 - 8:57 am UTC

Last updated: April 16, 2014 - 5:04 pm UTC

Version: 8.1.6

Viewed 10K+ times

You Asked

Hi Tom

You are doing a great job by providing solutions and help to Oracle users. I want to ask you about 9i RAC, could you please explain how they work in delivering 24/7 database availability.

Thanks

Basharat Mehmood
MashreqBank Dubai.

and Tom said...

You use redundancy to remove SPOFs - single points of failure.

You use RAID, for example, to make sure the disks never fail.
You use dual NICs and networks to make sure the network never fails.

RAC allows you to use more than one computer in a cluster to make sure the failure of a single computer (n-1 computers actually) doesn't bring down the database. By running more than one instance of Oracle on different computers -- all accessing the SAME database -- we can increase (greatly) the availability of the system

http://docs.oracle.com/cd/B10501_01/rac.920/a96597/toc.htm

read that for more details.

Rating

  (365 ratings)


Comments

How about the stability of RAC on linux?

chao_ping, July 16, 2002 - 10:40 pm UTC

OPS is not as good as Oracle said; for many customers the OPS database is slow or has frequent ORA-600 errors, hangs, or crashes. Maybe this is because it was poorly implemented and used in an OLTP system, but it is not as stable as a single instance -- that is the fact.
Now Oracle says even OLTP can be run without application partitioning. I want to know if there are successful cases that have used Oracle9i RAC. What about RAC on Linux? Is it stable? It seems Oracle wants to take the low-end market with RAC on Linux, and we are going to implement one RAC cluster on Linux with 2 nodes / 4 CPUs each, but as a DBA I am not sure whether it IS STABLE.


Tom Kyte
July 17, 2002 - 10:10 am UTC

OPS is not RAC; you had to design your applications for OPS (and we always said that - so what you had was a poorly designed OLTP app on OPS).

With RAC -- you do not have to physically design to scale up in a cluster -- a very, very different architecture. Read about it, then -- give it a whirl

Real RAC customers

Alex, July 17, 2002 - 9:25 am UTC

Chao,

Some of the customers that use RAC are: UPS, CERN, FAA (on Linux), Dell (on Linux), Austrian Railways, BACS, Bavarian Police Force, e-Spatial, Navision a/s, North Rhine-Westphalia Police Force, Orange Denmark, South African Police Service, Syngenta, Top Toy, Wolfe Group, etc. From what I know there are more than 800 customers who are implementing RAC and more than 250 of them are live. If you follow the news, recently SAP certified RAC for their suite (you just cannot get more OLTP than that). Some pilot projects have already started with mutually agreed-upon customers running SAP solutions on Oracle9i.
According to Dell and the FAA (and those guys are really 24x7) they did not have any issues with 9i RAC stability on Linux.

in-house applications migration to 9i RAC

A reader, March 14, 2003 - 3:38 pm UTC

Hi

We have several in-house designed applications based on Oracle Developer 2000, Pro*C and many PL/SQL packages, procedures and functions. We are running all this on 7.3.4. We are considering migrating to 9i, and a couple of them to RAC. I have been doing heavy research on RAC regarding migration issues, advantages and disadvantages. Of course, everything I have found -- except one whitepaper -- says RAC is wonderful, but none of them actually talk about the issues you should consider. The whitepaper by Mr James Morle shows RAC can be much slower than single instance databases.... I even did a heavy search on Metalink looking for migration issues we should watch for but found nothing.

Do you know any whitepaper which Oracle Corp. has released stating some migration issues we should have in mind? Is RAC totally 100% transparent to applications?

Thanks in advance

Tom Kyte
March 15, 2003 - 8:53 am UTC

don't read whitepapers -- which are snippets of information

read the documentation:

http://docs.oracle.com/docs/cd/B10501_01/nav/docindex.htm#index-REA

there is a performance/tuning guide which calls out some things you want to be aware of.

Everything, when used inappropriately, can run really slow. I can show you two applications that do the same thing but do it differently. One will run really slow, one really fast. Everything will affect you.

The thing with RAC is that a bad programming choice that affects you marginally on a single instance may well be magnified on RAC (or vice versa). So something that runs "ok" on a single instance may fall apart on RAC (and vice versa)

RAC and OPC

A reader, March 15, 2003 - 9:33 am UTC

How is RAC different(or similar) from OPC? What is the difference between RAC and OPC?

Tom Kyte
March 15, 2003 - 12:44 pm UTC

I assume you mean OPS -- Oracle Parallel Server.

It is all about the way in which cache fusion works in a write/write conflict.


see
http://docs.oracle.com/docs/cd/A91202_01/901_doc/rac.901/a89867/whatsnew.htm#983402



RAC vs OPS

GVN, March 16, 2003 - 9:45 pm UTC


9i RAC vs 8i Advanced Replication

Alvin, June 23, 2003 - 3:02 am UTC

Is RAC a newer, more robust version of 8i's Advanced Replication (AR)?

Or is it totally different ?

Tom Kyte
June 23, 2003 - 8:16 am UTC

RAC is a newer, more robust OPS.

it is not replication, not close.

see
http://docs.oracle.com/docs/cd/B10501_01/rac.920/a96597/toc.htm




RAC versus OPS

A reader, July 17, 2003 - 4:37 pm UTC

"OPS is not RAC, you had to design your applications for OPS (and we always said
that - so what you had was a poorly designed OLTP app on OPS)."

Thanx Tom,
I take it that if your application is well developed and tested, etc., we don't need to run it on RAC for development. For example, if deployment uses MTS, we should also develop on MTS. If deployment is on RAC, then from the above, can I conclude that we don't need to develop on RAC as well?

Thanx!!!



Tom Kyte
July 17, 2003 - 5:17 pm UTC

you would most certainly want to develop on RAC if you intend to deploy on RAC.

thanx tom!

A reader, July 17, 2003 - 6:35 pm UTC

Would you be kind enough to give some examples where
things may go wrong if a RAC is not being used in development and testing but gets used in production?

Is it common for people to develop on RAC clusters
in such cases? How easy or difficult is it for
a developer to set up a RAC configuration?

Thank you!!

Tom Kyte
July 17, 2003 - 9:09 pm UTC

would you develop, test on windows and go production on HP/UX?

it is just *different* code, the potential for something to be different exists.

Re: RACs

A reader, July 17, 2003 - 10:49 pm UTC

well, this is what the RAC documentation says

</code> http://download-west.oracle.com/docs/cd/B10501_01/rac.920/a96597/psintro.htm#10947 <code>

"Transparency
The concept of transparency implies that Real Application Clusters environments are functionally equivalent to single-instance Oracle database configurations. In other words, you do not need to make code changes to deploy applications on Real Application Clusters if your applications ran efficiently on single-instance Oracle configurations.
"

This seems to imply that you could develop and tune in
one instance and then deploy it on RAC.

Thanx!


Tom Kyte
July 18, 2003 - 1:45 am UTC

go for it -- if you like risk. if you like inflicting pain on your end users. if you like looking "not good"

think about it. would you really develop on platform X and then without testing deploy onto Y?

(common sense must figure into the equation - I am a "risk averse" person when it comes to some things -- like testing, development, deploying a production application. You sort of want to make sure things will actually WORK before you inflict them on your end users)

RAC

Tim Hall, July 18, 2003 - 5:25 am UTC

I've been using RAC on Tru64 in development and production for a few months now and it's great. We've not had to do any RAC-specific modifications and it's been really stable.

If cost is a concern you should move to Linux. That way you can build a development RAC using FireWire and save yourselves loads of cash :)

Cheers

Tim...

Tom Kyte
July 18, 2003 - 8:47 am UTC

and again, if you test on something totally different from production -- be prepared for long days, longer nights, lots of gnashing of teeth and mad people.

Applications on RAC

A reader, October 06, 2003 - 8:55 am UTC

Hi Tom,

We are developing a Java-based application involving J2EE architecture and Oracle 9iAS, to be deployed on Oracle9i RAC.

Recently, while testing, we came across a few limitations, like global application context cannot be applied on RAC. Similarly, we cannot use cached sequence numbers. Can you please help me by giving some details or a link on what limitations there are when using RAC with Java based applications.

Thanks and Regards

Tom Kyte
October 06, 2003 - 10:29 am UTC

why cannot you use cached sequences?

Wrong wordings

A reader, October 06, 2003 - 10:47 am UTC

Hi Tom,

My mistake, I used the wrong words in framing the statement. What I meant was that one node breakdown can cause the loss of unused numbers in the sequence, as the cache option default value is 20. And after going through the following links I was sure that it was not a good idea.

http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=FOR&p_id=421566.996
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_id=1031850.6&p_database_id=NOT

Anyway, I am still looking into what limitations there can be for TAF. I just found the following link.

http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=97926.1

Can you suggest some more links on this issue and how to handle such problems, if possible? We think we need to do some rework in changing our application.

Thanks and Regards

Tom Kyte
October 06, 2003 - 11:02 am UTC

so? sequences are just surrogate key generators -- you can lose hundreds of thousands of values per second and still not exhaust one in trillions of years!!!! don't worry about that -- you only want to use cached sequences (use a big cache, it'll feel better and run faster)

you will lose sequence values NO MATTER WHAT -- cache or nocache -- NO MATTER WHAT.


treat sequences as what they are -- a way to generate unique keys. The number they return has no meaning as far as "age", "order", etc....
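For illustration only -- a minimal sketch of what a large sequence cache looks like (the sequence name here is made up):

SQL> create sequence order_seq cache 1000;
SQL> select order_seq.nextval from dual;

Each instance in the cluster grabs its own range of 1000 cached values in memory; an instance crash simply throws away whatever was left of that instance's range, which is exactly the "lost numbers" behavior described above.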



what "problems" -- i see a note on the limits of TAF -- but what "problems" do you see?


it is very hard to guess what you "want"




Limitations again

A reader, October 07, 2003 - 3:35 am UTC

Thanks Tom, for clearing the doubt on sequences.

I think I am not able to explain it properly, so let me try to explain it again. We are using Java based applications on J2EE architecture involving JSP, Struts, JS etc. as said earlier. We are using 9iAS as the application server and Oracle9i RAC as the database.

In our application we are using OID authentication during user login, which will use user_context and application_context to assign roles, roleid, userid, siteinfo, siteid, locationinfo, locationid, machineinfo, clientip, osuser etc. to the user who has logged in whenever a connection is made. We will be storing this information at the application server level. This information plays a major role in our application for inserting data, which involves tracking who has inserted what data (this is our major concern).

Now some of this information, like sid, serial#, session information etc., will be instance specific and will all be lost in case a TAF occurs due to NODE failure. So what I was looking for is some way in which we could trap the message coming from the NODE which has gone down and, using that message, make the application fire the transaction again from the beginning (which involves assigning user_context and application_context again) without user intervention -- or maybe we could continue from where the connection was lost. I am a little confused at the moment about how this can be done on RAC.

Or one other thought is to treat RAC as a single NODE machine and let the application behave as it would have done on a single NODE machine. Forget TAF and use RAC just for scalability and load balancing.

Therefore I was looking into what other such limitations there are.

I hope I am able to explain my problem this time...

Thanks and Regards.



Tom Kyte
October 07, 2003 - 8:05 am UTC

this isn't really a "limitation" -- it is just the "way it works".

are you using a connection pool?

Connection Pooling is there...

A reader, October 07, 2003 - 8:34 am UTC

Yes, we are using Java connection pooling.

Tom Kyte
October 07, 2003 - 9:07 am UTC

so, you cannot cache this information in the session -- you'll have to cache it in the middle tier state. Every single call will get a different sid/serial# anyway (RAC or not).

This has nothing to do with RAC now -- you have to look at what connection pooling is going to mean. You are going to migrate from session to session anyhow. You will be "TAF'ing" in effect with each and every "get me a connection" call in java.

same as above

A reader, October 07, 2003 - 11:23 am UTC

We have come up with a possible solution; I would like your comments on this.

What we are planning is to have an EJB session bean at the application server level which will pass the user context details (excluding the sid, serial# etc., which won't be needed as we just need to know who is inserting what data) to the connection pool every time a connection is requested.

It is going to be like this.

Through OID we get authenticated (that's the initial and one-time job), then go to the web tier, which will capture the user information like userid and ipaddress (can be a false IP if the user is behind a firewall or proxy) and pass this information on to the EJB session bean.

This EJB session bean will pass through a number of Java classes collecting the rest of the necessary information, other than the userid and ipaddress, to build a statement (like a select, insert, update or delete statement) based upon the user input on the screens presented by the web tier, and store a thread id in a hash table at the application server which will be removed after the connection is returned to the pool.

This statement will then make a connection, which can be a pooled connection or a normal connection based on the data-source.xml file.

So what we have here is all the necessary information regarding the user details to be inserted in the tables as well. So now it does not matter if a NODE goes down, as we will be sending all the information even before the connection is picked from the connection pool, and we will be sending the info every time.

Earlier we were planning to use the global application context to store the user information.

So what I wanted to confirm was: now that the information will be passed every time, will TAF take place the same way it happens for a normal client-server transaction?

Thanks and Regards

Tom Kyte
October 07, 2003 - 11:30 am UTC

not sure where any TAF concerns come into play in this entire scenario? Yes, TAF will just be TAF in this scenario.

(i can say it sounds like a ton of work just to run a query that could be done really effectively in a stored procedure -- hint hint hint. make sure you haven't gotten enamored of a technology to oversolve a simple problem)

:)

A reader, October 07, 2003 - 1:44 pm UTC

Thanks Tom,

Actually, we are trying to build this application as a product so that in case the client has a database other than Oracle we could use the same application without much rework. You could say we are not exploiting the power of Oracle, as using stored procedures would make it Oracle specific. So we are trying to keep the business logic at the application level as much as possible. Our current client is using RAC and 9iAS. We are also going to test this on RDBMSs other than Oracle. Let's see where we land up.

Thanks and Regards

Tom Kyte
October 07, 2003 - 1:57 pm UTC

you'll land up spending all of your time trying to make the thing work -- that's my 2 cents' worth.

You can either run really well on one

Or
Modestly OK on one and really bad on the rest.



I think the BEST way to achieve portability is to put ALL logic into stored procedures using the BEST practices of each vendor. Databases are not created equal -- we have different (radically so, radically so) concurrency models, locking, transaction semantics, FEATURES.

Good luck.
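Purely as an illustration of the "logic in stored procedures" idea, and tied to the who-inserted-what requirement discussed above -- the table and column names below are made up:

create or replace procedure add_order( p_cust_id in number, p_amount in number )
as
begin
    -- one place that records the row plus the "who did it" audit columns
    insert into orders( cust_id, amount, created_by, created_on )
    values ( p_cust_id, p_amount, sys_context('userenv','session_user'), sysdate );
end;
/

Each vendor's version of such a procedure can then use that database's best practices internally, while the application simply calls add_order.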

Thanks i need that :)

A reader, October 07, 2003 - 2:20 pm UTC

Thanks for your valuable suggestions. I'll definitely keep that in mind and see how much of it we can implement.

Regards

Real Application Clusters

S.Darshan, October 08, 2003 - 8:24 am UTC

Thanks for your support in providing useful technical information on this site.

I have referred to Metalink document id 175465.1 for RAC, URL as follows,

http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=175465.1

"<quote>2.7 Installing a Volume Manager

It is now necessary to install volume management software. Sun Cluster v3 supports two products:-

Sun's Solaris Volume Manager software (Solstice DiskSuite software) or VERITAS Volume Manager (VxVM) software v3.0.4+ (32-bit RAC), v3.1.1+ (64-bit RAC) is needed to provide shared disk access and distributed mirroring.
Although Sun's Solstice DiskSuite (SDS), integrated into Solaris Operating System (SPARC) 9 onwards as Solaris Volume Manager (SVM), is supported by Sun Cluster v3, neither SDS or SVM supports cluster wide volume management, it is only for per node basis. Hence these products cannot be used for RAC.</quote>

My question is: should we use Sun's Solaris Volume Manager for RAC in a two-node configuration? What is the difference between the above two volume managers? Can you please give some documentation or a link for Sun Clusters?

With best regards
S.Darshan




Tom Kyte
October 08, 2003 - 10:49 am UTC

you would have to evaluate sun vs veritas with sun and veritas.

either product is capable of serving your basic needs (that of RAC)

you will want to evaluate the products based on your specific needs/wants however.

Is 10g the new RAC ?

Neil, October 09, 2003 - 6:33 am UTC

Would it be a good thing to get hold of 10g or implement RAC?

Tom Kyte
October 09, 2003 - 4:59 pm UTC

well, 10g isn't out

9iRac is

sooooooo, I'd be getting 9i RAC now, and 10g sometime next year.....

10g is the next release of Oracle. RAC is an option of the 10g database. 10g encompasses some 360-odd new bits and pieces of functionality -- much of it in the manageability arena

A reader, November 18, 2003 - 10:41 am UTC

Tom,

From RAC Concepts

<DOC>
The Global Cache and Global Enqueue Services maintain information about each resource within a cluster. The GCS and GES nominate one instance to manage all information about a particular resource. This instance is called the resource master.
</DOC>

If I have 4 instances, how can I determine which instance is the resource master?


Thanks.



Tom Kyte
November 21, 2003 - 7:57 am UTC

every machine is a resource master of some resources - the resources are spread across all nodes.

A reader, November 21, 2003 - 11:01 am UTC

Tom,

Sorry for not understanding your answer; I am trying to understand this quote:

"The GCS and GES nominate one instance to manage all
information about a particular resource. This instance is called the resource master"

Do they mean that one instance will be the "resource master"
for all the instances?

If yes, then if I want to change the resource master to some other instance, how can I do that?

Thanks.




Tom Kyte
November 21, 2003 - 5:00 pm UTC

a PARTICULAR resource

not

ALL resources



RAC - How to setup

Sudhir, November 29, 2003 - 3:52 pm UTC

Tom,

I am hoping you or some of your readers have a document on how to set up RAC on Windows 2000 (one CPU), on a single machine.
My PC is a Dell with 300MB of RAM running W2K. I have 2 disks. I would very much like to install RAC and see for myself how it works.

Thank you in advance for any help.

Tom Kyte
November 29, 2003 - 4:35 pm UTC

you need two computers to run RAC

it is a clustering thing, you need more than one computer.

Instead of sequences in RAC environment

Sudhir, November 29, 2003 - 4:00 pm UTC

Tom,

Can you see the effectiveness of using

instance id + TIMESTAMP (with 9 digits of precision) as a sequence
number? I mean, what is the chance that two timestamps will match? And if they do, I am sure that's not the only thing in the PK, correct? You have account numbers etc. as well.

Maybe add a bit of random number at the end to spice things up?

Thanks for your thoughts

Tom Kyte
November 29, 2003 - 4:35 pm UTC

no, i would just use a sequence.

RAC on single node

Sudhir, November 30, 2003 - 6:26 am UTC

Tom,

I am disappointed to know I have no way to do that for w2k. It seems like some folks have done that with Linux

http://www.polarisdb.com/rac.htm

As usual thanks for the great information!

Tom Kyte
November 30, 2003 - 8:22 am UTC

you are not on linux and you have a machine that is somewhat underpowered to run windows let alone RAC!

and even though you can "in theory" get rac going on a single linux box, I've never ever seen the point. You cannot really learn very much from it.

Thanks for the question regarding "Real Application Clusters", version 8.1.6

Yemi Ayofe, December 09, 2003 - 6:38 pm UTC

Good to know RAC can technically be implemented on one machine. Though, as Tom said, I do not see the point of it. I would think 3 is more like it: two to hold each instance, and the third for your storage, i.e. the disks where the Oracle database is.

RAC test cases needed!

reader, December 25, 2003 - 9:03 pm UTC

Tom,

I was hoping you could help me set up some test cases for RAC. I have RAC working on RH Linux. Now comes the hard part of actually doing something with it.

I would like a small test harness (SQLs) that you would run from the node(s) to see/understand what is really happening in the clustered environment.

For example, I hear sequences may/may not be a problem.
Even if you just give me ideas I will be very glad.

Thanks

Tom Kyte
December 26, 2003 - 10:06 am UTC

you have it backwards.

you build a test case to prove something. an idea, a hypothesis....

you need to have a goal in mind.

you can't really just take a bunch of test cases, run them and glean anything from that.

I would just take a REAL application, something you already wrote -- that exists, that is used. Host it on the RAC database. Test it. See how the "real thing" performs. Then, if you find issues build test cases to test out the ideas as to "why" the issue is happening.

A reader, December 29, 2003 - 1:11 pm UTC

Tom,

Recently somebody asked me this question: "How can you find out the number of nodes that are participating in a clustered environment?"

My answer was to use the srvctl command; is this right? And/or are there any other commands that we can use?

Thanks.

Tom Kyte
December 29, 2003 - 1:17 pm UTC

SQL> show parameter cluster_database_instances
SQL> select * from gv$instance;

would be two that popped into mind first. 

RAC and DW

Arun Gupta, December 29, 2003 - 1:29 pm UTC

Tom,
Since hard parsing is unavoidable in a DW, and in RAC library cache operations are coordinated among all nodes, would RAC be a good choice for a DW?
Thanks...

Tom Kyte
December 29, 2003 - 1:33 pm UTC

one of RAC's "sweet spots" is a DW.

hard parsing in a DW is not an issue -- you parse queries that are destined to run for many seconds/minutes/hours.

you are not parsing a thousand queries per second here.

resource

Reader, January 14, 2004 - 9:14 pm UTC

<quote>The Global Cache and Global Enqueue Services maintain information about each resource within a cluster.</quote>

Does Oracle save the Global Cache and Global Enqueue resource information on disk? If so, in which tablespace (system?) datafile? Thanks.

Tom Kyte
January 15, 2004 - 3:13 am UTC

no, does not.

resource info in RAC

reader, January 15, 2004 - 10:05 am UTC

If it does not save the block status info on disk, then if one instance fails, how does the other instance reconstruct the resources managed by the failed instance? Thanks for your valuable time.
Have a great time abroad.


Tom Kyte
January 15, 2004 - 10:29 am UTC

another instance takes over for it.

instance owned redo files

Jianhui, January 15, 2004 - 5:31 pm UTC

Tom,
In a RAC environment, when a single instance crashes, since its own redo file set is not in the shared storage (like the datafiles and control files), how can other instances take over the redo data of the crashed instance, and how can failover then happen without data loss?

Another question: if the interconnect has a problem, each node will think the other nodes are dead and try to compete to take over the database. What mechanism ensures that the first instance gets the right to do so? I have heard of the voting disk thing, but I am not so sure -- could you explain?

Thanks as always

Tom Kyte
January 16, 2004 - 1:07 am UTC

they are in a shared environment -- it is just that they are not "shared" -- each instance has its own, but they are in an environment where any other instance can in fact access them.

if the interconnect dies, the entire cluster is pretty much going to go with it. the entire thing will be dead.

redo log files should be on the shared disk

reader, January 15, 2004 - 6:44 pm UTC


Global_dbname

Reader, January 15, 2004 - 9:59 pm UTC

Tom,

(1) Is it true that global_dbname should not be set in the listener.ora in order for srvctl to manage the instances/database in a RAC environment?
(2) Also, for OEM to discover nodes, must the SID_NAME be explicitly set in the listener.ora file? If so, does it disable the dynamic registration of instances with the listener?
Thanks.

Tom Kyte
January 16, 2004 - 2:47 am UTC

1) depends on how you want to do it. it can be set, it need not be set. from the docs:

....
Note: In Real Application Clusters environments, the GLOBAL_DBNAME parameter disables connect-time failover or transparent application failover (TAF). Oracle Corporation strongly recommends against adding this parameter to your listener.ora file.
..........


2) I'm not an OEM setup "wizard" but I believe if the intelligent agent is set up, it matters not what is in the listener.ora file. Perhaps someone else reading this could confirm (but it would be an easy question for an iTAR)
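For what it's worth, a typical 9i client-side TAF entry looks something like the sketch below (the host names and service name are invented); the point of the note above is simply that you let the instances register the service dynamically with the listener instead of hard-coding GLOBAL_DBNAME:

PROD_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
      (LOAD_BALANCE = ON)
      (FAILOVER = ON))
    (CONNECT_DATA =
      (SERVICE_NAME = prod)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 5))))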

Datafile timestamp is diff between OPS nodes

Sami, January 18, 2004 - 9:22 pm UTC

Dear Tom,
Thanks for your marvelous support and help.

I have an OPS (2 node) environment where the datafiles are stored on raw disk. I checked the datafiles' timestamps from both nodes and they are totally different.

Just curious to know why is it like this?


#############
NODE-1
#############

$ cd /dev/vx/rdsk/dbdg3

server01 user1 PROD11 /dev/vx/rdsk/dbdg3
$ ls -ltr
total 0
crw-r--r-- 1 oracle dba 159,92003 Dec 16 13:16 NonCrm_i12.dbf
crw-r--r-- 1 oracle dba 159,92002 Dec 16 13:16 NonCrm_i11.dbf
crw-r--r-- 1 oracle dba 159,92001 Dec 16 13:16 Crm_i12.dbf
crw-r--r-- 1 oracle dba 159,92000 Dec 16 13:16 Crm_i11.dbf

server01 user1 PROD11 /dev/vx/rdsk/dbdg3
$

#############
NODE-2
#############
$ cd /dev/vx/rdsk/dbdg3
server02 user1 PROD12 /dev/vx/rdsk/dbdg3

$ ls -ltr
total 0
crw-r--r-- 1 oracle dba 159,92003 Jul 5 2002 NonCrm_i12.dbf
crw-r--r-- 1 oracle dba 159,92002 Jul 5 2002 NonCrm_i11.dbf
crw-r--r-- 1 oracle dba 159,92001 Jul 5 2002 Crm_i12.dbf
crw-r--r-- 1 oracle dba 159,92000 Jul 5 2002 Crm_i11.dbf

server02 user1 PROD12 /dev/vx/rdsk/dbdg3
$

Tom Kyte
January 19, 2004 - 9:25 am UTC

those are links, right?

not knowing your OS or the nuances of your file system -- i'd say "those look like the dates the links were created".

high availability

SR, January 19, 2004 - 9:44 pm UTC

Tom,

In a RAC environment, or even a regular setup, I find one thing a little bit surprising about the way Oracle has done it. I mean, if you lose a multiplexed redo member, the database stays up, it just warns you. Really classy. Now, I have four controlfiles, one of them gives an error, and the whole thing falls down. What's with that? Considering RAC, this SPOF is even worse because you will bring down all nodes.

So my curiosity is: if the log writer is willing to take a chance on the remaining member, why can't the same be said about the control file? It's not even that you are losing any data. You can recreate the controlfile completely as well. So where's the catch? I would like to see a failsafe controlfile setup. We had a situation once when a Veritas checkpoint held the controlfile hostage a bit too long, which panicked a perfectly fine database and aborted the instance.

Obviously, if performance vs. safety is a concern and the customer wants to do it this way: do you think it would be reasonable to allow Oracle-based mirroring of the system tablespace? You can do online recovery. If system is down you are again stuck. I know the same can be said about UNDO, but before going too far -- realistically, if we have an 8 node RAC cluster, adding a bit of extra safety on these SPOF points wouldn't cost that much over the long run -- especially if Oracle does it internally..

Thanks

Tom Kyte
January 20, 2004 - 8:12 am UTC

controlfiles are key to our operation -- a failed write on the control file is considered "down database", something really bad has happened. You are losing data -- the control files tell us about the state of our system -- where everything is, what point in time they are, everything.

Tom, my main concern

SR, January 20, 2004 - 7:44 pm UTC

1. RAC - All nodes will go down if one controlfile write/read fails? How can that be justifiable?

Now a follow up on that. What is the point of multiple controlfiles then? Please explain to me what we are truly gaining from multiple controlfiles. (I do understand the disk failure, controller problem, etc. reasons.) But suppose, each and every time you were to lose a controlfile -- as long as you have a backup controlfile created (for argument's sake) at each structural change and each hour -- you won't really lose any data, would you?
So, what are we buying with the extra controlfiles? Convenience -- so that you can copy a good controlfile to the failed location?

I am just not understanding how LGWR, being so much busier and dealing with so much critical data, can take a hit on a log file, but the same cannot be said of the controlfile.

I apologize for pushing this issue but, if my employer is going to RAC and it still has one point of failure (the cluster volume manager being one already), I want to understand more about it.

Thanks

Tom Kyte
January 20, 2004 - 10:07 pm UTC

you don't have to use a cluster volume manager.

the control files are the eyes of the database -- without them we are blind. we don't even know what files constitute the thing.

controlfiles

Reader, January 21, 2004 - 11:29 am UTC

My understanding is that if you lose one controlfile during database operation, Oracle would function with the remaining mirrored controlfiles. Otherwise, what is the purpose of multiplexing the controlfiles? Only if the write operation failed for all of the controlfiles would the system crash. Thanks.

Tom Kyte
January 21, 2004 - 3:08 pm UTC

they are the eyes. they are important. a failed write shuts down.

controlfiles

Reader, January 21, 2004 - 4:05 pm UTC

<quote>a failed write shuts down</quote>

Tom, do you mean failed write on one controlfile (even though Oracle wrote successfully on the remaining controlfiles) would shutdown the instance? Thanks.

Tom Kyte
January 21, 2004 - 5:58 pm UTC

don't do this at home but:

$ echo x > /home/ora9ir2/oradata/ora9ir2/control03.ctl
$ tail -f alert*


***
Corrupt block relative dba: 0x00000001 (file 0, block 1)
Completely zero block found during control file header read
Wed Jan 21 17:58:51 2004
Errors in file /home/ora9ir2/admin/ora9ir2/bdump/ora9ir2_ckpt_16456.trc:
ORA-00202: controlfile: '/home/ora9ir2/oradata/ora9ir2/control01.ctl'
Wed Jan 21 17:58:52 2004
Errors in file /home/ora9ir2/admin/ora9ir2/bdump/ora9ir2_ckpt_16456.trc:
ORA-00227: corrupt block detected in controlfile: (block 1, # blocks 1)
ORA-00202: controlfile: '/home/ora9ir2/oradata/ora9ir2/control01.ctl'
CKPT: terminating instance due to error 227
Instance terminated by CKPT, pid = 16456






further question on control file mirror

SR, January 21, 2004 - 10:41 pm UTC

Tom,

Let me ask you: would you wish for a feature where the database won't abort when at least one controlfile is functioning properly and the others are not? I understand what happens now. But that's not what Oracle is always about, is it? All those new features of 9i make you want to believe Oracle must have some such plan in the making.

In your earlier comments you mention not using a cluster volume manager. I don't understand that. We have a Veritas infrastructure. We will use CVM, CFS, ODM and RAC. You cannot drop CVM - that's the heart. Am I missing something?

Maybe in an early version of Oracle an error in one redo member may have caused the db to crash as well -- did it? I don't know. If so, we have come a long way. I hope that with 10g a fault tolerant control file will be a very welcome addition.

Thank you

Tom Kyte
January 22, 2004 - 6:32 am UTC

you need the clustering software to cluster the OS's in 9i (not in 10g and above) but you don't need to cluster the filesystems if you don't want to. Raw will do.

How does RAC recover resource GCD info?

reader, February 05, 2004 - 8:43 pm UTC

Tom, it is not clear from reading the doc how the surviving instance knows about the resources mastered by the failed instance, as the resource info is not stored in the database. The Global Cache Directory resides in memory, and once the instance crashes, there goes the resource info as well. It is not documented clearly how it is done. Is it proprietary info for Oracle, or do DBAs not need to know it in depth to manage RAC? I know you must know all of this -- can you share? Thanks for your time.

Tom Kyte
February 06, 2004 - 8:35 am UTC

just internal stuff, we don't need to know really. we only need to know "it works as advertised"



A reader, February 06, 2004 - 6:22 pm UTC

Tom,

Let's say I issue the same command as yours against a datafile; will it corrupt one block or the entire datafile?

$ echo x > /home/ora9ir2/oradata/ora9ir2/users01.dbf

Thanks.


Tom Kyte
February 07, 2004 - 2:20 pm UTC

maybe, maybe not.

You see, if an Oracle process has users01.dbf open -- it will stay "open" and the echo will just unlink (remove the directory entry of) the file and create a new file. The Oracle process will still "see" the old file -- until it closes it and opens it again.

So, it might take a long time for this to "appear" to happen and when it does, the file will just be "x" (1 byte) long.

it would corrupt the "entire file" effectively.

CFS vs Raw

reader, February 19, 2004 - 6:35 pm UTC

Tom, what is the difference between CFS and raw partitions in RAC? Which one is prefered? Thanks.

Tom Kyte
February 19, 2004 - 7:31 pm UTC

cfs gives you "a file system". it makes people "feel good" because raw is perceived as "scary" and "hard".

CFS adds a tiny bit of overhead -- single digit percents -- but if you feel more comfortable with a file system over raw -- it'll be what you want.

RAC versus standalone database

A reader, April 08, 2004 - 1:22 pm UTC

Hi Tom
If I want to know whether a particular database instance is mounted on a cluster or is a standalone instance, does
the following query do the job?
(9.1.0.4 or 9i r2)
select parallel
from v$instance;

If we get "YES" then it is a RAC else it is standalone.

Thanx!


Tom Kyte
April 08, 2004 - 3:37 pm UTC

show parameter instance_number

would be another (0 = no rac, <>0 = rac)

but yes, that'd do it as well -- your query.
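Just to illustrate, a quick sanity check combining both approaches (nothing here beyond the view and parameter already mentioned in this thread):

SQL> select instance_name, instance_number, parallel from v$instance;
SQL> show parameter cluster_database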

arch

A reader, April 10, 2004 - 1:15 pm UTC

Sometimes on node 2, archive files with the thread number of node 1 appear. Why?

Tom Kyte
April 10, 2004 - 6:13 pm UTC

http://docs.oracle.com/docs/cd/B10501_01/rac.920/a96597/psintro.htm#11668

<quote>
If you operate Oracle in ARCHIVELOG mode, then when a log file is full Oracle converts it to an archive log file before overwriting the log file with information. In Real Application Clusters, each instance automatically archives its own redo log files, or one or more instances can archive the redo log files for some or all of the instances in the cluster database.
</quote>

all of the redo is available to all of the instances. anyone can archive it.
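One place you can see the thread (instance) number surface is in the archived log file names -- a hedged init.ora sketch (the destination path and file name pattern are made up; %t and %s are the standard thread and log-sequence placeholders):

# each instance stamps its own redo thread number into the archived log name
log_archive_dest_1 = 'location=/u01/arch'
log_archive_format = 'arch_%t_%s.arc'   # %t = redo thread (instance), %s = log sequence

So an "arch_1_*" file appearing on node 2 is simply node 2 archiving (or holding) redo that belongs to thread 1.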

Private SQL Area

Jayesh, April 12, 2004 - 2:49 am UTC

What is the private SQL area? If there is no sharing of the private SQL area, how is it going to be an advantage to the DB?

Tom Kyte
April 12, 2004 - 7:15 am UTC

this has to do with RAC how?

arch files

reader, April 14, 2004 - 11:41 am UTC

What is the best recommendation for archive files? If each node in RAC has its archive files stored on its own disks, how would other instances know about these archive files in case another instance needs to do media recovery? Thanks.

Tom Kyte
April 14, 2004 - 2:08 pm UTC

I'll refer you to the RAC admin guide -- it covers this in great detail discussing all of the options (cluster file systems, nfs, etc)

http://docs.oracle.com/docs/cd/B10501_01/rac.920/a96596/intro.htm#1025908

rac and dataguard

reader, April 19, 2004 - 6:46 pm UTC

If my primary database is in a RAC environment, is there any limit on the distance at which I can keep my standby database, which may or may not be another RAC? Thanks.

Tom Kyte
April 19, 2004 - 7:39 pm UTC

no, just practical ones depending on the mode and transaction throughputs. hundreds, thousands of miles -- works.

Any Suggestions!!!!!!!!!!!!!!

A reader, May 18, 2004 - 4:12 pm UTC

Hi Tom,
We are presently running 9iR2 on Windows 2000 Advanced server. We are thinking of shifting the existing database to a RAC environment. Is it possible to convert the existing single instance database to RAC with around 2-3 servers/nodes? Also, we run about 10 procedures using JAVA on the present server. So we thought to divide the load onto 2 Servers/nodes in RAC, for e.g. 5 JAVA procedures on one node and another 5 on another node. Is it possible and feasible? Also it is an OLTP environment.
Thanks in advance.


Tom Kyte
May 19, 2004 - 7:29 am UTC

yes, it is possible and done.

is it possible to run "5 clients against node 1" and "5 against node 2" -- sure, absolutely. You can either use load balancing and let us decide, or you can specifically say "you 5 go here, you 5 go there"

Thanks Tom!!!!!!!!!!!!

A reader, May 19, 2004 - 9:28 am UTC

Thanks for the prompt reply. That sure was a relief to know that it can be done. As for the load balancing more on that in its thread.
Thanks again..............

Real Application Component

Jay, May 25, 2004 - 11:12 pm UTC

Do I have to install any additional software for working with RAC other than Oracle 9i Enterprise Edition (Windows)? I do not have srvctl and srvconfig. What else is required for implementing RAC?

Tom Kyte
May 26, 2004 - 7:35 am UTC

Yes, you need to have licensed RAC...

You will need clustering software in 9i as well -- in 10g, the only software you need is:

o Oracle SE comes with RAC standard. You use ASM for managing disk, you use our clustering software to cluster.

o Oracle EE has RAC as an OPTION. You may choose to use ASM or not, you may choose to use our clustering software or a third party's software.

Resource Manager and RAC

Vivek Sharma, May 26, 2004 - 11:58 am UTC

Dear Tom,

We are planning to migrate our Oracle 9205 database to RAC in 2 months' time. It will be a 2-node cluster. The database uses shared server connections for the OLTP applications and dedicated connections for some batch processes. The current box (assume A) is able to handle all of these connections, approx. 6000 of them. But when we move to RAC, the other box (assume B) will be slightly smaller than the current one and might not be able to serve 6000 connections. I have a question to ask about the resource manager on RAC. Out of the 6000 connections, 500-800 are dedicated.

When we configure RAC, the connections will be distributed across the Instances.
Assume Instance A has 3000 Connections and
Instance B has 3000 Connections.

Suppose Instance A crashes; the load on Instance A will fail over to Instance B. But since Instance B is of slightly lower capacity, I want some sort of restriction so that sessions doing batch processing are not able to connect to Instance B when the load on this instance is full. Also, user XYZ can have 2 sessions, one doing OLTP and the other doing batch. So how do we avoid a connection to X for batch?

Please suggest some solution.

Regards
Vivek






Tom Kyte
May 26, 2004 - 3:32 pm UTC

have you thought about not doing that? not having an unbalanced cluster? it will make it much harder to tune, configure, manage.


but you would use the tnsnames.ora entry for that.

the "oltp" connect string would load balance.
the "batch" connect string (with server=dedicated) would specify only host "a"

RAC

Jayesh, May 27, 2004 - 12:34 am UTC

I have installed Oracle 9i Release 2 (Windows) on my machine, but I do not have srvctl and srvconfig, which are utilities for starting with RAC. What else do I need to install to implement RAC as per the steps given in the documentation?

Tom Kyte
May 27, 2004 - 8:57 am UTC

you need to install RAC.

it is an option (separately licensed) to 9iR2 ENTERPRISE edition only. If you did not buy it, you do not have it.

Unbalanced performance in RAC environment.

Ian, June 22, 2004 - 10:07 am UTC

As part of our testing for some very large data loads (30 Million Rows spread across 20 odd tables) we stress tested on RAC as Production is on RAC.

We are loading via Oracle Applications APIs and have arranged it so we can run multiple batches in parallel.

When run on a single instance we found that the optimal number of batches was equal to the number of CPUs.

When we came to test on RAC we started with 2 batches - one on each side of the 2 Node RAC. We watched the performance (via v$session_longops which our code is instrumented for) and the performance was horrible. So we terminated the run and went to 2 batches on one Instance and none on the other and it flew.

But when we looked back at the first test we found an unusual result - between the two batches it had processed 1000 entities (equals approx 200,000 inserts) - but one Instance had done 970 entities and the other had done 30.

It was as if cache fusion was favoring one instance over the other. We have multiple free list groups and the inserts do not hit any sequences (they use sys_guid()).

Unfortunately I cannot provide tkprofs as we in development no longer have access to the RAC environments (They are used for UAT, Production and Support).

But I was wondering if you have come across this phenomenon in RAC before, or could explain it.

Regards

Ian

Tom Kyte
June 22, 2004 - 1:44 pm UTC

it is too hard to say "why" without any supporting information -- like at least a statspack from each node.

Rac and external tables

Jim, June 22, 2004 - 2:05 pm UTC

I was at Oracle World in San Fran. and one presentation had a shared disk across multiple servers and did the data load by external tables (9i). Since each instance across the RAC could see the file (external table) each instance parallelized the load and it was quite fast. Much simpler than trying to divide up the file etc.

(also was able to get a signed copy of Effective Oracle by Design)

Sequences and RAC

reader, July 01, 2004 - 6:40 pm UTC

1. Can we use a different sequence generator for each instance so that we can use different ranges and take care of index leaf block contention?
If yes, how can this be achieved, i.e. how can the same column of a table refer to two different sequences?

2. Using a multiplier to create different ranges of values by instance:
Primary Key = prod_sequence.nextval + instance_number*10000000

How can we access instance_number as a normal user?

(NB: I am reading the training material on RAC from Oracle)

Can you elaborate with some examples?

Thanks



Tom Kyte
July 01, 2004 - 8:15 pm UTC

reverse key indexes.

or have a two part key -- (instance_id,id)

dbms_utility.get_parameter_value will let you get instance_number.
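A hedged sketch of those three options (the table, column and index names below are made up):

-- option 1: reverse key index on the sequence-populated column
create index t_pk_rev on t(id) reverse;

-- option 2: two-part key, instance number plus sequence value
alter table t add constraint t_pk primary key (instance_id, id);

-- reading instance_number as a normal user
set serveroutput on
declare
    l_int  binary_integer;
    l_str  varchar2(256);
    l_type binary_integer;
begin
    l_type := dbms_utility.get_parameter_value( 'instance_number', l_int, l_str );
    dbms_output.put_line( 'instance_number = ' || l_int );
end;
/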

RAC and Replication

A reader, July 12, 2004 - 1:02 pm UTC

Hi Tom,
1) Is the configuration the same to set up multiple master site replication for RAC or regular 9i databases?

2) Using OEM, I connect to a specific instance of RAC to set up replication. Will the other instance of RAC also get the configuration?

3) For RAC, should I set global_names = TRUE or FALSE for replication?

I really appreciate your help.

Tom Kyte
July 12, 2004 - 8:54 pm UTC

1) well, you have more options -- but in general "the same", you can pick which node does the replication if you like.

2) yes, ddl sort of stuff affects everything. replication is a bunch of metadata stored in the dictionary.

3) global_names=true in general for everything, a must for replication.

Operation expansion

reader, July 13, 2004 - 10:49 am UTC

Not sure if this is in the correct thread.

We have plans to expand operations. We now have Prod and Test servers and are considering expanding to add DW and email services.

We would need to be able to link the 3 systems together, example:

Data that is old moves to the DW for archiving; when that data is needed at some future time for a future purpose, it is recaptured (restored) back to the original source. Keeping the part of the business that is conducted via e-mail (web services?) on an e-mail server, but being able to integrate that data into the Production data for reporting purposes, is just one of the plans that mgt is considering.

What would be the best way to setup these systems:
1) database links
2) Application Clusters
3) Web Services

We are using 9iR2 on a Windows 2000 Server OS for the Production and Test Servers.
Plans considered are purchasing additional hardware to meet the demands.

I'm looking for guidance as to which technology(ies) would be most appropriate and easy for the beginning stages, so I don't waste time researching more ambitious options.

thanks.

Tom Kyte
July 13, 2004 - 12:23 pm UTC

all three are viable.

"not waste time researching" is an oxymoron.


the classic approach would be #1.

an approach we are using here at Oracle ourselves is #2 -- RAC it, one database -- many instances.

#3 is just an api layer, good for communicating around to people you don't communicate with each and every day (open system api). perhaps not the most efficient way for you to talk to yourself (given that you have dblinks and other native database tools to move data).

followup

reader, July 13, 2004 - 2:14 pm UTC

perhaps I should re-phrase "wasting time researching".
what I meant was given the scenario that was presented and my not having done something like this before and management wanting me to develop, prepare and present a proposal in the coming months I was hoping that someone with your vast experience and knowledge could give me a good starting point so I could focus my efforts and energies towards that mission instead of spending alot of time looking at a plethora of possibilities, some that may be more than what is actually needed.That's what I meant by "wasting time".

I don't consider anything I've learned in Oracle to be a waste of time.

Could you please expand on the RAC approach:
You mean one database could be split into 3 instances (production, dw, e-mail)?
How would each of those instances impact each other as a whole?
Can they all be run at the same time?
What hardware specs would we need to develop such a system?

thanks again

Tom Kyte
July 13, 2004 - 8:04 pm UTC

with rac, there would be one database .. so the DW can easily just "insert into select * from oltp stuff" as your ETL

each instance would run on its own server -- with its own cpu, its own memory -- they would share the disk -- but since the dw typically reads/writes its own data (except when refreshing, moving data over) -- it would be separated from the oltp instance more or less.

additionally, you would be able to drill down from the DW into the OLTP data if you like, an added bonus.

the specs of the hardware would be entirely dependent on your size. could be a couple of small linux boxes, could be bigger -- impossible to size without a more intimate knowledge of the overall system (something impossible really to pick up in a forum like this)
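As a sketch of the "insert into ... select from the oltp stuff" ETL idea -- the schema/table names and the date filter below are invented:

-- single database, so the DW schema can read the OLTP schema directly
insert /*+ append */ into dw.orders_hist
select *
  from oltp.orders
 where order_date < add_months( sysdate, -1 );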

Application design for OPS env

sami, July 13, 2004 - 9:18 pm UTC

Dear Tom,

Very useful thread.

Reference to URL,
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:4663078177884#4673261113427

"OPS is not RAC, you had to design your applications for OPS (and we always said
that - so what you had was a poorly designed OLTP app on OPS)."

you had to design your applications for OPS - What do you mean by this?

Could you please describe what things should be considered while designing apps for OPS (apart from the fact that during failover the application code has to resubmit the request)?


Tom Kyte
July 14, 2004 - 11:18 am UTC

in OPS, in order for two instances to both modify a block at the same time, we had to ping the block to disk -- instance 1 would be notified by instance 2 that 2 wanted the block, 1 would write it out and 2 would read it in.

Not very "fast" if you did this lots.


in RAC, in order for two instances to both modify a block at the same time, we send the block back and forth over a high speed inter-connect, while not as fast as ram -- it is infinitely fast compared to disk.

So, in OPS, you tried to design to limit the pinging that would happen -- otherwise the application would not run, or would run very slowly. You would use data partitioning (dept=10 uses node 1, dept=20 uses node 2) or application partitioning (HR users use node 1, finapps users use node 2) in order to reduce/remove the contention.

Asif, July 14, 2004 - 9:37 am UTC

Tom,

Recently we had an issue where the database did not start up automatically after fixing a hardware failure, though we have auto startup scripts in place. According to the SA, the reason was that synchronization had not been completed by the cluster (hardware resource), which takes more than 5-7 minutes. And if during that time you try to start the gsd daemon, it fails, which results in srvctl not functioning.

The solution was to put the autostartup script to sleep for 8 or 10 minutes, so the db would come up properly.

Whats your take on this, can you please comment.

Thanks.

Tom Kyte
July 14, 2004 - 11:40 am UTC

what can i say? if the cluster is not up, and we rely on the cluster, until the cluster is up -- we cannot start.


followup # 2

reader, July 14, 2004 - 9:43 am UTC

thank you for taking the time to explain RAC.

a couple more questions based on your answer and what I've read from your book EOBD(page171+) about RAC.

We currently have a 28g Production db (considered small by Oracle standards). We currently outsource the Internet/E-Mail business operations (which is a fledgling venture expected to grow exponentially in the coming years) and want to bring that in-house for complete management control. Mgt wants to keep the Internet part of the business separate from the OLTP (Production db) operations... except when there are requirements to integrate some portion of the Internet data with the Production data for analysis.

The dw aspect is quite small for now (2 mil records), so I can store it in another table within the same tablespace (not a bona-fide dw operation with 100 mil's of records).

The main priority is the Internet/E-Mail operations.

In this case we would be looking at 1 database, 2 instances?

You explain that each instance resides on a separate server?
Where does the main database reside?

Sorry if these questions seem inane. I understand the concept of RAID (one box with several hard drives with data spread throughout; should one disk fail, the others pick up the load)... however, with RAC it is several boxes (depending on the number of instances) connected to a main box that houses the database?

Am I visualizing this correctly?



Tom Kyte
July 14, 2004 - 11:43 am UTC

RAC consolidates -- single database. RAC does not keep separate, RAC brings together.

if you used RAC and a single database -- you would use a SINGLE admin team for the entire thing (that's one of the benefits -- one team managing lots of stuff, instead of lots of teams managing lots of stuff)

one last point to followup #2

reader, July 14, 2004 - 10:14 am UTC

We are not looking for redundancy; we are looking at two distinct entities that require the ability to integrate for a specific purpose.



Tom...you wrote

reader, July 14, 2004 - 1:24 pm UTC

"each instance would run on its own server -- has its own cpu, its own memory --
they would share the disk -- but since the dw typically reads/writes it's own
data (except when refreshing, moving data over) -- it would be separated from
the oltp instance more or less."

On page 175(EOBD) figure 3-4 it shows 4 different nodes
using shared cache architecture using cache fusion.

So, architecture-wise, the instances are separate entities.
Now you say:
"RAC consolidates -- single database. RAC does not keep separate, RAC brings together."

I presume this means conceptually.

My question is: if there are 4 different nodes, where does the single database reside? And how is the cluster connected via cache fusion?

I saw in your book there's a RAC Concepts Guide so I'll
get my reading glasses out and start scouring the guide.





Tom Kyte
July 14, 2004 - 10:05 pm UTC

they are separate INSTANCES, sharing a single -- consolidated database.

not conceptually -- 100% physically. RAC brings together, in a cluster, what you used to keep all over the place.




thanks for your attempts to explain

reader, July 15, 2004 - 12:54 pm UTC

but I'm just not getting it from your varied explanations.

First you say instances are on separate servers, with their own CPU and memory; then it's that the instances are separate but consolidated on a SINGLE database. Is it a single database spread out across several nodes for redundancy, like a RAID system? If that's the case then that's not what we are looking for.

We are looking to develop two distinct business systems
and have them run independently of one another(much like
a corporation with subsidiaries), but have the capability
of merging data from both systems for specific projects
or analysis.

It appears database linking is more in line with what
we have planned.

If anyone else out there has any suggestions please feel
free.

thanks again




Tom Kyte
July 15, 2004 - 5:51 pm UTC

it is a single database accessible from many nodes -- not "spread out" over many nodes.

I said above, it would not lend itself to two separate groups managing it -- it is for consolidation -- not making more work, reducing the amount of work.

distributed complexity and different management teams is certainly one way to go -- the harder way, but a way.

business need

Jim, July 15, 2004 - 6:11 pm UTC

RAC has its place, but I think your business need is what you need to concentrate on. If I had your problem and the schemas of the two businesses were the same (hope they are, or it becomes very complex to combine data), then I would put them in the same database. If I were designing them from scratch I might even put the data in the same tables and use some form of FGA (Fine Grained Access) to allow people to see what they should be allowed to see (e.g. a salesperson from subsidiary A would only see his subsidiary's stuff, but the president of the conglomerate could see everything).

Or I might put them in different schemas in the same database. There are a lot of ways to solve this problem, but none of them have anything to do with RAC or database links. Solving it as above does not preclude you from using RAC for reliability or scalability; you still can.

I would stay away from database links if you do not have to use them. It just adds more complexity and may decrease performance in certain types of queries.

To Jim

reader, July 16, 2004 - 8:55 am UTC

your input offered some possible alternatives.
We already have the OLTP Production db in place and mgt
is looking to extend a component of that business system
(the Internet/Web part)in-house.

We currently out-source that component and then load
the files to the Production db.

Mgt wants to keep the Production and Internet systems separate but able to integrate. My understanding of the plan to date is that mgt wants the systems on separate machines so that there are no performance, resource or contention issues while the OLTP team does its work on the Production db. The Internet/Web team will perform their functions on the Internet db, and should either system fail (for whatever reason) the other system will remain unaffected.

I understand the role of RAC from an architectural and logical standpoint, however our focus at the moment is how to best serve the business logic plan that mgt is proposing.

thanks for your input.



RAC - Oracle Software on Shared Disk

A reader, July 20, 2004 - 9:51 am UTC

Dear Tom,
I am trying to set up 9i RAC at home as described in the article
</code> http://otn.oracle.com/pub/articles/hunter_rac.html <code>

Is it possible (okay) to install the Oracle software on a shared disk?



Tom Kyte
July 20, 2004 - 8:09 pm UTC

define "shared disk"

External hard disk connected via FireWire

A reader, July 21, 2004 - 8:58 am UTC

Sorry I was not clear.
Shared disk ==> an external hard disk connected to both nodes using an IEEE 1394 FireWire card

Tom Kyte
July 21, 2004 - 9:07 am UTC

technically i think you can with OCFS -- however, it'll be somewhat "not so fast" since the OS cannot really cache the text pages.

it would not be recommended -- except for shared configuration files (ifile= files in your init.oras for example)

How to Register Callback Function?

A reader, July 21, 2004 - 9:23 am UTC

Dear Tom,

How do I register a callback function? Do I have to use OCI or something?

The doc says the stuff below, but I really couldn't understand it.

Could you please make it clear in your simple terminology? If possible, with an example.

</code> http://download-east.oracle.com/docs/cd/F49540_01/DOC/server.815/a67775/chx_inst.htm#436898 <code>

Failover Callback
-----------------
Frequently failure of one instance and failover to another takes time. Because of this delay, you may
want to inform users that failover is in progress. Additionally, the session on the initial instance
may have received some ALTER SESSION commands. These will not be automatically replayed on the second
instance. You may want to ensure that these commands are replayed on the second instance. To address
such problems, you can register a callback function.


Tom Kyte
July 21, 2004 - 9:40 am UTC

you would be programming in OCI at that point, yes.

</code> http://docs.oracle.com/docs/cd/B10501_01/appdev.920/a96584/oci09adv.htm#449793 <code>

10g + 9i RAC on one box

Reader, July 21, 2004 - 3:19 pm UTC

I don't know if this is the best forum for this question but since I always had success here I will post my question here.

Is it possible to have 10g and a 9i RAC installation on the same set of machines? It would help us save a lot of money in hardware and software.

To match our production we need a test environment on 9iR2, but to try new stuff we need 10g.

Any documentation?

Thanks again.

Tom Kyte
July 21, 2004 - 6:48 pm UTC

I've no idea if it would work -- I would strongly advise against it -- having both up and running at the same time. There would be a ton of potential conflicts.

You'd have to ask support if it is even feasible -- I've never tried it.

CFS

reader, August 05, 2004 - 3:03 pm UTC

For 9iR2, Oracle provided CFS only for Linux Advanced Server and SuSE. In 10g, is it available for other platforms as well? Thanks.

Tom Kyte
August 05, 2004 - 8:26 pm UTC

ASM is the way to go forward. I've not really followed OCFS myself.

Probably a silly question but...

A reader, August 09, 2004 - 9:17 am UTC

Tom,

Can you explain what advantages RAC gives us that OS-level clustering doesn't? Setting up clustering at the OS level gives us high availability and load balancing. Our primary goal in using clustering is to eliminate a single point of failure (and we have a limited budget, so I would need to justify purchasing RAC licenses). So I guess my main question is: will setting up RAC increase availability significantly over plain OS-level clustering?

Hope you can help. Thanks!

Tom Kyte
August 09, 2004 - 10:29 am UTC

ok, so you set up os-level clustering.

no rac.

so, there is a single node in this cluster running oracle (because without RAC, you have no ability to run oracle on each node -- having them all access the database).

So, how does load balancing work? It doesn't, there is just one oracle instance to connect to.

So, the node oracle is on fails. Do you have failover? no, not really, Oracle isn't running on any other machine yet.


You can do clustering without OS clustering software in 10g RAC (eg: you buy OS, you buy Oracle -- you can now create a database cluster)

But you cannot do a database cluster without RAC

10g RAC and VIP (Virtual IP)

Sami, August 15, 2004 - 12:24 am UTC

Dear Tom,

In addition to the private & public IP addresses, 10g RAC requires a VIP (virtual IP). The documentation did not explain much about the VIP.

Could you please explain the purpose of this extra IP address (a VIP is not required in 9i).

Thanks in advance

Tom Kyte
August 15, 2004 - 9:26 am UTC

see support note <Note:264847.1>

and from <Note:220970.1>


Its all about availability of the application.

When a node fails, the VIP associated with it is supposed to be automatically failed over to some other node. When this occurs, two things happen. (1) the new node re-arps the world indicating a new MAC address for the address. For directly connected clients, this usually causes them to see errors on their connections to the old address; (2) Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next address in tnsnames is used.

Without using VIPs, clients connected to a node that died will often wait a 10 minute TCP timeout period before getting an error.

As a result, you don't really have a good HA solution without using VIPs
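
For illustration only, a client side tnsnames.ora entry for a two node 10g RAC would list the two VIP addresses, so a dead node is detected quickly and the next address is tried (the host names node1-vip/node2-vip and the service name are made up):

prod =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE = ON)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = prod)
(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
)
)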


CONTROL_FILES

Jojo, September 07, 2004 - 7:46 am UTC

I am facing a peculiar problem. I have set up 9i RAC on Red Hat Linux 9.0. Everything is working fine except the control files. I have specified

CONTROL_FILES='/xx1/ctl01.ctl','/xx2/ctl01.ctl'

But the second control file in the order is not getting written. The same is the case even if I change the order: whichever file comes second is not getting updated by Oracle. I checked the timestamps; only the file that comes first in the order is getting updated.

Tom Kyte
September 07, 2004 - 9:33 am UTC

<quote>
I have set 9i RAC on Red Hat Linux 9.0
</quote>

i would expect lots of "peculiar" things given that that is the desktop OS and not the server OS.

But, in any case, I cannot reproduce. How about you:

sys@ORA9IR2> show parameter control

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time integer 7
control_files string /home/ora9ir2/oradata/ora9ir2/
control01.ctl, /home/ora9ir2/o
radata/ora9ir2/control02.ctl,
/home/ora9ir2/oradata/ora9ir2/
control03.ctl
sys@ORA9IR2> column v new_val x
sys@ORA9IR2> select 'ls -l ' || replace( value, ',', ' ' ) V from v$parameter
2 where name = 'control_files';

V
-------------------------------------------------------------------------------
ls -l /home/ora9ir2/oradata/ora9ir2/control01.ctl /home/ora9ir2/oradata/ora9ir
2/control02.ctl /home/ora9ir2/oradata/ora9ir2/control03.ctl


sys@ORA9IR2> !&x
-rw-rw---- 1 ora9ir2 ora9ir2 1613824 Sep 7 09:18 /home/ora9ir2/oradata/ora9ir2/control01.ctl
-rw-rw---- 1 ora9ir2 ora9ir2 1613824 Sep 7 09:18 /home/ora9ir2/oradata/ora9ir2/control02.ctl
-rw-rw---- 1 ora9ir2 ora9ir2 1613824 Sep 7 09:18 /home/ora9ir2/oradata/ora9ir2/control03.ctl



and show us what you see.

RAC

Jojo, September 08, 2004 - 12:36 am UTC

But that didn't answer my query. Can you give me some probable causes for my issue?

Tom Kyte
September 08, 2004 - 9:18 am UTC

I want to see the information I requested to see?!?

I don't even know that you HAVE a problem. show me a cut and paste EXACTLY in the manner I described (no editing).

seems that in the time you waited to post this and waited for me to read it, you could have at least done a simple cut and paste of the results of a simple script.

deleting OCFS drives

Corne van den Beemd, September 22, 2004 - 9:24 am UTC

Hi Tom,

There is one document available, 'how to add an OCFS drive in Windows'. I've not yet seen a paper on 'how to delete an OCFS drive'. Is this just a matter of working through the 'how to add' document bottom-up?

Tom Kyte
September 22, 2004 - 10:11 am UTC

sorry, i've just never played with OCFS myself

A reader, September 23, 2004 - 4:17 am UTC

thanks for the reply anyway Tom,

I'll keep on searching.

RAC or Failover Server

Wayne, November 06, 2004 - 1:01 am UTC

Tom,
RAC is a good configuration to reduce down time and do load balancing. But we turned down the idea of using RAC on our new server and instead, created two domains to support failover (live with 30-45 minute down time, but not 3-4 hours of down time) with the following reasoning:

1) We don't know how reliable RAC is, but we do know how to turn on the failover server and switch users to it;
2) RAC entails extra software cost from Oracle plus cluster communication agent software, and they are very expensive.
Is it true that it is more expensive than having a fully redundant system?
If we need 20 CPUs to support daily operation, then with RAC we can set up a cluster of two nodes, each with 15 CPUs. When one node fails, the other can still carry 75% of the peak load; with a redundant failover server, we have to put 20 CPUs on both, and the 20 CPUs on the failover server will not be used 99.9% of the time.
3) Is it so complicated to configure RAC that companies like HP sell their preconfigured servers and claim they worked closely with the Oracle core development team and therefore know the ins and outs of it? Do they imply that most people cannot figure it out easily? (At one of the Oracle conferences, HP had a demo, but it didn't work out well because the back-side fiber optics were damaged during shipment.)

My last question is: Isn't 10g Grid an improved RAC? In what aspect?

Thanks in advance.




Tom Kyte
November 06, 2004 - 10:42 am UTC

1) we don't understand it so we fear it

2) 10g rac comes with everything you need -- no additional clustering or file management software.

and you are telling me 20 dead idle cpu's are free?

3) education overcomes lots of complications. but anyway, not really sure how to address this given your point #1.

part of 10g grid is RAC, a small part -- rac is not grid, grid is not rac.

RAC

A reader, November 06, 2004 - 12:28 pm UTC

Tom,
Thanks for the answer, I am with you there. In fact I asked the same question as an outsider about the 20 dead CPU issue (the real number is 60!), about the failover time and about the cluster agent software stuff.
Your answer to the first question is exactly what I wanted to say but could not.

So there are some extra expenses (non-trivial) and configuration in 9i to make RAC work? We are not on 10g yet (in fact we just got onto 9i today).

Regards,

Thanks,

RAC and replication

Marianna, November 08, 2004 - 5:00 pm UTC

Can you please point us to a reference on setting up replication on RAC (in one of the answers above you said there are more options for RAC)? What are they and where can we read about them? Thank you very much.

Tom Kyte
November 08, 2004 - 9:16 pm UTC

there really isn't one -- it is just that things like the job queues and job queue affinity can be used in RAC -- so you can point the replication "processors" to a single instance if you like.

Otherwise they are pretty much (RAC vs single instance) the same as far as replication goes.
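
For example -- a sketch only, the job body and schedule are made up -- in 9i you could pin a replication related job to one instance with the INSTANCE parameter of dbms_job:

declare
    l_job number;
begin
    dbms_job.submit( job       => l_job,
                     what      => 'my_replication_push;',   -- hypothetical wrapper procedure
                     next_date => sysdate,
                     interval  => 'sysdate + 1/24',
                     instance  => 1 );                      -- run only on instance 1
    commit;
end;
/

-- or move an existing job (job number 123 here is just an example) to instance 1
exec dbms_job.instance( job => 123, instance => 1 );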

10g RAC

Robert, November 11, 2004 - 4:36 pm UTC

Does 10g RAC need a virtual IP for each node? If so, why? 9i RAC did not need a virtual IP. I was just wondering why it is different in 10g RAC? Thanks.

Tom Kyte
November 12, 2004 - 6:52 am UTC

there are lots of things different in 10g

search upwards in this discussion for

Its all about availability of the application.

RAC pitfalls

A reader, November 12, 2004 - 8:19 am UTC

Tom, I don't want to steal your time but --

on the one hand, the Oracle documentation states that RAC will behave like a single instance, and on the other hand you said several times that anyone will have to develop for RAC, test for RAC, ...

If I were to go for RAC I would have RAC for production, testing and development, of course. But will something change in ordinary programming OTHER than handling RAC features like TAF?

Please, can you give a practical example for such a pitfall, if there is one at all?

regards,
max

Tom Kyte
November 12, 2004 - 1:14 pm UTC

I would say the same for "shared server" too -- develop ON it (not for it, on it, subtle distinction), test ON it, then deploy on it.

RAC is like an amplifier. if you take an application that cannot scale on a single instance (eg: doesn't use binds :), it'll do even worse on RAC. If you have an application that scales well, it'll scale well in general on RAC.

OCFS Vs ASM

Nathan, November 17, 2004 - 7:38 am UTC

Tom,
Thanks for your contribution to the oracle community.


What are the differences between OCFS and ASM? If the database is 10g on Red Hat Linux, where we can select either, what would be the best option?


Also, EVA and ASM appear to be doing the same thing? Can they co-exist?


Tom Kyte
November 17, 2004 - 10:57 am UTC

you can "ls" on ocfs, there are "files" there, it is a file system. you can install software on ocfs.

you cannot "ls" on ASM, it is a pure database file system -- no files, just database data stuff. you cannot install software on ASM.

by "EVA" i assume you mean the hp stuff? that is a storage system, you can either use a filesystem driver on top of it to make it a filesystem or you can ASM it. they are complementary, not mutually exclusive.
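
For what it is worth, here is a small sketch of what using ASM looks like from the database side (the disk group +DATA is hypothetical and must already exist):

-- datafiles are simply named after a disk group; ASM decides placement
create tablespace app_data
    datafile '+DATA' size 500m
    autoextend on next 100m;

-- the resulting "files" are only visible through the database, e.g.
select name, bytes/1024/1024 mb
  from v$datafile
 where name like '+DATA%';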

not scaling on rac

Dave, November 17, 2004 - 12:42 pm UTC

can you explain how / why an app doesn't scale in RAC if it's rubbish on a single server.

Is it because you are just flooding the shared pool with rubbish that can't be reused and then takes longer to parse, and you incur far more latching across every instance,

or does it affect cache fusion?

Tom Kyte
November 17, 2004 - 2:04 pm UTC

lets say you don't use binds.

so, you hard parse like mad

so you have massive contention on the shared pool...

so you think "two shared pools, that'll do it"

but you forget -- the shared pools have to update each other -- i need to know on node 1 that you have a query on node 2 that references table T (for cursor invalidation, dependency tracking)..... so, hard parsing becomes even MORE expensive.....
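
To make that concrete, a tiny sketch (the table T and its columns are hypothetical) of the two approaches -- the second parses once per statement text, not once per value:

-- a brand new statement for every value: hard parse every time, worse on RAC
create or replace procedure bad_lookup( p_id in number )
as
    l_name varchar2(100);
begin
    execute immediate
        'select name from t where id = ' || p_id into l_name;
end;
/

-- one shared statement, bound each time: soft parse after the first call
create or replace procedure good_lookup( p_id in number )
as
    l_name varchar2(100);
begin
    execute immediate
        'select name from t where id = :id' into l_name using p_id;
end;
/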



Disk Pings

naresh, November 30, 2004 - 11:48 am UTC

Hi Tom,

In relation to bad applications not scaling on RAC - what about the case where you have processes on multiple nodes accessing the same data blocks in CR mode repeatedly, at a rapid rate? (and I completely agree that's a bad application).

I have read that such blocks would ultimately be written to disk if the CR blocks have to be served to remote instances too often. Is this true? Is there some threshold value for 'CR blocks served' per block beyond which the blocks are written to disk?

In this case, it seems the penalty is much greater than single instance. I do not understand the details of this very well, but I imagine the blocks being repeatedly mastered from one node to another in this situation.


Tom Kyte
November 30, 2004 - 11:58 am UTC

where did you read it. let me see it in context too.

Disk ping on RAC?

naresh, December 01, 2004 - 3:38 am UTC

http://groups.google.com/groups?hl=en&lr=&threadm=999721336.28996.0.nnrp-02.9e984b29%40news.demon.co.uk&rnum=2&prev=/groups%3Fas_q%3DJonathan%2520Lewis%2520Disk%2520cache%2520fusion%26safe%3Dimages%26as_ugroup%3Dcomp.databases.oracle.server%26as_uauthors%3DJonathan%2520Lewis%2520%26lr%3D%26hl%3Den

HI Tom,

Please see message 4 from Jonathan Lewis in the above thread. My previous question was based on this - it is from 2001 - quite a bit in the past, but worth a question. Is it different in newer versions now?

Thanks.
Naresh

Tom Kyte
December 01, 2004 - 8:29 am UTC

BSP's were an 8i'ism, they are not part of 9i RAC.

RAC books

A reader, December 22, 2004 - 4:23 pm UTC

Hi Tom,

Could you please recommend one or two GOOD 10g RAC books?

thanks,
-Charlie

Tom Kyte
December 22, 2004 - 6:55 pm UTC

have you read the free good ones?

RAC Books

A reader, December 22, 2004 - 7:44 pm UTC

Hi Tom,

You mean the Oracle documentation? If so, yes. I enjoyed reading the documentation, but I have to sit in front of the computer all the time. I like to read good books; for example, your books are excellent. Just wondering if there are any good 10g RAC books you knew of and would recommend.

thanks,

Tom Kyte
December 22, 2004 - 7:56 pm UTC

i won't recommend anything I personally haven't read -- and to date I haven't read any of the "on the market rac books", so....

anyone out there read a RAC book and thought it was useful?

Killing a session in RAC

Reader, December 23, 2004 - 7:18 am UTC

If I have a two node RAC,
I am on node1 but I wish to kill a session on node2 by

select sid,serial# from gv$session
then,
alter system kill session 'Sid,Serial#' ;

Do I have to connect to the other instance or can I kill sessions on node2 from node1?

Thanks



Tom Kyte
December 23, 2004 - 11:47 am UTC

(I've actually got a RAC cluster running on my laptop :)  vmware rocks.
http://www.oracle.com/technology/tech/linux/vmware/index.html

If you've got the bandwidth to download it -- you get a two node rac cluster running on linux with ASM and OCFS all virtualized in a vmware environment (90 day trial license of vmware included).  You want 1.5-2gig of ram on your machine to run it comfortably (the vm wants about 750meg of ram and you need memory for your host OS).  You run windows, it runs linux for you.  Really cool way to check it out (no database install or configuration, it is all setup for you -- this is what we used at OOW to do the hands on labs with)


SQL> select instance_number, instance_name from v$instance;
 
INSTANCE_NUMBER INSTANCE_NAME
--------------- ----------------
              2 O10G2
 
<b>so, i'm in instance 2</b>


SQL> c/v/gv
  1* select instance_number, instance_name from gv$instance
SQL> /
 
INSTANCE_NUMBER INSTANCE_NAME
--------------- ----------------
              2 O10G2
              1 O10G1
 
SQL> select inst_id, sid, serial# from gv$session where username = 'SCOTT';
 
 
   INST_ID        SID    SERIAL#
---------- ---------- ----------
         1        117        329
 
<b>Scott is in instance 1</b>


SQL> alter system kill session '117,329';
alter system kill session '117,329'
*
ERROR at line 1:
ORA-00030: User session ID does not exist.
 

<b>and we cannot "see" his sid, alter system works on the local system...</b>
 

two-node RAC

A reader, December 23, 2004 - 4:26 pm UTC

Hi Tom,

I currently have 3 Linux boxes running Red Hat Enterprise Linux 3 (RHEL-3). Can I use two of them to build a two-node Oracle RAC 10g database, and USE THE THIRD MACHINE AS SHARED STORAGE? If so, how can I do that? (A pure NFS mount does not seem to work...)

Is OCFS still required in Oracle 10g RAC? If not, where do we install Cluster Ready Services (CRS)? ASM cannot be used for CRS, right?

Happy Holidays!!

thanks,

Tom Kyte
December 23, 2004 - 7:39 pm UTC

how can a "computer" serve as "storage"

storage does that.

You can use OCFS
You can use ASM

that third machine either will be

a) part of the cluster
b) or not

but it won't become a "storage array" of any sort.

You would run ASM on all nodes that want access to the underlying storage.

two-node RAC

A reader, December 23, 2004 - 11:09 pm UTC

Hi Tom,

Thanks! The question is ASM cannot be used for cluster ready services (CRS), right?

Tom Kyte
December 24, 2004 - 8:58 am UTC

like asking if a banana can be used for baking a cake.

ASM provides storage access.

CRS provides clustering.

ASM works in a clustered environment.
CRS provides a clustered environment.
ASM does not need a clustered environment but can operate in one.

Performance on read only tablespaces

David Pujol, December 27, 2004 - 8:08 am UTC

Hi Tom, I'm reading a lot of articles about Real Application Clusters, but I've read articles that talk about improving performance if you place the tables/indexes that only get read operations in read only tablespaces. Is this true? And if it is true, can you explain why?

I'm really interested in why.

A lot of thanks.
David Pujol

Tom Kyte
December 27, 2004 - 10:16 am UTC

no need to ask around if someone's modified the data -- with RAC you have many caches to keep consistent -- if you have a copy of a block from a read only tablespace in your cache -- you KNOW that you don't need to ask "has anyone touched this guy over here"

Performance on read only tablespaces

David Pujol, December 27, 2004 - 10:53 am UTC

Ok, but does it mean that when I do a read-only request Oracle RAC doesn't use the cache fusion layer? Can you explain this behavior in more detail?

A lot of thanks Tom
David Pujol

Tom Kyte
December 27, 2004 - 11:00 am UTC

it uses cache fusion but think about it


You are asked to do an LIO on block 5 from file 55.

You find it in your cache.


You also KNOW that file 55 is part of a read only tablespace. You are done, return block, finished -- NO OTHER NODE COULD HAVE MODIFIED IT.


Say you don't KNOW that - file 55 is part of a read/write tablespace. Before you can return that block, you have to find out if you "need to fuse the caches again" someone might have a committed version of that block out there you need -- with read only data that cannot happen -- no one could have possibly modified it.
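
So if a chunk of data really is static, telling the database so is cheap -- a sketch (the tablespace name is made up):

alter tablespace history_2003 read only;

-- and if it ever needs maintenance again:
alter tablespace history_2003 read write;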

Performance on read only tablespaces

David Pujol, December 27, 2004 - 11:20 am UTC


A reader, December 27, 2004 - 1:29 pm UTC


Doubts in tnsnames.ora in RAC configuration

N.Balasubramanian, January 08, 2005 - 1:32 am UTC

Hi Tom,

I give below the model tnsnames.ora
--------
test =
(DESCRIPTION =
(LOAD_BALANCE = OFF)
(FAILOVER = ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1526))
(ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1526))
)
(CONNECT_DATA =
(SERVICE_NAME = test)
(FAILOVER_MODE = (TYPE = SELECT))
)
)

test1 =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE = OFF)
(FAILOVER = ON )
(ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1526))
)
(CONNECT_DATA =
(SERVICE_NAME = test)
(INSTANCE_NAME = test2)
)
)

test2 =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE = OFF)
(FAILOVER = ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1526))
)
(CONNECT_DATA =
(SERVICE_NAME = test)
(INSTANCE_NAME = test2)
)
)

LISTENERS_test =
(ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1526))
(ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1526))

LISTENERS_test1 =
(ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1526))

LISTENERS_test2 =
(ADDRESS = (PROTOCOL = TCP)(HOST = host2)(PORT = 1526))
-----------
Questions:
-----------
From 8i onwards, the database automatically registers with the listener. And if we use non-default listener (other than port 1521) we have to add an entry in init.ora and the tnsnames.ora file. For host1, listener_test1 is the local listener and for host2 listener_test2 is the local listener. And for both the nodes, listener_test is the remote listener.

1) What is the need for the listener entries in tnsnames.ora on the server side?
2) What entries are required in the client-side tnsnames.ora?
3) What is the exact use of the three net_service_names: test, test1 and test2?
4) Please tell me how the connections are resolved.
5) One more doubt.
service_names = dbname + domain_name
service_name = dbname + domain_name
global_name = dbname + domain_name
global_dbname = SID + domain_name
IS it right ?

I have read the manual many times and I am not able to understand. That is why I am troubling you.

Thanks



Tom Kyte
January 08, 2005 - 4:52 pm UTC

1) it is not needed, you put things in the tnsnames.ora file because the client running on that machine wants to have that translation available. so, it is not "mandatory"

2) whatever you like? I mean -- only you can answer that. You would put entries that point to the services you want them to get pointed to, using whatever options or features you wanted them to have...


3) they are just samples? testN is a good "sample" name

4) Oracle takes the string test1 for example from sqlplus scott/tiger@test1 -- it goes to the tnsnames.ora file, looks it up, gets the details, parses the details and uses that information to connect.


5) no, not at all.

service_names/service_name = strings you like, they are just "names"
global_name = "a name you like, should be unique in your network of computers"




Thank you for replying even during week-ends

N.Balasubramanian, January 09, 2005 - 10:57 pm UTC

Thank you Tom for your immediate response.
Normally your answers are more explicit, with lots of examples. But now your answers are very terse.
Even in the Oracle manuals, service_names is given as dbname + domain_name.
OK. Is global_dbname (in listener.ora) SID + domain_name?
I will come back to you after doing some more reading.
Thanks

Tom Kyte
January 10, 2005 - 8:35 am UTC

it is convention only.


You can name it whatever you feel like. Terse answers spawn from things that don't have "lots of detail". Service name is a string you enjoy, you like, makes you feel good --whatever you want it to be.

In fact, the same database can register itself with "N" service names. It is just a string.
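
For example (a sketch only -- the names are invented), one instance could register itself under two service names:

alter system set service_names = 'oltp,reports';

-- clients then pick whichever name suits them in CONNECT_DATA, e.g.
--   (CONNECT_DATA = (SERVICE_NAME = reports))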

10g RAC Concepts

Arun Gupta, January 12, 2005 - 9:37 am UTC

Tom,
In Oracle 9i docs, there is a RAC Concepts document. In Oracle 10g docs, there is no such documentation. Does this mean that nothing has changed from 9i to 10g or is the concepts information available in some other manual. I am referring to the links:
</code> http://www.oracle.com/pls/db92/db92.docindex?remark=homepage#index-RAC http://otn.oracle.com/pls/db10g/portal.portal_demo3?selected=3#index-REA <code>
Thanks

Tom Kyte
January 12, 2005 - 10:04 am UTC

the 10g concepts guide was a major change from the 9i -- you'll find it in there, consolidated down.

A reader, January 12, 2005 - 10:59 am UTC


How come you haven't provided link. Kidding :-)

Thanks.

help plz

sumit, January 31, 2005 - 4:20 pm UTC

Hi Tom, I am getting these messages in my alert file.


Started first pass scan
Mon Jan 31 09:39:00 2005
Completed first pass scan
17 redo blocks read, 8 data blocks need recovery
Mon Jan 31 09:39:00 2005
Started recovery at
Thread 1: logseq 35, block 127775, scn 0.0
Thread 2: logseq 35, block 48165, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 1 Seq 35 Reading mem 0
Mem# 0 errs 0: G:\ORADATA\PDONDS\REDO01.LOG
Recovery of Online Redo Log: Thread 2 Group 3 Seq 35 Reading mem 0
Mem# 0 errs 0: G:\ORADATA\PDONDS\REDO03.LOG
RECOVERY OF THREAD 1 STUCK AT BLOCK 8058 OF FILE 1
Mon Jan 31 09:40:18 2005
Restarting dead background process QMN0
QMN0 started with pid=14


-----------

Both of the instances are failing to start.

OS is windows 2000
OCFS
oracle 9.2.0.4 RAC

Tom Kyte
January 31, 2005 - 4:32 pm UTC

contact support.

Cache Fusion and Resource Mastering

reader, February 14, 2005 - 2:01 pm UTC

In a single instance, if a user submits a read then Oracle looks into the cache and, if the block is not found, gets it from disk.
In RAC,
it will have to look into the caches of all the active instances.
I understand (as per the concepts manual) that there will be a resource master instance and all requests for that particular resource will be mastered on that instance.

How does Oracle pick this resource master?
So is it true that all the participating instances can be the resource master for some resources?

Thanks


Tom Kyte
February 14, 2005 - 5:42 pm UTC

no, it'll ask "anyone got it", there is a resource manager. it just happens, it is part of the software. All of the instances are a resource manager for some bits of data.

it is sort of internal -- the hows and the implementation will vary from release to release over time.

RAC vs single instance

Reader, February 14, 2005 - 10:29 pm UTC

Thanks for clarifying my doubts .

My other concern is --

We have a large ERP application currently running on an 8 CPU Sun box. We wish to move this application to a 2 CPU x 4 node = 8 CPU RAC, primarily to save cost on the hardware and, if required, add more 2 CPU nodes, as the next level of Sun machines is much more expensive.

The argument here is: what happens to the large batch jobs which currently take a lot of CPU time (more than 200 secs) -- will they become worse?
I am scared that there will be a lot of traffic over the interconnect and that moving towards RAC could be a disaster.

Your advice will be highly beneficial in moving towards RAC and grid technology.

Thanks again ,



Tom Kyte
February 15, 2005 - 12:08 pm UTC

one word:

benchmark

well, two words:

hypotheticals


benchmark, so you need not fear hypothetical issues that might arise maybe....


I won't be able to convince you with words. You have to see it to believe it.

I will try ..

Reader, February 15, 2005 - 10:59 pm UTC

Well, I am so amazed with the architecture of RAC that I wish to move this problematic database to a clustered environment. It is about one TB in size.

Frankly, we are pushing hard, but the 'senior generation' is not very comfortable with RAC, so they are putting all kinds of spokes in the wheel.

Some day I will win this battle and then post my findings here.

Thanks for your encouragement.



Tom Kyte
February 16, 2005 - 7:48 am UTC


RAC is not going to make a 200 cpu second process run much faster.

algorithm changes would do that.

Cache Fusion --

Reader, February 24, 2005 - 7:45 am UTC

The cache which is Fused --

Data Buffer
Library Cache
Data Dictionary Cache

Anything else ?

What happens when there is parallel execution of a SQL statement with slaves working on multiple nodes?

Thanks

Tom Kyte
February 24, 2005 - 8:25 am UTC

all of them more or less. If I hard parse a SQL statement that queries T1, T2, T3 on node1, node2 needs to know I have a cursor dependent on tables T1, T2, T3... The library cache isn't so much "fused" as "invalidated in concert" -- so ddl on one node knows to invalidate stuff on the others...

with parallel query over the nodes, it is pretty much the same as PQ on a single node -- a coordinator will have the processes on the different nodes run parts of the query and send their results back to a single coordinator ultimately to be returned to the client.
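
For instance, nothing special is needed in the SQL itself -- a parallel query such as the sketch below (the table name is hypothetical) can have its slaves spread over the instances, subject to settings such as parallel_instance_group:

select /*+ parallel(s, 8) */ count(*)
  from sales_history s;

-- while it is running, the instances hosting the slaves can be seen with
select inst_id, count(*) slaves
  from gv$px_session
 group by inst_id;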

A reader, February 24, 2005 - 11:26 am UTC

Hi Tom,

I have 2 databases with the same SID on 2 different servers (ex: server1 & server2).

Both are in the same global domain.

I want to connect to these 2 databases via Connection Manager.

Currently I can't do that. Could you please tell me how I can achieve this?

(1) Can I make an alias in the Connection Manager config files?

or (2) Do I have to put these 2 databases in 2 domains? Then would it work?

Could you please give me a hint on this.

thanks

Tom Kyte
February 24, 2005 - 5:03 pm UTC

what is a "global domain"

and since the sid is not relevant in distributed databases, what is the "error" you might be getting?

Parsing in RAC

reader, February 24, 2005 - 11:01 pm UTC

Based upon your book, Chapter 5, statement processing --

In the case of RAC,

if a user submits a SQL statement --

1. It is parsed and syntax checked, a hash value is computed, and then the cache is checked to find out if a matching statement is already present there (for a soft parse).
Now, if it finds the statement in the cache of another node, is the generated plan brought to the local node and executed? (please comment)
Also, if the caches are synchronised, is it safe to assume that all the nodes will have the desired library cache and dictionary cache information locally available? (please comment)

2. If a hard parse is required,
GES will replace the latching of the single instance.
The referenced objects will need to be latched while the plan is being generated.
The required objects could be in a different cache. Will a latch be placed in the respective cache (on the different node), OR will the objects be shipped to the local node by GES/GCS? (please comment)

3. If the CBO is looking for the stats for a table and finds them in the cache of another node, will they be read from the other node OR shipped to the local node?
(please comment)

Is there a way I can trace these activities?

I apologize if my questions do not make sense, though I have tried my best to compile them with clarity.


Thanks


Tom Kyte
February 25, 2005 - 4:49 pm UTC

no, it just lets the other nodes know "i am using this table". Each node has its own unique library cache information (if you don't query table T1 on node B, node B won't have T1 library cache information).

The caches work together, but they are not mirror images of each other. Each instance must let the others know what they are doing in general, but they are not clones.

It is more like the "left hand letting the right hand know what it is doing"


blocks are "cache fused". dictionary stuff is stored in "blocks" (the database runs queries). These queries (recursive sql) would use the same block fusing as your queries do.
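
You can see that asymmetry for yourself by comparing the shared pools through the gv$ views -- a rough sketch (the SQL text filter is just an example):

select inst_id, count(*) cursors, sum(executions) execs
  from gv$sql
 where upper(sql_text) like '%FROM T1%'
 group by inst_id;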

Inter-node Row Level Locking

Reader, March 01, 2005 - 11:14 pm UTC

I was going thru a white paper on cache Fusion and unable to understand the following :

"Cache Fusion block Transfers operate independently of these user visible row level locks .The transfer of data blocks by the GCS is a low level process that can occur without waiting for row level locks to be released .Blocks may be transferred from one instance to another while row level locks are held.As long as there are no row level locks the GCS provides access to data blocks allowing multiple transactions to proceed in parallel."

Can you help me understand this ?

Thanks



Tom Kyte
March 02, 2005 - 7:00 am UTC

It means in short:

"cache fusion is transparent to you the end user/developer. It allows for the finest granularity of locking -- locking at the row level, so even in RAC reads do not block writes, writes do not block reads and writes only block writes of the same row (assuming no table locks of course)"

I'll speak loosely here:

The blocks are in "current mode" in one instance at a time.

I on node 1 may update and lock row x=5 on block 42 of file 5. In order to do that, node 1 will get block 42,5 and "own it". Other nodes might have copies of it. I have not committed.

You on node 2 may now update and lock row x=6 on block 42 of file 5. In order to do that, node 2 will get block 42,5 and "own it". My node 1 might have a copy of it, but your node "owns" it. Note: you have not committed.

So, now we have two transactions, in parallel, on different nodes, each playing with data on the same block.
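
In other words -- a sketch, the table and rows are invented -- this just works, with no special coding:

-- session on node 1
update t set val = val + 1 where x = 5;   -- row locked, block "owned" by node 1

-- session on node 2, same block, different row: does not block
update t set val = val + 1 where x = 6;

-- each session commits (or rolls back) independently
commit;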

9i RAC configuration file

Reader, March 02, 2005 - 11:03 pm UTC

Questions related to configuration file:

(1) If one uses CFS for storage, can the configuration file be placed on CFS or should it be on a raw partition? What is the recommended practice?

(2) What is the recommended practice to backup the config file in 9i RAC?

(3) If the config file gets corrupt or is not accessible, how does one proceed? Is it recommended to create a new partition and add all the info again using srvctl or restore from the backup if there is one? Is there a recommended solution? Does support have a tool to salvage a corrupted config file to extract info from it?

(4) What is the ROT (rule of thumb) to size the config file?

Thanks for your time.


Tom Kyte
March 03, 2005 - 7:22 am UTC

1) yes -- discussed above, ocfs would be a place for the shared configuration files

2) if you use spfiles, rman will do it, else you do it (they are just flat files in the OS)

3) backups. which config file are you talking about specifically?

or when you say "config" do you really mean "control"?

dave, March 03, 2005 - 9:27 am UTC

perhaps he means the quorum file?

9i RAC config file

Reader, March 03, 2005 - 10:42 am UTC

Yes, I was asking about the file that RAC uses to store information about nodes/instances using srvctl. On some platforms, it is also used as quorum file. Thanks.

Transactions in parallel

Reader, March 08, 2005 - 7:03 am UTC

"So, now we have two transactions, in parallel, on different nodes, each playing with data on the same block"

This is from the explanation above.

But this is also true in a single-instance database. Different sessions can update rows from the same block. So, the behaviour is the same in RAC and non-RAC?

Thanks


Tom Kyte
March 08, 2005 - 8:05 am UTC

the conceptual behaviour is the same, the observed concurrency (that there is concurrent modification of the same block) is the same

the underlying mechanics are a little different, but the net result is the same

Full table scans in RAC

Jon, March 08, 2005 - 8:34 am UTC

I have observed that ad-hoc full table scans on large tables in our RAC environments are slower than the same FTS in our single-instance environments, even when run repeatedly from the same node.

Doing a trace reveals the time is taken by GCS waits. All other measures indicate our RAC environments are running fine.

Our system is OLTP anyway, so this is not a concern - more just curious. Could you explain why this is?

Tom Kyte
March 08, 2005 - 10:49 am UTC

what are the PIO (physical) counts and LIO (logical) counts in each environment?

where's the 10g RAC concepts guide?

Jeremy, March 08, 2005 - 11:05 am UTC

regarding 10g RAC concepts, you said: "the 10g concepts guide was a major change from the 9i -- you'll find it in there, consolidated down."

I can't find any such documentation in the 10g concepts guide; in fact a search for the phrase "LMS" turns up no hits:

</code> http://otn.oracle.com/pls/db10g/db10g.drilldown?word=LMS&book=b10743

oddly enough, it does get hits in the 10g admin guide, 9i concepts, and 9i RAC concepts.

http://otn.oracle.com/pls/db92/db92.drilldown?word=LMS http://otn.oracle.com/pls/db10g/db10g.drilldown?word=LMS <code>

are they still working on a 10g RAC concepts guide or should we just read the 9i guide and other books (such as your 1-on-1 book) to get the basic concepts?


Tom Kyte
March 08, 2005 - 12:01 pm UTC

but LMS is not "rac concepts", it is a background process.

The RAC concepts are in the concepts guide.

LMS gets a hit in the admin guide, but it isn't really necessary at the concepts level.

Not every process for even single instance oracle is in the concepts guide.

From the alert log 10g RAC

Reader, March 08, 2005 - 7:57 pm UTC

What do they mean ?

1.Private_strands 0 at log switch

2.LMS 1: 0 GCS shadows cancelled, 0 closed

3. LMS 1: 0 GCS shadows traversed, 0 replayed

Also ,

What happens in the second pass in the instance recovery in RAC ?

THANKS !


Full table scans in RAC cont...

Jon, March 08, 2005 - 9:13 pm UTC

As requested, here are the PIOs & LIOs for the non-RAC and RAC environments:

pfstst4:CST3:/u01/home/oracle> sqlplus /

SQL*Plus: Release 9.2.0.6.0 - Production on Wed Mar 9 13:08:28 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production

SQL> set timing on
SQL> show parameter cluster

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     FALSE
cluster_database_instances           integer     1
cluster_interconnects                string
SQL>
SQL> set autotrace on
SQL>
SQL> select /*+ full(a) */ count(*) from pfs_owner.account a;

  COUNT(*)
----------
   2996431

<b>Elapsed: 00:00:16.19</b>

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5217 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'ACCOUNT' (Cost=5217 Card=2996427
          )

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      52056  consistent gets
      46501  physical reads
          0  redo size
        494  bytes sent via SQL*Net to client
        651  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


SQL> conn waldronj@CSP
Enter password:
Connected.
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
PL/SQL Release 9.2.0.6.0 - Production
CORE    9.2.0.6.0       Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

Elapsed: 00:00:00.01
SQL> show parameter cluster

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2
cluster_interconnects                string
SQL>
SQL> set autotrace on
SQL>
SQL> select /*+ full(a) */ count(*) from pfs_owner.account a;

  COUNT(*)
----------
   3055765

<b>Elapsed: 00:00:37.61</b>

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5129 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'ACCOUNT' (Cost=5129 Card=3034426
          )

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      53089  consistent gets
      43156  physical reads
          0  redo size
        494  bytes sent via SQL*Net to client
        652  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Thanks.
 

Tom Kyte
March 08, 2005 - 11:12 pm UTC

let's see the 10046 level 12 trace from both.

and why the different answers?

OCFS bug -- might be the reason for the full table scan issue in RAC

N.Venkatagiri, March 09, 2005 - 2:27 am UTC

(3849952 -- issues in full table scans on OCFS).

We added this patch to our RAC database, which solved a similar issue.

FTS in RAC cont....

Jon, March 10, 2005 - 4:13 am UTC

Hi Tom,

Back again with the 10046 trace:

Non-RAC environment

select /*+ full(a) */ count(*)
from
pfs_owner.account a


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      3.46      12.58      51999      52056          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      3.46      12.58      51999      52056          0           1

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 48

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE
2996433   TABLE ACCESS FULL OBJ#(20757)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file scattered read                       3286        0.03         10.55
  db file sequential read                        17        0.00          0.00
  SQL*Net message from client                     2        7.80          7.80

********************************************************************************

RAC Environment

select /*+ full(a) */ count(*)
from
pfs_owner.account a


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     13.20      47.34      45637      53083          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     13.20      47.35      45637      53083          0           1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  global cache cr request                      8148        0.05          3.98
  db file scattered read                       4937        4.28         28.96
  db file sequential read                      1000        0.02          1.43
  db file parallel read                         566        0.04          3.47
  latch free                                      1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00

********************************************************************************

There's a hard parse in the RAC trace above but that doesn't materially affect the outcome.

As for the different results - well, the test database is a few weeks behind production (but it is a clone of production), so it has fewer rows. But it's only a 2% difference for this table, so I would expect the FTS times to be comparable. In fact the production server is attached to EMC Symmetrix storage, and the PIO is generally much faster than on our test box.

But I see some parallel reads there in the prod environment - where did they come from? We're not running with parallel_query, and degree=1 for that table.

Can you shed any light on this?



Tom Kyte
March 10, 2005 - 7:44 am UTC

but the biggest contributor is db file scattered read -- the others are noise right now. Let's concentrate on that.

How different is the underlying disk architecture here?

FTS on RAC

Jon, March 10, 2005 - 8:56 am UTC

Yes, agreed. A max-wait of 4.3 secs for a scattered read is a bit of a worry. The GCS waits seem fine.

The non-RAC environment is just a standard mid-range server (Sun Fire v480) with a bunch of attached drives, mirrored. Using a cooked file system, not even direct IO. Nothing special there.

The RAC environment is 2 x Sun Fire v1280s using EMC Symmetrix attached storage over fibre. Provided to us by our "out-sourcing partner" - just one logical mount point (i.e. a black hole - both technically and financially).

Batch processing (heavy PIO) has generally been 4-6x faster using the EMC storage, until RAC was introduced. The writes are still very fast - just the reads seem to be struggling a bit. With 9iRAC, we also have Veritas clusterware in the mix.

From the trace, it doesn't appear to be GCS, but it may be some side effect of the clustering? Well out of my area of expertise here...


Tom Kyte
March 10, 2005 - 7:01 pm UTC

you might try a test if possible with a single instance and see if it is the storage itself, even without rac (always trying to remove pieces to see what piece being removed makes a change)

FTS on RAC

Jon, March 11, 2005 - 6:31 am UTC

Will do. We're upgrading to 10g in 6 weeks, so we'll use that as an opportunity to evaluate the read performance in more detail. Thanks for your time.

srvctl configuration file questions

reader, March 16, 2005 - 1:41 pm UTC

I posted these questions above but did not get proper answers so I am posting them again for your consideration.

Questions related to the configuration file that srvctl uses to register information about the cluster database, instances, nodes etc., and which is also used as a quorum:

(1) If one uses CFS for storage, can the configuration file be placed on CFS or should it be on a raw partition? What is the recommended practice?

(2) What is the recommended practice to backup the config file in 9i RAC?

(3) If the config file gets corrupt or is not accessible, how does one proceed?
Is it recommended to create a new partition and add all the info again using srvctl or restore from the backup if there is one? Is there a recommended solution? Does support have a tool to salvage a corrupted config file to extract
info from it?

(4) What is the ROT to size the config file?

Thanks for your time.


Tom Kyte
March 17, 2005 - 6:38 pm UTC

I asked Kirk McGowan to look at this (a RAC expert in the field) and he says:

(1) If one uses CFS for storage, can the configuration file be placed on CFS or
should it be on a raw partition? What is the recommended practice?

Certainly - if CFS is being used for the datafiles, then srvconfig can be placed on CFS as well.


(2) What is the recommended practice to backup the config file in 9i RAC?

Basically, just take a copy of it. In 9i, srvconfig was not viewed as a critical component - losing it simply meant one had to revert to using sqlplus and lsnrctl to start/stop instances and listeners.

Also note that srvconfig file has nothing to do with cluster voting (quorum). (On Windows and Linux, cluster quorum was handled with another file, which could also be located on OCFS). This has all changed with 10g and the release of CRS.


(3) If the config file gets corrupt or is not accessible, how does one proceed?
Is it recommended to create a new partition and add all the info again using
srvctl or restore from the backup if there is one? Is there a recommended
solution? Does support have a tool to salvage a corrupted config file to extract
info from it?

If it gets corrupt, and there is no backup, you pretty much have to rebuild it with srvctl - a bit inconvenient, but not the end of the world.

(4) What is the ROT to size the config file?

My memory is not that good. I'm thinking the recommendation was around 100mb - but that was in the doc.

K.

Thanks Kirk and Tom. Very useful information....

reader, March 17, 2005 - 9:20 pm UTC


Instance Recovery in RAC

reader, March 18, 2005 - 7:52 pm UTC

I read in the documentation about instance recovery in RAC.
It has a first pass and a second pass recovery.
Is it similar to applying all transactions (first pass) and then rolling back all the uncommitted transactions (second pass)
OR
something else?

Thanks again .

Tom Kyte
March 18, 2005 - 8:18 pm UTC

links to documentation so I can see what you saw in context would be appreciated.

otn.oracle.com -> documentation, use those links.

Instance Recovery in RAC

Reader, March 19, 2005 - 3:50 pm UTC

Regarding documentation,
the Concepts (chapter 10, pg 14) and Administration (Fig 7-1) guides on RAC do not go into great detail.
But the Metalink document 144152.1 refers to the first pass read and the two pass log read.
-------------------------------------------
Crash/Instance Recovery for Cache Fusion
Because of the possible existance of past images in remote buffer caches, instance or crash recovery is handled differently in a RAC environment than in previous versions. The major differences are that thread recovery of failed instance(s) are done by a surviving instance's SMON process instead of a foreground process. The second major change is that during bounded instance and crash recovery (which introduces a two-pass log read during thread recovery) SMON eliminates BWR's (block written redos) from the recovery set. This enhancement should speed up recovery time if there were existing past images. So, if an instance fails:
1. The instance, or instances, dies.
2. Failure is detected by cluster manager or CGS.
3. Reconfiguration occurs and all locks owned by the departing instance are remastered (see Note 139435.1 for more info) and the first pass read of threads of failed instances done by SMON.
4. SMON claims locks needed to recover blocks found by the first pass read.
5. Locks are obtained and second pass of redo threads of failed instances is performed and blocks become available as they have been recovered.

-------------------------------------------

Oracle online training on RAC also refers to the first pass and two pass log read.

I would be grateful (as always) if you could help me understand the concept of the passes (first and second) and what exactly is the work involved in these two phases.

Thanks

Tom Kyte
March 19, 2005 - 8:33 pm UTC

first pass identifies and locks blocks in need of recovery (they might not be in need -- since multiple nodes could have modified that block and the failed node might not have been the last)

second pass recovers and makes available the blocks identified and locked in the first pass.

Instance recovery in RAC ?

Reader, March 23, 2005 - 7:02 pm UTC

Based upon the Global Resource Directory (GRD), the recovering instance will decide the availability of a similar image of the block (to be recovered) in any remote cache and accordingly decide whether it needs recovery or not.
Does this mean that the contents of the GRD are continuously sync'ed in all the SGAs?

I understand that the GRD is maintained by the GCS and GES processes of the instance which is the resource master.

Is there a gv$ view which we can use to see the contents of the GRD?


RAC

Arun, April 12, 2005 - 11:39 pm UTC

This is a very useful discussion forum. I am so glad that you guys are doing this.

Connectivity to Particular node on RAC mode

Prem, April 14, 2005 - 2:06 pm UTC

Hello TOM,

Problem: how to connect to a particular node in a RAC environment.

I configured RAC on 2 nodes.

Details
-------------------------
Hostname    Instance_Name
DB1         DMS1
DB2         DMS2

TNSnames.ora
---------------
DMS =
(DESCRIPTION =
(ADDRESS_LIST =
(address = (protocol = tcp)(host = 10.52.0.9)(port = 1529))
(address = (protocol = tcp)(host = 10.52.0.11)(port = 1529))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVICE_NAME = DMS)
(failover_mode =
(type = select)
(method = basic)
)
)
)

Using this string, dms, I may connect to node1 or node2 depending upon load.

Sometimes I want to connect exclusively to db1 or db2; for that I created 2 TNS entries:

DMS2 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.52.0.11)(PORT = 1529))
)
(CONNECT_DATA =
(SERVICE_NAME = DMS)
)
)

DMS1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.52.0.9)(PORT = 1529))
)
(CONNECT_DATA =
(SERVICE_NAME = DMS)
)
)

When I try to connect with username/passwd@dms1 exclusively and then check v$instance, it sometimes shows db1 and sometimes db2.

Can you please tell me how to connect exclusively to db1 or db2.

Thanks & Regards
Prem.


Tom Kyte
April 14, 2005 - 2:22 pm UTC

you would/could use the oracle_sid instead of the service_name in the tns connect string. the service name is used expressly to achieve "connect to a rac cluster", instead of connect to this specific instance.

or, you could use the instance_name=

dms2 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST =..)(PORT = ..))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = dms)
(INSTANCE_NAME = dms2)
)
)



Connectivity to Particular node on RAC mode

Prem, April 15, 2005 - 12:53 am UTC

Thanks A Lot Tom..
Rgds
Prem

Connectivity to Particular node on RAC mode

Prem, April 15, 2005 - 3:58 am UTC

Tom,

After changing it as you said:

dms2 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST =..)(PORT = ..))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = dms)
(INSTANCE_NAME = dms2)
)
)


{ OR }

DMS1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.52.0.9)(PORT = 1529))
)
(CONNECT_DATA =
(SERVER - DEDICATED)
(SID = DMS1)
)
)

I am getting the same Error:
ORA-12504: TNS:listener was not given the SID in CONNECT_DATA.

Rgds
Prem


Tom Kyte
April 15, 2005 - 9:09 am UTC

you would have to set up the listener.ora for the sid as well, so it knows about it (it'll permit you to connect remotely as sysdba as well in that case, if you wanted to set that up)

server - dedicated

would be wrong.
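
For illustration only, a static SID entry in listener.ora might look like the following (the Oracle home path is invented, and it assumes the listener is named LISTENER):

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /u01/app/oracle/product/9.2.0)
      (SID_NAME = DMS1)
    )
  )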

Connectivity to Particular node on RAC mode

Prem, April 15, 2005 - 9:09 am UTC

Tom, one strange issue I hit on the same connectivity.

After changing the following TNS entries it got connected.

(CONNECT_DATA =
(SERVER - DEDICATED)
........

If I give SERVER = DEDICATED I get the same TNS error.
Strangely, if I give SERVER - DEDICATED it connects.

Can you help me understand what the reason could be?

Thanks & Regards
Prem

Tom Kyte
April 15, 2005 - 9:36 am UTC

that seems backwards to me truth be told.

is the listener on the same machine as the database?

24 node RAC,

sns, April 15, 2005 - 11:10 am UTC

In my current organization, three years ago they invited Teradata, Oracle, DB2 to implement their warehouse system. The size of the warehouse was around 20TB.

The requirement was to evenly distribute data across all the nodes and when a query is written to extract data, it should make use of the capacity of all the nodes and run faster. If a table contains at least 24 records, each row should be stored in each node. Well, the other key thing is to handle 20TB more effectively.

This is what RAC does right???

According to my manager, Oracle felt it was NOT POSSIBLE, and finally Teradata proved it and our company has implemented TD right now. It seems TD is pretty good but not as flexible as Oracle.

As a big Oracle fan, I did not totally agree with my manager, but I thought I would share this info with you.

Could you please argue this matter in detail?

thanks,



Tom Kyte
April 15, 2005 - 11:28 am UTC

<quote>
If a table contains at least 24 records, each row should
be stored in each node.
</quote>

"stored" or "accessible from"

In Oracle it would be "accessible from" all nodes.
In the others, a node would "own" it.


I don't know what to "argue" as I don't understand the situation. We do not statically partition data "across nodes". We have access to every byte from every node. We use shared disk.

We can run a query in parallel on every node and in fact run the split up queries in parallel on each node. We need not partition the data to do so. We partition data to make your life easier, not the life of the software (as you do in teradata or db2)

We are shared disk.
They are shared nothing, static partition, a node owns a partition.
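To make the shared disk point concrete, a minimal sketch (the table name and parallel degree are hypothetical) -- the statement needs no knowledge of which node "owns" which rows, because none of them do; any instance can drive it and the parallel slaves can run on every node:

select /*+ full(s) parallel(s, 8) */ count(*)
  from sales s;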

RE: connecting to a particular node on RAC

Vinayak, April 15, 2005 - 11:12 am UTC

We are using :

eca1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = XXXX)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = eca.qa1.chase.com)
(INSTANCE_NAME = eca1)
)
)

and it works. Without any SERVER=DEDICATED line.

RAC performance issue

Menon, April 22, 2005 - 2:55 pm UTC

Database - 10g r1
Consider the following scenario.

A two node RAC.

There is a hot table which is being inserted into
from middle tier processes running on both nodes.

There are queries being executed on the same hot
table from some other middle tier processes.

Are there any *RAC-specific* contention issues we
should be aware of here?
I am aware of the free lists feature for
eliminating simultaneous insert contentions but
are there any other issues that jump out to
your mind?
(Sorry if this was not too "topic-related")

Thank you.


Tom Kyte
April 22, 2005 - 3:44 pm UTC

you'll have lots of global cache cr requests as the blocks fly back and forth for the queries. The inserts should not contend (on the table, indexes -- entirely different story perhaps), and the queries will be making the blocks go through the interconnect.

that'll be different than in single instance oracle.
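If you want to see how much of this is happening, a minimal sketch of a cluster-wide check (run as a DBA; the event names are "global cache ..." in 9i and "gc ..." in later releases, hence the two patterns):

select inst_id, event, total_waits, time_waited
  from gv$system_event
 where event like 'global cache%'
    or event like 'gc%'
 order by time_waited desc;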

what we thought..

Menon, April 22, 2005 - 4:18 pm UTC

That sort of confirms what some colleagues here
were thinking about. I guess it is one of those
"a problem in single instance gets magnified in
case of RAC" cases.
Btw, by cr blocks flying back and forth, you mean
cr blocks in one node may be required by the other
node and hence may need to go through the
cache synchronization process, correct?

Say you have confirmed that this is the issue.
Are there any thoughts on how you would address
this issue (apart from avoiding it in the first
place:))?

Thank you.

Tom Kyte
April 22, 2005 - 4:34 pm UTC

yes, the block that was modified and committed on node 1 needs to get to node 2 to be read and vice versa, they go back and forth.

To address this, we would need to look at the design. There is no "general" solution; we'd have to look at the application and what it was doing and see if there was an equivalent implementation that would not have the same impact.

Thanx!

Menon, April 22, 2005 - 4:36 pm UTC


Design Change

Neeraj, April 22, 2005 - 5:42 pm UTC

Tom,

You said "we would need to look at the design". However, my undestanding about RAC implementation is that it requires *No Special* design conderations, whatsoever (as compared to other HA solutions). Correct me if I am wrong?

Thanks,
Neeraj

Tom Kyte
April 22, 2005 - 7:09 pm UTC

Nothing *requires* special design considerations, it'll work, it'll scale, it'll have that bottleneck however.

Single instance oracle -- this works:

create table my_sequence( name varchar2(30), val number );
insert into my_sequence values ( 'FOO', 0 );

create or replace function get_new_seq_val( p_name in varchar2 ) return number
as
    l_number number;
begin
    update my_sequence
       set val = val + 1
     where name = p_name
    returning val into l_number;
    if ( sql%rowcount = 0 )
    then
        -- "failure": no row exists for that sequence name
        raise_application_error( -20001, 'sequence ' || p_name || ' not found' );
    end if;
    return l_number;
end;
/

it works, it'll be a bottleneck.

Every system has bottlenecks, we use physical implementations and design techniques to avoid them.

Every system.


I wrote in Expert One on One Oracle "it is far easier to build a non-scalable application than it is to build a scalable one, just because you are using a scalable database doesn't mean you don't have to think about it"
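For contrast, a minimal sketch of the approach that avoids that serialization point -- a real Oracle sequence with a generous cache and NOORDER, so the instances do not have to coordinate on every nextval (the cache size is just an illustrative value, and the numbers will have gaps):

create sequence my_seq cache 1000 noorder;

select my_seq.nextval from dual;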

Thanks

Neeraj Nagpal, April 25, 2005 - 1:56 pm UTC

Thanks for the answer Tom

is it possible to build RAC@LINUX with NFS as shared drive

Jianhui, April 27, 2005 - 5:12 pm UTC

Hi Tom,
I do not have extra budget for shared drives. Is it possible to implement shared drives using NFS to build a RAC demo on Linux x86 machines? If so, could you point me to some resources or documents? If not, why? Is there a conflict between NFS and Oracle clusterware?

Thanks!

Tom Kyte
April 27, 2005 - 6:03 pm UTC

vmware -- go to otn.oracle.com/obe; you can download a demo system already built.

clarification

amit poddar, April 28, 2005 - 7:28 am UTC

Hi,

As far as I have tested, you don't need multiple nodes to install RAC. RAC can easily be installed on a single node (by RAC I mean multiple instances opening the same database). The only thing needed for RAC is software via which multiple instances can communicate (OCM in 9i and CRS in 10g). I have successfully run 5 instances on the same node.

So for learning and demonstration purposes (assuming one is trying to learn about RAC and not about hardware clustering), why can't Oracle document a single-node install method instead of the VMware thing, which is itself a huge download, along with $200 for VMware?

My question is: does a single-node install add any learning value, in your opinion?

In my opinion it does, in the following ways:

1. A multi-node install (using firewire) costs at least $2000 (less than real cluster hardware, but still expensive for many people like me who don't want to spend that much and would like to use their existing hardware). Moreover you have to have the time to set up all the hardware.

2. Oracle's VMware RAC download is about 4GB. Sorry, but that means one would have to install a whole new operating system along with VMware (not to mention the $190 for VMware).

3. Multi-node/VMware installs confuse RAC with hardware clustering. I think single-node installs make clear the difference between hardware clusters and RAC, which is a very important thing to understand, in my opinion.

4. Learning the details of RAC (cache fusion, CRS details) does not need a multi-node RAC, and a single-node RAC can be installed on any hardware. Specifically, I installed it on very old hardware and White Box Linux, so this was essentially a free RAC; the whole installation (a 5-instance RAC with ASM) took about 15-20 minutes on a 1GHz Xeon CPU with 256MB of RAM.


SPLIT BRAIN SCENARIO IN A NT RAC

JOHN, April 28, 2005 - 11:06 pm UTC

In a RAC environment at our site we have frequent "split brain" scenarios. Any idea on how to avoid this persistent problem?

Tom Kyte
April 29, 2005 - 8:01 am UTC

please utilize support for this, one would need to understand the situation leading up to the nodes not being able to communicate. You need to find the root cause before any suggestions can be made.

RAC - Instance failover, but what about the single Database

Charanjiv, May 17, 2005 - 9:31 am UTC

Hi
a newbie question - I have been reading up on Oracle RAC and one thing comes to mind. In RAC there are multiple Oracle instances pointing to one database, so we are secure against instance failure. But what strategies do we need to make sure that the database itself also has failover protection? A few come to mind: RAID, Oracle Fail Safe, standby DB? Is this correct?

Tom Kyte
May 17, 2005 - 9:46 am UTC

ASM (in place of raid)
data guard for disaster recovery

failsafe -- no, RAC is already clustered, failsafe wouldn't make sense.


You want disks that never go down (mirroring, redundancy)
You want networks/interconnects that never go down (dual NICs, dual switches)
You want power supplies that never fail (two of them in each box, UPS and so on)

You want two of lots of things.

What is split brain?

Jim, May 27, 2005 - 9:31 pm UTC

Tom, What is a split brain in RAC? How does oracle handle it? Thanks.

Tom Kyte
May 28, 2005 - 9:09 am UTC

split brain is a clustering term, when the nodes in a cluster don't fail but their communication path -- the interconnect does. You now have two (or more) independent machines that each think the others are dead. Two brains going.

That is when a fallback to another way to talk takes place -- using the shared disk, they can decide which one will live and which shall die.

</code> http://blogs.sun.com/roller/comments/kristien/Weblog/split_brain <code>
That was a good discussion of it from a clusterware implementation point of view (not RAC specific, as it isn't a RAC thing, but a cluster thing).

A reader, May 28, 2005 - 12:12 pm UTC

Tom,

We had a weird situation in which Oracle was complaining about "Split Brain" but the OS(Sun) was fine, no issues there..

Had to open a tar with Oracle & Sun, and as always we were told(OSS) to go on to next release of Oracle :-)

Thanks.

Tom Kyte
May 28, 2005 - 12:52 pm UTC

split brain in a cluster environment indicates something went wrong at the cluster/OS/hardware layer and in the interaction with the software running therein.

A reader, May 28, 2005 - 1:20 pm UTC

Tom,

Thanks Tom, that's what my understanding of split brain is.

This is the reason we had to involve both Oracle & Sun support, as Oracle was saying it's a h/w issue.

After both teams (Oracle & Sun) were involved, it was decided that it's an Oracle issue, not the OS.

Still not clear why it's an Oracle issue?

Thanks..




Tom Kyte
May 28, 2005 - 1:37 pm UTC

I'm not sure why it was decided that way either, I wasn't involved in the decision :)

A reader, May 28, 2005 - 1:59 pm UTC

Tom,

I thought of posting it here, but I don't want to bother you with those ORA-600 or other support-related issues.

Thanks,
Asif.

One More Question about RAC Please..

Drumil Narayan, June 01, 2005 - 9:20 am UTC

hi Tom,

being a student of this school, a simple question please:

suppose I have two machines with their own hard disks, no separate disk array or external hard disk or SAN etc. given to me -- can I do RAC on them?

if yes, can you point me to a link please? :)

thanx in advance..Drumil

Tom Kyte
June 01, 2005 - 11:15 am UTC

the disks need to be "shared" somehow. firewire, dual ported, something. so -- are they?

split brain..

reader, June 01, 2005 - 11:32 am UTC

You say, <quote>That is when a fallback to another way to talk takes place -- using the shared disk, they can decide which one will live and which shall die.</quote>

Which file on shared disk is used? controlfile? Thanks.


Tom Kyte
June 01, 2005 - 11:51 am UTC

it depends -- not the control file; quorum (voting) files usually (I believe). But in a shared disk system, we know they all have access to the shared disk.

RAC ON NT

A reader, June 01, 2005 - 12:11 pm UTC

Is anyone using RAC on NT? I am upgrading from 9i to 10g on NT. Any comments will be appreciated.



J2EE Connection Pool over RAC

Jason, June 20, 2005 - 12:04 pm UTC

Tom,

Our application runs in a J2EE server, using Container Managed Transactions, on a 2-node 9iR2 RAC database. An intermittent problem occurs when we read inserted (and committed) data 1-3 seconds after the commit: it doesn't always appear. The hypothesis I test below is that this happens if the read and write come from different instances. Our database connections come from a J2EE managed connection pool (we can't/don't control which instance the connection is attached to). It seems that our RAC environment suffers from some latency.

The test will require a table:

create table t1 (
data varchar2(20)
);

Are there any Oracle settings that can reduce or eliminate the latency? Do we need to rethink connection pooling (this would work correctly in a single instance environment)? I didn't find anything that addresses this issue in the RAC Concepts guide; can you point me to any documentation that will help us understand this? The Java test case is below.

Thanks,
Jason

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import oracle.jdbc.OracleDriver;

public class TestDao {

    public static void main(String[] args) throws Exception {
        DriverManager.registerDriver(new OracleDriver());
        Connection con1 = null;
        Connection con2 = null;
        PreparedStatement ps = null;
        ResultSet rs = null;
        try {
            // con1 goes to instance 1 (dev1) -- the inserts and the commit happen here
            con1 = DriverManager.getConnection("jdbc:oracle:oci:@dev1", "scott", "tiger");
            con1.setAutoCommit(false);
            // con2 goes to instance 2 (dev2) -- the read-back happens here
            con2 = DriverManager.getConnection("jdbc:oracle:oci:@dev2", "scott", "tiger");
            ps = con1.prepareStatement("insert into t1 values (?)");

            for (int i = 0; i != 100; i++) {
                ps.setString(1, "xxxx" + i);
                ps.executeUpdate();
            }
            con1.commit();
            ps = con2.prepareStatement("select count(*) from t1");
            rs = ps.executeQuery();
            if (rs.next()) {
                System.out.println(rs.getString(1)); // this will be 0 not 100!
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}



Tom Kyte
June 20, 2005 - 1:02 pm UTC

read about max_commit_propagation_delay
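For reference, a minimal sketch of checking and changing it (it is a static parameter, so with an spfile the change only takes effect after the instances restart; 0 is the "broadcast on commit" end of the trade-off discussed below):

show parameter max_commit_propagation_delay

alter system set max_commit_propagation_delay = 0 scope=spfile sid='*';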

Thanks...

Jason, June 20, 2005 - 1:47 pm UTC

Excellent! Thank you for being so helpful. I looked this up in the RAC Administrators guide. The following brings me to one more question.
"The default value (700 hundredths of a second, or seven seconds) is an upper bound that enables the preferred existing high performance mechanism to remain in place. "
If we set this to something lower than 700 (say, 0) what impact would this have on the "high performance mechanism"?

Thanks again Tom.

Tom Kyte
June 20, 2005 - 2:50 pm UTC

it would not be as high performance, it is a trade off.

OK

Raju, June 21, 2005 - 12:39 am UTC

Hi Tom,
How to create multiple instances for a single database??


Tom Kyte
June 21, 2005 - 8:20 am UTC

you run RAC, do you have RAC? if not, you won't be doing that. If you do, just "install it" and run it.

Howard J. Rogers, July 17, 2005 - 6:59 pm UTC

Shameless plug time:

If you want to work your way through the process of getting multiple instances to access a single database, and you don't happen to have a true hardware cluster sitting around with which to play, then the articles here...

</code> http://www.dizwell.com/html/laptop_rac.html <code>

...will certainly talk you through the process. Software installation, hardware configuration, etc etc etc. I like the Windows articles, but there's a Linux version too.

Tom Kyte
July 17, 2005 - 7:04 pm UTC

no worries on the plug, I've high confidence that the pointed to papers actually "work" and have been tested. You are such a script kiddie after all :)

I'm thinking of making some t-shirts with that on it -- I'm just a script kiddie.

SCN generation in RAC

Will, July 18, 2005 - 7:37 am UTC

Hi Tom,

excellent thread, full of useful information.

Can I query to find if the Lamport SCN generation scheme or the "broadcast on commit" scheme is in use? I think broadcast on commit is the same as hardware clock SCN generation: is this correct?

Many thanks
Will






Tom Kyte
July 18, 2005 - 8:17 am UTC

it is recorded in the alert log.

I don't know for the last bit.

connection issue,

sns, August 12, 2005 - 10:13 am UTC

From one of our Linux boxes, there seems to be an issue connecting to a 4-node RAC database. If I do a tnsping it just hangs. However, from another Linux box, it seems to work fine.

I verified that the tnsnames entries on both Linux boxes are exactly the same.

What could be the reason? How do I find out what is happening between the Linux box I am connecting from and the box where the database resides?

Thanks,

Tom Kyte
August 13, 2005 - 8:52 am UTC

wow, the list is really long. starting with /etc/hosts, dns, $ORACLE_HOME/network/admin/tnsnames.ora, firewalls, ......................


can you ping the instances (no oracle, just ping)

tnsping '(description=.....'
verify you are using the right tnsconnect string (watch out for $TNS_ADMIN)
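A minimal sketch of those checks from the misbehaving box (the host, port and service name below are placeholders -- use the values from your own tnsnames entry):

ping rac-node1
echo $TNS_ADMIN
tnsping '(description=(address=(protocol=tcp)(host=rac-node1)(port=1521))(connect_data=(service_name=mydb)))'

If the plain ping hangs too, the problem is in the network/firewall layer, not Oracle.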

9i RAC - Single VS Shared $ORACLE_HOME

A reader, August 15, 2005 - 1:57 pm UTC

Tom, is it recommended to have a separate $ORACLE_HOME on each node (single home) or a shared $ORACLE_HOME? What is the best practice?
Thanks.

Tom Kyte
August 15, 2005 - 10:23 pm UTC

It depends on your requirements and your needs. There are pros and cons of each.

Script kiddie

Connor, August 17, 2005 - 10:16 am UTC

"Warning: script bitch" has more impact :-)

For the last poster, www.polyserve.com has some good white papers on shared/separate Oracle software

Installation of RAC on Linux and VMware requests license key

Joe, August 26, 2005 - 10:48 am UTC

"(I've actually got a RAC cluster running on my laptop :) vmware rocks.

</code> http://www.oracle.com/technology/tech/linux/vmware/index.html <code>

If you've got the bandwidth to download it -- you get a two node rac cluster
running on linux with ASM and OCFS all virtualized in a vmware environment (90
day trial license of vmware included)."

Hi Tom:

I tried to install RAC on Linux and VMware on my desktop, but a popup window requested that an evaluation key be entered. I went to VMware and sent them an email to get the key for VMware Workstation 4.5.2, but have had no response from them. Could you help?

Thanks in advance.

Joe

Tom Kyte
August 26, 2005 - 1:53 pm UTC

you don't need a key as far as I know, you'd need a key to turn on the vmware after the 90 day trial is up, that is all.

voting file in 9iR2 RAC

David, September 04, 2005 - 10:03 pm UTC

From, </code> http://download-west.oracle.com/docs/cd/B10501_01/rac.920/a96600/cfgdsk.htm#1004591 <code>

<quote>On Windows NT and Windows 2000, SRVM uses the srvcfg raw partition as the configuration repository and as the Cluster Manager voting disk.</quote>

What about on the Unix/Linux platforms? Do we have to create a voting file in addition to the SRVM config file? Is this voting file required to resolve the *split brain* issue? Please clarify. Thanks. You are the best!


voting file

David, September 05, 2005 - 9:25 pm UTC

Maybe I did not make my question clear above. The doc talks about the voting file for Windows. What about on Linux/Unix? My understanding is that the voting file is used for emergency situations like interconnect failure, to resolve the "split brain" syndrome. How is it resolved on the Unix/Linux platform without a voting file? Thanks.

oracle 10g RAC

reader, September 21, 2005 - 6:24 am UTC

Tom,

Like to know the following:

Envt. is: oracle 10.1.0.2 on Sun Solaris 8.

- How many nodes will Oracle 10g RAC support on the Sun Solaris platform?
- What is the minimum per-node configuration requirement (number of CPUs, memory) for Oracle 10g RAC to work?
- Any reference site/case study in evidence of multi-node (more than 2 nodes) support by Oracle 10g RAC? Kindly provide the source of information on this.

Regards

Tom Kyte
September 21, 2005 - 7:17 pm UTC

see www.oracle.com for references and case studies.

you need 2 computers with 1 cpu each at least to do a multi-node rac, although you could do it with 1 computer and 1 cpu but it would not make much sense.

I believe 16 or 64 - support would be best placed to answer that - but in 10gR2 we are certifying up to 100 nodes (given the OS can support it).

RAC

reader, September 26, 2005 - 7:21 am UTC

Tom,

For 10g RAC, should the nodes be of the same configuration or can they differ? Can we have 8 CPUs and 16GB RAM in one box and 4 CPUs and 8GB RAM in another box, the operating system being the same?

Thanks and Regards

Tom Kyte
September 26, 2005 - 9:28 am UTC

they can be different, but that would not be recommended. You'd have to set additional parameters to tell us how to use the boxes (less manageable)

RAC concepts,

sns, September 26, 2005 - 1:46 pm UTC

I was going through oracle document regarding MONITORING RAC performance.

I came across this:

Use the V$SESSION_WAIT view to identify objects that have performance issues. Columns P1 and P2 identify the file and block number of the object as in the following example queries:

SELECT P1 FILE_NUMBER, P2 BLOCK_NUMBER
  FROM V$SESSION_WAIT
 WHERE EVENT = 'BUFFER BUSY GLOBAL CR';


The output from this first query may look like this:

FILE_NUMBER BLOCK_NUMBER
----------- ------------
12 3841


If you then issue the query:

SELECT OWNER, SEGMENT_NAME, SEGMENT_TYPE
  FROM DBA_EXTENTS
 WHERE FILE_ID = 12
   AND 3841 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;


Then the output would be similar to:

OWNER SEGMENT_NAME SEGMENT_TYPE
---------- ---------------------------- ---------------
SCOTT W_ID_I INDEX PARTITION



Does the meaning of the P1 and P2 values in v$session_wait differ for each event?

In the above example, P1 and P2 signify FILE_NUMBER and BLOCK_NUMBER. Is this applicable only to 'BUFFER BUSY GLOBAL CR'?

Do you have information about the meaning behind P1 and P2 values for other wait events?

Thanks,


RAC 9i to RAC 10gR2

Rodrigo, September 27, 2005 - 7:13 am UTC

Hi,
We want to migrate RAC 9.2 to RAC 10gR2, but my cluster uses Sun Cluster 3.0 and Oracle only certifies Sun Cluster 3.1. Is this final, or will Oracle certify Sun Cluster 3.0 in the future?

Thank a lot,
Rodrigo

Tom Kyte
September 27, 2005 - 11:34 am UTC

you'll want to work with support on certification issues, I do not work with every single port/version combination.

db file scattered read

Anil, September 29, 2005 - 10:14 pm UTC

Tom,

Given the tkprof report of a query which takes almost an hour to execute on a 9i RAC configuration,
could you please help me with which areas I should concentrate on?
select count(*)
from
v4


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 2.26 2.27 0 0 0 0
Execute 1 0.05 0.89 0 30 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 2.31 3.16 0 30 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 42

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
enqueue 16 0.00 0.01
DFS lock handle 116 0.50 0.78
KJC: Wait for msg sends to complete 12 0.04 0.48
process startup 4 0.18 0.70
PX Deq: Join ACK 6 0.00 0.00
PX Deq Credit: send blkd 99 0.23 1.57
PX Deq: Parse Reply 6 0.09 0.13
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.00 0.00
PX Deq: Execute Reply 352 2.00 56.87
db file scattered read 140826 0.16 576.13
global cache cr request 1108692 3646.44 7157.20
db file sequential read 861 0.07 0.96
db file parallel read 204 0.02 0.86
PX Deq Credit: need buffer 99 0.10 2.59
PX Deq: reap credit 89 0.00 0.00
IPC send completion sync 1 0.01 0.01


thanks
Anil


Tom Kyte
September 30, 2005 - 8:55 am UTC

that query takes 3 seconds ?

not sure what I'm looking at here

RAC design

Dave B, October 27, 2005 - 12:12 pm UTC

Tom,

We have a third party application running on Oracle 8i that we're looking to upgrade to 10gR2. At the moment we're at the technical design stage, but the currently favoured option is to use RAC for availability. Some questions:

1. The app is very poorly behaved - lots of hard parsing. I need to use cursor_sharing/session_cached_cursors to get round this (it's a packaged solution). Any issues with RAC and this?

2. A lot of queries join two tables (using an index), but allow additional where criteria to be supplied for each table. The optimizer usually opts for a FTS of at least one of the tables. Any issues with this scaling over 4 nodes - i.e. lots of scanning of remote buffer caches?

3. The current 8i database uses a shared nothing stretched cluster (nodes separated by 8km). One proposed option is to use a stretched RAC with data duplication by Veritas Storage Foundation for RAC. Any gotchas (performance, split brain) here?

4. If we end up with a two-node stretched RAC, are there any issues with adding a third node?

5. The app sets some session states (i.e. alter session set current_schema). Can we use a database logon trigger to replicate this in the event of TAF failover?

Thanks!

Tom Kyte
October 27, 2005 - 1:29 pm UTC

1) things that don't scale on a single node won't scale on multiple nodes.

There are no special issues with RAC and cursor sharing (a sketch of those settings appears at the end of this reply), but having to use cursor sharing means you should be considering another application vendor as much as moving to 10gR2 -- it is very unlikely that they will support you on 10gR2 RAC if they haven't been willing to fix their bugs (lack of binds).

2) could be, if the objects are frequently updated as well. Seems you might be missing additional indexes needed for performance on these tables?

3) haven't worked with that environment myself, do not feel qualified to respond. Seems you are serious about availability, willing to spend on that.... but you buy applications that are poorly designed? and live with them???

4) see #3

5) you might not want to use a TAF failover as you cannot be sure that they are not reliant on other "state" stuff - could be tricky. The final answer as to whether it can work with their application will be "test it"




I am skeptical of this working without the vendor helping and supporting the configuration - I am skeptical they will do that, given the lack of binds for such a long long long time.
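On #1, a minimal sketch of the workaround settings mentioned in the question (the values are illustrative only -- this papers over the vendor's missing bind variables, it does not fix them):

alter system set cursor_sharing = force scope=both sid='*';
-- session_cached_cursors generally requires an instance restart to change system-wide
alter system set session_cached_cursors = 100 scope=spfile sid='*';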

Sad but True

David B, October 31, 2005 - 6:02 am UTC

The application is one that is "database independent". It's been reduced to a set of tables, indexes and simple single row dynamic SQL that don't make use of any feature that is post Oracle 5.

The package is one of the market leaders in its sector which is desperately sad as in many respects it doesn't deserve to make the vast profit that it does.

Anyway...

Could I simulate a TAF failover on a single instance (using an appropriate tnsnames.ora and database shutdown/startup)?


Tom Kyte
November 01, 2005 - 3:13 am UTC

hmm, taf failover on a single instance, it is documented as a supported configuration:
</code> http://docs.oracle.com/docs/cd/B10501_01/network.920/a96580/advcfg.htm#478706 <code>
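A minimal sketch of what such a single-instance TAF entry might look like in tnsnames.ora (host, port and service name are placeholders):

SINGLE_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myhost)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = mydb)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 3)
      )
    )
  )

Bouncing the instance while a long-running SELECT is fetching should then let you watch the failover behaviour without needing a cluster.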


Sequences and RAC

Juan Velez, November 10, 2005 - 5:47 pm UTC

I have a problem with a non-ordered, cycled, cached sequence in a 2-node RAC environment. As I understand it, an instance gets another set of cached values either the first time it calls nextval on the sequence or when the cached values are exhausted. In our case, neither are the cached values being exhausted nor is the instance being shut down and restarted. Any idea what could be causing this behaviour? Unfortunately, losing numbers may in the end be a problem, because the sequence is part of a unique identifier (cycled sequence) that is used as a key for partitioning.

Tom Kyte
November 11, 2005 - 12:04 pm UTC

you will always - repeat ALWAYS - as in "all of the time" lose sequence numbers. Many things will cause it. sequences are never gap free.




TAF and After Logon trigger

David Broomfield, November 18, 2005 - 4:55 am UTC

I'm currently looking into RAC with a view to moving a third party package onto it (don't have a test system at the moment). The tables are picked up via an ALTER SESSION SET CURRENT_SCHEMA command that is issued by the application immediately after logon.

I understand from the documentation that TAF will not carry this session setting forward to a new instance following a failure.

1. Can I use an AFTER LOGON trigger to accomplish the same thing? - I don't know whether the trigger will fire under TAF control.
2. If the trigger does fire, can I query something (sys_context(), v$session..) to find out if the logon is a "real" logon or whether it is a TAF logon?

Tom Kyte
November 18, 2005 - 2:59 pm UTC

The trigger should be firing and you have access to whatever you want from v$ views to setup the environment (v$session.program for example, not perfect - as it is just the name of the binary and so I can "spoof" it - but you are not doing this for security)


HOWEVER, TAF is only "a little transparent", the outstanding write transaction if any will be wiped out - the application will get an error. It is doubtful that a 3rd party application will be transparently failed over - it'll likely have a "burp" or two on the way over that will be exposed to the end user. Also, they might be reliant on other "state" items in their session - this is something you'll want vendor support with testing and implementing (will they even support you in this environment...)
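For question 1, a minimal sketch of such a trigger (the schema and user names are hypothetical placeholders; whether it fires and does what you need on a TAF reconnect is exactly what you would want to verify in your own test):

create or replace trigger set_app_schema
after logon on database
begin
    if user = 'APP_USER'    -- hypothetical application login
    then
        execute immediate 'alter session set current_schema = APP_OWNER';
    end if;
end;
/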

A reader, December 14, 2005 - 5:15 pm UTC

Tom,

Maybe I read this somewhere but I am not able to find it: if we are using third party clusterware (e.g. Veritas or Sun), do we still need to install CRS and integrate it with the third party clusterware -- is that correct?

Thanks.

Tom Kyte
December 14, 2005 - 8:28 pm UTC

you need to install crs regardless, yes.

Slow connection time when one node in RAC is down

Luong Tran, December 26, 2005 - 6:10 am UTC

Hi Tom,

We are using Oracle 9i RAC on Windows 2003 Server. Normally it takes less than 1 second to connect to the database using SQL*Plus. However, when one node is down, the connection time is always about 21-22 seconds.

The instance is in SHARED SERVER mode.

Could you give us any hints on this?

Thanks,
Luong.

Tom Kyte
December 26, 2005 - 10:54 am UTC

need more information about the setup, where is the listener, what method are you using for name resolution, what does the connect entry look like, and you have opened a tar right? (this is something support would be able to help you with pretty easily)

Long connection time when one node is down

Luong Tran, December 26, 2005 - 9:39 pm UTC

Hi Tom,

I couldn't access MetaLink for several days. Seems that there are some problems after the recent upgrade.

Below are the Listener.ora and Tnsnames.ora files.

************LISTENER.ORA******************************
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vsbhodbs01)(PORT = 1521))
)
)
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = C:\Oracle\OraHome92)
(PROGRAM = extproc)
)
(SID_DESC =
(ORACLE_HOME = C:\Oracle\OraHome92)
(SID_NAME = SBANK1)
)
)
******************END OF LISTENER.ORA*******************

******************TNSNAMES.ORA**************************
SBANK.VIETTHAI.COM.VN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.45)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.46)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SBANK)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)

*********************END OF TNSNAMES.ORA*****************

Could you pls help?

Thanks,
Luong.

Tom Kyte
December 27, 2005 - 9:29 am UTC

metalink has been fine (and pretty snappy) the last couple of days - in particular due to the light load of the holiday - are you sure about that?

You haven't given me the topology here - where is that listener, how are the registrations taking place, is TCP taking a really really long time to time out on a failure (eg: if .45 goes down - how long is tcp taking to figure out "no one there to connect to")

and get your tar open please.

Long connection time when one node is down

Luong Tran, December 27, 2005 - 12:03 am UTC

Hi Tom,

I've just read the book "Oracle Real Application Clusters" by Murali Vallath, which says the BACKUP parameter is strongly recommended when using the BASIC method of FAILOVER; otherwise reconnection may first attempt the instance that has just failed, adding additional delay before the client reconnects.

So could I try the following:

SBANK.VIETTHAI.COM.VN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.45)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.46)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SBANK)
(FAILOVER_MODE =
(backup = Backup_net_service_name) #Add backup parameter
(TYPE = select)
(METHOD = basic)
)
)
)

#The backup net service name is added below
Backup_net_service_name =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.100.46)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = SBANK)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)

Thanks,
Luong


Dave, December 27, 2005 - 9:39 am UTC

As Tom said, I bet it is network timeout settings; your client machine is waiting before it gives up.

So any new connection or failover which tries to access the dead node simply waits until it times out.

On Linux you would change a couple of the /proc/sys/net/ipv4 (I think) settings. No idea what the Windows equivalent settings are.

LOT OF "global cache cr request " IN STATSPACK REPORT

iceinwater, February 08, 2006 - 7:20 am UTC

Hi Tom:

My Oracle is 8.1.6.3, OPS, on AIX 4.3.

In my statspack report, I found this:

Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
global cache cr request 20,024 9,140 51.94
db file scattered read 9,380 2,140 12.16
buffer busy due to global cache 170 1,387 7.88
control file parallel write 451 820 4.66
latch free 18,061 773 4.39
-------------------------------------------------------------


It has a lot of "global cache cr request" waits. How can I fix it? Thanks!

Archive log destination on RAC

A reader, February 16, 2006 - 6:59 am UTC

Hello Tom,

I am turning on archive logging for a RAC 10g on Linux database that will go live soon. Due to space constraint, we are thinking of pointing archive_log_dest to a NFS mounted partition.
Is it OK to have an NFS mounted partition for archive logs in RAC? We have a two-node RAC, and in both instances the archive destination would point to the same NFS partition. We would use a log format of log_%t_%s_%r.arc. Also, we are using ASM for the datafiles.
I went through the documentation but haven't got any clear idea as to whether it is OK or not.
Thanks for your help.


Tom Kyte
February 16, 2006 - 12:01 pm UTC

why are you running RAC? Please don't say "for high availability" (HA), because you haven't thought this through all of the way.

using NFS for archives would not be wise. Think about it. You have a HA cluster with a huge single point of failure external from it - this NFS mount.


No, this would not be a wise thing to do.

RE: Archive log destination on RAC

Allen Shatzer, February 16, 2006 - 7:18 pm UTC

Tom,

I usually am 100% in agreement with what you say, but here I have to disagree, at least to a degree. If the NFS mount is on an Oracle certified storage solution such as EMC or NetApp, with a high level of redundancy (RAID or mirroring) and a high degree of backup (e.g. snapshots on NetApp), then an NFS mount is not necessarily a "bad" thing. NFS IS a supported means of connecting to shared storage on Linux with a certified storage solution. Yes, a single point of failure, but you need all of the archive logs from all nodes to be visible to all nodes in case of a failure, at least according to the documentation I have read.

Now, an NFS mount from a local disk on a server, yes, a very bad idea, indeed.

Tom Kyte
February 17, 2006 - 1:29 pm UTC

I was not assuming that - I was assuming NFS.

If this is a netapp - no problem.

That is certified NAS, that was not mentioned at all. I'm figuring

...Due to space constraint, we are thinking of pointing archive_log_dest to a
NFS mounted partition. ....

was NOT NAS.

archive log disk

Charlie Zhu, February 17, 2006 - 3:10 pm UTC

We have been on a 4-node 10.1.0.4 RAC on Linux for 10 months.

For local archive logs:

Local disk is good.
NFS is not good.
NAS needs a patch applied.

Archive logs on shared disks:
OCFS 1.0 is not good (we have not tested OCFS 2.0).
ASM: tested with the TEST db, OK; not yet tested with the production db.

Hope this information is helpful.


Oracle 10g RAC documentation

A reader, February 19, 2006 - 3:39 pm UTC

Hi

There are these documents in 9i RAC:

Real Application Clusters Administration Contents
Real Application Clusters Concepts Contents
Real Application Clusters Deployment and Performance Contents
Real Application Clusters Real Application Clusters Guard I - Concepts and Administration Contents
Real Application Clusters Setup and Configuration


In 10g we only have

Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide

In which 10g doc is RAC explained? In the Concepts guide there is only a very brief overview.

In the 9i RAC Concepts guide there is a chapter which discusses how cache fusion works. In 10g that has disappeared. Does that mean there is no change between 9i and 10g?



Tom Kyte
February 19, 2006 - 5:29 pm UTC

RAC was "mainstreamed", it is documented not as something "outside of the box and special" but rather in with the rest of the documentation.



10g cache fusion

A reader, February 20, 2006 - 2:32 pm UTC

Hi

Where can I find about Cache Fusion principles in 10g R2 Documentation?

Hi Dave

A reader, February 21, 2006 - 9:58 am UTC

Hi Dave

I mean something like this but for 10g

</code> http://download-west.oracle.com/docs/cd/B10501_01/rac.920/a96597/pslkgdtl.htm#18836 <code>

Tom Kyte
February 22, 2006 - 7:53 am UTC

interleaved throughout the documentation. "mainstreamed", not treated as extraordinary nor unusual.

real application clusters

A reader, February 24, 2006 - 5:59 pm UTC

In RAC, there are multiple instances of Oracle on different computers accessing the same database.

Does that mean that in RAC there is only one physical database, i.e. the database is not copied to the different computers?

If this is true, then availability will be affected if the media on which the physical database resides fails.





Tom Kyte
February 24, 2006 - 6:39 pm UTC

RAC is to computers what RAID levels that provide protection from failures is to disk.


it is called redundancy.

With disks, you can make them redundant to the degree that the odds they "disappear" is very small.

Same with your network.

And now the same with your computer.

Cache fusion

A reader, March 29, 2006 - 11:56 am UTC

Tom,
Going through the cache fusion process in a RAC environment, I have the following question. If I am migrating from a single instance to a 4 node RAC, how much SGA should I configure for each instance? Should I configure each instance's SGA to be same as that of single instance or 1/4 of the single instance? This is assuming that I will have to add some percentage to account for RAC processes. The instances will use automatic shared memory management and load balancing features. We are on 10gr2.

Thanks


Tom Kyte
March 29, 2006 - 12:39 pm UTC

look at the user community that will be accessing each node - size appropriately. Pretend they are single instances for SGA sizing purposes.

If by going to a 4 way, you have 1/4th the users accessing 1/4th the data - it could be "1/4th"

If by going to a 4 way, you have 1/4th the users accessing the SAME amount of data - it could be "1"

Where to See the Statspack report in case of RAC

amar, March 30, 2006 - 6:11 am UTC

Tom,
I have tried to search for this a lot.

If I have a 2-node RAC, where should I look at the statspack report? I have observed that some of the snap ids are generated on one node and some on the other when I generate the statspack report for a given period of time.

Any pointers would be great.

Regards

Tom Kyte
March 31, 2006 - 11:20 am UTC

statspack would need to be executed on each node, it is an "instance" tool.
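A minimal sketch of the per-instance routine (assuming statspack is already installed and you connect as the statspack owner on each instance in turn):

exec statspack.snap
-- ... wait for the interval of interest ...
exec statspack.snap
@?/rdbms/admin/spreport.sql

When spreport.sql asks for the begin and end snap ids, pick two snapshots that were taken on the instance you are connected to -- snapshots from different instances cannot be combined into one report.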

rac

neeraj, April 19, 2006 - 5:35 pm UTC

Tom,
I have a 1 cluster -> 4 node configuration on 9i RAC.

I want to use 2 nodes [a1 and a2] for OLTP stuff and the other 2 [a3 and a4] for batch, 24*7.
I don't want to split my cluster.
Is there any way I can do this?
Thanks!

Tom Kyte
April 19, 2006 - 5:38 pm UTC

just have instance a1 and a2 register as service "oltp" and a3 and a4 register as service "batch"

then connect to oltp or to batch - no need to "split" really.
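A minimal sketch of the init.ora/spfile side (the instance names a1..a4 come from your post; the service names are whatever you choose to call them):

a1.service_names='oltp'
a2.service_names='oltp'
a3.service_names='batch'
a4.service_names='batch'

The tnsnames.ora side is shown in the follow-up below.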

rac

neeraj, April 20, 2006 - 12:32 pm UTC

Do you mean in init.ora for a1 and a2 I should add service_name entry and then use same service_name is tnsnames.ora?

AUCSX_oltp=
(DESCRIPTION=
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=dbrac01)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=dbrac02)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=AUCSX_OLTP)
(FAILOVER_MODE=
(TYPE=SELECT)
(METHOD=BASIC)
)
)
)

Tom Kyte
April 20, 2006 - 12:39 pm UTC

yup

10g Control File Single Point of Failure...

Enzo Medici, April 27, 2006 - 2:31 pm UTC


In 9i, if you delete one control file on a running database, or it becomes corrupt, the database will halt, even though you may have multiplexed and have two additional copies. It should just keep going, as it has two more control files. I understand your response earlier in this thread of "they are the eyes. they are important. a failed write shuts down."

But a database should not shut down if it still has two more control files. That's just bad design and against the purpose of multiplexing in the first place.

Is this still the case in Oracle 10g?

If so, that makes RAC have a single point of failure. I'm concerned as we lost a single control file recently and the database stopped, but we had two more control files so we could easily restart the db.

However, the db should not crash in this situation. Just ignore the bad or missing file and keep going.

Tom Kyte
April 27, 2006 - 3:41 pm UTC

I cannot change the way things work really - but you can.

You can file an enhancement request via metalink (a tar). You supply a business case and the developers get it. It means a lot more coming from you than me (really it does).

A reader, April 27, 2006 - 4:06 pm UTC

Agree with both Enzo Medici and Tom.

I think I will do so...

Enzo, April 27, 2006 - 6:15 pm UTC


Sorry, I didn't have 10g to do a test, but I believe you confirmed what I was asking.

Yes, one of our DBAs wanted to multiplex the control files on a dev database as they were all in /u01/.. so they shut down the instance and changed the init.ora file, but during the process they were in the wrong terminal window when they moved the first control file, so they moved the control file of the production db instead. Doh.

I hadn't even realized this was a single point of failure, since I have never seen that issue before and have always had the control files multiplexed. Learn something new every day.

That blows RAC's HA claims out of the water, because it is a single point of failure, so it seems like Oracle would want to fix this ASAP. We will file a tar.

Thanks Tom.


sga size

jas, April 30, 2006 - 2:45 am UTC

Hi Tom,

I have Red Hat Linux and it is a 4-node Oracle 9.2.0.6 RAC.
Each node has 6GB of memory and 2 CPUs.
I am having performance issues with my site (ASP).
I would like to discuss the following with you:
*._b_tree_bitmap_plans=FALSE
*._trace_files_public=true
*.aq_tm_processes=1
*.archive_lag_target=1200
*.background_dump_dest='/home/app/oracle/admin/MERP/bdump'
*.cluster_database_instances=4
*.cluster_database=TRUE
*.compatible='9.2.0.0.0'
*.control_file_record_keep_time=14
*.control_files='/export/u01/oradata/MERP/control01.ctl','/export/u01/oradata/MERP/control02.ctl'
*.core_dump_dest='/home/app/oracle/admin/MERP/cdump'
*.db_block_size=8192
*.db_cache_size=1024M
*.db_domain='merp.com'
*.db_file_multiblock_read_count=16
*.db_name='MERP'
*.dg_broker_start=FALSE
*.fal_client='MERP_STDBY1'
*.fal_server='MERP_PRIME1','MERP_PRIME2','MERP_PRIME3','MERP_PRIME4'
*.fast_start_mttr_target=300
*.hash_join_enabled=TRUE
MERP1.instance_name='MERP1'
MERP2.instance_name='MERP2'
MERP3.instance_name='MERP3'
MERP4.instance_name='MERP4'
MERP1.instance_number=1
MERP2.instance_number=2
MERP3.instance_number=3
MERP4.instance_number=4
*.java_pool_size=2000000
*.job_queue_processes=2
*.large_pool_size=2000000
MERP1.local_listener='LISTENER_MERP1'
MERP2.local_listener='LISTENER_MERP2'
MERP3.local_listener='LISTENER_MERP3'
MERP4.local_listener='LISTENER_MERP4'
*.log_archive_dest_1='LOCATION=/export/ext01/oraarch/MERP'
*.log_archive_dest_10='LOCATION=/export/u10/oraarch/MERP optional'
*.log_archive_dest_2='service=MERP_STDBY1 optional reopen=60'
*.log_archive_dest_3='service=MERP_STDBYB1 optional reopen=60'
*.log_archive_dest_state_1='ENABLE'
*.log_archive_dest_state_10='DEFER'
*.log_archive_dest_state_2='ENABLE'
*.log_archive_dest_state_3='ENABLE'
*.log_archive_format='rdo_%t_%s.arc'
*.log_archive_start=true
*.log_buffer=2000000
*.max_commit_propagation_delay=0
*.open_cursors=300
*.optimizer_index_cost_adj=100
*.pga_aggregate_target=419430400
*.processes=300
*.query_rewrite_enabled='TRUE'
*.remote_listener='LISTENERS_MERP'
*.remote_login_passwordfile='exclusive'
*.shared_pool_size=500M
*.sort_area_size=1000000
*.standby_archive_dest='/export/ext01/oraarch/MERP'
*.standby_file_management='auto'
*.star_transformation_enabled='TEMP_DISABLE'
MERP1.thread=1
MERP2.thread=2
MERP3.thread=3
MERP4.thread=4
*.timed_statistics=TRUE
*.undo_management='AUTO'
MERP2.undo_retention=3000
MERP4.undo_retention=3000
*.undo_retention=3600
MERP1.undo_tablespace='UNDOTBS1'
MERP2.undo_tablespace='UNDOTBS2'
MERP3.undo_tablespace='UNDOTBS3'
MERP4.undo_tablespace='UNDOTBS4'
*.user_dump_dest='/home/app/oracle/admin/MERP/udump'
*.utl_file_dir='/home/app/oracle/admin/MERP/utl'

The SGA has a max size of 1.7GB :(
db_cache_size is 1GB
shared_pool_size is 500MB
PGA is only 400MB and we have a sort area as well.
Now, what I feel is:
1. PGA should be around 1GB. What do you think?
2. There should not be any sort_area_size parameter, right?
I read a Metalink note about increasing the SGA to 2.4GB, so I will work on that, but do the above 2 points make sense?

Tom Kyte
April 30, 2006 - 5:03 am UTC

1) I have heard nothing nor seen any supporting information given by you to support this. The only reasonable response is "it depends"

for you say all you have said is "I am slow", you have not said "why I am slow", "this is what is slow", "we are slow because", "I have observed this and based on that I think we are slow because".


2) it does not HURT to have it and if you are using shared server in 9i, you sort of do want it to be set.



You have given us no information that says "hey we need to use memory more". Your perception of slowness may very well have NOTHING whatsoever to do with the amount of ram you are or are not using.

A reader, April 30, 2006 - 12:46 pm UTC

This is for "jas" if you can refer this book

"Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning"

will be very helpful.



rac

jas, April 30, 2006 - 1:23 pm UTC

"A reader" why are you sugesting me wait interface book :D
I did not mention anything about waits in last comment..Did i? ;)


Library cache output
NAMESPACE GETS GETHITS GETHITRATIO PINS PINHITS PINHITRATIO RELOADS INVALIDATIONS
--------------- ---------- ---------- ----------- ---------- ---------- ----------- ---------- -----
SQL AREA 41540485 41228496 .99248952 1447548969 1446892988 .999546833 38173 141470
TABLE/PROCEDURE 23019817 22976124 .99810194 21905129 21731546 .992075691 20100 0
BODY 169693 168582 .993452883 169797 168295 .991154143 220 0
TRIGGER 1298748 1297826 .999290086 1298756 1297712 .999196154 100 0
INDEX 485 312 .643298969 389 142 .36503856 49 0
CLUSTER 7465 7378 .988345613 9569 9401 .982443307 0 0
OBJECT 0 0 1 0 0 1 0 0
PIPE 0 0 1 0 0 1 0 0
JAVA SOURCE 0 0 1 0 0 1 0 0
JAVA RESOURCE 0 0 1 0 0 1 0 0
JAVA DATA 0 0 1 0 0 1 0 0

My question was
1. Do we need to increase PGA ?

Tom Kyte
April 30, 2006 - 2:11 pm UTC

he is recommending a book that will help you figure out "where I spend my time"

so that you do not post init.ora's and say "I think my pga should be X, what do you think"



It is unfortunate you did not mention anything about waits in your last post - that is not a good thing.


Answer to your question:

maybe - unfortunately, you provide nothing that would help anyone say

a) yes
b) no
c) not sure


(for example, the query output you just posted - that has to do with SGA memory, not PGA even.....)


tell you what - does statspack tell you that you need more pga?


rac

jas, May 01, 2006 - 12:32 pm UTC

This is the PGA report. I hope you will suggest something.

PGA Aggr Target Stats for DB: MERCP Instance: MERCP3 Snaps: 31637 -31676
-> B: Begin snap E: End snap (rows dentified with B or E contain data
which is absolute i.e. not diffed over the interval)
-> PGA cache hit % - percentage of W/A (WorkArea) data processed only in-memory
-> Auto PGA Target - actual workarea memory target
-> W/A PGA Used - amount of memory used for all Workareas (manual + auto)
-> %PGA W/A Mem - percentage of PGA memory allocated to workareas
-> %Auto W/A Mem - percentage of workarea memory controlled by Auto Mem Mgmt
-> %Man W/A Mem - percentage of workarea memory under manual control

PGA Cache Hit % W/A MB Processed Extra W/A MB Read/Written
--------------- ---------------- -------------------------
75.1 4,281 1,420

%PGA %Auto %Man
PGA Aggr Auto PGA PGA Mem W/A PGA W/A W/A W/A Global Mem
Target(M) Target(M) Alloc(M) Used(M) Mem Mem Mem Bound(K)
- --------- --------- ---------- ---------- ------ ------ ------ ----------
B 400 321 57.4 0.0 .0 .0 .0 20,480
E 400 319 62.2 0.0 .0 100.0 .0 20,480
-------------------------------------------------------------

PGA Aggr Target Histogram for DB: MERCP Instance: MERCP3 Snaps: 31637 -31676
-> Optimal Executions are purely in-memory operations

Low High
Optimal Optimal Total Execs Optimal Execs 1-Pass Execs M-Pass Execs
------- ------- -------------- ------------- ------------ ------------
8K 16K 49,882 49,882 0 0
16K 32K 1,548 1,548 0 0
32K 64K 160 160 0 0
64K 128K 351 351 0 0
128K 256K 144 144 0 0
256K 512K 80 80 0 0
512K 1024K 795 795 0 0
1M 2M 105 105 0 0
2M 4M 99 99 0 0
4M 8M 154 154 0 0
8M 16M 22 22 0 0
32M 64M 4 0 4 0
64M 128M 2 0 2 0
128M 256M 8 0 8 0
-------------------------------------------------------------

PGA Memory Advisory for DB: MERCP Instance: MERCP3 End Snap: 31676
-> When using Auto Memory Mgmt, minimally choose a pga_aggregate_target value
where Estd PGA Overalloc Count is 0

Estd Extra Estd PGA Estd PGA
PGA Target Size W/A MB W/A MB Read/ Cache Overalloc
Est (MB) Factr Processed Written to Disk Hit % Count
---------- ------- ---------------- ---------------- -------- ----------
50 0.1 2,428,355.6 756,735.1 76.0 29,175
100 0.3 2,428,355.6 458,181.1 84.0 8,119
200 0.5 2,428,355.6 86,083.3 97.0 0
300 0.8 2,428,355.6 81,758.3 97.0 0
400 1.0 2,428,355.6 68,402.2 97.0 0
480 1.2 2,428,355.6 60,246.8 98.0 0
560 1.4 2,428,355.6 60,047.7 98.0 0
640 1.6 2,428,355.6 60,047.7 98.0 0
720 1.8 2,428,355.6 59,588.0 98.0 0
800 2.0 2,428,355.6 57,451.0 98.0 0
1,200 3.0 2,428,355.6 44,928.2 98.0 0
1,600 4.0 2,428,355.6 44,503.3 98.0 0
2,400 6.0 2,428,355.6 44,414.8 98.0 0
3,200 8.0 2,428,355.6 44,414.8 98.0 0
-------------------------------------------------------------

Tom Kyte
May 02, 2006 - 3:02 am UTC

well, this is basically saying "if you increase the pga aggregate target from the current 400 MB to 480 - you'll get a 1% increase in 'memory' operations, if you increase it to 3.2 GB, you'll get the same sort of increase"


This does not really look like the "pga" is your low hanging fruit. Did you read the metrics here - do you understand what they mean?

basically you had:

32M 64M 4 0 4 0
64M 128M 2 0 2 0
128M 256M 8 0 8 0

14 "big" sort/hash type operations, out of well over 50,000 such operations, that needs to go to disk in a one pass operation.

Unless these 14 operations are your low hanging fruit (very doubtful I think), then this is the wrong place for you to be looking.

RAC

jas, May 02, 2006 - 12:47 pm UTC

Thanks! :D

I did figure that out last night, so now I am here with more details ;)

Top 5 Timed Events
~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 7,529 30.95
db file scattered read 1,372,401 5,101 20.97
db file sequential read 1,361,363 4,483 18.43
global cache cr request 5,686,071 2,662 10.94
control file sequential read 1,719,304 1,131 4.65

Please help me or give me some hints to resolve the top 5 wait events.
Red Hat 2.1, 9.2.0.6 with 4 instances.

Tom Kyte
May 02, 2006 - 3:59 pm UTC

one very valid option is "ignore them", that makes them in effect go away.

but, umm, not sure where to go with this.

the top 5 section, I love to make fun of this part in my talks. I can come to ANY conclusion based on these "details" which are in fact NOT DETAILS AT ALL but an aggregation of lots and lots and lots of details.

I guess we could say "get faster disk, make IO go faster"

or "tune your sql to make IO happen less often"

or "increase buffer cache, maybe that'll do it"

in short, these are JUST NUMBERS.

do you know how to trace an application - do that with your MOST IMPORTANT ONE and let's start there.

This stuff isn't very useful - it is very much aggregated over everything. Focus on the THING you want to make "faster" and let's start there.
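A minimal sketch of tracing that one important application session (level 12 captures binds and waits; format the resulting trace file with tkprof afterwards):

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';
-- ... run the important part of the application in this session ...
alter session set events '10046 trace name context off';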

Does RAC support different platform on the nodes

Thierry Yen Suin, May 17, 2006 - 8:26 am UTC

Hi Tom,
Don't want to pollute this thread with a newbie question but, can I mix and match different hardware/OS on the different nodes in a RAC implementation?
Like HP/UX on one node and Solaris on another node?
Thanks,
Thierry.

Tom Kyte
May 17, 2006 - 9:29 pm UTC

RAC is homogeneous - it must be the same hardware type, same OS.

A reader, May 18, 2006 - 1:14 am UTC

Tom,

We are running 9.2.0.6 RAC (2 nodes) on Sun Cluster. We will be upgrading soon to 10.2.0.2 and now we are debating whether to use CRS alone as our clustering software or to integrate it with Sun Cluster. The reasoning is that, as CRS is Oracle's product, it integrates well with the database, and so far on other production environments (RAC on Linux, 10.2.0.1) we've had no issues. Please let us know your valuable feedback.

Thanks.


Tom Kyte
May 18, 2006 - 10:53 am UTC

It would certainly cost less to use the supplied functionality as you would not need the additional software. It would likely "install much easier" as you don't have to integrate 3 or 4 bits of software from 3 or 4 vendors.

A reader, May 18, 2006 - 12:47 pm UTC

Thanks.

RAC recovery

mal, May 18, 2006 - 2:09 pm UTC

Tom,

I think this will be a good one. Can you please suggest the easiest way to find out which archive logs are required from the different threads, and how to automate applying those archive logs while recovering?

Global application context support in Real Application Clusters RAC?

Mihail Daskalov, May 25, 2006 - 3:03 pm UTC

Hi Tom,
thanks for your support of the community.
I would like to know whether the global application context is supported in Real Application Clusters. I found that it doesn't work, but I would like to know whether it SHOULD work. This question was asked earlier in the thread but never commented on by you so far.

Here is the case: We have an application server which has a connection pool. After an application user is authenticated an application session is assigned and some variables are set to some values.
e.g.
procedure init(p_client_session_id in number, p_my_var_value in varchar2) IS
BEGIN
DBMS_SESSION.set_context(namespace => 'MSH_CONTEXT',attribute => 'my_var',value => p_my_var_value
,client_id => p_client_session_id);
END;

Then every other action through the application server first does
DBMS_SESSION.set_identifier(client_id => p_client_session_id);

and then all calls to SYS_CONTEXT('MSH_CONTEXT', 'my_var') return the value set to my_var for the current p_client_session.

This works perfectly with single instance Oracle Database (version 10.2.0.2). But when we tested this on RAC we found out that the values of the variables in the context are accessible only from the instance that set them. This is currently a show stopper for our project being deployed in a RAC environment.
I found no explicit limitation about SYS_CONTEXT and global context not usable on RAC (only about parallel queries).

What are your thoughts on this?

Regards,
Mihail

service_names standby

nn, June 05, 2006 - 2:38 pm UTC

Tom,
Going back a little, I asked about splitting the cluster and you suggested we can do it based on service_names. Thanks!

Now my question is about the standby configuration. I changed service_names for sid=test2 and did not change any TNS entries. Now I am wondering: how can the standby work fine without any TNS changes? It is working fine, but how?

fal_client string TEST_STDBY1
fal_server string TEST_PRIME1, TEST_PRIME2



Tom Kyte
June 05, 2006 - 2:49 pm UTC

you must have set it up so that primary site can see and find the standby site. If you don't know what you did (i question why you would have done that)....

I do these quick - not going to go through your entire configuration here, sorry.

service_names standby

nn, June 05, 2006 - 7:56 pm UTC

It is continue to neeraj thread
Tom,
I have 1 cluster -> 4 nodes configuration on 9i RAC

I want to use 2 node [ a1 and a2 for oltp stuff] and other 2 [a3 and a4 for
batch ] 24*7 .
I dont want to spilt my cluster.
Is there any way can i do this?
Thanks!


Followup:

just have instance a1 and a2 register as service "oltp" and a3 and a4 register
as service "batch"

then connect to oltp or to batch - no need to "split" really.


rac April 20, 2006
Reviewer: neeraj

Do you mean that in init.ora for a1 and a2 I should add a service_name entry and then
use the same service_name in tnsnames.ora?

AUCSX_oltp=
(DESCRIPTION=
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=dbrac01)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=dbrac02)(PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=AUCSX_OLTP)
(FAILOVER_MODE=
(TYPE=SELECT)
(METHOD=BASIC)
)
)
)


Followup:

yup


What changes do we need after the above changes so that the standby keeps working?

Right now I have made the above changes but did not change any tns entry, and the standby is not complaining about anything :( How come?

Tom Kyte
June 06, 2006 - 8:20 am UTC

why *would* it complain?
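Going back to the service registration confirmed above, a minimal sketch of the instance side. AUCSX_BATCH is made up here (only AUCSX_OLTP appears in the thread), and this assumes an spfile; with plain init.ora files you would simply set service_names in each instance's file, as neeraj describes.

-- register the OLTP service only on instances a1 and a2
alter system set service_names = 'AUCSX_OLTP'  scope=both sid='a1';
alter system set service_names = 'AUCSX_OLTP'  scope=both sid='a2';

-- register the (hypothetical) batch service only on instances a3 and a4
alter system set service_names = 'AUCSX_BATCH' scope=both sid='a3';
alter system set service_names = 'AUCSX_BATCH' scope=both sid='a4';

-- verify what each instance is registering
select inst_id, value from gv$parameter where name = 'service_names';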

Interconnect Failure

Yogesh, June 14, 2006 - 12:58 pm UTC

I was trying to find the answer for one scenario in RAC. If we have a two-node cluster and the interconnect fails, what will be the behaviour of RAC? As both nodes now can't communicate, will there be a problem when each of them tries to become the master?

Advisable to run 10g RAC and non-RAC 9ir2 databases on same box?

AR, July 31, 2006 - 12:24 pm UTC

Tom,
Is it advisable to run a 10gR2 RAC database along with a bunch of non-RAC 9iR2 databases on the same box? The 10gR2 RAC database would of course span multiple physical nodes.

Thanks.

Tom Kyte
July 31, 2006 - 1:44 pm UTC

It is not advisable to run more than one instance of Oracle on a box in real life! Forget the versions/features.

One box = One instance.

RAC is all about scaling out and availability. By having more than one instance on a given host, you blow both of those concepts out of the water.



10g RAC and non-RAC 9ir2 databases on same box?

AR, July 31, 2006 - 2:15 pm UTC

Tom,
Thank you. Can you elaborate a bit?

>> It is not advisable to run more than one instance of Oracle on a box in real life! Forget the versions/features.

I don't understand. Realistically, most customers cannot afford a separate box for every non-RAC instance or database -- right?? They tend to club multiple instances on a single box.

>> One box = One instance.

Specifically for RAC? or does that extend to non-RAC databases too?

>> RAC is all about scaling out and availability. By having more than one instance on a given host, you blow both of those concepts out of the water.

Yes. I can see how RAC scalability might be an issue by having more than one instance on a box. Why would RAC availability be impacted by adding non-RAC databases to one of the nodes? (Even if you were doing something on the non-RAC databases that requires taking NodeA down; Node B/C would still be available to the RAC database)

I am not questioning the veracity of what you're saying. Just trying to understand where you're coming from :).

Thank you for your time.

Tom Kyte
July 31, 2006 - 2:49 pm UTC

it is called consolidation, you can do more on a machine with a single instance than you can with two, or three, or four. That is - if you cannot afford a separate box for every instance - YOU CANNOT AFFORD MORE THAN A SINGLE INSTANCE on a box.

You can do more, with less, assuming you have a single instance.


Specifically for any configuration of Oracle.


Availability is a many sided figure. One of the sides is "I have control over my resources". If you have two instances of Oracle running on a single machine, you cannot say from the perspective of EITHER instance that you have control over your resources. They compete (without mercy) for CPU, memory and IO. The instance running 9i can effectively wipe out your 10g RAC node's ability to do *anything*. And there is nothing the 10g instance can do about it.



RAC Preperation

Yoav, August 03, 2006 - 8:49 am UTC

Hi Tom,
We have no practical experience with RAC.
We are planning to create a new database with RAC 10gR2.
We already have 2 RedHat 4 servers on 380g4 hardware.
We have been told that the shared disks can be connected to the storage only by NFS.
Is it possible for RAC to work over NFS?

As a DBA, what do I need to ask the system administrator for regarding disk preparation, connectivity, networking and any other configuration?

Regards



Tom Kyte
August 03, 2006 - 9:44 am UTC

as long as it is a certified NFS configuration (eg: a netapp filer for example)

Please utilize support to discover certified configurations - they are really pretty good at that.

nmgzw

nmgzw, August 14, 2006 - 2:07 am UTC

We have installed Oracle 10g RAC. We use RAID 5 as our storage and raw devices for the Oracle Clusterware files and datafiles.
Some engineers say we don't need to create more than one OCR file or voting disk, because we would not get user read/write errors on raw devices and only need to guard against hardware errors.

So my question is: do we need to create more than one controlfile, or more than one logfile member in every log group?

To explain more clearly, for example:

method 1 (multiplexed):
CONTROLFILE: control01.ctl, control02.ctl
INSTANCE 1 (REDO GROUPS):
  GROUP 1: redo_1_11, redo_1_12
  GROUP 2: redo_1_21, redo_1_22
  GROUP 3: redo_1_31, redo_1_32
INSTANCE 2 (REDO GROUPS):
  GROUP 1: redo_2_11, redo_2_12
  GROUP 2: redo_2_21, redo_2_22
  GROUP 3: redo_2_31, redo_2_32

method 2 (single members):
CONTROLFILE: control01.ctl
INSTANCE 1 (REDO GROUPS):
  GROUP 1: redo_1_11
  GROUP 2: redo_1_21
  GROUP 3: redo_1_31
INSTANCE 2 (REDO GROUPS):
  GROUP 1: redo_2_11
  GROUP 2: redo_2_21
  GROUP 3: redo_2_31

Which method do you advise?

Thanks in advance!!!!!

Tom Kyte
August 14, 2006 - 10:48 am UTC

what kind of magic disk do you have that is impervious to all errors? I want some of that stuff.

raid 5 fails
raid 5 is slow too but
raid 5 fails

and remember, disks are manufactured in "lots" (groups). If you rip open your cabinet of disks you might even find sequential serial numbers on the disks.

and defects happen in "lots" (groups). Funny how fast a raid 5 array can fall apart with simultaneous failures.

nmgzw

A reader, August 16, 2006 - 8:55 pm UTC

Sorry Tom, I am always so late to your fast response!
You are a kind man.

Because our funds are limited, we assume that our storage will not easily get corrupted.
Under that condition, which of the methods I mentioned above would you prefer?

Thank you very much.

Tom Kyte
August 16, 2006 - 8:59 pm UTC

if funds are "limited" (disk is so much cheaper than you or I these days), then multiplex the heck out of everything - on separate physical devices and cross your fingers that nothing ever goes wrong. Because this probably means there are no funds for "testing", "practicing", or "training" - so if something breaks.....

order to start crs and database

A reader, August 18, 2006 - 3:22 pm UTC

First, this is a great site.

We're running Oracle 10g R2 RAC on Linux. We have 2 rc scripts to start the cluster daemons and the Oracle database at machine reboot. They are:

lrwxrwxrwx 1 root root 16 Jun 2 11:22 S75oracle -> ../init.d/oracle
lrwxrwxrwx 1 root root 20 Nov 11 2005 S96init.crs -> /etc/init.d/init.crs

My understanding is that the script "S75oracle" will be run before "S96init.crs". Is that correct?

If it is, do you think that we should move the scripts so that "oracle" will be run AFTER "init.crs", for example, "S98oracle"? Thanks.


Tom Kyte
August 18, 2006 - 4:31 pm UTC

S75 runs before S96 - yes

I use S99 for my Oracle stuff, you want it last - after all of the system services have started.

crs errors

A reader, August 18, 2006 - 7:51 pm UTC

Thanks for your quick response to my previous question. Your site is much better than Oracle E-support.

I got one more question. There's an error message in $ORA_CRS_HOME/log/phlnu45/alertphlnu45.log where phlnu45 is a node name:

"[cssd(6633)]CRS-1606:CSSD Insufficient voting files available [0 of 1]. Details in /opt/app/oracle/product/10.2.0/crs/log/phlnu45/cssd/ocssd.log.
2006-08-17 18:07:29.700
[cssd(6853)]CRS-1605:CSSD voting file is online: /dev/sdh1. Details in /opt/app/oracle/product/10.2.0/crs/log/phlnu45/cssd/ocssd.log.
2006-08-17 18:07:32.881
[cssd(6853)]CRS-1601:CSSD Reconfiguration complete. Active nodes are phlnu45 .
"
I couldn't find any info at Oracle's metalink website.

My question is:

What does "CRS-1606:CSSD Insufficient voting files available" mean? Does it mean that there's not enough number of voting files? Or, there's not enough disk space for the voting files? What should we do about it?

Thanks as always.

Tom Kyte
August 19, 2006 - 4:37 am UTC

well - I'd peek in ocssd.log to see if there are more details, but this typically means there was a problem accessing that particular file.

If the ocssd.log doesn't shed any light, I'd suggest filing a tar (not just searching metalink)

IP address on RAC

A reader, August 26, 2006 - 9:25 pm UTC

Sorry to bother you again. Below is a list of info about IP address on one of nodes.

______________________
-bash-3.00$ oifcfg getif
eth2 10.4.1.0 global cluster_interconnect
eth4 168.162.125.64 global public

-bash-3.00$ oifcfg iflist
eth2 10.4.1.0
eth4 168.162.125.64
eth5 10.129.0.0

-bash-3.00$ more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain phlnn45 localhost
168.162.125.73 phlnu45.oamp.sgns.net phlnu45.sgns.net phlnu45
168.162.125.87 phlnu45_v.oamp.sgns.net phlnu45_v.sgns.net phlnu45_v
10.4.1.1 phlnu45_d.oamp.sgns.net phlnu45_d.sgns.net phlnu45_d
______________________

From the command "oifcfg", it seems that the interconnect's ip address is 10.4.1.0. But this ip address doesn't exist in the /etc/hosts file. Instead, there's another similar ip address 10.4.1.1 in this file. We're going to add a second node to our existing single-node RAC env. The current database version is 10.2.0.1.0.

So my questions are:

1. Is the IP address for interconnect 10.4.1.0?

2. If yes to the first question, should it be recorded in the /etc/hosts file?

3. What's the ip address 10.4.1.1 used for?

4. Should the ip address for the interconnect on the new node be the same as the one on the first node?

5. Do you think if it's better to apply the latest patchset to the existing node before a new node is added or to apply it after the new node is added?

6. Can the Oracle software be "pushed" over to the new node or a new installation of Oracle software needs to be completed on the new node?

Thanks.
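This is not an answer to all six questions, but for questions 1-3 a quick check is possible: in oifcfg output, 10.4.1.0 is the interconnect subnet, while 10.4.1.1 is this node's address on that subnet. A minimal sketch to confirm which address the instance is actually using (GV$CLUSTER_INTERCONNECTS exists in 10g):

select inst_id, name, ip_address, is_public, source
  from gv$cluster_interconnects;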

another question!!!

nmgzw, August 27, 2006 - 9:19 pm UTC

First,thanks you tom!!

We have upgraded our database from a 9i single instance to 10g RAC. Our application is client/server with many, many clients, and the original service_name "ORAL3" is specified as the database connect string in the application. I don't want to update every single client's "tnsnames.ora" -- is there another good way to implement this?

To be clear:
The original "tnsnames.ora" on the client:

TNSNAMES.ORA:
ORAL3 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.250.14)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = oral3)
)
)


But now, because we are using RAC, the "tnsnames.ora" file has to become:
oral3 =
(DESCRIPTION=
(LOAD_BALANCE=on)
(FAILOVER=on)
(ADDRESS= (PROTOCOL=tcp) (HOST=10.1.250.13) (PORT=1521))
(ADDRESS= (PROTOCOL=tcp) (HOST=10.1.250.14) (PORT=1521))
(CONNECT_DATA=
(SERVICE_NAME=oral3)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic)
(RETRIES=20)
(DELAY=15)
)
)
)


Tom Kyte
August 27, 2006 - 9:35 pm UTC

if each client is using their OWN tnsnames.ora file, not sure what else you can do?

short of implementing a centralized naming schema, like ldap, where things like this would be manageable?

nmgzw

A reader, August 28, 2006 - 10:09 am UTC

Thank you very much, Tom!
My knowledge is limited. Would you advise us to implement LDAP to replace the configuration on many, many clients?

Tom Kyte
August 28, 2006 - 11:14 am UTC

it would be an "enterprise" decision - something your company would make as a strategic direction.

DFS lock handle very high

A reader, September 10, 2006 - 7:58 am UTC

Hi Tom,

part of my statspack report: the interval between snaps is 20 minutes and the total duration is 36 hours

                                                            Avg
                                      Total Wait    wait      Waits
Event                            Waits    Timeouts   Time (s)  (ms)     /txn
---------------------------- ---------- ---------- ---------- ------ --------
DFS lock handle                  42,001      1,805      1,007     25      0.2
log file parallel write         400,714          0        486      1      2.0
log file sync                   215,324          0        481      2      1.0
control file sequential read    28,987           0        198      1      1.3
global cache null to x          136,224         34        163      1      0.6
global cache cr request         101,703        679        127      1      0.5

Could you please tell me what the DFS lock handle event is and how I could reduce it? What could be the reason for it being the highest wait?

thanks

Tom Kyte
September 10, 2006 - 9:39 am UTC

are we looking at a 36 hour statspack?


</code> http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm#sthref4392 <code>

versions and "what we do" sort of information would be sort of relevant...

more info

A reader, September 10, 2006 - 2:43 pm UTC

Hi Tom,
sorry for the lack of information

I am using 9i RAC (9.2.0.7)

The duration between snaps is 10 minutes.
The report was created with start snap 1 and end snap 216
(the whole interval was 36 hours).

I read the link about the DFS lock handle, but I still don't know how to get at the real reason for this wait and how to reduce it. Will you please help me with that?

thanks

Tom Kyte
September 10, 2006 - 4:53 pm UTC

36 hours is, well, to put it bluntly, just about 35.5 hours too long.

You sort of need to tell us "what you do".

given that you had 36 hours * 60 minutes * 60 seconds = 129,600 seconds, I for one am not overly concerned here.

proper statspack interval

A reader, September 11, 2006 - 1:54 pm UTC

Hi Tom,

regarding the DFS lock handle wait, do you mean that I am analyzing too long an interval in my statspack report?

Is there a proper interval to analyze within statspack?

I have read on your site that snaps should be separated by 10-30 minutes to give accurate data,
but how many snaps is it proper to include within the statspack report?

If I am looking to analyze the workload for a complete day, could you give me the right way to do it (number of snaps and interval between each two snaps)?

thanks

Tom Kyte
September 11, 2006 - 1:59 pm UTC

you are doing a report that covers 36 hours of time.

you have hundreds of thousands of seconds in 36 hours of time.
having 1,000 seconds of wait is pretty tiny, boring, not significant

when measured over 36 hours.


You took snaps on the right interval.

But you chose two snapshots so far apart (ignoring all of the ones in the middle) as to make the report useless basically.

You want a report covering maybe 15 minutes during peak processing.

We only need two snapshots, taken within a short period of time of each other.
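A minimal sketch of that, assuming statspack is installed and you are connected as PERFSTAT (the snap ids are made up):

-- take a snapshot at the start of the ~15 minute peak window
exec statspack.snap;

-- ... 15 minutes of peak processing later, take the closing snapshot ...
exec statspack.snap;

-- then run the report and give it just those two adjacent snap ids,
-- e.g. begin_snap = 210 and end_snap = 211 when prompted
@?/rdbms/admin/spreport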

that's more clear

A reader, September 11, 2006 - 2:34 pm UTC

Hi Tom,
thanks for your great help.
Does Oracle ignore all the snaps between the first and last snap?
I thought Oracle would take into consideration all the snaps available between the first and last snap; if Oracle doesn't take those snaps into consideration, that means I am wasting my resources :(

What is your recommendation if you aren't exactly sure about the time of the heavy load?
What is the longest interval that can still give a useful statspack report -- should I also keep the snaps 15 minutes apart?

thanks


Tom Kyte
September 11, 2006 - 2:59 pm UTC

no, a spreport just takes

a) the values from the last snap you specify
b) subtracts from them the FIRST snap you specified.

for all intents and purposes all of the "interior snapshots" didn't even have to take place, the results would be identical.

You sort of know about when your peak time is. If you don't, ask those that do.

15 minutes would be a start, yes.

Trace 10046 for DFS lock handle

A reader, September 11, 2006 - 4:18 pm UTC

Hi Tom,
Do you think enabling a 10046 trace for a typical user session could help in determining the lock that is causing the DFS lock handle wait?

thanks

Tom Kyte
September 11, 2006 - 4:53 pm UTC

I'd rather see if it is even worth bothering about first.

Average wait of 27 seconds per hour. Or less than half a second per minute.

add node and db/instance status?

a reader, October 07, 2006 - 6:31 pm UTC

Hi, Tom:

Thanks for your advice for the readers' questions.

Currently I have a one-instance RAC database running on one node. I'm planning to add a second node/instance to this existing one. My questions are:

1). Do I need to shutdown the existing database/instance before running the addNode.sh command?

2). We're using raw devices for the voting disk and OCR files. What should the permissions on these 2 files be before and after a node is added?

Tom Kyte
October 07, 2006 - 7:09 pm UTC

my question must be answered first:

is this a TEST system that it doesn't really matter WHAT happens to it?

RAC response.

reader, October 09, 2006 - 9:48 am UTC

To your question on my questions just above: This one-instance RAC database is already in PRODUCTION.

Tom Kyte
October 09, 2006 - 10:10 am UTC

so, get your test system IN LINE and work over there please. Won't say anything else unless and until you do :)



Add another database to a 2-node RAC env

reader, October 13, 2006 - 2:33 pm UTC

This site is very helpful. Below are my questions:

We're considering adding another RAC database to an existing 2-instance RAC environment. I think the 2 RAC databases will compete for resources, especially the interconnect. So from a performance point of view, the existing database will probably see some degradation.

1. Do you agree with my point?

2. What would you recommend if we use this approach?

3. Do you think other options are available and better?


Thanks.


Tom Kyte
October 13, 2006 - 7:16 pm UTC

you will have to clarify your terms here.

are you really talking about adding another DATABASE or a third node to an existing cluster?

if you are talking about adding another database to an existing two node cluster - one word for you:

NO

don't do it. You already have a database, just create a schema and be done with it.

Is it possible to shrink the voting file

reader, October 27, 2006 - 10:41 pm UTC

We're keeping the voting disk on a raw device. But it was originally configured too big (8 GB) and was not mirrored. We're trying to add 2 more regular-sized (200-500 MB) voting disks.

1. Do you think it is feasible to add these 2 smaller voting files?
2. Is it possible to shrink our current big voting file?

cluster name

reader, December 04, 2006 - 12:29 pm UTC

In 10.2.0.2 RAC, is there a command to find the name of the cluster? Thanks.

Re: ORA-12505,TNS :listener does not currently know SID given in connect descriptor

Vinay, January 23, 2007 - 12:43 pm UTC

Hi TOM,

I have set up a 2-node RAC on Win2k. I created a RAC db "test", which created two instances "test1" and "test2". Now when I try to connect using "test", I'm getting the error "ORA-12505, TNS: listener does not currently know of SID given in connect descriptor". But if I try to connect using an instance name, it works fine. Then again, if I use any one instance and that particular node goes down, FAILOVER is not happening... I am new to this, please help me.

Finding Instance Number and Server Information

A reader, February 03, 2007 - 6:44 am UTC

Hi Tom,

We have a RAC setup in a production environment with 4 instances.
Currently we do basic audit and error logging.
But now we also have to log instance information and server information while doing the audit and error logging.

Can you tell me how I can get the instance information and server information while doing audit and error logging?

Regards
Tom Kyte
February 03, 2007 - 7:51 pm UTC

define "instance and server information" first.

Parameter level changes that should be considered from non - RAC to RAC

Karteek, February 04, 2007 - 1:50 pm UTC

Hi Tom,

Earlier our production ran on non-RAC, and it ran beautifully. To increase availability with the growing business and database size we recently migrated to a RAC DB with 3 instances. Suddenly performance has fallen dramatically. One of the main tables in our DB has around 350 million records. Just as an example, a query to fetch the record count on this table used to take around 15-20 minutes on non-RAC, but now the same query is taking more than 50 minutes. There are many cases like this -- updates, deletions, index rebuilds, analyzing, etc. -- almost everything in RAC is relatively slower. Could you please suggest what things should be checked or taken care of on RAC to improve its performance?

FYI, even the hardware configuration of the Sun Solaris servers running this RAC DB is more powerful than our previous server running the non-RAC db.

Many Thanks TOm!!
Karteek

Follow up request - Parameter level changes that should be considered from non - RAC to RAC

Karteek, February 10, 2007 - 9:26 am UTC

Hi Tom,

I've been eagerly waiting for your valuable suggestions on our issue with RAC migration (above post - Parameter level changes that should be considered from non - RAC to RAC).

Thank you very much Tom. Hoping that you would consider this request...

- Karteek
Tom Kyte
February 12, 2007 - 10:15 am UTC

none, start with defaults - big fan of "let them mostly default" - memory parameters being the biggest caveat there.

You might find you need a bit more SGA as there are more bytes to manage in memory now.

nodeapps

A reader, February 18, 2007 - 9:45 am UTC

In 10g, when we start the database everybody recommends starting nodeapps first on all the nodes and then starting the database. Some even mentioned that if we don't, it might crash the database.

What exactly does nodeapps do (apart from starting the listener)? Does Oracle recommend starting nodeapps before ASM and the database?

Thanks,
Tom Kyte
February 18, 2007 - 9:49 am UTC


The order of starting machines would not "crash" the database.

but if you are in a clustered environment, it would be useful to have the cluster services running, yes.

nodeapps and cluster services,

A reader, February 19, 2007 - 11:51 am UTC

Does nodeapps start the cluster services? If the cluster services are already running (in case they are independent of nodeapps), then I can start the database first and then nodeapps, since it would not "crash" my database.

I would prefer starting nodeapps first, but sometimes I may accidentally start the database first and then start nodeapps. I am trying to learn what the worst case scenario is in that case.

thanks,

Tom Kyte
February 19, 2007 - 11:57 am UTC

worst case is you would get an error message saying something that needs to be running isn't.

asm

A reader, March 02, 2007 - 10:03 am UTC

We have a 4-node RAC database on 10.1.0.5. It uses ASM to store the Oracle-related files.

Apparently we have a datafile that was created outside of ASM and placed in one of the archive log destinations. This was done due to insufficient space in ASM.

Is this OK? Will Oracle generate any errors in the future because of this? In general, is this practice advisable?

Thanks,


Tom Kyte
March 04, 2007 - 6:09 pm UTC

it is fine, you can use asm, cooked files, raw files, ocfs files, whatever - all at the same time.

Advisable? Hard to say - doubt I would want a datafile in my archive log destination.

GV$SESSION

A reader, April 16, 2007 - 12:27 pm UTC

Tom,
Any input on what the FAILED_OVER column in gv$session means?
Many thanks
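While waiting for Tom: FAILED_OVER simply shows YES once a session has been failed over by TAF to a surviving instance. A quick way to look at it alongside the related columns (nothing here is specific to any one system):

select inst_id, sid, username,
       failover_type,    -- NONE / SESSION / SELECT, from the connect string
       failover_method,  -- NONE / BASIC / PRECONNECT
       failed_over       -- YES once the session has actually failed over
  from gv$session
 where username is not null;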

A reader, April 18, 2007 - 11:49 am UTC

Thanks Tom

Alexander the ok, April 30, 2007 - 1:10 pm UTC

Hi Tom,

For RAC, would you say not having a private network for your interconnects is "not recommended" or is it just "wrong"?
Tom Kyte
April 30, 2007 - 3:34 pm UTC

"wrong"

RAC is for two things:

a) availability
b) scalability

if you share your network with everything else - the network for cross node chatter - your ability to have a specified level of availability (bandwidth) is near zero. Just takes one person downloading the latest patch to flood the network.

impacts scaling as well - as you scale up, you add more users, more users more traffic, if you share the road....

If you are buying into RAC, do not shortchange the rest of the implementation - it goes back to the "penny wise, pound foolish" quote.

RAC

sagar, May 01, 2007 - 11:12 am UTC

Hello Tom,
I am implementing a 2-node RAC with Oracle 10g on Linux RHEL4 using iSCSI and ASM.

The ASM instance is not starting on node 2. It gives an error message while creating the database: "PRKS-1009 Failed to start ASM instance +ASM2 on node LINUX2 CRS-0215: Could not start resource 'ora.linux2.ASM2.asm'"

Looking forward to your help


Tom Kyte
May 01, 2007 - 11:32 am UTC

DB failover

superman, May 08, 2007 - 7:16 am UTC

We have implemented 10g RAC and it works fine for all the test cases except one. In this test case, the power cord of one of the machines was pulled out manually. The DB failover to the other instance did not happen instantly; instead it took 10 minutes to fail over to the second instance. Can you please help us with any parameter setting which will reduce this failover time for this specific case? (The failover works fine when the instance is stopped, but not with the power failure.)
Tom Kyte
May 10, 2007 - 8:30 pm UTC

fast=true

that should do it.

hmmm, sort of vague inputs - they get sort of vague outputs.

A reader, May 09, 2007 - 2:19 pm UTC

It's a bug in 10gR2 (versions 2 and 3).

How to analyze performance on a RAC env. in order to decide to add a new node.

A reader, May 10, 2007 - 4:42 pm UTC

Hi Tom,

I am trying to analyze performance on my current 2-node RAC cluster and see if it would be beneficial to add another node to the cluster to support the online app. Could you please tell me which internal views hold information about interconnect traffic, hot tables/blocks, how blocks are being pinged and which nodes are requesting them, etc.? I just want to make sure that adding a node will not degrade performance due to synchronization between the nodes because blocks are being modified in different instances of the RAC.
I would appreciate any help you can give me on this.
Thanks
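A few starting points while waiting for an answer - these are just the standard views, not a complete method, and the 'gc%' statistic names are the 10g ones (check v$statname on your release):

-- interconnect block traffic per instance
select inst_id, name, value
  from gv$sysstat
 where name in ('gc cr blocks received', 'gc current blocks received')
 order by inst_id, name;

-- segments with the most cross-instance activity ("hot" objects)
select inst_id, owner, object_name, statistic_name, value
  from gv$segment_statistics
 where statistic_name like 'gc%'
   and value > 0
 order by value desc;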

ASM on RAC

Suvendu, May 26, 2007 - 9:19 am UTC

Hi Tom,

Here I have couple of question ASM on RAC:
# 1:
SQL> alter system set log_archive_format='arc%t_%t.arc' scope=spfile;

System altered.

SQL> startup force
ORA-19905: log_archive_format must contain %s, %t and %r

And I don't have a PFILE, nor am I able to touch the ASM file system where the spfile is. So how can I bring back my database?

#2 :

a) I have an ASM diskgroup with a size of 5 GB and created a database of 3 GB.
b) Dropped the database using DBCA.
c) Started creating a new database using DBCA, and now the ASM diskgroup free space is 2 GB instead of 5 GB (or a little less due to disk member stamping).

So, how can I reclaim the 3 GB in ASM that was used by the earlier database?

I hope I'm clear on the above two questions.

Thanks,
Suvendu


Tom Kyte
May 26, 2007 - 12:20 pm UTC

[tkyte@desktop ~]$ oerr ora 19905
19905, 00000, "log_archive_format must contain %%s, %%t and %%r"
// *Cause: log_archive_format is missing a mandatory format element.
// Starting with Oracle 10i, archived log file names must contain each
// of the elements %s(sequence), %t(thread), and %r(resetlogs id) to
// ensure that all archived log file names are unique.
// *Action: Add the missing format elements to log_archive_format.



i don't know what you mean by "bring back my database" in this context, I don't see you missing one.



you would remove the datafiles you created for the other database, just like a file system would have you do.

http://docs.oracle.com/docs/cd/B19306_01/server.102/b14231/storeman.htm#sthref1766
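A minimal sketch, run against the ASM instance, of finding and then removing files left behind by a dropped database (the diskgroup and file names below are made up):

-- see what is still stored in the diskgroups
select g.name diskgroup, a.name alias_name, f.type, f.bytes
  from v$asm_diskgroup g, v$asm_alias a, v$asm_file f
 where a.group_number = g.group_number
   and f.group_number = a.group_number
   and f.file_number  = a.file_number
 order by g.name, a.name;

-- then drop whatever belonged to the old database, e.g.
alter diskgroup DATA drop file '+DATA/olddb/datafile/users.267.123456789';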

SPFILE on RAC

Suvendu, May 28, 2007 - 5:02 am UTC

Thanks as always for your quick response and for answering my question #2.
But I have a consideration related to ASM behavior; it does not seem logically correct for Oracle.

- When I drop the database through DBCA or with a DROP DATABASE statement, it removes the datafiles, redo logs and controlfiles.
- Even DROP TABLESPACE tbs1 INCLUDING CONTENTS AND DATAFILES does, which also makes sense.

But when I drop a database, why does ASM keep the datafiles, which are not required any more?

Just elaborating on question #1 as follows:

- The SPFILE resides on an ASM disk (file system), where it is not accessible to modify the parameter back to a correct value.
- And there is no backup of the SPFILE as a PFILE. I think the instance could be started with a new pfile with default values on the node. (I have not tested this.)
- Now, due to this error, the instance is down.

So, my question is: how can I start up the database, given that the SPFILE, which is on the ASM file system, contains an invalid parameter?

I hope I'm clear now.

Thank you for spending your precious time on me.

Regards,
Suvendu

Tom Kyte
May 30, 2007 - 9:45 am UTC

dbca must NOT be erasing the files from ASM therefore, you said "dbca is removing datafiles", but apparently - it is not.


ASM isn't keeping anything, ASM hasn't been told to remove something.

did you read the links, if you execute those commands in ASM, you'll see the files, if the files are there, they are there and they were NOT erased yet.

You can always startup with a pfile that includes your spfile and overrides anything you want:

sys%ORA10GR2> show parameter log_archive_form

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_format                   string      x%s_%r_%t.arc
sys%ORA10GR2> alter system set log_archive_format='arc%t_%t.arc' scope=spfile;

System altered.

sys%ORA10GR2> startup force;
ORA-19905: log_archive_format must contain %s, %t and %r
sys%ORA10GR2> !cat tmp.pfile
spfile=/home/ora10gr2/dbs/spfileora10gr2.ora
log_archive_format=x%s_%r_%t.arc

sys%ORA10GR2> startup force pfile=tmp.pfile;

ORACLE instance started.

Total System Global Area  608174080 bytes
Fixed Size                  1263200 bytes
Variable Size             352324000 bytes
Database Buffers          247463936 bytes
Redo Buffers                7122944 bytes

Have you tried this

Alan, May 29, 2007 - 5:08 pm UTC

Have your tried:

startup mount
alter system set log_archive_format='something_valid' scope=spfile;
startup force
Tom Kyte
May 30, 2007 - 11:10 am UTC

did you?

you cannot startup mount with that invalid parameter set.


The question of installing RAC on RHEL 4

yyanthony, May 30, 2007 - 11:57 am UTC

Dear Sir, I have a question to ask you!
Last night I installed Oracle 10g RAC on RHEL4, but I encountered a problem. It said "Loading module 'oracleasm': Unable to load module 'oracleasm' [FAILED]", but I had already installed the ASM module before, and I had configured it correctly! What can I do? Please give me an answer. Thank you!
Tom Kyte
May 30, 2007 - 3:59 pm UTC

probably, oracleasm was already loaded and the loading of it again (installing something you already installed) failed.

but please - utilize support for installation and configuration - I don't have access to every OS, every version, every release, etc....

The OCFS2 question

yyanthony, May 30, 2007 - 12:09 pm UTC

Dear Sir, how do you do!
I have a question to ask you. Last night I installed Oracle 10g R2 on RHEL4 and I encountered a problem: "Could not start cluster stack. This must be resolved before any OCFS2 filesystem can be mounted". My friend asked me to use the root user to execute ocfs2console, but when I did that the problem still occurred. What can I do now? Thank you!

load balancing

David, June 05, 2007 - 5:16 am UTC

Hi Tom,
we are running 10g Release 2, 2-node RAC. The 1st node runs with 4 CPUs, the 2nd node with 8 CPUs, both at the same processor speed. A service has been created with both as preferred nodes. During normal operation I query the database with the SQL below:

SQL> select INST_ID, count(*) from gv$session group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
         1        118
         2        117

I ran uptime in a while loop at the OS level to capture the load on both servers:

1st node load:

4:55pm up 33 day(s), 2:34, 3 users, load average: 2.94, 2.93, 3.09
4:55pm up 33 day(s), 2:34, 3 users, load average: 2.96, 2.93, 3.09
4:55pm up 33 day(s), 2:34, 3 users, load average: 3.01, 2.95, 3.09
4:55pm up 33 day(s), 2:34, 3 users, load average: 3.07, 2.96, 3.09
4:55pm up 33 day(s), 2:34, 3 users, load average: 3.11, 2.96, 3.09
4:55pm up 33 day(s), 2:35, 3 users, load average: 3.11, 2.97, 3.10


2nd node load:

4:55pm up 33 day(s), 2:30, 2 users, load average: 16.34, 11.92, 10.46
4:55pm up 33 day(s), 2:30, 2 users, load average: 16.32, 11.96, 10.48
4:55pm up 33 day(s), 2:30, 2 users, load average: 16.21, 11.98, 10.49
4:55pm up 33 day(s), 2:30, 2 users, load average: 16.05, 11.99, 10.50
4:55pm up 33 day(s), 2:30, 2 users, load average: 15.95, 12.01, 10.51


It looks like connection load balancing is in place; this can be seen from the almost equal number of sessions connected to both nodes (node 1 - 118, node 2 - 117). Does Oracle load balancing take server load into account when performing the load balancing? Or have we perhaps misconfigured something?

Thanks


Tom Kyte
June 06, 2007 - 12:58 pm UTC

depends on how you configured load balancing.

using mis-matched server sizes like that is a bad idea, you'll likely have to go into the world of setting parameters to 'correct for that', but I'd recommend "homogeneous" as the path of least resistance.

Node Eviction - Oracle Clusterware and Vendor Clusterware

sriram vrinda, July 11, 2007 - 6:41 am UTC

Hi tom,

After reading through the oracle clusterware documentation and some third party clusterware documentation i framed up some notes which is referenced below.

Please can you comment on where my understanding is correct and where it is flawed.



A node is evicted from the cluster after it kills itself because it is not able to service the applications. This generally happens during the communication failure between the instances, when the instance is not able to send heartbeat information to the control file and various other reasons.

This occurs when the members of the database cluster group are not able to communicate with each other which may include

LMON process of one instance not able to communicate with the LMON of the other instance

Communications error in the cluster

One of the RAC processes spins.

Failure to issue a heartbeat to the control file due to instance death

A split brain has been detected.



If communication is lost at the cluster layer (for example, network cables are pulled), the cluster software may also perform node evictions in the event of a cluster split-brain. Oracle will detect a possible split-brain by a method known as Instance Membership Recovery where in the healthy instance will remove the problem instance(s) from the cluster.
When the problem is detected, the instances 'race' to get a lock on the control file Results Record for updating, implemented in much the same way a vendor clusterware polls the quorum disk to decide on the cluster owner. The instance that obtains the lock tallies the votes of the instances to decide membership and in turn waits for the cluster software to resolve the split-brain by calling the required fencing library, i.e. when using Oracle Clusterware it invokes OPROCD, and when a vendor clusterware is present it calls the node membership service routine named libskgxn.so. If the cluster software does not resolve the split-brain within a specified interval, Oracle proceeds with instance evictions.
But a potential catastrophe could arise if pending I/O from a dead instance is flushed to the I/O subsystem. This is where I/O fencing plays a crucial part. I/O fencing is performed at the cluster layer, not at the RAC layer. When using Oracle Clusterware, Oracle relies on OPROCD, and the way it works is to reboot the node; once it has gone through the reboot process, it simply rejoins the cluster, and the cluster can decide whether to accept it or not.
However, this method of rebooting the system still doesn't prevent the pending I/O that is being flushed to the I/O subsystem, and it raises the question of what happens if the node is so unhealthy that it cannot successfully execute the reboot command. It should be noted that the voting disk that is part of Oracle Clusterware is a backup communication mechanism that allows the CSS daemons to negotiate which sub-cluster will survive. The usefulness of this can be appreciated when we consider more than two nodes, where it prefers the greater number of nodes to take over the cluster; again, in this approach the pending I/O (if any) from the sub-cluster that is thrown out is not blocked.

So, when a communication failure occurs between the instances, or when an instance is not able to issue the heartbeat information to the voting disk, IMR is triggered. Without IMR (there is no mechanism to detect the failures), the entire cluster could hang.
But having said this, the RAC locks are still not released even when there is a split brain; Oracle relies on these locks being reassigned only when the remote node is down, and this may lead to a cluster hang, as there is a strong dependency between the Oracle Clusterware processes and the RAC background processes. This tells us that the fencing mechanism provided by Oracle Clusterware relies strongly on RAC-specific locks.


Under such circumstances, where Oracle is not able to do the actual fencing operations, the vendor clusterware (if present) would do this with known technologies like SCSI reservations or fabric fencing rather than trying to kill the node. If a vendor clusterware is present, then when Oracle makes a call to the fence service module, the clusterware does this operation independently of the RAC processes, leading to a smooth cleanup of the faulty nodes.

Thanks in advance

Node Eviction

sriram, July 12, 2007 - 12:18 pm UTC

Hi Tom,

Waiting for your expert comments on the above post

"Node Eviction - Oracle Clusterware and Vendor Clusterware "

Thanks in advance

Alexander the ok, August 28, 2007 - 2:25 pm UTC

Tom,

We're building a new environment with RAC. The architect wants only our test and production instances to be RAC, not development. Do you think this is a bad idea, to have developers creating stuff against a non RAC instance?

What types of issues can you see happening, so I can bring them up. Thank you.
Tom Kyte
September 04, 2007 - 3:32 pm UTC

it would be a mistake.

there are some features that work a little different on RAC than non-RAC; not many but there are some (dbms_pipe for example, global application contexts, there could be XA issues and the like)

I'd rather develop in the same environment I'm going to deploy in.

A reader, September 05, 2007 - 1:22 am UTC

Make sure you perform your TAF tests as well; that again depends on what type of JDBC driver is being used (if Java is being used). Without a RAC dev environment it would be hard to develop that code.

Alexander the ok, September 18, 2007 - 1:18 pm UTC

The reader above brings up a very good point.

Tom, RAC does not port connections over seamlessly from one instance to the other if one goes down out of the box does it?

Can this be accomplished at the instance level without any application configurations using something like TAF?
Tom Kyte
September 18, 2007 - 4:51 pm UTC

if you are in a transaction and the node you are connected to fails, you will receive an error and assuming automatic reconnecting is configured - you'll be reconnected to the surviving instance and ready to retry the operation.

if you are in a READ ONLY situation, TAF can fail you over and restart the query and pick up where it left off.

Configuring LUN's for ASM

Irfan, September 24, 2007 - 3:53 pm UTC

Hi Tom,
We are currently installing 10g RAC R2 with ASM on HP/EMC. If I have 900 GB of data, is it advisable to make one LUN of 900 GB or to split it into around 5-10 LUNs? And similarly, how many LUNs for the flash recovery area, archive logs and RMAN backups? If I split (or don't split), will I gain or lose performance? I was searching for docs on OTN lately, but was unable to find any.

Thanks

multiple databases in a single RAC cluster

sg, November 05, 2007 - 10:41 am UTC

Hi Tom,

we are planning to create an environment 3-Node Oracle RAC cluster with 4 databases
like

Four databases
--------------

- DB1
- DB2
- DB3
- DB4

Node1: (DB1 with instance DB11) + (DB2 with instance DB21) + (DB3 with instance DB31) + (DB4 with instance DB41)
Node2: (DB1 with instance DB12) + (DB2 with instance DB22) + (DB3 with instance DB32) + (DB4 with instance DB42)
Node3: (DB1 with instance DB13) + (DB2 with instance DB23) + (DB3 with instance DB33) + (DB4 with instance DB43)

And we are planning to use ASM.

- Do you suggest this type of implementation?
- If so, will there be any performance degradation?
- Can I use a single Oracle home for multiple instances on a node, or separate homes?
- Can you explain how the load balancing feature helps in this kind of environment?

thank you

Tom Kyte
November 06, 2007 - 9:17 am UTC

ask this stuff in ONE PLACE only please.

24/7 availability of INTERCONNECT

Rijesh, December 14, 2007 - 4:50 am UTC

Hi,

We have recently migrated to RAC and I have the following questions related to the interconnect high availability feature, which is little discussed in the above thread.

1).Does oracle have any inbuilt mechanism for High availability of the Interconnect in case of any failure?

2).Can you also please clarify the following doubt regarding interconnect failure in RAC? If there is an interconnect failure in an RAC with two nodes, which one of the following is/are true?
a).The first node [the node that started first while starting the RAC] alone will survive and the other node will die.
b). Both the instances will die/shutdown?
c). Both the instances will hang?

Thanks.
Tom Kyte
December 14, 2007 - 1:16 pm UTC

1) the interconnect is hardware, we support you using redundant hardware so in a way "sure".

2) there is no such thing really as the "first node". Node a could start, then b, then c, then a shuts down, then a is restarted. Which is the first node?

the quorum disk makes it so we don't get a split brain
http://docs.oracle.com/docs/cd/B19306_01/rac.102/b14197/rac_glossary.htm#CHDBEGFH
(two nodes each thinking they are the only game in town)
http://docs.oracle.com/docs/cd/B19306_01/rac.102/b14197/admcon.htm#CJHJGGGF

Why oracle utilities are so lame?

Luigi Sandon, January 14, 2008 - 4:54 am UTC

I was trying to add a third node instance to a two-node RAC database (Win x86-64, 10.2.0.3 + patch 14 applied), using DBCA. Well, DBCA failed shamelessly, complaining about a "missing thread number", and then left the cluster with a non-working instance (time to tell the developers that setup *must be transactional* too; clean up the mess you made!).
Before it showed a dialog with just the number "0" and an OK button: debugging code left there???
It looks like DBCA never added the new instance data to the spfile; is this the missing thread number it complains about?
When I tried to remove the instance using this time the EM console - because I didn't trust DBCA anymore, it wasn't able to get past the "Cluster credential" page. Press next and it returns there without showing any error. Great tool too!

I'll file a service request, but Oracle should be ashamed that a high-end, expensive database is sold with utilities written by incompetent programmers, and with little or no QA at all. In this area Microsoft is really ahead of you.

And my management don't like to waste time for silly errors in applications after spending a nice bunch of money.

10g RAC

A reader, January 28, 2008 - 9:09 pm UTC

Hi,
Can you please tell me: to install RAC, do all nodes need to have the exact same hardware architecture and components?

I know the OS needs to be the same, but what about the CPU configuration, for instance?

Can you please elaborate, and please point to the documentation on this particular issue.

Thanks
Tom Kyte
January 29, 2008 - 6:52 am UTC

the machines need be similar - same sort of hardware architecture (all x86 or all sparc or all risc) and the same OS - but can be different "sizes", different # of cpus, different amount of ram, whatever.

Archive Logs on RAC

Reader, January 29, 2008 - 12:07 pm UTC

Hi Tom,
In a situation where you want to Data Guard a RAC environment with 2 databases [2 instances on each node (1 for each database)], is it technically sound to Data Guard the two databases to one set of shared disks? Secondly, which set of archive logs do you use -- those of the instances or those of the databases?
Tom Kyte
January 30, 2008 - 9:28 am UTC

.... In a situation where you want to Dataguard a RAC enviroment with 2 databases[2
instances in each node(1 for each database)] ...

there is NO SUCH SITUATION ever.

Do not even think about doing that - especially with RAC. ONE INSTANCE PER HOST SYSTEM - period, no more (and if you ask me, no less :) every machine should run oracle)


You do not "dataguard to one set of shared disks" - that doesn't make sense.

You use dataguard to ship the redo information over a network to yet another machine, with it's own set of disks and apply it over there - on a machine in another location.

All of the redo generated for a database would have to be shipped over of course, all of it. Redo is "a database thing", generated by instances - but "a database thing"


DO NOT run more than one instance on a given machine ESPECIALLY with rac - you are just asking for trouble.

Archive Logs on RAC

Reader, February 01, 2008 - 7:36 am UTC

Thank you so much Tom; coming from you, that this is not sensible has made my day. This is what I was asked to implement as a consultant. The databases are not very busy and they work fine; why they need RAC in the first place beats me. The previous consultant who wrote the document has convinced them that the best DR solution is RAC and Data Guard combined, which is OK if you can afford the hardware. They have 2 databases and only two nice robust servers with two sets of shared disks. I think Data Guarding the 2 databases will be enough, along with an RMAN backup solution. What do you think?

Thanks
Tom Kyte
February 04, 2008 - 3:35 pm UTC

RAC is to prevent unplanned downtime - to make it so you don't have a single point of failure in a "room"

RAC is not DR
Dataguard is DR


RAC is about not losing your primary instance because a power supply fails.


So, tell me - do you want to have to FAILOVER if you lose a power supply?

RAC and dataguard are complementary - not mutually exclusive and not always found together.

clusterware 11g

A reader, February 01, 2008 - 11:26 am UTC

hi Tom,
I would appreciate it if you could give me some advice on this subject:
- our RAC environment is a 3-node (Linux 5.1) cluster which will be running RDBMS 10.2.0.3
Our big question is: should we use the 11g Clusterware (11.1.0.6)? In our view, the pros are a more robust version and more tools, but the cons are more bugs (?) and more room for error (since we will have to switch back and forth between 10g and 11g, a mistake might cause some corruption).
Could you please shed some of your brilliant ideas?
Thank you, cmai.

RAC on Windows 2003

A.K, February 04, 2008 - 2:15 pm UTC

Hi Tom,

We have implemented a 2-node RAC on Windows 2003. We are running 2 database instances on each node. Is it not advisable to run 2 instances on a single node, and why?


Tom Kyte
February 04, 2008 - 5:03 pm UTC

... Is it not advisable to run 2 instances on single node
and why. ...


why did you get RAC? What was your goal?

It has to be one of

a) higher availability.
b) scalability.


Say it was a). You have defeated that, you have a complex (more complex) environment - with potential conflicts. Instance "A" can easily launch a denial of service attack on instance "B" (the instances know nothing of each other). You have a fragile system - it would be infinitely less so if you just gave machine one to instance A and machine two to instance B.

Say it was b). See the paragraph above. Instance "A" can easily launch a denial of service attack on instance "B" . You are consuming more resources using two instances (two sgas caching lots of stuff).

It just doesn't even make sense to run two instances on a machine like that, ESPECIALLY with RAC

A.K, February 04, 2008 - 2:52 pm UTC

This site is very helpful. Below are my questions:

We're considering about adding another RAC database to a existing 2-instance RAC env, I think the 2
RAC databases will compete for resouces, especially the interconnect communications. So from the
performance point of view, the existing database will perhaps have some degradation.

1. Do you agree with my point?

2. How would you recommend if we use this approach?

3. Do you think other options are vailable and better?


Thanks.




Followup October 13, 2006 - 7pm US/Eastern:

you will have to clarify your terms here.

are you really talking about adding another DATABASE or a third node to an existing cluster?

if you are talking about adding another database to an existing two node cluster - one word for
you:

NO

don't do it. You already have a database, just create a schema and be done with it.


You suggested to just create a schema. What do you mean by a schema, and how does it help? Can you please explain in detail what you mean by "just create a schema and be done with it"?

Thank you.
Tom Kyte
February 04, 2008 - 5:04 pm UTC

Instance "A" can easily launch a denial of service attack on instance "B", yes.

You already know my stance on this, I've stated it very clearly.

One instance per host.

RAC with OS Cluster

parlee, February 21, 2008 - 10:48 am UTC

We have a new project starting soon. The proposal is to use a cluster with two nodes. The databases would exist on SAN disks. I am new to RAC.

1. I would like to understand if we need to use Oracle RAC? I understand that the nodes in a cluster will be in sync with each other, so the second node can act as a fail safe node. Please correct me if my understanding is wrong and if Oracle RAC would add any value as a high availability solution in this case?

2. The proposal is also to use four virtual machines on each node, with four database instances in total (one instance per virtual machine). I gather from the discussions on your site that you recommend using one database per node. Would you advise whether we should be using only one instance per node OR one instance per virtual machine, and why?

Thanks

Tom Kyte
February 21, 2008 - 11:30 am UTC

suggest you read this:

http://docs.oracle.com/docs/cd/B28359_01/rac.111/b28252/intro_tdprc.htm#sthref6


I recommend one instance PER HOST.

your virtual machines are hosts, they are the machine.

how does rac recover?

A reader, March 19, 2008 - 4:01 am UTC

Hello,Tom

Could you give me some comments on this problem?
http://www.orafaq.com/forum/m/307492/109897/#msg_307492

I am sorry if you cannot open the link; I will paste the question here.

Regards
Alan
Tom Kyte
March 24, 2008 - 9:32 am UTC

seems it was already answered - you assume that only one redo file at a time would be read - there is no reason why they would not all be read and processed at the same time - getting the right stuff from the right file at the right time.

it is sort of like a sort that writes out multiple pieces of data - using parallel query. Say you use parallel 4 on a

select * from t order by x;


Now, that table t is broken up into 4 chunks and 4 processes read their 1/4th of the table. Each process sorts 1/4th of the data. But we need a fully sorted result set right - so the coordinator would ask each of the 4 for their "first record" - their lowest record. The coordinator just needs to return the lowest of those four and ask the process behind the one it picked as the lowest to provide its next lowest record - a MERGE of sorted data.


Just consider redo like sorted data. you could easily "sort them all out" right.

so, if multiple redo logs are needed to be read at the same time, no reason they cannot be.
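A loose SQL analogy of that merge - this is only an illustration of combining already-ordered streams, not how recovery is actually implemented:

with thread1 as (select 101 scn, 'change A' what from dual union all
                 select 105 scn, 'change C' what from dual),
     thread2 as (select 103 scn, 'change B' what from dual union all
                 select 107 scn, 'change D' what from dual)
select scn, what
  from (select * from thread1 union all select * from thread2)
 order by scn;   -- one SCN-ordered stream built from two ordered inputs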

setup a poor man's RAC?

Julie, April 06, 2008 - 4:05 pm UTC

Tom,
I have never had a chance to set up RAC, though I have about 10 years of Oracle DBA experience. Now I am on the job market, and found that a lot of companies ask for RAC experience.

I want to learn RAC on my own. I wonder if there is any way to set up a simulated "RAC" on one Windows XP PC? I have another PC running Vista. Both are home versions. Can I set up RAC on one of the PCs, or both?


Tom Kyte
April 07, 2008 - 9:25 am UTC

if you have virtualization software - like vmware for example - yes.

http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php

that was linux, but the setup would be similar in vista since you'd be running linux in the vm's (you don't want to have to buy multiple vista's...)

rac performance

Doron, April 12, 2008 - 5:56 am UTC

Hi Tom,

First, thanks for your last 2 books; I bought and enjoyed them very much.
As I understand it, RAC's main goal is to provide automatic redundancy using TAF and other technology, but I would like to understand the performance impact.
As I know it, OPS in 8i already introduced cache fusion to solve the readers/writers problem (although not the writers/writers problem) and to address true/false ping problems using read consistency; but OPS added locking and lock conversion at the block level, so that every access to a block had to go to the IDLM.
How is RAC locking, with its GCS and GES, different from OPS, such that it could prevent RAC from being slower than a single-node instance? Does the Global Resource Directory
tie specific objects to being served by the same instance, and therefore lower the lock conversion between instances in RAC?
Can you recommend a book on RAC internals?

Thanks,
Doron



Tom Kyte
April 13, 2008 - 8:33 am UTC

8i was as read consistent and all as 9i - the difference between 8i and 9i was basically that pinging was done via DISK in 8i and before, via the high speed interconnect in 9i and above.

One instance per host

Radhak, May 09, 2008 - 4:55 pm UTC

This thread has lots of good information. Thank you.

Can you please elaborate your comments
>>>>
Instance "A" can easily launch a denial of service attack on instance "B", yes.
You already know my stance on this, I've stated it very clearly.
One instance per host. <<<

Do you allude to the performance impact that instance A can have on instance B given the same set of resources?

Thanks
Radhak


Tom Kyte
May 12, 2008 - 12:05 pm UTC

I'm not alluding - I'm stating as loud as I can - instance A can consume 100% of the CPU, 100% of the RAM and 100% of the IO capabilities of the machine, making it so that instance B cannot perform any work.

And someone responsible for instance B would not be able to do a thing about it.

Reponse to above

Alexander the ok, May 12, 2008 - 12:45 pm UTC

If we are talking RAC, aren't performance concerns very secondary to the fact that RAC can just reboot a node if it detects a problem?

That would be nuts putting another instance on there.

How to correlate Redo Thread and INSTANCE_NUMBER in RAC

Shivdeep Modi, July 04, 2008 - 5:42 am UTC

Hi,

Generally the redo thread of an instance in a RAC database is the same as the instance_number; this simplifies administration. But what if the redo thread is not the same as the instance number?
How do we correlate redo thread(s) to their corresponding instance(s)?

Regards,
Shivdeep

How to correlate Redo Thread and INSTANCE_NUMBER in RAC

Shivdeep Modi, July 04, 2008 - 9:08 am UTC

Ok, I've managed to find this information.
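For other readers: the mapping is exposed in the data dictionary, so a quick query such as one of the following shows it directly:

-- which redo thread each running instance is using
select inst_id, instance_number, instance_name, thread#
  from gv$instance
 order by thread#;

-- or, from the database's point of view, which instance owns each thread
select thread#, instance, status, enabled
  from v$thread;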

FAILOVER_MODE

Chaman, October 13, 2008 - 12:49 pm UTC

Tom,
Following is our tns entry for an Oracle RAC DB on Oracle 10g:

TNS ENTRY 1
-----------

SRMINST_1 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = some ip address 1)(PORT = 61900))
(ADDRESS = (PROTOCOL = TCP)(HOST = some ip address 2)(PORT = 61900))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = srm922)
(FAILOVER_MODE =
(TYPE = SESSION)
(METHOD = BASIC)
(RETRIES = 20)
(DELAY = 3)
)
)
)


TNS ENTRY 2
-----------

SRMINST_2 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = some ip address 1)(PORT = 61900))
(ADDRESS = (PROTOCOL = TCP)(HOST = some ip address 2)(PORT = 61900))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = srm922_S1)
(FAILOVER_MODE =
(TYPE = SESSION)
(METHOD = BASIC)
(RETRIES = 20)
(DELAY = 3)
)
)
)

SELECT NAME, VALUE
  FROM GV$PARAMETER
 WHERE NAME = 'service_names';

NAME           VALUE
service_names  SRM922, SRM922_S1, SRM922_S2
service_names  SRM922, SRM922_S1

Question:

When I connect to the DB using either tns entry 1 or tns entry 2, it always connects to service SRM922.

Since I have not configured the RAC DB, I am not aware whether there are any other parameters to be set. Moreover, I have seen that the number of sessions and the number of open cursors are higher on one of the instances.

Thanks

Tom Kyte
October 14, 2008 - 4:58 pm UTC

I don't know what you mean - since srm922 and srm922_s1 basically are equivalent - both instances are 'that service' 'all of the time'
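
If you want to verify from inside a session which service and instance it actually landed on, here is a minimal sketch (assuming 10g or later, where these USERENV attributes exist):

select sys_context( 'userenv', 'service_name' )  service_name,
       sys_context( 'userenv', 'instance_name' ) instance_name
  from dual;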

RAC on extended distance clusters

Sumon, October 19, 2008 - 1:51 am UTC

Hi Tom,
Thanks for your support.
I have some questions about RAC.
Our office is going to take a decision about RAC.
Our servers are at different geographical locations, e.g.:
Location    Server Nos.  Server Type
--------    -----------  --------------------------------
New York    2            IBM Xeon 64bit 8x2 cpu, 64Gb RAM
London      1            IBM Xeon 64bit 8x2 cpu, 64Gb RAM
Singapore   1            IBM Xeon 64bit 8x2 cpu, 64Gb RAM
Malaysia    1            IBM Xeon 64bit 8x2 cpu, 64Gb RAM
Bangladesh  1            IBM Xeon 64bit 2x2 cpu, 2Gb RAM

The servers will run OLTP applications, e.g. VoIP telephony, over RHEL 4/5. So:
1. Is it possible to set up RAC over a LONG geographical distance?
2. If so, would it be wise?
3. If so, would we get good performance?
4. If so, what minimum dedicated network bandwidth would be required?
5. Which brand/type of shared storage do you suggest?
6. Do you have any suggestion that does not involve RAC?

Please give me some brief advice.
Best regards.




Tom Kyte
October 21, 2008 - 11:18 am UTC

1) no, not at all. latency would kill you and - this is big - it would not alleviate the single point of failure. One of the sites would have the disk. that site goes away - well, there you go, gone.

RAC is for a room.
Dataguard for failover is for distance.


RAC is entirely inappropriate to even consider given your setup and distances.



Now, that said, I would be looking at a single server that everyone uses. We have offices ourselves in all of those locations (and more) and we run a single data center in the US for them - their email, HR, financial, file server services - etc - all run in a centralized location (with data guard for failover). You only want a single site. Really. Distributed complexity should be avoided - it is not easier, it is not faster, it is just more complex and harder to maintain.

Real Application Cluster

Lalu, November 04, 2008 - 11:06 am UTC

Hi Tom,

The clusterware (CRS, CSS, EVMD) and ASM, along with the database, come up automatically during a server reboot/startup.

Can we override this normal behaviour?
Is it good to do so?

Thanks.

Tom Kyte
November 11, 2008 - 12:15 pm UTC

you control that via the services

http://docs.oracle.com/docs/cd/B28359_01/rac.111/b28255/intro.htm#CJHHEDIC

how you disable those varies by OS of course - you could turn them off, but I would ask "why"

If this is a database machine, you would want it to be ready to be a database machine upon a restart.
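
For completeness, a minimal sketch of how that is typically done on Linux/Unix with 10gR2/11g clusterware (crsctl must be run as root; the database and node names are illustrative):

# keep the whole stack from starting at boot (crsctl enable crs reverses it)
crsctl disable crs

# or leave CRS starting at boot but stop it auto-starting specific resources
srvctl disable database -d orcl
srvctl disable asm -n node1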


10g RAC

A reader, November 19, 2008 - 2:31 pm UTC

We have 10g RAC in place with 6 nodes and 10 applications;
4 of them use node1 and node2 and the other 6 apps use the other nodes.

We have 30 people in our group, and we all have our own personal schema, and we do a lot of imports into our own schemas (no exports - exports are provided by the DBA every Friday). We use the import/export utility, not Data Pump.

On and off we end up in a situation where we have to "RESTART" our Oracle RAC instances (node1 and node2), where our own schemas are located. The DBA's opinion is that it is due to our heavy import activity (each schema is approximately 1 GB in size), which puts some global lock on some service that Oracle cannot recover from, so we have to restart.

I searched many places on the internet for similar issues but found no information, and it is hard to digest what the DBA is saying (we need to restart an Oracle instance if we do a lot of parallel imports? and if there is some global lock due to this activity, Oracle cannot recover from it?).

Can you let us know if this is a known issue with Oracle?
This happens often enough (and the dev team cannot use the db when it does) that it has made our management re-think whether we should use Oracle.



Tom Kyte
November 24, 2008 - 10:57 am UTC

I am of the opinion that the restart is not necessary.

You give us nothing to work with, no one has performed (apparently, you don't provide anything) any diagnostic work at all to see what the underlying cause might be.


I think your DBA's should be a little more "inquisitive" - don't you?


it could be as simple as an enqueue lock (that is, an open sqlplus session that did something and someone walked away) and nothing as fancy as a RAC issue at all. But we have no idea because no information is provided. You make it sound all scary - but there is probably a pretty simple explanation - before they panic and shutdown abort - maybe, they should spend a couple of minutes to see what is causing the hold up?
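
A minimal sketch of the kind of first look I mean - who is blocked and by whom - assuming 10gR2, where V$SESSION exposes the blocking columns:

select inst_id, sid, serial#, username, event,
       blocking_instance, blocking_session, seconds_in_wait
  from gv$session
 where blocking_session is not null;

Rows coming back during the 'hang' point to an ordinary blocker/blockee situation - something to investigate and release, not something to restart an instance over.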

RAC and multiple versions of Oracle,

A reader, November 25, 2008 - 6:32 pm UTC

Can we have multiple oracle versions (both clusterware and database) on the same server (servers in case of RAC)?

Also, if I have two different databases on the same cluster, can one of them use ASM storage and the other use file system storage?

With regard to ASM storage, if an ASM diskgroup has multiple LUNs associated with it, what happens if one of the LUNs breaks down? Will we lose data, or will ASM rearrange the data on the other available LUNs?

Thanks,

Tom Kyte
November 28, 2008 - 3:56 pm UTC

you could, but no one would ever do that in their right mind - short of a very early stage testing system.



A single database can use ASM, normal file systems, raw, ocfs, etc etc - simultaneously - you can do that across two different databases of course as well, but even with a single database (because you'd have to be doing it wrong to have more than one instance on a production machine) you can use asm and normal files.


if you were using ASM redundancy - we would re-mirror the damaged information (assuming sufficient free space elsewhere)
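
For context, that re-mirroring only applies if the diskgroup was created with ASM redundancy (NORMAL or HIGH) rather than EXTERNAL. A minimal sketch, run in the ASM instance, with illustrative disk paths:

create diskgroup data normal redundancy
  failgroup fg1 disk '/dev/raw/raw1'
  failgroup fg2 disk '/dev/raw/raw2';

With EXTERNAL redundancy, surviving a failed LUN is entirely up to the storage array.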


Real Application Clusters

lalu, December 01, 2008 - 9:33 am UTC

Hi Tom,

I am trying to set up a test RAC env using solaris+raw devices + asm.

I have installed clusterware and oracle RDBMS for 10.2.0.4 and it went on fine.

While trying to create the database or ASM instance using dbca, I am getting errors like:
Oracle user equivalency not set....
PRKC-1030:error checking accessibility for miboss10, null.

But the pre and post cluvfy checks for user equivalency PASSED.
oracle@miboss10> olsnodes
miboss10
miboss1
oracle@miboss10>

The userid/group id are the same on both nodes.

Thanks.
lalu.
Tom Kyte
December 01, 2008 - 10:47 am UTC

see note 473287.1

10g RAC

A reader, December 01, 2008 - 4:34 pm UTC

Yes, I understand I did not give you much info, as I don't have much myself. I wanted your opinion on it.

You are correct - I remember our DBA once saying we had enqueue locks, and da..da..da... But the bottom line is: when I issue drop and create schema, even if I have another session connected to the same schema using, let's say, sqlplus - does Oracle choke? Why would it affect any other schema, and why would you have to restart the server even in the enqueue lock case?


Tom Kyte
December 02, 2008 - 7:25 am UTC

If you have identified "we have enqueue locks" (instead of saying da..da..da.. which implies to me that you "shutdown and stop listening at this point as it is too technical or something"), then you have IDENTIFIED THE PROBLEM.

Someone locked a row
someone else wants that row
Unless and until the first guy gives it up the other guy WAITS and WAITS

Oracle is not choking here, you have a very very simple "blocker/blockee" problem - an APPLICATION DESIGN issue.

Instead of going the da da da route, actually listen to what they are saying and if it isn't making sense, ask them to teach you about it. You sort of need to fix this, this would happen with or without rac.

It is not the app

A reader, December 18, 2008 - 12:04 pm UTC

This is not an app issue - it is single threaded, one operation at a time; this is a day-to-day work issue. In my local environment (i.e. my personal schema) I have a session open and, let's say in the WORST case, I have an uncommitted update statement which has locked a row. Now I go off to a meeting - as it is my LOCAL schema, I am the only one using it - and when I come back, in another session I issue a drop schema statement (do you expect me to remember that I had a lock in another session?). Is that going to wait forever? You've got to have a timeout on this, otherwise it is not practical to wait 3 hrs to drop the schema and still have it not done!

It is not a question of java or database developer - if you are just waiting and waiting on it, it is just not helping! How long would you wait?
Tom Kyte
December 29, 2008 - 1:56 pm UTC

umm, sorry - but you cannot say "this is NOT an app issue"

it is entirely - completely - without a doubt - an application issue.

...(do you expect me to remember that i had a lock in another
session?) ...


No, that is the job of the database to remember that for you and it DID SO. It is working entirely the way it was designed to work.


and the drop user would not hang forever - it would stop like this:

drop user "OPS$TKYTE" cascade
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified


it will be a function of how long it took to get to THAT TABLE to figure out how long it would take for the drop user to ultimately fail. What if there were 1,000 tables and this table was the last to be dropped. You would wait for 999 tables to be dropped - and then it would fail.


If this is something you need to do over and over - and you think that there is a good chance someone will have an outstanding transaction - you should either

a) query the v$ views (v$transaction, v$lock) before you issue the drop, to see if there ARE ANY outstanding transactions - see the sketch below. It won't assure you of anything obviously, since between the time you query and the time you do the operation - the status could change

b) don't use drop user, use drop <schema object>. But now you have a situation whereby you could end up dropping 99.9% of the schema, but not the last bit.


Probably (a) is all you need (and should have been a rather obvious approach?) But I guess with your "da da da" comment - we know that nothing is going to be obvious in the database, you have decided to "black box it", rather than understand it.
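
A minimal sketch of the kind of check (a) refers to - sessions that currently have an open transaction (the commented username filter is illustrative):

select s.inst_id, s.sid, s.serial#, s.username, s.status, t.start_time
  from gv$session s, gv$transaction t
 where s.taddr   = t.addr
   and s.inst_id = t.inst_id
   -- and s.username = 'SCOTT'
 order by t.start_time;

No rows back means no open transactions at that instant - which, as noted above, can change a moment later.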

Thank you for your reply

A reader, January 02, 2009 - 12:17 pm UTC

Thank you for your reply..

you've explained to me:
"it will be a function of how long it took to get to THAT TABLE to figure out how long it would take for the drop user to ultimately fail. What if there were 1,000 tables and this table was the last to be dropped. You would wait for 999 tables to be dropped - and then it would fail. "

So if I issue a drop user statement and my locked table is, let's say, number 45 out of the 80 tables that I have, the database would actually drop 44 tables physically and then fail? What good would that do? We have already dropped half of the tables.

on understanding end, yes I should understand and I do up to a level, but we have DBA group to menage it, Now a days, softwares provide vast rage of features and their apis, and if you try to lean everything on multiple softwares it will be too much time, you rather focus on your particular expertise.. no ?

Tom Kyte
January 05, 2009 - 10:37 am UTC

... and if you try to lean everything on multiple
softwares it will be too much time, you rather focus on your particular
expertise.. no ?
...

and are you trying to code database applications? Your expertise is building database applications

don't you think you should, well, understand what you are doing, how the database works?

NO?

I'll never get it.


You, as a database developer, actually do need to understand how this database thing works. What a transaction is (and isn't), why they are relevant, what could cause a bottleneck in an application that uses the database, how to correctly implement things.


I'll never get it. Yes, becoming proficient at what you do takes time. It takes a lot of time and work. But until you do it, you'll never be successful, you'll constantly be in the state of mind:

"but it should work the way I dream it should work, I don't have time to learn how it really works, someone else will fix my problems for me"


Life doesn't work that way.

To the reader above....

A reader, January 03, 2009 - 4:27 pm UTC


But that is the whole point. You are dropping the user anyway. What does it matter if it has dropped 45, 999, 999999 or 99 billion tables? The point is, if it gets stuck owing to the reasons stated, you just kill the blocking session and continue with the drop.

If you are hinting that a user should be "FORCE" dropped even if someone is holding a lock on a table, I think that would have some implications.


Thank you!

A reader, January 05, 2009 - 5:15 pm UTC

"but it should work the way I dream it should work, I don't have time to learn how it really works, someone else will fix my problems for me"

No, it should not work the way I dream it; obviously it "might" work the way you dreamed it :)

I actually think it should work in a way which makes logical and practical sense. I am not questioning the way Oracle makes decisions, because all software houses have their own hacks.

But this is a very basic issue, I believe: if you have 100 tables and you are physically dropping 44 tables and on the 45th table Oracle realizes "oh, it has some lock, I should stop" - leaving the schema in a non-usable state, not completing the task, and leaving us not knowing where we are - instead of providing a definite state.

There is a vast community that uses Oracle, and I believe they understand/learn from documentation and experience. So I guess there is no right or wrong thing; it is the way it works, and people learn it. That's why I think it should be logical, not the dream!


XRAC

Jose Caodaglio, January 19, 2009 - 8:59 pm UTC

Hi Tom

Is there a guideline to determine the network bandwidth for an X-RAC installation (say 7 miles between two sites)?

I think that latency is a constant requirement, but bandwidth is all about the volume of inserts/updates per second. Is this correct? Is there a methodology to calculate the bandwidth?

Thank you very much
Tom Kyte
January 19, 2009 - 9:55 pm UTC

it is almost entirely latency based. If you can afford a low latency network over 7 miles, you probably have infinite bandwidth.

I've *never* understood the desire for this configuration, never - I'll try again - why is this considered a good idea?

X-RAC

Jose Caodaglio, January 20, 2009 - 7:22 pm UTC

I totally agree with you. But I think we have some tangible issues:
1- The company can load balance between the two sites, so they can use all the server power available
2- They have a very critical application (brokerage - hundreds/thousands of investments per minute). If they were without their systems for ten minutes it would probably be terrible
3- They can pay for the solution and have the desire to have the most modern architecture.
4- It's a muscle demonstration from Oracle ;-)

What is the problem with this solution besides the complexity?
A few more questions please

Do I need to have the third site (voting)?

Using ASM means that I have an active-active storage architecture. Does that mean that a query against site B would be served entirely by that site, and that this would be a huge advantage over SRDF with an active-passive storage architecture (where only one storage array would serve both sites, increasing the use of network resources)?

thanks


Tom Kyte
January 20, 2009 - 7:35 pm UTC

but you can do all of that in a room.

what advantage do you see with miles between? given that data guard can definitely be way less than 10 minutes to come to the rescue.

I'm sort of serious, I haven't "gotten it" yet. What is the advantage?

XRAC

Jose Caodaglio, January 20, 2009 - 7:50 pm UTC

You will have to have the backup site anyway. If you can use the servers at that site, you will have roughly twice the computing power available.

With a Data Guard solution, you can't use the standby site for production (reporting is OK; to use Streams you would have to handle inconsistencies).
Tom Kyte
January 21, 2009 - 8:07 am UTC


With dataguard the configuration would be:

rac in a room - for unplanned outage minimization.
data guard over a wide area - for disaster recovery (stretch clusters do not make a disaster recovery site)

stretch clusters - please - again - (since you still need data guard if you are serious about availability) - what do they buy you?


XRAC

A reader, January 21, 2009 - 9:11 pm UTC

I have read about this, Tom.

I can have two servers at both sites with XRAC at the same time that one site ships logs to the other. This means XRAC for active-active purposes, to maximize the use of the available computing power, at the same time that I have Data Guard working.

Having this in mind, I am not sure why XRAC would not be used as a DR strategy. I would like to understand it. But if this is correct, I can have Data Guard working together with XRAC.
Tom Kyte
January 22, 2009 - 8:42 am UTC

Your disaster recovery site has to be much further away than a stretch cluster can support.

Think hurricane, earthquake, power failure (Ontario, US Northeast and Midwest August 2003) and so on.

And - for availability purposes - you don't want multiple databases running on that machine. You have conflicting goals there. You can have failover machines in each of the stretch data centers, but they should be separate and distinct from production. And - they are just not far enough away from each other to provide for disaster recovery.

XRAC

A reader, January 22, 2009 - 9:28 pm UTC

The necessary distance is calculated considering the potential risks, Tom.

In Brazil we do not have hurricanes, earthquakes, etc.
So, the distance is perfect for us, and actually the customer has an OK from their auditors.

Could you answer these questions?

Using ASM means that I have an active-active storage architecture. Does that mean that a query
against site B would be served entirely by that site, and that this would be a huge advantage
over SRDF with an active-passive storage architecture (where only one storage array would serve both sites,
increasing the use of network resources)?

thanks
Tom Kyte
January 23, 2009 - 8:28 am UTC

so, in 2002 and 2003 when you had a 6.9 and 7.1 magnitude quake??

If you have 11g, the answer is "yes, you can do that"
In 10g, that was not there.

It doesn't matter if the customer has an OK from someone, especially someone referred to as "auditing". Doesn't mean that group knows what they are talking about.


do not do data guard onto the same machines running the databases. You'd actually need to have each instance ship to two data guard instances.

But one wonders "why, why would you do that" if you do it on the same hardware (the database is *already there*)

XRAC

A reader, January 23, 2009 - 10:03 am UTC

Tom, for any practice view, we do not have earthquakes. Brazil is bigger than the continental part of the US so is reasonable that you would find some isolated events in google.
I never heard about a destroyed building by an earthquake here and almost all companies have this as a valid premise.
So, not all techniques are wrong when propose a 20KM distance backup site.

I told you the reasons
I can use all computer power available, I have a very fast recover capabilities, I can pay less because they already have the network infrastructure

Another question

The third site is mandatory for voting disk proposes?


thanks





Tom Kyte
January 23, 2009 - 10:29 am UTC

but if you run data guard on the same machines, all you will have done is waste resources (those machines already have the database). So, data guard on those machines = not a good idea.

And if you want to imagine that you're immune to natural or artificial disasters - for what you describe as a really important "has to be up" system, ok - not my decision obviously.

see
http://kevinclosson.wordpress.com/2008/02/06/buiding-a-stretch-real-application-clusters-configuration-get-the-crs-voting-disk-setup-right/

X-RAC

A reader, January 23, 2009 - 7:11 pm UTC

I agree with you that DG is not necessary with XRAC. I was considering this because at first I had understood that there were tech reasons not to consider XRAC as a DR strategy.

Thanks for your support


Tom Kyte
January 24, 2009 - 1:17 pm UTC

there are tech reasons to consider xrac as not being a DR strategy.

Mostly because: IT ISN'T

It is a solution to help eliminate unplanned downtime. That is what RAC is - to help eliminate unplanned downtime (you will still have downtime).

Dataguard over long distances is for Disaster recovery.
RAC helps reduce unplanned downtime.

What is the case for XRAC

Arup Nanda, January 25, 2009 - 6:09 am UTC

Tom,

I always wondered about the appeal extended RACs have on some people. Can you think of a use case where extended RAC would be a good fit (over data guard)?

Some facts:

(1) XRACs have more distance between them
(2) more distance means more latency (and, possibly less bandwidth)
(3) therefore less interconnect efficiency
(4) [possibly] leading to more gc related waits
(5) where would the OCR and voting disks reside, which site? and what about the datafiles?
(6) when one instance crashes and the other reads the redo log of the threads of other instance, the process will be slow
(7) the database is still one - single point of failure.
(8) the SAN has to be mirrored locally; fibers don't go too far

If the objective is to run some apps locally on two different locations, it makes no sense to do that since the database has to be at one location. One set of apps will always be at a remote location. In that case what's wrong with simply running the app remotely and connecting over TCP/IP to the database?

If the objective is DR; it does not accomplish that at all. Where do you keep the DB - at the DR site or the main site?

So I really couldn't figure out that perfect scenario where extended RACs would shine bright. Your thoughts will be highly appreciated.
Tom Kyte
January 25, 2009 - 11:16 am UTC

Arup -

I stopped reading after I read this:

... Can you
think of a use case where extended RAC would be a good fit (over data guard)? ...

The answer to that is quite simply NO.

Data guard = disaster recovery, large distance separation
RAC (extended or otherwise) = reduced unplanned downtime (note: I said unplanned as a qualifier there, you WILL have downtime). RAC is always small distance, insufficient for disaster recovery. Regardless of how immune you think you are.

http://lenovoblogs.com/insidethebox/?p=195
<quote>
If you're a smallish business (below 250 seats) you probably don't have an offsite backup strategy. According to the National Archives & Records Administration in Washington, D.C., 93% of companies that lost their data center for 10 days or more due to a disaster filed for bankruptcy within one year of the disaster. Of those companies, 50% filed for bankruptcy immediately
</quote>



all valid points above, but for #5

o asm would mirror to two different sites (that is a performance issue, yes)
o read the kevin c link above regarding the voting stuff

Thanks for confirming

Arup Nanda, January 25, 2009 - 11:39 am UTC

I never really understood the obsession with extended RACs and was always alarmed to see people substituting DG with it. Thanks for confirming, Tom.

Ricardingo, February 14, 2009 - 6:56 am UTC

Hi
I'm a little bit confused about Past Images in a RAC environment.

"When the same dirty block is requested by some other instance, an image of the block is created in the owning instance and then the block is shifted to the requesting instance. This image copy of the block is called a Past Image (PI)."

What is the point of creating a Past Image? I mean, there are already undo blocks for consistent reads...

Tom Kyte
February 16, 2009 - 12:12 pm UTC

and we use undo blocks to create past image blocks - in RAC, in single instance Oracle - in all releases.

We use the undo to create older versions of the data block, that is what we need - the data block - the undo is just the means to the end.

Ricardinho, February 16, 2009 - 5:51 pm UTC

In a RAC environment,
every instance has its own undo and redo log files; however, there is only one temp tablespace, which is shared.

What is the purpose of this, Tom?
Tom Kyte
February 16, 2009 - 6:19 pm UTC

what is the purpose of what?


you share tablespaces too - your data for example....


we do not share redo because those are "streams of information" and a transaction operates on a given node - it need not be shared.

we do not share undo because - similar to redo....


but so what if we share temp? we might actually need it on different nodes (think parallel query across nodes). It does not hurt (your rollback could all be on the same disk regardless of node, it isn't an IO thing, it is a "we don't have to share this" thing)


A reader, February 21, 2009 - 10:32 pm UTC

Hi
What is the difference between using the failover parameter on the client side versus the server side?
Tom Kyte
February 21, 2009 - 11:17 pm UTC

a bit more clarity here - what do you mean exactly - please be precise....



Oracle 10g RAC on solaris 10 using vmware.

Santosh Vijayan, February 26, 2009 - 4:41 am UTC

Hi Tom,

Thanks a lot for your earlier responses. This time I highly hope to get some tips on configuring Oracle 10g RAC on solaris 10 using vmware.

I have a Philips laptop (2 gigahertz, dual core, 4 GB RAM , 350 GB hard disk & Windows vista home). I have installed vmware workstation on laptop. Also managed to install two nodes of solaris 10 x86 Operating system on vmware. On both the nodes oracle 10.2.0.4 has been installed. Network connectivity is working properly between the nodes. I am able to configure dataguard between the nodes and practise failover and switchover.

I want to practise 10g rac on this machine on solaris x86.
Can you guide me on below?

1> Can you please let me know if my hardware meets the requirements?
2> Can you provide any links for 10g RAC installation on solaris x86 using vmware. Or is it the same as any standard installation of RAC on solaris.

3> Can you please provide links for downloading the software for the 10g RAC .. Any 10g version will do?

4> Any further advice or links will be highly helpful.

Thanks a lot

Santosh

Problem with a 10g RAC Env

M, March 11, 2009 - 1:04 pm UTC

Hi Tom,

My production environment runs Oracle 10g RAC on a Solaris box. This was running fine until we started receiving the below error on a daily basis during the execution of the ETL process.

GES: Potential blocker (pid=9000) on resource TM-00029311-00000000;

What I fail to understand is that the ETL process runs 4 times in 24 hrs, but the series of errors is only seen once, during the first ETL run of the working hours. I understand that this would be the main ETL run, which would be processing most of the data in comparison to the other 3 ETLs of the whole day.

While going through a document on Oracle Metalink (Doc ID: 473124.1), I found that the main reason for triggering this error would be a missing index on a foreign key. Please can you shed some light on this error and also a possible way to avoid it.

Thanks
Tom Kyte
March 12, 2009 - 12:24 pm UTC

do you have unindexed foreign keys?

and if so do you

a) delete from the parent
b) update the parent primary key
c) merge into the parent table?
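
If you need to check, a minimal sketch for spotting unindexed foreign keys (simplified - it only considers single-column foreign keys and only looks at the leading column of existing indexes):

select c.owner, c.table_name, cc.column_name, c.constraint_name
  from dba_constraints c, dba_cons_columns cc
 where c.constraint_type  = 'R'
   and cc.owner           = c.owner
   and cc.constraint_name = c.constraint_name
   and not exists ( select null
                      from dba_ind_columns ic
                     where ic.table_owner     = c.owner
                       and ic.table_name      = c.table_name
                       and ic.column_name     = cc.column_name
                       and ic.column_position = 1 );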


Oracle 10g RAC on solaris 10 x-86 on vmware

Santosh Vijayan, March 18, 2009 - 11:00 am UTC

Hi Tom,

I am practising 10g RAC installation on my laptop, wherein I have vmware workstation and solaris-x86 version 10 installed on it.

I am trying to install Oracle 10g clusterware.
I have followed the steps for clusterware installation.
I am facing an error on one of the nodes while running root.sh.

On the first node, the root.sh runs fine.
The following is the log.

bash-3.00# ./root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: xsan001 xsan001-priv xsan001
node 2: xsan002 xsan002-priv xsan002
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/c0d0s4
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
xsan001
CSS is inactive on these nodes.
xsan002
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.

=============================

On the second node, the root.sh gives error.

bash-3.00# ./root.sh
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/product' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: xsan001 xsan001-priv xsan001
node 2: xsan002 xsan002-priv xsan002
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/c0d0s4
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
Failure at final check of Oracle CRS stack.
10

Can you provide some sort of clue as to what could be done to resolve the above error?

Thanks a lot Santosh

A reader

A reader, April 22, 2009 - 7:38 am UTC

I would like to know whether we can have multiple listeners on RAC nodes using totally different networks (separate VIPs). Our requirement is to have different application databases use separate listeners that are not related to each other in terms of network address, port, etc. We are thinking of doing this because of a security requirement.
Or is there any other way to achieve this?

Cheers
Tom Kyte
April 23, 2009 - 12:45 pm UTC

I'm not following you.

Can your listener be on another machine? Yes

Your listener must be able to establish network connectivity with the database server (the server typically has two networks, private for the interconnect and public for everything else)

I don't see the security tie in? But whatever. You can have the listener on another machine, it needs to be able to establish networking with the server, as does the client.

Read only tablespace in RAC Environment

A reader, June 24, 2009 - 2:54 pm UTC

I was told by my fellow DBA that having tables in a read-only tablespace in a RAC environment would yield better performance,
as Oracle would do less work (in cache fusion) transferring blocks.

Is there any Oracle documentation supporting this statement?
Tom Kyte
June 26, 2009 - 10:12 am UTC

nope, but only because it isn't true.

If you don't modify the blocks in a read/write tablespace - we don't have to transfer them all of the time either - it would be the same.


Some people mistakenly believe that read consistency is turned off for read only stuff. It isn't.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:895410916429


but, you know, you can always benchmark.


I would ask DBA for the references - if the references do not have any "quantitative numbers to back up the claims", ignore the references.

RAC and SGA

David, July 06, 2009 - 6:19 pm UTC

We have a two node RAC. On one node we have 8GB memory and on the second node we have 12GB memory. Currently, the SGA size and pga_aggregate_target are configured the same on both instances. Question: if we configure the SGA size and pga_aggregate_target larger on node2 compared to node1, would it affect execution plans for sessions connected to node2, as it has more memory allocated for Oracle to use?

Thanks
Tom Kyte
July 06, 2009 - 8:39 pm UTC

would it - perhaps, it has the possibility to arrive at different plans since the optimizer can and will look at that information.

So yes, a query could come to a different plan if you have different init.ora parameters that the optimizer uses (pga is one of them)
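
A minimal sketch of how to compare the relevant settings across instances (parameter names as in 10g):

select inst_id, name, value
  from gv$parameter
 where name in ( 'pga_aggregate_target', 'sga_target', 'sga_max_size', 'db_cache_size' )
 order by name, inst_id;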


RAC and SGA

David, July 06, 2009 - 9:38 pm UTC

In our environment, the OLTP workload and the dbms jobs that run at night to produce reports are the same on both nodes of our 2 node RAC.
(1) Do you think the best practice in our case is to have the same memory settings for both instances?
(2) Regarding the possibility of different execution plans on the instance that has more memory allocated: are they going to be better or worse?
Is it worth trying in a production environment?
Thanks.
Tom Kyte
July 06, 2009 - 10:32 pm UTC

1) not if the machines actually have different memory resources - not necessarily. Just be aware that different plans *could* potentially result. The answers will be the same but a query might run faster on one node than the other.

2) in theory - it should be better of course :)

A reader, July 08, 2009 - 1:18 pm UTC

We have a Java application running against 10g on Linux (RAC).
What would be the value added by "Oracle Coherence" to this mix?

What are the pros and cons of Oracle Coherence?

Tom Kyte
July 08, 2009 - 3:55 pm UTC

you are asking the wrong person. I do not believe in application data caches.

Alexander, July 08, 2009 - 3:30 pm UTC

Tom,

I apologize in advance for the theoretical question.

I have a 2 node RAC cluster; we just lost a server earlier this morning. Node 2 was active and ok, but for some reason no one could connect to the cluster and hit the active node. I logged on and connected via the listener no problem. When I tried connecting remotely from my desktop, it just didn't do anything - no errors, it would just hang.

So, my question is, is there any kind of server failure that would be so severe that the state of the cluster could not be communicated to the voting disk in time, and that the cluster was confused about what was actually available? It felt like I was being routed to the machine that was down.

As soon as the other machine was brought back up, everything worked fine. That to me rules out any other outside factors like network etc. It seems like something didn't transition during the failover, but I've never seen this before and we've tested RAC failovers many times (by shutting down via the server console).
Tom Kyte
July 08, 2009 - 4:03 pm UTC

... I logged on and connected via the listener no
problem. When I tried connecting remotely from my desktop, it just didn't do
anything, no errors it would hang.
....

sounds like your network configuration is wrong. when you "logged on", I presume you mean "logged onto the active server, node 2" - and there the listener address resolved correctly.

when you attempted from your desktop - you used a different tnsnames.ora file entirely and that one resolved to a listener that was unreachable (and TCP/IP wasn't timing out fast) or a database that was unreachable.

Alexander, July 08, 2009 - 4:32 pm UTC

Once again I think you are right. Do you see anything that could explain what I described:

Server:

OCP25P =
(DESCRIPTION =
(ENABLE = BROKEN)
(ADDRESS_LIST =
(LOAD_BALANCE = YES)
(FAILOVER = YES)
(ADDRESS = (PROTOCOL = TCP)(HOST = x30bpdc-v)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = x30cpdc-v)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = OCP25P.LMIG.COM)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 120)
(DELAY = 5)
)
)
)

Mine:

OCP25P =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = x30bpdc-v.lmig.com)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = x30cpdc-v.lmig.com)(PORT = 1521))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = OCP25P.LMIG.COM)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
Tom Kyte
July 08, 2009 - 7:15 pm UTC

not knowing your network topology, I cannot really say - I don't know what the hosts would resolve to from the server versus your client or how the tcp/ip connections are setup (timeouts and such)

Alexander, July 09, 2009 - 10:45 am UTC

Where is the tcp/ip stuff defined?
Tom Kyte
July 14, 2009 - 2:51 pm UTC

in the operating system configuration - your /etc/hosts, your dns servers, your system configurations.

Reader, July 12, 2009 - 12:55 am UTC

Regarding above:

Just use the same tnsnames entry on your client machine as the one described on the server and try. I don't see FAILOVER=yes on yours; maybe that is why it is not connecting to the available node?
Tom Kyte
July 14, 2009 - 5:27 pm UTC

failover defaults to yes..

Alexander, July 15, 2009 - 4:05 pm UTC

Tom,

I contacted support, they got a RAC expert on the horn for me. Apparently you have to register a local listener specifically to use a vip and not the host name. Did you know that? I didn't ;) I presume there is documentation to explain this? He explained why this is, I had some difficulty understanding it though.
Tom Kyte
July 15, 2009 - 4:47 pm UTC

when you had them on the 'horn'....

did you actually ask them for documentation/note references? reading material to explain to you what they were saying verbally?


Alexander, July 16, 2009 - 12:34 pm UTC

He wrote it up for me on the SR.

I searched the documentation pretty extensively, I didn't see anything that says "you need to set the local_listener parameter or failovers won't work". I have a RAC book that doesn't mention it either. Pretty surprising for such a serious configuration item.

I figure the docs are so vast, it has to be in there somewhere.

Reader, August 06, 2009 - 3:56 pm UTC

Tom

I was running a procedure in a package and got the following error. The procedure has delete and insert statements.

ERROR at line 1:
ORA-25408: can not safely replay call


Can you please let me know what could be the cause for this error?

Oracle Version: 10.2.0.4.0 - Real Application Clusters
Tom Kyte
August 06, 2009 - 4:34 pm UTC

$ oerr ora 25408
25408, 00000, "can not safely replay call"
// *Cause:  The connection was lost while doing this call. It may not be
//          safe to replay it after failover.
// *Action: Check to see if the results of the call have taken place, and then
//          replay it if desired.
//


your node failed.

Reader, August 10, 2009 - 10:47 am UTC

Tom,

I asked the DBA if the node failed. I was told that there was no node failure. Could there be any other reason for this error?
Tom Kyte
August 13, 2009 - 8:42 am UTC

the node failed or you otherwise lost your connection to that node - network error on your side of the network, their side of the network - you lost your connection to that node.

RAC testing

Reader, August 17, 2009 - 7:18 pm UTC

We are upgrading the AIX OS (5.2 to 5.3) and our AIX admins want to test whether RAC can come up without any issue. What they are doing is taking an OS image from the prod server, building a two-node test RAC cluster using that image, and then upgrading the OS and HACMP on the test cluster to the higher version. Assuming I have the RAC RDBMS software installed on the test cluster, is it possible to test whether Oracle RAC can start up on dev servers that do not have any database? (We have no disks attached to the test cluster.) I was thinking that if I do a startup nomount and there is an issue with the OS, HACMP or the RAC setup, then Oracle should report it in the alert log file even in the nomount state. Correct? Thanks.
Tom Kyte
August 24, 2009 - 4:29 pm UTC

you'd be able to see if the instance can start, yes.

but that is all - you'd sort of want to have a database to actually mount, open, and play with - if you really want to test.

The unzip of archive file ./jrepack.zip failed

Tony, November 03, 2009 - 5:54 pm UTC

Kindly help me. I'm trying to install 10g R2 RAC in Red Hat Linux. When I do cluvfy,
I get the below error: [oracle@node1 cluvfy]$ ./runcluvfy.sh stage -post hwos -n node1
The unzip of archive file ./jrepack.zip failed



Tom Kyte
November 09, 2009 - 3:04 pm UTC

please utilize support for....

things that are obviously support issues?

The unzip of archive file ./jrepack.zip failed

Tony, November 03, 2009 - 5:57 pm UTC

Continuation of previous post....
The unzip utility is in the right $PATH

Oracle 10g database on 11g CRS and 11g ASM

P Mate, December 11, 2009 - 10:56 pm UTC

Hi Tom,

I am planning on installing an Oracle 10gR2 RAC database on Oracle 11gR2 ASM & 11gR2 CRS on the RHEL 5.4 x86_64 platform. There are 3 databases on this 2 node cluster, but one of them needs to be 10g due to application compatibility issues. Do you foresee any limitations or issues going forward with this path? I tried doing some research, but this one needs expert advice, so I'm looking to you to provide some.

thanks for your advice and suggestions in advance.

P
Tom Kyte
December 14, 2009 - 8:11 am UTC

... There are 3 databases on this 2 node
cluster..

what a waste, you lost me right there.

running more than one instance on a host, especially when the host is part of a cluster, is asking for trouble - configuration trouble, performance (massively so) and so on.

I don't go this path, so I have nothing to convey to you experience wise.

reader

A reader, December 28, 2009 - 3:02 pm UTC

For a 2-node 10g RAC cluster, do I need an ASM instance on both nodes?
Can an ASM instance on one node service both nodes, since management of the storage is done by ASM? However, is it not true that sharing of the storage depends on the capability of the storage/disk system?

Thanks
Tom Kyte
January 04, 2010 - 8:13 am UTC

ASM is like a file system driver, it would be loaded and running on any machine that wanted to have access to that file system

If you ran it on one node only, it would sort of be a single point of failure otherwise.


You run it in every node that needs access to that filesystem.


... However is not true that the sharing of the storage is specific to
capability of storage/disk system ?
...

you need to have the proper hardware in place if that is what you mean, yes.

SOLARIS ZONE with RAC.

A reader, January 28, 2010 - 3:49 am UTC

Hi Tom,

I am in the process of doing some testing to set up a 2/3 node RAC system (on a single box).
I have a SUN Enterprise server (32 CPUs with 64 GB RAM) and I am planning to implement 3 zones (containers) on it.

I was just going through Metalink Note 317257.1 and found that RAC is not supported in a local zone.


Do we have something like VMware for Solaris, as is used for setting up RAC on a single node in a Linux environment?

Or is it totally unsupported?
Thanks.
lalu.

Alexander, February 02, 2010 - 9:56 am UTC

Tom,

We are experiencing problems with our vendor application not being able to deal with node failures. When a server goes down, they are unable to reconnect without bouncing services. It is not using an application server, but rather the application itself is managing the database connections.

For whatever reason, we do not see the same behavior in our test environment. We have reviewed everything we can at the RAC level. Can you suggest some things to look for, to ask the vendor to help determine what could cause this?

Unfortunately we cannot take advantage of Oracle's team that specialized in this, (I'm sure it's a cost thing) in case you were thinking that.
Tom Kyte
February 02, 2010 - 12:33 pm UTC

please utilize support - they will gather relevant bits of information like... your configuration, the error messages, more details on the symptoms


this level of detail isn't really sufficient to say much.



Alexander, February 02, 2010 - 1:56 pm UTC

They won't help with application problems, we tried. They just refer us to that team I mentioned.

I was hoping to draw from your experience, if you've ever worked with customers with this kind of a problem.
Tom Kyte
February 02, 2010 - 2:27 pm UTC

support has to help you - that isn't right.

One would need supporting information - of the type I mentioned above.

Alexander, February 05, 2010 - 10:42 am UTC

Some are great, some are horrible.

For example, they told me to remove some extraneous services we have running. I'm testing out the command and can't get it to work. I asked them to assist; they told me to open another SR, saying it was related - for a command THEY TOLD ME TO RUN.

Anyway, can you help? How is this possible?

/database/oracle
(x223kdc:oracle)> sqlplus /

SQL*Plus: Release 10.2.0.3.0 - Production on Fri Feb 5 11:13:52 2010

Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> select name from dba_services
  2  /

NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
seeddataXDB
seeddata.regress.rdbms.dev.us.oracle.com
OCP25XXDB
OCP25X.LMIG.COM
REPMAN.STREAMS_APPLY_Q

7 rows selected.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

/database/oracle
(x223kdc:oracle)> srvctl remove service -d ocp25x -s REPMAN.STREAMS_APPLY_Q -i OCP25X1,OCP25X2
PRKO-2017 : Service REPMAN.STREAMS_APPLY_Q does not exist for database ocp25x.


Stopping services

Jose, February 24, 2010 - 4:05 am UTC

Hello Tom:
Thanks for this great site.
I have a two-node RAC database.
I have defined a service and configured the listeners and clients (tnsnames.ora) to access the database using this service.
I expected that when I issued the command srvctl stop service -d database_name -s service_name, access to the database through the service would be disallowed, but clients can still connect when the service is stopped.
So, what is the functionality of stopping a service?



Connecting to RAC instance from java

A reader, May 04, 2010 - 11:30 pm UTC

Hi Tom,

We have Oracle 10gR2 RAC on AIX. I am a java newbie working on an application, and I connect to the database from a java class.
Since we have the db running on two nodes, is there any way to specify which node to connect to from java?

I am running into issues when I try to read a file on the db server that is stored on node1, while the db connection opened via the java application is on node2 - so it cannot find the file.

thanks a lot in advance.
Tom Kyte
May 06, 2010 - 1:50 pm UTC

uh oh

you have a major, huge design flaw here.


rather than answer your question, I'm going to ask you "why" - why are you architected in a manner that would force you to do this. Why are these files either

a) not in the database, they belong there.

or

b) not on shared storage.


You use RAC for availability, you are defeating the entire purpose here.

How to invoke a java stored procedure on node2 from node1

Aditya Nigam, May 18, 2010 - 3:03 pm UTC

Tom,

I am new to RAC.
suppose I have a 2 node RAC cluster, say node1 and node2.

I want to execute a java stored procedure on node2, but because of load balancing, it is invoked on node1.

This procedure is supposed to return the contents of a file local to the node. The problem is that we cannot put the files this java stored procedure reads on shared storage.


So my question is: is there a way to invoke this java stored procedure on node2 from node1?


The solution we can think of is to create a schedule from node1 for node2, and wait till the procedure executes completely.
But it does not seem to be a clean approach. Is there a direct way to do the same without involving the scheduler?

Regards
Aditya
Tom Kyte
May 24, 2010 - 9:36 am UTC

you do not even want to go down this path.

You do want to do one of two things (option#1 being by far my preferred method)

1) load data into the database where data belongs. files in the OS are not going to be universally accessible as you have seen - nor are they kept in sync with the database (think backup and RECOVERY), nor are they secured (they are just files after all).

Get them in the database, then you can backup, recover, secure, and access them.

2) get them on a shared file system - you have one already, you are running rac, put them there



Your problem today is two nodes, what about tomorrow when it is 3, then 4 and so on - do not go down this path.
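
To make option #1 concrete, a minimal sketch of pulling a server file into a table as a BLOB (the directory path, table name and file name are all illustrative):

create directory file_dir as '/shared/files';

create table file_store
( id        number primary key,
  file_name varchar2(200),
  contents  blob );

declare
   l_bfile bfile := bfilename( 'FILE_DIR', 'report.txt' );
   l_blob  blob;
begin
   insert into file_store ( id, file_name, contents )
   values ( 1, 'report.txt', empty_blob() )
   returning contents into l_blob;

   dbms_lob.open( l_bfile, dbms_lob.lob_readonly );
   dbms_lob.loadfromfile( l_blob, l_bfile, dbms_lob.getlength( l_bfile ) );
   dbms_lob.close( l_bfile );
   commit;
end;
/

Once the file is in the table it is visible from every node, gets backed up and recovered with the database, and is secured like any other data.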

master node in 11g RAC?

jian huang zheng, June 06, 2010 - 12:16 am UTC

Hi Tom,
Do you have any idea regarding the master node concept in RAC?
If so, what does a master node do in RAC?
Thanks,
Tom Kyte
June 09, 2010 - 8:15 am UTC

that is more a term related to replicating data - a master. Do you have a reference to it in use with RAC so I can see it in context?

Alexander, June 08, 2010 - 2:35 pm UTC

Tom,

I have a RAC cluster that keeps trying to evict nodes and the reboot never completes. It freezes up the server to the point where the sys admins can't even get on the RIB.

Do we have any control at all over how Oracle issues the reboot, or can we configure it not to do that?

Oracle Coherence and RAC

sam, June 24, 2010 - 1:01 am UTC

Hi Tom,
Oracle is promoting Oracle Coherence data grid quite a lot.
Do you foresee it replacing RAC, or maybe reducing the number of nodes in a RAC environment and increasing the number of application server nodes running Coherence?
You said somewhere in this thread "I do not believe in application data caches". What are the disadvantages?
Tom Kyte
June 24, 2010 - 7:46 am UTC

In a word:

no


The disadvantage is that of true coherency, if you cache in the middle tier - guess where you cannot safely do things anymore... The database itself.


Tool or Funtionality

Vinnie, July 23, 2010 - 1:41 pm UTC

Tom,

Is ORACLE RAC a set of tools (GUIs) or is it more a set of functionality or programs provided? If a tool, do you have a link to see GUI screenshots? Thanks
Tom Kyte
July 23, 2010 - 4:05 pm UTC

RAC is the ability to have more than one instance mount and open a single database. It is a database option (feature of the database). There are GUI's in Enterprise Manager that allow you to manage your RAC implementation but it is a database thing, not a GUI

Query to find blocks of table owned by each instance in RAC

Tony, December 13, 2010 - 9:09 am UTC

Tom,

Kindly help me find answers to the questions below:

1. Multiple instances access the same tables in a RAC database. How do I find the block# of a table mastered/accessed by each instance?

2. Does one instance master all the data blocks of a table, or can each instance master a set of blocks of a table based on the blocks accessed by that instance?

Thank you.


database in separate servers clustering

Mohannad, December 29, 2010 - 1:33 am UTC

Dears,
We have two databases installed on different servers in different branches. We are looking to synchronize those databases online; at the same time, when the online connection is lost, we need each branch to connect to its local database because we don't want our business processes to stop. But when the connection is re-established we will have unsynchronized databases - what will be the behaviour of the cluster?
- We have the branch number as part of the primary key for all records in the database, so one branch doesn't modify another branch's data; so we could take all modifications made during the period of disconnection and run them in the other database. So:
- What is the best way to do that? Is there any option in clustering to synchronize the modifications on both sides?
- If not, is it a good way to use the redo log files with LogMiner to take a script of the modifications and run it?
Thanks.
Tom Kyte
December 30, 2010 - 1:10 pm UTC

.. what will be
the behaviour of the cluster? ...

that is not a cluster, that is a distributed database.

You would want to read about replication - and then decide you don't want to do this at all - you want a single database. In the year 2010 - network connectivity is to a point that you don't need to do distributed databases like this. You would be best off by designing a good redundant network. Have a failover site in the event of disaster - but do not do replication. It is really complicated.


But read about streams and/or golden gate. Do not think about writing your own replication using logminer or anything remotely similar. Re-inventing the wheel 20 years after it was invented would be a real waste of time.

Things we need to do to develop an application for RAC

A reader, February 19, 2011 - 4:21 pm UTC

Tom,

I have tried to find information about the things a developer should look at while developing an application that will be running on RAC.
Would you be so kind as to give me your suggestions on what we should be doing so that the application we are developing will perform well on a RAC database.

Thanks a lot for any inputs you can give me.

Beneath load balancing,

A reader, May 24, 2011 - 3:00 pm UTC

Hello,

In the TNS entry, LOAD_BALANCE=ON will turn on client side load balancing. However, I have a question:

How do we determine whether load balancing is working as intended? Do we need to look at the number of sessions per service per instance to measure it? Say, if there is a smaller number of active sessions on an instance but its CPU is almost at 100%, does the load balancing feature still land a new session on this particular instance (based on the number of active sessions)? Does database time (and CPU time) play a role in the load balancing feature?

Thanks,


Tom Kyte
May 25, 2011 - 11:19 am UTC

load balancing can be configured in various ways - round robin, based on load for example.

http://docs.oracle.com/docs/cd/E11882_01/network.112/e10836/concepts.htm#sthref184

client load balancing will be round robin (randomized). You would simply look for a relatively even distribution of established sessions on the nodes.
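
A minimal sketch of that check - a rough count of established user sessions per instance:

select inst_id, count(*) sessions
  from gv$session
 where username is not null
 group by inst_id
 order by inst_id;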

follow up,

A reader, May 25, 2011 - 2:08 pm UTC

Least loaded node (or instance): does this mean by the number of sessions (connections)? If node 1 has 100 active sessions and node 2 has 80, then the listener will most likely place a new session on node 2. This is connection load balancing. Is that correct? How do we turn on connection load balancing?

So, what do you call balancing across nodes based on resource consumption (CPU, for example)?

Thanks,

Tom Kyte
May 25, 2011 - 2:24 pm UTC

The pmon process reports the load of the machine back to the listener periodically. The load is not "real time" enough to say "oh, there are 100 active sessions right now - go over there". The load is an average load on the machine over some period of time.

found this from the table of contents of the document I pointed you to:

http://docs.oracle.com/docs/cd/E11882_01/network.112/e10836/advcfg.htm#i475648

It is a section called "Configuring Connection Load Balancing"
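
For 10gR2 and later, a minimal sketch of the service-level knob that section describes (the service name is illustrative; run as a suitably privileged user):

begin
   dbms_service.modify_service(
      service_name => 'srm922_s1',
      clb_goal     => dbms_service.clb_goal_long );  -- balance on session count
end;
/

CLB_GOAL_SHORT would instead have the listener balance on the load pmon reports rather than on session counts.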

RAC Migration

Rajeshwaran, Jeyabal, June 27, 2011 - 10:25 am UTC

Tom:

We have our production running on a single instance database and we are planning to move to RAC; we are on 10gR2. The database size is around 4 TB, and the DBAs are planning to do this migration using the export and import utilities, with an estimated 2-3 days of downtime.

Shall we use Cross-platform Transportable Tablespaces to do this migration rather than export and import?
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tspaces.htm#i1007169

Also, please share other possible ideas and the best way to do this RAC migration.

Tom Kyte
June 27, 2011 - 11:40 am UTC

what is the source and target hardware platform.

and when you said export/import you meant data pump right?

RAC Migration

Rajeshwaran, Jeyabal, June 27, 2011 - 11:47 am UTC

1) what is the source and target hardware platform
Source platform is Solaris and target platform is Linux.

2) and when you said export/import you meant data pump right?
Yes.
Tom Kyte
June 27, 2011 - 12:26 pm UTC

Is that solaris sparc or x86?

x86 you can do a full database transport using RMAN (data plus code)

if it is sparc, you can cross platform transport the data - but you'll have to datapump over the code, sequences, etc... It would likely be faster than data pumping the entire database.
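
A hedged sketch of the cross-platform tablespace transport being described (tablespace, directory and file names are made up; Solaris SPARC is big endian, Linux x86 is little endian, so an RMAN conversion step is needed):

-- on the Solaris source: make the application tablespaces read only,
-- then export just the transportable metadata
ALTER TABLESPACE app_data READ ONLY;

$ expdp system directory=stage dumpfile=tts.dmp transport_tablespaces=APP_DATA

-- still on the source: convert the datafiles to the target endian format
RMAN> CONVERT TABLESPACE app_data
        TO PLATFORM 'Linux x86 64-bit'
        FORMAT '/stage/%U';

-- copy the converted datafiles and tts.dmp to the Linux target, then plug in
$ impdp system directory=stage dumpfile=tts.dmp \
        transport_datafiles=/u01/oradata/racdb/app_data_01.dbf

The code (PL/SQL, views, sequences, users, grants and so on) still has to come across with a separate, ordinary Data Pump export/import, as noted above.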

RAC is useless??

A reader, July 15, 2011 - 10:17 pm UTC

Is Terracotta better than Oracle RAC from a performance perspective? A newly joined Java specialist is recommending Terracotta to achieve the best performance for a web application. Is that true?
Tom Kyte
July 18, 2011 - 10:45 am UTC



one would have to really understand what your performance issues are before suggesting a performance fix. If your java specialist is suggesting something without understanding what the potential bottlenecks are - be very very afraid.


RAC is useless

A reader, July 17, 2011 - 11:25 pm UTC

I am eagerly waiting for my Guru's response for the following comments:

1) Is Terracotta better than Oracle RAC from a performance perspective? The newly joined Java
specialist is recommending Terracotta to achieve the best performance for a web application. Is that
true?

2) RAC limitations

Here is some relevant info regarding Terracotta vs. Oracle RAC: http://forums.terracotta.org/forums/posts/list/2161.page

Is this comparing apples to oranges? How can we compare Oracle RAC with Terracotta? It does not even make sense to me. The Terracotta forum says:

"RAC has serious limitations when it comes to transactions and locking. For instance all work does as part of a single XA transaction has to be done on the same RAC node. RAC is really a joke in a lot of ways - first clue is that they have to give the product a name that tries to convince you it is "real application clustering" when it has such strong limitations."

I can't believe this. What are your thoughts on this?
Tom Kyte
July 18, 2011 - 10:52 am UTC

1) it doesn't make sense to even compare, we need a PROBLEM TO BE SOLVED before you can compare any two things.


If your problem is long running queries against frequently changed data - caching at the middle tier will do squat for you.

if your problem is concurrency due to lots of concurrent modifications of data - caching at the middle tier will do squat for you.

and so on and so on - IT DEPENDS on the nature of the performance issue you are experiencing. Unless and until you can numerically quantify that, anyone suggesting a fix is "not being smart"


2) statements like this are just 'stupid'

<quote from the forum>
Oracle RAC has some HA options but they are very expensive and cumbersome to run. Terracotta's HA options are transparent, automatic, in the box and require no complex setup or tuning.</quote>


Once the database is up and running, you have them - how is that different? That was written by someone that

a) ran terracotta
b) never ran Oracle (as a DBA)


Here is something smarter that they did say:

<quote from the forum>
if the data that you are using RAC to serve up is read-only and cacheable,
</quote>


So, if you don't actually use insert, update, delete - it works great.

(and by the way, if you use the result_cache for client side OR server side caching, we can - well - sort of do that right out of the box as well... but I digress)


I don't know how this came up:

"RAC has serious limitations when it comes to transactions and locking. For
instance all work does as part of a single XA transaction has to be done on the
same RAC node. RAC is really a joke in a lot of ways - first clue is that they
have to give the product a name that tries to convince you it is "real
application clustering" when it has such strong limitations."


and the attack on the name in place of a technical attack is just plain out funny (makes me laugh) and immediately makes me discount the statement.

Does terracotta do transactions? If not (guess what it isn't, it isn't a database), then why even compare something RAC can do and does well with something terracotta does not do?


send them to http://www.tpc.org/ and ask them "if RAC is such a joke as you say, why does Oracle use it in most all of their benchmarks to show high end performance?" - and where are the corresponding terracotta numbers please?



Anyone with critical thinking skills should be able to have ripped at least some of this apart. Especially with statements like:

RAC is really a joke in a lot of ways - first clue is that they
have to give the product a name that tries to convince you it is "real
application clustering" when it has such strong limitations."



where is even an ounce of technical professionalism in that?


I guess you could just respond with:


terracotta is great for planters. Who would name their performance product after something that is a boring orange color and just sits there (who thinks "performance - ah, my potted plants...")? What a joke of a product - the guys that make this are obviously laughing at you just for using it.

Now, we are even.

Alexander, July 18, 2011 - 11:26 am UTC

Even though you don't follow sports, my theory on these absurd attacks on Oracle I think can be best explained with a sports analogy.

When a great player plays in someone else's stadium, they get booed hard. Average players do not.

I see Oracle getting ripped by lots of competitors, rarely do I see Oracle doing something similar.

So my point is, I think everyone is gunning for them because it is the best in the business.

A reader, July 19, 2011 - 11:30 pm UTC

Wow! The Terracotta specialist was speechless when I asked him to read your comments. You are the best!!

RAC

sam, July 28, 2011 - 4:09 pm UTC

Tom:

In your first comment above about RAC, you mention it is used for failover situations: when one instance/server dies, another node picks up. This provides high availability.

Many DBAs I talked to tell me RAC is mostly for scalability and not for failover, as most failures are due to DISK (not instance) and RAC will not do anything for that.

1) It seems to me that RAC is for both: high availability and scalability/performance. Correct?

2) Would you use RAC for a standby server or would you use DATA GUARD? It sounds like the Data Guard solution can protect against both INSTANCE and DISK failure and is more suitable for failover.

3) Can you give me an idea about the level of complexity of implementing RAC or DATA GUARD for an OLTP system?

thanks in advance,
Tom Kyte
July 28, 2011 - 7:48 pm UTC

Many DBAs I talked to tell me RAC is mostly for scalability and not for
failover, as most failures are due to DISK (not instance) and RAC will not do
anything for that.


runs counter to my experience. disk failures in the 21st century happen but are rare due to redundancy.

software crashes - frequent
os panics - happen
machine gets overwhelmed and you need to pull the plug - unfortunate, but some people let that happen to them


I disagree with disk failure as being the main cause of outages, I don't hear of it that often myself. Disk fails - we get a notification and replace it, but with raid 10 - we never lose access, we never lose protection.

1) yes
2) you need both rac (for failures in a single data center) and data guard (because you will have to take rac offline for maintenance - and you will have your data center fail someday). It is not an either/or if you ask me.
You do not want to fail over (major) just because a node got busy, or an OS panicked - those are totally recoverable issues; you do not want to go through a major failover in general for something so trivial

3) between zero (if you know what you are doing and have experience) and infinity (if you do not)

For example - for some people, doing it would be something they can do in their sleep. For others - it would seem an insurmountable task (until they've done it five times - then it becomes mundane)


RAC

sam, August 02, 2011 - 7:23 am UTC

Tom:

Your suggestion to have BOTH (RAC + Data Guard) is good, but isn't this the "rich man's solution" when

RAC license is about $23,500 per CORE and
Data Guard license is about $5,8000 per CORE.

For a "poor man solution" where we can afford one day loss of data and having a COLD standby server, cant you just CLONE the database machine and just copy the database files every night from Primary machine to standby machine. When main machine breaks, you plug the CLONE into the network?

2) I was trying to get an idea about the complexity level of installing/configuring RAC and Data Guard if you are not familiar with them.

Would it take DAYS, WEEKS, or MONTHS? Is it as simple as installing the Oracle DBMS software using OUI and creating a database using DBCA?
Tom Kyte
August 02, 2011 - 8:29 am UTC

The more availability you desire, the more it costs in terms of money, people, thinking, maintenance, etc.

The less you need, the less it costs.


For a "poor man solution" where we can afford one day loss of data and having a
COLD standby server, cant you just CLONE the database machine and just copy the
database files every night from Primary machine to standby machine. When main
machine breaks, you plug the CLONE into the network?


that is not a "poor man" solution, that is a solution that requires very little data availability and currency. If you were wanting to do that - sure, you could. I would just have a good backup policy in place and restore a backup to a failover site on demand (I might not lose anything!). Sure, the failover takes a bit longer than if I had already restored it - but it is even cheaper than your approach.



2) if you had training or if you had assistance (a mentor), it would take a very short period of time. It is mostly in getting the right hardware environment together.

If you did it hit or miss on your own - without outside assistance, it would take considerably longer as there will be some concepts you are not familiar with and you likely would not have the proper hardware initially (and you'd have to figure out what you really needed)

"...I do not believe in application data caches."

Graham, August 03, 2011 - 11:33 am UTC

Referring to Coherence, you said above (8th July 2009) that you don't believe in application data caches.

Can you point me at any articles going further into reasons against it to help in a little internal "discussion" I'm about to have?

Tom Kyte
August 03, 2011 - 1:20 pm UTC

the biggest one is of consistency - how do you maintain cache consistency when you have many middle tier caches and a database that can be changing.

And if they say "well, we do all modifications in the middle tier, propagate out the results to the other middle tiers and maintain the database", then I say you just got something that makes it impossible to use your data in any way EXCEPT via the application. In other words - in N years (where N seems to be getting smaller every day), when "new programming paradigm to end all programming paradigms" is released - you'll either not be able to use it (it isn't your application anymore, it is a new application) or you'll have to rip apart what you have and re-implement.

I prefer databased solutions

o IMDB - in memory database - also known as TimesTen. Uses replication to synchronize and ensure you get current data

o client side result set caching - transparent to the application, the results of frequently executed queries against relatively unchanging data (all you need to do is query it more often than it changes for this to be true) allows us to skip going to the database when we don't need to - but will refresh the cache transparently when the underlying data changes. Comes with 11g out of the box.

o server side result set caching - similar to the above, but caches in the shared pool - so that queries that perform a considerable amount of work against relatively unchanging data can be used and reused by any other session. Can be combined with the client side result cache.

o server side plsql function result cache - caches the answer to frequently executed plsql routines in the SGA for reuse by all.
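
A minimal 11g sketch of the last three options (table and function names are made up); the same hint is used whether the result ends up in the client cache, the server cache or both:

-- server (and, if configured, client) side result cache for a query
SELECT /*+ RESULT_CACHE */ region, SUM(amount)
  FROM sales
 GROUP BY region;

-- PL/SQL function result cache: the answer is stored in the SGA and
-- reused by every session until the dependent data changes
CREATE OR REPLACE FUNCTION get_rate( p_currency IN VARCHAR2 )
RETURN NUMBER
RESULT_CACHE RELIES_ON (exchange_rates)
IS
   l_rate NUMBER;
BEGIN
   SELECT rate INTO l_rate FROM exchange_rates WHERE currency = p_currency;
   RETURN l_rate;
END;
/

The client side cache just needs client_result_cache_size set to a non-zero value on the database (and an 11g client); the application code does not change.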

Alexander, August 03, 2011 - 1:48 pm UTC

"And if they say "well, we do all modifications in the middle tier, propagate out the results to the other middle tiers and maintain the database", then I say you just got something that makes it impossible to use your data in any way EXCEPT via the application. In other words - in N years (where N seems to be getting smaller every day), when "new programming paradigm to end all programming paradigms" is released - you'll either not be able to use it (it isn't your application anymore, it is a new application) or you'll have to rip apart what you have and re-implement. "

I think you'll find many application developers argue the same point against databases. I've had this discussion with an older developer technologist who wanted all the SQL moved out of their existing procs into dynamic SQL (this was for a Sybase to Oracle conversion POC). His reason was to be database independent, because first he was told Sybase was the way to go, then UDB, now Oracle. It was also to utilize the talents on their team, which were centered around Hibernate.

So I feel like that's a hard argument to make, one that requires a crystal ball. Java has been around for a while now too.
Tom Kyte
August 04, 2011 - 7:54 am UTC

I think you'll find many application developers argue the same point against databases

but the problem with that argument is... the basic capabilities of databases have been proven over the last 35 years to be more or less consistent: stored procedures, locking, concurrency controls, transactions, SQL grammar, etc.


Java has been around, just like php, perl, ruby, python, c, c++ - etc etc etc. Forget the language. Start naming the *frameworks* that have been in place with this java language. Every time we turn around there is some "end all be all" way to code it up, rewrite what we wrote five years ago because it uses an old paradigm. From corba and orbs (many people might not even know what those are - but man, were they cool or what 10 to 15 years ago) to beans to whatever.

The java language may have been around for a while (right after the orb thing), but the "ways" you "do" things in it changes on a dime. And - it is not the only language (never will be). The idea that you trap yourself in a language - in an application in fact, something much more specific than a language - you trap your DATA in that application - ugh.

It is exactly what happened when we went mainframe/client server/web. Mainframe applications tied everything up in the application - you couldn't get to the data any other way - we ended up writing lots of screen scraping code - a lot of which still exists because the only way to touch the data was via the application.

History has a way of repeating itself I guess.

load balancing

A reader, August 05, 2011 - 6:11 am UTC

Hi Tom,

I am new to RAC and have a question about our existing RAC database in our environment.

We have a three node RAC running on 10.2.0.4. The issue is that the 1st node is configured with 4 CPUs while nodes 2 and 3 are configured with 8 CPUs each. We have configured server side and client side load balancing.
There are multiple schemas in the database which are used by different applications. One of these applications is CPU hungry. We have observed that when this application runs queries on nodes 2 and 3, where we have more CPUs, everything runs fine without any issues; but when connections of this application are diverted to node 1 due to load balancing, it causes high CPU utilisation on that node, due to which other applications' connections also suffer performance degradation as CPU stays at 100% utilisation for a few hours.

To avoid this happening, I would like to configure the CPU hungry application to use nodes 2 and 3, as we have more CPUs available there. We want this application's connections to use node 1 only if nodes 2 and 3 are not available.

I understand that having an unbalanced configuration of CPUs among the nodes is not advisable, but unfortunately I don't think we will be able to get more CPUs on node 1 soon. So how do I configure the CPU hungry application's connections to use nodes 2 and 3 at all times and use node 1 only when nodes 2 and 3 are not available - something like nodes 2 and 3 being the preferred nodes and node 1 the fail-over node?

Cheers,
Tom Kyte
August 05, 2011 - 8:22 am UTC

you want to set up services. You would have nodes 1,2,3 register with the listener as "service_1" and you would have nodes 2,3 register with the listener ALSO as service_2.

Your cpu hungry things would connect to service_2, everything else would use service_1. You can set up your tnsnames connect string for the cpu hungry guys to always try service_2 first - and if there are no instances servicing that service - then to use service_1.
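
A hedged sketch of that setup, with made-up database, instance and service names (10.2 style syntax):

$ srvctl add service -d racdb -s service_1 -r inst1,inst2,inst3
$ srvctl add service -d racdb -s service_2 -r inst2,inst3
$ srvctl start service -d racdb -s service_1
$ srvctl start service -d racdb -s service_2

# tnsnames.ora for the cpu hungry application: try service_2 first,
# fall back to service_1 only if nothing is servicing service_2
CPU_HUNGRY =
  (DESCRIPTION_LIST =
    (FAILOVER = ON)
    (LOAD_BALANCE = OFF)
    (DESCRIPTION =
      (ADDRESS_LIST =
        (LOAD_BALANCE = ON)
        (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521)))
      (CONNECT_DATA = (SERVICE_NAME = service_2)))
    (DESCRIPTION =
      (ADDRESS_LIST =
        (LOAD_BALANCE = ON)
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
        (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521)))
      (CONNECT_DATA = (SERVICE_NAME = service_1)))
  )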




load balancing

A reader, August 07, 2011 - 10:44 pm UTC

Hi Tom,

Thanks for the reply.

In your reply you mentioned
"You would have nodes 1,2,3 register with the listener as "service_1" and you would have nodes 2,3 register with the listener ALSO as service_2. "

May I please ask why we need to configure two services? Would not the creation of one service, as below, suffice?

srvctl add service -d db_unique_name -s service_name -r node2,node3 -a node1

Reference:
http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/srvctladmin.htm#i1008403

I believe that if I create one service (instead of two, as you stated in your reply) using the above command, and ask the CPU hungry application team to modify their client side TNSNAMES.ORA to use the newly configured service, then all new connections from that application (after the TNSNAMES.ORA change and an application bounce) will go to nodes 2 and 3, since the service is configured with nodes 2 and 3 as preferred instances (where we have more CPUs), and the application's connections will only go to node 1 when nodes 2 AND 3 are not available, since the service is configured with node 1 (with fewer CPUs) as an available instance?

Again, thanks for your time in providing invaluable guidance.

Cheers,

Tom Kyte
August 13, 2011 - 3:14 pm UTC

May I please ask why we need to configure two services? Would not the creation
of one service, as below, suffice?


because of your stated requirement???

You need two services - one that spreads out over 1,2,3 and another that hits just 2,3 - that is what you *said* you wanted???


You already have one service - it hits 1,2,3.

You are going to create a new service for 2,3

therefore, you shall have TWO services - one for 1,2,3 one for 2,3.

Not sure what you think is different about that from what I said??

load balancing

A reader, August 08, 2011 - 12:15 am UTC

Hi Tom,

It's me again!

Further to the above question about RAC services: if we modify the CPU hungry application's client side TNSNAMES.ORA file by removing the entry for node 1 and bounce the application, will all new sessions of the application go to nodes 2 and 3 only, since the client side TNSNAMES.ORA no longer has an entry for node 1?

Sorry if this sounds stupid, as I am new to RAC. I understand that it is a bad idea, even if it works, to remove the node 1 entry from the client side TNSNAMES.ORA file of the CPU hungry application, because if nodes 2 and 3 go down, the application will not be able to fail over to node 1 unless the TNSNAMES.ORA file is modified to include the node 1 entry again.

I am trying to provide options to business to fix the issue e.g.

Option 1: Add the same amount of CPU on node 1 as we have on nodes 2 and 3 (the preferred solution)

Option 2: Configure another service as per the above post (waiting for your view on it)

Option 3: Remove the node 1 entry from the client side TNSNAMES.ORA (not a preferred solution).

Cheers,

Tom Kyte
August 13, 2011 - 3:22 pm UTC

Sorry if this sounds stupid, as I am new to RAC. I understand that it is a bad
idea, even if it works, to remove the node 1 entry from the client side
TNSNAMES.ORA file of the CPU hungry application, because if nodes 2 and 3 go
down, the application will not be able to fail over to node 1 unless the
TNSNAMES.ORA file is modified to include the node 1 entry again.


it is not a silly idea - why do you say that?

please re-read my initial response, I covered what would happen in a node 2,3 failure and what to do.

Load balancing

A reader, August 14, 2011 - 11:10 pm UTC

Thanks Tom.

It is now clear to me that I will need to configure another service with the below command:

srvctl add service -d db_unique_name -s service_name -r node2,node3 -a node1

Once the service is configured, I will supply the TNS entry to the CPU hungry application so that the application support team can change their TNSNAMES to add the new service info and use it to connect to the database after bouncing the application (to clear current connections connected to node 1 via the earlier service configuration).

Regarding below
"it is not a silly idea? why do you say that?"

My question was: is it possible to achieve this without creating a new service, i.e. just ask the application support team to modify the existing TNSNAMES entry for the existing service by removing the node 1 entry and bounce the application? In that case I believe that if nodes 2 and 3 go down, the application will not fail over to node 1 even if node 1 is up and running, because the client does not have a node 1 entry in their TNSNAMES.ORA file. Is this true, OR, since the service is registered with the listeners of all nodes, will Oracle still redirect the application's requests to node 1 even if the client does not have the node 1 entry in their TNSNAMES?

Tom Kyte
August 15, 2011 - 2:24 am UTC

you want to use services, it is far easier than any other approach. Just use a service, that is what they were invented for.

Load balancing

A reader, August 15, 2011 - 8:14 pm UTC

Thanks Tom

import command run twice on RAC

Ravi B, October 18, 2011 - 12:29 pm UTC

Hello Tom,

I am not familiar with RAC environment.

One of our clients reported that an import command appears to have been run twice, as per the application log files, in a RAC environment.

Could you please let me know if this has anything to do with RAC?

Following is the log:

tail -f imp.log
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
Warning: the objects were exported by CSA, not by you
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing CSA's objects into FBA
. . importing table "LOCAL_SCAN_EXPORT"          1 rows imported
Import terminated successfully without warnings.
tail: imp.log: file truncated
2011-10-03 14:20:18.615 [imp, fba/__ BDNA_PASSWORD_NOT_LOGGED __ @BDNADB.AAA, fromuser=csa, touser=fba, file=/home/bdna/instal1750/exports/A-B_C_23SEP11.bdnacs, ignore=y]
Import: Release 10.2.0.4.0 - Production on Mon Oct 3 14:20:18 2011
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
Warning: the objects were exported by CSA, not by you
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset conversion)
. importing CSA's objects into FBA
. . importing table "LOCAL_SCAN_EXPORT"          1 rows imported
. . importing table "LEl001_ATTR_DATA"
page
Tom Kyte
October 18, 2011 - 5:00 pm UTC

could you please let me know if this is anything to do with RAC?


no, it has nothing to do with RAC.



I don't know what I'm looking at here really, looks like a mixture of interactive unix commands (like page) and multiple tails. Very confusing presentation.


You'd have to tell us how imp was run

You'd have to tell us how the log file is created (redirection? if so, using > or >>? using log= on the command line?, etc)

You'd have to show us the log file without confusing commands in the middle of it.

import command run twice on RAC

Ravi B, October 19, 2011 - 12:03 pm UTC

Thanks Tom. That is the log I got from the client. I will try to find out what commands they are running and how.

But I got my answer that this behavior is not due to RAC.

Thanks very much!

A reader, November 15, 2011 - 2:28 am UTC

Hi

In a RAC environment, can I add a node by using a GUI interface from Enterprise Manager or with Oracle Universal Installer (runInstaller -clone)?

What should the approach be?

ajeet, March 21, 2012 - 9:40 am UTC

Hi Tom,

We have 6 Oracle databases which will be implemented as Real Application Clusters (RAC) to provide high availability. The idea is to have 6 grids of two nodes each.

The alternative approach/proposal is to have a single multi-node grid of 12 nodes and use 2 nodes for each RAC database from that pool.

Can you guide us on the right approach and what the pros/cons would be? We plan to use Oracle Grid Infrastructure (11gR2).
Tom Kyte
March 21, 2012 - 10:36 am UTC

if you go with a single multi-node grid, you'll have more flexibility. If you lost two nodes and they happened to be for application "x", you could use some of the other 10 to failover to. You'd have a single database so any node can run application 'x'. Also, if application 'x' needed extra capacity - you could steal a node from applications 'y' and 'z' temporarily and give them back later.

You'd have 5 less databases to upgrade and manage as well.

And if these applications share any data - you'd avoid replication which is a good thing to avoid

However, since there is only one database - they are one for all, all for one. If one application needs a database patch, they all get it. I personally view this as a positive thing (fewer versions floating around) but some view it negatively.

It is something to consider
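
A hedged sketch of how that single-grid flexibility tends to look in practice (database, service and instance names are made up): each application gets its own service on the one 12-node database, preferred on "its" two instances and able to borrow others when needed.

$ srvctl add service -d griddb -s app_x_svc -r inst1,inst2 -a inst3,inst4
$ srvctl add service -d griddb -s app_y_svc -r inst3,inst4 -a inst5,inst6

Giving application 'x' extra capacity temporarily is then a srvctl modify/relocate of its service rather than building, patching and licensing another cluster.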

Alexander, March 21, 2012 - 11:05 am UTC

What is a RAC grid? Does that just mean a different cluster?
Tom Kyte
March 21, 2012 - 11:17 am UTC

he is using 'grid' to mean 'machines per database'

they have 12 machines, they were either going to have six separate databases with 2 nodes each (six 'grids')

or one database with 12 nodes (one 'grid')

creating read-only service,

A reader, April 05, 2012 - 10:54 am UTC

Hello,

I am on 11gR2 and I learnt that I could use the "-l" option of the srvctl add service command, which takes the role of the database into account.
My intention is to have two services - one that always points users to my primary database (read-write) and a 2nd one that points users to my read-only database (we have Active Data Guard).

My process is:
create serv_RW on both databases using -l PRIMARY and start the service on both of them;
similarly, create the serv_RO service on both databases with -l PHYSICAL_STANDBY and start it on both of them.

In the first case, when I tried to connect to the database using serv_RW, it put me in the standby database.

In the second case, when I tried to start serv_RO on the standby, I got:

CRS-2674: Start of 'ora.mdcp2p.mdcp9pcr.svc' on 'g2u1258c' failed
CRS-2632: There are no more servers to try to place resource 'ora.mdcp2p.mdcp9pcr.svc' on that would satisfy its placement policy
CRS-5017: The resource action "ora.mdcp2p.mdcp9pcr.svc start" encountered the following error:
ORA-16000: database open for read-only access

I couldn't find documentation from Oracle on how to use this feature.

thanks for your help.
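
For reference, a hedged sketch of the role-based service setup being attempted (database, instance and service names are made up): each service is added to the clusterware configuration with the role it should run under, and clusterware only starts it where the current database role matches.

# on the primary cluster
$ srvctl add service -d prod -s serv_rw -l PRIMARY -r prod1,prod2
$ srvctl start service -d prod -s serv_rw

# on the standby cluster
$ srvctl add service -d stby -s serv_ro -l PHYSICAL_STANDBY -r stby1,stby2

One common cause of the ORA-16000 above is that the service being started on the read-only standby does not yet exist in the data dictionary there; the usual fix is to create and start the service once on the primary first, so its dictionary entry reaches the standby through redo apply, and only then start it on the standby.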

cache transfer

Lokanath, May 03, 2012 - 6:30 am UTC

Hi Tom,

In a 2-node RAC, a database block DB1 containing two rows (R1, R2) has been read into the
buffer cache of Instance 1 to update row R1, and that update is currently in progress.
At the same time, another session wants to update row R2 in the same data block DB1 through Instance 2.

Please let me know

1. Does Instance 1 create a PI (past image) of DB1 and transfer the block to Instance 2 without completing its update,
or
2. Does Instance 2 wait for Instance 1 to complete the update of R1 in DB1 before the block is transferred?

If my above understanding is correct, does it cause excessive delay in the update as compared to an update in a single instance database?

Thanks in Advance,
Lokanath.

Tom Kyte
May 03, 2012 - 9:09 am UTC

instance 1 would have to allow the transaction that has the block in current mode (so it can change the bytes of row R1) finish - and then it would transfer control of that block to instance 2 so the other transaction can modify the bytes of R2


does it cause excessive delay? No, could take longer than if you were just in a single instance perhaps - but if your single instance could not handle the concurrent workload - it could be faster than a single instance.


Think of it this way. On a multi-cpu machine you have the same issue when modifying memory. the CPU needs the bits of memory in its L1/L2 caches. If another process on another CPU in the same machine needs the same bits - they have to transfer the memory from one CPU cache to another. RAC is like SMP.
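
A hedged way to put numbers on that shipping cost (statistic names as in 10g/11g): compare the global cache transfer statistics across instances.

SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc cr blocks received',
                'gc current blocks received',
                'gc current block receive time')
 ORDER BY inst_id, name;

If sessions on different instances keep fighting over the same hot blocks, these counters (and the gc buffer busy waits) climb quickly - which is usually the cue to partition the workload by service rather than to blame RAC.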

Lokanath, May 04, 2012 - 9:32 am UTC

Hi Tom,
Thanks for your response.

My understanding is that in non-RAC, Oracle can update rows (R1, R2) in parallel even though they are in the same data block (DB1).
Then in the RAC scenario, why should Instance 2 (which wants to modify R2) have to wait for Instance 1, which is modifying a different row R1 of the data block?
Does it mean that in RAC, updates to different rows of a data block have to go in sequence if the requests originate from different instances?

Kindly guide me with your explanation.
Tom Kyte
May 06, 2012 - 2:41 pm UTC

No, you cannot.

Only one transaction at a time can have a block in current mode - and you need to have the block in current mode to modify the bits and bytes on the block. Can you imagine the mess that would happen if two things simultaneously updated the same bits and bytes in memory????


In a single instance, you would experience a buffer busy wait.

RAC 11g Concepts

Archana, May 11, 2012 - 1:50 am UTC

Hi Tom,

I like this site a lot. I am new to RAC. Could you please provide me a link to the RAC 11g concepts in detail?

Troubles in 4 node RAC

A reader, September 27, 2012 - 9:11 am UTC

Please, any comment will be welcome.

Caching 2 medium-size (1-50 MB) tables (the ones most accessed by the application) in a 4-node RAC 10gR2 (Linux):

a) it would be advisable
b) it would make things worse
c) It wouldn't have effect

Would this statement be true: "The bigger the data block size, the worse for updates under high concurrency"?


Thanks
Tom Kyte
September 27, 2012 - 3:16 pm UTC

cannot make any comments on the caching question really. I assume you mean alter table T cache - if so, it would only have an effect if you full scan the tables (if you don't - they are already as cached as they are going to be).

If you do full scan them, it might

a) make things faster
b) make things slower
c) not change a thing

if you have plenty of space in your buffer caches, then maybe it might be beneficial - maybe.

if you don't - then you might be pushing out other stuff, that could be very bad.

or it might not matter at all.


not understanding the data access patterns (what you do) or the situation makes it hard to say anything concrete.

Would this statement be true "The bigger datablock size, the worse for updates
in high concurrency "?.


I can come up with scenarios in which it would be better for updates, worse for updates, the same ;)

it depends again. I would just stick with the venerable 8k block and not deviate from it. It is what the database is developed and tested against, it is the most mainstream, it delivers excellent performance with minimal overhead.
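
For completeness, a minimal sketch of the caching option being discussed (the table name is made up); CACHE only changes how blocks read by full table scans are aged in the buffer cache, it does not pin anything:

ALTER TABLE hot_lookup CACHE;

SELECT table_name, cache
  FROM user_tables
 WHERE table_name = 'HOT_LOOKUP';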

Switching to RAC

Luke, October 02, 2012 - 7:51 pm UTC

Hi Tom - one of our customers wants to put our product into production using an 11gR2 RAC database. We have only tested on an 11gR2 single machine database and I have no experience with RAC. This seems like a big risk to me since our product has not been tested on a RAC database, but my understanding is that it should 'just work'. The customer already has a RAC database and so wants to utilize it - they are not looking to scale our product to anything more than we have tested.

Can I get your comments on this? Thanks...
Tom Kyte
October 09, 2012 - 11:45 am UTC

it needs to be tested, and you need to have the infrastructure in place to support and debug it.

at best they should go "one node" ala RAC one node to avoid any sort of scaling issues.

but it would still be nice if you had such an environment in order to support this configuration.

RAC Scan re-arp

A reader, October 10, 2012 - 12:49 pm UTC

For the local node vip relocation, you mentioned above that :

"When a node fails, the VIP associated with it is supposed to be automatically failed over to some
other node. When this occurs, two things happen. (1) the new node re-arps the world indicating a
new MAC address for the address. For directly connected clients, this usually causes them to see
errors on their connections to the old address; (2) Subsequent packets sent to the VIP go to the
new node, which will send error RST packets back to the clients. This results in the clients
getting errors immediately.

This means that when the client issues SQL to the node that is now down, or traverses the address
list while connecting, rather than waiting on a very long TCP/IP time-out (~10 minutes), the client receives a TCP reset. In the case of SQL, this is ORA-3113. In the case of connect, the next
address in tnsnames is used."

Does the same process/step take place for 11g SCAN Vip relocation ?

Because during node VIP relocation, only the VIP moves (leaving the local listener behind). The re-arp here helps existing SQL/new connections by either informing them of the ORA-3113 or asking them to traverse to the next address (quickly, without the 10 minute delay). Since node VIPs have the responsibility of forking a process (with the listener), having moved to another node does not help them fork a new connection request.

On the other hand, a SCAN VIP relocation moves "with" its listener. I am guessing the re-arp is still necessary for the client to quickly learn where the SCAN VIP has moved, but will it also send a "TCP RESET" this time? (This is my main question.) I think it should be able to take it from there with the normal process of contacting the least-loaded listener, since the SCAN listener has also relocated to this node, and unlike a node VIP it does not have the responsibility of forking a new connection - it just redirects the request to the listener of the least-loaded node.

Please help clarify with your easy-to-understand explanation (as always).



Oracle 11gR2 Availability

Srinivasarao, December 22, 2012 - 3:43 am UTC

Hi Tom,

You are doing a great job by providing solutions and help to Oracle users. I want to ask you about 11gR2 - could you please explain how it works in delivering 24/7 database availability?

Thank you

Regards,
Bathula
Oracle DBA

oracle RAC concept and high availablity

A reader, March 26, 2013 - 6:36 pm UTC

hi tom,


I understand the difference between an instance and a database, and I understand the basic concept of RAC, which means

multiple instances to 1 database.

----------------------------------------

My question is

1) For that particular database, are we able to span the datafiles across different servers/machines,
or must they reside on just 1 machine?

What I meant is, since we are able to group multiple instances from different machines together as 1 (using Oracle Clusterware),

are we able to achieve the same for storage?

2) If we are able to group storage from different machines together, how is it achieved? By ASM or Clusterware?

How does ASM on machine A see a hard disk from another machine B? By a network mount point on machine A?

3) With RAC, instance availability is guaranteed, but how about database availability?

Are we able to have RAC + Data Guard together? Meaning if my database goes down, will my RAC instances automatically connect to database B?

Or must I have another, different set of RAC instances on database B?

-----------

Please advise

Thanks
Regards,
Noob
Tom Kyte
March 27, 2013 - 5:02 pm UTC

1) the datafiles must reside on ALL machines. ALL machines need full read/write access to all datafiles.

so typically you are using a clustered file system (i.e. ASM) to access a set of files on a SAN or some other sort of attached storage.

But the disks need to be accessible by each node in the cluster independent of any other node, the storage cannot rely on some node being available (that would kill the availability features...)

2) the storage doesn't belong to the machines, the storage is separate and distinct, typically network attached

3) a database is available because an instance accessing that database is available.

RAC allows you to have redundant instances, each separate and distinct from each other, the failure of one instance does not affect another instances availability.

Therefore, as long as one instance in a cluster is available, the database mounted and opened by that instance is available.



Yes, you can use data guard with RAC databases.

and remember your database is only down if ALL INSTANCES in your RAC cluster are down.

oracle RAC concept and high availablity

A reader, March 30, 2013 - 6:59 pm UTC

hi tom,

Not sure if you are going to see this on a Saturday night, but I can't stop my urge to ask the questions lurking inside me.

On your reply

But the disks need to be accessible by each node in the cluster independent of any other node, the storage cannot rely on some node being available (that would kill the availability features...)

2) the storage doesn't belong to the machines, the storage is separate and distinct, typically network attached



Assuming I have 2 machines, A and B, connected to SAN storage,

should the setup be:

Choice 1
Machine A -> given SAN storage LUN A
Machine B -> given SAN storage LUN B

Will installing Clusterware and ASM enable machines A and B to see each other's LUNs?

Choice 2 (not sure if it is possible to assign 2 LUNs to a single machine though)

Machine A -> given SAN storage LUN A, B
Machine B -> given SAN storage LUN A, B

--------------------------

If the answer is choice 1, then it makes no difference to have the storage inside machines A and B (instead of a SAN), since installing the Clusterware/ASM will make both machines see each other's storage, won't it?

Regards,
Noob
Tom Kyte
April 22, 2013 - 12:46 pm UTC

sorry, you submitted this on the beginning of a three trip around the world for me :) I just got back.




In short - all things "database" (where you will be putting database files, log files, control files) have to be shared - if you put a datafile on LUN A, then both A and B need full read write access to it.


You can use separate file systems (private to each of A and B) for things like trace directories, parameter files and the like - but it is not recommended (when a node fails, it would be nice to see the trace information on another node for example, it would be nice to share a common parameter file, and so on)


RAC is shared everything.
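
So for the two-machine example, choice 2 is the shape of it: the same LUNs are presented to both A and B, and ASM (running on each node) turns them into one clustered disk group. A hedged sketch with made-up disk paths:

-- run once from any node's ASM instance; every node mounts the same disk group
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/mapper/lun_a'
  FAILGROUP fg2 DISK '/dev/mapper/lun_b';

-- database files then live in the disk group, not on any one machine
CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 10G;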

Syed Safi, July 06, 2013 - 6:15 am UTC

In one of the blogs I found this description regarding the OCR master node:
"In case of node eviction, the cluster is divided into two sub-clusters. The sub-cluster containing fewer nodes is evicted. But, in case both the sub-clusters have the same number of nodes, the sub-cluster having the master node survives whereas the other sub-cluster is evicted."
Whereas my understanding is: "An odd number of voting disks is used to avoid split brain. When nodes in a cluster can't talk to each other, they race to lock the voting disks and whoever locks more disks survives. If the number of disks is even, there is a chance a node locks exactly 50% of the disks (2 out of 4) - then how do you decide which node to evict? Whereas when the number is odd, one side will always hold more than the other, and the cluster evicts the side with the smaller count."
Can you please clarify conceptually which is the exact behaviour?

12c and RAC

Mani, July 11, 2013 - 6:43 pm UTC

I was just attending the 12c launch by Larry Ellison this week. If I understand correctly, in version 12c of Oracle we have multitenant databases shared by a single set of processes. Does that mean this is the reverse of RAC, where multiple instances access a single database, whereas here multiple databases are accessed by a single set of processes?

Can you please clarify?
Tom Kyte
July 16, 2013 - 4:05 pm UTC

not the reverse of RAC - you would still use RAC with multitenant in order to protect from machine/instance failure.

RAC is a high availability, scale out feature. We use multiple instances on multiple nodes to make data available fully from more than one computer.

Multitenant in a word is the ability to consolidate multiple databases into a "single" database while maintaining the physical security between databases. A method of consolidation. In multitenant - a single instance on a single host can be used to access multiple databases. They can be RAC'ed together as well - so you would have a cluster of instances all able to read/write any or all plugged in databases.

Syed Safi, July 18, 2013 - 7:51 am UTC

Dear Tom,

One of the blog I found the description regarding OCR master node
” In case of node eviction, The cluster is divided into two sub-clusters. The sub-cluster
containing fewer no. of nodes is evicted. But, in case both the sub-clusters have same no. of
nodes, the sub-cluster having the master node survives whereas the other sub-cluster is evicted.”
Where as my understanding is “Odd number of disk are to avoid split brain, When Nodes in cluster
can’t talk to each other they run to lock the Voting disk and whoever lock the more disk will
survive, if disk number are even there are chances that node might lock 50% of disk (2 out of 4)
then how to decide which node to evict.
whereas when number is odd, one will be higher than other and each for cluster to evict the node
with less number”
My question is whether one of these quotes might be incorrect, because if both sub-clusters are equal, the sub-cluster without the OCR master is automatically evicted - so why is there a need to have an odd number of voting files?
have a good day
Tom Kyte
July 18, 2013 - 5:28 pm UTC

you should consider asking the author of the blog for clarification? things taken out of context are hard to comment on.

I cannot even tell what is the quote from the blog and what is your commentary since you seem to be using "quotes" quite liberally.


RAC on 12c

Mani, August 06, 2013 - 5:56 pm UTC

Thanks for your explanation on 12c with RAC. Can we have 2 databases co-exist in a multitenant setup - one a 4-node RAC database (DB1) and the other a non-RAC database (DB2) - where I still want DB1 alone to be RAC and DB2 not? Is it possible to consolidate this way?


Tom Kyte
August 08, 2013 - 4:59 pm UTC

you would have to have two container databases (CDB) for that.

A CDB is either

a) configured for RAC
b) not configured for RAC


another option would be to have just one RAC'd CDB - and then just start PDB-1 on all four nodes and only start PDB-2 on one node. PDB-2 would be almost like RAC one node, you'd be able to shut it down and start it on another node, or if the node it was on failed, you could instantly start it on some other node for a failover. Not quite the same as RAC one node, but pretty close. Since only one node would be accessing its blocks - there would be no inter-connect traffic (but you'd be able to start it elsewhere at the drop of a hat if you wanted)
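
A hedged 12c sketch of that last option (CDB, PDB and instance names are made up): keep one PDB open everywhere and pin the other to a single instance, typically by giving each PDB its own service.

$ srvctl add service -db cdb -service pdb1_svc -pdb pdb1 -preferred cdb1,cdb2,cdb3,cdb4
$ srvctl add service -db cdb -service pdb2_svc -pdb pdb2 -preferred cdb1 -available cdb2

-- or directly, from the CDB root:
ALTER PLUGGABLE DATABASE pdb1 OPEN INSTANCES = ALL;
ALTER PLUGGABLE DATABASE pdb2 OPEN INSTANCES = ('cdb1');

If the single instance hosting pdb2 fails, the service (and the PDB) can simply be started on a surviving instance - the "almost RAC one node" behaviour described above.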

thanks!

Mani, August 09, 2013 - 2:49 pm UTC

Nice explanation. thanks a lot Tom!

clustering software or clustering file system ?

szejie, September 25, 2013 - 8:30 pm UTC

Hi tom,

I am trying to grasp the concept of RAC.
From what I know, all nodes in the RAC environment must be able to access the same shared storage.

However, I also understand that having the same shared storage for all nodes brings the risk of read/write file corruption.

Reading http://www.orafaq.com/node/66, it seems like RAC can work on raw devices as well.

Q1) So what is the component controlling the read/write activities between the RAC nodes to prevent corruption?

Is it the Oracle clustering software, or the clustered file system (but for raw devices or a normal LVM, there isn't one)?

Regards,
Noob


Rac documentation

Pranav, October 08, 2013 - 3:43 pm UTC

Hi Tom,

Somehow I couldn't find the RAC concepts guide on OTN, especially starting from 10g & 11g. I am looking for a detailed explanation of RAC architecture and background processes. Please help me.


Rac documentation

A reader, October 10, 2013 - 4:03 pm UTC

Thank you Tom.

I did go through that link before. There isn't much information about the background processes (just a line or two). I remember, a long time back, seeing a very good explanation of each of the background processes in the 10g documentation (which I can't find anymore).
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/admcon.htm#CJHJAAFE

RAC

Anand, April 09, 2014 - 11:35 am UTC

Hi Tom,

Our DB is a two node RAC environment. During the batch process we have seen that requests went to instance 2 and not a single request went to instance 1, and on confirmation with the DBA both instances were up. Can you tell me how to find the root cause of this behavior? I can't find anything relevant in the alert and listener logs. Is this related to the load balancer?
Tom Kyte
April 16, 2014 - 5:04 pm UTC

you give no information whatsoever here. none.

so no, I cannot tell you the root cause.


i know nothing about how your batch connects.
i know nothing about how many connections your batch has.
i know nothing about your sqlnet setup.


that said, it is probably a good idea to have your batch concentrate the workload on one node. If you have done your 'batch' as most people do (slow by slow processing with no consideration to data affinity), spreading it over N nodes would probably either run just as fast as a single node - or slower. As you are chaotically hitting data here and there and causing blocks to be sent over the interconnect like mad.


tell me, is your single node running at 60%, 70%, 80%, 90% or 100% cpu? if it is in the 60-80 range, you've done good. spreading out over many nodes would probably (I'm guessing at the design of the batch) result in more cpu being used to perform less work. (eg: it would be slower)

SCAN-VIP and VIP

Alan, June 23, 2014 - 7:06 pm UTC

Hi Tom,

I am trying to understand how RAC actually works.
Hence the questions below.

1) Assuming i have a 2 nodes cluster, there will be actually

2 x VIP
2 x Production IP
3 x SCAN-VIP
2 x Internal IP (for interconnect)

2) My questions are

a) Are the VIPs on the nodes tied to the interface card, e.g. eth0:1?

b) If a node is down, will VIP1 from node1 be moved to eth0:1 on node2?

c) Will an unsolicited re-arp then be sent out to all clients, and will clients still connect to VIP1 on node2, or to VIP2 on node2?

d) Why do we need 3 SCAN VIPs if we only have 2 nodes?

e) If I have 2 nodes, where will the SCAN listeners be set up?

f) I understand that SCAN VIPs are not tied to the network interface, but in that case, how does a client know where to connect?
Wouldn't the network layer on the nodes reject an IP that is not tied to its interface?

Regards,
Noob