Question and Answer

Chris Saxon

Thanks for the question, Felix.

Asked: June 26, 2006 - 5:17 pm UTC

Last updated: December 04, 2014 - 10:43 pm UTC

Version: 10.2.0

Viewed 10K+ times

You Asked

Hello Tom,

I know this is my second question in a span of a few minutes. I wanted to get a feel from the user community for the best practices in laying out a SAN for the database. Where do the redo logs go? Does one set stay on the SAN and the other on the server's local disk? The point of getting a SAN is to reduce the storage requirements on the server. With a server attached to a SAN and with 2 or 4 local disks, how would you lay out the database? I will appreciate any information on this.

Thanks

and Tom said...

The point of getting a SAN is to centralize management of a shareable, poolable resource.

Pretty much everything typically goes on the SAN - the internal disks are used perhaps for the OS image, maybe scratch space - but everything else is on the SAN.

The backup stuff works (typically) off of the SAN as well.

So, if the machine (computer) blows up, just install new computer, mount SAN and you are back in business.

I would virtually ignore the existence of the internal disks.

Others may comment on their experiences of course...


Comments

SAN

Herodt, June 26, 2006 - 10:21 pm UTC

Others may disagree, but I have found the following to be a very effective way of laying out a DB on a SAN with local disks.

the mount points are presented to the server as:

/prd for production data on SAN, normal speed disks, lots of protection without sacrificing too much speed.

/dev exists for dev/test since unfortunately dev and prd have to exist on the same machine in far too many instances.


/redoA, /redoB these are on the SAN and are as fast as fast can be via the SAN. These are the smallest and fastest drives we could find - vertically striped on the SAN, with all the bells and whistles.

/undo, /temp fast, larger drives, otherwise the same as the /redo% mounts.

/arc on the SAN, big and slow.

/backup this is 50/50 on the SAN or local disks. Either way it goes up to the tape array to go to tape.

All of the "scratch" is on the local disks or a NAS depending on where/how the server was configured. A image of the OS is on local disks as well as what is needed to use the image.

In the past, when moving from DAS to SAN, we have taken the disks out of the server and built a NAS for the "community" to enjoy.

Our tape array server has the benefit of having its own "SAN" in the enclosure that is capable of taking many types of drives (FC, SCSI, even SATA). When 300GB drives come off of DAS, they tend to go into the tape array as storage for buffering backups to tape. Couple that with all the servers being on a dedicated back-end network, and this gives us very speedy backups from the perspective of the servers being backed up - the tape writers just write what is in the big buffer.







S.A.M.E, know the beast

Gabriel Paulovic, June 27, 2006 - 12:24 am UTC

S.A.M.E - Stripe and Mirror Everything (SAME)
http://www.oracle.com/technology/deploy/availability/pdf/OOW2000_same_ppt.pdf
http://otn.oracle.com/deploy/performance/pdf/opt_storage_conf.pdf
.. unless you have very special requirements ...

Know the beast: do not believe everything the storage group tells you; understand the beast (SAN) and its limitations ... and yes, you can kill a SAN with one server if you want to, no matter what model it is (I mean it does not matter if it is a $$,$$$ model or a $,$$$,$$$ model)...

... avoid RAID5 if you have a choice ... 
http://www.baarf.com/

Did I mention to stay away from RAID5? :-)

SAN, NAS or Local Attached Storage

David Kay, June 27, 2006 - 8:44 am UTC

Does an Oracle database care where its storage comes from (SAN, NAS or local)? Is it just a case of providing the filesystem (or letting Oracle manage it with ASM) and Oracle will just get on with things? Is performance just down to the performance of the storage, or will Oracle react differently to different storage architectures?

Tom Kyte
June 27, 2006 - 9:34 am UTC

For the most part - storage is storage. There are some features like "HARD" (better bit checking, making sure we catch errors as soon as possible) which we can do with specific types of hardware - but in general, storage is storage.

There is good storage and bad storage of course :) But it is all just "storage"

Every SAN is different

Andrew, June 27, 2006 - 8:46 am UTC

This is a time when you need to sit quietly with your storage admin and talk to (educate) him about your needs. Many storage admins do not understand that the storage needs of a database are _VERY_ different from those of a general purpose file server. You must also seek to understand how your SAN appliance works and its individual characteristics and limitations. For instance, the EMC CLARiiON and EMC Symmetrix, while both very nice systems, are quite different in architecture and capabilities.

You will need many LUNs and multiple paths to the volume manager and volumes, because the different classes of storage in an Oracle database require different kinds of storage service. You will need some very fast IO for undo and temp because they see the most activity. You need fast reads for data files, but the write cache on the SAN should help with the datafile writes. You will need separate physical volume sets (spindles) for your redo log groups and archives. Remember RAID is all about availability, not data protection. I have seen file systems get corrupted, and no level of RAID will protect your data if that happens, so DO NOT think that mirrored file systems do away with the need for multi-volume redo log groups.

Read the links provided, they are quite useful.
Work out your disk plan on paper first
Then add 20%
Then implement on the SAN


SAME updates

Thomas Fox, June 27, 2006 - 8:54 am UTC

I wonder if they are planning an update to:

http://otn.oracle.com/deploy/performance/pdf/opt_storage_conf.pdf

mentioned above, because it lists the fastest disks at 10,000rpm with 75GB maximum capacity. It is a fairly old document. While still useful and very applicable, I am curious about any changes one would make to that document given today's very different storage technology. Also, I'm still a big fan of letting the SAN do the striping, since it was designed to do this more efficiently than software running on a particular host.

Tom Kyte
June 27, 2006 - 9:35 am UTC

The SAME updates would be any paper regarding ASM (Automatic Storage Management) which implements MASE (mirrored stripes)...

Great thread!

Jim, June 27, 2006 - 9:33 am UTC

Tom, I've personally never dealt with RAC but have heard that in some cases, the redo logs would sit on each corresponding node. I know you said put it all on the SAN, which makes perfect sense if the box tanks, amongst other reasons. Would there be any reason for redo on the nodes as opposed to the SAN other than just an "old school of thought" sort of thing?

Gabriel (and anyone else perhaps), in my experience and research, RAID 10 seems to be the overall favorite. Are there any limits to when "enough is enough", though, with regard to how many drives make up the mirror portion? (i.e. 2 is good, 3 is better, ...) I can definitely see how it would come down to how much the data you're protecting is worth. However, I was curious as to how much redundancy, in general, each one of the stripes carries for people that are using this to great success in high availability environments.

Tom Kyte
June 27, 2006 - 9:39 am UTC

the redo logs HAVE to be globally accessible to each node. They have to be on shared storage. In the event of a node failure, another node must be able to perform instance recovery and that necessitates having access to that failed node's redo!



+1 to BAARF and SAME

smartin, June 27, 2006 - 10:39 am UTC

Curious as to why you would ignore the internal drives, as opposed to using them as a multiplex location for redo log members. Put one copy in the SAN/NAS flash recovery area, and put another copy on the local internal hard drive (a sketch of the commands follows the list below).

At least in a non RAC environment, with the idea being:

1) depending on server and disk controller system specifics, could write both redo log members (internal and external) at the same time, using different channels.

2) if SAN/NAS dies, you still have redo on another (safe) location, redo that may not yet be archived. Especially if you are using S.A.M.E / ASM concepts on the SAN/NAS.
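A minimal sketch of that multiplexing idea, with hypothetical paths and group numbers (the existing members are assumed to live in the flash recovery area on the SAN/NAS):

-- add a second member to each redo log group on a local disk
-- (paths and group numbers are illustrative only)
ALTER DATABASE ADD LOGFILE MEMBER '/u01/local_redo/redo01b.log' TO GROUP 1;
ALTER DATABASE ADD LOGFILE MEMBER '/u01/local_redo/redo02b.log' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/u01/local_redo/redo03b.log' TO GROUP 3;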

By the way, is there a real, standards based difference between SAN and NAS? They seem to be used interchangeably so much, and different vendors / systems don't seem to be consistent with the terminology use. I am probably wrong in this, but I view them both as just a collection of disks that are external to a given server and that can be shared by multiple servers and configured in any number of ways.

Oh and in addition to the SAME paper, Millsap has a really good one somewhere in his hotsos library.

Tom Kyte
June 27, 2006 - 11:02 am UTC

1) can be done with san as well no?

2) the entire thing goes? you have the same opportunities for separation.

I just figure if you are going centralized, go centralized.

assorted thoughts

Gabriel Paulovic, June 27, 2006 - 10:42 am UTC

One thing I see with these huge SAN devices is that they create an illusion that they can overcome disk speed limitations ... and sometimes, partly, they can do that with big NVRAM ... so, for example, your redo log writes can be done as soon as they are in NVRAM, eliminating any disk speed impact ... unfortunately, quite often things are set up to work fine on average but not enough attention is paid to handling peak situations ... Disk speed: look at the last 5 years and compare disk size evolution to disk speed evolution - disk size is going up at a much faster rate than disk speed ... with these big SANs the illusion is that you can actually utilize all those disks and still get good performance ... do your math and see what your I/O-per-second (IOPS) requirements are ... and also what kind of I/O will be coming from your database to the SAN (this depends on your application) and what the impact of the SAN architecture is on all the different kinds of I/O ...

Difference between NAS and SAN

Andrew, June 27, 2006 - 11:18 am UTC

NAS = Network Attached Storage. This is where a storage server is attached to the LAN. The storage server then connects to one or more storage (disk) arrays. Thus, all IO goes across the LAN at LAN speeds and impacts/is impacted by LAN traffic.

SAN = Storage Area Network. This is where each server accessing the SAN has one or more HBA (Host Bus Adapter) cards which attach via fibre to a SAN fabric switch. This switch (or switches) directs IO requests to the appropriate storage array and controller. This takes the IO off the LAN and puts it onto a separate network of high-speed fibre protocol and switches dedicated to moving IO requests very quickly.

Different speed disks on one SAN

Charlie B., June 27, 2006 - 1:42 pm UTC

Herodt and Andrew make good points re: speed of disks and what's placed on them. But wouldn't you, in general, have to purchase more disks to do that? If one of your slow-but-large disks failed, you couldn't hotswap it with a fast-but-small disk. And you probably wouldn't want to replace a failed fast-but-small disk with a slow-but-large one. So now you need one or two hot swap disks of each type, increasing your physical disk costs. You'd also have to convince your sysadmin to manage this more-complex arrangement of disks. ("more complex" as in "I already have to keep track of which hosts point to which slices of the SAN, and who's getting shadow copies and where they are - now you want to throw dissimilar disks into the mix?")

I understand RAID5 is supposed to be much slower than RAID1+0, but I was unable to find a benchmark to test this when we were purchasing our most recent SAN. The benchmarks I created resulted in comparable performance for both RAID5 and RAID1+0. (I did 10K inserts, then 5K random updates, then 5K random deletes, if I recall correctly.) I understand that there are issues re: short writes vs. long writes when comparing RAID5 to RAID1+0, but I couldn't replicate them - so we're on RAID5. The BAARF site, while excellent, didn't seem to have any practical benchmark suggestions.

Anyone got any bright ideas?

by the way: SAME methodology continues to work very well for me, been using it for 5 years now. New technologies only strengthen my feelings that it's the best way to go.

Tom Kyte
June 27, 2006 - 2:46 pm UTC

raid5 is great for reads - not so for writes.
raid10 is great for reads AND for writes (simpler, no parity to compute)



Response to Charlie B.

Gabriel Paulovic, June 27, 2006 - 4:57 pm UTC

Are you sure the test was done properly? A SAN can mask a lot of problems (depending on the specific device). But apart from the regular performance, are you prepared to accept a severe performance hit when one disk fails? Do you accept that a 2-disk failure will bring down the whole system? ... etc etc ...

For testing I definitely recommend:
http://www.oracle.com/technology/software/tech/orion/index.html
(sorry, only for Linux and Windows)


share with the class

Steve Montgomerie, June 27, 2006 - 11:39 pm UTC

Can others share their experiences with RAC on the SAN?

Specifically, we are PS HR on Oracle 9.2.0.7 on Windooze 2000 with an EMC CX 600. On this marvelous piece of machinery our production DB, with an SGA 3X that of test (which uses SCSI), runs no faster. It looks like the cache has been set on the SAN by the Dell dudes to optimize for 75% writes / 25% reads. In "Sane SAN" James says to ignore the SAN cache, but I cannot help but wonder: what if the cache was 75% reads? For our environment, from v$filestat we are easily 30:1 reads/writes.
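For reference, a minimal sketch of how that read/write ratio can be computed from v$filestat (a rough, instance-wide number, not necessarily the exact query used above):

-- cumulative physical reads vs. writes across all datafiles since startup
SELECT SUM(phyrds) AS total_reads,
       SUM(phywrts) AS total_writes,
       ROUND(SUM(phyrds) / NULLIF(SUM(phywrts), 0), 1) AS read_write_ratio
FROM   v$filestat;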

To Steve Montgomerie

A reader, June 28, 2006 - 8:00 am UTC

Steve,

I researched EMC's site and while I cannot locate a CX 600 anywhere, I checked out the CX 700. It appears this device can use either ATA or Fibre Channel drives. While ATA is decent, I would choose FC drives to get speed.

I would do some checking on the OS to see if you are waiting on I/O or maybe you are CPU or memory bound.

For Steve M.

Andrew, June 28, 2006 - 9:14 am UTC

An EMC CX 600 is much like a CX 700, but has capacity for half the number of hosts of the 700. This array comes with 8GB of cache, no more and no less. Now out of that 8GB comes some memory for the OS (Windoze) and the vault cache. Of what is left you get IO cache. Now 75% for write and 25% for read is pretty reasonable, because the read cache is used for read-ahead and you probably will not get much use out of that with a database, but write cache is quite useful because that is where your undo and logging get written. Now if you have to go to undo for consistent read and the data is still in cache, then so much the better for your query performance.

The only time you might want to change this ratio is if you are doing so much writing that the write cache is getting flooded and your writes become synchronous, more read cache will not likely be of much benefit.

Thanks to Gabriel

Charlie B., June 28, 2006 - 9:19 am UTC

No, I'm not sure that the test was run properly, but it was what I could dream up in the middle of the installation. I hadn't heard of these ORION benchmarks, I'm sure they'll be very useful in the future. Wish I'd known about them before.

Yes, I pointed out that loss of a single disk will slow down RAID5 (while it speeds up RAID1+0), but in mgmt's opinion disk failures are infrequent enough that they choose not to worry about them. Same attitude I've seen at a couple of other places; it's probably not unusual.

Thanks again!

For Charlie B.

Andrew, June 28, 2006 - 10:00 am UTC

Does your management know that on a CX series disk array the failure of a RAID 5 drive will mean two rebuilds? The first will happen as soon as one disk fails and the array automatically starts a rebuild onto the hot spare. The second will occur when the failed disk is replaced and rebuilds -- no, it does not copy to the replacement disk from the hot spare, and no, the hot spare does not become a permanent RAID member with the replacement disk becoming the new hot spare. Also, a parity RAID rebuild can take days.

Also, just hope that no two disks in a raid set were built in the same batch -- watch out for those sequential serial numbers.

Don't believe everything you read

Peter Schwenzer, June 28, 2006 - 10:55 am UTC

I would just like to add a few words about SAN storage and RAID configurations. We are moving from an EMC SAN to a Pillar SAN (a Larry E. startup). The EMC has FC drives, RAID 0+1 and 2 GB of cache. The Pillar has SATA drives (all RAID 5) and 24 GB of cache. With the new SAN, I can assign priorities to connections. It is possible to give Oracle the highest priority across the database LUNs.

As for speed, Oracle considers everything to be written once it reaches the cache. This puts the actual write in the background, while still giving reads the highest priority over other applications on the SAN. Oracle makes very little use of a read cache, so that is disabled for the database connections. We do use a large amount of write cache (nowhere near the 24 GB available).

Overall, performance on the Pillar is approximately 5X what we had been experiencing on EMC, and the cost is only about 1/4 (due to the SATA drives and RAID 5 configuration). Currently, we are looking into mirroring the RAID 5 arrays. The way the storage is set up, if we lose a drive, only one rebuild is necessary. The new drive then becomes the spare, and does not need to go back into the RAID array as a data disk. When we implement mirroring, we will be able to lose more than half the drives and still remain in business. We are going this direction since this is a health care facility and we cannot put patient info in jeopardy.

Pillar

Gabriel Paulovic, June 28, 2006 - 12:39 pm UTC

I do not have any specific knowledge of Pillar, but ... take ORION, read the setup, make sure you set it up properly and watch your 24GB of cache get exhausted ... of course, this is stress testing; in your normal operation 24GB of cache might cover the inherent bad design of RAID 5 .. and there still WILL be a time when one disk fails and you will suffer ... and when 2 disks fail you will suffer even more and management will be calling for your head .. :-) ... or rather .. :-( ....

Great feedback

Felix, June 28, 2006 - 2:09 pm UTC

Thanks to all the folks for having given such great feedback. With iSCSI SANs now much, much cheaper than traditional FC, are a lot of people converting to or trying iSCSI SANs? I haven't seen much discussion anywhere on Oracle and iSCSI. I will appreciate people's feedback and their experiences with iSCSI.

Very helpful

Felix, June 29, 2006 - 5:14 pm UTC

Tom,
Appreciate yours and the readers' responses. Just wanted to throw this in again, since iSCSI SANs are gaining market share:
"With iSCSI SANs now much, much cheaper than traditional FC, are a lot of people converting to or trying iSCSI SANs? Haven't seen much discussion anywhere on Oracle and iSCSI. Will appreciate people's feedback and their experiences with iSCSI."

Our iSCSI SAN experience

David, July 14, 2006 - 4:14 pm UTC

After much research into a wide variety of SANs, we opted for iSCSI. Because they use slower SATA drives, we did find that there are IOPS limitations under heavy load. Also, we're running Red Hat Linux, and there is no real support for multi-pathing at this time. (Multi-pathing to iSCSI devices is very mature and robust in Windows.)

Being a small shop that doesn't have the luxury of a full-time storage administrator, the ease of use, simple configuration and lower cost (compared to traditional FC storage) of the iSCSI SAN were enough to offset the IOPS and multi-pathing limitations given our current usage. We've been successfully running our production system on the SAN for several months (and a couple of months longer than that in our test environment).

After some trial and error we found that the following command line options for the Oracle ORION test tool were the best way to stress the storage system's IOPS capability:

-run advanced -testname mytest -num_disks 1 -size_small 8 -size_large 1024 -type rand -simulate concat -write 50 -duration 60 -matrix detailed

The OS test file was 4GB.
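To make that concrete, a hedged sketch of how such a run might be invoked end to end - the binary name, test name and target are assumptions, not necessarily what was used here:

# mytest.lun lists the LUN(s) or file(s) to test, one per line
# NOTE: -write 50 issues real writes and will destroy data on raw devices
./orion -run advanced -testname mytest -num_disks 1 -size_small 8 -size_large 1024 -type rand -simulate concat -write 50 -duration 60 -matrix detailed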

There are a number of iSCSI storage vendors out there. For the record we settled on a SAN by EqualLogic

If SAN then do you even consider ASM?

Chuck Forbes, July 24, 2006 - 5:30 pm UTC

So, if you go with the SAME methodology, allowing your SAN to do all the work, does ASM fit in anywhere? I haven't heard any convincing arguments that sway me towards ASM when I can let the hardware on the SAN do the same job. But at the same time, I feel under-educated on ASM - on how it fits into a modern-day storage framework (i.e. SAN RAID 1+0).

-cf

Tom Kyte
July 24, 2006 - 6:37 pm UTC

sure, why deal with files and file systems when you don't need to?

asm is a "database file system", give me the LUNS.

Question to SAME

A reader, July 25, 2006 - 10:41 pm UTC

Hi Tom and guys

I read the SAME document, but have some questions.
SAME always says to put the logs and busy files on the outer half of the disk, but how?
SAME says 10,000 RPM = 3ms - how is that calculated?
SAME says the transfer rate varies from 12MB/s to 20MB/s - how is that calculated?

Thanks

Tom Kyte
July 26, 2006 - 10:35 am UTC

don't worry about putting things on the "outside", "inside", same side, middle side, upside down. That is going a bit far 99.999999999999999999% of the time.


SAME doesn't "say" 10k rpm = 3ms, it is quoting some manufactures literature (it is a white paper...)

transfer rates vary more than that now. If you want to know the capabilities of YOUR disks, you would read the box they came in or read the specifications for them on the vendor's web site.

ASM Continued

Chuck Forbes, July 26, 2006 - 10:33 am UTC

So, are you suggesting that ASM should just manage the files, but the SAN should be responsible for striping & mirroring? I guess that's where I am not completely informed on where ASM fits in.

Tom Kyte
July 26, 2006 - 11:37 am UTC

it is perfectly ok to stripe a stripe (guess what extents are - they are stripes).

You don't have to use ASM for redundancy.

ASM

Chuck Forbes, July 26, 2006 - 12:06 pm UTC

Not to beat a dead horse...

I'm a new DBA (~3 years), and so storage concepts are even newer. With that in mind, in approaching ASM, it seems like we'd get the best performance in allowing the SAN to stripe & mirror, and even avoid ASM for file management. Allowing ASM to stripe a stripe, or even to intervene at all, would just add to the overhead, right?

BTW, thanks for your comments on this.
-cf

Tom Kyte
July 26, 2006 - 1:30 pm UTC

the file system you would be building on top of that would be adding yet another layer of management.

look at segments - they are made up of extents - that are made up of blocks.


we can either do that directly to disk (asm) or you could build a conventional file system which manages files that are - collections of extents - that are collections of pages.

A file system adds yet another layer. ASM is more direct to disk. Benefits of raw without the "scary" raw word.

Question to SAME

A reader, July 26, 2006 - 7:50 pm UTC

Hi TOM

Before I got your answer, I did believe that the outer-edge theory could really improve performance a lot, and I think this point is the most important one in SAME, but after reading your answer I am totally lost - why did you ask me to forget about which side is which?
For 10k rpm, I think it's quite easy to calculate the latency per revolution, right? It should be 60/10000 = 6ms, so I think if SAME said it's 3ms, there must be something behind that.

Thanks TOM!!

Tom Kyte
July 26, 2006 - 8:07 pm UTC

You are going way too deep. You are not going to be placing bits on disks in particular places. It just isn't realistic.

Read the specs on the disk, that'll tell you the spin rate, the average seek times, etc.

And SAME doesn't "say" anything. SAME stands for stripand mirror everything. It is an acronym.

someone wrote a white paper (a while ago, it sounds like - look at the date on the article) about SAME. SAME isn't saying "disks spin at this rate", someone wrote a paper, and they said some stuff.

S.A.M.E - doubt

whizkid420, July 31, 2006 - 10:16 am UTC

Hi Tom,

According to S.A.M.E methodology, we have to place frequently accessed data on the outer half of disk drives.

Just how do we do this? We are designing our storage for the new SAN box and are not sure how to implement the above point, since we don't have any control over where data is placed on a disk if all the disks are striped and mirrored.

Looking forward to your help..

THanks..

Tom Kyte
July 31, 2006 - 10:26 am UTC

it is quite simply "not realistic"

You would basically have to partition each disk and only use part of it.

It is quite simply "not realistic"

A micro optimization that just isn't going to be practical.

outer disk...

Mark, July 31, 2006 - 1:16 pm UTC

If the figures in the SAME document are realistic, there could be a fairly significant performance increase in using the outer half for frequently accessed data.

I mean, it would definitely be overkill to collect stats about hundreds of tables, and handpick which go into which partitions. Then the stats change over time and you redo the whole thing. That would be silly.

But say you place archived data - perhaps big lobs that are occasionally retrieved - in the inner partition; your 1000 queries/sec application data in the outer. Doesn't seem like a bad idea to me...

Just my opinion,
Mark

Tom Kyte
July 31, 2006 - 2:45 pm UTC

it just isn't realistic if you ask me (given the size of disk arrays these days), but if someone wants to benchmark it and lay out the process - go for it :)

ASM vs. using specific disk requirements

A reader, August 03, 2006 - 7:51 am UTC

Does anyone have an opinion on, or experience with, using ASM in place of designating specific disks to redo, temp, archivelogs, etc.?

I'm just wondering if it's necessary to have a disk group consisting of a mixture of small/fast and large/slow disks if you use ASM or if you can just use middle-of-the-road disks in the group and let ASM handle different I/O loads when needed.

Tom Kyte
August 03, 2006 - 9:41 am UTC

why would it have to be "in place of"

with ASM you can designate specific disks to redo, temp, archivelogs, etc

you would make the same decisions you would make with <file system driver of your choice here>

Platter Placement of Luns

Terry, August 03, 2006 - 10:17 am UTC

Well, in a general sense, platter placement of LUNs has seemed quite impractical until now. Pillar Data Systems' Axiom storage array implements something called QoS (Quality of Service), which uses both platter placement and queue management to establish the QoS. You basically decide whether you want a high, medium, low or archive class of service, and you provide some information about the data access (read, write, mixed, random, sequential etc.). It can be seen at pillardata.com and probably at Oracle Open World this year.

Tom Kyte
August 03, 2006 - 10:25 am UTC

Right, but we humans are not doing this slicing and dicing of 5000 spinning disks. It is a "feature" of a piece of hardware...

"fast=true -> less storage available"

but, it is a "feature of the disk array", nothing you and I really do.

To trust your SAN

Robert Lockard, August 03, 2006 - 4:07 pm UTC

In our environment we use a number of SANs. Our Unix/storage group touted that these wonderful pieces of equipment would never go down. And I am here to tell you, we have lost one of our SANs completely. The first time it took the vendor a week to repair the SAN, the second time it took the vendor a week to repair the SAN, and the third time we lost our SAN it took the vendor ... a week to repair the SAN. We have silver maintenance from the vendor and that gives us 4-hour service.

I say this because if you are smart you will have a DR site with a standby database. It was our standby database that got us through all three of these incidents. Your mantra should be stable and recoverable. So, if your Unix/storage group is telling you these wonderful pieces of equipment will never go down, think again and get a DR site up.

Tom Kyte
August 03, 2006 - 5:04 pm UTC

no one can ever say "it will not go down" - that would just be "silly" to say.

Consider this one word:

FIRE!

oh, there goes the san, and the servers, and the cables, and the monitors and the mice and.....


There are lots of "one words" - fire, flood, hurricane, tornado, tsunami, explosion.

Hetal, August 25, 2006 - 12:39 am UTC

I've an issue related to Oracle 10g RAC.

I've a 2-node cluster, each node being a Dell 2850 server with RHEL 4.0.

I've EMC CX300 SAN storage with the following partitions:

/orasoft 10 Gb OCFS2 File system

/oracrs 2 Gb OCFS2 File system

/orabackup 100 Gb OCFS2 File system

The datafiles are on ASM, which is not directly visible in the OS.

I've a common Oracle Home installed in /orasoft/db_1 which is shared by both nodes in the cluster.

I've faced an issue recently related to the EMC storage.

The /orasoft partition shows 1.4 Gb of space available using the df command.

With both nodes sharing the common Oracle Home (/orasoft/db_1), whenever I try to touch a file I get an error: No space left on device. I'm unable to start any service for the same reason.

Is this setup correct ??

Can You help me with this storage issue ??



SAN Troubles

A reader, August 28, 2006 - 2:52 am UTC

I'm not sure I'm a fan of SANs, per say.

Currently my bulk load is running 10 times slower than normal because we have a SAN shared between different business units.

These business units are totally unrelated to us, yet they are sucking the performance out of our system.

It's probably more of an issue in terms of our SAN infrastructure, but these are some of the potential issues with SANs when they promise to deliver so much and are set up as the one-for-all solution.

Tom Kyte
August 28, 2006 - 9:11 am UTC

http://www.urbandictionary.com/define.php?term=per+say

you have a problem with your SAN setup is all. There are potential issues with all technology when it is improperly implemented.

If this was correctly partitioned, your different business units would not be contending with each other.

LUN size different on SAN for recovery

sajnish, October 06, 2006 - 2:53 pm UTC

Hi Tom,

We are running Oracle 9iR2 and are planning to recover from a SAN where the LUN was 16GB to a 64GB LUN. The online backup was taken on the 16GB LUN and now we would like to restore to the 64GB LUN, then recover with redo logs and then apply archive logs. Would it cause any issues because of the different block sizes? The DB block size is the same.

Tom Kyte
October 06, 2006 - 3:12 pm UTC

different block sizes? where?

but basically "no" you are dealing with good old fashioned "files"

Different LUN sizes

sajnish, October 06, 2006 - 3:36 pm UTC

The difference would be in OS block sizes; we are going from LVM -> VxVM (HP-UX 11i PA-RISC to Itanium). Still, this would not make any difference when applying redo logs?

Tom Kyte
October 06, 2006 - 3:37 pm UTC

well, you mentioned "16gb and 64gb" which would not imply block changes necessarily.

The redo will be fine. They are "just files", put em on NFS, SMB, SAN, NAS, scsi, ide, eide, ext2, ext3, ASM, whatever.

Oracle Software installation on local drive instead of Shared drive

Vijaya, November 06, 2006 - 5:55 pm UTC

Hello Tom,

I am looking for the reason why Oracle says to install the software and init.ora on a local drive instead of a shared drive in a hardware cluster configuration. We do not have RAC installed in our environment. The software, datafiles, control files and redo logs are in /u01 on a shared drive for the 2-node cluster.

I would like to be able to convince management why we need to install the software on the local drive of each node.

Vijaya

Tom Kyte
November 07, 2006 - 4:28 pm UTC

why can't you

and I don't believe you'll really find it "convenient", patching will be an obvious issue.

You are using OCFS or something like that right - you wouldn't put a single point of failure like an NFS mount in there would you?

software installation on local drive instead of shared drive

vijaya, November 08, 2006 - 3:15 pm UTC

We are not using OCFS. The servers are clustered, with Red Hat cluster software installed. The /u01 disk is on a shared drive on the SAN. The startup and shutdown scripts are in the home directory of the oracle user, and they start the database during failover.
My question is: in a RAC configuration, the Oracle software (i.e. the Oracle Home) is installed on each node. We don't have RAC, though - so why are we installing the Oracle Home on a local disk?

Tom Kyte
November 08, 2006 - 8:08 pm UTC

you tell me perhaps? I am not the one stuffing the install CD into the cdrom and running the installer??

A reader, November 09, 2006 - 12:11 pm UTC

>> Why we are installing oracle home on local disk?

This would allow you to apply upgrades and patches with no or minimum database outage. If you share the Oracle home, both instances may have to go down when you apply patches or upgrades.

Pre-allocate

HK, November 14, 2006 - 10:06 am UTC

Tom,

Are there any advantages in pre-allocating space to frequently extended tablespaces, rather than allowing Oracle to allocate the space as and when it is needed?

Regards,

HK

Tom Kyte
November 15, 2006 - 6:32 am UTC

sure, you can pay the performance price of allocating the storage right now, today, when the system is not very busy - rather than paying that price later when it is.

Also, you KNOW it is there and dedicated for that tablespace, with a dynamic allocation - the space may or may not exist in the future.
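For illustration, a minimal sketch of the two approaches - the file name and sizes are hypothetical:

-- pre-allocate the space now, while the system is quiet
ALTER DATABASE DATAFILE '/u02/oradata/prod/data01.dbf' RESIZE 20G;

-- versus letting Oracle grow the file on demand, during business hours
ALTER DATABASE DATAFILE '/u02/oradata/prod/data01.dbf' AUTOEXTEND ON NEXT 100M MAXSIZE 20G;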

disk size and performance

a reader, November 17, 2006 - 9:07 pm UTC

Hi, Tom:

Currently our only ASM disk in the datafile diskgroup is almost filled up, so we're going to add another ASM disk to this disk group. If the new ASM disk is of a different size from the existing one, will there be any negative impact on database operation, especially on performance?

Thanks.

How exactly implement my SAN storage

esiole, April 24, 2007 - 8:35 am UTC

I work with IBM storage systems such as the DS4200, DS4700 or even DS4800.
So when I install my DS4x00, I have to put the disks into an array (RAID 5 or 10) and define LUNs, or let them be accessed directly by the server (as internal disks).

So if I want to use ASM, what is the best method? Define RAID & LUNs (logical use) or use the physical hard drives?

Sorry for my English, I hope my question is understandable!
Tom Kyte
April 24, 2007 - 11:28 am UTC

there is no best method, there are only choices.

do you want the hardware to do the mirroring, or do you want ASM to do that. You choose.

RAC 10R2 migration from SAN to SAN

Baqir Hussain, July 17, 2007 - 12:40 pm UTC

A 3-node (Linux Opteron x86_64) RAC is configured on an EMC SAN CLARiiON CX-400 with ASM. Oracle binaries are installed on local drives.
Now we have a new SAN, a CLARiiON CX-10, in a different location on a different subnet (IP address). All the appropriate LUNs have been created on this SAN. The 3 servers attached to the CLARiiON CX-400 will be used for the new SAN CLARiiON CX-10.

I would like to know how I can move the Oracle data (voting disk, OCR, oracleasm disks etc.) from the CLARiiON CX-400 to the CLARiiON CX-10?
Please advise.

Thanks


Tom Kyte
July 17, 2007 - 1:13 pm UTC

when you do your test restores, what process do you use to verify you can do that?

use the same process.

Shailesh, September 11, 2007 - 11:17 am UTC

Hi,

You mentioned <asm is a "database file system", give me the LUNS> above.
Does that mean that the real advantage of using ASM comes when you disable RAID at the hardware level?

Consider the following -
The server has 500GB (say 50GB * 10 disks). I want to keep 300GB for datafiles. The remaining 200GB is used by the OS and Oracle binaries on an LVM partition.
The remaining 300GB is available as 50GB * 6 disks.
1. If I implement ASM on the 300GB of space, will it stripe and mirror datafiles across these 6 disks?
2. Does that mean we don't need to have LVM over these 6 disks?
3. If we don't use LVM for these 6 disks, then how is the statement "give me the LUNS" true for datafiles?


Shailesh
Tom Kyte
September 15, 2007 - 3:44 pm UTC

you can still have raid on at the hardware level.

1) yes.
2) yes.
3) don't know what you mean. why do you see a conflict there. We do not need a logical volume manager, you can use one - but you need not.

disk, nas, san... speed

dario, February 08, 2008 - 10:42 am UTC

(Not sure if this should be a new question...)
Could you provide (if it makes sense) some script or guidance for making one, in order to test storage performance?

The goal is to be able to measure the impact of storage changes on a database. ("Why did you recommend buying this zillion-dollar storage? It isn't any faster than my VMware to me...")

I have tried to do a basic test. (Yes... I thought it could work... really.)
create table bla nologging tablespace beta as select /*+ NOCACHE */ * from large_table_in_diff_tspace;

But the results were poor compared to a simple 'cp'. There seem to be (well... obviously...) many layers processing the data...
1) Should I try a direct-path export?
2) Do a parallel query?
3) Pre-create the table?
4) Is it useless and nonsense?
5) Should I hit my instance with several queries and read the AWR carefully?

Thank you.
"Give it to us raw and wriggling" Gollum.

Beyond SAN (and disks even)

Andrew, February 10, 2008 - 10:49 pm UTC

Here's something which looks like a potential SAN killer:

http://www.fusionio.com/faq.html

If this product does deliver what it promises, I can see that Oracle should have a new flavor of its RDBMS especially for flash-based systems, where small vs big I/O requests no longer matter (basically cutting down a lot of the existing optimizations for disks).


datafiles on ASM,

A reader, September 15, 2009 - 11:20 am UTC

We are currently in the process of migrating 14 TB of data from a single-instance (SI) database to a multi-node RAC. The database in the single instance is on a file system; on RAC it is SAN storage. We have ASM instances running on each node of the RAC cluster to manage the datafiles.

The size of all the datafiles on the SI database is 14 TB (sum of bytes from v$datafile); however, the actual size of the data is 10 TB (sum of bytes from dba_segments). Some of the tablespaces are 100% full and the DBAs are adding more datafiles to the tablespaces that are full. I think the 4 TB of free space is spread across most of the datafiles, but we are forced to add datafiles for certain tablespaces that are 100% full without being able to leverage the free space in the other datafiles.

After we migrate to the ASM (we have alloted only 14 TB of space on ASM), can those tablespaces which are almost 100% full make use of the space within (4 TB of free space)?

Could you please point out some inner details on how ASM works with regard to space allocation for datafiles.

Thanks,

Tom Kyte
September 15, 2009 - 12:59 pm UTC

... After we migrate to the ASM (we have alloted only 14 TB of space on ASM), can
those tablespaces which are almost 100% full make use of the space within (4 TB
of free space)?

...

i'm not entirely sure what you mean by that - but - why not create the tablespaces "smaller than you think you need" and let them autoextend. After the move, all tablespaces will be 100% and all free space on devices in a diskgroup will be usable by any of the tablespaces in that diskgroup.


... Could you please point out some inner details on how ASM works with regard to
space allocation for datafiles.
...

just like a filesystem - because it is. You create a diskgroup (mount point), it has X units of space. You create datafiles in that diskgroup and they can use in total up to X units of space - and then you can add more devices to the diskgroup - expanding the storage available.

Or if the datafiles take less than X units of space, you can drop a device from the diskgroup and add it to another.

just think of a diskgroup as a mount point.

ASM

A reader, September 15, 2009 - 2:58 pm UTC

Let me put my question in a different way:

In the 14 TB database which sits on file systems, we have 3-4 different tablespaces which are 100% full. There is no room in the filesystem (mount) to grow further. In this situation we add more datafiles (which sit on a different mount) to the tablespace.

In the same situation, but on ASM storage (we are allotting only 14 TB of raw devices to the ASM storage), does Oracle make use of the 4 TB of free space for new data, or do we need to add more disks to the ASM storage?

Thanks,

Tom Kyte
September 15, 2009 - 3:04 pm UTC

ASM is not any different from the file system.

in ASM you will create disk groups
disk groups are just like your mount points are right now.


in ASM you can grow and shrink the disk groups at will. If you wanted to you could take your paragraph:


In the 14 TB database which sits on ASM, we have 3-4 different
tablespaces which are 100% full. There is no room in the diskgroup (mount) to grow further. In this situation we add more datafiles (which sits on another disk group) to the tablespace.


or you could write it as:


In the 14 TB database which sits on ASM, we have 3-4 different
tablespaces which are 100% full. There is no room in the diskgroup (mount) to grow further. In this situation we drop a device from some other diskgroup with lots of free space - or we just had a device sitting unused - or we just added more new devices and add that device to our diskgroup, letting it grow and letting the datafiles on it grow (autoextend or resize)

ASM

A reader, September 15, 2009 - 3:40 pm UTC

So the bottom line on my issue is: currently we get ORA errors when a tablespace is full, and after migrating to ASM we may still get the same errors.

On a file system, we add datafiles borrowing space from different mounts; in ASM we need to add more disks to the diskgroup.

So how do I leverage the 4 TB of free space? This is a lot to go unused.

Thanks,
-Shiva

Tom Kyte
September 15, 2009 - 4:33 pm UTC

if you run out of space, you run out of space - yes, it doesn't matter what the file system is - you run out of space and bam, it'll stop.


... So how I leverage the 4 TB of free space? This is a lot to go unused. ...

you tell me??? I suggested

a) put devices into a diskgroup
b) create small tablespaces, let their datafiles autoextend with a reasonable next size


and all space will be utilized and there will be "free" space in the diskgroup that isn't yet used. I'm not sure why this "4tb" is going to be a problem

unless you create lots of diskgroups (which cannot share space), in which case you are going to have to do careful planning to figure out how much disk to give each one.

ASM - Very useful tips,

A reader, September 15, 2009 - 5:25 pm UTC

I am going to have one diskgroup for all the Oracle datafiles (and one each for archivelog files and redo log files). Since this is going to be a migration from a single-instance file system to a 3-node RAC on ASM storage, I don't understand your 2nd suggestion.

Thanks a lot.

Tom Kyte
September 16, 2009 - 8:47 am UTC

when you say "I don't understand your 2nd suggestion", are you talking about:


a) put devices into a diskgroup
b) create small tablespaces, let their datafiles autoextend with a reasonable next size


All I'm saying there is "one diskgroup, has all storage" - that is (a)

And then "for each tablespace you want, create it SMALL, much smaller than you know it will actually be after you move. Permit these tablespaces to have datafiles that autoextend (grow automagically) with a reasonable next size (couple of megabytes)"

if you do that, when you are done you will have a diskgroup that has as much free space as possible, and every tablespace will be just about 100% full. Exactly what you want - no wastage, no over-allocations anywhere.
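A minimal sketch of that approach, assuming a diskgroup named +DATA (names and sizes are hypothetical):

-- start small and let the datafile grow in reasonable increments
CREATE TABLESPACE app_data
  DATAFILE '+DATA' SIZE 100M
  AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED;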

Create equi-sized LUNs

Pasko, September 16, 2009 - 4:59 am UTC

Hi Shiva,

For your Question:
So how I leverage the 4 TB of free space? This is a lot to go unused.

Like Tom told you, you have to ask your SAN admin to partition that 14 TB into a bunch of equi-sized LUNs.
Then you would create different ASM disk groups (say DG_DATA for datafiles and DG_FRA for the Flash Recovery Area) and then assign the LUNs to the disk groups.

We have RAC on SuSE Linux, 10gR2 10.2.0.4, using an EMC DMX-3 SAN with ASM, and also EMC PowerPath multipathing software.

The LUNs from the SAN are bound through Linux raw devices.
We have different sizes of LUNs; some are 25 GB and some are 75 GB. Total capacity is around 2 TB. Best practice for ASM is to put equi-sized LUNs in a disk group.

Current Config.
---------------
SAN-->LUNS-->Linux-raw-devices-->Powerpath Device-->ASM-->Disk-Groups-->Tablespaces

So far we haven't had any problems with this configuration.
All the disk groups can be shared by all databases in the cluster.
Configure enough LUNs and leave some LUNs as spares, just in case you need to add more space to a tablespace, because adding a disk (LUN) to a disk group can be done online. You can also move disks (LUNs) from one disk group to another ONLINE.
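For illustration, a minimal sketch of that kind of setup - the disk group name and device paths are hypothetical:

-- create a disk group from equi-sized LUNs; external redundancy
-- because the SAN is doing the mirroring
CREATE DISKGROUP dg_data EXTERNAL REDUNDANCY
  DISK '/dev/raw/raw1', '/dev/raw/raw2', '/dev/raw/raw3';

-- later, add a spare LUN online; ASM rebalances automatically
ALTER DISKGROUP dg_data ADD DISK '/dev/raw/raw4';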

One thing which we could not do online: when we attached new LUNs to the server, we had to restart the nodes.

One important consideration for RAC in SAN environments is to have a balanced system.
That means you have to balance the speed of the various components: CPU speed (MB/s), HBA speed (MB/s), SAN switch speed, interconnect speed - and you also have to consider the number of spindles. Remember, the weakest link in your configuration caps the highest throughput you can have.
http://blogs.oracle.com/AlejandroVargas/gems/LinuxRACASMForumSep4.pdf


Question:
Is it true that using ORION would erase data on the disks?
Or can I use ORION on a running production system?




Tom Kyte
September 16, 2009 - 4:13 pm UTC

... Is it true that using ORION would erase Data on the Disks? ...

only if you want it to. there is a write parameter that would control that.

... Or,can i use ORION on a running Production System? ...

that would be a bad idea, you'd want a maintenance window and a full backup ready.

ASM

A reader, September 16, 2009 - 11:05 am UTC

Thanks for sharing the information.

In our environment, the storage team is presenting us with equi-sized LUNs (1.4 TB times 10). We are in good shape on the hardware setup.

Let me explain the situation in our end:

Database DB1 is a single instance and the datafiles are stored on file systems (various mounts). There are 3900 datafiles.

We need to migrate the database from single instance to 3-node RAC. We are using ASM storage (using the 10 LUNS mentioned above).

DB1 is backed up (hot backup) and all the required log files are sent to tape. So the tape will contain the controlfile and all 3900 datafiles, along with a bunch of archivelog files. The total size of all 3900 datafiles is 14 TB.

We restore the contents of the tape onto a temporary file system. We bring up the DB pointing to the temporary file system. The database gets opened after applying the archivelog files.

Now, using RMAN, we migrate all the datafiles from the temporary file system to ASM. This migration copies the datafiles (as they are) to ASM and we open the database. Now my diskgroup size is 14.2 TB.
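For reference, a hedged sketch of one common RMAN way to do that copy into ASM - the diskgroup name is hypothetical and this is only the broad shape of the move, not necessarily the exact procedure used here:

# with the database mounted
RMAN> BACKUP AS COPY DATABASE FORMAT '+DATA';
RMAN> SWITCH DATABASE TO COPY;

Control files, online redo logs and temp files are relocated in separate steps.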

Tom mentioned creating small tablespaces on ASM storage and allowing them to grow based on need, so that the 4 TB of free space can be used. In the above-mentioned method, I don't understand at what stage the small tablespaces come into the picture.

Thanks for your time.

-Shiva


Tom Kyte
September 16, 2009 - 5:32 pm UTC

you didn't give us this bit of detail at all:

.. Now, using RMAN we migrate all the datafiles from temporary file system to ASM. ...

I thought, for whatever reason, you were loading raw data.

You will have what you have after you are done, nothing more, nothing less.

I don't know what the "4tb" is anymore here - was that the sum of bits of space on many many mount points? free space in each datafile?


but in any case, you know now that ASM is just a filesystem - you would use the same "math" you would normally use to figure out what space will be

ASM datafiles,

A reader, September 16, 2009 - 6:44 pm UTC

You are right. The 4 TB is the sum of the bits of free space in datafiles located on many mount points.

About shrinking the datafile: If the size of the datafile is 1 GB and it contains 550 GB of data in that, I want to reclaim the 400 GB by shrinking the datafile to 600 GB.

In file system, I couldn't shrink the datafile if a small piece of data is residing at the end of the file. In ASM storage, for the same situation can the datafile be shrinked?

Thanks,
-Shiva

Tom Kyte
September 16, 2009 - 7:56 pm UTC

... If the size of the datafile is 1 GB and it
contains 550 GB of data in that, ...

how do you do that :)


... In ASM storage, for the same situation can
the datafile be shrinked?
...

ASM is a file system, extents are still extents, extents still have a location in a datafile. A datafile is a datafile - in ASM or a filesystem.

If you could not shrink it on a 'filesystem', you will not be able to shrink it on a filesystem (ASM)
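To see how far a given file could be resized down, a minimal sketch of the usual check against dba_extents - the file number is hypothetical:

-- highest used block in file 7; the file cannot be resized below roughly
-- (max_used_block * db_block_size) bytes without reorganizing the segments
SELECT MAX(block_id + blocks - 1) AS max_used_block
FROM   dba_extents
WHERE  file_id = 7;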

Outer part of Disk - inbuilt functionality in ASM 11g

Jagjeet Singh, September 17, 2009 - 12:06 am UTC

It seems you are not convinced by the approach of using the outer part of the disks when creating partitions/LUNs.

But in 11gR2, ASM provides hot/cold region functionality where you can specify which data is to be stored on the outer part of the disk.

Regards,
Jagjeet Singh
Tom Kyte
September 17, 2009 - 9:04 am UTC

... It seems somewhat you are not agreed/convinced with approach of using outer part of disks while creating Partitions/Luns. ...

where did you read that? reference please


I did say it was a micro optimization that if you had to do it manually would be so tedious as to make it not realistic to do. If you had so many disks that this would make a difference - doing it manually would be really hard. If you have so few disks that doing it manually is ok, you have a tiny system and it wouldn't matter.

If you have something that can automate it and provide that incremental improvement - sure. On our TPC benchmarks we many times just use a small percentage of the disk, just the outside, to eke out a tiny bit more. But in real life, most people cannot do that.


With ASM what they would do is have the concept of a hot and a cold portion of the disk.

The hot portion would be used for database files
The cold portion would be used for backups, staging, archiving, whatever

so it all gets used and it is more or less transparent and automatic.

Did you count for Temp?

Pasko, September 17, 2009 - 4:15 am UTC

Hi Shiva,

You said you have 4TB of free space because you computed the sum of bytes from dba_segments.
Note that TEMP space is not included in dba_segments, so could it be that some of this 4TB might be TEMP?



ASM,

A reader, September 17, 2009 - 10:31 am UTC

Now I am kind of debating with myself the advantages of having ASM (over a filesystem). I know the IO is good, and we can add or remove disks from a diskgroup without bringing down the database, but the age-old requirement of resizing a datafile, whatever its layout, should have been an option.

If my datafile cannot be resized (no free space at the end of the file), this is a drawback in Oracle. All I need is a 1 GB file to be resized to 600 MB (sorry for putting it as GB in my previous post), and if that doesn't happen just because one block of data is sitting at the end of the file, it is not a convenient way for the business. I don't know how big a deal it is for Oracle to make this happen, but I hope someday I could do that.

To answer Pasko's question: 4TB of temp space!! That would be too large.

-Shiva


Tom Kyte
September 17, 2009 - 1:35 pm UTC

... If my datafile cannot be resized (no free space at the end of the file), this
is a drawback in Oracle. ...

that is and will be called "a reorganization"

you understand that the ROWIDS HAVE TO CHANGE - that impacts a lot of stuff.

I would ask you "why do you need to constantly resize" - it could only be because you have tables you frequently truncate or drop - if so - isolate them, put them into a tablespace. Then - you will never need to shrink

Db in only one filesystem in the SAN

Linda, March 19, 2010 - 1:22 pm UTC

I want to know if it can be considered a best practice to put all the database files (redo, control files, data files and indexes) all in one filesystem.
The system admin told me that the filesystem resides on the SAN, and inside the SAN there are several disks, so I should put it all in that one big filesystem.

Tom Kyte
March 19, 2010 - 2:47 pm UTC

and the SAN man probably shared that with dozens of other people to make it easier for the SAN man...

So when the Microsoft exchange backup on the other server kicks in, and your database physical IO's go up and up and up in time, it'll be just dandy.

You will get average to less than average out of this configuration on a good day - unless you have average to less than average needs in which case you might be OK.

It depends on your IO needs. Right now, you'll be average.

Exadata vs SAN

A reader, March 21, 2010 - 11:54 am UTC

Hi Tom,

How do you make a comparison between a SAN and Exadata? After reading up on Exadata, I have come to know the following about it:

1. It uses flash drives, and is therefore faster than a SAN.
2. It supports parallel query optimization.
3. It has advanced compression.

I am evaluating Exadata for warehousing purposes under very heavy loads, which require parallel query and compression, but I am not able to understand how parallel query will work with Exadata.

The documentation says that it processes the parallel query close to where the data resides on disk. So, for example, if I use DOP 16 for a select, then a QC is created and slaves (16, I think) are created. These slaves reside on the database server, so how will Exadata optimize parallel processing? I cannot understand.


Tom Kyte
March 22, 2010 - 8:56 am UTC

Exadata is a SAN

A SAN that understands it is serving up Oracle database blocks. A SAN that knows how to process a where clause. A SAN that knows how to filter and group and do regular expressions and date comparisons and compression and decompression and encrypt/decrypt operations - in a massively parallel fashion.

A SAN from "some vendor" is designed to be a very general purpose file system. Nothing more, nothing less.

Exadata is designed to be a hardware extension of the database itself - with a lot of database functionality pushed all of the way down to the disk.


To full scan on exadata in parallel, you don't need 'slaves' on the server, you have a set of disks (an exadata cell) with a computer in front of it, with cpus, with memory, with software. When you add storage to an exadata system - you are not just adding disk, you are adding a 'cell' which has disk + CPU + memory + software. So, to do a massively parallel full scan - you just submit read requests to the exadata cells and they (in parallel, massively parallel, shared nothing) read the data, process the blocks, find the data of interest (doing the decryption, decompression, where clause processing, etc) and return to the server machine JUST the rows and columns of interest.


It is not comparable to a conventional SAN, a conventional SAN cannot do 99% of what Exadata is doing data wise.

check this out:

http://kevinclosson.wordpress.com/2008/09/26/oracle-exadata-storage-server-part-iii-a-faq-is-born/



SAN

John Fak, March 23, 2010 - 9:50 am UTC

SAN design is complex - hence the highly paid (and warranted) system architects. Most of the time it's over-spec'd FC with a large cache because of this very fact.

From an Oracle perspective, SAME/MASE works - it's easiest, it incorporates the most disks, and it is flexible.

How many DBAs out there know how many disks are behind the LUN their volumes sit on, and what its IO capability/threshold is? Probably very few.

Your best bet is to configure Oracle first for best IO throughput, monitor caching and service times, then drill down with the assistance of your SA and SAN people :)

eg

Use RAW, ASM or ODM (VxFS).
Direct I/O and Quick I/O also help if you don't want raw/ASM.
Remember UFS and VxFS etc. have different mechanics for direct I/O and locking.

Then monitor your service times (v$event_histogram, v$iostat, AWR reports) as well as doing filesystem-level monitoring (iostat, vxstat etc.).
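For example, a minimal sketch of that kind of check against v$event_histogram (standard wait event names; treat this as a starting point, not a recipe):

-- distribution of I/O wait times: on healthy storage most waits should land
-- in the low-millisecond buckets
select event, wait_time_milli, wait_count
  from v$event_histogram
 where event in ('db file sequential read', 'db file scattered read',
                 'log file parallel write')
 order by event, wait_time_milli;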

Beyond that, have discussions with your sysadmin and storage team to understand which applications share your 'pathing' and, essentially, your SAN cache and disks/controllers :)

John

one more thing

John Fak, March 23, 2010 - 9:54 am UTC

oh - and something I still see very commonly :(

don't configure a 2GB SGA on a system that has 30GB of RAM and uses ASM or direct I/O :)

RAM is both expensive and amazingly useful :) And unless you like caching everything outside of the database, leaving most of it unused is not great value for your performance.
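As a minimal sketch of what that means in practice - the 24G figure below is purely illustrative for a 30GB host, not a recommendation:

-- see how much of the host's RAM the instance is actually allowed to use
show parameter sga_target
show parameter sga_max_size

-- illustrative only: give the instance a bigger share of a 30GB host
-- (sga_max_size is not dynamic, so this takes effect after a restart)
alter system set sga_max_size = 24G scope=spfile;
alter system set sga_target   = 24G scope=spfile;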

Confused About Raid Advise

Jim Cox, August 13, 2010 - 12:11 pm UTC

Hi Tom

Have read several articles on RAID recommendations and just when I think I understand, I find a contradiction. I have Oracle 10.2.0.4 on a Windows 2003 R2 OS with an HP SAN that allows for RAID 0, 1, 3, 5 and 6 only.
Oracle states that everything BUT .dbf (datafiles) should go on RAID 5 (redo, archive). Datafiles go on RAID 1. The UNDO tablespace datafile I am not sure about, as it is a .dbf file, but one Oracle note (38281.1, item 12) says rollback segments go on RAID 1.
I am assuming UNDO means rollback segments in that note.
Appreciate your comments

Thanks
Jim
Tom Kyte
August 19, 2010 - 12:39 am UTC

where does Oracle 'state' that?


Oracle Notes Regarding RAID

Jim Cox, August 19, 2010 - 4:14 pm UTC

Hi Tom

some of the oracle Notes that I have been reviewing are

Note 97597.1 Item 5.

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome(page=KBHome&id=( )),(page=KBNavigator&id=(from=BOOKMARK&bmDocTitle=Optimizing%20RAID%20Performance%20for%20Oracle%20RDBMS%20on%20Windows%20NT/2000&bmDocType=BULLETIN&bmDocID=97597.1&bmDocDsrc=KB&viewingMode=1143))

Note 30386.1 Item 3.

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome(page=KBHome&id=( )),(page=KBNavigator&id=(from=BOOKMARK&bmDocTitle=I/O%20Tuning%20with%20Different%20RAID%20Configurations&bmDocType=TROUBLESHOOTING&bmDocID=30286.1&bmDocDsrc=KB&viewingMode=1143))

Note 38281.1 Item 12.

https://support.oracle.com/CSP/ui/flash.html#tab=KBHome(page=KBHome&id=( )),(page=KBNavigator&id=(from=BOOKMARK&bmDocTitle=RAID%20and%20Oracle%20-%2020%20Common%20Questions%20and%20Answers&bmDocType=FAQ&bmDocID=38281.1&bmDocDsrc=KB&viewingMode=1143))

Thanks
Jim

Tom Kyte
August 25, 2010 - 11:58 am UTC

note 97597.1 is titled "Optimizing RAID Performance for Oracle RDBMS on Windows NT/2000". Even before reading that one, I would have disregarded it as it is *ancient*. It is way old. I did not read it.




note 30286.1 (not 30386.1) doesn't say "put everything but redo on RAID 5", it says "DO NOT put redo on RAID 5", which makes sense (avoid RAID 5 if you can at all - good for reads, bad for writes, with low availability characteristics - cheap and slow it is).


note 38281.1 is giving you permission to put datafiles on RAID 5 (you mostly READ from datafiles; DBWR does the writing MOST of the time in the background, so its good read but poor write speeds can sometimes be hidden), but I do not see it saying "it is best to put them there".

there is a huge difference between saying "it would be acceptable to use RAID 5 for this type of file in general" and saying "use RAID 5 for this type of file".

The notes propose the former, not the latter.




Thanks For The Assist

Jim, September 14, 2010 - 3:58 pm UTC

Thanks Tom for looking into this issue

Jim

Extremely useful

DJB, December 15, 2010 - 9:47 am UTC

Especially in the light of the 'SANE SAN' presentation at the recent UKOUG. It suggested that a SAN wasn't always suitable for database systems because of latency, amongst other things, and that SSDs would soon be as cheap. In addition, it said that hosted disks for the database were now coming back into 'fashion'.

Tom, did you see that presentation or read the paper, and if so have you any comments ?
Tom Kyte
December 15, 2010 - 11:50 am UTC

I saw it a while ago - I haven't seen James's updates to it, however.

In the years since this was first commented on - many things have changed - when I think of a high performance SAN for the database, I now think of exadata - a database only SAN.

Presenting Mount Points for multiple instances on the same server

Anand, February 07, 2012 - 10:25 am UTC

(We would like to get to a One Server One Instance situation..)

Our Oracle server is set up such that multiple instances (at least 3) run on one server, and the back end uses NAS. All of these are OLTP systems.

Currently we have presented our mount points to the server along the lines of an example someone provided in prior posts above:

/instance1 (RAID5) (Good Volume of data, over 150 GB)
/instance2 (RAID5) (Decent Volume of data, over 60 GB)
/instance3 (RAID5) (Small Database, over 5 GB)

/redoA, /redoB, /redoC (Redos of all instances, one file per mount point per instance go here)
/undo (Undo of all instances go here)
/temp (temp of all instances go here)
/arc
/backup

My question on this setup is
If 2 of the instances are busy most of the time, theoretically, could there be an impact from one instance affecting the other?

What would happen if all of the data files of all instances are put in one huge mount point say

/data

Your insights are very much appreciated.

(My knowledge of NAS is very limited at this point and I am finding it difficult to visualize how data is written on the back end)

Tom Kyte
February 08, 2012 - 1:29 am UTC

If 2 of the instances are busy most of the time, theoretically, could there be an impact from one instance affecting the other?


absolutely. Think CPU consumption, I/O bandwidth, memory utilization.


What would happen if all of the data files of all instances are put in one huge
mount point say

/data


impossible to say - you don't say whether /instance1 and /instance2 currently share devices, network paths, whatever.

If they do not, if they were kept separate at the NAS layer, then you would be changing the configuration big time.

If they do share everything, then you already have the logical equivalent of /data with your current scheme.

Previous Post Correction

Anand, February 07, 2012 - 1:55 pm UTC

I should have said SAN and not NAS.
Tom Kyte
February 08, 2012 - 1:29 am UTC

same answer.

storage

A reader, June 16, 2012 - 1:35 pm UTC

Tom:

Do you know of a good reference, book or link that explains storage design options for an Oracle server (like what options I have - NAS, SAN, local disk, etc.)?

I am not very clear on the different options and I would like to know the best disk layout and RAID setup.

I had some training on ASM and it sounds like a good thing, but a little complex and more practical for large databases. For a database of 15 GB in size it is not worth it.
Tom Kyte
June 16, 2012 - 2:33 pm UTC

Sam

there is no best.. You should know that - there is no single best, otherwise there would be no alternatives.


ASM would be *awesome* for a database of any size, from 100mb to 100pb. Why do you think, Sam, that it is complex? It is not any more complex than any file system out there - given that it is - *just a file system*???? Like UFS, like NTFS, like *ANY*fs..

And what does it have to do with NAS, SAN, local disk - since it is a file system and you need one of those with NAS, SAN and local disk?? NAS, SAN and local disk are not file systems - they are simply storage devices.


with NAS/SAN/local disk you are worried either about IOPS (I/O operations per second) for OLTP or bandwidth (gigabytes/second) for reporting/warehousing.

The file system is a matter of choice after that. And as a DBA, ASM would be a natural choice.

storage devices

A reader, June 17, 2012 - 2:05 pm UTC

Tom:

I should have said the Oracle options for storage devices.

for example, I can use a local disk, SAN, or NAS to store Oracle data. I want to explore the pros/cons of each option and what to request from the storage guys on how to lay out the physical/logical disks, controllers and power.

for example, I need to tell them I need one physical disk of size X for data files, and I need another physical disk of size Y for backups and flash recovery files, etc. I need to multiplex or mirror control files on 3 separate disks, etc. I need to use RAID 10 for this, etc.

Storage people do not know anything about how Oracle stores data or the optimal storage design for it.

Do you know of a good reference (book, article, page) that explains that?
Tom Kyte
June 18, 2012 - 8:56 am UTC

Sam,


I know of no book that goes over these hardware architectures, but that is OK, because I sort of told you what you need to know and you can get that from the box and by understanding how YOUR engineers hooked all of the boxes up to each other.


If you are doing OLTP, random reads all over the place, you would be predominantly concerned with IOPS - I/Os per second. How many do YOU need to do, how many can the box DO? Secondarily - you need to ensure you have the bandwidth to do it. If you say you need 10,000 8k I/Os per second - you need to be able to transfer from the storage (local, SAN or NAS) at least 78MB/sec. If your NAS, for example, was connected to your server via a 1Gb (gigabit) Ethernet link, you are very very very close to exhausting your ability to do that (you might, on a good day, get 100MB/sec on that cable).


If you are doing reporting/warehousing - you need to be concerned about I/O bandwidth. How many megabytes of data transfer are you ASSURED of having (assured, guaranteed, dedicated to you)? Make sure the engineers didn't route the SAN to your server via a single shared switch. Make sure the engineers didn't take ten 1Gb cards out of the NAS and funnel them all through a single switch. Etc. What is your guaranteed bandwidth and what sort of bandwidth do you need? If you need to full scan a TB of data on a non-Exadata platform, where the 1TB would have to be returned to the server - how many minutes, hours, or days would that take, and is that OK?
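To make that last point concrete, a quick back-of-the-envelope calculation assuming a sustained 100MB/sec path to the storage:

-- full scanning 1TB at a sustained 100MB/sec
select round(1024*1024/100) as seconds,
       round(1024*1024/100/3600, 1) as hours
  from dual;

-- roughly 10,486 seconds, or about 2.9 hours, for a single pass over the data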



for example, I need to tell them i need one physical disk of size X for data
files, and i need another physical disk of size Y for backups and flash
recovery files, etc. I need to multiplex or mirror control files on 3 separate
disks, etc. I need to use RAID 10 for this, etc.


Not really - if you use ASM, you just tell them "give me raw devices".

If you rely on them, you basically need to tell them "IOPS" or "Bandwidth" and that they are responsible for making sure they have diskgroups that look like this with failure groups that look like that (eg: if you don't use ASM you have to do the same work you would do with ASM....!!!)
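To illustrate what "diskgroups with failure groups" means in practice, a minimal sketch of an ASM disk group creation - the group and device names here are hypothetical, and the statement is run in the ASM instance:

-- a normal-redundancy disk group whose mirrored copies are kept on
-- separate sets of devices (failure groups), e.g. one group per controller
create diskgroup data normal redundancy
  failgroup controller1 disk '/dev/rdsk/disk01', '/dev/rdsk/disk02'
  failgroup controller2 disk '/dev/rdsk/disk03', '/dev/rdsk/disk04';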


And SAN/NAS/local storage/SSD/ABC/XYZ - it doesn't really matter. You have the same needs, the same demands, the same work everywhere.

I don't know how else to say this to you Sam.

A reader, June 20, 2012 - 6:42 pm UTC

<<<If you are doing OLTP, random reads all over the place, you would be predominantly concerned with IOPS - I/Os per second. How many do YOU need to do, how many can the box DO?>>>

OK, great. But how do you figure this out for the database?
Is there a SQL query or dynamic view that tells you how many I/Os per second the database system is doing?

I assume the hardware specs will tell you how many the box can do, even though it might be different for a VM installed on that physical box.



<<<Secondarily - you need to ensure you have the bandwidth to do it. If you say you need 10,000 8k I/Os per second - you need to be able to transfer from the storage (local, SAN or NAS) at least 78MB/sec.>>>

How did you calculate that - that 10,000 blocks per second need a bandwidth of 78 MB/sec between the server and the storage device?

10,000 * 8,000 bytes * 8 bits = 640,000,000

Is that not 640 megabits/second?

Thanks a lot for the excellent info.


Tom Kyte
June 21, 2012 - 6:11 pm UTC

OK, great. But how do you figure this out for the database?


only by understanding the needs of the application Sam, only by understanding the application.


Is there a SQL query or dynamic view that tells you how many I/Os per second the database system is doing?


sure, all over the place, just use AWR or statspack.

but remember - that is how many it is doing, not how many it WOULD LIKE to be doing. You have to include in your *analysis* of the numbers the waits you are experiencing - if they are above what they should be (3-5ms is pretty good for a random I/O), then you know you need more.
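If you just want a quick live number without generating a full report, a minimal sketch against v$sysmetric - verify the metric names against v$metricname in your release:

-- current I/O request rates as the database sees them
select metric_name, round(value) as per_second
  from v$sysmetric
 where metric_name in ('Physical Read Total IO Requests Per Sec',
                       'Physical Write Total IO Requests Per Sec')
   and group_id = 2;  -- the 60-second interval group in most releases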


How did you calculate that?

ummm, I multiplied and divided???????

10,000 blocks of 8k =


ops$tkyte%ORA11GR2> select 10000*8/1024 from dual;

10000*8/1024
------------
      78.125



if you need 10,000 reads of 8k blocks per second, you need to be able to transfer about 78MB/second.


MB = megabytes
mb = megabits

Your 640,000,000 bits/second divided by 8 is 80,000,000 bytes/second - about 80 megabytes/second, the same ballpark as the 78MB/sec figure.

some Oracle-specific information missing

Jon, October 04, 2014 - 2:26 am UTC

I can find out from myriad sources about I/O resource contention, etc., but what I expect to find here is practical advice about Oracle specifics with respect to SAN storage, such as:

What things (files/mounts/etc.) should I place on a SAN such that, if I replaced everything else in the Oracle installation ecosystem with the same version of Oracle (such as from a snapshot of those things), I could start up the database without a hitch?

Alexander, December 04, 2014 - 4:05 pm UTC

Tom,

We have a strange issue in a DBaaS environment we recently built using SAN storage. This storage is dedicated to this DBaaS VMware cluster. The performance is satisfactory, but we noticed that when we are creating and expanding datafiles it's terrible. For example, 20+ minutes to expand a 500MB file to 30GB. So the questions I have are: 1) Is there anything special about the way Oracle expands datafiles that would explain how performance could be bad for that operation but be quite good for others like running an impdp? 2) Can you recommend a test or approach, given this scenario, that I can use to get something more definitive?

Thanks.
Tom Kyte
December 04, 2014 - 4:31 pm UTC

1) we initialize the file, so you are writing 29.5GB of data - 29.5GB in 20+ minutes works out to roughly 25MB/sec.


things like impdp write to the buffer cache, not directly to disk, for most operations - an index build would be a direct path write, but in normal mode, we are just inserting into the buffer cache.

tell you what, if you dd a big set of data from disk to disk (no Oracle, just dd), what do you see there?

then, if you do a create table as select in the database and create a copy of a big table (just a create table t as select * from old_t - nothing fancy - full scan of a big table to a direct path write of a big table - consider using NOLOGGING if you are in archivelog mode...) - how does it look compared to the dd?
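A minimal sketch of the database side of that test - the table names are hypothetical, and the source table should be big enough to make the timing meaningful:

set timing on

-- direct path write of a big table; compare the resulting MB/sec with
-- what dd achieved outside the database
create table t_copy nologging
as select * from big_table;

-- size of what was just written, to turn the elapsed time into MB/sec
select round(bytes/1024/1024) as mb
  from user_segments
 where segment_name = 'T_COPY';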


2) see #1

Alexander, December 04, 2014 - 7:39 pm UTC

One other thing - what would you say is a reasonable throughput for writes to a fiber SAN (in MB/sec)? Researching this hasn't turned up anything reliable. It's the first thing the storage guys will ask: "what are you expecting?"
Tom Kyte
December 04, 2014 - 10:43 pm UTC

well, since USB 3.0 could give me 625MB/sec - I'd want more than 25MB/sec from a high end SAN with administrators and everything :)

depends entirely on the configuration. If you have a 1Gb network card, the best would be about 100MB/sec - under ideal conditions. But then it would depend on how the SAN is configured, the amount of cache it has, and what sort of RAID it has implemented.


You should start your conversations with the SAN guys with "OK guys, you put it all together, what should I expect?". Don't let it go the other way. They are providing a service; they should know the rated expectations of said service.


kkk, October 24, 2023 - 7:22 am UTC

Such nice work.