
Question and Answer

Tom Kyte

Thanks for the question, Steven.

Asked: June 27, 2000 - 10:02 am UTC

Last updated: November 28, 2011 - 9:44 am UTC

Version: Oracle 8.0.4.1

Viewed 10K+ times

You Asked

We are looking to buy a new database server. What is the best RAID configuration for databases and why? The current server we have has a RAID configuration of RAID 5.

and Tom said...

depends on the database, depends on the part of the database. Depends also on who you ask ;)

Here is what I like (raid 0 = stripes, raid 1 = mirrors, raid 5 = striping+parity):

o no raid, raid 0 or raid 0+1 for online redo logs AND control files. You should still let us multiplex them ourselves even if you mirror them. We have more opportunities for failure if the raid subsystem reports a "warning" back to us -- if we have multiplexed them -- we are OK with that.


o no raid or raid 0 for temporary datafiles (used with temporary tablespaces). no raid/raid 0 is sufficient. If you lose these, who cares? You want speed on these, not reliability. If a disk fails, drop and recreate temp elsewhere.

o no raid, raid 0 or raid 0+1 for archive. Again, let us multiplex if you use no raid or raid 0, let the OS do it (different from online redo log here) if you use 0+1.

o raid 0+1 for rollback. It gets written to a lot. It is important to have it protected. We cannot multiplex it, so let the OS do it. Use this for datafiles you believe will be HEAVILY written. Bear in mind, we buffer writes to datafiles -- they happen in the background -- so the poor write performance of raid 5 is usually OK except for the heavily written files (such as rollback).

o raid 5 (unless you can do raid 0+1 for all of course) for datafiles that experience what you determine to be "medium" or "moderate" write activity. Since this happens in the background typically (not with direct path loads and such) -- raid 5 can typically be safely used with these. As these files represent the BULK of your database and the above represent the smaller part -- you achieve most of the cost savings without impacting performance too much.



Try to dedicate specific devices to

o online redo
o archive
o temp

they should not have to share their devices with others in a "perfect" world (even with each other). A quick sketch of how this layout could map to actual file placement follows.
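For illustration only -- a minimal sketch of what that separation can look like in DDL. The mount points (/redo1, /redo2, /temp, /raid01, /raid5), sizes and group numbers are made up, and the syntax shown is 9i-style (create undo tablespace did not exist in 8.0):

-- online redo: two Oracle-multiplexed members per group, each member on its own dedicated device
alter database add logfile group 3
    ('/redo1/orcl/redo03a.log', '/redo2/orcl/redo03b.log') size 100m;

-- temp: its own unprotected device; if it is lost, just drop and recreate it elsewhere
create temporary tablespace temp
    tempfile '/temp/orcl/temp01.dbf' size 1g
    extent management local uniform size 1m;

-- heavily written files (rollback/undo) on the mirrored 0+1 area,
-- the bulk of the data on the cheaper raid 5 area
create undo tablespace undotbs datafile '/raid01/orcl/undotbs01.dbf' size 1g;
create tablespace data datafile '/raid5/orcl/data01.dbf' size 4g;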


Comments

Best article!!!!

Den, April 27, 2001 - 1:45 am UTC

Best article!!!!

raid and file layouts

mike, October 10, 2001 - 11:46 am UTC

I am not very sure about the relationship between raid and the file layouts.
For example, if there are 10 disks and raid 1+0 is adopted, do we still need to consider putting redo files, rollback files on disks x, y? I think 0 means striping on all the disks, so we really cannot tell which particular disks the files are on. They are spread over the disks.

Tom Kyte
October 10, 2001 - 12:26 pm UTC

You set up the relationships, You configure the raid sets. Striping can be across 2 devices, it can be across N devices. You decide, you configure. It is totally 100% up to you.

So, 0 does not mean striped across all, it just means striped across some.

Reader

Reader, October 10, 2001 - 1:04 pm UTC


What stripe width would you recommend in terms of db_block_size and db_file_multiblock_read_count? Is db_block_size or (db_block_size X db_file_multiblock_read_count) preferable?

Is it not preferable to use their own device for rollback segments?

Thanks

mike, October 10, 2001 - 1:34 pm UTC

My concern was whether we really need to consider the file distribution when 0+1 is adopted -- aren't the files well striped over the disks already?

Tom Kyte
October 10, 2001 - 2:03 pm UTC

does not syntactic sense make ;)

I don't understand.

mike, October 10, 2001 - 2:42 pm UTC

Assume there were 10 disks and someone wanted to distribute files as follows. I would say that it does not matter, because the files are striped over the 10 disks if 0+1 is used. One disk could contain portions of all the files. Could you clarify this?

disk 1: Oracle Software
disk 2: SYSTEM tablespace, control file 1
disk 3: RBS tablespace, RBS_2 tablespace, control file 2
disk 4: DATA tablespace, control file 3
disk 5: INDEX tablespace
disk 6: TEMP tablespace, DATA_2 tablespace
disk 7: TOOLS tablespace, INDEX_2 tablespace
disk 8: Online redo logs 1, 2 and 3, Export dump destination
disk 9: Application software, archived redo logs destination

Tom Kyte
October 10, 2001 - 6:33 pm UTC

If there were 10 disks and you used raid 1, you would really have 5 disks (pairs). Insufficient to really be picky about anything.

So, say it's really 20 disks so you have 10 mirrored pairs.

If you wanted the degree of control you have above -- you would not use striping, you would just mirror.

You could stripe perhaps the pairs 2, 3, 4, 5, and 7 and just spread the io for those files out.

You don't have to stripe ALL of them.

raid

Mahbuba Rahman, April 29, 2002 - 11:00 pm UTC

Would you please explain what the differences are between Striping and Mirroring and the advantages of each over the other.

Tom Kyte
April 30, 2002 - 7:31 am UTC

Search on the internet. Tons of material on this.

Striping -- good for performance. No availability features. Achieves even IO across many devices.

Mirroring -- good for availability. You have to lose both (or three or whatever) disks before you lose data.


multiplexing on-line redo

harish, April 30, 2002 - 10:19 am UTC

Some people suggest that online redo should be multiplexed using OS facilities (ie mirror) and not Oracle. There appear to be conflicting views on this subject. Are you able to clarify why you recommend Oracle based multiplexing?

Tom Kyte
April 30, 2002 - 11:43 am UTC

Some OS's return a "warning" when one of the mirrors cannot be written to. This could cause us to think "error".

Also, if a mirror goes down, you are unprotected at that point.



RAID

Shajan Joseph, April 30, 2002 - 10:22 am UTC

It is to be decided on the basis of the following parameters, namely: Throughput/Response Time (performance), Scalability & Backup strategy (which determines database availability).

Tom Kyte
April 30, 2002 - 11:44 am UTC

what is to be decided? Nice statement, no context.

This might clear up some things.

Michael S. Durham, April 30, 2002 - 12:11 pm UTC

http://technet.oracle.com/deploy/availability/pdf/oow2000_sane.pdf

I have switched to the methods outlined in this document and I must say that my life is now much easier. It should answer many of the questions people are asking about RAID for an Oracle database.

Regards,
Mike

raid (0+1) or (1+0)

Phil Williams, April 30, 2002 - 7:20 pm UTC

I am not sure if I was dreaming (or just drunk) but I heard that there was a difference in performance between RAID 0+1 compared with RAID 1+0. Does anyone know if this is true or not and, if so, is there any data to back it up?

Cheers

Phil

Tom Kyte
April 30, 2002 - 8:23 pm UTC

There shouldn't be a perf difference - but an availability difference exists.

See
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2403901570002



A reader

ak, May 10, 2002 - 2:51 am UTC

Hi,
1) Regarding Raid 5: the minimum is three disks -- how many is the maximum here, keeping in mind good performance for a terabyte database and a heavy oltp system?

2) Should all the Raid sets have different controllers, keeping good performance (and the risk of a controller corrupting the datafiles) in mind?

3) How about the channel here, keeping performance in mind? Is there any effect from a single channel handling everything?

Your comments on the above 3 points please.
Thanks

Regards
Ak


A reader, August 12, 2002 - 12:28 pm UTC

You said:
<quote>
o no raid, raid 0 or raid 0+1 for online redo logs AND control files. You should still let us multiplex them ourselves even if you mirror them. We have more opportunities for failure if the raid subsystem reports a
"warning" back to us -- if we have multiplexed them -- we are OK with that.

o no raid, raid 0 or raid 0+1 for archive. Again, let us multiplex if you use no raid or raid 0, let the OS do it (different from online redo log here) if you use 0+1.
</quote>

What do we have to do in order to let Oracle multiplex the online redo logs, control files and archive logs? Can you tell me how to do this -- which commands should I use?

Thanks


Tom Kyte
August 12, 2002 - 1:52 pm UTC

see the alter database command for how to create a log group with more than one member (add some members to the redo log groups)

Just copy the controlfiles across many (2 or 3) mount points on different physical devices and use the control_files= init.ora parameter to tell us to maintain them (whilst the database is shut down)

use the log_archive_dest_1 through log_archive_dest_10 init.ora parameters to tell us where to put more than one copy of the archives.
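A generic sketch of those three pieces (file names, destinations and group numbers are made up):

-- Oracle-level multiplexing of the online redo logs: add a second member to each group
alter database add logfile member '/u02/oradata/orcl/redo01b.log' to group 1;
alter database add logfile member '/u02/oradata/orcl/redo02b.log' to group 2;

-- init.ora: maintain controlfile copies on different physical devices
control_files = (/u01/oradata/orcl/control01.ctl,
                 /u02/oradata/orcl/control02.ctl,
                 /u03/oradata/orcl/control03.ctl)

-- init.ora: write each archived log to more than one destination
log_archive_dest_1 = 'LOCATION=/arch1/orcl'
log_archive_dest_2 = 'LOCATION=/arch2/orcl'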

..about online redolog files

Yogeeraj, August 13, 2002 - 12:11 am UTC

Hello,

from above:
<quote>
Try to dedicate specific devices to

o online redo
o archive
o temp

they should not have to share their devices with others in a "perfect" world
(even with eachother).
</quote>

..and from "OPTIMAL STORAGE CONFIGURATION MADE EASY" by Juan Loaiza, Oracle Corporation
<quote>
...
LOG FILES
It is generally more efficient and flexible to parallelise IO operations using parallel execution at the Oracle level than using small stripe widths at the storage level. However, online log file writes cannot be parallelised at the Oracle level. They must be parallelised at the storage system level. If an online log file is located on a single disk, then operations that make changes very rapidly such as parallel updates, parallel index creations, parallel loads, etc. may become bottlenecked on the log disk. Therefore the online log file should be spread across multiple disks using striping.
...
Striping all files across all disks ensures that the full bandwidth of all the disk drives is available for any operation. Therefore, parallel execution and other IO intensive operations will not be unnecessarily bottlenecked because of the disk configuration. You cannot reconfigure your storage without a great deal of effort, so striping across all disks is the safest and most future-proof choice you can make.
...
</quote>

According to this document, it is better to stripe datafiles across the multiple disks!

In my case, I have an external storage array with 12 disks (excluding 2 hotspares). I am thinking about mirroring 6 disks over the other 6 disks and striping my datafiles across them.

Also, at the Oracle level, I will be multiplexing the files forming part of my recovery set (online logs, control file, and archive logs) so that one plex is kept on my internal disks.

Is that ok? Did I miss something?

Please guide.

Best Regards
Yogeeraj

Tom Kyte
August 13, 2002 - 8:26 am UTC

with just 6 disks at the end of the day -- you do not have much opportunity for separating stuff out at all. You'll have to just stripe away and hope for the best. You might consider setting 2 of the mirrored disks aside just for log if you are really log intensive.

A reader, August 14, 2002 - 2:31 pm UTC

Hi Tom,

I am confused about how many disks are needed, at minimum, to implement raid 1+0 or 0+1.
Imagine that you have 30 disks (including one hotspare) and you want to implement raid 1+0, and imagine that you want a stripe width of 4. Can you configure raid 1+0 over these disks or not?
If I understood well, I need at least 4 drives in order to implement raid 1+0, so if I have 30 disks (one hotspare) and I want a stripe width of 4 I will not be able to do it because I need two more disks that I don't have.

Can you help me with this

Thanks in advance.

Tom Kyte
August 14, 2002 - 2:50 pm UTC

I'll recommend you talk to your hardware supplier. I don't really do hardware stuff personally.

I usually go here:
</code> http://www.acnc.com/04_01_00.html <code>

for a good overview. You need a minimum of 4 drives and you have 30 so I don't see the problem?

A reader, August 14, 2002 - 6:29 pm UTC

Hi Tom,
I understood what you told me, but my doubt (after reading www.acnc.com/04 01 00.html) is the following:
If I understood the idea of raid 10 or 01 correctly, I need at least 4 drives to implement raid 10 or 01. So if I have 30 disks, actually 29 (one is a hotspare), and I want a stripe width of 4 (this means a depth of 4 disks per group), then each group will need to have 16 disks (8 for mirroring and 8 for striping). Am I wrong?

Thanks for any help you can give me.


Tom Kyte
August 14, 2002 - 7:28 pm UTC

Sorry -- I just don't do this stuff, I'm out of my league. Not my cup o' tea if you know what I mean.

Maybe someone out there reading this knows, but your best bet is to talk to your Veritas technical Sales consultant. They should know this stuff like I know the database.

WRT S.A.M.E. methodology

Graham Halsey, August 21, 2002 - 9:48 am UTC

Hi Tom,

As Michael S. Durham mentions in this strand, the SAME document is a v. useful reference (yes, my keyboard is broken).

One of the recommendations in there is to place frequently accessed data on the outside half of the disks. I understand the reasoning behind this, but don't actually know how it's done. Is this purely an OS thing, or is there anything I can do in Oracle to control which tracks on the disk are used when creating datafiles etc...?

Sorry to ask you to comment on someone else's material!

Regards,
Graham.

Tom Kyte
August 21, 2002 - 10:26 am UTC

I believe you would be creating physical disk partitions, which you control the physical location of, and using them where appropriate. So, it is an "OS" thing.

Watch how your controllers are set up.

Chuck Jolley, August 21, 2002 - 10:49 am UTC

The biggest performance increase we have seen tuning this stuff is getting the data and indexes on different controllers, regardless of how the raids are set up.
We literally doubled the performance of one of our databases simply by getting another controller into its server to drive the data and index arrays separately.
This is on a Compaq/Win2k server, so it might not be an issue in other setups.


Tom Kyte
August 21, 2002 - 11:01 am UTC

Umm, probably more likely you finally accidentally achieved "even IO distribution".

You see, for a given single query -- we access the index AND THEN the data sequentially -- not in parallel. So, one controller, 15 controllers -- it would not matter for the query in isolation.

Getting an even IO spread across controllers to scale up lots of users -- that's key. Check your iostats and you'll see that what you've done is just spread it out.
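One way to check that from inside the database as well (a generic query, nothing specific to this system) -- physical reads and writes per datafile should come out roughly even across the devices/controllers:

select df.file_name, fs.phyrds, fs.phywrts
  from v$filestat fs, dba_data_files df
 where fs.file# = df.file_id
 order by fs.phyrds + fs.phywrts desc;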

Helped multi-user and...

Chuck Jolley, August 21, 2002 - 11:42 am UTC

Tom,
It also greatly helped a large nightly summary we create that inserts about 1,200 rows into a report table using a structure like:

insert into report_table(col1, col2,....)
select report_name, sysdate, sum(bla1), sum(bla2),...
from 400_lines_of_sql;

Does this make sense?

Thanks,
Chuck

PS It is entirely possible that the HW people did something else, like putting back "borrowed" memory, without telling us too. :)


Tom Kyte
August 21, 2002 - 11:56 am UTC

We would have to see a statspack and tkprof from before and after but... another thought just came to mind:

in order to separate, you had to rebuild. You might have been removing tons of dead whitespace that accumulated. A simple rebuild of the objects may well have delivered the same effect.

But this does point out why it is good to use statspack (we would have the history to compare to) and to occasionally tkprof an application or two. Otherwise, we have nothing to compare to (other than our "memories") and can only sit around and speak hypothetically.

;) we'll never really know in this case (unless you are willing of course to unseparate the data again !)
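The mechanics are simple enough (a sketch, assuming statspack has already been installed with spcreate.sql):

-- take a snapshot before and after the interesting period, then report on the pair of snap ids
exec statspack.snap
-- ... run the workload ...
exec statspack.snap
@?/rdbms/admin/spreport

-- and to tkprof an application session:
alter session set timed_statistics = true;
alter session set sql_trace = true;
-- run the application code, then format the resulting trace file with
--   tkprof <tracefile>.trc report.txt sys=no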

Data Loading Slow in RAID 5

Tony, May 30, 2003 - 3:55 am UTC

Our company uses a RAID 5 DB for our Decision Support System. There is no performance problem reported by the application users, as we have RAID 5, which is good for DSS. But the problem is with the data loading. As the size of the data to be loaded every night increases day by day, the loading time also increases. Now it takes 7 to 8 hrs for data loading. We have already enabled parallel processing, nologging, etc. Our investigation shows lots of I/O waits during the loading. As we know, RAID 5 is not good for write intensive data loading. I think we cannot have one RAID system for data loading and another for data access by the DSS application. What is the solution for this problem, Tom?

Tom Kyte
May 30, 2003 - 7:49 am UTC

don't use raid 5?

are your IO waits for write or read -- if you do lots of direct path operations, they'll be waits for write and the only thing you can do is spread the work out over as many logical disks as possible and just nail the system.

sorry, without in depth knowledge of your system and processes, I'm not going to be able to suggest something. Not knowing your processing needs makes it hard (NOT that you can enter all of that information here, please don't)
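To see which it is, a generic query against the wait interface is usually enough (read waits show up as db file sequential/scattered read and direct path read; write waits as direct path write, db file parallel write and log file parallel write):

select event, total_waits, time_waited
  from v$system_event
 where event like 'db file%'
    or event like 'direct path%'
    or event like 'log file%'
 order by time_waited desc;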

Reader, August 24, 2003 - 10:59 am UTC

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96533/iodesign.htm#PFGRF015

"On low-concurrency (sequential) systems, ensure that no single I/O visits the same disk twice. For example, assume that the stripe width is four disks, and the stripe depth is 32k. If a single 1MB I/O request (for example, for a full table scan) is issued by an Oracle server process, then each disk in the stripe must perform four I/Os to return the requested data. To avoid this situation, the size of the average I/O should be smaller than the stripe width multiplied by the stripe depth. If this is not the case, then a single I/O request made by Oracle to the operating system results in multiple physical I/O requests to the same disk."

Tom, how did they come up with "each disk in the stripe must perform four I/O." Thanks.

Tom Kyte
August 24, 2003 - 11:48 am UTC


i don't know, not really a hardware/disk guy myself. I don't do that stuff.

Thanks

Anthony Reddy, August 26, 2003 - 1:12 am UTC


Partition in RAID

Tony, August 26, 2003 - 5:52 am UTC

It's essential, if I'm not wrong, to store the data of a table on different disks so that table partitioning and parallel operations will be more effective in increasing performance. Do we have to follow this for RAID systems also? (Data is already striped/mirrored in RAID systems)


Raid configuration

A reader, September 14, 2003 - 4:09 pm UTC

"
no raid or raid 0 for temporary datafiles (used with temporary
tablespaces). no raid/raid 0 is sufficient. If you lose these, who cares? You
want speed on these, not reliability. If a disk fails, drop and recreate temp
elsewhere"

Tom,
Reviewing your quote above:
From a point of view of *good practices*, would you recommend putting the temp files also in the 1+0 area (in a raid 1+0 shop)? Because this helps from the point of availability. Agreed, the data is safe, but somebody has to re-create the temp files, which causes a delay for the application. If the organization can spend the big bucks on the raid 1+0 setup, saving (if at all) on non-mirroring of the temp files exposes it as the weakest link when Murphy's law asserts itself.

Thanks as always...

Tom Kyte
September 14, 2003 - 10:05 pm UTC

depends on what you are using the raid for -- yes, it has been pointed out that protecting temp can increase availability (not on this specific page, but i've had this discussion)

temp can be lost, and losing temp will not lose any data - only time.

so, if the goal is "no data loss", the above is good.

if the goal is "reduce downtime", protect temp. consider what happens when the disk TEMP is on fails 5 hours into a 5 hour and 1 minute process otherwise!



Is your RAID recommendation valid for ASM also?

Tony, September 15, 2003 - 1:02 am UTC

I think Oracle 10g's Automatic Storage Management internally uses the SAME (Stripe And Mirror Everything) concept.
If I'm correct, what is the relevance of the above RAID recommendation for an ASM-implemented Oracle 10g?
Can you please clarify?


Tom Kyte
September 15, 2003 - 9:36 am UTC


it is all about implementing SAME, yes.

The above recommendation is for where you can use the various types somewhat safely. remember i said "raid 5 (unless you can do raid 0+1 for all of course)"!!

Distribute IO

A reader, November 24, 2003 - 2:50 pm UTC

Hi Tom,

Say if my disks are already RAID 1+0. Do you think it's still necessary to distribute all indexes into several INDEX tablespaces? Or just use one INDEX tablespace?



Tom Kyte
November 24, 2003 - 3:25 pm UTC

use tablespaces for "book-keeping" and "administrative ease of use".

I do this - i have a tablespace per application, everything into the same bin. Sometimes two tablespaces (or more) depending on size -- but in general, it's all in one bucket.

OS or Manual Striping

A reader, November 24, 2003 - 6:00 pm UTC

Hi Tom,

My general impression is that OS/Hardware striping is more beneficial than manual striping (e.g. assigning a different TS to each table partition). Is this correct?

Thanks.

Tom Kyte
November 24, 2003 - 6:23 pm UTC

I like it better because I don't have to think about it. It has been years since I've worried about things like that. Just let the hardware do it.

All you have to do then is make sure you use the raid sets effectively (spreading the load out over all of them)
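Spreading the load out can be as simple as giving a tablespace one datafile per raid set (the mount points here are made up):

create tablespace app_data
    datafile '/raidset1/orcl/app_data01.dbf' size 2g,
             '/raidset2/orcl/app_data02.dbf' size 2g,
             '/raidset3/orcl/app_data03.dbf' size 2g
    extent management local uniform size 4m;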

learning every day, March 02, 2004 - 1:27 pm UTC

We are trying to create an oracle 9i database on a machine where we will have 11 logical disks. All of them are in RAID1. As all are mirrored, in total there will actually be 22 (11 * 2) disks. Also there are 3 array controllers:
1st array controller has 1 disk
2nd array controller has 5 disks
3rd array controller has 5 disks
And we have 2 databases to migrate to this machine with 11 disks in hand. How should we distribute the datafiles, undo datafiles, temp datafiles, online redo log files, archive logs, control files etc for the 2 9i oracle databases? Any recommendation on the physical design so that we can avoid all kinds of contention (i/o, process etc)?
Thanks!


Tom Kyte
March 02, 2004 - 7:01 pm UTC

SAME.... stripe and mirror everything....

It would be really hard to say. 11 devices isn't very many and I don't know the size/activity of the various databases. I'd just like a nice big striped system I guess and let the hardware do the work.

thank!

A reader, March 03, 2004 - 10:40 am UTC

Currently all 11 disks are in RAID1.
Are you suggesting converting all of them to RAID01 (or RAID10) instead?



Advice on disk configuration

Sun, May 03, 2004 - 11:55 am UTC

Good day, Tom

I have a SUN E250 running Oracle 9.2

We configured it with DiskSuite RAID 5 (software RAID 5, sounds bad...). Over time, the load on the database has been getting higher -- typically oltp with a few batches.

We recently tuned most of the database (apps, database engine knobs) -- a good job, by the way. We drastically reduced the number of LIOs.

However, we are still seeing 0% idle time, with the run queue getting to 4 in vmstat.

In statspack, the top wait events are mostly IO-related.

I would like to know if you share the hunch that my next bottleneck to tackle is our RAID5.

Thanks.

Below is an extract from a typical statspack output:

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
ORAPRD 2319364203 oraprd 1 9.2.0.1.0 NO srv16

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 4192 03-May-04 10:00:03 72 29.8
End Snap: 4201 03-May-04 10:30:05 85 26.6
Elapsed: 30.03 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 144M Std Block Size: 8K
Shared Pool Size: 144M Log Buffer: 512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 24,002.34 20,635.60
Logical reads: 1,520.94 1,307.60
Block changes: 143.37 123.26
Physical reads: 238.16 204.76
Physical writes: 4.15 3.57
User calls: 54.96 47.25
Parses: 34.33 29.51
Hard parses: 11.73 10.08
Sorts: 20.24 17.40
Logons: 1.72 1.47
Executes: 46.39 39.89
Transactions: 1.16

% Blocks changed per Read: 9.43 Recursive Call %: 85.42
Rollback per transaction %: 40.36 Rows per Sort: 71.94

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 87.90 In-memory Sort %: 100.00
Library Hit %: 88.30 Soft Parse %: 65.84
Execute to Parse %: 26.01 Latch Hit %: 99.83
Parse CPU to Parse Elapsd %: 52.09 % Non-Parse CPU: 59.55

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 100.00 100.00
% SQL with executions>1: 20.58 12.26
% Memory for SQL w/exec>1: 15.78 13.87

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 1,128 31.35
log file sync 8,139 984 27.35
db file scattered read 43,571 248 6.90
db file sequential read 24,540 241 6.70
db file parallel write 448 206 5.72
-------------------------------------------------------------
^LWait Events for DB: ORAPRD Instance: oraprd Snaps: 4192 -4201
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 8,139 477 984 121 3.9
db file scattered read 43,571 0 248 6 20.8
db file sequential read 24,540 0 241 10 11.7
db file parallel write 448 222 206 459 0.2
direct path read (lob) 3,119 0 119 38 1.5
log buffer space 146 84 116 794 0.1
buffer busy waits 121 54 77 637 0.1
latch free 3,331 125 65 19 1.6
log file parallel write 7,421 7,274 51 7 3.5
PX Deq: Execute Reply 122 0 48 395 0.1
wait for unread message on b 7,711 65 46 6 3.7
LGWR wait on LNS 7,484 48 45 6 3.6
direct path read 5,350 0 43 8 2.6
enqueue 16 11 37 2289 0.0
async disk IO 193 0 36 189 0.1
PX qref latch 56 13 27 486 0.0
control file parallel write 597 0 26 44 0.3
SQL*Net message from dblink 788 0 22 28 0.4
LNS wait on SENDREQ 7,482 0 16 2 3.6
SQL*Net more data to client 33,736 0 7 0 16.1
log file sequential read 106 0 2 20 0.1
log file switch completion 2 0 2 806 0.0
LNS wait on LGWR 9,402 0 1 0 4.5
PX Deq: Parse Reply 4 0 1 252 0.0
direct path write (lob) 73 0 1 11 0.0
LGWR wait for redo copy 74 39 1 11 0.0
library cache pin 25 0 1 20 0.0
log file single write 4 0 0 35 0.0
process startup 1 0 0 121 0.0
PX Deq: Msg Fragment 16 0 0 5 0.0
control file sequential read 485 0 0 0 0.2
SQL*Net more data from dblin 79 0 0 1 0.0
ARCH wait on SENDREQ 29 0 0 1 0.0
db file parallel read 1 0 0 32 0.0
SQL*Net break/reset to clien 9 0 0 0 0.0
SQL*Net message to dblink 788 0 0 0 0.4
SQL*Net message from client 89,544 0 60,157 672 42.7
virtual circuit status 60 60 1,758 29305 0.0
PX Deq: Execution Msg 234 70 164 700 0.1
PX Idle Wait 40 35 74 1839 0.0
SQL*Net more data from clien 4,360 0 4 1 2.1
PX Deq: Table Q Normal 12 0 1 88 0.0
SQL*Net message to client 89,550 0 1 0 42.7
-------------------------------------------------------------
^LBackground Wait Events for DB: ORAPRD Instance: oraprd Snaps: 4192 -4201
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write 448 222 206 459 0.2
log file parallel write 7,413 7,266 51 7 3.5
wait for unread message on b 7,697 65 46 6 3.7
LGWR wait on LNS 7,476 48 45 6 3.6
control file parallel write 590 0 26 44 0.3
log buffer space 14 6 10 698 0.0
LGWR wait for redo copy 74 39 1 11 0.0
latch free 14 0 0 30 0.0
log file single write 4 0 0 35 0.0
control file sequential read 260 0 0 0 0.1
db file scattered read 2 0 0 5 0.0
ARCH wait on SENDREQ 1 0 0 2 0.0
db file sequential read 10 0 0 0 0.0
rdbms ipc reply 1 0 0 1 0.0
log file sequential read 4 0 0 0 0.0
rdbms ipc message 22,782 16,809 10,075 442 10.9
smon timer 6 5 1,635 ###### 0.0
-------------------------------------------------------------



Tom Kyte
May 03, 2004 - 7:10 pm UTC

Soft Parse %: 65.84

Holy moly. I do not believe I've ever seen one this low :(


you need to teach your developer how to spell "bind variables" and then use them someday.

Your logs are the major "system" bottleneck. Logs on raid 5 = deadly for performance.

90% of your cpu time is probably spent parsing.
most of your wait is on log.

while you order "normal disks" for logs, they need to fix the horrible bug in their code.

Disk configuration (again)

Sun, May 03, 2004 - 7:41 pm UTC

Thanks Tom... We are planning to migrate from the E250 above to an R280. The new machine has only one 70 GB disk. Do you think that it will behave better than the raid 5 in terms of performance and contention?

As to binds, we are working on them already.

One more thing... I am not able to create an index because Oracle hangs. After I killed the session, the system was back to normal. Was that RAID 5 again ?



Tom Kyte
May 03, 2004 - 8:02 pm UTC

raid 5 for writes = slow.

log = lots of writes.

raid 5 + logs = not a good thing.


How did you diagnose that Oracle was "hanging". There is a huge difference between "hang" and "something taking a while"

Index creates can be done with direct writes.
direct writes in raid 5 are *slow as sin*.
direct writes done with software based raid 5 will peg the cpu horribly.

index creates on software based raid 5 -> all cpu will go to computing this parity thing and the system will appear to be very very slow




Raid 10

ak, May 18, 2004 - 7:07 am UTC

You recommended Raid 10 or raid 0+1 ??
from http://www.acnc.com/04_01_0p1.html it looks like raid 10 is the best solution if one can afford it, but from your answer it looks like you are recommending 0+1. Correct me if I am wrong.


Tom Kyte
May 18, 2004 - 12:24 pm UTC

depends on your goals -- striped mirrors or mirrored stripes.
raw performance / increased availability.

A reader, May 18, 2004 - 12:33 pm UTC

Tom,

For a financial institution, what do you recommend, as both are important?

Thanks.

Tom Kyte
May 18, 2004 - 6:25 pm UTC

10 has better availability characteristics, if that is the most important -- it can survive some multi-disk failures whereas 01 can only survive a single disk failure.

RAID-RAID??

NOTNA, May 18, 2004 - 10:03 pm UTC

Hi Tom

Is it still really important to know all of this... I mean, do I really have to care now whether my server is running on raid 0+1, raid 3, raid 5, etc., with these new technologies that the manufacturers are introducing? And looking at 10g (which runs virtually on grid), do I need to worry whether my disks are set up as raid 1+0 etc...? Does the raid configuration really have an impact on my db performance?

Cheers,
NOTNA

Tom Kyte
May 19, 2004 - 7:48 am UTC

put your logs for a high end oltp system on raid 5 and tell us how you like it.

you might need to know, as a "tuner" i want to know - sure.

A reader

Aj, June 27, 2004 - 4:13 am UTC

We have got 17 disks for our new db. Here is how I am planning to put the oracle db on the different disks; I would appreciate your comments on this..

1) redo logs - four disks in 2 separate mirrors; it will allow us to multiplex on different mirror sets.

2) Temporary tablespace - Two disks in a mirror. (Being 24x7)

3) Archive logs - Four disks in two separate mirrors, so that oracle can write to two different places at the same time.

4) Six disks in raid 10 for datafiles

5) Hot spare - One disk

We have two raid controllers -- please comment on which controller should handle which disks..

Cheers


Tom Kyte
June 27, 2004 - 11:18 am UTC

S.A.M.E. would work as well -- stripe and mirror everything.

Impossible to really comment on optimal disk layout beyond that. All depends on the system you are building there.

Is it reasonable for you to dedicate that much space to redo/arch/temp (50% of your resources), is it necessary....


but I'm not really the hardware guy... Not my specialty.


RAC and RAID

A reader, July 21, 2004 - 10:59 am UTC

Tom,

We are setting up a new database server - Your original response is very useful for planning our drive configuration. We plan to set up RAC for high availability and load balancing. My question is: Can RAC be considered a substitute for mirroring? For instance, if we put rollback on a RAID 0 (rather than RAID 0+1 as you suggested), once per instance in the cluster, would doing so provide us with the same kind of redundancy as RAID 0+1?

I hope you can just let me know whether I'm approaching this correctly...Thanks a bunch.


Tom Kyte
July 21, 2004 - 11:05 am UTC

RAC is not about replication of data.

raid 0 would offer no HA for undo information.

undo is as vital to the operation of the database as SYSTEM is.

while each rac instance can have its own set of undo or can share undo over instances -- rac does not provide any redundancy of undo, you need mirroring or something similar for that.

io performance with multiple logical disk

Genaro Mendez, August 16, 2004 - 9:44 am UTC

how must I use my disks in the tablespace creation:
In my server (production) I have 3 logical disks,
. the first one has 5 physical disks (in raid 5)
. the second one has 4 physical disks (in raid 5)
. and the third has 3 physical disks, in raid 5 too.
How can I create my tablespaces for data and indexes?:
. use one big logical disk for index tablespaces and two disks for data tablespaces.
. use one big logical disk for data tablespaces and two disks for index tablespaces.
. create every tablespace (datafiles) across the 3 disks:
create tablespace data01 datafile
'/disk1/data01.dbf',
'/disk2/data02.dbf',
'/disk3/data03.dbf'
...
create tablespace indx01 datafile
'/disk1/index01.dbf',
'/disk2/index02.dbf',
'/disk3/index03.dbf'
...

what do you think is the best practice for i/o performance?
thanks for your comments.

Tom Kyte
August 16, 2004 - 7:25 pm UTC

index/table data -- doesn't really matter. just spread it all out (if all you were giving me was that raid 5 stuff, might as well just give me one mount point -- you've already said "slow=true" for writes, might as well say "easy=yes" too and just raid stripe over *everything*).

unless you were going to give me some mirrors or striped mirrors, no reason to even make it 3 logical volumes here (except to make life harder)

RAID 5 and RAID 1 combination for Oracle database

Shibu V, August 23, 2004 - 3:14 am UTC

Hi Tom,

Would you review this RAID configuration for a new Oracle 9i database for a less intense OLTP system (about 1000 transactions daily):

RAID 5 array:- ( 3 disks)
a. Data Files,Index files - 250 GB
b. GIS Maps - 20 GB. Can increase upto 50GB.

RAID 1 array:- (2 disks)
c. Redo log files, archive log files, temporary and Rollback Segment files - 5GB
d. Operating System files and Softwares Installed - 1 GB
e. Control files (Size will increase since it is used for storage of RMAN backup information) - 5GB

f. 1 disk as Hot spare

Is it advisable to keep the redo log files and archive log files within the same RAID 1 array?


Tom Kyte
August 23, 2004 - 7:33 am UTC

doesn't seem like you have enough disks to support contention free processing for c, d, e.....


no, it would not be advisable to have redo, archives and OS files on the same exact physical devices if your goal is an "intense oltp system".

but then again 1,000 transactions is about 1/60th of what I do on this site and I consider myself a lightweight as far as processing needs go -- so perhaps "intense" is the wrong terminology here...

RAID 5 on External Storage

Reader, September 23, 2004 - 3:58 am UTC

Hi Tom,
We will be implementing a new Oracle 9 Database on Sun Server. The disk configuration for the Server will be 2 local disks 33 GB each, and RAID 5 (3 disks 73 GB each) on External Storage. Database will be in Archivelog Mode.
I was thinking about keeping the Online Redo Logs and the TEMP Tablespace on the 2 local disks and the rest of the Datafiles (DATA, UNDO, INDEX along with Archive Logs) on the External Storage (RAID 5).
Is there a more optimal way of Organizing this Database considering the current Hardware Configuration ?
Your inputs will be very much appreciated.

Thanks .


Tom Kyte
September 24, 2004 - 8:00 am UTC

optimal and "3 73gig drives in a raid 5ish configuration" does not compute :)

I would not want temp (direct writes, direct reads) with my online redo log (direct writes, point of serialization, want speed).

But basically with disks the size you have and the low cardinality thereof, not much can be done. Not much to work with here -- sorry.

Disk Configuration of Asktom

A reader, December 06, 2004 - 12:37 pm UTC

* I would not want temp with my online redo log *

Hi Tom,

One question I have: Is it OK to place undo/temp/system on the same disk? I would think not because earlier you mention that undo should be mirrored and temp should rather not be mirrored. Please give your thoughts.

BTW, are you able to share with us the disk configuration/layout of your asktom db or any typical oltp/dss hybrid db? Just want to get an idea of what is OK to place with what, what is certainly not OK to place with something else, what should definitely be on a separate disk by itself.

Thanks.

Tom Kyte
December 06, 2004 - 12:51 pm UTC

undo should be mirrored

temp does not have to be (the loss of temp would impact only currently running queries -- if anything. the loss of undo -- that would kill you)

but we do lots of direct reads/writes on temp, if you were to split something out, it would be up there on the list of things to consider.


My system -- asktom -- is just SAME - stripe and mirror everything (really MASE but SAME sounds better). Too small to have a dozen separate devices.


On many systems -- SAME is perfectly OK. On a big high end system, you'll be looking at physical separation for fine tuning.

raid 5 = no way for everything regardless.

RAID5

Sripad, February 18, 2005 - 6:54 am UTC

Hi Tom,

I know RAID5 is not recommended for an OLTP system, but is it ok to use it in the case where we have an external raid controller and the raid array has memory on the controller to enable fast writing of the data directly to RAM rather than the disks? Is it still a problem with the above config to use RAID5??



Tom Kyte
February 18, 2005 - 9:01 am UTC

if you are willing to put up with the minimal availability increase you get, and the slow write performance (you will fill that cache if you do heavy lifting on this machine) sure -- raid5 :)



RAID5

Sripad, February 21, 2005 - 7:59 am UTC

Thanks very much for your prompt reply..

SAME -- oracle 9206

Baqir Hussain, May 23, 2005 - 3:46 pm UTC

It's going to be a 2-Node RAC on OCFS.
Research through your site concludes that SAME (Stripe And Mirror Everything) disks would be the best option for oracle.
I would like to clear up a few things.
After the SAME, is it a good practice to put all redo logs, archive logs, data files, including temp, in one place?



SAME -- on RAC

Baqir Hussain, May 23, 2005 - 4:59 pm UTC

I found the answer on your site.
Thanks

Very useful information, but I have a configuration question.

Sebastian, May 25, 2005 - 7:52 am UTC

Regarding raid 10 and SAME:
We have just bought a new server with 10 disks in all.
8 of the disks will be used for oracle data.
Disk size is 146 GB, 15k rpm.
Database size is close to 50 GB.
Should we make a raid 10 on all disks, or split the disks onto 2 channels with 4 disks each in raid 10 and place archive logs on one set and data on the other?
Answers or suggestions are appreciated.

Tom Kyte
May 25, 2005 - 8:14 am UTC

with just 8 disks -- I would probably opt for "just give me one mount point please". there isn't enough hardware here to really break things out.

planning RAID

Anurag, June 13, 2005 - 10:01 am UTC

Dear Tom,

We are using 9i and now going for 10g. Our database is a DSS. It mostly contains heavy images, even 2GB in size, and GIS data. We have to plan for a RAID server of 1 TB capacity. Which one do we opt for? Our vendor is very confused. Please advise.

Tom Kyte
June 13, 2005 - 11:50 am UTC

I don't do hardware :)

please give an idea at least

anurag, June 13, 2005 - 12:44 pm UTC

Dear Tom,

Please at least give a rough idea; even a vague one will do.

regards

Tom Kyte
June 13, 2005 - 1:05 pm UTC

umm, no. there is totally insufficient data to say anything about anything. It would be less than responsible to say "a thing" here.


why are you buying raid?
what is your goal -- is this for speed, for availability, for administration?
how much money do you have?
what is your company's long term plan for this?
is 1tb the end, or are you getting more later?
do you like direct attach storage, san or nas?
and so on and so on and so on


A "raid server" is vague, beyond vague.


RAID 5 - Configuration

fynx, June 25, 2005 - 6:28 am UTC

Hi Tom,

Sorry to post this question here. I would just like to verify my findings regarding my investigation of my statspack.

Please find my statspack report below.

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
MIG 2551569495 mig 1 9.2.0.1.0 NO fynx


Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 17 24-Jun-05 17:26:40 12 551.6
End Snap: 18 24-Jun-05 17:46:44 12 551.6
Elapsed: 20.07 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 416M Std Block Size: 8K
Shared Pool Size: 160M Log Buffer: 512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 11,235.54 4,509,198.67
Logical reads: 589.47 236,573.67
Block changes: 85.00 34,114.33
Physical reads: 1.12 449.67
Physical writes: 0.73 293.67
User calls: 0.01 4.00
Parses: 0.05 19.67
Hard parses: 0.00 0.00
Sorts: 0.27 107.67
Logons: 0.00 1.33
Executes: 209.64 84,135.67
Transactions: 0.00

% Blocks changed per Read: 14.42 Recursive Call %: 100.00
Rollback per transaction %: 0.00 Rows per Sort: 179.77

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 99.99
Buffer Hit %: 99.81 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 100.00
Execute to Parse %: 99.98 Latch Hit %: 100.00
Parse CPU to Parse Elapsd %: % Non-Parse CPU: 100.00

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 93.82 93.82
% SQL with executions>1: 79.44 79.44
% Memory for SQL w/exec>1: 47.06 47.06

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file parallel write 764 37 35.17
log file parallel write 394 32 30.83
control file parallel write 391 29 27.51
CPU time 3 3.11
db file scattered read 89 2 1.63
-------------------------------------------------------------
Wait Events for DB: MIG Instance: mig Snaps: 17 -18
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write 764 382 37 48 254.7
log file parallel write 394 392 32 81 131.3
control file parallel write 391 0 29 73 130.3
db file scattered read 89 0 2 19 29.7
log file switch completion 4 0 1 209 1.3
async disk IO 13 0 1 42 4.3
log file sync 1 0 0 211 0.3
process startup 3 0 0 49 1.0
db file sequential read 21 0 0 1 7.0
db file single write 1 0 0 23 0.3
log file single write 2 0 0 11 0.7
control file sequential read 238 0 0 0 79.3
latch free 1 0 0 1 0.3
LGWR wait for redo copy 1 0 0 0 0.3
log file sequential read 2 0 0 0 0.7
virtual circuit status 41 41 1,202 29311 13.7
jobq slave wait 66 63 197 2978 22.0
-------------------------------------------------------------
Background Wait Events for DB: MIG Instance: mig Snaps: 17 -18
Current Maximum Initial
Resource Name Utilization Utilization Allocation Limit
------------------------------ ------------ ------------ ---------- ----------
parallel_max_servers 0 5 6 6
-------------------------------------------------------------
init.ora Parameters for DB: MIG Instance: mig Snaps: 17 -18

End value
Parameter Name Begin value (if different)
----------------------------- --------------------------------- --------------
background_dump_dest /oracle/admin/mig/bdump
compatible 9.2.0.0.0
control_files /symtemp/oradata/mig/control01.ct
core_dump_dest /oracle/admin/mig/cdump
db_block_size 8192
db_cache_size 218103808
db_domain
db_file_multiblock_read_count 16
db_files 1900
db_keep_cache_size 218103808
db_name mig
dispatchers (PROTOCOL=TCP) (SERVICE=migXDB)
dml_locks 1000
fast_start_mttr_target 0
global_names FALSE
hash_join_enabled FALSE
instance_name mig
java_pool_size 33554432
job_queue_processes 20
large_pool_size 33554432
open_cursors 500
pga_aggregate_target 209715200
processes 500
query_rewrite_enabled FALSE
remote_login_passwordfile EXCLUSIVE
shared_pool_size 167772160
sort_area_size 524288
star_transformation_enabled FALSE
timed_statistics TRUE
undo_management AUTO
undo_retention 10800
undo_tablespace UNDOTBS2
user_dump_dest /oracle/admin/mig/udump
-------------------------------------------------------------

End of Report

Based on the statspack report, I can see that there is a heavy wait on disks. The RAID implementation used here is RAID5 and, since this is a batch run, RAID5 cannot cope with the write operations....

Regards,
FYNX

Tom Kyte
June 25, 2005 - 9:44 am UTC

there are virtually no end user waits on disk.


db file parallel write 764 37 35.17
log file parallel write 394 32 30.83
control file parallel write 391 29 27.51
CPU time 3 3.11
db file scattered read 89 2 1.63


if you call that "heavy" for 20 minutes :)

db file parallel write -- dbwr waits for this
log file parallel write -- lgwr waits for this
control file parallel write -- ckpt waits for this
CPU time, you used hardly any, looks like an idle system.

db file scattered read - your end users (maybe, might not even have been an end user) waited 2 seconds.




Thanks for the comments

fynx, June 26, 2005 - 1:27 am UTC

Hi Tom,

Thanks for the comments. Actually this is a batch operation where they perform:

for allrecs in 1 .. (1.8m of records)
loop
    -- perform some validations
    -- call some procs
    -- then update the record
    -- insert another record into some table
end loop;

This very process takes more than 5 hours... we are currently modifying this process to be split up based on the primary key. Also, I was under the impression that since this is a heavy write operation, and all of the tablespaces, even the redo log files, reside on a RAID 5 configuration, I recommended that they USE raid 0+1 instead of raid 5. Here are the top SQL stmts from the same statspack:

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
204,152 50,461 4.0 28.8 15.14 18.49 3484153451
Module: OPD_BATCH.OP_EXP_UPD
UPDATE OP_EXP_BL set exp_amt = nvl(exp_amt,0) + nvl(:
b8,0), emk_amt = nvl(emk_amt,0) + nvl(:b7,0) where b
ranch = nvl(:b6,'~') and client_no = :b5
and ccy = :b4 and main_facil = nvl(:b3
,'~') and nvl(inter_facil,'~') = nvl(:b2,'~') and nvl(

101,052 50,472 2.0 14.2 4.55 5.64 242433303
Module: OPD_BATCH.OP_EXP_UPD
SELECT nvl(lfl.amount,0) -lfl.retract_amt + dec
ode (lfl.add_ded,'A', nvl(lfl.temp_lmt_amt,0),
'D',(-1)*nvl(lfl.temp_lmt_amt,0),0) amt,
lfl.ccy, lfl.appr_level, lfl.g
ap_type, lfl.branch

100,956 50,466 2.0 14.2 3.08 3.48 3979193534
Module: OPD_BATCH.OP_EXP_UPD
SELECT 'G' from lm_facility_lmt lfl where c
lient_no = :b2 and facil_code = :b1 and
branch is null

100,932 50,466 2.0 14.2 5.10 5.60 772540357
Module: OPD_BATCH.OP_EXP_UPD
SELECT decode(time_code_ind,'T', :b3 - :b4, :b3 - :b2 ),
gap_type from lm_facility where facil
_code = :b1

100,932 50,466 2.0 14.2 4.86 5.97 3539259487
Module: OPD_BATCH.OP_EXP_UPD
SELECT 'B' from lm_facility_lmt lfl where bran
ch = :b3 and client_no = :b2 and facil_c
ode = :b1

50,898 1 50,898.0 7.2 3.21 4.60 2225600597
DECLARE job BINARY_INTEGER := :job; next_date DATE := :mydate;
broken BOOLEAN := FALSE; BEGIN begin statspack.snap; end; :mydat
e := next_date; IF broken THEN :b := 1; ELSE :b := 0; END IF; EN
D;

50,043 1 50,043.0 7.1 0.80 1.06 4059808258
INSERT into stats$sqltext ( hash_value
, text_subset , piece
, sql_text , address , comman
d_type , last_snap_id )
select st1.hash_value , ss.text_subset

487 27 18.0 0.1 0.02 0.02 2654087038
Module: OPD_BATCH.OP_EXP_UPD
INSERT INTO OP_EXP_BL (branch, client_no, ccy,
SQL ordered by Gets for DB: MIG Instance: mig Snaps: 17 -18
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
main_facil, inter_facil, sub_facil,
exp_amt, emk_amt)
VALUES (:b8, :b7, :b6, :b
5, :b4, :b3, nvl(:b2, 0), nvl(:

473 235 2.0 0.1 0.00 0.09 2963598673
select job, nvl2(last_date, 1, 0) from sys.job$ where (((:1 <= n
ext_date) and (next_date < :2)) or ((last_date is null) and
(next_date < :3))) and (field1 = :4 or (field1 = 0 and 'Y' = :5)
) and (this_date is null) order by next_date, job

and it is backed up by the heavy writes on the following tablespaces (which all reside in the RAID 5 configuration):

Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
UNDOTBS2
0 0 0.0 720 1 0 0.0
USERS
97 0 17.3 13.2 145 0 0 0.0
SYSTEM
14 0 5.7 5.1 3 0 0 0.0
INDX
0 0 0.0 11 0 0 0.0
-------------------------------------------------------------
File IO Stats for DB: MIG Instance: mig Snaps: 17 -18
->ordered by Tablespace, File

Tablespace Filename
------------------------ ----------------------------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
INDX /u1/oradata/mig/indx01.dbf
0 0 11 0 0

SYSTEM /u1/oradata/mig/system01.dbf
14 0 5.7 5.1 3 0 0

UNDOTBS2 /u1/oradata/mig/undotbs2a.dbf
0 0 720 1 0

USERS /u2/oradata/mig/users05.dbf
52 0 22.3 15.6 0 0 0
/u2/oradata/mig/users06.dbf
29 0 17.6 15.5 2 0 0
/u01/oradata/mig/users02.dbf
1 0 0.0 1.0 68 0 0
/u01/oradata/mig/users03.dbf
14 0 0.7 1.0 73 0 0
/u01/oradata/mig/users04.dbf
1 0 0.0 1.0 2 0 0


So, can I say that the main culprit for my system is disk configuration?

Regards,
FYNX




Tom Kyte
June 26, 2005 - 9:36 am UTC

no, you cannot -- you have proven in fact that it is not.

The CPU is under-reported here because this must be a long running stored procedure and it will not update its CPU consumption until it is done. But IO is constantly updated and you are not waiting on IO.

You would wait on log with a "log file sync" and only upon a COMMIT. But PLSQL has a commit optimization that would make this unlikely. And you don't see any log file sync waits.

You would wait on WRITE IO for writes to temp, direct path writes (insert /*+ APPEND */, create table as select and such) but you are not doing any of that, hence you have no WRITE IO waits.

You would wait on READ IO for reads, but you don't have any significant db file scattered|sequential reads.

Nothing here indicates an "IO problem", not even remotely.


Everything indicates "algorithm that cannot scale".


You might split the data by rowid, not by primary key ranges.

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:10498431232211
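The idea behind that link, in a simplified sketch (this is not the exact script from the linked page, and BIG_TABLE is a made-up name): carve the table into non-overlapping rowid ranges, one per extent, and run N copies of the batch, each processing only "where rowid between :min_rid and :max_rid".

select dbms_rowid.rowid_create( 1, o.data_object_id, e.relative_fno,
                                e.block_id, 0 )                    min_rid,
       dbms_rowid.rowid_create( 1, o.data_object_id, e.relative_fno,
                                e.block_id + e.blocks - 1, 32767 ) max_rid
  from dba_extents e, user_objects o
 where e.owner        = user
   and e.segment_name = 'BIG_TABLE'
   and o.object_name  = e.segment_name;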

new hardware, lots of space, lots of drives

Laurie Murray, August 08, 2005 - 11:58 pm UTC

Hi, Tom:
We are migrating our entire PeopleSoft system to a new hardware environment and have the luxury of lots of space on many drives (7.5 TB on 104 drives). This SAN (shared storage) will hold both our production databases (one production database is 65GB, the other is 30GB) and all our test databases and our RMAN backups, arch logs, exports, etc.
In the original post in this thread you said, "Try to dedicate specific devices to
1) online redo
2) archive
3) temp.
"
Since this posting and answer was so long ago (2000) I wanted to ask if I should still try to do that or just do SAME.
I am definitely going to put the arch logs and rman backups on their own stripe set so they won't be using the same physical disks as the databases.
Other than that, for the databases themselves, should I use SAME, using RAID 10? Or would it be better to give the redo logs their very own drives and give the temp files their very own drives? (I do want to mirror temp files to reduce downtime)
thanks in advance!
--Laurie





Tom Kyte
August 09, 2005 - 9:26 am UTC

with the 72 or 76 gig devices you have, it might not be feasible.

You would need to slice off at least 6 of the 7x gig devices to do this and I'm not sure that is reasonable. (disk 1&3 are redo group 1, 2&4 are redo group 2, 5&6 are duplexed archives)

I'd be tempted to raid 10 it just to see if it weren't better than good enough, unless you were excessively high volume and redo was known to be a really big bottleneck.

easier to manage. And you have the ability to move things around in the SAN after the fact typically if you need. Moving redo off to a new device is pretty easy (just create new, drop old). Same with temp.
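For example, moving redo to a new device is just (group numbers, paths and sizes are illustrative only):

-- create new groups on the new device
alter database add logfile group 4 ('/newdev/orcl/redo04.log') size 200m;
alter database add logfile group 5 ('/newdev/orcl/redo05.log') size 200m;

-- once the old groups are INACTIVE (after a log switch and checkpoint), drop them
alter system switch logfile;
alter database drop logfile group 1;
alter database drop logfile group 2;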

this goes with the previous post.

Laurie Murray, August 09, 2005 - 3:25 am UTC

Here are more details of what I was thinking of doing for the disk and raid config of this system. HR production is currently 30GB. Financials production is currently 65GB. HR has about 10 test/dev databases. Financials has about 10 test/dev databases.
This is PeopleSoft. Oracle 9.2.0.6
I'm not happy with so much space wasted with the online redo log location, but I want to mirror at the Oracle level and also at the OS level.

Storage Array (shared between test and production)
1) HR Production Datafiles, RAID1+0, 292GB usable -- 8 disks
   Oracle system software for Production HR

2) FIN Production Datafiles, RAID1+0, 584GB usable -- 16 disks
   Oracle system software for Production FIN

3) RAID5, 1168GB usable -- approx 19 disks
   HR Test Datafiles
   Oracle system software for Test HR

4) RAID5, 1314GB usable -- approx 22 disks
   FIN Test Datafiles
   Oracle System software for Test FIN

5) HR Recovery, RAID1+0, 219GB usable -- 6 disks
   archive logs - HR
   RMAN backups - HR

6) FIN Recovery, RAID1+0, 511GB usable -- 14 disks
   archive logs - FIN
   RMAN backups - FIN

7) FIN temp segs (production), RAID1+0, 73GB usable -- 2 disks

8) HR temp segs (production), RAID1+0, 73GB usable -- 2 disks

9) FIN temp segments (test), RAID0, 73GB usable -- 1 disk

10) HR temp segments (test), RAID0, 73GB usable -- 1 disk

11) Online redo logs & cntrl files - HR Prod., 146GB usable -- 4 disks

12) Online redo logs & cntrl files - HRTest & FINTest, 146GB usable -- 4 disks

13) Online redo logs & cntrl files for FIN production, 146GB usable -- 4 disks



1) and 2) cannot fail at the same time.
1) and 5) cannot fail at the same time
2) and 6) cannot fail at the same time


Note: we want RAID1+0, not RAID0+1.


Tom Kyte
August 09, 2005 - 9:45 am UTC

agree with 10 over 01, 10 is more available, more "robust"


you have a single computer/san here, so 1, 2, 5 and 6 can all fail simultaneously.

In addition to the disks, think about the access to the disks (controllers, whatever is between the computer and the disks). For that, you'll want to get together with your SAN contact to determine the most failsafe way to configure it (realizing that without a disaster recovery site elsewhere, they can all go "down")

Thanks!

Laurie Murray, August 10, 2005 - 5:59 pm UTC

I incorporated your comments and ended up with this (below). I still want to break out the redo on the test systems because I don't want redo on RAID5.
For production, I am still breaking out arch logs and rman backups to get them on different physical disks than the datafiles....for disaster recovery reasons. I broke out temp because it doesn't cost me much disk space. I rolled the redo logs back into the datafile arrays, because after reading your comments I decided it wasn't worth 300GB to break them out on their own drives. Per your suggestion, I am going to talk to the h/w vendor about RAID controller placement and also "what if the whole SAN goes down?".

Even though you are famous (well, in the Oracle community!), you still take the time to help the little people. Thank you so much. Your answer gave me not only ideas for improvement, but also additional confidence in my design.

Storage Array (shared between test and production)


1) HR Production Datafiles, RAID1+0 438GB usable (73GBx6) 12 disks
Oracle system software for Production HR
HR Production online redo logs
HR Production control files


2)FIN Production Datafiles, RAID1+0 730GB usable 20 disks
Oracle system software for Production FIN
FIN Production online redo logs
FIN Production control files

3)RAID5 876GB usable 15 disks
HR Test Datafiles
Oracle system software for Test HR

4) RAID5 1694GB usable 29 disks
FIN Test Datafiles
Oracle System software for Test FIN

5) HR Recovery, RAID1+0 219GB usable 6 disks
archive logs –HR
RMAN backups—HR


6)FIN Recovery, RAID1+0 511GB usable 14 disks
archive logs-FIN
RMAN backups-FIN


7)FIN temp segments (production), RAID1+0 73GB usable 2 disks

8) HR temp segments (production), RAID1+0 73GB usable 2 disks

9) FIN temp segments (test), RAID0 73GB usable 1 disk

10) HR temp segments (test), RAID0 73GB usable 1 disk

11) Online redo logs & control files for HRTest & FINTest, no RAID, 146GB usable (73GBx2) 2 disks

Total disks used: 104


1) and 2) cannot fail at the same time.
1) and 5) cannot fail at the same time
2) and 6) cannot fail at the same time


Tom Kyte
August 11, 2005 - 9:19 am UTC

(remember, disk layout is not my area of expertise! run this past your SAN vendor, please)

ASM

Steve G, August 30, 2005 - 5:51 pm UTC

You mentioned that SAME was the way to go. I have currently been looking into ASM to provide high availability, i.e. 3 Disk Groups.

In this situation, if I have three striped storage arrays, each of which is a Disk Group, would I need to mirror at the hardware level? Doesn't ASM provide the functionality of high availability? It seems like striping and mirroring on the storage array, and on top of that having 3 disk groups for high availability, would be a waste of resources.

After all, storage these days isn't that cheap; at least the really fast, really good SANs don't seem to be cheap.

Tom Kyte
August 31, 2005 - 12:47 pm UTC

you mean triple redundancy in a single disk group I think, but reading on to the second paragraph I'm not sure what you mean.

Yes, I'm sure I don't know what you mean. "three striped storage arrays"??

You would give the disks (luns, whatever) to ASM

ASM would stripe.
ASM would mirror.


Diskgroups are not about High availability, failure groups WITHIN a disk group are.

Sorry for the misunderstanding

Steve G, August 31, 2005 - 4:29 pm UTC

I apologize for the poor info in my post.

In the situation where I had 3 failure groups within a disk group for high availability, would I also need to perform striping and mirroring in the storage array?

From what I understand ASM will mirror the data by having say two log writer processes writing the same data to multiple failure groups.

This is where I am wondering what exactly striping and mirroring at the hardware level would do. It seems that it would be excessive, especially if utilizing Data Guard or other high availability features.

Are there any best practices for utilizing ASM? I have been playing around with it a lot and I really like what I see, I am just having a hard time finding some hard and fast rules for implementing it in a production environment.

Tom Kyte
September 01, 2005 - 1:37 am UTC

In the situation where I had 3 failure groups within a disk group for high
availability would i also need to perform striping and mirroring in the storage
array?


No, the failure groups will be doing the mirroring, and the striping will be done for the disk group.

From what I understand ASM will mirror the data by having say two log writer
processes writing the same data to multiple failure groups.


not log writer, it is the data blocks here that are mirrored, lgwr does redo.


This is where I am wondering what exactly striping and mirroring at the hardware
level would do?


it would be "redundant", yes. If your hardware does mirroring, you don't have to do it with ASM but ASM will stripe.

Your major decision is whether to use a single diskgroup for everything or segregate out say - temp, data, and log from each other. It is mostly a tradeoff between a) ease of use (one big diskgroup) vs b) fine control over the resources with a little more work on your part.



No need to stripe redo?

Russell H, October 11, 2005 - 5:24 pm UTC

If a set of disks is only used for the redo log, would it make sense to mirror only, and not stripe? If I understand correctly, only one LGWR process writes the log, and the only contention for the disks will be from ARCn processes, and will be small.

Tom Kyte
October 12, 2005 - 6:48 am UTC

you can use IO slaves with log writer, so it is not true to say only lgwr writes to them, but in general - that is true.


striping should not affect write performance and if you do mirrored stripes, instead of striped mirrors - you can get better availability than with just mirrors.

Any advice on RAID-5 setup?

Jon, February 14, 2006 - 11:58 am UTC

Hi Tom,

I was hired recently as a DBA for an OLTP database. The DB is being developed by a separate contractor and when it's done it will be delivered and I will maintain it. Currently have the DB production server setup in the rack, but it's not in use. Sits there, I can look at it, but can't do anything. There is another one that they're still working on and once it's done the changes will be ported over to the production.

Anyways, I looked at the physical setup, did some research, and realized it's not good. It has 5 hard drives: 3 for a RAID-5, which will have all the DB files, and 2 hard drives for the OS (actually a mirror setup). Plus, one empty slot that's not being used (I can add a drive if I wish). I would definitely like to change the setup to something more efficient.

Currently, my idea is to move the redo, archive, control, and undo to the mirrored hard drives with the OS. Or, would it be even better to keep one or more of them on the RAID-5? (Probably the one(s) that has the least writes.) Add in the 6th hard drive and move the temporary tablespace to that (from what I've read it can be acceptable to not backup the temp).

From what I have to work with that's the best I can come up with. Do you have any suggestions? Am I totally off base or not?

Any helpful suggestions would be greatly appreciated.
Thanks.



Tom Kyte
February 14, 2006 - 1:27 pm UTC

insufficient disks to really do very much with.

undo is written to in the background by DBWR, just like any other datafile - it is "buffered IO", raid-5 on such a small system will likely be OK.

redo would likely be the biggest bottleneck if anything - and with just two disks - not much you can really do.



Concurrent I/O test

Sudershan, February 14, 2006 - 3:16 pm UTC

OS: AIX
Oracle Version: 9i

Hi Tom,
Our Unix folks are looking into doing some concurrent I/O
testing (single process..and multiple process).
I have to prepare some test cases that they need (each test case has to run at least 15 minutes for them to catch enough stats)...
The tests have been classified into different categories..
like sequential read/write tests, random read-write tests,
mixed tests..
I am trying to think how one can prepare these kinds of statements.. please comment.. here is what I am thinking:
Sequential would be just a select from some tables (on the same physical disks).
Random would be reading/writing datafiles spread across different physical disks.
Mixed is just a mix of the two...
and then the Unix guys simulate it as multi-process vs single process.
What can one do to make sure that each read and write goes against the disk..vs database buffers..

Thanks


Tom Kyte
February 14, 2006 - 3:28 pm UTC

... What can one do to make sure that each read and write goes against the disk..vs
database buffers.....

well, if you are using buffered unix file systems, that'll be the first thing you need to "turn off". If you don't, we'll think we are doing physical IO, but we won't be.

And in 10g, you can alter system flush buffer_cache;

In 9i, you can offline and then online the tablespaces - that'll have the same effect.
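
A minimal sketch of both approaches (the tablespace name is hypothetical). Note as well that if the datafiles sit on a buffered filesystem, the test will measure the filesystem cache rather than the disks unless direct I/O is enabled:

-- 10g: flush the buffer cache between test runs
alter system flush buffer_cache;

-- 9i: take the tablespace holding the test data offline and back online instead
alter tablespace test_data offline;
alter tablespace test_data online;

-- assumption: to bypass a buffered filesystem you would also enable direct/async
-- I/O at the database level (requires an instance restart)
alter system set filesystemio_options = setall scope=spfile;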



Tablespace Offline and DIsk write

Sudershan, February 14, 2006 - 5:21 pm UTC

"When OFFLINE NORMAL is specified, Oracle takes a checkpoint for all datafiles of the tablespace as it takes them offline. NORMAL is the default."

So, everything in the buffer cache (for the tablespace) will be written to the disk or expired...
I guess..
--Read test
1. do offline/online
2. then read
--Write test
1. do offline/online
2. do some inserts


keep repeating (1 & 2) for the duration of the test..

comments..


Tom Kyte
February 14, 2006 - 5:40 pm UTC

if and only if you are not using a buffered file system, which most people would be using in many cases.

If you do that - forget it, the numbers will be entirely and wholly meaningless.

Worse than meaningless - entirely MISLEADING and wrong really.

Another question on the Raid-5 Setup

Jon, February 16, 2006 - 11:24 am UTC

Thanks Tom for the previous response.

Yeah, my DB is relatively small, so I probably shouldn't worry too much about the RAID-5 (for now, we'll see how it does when it's up and running). With that said, I then turned my attention to simply having a good backup strategy.

The way it is now the whole DB is on the RAID-5 (3 disk setup) and once a day there is a full backup via Veritas. The necessary files are copied over to a tape. I can also put the DB in archivelog mode, but as far as I'm aware, this is pointless. The archives, redos and control files aren't multiplexed at all.

What I have to work with is a 4th hard drive which currently only has the OS (it's also mirrored to a 5th hard drive) and 6th slot which currently doesn't have a hard drive (but can if I believe we need one).

My current idea is to put in a 6th hard drive and use it to multiplex the redos and control files between it and the RAID-5. The archives will solely be stored on the 6th hard drive. If HD6 fails, then there are still the redos and control files on the RAID-5. I lose the archives, but that's fine for now because the DB is still running fine. For total loss of the current day's work, 3 out of 4 hard drives would have to fail in a short amount of time.

Another possibility: if I'm able to use the hard drive with the OS on it, I can put the redos on that, which will allow them to be automatically mirrored to a second hard drive. There would be no need to multiplex the redo logs to the RAID-5, causing less IO contention than if the redos, archives and control files were written to the same hard drive. Though, there is a question of whether having the OS and redos on the same hard drive will cause worse IO contention (I really don't know how many reads/writes the OS will do during normal operations). Losing the current day's work now requires 3 out of 6 hard drives to fail.

The third idea involves moving the archives, redos and control files to the OS hard drive and having everything mirrored. Now total loss requires 4 out of 5 hard drives to fail (no need for a 6th hd).

So, with weighing the need for a solid backup strategy and having as little IO contention as possible, what seems like the best idea?

Thanks in advance.

Tom Kyte
February 17, 2006 - 1:08 pm UTC

archive log mode is not pointless. You would have a goal of getting the archives OFF of the system frequently, not just once a day.

get the archives OFF and onto something else, often.

Otherwise I want you to sign a piece of paper that says "WE WILL LOSE ALL CHANGES MADE SINCE LAST BACKUP SOME DAY. I don't know what day, but we will. Signed (your DBA): "

And give it to management.

getting archives onto "hd6" is "ok" but MACHINES fail/blow up/get dropped/whatever - the loss of a single disk - hardly happens, the loss of everything - happens more.

I'd work on getting them onto another machine/tape with greater frequency.
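
One hedged sketch of "getting them off" automatically - assuming a version and edition where log_archive_dest_n is available, and a hypothetical NFS mount that lives on another machine. A scheduled RMAN job that backs the archives up to tape and deletes them would serve the same goal:

-- write every archived log to a second, off-box destination as it is produced
alter system set log_archive_dest_2 = 'LOCATION=/mnt/backup_host/arch OPTIONAL' scope=both;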

RE: Another question on the Raid-5 Setup

Jon, February 17, 2006 - 1:50 pm UTC

Tom,

Oh, I'm well aware that archive log mode is not pointless; I just meant with the current setup archive log mode is rendered useless. All files are on one RAID-5 and it's only backed up once a day. Archive logs are not going to help you when we lose the RAID-5 in the middle of the day.

Actually, I had originally asked about a direct connection to another computer so at least all the logs can be saved there. I was met with reluctance.

Me trying to get the logs saved somewhere else on the server that isn't the RAID-5 is kind of my compromise.

But, thanks for the response. I'll try to get it changed so that updates, at least for the archive logs, are more frequent than once a day. I need to come up with a really good speech that will scare the crap out of them....


Tom Kyte
February 17, 2006 - 5:01 pm UTC

no it wasn't, archives should be gotten off onto something else - soon.

archives won't help you when you lose disks 1, 3 and 6 either.


Print out my message above and have THEM all sign it.

write back cache

Mark A. Gold, February 21, 2006 - 1:09 pm UTC

When considering mass storage devices and RAID levels I am also faced with the
decision to use or not use write back cache. What are your thoughts on the
matter?

P.S. My apologies for the cross posting.

Tom Kyte
February 22, 2006 - 8:07 am UTC

well, look at the other place. (If you know 'cross posting' is bad....?)

what about instance

A reader, February 22, 2006 - 9:58 am UTC

You said earlier in this thread that
<quote>
o no raid or raid 0 for temporary datafiles (used with temporary
tablespaces). no raid/raid 0 is sufficient. If you lose these, who cares? You
want speed on these, not reliability. If a disk fails, drop and recreate temp
elsewhere.
</quote>

But when Oracle is doing processing on tempfiles and the disk fails during that, won't the Oracle instance crash, which will bring the whole database down?


Thanks


Tom Kyte
February 22, 2006 - 10:23 am UTC

it will not crash the entire instance, only sessions having something in temp will be affected.

If you don't care about that, no raid/raid 0 is sufficient.

Re: Followup

Mark A. Gold, February 22, 2006 - 2:32 pm UTC

Yes I know it's bad and I'm man enough to admit it! That'll teach me to click and not look.

Information Please -- Having Problems

Michael Cornwell, August 13, 2006 - 12:59 am UTC

I have an old RAID system (Aii Harmonics II) - 10 drives, 18 gig each, Adaptec AHA 2940U2W adapter. Background: Aii Harmonics was bought by Agfa Inc - all drivers, information, and documentation stopped at that point.

After upgrading to XP Pro I Can not get the configuration working.
I run Scan on the adapter ID's 0-15 ID#0=aii CV640
ID#7=AHA 2940U2W -- Error ID#0 "Says it is not a SCSI Disk Device"

Tom Kyte
August 13, 2006 - 9:14 am UTC

sorry, not too much I (here at Oracle) can do?



striping performance?

A reader, August 13, 2006 - 11:42 pm UTC

Hi Tom

In some previous post, why did you say that striping should not affect write performance?
I think it should: the more disks, the more processes can write, and the more I/O bandwidth. Am I correct?

Thanks!

Tom Kyte
August 14, 2006 - 10:45 am UTC

instead of being vague with "in some previous post" please be more precise and point us to the exact quote in question.

the first thing I wrote about in the original answer included for example:

...
raid 0+1 for rollback. It get written to lots. It is important to have
protected. We cannot multiplex them so let the OS do it. Use this for
datafiles you believe will be HEAVILY written.

...
o raid 5 (unless you can do raid 0+1 for all of course) for datafiles that
experience what you determine to be "medium" or "moderate" write activity.

........

I seem to have said the opposite of what you state.

Striping performance

A reader, August 14, 2006 - 9:21 pm UTC

Hi Tom

I quote what you said.

striping should not affect write performance and if you do mirrored stripes,
instead of striped mirrors - you can get better availability than with just
mirrors.

Can you pls explain?

Thanks

Tom Kyte
August 15, 2006 - 7:38 am UTC

we were talking about redo. optimally arch and lgwr "own" their own disk. striping will not really affect the write performance overall, in normal circumstances.

Striping performance

A reader, August 15, 2006 - 9:14 pm UTC

Hi Tom

Why can't striping improve the write performance of LGWR and ARCH? If LGWR gets some jobs to do, it will transfer these to the I/O system, and if more drives can participate, the job will be finished faster, right? I think that's why Oracle names the LGWR write event "log file parallel write". In SAME, online logs are striped across all disks; I think this should be better.

Thanks

Tom Kyte
August 16, 2006 - 8:13 am UTC

it is large sequential IO done by a single process.


striping may

a) increase
b) decrease
c) not affect at all

performance. How's that?

Striping performance

A reader, August 16, 2006 - 9:10 am UTC

Hi Tom

So if the instance only has one DBWR, striping doesn't improve performance at all, just like LGWR?

Thanks

Tom Kyte
August 16, 2006 - 9:19 am UTC

a, b, and c fully apply in all cases.

Striping performance

A reader, August 16, 2006 - 9:41 am UTC

ahhhhh!!

Tom, can you explain why striping causes a, b, c respectively? When I get a new system, what is the basic rule for designing the I/O system?

Thanks! pls save me...


Tom Kyte
August 16, 2006 - 10:20 am UTC

You can come up with plausible cases for all three - whereby striping would make it faster, slower, not any different.


Say lgwr and arch had their own dedicated device.
Now you stripe it in with the "greater good" (everything else).
Whereas we had dedicated resources, now we have to share.

for example.

Striping performance

A reader, August 16, 2006 - 10:38 am UTC

Tom

Let's say only one process, LGWR or 1 DBWR - can this single process's write performance benefit from striping across more hard disks?

Thanks

Tom Kyte
August 16, 2006 - 10:42 am UTC

yes.

if I make various and sundry assumptions.

such as "currently there is one disk" for example and by going to striping there are now "two disks"

"it depends"

Striping performance

A reader, August 16, 2006 - 10:50 am UTC

Hi Tom

So the best way to design the best IO subsystem is to benchmark the various possible solutions, right?

Thanks

Tom Kyte
August 16, 2006 - 11:23 am UTC

understand

a) your budget
b) your constraints (eg: thou shalt use this SAN)
c) how things work
d) what your "system" is going to do (workload)

to get a best guess.


could be "i'm a departmental thing, light to moderate use, I'll just stripe everything - it'll be better than good enough"

could be "I'm a data warehouse of never before seen size..." "I'm a transaction system where much thought must be given to making sure lgwr is able to go full throttle"

Striping performance

A reader, August 16, 2006 - 11:35 pm UTC

Hi Tom

I have read many intelligent threads here before, but this one is the most impressive for me. I had been thinking SAME was the silver bullet, but I forgot the essence of an RDBMS: nothing is absolutely better!

REALLY THANK YOU AND HUG YOU IF YOU LIKE :)

Not to beat a dead horse, but...

Rob, August 18, 2006 - 4:47 pm UTC

Tom,

First, I greatly appreciate your site. You employ clear, concise language that is easy to follow. I am implementing a new 10g RAC on Linux that will use a SAN, and have a question regarding this comment of yours:

"no raid, raid 0 or raid 0+1 for online redo logs AND control files.
You should still let us multiplex them ourselves even if you mirror
them. We have more opportunities for failure if the raid subsystem reports a
"warning" back to us -- if we have multiplexed them -- we are OK with that."

Presuming that we're using a RAID scheme which provides redundancy/protection, is the RAID subsystem's "warning" interpreted differently depending on whether ALL redo log group members reside on a SINGLE array, or on DIFFERENT arrays?

For instance, if I use a single RAID array for all database-related files in a Stripe-And-Mirror-Everything (SAME) configuration, and one of the disks fails, then all redo log group members have lost their mirroring.

On the other hand, suppose I put each redo log group member on different RAID arrays. If one disk on one array goes down, then one group member will lose its mirroring, but the group member(s) on the other RAID array(s) still have their mirroring.

The situations are slightly different, and I'm wondering if Oracle interprets the warnings differently. My guess would be "yes," because in the first case of a single array, Oracle would be warned about the writes for all redo log group members, while in the second case, Oracle would only be warned about one member.

However, I don't know whether that is actually the case, so I'm asking the question. Thanks!

Tom Kyte
August 18, 2006 - 4:56 pm UTC

you'd want them on presumably different physical devices yes - if a raid array started going "funky", returning soft errors or whatever - you could assume a high probability that other files on the same devices would do the same.

The fact is raid will go down; the SANs that "never go offline" - do. You can skip the Oracle mirroring of these files if you like - but given the size of the data we are talking about here (small, relatively speaking)...
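
A small sketch of the Oracle-level multiplexing being discussed - one member of each group per array, with hypothetical paths and group numbers:

-- losing one array still leaves a good member of every group on the other array
alter database add logfile group 11
  ('/array1/oradata/prod/redo11a.log',
   '/array2/oradata/prod/redo11b.log') size 512m;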

Raid 5 is actually quite good for oracle

John Silver, September 12, 2006 - 3:29 pm UTC

In the correct configuration, that is. This guy has done some benchmarks that show write performance on a hardware RAID 5 is almost exactly the same as on mirrors and RAID 10.

http://spiralbound.net/2006/09/08/why-modern-raid-5-is-ideal-for-oracle-databases/

Tom Kyte
September 13, 2006 - 6:48 am UTC

I might be more impressed if he actually used, well, a database in the benchmark.

Also, did he or did he not "flood the cache"? That is where the problem comes in - after the front-end cache many arrays have is overloaded.

Raid 5 followup

John Silver, September 13, 2006 - 9:06 am UTC

I've emailed him, but have not heard back yet. I would imagine though that he did flood the cache since his device had a 1GB cache, and he did all his tests on files up to 2GB. It is true that his graphs show speed dropping off at the 1GB point, but this was the case on his mirror and raid 10 as well.

As far as using a database for the tests, I don't know how he could have graphed the results. As far as I can tell, writes to random points within a large file in various chunk sizes should represent updates pretty well.

Tom Kyte
September 13, 2006 - 2:43 pm UTC

as for the graphs, something like

"Hey, I'm loading my database using direct IO (warehouse, direct path loads) so we know we are slamming the datafiles (not using buffered IO - which I believe I said above was perfectly OK, yup, I did) and the online redo logs and archives"

To test raid 5 for databases, one would presume you would use - well - a database and raid 5, not just one or the other.

2gb is "small stuff" in the world of warehousing where raid 5 is, well, not a choice you should probably make.

Nothing represents a database like - a database.

Oracle Disk Configuration

Eduardo, October 19, 2006 - 6:13 pm UTC

Hi Tom, your comments would be much appreciated. We are going to change our environment (from COMPAQ to IBM) and we have a new SAN, vendor IBM.

Can you give me your opinion about this new disk configuration:

LOCAL DISK 1 (RAID1 2x36 gb mirrored)

* Oracle Software

LOCAL DISK 2 (RAID1 2x36 gb mirrored)

* ARC's ALTERNATE

LUN 1 (RAID1 2x36 mirrored)

* SYSTEM
* CONTROL FILE 1

LUN 2 (RAID1 2x36 mirrored)

* REDO LOG
* CONTROL FILE 2

LUN 3 (RAID1 2x36 mirrored)

* ARC's MANDATORY
* UNDO

LUN 4 (RAID5 3x36 mirrored)

* DATA
* INDEX

LUN 5 (RAID0 1x36 mirrored)

* TEMPORARY


Thank you very much

Tom Kyte
October 20, 2006 - 4:06 am UTC

not really. anyone that says they can based on your inputs is making stuff up.

Oracle Disk Configuration

Eduardo, October 19, 2006 - 6:17 pm UTC

Hi again Tom

Sorry, I made a bad copy & paste on LUN 4 and 5. The correct configuration for them is:

LUN 4 (RAID5 3x36)

* DATA
* INDEX

LUN 5 (RAID0 1x36)

* TEMPORARY


Thank you very much.

Does your first answer still apply

Aps, February 07, 2007 - 5:39 am UTC

Hi Tom -

Your original answer to the original question is almost 6.5 years old now.

How would you answer the original question now, given a 10.2.0.2 database, an Oracle Server with 3 x 73GB internal disks (15k speed), and a SAN with 25 300GB disks (10k speed each)?

What would you change from your original response in this situation?

Thanks,
Aps
Tom Kyte
February 07, 2007 - 6:40 pm UTC

nothing.

good to know

Doc J., February 09, 2007 - 8:00 am UTC



OK.

Then how about with ASM using external redundancy?

If I wanted to use ASM with external redundancy, then there's really no need to concern myself with these different RAID levels for different categories. Or am I wrong?

Can you speak about some advantages and disadvantages of just using RAID5 with ASM (external redundancy) with a Hot Spare disk. Ideally I'd like to use RAID10 with normal redundancy, but can't afford the loss of disk space needed to do it. So I'm leaning towards RAID5 for the external redundancy.

I don't immediately see any major problem with RAID5 / ASM / external redundancy, but then again I don't have the experience with it that you or this community may have with this configuration.

Any insights or opinions or hints are very much appreciated. Thanks to all in advance. . .




Tom Kyte
February 09, 2007 - 8:29 am UTC

except from a performance and availability perspective - sure, if you can ignore performance and availability then you need not concern yourself!


there is no loss of disk space, there is simply "increased cost for increased availability"

raid 5 - much less available and performant than raid 10

raid 5 cheaper than raid 10


great information

A reader, February 09, 2007 - 5:03 pm UTC

Thanks Tom ,this is very useful information.

2 follow up questions.
1) Is it absolutely necessary to have a disk set aside as a Hot Spare in a RAID10 configuration? I know it depends on a lot of variables, and it may be cited as a "nice to have", but rather is it an unavoidable configuration necessity?

2) Can you point out the differences between using ASM with RAID10 with external redundancy vs. ASM with normal redundancy (specified with the ASM template). Is there any difference between these two options at the end of the day?

Thanks again for your valuable input


Tom Kyte
February 12, 2007 - 9:41 am UTC

1) nothing is absolutely necessary

but if you have a failure, without hot spares, you are running unprotected. You are back to a single disk.

2) There conceptually is no difference. You achieve mirroring and striping.

Qualifying the scenario further

A reader, February 27, 2007 - 2:07 am UTC

1. Given that I have 14 x 146GB SAS 15K drives to utilise, and I want to use ASM, is it reasonable to assume that I'll get 7 x 146GB in total space?

i.e. inherent 'SAME' of ASM is similar to RAID10 space usage wise.

2. Given that in order to actually connect those 14 drives I have a hardware RAID controller already, is it then preferable to utilise this hardware's capability and let the controller do the mirroring, and let ASM only do the striping (i.e. no failure groups)?

By preferable, I really mean: what would you choose, Tom? ;)

Thanks...
Tom Kyte
February 27, 2007 - 10:40 am UTC

1) depends

you can use

o external redundancy - no mirroring
o normal redundancy - paired mirrors, 2 copies
o high redundancy - triple mirroring

so - which do you pick, then divide by 1, 2 or 3 (a sketch of all three follows this answer).


2) depends on what the raid controller does. Raid by itself does not IMPLY, MEAN or DENOTE "we are protected"


and what I would choose is not relevant, for I don't have your requirements and needs. Suppose your disks were hardware raid-5. That is good read performance, so-so write performance and somewhat OK protection from failure.

If I needed really good protection from failure, raid-5 might not meet my needs.
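
To make the three redundancy choices above concrete, here is a minimal sketch with hypothetical device names - three alternative ways to create the same disk group, not statements to run together:

-- external redundancy: the array mirrors, ASM only stripes
create diskgroup data external redundancy
  disk '/dev/rdsk/lun1', '/dev/rdsk/lun2';

-- normal redundancy: ASM keeps two copies across failure groups (divide raw space by 2)
create diskgroup data normal redundancy
  failgroup fg1 disk '/dev/rdsk/d1', '/dev/rdsk/d2'
  failgroup fg2 disk '/dev/rdsk/d3', '/dev/rdsk/d4';

-- high redundancy: ASM keeps three copies (divide raw space by 3)
create diskgroup data high redundancy
  failgroup fg1 disk '/dev/rdsk/d1'
  failgroup fg2 disk '/dev/rdsk/d2'
  failgroup fg3 disk '/dev/rdsk/d3';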

Qualifying further...

A reader, February 28, 2007 - 6:08 pm UTC

Ok, let me clarify: I'm definitely looking for RAID10 overall. My dilemma is, do I let the hardware raid controller (which can do any raid redundancy type) do the mirroring, given that I have it available to utilize... or let ASM do the striping AND mirroring?

Tom Kyte
February 28, 2007 - 10:04 pm UTC

if you want mirrors, normal redundancy, you'll get 50% of the disk you have - yes.

ASM will always stripe
you want it to mirror as well

A reader, February 28, 2007 - 11:55 pm UTC

Thanks Tom. I'll let the hardware controller do the mirroring. I was just trying to see if you would opt for that given the same scenario (i.e. already owning the controller hardware, and already requiring overall RAID10 strategy).

What happens when the disk that has temp fails?

Dev, April 30, 2007 - 3:40 am UTC

Hi Tom,
You said "no raid or raid 0 for temporary datafiles (used with temporary tablespaces). no raid/raid 0 is sufficient. If you lose these, who cares? You want speed on these, not reliability. If a disk fails, drop and recreate temp elsewhere."
What happens when the disk that has temp fails? Does the app that use the tablespace as temp continue functioning normally?
Thanks,
Dev

Tom Kyte
April 30, 2007 - 9:39 am UTC

any request that needed data

a) that was written to that disk
b) to be stored to that disk

would obviously fail at that point. You would recreate temp somewhere else so that b) is taken care of.

Things that wrote to temp and are then trying to read it back (after temp failed) will of course receive an error at that point.
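
A sketch of "recreate temp somewhere else" after such a failure - the tablespace name, path and size are hypothetical:

-- create a replacement temporary tablespace on a surviving disk
create temporary tablespace temp2 tempfile '/u05/oradata/prod/temp2_01.dbf' size 4096m;

-- point new sessions at it, then drop the damaged one
alter database default temporary tablespace temp2;
drop tablespace temp including contents and datafiles;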

Which redo log member does ARCH read

Rory Concepcion, June 14, 2007 - 3:40 am UTC

Hi Tom,

Considering that I have two members for each log group,
which member does ARCH process read to archive the online redo logs? The first member or the second member? Or does ARCH read both?
Or does Oracle have a function to randomly choose any member based from performance etc?

RAID 0

Deepak, September 05, 2007 - 2:05 pm UTC

Hi Tom,

If one has three disks. And the goal is to reduce i/o contention of the tablespace(TBS01).
Which of the following would be a better solution and why?

1>RAID 0 of three disks and a single file (of TBS01) striped across

2>Three files for the tablespace(TBS01) spread across all the three disks

Considering the fact that oracle distributes the extents equally across the files.
Tom Kyte
September 05, 2007 - 5:42 pm UTC

they would generally be about the same - they both "stripe"

in the year 2007, probably #1 since it is pretty ubiquitous (raid capabilities), especially since ASM that comes with the database is there.

Use Hardware RAID1 and ASM to Stripe?

LJ, November 24, 2007 - 10:17 am UTC

Hi Tom,

From the documentation:
http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/configbp.htm

"Oracle recommends that you configure redundancy in the storage array by enabling RAID protection, such as RAID1 (mirroring) or RAID5 (striping plus parity). For example, to create an ASM disk group where redundancy is provided by the storage array, first create the RAID-protected logical unit numbers (LUNs) in the storage array, and then create the ASM disk group using the EXTERNAL REDUNDANCY clause:

CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK
'/devices/lun1','/devices/lun2','/devices/lun3','/devices/lun4';


If the storage array does not offer the desired level of redundancy, or if there is a need to configure redundancy across multiple storage arrays, then use ASM redundancy. ASM provides redundancy with the use of failure groups, which are defined during disk group creation. ASM redundancy can be either Normal redundancy, where files are two-way mirrored, or high redundancy, where files are three-way mirrored. Once a disk group is created, the redundancy level cannot be changed."

So it sounds like we can ask the Server/SAN guy to give us our disks in RAID1 (mirroring) and let ASM handle the striping, essentially giving us a RAID10. Is that how ASM can work?
If it does make sense, then what would a current RAID10 disk layout which uses ASM look like? i.e., the 0 part of the existing RAID10 is already doing striping, but then layer ASM on top of the existing striping and ASM stripes again? Like striping a stripe? Or do I see this incorrectly?

I suppose I mean to ask if you see any advantages/disadvantage or no difference if we asked the Server/SAN guy for RAID1 and let ASM do the striping; or does ASM need/use the 0 part of RAID10 to its advantage somehow.

Thanks for your thoughts; even though you're not a server guy, we all value your opinion.

LJ
Tom Kyte
November 26, 2007 - 12:57 pm UTC

... So it sounds like we can ask the Server/SAN guy to give us our disks in RAID1 (mirroring) and let ASM handle the striping, essentially giving us a RAID10. Is that how ASM can work? ...


Yes.


ASM will always stripe, and striping a stripe is perfectly OK, normal.

So, ASM will always stripe, and if the underlying devices are already striping themselves, that is OK.

Cool!!!

Louis, November 27, 2007 - 7:09 am UTC

So basically, we could just leave the SAN guy out of the mix and use Normal Redundancy from ASM to get the same effect as asking the SAN guy for RAID1.

Or if we have 30 disks to work with, give 26 disks to Oracle with ASM normal redundancy and make the other 4 RAID10 using the hardware RAID controller for the ORACLE_HOME or whatever else.

(*clink, light bulb goes on)



Tom Kyte
November 27, 2007 - 3:45 pm UTC

yes, you can do that.

ASM redo log multiplexing

Marc, February 19, 2008 - 3:53 pm UTC

Tom,

Is there any need to multiplex redo logs if we are using ASM which sits on a SAN? What are the real advantages/disadvantages of multiplexing redo logs in ASM, especially in the same diskgroup.


So, S.A.M.E. or RAID10?

Chris Stewart, March 04, 2008 - 12:11 pm UTC

It appears that both have been offered up as solutions at one point or another in this thread. Would you opt for RAID10 in most common scenarios, or S.A.M.E.?

With regards to S.A.M.E., are you suggesting that we RAID0+1 into a single logical drive, or into multiple logical drives for redo logs, temp, archive, rollback, data, and the OS respectively?
Tom Kyte
March 04, 2008 - 1:15 pm UTC

SAME should be MASE really... ASM is like raid 10 - mirrored stripes.

depends on how much you are willing to spend, do you want raid 10 for everything, or would you like to use differing levels of raid

vendor benchmarks

JM, March 04, 2008 - 4:40 pm UTC

I have seen a few whitepapers around done by various vendors that show that separating disks into a bunch of different 1+0 (all the same RAID config) groups didn't perform any better than taking all disks and making a single 1+0 group and using that for redo, data files, etc. The white paper I am thinking of was done by veritas.
Tom Kyte
March 04, 2008 - 7:48 pm UTC

sure, no argument here.

but what about using DIFFERENT raid levels - for $$$$ reasons. That is what I was saying..

raid 5 - perfectly OK many times for many people for data files. Exceptions would be data warehouse with large direct path operations and high end oltp.

and so on, as listed at the top.

Just a little question

Sylvain, March 05, 2008 - 10:05 am UTC

Hi Tom (great site, congrats),

We have a new IBM AIX server with 2 non-mirrored disks (A and B) and 4 disks in RAID10 (hardware controller, C). If I understand correctly what you're saying (if not, sorry, I'm French!), I can expect to have quite good performance by placing:
- OS/Oracle software on disk A/B (software mirroring by IBM for O/S & maybe oracle software)
- DATA / INDEX / UNDO on disk C (hardware RAID10)
- TEMP on disk A (no recovery needs)
- ONLINE REDO & control files multiplexed by oracle on A/B/C

Am I correct? (it will be a <100Gb database, used to make reporting)

Thanks a lot Tom, & sorry for my English ;-)

Sylvain
Tom Kyte
March 05, 2008 - 2:57 pm UTC

... I can expect to have quite good performance by placing ....

you have very few disks, nominal performance maybe. adequate, "ok".

probably everything on C, multiplex to B for control/redo and use A for OS

there just isn't enough disk here

And on a SAN

John, September 04, 2010 - 11:01 am UTC

Can the considerations on raid 0+1 also be applied to SAN storage like EMC Symmetrix DMX3? For a classical setup, raid 0 multiplies the bandwidth by roughly the number of devices (3 disks at 100MB/s gives a single full scan on the database approximately 300MB/s). Does SAN storage give this advantage, or is the bandwidth far less than this? (I suppose the FC bandwidth is more than 300MB/s.)
Thanks!
Tom Kyte
September 09, 2010 - 7:35 pm UTC

A SAN is just a bunch of disks in a cabinet (with computers and software on top to do striping, hot swapping, mirroring, rebalancing, etc). As long as the SAN is set up correctly (meaning - you are not really sharing your devices with everyone else, you have dedicated resources) it is pretty much the same as if they were connected to you. The raid story holds true for them.

San flash cash

John, September 11, 2010 - 3:08 pm UTC

Ok thank you. I didn't explain it in the previous post, but I think that flash cache (SAN flash cache) could narrow the gap between physical disks in raid 0 and the database platform...

Thank you for your response :)

Best Practice RAID for 11g RAC & ASM

Duncan, September 21, 2010 - 3:09 am UTC

You say that it is OK to leave all the redundancy to ASM, or use RAID on the SAN, and ASM redundancy on top of that. But the documents I have searched are mostly Oracle 10g, and I cannot find a best practice recommendation for the RAID level of a SAN with ASM.
In 11g, ASM can now store the voting disks and Oracle Cluster Registry for RAC, of which I need three copies, so I need three disk groups. That in itself is unfortunate because only two diskgroups are recommended.
I'm thinking that I need to have three LUNs on the SAN, two large ones for diskgroups 1 & 2 (data & flashback), and a small one for the 3rd copy of RAC files. I will only have one virtual disk in each group, so that should be configured as RAID 10. Is that best practice?
The alternative seems to be to ask for the actual discs and hand the redundancy to ASM.

DB Setup

Slavko, February 02, 2011 - 11:01 am UTC

Hi Tom,

I actually did not know where to put this question. This was the closest I could find.

We are actually taking over a whole system from another company.
We are going to do the Oracle installation on a Red Hat Linux server with 36GB of internal memory and 3 disks. My question is how to distribute the Oracle software, datafiles, redo logs and archived redo logs with regard to maximum disk availability.
Also, only Oracle will run on this server. What would be a good starting point with respect to the SGA? This is for an online gaming company, so we have a lot of users.

We are considering running a standby database (on another Linux box with the same setup) with default settings with regard to performance (maximum performance). Does this mean you don't lose any performance on the primary, or do you lose some but not much?

With Regards,
Tom Kyte
February 02, 2011 - 1:16 pm UTC

with just 3 disks you don't have much to work with at all. Availability is not going to be easy to achieve - you don't have enough physically separate hardware components to mirror everything.

You could give one disk to the os for software and the other two to ASM for mirroring - but if the OS disk failed you are out of business. You need more components in there to be more available - more redundancy.

as for the SGA, just max it out - you don't need the memory for anything else. Use the advisors to set individual components or just set the SGA_TARGET and let the database divvy up the memory.

Max performance means "least impact". It'll have some impact - everything will - but the least possible impact.
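
For example - a hedged sketch only, the sizes depend on what the OS and the PGA still need out of that 36GB:

-- let the database divide one large target among the individual pools
alter system set sga_max_size = 28g scope=spfile;
alter system set sga_target = 28g scope=spfile;
-- restart the instance for sga_max_size to take effect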

that much SGA

RussellH, February 03, 2011 - 6:04 pm UTC

I'm surprised that you would say to "just max out" the SGA on a system with 36 GB of memory if it is not needed. Do you think they will run into issues with very large page tables by default, if they don't have HugePages set up?

http://kevinclosson.wordpress.com/2010/09/28/configuring-linux-hugepages-for-oracle-database-is-just-too-difficult-part-i/
Tom Kyte
February 04, 2011 - 8:34 am UTC

one would assume they knew how to configure such things - that they knew their operating system.

why bother having 36gb of ram if you are not going to use it, remember the only thing on this machine is their database.

Raid for my new Dell box

shafi M, November 25, 2011 - 12:10 am UTC

Dear Sir, I wish to know the best raid level possible (considering all Oracle data/index/log/archive files and Linux OS files) for my new Dell box with 6 x 300 GB disks.

Kindly advise as I want to configure now.

Thank You!
Shafi
Tom Kyte
November 28, 2011 - 9:44 am UTC

You tell us - are you space conscious? Do you want the best overall performance? Do you want the highest availability? Do you want "pretty good availability"?

I might suggest - no raid at the OS level for the database disk - give the disks to ASM and let it do the striping and mirroring for you.
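
As a rough illustration only - device names are hypothetical and it assumes the OS sits on its own mirrored pair, leaving four disks for the database:

-- give the remaining disks to ASM and let normal redundancy mirror them
create diskgroup data normal redundancy
  failgroup f1 disk '/dev/sdc1', '/dev/sdd1'
  failgroup f2 disk '/dev/sde1', '/dev/sdf1';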


RAID

A reader, June 16, 2012 - 1:36 pm UTC


RAID for Database

Hashim, March 22, 2014 - 7:59 am UTC

I am planning to purchase a database server. Here I would like to configure two types of RAID: RAID5 (3 HDDs) for the operating system and RAID10 (3 HDDs) for the DB.
1) What is the minimum number of HDDs for RAID10?

2) If my concept is wrong, please suggest a better one.

Thanks