Stripe and Mirror Everything, including redo logs?


Question and Answer

Tom Kyte

Thanks for the question, Clement.

Asked: June 27, 2002 - 12:20 pm UTC

Last updated: November 21, 2012 - 9:17 am UTC

Version: 9.0.1

Viewed 1000+ times

You Asked

I have read the OpenWorld presentation "Optimal Storage Configuration Made Easy" ( http://technet.oracle.com/deploy/availability/pdf/oow2000_sane.pdf )

Given that Oracle now has OMF (Oracle Managed Files), in which we only provide one mount point and all files are placed on that one mount, should DBAs be concerned with multiplexing redo logs, control files and archived logs?

The above presentation makes good points that the operating system will do a better job at distributing I/O than the DBA ever could. Is it now safe to stop multiplexing Oracle files, since multiplexing will just double the I/O to the same set of disk drives? I know that some DBAs like to multiplex as a safeguard against human error, but there is nothing to stop a careless DBA from deleting all of the multiplexed copies of a file.

Clement

and Tom said...

YES, the DBA SHOULD BE CONCERNED.


The parameters are actually:

db_create_file_dest string
db_create_online_log_dest_1 string
db_create_online_log_dest_2 string
db_create_online_log_dest_3 string
db_create_online_log_dest_4 string
db_create_online_log_dest_5 string

The online log destinations control where the multiplexed copies of the redo logs and control files go. You still need to pay attention to that. None of these parameters affects the archiving destination; none of that has changed.

If the disks were 100% assured to never fail, you would not need multiplexing. Mirrors decrease that risk but do not remove it. I still multiplex them no matter what. I'm not worried about the careless DBA, I'm worried about hardware failing.
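
For illustration, a minimal init.ora sketch of that setup (the mount points are hypothetical -- the point is that each online log destination sits on an independent physical device):

# datafiles are created here (OMF)
db_create_file_dest         = '/u01/oradata/mydb'
# one member of each redo log group and one control file are created here...
db_create_online_log_dest_1 = '/u02/oradata/mydb'
# ...and a second member and second control file here, on separate hardware
db_create_online_log_dest_2 = '/u03/oradata/mydb'

Archiving destinations are configured separately (log_archive_dest_n); as noted above, these parameters do not touch them.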





Comments

Disk failure is not the only reason

A reader, June 27, 2002 - 1:33 pm UTC

I have very high confidence in our disk array. Mirrors, stripes and all that. Still, we back up daily because we know that a failure will occur. For example, I have had to recover all files on a mount point because of a file system corruption. This made everything on that mount point unusable. I would hate to think of what would happen if that file system held our redo logs and they were not duplexed to another file system. Broken disks are not the only point of failure in a file system.

More questions on S.A.M.E.

Clement Charbonnet, June 27, 2002 - 2:45 pm UTC

The OpenWorld article, which comes from an Oracle employee ( http://technet.oracle.com/deploy/availability/pdf/oow2000_sane.pdf ), states that all the disks should be striped and mirrored together. Multiple mount points would actually exist on the same set of disks, creating one large drive with sub-mount points. This means that a hardware failure that would cause loss of data would impact all the mount points and all the multiplexed copies. Given this methodology, why would you continue to multiplex, since multiplexing will not protect from device failures?

Tom Kyte
June 27, 2002 - 3:02 pm UTC

It would seem self-evident to me, anyway, that you would use common sense and multiplex to different, independent physical devices -- ones that will not be affected by the demise of the others. Else, as you pointed out, you are wasting your time.

Also, this paper discusses this:

<quote>
Striping the online log files across many disks does increase the probability that a double disk failure will destroy an online log. Loss of an online log will cause unrecoverable loss of the most recent updates. Again, the probability of this is very low, and a standby or remote mirror of the online log is the best way to protect against it. If a standby is too complex or expensive to implement and you want to reduce the very low probability of data loss, then you should implement additional mirroring of the online logs, control file, and archive logs. We call these three sets of files the recovery set. These are the files that are needed to perform a full recovery from a backup.

Ideally you should create an additional mirror of all files in the recovery set at the Oracle level. Mirroring by Oracle using file multiplexing is somewhat more resilient to corruptions than mirroring at the storage subsystem level. The additional copy of the recovery set should be separated from the main copies in every way possible in order to reduce the incidence of correlated failures. For example, the additional mirror should be on a separate file system, which resides on a separate volume, which is on different disks, accessed by different controllers, and ideally on a separate RAID device.
</quote>

So, they tell you to do just that (use different, separate devices)


So yes, I would. Multiplex them -- standby (Data Guard in 9i) is a great multiplexing opportunity as well.


A reader, August 09, 2002 - 1:12 pm UTC

Hi Tom,

In that same paper (Optimal Storage Configuration Made Easy), they suggested:
<quote>
1-Ensure that sequential access occurs in at least 1 megabyte units to achieve high sequential bandwidth

2-Place frequently used data on the outer half of disks to provide the fastest transfer rate.

3-Place frequently used data on the outer half of disks to minimize seek overhead.

4-Place less frequently accessed data towards the inside half of disks.

</quote>

And here are my doubts:

a-I would really appreciate it if you could explain what I should do in order to implement points 2, 3 and 4. I cannot figure out what to do in order to create a table on the outer/inner half of the disk. Is there any special command to do this?

b-They suggested setting db_file_multiblock_read_count = stripe width = OS IO = 1 megabyte.
b.1-Does the OS IO mean OS block size?
b.2-What is your opinion about this? Do you think that this strategy can be applied, without problems, to a server which will support an OLTP database with some batch processes running during the day?

Thanks in advance for any help you can give me.


Tom Kyte
August 09, 2002 - 1:37 pm UTC

2, 3, 4 -- almost impossible to achieve, especially with volume managers and other stuff.

Much Much Much bigger fish to fry out there. I would skip 2, 3, and 4. (you would use physical partitions on the disk to do this, you would partition the disk and use the right partitions -- datafiles -- for certain tables)


b -- sounds more like DW than OLTP to me. db_file_multiblock_read_count is data warehouse, not OLTP.




A reader, August 09, 2002 - 2:47 pm UTC

Thanks for answering me, Tom.

1-So, taking into account your answer for point b, what OS block size and stripe size would you suggest for a database which will have an 8K db_block_size? Can you help me with this point?

2-Can you tell me if I am heading in the right direction:
I will have a new storage array with 30 disks of 73 gigabytes each, and my intention is to distribute the disks in the following way:
-Rollbacks -> raid 10
-system, online redo and control files -> raid 10
-archived redo -> raid 10
-temporary -> raid 0
-most used application data files -> raid 10
-less used application data files -> raid 5

3-What exactly does "hot spare disk" mean?

Thanks for helping me with this, I really appreciate it.


Tom Kyte
August 09, 2002 - 2:58 pm UTC

1) see http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:396975084466 for some guidance. I don't have hard numbers in that area.


2) see http://www.acnc.com/04_01_10.html

raid 10 is good for that.


3) hot spare generally means "sitting there waiting to be used". You have extra disks, either already plugged in or waiting to be plugged in, in the event of a failure.


A reader, August 09, 2002 - 3:59 pm UTC

I have read many postings on your web page and other papers about stripe size and IO size, but I'm still a little confused. Can you explain the difference between IO size, stripe size and OS block size? Are the OS block size and IO size the same thing?

Thanks

Tom Kyte
August 09, 2002 - 4:41 pm UTC

OS block size -- how the operating system manages things at the lowest level.

IO size -- how much IO the controllers can do in one fell swoop (a hardware thing).

Stripe size -- when setting up RAID, you many times "stripe" the disks -- using more than one disk to appear as a single disk. If your stripe size was one meg and you wrote 10 meg of data -- 1 meg goes to one disk, 1 meg to the other, 1 meg to the first, and so on.

IO size and OS block size are not the same; IO size is generally MUCH larger.

OMF & Redo's

vinnie, January 07, 2004 - 3:47 pm UTC

I am considering moving to OMFs.

When looking at the REDOs, I noticed that when you configure the following as suggested:
db_create_online_log_dest_1 = '/u1/oradata/sample'
db_create_online_log_dest_2 = '/u2/oradata/sample'

when you create the DB, you will get:
/u1/oradata/sample/ora_1_xxxx.log

If this is true, wouldn't you just let the init.ora file create your control files in alternate locations? What do you do?
/u1/oradata/sample/ora_2_xxxx.log
/u1/oradata/sample/ora_xxxx.ctl
&
/u2/oradata/sample/ora_1_nnnn.log
/u2/oradata/sample/ora_2_nnnn.log
/u2/oradata/sample/ora_nnnn.ctl

My question is, why would I want the control files on the same mount point as my logs?
Aren't .ctl files written to very often?
And aren't .log files written sequentially?

Wouldn't I just add the entries in my init.ora file for the location of my .ctl files?
What do you do?

Tom Kyte
January 08, 2004 - 11:16 am UTC

Control files are written to at least every three seconds, yes.

If you have sufficient disk to spread everything out (SAME is a good idea), go for it. No one is saying you have to put ctl and redo on the same disk (I do myself try to isolate redo as much as possible; it is pretty rare for the ctl files to become a bottleneck -- it can happen, but not to most systems).
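
For illustration, a sketch of one way to split them in the init.ora (paths hypothetical): name the control files explicitly and let the OMF destinations place only the online redo logs.

control_files = ('/u3/oradata/sample/control01.ctl',
                 '/u4/oradata/sample/control02.ctl')
db_create_online_log_dest_1 = '/u1/oradata/sample'
db_create_online_log_dest_2 = '/u2/oradata/sample'

When control_files is set it takes precedence, so the control files land on their own mount points while the redo log members are still multiplexed across /u1 and /u2.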

What is the minimum number of hard disks to start with for an Oracle installation?

Olaf, January 28, 2004 - 5:36 am UTC

Hi Tom,

we are starting an Oracle project and I am going to purchase the hardware. It is a low-budget project:
Oracle Standard Edition One for one financial reporting application, used internally by 10-15 users.
The server we are going to purchase can have a maximum of 12 hard disks, all on one controller.
How many disks should I purchase to start?
It should be the absolute minimum that guarantees data protection and performance. What is the best way to configure RAID?
Earlier I worked on big and expensive UNIX boxes and did not think twice about one or two more disks.
Now my manager says that I "think too big" and I should really find a good reason to buy even one additional disk.

What would you say? How many disks is the minimum, and how should they be organized in RAID?

Tom Kyte
January 28, 2004 - 8:47 am UTC

one cannot answer this -- no idea how much storage you need.

Misunderstanding?

Olaf, January 28, 2004 - 9:57 am UTC

Tom,

my question was not how much storage we need -- we need only 2-3 GB for our data.
It was about how many disks (minimum) I have to purchase to follow the Oracle recommendations to separate logfiles from data, mirror redo logs (using Oracle or hardware), keep archive logs on a separate disk, the control files, and so on.
There are a lot of recommendations. Which are "must have" and which are "nice to have"? I read one Oracle Press book years ago that described an ideal installation on 21 disks (!?)

I always worked on UNIX hosts with a lot of disks and had no such problems. Now in my new company I must purchase the hardware myself and must explain to my boss why I need more than one disk for an Oracle installation. I bought both of your books, but did not find how to start -- and it is the first thing I have to do: planning the hardware. In the five years I have been working with Oracle I have seen a lot of systems with Oracle installed on one RAID 5 volume, or totally oversized.....

Tom Kyte
January 28, 2004 - 10:24 am UTC

you'd need a lot to do it "mathematically complete".

you need two to do it OK in this case -- few users, not much data. a laptop could do this.

you need two for mirroring logs and control files at least.

how many physical disks

abz, October 11, 2006 - 5:38 am UTC

Ideally, how many disks are required for running an Oracle 10g Release 2 database (ignore striping/mirroring for now), and which types of files should be on those disks?

My question is from the perspective of separating files based on their types, e.g. undo datafiles should be on a separate physical disk, temp should be separate, redo logs should be separate, etc.

I have heard that 10g Release 2 has more types of files.


Tom Kyte
October 11, 2006 - 8:13 am UTC

1 is required.

"ideally" is entirely based on your needs.

you need not segregate things by "file type", that is a choice you may (or may not) choose to make.

"it depends"


depth of striping

abz, October 12, 2006 - 1:43 pm UTC

How can one decide the depth of striping?
For example, for a database of about 400GB, what should the depth of striping be?


what factors

abz, October 14, 2006 - 3:12 am UTC

What factors should be considered when deciding the depth of striping?

Tom Kyte
October 14, 2006 - 8:14 am UTC

you should ask someone that does disk setup this question :)

Update on SAME stripe size?

Leandro DUTRA, November 07, 2008 - 3:14 pm UTC

The original paper discussed here is from 2000. It specifically mentions that the 1MiB stripe size recommendation should be revised as disk technology improves. What would be a reasonable stripe size with current SAS or SCSI drives?

Also, do SSDs change SAME?

Stripe Size

Avi, December 30, 2008 - 3:10 am UTC

Dear Tom,

Somewhere on Google (an HP forum) I read a document that suggested a value of stripe size >= (db_block_size * db_file_multiblock_read_count). In my case db_block_size is 8K and db_file_multiblock_read_count is 16, so the stripe size should be 128K by the above formula.
Is it right to configure storage using this formula? Is it going to improve performance (throughput)? Also, in one HP-Oracle doc the recommended value of stripe size was 1MB.
So what should be the ideal value of the stripe size for the above scenario, 128K or 1M?

Thanks in advance.
Tom Kyte
January 05, 2009 - 8:45 am UTC

given that db_file_multiblock_read_count (mbrc) is affected by so many things and in 10g should be defaulted - it doesn't really matter.

You don't set it anymore, we do. And we max it out.

So, maybe the stripe size should be the maximum IO size on your platform....


We rarely are able to do the entire multi-block read. Let us say the mbrc is 16. We go to read blocks 1 through 16 for your query. Before we can just issue an IO to read 16 blocks from disk - we have to make sure they aren't in the buffer cache already. Let us say that some other process needed block 5 - block 5 is in the cache. We will

a) issue a physical and then logical IO to read blocks 1-4 into the cache.
b) do a logical IO for block 5.
c) issue a physical and then logical IO to read blocks 6-16 into the cache.

We would not do a read of 16 blocks from disk; we'd do two IO calls, one to read 4 blocks and one to read 11 blocks.
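
If you want to see what the instance is actually using, a quick check (standard v$ view):

select name, value, isdefault
  from v$parameter
 where name = 'db_file_multiblock_read_count';

Leaving it at the default lets the database derive it from the platform's maximum IO size, which lines up with the suggestion above to match the stripe size to that maximum IO size.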


All drives are one RAID 10 volume

Kurt Look, June 09, 2011 - 10:40 am UTC

For better or worse, all the drives in my server are configured as a single RAID 1+0 volume. So data files, control files, the fast recovery area, etc are all on one single point of failure. Does the obvious good practice of duplexing control files, online redo logs, etc have any application in this environment? It seems like the duplexing simply increases I/O for no benefit. I've already screwed myself by going with the single point of failure.
Tom Kyte
June 09, 2011 - 12:05 pm UTC

... So data files, control files, the fast recovery area, etc are
all on one single point of failure ..

well, I disagree with that analysis. They are all on a thing that doesn't have a single point of failure. It would take multiple failures.

Even if you broke it up into smaller pieces, you'd still have the same sort of availability.

You have striped mirrors here - you could lose a disk from every mirror group (50% of your devices) and still not lose anything.

... Does the obvious good practice of
duplexing control files, online redo logs, etc have any application in this
environment? ...

human error, protection from some classes of software/hardware defects - yes, there are applications of duplexing here. What if a write fails in one location - or gets corrupted somehow? Having a duplicate (that doesn't have the corruption) would be a good thing.


Multiplex

A reader, November 16, 2012 - 8:50 pm UTC

In your reply to the original question you state:
"I'm not worried about the careless dba, I'm worried about hardward failing."

So can I take this to mean:
1) Multiplexing for the careless DBA is nonsense, as you can't multiplex the other files (i.e. data and temp) that may be accidentally deleted.
2) Therefore the only reason to multiplex is where you're interested in recovering all committed data after an incident has occurred.
3) Therefore there is no point in employing multiplexing on a database that is not backed up (e.g. non-production that is regularly refreshed), because we will never be able to recover anything anyway.

Tom Kyte
November 19, 2012 - 10:15 am UTC

1) we can, ASM does that under the covers (see the sketch below)...

(and a DBA that erases datafiles or control files or whatever on production is beyond careless...)

2) yes, pretty much - redundancy to protect from physical and logical corruption - a write might go bad to one file - but probably not to all three

3) false - you have instance recovery to deal with - you would lose everything if we cannot perform crash recovery.
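
Regarding (1), a sketch of what ASM does under the covers (disk paths hypothetical): a normal-redundancy disk group keeps two copies of every file extent in different failure groups, with no per-file multiplexing needed from the DBA.

create diskgroup data normal redundancy
  failgroup fg1 disk '/dev/rdsk/c1d1', '/dev/rdsk/c1d2'
  failgroup fg2 disk '/dev/rdsk/c2d1', '/dev/rdsk/c2d2';

Datafiles, online logs and control files created in that disk group are all mirrored by ASM.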

Disks under Unix

doug, November 20, 2012 - 6:24 pm UTC

Hi Tom,
This is more of a unix question.

Let's say I run df -klh and get:
Filesystem Size Used Avail Use% Mounted on
/dev/sdba1 36G 17G 18G 48% /
/dev/sdba2 1.9T 1.8T 99G 95% /oracle
/dev/sdbb1 391G 324G 67G 83% /u09

Is there any way to tell how many physical disks there are, or whether /dev/sdba1 and /dev/sdba2 are actually partitions on one disk? Is it possible that the actual device is a RAID array appearing as one disk?




Tom Kyte
November 21, 2012 - 9:17 am UTC

google how to see physical disks in linux

Chuck Jolley, November 21, 2012 - 11:20 am UTC

Understand, though, that if the RAIDs are managed by the controller(s) or are on a SAN, you may not be able to see them without a vendor-provided utility.