Question and Answer

Tom Kyte

Thanks for the question, Anthony.

Asked: February 19, 2003 - 8:55 pm UTC

Answered by: Tom Kyte - Last updated: February 15, 2011 - 8:35 am UTC

Category: Database - Version: 8.1.7

You Asked

Hi Tom,

I would just like to ask: how true is it that you can gain performance by using a raw device instead of a file system?

Regards,
ANTON

and we said...

It depends. You could see it go slower, you could see it go faster.

Let's say you are using a unix buffered file system. Your system does more physical IO than it should due to poorly tuned applications. You want to make things go faster. You've heard that "raw is faster" so you move your files to raw. The next day it is REALLY REALLY slow. What happened?

Well, your system was double buffered. There was the Oracle cache and the OS file system cache. Maybe 90% of your "physical IO" when using the cooked (non-raw) file system was satisfied via the unix buffer cache, so only 10% of your physical IO really went to physical disk. When you moved it to raw -- you took that cache away -- now 100% of your physical IO is TRUE physical IO and really goes to disk. You just killed performance, because you tried a quick fix instead of fixing the problem -- the way to fix IO related issues is generally to DECREASE your IO ;)


Conversely, you are using a unix buffered file system. Your system does more physical IO than it should due to poorly tuned applications. You want to make things go faster. You've heard that "raw is faster" so you move your files to raw. The next day it is marginally *faster*. What happened?

Well, you apparently didn't make much use of the unix buffer cache, perhaps your SGA was so large and the PGA's just crowded the unix buffer cache out so 90% of your physical IO was true physical IO. You are seeing the nominal increase in performance of RAW disk IO over cooked.

Conversely, you are using a unix buffered file system. Your system does more physical IO than it should due to poorly tuned applications. You want to make things go faster. You've heard that "raw is faster" so you move your files to raw. The next day it is really really fast. What happened?

Perhaps you removed an IO bottleneck.
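
The first two scenarios can be put into rough numbers. What follows is only a minimal sketch: the timings are made-up illustration values, not measurements, and the function name is hypothetical. It models the average cost of one read that Oracle counts as "physical" when some fraction of those reads is really served from the OS file system cache.

```python
def avg_read_ms(os_cache_hit_ratio, cache_ms, disk_ms):
    """Average time of one Oracle 'physical' read on a buffered file system."""
    return os_cache_hit_ratio * cache_ms + (1.0 - os_cache_hit_ratio) * disk_ms

# All timings below are assumed values chosen for illustration only.
CACHE_MS = 0.1        # read satisfied from the OS file system cache
COOKED_DISK_MS = 8.0  # read that really goes to disk through the file system
RAW_DISK_MS = 7.0     # read that really goes to disk on a raw device

# Scenario 1: the OS cache was hiding 90% of the reads.
cooked_hidden = avg_read_ms(0.90, CACHE_MS, COOKED_DISK_MS)
# Scenario 2: the OS cache was crowded out and caught only 10% of the reads.
cooked_crowded = avg_read_ms(0.10, CACHE_MS, COOKED_DISK_MS)
# On raw there is no OS cache, so every read pays the disk price.
raw = avg_read_ms(0.0, CACHE_MS, RAW_DISK_MS)

print("cooked, cache hiding 90%:", cooked_hidden)
print("cooked, cache hiding 10%:", cooked_crowded)
print("raw:", raw)
```

Under these assumed numbers, raw is far slower than the cooked system whose cache was absorbing most reads, and only marginally faster than the cooked system whose cache was doing little. Which case applies depends entirely on what the OS cache was doing beforehand.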



In short -- what you need to do is

o determine cause of poor performance
o develop a custom plan to remove that cause


Just moving files -- well, maybe you get lucky once out of one hundred times, but more typically it makes it worse, makes it marginally better, or does nothing whatsoever.




Reviews

raw devices enable kaio

February 20, 2003 - 1:18 pm UTC

Reviewer: volkmar buehringer from germany


On many platforms, raw devices enable the use of kaio (kernelized async IO), which is not possible on file systems.

To remove the overhead of double buffering, a direct mount option for the file system is sufficient, but for kaio, raw devices or Veritas QIO are mandatory. That's the advantage of raw devices.



Tom Kyte

Followup  

February 20, 2003 - 7:05 pm UTC

you missed my point. Here the effect of double buffering is HIDING the terrible effect of too much physical IO. The OS was buffering the data that Oracle believed was a PIO.

So you look at your system and see "oh my, 100,000 pios an hour. That is terrible. I've heard RAW is faster at reads -- let's switch". Bummer -- you really only did 10,000 PIO's an hour cause 90k of them were buffered by the OS for you -- now you are REALLY doing 100k PIO's
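
The arithmetic in that example can be sketched as follows. The function name is hypothetical, and the 90% figure is the one assumed in the text, not something Oracle reports. Oracle cannot see the OS cache, so the hit ratio has to be supplied from outside.

```python
def true_physical_ios(reported_pios, os_cache_hit_ratio):
    """Reads that reach the disk after the OS file system cache absorbs its share."""
    return round(reported_pios * (1.0 - os_cache_hit_ratio))

reported = 100_000  # PIOs per hour as Oracle counts them (figure from the text)

on_cooked = true_physical_ios(reported, 0.90)  # OS cache absorbs 90k of them
on_raw = true_physical_ios(reported, 0.0)      # no OS cache on raw

print("disk reads per hour on cooked:", on_cooked)
print("disk reads per hour on raw:   ", on_raw)
```

Oracle reports the same 100,000 in both cases. Only the share that actually reaches the disk changes.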

So, beware -- DIAGNOSE YOUR PROBLEM first and then apply logical corrections.

I always say, if you watch the LIO's the PIO's take care of themselves.... Reduce the amount of LIO's and you'll most likely reduce the PIO's and then you don't need to remove the double buffering (which mostly affects WRITES, not READS anyway so if you are READ bound not WRITE bound -- kaio isn't going to do you very much good)

My point is -- beware the silver bullet syndrome. There aren't any.

The benefits of raw

February 20, 2003 - 7:23 pm UTC

Reviewer: Connor McDonald from Perth, Australia

Whilst I agree with Tom on not converting to (or from) raw just for the sake of doing so, I am an ardent fan when putting a system together to always go for raw. The reasons being:

a) a PIO as reported by Oracle is very likely to be a real PIO, so you get a true picture of what's going on

b) the buffering is controlled by Oracle, so as a DBA I have more control over it, and Oracle should yield smarter buffering, since it knows things like extent and segment boundaries

c) you typically get some benefits from the OS layer (kernelised aio etc)

d) you do *not* get autoextend. Yep, I see this as a benefit because it forces you to do some decent sizing and planning up front (sadly lacking in many databases)

The historical arguments against raw were always management, admin etc, but nowadays every OS has a simple GUI or equivalent to resize them, move them, create/drop them as if they were file systems anyway.

Tom Kyte

Followup  

February 21, 2003 - 7:38 am UTC

All valid and correct points.

I do see tons of people "afraid of raw" -- like it is "harder". Perhaps that is because the DBA needs SA help to set things up (needs root every now and again) and they feel a loss of control. Or, they really don't understand what RAW means. A major "fear" for OPS/RAC adopters has always been "oh, i have to use raw". We had to create a cluster file system just to work around that!

To me, /dev/.... looks no different than /d01/oradata, they are just files after all.



Raw Device

February 21, 2003 - 12:04 am UTC

Reviewer: Anthony from Phil.

Tom,

Suppose we are in a process of putting up a system, and suppose almost everything is properly coded (all of the rules/advices/recommendations on the application side given by "T. Kyte" are followed), and we are just figuring out what to use (RAW or FS) in our setup, what would you give? What are the pros and cons of using a RAW as compared to FS?

Regards,
ANTON

Tom Kyte

Followup  

February 21, 2003 - 9:29 am UTC

If you have no fear of RAW, go for it.

If you fear RAW, don't go for it.


With tools like rman to do backups -- raw vs cooked doesn't come into play there (you used to have less "tools" to backup raw -- dd, cpio -- but not "tar" or "cp" for example).

You never touch online redo logs for anything -- so they are safe on raw (and benefit the most from it -- write intensive)...

archives are always turned into cooked files -- so it is not relevant for them.

controlfiles can be a sticky thing -- gotta make a relatively smallish slice for them, make sure they are on separate physical things as well (the 3 copies you have of them)



What is your comment about this DBA?

March 14, 2003 - 7:29 am UTC

Reviewer: A reader

I have a story about using the raw device:

We have an application running very slowly. We don't know the reason. Our application team cannot find any clue to solve this problem. The application involves many components such as an application server, Powerbuilder, and a 9i database. Our DBA found that the disk IO on the redo logs is very busy. So he said, "maybe we move the redo log files to the raw device and see...". Later, he moved the redo log files to the raw device, and the performance seemed to improve (but is still not acceptable). And then he said, "ok, the performance has improved now, maybe we move the whole database to the raw device and see.."

What are your comments about this DBA?

Tom Kyte

Followup  

March 14, 2003 - 6:02 pm UTC

If the DBA does anything while you are working in a state of "we don't know the reason" -- well, all I can say is:

Even a blind squirrel occasionally finds a nut

it could:

a) go slower
b) go faster
c) stay the same


but if you just blindly flip switches, you could get the same effect without having to take the system down.

One generally at least has a CLUE before one does stuff.

New best resource

March 23, 2004 - 12:04 pm UTC

Reviewer: Donald Cavin from Portland and Augusta, Maine

This is a forum that rivals Metalink. Also, Tom Kyte is well known in our industry. I will use this from now on as my first resource.

OS Cache Size

May 08, 2004 - 1:08 pm UTC

Reviewer: Reader

How do I find what is the OS Cache Size on Linux or Solaris? Thanks.

Tom Kyte

Followup  

May 10, 2004 - 7:19 am UTC

it'll be all free memory, like CPU, memory cannot be put into the bank and used later, so they tend to use it all and upon an application requesting more -- they'll give it back.
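
On Linux specifically, one way to see how much memory the OS is currently using for file caching is to read the `Buffers` and `Cached` lines of `/proc/meminfo`. The sketch below is a minimal illustration only: the function name and the sample values are made up, it does not cover Solaris, and the number it reports is a snapshot that changes as applications allocate and release memory.

```python
def page_cache_kb(meminfo_text):
    """Sum the 'Buffers' and 'Cached' lines (in kB) of /proc/meminfo-style text."""
    wanted = {"Buffers", "Cached"}
    total = 0
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() in wanted:
            total += int(rest.split()[0])  # first field after the colon is the kB value
    return total

# Made-up sample in the /proc/meminfo layout, so the sketch runs anywhere.
SAMPLE = """MemTotal:       16384000 kB
MemFree:          512000 kB
Buffers:          204800 kB
Cached:          9216000 kB
SwapCached:            0 kB
"""
print(page_cache_kb(SAMPLE))

# On a Linux host, the same function can read the live values:
# with open("/proc/meminfo") as f:
#     print(page_cache_kb(f.read()))
```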



unix buffered file system

April 08, 2005 - 6:34 am UTC

Reviewer: Yogesh from Pune, India

How can we check, whether we are using UNIX buffered file system?

Tom Kyte

Followup  

April 08, 2005 - 8:19 am UTC

ask your SA, it'll vary by OS.

RAW vs Filesystem

May 12, 2005 - 11:05 am UTC

Reviewer: Ramesh from USA

"it'll be all free memory, like CPU, memory cannot be put into the bank and used later, so they tend to use it all and upon an application requesting more -- they'll give it back."

Tom,

This means the OS only uses 'left over' memory from the applications in the case of FS. Is there any amount of memory that should be set aside if we are using FS? Will it affect performance if we take about 80-85% of available memory for SGA, PGA, and applications?






Tom Kyte

Followup  

May 12, 2005 - 1:21 pm UTC

the os only uses that which applications have not allocated.

I'm not sure that I would set aside any memory purposely for file system caching on a database server -- the SGA should cache blocks for us.

May 12, 2005 - 6:17 pm UTC

Reviewer: Ramesh

Thanks Tom. But if the data is going to be double buffered in the FS cache as well, doesn't the size of the FS cache matter? If the SGA is typically 1/3rd of RAM, shouldn't the FS cache be comparable? (Sorry, my knowledge of these things tends to zero)



Tom Kyte

Followup  

May 13, 2005 - 9:03 am UTC

use the SGA to buffer oracle, not the file system cache. that is the point.


RAW much better performance than FS

May 21, 2005 - 1:03 pm UTC

Reviewer: Syed from KSA

Hello,
I have been using RAW on 10g 10.1.0.2 with ASM for almost a month and the performance is very good, but my DB hangs once or twice a week. I don't know where the problem is. Oracle and HP were both involved and many things were tried, but the problem remains. The last option left to me was to remove ASM and install a custom database on a file system. Today is the first day after the new installation. I hope the system will never hang again, but I am worried about performance. My application is well tuned.

Environment is
Oracle DB 10g 10.1.0.2.0
HP-UX 11i 11.23 on Itanium
using SAN

If anybody has faced the same problem, please let me know what action you took.

Syed

The OS flexibility in using available memory

July 19, 2005 - 10:47 am UTC

Reviewer: naresh from cyprus

Tom (and Connor),

In a cooked FS setup, I think the OS flexibility in using 'available memory' for the FS buffer cache would take care of fluctuating demand for memory from the application programs.

In a raw disk situation, we would have to be careful with the sizing as any fluctuating demands may well cause swapping out the wrong thing from memory (I know we are supposed to plan these things, but it DOES SEEM convenient with a FS).

What are your thoughts on this?


Tom Kyte

Followup  

July 19, 2005 - 11:57 am UTC

it could also be considered unpredictable, sometimes IO means PIO, sometimes IO does not mean PIO.

I'm ambivalent -- you need to see it if fits your needs, your circumstances. As long as you know and understand it (how it works, what it means) and it doesn't just look like "great black magic"...

storage option,

September 06, 2005 - 10:55 am UTC

Reviewer: sns from austin,tx

What do you mean by raw devices, cluster file system and ASM?
What are the differences between them?

What do you call the file system when we are able to add a mount point and assign to any of our tablespaces?

Thanks,

Tom Kyte

Followup  

September 06, 2005 - 4:29 pm UTC

a short excerpt from my forthcoming book:

<quote>
A Brief Review of File System Mechanisms

There are four file system mechanisms in which to store your data in Oracle. By your data, I mean your data dictionary, redo, undo, tables, indexes, LOBs, and so on -- the data you personally care about at the end of the day. Briefly, they are

* "Cooked" operating system (OS) file systems: These are files that appear in the file system just like your word processing documents do. You can see them in Windows Explorer; you can see them in UNIX as the result of an ls command. You can use simple OS utilities such as xcopy on Windows or cp on UNIX to move them around. Cooked OS files are historically the "most popular" method for storing data in Oracle, but I personally expect to see that change with the introduction of ASM (more on that in a moment). Cooked file systems are typically buffered as well, meaning that the OS will cache information for you as you read and, in some cases, write to disk.

* Raw partitions: These are not files -- these are raw disks. You do not ls them; you do not review their contents in Windows Explorer. They are just big sections of disk without any sort of file system on them. The entire raw partition appears to Oracle as a single large file. This is in contrast to a cooked file system, where you might have many dozens or even hundreds of database data files; a raw partition will appear to Oracle to be a single large data file. Currently, only a small percentage of Oracle installations use raw partitions due to their perceived administrative overhead. Raw partitions are not buffered devices -- all I/O performed on them is a direct I/O, without any OS buffering of data (which, for a database, is generally a positive attribute).

* Automatic Storage Management (ASM): This is a new feature of Oracle 10g Release 1 (for both Standard and Enterprise editions). ASM is a file system designed exclusively for use by the database. An easy way to think about it is as a database file system. You won't store your shopping list in a text file on this file system -- you'll store only database-related information here: your tables, indexes, backups, control files, parameter files, redo logs, archives, and more. But even in ASM, the equivalent of a data file exists; conceptually, data is still stored in files, but the file system is ASM. ASM is designed to work in either a single machine or clustered environment.

* Clustered file system: This is specifically for a RAC (clustered) environment and provides for the appearance of a cooked file system that is shared by many nodes (computers) in a clustered environment. A traditional cooked file system is usable by only one computer in a clustered environment. So, while it is true that you could NFS mount or Samba share (a method of sharing disks in a Windows/UNIX environment similar to NFS) a cooked file system among many nodes in a cluster, it represents a single point of failure. In the event that the node owning the file system and performing the sharing was to fail, then that file system would be unavailable. The Oracle Cluster File System (OCFS) is Oracle's offering in this area and is currently available for Windows and Linux only. Other third-party vendors do provide certified clustered file systems that work with Oracle as well. The clustered file system brings the comfort of a cooked file system to a clustered environment.

The interesting thing is that a database might consist of files from any and all of the preceding file systems -- you don't need to pick just one. You could have a database whereby portions of the data were stored in conventional cooked file systems, some on raw partitions, others in ASM, and yet other components in a clustered file system. This makes it rather easy to move from technology to technology, or to just get your feet wet in a new file system type without moving the entire database into it. Now, since a full discussion of file systems and all of their detailed attributes is beyond the scope of this particular book, we'll dive back into the Oracle file types. Regardless of whether the file is stored on cooked file systems, in raw partitions, within ASM, or on a clustered file system, the following concepts always apply.
</quote>

September 06, 2005 - 5:03 pm UTC

Reviewer: sns from austin,tx

Excellent explanation. Thanks for sharing information with us.


What is the name of your forthcoming book? When it is going to be published?



Thanks,

Tom Kyte

Followup  

September 06, 2005 - 9:16 pm UTC

see home page, link to book and discussions on book....


Oracle .Os system.

March 15, 2006 - 11:44 am UTC

Reviewer: Jaya from Mumbai India.

I think it is time that Oracle builds, upgrades, or customizes a UNIX to avoid the mismatch that the OS and database fronts create. Why not create an OS that maximizes Oracle DB capability, minimizing duplication of effort and complexity of tasks and maximizing gain, leading to less confusion, clear-cut tasks, and full control in the hands of the Oracle DBA?
A clean, self-sufficient Oracle box.

Tom Kyte

Followup  

March 15, 2006 - 5:22 pm UTC

we did that - it was called "raw iron"

http://www.google.com/search?q=%22raw+iron%22+oracle&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official



raw iron...

March 15, 2006 - 6:59 pm UTC

Reviewer: Jaya from Mumbai, India..

hope.. oracle ppl are still working on it...

Tom Kyte

Followup  

March 16, 2006 - 7:49 am UTC

nope, did you see the dates on the articles? People didn't want "software in box", they might say they do - but they want to play with the OS.

Maybe Raw Iron was ahead of its time

March 15, 2006 - 10:04 pm UTC

Reviewer: Chris from Australia

>>we did that - it was called "raw iron"

Did Raw Iron get dumped in 2000?
Can't imagine linux had much of a foothold in the enterprise in 2000. Maybe Oracle should reconsider it sometime soon.

I'd love to pop a DVD into my intel server and have a fully configured DB server 30 minutes later.

Tom Kyte

Followup  

March 16, 2006 - 7:58 am UTC

the concept did not even involve "pop dvd"

the concept was "here is your hardware device that happens to be a database"

sort of like buying NAS. (network attached storage)

this was network attached database.

raw iron

March 16, 2006 - 11:03 am UTC

Reviewer: Jaya from Mumbai, India


----- it is easy to criticize and comment when you are a mere spectator and do not know much:-

but if the product is worth use and if it really had shown gain in performance and ease of operation and maintenance, lower costs, then definitely it would have created NICHE of its own.

wonder how they 'GIVE UP' just because of unfavorable response and did not pursue it ahead.





Tom Kyte

Followup  

March 16, 2006 - 2:43 pm UTC

because if it doesn't sell, no one bothers making them.

It is all about supply and demand.

raw iron

March 16, 2006 - 4:01 pm UTC

Reviewer: Jaya from Mumbai India.

that is what the challenge is -
to make the product that is worth use..and the one that will sell.

Its failure provided opportunity to improve and then bounce back...and the cycle continues... but what
is baffling that the approach was simply abandoned.

IF the approach chosen had the flaw then there is always scope for the improvement, or a change of course is also possible.

Might be it does not make business sense, but an opportunity with lot of scope of innovation, enhancement, and improvement was lost.




Tom Kyte

Followup  

March 17, 2006 - 4:52 pm UTC

No, people wanted to be able to log in, configure, change it - sort of defeats the entire purpose.

Bottom line - people said they wanted a database appliance, but they did not.

It is a simple concept - not really too much in the way of "innovation" truly. It just didn't have wings.

raw iron

March 18, 2006 - 9:03 pm UTC

Reviewer: Jaya from Mumbai India

Is the need for a more compatible OS felt in the community, to suit Oracle9i or 10g requirements? And what are those needs, if there are any?

Tom Kyte

Followup  

March 19, 2006 - 6:52 am UTC

what do you mean by a more "compatible"?

This (raw iron) was about a "database appliance". People didn't want the black box - they wanted bits and pieces they put together themselves.

Oracle and current OS..

March 20, 2006 - 10:43 am UTC

Reviewer: Jaya from Mumbai India


so people (..users and one working with oracle...) are satisfied with current OS flavors ...or they feel that certain features are still missing which could have led to more efficient and effective performance of oracle database.. Do they yearn for "Dream OS for Oracle" to get max out from the resources in place ?

to put it in the other way ..
Sometimes it happens that things are found wanting from the end of software (.. in this case ORACLE db server .. ) that runs on OS and sometimes it is the otherway round.
So what is the current scenario ?

Tom Kyte

Followup  

March 22, 2006 - 12:39 pm UTC

they do not want a box that they cannot tweak or play with. That is all. They enjoy, for whatever reason, "doing it themselves"

I don't know how else to say it. We built a black box (well, it could be in different colors). It was self contained. It was a database appliance. The first thing people wanted to do was telnet into it and change it - you don't do that, it is an appliance. They did not like that.

Raw devices slower than cooked file system

March 23, 2006 - 12:26 am UTC

Reviewer: A reader from FL

Tom,

I'm experiencing slower I/O performance on raw devices when compared to cooked files.

Here are some numbers.

On cooked files:

select a.file#, b.file_name, a.singleblkrds, a.singleblkrdtim, a.singleblkrdtim/a.singleblkrds average_wait
from v$filestat a, dba_data_files b
where a.file# = b.file_id
and a.singleblkrds > 0
order by average_wait

FILE# FILE_NAME SINGLEBLKRDS SINGLEBLKRDTIM AVERAGE_WAIT
---------- ------------------------------------------------------------ ------------ -------------- ------------
2 /var/opt/apps/oracle/HKR/hkrs/data/undotbs_01.dbf 3868697 22627 .005848739
9 /var/opt/apps/oracle/HKR/hkrs/data/FE_SML_IDX01_01.DBF 164752 1822 .011059046
7 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_LRG_DAT01_01.DBF 138380980 2365480 .017093968
13 /var/opt/apps/oracle/HKR/hkrs/data/FE_LRG_IDX01_01.DBF 91869 1730 .018831162
8 /var/opt/apps/oracle/HKR/hkrs/data/FE_SML_DAT01_01.DBF 194382 3866 .019888673
11 /var/opt/apps/oracle/HKR/hkrs/data/FE_MDM_IDX01_01.DBF 252087 6567 .02605053
19 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_DEFAULT_01.DBF 28573 787 .027543485
5 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_MDM_DAT01_01.DBF 40302 1174 .029130068
10 /var/opt/apps/oracle/HKR/hkrs/data/FE_MDM_DAT01_01.DBF 348298 10446 .029991559
12 /var/opt/apps/oracle/HKR/hkrs/data/FE_LRG_DAT01_01.DBF 90953 3004 .033028047
15 /var/opt/apps/oracle/HKR/hkrs/data/pdk_idx_01.dbf 2807 99 .03526897
6 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_MDM_IDX01_01.DBF 36223 1816 .050133893
3 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_SML_DAT01_01.DBF 9031 841 .093123685
14 /var/opt/apps/oracle/HKR/hkrs/data/pdk_dat_01.dbf 1495 184 .123076923
17 /var/opt/apps/oracle/HKR/hkrs/data/TOOLS.DBF 92653 12521 .135138636
4 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_SML_IDX01_01.DBF 4746 852 .179519595
18 /var/opt/apps/oracle/HKR/hkrs/data/HRMS_LRG_IDX01_01.DBF 6702613 1214875 .181253938
1 /var/opt/apps/oracle/HKR/hkrs/data/system01.dbf 38931 9241 .237368678
16 /var/opt/apps/oracle/HKR/hkrs/data/USERS.DBF 25 8 .32


Same application running on raw devices:

FILE# FILE_NAME SINGLEBLKRDS SINGLEBLKRDTIM AVERAGE_WAIT
---------- ------------------------------------------------------------ ------------ -------------- ------------
5 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_03 129111 140749 1.09013949
9 /var/opt/apps/oracle/dflinks/hkrp/rdf01_05 227882 253296 1.11152263
11 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_02 88637 99182 1.11896838
8 /var/opt/apps/oracle/dflinks/hkrp/rdf01_04 102435 115199 1.12460585
13 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_07 9846 14699 1.49289051
10 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_01 55445 88511 1.59637479
19 /var/opt/apps/oracle/dflinks/hkrp/rdf02_04 121080 216098 1.78475388
12 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_06 4448 9160 2.05935252
4 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_05 112358 244539 2.17642713
1 /var/opt/apps/oracle/dflinks/hkrp/rdf01_01 138279 311761 2.25457951
6 /var/opt/apps/oracle/dflinks/hkrp/rdf02_01 8712637 20397685 2.34116089
7 /var/opt/apps/oracle/dflinks/hkrp/rdf02_02 6627950 15703779 2.36932671
22 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_10 2255324 5513000 2.44443814
17 /var/opt/apps/oracle/dflinks/hkrp/rdf01_03 582815 1430976 2.45528341
3 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_04 101689 251326 2.47151609
20 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_09 3461420 8981033 2.59460944
23 /var/opt/apps/oracle/dflinks/hkrp/rdf08_03 11049732 32245017 2.91817186
16 /var/opt/apps/oracle/dflinks/hkrp/rdf01_02 477307 1414603 2.96371727
14 /var/opt/apps/oracle/dflinks/hkrp/rdf02_05 886913 2628941 2.96414755
21 /var/opt/apps/oracle/dflinks/hkrp/rdf01_06 3186129 9482714 2.97624924
2 /var/opt/apps/oracle/dflinks/hkrp/rdf08_01 64054 191511 2.9898367
24 /var/opt/apps/oracle/dflinks/hkrp/rdf08_04 2668405 8048550 3.01624004
18 /var/opt/apps/oracle/dflinks/hkrp/rdf02_03 151941 471125 3.10071014
15 /var/opt/apps/oracle/dflinks/hkrp/rdf500m_08 1319 41288 31.3025019



Any clues as to what could be causing the big average wait difference?

Thanks.


Tom Kyte

Followup  

March 23, 2006 - 10:25 am UTC

umm, read the original answer - it is precisely what I described????
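
To make the comparison concrete, here is a minimal sketch that takes a few of the busiest rows from each listing above and computes a read-weighted average wait (total single-block read time divided by total single-block reads), so that busy files dominate the result. The choice of rows and the function name are illustrative assumptions. As far as I recall, the time columns of v$filestat are in hundredths of a second, but that should be checked against the reference for the version in use.

```python
# (single-block reads, single-block read time) copied from rows posted above
cooked = {
    "undotbs_01.dbf": (3868697, 22627),
    "HRMS_LRG_DAT01_01.DBF": (138380980, 2365480),
    "HRMS_LRG_IDX01_01.DBF": (6702613, 1214875),
}
raw = {
    "rdf02_01": (8712637, 20397685),
    "rdf02_02": (6627950, 15703779),
    "rdf08_03": (11049732, 32245017),
}

def weighted_avg_wait(files):
    """Total read time divided by total reads across all listed files."""
    reads = sum(r for r, _ in files.values())
    time = sum(t for _, t in files.values())
    return time / reads

cooked_avg = weighted_avg_wait(cooked)
raw_avg = weighted_avg_wait(raw)

print("cooked weighted average wait:", cooked_avg)
print("raw weighted average wait:   ", raw_avg)
print("ratio raw / cooked:          ", raw_avg / cooked_avg)
```

For these rows the raw figure comes out on the order of a hundred times the cooked one. Average waits that low on the cooked files are what you would expect if most of those "physical" reads never reached a disk at all, which is the double-buffering effect described in the original answer.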

March 23, 2006 - 1:35 pm UTC

Reviewer: Oraboy

this may be slightly off-topic..

Would you say , "raw-iron" (database appliance) to Teradata is a correct comparison? (not that I am going to run the test)

Just asking since I have heard "Teradata" comes with its own box and in-built OS and thats why it conquered DW market.

Its still kinda funny for me to think " Database not performing fine..Lets create a TAR and get the yellow cable inside replaced" ;-) For this reason, I would really like to see how Teradata DBAs work and diagnose the problems


Tom Kyte

Followup  

March 23, 2006 - 1:59 pm UTC

No, raw iron was inexpensive commodity based hardware with open systems software.

Not proprietary.


teradata can come on its own proprietary platform - doesn't have to.

Raw Vs ASM

July 26, 2006 - 6:10 am UTC

Reviewer: Lakshmi from Bangalore, India

Can you please tell the differences between ASM and Raw?

Note: We are currently using Raw on Oracle 9i for our application development

Tom Kyte

Followup  

July 26, 2006 - 11:19 am UTC

ASM gives you a "file system look and feel". It provides redundancy if you want it and striping.

RAW is just a device, you add the device and the entire device appears to be a file to us.

With ASM you could take tens or hundreds of RAW devices and create a "file system" on it and create as few or as many 'files' as you like

ASM has the performance characteristics of RAW coupled with the manageability features of a file system.

Thanks Tom !

July 28, 2006 - 12:10 am UTC

Reviewer: A reader


How to copy one datafile (15Gb) from one RAW to another RAW

October 14, 2006 - 8:36 pm UTC

Reviewer: Star Nirav from Land of Great India !!!

Hey TK,

Can you help me to move one datafile from one RAW device to another RAW device (Both are Logical Volume) as Phy I/O is increasing and we want to avoid that.

We are using AIX 5.3 with ML5 and having SAN.

Regards,
Star Nirav

Tom Kyte

Followup  

October 15, 2006 - 4:01 am UTC

Hey "SN"

how would moving a file decrease physical IO.



Will use either tables move or the datafiles move

October 17, 2006 - 10:06 am UTC

Reviewer: Star Nirav

Hi TK,

Either we will go for moving tables (which are big) to another new independent tablespace (created on raw) or move some datafiles from one raw to new raw.

Want to know which one would be the best and easier, kindly explain.

regards,
Sn

Tom Kyte

Followup  

October 17, 2006 - 10:11 am UTC

Star Nirav

why do you keep calling me "TK" - should I just call you "SN"?


You did not answer my question.

How would moving a file from "raw partition 1" to "raw partition 2" decrease your physical IO.




Ok... Now will call u as Tom only.

October 20, 2006 - 5:12 pm UTC

Reviewer: Star Nirav from India

Hi Tom,

My sincere apologies if I have hurt or offended you. I had heard, and believed, that in the US and Europe people like to be addressed by short names.

Sir, it would be all pleasure of mine if you can call me either SN or Star Nirav... I dont have any problem as far as I am clearing my doubts here and making my concepts clear.

Even i hope that I am not offending you by calling TK (because I believe TK means Technically King)

---
As you asked... How would I decrease i/o

Sir, If my one raw volume is having 2 tablespaces with 500 objects and simultaneously needs to be accessed then I would go for moving my some objects to another raw partitions... Correct ?

If it is correct then pls tell me how would i move from one to another raw partitions, as I heard that only RMAN can do this thing. Normal CP command (AIX) will not copy the files from raw device. Not sure about dd and cpio so please describe the other workarounds.

Thanks
Star Nirav

PS : However we are not in the one organisation but I strongly believed you as my mentor.

Tom Kyte

Followup  

October 20, 2006 - 5:54 pm UTC

Why would you call "u" Tom? Would not the person named "u" get upset by that?


If you move stuff you read from DISK A to DISK B, how will that reduce the number of times Oracle must perform physical IO?

Think about this - if you did 100 IO's against the file on the raw partition "A", and you move the data from raw partition A to raw partition B, you will do.....

100 IO's against B.

Now, unless "A" is a logical disk and the raw partition was just a small slice of it - and there was much CONTENTION on the logical disk that underlies "A" and by moving to "B" you can reduce contention - you might DECREASE IO TIMES, but you won't..... decrease IO.

Many things can move raw partitions, including the unix command dd. However before doing so, you will get someone qualified (eg: has experience with this stuff) to do it WITH YOU.

Hmmm... !!! But still not completely convinced.

October 23, 2006 - 4:37 pm UTC

Reviewer: Star Nirav from India

Hi Tom,

What i said that if my one raw-partition is constantly having 100 % phy. IO then If I move some of the objects to the another raw-partition then there are chances to decrease the IO upto certain level. Right ?

And as datafile movement is very critical in raw-volume, can you suggest to move objects to another tablespace (creating new tablespace in new raw-volume) ?

Waiting for your expert advice / view on this !!!

Tom Kyte

Followup  

October 23, 2006 - 5:32 pm UTC

if you were doing 500 physical IO's yesterday on disks A and B
and you move data from A to C

I would hazard a guess that you will be doing 500 physical IO's tomorrow.


think about it, how can MOVING data from "A" to "C" reduce the number of times Oracle must READ that data???????? It will just have to read it from somewhere else!


so, maybe you are confusing some terms or something, but you will not convince me that by moving data from A to C that you will reduce IO's - move them perhaps, but not reduce them.

Lets take an example to make you understand.

October 23, 2006 - 5:49 pm UTC

Reviewer: Star Nirav from india

lets assume that mine one tablespace is having 500 tables and those tables are big enough.

also pretend that I am having indexes on the same tablespace for those tables. (in daily operations, I require 300 tables so If i move to another tablespace (which would be dedicatedly used for those).

So if i move some tables and indexes to another tablespace, will i get the benefit or not ???

Hope I am now able to make you understand... !!!


Tom Kyte

Followup  

October 24, 2006 - 12:29 am UTC

sigh....

no, you best explain why you think that by MOVING something, you would reduce the number of times we need to do physical IO's????????????????????

do that, and I'll respond.

keep saying "but how can moving things NOT reduce physical IO" and I'll ignore it.

Reducing I/O vs. Spreading I/O

October 24, 2006 - 3:32 am UTC

Reviewer: Billy from Cape Town, Z.A.

Star Nirav (from india) wrote:

<snipped>
> So if i move some tables and indexes to another
> tablespace, will i get the benefit or not ???

Benefit of what?

I/O is the result of a need for data. For example, the need for data when UserA says "I want all rows from EMP with surname SMITH".

Now you move the table containing that data from volume 1 to volume 10 (be that a cooked or raw volume).

Have you now stopped UserA from wanting that data?

No, UserA still wants that data - unless you have hit UserA over the head with the trusty old lead pipe in the meantime.

And this is what Tom has been trying to tell you, seemingly in vain, all along - MOVING data does NOT reduce I/O.

Hitting users with a lead pipe, however can reduce I/O. (but then you need to be a card carrying member of the Scorched Earth Party for this to be legal)

Moving data simply spreads I/O. It moves I/O from one disk, raw device, file system, volume, slice, whatever, to another.

Moving data therefore does not reduce I/O, it spreads I/O.

Yes, spreading I/O can alleviate I/O bottlenecks caused by an I/O channel/disk/device being overloaded due to processes asking for more I/O per second than it can service.

In which case something like ASM is an excellent answer, as it performs automatic I/O load balancing for you with the database instance up and running.
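
The "spreading, not reducing" point can be put in numbers. This is only a toy Python sketch: the table names, the volume letters and the 500-read workload are made up for illustration, and a "read" here is just a counter, not a real I/O.

```python
from collections import Counter

def io_per_volume(requests, placement):
    """Count physical I/Os per volume for a list of requested tables."""
    return Counter(placement[table] for table in requests)

# Hypothetical workload: 500 reads spread over three tables.
requests = ["emp"] * 300 + ["dept"] * 150 + ["bonus"] * 50

# Everything on volume A, then two tables moved to volume C.
before = io_per_volume(requests, {"emp": "A", "dept": "A", "bonus": "A"})
after = io_per_volume(requests, {"emp": "A", "dept": "C", "bonus": "C"})

print(sum(before.values()), dict(before))  # 500 {'A': 500}
print(sum(after.values()), dict(after))    # 500 {'A': 300, 'C': 200}
```

The total stays at 500 either way. Only the per-volume share changes, which helps if volume A was the bottleneck and does nothing for the amount of work requested.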

October 24, 2006 - 9:15 am UTC

Reviewer: Alexander the ok

Maybe Star means striping?

Tom Kyte

Followup  

October 24, 2006 - 9:35 am UTC

Maybe Star should say what Star means - I tried to make that point many many reviews ago myself:

...
Now, unless "A" is a logical disk and the raw partition was just a small slice
of it - and there was much CONTENTION on the logical disk that underlies "A" and
by moving to "B" you can reduce contention - you might DECREASE IO TIMES, but
you won't..... decrease IO.
.........

but he is pretty sure that it'll decrease IO for whatever reason.

Striping?

October 24, 2006 - 9:35 am UTC

Reviewer: Billy from Cape Town, Z.A.

So he's actually saying his hovercraft is full of eels?

Solid State Disks

October 31, 2006 - 1:46 pm UTC

Reviewer: HK from UK

Tom,

Basic questions:

1. What are Solid State Disks (SSD)?
2. How are they useful?
3. When would you recommend using SSD?

Many thanks in advance.

Regards,

HK

Tom Kyte

Followup  

October 31, 2006 - 4:05 pm UTC

they are "ram disks" in short. Big banks of memory.


I would hardly ever recommend them right now, quite expensive. Most of the advantages can be obtained by a bit of tuning and a nice cache in front of the physical disk.

so, still a bit of a "niche" offering.

Raw vs Veritas storage foundation

November 07, 2006 - 8:48 am UTC

Reviewer: Clem from Johannesburg, South Africa.

Hi Tom,

I enjoy your comments and concur with most of them. I do however have a question for you though. With the age old story of RAW vs file system, I would like your thoughts on Oracle running on Storage Foundation vs RAW.

What is the benefit vs the cost.


Tom Kyte

Followup  

November 07, 2006 - 4:40 pm UTC

almost zero on the cost these days in 2006.

the benefits are the ones you see, people are "afraid of raw", not afraid of file systems.

File system performance for Oracle

November 10, 2006 - 8:00 am UTC

Reviewer: A reader

Hi Tom

We run solaris 10 ufs filesystem and 10R2 without any patch, but the behaviour is really strange. When I use dd or cp to test the file system, the average write speed is about 15 MB per second and the io busy time is about 90 percent. But when I create a table, like create table x (a char(2000), b char(2000), c char(2000)), and then insert into x select * from x, the write speed is only about 1.5 MB per second, the busy time is 100 percent and io wait appears.
Why is Oracle io performance so slow compared to dd or cp? And how can I improve it?

Thanks you!!

Tom Kyte

Followup  

November 10, 2006 - 9:06 am UTC

insert writes to the buffer cache. not to disk. Why do you think this is "slow"??

if you want to test "direct IO" use insert /*+ append */ and have it just bypass the buffer cache.

File system performance for Oracle

November 10, 2006 - 10:55 am UTC

Reviewer: A reader

Hi Tom

In the awr report, 4 of the top 5 events are in the System I/O class: db file parallel write, log file parallel write, control file write, log sync. Both the average time and the total time of these events are very high. As I mentioned, dd is much faster than dbwr. I believe the more important thing is that dd didn't make i/o 100% busy at 10+MB/sec, but Oracle makes I/O 100% busy at only 1+MB/sec. What should I do now? and what causes the dd and dbwr's behavior are so different?

Thanks Tom!!

Tom Kyte

Followup  

November 10, 2006 - 2:46 pm UTC

but clients do NOT wait for db file parallel write, log file parallel write, control file parallel write.

clients do wait for log file sync, every time you issue COMMIT.


You are missing the point here, you CANNOT compare DD (a big sequential IO thing - large reads, large writes - all "contiguous" and "sequential") with DBWR - which does SINGLE BLOCK IO, OUT OF A CACHE, AS NEEDED, SLOWLY, OVER TIME.

If you were waiting for free buffer waits - then DBWR could be the bottleneck, but you are not.

your log file sync waits come from LGWR and LGWR also does pretty small IO's, not big huge "write this 50meg of data please"


Think about it, dd is not even comparable to the IO patterns a database experiences.

The only wait you listed above so far that affects RUNTIME PERFORMANCE of your insert is log file sync, and you cannot avoid that guy - it happens upon every commit.
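
To make the difference in I/O shape concrete, here is a small Python sketch that writes the same number of bytes twice: once as a few large sequential writes (dd-like) and once as many single-block writes at scattered offsets (DBWR-like). The 8 KB block size, the 1 MB chunk size and the temp file are assumptions for illustration. It goes through the file system cache, so it shows the number and shape of the requests, not real device timings.

```python
import os
import random
import tempfile

BLOCK = 8 * 1024          # assumed database block size
CHUNK = 1024 * 1024       # assumed dd-style bs=1M
TOTAL = 8 * CHUNK         # bytes written by each pattern

def sequential_writes(fd):
    """dd-like: few large writes, each starting where the last one ended."""
    calls = 0
    for offset in range(0, TOTAL, CHUNK):
        os.pwrite(fd, b"\0" * CHUNK, offset)
        calls += 1
    return calls

def scattered_block_writes(fd, seed=0):
    """DBWR-like: one block per write, at offsets in no particular order."""
    offsets = list(range(0, TOTAL, BLOCK))
    random.Random(seed).shuffle(offsets)
    for offset in offsets:
        os.pwrite(fd, b"\0" * BLOCK, offset)
    return len(offsets)

with tempfile.TemporaryFile() as f:
    seq_calls = sequential_writes(f.fileno())
    rnd_calls = scattered_block_writes(f.fileno())

print(seq_calls, rnd_calls)  # 8 1024
```

Same 8 MB either way, but 8 requests in one case and 1024 in the other, each of which may need its own seek on a spinning disk.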

DBWR != dd

November 10, 2006 - 12:39 pm UTC

Reviewer: Kevin Closson from OR

"What should I do now? and what causes the dd and dbwr's behavior are so
different?"

...Not that Tom needs the help, but since I was about to blog on this very topic and saw this, I thought I’d throw this post out there. A dd(1) I/O profile is nothing similar to DBWR. DBWR's job is much more "difficult" than dd(1). DBWR assembles a list of modified SGA buffers nearly by age alone. These buffers are never adjacent on disk and very seldom bound for the same file descriptor. Essentially, it is a set of random writes to a random set of datafiles. That's not bad, that's just how it is. DBWR does not do multi-block writes.

On the other hand, dd(1) has a simple, re-used buffer equal to the size of the bs command line option in the process data segment (heap). Its writes are sequential and large.

The closest thing in Oracle that mimics a dd(1) I/O profile are direct path writes which can be multiblock and sequential.

I recommend Orion in place of dd(1) for studying sequential writes as they pertain to Oracle.

There is more about things like Orion at:

http://kevinclosson.wordpress.com/



Tom Kyte

Followup  

November 10, 2006 - 2:56 pm UTC

thanks Kevin - what I was trying to say exactly - even told them above to do direct path operations if they want to even slightly (even though they are not entirely comparable either) compare dd to Oracle writes.

Solaris dd and ufs

November 10, 2006 - 5:59 pm UTC

Reviewer: Roderick

There's also a chance that 'dd' on Solaris just writes to the file system buffer cache by default. Using the 'truss' utility, I did not notice anything like an O_DIRECT flag being set by dd on the output file [I believe 'dd' on Linux does have an option to specify a o_direct flag to bypass the file system buffer cache].

DBWn in contrast makes sure all writes go directly to disk.

So in that vein, it may be slightly fairer to compare INSERT into a large Oracle buffer cache with dd into a potentially large file system cache. DBWn might be more similar to a fsflush process.

As Tom and Kevin say, Oracle does more work by nature to keep data better protected.

I wonder what happens if two different dd processes run at the same time with different if=... but the same of=... parameters?

dd or oracle

November 10, 2006 - 9:29 pm UTC

Reviewer: Kevin Closson

I see the thread is about ufs ..they could do a -forcedirectio mount option and that will force the dd I/Os to be direct... as we are all saying, though...still won't look like DBWR...will look like direct path writes though

File system performance for Oracle

November 11, 2006 - 6:06 am UTC

Reviewer: A reader

Thanks Tom
Thanks Kevin Closson
I know dbwr's work happens in the background, so it doesn't tell us what clients are waiting for, but log file sync should. Its average time is about 30ms, and the system spends more than 500 seconds waiting on it in 1 hour. I think commit wait or log file sync can be caused by several things (latch free, log file parallel write etc.); in my case I believe log file parallel write is the root cause. I discussed this with a Sun engineer, and he said that the system is capable of handling 10-30 MB per second, which can be proven with dd or cp, so the reason the system is slow is that Oracle can't submit big I/O jobs to the system efficiently. So how can I simulate Oracle's behavior with Solaris built-in commands? We are not allowed to install bonnie or whatever. I believe raw or directio will be better. If Solaris mounts a partition with direct I/O, should I change filesystemio_options to directio as well? How about the disk_asynch_io parameter?

Thanks !

Tom Kyte

Followup  

November 11, 2006 - 8:16 am UTC

log file sync is the wait you experience while waiting for lgwr to flush the redo log buffer - period.

Before we go any further in this thread - I need you to provide a complete technical comparison point by point of

o toaster ovens
o bananas

Because they are AS COMPARABLE AS COMPARING DD/CP to LGWR. Period. I don't know how else to tell you this.

Reduce the number of times you issue commit (are you loading and committing every N records - then you are doing this yourself)

Ask the sun engineer - what is the average time to write a random block somewhere on disk - that is a log file parallel write.

I do believe Kevin above told you how to simulate "database like IO" and if you cannot install anything on what apparently is a benchmark box - time to get a better work environment I think.
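
The "commit less often" advice is simple arithmetic. A rough Python sketch, using the 30 ms average log file sync quoted by the reader; the 100,000-row load and the batch sizes are hypothetical, and the model assumes the wait per commit stays constant, which a real system will not guarantee.

```python
def log_file_sync_seconds(rows, rows_per_commit, avg_wait_s):
    """Estimated total commit wait: one log file sync per COMMIT."""
    commits = -(-rows // rows_per_commit)  # ceiling division
    return commits * avg_wait_s

AVG_WAIT_S = 0.030   # 30 ms, the average quoted by the reader
ROWS = 100_000       # hypothetical load size

for batch in (1, 100, 10_000):
    print(batch, round(log_file_sync_seconds(ROWS, batch, AVG_WAIT_S), 2))
```

Committing every row costs on the order of 3,000 seconds of waiting in this model, every 100 rows about 30 seconds, every 10,000 rows well under a second. The disk did not get any faster; the program just asked LGWR for fewer synchronous flushes.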

File system performance for Oracle

November 13, 2006 - 7:43 am UTC

Reviewer: A reader

Hi Tom

After I did more tests, I found that the difference between Oracle's behavior and dd is the size of each io job. The number of io operations is almost the same, but each io job of dd is 100KB+ while Oracle's is exactly 8K! So how do I increase the job size of Oracle?

Thanks Tom!

Tom Kyte

Followup  

November 14, 2006 - 3:58 am UTC

stop, go back, re-read all of this stuff please.

And stop stop stop trying to compare Oracle IO to dd, not until you give me that complete technical comparison point by point of

o toaster ovens
o bananas

Kernel parameter (dbc_max_pct etc.) settings for disk cache

December 29, 2006 - 1:23 am UTC

Reviewer: Vishal Goyal from India

Hi Tom,

Do the kernel parameter for disk cache like dbc_max_pct, dbc_min_pct etc have any effect on database using raw devices ?

Regards

raw devices for redo logs

January 19, 2007 - 2:03 am UTC

Reviewer: A reader

Hi Tom,

are there any known issues/problems in your experience with using raw devices for redo log files?
One of our in-house experts says that there is no way we should use raw devices for redo logs (in fact he says we cannot use raw). I have read and re-read the backup/recovery documents and the concepts guide but could not find a good reason for this.
could you please share your thoughts on this.


April 02, 2009 - 5:37 am UTC

Reviewer: A reader

hi
Why must a single raw device hold only one datafile?
What is the purpose of this?
Tom Kyte

Followup  

April 02, 2009 - 10:08 am UTC

well, do you understand the concept of a raw device? It is raw, not cooked (not making a joke...). There are no "files on it", it is just a bunch of blocks. You either

a) use the device
b) do not use the device

you can take a raw device and partition it into as many smaller devices as you want, but at the end of the day - a raw device is just a bunch of blocks - and you give that device to something and that something either uses it - or not.

Raw device = one datafile

April 03, 2009 - 4:20 am UTC

Reviewer: krzysztof.wlasiuk from Poland

I think the last reader meant: why did ORACLE decide not to let the user say that:
- datafile1 will be in /dev/raw1 from block 1 to 500
- datafile2 will be in /dev/raw1 from block 501 to 1000
and so on

Informix database can do it, but ORACLE can NOT.

Maybe before LVM it was an issue; now it is only a detail, and not even a very interesting one.

Am I clear ?
I hope so :)
Tom Kyte

Followup  

April 03, 2009 - 7:48 am UTC

because it would be of no real benefit and it would be a maintenance nightmare as no one knows who has what - a device is a device, you either use it - or not.

if you want to do that, you would partition the device into two logical partitions - so now everyone on the planet knows "there are two separate and distinct things here"

April 03, 2009 - 4:59 am UTC

Reviewer: A reader

Hmmm
Thanks Tom
I read the whole thread but still didn't get the advantage of raw devices..
What is the main advantage of a raw device compared to other storage options?
Tom Kyte

Followup  

April 03, 2009 - 7:51 am UTC

today, in the year 2009, it would be highly unusual to use a raw device directly.

you would give it to a LVM (logical volume manager), preferably ASM (automatic storage management, a database file system) and make it 'easier to manage and use'

in the olden days, RAW was typically the only way to get an unbuffered file system, today we can usually mount any file system we want unbuffered - giving us the speed of raw (the database buffers, we don't need the OS to buffer for us, it is faster if only one of us buffers at a time) with the ease of use of cooked file systems.

April 03, 2009 - 1:35 pm UTC

Reviewer: Alex

Can you explain why it is faster to have only one doing the buffering?

I would think that would be an added "bonus" to have pseudo PIO's available.

If it's not found in the buffer cache, goto the filesystem cache and do a fake PIO. Then, if not found do a real PIO.

Or is it because, to the database, going to the fs cache is no different than real PIO's, since the trip has been made?
Tom Kyte

Followup  

April 03, 2009 - 3:12 pm UTC

you would just make the database buffer cache bigger - then you don't need that "bonus"

Say you have 4gb of ram.

You could either

a) give 3.5gb to the SGA to buffer data and use raw
b) give 1gb to the SGA to buffer data and use buffered IO


Ok, when you use raw, the algorithm to read is really really simple, it is:

a) perform PIO

that's it, done, simple, fast. We had to look in the buffer cache first - but you'd have to do that regardless. So, to do a PIO (which we'll do LESS OF because we have a big cache) - it is simple.

Now, switch to case (b), the algorithm to read is a tad more complex

a) look in the file system cache (we just spent cycles looking in buffer cache, here we are again, looking in yet another cache)
b) if we find it, great, else do physical IO


so, step (a) is totally "extra work", we don't need to do step (a). In fact, it is even worse than it sounds. In order to do a PIO we are doing a system call (context switch). If you just make the buffer cache bigger - you avoid that overhead as well.

So, raw+bigger cache = less PIO's and less system calls (context switching) and less copy this from point 1 to point 2.

versus double buffering.


In short, you give the memory to the SGA instead of the file system cache and you do less PIO altogether.
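
The two cases can be simulated in a few lines of Python. Everything here is a toy model under stated assumptions: both caches are plain LRU, 1 GB is scaled down to 1,000 blocks, the file system cache in case (b) gets the 2.5 GB the SGA gave up, and the skewed workload is invented. It is not a model of Oracle's or any OS's real caching policy.

```python
import random
from collections import OrderedDict

class LRU:
    """Tiny LRU cache of block numbers; get() reports hit or miss."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)
        else:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)
        return hit

def run(reads, sga_blocks, fs_blocks):
    """Return (system calls, true physical reads) for one configuration."""
    sga = LRU(sga_blocks)
    fs = LRU(fs_blocks) if fs_blocks else None
    syscalls = disk_reads = 0
    for block in reads:
        if sga.get(block):
            continue              # found in the buffer cache, no system call
        syscalls += 1             # every buffer cache miss is a read() call
        if fs is None or not fs.get(block):
            disk_reads += 1       # not in the OS cache either: real disk I/O
    return syscalls, disk_reads

rng = random.Random(42)
# Hypothetical skewed workload over 10,000 distinct blocks.
reads = [int(rng.paretovariate(1.2)) % 10_000 for _ in range(200_000)]

raw = run(reads, sga_blocks=3500, fs_blocks=0)          # case (a)
cooked = run(reads, sga_blocks=1000, fs_blocks=2500)    # case (b)
print("raw + big SGA:       ", raw)
print("small SGA + OS cache:", cooked)
```

With LRU on both sides, the bigger buffer cache can never miss more often than the smaller one on the same read stream, so case (a) never makes more system calls than case (b). How the true disk reads compare depends on the workload, so run it with your own numbers before drawing conclusions.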

April 06, 2009 - 1:22 am UTC

Reviewer: A reader

hi Tom
Can we extend the datafiles on a raw device?
Tom Kyte

Followup  

April 07, 2009 - 6:08 am UTC

sure, you can create a datafile that is smaller than the raw device, and then let it grow

but - one would ask the reasonable question of "WHY" in most cases... Unless you have a disk manager that is letting you grow partitions I guess.

raw

April 13, 2009 - 8:58 am UTC

Reviewer: A reader

Tom:

can you explain briefly what a "raw device" is? Is this thread about storing files inside a database versus a filesystem?
Tom Kyte

Followup  

April 13, 2009 - 5:31 pm UTC

in general, a raw device is a disk (logical or otherwise) that is not formatted as a file system.

it is just a disk, we (anyone) can do IO operations on it, but there is no file system - no files, you cannot "dir" it, you cannot "ls" it. If you want to use this raw device in your program, you must manage the storage on it (there is no concept of a file on this device, the device is just a bunch of storage). You have to remember where you put the bits and bytes if you ever want to see them again.
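
"You have to remember where you put the bits and bytes" looks like this in practice. A minimal Python sketch in which an ordinary temp file stands in for the device (a real raw device would be opened by its device path and needs suitable privileges); the 512-byte block size and the `directory` dictionary are assumptions for illustration.

```python
import os
import tempfile

BLOCK = 512  # assumed block size of the stand-in device

def write_block(fd, block_no, payload):
    """Write one padded block at a position the caller chooses."""
    os.pwrite(fd, payload.ljust(BLOCK, b"\0"), block_no * BLOCK)

def read_block(fd, block_no):
    """Read one block back and strip the padding."""
    return os.pread(fd, BLOCK, block_no * BLOCK).rstrip(b"\0")

# No file names, no ls: the program keeps its own record of what lives where.
with tempfile.TemporaryFile() as dev:
    fd = dev.fileno()
    directory = {}
    for block_no, (name, data) in enumerate([("a", b"hello"), ("b", b"world")]):
        write_block(fd, block_no, data)
        directory[name] = block_no
    found = read_block(fd, directory["b"])

print(found)  # b'world'
```

Lose the `directory` and the bytes are still on the device, but nothing tells you which blocks belong to what. That bookkeeping is exactly what a file system (or the database) does for you.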

April 14, 2009 - 3:24 am UTC

Reviewer: Scofield

hey TOM

In your opinion, what is the major advantage of using a raw device rather than a file system?
Tom Kyte

Followup  

April 14, 2009 - 11:20 am UTC

In the year 2009 - you would be using ASM, giving ASM the raw device and letting it create a database file system out of it.

You wouldn't be using raw directly, you'd be using raw to create a database file system.

And if you want to keep using regular old file system files - you can mount those typically in direct mode, giving you the performance boost of raw (no file system cache to get in the way) and the ease of use of a file system.

So, I wouldn't be using them directly, there would be a file system on top to make them a bit more manageable.

May 28, 2009 - 4:52 pm UTC

Reviewer: A reader

How do I identify whether the file layout is RAW or cooked?

What is the advantage of ASM over other third party VMs (like EMC, ...)?
Tom Kyte

Followup  

May 28, 2009 - 5:29 pm UTC

Ask your DBA (if you have to ask... You are not responsible for touching the database....)

In short, if you can "ls -l some files that are database files" you are cooked.
If you don't see any files, you are raw.
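
If you would rather have a program tell you than eyeball the first character of `ls -l`, the file type is in the mode bits. A small Python sketch using only the standard library; the labels are mine, and the idea that the character special device is the raw interface holds on many Unix systems but is an assumption you should check for your platform. The paths you feed it would be whatever file names the database reports.

```python
import os
import stat
import tempfile

def classify(path):
    """Say what kind of object a datafile path points at."""
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "character special device (raw)"
    if stat.S_ISBLK(mode):
        return "block special device"
    if stat.S_ISREG(mode):
        return "regular file (cooked)"
    return "other"

# Demonstration on an ordinary temp file, which is a regular file.
with tempfile.NamedTemporaryFile() as f:
    print(classify(f.name))  # regular file (cooked)
```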


ASM comes with the database
ASM doesn't require additional software to be purchased.
ASM does database stuff only, it is optimized to be a database file system and nothing else
ASM provides clustered access without any additional software to be purchased
ASM provides the DBA with management features that the DBA should have - not the "SA", the DBA

Log & control file in RAID 0+1 - Reg

June 02, 2009 - 1:45 am UTC

Reviewer: Thiru from B'lore, India

Is it good to keep log & control file in RAID 0+1? I personally think it is not required as database itself maintains redundant copies. Any advice?
Tom Kyte

Followup  

June 02, 2009 - 7:32 am UTC

if a control file goes "bad", the database shuts down - it would be good to make sure the control files are highly available - hardware redundancy is good for that.

log files are very important - if you can make them highly available (hardware protected) and redundant (protects you from a single bad write to a single file - instead we do the same write to many files - the odds that all of the writes are 'bad' is much less than a single write) - all the better.


disk is cheap
data is priceless

July 11, 2009 - 10:11 pm UTC

Reviewer: A reader

Hi Sir.
If we are installing a single instance database on linux, which filesystem do you suggest? Raw or ASM or cooked?
Tom Kyte

Followup  

July 14, 2009 - 5:24 pm UTC

starting fresh - go with ASM.

afraid of ASM - go cooked.

How to identify raw..?

September 02, 2009 - 8:09 am UTC

Reviewer: Jatin from Delhi, India

Hi Tom

How do I identify if raw is being used when disks (principally) are said to have both a block special device and a character special device associated with them. As for example:

/dev/vg01 #ls -l | grep eai
brw-------   1 root       sys         64 0x01001b Oct 10  2008 eai-arch
brw-------   1 root       sys         64 0x01001c Oct 10  2008 eai-ctrl
brw-------   1 root       sys         64 0x01001d Oct 10  2008 eai-data01
brw-------   1 root       sys         64 0x01001e Oct 10  2008 eai-dump
brw-------   1 root       sys         64 0x01001f Oct 10  2008 eai-index01
brw-------   1 root       sys         64 0x010020 Oct 10  2008 eai-log
brw-------   1 root       sys         64 0x010021 Oct 10  2008 eai-rbs
brw-------   1 root       sys         64 0x010022 Oct 10  2008 eai-sys
brw-------   1 root       sys         64 0x010023 Oct 10  2008 eai-temp
crw-------   1 root       sys         64 0x01001b Oct 10  2008 reai-arch
crw-------   1 root       sys         64 0x01001c Oct 10  2008 reai-ctrl
crw-------   1 root       sys         64 0x01001d Oct 10  2008 reai-data01
crw-------   1 root       sys         64 0x01001e Oct 10  2008 reai-dump
crw-------   1 root       sys         64 0x01001f Oct 10  2008 reai-index01
crw-------   1 root       sys         64 0x010020 Oct 10  2008 reai-log
crw-------   1 root       sys         64 0x010021 Oct 10  2008 reai-rbs
crw-------   1 root       sys         64 0x010022 Oct 10  2008 reai-sys
crw-------   1 root       sys         64 0x010023 Oct 10  2008 reai-temp



And... that here is where all my relevant database files are:


/dev/vg01 #bdf|grep eai

/dev/vg01/eai-temp 2048000   69270 1855067    4% /data/eai/temp
/dev/vg01/eai-sys  3080192  175943 2722741    6% /data/eai/sys
/dev/vg01/eai-rbs  2048000 1435222  574487   71% /data/eai/rbs
/dev/vg01/eai-log  1032192  308560  678412   31% /data/eai/log
/dev/vg01/eai-index01
                   5128192 2050377 2885459   42% /data/eai/index01
/dev/vg01/eai-dump 2048000 1234088  763054   62% /data/eai/dump
/dev/vg01/eai-data01
                   20480000 15363464 5036600   75% /data/eai/data01
/dev/vg01/eai-ctrl 1032192   43166  927219    4% /data/eai/ctrl
/dev/vg00/eai-archive
                   5242880    2385 4912972    0% /data/eai/archive
/dev/vg01/eai-arch 2048000    1606 1918502    0% /data/eai/arch

December 19, 2009 - 7:06 pm UTC

Reviewer: Scofield

Respected Sir:

Effective Oracle by design page:449
You mentioned that:
"Unless you mounted your disks using directio or are using raw partitions, the o.s will cache your data....
....because a read from the operating system file system buffer will complete faster than a physical IO operation"


What I want to ask is:

Why is a raw device recommended, if a read from the operating system file system buffer will complete faster?
What is the benefit of raw device in this case?


Tom Kyte

Followup  

December 20, 2009 - 9:07 am UTC

you know what is faster than a read from the OS filesystem cache?

A read from the buffer cache.

It would be preferable to find the data we need in:

a) the buffer cache
b) the flash cache (new in 11g Release 2)
c) the OS filesystem cache
d) on disk

in that order. Now, if you make the buffer cache larger - then we'll never get to (c) and (c) is much much slower than (a)

(a) would be preferred. So, make the cache large enough to account for the secondary SGA effect when you switch from cooked to raw (ASM would be preferred actually, it is raw, but easier/more flexible to manage database stuff with)

January 11, 2010 - 1:56 am UTC

Reviewer: A reader

Respected Sir;

Is this right?

Oracle first looks for the data in the buffer cache;
if it can't find it, it will search the os filesystem cache;
if it still cannot find it in the os cache, it will do disk access.


Tom Kyte

Followup  

January 18, 2010 - 6:15 am UTC

we don't search the filesystem cache. the filesystem does that.

Assuming no flashcache (new 11g R2 feature), we

a) take the DBA (data block address: block#, file#) and hash it.
b) we search a list in the cache identified by that hash for this block
c) if we don't find it in cache we do physical IO. Physical IO might come from OS filesystem cache or not, we don't know, we don't care, we don't influence that at all.
d) we put block into that list in the cache and return it.
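
Steps (a) through (d) map onto a hash table with chained buckets. This is a toy Python sketch of that lookup only: the class name, the bucket count and the fake storage function are hypothetical, there is no eviction or locking, and it is not Oracle's actual implementation.

```python
from collections import defaultdict

class ToyBufferCache:
    """Hash-bucketed block cache keyed by (file#, block#)."""
    def __init__(self, n_buckets, read_from_storage):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)   # hash value -> list of (dba, block)
        self.read_from_storage = read_from_storage
        self.physical_reads = 0

    def get(self, file_no, block_no):
        dba = (file_no, block_no)
        chain = self.buckets[hash(dba) % self.n_buckets]   # (a) hash the DBA
        for cached_dba, block in chain:                    # (b) search that list
            if cached_dba == dba:
                return block
        block = self.read_from_storage(file_no, block_no)  # (c) physical I/O
        self.physical_reads += 1
        chain.append((dba, block))                         # (d) cache and return
        return block

# The "storage" is a stand-in; the cache neither knows nor cares whether
# the read was served from an OS cache or from disk.
cache = ToyBufferCache(64, lambda f, b: f"contents of file {f} block {b}")
cache.get(4, 130)
cache.get(4, 130)
print(cache.physical_reads)  # 1
```

The second `get` finds the block in the bucket's list, so only one physical read is counted.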

January 23, 2010 - 1:11 am UTC

Reviewer: A reader

Respected Sir;
Thanks for your reply.

Assume raw device. As far as I know there is no "os filesystem cache" in raw device system.

c-) In this case, if we don't find it in cache, it will always do physical io to disk (never the os filesystem cache).

What is the advantage of a raw device then?
(It should be worse to use a raw device since there is no filesystem cache)

Tom Kyte

Followup  

January 25, 2010 - 5:56 am UTC

the buffer cache is better than the filesystem cache.

if you are not using a cooked file system, you would give that memory to the buffer cache. Hence, you would be caching everything that is possible to cache - with no need for the slower OS filesystem cache.

Cooked Redo Logs and ACID

May 13, 2010 - 10:56 am UTC

Reviewer: Chris Curzon from Monrovia, CA USA

Hi Tom:

Two questions.

First, one source I read says that online redo log files are always cooked. Is this correct?

Second, if the online redo log files are cooked, AND if there is an OS file system cache, this could undermine the D of ACID. I think this scenario would be possible:
1) Oracle writes the commit record to the redo log.
2) The OS responds that the write was successful (because it was successfully written to the filesystem cache).
3) So Oracle thinks the data was properly committed.
4) The system crashes before the data in the filesystem cache gets physically written to disk.
Net result, committed data is not actually committed.

Although a very slim possibility, is this understanding correct?
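
The scenario in steps 1 to 4 depends on a write being acknowledged while the data is still only in the file system cache. A program that needs durability can ask the OS not to return until the data has been handed to stable storage. A minimal Python sketch of that idea; the function name and file name are hypothetical, and this illustrates the general technique only, not how Oracle opens or writes its redo logs.

```python
import os
import tempfile

def durable_append(path, record):
    """Append a record and return only after the OS reports it is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, record)   # may only reach the file system cache
        os.fsync(fd)           # ask the OS to flush it to stable storage
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "redo.log")   # hypothetical file name
    durable_append(log, b"commit 1\n")
    durable_append(log, b"commit 2\n")
    size = os.path.getsize(log)

print(size)  # 18
```

Without the `os.fsync` call, step 2 of the scenario above is exactly what can happen: the write returns successfully while the bytes are still in memory.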

How to move from RAW to normal file system

June 02, 2010 - 1:15 pm UTC

Reviewer: A reader

How does one move from oracle on RAW device to normal file system.

Is it as simple as using dd command to copy from raw to normal FS?
Tom Kyte

Followup  

June 08, 2010 - 9:59 am UTC

could be, it depends on the OS - sometimes there is a header block you have to skip over.

Since you are going to be moving a ton of bytes anyway from disk to disk, why not

a) create new tablespace
b) alter move/rebuild from old tablespace to new
c) drop old

get a "reorg" for free if you will...

RAC Problem

June 26, 2010 - 3:06 am UTC

Reviewer: KD from INDIA

Can we configure RAC without raw devices? If yes, then what is the effect on production database performance?
If no, then give me the perfect reason.
Tom Kyte

Followup  

July 06, 2010 - 10:54 am UTC

yes, absolutely, most people do - using ASM or some other clustered file system from a 3rd party.

perfect reason for "what"???

using raw devices

February 15, 2011 - 2:30 am UTC

Reviewer: A reader

On block devices the I/O rate is very good, in the MB/s range.
But on the RAW device of the same block device the I/O rate is about 200 KB/s.

Is this normal behavior for raw devices when we use the dd command?

The main problem is that we face slowness when we import into our oracle DB.
Tom Kyte

Followup  

February 15, 2011 - 8:35 am UTC

That doesn't sound right to me. You should not drop to KB/s.

filesystem - shared/cluster,

February 17, 2011 - 10:49 am UTC

Reviewer: A reader

I am installing 11.2.0.2 grid infrastructure on HP-UX. We are provided with a mount (filesystem) that is shared between the two servers. this mount is for storing OCR files and Voting disks.

Question: Is this called a shared file system or a cluster file system? I could see both terminologies in the Oracle installation guide. I am sort of confused.

About installation using OUI:

OUI doesn't like this mount when I try to put all the 3 OCR files (normal redundancy). It says,
WARNING: [WARNING] [INS-41310] More than one Oracle Cluster Registry is in the same partition.
CAUSE: The Oracle Cluster Registry was located more than once on the same partition. For normal redundancy, Oracle recommends that you specify three locations on separate partitions for the Oracle Cluster Registry.
ACTION: Select separate partitions for each Oracle Cluster Registry.


I also tried 3 different mounts to store each OCR file. These mount points are shared between the two servers.

This time, the OUI gave me an error saying [INS-41321] Invalid Oracle Cluster Registry (OCR) location.


The Invalid OCR location is a FATAL error and aborts my installation. What checks need to be done to ensure the storage is suitable for OCR files?

Thanks