Skip to Main Content
  • Questions
  • How to find physical read/Writes made per second

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, vijay.

Asked: July 13, 2001 - 5:35 pm UTC

Last updated: August 28, 2013 - 5:15 pm UTC

Version: 8.1.7

Viewed 10K+ times! This question is

You Asked

Hi,

We are planning to test different raid levels , so i was wondering is there anyway we can find Reads or Writes per second.

Thanks,
Vijay

and Tom said...

v$filestat has this (turn on timed_statistics)

I will warn you when looking at this to consider some extenuating circumstances.

DBWR writes to datafiles in the background. As long as DBWR can keep up, a slow IO time can be perfectly acceptable. A disk performance decrease, on a well tuned system, can be hidden by DBWR, your end users will never know.

LGWR writes to logs in the FOREGROUND. Any performance decrease in IO will be visible to your end users, painfully visible. You cannot hide this.

TEMPORARY files are read/written in the foreground, you will notice this.

Datafiles are read in the foreground, you will notice this.


So -- datafiles need to have fast reads and can suffer slower writes (to a point). Log devices need to be FAST for writing, really really FAST for writing. Since you will be running in archive log mode (all production systems do), they need to be adept at READING as well. Your archive destination needs to be FAST.

Consider using more then ONE level of raid, eg:

log (online and archive) consider 0 or 0+1

datafiles consider any level, even 5 (depending on your usage. If you direct path load gobs of data, raid 5 could kill you.)

tempfiles, consider raid 0 -- who cares if they get nuked -- just recreate them, you want really fast reads and writes.


see also
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:359617936136 <code>
...


Rating

  (46 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

More Help

Vinnie Salerno, August 22, 2001 - 8:52 am UTC

I have turned on TIMED_STATISTICS but still do not get
any values for WRITETIM or READTIM? How can I generated
values for these to get AVG reads & Writes /Second?

Tom Kyte
August 22, 2001 - 9:33 am UTC

I've never seen that not happen.  Please contact support and open a TAR for that, I just tested on my machine and sure enough, right after issuing:

ops$tkyte@ORA8I.WORLD> alter system set timed_statistics=true;

v$filestat started collecting values. 

A reader, August 22, 2001 - 10:14 am UTC

hi tom,

I read your book and it is the best i have ever seen for oracle

Reader

A reader, October 01, 2002 - 10:05 pm UTC

I am a little confused
"
DBWR writes to datafiles in the background. As long as DBWR can keep up, a slow
IO time can be perfectly acceptable. A disk performance decrease, on a well
tuned system, can be hidden by DBWR, your end users will never know.

LGWR writes to logs in the FOREGROUND. Any performance decrease in IO will be
visible to your end users, painfully visible. You cannot hide this.
"

DBWR when it writes at a checkpoint, processes has to wait
for checkpoint to complete. Performance decrease, is it not.

LGWR when it writes at "commit", the server processes
continue to perform and hence performance does not
decrease unless the log buffer is filled or all log
groups are caught up. Is this not true

So, could you clarify why DBWR works in background and
LGWR in forground

Thanks


Tom Kyte
October 02, 2002 - 10:11 am UTC

we only have to wait for DBWR at a checkpoint if we need to advanced into a logfile that still has blocks that only exist in the cache that need the redo. then we wait.


If you find you are waiting, add another logfile and hey -- we can now advance further without waiting.


A checkpoint not complete can be "tuned" around. We can add dbwr's, we can increase our logs, we can use the log_* parameters and the fast_start_io_targets (mttr target in 9i) to make it so we never get this message and never wait.

LGWR on the other hand. Well, When you COMMIT, COMMIT does not return to you until LGWR has done the physical write and gotten confirmation back. You *wait* for LGWR in the foreground. YOUR PROCESS, you the client waits for lgwr. (this is why you never want to commit in a high speed "for" loop -- you'll spend half your life WAITING for log file sync's)




Reader

A reader, October 02, 2002 - 10:20 pm UTC

Thanks for the clarification.

1. If there is a BLOCK in log buffer and it is not copied to
redo log, the DBWR will wait before checkpoint the
block, I agree


2. "logfile that still has blocks that only exist in the cache
that need the redo"

Does this mean that if there is a Block in the cache it will
not be checkpointed until that block has been copied to
archived log from redo log if in "archived log mode" regardless
of if the log groups are caught up or not

Tom Kyte
October 03, 2002 - 9:14 am UTC

2) no, it does not mean that.

lets say there is a block in the buffer cache. It is "dirty" (modified). it has NOT been checkpointed to disk (it exists only in cache).

lets say redo log 1 has the redo to redo the changes to this dirty block.

lets say there is a log switch and we are trying to advance into redo log 1 to reuse it. We CANNOT reuse UNTIL AFTER that dirty block is checkpointed to disk.

that is what I was saying

Reader

A reader, October 03, 2002 - 10:35 pm UTC

Tom,

"
lets say there is a log switch and we are trying to advance into redo log 1 to
reuse it. We CANNOT reuse UNTIL AFTER that dirty block is checkpointed to disk.
"

1. If this is the case, it seems to me more like LGWR will wait , not the DBWR.
DBWR will never be blocked (except in the case of log buffer having the block)
2. In this scenario, will LGWR wait, even if that log 1 has been archived
completely, because the "modified buffer" that has redo in log 1 has
not been checkpointed.

If so, it appears that, Oracle tries to do on ONLINE recovery from the current log
file if by chance there is any corruption in the content of the block.

From "DATA BLOCK CORRUPTIONS: DETECTION ...." by Stephen Haisley
"
If a problem is found (during logical block checking) Oracle tries to
perform recovery on the block........using current online redo log
"

Your comments please

Thanks

Tom Kyte
October 04, 2002 - 8:16 am UTC

1) everyone waits whilst DBWR checkpoints. the system will appear to freeze

2) yes

we do instance (but not media) recovery using ONLY online redo logs, yes.




os cache

Reader, July 29, 2003 - 7:20 pm UTC

someone was saying to me that dbwR process writes to OS cache not directly to the datafiles. From the OS cache, os process then writes to datafiles. Is it true? Also, is it true that only LGWR writes directly to redo log files on disk.

Also, can server process write to data files directly? If so, in what circumstances in oracle can server process do this. Thanks. You are the best.

Tom Kyte
July 29, 2003 - 8:19 pm UTC

we use O_SYNC on the writes by dbwr -- meaning dbwr either

a) waits for the write to be put onto disk (sync io)
b) waits for the OS to notify dbwr that the write is in fact on disk (async io)

so, while the data might hit the OS buffer cache, we ask the OS to also put it onto disk.

Yes, server processes can write directly to datafiles

o direct path load
o create table as select
o insert /*+ APPEND */
o modifications to nologging nocache lobs
o sort direct writes
o index creates

to name a few.

How can I find how much memory is there for OS cache?

A Reader, July 29, 2003 - 8:53 pm UTC


Tom Kyte
July 29, 2003 - 9:23 pm UTC

you'd ask the guy who runs the OS ;)

solaris and many other unixes/linuxes -- just use whatever is free and shrink it as people use it for other stuff.

how to find dbwr has written or not???

Anurag, October 10, 2003 - 10:17 am UTC

Hi Tom,

I've read in lots of books and in oracle docs that DBWR writes after timeout ie. 3 secs. Is it standard for it, also how to check this and does it update the header of datafile or controlfile. Please give details....

Tom Kyte
October 10, 2003 - 11:04 am UTC

it doesn't update the controlfile -- ckpt does that.

you can see what dbwr's been doing in various ways. one quick one is:

select * from v$sysstat where name like '%DBWR%'




how to find dbwr....

anurag, October 16, 2003 - 7:07 am UTC

Thanks I found many things, but how to trace that DBWR is writing in every 3 secs?? What exactly is timeout??

Thanks in adv.

regards

Tom Kyte
October 16, 2003 - 10:36 am UTC

just trust us, it is the way it works.

negative value for phyrds and singleblrkrds in v$filestat

Sunil, June 30, 2006 - 3:43 pm UTC

Hi Tom,
Value One of the row(file #) is negative. Why is this so?
It is a 9i release 2 RAC on Red Hat Advanced Server 2.1 (two nodes)
Thanks

Tom Kyte
June 30, 2006 - 4:29 pm UTC

the statistics are sometimes in 32bit integers. When they exceed 2billion, they roll negative...

Ouch...Obviously this app is in bad shape.

Sunil, June 30, 2006 - 7:12 pm UTC

So phyrds and singleblrkrds is going to be 2 billion plus whatever is in those columns?
Thanks.

Tom Kyte
July 01, 2006 - 7:49 am UTC

that is since the database instance has been started - so "ouch" is premature.

Mismatch between v$filestat and v$sysstat

Tom.Lai, March 18, 2009 - 11:42 pm UTC

Why doesn't sum(phyrds) from v$filestat match 'physical reads' from v$sysstat?

Tom Kyte
March 19, 2009 - 10:26 am UTC

one is blocks, the other is IO requests (a single IO request can be for more than one block)


ops$tkyte%ORA10GR2> create table t as select * from all_objects;

Table created.

ops$tkyte%ORA10GR2> alter system flush buffer_cache;

System altered.

ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> connect /
Connected.
ops$tkyte%ORA10GR2> column value new_val v
ops$tkyte%ORA10GR2> column phyrds new_val pr
ops$tkyte%ORA10GR2>
ops$tkyte%ORA10GR2> select value, phyrds
  2    from (select value
  3            from v$sysstat
  4                   where name = 'physical reads'),
  5         (select sum(phyrds) phyrds
  6            from v$filestat  )
  7  /

     VALUE     PHYRDS
---------- ----------
     68155      52978

ops$tkyte%ORA10GR2> select count(distinct dbms_rowid.rowid_block_number(rowid)) from t;

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
                                                689

ops$tkyte%ORA10GR2> select value, value-&v, phyrds, phyrds-&pr
  2    from (select value
  3            from v$sysstat
  4                   where name = 'physical reads'),
  5         (select sum(phyrds) phyrds
  6            from v$filestat  )
  7  /

     VALUE VALUE-68155     PHYRDS PHYRDS-52978
---------- ----------- ---------- ------------
     68849         694      53023           45

Tom.Lai, March 19, 2009 - 10:29 pm UTC

Hi Tom,

Thank you very much!

I have another a question:
Where are the value of "physical reads" in v$sysstat, where are the value of "phyrds" from v$filestat?

From your test result, this mean that the value of "physical reads" in v$sysstat is from database block, the value of "phyrds" from v$filestat is from OS block?

Thanks

Tom Kyte
March 24, 2009 - 10:26 am UTC

... one is blocks, the other is IO requests (a single IO request can be for more than one block) ....


v$sysstat is database blocks

v$filestat is IO requests which could be asking to read 1 to db_file_multiblock_read_count blocks at a time.


v$sysstat might show 64
v$filestate might show 1

that would be a single IO request for 64 blocks in a multi-block read.

IO latency,

A reader, May 26, 2010 - 10:48 am UTC

Hi Tom,

If we want to measure the IO latency (how many milliseconds to extract block from disk), which view we need to use?

I am assuming dba_hist_sysstat gives how many blocks were extracted per second but it doesn't say what was the time taken to extract that block from disk to memory.

Thanks,


Tom Kyte
May 26, 2010 - 11:09 am UTC

it would show up as wait time

look at DBA_HIST_EVENT_HISTOGRAM for wait classes system i/o and user i/o

IO latency,

A reader, May 26, 2010 - 11:15 am UTC

Hi,

I don't see DBA_HIST_EVENT_HISTOGRAM. I am an 10.2.0.4

Thanks,

Tom Kyte
May 26, 2010 - 11:51 am UTC

DBA_HIST_BG_EVENT_SUMMARY
DBA_HIST_FILEMETRIC_HISTORY
DBA_HIST_FILESTATXS

have read/write times as well.

IO

A reader, May 26, 2010 - 11:49 am UTC

I found the original table you were referring to.

It is DBA_HIST_BG_EVENT_SUMMARY. It has EVENT_NAME, TOTAL_WAITS,TIME_WAITED_MICRO columns which I can use for IO related wait events.

Thanks,

Tom Kyte
May 26, 2010 - 11:52 am UTC

we tied :)

Timed Statistics

DJB, May 27, 2010 - 6:09 am UTC

Tom,

Do timed statistics have to be switched on for these views to have data ?

DBA_HIST_BG_EVENT_SUMMARY
DBA_HIST_FILEMETRIC_HISTORY
DBA_HIST_FILESTATXS

Thanks
Tom Kyte
May 27, 2010 - 7:35 am UTC

if you don't have timed statistics, you are basically not able to capture ANY meaningful metrics.

There should be approximately zero databases with time statistics set to off.


You cannot get any timing information (basically the one thing that counts) unless you let us time things.

Timed Stats

DJB, May 27, 2010 - 8:13 am UTC

'if you don't have timed statistics, you are basically not able to capture ANY meaningful metrics.'

Of course - but that wouldn't apply when using Statspack, would it ?
Tom Kyte
May 27, 2010 - 10:26 am UTC

of course it would. why wouldn't it.

All statspack does is... well, what AWR does.

when you snap, we copy the v$ tables which have counters and timers.

If you have no timers, it is useless.

Oops !

DJB, May 28, 2010 - 6:23 am UTC

I thought that meant that none of my statspack information was useful, but clearly not:-

'For better performance analysis, set the initialization parameter TIMED_STATISTICS to TRUE. Statspack will then include important timing information in the data it collects. You can change the TIMED_STATISTICS parameter dynamically by using the ALTER SYSTEM statement. Timing data is important and is usually required by Oracle support to diagnose performance problems.'

From 9i documentation but still current I would imagine. As an aside there have been some references suggesting that Statspack is becoming obsolete. Is that correct ?
Tom Kyte
May 28, 2010 - 8:32 am UTC

It is my opinion that metrics without timing information is useless - I won't even look at it.


Say you waited 1,000,000 times for db file sequential read
Say you waited 5,000 times for enqueue waits


If you have no timed statistics, that is all you'll see - how many times. Now, which of those two might be a problem??

It is a trick question because no matter what you say, I'll give you timing information that proves you wrong.

You could say "both are" - and be wrong.
You could say "neither is" - and be wrong.
You could say "io is a prob, enqueue is not" - and be wrong.
You could say "io is OK, enqueue is problem" - and be wrong.

Then again, you could be right - but you'll never ever know.


Having a count is useless, less than useless actually because many people would be seriously mislead by it. In the above - everyone would get excited about IO and concentrate on that.


But not realizing that the time spent waiting on IO was 60 seconds in total, and the time spent waiting on enqueues was 60 minutes - they would be wasting everyone's time.

Silly me

DJB, June 02, 2010 - 4:16 am UTC

'I thought that meant that none of my statspack information was useful'

It's been that lomng since I set this up (while it seems that long anyway) I'd forgotten that I had set timed statistics to true as part of the set up. Still what of the suggestion that Statspack is obsolete ?
Tom Kyte
June 08, 2010 - 9:38 am UTC

... till
what of the suggestion that Statspack is obsolete ? ...

where is that suggestion?


If you don't have the EM tuning/diagnostic packs, statspack is still pretty "state of the art"

Staspack Obsolete

DJB, June 09, 2010 - 3:30 am UTC

I'll find the reference.

Obsolete ?

DJB, June 09, 2010 - 9:14 am UTC

This is the web page in question - or at least one that suggests it may be outdated:-

http://www.adp-gmbh.ch/ora/misc/10g.html

'statspack will be somewhat obsolete with 10g although it is still around. It's functionality is "taken over" by the Oracle kernel and the results can be made visible with enterprise manager (EM).'

You clearly disagree.

Tom Kyte
June 10, 2010 - 11:59 am UTC

that isn't an oracle page at all. It would not be a valid reference to use for obsoletion of stuff...

If you have AWR, I would say yes, statspack is somewhat obsolete for you - however, if you don't, statspack is rather state of the art.

Statspack

DJB, June 11, 2010 - 4:01 am UTC

Apologies - I should have made it clear it wasn't Oracle that was suggesting this but an opinion that had been expressed by a number of users, including the link I posted.
Tom Kyte
June 11, 2010 - 7:34 am UTC

opinions are a dime a dozen, facts are priceless. Just a thought.

the difference between opinion and rumor is a fine line :)

statspack/awr differences

tortureduck, June 11, 2010 - 6:35 pm UTC

I recherched some weeks ago for the awr/statspack differences, here my insights
.) you dont need to setup awr its installed by default (tested with an ee install + db-console)
.) statspack needs some scripts to run and a schema, pretty easy to setup by the way
.) there were some differences in the output, but not so important for me to say awr is the "one and only"
.) license fees.... you need to pay for awr (diagnostic pack), statspack is included with de db-license

Tom Kyte
June 22, 2010 - 7:48 am UTC

and there are more, AWR manages history - purges efficiently, statspack does not.

awr has lots of user interface components built on it, statspack does not.

ASH goes way beyond what statspack can even consider doing, it is almost as good as a tkprof - but in some respects even better (since you don't have to enable it, it has a history as well - you can use it at the drop of a hat)

and so on.

They are what they are.

there's a better method in 11gR2

Pasko, June 17, 2010 - 1:25 pm UTC

Hi Tom,

what do you think about this Method of finding Physical Reads/Writes Per Second in 11g?

http://arup.blogspot.com/2008/08/resource-manager-io-calibration-in-11g.html

Regards,

Pasko
Tom Kyte
June 22, 2010 - 12:37 pm UTC

absolutely

Awr File IO stats - multi block reads

Erman Arslan, October 12, 2010 - 9:08 am UTC

Hi,

I m investigating disk performance on an ERP system, which mades a lot of full table scans.db_multi_block_read_count=8.
According to the Awr report,Av Rd(ms) reaches 24.33. But when I look at the stats at Storage level, I see max 4 ms(average disk round-trip time per 4K block) for a disk in the aggregate. Db has a 8k db_block_size, so 4msx2=8 ms must be seen from Oracle Awr , but it s 24.33 ms.
I suspect db_multi_block_reads are also added to the Avr Rd (ms) calculations on Awr , and that's why I see this difference. Is that true?

Awr and Storage stats are collected for the same time interval.

Tom Kyte
October 12, 2010 - 10:26 am UTC

there is also the transfer time to be considered - our IO time is:

start timer
issue read to OS
stop timer

they (the SAN) kick in inside of "issue read to OS" somewhere. There could be a long wait before you get to the SAN that we count and they do not.

right after the av rd(ms) - you should see av blks/rd.... that'll tell you what sort of IO is taking place on the datafiles - if the average blocks/rd > 1 - multiblock IO is taking place at some level.





Erman, October 12, 2010 - 12:02 pm UTC

Thanks for your response..
blk/rd = 1 , so there was no multiblock IO, right.
If there would be some multiblock IO, the average time for a read would be calculated as if it s like a single IO? So will it increase the average time for a single read operation in AWR? A read considered as a whole , doesnt matter Multiblock or a single block in Awr IO Stats avr rd/ms column?

And how can I find the issue if it s OS related?
Can you give more details about your last update..
There are multipath > /dev device > hba driver > fiber > San
So for example from the server with local sqlplus , I will flush the buffer cache, issue a query statement with autotrace or set timer on, query should be chosen to return the rows only from the suspected datafile.
At the same i ll start collecting stats on Storage and start iostat. Then compare the sqlplus elapsed time,iostat and storage IO time. Would that be a good test?













Tom Kyte
October 12, 2010 - 12:15 pm UTC

we issue an IO for 1 or more blocks, we time it, that is an IO to us.

... I will flush the buffer
cache, ...

which does nothing if you are using a cooked file system (almost nothing) since the OS would have it cached. what type of file system does this appear to be to the database.

Erman, October 12, 2010 - 2:51 pm UTC

Thank you for your responses. I have some more questions..

1) So this IO section of Awr, may be pointing to the IO bottlenecks on OS, Storage and etc.. but maybe ?
2) Fragmentation , chained rows ,multiblock reads, Lgwr works , Archiver works, Temp Reads/writes will increase those Averages in Awr IO stats , is it true?
3) If I see a file( always the same file) in IO stats section of AWR report with high IO latency, Should I check the database objects too? ( tables, indexes etc..)? For Ex: table fragmentation, index clustering and etc.. ?
4) In this scneario , there are different dbf 's present on the same mount point. In that disk/mount point there are only oracle dbf's. But only dbf X has high Av Read ms value, altough the IO per second values of the other dbf are much higher than its.
5)There is another column named Av Reads/s in Awr Report-Io stat section , value is Zero. Is it because its floor value is taken?

Thanks & Regards



Tom Kyte
October 12, 2010 - 3:17 pm UTC

1) sounds like it. yes.

2) fragmentation doesn't exist - unless you define it. With single block IO's (indexed reads to tables - the data by definition is already scattered all over the place - it cannot be fragmented.

Chained rows would show up as a statistic, if they were migrated rows - they just mean one extra IO per chained row count. If that number isn't large, they are not a problem.

multiblock reads - you said you were not doing any.

lgwr would only increase the IO statistics against your data file if you have serious contention between them. same with arch. Typically - those files are on different devices altogether.

3) given your situation - single block IO - fragmentation doesn't exist. Clustering (the degree to which the data in the table is sorted by some index key that is used for LARGE range scans) could affect your physical IOs.

4&5) show us the numbers in context - format nicely so it doesn't wrap if you don't mind


erman, October 13, 2010 - 1:18 am UTC

I spitted IO stats rows for a better context..


Name Reads Av Reads/s Av Rd(ms) Av Blks/Rd
/u2/dbf1 3,066 0 29,75 1
/u2/dbf2 4,834 0 28,13 1
/u2/dbf3 18,126 1 16,15 2
/u2/dbf4 1,154 0 16,73 1
/u2/dbf5 568 0 14,07 1
/u2/dbf6 426,186 15 10,51 1,65
/u2/dbf7 221,817 8 10,86 1,62
/u2/dbf8 290,435 10 11,3 1,57
/u2/dbf9 310,714 11 11,73 1,58
/u2/dbf10 12,084 0 12,51 2,73
/u2/dbf11 5,651 0 17,59 2,03
/u2/dbf12 6,381 0 13,68 3,09
/u2/dbf13 156,848 5 5,07 3,28
/u2/dbf14 129,464 4 5,3 3,76
/u2/dbf15 120,402 4 5,14 4,01
/u2/dbf16 183,435 6 5,54 4,23
/u2/dbf17 175,021 6 6,16 4,42
/u1/dbf1 109 0 7,43 1
/u1/dbf2 157 0 5,1 1
/u1/dbf3 2,625 0 15,24 1

Tom Kyte
October 13, 2010 - 7:00 am UTC

how long was the report for - just looks like it was for a long time and given the really really small IO counts you have here - they are less than 1 read per second on average. Hard to tell out of context.

erman, October 13, 2010 - 1:19 am UTC

Write statistics ( continued rows..)

Name Writes Buffer Waits Av Buf Wt(ms)
/u2/dbf1 5,896 131 45,04
/u2/dbf2 12,448 4 2,5
/u2/dbf3 7,681 919 7,91
/u2/dbf4 1,323 14 30,71
/u2/dbf5 430 39 35,13
/u2/dbf6 16,966 29,377 8,29
/u2/dbf7 5,712 9,748 9,61
/u2/dbf8 5,678 14,389 9,7
/u2/dbf9 7,271 17,85 8,37
/u2/dbf10 2,804 24 11,67
/u2/dbf11 1,243 51 7,06
/u2/dbf12 3,526 13 7,69
/u2/dbf13 2,12 6,989 2,8
/u2/dbf14 1,073 7,395 3,1
/u2/dbf15 2,632 7,009 2,67
/u2/dbf16 6,899 12,27 2,25
/u2/dbf17 10,028 12,2 2,37
/u1/dbf1 86 0 0
/u1/dbf2 86 0 0
/u1/dbf3 86 621 7,79

erman, October 13, 2010 - 1:42 am UTC

Thanks you for your responses..

Given:
/u1/db1 and /u2/db2 are in the same tablespace .( Apps_ts_archive), this tablespace has only two dbfs.(dbf1 and dbf2)
db3 for instance is in another tablspace and.. tablespace definitions are the same...
/u2/dbf1 has 3066 read , with av 29,75 ms ,no multiblock_read.
/u2/dbf3 has 18126 reads , with av 16,15 ms, avg blks/rd=2..

1)So the difference in av Rd (ms) here, is it because of the multiblock_Reads on dbf3 (maybe )?
2)Dbf1 reads blocks scattered ( no multiblock), and dbf3 (because of multiblock reads) does read like sequential read, continous?

3)What is your opinion for this IO stat? what should be checked to find out the reason for the slow reads on /u2/dbf1 and /u2/db2 ..



Tom Kyte
October 13, 2010 - 7:04 am UTC

the number of IO's is so tiny here - teeny tiny. These numbers do not even seem relevant in the grand scheme of things do they???


erman, October 13, 2010 - 1:59 am UTC

there are no redolog ,archive destination , or tempfile on /u2 mount point.

erman, October 13, 2010 - 8:02 am UTC

Awr report was produced for following scope..8 hours..
Begin Snap: 18327 12-Oct-10 09:00:25 567 83.1
End Snap: 18336 12-Oct-10 17:00:03 825 97.0

Yes, there are less than 1 reads per second.
I just wonder why such a long wait for /u2/dbf1 (av Reads/s 29.75)


For,
3066 (reads) x 1 (avg blocks per read) x 8(blocksize)= 24528K data read,
29.75 x 3066= 91213 ms, so 91 seconds passed in general.

Normally , for /u2 mount point , from the shell 25000K IO ( read from a file) can be done in 6 seconds.




Tom Kyte
October 13, 2010 - 8:25 am UTC

One could even assume that the disks were so idle here that you waited for them to spin up after going to sleep - there is so little activity here. Remember - averages for small sets of observations can tend to be very very very skewed. It takes just a couple of exceptions to make the average go very high.

erman, October 13, 2010 - 2:46 pm UTC

Thanks Tom..

Then, I will wait for at least two days for more IO , and then extend the scope of Awr report.
One last thing.. Can you list the fact that can effect Physical IO that Awr IO section from database tier to Storage.. For ex, I couldnt understand how chained rows effects AV rd(ms) metric on Awr?

Regards..




Tom Kyte
October 15, 2010 - 7:34 am UTC

... and then extend the scope
of Awr report. ..

it will become even LESS useful than it is now.

Look - your system does so little work - it is as good as tuned. Unless and until you have an identified performance issue - let it alone, it is fine. This database isn't doing very much - it isn't worth the effort.

If you DO have a performance issue - then it will be very very very focused and occurs in a very very very small timeframe. You would want to use APPLICATION tracing (ash, sql_trace) not AWR to diagnose it.



chained rows won't affect average read times - it might CAUSE an extra IO, but it would just be an IO like any other IO - nothing special or hard (just "more")

re: DBMS_RESOURCE_MANAGER.CALIBRATE_IO

Chuck, October 14, 2010 - 10:59 am UTC

Regarding DBMS_RESOURCE_MANAGER.CALIBRATE_IO.

We are on 11gR2 and the io calibration documentation says to
set filesystemio_options=SETALL (among other things)
to turn on async io

The page mentioned earlier in this thread says to set filesystemio_options=ASYNC

Other places have said it is set automatically on install according to OS. It was null in our case.

Given we are 64bit 11gR2 on 64bit OEL5
40 gb database
24 gb ram
18 gb allocated to the db with the db handling how to use it

controller 1 has / and /u01 on 4 disk raid 10 for os and oracle software
and /u03 on 15 disk raid 6 for flashback area plus 1 hot spare.
controller 2 has /u02 on an 8 disk raid 10 for the db data files
/u04 points to a windows file server share
/u05 points to an nfs share on our backup server

We do 2 separate rman backups nightly, one to the flash area and one to the nfs share.

My questions are
1, how are SETALL and ASYNC different?
2, does filesystemio_options only apply to the datafiles?
3, are there any dangers to setting filesystemio_options?
(it did not blow up my test db, but that has a different io setup)

Thanks




erm

A reader, October 15, 2010 - 8:32 am UTC

In Awr Report, IO stats section.

If Av Rd(ms) = 10 ms and AV Block/Rd = 2
so there are Multiblock IO's.

Can I say, Db does 2 block read in 10 ms on average for that file? so 1 block read can be around 5 ms?

Thanks & Regards


Tom Kyte
October 15, 2010 - 9:22 am UTC

you can say whatever you would like regarding averages - they are averages.


But you cannot say it would talk half as long to read one block. a large % of the work would be 'overhead' that both read sizes would have to perform (that is, a large chunk of time is spent just getting ready to do IO and finishing doing the IO - both would still have that).

does it take you twice as long to pick up two pieces of paper from the floor as one? Much of your time is spent bending down and standing backup - that would be constant for both 'read sizes'

erm

A reader, October 16, 2010 - 2:55 am UTC

yes paper example clearly shows the process.
But when I think about the IO device, OS buffer sizes,page sizes and etc..so not twice, but there should be an overhead for multiple block read that we cant predict right? (maybe 1,01 or maybe 1,2..) even if they are settled sequentially.

Regards..


Tom Kyte
October 25, 2010 - 7:26 am UTC

... that we cant
predict right? ...

did you mean to say "can predict" - you could 'predict it' - by setting up controlled tests to simulate pure single block IO's from your devices, with your software, on your machine, with your cables and so on - and comparing that to multiblock IO's on the same. It would not be a trivial test - but you could get some insight into what the performance would be.

We try to measure it with system statistics to some degree - to get the true cost of single vs multi block IO.

convert HSEC to SEC

Sita, September 24, 2011 - 2:52 pm UTC

Hi Tom

DBA_HIST_FILESTATXS. READTIM and WRITETIM are in hsec. How to convert to second ?

Thanks
Tom Kyte
September 25, 2011 - 11:34 am UTC

ummm, divide?

if you have hundredths of seconds and want seconds, just divide by 100...

Thanks

Sita, September 25, 2011 - 11:42 am UTC


Followup question on DBA_HIST_FILESTATXS

Matthew McPeak, September 29, 2011 - 4:39 pm UTC

I have a query that shows history from DBA_HIST_FILESTATXS. In the results, I see snaps where the average time per read jumps (say from 8 to 58) but the number of reads and writes in the same period remain constant or even decrease.

I'll post the query, but why would reads be taking so much longer if the activity level didn't increase? Some theories:

1) Skew in the data -- AWR snaps are an hour apart and all the activity is bunched together into just a few minutes maybe?
2) SAN caching -- maybe the SAN cache is failing us and we're merely jumping up to the "true" cost of read?
3) Something else loading the disks? (UNIX admins swear this is impossible)

Any thoughts or insights into where to look next would be greatly appreciated!! Thanks in advance!

Matt

Here is the query:

-- Show IO activity for each disk over a given interval
-- Shows the # of I/O requests and the average service time
-- Includes stats for both permanent and temporary files
SELECT   
         to_char(TRUNC (begin_interval_time, 'MI'),'FMDay DD-MON-YYYY HHPM') snap_start_time,
         disk,
         phyrds_delta + phywrts_delta io_delta,
         phyrds_delta READS,
         phywrts_delta wrts,
         NVL (ROUND (1000 * readtime_delta / NULLIF (phyrds_delta, 0), 2), 0) time_per_read_ms,
         NVL (ROUND (1000 * writetime_delta / NULLIF (phywrts_delta, 0), 2), 0) time_per_write_ms,
         NVL (ROUND (1000 * wait_time_delta / NULLIF (phyrds_delta, 0), 2), 0) waits_per_read_ms
FROM     (SELECT snap_id,
                 begin_interval_time,
                 disk,
                 phyrds - LAG (phyrds) OVER (PARTITION BY disk ORDER BY snap_id) phyrds_delta,
                 readtime - LAG (readtime) OVER (PARTITION BY disk ORDER BY snap_id) readtime_delta,
                 phywrts - LAG (phywrts) OVER (PARTITION BY disk ORDER BY snap_id) phywrts_delta,
                 writetime - LAG (writetime) OVER (PARTITION BY disk ORDER BY snap_id) writetime_delta,
                 wait_count - LAG (wait_count) OVER (PARTITION BY disk ORDER BY snap_id) wait_count_delta,
                 wait_time - LAG (wait_time) OVER (PARTITION BY disk ORDER BY snap_id) wait_time_delta
          FROM   (SELECT   fs.snap_id,
                           snap.begin_interval_time,
                           SUBSTR (fs.filename || '/', 1, INSTR (fs.filename || '/', '/', 1, 2) - 1) disk,
                           SUM (fs.readtim) / 100 readtime,
                           SUM (fs.writetim) / 100 writetime,
                           SUM (fs.phyrds) phyrds,
                           SUM (fs.phywrts) phywrts,
                           SUM (fs.wait_count) wait_count,
                           SUM (fs.TIME) / 100 wait_time
                  FROM     (SELECT *
                            FROM   dba_hist_filestatxs
                            UNION ALL
                            SELECT *
                            FROM   dba_hist_tempstatxs) fs,
                           dba_hist_snapshot snap
                  WHERE    snap.snap_id = fs.snap_id
                  AND      snap.begin_interval_time BETWEEN TO_DATE ('26-SEP-2011 11:00:00', 'DD-MON-YYYY HH24:MI:SS')
                                                        AND TO_DATE ('30-SEP-2011 19:00:00', 'DD-MON-YYYY HH24:MI:SS')
                  GROUP BY fs.snap_id,
                           snap.begin_interval_time,
                           SUBSTR (fs.filename || '/', 1, INSTR (fs.filename || '/', '/', 1, 2) - 1) 
                           ))
WHERE    1 = 1
ORDER BY 
         snap_id ASC, disk

Tom Kyte
September 30, 2011 - 6:07 pm UTC

I'll post the query, but why would reads be taking so much longer if the activity level didn't increase? Some theories:


a) file system cache, if you have a buffered file system that you are using - it could be that your physical IO's are sometimes not really physical IO's and sometime are (search this site for secondary SGA)

b) you are using a shared resource, sometimes other people are using this same resource on other machines (your SAN) and sometimes not

c) you are doing the same number of IO's but to a smaller set of devices (more conention)

d) you are using a shared resource - the network. Sometimes other people are using this same resource and sometimes not.

I just started looking at your list now...

1, 2, 3 - yes.

How can the admins swear this is impossible when they bought a SAN just to be able to do that? A SAN is a shared resource, unless you are the only user of the SAN - ever - it is not only possible but highly probable.


It seems to be the writes

Matthew McPeak, October 03, 2011 - 8:58 am UTC

Thank you!

One other thing I noticed... the jumps in ms/read seem to be strongly correlated with the number of *writes* going on. So, 2 million I/Os split 80/20 reads-to-writes and we're fine. 2 million I/Os split 50-50 and suddenly the response time for *reads* tanks.

Is this because writes don't benefit from the file system cache? Or is there something about writes that make them impose more of a load on disks?

Also, do you think it would be a signficant overhead to have AWR generate snaps every 15 minutes (currently at 60 minutes)?

Thanks again for your guidance!


Tom Kyte
October 03, 2011 - 10:35 am UTC

the writes would be flooding the file system cache (and they are write through, they are updating the cache AND they are doing a physical IO, they are true physical IO's). They would/could be wiping out data in the file system cache that your reads where counting on being there - turning their physical IO's into "true" physical IO's which could be many times slower.


It would not be a significant overhead (short of quadrupled disk storage) to snap every 15 minutes for some period of time to collect more fine grained information - HOWEVER - what you want is probably already in the ASH repository. Zero in one a few sessions and watch their IO response times/waits.

Perfect!

Matthew McPeak, October 03, 2011 - 1:22 pm UTC

That's just the explanation I was looking for. Thanks again!

To Plot Physical Read / WRITES DATA

INDRANIL DAS, August 30, 2012 - 6:17 am UTC

Hi Tom,
I wanted to plot physical read and writes, I was thinking like below-
-------------------------
# Plot Physical Read DATA [ for (current time - 24hrs) i.e 24 hrs back from now]
--------------------------
select begin_interval_time,sum(phyrds)
from(
select begin_interval_time, phyrds
from dba_hist_filestatxs
natural join
dba_hist_snapshot
where begin_interval_time > sysdate -1)
group by begin_interval_time
order by begin_interval_time desc;
--------------------------
# Plot Physical Write [ for (current time - 24hrs) i.e 24 hrs back from now]
--------------------------
select begin_interval_time,sum(phywrts)
from(
select begin_interval_time,phywrts
from dba_hist_filestatxs
natural join
dba_hist_snapshot
where begin_interval_time > sysdate -1)
group by begin_interval_time
order by begin_interval_time desc;
########################################
Am I in the right track Tom ? Thanks for your help in advance.
Regards INDRANIL.INDIA.

IO - percentage of CPU versus milliseconds,

A reader, August 16, 2013 - 2:45 pm UTC

In order to measure the IO bottleneck, I have seen some databases measure it in terms of percentage of CPU and some others (like Oracle) measuring in milliseconds. What is the difference between these two interpretations?

I got an explanation for IO measurement in terms of CPU %

I/O wait measurement is the canary for an I/O bottleneck. I/O Wait is the percentage of time your processors are waiting on the disk.

The disk is being accessed while the rows are retrieved. During this time the processor is idle. It’s waiting on the disk.

If the I/O wait percentage is greater than (1/# of CPU cores) then your CPUs are waiting a significant amount of time for the disk subsystem to catch up.

Example:
The process takes 1 second to get 10,000 rows and perform operations on them. Let’s say the disk access took 700 ms, so I/O wait is 70%. In a fully utilized system, if we have 8 CPU’s, then I/O wait % should be less than 12.5% (1/8).

I didn't understand that. What is your opinion?

Thanks,



Tom Kyte
August 28, 2013 - 5:15 pm UTC

... I/O wait measurement is the canary for an I/O bottleneck. I/O Wait is the
percentage of time your processors are waiting on the disk. ...

but processors don't wait on IO, programs wait for it. the processor is free to go off and work doing something else while the IO is taking place.

You can be both 100% cpu bound and IO bound at the same time.

the processor would only be idle in a single user system (and even then, it wouldn't really be idle).

that example is bogus. What if you have 8 cpu's but 64 active sessions? each needs 300ms of cpu and 700ms of IO wait(which means they each do about 140 physical IO's).

all 64 need a total of 2 seconds of cpu time and the entire thing could finish in less then 1 second (since while some are waiting on IO others are burning cpu)...


what you want to do is look at a transaction and ask "what is this transaction spending most of its time doing".

In this case, it spends 0.3 seconds using CPU and 0.7 seconds waiting for IO.

Now, if you wanted this transaction to execute faster - where would you get your biggest bang for the buck? probably in finding some way to reduce the number of physical IO's it performs by using partitioning, clustering, IOT's, a table reorg, a better index (one that avoids the query hitting the table for example). If we reduced the IO's - we'd probably also reduce the cpu it used to (it would process less blocks probably)


I don't understand the logic behind an "IO %"

Check reads vs waits using AWR views

Basil, September 26, 2014 - 2:37 pm UTC

Hi,

I'm trying to compare the number of read IOs with number of read waits. Here is my understanding:
- the dba_hist_filestatxs.phyrds => number of physical read IOs (one IO could be one or more blocks)
- the dba_hist_system_event.total_waits => number of waits for different types of IO reads ( scatered, sequential, ...)
- one IO in the first view is always one wait in second view

I'm trying to compare the IOs and waits into an AWR interval and the IOs are always much higher than waits.
Do I miss something ?

select snap_id,name,aa
from
(
SELECT snap_id, 'READSIO' AS name, SUM (phyrds) aa
FROM (SELECT * FROM dba_hist_filestatxs
UNION ALL SELECT * FROM dba_hist_tempstatxs
)
GROUP BY snap_id
UNION ALL
SELECT snap_id, 'READWAITS' AS name, SUM (total_waits) aa
FROM dba_hist_system_event
WHERE event_name IN ('db file sequential read',
'db file scattered read',
'db file parallel read',
'direct path read',
'direct path read (lob) ',
'direct path read temp',
--'external table read',
--'control file sequential read',
--'log file sequential read'
)
GROUP BY snap_id
) order by snap_id

(I always substract values from consecutive snapshots)

Check reads vs waits using AWR views - update

Basil, September 30, 2014 - 4:22 pm UTC

Hi ,

I've changed the SQL for being able to compare values - the differences go up to 50%. Any hint? Thank you in advance.

select rio.snap_id, rio.io_end-rio.io_begin as read_ios,wio.io_waits_end-wio.io_waits_begin as wait_ios,100*(rio.io_end-rio.io_begin-io_waits_end+wio.io_waits_begin)/(rio.io_end-rio.io_begin) diff_percent
from
(select snap_id, aa IO_end,
lag(aa,1) over (order by snap_id) IO_begin
from
(SELECT snap_id,SUM (phyrds) aa
FROM (SELECT * FROM dba_hist_filestatxs UNION ALL SELECT * FROM dba_hist_tempstatxs)
GROUP BY snap_id)) rio
,
(select snap_id,aa IO_waits_end,
lag(aa,1) over (order by snap_id) IO_waits_begin
from
(SELECT snap_id, SUM (total_waits) aa
FROM dba_hist_system_event
WHERE event_name IN ('db file sequential read',
'db file scattered read',
'db file parallel read',
'direct path read',
'direct path read (lob) ',
'direct path read temp'
--,'external table read',
--'control file sequential read',
--'log file sequential read'
)
GROUP BY snap_id)) wio
where rio.snap_id=wio.snap_id

SNAP_ID READ_IOS WAIT_IOS DIFF_PERCENT
81473 609799 608568 0,20
81474 588877 549580 6,67
81475 662137 620896 6,22
81476 532453 482074 9,46
81477 655479 591736 9,72
81478 614806 591067 3,86
81479 1338027 721395 46,08
81480 1772508 1040298 41,30
81481 1706805 996522 41,61