PIOs, Again????
Arup Nanda, January 20, 2003 - 5:11 pm UTC
Tom,
I agree with you that LIOs are more important and watching them is probably the sanest thing to do. This was necessary to convince the management with fixed ideas the futility a tuning excercise which I considered futile. Anyway... that's another issue.
I am curious how the PIOs are same in subsequent calls to select count(object_id) from big_table. In the first time it was called after the flushcache program, I understand there are some physical IOs. However, in the next call, shouldn't the blocks be up on the db buffer cache? Why another set of PIOs?
Thanks.
Arup
January 20, 2003 - 5:15 pm UTC
my laptop has an insufficient buffer cache to hold it all. 13k blocks flushed it out.
Thanks for the question regarding "How to flush the db buffer cache", version 9.2.0.2
Greg Johnson, January 21, 2003 - 1:42 am UTC
Tom, you mentioned in your follow up to this question, that you had to perform the same number of PIO's because your Buffer Cache was 13K smaller than the table. It's my understanding that for a full table scan on an Uncached table, Oracle reads blocks into the LRU end of the buffer cache at the rate of DB_MULTI_BLOCK_READ_COUNT + 1 Blocks. Therefore, in this case, EVERY time you re-ran the count, you'd have to perform the same number of PIO's to get the same result.
Could the difference in time be related to loading the blocks into the HD's cache?
January 21, 2003 - 9:56 am UTC
13k was the number of blocks. Not that the buffer cache was 13k smaller -- it was that I had 13,000 blocks to read.
by reading 13,000 blocks -- i totally cycled them all through the buffer cache and back out again. by the time I read the last block from that table -- the first block was no longer in my cache. that is what I meant.
Yes, each and every time I run this, I'll do every block as a PIO.
The difference in time (as I was trying to point out) was exactly that. I flushed the OS buffer cache by allocating and touching every byte of ram on my disk. I was trying to point out that on most systems, simply flushing the buffer cache is like "yeah, so what". It doesn't help you tune -- it is misleading -- it is meaningless.
Reducing logical IOs isn't always desirable
Garry, January 21, 2003 - 6:47 am UTC
Hi Tom.
To tune your goal is "reduce the LIO's" -- for the PIO's will take care of themselves.
You can further qualify this by saying that "to tune QUERIES THAT USE THE BUFFER CACHE your goal is to reduce LIOs". In some cases, far greater performance gains can be made by cutting out the buffer cache completely and doing full table scans in parallel.
January 21, 2003 - 10:01 am UTC
but won't that reduce LIO's :)
Good Testing Advice
Robert, January 21, 2003 - 10:00 am UTC
Tom,
I had always thought that "cw" method would be the best. Thanks for pointing out and proving it is not!
Thanks,
Robert.
January 21, 2003 - 10:19 am UTC
Other CW's (that are all 100% wrong)
o indexes mean fast=true, if the optimizer doesn't use them, the optimizer has
a bug
o indexes need to be rebuilt frequently or on a scheduled basis
o rebuilt indexes are faster, smaller and better
o segments should have few extents -- double digit extents are really bad for
performance
o all cursors in code should be explicit cursors, implicit cursors are slow and
bad
o select count(1) is better then select count(*)
o procedural code is faster then doing it in SQL
o adding more CPU will make our systems faster
o index space is not reused
o nologging stops all redo log from being generated on that segment
o the most selective fiedls must be first in an index
o you should commit frequently to save resources and time
o a cold backup is better then a hot backup
What is the meaning of 'CW'
A reader, January 21, 2003 - 11:31 am UTC
What does 'CW' mean? What is its expansion
January 21, 2003 - 12:08 pm UTC
...
anyway, since no one listens to this advice anyway -- i suspect because it flies
in the face of "CW" conventional wisdom -- I'll tell you how.
.....
more examples
A reader, January 21, 2003 - 2:43 pm UTC
Hi Tom, Can you show us the following two CW is wrong in your usual way?
o adding more CPU will make our systems faster
January 21, 2003 - 3:03 pm UTC
You have a system.
It is "running slow".
80% of the users are OLTP. OLTP is CPU de-intensive, IO intensive. The OLTP users are constrained by the amount of IO they can get.
20% of the users are running reports. The processing of the data for the reports is sucking the life out of the CPU. These reports do a lot of IO but they are constrained by the CPU they can grab (so the amount of IO they do is gated by the amount of CPU they have).
You look at the system and say "oh my, CPU is at or near 100% utilized, we need more".
So, you buy more CPUs put them in. Now everything goes really slow for the OLTP guys (the ones that actually make money booking orders). Why? Cause the reports that were CPU bound now have lots of CPU and have DOUBLED the amount of IO they were doing -- meaning the IO bottleneck that was the gating factor for the transactional guys is even worse then before.
some bottlenecks are good -- we use them to control access to resources. For example think "connection pools in an app server" -- or think "transaction processing monitors -- tpms like tuxedo or CICS" -- or thing "shared server -- aka MTS under Oracle". All of those artificially constrain a resource in order to achieve the best overall performance.
What we need to do is identify what bottleneck we need to remove in order to increase performance -- throwing more hardware at a problem does not assure success, in fact the opposite can be true.
Reader
A reader, January 22, 2003 - 12:03 am UTC
Tom,
In your experiment, PIO is almost
constant for FTS everytime you run the query. Is this not
true that when Oracle reads multiblock_read_count
number of blocks from disk, if Oracle finds
a block that is already in the cache, it will
skip that block , and not read it from disk.
Has the algorithm been changed
Thanks
January 22, 2003 - 8:07 am UTC
if it reads mbrc (multi block read count) from disk -- it reads mbrc from disk -- the IO request is "read N blocks from file X starting at block Y". It never had the ability to say "read N blocks from file X starting as block Y but skip the 2cnd and 5th one"
It might do single block io if the data is cached, might blow off the cache and do multi-block io.
Reader
A reader, January 22, 2003 - 10:33 pm UTC
Tom,
This is an excerpt from the article "Predicting Multiblock
Read call size" (Hotsos)
<excerpt>
10. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=71980 p3=9
11. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=71992 p3=4
12. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=71997 p3=16
13. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=72013 p3=4
14. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=72018 p3=3
15. WAIT #47: nam='db file scattered read' ela= 0 p1=12 p2=6857 p3=2
16. WAIT #47: nam='db file sequential read' ela= 0 p1=12 p2=6861 p3=1
<excerpt>
Here each 'db file scattered read' calls fetched 9,4, 16, 4, 32 block at each call.
In your comment you have said
"
might blow off the cache and
do multi-block io.
"
Is this an enhancement in newer version of Oracle.
Thanks
January 23, 2003 - 7:50 am UTC
I don't get the connection between the wait output you put there and your conclusion. please elaborate -- whats your point?
Read
A reader, January 23, 2003 - 8:51 am UTC
Tom,
For the event 'db file scattered read' (FTS) ,Oracle suppose to
read from diak at a rate of batches of multilock_read_count
number of blocks, The wait events was taken from setting
event 10046 to trace the SQL.
For one file (p1 = 12), Oracle starts reading from block# 71980 (p2)
9 blocks (p3) upto block 71988. Then Oracle skips 3 blocks
71989, 71990, 71991. Oracle then starts reading from block# 71992 (p2)
4 blocks (p3). It seems Oracle did not continuously read 16 blocks
(9+3+4) because the block# 71989, 71990, 71991 may already be in
the buffer cache
I do not have access to the article right now, since I am at work
and I do not have password to the hotsos site.
I'll update additional information from the article later on
Thanks
January 23, 2003 - 10:51 am UTC
it is skipping blocks sure in that example by reading "chip chop", but turn on parallel query or any other myriad of features and it won't...
it is still doing multi-block io in that case.
as i said "It might do single block io if the data is cached, might blow off the cache and do multi-block io."
but in my example -- none of the blocks would have been found in the cache since by the time I read the last blocks -- the first blocks were long gone.
Care to share your flushcache.c : )
A reader, January 23, 2003 - 9:22 pm UTC
January 24, 2003 - 7:08 am UTC
all it does it allocate a big chunk of memory and touch it. flushes the file system buffer cache (as well as most of the rest of the operating system!)
it is "slow=true"
#include "stdio.h"
#include "malloc.h"
#include "assert.h"
int main()
{
int meg = 512;
int i;
long size = meg*1024*1024;
char * ptr = malloc( size );
assert( ptr != NULL );
printf( "allocated %dm of ram 0x%X\n", size/1024/1024, ptr );
printf( "touching it all...." ); fflush(stdout);
for( i = 0; i < meg; i++ )
{
memset( ptr+(i*1024*1024), 0, 1024*1024 );
printf( "." ); fflush(stdout);
}
printf( "done...\n" );
return 0;
}
Reader
A reader, January 24, 2003 - 7:54 am UTC
Thanks Very Much
full table scan where size of buffer cache is less than size of the table
Arul, June 19, 2003 - 2:40 pm UTC
Hello Tom,
On our oltp system, a sql query from an application is doing a full table scan on a 60 million row table whose size is ~4000MB.
For a 30 minute interval of statspack, the buffer gets/exec and physical reads/exec is ~270,000 for this query; there are 3 entries for this query, 2 executions per hash_value (not sure why?) -- so a total of 6 executions during a 30 minute interval.
The buffer cache is set at 750MB, there are several other critical applications that use the same oracle instance.
So I am thinking that the server process that does a full table scan, reads all blocks for this table into the buffer cache for each execution. Thats a lot -- 4000MB into a 750MB buffer cache !?!?
Is my understanding correct? If so, is n't this affecting other applications? what happens to the current blocks for other server processes in the cache? how does Oracle handle this? is n't this very very very bad? so far, I have n't heard any complaints from anyone !
I would love to hear your comments, appreciate your time on this one.
(I can tune this query to use indexes, this brings down the execution time from a couple of minutes to sub seconds, as well as the lios and pios)
Thank you.
June 20, 2003 - 3:38 pm UTC
if you can tune this query to be subsecond and bring the LIO's down -- you don't even need to have this conversation!
yes, it is affecting others.
yes, it is used part of (not all of) the buffer cache. It will not flush the entire thing, but it will impact the utilization (big tables that are scanned are buffered in a way that causes those full scan blocks to be aged out much faster then other blocks -- so the full scan tends to age out it's own blocks first, it will not blow out the entire buffer cache)
yes, if you can index it and the index does not materially affect any other component of the system -- that would be the correct thing to do.
Consistent Gets and Physical Reads
Matt, August 21, 2003 - 8:49 am UTC
Tom,
You say we should strive to reduce the LIO - I have a situation where we have a plan generated with a low LIO which performs worse than a plan with higher (about 20times) LIO. The higher elapsed time I guess is due to greater physical reads, generated by the full table scans for the Hash_Joins, compared to the smaller set of unique index lookups generated by NL. What I don't understand is how the hash_join generates a far smaller number of consistent gets, but higher physical reads.
1 SELECT COUNT(*)
2 FROM XX_EXPLOSION_TEMP BeT,
3 Mtl_System_Items_Vl Msi,
4 Mtl_System_Items_Vl MsitoP
5 WHERE Bet.Organization_ID = Msitop.Organization_Id
6 AND Bet.Top_Item_ID = Msitop.Inventory_Item_Id
7 AND BEt.Organization_Id = Msi.Organization_Id
8 AND Bet.Component_Item_Id = Msi.Inventory_Item_Id
9* AND Bet.Group_ID = 999
SQL> set timing on
SQL> set autotrace on
SQL> /
COUNT(*)
----------
29724
Elapsed: 00:00:15.21
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3891 Card=1 Bytes=52
)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=3891 Card=29937 Bytes=1556724)
3 2 HASH JOIN (Cost=3232 Card=29937 Bytes=1197480)
4 3 HASH JOIN (Cost=2579 Card=29937 Bytes=838236)
5 4 HASH JOIN (Cost=1370 Card=29831 Bytes=626451)
6 5 TABLE ACCESS (FULL) OF 'XX_EXPLOSION_TEMP' (Cost
=164 Card=29726 Bytes=416164)
7 5 INDEX (FULL SCAN) OF 'MTL_SYSTEM_ITEMS_B_U1' (UN
IQUE) (Cost=1195 Card=447564 Bytes=3132948)
8 4 INDEX (FULL SCAN) OF 'MTL_SYSTEM_ITEMS_B_U1' (UNIQ
UE) (Cost=1195 Card=447564 Bytes=3132948)
9 3 TABLE ACCESS (FULL) OF 'MTL_SYSTEM_ITEMS_TL' (Cost=6
34 Card=447564 Bytes=5370768)
10 2 TABLE ACCESS (FULL) OF 'MTL_SYSTEM_ITEMS_TL' (Cost=634
Card=447564 Bytes=5370768)
Statistics
----------------------------------------------------------
28 recursive calls
25 db block gets
11894 consistent gets
7430 physical reads
0 redo size
231 bytes sent via SQL*Net to client
354 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
1 rows processed
1 SELECT /*+ Rule */ COUNT(*)
2 FROM XX_EXPLOSION_TEMP BeT,
3 Mtl_System_Items_Vl Msi,
4 Mtl_System_Items_Vl MsitoP
5 WHERE Bet.Organization_ID = Msitop.Organization_Id
6 AND Bet.Top_Item_ID = Msitop.Inventory_Item_Id
7 AND BEt.Organization_Id = Msi.Organization_Id
8 AND Bet.Component_Item_Id = Msi.Inventory_Item_Id
9* AND Bet.Group_ID = :Group_ID
SQL> c/:Group_ID/999
9* AND Bet.Group_ID = 999
SQL> l
1 SELECT /*+ Rule */ COUNT(*)
2 FROM XX_EXPLOSION_TEMP BeT,
3 Mtl_System_Items_Vl Msi,
4 Mtl_System_Items_Vl MsitoP
5 WHERE Bet.Organization_ID = Msitop.Organization_Id
6 AND Bet.Top_Item_ID = Msitop.Inventory_Item_Id
7 AND BEt.Organization_Id = Msi.Organization_Id
8 AND Bet.Component_Item_Id = Msi.Inventory_Item_Id
9* AND Bet.Group_ID = 999
SQL> /
COUNT(*)
----------
29724
Elapsed: 00:00:08.47
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 SORT (AGGREGATE)
2 1 NESTED LOOPS
3 2 NESTED LOOPS
4 3 NESTED LOOPS
5 4 NESTED LOOPS
6 5 TABLE ACCESS (BY INDEX ROWID) OF 'XX_EXPLOSION_T
EMP'
7 6 INDEX (RANGE SCAN) OF 'XX_EXPLOSION_TEMP_N4' (
NON-UNIQUE)
8 5 INDEX (UNIQUE SCAN) OF 'MTL_SYSTEM_ITEMS_B_U1' (
UNIQUE)
9 4 INDEX (UNIQUE SCAN) OF 'MTL_SYSTEM_ITEMS_TL_U1' (U
NIQUE)
10 3 INDEX (UNIQUE SCAN) OF 'MTL_SYSTEM_ITEMS_B_U1' (UNIQ
UE)
11 2 INDEX (UNIQUE SCAN) OF 'MTL_SYSTEM_ITEMS_TL_U1' (UNIQU
E)
Statistics
----------------------------------------------------------
28 recursive calls
0 db block gets
238941 consistent gets
263 physical reads
0 redo size
231 bytes sent via SQL*Net to client
354 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
August 21, 2003 - 6:57 pm UTC
yes, in general -- it is all about LIO.
whats your version
hash area size? or pga_aggregrate_target?
Think I have found my answer
Matt, August 21, 2003 - 9:41 am UTC
To place all of the table blocks in buffer cache
Reader, January 30, 2004 - 7:57 pm UTC
Tom, does select count(*) from table_name place all of the table blocks in the buffer cache? Thanks.
January 30, 2004 - 8:17 pm UTC
yes
no
maybe
sometimes
it depends.....
count(*) might not even hit the table (hope not!)
search for
"long and short"
on this site -- to read about the long and short of tables and how the cacheing of a full scan works.
OK
R.Chacravarthi, January 31, 2004 - 8:51 am UTC
Dear sir,
Regarding flushing buffer cache,I think in oracle 9i you can use
sql>Alter session set events immediate trace name'..'
I forgot the command but you can do
But in oracle 10g simple
sql> alter system flush buffer cache;
Hope this helps
Bye!
Nod from Cary Millsap
R.Chacravarthi, February 21, 2004 - 6:47 am UTC
Dear sir,
Nice to meet you.I said we can flush the cache the with the
command specified over there.But you never responded.I mailed Millsap and he has said that it is correct.But I was
very disturbed when you didn't put a followup.
My mail to him:
Dear Sir,
Well and wish the same from you.Doesn't the command
sql>alter session set events 'immediate trace name
flush cache';
Flushes the buffer cache in oracle 8i or above?Do you have any other way
for this?please do reply.
Bye!
Cary's reply to my mail as follows:
I don't have the resources at hand to check right now, but I believe
this is correct. In 10g, there's a more direct way to do it.
Cary Millsap
Hotsos Enterprises, Ltd.
February 21, 2004 - 11:15 am UTC
why where you disturbed?
I saw no reason to follow up, you provided information -- there you go. why did I need to followup with anything?
You said "here is a command you can use to do this. In 10g it gets easier. hope this helps". what more need be said?
I guess I could have said:
but, as i said above, if you are using "cooked files" without directio as 99.99% of the people out there are, flushing the buffer cache is quite simply "a really silly, misleading thing to do -- you cannot make any judgements -- other than leap to wrong conclusions -- based on it"
so, flush away, but beware, you are mostly just "wasting your time"
Nice
A reader, April 17, 2004 - 6:19 am UTC
Hi Tom,
How to find the size of the buffer cache?One way is
db_block_size * db_block_buffers.
Is there any other way?
Please do reply.
April 17, 2004 - 9:57 am UTC
SQL> show sga
SQL> select * from v$sgastat;
Buffer Cache used
atul, June 07, 2004 - 1:26 am UTC
Hi,
I'm using designer 6i with oracle 8.1.7.
When i use to log in using designer client and brwose to any application of designer its performance is very slow.
At the same time if i see buffer cache use,it shows me following results..
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
select decode( status, 'free', 'free', 'used' ),count(*)
from v$bh group by decode (status, 'free', 'free', 'used' );
DECO COUNT(*)
---- ----------
used 6000
So i'm using all blocks of buffer cache..
My sga size is as follows:
SQL> show sga
Total System Global Area 173625236 bytes
Fixed Size 73620 bytes
Variable Size 119148544 bytes
Database Buffers 49152000 bytes
Redo Buffers 5251072 bytes
so does that the reason for slow performance??
Thanks & Regards,
Atul
June 07, 2004 - 8:20 am UTC
no. it does not.
Buffer Cache...
Atul, June 10, 2004 - 2:40 am UTC
Hi,
I will give you more info..
performace is slow when i connect through designer client..
To Find the bottelneck i fired follwoing queries
++++++++++++++++++++++++++++++++++++++++++++++++++++++
select a.sid, a.value session_cpu, c.physical_reads,
c.consistent_gets,d.event,d.seconds_in_wait
from v$sesstat a,v$statname b, v$sess_io c, v$session_wait d
where a.sid=11
and b.name = 'CPU used by this session'
and a.statistic# = b.statistic#
and a.sid=c.sid
and a.sid=d.sid;
SQL> /
SID SESSION_CPU PHYSICAL_READS CONSISTENT_GETS
---------- ----------- -------------- ---------------
EVENT SECONDS_IN_WAIT
---------------------------------------------------------------- ---------------
11 0 61684 334388
db file scattered read 0
After firing the query many times i can see PHYSICAL_READS count is increasing..
Again i fired buffer cache used query
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
select decode( status, 'free', 'free', 'used' ),
count(*)
from v$bh
group by decode( status, 'free', 'free', 'used' );
And found output which says my all buffer cache is being used.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
To find the exact object for the waits:
select owner,segment_name,segment_type
from (select p1 file#, p2 block# from v$session_wait
where sid = 11
and event in ('buffer busy waits'
,'db file sequential read'
,'db file scattered read'
,'free buffer waits')) b
,dba_extents a
where a.file_id = b.file#
and b.block# between a.block_id and (a.block_id+blocks-1);
I get output as
OWNER
------------------------------
SEGMENT_NAME
--------------------------------
SEGMENT_TYPE
------------------
REPOS_OWNER
I$SDD_MOD
TABLE
its a designer table,where all indexes are in place..
Now could you advice me what problem is??
Thank,
Atul
June 10, 2004 - 7:51 am UTC
No, there are thousands of variables and you seem to believe "it is all IO". maybe yes, maybe no. but the only thing you are looking at is, well, IO.
perhaps you could create an on logon trigger, in that trigger, say "if user = yourself" then execute immediate an alter session and set the 10046 level 12 trace . Then, after logging in, run tkprof.
statspack would be useful to see aggregate information such as "hit ratios" to give you a chance to eyeball that and see if it is "really low" and needs attention.
Felipe Moreno, March 11, 2013 - 12:35 pm UTC
Suppose you are tuning a single sql statement.
What if you flushed the buffer cache before and after each version of the statement?
SQL> alter system flush buffer_cache;
SQL>
-- original version
select case a.status
when 'A' then 'ACTIVE'
else 'INACTIVE'
end, b.*
from a, b
where a.id = b.id
and a.status in ('A', 'I');
SQL> alter system flush buffer_cache;
SQL>
--new version
select 'ACTIVE', b.*
from a, b
where a.id = b.id
and a.status = 'A'
union all
select 'INACTIVE', b.*
from a, b
where a.id = b.id
and a.status = 'I'
SQL> --compare elapsed times.
Would that still be misleading?
March 11, 2013 - 1:42 pm UTC
.. What if you flushed the buffer cache before and after each version of the statement?
..
In most cases - you'll have the most biased test on the planet.
Many people use buffered file systems, a lot of them. When you flush the buffer cache, you flush the SGA, but not the OS file system cache. So the second query - while it appears to be doing the same physical IOs as the first - is doing "fake" physical IO's from the file system cache.
I call this the secondary SGA. search for that term on this site for more on that.
but that aside, go with query #1 above, I don't need to run that one :)
What does Oracle do when buffer reflushed?
Oracle fans, October 23, 2013 - 2:30 pm UTC
Tom,
It takes a long time to flush buffer. I wonder what Oracle is doing after the command; Does ORACLE try to
archive redo logs before flushing buffer?
Thanks for advance.
Oracle fans
November 01, 2013 - 8:47 pm UTC
It takes a long time to flush buffer.
why do you say that? It should take <5ms
or do you mean "flush the ENTIRE buffer cache", in which case - it could take 5ms * number of blocks needed to be written to disk.
flushing the buffer cache would cause you to wait for every block with a modification to be written out to disk. that would account for the vast majority of the time spent executing this command.