Deeper explanation would be nice
john, December 05, 2001 - 7:39 am UTC
Tom,
your explanation was very useful, thanks.
And I want to clear some things up:
If we query v$latch_children we will find
exactly 1024 hash latches there, no matter
what the value of _db_block_hash_buckets is.
So the more hash chains we have, the more hash
chain latch contention there will be?
Is that correct?
December 05, 2001 - 5:16 pm UTC
what is the exact name you are querying on?
john, December 06, 2001 - 3:52 am UTC
I'm using this query:
select count(*) from v$latch_children
where name = 'cache buffers chains'
December 06, 2001 - 6:35 am UTC
there are N cache buffer chains latches, depending on the size of your buffer cache.
ops$ora817@ORA817DEV.US.ORACLE.COM> show parameter db_block_buffers
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
db_block_buffers integer 11000
ops$ora817@ORA817DEV.US.ORACLE.COM> select count(*) from v$latch_children
2 where name = 'cache buffers chains';
COUNT(*)
----------
1024
ops$ora817@ORA817DEV.US.ORACLE.COM> show parameter db_block_buffers
NAME TYPE VALUE
------------------------------------ ------- ------------------------------
db_block_buffers integer 1000
ops$ora817@ORA817DEV.US.ORACLE.COM> l
1 select count(*) from v$latch_children
2* where name = 'cache buffers chains'
ops$ora817@ORA817DEV.US.ORACLE.COM> /
COUNT(*)
----------
256
ops$ora817@ORA817DEV.US.ORACLE.COM>
So, tweak your block buffer cache and you'll get more/less cache buffers chains latches.
The bigger the buffer cache, the more latches you get, which reduces contention.
john, December 06, 2001 - 7:18 am UTC
Thanks a lot!
Tianhua Wu, December 06, 2001 - 11:05 am UTC
Hi Tom,
If I remember correctly, the value of db_block_lru_latches is defined between 0.5*cpu_count and 6*cpu_count. From the previous example, this limitation is not there any more. Is that correct? If so, do you have any new guidance for using this parameter?
December 06, 2001 - 11:43 am UTC
No, db_block_lru_latches and cache buffers chains are different beasts. db_block_lru_latches is normally used in conjunction with multiple buffer pools or multiple dbwr's. cache buffers chains are latches to the individual, hashed lists of buffered blocks.
they are separate.
Tianhua Wu, December 06, 2001 - 12:10 pm UTC
Hi Tom,
Ok, I confused the lru chain with the cache buffer chains. To my understanding, the number of hash buckets by default is a quarter of the db buffers before Oracle 8i and twice the number of buffers in 8i (of course there is a limit). Also, the number of buckets can be changed by _db_block_hash_buckets. How can I increase the number of cache buffer chain latches?
I just want to understand this better.
Thank you!
T. Wu
December 06, 2001 - 2:02 pm UTC
Add more block buffers -- it is a function of the number of block buffers.
DB_BLOCK_LRU_LATCHES and free buffer waits
Nash, May 19, 2002 - 6:01 pm UTC
What correlation does db_block_lru_latches have to 'free buffer waits'?
If we increase the db_block_lru_latches then will it possibly increase the dbwr throughput and hence decrease the 'free buffer waits'...
Thanks
May 21, 2002 - 8:12 am UTC
you need to be using:
...
db_block_lru_latches is normally used in conjunction with multiple buffer pools
or multiple dbwr's.
.....
in conjunction with that parameter to make a difference. You are basically tuning dbwr with it.
A reader, May 21, 2002 - 8:38 am UTC
Hi tom,
In your first paragraph you said:
"Blocks in the buffer cache are placed on linked lists
(cache buffer chains) which hang off a hash table.
The hash chain that a block is placed on is based on the DBA
and CLASS of the block. Each hash chain is protected by a
single child latch. Processes need to get the relevant latch
to allow them to scan a hash chain for a buffer so that the
linked list does not change underneath them."
I understood "Blocks in the buffer cache are placed on linked lists
(cache buffer chains)"
The rest of the stuff goes over my head.
Can you explain "hash table","DBA" and other in simple terms?
Thanks.
May 21, 2002 - 10:02 am UTC
A hash table is a simple computer science data structure. If you have my book -- I go into details (with pictures and such) as to how it works. If you don't have my book -- most any book on data structures will cover what they are. See also
http://www.nist.gov/dads/HTML/hashtab.html
DBA = Data
Block
Address
it is the file/block#
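To make the idea concrete, here is a toy sketch of the structure being described: a hash table whose buckets are the cache buffers chains, keyed by the DBA (file#, block#). The bucket count and hash function below are invented purely for illustration; they are not Oracle's actual values or algorithm.

```python
# Toy model of the buffer cache hash table: blocks are hashed by their
# DBA (file#, block#) into a fixed number of buckets, and each bucket
# holds a chain (list) that must be scanned under that chain's latch.
# The bucket count and hash function are made up for illustration.

N_BUCKETS = 8

def bucket_for(dba):
    file_no, block_no = dba
    return (file_no * 100003 + block_no) % N_BUCKETS

chains = [[] for _ in range(N_BUCKETS)]

def cache_block(dba):
    chains[bucket_for(dba)].append(dba)

def find_block(dba):
    # Scan only the one chain the DBA hashes to -- this is the work
    # done while holding that chain's "cache buffers chains" latch.
    return dba in chains[bucket_for(dba)]

cache_block((5, 55))          # cache block 55 of file 5
print(find_block((5, 55)))    # True
print(find_block((5, 56)))    # False
```

The point of the hash is that a lookup touches one short chain instead of searching the whole cache; the latch only protects that one chain while it is scanned.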
very big data buffer cause cache buffer chains latch contention
A reader, August 07, 2002 - 4:31 pm UTC
hi tom
you said that a very long buffer chain list could lead to latch contention; is this the case with a very big data block buffer?
August 07, 2002 - 6:37 pm UTC
if you mean "will a larger block size reduce this contention" -- the answer is -- it depends. If you have a hot block, you have a hot block and a bigger block won't help that.
I mean db cache
A reader, August 08, 2002 - 2:25 am UTC
Hi, I mean: would a very large database cache (data block buffer) cause latch contention, the way a large shared pool causes latch contention?
August 08, 2002 - 9:13 am UTC
the bigger the cache, the more hash buckets we create.
more blocks = more lists = more latches.
So no, it would not.
then what exactly is
A reader, August 08, 2002 - 2:51 pm UTC
hi
Then what do you mean by Very long buffer chains?
thank you
August 08, 2002 - 2:54 pm UTC
lots of blocks all hashing to the same list (eg: having hundreds of consistent read versions of the same block and having hundreds of blocks on a list can add up)
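A sketch of the consistent-read case: every CR copy of a block carries the same DBA, so all the copies hash to the same chain, and one chain can grow long while the others stay short. Bucket count and hash function here are invented, not Oracle's.

```python
# Toy model: consistent-read (CR) copies of a block share the block's
# DBA, so they all land on the same chain, lengthening just that one.
# Bucket count and hash function are made up for illustration only.
N_BUCKETS = 8
chains = [[] for _ in range(N_BUCKETS)]

def bucket_for(dba):
    file_no, block_no = dba
    return (file_no * 100003 + block_no) % N_BUCKETS

hot_dba = (5, 55)                      # one hot block
for version in range(200):             # 200 CR copies of it
    chains[bucket_for(hot_dba)].append((hot_dba, version))

print(sorted(len(c) for c in chains))  # [0, 0, 0, 0, 0, 0, 0, 200]
```

Every get against that block now scans a 200-entry chain under a single latch, which is the "long chain" form of contention described above.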
How to find the sql being waited on
A Reader, January 14, 2003 - 3:26 pm UTC
Hi Tom,
I have a really bad problem with cache buffers chains - you mention 'When I see this, I try to see what SQL the waiters are trying to execute' - how do I find exactly which SQL and which blocks this is a problem for?
Regards,
Paul
Similar Issue
Matt, February 28, 2003 - 1:30 am UTC
I'm investigating a similar issue with cache buffer chains. (I also have some library cache latch contention)
SQL> l
1 select addr, gets, misses, sleeps from v$latch_children where name = 'cache buffers chains'
2* and misses > 100
SQL> /
ADDR GETS MISSES SLEEPS
---------------- ---------- ---------- ----------
0000000406E4E7D8 382594 838 194
0000000406E35BD8 35723 110 85
0000000406E0DBD8 177819 334 149
0000000406DFE3D8 162678 223 69
0000000406DFB7D8 125035 117 54
0000000406DF93D8 198635 224 74
0000000406DF5BD8 144774 201 59
0000000406DBE7D8 93282 106 39
0000000406D973D8 68897 166 69
0000000406D907D8 2486324 18116 5372
0000000406D903D8 4970297 88713 51715
0000000406D74358 120722 102 44
0000000406D6A4E8 143278 117 73
I've highlighted that these are the two latches with contention.
0000000406D907D8 2486324 18116 5372
0000000406D903D8 4970297 88713 51715
So I look for the data blocks...
SQL> l
1 SELECT file#, dbablk, class, state
2 FROM X$BH
3* WHERE HLADDR=address of latch;
SQL> 3 WHERE HLADDR='0000000406D907D8';
SQL> /
FILE# DBABLK CLASS STATE
---------- ---------- ---------- ----------
69 61308 1 1
108 2772 1 1
2 14355 38 1
89 28987 1 1
28 346 1 1
94 33852 1 1
89 37179 1 1
97 4110 1 1
69 182506 1 1
63 656 1 1
63 2704 1 1
105 2858 1 1
124 1545 1 1
69 137267 1 1
37 1404 1 1
94 150954 1 1
1 267 1 1
15 24365 1 1
120 3086 1 1
69 148531 1 1
99 1200 1 1
86 550 1 1
69 102268 1 1
21 6746 1 1
24 rows selected.
SQL> l
1 SELECT file#, dbablk, class, state
2 FROM X$BH
3* WHERE HLADDR='0000000406D907D8'
SQL> 3 WHERE HLADDR='0000000406D903D8'
SQL> /
FILE# DBABLK CLASS STATE
---------- ---------- ---------- ----------
69 61307 1 1
122 358 1 1
103 2695 1 1
108 2771 1 1
89 28986 1 1
68 180888 1 1
69 221600 1 1
118 2923 1 1
91 1283 1 1
69 28356 1 1
16 381 1 1
84 2435 1 1
84 3459 1 1
37 2427 1 1
94 150953 1 1
1 266 4 1
15 24364 1 1
69 145458 1 1
9 1998 1 1
69 96123 1 1
69 148530 1 1
69 102267 1 1
69 153650 1 1
129 1736 1 1
126 682 1 1
6 11606 1 1
26 rows selected.
I then use the following to identify the objects:
SELECT owner, segment_name
2 FROM DBA_EXTENTS
3 WHERE file_id = &p1
4* AND &p2 between block_id AND block_id + blocks - 1
Above, I've noticed that file ids 69, 84 and 63 appear more than once. Using the above query I have identified three objects that seem to be the cause of contention. These are ALL indexes.
This is a snippet of the statspack report:
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 393M Std Block Size: 16K
Shared Pool Size: 80M Log Buffer: 160K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 172,027.46 617,719.38
Logical reads: 3,624.48 13,014.84
Block changes: 959.83 3,446.59
Physical reads: 21.83 78.40
Physical writes: 10.07 36.14
User calls: 872.40 3,132.62
Parses: 10.52 37.77
Hard parses: 0.03 0.12
Sorts: 6.62 23.76
Logons: 0.00 0.01
Executes: 1,946.38 6,989.09
Transactions: 0.28
% Blocks changed per Read: 26.48 Recursive Call %: 81.67
Rollback per transaction %: 0.20 Rows per Sort: 5.33
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.93 Redo NoWait %: 100.00
Buffer Hit %: 99.41 In-memory Sort %: 100.00
Library Hit %: 100.00 Soft Parse %: 99.68
Execute to Parse %: 99.46 Latch Hit %: 97.26
Parse CPU to Parse Elapsd %: 71.13 % Non-Parse CPU: 99.83
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 94.03 92.86
% SQL with executions>1: 68.40 67.82
% Memory for SQL w/exec>1: 19.74 26.75
Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (s) Wt Time
-------------------------------------------- ------------ ----------- -------
db file sequential read 48,481 1,054 23.37
log buffer space 3,824 1,002 22.21
log file parallel write 21,373 858 19.01
latch free 448,006 453 10.04
db file parallel write 2,612 420 9.30
-------------------------------------------------------------
The SQL I am seeing appears to indicate that there is a data load occurring.
Is it updates to indexes that would be causing this problem?
How would I tune this in this case (inserting/updating the index).
I'm also working on the other waits listed above.
cache buffer chains - 9.2.0.3
Baqir Hussain, May 28, 2003 - 1:42 pm UTC
Tom,
What parameters need to be tuned to avoid cache buffer chain problems in 9.2.0.3?
Please advise.
Thanks
May 28, 2003 - 7:12 pm UTC
it is less a problem of init.ora parameter and more of design.
avoid really really really hot blocks.
avoid 50 billion people full scanning the same table over and over and over.
those are the major causes of cbc latching issues.
db buffer cache sizing
mike, January 01, 2004 - 12:02 pm UTC
Tom, happy new year!!
i have 8174 db.
how can I know if my db buffer cache is too small?
thanks
January 01, 2004 - 12:54 pm UTC
trick question.
you have to first make sure your SQL is as good as it gets (reduce the number of LIO's your sql does).
Then, you need to understand whether a cache of any size can hold some useful percentage of your data (maybe it can, maybe it can't).
Then you can look at the ratio to see if you have a satisfactory hit rate.
But basically, you want to look at what your system is waiting for. Not waiting on physical IO? Then your cache is at least big enough. Waiting on physical IO? DO NOT INCREASE YOUR BUFFER CACHE FIRST; rather, look at your top SQL. If you have a query that is executed lots and returns like 2 rows but does 100 LIO's to do it -- you need to look at that (I would hope for single digit LIO's for a popular query that returns 2 rows). Do you have a popular query that just does a ton of LIO's (regardless of the rows) -- why? Can you do anything about it? If so -- fix it.
Then, you might find that your buffer cache is just fine after all (watch the LIO's and the PIO's will take care of themselves).
Increasing the buffer cache to help a PIO problem is the last step -- you want to exhaust other avenues first (more efficient sql, better algorithms, quit running stuff you don't really need to run, that sort of stuff -- they'll give you the big payoff. increasing the buffer cache doesn't always have the same potential as those approaches)
db_block_hash_buckets
Aashish, March 23, 2004 - 10:33 pm UTC
Hi Tom
Can you please explain about db_block_hash_buckets for what purpose it used how do we find out the need to change this parameter, and what is the relation of latches with this parameter.
March 24, 2004 - 8:33 am UTC
tis one of the magical _parameters.
no need to touch it without support saying 'please touch it'
Hot blocks
Steve, April 22, 2004 - 5:17 am UTC
Tom
You've mentioned hot blocks a couple of times. If we identify hot blocks how can we deal with them.
For example, our top wait event is latch free and within that the misses on cache buffers chains are the highest. Running the following query to obtain the address:
select CHILD# "cCHILD"
, ADDR "sADDR"
, GETS "sGETS"
, MISSES "sMISSES"
, SLEEPS "sSLEEPS"
from v$latch_children
where name = 'cache buffers chains'
order by 4 desc, 1, 2, 3
produces 0A243B40 so running the following query:
select /*+ ordered */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch,
l.child#
from
sys.v$latch_children l,
sys.x$bh x,
sys.dba_extents e
where
x.hladdr = '0A243B40' and
e.file_id = x.file# and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc;
SYS.DUAL 0 1 3127 2
SYS.DUAL 0 1 3127 1
SYS.DUAL 0 1 3127 4
SYS.DUAL 0 1 3127 3
SYS.DUAL 0 1 3127 1
SYS.DUAL 0 1 3127 1
SYS.DUAL 0 1 3127 1
SYS.DUAL 0 1 3127 1
...
WORKFLOW.WORKINGHRS_IDX 0 31 2495 247
WORKFLOW.WORKINGHRS_IDX 0 31 2495 246
WORKFLOW.WORKINGHRS_IDX 0 31 2495 245
WORKFLOW.WORKINGHRS_IDX 0 31 2495 243
WORKFLOW.WORKINGHRS_IDX 0 31 2495 241
WORKFLOW.WORKINGHRS_IDX 0 31 2495 239
WORKFLOW.WORKINGHRS_IDX 0 31 2495 237
....
What steps can we take to reduce the contention on the hot blocks - especially those that are from indexes?
Thanks
April 22, 2004 - 7:56 am UTC
it is application dependent.
for example -- you may well find that your DUAL usage is due to people using USER over and over and over in their code. Since USER doesn't change really, perhaps you can virtually eliminate the reads on DUAL by using a variable in your code that is set ONCE per session to user and referenced in the code.
Or, you might find you have triggers like this:
select seq.nextval into :new.id from dual;
well, I would prefer to REMOVE the trigger altogether and just use seq.nextval in the insert itself, but short of that, I might:
sys@ORA9IR2> create view mydual as select * from x$dual;
View created.
sys@ORA9IR2> grant select on mydual to public;
Grant succeeded.
sys@ORA9IR2> create or replace public synonym mydual for mydual;
Synonym created.
sys@ORA9IR2>
sys@ORA9IR2> @connect /
sys@ORA9IR2> set termout off
ops$tkyte@ORA9IR2> set termout on
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create sequence s;
Sequence created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace procedure p1
2 as
3 l_s number;
4 begin
5 for i in 1 .. 10000
6 loop
7 select s.nextval into l_s from dual;
8 end loop;
9 end;
10 /
Procedure created.
ops$tkyte@ORA9IR2> create or replace procedure p2
2 as
3 l_s number;
4 begin
5 for i in 1 .. 10000
6 loop
7 select s.nextval into l_s from mydual;
8 end loop;
9 end;
10 /
Procedure created.
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_start
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec p1
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_middle
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec p2
PL/SQL procedure successfully completed.
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_stop(1000)
Run1 ran in 60 hsecs
Run2 ran in 43 hsecs
run 1 ran in 139.53% of the time
Name Run1 Run2 Diff
STAT...redo size 490,188 488,744 -1,444
STAT...buffer is not pinned co 10,003 4 -9,999
STAT...no work - consistent re 10,002 2 -10,000
STAT...table scans (short tabl 10,000 0 -10,000
STAT...table scan rows gotten 10,000 0 -10,000
STAT...table scan blocks gotte 10,000 0 -10,000
STAT...calls to get snapshot s 31,505 11,502 -20,003
STAT...consistent gets 30,506 505 -30,001
STAT...session logical reads 32,569 2,552 -30,017
LATCH.cache buffers chains 69,191 9,165 -60,026
STAT...session pga memory max 3,689,104 0 -3,689,104
STAT...session pga memory 4,213,392 0 -4,213,392
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
238,428 178,355 -60,073 133.68%
PL/SQL procedure successfully completed.
Note that in 10g, this is not needed, it has already happened -- 10g does this for you
As for a hot index block -- that would be a candidate for one of two things really
a) a reverse key index perhaps (to avoid a hot right-hand side of the index)
b) a hashed table with a locally partitioned index -- achieving the same effect as a reverse key but giving you more index structures.
but it is always going back to the application and asking "why is it happening, what am I doing that causes it" and then looking for ways to reduce or eliminate it from a design perspective.
SLEEPS, not MISSES
Mark J. Bobak, April 22, 2004 - 9:08 am UTC
I think it's worth saying that in general, latch contention
concerns and tuning should be concentrated on those latches
where there is a significant number of SLEEPS. Latches can
and will accumulate MISSES. That's just life. When you see
significant SLEEPS accumulated, that's when you probably have
something worth worrying about.
-Mark
April 23, 2004 - 8:04 am UTC
not really -- latch gets spin, spin = cpu consumption, lots of people going after the latches will cause lots of cpu stuff going on, even if you do not sleep. and while you are spinning in CPU, you are effectively "not doing anything truly useful" and taking time to do it.
actually, not at all. latch contention is best resolved way way way before it ever happens (that is why you might use something like runstats to see which approach results in significantly fewer latches). Prevention is the best medicine here. It can be insidiously hard to remove a problem -- but if you never exhibit the symptoms in the first place -- no problem to remove.
When you see excessive latching you have something to think about already. The reason we want a query to do as few LIO's as possible is because an LIO is a latch, a latch is a type of lock, a lock is a serialization device, serialization devices inhibit scalability...... and so on.
Ah but....
steve, April 22, 2004 - 9:37 am UTC
it's a third party app. I've already been asking them why 73% of transactions are rolled back, without success, so I suspect that getting them to look at this will be equally fruitless!! Maybe I should direct user complaints about the speed of the system straight through to them!! ;-)
Thanks
Hash Table and Prime Number
Vivek Sharma, July 02, 2004 - 2:58 pm UTC
Dear Tom,
I was going through a wonderful book written by Steve Adams "Oracle 8i Internals". The author writes about the hash tables, hash chains and hash collisions. In another paragraph, he says that
"By making the number of hash buckets a prime number, you can greatly reduce the risk of any pattern in the hash values resulting in hash collisions once the modulus function has been applied."
I wanted to know
1. how can we set the hash buckets to a prime number value. I mean which parameter decides the value of hash buckets so that it can be set to a prime number.
2. Is this also applicable for library cache (as the author states in the book). If yes, then again which parameter decides the value of this bucket.
Thanks and Regards
Vivek
July 02, 2004 - 4:41 pm UTC
we set all of these for you. don't set them except at the advice of support telling you to set them to a specific value. they are already set optimally.
Name of the parameters
Vivek Sharma, July 03, 2004 - 8:54 am UTC
Dear Tom,
I have checked the _db_block_hash_buckets parameter in one of my database which is 7841 and is a prime number. Is this the parameter which looks after the hash buckets in the buffer cache ?
What is the parameter for Shared Pool ? I will not change these but would like to know for my curiosity and knowledge.
Regards
Vivek
July 03, 2004 - 10:21 am UTC
i don't really comment on the undocumented.
Hash buckets and primes......
Mark J. Bobak, July 08, 2004 - 12:20 am UTC
IF you REALLY want to find out what all those undocumented
parameters are, you probably can. Read Steve Adams' book
and surf his website and poke around (but not on a production
system.) I'm going to respect Tom's wishes and not post any
of that info here. If you REALLY want it, it's out there.
With regard to hash buckets and prime numbers, I believe that
at least since 8.1.7, if you try to shoot yourself in the
foot and set the number to a non-prime, Oracle silently
increases the value to the next greatest prime number. So,
there's even less reason to worry about it.
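The effect of a prime bucket count can be seen with a small experiment: blocks accessed at a regular stride collide badly when the stride shares a factor with the bucket count, and a prime count avoids that. The numbers below are invented for illustration, not Oracle's.

```python
# Sketch of why a prime bucket count helps: if every 10th block is
# used and there are 10 buckets, every block hashes to bucket 0;
# with a prime count (11) the same blocks spread across all buckets.
# Purely illustrative -- not Oracle's hashing.
from collections import Counter

blocks = range(0, 1000, 10)           # every 10th block: 100 blocks

def distribution(n_buckets):
    return Counter(b % n_buckets for b in blocks)

print(distribution(10))   # all 100 blocks pile onto bucket 0
print(distribution(11))   # spread across all 11 buckets
```

With 10 buckets one chain holds all 100 blocks (one very hot latch); with 11 buckets no chain holds more than 10, because the stride and the prime bucket count share no common factor.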
Procedure
Anders Svensson, July 22, 2004 - 1:44 pm UTC
Hi!
I am wondering if it is possible to select all tables from a database. I mean, if I create some tables and later I want to select those, how can I do this? Is there any syntax like:
Select tables .... or any other syntax?
Thanks a lot
A.S
July 23, 2004 - 8:07 am UTC
do you want their names? review the data dictionary - dba_objects will answer that.
otherwise not sure what you mean.
selecting table names
Andreas Svenssosn, July 26, 2004 - 7:00 am UTC
Yes I want to select the name of all existing tables
Thanks,
July 26, 2004 - 7:38 am UTC
see where you asked this on another page -- or see right above. review the data dictionary, see dba_objects, query it.
db buffers - any downside to it being large?
AR, August 25, 2004 - 11:29 pm UTC
Tom,
In 8i and below (where there isn't a cache advisory)- is there any downside in having as large a buffer cache area as physically (memory) possible? Assuming all other components of the SGA have been sized satisfactorily, that is. Lets assume none of the other regions get compromised by having a larger buffer cache, including user sort areas etc.
Basically if I have gobs of physical memory, can there be a potential downside to having a humongous buffer cache area? I read this thread..and it looks like additional latches will be created to reduce contention on larger buffer areas. I can't think of a downside in having a huge buffer cache. I would however like your opinion.
Thank you for your time.
August 26, 2004 - 9:34 am UTC
the downside is that this becomes standard operating procedure, and that makes you look "bad" (hey, I've a database that needs 40gig of RAM -- probably not)...
if you got it, use it I suppose, but make sure it cannot be better used elsewhere (larger PGA's for example)
how to size db cache
Pinguman, August 27, 2004 - 3:55 am UTC
Hi
For a new application how would one start sizing the SGA? Specifically DB_CACHE_SIZE? What information would we need to perform such capacity planning?
What I have done in the past was start with a db_cache_size of around 256MB in OLTP databases and then monitor for several days. However, my job now is capacity planning, i.e. given some information from the development team I am supposed to size the memory requirements for the instance. I find this really hard; there is not much help on Metalink.
The brute force way would be to start with a big DB_CACHE_SIZE, but then how would someone justify such a requirement?
Also, can a very big DB_CACHE_SIZE/DB_BLOCK_BUFFERS make the database perform worse in Oracle 7 and Oracle 8? Same behaviour in Oracle 8i and 9i? What can go wrong with a very big DB cache?
Cheers
August 27, 2004 - 7:47 am UTC
how much money do you have?
semi-serious there. In general, how much money do you want to throw at RAM? With 64bit, you can pretty much use it all.
It is not an exact science -- it cannot be. If you have a smallish database and read 5% of it (the working set), you'll want N meg of cache; same database -- read 95% of it (the working set) and you'll want M meg of cache.
Change "smallish" to medium, or large, or really big, and you have the same issue.
if you are looking for a ROT (rule of thumb), an observation I have made is that a machine generally seems to come with 1 to 2 gig of RAM per CPU. So, 4 cpus would generally have 4, 6 or 8 gigs of RAM. It seems as valid as anything else to start.
Arun Gupta, August 27, 2004 - 5:02 pm UTC
Tom,
In answer to a question above you said:
"lots of blocks all hashing to the same list (eg: having hundreds of consistent
read versions of the same block and having hundreds of blocks on a list can add
up) "
Can you please explain in some more detail the mechanism which can cause multiple blocks hash to the same linked list. My understanding is that Oracle user process will search for data in buffer cache. If the data is not found, then Oracle will copy the data block into buffer cache from the datafile after finding a free buffer. This buffer will be moved to the MRU end of the LRU list. With this scheme, how can multiple blocks hash to the same linked list?
Thanks
August 27, 2004 - 5:43 pm UTC
there are lists of lists.
when you ask for "block 5 from file 55" -- 5.55 is the "DBA" the data block address.
That is hashed to find the list of blocks that could contain 5.55
There is a list of pointers to blocks that are dirty
There is a list of pointers to blocks to age out
the blocks themselves are stuffed into the cache and then pointed at.
So, we need a definitive way to find block 5.55 -- you wouldn't want to have to search the entire cache for it right? so, they are organized -- there is an array of lists, to find which list you have to search thru, we hash the DBA into the array -- and search just that list for the block.
Arun Gupta, August 27, 2004 - 11:34 pm UTC
I am trying to understand what would cause the cache buffer chains to grow long. As I understand, the efficiency of hashing algorithm relies on the fact that blocks would hash to different values and would be distributed among the linked lists. If all blocks hashed to the same value, they will be on a single linked list and the hash search time would be O(n), which would be inefficient. I am trying to gain more in-depth understanding of what can cause the cbc to grow long and cause latch contention. The purpose is if I know what type of Oracle operations can cause cbc to grow, I can avoid such operations and latch contention.
Thanks for providing so much insight into working of Oracle.
August 28, 2004 - 9:50 am UTC
mostly "bad luck" or a "bad algorithm"
"bad algorithms" get filed as bugs and corrected, not much we can do about them except a) encounter them, b) get them fixed. It would be unlikely in current releases of the software to have this happen.
"bad luck" is just that -- bad luck. suppose you accidently have a table where every 10th block is used. further, suppose there are 10 lists. further, suppose the hashing algorithm tends to put the "i'th block of this table on the same list" (all of this is very hypothetical, doesn't really work this way). suppose further this table is the biggest, most accessed table in your entire system. All of it would be hanging off of one chain, one list in this hypothetical drama. one cbc latch would be really hot here, with tons of data behind it.
Latches - concepts
Parag Jayant Patankar, October 28, 2004 - 5:11 am UTC
Dear Tom,
1. I have come across documents saying that "reparsed shareable SQL should be avoided (check v$sqlarea)" for Shared Pool and Library Cache latch contention. Will you please explain the concept of "reparsed shareable SQL"?
2. Is DB_BLOCK_LRU_LATCHES obsolete in Oracle 9i?
regards & thanks
pjp
October 28, 2004 - 12:21 pm UTC
1) need to see more in context.
and you know, documents that say "X", but don't say "why X" -- well, look for some other paper.....
reparsed is easy -- but don't know if they mean hard or soft, or in what context.
shareable sql is sql that can be used by other sessions easily -- that is, SQL with bind variables. eg: in a HR system:
select * from emp where empno = :x;
is "shareable", lots of people can use it over and over, but:
select * from emp where empno = 1234;
is not very "shareable", not too many people will be interested in empno = 1234;
2) yes
Difference in 8i and 9i
naresh, November 18, 2004 - 4:01 am UTC
Hi Tom,
From what I have read:
In 8i the latch serializes for readers as well as writers. In 9i, the readers can share this latch.
So in 9i, in the case that there are several reader processes accessing the same hot blocks repeatedly, they would swamp out the writers completely, because they would keep getting the latch.
Would the above be likely? I admit, having "several reader processes accessing the same hot blocks repeatedly" is bad design - but right now I am just trying to convince some folks about changing their design as we move from 8i to 9i.
We already have latch wait as our top event in statspack - and these are CBC - my guess is things would get worse in 9i.
Thanks.
November 18, 2004 - 10:41 am UTC
can -- depends on os implementations.
things would not get worse in general (less people in line if they are all going at the same time)
meaning of "writers"
Alberto Dell'Era, November 18, 2004 - 6:01 pm UTC
> So in 9i, in the case that there are several reader processes accessing the
> same hot blocks repeatedly, they would swamp out the writers completely,
> because they would keep getting the latch.
Is it "writers *of the hash chain structure* ", ie wanting to add/remove a block from the hash chain (sessions reading a block from disk, etc) ?
A writer *of the block* (ie wanting to update a row) is a reader in this context, correct ?
BTW It would be nice if the reviewer posted a link to the place where he read that.
Shared Read Latch
Jonathan Lewis, November 18, 2004 - 10:57 pm UTC
As far as I can tell, the shared read latch has only limited benefit for the cache buffers chains latch.
On a typical 'get', the session has to acquire a
'pin' structure on the buffer header before reading the
buffer. (The sequence is: get latch, search chain, pin buffer header, drop latch, read buffer, get latch, unpin buffer header, drop latch.) So most buffer reads start with a buffer header write, and therefore the latch get is an exclusive write get, not a shared read.
There are 'consistent gets - examination' which I believe are buffer reads whilst holding the latch, and these can use shared read. Typical cases for this are reads of index root block, reads of undo blocks, and reads of single table hash cluster blocks with no collisions.
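The sequence Jonathan describes can be sketched with an ordinary mutex standing in for the chain latch. This is a minimal model only, assuming nothing beyond the steps he lists; the `Buffer`, `HashChain` names and the data are invented. Note the latch is taken twice per buffer get, and that pinning modifies the buffer header, which is why even a "read" needs the latch exclusively.

```python
# Model of the buffer-get sequence: get latch, search chain, pin
# buffer header, drop latch, read buffer, get latch, unpin buffer
# header, drop latch. A threading.Lock plays the chain latch.
import threading

class Buffer:
    def __init__(self, dba, data):
        self.dba, self.data, self.pinned = dba, data, False

class HashChain:
    def __init__(self):
        self.latch = threading.Lock()   # the cache buffers chains latch
        self.buffers = []

    def get(self, dba):
        with self.latch:                # get latch, search chain
            buf = next(b for b in self.buffers if b.dba == dba)
            buf.pinned = True           # pin buffer header (a write!)
        data = buf.data                 # read buffer with latch dropped
        with self.latch:                # get latch again
            buf.pinned = False          # unpin buffer header
        return data

chain = HashChain()
chain.buffers.append(Buffer((5, 55), "row data"))
print(chain.get((5, 55)))   # row data
```

The pin is what lets the latch be dropped during the actual read: the buffer cannot be aged out or relocated while pinned, so the chain only needs protecting while it is searched or modified.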
November 19, 2004 - 10:15 am UTC
Jonathan -- thanks much.
Alberto Dell'Era, November 19, 2004 - 11:26 am UTC
Many thanks to Jonathan...
Enlightening.
db_block_lru_latches
SREENIVASA RAO, November 25, 2004 - 3:11 am UTC
Hi TOM,
Thanks in advance for your guidance.
My production db is running on 9.2.0.1.0.
I altered some procedures and functions yesterday, so the cache hit ratio came down to 30% here.
One table of size 4gb contains hot blocks that are accessed very frequently, but I can't keep it in the KEEP cache.
Could you make a suggestion after having a look here:
Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 76 25-Nov-04 11:52:42 43 33.9
End Snap: 77 25-Nov-04 12:15:11 39 35.9
Elapsed: 22.48 (mins)
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 704M Std Block Size: 8K
Shared Pool Size: 704M Log Buffer: 512K
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 4,801.42 7,757.03
Logical reads: 3,995.07 6,454.31
Block changes: 30.11 48.65
Physical reads: 2,625.00 4,240.86
Physical writes: 6.12 9.89
User calls: 43.42 70.15
Parses: 5.38 8.69
Hard parses: 1.49 2.40
Sorts: 2.54 4.10
Logons: 0.01 0.02
Executes: 11.56 18.67
Transactions: 0.62
% Blocks changed per Read: 0.75 Recursive Call %: 34.95
Rollback per transaction %: 0.24 Rows per Sort: 1262.97
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 99.99
Buffer Hit %: 34.29 In-memory Sort %: 100.00
Library Hit %: 91.52 Soft Parse %: 72.31
Execute to Parse %: 53.49 Latch Hit %: 99.97
Parse CPU to Parse Elapsd %: 97.60 % Non-Parse CPU: 98.13
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 30.01 35.34
% SQL with executions>1: 34.81 35.18
% Memory for SQL w/exec>1: 40.54 40.63
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 217 76.65
db file scattered read 601,768 34 12.09
db file sequential read 227,244 23 7.98
log file sync 630 6 1.99
control file parallel write 447 2 .62
-------------------------------------------------------------
Wait Events for DB: RFES Instance: RFES Snaps: 76 -77
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file scattered read 601,768 0 34 0 720.7
db file sequential read 227,244 0 23 0 272.1
log file sync 630 1 6 9 0.8
control file parallel write 447 0 2 4 0.5
enqueue 2 0 1 628 0.0
buffer busy waits 71 0 0 4 0.1
process startup 12 0 0 17 0.0
latch free 4 2 0 10 0.0
log file switch completion 2 0 0 17 0.0
SQL*Net break/reset to clien 168 0 0 0 0.2
slave TJ process wait 1 1 0 9 0.0
log file parallel write 1,062 1,060 0 0 1.3
control file sequential read 249 0 0 0 0.3
db file parallel write 612 306 0 0 0.7
db file parallel read 13 0 0 0 0.0
log file single write 2 0 0 0 0.0
SQL*Net more data to client 3 0 0 0 0.0
LGWR wait for redo copy 9 0 0 0 0.0
log file sequential read 2 0 0 0 0.0
SQL*Net message from client 58,643 0 29,424 502 70.2
virtual circuit status 45 45 1,319 29307 0.1
wakeup time manager 42 42 1,259 29975 0.1
jobq slave wait 359 341 1,051 2928 0.4
SQL*Net message to client 58,639 0 0 0 70.2
SQL*Net more data from clien 4 0 0 0 0.0
-------------------------------------------------------------
Background Wait Events for DB: RFES Instance: RFES Snaps: 76 -77
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
control file parallel write 447 0 2 4 0.5
db file scattered read 48 0 0 7 0.1
process startup 12 0 0 17 0.0
db file sequential read 21 0 0 3 0.0
log file parallel write 1,062 1,060 0 0 1.3
db file parallel write 612 306 0 0 0.7
control file sequential read 189 0 0 0 0.2
log file single write 2 0 0 0 0.0
LGWR wait for redo copy 9 0 0 0 0.0
log file sequential read 2 0 0 0 0.0
rdbms ipc message 4,113 3,090 7,999 1945 4.9
smon timer 4 4 1,200 ###### 0.0
-------------------------------------------------------------
Instance Activity Stats for DB: RFES Instance: RFES Snaps: 76 -77
Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 21,712 16.1 26.0
CPU used when call started 21,642 16.0 25.9
CR blocks created 170 0.1 0.2
DBWR checkpoint buffers written 8,255 6.1 9.9
DBWR checkpoints 1 0.0 0.0
DBWR transaction table writes 0 0.0 0.0
DBWR undo block writes 610 0.5 0.7
SQL*Net roundtrips to/from client 58,629 43.5 70.2
active txn count during cleanout 209 0.2 0.3
background checkpoints started 1 0.0 0.0
background timeouts 2,013 1.5 2.4
branch node splits 1 0.0 0.0
buffer is not pinned count 5,187,256 3,845.3 6,212.3
buffer is pinned count 547,834 406.1 656.1
bytes received via SQL*Net from c 4,639,572 3,439.3 5,556.4
bytes sent via SQL*Net to client 22,105,960 16,386.9 26,474.2
calls to get snapshot scn: kcmgss 29,550 21.9 35.4
calls to kcmgas 1,815 1.4 2.2
calls to kcmgcs 76 0.1 0.1
change write time 31 0.0 0.0
cleanout - number of ktugct calls 330 0.2 0.4
cleanouts and rollbacks - consist 109 0.1 0.1
cleanouts only - consistent read 21 0.0 0.0
cluster key scan block gets 176 0.1 0.2
cluster key scans 125 0.1 0.2
commit cleanout failures: block l 29 0.0 0.0
commit cleanout failures: callbac 3 0.0 0.0
commit cleanout failures: cannot 0 0.0 0.0
commit cleanouts 7,749 5.7 9.3
commit cleanouts successfully com 7,717 5.7 9.2
commit txn count during cleanout 519 0.4 0.6
consistent changes 1,430 1.1 1.7
consistent gets 5,344,096 3,961.5 6,400.1
consistent gets - examination 164,573 122.0 197.1
cursor authentications 724 0.5 0.9
data blocks consistent reads - un 251 0.2 0.3
db block changes 40,623 30.1 48.7
db block gets 45,251 33.5 54.2
deferred (CURRENT) block cleanout 3,445 2.6 4.1
enqueue conversions 246 0.2 0.3
enqueue releases 5,351 4.0 6.4
enqueue requests 5,351 4.0 6.4
enqueue timeouts 0 0.0 0.0
enqueue waits 2 0.0 0.0
execute count 15,593 11.6 18.7
free buffer inspected 18 0.0 0.0
free buffer requested 3,541,896 2,625.6 4,241.8
hot buffers moved to head of LRU 837 0.6 1.0
immediate (CR) block cleanout app 130 0.1 0.2
immediate (CURRENT) block cleanou 1,989 1.5 2.4
index fast full scans (full) 193 0.1 0.2
index fetch by key 39,986 29.6 47.9
index scans kdiixs1 226,596 168.0 271.4
leaf node 90-10 splits 4 0.0 0.0
leaf node splits 63 0.1 0.1
logons cumulative 20 0.0 0.0
Instance Activity Stats for DB: RFES Instance: RFES Snaps: 76 -77
Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
messages received 1,312 1.0 1.6
messages sent 1,312 1.0 1.6
no buffer to keep pinned count 0 0.0 0.0
no work - consistent read gets 4,960,107 3,676.9 5,940.3
opened cursors cumulative 4,688 3.5 5.6
parse count (failures) 32 0.0 0.0
parse count (hard) 2,008 1.5 2.4
parse count (total) 7,253 5.4 8.7
parse time cpu 407 0.3 0.5
parse time elapsed 417 0.3 0.5
physical reads 3,541,119 2,625.0 4,240.9
physical reads direct 0 0.0 0.0
physical writes 8,255 6.1 9.9
physical writes direct 0 0.0 0.0
physical writes non checkpoint 4,138 3.1 5.0
pinned buffers inspected 0 0.0 0.0
prefetched blocks 2,712,113 2,010.5 3,248.0
process last non-idle time 8,810,915,491 6,531,442.2 ############
recursive calls 31,479 23.3 37.7
recursive cpu usage 6,540 4.9 7.8
redo blocks written 13,339 9.9 16.0
redo buffer allocation retries 2 0.0 0.0
redo entries 21,560 16.0 25.8
redo log space requests 2 0.0 0.0
redo log space wait time 3 0.0 0.0
redo size 6,477,120 4,801.4 7,757.0
redo synch time 590 0.4 0.7
redo synch writes 637 0.5 0.8
redo wastage 270,788 200.7 324.3
redo write time 15 0.0 0.0
redo writes 1,062 0.8 1.3
rollback changes - undo records a 10 0.0 0.0
rollbacks only - consistent read 103 0.1 0.1
rows fetched via callback 22,585 16.7 27.1
session connect time 8,810,915,491 6,531,442.2 ############
session logical reads 5,389,347 3,995.1 6,454.3
session pga memory max 5,174,560 3,835.9 6,197.1
session uga memory max 99,027,816 73,408.3 118,596.2
shared hash latch upgrades - no w 226,158 167.7 270.9
shared hash latch upgrades - wait 0 0.0 0.0
sorts (memory) 3,420 2.5 4.1
sorts (rows) 4,319,353 3,201.9 5,172.9
switch current to new buffer 268 0.2 0.3
table fetch by rowid 393,965 292.0 471.8
table fetch continued row 292 0.2 0.4
table scan blocks gotten 4,694,020 3,479.6 5,621.6
table scan rows gotten 309,273,688 229,261.4 370,387.7
table scans (long tables) 373 0.3 0.5
table scans (short tables) 2,997 2.2 3.6
transaction rollbacks 4 0.0 0.0
user calls 58,578 43.4 70.2
user commits 833 0.6 1.0
user rollbacks 2 0.0 0.0
workarea executions - optimal 5,212 3.9 6.2
write clones created in foregroun 0 0.0 0.0
Instance Activity Stats for DB: RFES Instance: RFES Snaps: 76 -77
Buffer Pool Statistics for DB: RFES Instance: RFES Snaps: 76 -77
-> Standard block size Pools D: default, K: keep, R: recycle
-> Default Pools for other block sizes: 2k, 4k, 8k, 16k, 32k
Free Write Buffer
Number of Cache Buffer Physical Physical Buffer Complete Busy
P Buffers Hit % Gets Reads Writes Waits Waits Waits
--- ---------- ----- ----------- ----------- ---------- ------- -------- ------
D 84,084 65.8 10,348,921 3,541,062 8,255 0 0 71
K 2,002 100.0 53,510 0 0 0 0 0
R 2,002 0 0 0 0 0 0
-------------------------------------------------------------
Instance Recovery Stats for DB: RFES Instance: RFES Snaps: 76 -77
-> B: Begin snapshot, E: End snapshot
Targt Estd Log File Log Ckpt Log Ckpt
MTTR MTTR Recovery Actual Target Size Timeout Interval
(s) (s) Estd IOs Redo Blks Redo Blks Redo Blks Redo Blks Redo Blks
- ----- ----- ---------- ---------- ---------- ---------- ---------- ----------
B 230 44 14818 41481 40671 184320 40671
E 230 33 10129 31377 30929 184320 30929
-------------------------------------------------------------
Buffer Pool Advisory for DB: RFES Instance: RFES End Snap: 77
-> Only rows with estimated physical reads >0 are displayed
-> ordered by Block Size, Buffers For Estimate
Size for Size Buffers for Est Physical Estimated
P Estimate (M) Factr Estimate Read Factor Physical Reads
--- ------------ ----- ---------------- ------------- ------------------
K 16 1.0 2,002 1.00 149
K 32 2.0 4,004 1.00 149
K 48 3.0 6,006 1.00 149
K 64 4.0 8,008 1.00 149
D 64 .1 8,008 259.86 1,451,521,004
K 80 5.0 10,010 1.00 149
K 96 6.0 12,012 1.00 149
K 112 7.0 14,014 1.00 149
K 128 8.0 16,016 1.00 149
D 128 .2 16,016 151.23 844,743,564
K 144 9.0 18,018 1.00 149
K 160 10.0 20,020 1.00 149
K 176 11.0 22,022 1.00 149
K 192 12.0 24,024 1.00 149
D 192 .3 24,024 35.66 199,193,488
K 208 13.0 26,026 1.00 149
K 224 14.0 28,028 1.00 149
K 240 15.0 30,030 1.00 149
K 256 16.0 32,032 1.00 149
D 256 .4 32,032 8.67 48,438,515
K 272 17.0 34,034 1.00 149
K 288 18.0 36,036 1.00 149
K 304 19.0 38,038 1.00 149
K 320 20.0 40,040 1.00 149
D 320 .5 40,040 1.82 10,145,852
D 384 .6 48,048 1.32 7,349,832
D 448 .7 56,056 1.14 6,349,559
D 512 .8 64,064 1.07 5,996,893
D 576 .9 72,072 1.03 5,778,652
D 640 1.0 80,080 1.00 5,609,436
D 672 1.0 84,084 1.00 5,585,714
D 704 1.0 88,088 0.99 5,556,457
D 768 1.1 96,096 0.99 5,529,572
D 832 1.2 104,104 0.97 5,433,103
D 896 1.3 112,112 0.97 5,420,452
D 960 1.4 120,120 0.97 5,420,452
D 1,024 1.5 128,128 0.97 5,420,452
D 1,088 1.6 136,136 0.97 5,420,452
D 1,152 1.7 144,144 0.97 5,420,452
D 1,216 1.8 152,152 0.97 5,420,452
D 1,280 1.9 160,160 0.97 5,420,452
-------------------------------------------------------------
Buffer wait Statistics for DB: RFES Instance: RFES Snaps: 76 -77
-> ordered by wait time desc, waits desc
Tot Wait Avg
Class Waits Time (s) Time (ms)
------------------ ----------- ---------- ---------
data block 71 0 4
-------------------------------------------------------------
November 25, 2004 - 9:42 am UTC
so, what was the hit ratio before?
and if you look at the aggregate times, you are not waiting very long here -- do you have a "problem" or a "perceived problem"?
(hint: tune SQL, look for your sql doing lots of work and ask "why")
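To act on that hint, one simple sketch is to rank the shared pool's SQL by logical I/O (column names here are from v$sql as of 9i):

```sql
-- top 10 statements by buffer gets -- these drive the latch activity
select * from (
  select buffer_gets, executions,
         buffer_gets / decode(executions, 0, 1, executions) gets_per_exec,
         substr(sql_text, 1, 60) sql_text
    from v$sql
   order by buffer_gets desc )
 where rownum <= 10;
```

Statements with very high gets_per_exec are usually the ones worth asking "why" about.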
A reader, November 25, 2004 - 8:10 pm UTC
Reminds me of Gaja :-)
November 25, 2004 - 8:20 pm UTC
he is a smart guy, if you mean gaja of "101" fame.
A reader, November 26, 2004 - 11:33 am UTC
Yes Tom he is a smart guy, people suffering from CTD :-)
start of a dba career
Mike Yu, December 27, 2004 - 6:58 pm UTC
Hi,
I'm trying to get into the Oracle database field, and I'm hoping you could shed some light on how to be a good DBA. Most positions on the web require 5-8 years of experience; for graduates and newcomers, what do you recommend?
Also:
If a degree matters in obtaining a job, what degree would complement the DBA field -- such as computer science, electrical engineering, or computer engineering -- versus certifications and hands-on experience in real-world situations?
December 27, 2004 - 7:43 pm UTC
sort of off topic....
I got started as "programmer wanted, no experience necessary" (eg: at the total bottom, which was good, not too much else to do with a BS in Math except go back for more school). honest, that is what the ad in the washington post said...
worked the requisite 5/8 years to get the experience (I had over 6 years before I got to Oracle) and then probably another 7 before I was starting to really get noticed.
I answered lots of questions -- my own and others. I researched things. I gained more than a superficial knowledge of my chosen topic (databases -- not just Oracle but data and databases and data processing and transactions and the entire thing).
The matter of degrees/certifications is going to vary from employer to employer -- I have people in my group with everything from masters degrees to high school diplomas -- to some, the paper is less relevant after 15 years, to others -- it is all about the paper.
the skill I appreciate most in anyone? your ability to interact with other human beings - period.
very wishy washy answer -- but it is a hard one, I actually don't want to be very concrete with something like this.
LATCH.cache buffers chains
Nilanjan Ray, December 28, 2004 - 4:11 am UTC
Hi Tom,
One of our application is facing wait for cache buffers chains latch. what approach should I take to resolve the latch contention. Details using *runstats* is given below.
The sample is from development environment. the issue is more severe in production.
SQL>insert into run_stats
2 select 'before',a.*
3 from stats a ;
487 rows created.
Elapsed: 00:00:00.00
SQL>commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>delete from usrpriv
2 where task_key = 163;
83430 rows deleted.
Elapsed: 00:00:11.02
SQL>commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>insert into run_stats
2 select 'after',a.*
3 from stats a ;
487 rows created.
Elapsed: 00:00:00.00
SQL>commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>select *
2 from
3 (
4 select a.name, b.value-a.value run1
5 from run_stats a, run_stats b
6 where a.name = b.name
7 and a.runid = 'before'
8 and b.runid = 'after'
9 and (b.value-a.value) >0
10 order by abs( (b.value-a.value)) desc
11 )
12 where rownum <= 10
13 /
NAME RUN1
-------------------------------------------------- ----------
STAT...redo size 71340924
LATCH.cache buffers chains 2861676
STAT...session logical reads 678646
STAT...db block gets 677252
STAT...db block changes 674637
LATCH.redo allocation 336349
STAT...redo entries 335379
STAT...table scan rows gotten 178813
LATCH.library cache 33825
LATCH.library cache pin 31255
10 rows selected.
Elapsed: 00:00:00.00
here's a sample of how the data is distributed..
SQL>select task_key,count(DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid))
2 from usrpriv
3 group by task_key
4 order by 2;
TASK_KEY COUNT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------- -------------------------------------------
390 2
309 4
269 22
162 164
160 255
369 321
146 522
144 874
166 1392
142 1564
138 1950
TASK_KEY COUNT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------- -------------------------------------------
137 1978
147 2060
165 2088
161 2239
131 2244
133 2376
141 3219
143 3306
139 3477
136 3784
70 4640
TASK_KEY COUNT(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------- -------------------------------------------
167 4802
134 7150
132 9700
17122
145 18128
163 83430
thanks in advance for your valuable help and guidance.
Regards
December 28, 2004 - 10:29 am UTC
what are your waits, are they "significant", show me the numbers.
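A sketch of one way to "show the numbers" -- aggregate waits since instance startup (in 9i, time_waited is in centiseconds; the idle-event filter below is deliberately crude):

```sql
-- system-wide wait events, biggest time consumers first
select event, total_waits, time_waited
  from v$system_event
 where event not like 'SQL*Net%'   -- skip some common idle events
 order by time_waited desc;
```

If "latch free" does not rank near the top of the time, the latch contention is probably not the significant problem.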
to mike
aman, December 28, 2004 - 9:00 am UTC
hi mike!
You are right that experience requirements are there, and they are very strong too. But what I have seen is that even if you have no experience, or only a little, things can still work out for you. What matters is knowledge of the tool, the language, or of *Oracle* -- something that proves you actually know the job. I am doing my masters in computer applications with no industry experience, but I still got a job doing a recovery at a bank where Oracle was installed, and I did it successfully. Now I have another contract for application tuning (let's see how things turn out -- fingers crossed!). What I feel is that if you have the right skills and knowledge, then even with some odds and difficulties you will get what you want.
About the second thing: yes, what a company looks for in an employee definitely varies from company to company. Some are really strict about educational requirements, but again, the right skills can outshine that. I have a science college degree, and my masters in computer science is teaching me theoretical concepts which are very important as a foundation -- of course, the practical industry is entirely different, but I still feel it helps a lot at times. About certification: it is really the best way to learn the basics of the Oracle database. Even if you don't want to take the exams, the training will definitely help you (both certainly did a lot for me).
In the end it is all you and your skills that will help you.
Sir, please comment -- am I right?
thanx
December 28, 2004 - 10:50 am UTC
<quote>
I am doing my masters in computer applications without
having no industry eperience but still i got a job of doing recovery in a bank
where oracle was installed and i did that successfully.
</quote>
have they actually experienced a catastrophic failure yet? -- it sort of scares me when I see something like this: "no experience but I set up the recovery solution for a bank". Did someone pull the plug on the test system? Did an experienced person come up with 50 test scenarios -- just to see if you can actually recover or not? Recovery is not to be taken lightly (it's the last thing you want to go to the lowest bidder)
Recovery is the one thing a DBA is not allowed to mess up.
You can mess up tuning, we can fix it.
You can mess up your backup procedure, we can fix it.
You can mess up implementation, we can fix it.
But -- to mess up recovery, well it is too late to fix anything by then....
agree with everything else pretty much.
A reader, December 28, 2004 - 11:14 am UTC
Hey Aman, which bank was that in India, will think twice before investing :-))
yes sir i got ur point!
aman, December 29, 2004 - 9:12 am UTC
Sir, I got your point: in the initial stages I must take on more test cases for practice and definitely work under the guidance of an experienced person, especially for recovery. Sir, I will be more careful next time and definitely will not try to mess with anything like this, as it can never always be a lucky day. Thanks a bunch -- you are the best, sir!
to a reader:
:-) well, I have my account in the same bank too, so it's ok to invest. The rest is your choice!! wish u all the best ;-)
starting out
Mike Yu, December 30, 2004 - 4:30 pm UTC
Thanks for the advice!
effect of index rebuild on cache buffer chain
Parikshit Paul, March 18, 2005 - 11:37 am UTC
TOM:
Does a rebuild of indexes cause waits on cache buffer chains?
Well, I am having very high waits on cache buffer chains after I rebuilt the indexes of a table (table size 247G). The highest TCH on the table block is 26 now (it was 2,320 some hours before, on a different block).
March 18, 2005 - 12:01 pm UTC
It could.
but it would not manifest itself on the table block (a table reorg would possibly do that).
it would manifest itself as a hot index block(s). That is the thing that changed structure here, the index.
how? before rebuild, index was fat (and will get fat again...). it was 50% whitespace because you insert into the index randomly (eg: the index is on last name and names arrive "out of order" into the index).
So, each leaf block had 100 entries on it.
You rebuild. Now there are fewer branch blocks (it is skinnier) for a while until it gets fat again.
You might notice increased CPU during your inserts now and increased redo generation and lots more current mode gets as well. (eg: more IO, IO = cache buffers chain latching)
There are fewer leaf blocks (and hence more index entries per block). In fact, twice as many per block. So you have twice as many people going after the same block(s) now.
This is hypothetically speaking, but easy to see
http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:35336203098853
for example.
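To see the "fewer, fuller leaf blocks" effect directly, one hedged approach is below. Note: ANALYZE ... VALIDATE STRUCTURE locks the underlying table while it runs (so avoid it on a busy production table), INDEX_STATS holds only the last index analyzed in the session, and the index name here is made up:

```sql
analyze index my_index validate structure;

-- after a rebuild of a ~50% "fat" index, rows per leaf block
-- roughly doubles and pct_used approaches 100
select lf_rows, lf_blks,
       round(lf_rows / lf_blks) avg_rows_per_leaf,
       pct_used
  from index_stats;
```

Running this before and after the rebuild shows exactly how many more sessions now compete for each leaf block.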
cache_buffer_chains vs db_block_lru_latches.
Sai, March 18, 2005 - 7:11 pm UTC
Tom,
In your response to original question here, you said
"If it is long buffer chains, you can use multiple buffer pools to spread things
out. You can use DB_BLOCK_LRU_LATCHES to increase the number of latches. You
can use both together."
How is cache buffers chains contention related to db_block_lru_latches? They are independent of each other, right?
cache buffer chains latch corresponds to contention of cache buffer chains.
Am I missing something here?
Thanks.
March 18, 2005 - 8:16 pm UTC
db_block_lru_latches controlled (past tense -- it is an underscore parameter now) the number of lists in the cache (the chains of buffers in the cache -- the cache buffer chains)
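A hedged way to confirm the parameter went "underscore" in a given release (SYS only; the x$ tables are undocumented, and these column names are as commonly published for 8i/9i):

```sql
-- show current values of any *lru_latches* parameters, hidden or not
select a.ksppinm  name,
       b.ksppstvl value
  from x$ksppi  a,
       x$ksppcv b
 where a.indx = b.indx
   and a.ksppinm like '%lru_latches%';
```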
Wait on Cache Buffer Chains
Parikshit Paul, March 19, 2005 - 6:08 am UTC
TOM:
Thanks for the response.
You said:
[QUOTE]
There are fewer leaf blocks (and hence more index entries per block). In fact,
twice as many per block. So you have twice as many people going after the same
block(s) now.
[/QUOTE]
But that would be index blocks, wouldn't it? Or would it be the table blocks (accessed through the index, as it is not an IOT)?
BTW, I followed the thread keenly when it was live. You have given a hard lesson to those self-proclaimed Master DBAs :)
March 19, 2005 - 7:02 am UTC
that is what I said
<quote>
it would manifest itself as a hot index block(s). That is the thing that
changed structure here, the index.
</quote>
index blocks, not table blocks.
Cache Buffer chains
Parikshit Paul, March 21, 2005 - 1:46 am UTC
Thanks a lot!
Latch statistics
Ronald Chan, May 17, 2005 - 6:38 pm UTC
I have a batch package written in PL/SQL. It used to be run by itself; as the data volume grew, it was decided to run it in parallel.
I have noticed a lengthening in execution time and a huge increase in latches (versus almost 0 when it wasn't run in parallel), but the latch statistics show 0s in the wait time.
I don't know which number is most important: Get Requests, Misses or Sleeps.
I wish to find out if the latches are what is taking the extra time.
Could you please help clear up the puzzle?
Is it all because of hot blocks, and how do we relieve the hot blocks?
May 18, 2005 - 8:27 am UTC
what latch are you observing the increase in?
Re: Latch statistics
Ronald Chan, May 19, 2005 - 12:55 pm UTC
I have a huge increase in:
Latch Name Gets Misses/Sleeps
library cache 4,548,224 17,638
shared pool 2,145,391 4,660
cache buffers chains 11,176,531 4,167
library cache pin 4,066,138 2,006
redo allocation 739,607 630
sequence cache 1,060,762 509
where the Misses/Sleeps used to be zero
May 19, 2005 - 1:54 pm UTC
a) what did you change
b) does it matter (that is most important, is this affecting *anything*)
Re: Latch statistics
Ronald Chan, May 19, 2005 - 10:31 pm UTC
please kindly note:
a) what did you change
^^^^^
my PL/SQL stored procedure processes 500m batch records and takes 2+ hours,
so I split it up into around 250m each and started 2 instances of the stored procedure, expecting it to complete in 1 hour,
but it turned out to take 2 hours.
b) does it matter (that is most important, is this affecting *anything*)
^^^^^
that is why I posted here: I suspect the increase in latch misses is the cause of the procedure running slower than expected.
May 20, 2005 - 7:48 am UTC
you went from single user (no latch contention, you are the ONLY game in town) to multiuser.
this is to be expected.
so, what are the major WAITS by your application (10046 level 12 trace, tkprof, could be very revealing, see what you are waiting for and what you wait for the most)
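A sketch of gathering that 10046 level 12 trace around one run of the procedure (the procedure name is hypothetical; the resulting trace file lands in user_dump_dest):

```sql
alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';

-- run the workload under test (hypothetical name)
exec my_batch_procedure;

alter session set events '10046 trace name context off';
```

Then, from the OS, something like `tkprof <trace_file>.trc report.txt sys=no sort=exeela,fchela` summarizes the waits per statement, showing what the session actually waited for and for how long.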
SPIN_COUNT and SLEEP1 - SLEEP4
Christo Kutrovsky, June 09, 2005 - 3:42 pm UTC
Tom,
DB is 9.2.0.4
I am investigating some occasional problems with "latch free" waits on "cache buffers chains".
I have a large number of misses and sleeps (in a given window, compared to gets). Yet SPIN_COUNT and SLEEP1 - SLEEP4 are always 0 for "cache buffers chains". Are those columns not populated in v$latch or v$latch_children? Is there no spinning when trying to obtain a cache buffers chains latch? Given that SLEEP4 is 4 or more sleeps, it should have some values in it.
June 09, 2005 - 6:42 pm UTC
I don't have a multi-cpu 9ir2 machine with me right now, but from a 10g statspack:
Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->3+
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains 2,836,958 630 41 612/4/8/6
they should not be all zeros if they are happening.
Could it be a 9i bug ?
Christo Kutrovsky, June 10, 2005 - 11:00 am UTC
Well they are all zeroes on my machine:
SQL> select * from v$latch where name like 'cache buffers chains';
ADDR 5000D3E0
LATCH# 98
LEVEL# 1
NAME cache buffers chains
GETS 163548999257
MISSES 4776602702
SLEEPS 7704389
IMMEDIATE_GETS 3645731855
IMMEDIATE_MISSES 13709210
WAITERS_WOKEN 0
WAITS_HOLDING_LATCH 3279356
SPIN_GETS 0
SLEEP1 0
SLEEP2 0
SLEEP3 0
SLEEP4 0
SLEEP5 0
SLEEP6 0
SLEEP7 0
SLEEP8 0
SLEEP9 0
SLEEP10 0
SLEEP11 0
WAIT_TIME 0
And a statspack output:
Child Get Spin &
Latch Name Num Requests Misses Sleeps Sleeps 1->4
---------------------- ------- ------------ ----------- ---------- ------------
cache buffers chains 381 2,851,637 7,747 1,318 0/0/0/0/0
cache buffers chains 725 2,832,631 4,348 197 0/0/0/0/0
cache buffers chains 1015 2,410,387 3,967 295 0/0/0/0/0
cache buffers chains 1143 2,196,529 2,848 73 0/0/0/0/0
cache buffers chains 727 2,132,478 2,540 181 0/0/0/0/0
cache buffers chains 360 1,984,093 2,486 93 0/0/0/0/0
cache buffers chains 2039 1,943,582 1,455 145 0/0/0/0/0
cache buffers chains 356 1,876,117 1,941 207 0/0/0/0/0
cache buffers chains 2024 1,843,631 1,862 152 0/0/0/0/0
cache buffers chains 1655 1,823,366 1,816 297 0/0/0/0/0
I don't have any cache buffers chains issues, but all the 9i databases I could check had zeroes in spin and sleep.
June 10, 2005 - 11:17 am UTC
bug 2381327 and it is OS specific as you need to be on a system with shareable latches to see this.
fixed 10.1
patches for 9.2 (see support)
Thanks!
Christo Kutrovsky, June 10, 2005 - 11:43 am UTC
Appreciate it, Tom; I understand you are not support and took some extra steps in this case.
I am sure it will be relevant to future readers trying to understand these zeroes.
Thanks.
June 10, 2005 - 3:36 pm UTC
absolutely, no problem looking into it.
Does this scenario require us to tune the latches?
Puja, August 02, 2005 - 3:39 am UTC
Hi,
Our production database suddenly crashed. This is the second time it has happened, and the set of messages is the same (thankfully there is a logical standby configured). Do these error messages call for tuning of the latches? From what I understand, we cannot tune
the latches directly. Is there anything else we can do about it?
.....................................................
Status:
prstat went to above 90. no ora 600 error are indicated
...................................................
in alert.log, it says :
DBW1: terminating instance due to error 472
Tue Aug 2 12:32:58 2005
Errors in file /data1/ebsdata/admin/ebsdb/bdump/ebsdb_ckpt_25322.trc:
ORA-00472: PMON process terminated with error
Tue Aug 2 12:33:00 2005
Errors in file /data1/ebsdata/admin/ebsdb/bdump/ebsdb_lgwr_25320.trc:
ORA-00472: PMON process terminated with error
Tue Aug 2 12:33:33 2005
Errors in file /data1/ebsdata/admin/ebsdb/bdump/ebsdb_dbw3_25318.trc:
ORA-00472: PMON process terminated with error
Tue Aug 2 12:33:45 2005
Errors in file /data1/ebsdata/admin/ebsdb/bdump/ebsdb_dbw0_25312.trc:
ORA-00472: PMON process terminated with error
Tue Aug 2 12:33:46 2005
Errors in file /data1/ebsdata/admin/ebsdb/bdump/ebsdb_dbw2_25316.trc:
ORA-00472: PMON process terminated with error
........................................
pmon trace : PMON unable to acquire latch 380069a50 shared pool
possible holder pid = 107 ospid=16569
....................................
August 02, 2005 - 7:42 am UTC
please contact support.
cache buffers chains
sven, August 03, 2005 - 9:01 am UTC
Hi Tom,
My question is regarding cache buffers chains.
The scenario is following:
The environment is DWH.
I have 6-7 users who execute the same select statement with different parameters.
When I look at v$session_wait, they are all waiting on cache buffers chains (after translating the p2 parameter from this view).
It is very likely that all these sessions will read some amount of blocks which are common to all the queries => we have a lot of hot blocks here.
What I do not understand is the following:
As far as I know, in 9i a latch holder who only reads a cache buffers chain can share access to its latch with other readers, i.e. readers do not block readers.
Do you have any idea what can cause so many cache buffers chains waits?
Thanks,
Sven
August 03, 2005 - 11:47 am UTC
do you do any physical IO?
review
sven, August 03, 2005 - 3:18 pm UTC
Yes, there is some physical IO done, but much more logical IO
August 03, 2005 - 6:02 pm UTC
so, PIOs have to latch the chain and modify it -- you cannot be walking the linked list while it is being modified. So, while more than one session can walk it, someone might be waiting to modify it. And while they are modifying it, you are not walking it.
Oracle DB shutdown automatically
Praful, August 27, 2005 - 5:15 am UTC
Hi tom
My Oracle DB shut down automatically due to some ORA-600 errors in the alert log file.
It also shows "DBW1: terminating instance due to error 472",
due to which, I guess, the DB instance did a shutdown abort.
What could be the reason?
August 27, 2005 - 5:36 am UTC
please utilize support for an issue like this.
DBA
Brian, December 02, 2005 - 11:00 am UTC
Tom, I am seeing lots of waits on CBC latches. I followed your example to find the hottest latch, and it led me to a few tables. Do you have any recommendations / methods to reduce the contention for the CBC latches on these tables? I was considering rebuilding the table, but think that might just make the problem worse. Any suggestions?
December 02, 2005 - 11:33 am UTC
the answers to this are many and varied and huge in nature.
could be an IOT is the right answer.
could be a hash cluster
could be query tuning
could be adding an index
could be dropping an index
could be minimizing rows per block
could be increasing array fetch size in client application
could be ...
in order to suggest, one needs to know the nature of the problem (how the data is used). Ultimately the goal is to reduce LIO -- all of the above are tools that can help under certain circumstances.
for example, if the query is:
select * from documents where username = :x;
I might say "index organized table" - we have a table with a primary key on (username,document_name) - but since documents arrive randomly here - all of the documents for a given user are spread out all over the place. Using an IOT, all of the documents for a given user would be stored on as few blocks as possible - and won't overlap with other users' documents (reduced LIO, reduced number of people hitting the same blocks)
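A sketch of that IOT, using made-up column definitions for the hypothetical documents table:

```sql
-- all rows for one username cluster together in the primary key index;
-- there is no separate table segment at all
create table documents
( username       varchar2(30),
  document_name  varchar2(100),
  document_body  varchar2(4000),
  constraint documents_pk
    primary key (username, document_name)
)
organization index
overflow;  -- push oversized non-key data to an overflow segment
```

With this structure, `select * from documents where username = :x` descends the index once and reads the user's rows from a handful of adjacent leaf blocks, instead of one table block per document.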
LIO - CBC - Pain
A reader, December 15, 2005 - 5:40 pm UTC
Tom,
below is the output from the statspack. its a RAC 920.
Top 5 Logical Reads per Segment for DB: PROD Instance: prod1 Snaps: 25 -
-> End Segment Logical Reads Threshold: 10000
Subobject Obj. Logical
Owner Tablespace Object Name Name Type Reads %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
PROD PROD_DATA RP_TAB_PK1 INDEX 60,789,728 99.56
the only table used in the database is the above one (50 gigs). I have a lot of CBC waits happening, which is killing performance.
After analyzing this thread, I have decided to rebuild this index with a higher PCTFREE value.
I would appreciate your views.
Thanks,
Nayab
December 15, 2005 - 6:36 pm UTC
well, if you insert/update the table, the index leaf blocks will just fill up again.
You have an application that only uses ONE TABLE?
tell us more about this (perhaps we need some partitioning?)
CBC
A reader, December 16, 2005 - 12:04 pm UTC
Tom,
This is the database behind a portal (financial). Information about all the investors is stored in this table. Investors log into this site and pull their data. There are around 40k investors, and everybody will hit this table, with the same query behind it.
December 16, 2005 - 1:08 pm UTC
sounds like a hash cluster to me
select * from t where key = :x
becomes "table access by hash cluster", no index read whatsoever.
clear screen
drop table hash_lookup;
drop table heap_lookup;
drop cluster hash_cluster;
set echo on
clear screen
create cluster Hash_Cluster
( id number )
SINGLE TABLE
hashkeys 75000 -- number of distinct lookups we expect over time
size 125 -- size of data associated with key
/
pause
clear screen
create table heap_lookup
( OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME,
TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY,
primary key(object_id)
)
as
select * from all_objects;
create table hash_lookup
( OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, OBJECT_TYPE, CREATED, LAST_DDL_TIME,
TIMESTAMP, STATUS, TEMPORARY, GENERATED, SECONDARY,
primary key(object_id)
)
cluster hash_cluster(object_id)
as
select * from heap_lookup;
pause
clear screen
begin
dbms_stats.gather_table_stats
( user, 'HEAP_LOOKUP', cascade=>true );
dbms_stats.gather_table_stats
( user, 'HASH_LOOKUP', cascade=>true );
end;
/
pause
clear screen
declare
type array is table of number;
l_object_ids array;
l_rec heap_lookup%rowtype;
begin
select object_id bulk collect into l_object_ids
from heap_lookup;
execute immediate q'|
alter session set events '10046 trace name context forever, level 12'
|';
for k in 1 .. l_object_ids.count
loop
select * into l_rec
from heap_lookup
where object_id = l_object_ids(k);
select * into l_rec
from hash_lookup
where object_id = l_object_ids(k);
end loop;
end;
/
pause
@connect /
Study the tkprof of that....
CBC
A reader, December 16, 2005 - 4:49 pm UTC
Tom,
There is little that I understood.
Please....
December 16, 2005 - 5:13 pm UTC
you said you have a single table most all will access by some key.
right now, you are doing tons of LIO against the index to find it.
I can take that LIO from 60,789,728 to *zero*
Run the example script I have, it shows the basic setup of a hash cluster and compares the heap table (like you have now) with the hash cluster as far as "access" goes.
If you have access to Expert one on one Oracle OR Effective Oracle by Design OR the more recent Expert Oracle: Database Architecture, I cover this data structure in depth
latches causing 100% cpu
daniel, February 20, 2006 - 3:10 pm UTC
Tom,
Our db normally does not have a single latch free issue, BUT:
When a big load comes in from one of the background processes on three tables (a big load is about 200 updates per second on each table), all our user connections (there are 1000) start spinning on CPU because they all hit latch free contention on cache buffers chains.
So all these sids (1000) will show latch free in v$session_wait. However, these sessions are not the culprits, I guess; the culprit is that background process, which under normal conditions updates these tables at, say, 90 updates per second. But when it receives a bulk load from the gateway (which is connected to the outside central bank, which sends stock prices), it starts to update these tables heavily, and that's when the CPU of the AIX machine goes to 100%. We have 32 CPUs; all of them go to 100 in less than 10 seconds.
We then kill the background process and all seems ok.
Anyway, the SQL statement that the user connections wait on is below; all of them have been seen running it at the problem time:
SELECT IDXM_IDX_CHNG_DIR, TO_CHAR(NVL(IDXM_IDX_CHNG_NET, :"SYS_B_00"),:"SYS_B_01") CHANGE_NET,
TO_CHAR(NVL(IDXM_IDX_CHNG_PER, :"SYS_B_02"), :"SYS_B_03") CHANGE_PER,
TO_CHAR(NVL(IDXM_YEST_CLOS_IDX,:"SYS_B_04"),
:"SYS_B_05") YESTERDAY_INDEX, TO_CHAR(NVL(IDXM_IDX, :"SYS_B_06"), :"SYS_B_07") TODAY_INDEX,
TO_CHAR(SMKM_TRN_DT + :"SYS_B_08",:"SYS_B_09") LAST_TRADED_TIME,MKTM_SESS_STS
FROM IDXM,V_SMKM_TIME,MKTM
WHERE IDXM_IDX_ID = :"SYS_B_10"
Note that the table IDXM has 9 rows where IDXM_IDX_ID is the p-key.
Table MKTM has always single row only.
The view V_SMKM_TIME also return 1 single row, but it has the following statement:
select max(last_traded_time) from smkm, so it obviously returns 1 row; however, the select max() statement from the source table SMKM (which has 52 rows) uses an index.
So you see, the select statement that all sessions wait on has, no doubt, a Cartesian join, but on tables which have 9, 1 and 1 rows respectively.
The app design cannot be changed, so what do we do?
We cannot distribute the rows, as there are so few.
I am thinking of creating 9 partitions of the table IDXM, which has 9 rows and will remain 9. But I am not sure if the culprit is IDXM or the view's table?
Tell me how do i solve this issue?
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 13,759,103 151,559 91.20
CPU time 7,112 4.28
log file sync 153,619 5,700 3.43
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
128,659,158 34,939 3,682.4 38.9 2617.45 ######### 4177808130
Module: JDBC Thin Client
SELECT IDXM_IDX_CHNG_DIR, TO_CHAR(NVL(IDXM_IDX_CHNG_NET, :"SY
S_B_00"),:"SYS_B_01") CHANGE_NET, TO_CHAR(NVL(IDXM_IDX_CHNG_PER
, :"SYS_B_02"), :"SYS_B_03") CHANGE_PER, TO_CHAR(NVL(IDXM_YEST_
CLOS_IDX,:"SYS_B_04"), :"SYS_B_05") YESTERDAY_INDEX, TO_CHAR(NV
L(IDXM_IDX, :"SYS_B_06"), :"SYS_B_07") TODAY_INDEX, TO_CHAR(SMK
NoWait Waiter
Latch Name Where Misses Sleeps Sleeps
------------------------ -------------------------- ------- ---------- --------
cache buffer handles kcbzfs 0 1 0
cache buffer handles kcbzgs 0 1 2
cache buffers chains kcbchg: kslbegin: bufs not 0 6,680,889 ########
cache buffers chains kcbchg: kslbegin: call CR 0 5,873,960 ########
cache buffers chains kcbgtcr: kslbegin excl 0 895,257 69,751
Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains 921,047,041 ########### 13,751,583 0/0/0/0/0
library cache 15,609,965 64,628 4,968 59966/4462/1
February 21, 2006 - 6:58 am UTC
"app design cannot be changed"
hmm, well, given the design of the application is the single biggest contributor to its ability to scale, I guess you have to slow down the application.
laughing out loud - I see cursor sharing force is being used too. So you turned a massive HARD PARSE problem into a huge SOFT PARSE problem. Ouch, they made all of the mistakes on this one, didn't they?
how could that query you just described be doing 3,682 logical IO's per execution???
can we get a tkprof that shows where the IO's are happening in this query (with a row source operation section)
trace given for review
dani, February 23, 2006 - 2:51 am UTC
Please find trace with level 08
TKPROF: Release 9.2.0.5.0 - Production on Thu Feb 23 10:33:34 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Trace file: lts2_ora_3781178_MY_dba.trc
Sort options: default
********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
********************************************************************************
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.00 0 0 0 0
Misses in library cache during parse: 0
Misses in library cache during execute: 1
Optimizer goal: CHOOSE
Parsing user id: 50
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 4.59 4.59
********************************************************************************
SELECT IDXM_IDX_CHNG_DIR,TO_CHAR(NVL(IDXM_IDX_CHNG_NET, :"SYS_B_00"),:"SYS_B_01") CHANGE_NET,
TO_CHAR(NVL(IDXM_IDX_CHNG_PER, :"SYS_B_02"), :"SYS_B_03") CHANGE_PER,
TO_CHAR(NVL(IDXM_YEST_CLOS_IDX,:"SYS_B_04"),:"SYS_B_05") YESTERDAY_INDEX,
TO_CHAR(NVL(IDXM_IDX,:"SYS_B_06"),:"SYS_B_07") TODAY_INDEX,
TO_CHAR(SMKM_TRN_DT + :"SYS_B_08",:"SYS_B_09") LAST_TRADED_TIME,MKTM_SESS_STS
FROM LETS.IDXM,LETS.V_SMKM_TIME,LETS.MKTM
WHERE IDXM_IDX_ID = :"SYS_B_10"
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 58 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 60 0 1
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 50
Rows Row Source Operation
------- ---------------------------------------------------
1 MERGE JOIN CARTESIAN
1 MERGE JOIN CARTESIAN
1 TABLE ACCESS BY INDEX ROWID IDXM
1 INDEX SKIP SCAN IDXM_PR (object id 37845)
1 BUFFER SORT
1 VIEW
1 SORT AGGREGATE
1 INDEX FULL SCAN (MIN/MAX) IDX_LAST_UPD_DT (object id 51997)
1 BUFFER SORT
1 TABLE ACCESS FULL MKTM
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
buffer busy waits 1 0.00 0.00
SQL*Net message from client 2 8.68 8.68
********************************************************************************
ALTER SESSION SET EVENTS '10046 trace name context off'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 0
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 50
********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 2 0.00 0.00 0 2 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 58 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.00 0.00 0 60 0 1
Misses in library cache during parse: 2
Misses in library cache during execute: 1
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 3 0.00 0.00
SQL*Net message from client 3 8.68 13.27
buffer busy waits 1 0.00 0.00
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.00 0 0 0 0
Execute 3 0.00 0.00 0 0 0 0
Fetch 3 0.00 0.00 0 8 0 3
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 9 0.00 0.00 0 8 0 3
Misses in library cache during parse: 2
3 user SQL statements in session.
3 internal SQL statements in session.
6 SQL statements in session.
********************************************************************************
Trace file: lts2_ora_3781178_MY_dba.trc
Trace file compatibility: 9.02.00
Sort options: default
1 session in tracefile.
3 user SQL statements in trace file.
3 internal SQL statements in trace file.
6 SQL statements in trace file.
5 unique SQL statements in trace file.
85 lines in trace file.
February 23, 2006 - 7:57 am UTC
something is "up", why only 60 IO's. What is really happening here - can you get a trace with 3,000 to 4,000 IO's per execution?
pls explain further
dani, February 23, 2006 - 8:33 am UTC
Hi,
I did not follow you when you said to get a trace with 3,000 IOs.
How do I do that? The table has only 9 rows.
If you want to know the background: the earlier statspack report I gave you, where you see the top event as latch free and that SQL which tops buffer gets, was taken during the time when there was a problem in the database.
It is that problem which we would like to solve. The scenario is:
1. When the load increases 10-fold, all sessions (900) connected to the db complain of latch free, and on that same IDXM sql.
Under normal load, statspack does not show a single latch free, nor does it show that SQL as top buffer gets.
2. According to Oracle, since we are on cursor_sharing=force, the plan might change for the SQL, but only when we have histograms defined. That theory failed because we do not have histograms.
Lately this is happening 4 times a day. When it happens, the db server goes to 100% CPU and all sessions show latch free on that SQL. If you mean running the level 8 trace from another sqlplus in that situation, I can do that when the problem appears again - is that what you wanted?
cheers
February 23, 2006 - 8:43 am UTC
You told me initially this query does 3,682.4 IO's per execution:
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
128,659,158 34,939 3,682.4 38.9 2617.45 ######### 4177808130
Module: JDBC Thin Client
SELECT IDXM_IDX_CHNG_DIR, TO_CHAR(NVL(IDXM_IDX_CHNG_NET, :"SY
S_B_00"),:"SYS_B_01") CHANGE_NET, TO_CHAR(NVL(IDXM_IDX_CHNG_PER
, :"SYS_B_02"), :"SYS_B_03") CHANGE_PER, TO_CHAR(NVL(IDXM_YEST_
CLOS_IDX,:"SYS_B_04"), :"SYS_B_05") YESTERDAY_INDEX, TO_CHAR(NV
L(IDXM_IDX, :"SYS_B_06"), :"SYS_B_07") TODAY_INDEX, TO_CHAR(SMK
That is where I got my information from - you.
So, on your production system - let us see it from THERE please.
1) but is the latch free on library cache or on cache buffers chains. I rather suspect you have a massive parsing problem in your system - based on your need to use cursor_sharing=force/similar. The developers DID NOT USE BIND VARIABLES (You know, after 13 years - I'm still wondering when people will "get it")
2) such a simple query, I want to see a tkprof for it FROM PRODUCTION, when the average gets/execute are 3000 plus.
I've got a feeling this is parsing related as much as anything else. But, tell us about the objects involved - is this the classic "really small, but really really really hot (modified) table"?
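The question in 1) - library cache latch or cache buffers chains latch - can be answered straight from the waiters rather than guessed at. A sketch (on 9i, P2 of the latch free event is the latch number, which joins to v$latchname; verify the column meanings on your release):

```sql
-- Which latches are the "latch free" waiters actually sleeping on?
select ln.name, count(*) sessions_waiting
  from v$session_wait sw, v$latchname ln
 where sw.event = 'latch free'
   and sw.p2    = ln.latch#
 group by ln.name
 order by sessions_waiting desc;
```

Run that during the problem window: a top row of "library cache" points at parsing; "cache buffers chains" points at the hot blocks.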
yes it is hot
dani, February 23, 2006 - 9:26 am UTC
Hi Tom,
I guess I have confused you; let me clarify, please.
I am only referring to prod, both before and now when I run the trace.
The statspack I sent earlier was from when prod had the problem. The trace was taken when prod was normal.
Yes, it's a hot block issue; the tables in question have 9, 1 and 1 rows only.
There are 2 background processes that keep updating these tables (which are in the FROM clause of this query) at a rate of 200 updates/sec (200 commits). At the same time, there are 900 connections to the database, all of which need to run this query very often.
When the problem starts,
the 16 CPUs go to 100% and all sessions show latch free on this query. What we do is kill those two background processes and all seems ok. After a minute we bring those processes up again.
Let me clarify another point.
There are actually 10 background processes; 8 of them are the heart of the application, doing the actual buying and selling of stocks between users of the db and the central bank.
The central bank is connected at one end of these 10 processes, while at the other end these processes are connected to our db.
These are Java processes which scan ports (which receive data from the central bank) and then update our db.
So these 10 processes always stay connected to our db (a sort of middleware).
Now when the central bank sends a bulk load via these 10 processes, what we found is that we just need to kill the two Java processes which are responsible for sending the price updates to these tables (the ones in the FROM clause of the query in question). The other 8 processes keep working fine, and after 30 seconds or so the whole load is processed and we start the 2 engines back up.
But this has been happening so often lately.
I am not sure if you are Tom or someone else, because I once read where Tom replied that he has a team to answer people. If you search, I have opened another thread with you, asking which approach is best for replication, under the title of the "merge in oracle9i" thread. There you advised me to use merge for replication.
The idea is basically to put the two engines on another server, which then keeps updating these tables from the central bank, while the PowerBuilder application replicates those tables at a 2-second delay. We thought of this since we cannot modify the behaviour of the Java processes (these Java processes update these tables very heavily, but our customers can live with seeing stock prices with a second or two of delay).
Do you think this approach will solve our latch issue?
Are you the same person who answered my merge question?
February 23, 2006 - 10:03 am UTC
You read wrong. There is me, I am Tom - it would be ask_a_group_of_people.oracle.com otherwise.
You have the classic "small very very very hot table".
The other question you had 700 rows - how could this be the same?
explanation
dani, February 23, 2006 - 10:21 am UTC
Tom,
Thanks for clarifying, that its you.
Well yes you are right i owe you an explanation.
There are three tables involved:
IDXM has 9 rows, MKTM has a single row, and
the view V_SMKM is based on (select max(..) from smkm),
where SMKM has 59 rows.
These tables need to be replicated.
I am sorry - I made that thread as generic as it could be,
so instead of saying three tables, I made it a case of one table with 700 rows. Sorry for that.
So tom, what do you think....
Can I call you Tom, please... if you can give me your phone....
February 23, 2006 - 10:27 am UTC
You cannot call me, no. That would not work very well.
sounds like you just want to create complete refresh materialized views in a snapshot group.
All three tables will be fully refreshed in a consistent fashion on your schedule.
That will avoid the small hot table colliding with the excessively frequent queries against the same.
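A sketch of that refresh group, using hypothetical MV names (dbms_refresh syntax per 9i; verify against your release, and note the job queue decides how often the interval actually fires):

```sql
-- Complete-refresh materialized views over the three hot tables
create materialized view idxm_mv refresh complete as select * from idxm;
create materialized view mktm_mv refresh complete as select * from mktm;
create materialized view smkm_mv refresh complete as select * from smkm;

-- Put all three into one refresh group so they refresh together,
-- transactionally consistent with each other, on your schedule:
begin
   dbms_refresh.make
   ( name      => 'hot_tables_grp',      -- hypothetical group name
     list      => 'IDXM_MV,MKTM_MV,SMKM_MV',
     next_date => sysdate,
     interval  => 'sysdate + 2/86400' ); -- roughly every two seconds
end;
/
```

The readers then query the MVs; only the refresh session touches the base tables the background processes are hammering.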
dani, February 23, 2006 - 11:35 am UTC
Hi Tome.
I understand , its ok, thanks for your answers, very useful.
Lastly, please clarify the following:
1. You recommended materialized views. Did you mean I should not use the merge operation to copy tables between two databases, but instead enable a materialized view via dblink? I have used materialized views before in a single warehouse database.
2. Is the materialized view option easy to implement without enabling any database options on either side, like MTS etc.? Or does it have dblink contention?
3. Alternatively, can I try creating partitions on the IDXM table, since it has 9 rows? Is there such a thing as a list-of-values partition (because these are char values and not ranges), or shall I try creating hash partitions? Would partitioning reduce hot blocks?
I have tried pctfree etc.; it did not improve things.
Regards
PS: I have read a PDF document, "Resolving Oracle Latch Contention" by Guy Harrison, which says that if a db has no waits, abundant CPUs, and no I/O subsystem issues, then that db performs so fast that all operations are nothing but sheer shared memory access, and hence latches become the ultimate limiting factor in that kind of Oracle db. I think that's our db, where the top 5 waits are nothing except CPU at 95%.
February 23, 2006 - 6:43 pm UTC
1) I believe in the "why write code if I don't have to". You finally described to me the real problem - not a made up one.
Yes, I would use a materialized view.
You would create a snapshot refresh group with all three tables in it.
2) yes.
3) yes, there is list partitioning, but you have just 9 rows; you still have the read consistency issues - that is what is causing the 3-4,000 IO's per execution in real life. Heck, you could just pad the rows out so each one fills a single block with no room for another row - but it won't fix it totally, it is still a very very hot TINY -- teeny tiny -- table.
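The "pad the rows out" idea can be sketched like this (hypothetical table name; it spreads the 9 rows over 9 blocks so each row hashes to a different chain, but as noted above the table stays just as hot):

```sql
-- Rebuild the 9-row table leaving almost no free-space room per block,
-- so each row tends to land in its own block.  A sketch only; verify
-- rows-per-block afterwards with dbms_rowid.rowid_block_number.
create table idxm_spread
pctfree 99 pctused 1
as
select * from idxm;
```

An alternative on this release is ALTER TABLE ... MINIMIZE RECORDS_PER_BLOCK after loading one row, which caps future inserts at the current rows-per-block count.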
how do we do materialized across 2 dbs
dani, February 24, 2006 - 6:43 am UTC
Tom,
Since you favoured the materialized view concept over our merge PL/SQL code, I will do a test on that option.
What I would like to ask is how I set up such materialized view replication. Can you please give me some steps on how to do that? You can use two of the tables in your test db as examples. How would I measure the performance, like I did for the earlier three tests (delete-insert, update and merge)?
What I am interested in is contention: will the mat view bring more contention/waits than merge?
And how would I refresh the mat view every second?
Cheers
local vs remote
dani, February 24, 2006 - 9:19 am UTC
Hi Tom,
I understood the refresh part now.
One question:
Is separating these three tables to a different database on
a different server BETTER, or how about if I CREATE another schema in the same database, have the three tables created there, and have the background processes write to those tables? Then I create materialized view replication from those three tables in that schema onto the same three tables in the apps schema. That way, hundreds of sessions still read from the replicated tables and not from the tables where the background process writes, so those hundreds of sessions should not experience latch free/cache buffers chains.
However, the materialized view process might experience it when refreshing, and could spin the CPU to 100%, but that's yet to be seen.
What do you think? If this works, at least I won't be adding another point of failure to the architecture.
February 24, 2006 - 10:47 am UTC
you only need to create a different schema (or just a different set of three tables, in the same schema)
I doubt that a single session (the mv refresh) hitting these tables would result in the same thing as hundreds to thousands hitting it - it is the sheer volume of the number of people (sessions) needing consistent reads on these three super hot tables that is causing this.
create public synonym dual for mydual;
VLS, March 28, 2006 - 11:45 pm UTC
Hi Tom,
In this and in some other forums, I have read about the benefits of creating our own table instead of using the SYS-owned DUAL table. I have also seen the benefit of this in one of my databases. But in some cases, changing the application code is very difficult, especially in Oracle Apps / CRM applications. As a workaround for these applications, can I follow these steps so that the application uses our mydual table instead of the SYS-owned dual table?
1. DROP PUBLIC SYNONYM DUAL.
2. In Production user schema, create PUBLIC Synonym dual for mydual (create public synonym dual for mydual;)
3. GRANT select on dual to public.
I have checked: for any queries on DUAL executed from SYS, it uses the SYS-owned dual table, whereas for any queries on DUAL from other schemas, it uses the synonym, which in turn uses MYDUAL.
Is it a correct approach? I don't want to change the application.
Regards
VLS
March 29, 2006 - 7:06 am UTC
I cannot validate or condone this practice.
By making DUAL an index organized table, you have forced the use of the Cost based optimizer for all queries that reference it. It could have side effects elsewhere that are not immediately obvious.
Might be better not to replace dual
A reader, March 31, 2006 - 11:04 pm UTC
Bottleneck on Cache buffers lru chain
Vikas, June 10, 2006 - 7:24 am UTC
Hi Tom,
I am into a very peculiar situation!
The database is experiencing a lot of 'db file sequential read' waits because the optimizer decides that a nested loops approach is best when joining a table partition having 10 million rows to another table having 818 million rows via its primary key.
I have been seeing this activity for quite a few days and wondering why it does not go for a HASH JOIN, because the result sets are so huge, and we also know that the resulting set will contain 10 million rows anyway, as parents for these rows will definitely be found. This SQL just illustrates a simple join between these two tables.
The PQ slaves for the coordinator experience a huge 'cache buffers lru chain' wait, and what I feel is that the PQ session has to wait for the latch to get the block into the buffer cache, enabling it to read, because of the 'db file sequential read' wait.
Does that mean 'db file sequential read' will also induce 'cache buffers lru chain' waits if the query is trying to access hot blocks of the database?
What can be done in order to avoid the latch wait?
db_block_buffers integer 725000
_db_block_lru_latches integer 256
db_writer_processes integer 8
Please suggest what can be done?
June 10, 2006 - 10:16 am UTC
what you say is not computing. Let's see that query plan please. If what you say is true - it would not use a nested loop.
Free List
atul, October 15, 2006 - 8:01 am UTC
Hi,
We are facing performance issues.
A lot of sessions are waiting on the "latch free" event. Below are the wait events.
SQL> @sesswait
Database: ERPQ2 2006 Oct 15 04:53am
Report: sesswait.sql Page 1
Session Wait Information
Wait
SID USERNAME LOGON_TIME Event P1 P2 (Sec.)
---------- ---------- ------------------ ------------------------------ ---------- ---------- -------
20 APPS 15-OCT 04:07 latch free 1.8748E+10 98 0
125 APPS 15-OCT 04:08 latch free 1.8749E+10 98 0
221 APPS 15-OCT 04:07 latch free 1.8759E+10 98 0
234 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
318 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
351 APPS 15-OCT 04:07 latch free 1.8759E+10 98 0
369 APPS 15-OCT 04:08 latch free 1.8759E+10 98 0
407 APPS 15-OCT 04:07 latch free 1.8759E+10 98 0
416 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
450 APPS 15-OCT 04:08 latch free 1.8749E+10 98 0
441 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
431 APPS 15-OCT 04:07 latch free 1.8759E+10 98 0
420 APPS 15-OCT 04:08 latch free 1.8749E+10 98 0
410 APPS 15-OCT 04:08 latch free 1.8749E+10 98 0
391 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
366 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
341 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
279 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
259 APPS 15-OCT 04:08 latch free 1.8759E+10 98 0
237 APPS 15-OCT 04:07 latch free 1.8749E+10 98 0
I think this is the cache buffers chains latch.
What could be the solution to this?
2)Below is the o/p for sort
SQL> select name,value from v$sysstat where name like '%sort%';
NAME VALUE
---------------------------------------------------------------- ----------
sorts (memory) 254515960
sorts (disk) 366
sorts (rows) 1.0422E+10
Our pga_aggregate_target value is 6 GB.
Can we increase it more?
Please help
October 15, 2006 - 9:03 am UTC
rather than "guess", why not find out what they are actually for?
I dont know why you would feel compelled to increase the pga, you seem to be doing pretty darn good there.
I would suggest discovering what your bottlenecks actually are and correcting them instead of looking for magic parameters.
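As a starting point for "find out what they are actually for": P2 of the latch free event (98 on every row above) is the latch number. A sketch:

```sql
-- What is latch number 98?  (P2 of the "latch free" wait event)
select latch#, name
  from v$latchname
 where latch# = 98;
```

Only once you know the latch name (and, via P1, which child latch) does it make sense to talk about fixes.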
Alexander the ok, January 03, 2007 - 11:40 am UTC
Hi Tom,
I was reading this thread, saw this
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1229436447262#17663865901325 very interesting, but why does creating a view of dual reduce the work Oracle must do? I was very surprised to learn that selecting from dual a lot can cause cache buffers chains waits. I was looking into this because we are performance testing a new database, and this was the top timed event. One of the top SQLs by executions and parse calls is
select 1 from dual;
I love vendor apps.
January 05, 2007 - 8:21 am UTC
prior to 10g, dual is a real table - IO (logical IO mostly) must be performed - it takes about 3/4 LIO's to do that.
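You can see those LIOs for yourself (a sketch; in 10g the optimizer rewrites DUAL access to the zero-I/O FAST DUAL operation, which is why pre-10g workarounds like a view over a one-row table helped):

```sql
-- Pre-10g: every SELECT ... FROM DUAL does real logical I/O
set autotrace traceonly statistics
select 1 from dual;
-- watch the "consistent gets" statistic: a handful per execution,
-- multiplied by thousands of executions from the vendor app
set autotrace off
```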
nagative values for address field (P1) of trace output
Gundogar, July 18, 2007 - 12:25 pm UTC
Thank you very much for your definitive entries.
We have 2 node RAC machine on HP/UX Itanium (10.1.0.4)
Sometimes we see some queries suffer from the latch: cache buffers chains event. The problem is mostly concurrent access to the same block.
WAIT #9: nam='latch: cache buffers chains' ela= 1 p1=-4611686000696528624 p2=116 p3=0
When I take a 10046 trace for the queries, I always see negative values for the P1 value of the wait event. When I convert this value to hexadecimal (both negative and positive) to use as input to the hot blocks script (Metalink 163424.1), zero rows return every time from the script.
What should I do to find the hot blocks? Am I hitting a bug?
CBC
LB, January 29, 2008 - 12:03 pm UTC
Hi Tom,
Database: 9iR2 (patch 5)
OS: HPUX 11i
We have multiblock size buffer cache (Size default8k=8gb and 16k=32gb). The 16k buffer cache has been upgraded from 8GB to 32gb couple of days back.
Post-upgrade I found there is huge latch contention. CPU time is high in SYS mode. This happens specifically when we start the "RATING" process, which inserts data into a few tables. I also found that one of the CBC latches covers 118 blocks, containing both 8k and 16k blocks.
Q1. Does Oracle use the same CBC latch for different block sizes? If yes, how do we segregate things so that a CBC latch covers blocks from a particular segment?
Q2. If a process spins on CPU for a latch, where does it account the CPU time: SYS or USER?
Would appreciate if you can help on this.
Thanks for your time.
January 30, 2008 - 9:24 am UTC
well, i guess someone had a really good reason for doing this, right.... (sure..)
anyway.....
how did you make all of these findings, eg: how do you tie together "huge latch contention (I'd be interested in hearing the actual numbers there)" and "cpu time high in sys mode"
how did you find "one cbc (latch, cache buffers chains latch) 'contains' 118 blocks"? And that they were different sizes?
you have zero control over what list a block goes onto (buffer chain). The list a block goes on is based on a hash of its DBA (data block address). You do not assign a cbc latch; a block is given to a list deterministically based on its DBA. You would never expect to see all of the blocks for a segment on the same list.
whether it is system or user time depends on the operating system and how latches are implemented on that OS
CBC
LB, January 30, 2008 - 10:02 am UTC
Thanks for the response.
I am coming from here
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:44779427427730 (last post)
The CPU in SYS mode went up high only when we start the "RATING" process. subsequently I checked statspack and v$session_wait and found high no. of "latch free" events.
I am not sure if my findings are correct.
I found the latch with high no of sleeps using below query
select ADDR,GETS,MISSES,SLEEPS,CHILD# from v$latch_children
where name = 'cache buffers chains'
order by 4 desc
Output:
ADDR GETS MISSES SLEEPS CHILD#
C00000025A44C4B0 668529037 125091656 2009248 29997
C0000002592C3528 1201340558 50109916 1637790 28230
C000000254F73980 587970340 83996711 1247155 13613
C000000253847BE0 504099230 102936078 1082268 5084
C00000025F469250 426730134 31072591 456837 31090
C000000252BF2118 500903300 30695094 449568 14706
C000000256944130 1326335236 7138781 422809 690
C00000025C51D298 343803377 34049754 388659 29660
C000000252C28148 1003176340 6683510 369545 15168
C000000252F089A8 338568734 35017257 367175 21468
List truncated.....
For HLADDR C00000025A44C4B0 I queried the X$BH table using...
select ADDR,HLADDR,BLSIZ,DBABLK,TCH from sys.x$bh where HLADDR = 'C00000025A44C4B0'
order by TCH desc
Output: I got 118 rows
ADDR HLADDR BLSIZ DBABLK TCH
800003FA80002400 C00000025A44C4B0 16384 712710 80
800003FA800022C8 C00000025A44C4B0 16384 516102 70
800003FA80002190 C00000025A44C4B0 16384 483334 59
800003FA80002400 C00000025A44C4B0 16384 647174 59
800003FA800022C8 C00000025A44C4B0 16384 778246 59
800003FA800022C8 C00000025A44C4B0 16384 614406 59
800003FA80002400 C00000025A44C4B0 16384 48821 47
800003FA800022C8 C00000025A44C4B0 16384 35297 45
800003FA800022C8 C00000025A44C4B0 16384 174796 44
800003FA80002400 C00000025A44C4B0 16384 95838 34
800003FA80002400 C00000025A44C4B0 8192 870728 31
800003FA800022C8 C00000025A44C4B0 16384 157319 26
800003FA80002190 C00000025A44C4B0 16384 290056 22
800003FA80002400 C00000025A44C4B0 16384 69832 20
800003FA80002400 C00000025A44C4B0 16384 748910 18
800003FA80002190 C00000025A44C4B0 16384 745478 15
800003FA80002400 C00000025A44C4B0 16384 204234 12
800003FA80002400 C00000025A44C4B0 16384 606500 11
800003FA80002400 C00000025A44C4B0 16384 425602 11
800003FA80002400 C00000025A44C4B0 16384 322926 10
800003FA800022C8 C00000025A44C4B0 8192 443283 9
800003FA80002400 C00000025A44C4B0 8192 1096151 7
800003FA80002400 C00000025A44C4B0 16384 416654 6
800003FA800022C8 C00000025A44C4B0 16384 417798 6
800003FA80002400 C00000025A44C4B0 16384 124653 6
800003FA80002400 C00000025A44C4B0 16384 343733 5
800003FA80002400 C00000025A44C4B0 8192 1169593 5
800003FA80002400 C00000025A44C4B0 8192 1001800 5
800003FA800022C8 C00000025A44C4B0 16384 128606 4
800003FA80002400 C00000025A44C4B0 16384 695233 4
800003FA800022C8 C00000025A44C4B0 16384 37064 3
800003FA80002400 C00000025A44C4B0 16384 155133 3
800003FA80002400 C00000025A44C4B0 8192 55481 2
800003FA80002400 C00000025A44C4B0 16384 410934 2
800003FA80002400 C00000025A44C4B0 8192 637399 2
800003FA80002400 C00000025A44C4B0 16384 446511 2
800003FA80002400 C00000025A44C4B0 8192 1267897 2
800003FA80002400 C00000025A44C4B0 8192 637399 2
800003FA800022C8 C00000025A44C4B0 16384 1269143 2
800003FA80002400 C00000025A44C4B0 16384 451087 2
800003FA80002400 C00000025A44C4B0 8192 384826 2
800003FA80002400 C00000025A44C4B0 8192 1194455 2
800003FA80002400 C00000025A44C4B0 8192 2017291 2
800003FA80002400 C00000025A44C4B0 8192 93867 1
List truncated...
How do I find out how latches are implemented on the OS? I mean, what type of latch implementation can lead to SYS CPU usage?
Thanks for your help.
January 30, 2008 - 10:31 am UTC
... The CPU in SYS mode went up high only when we start the "RATING" process. ...
did it USED to do that - meaning "did this change actually cause that, or was that always true"
...
select ADDR,GETS,MISSES,SLEEPS,CHILD# from v$latch_children
where name = 'cache buffers chains'
order by 4 desc
....
that is from the time the instance was started, those sleeps could have been days ago, they might not be related to your problem AT ALL. you need to use statspack or AWR and report only on the time of the problem.
....
select ADDR,HLADDR,BLSIZ,DBABLK,TCH from sys.x$bh where HLADDR = 'C00000025A44C4B0'
order by TCH desc
....
and that reports it as of right now, which might be totally "not representative of what was there when your problem was happening"
I recommend you stop looking so deep, back up, get a statspack report (for about 15 minutes) during the time your process is running.
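A sketch of that bracketing approach, assuming statspack (the PERFSTAT schema) is installed; the snapshot ids will of course differ on your system:

```sql
-- take a snapshot just before starting the process
exec statspack.snap

-- ... let the RATING process run for ~15 minutes ...

-- take a second snapshot right after
exec statspack.snap

-- then run ?/rdbms/admin/spreport.sql and choose those two snap ids,
-- so the report covers ONLY the problem window, not "since startup"
```

The point of the two snapshots is that every number in the report becomes a delta over the problem interval, instead of a cumulative figure since instance startup.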
CBC
LB, January 30, 2008 - 6:27 pm UTC
Thanks !!!! :)
How to find the queries that are trying to access a "hot block"?
Ana Potje, June 06, 2008 - 8:02 am UTC
Hi Tom,
I queried the v$latch_children view and got the following result:
select CHILD# "cCHILD"
, ADDR "sADDR"
, GETS "sGETS"
, MISSES "sMISSES"
, SLEEPS "sSLEEPS"
from v$latch_children
where name = 'cache buffers chains'
order by 5 desc, 1, 2, 3;
cCHILD sADDR sGETS sMISSES sSLEEPS
2209 0000040736853750 186403730 26483121 244873
4182 00000407368DE2F0 37513553 270054 575
5837 00000407369528D0 16947493 218718 85
1078 0000040736803EF0 20753709 97161 100
1082 0000040736804370 17327219 57629 66
...
The first address ("0000040736853750") was by far the one with the highest sleep count. Then I queried the x$bh table for this address and got the results below:
select /*+ RULE */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch,
l.child#,
l.LATCH#
from
v$latch_children l,
x$bh x,
sys.dba_extents e
where
x.hladdr = '0000040736853750' and
l.LATCH# = 98 and
e.file_id = x.file# and
x.hladdr = l.addr and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc ;
SEGMENT_NAME EXTENT# BLOCK# TCH CHILD#
FDM.WGCC_UK1 20 30 516 2209
FDM.WH_CODE_COMBINATIONS_D 20 53 283 2209
SYS.OBJ$ 297 6 228 2209
FDM.ICDD_UK1 0 4 36 2209
FDM.TVVT_N1 50 8 18 2209
SYS.I_OBJ2 111 2 15 2209
FDM.WH_COMMON_TRADES_HISTORY_D 221 302 9 2209
FDM.WBKS_UK1 299 3832 8 2209
FDM.WCTHD_U1 194 32 8 2209
FDM.WH_CODE_COMBINATIONS_D 12007 5 5 2209
FDM.WCTHD_U1 141 48 4 2209
FDM.WH_CODE_COMBINATIONS_D 13129 5 2 2209
FDM.T_CAL_POSITION_2 0 52 1 2209
FDM.WGDB_BM2 0 3 1 2209
FDM.WGJHT_N1 0 3 1 2209
FDM.WATM_BM10 0 3 1 2209
FDM.T_CAL_POSITION_2 8 24 1 2209
FDM.WBKS_UK1 330 656 1 2209
FDM.TCWE_N4 0 3 1 2209
FDM.WH_CODE_COMBINATIONS_D 3144 5 1 2209
FDM.T_EVA_CAPITAL_GREENWICH_2 0 3 1 2209
FDM.ICE_CR_JUMP_RISK_VALUATIONS_F 0 3 1 2209
FDM.T_CAL_POSITION_2 2 39 1 2209
FDM.BOE_REP_ER_T_IND10 0 3 1 2209
FDM.T_CAL_POSITION_2 7 27 1 2209
FDM.T_CAL_POSITION_2 3 2 1 2209
FDM.T_CAL_POSITION_2 3 21 1 2209
FDM.T_ORD_DRCR_DETAIL_2 0 3 1 2209
FDM.WH_ATM_BALANCES_F 240 87 1 2209
FDM.WH_ATM_BALANCES_F 364 791 1 2209
FDM.WH_ATM_BALANCES_F 237 455 1 2209
FDM.WH_ATM_BALANCES_F 361 2887 1 2209
FDM.WH_ATM_BALANCES_F 97 432 1 2209
FDM.WH_ATM_BALANCES_F 358 3521 1 2209
FDM.WH_ATM_BALANCES_F 481 72 1 2209
FDM.TCWE_N2 0 3 1 2209
SYS._SYSSMU66$ 10 11 1 2209
FDM.WH_COMMON_TRADES_HISTORY_D 161 172 1 2209
FDM.WH_COMMON_TRADES_HISTORY_D 137 456 1 2209
FLOWS_030100.WWV_FLOW_WORKSHEET_COND_PK 0 1 1 2209
FDM.HMHDH_N2 0 3 1 2209
FDM.HPD_PK 0 3 1 2209
FDM.WGJLT_N3 0 3 1 2209
FDM.WH_ATM_BALANCES_F 438 2486 0 2209
FDM.T_TDW_CONTRACT 189 1610 0 2209
FDM.T_TDW_CONTRACT 241 1254 0 2209
FDM.T_CAL_POSITION_2 5 13 0 2209
FDM.WCTD_BM7 276 53 0 2209
FDM.T_CAL_POSITION_2 18 16 0 2209
FDM.T_CAL_POSITION_2 15 15 0 2209
FDM.T_CAL_POSITION_2 13 49 0 2209
FDM.T_CAL_POSITION_2 2 39 0 2209
FDM.WH_COMMON_TRADES_HISTORY_D 197 134 0 2209
FDM.T_CAL_TRADE_2 41 24 0 2209
FDM.T_CAL_TRADE_2 0 5 0 2209
FDM.T_CAL_TRADE_2 0 5 0 2209
FDM.WH_BOOKS_D 13 25 0 2209
FDM.WH_COMMON_TRADES_D 96 159 0 2209
FDM.T_TDW_CONTRACT 343 684 0 2209
FDM.WH_ATM_BALANCES_F 256 3966 0 2209
FDM.TVVT_N1 47 59 0 2209
FDM.WH_ATM_BALANCES_F 349 771 0 2209
FDM.EXT_ESS_ASIA_FILE_LOG_T 0 60 0 2209
FDM.T_CAL_POSITION_2 0 56 0 2209
FDM.T_TDW_CONTRACT 247 3318 0 2209
FDM.T_TDW_CONTRACT 170 228 0 2209
FDM.T_CAL_POSITION_2 4 48 0 2209
FDM.WH_CODE_COMBINATIONS_D 13215 5 0 2209
FDM.T_TDW_CONTRACT 193 58 0 2209
69 rows selected.
What can I conclude? Can I determine somehow which was the query (or queries) that were trying to access this hot block?
Many Thanks!
Ana
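(One way to catch the querying sessions in the act, as a sketch only: while the contention is actually happening, look at who is sleeping on that child latch. This assumes the latch address found above and uses the fact that for latch waits, p1raw carries the latch address.)

```sql
-- run this during the contention: sessions currently waiting on this
-- child latch, and the hash of the SQL they are executing
select s.sid, s.sql_hash_value, w.event
  from v$session_wait w, v$session s
 where w.sid   = s.sid
   and w.event like 'latch%'
   and w.p1raw = hextoraw('0000040736853750');
```

The sql_hash_value can then be looked up in v$sql to get the statement text. Run after the fact, this returns nothing, which is why it has to be captured live.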
cache buffers chains
Jom, June 09, 2008 - 4:02 pm UTC
Tom,
While loading data through a batch job on one of the RAC nodes, we saw lots of CBC latch waits, so we canceled the job; it wasn't doing much and we had been waiting for hours. We then ran it on another RAC instance and it worked fine, without any CBC latch waits.
Event Waits Time(s)
-------------------------- ----------- ----------
latch: cache buffers chains 89,538,323 25,196.7
Could you please tell me what could be the cause?
June 09, 2008 - 4:55 pm UTC
the cbc latch controls access to the blocks in the buffer cache.
If you are waiting on it a lot - that means you had contention for the blocks protected by that latch in the cache.
by killing the job, you removed the contention.
you then ran it again later - when whatever it was in contention with was finished, gone.
CBC latch and FORALL
Matt, June 11, 2008 - 7:43 pm UTC
I am facing an issue with CBC latches when several deletes occur against different portions of the same table. This is a load job that first deletes any existing records, in case they exist.
The relevant portion of the developer's PL/SQL code is
FORALL i IN 1 .. p_rec.MESSAGEPAYLOAD_ID.COUNT
DELETE ene_ea_T
WHERE Opr_dt = f_opr_dt(p_rec.INTERVALSTARTTIME(i))
AND Opr_hr = f_opr_hr(p_rec.INTERVALSTARTTIME(i));
p_rec is a pl/sql table object that contains about 30,000 rows and about 10 of these load procedures run at a certain time each day.
It seems logical that there would be a cache buffers chains issue when 10 of these run, except that all records in the p_rec object have the same INTERVALSTARTTIME.
The leading columns of the ene_ea_t table's primary key are opr_dt and opr_hr, and a select using them returns in under 1 second. If we run one load job alone, it runs very quickly and without any CBC issue.
Each of the load jobs corresponds to a different opr_dt and opr_hr, and thus each load job should not be trying to delete records corresponding to other jobs.
My question is
1) does the FORALL process each of 32000 deletes individually with respect to cache buffer chains latches even though it only shows about 10 total executes?
2) which would help my cache buffer chain issue more
- to change the code to eliminate the FORALL
or
- to work to ensure that only one of the load jobs runs at a time?
Thanks in advance!!
June 11, 2008 - 9:38 pm UTC
well, blocks are put into a cache buffer chain list based on a hash of their DBA (data block address).
so, a block from the first part of the table will be in the same list as a block from the last part of the table. That these processes are doing different bits of the table only means they won't contend physical-IO-wise with each other - logical-IO-wise, they may well....
and don't forget, the delete will hit EVERY INDEX, every index's root block in particular. All of them will hit the same root blocks - constantly.
...
1) does the FORALL process each of 32000 deletes individually with respect to
cache buffer chains latches even though it only shows about 10 total executes?
....
cache buffer chains (cbc) latches are the latches used to protect the buffer cache. There are lists of blocks, each list is protected by one of these latches. A delete of one row will require many cbc latches - we have to read the indexes, the table, lots of stuff.
Each of the 32,000 rows to be deleted would be done basically one after the other. So, 32,000 gets on the index root block for each index we delete from...
2) getting rid of forall would be the worst thing to do!!!!!
actually, if you could change it to be:
DELETE from ene_ea_T
WHERE (Opr_dt, Opr_hr) in ( select f_opr_dt(intervalstarttime), f_opr_hr(intervalstarttime)
from TABLE( p_rec ) )
and do them all at once, I'd be for that! the more the merrier, the better. (make p_rec a collection and this will work)
if you "work to ensure that only one of the load jobs runs", you would of course remove contention - but - so what? you will have serialized everything which is probably much worse...
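A fuller sketch of that single-statement rewrite, with heavy hedging: the type name ts_tab and the variable p_times are made up for illustration, and this assumes the start times are passed as a SQL-level collection rather than a PL/SQL-only table.

```sql
-- a SQL-visible collection type for the start times (name made up)
create or replace type ts_tab as table of date;
/

-- inside the load procedure, assuming p_times ts_tab holds the 32,000
-- start times, all of the deletes collapse into one statement:
delete from ene_ea_t
 where ( opr_dt, opr_hr ) in
       ( select f_opr_dt( column_value ), f_opr_hr( column_value )
           from table( p_times ) );
```

One statement means one pass over the index structures instead of 32,000 separate row-by-row visits to the same root blocks.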
Concurrent access and latches
Jayadevan, June 23, 2008 - 7:46 am UTC
Hi Tom,
1) If a number of users are accessing exactly the same data, even if it is just a SELECT, will it lead to latch issues? We are testing a scenario where 100 users log in to a system and see the "Products" listing - which means the same query. CPU is the top event (latch: cache buffers chains is next) in the AWR report, and vmstat on the DB box keeps showing pretty long queues. The queries which take less than a second when we try them in SQL*Plus take about 20-30 seconds in the load test (JMeter running 100 concurrent threads).
2) Any suggestions?
June 23, 2008 - 8:15 am UTC
1) in order to access shared memory, we have to protect it. In order to read something out of the cache, you need to latch it - to protect it from modification (it would be a really bad thing to be walking a linked list that someone else is modifying see...) The latch used to protect blocks in the buffer cache is the cache buffers chains latch (there are many of these, when you read a block in from disk, we hash the database block address (DBA) to figure out what list to put it on - latch that list and modify it, adding the block)
2) you tell us nothing about the system. If this is like a 2 cpu machine, then 100 concurrent threads isn't going to fly for doing something that is 100% cpu is it.
So, tell us about the MACHINE.
Latches in a read only scenario
Jayadevan, June 23, 2008 - 11:47 pm UTC
Hi Tom,
First - the machine details - it is, as you thought, a 2 CPU machine with 4 GB RAM.
pga_aggregate_target 418381824
processes 150
sga_target 1258291200
Another doubt - if there are 100 users doing a SELECT * FROM EMP WHERE ID = 200, with nobody changing the data, could it still lead to a CPU bottleneck?
Is there a workaround (is caching the data at the app server, if it is static, one option)?
What about opting for a read only transaction (set transaction) - will that help?
In our case, we are accessing 6 tables - one a materialized view with 50 columns and 6000 records. The rest are master tables with a few hundred records and less than 20 columns (the MV is about 7 MB in size).
Thanks,
Jayadevan
June 24, 2008 - 4:54 am UTC
but you are missing the point
the buffer cache is never read only. The buffer cache is a data structure, a shared data structure. When you start the database - it is EMPTY, it needs to get filled (modified, written to, changed)
As you progress over the course of the day, you'll have physical IO's (unavoidable). All of those physical IO's - they modify the buffer cache. They have to put their blocks somewhere where people can find them.
A 2 cpu machine with 100 concurrent threads. Do the math...
Concurrent access
Jayadevan, June 24, 2008 - 5:46 am UTC
Hi Tom,
So the only solution would be to avoid "really really really hot blocks" (as you mentioned elsewhere in this thread) or make the queries go really really really fast so that the latches are acquired and released very fast?
If many users are selecting the same data for read-only, can't Oracle share the latches or something like that?
June 24, 2008 - 6:46 am UTC
100 threads concurrent
divided by
2 cpus
equals
50 things concurrently trying to work on a cpu
summarizes to: run queues, big ones.
latches are acquired and released really really fast - no matter how long the query runs - accessing a block is accessing a block. What you want to do is reduce the amount of blocks you access.
for example, how many IO's should this query take:
SELECT * FROM EMP WHERE ID = 200
Well, if you use a hash cluster with a cluster key on ID, it would take 1.
If you used an IOT, it would probably take 3 (presume an IOT of height three)
If you use a heap table plus a primary key index, probably 4 (3 in the index, 1 for the table access)
If you hash partitioned this table into N hash partitions, maybe 3 (2 in the index, one for the table access)
You keep missing the point about the buffer cache being a DATA STRUCTURE, A LIST OF LINKED LISTS. EVERY PHYSICAL IO MUST MODIFY THIS STRUCTURE - many things must modify this structure. It matters not whether the blocks being cached are read only or read write or write only or whatever - the DATA STRUCTURE that is the buffer cache is ALWAYS modifiable (else there would be NOTHING IN IT; it starts life "empty", it must get "filled" - that is, modified)
and yes, there are shareable latches, but a latch is a latch is a latch is a serialization device. And the latch isn't to serialize access to the blocks, the latch is to protect the "cache buffers chains" - the lists of blocks.
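The one-logical-IO lookup mentioned above comes from a single-table hash cluster, where the cluster key hash computes the block address directly - no index traversal at all. A sketch with made-up names and sizing (HASHKEYS must be tuned to the expected number of distinct keys):

```sql
-- the hash of ID locates the block directly: one get per lookup
create cluster emp_cluster ( id number )
  single table hashkeys 1000;

create table emp_hashed
( id    number primary key,
  ename varchar2(30)
) cluster emp_cluster ( id );
```

The trade-off is that space is pre-allocated for the hash keys up front, and the structure only pays off for equality lookups on the cluster key.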
cache buffers chains
gef, September 11, 2008 - 12:46 pm UTC
With 10g the behavior changes with respect to cache buffers chains, bigfile tablespaces and automatic storage management. How can you tell how many lists are being managed in the cache, and whether you are back in the game of trying to make more lists to reduce contention?
September 16, 2008 - 12:28 pm UTC
you don't? You do not want to play games with undocumented parameters.
this note was written in the 8i timeframe, that parameter was deprecated a couple of releases ago (8.1.7 was the last release).
Query is incomplete
Reader, October 17, 2008 - 6:34 am UTC
I just looked at the original query (from the original poster) to find the hot blocks, and v$latch_children is just hanging there - it is not joined to any other table.
select /*+ ordered */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch,
l.child#
from
sys.v$latch_children l,
sys.x$bh x,
sys.dba_extents e
where
x.hladdr = '0A243B40' and
e.file_id = x.file# and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc;
Should not l.addr be joined to hladdr?
October 18, 2008 - 8:13 pm UTC
not my query, you'd have to ask the original poster of it...
but it does (at a quick glance) look incorrect - yes.
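For the record, a corrected sketch with that missing join added (still hedged: x$bh and its hladdr column are undocumented internals, and the literal latch address is the one from the post above):

```sql
select /*+ ordered */
       e.owner || '.' || e.segment_name segment_name,
       e.extent_id                      extent#,
       x.dbablk - e.block_id + 1        block#,
       x.tch,
       l.child#
  from v$latch_children l,
       sys.x$bh         x,
       sys.dba_extents  e
 where l.addr   = hextoraw('0A243B40')
   and x.hladdr = l.addr                -- the join that was missing
   and e.file_id = x.file#
   and x.dbablk between e.block_id and e.block_id + e.blocks - 1
 order by x.tch desc;
```

Without the x.hladdr = l.addr join, v$latch_children participates only as a Cartesian product, which is why the query appeared to hang.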
cache buffer chains on a single row table
A reader, January 21, 2009 - 3:35 pm UTC
Tom,
I hope my question doesn't fall under the new-question category. I apologize if it does, and I shall submit it when the window is open.
We are performing a load test and are seeing very high cache buffers chains wait events. It's an Oracle Apps EBS system. I have identified the hot object (using Metalink note 163424.1) and the SQLs calling this hot object (from statspack). There is high contention on a single table; when I look at it, it's a single-row table and all users are reading it concurrently (due to the load test).
How do I avoid contention on a single-row table? Since it has only one row, I cannot distribute the hot block's data across many blocks (by rebuilding with different pctfree/pctused). Freelists and freelist groups on this table are 4.
Oracle version is 9.2.0.8 on Sun SPARC Unix, 40 CPUs, nearly 200 GB RAM (there are many other instances running on it), 19 GB db cache size, 2.5 GB shared pool size.
What should the approach to this problem be? Any help or pointers would be greatly appreciated.
Below is the statspack data; the statspack duration is 12 minutes.
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 6,034,095 131,490 96.13
CPU time 4,581 3.35
db file sequential read 72,193 624 .46
io done 38,082 29 .02
log file sync 7,453 17 .01
-------------------------------------------------------------
Wait Events for DB
--------------------------------------------
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
latch free 6,034,095 6,033,952 131,490 22 195.4
db file sequential read 72,193 0 624 9 2.3
...
...
buffer busy waits 102 0 1 9 0.0
Latch Activity for DB
--------------------------------------------
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
Consistent RBA 25,673 0.0 0.0 0 0
...
...
...
cache buffer handles 1,172,459 0.3 0.0 0 0
cache buffers chains 343,246,262 8.3 0.2 ###### 77,652 0.0
cache buffers lru chain 9,029 0.0 0 85,491 0.0
Latch Sleep breakdown for DB
--------------------------------------------
Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
cache buffers chains 343,246,262 28,516,989 6,056,865 24884663/222
6821/816890/
588616/0
library cache 7,177,790 11,550 55 11495/55/0/0
/0
cache buffer handles 1,172,459 3,038 14 3024/14/0/0/
0
Thanks for your time.
Thanks,
Sasi
January 21, 2009 - 5:49 pm UTC
well, one idea would be to convert this table into an IOT; if you are full scanning it, we can cut the logical IO down a lot:
ops$tkyte%ORA9IR2> create table t1 ( x int primary key, data varchar2(40) );
Table created.
ops$tkyte%ORA9IR2> create table t2 ( x int primary key, data varchar2(40) ) organization index;
Table created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> insert into t1 values ( 1, rpad('*',40,'*') );
1 row created.
ops$tkyte%ORA9IR2> insert into t2 values ( 1, rpad('*',40,'*') );
1 row created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> alter session set sql_trace=true;
Session altered.
ops$tkyte%ORA9IR2> begin
2 for i in 1 .. 1000
3 loop
4 for x in (select * from t1 looping) loop null; end loop;
5 for x in (select * from t2 looping) loop null; end loop;
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed.
SELECT * FROM T1 LOOPING
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1000 0.01 0.01 0 0 0 0
Fetch 2000 0.03 0.02 0 7000 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3001 0.05 0.03 0 7000 0 1000
********************************************************************************
SELECT * FROM T2 LOOPING
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1000 0.01 0.01 0 0 0 0
Fetch 2000 0.02 0.02 0 3000 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3001 0.04 0.03 0 3000 0 1000
if the data in this single row table doesn't change - we could play other tricks with it.
for example, we could create a view - maybe on a pipelined function - no IO.
ops$tkyte%ORA9IR2> create or replace type myType as object
2 ( x int, data varchar2(40) )
3 /
Type created.
ops$tkyte%ORA9IR2> create or replace type myTableType as table of myType
2 /
Type created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> create or replace function f return myTableType
2 pipelined
3 as
4 begin
5 pipe row( myType( 1, rpad('*',40,'*') ) );
6 return;
7 end;
8 /
Function created.
ops$tkyte%ORA9IR2>
ops$tkyte%ORA9IR2> select * from table(f());
X DATA
---------- ----------------------------------------
1 ****************************************
ops$tkyte%ORA9IR2> begin
2 for i in 1 .. 1000
3 loop
4 for x in (select * from t2 looping_again) loop null; end loop;
5 for x in (select * from table(f()) looping_again) loop null; end loop;
6 end loop;
7 end;
8 /
PL/SQL procedure successfully completed.
SELECT * FROM T2 LOOPING_AGAIN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1000 0.02 0.02 0 0 0 0
Fetch 2000 0.03 0.03 0 3000 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3001 0.06 0.07 0 3000 0 1000
********************************************************************************
SELECT * FROM TABLE(F()) LOOPING_AGAIN
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 12 0 0
Execute 1000 0.01 0.01 0 0 0 0
Fetch 2000 0.18 0.16 0 0 0 1000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3001 0.20 0.18 0 12 0 1000
but that might have the effect of turning your latch contention into a cpu issue (but then again, since latching is hugely cpu intensive when you have multiple spins like that - you might end up using less overall)
CBC on a single row table
A reader, January 22, 2009 - 2:07 pm UTC
Thanks Tom.
I would not be able to change this table to IOT as its an Oracle EBS table. But I shall test it in our test environment.
I have a question on the tests you mentioned. Though the logical IO comes down, your test reads the table 1000 times one after another, so concurrency doesn't come into the picture. In our case, many users are trying to read the single-row table concurrently.
Would it still reduce the contention?
Also, why does the IOT have fewer logical reads in this case compared to a full scan of the normal table, which has only one row? Wouldn't both visit the same (or an equal) number of data blocks? Is that correct?
Thanks again for your time.
Sasi
January 22, 2009 - 3:52 pm UTC
if you get 7 blocks out of the cache (the same 7 blocks by thousands of users)
versus
if you get 3 blocks out of the cache
and each time you go to get a block remembering that you need the CBC latch....
To full scan a table, we read the segment header (multiple times - less in current releases) and then read the table.
To full scan an IOT, we would be walking a data structure - not needing to read a segment header.
different structures, different IO's
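You can see the difference yourself with autotrace; a sketch using hypothetical table names (emp_heap as a heap table with a primary key index, emp_iot as the IOT equivalent):

```sql
set autotrace traceonly statistics

-- heap table + primary key index: index root/branch/leaf, then the table block
select * from emp_heap where id = 200;

-- IOT: just the walk down the index structure, the row is IN the leaf
select * from emp_iot  where id = 200;

set autotrace off
```

Compare the "consistent gets" figure for the two lookups; fewer gets per access means fewer trips through the cbc latches under concurrency.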
CBC on a single row table
A reader, January 23, 2009 - 6:21 am UTC
Thanks Tom. Your explanations are always simple and easy to understand.
Thanks,
Sasi
Trouble identifying root cause of cache buffers chains issue
Sandeep Takhar, March 07, 2009 - 6:35 pm UTC
Hi Tom,
I hope you can help me. We have a very clear cache buffers chains issue that happens in our production environment. As you noted, this is either a hot block or a long hash chain. I can find proof of neither (querying x$bh for tch and grouping by hladdr...). I think it may be a hot block whose touch count isn't updated, because of the algorithm for updating touch counts... I don't know.
Our load has increased over the last two weeks and the problem has shown up almost every day in that time. By doing a side-by-side comparison of 15-minute statspack reports from a good time and a bad time (within the two weeks), I can see the following pattern of things I think are relevant to CBC (read the following from the perspective of the bad time):
1) Increase in logical i/o (Look for total logical reads on awr report or add consistent gets + db block gets.). For example from 250k -> 420k
2) Increase in logical i/o is attributed to increase in consistent gets (and consistent gets - examination)
3) Top 5 wait event includes cache buffer chains wait
4) Cache buffer chains miss % > 5% (where in good times it is still not all that good...1-2%)
5) buffer is pinned count is less than logical i/o
6) cache buffers chains latch > 1M gets
7) over 95% of sessions waiting on cache buffer chains at time of incident involve nc_objects query (table)
8) logical reads in a good time have the same top 5 segments and high capture rate...66%. logical reads in bad time have low capture rate < 40%. Bad time has NC_OBJECTS in top 5 logical reads that is never there for good times.
9) Physical reads in a bad time ...nc_objects has highest physical reads and > 80% capture for top 5.
10) Sql execution ...sql ids and execution counts (for top sql)...is very similar pattern for good time and a very similar pattern for bad time. One sql that shows up (not there in good report) 60k times executed with > 200 LIO's per execution is:
UPDATE NC_OBJECTS SET VERSION = VERSION + 1 WHERE OBJECT_ID = :B1 RETURNING VERSION INTO :O0
(It should be noted that at the time of the issue, this and other queries against nc_objects via primary-key access account for 95% of queries where last_call_et > 5 and the event is the cbc latch.)
Questions:
1) How does logical i/o increase for the same query? For queries that go against a primary key..how can logical i/o increase (to over 200 gets per execution)?
2) Will multi-buffer pool help?
3) Does the following theory make sense: the CBC latch is held for "consistent gets - examination" for the duration of the physical read, which may be causing the issue?
4) How to logically solve this problem?
5) What other diagnostic data should I capture at the time of the issue?
6) There is another table (I've mentioned nc_objects all along as being a clear indication of having an issue) called nc_parameters..where a number of queries have increased logical i/o...I am wondering what may be causing it.
March 08, 2009 - 7:07 am UTC
1) one would suppose you have an index on version as well as object_id. You are reading a table that is being heavily modified, you are reading an index on version that is heavily modified (assumption). The additional logical IO's you see are not against the segment in question, but against the rollback segment.
2) no
3) did not follow that. a cbc latch is gained to access a list of blocks - high cbc latching does not mean only the two things you stated (hot block or long list) but rather includes "you do LOTS of logical IO". Very heavy access to the same blocks means the same list of blocks - not the same exact identical block (although you probably have that, the root block of the index on version and the root block on the index of object_id)
4) reduce logical io's by tuning queries.
6) probably you modify that table frequently as well
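To find the statements to tune first, one common starting point (a sketch only; v$sql figures are cumulative since each cursor was loaded, and sql_id assumes 10g or later) is to rank by logical I/O per execution:

```sql
-- statements doing the most logical I/O, with a per-execution figure;
-- high gets_per_exec on a "simple" primary key lookup is a red flag
select sql_id, executions, buffer_gets,
       round( buffer_gets / nullif(executions, 0) ) gets_per_exec
  from v$sql
 where executions > 0
 order by buffer_gets desc;
```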
Some more info
Sandeep Takhar, March 07, 2009 - 9:44 pm UTC
I found this and it seems to mirror some of the questions I have and the awr reports we have (I'm not sure if you like getting links in here...but it seems to answer my question....):
http://www.dbanotes.net/archives/2005/08/buffer_gets_inc.html Since I am not an oracle expert..hopefully I don't keep finding things and pasting them here to answer my questions...
March 08, 2009 - 7:11 am UTC
that would be in line with what I said, you are seeing logical IO's that are not against the segment but are being done from undo for consistent reads.
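That undo-driven logical I/O is visible in the session statistics. A sketch you could run from the session doing the reads, before and after the query in question:

```sql
-- 'undo records applied' climbing alongside 'consistent gets' confirms
-- the extra LIO is consistent-read rollback work, not segment reads
select sn.name, st.value
  from v$statname sn, v$mystat st
 where sn.statistic# = st.statistic#
   and sn.name in ( 'consistent gets',
                    'data blocks consistent reads - undo records applied' );
```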
Latch buffer chains in 10g
Suresh, November 02, 2009 - 3:41 am UTC
Hi Tom,
I am hitting latch: cache buffers chains contention in 10g (10.2.0.1), and digging in further I found that the hot block was an index block.
Session holding latches
=========================
SID NAME HMODE OBJECT Held Gets Held % Held ms Avg hold ms
---------- -------------------------- ------------- ----------------- ---------- ---------- ------- ----------- -----------
110 cache buffers chains exclusive 14000B9 3241 2623 3.24 222.333 .085
116 cache buffers chains exclusive 14000B9 1629 1317 1.63 111.749 .085
120 cache buffers chains exclusive 14000B9 1501 1085 1.50 102.969 .095
150 cache buffers chains exclusive 14000B9 1448 1444 1.45 99.333 .069
136 cache buffers chains exclusive 14000B9 1157 1155 1.16 79.370 .069
114 cache buffers chains exclusive 14000B9 1060 1058 1.06 72.716 .069
134 cache buffers chains exclusive 14000B9 1014 831 1.01 69.560 .084
119 cache buffers chains exclusive 14000B9 839 827 .84 57.555 .070
Hot Block
===========
STATE BLOCK_CLASS OBJECT_TYPE object TCH MODE_HELD D T P S D FLG_LRUFLG DQ
---------- ------------------ --------------- --------- ------ ---------- - - - - - ------------- ----------
xcur data block INDEX CLIC.SYS_C006030 5127 0 N N N N N 2202000:0 0
Here are some of the inputs.
This index is in an ASSM tablespace, hence I can't change freelists to work around it.
AWR report
===========
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time 13,997 76.6
latch: cache buffers chains 65,308 299 5 1.6 Concurrenc
latch free 35,215 225 6 1.2 Other
os thread startup 109 11 100 0.1 Concurrenc
control file parallel write 2,399 3 1 0.0 System I/O
Wait Class DB/Inst: CLICUAT/CLICUAT Snaps: 703-705
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc
Avg
%Time Total Wait wait Waits
Wait Class Waits -outs Time (s) (ms) /txn
-------------------- ---------------- ------ ---------------- ------- ---------
Concurrency 65,433 .0 310 5 61.6
Other 35,256 .0 225 6 33.2
System I/O 7,431 .0 4 1 7.0
User I/O 1,135 .0 1 1 1.1
Network 66,320 .0 0 0 62.4
Commit 205 .0 0 1 0.2
Application 18 .0 0 0 0.0
-------------------------------------------------------------
Wait Events DB/Inst: CLICUAT/CLICUAT Snaps: 703-705
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
---------------------------- -------------- ------ ----------- ------- ---------
latch: cache buffers chains 65,308 .0 299 5 61.4
latch free 35,215 .0 225 6 33.1
os thread startup 109 .0 11 100 0.1
SQL ordered by CPU Time DB/Inst: CLICUAT/CLICUAT Snaps: 703-705
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
CPU Elapsed CPU per % Total
Time (s) Time (s) Executions Exec (s) DB Time SQL Id
---------- ---------- ------------ ----------- ------- -------------
13,954 18,223 0 N/A 99.7 46frrprhgrc06
Module: JDBC Thin Client
SELECT * FROM TABLE(CAST(GET_MARGIN_POSTING(:1) AS NET_MARGIN_OBJ_TABLE))
13,954 18,223 1 13953.57 99.7 c7n6ar4563nbc
Statement causing the issue:
===========================
SELECT T.CLIENT_ID, SUM(TM.NETTED_MARGIN) NETMARGIN FROM V_TRADE_FOR_MARGIN_CALC
T, TRADE_MARGIN TM WHERE T.EXCHANGE = :B5 AND T.TRADE_DATE BETWEEN :B4 AND :B3
AND T.CLIENT_ID LIKE :B2 AND T.SCRIP_CODE LIKE :B1 AND T.INTERNAL_ID = TM.INTERN
AL_ID AND TM.NETTED_MARGIN > 0 GROUP BY T.CLIENT_ID
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 16 (100)| |
| 1 | HASH GROUP BY | | 1 | 132 | 16 (7)| 00:00:01 |
|* 2 | FILTER | | | | | |
| 3 | NESTED LOOPS | | 1 | 132 | 15 (0)| 00:00:01 |
| 4 | MERGE JOIN CARTESIAN | | 1 | 65 | 15 (0)| 00:00:01 |
| 5 | MERGE JOIN CARTESIAN | | 1 | 46 | 8 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | TRADE_MARGIN | 1 | 26 | 3 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 2599 | 51980 | 5 (0)| 00:00:01 |
| 8 | INDEX FAST FULL SCAN | SCRIP_VAR_ELM_PK | 2599 | 51980 | 5 (0)| 00:00:01 |
| 9 | BUFFER SORT | | 3794 | 72086 | 10 (0)| 00:00:01 |
| 10 | INDEX FAST FULL SCAN | SYS_C006001 | 3794 | 72086 | 7 (0)| 00:00:01 |
|* 11 | TABLE ACCESS BY INDEX ROWID| TRADE | 1 | 67 | 0 (0)| |
|* 12 | INDEX UNIQUE SCAN | SYS_C006030 | 1 | | 0 (0)| |
---------------------------------------------------------------------------------------------------
Buffer cache : 300MB
Shared pool : 800MB
CPUs : 2
From here I am clueless how to proceed further. Please help me reduce the cbc latch contention.
-Thanks
Suresh
November 09, 2009 - 12:22 pm UTC
... Hence cant change freelists to avoid the
same.
...
and that wouldn't help you in a non-ASSM tablespace either - not sure where you were going with that. Freelists address contention for space management on inserts; the CBC latch here means "lots of people going for the same block at the same time" on reads.
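One way to confirm which segment is behind the hot chains is to work from the busiest child latches back to the buffers hashed under them. This is a sketch only: `x$bh` is an internal, SYS-only view whose columns vary by version, and `:latch_addr` is a bind you would fill in with an `ADDR` value taken from the first query.

```sql
-- Top "cache buffers chains" child latches by sleeps:
SELECT *
  FROM (SELECT addr, gets, sleeps
          FROM v$latch_children
         WHERE name = 'cache buffers chains'
         ORDER BY sleeps DESC)
 WHERE ROWNUM <= 5;

-- As SYS: map one hot child latch back to the segments whose blocks
-- hash to its chains (tch is the buffer touch count - big tch = hot):
SELECT e.owner, e.segment_name, e.segment_type, b.tch
  FROM x$bh b, dba_extents e
 WHERE b.hladdr = :latch_addr
   AND e.file_id = b.file#
   AND b.dbablk BETWEEN e.block_id AND e.block_id + e.blocks - 1
 ORDER BY b.tch DESC;
```

In this case the hot-block dump above already names index SYS_C006030, so this would mostly confirm what you know - but it is the general-purpose way to find such blocks.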
You don't say how long a period of time this report covers. 299 seconds of latch waits sounds "big" until you put it up against 13,997 CPU seconds - then it is small.
One execution taking 13,953 CPU seconds?? Ouch - I'd be looking at the algorithm in that pipelined function. Why not start with that????
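If the time really is inside GET_MARGIN_POSTING, a line-level profile of one call would show where it goes. A hedged sketch using DBMS_PROFILER - the plsql_profiler_* tables must first be created with ?/rdbms/admin/proftab.sql, and 'margin_run' is just an arbitrary label:

```sql
DECLARE
  l_rc BINARY_INTEGER;
BEGIN
  l_rc := DBMS_PROFILER.START_PROFILER(run_comment => 'margin_run');
END;
/
-- run one call of the suspect pipelined function
SELECT COUNT(*)
  FROM TABLE(CAST(GET_MARGIN_POSTING(:1) AS NET_MARGIN_OBJ_TABLE));

DECLARE
  l_rc BINARY_INTEGER;
BEGIN
  l_rc := DBMS_PROFILER.STOP_PROFILER;
END;
/
-- lines ordered by time spent in them:
SELECT u.unit_name, d.line#, d.total_occur, d.total_time
  FROM plsql_profiler_units u, plsql_profiler_data d
 WHERE u.runid = d.runid
   AND u.unit_number = d.unit_number
 ORDER BY d.total_time DESC;
```

The lines with the largest total_time tell you which part of the function's algorithm to attack first.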
What exactly is a buffer pin, and how does it come into use?
anant, August 24, 2011 - 2:50 am UTC
Please give me an answer.
August 30, 2011 - 3:39 pm UTC
Pinning is something you do to "keep" something. While you hold the pin - until you un-pin it, or something else forces it to become un-pinned - you have access to it. For a buffer, that means your session can revisit the block without searching the hash chains for it again.
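This shows up in the instance statistics: Oracle counts whether a buffer visit could reuse an existing pin or had to walk the hash chain (taking the cache buffers chains latch) again. A hedged example - these two statistic names exist in recent versions, but check v$statname on yours:

```sql
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('buffer is pinned count',
                'buffer is not pinned count');
```

A high 'buffer is pinned count' relative to logical reads means many repeat visits were satisfied from an existing pin, without re-taking the latch.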