Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Prince.

Asked: May 03, 2002 - 10:27 pm UTC

Last updated: March 11, 2011 - 7:39 am UTC

Version: 8.1.7.2

Viewed 10K+ times! This question is

You Asked

Tom,

When there is latch, is the entire DB operation is serialized or certain operations are serialized?

Assume there are queries with literals. (consider the case of latches on the shared pool/library cache)

Ex,
1. select col1, col2 from tab1 where col3 = 'Valx';

2, select col1, col2 from tab2 where col1 = 'Val1';

If the above queries (fresh) are executed exactly at the same point will one them has to wait for the parsing to be done, even if the tables do not have any relation to each other?

In summary, will any latch cause every other operation which need the latch to be serialized?

As you always do, some explanation would be wonderful.

Thanks,


and we said...

A latch is a lightweight lock used to protect a data structure. If I have latched a data structure, no one else will able to (they wait). It is not the whole database -- just that structure.


Yes, the hard parses require MANY MANY latches - on data structures in the shared pool. They will get in eachothers way.

Yes -- latches = locks, locks = serialization device, serialization devices = less scalable, slower.


Get my book -- I spend about 1300 pages on this topic!

Rating

  (96 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

qu

liz, May 05, 2002 - 2:16 pm UTC

hi
serialization means in queue is it?
liz

Tom Kyte
May 05, 2002 - 6:01 pm UTC

serialization in this case means "one at a time, after one another"

your book

ab, May 05, 2002 - 2:23 pm UTC

tom
we r not getting your book here in asia i searched a lot lot since last few months.
it will cost v.much if i buy it thru mail, in local money.do
do something for us

Tom Kyte
May 05, 2002 - 6:01 pm UTC

Asia is a big place, anything more specific then that?

asia

ab, May 06, 2002 - 3:44 pm UTC

sorry for being late sir
Its Pakistan I m talking of.
will be thankful if u could do anything for us


Tom Kyte
May 06, 2002 - 8:54 pm UTC

I can help out by suggesting you get a new keyboard -- your keyboard seems to be dropping letters such as Y and O and even A's from time to time!

Take the ISBN for the book to a reputable bookstore, they should be able to get it for you - It is over there, that I know.

It is There in Pakistan

A reader, May 07, 2002 - 6:50 am UTC

Sorry to Budge in Like this But thought i would help in something that i know ;-D

Yes the Book is available in Pakistana and the indian subcontinent. It is being published by shorff Publishers from Bombay.

HTH

Regards,
Ganesh R

thankyou

ab, May 07, 2002 - 8:23 am UTC

thankyou a reader
i am trying, inshallah i will find it from somewhere.


thankyou

ab, May 07, 2002 - 8:24 am UTC

thankyou a reader
i am trying, inshallah i will find it from somewhere.


Thanks Tom

Prince., May 07, 2002 - 2:11 pm UTC

Thank you Tom, for your response.

There are "latches" (V$LATCH) and "latch children" (V$LATCH_CHILDREN). If there is a latch in the parent level will it latches all the children?

Or, will the latches be always on the child level if the latch has children?




Tom Kyte
May 07, 2002 - 2:18 pm UTC

Most of the internal structures are protected by one latch. Some cases (eg: in the library cache for example), we have multiple latches each protecting part of the data structure (different parts of the library cache). These "multiple" latches are called CHILD LATCHES. There is one library cache "parent" with many children.

You will see the child latches being taken, it will be very rare to see a parent latch that has children being taken.

Wanna good book on this topic:

</code> http://www.amazon.com/exec/obidos/ASIN/156592598X <code>
....

Shared Pool one latch?

Dave, May 08, 2002 - 3:37 am UTC

Tom is the shared pool protected by one latch?
When we parse? We only can do one at a time?

Tom Kyte
May 08, 2002 - 7:32 am UTC

There are child library cache latches (as indicated by the above comment:

... Some cases (eg: in
the library cache for example), we have multiple latches each protecting part of
the data structure (different parts of the library cache).....




latches and serialization

prakash eranki, May 08, 2002 - 7:20 am UTC

That was really superb explanation.

Related Question

Prince, May 08, 2002 - 4:25 pm UTC

Tom,

I have another related question.

I have a long running select statement. The wait_stat event shows that one of the wait was on "library cahe" (event: latch_free).

SELECT name, child#, gets, misses, sleeps
FROM v$latch_children
WHERE addr='aaaa';

NAME CHILD# GETS MISSES SLEEPS
-------------------------------- ------- ---------- ---------- ----------
library cache 3 244270286 663240 185443

select * from v$session_event where sid = xx;

EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT
-------------------------------- ----------- -------------- ----------- ------------ ----------
latch free 13799 7492 257 .018624538 6


EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT MAX_WAIT
-------------------------------- ----------- -------------- ----------- ------------ ----------
latch free 14073 7644 268 .019043559 6


Now my question is, once my query is parsed why is it waiting on the "library cache" latch?.

Can you explain, why I am waiting on this event, once my statement is parsed and started executing?

my v$sesstat parse count is not varying/increasing. So my session is not parsing again and again.

Thanks,



Tom Kyte
May 08, 2002 - 4:39 pm UTC

why do you believe it was the library cache latch? More likely the cache buffers chains latches.

If you have my book see chapter 10, i describe how to generate a trace file with the wait events. When the wait even is latch free, the values for:

* P1 = Latch address
* P2 = Latch number
* P3 = Tries

are usful, you can goto

SELECT * FROM v$latchname
WHERE latch# = &P2
;


to see what latch it is really waiting on.


I traced the session during the query execution..

Prince, May 08, 2002 - 6:14 pm UTC

Tom,

Thanks for the update. 

I traced the session while the query was executing.

I used event 10046 on level 12. Is that the right one?

% grep WAIT ora_xxxx.trc > wait.log

WAIT #1: nam='db file sequential read' ela= 0 p1=14 p2=147292 p3=1
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147294 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147302 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147310 p3=8
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147318 p3=2
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147322 p3=2
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #1: nam='db file sequential read' ela= 0 p1=14 p2=147326 p3=1
WAIT #1: nam='db file scattered read' ela= 0 p1=14 p2=147329 p3=8
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=1
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=1
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=1
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=1
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=1
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847740 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=0
WAIT #2: nam='latch free' ela= 0 p1=1677847532 p2=106 p3=1


SQL> select * from v$latchname where latch# = 106 ;

    LATCH# NAME
---------- ----------------------------------------------------------------
       106 library cache

 

Tom Kyte
May 08, 2002 - 9:16 pm UTC

different cursors there.

#1 and #2. Don't know what #2 is -- but the first one #1, looks like your select.

Edit the raw trace file (if you have my book -- chapter 10 describes what you'll see)....


Look in the raw trace file.... You'll find that cursor #2 is something totally different (or many many different statements)

Latches and serialization

Bharath, May 09, 2002 - 11:36 am UTC

Hi Tom,

Let me thank you for the wonderful book. I have two questions

1. How to avoid soft parse? One solution is keep the cursor open till it completes its utility. But If I have a cursor in a formula column of a report how can I keep it open for all the records returned by the query. Is there any workaround for this. And what are the other ways to avoid soft parse?

1. I used the following syntax to generate trace file but no information related to waits were generated?


Thanks,
Bharath

Tom Kyte
May 09, 2002 - 7:32 pm UTC

1) to avoid a soft parse, keep the cursor open. I don't use reports -- never wrote one in my life. I don't know what to tell you there.

1) what syntax. we only print out wait information WHEN there is a wait. perhaps you had none. it happens.

Latches

Bharath, May 09, 2002 - 11:51 am UTC

Sorry. My previous question did not have the command that I had used in my reports.

SRW.DO_SQL('ALTER SESSION SET EVENTS ''10046 trace name context forever, level 12''');

Thanks,
bharath



Tom Kyte
May 09, 2002 - 7:33 pm UTC

You may well not have had any waits.

Thanks Tom. But the other query uses bind variable

Prince, May 09, 2002 - 3:06 pm UTC

Thank you very much Tom.

My original query calls a function which calculates the months since 1970.

The original query was like,

"select col1, col2, myfun(col3_date) from big_table."

Now, why does it cause so many latches though I calculate the months from col3_date to 1970 thru myfun (pl/sql)?

Note: TO_DATE('01-Jan-1970','DD-MM-YYYY'), is not in the right format. Should that be of any concern?

I have your book and I am still reading it. I have Steve Adma's book as well


Thanks,

*********Begin - Entry from RAW trace file ***************
PARSING IN CURSOR #2 len=89 dep=1 uid=34 oct=3 lid=34 tim=3305195552 hv=3915096024 ad='655b4bfc'
SELECT MONTHS_BETWEEN(TRUNC(:b1,'MONTH'),TO_DATE('01-Jan-1970','DD-MM-YYYY')) FROM DUAL
END OF STMT
*********End - Entry from RAW trace file ******************


*********Begin - Entry from tkprof ************************

SELECT MONTHS_BETWEEN(TRUNC(:b1,'MONTH'),TO_DATE('01-Jan-1970','DD-MM-YYYY'))

FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 108042 26.11 25.59 0 0 0 0
Fetch 108042 16.85 17.46 0 108042 432168 108042
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 216084 42.96 43.05 0 108042 432168 108042

*********End - Entry from tkprof ************************


Tom Kyte
May 09, 2002 - 7:50 pm UTC

Because you are calling PLSQL from SQL and that will do that.


You should just call months_between DIRECTLY. Only use plsql in sql when left with no other choice (eg: code is way to hairy to do in sql). Calling that 108k times is "bad".



Excellent

Prince, May 09, 2002 - 8:42 pm UTC

I am wondering, why does a simple "select 'x' from dual" causes "4" db block gets and "1" consistent get?

Also, is it the context switching between pl/sql and sql causes that many library cache latches?

Thanks.

Tom Kyte
May 09, 2002 - 8:56 pm UTC

the 4 db block gets are the reading of the segment header -- you'll see it for all full scans of any table.

I'd have to see the entire plsql routine in order to comment.  It shouldn't in general.  In this case, I would say it was all of the recursive sql being done.

If you compare:

ops$tkyte@ORA817DEV.US.ORACLE.COM> create table run_stats ( runid varchar2(15), name varchar2(80), value int );

Table created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> grant select any table to ops$tkyte;
Grant succeeded.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create or replace view stats
  2  as select 'STAT...' || a.name name, b.value
  3        from v$statname a, v$mystat b
  4       where a.statistic# = b.statistic#
  5      union all
  6      select 'LATCH.' || name,  gets
  7        from v$latch;

View created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> create or replace function f1(p_date in date) return number
  2  as
  3      l_return number;
  4  begin
  5      SELECT MONTHS_BETWEEN(TRUNC(p_date,'MONTH'),TO_DATE('01-Jan-1970','DD-MM-YYYY'))
  6        into l_return
  7        from dual;
  8      return l_return;
  9  end;
 10  /

Function created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> create or replace function f2(p_date in date) return number
  2  as
  3  begin
  4      return MONTHS_BETWEEN(TRUNC(p_date,'MONTH'),TO_DATE('01-Jan-1970','DD-MM-YYYY'));
  5  end;
  6  /

Function created.

ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> 
ops$tkyte@ORA817DEV.US.ORACLE.COM> declare
  2      l_dummy number;
  3      l_start number;
  4  begin
  5      insert into run_stats select 'before', stats.* from stats;
  6  
  7      l_start := dbms_utility.get_time;
  8      for i in 1 .. 1000
  9      loop
 10          l_dummy := f1(sysdate+i);
 11      end loop;
 12      dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' );
 13  
 14  
 15      insert into run_stats select 'after 1', stats.* from stats;
 16      l_start := dbms_utility.get_time;
 17      for i in 1 .. 1000
 18      loop
 19          l_dummy := f2(sysdate+i);
 20      end loop;
 21      dbms_output.put_line( (dbms_utility.get_time-l_start) || ' hsecs' );
 22  
 23      insert into run_stats select 'after 2', stats.* from stats;
 24  end;
 25  /
44 hsecs
25 hsecs

PL/SQL procedure successfully completed.

ops$tkyte@ORA817DEV.US.ORACLE.COM> select a.name, b.value-a.value run1, c.value-b.value run2,
  2         ( (c.value-b.value)-(b.value-a.value)) diff
  3    from run_stats a, run_stats b, run_stats c
  4   where a.name = b.name
  5     and b.name = c.name
  6     and a.runid = 'before'
  7     and b.runid = 'after 1'
  8     and c.runid = 'after 2'
  9     and (c.value-a.value) > 0
 10     and (c.value-b.value) <> (b.value-a.value)
 11   order by abs( (c.value-b.value)-(b.value-a.value))
 12  /

NAME                                 RUN1       RUN2       DIFF
------------------------------ ---------- ---------- ----------
LATCH.done queue latch                  1          0         -1
LATCH.ksfv messages                     1          2          1
LATCH.undo global data                  0          1          1
STAT...free buffer requested            3          2         -1
STAT...parse count (total)              6          5         -1
STAT...opened cursors current           2          1         -1
STAT...opened cursors cumulati          6          5         -1
ve

STAT...enqueue releases                 3          2         -1
LATCH.direct msg latch                  0          2          2
LATCH.session queue latch               0          2          2
STAT...enqueue requests                 4          2         -2
LATCH.redo writing                      1          3          2
STAT...redo entries                    16         13         -3
LATCH.cache buffers lru chain           2          6          4
LATCH.messages                          1          5          4
STAT...db block changes                27         22         -5
LATCH.checkpoint queue latch            2          8          6
LATCH.redo allocation                  14         24         10
LATCH.shared pool                     129        118        -11
STAT...recursive cpu usage             28          4        -24
STAT...redo size                    21488      21432        -56
STAT...buffer is not pinned co       1016         16      -1000
unt

STAT...consistent gets               1011         11      -1000
STAT...execute count                 1005          5      -1000
STAT...table scan blocks gotte       1000          0      -1000
n

STAT...table scans (short tabl       1000          0      -1000
es)

STAT...table scan rows gotten        1000          0      -1000
STAT...no work - consistent re       1008          8      -1000
ad gets

STAT...calls to get snapshot s       1006          5      -1001
cn: kcmgss

STAT...recursive calls               1031         30      -1001<b>
LATCH.library cache                  2281        272      -2009</b>
STAT...db block gets                 4024         17      -4007
STAT...session logical reads         5035         28      -5007
LATCH.cache buffers chains          10093        133      -9960

34 rows selected.


<b>question from me -- why do so many people "select .... from dual" in plsql when a simple assignment would work??</b>

 

What if it is called from within a package?

Prince, May 09, 2002 - 8:54 pm UTC

Also, I forgot to tell you that this function is a part of a package and is mostly used within the package.

In that case, will it have much difference if it is a function or direct "months_between"?

Meantime, let me test the difference.

Thanks,

Tom Kyte
May 09, 2002 - 8:57 pm UTC

package/function/procedure -- won't matter. It's the recursive sql, the context switching. DO IT STRAIGHT IN SQL. It'll be faster, cheaper, better.

Perfect.

Prince, May 09, 2002 - 9:51 pm UTC

The explation was wonderful.

Did some testing and straight sql was way faster.

Thanks.

Also isn't it true we can not always do direct assignment?

Prince, May 09, 2002 - 10:18 pm UTC

IN some cases, like assigning "NEXTVAL", we have to select it from dual. Or, can this also we avoided by some means?


Tom Kyte
May 10, 2002 - 6:40 am UTC

That is correct, NEXTVAL must come from the database, you must select it.

I'm talking about the 99.99% of the times when you don't need to!

Raw trace file waits analysis

Amit, June 21, 2002 - 1:24 pm UTC

Hi Tom,

I have searched the Oracle docs, metalink articles,
browsed your site, and read the chapter in your book discussing
raw trace files and how to interpret them.

I have several questions related to traces with waits, some of which are already addressed in this thread,
so thanks are in order already...

Generally- what are the rules regarding appearances of wait
event lines interspersed between the other activity lines in the
trace file, and what are the rules for associating a wait event
with a statement / activity within a statement.

Any algorithm / insight you could give in this regard will be highly appreciated.


For example-

:
(01) WAIT #1
:
====================================
(02) PARSING IN CURSOR #1 hv=A dep=0
:
(03) EXEC #1
:
(04) WAIT #1
:
(05) EXEC #1
:
(06) WAIT #1
:
====================================
(07) PARSING IN CURSOR #2 hv=C dep=1 -- recursive stmt as result of stmt C
:
(08) WAIT #1
:
====================================
(09) PARSING IN CURSOR #1 hv=B dep=0
:
(10) PARSE #1
:
(11) WAIT #1
:
(12) EXEC #1
:
(13) FETCH #1
:
(14) WAIT #1
:
(15) STAT #1
:
(16) WAIT #1
:


1. Associating waits with stmts.
Consider the waits on cursor #1-
(01): How can we know if this wait is related to stmt "A" ?
could it be related to a previous stmt of CURSOR #1 ?
are there different answers depending on the event itself ?
(04): Since this wait is between two EXEC operations while the cursor
is working on stmt "A", is it always true that this wait
is related to directly to stmt "A" ?
(06): This wait is between an EXEC operation of stmt "A", and
PARSE,EXEC operations of stmt "B".
How can we determine if it is related to stmt "A" or "B" ?
(08): This wait happens after cursor #2 (which is recursive SQL of stmt "B" is parsed).
Does this mean that the wait is related to "B" ?

2. Associating waits with activities.
In the trace file we see the following different activity entries-
PARSE/EXEC/FETCH/UNMAP
BINDS
STAT
Assuming we know how to determine that a wait is associated with a statement,
a) How can we associate the wait with an activity/category (PARSE/EXEC/FETCH/UNMAP/STAT/BINDS) ?
b) which are valid categories ?
c) Are there waits which cannot be assigned directly to an activity / category
(related to something that does not appear in the tracefile) ?
d) For each activity- which waits appear before the activity, and which appear after the activity ?
for example "log file sync" comes after the EXEC of a commit, and I/O waits appear before the EXEC operation.

Thanks


Tom Kyte
June 21, 2002 - 2:31 pm UTC

Everything is linear, everything is linear.

(01) is not related to A since A was not parsed until after (01) happened. That wait was for some other cursor#1.

(04) yes

(06) by the #1 associated with it. It is associated with cursor #1

(08) not unless the wait had a #2 with it, which it doesn't, hence it is associated with #1.

2)

a) Everything is linear. (waits won't be associated with stat, binds btw. They are purely "informational" messages -- not database activity). Look up at the last thing that took place for the WAIT #1 (or #2 -- whatever)

b) catagories of what? I don't know what categories you mean.

c) no.

d) everything is linear. waits are recorded as they happen. If a wait happened before an activity -- it was from the PRIOR activity. You may wait for log file syncs during the execution of a statement if you had to wait for room in the redo log buffer as well. they can happen anywhere.



values for cache buffer lru chain

A reader, November 12, 2002 - 4:46 am UTC

Hi

I have a question regarding DB_BLOCK_LRU_LATCHES. Oracle suggest it to be 2 * CPU however, if I have 2 instances can i use that rule too???

For example I have a 8 CPU HP_UX Server with two instances, can I set DB_BLOCK_LRU_LATCHES to 16 for both of them?

Tom Kyte
November 12, 2002 - 10:22 am UTC

yes (and in current releases, you are freed from having to even consider this -- no such parameter exists anymore.

about LRU latch, again

A reader, November 12, 2002 - 5:24 pm UTC

Hi

Thanks for the reply. I always thought when Notes on Metalink suggest values for parameters are per server basis, for example for LRU latch it states it has to be twice the number of CPUs so my logic was telling me that then if there are more instances I should get that value and divide the number of LRU latches into number of instances! Something like if I have 8 CPU, 4 instances I would use (2 * 8)/4 and set that value to each instances' LRU latch...
Now, seems my logic was incorrect :-)
We are using Oracle 8i and not going to migrate to any new version probably for a long time (just migrated the app from oracle 7 to 8i) so I guess I will have to play with these parameters

thank you



latches serialize

A reader, November 12, 2002 - 5:29 pm UTC

Hi

You said that latches serialize operations, my question is what if I have more than one child latch for a specific latch? Or that is not possible?

When parse a query one of many latches required by Oracle is library cache latch. Is there only one parent latch for this one? Or there are several child latches like cache buffer chain (one per data block in data cache)? If there are more than one library cache latch then does it parallelize several parses or all parses are done serially one after another?


Tom Kyte
November 13, 2002 - 12:44 pm UTC

we divvy up some chunks of memory -- eg: there is more then one library cache each protecting different regions of the library cache -- to increase concurrency.

btw: with the cache buffer CHAIN (chain=list) it is not one per block, but rather per set of blocks in a chain.

how to avoid latch free waits

H, December 12, 2002 - 2:58 pm UTC

How do we avoid latch free waits. Any thoughts?

Tom Kyte
December 12, 2002 - 3:03 pm UTC

what latches are you waiting on....

99 times out of a 100 its library or shared pool -- indicating you aren't using bind variables. use bind variables.

Latch waits

h, December 12, 2002 - 4:07 pm UTC

lib cache
cache buffer chains
redo copy

I dont know how much these waits are impacting performance. I am gen. the tkprof hopefully that will give some clues.

Tom Kyte
December 12, 2002 - 4:08 pm UTC

tkprof won't, statspack is what you want.

run a 15 minute window (take a snap, wait 15 minutes, take a snap)

then look at your top 5 waits.

review

Sven, May 12, 2004 - 8:59 am UTC

Hi Tom,

Recently when I wanted to compile one package my session start hanging and when I look into v$session_wait I sow that session is waiting on "library cache lock". I am looking currenly in docs. what could be the reason for this, but if you have time could you give me some info about "library cache lock"?

Thanks,

Sven


Tom Kyte
May 12, 2004 - 6:44 pm UTC

someone was running it. you cannot compile it while that is true. (probably meant library cache pin). you were trying to get something exclusively that others had already and were waiting for it.

review

Sven, May 13, 2004 - 7:29 am UTC

Not quite....The latch was "library cache lock" not the "library cache pin". Basically I have tried to compile the procedure that only I use and on a objects that only I use and got above message. What I have managed to figure out that DB has tried to search the library cache chain and DB has tried to obtained a lock in order to locate an object in the library cache, but somehow this chain has been already locked by some other user. If my scenario is correct I can not undestand how it is possible that some else who has issued some other SQL statement is in the same chain like "alter package xxx compile", or I am wrong?

Tom Kyte
May 13, 2004 - 10:35 am UTC

that lock is gotten during a parse and/or compilation. It should be held for very very short periods. If you can reproduce that, please contact support.

Arun Gupta, May 13, 2004 - 11:47 am UTC

The scenario described by Sven happens almost everyday to our DBAs. This is what happens:
a) Developer runs a stored proc in a package. Due to a code bug or a long running query, the session seems to hang.
b) The developer realizes the mistake, closes the SQL*Plus window (or TOAD or Golden) and asks me to kill his session in Oracle.
c) I kill the session, but Oracle is not able to cleanup the user process at OS level. The session is shown as KILLED in v$session.
d) The developer opens a brand new SQL*Plus session and tries to compile the same package, his session hangs.
e) I kill the user's processes at OS level and the session which was hanging in d) proceeds.

I have seen that if I do not kill the user's OS process, this situation can persist for several days if the database is not restarted. Maybe this is what is happening here.



Tom Kyte
May 13, 2004 - 3:01 pm UTC

that would/should show up as the library cache pin.

More questions on latches

Arun Gupta, June 03, 2004 - 3:51 pm UTC

Tom,
I have some questions about latches:
a) Is latch address the memory location which indicates whether or not the latch is available?

b) If the latch requested with an immediate mode is not available the requesting process continues processing. Does this mean that acquiring the latch was not important enough for the requesting process?

c) The statspack report shows many different type of latches as:
library cache kglupc: child 0 821 831
library cache kglpndl: child: before pro 0 775 961
library cache kglpin: child: heap proces 0 647 1,148

Is there any documentation on the meaning of kglupc, kglpndl, kglpin etc.?

Thanks

BTW the Name and Email text boxes are now yellow colored. Does this have any significance?

Tom Kyte
June 03, 2004 - 4:12 pm UTC

a) its a pointer to the protected thing, the thing being protected.

b) means it could do something else, it would have to come back if it needs it. or it means there are other places it could do work -- eg: the redo copy latch, if one immediate get fails -- no worry, there are others, we'll try them.

c) those are internal names, not meaningful too much outside of support.

(they are not yellow to me?)

Arun Gupta, June 03, 2004 - 5:24 pm UTC

Thanks for reply. The yellow color in text box is caused by the Google toolbar. Sorry...!!

Latch Deadlocks

Vivek Sharma, July 01, 2004 - 10:12 am UTC

Dear Tom,

I have a book written by Steve Adams and published by O'Reilly. The Name of the Book is "Oracle 8i Internal Services for Waits, Latches, Locks and Memory". The book gives a very good indepth knowledge of oracle internals.

In the 3rd Chapter on Latches, the author explains about Latch Levels and possiblility of Latching deadlocks. The paragraph states that

"It is very common for an Oracle process to need to hold a number of latches concurrently. Therefore, there might be a possibility of latching deadlocks occuring - namely, one process holding Latch A and another process holding Latch B, and both process spinning and waiting for the alternate latch. Oracle ensures that this cannot happen by ensuring that latches are always taken in a defined order, when more than one latch is required. To support this, every latch in Oracle has a level between 0 and 15, and a 2-byte bitmap is maintained for every process representing the level of the latches that the process is currently holding. When a process attempts to get a latch in willing-to-wait mode, a check is made to ensure that it is not already holding a latch at the same level or at a higher level. In general, if this rule is broken, an ORA-600 [504] internal error is raised."

A Paragraph just below the above states that

"Contention for a high-level latch such as the redo allocation latch(level 6) can easily exacerbate contention for lower-level latches such as the cache buffers chains latches (level 1). This happens because the processes
needing the higher-level latch have to sleep while holding a lower-level latch. So the lower-level latches are held for much longer than normal. An indication of this factor is available in WAITS_HOLDING_LATCH columns of the v$latch family views. That statistic represents the number of times
that a process missed on this latch while holding another latch. For example, the WAITS_HOLDING_LATCH statistic for the 'redo allocation' latch counts both 'spin gets' and 'sleep gets' against the 'redo allocation' latch while
holding another latch such as a 'cache buffers chains' latch. However, it could also include other waits such as log buffer space waits."

Hence the author says that the tuning should be done based on the descending order of the latch level if the WAIT_HOLDING_LATCH statistics appears to be a significant factor in contention for that latch.

In this scenario, the session would show high waits on cache buffer chains latch rather than redo allocation latch, whereas the problem was the wait event
log buffer space.

I have a question on this :

1.What are your comments on this ?

2 How do I know what all latches have been acquired by a user process ? Which v$view or x$view will help me getting this information.

3 Does "Latch Activity" Section of Statspack reports takes care of this ? I feel Statspack report will just show the latches based on the descending order of waits rather than the levels (Waits / Latches) due to which they had to wait.

Your explanation will help me understand the things more clearly.

Awaiting your reply.

Thanks and Regards
Vivek


Tom Kyte
July 01, 2004 - 11:30 am UTC

1) steve is a smart guy, exceptionally smart. looks factual to me.

2) many times you have to do process dumps, however, latches by design are held for TINY amounts of time - two things -- 1) by the time you see what they have, they have changed and in fact, they are changing while you are looking (the latches reported at a top of a dump might not be held by the time the dump was finished printing), 2) the very act of you LOOKING changes the behavior of the lookee (sort of like the heisenburg uncertaintity principle here...)

3) no it does not. it cannot, it is just information you can use when looking at overall activity. truthfully -- this is to me microdata that doesn't come into play 99.999% of the time when 'tuning'. I find 'tuning' much easier if you concentrate on *the application* and not *the system*. I personally have never gone to this level. I avoid internals as much as possible (and have been very successful at doing so, I don't use "source code", I don't use magic parameters, I use the same information available to everyone.

cache buffer chains, latch free hangs the whole database

Marvin, July 12, 2004 - 4:39 pm UTC

Hi

Our database was running smoothly until today, no changes are made, no parameter changes, no application modifications and statistics are up to date.

This afternoon suddently we see 200 concurrent sessions where 100 are waiting for LATCH FREE, 66 so it's cache buffer chains. We rebooted twice the instance and problem continues. I dont understand, when I see v$latchholder I can identify the session, however if I check v$session?Wait it's event is latch free also and the second of wait increases and I am not able to see who is blocking this latch holder. If I check v$latch I can see gets are increasing very very slowly. If I check v$session?Wait, p1, p2 all 100 sessions which suffers latch free are waiting for same p1 and p2. If I kill the latch holder, who is running a small report which normally takes 2 minutes, pmon starts to consume 100% CPU, I dont understand what need to be freed by PMON if this is a query so there are no locks involved? And PMON takes like 20 minutes to clean up however after pmon finishes we see latch free increasing again.

What can be the cause oif this. How can a query hangs the whole database? And all happens suddently!


We are using 8.1.7.4 on Suse Enterprise Linux 7.

Tom Kyte
July 12, 2004 - 11:36 pm UTC

if you have a hang issue -- support is the first and primary place to go (i travel lots, actually on vacation this week -- not the best place to go with a P1 sort of issues)

Identifying latch free waits

Thiru, July 28, 2004 - 10:49 pm UTC

Hi Tom,

Running statspack in a 5 min period reveals the following numbers. A test script comprising of queries that was being executed it 1000's of times a day and have been identified to be the culprit. The queries are severly computational intensive though the no of records that are accessed are only in a few thousands. Like to get your feedback. 
Excuse me asking questions inbetwen the report.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:  100.00       Redo NoWait %:              100.00
            Buffer  Hit   %:  100.00    In-memory Sort %:              100.00
            Library Hit   %:  100.00        Soft Parse %:               99.98
         Execute to Parse %:   85.58         Latch Hit %:               99.68
Parse CPU to Parse Elapsd %:   31.17     % Non-Parse CPU:               97.76

 Shared Pool Statistics        Begin   End
                ------  ------
             Memory Usage %:   34.82   35.06
    % SQL with executions>1:   96.05   95.97
  % Memory for SQL w/exec>1:   93.57   93.66

                                      Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
latch free                          1,646        856         32     24    149.6
log file sync                          62          0          2     38      6.9

The top query has the following:

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value

--------------- ------------ -------------- ------ -------- --------- ----------

      1,566,936          405        3,805.4   61.2    25.91     93.39 1362401524

Does the figures look ok to you. How do we figure out the latch free waits going on?
What is the Timeout mean?
The memory usage looks ok.
The parse cpu to parse elapsed I feel is low at 31%. How do we get to more details on
this?

Also the other stats of my database:

Why is session pga memory maxed out at 331080. You have recommended starting 
pga_agg_target at 500m. Is the settings below to do something with the query above 
taking the time?

SQL> sho parameter pga

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------
pga_aggregate_target                 big integer 104857600


SQL> sho parameter policy

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------
workarea_size_policy                 string      AUTO

SQL> select * from my_stats where value >0 order by value desc;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
session pga memory                                                   331080
session pga memory max                                               331080
session uga memory                                                   143896
session uga memory max                                               143896
bytes sent via SQL*Net to client                                       1210
bytes received via SQL*Net from client                                  431
recursive calls                                                          34
buffer is not pinned count                                               14
session logical reads                                                     8
workarea memory allocated                                                 8
consistent gets                                                           8

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
opened cursors cumulative                                                 7
calls to get snapshot scn: kcmgss                                         7
table fetch by rowid                                                      7
execute count                                                             7
parse count (total)                                                       7
no work - consistent read gets                                            6
user calls                                                                4
SQL*Net roundtrips to/from client                                         4
recursive cpu usage                                                       2
consistent gets - examination                                             2
enqueue requests                                                          2

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
enqueue releases                                                          2
logons cumulative                                                         1
rows fetched via callback                                                 1
parse time cpu                                                            1
parse count (hard)                                                        1
parse time elapsed                                                        1
index fetch by key                                                        1
opened cursors current                                                    1
CPU used when call started                                                1
CPU used by this session                                                  1
logons current                                                            1


 

Tom Kyte
July 29, 2004 - 7:33 am UTC

what were the latches for and is 32 seconds of wait time over ALL of the sessions out there relevant?

If you want to tune an application, don't use statspack, use tkprof.

A reader, July 28, 2004 - 10:52 pm UTC

I missed out on the following info:

Buffer Cache: 208M Std Block Size:
8K
Shared Pool Size: 256M Log Buffer:
512K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 2,922.59 66,245.33
Logical reads: 10,942.14 248,021.78
Block changes: 6.90 156.33
Physical reads: 0.02 0.44
Physical writes: 0.00 0.11
User calls: 115.35 2,614.56
Parses: 72.87 1,651.67
Hard parses: 0.01 0.33
Sorts: 48.28 1,094.33
Logons: 5.75 130.33
Executes: 505.43 11,456.44
Transactions: 0.04

% Blocks changed per Read: 0.06 Recursive Call %: 85.47
Rollback per transaction %: 0.00 Rows per Sort: 51.40

A reader, July 29, 2004 - 11:39 am UTC

All of the latch free had p2=97,156,157 ie., library cache, library cache pin or cache buffers chain.

Tom Kyte
July 29, 2004 - 1:31 pm UTC

means you spend lots of time parsing over and over (also why the parse cpu/ela is low, you are waiting to parse sql)

session_cached_cursors can help.

parse once, execute many is the real solution if you can do that.

Thiru, July 29, 2004 - 2:15 pm UTC

I dont see session_cached_cursors view. 9.2.03 ver. Can we do a query that parses once and executes many times. In this case of ours, the select on the view is called number of times by different sessions. 

SQL> select CHILD#  "cCHILD"
  2  ,      ADDR    "sADDR"
  3  ,      GETS    "sGETS"
  4  ,      MISSES  "sMISSES"
  5  ,      SLEEPS  "sSLEEPS" 
  6  from v$latch_children 
  7  where name = 'cache buffers chains'
  8  order by 4 desc, 1, 2, 3
  9  
SQL> /

    cCHILD sADDR                 sGETS    sMISSES    sSLEEPS
---------- ---------------- ---------- ---------- ----------
        48 0000040093BD9270    3205499      64039      14289
       874 0000040093CB5FD0    6580169      53812      10962
       786 0000040093C9E9D0    2310133      24966       2022
       683 0000040093C83410    2462041      19295       8150
       682 0000040093C82FD0    3688529      15910       2634
       508 0000040093C54C50     886247       8973        168

Any more suggestions on finding out whether the query itself needs tuning or to raise the buffer cache area.

Thanks. 

Tom Kyte
July 29, 2004 - 2:21 pm UTC

session_cached_cursors is an init.ora parameter -- to enable a softer soft parse.


if your major waits are library cache related, it is due to parsing, not the performance of the query itself.

Thiru, July 29, 2004 - 10:34 pm UTC

Will individual queries put in a stored proc do better on parsing? If I have 5 queries in a stored proc, then will the parse count be 1 instead of 5 for 5 individual queries?

Thanks.

Tom Kyte
July 30, 2004 - 7:35 am UTC

no, they will still be parsed. the nice thing about plsql is that it will parse the statement once per session in general -- instead of once per execution as most programmers would code.

A reader, July 30, 2004 - 4:19 pm UTC

I find latch free (cache buffer chains) still going on and introducing session_cached_cursors degraded the performance. I also put sql's in a stored proc causing more elapses time on parsing compared to earlier. By the way I found this on meta:

Bug:2079526 - FREE BUFFER WAITS DURING CHECKPOINT If the number of dirty buffers that need to be written to disk is larger than the number that dbwr can write per batch, this bug may cause degradation in database performance. The wait events in V$session_wait or statspack will show lots of waits on 'latch free' (for "cache buffers chain" latches) and/or waits on 'free buffer waits'.

Do you feel this could be the cause. 9.2.0.3 ver.

Tom Kyte
July 30, 2004 - 6:16 pm UTC

not that i don't believe you -- but prove it, i've never seen session cached cursors degrade performance in a manner you could detect in statspack. How did you correlate this?


putting sql into plsql is going to have NO NET EFFECT either. the code paths to parse are *identical*


if you are suffering from library latch contention -- looking at free buffers waits is going to serve what purpose?

A reader, August 02, 2004 - 12:52 pm UTC

A procedure with just one sql query is being executed at a rapid interval.
I find the trace file shows "Misses in library cache during parse" in
all the trace files. The number
varies sometimes between 5 and 18. Why is this happening even after
executing DBMS_SHARED_POOL.KEEP for the procedure.
What steps could I take to avoid these misses.
My latch free waits are for 97,156.

Tom Kyte
August 02, 2004 - 1:04 pm UTC

the procedure might be kept, but the sql cursor isn't being kept.

sounds like you either

a) need to increase shared pool size, it is getting kicked out

b) you have people performing operations that cause the cursor to be invalidated -- like a truncate, a revoke, an alter, an analyze, something.


I stopped at 97,156 red lights on my last really long car trip. good or bad? you cannot tell, insufficient data.

A reader, August 02, 2004 - 1:23 pm UTC

Excuse me. I meant 156 and 97 as Latch#.

Even when I run the procedure from sqlplus, I do see mis=1 in the trace.

I had mentioned about many misses. They are almost 1 to 1 for every parse. 16 parses. 16 lib cache misses. We are testing these runs from a java code through jdbc call that puts simultaneous bursts of requests to the db and each time the alter session for 10046 trace level 12 is called at the head.

Shall try increasing shared pool size. But my statspack shows around 55% mem usage for shared pool under efficiency percentage part of the report.

Tom Kyte
August 02, 2004 - 2:22 pm UTC

do you see lots of child cursors in v$sql for that sql?

describe the process here - you would expect a hard parse the first time, but soft after that. is this a single statement parsed 16 times?

A reader, August 02, 2004 - 2:03 pm UTC

Does incresig session_cached_cursor parameter use a good part of the lib cache. I had upped it to 300 from 0.

Tom Kyte
August 02, 2004 - 3:33 pm UTC

it should not affect the library cache at all, it is a client thing. it'll increase the size of the client (pga/uga) marginally.

latch free waits

Thiru, August 05, 2004 - 10:00 am UTC

When the alter session set events was used in the stored proc itself running the sql queries, then there
was no misses in lib cache while when the alter session was used as a separate sql statement call from the jdbc and
then calling the stored proc, resulted in as many hard parses as there were calls. Is there any explanation for this? Inspite of
all this, I find lots of p2=97 latch free wait, though removing functional calls in the view improved the performance.
How does a select statement called 400 times ( in our case) through a loop result in hot blocks?
Is there a way to do some changes in the buffer cache to isolate these latch free waits?
I increased the buffer cache size from 200 to 350m but saw the no of chain links remaining at 1024. What would be the
best approach to this issue?

Thanks for all what you do.

Tom Kyte
August 05, 2004 - 1:05 pm UTC

enabling trace can cause a child cursor to be created itself.


if 1,000 people concurrently run the same sql statement hitting the same block(s) - you have a hot block.


library cache latch and hard parse

sven, April 23, 2005 - 6:14 pm UTC

Hi Tom,

During the hard parse operation, which latch have to be held by the process: library cache latch or library cache load lock latch?

Thanks,

sven

Tom Kyte
April 23, 2005 - 7:04 pm UTC

well, if we compare hard parse 10,000 times vs hard parse 1 time and just bind/execute 10,000 times:

ops$tkyte@ORA9IR2> create table t ( x int );
 
Table created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace procedure p1
  2  as
  3          l_cnt number;
  4  begin
  5          for i in 1 .. 10000
  6          loop
  7                  execute immediate 'select count(*) from t where x = ' || i into l_cnt;
  8          end loop;
  9  end;
 10  /
 
Procedure created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> create or replace procedure p2
  2  as
  3          l_cnt number;
  4  begin
  5          for i in 1 .. 10000
  6          loop
  7                  select count(*) into l_cnt from t where x = i;
  8          end loop;
  9  end;
 10  /
 
Procedure created.
 
ops$tkyte@ORA9IR2>
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_start
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec p1
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_middle
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec p2
 
PL/SQL procedure successfully completed.
 
ops$tkyte@ORA9IR2> exec runStats_pkg.rs_stop(1000)
Run1 ran in 527 hsecs
Run2 ran in 53 hsecs
run 1 ran in 994.34% of the time
 
Name                                  Run1        Run2        Diff
STAT...parse count (hard)           10,001           3      -9,998
STAT...opened cursors cumulati      10,004           5      -9,999
STAT...parse count (total)          10,004           5      -9,999
LATCH.row cache enqueue latch       40,000           4     -39,996
LATCH.row cache objects             40,000           4     -39,996
LATCH.child cursor hash table       80,008          26     -79,982
LATCH.library cache pin alloca      80,027          37     -79,990
LATCH.library cache pin            110,061      20,073     -89,988
LATCH.shared pool                  151,968      10,173    -141,795
LATCH.library cache                210,116      20,156    -189,960
 
Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
775,553     113,629    -661,924    682.53%
 
PL/SQL procedure successfully completed.


The big use latches are there at the bottom. 

review

sven, April 24, 2005 - 6:38 am UTC

Thanks Tom.

What I can not understand is the use of:
1. library cache hash chains latch
2. library cache load lock latch

Library cache load lock latch have the same latch level as library cache latch meaning the process can not hold them both at the same time and the library cache hash chains latch have the higher level than cache and load lock latches. During parsing (hard/soft) the process hold only library cache latch.
Do you happen to know in which cases the process need to obtain above mentioned latches?


Thanks.

sven

Tom Kyte
April 24, 2005 - 9:56 am UTC

1* select * from v$latchname where name like 'library cache%'

LATCH# NAME
---------- ------------------------------
157 library cache
158 library cache pin
159 library cache pin allocation
160 library cache load lock


hash chains?

the load latch is used to protect "loads" into the library cache - things we load in there to do other stuff.

review

sven, April 24, 2005 - 11:34 am UTC

select *
from v$latch
where (name like 'lib%'
or name like 'shared%')

ADDR LATCH# LEVEL# NAME
40B8D988 198 7 shared pool
40B8D988 199 5 library cache
40B8D988 200 6 library cache lock
40B8D988 201 6 library cache pin
40B8D988 202 3 library cache pin allocation
40B8D988 203 3 library cache lock allocation
40B8D988 204 5 library cache load lock
40B8D988 205 9 library cache hash chains
40B8D988 236 6 shared server info


Tom Kyte
April 24, 2005 - 11:57 am UTC

that's why versions are always 100% relevant. 10g.

Looks like new latches in 10g. Looks like it is related to the hashing of sql into the shared pool.

Are you having an issue with them?

review

A reader, April 24, 2005 - 11:49 am UTC

Tom,

Regarding your example, also is interesting that on a Run2 session had only 3 parse count (hard) but on the other hand the session had to obtain almost 11000 (and the PL/SQL block is executed 10000) times the shared pool latch. Why that much? I would expect much less shared pool latches in Run2?

If the session is performing a hard parse, does the sessions have to during the parse hold both the library cache lock and library cache pin on a shared cursor at the same time or the library cache lock is hold only until the library cache pin is obtained?

Also, does the library cache pin latch is hold until the library cache pin on a shared cursor is obtained and after getting the library cache pin on a cursor the library cache pin latch is released?

Thanks,

sven


Tom Kyte
April 24, 2005 - 11:59 am UTC

Have to make sure the sql is still there after all -- you cannot execute that which no longer exists. You can get misses in the library cache during execute.

You got much less latching in run2 -- 1/7th the latching. But you still have to latch.



review

Sven, April 24, 2005 - 12:08 pm UTC

Tom,

Does the library cache lock latch is hold until session get library cache lock or...?

Thanks,

sven


Tom Kyte
April 24, 2005 - 1:02 pm UTC

I don't really track it down to that level of detail. Goes beyond where I normally go.

library cache lock waits

max, April 26, 2005 - 8:21 am UTC

hi tom,

we have an application server runnig some query statements. some of these statements seemed to be hanging today. i put some of these query statements into our 9iR2 (9.2.0.4) database via sqlplus. i saw in v$session_wait that my session was waiting on "library cache lock" in following. i canceled the query (STRG+C), killed some user processes i thought they might causing these waits, executed the query again, got same effect, canceled the query again,... i turned sql tracing on... and saw in the trace file that also the "alter session set events..." statement is causing "library cache lock" waits. see snipet below:

*** 2005-04-26 10:48:39.148
*** SESSION ID:(99.11028) 2005-04-26 10:48:39.107
Version: Generic_108528-23
Machine: sun4u
Instance name: xxxx
Redo thread mounted by this instance: 1
Oracle process number: 112
Unix process pid: 22234, image: oracle@xxxx (TNS V1-V3)

*** 2005-04-26 10:48:39.148
*** SESSION ID:(99.11028) 2005-04-26 10:48:39.107
APPNAME mod='SQL*Plus' mh=3669949024 act='' ah=4029777240
=====================
PARSING IN CURSOR #3 len=69 dep=0 uid=41 oct=42 lid=41 tim=27551076876581 hv=2004533713 ad='f27efba8'
alter session set events '10046 trace name context forever, level 12'
END OF STMT
EXEC #3:c=0,e=219,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=2,tim=27551076836565
WAIT #3: nam='SQL*Net message to client' ela= 8 p1=1650815232 p2=1 p3=0
WAIT #3: nam='SQL*Net message from client' ela= 5981933 p1=1650815232 p2=1 p3=0
WAIT #3: nam='library cache lock' ela= 3001981 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007660 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007634 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3003757 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3001727 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007529 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007628 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007646 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007586 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3008359 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3006908 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007654 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007712 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007643 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007524 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3015973 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3008983 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007606 p1=16682348264 p2=17098128296 p3=201
WAIT #3: nam='library cache lock' ela= 3007638 p1=16682348264 p2=17098128296 p3=201
...

Q1) any idea about that?

Q2) how can i find out what is going on? how can i find out WHO is causing this "library cache lock" wait to kill the responsible sessions? please bear in mind that i got the same wait stat's as i put the query in there for the first time!

Q3) i have your excellent book "oracle expert one on one". is this topic covered in there?



Tom Kyte
April 26, 2005 - 8:41 am UTC

this is an unusual situation, I would recommend utilizing support to walk through this one. It isn't something that should happen. that lock is taken during parsing of various statements like create tables, views, and such.

library cache lock

max, April 26, 2005 - 9:18 am UTC

hi tom,

thank you for your immediate reply. i will ask support.

anyway. is there a way to find out WHO is responsible for this library cache lock wait situation? you told us that lock is taken out during parsing and compilation of statements. if i can find out who is responsible for that i simply could kill their sessions. no?

Tom Kyte
April 26, 2005 - 9:44 am UTC

Ok, I simulated this by setting up a trigger:


create or replace trigger after_create
after create on schema
begin
    dbms_lock.sleep(1000);
end;
/


In one session, I issued a create or replace procedure p, and in another turning on 10046 level 12 trace and tailed the trace file from another window.  In that traced session I did the create or replace as well.

It blocked and the trace file had:

WAIT #1: nam='library cache lock' ela= 2997975 p1=1620355776 p2=1607628660 p3=301
WAIT #1: nam='library cache lock' ela= 2997981 p1=1620355776 p2=1607628660 p3=301
WAIT #1: nam='library cache lock' ela= 2997988 p1=1620355776 p2=1607628660 p3=301

Now, the p3 is different -- the 3 is the mode, I was going for Exclusive, you were going for Share mode.  The namespace is the same (1) -- table/view/sequence/etc...

DBA_DDL_LOCKS was useful:

ops$tkyte@ORA9IR2> select session_id, owner, name from dba_ddl_locks where mode_held = 'None' and mode_requested <> 'None';
 
SESSION_ID OWNER                          NAME
---------- ------------------------------ ------------------------------
        16 OPS$TKYTE                      P


That was the blocked session (the one getting traced), it didn't have a lock on the library object, was trying to get one...


ops$tkyte@ORA9IR2> select session_id, mode_held, mode_requested
  2  from dba_ddl_locks where owner = 'OPS$TKYTE' and name = 'P';
 
SESSION_ID MODE_HELD MODE_REQU
---------- --------- ---------
        14 Exclusive None
        16 None      Exclusive


would show me who has what and who is waiting for it. 

library cache lock

max, April 26, 2005 - 10:15 am UTC

thanks again. lets go one step further. i think to make really sure that these are the both sessions conflicting for real we might follow the chain: dba_ddl_locks.session_id = v$session.sid and v$session.sql_address = v$sql.address pointing to the same sql statement. no?

Tom Kyte
April 26, 2005 - 10:30 am UTC

you'd be looking for the thing you want.... (dba_ddl_locks query 1)

and whose got it in a conflicting mode (dba_ddl_locks query 2)

for me, in 9i -- since ddl isn't in v$sql -- one step further probably wouldn't have happened....)

library cache lock

max, April 26, 2005 - 11:04 am UTC

your last query shows all sessions having or requesting exclusive ddl locks. having n developers working on n-1 different procedures will result in n-1 exclusive holds and 1 wait. cumbersome. no? isn't there a connection between the both sessions as with dml locks?



Tom Kyte
April 26, 2005 - 11:09 am UTC

any one of the exclusive locks would be enough -- no?

give it a try when you get blocked -- your circumstances are a little bit different than mine, see what you see. for example WHY is an alter session even getting this, what could it be waiting for. That in itself would be very interesting.

library cache pins

abc, June 16, 2005 - 7:58 pm UTC

PRd1
EVENT
----------------------------------------------------------------
SQL*Net break/reset to dblink
SQL*Net message from client
SQL*Net message from dblink
SQL*Net more data to dblink
db file sequential read
library cache pin
pipe get
pmon timer
rdbms ipc message
smon timer
SQL> select count(EVENT) from v$session_wait;
COUNT(EVENT)
------------
179
 

Analyze statement stuck on LIBRARY CACHE LOCK

Nags, July 18, 2005 - 11:20 am UTC

I am running the following 

ANALYZE TABLE COMPANY COMPUTE STATISTICS;

This statement is waiting for LIBRARY CACHE LOCK.  This is the only statement that is waiting for this lock.  There is no other session.  How do I identify what else is the bottle neck ?

I am putting in some of the info in the v$latch table.

SQL>select name name,  gets
  2   from v$latch
  3  where gets != 0
  4* and name like '%cache%';

NAME                                           GETS
---------------------------------------- ----------
cache buffers lru chain                      169387
cache buffers chains                      122791245
cache buffer handles                            923
sequence cache                               339591
row cache enqueue latch                     4141946
row cache objects                           5060219
library cache                              34142090
library cache pin                          18004906
library cache pin allocation               11932172
library cache load lock                        2642

10 rows selected.

SQL>  select name name,  gets
  2   from v$latch
  3  where gets != 0
  4* and name like '%lock%';

NAME                                           GETS
---------------------------------------- ----------
multiblock read objects                        1304
OS file lock latch                                2
dml lock allocation                         1516411
list of block allocation                     219078
user lock                                    438100
library cache load lock                        2642

6 rows selected. 

Tom Kyte
July 18, 2005 - 11:34 am UTC

what else is going on in the database, what are the other sessions doing. could someone being doing ddl to this table

and why analyze? and not dbms_stats?

Analyze statement stuck on LIBRARY CACHE LOCK

Nags, July 18, 2005 - 11:50 am UTC

There are no DDL's running. All other sessions are doing normal DML queries. One wierd behaviour is, the status of several sessions remain ACTIVE although, if I drill down, it does not give any queries that are running in those sessions. The top waits gives only LIBRARY CACHE LOCK and the Analyze statement is stuck on it.

I was doing DBMS_STATS, but this hung for more than 3 days, so I killed it and started running updates statistics.

Another observation is, oracle.exe is using 100% CPU on the machine and no queries are running. We shutdown all the application and the CPU utilization for oracle.exe stays at 100 %.



Tom Kyte
July 18, 2005 - 12:18 pm UTC

those sessions, what are they doing then? DDL won't show up in whatever tool you are using perhaps.

if you have some things active -- I'm sorry, but you are running *sometime*

I'll suggest you work with support on this so they can get your background and history here.

Analyze statement stuck on LIBRARY CACHE LOCK

Nags, July 18, 2005 - 12:25 pm UTC

I am the only person who runs DDL. The application cannot run DDL. I am 100% sure that only DML's are running.

I am using Enterprise Manager of Oracle.

My experience tells me that the status of sessions will stay active only if it is running queries, otherwise it goes back to inactive. Is my observation wrong ?

But, currently on this server, when I drilldown using Enteprise Manager, on all the ACTIVE sessions, it tells me that "No SQL is being executed by this session".

Tom Kyte
July 18, 2005 - 12:53 pm UTC

If the active sessions were in fact running DML, you would see their DML.

... the status of several sessions remain ACTIVE although,
if I drill down, it does not give any queries that are running in those
sessions. ....

that leads me to believe that perhaps -- what you say isn't what is actually happening. I've a gut feeling that says "they are executing DDL". If they are active and you see no sql, tell me -- what are they executing then?

Analyze statement stuck on LIBRARY CACHE LOCK

Nags, July 18, 2005 - 1:51 pm UTC

It is impossible for the application to use DDL.  I know the application in and out and the application framework itself does not have any support to run DDL.  Moreover, all the sessions that are remaining ACTIVE are from the application.  I checked the session properties.  I am not being rude, but these are the facts.

"If they are active and you see no sql, tell me -- what are they executing then? "

This is the exact problem I am having.  I do not see any query executing and I am not able to figure out what the sessions are doing.  The sessions have been in ACTIVE status for last 4 days.  

I ran this statement to figure out what queries are running and I got the following :

SQL> l
  1* select substr(sql_text, 1, 50) sql_text, users_executing from v$sql where users_executing > 0
SQL> /

SQL_TEXT                                           USERS_EXECUTING
-------------------------------------------------- ---------------
SELECT COUNT(:"SYS_B_0") FROM (SELECT  POETIC_USER               1
select substr(sql_text, :"SYS_B_0", :"SYS_B_1") sq               1
/* OracleOEM */ select value from v$sesstat where                1
SELECT  BACKGROUND_JOB.JOB_DESCRIPTION,T                         1
SELECT  BACKGROUND_JOB.JOB_DESCRIPTION,T                         1
SELECT  LKP_STATIC_DATA_DETAIL.LOCALIZED_DESCRIPTI               7
SELECT * FROM POETIC_VERSION                                     2
SELECT * FROM POETIC_VERSION                                     1
SELECT SEQUENCE_NAME FROM SEQUENCE_ID                            2
SELECT SEQUENCE_NAME FROM SEQUENCE_ID                            3
ANALYZE TABLE COMPANY COMPUTE STATISTICS                         1
Select id_value from SEQUENCE_ID where SEQUENCE_NA               1
Select id_value from SEQUENCE_ID where SEQUENCE_NA               1
/* OracleOEM */ select value from v$sysstat order                1

14 rows selected.
 

Tom Kyte
July 18, 2005 - 2:09 pm UTC

I'll recommend you contact support, they can help you take a process dump to see exactly what these things are.

Similar Problem - Analyze Hung and Blocking other sessions

Shreyas, July 19, 2005 - 1:34 am UTC

Hi Tom,

Our database was apparently "hung" yesterday - and I was able to notice 2 things.

1. One of the tables was getting analyzed (by the DBMS_STATS)
2. Most of the other sessions were hung and running the same query below

SELECT vs.SID, vp.spid, vs.status, vs.username dbuser,
vs.schemaname || '@' || SYS_CONTEXT ('USERENV', 'DB_NAME', 20) schemaname, vs.osuser,
vs.machine, SYS_CONTEXT ('USERENV', 'IP_ADDRESS', 20) ipaddr
FROM v$session vs, v$process vp
WHERE USERENV ('SESSIONID') = vs.audsid AND vp.addr = vs.paddr

We use an Oracle 9.2 RAC database with 2 instances. Application uses JDBC connection pool to connect to the database. Analyze was running on one node while the connections were blocked on the other. I could not see any activity happening on either node. Unfortunately, I am not very familiar with wait events (got to do some reading now!), and didn't have enough time to research and check - hence got the DBAs to kill the session running Analyze - which in turn unblocked everything else, and database was back to normal. I was wondering what could have caused the analyze to hang, and how could it cause the other sessions to execute the above query? Any guesstimates?

Thanks,
Shreyas.

Analyze statement stuck on LIBRARY CACHE LOCK

Nags, July 19, 2005 - 2:19 pm UTC

I found out the issue (but not the solution :o). There is one select query which is causing this issue. When the select query is run from the application, the CPU on the database server goes 100% and the query hangs forever. If the same query is run from SQL*Plus it runs in few seconds. This happens only on this server. This is really wierd.

These are the session which are staying in ACTIVE status forever. These sessions do not show that they are running any queries in Enterprise Manager. I need to contact support for this. Thank you for all your help.

cache lock

ADI, August 19, 2005 - 4:43 pm UTC

Tom,
We have following procedure whihc is getting stuck in library cache lock.
Wrokround is kill all session then run :) But don't you think that's funny.

Please suggest
Oracle version is 8.1.7.0

PROCEDURE TRUNCATE_TBL
(TABLE_NAME IN VARCHAR2)
AUTHID DEFINER AS CURSOR_ID INTEGER;
BEGIN
CURSOR_ID := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(CURSOR_ID, 'TRUNCATE TABLE ' || TABLE_NAME, DBMS_SQL.V7);
DBMS_SQL.CLOSE_CURSOR(CURSOR_ID);
EXCEPTION
WHEN OTHERS
THEN DBMS_SQL.CLOSE_CURSOR(CURSOR_ID);
RAISE;
END TRUNCATE_TBL;


exec truncate_tbl('i_distribution_plan');

INSERT INTO i_distribution_plan
SELECT DO_ID,TO_LOCATION_ID,FROM_LOCATION_ID,ITEM_ID,TRANSPORT_MODE_ID,SHIPMENT_DATE,DELIVERY_DATE,QUANTITY,USER_ADJUSTED_VAL
UE,STATUS,FROM_ITEM_ID
FROM kit_distribution_plan WHERE status IN ('APPROVED', 'AUTO_APPROVED')
UNION ALL
SELECT
SUBSTR(do_id, 1, DECODE(INSTR(do_id, '_', 1, 3), 0, LENGTH(do_id)+1,INSTR(do_id, '_', 1, 3))-1),
TO_LOCATION_ID,
FROM_LOCATION_ID,
ITEM_ID,
'DEFAULT___MODE',
SHIPMENT_DATE,
DELIVERY_DATE,
SUM(QUANTITY),
0,
'AUTO_APPROVED',
ITEM_ID
FROM ws_distribution_plan
WHERE status IN ('APPROVED', 'AUTO_APPROVED')
GROUP BY SUBSTR(do_id, 1, DECODE(INSTR(do_id, '_', 1, 3), 0, LENGTH(do_id)+1,INSTR(do_id, '_', 1, 3))-1),FROM_LOCATION_ID, TO
_LOCATION_ID, SHIPMENT_DATE, DELIVERY_DATE, ITEM_ID
UNION ALL
SELECT
DO_ID,
TO_LOCATION_ID,
FROM_LOCATION_ID,
ITEM_ID,
TRANSPORT_MODE_ID,
SHIPMENT_DATE,
DELIVERY_DATE,
QUANTITY,
USER_ADJUSTED_VALUE,
STATUS,
ITEM_ID
FROM
R_distribution_plan
WHERE status IN ('APPROVED', 'AUTO_APPROVED');


Tom Kyte
August 20, 2005 - 4:41 pm UTC

please utilize support for this.

Shared Pool

atul, September 27, 2005 - 10:11 am UTC

Hi,

We are using 9i,Oracle 11i.
We started facing database hang issues due to intermeetant shared pool fragmentation.

We wish to find out what is there in shared pool,sql & progran or process execting that sql.

Could you tell me the sql to find out that


Thanks,
Atul



Tom Kyte
September 27, 2005 - 11:58 am UTC

well, that isn't going to "help" -- v$sql shows you "whats there", v$session joined to that can show you who is executing what.

But with apps - I would strongly suggest you work with support to develop a get well plan.

library cache pin

VKOUL, October 29, 2005 - 4:16 am UTC

Hi Tom,

I am going thru "Oracle Wait Interface: A practical guide to Performance Diagnostics & Tuning". If you do not mind, can you shed some expert knowledge on "library cache pin" in the context below (actually I am putting an excerpt from the book page 61):

"A common problem scenario is the use of DYNAMIC SQL from within a PL/SQL procedure where the PL/SQL code is recompiled and the DYNAMIC SQL calls something that depends on the calling procedure."

I am not able to visualize the scenario ?

Thanks

Tom Kyte
October 29, 2005 - 11:17 am UTC

that sentence is too small of a snippet to comment on and I don't happen to have the book with me this week (it is at home, I am not....)

Let me put the start of the whole paragraph, just in case it makes sense ...

VKOUL, October 29, 2005 - 6:22 pm UTC

library cache pin

The library cache pin wait event is associated with library cache concurrency. It occurs when the session tries to pin an object in the library cache to modify or examine it. The session must acquire a pin to make sure that the object is not updated by other sessions at the same time. Oracle posts this event when sessions are compiling or parsing PL/SQL procedures and views.

What actions to take to reduce these waits depend heavily on what blocking scenario is occurring. A common problem scenario is the use of DYNAMIC SQL from within a PL/SQL procedure where the PL/SQL code is recompiled and the DYNAMIC SQL calls something that depends on the calling procedure. If there ...

Tom Kyte
October 30, 2005 - 3:35 am UTC

if there <what> - seems like they are getting ready to tell you the answer to your question to me?

latch free - nothing dumping

A reader, October 30, 2005 - 6:20 pm UTC

Tom - I went through parts of chapter 10 of your one on one book so I am making raw trace files of some sessions that are giving a lot of latch free waits.
I did it this way - exec dbms_system.set_ev(58,3859, 10046, 12, ''); But, nothing is dumping out, is it just a matter of being patient?


Tom Kyte
October 31, 2005 - 5:11 am UTC

if the sessions are using shared server - they will trace to the background dump dest (and be rather hard to piece together before 10g)

if the sessions HAD a trace file and someone erased it - you'll never ever see that trace file again for that session.

As soon as you enable trace in a session, you would see the trace file appear. If it is not, probably one of the two above issues.

suggest you use dbms_system.set_sql_trace_in_session - the interface is a tad easier to use than setting the event like that.

ok I got my traces

A reader, October 31, 2005 - 7:43 pm UTC

Thanks for the above,
so there are tons of these..
WAIT #31: nam='latch free' ela= 614 p1=504403158567461992 p2=157 p3=0
WAIT #31: nam='latch free' ela= 256 p1=504403158567461992 p2=157 p3=1
WAIT #31: nam='latch free' ela= 758 p1=504403158567461992 p2=157 p3=2
you have an enqueue_decode in the chapter and reference the Oracle Wait Events appendix in the 9i reference.
It said something like P1=address P2=process P3=tries. Any suggestions as to how to track this down farther? You mention that latch free is usually SQL area problems with bind variables but that seems unlikely. There are only 14 statements in the system when this happens.

Tom Kyte
November 01, 2005 - 10:32 am UTC

use tkprof, get the wait even report by sql statement, see if somethig "obvious" pops out from that (you'll see things like "this is the statement suffering the waits, this is the AMOUNT of waits (one cares not if one waits 1,000,000 times for something - if they waited a small amount of time, one cares very much if they waited for something ONCE but waited a long time for it)

Here is the "If there" part ...

VKOUL, October 31, 2005 - 11:23 pm UTC

If there is general widespread waiting, the shared pool my need tuning. If there is a blocking scenario, the following SQL can be used to show the sessions that are holding and/or requesting pins on the object that are given in P1 in the wait.

Alberto Dell'Era, November 01, 2005 - 4:49 am UTC

Actually, for "latch free", p2 is the latch# ... you can find the name of the latch by running (as stated in the Database Reference):

select *
from v$latchname
where latch# = <value-of-P2>;

Latch oddity

Jonathan Lewis, November 01, 2005 - 12:42 pm UTC

There's an anomaly in these latch times that I haven't seen before:

>WAIT #31: nam='latch free' ela= 614 p1=504403158567461992 p2=157 p3=0
>WAIT #31: nam='latch free' ela= 256 p1=504403158567461992 p2=157 p3=1
>WAIT #31: nam='latch free' ela= 758 p1=504403158567461992 p2=157 p3=2

I am assuming that this is a 9i system, reporting elapsed times in microseconds. The output is three consecutive sleeps on the same latch - (p3 = 0,1,2).

The sleep time is only a few hundred microseconds each time - rather than the usual 1 centisecond, followed by 1,2,2,4,4,etc. centiseconds.

This looks like the latch implementation is some sort of 'yield' rather than the more typical 'deschedule with alarm'. Has anyone seen anything similar. It looks as if it could use a lot of CPU if there were any contention for popular latches.



latch oddity

Mario Broodbakker, December 22, 2005 - 12:48 pm UTC

I've seen latch behavior like this: on 10.1.0.4 IA64 Windows. A bug is introduced where a latch sleep of '0' is tried, which, of course returns immediately, and most of the times resulting in loads and loads of latch waits (see 'p3' of the count). This is supposed to be fixed in 10.1.0.5. Support can provide you with a workaround (I hope) if you really suffer from this

latch free on select count(*)

daniyal, February 19, 2006 - 3:56 pm UTC

Hi Tom,
I have read the whole thread and since I have a big problem with my database which also has high wait on latchfree.
Now when I have a latch free wait, i used another query to find out the sids involved with latch free waits and what sqls they are running.
The database is finacial oltp.
The sql that always comes when there is a latch wait is:

select count(1) from c_customer where custcode = :b1

Now the table c_customer has 500,000 rec and custcode is the primary key, So the plan of the sql is perfect that is unique scan.

But the statement runs avg of 8 million times in 15 mints.
Imagine whole trading day it ran for 10s of million times.

This is lik the heart of the application, any request comes to the database via application, there is a stored procedure in the db which is used for example balance unquiry, debit balance, pay utility bills etc. The developers used old style logic that is for every action, they ran the above sql to make sure customer exists then they use if else logic .
I told them that this is a relational db and you should go on doing ur work and the ref integty will fire an exception, but thie cvhanges are so large in the code , it needs time.

Since the plan of sql involves unique index, is the latch on table or index?
I have tried setting rows of table into more blocks by leaving more pct free and less pctused, tried reorg the table and rebuild index, but all in vain?

What should i do, development is not willing to reduce the occurence of this query and they say the query will increase its execution time 8 milll to 10 in 15 mints in next few days.


So what shall do? Is the a trick that i can use, like rev key ind etc???
Cheers


Tom Kyte
February 19, 2006 - 5:31 pm UTC

pretty silly query (actually, I'll be blunt and call it what it is - a stupid query)

I hate that - "hey, let us run a query to count records to see if we should select them out for processing!!!" What a brilliantly dumb idea.

They should just try to -- well, select out the record. They either get it or they don't.

Question to you - why is such a silly query the "heart of the application"

The latch free is likely a soft parse library latch free - you could remove 8 million parses every 15 minutes from your system I'll bet!!!



embarassment

dani, February 20, 2006 - 8:40 am UTC

Tom,
I am really so embarassed, and well....what shall we do?
Lets suppose if that was a genuine query to be run so much time, so often, how would you resolve it? Is there any way temp, like a workaround? The funny thing is that this query is coming as top most buffer gets, but the top5 even does not have latchfree waits....Here is the statspack output.
The problem was noticed accidently when users complain, and I ran the select ... from v$session_wait and i found a latchfree and then i ran:select s.sid,s.osuser,a.sql_text from v$sqlarea a,v$session s,v$session_wait w
where w.event = 'latch free'
and w.sid = s.sid
and a.hash_value =s.sql_hash_value
Constantly running the sql gave me same count(1) sql:

Statspack:

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
SYMP 2006689470 SYMP 1 9.2.0.5.0 NO h801

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 257 19-Feb-06 10:00:00 152 15.0
End Snap: 265 19-Feb-06 12:00:00 164 16.6
Elapsed: 120.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 2,048M Std Block Size: 8K
Shared Pool Size: 864M Log Buffer: 5,120K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 321,052.22 1,312.54
Logical reads: 52,711.38 215.50
Block changes: 1,857.26 7.59
Physical reads: 986.60 4.03
Physical writes: 63.17 0.26
User calls: 343.29 1.40
Parses: 953.05 3.90
Hard parses: 0.55 0.00
Sorts: 9.95 0.04
Logons: 0.03 0.00
Executes: 13,861.91 56.67
Transactions: 244.60

% Blocks changed per Read: 3.52 Recursive Call %: 97.98
Rollback per transaction %: 0.00 Rows per Sort: 58.97

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 98.13 In-memory Sort %: 100.00
Library Hit %: 99.99 Soft Parse %: 99.94
Execute to Parse %: 93.12 Latch Hit %: 99.50
Parse CPU to Parse Elapsd %: 84.04 % Non-Parse CPU: 98.91

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 60.07 72.04
% SQL with executions>1: 89.22 86.73
% Memory for SQL w/exec>1: 59.55 57.45

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 13,668 66.73
db file sequential read 439,534 3,546 17.31
log file sync 598,503 1,398 6.82
db file scattered read 953,231 686 3.35
LNS wait on SENDREQ 2,780 524 2.56
-------------------------------------------------------------
Wait Events for DB: SYMP Instance: SYMP Snaps: 257 -265
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file sequential read 439,534 0 3,546 8 0.2
log file sync 598,503 18 1,398 2 0.3
db file scattered read 953,231 0 686 1 0.5
LNS wait on SENDREQ 2,780 0 524 188 0.0
log file sequential read 2,772 0 184 66 0.0
db file parallel write 10,838 0 150 14 0.0
buffer busy waits 12,363 0 138 11 0.0
enqueue 98,133 4 119 1 0.1
latch free 4,987 2,213 18 4 0.0
log file switch completion 103 0 16 156 0.0
control file parallel write 3,433 0 7 2 0.0
LNS wait on LGWR 1,676,922 0 7 0 1.0
async disk IO 5,290 0 4 1 0.0
log file parallel write 1,534,155 1,512,961 3 0 0.9
ARCH wait on SENDREQ 118 0 3 22 0.0
LGWR wait on LNS 700 227 2 4 0.0
LGWR-LNS wait on channel 322 227 2 8 0.0
direct path read 9,455 0 2 0 0.0
control file sequential read 24,255 0 1 0 0.0
LGWR wait for redo copy 10,327 10 1 0 0.0
log file single write 56 0 1 15 0.0
direct path write 2,995 0 0 0 0.0
SQL*Net more data to client 4,238 0 0 0 0.0
SQL*Net break/reset to clien 152 0 0 0 0.0
local write wait 24 0 0 2 0.0
db file single write 12 0 0 1 0.0
buffer deadlock 60 59 0 0 0.0
SQL*Net message from client 2,466,696 0 654,598 265 1.4
SQL*Net more data from clien 421 0 3 7 0.0
SQL*Net message to client 2,466,707 0 2 0 1.4
-------------------------------------------------------------
Background Wait Events for DB: SYMP Instance: SYMP Snaps: 257 -265
-> ordered by wait time desc, waits desc (idle events last)

Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
LNS wait on SENDREQ 2,780 0 524 188 0.0
log file sequential read 2,772 0 184 66 0.0
db file parallel write 10,840 0 150 14 0.0
control file parallel write 3,409 0 7 2 0.0
LNS wait on LGWR 1,676,976 0 7 0 1.0
async disk IO 5,278 0 4 1 0.0
log file parallel write 1,534,170 1,512,976 3 0 0.9
ARCH wait on SENDREQ 118 0 3 22 0.0
LGWR wait on LNS 700 227 2 4 0.0
LGWR-LNS wait on channel 322 227 2 8 0.0
db file scattered read 187 0 2 11 0.0
direct path read 9,455 0 2 0 0.0
LGWR wait for redo copy 10,327 10 1 0 0.0
log file single write 56 0 1 15 0.0
db file sequential read 66 0 1 12 0.0
direct path write 2,995 0 0 0 0.0
control file sequential read 3,071 0 0 0 0.0
latch free 115 9 0 1 0.0
rdbms ipc message 4,226,430 2,586,288 60,874 14 2.4
pmon timer 2,413 2,408 7,030 2914 0.0
smon timer 44 20 6,778 ###### 0.0
-------------------------------------------------------------
SQL ordered by Gets for DB: SYMP Instance: SYMP Snaps: 257 -265
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
207,987,520 12,857 16,177.0 54.8 3714.84 4041.54 2386815261
Module: cis_inquire@mware02 (TNS V1-V3)
begin cis_caller ( :input_fields , :output_fields ) ; END ;

196,634,528 65,596,344 3.0 51.8 2025.66 1895.51 2100004012
Module: cis_update@mware02 (TNS V1-V3)
SELECT COUNT(1) FROM C_CUSTOMER WHERE CUST_CODE = :B1

133,512,849 249,338 535.5 35.2 2585.97 5541.47 4019074281
Module: cammoltp@mware02 (TNS V1-V3)
begin txd_oltp . txd_process ( :insrcid , :indestid , :inmti , :
inflds , :indata , :outmti , :outflds , :outdata ) ; END ;

27,032,226 85,013 318.0 7.1 174.86 180.70 2904936698
Module: cammoltp@mware02 (TNS V1-V3)
SELECT COUNT(PLEDGED_AMT) FROM RB_RESTRAINTS_MEMO WHERE INTERNAL
_KEY = :B2 AND TRUNC(:B1) BETWEEN START_DATE AND END_DATE AND NV
L(STATUS, 'N') != 'P'

27,024,912 85,007 317.9 7.1 161.56 166.60 1846391594
Module: cammoltp@mware02 (TNS V1-V3)
SELECT NVL(SUM(PLEDGED_AMT), 0) FROM RB_RESTRAINTS_MEMO WHERE IN
TERNAL_KEY = :B2 AND TRUNC(:B1) BETWEEN START_DATE AND END_DATE
AND NVL(STATUS, 'N') != 'P'

11,283,723 141,981 79.5 3.0 6319.14 6329.24 3919012089
Module: fts_request@mware02 (TNS V1-V3)
begin :output_fields := precamm ( :input_fields ) ; END ;

11,211,642 5,603,047 2.0 3.0 136.20 127.63 1657741403
Module: cis_update@mware02 (TNS V1-V3)
SELECT COUNT(1) FROM C_CUSTOMER_RESERVED WHERE CUST_CODE = :B1




Tom Kyte
February 20, 2006 - 12:14 pm UTC

you don't say what the latch free is FOR.


Hey, can you show us the EXECUTIONS and PARSE CALLS for that select count(1) for that period of time?

I see you parse almost 1000 sql statements per second, your low hanging fruit is to reduce that number (I've a feeling we'll see parse = execute for this statement, parse should be way low, execute high - that'll be the biggest bang for the buck)


it is funny that latch free isn't in the top 5.

2 hours is way way long.

how to find the exeutions

DANiel, February 20, 2006 - 2:28 pm UTC

Hi,
Can you please tell me how to find out the parse and executions or the sql statement, is it not included in the stats report i posted?
You mean I should run the select count.. with autotrace on in sqlplus?

I am also wondering why latchfree is not coming.

Cheers


Tom Kyte
February 20, 2006 - 2:31 pm UTC

just query v$sql

thanks-prob solved

dani, February 21, 2006 - 1:01 pm UTC

Hi Tom,
Thanks for your comments, the development has removed the sql from a loop and implemented today afternoon,
those 65 millions are now reduced to 5000 sqls.
The problem that there was a loop for each request to find the gaps in cust code whic =h is no longer required to check.
All of this becuase of your strong comments,
cheers

Su Baba, February 27, 2006 - 3:41 pm UTC

In your response to the original question, you said:

"Yes, the hard parses require MANY MANY latches - on data structures in the shared pool. They will get in eachothers way."

What data structures were you referring to?

How does latching actually work? Suppose I issue the following SQL statement:

SELECT * FROM x WHERE id = :b1;

and suppose Oracle hashes this statement and finds that it already exists in the shared pool - soft parse. Would latches come into play at all in this case?

Suppose Oracle does not find this statement in the shared pool, what data structures would it latch?

Tom Kyte
February 27, 2006 - 4:51 pm UTC

the data structures are the SGA - the SGA is a big "shared global area"

Everything in the SGA is a data structure, it is a big chunk of shared memory.


Latches come into play constantly and a lot with a soft parse.

select * from x where id = :b1;

before even taking the hash and going to the shared pool (remember, we have to LATCH the shared pool just to INSPECT IT, you cannot change that data structure whilst I'm walking the pointers in it!) we have to do a ton of work.

For example, figure out "what the heck is X" - that requires some dictionary lookups which require - LATCHING of the dictionary cache.

Then we have to figure out "what the heck is "*" " - again....

There are tons of latching going on to figure out what that query actually "goes after" and whether you are allowed to see it. Then we can hash it and goto the shared pool where there might be MANY SQL's that match that text but are not the same (when I say "from x" it means "from tkyte.x", when you say "from x" it means "from sbaba.x" - same query text - entirely different query!). Each time we want to hit that shared structure and walk it - we latch.

soft parsing takes LOTS of latching, lots and lots.

Su Baba, February 28, 2006 - 5:00 pm UTC

Let's say two users issued the following statement:

SELECT * FROM x WHERE col1 = :b1;

One statement resolves to tkyte.x and the other to sbaba.x. Would the following be the correct representation of how these two statements are stored in the shared pool?

- one copy of the text (one record in v$sql)
"SELECT * FROM x WHERE col1 = :b1"

- two different parsed form of the above SQL
(one for tkyte.x; another for sbaba.x)

- two execution plans for the SQL
(one for tkyte.x; another for sbaba.x)



Tom Kyte
March 01, 2006 - 8:02 am UTC

consider it to be two copies of everything.

PL/SQL Caching & Bind Variables

A reader, February 28, 2006 - 8:09 pm UTC

I ran the following test and was surprised to see the last query returning no rows. I guess I shouldn't be surprised since you mentioned that PL/SQL caches cursor so within the same ananamous PL/SQL block, there would be no soft parses. However, how does that caching work exactly? Isn't the SQL within a PL/SQL block also re-usable by any sessions? Wouldn't it need to be in the shared pool (v$sql) as well?

SQL> alter system flush shared_pool;

System altered.

SQL> 
SQL> DECLARE
  2     l_number NUMBER;
  3  
  4  BEGIN
  5     FOR i IN 1..10000 LOOP
  6        l_number := dbms_random.random;
  7  
  8        INSERT INTO t VALUES 
  9        (l_number, l_number, l_number, l_number);
 10     END LOOP;
 11  
 12     COMMIT;
 13  END;
 14  /

PL/SQL procedure successfully completed.

SQL> 
SQL> SELECT sql_text
  2  FROM   v$sql
  3  WHERE  sql_text LIKE 'INSERT INTO t%' AND
  4         rownum < 30;

no rows selected
 

Tom Kyte
March 01, 2006 - 8:08 am UTC

plsql is not truly closing cursors.

Yes the insert is shareable over sessions.
Your session only needs to parse it ONCE (as does ANY SESSION)


you should use

sql_text like 'INSERT%INTO%T%VALUES%(%)%';

insert you don't know what plsql might be doing to it (it uppercases it these days)

Oracle Latches

Ravi Kumar, May 31, 2006 - 6:18 am UTC

I am trying to understand what are the ORACLE Latches.

Can you please explain it with simple step by step example ?



Tom Kyte
May 31, 2006 - 10:13 am UTC

Hmm, sort of gave the definition in the original answer???


Here is a longer one, more details to be found in the book:

<quote src=Expert Oracle Database Architecture>

Latches

Latches are lightweight serialization devices used to coordinate multiuser access to shared data structures, objects, and files.

Latches are locks designed to be held for extremely short periods of time—for example, the time it takes to modify an in-memory data structure. They are used to protect certain memory structures, such as the database block buffer cache or the library cache in the shared pool. Latches are typically requested internally in a “willing to wait” mode. This means that if the latch is not available, the requesting session will sleep for a short period of time and retry the operation later. Other latches may be requested in an “immediate” mode, which is similar in concept to a SELECT FOR UPDATE NOWAIT, meaning that the process will go do something else, such as try to grab an equivalent sibling latch that may be free, rather than sit and wait for this latch to become available. Since many requestors may be waiting for a latch at the same time, you may see some processes waiting longer than others. Latches are assigned rather randomly, based on the luck of the draw, if you will. Whichever session asks for a latch right after it was released will get it. There is no line of latch waiters—just a mob of waiters constantly retrying.

Oracle uses atomic instructions like “test and set” and “compare and swap” for operating on latches. Since the instructions to set and free latches are atomic, the operating system itself guarantees that only one process gets to test and set the latch even though many processes may be going for it simultaneously. Since the instruction is only one instruction, it can be quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a latch holder “dies” abnormally while holding it. This cleanup process would be performed by PMON.

Enqueues, which we discussed earlier, are another, more sophisticated serialization device used when updating rows in a database table, for example. They differ from latches in that they allow the requestor to “queue up” and wait for the resource. With a latch request, the requestor session is told right away whether or not it got the latch . With an enqueue lock, the requestor session will be blocked until it can actually attain it.

Note Using SELECT FOR UPDATE NOWAIT or WAIT [n], you can optionally decide not to wait for an enqueue lock if your session would be blocked , but if you do block and wait, you will wait in a queue.

As such, an enqueue is not as fast as a latch can be, but it does provided functionality over and above what a latch can offer. Enqueues may be obtained at various levels, so you can have many share locks and locks with various degrees of shareability.
</quote>

SQL against data dictionary is slow

Ray, June 07, 2006 - 11:10 am UTC

Hi Tom,

I am using the following SQL to find out the active SQLs in our database:

select sql_text,CPU_TIME,ELAPSED_TIME,DISK_READS,BUFFER_GETS,ROWS_PROCESSED,EXECUTIONS
from v$sql
where (address,hash_value) in
(
SELECT
s.sql_address
,s.sql_hash_value
FROM v$session s
WHERE s.status = 'ACTIVE'
and (sql_hash_value <> 0 or sql_address <>'00')
);

The SQL runs exceptionaly slow, ofetm more than 3 min.

Here is a excerpt from tkprof

********************************************************************************

select sql_text,CPU_TIME,ELAPSED_TIME,DISK_READS,BUFFER_GETS,ROWS_PROCESSED,EXECUTIONS
from v$sql
where (address,hash_value) in
(
SELECT
s.sql_address
,s.sql_hash_value
FROM v$session s
WHERE s.status = 'ACTIVE'
and (sql_hash_value <> 0 or sql_address <>'00')
)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.10 0 7 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 1.81 57.26 0 0 0 11
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 1.81 57.37 0 7 0 11

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 68 (STOWNER)

Rows Row Source Operation
------- ---------------------------------------------------
11 MERGE JOIN
16437 SORT JOIN
16437 FIXED TABLE FULL X$KGLCURSOR
11 SORT JOIN
10 VIEW
10 SORT UNIQUE
19 FIXED TABLE FULL X$KSUSE

error during execute of EXPLAIN PLAN statement
ORA-01039: insufficient privileges on underlying objects of the view

parse error offset: 163

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
latch free 58 0.19 1.03
SQL*Net message from client 2 131.17 131.21
SQL*Net more data to client 1 0.00 0.00
********************************************************************************

SELECT NAME name_col_plus_show_param,DECODE(TYPE,1,'boolean',2,'string',3,
'integer',4,'file',6,'big integer','unknown') TYPE,VALUE
value_col_plus_show_param
FROM
V$PARAMETER WHERE UPPER(NAME) LIKE UPPER('%user_d%') ORDER BY
name_col_plus_show_param,ROWNUM


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.02 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.01 0.00 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 0.03 0 2 0 1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 68 (STOWNER)
error during execute of EXPLAIN PLAN statement
ORA-01039: insufficient privileges on underlying objects of the view

parse error offset: 233

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 1 0.00 0.00
********************************************************************************

alter session set events '10046 trace name context forever, level 12'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 0 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1 0.00 0.00 0 0 0 0

Misses in library cache during parse: 0
Misses in library cache during execute: 1
Optimizer goal: CHOOSE
Parsing user id: 68 (STOWNER)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 53.17 53.17
latch free 7 0.01 0.01
********************************************************************************

ALTER SESSION SET EVENTS '10104 trace name context off'


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.00 0 0 0 0

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 68 (STOWNER)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 17.24 17.24



********************************************************************************

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3 0.00 0.13 0 9 0 0
Execute 4 0.00 0.00 0 0 0 0
Fetch 4 1.82 57.27 0 0 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 11 1.82 57.40 0 9 0 12

Misses in library cache during parse: 3
Misses in library cache during execute: 1

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 6 0.00 0.00
SQL*Net message from client 5 131.17 201.64
latch free 65 0.19 1.04
SQL*Net more data to client 1 0.00 0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 4 0.00 0.00 0 0 0 0
Execute 4 0.00 0.00 0 0 0 0
Fetch 4 0.00 0.00 0 9 0 4
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 12 0.00 0.02 0 9 0 4

Misses in library cache during parse: 2

4 user SQL statements in session.
4 internal SQL statements in session.
8 SQL statements in session.
0 statements EXPLAINed in this session.
********************************************************************************

It shows up Latch wait, but not significant. From ra trace file I found out it to be library cache

WAIT #1: nam='latch free' ela= 4952 p1=1846886548 p2=157 p3=0
WAIT #1: nam='latch free' ela= 11037 p1=1846886748 p2=157 p3=0
WAIT #1: nam='latch free' ela= 10183 p1=1846886548 p2=157 p3=0
WAIT #1: nam='latch free' ela= 37127 p1=1846886348 p2=157 p3=0
WAIT #1: nam='latch free' ela= 6846 p1=1846886748 p2=157 p3=0
WAIT #1: nam='latch free' ela= 2234 p1=1846886348 p2=157 p3=0
WAIT #1: nam='latch free' ela= 2271 p1=1846886348 p2=157 p3=0
WAIT #1: nam='latch free' ela= 3274 p1=1846886348 p2=157 p3=0

...same throughout.

Yours own runstats shows:

NAME DIFF
-------------------------------------------------- ----------
LATCH.cache buffers chains 18794820
LATCH.channel operations parent latch 10220879
STAT...session uga memory max 8641248
STAT...session pga memory max 8593112
LATCH.simulator hash latch 4668165
LATCH.library cache 311592
LATCH.library cache pin 202295
LATCH.shared pool 139896
LATCH.row cache objects 125987
LATCH.row cache enqueue latch 125150
LATCH.cache buffers lru chain 123140
LATCH.library cache pin allocation 83944
LATCH.redo allocation 68953
LATCH.enqueue hash chains 58841
LATCH.multiblock read objects 48150
LATCH.SQL memory manager workarea list latch 32284
LATCH.undo global data 31307
STAT...redo size 28272
LATCH.dml lock allocation 26554
LATCH.enqueues 19970
LATCH.session allocation 19550
STAT...sorts (rows) 19279
LATCH.checkpoint queue latch 15223
STAT...session pga memory 7896
LATCH.session idle bit 7659
LATCH.messages 5996
LATCH.cache buffer handles 5358
LATCH.redo writing 5179
STAT...bytes sent via SQL*Net to client 2900
LATCH.sort extent pool 2440
LATCH.mostly latch-free SCN 1606
LATCH.lgwr LWN SCN 1600
LATCH.Consistent RBA 1599
LATCH.post/wait queue 1522
LATCH.transaction allocation 1516
LATCH.list of block allocation 1485
LATCH.user lock 1470
LATCH.dummy allocation 1465
LATCH.sequence cache 1322
LATCH.simulator lru latch 1127
LATCH.channel handle pool latch 737
STAT...bytes received via SQL*Net from client 734
LATCH.message pool operations parent latch 732
STAT...CPU used by this session 357
STAT...CPU used when call started 357
LATCH.global tx hash mapping 341
LATCH.transaction branch allocation 326
LATCH.loader state object freelist 254
LATCH.ktm global data 248
LATCH.commit callback allocation 243
LATCH.child cursor hash table 177
LATCH.session timer 174
LATCH.AQ Propagation Scheduling Proc Table 157
LATCH.active checkpoint queue latch 144
LATCH.job_queue_processes parameter latch 73
LATCH.KWQP Prop Status 68
LATCH.kwqit: protect wakeup time 67
LATCH.hash table column usage latch 50
STAT...session logical reads 43
LATCH.temporary table state object allocation 42
STAT...db block changes 36
STAT...db block gets 31
STAT...recursive calls 29
STAT...redo entries 22
STAT...parse time elapsed 20
LATCH.FOB s.o list latch 17
STAT...buffer is not pinned count 16
STAT...workarea executions - optimal 14
STAT...consistent gets 12
LATCH.spilled msgs queues list latch 11
LATCH.ncodef allocation latch 8
STAT...enqueue releases 8
STAT...enqueue requests 8
STAT...no work - consistent read gets 8
STAT...calls to get snapshot scn: kcmgss 8
STAT...table fetch by rowid 8
LATCH.session switching 8
LATCH.AQ Propagation Scheduling System Load 7
STAT...opened cursors cumulative 7
STAT...user calls 7
STAT...sorts (memory) 7
STAT...parse count (total) 7
STAT...execute count 7
LATCH.process group creation 5
LATCH.FIB s.o chain latch 4
STAT...free buffer requested 4
STAT...commit cleanouts successfully completed 4
STAT...commit cleanouts 4
STAT...SQL*Net roundtrips to/from client 4
STAT...calls to kcmgcs 3
LATCH.event group latch 1
STAT...parse time cpu 1
STAT...redo synch writes 1
STAT...user commits 1
STAT...redo synch time 1
STAT...parse count (hard) 1
LATCH.hash table modification latch 1
STAT...calls to kcmgas 1
STAT...messages sent 1
LATCH.process allocation 1
LATCH.object stats modification 1

It shows LATCH.cache buffers chains at the top.

I am using;
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
PL/SQL Release 9.2.0.6.0 - Production
CORE 9.2.0.6.0 Production
TNS for Solaris: Version 9.2.0.6.0 - Production
NLSRTL Version 9.2.0.6.0 - Production

Since it is a SQl against data dictionary, what action point I have to improve the performance. Also your analysis on the above would help immensely.

Thanks & Regards
Ray


Tom Kyte
June 07, 2006 - 3:13 pm UTC

it is not against the dictionary, it is against magic, in memory v$ views (dynamic performance views)

Querying these views can be very expensive (many of them require latched access, on a busy system, over-querying them can be the cause of your performance problem)


try this:

with active
as
( SELECT s.sql_address ,s.sql_hash_value, rownum r
FROM v$session s
WHERE s.status = 'ACTIVE'
and (sql_hash_value <> 0 or sql_address <>'00')
),
sql
as
(select address,hash_value,sql_text,CPU_TIME,ELAPSED_TIME,DISK_READS,BUFFER_GETS,ROWS_PROCESSED,EXECUTIONS, rownum r
from v$sql
)
select sql_text,CPU_TIME,ELAPSED_TIME,DISK_READS,BUFFER_GETS,ROWS_PROCESSED,EXECUTIONS
from sql
where (address,hash_value) in (select sql_address,sql_hash_value from active)
/


it should "copy" the v$ information and use that in the rest of the query.

SQL against data dictionary is slow

Ray, June 08, 2006 - 6:28 am UTC

Thnaks Tom for the suggestion. Even using wuth clause the performance was slow. I worked around it using Pipelined function.

Regards
Ray

cache buffers chains

Vadim, August 04, 2006 - 12:13 am UTC

Tom,

we have had a strange issue where one day database server CPU utilization jumped to 100% and stayed there for about 15 hours. I have ran statspack and also checked what sessions were waiting for. I have found that there were many sessions waiting for "latch free" event on "cache buffers chains". The following day CPU utilization dropped to usual 60-70%. Important to mention that we execute the same application jobs on more or less the same data profile day after day. I compared statspack output for two days, they were virtually the same in Load Profile and Top 5 wait events with exception of "latch free" which was 10 times higher on the day when problem occurred.

Any idea why it could happen ? Could it be a coincidence that some blocks became hot as a result of particular repots running at the same time ?

What can be done to prevent this issue from reoccurring besides tuning application SQL ? Can increasing DB_BLOCK_HASH_BUCKETS help ? We run Oracle 9.2.0.6.

1. Bad day

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 5,008M Std Block Size: 16K
Shared Pool Size: 2,000M Log Buffer: 3,072K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,889,261.88 413,662.49
Logical reads: 472,575.82 103,472.63
Block changes: 5,954.36 1,303.73
Physical reads: 8,799.70 1,926.73
Physical writes: 960.56 210.32
User calls: 360.94 79.03
Parses: 120.35 26.35
Hard parses: 4.93 1.08
Sorts: 945.84 207.10
Logons: 1.64 0.36
Executes: 591.94 129.61
Transactions: 4.57

% Blocks changed per Read: 1.26 Recursive Call %: 86.98
Rollback per transaction %: 13.98 Rows per Sort: 311.14

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.90 Redo NoWait %: 100.00
Buffer Hit %: 98.31 In-memory Sort %: 100.00
Library Hit %: 99.26 Soft Parse %: 95.90
Execute to Parse %: 79.67 Latch Hit %: 98.96
Parse CPU to Parse Elapsd %: 19.76 % Non-Parse CPU: 98.87

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 78.75 80.69
% SQL with executions>1: 41.35 49.67
% Memory for SQL w/exec>1: 23.99 28.43

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read 485,728,218 841,909 24.00
CPU time 815,686 23.25
PX Deq Credit: send blkd 4,204,485 512,384 14.61
latch free 25,332,578 433,118 12.35
db file scattered read 5,862,867 278,176 7.93
-------------------------------------------------------------



2. Good day


Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 5,008M Std Block Size: 16K
Shared Pool Size: 2,000M Log Buffer: 3,072K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 1,844,994.35 428,530.24
Logical reads: 422,235.21 98,071.06
Block changes: 4,561.96 1,059.59
Physical reads: 7,841.51 1,821.32
Physical writes: 974.90 226.44
User calls: 333.36 77.43
Parses: 101.24 23.51
Hard parses: 4.64 1.08
Sorts: 392.75 91.22
Logons: 1.54 0.36
Executes: 388.76 90.30
Transactions: 4.31

% Blocks changed per Read: 1.08 Recursive Call %: 84.41
Rollback per transaction %: 5.35 Rows per Sort: 717.72

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.92 Redo NoWait %: 100.00
Buffer Hit %: 98.40 In-memory Sort %: 100.00
Library Hit %: 98.97 Soft Parse %: 95.42
Execute to Parse %: 73.96 Latch Hit %: 99.26
Parse CPU to Parse Elapsd %: 26.89 % Non-Parse CPU: 99.00

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 80.69 80.53
% SQL with executions>1: 49.67 39.25
% Memory for SQL w/exec>1: 28.43 22.44

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 729,333 27.32
db file sequential read 329,917,598 603,372 22.60
db file scattered read 7,846,223 355,902 13.33
PX Deq Credit: send blkd 2,338,160 318,713 11.94
SQL*Net message from dblink 997,464 164,936 6.18
-------------------------------------------------------------



Thanks


Tom Kyte
August 04, 2006 - 7:48 am UTC

you did more transactions per second (more work) the "bad day" - are you sure there wasn't an increase in something (like the workload)

It is really hard to make any judgements from a statspack that covers what appears to be a really long amount of time.

When CPU is 100% busy...

Roderick, August 04, 2006 - 11:04 pm UTC


Then latch waits will also artificially look much higher since it will include the time processes spend waiting on the CPU run queue to even request the latch. When CPU is 100% used, it is generally more useful to find out why CPU usage was higher (rather than see who's waiting for anything since a waiting or sleeping process uses relatively little CPU). As Tom says, looks like the workload in general was higher based on the snippet that was provided.

Huge latch free wait

A reader, November 19, 2007 - 12:39 pm UTC

Hi tom, Regards from Barcelona. Sorry about my english.

Well, a DBA who's leaving is passing me a database which degradates in a period of the day. It starts to going up from 1100 sessions to 1900 ... then it seems to hangs for a while, maybe 5 minutes, then the number of sessions go down and stabilizes again. The server starts to swap, so i understand that the PGA is consuming the RAM (the machine has 32GB, 8 dual CPU).

I was checking a statspack during that period and i found this:

STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
CUBA 2632031150 cuba 1 9.2.0.8.0 NO cuba.uoc.es

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 7678 19-Nov-07 11:32:36 1,111 2.9
End Snap: 7682 19-Nov-07 12:00:16 1,228 2.7
Elapsed: 27.67 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 7,024M Std Block Size: 8K
Shared Pool Size: 1,008M Log Buffer: 4,096K

Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 170,609.58 1,239.18
Logical reads: 377,221.13 2,739.86
Block changes: 1,137.59 8.26
Physical reads: 796.07 5.78
Physical writes: 51.69 0.38
User calls: 5,534.87 40.20
Parses: 1,226.05 8.91
Hard parses: 0.60 0.00
Sorts: 311.75 2.26
Logons: 3.03 0.02
Executes: 1,264.89 9.19
Transactions: 137.68

% Blocks changed per Read: 0.30 Recursive Call %: 12.95
Rollback per transaction %: 17.49 Rows per Sort: 391.70

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.79 In-memory Sort %: 100.00
Library Hit %: 99.97 Soft Parse %: 99.95
Execute to Parse %: 3.07 Latch Hit %: 99.80
Parse CPU to Parse Elapsd %: 5.98 % Non-Parse CPU: 96.75

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 95.44 66.88
% SQL with executions>1: 76.72 81.37
% Memory for SQL w/exec>1: 71.14 78.33

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 147,497 7,809 50.01
CPU time 4,109 26.32
db file sequential read 1,232,813 1,969 12.61
log file sync 205,158 766 4.91
enqueue 560 462 2.96

.
---

If you see, the Execute to parse is very low and the top wait event is latch free with about 8000 seconds !!!.

So, i have two questions .. hope you can help me:

1.- The SGA vs RAM is ok ?? .. i ask this because the "bad" behaviour begin when the machine starts to swap after reaching high number of sessions/os processes.
2.- How can i know which SQL statements are parsing but no executes.
3.- Can this (question 2) be the cause of the huge latch free wait ?.

I have a statspack from a non "bad" behaviour period and the top 5 looks like. You'll see there is no latch free waits at all.

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 100.00 Redo NoWait %: 100.00
Buffer Hit %: 99.82 In-memory Sort %: 100.00
Library Hit %: 99.98 Soft Parse %: 99.96
Execute to Parse %: 1.80 Latch Hit %: 99.87
Parse CPU to Parse Elapsd %: 31.37 % Non-Parse CPU: 96.70

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 91.23 92.38
% SQL with executions>1: 73.73 73.79
% Memory for SQL w/exec>1: 58.17 60.17

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 2,669 54.48
db file sequential read 847,941 1,043 21.28
latch free 24,432 511 10.42
log file sync 180,783 299 6.10
log file parallel write 166,432 134 2.73
-------------------------------------------------------------


Thanks !!!!
Tom Kyte
November 21, 2007 - 1:02 pm UTC

always curious - swapping is like so drastic - to actually SEE it happen would be so rare, yet - I hear of it all of the time.

How do you observe the machine SWAPPING (paging - paging is cool, paging is fine, paging is normal).

8 cpus
1900 sessions.

If more of them than 8 try to be active simultaneously - you know what happens right...

...
Parses: 1,226.05 8.91
Hard parses: 0.60 0.00
...

you are parsing like mad (perhaps - maybe - session cached cursors can help, but ultimately the only true corrective action can take place in the application)


1) I seriously, honestly doubt you are swapping. Swapping is pretty dead serious.

You have an 8gb SGA
You have some 24gb left over

If your pga aggregate target is about 20gb, that is about 10mb per session for 1900 sessions - sounds reasonable for a transactional application.


2) look at v$sql, it has parse count, it has execute count

3) sure, but the latch report in the statspack would give you more insight into that.



I would say you have a burst of activity, lots more parsing happening - hence more contention for the library cache, you experience waits. I seriously doubt this is "swapping" - you are in fact burning up lots of cpu during this burst and waiting for latches (which in turn - BURNS MORE CPU)

You are not swapped out

Latches and Locks

aaaa, February 19, 2008 - 2:03 pm UTC

Hi Tom,

The user executing a statement below statement:

DELETE FROM a.temp_rpts_appr_universe
WHERE ROWID IN (
SELECT t.ROWID
FROM a.temp_rpts_appr_universe t, b.items i
WHERE t.session_id = :b3
AND t.title_link_number IN (
SELECT title_link_number
FROM temp_rpts_appr_universe a,
b.items itm
WHERE u.session_id = :b3
AND u.isbn = itm.isbn_13
AND itm.pk_binding_code IN (:b4))
AND t.isbn = i.isbn_13
AND i.pk_binding_code IN (:b2, :b1))

and there is a wait event details below:

ROWNUM,EVENT,TOTAL_WAITS,TOTAL_TIMEOUTS,TIME_WAITED,AVERAGE_WAIT,MAX_WAIT,TIME_WAITED_MICRO
1,latch: cache buffers chains,12194,12194,5357,0,17,53568791
2,buffer busy waits,2,0,0,0,0,71
3,db file sequential read,110,0,78,1,3,777138
4,db file scattered read,2,0,3,1,2,29147
5,latch: shared pool,1,0,0,0,0,2494
6,SQL*Net message to client,21,0,0,0,0,140
7,SQL*Net more data to client,1,0,0,0,0,289
8,SQL*Net message from client,21,0,26,1,22,263490
9,SQL*Net break/reset to client,2,0,0,0,0,676


seems to be a latch: cache buffers chains,db file sequential read and db file scattered read which is why its taking long time to execute the query.

Even though I have increased pga_aggregate_target to 2 GB, the queries still opting for Nested loops on Hash Join.

Optimizer is set to ALL_ROWS.

We are running Oracle 10.1.0.5 on Windows 2000 SP 4.

What is latch: cache buffers chains and how can I resolve.

Thanks

Tom Kyte
February 20, 2008 - 7:15 am UTC

cache buffers chains latches are the latches we use to protect the buffer cache (when I'm reading the cache, you cannot be modifying that data structure - and reads of the cache modify it). Think of the cache as two dimensional array, every block is in a linked list - and there are many linked lists. To find a block in the cache, we hash the DBA (data block address) to determine which of the N lists it is going to be in (if it is there), then latch that list and walk it to find that block.

bummer to have to read that table three times to accomplish your task.

Normally, I might look at it, but this query isn't the real query anyway - so it would be pointless.

DELETE FROM a.temp_rpts_appr_<b>u</b>niverse
      WHERE ROWID IN (
               SELECT t.ROWID
                 FROM a.temp_rpts_appr_<b>u</b>niverse t, b.items i
                WHERE t.session_id = :b3
                  AND t.title_link_n<b>u</b>mber IN (
                         SELECT title_link_n<b>u</b>mber
                           FROM temp_rpts_appr_<b>u</b>niverse a,
                                b.items itm
                          WHERE <b>u</b>.session_id = :b3
                            AND <b>u</b>.isbn = itm.isbn_13
                            AND itm.pk_binding_code IN                 (:b4))
                  AND t.isbn = i.isbn_13
                  AND i.pk_binding_code IN (:b2, :b1))


U is not a valid correlation name for example...

Latch spin related issue

Nilanjan Ray, March 20, 2008 - 1:46 pm UTC

Hi Tom,

Going though this thread it seems that latch and its relation to buffer cache size is an area which is rarely looked upon while undertaking tuning analysis.
-----------------------------------
I want to share my views regarding the recent increased performance complaints we have been experiencing from the users.

The recent changes to the database involved increasing the data cache size (10g (db_cache_size = 4gig and db_keep_cache_size=1 gig) to a considerable extent which means that most of the data is now cached. This is reflected through a massive reduction in physical I/O. The disk read wait have reduced from 18-20% to a mere 2-3%.

Soon after doing this the CPU utilisation increated from around 78-80% to 90-92%, physical I/O related wait reduced from 18-20% to a mere 2-3%. But this also resulted in users complaining more frequently about performance degradation.

I have made the following observations regarding the performance degradation to the team and the management-

1. A SQL which needs thousands of data blocks to complete will now spend more than 90% of the time on CPU since most of the data blocks are now cached. Earlier it would have spent 78-80% of the time on CPU and 20-22% of the time on physical I/O

2. A session will now have to scan through long list of buffers holding on to a cache buffer chain latch exclusively. In the meanwhile if a second session need blocks which are hashed onto the same cache buffer chain latch it will spin on the CPU and try to obtain the latch repeatedly for a fixed number of times (I guess this is still 2000 since Oracle version 7) before leaving the CPU and going off to sleep. It will wake up again to repeat the same cycle till it obtains the latch and finally the data block it needs. This spin and retry cycle consumes CPU cycles (without producing a throughput) which creates a queue of other ready to run processes in the system. We need to bear in mind that this cannot be identified using Oracle wait interface because this is not a wait as far as Oracle is concerned. Extending this scenario to more number of processes requiring considerable number of data blocks from cache gives us an understanding the frequent excessive load on the CPU. Currently the CPU runs at around 90% usage.

3. Even earlier the system used to experience such expensive SQLs with demands for too many data blocks, but this didn¿t cause CPU starvation for the entire system to the extent it does now. The reason in my opinion is -
- Since less data was available in the cache, once Oracle finds that the data is not cached it leaves the CPU and waits for the data to be read from the disk
- Earlier prior to the cache increase, an expensive SQL spent around 20% on disk reads, this created 20% free CPU cycles to be consumed by other ready to run processes
- Thus the ready to run processes get their share of CPU without too much waiting
- Thus earlier when these SQLs spent 20% on disk, they would have impacted the response time more for the sessions issuing these SQLs rather than the entire system. What happens now is that since these sessions spend their entire time on the CPU creating a run queue of session waiting for the CPU

3. We also need to remember that latches are serialization devices, which seriously prevents scalability. The longer we allow a process to hold on to the latch the more we prevent scalability

4. So how do we prevent processes holding latches for long durations:
- Tune expensive SQLs. But little can we do for one of our query tools where the user can provide unselective predicates. This is a limitation of the tool, but that's a seperate question
- Perform a balancing act between the CPU and disk I/O. If we can reduce the cache size (keep cache size in particular) a little allowing the expensive SQLs to leave the CPU while waiting on disk I/O for some time given that most of the disk controller bandwidth is going unutilized.

This in my opinion would do well to the CPU and the processes which genuinely need the CPU for real inexpensive transactions

5. The slow system response which is occurring quite frequently is due to the run queue which is created when several processes spin on the CPU for cache buffer chain latch consuming active CPU cycles. This should not be confused with enqueue or row locks which does not impact CPU cycles

6. In my opinion too much of data cache is proving to be a performance bottleneck because more and more and more processes are spinning on the CPU consuming CPU cycles. In trying to eliminate physical I/O completely through data cache is creating secondary CPU related issues due to increased latch activity resulting in performance degradation.

I would request you to validate and comment on the observations I made.

Also I have prepared a graph (if your site had a facility to upload one I would have uploaded it) where I have plotted the OS loads against the latch spin gets. The peaks of these rise and fall together which seems to be in agreement with my observation.

Thanks for your valuable time. And I would be eager to see comments and opinion and suggestion from you.

Regards
Nilanjan
Tom Kyte
March 24, 2008 - 10:37 am UTC

another easy way for them to visualize what you are saying is via an example.

You have a system, there are two types of users on it - reporting users and oltp users.

The oltp users are saying "this is slow". So, as everyone knows, adding CPU will make things faster - it has to be so.

So, you double the number of cpu's - and the oltp users come back and say how horrendous the system is now - it is much worse than before.

what happened?

Well, the OLTP users were predominantly throttled by the amount of IO they could achieve. They were bottlenecked on IO.

The report users - while doing lots of physical IO's - were cpu bound computing results.

So, you added cpu - making the reports go faster - meaning they now did more IO - but IO was a scarce resource, so IO waits went up and up and up.


Meaning - the alleviation of one bottleneck frequently leads to the introduction or worsening of another. For better or worse, computer systems are like little biospheres - self contained ecologies. Throw the eco-system out of balance, introduce something new and one of three things will happen -

things will improve
things will stay basically the same
things will get much much worse


I agree with your above analysis. You may have to reintroduce some physical IO in order to alleviate the run queue (classic toll booth problem - too many people arriving for a shared resource, the queueing mechanism breaks down) issue.

You seem to have thought this out pretty well from the above (there are shareable latches and such - so it is not "pure" serialization - but for purposes of discussion - I think it suffices to keep it at the level you have, if you start introducing all sorts of caveats and what not, it gets hard to explain fast - you have kept it pretty short, sweet and simple)

latch spin

rakesh, March 25, 2008 - 12:49 pm UTC

Hi,

We have also moved to 10g with huge memory. I have now changed db_cache_size to 4 gb as well. Now I want to check whether system is slow due to 'cache buffers chains'. I have issued following sql

SELECT TO_CHAR(c.end_interval_time,'dd-mon-yyyy hh24:mi') AS meas_date,
a.latch_name,a.spin_gets-b.spin_gets sp,(a.wait_time-b.wait_time)/1000000
FROM dba_hist_latch a, dba_hist_latch b, dba_hist_snapshot c
WHERE a.snap_id-1=b.snap_id AND a.latch_hash=b.latch_hash AND a.snap_id=c.snap_id
AND TO_CHAR(c.end_interval_time,'dd-mon-yyyy')='19-mar-2008'
AND a.latch_name = 'cache buffers chains'
ORDER BY 1 DESC;

MEAS_DATE LATCH_NAME SP (A.WAIT_TIME-B.WAIT_TIME)/1000
19-mar-2008 23:00 cache buffers chains 99 0.000178
19-mar-2008 22:00 cache buffers chains 335 0.000442
19-mar-2008 21:00 cache buffers chains 1124 1E-5
19-mar-2008 20:00 cache buffers chains 211064 0.678523
19-mar-2008 19:00 cache buffers chains 250097 0.16956
19-mar-2008 18:00 cache buffers chains 406650 0.278521
19-mar-2008 17:00 cache buffers chains 796037 2.550016
19-mar-2008 16:00 cache buffers chains 1509065 8.649397
19-mar-2008 15:00 cache buffers chains 501974 1.412104
19-mar-2008 14:00 cache buffers chains 246877 0.182511
19-mar-2008 13:00 cache buffers chains 2581430 29.728842
19-mar-2008 12:00 cache buffers chains 2575500 39.228069
19-mar-2008 11:00 cache buffers chains 2933422 12.903942
19-mar-2008 10:00 cache buffers chains 863905 4.209993
19-mar-2008 09:00 cache buffers chains 143077 0.879959
19-mar-2008 08:00 cache buffers chains 39977 0.006524
19-mar-2008 07:00 cache buffers chains 30448 0.001733
19-mar-2008 06:00 cache buffers chains 41517 0.71575
19-mar-2008 05:00 cache buffers chains 38253 0.231507
19-mar-2008 04:00 cache buffers chains 101261 10.454397
19-mar-2008 03:00 cache buffers chains 59203 0.203427
19-mar-2008 02:00 cache buffers chains 38750 0.375048
19-mar-2008 01:00 cache buffers chains 47769 0.003704
19-mar-2008 00:00 cache buffers chains 155411 0.126424

Now If we see the last column of output, it is saying the wait time for this particular latch in that interval. Can I conclude that system was not at all slow for this latch because maximum wait is 39 secs ( cpu was busy for 39 secs in that interval altogether for this latch only ) for snap when end time is 12 pm ?

Secondly, if we see snaps which ends at 11 am , 12 pm 13 pm, then we can find out sping gets were very high . But wait time in any case did not cross more than 40 secs. Is it ok for the system ? I think it is ok.

Please comment on observation.

Thanks
rakesh


Tom Kyte
March 26, 2008 - 8:34 am UTC

the latch algorithm looks like this:


loop
   for i in 1 .. 2000 
   loop
      try to get latch
      if gotten return
      if (i=1) then increment miss cnt end if;
   end loop;
   INCREMENT WAIT_CNT;
   wait....
   INCREMENT WAIT_TIME;
end loop; 



see how you can spin (2000 loop) without waiting - waiting happens after you have spun too long.

so, here:

19-mar-2008 13:00    cache buffers chains    2581430    29.728842
19-mar-2008 12:00    cache buffers chains    2575500    39.228069
19-mar-2008 11:00    cache buffers chains    2933422    12.903942


you can say "some of our cpu, perhaps a lot, was spent spinning (2.5 million times per hour) - some of our idle-non-cpu wait was due to not getting the latch after spinning and using cpu"


latch misses = cpu munching = a 'wait' that is not recorded as a 'wait', you were not waiting, you were spinning and burning cpu.

what you would like to do would be compare the response times with various cache sizes (eg: what USED to happen vs what is NOW happening)

latch wait_time

Nilanjan Ray, March 25, 2008 - 6:15 pm UTC

Rakesh,

Good point. Here's what I think:

1. Since latches are meant to be held briefly, 29-30 secs (even something of the order of 8-10 secs as well) could be significant. Since latches were designed to be held for much much shorter duration Oracle had the concept of latch spin because context switch could be real expensive.

Think of this example of a real busy day in a four lane traffic road. Something unexpected happens all of a sudden.Four cars in each of the lanes breaks down and the traffic is blocked on all the four lanes for an hour. This creates a huge queue (congestion) of cars, buses etc. All four cars are fixed in an hour but the queue (congestion) of traffic that is created may take much longer than an hour to clear.

Now if you think the lanes as CPU, the traffic as the processes and the faulty cars till they are fixed as holding the latch (holding the traffic) it will be possible to somewhat guess the impact of the 29-30 sec latch wait_time in the hourly window.

The example may not be very exact but I think it is representative enough to understand a simple concept: Even after the actual cause of the performance bottleneck is removed it might take a longer time for the system to return to normal, be it latches or others.

Tom I am eager to see your viewpoint.

Thanks
Nilanjan

latch wait_time

rakesh, March 25, 2008 - 7:50 pm UTC

What I think, if you see the data, then 35 secs in 1 hour is very negligible. It should not create any huge queue on OS.

From the above data , it does not prove that latch spin count was the main reason for the slow .

Tom, could you please tell me whether above data proves anything or not ? Is there any other data dictionary which can prove that latch was the main factor or not ?

Thanks
Tom Kyte
March 26, 2008 - 8:50 am UTC

see above, i answered above.

latch wait_time

rakesh, March 26, 2008 - 12:59 pm UTC

Hi Tom,

You are saying "what you would like to do would be compare the response times with various cache sizes (eg: what USED to happen vs what is NOW happening) "

How to find out response times with various cache sizes ?

There is another thing . Under latch activity section of AWR report, it is saying that no. of get requests of cache buffers chains is 236,246,745 and % get miss for the same latch is .12 ( 277,825 ). So in terms of percentage is very low. These are the statistics for 15 minutes interval. In 15 minutes interval, 236,246,745 is too high. There must be some sqls which are responsible for such high . Am I right ?

Thanks
rakesh

Tom Kyte
March 26, 2008 - 4:11 pm UTC

... How to find out response times with various cache sizes ? ...

probably too late now, since you probably have no metrics from your OLD system. Short of setting up a test with various sized caches on your existing system.

if your misses are low, then you are not spinning often (so I don't know why you get into a wait event so often...)


latch latch_wait

rakesh, March 27, 2008 - 8:38 am UTC

Hi Tom,

Could you please tell me how to measure response time against various cache sizes ? I don't know this. It will be good if I know this which I can use in future .

Thanks
Rakesh
Tom Kyte
March 27, 2008 - 10:45 am UTC

you don't know how to measure your applications response times?

Ok......

Well, one approach could be statspack.

take a snapshot, run a workload, take a snapshot.
change something
take a snapshot, run a workload, take a snapshot.


generate reports, compare them.

AWR/ASH/ADDM in 10g if you have them can do the same.

latch latch_wait

rakesh, April 02, 2008 - 3:38 pm UTC

Hi Tom,

I have one small doubt about cache buffer chain.

1) Let say there is parse and if it is hard parse, then internally oracle will query internal table for hard parsing. In this case will it have chance to increase cache buffer chain latch in busy system ?

2) If it is soft parse ( look for the pasred sql in memory ) , then is there any chance to increase cache buffer chain latch?

3) Or cache buffer chain latch is only meant for the user defined tables only ( not for the internal table ) ?

4) In awr report , there is section where sqls are listed on basis of perse calls . Does it mean any type of parse ( including hard, soft and softer ) ? If so , if I want to find out hard pasre for a particular sql listed in parse call section of awr report and cpu time consumed for this , is it possible ? If so then how ? If not , atleast is it possible to find out cpu consumption due to parsing for that sql ?

5) The following sql is doing huge parsing.

select min(bitmapped) from ts$ where dflmaxext =:1 and bitand(flags, 1024) =1024;

executions : 4,533,493 and parse calls : 4,535,624

Could you please tell me why this sql is issued and how to avoid parsing for this sql ?

6) If we run addm for same snap , then it is saying that if we can avoid this parsing ( listed in above section of awr report ) then there will be 11% improvement. Can I trust this ? Since there is impact of 11% for parsing , can we say this 11% may lead to most of the cpu usage ?

I know I have asked so many questions, but if you can answer thsi question , i will be higly obliged as we are having huge performance problem.

Thanks
rakesh
Tom Kyte
April 02, 2008 - 8:01 pm UTC

1) sure, the sql we execute on your behalf to execute your SQL does IO just like your sql does....

2) depends, auditing could - sure

(that said on 1 and 2 - the cache buffer chain latch would not be the thing to tune, the fact you are parsing like crazy would be - parsing in itself consumes hundreds/thousands of latch gets beyond a simple cache buffers chain latch)



3) no, all gets from the buffer cache will incur them

4) they are just parses, the parse might be hard, might be soft - doesn't matter - they are BOTH HUGELY EXPENSIVE. That parse count is just a count of parse calls

5) are you using a temporary tablespace group - that is a query associated with that. See bug 5455880

the workaround in 10g would be to not use a temporary tablespace GROUP - just use a temporary tablespace (I'm doubtful that anyone really needs a temporary tablespace group - striping at the OS/ASM level for temp should be more than sufficient).

there are various one off patches in 10g, but it would be easy to just not use that particular option.

6) "most", probably you cannot say "most". some or substantial - yes, most, no.


KR, July 15, 2008 - 7:59 pm UTC

Hi Tom - I have the following issue. I have an sql that is being run once a day. It generally tables couple of minutes until last week. Since then it takes few hours to finish and most of the wait events are "latch: cache buffers chains" I don't see this behaviour in any of the other envrionments. Why is this and also the developers say, at times it finishes in 2 minutes and dont' have trace for that. Wait class is concurrency and the wait event is latch: cache buffers chains and p2 value is 122.
My question is
1. I know the wait event is on latch: cache buffers chains and how to proceed to fix it. I am on 10.2.0.1.
2. It appears that this behaviour is not consistent.
3. How to simulate this on other envrionment?
4. I just have one single user running this sql and still wait class says concurrency. Why is that?

Eager to read your input and is highly appreciated.

Thanks,
KR.
Tom Kyte
July 15, 2008 - 8:20 pm UTC

does the plan stay consistent?


1) cbc latches indicate contention for the buffer cache. likely, when it runs "ok", it is the only thing running. When it runs 'bad', it is contending with lots of other users. Study that - what is different when it runs 'good' vs 'bad'

2) sure it is, you just haven't figured out what external variable is changing between good and bad. Once you do - you'll be able to do this performance difference any time you want.

3) you have to figure out #1 first.

4) you have more than one user - you need to look deeper, you have a batch, a job, a scheduler job, something else is running....



A reader, July 16, 2008 - 12:04 pm UTC

Hi Tom -
does the plan stay consistent?
--Plan is different when it gets into "latch: cache buffers chains" Important thing in the plan is it just does UNIQUE index range scan on pretty much all the tables in the query. When I use sql*plus it has different plan and finishes in no time. This is a problematic behaviour I am seeing for few weeks and generally ends with high db sequential read except this one went to cbc. When I took an sqltxplan, the same query had about 3 different plans and it picked a plan with low cost which is not an optimal, faster plan. Why didn't the optimizer pick the right plan? At the same time when I do explain plan, it ALWAYS picks up the correct plan which is faster execution time, but the cost is little higher. I was told it could be a 10.2.0.1 bug with optimizer in consistency with histograms. I use dbms_statistics with opt_method size AUTO. Is there any other explanation for this?

1) cbc latches indicate contention for the buffer cache. likely, when it runs "ok", it is the only thing running. When it runs 'bad', it is contending with lots of other users. Study that - what is different when it runs 'good' vs 'bad'
--How do I check someone else is quering. No lock on the table.
2) sure it is, you just haven't figured out what external variable is changing between good and bad. Once you do - you'll be able to do this performance difference any time you want.
--I have the sqlID that was fired through the etl job. If I get a sqltxplan should have any setup difference if at all. Is that the way to go?
3) you have to figure out #1 first.

4) you have more than one user - you need to look deeper, you have a batch, a job, a scheduler job, something else is running....
--This was run as a scheduled job. Even if another session is querying this data, would it lead to cbc? In typical reporting, how do I control this?

Thanks,
KR.
Tom Kyte
July 17, 2008 - 11:22 am UTC

1) they do not have to be querying that table, just doing stuff in the database.

use v$session to see who else is in there doing stuff.

2) did not follow you here - but, you have in fact identified what is different, you just told us "when this happens, it has a suboptimal plan"

4) does not matter if it is the same data or not. the buffer cache is a bunch of lists, these lists point to blocks, blocks every lots of different segments end up on the same list. what you have is indicative of a query that does a ton of logical IO (gets from cache) while others are trying to do the same - and we are all bumping into eachother getting into the cache to get the blocks.

A reader, July 17, 2008 - 2:17 pm UTC

Hi Tom -

1. For the issue with "latch: cache buffers chains", I increased the initrans to 4 on the tables and the index to 4 from default 2. Since then I am not getting cbc. Is that the fix for these issue?

2. I am not clear why would the optimizer picks the sub-optimal plan for a query which leads to high seq. read? While the same query works fine in sqlplus. This is not just one sql, but seeing many different sqls from the application, just based on the low cost picks a bad plan. For example, I have a partitioned table with 30M rows by year. The reporting query through PS application, does a unique scan and runs for ever with Sequential read. At the same time when I run on a sqlplus session, picks up another plan which is higher in cost but very efficient plan.

With this I am hitting a high performance issue. Why is this behaviour?

Thanks,
KR
Tom Kyte
July 18, 2008 - 4:07 pm UTC

1) nope, something else changed. cbc latches are latches to get a list of blocks. changing the attribute of a block like initrans won't do it.

2) insufficient data, but I would guess "bind peeking", when it was parsed the first time, the inputs to the query drove it to that plan. That plan was great for those inputs. That plan was bad for the set of inputs (bind variables) you happened to use.


"latch free" wait on one INSERT session followed by "library cache load lock" waits on other session

Chellam, October 21, 2008 - 10:43 pm UTC

Tom,

Oracle version : 8.1.7.4

One session was waiting on "latch free" wait during INSERT and observed the following recursive sql query during that time :

select ts#,file#,block#,nvl(bobj#,0),nvl(tab#,0),intcols,nvl(clucols,0),audit$,flags,pctfree$,pctused$,initrans,maxtrans,rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,analyzetime, samplesize,cols,property,nvl(degree,1),nvl(instances,1),avgspc_flb,flbcnt,kernelcols,nvl(trigflag, 0),nvl(spare1,0),nvl(spare2,0),spare4,nvl(spare3,0) from tab$ where obj#=:1

During this wait, INSERTS from other sessions were blocked and waited on the above session (latch free wait session). Blocked sessions were waiting on "library cache load lock" as shown below :

1* select * from v$session_wait where sid in (54,48,21,87)
SQL> /

       SID       SEQ# EVENT                                                            P1TEXT                              P1 P1RAW
---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ----------------
P2TEXT                                                                   P2 P2RAW            P3TEXT                              P3 P3RAW              WAIT_TIME
---------------------------------------------------------------- ---------- ---------------- ---------------------------------------------------------------- ---------- ---------------- ----------
SECONDS_IN_WAIT STATE
--------------- -------------------
        21      10469 library cache load lock                                          object address                      1.5331E+10 0000000391CED3A0
lock address                                                     1.5329E+10 0000000391A89E08 10*mask+namespace                12811 000000000000320B          -2
            202 WAITED UNKNOWN TIME

        48       3100 library cache load lock                                          object address                      1.5331E+10 0000000391CED3A0
lock address                                                     1.5329E+10 0000000391A89A60 10*mask+namespace                12811 000000000000320B           0
            469 WAITING

        54      52845 library cache load lock                                          object address                      1.5331E+10 0000000391CED3A0
lock address                                                     1.5329E+10 0000000391A89F70 10*mask+namespace                12811 000000000000320B           0
            379 WAITING

        87      19509 library cache load lock                                          object address                      1.5331E+10 0000000391CED3A0
lock address                                                     1.5329E+10 0000000391A89BC8 10*mask+namespace                12811 000000000000320B           0
            161 WAITING


Question :

How do i find what particular latch "latch free" was waiting on ? 

What is the cause of Library cache load lock ..?

When i killed the "latch free" waiting session, all blocked sessions were successful. can you explain.

Thanks!

Chellam, October 22, 2008 - 5:12 pm UTC

Tom,

could you respond to the question above when you get a chance.
Tom Kyte
October 23, 2008 - 9:10 am UTC

well 8.1.7.4 is quite old.

it is more of a "diagnose my problem" rather than a followup/review.

v$session_wait
http://docs.oracle.com/docs/cd/B19306_01/server.102/b14237/dynviews_2094.htm#REFRN30229

describes the view that shows what sessions are waiting on - you would take p1, p2, p3 from there and goto the referenced "wait events" chapter to see what they mean (they would be the latch wait information - the latch address and such in this case)


You asked:

... When i killed the "latch free" waiting session, all blocked sessions were successful. can you
explain. ....


well, if you have a bunch of sessions waiting on session X - and session X is waiting on something, and you kill session X - the sessions waiting on X won't be waiting on X anymore.

latches vs physical ram ...oracle10.2.0.3, aix5.2

raghu, September 10, 2010 - 12:20 pm UTC

tom,

For one of our production systems, because of some discrepancy,
the physical RAM got reduced from 28G to 16G after a box reboot schedule.
Ever since that we are noticing high levels of latch waits intermittently in the system

During that time lot of sessions show valid blocking session in v$session,
these blocking sessions sql_id mostly shows some sql with literals

But these sqls(with literals) were present even before that reboot schedule
TOP WAIT EVENTs observed are - latch: library cache,latch: shared pool

if all the other things like application code,transaction nature & activity,
oracle parameters remain the same in the box can this sudden
increase in latches be attributed to difference in physical ram size ?

thanks.
Tom Kyte
September 13, 2010 - 7:50 am UTC

did the SGA size change at all - there are so many "hypotheticals" here.

Here is one hypothetical:

your SGA stayed the same size. The machine is now memory starved and paging much more than before.

Therefore, code is running slightly slower than before, the OS is doing a lot more work - more time in the kernel than before.

A latch holder has a much higher change of getting bumped off the CPU while holding a latch therefore.

Meaning, the people waiting for the latch will spin and burn CPU hoping the latch holder will give up the latch (which they cannot of course, they are not running right now - they are waiting for a cpu to run on).

Which leads to people going to sleep waiting on the latch

which means your latch wait times skyrocket.

There is one "for instance" that could have happened - yes.

library cache contention for same sessio

KG, March 10, 2011 - 12:46 pm UTC

Hi Tom
The database version is 11.2.0.1
i run below sql and Find Holder and Requestor is same session

SQL> l
  1  select s.sid holder,
  2     decode(x.kgllkmod,2,'S',3,'X',1,'N') "mode" ,
  3             x.KGLLKTYPE ktype ,
  4             w.sid requestor,
  5     decode(w.reqmode,2,'S',3,'X',1,'N') "req" ,
  6             w.wtype,
  7     o.KGLNAOWN||'.'||o.KGLNAOBJ locked_object ,
  8             o.KGLOBTYP ,
  9     s.sql_hash_value holder_sql,
 10     s.module holder_module,
 11     s.machine holder_srvr,
 12     hw.event  || ' - ' || hw.wait_time holder_wait
 13    from dba_kgllock x, x$kglob o, v$session s, v$session_wait hw,
 14      (select sid, p1raw, trunc(p3/10) reqmode
 15             ,decode(substr(event,15),'pin','Pin','lock','Lock') wtype
 16         from v$session_Wait
 17        where event like 'library cache %' ) w
 18  where x.kgllkHDL = w.p1raw
 19    and x.kgllkMOD > 0
 20    and x.kgllkuse = s.saddr
 21    and x.kgllkHDL = o.kglhdadr
 22    and x.KGLLKTYPE = wtype
 23    and s.sid = hw.sid
 24* order by 7,3,5

HOLDER m KTYPE  REQUESTOR r WTYP LOCKED_OBJECT                    KGLOBTYP HOLDER_SQL HOLDER_MODULE        HOLDER_SRVR     HOLDER_WAIT
------ - ------ --------- - ---- ------------------------------ ---------- ---------- -------------------- --------------- -----------------------------------
   984 S Pin          984   Pin  APPS./fd935ca6_TelnetOutputStr         29 2928025313 NPIFTPPJM            npi-or-app-p-21 library cache pin - -1


Is it possible ?
Whether pinning object would help ?
What needs to be done to avoid this issue

Regards

Tom Kyte
March 10, 2011 - 1:17 pm UTC

x and v$ tables are not read consistent, they are pure memory structures - you have to be very very careful when joining them together as you are reading things that could be changing thousands of times per second.

This is not my query, I'm not about to reverse engineer it, without a detailed explanation of the data we are supposed to be looking at - not going to go any further.

Not sure of data

KG, March 11, 2011 - 6:10 am UTC

Hi Tom
As i have not written the java code ,i am unsure of data/logic.
Is there any specific questions ,i should ask to developer .

Regards
Tom Kyte
March 11, 2011 - 7:39 am UTC

I'm not sure what problem you are experiencing - so I wouldn't know what to start asking.

can you describe what you are trying to fix exactly