Skip to Main Content

Breadcrumb

Question and Answer

Tom Kyte

Thanks for the question, Ahlam.

Asked: December 13, 2000 - 8:12 pm UTC

Last updated: April 20, 2012 - 4:13 pm UTC

Version: 734 through 816

Viewed 10K+ times! This question is

You Asked

One of our systems has serious problems on certain days. I am looking at some oracle views (v$system_event, v$session_event & v$session_wait) to make sense of the info gathered.

Top waits now are:
db_file_sequential_read
log_file_parallel_write
io done.

I looked in the oracle8 Ref for a list of Oracle Wait Events. I found it, but the explanation of P1-P3 in v$session_wait is so weak. Do you know of a better place to look?

Thanks in advance.

and Tom said...

For the db file sequential read, see
</code> http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:957829973821 <code>
Log file parallel write is writing redo records to the redo log files from the log buffer. The wait time is the wait for LGWR to complete multiple log file writes. Even though they are written in parallel, one needs to wait for the last I/O to be on disk before the parallel write is complete.

Parameters:
P1 - No. of log members.
P2 - No. of log blocks.
P3 - No. of i/o requests

files Number of files written to.
blocks Number of blocks to be written.
requests Number of I/O requests.

P2 blocks will be written to each log member, in P3 number of
requests. i.e. If there were 2 requests, there would be two
physical writes per log member. The no. of log blocks will be
divided up between the requests.


IO done, this event is used by the IO slave processes (oracleInnn) when waiting for slave IO processes to complete a request or by idle slave IO processes waiting for work.

This wait event is used in both IDLE and NON-IDLE conditions
so can be very misleading.




Rating

  (125 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Interesting..

Doug Cowles, October 16, 2001 - 8:39 pm UTC

Very helpful. One more question though.. can you explain the meaning of idle and non-idle conditions?

Tom Kyte
October 16, 2001 - 8:53 pm UTC

It basically means that it can happen as the result of an idle wait (which is "innocent") or as the result of a non-idle wait (we were REALLY truly waiting for this to finish before doing something else)

For example:

ops$tkyte@ORA717DEV.US.ORACLE.COM> select event, time_waited
  2  from v$session_event
  3  where sid = ( select sid from v$mystat where rownum = 1 )
  4  and event = 'SQL*Net message from client'
  5  /

EVENT                                                            TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client                                              263

ops$tkyte@ORA717DEV.US.ORACLE.COM> rem wait a bit....
ops$tkyte@ORA717DEV.US.ORACLE.COM> @a
ops$tkyte@ORA717DEV.US.ORACLE.COM> select event, time_waited
  2  from v$session_event
  3  where sid = ( select sid from v$mystat where rownum = 1 )
  4  and event = 'SQL*Net message from client'
  5  /

EVENT                                                            TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client                                              925

Here the sql*net message from client is a true IDLE wait event.  As we sit in sqlplus, it just keeps on ticking up.....  It is showing that we are idle

However, its not ALWAYS an idle wait event.  Say you are running a pro*c app you wrote, its a batch program.  Its taking too long to run.  You look and see that the sql*net message from client is HUGE -- guess what that means.  In this case, it means the database is not the bottleneck, its spent alot of time WAITING for you to tell it to do something.  In this case, this is a non-idle condition, we were not idle (as an entire process -- client+database)....

 

Reader

Reader, October 16, 2001 - 9:13 pm UTC

Tom,

There is only one LGWR process right? How does it
write in parallel to redo log members

What constitutes the # of i/o waits (P3 value)


Thanks


Tom Kyte
October 16, 2001 - 9:18 pm UTC

There can be IO slaves in the system and most all systems support async IO (fire the write off but don't wait for it so you can fire that next one off -- then wait for them all to finish)..


P3 is the number of requests - not waits.

Reader

Reader, October 17, 2001 - 9:36 am UTC


I understand async i/o. In some some systems ( ex:Silicon Graphics), even disk_asynch_io = true, does not enable
asynch i/o because of some hardware requirements.

In such cases, does Oracle write to log members alternatively in a round-robbin fashion. If so how big a
Oracle buffer is used for each batch of logfile parallel
write. ( I've read from an article that ARC process reads
from the redo logs alternatively buffer = archive_log_buffer_size).

Again could you let us know what constitutes *no. of io requests*

Thanks

Tom Kyte
October 17, 2001 - 10:02 am UTC

the number of IO requests is how many requests we made to perform IO. How many writes we did to write the data. If you have 1meg of data to write, you typically don't do it in one big call.

As I said, we can use IO slaves to simulate async IO or if there is no async io, we would do it sequentially. We dump the log buffer (happens upon commit, every 3 seconds, when 1/3 full, when 1meg full -- whichever comes first). It is a function of the size of the log buffer which you control.

Follow up on IDLE and Non-idle conditions

Doug, October 17, 2001 - 9:06 pm UTC

Just to follow up on idle and non-idle conditions. I understand the difference, however you said that the log posting waits can include both idle and non-idle conditions and can therefore be misleading. Does this mean that I'm going to get log file sync and log parallel write waits even if there is nothing wrong and it is just waiting for a checkpoint interval for example? Log file sync and log file parallel wait are showing up as the #2 and #3 wait events in one system I watch over which I think is peculiar. I usually seem them in the top 10 or so but never right at the top.

Tom Kyte
October 17, 2001 - 9:37 pm UTC

IO done is the idle/non-idle event. Log file Parallel Write -- not an idle/non-idle event.

Log stuff -- real time waits.

IO done, maybe real time, maybe not.

What about PX DEQ waits

DGR, October 03, 2002 - 1:44 pm UTC

I can't find ANY documentation on these, but they keep showing up on my reports and people are asking me what can be done about them.

I know some of these must be meaningless (when looking for problems), but can all of them be ignored?

The two I see most often are Execute Reply and Credit:send blkd.

I realize that 2 parallel operations can take different amounts of time to finish their tasks and that some waits cannot be avoided (or even considered bad), I just don't know which ones.

I've seen lot's of questions on the internet about these, but no answers. Can you tell if any of the PX Deq waits are interesting or significant/correctable?

Thanks.

Tom Kyte
October 03, 2002 - 6:30 pm UTC

search for

px deq

on this site. Ignore this article, read the other (currently) two.

SQL*Net message from client waited time

vinodhps, May 07, 2003 - 8:32 am UTC

Hi... Tom,
In our production system , while executing one procedure it take very long time to execute it when i run the query individualy it run fast.. so i traced the session and found that my SQL*Net message from client  waited that much time... Here what will the exact reason for this Tom.



SQL> select * from v$session_event where sid=88;

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.03          0          0          0           0
Execute  18480     36.69      88.65          0          0          0           0
Fetch    18480      8.77      20.28          0      73920          0       18480
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    36961     45.46     108.96          0      73920          0       18480

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324     (recursive depth: 1)

     SID EVENT                          TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
--------- ------------------------------ ----------- -------------- ----------- ------------
 MAX_WAIT
---------
        3

       88 SQL*Net more data to client              1              0           0            0
        0

       88 SQL*Net message from client            304              0      680756    2239.3289
   382821

SQL> select * from v$session_wait where sid=88;

      SID      SEQ# EVENT
--------- --------- ------------------------------
P1TEXT                                                                  P1 P1RAW
---------------------------------------------------------------- --------- --------
P2TEXT                                                                  P2 P2RAW
---------------------------------------------------------------- --------- --------
P3TEXT                                                                  P3 P3RAW    WAIT_TIME
---------------------------------------------------------------- --------- -------- ---------
SECONDS_IN_WAIT STATE
--------------- -------------------
       88      2132 SQL*Net message from client
driver id                                                        675562835 28444553
#bytes                                                                   1 00000001
                                                                         0 00               0
            136 WAITING

Thanks in advance and Your doing  great job tom........

 

Tom Kyte
May 07, 2003 - 8:49 am UTC

sqlnet message from client means .....

we were sitting idle waiting for the client to bug us to do something. meaning, the slowdown -- well, it is in the client, it is not asking us to do any work.


now, looking at that tkprof -- that "whatever" statement you have the report for runs instantly -- less then 5/1000ths of a second by the wall clock.


However, you do it lots and anytime you do something lots, it ADDS UP.

I'm always suspicious when I see statements being executed 10's of thousands of times. It leads me to believe the coder only knew how to do row at a time processing, and doesn't do any bulk SET oriented operations.

That is like the difference between going to the library and asking for War and Peace a page at a time -- making the librarian run back and forth and back and forth

versus

Asking for War and Peace -- one bulk operation....


I think the problem lies in the logic of the client.

SQL*Net message from client waited time

vinodhps, May 07, 2003 - 1:13 pm UTC

Thanks for your immediate responce tom, well my query goes like this below.

Can you please tell me with an example about can librarian run page for "WAR" & "PEACE"

SELECT
SUBSTR (a.prd_cd,1,1) PRD_GRP,
SUBSTR (a.prd_cd, 2, 3) variety_cd,
SUBSTR (a.prd_cd, 5, 1) std_ret_ind,
TO_NUMBER (SUBSTR (a.prd_cd,6,3)) gsm,
TO_NUMBER (SUBSTR (a.prd_cd, 9, 4)) / 10 width,
TO_NUMBER (SUBSTR (a.prd_cd, 13, 4)) / 10 length,a.mchn_no,
(SUM(NVL(a.st,0)) - NVL(b.pend_qty,0)) qty,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.dscnt_cd,
DECODE(c.variety_cd, 'ALL', c.dscnt_cd,NULL)) dscnt_cd,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.oth_dscnt,
DECODE(c.variety_cd, 'ALL', c.oth_dscnt,NULL)) oth_dscnt,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.remarks,
DECODE(c.variety_cd, 'ALL', c.remarks,NULL)) remarks
FROM (
SELECT prd_cd, mchn_no,
Func_mks_gdw_ret_stock (prd_cd,DECODE ( MCHN_NO , 1, 1, 2,2),
DECODE ( MCHN_NO , 1, st_date, 2, st_date), 1) st
FROM (
SELECT prd_cd, mchn_no, MAX (stock_date) st_date
FROM t_gdw_daily_stock
WHERE stock_date >= '1-APR-2001'
AND (DECODE (SUBSTR (prd_cd, 13, 4), '0000', 'R', 'S') = 'B'
OR 'B' = 'B')
AND (SUBSTR(prd_cd,2,3) = 'CRW' OR 'CRW' = 'ALL')
AND ( (TO_NUMBER(SUBSTR(prd_cd,6,3)) BETWEEN 999 AND 999)
OR (999 = 999 AND 999 = 999 ))
AND ( (SUBSTR(prd_cd,5,1)= 'B' ) OR ('B' = 'B'))
AND (mchn_no = 0 OR 0 = 0)
GROUP BY prd_cd,mchn_no
)
) a,
(
SELECT prd_cd, SUM (NVL(accpt_qty,0) - NVL (dsp_qty, 0)) pend_qty
FROM t_ord_prd_dtls
WHERE ROWID IN
(
SELECT /*+ index_ffs(T_ORD_PRD_DTLS CS_ORD_PRD_DTL_PK) */ rowid
FROM t_ord_prd_dtls
WHERE prd_status = 'N'
AND (accpt_qty - NVL(dsp_qty,0)) >= .2
AND (DECODE (SUBSTR (prd_cd, 13, 4), '0000', 'R', 'S') = 'B' OR 'B'= 'B')
AND (SUBSTR(prd_cd,2,3) = 'CRW' OR 'CRW' = 'ALL')
AND ( (TO_NUMBER(SUBSTR(prd_cd,6,3)) BETWEEN 999 AND 999)
OR (999 = 999 AND 999 = 999 ))
AND ( (SUBSTR(prd_cd,5,1)= 'B' ) OR ('B' ='B'))
AND (mchn_no = 0 OR 0 = 0)
)
GROUP BY prd_cd
) b,
(
SELECT a.mchn_no, a.variety_cd, a.from_gsm, a.to_gsm, a.from_width,
a.to_width, a.from_length, a.to_length, a.dscnt_cd,
a.oth_dscnt, a.std_ret_ind, a.remarks, b.amt
FROM t_mks_stock_lot_mst a,
t_mks_gen_dscnt_mst b
WHERE a.DSCNT_CD = b.DSCNT_CD
AND TRUNC(SYSDATE) BETWEEN a.FROM_DATE AND a.TO_DATE
) c
WHERE a.prd_cd = b.prd_cd
AND (( TO_NUMBER(SUBSTR(a.prd_cd,7,2)) BETWEEN c.from_gsm AND c.to_gsm)
OR (TO_NUMBER(SUBSTR(a.prd_cd,7,2)) = TO_NUMBER(SUBSTR(a.prd_cd,7,2))))
AND (( TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10 BETWEEN c.from_width AND c.to_width )
OR ( TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10 = TO_NUMBER(SUBSTR(a.prd_cd,9,4))/10) )
AND ((TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10 BETWEEN c.from_length AND c.to_length)
OR ( TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10 = TO_NUMBER(SUBSTR(a.prd_cd,13,4))/10))
AND (SUBSTR(a.prd_cd,2,3) = c.variety_cd
OR 'ALL' = c.variety_cd
OR SUBSTR(a.prd_cd,2,3) =SUBSTR(a.prd_cd,2,3))
AND (SUBSTR(a.prd_cd,5,1) = c.std_ret_ind
OR 'B' = c.std_ret_ind
OR SUBSTR(a.prd_cd,5,1) = SUBSTR(a.prd_cd,5,1))
AND (a.mchn_no = c.mchn_no
OR c.mchn_no = 0
OR a.mchn_no = a.mchn_no )
GROUP BY
SUBSTR (a.prd_cd,1,1) ,
SUBSTR (a.prd_cd,2,3) ,
SUBSTR (a.prd_cd,5,1) ,
TO_NUMBER (SUBSTR (a.prd_cd,6,3)) ,
TO_NUMBER (SUBSTR (a.prd_cd, 9, 4)) / 10 ,
TO_NUMBER (SUBSTR (a.prd_cd, 13, 4)) / 10 , a.mchn_no, B.PEND_QTY,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.dscnt_cd,
DECODE(c.variety_cd, 'ALL', c.dscnt_cd,NULL)) ,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.oth_dscnt,
DECODE(c.variety_cd, 'ALL', c.oth_dscnt,NULL)) ,
DECODE( SUBSTR(a.prd_cd,2,3) , c.variety_cd, c.remarks,
DECODE(c.variety_cd, 'ALL', c.remarks,NULL))
HAVING (SUM(NVL(a.st,0)) - NVL(b.pend_qty,0)) > 0

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.04 0 0 0 0
Execute 1 0.00 0.02 0 0 0 0
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 0.00 0.06 0 0 0 0

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324
********************************************************************************

SELECT (NVL(OPNG_STOCK,0) + NVL(RCPT_STOCK,0) - NVL(ISS_STOCK,0) -
NVL(RET_STOCK,0) )
FROM
T_GDW_DAILY_STOCK WHERE STOCK_DATE = :b1 AND PRD_CD = :b2 AND MCHN_NO =
:b3


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 18480 36.69 88.65 0 0 0 0
Fetch 18480 8.77 20.28 0 73920 0 18480
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 36961 45.46 108.96 0 73920 0 18480

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 324 (recursive depth: 1)

Thanks in Advance Tom,


Tom Kyte
May 08, 2003 - 9:03 am UTC

look at your logic

try to figure out "well, hey, why do I run this query 18,480 times. PERHAPS I am missing an opportunity to join"

that is, instead of codeing like this:

for x in ( query1 )
loop
run query2 with inputs from query1
process


just do this


for x in ( query1 joined to query2 )
loop
process.

vinodhps, May 08, 2003 - 11:38 am UTC

Hi Tom,

That was great tom, and iam very happy with your responce 
This time i got 2 different senario 
1)In this table i got 35,000 rows and 316 block appox~  75 avg row len 8KB block size but here why it takes these much of time..(wait time)

Here using 10046 i traced the session

SELECT SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'EI', 1, 'DN', 1, 'IC', 1, -1) * out_amt),
         SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'WI', 1, 0))
      FROM   t_fas_sale_awa
      WHERE  br_code = br_code
      AND    doc_type IN ('FI', 'BI', 'EI', 'DN', 'IC', 'CN', 'ADV', 'UMR')
      AND    doc_no = doc_no
      AND    prd_group_ind = DECODE (:b5, 'B', prd_group_ind, :b5)
      AND    indr_code = DECODE (:b4, 'ALL', indr_code, :b4)
      AND    cons_cd = DECODE (:b3, 'ALL', cons_cd, :b3)
      AND    :b2 - doc_date > :b1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute   2573      1.03       1.03          0          0          0           0
Fetch     2573    303.79     817.92     183000     975167          0        2573
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     5146    304.82     818.95     183000     975167          0        2573

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 21     (recursive depth: 2)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                      1680        0.11         10.21
  db file scattered read                      19032        0.20        503.57
********************************************************************************


2) At same session the overall value make me worry about the procedure 

SQL> @t1

Query Plan
--------------------------------------------------------------
SELECT STATEMENT    [CHOOSE] Cost=19 Rows=2332 Bytes=16324
  TABLE ACCESS FULL T_MKS_CONS_MST  [ANALYZED]

/*
select cons_cd, func_fas_chk_custind_avail_bal ('P', 'ALL', cons_cd, sysdate,
                        60, 'Y', 'B') ans
from   t_mks_cons_mst
where  cons_type = 'C';
*/

CREATE OR REPLACE
FUNCTION func_fas_chk_custind_avail_bal (f_prd_group_ind    IN    VARCHAR2,
                         f_indr_code        IN    VARCHAR2,
                     f_cust_code        IN    VARCHAR2,
                     f_as_on_date        IN    DATE,
                     f_days            IN    NUMBER,
                     f_is_2give_net_bal    IN    VARCHAR2,
                     f_chk_what        IN    VARCHAR2)
RETURN NUMBER AS
v_awa_bal        NUMBER := 0;
v_pending_qty_bal    NUMBER := 0;
v_os_inv_cnt        NUMBER := 0;
v_ret_val        NUMBER := 0;
v_limit            NUMBER := 0; 
v_days            NUMBER := f_days;
BEGIN
   BEGIN
      IF f_is_2give_net_bal = 'Y' THEN
     IF f_cust_code <> 'ALL' THEN
            SELECT cred_lmt, credit_days
        INTO   v_limit, v_days
        FROM   t_mks_cons_cred_lmt_mst
        WHERE  cons_cd = f_cust_code
        AND    f_as_on_date BETWEEN from_date AND NVL (to_date, f_as_on_date);
         ELSE        
            IF f_indr_code <> 'ALL' THEN
               SELECT cred_lmt, credit_days
           INTO   v_limit, v_days
           FROM   t_mks_ind_cred_lmt_mst
           WHERE  ind_cd = f_indr_code
           AND    f_as_on_date BETWEEN from_date AND NVL (to_date, f_as_on_date);
            END IF;    
         END IF;    
      END IF;
   EXCEPTION
      WHEN NO_DATA_FOUND THEN
         v_limit := 0; -- temp val so that output will fail
     v_days := 0;
   END;

   IF f_chk_what IN ('A', 'B') THEN
      SELECT SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'EI', 1, 'DN', 1, 'IC', 1, -1) * out_amt),
         SUM (DECODE (doc_type, 'FI', 1, 'BI', 1, 'WI', 1, 0))             
      INTO   v_awa_bal, v_os_inv_cnt
      FROM   t_fas_sale_awa
      WHERE  br_code = br_code
      AND    doc_type IN ('FI', 'BI', 'EI', 'DN', 'IC', 'CN', 'ADV', 'UMR')
      AND    doc_no = doc_no
      AND    prd_group_ind = DECODE (f_prd_group_ind, 'B', prd_group_ind, f_prd_group_ind)
      AND    indr_code = DECODE (f_indr_code, 'ALL', indr_code, f_indr_code)
      AND    cons_cd = DECODE (f_cust_code, 'ALL', cons_cd, f_cust_code)
      AND    f_as_on_date - doc_date > v_days;
   END IF;

   IF f_chk_what IN ('O', 'B') THEN
      IF f_cust_code <> 'ALL' THEN
         SELECT NVL (pending_order_value, 0)
     INTO   v_pending_qty_bal
     FROM   t_mks_cons_mst
     WHERE  cons_cd = f_cust_code;    
      ELSE
         IF f_indr_code <> 'ALL' THEN
            SELECT NVL (pending_order_value, 0)
        INTO   v_pending_qty_bal
        FROM   t_mks_ind_mst
        WHERE  ind_cd = f_indr_code;
     END IF;    
      END IF;     
   END IF;

   v_ret_val := v_awa_bal + v_pending_qty_bal;


   v_ret_val := v_ret_val - v_limit;

   IF v_ret_val > 0 AND v_os_inv_cnt > 0 AND f_cust_code <> 'ALL' THEN
      v_ret_val := v_ret_val * -1;      
   END IF;

   RETURN (v_ret_val);
END func_fas_chk_custind_avail_bal;
/

 

Tom Kyte
May 09, 2003 - 11:48 am UTC

well that is about as ugly as it gets. sorry -- you'll want to relook at the logic, see if you can tighten it up, see if you can move things around. It is a bummer when the data you need is strewn throughout your database sort of all over the place. Makes it really hard to piece back together.

"WAR" & "PEACE"

vinodhps, May 08, 2003 - 12:17 pm UTC

Hi tom ,
With Your Referance for

just do this


for x in ( query1 joined to query2 )
loop
process.

Actually i would like to tell what we are doing is..

we got 6696 consinee for each person we are doing that process how do i join query one with query two.

so do i have to process 6696 times ...

can you clear my doubt Tom..

Thanks In Advance.



SQL*Net message from Client - more on

A reader, August 16, 2003 - 9:11 am UTC

Hi Tom,
I run a query from SQL*Plus (connected directly - no connect string passed). I have a very unpredictable and high wait against SQL*Net message from client, when I see the SQL Trace. Why is that when a query is submitted a non-idle event SQL*Net message from client kicks into? Please tell me more on this event. This is the only event that is making the difference to the query from a minute to a sec.
Thanks

Tom Kyte
August 16, 2003 - 9:12 am UTC

need

a) example
b) how you are computing waits
c) versions
d) platforms
e) lots of stuff so I could make an educated guess...

How to get v$mystats contect from other session

A reader, January 21, 2005 - 4:44 am UTC

Hi Tom, is there a way to get the content I can see in the view v$mystats in my session from other session, please

Thank your Tom

Tom Kyte
January 21, 2005 - 8:26 am UTC

you use v$sesstat instead, it has them all.

A reader, January 21, 2005 - 1:27 pm UTC

Thanks Tom

Is there a view showing all stats per class

A reader, January 24, 2005 - 2:08 pm UTC

Hi Tom in 10g
Sorry, is there a view showing only all stats per class
I can't find.
Thanks.

Tom Kyte
January 24, 2005 - 3:38 pm UTC

the "stats by class" are in v$sysstat

A reader, January 24, 2005 - 5:48 pm UTC

Thanks Tom, I found them in v$event_name

Please clarify

A reader, January 25, 2005 - 4:44 pm UTC

Tom,

In the above discussion, you mentioned
<quote>
now, looking at that tkprof -- that "whatever" statement you have the report for
runs instantly -- less then 5/1000ths of a second by the wall clock.

However, you do it lots and anytime you do something lots, it ADDS UP.

I'm always suspicious when I see statements being executed 10's of thousands of
times. It leads me to believe the coder only knew how to do row at a time
processing, and doesn't do any bulk SET oriented operations.

That is like the difference between going to the library and asking for War and
Peace a page at a time -- making the librarian run back and forth and back and
forth
</quote>

1. Where can you find the information you mentioned: "from the tkprof report, it runs runs instantly -- less then 5/1000ths of a second by the wall clock."
2. Is there a way to differentiate the "SQL*Net message from client" in a situation where a client say SQL*Plus, just logs into SQL*Plus without doing nothing and the clock ticking up vs. the scenario where a batch process runs and the server waits for the client asking it to do something
3. What is the meaning of the following wait events :
1. SQL*Net message from client
2. SQL*Net message to client
3. SQL*Net more data from client
4. SQL*Net more data to client
I read the descriptions of Oracle manuals and found very difficult to understand and interpret.

5. What can we infer if SQL*Net message from client is huge and we're running a batch-job as you had mentioned.

Your help is greatly appreciated






Tom Kyte
January 25, 2005 - 7:09 pm UTC

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.03 0 0 0 0
Execute 18480 36.69 88.65 0 0 0 0
Fetch 18480 8.77 20.28 0 73920 0 18480
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 36961 45.46 108.96 0 73920 0 18480


36.69/18480 = 0.001985 cpu seconds/execution
88.65/18480 = 0.004797 wall clock seconds/execution.

the query ran really really fast. They just ran it over and over and over.

2) you would "know" -- in a batch program -- you know "this is batch, therefore sqlnet message from client is not an idle wait event.

or "these 5 queries are executed in rapid succession in our application -- any sqlnet message from client waits during their execution (trace file contains this raw data) indicates client isn't processing data nearly as fast as the database is throwing it at it.

3) 1 - database waiting to be told to do something
2 - database is sending message to client but having to wait (slow link to
client perhaps)
3 - waiting on read from client -- they've started talking to us, but
it is taking a long time to get the entire message
4 - same as #2 but the n'th send in a series of sends.


4) I like 4, it was easy to answer :)

5) make the client process data faster "typically" (that is usually the cause, the client is taking a long time to process the data, hence the database is sitting there "waiting")

if the client is going at light speed, split the client up -- implement "do it yourself parallel processing" -- run N copies of your client at the same time.

Typo!!!

A reader, January 25, 2005 - 10:25 pm UTC

Hi Tom,

Thanks much for your answer. However, for question 5 above, in your last paragraph you mention "if the client is going at light speed...", however after reading the discussion above it I understand it should be "if the client is NOT going at light speed...". Please advice if it is a typo or my understanding is not correct.

Tom Kyte
January 26, 2005 - 8:24 am UTC

it was not a typo. I was saying "if the bottleneck is in the client, but the client is already going light speed (as fast as it can), the only way to remove the bottleneck is by adding more of the same clients -- all working at light speed, but getting the job done faster since there are more of them"

RE : Bulk collect

A reader, January 26, 2005 - 9:20 am UTC

Tom,


I have understood the significance of SQL*Net message from client. However, I have the following question regarding "SQL*Net message from client".

1. "SQL*Net message from client" event will be applicable only
A. If the client and server are on different machines and the client uses Net8/SQL*Net for communicating with server
B. If the client and server are on same machine but the client uses Net8/SQL*Net using the protocol such as TCP/IP to communicate with the server instead of using the IPC communication.
Is my understanding correct?
2. When the client say a PL/SQL or Pro*C application does a bulk fetch of 100 rows at a time, the server process would bring back 100 rows at a time to the client. The client process would then these 100 records 1 at a time.
In that case what would happen to SQL*Net message from client event? Is it not true that the "SQL*Net message from client" would keep ticking (since client is doing it's own processing and the server is waiting for the client to give some work). Please advise if I am not correct. If my understanding is correct, how would we identify a difference between a true bottleneck and a bulk operation?
3. Also, if the client and server are running on the same box and if the client uses Net8/SQL*Net connection to connect to the server rather than IPC, how would the performance compare? What would happen to "SQL*Net message from client" event in these cases.

Sorry that I get these kind of doubts as and when I understand more about how these wait events work.

As always, your help is highly appreciate to clear the confusions I have

Tom Kyte
January 26, 2005 - 9:35 am UTC

1) is is applicable in all cases. it is the wait registered by the server under all protocols when the server is just sitting there "waiting"

2) anytime the server isn't being asked to do something, it is ticking away at this wait event.

how do you recognize the difference between a true bottleneck and a bulk operation? first you have to tell me what the difference is! You look at the accrued wait time of say "15 seconds" encountered during the fetch of those rows. Now, you tell me (with your domain knowledge of your process -- something you and only you has...) is 15 seconds

a) good
b) bad
c) just ok

if 15 seconds of server idle time while the client processed the records is a) good, then so be it. You and only you can make that determination. Perhaps that was 15 seconds but we processed 1 billion records. that is probably good.

But if we processed 101 records, it is probably (PROBABLY) bad. but again, only you can make that determination.

3) ipc gives the same metrics.





Unit of measure of statitics

A reader, January 28, 2005 - 4:13 am UTC

Hi Tom I would like to know which is the unit of mewasure for al statistics in mystats view, is this possible, or the only unit of measure is blocks.

Thank you

Tom Kyte
January 28, 2005 - 7:28 am UTC

depends on the statistic.

see the reference guide

</code> http://docs.oracle.com/docs/cd/B10501_01/server.920/a96536/apc2.htm#1186645 <code>



A reader, January 28, 2005 - 10:09 am UTC

Thanks Tom I was searching for something like another view to join, and get the unit of measure :)

Lost time.

Sai, January 28, 2005 - 6:35 pm UTC

Tom,

We are using Oracle 8.1.7.4 64 bit and SAN storage. Here, the total time waited by each background process from v$session_event not matching with the database up time.

15:03:22 SQL> select sum(time_waited)/(100*3600) from v$session_event where sid=4;


SUM(TIME_WAITED)/(100*3600)
---------------------------
                 36.6146056

  1* select (sysdate-startup_time)*24 from v$instance where 4=4
15:04:53 SQL> /

(SYSDATE-STARTUP_TIME)*24
-------------------------
                   58.935

15:04:53 SQL> select sum(time_waited)/(100*3600) from v$session_event where sid=2;

SUM(TIME_WAITED)/(100*3600)
---------------------------
                 56.1558333

15:07:25 SQL> c/2/3
  1* select sum(time_waited)/(100*3600) from v$session_event where sid=3
15:10:27 SQL> /

SUM(TIME_WAITED)/(100*3600)
---------------------------
                 55.2067444

There Sid's 2 and 3 are the DB writers and Sid 4 is the Log writer.

can you please explain why the timings didnt match?

Thanks 

Tom Kyte
January 28, 2005 - 7:31 pm UTC

why do you think they should match even a little bit?




Re: Lost time.

Sai, January 28, 2005 - 9:03 pm UTC

Tom,

My understanding is that all Oracle background processes record their time spent on each waitevent in v$session_event, including the idle waitevents like 'rdbms%' or '%timer', hence I thought the times should match.

Othewise, where is that time spent, which is not recorded in v$session_event?

Thanks

Tom Kyte
January 29, 2005 - 8:26 am UTC

it is spent "not waiting on something"

If they were always waiting "on something", your database would be "frozen" :)



Pls. clarify

A reader, February 03, 2005 - 9:21 am UTC

Hi Tom,

In the discussion above, you mentioned the following :
<Quote>
2) you would "know" -- in a batch program -- you know "this is batch, therefore
sqlnet message from client is not an idle wait event.

or "these 5 queries are executed in rapid succession in our application -- any
sqlnet message from client waits during their execution (trace file contains
this raw data) indicates client isn't processing data nearly as fast as the
database is throwing it at it.
</Quote>

1. In the above case, when the database throws data to the client which the client could not process as fast as it is received from database, you will see some "SQL*Net message to client" wait event also when the database waits for the message to be sent to client. Is it correct?
2. How does the server process know that the client has received all of the data it sent to the client and stop waiting on "SQL*Net message to client". Will there be a hand-shake between client and server?
3. For all "SQL*Net message from client", will there be "SQL*Net message to client" wait event? If not, what are the situations under which the database will send data to the client and what are the situations under which the "SQL*Net message to client" wait event seen?
4. How do we know from SQL*Net message from client wait event that the bottle-neck is in the client? By this I mean, the database would have sent the data to the client back, but there may be network problem/latency that makes client not to receive all of the data before it can process them, so the server may be waiting on "SQL*Net message from client". In the above case will it reflect in "SQL*Net message to client" wait event also in which case we can conclude this is a network latency?

I would appreciate if you could give me an example of how the client/server communication works and how the "SQL*Net message ..." wait event works. If you could give a step-by-step scenario of how the communication takes place, it would be really helpful in understanding how these wait events work together?

Tom Kyte
February 03, 2005 - 2:24 pm UTC

1) if the data base sends data to the client and the client doesn't tell database to do something - sqlnet message from client starts to add up, increment. yes.

2) we sent the last bytes, we are done. now we are waiting. no handshake needed, we finished our send. so sqlnet message from client could be due to

a) client being slow
b) network buffering a ton of data and sending it really slow

3) maybe, maybe not. we report waits when and if we wait, not if we don't have to wait for something

4) you check out things, look at the network, look at the code in the client. if the client is instruemented (fat chance, people just never ever seem to do that :( ) that would be very useful, you could see timings from it.



database waiting for client to say something - sqlnet message from client+++++
client says something, sqlnet message from client written out to trace

it is as simple as it sounds really. while waiting, wait increments, when done waiting, wait is written out to trace file



Some doubts

A reader, February 03, 2005 - 3:30 pm UTC

Tom,

You mentioned in the above thread that when the server sends the last byte to the client, then the server process has done it's job and no hand-shanking needed. So under what circumstances will we see server waiting on "SQL*Net message to client". So my understanding on this is as follows :

Assume process C is client and process S is the server. A typical client/server communication looks like this :

0. S is waiting on "SQL*Net message from client" and the count keeps ticking
1. C sends a message to S
2. A "SQL*Net message from client" event is recorded in trace file
3. S processes and sends the results back to client.
4. After step 3 is complete, S starts waiting again for
"SQL*Net message from client" and increments the counter
5. C (client) processes the data that S sent and sends data/another request to S. Another "SQL*Net message from client" event is written to the trace file.

Questions :

Qn #1. Is the above sequence of steps I mentioned correct? Please advise if this is not correct.

Qn #2. After step 3, a "SQL*Net message to client" event is recorded in trace file if S waits. Correct? If so, under what circumstances will that
happen (because you have mentioned in the above thread that no hand-shaking is needed and S is done once it sends the last byte).

Qn #3. If the time difference between when S sends data to client and starts waiting (step 3 & 4) and when S gets another data/request
from client C is large (step 5), then it can mean :
a) client C is slow
b) network buffering a ton of data and sending it to client C really slow
Is that correct?

Qn #4. Will "SQL*Net message from client" count be reset to 0 each time the server S sends data back to the client and waiting on client C
to send a new request

Qn #5. In one of the discussion threads, you mentioned that "CPU time" is one of the response time components that is written out to the
trace file once the call is completed and not as it happens. Also, I remember the same thing about "SQL*Net message from client". ie
it is recorded in the trace file only when client sends another request. Is it correct?

Qn #6. What are the other wait events that are in this category (mentioned in Qn #5).

As always, thanks much for your help to Oracle community

Tom Kyte
February 04, 2005 - 1:26 am UTC

that is the time it takes for the send to complete.


start timing
send(...)
stop timing, output wait for "to client"
start timeing
read( .....)
stop timing, output wait for "from client"

Pls. clarify

A reader, February 06, 2005 - 9:04 pm UTC

Hi Tom,

In your book "Expert one-on-one Oracle", you provide an example in chapter 10 that captures the session event once before a PL/SQL block (that does EXECUTE IMMEDIATE in a for loop to insert into a table T. After the example, you mention about the description of SQL*Net message from client. You mentioned that "In most cases, this wait event can be ignored. If application has lots of think-time in it, this number will necessarily be high. In our case, since the SQL*Plus session should have been continously been feeding statements to the database during the time we measured therefore we expect a lower number there". The following are my questions. Please let me know if my understanding is correct. Otherwise, please explain me.

1. My understanding of "SQL*Net message from client" is that the server process keeps incrementing the time waited column until the client session is connected and until the client (SQL*Plus in this case) requests the server to do something. That is the reason you mention that the wait-time on this event will be high ? Correct?
2. When the client sends a request to the server, the wait is written out to the trace file and the counts are reset and the server starts waiting again. Is that the reason, you mention that "you expect a lower number there" ?
3. If the client is extremely slow, then there will be FEWER NUMBER of WAITS (TOTAL_WAITS), but the TIME_WAITED between consecutive waits will be high. This is because the wait count gets incremented each time the server begins a new wait, but the time waited between the successive waits are high indicating a slow network or an inefficient client process or client process doing lot of work and no problem exists in the database. Right?
4. When a session connects to the database, but remains idle, the WAIT count (TOTAL_WAITS) will be 1, but the TIME_WAITED keeps incrementing. Right?
5. In that example, although you loop through 10000 times, is there any reason why the total_waits is only 4 and the time_waited is only 14. My understanding is that total_waits keeps incrementing each time the server process begins a new wait (so it should be 10000 in this case) and the time_waited will be reset each time the client sends a request to the server (in this case it will send 10000 requests). Is that correct ?
6. Also, in that chapter on page 467, you mention that "... event was 8.18 seconds. This just means I waited 8.18 seconds after sending the ALTER SESSION command to the database to send next command....".
In this context, I refers to the database server process.
Correct?
Why is the "SQL*Net message from client" for the ALTER SESSION command not in the trace file?
Why is the ela = 0 for the subsequent "SQL*Net message from client" message (which appears after the EXEC#3 line)
When a client issues a SQL command, is the following the correct sequence that you could see in the trace file?

-- client sent a request. So print "SQL*Net from client"
WAIT #n "SQL*Net message from client" ela = xxx
PARSING IN CURSOR #n......
<SQL statement here>
WAIT #n "<If any non-idle waits associated with parse>"
PARSE #n c=xxx e=zzz ....."
WAIT #n "<If any non-idle waits associated with exec>"
EXEC #n ....
WAIT #n "<If any non-idle waits associated with fetch>"
FETCH #n ....
WAIT #n "SQL*Net message to client....."

-- wait for another request
WAIT #n "SQL*Net message from client...."

I would appreciate if you could confirm my understanding on the wait events. If not please help me.

Tom Kyte
February 07, 2005 - 4:16 am UTC

these are looking very familar - i feel like I've answered some of them before.....


1) the message from client increments as long as the server process is sitting there waiting and waiting for the client to say something. If you log into sqlplus in the morning -- do nothing all day, and then before you leave home -- issue "select * from dual", that would have generated a days worth of "message from client" wait time prior to the select....


2) as long as we keep the server busy -- it cannot be waiting for us, hence that is why I would expect a low number -- because we are keeping it busy. yes, when we send a request to the server, it writes out the wait, resets the counters, does whatever we told it to and then waits for the next request. if the next request is there already (we are keeping it busy) then there would not be a wait on the message from client.


3) in this case yes, since the message from client is "saved up" until we send it a message. one big wait with lots of time in it.

4) v$session wait will show the increasing time waited on this event, yes.

5) I did multiple statements there -- we "saved" the before waits via an insert, did a couple of statements. that is why you see multiple waits there.

6) No, "I" refered to "me, tom". I waited 8 seconds after sending the alter to send the next command.

if there was no wait, that means it didn't wait -- there was no message from client to be recorded (think about it, the alter turned on tracing, prior to turning on tracing, we are not recording wait times, there was nothing to report on! we only trace when we trace)



How to simulate

A reader, February 07, 2005 - 6:06 pm UTC

Hi Tom,

I want to simulate a situation where there is inefficiency in the client application. For that I wrote the following PL/SQL block.

-- Capture the initial statistics
drop table prev_sess_wait
/
create table prev_sess_wait
as select sid,event,total_waits,time_waited
from v$session_event
where sid = '87'
and event = 'SQL*Net message from client'
/
-- Issue SELECT stmt to generate 'from client' message
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- Simulate a scenario to show inefficiency in client
-- application (tight loop with a huge loop index)
declare
j number;
begin
for j in 1..1000000000
loop
null;
end loop;
end;
/
-- Issue SELECT stmt to generate 'from client' message
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- Get the delta.
select a.event,
b.total_waits - a.total_waits,
b.time_waited - a.time_waited
from v$session_event b,
prev_sess_wait a
where a.sid = b.sid
and a.event = b.event
/


I was trying to simulate a scenario where there is inefficient client/application code and to test my understanding of SQL*Net message from client based on your feedback. I was also hoping that there would be a huge difference in TIME_WAITED between the first and second select statements (because of the tight loop in between that loops for a large number of iterations). But the output I get always prints a lower value for time_waited column.

I am logging into the server where Oracle is running and executing a SQL*Plus session.

I am not sure what I am doing wrong and is there a way to simulate this.

Please help.

Thanks

Tom Kyte
February 08, 2005 - 1:20 am UTC

then you would need to use a client - not plsql running in the server!

you'll have no sqlnet waits with the above script.

you would do something like:

select * from dual;
pause
select * from dual;


the pause, that is the inefficient client.

Help please

A reader, February 08, 2005 - 11:09 am UTC

Tom,

Thanks much for your previous response. I realized what I was doing wrong there. However, I did the following :

From my TOAD client session (SID = 30), I ran the following :

insert into begin_sess_wait
select sid,event,total_waits,time_waited
from v$session_event
where sid = '30'
and event = 'SQL*Net message from client'
/
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
-- To simulate a PAUSE
declare
j integer;
i integer;
begin
for j in 1..1000000
loop
for i in 1..1000
loop
null;
end loop;
end loop;
end;
/
declare
x char(10);
begin
select 'aba' into x from dual;
end;
/
insert into end_sess_wait
select a.sid,
a.event,
b.total_waits - a.total_waits,
b.time_waited - a.time_waited
from v$session_event b,
begin_sess_wait a
where a.sid = b.sid
and a.event = b.event
/
-- So an other session can see the wait times
commit
/

After running the above block of code, I logged into another session and did :
select * from end_sess_wait
hoping to get a big number (because of tight loop), but still got only low numbers.

Also, please let me know if the tight loop will be run on the server or on my client?

Please advise

Tom Kyte
February 09, 2005 - 1:25 am UTC

your pause similuation is NOT A PAUSE, it is busy work on the server, the server is not paused.

the plsql is run in the server.

Waits from TOAD DBA/Server Statistics/Waits

Ma$e, February 09, 2005 - 10:49 am UTC

Hi Tom,

I see the folllowing waits being reported on my d/b. Can you please point me in the right direction what I should be addressing first or what would give me the best roi ?
What numbers are of important? total_wait or time_waited or timeouts.

(sorry about the formatting)

Thanks

System Event total_wait timeouts Time_waited
rdbms ipc message 56898972 36120150 33668096.42
log file parallel write 14737457 14683699 38703.95
pmon timer 1723112 1722480 4991993.55
latch free 1640076 1283451 11745.19
jobq slave wait 338267 322892 1002667.58
enqueue 129245 87978 285329.53
smon timer 16726 16611 4818615.78
log file sync 12798192 7949 128665.68
PX Idle Wait 1953 1735 3545.33
library cache pin 11580 753 3005.81
buffer deadlock 126 126 0
PX Deq: Execute Reply 1612 120 592.35
kksfbc child completion 112 110 1.68
LGWR wait for redo copy 66536 101 13.85
buffer busy waits 5780368 89 38665.97
log buffer space 36496 84 1774.5
PX Deq: Execution Msg 2407 72 193.41
PX Deq: Signal ACK 161 70 5.14
library cache load lock 112 25 123.64
PX Deq: Table Q Normal 850458 12 725.86
row cache lock 54 12 43.71
log file switch completion 29408 7 1246.3
log file switch (checkpoint incomplete) 15 3 4.87
process startup 15354 2 1308.42
rdbms ipc reply 6160 2 89.36
control file heartbeat 1 1 4.01
library cache lock 8 1 3.17
sort segment request 1 1 1.01
inactive session 1 1 1
slave TJ process wait 1 1 0.01
SQL*Net message from client 4639248215 0 285400827.1
db file scattered read 897071974 0 2363449.05
db file sequential read 975636541 0 1040653.1
sbtwrite 22616707 0 119158.44
sbtbackup 492 0 72716.73
async disk IO 9096833 0 43924.3
db file parallel write 849749 0 35688.79
SQL*Net more data to client 157299429 0 20019.08
SQL*Net message to client 4639248348 0 10631.92
sbtclose2 476 0 10581.06
SQL*Net break/reset to client 114953 0 10059.36
db file parallel read 428253 0 9557.09
sbtinfo2 469 0 9330.47
direct path read 3938064 0 8320.02
control file parallel write 1968466 0 7518.8
direct path write 711456 0 4233.8
log file sequential read 185980 0 4085.61
SQL*Net message from dblink 1506 0 1441.8
SQL*Net more data from client 8183157 0 969.79
control file sequential read 2781538 0 869.75
sbtrestore 2 0 526.76
PX Deq Credit: send blkd 136578 0 236.9
log file single write 47992 0 231.86
SQL*Net more data from dblink 63050 0 106.85
control file single write 1714 0 95.59
switch logfile command 1455 0 81.82
PX Deq Credit: need buffer 81146 0 77.03
sbtread2 1893 0 42.62
sbtremove2 5 0 27.07
sbtinit 449 0 23.4
db file single write 4626 0 6.05
PX Deq: Parse Reply 145 0 1.38
local write wait 32 0 0.8
sbtcommand 449 0 0.56
PX Deq: Join ACK 176 0 0.35
refresh controlfile command 22 0 0.05
reliable message 1 0 0.03
sbtend 449 0 0.02
SQL*Net break/reset to dblink 65 0 0.02
wait list latch free 1 0 0.02
PX Deq: Table Q Get Keys 2 0 0.01
SQL*Net message to dblink 1506 0 0
sbtinit2 449 0 0
sbterror 38 0 0
PX qref latch 8 0 0
PX Deq: Msg Fragment 3 0 0
instance state change 2 0 0
PX Deq: Table Q qref 1 0 0


Tom Kyte
February 09, 2005 - 2:48 pm UTC

just erase it.

sort of meaningless.

I mean -- over what period of time.
are you experiencing a performance issue.
who is waiting - end users (they care) batches (they care less) background processes - they care even less (pmon timer -- big deal)



Let me try again...formating the data

Ma$e, February 09, 2005 - 11:29 am UTC

rdbms ipc message 56898972 36120150 33668096.42
log file parallel write 14737457 14683699 38703.95
pmon timer 1723112 1722480 4991993.55
latch free 1640076 1283451 11745.19
jobq slave wait 338267 322892 1002667.58
enqueue 129245 87978 285329.53
smon timer 16726 16611 4818615.78
log file sync 12798192 7949 128665.68
PX Idle Wait 1953 1735 3545.33
library cache pin 11580 753 3005.81
buffer deadlock 126 126 0
PX Deq: Execute Reply 1612 120 592.35
kksfbc child completion 112 110 1.68
LGWR wait for redo copy 66536 101 13.85
buffer busy waits 5780368 89 38665.97
log buffer space 36496 84 1774.5
PX Deq: Execution Msg 2407 72 193.41
PX Deq: Signal ACK 161 70 5.14
library cache load lock 112 25 123.64
PX Deq: Table Q Normal 850458 12 725.86
row cache lock 54 12 43.71
log file switch completion 29408 7 1246.3
log file switch (checkpoint incomplete) 15 3 4.87
process startup 15354 2 1308.42
rdbms ipc reply 6160 2 89.36
control file heartbeat 1 1 4.01
library cache lock 8 1 3.17
sort segment request 1 1 1.01
inactive session 1 1 1
slave TJ process wait 1 1 0.01
SQL*Net message from client 4639248215 0 285400827.1
db file scattered read 897071974 0 2363449.05
db file sequential read 975636541 0 1040653.1
sbtwrite 22616707 0 119158.44
sbtbackup 492 0 72716.73
async disk IO 9096833 0 43924.3
db file parallel write 849749 0 35688.79
SQL*Net more data to client 157299429 0 20019.08
SQL*Net message to client 4639248348 0 10631.92
sbtclose2 476 0 10581.06
SQL*Net break/reset to client 114953 0 10059.36
db file parallel read 428253 0 9557.09
sbtinfo2 469 0 9330.47
direct path read 3938064 0 8320.02
control file parallel write 1968466 0 7518.8
direct path write 711456 0 4233.8
log file sequential read 185980 0 4085.61
SQL*Net message from dblink 1506 0 1441.8
SQL*Net more data from client 8183157 0 969.79
control file sequential read 2781538 0 869.75
sbtrestore 2 0 526.76
PX Deq Credit: send blkd 136578 0 236.9
log file single write 47992 0 231.86
SQL*Net more data from dblink 63050 0 106.85
control file single write 1714 0 95.59
switch logfile command 1455 0 81.82
PX Deq Credit: need buffer 81146 0 77.03
sbtread2 1893 0 42.62
sbtremove2 5 0 27.07
sbtinit 449 0 23.4
db file single write 4626 0 6.05
PX Deq: Parse Reply 145 0 1.38
local write wait 32 0 0.8
sbtcommand 449 0 0.56
PX Deq: Join ACK 176 0 0.35
refresh controlfile command 22 0 0.05
reliable message 1 0 0.03
sbtend 449 0 0.02
SQL*Net break/reset to dblink 65 0 0.02
wait list latch free 1 0 0.02
PX Deq: Table Q Get Keys 2 0 0.01
SQL*Net message to dblink 1506 0 0
sbtinit2 449 0 0
sbterror 38 0 0
PX qref latch 8 0 0
PX Deq: Msg Fragment 3 0 0
instance state change 2 0 0
PX Deq: Table Q qref 1 0 0



Tom Kyte
February 09, 2005 - 2:53 pm UTC

rm *

it is not very meaningful.

tkprof - useful
statspack over 15 minutes during a peak load *maybe* useful

a dump of the stats since the database is up, not very useful.

(lots of latch frees, you are using binds right.......)

True. I forgot to give you the elapsed time

Ma$e, February 14, 2005 - 11:20 am UTC

Hi Tom,

I guess a small oversight can lead to inconclusive results.

These stats were collected over several days, at most over a week.

We are having a generic performance problem and I was hoping that your trained eye might pick out something right away.

For example, do the ratios between certain metrics make sense (I'm not sure any specific ones)?

Also. What do I need to look for if I am trying to determine if I need to add freelists? The reason I am going down this path is that we have about 50 d/b connections all trying to insert into at least 4 tables? How do I determnine/demonstrate that I need to consider increasing free lists and if required what pre and post metrics do I need to collect to determine the benefits derived.

Thanks once again.

Ma$e

Tom Kyte
February 14, 2005 - 1:57 pm UTC

definition of ratio (my definition bear in mind):

ratio: 1) a mathematical device used to hide and obscure information, most useful to make non-relevant information appear to be really important looking. 2) a source of great confusion and panic


the only ratio I look at anymore is soft parse pct -- it should always be HIGH (like 99 high) after the system is up for some period of time.


freelist content could be observed by seeing high values for data block in v$waitstat over some short measured period of time. so, you would look for a reduction in this metric after an increase of freelists on the segments involved.



SQL*Net message from client

Dennis, August 23, 2005 - 2:24 am UTC

Hi Tom,

We are having performance issue with an user screen.
I have traced the module using sql trace & tkprof.

The stats are as below

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 271 0.31 0.44 0 0 0 0
Execute 272 0.08 0.22 0 0 0 0
Fetch 272 325.42 329.89 0 3064670 0 12
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 815 325.81 330.55 0 3064670 0 12

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 753

Rows Row Source Operation
------- ---------------------------------------------------
0 FILTER
103 TABLE ACCESS BY INDEX ROWID BMF
72614 INDEX RANGE SCAN (object id 134629)
102 INDEX RANGE SCAN (object id 534095)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message from client 1630 5.71 1459.53
SQL*Net message to client 1629 0.01 0.01
latch free 76 0.02 0.64
********************************************************************************


The wait event is showing event as SQL*Net message from client

1> Can this be attributed to the network or is it the processing time in the client front end

2> The sql when run from sql plus comes out in 1 secs even when run from client PC.
The execution plan is optimal, why does it access 3064670 buffers to fetch 12 rows.

3> Does the high buffer read and CPU time have anything to do with the wait event.

thanks,
Dennis




Tom Kyte
August 24, 2005 - 3:59 am UTC

103 TABLE ACCESS BY INDEX ROWID BMF
72614 INDEX RANGE SCAN (object id 134629)

I'd attribute it to that, you hit that index 72,614 times only to goto the table and find that a small 103 rows actually match the criteria.

Your index likely needs to have a couple of more columns at the "end" of it, so that hte index range scan doesn't return 72,614

Look at the number of LIO's you have there for a query that returns 12 small records!


the sqlnet message from client in this case likely means "you sat idle for a while before running the query"

this is an example of what I meant by adding more columns to the index, see the difference in work performed:

create table t1 ( x int primary key, data char(30) );
create table t2 ( x int, y int, data char(30) );


insert into t1 select rownum, username from all_users;

insert into t2 select mod(rownum,28)+1, rownum, owner from all_objects;
create index t2_idx1 on t2(x);
exec dbms_stats.gather_table_stats( user, 'T1', cascade=>true );
exec dbms_stats.gather_table_stats( user, 'T2', cascade=>true );
set autotrace traceonly
@trace
select /*+ first_rows */ * from t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100;
create index t2_idx2 on t2(x,y) compute statistics;
SELECT /*+ first_rows */ * from t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100;
set autotrace off


select /*+ first_rows */ *
from
t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 0.09 0.13 0 2084 0 47
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.09 0.13 0 2084 0 47

Rows Row Source Operation
------- ---------------------------------------------------
47 TABLE ACCESS BY INDEX ROWID T2 (cr=2084 r=0 w=0 time=131534 us)
12036 NESTED LOOPS (cr=52 r=0 w=0 time=52634 us)
12 TABLE ACCESS BY INDEX ROWID T1 (cr=9 r=0 w=0 time=121 us)
12 INDEX RANGE SCAN SYS_C008981 (cr=5 r=0 w=0 time=62 us)(object id 41218)
12023 INDEX RANGE SCAN T2_IDX1 (cr=43 r=0 w=0 time=19555 us)(object id 41220)
********************************************************************************
SELECT /*+ first_rows */ *
from
t1, t2 where t1.x = t2.x and t1.x <= 12 and t2.y <= 100


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5 0.01 0.00 11 34 0 47
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 7 0.01 0.00 11 34 0 47

Rows Row Source Operation
------- ---------------------------------------------------
47 TABLE ACCESS BY INDEX ROWID T2 (cr=34 r=11 w=0 time=1724 us)
60 NESTED LOOPS (cr=29 r=11 w=0 time=1478 us)
12 TABLE ACCESS BY INDEX ROWID T1 (cr=9 r=0 w=0 time=104 us)
12 INDEX RANGE SCAN SYS_C008981 (cr=5 r=0 w=0 time=56 us)(object id 41218)
47 INDEX RANGE SCAN T2_IDX2 (cr=20 r=11 w=0 time=1191 us)(object id 41221)



SQL*Net message from client

Dennis, August 25, 2005 - 2:13 am UTC

Hi Tom,


the sqlnet message from client in this case likely means "you sat idle for a
while before running the query"

- Can this be attributed to network, please advice

thanks,
Dennis






Tom Kyte
August 25, 2005 - 3:36 am UTC

go into sqlplus.

turn on sql_trace

go to lunch

run select * from dual;
exit sqlplus

run tkprof.



sqlnet message from client is the time the server process spent sitting there "waiting" for the client to tell it to do something, in your trace file from above, you'll see a total wait of "your lunchtime"

Jack, September 03, 2005 - 2:09 am UTC

Excessive "SQL message from/to user" waits might be an indication of network problem when running batch program between hosts.

I have seen significant SQL message waits in a Peoplesoft batch job. We narrowed down the problem to some full duplex setting on interface card and switch.

Tom Kyte
September 03, 2005 - 8:53 am UTC

yes, but given this:

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message from client 1630 5.71 1459.53
SQL*Net message to client 1629 0.01 0.01
latch free 76 0.02 0.64


I tend to doubt that, this is likely due to client fetching bunch of data and taking it's good time about processing it before asking for more.

async io and LGWR

reader, September 03, 2005 - 10:03 pm UTC

From your reply earlier on this thread,
<quote>As I said, we can use IO slaves to simulate async IO or if there is no async io,
we would do it sequentially.<quote>

Is there a way to find that if LGWR is writing sequentially to all members or using async io slaves? Thanks.

Tom Kyte
September 04, 2005 - 10:09 am UTC

show parameter, see if io slaves have been asked for or not.

dbwr_io_slaves

is currently used to configure both dbwr and lgwr IO slaves.


brief snippet from my forthcoming book
<quote>
I/O Slaves

I/O slaves are used to emulate asynchronous I/O for systems or devices that do not support it. For example, tape devices (which are notoriously slow) do not support asynchronous I/O. By using I/O slaves, we can mimic for tape drives what the operating system normally provides for disk drives. Just as with true asynchronous I/O, the process writing to the device batches a large amount of data and hands it off to be written. When the data is successfully written, the writer (our I/O slave this time, not the operating system) signals the original invoker, who removes this batch of data from its list of data that needs to be written. In this fashion, we can achieve a much higher throughput, since the I/O slaves are the ones waiting for the slow device, while their caller is off doing other important work getting the data together for the next write.

I/O slaves are used in a couple of places in Oracle. DBWn and LGWR can make use of them to simulate asynchronous I/O, and RMAN will make use of them when writing to tape.

Two parameters control the use of I/O slaves:

* BACKUP_TAPE_IO_SLAVES: This parameter specifies whether I/O slaves are used by RMAN to back up, copy, or restore data to tape. Since this parameter is designed around tape devices, and tape devices may be accessed by only one process at any time, this parameter is a Boolean, and not the number of slaves to use, as you might expect. RMAN will start up as many slaves as necessary for the number of physical devices being used. When BACKUP_TAPE_IO_SLAVES = TRUE, an I/O slave process is used to write to or read from a tape device. If this parameter is FALSE (the default), then I/O slaves are not used for backups. Instead, the dedicated server process engaged in the backup will access the tape device.

* DBWR_IO_SLAVES: This parameter specifies the number of I/O slaves used by the DBW0 process. The DBW0 process and its slaves always perform the writing to disk of dirty blocks in the buffer cache. By default, the value is 0 and I/O slaves are not used. Note that if you set this parameter to a nonzero value, LGWR and ARCH will use their own I/O slaves as well-up to four I/O slaves for LGWR and ARCH will be permitted.

The DBWR I/O slaves appear with the name I1nn, and the LGWR I/O slaves appear with the name I2nn, where nn is a number.


Jack, September 04, 2005 - 2:05 am UTC

Tom,

Is it possible to have asktom send an email notification when you posted a follow-up?

Tom Kyte
September 04, 2005 - 10:10 am UTC

only if you asked the question in the first place currently.....

jobq slave wait

Sami, September 11, 2005 - 3:00 pm UTC

Dear Tom,

We just upgrade our production database this weekend from 8.1.7.3 (2 Node OPS) to 9.2.0.6 (3 Node RAC - Solaris 5.9).

We see lots of "jobq slave wait" event. The number of job queue process defined as 18 and at any point we don't see any
more than 2 or 3 jobs running.

Do we have to change any specific init parameters?

Tom Kyte
September 11, 2005 - 6:34 pm UTC

well, if you want to remove those waits -- make them (the job queues) do more.

It is an idle wait event, the job queue processes are just hanging out waiting to be told to do something.

it is a new idle wait for the cjq0 (job queue coordinator) process in 9i.

jobq_slave_wait --Okay to ignore?

A reader, September 11, 2005 - 7:19 pm UTC

Thanks Tom.

What you are saying is "jobq_slave_wait" event is harmless similar to "SQL*Net message from client".

Hence we don't need to worry about this wait. Am I right?

Tom Kyte
September 11, 2005 - 7:23 pm UTC

yes, it could be considered similar to that wait.

*similar*, for sql*net message from client could be a seriously bad wait event to have.

v$system_event.time_waited_micro

Roger, November 11, 2005 - 10:10 am UTC

Hi Tom,

I am trying to understand the output of v$system_event. The times for wait events in v$system_event are from the start of the instance, right? Based on this, here is a quick test that confused me (I hope the formatting doesn't get messed up):

[code]
09:43:08 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
---------------------------------------------------------------- ----------- -------------- ----------- ------------
TIME_WAITED_MICRO       RNUM
----------------- ----------
rdbms ipc message                                                     117821         110135    39176981          333
       3.9177E+11          4


Elapsed: 00:00:00.00
09:44:08 SQL> set line 150
09:44:19 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO       RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message                                                     117850         110158    39181789          332        3.9182E+11          4

Elapsed: 00:00:00.00
09:44:20 SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
09:44:49 SQL> startup;
ORACLE instance started.

Total System Global Area  135338868 bytes
Fixed Size                   453492 bytes
Variable Size             109051904 bytes
Database Buffers           25165824 bytes
Redo Buffers                 667648 bytes
Database mounted.
Database opened.
09:45:13 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO       RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message                                                         96             53        7368           77          73680537          3

Elapsed: 00:00:00.00
09:45:18 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO       RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message                                                        114             71       13513          119         135131484          3

Elapsed: 00:00:00.00
09:45:32 SQL>
09:45:39 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO       RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message                                                        138             95       21707          157         217071883          3

Elapsed: 00:00:00.00
09:45:53 SQL> select * from (select x.*, rownum rnum from v$system_event x order by x.time_waited_micro desc) where rownum = 1;

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO       RNUM
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ----------
rdbms ipc message                                                        211            147       34088          162         340882760          3

Elapsed: 00:00:00.00
[/code]

As you can see, the time_waited_micro column had a large value before I restarted the instance but was reset on instance startup. After the restart, time_waited_micro was increasing faster than time was moving! 35 seconds later total_waited_micro went from 73680537 (approx 74 seconds) to 340882760 (approx 340 seconds). We can clearly see from the time stamps, the instance had not been up for 267 (340-74) seconds. 

This was just a short example. In our production systems when I look at the instance startup time (v$instance) and compare it to the most waited event, the most waited event has waited more than 10 times the time since the instance started.

Can you please help me understand this better? Thanks. 

Tom Kyte
November 12, 2005 - 8:36 am UTC

rdbms ipc message is not a good one to use - we are constantly and forever doing that one. it is not a meaningful one for us to look at even.

look at and use an event that means something to us outside of the background processes.

wait event

Roger, November 14, 2005 - 4:49 pm UTC

Hi Tom,

Thanks for the response. Can you please give me an example of a event that I can use? A simple example would be very useful. Thank you for your valuable time.



Tom Kyte
November 15, 2005 - 7:29 am UTC

what events do you have in your list to choose from.

PX Deq Credit: send blkd

Ashish Kapur, December 14, 2005 - 2:49 am UTC

Hi Tom,

Can you please let me know the above mentioned wait event. What are the causes of this wait event.

How best to overcome this wait event.

Thanks,
Ashish

Tom Kyte
December 14, 2005 - 8:28 am UTC

see support Note 271767.1 on metalink for details.

Wait Events

Karmit, March 05, 2006 - 6:16 am UTC

Hi Tom,

We are trying to continuously performance tune various processes in our
development database, ranging from data load processes to end user sql's.
Sometimes we need to change the method of doing things i.e implement some
new/better feature or rewrite end user sql's and we've acheived good results.
We have been using basic "instrumentation" so far - i.e make the process log
the start/end time during each step. Using this "log" we could make out which
step was an issue and using our "collective knowledge" plus "asktom/metalink
reference" tune that step.

However, now we have come to point where we are running multiple things at
the same time (as would happen in production) and someone raised an "expert
point" that we need to start logging "internal statistics" and our tuning
henceforth should be guided not only by start/end log times of the processes
but mostly by the wait events/ internal stats for each session. This should
also help us to determine exactly why a certain process took longer/shorter
than usual and what else was going on the system during its run.

I've been thinking about what all needs to be done to incorporate such a
thing. I understand that AWR is available in 10g - but main issue is that
its stats are purged after regular intervals and also its stats are system
wide - which is not very useful if trying to diagnose a particular process.

As a supplement to AWR, would taking a snap of the following views
help? There would be a procedure which will take in parameters like SID,
serial# (determined using mystat) and would be called by each package/procedure
in the database as soon as it enters the BEGIN part of the code and just
before its about to exit.

a) V$SESSION_EVENT
b) V$SESSTAT
c) V$SESSION_WAIT

1. Would like to hear your views / recommendations on the above.

2. Also, considering a RAC environment should I be using using GV$ views?

3. One more thing I'm having issues understanding is the parallel query
slaves get allocated their own SID's - which are different to the
PQ Coordinator. In such a case will it be enough to take a snap of just
the PQ Coordinator process? or should it be taking snap of all the
slave SIDs?! Sometimes (by design) the PQ coordinator sits on
node 1 while the slaves are on node 2. I guess this makes it all the
more critical to access GV$ instead of V$? But given the fact that
most GV$ views "duplicate" information - which ones should I be using
and which not? Or rather "how" should I be accessing ONLY the "correct"
information?

Thanks!
Karmit

Tom Kyte
March 05, 2006 - 1:46 pm UTC

awr data lives for a long time, days and days - so what if it is purged? You can control that as well if you like (the purging).

I would not snap those views - they would be rather not useful - how would you use snaps of them to determine anything? They are a snapshot of various current sessions at a point in time. Sessions come, they go - I don't see that being useful

You use v$ views when interested in an INSTANCES data.
You use gv$ views when interested in data across instances, or in another instance.

snapping these sids would not make sense to me either, what would you do with it? they are "at a point in time" and come and go (get associated with different coordinators all over the place)

Wait Events

Karmit, March 05, 2006 - 2:44 pm UTC

Hi Tom,

The purpose of taking a snap of the v$sesstat OR v$session_event
per SID is to store the events/ stats that the session was waiting
on/ generated during *that* run. We have various stored procedures
/packages which are run at various times to load/refresh data in
the database.

All these PL/SQL procedures store the start/end date with an
application generated "jobid" within the database. This is what
we use to calulate the "elapsed time" of a specific run of a specfic
job. Now, if we do some "performance tuning" - and check the
elapsed time - which hopefully has decreased, we assume that
our "change" helped and theorize why it did so. No FACTS.

So, if we had a procedure which takes a snap of v$sesstat/event
views - taking input as the SID, we could then call this procedure
twice from within all the application procedures (which would
determine the SID using v$mystat and pass it on as parameter) -
first time as soon as the "to be instrumented" code BEGIN's
and second when its just about to exit.

Since we can also link these "snaps" of the views to the application
generated "jobid" of that process - we would have the ability to
analyze the behaviour of a specific run of a specific job and also
to compare it with any of its runs in the past.

Most importantly we could then use this to "performance tune" the job
in question and support any change in behaviour with
"facts from the snaps".

At least thats what the idea is!

But after I read your reply I'm now totally confused!!! since I've
read in your book (Expert Oracle - 9i and 10g Programming - pg 300)
that we could use v$ views to collect information like "redo
generated" . In fact doesn't your runstat_pkg do something similar?

Also, having a look at the book "Oracle Wait interface - Shee/Deshpande"
seems to advocate a session logoff trigger to store this information
(however in that case I lose the ability to track it to my application
job run - hence a stored procedure).

If the v$session_event view shows the events waited on by the
session and the v$sesstat shows the stats - why would storing them
(with the ability to link them to the applications job) not help?

AWR for me is of little use since I can't link it to my applications
jobs - or use it to compare different runs of the same job - OR
to analyze what exactly it did/ waited on etc.

Please enlighten!

Thanks & Regards,
Karmit


Tom Kyte
March 06, 2006 - 8:33 am UTC

all you need is the ending view for a session, unless you are talking about "we would like to monitor process X" then yes it would/could make sense but that wasn't spelled out in the first case.



Wait events

Karmit, March 06, 2006 - 10:52 am UTC

Tom,

Sorry! I thought I'd made it quite clear especially with -
"This should also help us to determine exactly why a certain process took longer/shorter than usual and what else was going on the system during its run...." ...maybe
not clear enough.

Again, why only one snap at the end? shouldn't it be
start/end of the session? (as you might do in runstats?)

Also, in case of a parallel execution where there are
multiple SIDs, will taking the snap only for the SID associated with the PQ coordinator make sense?

Thanks in advance,
Karmit

Is i am looking in wrong view

Orakle_lover, March 06, 2006 - 6:38 pm UTC

I was trying to know the number of user commits in my session.
So my query was :

  1  select a.name,a.class,b.value
  2  from v$statname a,v$mystat b
  3  where a.statistic#=b.statistic#
  4* and a.name = 'user commits';
Before Commit:

NAME                   CLASS      VALUE
------------------------- ---------- ----------
user commits               1          0

SQL> commit;
Commit complete.
After Commit:
SQL> select a.name,a.class,b.value
  2  from v$statname a,v$mystat b
  3  where a.statistic#=b.statistic#
  4  and a.name = 'user commits';

NAME                   CLASS      VALUE
------------------------- ---------- ----------
user commits               1          0

Is the value does not change for each commit????
I think i was looking in some wrong way!!!!
What is the way to find the number of commits by a session?
If the "user commits" is not the one we have to look then what is the value is used for? 

Tom Kyte
March 08, 2006 - 4:20 pm UTC

because the commit of "nothing" is optimized away - we know we didn't need to do it, so we ignored it.

ops$tkyte@ORA10GR2> create table t ( x int );

Table created.

ops$tkyte@ORA10GR2> insert into t values ( 1 );

1 row created.

ops$tkyte@ORA10GR2> commit;

Commit complete.

ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off

NAME                                                    VALUE
-------------------------------------------------- ----------
user commits                                                3

ops$tkyte@ORA10GR2> commit;

Commit complete.

ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off

NAME                                                    VALUE
-------------------------------------------------- ----------
user commits                                                3

ops$tkyte@ORA10GR2> update t set x = 1;

1 row updated.

ops$tkyte@ORA10GR2> commit;

Commit complete.

ops$tkyte@ORA10GR2> @mystat "user commit"
ops$tkyte@ORA10GR2> set echo off

NAME                                                    VALUE
-------------------------------------------------- ----------
user commits                                                4



user commits go up when there is some actual work in need of committing. 

Oracle Wait Events

Karmit, March 07, 2006 - 4:42 pm UTC

Tom,

Sorry! I thought I'd made it quite clear especially with -
"This should also help us to determine exactly why a certain process took longer/shorter than usual and what else was going on the system during its run...." ...maybe not clear enough.

a) Again, why only one snap at the end? shouldn't it be
start/end of the session? (as you might do in runstats?)

b) In case of parallel execution where there are
multiple SIDs, will taking the snap only for the SID
associated with the PQ coordinator make sense?

c) Is it possible for a session to post the "amount of
temp space it used" immediately before it logsoff?
I'm trying to get this through v$tempseg_usage but it
seems that the information vanishes as soon as the
session ends?

Thanks,
Karmit

Tom Kyte
March 09, 2006 - 11:23 am UTC

you said "snap", a "snap" is an isolated point in time "lets copy the data".

You mean "when the session is over, we will - at the session level - save off the relevant wait events - the cumulative wait events - for that session. This is very different from "snapping" the v$ tables.


Yes, that can (obviously) be useful - maybe. You might have to go more granular with the ability to say "begin_task(:x)" and "end_task(:x)" and implement sort of a stack that can let you see what the various waits and statistics were for that single task in your larger program (aggregate waits for the entire session would be not extremely useful)

a) if you were doing this at the session level, no need - everything for the session starts at "zero". if you do it task oriented, then you need the begin/end sort of thing.

b) not necessarily. you know you waited - but not really why.

c) correct, it does disappear. I don't think that statistic is captured, but you do have physical reads/writes to temp which would be an indicator.

PL/SQL lock timer

Reader, March 07, 2006 - 9:29 pm UTC

I would appreciate if you could help me understand more about this wait .

Thanks

Tom Kyte
March 09, 2006 - 12:09 pm UTC

dbms_lock.sleep( 5 )

will generate 5 seconds of wait on plsql lock timer.

you are sleeping.

Alexander the ok, March 17, 2006 - 10:40 am UTC

Hi Tom, Happy Friday

I've been practicing a little backup and recovery stuff recently on my test database. I had to enable archivelogmode. Today I tried using our application against my database and I noticed on of the queries that normally doesn't take long hung on a wait. The wait was a log file switch(archive needed). The docs say this:

"Waiting for a log switch because the log that the LGWR will be switching into has not been archived yet".

I don't really know what this means. "Archived" as in, backed up? How do you archive something? I thought once you enable archivelogmode on you system, oracle just continues to write changes to your redo log files instead of writing over them in a circular fashion.

I also noticed I had a ton of archive files (I think that's what the are), they're the same size as my redo log files, with this type of naming convention "ARC01028.001". I had almost 50 of them in the directory I specified when I enabled archivelogmode in OEM. Is this normal to have this many files generated? I've only had this set for a week or so, and it's my test single user system. I couldn't believe that much was created from the little work I do. Does this mean I have too few redo log files, or they're too small, or both?

I've read through the backup and recovery basics guide once already. This is still kid of confusing I hope you can clarify these questions. Thank you.

Tom Kyte
March 17, 2006 - 5:51 pm UTC

It means lgwr is waiting for ARCH to make a copy of the file elsewhere before it can reuse it.


This is exactly what archive log does - as lgwr fills up a log file, ARCH copies it to the archive destination and names them sensibly. You then back this up to some other storage (get them off of the system so they are safe).

In the event you need to restore - you restore the datafile(s) and apply the archives that were created since the backup was taken

Alexander the ok, March 20, 2006 - 9:52 am UTC

Ok so when this occurs, I should backup my archive files? How does one stay ahead of this problem so it doesn't happen all of a sudden on a production system and all kinds of stuff comes to a halt?

I have a book that basically says this problem is due to having too few archiver processes configured, or your redo logs are too small. Can you provide guidance as to determine which of these it might be, and how to know how to configured how many ARCH(n) processes and redo log sizes?

My redo logs are currently 20 megs in size, and I have 5 of them. This is an OLTP system, roughly 35 gb of data. I know these settings depend on a lot of things, I'm just looking to get some idea, a ball park. Thanks as always.

Tom Kyte
March 22, 2006 - 12:33 pm UTC

You should be CONSTANTLY AND CONTINUALLY backing them up.

scenario: machine fails, takes disk with it.

if you have not constantly been backing up the archives, you have just lost them. Your job is to make sure they are safe, away, elsewhere.


whether that book is right or not depends on whether the problem was

a) you didn't have automatic archiving enabled
b) you ran out of room in the archive destination
c) you really couldn't archive fast enough



long waits while disks are not busy

Andr, March 24, 2006 - 9:36 am UTC

Hi Tom.
Could you please give some tips to understand the situation:
Sometimes in our database DBWR and CKPT waits very long.
Right after their status has become 'waiting', others sessions become
waiting also. (very long, sometimes more than 90 sec)
Session's waits are various: db file sequential read,
buffer busy waits, direct path read etc)

It is imortant that it is not a checkpoint, because :
1) In the alert log, event 'checkpoint completed' appear much later after
hang finishing (10-15 minutes after it).
2) status of all logfiles ('current','active','inactive') is not changing
during the hang
(status before the hang and after is identical for all logfiles)

Also, at these moments the disks activity is not large:
(OS iostat) : 10-30 kbps. Disks with rollback have some more activity:
500-700 kbps (only writes).
Behind these moments disk activity can reach 1000-3000 kbps and more.
CPU activity is about 30-40%

sessions waits like this (from v$session_wait):
SID EVENT_NAME P1TEXT P2TEXT P3TEXT P1 P2 P3 SEC_IN_W STAT
--- -------------------- --------- ---------- --------- ------ ------ ---- ------ ------
...
11 rdbms ipc message timeout 030000 0 0 1360 WATING
56 buffer busy waits file# block# id 04 7781 130 41 WATING
132 buffer busy waits file# block# id 08 114724 130 66 WATING
4 control file parallel files blocks requests 02 2 2 66 WATING
write
198 buffer busy waits file# block# id 08 114724 130 66 WATING
162 buffer busy waits file# block# id 05 23693 130 21 WATING
141 buffer busy waits file# block# id 08 2526 130 42 WATING
94 buffer busy waits file# block# id 04 7781 130 25 WATING
126 buffer busy waits file# block# id 08 2526 130 30 WATING
125 buffer busy waits file# block# id 05 23693 130 27 WATING
68 db file scattered rea file# block# blocks 08 114724 3 66 WATING
115 db file scattered rea file# block# blocks 04 7781 2 66 WATING
135 db file scattered rea file# block# blocks 05 23693 2 41 WATING
196 db file scattered rea file# block# blocks 08 2526 4 65 WATING
2 db file parallel writ files blocks requests 0208 0 0 103 WATING
201 direct path read file num first dba block cnt 0203 421513 1 66 WATING
160 direct path write (lo file num first dba block cnt 06 156319 2 25 WATING
.... etc

( 4 - CKPT, 2 - DBWR, 11 - ARC;
among file# waits are no files with rollback segments )

Oracle 8i
log_checkpoint_interval 0
log_checkpoint_timeout 0
redo logs about 400Mb, they are on the separate disks

What is it ? Why CKPT waits so long ? Why session is reading one block so long (including direct path)?
We very hope for your help.


strange waits (long waits while disks are not busy)

Andr, March 29, 2006 - 2:39 am UTC

Hi Tom.
We are hopening for your help still. (see above)
Could you give us any supposition ?

Tom Kyte
March 29, 2006 - 7:16 am UTC

you have really long IO waits maybe disk problem?

strange waits

A reader, March 30, 2006 - 3:19 am UTC

Hi Tom.
Now I have some more information.
_Presumably_ (but I'm not sured yet finally) it can happen when the current log is filled about 90% (not always) . It is important that dbwr _and_ ckpt waits. Also during the waiting the table x$kcccp is not changing (in other time it changes constantly). Sometimes at that the disk activity is really about 0. The 'Completed checkpoint' appeared in the alert.log some minutes later. For example (at 20 sec interval)
disks: tps kb_read kb_wrtn
disk1 0.0 0 0
disk2 0.0 0 0
Out of that disks work normally.
We have one DBWR and async io is true. The Buffer cache is more 1Gb. But we have no free buffer waits.

Tom Kyte
March 31, 2006 - 11:01 am UTC

then, if it is not causing waits to happen elsewhere in the system - is there an identified problem you are trying to solve?

wait event,

A reader, March 31, 2006 - 10:13 am UTC

What is "wait for master scn" event?
In my database, one of my INSERT statement is showing on this wait for a long time.

EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
------------------------------ ----------- -------------- ----------- ------------
wait for master scn 108775355 138 3354407 0

What would cause this and how to fix this?

Thanks,


Tom Kyte
March 31, 2006 - 12:48 pm UTC

you have an insert that waited 108,775,355 for this event?

followup,

A reader, March 31, 2006 - 1:58 pm UTC

Yes. It is for INSERT statement. Later we killed the SID and restarted the job.

Thanks

Tom Kyte
March 31, 2006 - 3:00 pm UTC

using rac? distributed queries?

follow up,

A reader, March 31, 2006 - 3:16 pm UTC

Sorry..I didn't put that information earlier.
The database is a 10.1.0.4 on a 3 node RAC.

Another question:
I have seen in our other databases as well, parallel query sometimes just hang on RAC machines (not always but whenever there are good number of sessions starting at the same time).

Why is that?

The wait events in those cases will be
PX Deq: Execute Reply,gc cr request,gc buffer busy,
PX Deq: Execution Msg

Any settings at the database level to avoid these kind of wait events?



Tom Kyte
March 31, 2006 - 3:18 pm UTC

I'll ask you to work with support on this particular issue - I don't think we'd be able to get to the bottom of it efficient here.

strange waits (long waits while disks are not busy)

Andr, April 03, 2006 - 4:31 am UTC

Hi Tom
You said
"if it is not causing waits to happen elsewhere in the system - is there an identified problem you are trying to solve?"
- Long waits DBWR and CKPT is causing other long waits
(scattered read, direct read/write read).
And I don't understand what the reason of this. Definitely,it's not a disk problem . No LGWR waits also.
But I have no another suppositions about source of it. I'm puzzled.
(see the messages here of March 24 and March 30)


Tom Kyte
April 03, 2006 - 8:14 am UTC

dbwr and ckpt wait - so what?

unless my end users are - I don't care, they are just software.

so, is this causing a wait anywhere we CARE about.

(same about lgwr, unless I'm waiting on lgwr, I don't really *care*)

strange waits (long waits while disks are not busy)

Andr, April 03, 2006 - 9:08 am UTC

Hi Tom.
That's just the point that at the same time
some end users _also_ are waiting long. But there is no a disk activity at that. And the queue of the active users is increasing until the hung finish. And the incremental checkpoint position is not moving (x$kcccp is not changing) at that (it seems that it is frozen at these moments). And it isn't normal checkpoint. (I look at the log file status before and after).

For example (from from v$session_wait):
Event_name Seconds_in_wait state
...
direct path read 56 WATING
db file scattered read 41 WATING
...
control file parallel wr 56 WATING
db file parallel write 83 WATING


Tom Kyte
April 04, 2006 - 9:32 am UTC

if you are waiting 50 seconds for an IO to complete - you have something seriously wrong with the disks there. Don't you think?

strange waits (long waits while disks are not busy)

A reader, April 04, 2006 - 10:04 am UTC

Hi Tom.
"if you are waiting 50 seconds for an IO to complete - you have something seriously wrong with the disks there. Don't you think?"
Of course, I am not sured at all. But I suppose that it is not the disk problem. Because the correlation exists: the long waits appear only when CKPT is waiting (control file parallel write). And the situation occurs only a few times per day. May be it any oracle reason ?

Tom Kyte
April 04, 2006 - 7:23 pm UTC

you have 50 seconds of wait on an IO

that is a huge, gigantic, impossibly large number.




strange waits (long waits while disks are not busy)

Andr, April 05, 2006 - 2:04 am UTC

Hi Tom
Thanks for the answer.
But in what way I can identify exactly this disk problem in OS (unix). Our system administrator says the disks are all right. Mainly we have no long waits (only sometimes).

Tom Kyte
April 05, 2006 - 5:43 pm UTC

if you have 50 second waits for an IO to return, there is seriously something very very wrong. We can tell you how long we waited for IO.

long disk waits

A reader, April 12, 2006 - 4:05 am UTC

Hi Tom
Maybe these disk waits are caused by AIX ?
Can you give advice about oracle on AIX ?

Tom Kyte
April 12, 2006 - 8:07 am UTC

the IO waits would be for everyone, not just oracle.

long disk waits

A reader, April 13, 2006 - 6:12 am UTC

while oracle has io waits the file coping on disk works normally (no waits).

Tom Kyte
April 13, 2006 - 8:01 am UTC



We queue for IO using the same stuff as anything else.


SQL*Net break/reset to client

Avid Follower, May 18, 2006 - 10:17 am UTC

Hello Tom,

We often get errors in our Alert logfile referring to a certain trace file in udump for errors, and 90% of the times its
Dump of memory from 0x0274E8C0 to 0x0274EA00
274E8C0 3A20736C 62647520 73666564 00000000 [ls : udbdefs....]
[opipls:incorrect]
[ no. of binds...]
[opipls:sqlstr...]
[FATAL ERROR IN T]
[WO-TASK SERVER: ]
[error = %d..Prob]
[able error (ORA-]
[28546) in networ]
[k administration]
[....Network prot]
[ocol error on fi]
[rst data after n]
[ew connect..opit]
[sk_1....opitsk_2]
[....------------]
[---------.Remote]
[ SQL-DSR stateme]
[nt:.....opixrb-1]
[8...opixrb-14...]
Argument/Register addr=0x78B49B78.
Then followed by similar boxes with information of the Forms client. And then the following lines
O/S info: user: radoff, term: SKM-0312, ospid: 3168:3716, machine: SKM\SKM-0312
program:
last wait for 'SQL*Net break/reset to client' blocking sess=0x0 seq=689 wait_time=1
driver id=28444553, break?=1, =0
temporary object counter: 0

It's always this, we haven't been able to figure out what this means, seems some sort of locking somewhere, but what exactly do we need to look at. (I can paste you the rest of the lines that follow if they can help you in guiding us what do we need to look at). Especially the "last wait for 'SQL*Net break/reset to client'" I don't get.
If more of these come, severe locking is caused, or I should say when locking occurs then we find a number of occurences of such traces in the alert logfile.
I have contacted Oracle support, and we haven't been able to figure out the reason for TWO-TASK SERVER Error, so I was hoping that you could tell me what could be the reasons, and/or where do we need to look first. Application we try to check but since it's always a different client and a different query given, I can't put it to one application. Will be grateful if you could tell me what could possibly be happening: network, application, database or something else!
Regards,


Tom Kyte
May 19, 2006 - 9:47 am UTC

you should really utilize support for something like this.

SQL*Net break/reset to client

Avid Follower, May 23, 2006 - 10:35 am UTC

Ok, will try raising this issue again with them, but if you can only tell me, generally, what does this system wait actually mean, "SQL*Net break/reset to client"? as it appears in alot of examples when someone tries to find different types of waits on their databases. And what area one needs to look at in case this event is a problem area.

Thanks

Tom Kyte
May 23, 2006 - 3:29 pm UTC

problem in returning the status from dbms_pipe.send_message

supriya, May 30, 2006 - 6:20 am UTC

Hi Tom,

I have a procedure which calls one more procedure.
the 2nd proc has dbms_pipe.send_message command.

here its getting hanged and not returning status.

is there any way to return the status even it doesn't run... so, that the procedure to continue to execute the other statements...

plesae needs quick reply.

Thanks in advance,
Supriya




Tom Kyte
May 30, 2006 - 9:34 am UTC

did you perhaps specify a timeout or did you let it default to "wait forever"

If you don't want to wait forever (for more space in the pipe, it must be FULL), then tell us.

</code> http://docs.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_pipe.htm#sthref5417 <code>



rdbms ipc reply

A reader, May 31, 2006 - 5:49 pm UTC

Hi Tom there are some sessions in my database that in a wait state of "rdbms ipc reply"

When I turn tracing on, I get a trace file with the following line repeated over and over:

WAIT #0: nam='rdbms ipc reply' ela= 1953595 from_process=3 timeout=2 p3=0 obj#=-1 tim=1122179692553413

How do I tell what is actuall happening with the session and how do I resolve this wait?

Thanks.

Tom Kyte
June 01, 2006 - 9:44 am UTC


Who is waiting on it? (what processes)
What are they "doing" - what is that session (do you have any insight into that at all - looking at the session, what is it - a user session, a background, what?)

and - is this in fact causing an issue (eg: a human being is waiting for something)

follow-up

A Reader, June 01, 2006 - 11:05 am UTC

These are all SYS sqlplus sessions there were doing system state dumps. They were started by a monitoring script.

Looking at the v$session table, these sessions are active and waiting on a from_process = 3.

SELECT * FROM v$process WHERE pid =3;
yields:

ADDR PID SPID USERNAME SERIAL#
---------------- ---------- ------------ --------------- ----------
TERMINAL PROGRAM
------------------------------ ------------------------------------------------
TRACEID
--------------------------------------------------------------------------------
B LATCHWAIT LATCHSPIN PGA_USED_MEM PGA_ALLOC_MEM PGA_FREEABLE_MEM
- ---------------- ---------------- ------------ ------------- ----------------
PGA_MAX_MEM
-----------
00000000DF474868 3 8907 oracle 34
UNKNOWN oracle@opcenterdbc (TNS V1-V3)

470721 539361 131072
1580105


Tom Kyte
June 01, 2006 - 11:23 am UTC

well, there you go.

They sent a message to a background "please dump"

They are waiting for the background to come back and say "done dumping"

to remove this wait - stop dumping?

Follow-up

A Reader, June 01, 2006 - 11:38 am UTC

dumping of a system state dump lasts just a few minutes, if even that long.

Why would these sessions run for days to dump the system state.

I am trying to figure out why they are held up for so long.

I believe the following queries are useful, but I am not sure how to interpret the results of the second one:

SQL> select s.sid
  2  from v$process p,v$session s
  3  where s.paddr = p.addr
  4  and p.pid = 3;

       SID                                                                      
----------                                                                      
       283                                                                      

SQL> select * from V$SESSION_EVENT where sid =283;

       SID EVENT                                                                
---------- ----------------------------------------------------------------     
TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT   MAX_WAIT TIME_WAITED_MICRO
----------- -------------- ----------- ------------ ---------- -----------------
  EVENT_ID WAIT_CLASS_ID WAIT_CLASS#                                            
---------- ------------- -----------                                            
WAIT_CLASS                                                                      
----------------------------------------------------------------                
       283 SQL*Net message to client                                            
          6              0           0            0          0                17
2067390145    2000153315           7                                            
Network                                                                         
                                                                                
       283 SQL*Net message from client                                          
          6              0           0          .04          0              2504
1421975091    2723168908           6                                            
Idle                                                                            
                                                                                
       283 events in waitclass Other                                            
     346893         345621    67660408       195.05        208        6.7660E+11
1736664284    1893977003           0                                            
Other                                                                           
 

Tom Kyte
June 01, 2006 - 11:53 am UTC

if you are doing system state dumps, you MUST be working with support - please continue to work with them on this issue. The only reason you'd be doing this is because they asked you to.

oracle wait event,

A reader, June 07, 2006 - 11:00 am UTC

What is db file parallel read wait event?

I see this pretty much on RAC databases.

A sample output from tkprof output says this wait event was the biggest. Do you know why it is happening? How to fix this?

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.16 0.37 168 15801 0 0
Execute 1 5.05 142.68 14191 14284 0 0
Fetch 1 0.05 0.09 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 5.26 143.16 14359 30085 0 0

Sample output of tkprof:

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
rdbms ipc reply 2 0.00 0.00
gc cr grant 2-way 164 0.00 0.10
db file sequential read 195 0.09 2.01
gc cr multi block request 9644 0.01 1.33
db file parallel read 150 1.61 135.92
gc cr grant congested 1 0.00 0.00
latch: KCL gc element parent latch 3 0.01 0.01
latch: gcs resource hash 1 0.00 0.00
enq: PS - contention 50 0.00 0.02
latch: ges resource hash list 1 0.00 0.00
KJC: Wait for msg sends to complete 1 0.00 0.00
DFS lock handle 12 0.15 0.26
enq: PI - contention 4 0.00 0.00
PX Deq: reap credit 159 0.00 0.01
PX Deq: Join ACK 5 0.00 0.01
PX Deq: Parse Reply 7 0.29 0.31


Tom Kyte
June 07, 2006 - 3:05 pm UTC

wait events

mal, June 21, 2006 - 2:46 pm UTC

Can you please provide some suggestion after looking at following ?

select EVENT,TIME_WAITED from v$system_event order by TIME_WAITED desc

EVENT TIME_WAITED
---------------------------------------------------------------- -----------
SQL*Net message from client 489711645
rdbms ipc message 483178445
gcs remote message 110453129
ges remote message 55418765
pmon timer 55404668
smon timer 53678792
wakeup time manager 53421063
PX Idle Wait 8094027
db file scattered read 3396348
db file sequential read 3157320
ARCH wait on SENDREQ 2208004
global cache cr request 1270045
control file sequential read 1154137
enqueue 951439
PX Deq: Execution Msg 460557
db file parallel read 396697
log file sync 164629
log file parallel write 128685
log file sequential read 99154
SQL*Net more data to client 94446
global cache null to x 52629
wait for scn from all nodes 42464
buffer busy global CR 41099
control file parallel write 39633
log file switch (archiving needed) 34082
buffer busy waits 31820
global cache s to x 30801
KJC: Wait for msg sends to complete 30636
log buffer space 27504
global cache open x 24963
latch free 21894
PX Deq: Table Q Normal 17949
row cache lock 11890

select parameter, count, dlm_requests, dlm_conflicts, dlm_releases
2 from v$rowcache;

PARAMETER COUNT DLM_REQUESTS DLM_CONFLICTS DLM_RELEASES
-------------------------------- ---------- ------------ ------------- ------------
dc_free_extents 0 0 0 0
dc_used_extents 0 0 0 0
dc_segments 1346 14400 100 9286
dc_tablespaces 19 166 0 147
dc_tablespace_quotas 0 3388 51 10
dc_files 29 232 0 203
dc_users 23 212 0 189
dc_rollback_segments 108 270 34 0
dc_objects 1790 15774 171 12978
dc_global_oids 17 126 0 109
dc_constraints 0 808 23 1
dc_object_ids 1383 10918 25 9089
dc_sequences 20 580765 17744 67
dc_usernames 11 101 0 90
dc_database_links 1 8 0 7
dc_histogram_defs 4780 71509 27565 30214
dc_table_scns 62 47533 12766 1562
dc_outlines 0 0 0 0
dc_profiles 2 16 0 14
dc_encrypted_objects 0 0 0 0
dc_encryption_profiles 0 0 0 0
dc_qmc_cache_entries 0 0 0 0
dc_qmc_ldap_cache_entries 0 0 0 0
dc_users 0 0 0 0
dc_histogram_data 0 0 0 0
dc_histogram_data_values 0 0 0 0
dc_partition_scns 0 0 0 0
dc_user_grants 20 0 0 0
dc_app_role 0 0
select namespace, dlm_lock_requests, dlm_pin_requests, dlm_pin_releases, dlm_invalidation_requests, dlm_invalidations
2 from v$librarycache;

NAMESPACE DLM_LOCK_REQUESTS DLM_PIN_REQUESTS DLM_PIN_RELEASES DLM_INVALIDATION_REQUESTS DLM_INVALIDATIONS
--------------- ----------------- ---------------- ---------------- ------------------------- -----------------
SQL AREA 0 0 0 0 0
TABLE/PROCEDURE 1661706 16424 13373 11168 4
BODY 79 416 337 366 3
TRIGGER 0 274 239 274 0
INDEX 37495 193 82 42 0
CLUSTER 2144 37 22 26 0
OBJECT 0 0 0 0 0
PIPE 0 0 0 0 0
JAVA SOURCE 0 0 0 0 0
JAVA RESOURCE 0 0 0 0 0
JAVA DATA 0 0 0 0 0



Tom Kyte
June 22, 2006 - 11:43 am UTC

I think you should use statspack or AWR to look at things, since these are just a big bunch of numbers that have been accumulating since the database instance was started.

you need to look at a smaller, defined timeframe. and the statspack/awr reports will contain "meaningful useful information you can read and interpret and act on"

wait events

mal, June 23, 2006 - 7:11 pm UTC

Snap Id Snap Time Sessions Curs/Sess Comment
--------- ------------------ -------- --------- -------------------
Begin Snap: 33942 22-Jun-06 00:25:04 10,818 1.0
End Snap: 33981 22-Jun-06 21:25:00 12,017 1.0
Elapsed: 1,259.93 (mins)


Top 5 Timed Events
~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 10,628 26.96
global cache cr request 5,867,899 7,014 17.79
db file sequential read 1,825,166 6,794 17.23
db file scattered read 1,318,552 6,125 15.53
ARCH wait on SENDREQ 9,568 2,664 6.76


Tom Kyte
June 24, 2006 - 11:40 am UTC

1,259.93 minutes...

that does not fit into my definition of "you need to look at a smaller, defined timeframe. "

I was thinking more like 15 minutes, not 21 hours.


I see nothing wrong here, no one could. You have 75,595 seconds of elapsed time. Way too much time to average anything over.

SQL*Net Messsage from client

vivek, July 26, 2006 - 11:52 am UTC

Tom,

On my production server i have gathered some statspack report and found SQL*Net message from client wait number to be huge. I am aware of some network issues on the production site how can i confirm from my statspack report that this SQL*Net message from client or any other event is primarily because of network not a database bottleneck.
Abstract from statspack
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write 483 0 60 124 0.0
log file sync 3,144 8 38 12 0.3
log file parallel write 8,834 4,859 28 3 0.8
db file scattered read 17,257 0 22 1 1.5
control file parallel write 579 0 21 36 0.1
buffer busy waits 34 9 14 425 0.0
db file sequential read 5,311 0 13 2 0.5
latch free 135 20 6 41 0.0
direct path read 363 0 1 2 0.0
SQL*Net more data to client 5,479 0 0 0 0.5
direct path write 16 0 0 11 0.0
SQL*Net break/reset to clien 280 0 0 0 0.0
LGWR wait for redo copy 69 0 0 0 0.0
control file sequential read 588 0 0 0 0.1
log buffer space 1 0 0 4 0.0
SQL*Net message from client 35,008 0 66,622 1903 3.1
jobq slave wait 1,189 1,113 3,500 2943 0.1
SQL*Net more data from clien 2,720 0 0 0 0.2
SQL*Net message to client 35,127 0 0 0 3.1




Tom Kyte
July 26, 2006 - 12:03 pm UTC

it is likely neither a database or network.

It is just the client. It is the normal wait event accrued when you are connected but not asking the database to do anything.

Just log into sqlplus for example, let it sit there. Then select * from dual.

You'll have a lot of sqlnet message from client. The entire time you had the blinking prompt in sqlplus - connected but not active - was sqlnet message from client!

SQL*Net

vivek, July 27, 2006 - 12:18 pm UTC

Tom,

During the slow response time, Capacity planner statistics shows average wait (for SQL*Net break/reset to client event) jumped from 0.33 to 101.92 s and at the client side i recieved following error in sqlnet.ora
Tns error struct:
nr err code: 12224
TNS-12224: TNS:no listener
ns main err code: 12541
TNS-12541: TNS:no listener
ns secondary err code: 12560
nt main err code: 511
TNS-00511: No listener
nt secondary err code: 61
nt OS err code: 0

There were no errors in the listener log. So i believe that its not just idle time for server but it was not able to receive the request from clients. So please advise what conclusion can i make from this statatistics?

As i mentioned earlier that there are some reported issues with network and for that sniffer is going to be configured on router side, What can i monitor with in database so that next time if slow response issue is raised i should be able to identify the cause?


Thanks
Vivek

Tom Kyte
July 27, 2006 - 12:23 pm UTC

you asked about sqlnet message from client before???? I don't see any correlation between this and that.

What I said above applies.

Statspack Anomoly

Danny Gawarecki, October 05, 2006 - 1:06 pm UTC

Tom - We have an OLTP system that is capable of processing more than 1,000 logical transactions per second (although it currently runs around 400-500 per second now). We are making a change to the system to add a new table to query for every transaction that is processed. In addition there is a process to populate this new table that will completely delete and insert the data in the table every 20 seconds. We have done multiple 1 hour test runs of the current system and the system with the new table, etc. and captured CPU and database statistics. The CPU stays very even on both runs, but the database stats are wildly different. Below are some stats from each run that are wildly different. Those with a description ending in 2 are from the current production run and the others are from the run with the new table. We have repeated the tests three times and received the same results each time. Our quandry is that we can't understand why the db stats would improve by adding a new table and constantly deleting/inserting that table. In the very best case it should stay the same or possibly even become worse. Can you help explain what might be going on? Any help you can give will be appreciated and I can provide you with any other information that you might need. Thanks.

latch free 17,233 3,411 1,500 87 0.0
latch free2 205,379 66,944 865 4 0.1
log file sync 726,473 0 942 1 1.2
log file sync2 1,552,982 49 5,349 3 0.7
buffer busy waits 4,613 0 30 7 0.0
buffer busy waits2 148,357 0 249 2 0.1
LGWR wait for redo copy 8,500 0 1 0 0.0
LGWR wait for redo copy2 121,397 22 9 0 0.1


Tom Kyte
October 05, 2006 - 1:24 pm UTC

you say the cpu stays the same, but is the amount of work performed equal??

insufficient data to say anything so far.

Follow Up

Danny Gawarecki, October 05, 2006 - 1:49 pm UTC

The system was put through identical tests in both cases. The hardware, setup, time span and work done is the same between both tests. The only difference is the spicific differences that we are testing (i.e. the new table and processing for that table). We can do the exact same test two times in a row and get nearly (although not completely) identical outcomes from an application, database, and Unix system standpoint.

Tom Kyte
October 05, 2006 - 4:19 pm UTC

tell you what, zip up both statspack that cover the SAME AMOUNT OF WORK (eg: transactions committed should be equal) - save as a '.zyp' file and send it to me, I'll peek at it and see what I see. insufficient data in the above to say anything.


Update, I got the statspack and this is what I observed:

both statspacks are for 60 minutes.

one of them does 272.21 transactions per second, or a total of 982,079.238 commits issued by your transaction. That is "production"

The new_table_changes.lst does 168.06 transactions per second, or a total of 606,226.032

The "new table" statspack therefore reflects a 40% DECREASE in work performed during the same period of time.

so, it was sort of what I thought, they are doing significantly less "transactions" in the same amount of time, the code is not running faster due to these changes - it is doing less work in the same amount of time, which of course accounts for the reduced statistics reported above - less commits per hour = less log file sync waits per hour!


local write wait

abz, October 28, 2006 - 10:54 am UTC

can you please describe the wait event "local write wait".

Also
Please read the first section at the link
</code> http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm <code>

I was not able to find the "local write wait" event in
the documentation, so this means according to the above link, "local write wait" event is not of "GREATEST INTEREST",
then as the manual suggests, I issued

SELECT name FROM V$EVENT_NAME ORDER BY name;

the V$event_name does not have any column describing the
cause of this event.

We are receiving a lot of this event, so it is of
GREATEST INTEREST to us.

How can Oracle leave such things undocumented, what
should a person receiving such events do in this case.






Tom Kyte
October 28, 2006 - 11:28 am UTC

can you show it to me IN CONTEXT "abz" ?

have you been to metalink.oracle.com? (one of the first places I would think to type in a wait event name and see what comes back?)

indeed, when I typed it in there, I found lots.

basically, you are waiting for dbwr to finish a local write (a write on your machine) in order to reuse a buffer.

do you do lots of truncates of tables that possibly have lots of blocks in the cache? You have to wait for them to be "checkpointed", that'll do it too.

IN CONTEXT?

abz, October 28, 2006 - 11:43 am UTC

"can you show it to me IN CONTEXT "abz" ?"
cant get it.

Ok, metalink provides the information, but I was surprised
why Oracle chooses to leave such things undocumented.

Also, the session in which I am receiving this error
is executing an INSERT statment not a TRUNCATE, is this
event also can occur in INSERT statements?



Tom Kyte
October 28, 2006 - 1:15 pm UTC

sure, dbwr is cleaning out some blocks.

IN CONTEXT?

abz, October 28, 2006 - 1:31 pm UTC

Can you please clarify the sentence
"can you show it to me IN CONTEXT "abz" ?"

I am note sure what you mean by above, but my point was
that metalink support is an optional thing, which, a client
who purchased Oracle software, might or might not purchase
metalink support. So things like this must be in the
documentation.




Tom Kyte
October 28, 2006 - 3:24 pm UTC

show me where you are "seeing them", tkprof, statspack, what.

in context

abz, October 28, 2006 - 6:06 pm UTC

Ok you mean where I saw the "local write wait" wait event.

I am using browser based dbconsole (database control)
of 10g, there I clicked on my session ID, and then a screen appeared where this event was shown under the heading WAIT.

You didnt comment on not documenting this.

Tom Kyte
October 29, 2006 - 7:03 am UTC

I'm not going to comment on everything - It isn't in the documention, I personally cannot say anything beyond "so sorry" to you.

And how much of a wait is this for you - you give absolutely no context here "abz"

More on SQL*Net break/reset to client

John Feibusch, November 06, 2006 - 3:13 am UTC

There've been some questions about "SQL*Net break/reset to client", and the description in the doc seems pretty unhelpful. It seems to me that this event occurs when a SQL statement execution generates an error, or when an interactive user aborts execution of a SQL statement with CTRL-C (which is really treated like an error in Oracle).

I've done 10046 traces and seen this event occur whenever a SQL error occurs.

That would mean that if you see a sudden increase in the number of waits on this event and it seems to be causing performance issues, you should see if one of your apps is getting a lot of SQL errors.


Even class "others"

A reader, December 20, 2006 - 1:54 pm UTC

Tom,
We are on 10gr2, RAC.

If I have to write a query to find out: Database time spent waiting (%) for event class "Others", which v$ view(s) would I use?

The reason I am asking is that grid control is constantly reporting that database time spent waiting (%) for event class "Other" is more than 40 throughout the day and often reaching 100%. This is happening on several databases, some of which have practically no activity during the day. I wrote the following queries to find out what % of wait is attributed to "Other":

select (sum(decode(wait_class#,0,time_waited,0))/sum(time_waited))*100 from gv$session_wait_class;

select (sum(decode(wait_class#,0,time_waited,0))/sum(time_waited))*100 from gv$system_wait_class;

The results were less than 0.1% in both cases throughout the day. The AWR/ADDM reports do not report any waits on "Other". I opened a SR with Oracle support 2 months ago but they are still unable to give me a resolution.

My thought is that if a database has 40%-100% waits on "Other" throughout the day, it will be a database where no work could be done. This is not the case. The databases are running fine.

Can you please help me on this issue? I understand that working with support and asking a question on this site are a duplication of effort, but you can understand my frustration when I do not get any resolution for 2 months and everyday I get 200-300 e-mail messages from grid control about these bogus waits.

Thanks

Tom Kyte
December 21, 2006 - 3:55 pm UTC

I asked around and got back:

If the query is reporting only a small value but EM is alerting because it believes that the value is high, then there must be a problem somewhere.
I haven't seen it at any other sites though.

As a workaround, they could try setting the alert threshold to 100 in EM to avoid the emails.

Event class "other"

A reader, December 21, 2006 - 6:37 pm UTC

Tom,
Thanks very much for replying. Oracle support did get back to me today and they are using the v$waitclassmetric view to get the information. The query is:

SELECT TO_CHAR(CAST(wcd.end_time AS TIMESTAMP),
'YYYY-MM-DD HH24:MI:SS TZD') time,
'wait_sess_cls', label, wcn.wait_class, 0, wcd.value
FROM (SELECT DISTINCT wait_class, wait_class_id
FROM v$event_name) wcn,
(SELECT end_time, wait_class_id,
'avg_users_waiting_on_class' as label
,average_waiter_count as value
FROM v$waitclassmetric
UNION ALL
SELECT end_time, wait_class_id,'dbtime_waitclass_pct' as label
, dbtime_in_wait as value
FROM v$waitclassmetric) wcd
WHERE wcn.wait_class_id = wcd.wait_class_id;

I am bit confused about the v$waitclassmetric view and the way it is being used in the query.

From the label, I can infer that the query is treating dbtime_in_wait as a percentage. From the Oracle reference manual, dbtime_in_wait is defined as "Database time spent in the wait". Now the documentation does not mention whether this is a percentage or actual time in cs. If the column is a percentage, then yes this view is showing values anywhere from 12% to 100% and I will have to open SR with RDBMS group since the numbers do not match what I get from other wait event views. If this column is just actual time in cs, then its interpretation in the query is wrong and it is a grid control issue.

I did try setting the threshold to 100%. It slows down the e-mails but does not get rid of them altogether because most times this number is at 100 for one instance or the other.

I will follow up with support. I thank you very much and really appreciate your taking time to respond back.


Still no info on all waits.

Evan, February 19, 2007 - 7:33 pm UTC

This was a great thread, but not exactly what I need.
I have a RAC with "reliable message" waits and a bunch of other, unknown to me, events. I just want to know what it is and how to tune it if needed.

Searching on Metalink and Google keeps showing it in the "list of events", but I never find a definition. How is a guy supposed to find these obscure waits?

Thanks
Tom Kyte
February 19, 2007 - 9:26 pm UTC

well, first I would ask myself "are my applications waiting on this" - tkprof would be the tool to use.

Clarity on Time out

A reader, June 13, 2007 - 9:24 am UTC

Hi tom ,

The v$system_event has a column named timed_out.
I read the reference manual where this is defined as
TOTAL_TIMEOUTS: Number of times the wait timed out

What exaclty is the meaning of this?
what happens when a wait is timed out.
Are all the waits timed out?
Does this mean that when a wait it timed out it goes on a sleep mode ?

Please can you elaborate the time out with an example.

Thanks in advance
Tom Kyte
June 13, 2007 - 1:57 pm UTC

some waits are waited for only for some period of time - then we time out - look around - make sure everything is OK - and go back to waiting.

Eg:

if you wait on an enqueue (to lock a row for example), you will wait 3 seconds, time out, wait 3 seconds, time out and so on - until the row becomes available.

Time out

RAM, June 13, 2007 - 2:28 pm UTC

Tom,
Now Lets say we have 2 sessions waiting on enqueue then the total_waits would be 2 now after waiting one of the session got hold of what it was waiting and the other session continues to wait and if this exceeds 3 seconds this times out and again comes back on wait which means that the TOTAL_WAITS is now 3 and the timed_out is 1.
And the total_wait_time would be 3*2=6 for the said duration
Is this correct?

Many Thanks
Tom Kyte
June 13, 2007 - 3:31 pm UTC

total wait time would be > 6 since one session waited and then got the resource and the other session waited 3 seconds twice.

buffer busy waits

Deepak, June 21, 2007 - 7:33 am UTC

Hi Tom,

We are encountering a lot of waits on "undo header". I believe it is same as "buffer busy waits". Am I correct?

And as we can not do much about automatic UNDO management, would like to know what best can be done to avoid this problem.
Tom Kyte
June 21, 2007 - 10:49 am UTC

if it were the same, they would be named the same.

what is "a lot", how are you measuring, and is it affecting the application you are actually concerned about tuning.

UNDO HEADER

Deepak, June 21, 2007 - 11:10 am UTC

Thanks a lot for your answer.

Actually am doing a schema import. And when queried the v$waitstat dyanamic performance view, it is showing that oracle is waiting for "undo header".

Is it normal?

Just wanted to know if we can do anything to reduce the figure. Just out of curiousity.
Tom Kyte
June 21, 2007 - 11:22 am UTC

if you are the only thing going - there will not really be any waits.

however, v$waitstat is cumulative from the beginning of the instance. if you want to reduce it, bounce the instance :)

and sigh, still no indication of how long a piece of string you believe you have.

is truncate a transaction ?

A reader, June 29, 2007 - 3:11 pm UTC

is truncate a transaction ?
Tom Kyte
July 03, 2007 - 8:44 am UTC

Transaction defined (copied from Expert Oracle Database Architecture)

<quote>
Transactions
Transactions are one of the features that set a database apart from a file system. In a file system, if you are in the middle of writing a file and the operating system crashes, this file is likely to be corrupted. It is true there are ¿journaled¿ file systems and the like, which may be able to recover your file to some point in time. However, if you need to keep two files synchronized, such as system won¿t help you there¿if you update one file, and the system fails before you finish updating the second, then you will have out-of-sync files.

This is the main purpose of transactions in the database; they take the database from one consistent state to the next. That is their job. When you commit work in the database, you are assured that either all of your changes have been saved or none of them is saved. Furthermore, you are assured that the various rules and checks that protect data integrity are implemented.

In the previous chapter we discussed transactions in terms of concurrency control and how, as a result of Oracle¿s multi-versioning read-consistent model, Oracle transactions can provide consistent data every time, under highly concurrent data access conditions. Transactions in Oracle exhibit all of the required ACID characteristics. ACID is an acronym for

* Atomicity: Either all of a transaction happens or none of it happens.
* Consistency: A transaction takes the database from one consistent state to the next.
* Isolation: The effects of a transaction may not be visible to other transactions until the transaction has committed.
* Durability: Once the transaction is committed, it is permanent.
</quote>


TRUNCATE is DDL

DDL is like a stored procedure - it is typically dozens or hundreds of statements executed as a single transaction.


Think of truncate as being logically equivalent to this plsql code:
begin
  COMMIT;  -- all DDL starts by closing any open tranaction
  do_truncate(); -- do the DDL
  COMMIT;  -- all DDL ends by committing when successful 
EXCEPTION
  when others then
       ROLLBACK;  -- all unsuccessful DDL rolls back anything it started doing
       RAISE;  -- and re-raises the error
END;



sql*net connection tracing

Reader, August 10, 2007 - 6:14 pm UTC

Hi Tom,
We have been receiving complaints from the application team regarding slowness in database. we have enabled session tracing and evaluated the trace files. The elapsed time for all the sql statements seem to be acceptable. But its taking so much of time for the output to be displayed on front end.
There are wait events sql*net message from client and sql*net message to client for each statements.

Do you what could be the issue?
How do we fix this?

Can you please let me know how to trace the sql*net connections and operations associated with it?
I would really appreciate you advice.
Thanks..

Tom Kyte
August 14, 2007 - 3:27 pm UTC

sqlnet message from client.

we are WAITING FOR THE FRONT END TO TELL US TO DO SOMETHING.

you have just shown to the application team that - well - problem is not database, problem apparently is their code.

The data is coming from the database fast - we are spending most of our time in the database waiting to be told to do something. Problem is not in database in all likelihood

control file heartbeat??

Hector Gabriel Ulloa Ligarius, August 22, 2007 - 10:53 am UTC

Hi Team Tom

What's that event?? "control file hearbeat"

The code :
col event for a60
col total_waits for 999,999,999,999
col total_timeouts for 999,999,999,999
col time_waited for 999G999G999G999
col average_wait for 999,999,999,999
select a.* , 
       b.startup_time
  from v$system_event a,
       v$instance b
 order by a.time_waited;


Result :

EVENT             : control file heartbeat
TOTAL_WAITS       : 1
TOTAL_TIMEOUTS    : 1
TIME_WAITED       : 430
AVERAGE_WAIT      : 430
TIME_WAITED_MICRO : 4296704
STARTUP_TIME      : 19-07-2007 02:07:57


TOTAL_WAITED is 430 seconds??

Is very high???

Thank you very much

Regards
Hector Gabriel Ulloa Ligarius
http://ligarius.wordpress.com
Tom Kyte
August 22, 2007 - 2:15 pm UTC

it is an event waited on by a background process in a clustered environment - it is normal to have it.

But what information can we find - not much

Evan, November 19, 2007 - 5:08 pm UTC

Tom,

You said "it is an event waited on by a background process in a clustered environment - it is normal to have it."

I wish you had more insight than I get elsewhere.
You say it is a background process, but not anything more. I would love for Oracle to provide a web page with all of the wait states AND WHAT THEY MEAN!!!

This is an ongoing problem. I need to know about wait state "xyz". When we search metalink, there is little to no information. When we google, the text we search is listed, but no information about it.

It is a shame there is no list of ALL wait events and what they mean for RAC.

I could give a list of the most obscure, but I would rather you get an address or dig up a listing of all of them for us users.

You must hear this all the time, but I really appreciate what you do for us all.

Thanks,

Evan

jobq slave waits...

VLS, March 13, 2008 - 10:52 am UTC

Hi Tom,

In one of the post above, you mentioned that jobq slave waits are idle and should be ignored, which is fine. But I have seen a strange behaviour on one of a 9205 database.

The parameter job_queue_processes are defined as 100.
Sometimes (almost 2-3 times a week), we observe almost 80-90 sessions waiting on jobq slave waits and all these are active sessions. Surprisingly, when we query for the sql that is being executed by these waiting sessions, the sql's are purely OLTP Application Queries and not a query that should be a part of job. Secondly, querying dba_jobs_running shows only 2 or 3 jobs running.

How is it possible ? Need your help in this.

Regards

Tom Kyte
March 15, 2008 - 8:40 am UTC

sorry, but I believe you mis-diagnosed something.

our dedicated and shared server processes do not wait on this event. The Background snpN or jNNN processes do.

so, maybe you are joining wrong?

rdbms ipc reply event debug

David, April 11, 2008 - 2:20 pm UTC

Tom,
I'm trying to debug the weird hangs for my truncate statements - they might hand for 10-20 minutes
(without any user locks blocking from another session)
and then all release at once.

I join V$SESSION with V$SESSION_WAIT and it shows that truncates seem to line up waiting for "rdbms ipc reply"
from process 5, Process with SID=5 is "smon timer".

Not sure what this could be waiting for... even log sync or buffer writes never take 10-20 minutes on our system :)


SELECT substr(module,1,28) as module,
EVENT,
case
when LAST_CALL_ET < 120 then LAST_CALL_ET||' seconds'
when LAST_CALL_ET < 3600 then round(LAST_CALL_ET/60,1)||' minutes'
when LAST_CALL_ET < 3600*24 then round(LAST_CALL_ET/60/60,1)||' hours'
else round(LAST_CALL_ET/60/60/24,1)||' days'
end LAST_CALL,
P1TEXT,
P1,
P2TEXT,
P2
FROM V$SESSION_WAIT JOIN V$SESSION s using (sid)
WHERE s.status = 'ACTIVE' and schemaname <> 'SYS'
order by 3 ;


MODULE EVENT LAST_CALL P1TEXT P1 P2TEXT P2
----------------------- ------------------ ----------- ------------- ---------- -------- ----------
SQL*Plus log file sync 0 seconds buffer# 3775 0
MYSELF(DEBUGGER) SQL*Net message to 0 seconds driver id 1650815232 #bytes 1
DBSync: _EDU.ADM_PUBLIS buffer busy waits 2.6 minutes file# 2 block# 2
DATA_COLLECTOR Truncate rdbms ipc reply 4.3 minutes from_process 5 timeout 21474812
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 2.9 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 7.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 2.9 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 4.3 minutes name|mode 1128857606 id1 1
DATA_COLLECTOR Truncate enqueue 76 seconds name|mode 1128857606 id1 1
REFRESH_SUM_PHP enqueue 4.2 minutes name|mode 1128857606 id1 1
DBSync: Disable constr enqueue 71 seconds name|mode 1128857606 id1 1

Tom Kyte
April 11, 2008 - 3:34 pm UTC

I would guess that you have lots of dirty buffers - and we are waiting for the necessary checkpoint that is issued for the truncate to complete - and the rest of the truncates are waiting on the first truncate.

So, do you see lots of write activity when this happens - if so, it is the large amount of dirty buffers that must be written to disk before the truncate can safely happen.

You can use something like fast_start_mttr_target to have the cache be kept "more clean" (limit the number of blocks that need checkpointing at any point in time) or just realize that the truncate is not an instantaneous thing - it might take a while to complete - depending on the amount of work to be done before it can be safely executed.

Dirty Buffers/IO bottleneck

David, April 14, 2008 - 4:01 pm UTC

Tom,
thank you very much - this is the problem.

I cleared some secondary issues on the instance, set fast_start_mttr_target to 60, added another DBRW process and clearly see now by iostat utilization that everything comes to halt when it lears the dirty buffers:

Normal state:

TARGET_MTTR EST_MTTR ACTUAL_REDO_BLKS RECOVERY_ESTD_IOS
----------- -------- ---------------- -----------------
60 59 232671 7702


when everything hangs:

TARGET_MTTR EST_MTTR ACTUAL_REDO_BLKS RECOVERY_ESTD_IOS
----------- -------- ---------------- -----------------
60 601 1021561 87394

Thanks a lot!


SQL*Net more data from dblink

Chaman, May 09, 2008 - 7:29 am UTC

Tom,
One of the query seems to be hanging while selecting data over the dblink and we get the wait event "SQL*Net more data from dblink".

Does increasing PGA in remote db will help me? Can you please tell me any other reason for this ?
Tom Kyte
May 12, 2008 - 10:29 am UTC

... Does increasing PGA in remote db will help me? ...

yes
no
maybe


Really - it totally depends on what is going on on the remote site doesn't it. Increasing the PGA will not help a query like "select * from t" - faster IO might, but PGA - not likely.

There are *so many* other reasons for this that one cannot enumerate them. You'll really need to dig a little bit to see what might be the bottleneck on the remote site. What is IT doing.

response time analysis

karthik, August 12, 2008 - 5:17 am UTC

Tom,

We are doing production support and have to analyze the statspack report periodically during the batch runs and high usage of database(peak hour).We have both top 5 wait events and top sql statements.
1,Do all top wait events(like sequential read) related to application has a significant impact over response time?To quote an example,my wait time for sequential reads may be high but how can i conclude that is the bottleneck for my application---or--how can i associate an particular session with the listed top 5 wait events which is having a performance problem.

2,If there are background processes with high wait time reported,how can i relate to an application that has an unusual response time?

3,You may suggest me to do an application level tuning to tune the response time. But what evidence related to tuning an application i can find from statspack report.(my above 2 questions are based on this only).


Tom Kyte
August 13, 2008 - 4:20 am UTC

1) you cannot, not really from a statspack, unless you identify all of your sql (not just the top) - and can see that it does lots of the IO...

You'd have to use application level tracing (active session history - ASH - in 10g would be fairly useful there)

2) you have to see what the foregrounds are waiting on. for example - if you see log writer is waiting a lot on log file parallel write, but, you do not see the foreground processes waiting on log file sync - you don't really care that log writer is waiting on that - log writer will do lots of IO - but unless end users are waiting for commits to finish - you do not really care if log writer is waiting on that IO. You care about foreground wait events mostly - and if you have them, you might look at the background wait events to see if they are a cause (eg: if you see log file sync waits, your end users wait for that, you might look to see if lgwr is bottlenecked on IO or if there is some other cause)

3) statspack is an aggregate at the system level, it is fairly hard truly to use statspack for application level tuning. If your goal is to improve the response time of application X out of a set of applications A-Z, you'll need really to look at X, not averages over A-Z

Consistent Read Calculation

Arvind Mishra, December 01, 2008 - 12:34 am UTC

Hello Tom,

I read your book on performance tuning and understood the importance of consistent read. But how can calculate or estimate the minimum required consistent read value for a query? I need this because even though I get this value but I am not sure whether it is the minimum required value for this query or query is doing more than required consistent read.

Thanks,

Arvind
Tom Kyte
December 01, 2008 - 6:44 am UTC

... But how can calculate or estimate the minimum required
consistent read value for a query? ...

using explain plan, it is basically what it is doing. The cost is roughly equivalent to IO's (not entirely, in 10g and above - cpu costing is involved as well)


or, by closing your eyes and envisioning how it works (I am being *dead serious* here, I am not kidding)

Think about this - ok, take a query:

select * from emp where empno = ?;


Ok, what do you know about this table.

a) empno is primary key
b) index exists on empno
c) table was created with reasonable pctfree
d) table has a few thousand records.

So, what would be the MINIMUM (and maximum) IO's we would expect....


Well, we'd have to guess the index on empno probably is height 2 (a few thousand records only - just a root and then leaf blocks in index). We'd guess that rows were not migrated in the table so, the minimum IO's would be:

2(index reads) + 1(table access by index rowid read) = 3 IO's

What would the max be:

4(index reads - if index was really tall for some reason) + 2(table access by index rowid reads - the row was migrated and the first IO did not hit it) = 6 IO's


so, that query would be between 3 and 6 IO's in a single user system with no concurrent updates. If people were modifying this data - then the IO's might go up as we read undo....


What about a query like:

select * from emp where ename like ?;

Now, that one is sort of "impossible". What would you need to KNOW?

a) how many rows are returned by this query. You cannot, you do not know what ? is

b) whether a full scan or index would be used

c) if index is used, what is clustering factor of the index (is the data stored on disk in the same order as the keys in the index - less IO will result - if not most IO will result)

d) what array fetch size does the application use?

You would use your knowledge of what is REASONABLE for that query - truly. You would have to assign some common sense numbers to those questions.

a) I estimate that about 100 records will be returned on average
b) I have an index in place and expect it to be used
c) The clustering factor of ename (search this site for "clustering factor" if those words mean nothing to you) is 'poor' since ename's arrive randomly over time, not sequentially.
d) I use an array fetch of 100 rows - that is a very nice number.

Therefore, I'd guess 5 IO's through the index (root+branch+3 leaves to get 100 entries) + 100 table blocks - since the data is spread out - so, about 105 IO's

The array fetch size will not have a considerable effect - since the data is randomly spread out.


Now what if the query were:

select * from emp where hiredate between ? and ?

you would need to know

a) how many rows...
b) whether full scan or index
c) clustering factor if index

Let us assume:

a) about 100 rows again
b) uses index
c) clustering factor is VERY good since hiredate is sequential and therefore, the data in the table is actually very well sorted by the index key in this case (that is, everyone hired in Aug-2008 would be "close" since they are inserted sequentially and we tend to not delete from EMP - just insert over time - and the inserts TEND to go to the 'end' of the table)
d) I'd use an array fetch size of 100, and this will have great effect since the data is very clustered


So, now we guess 5 IO's through the index + 3 IO's against the table (since all 100 employee records would be close to each other) or about 8 IO's


both queries return 100 records
both queries use an index to do this

one would be expected to consume 105+ IO's
the other would be expected to consume about 8 IO's


but you need to have that knowledge - the knowledge of how the data is stored, the data distributions, how the data is processed by Oracle and so on...


ops$tkyte%ORA11GR1> /*
ops$tkyte%ORA11GR1> drop table t;
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create table t
ops$tkyte%ORA11GR1> as
ops$tkyte%ORA11GR1> select object_name ename, to_date('01-jan-2000','dd-mon-yyyy')+rownum hiredate,
ops$tkyte%ORA11GR1>        x.*
ops$tkyte%ORA11GR1>   from (
ops$tkyte%ORA11GR1> select *
ops$tkyte%ORA11GR1>   from stage
ops$tkyte%ORA11GR1>  order by dbms_random.random
ops$tkyte%ORA11GR1>        ) x
ops$tkyte%ORA11GR1> /
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create index ename_idx on t(ename);
ops$tkyte%ORA11GR1> create index hiredate_idx on t(hiredate);
ops$tkyte%ORA11GR1> exec dbms_stats.gather_table_stats( user, 'T' );
ops$tkyte%ORA11GR1> */
ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> select ui.index_name, ui.clustering_factor,
  2         ut.blocks, ut.num_rows
  3    from user_indexes ui,
  4         user_tables  ut
  5   where ui.table_name = 'T'
  6     and ut.table_name = 'T'
  7  /

INDEX_NAME                     CLUSTERING_FACTOR     BLOCKS   NUM_ROWS
------------------------------ ----------------- ---------- ----------
ENAME_IDX                                  67481       1332      67522
HIREDATE_IDX                                1302       1332      67522

<b>data is well clustered by hiredate (cluster factor near number of blocks tells us that) but not ename (cluster factor near number of rows tells us that)</b>


ops$tkyte%ORA11GR1> set autotrace traceonly statistics
ops$tkyte%ORA11GR1> set arraysize 100
ops$tkyte%ORA11GR1> select * from t where ename like 'UT%';

105 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
      13320  bytes sent via SQL*Net to client
        431  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        105  rows processed

ops$tkyte%ORA11GR1> select * from t where hiredate between
  2  to_date( '01-jan-2005', 'dd-mon-yyyy') and
  3  to_date( '01-jan-2005', 'dd-mon-yyyy')+100
  4  /

101 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          9  consistent gets
          0  physical reads
          0  redo size
      15626  bytes sent via SQL*Net to client
        431  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

<b>My guesses were very accurate (and yes, trust me, they were GUESSES - I got lucky, I did the example after guessing, you can too eventually...)</b>

ops$tkyte%ORA11GR1> set arraysize 2
ops$tkyte%ORA11GR1> select * from t where ename like 'UT%';

105 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        161  consistent gets
          0  physical reads
          0  redo size
      19916  bytes sent via SQL*Net to client
        992  bytes received via SQL*Net from client
         54  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        105  rows processed

ops$tkyte%ORA11GR1> select * from t where hiredate between
  2  to_date( '01-jan-2005', 'dd-mon-yyyy') and
  3  to_date( '01-jan-2005', 'dd-mon-yyyy')+100
  4  /

101 rows selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
      21996  bytes sent via SQL*Net to client
        970  bytes received via SQL*Net from client
         52  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        101  rows processed

ops$tkyte%ORA11GR1> set autotrace off
ops$tkyte%ORA11GR1>

<b>just to show the profound effect arraysize MIGHT have - not WILL have, MIGHT have - on your IO's</b>


OBI array size

Arvind Mishra, December 01, 2008 - 7:06 pm UTC

Hello Tom,

Thanks Tom. Now I am able to understand it completely. You are a very good teacher indeed :-). Can you tell me how to set array size in OBI(Earlier known as Sibel)

Regards,

Arvind


Tom Kyte
December 02, 2008 - 7:29 am UTC

talk to the Siebel people to see how/if they do it. I do not work with much of the middle tier stuff myself.

Performance book?

A reader, December 02, 2008 - 11:27 am UTC

"I read your book on performance tuning and understood the importance of consistent read." sorry Tom, but which book he is talking about ?
Tom Kyte
December 02, 2008 - 11:35 am UTC

I think he most likely meant "effective oracle by DESIGN"

which is less about "turning" (that is reactionary, after the fact) and more about designing, measuring and testing up front...

Thanks.

A reader, December 02, 2008 - 3:17 pm UTC


Book Name

Arvind Mishra, December 03, 2008 - 12:23 am UTC

Yes..It is 'Effective Oracle by Design'.

Thanks,

Arvind

Performance Tuning

A Reader, December 09, 2008 - 8:55 am UTC

Tom, which is the best book for performance Tuning Guide.
Tom Kyte
December 09, 2008 - 2:17 pm UTC

the book of life.

seriously - you need a smattering of lots of bits of knowledge (no one book has it all) and an understanding of how things work.

there is no single 'best' book.


read as many as you can.

question those that just tell you what to do - NOT WHY you are doing something. In fact, do more than question - benchmark and evaluate what they say so you can see just how wrong it usually is.

enjoy those that show you what you should be seeing when you do something - so that you can gain an understanding of WHY you are doing what you are doing (tuning wise).

A reader, December 09, 2008 - 4:30 pm UTC

I guess what Tom is saying that to do performance tuning, one must first understand what is the root cause of problem. Keep digging till root cause is identified and then do the tuning. The process of finding the root cause and fixing it will stimulate learning, either through books or Asktom website or any other means. The learning process will be the true "book of tuning".

My thought is that this not only applicable to tuning, but to everything we do on a daily basis whether it is DBA work or development work or anything else.

Tom Kyte
December 10, 2008 - 9:14 am UTC

... one must first
understand what is the root cause of problem. ...


exactly. You wouldn't like it if you Doctor said "Ok, we do not really need an appointment to figure out what is wrong - many times when I prescribe antibiotics - the problem goes away, so take these pills and see how you make out with them"



Tuning.

A Reader, September 29, 2009 - 12:29 pm UTC

Many thanks Tom,
for your time.

Database version  9.2.0.8
Optimizer Mode : First_rows

From 20 mins statspack: 

********************************************************************************
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
        650,274            1      650,274.0    2.1    30.83    192.91  417649475
Module: java_q4p@hostName (TNS V1-V3)
SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T  WHERE  (  T.x_parent_objid =  :P16 )  ORDER BY 2 ASC ) WHERE ROWNUM <= 200

********************************************************************************

The  view_T  definition: 

CREATE OR REPLACE VIEW VIEW_T
(OBJID, DEV, X_DT_OBJID, X_PARENT_OBJID, X_PARENT_ID, 
 X_CASE_OBJID, X_CASE_ID, X_OUTCOME, X_DTR, X_DTR_TEXT, 
 X_TEST_INST_ID, X_TEST_ID, X_EQUIPMENT_ID, X_REQUEST_DATE, X_MFL, 
 X_COMPLETION_CODE, X_TEST_STOP_TIME)
AS 
(select   rownum as objid, 
  -1 as dev, 
  dt.objid as x_dt_objid, 
  pc.objid as x_parent_objid, 
  pc.id_number as x_parent_id, 
  c.objid as x_case_objid, 
  c.id_number as x_case_id, 
  dt.x_outcome as x_outcome, 
  dt.x_dtr as x_dtr, 
  dt.x_dtr_text as x_dtr_text, 
  dt.x_test_inst_id as x_test_inst_id, 
  dt.x_test_id as x_test_id, 
  dt.x_equipment_id as x_equipment_id, 
  mr.x_date_sent as x_request_date, 
  dt.x_mfl as x_mfl, 
  dt.x_completion_code as x_completion_code, 
  dt.x_test_stop_time as x_test_stop_time 
FROM T1 pc, 
  T1 c, 
  T2 dt, 
  T3 mr 
where pc.objid(+) = c.case_victim2case 
and  c.objid = dt.x_focus_id 
and  dt.x_focus_type = 0 
and  dt.x_diag_test2x_msg_req = mr.objid(+) 
and        dt.x_outcome not like 'Fail -%' 
union 
select   rownum as objid, 
  -1 as dev, 
  dt.objid as x_dt_objid, 
  c.objid as x_parent_objid, 
  c.id_number as x_parent_id, 
  c.objid as x_case_objid, 
  c.id_number as x_case_id, 
  dt.x_outcome as x_outcome, 
  dt.x_dtr as x_dtr, 
  dt.x_dtr_text as x_dtr_text, 
  dt.x_test_inst_id as x_test_inst_id, 
  dt.x_test_id as x_test_id, 
  dt.x_equipment_id as x_equipment_id, 
  mr.x_date_sent as x_request_date, 
  dt.x_mfl as x_mfl, 
  dt.x_completion_code as x_completion_code, 
  dt.x_test_stop_time as x_test_stop_time 
FROM T1 c, 
  T2 dt, 
  T3 mr 
where c.objid = dt.x_focus_id 
and  dt.x_focus_type = 0 
and  dt.x_diag_test2x_msg_req = mr.objid(+) 
and        dt.x_outcome not like 'Fail -%' 
);




********************************************************************************
 
Explain Plan of query:

SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T  WHERE  (  T.x_parent_objid =  :P16 ) 
 ORDER BY 2 ASC ) WHERE ROWNUM <= 200



SQL> set linesize 200
SQL> /

PLAN_TABLE_OUTPUT
 

------------------------------------------------------------------------------------------------
| Id  | Operation                    |  Name                   | Rows  | Bytes |TempSpc| Cost  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   200 |  4400 |       | 48982 |
|*  1 |  COUNT STOPKEY               |                         |       |       |       |       |
|   2 |   VIEW                       |                         |   169K|  3636K|       | 48982 |
|*  3 |    SORT ORDER BY STOPKEY     |                         |   169K|  5785K|    15M| 48982 |
|*  4 |     VIEW                     |            VIEW_T       |   169K|  5785K|       | 47385 |
|   5 |      SORT UNIQUE             |                         |   169K|    24M|    52M| 47385 |
|   6 |       UNION-ALL              |                         |       |       |       |       |
|   7 |        COUNT                 |                         |       |       |       |       |
|   8 |         MERGE JOIN OUTER     |                         | 84638 |    13M|       | 21985 |
|   9 |          SORT JOIN           |                         | 84638 |    12M|    29M|  6883 |
|* 10 |           HASH JOIN          |                         | 84638 |    12M|    10M|  4032 |
|* 11 |            HASH JOIN OUTER   |                         | 61471 |  2941K|  3691K|  2781 |
|  12 |             TABLE ACCESS FULL|         T1              | 61471 |  1560K|       |  1252 |
|  13 |             TABLE ACCESS FULL|         T1              | 61471 |  1380K|       |  1252 |
|* 14 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 15 |          SORT JOIN           |                         |   785K|  9976K|    36M| 15102 |
|  16 |           TABLE ACCESS FULL  |         T3              |   785K|  9976K|       | 11321 |
|  17 |        COUNT                 |                         |       |       |       |       |
|* 18 |         HASH JOIN            |                         | 84638 |    11M|  9048K| 19682 |
|  19 |          TABLE ACCESS FULL   |         T1              | 61471 |  1380K|       |  1252 |
|  20 |          MERGE JOIN OUTER    |                         | 84638 |  9422K|       | 17298 |
|  21 |           SORT JOIN          |                         | 84638 |  8348K|    21M|  2196 |
|* 22 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 23 |           SORT JOIN          |                         |   785K|  9976K|    36M| 15102 |
|  24 |            TABLE ACCESS FULL |         T3              |   785K|  9976K|       | 11321 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=200)
   3 - filter(ROWNUM<=200)
   4 - filter("T"."X_PARENT_OBJID"=16)
  10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
  14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
  18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))

Note: cpu costing is off

48 rows selected.

SQL>




********************************************************************************
tkprof output.( I ran the sql identified from statspack in a separate session, with value of :P16 as 16 )



 SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE  (  T.x_parent_objid =  16 )  ORDER BY 2 ASC ) WHERE ROWNUM <= 200

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     25.10     313.88     512597     561247        209           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     25.11     313.89     512597     561249        209           4

Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42  (SA)

Rows     Row Source Operation
-------  ---------------------------------------------------
      4  COUNT STOPKEY
      4   VIEW
      4    SORT ORDER BY STOPKEY
      4     VIEW
 222568      SORT UNIQUE
 222568       UNION-ALL
 111284        COUNT
 111284         MERGE JOIN OUTER
 111284          SORT JOIN
 111284           HASH JOIN
  73011            HASH JOIN OUTER
  73011             TABLE ACCESS FULL T1
  73011             TABLE ACCESS FULL T1
 111284            TABLE ACCESS FULL T2
  99111          SORT JOIN
 954900           TABLE ACCESS FULL T3
 111284        COUNT
 111284         HASH JOIN
  73011          TABLE ACCESS FULL T1
 111284          MERGE JOIN OUTER
 111284           SORT JOIN
 111284            TABLE ACCESS FULL T2
  99111           SORT JOIN
 954900            TABLE ACCESS FULL T3


 

********************************************************************************


So, in order to read 4 rows output we are hitting above 60k rows (ID- 12,13 of the explain plan). However    filter("T"."X_PARENT_OBJID"=16) is done at Id-4 of the explain plan.  
column x_parent_objid is indexed.


Questions:

a) Why optimizer is not choosing this filter ( "T"."X_PARENT_OBJID"=16 ) in the begining so that initial rowcount hit is reduced . hence less IO.. less work?

b) There are lot PIO. On table T1 looks it is due to the following predicate:
where    pc.objid(+) = c.case_victim2case 
Here. why  hash outer join on table T1   not going for Index Scan as pc.objid ( Unique index)  and c.case_victim2case (non unique index) are indexed?

c)Any other way we can re-write the query?



Regards

 
 


 





Tuning.

A Reader, September 29, 2009 - 12:35 pm UTC

Tom,
reposting the explain plan & tkprof o/p for the above question for better redability.

SQL> set linesize 200
SQL> /

PLAN_TABLE_OUTPUT
 

------------------------------------------------------------------------------------------------
| Id  | Operation                    |  Name                   | Rows  | Bytes |TempSpc| Cost  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   200 |  4400 |       | 48982 |
|*  1 |  COUNT STOPKEY               |                         |       |       |       |       |
|   2 |   VIEW                       |                         |   169K|  3636K|       | 48982 |
|*  3 |    SORT ORDER BY STOPKEY     |                         |   169K|  5785K|    15M| 48982 |
|*  4 |     VIEW                     |            VIEW_T       |   169K|  5785K|       | 47385 |
|   5 |      SORT UNIQUE             |                         |   169K|    24M|    52M| 47385 |
|   6 |       UNION-ALL              |                         |       |       |       |       |
|   7 |        COUNT                 |                         |       |       |       |       |
|   8 |         MERGE JOIN OUTER     |                         | 84638 |    13M|       | 21985 |
|   9 |          SORT JOIN           |                         | 84638 |    12M|    29M|  6883 |
|* 10 |           HASH JOIN          |                         | 84638 |    12M|    10M|  4032 |
|* 11 |            HASH JOIN OUTER   |                         | 61471 |  2941K|  3691K|  2781 |
|  12 |             TABLE ACCESS FULL|         T1              | 61471 |  1560K|       |  1252 |
|  13 |             TABLE ACCESS FULL|         T1              | 61471 |  1380K|       |  1252 |
|* 14 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 15 |          SORT JOIN           |                         |   785K|  9976K|    36M| 15102 |
|  16 |           TABLE ACCESS FULL  |         T3              |   785K|  9976K|       | 11321 |
|  17 |        COUNT                 |                         |       |       |       |       |
|* 18 |         HASH JOIN            |                         | 84638 |    11M|  9048K| 19682 |
|  19 |          TABLE ACCESS FULL   |         T1              | 61471 |  1380K|       |  1252 |
|  20 |          MERGE JOIN OUTER    |                         | 84638 |  9422K|       | 17298 |
|  21 |           SORT JOIN          |                         | 84638 |  8348K|    21M|  2196 |
|* 22 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 23 |           SORT JOIN          |                         |   785K|  9976K|    36M| 15102 |
|  24 |            TABLE ACCESS FULL |         T3              |   785K|  9976K|       | 11321 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

   1 - filter(ROWNUM<=200)
   3 - filter(ROWNUM<=200)
   4 - filter("T"."X_PARENT_OBJID"=16)
  10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
  14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
  18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))

Note: cpu costing is off

48 rows selected.

SQL>




********************************************************************************
tkprof output.



 SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE  (  T.x_parent_objid =  16 )  ORDER BY 2 ASC ) WHERE ROWNUM <= 200

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     25.10     313.88     512597     561247        209           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     25.11     313.89     512597     561249        209           4

Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42  (SA)

Rows     Row Source Operation
-------  ---------------------------------------------------
      4  COUNT STOPKEY
      4   VIEW
      4    SORT ORDER BY STOPKEY
      4     VIEW
 222568      SORT UNIQUE
 222568       UNION-ALL
 111284        COUNT
 111284         MERGE JOIN OUTER
 111284          SORT JOIN
 111284           HASH JOIN
  73011            HASH JOIN OUTER
  73011             TABLE ACCESS FULL T1
  73011             TABLE ACCESS FULL T1
 111284            TABLE ACCESS FULL T2
  99111          SORT JOIN
 954900           TABLE ACCESS FULL T3
 111284        COUNT
 111284         HASH JOIN
  73011          TABLE ACCESS FULL T1
 111284          MERGE JOIN OUTER
 111284           SORT JOIN
 111284            TABLE ACCESS FULL T2
  99111           SORT JOIN
 954900            TABLE ACCESS FULL T3


 

********************************************************************************

Tuning. ....

A Reader, October 07, 2009 - 10:38 am UTC

Many thanks Tom,
for your time.

Sorry I am reposting above question again... actually I gave all the details but  missed the asking question previously ( error in copy/paste from my side). Also this question is best categorired under "SQL Tuning" section..I will take care of the same next time and make sure that I post the question to right category.


Database version 9.2.0.8
Optimizer Mode : First_rows

From 20 mins statspack:

********************************************************************************
                                  CPU    Elapsd
Buffer Gets  Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    650,274        1    650,274.0  2.1  30.83  192.91 417649475
Module: java_q4p@hostName (TNS V1-V3)
SELECT * FROM ( SELECT T.objid,T.x_request_date FROM view_T T WHERE ( T.x_parent_objid = :P16 ) ORDER BY 2 ASC ) WHERE ROWNUM <= 200

********************************************************************************

The view_T definition:

CREATE OR REPLACE VIEW VIEW_T
(OBJID, DEV, X_DT_OBJID, X_PARENT_OBJID, X_PARENT_ID,
X_CASE_OBJID, X_CASE_ID, X_OUTCOME, X_DTR, X_DTR_TEXT,
X_TEST_INST_ID, X_TEST_ID, X_EQUIPMENT_ID, X_REQUEST_DATE, X_MFL,
X_COMPLETION_CODE, X_TEST_STOP_TIME)
AS
(select    rownum as objid,
    -1 as dev,
    dt.objid as x_dt_objid,
    pc.objid as x_parent_objid,
    pc.id_number as x_parent_id,
    c.objid as x_case_objid,
    c.id_number as x_case_id,
    dt.x_outcome as x_outcome,
    dt.x_dtr as x_dtr,
    dt.x_dtr_text as x_dtr_text,
    dt.x_test_inst_id as x_test_inst_id,
    dt.x_test_id as x_test_id,
    dt.x_equipment_id as x_equipment_id,
    mr.x_date_sent as x_request_date,
    dt.x_mfl as x_mfl,
    dt.x_completion_code as x_completion_code,
    dt.x_test_stop_time as x_test_stop_time
FROM  T1 pc,
    T1 c,
    T2 dt,
    T3 mr
where  pc.objid(+) = c.case_victim2case
and    c.objid = dt.x_focus_id
and    dt.x_focus_type = 0
and    dt.x_diag_test2x_msg_req = mr.objid(+)
and    dt.x_outcome not like 'Fail -%'
union
select    rownum as objid,
    -1 as dev,
    dt.objid as x_dt_objid,
    c.objid as x_parent_objid,
    c.id_number as x_parent_id,
    c.objid as x_case_objid,
    c.id_number as x_case_id,
    dt.x_outcome as x_outcome,
    dt.x_dtr as x_dtr,
    dt.x_dtr_text as x_dtr_text,
    dt.x_test_inst_id as x_test_inst_id,
    dt.x_test_id as x_test_id,
    dt.x_equipment_id as x_equipment_id,
    mr.x_date_sent as x_request_date,
    dt.x_mfl as x_mfl,
    dt.x_completion_code as x_completion_code,
    dt.x_test_stop_time as x_test_stop_time
FROM  T1 c,
    T2 dt,
    T3 mr
where  c.objid = dt.x_focus_id
and    dt.x_focus_type = 0
and    dt.x_diag_test2x_msg_req = mr.objid(+)
and    dt.x_outcome not like 'Fail -%'
); 

 SQL> set linesize 200
SQL> /

PLAN_TABLE_OUTPUT
 

------------------------------------------------------------------------------------------------
| Id  | Operation                    |  Name                   | Rows  | Bytes |TempSpc| Cost  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   200 |  4400 |       | 48982 |
|*  1 |  COUNT STOPKEY               |                         |       |       |       |       |
|   2 |   VIEW                       |                         |   169K|  3636K|       | 48982 |
|*  3 |    SORT ORDER BY STOPKEY     |                         |   169K|  5785K|    15M| 48982 |
|*  4 |     VIEW                     |            VIEW_T       |   169K|  5785K|       | 47385 |
|   5 |      SORT UNIQUE             |                         |   169K|    24M|    52M| 47385 |
|   6 |       UNION-ALL              |                         |       |       |       |       |
|   7 |        COUNT                 |                         |       |       |       |       |
|   8 |         MERGE JOIN OUTER     |                         | 84638 |    13M|       | 21985 |
|   9 |          SORT JOIN           |                         | 84638 |    12M|    29M|  6883 |
|* 10 |           HASH JOIN          |                         | 84638 |    12M|    10M|  4032 |
|* 11 |            HASH JOIN OUTER   |                         | 61471 |  2941K|  3691K|  2781 |
|  12 |             TABLE ACCESS FULL|         T1              | 61471 |  1560K|       |  1252 |
|  13 |             TABLE ACCESS FULL|         T1              | 61471 |  1380K|       |  1252 |
|* 14 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 15 |          SORT JOIN           |                         |   785K|  9976K|    36M| 15102 |
|  16 |           TABLE ACCESS FULL  |         T3              |   785K|  9976K|       | 11321 |
|  17 |        COUNT                 |                         |       |       |       |       |
|* 18 |         HASH JOIN            |                         | 84638 |    11M|  9048K| 19682 |
|  19 |          TABLE ACCESS FULL   |         T1              | 61471 |  1380K|       |  1252 |
|  20 |          MERGE JOIN OUTER    |                         | 84638 |  9422K|       | 17298 |
|  21 |           SORT JOIN          |                         | 84638 |  8348K|    21M|  2196 |
|* 22 |            TABLE ACCESS FULL |         T2              | 84638 |  8348K|       |   207 |
|* 23 |           SORT JOIN          |                         |   785K|  9976K|    36M| 15102 |
|  24 |            TABLE ACCESS FULL |         T3              |   785K|  9976K|       | 11321 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------


   1 - filter(ROWNUM<=200)
   3 - filter(ROWNUM<=200)
   4 - filter("T"."X_PARENT_OBJID"=16)
  10 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  11 - access("PC"."OBJID"(+)="C"."CASE_VICTIM2CASE")
  14 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  15 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
  18 - access("C"."OBJID"="DT"."X_FOCUS_ID")
  22 - filter("DT"."X_FOCUS_TYPE"=0 AND "DT"."X_OUTCOME" NOT LIKE 'Fail -%')
  23 - access("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))
       filter("DT"."X_DIAG_TEST2X_MSG_REQ"="MR"."OBJID"(+))

Note: cpu costing is off

48 rows selected.

SQL>




********************************************************************************
tkprof output.



 SELECT * FROM ( SELECT T.objid,T.x_request_date
FROM view_T T
WHERE  (  T.x_parent_objid =  16 )  ORDER BY 2 ASC ) WHERE ROWNUM <= 200

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          2          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     25.10     313.88     512597     561247        209           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     25.11     313.89     512597     561249        209           4

Misses in library cache during parse: 1
Optimizer goal: FIRST_ROWS
Parsing user id: 42  (SA)

Rows     Row Source Operation
-------  ---------------------------------------------------
      4  COUNT STOPKEY
      4   VIEW
      4    SORT ORDER BY STOPKEY
      4     VIEW
 222568      SORT UNIQUE
 222568       UNION-ALL
 111284        COUNT
 111284         MERGE JOIN OUTER
 111284          SORT JOIN
 111284           HASH JOIN
  73011            HASH JOIN OUTER
  73011             TABLE ACCESS FULL T1
  73011             TABLE ACCESS FULL T1
 111284            TABLE ACCESS FULL T2
  99111          SORT JOIN
 954900           TABLE ACCESS FULL T3
 111284        COUNT
 111284         HASH JOIN
  73011          TABLE ACCESS FULL T1
 111284          MERGE JOIN OUTER
 111284           SORT JOIN
 111284            TABLE ACCESS FULL T2
  99111           SORT JOIN
 954900            TABLE ACCESS FULL T3


 

********************************************************************************

Questions:

a) Why filter in explain plan output  ( 4 - filter("T"."X_PARENT_OBJID"=16) ) is not considered at step#12 & 13 in execution plan? as this would have  reduces row source drastically in the begining stage.
 I have access to your book "Effective Oracle by Desgin"- and it looks first entire result set is calculated then filter is applied.

b) Why indexes are not being considered in Id#12 & 13?
   here only it goes for lots of PIOs
   Indexing scheme:
   pc.objid has unique index
   c.case_victim2case has non unique index
   c.objid has unqiue index
   dt.x_diag_test2x_msg_req  has non unique index
   mr.objid has unique index

c) Any other way to optimise(rewrite) the query?


Regards



Tuning. ....

A Reader, October 07, 2009 - 10:42 am UTC

Sorry
Tom. It looks text is getting truncated after certain limit.

Posting the questions below for above.


Questions:

a) Why filter in explain plan output ( 4 - filter("T"."X_PARENT_OBJID"=16) ) is not considered at step#12 & 13 in execution plan? as this would have reduces row source drastically in the begining stage.
I have access to your book "Effective Oracle by Desgin"- and it looks first entire result set is calculated then filter is applied.

b) Why indexes are not being considered in Id#12 & 13?
here only it goes for lots of PIOs
Indexing scheme:
pc.objid has unique index
c.case_victim2case has non unique index
c.objid has unqiue index
dt.x_diag_test2x_msg_req has non unique index
mr.objid has unique index

c) Any other way to optimise(rewrite) the query?


Regards



asynch descriptor resize ?

A reader, June 28, 2010 - 11:10 am UTC

What is the meaning / cause of "asynch descriptor resize"? . I see this wait in my top list of waits and consuming 199361.53% DB time .

I looked into metalink . Not enough information.
I am on 11g R2 on windows 2008 64 bit.

Thanks
Tom Kyte
July 06, 2010 - 12:44 pm UTC

it cannot consume more than 100% of db time, numbers look wrong if it does. Please file a service request via support - if it consumes way more than 100% of time, something is wrong.

(and there is nothing you can do to offset this wait, short of "do not do direct IO" which in itself would become the larger bottleneck (lack of directio)

Oracle wait events

Shafi M., July 29, 2010 - 8:57 pm UTC

Hello Tom,suddenly I see a RMAN backup job hangs(for more than 2 hours) with "Enqueue: CF Contention" on 10gR2 7-node RAC on Linux RH system.

looks like the controlfile backup is taking enormous time.I killed the session later.what may be the issue?

best regards,
Tom Kyte
August 02, 2010 - 7:55 am UTC

it could be caused by a few things, I start with bug 7023147 - but utilize support for this please. You'll need a bit more information to diagnose.

row cache lock in top 5 wait events

baskar.l, October 05, 2010 - 9:41 am UTC

Hi Tom,

DB version is 10.2.0.4. i could see row cache lock in top 5 wait events in the db..

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read           3,544,685      49,464     14   59.3   User I/O
CPU time                                         19,307          23.2
ARCH wait on SENDREQ                  2,272       5,002   2202    6.0    Network
db file scattered read              128,920       3,514     27    4.2   User I/O
row cache lock                    1,787,698       2,192      1    2.6 Concurrenc
          -------------------------------------------------------------

SQL> select p1text,p1,p2text,p2,p3text,p3 from v$session where event = 'row cache lock';
 
P1TEXT                       P1 P2TEXT                       P2 P3TEXT                                                                   P3
-------------------- ---------- -------------------- ---------- ---------------------------------------------------------------- ----------
cache id                     16 mode                          0 request                                                                   3
 
1 row selected.
 
SQL> select PARAMETER ,COUNT ,GETS ,GETMISSES ,MODIFICATIONS from v$rowcache where cache#=16;
 
PARAMETER                             COUNT       GETS  GETMISSES MODIFICATIONS
-------------------------------- ---------- ---------- ---------- -------------
dc_histogram_defs                      1704   47481213   11479508          5408
dc_histogram_data                       218    8211149    1277649             0
dc_histogram_data                        83    2668657     301607             0
 
3 rows selected.

what does dc_histogram_defs and dc_histogram_data?? Why is it showing in row cache lock??

baskar.l

Tom Kyte
October 05, 2010 - 12:28 pm UTC

hah, given your other waits, I wouldn't even be concerned - it would not have dawned on me to look at them (the row cache waits).

you are seeing waits on getting histogram information - probably related to IO waits to read that stuff in - given your other massive IO problem.

You can try increasing your shared pool to increase the chance of caching more of the dictionary

Thanks Tom

baskar.l, October 06, 2010 - 10:34 pm UTC

Thanks Tom..Its just amazing to read your blogs and answers.Guess you have a great time management in place(off topic)

baskar.l

Why need db file sequential read wait event?

tom.lai, October 26, 2010 - 1:16 am UTC

Hi Tom,

db file sequential read wait event is relative to single block read event, db file scattered read wait event is multi-block read. my question is why oracle need to "db file sequential read"? because contiguous multi blocks read into cache buffer, this will occur better performance.

thanks
Tom Kyte
October 26, 2010 - 7:53 pm UTC

because sometimes we need A SINGLE BLOCK.


If you give us a rowid (after seaching an index for example), we need A SINGLE BLOCK.

reading a bunch of blocks we don't need would be a waste, would not it? (rhetorical question, yes is the answer)

A reader, October 26, 2010 - 8:26 pm UTC

Thanks tom.


wait class

ashish kumar, July 28, 2011 - 2:53 am UTC

Hi Tom,

I am facing issues with "Concurrency" wait class, which is consuming database time, due to it performance of database is low now. In my addm report it shows like this:

Contention for latches related to the shared pool was consuming significant
database time.
NO RECOMMENDATIONS AVAILABLE

ADDITIONAL INFORMATION:
Waits for "cursor: pin S wait on X" amounted to 38% of database time.
Hard parsing of SQL statements was not consuming significant database
time.

SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Wait class "Concurrency" was consuming significant database
time. (39% impact [71248 seconds])

and in AWR top event category it shows:
Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
cursor: pin S wait on X 3,589,632 69,583 19 38.5 Concurrency

give some suggestion to overcome this issue

Thanks in advance,
Ashish
Tom Kyte
July 28, 2011 - 7:19 pm UTC

you are parsing a lot.

what is your setting for session cached cursors?
how many parses/second do you do?

wait class

ashish kumar, July 29, 2011 - 3:35 am UTC

Hi Tom,

We have set session_cached_cursors=50
and
parse time elapsed 1,841.98 per sec and consuming 1.02% of DB time

I got some more statistics in AWR Regarding concurrency, its:
Concurrency(Wait Class)
3,596,358(Waits)
99.83(%Time -outs)
71,248(Total Wait Time/s)
20(Avg wait/ms)
1.44(Waits/txn

Instance Efficiency Percentages (Target 100%)

Buffer Nowait %: 99.81 Redo NoWait %: 100.00
Buffer Hit %: 79.96 In-memory Sort %: 100.00
Library Hit %: 98.80 Soft Parse %: 89.92
Execute to Parse %: 95.60 Latch Hit %: 99.94
Parse CPU to Parse Elapsd %: 6.59 % Non-Parse CPU: 98.98


Regards,
Ashish
Tom Kyte
August 01, 2011 - 11:29 am UTC

soft parse % is horrible, you are hard parsing like crazy.

nothing but fixing the code that is not using binds correctly will fix that (I will not recommend cursor_sharing=force/similar, I do not recommend using that)

buffer busy wait vs buffer gets

Lasse Jenssen, November 09, 2011 - 7:37 am UTC


Hi Tom

Our database got yammed last night, and I'm strugling to see what's the different from the other nights.

There is (rman) backup going on at this time and there is a delete script deleting thousands of rows from different tables (this includes a log table that gets written to by the web application).

The system is running on a IBM lpar with 10 lcpu's. I thinkt this means that there are 5 physical CPU available for the partition.

The day before the load statistics looks the same.
But in the statspack report (30 min snap) from yesterday and the day before i see the following top events:
Here we definitly got a problem:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
buffer busy waits                               1,404,117      54,841    38.71
latch free                                      1,814,866      51,804    36.57
enqueue                                            30,772      17,764    12.54
CPU time                                                        7,608     5.37
sbtwrite2                                         274,823       4,706     3.32

The day before everything is ok:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                           973,353       4,783    30.26
sbtwrite2                                         545,896       4,656    29.45
CPU time                                                        4,539    28.71
db file parallel write                             22,299         605     3.83
async disk IO                                     301,410         392     2.48

In the top SQL order by Gets from yesterday i see:
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
  2,380,729,134       24,511       97,129.0   98.2  5584.91 ######### 2174909325
Module: JDBC Thin Client
INSERT INTO log ...


The day before the number of execution was even higher, but buffer gets was "only" one fifth (about 481,300,919):
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
    481,300,919       30,197       15,938.7   91.6  1012.91   3251.11 2174909325
Module: JDBC Thin Client
INSERT INTO log ...


Question:
Does Oracle count an "buffer get" also when there is a "buffer busy waits"? So when you see abnormal "buffer busy waits" there would naturally be an increase in "buffer gets"?</code>
Tom Kyte
November 09, 2011 - 7:57 am UTC

Here we definitly got a problem:


how so - maybe yes, maybe no.


The day before the load statistics looks the same.


what does that mean in the context of this discussion? I'm not sure what "load statistics" are to you?



It looks like your delete of the LOG table was running at a time of heavy activity (other DML activity - the insert) today during this window of time - but not yesterday.


A buffer get could be preceded by a buffer busy wait - yes. But it would be an attempt to get a buffer that would cause the wait, not the wait causing increased buffer gets directly.


The increased buffer gets sound like they could be due to read consistency (we've hit blocks that have been modified since our statement began and we need to roll them back - the extra buffer gets are reads against undo to roll back the changes).


Can you consider using partitioning to purge this information - a simple fast DDL operation to truncate or drop an old partition. No redo, no undo, no read consistency issues?

buffer busy wait vs buffer gets

Lasse Jenssen, November 10, 2011 - 2:56 am UTC

You wrote: "maybe yes, maybe no."

No. Definitly yes! The application ran slower and slower, and after a while the customer couldn't even log into the application. That is a problem, and its definit :-)

You wrote: "not sure what "load statistics" are to you?"
Sorry. I was talking about the load profiles in the statspack report (... I know :-( ... we're working to upgrade to 11g). As you see below the load seems to be almost similar for the two periods, except of the logical reads. And the question regarding buffer gets and buffer busy waits was regarding this (see further explaination below).

Load Profiles from statspack:

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
xx            xxxxxxxxxx xx                  1 9.2.0.8.0   NO      xxxxx
              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     46269 08-Nov-11 19:15:02      418      33.1
  End Snap:     46270 08-Nov-11 19:45:03      413      33.5
   Elapsed:               30.02 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     2,560M      Std Block Size:          8K
           Shared Pool Size:       512M          Log Buffer:      1,024K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          1,211,114.60             79,239.20
              Logical reads:            735,605.13             48,128.20
              Block changes:              4,025.15                263.35
             Physical reads:                526.88                 34.47
            Physical writes:                521.38                 34.11
                 User calls:                380.24                 24.88
                     Parses:                293.71                 19.22
                Hard parses:                  0.70                  0.05
                      Sorts:                 67.20                  4.40
                     Logons:                  0.98                  0.06
                   Executes:                462.64                 30.27
               Transactions:                 15.28

  % Blocks changed per Read:    0.55    Recursive Call %:     67.49
 Rollback per transaction %:    6.12       Rows per Sort:    176.17

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.95       Redo NoWait %:    100.00
            Buffer  Hit   %:   99.93    In-memory Sort %:     99.99
            Library Hit   %:   99.91        Soft Parse %:     99.76
         Execute to Parse %:   36.51         Latch Hit %:     96.70
Parse CPU to Parse Elapsd %:    6.69     % Non-Parse CPU:     99.95

              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:     46210 07-Nov-11 19:15:01      394      31.4
  End Snap:     46211 07-Nov-11 19:45:01      390      31.6
   Elapsed:               30.00 (mins)

Cache Sizes (end)
~~~~~~~~~~~~~~~~~
               Buffer Cache:     2,560M      Std Block Size:          8K
           Shared Pool Size:       512M          Log Buffer:      1,024K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          1,229,491.52             78,192.58
              Logical reads:            291,867.69             18,562.05
              Block changes:              4,011.40                255.11
             Physical reads:                652.70                 41.51
            Physical writes:                572.51                 36.41
                 User calls:                359.03                 22.83
                     Parses:                291.68                 18.55
                Hard parses:                  1.06                  0.07
                      Sorts:                 68.46                  4.35
                     Logons:                  1.00                  0.06
                   Executes:                469.89                 29.88
               Transactions:                 15.72

  % Blocks changed per Read:    1.37    Recursive Call %:     70.04
 Rollback per transaction %:    6.12       Rows per Sort:    163.33

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.96       Redo NoWait %:    100.00
            Buffer  Hit   %:   99.80    In-memory Sort %:     99.99
            Library Hit   %:   99.87        Soft Parse %:     99.64
         Execute to Parse %:   37.93         Latch Hit %:     98.77
Parse CPU to Parse Elapsd %:   11.54     % Non-Parse CPU:     99.93


You wrote: A buffer get could be preceded by a buffer busy wait - yes. But it would be an attempt to get a buffer that would cause the wait ...

Yes. A buffer get precedes a buffer busy waits. I think you maybe misunderstood the question (or more precise: my formulation was not good). I'll try to explain. When you for instance have a sleep waiting to get a latch (after spinning) oracle will retry to get the latch. The same need for a latch could therefore trigger more than one latch wait. When oracle has to wait for a buffer (buffer busy waits) it does not retry to get the buffer? It just waits to get it? So the attempt to get one buffer could not lead to more than one buffer busy wait?

You wrote: Can you consider using partitioning to purge this information?

Yes we are definitly considering this for the log table and other tables, both for performance and for maintainance reasons.

You wrote: could be due to read consistency

Yes. It does seem that some of the buffer busy waits come from read conistency. I'm also wondering if some of these waits could come from reaching the maxtrans on index block on the log table. The inserts on this table are using a sequence as the PK, and always hitting one side of the index. If autonumous transaction is being used, this shouldn't be a problem, but I'll have to do some more analysis here.

Thanks for your comments Tom!

By the way! Just to let you know ... I must always put an -/code- in the beginning of the review to get the formating ok. I also get an -/code- attached at the end of the review. If I put an -code- at the end, it will disappear in the preview, but report an unclosed HTML-tag.
</code>
Tom Kyte
November 10, 2011 - 1:27 pm UTC

Given the information you gave me above, the only logical response is "maybe yes, maybe no". You cannot make the conclusion you made on this page based on the information supplied. You may well have other information that makes it obvious *TO YOU*, but we don't. Please do keep that in mind. We cannot see your console, we do not know your history, we do not know your current situation. please just keep that in mind.


It just waits to get it? So the attempt to get one buffer could not lead to more than one buffer busy wait?

In most all cases, we wait for up to a second before waking up and looking around and waiting again.

I'm also wondering if some of these waits could come from reaching the maxtrans on index block on the log table.

look at v$segement_statsistics, it would manifest itself as ITL waits on that index segment.

A global hash partitioned index on that key could work (but would make range scans on the ID not quite as possible) to reduce that sort of contention. As would not using a sequential number.

buffer busy wait vs buffer gets

Lasse Jenssen, November 11, 2011 - 2:27 pm UTC

I do not take this service as granted. I really think it's excellent, and really try to give you the necessary information needed.

With 5 physical CPUs and a 30 minute snapshot (this would be 9000 CPU seconds available) i would think a total of over 135000 sec in wait would indicate a problem?

You wrote: "In most all cases, we wait for up to a second before waking up and looking around and waiting again"

So when waking up and the "looking around" will trigger another "buffer get"?

I'll definitely have a look at the v$segment_statistics, and I'll look into your suggestion on a global hash partitioned index.
Thanks again Tom.
Tom Kyte
November 11, 2011 - 3:07 pm UTC

I do not take this service as granted. I really think it's excellent, and really try to give you the necessary information needed.


no worries, I'm just trying to point out that what is obvious to you - is not necessarily obvious to us :) we only have what you give us to look at.

... 135000 sec in wait would indicate a problem? ...

depends on the user load. If you had 10,000 users, they only waited 13.5 seconds apiece. You could have much more wait time in a given window than 'real time' and have everything be just dandy. It depends.


.. So when waking up and the "looking around" will trigger another "buffer get"?
...

I do not believe so, the buffer get gets stopped by the wait - not the other way around. So 1 get.


wait events: PX Deq: Execute Reply

George Joseph, March 05, 2012 - 12:37 am UTC

Dear Tom

We are using oracle version 10g Enterprise Edition Release 10.2.0.3.0.
There is a stored procedure in a package that inserts data from a multi-table join sources and insert into 4 tables.

I find that there is PARALLEL enabled in one of the tables of the source. This is a partitioned table and index on local partitioned.

We are trying to tune the procedure to finish in less time. THe current procedure takes close to an hour to complete.
It basically selects data from source, performs smaller look ups to non_parallel enabled tables and finally inserts to 4 main tables.

What i did is to check out the wait events while the procdure is running from v$sesssion_event.(484 is the main session id)

SID    EVENT                        TOTAL_WAITS    TOTAL_TIMEOUTS    TIME_WAITED      WAIT_CLASS
484    latch: cache buffers chains        3            0                2             Concurrency
484    db file sequential read          5604           0              6290            User I/O
484    db file scattered read            54            0                74            User I/O
484    latch: row cache objects           1            0                 0            Concurrency
484    PX Deq: Join ACK                  76            0                 8            Idle
484    PX Deq Credit: need buffer       3027           0                17            Idle
484    PX Deq: Parse Reply               52            0                47            Idle
484    PX Deq: Execute Reply            1525           567            118549          Idle
484    PX Deq: Table Q Normal            27            0                 1            Idle
484    SQL*Net message to client         23            0                 0            Network
484    SQL*Net message from client       23            0               12986          Idle
484    SQL*Net more data from client     58            0                75            Network
484    events in waitclass Other       328114          326797           1516          Other


Since most of the TOTAL_TIMEOUTS lie in the "events in waitclass Other" i investigated a bit more on that front and used
waitprof.sql by Tanel Poder.

After running the waitprof.sql with sample size of 500000. I figured out that there
SW_SID STATE SW_EVENT SWP_P1 SWP_P2 SWP_P3 SWP_SEQ# PCT_TOTAL_SAMPLES WAITPROF_TOTAL_MS DIST_EVENTS WAITPROF_AVG_MS
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 5 0  25.9536 3155.95776 2 1577.97888
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 1 0  18.2622 2220.68352 2 1110.34176
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 2 0  16.9174 2057.15584 1 2057.15584
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 4 0  16.4168 1996.28288 2 998.14144
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 3 0  16.2952 1981.49632 1 1981.49632
484  WAITING PX Deq: Execute Reply sleeptime/senderid= 200 passes= 6 0  6.1502 747.86432 1 747.86432
484  WORKING On CPU / runqueue     0.0046 0.55936 7 0.079908571


Googling further aboutthis event :PX Deq: Execute Reply from http://oracledoug.com/px8.html.says its an idle wait event and not to
worry too much about it, although the article says it could indicate that PX Slaves are taking time to process and send results back to
the query coordinator.

What i did is to set the NO_PARALLEL hint at the source tables from wehre i am selecting from and the whole process finishes in less than 10 minutes,
and everyone is happy.

I was hoping to get some direction from you as to how to pin-point what the exact problem is with regards to slaves taking more time(if my assumption is correct that
slaves were indeed the cause of the procedure taking long time).

I then attempted to think it was because of parallel dml not enabled(my procedure essentially does this INSERT INTO DEST SELECT col1,col2, from <parallel_enabled_table>).
ie the slaves were queuing up the work because they knewthe query consumer couldnt process the results that quick.

I set the session to alter sesion enable parallel dml and reran the procedure, however the effect was the same. Same wait events showed up.

With the NO_PARALLEL hint i find that the wait event is "On CPU / runqueue".
Is such an event the ideal wait event to know a process is being run correctly?.Does this show that the entire process is CPU bound.
SW_SID STATE SW_EVENT SWP_P1 SWP_P2 SWP_P3 SWP_SEQ# PCT_TOTAL_SAMPLES WAITPROF_TOTAL_MS DIST_EVENTS WAITPROF_AVG_MS
483 WORKING On CPU / runqueue     100 10110 1 10110

Tom Kyte
March 05, 2012 - 6:56 am UTC

It basically selects data from source, performs smaller look ups to non_parallel enabled tables and finally inserts to 4 main tables.

don't do that, just write one larger query that does the lookups, pack as much work into the sql statement as possible. DO NOT code a "do it yourself nested loops join" which is what you've just described here.


What i did is to set the NO_PARALLEL hint at the source tables from wehre i am selecting from and the whole process finishes in less than 10 minutes,
and everyone is happy.


parallel is not fast=true, it works when you have an abundance of free, fast, idle resources that you can use. You apparently - didn't have them. You do not have sufficient CPU to run with the degree of parallelism you tried, or you didn't have enough memory, or you didn't have enough IO bandwidth, or any combination of the three.

event in wait class other

George Joseph, March 05, 2012 - 7:55 am UTC

The degree of parallelism on the table which has the problem is 2 and instances=1. If i were to run a parallel query on this table such as select count(col) from table it returns the data in a fast manner compared with /*+NO_PARALLEL*/ hint.

Could you please share your thoughts on the wait event event in wait class other.


Tom Kyte
March 05, 2012 - 8:12 am UTC

the wait event is exactly what you described, we are waiting for a parallel execution server to finish the work it is doing. You'd have to look at the parallel execution servers session to see what it was doing.

log file parallel write - really in parallel?

Michael, April 11, 2012 - 8:33 am UTC

hi tom,

afaik a log write is always done synchronous ("_lgwr_async_io=false").

question: how can a "log file parallel write" (writing to several logfiles at the same time) be done in parallel when writing synchronous having only one lgwr process (on unix)?

cheers, michael


Tom Kyte
April 11, 2012 - 11:53 am UTC

As that is an undocumented parameter, its true meaning is not clear to either of us.

It no longer appears in 11g

It was ignored in many cases prior to that.

see note 247959.1, it references HP/UX but it is true on other platforms as well.

log file parallel write

Michael, April 12, 2012 - 12:02 pm UTC

i can only find patch 247959 - "data fix for encumbered amount......"

i can't find a note 247959.1 ?

cheers, michael
Tom Kyte
April 12, 2012 - 1:33 pm UTC

I see, it is in limited publication - sorry about that.

basically - it says that this undocumented parameter is ignored in many situations, even if set explicitly to false, it won't be respected.

meaning, false for this undocumented parameter isn't necessarily false

log file parallel write

Michael, April 13, 2012 - 11:23 am UTC

i'm asking because of following situation.

we have several logfile groups. every group has two members. every member is written to its own independent volume (1:1) - on production.

we plan to change this for pre-production to save costs. every member should write to its own subdirectory on the same volume only (2:1). (loss of data is accepted).

one argument is that lgwr writes log members sequentially in any case. is this a valid assumption?

environment: rhel5, 1 lgwr, no slaves, direct IO

cheers, michael
Tom Kyte
April 13, 2012 - 1:17 pm UTC

what are disk async io and filesystemio options set to


you can always truss lgwr for a bit to see what OS calls it is making on a test system as well.

log file parallel write

Michael, April 17, 2012 - 3:38 am UTC

disk_asynch_io = true

filesystemio_options = asynch (having n databases per host)

filesystemio_options = setall (having 1 database per host)


Tom Kyte
April 17, 2012 - 5:29 am UTC

the underscore parameter is ignored in your case.

logfile parallel write

Michael, April 18, 2012 - 8:36 am UTC

sorry, i don't understand.

my question was:
<quote>
one argument is that lgwr writes log members sequentially in any case. is this a valid assumption?
<\quote>

cheers, michael
Tom Kyte
April 19, 2012 - 4:19 am UTC

question: how can a "log file parallel write" (writing to several logfiles at the same time) be
done in parallel when writing synchronous having only one lgwr process (on unix)?


you are not using synchronous io, you have async on.

logfile parallel write

Michael, April 20, 2012 - 6:23 am UTC

so i got a logfile parallel write is like this:

start logfile parallel write
=> send write call to logfile 1
=> send write call to logfile 2
<= wait until both calls finished
end logfile parallel write


would this be true in any case as long as "disk_asynch_io = true"?


thanks,
michael

Tom Kyte
April 20, 2012 - 4:13 pm UTC

yes, you have async io enabled - even though it looks like you don't by looking at an undocumented parameter (that no longer exists in current releases)

you can always truss or strace the process to verify if you like (suggest doing that on TEST)

gc buffer busy release

A reader, June 14, 2012 - 1:14 pm UTC

Hello,

We are having a insert statement running for more than 2 days. This was supposed to take less than 4 hours (historically that was the elapsed time). I just started tracing the session and noticed wait events like:
gc buffer busy release
ges message buffer allocation
gc current grant busy

This table has quite a few CLOBs.

I haven't seen this wait event before and am curious to know about that. Our system is a 2-node RAC 11.2.0.1

Thanks,

gc Wait Events in 3 instance RAC

Rajeshwaran, Jeyabal, January 11, 2013 - 4:51 am UTC

Tom,

We are monitoring a batch process in production it is running since last night (for past 22 hrs), The same process with same data volumn completed in Pre-production environment with in 8hrs. I queried v$session_event in production and found the below waits are the major ones. (our prod is 3 instance RAC on Oracle 10.2.0.5 running in linux)

gc cr grant 2-way
gc current block 3-way
gc current grant 2-way

What could be the reason for this and how can we resolve this?

More to Explore

Performance

Get all the information about database performance in the Database Performance guide.